intrinsic versus image-related utility in social media: why do … · 2015-07-28 · intrinsic...
TRANSCRIPT
1
Intrinsic versus Image-Related Utility in Social Media: Why Do People Contribute
Content to Twitter?
Olivier Toubia
Glaubinger Professor of Business Columbia Business School
522 Uris Hall, 3022 Broadway, New York, NY 10027-6902 [email protected]
Andrew T. Stephen
Assistant Professor of Business Administration & Katz Fellow in Marketing University of Pittsburgh, Joseph M. Katz Graduate School of Business
318 Mervis Hall, Pittsburgh, PA 15260 [email protected]
forthcoming, Marketing Science
2
Intrinsic versus Image-Related Utility in Social Media: Why Do People Contribute
Content to Twitter?
We empirically study the motivations of users to contribute content to social media in the context
of the popular microblogging site Twitter. We focus on non-commercial users who do not
benefit financially from their contributions. Previous literature suggests two main sources of
utility that may motivate these users to post content: intrinsic utility and image-related utility.
We leverage the fact that these two types of utility give rise to different predictions as to whether
users should increase their contributions when their number of followers increases. To address
the issue that the number of followers is endogenous, we conducted a field experiment in which
we exogenously added followers (or follow requests in the case of protected accounts) to a set of
users over a period of time, and compared their posting activities to those of a control group. We
estimated each treated user’s utility function using a dynamic discrete choice model. While our
results are consistent with both types of utility being at play, our model suggests that image-
related utility is larger for most users. We discuss the implications of our findings for the
evolution of Twitter and the type of value firms may derive from such platforms in the future.
Keywords: Social Media, Field Experiments, Dynamic Discrete Choice Models.
3
1. Introduction
In recent years, social media has emerged as a major channel for broadcasting information. For
instance, by late 2011 there were over 173 million public blogs,1 and 250 million messages
(“tweets”) were sent each day through the popular microblogging platform Twitter.2 Although
some contributors to social media are able to derive advertising revenue from their content (using
for example platforms such as Google’s AdSense; cf. Sun and Zhu 2011), social media platforms
rely predominantly on the benevolent contributions of millions of individuals as “content
providers.” While publishers’ incentives in traditional media are well understood and are
typically a function of the number of “eyeballs” reached by their content, motivations to
benevolently contribute content in social media are not well understood.
A social media platform may be utilized by a firm for different (non-exclusive) purposes.
For example, it may be used as a media outlet (i.e., the firm broadcasts content to consumers), a
viral marketing platform (i.e., the firm induces consumers to share information about its brands
with other consumers and/or tracks naturally occurring word of mouth), or a customer insights
platform (i.e., the firm monitors consumers’ conversations). We argue that a firm cannot decide
how to leverage social media and devise a fully efficient social media strategy unless it
understands what motivates consumers to be active on such platforms in the first place.
Moreover, for the platforms themselves, understanding what motivates their users to contribute is
important since the viability of these platforms as businesses depends not only on how many
users they have but also on how active their users are as content contributors. However, extant
marketing research on social media and related phenomena such as online word of mouth has
1 Nielsen Blogpulse, October 7, 2011. http://www.blogpulse.com. 2 Mediabistro, October 18, 2011. http://www.mediabistro.com/alltwitter/costolo-future-of-twitter_b14936.
4
focused primarily on the outcomes of user activity, and less on the motivations underlying user
activity (e.g., Godes and Mayzlin 2004; Trusov, Bucklin, and Pauwels 2009; Katona, Zubcsek,
and Sarvary 2011; Stephen and Galak 2012).
In the absence of explicit economic incentives, the literature suggests two relevant types of
utility that may motivate non-commercial social media users to contribute content: intrinsic
utility and image-related utility. Intrinsic utility assumes that users receive direct utility from
posting content, and leads to “the doing of an activity for its inherent satisfactions rather than for
some separable consequence” (Ryan and Deci 2000). Image-related utility, on the other hand,
assumes users are motivated by the perceptions of others (see Fehr and Falk 2002 for a review of
the psychological foundations of incentives).3 Image-related utility is also related to status
seeking or prestige motivation (e.g., Glazer and Konrad 1996; Harbraugh 1998a, 1998b;
Fershtman and Gandal 2007; Lampel and Bhalla 2007).
Intrinsic and image-related utility have been studied quite extensively in the domain of
prosocial behavior (see for example Glazer and Konrad 1996; Harbaugh 1998a, 1998b; Bénabou
and Tirole 2006; Ariely, Bracha and Meier 2009). In a domain closer to social media, Lerner and
Tirole (2002, 2005) contrast the intrinsic pleasure open-source developers derive from working
on “cool” projects with the (image-related) desire for peer recognition. See also Bitzer, Schrettl
and Schröder (2007) or von Hippel and von Krogh (2003) for a theoretical discussion of the
motivations to contribute to open-source projects, and von Krogh and von Hippel (2006) for a
review. Several papers have provided additional survey-based empirical evidence that intrinsic
and image-related utility are indeed relevant in open-source development (e.g., Ghosh et al.
2002; Hars and Ou 2002; Lakhani and Wolf 2005; Roberts, Il-Horn Hann and Slaughter 2006).
3 Fehr and Falk also discuss reciprocity as a psychological source of motivation, which depends on whether an agent perceives the action of another agent as hostile versus kind. This is less relevant in our context.
5
Survey-based evidence for intrinsic and image-related utility has also been found in the context
of electronic knowledge repositories (see for example Kankanhalli, Tan and Wei 2005; Wasko
and Faraj 2005; Lampel and Bhalla 2007; Nov 2007). In the domain of social media specifically,
Bughin (2007) surveys users of an online video-sharing site and finds that their primary
motivations to upload videos are image-related (“I seek fame”) and intrinsic (“It is fun”).
Hennig-Thurau et al. (2004) survey the motivations of contributors to web-based opinion-
platforms. Besides some motivations specific to their particular context, the motivations found
by these authors tend to be either intrinsic (e.g., “It is fun to communicate this way with other
people in the community”) or image-related (e.g., “My contributions show others that I am a
clever consumer”).
Therefore, based on the extant literature is appears that intrinsic and image-related utility are
both plausible and realistic motivations for people to contribute content in social media.
However, to the best of our knowledge the empirical evidence to date is only survey-based. In
this paper we compare these two types of utility using a different empirical approach, focusing
specifically on the context of the popular microblogging platform Twitter.4 Twitter is an ideal
social media context in which to empirically study intrinsic and image-relate utility, because
these two types of utility give rise to opposite predictions as to whether users should increase or
decrease their posting activities when their number of followers increases. In order to address the
issue that the number of followers is endogenous, we conducted a field experiment in which we
exogenously added followers (or follow requests in the case of protected accounts) to a set of
users (treatment group), and compared their posting activities to those of a control group.
4 Whereas a blog is a website or part of a website that displays entries or elements of content (text, graphics, video, etc.) usually posted by an individual, a microblog is a type of blog that allows users to exchange smaller elements of content (e.g., short sentences, individual images, links).
6
We report two sets of analyses in this paper. An initial model-free analysis of our data shows
that while our intervention did not have a statistically significant main effect, it had a significant
positive effect on posting activities for treated users with a moderately low initial number of
followers and a significant negative effect for treated users with a moderately high initial number
of followers. These findings are consistent with both intrinsic and image-related utility being
relevant, with the dominant motivation being different for users with different numbers of
followers. While having the benefit of being free of any functional form assumption, this model-
free analysis does not allow us to quantify the relative magnitudes of these two types of utility
and how they vary across users based on observed and unobserved factors. Accordingly, we then
analyze our data using a dynamic discrete choice model. This allows us also to make
counterfactual predictions on the evolution of the Twitter platform as the network becomes
stable.
Two recent papers related to our research are Kumar (2009) and Shriver, Nair, and
Hofstetter (2012). Kumar (2009) studies consumers’ purchase of ring-back tones for their mobile
phones (a ring-back tone is not consumed by the user purchasing it but rather by those who call
that user), and estimates the utility consumers derive from having a high status (i.e., more
recently updated tones), from consuming the tones purchased by their peers, and from expressing
themselves through the tones they purchase. Shriver et al. (2012) study the causal relations
between content generation and the number of social ties in an online windsurfing community,
using a set of instrumental variables. However, neither paper studies specifically the utility
derived from posting content in social media. Kumar (2009) uses a context slightly different
from social media, and Shriver et al. (2012) do not study utility or motivation directly.
7
The remainder of the paper is organized as follows. In Section 2, we provide an overview of
the Twitter platform, discuss how the concepts or intrinsic and image-related utility are
operationalized in this context, and describe our empirical strategy. We describe our data in
Section 3, provide some model-free analysis in Section 4, and analyze the data using a dynamic
discrete choice model in Section 5. We conclude in Section 6.
2. Background
2.1. Twitter
Twitter is a very popular social media platform that allows users to share “tweets” (text messages
up to 140 characters long) with their “followers” (other users who choose to subscribe to a user’s
feed of tweets). The ability to follow other users creates a directed social network (unlike other
social networks such as Facebook or Linkedin which are undirected networks, user A following
user B on Twitter does not automatically imply that B follows A). A user’s home page (as seen
by that user) contains a “timeline” that captures all the tweets posted by the users this user
follows (in reverse chronological order), a text box labeled “what’s happening” that allows the
user to post a tweet, and a reminder of the number of users following the user and the number of
users followed by the user.5 Twitter users may be split into non-commercial and commercial
users. Commercial users may be classified into celebrities, media organizations, non-media
organizations, and brands (Wu et al. 2011). In this paper we focus on non-commercial users for
whom there exists no apparent financial incentive to contribute content.
5 Besides typing a message in the “What’s happening” window of their home page, users may also post tweets as
“replies” or “retweets.” A “reply” to a previous tweet is a text message of up 140 characters that will be seen by users who follow both the user who posted the initial tweet and the user replying to that tweet. A “retweet” forwards a previous tweet to a user’s followers. In our data “tweets” includes tweets, retweets and replies.
8
Other features of Twitter include the ability for a user to “unfollow” another user (i.e., stop
following a user that he or she had been following), to “block” another user (i.e., prevent that
other user from following him or her), and to make his or her own account “protected.” Accounts
that are not protected are called “public” (this is the default setting) and may be followed and
accessed by any user. If a user elects to protect his or her account, then requests by other users to
follow that account need to be approved by him or her, and the text of his or her tweets may only
be accessed by his or her followers. However, the number of users followed, number of
followers, and cumulative number of tweets are public information both for public and protected
accounts. According to Cha et al. (2010), approximately 8% of Twitter accounts are protected.
One final characteristic of Twitter that is critical to our analysis is that posting content is a
way for users to attract new followers. This claim is supported by our data (i.e., the state
transition probabilities reported in Section 5.1.4), and is consistent with Shriver et al. (2012) who
find a positive causal effect of content generation on the number of social ties in an online
windsurfing community. Note that unlike other directed social networks, reciprocity (i.e., A
follows B and B follows A) on Twitter is only moderate. Kwak et al. (2010) report that of all
user pairs on Twitter with at least one link between them, only 22.1% have a reciprocal
relationship (i.e., each user in the pair follows the other user). In other words, a user’s number of
followers is not simply a by-product of his or her following activities, and posting content in the
form of tweets is one way for users to attract new followers.
Twitter usage has been steadily growing. The number of unique US visitors to twitter.com in
September 2011 was estimated at 35 million, up from 28 million in September 2010.6 In March
6 Source: http://siteanalytics.compete.com/twitter.com/, accessed November 15, 2011.
9
2012 the number of active users throughout the world was reported to be 140 million.7 The
average number of tweets per day grew from an average of 5,000 in 2007 to 300,000 in 2008 to
2.5 million in 2009. In October 2011 this number reached 250 million tweets per day8 and in
March 2012 it reached 340 million tweets per day.9 Even if each tweet takes only a few seconds
to write, with 340 million tweets written per day, the equivalent of multiple decades of one
person’s life are spent each day posting content on Twitter (340 million tweets times 5 seconds
per tweet = 53.9 years).
Given the scale and relevance of Twitter in society, it is not surprising that academic
research on Twitter has started to emerge, mostly from Computer Science and Information
Systems. Extant research has focused primarily on studying the structure and the nature of the
Twitter social network, and on issues related to influence and information diffusion on this
network (see for example Cha et al. 2010; Kwak et al. 2010; Weng, Lim and Jiang 2010; Bakshy
et al. 2011; Romero et al. 2011; Wu et al. 2011; Goel, Watts, and Goldstein 2012). However, to
the best of our knowledge academic research on Twitter in marketing and other social sciences to
date has been limited. Exceptions include Ghose, Goldfarb, and Han (2011) who compare user
search costs in online versus mobile platforms using data from a microblogging site comparable
to Twitter; and Stephen, Dover, Muchnik, and Goldenberg (2012) who study how user activity
on Twitter affects the extent to which URLs posted by users in tweets spread through the Twitter
network.
7Source: http://blog.twitter.com/2012/03/twitter-turns-six.html, accessed October 9, 2012.
8 Mediabistro, October 18, 2011. http://www.mediabistro.com/alltwitter/costolo-future-of-twitter_b14936. 9
Source: http://blog.twitter.com/2012/03/twitter-turns-six.html, accessed October 9, 2012.
10
2.2. Intrinsic versus image-related utility on Twitter
2.2.1. Intrinsic utility
Twitter’s initial positioning was as “a real-time information network powered by people all
around the world that lets you share and discover what’s happening now” (twitter.com/about,
accessed 02/2010). Twitter further states that “Twitter asks ‘what’s happening’ and makes the
answer spread across the globe to millions.” The public nature of Twitter and the claims that the
information spreads “to millions across the globe” suggest that the intrinsic utility derived by a
non-commercial user from posting content on Twitter should be monotonically non-decreasing
in that user’s number of followers. Put simply, a user should derive more intrinsic utility from
broadcasting content as the size of his or her audience increases. This is similar to the case where
content publishers receive explicit financial incentives, which are typically monotonically non-
decreasing in the size of the publisher’s audience. While not critical to our argument, we also
assume that intrinsic utility from posting content is concave in the number of followers.
2.2.2. Image-related utility
The definition of image-related utility on Twitter should not be limited to the management of the
user’s image (i.e., how the user is portrayed on the platform). Instead, image-related utility
should be defined more broadly, to encompass the sense of self-worth and social acceptance
provided by a user’s activities on the platform.
In particular, there is some evidence suggesting that image-related utility on Twitter is related
to a user’s number of followers. While any user is able to contribute as much content and follow
as many users as he or she wants, followers need to be “earned” and a user’s number of
followers is an informative social signal. The number of followers has been used as a measure of
influence by academics (Cha et al. 2010; Kwak et al. 2010) and is often associated with
11
popularity by the general public (e.g., www.twitaholic.com, wefollow.com).10 There have been
several reports in the press of Twitter users attaching a lot of importance to their number of
followers. According to Poletti (2009), Twitter has become an avenue for self-promotion, and
one’s number of followers is becoming “the new barometer of how we gauge our self worth”.11
Leonhardt (2011) claims that the number of followers on Twitter is “just how people keep score
on the site and compare themselves to friends and colleagues.”12 Teitell (2011) reports on the
social pressures to achieve high numbers of followers on Twitter and high scores on sites such as
www.klout.com and www.peerindex.net that rate all Twitter users based on their number of
followers (as well as other metrics, using proprietary scoring rules).13 The importance for many
Twitter users of having a large number of followers is further revealed by the plethora of
websites that offer advice on how to increase that number (a partial list may be obtained by
typing “increase Twitter followers” in any search engine). Therefore it seems appropriate to
measure the stature or prestige of a Twitter user by his or her number of followers.
It is reasonable to model utility from stature as a non-decreasing concave function of the
number of followers. For example, Baumeister and Leary (1995) argue that humans have a
fundamental need for a certain minimum number of social bonds, but that “the formation of
further social attachments beyond that minimal level should be subject to diminishing returns;
that is, people should experience less satisfaction on formation of such extra relationships,”
(Baumeister and Leary 1995, p. 500) and they review empirical evidence supporting this claim.
Also consistent with image-related utility being concave, DeWall, Baumeister and Vohs (2008)
10 See also Beck, Howard. 2009. “New Way to Gauge Popularity.” The New York Times, October 21. 11 Poletti, Therese. 2009. “What if your friends won’t follow you on Twitter?” MarketWatch, November 12. 12 Leonhardt, David. 2011. “A Better Way to Measure Twitter Influence.” The New York Times, March 27. 13 Teitell, Beth. 2011. “Ascent of the Social-Media Climbers.” The Boston Globe, February 18.
12
provide experimental evidence that satiating the need for social acceptance leads to a reduction
in the drive to satisfy that need.
In summary, both intrinsic utility from posting content and image-related utility from having
many followers may be assumed to be monotonically non-decreasing and concave in the user’s
number of followers. However, one key difference is that while intrinsic utility is derived from
posting content viewed by many followers, image-related utility is derived from having many
followers. If a user does not post content on a given day, he or she will obviously not derive any
intrinsic utility from posting content on that day. On the other hand, image-related utility from
having many followers is a measure of stature which is independent of contemporaneous posting
activities. We will see next how, as a result of this difference, the motivation to post content (i.e.,
the total expected incremental utility derived from posting content on a given day) takes a
different form under intrinsic versus image-related utility.
2.3. Empirical strategy
Twitter offers a unique opportunity to study and contrast intrinsic and image-related utility in
social media for at least two reasons. First, by focusing on non-commercial users we are able to
study contributions to social media in a context in which financial or other extrinsic incentives
are minimal, if present at all. Second and most importantly, Twitter provides a context for
empirically comparing intrinsic versus image-related utility, because they give rise to different
predictions as to how users should react to an increase in their number of followers.
Throughout this section we illustrate the opposite predictions made by intrinsic vs. image-
related utility with a highly stylized and simplified model. This two-period model is presented
only for illustration purposes and is not used anywhere else in the paper. We show in Appendix 1
13
that similar results are obtained with an infinite-horizon version of this model and illustrate the
results graphically.
Let n denote a user’s number of followers. Let U(n) be the per-period utility derived from
posting content to n followers in a given period (e.g., day). In the case of intrinsic utility, this
utility is derived in a given period only if content is posted in that period. On the other hand, in
the case of image-related utility, this utility is derived in a given period irrespective of whether
content is posted in that period. We assume that U(n) is monotonically increasing and concave in
n. We further assume (for the purposes of this illustrative model only) that if a user posts content
in Period 1, his or her number of followers will increase to n+1 in the next period with probabil-
ity δ, and stay the same with probability (1- δ). If content is not posted in Period 1, we assume
that the number of followers will remain the same in the next period. (Note that in Section 5 we
use empirical state transition probabilities instead of making these simplifying assumptions.) Ta-
ble 1 lists the utility derived by the user in each period as a function of his or her action in each
period, in the case of intrinsic utility. Table 2 does the same, in the case of image-related utility.
[ Insert Tables 1 and 2 Here ]
2.3.1. Intrinsic utility: implication when number of followers increases
If users contribute content to Twitter because of the intrinsic value they derive from broadcasting
information to their followers, and if the utility derived from posting is monotonically non-
decreasing and concave in a user’s number of followers, then we should expect users to increase
their posting activities as they receive additional followers. Quite simply, if utility from posting
content is increasing in the number of followers, having more followers should lead to more
posting.
14
In terms of our illustrative model, we see in Table 1 that the user derives an additional
utility U(n) by posting in period 1, which is monotonically increasing in n by assumption. In case
the user also posts in Period 2, then posting in Period 1 increases the expected utility derived in
Period 2 from U(n) to δU(n+1)+ (1-δ)U(n) (because of the potential increase in the number of
followers due to posting in Period 1). In that case posting content in Period 1 provides a total
(over both periods) expected additional intrinsic utility of U(n)+[δU(n+1)+(1-δ)U(n)-
U(n)]=δU(n+1)+(1-δ)U(n). This quantity is also monotonically increasing in n, because δ≥0, 1-
δ≥0 and U(n) is monotonically increasing in n. In other words, the total expected incremental
utility from posting content in Period 1 is increasing in n, irrespective of whether the users posts
content in Period 2.
2.3.2. Image-related utility: implication when number of followers increases.
With image-related utility, posting content is not the direct source of utility, but rather a means
towards an end, i.e., a way to attract new followers. The utility comes from having many follow-
ers, not from broadcasting content to them. Posting content on a given day influences future ex-
pected image-related utility, by increasing the expected number of followers the user will have in
the future. Therefore, in contrast to intrinsic utility, the incremental image-related utility
achieved by posting content on a given day will be derived in the future, and is based on the ad-
ditional followers the user will gain by posting that day. If there are diminishing returns to addi-
tional followers, this incremental future expected utility is decreasing in the current number of
followers. Therefore the motivation to post content in order to attract new followers should be
decreased as the current number of followers is increased.14
14 One may think that users who are motivated by image would feel compelled to actually post more as they amass more followers, in order to maintain their number of followers. However, our data suggest that the expected change in the number of followers when no posting occurs is not negative (see state transition probabilities reported in Sec-
15
In terms of our illustrative model, if the user posts content in Period 1, there is a probability
δ that image-related utility in Period 2 will be increased from U(n) to U(n+1). As shown in Table
2, posting content in Period 1 provides an additional total expected image-related utility of
δ(U(n+1)-U(n)), which is realized in Period 2 irrespective of whether the user posts content in
Period 2. This illustrates that under image-related utility, the incremental benefit from posting
content on a given day is realized in the future, and is a result of attracting new followers. Be-
cause U(n) is concave in n, U(n+1)-U(n) is decreasing in n, i.e., the incremental total expected
image-related utility derived from posting content in Period 1 is decreasing in the number of fol-
lowers at the beginning of Period 1.
Interestingly, under image-related utility, the incremental benefit from posting content in a
given day is increasing in the likelihood that posting content will increase the number of follow-
ers (parameter δ in our illustrative model). Therefore, users motivated by image-related utility
should also post less content as the structure of the network becomes stable (i.e., a non-evolving
static structure of connections is achieved) and as posting activities become less likely to lead to
additional followers. This raises questions on the longer-term sustainability of the Twitter plat-
form, and has implications for the type of value firms may be able to derive from social media in
the future. This issue will be addressed using counterfactual analyses in Section 5.3.4.
In sum, intrinsic utility from posting content and image-related utility from having many
followers give rise to opposite predictions as to how users should react to an increase in their
number of followers. If users are motivated by the intrinsic utility from broadcasting content to
many followers, then having more followers should lead to an increase in posting activities. On
the other hand, if users derive their utility from having many followers and post content in order
tion 5.1.4), so this scenario seems unlikely. More generally, our model in Section 5 will enable us to take such sce-narios into account, by explicitly capturing and quantifying the impact of posting on a user’s future number of fol-lowers.
16
to gain additional followers, then the motivation to post content should be diminished as the
current number of followers is increased (due to additional followers having diminishing
returns). In Appendix 1, we show how the results of the simple two-period illustrative model
used here generalize to an infinite horizon, and we illustrate graphically how the incremental
value from posting content in a given period varies with the number of followers, under intrinsic
and image-related utility.15
3. Data
Our data were collected directly from Twitter using Twitter’s application programming interface
(API; see dev.twitter.com). We selected a random set of 2,493 non-commercial Twitter users
from an initial database of approximately 3 million user accounts. We ensured that our users
were non-commercial by checking account names, and checking against lists and classifications
on sites such as wefollow.com and twitterholic.com. The users in our dataset are a mix of public
and protected accounts. We collected data daily on the following variables for each user in our
sample: (i) the number of followers, (ii) the number of users followed, and (iii) the cumulative
number of tweets posted by that user since the account was created. Unfortunately, the structure
of the social network to which these users belong was not available to us.
3.1. Initial calibration dataset
We first collected data daily for these 2,493 users for 52 days, between May 8, 2009 and June 28,
2009. This initial dataset allowed us to identify active users among the set. We classified a user
15 We note that there are conditions, related to the way posting affects one’s future number of followers, under
which intrinsic and image-related utility could in fact have the opposite effects to those just described. These conditions have low face validity and are discussed at the beginning of Section 5. Notwithstanding, our model in Section 5 enables us to quantify intrinsic versus image-related utility even under such conditions. In particular, the identification of our model does not rely on the assumption that intrinsic (image-related) utility always gives rise to an increase (decrease) in posting activity following an increase in the number of followers.
17
as “active” if he or she increased his or her cumulative number of tweets or number of users
followed at least once during this screening observation window. Out of all users, 1,355 were
classified as active.
3.2. Field experiment
We collected daily data again from the same set of 2,493 users for 160 days, between September
14, 2009 and February 20, 2010 (our main observation window). We selected 100 users
randomly from the set of 1,355 active users as our treatment group. In order to introduce
exogenous variations in the number of followers, we gradually added 100 followers to public
accounts in the treatment group over a 50-day period (days 57 to 106). For protected accounts in
the treatment group, we sent 100 follow requests over the same 50-day period.
In order to execute our treatment, we created and managed 100 synthetic Twitter users (50
males, 50 females) and created one link from each synthetic user to each treated user (i.e.,
followed or sent a follow request) between days 57 and 106. With the help of two undergraduate
research assistants who were avid Twitter users, we attempted to make our synthetic users as
realistic as possible (we will test the realism of these users experimentally in Section 4.3). The
names of the synthetic users were generated using the name generator available at
www.fakenamegenerator.com. Before linking to the treated users, profile pictures were uploaded
to the synthetic users’ profiles and each synthetic user followed an average of five other
synthetic users as well as some celebrities and media organizations (as is typical for many
Twitter users). The synthetic users also posted tweets on a regular basis. In order to increase the
credibility of the exogenous links to the treated users from the synthetic users, we started by
creating one link (i.e., adding one synthetic follower or sending one follow request in the case of
protected accounts) per day to each treated user. After doing so each day for four days, we
18
increased the daily number of exogenous links per treated user to two per day, and so on until the
rate increased to five per day for four days, after which it was decreased to four per day for four
days, and so on. By day 106 each synthetic user had created one link to each treated user. Figure
1 shows the number of exogenous links created to each treated user on each of the 160 days in
our main observation window.16 Note that our experimental procedure respects Twitter’s Terms
of Service (available at twitter.com/tos).
[ Insert Figure 1 Here ]
4. Model-free analysis
4.1. Descriptive statistics
We first report some key descriptive statistics. Figure 2 shows histograms and log-log plots of
the distribution of the number followers on the first day of the main observation window, for all
2,493 users and for all 1,355 users who were active during the screening period (i.e., the set of
users from which our treated users were drawn). The distribution of the number of followers is
close to a truncated power-law (the log-log plots are close to linear), which is typical of social
networks (e.g., de Solla Price 1965; Barabási and Albert 1999; Stephen and Toubia 2009). Figure
3 shows the distribution across all 1,355 active users of the average daily posting rate during the
main observation window. The average daily posting rate is measured as the total number of
posts during the window divided by the number of days. We see that the distribution is heavily
skewed, with many users posting very little and few users posting heavily. Figure 4 shows the
evolution of the median number of followers over time for treated versus control users. Figure 5
shows the distribution, among treated users only, of the difference between the numbers of
16 The gaps in Figure 1 are due to our RAs needing to take breaks from this labor-intensive activity.
19
followers at the end versus the start of the intervention (day 107 minus day 57). We see that the
control and treatment groups had very comparable median numbers of followers before the start
of the intervention (days 1-57). We also see that the actual increase in number of followers for
treated users may be larger than 100 (due to the addition of “organic” new followers) or smaller
than 100, because some treated users had protected accounts and did not accept all synthetic
users’ follow requests, and because all users have the ability to block any of their followers.17
Nevertheless, by the end of the main observation window, the median number of followers for
treated users was greater than the median number for non-treated users by a margin of 85.00.
In order to verify that the randomization between treatment and control groups was done ap-
propriately, we conducted non-parametric rank sum tests comparing the number of followers on
day 1, the number of users followed on day 1, and the average daily posting rate before treatment
(days 1 to 56) for treated versus non-treated users. None of these tests were significant (all p >
0.16). Similar results were obtained with two-sample t-tests (all p > 0.20). We also compared the
distributions of the number of followers on day 1 using the Kolmogorov-Smirnov (KS) statistic.
The two distributions are not statistically significantly different (p > 0.34).18
[ Insert Figures 2-5 Here ]
4.2. Impact of intervention on posting activity
We now consider the posting behavior of treated versus control users. Studying how treated
users reacted to our intervention is interesting and relevant in and of itself for Twitter and other
social media platforms. Moreover, as argued earlier, increasing (decreasing) posting activity
17 The correlation coefficient between the number of followers on day 1 and the increase in number of followers is
not significant (ρ=0.126, p-value>0.21). 18
Because the KS test itself only applies to continuous distributions, we use bootstrapping to determine the correct
p-value. A similar p-value is obtained using a standard KS test.
20
following the addition of new followers is consistent with intrinsic utility (image-related utility).
We note that we can only argue that each reaction is consistent with a different type of utility,
not equivalent to it. As is often the case in social sciences, we do not observe or measure users’
motivations directly, and instead we disentangle different sources of motivation by identifying a
setting in which they make divergent predictions. However we acknowledge that we cannot rule
out all alternative explanations for the behavior of our treated users.
We compare each user’s average daily posting rate after the intervention (days 107 to 160)
to before the intervention (days 1 to 56). We find that the proportion of users for whom the
average daily posting rate increased after the intervention is somewhat greater among treated
users than it is among the control users. Specifically, 40.82% of treated users had a greater
posting rate after the intervention than before, compared to 34.19% of control users. However,
the difference between these two proportions is not statistically significant (Z = 1.32, p = 0.19).19
Therefore, our intervention did not have a significant main effect on posting activity.
We now explore whether our intervention had different effects based a user’s initial number
of followers. This is plausible for at least two reasons. First, we should expect intrinsic and
image-related utility to vary differently as a function of a user’s number of followers. Therefore,
while the behavior of a user may be more consistent with one source of utility when that user has
few followers, it may be more consistent with the other source as the number of followers
increases. Second, there is likely heterogeneity across users in the relative importance of image-
related versus intrinsic utility, and this heterogeneity may be reflected in the number of
followers. For example, users for whom image-related utility is prevalent may be more likely to
19 Consistent with this result, the average daily posting rate after the treatment (days 107 to 160) is not statistically
significantly different for treated vs. non-treated users (Wilcoxon rank sum test, z=0.935, p >0.35; two-sample t-test, t=1.32, p > 0.18).
21
have made an effort to amass larger numbers of followers. Both of these factors would lead to
users with different numbers of followers reacting differently to the treatment.
Figures 6 and 7 respectively plot the probabilities that a user increased and decreased his or
her posting rate after vs. before the intervention, as a function of the log of that user’s initial
number of followers (on day 1 of the main observation window), for treated and non-treated
users. These figures were obtained by smoothing the raw data using a Gaussian kernel function
(bandwidth = 1).20 We see that treated users with lower initial numbers of followers tended to
increase their posting rates relative to control users. However, treated users with higher initial
numbers of followers tended to decrease their posting rates relative to control users.
[ Insert Figures 6-7 Here ]
In order to statistically compare the impact of the treatment on posting behavior as a
function of the initial number of followers, we split our treated users into quintiles based on their
initial numbers of followers on day 1 of the main observation window. The five quintiles are
described in Table 3. Table 4 reports the proportion of users with increased average daily posting
rates and with decreased average daily posting rates (after versus before the intervention) in each
quintile. Treated users in the 2nd quintile were significantly more likely to increase their posting
rates compared to users in the control group (z = 2.42, p < 0.02), and marginally significantly
less likely to decrease their posting rates compared to users in the control group (z = -1.85, p =
0.06). Users in the 4th quintile, however, show the opposite result: treated users in that group
20 We also ran parametric logistic regressions where the DV was whether the user increased (respectively, de-
creased) their posting rate after vs. before the intervention, and the IVs included a dummy for treatment, log (1+ number of followers on day 1), log(1+number of followers on day 1)2, the interaction between the treatment dummy and log (1+ number of followers on day 1), and the interaction between the treatment dummy and log (1+ number of followers on day 1)2. Comparable figures were obtained, although these parametric regressions do not seem to capture the relationship between the number of followers and the impact of the intervention as well as the nonpara-metric ones. Details are available from the authors.
22
were significantly less likely to increase their posting rates compared to users in the control
group (z = -2.18, p < 0.03), and significantly more likely to decrease their posting rates
compared to users in the control group (z = 1.94, p = 0.05). The differences in the other quintiles
are not statistically significant. Therefore, our results suggest that exogenously adding followers
(or in the case of protected accounts, follow requests) made some users post more (users in the
2nd quintile), made some users post less (users in the 4th quintile), and had little effect on the
others.21
[ Insert Tables 3-4 Here ]
The fact that the treatment had no effect on the 1st and 5th quintiles is not surprising. The 1st
quintile is composed of users with very few followers who are only marginally active and may
hardly visit the Twitter platform. The 5th quintile is composed of users with over one thousand
followers on average, for whom the addition of up to 100 followers over 50 days may have gone
largely unnoticed. The results in the 2nd quintile are consistent with intrinsic utility. As discussed
above, intrinsic utility from posting content should lead users to post more content on average
following our intervention. The results in the 4th quintile, however, are consistent with image-
related utility. As argued above, if the benefits from posting comes from attracting additional
followers and if additional followers provide diminishing marginal utility, we should expect
posting activity to be reduced on average following our intervention. The results in the 3rd
quintile are consistent with the effects of the two sources of utility canceling each other on
average for users with an initial number of followers within the corresponding range.
21 Similar results are obtained when running a logistic regression where the DV is whether each user increased (re-
spectively, decreased) their posting rate after vs. before the intervention, and the IVs include a dummy for treatment, dummies for the various quintiles, and interactions between the treatment and quintile dummies. Results are availa-ble from the authors.
23
It is very common for companies to follow consumers on Twitter, partly in the hope that
these consumers will become advocates and contribute content related to their brand. Our model-
free analysis has managerial implications related to this practice. Indeed, based on our results it
appears that following consumers on Twitter may have the counterintuitive effect of making
them less active and therefore less likely to contribute content related to the brand. This is partic-
ularly true for consumers with relatively large numbers of followers, who are precisely the ones
typically targeted by companies for their ability to reach more people with brand-related messag-
es (e.g., Goldenberg et al. 2009). In Section 5 we present a complementary model-based analysis
that provides additional managerial insights by quantifying intrinsic and image-related utility and
making counterfactual predictions regarding the evolution of Twitter.
4.3. Ecological validity: perceived realism of synthetic followers
One potential concern with our results is that our synthetic followers may have been more likely
to be perceived by treated users as being “fake,” which may have led treated users in our
experiment to react to the addition of followers (or follow requests in the case of protected
accounts) differently than they would have normally. We addressed this concern as follows. In
April 2012, we created a snapshot image for the profile of each synthetic user, each treated user,
and one randomly selected follower of each treated user who did not have a protected account at
that time (76 treated users were in that case).22 The image was a screenshot from the profile
summary publicly available on Twitter which contained the user’s name, picture, number of
tweets to date, number of users followed and following, and the three most recent tweets by the
22 By the time we ran this study, 5 of our treated users did not exist anymore. Also, it was not possible for us to iden-
tify the followers of users with protected accounts.
24
user (with the exception of protected accounts for whom recent tweets are not publicly
available).
Three hundred fifty-five members of the Amazon Mechanical Turk panel, who were pre-
screened as having Twitter accounts, assessed these profiles based on the snapshot images. Each
respondent evaluated a random set of 20 profiles in exchange for $1, and was asked to indicate
whether each profile seemed fake (a fake profile was defined as one that “pretends to be another
person or another entity in order to confuse or deceive other users”). By the end of the survey,
each profile had received an average of 26.199 evaluations.
We computed the proportion of times each profile was judged to be fake. The mean
(median) of this proportion was 0.199 (0.192) among synthetic users, 0.209 (0.192) among
treated users, and 0.307 (0.241) among the followers of treated users. The mean and median
proportions were not statistically significantly different between synthetic and treated users (p >
0.46). Followers of treated users, however, were significantly more likely to be evaluated as fake
compared to synthetic and treated users (all p < 0.01). This is probably because these users
included both commercial and non-commercial users.23 In conclusion, this survey suggests that
our synthetic users were not perceived as being more fake than other non-commercial users (our
treated users), and were perceived as significantly less fake than a random subset of the followers
of non-commercial users. This suggests that our treatment has good ecological validity.
23 The statistical analysis reported here is based on the point estimates of the probability that each profile is judged
to be faked. As an alternative approach, we used a parametric bootstrapping approach to construct confidence inter-vals around the mean and median probabilities reported in the text. We considered a model in which the posterior distribution of the probability that profile i will be judged to be fake is given by: pi~Beta(0.01+fi, 0.01+nfi), where fi (respectively nfi) is the number of times the profile was evaluated as fake (re-spectively, non fake) in the data, and where the beta distribution follows from an uninformative prior (be-ta(0.01,0.01)) and binomial likelihood. We constructed confidence intervals by drawing 10,000 values of pi for each profile. Identical conclusions were reached.
25
4.4. Impact of protected accounts
As mentioned above, while our intervention involved adding new followers to public accounts,
for protected accounts it involved sending follow requests that the treated users could either ac-
cept or reject. The fact that protected accounts had the ability to reject these requests may have
reduced the impact of our intervention, which should make the significant treatment effects
found in quintiles 2 and 4 more conservative, and could artificially attenuate the effect of the
treatment in the other quintiles. We recorded (manually) which treated users had protected ac-
counts at the time of the intervention (unfortunately we do not have this information for the non-
treated users). Fifteen of our treated users were protected at the time of the experiment. These
users were equally spread across the first four quintiles reported in Table 3 (4,4,4, and 3 protect-
ed users in quintiles 1, 2, 3, and 4 respectively, none in quintile 5). Therefore the null treatment
effects in quintiles 1, 3, and 5 appear unlikely to have been driven by a larger proportion of pro-
tected accounts in these quintiles. Moreover, the results in Table 4 do not change qualitatively
when limited to the public treated users. Details are available from the authors.
4.5. Alternative explanation
One alternative explanation for the effect of our treatment on the 4th quintile (those users who
decreased their posting rate after the intervention) is that some users feel comfortable posting on
Twitter when their followers are limited to immediate relations and when some level of intimacy
is preserved, but become less comfortable as their posts become more public. This would drive
these users to contribute less content after receiving additional, unknown followers.
In order to investigate this issue, we were able to download, in April 2012, the text of all
tweets posted by 44 of our treated users during the pre-treatment (days 1 to 56) and post-
26
treatment (days 107 to 160) periods of our main observation window.24 These 44 users posted
3,580 tweets in total during these periods. We asked 749 members of the Amazon Mechanical
Turk panel (prescreened to be Twitter users) to classify these tweets, in exchange for $2. Each
respondent was shown the text of 60 tweets randomly selected from the full set (no information
about each tweet other than its text was provided) and was asked to answer the following ques-
tion about each tweet: “Is this tweet meant for the user’s close friends and family members on-
ly?” The response categories were “Yes,” “No,” and “I do not know / I do not understand this
tweet.” Each tweet was classified by an average of 12.797 respondents.
We compute the proportion of occurrence of each response category for each tweet. Tables
5 to 7 report averages across users. We construct confidence intervals using a parametric
bootstrapping approach.25 Table 5 reports the average across treated users, before and after the
treatment. We see that the treatment decreased the proportion of “private” tweets slightly,
although the 95% confidence intervals before and after the treatment overlap. Next, in order to
investigate whether the treatment had a different effect on users who were posting more private
tweets before the treatment, we report in Table 6 the average categorization across treated users
before the treatment, for those users who increased their posting rate after the intervention vs.
those who decreased their posting rate. We see that indeed, treated users who decreased their
posting rate after the intervention tended to post tweets that were more “private” before the
24 We were not able to retrieve these data for treated users who did not exist anymore as of April 2012, who had
protected accounts, and who had posted more than 3,200 tweets since the end of our observation window (due to limits imposed by the Twitter API). The number of users for whom we have text data in each quintile (first to fifth) is 7, 8, 8, 10, and 11. 25
We denote as pijk the multinomial probability that a random evaluation of tweet j by treated user i would fall in
response category k. We draw 10,000 random sets of probabilities for each tweet, according to: {pij1, pij2, pij3}~Dirichlet(0.01+ nij1, 0.01+ nij2, 0.01+ nij3), where nijk is the observed number of times tweet j by treated user i was classified in category k. This Dirichlet distribution results from an uninformative prior (Dirichlet(0.01, 0.01, 0.01)) combined with the multinomial likelihood function.
27
treatment. This is consistent with the hypothesis that some users decreased their posting rate
after the treatment because their audience changed from being intimate to being more public.
However, in order for this phenomenon to explain why treated users in the 4th quintile
posted less as a result of the intervention, it would need to be the case that the tweets posted by
these users before the treatment were relatively more private. Table 7 reports the average
categorization across treated users before the treatment, for users in the first three quintiles vs.
the fourth vs. the fifth quintile (we group the first three quintiles in order to increase statistical
power and similar conclusions are reached if the five quintiles are considered separately). Users
with more followers at the beginning of the observation window tended to post tweets that were
less private before the treatment: the difference between the first three quintiles and the fourth
quintile is statistically significant, as well as the difference between the fourth and the fifth
quintiles. Therefore if the effect of the treatment were solely driven by privacy considerations,
we should expect treated users in the lower quintiles, and not the fourth quintile, to be the ones
decreasing their posting rate after the treatment, and similarly we should expect treated users in
the higher quintiles, and not the second quintile, to be the ones increasing their posting rate.
In conclusion, while our analysis does provide support to the hypothesis that users who use
Twitter more privately are more likely to decrease their posting rate after the addition of
unknown followers, this phenomenon does not appear to drive our results.
[Insert Tables 5-7 Here]
5. Dynamic Discrete Choice Model
The results of the previous model-free analysis are consistent with the existence of both intrinsic
and image-related utility among Twitter users, with each source of motivation being more or less
28
predominant as a function of the user’s number of followers. In this section we introduce and
estimate a dynamic discrete choice model that attempts to address the following limitations of
our model-free analysis.
First, our model-free analysis did not quantify the relative importance of intrinsic versus
image-related utility. Instead it simply provided patterns of results consistent with the existence
of these two types of utility.
Second, our model-free analysis did not allow us to determine whether the different
responses to the intervention were driven by heterogeneity in the preferences of the treated users
versus heterogeneity in their initial number of followers. It could be the case that the same type
of utility is always dominant for each user irrespective of his or her number of followers, and that
users for whom one versus the other type of utility is dominant happen to have different numbers
of followers. However, it could also be the case that each user goes through different phases on
Twitter, where one source of utility tends to be dominant initially, and the other tends to become
dominant as the user amasses more followers. While we should expect some heterogeneity in
preferences, it is not clear a priori whether the dominant source of utility may vary within a user
over time.
Third, our model-free analysis did not allow us to make any counterfactual predictions on
how users’ motivations to post content on Twitter are likely to impact the platform in the future.
Such predictions are valuable in light of the recent public debate on the sustainability and the
future growth of social media platforms such as Twitter,26 and have implications for the type of
value firms may be able to derive from such platforms in the future.
26 See for example Hagan, Joe. 2011. “Twitter Science.” New York Magazine, October 02.
29
Fourth, while we have argued that intrinsic utility should lead to an increase in posting
activities as the number of followers is increased and that image-related utility should lead to a
decrease, there exist conditions under which these theoretical predictions would be reversed.
These conditions, which do not have high face validity, are related to the way posting affects
one’s future number of followers. For example, if attracting new followers became much harder
as users acquired more followers, it might be possible that although a user motivated by image-
related utility receives less marginal value from each additional follower as his or her number of
followers is increased, he or she has to post much more heavily in order to attract new followers,
leading to a net increase in posting activities.27 Conversely, it is possible that users driven by
intrinsic motivation would post less after having more followers, if for example their early
posting activities were targeted toward building an audience to which they would broadcast later
and if the only way to build such an audience was to post heavily early on. By capturing the
effect of posting on the number of followers, through the state transition probabilities, our model
allows quantifying intrinsic versus image-related utility irrespective of whether these conditions
are satisfied. More generally, the identification of our model does not rely on the assumption that
intrinsic (image-related) utility always gives rise to an increase (decrease) in posting activity
following an increase in the number of followers.
5.1 Model
We index users by i=1,…I. We index time (day) by t=1,…∞. In each time period, each user
chooses one of four possible actions: (i) follow at least one new user and post content, (ii) follow
at least one new user and post no content, (iii) follow no new users and post content, and (iv)
27 This would go against the popular notion of preferential attachment, which is often believed to govern the evolu-tion of “scale-free” social networks like this one (Barabási and Albert 1999).
30
follow no new users and post no content. We model each user’s decision in each time period as a
multinomial choice over these four possible actions. The utility derived by a user from each
action in each time period is a function of the number of followers this user has at that time.
Because the user’s action will have an impact on his or her future number of followers, it will
also have an impact on the utility offered by each possible action in the future. In order to
capture these dynamic effects, we build a dynamic discrete choice model in which users take into
account how their current actions will impact their future decisions. Like with every dynamic
discrete choice model, developing and estimating our model involves (i) defining the states, (ii)
defining the actions, (iii) defining the utility function, (iv) modeling the state transition
probabilities, and (v) specifying the likelihood function. We describe each of these steps next.
5.1.1. States
We define user i’s state at time t by his or her number of followers on that day, sit.
5.1.2. Actions
We denote the action taken by user i at time t by ait={nit,pit}, where nit is a binary variable equal
to 1 if user i followed at least one new user at time t and pit is a binary variable equal to 1 if user i
posted content at time t.28 As mentioned above, each user faces a choice between four possible
actions in each period: {1,1},{1,0},{0,1},{0,0}. We next describe the costs and benefits
associated with each action.
5.1.3. Utility function
We model the utility derived by user i in period t as:
28 An alternative formulation would consider the number of posts and the number of new users followed. However this would make the action space unbounded, and would pose significant computational challenges. Therefore, we treat these actions as binary. Note however that this is different from assuming that users may only follow one (or any other fixed number of) new user or post only one tweet on each day. Indeed, our empirical state transition prob-abilities are based on the actual observations, i.e., they are based on the true numbers of posts and new followers.
31
where θi ={θi1, θi2, θi3, θi4, θi5, θi6} and the following constraints are imposed: θi1, θi3 ≥0; {θi2,
θi4}∈[0,1]2; θi5, θi6 ≤0 (when estimating the model, we further constrain θi to a compact set Θ).
The specification of our utility function is driven by the discussion in Section 2.2. The first
term, 2)1(1i
iti s θθ + , captures image-related utility from having many followers. As discussed
above, the stature or prestige of a Twitter user may be measured as a monotonically non-
decreasing and concave function of that user’s number of followers. This term does not depend
on the action chosen in period t, but it does depend on the current state which is the result of past
actions. The next term, 4)1(3i
itiit sp θθ +⋅ , captures intrinsic utility from posting content. This term
is positive only if the user posts content in period t, and is equal to 0 otherwise. When positive,
this term is also monotonically non-decreasing and concave in the number of followers. Finally,
the last two terms capture the cost of following a new user and of posting, respectively.
5.1.4. State transition probabilities
We denote the state transition probabilities by f(sʹ|s,a), the probability of reaching state sʹ in the
next period given a state s and an action a in the current period. Because we have access to a
substantial amount of data (daily states and actions of 2,493 users over 160 days) and because
our action space includes only four possible actions, we are able to use the observed transition
frequencies as state transition probabilities instead of estimating them parametrically (see for
example Bajari, Benkard and Levin 2007). These empirical state transition probabilities are
based on all observations from all 2,493 users in our main observation window. Each observation
consists of a triplet {sit, ait, si(t+1)}, i.e., a starting state, an action and a resulting next state. The
itiitiitiititiiitit pnspsasu ii
653142 )1()1()|,()1( θθθθθ θθ +++⋅++=
32
empirical transition probability f(sʹ|s,a) is simply the proportion of times state sʹ was observed
among all the observations where action a was taken in state s.29
Across all states and observations, we find that the actions {follow, post}, {follow, don’t
post}, {don’t follow, post} and {don’t follow, don’t post} give rise to expected changes in the
number of followers E(sʹ-s|a) of 1.209, 0.487, 0.197, and 0.022 respectively, and that the
probability of an increase in the number of followers, Prob(sʹ-s>0|a), is 0.511, 0.335, 0.257, and
0.075 respectively. This confirms that both posting content and following new users are ways to
attract new followers.
5.1.5. Likelihood function
Following Rust (1987), we assume an unobservable shock εit to utility for consumer i at time t
which follows a double-exponential distribution. The value function for user i in state {s,ε} is the
solution to the following Bellman equation:
(2)
where β is a discount factor, A is the action space, and ))|,(()|( ii sVEsV θεθε
= . Note that we
make the standard assumption that all users have correct beliefs regarding the state transition
probabilities. We define action-specific value functions as:
(3) ∑+='
)|'().,|'(.)|,()|(s
iiia sVassfasusV θβθθ
29 Using observed transition frequencies implies that we need to limit our analysis to the largest set of states in which all four actions were ob-
served and in which these actions always led to states in that same set of states. This results in a state space containing 539 states with numbers of followers ranging from 0 to 1618, which covers 93.36% of the initial observations (only 2.08% of the initial observations involve states with more than 1618 followers).
∑++=∈
'
)]|'().,|'(.)()|,([max)|,(s
iiAa
i sVassfaasusV θβεθθε
33
Rust (1987) showed that:
(4) ∑∈
=Aa
iai sVsV )))|(exp(log()|( θθ
and that the probability that user i chooses action ait at time t in state sit is given by:
(5) ∑
∈
=
Aaiita
iita
iititsV
sVsaP it
'
' ))|(exp(
))|(exp(),|(
θ
θθ
Equation (5) defines our likelihood function. Maximizing this likelihood function poses great
computational challenges, due to the fact that the likelihood involves the solution to Bellman’s
equation. We use a method recently proposed by Norets (2009a) to estimate the set of parameters
{θi}. We describe our estimation procedure next.
5.2. Estimation
Although we use the full set of users to estimate the state transition probabilities, we estimate the
parameters of the model{θi} for treated users only, mainly for tractability and identification
purposes. In particular, we estimate the model on the set of treated users over the main
observation window (t=1,…160) and remove from the analysis users with fewer than 10 usable
observations. The estimation method proposed by Norets (2009a) is closely related to the method
proposed by Imai, Jain, and Ching (2009), and uses a Bayesian approach to simulate the
posterior distribution of the parameters conditional on the data. Given our particular dataset and
model, we specifically adapt the approach proposed in Corollary 2 of Norets (2009a). This
method offers the combined benefits of being computationally tractable, of being based on the
full solution of the dynamic program, and of allowing us to capture heterogeneity in the
estimated parameters.
34
Our estimation algorithm relies on a hierarchical Bayes model that consists of a likelihood, a
first-stage prior and a second-stage prior. As mentioned above, our likelihood function is given
by Equation (5). We then impose a first-stage prior on the parameters, according to which θi
comes from a truncated normal distribution with mean θ0 and covariance matrix Dθ, where the
truncation ensures that θi remains in a compact set Θ: θi~TN(θ0,Dθ). The arrays θ0 and Dθ are
themselves parameters of the model, on which we impose a second-stage prior. Following
standard practice (Allenby and Rossi 1998), we use a non-informative second-stage prior, in
order to let these parameters be driven by the data. We fix β=0.995. We draw values of all the
parameters using a Gibbs sampler. Details are provided in Appendix 2.
5.3. Results
5.3.1. Model fit
A common measure of model fit in Bayesian statistics is the marginal density of the data
according to the model, defined as: θθθ dPdataPLMD iiI}){(}){|(∫Θ= , where
∏∏= =
=I
i
T
tiititi saPdataP
1 1
),|(}){|( θθ and P({θi}) is the prior distribution on the parameters. The log
marginal density is approximated by the harmonic mean across the Gibbs sampler iterations of
the likelihood of the data (see for example Sorensen and Gianola 2002). To make the results
more intuitive, we report the marginal density to the power of the inverse of the total number of
observations, i.e., we report the geometric mean of the marginal density per observation. We
obtain a value of 0.4726. For comparison, a null model that assumes that all four actions are
equally likely would achieve a per-observation marginal density of 0.2500. We also assess the fit
of the model using posterior predictive checks (Gelman, Meng, and Stern 1996). Posterior
predictive checks assess how well the model fits key aggregate statistics of the data, by
35
comparing the posterior distribution of these statistics based on the model with the observed
values. We consider the proportion of observations in our data in which posting was observed
(i.e., ait={0,1} or {1,1}). The observed proportion is 0.2470. The predicted value of this quantity
is computed at each iteration of the Gibbs sampler, as
∑∑= =
=+=I
iiitit
T
tiitit saPsaP
1 1
),|}1,0{(),|}1,1{( θθ . The posterior distribution (across iterations of the
Gibbs sampler) of the predicted value of this quantity is shown in Figure 8. The mean of the
posterior distribution is 0.2604, and the 95% credible interval ([0.2405;0.2797]) contains the
observed value of 0.2470.
We also consider the distribution (across users) of the proportion of observations in which
posting was observed. We compute a point estimate of the posting frequency for each user (this
point estimate for user i is the average across posterior draws of
),|}1,0{(),|}1,1{(1
iitit
T
tiitit saPsaP θθ =+=∑
=
). Figure 9 provides a scatter plot of the posting
frequency as predicted by the model versus observed, across users. We see that the model is able
to recover these frequencies very well at the individual level.
[ Insert Figures 8-9 Here ]
5.3.2. Parameter estimates
Table 8 reports the point estimates and 95% credible intervals of the average parameters across
users, and Figure 10 plots the distribution across users of the point estimates of the parameters.
[ Insert Table 8 and Figure 10 Here ]
36
We use the parameter estimates to explore the distinction between heterogeneity in prefer-
ences versus heterogeneity in the number of followers. In our model, both image-related utility
and intrinsic utility from posting are monotonically increasing and concave in the number of fol-
lowers. The parameters θi2 and θi4 capture the curvature of these utility curves, and the parame-
ters θi1 and θi3 influence their scale. It is easy to show that the two curves cross exactly once in
the [0,∞) range if (θi2 - θi4)(θi1 - θi3)<0, and do not cross otherwise. Moreover, as sit goes to 0,
image-related utility is larger if θi1 > θi3 and the reverse is true if θi1 < θi3. As sit goes to ∞, im-
age-related utility is larger if θi2 > θi4 and the reverse is true if θi2 < θi4. This gives rise to four
possible segments of treated users, based on which source of utility is larger for lower versus
higher values of sit.
The proportion of treated users in each segment according to our parameter estimates is re-
ported in Table 9. For 34.0% of the treated users, the same type of utility is larger irrespective of
the number of followers (image-related utility is always larger for 10.6% of the treated users and
intrinsic utility for 23.4% of the users). However, and perhaps more interestingly, there is a large
proportion of treated users, 64.9%, for whom the larger source of utility varies within user as the
number of followers increases. We also see a large asymmetry, such that in almost all cases
(64.9% out of 66.0%), the evolution is such that intrinsic utility is initially larger when the num-
ber of followers is smaller, and image-related utility eventually becomes the larger one as the
number of followers grows.
This analysis suggests that the differences in the behavior of treated users in our experi-
ment were not only driven by heterogeneity in preferences across users, but also by the fact that
for the majority of the users, intrinsic utility is larger when the number of followers is smaller
and image-related utility takes over as the number of followers grows.
37
[ Insert Table 9 Here ]
5.3.3. Intrinsic versus image-related utility derived by users
The previous analysis was simply based on the parameter estimates, and did not take into
account the users’ actions. We now estimate the proportion of intrinsic versus image-related
utility derived by each user, according to the model. We define the total image-related utility and
total intrinsic utility for user i as: ∑=
=T
timageimage tii
uu1
,and ∑
=
=T
tintrinsicintrinsic tii
uu1
,, where
2
,)1(1
i
ti itiimage su θθ += and 4
,)1(3int
i
ti itiitrinsic spu θθ +⋅= . We estimate the proportion of intrinsic
utility for user i as:
ii
i
imageintrinsic
intrinsici
uu
uproportion
+= .
Figure 11 plots the distribution across users of the estimate of this proportion. The average
across users is 0.2533 and the median is 0.1313. Therefore, according to the model most treated
users derived more image-related utility from Twitter than they did intrinsic utility during our
main observation window.30
[ Insert Figure 11 Here ]
5.3.4. Counterfactual analysis: change in posting activity if the network’s structure became stable
Finally, we consider the question of how posting frequency would be affected if the network’s
structure were stable, i.e., if a user’s actions did not influence his or her future states. If actions
did not influence future states, actions would be chosen according to the utility derived in the
current period only. For each observation in our data, we estimate the probability of each four
actions based only on the immediate utility provided by that action:
30 One may argue that our treatment affected the proportion of intrinsic vs. image-related utility derived by users.
However computing this proportion on the pre-treatment period gives rise to similar conclusions (average proportion across users is 0.2184 and median is 0.0580).
38
∑∈
=
Aaiit
iititiitit
stable
asu
asusaP
'
))|',(exp(
))|,(exp(),|(
θ
θθ . For each user we estimate the posting frequency under
these probabilities and compare these frequencies with those obtained under the initial model
(used to construct the scatter plots in Figure 9). Figure 12 plots the distribution across users of
the change in posting frequency resulting from the assumption that actions do not influence
future states. The average predicted change across users is -0.0502 (with a 95% credible interval
of [-0.0617; -0.0384]), the median is -0.0347, and the predicted change is negative for 94.68% of
the users. In other words, this counterfactual analysis suggests that if users’ posting activities did
not influence their future number of followers, posting frequency would decrease by an average
of approximately 5%.
[ Insert Figure 12 Here ]
This analysis suggests that as Twitter matures and the network’s structure becomes stable,
content is likely to be contributed more prominently by commercial users. In that case the value
derived from Twitter by non-commercial users is likely to shift somewhat from the production of
content to the consumption of content. Therefore our analysis is consistent with a prediction that
Twitter is likely to shift from a platform used by non-commercial users to share content with
each other, towards a more traditional media platform where non-commercial users consume
content contributed primarily by commercial users. This is consistent with recent research by
Goel, Watts and Goldstein (2012) who find that the diffusion of popular content on Twitter oper-
ates through users with very large numbers of followers (e.g., Justin Bieber), rather than through
peer-to-peer social influence.
This prediction is also consistent with recent changes in Twitter’s positioning. As mentioned
above, Twitter’s initial positioning was as “a real-time information network powered by people
39
all around the world that lets you share and discover what’s happening now” (twitter.com/about,
accessed 02/2010). As of April 2012, Twitter’s positioning had shifted to “a real-time infor-
mation network that connects you to the latest stories, ideas, opinions and news about what you
find interesting.” Twitter advises users to “simply find the accounts you find most compelling
and follow the conversations” and clearly states that “You don’t have to tweet to get value from
Twitter.” (twitter.com/about, accessed 04/2012).
Managerially, this shift implies that in the future firms are likely to derive more value from
social media platforms such as Twitter by using them as media channels where they broadcast
content to consumers, rather than as viral marketing platforms where they create or track word of
mouth, or customer insights platforms where they monitor consumers’ conversations.
6. Discussion and Conclusion
While publishers’ incentives in traditional media are well understood, individuals’ motivations
for contributing content in social media platforms such as Twitter are still under-researched and
have been explored so far only using surveys (e.g., Bughin 2007, Hennig-Thurau et al. 2004).
Previous literature suggests that the two primary types of utility that motivate non-
commercial users to contribute content to social media are intrinsic and image-related. These two
sources of utility may be identified empirically on Twitter, because they give rise to opposite
predictions as to whether users should increase or decrease their posting activities when their
number of followers increases. In order to address the issue that the number of followers is
endogenous, we conducted a field experiment in which we exogenously added followers (or
made follow requests in the case of protected accounts) for a set of users (treatment group), and
compared their posting activities to those of a control group. We then estimated a dynamic
discrete choice model that quantified intrinsic versus image-related utility for each treated user.
40
6.1. Summary of substantive findings and predictions
Our key substantive findings and predictions may be summarized as follows:
• While some non-commercial users respond to the addition of new followers (or new follow
requests in the case of protected accounts) by increasing their posting activities, others re-
spond by posting less content.
• The majority of non-commercial users go through two phases, where intrinsic utility from
posting is larger than image-related utility when they have fewer followers, but image-related
utility becomes larger than intrinsic utility as they amass more followers.
• Most non-commercial users on Twitter appear to derive more image-related utility from their
posting activities than they do intrinsic utility.
• Non-commercial user contributions to Twitter are likely to decrease as the platform matures
and the network’s structure becomes stable. Twitter is likely to become more of a platform
where non-commercial users consume content posted by commercial users, rather than a
platform where non-commercial users share content with each other.
6.2. Managerial implications
Our findings are relevant not only theoretically but also managerially. Understanding what
motivates consumers to be active on social media platforms like Twitter is a prerequisite for
marketers interested in devising efficient social media strategies and optimizing the ways they
engage with consumers on these platforms. We hope that our research will provide specific
guidelines to marketers interested in leveraging Twitter. For example:
• It is standard for firms to follow consumers on Twitter, partly in the hope that these
consumers will become advocates for their brands. According to our results, this practice
may have the counterintuitive effect of making consumers less active and therefore less
41
likely to contribute content related to a brand. Firms and other commercial accounts may
need to exert greater caution when deciding whether to follow non-commercial users.
• If non-commercial users reduce their contributions as Twitter matures, in the future firms
may benefit more from using Twitter as a media channel where they broadcast content to
consumers, rather than a platform for creating or tracking word-of-mouth or a platform for
gathering consumer insights.
6.3. Future research
Our research also offers several areas for future research. First, future research may enrich our
findings by using data that would include the structure of the users’ social network, and include
the text of the tweets in a more systematic manner. Second, beyond the context of the present
paper, future research may explore further the use of field experiments as a way to address
endogeneity issues in social networks (Hartmann et al. 2008; Manski 1993; Moffitt 2001). Given
that social media platforms such as Twitter exist in the public domain, future research on social
media and social interactions may adopt identification strategies similar to ours. Finally, future
research may study motivations in other social media contexts. Intrinsic and Image-related utility
are fundamental concepts that have been shown to be relevant across many domains, therefore
our results may be expected to generalize at least to some degree. However the relative im-
portance of these two types of utility may vary for example based on the type of content posted
by users (e.g., short messages vs. videos vs. pictures) or the structure of the social network (e.g.,
directed vs. undirected).
42
Appendix 1: Illustration of opposite predictions of intrinsic vs. image-related utility
We extend the two-period model used in Section 3 to an infinite horizon, with a discount factor of β<1. We denote by V(n) the value function. In the case of intrinsic utility, the Bellman equa-tion takes the following form: V(n)=max{U(n)+β[δV(n+1)+(1-δ)V(n)], βV(n)}. In the case of im-age-related utility, the Bellman equation takes the following form (the only difference is that the per-period utility U(n) is derived irrespective of whether content is posted in that period): V(n)=max{U(n)+β[δV(n+1)+(1-δ)V(n)], U(n)+βV(n)}. We solve for the value function in closed form:
Lemma 1: The value function is as follows: ∑+∞
=
−
−−−−=
nm
nmmUnV )
)1(1(
)1(1
)()(
δβ
βδ
δβ
Proof: In order to focus on the benefits obtained from posting content, we have assumed that posting content is costless, such that it is optimal to post content in each period under both types of utility. (In Section 5 we estimate a cost of posting content instead of making this assumption.) Because the utility derived when content is posted takes the same form under intrinsic and im-age-related utility, the Bellman equation takes the following form under both types of utility: V(n)=U(n)+β[δV(n+1)+(1-δ)V(n)]. If V(n) is as specified in Lemma 1, then: U(n)+β[δV(n+1)+(1-δ)V(n)]
= U(n)+
∑ ∑+∞
+=
+∞
=
−−−
−−−−−+
−−−−1
1 ))1(1
()1(1
)()1()
)1(1(
)1(1
)(
nm nm
nmnm mUmU
δβ
βδ
δβδβ
δβ
βδ
δββδ
= ∑+∞
+=
−
−−−−−+
−−+
−−
−+
1
))1(1
()1(1
)()]1(
)1(1[)
)1(1
)1(1)((
nm
nmmUnU
δβ
βδ
δβδβ
βδ
δββδ
δβ
δβ
)())1(1
()1(1
)()
)1(1(
)1(1
)(
)1(1
)(
1
nVmUmUnU
nm
nm
nm
nm =−−−−
=−−−−
+−−
= ∑∑+∞
=
−+∞
+=
−
δβ
βδ
δβδβ
βδ
δβδβ Q.E.D. In addition, we show that the value function itself is concave in n : Lemma 2: The value function is concave in n. Proof: Based on the Bellman equation we have: V(n)=U(n)+β[δV(n+1)+ (1-δ)V(n)] and V(n+1)=U(n+1)+β[δV(n+2)+ (1-δ)V(n+1)] Taking the difference between these two equalities yields: βδ[(V(n+2)-V(n+1)) - (V(n+1)-V(n))]=(V(n+1)-V(n))(1-β)-(U(n+1)-U(n)). We show that V(n) is concave by showing that for all n, V(n+2)-V(n+1) < V(n+1)-V(n), which based on the equality above, is equivalent to showing that: (V(n+1)-V(n))(1-β)< (U(n+1)-U(n)). Based on Lemma 1 we have:
∑∑
∑∑∞+
=
−∞+
=
−
+∞
=
−+∞
+=
−−
−−−−−
−−−−
+=
−−−−−
−−−−=−+
nm
nm
nm
nm
nm
nm
nm
nm
mUmU
mUmUnVnV
))1(1
()1(1
)()
)1(1(
)1(1
)1'(
))1(1
()1(1
)()
)1(1(
)1(1
)()()1(
'
'
1
1
δβ
βδ
δβδβ
βδ
δβ
δβ
βδ
δβδβ
βδ
δβ
(where we replaced m-1 with m’ in the first expression)
43
∑∑+∞
=
−+∞
=
−
−−−−
−+<
−−−−
−+=
nm
nm
nm
nm nUnUmUmU)
)1(1(
)1(1
)()1()
)1(1(
)1(1
)()1(
δβ
βδ
δβδβ
βδ
δβ
(because U(n) is concave)
βδβ
βδδβ −
−+=
−−−
−−
−+=
1
)()1(
)1(11
1
)1(1
)()1( nUnUnUnU
Q.E.D. We now show the following two propositions: Proposition 1: In the case of intrinsic utility, the benefit from posting content in a given period is monotonically increasing in the number of followers in that period. Proof: Given Lemma 1, V(n) is increasing in n. If a user posts content in a given period, he or she derives an action-specific value function of V(n). On the other hand, if the user decides not to post content, he or she derives no utility in that period and will derive V(n) starting from the next period, giving rise to an action specific value function of βV(n). The benefit from posting in the current period is V(n)(1- β), which is monotonically increasing in n because β<1 and V(n) is monotonically increasing in n. Q.E.D. Proposition 2: In the case of image-related utility, the benefit from posting content in a given period is monotonically decreasing in the number of followers in that period. Proof: Under image-related utility, the action-specific value function when posting content is: V(n)= U(n)+β[δV(n+1)+(1-δ)V(n)], and the action-specific value function when content is not posted is: U(n)+βV(n). The difference is equal to βδ(V(n+1)-V(n)), which is decreasing in n be-cause V(n) is concave (Lemma 2). Q.E.D.
Finally, we illustrate propositions 1 and 2 graphically.31 We assume nnU =)( , set β=0.995, and
compute the value function for n=1 to 1000, for δ=0.1, 0.5, 0.9 and 1. (We approximate the value
function by ∑+
=
−
−−−−
1000
))1(1
()1(1
)(n
nm
nmmU
δβ
βδ
δβ instead of ∑
+∞
=
−
−−−−nm
nmmU)
)1(1(
)1(1
)(
δβ
βδ
δβ). The
figure below shows the incremental value for posting content in a given period, as a function of n and δ, together with U(n) as a function of n.
31 We are indebted to an anonymous reviewer for suggesting this illustration.
44
Figure A1. Incremental value for posting content in a given period and per-period utility, under intrinsic and image-
related utility.
Notes: We assume nnU =)( and set β=0.995.
45
Appendix 2: Details of the estimation procedure
The estimation algorithm proposed by Norets (2009a) relies on an approximation of the value function. Let m index the iterations of the Gibbs sampler. The approximation of the value function at iteration m leverages the value function approximations from previous iterations. Let
Iim
ii ,...,1
)1*()1*( },...,{ =−θθ be the parameter draws from the first m-1 iterations. For each user i, the
approximation at iteration m uses only the past N(m) iterations, where N(m)=[mγ1] (we set γ1=0.6). Among these iterations, the approximation relies on the ones at which the draws of the parameters were the closest to the current draw of θi. In particular, the Ñ(m) closest neighbors are considered, where Ñ(m)=[mγ1] (we set γ2=0.4). At iteration m, we consider the following
approximation of Va for user i: ∑ ∑=
+=)(
~
1 '
)(*)()( ),|'().|'(~
)(~
1.)|,()|(
~mN
l s
ki
kii
ma assfsV
mNasusV ll θβθθ
where)}(
~,...1{
}{mNllk
∈index the closest neighbors to θi among the N(m) past draws, and )(~
lkV refers to
value function approximations computed in previous iterations. Norets (2009a) proves theoretically that these approximations converge uniformly and completely to the exact values (see Theorem 1, p. 1677). We run the Gibbs sampler for 2,000 iterations, using the first 500 as burn-in. Convergence was assessed visually using plots of the parameters. At each iteration m of the Gibbs sampler:
-A proposed value of θi ,)*(m
iθ , is drawn for each user. Let)1( −m
iθ denote the value retained at
iteration m-1. For each i, the new proposed value is drawn such that: im
im
i d+= − )1()*( θθ where
),0(~ θγDNdi and γ is adjusted such that the acceptance rate is around 30%. Using rejection
sampling (see for example Allenby, Arora and Ginter 1995), we constrain)*(m
iθ to a compact set
Θ=[0,10]*[0,0.6]*[0,10]* [0,0.6]*[-10,0]2 (Note: we use 0.6 instead of 1 as an upper bound for θi,2 and θi,4 for numerical stability issues – this constraint does not appear to be binding). The
new value is retained (i.e.,)*()( m
im
i θθ = ) with probability:
}1,)|}({
~).,|(
)|}({~
).,|(min{
)1(
,...1
)(
0
)1(
)*(
,...1
)(
0
)(*
−=
−
=
miTtit
mmi
miTtit
mm
i
aLDP
aLDP
θθθ
θθθ
θ
θ , where )|}({~
,...1
)(
iTtitm aL θ= ∏
=
=T
tiitit
m saP1
)( ),|(~
θ ,
and where:∑
∈
=
Aaiit
ma
iitm
aiitit
m
sV
sVsaP it
'
)(
'
)(
)(
))|(~
exp(
))|(~
exp(),|(
~
θ
θθ . Otherwise we set
)1( −= mi
mi θθ .
-A new set of value function approximations are generated to be used in future iterations. These new approximations are obtained by applying the Bellman operator:
)))|(~
exp(log()|(~ )*()()*()( ∑
∈
=Aa
mi
ma
mi
m sVsV θθ . As recommended by Norets (2009b), we apply the
Bellman operator more than once. In our implementation we apply the Bellman operator until the new values are close enough to the previous ones (maximum absolute deviation less than 1), with a minimum of 10 iterations.
-The first-stage prior parameters θ0 and Dθ are updated. Our second stage prior on θ0 is uniform on the compact set Θ, and our second-stage prior on Dθ follows an inverse-Wishart distribution: Dθ
-1~W(v0,V0) with v0=k+3 (where k is the number of elements in θi) and V0=0.001I (where I is
46
the identity matrix). These second-stage priors have the attractive property of being conjugate with the likelihood for the first-stage prior parameters, which allows us to draw the parameters
directly from their respective conditional posterior distributions: ),(~ 10
I
D
ITN
I
ii
θ
θ
θ∑
= and
)
))((
,(~ 1
00
00
1
IVIvWD
I
i
Tii∑
=−
−−
++
θθθθ
θ
47
References
Allenby, Greg M., and Peter E. Rossi. 1998. “Marketing Models of Consumer Heterogeneity.” Journal of Econometrics: 57-78.
Allenby, Greg M., Neeraj Arora, and James L. Ginter. 1995. “Incorporating Prior Knowledge into the Analysis of Conjoint Studies.” Journal of Marketing Research: 152-162.
Ariely, Dan, Anat Bracha, and Stephan Meier. 2009. “Doing Good or Doing Well? Image Motivation and Monetary Incentives in Behaving Prosocially.” The American Economic Review: 544-555.
Bajari, Patrick, C. Lanier Benkard, and Jonathan Levin. 2007. “Estimating Dynamic Models of Imperfect Competition.” Econometrica: 1331-1370.
Bakshy, Eytan, Jake M. Hofman, Winter A. Mason, and Duncan J. Watts. 2011. “Everyone's an influencer quantifying influence on twitter.” Proceedings of the fourth ACM international conference on Web search and data mining - WSDM '11: 65-74.
Barabási, Albert-László and Réka Albert. 1999. “Emergence of Scaling in Random Networks.” Science: 509-512.
Baumeister, Roy F., and Mark R. Leary. 1995. “The Need to Belong: Desire for Interpersonal Attachment as a Fundamental Human Motivation.” Psychological Bulletin: 497-529.
Bénabou, Roland, and Jean Tirole. 2006. “Incentives and Prosocial Behavior.” American Economic Review: 1652-1678.
Bitzer, Jűrgen, Wolfram Schrettl, and Philipp Schröder. 2007. “Intrinsic Motivation in Open Source Software Development.” Journal of Comparative Economics, 35: 160-169.
Bughin, Jacques R. 2007. “How Companies Can Make the Most of User-Generated Content.” McKinsey Quarterly: 1-4.
Cha, Meeyoung, Hamed Haddadi, Fabricio Benevenuto, and Krishna P. Gummadi. 2010. “Measuring User Influence in Twitter: The Million Follower Fallacy.” Proceedings of the fourth international AAAI conference on weblogs and social media: 10-17.
de Solla Price, Derek J. 1965. “Networks of Scientific Papers.” Science: 510-515. DeWall, Nathan C., Roy F. Baumeister, and Kathleen D. Vohs. 2008. “Satiated with
Belongingness” Effects of Acceptance, Rejection, and Task Framing on self-Regulatory Performance.” Journal of Personality and Social Psychology, 95(6): 1367-1382.
Fehr, Ernst, and Armin Falk. 2002. “Psychological Foundations of Incentives.” European Economic Review: 687-724.
Fershtman, Chaim, and Neil Gandal. 2007. “Open Source Software: Motivation and Restrictive Licensing.” International Economics and Economic Policy: 209-225.
Gelman, Andrew, Xiao-Li Meng, and Hal Stern. 1996. “Posterior Predictive Assessment of Model Fitness Via Realized Discrepancies.” Statistica Sinica: 733-807.
Ghose, Anindya, Avi Goldfarb Sang-Pil Han. 2011. “How is the Mobile Internet Different? Search Costs and Local Activities.” New York University working paper.
Ghosh, Rishab A., Ruediger Glott, Bernhard Krieger, and Gregorio Robles. 2002. “Free / Libre and Open Source Software: Survey and Study – Floss final report,” working paper, University of Maastricht, available at http://www.flossproject.org/report/index.htm.
Glazer, Amihai, and Kai Konrad. 1996. “A Signaling Explanation for Charity.” American Economic Review: 1019-1028.
48
Godes, David and Dina Mayzlin. 2004. “Using Online Conversations to Study Word-of-Mouth Communication.” Marketing Science: 545-560.
Goel, Sharad, Duncan J. Watts, and Daniel G. Goldstein (2012), “The Structure of Online Diffusion Networks,” Proceedings of 13th ACM Conference on Electronic Commerce.
Goldenberg, Jacob, Sangman Han, Donald R. Lehmann, and Jae Weon Hong. 2009. “The Role of Hubs in the Adoption Process.” Journal of Marketing: 1-13.
Harbaugh, William T. 1998a. “The Prestige Motive for Making Charitable Transfers.” American Economic Review: 277-282.
Harbaugh, William T. 1998b. “What Do Donations Buy? A Model of Philanthropy Based on Prestige and Warm Glow.” Journal of Public Economy: 269-284.
Hars, Alexander, and Shaosong Ou. 2002. “Working for Free? – Motivations of Participating in Open Source Projects.” International Journal of Electronic Commerce, 6(3): 25-39.
Hartmann, Wesley, Puneet Manchanda, Matthew Bothner, Peter Dodds, David Godes, Kartik Hosanagar, and Catherine Tucker. 2008. “Modeling Social Interactions: Identification, Empirical Methods and Policy Implications.” Marketing Letters: 287:304.
Hennig-Thurau, Thorsten, Kevin P. Gwinner, Gianfranco Walsh, and Dwayne D. Gremler. 2004. “Electronic Word-of-Mouth via Consumer-Opinion Platforms: What Motivates Consumers to Articulate Themselves on the Internet?” Journal of Interactive Marketing: 38:52.
Imai, Susumu, Neelam Jain, and Andrew Ching. 2009. “Bayesian Estimation of Dynamic Discrete Choice Models.” Econometrica: 1865-1899.
Kankanhalli, Atreyi, Bernard C.Y. Tan, and Kwok-Kee Wei. 2005. “Contributing Knowledge to Electronic Knowledge Repositories: An Empirical Investigation.” MIS Quarterly: 113-143.
Katona, Zsolt, Peter P. Zubcsek, and Miklos Sarvary. 2011. “Network Effects and Personal Influences: Diffusion of an Online Social Network.” Journal of Marketing Research: 425-443.
Kumar, Vineet. 2009. “Why do Consumers Contribute to Connected Goods? A Dynamic Game of Competition and Cooperation in Social Networks.” Tepper School of Business working paper.
Kwak, Haewoon, Changhyun Lee, Hosung Park, and Sue Moon. 2010. “What is Twitter, a social network or a news media?” Proceedings of the 19th international conference on World wide web - WWW '10: 591-600.
Lakhani, Karim R. and Robert Wolf. 2005. “Why Hackers Do What They Do: Understanding Motivation and Effort in Free/Open Source Software Projects.” In Perspectives on Free and Open Source Software, edited by J. Feller, B. Fitzgerald, S. Hissam and K. Lakhani. Cambridge, Mass: MIT Press.
Lampel, Joseph, and Ajay Bhalla. 2007. “The Role of Status Seeking in Online Communities: Giving the Gift of Experience.” Journal of Computer-Mediated Communication: 434-455.
Lerner, Josh, and Jean Tirole. 2002. “Some Simple Economics of Open Source.” The Journal of Industrial Economics: 197-234.
Lerner, Josh, and Jean Tirole. 2005. “The Economics of Technology Sharing: Open Source and Beyond.” Journal of Economic Perspectives: 99-120.
Manski, Charles F. 1993. “Identification of Endogenous Social Effects: The Reflection Problem.” Review of Economic Studies: 531-42.
Moffitt, Robert A. 2001. “Policy Interventions, Low-Level Equilibria, and Social Interactions.” In Social Dynamics, edited by S. Durlauf and P. Young, 45-82 Washington, D.C.: Brookings Institution Press and MIT.
49
Norets, Andriy. 2009a. “Inference in Dynamic Discrete Choice Models with Serially Correlated Unobserved State Variables.” Econometrica: 1665-1682.
Norets, Andriy. 2009b. “Implementation of Bayesian Inference in Dynamic Discrete Choice Models.” Princeton University working paper.
Nov, Oded. 2007. “What Motivates Wikipedians?” Communications of the ACM: 60-64. Roberts, Jeffrey A., Il-Horn Hann, and Sandra Slaughter. 2006. “Understanding the Motivations,
Participation, and Performance of Open Source Software Developers: A Longitudinal Study of the Apache Projects.” Management Science, 52(7): 984-999.
Romero, Daniel M., Wojciech Galuba, Sitaram Asur, and Bernardo A. Huberman. 2011. “Influence and passivity in social media.” Proceedings of the 20th international conference companion on World wide web - WWW '11: 113-114.
Rust, John. 1987. “Optimal Replacement of GMC Bus Engines: an Empirical Model of Harold Zucker.” Econometrica: 999-1033.
Ryan, Richard M., and Edward L. Deci. 2000. “Intrinsic and Extrinsic Motivations: Classic Definitions and New Directions.” Contemporary Educational Psychology: 54-67.
Shriver, Scott K., Harikesh S. Nair, and Reto Hofstetter. 2012. “Social Ties and User-Generated Content: Evidence from an online Social Network.” forthcoming, Management Science.
Sorensen, Daniel, and Daniel Gianola. 2002. Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. New York: Springer.
Stephen, Andrew T. and Olivier Toubia. 2009. “Explaining the Power-Law Degree Distribution in a Social Commerce Network.” Social Networks: 262-270.
Stephen, Andrew T., Yaniv Dover, Lev Muchnik, and Jacob Goldenberg. 2012. “The Effect of Transmitter Activity on Information Diffusion Over Online Social Networks.” Working paper, University of Pittsburgh.
Stephen, Andrew T. and Jeff Galak. 2012. “The Effects of Traditional and Social Earned Media on Sales: A Study of a Microlending Marketplace.” Journal of Marketing Research: 624-639.
Sun, Monic, and Feng Zhu. 2011. “Ad Revenue and Content Commercialization: Evidence from Blogs.” Stanford GSB working paper.
Trusov, Michael, Randolph Bucklin, and Koen Pauwels. 2009. “Effects of Word-of-Mouth Versus Traditional Marketing: Findings from an Internet Social Networking Site.” Journal of Marketing: 90-102.
Von Hippel, Eric, and Georg von Krogh. 2003. “Open Source Software and the ‘Private-Collective’ Innovation Model: Issues for Organization Science.” Organization Science, 14(2): 209-223.
Von Krogh, Georg, and Eric von Hippel. 2006. “The Promise of Research on Open Source Software.” Management Science, 52(7): 975-983.
Wasko, Molly McLure, and Samer Faraj. 2005. “Why Should I Share? Examining Social Capital and Knowledge Contribution in Electronic Networks of Practice.” MIS Quarterly: 35-57.
Weng, Jianshu, Ee-Peng Lim, and Jing Jiang. 2010. “TwitterRank: Finding Topic-sensitive Influential Twitterers.” Proceedings of the third ACM international conference on Web search and data mining - WSDM '10: 261-270.
Wu, Shaomei, Jake M. Hofman, Winter A. Mason, and Duncan J. Watts. 2011. “Who Says What to Whom on Twitter.” Proceedings of the 20th international conference on World wide web - WWW '11: 705-714.
50
Tables and Figures
Table 1—Illustrative model. Utility derived in each period, under intrinsic utility.
post in Period 1 don’t post in Period 1
utility in Period 1 U(n) 0
post in Period 2 don’t post in Period 2 post in Period 2 don’t post in Period 2
expected utility in Period 2
δU(n+1)+(1-δ)U(n) 0 U(n) 0
total expected utility (Period 1+Period 2) δU(n+1)+(2-δ)U(n) U(n) U(n) 0
Table 2—Illustrative model. Utility derived in each period, under image-related utility.
post in Period 1 don’t post in Period 1
utility in Period 1 U(n) U(n)
post in Period 2 don’t post in Period 2 post in Period 2 don’t post in Period 2
expected utility in Period 2
δU(n+1)+(1-δ)U(n) δU(n+1)+(1-δ)U(n) U(n) U(n)
total expected utility (Period 1+Period 2) δU(n+1)+(2-δ)U(n) δU(n+1)+(2-δ)U(n) 2U(n) 2U(n)
Table 3—Five quintiles based on number of followers on day 1.
Quintile Range of number of followers
Median number of followers
Average number of followers
1 0-12 7 6.499
2 13-26 19 18.941
3 27-61 39.5 40.988
4 62-245 109 125.550
5 246-18940 704 1378.949
Notes: The quintiles (i.e., range of the number of followers) are determined based on the treated users to ensure an equal spread of these users across the five groups. The median and average numbers of followers reported in the table are for the entire set of active users.
Table 4—Proportion of users with increased / decreased average daily posting rate (after versus before
intervention).
Increased average daily posting rate Decreased average daily posting rate
Quintile Treated Control Treated Control
1 0.286 0.205 0.238 0.185 2 0.632 0.350 0.158 0.370 3 0.556 0.370 0.278 0.426 4 0.200 0.451 0.700 0.474 5 0.400 0.375 0.500 0.574
Notes: Treated users in the 2nd quintile were significantly more likely to increase their posting rates compared to users in the control group (z=2.42, p<0.02), and marginally significantly less likely to decrease their posting rates compared to users in the control group (z=-1.85, p=0.06). Treated users in the 4th quintile were significantly less likely to increase their posting rates compared to users in the control group (z = -2.18, p < 0.03), and significantly more likely to decrease their posting rates compared to users in the control group (z=1.94, p=0.05). The differences in the other quintiles are not statistically significant.
51
Table 5— Average categorization of tweets posted before vs. after the treatment. 95% credible intervals are reported in brackets.
“Is this tweet meant for the user’s close friends and family members only?”
before treatment after treatment
No 0.536 [0.519,0.552] 0.536 [0.523,0.548]
Yes 0.369 [0.353,0.385] 0.354 [0.343,0.367]
Notes: In the interest of space, the third response category (“I do not know / I do not understand this tweet”) is not reported in the table.
Table 6— Average categorization of tweets posted before the treatment by users who increased vs. decreased their
posting rate after the treatment. 95% credible intervals are reported in brackets. “Is this tweet meant for the user’s close friends and
family members only?” user increased posting rate
user decreased posting rate
No 0.572 [0.544,0.600] 0.507 [0.486,0.525]
Yes 0.364 [0.337,0.392] 0.373 [0.355,0.392]
Notes: In the interest of space, the third response category (“I do not know / I do not understand this tweet”) is not reported in the table.
Table 7— Average categorization of tweets posted before the treatment by users in different quintiles (as defined in
Table 3). 95% credible intervals are reported in brackets. quintile to which user belongs
“Is this tweet meant for the user’s close friends and family members only?”
1st-3rd 4th 5th
No 0.486
[0.453,0.518] 0.561
[0.538,0.583] 0.602
[0.592,0.611]
Yes 0.447
[0.415,0.478] 0.328
[0.308,0.349] 0.269
[0.260,0.278]
Notes: In the interest of space, the third response category (“I do not know / I do not understand this tweet”) is not reported in the table.
52
Table 8— Point estimates and 95% credible intervals of the average parameters across users.
Parameter Point estimate Credible interval
θ1 0.309 [0.170;0.497]
θ2 0.329 [0.241;0.388]
θ3 0.690 [0.586;0.791]
θ4 0.265 [0.226;0.309]
θ5 -5.008 [-5.301;-4.648]
θ6 -5.075 [-5.380;-4.775]
Notes: The utility derived by user i at period t is:u�sit, ait|θi�=θi1(1+sit)θi2+p
itθ
i3(1+sit)
θi4+θi5nit+θi6pit,
where sit is user i’s number of followers in period t, pit is a binary variable equal to 1 if user i posts content in period t, and nit is a binary variable equal to 1 if user i follows at least one new user in period t. We fix β=0.995.
Table 9— User segmentation based on parameter estimates. (Proportion of treated users in each segment)
image-related utility > intrinsic utility when si→∞ (θi2 > θi4)
intrinsic utility > image-related utility when si→∞ (θi2 < θi4)
image-related utility > intrinsic utility when si→0
(θi1 > θi3)
0.106 0.011
intrinsic utility > image-related utility when si→0
(θi1 < θi3) 0.649 0.234
53
Figure 1. Daily number of exogenous links created per treated user over the main observation window.
Figure 2. Histograms and log-log plots of the distribution of the number of followers on day 1 for all users (top
panel) and all active users (bottom panel).
0
1
2
3
4
5
0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160
nu
mb
er
of
lin
ks
day number
54
Figure 3. Distribution of average daily posting rate among all active users.
Figure 4. Median number of followers as a function of time for treated and control users.
55
Figure 5. Distribution among treated users of the increase in the number of followers after vs. before the
intervention (day 107-day 57).
Figure 6. Probability of increasing posting rate after versus before the intervention as a function of the log of the number of followers on day 1.
Notes: This Figure is obtained by smoothing the raw data using a Gaussian kernel function (with a bandwidth of 1).
56
Figure 7. Probability of decreasing posting rate after versus before the intervention as a function of the log of the
number of followers on day 1.
Notes: This Figure is obtained by smoothing the raw data using a Gaussian kernel function (with a bandwidth of 1).
Figure 8. Posterior check of proportion of observations that include posting.
Notes: The histogram plots the posterior distribution of the proportion of observations in which users post content,
as predicted by the model. The solid line represents the observed proportion.
57
Figure 9. Posting frequency as predicted by the model versus observed
Notes: The x-axis corresponds to the posting frequency (proportion of observations that involve posting) as predicted by the model, the y-axis corresponds to the observed frequency. Each dot corresponds to one user.
58
Figure 10. Distribution of the parameters across users.
Notes: Distribution of θ1 to θ6 across users (from left to right and from top to bottom).
59
Figure 11. Distribution across users of the proportion of intrinsic utility derived during the main observation window (intrinsic / (intrinsic + image-related)).
Figure 12. Distribution across users of the predicted change in posting frequency if network’s structure became
stable.