inferring the impacts of social media on crowdfundingclu/doc/wsdm14_poster.pdf · extracted from...
TRANSCRIPT
![Page 1: Inferring the Impacts of Social Media on Crowdfundingclu/doc/wsdm14_poster.pdf · extracted from social media, instead of the properties of a project on its own. Table 2 shows that](https://reader036.vdocuments.us/reader036/viewer/2022081616/6003ceee104ada0ca229684a/html5/thumbnails/1.jpg)
1. Impacts of Social Media on Crowdfunding
Inferring the Impacts of Social Media on Crowdfunding
Chun-Ta Lu ([email protected]) Sihong Xie Xiangnan Kong Philip S. Yu
2. Data Analysis: Kickstarter and Twitter
3. Dynamics of Crowdfunding
5 10 250.8
0.85
0.9
0.95
Elapsed time of total project duration (%)
AUC
A A,B A,B,C
0 5 10 15 20 250.1
0.15
0.2
0.25
0.3
0.35
0.4
Elapsed time of total project duration (%)
mR
SE
S7BaseLine
0.5 1 1.5 2 2.5 3 3.50.25
0.3
0.35
0.4
0.45
0.5
0.55
# of backers (log base 10)
mR
SE
S7BaseLine
0 0.5 1 1.5 2 2.5 30
20
40
60
80
100
120
140
Pledged funds/Fundraising Goal
# of
pro
ject
s
0 1 2 3 4 50
0.5
1
1.5
2
2.5
3
3.5
# of users (log base 10)
# of
pro
ject
s (lo
g ba
se 1
0)
BackersPromotersPatrons
Film & VideoMusicGamesPublishingDesignOthers
0 1 2 3 4 50
1
2
3
4
5
6
7
# of backers (log base 10)
$ of
ple
dged
fund
s (lo
g ba
se 1
0)
0 0.5 1 1.5 2 2.5 3 3.50
0.5
1
1.5
2
2.5
3
3.5
4
4.5
# of
bac
kers
(log
bas
e 10
)
# of promoters (log base 10)
0 1 2 3 4 50
0.5
1
1.5
2
2.5
3
pled
ged
fund
s/fu
ndra
isin
g go
al
# of backers (log base 10)
0 25 50 75 1001
1.5
2
2.5
3
Elapsed time of total project duration (%)
# of
bac
kers
(log
bas
e 10
)
LessMedianMassive
0 25 50 75 1000
0.5
1
1.5
2
Elapsed time of total project duration (%)
pled
ged
fund
s / g
oal
LessMedianMassive
0 25 50 75 1000
10
20
30
Elapsed time of total project duration (%)
aver
age
num
ber p
er p
roje
ct (c
ount
)
Pled
ged
Fund
s (%
)
0
5
10
15backerratio
0 25 50 75 1000
3
6
Elapsed time of total project duration (%)
aver
age
num
ber p
er p
roje
ct (c
ount
)
Patro
ns/P
rom
oter
s (%
)
0
50
100promoterpatronratio
(1)Three phases of crowdfunding activities
(2)Correlations between social promotion and fundraising results
project duration (%)Feature 5 10 25
# of backers 0.82 0.89 0.92# of patrons 0.79 0.81 0.85# of promoters 0.77 0.78 0.79# of tweets 0.73 0.70 0.65# of weakly connected components 0.70 0.69 0.68
Table 2: Correlation of the total number of back-ers at target deadline and the features extracted atspecified time.
project duration (%)Feature 5 10 25
raised funds/goal 0.77 0.89 0.95# of triads 0.47 0.46 0.45# of edges 0.43 0.43 0.38# of patrons 0.26 0.26 0.24# of backers 0.18 0.29 0.30
Table 3: Correlation of the ratio of raised funds tofundraising goal at target deadline and the featuresextracted at specified time.
of 5% means that the correlation between the accumulatednumber of promoters by the end of the 5% period of thefundraising duration to the number of backers at the targetdeadline is 0.77. This correlation increases slightly to 0.78and 0.79 at the end of the 10% and 25% periods, respectively.For each table, we only report the top 5 features that
have the highest correlation at each time frame. As shownin both tables, the feature that has the best correlation is thefundraising result itself. This indicates the early status ofthe fundraising result has good predictive power on its finalvalue, and we should consider it as baseline for each task.Interestingly, the other top features for both objectives areextracted from social media, instead of the properties of aproject on its own.Table 2 shows that the number of backers is strongly cor-
related to the volume of promotional activities, i.e. the num-ber of patrons, the number of promoters, and the numberof tweets. On the other hand, Table 3 shows that the ra-tio of fundraising goal achievement is more correlated to thestructure of campaign graphs, i.e. the number of triads andthe number of edges. These observations are due to the fun-damentally different properties of the volume and networkstructures. The number of backers is measured as the quan-tity of support and is primarily affected by the volume ofpromotion, whereas its success is determined by the qualityof support, i.e. the amount of money. A popular projectmay receive lots of very small contributions, e.g. $1 dol-lar, from general public but still not be able to reach thefundraising goal. However, a successful project would tar-get specific groups that have interests in it and acquire moremoney from them. The number of edges and the number oftriads both suggest the strength of information flows withingroups. A large number of edges and/or triads show a signof the intensive interests among groups, and thus they aremore correlated to the ratio of fundraising goal achievement.Concurrent processes of social promotions and ex-
ternal stimulations.
1 1.5 2 2.5 3 3.50
0.5
1
1.5
2
# of backers (log base 10)
# of
edg
es (l
og b
ase
10)
(a) Number of edges
1 1.5 2 2.5 3 3.50
1
2
3
4
# of backers (log base 10)
diam
eter
(b) Diameter
1 1.5 2 2.5 3 3.50
0.5
1
1.5
# of backers (log base 10)
# of
con
nect
ed c
ompo
nent
s(lo
g ba
se 1
0)
(c) Number of weakly con-nected components
1 1.5 2 2.5 3 3.555
60
65
70
75
# of backers (log base 10)
exte
rnal
influ
ence
d no
des
(%)
(d) fraction of nodes fromexternal stimulations
Figure 10: Each figure shows, given the promotionalactivity in the starting bursty phase, the value of aparticular property as a function of the number ofbackers at target deadline. Top row and bottomrow display the properties from the aspect of socialpromotions and external stimulations, respectively.
Our above observations show that there are strong corre-lations between fundraising results and social features. Weclaim they are primarily due to the fact that if a projectis more persuasive to intrigue authors in social networks todiscuss it (either by social promotions or external stimula-tions), it is more attractive to investors. From the aspect ofsocial promotions, we expect such a project to have a largercampaign graph. From the aspect of external stimulations,we expect there exists more groups in the campaign, eachof which is represented by a weakly connected componentas each component can be viewed as started by an externalstimulus.
We consider the relationship between the number of back-ers and the social structure of its campaign in the startingbursty phase as an example to illustrate our claim. Fromthe aspect of social promotions, as shown in Figure 10(a)and Figure 10(b), we notice that the number of backers in-creases with the number of edges and/or the diameter. Bothfeatures indicate the expansion of crowdfunding campaignswhich is activated from the influences of social promotions.From the aspect of external stimulations, as can be seen inFigure 10(c), we observe the similar rise in the number ofweakly connected components, which indicates the increasein the number of groups.
These results from both aspects suggest that social pro-motions and external stimulations concurrently activate theprocesses of crowdfunding campaigns. One may ask whichone is more important? In Figure 10(d), we find that over60% of nodes are from external stimulations. This reflectsthe fact that most authors access a project from externalsources outside social media and motivated by personal in-terests to promote it. However, we also notice if there are
project duration (%)Feature 5 10 25
# of backers 0.82 0.89 0.92# of patrons 0.79 0.81 0.85# of promoters 0.77 0.78 0.79# of tweets 0.73 0.70 0.65# of weakly connected components 0.70 0.69 0.68
Table 2: Correlation of the total number of back-ers at target deadline and the features extracted atspecified time.
project duration (%)Feature 5 10 25
raised funds/goal 0.77 0.89 0.95# of triads 0.47 0.46 0.45# of edges 0.43 0.43 0.38# of patrons 0.26 0.26 0.24# of backers 0.18 0.29 0.30
Table 3: Correlation of the ratio of raised funds tofundraising goal at target deadline and the featuresextracted at specified time.
of 5% means that the correlation between the accumulatednumber of promoters by the end of the 5% period of thefundraising duration to the number of backers at the targetdeadline is 0.77. This correlation increases slightly to 0.78and 0.79 at the end of the 10% and 25% periods, respectively.For each table, we only report the top 5 features that
have the highest correlation at each time frame. As shownin both tables, the feature that has the best correlation is thefundraising result itself. This indicates the early status ofthe fundraising result has good predictive power on its finalvalue, and we should consider it as baseline for each task.Interestingly, the other top features for both objectives areextracted from social media, instead of the properties of aproject on its own.Table 2 shows that the number of backers is strongly cor-
related to the volume of promotional activities, i.e. the num-ber of patrons, the number of promoters, and the numberof tweets. On the other hand, Table 3 shows that the ra-tio of fundraising goal achievement is more correlated to thestructure of campaign graphs, i.e. the number of triads andthe number of edges. These observations are due to the fun-damentally different properties of the volume and networkstructures. The number of backers is measured as the quan-tity of support and is primarily affected by the volume ofpromotion, whereas its success is determined by the qualityof support, i.e. the amount of money. A popular projectmay receive lots of very small contributions, e.g. $1 dol-lar, from general public but still not be able to reach thefundraising goal. However, a successful project would tar-get specific groups that have interests in it and acquire moremoney from them. The number of edges and the number oftriads both suggest the strength of information flows withingroups. A large number of edges and/or triads show a signof the intensive interests among groups, and thus they aremore correlated to the ratio of fundraising goal achievement.Concurrent processes of social promotions and ex-
ternal stimulations.
1 1.5 2 2.5 3 3.50
0.5
1
1.5
2
# of backers (log base 10)
# of
edg
es (l
og b
ase
10)
(a) Number of edges
1 1.5 2 2.5 3 3.50
1
2
3
4
# of backers (log base 10)
diam
eter
(b) Diameter
1 1.5 2 2.5 3 3.50
0.5
1
1.5
# of backers (log base 10)
# of
con
nect
ed c
ompo
nent
s(lo
g ba
se 1
0)
(c) Number of weakly con-nected components
1 1.5 2 2.5 3 3.555
60
65
70
75
# of backers (log base 10)
exte
rnal
influ
ence
d no
des
(%)
(d) fraction of nodes fromexternal stimulations
Figure 10: Each figure shows, given the promotionalactivity in the starting bursty phase, the value of aparticular property as a function of the number ofbackers at target deadline. Top row and bottomrow display the properties from the aspect of socialpromotions and external stimulations, respectively.
Our above observations show that there are strong corre-lations between fundraising results and social features. Weclaim they are primarily due to the fact that if a projectis more persuasive to intrigue authors in social networks todiscuss it (either by social promotions or external stimula-tions), it is more attractive to investors. From the aspect ofsocial promotions, we expect such a project to have a largercampaign graph. From the aspect of external stimulations,we expect there exists more groups in the campaign, eachof which is represented by a weakly connected componentas each component can be viewed as started by an externalstimulus.
We consider the relationship between the number of back-ers and the social structure of its campaign in the startingbursty phase as an example to illustrate our claim. Fromthe aspect of social promotions, as shown in Figure 10(a)and Figure 10(b), we notice that the number of backers in-creases with the number of edges and/or the diameter. Bothfeatures indicate the expansion of crowdfunding campaignswhich is activated from the influences of social promotions.From the aspect of external stimulations, as can be seen inFigure 10(c), we observe the similar rise in the number ofweakly connected components, which indicates the increasein the number of groups.
These results from both aspects suggest that social pro-motions and external stimulations concurrently activate theprocesses of crowdfunding campaigns. One may ask whichone is more important? In Figure 10(d), we find that over60% of nodes are from external stimulations. This reflectsthe fact that most authors access a project from externalsources outside social media and motivated by personal in-terests to promote it. However, we also notice if there are
Correlation of the total number of backers at target deadline and the features extracted at specified time.
Correlation of raised funds/goal (%) at target deadline and the features extracted at specified time.
4. Experiments (Early Prediction)
100%
0%
0% 100%
$
$$$
FilmCrowdfunded
Project Duration
Fundraising Goal
Group A
Group B
Promoter: Backer:
(1) Predict # of backers (within 25% of project duration)
(2) Predict whether a project will succeed or fail (within 25% of project duration)
5 10 2570
75
80
85
Elapsed time of total project duration (%)
Accu
racy
A A,B A,B,C
Potential Target
Backer
Promoter
Patron
a specified duration [15, 16]. If they are interested in theproject, investors can contribute funds to it. If the fundrais-ing goal is reached, the funds are awarded to the proposer,and equal benefits are committed to the investors. Mostof the crowdfunding sites (e.g. Kickstarter) further applyan all-or-nothing principle: if the fundraising goal is notreached by the deadline, all the funds are returned to theinvestors, while all commitments are cancelled. Such anall-or-nothing principle helps investors to avoid the risk ofpaying for nothing.Besides fundraising activities on crowdfunding websites,
typically there are also fundraising-related social activitieson social networks like Twitter. Specifically, novel and cre-ative crowdfunding projects are notably discussed on socialnetworks where users are willing to share their interests, pro-vide or respond to comments, promote and announce news ofthe projects. These activities on social networks can be seenas a complementary data source to crowdfunding websitesand provide more insights into the crowdfunding process.We focus on a crowdfunding website called Kickstarter.comand related tweets on Twitter, since they exhibit the basic el-ements we wish to study, namely, the co-occurring fundrais-ing and social activities on Twitter. Our observations andmethodologies are applicable to other crowdfunding hosts,so long as they allow promotions on social media.
3. DATASET CHARACTERISTICSWe start our presentation by describing the data acquisi-
tion and processing for our analysis. We then provide thestatistical characteristics of our dataset.
3.1 Data acquisition and processingCrowdfunding data: From Kickstarter, we obtained all
the projects launched after Nov. 2012 and completed beforeApr. 2013. For each project we recorded its proposer, du-ration, webpage URL, shorten URL2, fundraising goal andavailable pledging options. Within each project’s durationwe recorded its total pledged amount and total number ofbackers on a daily basis. We noticed that there were sometrial projects having no backers or extremely low fundrais-ing goals (less than $100), and we removed them from ourdataset since none of them reached their fundraising goals.Twitter data: We gathered all the tweets regarding
Kickstarter from Nov. 2012 to Apr. 2013. The tweets are ex-tracted by the Twitter stream API with the keywords “kick-starter” and “kck.st/”2, ensuring they contain timestamps,authors, mentioned users, tweet texts, and URLs for ouranalysis. For each tweet’s author, we queried Twitter APIfor the metadata about the author as well as the followersand followees of the author. Since a tweet is limited to 140characters, most URLs are shortened by a URL shorteningservice. We expanded them to get the original URLs.Data preprocessing: A crowdfunding project could be
shared on different channels and referred to by various ti-tles. An effective promotion of a project should mention itsofficial URL to encourage others to visit it. For each projectwe consider only the tweets mentioning its URL within itsproject duration. We call the author of such a tweet a pro-moter. Besides, Kickstarter provides a Twitter shortcut forusers to share the news of sponsoring a project in a fixed for-
2 Kickstarter provides a shorten URL in the format of http://kck.st/* to encourage users to share the project
Potential Target
Backer
Promoter
Patron
Figure 2: Transitions of roles in crowdfunding. Solidlines show the action of broadcasting and dash linesshow the action of investment.
# of projects 1,521# of successful projects 841# of backers 145,032# of tweets 62,473# of promoters 39,051# of patrons 16,666# of mentioned users 14,415
Table 1: Statistics of thedataset
Film & VideoMusicGamesPublishingDesignOthers
Figure 3: Percentageof project categories.
0 0.5 1 1.5 2 2.5 30
20
40
60
80
100
120
140
Pledged funds/Fundraising Goal
# of
pro
ject
s
Figure 4: Distribution of the ratio of total pledgedamount to the fundraising goal. An additional 120projects (7.8% of all projects) were funded morethan 3x.
mat (e.g. “I just backed project title @kickstarter”). Hence,we further filter tweets with the keyword “backed” and theproject title, and call the author of such a tweet a patron.In this paper, we distinguish the patron in Twitter from thebacker in Kickstarter by assuming that a patron is not only abacker who puts a pledge on a Kickstarter project but alsotweets the news of sponsoring it. Figure 2 illustrates thetransition of roles in crowdfunding. In Figure 2, a potentialtarget would become a backer of a project if he/she investsin the project, or become a promoter if he/she broadcaststhe information of the project. If he/she both invests andpromote the project, he/she becomes a patron.
We further filter out projects that were seldom discussed(less than 5 tweets) in the Twitter dataset, simply becausethere are not enough data to correctly evaluate the crowd-funding campaign through Twitter. Table 1 and Figure 3show the basic statistics of our dataset after filtering.
3.2 Statistical characteristics of datasetThe distribution of the ratio of total amount of pledged
funds to the fundraising goal can be found in Figure 4. Itillustrates the fact that most Kickstarter projects either failsignificantly or meet their goals by relatively small margins.Among all the failed projects, only twenty-five percent of
(I) starting bursty phase50% of funds from the first 25% of project duration, only 15% projects achieve their goals in this phase.
(II) stationary phasefew activities happen in the middle 65% of project duration.
(III) final bursty phase60% of projects depend on the final 10% of project duration.
(I) # of backers is strongly related to promotional activity
(II) success rate is more related to the structure of promotional campaign
(I) Crowdfunding — people raise funds through collaborative contributions of general public (i.e., crowd) — has emerged as a $5.1 billion business (in 2013).
(II) Crowdfunding is strongly related to promotional campaigns in social media(III) Unique properties: 1. project duration 2. fundraising goal
Projects either fail significantly or meet their goals by relatively small marginsTransition of roles in crowdfunding
44% of projects failed to reach goals
(correlation coefficient: 0.79) (correlation coefficient: 0.9)
A: Project features; B:Social activity features; C: Social structure features
Over 75% accuracy using features within only 5% of project duration (2~3 days)
5.5% error reduction
(III) Concurrent processes of social promotions and external stimulations
1 1.5 2 2.5 3 3.50
0.5
1
1.5
2
# of backers (log base 10)#
of e
dges
(log
bas
e 10
)1 1.5 2 2.5 3 3.555
60
65
70
75
# of backers (log base 10)
exte
rnal
influ
ence
d no
des
(%)
1 1.5 2 2.5 3 3.50
0.5
1
1.5
# of backers (log base 10)
# of
con
nect
ed c
ompo
nent
s(lo
g ba
se 1
0)
1 1.5 2 2.5 3 3.50
1
2
3
4
# of backers (log base 10)
diam
eter