Hideaki Hata, Characteristics of Sustainable OSS Projects @ CHASE 2015
Characteristics of Sustainable OSS Projects: A Theoretical and Empirical Study
Hideaki Hata, Taiki Todo, Saya Onoue, Kenichi Matumoto
Toward Sustainable OSS
How can OSS projects attract developers?
What can OSS projects do to incentivize developers to write code?
Context
GHTorrent datasets
• Top-10 starred software projects for the top programming languages on GitHub: 90 projects
Filtering
• Projects with more than 3 years of history as of Dec. 2012: 22 projects
Gousios, MSR 2014 Mining Challenge Dataset in GHTorrent
SW Population Pyramids
• Right: coding contributors
• Left: non-coding (comments, issues) contributors
Onoue et al., Software population pyramids: the current and the future of OSS development, ESEM 2014.
[Figure: a software population pyramid. Vertical axis: experience, one bar per 3 months; horizontal axis: number of developers.]
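A minimal sketch of how such a pyramid could be tabulated, assuming each contributor is described by the date of their first activity and by whether they contribute code. The record layout, the helper names, and the sample contributors are hypothetical; only the 3-month bar width and the coding/non-coding split come from the slide.

```python
from collections import Counter
from datetime import date

BAR_MONTHS = 3  # each bar spans 3 months of experience, as on the slide


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (hypothetical helper)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def population_pyramid(contributors, snapshot: date):
    """Count contributors per experience bar, split by contribution type.

    contributors: iterable of (first_activity_date, is_coding) pairs.
    Returns (coding, non_coding) Counters keyed by bar index, where bar 0
    holds contributors with less than 3 months of experience.
    """
    coding, non_coding = Counter(), Counter()
    for first_activity, is_coding in contributors:
        if first_activity > snapshot:
            continue  # not yet active at the snapshot
        bar = months_between(first_activity, snapshot) // BAR_MONTHS
        (coding if is_coding else non_coding)[bar] += 1
    return coding, non_coding


# Hypothetical sample data, for illustration only.
sample = [
    (date(2012, 11, 1), True),   # 1 month of experience   -> bar 0, coding
    (date(2012, 6, 15), True),   # 6 months of experience  -> bar 2, coding
    (date(2012, 7, 1), False),   # 5 months of experience  -> bar 1, non-coding
    (date(2011, 12, 1), False),  # 12 months of experience -> bar 4, non-coding
]
coding_bars, non_coding_bars = population_pyramid(sample, date(2012, 12, 1))
```

Plotting the coding counts to the right and the non-coding counts to the left, one bar per index, would give the pyramid shape described above.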
Varieties of SPP
GHTorrent datasets
• Top-10 starred software projects for the top programming languages on GitHub: 90 projects
Filtering
• Projects with more than 3 years of history as of Dec. 2012: 22 projects
Distribution of OSS Projects
[Figure: scatter plot of the 22 projects. x-axis: # of coding bars; y-axis: # of non-coding (discussion) bars; both axes have ticks at 5, 10, 15, and 20. Projects plotted: akka, beanstalkd, blueprint-css, compass, devise, django-cms, django-debug-toolbar, homebrew, http-parser, jekyll, jquery, kestrel, MaNGOS, mongo, node, openFrameworks, paperclip, rails, redis, scalatra, ThinkUP, tornado.]
Distribution of OSS Projects
[Figure: the scatter plot of # of coding bars (x-axis) against # of non-coding bars (y-axis), divided into four areas labeled (a), (b), (c), and (d).]
Game Theoretical Model
A leader-follower game
• Project: keep (K) or setup (S)
• Developer: write code (C) or non-coding contribution (discussion, D)
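The game above can be sketched with backward induction. This is only an illustration under stated assumptions: the project is taken to be the leader and the developer the follower, and every payoff number below is hypothetical, since the slide names only the players and their actions.

```python
# Hypothetical payoffs; the slide only names the players and their actions.
CODE_REWARD = 5.0                    # developer's reward for writing code (assumed)
DISCUSS_UTILITY = 2.0                # developer's net utility from discussion (assumed)
CODE_COST = {"K": 4.0, "S": 1.0}     # coding cost without / with setup (assumed)
SETUP_COST = {"K": 0.0, "S": 1.5}    # what each action costs the project (assumed)
PROJECT_GAIN = {"C": 4.0, "D": 1.0}  # project's benefit per contribution type (assumed)


def developer_utility(project_action: str, dev_action: str) -> float:
    """Utility of the developer given the project's earlier move."""
    if dev_action == "C":
        return CODE_REWARD - CODE_COST[project_action]
    return DISCUSS_UTILITY


def project_utility(project_action: str, dev_action: str) -> float:
    """Utility of the project for a pair of moves."""
    return PROJECT_GAIN[dev_action] - SETUP_COST[project_action]


def best_response(project_action: str) -> str:
    """Follower's move: the developer picks the action with higher utility."""
    return max(("C", "D"), key=lambda a: developer_utility(project_action, a))


def solve_by_backward_induction():
    """Leader's move: the project anticipates the developer's best response."""
    leader = max(("K", "S"), key=lambda p: project_utility(p, best_response(p)))
    return leader, best_response(leader)


outcome = solve_by_backward_induction()
```

With these assumed numbers, setup lowers the coding cost enough that the developer prefers C over D, and the project in turn prefers S over K. Different payoffs can yield a different outcome.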
Results of Equilibrium Analysis: Incentivize Developers to Write Code
• Setup: To increase the utility of writing code relative to the utility of non-coding contributions alone, projects need to set up the development environment, which can decrease the cost of writing code.
• Mandatory: Employment is a big incentive to write code. The project itself or a third party can select this option.
• Innovation: Innovations can decrease the cost and may increase the reward.
Coverage of Setup
TABLE I. COVERAGE OF SETUP
Area  Project               Wiki  Website  How to Contribute  Coding Guideline  Multi-Language Document  # of yes
(b)   rails                 no    yes      yes                no                no                       2
(b)   jekyll                yes   yes      yes                no                no                       3
(b)   django-cms            no    yes      no                 no                no                       1
(b)   jquery                yes   yes      yes                yes               no                       4
(b)   paperclip             yes   yes      yes                no                yes                      4
(b)   homebrew              yes   yes      yes                yes               yes                      5
(b)   node                  yes   yes      yes                no                no                       3
(b)   tornado               yes   yes      no                 no                no                       2
(b)   devise                yes   yes      yes                no                no                       3
(b)   redis                 yes   yes      no                 no                no                       2
(b)   openFrameworks        yes   yes      no                 no                no                       2
(b)   compass               yes   yes      yes                yes               no                       4
(d)   mongo                 no    yes      yes                yes               no                       3
(d)   akka                  no    yes      no                 no                no                       1
(a)   ThinkUP               yes   yes      no                 no                no                       2
(a)   django-debug-toolbar  yes   yes      no                 no                no                       2
(a)   http-parser           no    no       no                 no                no                       0
(a)   beanstalkd            yes   yes      no                 no                no                       2
(a)   MaNGOS                no    yes      no                 no                no                       1
(a)   kestrel               yes   yes      no                 no                no                       2
(c)   scalatra              no    yes      no                 no                no                       1
(c)   blueprint-css         yes   yes      no                 yes               no                       3
hosting service and social networking system for developers, can be regarded as such innovations.
IV. EMPIRICAL ANALYSIS
In this section we survey the studied projects to find empirical evidence that supports the results of the game-theoretical analysis. We conduct an empirical analysis to see the impact of setup, mandatory, and innovation.
A. Setup Coverage
In software development, preparing documentation is a helpful form of setup. We list the following five setup candidates and survey their coverage:
• Wiki: The project has a wiki on its GitHub page.
• Website: The project has its own website outside the GitHub service.
• How to contribute: There is a document explaining how to contribute code, in the wiki or on another official website.
• Coding guideline: There is a document for writing code, including style guides and patch guides.
• Multi-language document: There are official documents in other languages, which should be beneficial.
Table I summarizes the results of the setup coverage survey. Note that the software population pyramids were created with data from 2012, but this survey was conducted in 2014, so the coverage may differ from that in 2012.
The projects are classified into four areas, as discussed in Section II-C. The result is relatively clear. The projects in (b), which have balanced software population pyramids with many coding and discussion contributors, have many yes's; that is, such popular projects provide various helpful documents. The projects in (a) and (c), which do not have many coding contributors, tend to provide only a few documents. In particular, none of them has a “How to Contribute” document.
This result indicates that setting up useful documents can be the key to attracting coding contributors. This may also explain the difference between the software population pyramids of (a) and (b): although both attract developers, only (b) has many coding contributors.
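The comparison across areas can be reproduced from Table I with a short script. The sketch below transcribes the table as 1 (yes) and 0 (no); the dictionary layout and function names are illustrative choices, not part of the original study.

```python
# Columns: wiki, website, how to contribute, coding guideline, multi-language.
# Values transcribed from Table I (1 = yes, 0 = no).
TABLE_I = {
    "b": {
        "rails": (0, 1, 1, 0, 0),
        "jekyll": (1, 1, 1, 0, 0),
        "django-cms": (0, 1, 0, 0, 0),
        "jquery": (1, 1, 1, 1, 0),
        "paperclip": (1, 1, 1, 0, 1),
        "homebrew": (1, 1, 1, 1, 1),
        "node": (1, 1, 1, 0, 0),
        "tornado": (1, 1, 0, 0, 0),
        "devise": (1, 1, 1, 0, 0),
        "redis": (1, 1, 0, 0, 0),
        "openFrameworks": (1, 1, 0, 0, 0),
        "compass": (1, 1, 1, 1, 0),
    },
    "d": {
        "mongo": (0, 1, 1, 1, 0),
        "akka": (0, 1, 0, 0, 0),
    },
    "a": {
        "ThinkUP": (1, 1, 0, 0, 0),
        "django-debug-toolbar": (1, 1, 0, 0, 0),
        "http-parser": (0, 0, 0, 0, 0),
        "beanstalkd": (1, 1, 0, 0, 0),
        "MaNGOS": (0, 1, 0, 0, 0),
        "kestrel": (1, 1, 0, 0, 0),
    },
    "c": {
        "scalatra": (0, 1, 0, 0, 0),
        "blueprint-css": (1, 1, 0, 1, 0),
    },
}
HOW_TO_CONTRIBUTE = 2  # column index of "How to Contribute"


def mean_yes_by_area(table):
    """Average number of yes answers per project in each area."""
    return {
        area: sum(sum(row) for row in projects.values()) / len(projects)
        for area, projects in table.items()
    }


def how_to_contribute_by_area(table):
    """Number of projects per area that have a How to Contribute document."""
    return {
        area: sum(row[HOW_TO_CONTRIBUTE] for row in projects.values())
        for area, projects in table.items()
    }


mean_yes = mean_yes_by_area(TABLE_I)
how_to = how_to_contribute_by_area(TABLE_I)
```

With the table as transcribed, area (b) averages about 2.9 yes answers per project against 1.5 for area (a), and no project in (a) or (c) has a How to Contribute document.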
B. Employment
The projects in (d) seem strange because they have many coding contributors but not so many discussion contributors. In addition, the project akka does not provide enough documents to reduce the cost of writing code, as shown in Table I. However, the reason is clear: both projects are owned by companies. If we regard a developer as paid when their GitHub account belongs to any organization, 12 of the 34 developers of akka were paid as of 2012/12, and 25 of the 116 developers of mongo were paid in the same period. Since we counted paid developers roughly, the correct numbers may differ. For example, some developers belonging to organizations may contribute to the projects voluntarily. However, the existence of paid developers in these two projects is clear, and this may be why they do not attract many developers for discussion and coding.
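The rough counting rule and the resulting shares can be written down as a small sketch. The counts for akka and mongo are the ones reported above; the account record layout (a dict with an "organizations" list) and the sample accounts are hypothetical and stand in for whatever the dataset actually provides.

```python
def is_paid(developer: dict) -> bool:
    """Rough heuristic from the text: an account that belongs to any
    organization is counted as a paid developer."""
    return len(developer.get("organizations", [])) > 0


def paid_share(paid: int, total: int) -> float:
    """Fraction of developers counted as paid."""
    return paid / total


# Counts reported in the text for 2012/12.
akka_share = paid_share(12, 34)
mongo_share = paid_share(25, 116)

# Hypothetical account records, for illustration only.
sample_developers = [
    {"login": "dev_a", "organizations": ["example-org"]},
    {"login": "dev_b", "organizations": []},
]
paid_logins = [d["login"] for d in sample_developers if is_paid(d)]
```

As the text notes, this heuristic can overcount, because a developer who belongs to an organization may still contribute voluntarily.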
Employment
[Figure: scatter plot of the projects by # of coding bars (x-axis) and # of non-coding (discussion) bars (y-axis), annotated with paid-developer counts: mongo, 25 paid in 116; akka, 12 paid in 34.]
Innovation: Impact of GitHub
[Figure: software population pyramids of the rails project at six snapshots: 2007/12, 2008/12, 2009/12, 2010/12, 2011/12, and 2012/12. Vertical axis: experience, from 1 year to 8 years. Horizontal axis: number of contributors, scaled to 5 for 2007/12, to 200 for 2008/12 through 2010/12, and to 700 for 2011/12 and 2012/12. Legend categories: coding, discussion, moved.]
Fig. 7. The transition of software population pyramids of the rails project. This project moved to GitHub on 2008/4
C. The Impact of Innovation
The impact of social coding introduced by GitHub has attracted researchers [7], [8]. The launch of the GitHub service can be regarded as an innovation in the sense of our models.
Figure 7 presents the transition of the software population pyramids of the rails project from December 2007 to December 2012. Note that the scales of the x-axis are not the same. At the initial snapshot (2007/12), there are only eight coding contributors, although the project had more than three years of history. Since the project had no data on GitHub at that time, we cannot show the discussion contributors. The project moved to GitHub in August 2008 (footnote 4). After that, the project first attracted discussion contributors, and later it seems to have successfully attracted coding contributors. Although it is difficult to separate out other factors, the project attracted incomparably more coding contributors after moving to GitHub.
V. DISCUSSION
A. Limitations of Theoretical Models
Bounded rationality. Some readers may question the applicability of game theory to real human environments, because real humans are known not to be rational in general. Nevertheless, we would like to emphasize that our approach based on game theory is quite important for analyzing human behavior in real life. One of the main reasons is that we need such a mathematical model for several scientific situations, such as estimating market dynamics or simulating crowd behavior in emergency situations. Furthermore, having a theoretical foundation enables us to easily extend the result of the analysis to slightly different situations, such as a new market with slightly modified pricing rules and/or social laws.
4. Rails Moving to Git, https://github.com/blog/32-rails-moving-to-git.
Too much simplification. Another possible criticism of our model is its excessive simplification. We fully agree with that point, but at the same time we believe that simplicity is a strength rather than a weakness. Indeed, even our simple model reveals some new theoretical findings, as discussed in the previous section. Introducing a more elaborate model is an obviously meaningful future direction.
Future direction. Although we obtained a number of new findings by focusing on subgame perfect Nash equilibrium (SPNE), it would be very interesting to take into account different equilibrium concepts, such as Nash equilibrium and/or sequential equilibrium. We should carefully choose one of them depending on the level of the players' rationality. Another possible direction is to presume that the third party considered in the paper is a mechanism designer, which could determine the market rule. The discussion of mandatory coding is one such example, with a bad mechanism. By doing so, we can naturally handle the participation of more than one player, which enables us to consider several features of real markets, such as competition between players.
Besides these two directions, there are several possibilities for modeling software development situations via game theory and microeconomic theory. For instance, we could investigate how to achieve sustainable cooperation between developers based on repeated games, incentivize developers to participate based on online/dynamic mechanism design, analyze the population dynamics based on a segregation model, and decide on a new software release based on voting theory.
Rails moved to GitHub in August 2008
Summary
To attract and retain coding contributors
• Prepare documents (setup)
• Have paid developers (employment)
• Adopt new technologies/environment (innovation)
Discussions
• Limitations in theoretical analysis
• Bounded rationality. Humans are not rational in general
• Too much simplification
• Threats to validity in empirical analysis
• Limited datasets
• Analysis results may contain errors