Characteristics of Sustainable OSS Projects: A Theoretical and Empirical Study


Post on 29-Jul-2015


Hideaki Hata, Characteristics of Sustainable OSS Projects @ CHASE 2015

Characteristics of Sustainable OSS Projects: A Theoretical and Empirical Study

Hideaki Hata, Taiki Todo, Saya Onoue, Kenichi Matumoto


Toward Sustainable OSS

How can OSS projects attract developers?

What can OSS projects do to incentivize developers to write code?


Context

GHTorrent datasets

• Top-10 starred software projects for each of the top programming languages on GitHub: 90 projects

Filtering

• More than three years of history as of Dec. 2012: 22 projects


Gousios, MSR 2014 Mining Challenge Dataset in GHTorrent


SW Population Pyramids

• Right: coding contributors

• Left: non-coding (comments, issues) contributors
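As a rough illustration of how such a pyramid could be built, the sketch below bins contributors into 3-month experience bars and splits them into a coding side and a non-coding side. The input layout, the `kind` labels, and the rule that a "bar" counts only when it is non-empty are assumptions made for this example, not details taken from the slides.

```python
from collections import Counter

BAR_MONTHS = 3  # each bar covers three months of experience, as on the slide


def build_pyramid(contributors):
    """Count contributors per experience bar, split by side.

    `contributors` is an iterable of (experience_in_months, kind) pairs,
    where kind is "coding" (right side) or "non-coding" (left side).
    Both the input layout and the kind labels are assumptions of this sketch.
    """
    pyramid = {"coding": Counter(), "non-coding": Counter()}
    for months, kind in contributors:
        bar = months // BAR_MONTHS  # bar 0 = 0-2 months, bar 1 = 3-5 months, ...
        pyramid[kind][bar] += 1
    return pyramid


def count_bars(pyramid, kind):
    """Number of non-empty bars on one side (assumed meaning of '# of bars')."""
    return len(pyramid[kind])


# Hypothetical contributors, not taken from the GHTorrent data.
sample = [(1, "coding"), (2, "coding"), (7, "coding"), (4, "non-coding")]
pyramid = build_pyramid(sample)
print(count_bars(pyramid, "coding"), count_bars(pyramid, "non-coding"))
```

With the hypothetical sample, the two newest coding contributors share one bar and the third sits two bars higher, so the coding side has two bars and the non-coding side has one.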

Onoue et al., Software population pyramids: the current and the future of OSS development, ESEM 2014.

[Figure: software population pyramid. Axis labels: Experience (one bar per 3 months) and Developers.]

Varieties of SPP


Distribution of OSS Projects

[Figure: scatter plot of the 22 projects. x-axis: # of coding bars; y-axis: # of non-coding (discussion) bars; tick marks at 5, 10, 15, and 20 on both axes. Projects shown: akka, beanstalkd, blueprint-css, compass, devise, django-cms, django-debug-toolbar, homebrew, http-parser, jekyll, jquery, kestrel, MaNGOS, mongo, node, openFrameworks, paperclip, rails, redis, scalatra, ThinkUP, tornado.]


Distribution of OSS Projects

[Figure: scatter plot of the projects by # of coding bars (x-axis) and # of non-coding (discussion) bars (y-axis), divided into four areas labeled (a), (b), (c), and (d).]

Introducing game theory


Game Theoretical Model

A leader-follower game

• Project: keep (K) or setup (S)

• Developer: write code (C) or non-coding contribution (discussion, D)
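A leader-follower game of this shape can be solved by backward induction: first work out what the developer would do after each project action, then let the project choose with that response in mind. The sketch below does this for the two actions per player listed above. The payoff numbers are hypothetical, since the slides give none; they were chosen only so that setup makes coding worthwhile for the developer.

```python
# Hypothetical payoffs (project_utility, developer_utility) for each outcome.
# The slides give no numbers; these only illustrate the method.
PAYOFFS = {
    ("K", "C"): (3, -1),  # no setup: writing code is costly for the developer
    ("K", "D"): (1, 1),
    ("S", "C"): (2, 2),   # setup costs the project but lowers the coding cost
    ("S", "D"): (0, 1),
}


def best_response(project_action):
    """Developer (follower) picks the action with the higher own utility."""
    return max(("C", "D"), key=lambda a: PAYOFFS[(project_action, a)][1])


def solve_by_backward_induction():
    """Project (leader) chooses while anticipating the developer's response."""
    project_action = max(
        ("K", "S"), key=lambda p: PAYOFFS[(p, best_response(p))][0]
    )
    return project_action, best_response(project_action)


print(solve_by_backward_induction())
```

Under these assumed payoffs the developer answers keep (K) with discussion (D) and setup (S) with code (C), so the project prefers setup.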

[Figure: game tree with the project and the developer as players.]

Results of Equilibrium Analysis: Incentivize Developers to Write Code

• Setup: To increase the utility of writing code relative to the utility of making only non-coding contributions, projects need to set up the development environment, which can decrease the cost of writing code.

• Mandatory: Employment is a big incentive to write code. The project itself or third parties can select this option.

• Innovation: Innovations can decrease the cost and may increase the reward.
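The three options can be read as three ways of tipping one comparison: a developer writes code when its utility exceeds that of discussion alone. The sketch below assumes the simplest possible form, utility of coding = reward - cost, which is an assumption of this example and not the model from the paper. All numbers are hypothetical.

```python
def prefers_coding(reward, cost, discussion_utility):
    """True if writing code yields strictly more utility than discussion only.

    Assumes utility of coding = reward - cost (a simplification made here).
    """
    return reward - cost > discussion_utility


# Hypothetical baseline in which the developer sticks to discussion.
baseline = dict(reward=5.0, cost=6.0, discussion_utility=1.0)

# Each option changes a different term of the same comparison.
after_setup = dict(baseline, cost=3.0)                   # setup lowers the cost
after_employment = dict(baseline, reward=9.0)            # pay raises the reward
after_innovation = dict(baseline, cost=4.5, reward=6.0)  # may shift both

for name, case in [
    ("baseline", baseline),
    ("setup", after_setup),
    ("employment", after_employment),
    ("innovation", after_innovation),
]:
    print(name, prefers_coding(**case))
```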


Empirical Analysis


Coverage of Setup


TABLE I. COVERAGE OF SETUP

Area  Project               Wiki  Website  How to Contribute  Coding Guideline  Multi-Language Document  # of yes
(b)   rails                 no    yes      yes                no                no                       2
(b)   jekyll                yes   yes      yes                no                no                       3
(b)   django-cms            no    yes      no                 no                no                       1
(b)   jquery                yes   yes      yes                yes               no                       4
(b)   paperclip             yes   yes      yes                no                yes                      4
(b)   homebrew              yes   yes      yes                yes               yes                      5
(b)   node                  yes   yes      yes                no                no                       3
(b)   tornado               yes   yes      no                 no                no                       2
(b)   devise                yes   yes      yes                no                no                       3
(b)   redis                 yes   yes      no                 no                no                       2
(b)   openFrameworks        yes   yes      no                 no                no                       2
(b)   compass               yes   yes      yes                yes               no                       4
(d)   mongo                 no    yes      yes                yes               no                       3
(d)   akka                  no    yes      no                 no                no                       1
(a)   ThinkUP               yes   yes      no                 no                no                       2
(a)   django-debug-toolbar  yes   yes      no                 no                no                       2
(a)   http-parser           no    no       no                 no                no                       0
(a)   beanstalkd            yes   yes      no                 no                no                       2
(a)   MaNGOS                no    yes      no                 no                no                       1
(a)   kestrel               yes   yes      no                 no                no                       2
(c)   scalatra              no    yes      no                 no                no                       1
(c)   blueprint-css         yes   yes      no                 yes               no                       3

hosting service and social networking system for developers, can be regarded as such innovations.

IV. EMPIRICAL ANALYSIS

In this section we survey the studied projects to find empirical evidence that supports the results of the game-theoretical analysis. We conduct an empirical analysis to see the impact of setup, mandatory, and innovation.

A. Setup Coverage

In software development, preparing documents is a helpful form of setup. We list the following five setup candidates and survey their coverage:

• Wiki: The project has prepared a wiki on its GitHub page.
• Website: The project has its own website outside the GitHub service.
• How to contribute: There is a document on how to contribute code, in the wiki or on another official website.
• Coding guideline: There is a document for writing code, including style guides and patch guides.
• Multi-language document: There are official documents in other languages, which should be beneficial.

Table I summarizes the results of the setup coverage. Note that the software population pyramids were created with data from 2012, but this survey was conducted in 2014, so the coverage may differ from that in 2012.

The projects are classified into four areas, as discussed in Section II-C. The result is relatively clear. The projects in (b), which have balanced software population pyramids with many coding and discussion contributors, have many yes's; that is, such popular projects provide various helpful documents. The projects in (a) and (c), which do not have many coding contributors, tend to provide only a few documents. In particular, none of them has a “How to Contribute” document.

This result indicates that setting up useful documents can be key to attracting coding contributors. It may also explain the difference between the software population pyramids of (a) and (b): only (b) has many coding contributors, although both attract developers.
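To make the comparison between areas concrete, the following sketch averages the "# of yes" column of Table I per area. The counts are transcribed from the table; the dictionary layout and the function name are choices made for this example.

```python
from statistics import mean

# "# of yes" per project, transcribed from Table I and grouped by area.
YES_COUNTS = {
    "a": {"ThinkUP": 2, "django-debug-toolbar": 2, "http-parser": 0,
          "beanstalkd": 2, "MaNGOS": 1, "kestrel": 2},
    "b": {"rails": 2, "jekyll": 3, "django-cms": 1, "jquery": 4,
          "paperclip": 4, "homebrew": 5, "node": 3, "tornado": 2,
          "devise": 3, "redis": 2, "openFrameworks": 2, "compass": 4},
    "c": {"scalatra": 1, "blueprint-css": 3},
    "d": {"mongo": 3, "akka": 1},
}


def mean_yes_by_area(counts):
    """Average number of available setup documents per project in each area."""
    return {area: mean(projects.values()) for area, projects in counts.items()}


for area, avg in mean_yes_by_area(YES_COUNTS).items():
    print(f"({area}) {avg:.2f}")
```

Area (b) averages about 2.9 documents per project, against 1.5 for area (a) and 2.0 for each of (c) and (d). Areas (c) and (d) contain only two projects each, so their averages carry little weight.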

B. Employment

The projects in (d) seem strange because they have many coding contributors but not so many discussion contributors. In addition, the akka project does not provide enough documents to reduce the cost of writing code, as shown in Table I. However, the reason is clear: both projects are owned by companies. If we regard developers as paid when their GitHub accounts belong to an organization, there were 12 paid developers out of 34 in 2012/12 for akka, and 25 paid developers out of 116 in the same period for mongo. Since we counted paid developers roughly, the correct numbers may differ. For example, some developers belonging to organizations may contribute to the projects voluntarily. However, the existence of paid developers in these two projects is clear, and this may be why they do not attract many developers for discussion and coding.
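The paid-developer counts above are easier to compare as shares. The short calculation below uses only the figures given in the text; the function name is a choice made for this example, and the caveat about rough counting applies to the results as well.

```python
# (paid developers, all developers) in 2012/12, as reported in the text.
PAID = {"akka": (12, 34), "mongo": (25, 116)}


def paid_share(paid, total):
    """Fraction of developers counted as paid."""
    return paid / total


for project, (paid, total) in PAID.items():
    print(f"{project}: {paid_share(paid, total):.1%}")
# akka: 35.3%
# mongo: 21.6%
```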

Employment

[Figure: scatter plot of the projects by # of coding bars (x-axis) and # of non-coding (discussion) bars (y-axis), annotated with "25 paid in 116" (mongo) and "12 paid in 34" (akka).]

Innovation: Impact of GitHub

[Figure: six software population pyramids of the rails project, one per snapshot (2007/12, 2008/12, 2009/12, 2010/12, 2011/12, 2012/12). Vertical axis: experience, from 1 year to 8 years. The horizontal scale differs between snapshots. Legend categories: coding, discussion, moved.]

Fig. 7. The transition of software population pyramids of the rails project. This project moved to GitHub on 2008/4

C. The Impact of Innovation

The impact of social coding introduced by GitHub has attracted researchers' attention [7], [8]. The launch of the GitHub service can be regarded as an innovation in the sense of our models.

Figure 7 presents the transition of the software population pyramids of the rails project from December 2007 to December 2012. Note that the scales of the x-axis are not the same. In the initial snapshot (2007/12), there are only eight coding contributors, although the project had more than three years of history. Since the project had no data in GitHub at that time, we cannot show the discussion contributors. The project moved to GitHub in August 2008 (see footnote 4). After that, the project first attracted discussion contributors, and later it seems to have successfully attracted coding contributors. Although it is difficult to separate out other factors, the project attracted incomparably more coding contributors after moving to GitHub.

V. DISCUSSION

A. Limitations of Theoretical Models

Bounded rationality. Some readers may question the applicability of game theory to real human environments, because real humans are known not to be rational in general. Nevertheless, we would like to emphasize that our approach based on game theory is quite important for analyzing human behaviors in real life. One of the main reasons is that we need such a mathematical model for several scientific situations, such as estimating market dynamics or simulating crowd behavior in emergency situations. Furthermore, having a theoretical foundation enables us to easily extend the result of the analysis to slightly different situations, such as a new market with slightly modified pricing rules and/or social laws.

4. Rails Moving to Git, https://github.com/blog/32-rails-moving-to-git.

Too much simplification. Another possible criticism of our model is its excessive simplification. We fully agree with that point, but at the same time we believe that simplicity is an advantage rather than a drawback. Indeed, even our simple model reveals some new theoretical findings, as discussed in the previous section. Introducing a more elaborate model is an obviously meaningful future direction.

Future direction. Although we obtained a number of new findings by focusing on SPNE, it would be very interesting to take into account different equilibrium concepts, such as Nash equilibrium and/or sequential equilibrium. We should carefully choose one of them depending on the level of the players' rationality. Another possible direction is to presume that the third party considered in the paper is a mechanism designer, which could determine the market rule. The discussion of mandatory coding is one such example, with a bad mechanism. By doing so, we can naturally handle the participation of more than one player, which enables us to consider several features of real markets, such as competition between players.

Besides these two directions, there are several possibilities for modeling software development situations via game theory and microeconomic theory. For instance, we could investigate ways to achieve sustainable cooperation between developers based on repeated games, incentivize developers to participate based on online/dynamic mechanism design, analyze the population dynamics based on segregation models, and decide on a new software release based on voting theory.

Rails moved to GitHub in August 2008


Summary

To attract and retain coding contributors

• Prepare documents (setup)

• Have paid developers (employment)

• Adopt new technologies/environment (innovation)


Discussions

• Limitations in theoretical analysis

• Bounded rationality. Humans are not rational in general

• Too much simplification

• Threats to validity in empirical analysis

• Limited datasets

• Analysis results may contain errors


Future Directions

• Integrating theory and empirical analysis is a strong approach for

• understanding human behaviors

• designing desirable environments
