An Analysis of Open Source Software Development Using Social Network Theory and Agent-Based Modeling

Greg Madey, Vincent Freeh, Renee Tynan, Chris Hoffman
University of Notre Dame
March 2003

The Second Lake Arrowhead Conference on HUMAN COMPLEX SYSTEMS
Hosted by the UCLA Center for Computational Social Sciences



Collaborators

Greg Madey*, Yongqin Gao, Nadir Kiyanclar, Carlos Siu
Computer Science, University of Notre Dame

Vincent Freeh*
Computer Science, North Carolina State University

Renee Tynan*, Chris Hoffman
Department of Management, University of Notre Dame

*co-PIs: Operations Research, Computer Science, Social Psychology

Goals and Objectives

• Understanding Open Source Software Development
  – NSF Grant, 2002-2005
  – CISE/IIS/Digital Society Technology
• Conceptual Explanatory Model of OSS Phenomenon
• Agent-Based Simulation of OSS
• Large Scale Data Mining: SourceForge and other Archives
• Social Network Analysis with Dynamic Attachment
• Understanding Social & Task Dynamics
• Understand Role of Linchpin Developers
• Frameworks, Data, Models to Support Future Studies

Research Model

[Figure: Research Model. Three components:
• Understanding the Social and Task Dynamics that Predict Developer Behaviors
• Social Network Analysis: Longitudinal Study of Preferential Attachment and Dynamic Attachment
• Conceptual Explanatory Model of OSS: Agent-Based Modeling and Simulation
Diagram labels: Parameter Values, Structural Features, Cross Validation, Combined Data Mining.]

Overview

• Characteristics of Open Source Software (OSS) Development phenomenon
  – Self-organized
  – Decentralized
  – Emergent properties
  – Complex Adaptive Process
• Research
  – Data collection
  – Social Network Models
  – Agent-Based Models
  – Agent-Based Simulation

Open Source Software (OSS)

• Free …
  – to view source
  – to modify
  – to share
  – of cost
• Examples
  – Apache
  – Perl
  – GNU
  – Linux
  – Sendmail
  – Python
  – KDE
  – GNOME
  – Mozilla
  – Thousands more

[Images: Linux; GNU Savannah]

Leaders

• Linus Torvalds: Linux
• Larry Wall: Perl
• Richard Stallman: GNU Manifesto
• Eric Raymond: Cathedral and Bazaar

Open Source Software (OSS)

• Development
  – Mostly volunteer
  – Global teams
  – Virtual teams
  – Self-organized - often peer-based meritocracy
  – Self-managed - but often a “charismatic” leader
  – Often large numbers of developers, testers, support help, end user participation
  – Rapid, frequent releases
  – Mostly unpaid

Open Source Software (OSS): Significance

• Contradicts traditional wisdom:
  – Software engineering
  – Coordination, large numbers
  – Motivation of developers
  – Quality
  – Security
  – Business strategy
• Significant component of e-Commerce infrastructure
• Almost everything is done electronically and available in digital form
• Opportunity for Social Science Research: large amounts of data available
• Research issues:
  – Understanding motives
  – Understanding processes
  – Intellectual property
  – Digital divide
  – Self-organization
  – Government policy
  – Impact on innovation
  – Ethics
  – Economic models
  – Cultural issues
  – International factors

Open Source Software (OSS)

Major Component of e-Technology Infrastructure with major presence in
• e-Commerce
• e-Science
• e-Government
• e-Learning

• Apache has over 62% market share of Internet Web servers
• Linux on over 7 million computers
• Most Internet e-mail runs on Sendmail
• Tens of thousands of quality products
• Part of product offerings of companies like IBM, Apple
  – Apache in WebSphere, Linux on mainframe, FreeBSD in OSX
• Corporate employees participating on OSS projects

Open Source Software (OSS)

• Seems to challenge traditional economic assumptions
• Model for software engineering
• New business strategies
  – Cooperation with competitors
  – Beyond trade associations, shared industry research, and standards processes: shared product development!
• Virtual, self-organizing and self-managing teams
• Social issues, e.g., digital divide, international participation
• Government policy issues, e.g., US software industry, impact on innovation, security, intellectual property

Related Research

• Feller and Fitzgerald (ICIS, 2000)
  – Research framework and analysis of the OSS phenomenon
• Hars and Ou (HICSS 2001)
  – Survey of OSS developers
  – Reported on motivations of developers
• Scacchi (IEE Proceedings - Software, 2002)
  – Study of socio-technical processes associated with OSS development practices
• Wolf, Lakhani, and Bates (BCG/MIT Sloan, 2002)
  – Survey of SourceForge Developers
• Hann, Roberts, Slaughter, and Fielding (ICSE, 2002)
  – Survey of Apache developers - economic incentives
• Madey, Freeh, and Tynan (ICSE 2002, AMCIS 2002)

Collaborative Social Networks

• Small World Phenomenon
• Research papers on joint authorship
  – Newman, 2001
  – Barabasi et al, 2001
  – Erdos number
• Kevin Bacon Game
• Open Source Software development
  – Link detachment

Related Research: Methodology

• Watts and Strogatz (Nature, 1998)
  – Collective Dynamics of Small-World Networks
• Watts (Small Worlds, 1999)
• Pumain and Moriconi-Ebrard (GeoJournal, 1997)
  – Zipf distribution of city sizes
• Huberman and Adamic (Nature, 1999)
  – Growth Dynamics of the World Wide Web, Power Laws
• Axtell (Science, 2001)
  – Zipf Distribution of U.S. Firm Sizes
• Barabasi, et al (xxx.lanl.gov, 2001)
  – Evolution of the Social Network of Scientific Collaborations, Power Law distribution

OSS as a Social Network

• Social Network Theory: Wasserman & Faust (1994), Watts (1999), and Barabasi (2002)
  – Agents are nodes on a graph (developers)
  – Edges are relationships (joint project participation)
  – Growth of network: random or types of preferential attachment, formation of clusters
  – Network attributes: diameter, average degree, power law, clusters

[Figure: OSS Social Network. Diagram labels for general network terms: node or vertex, edge or link, hub, circle of friends or clique, weak tie. Diagram labels for OSS terms: developers, joint project membership, project, linchpin.]

SourceForge
• VA Software
• Part of OSDN
• Started 12/1999
• Collaboration tools
• 58,685 Projects
• 80,000 Developers
• 590,005 Registered Users

Savannah
• Uses SourceForge Software
• Free Software Foundation
• 1,508 Projects
• 15,265 Registered

Data Collection: Monthly

• Web crawler (scripts)
  – Python
  – Perl
  – AWK
  – Sed
• Monthly
  – Since Jan 2001
  – ProjectID
  – DeveloperID
  – Almost 2 million records
  – Relational database

PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev7698
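A minimal Python sketch of how monthly records in the PROJ|DEVELOPER layout above might be loaded into lookup tables. The function name load_memberships and the in-memory sample are assumptions for illustration; the slides do not show the actual crawler or database code.

```python
from collections import defaultdict

# Sample rows in the PROJ|DEVELOPER layout shown on the slide.
SAMPLE = """PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev7698
"""

def load_memberships(text):
    """Return (project -> set of developers, developer -> set of projects)."""
    by_project = defaultdict(set)
    by_developer = defaultdict(set)
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        project, developer = line.split("|")
        by_project[project].add(developer)
        by_developer[developer].add(project)
    return by_project, by_developer

by_project, by_developer = load_memberships(SAMPLE)
for project in sorted(by_project):
    print(project, sorted(by_project[project]))
```

Keeping both directions of the mapping makes it cheap to ask either "who is on this project?" or "which projects is this developer on?", which are the two questions the network analysis needs.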

15850  dev[46]  dev[83]
15850  dev[46]  dev[48]
15850  dev[46]  dev[56]
15850  dev[46]  dev[58]
6882   dev[58]  dev[47]
6882   dev[47]  dev[79]
6882   dev[47]  dev[52]
6882   dev[47]  dev[55]
7028   dev[46]  dev[99]
7028   dev[46]  dev[51]
7028   dev[46]  dev[57]
7597   dev[46]  dev[45]
7597   dev[46]  dev[72]
7597   dev[46]  dev[55]
7597   dev[46]  dev[58]
7597   dev[46]  dev[61]
7597   dev[46]  dev[64]
7597   dev[46]  dev[67]
7597   dev[46]  dev[70]
9859   dev[46]  dev[49]
9859   dev[46]  dev[53]
9859   dev[46]  dev[54]
9859   dev[46]  dev[59]

[Figure: network diagram of developer nodes (dev[45] through dev[99]) grouped into Project 6882, Project 9859, Project 7597, Project 7028, and Project 15850.]

OSS Developer - Social Network
Developers are nodes / Projects are links

• 24 Developers
• 5 Projects
• 2 Linchpin Developers
• 1 Cluster
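The "developers are nodes, projects are links" view can be sketched in Python as follows. The membership data is hypothetical and does not reproduce the 24-developer example. Treating a linchpin as any developer on more than one project is an assumption made here for illustration, since the slides do not define the term precisely.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical membership data (project -> developers), for illustration only.
MEMBERSHIP = {
    "P1": {"a", "b", "c"},
    "P2": {"c", "d"},
    "P3": {"d", "e", "f"},
    "P4": {"g", "h"},
}

def developer_graph(membership):
    """Link two developers whenever they share at least one project."""
    graph = defaultdict(set)
    for devs in membership.values():
        for dev in devs:
            graph[dev]  # make sure solo developers still appear as nodes
        for u, v in combinations(sorted(devs), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph

def linchpins(membership):
    """Developers on more than one project (assumed definition)."""
    counts = defaultdict(int)
    for devs in membership.values():
        for dev in devs:
            counts[dev] += 1
    return {dev for dev, n in counts.items() if n > 1}

def clusters(graph):
    """Connected components found with an iterative depth-first search."""
    seen, components = set(), []
    for start in list(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components

graph = developer_graph(MEMBERSHIP)
print("linchpins:", sorted(linchpins(MEMBERSHIP)))
print("clusters:", len(clusters(graph)))
```

In this toy data the two linchpins are what join three project teams into a single cluster; removing either one would split it.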

Regression: Number of projects that developers are on

# projects   # of developers on that many projects
1            21488
2            3688
3            1086
4            413
5            177
6            76
7            35
8            21
9            9
10           6
11           5
12           6
15           1
16           1
17           1

y = 10.6905 - 3.70892 x

R² = 0.979906

[Figure: plot of Log(# of Developers) against Log(# of Projects) with the fitted regression line.]
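A minimal Python sketch of a log-log least-squares fit on the counts from the regression slide. It assumes natural logarithms and uses every row of the table. The slides do not say which log base or which rows were used, so the fitted coefficients need not match the reported values exactly.

```python
import math

# (number of projects, number of developers on that many projects), from the slide.
COUNTS = [
    (1, 21488), (2, 3688), (3, 1086), (4, 413), (5, 177), (6, 76),
    (7, 35), (8, 21), (9, 9), (10, 6), (11, 5), (12, 6),
    (15, 1), (16, 1), (17, 1),
]

def fit_log_log(pairs):
    """Ordinary least squares of log(y) on log(x); returns (intercept, slope, r2)."""
    xs = [math.log(x) for x, _ in pairs]
    ys = [math.log(y) for _, y in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r2

intercept, slope, r2 = fit_log_log(COUNTS)
print(f"log(developers) = {intercept:.3f} + ({slope:.3f}) * log(projects), R^2 = {r2:.3f}")
```

A straight line on log-log axes corresponds to a power law: the number of developers on k projects falls off roughly as k raised to the fitted slope.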

Scale Free – Power Law (developers)

Empirical Data

Scale Free – Power Law (projects)

Growth Rates

Agent Based Model

• Grow an Artificial SourceForge
• Model as a Collaborative Social Network
• Parameterize model with empirical data
• Guess “hidden” processes, mechanisms and developer behaviors
• Agent Based Simulation
• Verify, validate, and iterate
• Discover how OSS works => Understanding

Agent Based Simulation

• Java Swarm / JDBC / Oracle Relational DB
  – Same database table design, same analysis tools
• Developer class
  – Each simulated developer is an instance of “Developer” with random attributes and behaviors
  – Local decision logic
  – Simulates self-organization
  – Create new projects
  – Join existing projects
  – Abandon a project
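The developer agent described above can be sketched in Python as follows. This is not the authors' Java Swarm code. The World class, the probability ranges, and the step logic are hypothetical stand-ins chosen only to show the create / join / abandon decisions.

```python
import random

class Developer:
    """A simulated developer with randomly drawn behaviour probabilities."""

    def __init__(self, dev_id, rng):
        self.dev_id = dev_id
        self.projects = set()
        # Hypothetical per-agent parameters; the slides do not give values.
        self.p_create = rng.uniform(0.0, 0.1)
        self.p_join = rng.uniform(0.0, 0.5)
        self.p_abandon = rng.uniform(0.0, 0.1)

    def step(self, world, rng):
        """Local decision logic: create, join, or abandon a project."""
        roll = rng.random()
        if roll < self.p_create:
            world.create_project(self)
        elif roll < self.p_create + self.p_join and world.projects:
            world.join(self, rng.choice(sorted(world.projects)))
        elif self.projects and rng.random() < self.p_abandon:
            world.leave(self, rng.choice(sorted(self.projects)))

class World:
    """Holds the project -> developer-ids membership table."""

    def __init__(self):
        self.projects = {}
        self.next_project_id = 0

    def create_project(self, dev):
        pid = self.next_project_id
        self.next_project_id += 1
        self.projects[pid] = set()
        self.join(dev, pid)

    def join(self, dev, pid):
        self.projects[pid].add(dev.dev_id)
        dev.projects.add(pid)

    def leave(self, dev, pid):
        self.projects[pid].discard(dev.dev_id)
        dev.projects.discard(pid)

def simulate(n_developers=200, n_steps=50, seed=1):
    rng = random.Random(seed)
    world = World()
    developers = [Developer(i, rng) for i in range(n_developers)]
    for _ in range(n_steps):
        for dev in developers:
            dev.step(world, rng)
    return world, developers

world, developers = simulate()
print(len(world.projects), "projects after simulation")
```

Because the membership table has the same project/developer shape as the crawled data, the same analysis code could in principle be run on simulated and empirical networks, which is the point of the "same database table design, same analysis tools" bullet.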

Results: New Understanding?

• Random Attachment does not generate Power Law
• Preferential Attachment does, but …
• Bipartite nature of the social network
  – Preferential and random attachment can be implemented independently on developers and projects
  – Both display power laws
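The contrast between random and preferential attachment can be sketched in Python with a toy growth process. It covers only the project side of the bipartite network, and the 0.2 project-creation probability is a hypothetical value, not a parameter from the study.

```python
import random
from collections import Counter

def grow(n_developers, preferential, seed=0):
    """Each new developer either starts a project or joins an existing one."""
    rng = random.Random(seed)
    sizes = [1]  # sizes[i] = number of developers on project i
    for _ in range(n_developers - 1):
        if rng.random() < 0.2:  # hypothetical project-creation probability
            sizes.append(1)
            continue
        if preferential:
            # Probability of joining a project is proportional to its size.
            target = rng.choices(range(len(sizes)), weights=sizes, k=1)[0]
        else:
            # Every project is equally likely.
            target = rng.randrange(len(sizes))
        sizes[target] += 1
    return sizes

for label, flag in (("random", False), ("preferential", True)):
    sizes = grow(5000, preferential=flag)
    histogram = Counter(sizes)
    print(label, "largest project:", max(sizes), "distinct sizes:", len(histogram))
```

Under preferential attachment a few early projects grow far larger than the rest, giving the heavy tail associated with a power law; under random attachment project sizes stay much more even.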

Results: New Understanding?

• “Young upstart” phenomenon not captured with preferential attachment
• Added fitness
  – Random fitness parameter for projects and developers
• Power law still observed in both projects and developers and “young upstart” feature modeled
• Still missing a real feature observed in the real SourceForge …

Results: New Understanding

• Problem
  – Static fitness insufficient
  – Dynamic fitness needed
  – Discovered a life cycle property of the projects
  – Reflects both dynamic attachment and detachment
• Thus, dynamic fitness, with a life cycle
  – Power law preserved!
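A life-cycle fitness can be sketched in Python as a curve that rises, peaks, and then decays with project age. The functional form and the peak_age parameter below are purely hypothetical; the slides say only that fitness is dynamic and follows a life cycle.

```python
import math

def dynamic_fitness(base_fitness, age, peak_age=12.0):
    """Hypothetical life-cycle curve: rises, peaks at peak_age, then decays."""
    if age < 0:
        raise ValueError("age must be non-negative")
    scaled = age / peak_age
    return base_fitness * scaled * math.exp(1.0 - scaled)

for age in (0, 6, 12, 24, 48):
    print(age, round(dynamic_fitness(1.0, age), 3))
```

Using such a value in place of a static fitness makes a project attractive while it is young and progressively less so as it ages, which allows both attachment and detachment to emerge over time. With this particular curve a brand-new project has fitness zero, so it starts with only its creator.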

Results/Limitations/Future Work

• Using social network theory, and agent-based modeling and simulation to gain understanding
• Role of social network theory and agent based modeling:
  – A framework
  – An investigation tool
  – What does it mean if the simulated SourceForge fits the real SourceForge very well?
• Survey instruments to collect additional data on individual project and developer behavior
• More types of data
• More data sources
  – Savannah
  – Large OSS projects

Thank You

Questions?