An Analysis of Open Source Software Development Using Social Network Theory and Agent-Based Modeling
Greg Madey
Vincent Freeh
Renee Tynan
Chris Hoffman
University of Notre Dame
March 2003
The Second Lake Arrowhead Conference on Human Complex Systems
Hosted by the UCLA Center for Computational Social Sciences
Collaborators
• Greg Madey*, Yongqin Gao, Nadir Kiyanclar, Carlos Siu – Computer Science, University of Notre Dame
• Vincent Freeh* – Computer Science, North Carolina State University
• Renee Tynan*, Chris Hoffman – Department of Management, University of Notre Dame
*co-PIs – Operations Research, Computer Science, Social Psychology
Goals and Objectives
• Understanding Open Source Software Development
  – NSF Grant, 2002-2005
  – CISE/IIS/Digital Society Technology
• Conceptual Explanatory Model of OSS Phenomenon
• Agent-Based Simulation of OSS
• Large Scale Data Mining: SourceForge and other Archives
• Social Network Analysis with Dynamic Attachment
• Understanding Social & Task Dynamics
• Understand Role of Linchpin Developers
• Frameworks, Data, Models to Support Future Studies
Research Model
[Diagram: three linked research components]
• Understanding the Social and Task Dynamics that Predict Developer Behaviors
• Social Network Analysis: Longitudinal Study of Preferential Attachment and Dynamic Attachment
• Conceptual Explanatory Model of OSS: Agent-Based Modeling and Simulation
[Connector labels: Parameter Values, Structural Features, Cross Validation, Combined Data Mining]
Overview
• Characteristics of the Open Source Software (OSS) development phenomenon
  – Self-organized
  – Decentralized
  – Emergent properties
  – Complex Adaptive Process
• Research
  – Data collection
  – Social Network Models
  – Agent-Based Models
  – Agent-Based Simulation
Open Source Software (OSS)
• Free …
  – to view source
  – to modify
  – to share
  – of cost
• Examples
  – Apache
  – Perl
  – GNU
  – Linux
  – Sendmail
  – Python
  – KDE
  – GNOME
  – Mozilla
  – Thousands more
[Images: Linux, GNU, Savannah]
Leaders
• Linus Torvalds – Linux
• Larry Wall – Perl
• Richard Stallman – GNU Manifesto
• Eric Raymond – Cathedral and Bazaar
Open Source Software (OSS)
• Development
  – Mostly volunteer
  – Global teams
  – Virtual teams
  – Self-organized, often a peer-based meritocracy
  – Self-managed, but often with a "charismatic" leader
  – Often large numbers of developers, testers, support help, end user participation
  – Rapid, frequent releases
  – Mostly unpaid
Open Source Software (OSS): Significance
• Contradicts traditional wisdom:
  – Software engineering
  – Coordination, large numbers
  – Motivation of developers
  – Quality
  – Security
  – Business strategy
• Significant component of e-Commerce infrastructure
• Almost everything is done electronically and available in digital form
• Opportunity for Social Science Research: large amounts of data available
• Research issues:
  – Understanding motives
  – Understanding processes
  – Intellectual property
  – Digital divide
  – Self-organization
  – Government policy
  – Impact on innovation
  – Ethics
  – Economic models
  – Cultural issues
  – International factors
Open Source Software (OSS)
Major component of e-Technology infrastructure with major presence in
• e-Commerce
• e-Science
• e-Government
• e-Learning
Evidence
• Apache has over 62% market share of Internet Web servers
• Linux on over 7 million computers
• Most Internet e-mail runs on Sendmail
• Tens of thousands of quality products
• Part of product offerings of companies like IBM, Apple
  – Apache in WebSphere, Linux on mainframe, FreeBSD in OSX
• Corporate employees participating on OSS projects
Open Source Software (OSS)
• Seems to challenge traditional economic assumptions
• Model for software engineering
• New business strategies
  – Cooperation with competitors
  – Beyond trade associations, shared industry research, and standards processes: shared product development!
• Virtual, self-organizing and self-managing teams
• Social issues, e.g., digital divide, international participation
• Government policy issues, e.g., US software industry, impact on innovation, security, intellectual property
Related Research
• Feller and Fitzgerald (ICIS, 2000)
  – Research framework and analysis of the OSS phenomenon
• Hars and Ou (HICSS, 2001)
  – Survey of OSS developers
  – Reported on motivations of developers
• Scacchi (IEE Proceedings - Software, 2002)
  – Study of socio-technical processes associated with OSS development practices
• Wolf, Lakhani, and Bates (BCG/MIT Sloan, 2002)
  – Survey of SourceForge developers
• Hann, Roberts, Slaughter, and Fielding (ICSE, 2002)
  – Survey of Apache developers: economic incentives
• Madey, Freeh, and Tynan (ICSE 2002, AMCIS 2002)
Collaborative Social Networks
• Small World Phenomenon
• Research papers on joint authorship
  – Newman, 2001
  – Barabasi et al, 2001
  – Erdos number
• Kevin Bacon Game
• Open Source Software development
  – Link detachment
Related Research: Methodology
• Watts and Strogatz (Nature, 1998)
  – Collective Dynamics of Small-World Networks
• Watts (Small Worlds, 1999)
• Pumain and Moriconi-Ebrard (GeoJournal, 1997)
  – Zipf distribution of city sizes
• Huberman and Adamic (Nature, 1999)
  – Growth Dynamics of the World Wide Web, Power Laws
• Axtell (Science, 2001)
  – Zipf Distribution of U.S. Firm Sizes
• Barabasi, et al (xxx.lanl.gov, 2001)
  – Evolution of the Social Network of Scientific Collaborations, Power Law distribution
OSS as a Social Network
• Social Network Theory: Wasserman & Faust (1994), Watts (1999), and Barabasi (2002)
  – Agents are nodes on a graph (developers)
  – Edges are relationships (joint project participation)
  – Growth of network: random or types of preferential attachment, formation of clusters
  – Network attributes: diameter, average degree, power law, clusters
[Figure: OSS Social Network]
• Network terms labeled in the figure: node or vertex, edge or link, hub, circle of friends or clique, weak tie
• OSS terms labeled in the figure: developers, joint project membership, project, linchpin
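Two of the network attributes named above, average degree and diameter, can be computed with a breadth-first search. The following is a minimal Python sketch on a hypothetical five-node toy graph; the graph, the node names, and the function names are illustrative assumptions and do not come from the SourceForge data.

```python
from collections import deque

# Hypothetical toy network: keys are developers, values are the developers
# they share at least one project with (undirected edges).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

def average_degree(g):
    """Mean number of neighbours per node."""
    return sum(len(nbrs) for nbrs in g.values()) / len(g)

def bfs_distances(g, source):
    """Shortest hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in g[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def diameter(g):
    """Longest shortest path, assuming the graph is connected."""
    return max(max(bfs_distances(g, n).values()) for n in g)
```

Running a search from every node is quadratic in the worst case, so on a network the size of SourceForge a sampled estimate would likely be needed instead.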
SourceForge
• VA Software
• Part of OSDN
• Started 12/1999
• Collaboration tools
• 58,685 Projects
• 80,000 Developers
• 590,005 Registered Users
Data Collection: Monthly
• Web crawler (scripts)
  – Python
  – Perl
  – AWK
  – Sed
• Monthly
  – Since Jan 2001
  – ProjectID
  – DeveloperID
  – Almost 2 million records
  – Relational database
PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev7698
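Records in this PROJ|DEVELOPER layout can be turned into a developer network by linking every pair of developers who share a project. The sketch below is a minimal Python illustration using the ten sample rows from the slide; the function names are hypothetical and this is not the authors' crawler or database code.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows copied from the slide, in the same PROJ|DEVELOPER layout.
RAW = """PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev7698"""

def load_memberships(text):
    """Map each project ID to the set of developer IDs listed for it."""
    members = defaultdict(set)
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        project, developer = line.split("|")
        members[project].add(developer)
    return members

def developer_edges(members):
    """One undirected edge per pair of developers sharing a project."""
    edges = set()
    for devs in members.values():
        for a, b in combinations(sorted(devs), 2):
            edges.add((a, b))
    return edges

members = load_memberships(RAW)
edges = developer_edges(members)
```

Sorting each project's developers before pairing keeps every edge in a single canonical order, so a pair that shares two projects is still stored only once.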
PROJECT  DEVELOPER  DEVELOPER
15850    dev[46]    dev[83]
15850    dev[46]    dev[48]
15850    dev[46]    dev[56]
15850    dev[46]    dev[58]
6882     dev[58]    dev[47]
6882     dev[47]    dev[79]
6882     dev[47]    dev[52]
6882     dev[47]    dev[55]
7028     dev[46]    dev[99]
7028     dev[46]    dev[51]
7028     dev[46]    dev[57]
7597     dev[46]    dev[45]
7597     dev[46]    dev[72]
7597     dev[46]    dev[55]
7597     dev[46]    dev[58]
7597     dev[46]    dev[61]
7597     dev[46]    dev[64]
7597     dev[46]    dev[67]
7597     dev[46]    dev[70]
9859     dev[46]    dev[49]
9859     dev[46]    dev[53]
9859     dev[46]    dev[54]
9859     dev[46]    dev[59]
[Figure: network diagram of the developers listed above, grouped by Project 6882, Project 9859, Project 7597, Project 7028, and Project 15850]
OSS Developer - Social Network
Developers are nodes / Projects are links
• 24 Developers
• 5 Projects
• 2 Linchpin Developers
• 1 Cluster
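Counts like these can be derived from project membership data. The sketch below is a minimal Python illustration on hypothetical memberships, not the slide's data. It assumes a working definition the slides do not state: a linchpin is a developer who belongs to more than one project, and a cluster is a connected component of the developer network.

```python
from collections import defaultdict

# Hypothetical memberships (not the slide's data): project -> developers.
memberships = {
    "p1": {"d1", "d2", "d3"},
    "p2": {"d3", "d4"},
    "p3": {"d4", "d5", "d6"},
    "p4": {"d7", "d8"},
}

def linchpins(memberships):
    """Developers on more than one project (an assumed working definition)."""
    count = defaultdict(int)
    for devs in memberships.values():
        for dev in devs:
            count[dev] += 1
    return {dev for dev, n in count.items() if n > 1}

def clusters(memberships):
    """Group developers into connected components via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for devs in memberships.values():
        devs = sorted(devs)
        if not devs:
            continue
        find(devs[0])  # register developers on one-person projects too
        for other in devs[1:]:
            parent[find(other)] = find(devs[0])
    groups = defaultdict(set)
    for dev in parent:
        groups[find(dev)].add(dev)
    return list(groups.values())
```

In this toy data, d3 and d4 tie three projects into one cluster of six developers, while p4 forms a separate cluster of two.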
Regression: Number of projects that developers are on
# projects   # of developers on that many projects
1            21488
2            3688
3            1086
4            413
5            177
6            76
7            35
8            21
9            9
10           6
11           5
12           6
15           1
16           1
17           1
y = 10.6905 - 3.70892 x
R^2 = 0.979906
[Plot: Log(# of Developers) versus Log(# of Projects)]
Scale Free – Power Law (developers)
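A straight-line fit on log-log axes is the usual first check for a power law. The sketch below is a minimal Python version of that regression using the counts from the slide. It assumes natural logarithms and all 15 rows, neither of which the slide states, so it should land near the reported slope without necessarily reproducing the exact coefficients.

```python
import math

# (# projects, # developers on that many projects), as listed on the slide.
COUNTS = [
    (1, 21488), (2, 3688), (3, 1086), (4, 413), (5, 177),
    (6, 76), (7, 35), (8, 21), (9, 9), (10, 6),
    (11, 5), (12, 6), (15, 1), (16, 1), (17, 1),
]

def fit_loglog(counts):
    """Ordinary least squares of ln(developers) on ln(projects)."""
    xs = [math.log(k) for k, _ in counts]
    ys = [math.log(n) for _, n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

slope, intercept, r_squared = fit_loglog(COUNTS)
```

The slope is the estimated power-law exponent. A least-squares fit on binned counts is a rough diagnostic only, since the sparse tail rows (15, 16, 17 projects) carry as much weight as the head of the distribution.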
Agent Based Model
• Grow an Artificial SourceForge
• Model as a Collaborative Social Network
• Parameterize model with empirical data
• Guess "hidden" processes, mechanisms and developer behaviors
• Agent Based Simulation
• Verify, validate, and iterate
• Discover how OSS works => Understanding
Agent Based Simulation
• Java Swarm / JDBC / Oracle Relational DB
  – Same database table design, same analysis tools
• Developer class
  – Each simulated developer is an instance of "Developer" with random attributes and behaviors
  – Local decision logic
  – Simulates self-organization
  – Create new projects
  – Join existing projects
  – Abandon a project
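The slides describe a Java Swarm implementation. The sketch below shows the same idea in Python for illustration only: each simulated developer is an object with a random attribute and local logic to create, join, or abandon a project. The World class, the activity attribute, and the probabilities p_create and p_join are hypothetical choices, not values from the authors' model.

```python
import random

class Developer:
    """A simulated developer with a randomly drawn activity level."""

    def __init__(self, dev_id, rng):
        self.dev_id = dev_id
        self.rng = rng
        self.projects = set()
        # Hypothetical attribute: chance of acting at all in a given step.
        self.activity = rng.uniform(0.1, 1.0)

    def step(self, world):
        """Local decision: create, join, or abandon a project."""
        if self.rng.random() > self.activity:
            return
        roll = self.rng.random()
        if roll < world.p_create or not world.projects:
            project = world.new_project()
            world.projects[project].add(self.dev_id)
            self.projects.add(project)
        elif roll < world.p_create + world.p_join:
            project = self.rng.choice(sorted(world.projects))
            world.projects[project].add(self.dev_id)
            self.projects.add(project)
        elif self.projects:
            project = self.rng.choice(sorted(self.projects))
            world.projects[project].discard(self.dev_id)
            self.projects.discard(project)

class World:
    """Holds all projects; the probabilities are illustrative guesses."""

    def __init__(self, p_create=0.1, p_join=0.6):
        self.p_create = p_create
        self.p_join = p_join
        self.projects = {}   # project id -> set of developer ids
        self._next_id = 0

    def new_project(self):
        self._next_id += 1
        self.projects[self._next_id] = set()
        return self._next_id

def simulate(n_developers=50, n_steps=20, seed=0):
    """Run every developer's local decision logic for n_steps rounds."""
    rng = random.Random(seed)
    world = World()
    developers = [Developer(i, rng) for i in range(n_developers)]
    for _ in range(n_steps):
        for dev in developers:
            dev.step(world)
    return world, developers
```

No agent sees global state beyond the list of projects, so any structure in the resulting membership table emerges from local decisions. Here projects are joined uniformly at random, which is only the simplest possible join rule.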
Results: New Understanding?
• Random Attachment does not generate Power Law
• Preferential Attachment does, but …
• Bipartite nature of the social network
  – Preferential and random attachment can be implemented independently on developers and projects
  – Both display power laws
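The contrast between random and preferential attachment can be illustrated with a small growth model. The Python sketch below covers only the project side of the bipartite network and uses hypothetical parameters (p_new, the seed, and 5000 arrivals); it is not the authors' simulation.

```python
import random

def grow_projects(n_developers, preferential, p_new=0.2, seed=1):
    """Return project sizes after n_developers arrivals.

    Each arriving developer starts a new project with probability p_new,
    otherwise joins an existing one chosen uniformly (random attachment)
    or with probability proportional to its size (preferential attachment).
    """
    rng = random.Random(seed)
    sizes = [1]   # sizes[i] = number of developers on project i
    slots = [0]   # one entry per membership; sampling it is size-weighted
    for _ in range(n_developers):
        if rng.random() < p_new:
            sizes.append(1)
            slots.append(len(sizes) - 1)
        else:
            if preferential:
                project = rng.choice(slots)
            else:
                project = rng.randrange(len(sizes))
            sizes[project] += 1
            slots.append(project)
    return sizes

random_sizes = grow_projects(5000, preferential=False)
pref_sizes = grow_projects(5000, preferential=True)
```

Comparing max(pref_sizes) with max(random_sizes) shows the much heavier tail that preferential attachment produces. Applying the same rule to the number of projects per developer would cover the other side of the network.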
Results: New Understanding?
• "Young upstart" phenomenon not captured with preferential attachment
• Added fitness
  – Random fitness parameter for projects and developers
• Power law still observed in both projects and developers, and "young upstart" feature modeled
• Still missing a real feature observed in the real SourceForge …
Results: New Understanding
• Problem
  – Static fitness insufficient
  – Dynamic fitness needed
  – Discovered a life cycle property of the projects
  – Reflects both dynamic attachment and detachment
• Thus, dynamic fitness, with a life cycle
• Power law preserved!
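One way to picture dynamic fitness with a life cycle is to let a project's attractiveness rise and then decay with age, and to weight attachment by size times current fitness. The slides do not give a functional form, so the curve below, the peak_age parameter, and the project dictionary fields are all hypothetical. Detachment is not modeled in this minimal Python sketch.

```python
import math
import random

def life_cycle_fitness(base_fitness, age, peak_age=6.0):
    """Hypothetical life-cycle curve: rises, peaks at peak_age, then decays."""
    if age <= 0:
        return 0.0
    x = age / peak_age
    return base_fitness * x * math.exp(1.0 - x)

def choose_project(projects, now, rng):
    """Pick a project with probability proportional to size * fitness(age)."""
    weights = [
        p["size"] * life_cycle_fitness(p["fitness"], now - p["born"])
        for p in projects
    ]
    return rng.choices(projects, weights=weights, k=1)[0]

# Two equally sized projects: one old, one at its assumed peak age.
projects = [
    {"size": 10, "fitness": 1.0, "born": 0},
    {"size": 10, "fitness": 1.0, "born": 94},
]
rng = random.Random(0)
picks = [choose_project(projects, 100, rng)["born"] for _ in range(100)]
```

With static fitness the two projects would attract developers equally. With the age-dependent curve the younger project wins almost every draw, which is the kind of "young upstart" behavior that a time-varying fitness is meant to allow.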
Results/Limitations/Future Work
• Using social network theory and agent-based modeling and simulation to gain understanding
• Role of social network theory and agent-based modeling:
  – A framework
  – An investigation tool
  – What does it mean if the simulated SourceForge fits the real SourceForge really well?
• Survey instruments to collect additional data on individual project and developer behavior
• More types of data
• More data sources
  – Savannah
  – Large OSS projects