Ph.D. Dissertation - Studying the Impact of Developer Communication on the Quality and Evolution of...


Upload: nicolas-bettenburg

Post on 10-May-2015


Category:

Software



DESCRIPTION

Software development is a largely collaborative effort, of which the actual encoding of program logic in source code is a relatively small part. Software developers have to collaborate effectively and communicate with their peers in order to avoid coordination problems. To date, little is known about how developer communication during software development activities impacts the quality and evolution of a software system. In this thesis, we present and evaluate tools and techniques to recover communication data from traces of the software development activities. With this data, we study the impact of developer communication on the quality and evolution of the software through an in-depth investigation of the role of developer communication during software development activities. Through multiple case studies on a broad spectrum of open-source software projects, we find that communication between developers stands in a direct relationship to the quality of the software. Our findings demonstrate that our models based on developer communication explain software defects as well as state-of-the-art models that are based on technical information such as code and process metrics, and that social information metrics are orthogonal to these traditional metrics, leading to a more complete and integrated view of software defects. In addition, we find that communication between developers plays an important role in maintaining a healthy contribution management process, which is one of the key factors in the successful evolution of the software. Source code contributors who are part of the community surrounding open-source projects are available only for limited times, and long communication times can lead to the loss of valuable contributions. Our thesis illustrates that software development is an intricate and complex process that is strongly influenced by the social interactions between the stakeholders involved in the development activities.
A traditional view based solely on technical aspects of software development such as source code size and complexity, while valuable, limits our understanding of software development activities. The research presented in this thesis is a first step towards gaining a more holistic view of software development activities.

TRANSCRIPT

  • THE IMPACT OF DEVELOPER COMMUNICATION ON THE QUALITY AND EVOLUTION OF A SOFTWARE SYSTEM. Nicolas Bettenburg, May 20th, 2014. Queen's University, School of Computing. A thesis presented to the School of Computing in conformity with the requirements for the degree of Philosophical Doctor.

PRESENTATION OUTLINE: Studying the Impact of Developer Communication on the Quality and Evolution of a Software System
  • I Motivation
  • II Literature Review
  • III Tools
  • IV Analysis & Results
  • V Conclusions

I INTRODUCTION AND MOTIVATION

RESEARCH MOTIVATION
  • Software Development is a highly social, collaborative process. [Cataldo et al., CSCW 2006]
  • Developers spend up to 50% of their time communicating. [Perry et al., IEEE Software 1994]
  • If communication plays such an integral part in developers' work lives, does it affect software development? How?

RESEARCH MOTIVATION
  • Where do we find communication data?
  • How can we mine communication data?
  • How can we put communication data in a relationship to software quality?
  • How can we put communication data in a relationship to software evolution?

RESEARCH HYPOTHESIS
  • The communication between software developers plays a key role in the quality and evolution of the software.
  • Literature Review: What is already known. How that knowledge was gained.
  • Tools & Techniques: Mining Communication Data. Preparation for use in Experiments.
  • Software Quality: Internal Developers Communication. Discussing Bugs in the Software.
  • Software Evolution: Communication between Developers and External Contributors. Discussing Source Code Contributions.

II LITERATURE REVIEW

LITERATURE REVIEW
  • GOAL: Establish what is already known within the field of study about the relationships between socio-technical information about the software, the software development process, and the stakeholders, and software quality.
  • To determine what has already been investigated
  • To gain a better understanding of the extraction of socio-technical information
  • To identify challenges and open problems

LITERATURE REVIEW DIMENSIONS
  • Socio-Technical Metrics
  • Network Metrics
  • Technical Metrics
  • Social Metrics
  • Data Sources and Extraction Methodology
  • Quality Metrics
  • Previous Results
LITERATURE REVIEW MAJOR FINDINGS
  • Social and Socio-Technical information can explain quality aspects (software vulnerabilities) that technical information on its own cannot.
  • Combinations of Social and Technical Information yield better Models than using each source of information on its own.
  • Socio-Technical Networks Together with Congruence Measures provide empirical evidence that Conway's Law does exist.

III TOOLS AND TECHNIQUES

TOOLS AND TECHNIQUES
  • REALIZATION: Off-The-Shelf Tools from Information Retrieval and Natural Language Processing are not readily usable for mining communication data found in software repositories.
  • Emails and Chat messages are not newspaper articles!
  • Unstructured data = mixtures of Natural Language text and Technical Information
  • Automated processing of extracted communication data

MINING UNSTRUCTURED DATA

    From [email protected] Wed Jan 21 08:11:26 1998
    Date: Mon, 27 Jan 1997 12:50:44 -0500 (EST)
    From: "Brian E. Gallew"
    Subject: Re: [HACKERS] configure

    - ---559023410-851401618-854387445=:824
    Content-Type: TEXT/PLAIN; CHARSET=US-ASCII

    > If you can grab a copy and run it on your machine, and send me
    > the output, that would help alot.

    Here is a gzip'ed tar of the results.

    =====================================================================
    | Please do not shoot at the thermonuclear weapons! -- Deacon |
    =====================================================================
    | Finger [email protected] for my public key. |
    =====================================================================

    - ---559023410-851401618-854387445=:824
    Content-Type: APPLICATION/x-gzip
    Content-Transfer-Encoding: BASE64
    Content-Description: m88k-dg-dgux5.4R3.10.tar.gz

    H4sIAHDq7DICA+xba3vaSLLOV/MrepzsGHiQuNomOJ4MdnDMrC8csB17HQ8W
    UgM9FpJWFxsmyX8/Vd0tIYGwye5kP+w5fp4EaHW9XV1dXbdu6bY1ZCNV1/Qx
    ffWD/sql0k6tRl4RUq6IT1KWn4R/L5UI2ansVirVSrlWhpZqqVZ5RUqv/gN/
    gedrLiGvHNvzRy71VvUbUYu6mvnqv+zvNbkYM48MmUkJfGrEG1PTJJ7uMscn
    /ljzCdcND75TAvIJTN8j9pDoXHECl2ZeE5960OgGFrEt6Ac43szz6YR4NpLN
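The message on the "Mining Unstructured Data" slide mixes a quoted request, a short reply, a signature, and a base64-encoded attachment. The following is a minimal sketch of how such a message might be split into its natural-language and binary parts using only Python's standard `email` package. It is not the tooling described in the thesis; the function names, the sample address, and the attachment bytes are assumptions made for illustration.

```python
from email import message_from_string, policy
from email.message import EmailMessage


def build_sample_message() -> str:
    """Build a small multipart message resembling the slide (contents are made up)."""
    msg = EmailMessage()
    msg["From"] = "developer@example.org"  # hypothetical address
    msg["Subject"] = "Re: [HACKERS] configure"
    msg.set_content(
        "> If you can grab a copy and run it on your machine, and send me\n"
        "> the output, that would help alot.\n"
        "\n"
        "Here is a gzip'ed tar of the results.\n"
    )
    # Placeholder bytes standing in for the real tar.gz attachment.
    msg.add_attachment(
        b"\x1f\x8b\x08\x00placeholder",
        maintype="application",
        subtype="x-gzip",
        filename="results.tar.gz",
    )
    return msg.as_string()


def split_parts(raw: str):
    """Separate readable text bodies from attachments in a raw message."""
    msg = message_from_string(raw, policy=policy.default)
    texts, attachments = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.get_content_maintype() == "text" and not part.is_attachment():
            texts.append(part.get_content())
        else:
            payload = part.get_payload(decode=True) or b""
            attachments.append(
                (part.get_filename(), part.get_content_type(), len(payload))
            )
    return texts, attachments


def split_quoted(text: str):
    """Split a text body into quoted lines (starting with '>') and new lines."""
    quoted, fresh = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        (quoted if stripped.startswith(">") else fresh).append(stripped)
    return quoted, fresh


if __name__ == "__main__":
    bodies, files = split_parts(build_sample_message())
    for body in bodies:
        print(split_quoted(body))
    print(files)
```

Treating every line that starts with `>` as quoted text is a deliberately simple heuristic; real mailing-list archives also contain signatures, inline patches, and stack traces that need further handling.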
EXTRACTING TECHNICAL INFORMATION
  • [Figure with three numbered steps: 1, 2, 3]

LINKING COMMUNICATION DATA TO CODE

    Markus Keller 2004-11-30 13:27:58 EST
    I20041130-0800
    Wrong compiler error when interface overrides two methods with same signature but different thrown exceptions:
    The call to ij.m() is OK, but eclipse flags it with "Unhandled exception type IOException".

    public class Over {
        void x() throws ZipException {
            IandJ ij= new K();
            ij.m(); //wrong compile error
        }
        void y() throws ZipException {
            K k= new K();
            k.m();
        }
    }
    interface I { void m() throws IOException; }
    interface J { void m() throws ZipException; }
    interface IandJ extends I, J {} // swap I and J to make compile error disappear
    class K implements IandJ {
        public void m() throws ZipException { }
    }

    Kent Johnson 2004-12-01 14:14:02 EST
    This is not a MethodVerifier problem. This error is thrown from FlowContext.checkExceptionHandlers()

  • Linked source files: X.java, Y.java, Over.java, ZipException.java, IOException.java, I.java, J.java, K.java
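The bug report above mentions type names such as MethodVerifier and FlowContext in free text, which is what makes a link from discussion to source files possible. The sketch below shows one simple name-matching heuristic for that step. It is an illustration only, not one of the linking approaches from the thesis; the regular expressions, the function names, and the file paths are assumptions.

```python
import re
from pathlib import PurePosixPath

# A capitalised identifier, e.g. "FlowContext" or "IOException".
TOKEN = re.compile(r"\b[A-Z][A-Za-z0-9]+\b")
# A qualified call such as "FlowContext.checkExceptionHandlers(".
METHOD_CALL = re.compile(r"\b([A-Z][A-Za-z0-9]*)\.([a-z][A-Za-z0-9]*)\(")


def looks_like_type(token: str) -> bool:
    """Heuristic: at least two capitals and one lowercase letter (CamelCase-like)."""
    return sum(c.isupper() for c in token) >= 2 and any(c.islower() for c in token)


def candidate_types(text: str) -> set:
    """Collect identifiers in free text that are likely Java type names."""
    names = {tok for tok in TOKEN.findall(text) if looks_like_type(tok)}
    names.update(cls for cls, _method in METHOD_CALL.findall(text))
    return names


def link_to_files(text: str, known_files: list) -> list:
    """Return the known source files whose base name matches a mentioned type."""
    names = candidate_types(text)
    return sorted(f for f in known_files if PurePosixPath(f).stem in names)


if __name__ == "__main__":
    comment = (
        "This is not a MethodVerifier problem. This error is thrown from "
        "FlowContext.checkExceptionHandlers()"
    )
    # Hypothetical repository paths, used only for this example.
    files = ["src/FlowContext.java", "src/MethodVerifier.java", "src/Over.java"]
    print(link_to_files(comment, files))
```

Matching on base names alone produces false links when several packages contain a class of the same name, so a practical tool would also need package or import information to disambiguate.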
TOOLS AND TECHNIQUES SUMMARY
  • Data Mining: Tools and Techniques for mining communication data from software repositories
  • Data Extraction: Tools and Techniques for extracting technical information data from unstructured data
  • Linking: Different conceptual approaches for linking communication data to source code
  • MUD Workshop: Advancing the state of the art of mining unstructured data for software engineering research

IV ANALYSIS AND RESULTS

PART 1: The Relationships Between Communication And SOFTWARE QUALITY

EXPERIMENTAL SETUP OVERVIEW
  • [Diagram labels: DISCUSSION, TOOLS, METRICS, MODEL, PREDICTIONS, VARIABLES, UNDERSTANDING]

SOCIO-TECHNICAL METRICS OVERVIEW: Socio-Technical DIMENSIONS
  • Discussion Contents: Source Code, Patches, Stack Traces
  • Social Structures: # Discussion Participants, Role in Project, Reputation, SNA Centrality
  • Communication Dynamics: # Message, Reply Time, Length, Subscribers
  • Workflow & Coordination: (Re-)Assignments in Lifecycle Model
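Several of the metrics named on the socio-technical metrics slide (number of messages, number of discussion participants, reply time, message length) can be computed directly from a list of timestamped messages. The sketch below is one possible way to do so and is not taken from the thesis: the `Message` type, the metric names, and the choice to measure length in words are assumptions. The two timestamps reuse the comment times from the bug report shown earlier in the deck.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass(frozen=True)
class Message:
    author: str
    sent: datetime
    body: str


def discussion_metrics(messages):
    """Compute simple communication metrics for one discussion thread."""
    ordered = sorted(messages, key=lambda m: m.sent)
    # Reply time is approximated as the gap between consecutive messages.
    gaps = [(b.sent - a.sent).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return {
        "num_messages": len(ordered),
        "num_participants": len({m.author for m in ordered}),
        "mean_reply_seconds": mean(gaps) if gaps else None,
        "mean_length_words": mean(len(m.body.split()) for m in ordered) if ordered else None,
    }


if __name__ == "__main__":
    thread = [
        Message("Markus Keller", datetime(2004, 11, 30, 13, 27, 58),
                "Wrong compiler error when interface overrides two methods"),
        Message("Kent Johnson", datetime(2004, 12, 1, 14, 14, 2),
                "This is not a MethodVerifier problem."),
    ]
    print(discussion_metrics(thread))
```

Measuring the gap between consecutive messages ignores who replies to whom; a threaded archive would allow reply time to be measured per parent message instead.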
MAIN FINDINGS
  • We can use Statistical Models for Prediction AND Understanding
  • Communication Metrics can explain Software Defects as well as traditional product and process metrics.
  • Communication Metrics do NOT describe the same information as traditional product and process metrics.
  • Combinations of communication metrics with traditional metrics result in more powerful models.

PART 2: The Relationships Between Communication And SOFTWARE EVOLUTION

EXPERIMENTAL SETUP
  • [Diagram labels: MAILING LIST (LINUX), TOOLS, METRICS, MODELS, VARIABLES, GERRIT (ANDROID), MANUAL ANALYSIS, LITERATURE & DOCUMENTATION, CONTRIBUTION MANAGEMENT MODEL]

MAIN FINDINGS
  • External Contributors are available for communication only for a limited time.
  • Communication problems can lead to wasting efforts in implementing undesired or already existing functionality.
  • Delays in communicating review outcome are a key factor for abandoning desirable code contributions.

V CONCLUSIONS AND FUTURE WORK
CONCLUSIONS SUMMARY
  • Software Development has a SOCIAL SIDE, too!
  • Socio-Technical Information is valuable and can help us gain a more holistic understanding of software quality.
  • Communication delays found to be a strong explainer for software defects.
  • Communication delays found to be a key factor for abandoning code contributions.

THE ROAD AHEAD
  • What other Socio-Technical Metrics exist? How can we get them?
  • Measuring, Predicting, and Understanding not Enough! How can we make social metrics actionable?
  • Looking at Repositories is not enough!
  • FUN STUDENT PROJECTS: Go to Google, use Google Glass to record and document water cooler chat. Go to Mozilla and IBM, corroborate findings with the actual developers. Use voice recognition software to mine face-to-face communication data.