Empirical Software Engineering at Microsoft: Transitioning Research into Practice. Christian Bird, Empirical Software Engineering Group, Microsoft Research, Redmond. 1

A little about me... Computer science undergrad. Worked for a large tech company writing software. Went back to school because there has to be a more principled way. Decisions and policies based on intuition and anecdotal evidence: "Complex code has more severe bugs!" "Just add more people." "Correlation => causation." "XML will solve all problems." "No bugs if 100% statement coverage." 2

Developing Software is Expensive and Time Consuming: Many projects run over budget. Most ship late. And yet more money is spent on maintenance after release than before (80%). I don't think it's because developers are inept. "The F-35 mission systems software development and test is tending towards familiar historical patterns of extended development and deferrals to later increments." 3

Software Engineering Goals: Improve quality. Increase productivity. 4

Cholera Outbreak of 1854: The outbreak began in Soho, London, on August 31, 1854. Many false ideas: miasma, divine intervention, "it just happens." The government concluded it could do nothing. 5

Dr. Snow's Cholera Investigation: Had hypotheses about the spread of cholera. Interviewed families. Collected geographic data. Observed the community. 6

The Beginning of a Scientific Field: The Broad Street pump. The cholera outbreak stopped. Considered the beginning of epidemiology. 7

How do we make projects work better more often? The empirical method: gather data, examine relationships, make changes and build tools. 8

Does anyone care? They do if we ask the right questions! Improve processes. Target resources. Improve quality and productivity. $300 billion market. 9

Social Dynamics in Programming: "Design and programming are human activities; forget that and all is lost." - Bjarne Stroustrup, The C++ Programming Language. 10

Results in this talk: 1. How do ownership and expertise affect software quality? 2. How do we determine who should be coordinating work? 11

Distributed development is difficult, but possible to do with little effect on post-release failures. 12

Don't Touch My Code! Examining the Effects of Ownership on Software Quality. Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, Premkumar Devanbu. 13

Dealing with Large Systems: Divide the system into modules and interfaces. Assign modules to teams/developers. This leads to strong ownership practices in commercial contexts. 14

Is this a good thing? "Thin spread of application domain knowledge" is a big problem (Curtis et al. [CACM 1988]). Expertise is related to code contributions (Mockus [ICSE 2002], McDonald [CSCW 2000]). Authorship improves code understanding (Fritz et al. [ICSE 2010]). Repeated exposure has positive outcomes in other disciplines (Darr [Management Science, 1995]). 15

Ownership & Expertise: Can we quantify the effect of component ownership on defects? What is the effect of many contributions from people with low expertise? 16

Ownership Terms (computed per component): Major contributor: a developer who has made at least 5% of the total commits. Minor contributor: a developer who has made less than 5% of the total commits. Ownership: the proportion of commits made by the highest contributing developer. 17
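
The definitions above can be computed directly from a component's commit history. A minimal sketch, assuming the input is just a list with one author name per commit to a single component; the function name `ownership_metrics` and its dictionary keys are illustrative, not the study's tooling. "Total" is taken to be the number of distinct contributors, as in the correlation and regression tables.

```python
from collections import Counter

MAJOR_THRESHOLD = 0.05  # a developer with >= 5% of a component's commits is a major contributor


def ownership_metrics(commit_authors):
    """Per-component ownership metrics from a list of commit authors (one entry per commit)."""
    counts = Counter(commit_authors)
    total_commits = sum(counts.values())
    if total_commits == 0:
        raise ValueError("component has no commits")
    shares = [n / total_commits for n in counts.values()]
    return {
        "total": len(counts),  # distinct contributors
        "minor": sum(1 for s in shares if s < MAJOR_THRESHOLD),
        "major": sum(1 for s in shares if s >= MAJOR_THRESHOLD),
        "ownership": max(shares),  # share of the highest contributing developer
    }


# Illustrative data only: 100 commits to one component.
authors = ["alice"] * 80 + ["bob"] * 16 + ["carol"] * 3 + ["dave"]
print(ownership_metrics(authors))
# {'total': 4, 'minor': 2, 'major': 2, 'ownership': 0.8}
```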

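Failure Correlation Analysis:
Category | Metric | Vista pre-release | Vista post-release | Windows 7 pre-release | Windows 7 post-release
Ownership metrics | Total | 0.84 | 0.70 | 0.92 | 0.24
Ownership metrics | Minor | 0.86 | 0.70 | 0.93 | 0.25
Ownership metrics | Major | 0.26 | 0.29 | -0.40 | -0.14
Ownership metrics | Ownership | -0.49 | | -0.29 | -0.02
"Classical" metrics | Size | 0.75 | 0.69 | 0.70 | 0.26
"Classical" metrics | Churn | 0.72 | 0.69 | 0.71 | 0.26
"Classical" metrics | Complexity | 0.70 | 0.53 | 0.56 | 0.37
Minor contributors have the highest correlation with failures of any measure (tied with Total for Vista post-release). 18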
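
The slide does not say which correlation coefficient was used. Spearman's rank correlation is a common choice for skewed software metrics, so the sketch below assumes it: it ranks both variables (averaging ranks for ties) and takes the Pearson correlation of the ranks. The per-component numbers are made up for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # positions i..j share one 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up per-component values: number of minor contributors and post-release failures.
minor_contributors = [1, 5, 2, 8, 3]
failures = [0, 4, 1, 9, 2]
print(round(spearman(minor_contributors, failures), 6))  # 1.0
```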

Correlations can be deceiving: Guess which has more failures. Now guess which is larger, more complex, and had more changes. 19

Regression Analysis: Allows us to control for component characteristics such as size, complexity, and churn. Skewed distribution. Use variance explained (R²) to evaluate model improvement. 20
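
To make "variance explained" concrete, here is a minimal sketch that fits ordinary least squares with an intercept and compares R² for a base model against the same model with one extra predictor, using only the standard library (normal equations solved by Gaussian elimination). The slides do not specify the exact regression model or any transformation of the skewed failure counts, so plain OLS and the toy numbers are assumptions for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X^T X) beta = X^T y
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Toy per-component data (not from the study).
size = [10, 20, 30, 40, 50, 60]
minor = [1, 4, 2, 6, 3, 8]
failures = [2, 6, 4, 9, 6, 12]

base = r_squared([size], failures)
extended = r_squared([size, minor], failures)
print(f"base R^2 = {base:.2f}, base + minor R^2 = {extended:.2f} (+{extended - base:.2f})")
```

Because the models are nested, adding a predictor can never lower R² in OLS; the question the talk asks is how much it rises, and whether the rise is statistically significant.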

Regression Analysis Results: Adding each measure was statistically significant. Total had less of an effect and improved the model less than Minor.
Model (variance explained, R²) | Vista pre-release | Vista post-release | Windows 7 pre-release | Windows 7 post-release
Base (code metrics) | 26% | 29% | 24% | 18%
Base + Total | 40% (+14%) | 35% (+6%) | 68% (+35%) | 21% (+3%)
Base + Minor | 46% (+20%) | 41% (+12%) | 70% (+46%) | 21% (+3%)
Base + Minor + Major | 48% (+2%) | 43% (+2%) | 71% (+1%) | 22% (+1%)
Base + Minor + Major + Ownership | 50% (+2%) | 44% (+1%) | 72% (+1%) | 22% (+0%)
21

Relationship to Failures:
Metric | Effect on failures
Size, complexity, churn | Medium positive
Total contributors | Large positive
Minor contributors | Largest positive
Major contributors | Small positive
Ownership | Small negative (only statistically significant in 3 of 4 cases)
22

But WHY do some components have so many minor contributors? 23

The Major-Minor-Dependency (MMD) Relationship: Foo.exe depends on Bar.dll. A major contributor to Foo.exe ("I need to fix Foo") becomes a minor contributor to Bar.dll ("I need to change Bar, which is used by Foo"). Monte Carlo simulation showed that MMD happens twice as often as would be expected by chance. 24

Is this anything? Use Monte Carlo simulation to compare the observed phenomenon with random graphs exhibiting the same major and minor degree distributions. 25

Graph Rewiring: Do this n² times. Is this anything? [Figure: example graph over Foo.exe, Bar.dll, Baz.sys, kernel.dll, sol.exe, ie.exe] Monte Carlo simulation showed that MMD happens twice as often as would be expected in a random graph. 26
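
The rewiring test can be sketched as follows, under stated assumptions. Contributions are modeled as two bipartite edge lists of (developer, component) pairs, one for major and one for minor contributions, plus a list of (component, dependency) pairs. Random graphs come from degree-preserving edge swaps (exchange the components of two edges of the same kind), which keeps every developer's and every component's major and minor degree fixed. Reading the slide's "n²" as the square of the number of edges being rewired is an assumption, and the function names and toy graph are hypothetical.

```python
import random


def count_mmd(major, minor, deps):
    """Count (developer, A, B) where the developer is major on A, minor on B, and A depends on B."""
    depends_on = {}
    for a, b in deps:
        depends_on.setdefault(a, set()).add(b)
    minor_set = set(minor)
    return sum(1 for dev, a in major for b in depends_on.get(a, ()) if (dev, b) in minor_set)


def rewire(edges, swaps, rng, forbidden=frozenset()):
    """Degree-preserving random edge swaps on a list of (developer, component) edges."""
    edges = list(edges)
    present = set(edges)
    for _ in range(swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (d1, c1), (d2, c2) = edges[i], edges[j]
        new1, new2 = (d1, c2), (d2, c1)
        if d1 == d2 or c1 == c2:
            continue  # swap would not change anything
        if any(e in present or e in forbidden for e in (new1, new2)):
            continue  # swap would duplicate an edge or make a developer both major and minor
        present.difference_update((edges[i], edges[j]))
        present.update((new1, new2))
        edges[i], edges[j] = new1, new2
    return edges


def mmd_ratio(major, minor, deps, trials=100, seed=0):
    """Observed MMD count divided by the mean MMD count over rewired random graphs."""
    rng = random.Random(seed)
    observed = count_mmd(major, minor, deps)
    total = 0
    for _ in range(trials):
        r_major = rewire(major, len(major) ** 2, rng, forbidden=set(minor))
        r_minor = rewire(minor, len(minor) ** 2, rng, forbidden=set(r_major))
        total += count_mmd(r_major, r_minor, deps)
    expected = total / trials
    return observed / expected if expected else float("inf")


# Hypothetical toy graph using the component names from the slide.
major = [("ann", "Foo.exe"), ("bob", "Bar.dll"), ("cat", "Baz.sys"), ("dan", "ie.exe")]
minor = [("ann", "Bar.dll"), ("bob", "Baz.sys"), ("cat", "sol.exe"), ("dan", "kernel.dll")]
deps = [("Foo.exe", "Bar.dll"), ("Bar.dll", "Baz.sys"), ("ie.exe", "kernel.dll"), ("sol.exe", "kernel.dll")]
print(count_mmd(major, minor, deps), mmd_ratio(major, minor, deps))
```

A ratio near 2 on real data would correspond to the slide's "twice as often as expected"; on a graph this small the estimate is very noisy.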

Replicating Defect Prediction (Pinzger, Nagappan, Murphy [FSE 08]): Whole network: precision 75%, recall 82%. Without minors: precision 44%, recall 58%. Without majors: precision 84%, recall 88%. [Figure: example graph over Foo.exe, Bar.dll, Baz.sys, kernel.dll, sol.exe, ie.exe] Minor contributors are vital to predictive power. 27
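
For reference, precision and recall have their usual meaning for a classifier that flags failure-prone components. A minimal sketch with hypothetical component sets; it shows only the evaluation arithmetic, not the network-based prediction model of Pinzger et al.

```python
def precision_recall(predicted, actual):
    """predicted: components flagged as failure-prone; actual: components that had failures."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0  # share of flags that were right
    recall = true_positives / len(actual) if actual else 0.0  # share of failing components found
    return precision, recall


predicted = {"Foo.exe", "Bar.dll", "Baz.sys", "ie.exe"}
actual = {"Foo.exe", "Bar.dll", "Baz.sys", "kernel.dll"}
print(precision_recall(predicted, actual))  # (0.75, 0.75)
```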

Recommendations: 1. Changes made by minor contributors should be reviewed with more scrutiny. 2. Potential minor contributors should communicate desired changes rather than making them. 3. Components with low ownership should be given priority by QA resources. 28

Takeaways: Components that have changes from low-expertise developers have more failures (even when controlling for the usual suspects). Dependency relationships are the driving factor behind many minor contributors. Ownership relationships affect defect prediction techniques. But! Practices and policies can improve ownership! 29

The Problem of Large Software: More people, more code, more coordination. Coordination overhead and breakdowns can dominate the project, leading to decreased productivity and lower quality. 30

The Solution? Divide it up! Parnas introduced the notion of a module as a responsibility assignment. [Figure: modules such as Business Logic, Web Service, User Interface, Database Layer, Reporting] Difficult in practice: some tasks cross many components; some components are affected by many tasks; tasks and components have dependencies on each other. 31

Branches to the Rescue: Create a separate workspace for development of a feature, fix, or maintenance task. [Figure: initial stable state, development work, completion, deliver changes] 32

The Cost of Isolation: Branches allow a temporary reprieve from the requirements of awareness. Conflicting changes to the system will eventually manifest. 33

Socio-Technical Congruence: [Figure: a change to Foo() and a related change to Bar() create a coordination requirement between the developers making them] 34
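
One established way to operationalize socio-technical congruence, from Cataldo et al., is to derive coordination requirements from who changed which artifacts and how those artifacts depend on each other, then measure the fraction of those requirements matched by actual coordination. The sketch below is a simplified set-based version of that idea, not necessarily the measure used in this work; the inputs and function names are hypothetical.

```python
from itertools import combinations


def coordination_requirements(changes, related):
    """changes: developer -> set of artifacts they changed.
    related: set of frozenset({x, y}) pairs of dependent artifacts.
    Two developers must coordinate if they changed the same or related artifacts."""
    required = set()
    for d1, d2 in combinations(sorted(changes), 2):
        if any(a == b or frozenset((a, b)) in related
               for a in changes[d1] for b in changes[d2]):
            required.add(frozenset((d1, d2)))
    return required


def congruence(required, actual):
    """Fraction of coordination requirements met by actual coordination (e.g. communication)."""
    return len(required & actual) / len(required) if required else 1.0


changes = {"ann": {"Foo"}, "bob": {"Bar"}, "cat": {"Baz"}}
related = {frozenset(("Foo", "Bar"))}
required = coordination_requirements(changes, related)
print(sorted(sorted(pair) for pair in required))  # [['ann', 'bob']]
print(congruence(required, actual={frozenset(("ann", "cat"))}))  # 0.0
```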

Does STC apply to branching? [Figure: related work on Branch 1 and Branch 2 creates a coordination requirement] 35

Our Theory of Branches: 1. A branch is created for some goal (add a feature, fix bugs in a subsystem, etc.). 2. The developers making modifications on it define a virtual team. 3. Different teams working on branches with similar goals introduce coordination requirements that may go unmet. 36

How do we identify similar branches? A branch is characterized in two ways: the changes required to accomplish the goal of the branch, and the contributors making those changes. 37

Operationalizing Branch Profiles 38

Branch Profile Vectors 39

Measuring Similarity: On any given branch, a small proportion of contributors are an order of magnitude more active than the others, and a subset of files accounts for the vast majority of the changes. 40

Branch Similarity Example: Goal similarity: 0.81. Team similarity: 0.03. 41
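
The slides do not give the similarity formula. A minimal sketch, assuming each branch profile is a sparse vector of edit counts per file (goal) and per developer (team), compared with cosine similarity. Because a few developers and files dominate each branch, the sketch optionally dampens counts with `log1p`; that weighting is an assumption, and the branch data is hypothetical.

```python
from collections import Counter
from math import log1p, sqrt


def cosine(u, v, dampen=True):
    """Cosine similarity of two sparse count vectors (Counters).

    With dampen=True, counts pass through log1p (an assumption) so a few very
    active developers or heavily edited files do not dominate the score."""
    w = log1p if dampen else float
    dot = sum(w(u[k]) * w(v[k]) for k in u.keys() & v.keys())
    norm_u = sqrt(sum(w(x) ** 2 for x in u.values()))
    norm_v = sqrt(sum(w(x) ** 2 for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def branch_similarity(edits_a, edits_b):
    """edits: list of (developer, file) pairs, one per edit made on the branch.
    Returns (goal similarity over files, team similarity over developers)."""
    files_a, files_b = Counter(f for _, f in edits_a), Counter(f for _, f in edits_b)
    devs_a, devs_b = Counter(d for d, _ in edits_a), Counter(d for d, _ in edits_b)
    return cosine(files_a, files_b), cosine(devs_a, devs_b)


# Two hypothetical branches that touch the same files with disjoint teams.
branch_1 = [("ann", "ui.cs"), ("ann", "ui.cs"), ("bob", "db.cs")]
branch_2 = [("cat", "ui.cs"), ("dan", "db.cs")]
goal, team = branch_similarity(branch_1, branch_2)
print(f"goal similarity {goal:.2f}, team similarity {team:.2f}")
```

Like the slide's example, this pair has high goal similarity and near-zero team similarity.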

How often is our theory followed? Compare all pairs of branches by file similarity and developer similarity; dark areas mean many branch pairs in that area. Most pairs of branches are not similar. Same developers working on different things is OK. Same files should mean same people. Same files but a different team means possible problems. Some of these are branches with few files, but there are some high-volume branches with disparate teams. 42

Which Branches Need Coordination? Compare all pairs of branches by file similarity and developer similarity; dark areas mean many branch pairs in that area. Same files but a different team means potential problems. [Figure axes: different files to same files; different teams to same teams] 43
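
The quadrant reading of the plot can be expressed as a simple rule. A minimal sketch, assuming goal and team similarity scores in [0, 1] and a hypothetical cutoff of 0.5 for "same"; the slides do not give a threshold.

```python
def classify_pair(goal_sim, team_sim, cutoff=0.5):
    """Place a branch pair in one of the four quadrants of the similarity plot."""
    same_files = goal_sim >= cutoff
    same_team = team_sim >= cutoff
    if same_files and not same_team:
        return "potential problem: same files, different teams"
    if same_files:
        return "expected: same files, same team"
    if same_team:
        return "ok: same developers on different work"
    return "unrelated: different files, different teams"


def pairs_needing_coordination(similarities, cutoff=0.5):
    """similarities: dict (branch_a, branch_b) -> (goal_sim, team_sim)."""
    return sorted(
        pair for pair, (g, t) in similarities.items()
        if classify_pair(g, t, cutoff).startswith("potential problem")
    )


print(classify_pair(0.81, 0.03))  # potential problem: same files, different teams
```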

Quantitative Analysis Similarity Correlation Team Similarity > Goal Similarity 44

Next Steps: Identify outcomes and their relationship to adherence to, or violation of, our theory. Provide real-time tools that alert project managers to coordination needs. Branch Health 45

Code Flow for a Single File: [Figure: blue nodes are edits to the file; orange nodes are move operations] 46

Assessing a Branch: Simulate an alternate branch structure to assess the cost and benefit of individual branches. Cost: average delay increase per edit (how much delay does a branch introduce into development?). Cost: integrations per edit on a branch (what is the integration/edit ratio within a branch?). Benefit: provided isolation per edit (how many conflicts does a branch prevent per edit?). 47
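
Once a simulation has produced per-branch totals, the three per-edit measures are simple ratios. A minimal sketch: the `BranchStats` fields (including measuring delay in days) are hypothetical inputs, since the slides do not describe how the simulation reports them, and "average delay increase per edit" is taken here as total added delay divided by the number of edits.

```python
from dataclasses import dataclass


@dataclass
class BranchStats:
    """Hypothetical totals for one branch, e.g. produced by simulating the branch structure."""
    name: str
    edits: int                  # edits made on the branch
    added_delay_days: float     # total extra delay the branch adds to those edits
    integrations: int           # integrations (merges) into or out of the branch
    conflicts_prevented: int    # conflicts the branch's isolation avoided


def assess(branch):
    """Per-edit cost and benefit measures listed on the slide."""
    if branch.edits == 0:
        raise ValueError(f"branch {branch.name!r} has no edits")
    return {
        "delay_per_edit": branch.added_delay_days / branch.edits,          # cost
        "integrations_per_edit": branch.integrations / branch.edits,      # cost
        "isolation_per_edit": branch.conflicts_prevented / branch.edits,  # benefit
    }


print(assess(BranchStats("feature-x", edits=200, added_delay_days=50.0,
                         integrations=10, conflicts_prevented=4)))
```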

Removing a Single Branch 48

Which Branch Should I Remove? [Figure: delay (cost) vs. provided isolation (benefit)] 55
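
One way to read the cost-benefit plot is to rank branches by how much delay they add per unit of isolation they provide. The sketch below assumes that single ratio as the criterion; the slides do not state how the branch to remove is chosen, and the branch names and numbers are hypothetical.

```python
def removal_candidates(branches):
    """branches: dict name -> (delay_per_edit, isolation_per_edit).

    Returns branch names ordered from most to least attractive to remove:
    highest delay per unit of isolation first; branches that provide no
    isolation at all sort to the front."""
    def cost_per_benefit(item):
        delay, isolation = item[1]
        return delay / isolation if isolation > 0 else float("inf")

    return [name for name, _ in sorted(branches.items(), key=cost_per_benefit, reverse=True)]


branches = {"release-hotfix": (0.25, 0.02), "team-ui": (0.10, 0.05), "old-experiment": (0.30, 0.0)}
print(removal_candidates(branches))  # ['old-experiment', 'release-hotfix', 'team-ui']
```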

Low ownership leads to poor quality. We can identify coordination needs. Empirical Software Engineering 56
