Early Detection of Collaboration Conflicts & Risks in Software Development

Submitted to Professor Shervin Shirmohammadi in partial fulfillment of the requirements for the course ELG 5100 Software Engineering Project Management (Fall 2014)

Presented by:

Roopesh Jhurani

Obaid Karim

Agenda

Background

Version Control Terminology

Case Study

Development of Speculative Analysis Technique

Tool Study – Design & Implementation of Crystal UI

Centralized vs. Distributed Version Control Systems

Conclusion

References

Background

In a collaborative software development project, developers work on shared copies of the code, which leads to inconsistencies and code conflicts.

Loose synchronization permits rapid software development but produces costly conflicts.

Code conflicts are the norm rather than the exception.

Conflict types:

Textual

Higher order (build and test failures)

Conflicts persist for about 3 days on average, and 33% of the merges that are textually clean contain higher-order conflicts.

Background - Scenario

For example, two developers, George and Ringo, are adding features to a project, each working on a local copy of the code. The tests pass on both of their local machines, but the two different files they changed actually depend on each other, so the integration build test fails once the code is merged.

An awareness tool could notify George, while he is working on library code, that Ringo’s changes depend on that library. Such a tool can raise false positives, though: George may only be exploring the code without making any changes.

Version Control Terminology

•SAME

▫The repositories have the same changesets: George’s repository and the master repository both contain changesets 100 and 101.

•AHEAD

▫The repository has a proper superset of the other repository’s changesets: George’s repository is AHEAD of Paul’s.

•BEHIND

▫The inverse of AHEAD: George’s repository is BEHIND John’s.

•TEXTUAL X

▫Textual conflict: the distinct changesets cannot be merged automatically by the VCS and require human intervention. If George’s changeset 101 and Ringo’s changeset 102 modify overlapping lines of code, the two repositories are in the TEXTUAL X relationship.

•BUILD X

▫The repositories can be automatically merged by the VCS, but the resulting merged code fails to build.

•TEST X

▫The repositories can be automatically merged by the VCS and the resulting merged code builds but fails its test suite.

•TEST √

▫The repositories can be automatically merged by the VCS and the resulting merged code builds and passes its test suite.

Note: When build scripts and test suites are not available, only five relationships are considered: SAME, AHEAD, BEHIND, TEXTUAL √ (which covers both BUILD √ and BUILD X), and TEXTUAL X.
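The relationship definitions above can be read as a small decision procedure. The following is a minimal sketch under stated assumptions, not Crystal's implementation: changesets are modeled as plain sets of integers, and `merges_cleanly`, `builds`, and `tests_pass` are hypothetical callbacks standing in for the VCS merge, the build script, and the test suite.

```python
from enum import Enum
from typing import Callable, Optional, Set


class Relationship(Enum):
    SAME = "SAME"
    AHEAD = "AHEAD"
    BEHIND = "BEHIND"
    TEXTUAL_CONFLICT = "TEXTUAL X"
    TEXTUAL_CLEAN = "TEXTUAL √"
    BUILD_FAIL = "BUILD X"
    TEST_FAIL = "TEST X"
    TEST_PASS = "TEST √"


def classify(
    mine: Set[int],
    theirs: Set[int],
    merges_cleanly: Callable[[], bool],
    builds: Optional[Callable[[], bool]] = None,
    tests_pass: Optional[Callable[[], bool]] = None,
) -> Relationship:
    """Classify the relationship of 'mine' to 'theirs' (hypothetical helper)."""
    if mine == theirs:
        return Relationship.SAME
    if mine > theirs:  # proper superset of the other repository's changesets
        return Relationship.AHEAD
    if mine < theirs:  # proper subset: the inverse of AHEAD
        return Relationship.BEHIND
    # Both sides have changesets the other lacks, so a merge is needed.
    if not merges_cleanly():
        return Relationship.TEXTUAL_CONFLICT
    # Without build scripts and test suites, only five relationships exist.
    if builds is None or tests_pass is None:
        return Relationship.TEXTUAL_CLEAN
    if not builds():
        return Relationship.BUILD_FAIL
    if not tests_pass():
        return Relationship.TEST_FAIL
    return Relationship.TEST_PASS
```

For instance, `classify({100, 101}, {100, 102}, lambda: False)` models George's changeset 101 and Ringo's changeset 102 touching overlapping lines, and yields `Relationship.TEXTUAL_CONFLICT`.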

Offshore and onshore teams are sometimes out of sync, and their commits are not merged frequently for integration because of the fear of build failure.

The study analyzes nine open source systems totaling 3.4 million lines of code to understand the nature, frequency, and persistence of software development conflicts, and to help design a tool that assures developers that a merge will not create conflicts and that they should sync up.

Zimmermann previously reported, from a study of four open source systems, that 23 to 47 percent of all merges had textual conflicts (TEXTUAL X), while the remainder could be merged (TEXTUAL √).

The study answers three questions:

1. How often do the TEXTUAL X, BUILD X, TEST X, and TEST √ relationships happen?

2. How long do developers experience the conflict relationship TEXTUAL X?

3. How risky is it not to share changes with teammates, if those changes would currently merge cleanly?

Case Study – Conflicts in Practice


The nine subject programs below were analyzed to address research questions RQ1, RQ2, and RQ3 in a collaborative software development environment. KNCSL stands for thousands of non-comment source lines.

The group includes GitHub (http://github.com) along with eight other systems.

The selection criteria were at least 10 developers, at least 1,000 changesets, and a Git history up to 13 February 2010.

The tool created to perform the analysis is Crystal, discussed in the coming slides.


Case Study – Conflicts in Practice: Textual and Higher Order (RQ1: How frequently do conflicts, textual and higher order, occur across developers’ copies of a project?)

• The answer to RQ1 is that conflicts are the norm. At no time were all pairs of repositories in sync (SAME, AHEAD, or BEHIND) with each other. The figure below presents historical data for the relationships after code merges. On average, about one in six merges (16 percent) produced a textual conflict (TEXTUAL X) under Git’s built-in merging mechanism. This number is smaller than Zimmermann’s, as today’s DVCSes have better merging algorithms. The other 83 percent of the merges had no textual conflicts, meaning the relevant developers were in the TEXTUAL √ relationship (which includes BUILD X and TEST X).

Case Study – Conflicts in Practice: Textual and Higher Order

• The figures below indicate that, had the developers used Crystal, it would have informed them about TEXTUAL relationships (both TEXTUAL X and TEXTUAL √) for 19 percent of the commits. Conversely, the 81 percent of clean merges indicates the likely benefit of notifying developers when a safe textual merge can be performed.

Case Study – Conflicts in Practice: Persistence (RQ2: How long do textual conflicts persist?)

The longer a TEXTUAL X relationship persists, the more severe it is likely to become.

On average, a TEXTUAL X relationship persisted for 3.2 days and involved 18.3 changesets (with median values of 0.7 days and 6 changesets) before being resolved (left side of the figure).

The longer a TEXTUAL √ relationship persists, the more opportunities it has to turn into a conflict. On average, it persisted for 2.4 days and involved 12.7 changesets (with median values of 0.8 days and 7 changesets) before incorporation (right side of the figure).

In the worst case in terms of time, one TEXTUAL √ relationship in Voldemort persisted for 138 days. In terms of changesets, one TEXTUAL √ relationship in Gallery3 persisted for 232 changesets without a merge, while each of the possible merges along the way would have been textually clean and fully automated.

Case Study – Escalation of Clean Merges into Conflicts (RQ3: Do clean merges devolve into conflicts?)

Parallel work enables fast software development but can also create conflicts.

Developers should perform safe merges as often as possible. A developer often makes a change without having incorporated and understood another developer’s work, and this parallel work results in a conflict.

It was found that 93 percent of the TEXTUAL X relationships developed from a TEXTUAL √ relationship; the other 7 percent developed from a BEHIND relationship. In other words, in almost every case, both developers had already committed (but not shared) changes before the conflict developed. Every TEXTUAL X relationship between repository commits can be prevented by incorporating others’ changes earlier.

It was found that 20 percent of TEXTUAL √ relationships devolved into a conflict. The remaining 80 percent of TEXTUAL √ relationships were merged successfully, preventing a conflict from developing.
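One way to picture how such a percentage can be derived is to walk through the sequence of relationships observed between two repositories and record how each TEXTUAL √ episode ends. The sketch below is an illustration only, not the study's actual method; the function names and the sample histories are hypothetical, and a history is assumed to be a list of relationship labels in chronological order.

```python
from typing import Iterable, List, Tuple

CLEAN = "TEXTUAL √"
CONFLICT = "TEXTUAL X"


def clean_episode_outcomes(history: List[str]) -> Tuple[int, int]:
    """Return (devolved, resolved) counts for the TEXTUAL √ episodes in one history.

    An episode ends when the relationship changes from TEXTUAL √ to anything
    else. Episodes still open at the end of the history are not counted.
    """
    devolved = resolved = 0
    for previous, current in zip(history, history[1:]):
        if previous == CLEAN and current != CLEAN:
            if current == CONFLICT:
                devolved += 1  # a clean merge was left too long and became a conflict
            else:
                resolved += 1  # the changes were incorporated before a conflict arose
    return devolved, resolved


def devolve_rate(histories: Iterable[List[str]]) -> float:
    """Fraction of finished TEXTUAL √ episodes that turned into TEXTUAL X."""
    devolved = resolved = 0
    for history in histories:
        d, r = clean_episode_outcomes(history)
        devolved += d
        resolved += r
    total = devolved + resolved
    return devolved / total if total else 0.0


# Hypothetical sample histories, for illustration only.
george_ringo = ["SAME", "AHEAD", CLEAN, CLEAN, CONFLICT, "SAME"]
george_paul = [CLEAN, "SAME", "BEHIND", CLEAN, "SAME"]
rate = devolve_rate([george_ringo, george_paul])
```

In the sample data, one of three finished TEXTUAL √ episodes ends in TEXTUAL X, so `rate` is one third.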

Speculative Analysis Technique

Speculative analysis provides continuous and accurate information about the presence or absence of errors during the project. It keeps developers better informed about how and when to share changes, which reduces human effort.

A development state is a snapshot of the software system’s source code at an instant in time, including configuration files, makefiles, libraries, etc.

In the past, two classes of techniques have leveraged past development states along with the present state.

The first class compares the current state to a previous state by relying on regression testing.

The second class uses past experience to recommend future changes, for example that some files must be changed together.

Speculative analysis is a third class, which explores possible future states. For example, a modern IDE offers quick fix suggestions for compilation errors; the IDE can speculatively perform each operation in the background so that the developer can select the best possible fix. It can present the effects of each quick fix: whether the system would compile, whether the tests would pass, and whether it is feasible to merge the code.

Speculative Analysis Technique (Cont’d.)

Improves developer productivity and software quality through automated code generation, bug detection and removal, code completion, refactorings, and quick fixes.

Uses genetic and evolutionary algorithms with objective functions to search for future states.

Identifies the most efficient operation for the developer to pursue, with concrete and precise data about its consequences.

Uses spare CPU cycles to compute possible new variations of a program and to analyze their properties, so that the software evolves better.

An advantage of this technique is that a developer who is editing code in the current development state gets an analysis report immediately, while a developer who is deciding which code to edit can be guided by the analysis results toward the best choice, e.g., by learning that someone else is using the same code.

Speculative Analysis Technique (Cont’d.)

• The figure gives a rough idea of the technique, which improves the efficiency of the software development process and software quality by using an otherwise untapped space of development states that are useful to the developer.

Speculative Analysis Technique (Cont’d.)

With speculative analysis, the Eclipse quick fix menu shown for a compilation error can be enhanced to indicate the consequences of each fix. For each operation, the symbols below indicate whether compilation, testing, or merging would succeed or fail. The clock indicates that the IDE is still computing the result, and Ø means not applicable, e.g., testing cannot proceed if compilation fails.

The technique can also speculatively check whether a merge on a distributed version control system such as Git or Mercurial will proceed cleanly. If it will, developers can go ahead with the merge; otherwise, they can temporarily hold off on merging.

Tool Study – Crystal (Proactive Conflict Detector for Distributed Version Control)

Crystal is a publicly available tool for the proactive detection of collaboration conflicts, and it helps with the scalability of the system.

It works only with DVCSes, currently Git and Mercurial.

It is based on speculative analysis.

It predicts and identifies potential code conflicts, and helps manage and prevent them as far as possible.

It can stay in the background in the system tray and present only critical information.

Crystal’s main window summarizes all the projects requiring attention, using different shapes and colors according to the severity of the situation.

A developer who is interested in a project can hover over it to see full details in a tooltip.

Crystal is an open-source, cross-platform, standalone tool and is available for download: http://crystalvc.googlecode.com.

Crystal’s UI

Below is a snapshot of Crystal. It presents George’s local state and his relationships with the master repository and the other collaborators, as well as guidance based on that information.

The local state tells George (in the native language of the underlying VCS) whether he must commit changes or resolve a conflict. Crystal then displays his relationships with the master repository and the collaborators’ repositories.

Crystal’s UI Explanation

Crystal monitors multiple development repositories. It informs each developer when it is safe to push their changes, when they have fallen behind and could pull changes from others or a central repository, and when changes other developers have made will cause a syntactic or behavioral conflict.

Crystal examines commits. It does not examine developers’ working copies, that is, their uncommitted modifications. The reason is that commits are more likely to be coherent and desired units of work, for which notification about (non-)conflicts is of value.

If conflicts occur, Crystal informs developers early, so they may resolve these conflicts quickly. Long-established conflicts can be much harder to resolve.

If changes are made without conflicts, Crystal gives developers confidence to merge their changes without fearing unanticipated side effects.
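A speculative merge check of this kind can be approximated with ordinary Git commands: attempt the merge without committing, note whether Git reports a conflict, then abort the attempt. The sketch below is an assumption about how such a check could look, not Crystal's code. `merges_cleanly` and `run_git` are hypothetical helpers, the check is meant to run in a throwaway scratch clone rather than the developer's own repository, and `other_ref` is assumed to have been fetched already.

```python
import subprocess
from typing import Callable, Sequence


def run_git(args: Sequence[str], cwd: str) -> int:
    """Run a git command in cwd and return its exit code."""
    completed = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True
    )
    return completed.returncode


def merges_cleanly(
    scratch_clone: str,
    other_ref: str,
    run: Callable[[Sequence[str], str], int] = run_git,
) -> bool:
    """Speculatively merge other_ref into the scratch clone's current branch.

    Returns True if git can merge automatically (TEXTUAL √) and False
    otherwise. Any non-zero exit code is treated as "not clean", which also
    covers errors such as an unknown ref.
    """
    # Try the merge, but stop before creating a merge commit.
    code = run(["merge", "--no-commit", "--no-ff", other_ref], scratch_clone)
    # Undo the attempt. This fails harmlessly when no merge is in progress,
    # for example when the branch was already up to date.
    run(["merge", "--abort"], scratch_clone)
    return code == 0
```

The `run` parameter exists only so the function can be exercised without a real repository. A call might look like `merges_cleanly("/tmp/scratch/project", "origin/master")`, where both the path and the ref are placeholders.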

Crystal’s Legend

Crystal associates an icon with each of the seven relationships and uses a fixed color to represent the severity of the relationship:

No merging needed: green. Can be merged: yellow. Manual merging required: red.

Crystal’s Configuration File Format (conflictClient.xml)

The Crystal configuration file is an XML file that describes the locations of the scratch space, the hg executable, and the repositories Crystal monitors. The project XML element has 7 attributes:

1. Short name: the name of the project.

2. Kind: the DVCS; currently, must be HG.

3. Clone: the path to your local repository.

4. Parent: the shortName of the repository that is your repository's parent; that is, the default place you push to and pull from.

5. RemoteHG: necessary only if the --remotecmd option is needed to specify the path to hg on the server where your local repository resides; the value of this element is passed directly to the hg command with the --remotecmd option.

6. Compile: a command to execute to compile the project, such as "make".

7. Test: a command to execute to run the project's tests, such as "make test".
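To make the seven attributes concrete, the sketch below reads a single project element with Python's standard XML parser. It is an illustration under assumptions, not the documented Crystal schema: the exact attribute spelling (for example `ShortName`), the split into required and optional attributes, the repository path, and the `read_project` helper are all hypothetical, and the enclosing elements of conflictClient.xml are omitted.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; attribute spelling and values are assumptions.
EXAMPLE = """
<project ShortName="george" Kind="HG" Clone="/home/george/repo"
         Parent="master" Compile="make" Test="make test" />
"""

REQUIRED = ("ShortName", "Kind", "Clone")  # assumed to be mandatory
OPTIONAL = ("Parent", "RemoteHG", "Compile", "Test")  # assumed to be optional


def read_project(xml_text: str) -> dict:
    """Parse one project element into a dictionary of its seven attributes."""
    element = ET.fromstring(xml_text)
    if element.tag != "project":
        raise ValueError(f"expected a project element, got {element.tag!r}")
    missing = [name for name in REQUIRED if name not in element.attrib]
    if missing:
        raise ValueError(f"missing attributes: {', '.join(missing)}")
    # Absent optional attributes are reported as None.
    project = {name: element.get(name) for name in REQUIRED + OPTIONAL}
    if project["Kind"] != "HG":
        raise ValueError("Kind must currently be HG")
    return project


project = read_project(EXAMPLE)
```

Because `RemoteHG` is only needed when the --remotecmd option is, the example omits it and the parsed value is `None`.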

Crystal’s Repository Access and Log Files

The more of your co-workers' repositories you have read access to, the more useful Crystal will be.

There are three ways to provide access:

File system sharing: If you and your co-worker have access to the same file system, then you can store your repository in a place where your co-worker can read it. You can either grant your co-worker read permission to your repository, or you can copy your repository to a location that your co-worker can read.

Dropbox sharing: This approach has several benefits: changes are copied immediately, and the same technique works for all co-workers.

HTTP sharing: The HTTP sharing approach is often easier, but it only works if you have access to a machine that runs a web server.

Log files: Crystal maintains two log files to help with diagnosing unexpected problems. One is a plain text log that is easy to read (.conflictClientLog.log) and the other is an XML log that can be more easily analyzed programmatically (.conflictClientLog.xml).

Centralized vs. Distributed Version Control Systems

Centralized Version Control Systems | Distributed Version Control Systems

Requires a central server to store the master copy of the code. | A central server is not required to store project code.

Each developer keeps a local copy. Developers use separate branches. | Each developer has a clone of the repository. Developers use either distributed repositories or branches.

Project history is not maintained locally; it is only maintained on the server. | Each developer has a full project history.

Commits happen directly to the server. | Commits are made locally.

Requires connectivity, as commits happen directly to the server. | Connectivity is required only for push/pull operations.

Examples: SVN, CVS, Perforce | Examples: Mercurial, Git

Conclusion

Pending version control conflicts (textual, build, and test) are guaranteed to occur unless a developer modifies or abandons a committed change; conflicts are the norm rather than the exception.

16 percent of all merges required human effort to resolve textual conflicts. 33 percent of the merges that the VCS reported as free of textual conflicts in fact contained higher-order conflicts. Conflicts persist, on average, for 3.2 days (with a median conflict persisting 0.7 days).

Learning about conflicts earlier allows developers to make better-informed decisions about how to proceed, whether that is to perform a safe merge, publish a safe change, quickly address a new conflict, interact with another developer, and so on.

Crystal, a speculative analysis tool, provides concrete information and advice about pending conflicts between collaborating team members while remaining largely unobtrusive.

References

1. Y. Brun, R. Holmes, M.D. Ernst, and D. Notkin, “Early Detection of Collaboration Conflicts and Risks,” IEEE Trans. Software Eng., vol. 39, no. 10, pp. 1358-1375, 2013.

2. Y. Brun, R. Holmes, M.D. Ernst, and D. Notkin, “Speculative Analysis: Exploring Future States of Software,” Proc. FSE/SDP Workshop Future of Software Eng. Research, pp. 59-63, Nov. 2010.

3. Y. Brun, R. Holmes, M.D. Ernst, and D. Notkin, “Proactive Detection of Collaboration Conflicts,” Proc. 19th ACM SIGSOFT Symp. and 13th European Conf. Foundations of Software Eng., pp. 168-178, Sept. 2011.

4. K. Muşlu, Y. Brun, R. Holmes, M.D. Ernst, and D. Notkin, “Speculative Analysis of Integrated Development Environment Recommendations,” Proc. ACM Int’l Conf. Object Oriented Programming Systems Languages and Applications, Oct. 2012.

5. Y. Brun, R. Holmes, M.D. Ernst, and D. Notkin, “Crystal: Precise and Unobtrusive Conflict Warnings,” Proc. ESEC/FSE 2011 Tool Demonstration Track, 2011.

6. Lecture Notes: http://www.site.uottawa.ca/~shervin/courses/elg5100/lectures/

7. Crystal VC: Proactive Conflict Detector for Distributed Version Control https://code.google.com/p/crystalvc/wiki/CrystalUserManual












Thank You!

Questions or Comments?