Early Detection of Collaboration Conflicts & Risks
Submitted to Professor Shervin Shirmohammadi in partial fulfillment of the requirements
for the course ELG 5100 Software Engineering Project Management (Fall 2014)
Presented by:
Roopesh Jhurani
Obaid Karim
Agenda
Background
Version Control Terminology
Tool Study – Design & Implementation of Crystal UI
Case Study
Development of Speculative Analysis Technique
Centralized Vs Distributed Version Control Systems
References
Conclusion
Background
In collaborative software development, developers work on shared copies of the code, which leads to
inconsistencies and code conflicts.
Loose synchronization permits rapid development of software but produces
costly conflicts.
Code conflicts are the norm rather than the exception.
Conflict types:
Textual
Higher order
Conflicts persist for about 3 days on average, and 33% of the merges that are textually clean contain higher-order conflicts.
Background - Scenario
For example, two developers, George and Ringo, are adding features to a
project, each working on a local copy of the code. The tests pass on both
of their local machines, but because they were working on two different
files that actually depend on each other, the integration build test fails
when the code is merged.
An awareness tool could notify George, for example, when he is working on
library code that Ringo’s changes depend on. However, this can produce
false positives, because George may only be exploring the code without
making any changes.
Version Control Terminology
• SAME
▫ The repositories have the same changesets: George’s repository and the master repository (100 and 101).
• AHEAD
▫ The repository has a proper superset of the other repository’s changesets: George’s repository is AHEAD of Paul’s.
• BEHIND
▫ The inverse of AHEAD: George’s repository is BEHIND John’s.
• TEXTUAL X
▫ (“textual conflict”): The distinct changesets cannot be merged automatically by the VCS and require human intervention. If George’s changeset 101 and Ringo’s changeset 102 modify overlapping lines of code, they are in TEXTUAL conflict.
• BUILD X
▫ The repositories can be automatically merged by the VCS, but the resulting merged code fails to build.
• TEST X
▫ The repositories can be automatically merged by the VCS and the resulting merged code builds but fails its test suite.
• TEST √
▫ The repositories can be automatically merged by the VCS and the resulting merged code builds and passes its test suite.
Note: When build scripts and test suites are not available, only five
relationships are considered: SAME, AHEAD, BEHIND, TEXTUAL √ (which
subsumes BUILD X, TEST X, and TEST √), and TEXTUAL X.
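The relationships above can be sketched as a small decision function. This is only an illustration of the definitions on this slide, not Crystal’s implementation: the function name classify, its parameters, and the use of integer changeset IDs are assumptions made for the example.

```python
from typing import Optional, Set


def classify(mine: Set[int], theirs: Set[int],
             merges_cleanly: bool = True,
             builds: Optional[bool] = None,
             tests_pass: Optional[bool] = None) -> str:
    """Return the relationship of repository `mine` to repository `theirs`.

    `builds` and `tests_pass` are None when no build script or test
    suite is available (hypothetical convention for this sketch).
    """
    if mine == theirs:
        return "SAME"
    if mine > theirs:          # proper superset of the other changesets
        return "AHEAD"
    if mine < theirs:          # the inverse of AHEAD
        return "BEHIND"
    # Each side has changesets the other lacks, so a merge is needed.
    if not merges_cleanly:
        return "TEXTUAL X"
    if builds is None:
        return "TEXTUAL √"     # cannot look beyond the textual level
    if not builds:
        return "BUILD X"
    if tests_pass is None:
        return "TEXTUAL √"
    return "TEST √" if tests_pass else "TEST X"


# George has changesets 100 and 101, Ringo has 100 and 102.
print(classify({100, 101}, {100, 102}, merges_cleanly=False))
```

With build and test information missing, the function falls back to the five-relationship view described in the note.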
There are times when offshore and onshore teams are not in sync and their commits are not merged frequently for integration, for fear of build failure.
The study analyzes nine open source systems totaling 3.4 million lines of code to understand the nature, frequency, and persistence of software development conflicts, and to help design a tool that assures developers when a merge will not create conflicts, so that they sync up.
Zimmermann previously reported, from a study of four open source systems, that 23 to 47 percent of all merges had textual conflicts (TEXTUAL X), while the remainder could be merged cleanly (TEXTUAL √).
The study answers three questions:
1. How often do the TEXTUAL X, BUILD X, TEST X, and TEST √ relationships occur?
2. How long do developers experience the conflict relationship TEXTUAL X?
3. How risky is it not to share changes with teammates, if those changes would currently merge cleanly?
Case Study – Conflicts in Practice
Nine subject programs were analyzed to address research questions RQ1, RQ2, and RQ3 in a collaborative software development environment. KNCSL stands for thousands of non-comment source lines.
The group includes GitHub (http://github.com) along with eight other systems.
The selection criteria were at least 10 developers, at least 1000 changesets, and a Git history up to 13 February 2010.
The tool created to perform the analysis is Crystal, discussed in the coming slides.
Case Study – Conflicts in Practice: Textual and Higher Order (RQ1: How frequently do conflicts, textual and higher order, occur across developers’ copies of a project?)
• The answer to RQ1 is that conflicts are the norm. At no time were all pairs of repositories in
sync (SAME, AHEAD, or BEHIND) with each other. The figure below presents historical data on the
relationships after code merges. On average, one in six merges (16 percent) had a textual
conflict (TEXTUAL X) as reported by Git’s built-in merging mechanism. The number is smaller
than in Zimmermann’s data, as DVCSs nowadays have better merging algorithms. The other 83 percent
of the merges had no textual conflicts, meaning the relevant developers were in the TEXTUAL √
relationship (which includes BUILD X and TEST X).
Case Study – Conflicts in Practice: Textual and Higher Order
• The figures below indicate that, had the developers used Crystal, it would have informed
them about TEXTUAL relationships (both conflicting and clean) for 19 percent of the
commits. Conversely, the 81 percent of clean merges indicates the likely benefit of
notifying developers when a safe textual merge can be performed.
Case Study – Conflicts in Practice: Persistence (RQ2: How long do textual conflicts persist?)
The longer a textual conflict persists, the more severe it is likely to become.
On average, the TEXTUAL X relationship persisted for 3.2 days and involved 18.3 changesets (with median values of 0.7 days and 6 changesets) before being resolved (left side of Fig.).
The longer a TEXTUAL √ relationship persists, the more opportunities it has to change into a conflict. On average, it persisted for 2.4 days and involved 12.7 changesets (with median values of 0.8 days and 7 changesets) before incorporation (right side of Fig.).
In the worst case in terms of time, one TEXTUAL √ relationship in Voldemort persisted for 138 days. In terms of changesets, one TEXTUAL √ relationship in Gallery3 persisted for 232 changesets without a merge, although each of the possible merges along the way would have been textually clean and fully automated.
Case Study – Escalation of Clean Merges into Conflicts (RQ3: Do clean merges devolve into conflicts?)
Parallel work enables fast software development but can also create conflicts.
Developers should perform safe merges as often as possible. Conflicts often arise from
parallel work when a developer makes a change without having incorporated and understood
another developer’s work.
It was found that 93 percent of the TEXTUAL X relationships developed from a TEXTUAL
√ relationship; the other 7 percent developed from a BEHIND relationship. In other words, in
almost every case, both developers had already committed (but not shared) changes before
the conflict developed. Every TEXTUAL X relationship between repository commits can be
prevented by incorporating others’ changes earlier.
It was found that 20 percent of TEXTUAL √ relationships devolved into a conflict. The
remaining 80 percent of TEXTUAL √ relationships were merged successfully, preventing a
conflict from developing.
Speculative Analysis Technique
Speculative analysis provides continuous and accurate information about the presence or
absence of errors during the project. It keeps developers better informed about how and
when to share changes, which reduces human effort.
A development state is a snapshot of the software system’s source code at an instant in
time, including config files, makefiles, libraries, etc.
In the past, two classes of techniques have leveraged past development states along with
the present state.
The first class compares the current state to a previous state by relying on regression
testing.
The second class uses past experience to recommend future changes, for example that
some files must be changed together.
Speculative analysis is a third class, which looks at possible future states. For example,
modern IDEs suggest quick fixes for compilation errors; the IDE can speculatively perform
each suggested operation in the background and present its effects (whether the system
would compile, whether the tests would pass, and whether the code could be merged) so
that the developer can select the best fix.
Speculative Analysis Technique (Cont’d.)
Improves developer productivity and software quality through automated code generation, bug
detection and removal, code completion, refactorings, and quick fixes.
Uses genetic and evolutionary algorithms with objective functions to search for
future states.
Tells the developer which operation is the most efficient to pursue, with concrete and precise
data about its consequences.
Uses spare CPU cycles to compute possible new variations of a program
and to analyze their properties, so that the software evolves better.
An advantage of this technique is that a developer gets analysis results immediately after
changing the code in a development state. A developer who is still deciding what code to edit
can be guided by the analysis results toward the best choice, for example by learning that
someone else is working on the same code.
Speculative Analysis Technique (Cont’d.)
• The figure represents a rough idea of a technique used to improve software
development process efficiency and software quality by utilizing an
untapped space of development states useful for the developer.
Speculative Analysis Technique (Cont’d.)
With speculative analysis, the Eclipse quick fix menu shown for a compilation error can be enhanced
to indicate the consequences of each fix. For each operation, the symbols below indicate whether
compilation, testing, or merging would succeed or fail. The clock indicates that the IDE is still
computing the result, and Ø means not applicable, e.g., testing cannot proceed if compilation fails.
The technique can also speculatively check whether a merge on a distributed version control system
such as Git or Mercurial would proceed cleanly. If it would, developers can go ahead with the
merge; otherwise they can hold off on merging for the time being.
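The quick fix idea above can be sketched in a few lines: apply each candidate fix to a copy of the current state, then record whether the result compiles and whether its tests pass. This is a toy illustration under stated assumptions, not the Eclipse plug-in or Crystal. The names speculate, Outcome, toy_compiles, and toy_tests_pass, the file calc.py, and the three fixes are all hypothetical, and "compilation" here is only a Python syntax check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

State = Dict[str, str]  # file name -> source text


@dataclass(frozen=True)
class Outcome:
    compiles: bool
    tests_pass: Optional[bool]  # None = not applicable (did not compile)


def speculate(state: State,
              fixes: Dict[str, Callable[[State], State]],
              compiles: Callable[[State], bool],
              tests_pass: Callable[[State], bool]) -> Dict[str, Outcome]:
    """Try every fix on a copy of `state` and report its consequences."""
    results = {}
    for name, apply_fix in fixes.items():
        candidate = apply_fix(dict(state))  # never touch the real state
        ok = compiles(candidate)
        # Testing cannot proceed if compilation fails.
        results[name] = Outcome(ok, tests_pass(candidate) if ok else None)
    return results


def toy_compiles(state: State) -> bool:
    """Stand-in for a build: every file must be syntactically valid."""
    try:
        for name, source in state.items():
            compile(source, name, "exec")
    except SyntaxError:
        return False
    return True


def toy_tests_pass(state: State) -> bool:
    """Stand-in for a test suite: area(2, 3) must equal 6."""
    namespace: dict = {}
    exec(state["calc.py"], namespace)
    return namespace["area"](2, 3) == 6


BROKEN = {"calc.py": "def area(w, h):\n    return w * h +\n"}
FIXES = {
    "drop trailing operator":
        lambda s: {**s, "calc.py": "def area(w, h):\n    return w * h\n"},
    "append operand 1":
        lambda s: {**s, "calc.py": "def area(w, h):\n    return w * h + 1\n"},
    "delete the line":
        lambda s: {**s, "calc.py": "def area(w, h):\n"},
}

results = speculate(BROKEN, FIXES, toy_compiles, toy_tests_pass)
for name, outcome in results.items():
    print(name, outcome)
```

Only the first fix both compiles and passes the test; the second compiles but fails the test, and the third does not compile, so its test result is not applicable.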
Tool study – Crystal (Proactive Conflict Detector for
Distributed Version Control)
Crystal is a publicly available tool for the proactive detection of collaboration
conflicts; it also helps with the scalability of the system.
It works only with DVCSs.
It is based on speculative analysis.
It predicts and identifies potential code conflicts and helps manage and
prevent them as far as possible.
It can stay in the background in the system tray and present only critical
information.
Crystal’s main window summarizes all the projects that require attention, using different
shapes and colors according to the severity of the situation.
A developer who is interested in a project can hover over it to see full
details in a tooltip.
Crystal currently works only with the Git and Mercurial DVCSs.
Crystal is an open-source, cross-platform, standalone tool and is available for
download: http://crystalvc.googlecode.com.
Crystal’s UI
Below is a screenshot of Crystal. It presents George’s local state and his relationships with the master
repository and the other collaborators, as well as guidance based on that information.
The local state tells George (in the native language of the underlying VCS) whether he must commit changes or resolve a conflict. Then Crystal displays the relationship with the master and the collaborators’ repositories.
Crystal’s UI Explanation
Crystal monitors multiple development repositories. It informs each developer when it is safe to push their changes, when they have fallen behind and could pull changes from others or a central repository, and when changes other developers have made will cause a syntactic or behavioral conflict.
Crystal examines commits. It does not examine developers’ working copies (their uncommitted modifications). The reason is that commits are more likely to be coherent and desired units of work, for which notification about (non-)conflicts is of value.
If conflicts occur, Crystal informs developers early, so they may resolve these conflicts quickly. Long-established conflicts can be much harder to resolve.
If changes are made without conflicts, Crystal gives developers confidence to merge their changes without fearing unanticipated side effects.
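The kind of guidance described on this slide can be pictured as a lookup from relationship to advice. The sketch below is a paraphrase of the slide text for illustration only. The ADVICE table, its wording, and the advise function are assumptions and do not reproduce Crystal’s actual messages.

```python
# Hypothetical advice table: wording paraphrases the slides, not Crystal.
ADVICE = {
    "SAME": "Nothing to do: both repositories hold the same changesets.",
    "AHEAD": "Safe to push: the other repository lacks your changesets.",
    "BEHIND": "Pull: the other repository has changesets you lack.",
    "TEXTUAL X": "Resolve soon: merging needs manual conflict resolution.",
    "BUILD X": "Merge applies textually, but the merged code fails to build.",
    "TEST X": "Merge builds, but the merged code fails its test suite.",
    "TEST √": "Safe to merge: the merged code builds and passes its tests.",
}


def advise(relationship: str) -> str:
    """Return the advice for one of the seven relationships."""
    try:
        return ADVICE[relationship]
    except KeyError:
        raise ValueError(f"unknown relationship: {relationship!r}") from None


print(advise("AHEAD"))
```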
Crystal’s Legend
Crystal associates an icon with each of the seven relationships and uses a fixed color to represent the severity of the relationship:
green when no merging is needed, yellow when the changes can be merged, and red when manual merging is required.
Crystal’s Configuration File Format (conflictClient.xml)
The Crystal configuration file is an XML file that describes the locations of the scratch space,
the hg executable, and the repositories Crystal monitors. The project XML element has 7 attributes:
1. ShortName: the name of the project.
2. Kind: the DVCS; currently, must be HG.
3. Clone: the path to your local repository.
4. Parent: the shortName of the repository that is your repository's parent; that is, the default place you push to and pull from.
5. RemoteHG: necessary only if the --remotecmd option is necessary to specify the path to hg on the server where your local repository resides; the value of this element is passed directly to the hg command with the --remotecmd option.
6. Compile: a command to execute to compile the project, such as "make".
7. Test: a command to execute to run the project's tests, such as "make test".
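To make the seven attributes concrete, the sketch below reads them from a made-up configuration fragment. Several things here are assumptions for illustration: the root element name config, the exact attribute spellings (taken from the names on this slide), the sample values, and the choice to treat RemoteHG, Compile, and Test as optional. The real file format may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; not copied from a real conflictClient.xml.
SAMPLE = """
<config>
  <project ShortName="demo" Kind="HG" Clone="/home/george/demo"
           Parent="master" Compile="make" Test="make test"/>
</config>
"""

REQUIRED = ("ShortName", "Kind", "Clone", "Parent")   # assumed required
OPTIONAL = ("RemoteHG", "Compile", "Test")            # assumed optional


def load_projects(xml_text: str) -> list:
    """Return one dict of the seven attributes per project element."""
    root = ET.fromstring(xml_text.strip())
    projects = []
    for element in root.iter("project"):
        missing = [name for name in REQUIRED if name not in element.attrib]
        if missing:
            raise ValueError(f"project is missing attributes: {missing}")
        # Absent optional attributes are reported as None.
        projects.append({name: element.get(name)
                         for name in REQUIRED + OPTIONAL})
    return projects


for project in load_projects(SAMPLE):
    print(project)
```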
Crystal’s Repository Access and Log Files
The more of your co-workers' repositories you have read access to, the more useful Crystal
will be.
Three ways to gain access are:
File system sharing: If you and your co-worker have access to the same file system, then
you can store your repository in a place where your co-worker can read it. You can either
grant your co-worker read permission to your repository, or you can copy your repository
to a location that your co-worker can read.
Dropbox sharing: This approach has several benefits: changes are copied immediately,
and the same technique works for all co-workers.
HTTP sharing: The HTTP sharing approach is often easier, but it only works if you have
access to a machine that runs a web server.
Log files: Crystal maintains two log files to help with diagnosing unexpected problems. One
is a plain text log that is easy to read (.conflictClientLog.log) and the other is an XML log that
can be more easily analyzed programmatically (.conflictClientLog.xml).
Centralized vs. Distributed Version Control Systems
Centralized Version Control Systems | Distributed Version Control Systems
A central server is required to store the master copy of the code. | A central server is not required to store the project code.
Each developer keeps a local copy. Developers use separate branches. | Each developer has a clone of the repository. Developers use either distributed repositories or branches.
Project history is not maintained locally; it is maintained only on the server. | Each developer has the full project history.
Commits go directly to the server. | Commits are made locally.
Connectivity is required because commits go directly to the server. | Connectivity is required only for push/pull operations.
Examples: SVN, CVS, Perforce | Examples: Mercurial, Git
Conclusion
Version control conflicts (textual, build, and test) are the norm rather than the
exception, and a pending conflict is guaranteed to occur unless a developer modifies
or abandons a committed change.
16 percent of all merges required human effort to resolve textual conflicts. 33 percent
of the merges that the VCS reported as free of textual conflicts in fact
contained higher-order conflicts. Conflicts persist, on average, for 3.2 days (with a
median of 0.7 days).
Learning about conflicts earlier allows developers to make better-informed decisions
about how to proceed: to perform a safe merge, to publish a safe change,
to quickly address a new conflict, to talk to another developer, and so on.
The speculative analysis tool Crystal provides concrete information and advice about
pending conflicts between collaborating team members while remaining largely
unobtrusive.
References
1. Y. Brun, R. Holmes, M.D. Ernst, and D. Notkin, “Early Detection of Collaboration Conflicts and Risks,” IEEE Trans. Software Eng., vol. 39, no. 10, pp. 1358-1375, Oct. 2013.
2. Y. Brun, R. Holmes, M.D. Ernst, and D. Notkin, “Speculative Analysis: Exploring Future States of Software,” Proc. FSE/SDP Workshop Future of Software Eng. Research, pp. 59-63, Nov. 2010.
3. Y. Brun, R. Holmes, M.D. Ernst, and D. Notkin, “Proactive Detection of Collaboration Conflicts,” Proc. 19th ACM SIGSOFT Symp. and 13th European Conf. Foundations of Software Eng., pp. 168-178, Sept. 2011.
4. K. Muşlu, Y. Brun, R. Holmes, M.D. Ernst, and D. Notkin, “Speculative Analysis of Integrated Development Environment Recommendations,” Proc. ACM Int’l Conf. Object Oriented Programming Systems Languages and Applications, Oct. 2012.
5. Y. Brun, R. Holmes, M.D. Ernst, and D. Notkin, “Crystal: Precise and Unobtrusive Conflict Warnings,” Proc. ESEC/FSE 2011 Tool Demonstration Track, 2011.
6. Lecture Notes: http://www.site.uottawa.ca/~shervin/courses/elg5100/lectures/
7. Crystal VC: Proactive Conflict Detector for Distributed Version Control https://code.google.com/p/crystalvc/wiki/CrystalUserManual
Thank You!
Questions or Comments?