A Replication Case Study to Measure the Architectural Quality of a Commercial System
Presented by:
Derek Reimanis
Modularity Violations
• Two components which change together, yet are not expected to change together
[Figure: Component A and Component B shown across successive releases]
The CLIO Process
Revision History
Software Project
Identify Historical Pairwise Dependencies
Find Pairwise Dependencies
Ticket History
Gather Metrics Associated with Quality
Correlate Quality Metrics and Modularity Violations
Measure and Predict System Quality
Isolate Groups of Affected Files
Locate Unexpected Dependencies
Visualize Groups to Understand Scope of Problems
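The "Locate Unexpected Dependencies" step can be illustrated with a minimal sketch. It assumes that an unexpected dependency is a pair of files that changes together in the revision history without a matching structural dependency. The slides do not spell out CLIO's actual procedure, so the function names, the sample data, and the `min_support` threshold below are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each unordered pair of files appears in the same commit."""
    counts = Counter()
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    return counts


def unexpected_dependencies(commits, structural_deps, min_support=2):
    """Return pairs that co-change at least `min_support` times (an assumed
    threshold) but are not listed among the structural dependencies."""
    expected = {tuple(sorted(pair)) for pair in structural_deps}
    return {
        pair: n
        for pair, n in co_change_counts(commits).items()
        if n >= min_support and pair not in expected
    }


# Hypothetical revision history: each entry is the list of files in one commit.
commits = [
    ["a.cpp", "b.cpp"],
    ["a.cpp", "b.cpp", "c.cpp"],
    ["a.cpp", "c.cpp"],
    ["a.cpp", "c.cpp"],
]
# Hypothetical structural dependencies (for example, from include analysis).
structural_deps = [("a.cpp", "c.cpp")]

violations = unexpected_dependencies(commits, structural_deps)
print(violations)
```

Here `a.cpp` and `c.cpp` change together often, but that is expected because they depend on each other structurally; only the `a.cpp`/`b.cpp` pair is flagged.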
SVS7 Demographics
Factor                                SVS7                              Baseline
Programming Language                  C++                               Java
Number of Modules                     18                                173
Number of Developers                  Up to 11                          Up to 20
Project Lifetime                      4 years                           2 years
Number of Source Files                3903 (1569 cpp, 267 c, 2067 h)    900
Source Lines of Code (in thousands)   1300                              300
Golden Helix’s SNP & Variation Suite (SVS7)
Metrics Associated with Quality
Define a file pair as a C/C++ source file together with its corresponding header file. Then, for each file pair u:

Metric                  Description
File size               File size on disk of u
Fan-in                  Sum of references pointing from a file pair v to u
Fan-out                 Sum of references pointing from u to a file pair v
Change Frequency        The number of times u is modified in the commit log
Ticket Frequency        The number of times u is modified because of a ticket reference
Bug Change Frequency    The number of times u is modified because of a bug ticket reference
Pair Change Frequency   The number of times u and file pair v are modified in the same commit
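The history-based metrics above can be sketched as a single pass over a commit log. The commit record layout (`files`, `ticket`, `is_bug`) and the rule that maps a path to its file pair by stripping the extension are assumptions made for illustration, not the format used in the study.

```python
from collections import Counter


def file_pair(path):
    """Map 'foo.cpp', 'foo.c', or 'foo.h' to the file pair 'foo'
    (hypothetical naming rule); other paths are returned unchanged."""
    stem, dot, ext = path.rpartition(".")
    return stem if dot and ext in {"c", "cpp", "h"} else path


def history_metrics(commits):
    """Compute change, ticket, bug change, and pair change frequencies."""
    change, ticket, bug, pair_change = Counter(), Counter(), Counter(), Counter()
    for commit in commits:
        # A file pair counts once per commit, even if both halves were touched.
        pairs = sorted({file_pair(f) for f in commit["files"]})
        for u in pairs:
            change[u] += 1
            if commit.get("ticket"):
                ticket[u] += 1
                if commit.get("is_bug"):
                    bug[u] += 1
        for i, u in enumerate(pairs):
            for v in pairs[i + 1:]:
                pair_change[(u, v)] += 1
    return change, ticket, bug, pair_change


# Hypothetical commit log.
commits = [
    {"files": ["io.cpp", "io.h"], "ticket": "T-1", "is_bug": True},
    {"files": ["io.cpp", "stats.c"], "ticket": "T-2", "is_bug": False},
    {"files": ["stats.h"], "ticket": None, "is_bug": False},
]

change, ticket, bug, pair_change = history_metrics(commits)
print(change["io"], ticket["stats"], bug["io"], pair_change[("io", "stats")])
```

File size, fan-in, and fan-out come from the source tree rather than the history, so they are left out of this sketch.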
Scatter Plot Analysis
[Figure: scatter plot, "R7.5 Change Frequency vs. R7 Fan-out"; x-axis: R7 Fan-out (0 to 160), y-axis: R7.5 Change Frequency (0 to 80)]
Correlate Quality Metrics
• Non-parametric statistical test
• Many values fall at zero
– Ordinary Least Squares performs poorly
• Kendall’s tau-b
$$\tau_B(F, G) = \frac{\mathrm{concord}(F, G) - \mathrm{discord}(F, G)}{\mathrm{concord}(F, G) + \mathrm{discord}(F, G)}$$
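As a rough illustration of the statistic, the sketch below counts concordant and discordant pairs directly and evaluates the ratio as written on the slide. Because many metric values fall at zero, ties are common, so it also computes the standard tie-adjusted form of tau-b, whose denominator includes pairs tied in only one of the two variables. The function names and sample values are hypothetical, and the quadratic pair loop is meant for clarity, not for large data sets.

```python
from itertools import combinations
from math import sqrt


def pair_counts(f, g):
    """Return (concordant, discordant, tied only in f, tied only in g)."""
    concord = discord = ties_f = ties_g = 0
    for (f1, g1), (f2, g2) in combinations(list(zip(f, g)), 2):
        sign = (f1 - f2) * (g1 - g2)
        if sign > 0:
            concord += 1
        elif sign < 0:
            discord += 1
        elif f1 == f2 and g1 != g2:
            ties_f += 1
        elif g1 == g2 and f1 != f2:
            ties_g += 1
        # Pairs tied in both variables contribute to no count.
    return concord, discord, ties_f, ties_g


def tau_as_on_slide(f, g):
    """(concord - discord) / (concord + discord), the ratio shown on the slide."""
    c, d, _, _ = pair_counts(f, g)
    return (c - d) / (c + d) if c + d else float("nan")


def tau_b(f, g):
    """Tie-adjusted Kendall tau-b."""
    c, d, tf, tg = pair_counts(f, g)
    denom = sqrt((c + d + tf) * (c + d + tg))
    return (c - d) / denom if denom else float("nan")


# Hypothetical metric values for five file pairs.
fan_out = [0, 0, 1, 2, 3]
bug_changes = [0, 1, 1, 3, 2]

slide_value = tau_as_on_slide(fan_out, bug_changes)
tau_b_value = tau_b(fan_out, bug_changes)
print(slide_value, tau_b_value)
```

When there are no ties the two functions agree; with ties, the tie-adjusted value is smaller in magnitude.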
Correlate Quality Metrics
Tau-b table of metrics for SVS7 + SVS7.5

r7+r7.5     fan-in   fan-out   file size   changes   tickets   bugs
fan-in      1        0.257     0.301       0.331     0.328     0.464
fan-out     0.257    1         0.441       0.417     0.416     0.637
file size   0.301    0.441     1           0.293     0.273     0.510
changes     0.331    0.417     0.293       1         0.972     0.858
tickets     0.328    0.416     0.273       0.972     1         0.857
bugs        0.463    0.637     0.510       0.858     0.857     1
Presenting to Developers
• Findings are not surprising
– Most violations are connection points between modules
• Correlation between fan-out and bug change frequency
Study Comparisons
Similarities Differences
A select few files contributed to the majority of modularity violations
Correlation between fan-out and bug change frequency
Usefulness of identified modularity violations
Conclusions
• CLIO needs further refinement
– More repeated case studies
• Importance of domain knowledge
Questions