1
Supporting Software Evolution Using Adaptive Change Propagation Heuristics
Haroon Malik, Ahmed E. Hassan
School of Computing, Queen’s University, Canada
2
What is Change Propagation?
It is the process of propagating code changes to other entities in a software system.
It ensures the consistency of assumptions in the system after an entity is changed.
Mis-propagated changes are likely to introduce bugs.
3
The Change Propagation Process
“How does a change in one source code entity propagate to other entities?”
[Flowchart: New Req., Bug Fix -> Determine Initial Entity To Change -> Change Entity -> Determine Other Entities To Change -> (For Each Entity: back to Change Entity) -> No More Changes -> Consult Guru for Advice -> (Suggested Entity: back to Determine Other Entities To Change)]
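The loop in this flowchart can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `suggest` and `ask_guru` are hypothetical callbacks standing in for a heuristic and for the guru, the order of steps is read from the flowchart labels, and every suggested entity is assumed to really need the change.

```python
def propagate_change(initial_entity, suggest, ask_guru):
    """Walk the change propagation loop and return the entities changed, in order.

    suggest(entity)   -> iterable of entities to change next (hypothetical callback)
    ask_guru(changed) -> iterable of entities the guru says were missed (hypothetical)
    """
    changed = []
    worklist = [initial_entity]
    while True:
        # "Change Entity" / "Determine Other Entities To Change"
        while worklist:
            entity = worklist.pop()
            if entity in changed:
                continue
            changed.append(entity)
            worklist.extend(suggest(entity))
        # "No More Changes" -> "Consult Guru for Advice"
        missed = [e for e in ask_guru(changed) if e not in changed]
        if not missed:
            return changed
        worklist.extend(missed)


# Hypothetical project: A leads to B, B leads to C, and the guru remembers D.
links = {"A": ["B"], "B": ["A", "C"]}
result = propagate_change(
    "A",
    suggest=lambda e: links.get(e, []),
    ask_guru=lambda changed: [] if "D" in changed else ["D"],
)
print(result)  # ['A', 'B', 'C', 'D']
```

The more entities `suggest` finds on its own, the less often the guru has to be consulted, which is the motivation for better heuristics.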
4
Consider a change set with A, B and C changing together
[Figure: entities A, B and C.]
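5
Consider a change set with A, B and C changing together
[Figure: entities suggested by the HIST heuristic and by the CUD heuristic (static dependency), drawn from A, B, C, D and E. Each suggestion is labeled either HELPFUL or Wasted Developer time.]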
6
Consider a change set with A, B and C changing together
[Figure: entities suggested by the HIST heuristic and by the CUD heuristic (static dependency), drawn from A, B, C, D and E. Each suggestion is labeled either HELPFUL or Wasted Developer time.]
Which heuristic should we pick?
We should track the performance of a pool of heuristics over time for each entity.
7
Consider a change set with A, B and C changing together
[Figure: entities suggested by the HIST heuristic and by the CUD heuristic (static dependency). Each suggestion is labeled either HELPFUL or Wasted Developer time.]
Best Heuristic Table (BHT): tracks and updates
8
Consider a change set with A, B and C changing together
[Figure: the HIST and CUD (static dependency) suggestions, with entities A, E and D shown along a Time axis.]
HIST or CUD?
The BHT says HIST always works well with A [A-Freq], so we use HIST.
The BHT might also say HIST worked well with A last time [A-Rec].
9
Consider a change set with A, B and D changing together
[Figure: entities E, D and A.]
10
Consider a change set with A, B and D changing together
[Figure: entities E, D, A and B.]
11
Consider a change set with A, B and D changing together
[Figure: entities E, D, A, B, X and Y.]
Precision = 1/5 = 20%
Recall = 1/1 = 100%
We want high Precision and high Recall.
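The two measures on this slide can be computed as follows. This is a minimal sketch: the entity names in the example are hypothetical and chosen only to reproduce the slide's counts (five suggestions, one of which is the single entity that still needed to change), and returning 0 for an empty set is an assumed convention.

```python
def precision_recall(suggested, actual):
    """Return (precision, recall) for one set of suggested entities.

    precision = correct suggestions / all suggestions
    recall    = correct suggestions / entities that actually had to change
    """
    suggested, actual = set(suggested), set(actual)
    hits = len(suggested & actual)
    precision = hits / len(suggested) if suggested else 0.0  # assumed convention
    recall = hits / len(actual) if actual else 0.0           # assumed convention
    return precision, recall


# Hypothetical instance: five suggestions, one entity still to be changed.
p, r = precision_recall({"B", "E", "X", "Y", "Z"}, {"B"})
print(f"precision={p:.0%} recall={r:.0%}")  # precision=20% recall=100%
```

Low precision means the developer inspects many irrelevant entities; low recall means entities that must change are missed.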
12
Change Propagation Challenge
Mostly a manual and time-consuming process
Requires dependence on others:
  Knowledge of senior developers, who are usually too busy to guide every change
  Experience of a guru, who rarely exists in large projects
  Communication among different teams, which is itself a challenge in large projects
  Use of documentation and previous test suites, which are rarely up to date
13
Shortcomings of Current Practices
Each explores a single dimension:
  HIST: Given a changed entity A, a HIST heuristic suggests all entities that changed often with A in the past.
  CUD: Given a modified entity A, a CUD heuristic returns all entities that depend on A or that A depends on.
  FILE: Given a modified entity A, a FILE heuristic returns all entities in the same file as A.
Static heuristics:
  Do not adjust over time
  Do not adapt to the particular changed entity
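The three basic heuristics can be sketched over simple in-memory data. The data structures here (a list of change sets, a dependency map, an entity-to-file map) and the `min_count` threshold that stands in for "changed often" are assumptions made for illustration; the slides do not specify them.

```python
from collections import Counter
from itertools import permutations


def build_cochange_counts(change_sets):
    """Count how often each ordered pair of entities changed together."""
    counts = Counter()
    for change_set in change_sets:
        for a, b in permutations(set(change_set), 2):
            counts[(a, b)] += 1
    return counts


def hist(entity, cochange_counts, min_count=2):
    """HIST: entities that changed with `entity` at least `min_count` times (assumed threshold)."""
    return {b for (a, b), n in cochange_counts.items() if a == entity and n >= min_count}


def cud(entity, depends_on):
    """CUD: entities that `entity` depends on, plus entities that depend on it."""
    uses = set(depends_on.get(entity, ()))
    used_by = {e for e, deps in depends_on.items() if entity in deps}
    return (uses | used_by) - {entity}


def file_heuristic(entity, file_of):
    """FILE: all other entities defined in the same file as `entity`."""
    return {e for e, f in file_of.items() if f == file_of.get(entity) and e != entity}


# Hypothetical project data.
history = [["A", "B", "C"], ["A", "B"], ["A", "D"]]
depends_on = {"A": ["D"], "E": ["A"]}
file_of = {"A": "a.c", "B": "a.c", "C": "c.c", "D": "d.c", "E": "e.c"}

counts = build_cochange_counts(history)
print(sorted(hist("A", counts)))             # ['B']
print(sorted(cud("A", depends_on)))          # ['D', 'E']
print(sorted(file_heuristic("A", file_of)))  # ['B']
```

Each function looks at one source of evidence only and uses fixed rules, which is the limitation this slide points out.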
14
Proposed Approach
Adaptive co-change meta-heuristics:
  Track the best-performing heuristic for each entity in a Best Heuristic Table (BHT)
  Update the table as the project evolves
15
BHT Update
The BHT holds the best-performing heuristic for each entity:
  A-Recency (A-Rec): the best heuristic for the last change of an entity
  A-Frequency (A-Freq): the best heuristic over all changes of an entity
By continuously updating the BHT, we ensure that we always use the heuristic that has performed best so far for an entity.
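A Best Heuristic Table along these lines might look like the sketch below. Several details are assumptions rather than the authors' implementation: "best" is taken to mean the highest F-measure on a change, A-Freq is taken to mean the heuristic that was best most often, and the `default` returned for entities with no history is hypothetical.

```python
from collections import Counter, defaultdict


class BestHeuristicTable:
    """Per-entity record of which heuristic performed best on each past change."""

    def __init__(self, default="HIST"):
        self.default = default                   # assumed fallback for unseen entities
        self.last_best = {}                      # entity -> best heuristic on its last change
        self.best_counts = defaultdict(Counter)  # entity -> times each heuristic was best

    def update(self, entity, scores):
        """Record one change; `scores` maps heuristic name -> F-measure on that change."""
        best = max(scores, key=scores.get)       # ties go to the first key in `scores`
        self.last_best[entity] = best
        self.best_counts[entity][best] += 1

    def a_rec(self, entity):
        """A-Rec: the heuristic that was best the last time `entity` changed."""
        return self.last_best.get(entity, self.default)

    def a_freq(self, entity):
        """A-Freq: the heuristic that was best most often over all changes of `entity`."""
        counts = self.best_counts.get(entity)
        return counts.most_common(1)[0][0] if counts else self.default


bht = BestHeuristicTable()
bht.update("A", {"HIST": 0.5, "CUD": 0.1, "FILE": 0.2})
bht.update("A", {"HIST": 0.4, "CUD": 0.0, "FILE": 0.3})
bht.update("A", {"HIST": 0.1, "CUD": 0.6, "FILE": 0.2})
print(bht.a_freq("A"), bht.a_rec("A"))  # HIST CUD
```

In the example the two policies disagree: HIST was best on two of three changes, but CUD was best on the most recent one.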
16
Empirical Study
Used change sets from 5 open source projects with over 39 years of development: PostgreSQL, FreeBSD, GCluster and GCC
Recovered change sets from source control repositories (CVS)
Replayed the history to measure the performance
17
Performance Measures of Heuristics
Project      HIST         CUD          FILE         A-Freq       A-Rec
             Rec   Prec   Rec   Prec   Rec   Prec   Rec   Prec   Rec   Prec
PostgreSQL   0.69  0.14   0.44  0.02   0.73  0.13   0.45  0.25   0.40  0.30
FreeBSD      0.70  0.12   0.40  0.02   0.76  0.11   0.41  0.27   0.41  0.30
GCluster     0.52  0.18   0.38  0.09   0.70  0.14   0.39  0.22   0.35  0.28
GCC          0.78  0.10   0.43  0.02   0.80  0.12   0.51  0.21   0.47  0.25
All          0.67  0.13   0.41  0.04   0.74  0.12   0.44  0.23   0.40  0.28
F-measure    0.23         0.06         0.21         0.30         0.33

Recall: Adaptive heuristics are similar to traditional heuristics
Precision: Adaptive heuristics outperform traditional heuristics
F-measure: Adaptive heuristics outperform traditional heuristics (23% better than the best heuristic HIST)
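For readers who want to relate the F-measure row to the recall and precision columns, here is a small sketch. It assumes the standard balanced F-measure (the harmonic mean of precision and recall); the slides do not state which formula or averaging was used, so the values it prints from the "All" row land close to the reported row without necessarily matching every digit.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# "All" row of the table above, as (recall, precision) per heuristic.
all_row = {
    "HIST": (0.67, 0.13),
    "CUD": (0.41, 0.04),
    "FILE": (0.74, 0.12),
    "A-Freq": (0.44, 0.23),
    "A-Rec": (0.40, 0.28),
}
for name, (rec, prec) in all_row.items():
    print(f"{name}: {f_measure(prec, rec):.2f}")
```

Because the harmonic mean is dominated by the smaller of the two values, the higher precision of the adaptive heuristics outweighs their lower recall.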
18
Performance Characteristics of Adaptive Heuristics
To better understand our adaptive heuristics, we examined their performance along three directions:
  Performance over time
  BHT composition over time
  BHT suggestions vs. optimal suggestions
19
Performance Over Time
[Figure: two line charts, Precision and Recall by year (1993 to 2005), for HIST, CUD, FILE, A-Freq and A-Rec.]
For Precision: Adaptive heuristics outperform traditional heuristics.
For Recall: Adaptive heuristics do not perform as well as the traditional heuristics. Overall, A-Rec has lower recall than A-Freq for all projects.
20
BHT Composition over Time
[Figure: two charts, labeled A-Freq and A-Rec, plotting BHT composition (%) against Day(s) from 0 to 4000 for HIST, FILE and CUD. BHT for FreeBSD.]
All projects show the same trends.
At the start, history is not widely used.
As the project evolves, HIST becomes the most effective.
21
BHT Suggestion vs. Optimal
Since we are replaying historical change sets, we can compare the adaptive heuristics against an optimal heuristic.
The optimal heuristic always (100% of the time) suggests the best heuristic.
Suggestion: the adaptive heuristics suggest the correct heuristic 76-85% of the time.
Performance:
  Adaptive heuristics reach 63% of the optimal F-measure.
  HIST, the best-performing basic heuristic, reaches 44% of the optimal.
  37% room for improvement.
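One way such a comparison could be computed from a replay is sketched below. The input format, the use of per-change F-measure as the score, and the ratio of summed scores as "percent of optimal" are all assumptions made for illustration; the slides do not describe the exact procedure.

```python
def compare_to_optimal(replay):
    """Compare suggested heuristics against a hindsight-optimal choice.

    `replay` is a list of (suggested_heuristic, scores) pairs, one per replayed
    change, where `scores` maps heuristic name -> F-measure on that change.
    Returns (agreement_rate, fraction_of_optimal_performance).
    """
    if not replay:
        return 0.0, 0.0
    agree = 0
    achieved = 0.0
    optimal = 0.0
    for suggested, scores in replay:
        best = max(scores.values())      # what the optimal heuristic would score
        if scores[suggested] == best:    # ties count as a correct suggestion
            agree += 1
        achieved += scores[suggested]
        optimal += best
    return agree / len(replay), (achieved / optimal if optimal else 0.0)


# Hypothetical replay of four changes.
replay = [
    ("HIST", {"HIST": 0.5, "CUD": 0.1}),
    ("HIST", {"HIST": 0.2, "CUD": 0.4}),
    ("CUD", {"HIST": 0.1, "CUD": 0.3}),
    ("CUD", {"HIST": 0.6, "CUD": 0.3}),
]
agreement, relative = compare_to_optimal(replay)
print(f"agreement={agreement:.0%} of-optimal={relative:.0%}")
```

The optimal heuristic is only computable in a replay, since it needs to know the outcome of each change in advance.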
22
Improving the Performance of Adaptive Heuristics
Improve HIST with more advanced techniques, in the hope of improving the adaptive heuristics
Two improved HIST heuristics [Hassan, Holt: 2005]:
  RECN(M): given a changed entity E, RECN(M) suggests all entities that changed with E in the past M months.
  FREQ(A): given a changed entity E, FREQ(A) suggests all entities that changed with E at least twice in the past and changed more than A% of the time with E.
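The two definitions can be sketched as follows. The history format (a list of dated change sets) is hypothetical, a month is approximated as 30 days, and "A% of the time with E" is read as a share of the changes to E; these are assumptions and may differ from the cited work.

```python
from datetime import date, timedelta


def recn(entity, history, today, months):
    """RECN(M): entities that changed with `entity` in the past `months` months.

    A month is approximated as 30 days (an assumption).
    """
    cutoff = today - timedelta(days=30 * months)
    suggested = set()
    for when, change_set in history:
        if when >= cutoff and entity in change_set:
            suggested |= set(change_set) - {entity}
    return suggested


def freq(entity, history, percent):
    """FREQ(A): entities that changed with `entity` at least twice and in more
    than `percent` % of the changes to `entity` (assumed reading of the definition)."""
    changes = [set(cs) for _, cs in history if entity in cs]
    together = {}
    for cs in changes:
        for other in cs - {entity}:
            together[other] = together.get(other, 0) + 1
    return {o for o, n in together.items()
            if n >= 2 and 100 * n / len(changes) > percent}


# Hypothetical dated change sets.
history = [
    (date(2005, 1, 10), ["E", "X", "Y"]),
    (date(2005, 3, 1), ["E", "X"]),
    (date(2005, 5, 20), ["E", "Z"]),
    (date(2005, 6, 1), ["X", "Z"]),
]
print(sorted(recn("E", history, date(2005, 6, 15), 4)))  # ['X', 'Z']
print(sorted(freq("E", history, 60)))                    # ['X']
```

RECN favors recent co-changes regardless of how often they occurred, while FREQ favors stable, repeated co-changes regardless of age.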
23
Improved HIST Heuristics
Integrated RECN(4) and FREQ(60) into the heuristic pool used by the adaptive meta-heuristics
Achieved 0.73 to 0.78 for Recall and 0.64 for Precision
Nearly 30% increase in performance:
  A-Freq is within 91% of the optimal heuristic
  A-Rec is within 93% of the optimal heuristic

RECN(M)   F-Measure   FREQ(A)    F-Measure
RECN(2)   0.39        FREQ(50)   0.39
RECN(4)   0.40        FREQ(60)   0.44
RECN(6)   0.34        FREQ(70)   0.42
RECN(8)   0.28        FREQ(80)   0.39
24
Findings
Adaptive heuristics can achieve:
  0.73 to 0.78 for Recall and 0.64 for Precision
  57% improvement over traditional heuristics
Performance differences are statistically significant based on a paired Wilcoxon signed-rank test at the 5% level of significance (alpha = 0.05).
25
Conclusion