supporting software evolution using adaptive change propagation

1

Supporting Software Evolution Using Adaptive Change Propagation Heuristics

Haroon Malik, Ahmed E. Hassan

School of Computing, Queen’s University, Canada

2

What is Change Propagation

It is the process of propagating code changes to other entities in a software system.

It ensures the consistency of assumptions in the system after an entity is changed.

Mis-propagated changes are likely to introduce bugs.

3

The Change Propagation Process

[Figure: flowchart of the change propagation process. Nodes: New Req., Bug Fix; Determine Initial Entity To Change; Change Entity; Determine Other Entities To Change; Consult Guru for Advice. Edge labels: Suggested Entity; For Each Entity; No More Changes.]

“How does a change in one source code entity propagate to other entities?”
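One possible reading of the process on this slide is a worklist loop: change an entity, ask for other entities to change, and repeat until nothing more is suggested. The sketch below is only an illustration under that assumption. The names `propagate_change`, `suggest` and `needs_change` are hypothetical, and the "Consult Guru for Advice" step is folded into the `suggest` callback rather than modelled separately.

```python
from typing import Callable, Iterable, List, Set


def propagate_change(
    initial_entity: str,
    suggest: Callable[[str], Iterable[str]],
    needs_change: Callable[[str], bool],
) -> Set[str]:
    """Return every entity changed while propagating one initial change.

    suggest(entity) stands in for a heuristic or a guru (assumption).
    needs_change(entity) stands in for the developer's decision (assumption).
    """
    changed: Set[str] = {initial_entity}
    examined: Set[str] = {initial_entity}
    worklist: List[str] = [initial_entity]
    while worklist:  # an empty worklist corresponds to "No More Changes"
        entity = worklist.pop()
        for candidate in suggest(entity):
            if candidate in examined:
                continue  # never examine the same entity twice
            examined.add(candidate)
            if needs_change(candidate):
                changed.add(candidate)
                worklist.append(candidate)
    return changed


# Hypothetical example: A suggests B, C and D, but only B and C need changing.
suggestions = {"A": ["B", "C", "D"], "B": ["C"], "C": []}
result = propagate_change(
    "A",
    suggest=lambda e: suggestions.get(e, []),
    needs_change=lambda e: e in {"B", "C"},
)
```

In this example `D` is a wasted suggestion: it is examined but never changed.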

Consider change set with A, B and C changing together

4

[Figure: entities A, B and C.]


5

[Figure: entities A, B, C, D and E. Labels: HIST Heuristic; CUD Heuristic (static dependency); Helpful; Wasted developer time.]


6

[Figure: entities A, B, C, D and E, with the HIST Heuristic label.]



Which heuristics should we pick?

We should track the performance of a pool of heuristics over time for each entity.


7

[Figure: entities A, B, C and D, with the HIST Heuristic label.]



Best Heuristic table (BHT)

Tracks and updates


8

[Figure: entities A, B, C, D and E laid out along a Time axis, with the HIST Heuristic label.]

HIST or CUD? The BHT says HIST has always worked well with A [A-Freq], so we use HIST.

The BHT might also say HIST worked well with A the last time [A-Rec].

Consider change set with A, B and D changing together

9

[Figure: entities E, D and A.]


10

[Figure: entities E, D, A and B.]


11

[Figure: entities E, D, A, B, X and Y.]

Precision = 1/5 = 20%

Recall = 1/1 = 100%

We want high Precision & high Recall
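The arithmetic on this slide can be reproduced with a few lines of code. This is a minimal sketch, assuming the usual set-based definitions: precision is the share of suggested entities that actually changed, and recall is the share of entities that actually changed that were suggested. The function name and the entity names in the example are hypothetical, and returning 0.0 for an empty set is a convention chosen here, not one stated on the slides.

```python
from typing import Iterable, Tuple


def precision_recall(suggested: Iterable[str], actual: Iterable[str]) -> Tuple[float, float]:
    """Compute (precision, recall) of a set of suggestions against the truth."""
    suggested_set, actual_set = set(suggested), set(actual)
    correct = suggested_set & actual_set
    # Assumption: an empty denominator yields 0.0 rather than an error.
    precision = len(correct) / len(suggested_set) if suggested_set else 0.0
    recall = len(correct) / len(actual_set) if actual_set else 0.0
    return precision, recall


# Hypothetical case matching the slide's numbers:
# five entities suggested, one of which is the single entity that had to change.
precision, recall = precision_recall(["B", "E", "X", "Y", "Z"], ["B"])
```

Here `precision` is 1/5 and `recall` is 1/1, which illustrates why a heuristic that suggests many entities can reach high recall while wasting developer time.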

12

Change Propagation Challenge

Mostly manual & time-consuming process

Requires dependence on others:

Knowledge of senior developers, who are usually too busy to guide every change

Experience of a guru, who rarely exists in large projects

Communication among different teams, which is itself a challenge in large projects

Use of documentation & previous test suites, which are rarely up to date

13

Shortcomings of Current Practices

Explores a single dimension

HIST: Given a changed entity A, a HIST heuristic would suggest all entities that changed often with A in the past.

CUD: Given a modified entity A, a CUD heuristic returns all entities that depend on A or that A depends on.

FILE: Given a modified entity A, a FILE heuristic would return all entities in the same file as A.

Static heuristics

Do not adjust over time, nor adapt to the particular changed entity
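The three basic heuristics described on this slide can be sketched as small set-returning functions. This is an illustration only, not the authors' implementation. The data structures (`past_change_sets`, `depends_on`, `file_of`) and the `min_count` threshold used to approximate "changed often" are assumptions introduced for the example.

```python
from collections import Counter
from typing import Dict, List, Set


def hist(entity: str, past_change_sets: List[Set[str]], min_count: int = 1) -> Set[str]:
    """HIST: entities that changed together with `entity` in the past.

    Assumption: "often" is approximated by a minimum co-change count.
    """
    counts: Counter = Counter()
    for change_set in past_change_sets:
        if entity in change_set:
            counts.update(change_set - {entity})
    return {e for e, n in counts.items() if n >= min_count}


def cud(entity: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """CUD: entities that `entity` depends on, plus entities that depend on it."""
    uses = set(depends_on.get(entity, set()))
    used_by = {e for e, targets in depends_on.items() if entity in targets}
    return (uses | used_by) - {entity}


def file_heuristic(entity: str, file_of: Dict[str, str]) -> Set[str]:
    """FILE: all other entities defined in the same file as `entity`."""
    path = file_of.get(entity)
    if path is None:
        return set()
    return {e for e, p in file_of.items() if p == path and e != entity}


# Hypothetical data for illustration.
past = [{"A", "B", "C"}, {"A", "B"}, {"D", "E"}]
deps = {"A": {"D"}, "E": {"A"}}
files = {"A": "x.c", "F": "x.c", "B": "y.c"}
```

Each function looks at one dimension only (history, static dependencies, or file layout), and none of them changes its behaviour based on how well it did for a given entity.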

14

Proposed Approach

Adaptive co-change meta-heuristics:

Track the best-performing heuristics for each entity in a Best Heuristic Table (BHT)

Update the table as the project evolves

15

BHT Update

BHT has the best-performing heuristics

A-Recency: for the last change of an entity

A-Frequency: over all changes of an entity

By continuously updating the BHT, we ensure that we are always using the best-performing heuristic for an entity
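A Best Heuristic Table with both selection policies might look like the sketch below. It is written under several assumptions that the slides do not state: the "best" heuristic for a change is the one with the highest F-measure, ties go to the heuristic listed first, and a default heuristic is used for entities with no history. The class name, method names and `set_f_measure` helper are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Set

Heuristic = Callable[[str], Set[str]]


def set_f_measure(suggested: Set[str], actual: Set[str]) -> float:
    """Balanced F-measure of a suggestion set (assumed scoring function)."""
    correct = len(suggested & actual)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(suggested), correct / len(actual)
    return 2 * precision * recall / (precision + recall)


class BestHeuristicTable:
    def __init__(self, heuristics: Dict[str, Heuristic], default: str):
        self.heuristics = heuristics
        self.default = default                   # used when an entity has no history
        self.last_best: Dict[str, str] = {}      # A-Recency: winner of the last change
        self.win_counts = defaultdict(Counter)   # A-Frequency: wins over all changes

    def update(self, entity: str, actual_co_changes: Set[str]) -> None:
        """Record which heuristic did best for this change of `entity`."""
        scores = {name: set_f_measure(h(entity), actual_co_changes)
                  for name, h in self.heuristics.items()}
        best = max(scores, key=scores.get)  # ties: first heuristic in the dict
        self.last_best[entity] = best
        self.win_counts[entity][best] += 1

    def pick_recency(self, entity: str) -> str:
        return self.last_best.get(entity, self.default)

    def pick_frequency(self, entity: str) -> str:
        counts = self.win_counts.get(entity)
        return counts.most_common(1)[0][0] if counts else self.default


# Hypothetical pool: HIST suggests {B, C}, CUD suggests {D, E}.
bht = BestHeuristicTable(
    {"HIST": lambda e: {"B", "C"}, "CUD": lambda e: {"D", "E"}}, default="HIST")
bht.update("A", {"B", "C"})   # HIST wins
bht.update("A", {"B"})        # HIST wins
bht.update("A", {"D"})        # CUD wins
```

After these three updates the two policies disagree: A-Recency picks CUD (the most recent winner), while A-Frequency picks HIST (two wins out of three).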

16

Empirical Study

Used change sets from 5 open source projects with over 39 years of development: PostgreSQL, FreeBSD, GCluster and GCC

Recovered change sets from source control repositories (CVS)

Replayed the history to measure the performance
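Replaying history can be pictured as walking the change sets in order, asking a heuristic for suggestions using only what came before, and scoring the suggestions against what actually changed. The sketch below is a simplified illustration, not the study's evaluation protocol. It assumes each entity in a change set is treated in turn as the changed entity, averages precision and recall over all such queries, and scores an empty suggestion set as precision 0.0. All names are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Suggest = Callable[[str, List[Set[str]]], Set[str]]


def replay(change_sets: List[Set[str]], suggest: Suggest) -> Tuple[float, float]:
    """Return (average precision, average recall) over a replayed history."""
    precisions: List[float] = []
    recalls: List[float] = []
    history: List[Set[str]] = []  # change sets seen so far
    for change_set in change_sets:
        for entity in sorted(change_set):
            actual = change_set - {entity}
            if not actual:
                continue  # a single-entity change set has nothing to propagate
            suggested = suggest(entity, history)
            correct = len(suggested & actual)
            precisions.append(correct / len(suggested) if suggested else 0.0)
            recalls.append(correct / len(actual))
        history.append(change_set)  # only now does this change become "the past"
    n = len(precisions)
    return (sum(precisions) / n, sum(recalls) / n) if n else (0.0, 0.0)


def co_changed(entity: str, history: List[Set[str]]) -> Set[str]:
    """A simple history-based suggester: everything that ever changed with `entity`."""
    return set().union(*[cs - {entity} for cs in history if entity in cs])


# Hypothetical history: A and B change together twice.
avg_precision, avg_recall = replay([{"A", "B"}, {"A", "B"}], co_changed)
```

In the first change set nothing is known yet, so both queries score zero. In the second, the history predicts each co-change correctly, giving an average of 0.5 for both measures.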

17

Performance Measures of Heuristics

Project     HIST         CUD          FILE         A-Freq       A-Rec
            Rec   Prec   Rec   Prec   Rec   Prec   Rec   Prec   Rec   Prec
PostgreSQL  0.69  0.14   0.44  0.02   0.73  0.13   0.45  0.25   0.40  0.30
FreeBSD     0.70  0.12   0.40  0.02   0.76  0.11   0.41  0.27   0.41  0.30
GCluster    0.52  0.18   0.38  0.09   0.70  0.14   0.39  0.22   0.35  0.28
GCC         0.78  0.10   0.43  0.02   0.80  0.12   0.51  0.21   0.47  0.25
All         0.67  0.13   0.41  0.04   0.74  0.12   0.44  0.23   0.40  0.28
F-measure   0.23         0.06         0.21         0.30         0.33

Recall: Adaptive heuristics are similar to traditional heuristics

Precision: Adaptive heuristics outperform traditional heuristics

F-measure: Adaptive heuristics outperform traditional heuristics (23% better than the best heuristic HIST)
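The F-measure row can be approximately recomputed from the "All" row, assuming the usual balanced F-measure (the harmonic mean of precision and recall); the slides do not spell out the formula. The sketch below uses the rounded values from the table for FILE, A-Freq and A-Rec. Recomputing HIST and CUD from the rounded values gives 0.22 and 0.07 rather than the reported 0.23 and 0.06, presumably because the reported figures were derived from unrounded data.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) pairs taken from the "All" row of the table.
overall = {
    "FILE": (0.74, 0.12),
    "A-Freq": (0.44, 0.23),
    "A-Rec": (0.40, 0.28),
}
scores = {name: round(f_measure(prec, rec), 2) for name, (rec, prec) in overall.items()}
```

The resulting values for FILE, A-Freq and A-Rec agree with the F-measure row when rounded to two decimals.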

18

Performance Characteristics of Adaptive Heuristics

To better understand our adaptive heuristics, we examined their performance along three directions:

Performance over time

BHT composition over time

BHT suggestions vs. optimal suggestions

19

Performance Over Time

[Figure: two line charts, Precision (0 to 0.35) and Recall (0 to 0.8), plotted against Years (1993 to 2005) for HIST, CUD, File, A-Freq and A-Rec.]

For Precision: Adaptive heuristics outperform traditional heuristics.

For Recall: Adaptive heuristics do not perform as well as the traditional heuristics. Overall, A-Rec has lower recall than A-Freq for all projects.

20

BHT Composition over Time

[Figure: two charts, one for A-Freq and one for A-Rec, plotting BHT composition (%) against Day(s) from 0 to 4000, with series HIST, FILE and CUD.]

BHT for FreeBSD

All projects show the same trends

At the start, history (HIST) is not widely used

As the projects evolve, HIST is the most effective

21

BHT Suggestion vs. Optimal

Since we are replaying historical change sets, we can compare the adaptive heuristics against an optimal heuristic

The optimal heuristic suggests the best heuristic 100% of the time

Suggestion: # of correctly suggested heuristics: 76-85%

Performance: 63% of optimal F-measure

HIST, the best-performing basic heuristic, is 44% of optimal

37% room for improvement

22

Improving the Performance of Adaptive Heuristics

Improve HIST, in the hope of improving the adaptive heuristics, by employing advanced techniques

Two improved HIST heuristics [Hassan, Holt: 2005]

RECN(M): given a changed entity E, RECN(M) suggests all entities that changed with E in the past M months.

FREQ(A): given a changed entity E, FREQ(A) suggests all entities that changed with E at least twice in the past and changed more than A% of the time with E.
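The two improved history heuristics can be sketched directly from their definitions on this slide. This is an illustration under stated assumptions, not the implementation from [Hassan, Holt: 2005]: a month is approximated as 30 days, "A% of the time" is measured against the number of past changes of E, and history is a list of dated change sets. The function and type names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta
from typing import List, Set, Tuple

DatedChangeSet = Tuple[date, Set[str]]


def recn(entity: str, history: List[DatedChangeSet], today: date, months: int) -> Set[str]:
    """RECN(M): entities that changed with `entity` in the past M months."""
    cutoff = today - timedelta(days=30 * months)  # assumption: 30-day months
    result: Set[str] = set()
    for when, change_set in history:
        if entity in change_set and when >= cutoff:
            result |= change_set - {entity}
    return result


def freq(entity: str, history: List[DatedChangeSet], percent: float) -> Set[str]:
    """FREQ(A): entities that co-changed at least twice and in more than A% of changes."""
    total = sum(1 for _, change_set in history if entity in change_set)
    counts: Counter = Counter()
    for _, change_set in history:
        if entity in change_set:
            counts.update(change_set - {entity})
    # counts is empty when total == 0, so the division below is never reached then.
    return {e for e, n in counts.items() if n >= 2 and 100.0 * n / total > percent}


# Hypothetical history of entity E.
history = [
    (date(2005, 1, 10), {"E", "X", "Y"}),
    (date(2005, 3, 1), {"E", "X"}),
    (date(2005, 6, 1), {"E", "Z"}),
]
recent = recn("E", history, today=date(2005, 6, 15), months=4)
frequent = freq("E", history, percent=60)
```

With this data, `recent` contains X and Z (the January change falls outside the four-month window), and `frequent` contains only X, which co-changed with E in two of three changes.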

23

Improved HIST heuristics

Integrated RECN(4) and FREQ(60) into the heuristic pool used by the adaptive meta-heuristics

Achieved 0.73 to 0.78 for Recall and 0.64 for Precision

Nearly 30% increase in performance:

A-Freq is within 91% of the optimal heuristic

A-Rec is within 93% of the optimal heuristic

RECN(M)   F-Measure   FREQ(A)    F-Measure
RECN(2)   0.39        FREQ(50)   0.39
RECN(4)   0.40        FREQ(60)   0.44
RECN(6)   0.34        FREQ(70)   0.42
RECN(8)   0.28        FREQ(80)   0.39

24

Findings

Adaptive heuristics can achieve:

0.73 to 0.78 for Recall and 0.64 for Precision

57% improvement over traditional heuristics

Performance differences are statistically significant based on a paired Wilcoxon signed-rank test at the 5% level of significance (alpha = 0.05)

25

Conclusion