PROMISE 2011: "Selecting Discriminating Terms for Bug Assignment: A Formal Analysis"

Selecting Discriminating Terms for Bug Assignment: A Formal Analysis. Ibrahim Aljarah, Shadi Banitaan, Sameer Abufardeh, Wei Jin and Saeed Salem North Dakota State University, Fargo, ND, USA This research is supported by


TRANSCRIPT

Page 1: Promise 2011: "Selecting Discriminating Terms for Bug Assignment: A Formal Analysis"

Selecting Discriminating Terms for Bug Assignment: A Formal Analysis

Ibrahim Aljarah, Shadi Banitaan, Sameer Abufardeh, Wei Jin and Saeed Salem

North Dakota State University, Fargo, ND, USA

This research is supported by

Page 2:

Presentation Outline

Bug Assignment Problem Overview

Bug Assignment Steps

Term Selection

Log Odds Ratio-based Term Selection Techniques

Experimental Results

Conclusion

Future Directions

Page 3:

Bug Assignment Problem

Suggest whom to assign a new bug to: the bug triager assigns each incoming bug report to an appropriate developer.

[Figure: new bugs B1-B7 flowing through the bug triager to developers D1-D4.]

Page 4:

[Figure-only slide, dated 9/21/2011.]

Page 5:

Bug Assignment Steps

Page 6:

Bug Reports Preprocessing

Page 7:

Bug-term matrix (M) and bug-developer vector (Y) construction

T = {t1, t2, ..., tR} is a set of R terms. D = {d1, ..., dL} is a set of L pre-defined developers. B = {b1, ..., bN} is a set of N bug reports to be assigned. Each entry of the bug-term matrix M is assigned a value in {0, 1}: entry (i, j) is 1 if term tj occurs in bug report bi. The vector Y records, for each bug report, the developer who fixed it.

[Figure: the N x R binary matrix M (rows b1..bN, columns t1..tR) alongside the vector Y of developer labels.]
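The construction above can be sketched in a few lines of Python. The report texts and developer labels below are hypothetical placeholders; the binary (0/1) term weighting follows the slide.

```python
# Build the bug-term matrix M and the bug-developer vector Y from
# preprocessed (tokenized) bug reports. Reports are (text, developer) pairs.
reports = [
    ("crash when opening editor", "d1"),
    ("editor crash on startup", "d1"),
    ("button label misaligned", "d3"),
]

# T: the set of R distinct terms, in a fixed column order.
terms = sorted({t for text, _ in reports for t in text.split()})

# M[i][j] = 1 if term j occurs in bug report i, else 0 (binary weighting).
M = [[1 if t in text.split() else 0 for t in terms] for text, _ in reports]

# Y[i] = developer who fixed bug report i.
Y = [dev for _, dev in reports]
```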

Page 8:

Term Selection

Term selection chooses a subset of terms to describe each bug report.

Selecting fewer terms reduces computation time and can lead to a significant improvement in classification performance.

Common techniques: Information Gain, Latent Semantic Analysis.

Page 9:

Discriminating Terms

A discriminating term is one that is commonly found in the bug reports fixed by a specific developer, but rarely found in other bug reports.

The Log Odds Ratio score is used to decide which terms are discriminating.

Research goal: improve classification quality by discarding non-discriminating terms before the classification task (bug assignment).

Page 10:

Log Odds Ratio (LOR)

The LOR score is calculated with respect to an individual developer (class); it discriminates the terms in that class. A higher score means the term is more discriminating.

The LOR score is calculated as follows (the formula image is missing from the transcript; this reading is reconstructed from the worked example on the next slide):

LOR(t | d) = P(t | d) * log( P(t | d) / P(t | not d) )

where P(t | d) is the fraction of d's bug reports that contain term t, and P(t | not d) is the fraction of the remaining bug reports that contain t.

Page 11:

Log Odds Ratio Calculation Example

LogOdds(Term1 | D1) = 2/3 * log((2/3) / (1/4)) = 1.78 (Term1 has the highest Log Odds Ratio)

LogOdds(Term2 | D1) = 1/3 * log((1/3) / (1/4)) = 0.44

LogOdds(Term3 | D1) = 2/3 * log((2/3) / (2/4)) = 0.88

             Term1  Term2  Term3  Class
Bug Report1    1      1      1     D1
Bug Report2    1      0      0     D1
Bug Report3    0      0      1     D1
Bug Report4    0      1      1     D2
Bug Report5    0      0      0     D2
Bug Report6    0      0      1     D3
Bug Report7    1      0      0     D3
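A minimal sketch of this calculation on the seven-report example, reading the score as LOR(t | d) = P(t | d) * log2(P(t | d) / P(t | not d)). The log base is an assumption (it is not shown on the slide), so the absolute values differ from the slide's printed numbers, but the resulting ranking matches the slide: Term1 > Term3 > Term2 for D1.

```python
from math import log2

rows = [  # (Term1, Term2, Term3, developer) from the example table
    (1, 1, 1, "D1"), (1, 0, 0, "D1"), (0, 0, 1, "D1"),
    (0, 1, 1, "D2"), (0, 0, 0, "D2"),
    (0, 0, 1, "D3"), (1, 0, 0, "D3"),
]

def lor(term_idx, dev):
    """LOR(t|d): p = fraction of d's reports containing t,
    q = fraction of the remaining reports containing t."""
    ours = [r for r in rows if r[3] == dev]
    rest = [r for r in rows if r[3] != dev]
    p = sum(r[term_idx] for r in ours) / len(ours)
    q = sum(r[term_idx] for r in rest) / len(rest)
    return p * log2(p / q) if p and q else 0.0

scores = [lor(i, "D1") for i in range(3)]  # Term1, Term2, Term3 vs D1
```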

Page 12:

Proposed Term Selection Techniques

Log-Odds-Ratio-based techniques

Terms From All selection (TFA)

In this method, the R' terms with the highest LOR scores are chosen without considering how the terms are distributed over developers:

The LOR scores of the terms in every class are combined into one common list.

The scores are sorted.

The R' highest-scoring terms are extracted from the list.
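The TFA procedure above can be sketched as follows; the scores used in the usage example are hypothetical.

```python
def tfa(lor, r_prime):
    """Terms-From-All: pool every (score, term) pair across all classes,
    sort by score descending, and keep the first r_prime distinct terms."""
    pooled = sorted(
        ((s, t) for t, per_dev in lor.items() for s in per_dev.values()),
        reverse=True,
    )
    selected = []
    for _, t in pooled:
        if t not in selected:
            selected.append(t)
        if len(selected) == r_prime:
            break
    return selected

# Hypothetical LOR scores for three terms and two developers:
lor = {"t1": {"d1": 0.2, "d2": 1.9}, "t2": {"d1": 1.7, "d2": 0.1},
       "t3": {"d1": 0.5, "d2": 0.4}}
top2 = tfa(lor, 2)  # the two highest pooled scores belong to t1 and t2
```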

Page 13:

Terms From All selection (TFA)

We have 12 bug reports, 3 developers, and 10 different terms. To select 6 terms and generate the reduced bug-term matrix M', we take the 6 terms with the highest scores, regardless of their distribution across developers.

[Figure: the 12 x 10 binary matrix M with its developer vector Y, next to the LOR values below.]

LOR Values (virtual)

        d1     d2     d3
t1     1.04   1.95   1.33
t2     1.75   1.64   1.02
t3     1.07   1.43   1.35
t4     1.54   1.88   1.62
t5     1.85   1.16   1.53
t6     1.19   1.23   1.23
t7     1.63   1.43   1.67
t8     1.12   1.92   1.43
t9     1.12   1.12   1.39
t10    1.13   1.98   1.11

Page 14:

Proposed Term Selection Techniques

Log-Odds-Ratio-based techniques

Term-Class Related selection (TCR)

Idea: select k terms from each class (developer). This enhances the selection criteria by targeting the terms with the highest LOR scores in each class.

Two ways are suggested to specify k:

Equally Likely.

Variable.

Page 15: Promise 2011: "Selecting Discriminating Terms for Bug Assignment: A Formal Analysis"

TCR- ki Equally Likely:

Choosing fixed number of terms for each class. (k)

For example:

if we have 10 classes (developers) and we need to select

100 terms then we select 10 terms from the highest LOR

scored terms for each developer.

We maintain a unique set of terms, i.e., the number of

obtained terms R′ can be less than or equal to k × L.

Proposed Term Selection Techniques

Log-Odds-Ratio-based techniques 15
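A sketch of the equally-likely selection, keeping the union of the per-class picks so duplicates count once; the scores in the usage example are hypothetical.

```python
def tcr_equal(lor, k):
    """TCR with ki equally likely: take the k highest-LOR terms from each
    class, then keep the union (so the result can be smaller than k * L)."""
    selected = set()
    devs = {d for per_dev in lor.values() for d in per_dev}
    for d in devs:
        ranked = sorted(lor, key=lambda t: lor[t].get(d, 0.0), reverse=True)
        selected.update(ranked[:k])
    return selected

# Hypothetical LOR scores for three terms and two developers:
lor = {"t1": {"d1": 0.2, "d2": 1.9}, "t2": {"d1": 1.7, "d2": 0.1},
       "t3": {"d1": 0.5, "d2": 1.8}}
picked = tcr_equal(lor, 1)  # top term per developer: t2 for d1, t1 for d2
```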

Page 16:

TCR - k Equally Likely

We have 12 bug reports, 3 developers, and 10 different terms. To select 6 terms and generate the reduced bug-term matrix M', we take 2 terms for d1, 2 terms for d2, and 2 terms for d3.

[Figure: the 12 x 10 binary matrix M with its developer vector Y, next to the LOR values below.]

LOR Values (virtual)

        d1     d2     d3
t1     1.04   1.95   1.33
t2     0.75   1.64   1.02
t3     1.07   1.43   1.35
t4     1.54   1.88   1.62
t5     1.85   1.16   1.53
t6     1.19   1.23   1.23
t7     1.63   0.43   1.67
t8     1.12   1.92   1.43
t9     1.12   1.12   1.39
t10    1.13   1.98   1.11

Page 17:

TCR - ki Variable

Choose a variable number of terms ki for each class. ki is specified based on the developer's fixing rate.

Fixing rate: proportional to the number of bug reports assigned to the developer out of all available bug reports.

Selection of the highest-scored terms with R' = 20 from 100 bug reports and 5 developers:

[Figure: per-developer ki allocation.]
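The ki allocation can be sketched as follows. The slide shows only the resulting counts, so the rounding scheme here (floor the proportional share, then give leftover slots to the most active developers) is an assumption.

```python
def variable_k(fixed_counts, r_prime):
    """TCR with ki variable: split the R' term budget across developers
    in proportion to each developer's fixing rate (share of fixed reports)."""
    total = sum(fixed_counts.values())
    k = {d: r_prime * n // total for d, n in fixed_counts.items()}
    leftover = r_prime - sum(k.values())
    # Hand out any remaining slots to the developers with the most fixes.
    for d in sorted(fixed_counts, key=fixed_counts.get, reverse=True):
        if leftover == 0:
            break
        k[d] += 1
        leftover -= 1
    return k

# Hypothetical fixing counts: shares divide evenly, so ki = 5, 3, 2.
ks = variable_k({"d1": 50, "d2": 30, "d3": 20}, 10)
```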

Page 18:

TCR - ki Variable

We have 12 bug reports, 3 developers, and 10 different terms. To select 6 terms and generate the reduced bug-term matrix M', we take 3 terms for d1, 2 terms for d2, and 1 term for d3.

[Figure: the 12 x 10 binary matrix M with its developer vector Y, next to the LOR values below.]

LOR Values (virtual)

        d1     d2     d3
t1     1.04   1.95   1.33
t2     1.75   1.64   1.02
t3     1.07   1.43   1.35
t4     1.54   1.88   1.62
t5     1.85   1.16   1.53
t6     1.19   1.23   1.23
t7     1.13   1.43   1.67
t8     1.12   1.92   1.43
t9     1.63   1.12   1.39
t10    1.13   1.98   1.11

Page 19:

Reduced bug-term matrix M'

Term selection maps the full N x R matrix M (columns t1..tR) to the reduced N x R' matrix M' (columns restricted to the selected terms), leaving the bug reports b1..bN and the developer vector Y unchanged.

[Figure: M and M' side by side, rows b1..bN with developer labels d1..dL.]

Page 20:

Training and Testing Data Preparation

Applying 5-fold cross-validation: the rows of the reduced matrix M' are split into a training data set (bug reports with known developer labels) and a testing data set (bug reports whose developer is to be predicted).

[Figure: M' partitioned into a training set (e.g., B1..B5) and a testing set (e.g., B6..B10) with their developer labels.]
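The 5-fold split can be sketched with plain round-robin fold assignment; the slide does not show the exact fold construction, so this is an illustrative scheme only. Each fold serves once as the test set while the other four form the training set.

```python
def five_fold_indices(n_reports, n_folds=5):
    """Return (train, test) index lists for each of the n_folds splits."""
    folds = [[] for _ in range(n_folds)]
    for i in range(n_reports):          # round-robin fold assignment
        folds[i % n_folds].append(i)
    splits = []
    for f in range(n_folds):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        splits.append((train, test))
    return splits

splits = five_fold_indices(10)  # 10 bug reports, 5 folds of 2 reports each
```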

Page 21:

Experimental results

Eclipse Project Bugs Dataset:

A variety of open bug repositories are used in open source development; our experiments are applied on the Bugzilla repository of the Eclipse project (https://bugs.eclipse.org).

Numbers of bugs reported in 2009:

Total Reported: 38843 bugs
FIXED: 20502 bugs
WONTFIX: 1182 bugs
DUPLICATE: 3120 bugs
WORKSFORME: 1362 bugs
INVALID: 1465 bugs
Not Eclipse: 365 bugs
Other (REASSIGNED, NEW, REOPENED): 10847 bugs (still without resolution)

Page 22:

Bug Report Statuses and Resolutions

[Pie chart: FIXED 53%, WONTFIX 3%, DUPLICATE 8%, WORKSFORME 3%, INVALID 4%, Not Eclipse 1%, OTHER 28%.]

Page 23: Promise 2011: "Selecting Discriminating Terms for Bug Assignment: A Formal Analysis"

Eclipse Bugs Reports Components:

Bugzilla Repository - Eclipse Project divided in 907 different components.

We use the most motivated components (have maximum Fixed Bugs) are :

Core Component: JDT Core is the Java infrastructure of the Java IDE

http://www.eclipse.org/jdt/core/index.php

UI Component: Java Development Toolkit UI.

http://www.eclipse.org/jdt/ui/index.html

SWT Component: Eclipse standard Widgets Toolkit.

http://www.eclipse.org/swt/

Experimental results 23

Page 24:

Number of Fixed Bugs per Component

[Bar chart: count of fixed bugs for the UI, Core, and SWT components; y-axis 0 to 2500.]

Page 25:

Experimental results

Evaluations:

Precision is the ratio of correctly classified bug reports to the total number of correctly classified plus misclassified bug reports.

Recall is the ratio of correctly classified bug reports to the total number of correctly classified plus unclassified bug reports.

We used a Bayesian network classifier.
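Restated as counts, the measures above reduce to the standard formulas; a minimal sketch, with hypothetical counts in the usage example:

```python
def precision_recall_f(tp, fp, fn):
    """tp: correctly classified reports for a developer,
    fp: reports wrongly assigned to that developer,
    fn: that developer's reports assigned elsewhere (or left unclassified)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

p, r, f = precision_recall_f(tp=40, fp=10, fn=40)  # hypothetical counts
```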

Page 26:

Experimental results

Other techniques used for comparison:

Information Gain, calculated for each term with respect to all classes; the terms with the top R' information gain values are returned.

Latent Semantic Analysis, which transforms terms into concepts by extracting relations between the terms in the selected bug reports.
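As one concrete reading of the Information Gain baseline, a minimal sketch: IG(t) = H(class) - sum over v in {0, 1} of P(t = v) * H(class | t = v). The example data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(has_term, labels):
    """Information gain of a term: class entropy minus the entropy
    remaining after splitting reports on term presence/absence."""
    n = len(labels)
    ig = entropy(labels)
    for v in (0, 1):
        subset = [lab for h, lab in zip(has_term, labels) if h == v]
        if subset:
            ig -= len(subset) / n * entropy(subset)
    return ig

# A term present in exactly d1's reports separates the classes perfectly.
ig = info_gain([1, 1, 0, 0], ["d1", "d1", "d2", "d2"])  # 1.0 bit
```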

Page 27:

Experimental results

F-measure results of the five term selection methods using different numbers of terms. These methods were applied on the Core component, and only active developers were considered.

Page 28:

Experimental results

The results for the SWT component: TCR - ki Variable had the highest precision (0.59) and the highest recall (0.55).

Page 29:

Experimental results

The results for the UI component: TCR - ki Variable achieved the highest precision (0.56) and was among the highest recall values (0.46).

Page 30:

Conclusion

This research investigates the impact of several term selection methods on the effectiveness of classification.

Three Log Odds Ratio (LOR) variant selection methods were proposed.

The proposed selection methods were compared against the Information Gain (IG) and Latent Semantic Analysis (LSA) techniques.

The LOR-based selection method (TCR - ki Variable) achieved up to 30% improvement in precision and up to 5% in recall.

These results demonstrate the impact of incorporating effective term selection techniques on improving classification performance.

Page 31:

Future Directions

Investigate alternative weighting schemes to better identify discriminating terms and improve classification accuracy.

Explore incorporating external domain knowledge and other evidence sources to better address the general bug assignment task.

Expand the data sets to multiple domains to further examine the effectiveness of the proposed term selection techniques.

Page 32:

Any Questions?

Thank You.