Fuzzy Set and Cache-based Approach for Bug Triaging
Ahmed Y. Tamrawi
Electrical and Computer Engineering Department
Iowa State University
2011
Software Bugs
{ Introduction }
Iowa State University Fuzzy Set and Cache-based Approach for Bug Triaging
Definition: (Software Bug)
A common term used to describe a flaw, mistake, or failure in a computer system that produces an incorrect or unexpected result, or causes it to behave in unintended ways.
• Bugs can occur in any software, ranging from operating systems and flight autopilot software to a simple arithmetic program!
• Software bugs cost about US$60 billion per year.
The term “Bug” (September 9, 1947)
More Bugs
Bug Repository
• Software users and developers report bugs so that software developers can fix them.
• Bugs are reported using bug reports, which are added to an issue tracking system or bug repository.
[Figure: bugs are reported through an interface and stored in the bugs repository.]
Bug Triaging
Definition: (Bug Triaging)
Assigning a bug to the most appropriate/capable developer who will fix it.
• Manual bug triaging is a difficult, expensive, and lengthy process, since the bug triager must manually read and analyze each newly reported bug and assign a fixer to it.
Bug Triaging
[Figure: new bug reports enter the bugs repository; the bug triager performs bug assignment to the software developers.]
Bug Triaging
• Bug triager challenges:
  – Knowledge about the system/project;
  – Descriptiveness of the bug report;
  – Rate of reporting bugs;
  – Many developers, different projects, and various expertise!
• Why not automate the bug triaging process?
  – Improve software quality;
  – Reduce cost and time.
[Figure: Eclipse – Feb 2011]
Example
{ Motivation }
Assigned to: James Moody
Summary: New Repository wizard follows implementation model, not user model.
Description: The new CVS Repository Connection wizard's layout is confusing. This is because it follows the implementation model of the order of fields in the full CVS location path rather than the user model...
Assigned to: James Moody
Summary: Opening repository resources doesn't honor type.
Description: Opening repository resource always open the default text editor and doesn't honor any mapping between resource types and editors. As a result it is not possible to view the contents of an image (*.gif file) in a sensible way....
[Figure: both reports relate to the technical aspect Version Control Management (VCM), associated with James Moody.]
This aspect concerns various Concurrent Versions System (CVS) repository features and operations within the Eclipse project.
Technical Aspects & Terms
• A software system has many technical aspects.
• Technical aspects are described via the technical terms extracted from software artifacts.
• A bug report describes issues related to technical aspects via its terms.
Automatic Bug Triaging
Key Philosophy for Automatic Bug Triaging
The developers who have the most bug-fixing capability/expertise with respect to the technical aspect(s) reported in a given bug report should be its fixer(s).
Problem Definition
{ Bugzie Model }
Problem: (Automatic Bug Assignment)
In a software system, given a bug report B and a set of developers D who have past fixing activity, find the developer(s) with the most fixing expertise with respect to the technical aspect(s) reported in B.
[Figure: a new bug report B arrives in the bugs repository and must be assigned among the software developers.]
Bugzie Overview
• Bugzie considers the problem as a ranking problem.
  – State-of-the-art approaches view the problem as a classification problem.
• For a bug report, Bugzie determines a ranked list of the developers most capable of fixing the reported issue(s).
Bugzie Overview
• Bugzie uses fuzzy set theory to rank the fixing expertise of developers toward the technical aspects.
• Bugzie models the association between a developer and technical aspects.
• If a developer has a higher fixing association with a technical aspect, that developer has higher expertise and rank for that aspect.
Association of Fixer & Term
Definition: (Capable Fixer toward a Term)
For a technical term t, a fuzzy set $C_t$, with associated membership function $\mu_t$, represents the set of developers who have the bug-fixing expertise relevant to the technical aspect(s) described by t.
• A developer with a higher membership score $\mu_t$ is more capable than a developer with a lower score in the issues related to t.
[Figure: developers placed in the fuzzy set $C_t$ according to their membership $\mu_t$, which ranges from 0 to 1.]
Association of Fixer & Term
• The membership score of a developer d toward a term t is:
  $\mu_t(d) = \frac{|D_d \cap D_t|}{|D_d \cup D_t|}$, with $\mu_t(d) \in [0, 1]$
• $D_d$: Bug reports d has fixed.
• $D_t$: Bug reports containing t.
[Figure: Venn diagrams of $D_d$ and $D_t$. When the two sets are disjoint, $\mu_t(d) = 0$; when they coincide, $\mu_t(d) = 1$.]
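The membership score above can be sketched in a few lines of Python. This is only an illustration of the formula, not the author's implementation: the function name `membership`, the use of string IDs for bug reports, and the choice to return 0.0 when both sets are empty are assumptions.

```python
def membership(fixed_by_dev: set[str], reports_with_term: set[str]) -> float:
    """Return |D_d ∩ D_t| / |D_d ∪ D_t|, or 0.0 when both sets are empty."""
    union = fixed_by_dev | reports_with_term
    if not union:
        return 0.0  # assumed convention; the slides do not cover this case
    return len(fixed_by_dev & reports_with_term) / len(union)


# Hypothetical bug report IDs, for illustration only.
d_d = {"b1", "b2", "b3"}        # reports fixed by developer d
d_t = {"b2", "b3", "b4", "b5"}  # reports containing term t
score = membership(d_d, d_t)    # 2 shared reports out of 5 distinct ones
```

Disjoint sets give a score of 0 and identical sets give a score of 1, matching the two extreme cases in the figure.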
Association of Fixer & Bug Report
$C_B = \bigcup_{t \in B} C_t$
$\mu_B(d) = 1 - \prod_{t \in B} \left(1 - \mu_t(d)\right)$
[Figure: a bug report B with terms $t_1, t_2, \ldots, t_n$; the fuzzy sets of its terms are combined by union into $C_B$, whose membership $\mu_B$ ranges from 0 to 1.]
Association of Fixer & Bug Report
• In fuzzy set theory, union is a flexible combination.
• Strong membership in any of the sub-fuzzy sets implies strong membership in the combined fuzzy set.
• After calculating $\mu_B(d)$ for the developers, Bugzie recommends the top-scored ones as fixers for the bug report.
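The ranking step can be sketched as follows, assuming the per-term membership scores of each developer are already known. The names `report_score` and `recommend`, the developer names, and all scores are hypothetical.

```python
from math import prod


def report_score(term_scores: list[float]) -> float:
    """Fuzzy union: 1 - prod(1 - mu_t(d)) over the terms of the report."""
    return 1.0 - prod(1.0 - s for s in term_scores)


def recommend(term_scores_by_dev: dict[str, list[float]], top_n: int) -> list[str]:
    """Rank developers by descending report score and keep the top n."""
    ranked = sorted(
        term_scores_by_dev,
        key=lambda dev: report_score(term_scores_by_dev[dev]),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical per-term membership scores for three developers.
scores = {
    "alice": [0.5, 0.2],  # 1 - 0.5 * 0.8 = 0.6
    "bob": [0.1, 0.1],    # 1 - 0.9 * 0.9 = 0.19
    "carol": [0.9, 0.0],  # 1 - 0.1 * 1.0 = 0.9
}
top2 = recommend(scores, top_n=2)
```

A single strong term is enough to push a developer to the top of the list, which is the "flexible combination" property of the union.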
Bugzie Model
[Figure: Bugzie workflow. Initial training over the bugs repository covers every term t and all developers; a new bug report B is pre-processed into terms $t_1, t_2, \ldots, t_n$, with each $t_i$ among the terms of the bugs repository; developers are sorted in descending order of score to form the recommendation list; the model is then updated with B.]
Bugzie Caching
• Fixer candidates selection (Developers Caching).
• Significant terms selection (Terms Caching).
[Figure: initial training over the bugs repository is restricted from every term t and all developers to the terms in the terms cache T(k) and the developers in the developers cache F(x).]
Data Collection
• Collected all fixed bug reports from 7 bug repositories.
• For each bug report, we extracted and merged the summary and description.
• For each system, we pre-processed these reports: stemming, stop words removal, etc.
System       History Range              #Bug Reports  #Fixers  #Terms
Eclipse      10-10-2001 to 10-28-2010   177,637       2,144    193,862
Firefox      04-07-1998 to 10-28-2010   188,139       3,014    177,028
Jazz         06-01-2005 to 06-01-2008   34,228        156      39,771
Gcc          08-03-1999 to 10-28-2010   19,430        293      63,013
Apache       05-10-2002 to 01-01-2011   43,162        1,695    110,231
FreeDesktop  01-09-2003 to 12-05-2010   17,084        374      61,773
NetBeans     01-01-2008 to 11-01-2010   23,522        380      42,797
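The pre-processing steps listed above can be sketched roughly as below. The slides do not say which stemmer or stop word list was used, so `naive_stem` is a crude stand-in for a real stemmer and `STOP_WORDS` is a tiny illustrative list; both are assumptions.

```python
import re

# Tiny illustrative stop word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "it", "of", "to", "in", "and", "not", "this"}


def naive_stem(word: str) -> str:
    """Crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(summary: str, description: str) -> set[str]:
    """Merge summary and description, then tokenize, filter, and stem."""
    text = f"{summary} {description}".lower()
    tokens = re.findall(r"[a-z]+", text)
    return {naive_stem(tok) for tok in tokens if tok not in STOP_WORDS}


terms = preprocess(
    "Opening repository resources doesn't honor type.",
    "Opening repository resource always open the default text editor",
)
```

With this sketch, "Opening" and "open" collapse to the same term, as do "resources" and "resource", which is the point of stemming before counting terms.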
Locality of Fixing Activity
[Figure: bug reports placed on a timeline from 2006 to 2010.]
Locality of Fixing Activity
Hypothesis: (Locality of Fixing Activity)
Developers who fixed bugs recently are likely to fix bug reports in the near future.
• If d belongs to F(x), we count this as a hit.
[Figure: fixing timeline from 2006 to 2010. For a bug report B fixed by d, the developers cache F(x) holds the most recent x% of all developers who fixed bugs before B.]
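One way to test the hypothesis is to replay the fixing history and count how often the actual fixer is already in F(x). The sketch below assumes that "recent" means ordering developers by the time of their last fix and that the cache always holds at least one developer; the function names and the sample history are hypothetical.

```python
import math


def developers_cache(last_fix: dict[str, int], x_percent: float) -> set[str]:
    """Return the most recent x% of developers by last fixing time (at least one)."""
    if not last_fix:
        return set()
    size = max(1, math.ceil(len(last_fix) * x_percent / 100))
    by_recency = sorted(last_fix, key=last_fix.get, reverse=True)
    return set(by_recency[:size])


def hit_rate(history: list[tuple[int, str]], x_percent: float) -> float:
    """Fraction of bug reports whose fixer was already in F(x) when fixed."""
    last_fix: dict[str, int] = {}
    hits = cases = 0
    for time, fixer in sorted(history):
        if last_fix:  # skip the first report: nobody has fixed anything yet
            cases += 1
            hits += fixer in developers_cache(last_fix, x_percent)
        last_fix[fixer] = time
    return hits / cases if cases else 0.0


# Hypothetical (time, fixer) pairs.
history = [(1, "ann"), (2, "bob"), (3, "bob"), (4, "cy"), (5, "bob"), (6, "ann")]
rate = hit_rate(history, x_percent=50)
```

A developer fixing a bug for the first time is always a miss, so even x = 100% does not reach a perfect hit rate in this sketch.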
Locality of Fixing Activity
[Figure: results for the locality hypothesis, annotated with the ranges 94% - 98% and 96% - 99%.]
Selection of Fixer Candidates
• The locality of fixing activity suggests that the actual fixer of a given bug report is likely to be a developer with recent fixing activity.
• For each bug report, Bugzie chooses the top x% of developers, sorted by their fixing time, as the fixer candidates F(x).
[Figure: fixing timeline from 2006 to 2010. For a bug report B fixed by d, the developers cache F(x) holds the most recent x% of all developers who fixed bugs before B.]
Developers Caching
[Figure: Bugzie workflow with developers caching. After initial training over every term t, a bug report B is pre-processed into terms $t_1, t_2, \ldots, t_n$ and only the developers in the developers cache F(x) are scored and sorted in descending order into the recommendation list; updating steps follow the recommendation.]
Selection of Descriptive Terms
System       #Terms
Eclipse      193,862
Firefox      177,028
Jazz         39,771
Gcc          63,013
Apache       110,231
FreeDesktop  61,773
NetBeans     42,797
RECALL: For a developer d and a term t, the higher their association score $\mu_t(d)$, the more significant t is in describing the technical aspects in which d has fixing expertise.
Selection of Descriptive Terms
[Figure: terms sorted in descending order of association score; the top k of them form T(k), a subset of the full set T(All Terms).]
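Selecting the top k terms can be sketched with a partial sort. The sketch assumes the selection is done per developer from that developer's association scores; the function name `terms_cache` and the scores are hypothetical.

```python
import heapq


def terms_cache(term_scores: dict[str, float], k: int) -> list[str]:
    """Return the k terms with the highest association score for one developer."""
    return heapq.nlargest(k, term_scores, key=term_scores.get)


# Hypothetical association scores mu_t(d) for one developer d.
scores_for_d = {"cvs": 0.42, "repository": 0.35, "wizard": 0.10, "editor": 0.03}
top_terms = terms_cache(scores_for_d, k=2)
```

`heapq.nlargest` avoids sorting the full vocabulary, which matters when a system has on the order of 100,000 terms and k is small.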
Terms Caching
[Figure: Bugzie workflow with terms caching. Initial training over the bugs repository builds the terms cache T(k); a bug report B is pre-processed into terms $t_1, t_2, \ldots, t_n$, of which only those with $t_i \in T(k)$ are used to score all developers and sort them in descending order into the recommendation list; the model is then updated with B.]
Empirical Evaluation
{ Empirical Evaluation }
• We evaluated Bugzie on our collected datasets.
• Experiments:
  – Selection of fixer candidates;
  – Selection of terms;
  – Selection of developers and terms;
  – Comparison with state-of-the-art approaches.
Experiment Setup
[Figure: bug reports ordered along the creation timeline and divided into frames 0 to 10.]
1. Bugzie uses frame 0 for initial training.
2. Using the training data, Bugzie recommends the top-n developers to fix bug report B (the recommendation list for B, sorted in descending order of score).
3. Bugzie updates the training data with the tested bug report B, then moves to the next bug report.
Bugzie repeats steps 2 and 3 until it has consumed all bug reports.
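The three steps can be sketched as a loop over frames. To keep the example short, the recommender here is a deliberately simple stand-in that ranks developers by past fix count; it is not Bugzie. The class name, `run_protocol`, and the sample frames are all hypothetical.

```python
from collections import Counter


class MostFrequentFixer:
    """Stand-in recommender (not Bugzie): ranks developers by past fix count."""

    def __init__(self) -> None:
        self.fix_counts = Counter()

    def update(self, terms: set[str], fixer: str) -> None:
        self.fix_counts[fixer] += 1

    def recommend(self, terms: set[str], top_n: int) -> list[str]:
        return [dev for dev, _ in self.fix_counts.most_common(top_n)]


def run_protocol(frames, model, top_n):
    """Train on frame 0, then recommend and update over the remaining frames."""
    for terms, fixer in frames[0]:  # step 1: initial training
        model.update(terms, fixer)
    results = []
    for frame in frames[1:]:
        for terms, fixer in frame:
            recommended = model.recommend(terms, top_n)  # step 2: recommend
            results.append((recommended, fixer))
            model.update(terms, fixer)  # step 3: learn from the tested report
    return results


# Hypothetical frames of (terms, actual fixer) pairs in creation order.
frames = [
    [({"cvs"}, "ann"), ({"editor"}, "bob"), ({"cvs"}, "ann")],
    [({"cvs"}, "ann"), ({"wizard"}, "cy")],
    [({"wizard"}, "cy")],
]
results = run_protocol(frames, MostFrequentFixer(), top_n=1)
```

Each report is used for testing before it is used for training, so the model never sees the answer for the report it is asked about.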
Prediction Accuracy
• If the recommendation list for a bug report contains its actual fixer, we count this as a hit (i.e., a correct recommendation).
• For each frame under test, we calculated Prediction Accuracy (PA):
  $PA(\%) = \frac{\#\,\text{Hits}}{\#\,\text{Prediction Cases}} \times 100\%$
• If we have 100 bugs and, for 60 of them, the actual fixing developer is in our Top-2 list, then the Top-2 prediction accuracy is 60%.
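The PA formula and the 60-out-of-100 example can be checked with a short sketch. The function name `prediction_accuracy`, the data layout (a recommendation list paired with the actual fixer), and the choice to return 0.0 for an empty input are assumptions.

```python
def prediction_accuracy(results: list[tuple[list[str], str]]) -> float:
    """PA(%) = #hits / #prediction cases * 100."""
    if not results:
        return 0.0  # assumed convention for an empty frame
    hits = sum(1 for recommended, fixer in results if fixer in recommended)
    return 100.0 * hits / len(results)


# 100 hypothetical cases: the actual fixer is in the Top-2 list for 60 of them.
cases = [(["ann", "bob"], "ann")] * 60 + [(["ann", "bob"], "cy")] * 40
pa = prediction_accuracy(cases)
```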
Selection of Fixer Candidates
[Figure: the developers-caching workflow and the fixing timeline with the developers cache F(x) holding the most recent x% of developers who fixed bugs before B.]
Selection of Fixer Candidates
[Figure: Top-1 and Top-5 prediction accuracy as x varies.]
Firefox: at x = 10%, PA = 72.4%; at x = 100%, PA = 70.7%.
Selection of Fixer Candidates
• Selecting a suitable portion of recent fixers does not reduce accuracy much, and sometimes improves it, as in the cases of Firefox, Eclipse, etc.
• Selecting only a portion of the available developers as candidates also improves time efficiency.
Selection of Terms
[Figure: the terms-caching workflow, in which only the terms $t_i \in T(k)$ of a bug report B are used to score all developers.]
Selection of Terms
[Figure: Top-1 and Top-5 prediction accuracy as k varies, each with its peak range marked.]
Eclipse: at k = 16, PA = 80%; at k = All Terms, PA = 72%.
Selection of Terms
• Selecting terms can substantially improve the prediction accuracy.
• The results suggest that one needs only a small yet significant set of terms for each developer to describe that developer's bug-fixing expertise.
Selection of Developers & Terms
• We study the combined impact of developer selection (x) and term selection (k).
[Figure: results for Eclipse and Firefox.]
Selection of Developers & Terms
Base: Base model with all developers and all terms
C.S.: Candidate Selection
T.S.: Terms Selection
Both: The best PA when applying both C.S. and T.S.
Comparison
• We compared Bugzie's results with state-of-the-art approaches.
• We used Weka to re-implement those approaches.
Approach                       Papers
Naïve Bayes (NB)               Cubranic & Murphy [1], Anvik et al. [2], Bhattacharya & Neamtiu [3]
Bayesian Networks (BN)         Bhattacharya & Neamtiu
Inc. Naïve Bayes (InB)         Bhattacharya & Neamtiu
Inc. Bayesian Networks (InBN)  Bhattacharya & Neamtiu
Support Vector Machine (SVM)   Anvik et al.
Vector Space Model (VSM)       Matter et al. [4]
C4.5 (Decision Trees)          Anvik et al.
Comparison
• Some of the approaches (e.g., C4.5 decision trees) cannot scale up well to our dataset.
• We prepared a smaller dataset: 3-year histories of the full dataset.
System       History Range              #Bug Reports  #Fixers  #Terms
Eclipse      01-01-2008 to 10-28-2010   69,829        1,510    103,690
Firefox      01-01-2008 to 10-28-2010   77,236        1,682    85,951
Jazz         06-01-2005 to 06-01-2008   34,228        156      39,771
Gcc          01-01-2008 to 10-28-2010   6,865         161      20,279
Apache       01-01-2008 to 01-01-2011   28,682        1,354    80,757
FreeDesktop  01-01-2008 to 12-05-2010   10,624        161      37,596
NetBeans     01-01-2008 to 11-01-2010   23,522        380      42,797
Comparison Results
[Table: comparison results; times are given in (d) days, (h) hours, (m) minutes, (s) seconds.]
Conclusions
{ Conclusions }
• Bugzie achieves higher accuracy and efficiency than state-of-the-art approaches.
• Bugzie can accommodate the locality of fixing activity and software evolution with flexible caching of developers and terms.
Thesis Contributions
• Bugzie, a scalable, fuzzy set and cache-based automatic bug triaging approach, which is significantly more efficient and accurate than existing state-of-the-art approaches.
• The finding of the locality of fixing activity.
• A comprehensive evaluation of the efficiency and correctness of Bugzie in comparison with state-of-the-art approaches.
• An observation/method to capture a small and significant set of terms describing developers’ bug-fixing expertise.
Future Work
• Use different caching mechanisms for developers and terms.
• Explore the use of other textual and non-textual contents of bug reports for bug triaging.
• Use other software artifacts to measure developers’ expertise accurately.
Thank You!