Week 3 Presentation Istehad Chowdhury CISC 864 Mining Software Engineering Data

Post on 27-Jan-2016

DESCRIPTION

Week 3 Presentation. Istehad Chowdhury, CISC 864 Mining Software Engineering Data. Research paper: Who Should Fix This Bug? John Anvik, Lyndon Hiew and Gail C. Murphy, Department of Computer Science, University of British Columbia, {janvik, lyndonh, murphy}@cs.ubc.ca.

TRANSCRIPT

Page 1: Week 3 Presentation

Week 3 Presentation

Istehad Chowdhury

CISC 864 Mining Software Engineering Data

Page 2: Week 3 Presentation

Research Paper

Who Should Fix This Bug?

John Anvik, Lyndon Hiew and Gail C. Murphy

Department of Computer Science, University of British Columbia

{janvik, lyndonh, murphy}@cs.ubc.ca

Page 3: Week 3 Presentation

Problem with Open Bug Repositories

Large open source projects struggle to cope with the surge of incoming bug reports.

“Everyday, almost 300 bugs appear that need triaging. This is far too much for only the Mozilla programmers to handle.”

Many bug reports are invalid or duplicates of another bug report (36% in Eclipse).

Every bug report should be triaged:

To check validity and duplication

To assign the bug to an appropriate developer

Page 4: Week 3 Presentation

Problem cont.

The triager may not be sure whom to assign the bug to.

A lot of time is wasted in reassigning and regaining.

24% of reports in Eclipse are re-assigned.

Page 5: Week 3 Presentation

The research work

Goal: suggest whom to assign a bug to

Technique: data mining and machine learning

Result: 60% precision and 10% recall

Page 6: Week 3 Presentation

Precision and Recall
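
Only the title of this slide survives in the transcript. As a minimal sketch, assuming the usual definitions for a recommender (precision is the share of recommended developers who were appropriate, and recall is the share of appropriate developers who were recommended), the two measures could be computed as follows. The function name and the developer names are hypothetical.

```python
def precision_recall(recommended, appropriate):
    """Return (precision, recall) for one bug report.

    recommended: developers suggested by the recommender
    appropriate: developers who could actually have fixed the bug
    """
    recommended = set(recommended)
    appropriate = set(appropriate)
    hits = len(recommended & appropriate)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(appropriate) if appropriate else 0.0
    return precision, recall


# Hypothetical example: 3 suggestions, 2 of them appropriate,
# out of 10 developers who could have fixed the bug.
p, r = precision_recall(
    ["alice", "bob", "carol"],
    ["alice", "bob"] + [f"dev{i}" for i in range(8)],
)
print(p, r)
```

With a short recommendation list and a large pool of capable developers, precision can be reasonably high while recall stays low, which is the pattern in the result quoted on the previous slide.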

Page 7: Week 3 Presentation

Life Cycle of a Bug Report

Page 8: Week 3 Presentation

Roles

Reporter/Submitter

Resolver

Contributor

Triager

The roles overlap.

Page 9: Week 3 Presentation
Page 10: Week 3 Presentation

Approach to the problem

Semi-automated

1. Characterizing bug reports

2. Assigning a label to each report

3. Choosing reports to train the supervised machine learning algorithm

4. Applying the algorithm to create the classifier for recommending assignments
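
The four steps can be sketched end to end in a few lines. This is only an illustration under stated assumptions. The slides do not name the learning algorithm, so a small multinomial naive Bayes classifier is used as a stand-in. The bag-of-words characterization, the function names, and the sample reports are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def characterize(text):
    """Step 1: turn a report's free text into a bag of lowercase words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def train(labeled_reports):
    """Steps 3-4: build per-developer word counts from (text, developer) pairs."""
    word_counts = defaultdict(Counter)
    report_counts = Counter()
    vocabulary = set()
    for text, developer in labeled_reports:
        bag = characterize(text)
        word_counts[developer].update(bag)
        report_counts[developer] += 1
        vocabulary.update(bag)
    return word_counts, report_counts, vocabulary


def recommend(model, text, top_n=1):
    """Rank developers by naive Bayes log-probability for a new report."""
    word_counts, report_counts, vocabulary = model
    total_reports = sum(report_counts.values())
    bag = characterize(text)
    scores = {}
    for developer, n_reports in report_counts.items():
        total_words = sum(word_counts[developer].values())
        score = math.log(n_reports / total_reports)
        for word, count in bag.items():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace smoothing keeps unseen (developer, word) pairs finite.
            likelihood = (word_counts[developer][word] + 1) / (
                total_words + len(vocabulary)
            )
            score += count * math.log(likelihood)
        scores[developer] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Step 2 (labeling) is assumed already done: each report carries a developer.
training = [
    ("crash when rendering svg image", "alice"),
    ("svg rendering glitch on zoom", "alice"),
    ("bookmark sync fails after login", "bob"),
    ("login dialog loses password", "bob"),
]
model = train(training)
print(recommend(model, "svg image crash on zoom"))
```

The approach is semi-automated because the recommender only returns a ranked list. The triager still makes the final assignment.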

Page 11: Week 3 Presentation

Heuristics for labeling bug reports

FIXED (Firefox): whoever provided the last approved patch

FIXED (Eclipse): whoever marked the report as resolved

DUPLICATE (Eclipse and Firefox): whoever resolved the report of which this report is a duplicate

WORKSFORME (Firefox): unclassifiable
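
The heuristics on this slide amount to a small rule table keyed on project and resolution status. A hedged sketch follows. The dictionary keys (`resolution`, `resolved_by`, `last_approved_patch_by`, `duplicate_of`) are hypothetical field names, not the schema of any real bug tracker. The DUPLICATE rule is read here as "use the resolver of the original report", which is an interpretation of the slide text.

```python
def label_report(report, project, reports_by_id=None):
    """Return the developer label for a report, or None if unclassifiable.

    `report` is a dict with hypothetical keys: "resolution", "resolved_by",
    "last_approved_patch_by", and "duplicate_of".
    """
    resolution = report.get("resolution")
    if resolution == "FIXED":
        if project == "firefox":
            return report.get("last_approved_patch_by")
        if project == "eclipse":
            return report.get("resolved_by")
    if resolution == "DUPLICATE":
        # Assumed reading: label with the resolver of the original report.
        original = (reports_by_id or {}).get(report.get("duplicate_of"))
        return original.get("resolved_by") if original else None
    # WORKSFORME (Firefox) and anything else not covered stays unlabeled.
    return None


# Hypothetical reports for illustration only.
reports = {
    1: {"resolution": "FIXED", "resolved_by": "carol",
        "last_approved_patch_by": "dave"},
    2: {"resolution": "DUPLICATE", "duplicate_of": 1},
    3: {"resolution": "WORKSFORME", "resolved_by": "erin"},
}
print(label_report(reports[1], "firefox"))           # dave
print(label_report(reports[1], "eclipse"))           # carol
print(label_report(reports[2], "eclipse", reports))  # carol
print(label_report(reports[3], "firefox"))           # None
```

Reports that come back as `None` would be left out of the training set, so the share of unclassifiable reports limits how much data each project contributes.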

Page 12: Week 3 Presentation

Experimental Results

Fig. Recommender accuracy and recall

Page 13: Week 3 Presentation

Validating Results with GCC

Why are the results so poor?

Why is recall low in all cases, especially for gcc?

This shows that the approach needs the projects to be similar in nature.

Page 14: Week 3 Presentation

Trying Alternatives

Page 15: Week 3 Presentation

Trying Alternatives cont..

Unsupervised Machine learning

Incremental Machine learning

Incorporating Additional sources of Data

Component based classifier

Page 16: Week 3 Presentation

Component based classifier

Page 17: Week 3 Presentation

Points to Ponder

Page 18: Week 3 Presentation

Points to Ponder cont..

Are new developers assigned any bug?

“Needs further study to context of which it can be applied” (empirical research)

Page 19: Week 3 Presentation

Points to Ponder cont..

Were there enough instances to evaluate using cross-validation? For Firefox 75%, and for gcc 86%, of developers have fewer than 100 reports.

Why was the labeling mechanism more successful for gcc and Eclipse than for Firefox? 1% for Eclipse, 47% for Firefox.

Page 20: Week 3 Presentation

Points in favor

The research work was very intense

Thoroughly studied

Honest in identifying the limitations and smart in pointing out future work

It opens up interesting doors of future research

Page 21: Week 3 Presentation

Points Against

The study may not be suitable for an environment where the active set of developers changes frequently

The findings are too project-specific and work well on “actual bug” reports

Page 22: Week 3 Presentation

Points Against cont..

If there is any naivety in the heuristics, it also propagates to the heuristic-based filtering process used to train the classifier.

I liked the way the authors included a lessons-learned section. However, they should have explained in more detail how the mappings were done.

Page 23: Week 3 Presentation

Concluding Remarks

It shows promise for improving the bug assignment problem for OSS

“Coordination bug reports and CVS is challenging”

The effort is worth praising

Identifies need for further research

Page 24: Week 3 Presentation

Questions and Comments?