![Page 1: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/1.jpg)
Revisiting Unsupervised Learning for Defect Prediction
Wei Fu Tim [email protected] [email protected] http://weifu.us http://menzies.usFind these slides at: http://tiny.cc/unsup
Sep, 2017 1
![Page 2: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/2.jpg)
2
On the market(expected graduation:May,2018)
Tim Menzies (advisor)
Research Areas:
● Machine learning● SBSE● Evolutionary algorithms● Hyper-parameter tuning
Publications:FSE: 2ASE: 1TSE: 1IST: 1Under Review: 2
weifu.us
![Page 3: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/3.jpg)
Yang et al. reported: unsupervised predictors outperformed supervised predictors for effort-aware Just-In-Time defect prediction.
Implied: dramatic simplification of a seemingly complex task.
We Repeated and reflect Yang et al. results.
3
![Page 4: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/4.jpg)
General Lessons
4
Open Science
![Page 5: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/5.jpg)
Yang’s Unsup Methods in Precision
5
Red is bad!
Yang, Yibiao, et al. "Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models." FSE’16
![Page 6: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/6.jpg)
arXiv.org Effect
6
[Fu et al. FSE’17] [Liu et al. ESEM’17][Huang et al. ICSME’17]
Feb 28, 2017 Apr 6, 2017 Apr 7, 2017
RaleighSingapore Hangzhou
[Yang et al. FSE’16]
Mar 11, 2016
Nanjing Nanjing
Dr. Zhou Dr. Menzies Dr. Xia & Dr. Lo Dr. Zhou
[Fu et al.arXiv’17]
Dr. Menzies
RaleighOct, 2017
![Page 7: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/7.jpg)
7
X. ACM Transactions on Software Engineering and Methodology(TOSEM) ? ?
![Page 8: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/8.jpg)
We Promote Open Science
8
FSE’16
TSE’13
Our code & data at:
https://github.com/WeiFoo/RevisitUnsupervised
![Page 9: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/9.jpg)
9
Methods
![Page 10: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/10.jpg)
Yang et al’s Method: Core Idea
10[Koru 2010]Koru, Gunes, et al. "Testing the theory of relative defect proneness for closed-source software." Empirical Software Engineering 15.6 (2010): 577-598.
Koru et al [Koru 2010] suggest that “Smaller modules are proportionally more defect-prone and hence should be inspected first!”
AUC
bad other
recall
Area under the curve of Recall/LOC
● Bad = modules predicted defective
● Other = all other modules
● Sort each increasing by LOC
● Track the recall
20% effort
LOC
![Page 11: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/11.jpg)
Yang et al’s Unsupervised Method
Build 12 unsupervised models, on testing data:
11
20% effort
10% effortPredicted as “Defective”
Yang et al report:Aggregating performance over all projects, many simple unsupervised models perform better than supervised models.
![Page 12: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/12.jpg)
12
OneWay
![Page 13: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/13.jpg)
OneWay is not “the Way”
13
OneWay:
“The alternative way, maybe not the best way!” --Wei
![Page 14: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/14.jpg)
OneWay
14
Select the best one(e.g.: NS)
OneWay
Testing data Supervised data
Build 12 learners Test with NS
![Page 15: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/15.jpg)
15
Results
![Page 16: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/16.jpg)
• Recall
• Popt
• F1
• Precision
Our Evaluation Metrics
16
![Page 17: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/17.jpg)
Our Result Format
17
Bugzilla
Platform
Mozilla
JDT
Columba
PostgreSQL
Blue: Better
Black: Similar
Red: Worse
Report results on a project-by-project basis.Recall
![Page 18: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/18.jpg)
Research Questions
• All unsupervised predictors better than supervised?
• Is it beneficial to use supervised data?
• OneWay better than standard supervised predictors?
18
![Page 19: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/19.jpg)
Research Questions
• All unsupervised predictors better than supervised?
• Is it beneficial to use supervised data?
• OneWay better than standard supervised predictors?
19
![Page 20: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/20.jpg)
RQ1: All unsupervised predictors better?
20
Recall: LT and AGE are better; Others are not.
Popt: the similar pattern as Recall.
Recall Popt
![Page 21: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/21.jpg)
RQ1: All unsupervised predictors better?
21
F1: Only two cases better, LT on Bugzilla; AGE on PostgreSQL;
Precision: All are worse than the best supervised learner!
F1 Precision
![Page 22: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/22.jpg)
RQ1: All unsupervised predictors better?
22
Not all unsupervised predictors perform better than supervised predictors for each project and for different evaluation measures.
F1 Precision
![Page 23: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/23.jpg)
Research Questions
• All unsupervised predictors better than supervised?
• Is it beneficial to use supervised data?
• OneWay better than standard supervised predictors?
23
![Page 24: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/24.jpg)
RQ2:Is it beneficial to use supervised data?
24
Recall: OneWay was only outperformed by LT in 4/6 data sets.
Popt: The similar pattern as Recall.
Recall Popt
![Page 25: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/25.jpg)
RQ2:Is it beneficial to use supervised data?
25
F1: EXP/REXP/SEXP performs better than OneWay only on Mozilla.
Precision: Similar as F1 but more data sets.
F1 Precision
![Page 26: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/26.jpg)
RQ2:Is it beneficial to use supervised data?
26
As a simple supervised predictor, OneWay performs better than most unsupervised predictors.
F1 Precision
![Page 27: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/27.jpg)
Research Questions
• All unsupervised predictors better than supervised?
• Is it beneficial to use supervised data?
• OneWay better than standard supervised predictors?
27
![Page 28: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/28.jpg)
RQ3:OneWay better than supervised ones?
28
F1 PrecisionRecall Popt● Better than supervised learners in terms of Recall & Popt;● It performs just as well as other learners for F1.● As for Precision, worse!
![Page 29: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/29.jpg)
29
Conclusion
![Page 30: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/30.jpg)
Lessons Learned• Don’t aggregate results over multiple data sets.
– Different conclusions: Yang et al.
30
![Page 31: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/31.jpg)
Lessons Learned• Don’t aggregate results over multiple data sets.
– Different conclusions: Yang et al.
• Do use supervised data.
– Unsupervised learners’ are unstable.
31
![Page 32: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/32.jpg)
Lessons Learned• Don’t aggregate results over multiple data sets.
– Different conclusions: Yang et al.
• Do use supervised data.
– Unsupervised learners’ are unstable.
• Do use multiple evaluation criteria.
– Results vary for different evaluation criteria.
32
![Page 33: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/33.jpg)
Lessons Learned• Don’t aggregate results over multiple data sets.
– Different conclusions: Yang et al.
• Do use supervised data.
– Unsupervised learners’ are unstable.
• Do use multiple evaluation criteria.
– Results vary for different evaluation criteria.
• Do share code and data.
– Easy reproduction.
• Do post pre-prints to arXiv.org.
– Saved two years.
33
![Page 34: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/34.jpg)
34
tiny.cc/seacraft
Why Seacraft?● Successor of PROMISE repo, which contains a lot of SE artifacts.● No data limits;● Provides DOI for every submission (aka, easy citation);● Automatic updates if linked to github project.
![Page 35: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/35.jpg)
35
On the market(expected graduation: May,2018)
(Machine learning, SBSE, Evolutionary algorithms, Hyper-parameter tuning)
Publications: FSE: 2ASE: 1TSE: 1IST: 1Under Review: 2
weifu.us
OPEN science is FASTER science
![Page 36: for Defect Prediction Revisiting Unsupervised Learningfuwei.us/pdf/FSE17_Unsup.pdf · 7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance."](https://reader033.vdocuments.us/reader033/viewer/2022053003/5f0781737e708231d41d512f/html5/thumbnails/36.jpg)
Reference1. Kamei, Yasutaka, et al. "Revisiting common bug prediction findings using effort-aware models." Software Maintenance (ICSM), 2010 IEEE International Conference on.
IEEE, 2010.2. Arisholm, Erik, Lionel C. Briand, and Eivind B. Johannessen. "A systematic and comprehensive investigation of methods to build and evaluate fault prediction models."
Journal of Systems and Software 83.1 (2010): 2-17.3. Menzies, Tim, Jeremy Greenwald, and Art Frank. "Data mining static code attributes to learn defect predictors." IEEE transactions on software engineering 33.1 (2007):
2-13.4. Rahman, Foyzur, Daryl Posnett, and Premkumar Devanbu. "Recalling the imprecision of cross-project defect prediction." Proceedings of the ACM SIGSOFT 20th
International Symposium on the Foundations of Software Engineering. ACM, 2012.5. Mende, Thilo, and Rainer Koschke. "Effort-aware defect prediction models." Software Maintenance and Reengineering (CSMR), 2010 14th European Conference on.
IEEE, 2010.6. D’Ambros, Marco, Michele Lanza, and Romain Robbes. "Evaluating defect prediction approaches: a benchmark and an extensive comparison." Empirical Software
Engineering 17.4-5 (2012): 531-577.7. Kamei, Yasutaka, et al. "A large-scale empirical study of just-in-time quality assurance." IEEE Transactions on Software Engineering 39.6 (2013): 757-773.8. Yang, Yibiao, et al. "Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models." Proceedings of the 2016 24th ACM
SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 2016.
36