Using Correlation and Accuracy for Identifying Good Estimators
http://nas.cl.uh.edu/boetticher/publications.html The 4th International Predictor Models in Software Engineering (PROMISE) Workshop
Gary D. Boetticher, Nazim Lokhandwala
Univ. of Houston - Clear Lake, Houston, TX, USA
Research vs. Reality, according to Jørgensen

TSE ’07: 300+ software est. papers, 76 journals, 15+ years

             -89   89-99   00-04   Total
Algorithm     48     137      70     255
ML             1      32      41      74
Human          3      22      21      46
Misc.          7      19      26      52

68% Algorithm, 20% ML, 12% Human

JSS ’04: Compendium of expert estimation studies

Paper            Human
Hihn 91            89%
Heemstra 91        62%
Paynter 96         86%
Jørgensen 97       84%
Hill 00           100%
Kitchenham 02      72%

82% Human, 18% Formal
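The percentages on this slide can be reproduced from the two tables. The sketch below is only a check of the arithmetic: it assumes that the 68/20/12 split leaves the Misc. row out of the denominator and that the 82% figure is the plain mean of the six per-paper values. Both assumptions match the numbers shown, but the slide does not state them.

```python
# Paper counts per approach (Total column of the TSE '07 table).
# ASSUMPTION: the Misc. row (52 papers) is excluded from the denominator.
paper_counts = {"Algorithm": 255, "ML": 74, "Human": 46}

# Share of human-based estimation reported by each JSS '04 study, in percent.
human_usage = {
    "Hihn 91": 89,
    "Heemstra 91": 62,
    "Paynter 96": 86,
    "Jørgensen 97": 84,
    "Hill 00": 100,
    "Kitchenham 02": 72,
}


def percentage_shares(counts):
    """Return each count as a whole-number percentage of the total."""
    total = sum(counts.values())
    return {name: round(100 * value / total) for name, value in counts.items()}


research_shares = percentage_shares(paper_counts)

# ASSUMPTION: the practice figure is an unweighted mean over the six studies.
human_share = round(sum(human_usage.values()) / len(human_usage))
formal_share = 100 - human_share

print(research_shares, human_share, formal_share)
```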
Statement of Problem
((Log (TechGradCourses + (TechGradCourses ^ ((Log TotWShops)/(Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (Log (Log (Log SWProjEstExp))))))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (((ProcIndExp + (Log (Sin MgmtGradCourses)))/(Sin SWPMExp)) + (Sin ((Cos (TechGradCourses ^ ((ProcIndExp + (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Sin SWPMExp)))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Cos (TechGradCourses ^ ((Log SWProjEstExp) / (((Log (ProcIndExp + (Log (TechGradCourses ^ ((Log SWProjEstExp) / (Log SWProjEstExp)))))) - 3) / (ProcIndExp + (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos ((((Log SWProjEstExp) / ((ProcIndExp + (Log (TechGradCourses ^ (TechGradCourses ^ (Log SWProjEstExp))))) / (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos (Log (Log (Log SWProjEstExp)))))))))))))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / (TechGradCourses ^ (Log SWProjEstExp))))))))))) - 3) / (TechGradCourses ^ (Log SWProjEstExp)))))) + ((Log SWProjEstExp) / (Log SWProjEstExp)))))) / (Log (Log (Log (TechGradCourses + (Cos (Log (Log (TechGradCourses ^ (Cos (((((Log SWProjEstExp) / (TechGradCourses ^ (Log SWProjEstExp))) / ((ProcIndExp + (Log (Sin MgmtGradCourses))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp))))))))))))))))))))))) / (TechGradCourses ^ (Log SWProjEstExp)))))) / (((Log ((((Log TotLangExp) / (Log SWProjEstExp)) / (Log SWProjEstExp)) / (Sin SWPMExp))) - 3) / (TechGradCourses ^ (Log 
SWProjEstExp)))))) - 3) / (TechGradCourses ^ (Log SWProjEstExp)))))))))) + (((((ProcIndExp + (Log (TechGradCourses ^ (Log (TechGradCourses + ((TechGradCourses ^ (TechGradCourses ^ (Cos (TechGradCourses ^ ((ProcIndExp + (Log (Log (TechGradCourses ^ (TechGradCourses ^ (Cos (Log (Log (TechGradCourses ^ (Cos ((((Log SWProjEstExp) / ((ProcIndExp + (Log (TechGradCourses ^ (Log (TechGradCourses + (Cos (Log (Log (TechGradCourses ^ (Cos (((((Log SWProjEstExp) / (TechGradCourses ^ (Log SWProjEstExp))) / ((ProcIndExp + (Log (Sin MgmtGradCourses))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / ((Log SWProjEstExp) / (Log SWProjEstExp)))) / (Sin SWPMExp)) / (Sin SWPMExp)))))))))))) / (TechGradCourses ^ (Log SWProjEstExp))))))) / (Sin SWPMExp))))))) / (TechGradCourses ^ (Log SWProjEstExp))) / (TechGradCourses ^ (Log SWProjEstExp))) / (TechGradCourses ^ (Log SWProjEstExp))) / (Sin SWPMExp)))
Some Background
2006
http://www.starwarscrawl.com/?id=232
Statement of Problem
How to build human-based estimation models that are accurate, intuitive, and
easy to understand?
TechUGCourses < 45.5
| Hardware Proj Mgmt Exp < 6
| | No Of Hardware Proj Estimated < 4.5
| | | No Of Hardware Proj Estimated < 3
| | | | TechUGCourses < 23
| | | | | Hardware Proj Mgmt Exp < 0.75
| | | | | | TechUGCourses < 18
| | | | | | | Hardware Proj Mgmt Exp < 0.13
| | | | | | | | TechUGCourses < 0.5
| | | | | | | | | TechUGCourses < -1 : F (1/0)
| | | | | | | | | TechUGCourses >= -1
| | | | | | | | | | Degree < 3.5 : A (4/0)
| | | | | | | | | | Degree >= 3.5 : A (5/2)
| | | | | | | | TechUGCourses >= 0.5
| | | | | | | | | TechUGCourses < 5.5
| | | | | | | | | | Degree < 3.5 : F (5/0)
| | | | | | | | | | Degree >= 3.5
| | | | | | | | | | | TechUGCrses < 2 : A (1/0)
| | | | | | | | | | | TechUGCrses >= 2 : F (1/0)
| | | | | | | | | TechUGCrses >= 5.5
| | | | | | | | | | Degree < 3.5
| | | | | | | | | | | TechUGCrs < 10.5 : A (3/0)
| | | | | | | | | | | TechUGCrses >= 10.5
| | | | | | | | | | | | TechUGCrs<12.5 : F (3/0)
| | | | | | | | | | | | TechUGCrses >= 12.5
| | | | | | | | | | | | | TechUGCrs<16: A (2/0)
| | | | | | | | | | | | | TechUGCrs>15 : A (2/1)
| | | | | | | | | | Degree >= 3.5 : F (1/0)
| | | | | | | HardProjMgmt Exp >= 0.13 : A (2/0)
| | | | | | TechUGCourses >= 18 : A (2/0)
| | | | | Hard Proj Mgmt Exp >= 0.75 : F (1/0)
| | | | TechUGCourses >= 23 : F (5/0)
| | | No Of Hardware Proj Est >= 3 : F (1/0)
| | No Of Hardware Proj Est >= 4.5 : A (5/0)
| Hardware Proj Mgmt Exp >= 6 : F (4/0)
TechUGCrses >= 45.5 : A (2/0)
Some Background
2007
PROMISE 2008 versus 2007
• Sample set: 178 samples
• One learner: accuracy and intuitive results
• Attribute reduction analysis
• Relatively simple models
The Approach
• Personal Demographics
  • Age, Gender, Nationality, etc.
• Academic
  • Courses Undergrad/Grad: CS, HW, SE, Proj. Mgmt, MIS
  • Workshops/Conferences: CS, HW, SE, Proj. Mgmt, MIS
• Work
  • Programming: Ada, ASP, Assembly, C, C++, COBOL, DBMS, FORTRAN, Java, PASCAL, Perl, PHP, SAP, TCL, VB, Other
  • Work Experience (HW/SW)
  • Project Management Exp. (HW/SW)
  • # Projects Estimated (HW/SW)
  • Average Project Size
• Domain Experience
  • Procurement Industry Experience
[Flow: Estimate 28 Components, then Scale Factor and Correlation, then Apply Machine Learners]
[Diagram: Buyer Admin; Buyer 1 ... Buyer n; Buyer Software; Distribution Server; Supplier 1, Supplier 2 ... Supplier n; Supplier Software]
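A minimal sketch of the two measures named in the flow, computed for one respondent against the reference values for the same components. The Pearson correlation is standard. The scale factor formula is NOT given on the slides; the symmetric ratio used here (how many times too high or too low an estimate is, averaged over components) is an assumption chosen purely for illustration, and the function names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


def scale_factor(estimates: Sequence[float], actuals: Sequence[float]) -> float:
    """ASSUMED definition: mean of max(est/actual, actual/est).

    A value of 1.0 means every estimate matched; 2.0 means estimates
    were off by a factor of two on average, in either direction.
    Requires strictly positive estimates and actuals.
    """
    ratios = [max(e / a, a / e) for e, a in zip(estimates, actuals)]
    return sum(ratios) / len(ratios)


# Made-up example: the respondent ranks components perfectly
# but doubles every figure.
actual_hours = [10.0, 20.0, 30.0, 40.0]
estimated_hours = [20.0, 40.0, 60.0, 80.0]

correlation = pearson_correlation(estimated_hours, actual_hours)
scale = scale_factor(estimated_hours, actual_hours)
print(correlation, scale)
```

In this made-up case the correlation is perfect while the estimates are still off by a factor of two, which is one reason to look at both measures rather than correlation alone.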
Feedback to Users
[Screenshot: How user compares to other respondents; User’s Estimates; Actual Estimates]
Experiments: Data
[Figure: four scatter plots of Scale (y-axis) versus Correlation (x-axis): Original Data set, Experiment 1, Experiment 2, Experiment 3. Annotations on the plots: 82.8, -29.4, 0.008, 29X]
Experiments: Tools, Configuration
• Outliers removed
• WEKA toolset
• C4.5 (J48)
• 1000 trials
• 10-fold cross validation
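The evaluation protocol (many trials of 10-fold cross validation) can be sketched in plain Python. This is not the WEKA setup used in the study: a one-split decision stump stands in for C4.5 (J48), and the function names, the seed, and the toy data are all hypothetical. Only the trial count, fold count, and the A/F class labels come from the slides.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], str]


def train_stump(rows: List[Row], labels: List[str]) -> Model:
    """Fit a single-threshold rule (stand-in for J48, not C4.5 itself)."""
    best = None  # (errors, feature, threshold, label_below, label_above)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for below, above in (("A", "F"), ("F", "A")):
                errors = sum(
                    (below if r[f] < t else above) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, below, above)
    if best is None:  # no usable split: predict the majority class
        majority = max(set(labels), key=labels.count)
        return lambda row: majority
    _, f, t, below, above = best
    return lambda row: below if row[f] < t else above


def repeated_kfold_accuracy(rows, labels, trials=1000, k=10, seed=0) -> float:
    """Mean accuracy over `trials` independent shuffles of k-fold CV."""
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(trials):
        order = list(range(len(rows)))
        rng.shuffle(order)
        for fold in range(k):
            test_idx = order[fold::k]  # every k-th index forms one fold
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            model = train_stump(
                [rows[i] for i in train_idx], [labels[i] for i in train_idx]
            )
            correct += sum(model(rows[i]) == labels[i] for i in test_idx)
            total += len(test_idx)
    return correct / total


# Toy data: 10 "A" and 10 "F" respondents, cleanly separated on one attribute.
toy_rows = [[float(i)] for i in range(10)] + [[100.0 + i] for i in range(10)]
toy_labels = ["A"] * 10 + ["F"] * 10
print(repeated_kfold_accuracy(toy_rows, toy_labels, trials=5))
```

With 20 samples and 10 folds, each fold holds out only two respondents, so a single trial is noisy; averaging over many shuffles, as the slide's 1000 trials do, smooths that out.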
Results: Correlation Only
2-Class Problem: 10 Best (A), 10 Worst (F)
• 1000 Trials, Accuracy = 41.6%
• After Attribute Reduction using WRAPPER: 1000 Trials, Accuracy = 78.6%
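Wrapper-style attribute reduction scores candidate attribute subsets by running the learner itself on each subset and keeping the subset that scores best. The slides name only "WRAPPER", so everything else in this sketch is an assumption: the greedy forward search, the leave-one-out 1-nearest-neighbour scorer used as the inner learner, the function names, and the toy data.

```python
from typing import Callable, List, Sequence, Tuple


def loo_1nn_accuracy(rows, labels, features: List[int]) -> float:
    """Leave-one-out accuracy of 1-nearest-neighbour on the chosen columns.

    ASSUMPTION: stands in for the cross-validated learner a wrapper would call.
    """
    if not features:
        return 0.0
    hits = 0
    for i, row in enumerate(rows):
        nearest, nearest_dist = None, float("inf")
        for j, other in enumerate(rows):
            if j == i:
                continue
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if dist < nearest_dist:
                nearest, nearest_dist = j, dist
        hits += labels[nearest] == labels[i]
    return hits / len(rows)


def forward_wrapper(
    n_features: int, score: Callable[[List[int]], float]
) -> Tuple[List[int], float]:
    """Greedy forward selection: add the attribute that helps most, stop when none does."""
    selected: List[int] = []
    best = score(selected)
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        scored = [(score(selected + [f]), f) for f in candidates]
        # Highest score wins; ties go to the lower attribute index.
        top_score, top_feature = max(scored, key=lambda p: (p[0], -p[1]))
        if top_score <= best:
            break
        selected.append(top_feature)
        best = top_score
    return selected, best


# Toy data: column 0 separates the classes, column 1 is noise.
toy_rows = [
    [0.0, 5.0], [1.0, -3.0], [2.0, 8.0], [3.0, -7.0],
    [10.0, 5.5], [11.0, -3.5], [12.0, 8.5], [13.0, -7.5],
]
toy_labels = ["A"] * 4 + ["F"] * 4

chosen, chosen_score = forward_wrapper(
    2, lambda feats: loo_1nn_accuracy(toy_rows, toy_labels, feats)
)
print(chosen, chosen_score)
```

On the toy data the search keeps the informative column and drops the noisy one, which illustrates how pruning attributes can raise accuracy on small samples.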
Results: Scale Factor Only
2-Class Problem: 10 Best (A), 10 Worst (F)
• 1000 Trials, Accuracy = 65.0%
• After Attribute Reduction using WRAPPER: 1000 Trials, Accuracy = 78.2%
Results: Correlation & Scale Factor
2-Class Problem: 10 Best (A), 10 Worst (F)
• 1000 Trials, Accuracy = 82.2%
• After Attribute Reduction using WRAPPER: 1000 Trials, Accuracy = 93.3%
Discussion - 1
How well does the decision tree from the third experiment apply to all the respondents minus outliers?

                        Best Estimators   Poorest Estimators
Average Correlation     0.4173            0.3686
Average Scale Factor    2.6198            2.7419
Discussion - 2
Challenges in component effort estimation
• Scope of effort
• Amortization of effort
• Reuse can skew estimates (esp. Design for Reuse)
Respondent’s estimates = Boetticher’s estimates
Conclusions
• Good accuracy rates, especially after attribute reduction
• Correlation + Scale Factor: intuitive model
• Bridges expert and model groups
Thank You!
References
1) Jørgensen, M., “A Review of Studies on Expert Estimation of Software Development Effort,” Journal of Systems and Software, 2004.
2) Jørgensen, M., Shepperd, M., “A Systematic Review of Software Development Cost Estimation Studies,” IEEE Transactions on Software Engineering, 33(1), January 2007, pp. 33-53.