

Predicting Number of Software Defects and Defect Remediation Time to Minimize Leakage and Allocate Rework Efforts in an Upcoming Software Release

by Christine Adrian Sigalla

B.S. with triple majors in Management Information Systems, Accounting, and Business Management, May 2014, La Roche University

M.S. in Systems Engineering, May 2018, The George Washington University

A Praxis submitted to The Faculty of The School of Engineering and Applied Science of The George Washington University in partial fulfillment of the requirements for the degree of Doctor of Engineering

August 31, 2020

Praxis directed by

Amir Etemadi, Associate Professor of Engineering and Applied Science

Oluwatomi Adetunji

Professorial Lecturer of Engineering Management and Systems Engineering


The School of Engineering and Applied Science of The George Washington University certifies that Christine Adrian Sigalla has passed the Final Examination for the degree of Doctor of Engineering as of July 24, 2020. This is the final and approved form of the Praxis.

Predicting Number of Software Defects and Defect Remediation Time to Minimize Leakage and Allocate Rework Efforts in an Upcoming Software Release

Christine Adrian Sigalla

Praxis Research Committee:

Amir Etemadi, Associate Professor of Engineering and Applied Science, Praxis Co-Director

Oluwatomi Adetunji, Professorial Lecturer of Engineering Management and Systems Engineering, Praxis Co-Director

Thomas Holzer, Professorial Lecturer of Engineering Management and Systems Engineering, Committee Member


© Copyright 2020 by Christine Adrian Sigalla. All rights reserved.


Dedication

I dedicate this research to my husband, parents, sisters, extended family, in-laws, professors, managing director, manager, friends, colleagues, and employer for supporting me and being patient with me during this demanding and unbalanced 2-year journey. My husband deserves an honorable dedication for being my biggest motivator and supporter behind all of my accomplishments. He stepped up and took care of all house maintenance and domestic errands while I concentrated on my studies.

I would like to thank my mother and sisters, who never stopped praying for me, even when I felt like giving up at the beginning of the research phase. My father inspired me to pursue this doctoral degree and challenged me to overcome all obstacles and become the first generation of the clan to ever receive a doctorate with honors.

An honorable dedication goes to my late mother-in-law, who passed away during my final year of doctoral research. I am thankful that I had a chance to spend time with her during her final days, and she was very supportive of my education and career journey. A special dedication goes to my advisors for being patient, understanding, and offering valuable advice that directed me to the right path during this challenging 2-year journey.


Acknowledgements

I wish to thank my family and friends for making this 2-year journey possible. I thank my advisors, Dr. Etemadi and Dr. Adetunji, for providing valuable advice in completing the praxis. I also acknowledge my employer for providing partial tuition reimbursement towards my education.


Abstract of Praxis

Predicting Number of Software Defects and Defect Remediation Time to Minimize Leakage and Allocate Rework Efforts in an Upcoming Software Release

Information Technology companies spend substantial resources fixing the damage caused by software defects. Defect remediation time is an important metric in allocating rework efforts (resources) for fixing the defects. The aim of this praxis is to use statistical learning models to predict the number of defects and defect remediation time prior to testing. Obtaining information from these models is valuable because it gives software engineering managers a better method by which to minimize the leakage of defects and allocate rework efforts in an upcoming software release.

The predictors for number of defects are: total number of components delivered, code size (lines of code), total number of developers working on code components, total number of requirements, and total number of test cases. The predictors for defect remediation time are: total number of test cases, total number of requirements, number of defects, and code size. Previous studies have used these predictors individually in predicting the number of defects and defect remediation time. However, none of the previous studies have considered combining all the predictors in their predictions.

This praxis addresses a gap in previous software industry research by predicting the number of defects and defect remediation time using a 4-year dataset of 202 mainframe-language software projects containing 1,143 defects, and by considering the combined influence of all the predictors. The proposed statistical learning models used in this praxis are negative binomial regression, multiple linear regression, random forest, and support vector machine. If the number of defects and defect remediation time can be predicted, both software managers and researchers will benefit from this research by applying statistical learning models to minimize defect leakage and allocate rework efforts.


Table of Contents

Dedication

Acknowledgements

Abstract of Praxis

List of Figures

List of Tables

List of Symbols

List of Acronyms

Chapter 1—Introduction
  1.1 Background
  1.2 Research Motivation
  1.3 Problem Statement
  1.4 Thesis Statement
  1.5 Research Objectives
  1.6 Research Questions and Hypotheses
  1.7 Scope of Research
  1.8 Research Limitations
  1.9 Organization of Praxis

Chapter 2—Literature Review
  2.1 Introduction
  2.2 Software Defects Prediction Metrics
    2.2.1 Total Number of Developers (TNOD)
    2.2.2 Number of Components Delivered (NOCD)
    2.2.3 Total Number of Requirements (TR)
    2.2.4 Total Number of Test Cases (TTC)
    2.2.5 Code Size - Lines of Code (LOC)
    2.2.6 Summary of Metrics for Software Defect Prediction
  2.3 Root Causes of Software Rework
    2.3.1 Introduction: Software Rework
    2.3.2 Root Causes Analysis of Software Rework
    2.3.3 Possible Ways of Reducing Avoidable Software Rework
  2.4 Defect Remediation Time Prediction Metrics
    2.4.1 Code Size - Lines of Code (LOC)
    2.4.2 Number of Defects (NOD)
    2.4.3 Total Number of Test Cases (TTC)
    2.4.4 Total Number of Requirements (TR)
    2.4.5 Summary of Metrics for Defect Remediation Time Prediction
  2.5 Summary and Conclusion

Chapter 3—Methodology
  3.1 Introduction
    3.1.1 Data
    3.1.2 Data Description
    3.1.3 Proposed Approaches
  3.2 Regression Techniques
    3.2.1 Negative Binomial Regression Model
    3.2.2 Multiple Linear Regression Model
  3.3 Classification Techniques
    3.3.1 Random Forest
    3.3.2 Support Vector Machine Model

Chapter 4—Results
  4.1 Analysis of Significant Predictors for Software Defects
    4.1.1 Defects Data Collection and Cleaning
    4.1.2 Negative Binomial Regression Summary & Partial Dependency Plots for Defects Significant Predictors
  4.2 Analysis of Software Defect Prediction Model
    4.2.1 Data Partition for Defects Prediction
    4.2.2 Variable Importance Plot Using Defects Data
    4.2.3 Development of Software Defects Prediction
    4.2.4 Measures of Model Accuracy for Software Defects Prediction
    4.2.5 Results of Software Defects Prediction Model
  4.3 Analysis of Significant Predictors for Software Defect Remediation Time
    4.3.1 Data Collection and Cleaning for Defect Remediation Time
    4.3.2 Multiple Linear Regression (MLR) Model Summary
    4.3.3 Partial Dependency Plots for Significant Predictor(s) of Defect Remediation Time
  4.4 Analysis of Defect Remediation Time Prediction Model
    4.4.1 Data Partition for Defect Remediation Time Prediction
    4.4.2 Variable Importance Plot Using Defect Remediation Time Data
    4.4.3 Development of Software Defect Remediation Time Prediction
    4.4.4 Model Accuracy Measures for Defect Remediation Time
    4.4.5 Results of Defect Remediation Time Prediction Model

Chapter 5—Discussion and Conclusions
  5.1 Discussion and Conclusions
  5.2 Contributions to Body of Knowledge
  5.3 Recommendations for Future Research

References

Appendix A—Dataset for Defects and Defect Remediation Time

Appendix B—Metrics for Defects and Defect Remediation Time

Appendix C—Models Development & Results

Appendix D—Measures of Model Performance


List of Figures

Figure 3-1. High-level Overview of Building a Model.

Figure 3-2. Process Flow to Identify Significant Predictors for NOD and DRT.

Figure 3-3. Process Flow for SDP and DRT Prediction.

Figure 3-4. Data Subset: Random Selection of Data.

Figure 3-5. Independent Variables Set: Random Selection of Variables.

Figure 3-6. RF Classification Process.

Figure 3-7. 2-Dimensional Hyperplane and 3-Dimensional Hyperplane.

Figure 4-1. NBR Model Summary Result.

Figure 4-2. Relationship between Number of Defects and LOC.

Figure 4-3. Relationship between Number of Defects and NOCD.

Figure 4-4. Relationship between Number of Defects and TTC.

Figure 4-5. Variable Importance Plot for Software Defects.

Figure 4-6. Random Forest Model Result for Defects Prediction.

Figure 4-7. Support Vector Machine Model Result for Defects Prediction.

Figure 4-8. Number of Predicted Defects vs. Actual Defects.

Figure 4-9. Actual vs. Predicted Defects Graph.

Figure 4-10. Multiple Linear Regression Model Result.

Figure 4-11. Relationship between Defect Remediation Time and NOD.

Figure 4-12. Variable Importance Plot for Defect Remediation Time.

Figure 4-13. Random Forest Model Result for DRT Prediction.

Figure 4-14. Support Vector Machine Model Result for DRT Prediction.

Figure 4-15. Predicted Defect Remediation Time vs. Actual Defect Remediation Time.

Figure 4-16. Actual vs. Predicted Defect Remediation Time Graph.

Figure A-1. Defects.

Figure A-2. Defect Remediation Time.

Figure C-1. NBR Model Summary Result.

Figure C-2. Number of Predicted Defects vs. Actual Defects.

Figure C-3. Multiple Linear Regression Model Result.

Figure C-4. Predicted Defect Remediation Time vs. Actual Defect Remediation Time.


List of Tables

Table 2-1. Summary of Predictors for Software Defect Prediction

Table 2-2. Summary of Predictors for Defect Remediation Time Prediction

Table 3-1. Metrics Definitions and Abbreviations

Table 4-1. Data Partition for Software Defects Prediction

Table 4-2. Measure of Errors for Software Defect Prediction

Table 4-3. Data Partition for Defect Remediation Time Prediction

Table 4-4. Measure of Errors for Defect Remediation Time Prediction

Table 4-5. Summary Table

Table B-1. Metrics Definitions and Abbreviations

Table B-2. Summary of Predictors for Software Defect Prediction

Table B-3. Summary of Predictors for Defect Remediation Time Prediction

Table D-1. Measure of Errors for Software Defects Prediction

Table D-2. Measure of Errors for Defect Remediation Time Prediction


List of Symbols

X Predictor / Independent Variable

Y Response / Dependent Variable

∈ Element of / to mean “belongs to” or “is in the set of”

K Value of Response Variable

E Error

Pr Probability

Γ Gamma Function

Ix Variable Importance Score

Vi Vector

λ Variance of Y

r Dispersion parameter

n Dimensional Space

errorOOBn Out-Of-Bag Error
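Several of these symbols (Pr, Γ, K, r, λ) come together in the negative binomial model developed in Chapter 3. For reference, one standard parameterization of its probability mass function is shown below; this is a common textbook form stated here as an assumption, not necessarily the exact notation the praxis adopts:

```latex
\Pr(Y = K) = \frac{\Gamma(K + r)}{\Gamma(r)\,K!}
             \left(\frac{r}{r + \mu}\right)^{r}
             \left(\frac{\mu}{r + \mu}\right)^{K},
\qquad K = 0, 1, 2, \dots
```

with mean E[Y] = μ and variance λ = μ + μ²/r, so the dispersion parameter r governs how far the variance exceeds the mean.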


List of Acronyms

LOC Lines of Code

RF Random Forest

SVM Support Vector Machine

NBR Negative Binomial Regression

MLR Multiple Linear Regression

IT Information Technology

NOD Number of Defects

DRT Defect Remediation Time

IEEE Institute of Electrical and Electronics Engineers

COBOL Common Business-Oriented Language

JCL Job Control Language

CPY Copybook

TTC Total Number of Test Cases

TR Total Number of Requirements

NOCD Number of Components Delivered

TNOD Total Number of Developers

SCM Software Configuration Management

PD Partial Dependency

PMI Project Management Institute

RL Release

UAT User Acceptance Testing


Chapter 1—Introduction

1.1 Background

A software defect, commonly referred to as a “bug,” is an error in the software source code that makes the software product function in unintended ways, yielding unexpected results. With the continuous expansion of new technology, software defects have become a major concern in the software industry. Defect-free software has become difficult to achieve because of the complexity of the software source code, which leads to software failure. The failure to capture defects during the development phase of software engineering can result in system downtime, rework efforts, and overheads in production (Harekal & Suma, 2015).

Information Technology (IT) companies constantly spend substantial resources finding and fixing the damage caused by software defects in order to deliver high-quality software products to their customers and attain customer satisfaction. Finding and fixing defects after the software product has been delivered to stakeholders is expensive; therefore, there is a need to predict software defects prior to testing (Felix & Lee, 2017; Harekal & Suma, 2015). In software engineering, the failure to capture defects “during pre production time of the software certainly leads towards defect leakage” (Harekal & Suma, 2015, p. 20).

Software defect prediction refers to a method of predicting defective modules (source code components) by analyzing past historical data and building statistical learning classifiers to improve software reliability. Software reliability is defined as the likelihood of a software product being free from defects. Identifying defective modules early in the development of software will improve the software quality and help software engineers optimize resource allocation for fixing defects while supporting the development, testing, and maintenance of the software (Fan et al., 2019; Li et al., 2018).

Software engineering managers face the challenge of predicting rework efforts during the software development planning process. Rework effort is defined as the “effort [resources] required to fix the software defects identified during system testing” (Bhardwaj & Rana, 2015, p. 1). The role of the manager is to ensure the software product satisfies the client’s needs and is delivered on time and on budget. In order to determine the rework efforts, it is important to predict the number of defects first, followed by the defect remediation effort (time).

Defect fixing (remediation) effort is the “effort [time] required in person-hours to fix a defect” (Goel & Singh, 2011, p. 124). In this praxis, defect remediation time is expressed in hours rather than person-hours. The purpose of predicting defect remediation time prior to testing is to assist software engineering managers in planning testing efforts, prioritizing work, and allocating appropriate resources to fix the defects in a situation where there is a high volume of defects (Akbarinasaji et al., 2018; Harekal & Suma, 2015).

This praxis aims at predicting the number of defects and defect remediation time in order to minimize defect leakage and allocate resources to fix the defects prior to testing using statistical learning models. The defect remediation time prediction can also be used to improve defect correction time allocation in the project schedule; however, the project schedule is not included in the dataset for this research. The statistical learning models discussed in the context of this praxis are negative binomial regression, multiple linear regression, random forest, and support vector machine. To predict the number of defects, the following predictors are used as inputs to the statistical learning models: total number of components delivered, code size (lines of code), total number of developers working on code components, total number of requirements, and total number of test cases (Dhiauddin et al., 2012; Di Nucci et al., 2018; Kumar & Malik, 2019; Umar, 2013). To predict defect remediation time, the following predictors are used as inputs to the statistical learning model: total number of test cases, total number of requirements, number of defects, and code size (Goel & Singh, 2011; Ramdoo & Huzooree, 2015).

1.2 Research Motivation

Recent studies (Dhiauddin et al., 2012; Di Nucci et al., 2018; Kumar & Malik, 2019; Umar, 2013; Goel & Singh, 2011; Ramdoo & Huzooree, 2015) have used these predictors individually in predicting the number of defects and defect remediation time using object-oriented programming software projects written in Java, Python, JavaScript, PHP, Ruby, and Scala. However, none of the studies have considered combining all the predictors or using procedural programming software projects written in a mainframe language (Common Business-Oriented Language [COBOL]) in their predictions.

The combined influence of all the predictors in predicting software defects and defect remediation time using COBOL projects is the missing piece in the technical literature and the software industry in general. This research predicts the number of defects first, followed by the defect remediation time, and the results suggest that the identified predictors can be used to predict both the number of defects and defect remediation time.


1.3 Problem Statement

Failure to capture defects during software development results in defect leakage, causing system downtime, overheads in production, and rework efforts that can consume up to 70% of the budget allocated to a software development project.

Software defects that are discovered during the post-production phase tend to cause system outages, rework efforts, and overheads, which can damage the client and vendor business relationship due to client dissatisfaction and software malfunctions. In this situation, the client is forced to spend substantial resources fixing the damage caused by the vendor’s inability to identify defects prior to the software release. Due to tight project schedules, it is difficult for all of the defects to be resolved; hence, some of the defects are moved to the next release with no estimated time to fix them (Felix & Lee, 2017; Ramdoo & Huzooree, 2015; Harekal & Suma, 2015).

1.4 Thesis Statement

Statistical learning models are required to forecast future software defects and defect remediation time prior to testing in order to minimize the leakage of defects and allocate rework efforts (resources) for fixing the defects in an upcoming software release.

In order to minimize the leakage of defects and allocate resources to fix defects prior to testing, statistical learning methods are used to predict the number of defects and defect remediation time based on all of the predictors. The statistical learning methods used are multiple linear regression, negative binomial regression, random forest, and support vector machine.

Recent studies have built software defect and defect remediation time prediction models based on statistical learning predictors that are useful for object-oriented programming projects only. Therefore, it is difficult for senior management or software engineering managers with a mainframe-related background to use these models.

1.5 Research Objectives

The aim of this praxis is to achieve the following research objectives:

• To determine the significant predictors for predicting the number of defects.
• To formulate a statistical learning model to predict the number of defects.
• To determine the significant predictors for predicting defect remediation time.
• To formulate a statistical learning model to predict defect remediation time.

The purpose of these research objectives is to propose statistical learning models that apply the combined influence of all the predictors by predicting the defects and defect remediation time. The final objective of the predictions is to minimize the leakage of defects and allocate resources to fix the defects prior to testing.

1.6 Research Questions and Hypotheses

The following research questions (RQ) and hypotheses (H) were used to guide the praxis and meet the objectives of the research:

RQ1: Code size, total number of components delivered, total number of developers working on code components, total number of requirements, and total number of test cases are the predictors influencing the number of defects. Which predictors are significant in predicting the number of defects?

RQ2: How can statistical learning models forecast the number of defects using code size, total number of developers working on code components, total number of components delivered, total number of test cases, and total number of requirements?


RQ3: Code size, number of defects, total number of requirements, and total number of test cases are the predictors influencing the defect remediation time prediction. Which predictors are significant in predicting defect remediation time?

RQ4: How can statistical learning models forecast the defect remediation time using code size, number of defects, total number of requirements, and total number of test cases?

H1: Negative binomial regression and random forest models can identify the most important predictors for the number of defects.

H2: Random forest and support vector machine models can be used to predict the number of defects.

H3: Multiple linear regression can identify the important predictors for defect remediation time.

H4: Random forest and support vector machine models can be used to predict defect remediation time.

1.7 Scope of Research

This praxis relies on a dataset obtained from an IT firm. The dataset contains 4 years of defect data based on 202 COBOL projects and 1,143 total defects found in the User Acceptance Testing (UAT) environment. The scope of this research is to predict the number of software defects and defect remediation time following the principles found in the literature. The praxis is designed to perform the predictions using predictors that have been used individually by previous studies; the novelty of the research is using the combined influence of all the predictors to perform defect and defect remediation time predictions. The results obtained from the predictions will provide managers with a better method to minimize the leakage of defects into production and assist in allocating resources to fix the defects.

1.8 Research Limitations

This praxis uses IT data containing only 202 COBOL projects to build and validate the models. Future studies will need to use object-oriented projects to build the prediction models for defects and defect remediation time using the same predictors defined for the COBOL projects and to validate the models’ accuracy. The

models are not restricted to use with only COBOL projects. This praxis does not show the

calculation of the number of resources needed to fix defects. However, this number can be derived as the ratio of the predicted defect remediation time (in hours) to a 40-hour work week.
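The resource calculation described above, predicted remediation hours divided by a 40-hour work week, can be sketched as follows. This is an illustrative sketch in Python (the praxis itself uses R); rounding up to whole developers is an added assumption, since a fractional developer cannot be allocated.

```python
import math

def resources_needed(predicted_drt_hours: float, weekly_hours: float = 40.0) -> int:
    """Number of developers needed to absorb the predicted defect
    remediation time within a 40-hour work week, rounded up
    (rounding up is an assumption, not stated in the praxis)."""
    return math.ceil(predicted_drt_hours / weekly_hours)

print(resources_needed(95.0))  # 3 developers for 95 predicted hours
print(resources_needed(40.0))  # 1 developer is exactly enough
```

A usage note: the predicted defect remediation time comes from the models of Chapter 3; this ratio is only the final allocation step.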

1.9 Organization of Praxis

The praxis contains five chapters and is organized as follows. Chapter 1 includes

the background of the problem, solution, research objectives, research questions and

hypotheses, scope of the research, research limitations, and organization of the praxis.

Chapter 2 presents the literature review of the software defects and defect remediation

time predictions using metrics. It also explains the root causes of software rework, which

are used as additional predictors for predicting defect remediation time. Chapter 3

explains the methodology techniques for identifying significant predictors for defects and

defect remediation time. In addition, it explains the flow and the criteria for evaluating

the best model for predicting defects and defect remediation time. Chapter 4 provides the


results of the model based on the written hypotheses. Chapter 5 provides the summary of

the research, conclusions, and recommendations for future researchers looking to extend or critique the results.


Chapter 2—Literature Review

2.1 Introduction

Software defect prediction, which aims to identify defective modules prior to testing, is a significant and popular research field in software engineering. The statistical

learning defect prediction model is built based on meaningful metrics and historical data

collected from past releases of procedural programming software projects written in

mainframe languages.

In the past few decades, most studies have been conducted on defect prediction

using open source, object-oriented software projects. The purpose has always been to

minimize the software costs and improve the quality of software by identifying defective

modules prior to testing. The majority of defect prediction models are tested using open

source software projects (Dhiauddin et al., 2012; Di Nucci et al., 2018; Kumar & Malik,

2019; Umar, 2013); therefore, it is challenging for those models to be tested on closed

projects due to privacy preservation issues based on proprietary and commercial

reasoning.

While many previous studies have focused on predicting software defects for the

purpose of providing cost effective support in the development of a software product

(Felix & Lee, 2017), this research focuses on predicting the number of defects prior to

testing in order to minimize the leakage of defects in an upcoming software release using

COBOL software projects.

In this research, the software metrics or predictors for software defect prediction

are total number of components delivered, code size (lines of code), total number of


developers working on code components, total number of requirements, and total number

of test cases (Bell et al., 2013; Bird et al., 2011; Dhiauddin et al., 2012; Di Nucci et al.,

2018; Kumar & Malik, 2019; Ostrand et al., 2010; Posnett et al., 2013; Rahman &

Devanbu, 2011; Eyolfson et al., 2011; Umar, 2013).

The predictors for software defects are used as inputs to construct a statistical

learning model based on historical data. The prediction of defect remediation time plays

an important role in allocating resources to fix defects when faced with a high volume of

defects prior to software release. The defect remediation time prediction is a valuable

metric to managers in terms of properly allocating resources to fix defects found in

software development prior to testing.

The software engineering industry has experienced complex and difficult times in

delivering quality software products on time, on budget, and with good quality due to

project risks and improper scheduling of resources to fix defects (Goel & Singh, 2011). It

is challenging to predict the time one can take to fix defects, since some defects tend to

take more time to fix than others. In this way, the number of defects is predicted first,

followed by defect remediation time. The following metrics are applied to predict the

time it takes to fix defects: total number of test cases, total number of requirements,

number of defects, and code size (Goel & Singh, 2011; Ramdoo & Huzooree, 2015).

This chapter firstly introduces the software metrics for predicting software

defects. Secondly, it provides information about identifying the root causes of software

reworks. Finally, it describes the software metrics that will be used for predicting defect

remediation time based on the recent studies conducted by previous researchers. The

overall purpose of this chapter is to explain individual predictors or metrics for software


defects and defect remediation time, which have been used by previous researchers, and

to identify a literature gap.

2.2 Software Defects Prediction Metrics

In this section, the predictors for software defects prediction are discussed in

detail based on the studies performed by previous researchers. The predictors for

software defect prediction are total number of developers working on code components,

number of components delivered, total number of requirements, total number of test

cases, and code size (lines of code).

2.2.1 Total Number of Developers (TNOD)

Previous studies (Bell et al., 2013; Bird et al., 2011; Di Nucci et al., 2018;

Ostrand et al., 2010; Posnett et al., 2013; Rahman & Devanbu, 2011; Eyolfson et al.,

2011) have demonstrated the role of developers in the introduction of defects. Posnett et

al. (2013) observed that developers who focus their attention on only one part of an application are likely to introduce fewer defects than unfocused developers, who spread their attention across multiple parts of the application. This indicates that a developer performing all of their tasks on a single code component tends to have a higher level of focus on that component, which makes them less likely to introduce defects. Hence, software modules changed by focused developers tend to have fewer defects than modules modified by unfocused developers.

Di Nucci et al. (2018) applied the Posnett et al. (2013) observation by determining

the focus level of developers working on code components and scattered measures.

Scattered measures refer to the “frequency of changes made by developers over the


different system’s modules, but also considers the “distance” between the modified

modules” (Di Nucci et al., 2018, p. 8). Di Nucci et al. (2018) observed that high levels of scattered measures introduce more defects than low levels. Therefore, high levels of scattered measures are associated with unfocused developers, while low levels are associated with focused developers.

Bell et al. (2013) and Ostrand et al. (2010) investigated whether the files that

contain defects remediated by a specific developer in a current release can assist in

predicting the defects in a file remediated by the same developer in the next release and

improve the accuracy of the standard negative binomial regression model. The purpose of

those studies was to determine if a file modified by a particular developer in a current

release is more or less likely to have defects in the future release than a file modified by a

developer at random.

Bell et al. (2013) and Ostrand et al. (2010) found that knowing a specific

developer who worked on a file is not likely to improve the prediction of defects in a file

but knowing the cumulative number of developers who modified a file can be a

significant variable in predicting defects. This is because an individual developer whose files have more defects is not necessarily underperforming; instead, the best developers tend to work on complex files that are very difficult to execute.

Eyolfson et al. (2011) contended that developers who have more experience are

less likely to introduce system defects as compared to less experienced developers.

Rahman and Devanbu (2011) examined the effect of developers’ experience and


ownership on the module. Rahman and Devanbu (2011) critiqued the observations by

Eyolfson et al. (2011) and showed that there is no link between developer experience and

the introduction of defects.

Bird et al. (2011) examined the relationship between developers’ ownership of

software components and the quality of software. Bird et al. (2011) found that a

developer who makes 80% of code changes on the module is considered to have a high

level of expertise on the module and the module is considered to have a high ownership

level. If many developers make changes on the module, then the developers were

considered to have low expertise on the module, and the module to have a low ownership

level. Bird et al.’s (2011) research indicated that a high level of component ownership

leads to fewer defects and a low level of component ownership leads to more defects.

The studies mentioned above have concentrated on the role of

the developers working on a code component in terms of the developer’s experience,

ownership, level of focus on the component, and the possibility of introducing software

defects. This shows the variable “number of developers working on a code component”

has been used in the literature many times.

In this praxis, the number of developers working on code components is applied

as a predictor for software defects prediction. The purpose is to find out if the number of

developers working on code components is statistically significant in predicting software

defects in the case of COBOL projects. If significant, this would imply that the number of developers working on components influences how many defects are introduced.


2.2.2 Number of Components Delivered (NOCD)

Umar (2013) identified total number of test cases executed, test team size,

allocated development effort, test case execution effort, and total number of components

delivered as predictors for defects prediction. Umar (2013) observed that there is a strong

correlation between the number of defects and the number of components delivered in the

project. Having more components delivered to the testing phase of the software development life cycle indicates a higher chance of defects.

The predictors used in Umar’s (2013) research are meant to improve testing

efficiency and assist developers in evaluating software quality and defect proneness. In

addition, predicting defects using number of components delivered can help project

managers in assigning resources, budget and rescheduling allocations. In this praxis,

number of components delivered is applied as a predictor for software defects.

2.2.3 Total Number of Requirements (TR)

Kumar and Malik (2019) proposed the logit regression model to develop a

software metrics quality testing prediction framework. The purpose of the framework was

to implement software quality testing for the organization. Software quality testing refers

to the testing of a system or its components to ensure that deliverables meet the requirements and client expectations, and to exploring the system to find defects.

It is difficult to identify all of the defects that may result in significant losses;

hence, this framework is needed to minimize the program or project cost. In Kumar and Malik’s (2019) research, the framework is evaluated using 18 metrics and a logit regression model. Total number of requirements was one of the 18 metrics applied


in Kumar and Malik’s (2019) research; it is the sum of the functional and non-functional requirements.

On a tight schedule, if the team receives too many requirements and has too few resources with which to work on a project, then there is a higher chance of introducing defects into the system, which may affect testing. Total number of requirements is applied as a predictor for software defect prediction in this praxis.

2.2.4 Total Number of Test Cases (TTC)

Total number of test cases counts the input variables or conditions used to verify that a requirement is working as expected. Dhiauddin et al. (2012) and Umar (2013) proposed

that total number of test cases, along with other predictors, forecast number of defects.

Dhiauddin et al. (2012) and Umar (2013) observed that there is a strong correlation

between the number of defects and total number of test cases: “If number of test cases are

high and critical to requirements, the chances [of] getting defects is high” (Umar, 2013, p.

742). This indicates that the number of defects is directly proportional to the number of

test cases. In this praxis, total number of test cases is applied as a predictor for software

defects prediction.

2.2.5 Code Size - Lines of Code (LOC)

According to Jing et al. (2018), defect metrics play a major role in building a

predictive analytics model that can improve the quality of software. The defect metrics

are divided into code and process metrics. Code metrics measure the complexity and size

of the source code while the process metrics deal with the complexity of the software

code development process. Source code complexity arises, for example, when a module contains many unnecessary methods instead of reusing code to create


smaller methods that can accomplish a task with minimal lines of code to improve code

readability.

Lines of code (LOC) is a code metric used to measure the size of a source code.

Huda et al. (2017) considered LOC as the amount of executable code, without including

blank lines or comments. Jing et al. (2018) concluded that having high complexity on the

source code may result in a higher likelihood of introducing defects. Zhang (2009)

discovered that, by simply using the LOC metric, one can predict software defects.

Menzies et al. (2007) discovered that code metrics are still efficient predictors of

software defects, based on the National Aeronautics and Space Administration dataset.

Dhiauddin et al. (2012) proposed code size as a measure of software complexity to

predict software defects. Code size was expressed in terms of 1,000 lines of code

(KLOC). In this research, code size (LOC) is applied as a predictor for software defects

prediction.
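Huda et al.’s (2017) definition of LOC, executable code excluding blank lines and comments, can be made concrete with a small counter. This is an illustrative sketch in Python (the praxis itself uses R), and the comment convention assumed here is fixed-format COBOL, where an `*` or `/` in column 7 marks a comment line.

```python
def count_loc(source: str) -> int:
    """Count executable lines of code, excluding blank lines and
    comment lines (per Huda et al., 2017). Assumes fixed-format
    COBOL: '*' or '/' in column 7 marks a comment line."""
    loc = 0
    for line in source.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if len(line) >= 7 and line[6] in ("*", "/"):
            continue  # skip comment lines
        loc += 1
    return loc

sample = """\
       IDENTIFICATION DIVISION.
      * This is a comment line.
       PROGRAM-ID. HELLO.

       PROCEDURE DIVISION.
           DISPLAY 'HELLO'.
"""
print(count_loc(sample))  # 4 executable lines
```

Free-format languages would need a different comment test, but the measure itself, non-blank, non-comment lines, is the same.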

2.2.6 Summary of Metrics for Software Defect Prediction

Table 2-1 shows the summary and source of predictors for defect prediction.


Table 2-1. Summary of Predictors for Software Defect Prediction

Metrics Type

Predictors Authors & Year of Research

Process Metrics

Total Number of Developers (TNOD)

Bell et al., 2013; Bird et al., 2011; Di Nucci et al., 2018; Ostrand et al., 2010; Posnett et al., 2013; Eyolfson et al., 2011; Rahman & Devanbu, 2011

Process Metrics

Number of Components Delivered (NOCD)

Umar (2013)

Process Metrics

Total Number of Requirements (TR)

Kumar & Malik (2019)

Process Metrics

Total Number of Test Cases (TTC)

Dhiauddin et al. (2012); Umar (2013)

Code Metrics

Code Size (LOC) Dhiauddin et al. (2012); Huda et al. (2017); Jing et al. (2018); Menzies et al. (2007); Zhang (2009)

2.3 Root Causes of Software Rework

2.3.1 Introduction: Software Rework

Ramdoo and Huzooree (2015) defined rework as the additional effort of repeating a process because either the process was implemented incorrectly or the client changed the project requirements.

Many firms spend a substantial amount of rework effort (time) and money to

improve the quality of a product during the development of software. Rework can impact

the productivity of the firm; hence, it is important to identify avoidable rework effort at

an early stage of the development of a software product. According to Rubio and Gulo


(2015), rework is considered one of the activities in software development and is often

misunderstood or defined poorly.

Many developers spend much of their time on avoidable rework rather than on work that is correct the first time. Eliminating avoidable rework seems to

be a problem in the software engineering field; as such, there is a great deal of ongoing research in this area. According to Ramdoo and Huzooree (2015), rework

can consume up to 70% of the budget allocated for a software development project.

It is difficult to eliminate rework entirely, since some software defects are

inevitable. However, we can identify and avoid rework at the early stage of software

development by avoiding project management issues like having conflicting

requirements from clients, which can introduce rework in a project. Rework is still

considered a complex and challenging problem in the software engineering field (Zahra

et al., 2014).

Morozoff (2010) and Conroy and Kruchten (2012) used metrics as an approach to

understand and reduce rework in software development. In this praxis, root causes of

software rework are identified and alternatives to reduce avoidable rework are discussed.

Some of the root causes of rework are applied as predictors for defect remediation time

prediction in section 2.4.


2.3.2 Root Causes Analysis of Software Rework

Ramdoo and Huzooree (2015) identified the root causes of software rework in a Mauritius organization’s software development using the Ishikawa cause-and-effect methodology. They categorized the root causes of rework as follows:

• Ambiguous Project Requirements

• People and Testing

• History and Versioning

2.3.2.1 Ambiguous Project Requirements

According to Ramdoo and Huzooree (2015), ambiguous requirements still remain a problem in software development. The authors identified the following as

reasons for requirements uncertainty among members of the production team of the

Mauritius organization:

• Requirements were not defined correctly.

• Conflicting requirements from clients or teams.

• Inability to gather requirements because some team members were on personal leave or vacation.

• Lack of team members’ participation and involvement in the project.

• Inability to document requirement changes in a shared repository.

2.3.2.2 People and Testing (Test Cases)

Ramdoo and Huzooree (2015) observed that stakeholders in the Mauritius

organization had a difficult time expressing their project needs since stakeholders

preferred to see something first to confirm their desire or even decide what they wanted.

As such, developers and clients can have misunderstandings regarding how they view


requirements, leading to inaccurate expectations. Stakeholders play an important role in

software development; hence, issues caused by people in the Mauritius organization

occurred due to the following reasons:

• The team underestimated the significance of the requirements and design phases.

• Lack of technical insight from the team.

• Improper coding standards.

• Overworking developers led to poor code quality.

Rework is a major problem in any organization. Ramdoo and Huzooree (2015) discovered that developers in the Mauritius organization worked under pressure due to

schedule constraints. As a result, they were not fully involved in the testing. There was

also no automated tool with which to perform regression testing; therefore, developers

provided minimal test cases and performed basic testing only. In addition, test plans were

not documented, and software defects were not properly fixed due to the tight deadline.

2.3.2.3 History and Versioning

Ramdoo and Huzooree (2015) mentioned that it was difficult to trace back all of

the code and document histories and versions because most backups were saved on a

personal developer workplace remote server. The team had to run additional ad hoc jobs

to obtain the most updated and current versions of the code. Requirements were also

documented poorly or improperly; therefore, it took more time to search and receive the

right version of a document.


2.3.3 Possible Ways of Reducing Avoidable Software Rework

Avoidable rework effort is defined as the effort of redoing work because the client changed the requirements or the system was implemented incorrectly. Avoidable rework can be minimized if the best processes,

practices, and techniques are followed. Ramdoo and Huzooree (2015) evaluated the best

practices intended to reduce avoidable rework in order to determine the degree of

appropriateness of minimizing avoidable rework.

The best practices considered were:

• Standards and procedures: Following common programming standards and procedures for how the system should be implemented. This avoids previous mistakes and reduces rework.

• Audits and Reviews: According to the IEEE standard for software reviews

and audits (2008), examining the system and its documentation to help

validate system quality and ensure that it meets client expectations is the

function of the audit and review process. Auditors and reviewers can help find

defects in the system, and thereby reduce rework effort in the future.

• Software Configuration Management (SCM): SCM is a process that can

trace, track, and control information concerning the software (Kim et al.,

2010). Software configuration management can reduce rework by: providing

the trace to all histories and versions of the changes made by developers in

real time, without wasting time searching for updated or historical work; using

tickets to view the history and to update the information; and avoiding

situations in which developers work on an ad hoc basis.


2.4 Defect Remediation Time Prediction Metrics

Section 2.3 identified the root causes of software rework. From root causes

analysis, number of requirements and number of test cases from ambiguous project

requirements and testing categories, respectively, were selected for this praxis. Total

number of requirements and total number of test cases are applied in this praxis as

predictors for defect remediation time prediction because they are more likely to cause

rework effort.

Both software defect remediation time and software rework refer to the same thing, which is the “effort [resources] required to fix software defects identified during system testing” (Bhardwaj & Rana, 2015, p. 1). Following IEEE standards, the term defect remediation time is used throughout this praxis.

In this section, the predictors for software defect remediation time prediction are

discussed in detail based on previous studies. The purpose is to understand the predictors

to predict the time it takes to fix defects because some of the rework cannot be avoided

and is thus inevitable. The defect remediation time predictors are code size, number of

defects, total number of test cases, and total number of requirements (Goel & Singh,

2011; Ramdoo & Huzooree, 2015).

2.4.1 Code Size - Lines of Code (LOC)

Goel and Singh (2011) proposed various size-related metrics or class size (source

line of code; functional points) to predict defect remediation time. Goel and Singh (2011)

observed that the larger the class size, the more likely the software will introduce defects

which will require additional effort to fix.


According to Goel and Singh (2011), source lines of code was a significant predictor of defect fix effort in their dataset, based on correlation analysis. This may indicate that the

more lines of code one has on components, the higher the complexity in the code and the

higher the chance of introducing defects. This situation requires additional effort to fix

the defects.

In addition, Huda et al. (2017), Jing et al. (2018), Dhiauddin et al. (2012), Zhang

(2009), and Menzies et al. (2007) observed that there is strong correlation between

number of defects and source lines of code. This indicates that there is also a good chance

of a need for additional work to fix the defects. In this research, LOC is used as a

predictor for defect remediation time on closed projects, in order to conclude if LOC is

significant in predicting defect remediation time.

2.4.2 Number of Defects (NOD)

Goel and Singh (2011) indicated that number of defects is the best metric for

forecasting the defect remediation time. Therefore, the higher the number of defects in

the system, the more additional effort is required to fix the bugs in the system. This

indicates that there is a strong correlation between the number of defects and defect

remediation time based on Goel and Singh’s (2011) dataset.

2.4.3 Total Number of Test Cases (TTC)

According to Ramdoo and Huzooree (2015), having a minimal number of test cases can lead to a higher chance of introducing defects, since only basic testing is conducted in

this situation. This can require additional time out of the allocated schedule to fix defects.

Dhiauddin et al. (2012) and Umar (2013) reported that the number of defects is directly

proportional to the number of test cases.


This relationship indicates that, as number of test cases increases, the chance of

defects occurring also increases. When there is a high probability of introducing defects, there is also a high probability of requiring additional effort to fix them. Hence, TTC is a

good indicator for predicting defect remediation time.

2.4.4 Total Number of Requirements (TR)

Ambiguous project requirements can cause rework effort, according to Ramdoo

and Huzooree (2015). Too many requirements with too few developers can lead to a high chance of introducing defects, which may require additional effort to fix. Kumar

& Malik (2019) reported that total number of requirements (functional and non-

functional) is one of the 18 attributes that can impact the quality of testing and introduce

defects. Every time a defect is introduced, there is an additional effort needed to fix that

defect. Hence, total number of requirements is used in this research as a predictor for

forecasting defect remediation time.

2.4.5 Summary of Metrics for Defect Remediation Time Prediction

Table 2-2 shows the source of predictors for defect remediation time prediction.


Table 2-2. Summary of Predictors for Defect Remediation Time Prediction

Metrics Type

Predictors Authors & Year of Research

Code Metrics

Code Size (LOC) Dhiauddin et al. (2012); Goel & Singh (2011); Huda et al. (2017); Jing et al. (2018); Menzies et al. (2007); Zhang (2009)

Process Metrics

Number of Defects (NOD)

Goel & Singh (2011)

Process Metrics

Total Number of Test Cases (TTC)

Dhiauddin et al. (2012); Ramdoo & Huzooree (2015); Umar (2013)

Process Metrics

Total Number of Requirements (TR)

Kumar & Malik (2019); Ramdoo & Huzooree (2015)

2.5 Summary and Conclusion

Although many studies have identified potential ways to minimize the leakage of defects and rework, it remains difficult to eliminate defects entirely; according to the extant literature, some rework is inevitable. Previous researchers have also addressed alternatives for reducing rework. While many studies have concentrated on predicting software defects, very few studies have examined strategies to reduce rework or to forecast defect remediation time.

Most studies have concentrated on using open source datasets rather than

company datasets, due to the easy availability of open source datasets. Past research

studies have used these predictors individually in their defects and defect remediation


time predictions. However, these studies have not considered the combined influence of

all the predictors in their predictions.


Chapter 3—Methodology

3.1 Introduction

In this chapter, various methodologies are used to predict software defects and

defect remediation time. The methodologies applied are categorized into regression and

classification techniques. The models are built using the R language and its packages. Felix and Lee (2017) applied regression techniques such as simple and

multiple linear regression models to predict number of software defects. Perreault (2017)

and Prasad et al. (2015) employed classification techniques such as random forest and

support vector machine models to predict number of defects.

Regression is a form of predictive modeling that examines the relationship between a response (dependent) variable and predictor (independent) variables. The purpose of the technique is forecasting and determining the causal relationship between the response and the predictor variables. The commonly used

regression techniques are:

• Multiple Linear Regression (MLR): Used to explain the relationship between a dependent variable and more than one independent variable.

• Negative Binomial Regression (NBR): Applied to over-dispersed count data, where the variance is greater than the mean.
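The over-dispersion condition motivating NBR, the variance of the defect counts exceeding their mean, can be checked directly before choosing a model. A minimal sketch follows, in Python with hypothetical per-project counts (the praxis itself uses R, where a function such as `MASS::glm.nb` would fit the NBR model):

```python
def is_overdispersed(counts):
    """Return True when the sample variance of the defect counts
    exceeds their mean, suggesting negative binomial regression
    is more appropriate than Poisson regression."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)  # sample variance
    return var > mean

defects = [0, 1, 0, 2, 9, 0, 1, 14, 0, 3]  # hypothetical per-project counts
print(is_overdispersed(defects))  # True: variance far exceeds the mean
```

Defect counts are typically over-dispersed in practice, since a few projects concentrate most of the defects, which is why NBR appears in H1.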

Classification technique is a methodology in which data are categorized into a

number of classes for the purpose of predicting the class of the new data. There are many

commonly used classification techniques (Prasad et al., 2015). In this praxis, the

following models are explored in detail.


• Random Forest: A classification and regression algorithm made up of many decision trees.

• Support Vector Machine: A classification and regression algorithm that finds the best hyperplane dividing a dataset into two classes.
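The random forest idea above, many trees whose votes are aggregated, can be illustrated with a deliberately simplified sketch: each “tree” here is only a threshold rule on one hypothetical predictor, trained on a bootstrap sample. A real implementation (e.g., R’s `randomForest` package) grows full decision trees over many predictors; the data below are invented for illustration.

```python
import random

def train_stump(sample):
    """Train a one-level 'tree': pick the threshold on a single
    predictor that best separates defect-prone from clean projects."""
    best_t, best_acc = None, -1.0
    for x, _ in sample:
        acc = sum((x2 > x) == y2 for x2, y2 in sample) / len(sample)
        if acc > best_acc:
            best_t, best_acc = x, acc
    return best_t

def random_forest(data, n_trees=25, seed=7):
    """Train n_trees stumps on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        boot = [rng.choice(data) for _ in data]  # bootstrap resample
        stumps.append(train_stump(boot))
    def predict(x):
        votes = sum(x > t for t in stumps)  # each stump casts one vote
        return votes > n_trees / 2
    return predict

# Hypothetical data: (lines of code, defect-prone?) pairs
data = [(120, False), (300, False), (800, True), (1500, True),
        (90, False), (1100, True), (400, False), (950, True)]
model = random_forest(data)
print(model(1200), model(150))  # True False
```

The design point being illustrated is that bootstrap resampling plus vote aggregation makes the ensemble more stable than any single tree, which is why random forest appears in H1, H2, and H4.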

3.1.1 Data

The source of the data in this praxis is IT firm XYZ’s1 historical defects from March 2016 to November 2019, found in the user acceptance testing (UAT) environment of XYZ’s databases. The dataset consists of 16 software releases, four per year (March, May, August, and November), and 202 COBOL projects.

3.1.2 Data Description

The data include the following metrics; their abbreviations and definitions are

shown in Table 3-1.

Table 3-1. Metrics Definitions and Abbreviations

Metrics Abbreviation Definition

Total Defect Remediation Time

DRT Total time to fix the defects, expressed in hours.

Total Number of Components Delivered

NOCD Total number of modules (programs) that are completed and delivered to a tester before the beginning of testing.

Code Size LOC Total source lines of code that are modified or added by a developer per the project scope.

1 XYZ is anonymous IT company


Total Number of Developers

TNOD Number of programmers assigned to work on the component(s)/project.

Total Number of Requirements

TR The total number of project tasks that need to be completed per the business request.

Total Number of Test cases

TTC The input variables or conditions used to verify that a requirement works as expected.

Number of Defects

NOD Errors in the source code that make the software product function in unintended ways, yielding unexpected results.

Project PR Project is “a temporary endeavor undertaken to create a unique product, service, or result” (PMI, 2008, p. 434).

Release (Year.Month.Day)

RL Software release is the process of developing and delivering the final product of the software application.

3.1.3 Proposed Approaches

Many studies have employed classification and regression techniques to predict

software defects and time effort to fix the defects (Felix & Lee, 2017; Goel & Singh,

2011; Perreault, 2017). The proposed approaches are based on statistical learning models,

as shown in Figures 3-1, 3-2, and 3-3.


Figure 3-1. High-level Overview of Building a Model.


Figure 3-2. Process Flow to Identify Significant Predictors for NOD and DRT.

In this praxis, Figure 3-2 is used to explain the step-by-step approach used

to determine significant predictors for software defects and defect remediation time. First,

data are imported into R from the dataset archives. Second, specific metrics are

selected for defects and defect remediation time. The purpose of the metrics selection is

to ensure only predictors with a strong relationship to a target variable are selected.

Irrelevant variables are excluded. The data cleaning process is conducted to identify

incomplete rows and remove them from the dataset to reduce their impact on model

performance.

Third, using negative binomial regression (NBR) and multiple linear regression

(MLR) model results, significant predictors for the number of defects and defect

remediation time are identified. Any predictor with a probability value (p-value) of less

than 0.05 (significance level) is considered statistically significant. Fourth, partial

dependence (PD) plots are used to show the marginal effect of a small number of predictors

on the response variable of the statistical learning model (Friedman, J.H, 2001; Zhao et

al., 2019). PD plots are used to explain if the relationship between the response and


independent variables is linear or complex. For example, when PD plots are applied to a

linear regression model, the plots show a linear relationship.
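The computation behind a partial dependence plot can be sketched as follows (an illustrative Python sketch with a hypothetical model function and data; the analysis in this praxis itself is carried out in R):

```python
def partial_dependence(model, rows, feature_idx, grid):
    """For each grid value, fix the chosen feature in every row,
    average the model's predictions, and record (value, average) pairs."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)
            modified[feature_idx] = value   # hold the feature at the grid value
            preds.append(model(modified))
        curve.append((value, sum(preds) / len(preds)))
    return curve

# For a linear model, the PD curve is itself a straight line in the feature.
linear_model = lambda x: 2.0 * x[0] + 0.5 * x[1]
rows = [(1.0, 4.0), (3.0, 8.0)]
print(partial_dependence(linear_model, rows, 0, [0.0, 1.0, 2.0]))
# -> [(0.0, 3.0), (1.0, 5.0), (2.0, 7.0)]
```

The slope of the returned curve (here 2.0 per unit of the first feature) mirrors the linear model's coefficient, which is the point made above.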

Figure 3-3. Process Flow for SDP and DRT Prediction.


Figure 3-3 depicts the step-by-step process to predict software defects and defect

remediation time. The first step is to import data to R, the second step is to perform

feature selection and data cleansing. The third step is splitting the dataset into training,

validation, and testing categories. The fourth step is ensuring that 30% of the original

dataset is used for testing and 70% is used for the second partition, which includes

training and validation. In this step, 70% of the second partition is used to train the

models and 30% is used to validate them. The fifth step is building the random

forest and support vector machine model using a training dataset.

The sixth step is displaying the variable importance plots using random forest.

Variable importance indicates that removing an important variable from the model

increases the error more than removing a less important one. The seventh step is

determining the best-fit model on the validation dataset, using measures of error to

assess how well each model describes the data. The error metrics applied in this praxis

are root mean square error (RMSE), mean absolute error (MAE), mean absolute

percentage error (MAPE), mean square error (MSE), and R-square. Lastly, we predict the

number of defects and defect remediation time using the testing dataset (unseen data).
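The error metrics listed above can be expressed directly (a minimal Python sketch of the standard formulas on made-up values; the praxis computes them in R):

```python
import math

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error: the square root of MSE."""
    return math.sqrt(mse(actual, predicted))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error (assumes no zero actual values)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual, predicted = [2.0, 4.0, 8.0], [1.0, 5.0, 10.0]
print(mse(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

Lower values on each metric indicate a better fit, while R-square (not shown) rewards higher values; this is the basis on which the models are compared in Chapter 4.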

3.2 Regression Techniques

3.2.1 Negative Binomial Regression Model

The NBR model is a method that predicts the value of a dependent count variable

from a set of independent variables (Yu, 2012). NBR is similar to standard simple linear

regression, except NBR assumes that counts are generated from a negative binomial

distribution rather than from a normal distribution, as presumed by simple linear

regression.


In this praxis, NBR is used to analyze the relationship between number of defects

(target/dependent variable) and predictors (NOCD, TR, TTC, LOC, and TNOD). The

purpose of the NBR is to determine significant predictors for number of defects. The

value of predicted NOD is a nonnegative integer, while the predictors have numerical

value and are continuous.

Let Y be the dependent variable (number of defects; NOD), where Y takes the value k

∈ {0, 1, 2, 3, …}, meaning that the module has k defects. Let X1, X2, X3, X4, X5 be

the independent variables (NOCD, LOC, TNOD, TR, and TTC). The NBR analysis

generates the probability below (Yu, 2012).

Pr (Y = k | X1, X2, …, Xn) (1)

The probability (Pr) represents Y = k when X1 = x1, X2 = x2, …, Xn = xn. According to Yu

(2012), NBR analysis generates Equation 2, to be used to forecast the possibility of

having a number of defects in a module.

Pr (Y = k) = [Γ(r + k) / (k! Γ(r))] p^r (1 − p)^k, where p = r / (r + λ) (2)

Yu (2012) also defined parameters as follows: “Where Γ is gamma function, λ is variance

of Y and r is the dispersion parameter” (p. 64).

λ = exp (a + b1x1 + b2x2 + b3x3 + b4x4 + … + bnxn) (3)


NBR models Y under the assumption that the count comes from a negative binomial

distribution with variance λ. The value of r and the parameters (a, b1, b2, b3, …, bn) are estimated using the

maximum likelihood method.

Gamma function (for positive integer n): Γ(n) = (n − 1)! (4)

Substituting Equations 3 and 4 into Equation 2 makes it possible to predict the probability

of introducing defects in a module. To determine the most significant predictors for

number of defects using the NBR model summary, we consider all predictors with p-

value < significance level to be statistically significant. The significance level applied in

this research is 0.05.
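Equations 2-4 can be combined numerically as follows (an illustrative Python sketch; the intercept a, coefficient b1, predictor value x1, and dispersion r below are hypothetical values, not the fitted ones, and the praxis fits the actual model in R):

```python
import math

def nb_probability(k, lam, r):
    """Pr(Y = k) for a negative binomial count (Equation 2), with
    p = r / (r + lam); computed in log space for numerical stability."""
    log_pr = (math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
              + r * math.log(r / (r + lam))
              + k * math.log(lam / (lam + r)))
    return math.exp(log_pr)

# Hypothetical parameters: a = 0.5, b1 = 0.1, x1 = 3.0, r = 2.0.
lam = math.exp(0.5 + 0.1 * 3.0)          # Equation 3: lam = exp(a + b1*x1)
probs = [nb_probability(k, lam, 2.0) for k in range(50)]
print(sum(probs))                         # the probabilities sum toward 1
```

Summing the probabilities over k approaches 1, confirming that Equation 2 defines a valid probability distribution over the possible defect counts of a module.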

3.2.2 Multiple Linear Regression Model

The MLR model is a method of finding the linear relationship between a response

variable (here, defect remediation time; DRT) and two or more predictor variables (Prasad et

al.,2015). It is used to understand the relationship between the output (dependent

variable) and n-independent variables (input). The independent variables for predicting

defect remediation time are lines of code, number of defects, total number of

requirements, and total number of test cases. MLR is used to predict and identify

significant predictors of a response variable. Equation 5 represents the MLR model:

Y = B0 + B1X1 + B2X2 + … + BnXn + E (5)

where Y is the dependent variable and X1, …, Xn are independent variables. The terms

B1, B2, …, Bn are regression coefficients, B0 is the y-intercept (constant term), and E is

error. MLR depends on historical data to predict the values of response variable (number

of defects). In this research, MLR is used to determine the most significant predictors for


defect remediation time. If the p-value is less than the significance level 0.05, then the

predictors are considered statistically significant. For example, assume X1 is a defect

remediation time factor variable and Y is the predicted value of the response variable.

Using simple linear regression, one can predict DRT as

Y = B0 + B1X1 (6)

where B1 quantifies the relationship between X1 and Y. Similarly, for more than one

predictor, ranging from X1 to Xn, the regression coefficients also range from B1 to Bn

(Jadhav, 2019). In order to use MLR, the following assumptions should be met (Osborne

& Waters, 2002; Williams et al., 2013). The assumptions are:

1. There should be a linear relationship between dependent and independent

variables. Non-linearity can be fixed by transforming variables to achieve a

linear state.

2. The variables need to be normally distributed.

3. MLR requires no autocorrelation in the dataset.

4. Homoscedasticity: the residual variance is the same across all levels of the predictors (along the regression line).

5. A larger sample size tends to yield better results than a small one.

6. MLR assumes little or no multicollinearity in the dataset.
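The single-predictor case of Equation 6 can be illustrated with an ordinary least squares fit (a minimal Python sketch on made-up defect/hour pairs, not the praxis's dataset, which is analyzed in R):

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for Y = B0 + B1*X1 (Equation 6)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))   # slope
    b0 = mean_y - b1 * mean_x                      # intercept
    return b0, b1

# Hypothetical data: remediation hours of roughly 3 per defect plus a fixed 2.
defects = [1, 2, 4, 6]
hours = [5.0, 8.0, 14.0, 20.0]
b0, b1 = fit_simple_linear(defects, hours)
print(b0, b1)   # exact fit on this data: 2.0 and 3.0
```

Here B1 = 3.0 expresses the relationship between X1 and Y described above: each additional defect adds about three hours of predicted remediation time.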

3.3 Classification Techniques

3.3.1 Random Forest

Random Forest (RF) is a method that builds multiple decision trees based on

random selection of independent variables and data (Pushphavathi et al., 2014). Each

subset of data can have a different size to develop the tree, as shown in Figure 3-4. The

subsets may or may not overlap.


Figure 3-4. Data Subset: Random Selection of Data.

Figure 3-5. Independent Variables Set: Random Selection of Variables.


Assume X1 to Xn are independent variables (as shown in Figure 3-5) that can be

used to develop decision trees. For the first tree, X1, X2, X3, and some other

variables may be randomly selected; for the second tree, X4, X5, and other variables

may be randomly selected. The randomly selected variables are then used to build

decision trees, which are called random decision trees. The combination of individual

random decision trees makes a random forest. The four major benefits of having many

trees are addressed in the remainder of this section:

1. Most of the decision trees are usually correct; only a portion of the data

misleads some trees, so the majority of the decision trees give a correct

prediction.

2. If you conduct a poll, as shown in Figure 3-6, the observation from the first,

second, and fourth trees is Y while the third tree observation is N. According to

the majority voting process (Twala, 2011), Y will be the chosen observation. For

classification, this means the final decision is based on a majority vote of all

decision trees; for regression, it is the average mean decision of all decision trees.

3. RF can estimate missing values in the dataset.

4. RF is less prone to overfitting due to the presence of many decision trees.


Figure 3-6. RF Classification Process.

The following is a summary overview of how RF works:

• RF randomly selects a subset of data from the training dataset.

• RF randomly selects a number of independent variables.

• Build the RF model: develop multiple decision trees to form a forest (the

trees are not pruned).

• Conduct a vote to determine the most accurate prediction based on the

observations from the trees.
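The first and last of the steps above, bootstrap sampling and majority voting, can be sketched as follows (an illustrative Python sketch; the praxis builds its forests in R):

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Randomly select a subset of the training data (with replacement)
    to grow one tree; rows left out form that tree's out-of-bag data."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Final classification: the class predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(42)
training_rows = list(range(10))
sample = bootstrap_sample(training_rows, rng)   # data subset for one tree

# Four trees vote Y, Y, N, Y -- as in the poll example of Figure 3-6.
print(majority_vote(["Y", "Y", "N", "Y"]))   # -> Y
```

For regression, the vote is replaced by averaging the trees' numeric predictions, as noted in the poll discussion above.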


3.3.1.2 Variable Importance and Feature Selection Objectives

RF can identify important variables by ranking features (variables) based on the

level of their importance. As shown in Figure 3-3, variable importance indicates that

removing a more important variable from the model increases the error more than

removing a less important one. Importance is measured by adding noise to each

independent variable and observing the change in error. The calculation of variable

importance for each independent variable in the RF algorithm is as follows:

1. Utilize out-of-bag (OOB) data to compute out-of-bag error (errorOOB1) for

every decision tree in the RF. Out-of-bag data is the data which is not used to

train the decision tree. OOB is usually estimated to be one-third of the original

data. The OOB data determines the decision tree performance and its

prediction error rate is OOB error (Gao et al., 2019).

2. For OOB data, randomly add noise to the independent variable (feature) X

and compute the OOB error and mark the error as errorOOB2 (Gao et al.,

2019).

3. The variable importance score Ix of variable X for N number of trees in RF is

calculated as follows (Gao et al., 2019):

Ix = Σ (errorOOB2 − errorOOB1) / N (7)

The purpose of selecting the feature (independent variable) randomly is to:

• Produce a more accurate prediction model

• Develop a faster model


• Determine independent variables that are highly correlated with the dependent

variable
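Equation 7 can be sketched as follows (an illustrative Python sketch with hypothetical per-tree OOB errors, not values from the praxis's forest):

```python
def importance_score(errors_oob1, errors_oob2):
    """Equation 7: the average increase in out-of-bag error after noising
    feature X, taken across the N trees in the forest."""
    n = len(errors_oob1)
    return sum(e2 - e1 for e1, e2 in zip(errors_oob1, errors_oob2)) / n

# Hypothetical per-tree OOB errors before (errorOOB1) and after (errorOOB2)
# permuting one feature across a three-tree forest.
before = [0.10, 0.12, 0.11]
after = [0.20, 0.25, 0.21]
print(importance_score(before, after))
```

A larger score means that noising the feature degraded the forest more, so the feature ranks as more important.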

3.3.2 Support Vector Machine Model

The support vector machine (SVM) algorithm is used as both a classification and a

regression technique (Shuai et al., 2013). SVM finds the hyperplane that separates

objects of different classes (Bowes et al., 2017; Prasad et al., 2015). A hyperplane

is a decision boundary that divides a space into two classes.

If more than one line can separate the classes (refer to Figure 3-7), the decision

is to consider the data points closest to the opposite class and choose the hyperplane

with the maximum margin from both sides. The blue and yellow data points/coordinates

that are closest to the hyperplane are known as support vectors (Twala, 2011). In two

dimensions (R2) the hyperplane is a line, while in three dimensions (R3) it is a

plane.

Figure 3-7. 2-Dimensional Hyperplane and 3-Dimensional Hyperplane.


In Rn, a hyperplane is an (n − 1)-dimensional space, where n is the number of

dimensions. The margin-maximizing hyperplane (Y) equation in n dimensions is:

Y = V0 + V1X1 + V2X2 + V3X3 + … + VnXn

Y = V0 + VTX, where VTX = Σ ViXi

Y = b + VTX (8)

where V1, V2, V3, …, Vn are the components of the weight vector V, X represents the

variables, and V0 = b is the bias term.

Although SVM models are mainly intended for linear classification, they can be

used for non-linear classification through the kernel trick. A kernel transforms a

non-linear space into a linear one by mapping low-dimensional data into a higher

dimension. There are many types of kernel functions, such as linear, polynomial,

Gaussian, and radial basis. Each kernel type is suitable for a specific domain.
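The radial kernel used in this research can be sketched as follows (a minimal Python illustration of the standard formula K(x, z) = exp(−γ‖x − z‖²); γ = 1 is an arbitrary choice here, not a tuned value):

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Radial (RBF) kernel: K(x, z) = exp(-gamma * ||x - z||^2).
    Values near 1 mean the points are close in input space; near 0, far apart."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))   # identical points -> 1.0
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))   # distant points -> near 0
```

Because similarity decays smoothly with distance, the radial kernel lets the SVM draw the non-linear decision boundaries discussed next.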

For prediction, SVM models separate the data using an optimal hyperplane after

transforming the target and predictor data into a higher-dimensional feature space.

The type of kernel used in this research is radial. The radial kernel is applied when

data cannot be separated in linear form and therefore require non-linear decision

boundaries.

With non-linear kernels, it is possible to overfit the data when there are many

features. Overfitting happens when the model adapts too closely to the


training data and performs well but fails to make an accurate prediction on unseen data.

This means models perform poorly on unseen data. The opposite of overfitting is

underfitting, in which the model fails to adapt to the training data and

performs poorly on both training and unseen data, resulting in poor predictions.

SVM models are also accurate and widely used due to the following benefits:

1. SVM can separate linear and non-linear spaces using the kernel trick very

quickly. It is less likely to overfit due to the presence of the

hyperplane and margin lines on both sides.

2. Its complexity in the number of variables is linear: for example, if you have 40

variables and double them to 80, its complexity (execution time) also

doubles.

3. SVM works well on small datasets. It is less suitable for very large datasets

with non-linear separation, because its complexity in the number of records (R)

grows on the order of R3 rather than linearly. For example, with 50,000

records, doubling the records increases the execution time far more than

twofold.


Chapter 4—Results

This chapter reports results of the models based on the four hypotheses. Each

section in this chapter addresses a research question, with the purpose of providing an

approach to minimize software defects and allocate resource efforts.

4.1 Analysis of Significant Predictors for Software Defects

In this section, findings regarding research question RQ1, hypothesis H1, and the

associated preliminary model result are presented. The purpose of hypothesis H1 is to

determine the most significant predictors for number of defects using negative binomial

regression and random forest models. The decision to use negative binomial regression

was because the variance is much higher than the mean: hence, it has greater variability.

Thus, for negative binomial regression, any predictor that achieves a p-value of less than

or equal to 0.05 is considered statistically significant in predicting software defects.

Utilizing random forest partial dependence plots, it is also possible to determine

the relationship between an individual predictor and the target variable. The original dataset is

used to determine important predictors for software defects (see Appendix A). Research

question RQ1 and hypothesis H1 are listed as follows:

RQ1: Code size, total number of components delivered, number of developers

working on code components, total number of requirements, and total number of test

cases are the predictors influencing the number of defects. Which predictors are

significant in predicting the number of defects?

H1: Negative binomial regression and random forest can identify the most

important predictors for number of defects.


4.1.1 Defects Data Collection and Cleaning

Using the R tool, 202 rows of defect data with 9 variables were collected and imported

into RStudio. These variables were: project, release (month, day, and year), number of

defects, defect remediation time, number of components delivered, number of developers,

lines of code, requirements, and test cases. Project and release are categorical variables,

and were therefore excluded from the development of the models and their predictions.

After data cleaning, the dataset had six variables and 202 rows, since three columns were

removed. The removed columns were defect remediation time, project, and release.

4.1.2 Negative Binomial Regression Summary & Partial Dependency Plots

for Defects Significant Predictors

Using the defect dataset, named Dataset1 (see Appendix C), it was possible to run

the negative binomial regression model and determine the most significant predictors for

software defects. The glm.nb function from the R package called Modern Applied

Statistics with S (MASS) was used to build the negative binomial regression model. To achieve this,

variance and mean of defects were calculated first to determine if negative binomial or

Poisson regression was a more appropriate model to use.

According to the result, the variance of the number of defects is 35 and the mean

is 6. Since the variance is much higher than the mean, this indicates an over-dispersed count

outcome. As such, negative binomial regression was used in this praxis instead of

Poisson regression to determine significant predictors for number of defects. The

following is the summary result of the NBR model, using the whole dataset (as shown in

Figure 4-1).


Figure 4-1. NBR Model Summary Result.

First, the R tool was used to perform the call and display the deviance residuals. Next,

the regression coefficients for all the independent variables, standard error, z-value, and

p-value are displayed as shown in Figure 4-1. The variable Developer.s refers to TNOD.

The p-value of

TNOD is less than the significance level (0.05), which indicates that TNOD is

statistically significant.

The variable Requirements refers to TR and has a p-value of 0.00122,

which is less than the significance level (0.05); TR is therefore statistically

significant. The variable Test.Cases refers to total number of test cases (TTC) and has a


p-value of 0.00492. TTC is statistically significant since its p-value is less than 0.05.

According to the measure-of-error metrics for NBR, mean absolute error (MAE) is 4.191,

root mean square deviation (RMSE) is 6.783, mean squared error (MSE) is 46.013, and

mean absolute percentage error (MAPE) is 60.22. The measure of errors metrics for

random forest are as follows: MAE of 0.367, RMSE of 0.703, MSE

of 0.4937, and MAPE of 11.197. Based on these error metrics, random

forest is considered the best model for identifying the significant predictors for number of

defects prediction. The significant predictors for number of defects are lines of code

(LOC), number of components delivered (NOCD) and total number of test cases (TTC).

Figures 4-2, 4-3, and 4-4 represent random forest partial dependency plots for the

significant variables: LOC, NOCD, and TTC. Figure 4-2 shows that having more than

twenty thousand lines of code changed on a project is likely to introduce more than

eight defects. The more lines of code in a project, the higher the

possibility of defect leakage due to greater complexity in the source code.


Figure 4-2. Relationship between Number of Defects and LOC.

Figure 4-3 shows that delivering more than 100 components to the User

Acceptance Testing (UAT) region indicates a higher chance of introducing more than 8

defects on a tight-schedule project with fewer resources (testers) with which to work on a

project.


Figure 4-3. Relationship between Number of Defects and NOCD.

Figure 4-4 indicates that executing more than 350 test cases that are critical to the

project requirements leads to more than six software defects.


Figure 4-4. Relationship between Number of Defects and TTC.

Therefore, the conclusion regarding Hypothesis H1 is that random forest can

identify the most important predictors for number of defects. The significant predictors

were lines of code (LOC), number of components delivered (NOCD) and total number of

test cases (TTC).

4.2 Analysis of Software Defect Prediction Model

Random forest and support vector machine models were run using all the

predictors for software defects, as referenced in Appendix B. The objective of building

the models was to determine the best fit model that could accurately predict the number

of software defects in the upcoming release. Data partition and prediction error measures

are computed to compare the models. In this section, question RQ2 and hypothesis H2 are

addressed.


RQ2: How can statistical learning models forecast the number of defects using code

size, number of developers working on code components, total number of components

delivered, total number of test cases and total number of requirements?

H2: Random Forest and Support Vector Machine models can be used to predict the

number of defects.

4.2.1 Data Partition for Defects Prediction

After the completion of data cleanup (see section 4.1.1), the next step was to

partition the data. The original Dataset1 was split into two subsets: testing and second

partition. The second partition contained training and validation data. Testing data included

30% of the original Dataset1, and the remaining 70% was used for training and validation.

Table 4-1 shows data partition for predicting number of defects.

Table 4-1. Data Partition for Software Defects Prediction

Dataset Partition Values

Training 104

Validation 40

Testing 58
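The nested split behind Table 4-1 can be sketched as follows (an illustrative Python sketch; the praxis performs the split in R, and the exact row counts in Table 4-1 depend on the sampling and rounding used there):

```python
import random

def partition(rows, test_frac=0.30, val_frac=0.30, seed=7):
    """Hold out test_frac of the rows for testing, then split the remaining
    'second partition' again into training and validation sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)                      # randomize before splitting
    n_test = round(len(shuffled) * test_frac)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = round(len(rest) * val_frac)
    validation, training = rest[:n_val], rest[n_val:]
    return training, validation, test

training, validation, test = partition(list(range(202)))  # 202 projects
print(len(training), len(validation), len(test))
```

Every row lands in exactly one of the three subsets, so the model is validated and tested only on data it was not trained on.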

4.2.2 Variable Importance Plot Using Defects Data

According to the variable importance plot, lines of code (LOC) is more important

than number of components delivered (NOCD) such that, when LOC is removed from the

model, the error increases more than when NOCD is removed (see Figure 4-5). Random

forest can determine significant predictors using variable importance plot. For prediction,

random forest uses all variables and subset of data randomly to generate decision trees.


Figure 4-5. Variable Importance Plot for Software Defects.

4.2.3 Development of Software Defects Prediction

M-try (mtry) is the number of randomly selected predictors considered at each split of a decision tree.

Using the training dataset, it was possible to build the random forest and the support

vector machine. The m-try is 3 for random forest (see Figure 4-6 and Appendix C). This

indicates that only three predictors are randomly selected and used to split the decision

trees. Radial kernel was also used for the support vector machine to map data from low to

high dimensional space (see Figure 4-7 and Appendix C).


Figure 4-6. Random Forest Model Result for Defects Prediction.

Figure 4-7. Support Vector Machine Model Result for Defects Prediction.

4.2.4 Measures of Model Accuracy for Software Defects Prediction

Using the validation dataset, it was possible to determine the following measure

of model accuracy for software defect prediction using the random forest and support

vector machine statistical learning models. The measures of model accuracy used were R-square, mean

absolute error (MAE), root mean square deviation (RMSE), mean squared error (MSE),


and mean absolute percentage error (MAPE). Table 4-2 shows a comparison of the error

measures and determination of the best model for software defects prediction. Regardless

of which measure of error was used, random forest was the best-fit model for

predicting software defects, with an R-square of 85.9% based on unseen data.

Table 4-2. Measure of Errors for Software Defect Prediction

Dataset Type Measure of Errors Random Forest Support Vector Machine

Training R-Squared 0.934 0.544

Validation MAE 0.739 1.86

MAPE 20.75% 35.74%

MSE 1.457 15.77

RMSE 1.2 3.97

Testing R-Squared 0.859 0.688

4.2.5 Results of Software Defects Prediction Model

NBR analysis can predict the number of software defects in one component,

but it is not effective in predicting fault-prone modules (Yu, 2012). According to measure

of errors metrics for NBR, mean absolute error (MAE) is 4.48, root mean square

deviation (RMSE) is 6.62, mean squared error (MSE) is 43, and mean absolute

percentage error (MAPE) is 62.4 (see Appendix D). Based on the measure of errors

metrics for NBR and table 4-2 results, NBR is not as effective as random forest in

predicting number of software defects.


Using random forest, the best model, it is possible to predict software defects using

unseen data (the testing dataset) to forecast the number of defects in an

upcoming release. Figure 4-8 shows the actual (observed) and the predicted defects. In

addition, Figure 4-9 shows how far the predicted defects have deviated from the actual

defects (prediction error). For example, the first row shows that the actual and predicted

defects are 1 and 1, respectively; the residual is therefore 0 defects. The residual is

the difference between the predicted and the actual values.

Figure 4-8. Number of Predicted Defects vs. Actual Defects.


Figure 4-9. Actual Versus Predicted Defects Graph.

Therefore, the conclusion regarding Hypothesis H2 was that, based on the

measure of error values (MAE, MAPE, RMSE, MSE, and R-square), random forest was

the best model for predicting the number of defects, with an R-square of 85.9%, as compared

to the support vector machine, which had an R-square of 68.8% based on unseen data (testing

dataset).

4.3 Analysis of Significant Predictors for Software Defect Remediation Time

In this section, hypothesis H3 and the model result are presented. The purpose of

hypothesis H3 is to determine the most significant predictors for defect remediation time

using multiple linear regression (MLR). Using MLR, any predictor that achieves a p-

value of less than or equal to 0.05 is considered statistically significant in predicting defect

remediation time. The original dataset was used to determine important predictors for

defect remediation time (see Appendix A). This section addresses the research question


RQ3 and hypothesis H3 and provides results of data collection, data cleaning, and model

result.

RQ3: Code size, number of defects, total number of requirements, and total

number of test cases are the predictors influencing the defect remediation time prediction.

Which predictors are significant in predicting defect remediation time?

H3: Multiple linear regression can identify the important predictors for defect

remediation time.

4.3.1 Data Collection and Cleaning for Defect Remediation Time

Using the R tool, 202 rows of defect remediation time data with nine variables were

collected and imported into RStudio. After data cleansing, the dataset had five variables

and 202 rows, since four columns were removed. The removed columns were: project,

release, number of developers, and components delivered. The predictors used for defect

remediation time prediction are number of defects, lines of code, requirements, and test

cases.

4.3.2 Multiple Linear Regression (MLR) Model Summary

Using the dataset named Dataset2, the multiple linear regression model was run to

determine the most significant predictors for software defect remediation time (see

Appendix C). The lm function from R's base stats package was used to build the multiple

linear regression model. The results of the MLR model are shown in Figure 4-10.


Figure 4-10. Multiple Linear Regression Model Result.

First, the R tool was used to perform the call and display the residuals.

Next, the regression coefficients for all of the independent variables, with standard

error, t-value, and p-value, are displayed as shown in Figure 4-10. The variable defects

refers to NOD. NOD has a coefficient of 2.823 and p-value of 0.00000000182. The p-

value of NOD is less than the significance level (0.05), which indicates that NOD is statistically

significant. The R-square of the MLR model is 77%, which indicates how closely the

data points fit the regression line.

4.3.3 Partial Dependency Plots for Significant Predictor(s) of Defect

Remediation Time

Figure 4-11 shows the random forest partial dependency plot for NOD. According

to the PD plot, having more than seven defects in a highly complex system requires more

than 25 hours to fix. The higher the number of defects, the more the effort required to fix

the defects.


Figure 4-11. Relationship between Defect Remediation Time and NOD.

The conclusion regarding Hypothesis H3 was that multiple linear regression can

be used to identify the most important predictor for defect remediation time, with an

R-square of 77%. The significant predictor is the number of defects.


4.4 Analysis of Defect Remediation Time Prediction Model

Random forest and support vector machine models were run using all the

predictors for defect remediation time (see Appendix B). The objective of building the

models was to determine the best fit model that could accurately predict the defect

remediation time in the upcoming release. Data partition and prediction error measures

were computed to compare the models. In this section, question RQ4 and hypothesis H4

are addressed.

RQ4: How can statistical learning models forecast the defect remediation time

using code size, number of defects, total number of requirements, and total number of test

cases?

H4: Random forest and support vector machine models can be used to predict

defect remediation time.

4.4.1 Data Partition for Defect Remediation Time Prediction

After the completion of data cleanup (see Section 4.3.1), the next step was to partition the data. The original Dataset2 was split into two subsets: a testing set and a second partition, which contained the training and validation data. The testing data contained 30% of the original Dataset2, and the remaining 70% was used for training and validation. Table 4-3 shows the size of each dataset type.

Table 4-3. Data Partition for Defect Remediation Time Prediction

Dataset Partition Values

Training 103

Validation 40

Testing 59
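The partition above can be sketched with base R sampling. This is an illustrative sketch, not the praxis's exact script: the seed and split code are assumptions, and the exact row counts in Table 4-3 (103/40/59) came from the praxis's own split.

```r
set.seed(123)                                # illustrative seed, not from the praxis
n <- nrow(Dataset2)                          # 202 rows

# Hold out ~30% of the rows for testing
test_idx <- sample(n, size = round(0.30 * n))
testing  <- Dataset2[test_idx, ]

# The remaining ~70% forms the second partition ...
second <- Dataset2[-test_idx, ]

# ... which is further divided into validation and training sets
val_idx    <- sample(nrow(second), size = 40)
validation <- second[val_idx, ]
training   <- second[-val_idx, ]
```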


4.4.2 Variable Importance Plot Using Defect Remediation Time Data

Number of Defects (NOD) was found to be more important than lines of code

(LOC) such that, when NOD is removed from the model, the error increases more than

when LOC is removed (see Figure 4-12).

Figure 4-12. Variable Importance Plot for Defect Remediation Time.

4.4.3. Development of Software Defect Remediation Time Prediction

Training the random forest model on the DRT training dataset showed that only three predictors were randomly sampled as split candidates at each node of the decision trees (see Figure 4-13 and Appendix C). A radial kernel was used for the support vector machine.


Figure 4-13. Random Forest Model Result for DRT Prediction.

Figure 4-14. Support Vector Machine Model Result for DRT Prediction.

4.4.4 Model Accuracy Measures for Defect Remediation Time

Using the validation dataset, the following measures of model accuracy were computed for defect remediation time prediction based on the random forest and support vector machine statistical learning models (see Appendix D). The measures of model accuracy used were R-squared, mean absolute error (MAE), root mean square deviation (RMSE), mean squared error (MSE), and mean absolute percentage error (MAPE). Table 4-4 compares the measures of error and identifies the best model for defect remediation time prediction. Random forest was the best-fit model for predicting defect remediation time, with an R-squared of 70.9%.

Table 4-4. Measure of Errors for Defect Remediation Time Prediction

Dataset Type | Measure of Errors | Random Forest | Support Vector Machine
Training | R-Squared | 0.83 | 0.49
Validation | MAE | 4.3 | 5.39
Validation | MAPE | 95.86% | 119.85%
Validation | MSE | 29.367 | 59.28
Validation | RMSE | 5.42 | 7.69
Testing | R-Squared | 0.709 | 0.392
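The error measures in Table 4-4 follow their standard definitions. As a sketch, they can be computed from vectors of actual and predicted remediation times like so (the helper names are illustrative, not from the praxis):

```r
# Standard error measures over actual (a) and predicted (p) values
mae  <- function(a, p) mean(abs(a - p))
mse  <- function(a, p) mean((a - p)^2)
rmse <- function(a, p) sqrt(mse(a, p))
mape <- function(a, p) mean(abs((a - p) / a) * 100, na.rm = TRUE)  # in percent

# Example with hypothetical values:
a <- c(1, 4, 10); p <- c(3, 5, 8)
mae(a, p)   # mean of |{-2, -1, 2}| = 5/3
```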

4.4.5 Results of Defect Remediation Time Prediction Model

Using the best model, random forest, it was possible to predict defect remediation time on unseen data (the testing dataset) and thereby forecast the time it takes to fix defects in an upcoming release. Figure 4-15 shows the actual (observed) and predicted defect remediation times (see Appendix C). In addition, Figure 4-16 shows how far the predicted defect remediation time deviated from the actual defect remediation time (the prediction error). For example, the first row shows actual and predicted defect remediation times of 1 and 3 hours, respectively; the residual is therefore 2 hours (3 minus 1 = 2).

Figure 4-15. Predicted Defect Remediation Time vs. Actual Defect Remediation

Time.


Figure 4-16. Actual vs. Predicted Defect Remediation Time Graph.

Therefore, the conclusion regarding Hypothesis H4 was that, based on the measure-of-errors metrics (MAE, MAPE, RMSE, MSE, and R-squared), random forest was the best model for predicting defect remediation time, with an R-squared of 70.9% on unseen data (the testing dataset), compared to an R-squared of 39.2% for the support vector machine. Table 4-5 summarizes the hypothesis test results and the answers to the research questions.


Table 4-5. Summary Table

Hypothesis Number | Hypothesis Result | Research Question Result
H1 | RF can identify the most important predictors for number of defects | The significant predictors for number of defects are lines of code (LOC), number of components delivered (NOCD), and total test cases (TTC)
H2 | NBR is not as effective as RF in predicting the number of software defects | Using RF, it is possible to predict the number of defects using unseen data (the testing dataset)
H3 | MLR can identify the most important predictors for defect remediation time | The significant predictor for defect remediation time is number of defects
H4 | SVM is not as effective as RF in predicting the defect remediation time | Using RF, it is possible to predict defect remediation time using unseen data (the testing dataset)


Chapter 5—Discussion and Conclusions

5.1 Discussion and Conclusions

The goal of the praxis was to predict software defects and defect remediation time

in order to minimize the leakage of defects and allocate rework efforts in an upcoming

software release. In order to perform the forecast, extant literature on predictors for

software defects and defect remediation time was reviewed. The literature review

provided insight into the types of predictors to be applied in this praxis, as well as the

methodologies (negative binomial regression, random forest, support vector machine, and

multiple linear regression) used to test the hypotheses. Previous studies have been

conducted to investigate the circumstances, prior to testing, under which developers,

requirements, test cases, lines of code, and components delivered tend to introduce

defects. However, none considered using the combined influence of all the predictors to

predict software defects and defect remediation time.

By analyzing the XYZ company dataset from the past 16 software releases

containing 202 software projects, significant predictors influencing the number of defects

and defect remediation time were identified. The predictors were considered significant

when the p-value of each variable was less than the significance level of 0.05 (alpha). The significant predictors influencing the number of defects were total number of test cases, total number of developers, and total number of requirements. The significant predictor

for defect remediation time was number of defects. Partial dependency plots were also

applied to determine the marginal effect of the predictor(s) on the number of defects and


defect remediation time. The partial dependency plots showed a strong correlation between the significant predictors and the response variables.

The following summary shows the marginal effect of each significant predictor on

the specific response variable:

• If there are more than seven developers working on a single module or project, the chances of having more than six defects are high, due to the lack of developer component focus and ownership. At the same time, having more than seven developers working on an agile project can enable developers to find more defects, which can minimize the leakage of defects in the release.

• Umar (2013) suggested that “If number of test cases are high and critical to requirements, the chances [of] getting defects is high” (p. 742). The results of this praxis indicate that executing more than 350 test cases that are critical to project requirements correlates with a high chance of introducing more than six defects.

• A project with more than 40 requirements on a tight schedule has a high chance of introducing more than six defects.

• If more than seven defects are found in a complex system, the total time to fix them will be more than 25 hours. Hence, the higher the number of defects, the more effort is required to fix them.

The results of this praxis suggest that it is feasible to predict defect remediation time after identifying the number of predicted defects. Therefore, it is recommended to forecast the number of defects and the defect remediation time using the random forest model, which achieved R-squared values of 85.9% and 70.9%, respectively.
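This two-stage forecast can be sketched in R as follows. The model objects and input values below are hypothetical (Appendix C builds both models under the same name, rf.fit, so distinct names are assumed here for clarity):

```r
# Hypothetical new project for an upcoming release
new_project <- data.frame(Components.Delivered = 12, Developer.s. = 5,
                          Requirements = 30, Test.Cases = 200,
                          Lines.of.Code = 1500)

# Stage 1: forecast the number of defects (rf.fit.defects is an assumed name
# for the random forest defect model built as in Appendix C)
pred_defects <- predict(rf.fit.defects, newdata = new_project)

# Stage 2: feed the predicted defect count into the DRT model (rf.fit.drt is
# an assumed name for the random forest remediation time model)
drt_input <- data.frame(Defects = round(pred_defects), Requirements = 30,
                        Test.Cases = 200, Lines.of.Code = 1500)
pred_drt <- predict(rf.fit.drt, newdata = drt_input)  # hours to allocate for rework
```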


5.2 Contributions to Body of Knowledge

1. Identifying the significant predictors (number of developers, total number of

requirements, total number of test cases, lines of code, and components

delivered) for software defects in order to analyze the effect of each

significant predictor on the number of defects.

2. Demonstrating a methodology for predicting software defects using the

combined influence of all the predictors (number of developers, total number

of requirements, total number of test cases, lines of code, and components

delivered) in order to minimize the leakage of defects in an upcoming

software release.

3. Identifying the significant predictors (number of defects, lines of code, total

number of test cases, and total number of requirements) of defect remediation

time in order to analyze the effect of each significant predictor on the defect

remediation time.

4. Demonstrating a methodology for predicting software defect remediation time

using the combined influence of all the predictors (number of defects, lines of

code, total number of test cases and total number of requirements) in order to

allocate rework efforts in an upcoming software release.

5.3 Recommendations for Future Research

According to the model results and predictors for software defects and defect

remediation time predictions, the following are the recommendations for future research

improvements:


• The data used in this praxis was based on COBOL projects, and the model

results are specific to mainframe application systems. Hence, extending the

model to work on Java, C++, PHP, Perl, and Python datasets will enable the

model to work on various technology platforms.

• Currently, the model predicts software defects and defect remediation time

prior to testing. Extending the model to predict defects and defect remediation

time prior to production install will help minimize the leakage of defects.

• Extending the model to include an acceptable number of software defects in testing.

• Considering that the current dataset has 202 projects, it is relatively small. Extending the model to predict defects and defect remediation time based on a large dataset consisting of around 50,000 projects, such as the Standish CHAOS database (The Standish Group, 2013), will improve measures of model performance and accuracy.


References

Akbarinasaji, S., Caglayan, B., & Bener, A. (2018). Predicting bug-fixing time: A

replication research using an open source software project. Journal of Systems

and Software, 136, 173–186. https://doi.org/10.1016/j.jss.2017.02.021

Bell, R., Ostrand, T., & Weyuker, E. (2013). The limited impact of individual developer

data on software defect prediction. Empirical Software Engineering, 18(3), 478–

505. https://doi.org/10.1007/s10664-011-9178-4

Bhardwaj, M., & Rana, A. (2015). Impact of size and productivity on testing and rework

efforts for web-based development projects. ACM SIGSOFT Software

Engineering Notes, 40(2), 1–4. https://doi.org/10.1145/2735399.2735404

Bird, C., Nagappan, N., Murphy, B., Gall, H., & Devanbu, P. (2011). Don’t touch my

code! Examining the effects of ownership on software quality. SIGSOFT/FSE

2011 - Proceedings of the 19th ACM SIGSOFT Symposium on Foundations of

Software Engineering, 4–14. https://doi.org/10.1145/2025113.2025119

Bowes, D., Hall, T., & Petrić, J. (2017). Software defect prediction: Do different

classifiers find the same defects? Software Quality Journal, 26(2), 525–552.

https://doi.org/10.1007/s11219-016-9353-3

Conroy, P. & Kruchten, P. (2012). Performance norms: An approach to rework reduction

in software development. Electrical & Computer Engineering (CCECE).

Dhiauddin, M., Suffian, M., & Ibrahim, S. (2012). A prediction model for system testing

defects using regression analysis. JSCSE, 2(7), 55–68.

https://doi.org/10.7321/jscse.v2.n7.6


Di Nucci, D., Palomba, F., De Rosa, G., Bavota, G., Oliveto, R., & De Lucia, A. (2018).

A developer centered bug prediction model. IEEE Transactions on Software

Engineering, 44(1), 5–24. https://doi.org/10.1109/TSE.2017.2659747

Eyolfson, J., Tan, L., & Lam, P. (2011). Do time of day and developer experience affect

commit bugginess? Proceedings - International Conference on Software

Engineering, 153–162. https://doi.org/10.1145/1985441.1985464

Fan, G., Diao, X., Yu, H., Yang, K., & Chen, L. (2019). Software defect prediction via

attention-based recurrent neural network. Scientific Programming, 2019, 1–14.

https://doi.org/10.1155/2019/6230953

Felix, E., & Lee, S. (2017). Integrated approach to software defect prediction. IEEE

Access, 5, 21524–21547. https://doi.org/10.1109/ACCESS.2017.2759180

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.1214/aos/1013203451

Gao, X., Wen, J., & Zhang, C. (2019). An improved random forest algorithm for

predicting employee turnover. Mathematical Problems in Engineering, 2019, 1–

12. https://doi.org/10.1155/2019/4140707

Geshwaree, H., & Ramdoo, V. (2015). A systematic research on requirement engineering

processes and practices in Mauritius. International Journal of Advanced Research

in Computer Science and Software Engineering, 5(2), 40–46.

Goel, B., & Singh, Y. (2011). An empirical analysis of metrics to predict the software

defect fix-effort. International Journal of Computers and Applications, 33(2).

https://doi.org/10.2316/journal.202.2011.2.202-2749


Harekal, D., & Suma, V. (2015). Implication of post production defects in software

industries. International Journal of Computer Applications, 109(17), 20–23.

https://doi.org/10.5120/19419-1032

Huda, S., Alyahya, S., Mohsin Ali, M., Ahmad, S., Abawajy, J., Al-Dossari, H., &

Yearwood, J. (2017). A framework for software defect prediction and metric

selection. IEEE Access, 6(99), 2844–2858.

https://doi.org/10.1109/ACCESS.2017.2785445

IEEE Software 2008 Editorial Calendar. (2008). IT Professional, 10(2), 18–18.

https://doi.org/10.1109/mitp.2008.30

Jadhav, R. B. (2019). A software defect learning and analysis utilizing regression method

for quality software development. International Journal of Advanced Trends in

Computer Science and Engineering, 1275–1282.

https://doi.org/10.30534/ijatcse/2019/38842019

Kim, D.-Y., & Youn, C. (2010). Traceability enhancement technique through the

integration of software configuration management and individual working

environment. Secure Software Integration and Reliability Improvement (SSIRI),

Fourth International Conference on IEEE.

Kumar, S., & Malik, K. (2019). Software metrics quality testing (SMQT) prediction

using logit regression model. International Journal of Computer Applications,

178(30), 1–4. https://doi.org/10.5120/ijca2019919114

Li, Z., Jing, X., & Zhu, X. (2018). Progress on approaches to software defect prediction.

IET Software, 12(3), 161–175. https://doi.org/10.1049/iet-sen.2017.0148


Menzies, T., Greenwald, J., & Frank, A. (2007). Data mining static code attributes to

learn defect predictors. IEEE Transactions on Software Engineering, 33(1), 2–13.

https://doi.org/10.1109/TSE.2007.256941

Morozoff, E. (2010). Using a line of code metric to understand software rework. IEEE

Software, 27(1), 72–77. https://doi.org/10.1109/ms.2009.160

Osborne, J. W., & Waters, E. (2002). Four assumptions of multiple regression that

researchers should always test. Practical Assessment, Research & Evaluation,

8(2). https://doi.org/10.7275/r222-hv23

Ostrand, T., Weyuker, E., & Bell, R. (2010). Programmer-based fault prediction. ACM

International Conference Proceeding Series, 1–10.

https://doi.org/10.1145/1868328.1868357

Perreault, L. (2017). Using classifiers for software defect detection [Conference paper].

26th International Conference on Software Engineering and Data Engineering,

SEDE.

Posnett, D., D’Souza, R., Devanbu, P., & Filkov, V. (2013). Dual ecological measures of

focus in software development. Proceedings - International Conference on

Software Engineering, 452–461. https://doi.org/10.1109/ICSE.2013.6606591

Prasad, M. C. M., Florence, L. F., & Arya, A. (2015). A research on software metrics

based software defect prediction using data mining and statistical learning

techniques. International Journal of Database Theory and Application, 8(3), 179–

190. https://doi.org/10.14257/ijdta.2015.8.3.15

Project Management Institute. (2008). A guide to the project management body of

knowledge (PMBOK® guide) (4th ed.). Author.


Pushphavathi, T. P., Suma, V., & Ramaswamy, V. (2014). A novel method for software

defect prediction: Hybrid of FCM and random forest. 2014 International

Conference on Electronics and Communication Systems (ICECS).

https://doi.org/10.1109/ecs.2014.6892743

Rahman, F., & Devanbu, P. (2011). Ownership, experience and defects: A fine-grained

research of authorship. Proceedings - International Conference on Software

Engineering, 491–500. https://doi.org/10.1145/1985793.1985860

Ramdoo, V. D., & Huzooree, G. (2015). Strategies to reduce rework in software

development on an organisation in Mauritius. International Journal of Software

Engineering & Applications, 6(5), 09–20. https://doi.org/10.5121/ijsea.2015.6502

Rubio, R. P. M. T., & Gulo, C. A. (2015). Characterizing developers’ rework on GitHub

open source projects. Doctoral Symposium in Informatics Engineering.

Shuai, B., Li, H., Li, M., Zhang, Q., & Tang, C. (2013). Software defect prediction using

dynamic support vector machine. 2013 Ninth International Conference on

Computational Intelligence and Security. https://doi.org/10.1109/cis.2013.61

The Standish Group. (2013). The chaos manifesto. The Standish Group.

Twala, B. (2011). Predicting software faults in large space systems using statistical

learning techniques. Defence Science Journal, 61(4), 306–316.

https://doi.org/10.14429/dsj.61.1088

Umar, S. N. (2013). Software testing defect prediction model - a practical approach.

International Journal of Research in Engineering and Technology, 2(5), 741–745.

https://doi.org/10.15623/ijret.2013.0205001


Williams, M., Grajales, C., & Kurkiewicz, D. (2013). Assumptions of multiple

regression: Correcting two misconceptions. Practical Assessment, Research and

Evaluation, 18(9), 1–14.

Yu, L. (2012). Using negative binomial regression analysis to predict software faults: A

research of Apache Ant. International Journal of Information Technology and

Computer Science, 4(8), 63–70. https://doi.org/10.5815/ijitcs.2012.08.08

Zahra, S., Nazir, A., Khalid, A., Raana, A., & Nadeem Majeed, M. (2014). Performing

inquisitive research of PM traits desirable for project progress. International

Journal of Modern Education and Computer Science, 6(2), 41–47.

https://doi.org/10.5815/ijmecs.2014.02.6

Zhang, H. (2009). An investigation of the relationships between lines of code and defects.

IEEE International Conference on Software Maintenance, ICSM, 274–283.

https://doi.org/10.1109/ICSM.2009.5306304

Zhao, Q., & Hastie, T. (2019). Causal Interpretations of Black-Box Models. Journal of

Business & Economic Statistics, 1–10.

https://doi.org/10.1080/07350015.2019.1624293


Appendix A—Dataset for Defects and Defect Remediation Time

The proprietary defect and defect remediation time data came from the XYZ firm and contained 202 COBOL projects across 16 releases from 2016 to 2019. The following R source code and global environment show the data import and cleaning process for defect and defect remediation time prediction.

1. Source Code for Data Import and Cleanup:

#Import Data to RStudio:
library(RcmdrMisc)  # readXL() is provided by the RcmdrMisc package
Dataset <- readXL("C:/Users/T-sus/Desktop/data.xlsx", rownames=FALSE,
                  header=TRUE, na="", sheet="CRDB", stringsAsFactors=TRUE)

#Perform Data Cleanup Process:
Dataset1 <- subset(Dataset, select=c(Components.Delivered, Defects, Developer.s.,
                                     Lines.of.Code, Requirements, Test.Cases))
Dataset2 <- subset(Dataset, select=c(Defect.Remediation.Time.In.hours, Defects,
                                     Lines.of.Code, Requirements, Test.Cases))

• Global Environment: Defects Prediction

According to the global environment, the dataset had 202 projects and 9 variables before cleanup. After data cleansing, the dataset had 6 variables and 202 projects for defects prediction, since three columns were removed: defect remediation time, project, and release.

For defect remediation time prediction, the dataset had 5 variables and 202 projects after the data cleanup process, since four columns were removed: project, release, number of developers, and components delivered.

Figure A-1. Defects.

Figure A-2. Defect Remediation Time.


Appendix B—Metrics for Defects and Defect Remediation time

Table B-1 presents the definition and abbreviation of each metric as applied in this praxis.

Table B-1. Metrics Definitions and Abbreviations

Metrics | Abbreviation | Definition
Total Defect Remediation Time | DRT | Total time to fix the defects, expressed in hours.
Total Number of Components Delivered | NOCD | Total number of modules (programs) that are completed and delivered to a tester before the beginning of testing.
Code Size | LOC | Total source lines of code that are modified or added by a developer per the project scope.
Total Number of Developers | TNOD | Number of programmers assigned to work on component(s) or a project.
Total Number of Requirements | TR | The number of project tasks that need to be completed per the business ask.
Total Number of Test Cases | TTC | The input variables or conditions used to verify that a requirement works as expected.
Number of Defects | NOD | Errors in the source code that make the software product function in unintended ways, yielding unexpected results.
Project | PR | A project is “a temporary endeavor undertaken to create a unique project service or result” (PMI, 2008, p. 434).
Release (Year.Month.Day) | RL | A software release is the process of developing and delivering the final product of the software application.


Tables B-2 and B-3 list the predictors applied in this research for predicting the number of defects and the defect remediation time.

Table B-2. Summary of Predictors for Software Defect Prediction

Metrics Type | Predictors | Authors and Year of Research
Code Metrics | Code Size (LOC) | Dhiauddin et al. (2012); Huda et al. (2017); Jing et al. (2018); Menzies et al. (2007); Zhang (2009)
Process Metrics | Total Number of Test Cases (TTC) | Dhiauddin et al. (2012); Umar (2013)
Process Metrics | Total Number of Requirements (TR) | Kumar & Malik (2019)
Process Metrics | Number of Components Delivered (NOCD) | Umar (2013)
Process Metrics | Number of Developers (NOD) Working on Code Components | Bell et al. (2013); Bird et al. (2011); Di Nucci et al. (2018); Ostrand et al. (2010); Posnett et al. (2013); Rahman & Devanbu (2011); Eyolfson et al. (2011)


Table B-3. Summary of Predictors for Defect Remediation Time Prediction

Metrics Type | Predictors | Authors and Year of Research
Code Metrics | Code Size (LOC) | Dhiauddin et al. (2012); Goel & Singh (2011); Huda et al. (2017); Jing et al. (2018); Menzies et al. (2007); Zhang (2009)
Process Metrics | Total Number of Test Cases (TTC) | Dhiauddin et al. (2012); Ramdoo & Huzooree (2015); Umar (2013)
Process Metrics | Total Number of Requirements (TR) | Kumar & Malik (2019); Ramdoo & Huzooree (2015)
Process Metrics | Number of Defects (NOD) | Goel & Singh (2011)


Appendix C—Models Development & Results

This appendix addresses four research questions and hypotheses by providing the source

code and model results.

1. Analysis of Significant Predictors for Software Defects.

RQ1: Code size, total number of components delivered, number of developers

working on code components, total number of requirements and total number of

test cases are the predictors influencing the number of defects. Which predictors

are significant in predicting the number of defects?

H1: Negative binomial regression and random forest can identify the most

important predictors for number of defects.

• Source Code to build NBR model:

# glm.nb() is provided by the MASS package
library(MASS)

# Build NBR model based on overall data
mymodel <- glm.nb(Defects ~ Components.Delivered + Developer.s. +
                    Requirements + Test.Cases + Lines.of.Code, data = Dataset1)
summary(mymodel)


Results: According to Figure C-1, the significant predictors are total number of developers (TNOD), total number of requirements (TR), and total number of test cases (TTC).

Figure C-1. NBR Model Summary Result.

• Build Random Forest Model Using the whole dataset:

library(caret)  # train() is provided by the caret package
rf.fit <- train(Defects ~ Components.Delivered + Developer.s. + Requirements +
                  Test.Cases + Lines.of.Code, data = Dataset1, method = "rf",
                importance = TRUE)
rf.fit


• Variable Importance Plot for Significant Predictors for Number of Defects:

varImp(rf.fit)

Results: Using the random forest variable importance plot, the significant predictors for number of defects are lines of code, total number of test cases, and number of components delivered, with overall importance of 100, 78.91, and 73.42, respectively. According to the measure-of-errors metrics for NBR, the mean absolute error (MAE) is 4.191, the root mean square deviation (RMSE) is 6.783, the mean squared error (MSE) is 46.013, and the mean absolute percentage error (MAPE) is 60.22. The measure-of-errors metrics for random forest are as follows: MAE = 0.367, RMSE = 0.703, MSE = 0.4937, and MAPE = 11.197. Based on these metrics, random forest is considered the best model for identifying the significant predictors for number of defects prediction.

2. Analysis of Software Defect Prediction Model.

RQ2: How can statistical learning models forecast the number of defects using code

size, number of developers working on code components, total number of

components delivered, total number of test cases and total number of requirements?

H2: Random Forest and Support Vector Machine models can be used to predict the

number of defects.


• Source Code to build RF and SVM models:

library(caret)  # train() is provided by the caret package

#Build Random Forest Model Using training dataset
rf.fit <- train(Defects ~ Components.Delivered + Developer.s. + Requirements +
                  Test.Cases + Lines.of.Code, data = training, method = "rf",
                importance = TRUE)
rf.fit

#Build Support Vector Machine model Using training dataset
svm.fit <- train(Defects ~ Components.Delivered + Developer.s. + Requirements +
                   Test.Cases + Lines.of.Code, data = training,
                 method = "svmRadial")
svm.fit

# Number of Defects Prediction Using RF
Actualpredicted <- predict(rf.fit, newdata = testing)
PredictedDefects <- round(Actualpredicted, 0)
ActualDefects <- testing$Defects
View(data.frame(ActualDefects, PredictedDefects))
plot(ActualDefects, PredictedDefects)

Results: Figure C-2 represents the actual (observed) and the predicted defects. Random forest was considered the best model for predicting the number of defects, with an r-square of 85.9%.


Figure C-2. Number of Predicted Defects vs. Actual Defects.

3. Analysis of Significant Predictors for Software Defect Remediation Time.

RQ3: Code size, number of defects, total number of requirements and total number

of test cases are the predictors influencing the defect remediation time prediction.

Which predictors are significant in predicting defect remediation time?

H3: Multiple linear regression can identify the important predictors for defect

remediation time.


• Source Code to build MLR model:

# lm() is part of base R (the stats package); no additional package is required

# Build multiple linear regression model and summarize significant predictors
LinearModel.1 <- lm(Defect.Remediation.Time.In.hours ~ Defects +
                      Requirements + Test.Cases + Lines.of.Code, data = Dataset2)
summary(LinearModel.1)

• Results: According to Figure C-3, the significant predictor for defect remediation time is the number of defects (NOD).

Figure C-3. Multiple Linear Regression Model Result.


4. Analysis of Defect Remediation Time Prediction Model.

RQ4: How can statistical learning models forecast the defect remediation time using

code size, number of defects, total number of requirements and total number of test

cases?

H4: Random Forest and Support Vector Machine models can be used to predict

defect remediation time.

• Source Code to build RF and SVM models:

library(caret)  # train() is provided by the caret package

#Build Random Forest Model Using training dataset
rf.fit <- train(Defect.Remediation.Time.In.hours ~ Defects + Requirements +
                  Test.Cases + Lines.of.Code, data = training, method = "rf",
                importance = TRUE)
rf.fit

#Build Support Vector Machine model Using training dataset
svm.fit <- train(Defect.Remediation.Time.In.hours ~ Defects + Requirements +
                   Test.Cases + Lines.of.Code, data = training,
                 method = "svmRadial")
svm.fit

# Defect Remediation Time Prediction Using RF
Actualpredicted <- predict(rf.fit, newdata = testing)
PredictedDefectsTime <- round(Actualpredicted, 0)
ActualDefectsTime <- testing$Defect.Remediation.Time.In.hours
View(data.frame(ActualDefectsTime, PredictedDefectsTime))
plot(ActualDefectsTime, PredictedDefectsTime)

Results: Figure C-4 represents the actual (observed) and the predicted defect remediation time. Random forest was considered the best model for predicting the defect remediation time, with an r-square of 70.9%.

Figure C-4. Predicted Defect Remediation Time vs. Actual Defect Remediation

Time.
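The R-squared values reported for these models compare predicted against actual values. As an illustrative sketch (written in Python for self-containment rather than the praxis's R code, and using toy values, not the praxis data), R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Toy example with hypothetical remediation times (hours)
actual = [10.0, 12.0, 8.0, 15.0]
predicted = [9.0, 13.0, 8.5, 14.0]
r2 = r_squared(actual, predicted)
```

Note that caret's reported R-squared is computed from resampling and may differ slightly from this textbook definition.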


Appendix D—Measures of Model Performance

This appendix presents the error measures for the defect and defect

remediation time prediction models. The source code and results are presented below.
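The evaluations below use separate training, validation, and testing partitions of the dataset. A minimal sketch of such a three-way split (in Python for illustration; the 60/20/20 proportions and seed are assumptions, not taken from the praxis):

```python
import random

def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle rows and split into training, validation, and testing partitions."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Toy example: 10 records split 6 / 2 / 2
train, val, test = split_dataset(list(range(10)))
```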

• Source Code for RF, SVM, and NBR: Defects.

#Using Training Dataset (train() requires caret; glm.nb requires MASS)

library(caret)

library(MASS)

RF:

rf.fit <- train(Defects ~ Components.Delivered + Developer.s. + Requirements +

Test.Cases + Lines.of.Code, data = training, method = "rf", importance =

TRUE)

rf.fit

SVM:

svm.fit <- train(Defects ~ Components.Delivered + Developer.s. + Requirements +

                 Test.Cases + Lines.of.Code, data = training,

                 method = "svmRadial")  # alternative kernel: svmLinear

svm.fit

NBR:

mymodel <- glm.nb(Defects ~ Components.Delivered + Developer.s. + Requirements +

Test.Cases + Lines.of.Code, data = training)


summary(mymodel)

#Using Validation Dataset

RF:

residuals.rf <- predict(rf.fit, newdata = validation) - validation$Defects

MSE.rf <- (mean(residuals.rf^2))

RMSE.rf <- sqrt(mean(residuals.rf^2))

MAE.rf <- mean(abs(residuals.rf))

residuals3.rf <-(validation$Defects - predict (rf.fit, newdata = validation))

MAPE.rf <- mean(abs((residuals3.rf/validation$Defects)*100),na.rm= TRUE)

SVM:

residuals.svm <- predict(svm.fit, newdata = validation) - validation$Defects

MSE.svm <- (mean(residuals.svm^2))

RMSE.svm <- sqrt(mean(residuals.svm^2))

MAE.svm <- mean(abs(residuals.svm))

residuals3.svm <-(validation$Defects - predict(svm.fit, newdata = validation))

MAPE.svm <- mean(abs((residuals3.svm/validation$Defects)*100),na.rm= TRUE)


NBR:

# type = "response" returns predicted counts; glm.nb's default is the log-link scale

residuals.nb <- predict(mymodel, newdata = validation, type = "response") - validation$Defects

residuals3.nb <- validation$Defects - predict(mymodel, newdata = validation, type = "response")

MAPE.nb <- mean(abs((residuals3.nb/validation$Defects)*100),na.rm= TRUE)

MSE.nb <- (mean(residuals.nb^2))

RMSE.nb <- sqrt(mean(residuals.nb^2))

MAE.nb <- mean(abs(residuals.nb))

#Using Testing Dataset

RF:

rf.fit <- train(Defects ~ Components.Delivered + Developer.s. + Requirements +

                Test.Cases + Lines.of.Code, data = testing, method = "rf",

                importance = TRUE)

rf.fit

SVM:

svm.fit <- train(Defects ~ Components.Delivered + Developer.s. + Requirements +

                 Test.Cases + Lines.of.Code, data = testing, method = "svmRadial")

svm.fit


• Results:

According to the error metrics for NBR, the mean absolute error (MAE) is 4.48, the

root mean squared error (RMSE) is 6.62, the mean squared error (MSE) is 43, and the

mean absolute percentage error (MAPE) is 62.4%. Comparing these NBR metrics with

Table D-1, Random Forest was the best-fit model for predicting defects, with an

R-squared of 85.9%.
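The four error measures above follow their standard definitions. As an illustrative sketch (in Python for self-containment; the defect counts are toy values, not the praxis data):

```python
def error_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, and MAPE for paired observations."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r ** 2 for r in residuals) / n
    rmse = mse ** 0.5
    # MAPE is undefined when an actual value is zero; skip such pairs
    pct = [abs(r / a) * 100 for r, a in zip(residuals, actual) if a != 0]
    mape = sum(pct) / len(pct)
    return mae, mse, rmse, mape

# Toy example with hypothetical defect counts
mae, mse, rmse, mape = error_metrics([4, 10, 5], [5, 8, 5])
```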

Table D-1. Measure of Errors for Software Defects Prediction

Dataset Type   Measure of Errors   Random Forest   Support Vector Machine
Training       R-Square            0.934           0.544
Validation     MAE                 0.739           1.86
               MAPE                20.75%          35.74%
               MSE                 1.457           15.77
               RMSE                1.2             3.97
Testing        R-Square            0.859           0.688

• Source Code for RF and SVM: Defect Remediation Time.

#Using Training Dataset

RF:

rf.fit <- train(Defect.Remediation.Time.In.hours ~ Defects + Requirements +

Test.Cases + Lines.of.Code, data = training, method = "rf", importance =

TRUE)

rf.fit


#Using Validation Dataset

RF:

residuals.rf <- predict(rf.fit, newdata = validation) -

validation$Defect.Remediation.Time.In.hours

residuals4.rf <-((validation$Defect.Remediation.Time.In.hours - predict(rf.fit, newdata =

validation))/validation$Defect.Remediation.Time.In.hours)*100

MSE.rf <- (mean(residuals.rf^2))

RMSE.rf <- sqrt(mean(residuals.rf^2))

MAE.rf <- mean(abs(residuals.rf))

MAPE.rf <- mean(abs(residuals4.rf), na.rm = TRUE)

SVM:

residuals.svm <- predict(svm.fit, newdata = validation) -

validation$Defect.Remediation.Time.In.hours

MSE.svm <- (mean(residuals.svm^2))

RMSE.svm <- sqrt(mean(residuals.svm^2))

MAE.svm <- mean(abs(residuals.svm))

residuals4.svm <-((validation$Defect.Remediation.Time.In.hours - predict(svm.fit,

newdata = validation))/validation$Defect.Remediation.Time.In.hours)*100

MAPE.svm <-mean(abs(residuals4.svm))


#Using Testing Dataset

RF:

rf.fit <- train(Defect.Remediation.Time.In.hours ~ Defects + Requirements +

Test.Cases + Lines.of.Code, data = testing, method = "rf", importance =

TRUE)

rf.fit

SVM:

svm.fit <- train(Defect.Remediation.Time.In.hours ~ Defects + Requirements +

                 Test.Cases + Lines.of.Code, data = testing, method = "svmRadial")

svm.fit


• Results:

Table D-2 compares the error measures and identifies the best model for defect

remediation time prediction. Random Forest was the best-fit model for predicting

defect remediation time, with an R-squared of 70.9%.

Table D-2. Measure of Errors for Defect Remediation Time Prediction

Dataset Type   Measure of Errors   Random Forest   Support Vector Machine
Training       R-Square            0.83            0.49
Validation     MAE                 4.3             5.39
               MAPE                95.86%          119.85%
               MSE                 29.367          59.28
               RMSE                5.42            7.69
Testing        R-Square            0.709           0.392