
SPECIAL SECTION ON SOFTWARE STANDARDS AND THEIR IMPACT IN REDUCING SOFTWARE FAILURES

Received November 4, 2017, accepted December 10, 2017, date of publication December 27, 2017, date of current version February 14, 2018.

Digital Object Identifier 10.1109/ACCESS.2017.2785445

A Framework for Software Defect Prediction and Metric Selection

SHAMSUL HUDA1, SULTAN ALYAHYA2, MD MOHSIN ALI3, SHAFIQ AHMAD4, JEMAL ABAWAJY1, HMOOD AL-DOSSARI2, AND JOHN YEARWOOD1
1School of IT, Deakin University Melbourne Burwood Campus, Burwood, VIC 3125, Australia
2Information Systems Engineering Research Group, King Saud University, Riyadh, Saudi Arabia
3Australian National University, Canberra, ACT 0200, Australia
4Department of Industrial Engineering, College of Engineering, King Saud University, Riyadh, Saudi Arabia

Corresponding author: Shafiq Ahmad ([email protected])

This work was supported by the Deanship of Scientific Research at King Saud University under Grant RG-1436-039.

ABSTRACT Automated software defect prediction is an important and fundamental activity in the domain of software development. However, modern software systems are inherently large and complex, with numerous correlated metrics that capture different aspects of the software components. This large number of correlated metrics makes building a software defect prediction model very complex. Thus, identifying and selecting a subset of metrics that enhances the software defect prediction method's performance is an important but challenging problem that has received little attention in the literature. The main objective of this paper is to identify significant software metrics and to build and evaluate an automated software defect prediction model. We propose two novel hybrid software defect prediction models to identify the significant attributes (metrics) using a combination of wrapper and filter techniques. The novelty of our approach is that it embeds the metric selection and training processes of software defect prediction as a single process while reducing the measurement overhead significantly. Different wrapper approaches, including SVM and ANN, were combined with a maximum relevance filter approach to find the significant metrics. A filter score was injected into the wrapper selection process in the proposed approaches to direct the search process efficiently toward significant metrics. Experimental results with real defect-prone software data sets show that the proposed hybrid approaches achieve significantly compact metric sets (i.e., selecting the most significant metrics) with high prediction accuracy compared with conventional wrapper or filter approaches. The performance of the proposed framework has also been verified using a statistical multivariate quality control process based on the multivariate exponentially weighted moving average. The proposed framework demonstrates that the hybrid heuristic can guide the metric selection process in a computationally efficient way by integrating the intrinsic characteristics from the filters into the wrapper and using the advantages of both the filter and wrapper approaches.

INDEX TERMS Software metric, defect prediction, metric heuristics.

I. INTRODUCTION
Enterprise software spending worldwide amounted to $3.8 trillion in 2014 [1], with quality assurance and testing accounting for 23% of the spending [2]. The significant amount of spending allocated to software testing shows the importance and criticality of software testing in the software development life cycle (SDLC). However, modern software systems are inherently large and complex, which makes them highly prone to defects. Different entities like processes, products, and resources of the software systems are being continuously updated. Changes in resources are also frequent as developers move from project to project or adopt new tools.

This makes the software systems not only large and complex but also highly evolving. As a result, setting measurable targets and ensuring the quality of such complex and highly evolving software systems is very challenging.

Software engineers' ultimate aim is to release bug-free, high-quality software to end users. Unfortunately, defects are inevitable in large-scale software systems; thus, mechanisms are required for locating and correcting the faults. Although software testing is one of the popular techniques commonly adopted to reduce software defects and thereby ensure the quality of the system, it requires the execution of exhaustive test cases, which becomes quite tedious and expensive.



Moreover, testing is limited to the case before software release due to limited resources, usage scenarios, execution environments, and time compared to actual operational situations. No structural testing strategy can guarantee enough software testing to completely eliminate bugs, and experimental studies on test effectiveness ratios show that it is typically no better than 40% for statement coverage [3]. This makes effective prediction of defect-prone components of software an essential part of modern software engineering. However, locating and correcting a software defect following deployment is 100 times more expensive than repairing it during the development process [4]. Moreover, delayed prediction of the fault allows it to propagate to subsequent development stages and complicates the whole prediction process. Similarly, prediction in early stages is challenging due to the unavailability of the software's own failure data at those stages [5].

Generally, internal and external metrics are used as software quality measures to designate the level of software reliability. Internal metrics are measured by considering only the source code of the software, whereas external metrics are measured by considering only the behaviour or functionalities of the software [2]. Software engineers need the key internal metrics causing the defects at an early development stage in order to predict the key metrics of entities at later life-cycle stages for monitoring and controlling the products. They also need to track changes across different entities like processes, products, and resources as different versions are released. For example, changes may happen in the design, code, regression tests, test requirements, processes, and resources from one release to the next. Changes in the key attributes of different entities can potentially indicate problems in design, or a fall in quality and reliability. This, in turn, can help to identify the modules at the development stage that are most likely to be error-prone or difficult to maintain and test later on. This requires the measurement of a large number of internal metrics of different entities. However, it is very hard and expensive for developers to monitor, measure, analyze, and track all metrics of the different entities of highly evolving software systems at every release to ensure quality. The current literature on defect prediction shows limited capability in achieving a less expensive measurement process [6], [7].

Thus, automated prediction of defect-prone components of software is an important and fundamental activity for highly dynamic software systems in the course of the software development process. Many approaches that use standard machine learning (ML) techniques such as naïve bayes (NB) [8], support vector machines (SVM) [9], decision trees [10], and neural networks [11] have been proposed for software defect prediction; they use measured internal metrics and defect data from similar projects [12]–[14] or earlier releases to construct defect prediction models.

Both filter [15] and wrapper [16] approaches are used in metric selection. Filter models use heuristics on the dataset to reduce the overall search space. Filter approaches do not consider performance evaluation during their metric ranking process, as that is usually done later. Therefore, the performance of the selected subset in fault prediction cannot be guaranteed in all cases. Performance can also be affected by evaluating features independently, without considering their interaction. On the other hand, wrapper approaches use an induction algorithm to evaluate the performance of a selected feature subset. Typically, for an m-dimensional metric dataset, a wrapper approach needs to search the 2^m subset space to find the most significant subset of metrics. It is therefore computationally expensive [17] due to the exponential search space. While wrapper approaches consider feature dependencies, they carry a risk of over-fitting [17], [18].

Although software defect prediction models have been of considerable interest to researchers and practitioners over the past two decades, to the best of our knowledge there is limited research available in the current literature on determining the key metrics for efficient automated monitoring and controlling to achieve a high quality product at every release. As the selection of high quality metrics is imperative for developing a highly accurate software defect prediction mechanism, a reliable, low-overhead, and best-performing defect prediction model is an urgent requirement [19].

In this article, we propose a hybrid wrapper-filter approach for software fault prediction and metric selection. The novelty of the proposed approach is that we introduce different wrapper heuristics and hybridize them with filter approaches. Two wrapper-heuristic-based hybrid models are proposed, one based on SVM hybrid heuristics and one on artificial neural network (ANN) hybrid heuristics, and the hybrids are applied to find the significant metrics from the software defect data. This avoids the impact of any bias by a single wrapper in finding the best subset of metrics. In summary, we make the following contributions:
• We formulate the defect prediction and key metric selection problem as a single optimization problem.

• We propose a novel framework for identifying significant metrics using a combination of different wrapper and filter techniques to minimize the search space. The novelty of our approach is that it embeds the metric selection and training processes of software defect prediction into a single process while reducing the measurement overhead significantly.

• We propose hybrid models that couple SVM and ANN wrappers with a maximum relevance (MR) filter for high prediction accuracy.

• We use two different wrappers and their corresponding wrapper scores with a filter score to justify the performance of the hybrid algorithms.

The rest of the article is organized as follows. The next section presents background terminology on software metrics and related work on fault prediction. Section III provides a mathematical formulation of the metric selection problem and discusses the proposed framework for metric selection and fault prediction and the related algorithms. Experimental analyses using real software fault data and the results of using the proposed framework are presented in Section IV. The conclusion and future work are discussed in the last section.



II. BACKGROUND AND RELATED WORK
In this section, we present some important software metrics that are used for the efficient design of a defect predictor, and some fault prediction research closely related to ours.

A. SOFTWARE METRICS
Measurements are needed in software products for the purpose of quality assurance, performance, debugging, management, and estimating costs. They also play a vital role in discovering defects in software components. Measurements are usually carried out on software metrics, and different software metrics are used for defect prediction. In object oriented (OO) software, design coupling based metrics such as coupling between objects (CBO), response for a class (RFC), message passing coupling (MPC), and information-flow-based coupling (ICP) play important roles in defect prediction [14]. A systematic review targeting software defect prediction conducted in [13] showed that the most correlated OO metrics are CBO, RFC, and LOC. Furthermore, Chidamber and Kemerer's (CK) and traditional source code metrics were proved to be useful for software defect prediction in [20].

Descriptions of these metrics are available in the literature. However, to make our article self-contained, we summarize some of the important software metrics below.

1) LINES OF CODE (LOC) METRIC
This is a very basic software metric that counts the number of executable source instructions, excluding comments, blank lines, and any non-executable lines. Many studies have shown a rough correlation between the LOC metric and the number of defects in software [21].

2) THE MCCABE CYCLOMATIC COMPLEXITY (MCC) METRIC
This metric measures the amount of independent decision logic through a program's source code, which represents how complex the source code is to test. It may also be applied to individual functions, modules, methods, or classes within a program. It has proved to be an important metric for predicting software defects [22].

A control flow graph G can be used to measure the MCC metric of a software product, where
• each node corresponds to a block of sequential code, and
• each edge corresponds to a path created by a decision.

It can be calculated by the following equation:

v(G) = e − n + 2p, (1)

where
• e = the number of edges in the graph,
• n = the number of nodes in the graph, and
• p = the number of connected module components in the graph.

For example, the MCC calculated from the control flow graph of Fig. 1 is v(G) = 5 (where e = 8, n = 7, and p = 2). That means five independent decision paths must be tested to cover all of the functionality.
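To make the calculation concrete, here is a minimal Python sketch of equation (1); the function name and the edge-list representation are our own illustration (the edge list below is an arbitrary two-component graph with e = 8 and n = 7, not necessarily the graph of Fig. 1):

def cyclomatic_complexity(edges, num_nodes, num_components=1):
    """McCabe cyclomatic complexity v(G) = e - n + 2p (equation (1))."""
    e = len(edges)        # number of edges in the control flow graph
    n = num_nodes         # number of nodes
    p = num_components    # number of connected module components
    return e - n + 2 * p

# With e = 8, n = 7, and p = 2, as in the Fig. 1 example, v(G) = 5.
edges = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (5, 2), (6, 7), (7, 6)]
assert cyclomatic_complexity(edges, num_nodes=7, num_components=2) == 5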

Fig. 1. A control flow graph.

3) THE MCCABE ESSENTIAL COMPLEXITY (MEC) METRIC
This metric measures the amount of poorly-structured logic through a program's source code. It iteratively reduces the structured control sequences (e.g., if-then-else, for loops, and while loops) into a single entry point and a single exit point to provide a measure of how well a software program is structured. While a program with a high MCC is difficult to test, a low MEC of the same program allows a programmer to break up the program into smaller functions so that the individual functions can be tested easily. Programs or functions with a high MEC are hard to maintain.

The control flow graph G used for measuring the MCC metric is also used for measuring the MEC metric ev(G). Nodes 2, 3, 4, 5, and 6 of the control flow graph shown in Fig. 1 can be iteratively reduced to a single node, so ev(G) = 1. That means managing the program source code corresponding to this flow graph is easy.

4) THE MCCABE MODULE DESIGN COMPLEXITY (MMDC) METRIC
This metric measures the amount of decision logic involved with subroutine calls by a module. It quantifies how much security testing is required if the module is integrated with the rest of the system. A study in [23] suggests that the MMDC metric could be used for software defect prediction.

It is calculated as an MCC metric on a reduced control flow graph G, obtained by removing the nodes representing decision logic that does not impact the calling control of the module over its subroutines. For example, Fig. 2 shows a control flow graph G for the corresponding program code. Nodes 8 to 12 of G can be reduced to a single node because they have no impact on the calling control of the module over its subroutines subA() and subB(), and thus the MMDC of G is iv(G) = 3.


Fig. 2. Program source code with control flow graph.

5) HALSTEAD METRICS
Halstead considers some primitive parameters of the implemented software product or program to measure its length, effort, development time, and other factors. The primitive parameters are as follows [24].
• n1 = the number of distinct operators,
• n2 = the number of distinct operands,
• N1 = the total number of operators, and
• N2 = the total number of operands.

Based on the above parameters, several Halstead measurements can be calculated as follows [24].
• Program vocabulary: n = n1 + n2,
• Program length: N = N1 + N2,
• Estimated program length: N̂ = n1 log2 n1 + n2 log2 n2,
• Program volume: V = N × log2 n,
• Program difficulty: D = (n1/2) × (N2/n2),
• Programming effort: E = D × V, and
• Estimated time: T = E/S seconds, where S is the speed of mental discriminations, which is recommended as 18.
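The following Python sketch (our own illustration, not from the paper) collects these Halstead measurements into one function, taking the four primitive counts as inputs:

import math

def halstead_measures(n1, n2, N1, N2, S=18.0):
    """Halstead measurements from the four primitive counts [24]."""
    n = n1 + n2                                       # program vocabulary
    N = N1 + N2                                       # program length
    N_hat = n1 * math.log2(n1) + n2 * math.log2(n2)   # estimated program length
    V = N * math.log2(n)                              # program volume
    D = (n1 / 2.0) * (N2 / n2)                        # program difficulty
    E = D * V                                         # programming effort
    T = E / S                                         # estimated time in seconds
    return {"vocabulary": n, "length": N, "estimated_length": N_hat,
            "volume": V, "difficulty": D, "effort": E, "time_s": T}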

6) OBJECT ORIENTED METRICS
Object oriented (OO) metrics are among the most widely used metrics for predicting defects in software products [13]. Some of the commonly used OO metrics are as follows.
• Coupling between objects (CBO) [25]: When a class uses methods or instance variables defined by another class, these two classes are said to be coupled. CBO is defined as the number of classes to which a class is coupled.
• Response for a class (RFC) [25]: An object of a class invokes a set of methods of the class, and each of those methods may directly call several remote methods. The RFC of a class is defined as the sum of the number of methods in the class and the number of external methods directly called by those methods.
• Message passing coupling (MPC) [26]: MPC is defined as the total number of call statements identified in a class, or the number of messages passed between objects within the local methods of a class.
• Data abstraction coupling (DAC) [26]: DAC is defined as the total number of other class types defined as attributes in a class. DAC excludes primitive types, types inherited from the base class, and system types.

B. RELATED WORK
A systematic review of software fault prediction (SFP) using ML techniques, covering 64 primary studies in seven categories from January 1991 to October 2013, was presented in [13]. The author showed that ML models are superior to traditional statistical models for classifying software modules as either defective or non-defective. The study identified that (a) the most widely used ML techniques for the prediction of software faults are C4.5, naïve bayes (NB), multilayer perceptron (MLP), support vector machines (SVM), and random forest (RF); (b) the most commonly used metric selection technique is the filter approach of correlation-based feature selection; (c) the most frequently used metrics are procedural metrics; (d) the most useful OO metrics are CBO, RFC, and LOC; (e) the most frequently used dataset is the NASA dataset; and (f) the most widely used performance measures are accuracy, precision, recall, area under a receiver operating characteristic curve (AUC), and F-measure.

Another systematic literature review was carried out in [20] on 106 primary studies to determine the most widely used metrics for software defect prediction. The study observed that 49% of the metrics used are OO (Chidamber and Kemerer's (CK) metrics are among the most frequent), 27% are traditional source code metrics, and 24% are process metrics.

A fuzzy logic based defect prediction model, which uses only the most important reliability metrics to predict faults at each phase of the SDLC, was presented in [5]. The metric selection process was carried out through expert opinion, so the evaluation may be biased by this non-automated process. As training a classifier with datasets from the same project performs better than its counterpart [27], involving an expert may lead to the selection of key metrics from different projects. Abaei et al. [28], on the other hand, proposed a semi-supervised hybrid self-organizing map (HySOM) model, a combination of a self-organizing map (SOM) and an ANN, which eliminates the strong requirement for software quality experts and historical data. In our case, expert knowledge is not required to select the key metrics. Rather, the model is able to select them as part of labeling modules as either defective or non-defective.

Hassan [29] used changes in software components to predict defects in those components. A successful attempt at


applying the adaptive neuro-fuzzy inference system (ANFIS) model, which uses expert knowledge in terms of the number of fuzzy sets and the types of membership functions during training and learning to predict software faults, was presented in [30].

An ML-based software fault prediction model and an effort estimation model were proposed in [31]. A rule extraction algorithm was used to develop decision trees to understand how the automated predictions were made. An association-rule mining based approach was proposed in [32], which utilized software defect patterns to find the action patterns responsible for the defects.

A study of the influence of dataset size and metric set on the performance of software fault prediction showed that, for method-level metrics (irrespective of the presence of the derived Halstead metrics), the random forest (RF) and naïve bayes (NB) techniques perform best as software fault predictors for large and small datasets, respectively [6]. On the other hand, fault prediction using class-level metrics, whether correlation-based metric selection is applied or not, shows that NB is the best fault predictor. While these techniques were evaluated based on metric type and dataset size, they could not draw any conclusion about the minimum-sized set of significant metrics for defect prediction, which our model addresses.

A comparative study between software design and code metrics evaluated their effectiveness in predicting faults in software modules and showed that using only the design metrics in the very early stages of the software development cycle can be a good policy for predicting faults as early as possible [7]. One of the main drawbacks of this work is that the evaluation was carried out on only a single dataset, which may bias the final defect prediction model. In our case, we used different types of datasets for our evaluation. Moreover, we not only predicted defects in software modules but also selected the key metrics.

Feature selection and software defect classification techniques were proposed in [12]. It was shown that the performance of the classifier can be further improved if only the important features, rather than all the features, are used. A comparison of search-space optimizations showed that greedy forward elimination (GFE) is more efficient than correlation-based forward elimination. However, that search-space optimization approach is based only on filter heuristic techniques, whereas our approach uses filter, wrapper, and hybrid techniques along with sequential elimination.

A general framework for software defect prediction was proposed and evaluated in [33]. The key objective of that framework was to solve the potential bias in the baseline experiments of [19]. The main components of the framework were evaluation and prediction. In the evaluation stage, different learning techniques were analyzed to choose the best subset of metrics from the historical data. In the prediction stage, this subset of metrics was used to train the predictor using historical data. That predictor was then used to predict the faults of software with new data. Although this framework was stable, provided more accurate results, and solved the baseline bias of the previous approaches, one of its disadvantages was that metric selection and defect prediction were performed in two separate steps rather than in a single step. Reducing this two-step process into a single embedded process may reduce the computational complexity. This is one of the main focuses of our article.

Earlier work of Huda et al. [34] on multivariate process monitoring and prediction of the sources of out-of-control signals in manufacturing systems, based on a hybrid of the wrapper and filter approaches, proved to be efficient in determining the most significant attributes. That approach [34] injected the useful information computed by the filter heuristic into the wrapper search process to reduce the search space while maximizing the performance of the classifiers. However, the approach proposed in [34] used a single wrapper heuristic, which may affect the performance of the hybrid due to the bias of that particular wrapper. In this article, we extend this earlier work [34] and propose a multi-wrapper-filter hybrid framework to detect the source(s) of software faults.

III. METHODOLOGY
A. PROBLEM FORMULATION
Let us assume that we are given a set of N training software modules X, {X : x_j | j = 1, 2, 3, ..., N}, which have the corresponding set C of fault/no-fault classes {C : c_l | l = 1, 2, 3, ..., N}. It is also assumed that we have a set of n metrics {Q : q_i | i = 1, 2, 3, ..., n} on the basis of which the faults are identified in X. The problem of determining the set of significant software metrics can be defined mathematically as

G(Q, X) = argmax_{S_k ⊂ Q} Perf(S_k | X). (2)

Here Perf(S_k | X) is a performance function that determines the capability of a set of metrics S_k to identify the software faults, which can be defined as

Perf(S_k | X) = (1/N) Σ_{i=1}^{N} y_i, (3)

where y_i = 1 if the fault type predicted by the decision function D of the wrapper is equal to the actual type c_i, and y_i = 0 otherwise. One of the main difficulties is to select a subset of metrics S_k that maximizes Perf(S_k | X). This generates a subsequent problem: selecting an appropriate wrapper that can determine a decision function D to provide useful information about the selected significant metrics. The generation of subsets S_k ⊂ Q for G(Q, X) in (2) is also a computationally expensive task. Given an n-dimensional metric set, a fault prediction subsystem needs to find the significant software metrics from 2^n subsets using an exhaustive search. Reducing the exponential search space to a lower computational complexity, to determine a minimum-sized subset of metrics while maximizing the performance of the defect detector, is still a challenging problem.
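As a minimal sketch of this objective, the following Python function (our own illustration; `wrapper` stands for any classifier with fit/predict methods, e.g., a scikit-learn estimator) evaluates Perf(S_k | X) for one candidate subset. For brevity it scores on the training data, whereas the proposed framework uses cross-validation:

import numpy as np

def perf(wrapper, X, C, subset):
    """Perf(S_k | X) from equation (3): the fraction of modules whose
    fault/no-fault class is predicted correctly using only the metric
    columns listed in `subset`."""
    Xs = X[:, subset]
    y_hat = wrapper.fit(Xs, C).predict(Xs)
    return float(np.mean(y_hat == C))   # (1/N) * sum_i y_i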


B. HYBRID FRAMEWORK
Sequential backward elimination (BE) or forward elimination (FE) of attributes from the complete set of attributes of a dataset is a general approach that is widely used in the ML literature. The exponential search space generated by exhaustive methods can easily be reduced using BE or FE. However, complexity arises when a minimum-sized subset with the highest performance is required. For this purpose, heuristics are used in BE or FE while reducing the search space. The two main heuristics are 1) filter and 2) wrapper. Filter heuristics can extract intrinsic characteristics from data and are easily scalable to a high-dimensional database of software modules. However, these approaches do not use any performance evaluation criteria, so a subset selected by a filter does not guarantee maximum performance. In contrast, despite the advantage of embedded performance evaluation, wrapper approaches face the risk of over-fitting and local optimization of decision functions [17], [18]. Instead of improving the limitations of the individual approaches, we propose a hybrid model that blends the two by integrating the filter's score into the wrapper selection process. Thus the proposed hybrid model has the advantages of both approaches. The hybrid model may still be influenced by the drawbacks of an individual filter or wrapper. Therefore, in this paper, our proposed model uses different wrapper techniques to find a suitable wrapper that maximizes performance while minimizing the dimension of the final subset. Two different wrapper frameworks are proposed:

1) Artificial Neural Network based Input Gain Measurement Approximation (ANNIGMA) [35]
2) Support Vector Machine (SVM) based wrapper scoring

In the proposed hybrid models, a BE process is engaged in which significant metrics are selected using a hybrid wrapper-filter score. To guarantee the maximum performance of a selected subset, a wrapper evaluation is done for each subset S_k. The benefit of the hybrid models is that the computation of the heuristic score for each metric and the subset evaluation are blended into a single software component. Moreover, applying the BE strategy with a wrapper-filter score reduces the search space complexity from the exponential 2^n to a polynomial space in n while maintaining maximum performance for the selected subset. A detailed algorithm is given in Algorithm 1. Algorithm 1 combines two wrappers in one process. At step 2, it takes an input that selects the wrapper: when W = true, ANN is selected, and when W = false, SVM is selected. In Algorithm 1, steps 7 to 33 run the main BE process. The filter score is computed at step 9. Cross-validation is executed between steps 10 and 20.

1) HYBRID MODEL-1: ANNIGMA WRAPPER AND FILTER
The decision function for a two-layer ANN is

D_s = Σ_r F(Σ_i q_i × W_ir) × W_rs, (4)

Fig. 3. A single-hidden-layer multilayer perceptron (MLP) neural network.

where i, r, and s denote the input, hidden, and output layers, F(q) = 1/(1 + exp(−q)) is the logistic activation function of the first and second layers, and W_ir and W_rs are the network weights of the two-layer neural network shown in Fig. 3. The ANNIGMA [35] score can then be defined as

ANNIGMA(q_is) = LG_is / max_n LG_ns, (5)

where LG_is = Σ_r |W_ir × W_rs| is the local gain for the ith input to the sth output. A cross-validation approach is employed while training the wrapper ANN.

In each fold, the ANNIGMA score is computed for every metric. Then, after training all folds, the ANNIGMA score for the ith metric q_i and sth output node is averaged as

ANNIGMA(q_is)_average = (1/n) Σ_{fold=1}^{n} ANNIGMA(q_is)_fold. (6)

The ANNIGMA score for the ith metric is

ANNIGMA(q_i) = (1/K) Σ_{s=1}^{K} ANNIGMA(q_is)_average, (7)

where K is the total number of output nodes of the network. The score for each fold in cross-validation for ANNIGMA is computed using steps 11 to 14, and the average is computed using steps 21 to 25. In the hybrid, maximum relevance (MR) is considered as the filter. The MR score of a metric is computed from its information gain, normalized by the maximum score over the subset of metrics:

Relevance(q_i) = I(q_i; c_l) / max_{q_i∈S} I(q_i; c_l), (8)

where I(q_i; c_l) is the mutual information between q_i and the class label set C, defined as

I(q_i; c_l) = H(q_i) − H(q_i | c_l), (9)

where H(q_i) is the entropy of q_i with probability density function p. Here q_i takes discrete values from the set V = {v_1, v_2, v_3, ..., v_l}. The MR score is computed at step 9 of Algorithm 1.
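A minimal sketch of the MR score of equation (8), assuming scikit-learn's mutual_info_classif as a stand-in estimator for the mutual information in equation (9):

import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mr_scores(X, y):
    """Normalized maximum-relevance score of each metric (equation (8)).
    mutual_info_classif estimates I(q_i; c_l) = H(q_i) - H(q_i | c_l)."""
    mi = mutual_info_classif(X, y)
    return mi / (mi.max() + 1e-12)   # guard against an all-zero column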


Algorithm 1 Hybrid Approach for Metric Selection and Fault Identification

1: input {X : x_j | j = 1, 2, 3, ..., N}, data from software modules with n metrics
2: input W ← true; W = true selects the ANN wrapper, W = false selects SVM
3: output S_BEST, a significant subset of metrics
4: Let S ← the whole set of n metrics
5: Let S0 ← ∅, the collection of evaluated subsets
6: Let A ← ∅, the average accuracy recorded for each evaluated subset
7: for i = 1 to n − 1 do
8:   Current set of metrics S_current ← S
9:   Compute the MR score of each metric using equation (8)
10:  for fold = 1 to n do
11:    if (W == true) then
12:      Train the wrapper ANN with software metric set S_current
13:      Compute ANNIGMA scores of all metrics in S_current using equation (5)
14:      Compute the fold accuracy
15:    else
16:      Train the wrapper SVM with software metric set S_current
17:      Compute SVM scores of all metrics in S_current using equation (13)
18:      Compute the fold accuracy
19:    end if
20:  end for
21:  Compute the average accuracy of all folds for S_current and record it in A
22:  if (W == true) then
23:    Compute the average ANNIGMA score of S_current for the sth output node by (6)
24:    Compute the average ANNIGMA score of each metric in S_current by (7)
25:  else
26:    Compute the average SVM score of each metric in S_current by (14)
27:  end if
28:  Compute the combined score of every metric in S_current by (10) or (15)
29:  Rank the metrics in S_current by combined score in descending order
30:  S0 ← S0 ∪ {S_current}
31:  Update S_current by removing the metric with the lowest combined score:
32:  S_current ← S_current − (metric with lowest score); S ← S_current
33: end for
34: S_BEST ← the subset in S0 with the highest average accuracy
35: return S_BEST
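For concreteness, the following Python sketch (our own illustration, not the authors' implementation) captures the skeleton of Algorithm 1, with the wrapper-specific scoring abstracted into a combined_score callback; the comments map lines to the algorithm's steps:

import numpy as np
from sklearn.model_selection import cross_val_score

def backward_elimination(X, y, wrapper, combined_score, cv=10):
    """One BE pass of Algorithm 1. `wrapper` is an sklearn-style classifier
    (ANN or SVM); `combined_score(Xs, y)` returns one hybrid score per
    remaining metric, as in equation (10) or (15)."""
    remaining = list(range(X.shape[1]))
    evaluated = []                                            # (subset, accuracy) pairs
    while len(remaining) >= 1:
        Xs = X[:, remaining]
        acc = cross_val_score(wrapper, Xs, y, cv=cv).mean()   # steps 10-21
        evaluated.append((list(remaining), acc))              # step 30
        if len(remaining) == 1:                               # BE runs down to one metric
            break
        scores = combined_score(Xs, y)                        # steps 22-28
        remaining.pop(int(np.argmin(scores)))                 # steps 29-32
    return max(evaluated, key=lambda t: t[1])                 # steps 34-35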

The combined score of a metric in MR-ANNIGMA is computed as follows (step 28 in Algorithm 1):

Combined Score: MR-ANNIGMA(q_i) = I(q_i; c_l) / max_{q_i∈S_k} I(q_i; c_l) + ANNIGMA(q_i). (10)
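A minimal sketch of the ANNIGMA computation, assuming scikit-learn's MLPClassifier as the wrapper ANN; for brevity it scores a single trained network rather than averaging over folds as in equation (6):

import numpy as np
from sklearn.neural_network import MLPClassifier

def annigma_scores(mlp):
    """ANNIGMA scores from a trained single-hidden-layer MLP.
    mlp.coefs_ holds W_ir (input-to-hidden) and W_rs (hidden-to-output)."""
    W_ir, W_rs = mlp.coefs_
    LG = np.abs(W_ir) @ np.abs(W_rs)   # LG_is = sum_r |W_ir x W_rs|
    A = LG / LG.max(axis=0)            # normalize per output node (equation (5))
    return A.mean(axis=1)              # average over output nodes (equation (7))

# Combined MR-ANNIGMA score (equation (10)), reusing mr_scores from the MR sketch:
# mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000).fit(Xs, y)
# combined = mr_scores(Xs, y) + annigma_scores(mlp)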

2) HYBRID MODEL-2: SVM WRAPPER AND FILTER
The decision function for the SVM is defined by

D(x_j) = sign(⟨w · x_j⟩ + b), (11)

where ⟨w · x_j⟩ + b is the hyperplane that separates the faulty and non-faulty types, w is the weight vector, and b is the displacement of the hyperplane, which is a constant. The weight vector w is computed through maximization of the distance between the supporting hyperplanes and is defined as

w = Σ_{j=1}^{N} c_j γ_j x_j. (12)

Here γ_j is the Lagrange multiplier. The importance of a metric can be determined by taking the partial derivatives of the geometric distance between the two supporting planes, max_{w,b} 2/||w||^2, with respect to x_ij:

Σ_j |∂||w||^2 / ∂x_ij| = τ w_i, (13)

where τ is a constant. The above derivation shows that the ith metric with a higher w_i has more influence in determining the width of the margin of the resulting hyperplane. It follows that a metric that is significant for better discrimination will have a higher weight w_i. The score for each fold in cross-validation for SVM is computed using steps 16 to 18 in Algorithm 1. In the second proposed approach, SVM is hybridized with the MR filter score. While computing the combined score, the SVM wrapper is trained using n-fold cross-validation, and the SVM wrapper scores of all metrics are averaged over all folds as defined below, which is accomplished at step 26 of Algorithm 1.

SVMS(q_i)_average = (1/n) (SVMS(q_i)_1 + SVMS(q_i)_2 + ... + SVMS(q_i)_n). (14)

The combined score of a metric in MR-SVM is computed as follows (step 28 in Algorithm 1):

Combined Score: MR-SVM(q_i) = I(q_i; c_l) / max_{q_i∈S_k} I(q_i; c_l) + SVMS(q_i)_average / max_{q_i∈S_k} SVMS(q_i)_average. (15)
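A minimal sketch of the fold-averaged SVM wrapper score, assuming a linear SVM (scikit-learn's LinearSVC) so that the weight vector w is directly available:

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import StratifiedKFold

def svm_scores_cv(X, y, n_folds=10):
    """SVM wrapper score |w_i| averaged over CV folds (equation (14)),
    normalized by its maximum as in equation (15)."""
    scores = np.zeros(X.shape[1])
    for train_idx, _ in StratifiedKFold(n_splits=n_folds).split(X, y):
        clf = LinearSVC(dual=False).fit(X[train_idx], y[train_idx])
        scores += np.abs(clf.coef_).ravel()   # binary case: coef_ has shape (1, n)
    scores /= n_folds
    return scores / (scores.max() + 1e-12)

# Combined MR-SVM score (equation (15)), reusing mr_scores from the MR sketch:
# combined = mr_scores(Xs, y) + svm_scores_cv(Xs, y)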

The hybrids (MR-SVM and MR-ANNIGMA) in the proposed framework use different wrapper models, where cross-validation training is used to avoid bias in the wrapper score computation due to the training data. In Algorithm 1, step 21 computes the average accuracy of each subset for each wrapper, which is used as the evaluation criterion for the metric subset. New subset generation and accuracy tracking of the subsets are accomplished in steps 29 to 32. The BE process in Algorithm 1 runs until the size of the subset of metrics is reduced to one. The subset with the highest accuracy, or close to the highest accuracy with a smaller number of metrics, is selected as the most significant subset. This process is accomplished in steps 34 to 35 of Algorithm 1.


TABLE 1. Accuracies for SVM, MR, ANNIGMA, MR-ANNIGMA, and MR-SVM for software fault prediction dataset PC4. The bold values represent some of the significant accuracies achieved under each individual algorithm.


IV. EXPERIMENTAL RESULTS AND DISCUSSION
The evaluation of the algorithms was carried out on the eight NASA datasets CM1, JM1, AR1, KC1, KC2, PC1, PC3, and PC4. The authors are aware of the problems in the original NASA datasets; therefore, the datasets were taken in their latest version from the openscience repository [36]. Tables 1-3 show the accuracies of the hybrid models (MR-SVM) and (MR-ANNIGMA) at different iterations of the BE approach. These tables also include the accuracies of the BE iterations for the independent filter (MR) and the wrappers SVM and ANN. Each of the accuracies presented is an average of three trials. While computing average accuracies, trials with 10-fold cross-validation were considered. Table 1 shows the results for the PC4 dataset. It can be observed from Table 1 that as the BE iterations proceed and metrics are eliminated, SVM's accuracy drops from approximately 91% to 90% at 28 metrics. It then increased to 91.038% at 25 metrics. After that, the accuracy did not increase; instead, it dropped to 87% with a single metric. Therefore, the best subset of metrics for SVM was identified as 25. MR achieved a maximum accuracy of 91.221% for 33 metrics, and almost the same maximum accuracy (91.084%) was achieved with a metric set of 19. The hybrid approach (MR-SVM) achieved almost 91% accuracy at different subsets of metrics, including 34, 20, 17, and 11. After that, the accuracy did not increase; instead, it dropped to 87.6% with a single metric. Therefore, the best subset from MR-SVM is considered to be 11 metrics, with an accuracy of 91.038%. The results of the hybrid model based on ANNIGMA are also presented in Table 1. ANNIGMA achieved 90.855% with 31 metrics, and MR-ANNIGMA achieved 91.056% with 17 metrics. It can also be seen that the hybrid model MR-ANNIGMA performed better than MR and ANNIGMA.

A demonstration of the subset generation process, including the computed score of each individual metric in each BE iteration of the hybrid models, is shown in Fig. 4. The hybrid model starts with 37 metrics (Fig. 4a), where the 15th metric achieves the lowest score. Therefore, the hybrid eliminates the 15th metric in its next BE iteration and recalculates the scores. The subset generation process is then shown for subsets of 13 down to 8 metrics in Fig. 4b-Fig. 4g.


Fig. 4. Hybrid scores in BE iterations for SVM, MR, and MR-SVM with different subsets for the PC4 dataset. (a)-(g) Hybrid scores in BE iterations for MR-SVM with total subset sizes of 13, 13, 12, 11, 10, 9, and 8, respectively.


Similarly, Table 2 shows the final accuracies of the hybrid models for the PC3 dataset. Both MR and ANNIGMA achieved almost the same level of accuracy with subset size 1. The subset generation process of the hybrid technique (MR-SVM) is shown in Fig. 5, and the corresponding accuracies of BE are presented in Table 2. It is observed from Table 2 that for a subset of 10 metrics, the hybrid models outperform the other two approaches, achieving an accuracy of 88.053% compared to maximum accuracies of 87.558% and 87.651% for SVM and MR with subset sizes of 2 and 36, respectively. An evaluation of the hybrid models for the KC2 dataset is presented in Table 3. It is observed that the minimum subset size for which MR-SVM achieved an accuracy of 83.269% is 3. Similar accuracies were achieved by SVM (83.014%) and MR (83.142%) for subset sizes 5 and 4, respectively. The results also show that MR-ANNIGMA achieved 84.713% with 18 metrics, whereas ANNIGMA achieved 84.674% with 3 metrics. For the KC2 dataset, MR-ANNIGMA performed better than ANNIGMA, MR-SVM, and MR.


TABLE 2. Accuracies for SVM, MR, ANNIGMA, MR-ANNIGMA, and MR-SVM for software fault prediction dataset PC3. The bold values represent the highest accuracy achieved under each individual algorithm.

TABLE 3. Accuracies for SVM, MR, ANNIGMA, MR-ANNIGMA, and MR-SVM for software fault prediction dataset KC2. The bold values represent some of the significant accuracies achieved under each individual algorithm.

Therefore, the hybrid technique outperforms both the wrapper and filter approaches. The experimental results show that different hybrids may perform differently depending on the wrapper. However, it is clearly observed from the experimental results that the hybrids MR-SVM and MR-ANNIGMA perform better than the independent filter and wrapper approaches, which demonstrates the effectiveness of the hybrid framework in software fault prediction and metric selection.


Fig. 5. Hybrid scores in BE iterations for SVM, MR, and MR-SVM with different subsets for the PC3 dataset. (a)-(f) Hybrid scores in BE iterations for MR-SVM with total subset sizes of 37, 12, 11, 10, 9, and 8, respectively.


An analysis with a variety of different datasets is required to generalize the conclusion. We therefore conducted experiments with the CM1, JM1, AR1, KC1, and PC1 datasets in addition to the KC2, PC3, and PC4 datasets. We also analyzed the standard deviation while computing the accuracies and AUC for these datasets. Table 4 compares the average accuracy and standard deviation of the different approaches on the different datasets. It is observed from the results that all the approaches achieved a higher accuracy (more than 93%) on the PC1 dataset compared to the other datasets, and MR-SVM achieved the highest accuracy of 93.99% among the approaches for this dataset. The standard deviation values are between 0.002 and 0.004 for these results. Table 5 presents the AUC values achieved by the different approaches on the different datasets. It is observed that SVM and MR achieved maximum AUCs of 0.674 and 0.687, respectively, for the KC1 dataset. For ANNIGMA, the maximum AUC of 0.688 was achieved for the KC2 dataset. However, the hybrid approaches MR-ANNIGMA and MR-SVM achieved the highest AUCs of 0.678 and 0.701 for the PC1 and JM1 datasets, respectively, with standard deviation values between 0.006 and 0.014. From this analysis we may conclude the effectiveness of the hybrid framework in software fault prediction and metric selection.


TABLE 4. Best accuracy values achieved by different approaches on different datasets. The average is calculated over 3 trials and "Std." is the standard deviation.

TABLE 5. Best AUC values achieved by different approaches on different datasets. The average is calculated over 3 trials and "Std." is the standard deviation.

TABLE 6. Most significant metrics achieved by different algorithms and proposed approaches.


The identity of the final metrics and the metric subset sizes are very important when comparing the performance of different techniques. It is observed from Table 1 that MR-SVM, MR, and SVM achieved final subset sizes of 11, 19, and 25, respectively, for the PC4 dataset.


TABLE 7. List of significant metrics with their serial number used in the experiments.

TABLE 8. MEWMA results for software fault prediction dataset PC3.

For the PC3 dataset, these sizes are 10, 36, and 2, respectively (refer to Table 2). The significant metrics that form the final smallest subsets for the PC4 and PC3 datasets under the different techniques are presented in Table 6. A list of the software metrics used in the evaluation process for the PC4, PC3, and KC2 datasets is shown in Table 7. The metrics in Table 6 are represented by their serial numbers, which can be matched against Table 7 to obtain the metric names.

A multivariate exponentially weighted moving average (MEWMA) [37] is applied, which simultaneously monitors and measures the performance of two or more related quality characteristics for any out-of-control (OC) situation. The performance of the proposed framework using MEWMA is described as follows. The MEWMA statistical control chart is used to justify process performance for the selected metrics. The results in Table 8 indicate that for PC3, the proposed framework achieved 95.9% in-control (IC) vs. 4.1% out-of-control (OC) situations for MR-SVM, and 93.6% vs. 6.4% in-control and out-of-control situations, respectively, for MR-ANNIGMA, which is higher than the filter approach. Table 9 shows the MEWMA performance justification process for MR-SVM.


TABLE 9. MEWMA results for software fault prediction dataset PC4.

MR-SVM achieved 94.0% vs. 6.0% for in-control and out-of-control situations, respectively, for PC4. The results show that the hybrid model MR-SVM performed better than MR-ANNIGMA, which shows that our proposed framework is also capable of finding a suitable wrapper while it finds the most significant and compact subset.
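The following Python sketch shows the standard MEWMA recursion that underlies this kind of control chart; it is our own illustration, and the smoothing parameter lam and control limit h are placeholder values rather than the settings used in the paper:

import numpy as np

def mewma_signals(X, lam=0.2, h=10.0):
    """Minimal MEWMA sketch: returns True for observations whose T^2
    statistic exceeds the control limit h. Each row of X holds the
    selected metrics for one observation."""
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    Z = np.zeros(X.shape[1])
    flags = []
    for i, x in enumerate(X, start=1):
        Z = lam * (x - mu) + (1 - lam) * Z                          # EWMA recursion
        Sz = (lam / (2 - lam)) * (1 - (1 - lam) ** (2 * i)) * Sigma  # covariance of Z
        T2 = float(Z @ np.linalg.solve(Sz, Z))                       # T^2 = Z' Sz^{-1} Z
        flags.append(T2 > h)                                         # out-of-control signal
    return np.array(flags)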

V. CONCLUSIONS
Achieving the planned goal while ensuring high-quality software is challenging. Traditional testing approaches are not sufficient for quality assurance due to the limited opportunities for user testing during software development. Automated techniques for software fault prediction at different stages of development are now ubiquitous in software quality monitoring. However, this requires measuring and tracking a large number of metrics, which is time consuming and expensive. Therefore, the determination of significant metrics with which software faults can be identified is a crucial problem.

In this article, we proposed a framework for finding significant metrics using a hybrid wrapper and filter framework. The novelty of the proposed framework is that it embeds the training for both metric selection and fault prediction in a single process, which reduces the complexity of metric selection and fault prediction. The proposed framework demonstrates that the hybrid heuristic can guide the metric selection process in a computationally efficient way by integrating the intrinsic characteristics from filters into the wrapper and taking advantage of both the filter and wrapper approaches. Two different hybrid models were proposed using ANNIGMA and SVM wrappers and their hybrids with maximum relevance filters. An extensive experimental study with real software metrics and fault datasets showed that the proposed framework performs significantly better than the conventional filter and wrapper approaches and finds a more compact set of significant metrics while achieving higher performance in fault prediction. The current study was limited to procedural metrics. Therefore, in future studies, we plan to extend this work to OO metrics and to combinations of more filters and wrappers. Applying parallel computing can also be investigated to reduce the computational complexity.

REFERENCES
[1] Gartner Says Worldwide IT Spending on Pace to Reach $3.8 Trillion in 2014. Accessed: Feb. 21, 2016. [Online]. Available: http://www.gartner.com/newsroom/id/2643919

[2] O. F. Arar and K. Ayan, "Software defect prediction using cost-sensitive neural network," Appl. Soft Comput., vol. 33, pp. 263–277, Aug. 2015.
[3] B. S. Ainapure, Software Testing & Quality Assurance, 1st ed. Chennai, India: Technical Publications, 2014.
[4] L. Pelayo and S. Dick, "Applying novel resampling strategies to software defect prediction," in Proc. Annu. Meeting North Amer. Fuzzy Inf. Process. Soc. (NAFIPS), Jun. 2007, pp. 69–72.
[5] H. B. Yadav and D. K. Yadav, "A fuzzy logic based approach for phase-wise software defects prediction using software metrics," Inf. Softw. Technol., vol. 63, pp. 44–57, Jul. 2015.
[6] C. Catal and B. Diri, "Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem," Inf. Sci., vol. 179, no. 8, pp. 1040–1058, 2009.
[7] M. Zhao, C. Wohlin, N. Ohlsson, and M. Xie, "A comparison between software design and code metrics for the prediction of software fault content," Inf. Softw. Technol., vol. 40, no. 14, pp. 801–809, 1998.
[8] T. Bayes, "An essay towards solving a problem in the doctrine of chances," Philos. Trans. Roy. Soc. London, no. 53, pp. 370–418, 1763. [Online]. Available: http://www.stat.ucla.edu/history/essay.pdf
[9] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. New York, NY, USA: Cambridge Univ. Press, 2000.
[10] J. R. Quinlan, "Induction of decision trees," Mach. Learn., vol. 1, no. 1, pp. 81–106, Mar. 1986.
[11] B. Kröse and P. V. D. Smagt, An Introduction to Neural Networks. Amsterdam, The Netherlands: Univ. Amsterdam, 1993.
[12] I. H. Laradji, M. Alshayeb, and L. Ghouti, "Software defect prediction using ensemble learning on selected features," Inf. Softw. Technol., vol. 58, pp. 388–402, Feb. 2015.
[13] R. Malhotra, "A systematic review of machine learning techniques for software fault prediction," Appl. Soft Comput., vol. 27, pp. 504–518, Feb. 2015.
[14] A. A. Asad and I. Alsmadi, "Evaluating the impact of software metrics on defects prediction. Part 2," Comput. Sci. J. Moldova, vol. 22, no. 1, pp. 127–144, 2014.
[15] J. A. Rodger, "Toward reducing failure risk in an integrated vehicle health maintenance system: A fuzzy multi-sensor data fusion Kalman filter approach for IVHMS," Expert Syst. Appl., vol. 39, no. 10, pp. 9821–9836, Aug. 2012.
[16] F. Aparisi and J. Sanz, "Interpreting the out-of-control signals of multivariate control charts employing neural networks," Int. J. Comput., Elect., Autom., Control Inf. Eng., vol. 4, no. 1, pp. 24–28, 2010.
[17] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artif. Intell., vol. 97, nos. 1–2, pp. 273–324, 1997.
[18] K. S. Balagani and V. V. Phoha, "On the feature selection criterion based on an approximation of multidimensional mutual information," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 7, pp. 1342–1343, Jul. 2010.
[19] T. Menzies, J. Greenwald, and A. Frank, "Data mining static code attributes to learn defect predictors," IEEE Trans. Softw. Eng., vol. 33, no. 1, pp. 2–13, Jan. 2007.
[20] D. Radjenović, M. Heričko, R. Torkar, and A. Živkovič, "Software fault prediction metrics: A systematic literature review," Inf. Softw. Technol., vol. 55, no. 8, pp. 1397–1418, 2013.
[21] H. Zhang, "An investigation of the relationships between lines of code and defects," in Proc. IEEE Int. Conf. Softw. Maintenance (ICSM), Sep. 2009, pp. 274–283.
[22] L. Yu and A. Mishra, "Experience in predicting fault-prone software modules using complexity metrics," Quality Technol. Quantitative Manage., vol. 9, no. 4, pp. 421–433, May 2012. [Online]. Available: http://web.it.nctu.edu.tw/~qtqm/qtqmpapers/2012V9N4/2012V9N4_F7.pdf
[23] N. Ohlsson and H. Alberg, "Predicting fault-prone software modules in telephone switches," IEEE Trans. Softw. Eng., vol. 22, no. 12, pp. 886–894, Dec. 1996.


[24] M. H. Halstead, Elements of Software Science (Operating and Programming Systems Series). New York, NY, USA: Elsevier, 1977.

[25] S. R. Chidamber and C. F. Kemerer, "A metrics suite for object oriented design," IEEE Trans. Softw. Eng., vol. 20, no. 6, pp. 476–493, Jun. 1994.

[26] W. Li and S. Henry, "Object-oriented metrics that predict maintainability," J. Syst. Softw., vol. 23, no. 2, pp. 111–122, Nov. 1993.

[27] P. He, B. Li, X. Liu, J. Chen, and Y. Ma, "An empirical study on software defect prediction with a simplified metric set," Inf. Softw. Technol., vol. 59, pp. 170–190, Mar. 2015.

[28] G. Abaei, A. Selamat, and H. Fujita, "An empirical study based on semi-supervised hybrid self-organizing map for software fault prediction," Knowl.-Based Syst., vol. 74, pp. 28–39, Jan. 2015.

[29] A. E. Hassan, "Predicting faults using the complexity of code changes," in Proc. 31st Int. Conf. Softw. Eng., 2009, pp. 78–88.

[30] E. Erturk and E. A. Sezer, "A comparison of some soft computing methods for software fault prediction," Expert Syst. Appl., vol. 42, no. 4, pp. 1872–1879, 2015.

[31] J. Moeyersoms, E. J. de Fortuny, K. Dejaeger, B. Baesens, and D. Martens, "Comprehensible software fault and effort prediction: A data mining approach," J. Syst. Softw., vol. 100, pp. 80–90, Feb. 2015.

[32] C.-P. Chang, C.-P. Chu, and Y.-F. Yeh, "Integrating in-process software defect prediction with association mining to discover defect pattern," Inf. Softw. Technol., vol. 51, no. 2, pp. 375–384, 2009.

[33] Q. Song, Z. Jia, M. Shepperd, S. Ying, and J. Liu, "A general software defect-proneness prediction framework," IEEE Trans. Softw. Eng., vol. 37, no. 3, pp. 356–370, May/Jun. 2011.

[34] S. Huda, M. Abdollahian, M. Mammadov, J. Yearwood, S. Ahmed, and I. Sultan, "A hybrid wrapper–filter approach to detect the source(s) of out-of-control signals in multivariate manufacturing process," Eur. J. Oper. Res., vol. 237, no. 3, pp. 857–870, 2014.

[35] C.-N. Hsu, H.-J. Huang, and S. Dietrich, "The ANNIGMA-wrapper approach to fast feature selection for neural nets," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 32, no. 2, pp. 207–212, Apr. 2002.

[36] The Tera-Promise Defect Repository. Accessed: Feb. 21, 2016. [Online]. Available: http://openscience.us/repo/defect/

[37] H. Hashimoto et al., "Epileptic seizure monitoring by using multivariate statistical process control," in Proc. 12th IFAC Symp. Comput. Appl. Biotechnol., Int. Fed. Autom. Control, 2013, pp. 16–18.

SHAMSUL HUDA received the Ph.D. degree in computer science. He is a Lecturer with the School of Information Technology, Deakin University, Australia. He has published over 50 journal and conference papers in well-reputed venues, including the IEEE transactions. His main research areas are information security, cyber-physical systems, computational intelligence, and machine learning. He was a Research Fellow with Federation University and an Assistant Professor with the Computer Science Department, Khulna University of Engineering and Technology, Bangladesh.

SULTAN ALYAHYA received the B.Sc. degree (Hons.) in information systems from King Saud University and the M.Sc. degree in information systems engineering and the Ph.D. degree in computer science from Cardiff University, U.K., in 2007 and 2013, respectively. He is currently an Assistant Professor with the College of Computer and Information Sciences, King Saud University. His main research interests are in the fields of software project management, agile development, and computer-supported cooperative work.

MD MOHSIN ALI received the Ph.D. degree in computer science from the Australian National University, Australia. He is currently a Staff Scientist with the National Computational Infrastructure, Australian National University. His main research areas are fault tolerance in distributed parallel computing, complexity in the development of supercomputer applications, architectures, and high-performance computing. He has published over 20 research articles in these areas.

SHAFIQ AHMAD received the Ph.D. degree from RMIT University, Melbourne, Australia. He is currently an Assistant Professor with the College of Engineering, King Saud University. He has published a research book and a number of refereed research articles in international journals and conferences. He has more than two decades of working experience in industry and academia in Australia, Europe, and Asia. His research interests include smart manufacturing, performance analysis, and bibliometrics.

JEMAL ABAWAJY received the Ph.D. degree in computer science. He is a Full Professor with the School of Information Technology, Faculty of Science, Engineering and Built Environment, Deakin University, Australia. He is currently the Director of the Parallel and Distributed Computing Laboratory. He is an author/co-author of five books and over 250 papers in conferences, book chapters, and journals, such as the IEEE TRANSACTIONS ON COMPUTERS and the IEEE TRANSACTIONS ON FUZZY SYSTEMS. He is a Senior Member of the IEEE Computer Society, the IEEE Technical Committee on Scalable Computing, the IEEE Technical Committee on Dependable Computing and Fault Tolerance, and the IEEE Communication Society. He has served on the editorial boards of numerous international journals. He is currently serving as an Associate Editor for the International Journal of Big Data Intelligence and the International Journal of Parallel, Emergent and Distributed Systems. He has also guest edited many special issues and edited ten conference volumes.

HMOOD AL-DOSSARI received the M.S. degree in computer science from King Saud University and the Ph.D. degree in computer science from Cardiff University. He is currently an Assistant Professor with the College of Computer and Information Sciences, King Saud University. His research interests include quality-of-service assessment, trust and reputation management systems, human-computer interaction, sentiment analysis, and social mining. He has several publications in international journals and conferences. He has attended various conferences and presented many seminars.

JOHN YEARWOOD received the Ph.D. degree in computer science. He is currently the Head of the School of Information Technology, Deakin University, Australia. His main research areas are machine learning, optimization, and information security. He has authored two books and over 200 refereed journal, book chapter, and conference articles. He was an Editor-in-Chief of the Journal of Research and Practice in Information Technology and a reviewer for many journals.
