1
A Novel Method for Early Software Quality Prediction Based on Support Vector Machine
Fei Xing¹, Ping Guo¹,² and Michael R. Lyu²
¹Department of Computer Science, Beijing Normal University
²Department of Computer Science & Engineering, The Chinese University of Hong Kong
2
Outline
• Background
• Support vector machine
  – Basic theory
  – SVM with Risk Feature
  – Transductive SVM
• Experiments
• Conclusions
• Further work
3
Background
• Modern society is fast becoming dependent on software products and systems.
• Achieving high reliability is one of the most important challenges facing the software industry.
• Software quality models are urgently needed.
4
Background
• Software quality model
  – A software quality model is a tool for focusing software enhancement efforts.
  – Such a model yields timely predictions on a module-by-module basis, enabling one to target high-risk modules.
5
Background
• Software complexity metrics
  – Provide a quantitative description of program attributes.
  – Are closely related to the distribution of faults in program modules.
  – Play a critical role in predicting the quality of the resulting software.
6
Background
• Software quality prediction
  – Software quality prediction aims to evaluate the software quality level periodically and to flag software quality problems early.
  – It investigates the relationship between the number of faults in a program and its software complexity metrics.
7
Background
• Related work
  – Several techniques have been proposed for developing predictive software metrics that classify program modules into fault-prone and non-fault-prone categories:
    • Discriminant analysis
    • Factor analysis
    • Classification trees
    • Pattern recognition
    • EM algorithm
    • Feedforward neural networks
    • Random forests
8
Background
• Classification problem
• Two types of errors
  – A Type I error is the case where we conclude that a program module is fault-prone when in fact it is not.
  – A Type II error is the case where we believe that a program module is non-fault-prone when in fact it is fault-prone.
9
Background
• Which error type is more serious in practice?
  – A Type II error has more serious implications: the product would seem better than it actually is, and testing effort would not be directed where it is needed most.
10
Research Objectives
• Search for a well-accepted mathematical model for software quality prediction.
• Lay out the application procedure for the selected software quality prediction model.
• Perform an experimental comparison to assess the proposed model.
• Select a proven model for investigation: the Support Vector Machine.
11
Support Vector Machine
• Introduced by Vapnik in the late 1960s on the foundation of statistical learning theory
• Rooted in the structural risk minimization (SRM) principle, which determines the classification decision function by minimizing a bound on the expected risk (the empirical risk plus a model-capacity term) rather than the empirical risk alone.
12
Support Vector Machine (SVM)
• SVM is a relatively new technique for data classification that has been used successfully in many object recognition applications.
• SVM is known to generalize well even in high-dimensional spaces with small training samples.
• SVM excels among linear classifiers.
13
Linear Binary Classifier
Given two classes of data sampled from x and y, we try to find a linear decision plane $w^T z + b = 0$ that correctly discriminates x from y:
• if $w^T z + b < 0$, z is classified as y;
• if $w^T z + b > 0$, z is classified as x;
• $w^T z + b = 0$ is the decision hyperplane.
[Figure: samples from classes x and y on either side of the decision hyperplane]
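The decision rule above can be sketched in Python as follows. The function name `classify`, the string labels and the example hyperplane are illustrative assumptions; a point lying exactly on the hyperplane returns `None`, since the slide leaves that case unassigned.

```python
from typing import Optional, Sequence


def classify(w: Sequence[float], b: float, z: Sequence[float]) -> Optional[str]:
    """Assign z to class "x" or "y" according to the sign of w^T z + b."""
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    if score > 0:
        return "x"  # positive side of the hyperplane
    if score < 0:
        return "y"  # negative side of the hyperplane
    return None  # z lies exactly on the decision hyperplane


# Hypothetical hyperplane z1 + z2 - 1 = 0
w, b = [1.0, 1.0], -1.0
print(classify(w, b, [2.0, 0.5]))  # x
print(classify(w, b, [0.0, 0.0]))  # y
```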
14
Support Vector Machine
• The current state-of-the-art classifier
  – Local learning
[Figure: decision plane, support vectors and margin]
15
Support Vector Machine
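• Dual problem
  – Using standard Lagrangian duality techniques, one arrives at the following dual quadratic programming (QP) problem:
$$\max_{\alpha}\ \Big\{\sum_{i=1}^{l}\alpha_i-\frac{1}{2}\sum_{i,j=1}^{l}\alpha_i\alpha_j y_i y_j\,(x_i\cdot x_j)\Big\}$$
$$\text{s.t.}\quad \sum_{i=1}^{l}\alpha_i y_i = 0,\qquad \alpha_i \ge 0,\ i=1,2,\dots,l$$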
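To make the dual concrete, here is a minimal pure-Python sketch that evaluates the dual objective for a given multiplier vector α, checks its feasibility, and recovers the weight vector through the standard stationarity condition $w = \sum_i \alpha_i y_i x_i$. It does not solve the QP; the two-point data set and the multipliers α = (0.5, 0.5) are a hypothetical toy case whose optimum can be verified by hand.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def dual_objective(alpha, X, y) -> float:
    """sum_i alpha_i - 1/2 * sum_{i,j} alpha_i alpha_j y_i y_j (x_i . x_j)."""
    l = len(alpha)
    quad = sum(
        alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
        for i in range(l)
        for j in range(l)
    )
    return sum(alpha) - 0.5 * quad


def dual_feasible(alpha, y, tol: float = 1e-9) -> bool:
    """Check alpha_i >= 0 for all i and sum_i alpha_i y_i = 0."""
    return all(a >= -tol for a in alpha) and abs(dot(alpha, y)) <= tol


def recover_w(alpha, X, y) -> List[float]:
    """w = sum_i alpha_i y_i x_i."""
    dim = len(X[0])
    return [sum(alpha[i] * y[i] * X[i][d] for i in range(len(alpha))) for d in range(dim)]


# Hypothetical toy problem: one point per class on the first axis.
X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1, -1]
alpha = [0.5, 0.5]
print(dual_feasible(alpha, y))      # True
print(dual_objective(alpha, X, y))  # 0.5
print(recover_w(alpha, X, y))       # [1.0, 0.0]
```

For this toy set, α = (0.4, 0.4) and α = (0.6, 0.6) both give a dual value of about 0.48, so (0.5, 0.5) is the maximizer along the feasible direction; the recovered w = (1, 0) gives a primal value ½‖w‖² = 0.5, equal to the dual optimum.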
16
Support Vector Machine
• The Optimal Separating Hyperplane
  – Place a linear boundary between the two classes, and orient it so that the margin is maximized:
$$g(x) = (w\cdot x) + b = 0$$
  – The optimal hyperplane is required to satisfy the following constrained minimization:
$$\min_{w,b}\ \Big\{\frac{1}{2}\|w\|^2\Big\}\quad\text{s.t.}\quad y_i[(w\cdot x_i)+b]-1 \ge 0,\ i=1,\dots,l$$
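The hard-margin constraints and the margin width $2/\|w\|$ (the standard geometric margin of the canonical hyperplane) can be checked numerically. The sketch below is illustrative only; the hyperplane $w = (1, 0)$, $b = 0$ and the four sample points are hypothetical.

```python
import math
from typing import Sequence


def satisfies_constraints(w, b, X, y, tol: float = 1e-9) -> bool:
    """True if y_i[(w . x_i) + b] - 1 >= 0 for every training sample."""
    return all(
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) - 1 >= -tol
        for xi, yi in zip(X, y)
    )


def margin_width(w: Sequence[float]) -> float:
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))


def primal_objective(w: Sequence[float]) -> float:
    """1/2 * ||w||^2, the quantity minimized by the optimal hyperplane."""
    return 0.5 * sum(wj * wj for wj in w)


X = [[1.0, 0.0], [2.0, 1.0], [-1.0, 0.0], [-3.0, -1.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 0.0], 0.0
print(satisfies_constraints(w, b, X, y))  # True
print(margin_width(w))                    # 2.0
print(primal_objective(w))                # 0.5
```

Minimizing ½‖w‖² is equivalent to maximizing this width; shrinking w to (0.5, 0) would widen the margin but violate the constraints for the points at x = ±1.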
17
Support Vector Machine
• The Generalized Optimal Separating Hyperplane
  – For the linearly non-separable case, non-negative slack variables $\xi_i$ are introduced:
$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i\quad\text{s.t.}\quad y_i[(w\cdot x_i)+b]-1+\xi_i \ge 0,\ \ \xi_i \ge 0$$
  – C weights the penalty on the slack variables $\xi_i$; a larger C assigns a higher penalty to errors.
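For fixed (w, b), the smallest feasible slack is the hinge loss $\xi_i = \max(0,\ 1 - y_i[(w\cdot x_i)+b])$, so the soft-margin objective can be evaluated for any candidate hyperplane. This is a minimal sketch built on that observation: it evaluates rather than optimizes the objective, and the data and values of C are hypothetical.

```python
from typing import List


def slacks(w, b, X, y) -> List[float]:
    """Smallest feasible xi_i: max(0, 1 - y_i[(w . x_i) + b])."""
    return [
        max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    ]


def soft_margin_objective(w, b, X, y, C: float) -> float:
    """1/2 * ||w||^2 + C * sum_i xi_i."""
    return 0.5 * sum(wj * wj for wj in w) + C * sum(slacks(w, b, X, y))


# Hypothetical data: the last point lies on the wrong side of the hyperplane.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, -1]
w, b = [1.0, 0.0], 0.0
print(slacks(w, b, X, y))                         # [0.0, 0.0, 1.5]
print(soft_margin_objective(w, b, X, y, C=1.0))   # 2.0
print(soft_margin_objective(w, b, X, y, C=10.0))  # 15.5
```

With C = 10 the same misclassified point dominates the objective, which is why a larger C pushes the optimizer toward fewer training errors at the cost of a narrower margin.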
18
Support Vector Machine
• SVM with Risk Feature
  – Takes into account the cost of different types of errors by adjusting the error penalty parameter C to control the risk:
$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C_1\sum_{i=1}^{k}\xi_i + C_2\sum_{i=k+1}^{l}\xi_i$$
  – C1 is the error penalty parameter of class 1 (samples 1, …, k) and C2 is the error penalty parameter of class 2 (samples k+1, …, l).
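The risk-feature objective differs from the generalized one only in that slack is weighted per class. A minimal sketch, assuming labels +1 for class 1 and -1 for class 2 (a hypothetical encoding) and the hinge form of the slack; it evaluates the objective and does not train the SVM. Judging from the risk-feature results table, where lowering C1 raises the Type I error, class 1 appears to be the non-fault-prone class, but the slides do not state this explicitly.

```python
from typing import Dict


def weighted_objective(w, b, X, y, C_per_class: Dict[int, float]) -> float:
    """1/2 ||w||^2 + sum over samples of C_{class(i)} * xi_i."""
    total = 0.5 * sum(wj * wj for wj in w)
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        slack = max(0.0, 1.0 - margin)  # hinge form of xi_i
        total += C_per_class[yi] * slack  # class-specific penalty
    return total


# Hypothetical data: one sample of each class violates the margin by 0.5.
X = [[0.5, 0.0], [-0.5, 0.0], [3.0, 0.0]]
y = [1, -1, 1]
w, b = [1.0, 0.0], 0.0
print(weighted_objective(w, b, X, y, {1: 1.0, -1: 1.0}))  # 1.5
print(weighted_objective(w, b, X, y, {1: 1.0, -1: 2.0}))  # 2.0
```

Existing SVM packages expose the same idea by scaling C per class, for example LIBSVM's `-wi` option and the `class_weight` parameter of scikit-learn's `SVC`.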
19
[Figure: Optimal Separating Hyperplane for C1=10000, C2=20000; C1=20000, C2=10000; and C1=20000, C2=20000]
20
Support Vector Machine
• Transductive SVM
  – A kind of semi-supervised learning
  – Takes into account a particular test set as well as the training set, and tries to minimize misclassifications of just those particular examples.
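For reference, one widely used way to write the transductive SVM is Joachims' formulation, shown below; the slides do not say which TSVM variant or solver the authors used. The labels $y_j^*$ of the $m$ test examples $x_j^*$ are treated as unknowns optimized jointly with $(w, b)$, and $C^*$ (an assumption here) weights the slack on the test examples.

```latex
\min_{y_1^*,\dots,y_m^*,\; w,\; b,\; \xi,\; \xi^*}\;
  \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i + C^*\sum_{j=1}^{m}\xi_j^*
\quad\text{s.t.}\quad
\begin{aligned}
  & y_i[(w\cdot x_i)+b] \ge 1-\xi_i, && \xi_i \ge 0,\ i=1,\dots,n,\\
  & y_j^*[(w\cdot x_j^*)+b] \ge 1-\xi_j^*, && \xi_j^* \ge 0,\ j=1,\dots,m,\\
  & y_j^* \in \{-1,+1\}, && j=1,\dots,m.
\end{aligned}
```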
21
Experiments
• Data description
  – Medical Imaging System (MIS) data set.
  – 11 software complexity metrics were measured for each of the modules.
  – Change Reports (CRs) represent faults detected.
  – Modules with 0 or 1 CRs are treated as non-fault-prone (114 in total), and those with 10 to 98 CRs as fault-prone (89 in total).
  – The 203 samples are divided into two parts: half for training and the remaining half for testing.
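The labeling rule and the half/half split can be sketched as follows. Modules with 2 to 9 CRs fall into neither class and are assumed here to be excluded, which is consistent with 114 + 89 = 203 samples. The random shuffle, the seed and the function names are illustrative assumptions, since the slides do not say how the two halves were drawn.

```python
import random
from typing import List, Optional, Sequence, Tuple


def label_module(cr_count: int) -> Optional[int]:
    """0 = non-fault-prone (0 or 1 CRs), 1 = fault-prone (10 to 98 CRs), None = excluded."""
    if 0 <= cr_count <= 1:
        return 0
    if 10 <= cr_count <= 98:
        return 1
    return None  # assumed: modules outside both ranges are dropped


def split_half(samples: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Shuffle and split into a training half and a testing half."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical CR counts for five modules.
cr_counts = [0, 1, 5, 12, 98]
print([label_module(c) for c in cr_counts])  # [0, 0, None, 1, 1]

train, test = split_half(range(203))
print(len(train), len(test))  # 101 102
```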
22
Experiments
• Metrics of the MIS data
  – Total lines of code including comments (LOC)
  – Total code lines (CL)
  – Total character count (TChar)
  – Total comments (TComm)
  – Number of comment characters (MChar)
  – Number of code characters (DChar)
  – Halstead's program length (N)
  – Halstead's estimated program length (N̂)
  – Jensen's estimator of program length (NF)
  – McCabe's cyclomatic complexity (v(G))
  – Belady's bandwidth metric (BW)
  – ……
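Several of the listed metrics have standard closed forms. The sketch below uses the textbook definitions (Halstead length $N = N_1 + N_2$, Halstead estimated length $\hat N = \eta_1\log_2\eta_1 + \eta_2\log_2\eta_2$, McCabe $v(G) = e - n + 2p$); the slides do not describe the tool used to measure the MIS data, and the argument names are illustrative.

```python
import math


def halstead_length(total_operators: int, total_operands: int) -> int:
    """Halstead's program length N = N1 + N2 (total operator and operand occurrences)."""
    return total_operators + total_operands


def halstead_estimated_length(distinct_operators: int, distinct_operands: int) -> float:
    """Halstead's estimated length n1*log2(n1) + n2*log2(n2); both counts must be >= 1."""
    return (distinct_operators * math.log2(distinct_operators)
            + distinct_operands * math.log2(distinct_operands))


def cyclomatic_complexity(edges: int, nodes: int, components: int = 1) -> int:
    """McCabe's v(G) = e - n + 2p for a control-flow graph with p connected components."""
    return edges - nodes + 2 * components


# Hypothetical counts for one module.
print(halstead_length(10, 7))           # 17
print(halstead_estimated_length(4, 8))  # 32.0
print(cyclomatic_complexity(9, 8))      # 3
```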
23
Distribution in the space of the first three principal components
24
Methods for Comparison
• Applied models
  – QDA: Quadratic Discriminant Analysis
  – PCA: Principal Component Analysis
  – CART: Classification and Regression Tree
  – SVM: Support Vector Machine
  – TSVM: Transductive SVM
• Evaluation criteria
  – CCR: Correct Classification Rate
  – T1ERR: Type I error
  – T2ERR: Type II error
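The three criteria can be computed from predicted and true labels. In most rows of the results tables, CCR, T1ERR and T2ERR add up to 100%, which suggests both error rates are fractions of all test samples rather than of each class; the sketch below assumes that normalization and encodes fault-prone as 1 and non-fault-prone as 0 (a hypothetical encoding).

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """CCR, Type I and Type II error rates as fractions of all samples.

    Encoding (assumed): 1 = fault-prone, 0 = non-fault-prone.
    Type I:  predicted fault-prone (1) but actually non-fault-prone (0).
    Type II: predicted non-fault-prone (0) but actually fault-prone (1).
    """
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    type1 = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    type2 = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {"CCR": correct / n, "T1ERR": type1 / n, "T2ERR": type2 / n}


y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]
print(evaluate(y_true, y_pred))  # {'CCR': 0.75, 'T1ERR': 0.125, 'T2ERR': 0.125}
```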
25
The Comparison Results
Method CCR Std T1ERR T2ERR
QDA 85.49% 0.0288 7.37% 7.14%
PCA+QDA 86.53% 0.0275 4.90% 7.52%
PCA+CART 83.02% 0.0454 9.59% 6.41%
SVM 89.00% 0.0189 2.33% 8.67%
PCA+SVM 89.07% 0.0209 2.06% 8.87%
TSVM 90.03% 0.0326 2.11% 7.86%
26
The Comparison of the Three Kernels of SVM
Kernel function CCR Std T1ERR T2ERR
Polynomial 70.74% 0.0208 0.45% 28.81%
Radial basis 88.68% 0.0220 2.65% 8.67%
Sigmoid 88.75% 0.0223 2.29% 8.96%
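For reference, the three kernels compared above are commonly written as in LIBSVM: polynomial $(\gamma\, x\cdot z + r)^d$, radial basis $\exp(-\gamma\|x-z\|^2)$ and sigmoid $\tanh(\gamma\, x\cdot z + r)$. The hyperparameter defaults below (d, r, γ) are assumptions, because the slides do not report the values used. The sketch also shows the kernelized decision function $\sum_i \alpha_i y_i K(x_i, z) + b$ with hypothetical support vectors and multipliers.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(x: Vector, z: Vector, gamma=1.0, r=1.0, d=3) -> float:
    return (gamma * dot(x, z) + r) ** d


def rbf_kernel(x: Vector, z: Vector, gamma=0.5) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def sigmoid_kernel(x: Vector, z: Vector, gamma=0.1, r=0.0) -> float:
    # Not positive semidefinite for every (gamma, r) choice.
    return math.tanh(gamma * dot(x, z) + r)


def decision_value(z: Vector, support_vectors, alphas, labels, b: float,
                   kernel: Callable[[Vector, Vector], float]) -> float:
    """Kernel SVM decision function: sum_i alpha_i y_i K(x_i, z) + b."""
    return sum(a * y * kernel(sv, z)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b


print(polynomial_kernel([1.0, 2.0], [3.0, 0.0], d=2))  # 16.0
print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))              # exp(-1), about 0.3679
svs, alphas, labels = [[1.0, 0.0], [-1.0, 0.0]], [0.5, 0.5], [1, -1]
print(decision_value([1.0, 0.0], svs, alphas, labels, 0.0, rbf_kernel) > 0)  # True
```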
27
Experiments with the Minimum Risk
• SVM with the risk feature
• The Bayesian decision with the minimum risk
28
SVM with the Risk Feature
C1 C2 CCR Std T1ERR T2ERR
5000 20000 86.53% 0.0393 11.43% 5.10%
8000 20000 86.53% 0.0275 4.90% 7.52%
10000 20000 89.00% 0.0189 2.33% 8.67%
15000 20000 89.07% 0.0209 2.06% 8.87%
20000 20000 89.07% 0.0209 2.06% 8.87%
29
The Bayesian Decision with the Minimum Risk
Risk ratio CCR Std T1ERR T2ERR
1:1 85.94% 0.0387 7.59% 6.47%
1:1.1 83.80% 0.0326 11.06% 5.14%
1:1.2 78.73% 0.0321 17.90% 3.37%
1:1.3 71.31% 0.0436 27.12% 1.57%
1:1.4 59.98% 0.0516 39.75% 0.27%
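The Bayesian minimum-risk rule picks, for each module, the decision with the smaller conditional risk. The sketch below assumes correct decisions cost nothing and reads the risk ratio a:b as "a misclassified non-fault-prone module costs a, a misclassified fault-prone module costs b"; the slides define neither the ratio precisely nor how the posteriors $P(\text{fault-prone} \mid x)$ were estimated, so both are assumptions, and the posterior values are hypothetical.

```python
def min_risk_decision(p_fault_prone: float, cost_type1: float, cost_type2: float) -> int:
    """Return 1 (fault-prone) or 0 (non-fault-prone), whichever has the smaller conditional risk.

    cost_type1: loss for labeling a non-fault-prone module fault-prone (Type I).
    cost_type2: loss for labeling a fault-prone module non-fault-prone (Type II).
    Correct decisions are assumed to cost nothing.
    """
    p_non_fault_prone = 1.0 - p_fault_prone
    risk_if_predict_fp = cost_type1 * p_non_fault_prone
    risk_if_predict_nfp = cost_type2 * p_fault_prone
    return 1 if risk_if_predict_fp < risk_if_predict_nfp else 0


posteriors = [0.2, 0.46, 0.6]  # hypothetical P(fault-prone | x) values
for ratio in (1.0, 1.2):
    print(ratio, [min_risk_decision(p, 1.0, ratio) for p in posteriors])
# 1.0 [0, 0, 1]
# 1.2 [0, 1, 1]
```

Equivalently, a module is labeled fault-prone when its posterior exceeds a/(a + b). Raising the Type II cost lowers this threshold, trading Type II errors for Type I errors, which matches the trend in the table above.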
30
Discussions
• Features of this work
  – Models nonlinear functional relationships
  – Good generalization ability, even in high-dimensional spaces with small training samples
  – The SVM-based software quality prediction model achieves relatively good performance
  – Type II error is easily controlled by adjusting the error penalty parameter C of the SVM
31
Conclusions
• SVM provides a new approach that has not yet been fully explored in software reliability engineering.
• SVM offers a promising technique for software quality prediction.
• SVM is suitable for real-world applications in software quality prediction and other software engineering fields.