
    Analyzing Software System Quality Risk Using Bayesian Belief Network

    Hu Yong, Guangdong University of Foreign Studies and Sun Yat-sen University, 510275, China, henryhu200211@163.com

    Chen Juhua, Sun Yat-sen University, 510275, China, isscjh@mail.sysu.edu.cn

    Jiaxing Huang, Sun Yat-sen University, 510275, [email protected]

    Mei Liu, University of Kansas, 66045, [email protected]

    Kang Xie, Sun Yat-sen University, 510275, [email protected].edu.cn

    Abstract

    Uncertainty during software project development often brings huge risks to contractors and clients. An effective method for predicting the cost and quality of a software project at its outset, based on facts such as project characteristics and the cooperation capability of both parties, can help in finding ways to reduce these risks. The Bayesian Belief Network (BBN) is a good tool for analyzing uncertain consequences, but it is difficult to produce a precise network structure and conditional probability table. In this paper, we build the network structure by the Delphi method as the basis for conditional probability table learning, and we continuously update the probability table and the confidence levels of the nodes according to application cases. This gives the evaluation network the ability to learn and lets it evaluate software development risks in organizations more accurately. This paper also introduces the EM algorithm to improve the handling of hidden nodes that arise from variation among software projects.

    1. Introduction

    Both software development technologies and tools have made rapid progress in recent years, but many software projects still fail because of schedule delays, cost overruns, or unsatisfactory products. According to a report delivered by the Standish Group in 2004, 18% of the software projects investigated were considered failures, and 58% of them were unsuccessful due to schedule delay, budget overrun, or an unsatisfactory product. A successful software development project relies on many factors. It is complicated and difficult to control all of the factors and to ensure that they work together smoothly. The risks may be managed effectively if they can be detected beforehand. The goal of this paper is to introduce a mathematical model and to demonstrate that a software development team can rely on the model to accurately predict and calculate the risks and their corresponding impacts on the success of the project.

    2. Related Works

    Boehm [1] proposed dividing software risk management into two parts: risk assessment and risk control. Roger Pressman [2] suggested a risk driver method modeled on the risk assessment method of the Air Force. This driver method has a complete network structure, but it cannot use historical data effectively. Ramamoorthy used a rule-based expert system and a knowledge system based on influence diagrams to assess risks [3]. This method integrated historical data and mathematical methods. Clyde Chittister [4] posed three questions that software risk assessors must answer: What are the problems? How often do the problems occur? What troubles will the problems bring? Bayesian Belief Networks (BBN) can provide good answers to those questions. The nodes of a BBN, which correspond to risk events, may answer Q1; the confidence levels of the nodes may answer Q2; and the conditional probability relationships may answer Q3. Sunita Chulani [5] proposed applying Bayesian Belief Networks to software cost assessment. In this way, prior and empirical knowledge can be integrated with sample data, and learning works well even on incomplete and sparse data samples. Pendharkar [6] suggested a Bayesian Belief Network model for software cost assessment that integrates historical information and expert assessment in order to obtain more accurate cost predictions.

    Another important application of BBN in the software development process is software quality management. Norman Fenton [7][8] proposed applications of BBN in software defect prediction, quality control, and management. Through comparison with six other types of methods, he indicated that BBN is the most effective model in software quality management [9]. He built the BBN model mostly on experts' judgments: the probability distribution is determined from an analysis of the literature or from common-sense assumptions about the direction and strength of relations between variables [10]. Anthony Kwok Tai Hui et al. [11] introduced a method to build a Bayesian Belief Network for software risk assessment. They used a professional research report as a reference to build the network and its conditional probability table. The assessment network was then used for Bayesian reasoning, which yields the probability distribution of every risk event. With this method, a Bayesian Belief Network can be applied to the whole process of software risk assessment, and it is reliable. But it has two drawbacks. First, since the conditional probability table is based on experts' experience, the result can be subjective. Second, when additional nodes need to be introduced into the network, new research work must be performed in order to obtain the new conditional probability table. Therefore, the expandability is limited. The main work of this paper is to analyze the risk factors, extend and adjust the Bayesian Belief Network in order to build a better network, and learn the conditional probability table from sample data.

    Research supported by the Guangdong Software Science Foundation (2005B70101096) and the National Natural Science Foundation (70572053, 60673135). Corresponding author: Xie Kang [email protected]

    2007 IEEE International Conference on Granular Computing

    0-7695-3032-X/07 $25.00 © 2007 IEEE

    DOI 10.1109/GrC.2007.83

    3. Build a Bayesian Belief Assessment Network

    A BBN is a special kind of directed acyclic graph. A complete BBN consists of a network structure and a CPT (conditional probability table). Once the network structure is fixed, the conditional independence assumptions among the probability events are determined. There are two methods to build a Bayesian Belief Network. One is to learn the network structure from given sample data with some algorithm; the other is to use the Delphi method to fix the network structure according to experts' experience. In this paper, we adopt the second approach, for two reasons. First, constructing the network structure by learning requires a lot of sample data and counting work. Second, we believe that experts in software development can judge whether two events are independent. Therefore, in this paper we present a set of event factors and judge the independence relationship between factors according to experts' opinions.

    First, we collected the event factors from previous materials [12][13][14]. We then classified and filtered all the factors and selected 50 risk factors to fill the table. Next, we used this list to fix the topology and rate of the network by the Delphi method: we presented the list to about 30 experts and let them make connections between pairs of risk factors that may have a causal relationship. Finally, in order to simplify the network and satisfy the characteristics of a directed acyclic graph, we made some necessary changes to the network by deleting the connections that had the least agreement among the experts. The partial network is shown in Figure 1.
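    The pruning step described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not give the exact procedure, so the greedy rule used here (keep links in order of decreasing agreement and skip any link that would close a directed cycle), the function name build_dag, the min_votes threshold, and the risk factor names and vote counts are all hypothetical.

```python
from collections import defaultdict

def build_dag(votes, min_votes=1):
    """Greedily keep the most-agreed links that do not close a directed cycle.

    votes maps (cause, effect) -> number of experts who proposed that link.
    Assumption: this greedy rule stands in for the paper's unspecified pruning.
    """
    children = defaultdict(set)

    def reaches(src, dst):
        # Depth-first search: is dst reachable from src along kept links?
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(children[node])
        return False

    kept = []
    # Strongest agreement first; ties are broken by name for determinism.
    for (cause, effect), n in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0])):
        if n < min_votes or cause == effect:
            continue
        if reaches(effect, cause):
            continue  # adding cause -> effect would create a cycle
        children[cause].add(effect)
        kept.append((cause, effect))
    return kept

# Hypothetical risk factors and vote counts (out of about 30 experts).
votes = {
    ("unclear_requirements", "schedule_delay"): 26,
    ("schedule_delay", "quality_risk"): 22,
    ("staff_turnover", "schedule_delay"): 18,
    ("quality_risk", "unclear_requirements"): 4,  # weakest link, closes a cycle
}
dag = build_dag(votes, min_votes=3)
for cause, effect in dag:
    print(cause, "->", effect)
```

    In this toy input the link with the least agreement is the one that would make the graph cyclic, so it is the only one dropped.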

    4. Bayesian Belief Network Parameter Learning

    Once the network structure is fixed, we can start on CPT learning. So that the model keeps its assessing ability when there are not enough data samples, we let the learning model integrate prior knowledge with the information in the collected data. Here, the prior knowledge is the probability of each event node and the conditional probability between adjacent nodes. The experts gave their subjective opinions when building the network structure, and each CPT entry is the average of the values given by the experts. This CPT is used as the initial table for network parameter learning. We gave every network node two values: {0: never happened, 1: happened}. We denote the nodes of the network by $v_i$; the CPT we need then gives the joint probability of the whole Bayesian Belief Network.

    There are two methods to learn the Bayesian joint probability. When all the data samples in the collection are complete, we can use the gradient ascent algorithm [15]. However, through investigation we found that we cannot assign a value to every item in every project sample, because projects differ from one another. If the value of some event node cannot be determined, the project sample is considered incomplete. When the sample set contains samples with missing data, our model automatically calls the EM algorithm for learning. Lauritzen investigated how the EM algorithm can be applied to learning Bayesian Belief Network parameters [16]. If there is missing data, the nodes with missing values are first given hypothesized values, which are then corrected according to the Bayesian Belief Network being learned. We now show how EM is applied in our learning model.

    Because the event nodes take only discrete values, let $N_{ijk}$ stand for the number of samples in the sample set $D$ where $v_i = v_{ij}$ and $\pi_i = \pi_{ik}$ (with $\pi_i$ the configuration of the parent nodes of $v_i$), let $N_{ik}$ stand for the number of samples where $\pi_i = \pi_{ik}$, and let $\theta_{ijk}$ stand for the probability of $v_i = v_{ij}$ when $\pi_i = \pi_{ik}$. We take the current CPT as the hypothesized initial value $\theta^0$. The next hypothesis $\theta^{t+1}$ is calculated from the present hypothesis $\theta^t$ in the following steps. First, letting the value of $\theta_{ijk}$ under the present hypothesis be $\theta^t_{ijk}$, we define a likelihood function, where $d_l$ is the $l$-th sample in $D$:

    $$ l(\theta \mid \theta^t) = \sum_{l} \sum_{i,j,k} P(v_i = v_{ij}, \pi_i = \pi_{ik} \mid d_l, \theta^t) \, \ln \theta_{ijk} \tag{1} $$

    Given this likelihood function, each iteration of the EM algorithm has two steps. In the first step, the E step, for each $i, j, k$ we calculate the expectation of $N_{ijk}$ under hypothesis $\theta^t$:

    $$ E[N_{ijk} \mid D, \theta^t] = \sum_{l} P(v_i = v_{ij}, \pi_i = \pi_{ik} \mid d_l, \theta^t) \tag{2} $$

    Then we can calculate formula (1) using the expectations of all $N_{ijk}$. In the second step, the M step, we choose $\theta^{t+1}$ to be the $\theta$ that maximizes formula (1):

    $$ \theta^{t+1} = \arg\max_{\theta} \, l(\theta \mid \theta^t) \tag{3} $$

    The iteration continues until the value of formula (1) converges. Lauritzen proved that this convergence point exists and that it can be reached in a few steps [16]. When it converges, we have the value of $\theta_{ijk}$ for each $i, j, k$. Then we update the CPT according to formula (4) so that the probabilities $\sum_j \theta_{ijk}$ sum to 1:

    $$ \theta_{ijk} \leftarrow \frac{\theta_{ijk}}{\sum_{j'} \theta_{ij'k}}, \quad \forall \, i, k \tag{4} $$
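    The E and M steps can be illustrated on the smallest possible case. The sketch below is not the authors' implementation: it assumes a hypothetical two-node binary network A -> B, marks missing values with None, and uses the standard closed form of the M step (under the constraint that the probabilities sum to 1, formula (1) is maximized by the ratio of expected counts, which also makes the normalization of formula (4) hold). The names em_two_node, p_a, and p_b_given_a and the sample data are illustrative. It also assumes that no parameter becomes exactly 0 or 1 in a way that makes an observed sample impossible.

```python
from itertools import product

def em_two_node(samples, p_a, p_b_given_a, iterations=50):
    """EM for a binary network A -> B; None marks a missing value.

    p_a is P(A=1); p_b_given_a[a] is P(B=1 | A=a), i.e. the CPT of node B
    indexed by the parent value a.
    """
    for _ in range(iterations):
        n_a = [0.0, 0.0]                  # expected counts of A=a
        n_ab = [[0.0, 0.0], [0.0, 0.0]]   # expected counts of (A=a, B=b)
        for a_obs, b_obs in samples:
            # E step: weight every completion consistent with the observed
            # values by its joint probability under the current parameters.
            weights = {}
            for a, b in product((0, 1), repeat=2):
                if a_obs is not None and a != a_obs:
                    continue
                if b_obs is not None and b != b_obs:
                    continue
                pa = p_a if a == 1 else 1.0 - p_a
                pb = p_b_given_a[a] if b == 1 else 1.0 - p_b_given_a[a]
                weights[(a, b)] = pa * pb
            total = sum(weights.values())
            for (a, b), w in weights.items():
                # Normalizing turns the weights into posterior probabilities.
                n_a[a] += w / total
                n_ab[a][b] += w / total
        # M step: ratio of expected counts maximizes formula (1).
        p_a = n_a[1] / (n_a[0] + n_a[1])
        p_b_given_a = [n_ab[a][1] / n_a[a] for a in (0, 1)]
    return p_a, p_b_given_a

# Hypothetical project samples (A, B); None means the value was not reported.
samples = [(1, 1), (1, 1), (1, None), (0, 0), (0, 0), (None, 1), (0, 1), (1, 0)]
# Initial parameters play the role of the expert-averaged CPT.
p_a_hat, cpt_hat = em_two_node(samples, p_a=0.5, p_b_given_a=[0.5, 0.5])
print(p_a_hat, cpt_hat)
```

    When every sample is complete, the expected counts are ordinary counts and a single iteration already returns the relative frequencies, which is a convenient sanity check.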


    Figure 1. Software System Quality Risk Assessment Network

    5. Introduction of the Assessment Tool Based on the Model

    An operational tool for software risk assessment and simulation can be developed based on our model. This tool may help the project management team to analyze and control the risks of software projects. On the one hand, it may help the user organization to assess the ability of the contractor organization. On the other hand, it may help the contractor organization to perform a self-assessment of the project, to configure resources, and to avoid risk. The input of the tool is the probability vector of the top-level nodes. Usually we set the probability of an existing event to 1 and that of every other event to 0. When all the input is complete, we can start the reasoning of the model.

    This tool can also help project managers to trace the project through its whole life cycle. When some event happens, they can change the probability vector manually and start the simulation, which gives them a real-time risk prediction. This kind of simulation can also support the decision makers of the project. The tool can be used to monitor the project until its completion. When the project is finished, the result can be added to the data sample collection of the model learning process to analyze future projects.
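    The reasoning step, in which the probability vector of the top-level nodes is entered and the risk of a downstream node is read off, can be sketched by brute-force enumeration on a small network. Everything in this example is an assumption made for illustration: the node names, the CPT values, and the function marginal are hypothetical, and a real tool would need a more scalable inference method for a network of about 50 nodes.

```python
from itertools import product

def marginal(network, root_probs, target):
    """P(target = 1), computed by enumerating every joint assignment.

    network maps node -> (parents, cpt), where cpt maps a tuple of parent
    values to P(node = 1 | parents). root_probs maps each top-level node
    to P(node = 1), i.e. the input probability vector.
    """
    nodes = list(root_probs) + list(network)
    prob_target = 0.0
    for values in product((0, 1), repeat=len(nodes)):
        state = dict(zip(nodes, values))
        p = 1.0
        for node, p1 in root_probs.items():
            p *= p1 if state[node] == 1 else 1.0 - p1
        for node, (parents, cpt) in network.items():
            p1 = cpt[tuple(state[par] for par in parents)]
            p *= p1 if state[node] == 1 else 1.0 - p1
        if state[target] == 1:
            prob_target += p
    return prob_target

# Hypothetical three-level network with made-up CPT values.
network = {
    "schedule_delay": (
        ("unclear_requirements", "staff_turnover"),
        {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9},
    ),
    "project_failure": (("schedule_delay",), {(0,): 0.2, (1,): 0.7}),
}

# Existing events get probability 1, all other top-level events 0.
risk_now = marginal(
    network, {"unclear_requirements": 1.0, "staff_turnover": 0.0}, "project_failure"
)
# Later in the life cycle a second event occurs and the vector is updated.
risk_later = marginal(
    network, {"unclear_requirements": 1.0, "staff_turnover": 1.0}, "project_failure"
)
print(risk_now, risk_later)
```

    With these made-up numbers the first query evaluates 0.6 * 0.7 + 0.4 * 0.2 and the second 0.9 * 0.7 + 0.1 * 0.2, so the predicted failure risk rises once the second event is entered.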

    6. Experiment and Validation

    The Bayesian Belief Network has strong capabilities for analysis and learning, and it maintains them when data are missing, which matches the diversity of software development. This paper introduced the Bayesian Belief Network to simulate and analyze the changing risks of software development. We combined the current literature and experts' experience to construct the network. The CPT derived from learning and updating may greatly reduce the subjectivity of the assessment model and make the assessment results more reliable. We introduced the EM algorithm to learn the CPT, which enhanced the model's ability to analyze and predict changing projects.

    For the validation of the model, we collected data from real projects by questionnaire. We distributed 300 questionnaires in total and collected 135 samples back. After evaluation, 120 of them were considered valid. These data samples came from a broad range of industries, including software development, communication, Internet service, transportation, and government, located in Guangzhou, Shanghai, Shenzhen, and Nanjing. Among the participants, 72% have at least 4 years of software project experience, 50% are company managers or project managers, and 36.7% are project developers. All of this suggests that they have the expertise and are qualified to provide the data. We then separated the samples into two sets: 20 samples were chosen for network validation and the remaining 100 samples were used to train the model. The prediction accuracy reaches 80% (16 of the 20 validation samples). As expected, most correct predictions have a high probability (larger than 0.95). The detailed results are shown in Table 1.

    7. Conclusion

    In this research, we analyzed the potential risks involved in software system quality using a Bayesian Belief Network. The network structure is constructed using the Delphi method as the basis for conditional probability table learning. The probability table and the confidence levels of the nodes are updated and learned continuously from application cases, which gives the evaluation network the ability to learn and to evaluate software development risks in organizations more accurately. The EM algorithm is introduced to improve the handling of hidden nodes that arise from variation among software projects. Our model is validated through training and evaluation on 120 real-life development projects. The experimental results demonstrate that the model achieves a prediction accuracy of 80% (shown in Table 1). The confidence levels of our predictions are mostly larger than 0.95, which indicates that our model is reliable. Based on our model, an operational tool can be developed for software risk assessment and simulation, which can help the project management team to analyze and control the risks of software projects and help project managers to trace the project through its whole life cycle.

    Table 1. Comparison of sample results and the prediction of the model

    No              1        2        3        4        5        6        7        8        9        10
    Sample results  Fail     Fail     Fail     Fail     Fail     Fail     Fail     Success  Success  Fail
    Prediction      Fail     Fail     Fail     Fail     Fail     Success  Fail     Fail     Success  Success
    Probability     0.97     0.99     0.97     1.0      0.97     0.81     0.63     0.79     0.97     0.92
    True-false      T        T        T        T        T        F        T        F        T        F

    No              11       12       13       14       15       16       17       18       19       20
    Sample results  Fail     Success  Success  Fail     Success  Success  Success  Success  Success  Success
    Prediction      Fail     Success  Success  Success  Success  Success  Success  Success  Success  Success
    Probability     0.96     0.99     0.85     0.64     0.61     0.98     0.98     0.98     0.94     0.64
    True-false      T        T        T        F        T        T        T        T        T        T

    Accuracy        80%
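    The summary figures can be re-derived from the rows of Table 1. The short check below uses only the values printed in the table; the variable names and the 0.95 threshold variable are introduced here for illustration.

```python
# Rows of Table 1, samples 1 to 20 (F = Fail, S = Success).
sample = list("FFFFFFFSSF" "FSSFSSSSSS")
prediction = list("FFFFFSFFSS" "FSSSSSSSSS")
probability = [0.97, 0.99, 0.97, 1.0, 0.97, 0.81, 0.63, 0.79, 0.97, 0.92,
               0.96, 0.99, 0.85, 0.64, 0.61, 0.98, 0.98, 0.98, 0.94, 0.64]

# A prediction is correct when it matches the observed project outcome.
correct = [s == p for s, p in zip(sample, prediction)]
accuracy = sum(correct) / len(correct)

# Count predictions made with probability larger than 0.95, split by outcome.
threshold = 0.95
confident_correct = sum(1 for ok, q in zip(correct, probability) if ok and q > threshold)
confident_wrong = sum(1 for ok, q in zip(correct, probability) if not ok and q > threshold)

print(f"accuracy = {accuracy:.0%}")
print(f"correct predictions above {threshold}: {confident_correct} of {sum(correct)}")
print(f"wrong predictions above {threshold}: {confident_wrong}")
```

    This confirms 16 correct predictions out of 20; 11 of those 16 carry a probability above 0.95, while none of the four wrong predictions does.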

    8. References

    [1] Boehm. Software Risk Management: Principles and Practices. IEEE Software. 1991, (1):32-41.

    [2] Roger S. Pressman. A Manager's Guide to Software Engineering. McGraw-Hill, Inc., New York, NY, 1993.

    [3] C. V. Ramamoorthy, C. Chandra. Knowledge Based Tools for Risk Assessment in Software Development and Reuse. Proceedings of the 1993 IEEE International Conference on Tools with AI, Boston, Massachusetts, Nov. 1993, 364-371.

    [4] Clyde Chittister, Yacov Y. Haimes. Assessment and Management of Software Technical Risk. IEEE Transactions on Systems, Man, and Cybernetics. 1994, 24(2):187-202.

    [5] Sunita Chulani, Barry Boehm. Bayesian Analysis of Empirical Software Engineering Cost Models. IEEE Transactions on Software Engineering. 1999, 25(4):573-583.

    [6] Parag C. Pendharkar, Girish H. Subramanian, James A. Rodger. A Probabilistic Model for Predicting Software Development Effort. IEEE Transactions on Software Engineering. 2005, 31(7):615-624.

    [7] Martin Neil, Norman Fenton. Predicting Software Quality using Bayesian Belief Networks. Proceedings of 21st Annual Software Engineering Workshop, NASA/Goddard Space Flight Centre, December 4-5, 1996, 217-230.

    [8] Norman Fenton, Martin Neil. Probabilistic Modeling for Software Quality Control. S. Benferhat and P. Besnard (Eds.): ECSQARU 2001, LNAI 2143. 2001, 444-453.

    [9] Norman Fenton, Martin Neil. A Critique of Software Defect Prediction Models. IEEE Transactions on Software Engineering. 1999, 25(5):675-689.

    [10] Norman Fenton, Paul Krause, Martin Neil. A Probabilistic Model for Software Defect Prediction. IEEE Transactions on Software Engineering. 2000.

    [11] Anthony Kwok Tai Hui, Dar Biau Liu. A Bayesian Belief Network Model and Tool to Evaluate Risk and Impact in Software Development Projects. Reliability and Maintainability, 2004 Annual Symposium RAMS. 2004, 297-301.

    [12] Sarma Nidumolu. The Effect of Coordination and Uncertainty on Software Project Performance: Residual Performance Risk as an Intervening Variable. Information Systems Research. 1995, 6(3):191-219.

    [13] Linda Wallace, Mark Keil, Arun Rai. Understanding Software Project Risk: A Cluster Analysis. Information and Management. 2004, 42(1):115-125.

    [14] Carr, M., Konda, S. Taxonomy Based Risk Identification. Software Engineering Institute Technical Report SEI-93-TR-006, Pittsburgh, PA. Software Engineering Institute (SEI internal report), 1993.

    [15] Tom M. Mitchell. Machine Learning. McGraw-Hill Companies, Inc. 1997.

    [16] Steffen L. Lauritzen. The EM Algorithm for Graphical Association Models with Missing Data. Computational Statistics & Data Analysis. 1995, 19(2):191-201.
