
Software Fault Reporting Processes in

Business-Critical Systems

Jon Arvid Børretzen

Doctoral Thesis

Submitted in partial fulfilment of the requirements for the degree of

Philosophiae Doctor

Department of Computer and Information Science

Faculty of Information Technology, Mathematics and Electrical Engineering Norwegian University of Science and Technology


Copyright © 2007 Jon Arvid Børretzen
ISBN 82-471-xxxx-x (printed)
ISBN 82-471-xxxx-x (electronic)
ISSN 1503-8181
NTNU 2007:xx (local report series)
Printed in Norway by NTNU Trykk, Trondheim


Abstract

Today’s society is crucially dependent on software systems. The number of areas where functioning software is at the core of operation is growing steadily. Both financial systems and e-business systems rely on increasingly larger and more complex computer and software systems. To increase, for example, the reliability and performance of such systems, we rely on a plethora of methods, techniques and processes specifically aimed at improving the development, operation and maintenance of such software.

The BUCS project (BUsiness-Critical Systems) seeks to develop and evaluate methods to improve the support for development, operation and maintenance of business-critical software and systems. Improving software processes relies on the ability to analyze previous projects and derive concrete improvement proposals. The research in this thesis is based on empirical studies performed in several Norwegian companies that develop business-critical software. The work specifically aims to assess the use of fault reporting approaches and to describe how improvement in this area can benefit process and product quality. Some specific software methods will be adopted from safety-critical software engineering practices, while others will be taken from general software engineering. Together they will be tuned and refined for this particular context. A specific goal in the BUCS project has been to facilitate the use of traditional Software Criticality Analysis techniques for the development of business-critical software. This encompasses techniques used to evaluate and explore potential risks and hazards in a system.

The thesis describes six studies of software development technology for business-critical systems. The main goal is to attain a better understanding of business-critical systems, as well as to adapt and improve relevant methods and processes. Through data mining of historical software project data and other studies of relevant projects, we have gathered information to be evaluated with the goal of improving business-critical systems development. The BUCS project has been involved in investigating development projects for business-critical systems, investigations that have been continued in the EVISOFT user-driven project. The main goal was to study the effects of revised development methods for business-critical software, in order to improve important quality aspects of these systems.

The main research questions in this work are:
• RQ1. What is the role of fault reporting in existing industrial software development?
• RQ2. How can we improve existing fault reporting processes?
• RQ3. What are the most common and severe fault types, and how can we reduce them in number and severity?
• RQ4. How can we use safety analysis techniques together with failure report analysis to improve the development process?

The main contributions of this thesis are:
• C1. Describing how to utilize safety criticality techniques to improve the development process for business-critical software.
• C2. Identification of typical shortcomings in fault reporting.
• C3. An improved model of fault origins and types for business-critical software.


Preface

This thesis is submitted to the Norwegian University of Science and Technology (NTNU) in partial fulfilment of the requirements for the degree Philosophiae Doctor. The work has been performed at the Department of Computer and Information Science, NTNU, Trondheim, with Professor Reidar Conradi as the main advisor, and Professor Tor Stålhane and Professor Torbjørn Skramstad as co-advisors.

The thesis is part of the BUCS project (BUsiness-Critical Systems) and has been financed for three years by the Norwegian Research Council through the IKT’2010 basic IT Programme under NFR grant number 152923/V30. An additional year was financed by NTNU through a teaching assistantship. The BUCS project has been led by Professor Tor Stålhane. Some of the work in this thesis has also been partly financed by the EVISOFT user-driven R&D project under NFR grant number 174390/I40.


Acknowledgements

During the work on this thesis, I have been lucky to have been in contact with many people who have provided help, inspiration and motivation. First of all, I want to thank my supervisor, Professor Reidar Conradi, for giving valuable feedback and comments on many drafts and ideas during the last four years. I also want to thank Professor Tor Stålhane, my co-advisor, for being the source of a lot of good advice and many bad jokes. I also want to thank the present and former members of the software engineering group at IDI, NTNU, for giving me a good working environment. A special thanks goes to my BUCS colleagues Torgrim Lauritsen and Per Trygve Myhrer for collaboration in our research and daily work.

Parts of the work for this thesis have been done in collaboration with people from several industrial organizations. I am very grateful to these companies and the people I have been in touch with, who have been helpful and accommodating in sharing their information and experience with me. I also want to thank master student Jostein Dyre-Hansen for helping me analyze a great deal of the data material.

Finally, I want to thank my family and friends for their encouragement and inspiration, and I would especially like to express my thanks to Ingvild for her love and patience.

Trondheim, Nov 1, 2007

Jon Arvid Børretzen


Table of contents

1 Introduction .......... 1
1.1 Motivation .......... 1
1.2 Research Context .......... 2
1.3 Research design .......... 4
1.4 Research questions and contributions .......... 4
1.5 Included research papers .......... 5
1.6 Thesis structure .......... 7
2 State-of-the-art .......... 9
2.1 Introduction .......... 9
2.2 Software engineering .......... 9
2.3 Software Quality .......... 11
2.4 Anomalies: Faults, errors, failures and hazards .......... 12
2.5 Current methods and practices .......... 16
2.6 Business-critical software .......... 20
2.6.1 Criticality definitions .......... 20
2.7 Techniques and methods used to develop safety-critical systems .......... 22
2.8 Empirical Software Engineering .......... 26
2.9 Main challenges in business-critical software engineering .......... 29
3 Research Context and Design .......... 31
3.1 BUCS Context .......... 31
3.2 Research Focus .......... 32
3.3 Research approach and research design .......... 35
3.4 Overview of the studies .......... 41
4 Results .......... 43
4.1 Study 1: Preliminary Interviews with company representatives (used in P1) .......... 43
4.2 Study 2: Combining safety methods in the BUCS project (Paper P1) .......... 44
4.3 Study 3: Fault report analysis (Papers P2, P3, P5) .......... 45
4.4 Study 4: Fault report analysis (Paper P4) .......... 48
4.5 Study 5: Interviewing practitioners about fault management (Paper P6) .......... 50
4.6 Study 6: Using hazard identification to identify faults (Paper P7) .......... 51
4.7 Study 7: Experiences from fault report studies (Technical Report P8) .......... 52
5 Evaluation and Discussion .......... 55
5.1 Contributions .......... 55
5.2 Contribution of this thesis vs. literature .......... 57
5.3 Revisiting the Thesis Research Questions, RQ1-RQ4 .......... 58
5.4 Evaluation of validity .......... 59
5.5 Industrial relevance of results .......... 60
5.6 Reflection: Research cooperation with industry .......... 61
6 Conclusions and future work .......... 63
6.1 Conclusions .......... 63
6.2 Future Work .......... 64
Glossary .......... 67
Term definitions .......... 67
References .......... 73
Appendix A: Papers .......... 81
Appendix B: Interview guide .......... 175


List of Figures

Figure 1-1 The studies with their related papers and contributions .......... 5
Figure 1-2 The structure of this thesis .......... 8
Figure 2-1 Relationship between faults, errors, failures and reliability .......... 13
Figure 2-2 Relationship between hazards, accidents and safety .......... 14
Figure 2-3 Faults, Hazards, Reliability and Safety .......... 15
Figure 2-4 The Rational Unified Process .......... 18
Figure 2-5 Relationship of business-critical and other types of criticality .......... 21
Figure 2-3 Relationship between faults, errors and failures .......... 22
Figure 4-1 Combining PHA/HazOp and Safety Case .......... 45
Figure 4-2 Percentage of high severity faults in some fault categories .......... 47
Figure 4-3 Quality views associated to defect data, and their relations .......... 48
Figure 4-4 Distribution of severity with respect to fault types for all projects .......... 50
Figure 4-5 Distribution of hazards represented as fault types (%) .......... 51

List of Tables

Table 2-1 Examples of different systems’ criticality .......... 22
Table 2-2 Properties of some safety criticality analysis techniques .......... 25
Table 2-3 12 ways of studying technology, from [Zelkowitz98] .......... 26
Table 2-4 Empirical research approaches .......... 28
Table 3-1 Description of our studies .......... 33
Table 3-2 Type of studies in this thesis .......... 41
Table 3-3 Relation between main and local research questions .......... 41
Table 4-1 Distribution of all faults in fault type categories .......... 47
Table 4-2 Distribution of all faults in fault type categories .......... 47
Table 4-3 Fault type distribution across all projects .......... 49
Table 5-1 Relationship of contributions and research questions .......... 56


Abbreviations

BUCS Business-Critical Software (project)
CBD Component-Based Development
CBSE Component-Based Software Engineering
CCA Cause-Consequence Analysis
COTS Commercial Off The Shelf
DBMS Data Base Management System
GQM Goal Question Metric
GUI Graphical User Interface
ETA Event Tree Analysis
EVISOFT EVidence based Improvement of SOFTware engineering (project)
FMEA Failure Mode and Effects Analysis
FMECA Failure Mode Effects and Criticality Analysis
FTA Fault Tree Analysis
HAZOP Hazard and Operability Analysis
IEEE Institute of Electrical and Electronics Engineers
INCO Incremental and component-based software development (project)
ISO International Organization for Standardization
NFR Norwegian Research Council
NS-ISO Norwegian Standard
NTNU Norwegian University of Science and Technology
OMG Object Management Group
OS Operating System
OSS Open Source Software
PHA Preliminary Hazard Analysis
QA Quality Assurance
RUP Rational Unified Process (by Rational)
SPI Software Process Improvement
UML Unified Modelling Language (by Rational, later OMG)
XP Extreme Programming


1 Introduction

In this chapter, the background and research context for this thesis are presented. The chapter also introduces the research design, the research questions and the contributions. Finally, the list of papers and the outline of the thesis are presented.

1.1 Motivation

The technological development in our society has led to software systems being introduced into an increasing number of different business domains. In many of these areas we have become more or less dependent on these systems, and their potential weaknesses could have grave consequences. In this respect, we can coarsely divide software products into three categories: safety-critical software (e.g. controlling traffic signals), business-critical software (e.g. for banking) and non-critical software (e.g. for word processing). Evidently, the definition of business-critical versus the other two categories may be difficult to state precisely, and would in many cases depend on the particular viewpoint of the business and users.

To clarify the distinction between business-critical and safety-critical, we can consider what consequences an operational failure (observable and erroneous behaviour of the system compared to the requirements) will have in the two cases. For safety-critical applications, the result of a failure could easily be a physical accident or an action leading to physical harm for one or more human beings. In the case of business-critical systems, the consequences of failures are not that grave, in the sense that accidents do not mean real physical damage, but the negative implications may be of a more financial or trust-threatening nature. Ian Sommerville states that business-criticality signifies the ability of the core computer and other support systems of a business to have sufficient QoS to preserve the stability of the business [Sommerville04]. Thus, business-critical systems are those whose failure could threaten the stability of a business.

The overall goal of the BUCS project is to better understand, and thus sensibly improve, software technologies, including processes used for developing business-critical software. In order to do this, empirical studies of projects have been performed in cooperation with the Norwegian ICT industry. Specific BUCS goals, as presented in the BUCS project proposal [BUCS02], are the following:


BG1 To obtain a better understanding of the problems encountered by Norwegian industry during development, operation and maintenance of business-critical software.

BG2 Study the effects of introducing safety-critical methods and techniques into the development of business-critical software, to reduce the number of system failures (increased reliability).

BG3 Provide adapted and annotated methods and processes for development of business-critical software.

BG4 Package and disseminate the effective methods into Norwegian software industry.

In this thesis, we aim to study how software faults and software fault reporting practices affect business-critical software, and also whether techniques from the area of safety-critical systems development (e.g. PHA, HazOp) can have a positive effect on quality attributes other than safety (e.g. reliability). The relation between faults and failures is explained in Section 2.4.

1.2 Research Context

This thesis is part of the work done in the BUCS basic research and development project (BUsiness-Critical Software). The BUCS project was funded by the Norwegian Research Council as a basic R&D project in IT, and ran from 2003 to 2007. Some parts of the work in this thesis were also financed by the EVISOFT project, a national, user-driven R&D project on software process improvement funded by the Norwegian Research Council [EVISOFT06]. Within the BUCS project, this thesis focuses on fault reporting processes in business-critical systems. Some important research issues we want to study are the following:

• How do software faults affect the reliability and safety of business-critical systems?
• What are the common fault types in business-critical systems?
• How can we use system safety methods in business-critical application development?

1.2.2 The BUCS project

The goal of the BUCS project is not to help developers finish their development on schedule and budget. We are not particularly interested in the delivered functionality or in how to identify or avoid process and project risk. This is not because we think that these properties are unimportant – it is just that we have defined them to be outside the scope of the BUCS project. The goal of the BUCS project is to help developers, users and other stakeholders to develop software whose later use is less prone to critical problems, i.e. has sufficient reliability and safety. In a business environment this means that the system seldom behaves in such a way that it causes the customer or his users to lose money, important information, or both. We will use the term business-critical for this characteristic.


Another term is business-safe, which means that a system fulfils the criteria for business-safety in a business-critical system. That a system is business-safe does not mean that the system is fault-free, i.e. that it cannot possibly fail. What it means is that the system has a low probability of entering a state where it will cause serious losses. In this respect, the characteristic is close to the term “safe”. The latter term is, however, wider, since it is concerned with all activities that can cause damage to people, equipment or the environment, or severe economic losses. Just as with general safety, business-safety is not a characteristic of the system alone – it is a characteristic of the system’s interactions with its usage environment. BUCS considers two groups of stakeholders and wants to help them both:

• The customers and their users. They need methods that enable them to:
  o Understand the dangers that can occur when they start to use the system as part of their business.
  o Write or state requirements to the developers so that they can take care of the risks incurred when operating the system.
• The developers. They need help to implement the system so that:
  o It can be made business-safe.
  o They can support their claims with analysis and documentation.
  o It is possible to change the systems in such a way that when the operating environment or profile changes, the systems are still business-safe.

BUCS aims to help the developers build a business-safe system without large increases in development costs or schedule. This is achieved by the following contributions from BUCS:

BC1 A set of methods for analysing business-safety concerns. These methods are adapted to the software development process in general and – for the first version – especially to the Rational Unified Process (RUP).

BC2 A systematic approach for analysing, understanding, and protecting against business-safety related events.

BC3 A method for testing that the customers’ business-safety concerns are adequately taken care of in the implementation.

Why should development organizations do something that costs extra – is this a smart business proposition? We definitely believe that the answer is “Yes”, for the following reasons:

• The only solution most companies can offer customers with business-safety concerns today is that the developers will be more careful and test more – this is not a good enough solution.

• By building a business-safe system, the developers will help the customer achieve efficient operation of their business and thus build the image of a company that has its customers’ interests in focus. Applying new methods to increase the products’ business-safety must thus be viewed as an investment. The return on the investment will come as more business from large, important customers.


BUCS will not invent entirely new methods. What we will do is take commonly used methods, especially from the area of system safety, such as Hazard Analysis and FMEA, and adapt them to more mainstream software development. This is done by extending the methods, making them:
• More practical to use in a software development environment.
• Suitable to fit into the ways developers work in a software project environment – concerning both process and related software tools and methods.

1.3 Research design

As stated in the BUCS project proposal, “The principal goal is through empirical studies to understand and improve the software technologies and processes used for developing business-critical software” [BUCS02]. This entails both quantitative and qualitative studies, and in some cases a combination. Several aspects have to be considered when performing such studies, in particular:
• Deciding on the metrics used in the investigations.
• Deciding on the process of retrieving information (data mining, observation, surveys).

Members of the BUCS project have conducted interviews, experiments, data analysis, surveys, and case studies. The methods employed in this part of the BUCS project are structured interviews, historical data mining and analysis, and case studies.

1.4 Research questions and contributions

The goal of this research is to explore quality issues of business-critical software, with a focus on fault reporting and management, as well as the use of safety analysis techniques for this type of software development. In this thesis, four overall research questions have been defined:

RQ1. What is the role of fault reporting in existing industrial software development?
RQ2. How can we improve on existing fault reporting processes?
RQ3. What are the most common and severe fault types, and how can we reduce them in number and severity?
RQ4. How can we use safety analysis techniques together with fault report analysis to improve the development process?


[Figure 1-1: timeline (June 2003 – June 2007) of the seven studies with their related papers (P1–P7) and contributions (C1–C3). Phase 1: Study 1, Preliminary Interview Study (2003); Study 2, Literature study of safety methods (2004). Phase 2: Study 3, First fault report analysis study (2005); Study 4, Second fault report analysis study (2006). Phase 3: Study 5, Interviews on fault reports (2007); Study 6, Assessing hazard analysis vs. fault report analysis (2007); Study 7, Experiences on fault reporting (2007). The figure distinguishes quantitative and qualitative studies, shows how the studies provided input to each other, and marks which studies involved industrial cooperation.]

Figure 1-1 The studies with their related papers and contributions

The research questions together with the studies performed have resulted in the following contributions:

C1. Describing how to utilize safety criticality techniques to improve the development process for business-critical software.

C2. Identification of typical shortcomings in fault reporting.
C3. Improved model of fault origins and types for business-critical software.

Figure 1-1 illustrates how the studies, contributions and research papers are connected. It also shows the time and sequence of the studies and how the different studies have influenced each other with input and experience. The background cloud shows which studies were performed with industrial cooperation.

1.5 Included research papers

This thesis includes seven papers, numbered P1 to P7, whose full text is included verbatim in Appendix A. The papers are briefly described in the following:


P1. Jon Arvid Børretzen, Tor Stålhane, Torgrim Lauritsen, and Per Trygve Myhrer: "Safety activities during early software project phases", In Proc. Norwegian Informatics Conference (NIK'04), pp. 180-191, Stavanger, 29 Nov. - 1 Dec. 2004.
Relevance to the thesis: This paper describes the introduction and use of safety criticality analysis techniques in early project phases. It presents several relevant techniques and how they can be combined with a common development methodology like RUP.
My contribution: I was the leading author and contributed 80% of the work, including literature review and paper writing.

P2. Jon Arvid Børretzen and Reidar Conradi: "A study of Fault Reports in Commercial Projects", In Jürgen Münch and Matias Vierimaa (Eds.): Proc. 7th International Conference on Product Focused Software Process Improvement (PROFES'2006), pp. 389-394, Amsterdam, the Netherlands, 12-14 June 2006.
Relevance to the thesis: This paper presents work done in the area of fault report analysis, and describes how using a fault categorization scheme can help identify problem areas in the development process.
My contribution: I was the leading author and contributed 80% of the work, including research design, data collection, data analysis and paper writing.

P3. Parastoo Mohagheghi, Reidar Conradi, and Jon A. Børretzen: "Revisiting the Problem of Using Problem Reports for Quality Assessment", In Kenneth Anderson (Ed.): Proc. 4th Workshop on Software Quality, held at ICSE'06, 21 May 2006 - as part of Proc. 28th International Conference on Software Engineering & Co-Located Workshops, 21-26 May 2006, Shanghai, P. R. China, ACM Press 2006, ISBN 1-59593-085-X, ISSN 0270-5257.
Relevance to the thesis: This paper describes experience with working with problem reports from industry. It discusses several problems with using this type of data and how such reports can be used for assessing software quality.
My contribution: I contributed 30% of the work, including data collection and analysis, and commenting on the data material and draft paper.

P4. Jon Arvid Børretzen and Jostein Dyre-Hansen: "Investigating the Software Fault Profile of Industrial Projects to Determine Process Improvement Areas: An Empirical Study", Proc. European Systems & Software Process Improvement and Innovation Conference 2007 (EuroSPI'07), pp. 212-223, Potsdam, Germany, 26-28 Sept. 2007.
Relevance to the thesis: This paper continues the fault report study focus, refining the design and execution of the previous study and confirming several of our findings.
My contribution: I was the leading author and contributed 80% of the work, including research design, data collection, data analysis and paper writing.


P5. Jingyue Li, Anita Gupta, Jon Arvid Børretzen, and Reidar Conradi: "The Empirical Studies on Quality Benefits of Reusing Software Components", Proc. First IEEE International Workshop on Quality Oriented Reuse of Software (QUORS'2007), held in conjunction with IEEE COMPSAC 2007, 5 p., Beijing, July 23-27, 2007.
Relevance to the thesis: This paper uses data from our first fault report study, and presents a study where defect types in reusable components are compared with those in non-reusable components.
My contribution: I contributed 20% of the work, including theory definition, data collection, data analysis and commenting on the data material, results and draft paper.

P6. Jon Arvid Børretzen: "Fault classification and fault management: Experiences from a software developer perspective". 14 pages, submitted to the Journal of Systems and Software.
Relevance to the thesis: This paper presents findings from a series of interviews performed with developers involved in fault reporting, and seeks to describe problems and issues in fault management and reporting as seen from the practitioners' viewpoint.
My contribution: I contributed 95% of the work, including interviews, transcription, coding, analysis and paper writing.

P7. Jon Arvid Børretzen: "Using Hazard Identification to Identify Potential Software Faults: A Proposed Method and Case Study". 10 pages, submitted to the First International Conference on Software Testing, Verification and Validation, Lillehammer, Norway, April 9-11, 2008.
Relevance to the thesis: This paper seeks to combine the knowledge gained from fault report analysis with the potential of hazard analysis techniques, and proposes a novel method for doing this.
My contribution: I contributed 85% of the work, including hazard analysis, fault report analysis, data analysis and paper writing.

1.6 Thesis structure

Chapter 2 deals with software engineering in general and the state of the art, including software criticality and especially business-critical software, and gives an overview of the most important challenges in these areas. Chapter 3 presents the context for the BUCS project and research, the methods used and the research questions for this thesis. Chapter 4 presents the results of the studies performed. An evaluation of the contributions and results is made in Chapter 5. Chapter 6 sums up the thesis work and presents relevant issues for further work. Figure 1-2 illustrates how the thesis is composed.


[Figure 1-2: the structure of the thesis – Theory and state-of-the-art (Chapter 2), Research context and design (Chapter 3), Results (Chapter 4), Evaluation (Chapter 5), and Conclusions and future work (Chapter 6).]

Figure 1-2 The structure of this thesis

In this thesis I have used the term “we” when presenting the work, both in my description of the work in Chapters 1-6 and in the collaborative work from the papers P1 to P7 in Appendix A.


2 State-of-the-art

This chapter describes the challenges in software engineering that motivate improving approaches for business-critical software development. Then, literature related to business-critical software development is presented. The definitions of these subjects are discussed and research challenges are described for each of them. Finally, the chapter is summarized and the research challenges are described in relation to the studies in this thesis.

2.1 Introduction

In the engineering of business-critical software systems, as in the engineering of other software systems, a multitude of different methods, techniques and processes are employed by industry. Since business-critical applications are not really an established phrase or topic within the software engineering community, it is difficult to point out specific methods and techniques that are used when business-critical systems are developed. Instead, the most common methods of software engineering are presented in the following, with additional comments on how they may best be utilized to aid the development of business-critical systems. In addition, methods from the development of safety-critical applications are presented, as these are relevant for the BUCS project in general.

2.2 Software engineering

Software engineering is an engineering discipline dealing with all aspects of software development, from the early stages of system specification to maintaining the system after it has gone into use. Software engineering is the profession concerned with the creation and maintenance of software by applying computer technology, project management, domain knowledge, and other skills and technologies. Fairley says that "software engineering is the technological and managerial discipline concerned with systematic production and maintenance of software products" of required functionality and quality "on time and within cost estimates" [Fairley85]. Software systems also have social and economic value, by making people more productive, improving their quality of life, and making them able to perform work and actions that would otherwise be impossible, like controlling a modern aeroplane.

Software engineering technologies and practices help developers by improving productivity and quality. The field has been evolving continuously from its early days


in the 1940s until today. The ongoing goal is to improve technologies and practices, seeking to improve the productivity of practitioners and the quality of applications for the users. The effort in software engineering technology was stepped up due to the “software crisis” (a term coined in 1968), which identified many problems of software development [Glass94]. Many software projects ran over budget and schedule. Some projects caused property damage, and a few projects actually caused loss of life. The software crisis was originally defined in terms of productivity, but evolved to emphasize quality. The most common result of failed software development projects is projects that overrun their schedule and budget, but more serious consequences may also result from poorly executed software projects.1

Cost and Budget Overruns: A survey conducted at the Simula Research Laboratory in 2003 showed that 37% of the investigated projects used more effort than estimated. The average effort overrun was 41%, with 67% in projects with a public client and 21% in projects with a private client [Moløkken04].

Property Damage: Software defects can cause property damage. Poor software security allows hackers to steal identities, and defective control systems can damage the physical systems the software is controlling. The result is lost time, money, and a damaged reputation. The expensive European Ariane 5 rocket exploded on its maiden flight in 1996, because its software operated under different flight conditions than those it was designed for [Kropp98].

Life and Death: Defects in software can be lethal. Some software systems used in radiotherapy machines failed so gravely that they administered lethal doses of radiation to patients [Leveson95].

The use of the term “software crisis” has been slowly fading out, perhaps because the software engineering community has come to the understanding that it is unrealistic and unproductive to remain in crisis mode for this many years. Software engineers are accepting that the problems of software engineering are truly difficult and that only hard work over a long period of time can solve them. Processes and methods have become major parts of software engineering, e.g. object-orientation (OO) and the Rational Unified Process (RUP). Studies have, however, shown that many practitioners resist formalized processes, which often treat them impersonally, like machines, rather than as creative people [Thomas96]. The profession of software engineering is important and has made big improvements since 1968, even though it is not perfect. Software engineering is a relatively young field, and practitioners and researchers continually work to improve the technologies and practices, in order to improve the final products and to better comply with the needs of the users and customers.

1 Peter G. Neumann has done much work on this subject, and edits a contemporary list of software problems and disasters on his website http://catless.ncl.ac.uk/Risks/ [Neumann07].


2.3 Software Quality

The word quality can have several meanings and definitions, even though most of these definitions try to communicate practically the same idea. Often, the context in which quality is to be judged decides which definition will be used. The context could be user-oriented, product-oriented, production-oriented or even emotionally oriented. ISO defines quality as “the totality of features and characteristics of a product/service that bears upon its ability to satisfy stated or implied needs” [ISO 8402]. Another ISO definition is “Quality: ability of a set of inherent characteristics of a product, system or process to fulfil requirements of customers and other interested parties” [ISO 9001]. Aune presents the following simplified definitions from the ISO 8402 standard [Aune00]:

1. Quality: Conformity with requirements (or needs, expectations, specifications)
2. Quality: The satisfaction of the customer
3. Quality: Suitability for use (at a given time)

Software quality in terms of reliability is often related to faults and failures, e.g. the number of faults found, or the failure rate over a period of time during use. In addition, as the aforementioned definitions imply, there are other quality factors that are important, e.g. the software’s ability to be used in an effective way (i.e. its usability). There is a multitude of concepts that together can be used to define quality, where the importance of a given factor or characteristic depends on the software context. Reliability, Usability, Safety, Security, Availability and Performance are common examples. The glossary in Appendix A describes some of the relevant quality attributes.
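As a simple illustration of such fault- and failure-based measures, the sketch below computes two commonly used indicators, fault density (faults per thousand lines of code) and mean time between failures. The numbers are purely hypothetical and the sketch is an illustration only, not a measurement procedure taken from the thesis or its studies.

```python
# Illustrative sketch (not from the thesis): two simple reliability-oriented
# quality indicators often derived from fault and failure data.

def fault_density(num_faults: int, lines_of_code: int) -> float:
    """Faults per thousand lines of code (KLOC)."""
    return num_faults / (lines_of_code / 1000.0)

def mean_time_between_failures(operating_hours: float, num_failures: int) -> float:
    """Average operating time between observed failures."""
    return operating_hours / num_failures

# Hypothetical example values:
print(fault_density(num_faults=42, lines_of_code=15_000))               # 2.8 faults/KLOC
print(mean_time_between_failures(operating_hours=720, num_failures=4))  # 180.0 hours
```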

2.3.1 Software Quality practices

Quality Assurance (QA)
QA comprises the planned and systematic efforts needed to gain sufficient confidence that a product or a service will satisfy the stated quality requirements (e.g. a required degree of safety or reliability). Alternatively, QA is control of product and process throughout software development, so that we increase the probability of fulfilling the requirements specification. Software QA involves the entire software development process: monitoring and improving the process, making sure that any agreed-upon standards and procedures are followed, and ensuring that problems are found and dealt with. QA work is oriented towards problem prevention. Solving problems is a high-visibility process; preventing problems is low-visibility. Among the duties of a QA team are certification and standardization work, as well as internal inspections and reviews. Other relevant QA tasks are inspections, testing, verification and validation, some of which are presented further in Section 2.5.4.

Software Process Improvement (SPI)
Software Process Improvement is basically systematic improvement of the work processes used in a software-producing organization, based on organizational goals and


backed by empirical studies and results. Capability Maturity Model Integration (CMMI) and ISO 9000 are examples of ways to assess and certify software processes. Statistical Process Control (SPC) and the Goal/Question/Metric (GQM) paradigm are examples of methods used to implement Software Process Improvement [Dybå00], but these require a certain level of stability in an organization to be applicable. To be able to measure improvement, we have to introduce measurement into the software development process. SPI initiatives are generally based on measurement of processes, followed by feedback of results and information into the process under study. The work in this thesis is directed towards measurement of faults in software, and how this information may be used to improve the software process and product.
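To illustrate how a GQM structure could frame such fault-oriented measurement, the following minimal sketch lays out one goal with associated questions and metrics. The goal, questions and metrics are hypothetical examples for illustration only, not the actual GQM plans used in the BUCS or EVISOFT studies.

```python
# Minimal, hypothetical GQM (Goal/Question/Metric) structure for
# fault-report-driven process improvement; an illustration only.

gqm = {
    "goal": "Reduce the number and severity of faults in business-critical software",
    "questions": [
        {
            "question": "Which fault types occur most often?",
            "metrics": ["faults per fault-type category", "fault density per module"],
        },
        {
            "question": "Which fault types are most often severe?",
            "metrics": ["share of high-severity reports per fault-type category"],
        },
        {
            "question": "How long do faults take to fix?",
            "metrics": ["median time from report to closure"],
        },
    ],
}

for q in gqm["questions"]:
    print(q["question"], "->", ", ".join(q["metrics"]))
```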

2.4 Anomalies: Faults, errors, failures and hazards

Improving software quality is a goal of most software development organizations. This is not a trivial task, and different stakeholders will have different views on what software quality is. In addition, the character of the actual software will influence what are considered the most important quality attributes of that software. For many organizations, analyzing routinely collected data could be used to improve their process and product quality. Fault reports are one possible source of such data, and research shows that fault analysis can be a viable approach to certain parts of software process improvement [Grady92]. One important issue in developing business-critical software is to remove possible causes of failure, which may lead to wrong operation of the system. In our studies we investigate fault reports from business-critical industrial software projects.

Software quality encompasses a great number of properties or attributes. The ISO 9126 standard defines many of these attributes as sub-attributes of the term “quality of use” [ISO91]. When speaking about business-critical systems, the critical quality attribute is often experienced as the dependability of the system. In [Laprie95], Laprie states that “a computer system’s dependability is the quality of the delivered service such that reliance can justifiably be placed on this service.” According to [Avizienis04] and [Littlewood00], dependability is a software quality attribute that encompasses several other attributes, especially reliability, availability, safety, integrity and maintainability2. The term dependability can also be regarded subjectively as the “amount of trust one has in the system”. Quality-of-Service (QoS) is dependability plus performance, usability and certain provision aspects [Emstad03].

Much effort has been put into reducing the probability of software failures, but this has not removed the need for post-release fault-fixing. Faults in the software are detrimental to the software’s quality, to a greater or lesser extent depending on the nature and severity of the fault. Therefore, one way to improve the quality of developed software is to reduce the number of faults introduced into the system during initial development.

2 In Laprie’s initial dependability definition, the attribute security was present, while the attributes integrity and maintainability were not [Laprie95].
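As a minimal sketch of the kind of fault report analysis referred to above, the following groups fault reports by fault type category and computes the share of high-severity faults in each category. The category names, severity labels and data are invented for illustration; they are not the classification scheme or data from the studied projects.

```python
# Hypothetical sketch of fault-report profiling: group reports by fault type
# and compute the share of high-severity faults per category.
from collections import defaultdict

# Each tuple: (fault_type, severity) -- invented example data.
reports = [
    ("logic", "high"), ("logic", "medium"), ("gui", "low"),
    ("data handling", "high"), ("logic", "high"), ("gui", "medium"),
]

totals = defaultdict(int)
high = defaultdict(int)
for fault_type, severity in reports:
    totals[fault_type] += 1
    if severity == "high":
        high[fault_type] += 1

for fault_type in sorted(totals):
    share = 100.0 * high[fault_type] / totals[fault_type]
    print(f"{fault_type:14s} {totals[fault_type]:3d} reports, {share:5.1f}% high severity")
```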


Faults are potential flaws (i.e. incorrect versus explicitly stated requirements) in a software system that later may be activated to produce an error (an incorrect internal dynamic state). An error is the execution of a “passive fault”, and may lead to a failure (an incorrect external dynamic state). This relationship is illustrated in Figure 2-1. A failure results in observable and incorrect external behaviour and system state. The remedies for errors and failures are to limit the consequences of an active error or failure, in order to resume service. This may take the form of duplication, repair, containment, etc. These kinds of remedies do work, but studies have shown that this kind of downstream (late) protection is more expensive than preventing the faults from being introduced into the code [Leveson95].

Figure 2-1 Relationship between faults, errors, failures and reliability

Faults that have unintentionally been introduced into the system during some lifecycle phase can be discovered either by formal proof or manual inspections before the system is run, by testing during development, or when the application is run on site. The discovered faults are then reported in some fault reporting system, to become candidates for later correction. Software may very well have faults that do not lead to failure, since they may never be executed, given the actual context and usage profile. Many such faults will remain in the system, unknown to the developers and users. That is, a system with few discovered faults is not necessarily the same as a system with few faults. Indeed, many reported faults may be deemed too “exotic” or irrelevant to correct. Conversely, a system with many reported faults may be a very reliable system, since most relevant faults may have been eliminated. Faults are also commonly known as defects or bugs, while a more extensive concept is anomaly, used in the IEEE 1044 standard [IEEE 1044].

The relationship between faults, errors and failures concerns the reliability dimension. If we look at the safety dimension, we have a relationship between hazards and accidents. A hazard is a state or set of conditions of a system or an object that, together with other conditions in the environment of the system or object, may lead to an accident (safety dimension) [Leveson95]. Leveson defines an accident as “an undesired and unplanned (but not necessarily unexpected) event that results in at least a specified level of loss.”

[Figure 2-1 content: Fault (static) – potential flaw, erroneous program → Error (dynamic) – erroneous internal system state → Failure (dynamic) – erroneous external behaviour; together these determine Reliability.]


The connection between hazards and safety is defined through Leveson’s definition of safety: “Safety is freedom from accidents or losses”. Figure 2-2 illustrates this relationship.

Figure 2-2 Relationship between hazards, accidents and safety

To reduce the chances of critical faults existing in a software system, the system should be analyzed in the context of its environment and operation to identify possible hazardous events [Leveson95]. Hazard analysis techniques like Failure Mode and Effects Analysis (FMEA) and Hazard and Operability Study (HazOp) can help us reduce the product risk stemming from such accidents. Hazards encompass a greater scope than faults, because a system can be prone to many hazards even if it has no faults. Hazards are related to the system’s environment, not just to the software itself. Therefore they may be present even though the system fulfils the requirements specification completely, i.e. has no faults.

The full lines in Figure 2-3 show the common view of how faults are related to reliability and hazards are related to safety. In parts of the thesis we also suggest that faults may influence safety and hazards may influence reliability, as shown by the dotted lines. Literature searches show that little work has been done in this specific area, but the fact that faults and hazards share some characteristics makes connections between faults and safety, and between hazards and reliability, plausible as well, at least from a pragmatic viewpoint.

Avizienis et al. emphasize that fault prevention and fault tolerance aim to provide the ability to deliver a service that can be trusted, while fault removal and fault forecasting aim to reach confidence in that ability, by justifying that the functional, dependability and security specifications are adequate, and that the system is likely to meet them [Avizienis04]. Hence, by working towards techniques that can prevent faults and reduce the number and severity of faults in a system, the quality of the system can be improved in the area of reliability (and thus dependability).

[Figure 2-2 content: Hazard (static) – potential negative event → Accident/loss (dynamic) – negative effect of the event occurring; together these determine Safety.]
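To illustrate the kind of ranking such hazard analysis techniques support, the sketch below computes the risk priority number (RPN = severity × occurrence × detection) used in many FMEA variants to prioritize failure modes. The failure modes and ratings are hypothetical, and the 1-10 scales are only one common convention, not the procedure applied in the thesis.

```python
# Hypothetical FMEA-style ranking: risk priority number (RPN) per failure mode.
# Ratings use a common 1-10 convention; all values here are invented.

failure_modes = [
    # (description, severity, occurrence, detection)
    ("Transaction posted twice",        9, 3, 4),
    ("Report shows stale account data", 5, 6, 3),
    ("GUI label truncated",             2, 7, 2),
]

ranked = sorted(
    ((sev * occ * det, desc) for desc, sev, occ, det in failure_modes),
    reverse=True,
)
for rpn, desc in ranked:
    print(f"RPN {rpn:4d}  {desc}")
```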


Figure 2-3 Faults, Hazards, Reliability and Safety

A usual course of events leading to a fault report is that someone reports a failure through testing or operation, whereupon a report is logged. This report could initially be classed as a failure report, as it describes what happened when the system failed. As developers examine the report, they will eliminate reported “problems” that were not real failures (often caused by wrong user commands) or that are duplicates of previously reported ones. Primarily, they work to establish what caused the failure, i.e. the original fault. When they identify the fault, they can choose to repair it and report what the fault was and how it was repaired. The failure report has thus become a fault report. When looking at a large collection of fault/failure reports for a system in testing or operation, some faults have been repaired, while others have not (and may never be). Still, we choose to refer to a report of a software failure as a fault report, even if the fault has not yet been identified, since it is stored with the other fault reports, and work is usually being done to identify the fault that caused the failure.

2.4.1 Reflection and challenges

As stated in Section 1.1, the terminology from the literature, although clear and concise in each individual field and source, becomes confusing and conflicting when definitions are compared. In our work, we have not tried to redefine the terms and definitions to make them fit smoothly together; we merely want to explain some of our understanding of faults and fault reporting, to the degree it is relevant for the thesis. We still see a need for work unifying concepts, especially in the reliability area. There is great diversity in the literature on the terminology used to report software or system related problems. The possible differences between problems, troubles, bugs, anomalies, defects, errors, faults or failures are discussed in books (e.g., [Fenton97]), in standards and classification schemes such as IEEE Std. 1044-1993 [IEEE 1044] and the United Kingdom Software Metrics Association (UKSMA) scheme [UKSMA], and in papers, e.g., [Freimut01]. Until there is agreement on the terminology used in reporting problems, we must be aware of these differences and be explicit about which definition is meant when using a term.

[Figure 2-3 content: boxes for Faults, Hazards, Reliability and Safety; full lines link faults to reliability and hazards to safety, while dotted lines indicate the suggested influence of faults on safety and of hazards on reliability.]
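The report life cycle described above can also be viewed as a small state machine. The sketch below is purely illustrative; the state names and transitions are assumptions made for the example and do not describe the workflow of any particular fault reporting system studied in this work.

```python
# Hypothetical state machine for a failure/fault report's life cycle,
# illustrating the progression described in Section 2.4.
ALLOWED_TRANSITIONS = {
    "reported":         {"rejected", "duplicate", "fault_identified"},
    "fault_identified": {"fixed", "deferred"},
    "deferred":         {"fault_identified"},
    "rejected":         set(),
    "duplicate":        set(),
    "fixed":            set(),
}

class FaultReport:
    def __init__(self, summary: str):
        self.summary = summary
        self.state = "reported"   # initially a failure report

    def move_to(self, new_state: str) -> None:
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"Cannot go from {self.state} to {new_state}")
        self.state = new_state

# Example: a failure report that is analyzed and eventually fixed.
report = FaultReport("Nightly batch job stops with error code 17")
report.move_to("fault_identified")
report.move_to("fixed")
print(report.state)  # fixed
```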


2.5 Current methods and practices

2.5.1 General software engineering paradigms

In software engineering there have been many different paradigms or life-cycle models. The most common and well-known paradigms are presented in the following.

The traditional software process (waterfall): The waterfall model was the first widely used software development model. It was first proposed in 1970 by W. W. Royce [Royce70]; in it, software development is seen as flowing steadily through the phases of requirements analysis, design, implementation, testing (validation), integration and maintenance. In the original article, Royce advocated using the model repeatedly, in an iterative way. However, many people do not know that, and some have unjustly discredited this paradigm for real use. In practice, the process rarely proceeds in a purely linear fashion. Iterations, by going back to or adapting results of previous stages, are common.

The spiral model: The spiral model was defined by Barry Boehm [Boehm88], and combines elements of both design and prototyping in stages, so it is a mix of top-down and bottom-up concepts. This model was not the first model to discuss iteration, but it was the first model to explain why iteration is important. As originally envisioned, the iterations were typically 6 months to 2 years long. This persisted until around 2000. Increasingly, development has turned towards shorter iteration periods, because of higher time-to-market demands. In her doctoral thesis, Parastoo Mohagheghi reports iterations of 2-3 months being common [Mohagheghi04b].

Prototyping, iterative and incremental development: The prototyping model is a software development process that starts with (incomplete) requirements gathering, followed by prototyping and user evaluation. Often the customer/user may not be able to provide a complete set of application objectives, detailed input, processing, or output requirements at the start. After user evaluation, another prototype is built based on feedback from users, and again the cycle returns to customer evaluation.

Agile methods: The benefits of agile methods for small teams working with rapidly changing requirements have been documented [Beck99]. However, the applicability of agile methods to larger projects is hotly debated, both by proponents and critics. Large-scale projects with high QA requirements have traditionally been seen as the home ground of plan-driven software development methods. Deciding when to use agile methods also depends on the values and principles that a developer wishes to be reflected in her/his work. Extreme Programming (XP) [Beck99], one of the more popular agile methods, is explicit in its demand for developers to follow a "code of software conduct" that transmits these values and principles to the project at hand. In keeping with the philosophy of agile methods, there is no rigid structure defining when to use any particular feature of these approaches.


2.5.2 Software Reuse

Reuse in software development is a term describing development that includes systematic activities for the creation and later incorporation ("reuse") of common, domain-specific artifacts. Reuse can run into profound technological, practical, economic, and legal obstacles, but the benefits may be substantial. It mostly concerns program artifacts in the form of components. In the SEI report on technical aspects of CBD [Bachmann00], a component is defined as:

• An opaque implementation of functionality.
• Subject to third-party composition.
• Conformant to a component model.

Software development that systematically develops domain-specific and generalized software artifacts for possible later reuse is called software development for reuse. Software development that systematically makes use of such pre-made, reusable artifacts is called software development with reuse.

Component-based software engineering (CBSE): Component-based software engineering is a field of study within software engineering, building on prior theories of software objects, software architectures, software frameworks and software design patterns, and on the extensive theory of object-oriented (OO) programming and design underlying all of these. It claims that software components, like the hardware components used e.g. in telecommunication systems, can ultimately be made interchangeable and reliable. CBSE is often said to be mostly software development with reuse, with emphasis on reusing components developed outside the actual project.

Commercial Off-The-Shelf (COTS): COTS components are external executable software components that are sold, leased, or licensed to the general public; offered by a vendor trying to profit from them; supported and evolved by the vendor; and used by the customers normally without source code access or modification ("black box"). Different ways of incorporating COTS-based activities are described by Li et al. in [Li06].

Open Source Software (OSS): Open Source Software is software released following the principles of the open source movement. In particular, it must be released under an Open Source license as defined by the Open Source Definition, of which there are over 50 license types. The Open Source movement grew out of the free software movement; it advocates the term "Open Source Software" as an alternative term for free software, and primarily makes its arguments on pragmatic rather than philosophical grounds. Nearly all Open Source Software is also "Free Software". An OSS component is an external component for which the source code is available ("white box"); the source code can be acquired either free of charge or for a nominal fee, with a possible obligation to report back any changes made.


2.5.3 Specific software development methods

The two following methods are well known and commonly used in software development.

Rational Unified Process (RUP): The Rational Unified Process (RUP) is a software process, design and development method created by the Rational Software Corporation [Rational], and is described in [Kruchten00] and [Kroll03]. It describes how to deploy software effectively using commercially proven techniques. It is a heavyweight process, and therefore particularly applicable to larger software development teams working on large projects. It is essentially an incremental development process which centers on the Unified Modelling Language (UML) [Fowler04]. It divides a project into four distinct phases: Inception, Elaboration, Construction and Transition. Figure 2-4 shows the overall architecture of the RUP.

Figure 2-4 The Rational Unified Process

Patterns and Architecture-driven methods: Design patterns are recurring solutions to problems in object-oriented design. The phrase was introduced to computer science in the 1990s by the book "Design Patterns: Elements of Reusable Object-Oriented Software" [Gamma95]. The scope of the term remained a matter of dispute into the next decade. Algorithms are not thought of as design patterns, since they solve implementation problems rather than design problems. Typically, a design pattern is thought to encompass a tight interaction of a few classes and objects. Three major terms have been proposed: pattern languages, pattern catalogs and pattern systems [Riehle96]. The architect Christopher Alexander's work on a pattern language for designing buildings and communities was the inspiration for software design patterns [Price99]. Interest in sharing patterns in the software community has led to a number of books and symposia. The goal of the pattern literature is to make the experience of past designers accessible to beginners and others in the field. Design patterns thus present different solutions in a common format, to provide a language for discussing design issues.

2.5.4 Techniques for increasing trust in software systems

In addition to the general practices of QA and SPI for improving quality in software systems, there are some specific verification techniques that are commonly used in software development to increase trust in the software. Software verification is a discipline whose goal is to assure that software fully satisfies all the expected requirements. The following are some well-known techniques in use:

Testing: Dynamic verification is performed during the execution of software and dynamically checks its behaviour; it is commonly known as testing. Testing is part of more or less all software development processes, and can be performed at many levels, for instance unit level, interface level and system level.

Inspections: An inspection is a very common kind of review used in software development projects. The goal of the inspection is for all of the inspectors to reach consensus on a work product and approve it for use in the project. Commonly inspected work products include software requirements specifications, design documentation and test plans. In an inspection, a work product is selected for review and a team is gathered for an inspection meeting to review it. A defect is any part of the work product that will keep an inspector from approving it. For example, if the team is inspecting a software requirements specification, each defect will be text in the document that an inspector disagrees with. Basili et al. describe an investigation of an inspection technique called perspective-based testing in [Basili00].

Formal methods: Formal methods are mathematically based techniques for the specification, development and verification of software and hardware systems. Their use in software and hardware design is motivated by the expectation that, as in other engineering disciplines, performing appropriate mathematical analyses can contribute to the reliability and robustness of a design. However, the high cost of using formal methods means that they are usually only applied in the development of high-integrity systems, where safety or security is important. Heimdahl and Heitmeyer present some issues concerning formal methods in [Heimdahl98].

2.5.5 Business-critical computing and related terms

At first glance, there is little evidence of work on business-critical computing when searching the literature. The term "mission-critical" is much more commonly used, and can be interpreted to include many of the characteristics of "business-critical". The key similarity is that both terms relate to the core activity of an organization, and that the computer systems supporting this activity should not fail. Another term, used by the Software Engineering Institute (SEI), is "performance-critical" [SEI], which has much the same meaning as "business-critical".


"Safety-critical" systems are closely connected to the terms above, but this term carries a more severe meaning. Nonetheless, most of the main characteristics are the same, i.e. reliability, availability and similar quality attributes are deemed very important. Safety-critical systems have been much more thoroughly researched than the other types of "-critical" systems, simply because of the seriousness and potential effects of failure in such systems.

2.6 Business-critical software

As mentioned, our society's dependency on timely and well-functioning software systems is increasing. Banking systems, train control systems, airport landing systems, automatic teller machines and industrial process control systems are just a few examples of the systems many of us are directly or indirectly critically dependent on. Some of these are highly critical to our safety (e.g. traffic control), while others are critical only in the sense that they let us perform operations that we want or need in order to carry out our work or business (for instance cinema ticket sales). That a software-intensive system is business-critical means that:

If and when a system failure occurs, the consequences are restricted to financial or financially related negative implications, not including physical harm to humans, animals or physical objects. The consequences are severe enough to mean a considerable loss of money if the fault or failure is not corrected or averted swiftly enough.

2.6.1 Criticality definitions

Business-critical software systems have a lot in common with safety-critical systems, but there are also quite telling differences. A simple way to distinguish them is to put them into classes according to the effects that software anomalies (faults or hazards) may have on the environment. The classes are safety-critical, mission-critical, performance-critical, business-critical, and non-critical.

Safety-critical: A safety-critical system could be a computer, electronic or electromechanical system where a hazardous event may cause injury or even death to human beings, or physical harm to other objects that interact with the system. Examples are aircraft control systems and nuclear power-station control systems, where an accident in most cases will lead to economic losses as well as injury and other physical damage. Common approaches in the design of safety-critical systems are redundancy and formal methods, and a spectrum of specialized techniques exists for such systems (HAZOP, fault tree analysis, etc.). The IEC 61508 standard is intended to be a basic functional safety standard applicable to all kinds of industry, and is also used to define the safety standards of some safety-critical systems [IEC 61508].


Mission-critical: The term mission-critical system reflects military usage and is used to describe activities, processing, etc., that are deemed vital to the organization's business success and, possibly, its very existence. A major software system, product or service is described as mission-critical if its failure or unavailability has a significant negative impact on the organization. Such systems typically include support for accounts/billing, customer balances, computer-controlled machinery and production lines, just-in-time ordering, and delivery scheduling. Examples of related technologies are Enterprise Resource Planning tools, such as SAP [SAP].

Performance-critical: The SEI defines performance-criticality as the ability of software-intensive systems to perform successfully under adverse circumstances, e.g., under heavy or unexpected load or in the presence of subsystem failures. One simple example is the performance of SMS telecom services during New Year's Eve. Some services like this can have critical functions, and yet the behaviour of such systems under these circumstances is often less than acceptable [SEI].

Business-critical: The difference between a business-critical and a regular commercial software system is really defined by the business. There is no established general definition telling us which software applications are critical to an operation. In a retail business, a Customer Relationship Management (CRM) system may be the most important; in another, it may be the manufacturing or supplier management software. We need to consider the impact of relevant software services on the business operations, determine how much value each brings to the business, and assess the impact of such software parts being unavailable. The impact can be lost revenue, corrupted data or lost user time, as well as indirect and more elusive losses in customer reputation and goodwill, slipped deadlines, and increased levels of stress among employees and customers.

Non-critical: Although important enough, some types of software will simply not be classified as critical. Word processors, spreadsheets and graphical design software are examples of such software. Of course it is expected that such tools are reasonably fault-free and stable, but should they fail, the damage will usually be limited, typically a person-day of effort in the worst case.

Figure 2-5 shows the relationship between business-criticality and the other types of criticality defined here. As we see, safety-, performance-, and mission-critical systems can also be business-critical, but a business-critical system need not be one of the others. Table 2-1 illustrates the overlap between the different categories.


Figure 2-5 Relationship of business-critical and other types of criticality

Table 2-1 Examples of different systems' criticality

Safety-critical: Nuclear reactor control system.
Performance-critical: Electronic toll collection in traffic; must process and transfer information quickly enough to keep up with traffic.
Mission-critical: Software handling financial transactions between banks. Functional and non-functional aspects of such applications are considered.
Business-critical: Software handling financial transactions between banks. As mission-critical, but wider consequences are also considered.
Non-critical: Computer games, word processor application.

2.7 Techniques and methods used to develop safety-critical systems

There are a number of methods and techniques that are commonly employed when developing safety-critical systems. Some of them are presented here and related to business-critical computing. According to [Leveson95] and [Rausand91], the most common ones are the following:

o PHA (Preliminary Hazard Analysis): Preliminary Hazard Analysis (PHA) is used in the early project life-cycle stages to identify critical system functions and broad system hazards, so as to enable hazard elimination, reduction or control later in the project. The identified hazards are assessed and prioritized, and safety design criteria and requirements are identified. A PHA is started early in the concept exploration phase so that safety considerations are included in trade-off studies and design alternatives. The process is iterative, with the PHA being updated as more information about the design is obtained and as changes are made. The results serve as a baseline for later analysis and are used in developing system safety requirements and in the preparation of performance and design specifications. Since PHA starts at the concept formation stage of a project, little detail is available, and the assessments of hazard and risk levels are therefore qualitative. A PHA should be performed by a small group with good knowledge of the system specifications.

o HAZOP (Hazard and Operability Analysis): This is a method to identify possible safety-related or operational problems that can occur during the use and maintenance of a system. Both Preliminary Hazard Analysis and Hazard and Operability Analysis (HAZOP) are performed to identify hazards and potential problems that the stakeholders see at the conceptual stage, and that could be created by system usage. A HAZOP study is a systematic analysis of how deviations from the intended design specifications in a system can arise, and whether these deviations can result in hazards. Both analysis methods build on information that is available at an early stage of the project. This information can be used to reduce the severity of the identified hazards or to build safeguards against their effects. HAZOP is a creative team method, using a set of guidewords to trigger creative thinking among the stakeholders and the cross-functional team in RUP. The guidewords are applied to all parts and aspects of the system concept plan and early design documents, to find and eliminate possible deviations from design intentions. An example of a guideword is MORE, meaning an increase of some quantity in the system. For example, applying the MORE guideword to "a customer client application" gives "MORE customer client applications", which could spark ideas like "How will the system react if the servers get swamped with customer client requests?" and "How will we deal with many different client application versions making requests to the servers?" A HAZOP study is conducted by a team of four to eight persons with detailed knowledge of the system to be analysed. The main difference between HAZOP and PHA is that PHA is a lighter method that needs less effort and less available information than HAZOP. Since HAZOP is a more thorough and systematic analysis method, its results will be more specific. If there is enough information available for a HAZOP study, and the development team can spare the effort, a HAZOP study will most likely produce more precise and suitable input for a safety requirement specification.

o FMEA (Failure Modes and Effects Analysis): The method of Failure Modes and Effects Analysis, or the variant Failure Modes, Effects and Criticality Analysis (FMECA), is used to study the potential effects of fault occurrences in a system. FMEA is a method for analyzing potential reliability problems early in the development cycle, where it is easier to overcome such issues, thereby enhancing reliability through design. FMEA is used to identify potential failure modes, determine their effect on the operation of the system, and identify actions to mitigate such failures. A crucial step is anticipating what might go wrong with a product. While anticipating every failure mode is not possible, the development team should formulate an extensive list of potential failure modes. Early and consistent use of FMEA in the design process can help the engineer design out failures and produce more reliable and safe products. FMEA can also be used to capture historical information for use in future product improvement.

o FTA (Fault Tree Analysis): A fault tree is a logical diagram which illustrates the connection between an unwanted event and the causes of this event. The causes can include environmental factors, human error, unusual combinations of "innocent" events, normal events and outright component failures. The two main results are: 1) the fault tree diagram, which shows the logical structure of the causes of failure, and 2) the cut sets, which show the sets of events that can cause the top event – system failure. If we can assign probability values or failure rates to each basic event, we can also get quantitative predictions for Mean Time To Failure (MTTF) and failure rate for the system (a small illustrative calculation is sketched after this list).

o ETA (Event Tree Analysis): An event tree is a graphical representation of a sequence of related events. Each branching point in the tree is a point in time where one of two or more possible consequences can occur. The event tree can be described with or without branching probabilities. In economic analyses it is customary to assign a benefit or cost to each possible alternative, or branch. An event tree can help our understanding and documentation of one or more sequences of events in a system or part of a system. Areas where event trees can be used are: 1) studying error propagation through a complete system – people, operational procedures, hardware, and software; and 2) building usage scenarios to enhance HAZOP: "what could happen if…?"

o CCA (Cause-Consequence Analysis): Cause-Consequence Analysis (CCA) is a two-part system safety analysis technique that combines Fault Tree Analysis and Event Tree Analysis. Fault Tree Analysis considers the "causes" and Event Tree Analysis considers the "consequences", so both deductive and inductive analysis is used. The purpose of CCA is to identify chains of events that can result in unwanted consequences. Given the probabilities of the various events in a CCA diagram, the probabilities of the various consequences can be calculated, thus establishing the risk level of the system. A CCA starts with a critical event and determines the causes of the event (using top-down or backward search) and the consequences it might create (using forward search). The cause-consequence diagram can show both temporal dependencies and causal relationships among events. The notation builds on the FTA and ETA notations, and extends these with timing, condition and decision alternatives. The result is a diagram (along with elaborated documentation) showing both the logical structure of the causes of a critical event and a graphical representation of the effects the critical event can have on the system. CCA enables probability assessments of success/failure outcomes at staged increments of system examination, and the method helps create a link between the FTA and ETA methods. CCA shows the sequence of events explicitly, which makes CCA diagrams especially useful in studying start-up, shutdown and other sequential control issues. Other advantages are that multiple outcomes are analyzed from each critical event, and that different levels of success/failure are distinguishable, as CCA may be used for quantitative assessment.

In addition to these techniques, we included the Safety Case method for use alongside the other safety criticality analysis methods. The purpose is to keep track of the requirements and information acquired when using the safety criticality analysis methods. Usage of the Safety Case method is also presented in paper P1.

o Safety Case: The Safety Case method seeks to minimise safety risks and commercial risks by constructing a demonstrable safety case. Bishop and Bloomfield [Adelard98, Bishop98] define a safety case as: "A documented body of evidence that provides a convincing and valid argument that a system is adequately safe for a given application in a given environment". The safety case method is a vehicle for managing safety claims, containing a reasoned argument that a system is or will be safe. It is manifested as a collection of data, metadata and logical arguments. The Safety Case documents answer questions like "How will we argue that this system can be trusted / is safe?" The Safety Case shows how safety requirements are decomposed and addressed, and provides an appropriate answer to such questions. The layered structure of the Safety Case allows lifetime evolution and helps establish the safety requirements at different levels of detail.
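To illustrate the kind of quantitative prediction mentioned under FTA above, the following is a minimal sketch. The basic events, their probabilities and the minimal cut sets are invented for illustration and are not taken from any of the studied projects; independence between basic events is assumed:

```python
# Hedged sketch: estimating an upper bound on the probability of the top event
# (system failure) from minimal cut sets, using the rare-event approximation.
# All event names, probabilities and cut sets below are illustrative assumptions.

basic_event_probability = {
    "db_connection_lost": 0.001,
    "retry_logic_fault": 0.01,
    "operator_misconfiguration": 0.005,
    "backup_server_down": 0.002,
}

# Each minimal cut set is a set of basic events that together cause the top event.
minimal_cut_sets = [
    {"operator_misconfiguration"},
    {"db_connection_lost", "retry_logic_fault"},
    {"db_connection_lost", "backup_server_down"},
]

def cut_set_probability(cut_set):
    """Probability that all basic events in one cut set occur (independence assumed)."""
    p = 1.0
    for event in cut_set:
        p *= basic_event_probability[event]
    return p

# Rare-event approximation: the top-event probability is bounded above by the
# sum of the individual cut-set probabilities.
top_event_upper_bound = sum(cut_set_probability(cs) for cs in minimal_cut_sets)
print(f"Upper bound on top-event probability: {top_event_upper_bound:.6f}")
```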

Table 2-2 shows a comparison of the safety criticality analysis methods we have considered. The properties shown are relevant when choosing between such analysis techniques. The costs involved are described for each method by the properties "Formalization" and "Effort needed". Another property is the requirement for available system information, which can range from a sketchy system description to a full system description including all technical documentation and code. The process stage is also important, as it tells us where in the development cycle the technique is best suited.

Table 2-2 Properties of some safety criticality analysis techniques

PHA:
• Formalization: Low. Effort needed: Low. Process stage: Early.
• System information requirements: Low (any system information).
• Technique output: Identification of hazards, their causes, effects and possible barriers or measures.
• Participant roles/groups: Small group (with moderator).
• Application: Identifying hazards.

HAZOP:
• Formalization: Moderate. Effort needed: Moderate. Process stage: Middle.
• System information requirements: Moderate (specification and design documentation of the system).
• Technique output: Identification of possible safety or operational problems, their causes, effects and suggested solution.
• Participant roles/groups: Moderator, secretary, 4-6 domain experts.
• Application: Identifying hazards.

FMEA:
• Formalization: High. Effort needed: Moderate. Process stage: Middle.
• System information requirements: Moderate (detailed information about the system).
• Technique output: Identification of fault modes for all components, their causes, effects and severity.
• Participant roles/groups: System developers with good knowledge of the system's operating environment.
• Application: Predict events.

ETA:
• Formalization: Moderate. Effort needed: Low. Process stage: Early.
• System information requirements: High (quantitative reliability data for the analyzed parts).
• Technique output: Identification of event chains that could lead to accidents.
• Participant roles/groups: 1-4 persons, depending on the size of the system, some with knowledge of the system and its environment.
• Application: Predict events.

CCA:
• Formalization: High. Effort needed: Moderate. Process stage: Middle.
• System information requirements: High (as for ETA and FTA).
• Technique output: Combination of ETA and FTA, where time-sequenced events and discrete, staged levels of outcome are shown.
• Participant roles/groups: 1-4 persons, depending on the size of the system, some with knowledge of the system and its environment.
• Application: Combines ETA and FTA.

FTA:
• Formalization: High. Effort needed: Moderate. Process stage: Late.
• System information requirements: High (knowledge about the system's failure modes, from FMEA analysis).
• Technique output: Logical illustration of the relationship between an unwanted event and its causes.
• Participant roles/groups: 1-4 persons, depending on the size of the system, some with knowledge of the system and its environment.
• Application: Analyzing causes of hazards.

Safety Case:
• Formalization: Moderate. Effort needed: Moderate/high. Process stage: Entire life cycle.
• System information requirements: High (all available information concerning system safety, and related documentation).
• Technique output: A documented body of evidence that provides a valid argument that a system is adequately safe for a given application in a given environment.
• Participant roles/groups: All personnel involved in software safety work.
• Application: Building a case for safe systems.

2.8 Empirical Software Engineering

Empirical software engineering is not software development per se, but a branch of software engineering research and practice which emphasizes empirical studies to investigate processes, methods, techniques and technology. According to Votta et al., the goal of empirical software engineering is to construct a "credible empirical base ... that is of value to both professional developers and researchers" [Votta95]. They argue that empirical software engineering inherits most of the methodological approaches and techniques of the social sciences, since its goal is to examine complex social settings, contexts where human interaction is the most critical factor determining the quality and effectiveness of the results being produced. In particular, empirical work is accomplished through the execution of empirical studies. This entails observations of specific settings, with the purpose of collecting and analysing useful information on their behaviour and attributes. Empirical studies can be classified in three categories, according to the increasing degree of rigor and confidence in the results of the study [Wohlin00]:

(1) Surveys
(2) Case studies
(3) Controlled experiments


This is in line with the classification made by Votta et al., where the different types of investigations are said to be anecdotal studies, case studies and experiments [Votta95]. It can be argued that surveys and case studies should swap positions, as case studies tend to be hard to replicate and very open-ended, while surveys can be made very rigorous and are definitely suitable for replication.

Zelkowitz et al. describe different ways of studying technology [Zelkowitz98], and state that the empirical method is the following: "Empirical method: A statistical method is proposed as a means to validate a given hypothesis. Unlike the scientific method, there may not be a formal model or theory describing the hypothesis. Data is collected to verify the hypothesis." Zelkowitz et al. describe 12 different ways of studying technology, in three categories: Observational, Historical and Controlled. The different ways are described in detail in [Zelkowitz98], and the models they propose for validating technology are shown in Table 2-3.

In addition to these, there are a few other techniques commonly used in empirical software engineering. Firstly, as previously mentioned, Wohlin et al. discuss surveys in [Wohlin00]. Secondly, the Action Research method is presented by Avison et al.; this method involves researchers and practitioners acting together on a set of activities, including problem diagnosis, action intervention and reflective learning [Avison99]. Finally, a research method that is becoming more widely used in studies of software-developing organizations is Grounded Theory, which emphasizes generation of theory from data. This method originates from the sociologists Barney Glaser and Anselm Strauss and is presented in [Strauss98].

Table 2-3 12 ways of studying technology, from [Zelkowitz98]

Observational methods:
• Project monitoring: collect development data.
• Case study: monitor project in depth.
• Assertion: use ad hoc validation techniques.
• Field study: monitor multiple projects.

Historical methods:
• Literature search: examine previously published studies.
• Legacy: examine data from completed projects.
• Lessons learned: examine qualitative data from completed projects.
• Static analysis: examine structure of developed product.

Controlled methods:
• Replicated: develop multiple versions of product.
• Synthetic: replicate one factor in laboratory setting.
• Dynamic analysis: execute developed product for performance.
• Simulation: execute product with artificial data.

2.8.1 Research strategies in empirical research

There are three main types of research strategies, each with a distinct approach to empirical studies, and all of them may be used in empirical software engineering [Wohlin00][Seaman99]:

• Quantitative approaches are mainly used to quantify a relationship or to compare groups, with the aim of identifying cause-effect relationships, verifying hypotheses or testing theories.


• Qualitative approaches are observational studies with the aim of interpreting a phenomenon based on information collected from various sources. This information is usually subjective and non-numeric.

• Mixed-method approaches are used to overcome limitations of the two strategies above, by triangulating data and combining their advantages.

Table 2-4 gives an overview of empirical research approaches and examples of strategies for each. This is taken from [Mohagheghi04b] and [Creswell03]. The boundaries between these approaches are not sharp. For instance, case studies can combine quantitative and qualitative studies, and although case studies are often classed as qualitative in nature, Yin states that case studies do not by their nature equal qualitative research [Yin03].

Table 2-4 Empirical research approaches

Quantitative:
• Strategies: experimental design, surveys, case studies.
• Methods: predetermined; instrument-based questions; numeric data; statistical analysis.
• Knowledge claims: postpositivism (theory verification; empirical observation and measurement).

Qualitative:
• Strategies: ethnographies, grounded theory, case studies, surveys.
• Methods: emerging methods; open-ended questions; interview data; observation data; document data; text and image analysis.
• Knowledge claims: constructivism (theory generation; understanding; interpretations of data).

Mixed methods:
• Strategies: sequential, concurrent, transformative.
• Methods: both predetermined and emerging methods; multiple forms of data drawing on several possibilities; statistical and text analysis.
• Knowledge claims: pragmatism (consequences of action; problem-centered; pluralistic).


2.9 Main challenges in business-critical software engineering

The following is a short list of some general challenges in software engineering today. They are often cited as reasons for difficulties in software projects, e.g. in [Charette05]:

• Poor Requirements
• Rising Complexity
• Ongoing Change
• Failure to Pinpoint Causes

In our work, we concentrate on issues dealing with reduction of product risk, improvement of requirement specifications, coping with complexity, and pinpointing the causes of failures. This adds up to a main challenge for business-critical software development:

• Developing methods that help reduce product risk, without increasing costs out of proportion to the benefit these methods provide.

What this means is that we need to introduce low-cost methods and techniques that focus on the most important areas, so that spending a little more effort reduces the largest problems. This is in line with Boehm's notion of value-based software engineering, where the main agenda is to develop an overall framework to better focus effort where it is needed [Boehm03].

As far as empirical software engineering research is concerned, an important challenge is to follow up research and technology change proposals with continued observation and measurement in the field when practitioners put theory into practice. Much of the research being performed covers the software development process up to a point, but does not follow the eventual implementation of the results further.


3 Research Context and Design

In this chapter, the project context is presented in more detail. The overall research design, which combines quantitative and qualitative studies, is then presented. Finally, a more detailed description of each study is given.

3.1 BUCS Context

In the last decade, computers have taken on a more important role in several areas of commercial business. As an effect of this, many of the functions required by industry and services depend on software and computer systems. Failures in such systems can have serious consequences for businesses that depend on these systems for their livelihood. As in many related areas, there are substantial savings to be made by discovering, reducing or removing these potential failures early in the system's life-cycle. In fact, it should be possible to reduce most potential failures very early in the system's development process.

The main goal of the BUCS project is to better understand and to improve the software technologies and processes used for developing business-critical software. Much of the information about current practices and possible problems was collected at Norwegian IT companies. It was important that the relationship between the BUCS project and the involved companies and organizations was based on mutual benefit for both parties.

The BUCS project has – through literature reviews, controlled experiments, historical data analysis and case studies – investigated methods from the area of safety-critical software. These methods include Preliminary Hazard Analysis (PHA), HazOp, Fault Tree Analysis (FTA), Cause-Consequence Diagrams (CCD), and Failure Mode and Effect Analysis (FMEA). We have also studied important standards in this area, such as IEC 61508, a standard for functional safety. The effects of both methods and standards have been studied using controlled student experiments and through industrial case studies.

All the above-mentioned methods are rather general. They can be applied to both local and distributed systems, and they can be used on hardware, software and "wetware" (people). This is especially important when we are dealing with a problem as wide and diverse as business criticality. The techniques we have primarily sought to adapt from the development of safety-critical software are PHA, HazOp, and FMEA.


In addition to the industrial focus of the BUCS basic R&D project, some of the studies in this thesis were carried out in cooperation with organizations involved in the EVISOFT project, an industry-driven research project [EVISOFT].

3.2 Research Focus

When deciding the focus areas of this thesis, the input was the BUCS project context, less-explored research areas, and available sources of research data. During our literature studies and after contact with Norwegian IT companies, the following key areas for the work in this thesis were decided:

• Business-criticality in terms of software faults
• Fault report analysis
• Fault reporting in software development

In terms of goals for the research, we formulated the following research questions:

• RQ1. What is the role of fault reporting in existing industrial software development?

• RQ2. How can we improve existing fault reporting processes?
• RQ3. What are the most common and severe fault types, and how can we reduce them in number and severity?
• RQ4. How can we use safety analysis techniques together with failure report analysis to improve the development process?

To obtain answers to these research questions, we decided on common metrics for our studies. We started broadly, including attributes like structural fault location, functional fault location and fault repair effort. When we received actual data from industrial projects, we had to reduce the scope somewhat, due to lack of complete information in the data material and to great variation between organizations in what data they stored. The main metrics we identified for the fault report studies were:

• The number of detected faults is an indirect metric, attained simply by counting the number of faults of a certain type or for a certain system part etc.

• The metrics that are used directly from the data in the fault reports are the reported type, severity, priority, and release version of the fault (a small counting sketch is given after this list).
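As a simple illustration of the indirect counting metric, the following is a hedged sketch. The field names and the CSV layout are assumptions about how exported fault reports might look; they are not taken from the studied organizations' repositories:

```python
# Hedged sketch: counting detected faults per reported type and severity.
# The column names "fault_type" and "severity" are illustrative assumptions.

import csv
from collections import Counter

def fault_distribution(csv_path):
    """Return counts of fault reports per type and per severity rating."""
    type_counts = Counter()
    severity_counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            type_counts[row["fault_type"]] += 1
            severity_counts[row["severity"]] += 1
    return type_counts, severity_counts

# Example usage (hypothetical file name):
# types, severities = fault_distribution("project_a_fault_reports.csv")
# print(types.most_common(5))
```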

The reasons why we decided to focus on software faults were several:

• The BUCS project is concerned with business-critical systems. A recurring theme in the definition of business-criticality is that the major threat to such systems is failures that stop or limit the use of the system. As described in section 2.4, “Faults are potential flaws in a software system, that later may be activated to produce an error. An error is the execution of a "passive fault", leading to a failure.” This means that by working to reduce the number and criticality of faults in the software, we would also reduce the number or frequency of failures.


• As Avizienis et al. suggest, one way of attaining better dependability in a system is fault removal, i.e. reducing the number and severity of faults [Avizienis04]. By identifying critical or frequent fault types, developers can focus their prevention effort where it eliminates the largest number of faults.

• As far as industrial data is concerned, fault report data is abundant in most software-developing organizations. Thus we had a wide array of potential industrial partners to collect data from.

As shown in Figure 1-1, the studies we have performed are all connected on the topic of business-critical software and fault report analysis, and have been performed in sequence. A short description of the studies is given in Table 3-1, and each study is elaborated in Section 3.3.

Table 3-1 Description of our studies

Study 1 (2003): Structured interviews. Preliminary interviews about business-critical software and state-of-practice in the Norwegian IT industry.
Study 2 (2004): Literature review. Software criticality techniques, fault reporting and management literature.
Study 3 (2004-05): Historical data mining. Fault report analysis of industrial projects from four organizations.
Study 4 (2006-07): Historical data mining. Fault report analysis of industrial projects.
Study 5 (2007): Structured interviews. Exploring the results from Study 4 further, regarding fault report analysis and fault reporting processes.
Study 6 (2006-07): Case study. Comparison of hazard analysis and fault report analysis in a practical setting.
Study 7 (2004-07): Lessons learned from our experiences with fault report studies.

3.2.1 Data collection

Before starting to plan and conduct our empirical studies, we decided on the goals of our studies, which types of studies we were going to perform, and which data sources we would need to complete them. Data collection was split into three phases. First, there was a pre-study phase of initial data collection and pre-analysis to narrow down research areas and questions. Second, we would focus the data collection on the deeper issues that seemed most relevant. Finally, there would be an analysis phase to summarize, reflect and collect lessons learned.

As the BUCS project is aimed at supporting business-critical systems, and part of the BUCS goals was close cooperation with the Norwegian IT industry, we chose early on to focus on empirical studies of Norwegian commercial projects and organizations. This meant that we had to contact and select relevant organizations developing business-critical applications. This raised the issue of which organizations we wanted to study. As it turned out, the sampling of companies was mostly done out of convenience, because of apparent reluctance to disclose sensitive information about quality data and processes, as well as many organizations simply being "too busy".

Another issue was whether data collection should be performed in pre-implementation phases or post-implementation ones. Pre-implementation studies are better for working with possibilities for improvement initiatives, but a problem is knowing which data to collect; in this case the data would be more qualitative in nature, and thus harder to analyse. Post-implementation studies, on the other hand, would be better for obtaining quantified data, but then there is the question of whether the data is relevant for investigation. Once the studies to be performed were tentatively planned, the actual data collection depended on the available projects and their data, i.e. which companies we were able to cooperate with, and which processes those companies were willing to let us participate in. The employed data collection methods were interviews, surveys, and field/case studies, as well as historical data mining and analysis of reports or logs on relevant issues.

Because of the nature of historical data analysis, some of our research was based on bottom-up data collection. That is, we needed to examine the data material before being able to formulate research questions and goals. As Basili et al. state in [Basili94], data collection should ideally proceed in a top-down rather than a bottom-up fashion, e.g. by employing GQM to define relevant metrics [Solingen99] (a rough illustration is sketched at the end of this subsection). However, some reasons why bottom-up studies are also useful are given in [Mohaghegi04c]:

1. There is a gap between the state of the art (best theories) and the state of the practice (current practices). Therefore, most data gathered in companies’ repositories are not collected following the GQM paradigm.

2. Many projects have been running for a while without having improvement programs and may later want to start one. The projects want to assess the usefulness of the data that is already collected and to relate data to goals (reverse GQM).

3. Even if a company has a measurement program with defined goals and metrics, these programs need improvements from bottom-up studies.

Exploring industrial data repositories can be part of an exploratory study (identifying relations and trends in data) or a formal (confirmatory) study to validate other or newer theories than those originally underlying the collected data.
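To make the GQM reference above concrete, the following is a rough, hedged sketch of a top-down GQM-style breakdown for a fault-reporting goal. The goal, questions and metrics are illustrative only and are not taken from the studied projects:

```python
# Hedged sketch: an illustrative GQM-style breakdown (goal -> questions -> metrics)
# for a fault-reporting improvement goal. All entries are invented for illustration.

gqm_example = {
    "goal": "Reduce the number and severity of faults reported after release",
    "questions": {
        "Which fault types occur most often?": [
            "number of fault reports per fault type",
        ],
        "Which fault types are rated as most severe?": [
            "distribution of severity ratings per fault type",
        ],
        "Where in the system are faults concentrated?": [
            "number of fault reports per module or component",
        ],
    },
}
```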


3.3 Research approach and research design

This section explains the research design used to collect and analyze the relevant data. The thesis combines qualitative and quantitative techniques, mainly by using quantitative studies on historical data sources and qualitative studies on practice and processes. The reasons for combining these different types of studies are the following:

• By doing quantitative studies of ongoing commercial projects or by reusing historical data, we could collect information about real life projects.

• Results of these studies were confirmed by other studies using other and often qualitative methods, thus triangulating the data and results.

The research design for the individual studies has been both bottom-up and top-down, depending on the maturity of the research and the available information. Some of the research questions were a result of our literature studies and common work in the BUCS project, in a top-down manner. Other research questions were bottom-up, arising from the available data sets and the actual practices in the organizations we studied. The research can be split into three phases, as shown in Figure 1-1:

Phase 1: Literature studies of the state of the art and industrial interviews to increase the understanding of practice (top-down research questions) (Studies 1 and 2).
Phase 2: Quantitative studies of fault reports, starting with a bottom-up exploratory study (Study 3) and continuing with top-down confirmatory studies (Study 4).
Phase 3: Qualitative studies to expand the knowledge gained from the quantitative studies (top-down research questions) (Studies 5, 6 and 7).

Sections 3.3.1 through 3.3.7 explain the research design and practical setting for each of the studies that make up this thesis.

3.3.1 Study 1: Interviews with company representatives

To establish a basis for the most commonly used methods and the most common problems encountered in companies that develop business-critical software, several semi-structured interviews were carried out with representatives from cooperating companies. These companies were chosen both for their relevance to business-critical issues and, to some degree, out of convenience of location and availability. Before the interviews, a list of topics was discussed and decided, on which the later interviews and talks with the company representatives were based.

Research questions for Study 1:


RQ.S1.a: How can the use of well-known software development methods improve business-critical system development?
RQ.S1.b: Do companies know much about safety-critical methods at all? If so, how do they view the possibility of using such safety methods to improve business-critical system development?
RQ.S1.c: What are the most common reliability/safety-related problems in business-critical system development? Here we must identify the most important factors leading to failures or accidents.

We also wanted answers to questions such as:

• What are the most important hindrances for achieving high-quality products when developing business-critical software?
• How does industry handle these problems now?
• What are the most important problems encountered during the operation of business-critical software?
• How can we remove or reduce these problems by changing the way business-critical systems are developed, operated and maintained?

Validity comment. The main validity concerns in this study would be the relatively low number of respondents and the fact that the interviews were carried out by four different researchers.

3.3.2 Study 2: Literature review - Software criticality techniques, fault reporting and management literature

This study proposed a way to integrate software criticality techniques into a common development regime like RUP. Taking the results from Study 1 into account, together with a literature review of the state of the art in software engineering and safety methods, we sought to combine the common and the special by introducing techniques from safety-related development into the common way of developing business-critical software.

The research questions for Study 2 were:

RQ.S2.a: Which software criticality analysis techniques were most eligible for introduction into a common development framework?
RQ.S2.b: Where in the development process would the introduction of such techniques be most effective or easiest to implement?

Validity comment. Being a literature review, we would not be able to validate any findings beyond referring to the literature.


3.3.3 Study 3: First empirical analysis of software faults in industrial projects

This study looks at when and how faults have been introduced into a system under development, and how they have been found and dealt with. By analysing fault/change reports from several (semi-)completed development projects, we wanted to investigate whether there are common causes for faults being introduced and not being discovered early enough. The goal was to improve the knowledge about why and how faults are introduced, and how we can identify and rectify them earlier. The study is based on historical data collection/data mining, where the data consists of fault reports received from four commercial projects in four different companies. The steps of the study were the following:

1. Define study goals and research questions.
2. Contact eligible companies for cooperation.
3. Select suitable projects for study and agree on cooperation practicalities.
4. Collect and convert data from the projects.
5. Filter the data, extracting only fault reports from the total data sets (which in some cases included change reports) and removing duplicate data (a simplified sketch of this step is given after the research questions below).
6. Categorize faults according to fault type, software module and severity.
7. Analyze the resulting data sets by comparing project-internal data, as well as projects against each other.

This investigation was mostly a bottom-up process, because of the initial uncertainty about the available data from the potential participants. After establishing a dialogue with the participating projects and acquiring the actual fault reports, our initial research questions and goals were altered accordingly. Initially we wanted to find out which types of faults are most frequent, and whether there are some parts of the systems with higher fault density than others. This also helps to show whether the pre-defined fault taxonomy is suitable. When we know which types of faults dominate and where these faults appear in the systems, we can focus on the most severe faults to identify the most important targets for later improvement work.

The research questions for Study 3 are:

RQ.S3.a: Which types of faults are most typical for the different software components and parts?
RQ.S3.b: Are certain types of faults considered to be more severe than others by the developers?
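The following is a deliberately simplified, hedged sketch of the kind of filtering done in steps 5 and 6 above. The record fields ("category", "summary") and the duplicate heuristic are assumptions made only for illustration, not a description of the actual tooling used in the study:

```python
# Hedged sketch: keep only fault reports and drop duplicates before analysis.
# Field names and the duplicate-detection heuristic are illustrative assumptions.

def filter_fault_reports(reports):
    """Keep fault reports only, dropping change requests and duplicate entries."""
    seen_summaries = set()
    faults = []
    for report in reports:
        if report.get("category") != "fault":
            continue  # e.g. change requests are excluded from the analysis
        key = report.get("summary", "").strip().lower()
        if key in seen_summaries:
            continue  # crude duplicate detection on identical summaries
        seen_summaries.add(key)
        faults.append(report)
    return faults
```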

Validity comment. Since the number of projects would not be large, we knew that external validity was a concern. The differences in domain, environments and fault reporting procedures added to these concerns.


3.3.4 Study 4: Second empirical analysis of software faults in industrial projects

This study was based on the lessons learned in Study 3, with somewhat refined metrics to make sure the data material was more suitable for this type of study. The research design was similar to that of Study 3, i.e. it was a historical data collection/data mining study to further explore and confirm the issues from Study 3. In this study, we had access to five projects from one company. This investigation was a top-down study, as we had identified our research goals before initiating the study.

The research questions for Study 4 are:

RQ.S4.a: Which types of faults are the most common in the studied projects?
RQ.S4.b: Which fault types are rated as the most severe?
RQ.S4.c: How do the results of this study compare with our previous fault report study (Study 3)?

Validity comment. For this study, we had more fault reports and more projects to study, but everything would be collected from the same organization. Again this would impact external validity.

3.3.5 Study 5: Interviews focusing on empirical results

Study 5 was a qualitative study in which we interviewed representatives who had been involved in the five projects we studied in Study 4. We performed semi-structured interviews using an interview guide with seven main topics and 32 questions. We selected interviewees who had been actively involved in some of the five projects we had studied in this organization and who also had hands-on experience with fault management in the same projects. The interviews were conducted as open-ended interviews, with the same questions asked of each interviewee; however, the interviewees were given room to talk about what they felt was important within the topic of each question. Each question in the interview guide was related to one or more local research questions, and the different responses for each question were compared to extract answers related to the research questions. In line with the constant comparison method, we coded each answer into groups. The codes were postformed, i.e. constructed as part of the coding process, since the interviews were open-ended. Additionally, we received feedback on the topic at hand through discussions and comments during two workshops held in the organization in conjunction with the fault report study and the interviews.


This study is based on the results from Study 4, on fault reports. The main research questions for this study were therefore derived from the researchers' viewpoint in Study 4. Firstly, we wished to see whether the experience of the practitioners in the actual projects was similar to the analysis results we had found. Secondly, we wanted to draw on their experience to hear whether they thought a common fault type classification scheme could help improve their development processes. We also wanted to hear their opinions on possibly increasing the effort spent on data collection and fault report analysis in order to improve their software development processes. Lastly, we wanted to ask them where they thought there was the most potential for improvement in their fault management system, to elicit areas that they felt were lacking in their current fault reporting process. This led to the following four research questions for Study 5:

RQ.S5.a: How can the large number of identified faults from early development phases be explained?
RQ.S5.b: Can the introduction of a standard fault classification scheme like Orthogonal Defect Classification (ODC) be useful to improve development processes?
RQ.S5.c: Do they see feedback from fault report analysis as a useful software process improvement tool?
RQ.S5.d: Do they see any potential improvement areas in their fault management system?

Summed up, the main topics covered in the interviews were:

• The results from our quantitative Study 4 of their development projects,
• The organization's own measurements of faults,
• Their existing quality and fault management system,
• Fault categorization and fault management,
• Communicating feedback from fault reporting to developers,
• Attitudes to process change and quality improvement for fault management.

Validity comment. The interviews, transcription and data coding would all be performed by one person, which was a threat to internal validity. In addition, the number of interviews was relatively low, which would affect external validity.

3.3.6 Study 6: Comparing results from hazard analysis and analysis of faults

This study was prompted by our experiences with fault report analysis, and by how some of the faults were comparable to hazards identified through hazard analysis.


By conducting a qualitative hazard analysis of the concept/specification of a small existing web application and database, and comparing the results with a quantitative fault report analysis of the actual completed system, we wanted to explore the possibility of using the Preliminary Hazard Analysis (PHA) method to reduce the number of faults introduced into a system. The fault report analysis was performed in the same manner as in Studies 3 and 4, and applied to the fault reports we received from the maintainers of the DAIM system. The hazard analysis of the DAIM system was performed by a group of BUCS project researchers in a series of PHA sessions. Finally, the results of the two analyses were compared; a minimal sketch of this comparison is given after the validity comment below. The three research questions for Study 6 were the following:

RQ.S6.a: What kind of faults, in terms of Orthogonal Defect Classification (ODC) fault types, does the PHA technique help elicit?
RQ.S6.b: How does the distribution of fault types found in the fault analysis compare to the one found in the PHA?
RQ.S6.c: Does the PHA technique identify potential hazards that also actually appear as faults in the software?

Validity comment. Being a study of just one system, external validity would be weak. Another concern was construct validity, as we would be making a comparison of hazards and faults, which are two different concepts.
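As a minimal sketch of the comparison made in this study, the following Python fragment assumes that each hazard from the PHA sessions has already been manually assigned an ODC-style fault type, and compares the resulting distribution with the distribution found in the fault reports. The example lists are illustrative, not the actual study data.

    from collections import Counter

    hazard_types = ["Function", "Function", "Checking", "GUI"]    # from PHA (illustrative)
    reported_types = ["Function", "GUI", "Function", "Algorithm"]  # from fault reports (illustrative)

    def share_per_type(labels):
        """Return each fault type's share of the total number of items."""
        counts = Counter(labels)
        total = sum(counts.values())
        return {fault_type: count / total for fault_type, count in counts.items()}

    pha_dist = share_per_type(hazard_types)        # basis for RQ.S6.a
    report_dist = share_per_type(reported_types)

    # RQ.S6.b: compare the two distributions type by type.
    for fault_type in sorted(set(pha_dist) | set(report_dist)):
        print(f"{fault_type:10s} PHA: {pha_dist.get(fault_type, 0):5.0%}  "
              f"reports: {report_dist.get(fault_type, 0):5.0%}")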

3.3.7 Study 7: Fault management and reporting

This study does not have explicit research questions, but is a compilation of lessons learned over the course of studying fault management and fault reporting in several different organizations. It is based on our experience from collecting and analyzing fault reports, as well as on literature studies and feedback from the organizations involved in our studies.

Validity comment. The main validity concern is that our experience comes from a limited number of organizations, and our main means of validating the lessons learned is comparison with the literature.


3.4 Overview of the studies

Table 3-2 gives an overview of the studies, together with a short description of the type of each study.

Table 3-2 Types of studies in this thesis

Study  Description                                        Type                                        Paper
1      Interviews with company representatives            Qualitative, explorative                    (P1)
2      Literature study                                   Qualitative, descriptive                    P1
3      Fault report study of four projects                Quantitative, explorative                   P2, P3, P5
4      Fault report study of five projects                Quantitative, confirmative                  P4
5      Fault reporting and management interviews          Qualitative, confirmative                   P6
6      Hazard analysis vs. fault report analysis - DAIM   Quantitative and qualitative, combining     P7
                                                          two different types of results
7      Fault management and reporting                     Qualitative, descriptive                    (P8)

Table 3-3 shows how the local research questions for each study relate to the main research questions in this thesis.

Table 3-3 Relation between main and local research questions

Main research question   Local research questions
RQ1                      RQ.S1.a, RQ.S1.c, RQ.S5.c, RQ.S5.d
RQ2                      RQ.S5.b
RQ3                      RQ.S3.a, RQ.S3.b, RQ.S4.a, RQ.S4.b, RQ.S4.c, RQ.S5.a, RQ.S6.b
RQ4                      RQ.S1.b, RQ.S2.a, RQ.S2.b, RQ.S6.a, RQ.S6.c


4 Results

This chapter summarizes the research results for each of the studies. The results are reported in more detail in the papers in Appendix A, but this chapter also includes some results of work that has so far not been reported in papers.

4.1 Study 1: Preliminary interviews with company representatives (used in P1)

In order to learn more about the way business-critical software projects are being executed, we sought out a few companies and conducted short interviews with their representatives. Eight interviews were conducted in eight different companies. The companies were picked partly for being representative of the Norwegian IT industry, partly because we expected them to be relevant to the business-critical topic, and partly out of convenience with respect to geographic location and general availability. The companies were represented by persons in different positions in the company structure, from directors to project managers and developers. The interviews lasted 30-45 minutes, and each interview was performed by one researcher taking notes. The questions, or topics, had been worked out beforehand. They were partly taken from literature studies, and dealt with areas we felt it was important to get answers to this early in the project. After the interviews, the researchers compiled an internal BUCS technical report, for use as a future reference for the BUCS project members [Stålhane03]. The main results extracted from the interview sessions were the following:

• The industry defines the term ‘business-critical’ as something that is related to their economy, their reputation and their position in the market.

• RUP, or some variant of it, is common among companies that actually employ a specified process.

• Business-critical software development is a very common activity among software development companies.

• A typical problem in development of business-critical software is communication, both within the company and towards the customer.

• The companies generally do not consider the technical risk aspects of a project in detail, perhaps mainly due to the lack of an instrument for doing so.

Contributions of Study 1: The purpose of these interviews was to elicit knowledge about the situation in the Norwegian IT industry with respect to development of business-critical software.


As this was the first investigation of the BUCS project, the goal was mainly to get an overview and a general impression of the situation. It was also intended as a basis for further work, both for further empirical studies and as a tool to help us focus future research. This study was the first step towards the main contribution C1: "Describing how to utilize safety criticality techniques to improve the development process for business-critical software."

4.2 Study 2: Combining safety methods in the BUCS project (Paper P1)

Study 2 was carried out by doing a literature review of software engineering practices and safety criticality analysis methods. We wanted to propose a way to combine these into a more unified tool set.

P1. Jon Arvid Børretzen, Tor Stålhane, Torgrim Lauritsen, and Per Trygve Myhrer: "Safety activities during early software project phases"

Abstract to P1: This paper describes how methods taken from safety-critical practices can be used in development of business-critical software. The emphasis is on the early phases of product development, and on use together with the Rational Unified Process. One important part of the early project phases is to define safety requirements for the system. This means that in addition to satisfying the need for functional system requirements, non-functional requirements about system safety must also be included. By using information that is already required or produced in the first phases of RUP together with some suitable "safety methods", we are able to produce a complete set of safety requirements for a business-critical system before the system design process is started.

In P1, we showed how the Preliminary Hazard Analysis, Hazard and Operability Analysis and Safety Case methods can be used together in the RUP inception phase to help produce a safety requirements specification, as illustrated in Figure 4-1. The example shown is simple, but demonstrates how the combination of these methods works in this context. By building on information made available in an iterative development process like RUP, we can use the presented methods to improve the process for producing a safety requirements specification. The paper also emphasizes that early development phases are prime candidates for efficient safety analysis work.


Figure 4-1 Combining PHA/HazOp and Safety Case (diagram elements: Customer Requirements, Environment, PHA and/or HazOp, Safety Case, Safety Requirements)

Contributions of Study 2: The contribution of this study was showing a possible integration of a common software development method with techniques taken from development of safety-critical systems. This study thus supports the main contribution C1.

4.3 Study 3: Fault report analysis (Papers P2, P3, P5)

The work and results of Study 3 have been presented in three papers: P2 (main), P3 and P5. The basis was a quantitative study of fault reports in four companies.

P2. Jon Arvid Børretzen and Reidar Conradi: "A study of Fault Reports in Commercial Projects"

Abstract to P2: Faults introduced into systems during development are costly to fix, and especially so for business-critical systems. These systems are developed using common development practices, but have high requirements for dependability. This paper reports on an ongoing investigation of fault reports from Norwegian IT companies, where the aim is to seek a better understanding of faults that have been found during development and how they may affect the quality of the system. Our objective in this paper is to investigate the fault profiles of four business-critical commercial projects to explore if there are differences in the way faults appear in different systems. We have conducted an empirical study by collecting fault reports from several industrial projects, comparing findings from projects where components and reuse have been core strategies with more traditional development projects. Findings show that some specific fault types are generally dominant across reports from all projects, and that some fault types are rated as more severe than others.



P3. Parastoo Mohagheghi, Reidar Conradi, and Jon A. Børretzen: "Revisiting the Problem of Using Problem Reports for Quality Assessment"

Abstract to P3: In this paper, we describe our experience with using problem reports from industry for quality assessment. The non-uniform terminology used in problem reports and validity concerns have been subject of earlier research but are far from settled. To distinguish between terms such as defects or errors, we propose to answer three questions on the scope of a study related to what (problem appearance or its cause), where (problems related to software; executable or not; or system), and when (problems recorded in all development life cycles or some of them). Challenges in defining research questions and metrics, collecting and analyzing data, generalizing the results and reporting them are discussed. Ambiguity in defining problem report fields and missing, inconsistent or wrong data threatens the value of collected evidence. Some of these concerns could be settled by answering some basic questions related to the problem reporting fields and improving data collection routines and tools.

P5. Jingyue Li, Anita Gupta, Jon Arvid Børretzen, and Reidar Conradi: "The Empirical Studies on Quality Benefits of Reusing Software Components"

Abstract to P5: The benefits of reusing software components have been studied for many years. Several previous studies have concluded that reused components have fewer defects in general than non-reusable components. However, few of these studies have gone a further step, i.e., investigating which type of defects has been reduced because of reuse. Thus, it is suspected that making a software component reusable will automatically improve its quality. This paper presents an on-going industrial empirical study on the quality benefits of reuse. We are going to compare the defects types, which are classified by ODC (Orthogonal Defect Classification), of the reusable component vs. the non-reusable components in several large and medium software systems. The intention is to figure out which defects have been reduced because of reuse and the reasons of the reduction.

Paper P2 was the main paper for this study, and it presented some results of an investigation on fault reports in industrial projects. The main conclusions of this paper were:

• When looking at all faults in all projects, “functional logic” faults were the dominant fault type. For high severity faults, “functional logic” and “functional state” faults were dominant. This is shown in Tables 4-1 and 4-2.


• Also, we saw that some fault types were rated as more severe than others, for instance "Memory fault". However, the fault type "GUI fault" was rated as less severe for the two projects employing systematic software reuse in development, as illustrated in Figure 4-2.

Figure 4-2 Percentage of high severity faults in some fault categories (bar chart over the fault types Assignment, Data, Environment, Function logic, Function state, GUI, I/O, Interface, Memory, Missing data, Missing functionality, Missing value, Performance, Wrong function called and Wrong value used, shown for projects B, C and D)

The main conclusions of P3 were the following: We identified three questions that define a fault: what - whether the term applies to the manifestation of a problem or its cause; where - whether problems are related to software alone or also to the environment supporting it, and whether the problems are related to executable software or to all types of artifacts; and when - whether the problem reporting system records problems detected in all or only some life cycle phases. We also described how data from problem reports may be used to evaluate quality from different quality views, as shown in Figure 4-3, and how measures based on problem or defect data are among the few measures used in all quality views. Finally, we discussed how data from problem reports should be collected and analyzed, and what the validity concerns are when using such reports for evaluating quality.
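As an illustration of the what/where/when scoping discussed in P3, the following sketch shows how these three scope questions could be recorded as explicit fields on a problem report. The class and field values are our own illustrative assumptions, not the schema used by any of the studied organizations.

    from dataclasses import dataclass

    @dataclass
    class ProblemReport:
        summary: str
        what: str    # "appearance" (observed failure) or "cause" (underlying fault)
        where: str   # e.g. "executable software", "non-executable artifact", "system/environment"
        when: str    # life cycle phase in which the problem was recorded

    report = ProblemReport(
        summary="Wrong interest rate applied to savings accounts",
        what="cause",
        where="executable software",
        when="system testing",
    )
    print(report)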

Table 4-1 Distribution of all faults in fault type categories

Fault type        Project A   Project B   Project C   Project D
Assignment            7 %         4 %         1 %         1 %
Checking              4 %         3 %         2 %         1 %
Data                  4 %         6 %         5 %         4 %
Documentation         0 %         1 %         6 %         3 %
Environment           0 %         2 %         1 %         0 %
Funct. comp.         13 %         1 %         1 %         0 %
Funct. logic         20 %        29 %        49 %        58 %
Funct. state          0 %        25 %         3 %         5 %
GUI                   2 %         8 %         8 %         7 %
I/O                   0 %         2 %         1 %         0 %
Interface             0 %         4 %         0 %         0 %
Memory                0 %         1 %         0 %         0 %
Missing data          2 %         0 %         1 %         2 %
Missing funct.       13 %         8 %         8 %         3 %
Missing value         4 %         1 %         1 %         1 %
Performance           0 %         1 %         3 %         1 %
Wrong funct.          0 %         1 %         2 %         1 %
Wrong value          27 %         3 %         3 %         4 %
UNKNOWN               2 %         2 %         5 %         8 %


Figure 4-3 Quality views associated to defect data, and their relations (defect data related to five quality views: Q1 quality-in-use, Q2 internal and external product quality metrics, Q3 process quality metrics, Q4 project progress and resource planning, Q5 value of corrections vs. cost of rework; the stakeholders shown include the user, developers, quality manager and project leader)

In P5, we presented the research design of an on-going empirical study to investigate the benefits or costs of software reuse with respect to software quality. By analyzing the defect reports of several software systems, which include both reusable and non-reusable components, we planned to deepen the understanding of why reuse improves the quality of software. The paper also described the future work: to collect data from projects with different contexts, such as application domains, technologies and development processes, in order to find common good practices and lessons learned regarding software reuse.

Contributions of Study 3: In paper P2, the contributions were the description of the most typical and severe faults found by analyzing fault reports, which is related to main contribution C3: "Improved model of fault origins and types for business-critical software". In paper P3, we described our experience with using fault reports for quality assessment, and by answering three questions about what, where and when faults are and how they are discovered, we showed that improvements in how faults are described and worked with are needed. This is related to the main contribution C2: "Identification of typical shortcomings in fault reporting". The contribution in P5 was using fault categorization to compare the defect types of reused and non-reused components, which is related to main contribution C3.

4.4 Study 4: Fault report analysis (Paper P4)

P4. Jon Arvid Børretzen and Jostein Dyre-Hansen: "Investigating the Software Fault Profile of Industrial Projects to Determine Process Improvement Areas: An Empirical Study"

Abstract to P4: Improving software processes relies on the ability to analyze previous projects and derive which parts of the process should be focused on for improvement. All software projects encounter software faults during development and have to put much effort into locating and fixing these. A lot of information is produced when handling faults, through fault reports. This paper reports a study of fault reports from industrial projects, where we seek a better understanding of faults that have been reported during development and how they may affect the quality of the system. We investigated the fault profiles of five business-critical industrial projects by data mining to explore if there were significant trends in the way faults appear in these systems. We wanted to see if any types of faults dominate, and whether some types of faults were reported as being more severe than others. Our findings show that one specific fault type is generally dominant across reports from all projects, and that some fault types are rated as more severe than others. From this we could propose that the organization studied should increase effort in the design phase in order to improve software quality.

The results from P4 were the following:

• We have found that "function" faults, closely followed by "GUI" faults, are the fault types that occur most frequently in the projects, as shown in Table 4-3; a minimal sketch of this kind of tabulation is given after this list. To reduce the number of faults introduced in the systems, the organization should focus on improving the processes that are most likely to contribute to these types of faults, namely the specification and design phases of development.

Table 4-3 Fault type distribution across all projects

Fault type             # of faults        %
Function                       191   27.0 %
GUI                            138   19.5 %
Unknown                         87   12.3 %
Assignment                      75   10.6 %
Checking                        58    8.2 %
Data                            46    6.5 %
Algorithm                       37    5.2 %
Environment                     36    5.1 %
Interface                       11    1.6 %
Timing/Serialization            11    1.6 %
Relationship                     9    1.3 %
Documentation                    8    1.1 %

• The most severe fault types were "relationship" and "timing/serialization" faults, while the fault types "GUI" and "documentation" were considered the least severe. This is illustrated in Figure 4-4. Although "function" faults were not rated as the most severe, this fault type still dominates when looking at the distribution of highly severe faults only.

• We also observed that the organization's fault reporting process could be improved by adding additional information to the fault reports, e.g. fault location (name of program module) and fault repair effort. This would facilitate more effective targeting of fault types and locations in order to better focus future improvement efforts.
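A minimal sketch of the tabulation behind Table 4-3 and Figure 4-4 is given below, assuming each categorized fault report carries a fault type and a severity rating on the organization's five-level scale. The sample records and column names are illustrative only; the actual analysis was performed on the organization's own fault report data.

    import pandas as pd

    # Illustrative, pre-categorized fault records (not the actual study data).
    faults = pd.DataFrame([
        {"fault_type": "Function", "severity": "1 - Critical"},
        {"fault_type": "Function", "severity": "3 - Can be circumvented"},
        {"fault_type": "GUI", "severity": "4 - Cosmetic"},
        {"fault_type": "GUI", "severity": "3 - Can be circumvented"},
    ])

    # Fault type frequency, as in Table 4-3.
    print(faults["fault_type"].value_counts(normalize=True).mul(100).round(1))

    # Share of each severity level within each fault type, as in Figure 4-4.
    severity_profile = pd.crosstab(faults["fault_type"], faults["severity"], normalize="index")
    print(severity_profile.mul(100).round(1))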


Figure 4-4 Distribution of severity with respect to fault types for all projects (chart showing, for each fault type - Function, GUI, Unknown, Assignment, Checking, Data, Algorithm, Environment, Interface, Timing/Serialization, Relationship, Documentation - the share of faults at severity levels 1 - Critical, 2 - Can not be circumvented, 3 - Can be circumvented, 4 - Cosmetic and 5 - Enhancement)

Contributions of Study 4: In paper P4, we describe findings on fault types and fault origins in commercial projects. We also identified some issues that are common shortcomings in fault reporting. These contributions relate to the main contributions C2 and C3.

4.5 Study 5: Interviewing practitioners about fault management (Paper P6)

P6. Jon Arvid Børretzen: "Fault classification and fault management: Experiences from a software developer perspective"

Abstract to P6: In most software development projects, faults are unintentionally injected into the software, and are later found through inspection, testing or field use, and reported in order to be fixed. The associated fault reports can have uses that go beyond just fixing the discovered faults. This paper presents the findings from interviews performed with representatives involved in fault reporting and correction processes in different software projects. The main topics of the interviews were fault management and fault reporting processes. The objective was to present practitioners' views on fault reporting, and in particular fault classification, as well as to expand and deepen the knowledge gained from a previous study of the same projects. Through interviews and use of Grounded Theory, we wanted to find potential weaknesses in a current fault reporting process and elicit improvement areas and their motivation. The results show that fault management could and should include steps to improve product quality. The interviews also supported our quantitative findings in previous studies of the same development projects, where much rework through fault fixing needs to be done after testing because areas of work in early stages of the projects have been neglected.


The interviews were conducted by one interviewer, using an interview guide and a digital voice recorder. These interviews were later transcribed and coded by the same person. The main results of P6 were the following:

• The interviewees agreed with our conclusions from the previous quantitative study in P4, i.e. that the early phases of their development process had weaknesses that led to a high number of software faults originating from early development phases.

• They also expressed a need for better fault categorization in their fault reports, in order to analyze previous projects with the intention of improving their work processes.

• The proposed ODC fault types were seen as a useful basis for introducing a better fault classification scheme, although simplicity was important.

• They were positive towards using fault report analysis feedback to improve development processes, although introducing such analysis for regular use would have to be done carefully in the organization.

• Finally, they pointed out some areas of their fault reporting scheme that could be improved in order to make analysis more useful, for instance by including attributes like fault finding and correction effort and the component location of the fault. The knowledge was present; it was just not recorded formally.

Contributions of Study 5: Our main contribution is showing that practitioners are motivated to use their existing knowledge of software faults in a more extensive manner to improve their work practices. These findings support our main contributions C2 and C3.

4.6 Study 6: Using hazard identification to identify faults (Paper P7)

Abstract to P7: When designing a business-critical software system, early analysis with correction of software faults and hazards (commonly called anomalies) may improve the system's reliability and safety, respectively. We wanted to investigate if safety hazards, identified by Preliminary Hazard Analysis, could also be related to the actual system faults that had been discovered and documented in existing fault reports from testing and field use. A research method for this is the main contribution of this paper. For validation, a small web-based database for management of student theses was studied, using both Preliminary Hazard Analysis and analysis of fault reports. Our findings showed that Preliminary Hazard Analysis was suited to finding potential specification and design faults in software.

P7 presented the description and an implementation of a novel method for identifying software faults using the PHA technique. This method identified 6 faults that were actually found in the system, as well as 20 potential faults that may be present in the system. We also showed that there are certain types of faults that analysis techniques such as PHA can help to uncover in an early process phase. Performing the PHA elicited many hazards that could have been found in the system as "function" faults, as shown in Figure 4-5; that is, faults which originate from early phases of system development and are related to the specification and design of the system. From this we conclude that PHA can be useful for identifying hazards that are related to faults introduced early in software development.

Figure 4-5 Distribution of hazards represented as fault types (%) (chart over the fault types Function, Checking, Not fault, Algorithm, GUI, Data, Environment, Assignment, Interface, Timing/Serialization, Documentation, Duplicate, Relationship and Unknown)

As for finding direct ties between the hazards found in the PHA and the faults reported in fault reports, we were not very successful. This, we feel, was mainly due to the studied system's particular fault type profile, which was very different from the fault distribution profiles we had found in earlier studies. Some weak links were found, but the data did not support any systematic links.

Contributions of Study 6: The main contribution of this study was the description and implementation of the method for identifying software faults using the PHA technique. The contributions of this study are related to the main contributions C1 and C3.

4.7 Study 7: Experiences from fault report studies (Technical Report P8)

This section describes, sums up and reflects upon our experiences from several fault reporting studies. It has not yet been written up as a final paper, but this is planned for the near future. See the technical report P8 in Appendix A.

P8. Jon Arvid Børretzen: "Diverse Fault management – a prestudy of industrial practice"

Abstract to P8: This report describes our experiences with fault reports and fault reporting from working with fault reports from several different organizations. Data from the projects we have studied is presented in order to show the variance in, and at times lack of, information in the reports used. We also show that although useful process information is readily available, it is seldom used or analyzed with process improvement in mind. An important challenge is to explain to practitioners why using a common description of faults is advantageous, and to propose a way to better use the knowledge gained in collecting data about faults. The main contribution is to explain why more effort should be put into the production of fault reports, and how this information can be used to improve the software development process. We explain how fault reports can become more useful just by including information that is already available in development projects.

P8 presents an overview of the studies performed concerning fault reports, and shows the type of information that exists in and is lacking from such reports. We have learned that fault data is in some cases under-reported, and in most cases under-analyzed. By including some of the information that the organization already has, more focused analyses could be made possible. One possibility is to introduce a standard for fault reporting, where the most important and useful fault information is mandatory; a minimal sketch of such a record is given after the list below. Furthermore, we have learned that the effort spent by external researchers to produce useful results based on the available data is quite small compared to the collective effort spent by developers recording these data. This shows that very little effort may give substantial effects for many software developing organizations. Finally, there are two main points we want to convey as a result of the studies we have presented:

• It is important to be able to approach the subject of fault data analysis with a bottom-up approach, at least in the early phases of such research and analysis initiatives. The data is readily available; the work that has to be performed is designing and carrying out a study of these data.

• Much of the recorded fault data is of poor quality. This is most likely because of the lack of interest in using the data.
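To make the idea of a standard fault report with mandatory fields concrete, the following sketch outlines one possible record. The field names and example values are our own assumptions, not the schema of any of the studied organizations.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class FaultReport:
        identifier: str
        description: str
        fault_type: str                 # e.g. an ODC-style type such as "Function" or "Checking"
        severity: int                   # e.g. 1 (critical) to 5 (enhancement)
        detection_phase: str            # e.g. "design review", "system test", "field use"
        location: str                   # affected component or module
        correction_effort_hours: float
        priority: Optional[str] = None  # optional, organization-specific

    report = FaultReport(
        identifier="PRJ-1042",
        description="Interest calculation ignores leap years",
        fault_type="Function",
        severity=2,
        detection_phase="system test",
        location="interest-engine",
        correction_effort_hours=6.5,
    )
    print(report)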

We are planning to write a final paper, P8, combining lessons learned from Studies 3 and 5, cf. Sections 4.3 and 4.5. This is partly in response to very positive review comments on paper P3. The preliminary paper is presented as a technical report in Appendix A.

Contributions of Study 7: This study directly identifies issues that are common shortcomings in fault reporting, and suggests actions to improve and support the use of fault report analysis as a tool for process improvement. These findings support our main contribution C2.


5 Evaluation and Discussion

This chapter answers the four research questions RQ1-RQ4 based on the results, and discusses the relations between the thesis contributions and the research questions. The research context, papers and BUCS goals are also discussed, along with validity threats and experiences from industrial cooperation.

5.1 Contributions

From Section 1.4 we reiterate the main contributions of this thesis and elaborate on them:

C1. “Describing how to utilize safety criticality techniques to improve the development process for business-critical software.”

• We have described ways of integrating safety criticality techniques with regular development practices to improve the development process for business-critical software. We have proposed integrating safety techniques like PHA and HazOp into early development phases in order to help improve the safety and reliability of the resulting software [P1], although this has not been validated industrially. In addition, we have shown that the PHA technique is useful in eliciting hazards that are related to faults introduced in early development process phases [P7].

C2. “Identification of typical shortcomings in fault reporting.”

• Through our studies on fault reports, we have described several issues concerning shortcomings in fault reporting. The most striking is that commercial organizations generally do not exploit the fault report data they possess for more than day-to-day fault logging or, at most, shallow analysis. Additionally, it is clear that fault reporting is treated more as a necessary chore than as a potential source for process improvement. Fault reports are often inaccurate, incomplete or incomprehensible, which makes them poorly suited for reuse in analysis. In addition, fault data that could easily have been recorded for process improvement gains, e.g. correction effort or fault location, is often not even considered in fault reports.

C3. "Improved model of fault origins and types for business-critical software."

• We have described studies that give insight into which fault types are the most common or severe in business-critical software. We found that the most common faults were ones that originated from early process phases, namely specification and design. We have also shown that certain fault types tend to be more severe than others [P2][P4].

These contributions were described more briefly in Section 1.4. Table 5-1 shows the relationship between the contributions C1-C3 and research questions RQ1-RQ4.

Table 5-1 Relationship of contributions and research questions. Contribution RQ1 RQ2 RQ3 RQ4 C1 X X C2 X X C3 X X X

5.1.1 Contributions related to BUCS goals

The relation between our contributions and the BUCS goals, as defined in Section 1.1, is now considered:

BG1 To obtain a better understanding of the problems encountered by Norwegian industry during development, operation and maintenance of business-critical software.

Regarding BG1, we have found that early development phases like specification and design are the source of a high number of faults in software. Lack of communication and of adequate tools and processes for describing development difficulties in these phases seem to be the main problem. It is thought that the work in this thesis advances the state-of-the-art of software engineering for business-critical software as defined by our contributions C1-C3. A better understanding of the problems encountered by Norwegian industry is achieved, as reflected in contributions C1 and C3.

BG2 Study the effects of introducing safety-critical methods and techniques into the development of business-critical software, to reduce the number of system failures (increased reliability).

Our studies on fault reports suggest concrete measures to reduce the largest group of faults found in studies of business-critical software, in our contribution C3. In addition, we have found that lightweight hazard analysis such as the PHA method is useful in eliciting hazards that could be avoided to reduce the number of faults originating in early development phases, from our contribution C1.

BG3 Provide adapted and annotated methods and processes for business-critical software.

Although the goal BG3 has not been an explicit focus of this thesis, we describe how fault report analysis and certain hazard analysis methods can be used to improve the development process, related to C1.

BG4 Package and disseminate the effective methods into Norwegian software industry.

Page 69: Software Fault Reporting Processes in Business …...Software Fault Reporting Processes in Business-Critical Systems Jon Arvid Børretzen Doctoral Thesis Submitted for the partial

57

Most results are published, or planned to be published, and have been presented at international and national conferences and workshops. During this thesis work, several Masters students have directly or indirectly been involved in activities, project work or Masters theses concerning business-critical systems and the BUCS project. Furthermore, the knowledge gained from our studies in commercial organizations has been disseminated back to them through reports and internal workshops. This relates to all contributions C1-C3.

5.2 Contribution of this thesis vs. literature

In this section we present how our results and contributions compare with the state-of-the-art. Looking at the wide perspective, our research on business-critical systems and software has proven not to be directly comparable with much of the literature on software engineering. This is something we were aware of from the start of the BUCS project. The introduction of safety-related methods into "regular" software engineering is not common; many of the methods are still regarded as resource-hungry and rigid, and this is difficult to combine with the emergence of agile and other lightweight methods [Beck99]. On the other hand, there are many types of systems that demand a more rigorous development process to ensure reliability and related qualities (e.g. financial systems), and for these types of systems we have contributed both on a process level and with techniques that could be applicable.

In our work, we have proposed a novel method for doing fault inspections of specification and design documents [P6]. This adds to the existing literature on inspections, for instance that of Basili et al. concerning perspective-based reading [Basili00, Shull00]. The results of our quantitative studies on fault reports [P2][P4] show that in many systems, faults originating from the specification and design phases constitute a major part of the total number of faults found in testing. This is in line with the findings of Vinter et al. [Vinter00], but in contrast to findings by Eldh et al., where a common type of early process fault (function) was not very frequent [Eldh07]. Our fault report study of a small and simple system in [P7] did, however, show that systems have different fault profiles. This may be a result both of the type of system and of the development method used when designing and implementing the system.

Further, we have discussed the need for improving fault reporting as a support tool for process improvement. Several sources present fault management processes as useful for such improvement in a software organization, among others [Grady92, Chillarege92]. We support this stance and suggest how to better utilize the available fault information [P3][P4][P6][P8].


5.3 Revisiting the Thesis Research Questions, RQ1-RQ4

In answering our four research questions, we have the following:

RQ1. What is the role of fault reporting in existing industrial software development?

a) Fault reporting seems to generally be underused and undervalued. Our experience is that the recorded data is often not of high quality, which not only makes any analysis hard, but also diminishes the usefulness of the fault reports for fixing faults.

b) All of the software-developing organizations we studied have a fault reporting system in operation, but its use differs substantially. The most basic fault reporting systems are only used as a means to document faults that have been found and are to be corrected, but more advanced use of the available data can easily be arranged.

c) Even where fault report data is thoroughly recorded and stored, it is not systematically used as a tool for software process improvement. A lot of detailed information is stored in the fault management systems of software organizations, but is never used beyond the simplest applications.

RQ2. How can we improve existing fault reporting processes?

a) Developers should be more conscious about the potential for improvement through analyzing fault reports. Only through feedback on quality/fault data can an organization "learn from its mistakes".

b) We need more formalized reporting schemes, and clearly defined procedures for reporting faults.

c) Introduce updated fault reporting schemes (fault type, severity, priority, effort, location, etc.) adapted to the organization's needs, so that correct and complete information is reported. There is a need for a process that looks at the requirements and possibilities in each organization.

RQ3. What are the most common and severe fault types, and how can we reduce them in number and severity?

a) P2 and P4 show that the most common fault type is the "function" fault type, i.e. faults originating in the specification and design phases of development. "GUI" faults are also numerous, and can in many cases also be related to the specification and design phases.

b) Our studies on safety-critical analysis techniques have shown that the PHA technique is a useful tool for eliciting hazards that can be related to the fault types that are most common [P1] [P6].

RQ4. How can we use safety analysis techniques together with failure report analysis to improve the development process?

a) In P6, we have found that the PHA technique is useful for eliciting hazards that can be related to faults that are commonly introduced in early development phases.

Page 71: Software Fault Reporting Processes in Business …...Software Fault Reporting Processes in Business-Critical Systems Jon Arvid Børretzen Doctoral Thesis Submitted for the partial

59

5.4 Evaluation of validity

For the validity of the work in this thesis, there are some overall issues to be discussed. Initial validity concerns for the individual studies are discussed per study in Section 3, as well as in each individual paper. To improve the validity of the studies seen as a whole, some possible actions can be performed:

1. Replication of studies, both over time and in other organizations. This applies especially to the quantitative studies, in order to track development over time and to ensure that the results are generalizable. Example: our fault report studies of projects from five different organizations show very similar main results for most projects [P2, P4].

2. Using different research strategies to triangulate the research results. By using different research methods for the same study objects etc., we increase the validity of the results. Examples: Fault report study combined with interview sessions on the topic of fault report management [P4,P6], combining a qualitative study and a quantitative one in the DAIM study [P7].

Wohlin et al. define four main categories of validity threats [Wohlin00], which are further discussed in the next section, for different types of studies performed.

5.4.1 Quantitative studies: construct, internal, conclusion and external validity

Studies 3, 4 and 6 used quantitative methods, and were mostly concerned with analyzing fault report data. These data were collected from existing fault report collections created through the organizations' internal measurement practices. Our contribution was the categorization of faults in the data where this had not already been performed. Some threats to the validity of the quantitative studies, and how these were handled, are described here:

• Construct validity: In Study 6, the main threat to construct validity is the conceptual difference between hazards and faults. We had to perform a conversion of the hazards found into potential fault types. It should be verified whether this type of hazard-to-fault-type conversion is consistently correct, but during the hazard analysis there was a discussion of how each hazard could influence the system, and in many cases a software fault was proposed.

• Internal validity: In Study 3, the greatest threat to internal validity is missing data in the fault reports. Many fault reports were not described well enough to be categorized and had to be left out. In certain fault reports, the fault had been classified by the developers, and they may have had a different opinion of the fault types than we had. In addition, with respect to the severity of faults, it is not certain that the developers reporting a fault necessarily reported its true severity. In Study 6, the hazard analysis sessions were time limited, so only the most obvious hazards were taken into account. Also, these sessions were performed over a period of time, so some maturation in the form of a better understanding of the system being analyzed may have occurred.

• Conclusion validity: One possible threat to conclusion validity in Studies 3 and 4 is low reliability of measures, because of some missing and ambiguous data. Because categorization of faults into fault types is a subjective task, it was important that the data we based the categorization on were correct and understandable. To prevent mistakes, we added an "unknown" type to filter out the faults we were not able to categorize confidently. The subjective nature of categorization is also a threat to conclusion validity.

• External validity: One threat to external validity is the relatively low number of projects studied, especially in Study 6, where we only studied one project, but also partly in Studies 3 and 4. In Study 6 we were not able to gain access to system documentation for more systems for which we also had fault report data. The projects under study may also not necessarily be typical business-critical systems, but this is hard to verify.

5.4.2 Qualitative studies: internal and external validity

Studies 1, 2, 5 and partly 6 are qualitative studies, mostly explorative and descriptive in nature. The collected data come mainly from interviews and other subjective techniques (PHA) and are subject to interpretation. Here we have identified internal and external validity threats as the most serious.

• Internal validity. For Study 5, the main internal validity threat is that the same person performed the interviewing, transcribing and information coding, which may introduce bias into how responses have been interpreted. By having workshops as feedback sessions after the interviews, we believe this bias has been reduced.

• External validity. In Study 1, the main validity threat was the low number of organizations interviewed, and in Study 5 all interviews were performed with representatives from the same organization, although this is explained by the need to interview the people who had been involved in the specific projects we had studied earlier.

5.5 Industrial relevance of results

As many of our studies involved industrial data, our results were interesting not only to us, but also to the organizations the data was collected from. We were therefore able to present our results to the organizations and receive feedback both on the results of the studies and on how we should interpret them. In general, the organizations received general reports on the results, but also a specific report concerning the results from their own organization. After Study 4, a workshop was held in order to convey our results to the organization as well as to receive further feedback.


5.6 Reflection: Research cooperation with industry

Both the BUCS and the EVISOFT research projects are based on cooperation with industrial partners. Whereas EVISOFT had a number of industrial organizations involved from the project start, the BUCS project had no formal connections to any industrial partners as the project got under way. This meant that some effort had to be made to initiate contact and establish agreements with organizations in order to collect research data. The hardest part of the industrial cooperation was establishing contact and agreeing on what was going to be done. In Study 3, the first fault report study, we initially contacted over 40 different organizations developing business-critical software. Despite many initially positive responses, we ended up only being able to use fault report data from four of them. There were two serious barriers to setting up cooperation with commercial organizations. Firstly, we experienced unwillingness by such organizations to disclose information about faults and failures in their systems, despite promises of anonymization. Secondly, many of the organizations decided that they were not able to spare the effort to facilitate our data collection, due to their own deadlines. In addition to this, a few organizations chose to end their cooperation with us before the data had been analyzed, because of resource issues. Finally, there was the issue of lack of communication: in one instance we were ready to collect data for analysis when it turned out that all but one fault report had been deleted from their fault management system.

When performing the second fault report study, we were in contact with an organization that was already involved in the EVISOFT project as a participating partner, which made establishing contact and a research agreement much simpler. However, a common issue throughout all our industrial cooperation was that since we were external researchers who were just collecting and analyzing existing data, we were not part of a planned sequence of events for the organization, and therefore were not prioritized when times were busy.


6 Conclusions and future work

This thesis presents the results from several empirical studies investigating the management of fault reports from a business-critical software perspective. This is augmented by work concerning business-critical software in general. We have combined literature studies, quantitative studies of historical data sources, qualitative studies through interviews of industry representatives, and a case study using both qualitative and quantitative methods. By combining different empirical strategies in a mixed-method research design, we could integrate results and answer questions that had not been answered previously. This work analyzed historical fault data that the source organizations had not analyzed in such a manner and to this extent. The results were backed up with interviews and feedback from the involved organizations to improve the validity of the results.

6.1 Conclusions

6.1.1 Fault reporting as a tool for process improvement

Our findings show that there is much to gain by using fault report data to support process improvement through the reduction of faults. Our analyses showed that a large number of faults had their origin in early development phases, something some of the organizations had suspected but had not been able (or willing) to quantify. We also uncovered a lack of consistency in fault reporting. Fault reports within an organization often did not follow a strict standard, which could make it difficult to use the data in an analytic fashion. Another finding is that many software organizations possess data resources concerning their own products and processes that they do not exploit fully. Through better recording of available information and simple analysis, many organizations would be able to focus process improvement initiatives better. In addition, our work has also included literature studies of fault categorization schemes. We have described how fault categorization and subsequent fault report analysis could identify improvement areas in the development process.

6.1.2 Empirical findings

Page 76: Software Fault Reporting Processes in Business …...Software Fault Reporting Processes in Business-Critical Systems Jon Arvid Børretzen Doctoral Thesis Submitted for the partial

64

During our fault report studies of several industrial projects, we have presented results on fault type frequency and severity that seem to be valid and general for larger business-critical applications. Some fault types have been shown to be considerably more frequent than others, and we have identified fault types that are likely to be more severe than others. Drawing on the experience of others, we have concluded that many of the occurrences of the most frequent fault types that are reported have their origins in early phases like system specification and design.

6.1.3 Software safety and reliability from a fault perspective

The overall contribution of this thesis is showing how a focus on fault management and reporting in the software development process may pinpoint areas of improvement in terms of software safety and reliability. We have also proposed how to utilize techniques taken from safety analysis in software development to elicit and record possible faults in the software. Our conclusion is that such techniques should be used early in development, both because suitable techniques like PHA work well in early process phases, and because identifying and correcting faults early is more efficient than correcting them in later phases.

6.2 Future Work

This work has covered several aspects of fault management and the use of hazard analysis techniques to improve the process of developing business-critical software. Still, we see the need for more work in these areas, and the following sections propose possible directions for future work.

6.2.1 Following fault reporting throughout the development process

The software projects under study during this thesis have all been more or less completed development projects. Thus, we have not been able to get reports from all phases of the development projects. The faults found and fixed in design phases, and in many cases also during unit testing in implementation, have not been studied. By including this information in fault studies, we could learn even more about the potential of fault report analysis as a process improvement tool.

6.2.3 Further studies of Hazard Analysis results and fault reports


Combining hazard analysis and fault report analysis showed that hazard identification could be helpful in eliciting possible hazardous events caused by faults that may exist in the system. Unfortunately, the system we studied had a very different fault type profile (mostly coding faults) from the other systems we had studied. This may have been a contributing reason for the lack of actual faults being identified by hazard analysis, although the number of potential faults found was high. By performing a similar study on a system where the fault profile is more skewed towards faults introduced in early development phases, we may find that a larger portion of the faults can be identified by PHA and similar techniques. This would help validate hazard analysis as a technique for reducing faults.


Glossary

Term definitions
To address the relevant issues, we need reasonably precise definitions of the terms used. The following contains short definitions of some terms. Where relevant, they are re-iterated and elaborated in the thesis. These terms are mostly taken from [Conradi07].

Availability: The degree to which a system or component is operational and accessible when required for use [IEEE 610.12]. (Thus reliability means that it continues to be available.)

BUCS: BUsiness Critical Software – a basic R&D project at NTNU in 2003-2007 under the ICT-2010 program at the Research Council of Norway, led by Tor Stålhane. See http://www.idi.ntnu.no/grupper/su/bucs.html

Business-critical: The ability of core computer and other support systems of a business to have sufficient QoS to preserve the stability of the business [Sommerville04].

Business-critical systems: Systems whose failure could threaten the stability of the business.

Criticality: A state of urgency. In this context used to signify the graveness of the effects a failure (i.e. erroneous external behaviour) in a system can have.


Dependability: The trustworthiness of a computing system which allows reliance to be justifiably placed on the service it delivers [Avizienis01], an integrating concept that encompasses the following attributes:
• Availability: readiness for correct service;
• Reliability: continuity of correct service;
• Safety: absence of catastrophic consequences on the user(s) and the environment;
• Security: the concurrent existence of (a) availability for authorized users only, (b) confidentiality, and (c) integrity.
In the later [Avizienis04], security is split off as a separate quality, and dependability is rephrased as:
• Availability: readiness for correct service;
• Reliability: continuity of correct service;
• Safety: absence of catastrophic consequences on the user(s) and the environment;
• Integrity: absence of improper system alterations;
• Maintainability: ability to undergo modifications and repairs.

Error:
• That at least one (or more) internal state of the system deviates from the correct service state. The adjudged or hypothesized cause of an error is called a fault. In most cases, a fault first causes an error in the service state of a component that is a part of the internal state of the system and the external state is not immediately affected. … many errors do not reach the system's external state and cause a failure [Avizienis04].
• The difference between a computed, observed, or measured value or condition and the true, specified, or theoretically correct value or condition. For example, a difference of 30 meters between a computed result and the correct result [IEEE 610.12].

Failure:
• The non-performance or inability of the system or component to perform its intended function for a specified time under specified environmental conditions. A failure may be caused by design flaws – the intended, designed and constructed behavior does not satisfy the system goal [Leveson95].
• The inability of a system or component to perform its required function within specified performance requirements [IEEE 610.12].
• Since a service is a sequence of the system's external states, a service failure means that at least one (or more) external state of the system deviates from the correct service state [Avizienis04].

Fault: An incorrect step, process, or data definition in a computer program [IEEE 610.12].


FMEA: Failure Mode and Effects Analysis (FMEA) is a risk assessment technique for systematically identifying potential failures in a system or a process.

Hazard:
• A hazard is a physical situation with a potential for human injury [IEC 61508].
• A state or set of conditions that, together with other conditions in the environment, will lead to an accident (loss event). Note that a hazard is not equal to a failure [Leveson95].
• A software condition that is a prerequisite to an accident [IEEE 1228].

HazOp: Hazard and Operability analysis is a systematic method for examining complex facilities or processes to find actual or potentially hazardous procedures and operations so that they may be eliminated or mitigated.

Performance: The speed or volume offered by a service, e.g. delay/transmission time for data communication, storage capacity in a database, image resolution on a screen, or sound quality over a telephone line.

Quality:
• The degree to which a system, component or process meets specified requirements.
• The degree to which a system, component or process meets customer or user needs or expectations [IEEE 610.12].
• Ability of a set of inherent characteristics of a product, system or process to fulfil requirements of customers and other interested parties [ISO 9000].

Quality of Service (QoS):
• In telephony, QoS can simply be defined as "user satisfaction with the service" [ITU-T E.800].
• "A set of quality requirements on the collective behavior of one or more objects" [ITU-T X.902].
Comment: That is, the behavioral properties of a service must be acceptable (of high enough quality) for the user, which can be another system, an end-user, or a social organization. Such properties encompass technical aspects like dependability (i.e. trustworthiness), security, and timely performance (transfer rate, delay, jitter, and loss), as well as human-social aspects (from perceived multimedia reception to sales, billing, and service handling). NB: not defined in IEEE 610.12. See the popular paper on QoS [Emstad03], where the more subjective term QoE (Quality of Experience) is introduced, and also [Cekro99].


Reliability:
• The characteristic of an item expressed by the probability that it will perform its required function in the specified manner over a given time period and under specified or assumed conditions. Reliability is not a guarantee of safety [Leveson95].
• Continuity of correct service [Avizienis04].
• The ability of a system or component to perform its required functions under stated conditions for a specified period of time [IEEE 610.12].
• A set of attributes that bear on the capability of software to maintain its level of performance under stated conditions for a stated period of time [ISO 9126].
Often measured as Mean-Time-To-Failure (e.g. 1 year), failure rate (e.g. 10^-9 per second) or fault density (e.g. 7 faults/KLOC).
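As a purely illustrative arithmetic example of these measures (the numbers below are invented for the example, not measurements from the systems studied in this thesis):

```python
# Illustrative arithmetic only; all numbers are invented for the example.
hours_of_operation = 8760.0        # one year of continuous operation
observed_failures = 2
faults_found = 140
size_kloc = 20.0                   # size in thousand lines of code

mttf_hours = hours_of_operation / observed_failures             # 4380 hours
failure_rate_per_hour = observed_failures / hours_of_operation  # about 2.3e-4 per hour
fault_density = faults_found / size_kloc                        # 7 faults/KLOC

print(f"MTTF: {mttf_hours:.0f} hours")
print(f"Failure rate: {failure_rate_per_hour:.2e} per hour")
print(f"Fault density: {fault_density:.1f} faults/KLOC")
```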

Requirement:
1. A condition or capability needed by a user to solve a problem or achieve an objective.
2. A condition or capability that must be met or possessed by a system or system component to satisfy a contract, standard, specification, or other formally imposed documents.
3. A documented representation of a condition or capability as in 1) or 2) [IEEE 610.12].

Robustness: The ability to limit the consequences of an active error or failure, in order to resume (partial) service. Ways to improve this attribute are duplication, repair, containment etc.

RUP: The Rational Unified Process [Kruchten00] [Kroll03], an incremental development process based around UML [Fowler04].

Safety:
• Freedom from unacceptable risk of physical injury or of damage to the health of people, either directly or indirectly as a result of damage to property or to the environment [IEC 61508].
• Freedom from software hazards [IEEE 1228].

Security: Protection against unauthorized access (e.g. read/write/search) of data/information. Remedy: encryption and strict access control, e.g. by passwords and physical barriers.

Software: Computer programs, procedures and possibly associated documentation and data pertaining to the operation of a computer system [IEEE 610.12].


Software Safety: Features and procedures which ensure that a product performs predictably under normal and abnormal conditions, thereby minimizing the likelihood of an unplanned event occurring, controlling and containing its consequences, and preventing accidental injury, death, destruction of property and/or damage to the environment, whether intentional or unintentional [Herrmann99].

Survivability: The degree to which essential services continue to be provided in spite of either accidental or malicious harm [Firesmith03].

System: An entity that interacts with other entities, i.e., other systems, including hardware, software, humans, and the physical world with its natural phenomena. These other systems are the environment of the given system. The system boundary is the common frontier between the system and its environment.


References

[Aune00] Aune, A.: Kvalitetsdrevet ledelse, kvalitetsstyrte bedrifter. Gyldendal Norsk Forlag, Oslo, 2000.

[Avison99] Avison, D., Lau, F., Myers, M.D., Nielson, P.A.: Action Research. Communications of the ACM, 42(1), pp. 94-97, January 1999.

[Avizienis04] Avizienis, A., Laprie, J.-C., Randell, B., Landwehr, C.: Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing, 1(1), pp. 11-33, Jan.-March 2004.

[Bachmann00] Bachmann, F., Bass, L., Buhman, C., Comella-Dorda, S., Long, F., Robert, J., Seacord, R., and Wallnau, K.: Volume II: Technical Concepts of Component-Based Software Engineering. SEI Technical Report CMU/SEI-2000-TR-008, 2000, available at: http://www.sei.cmu.edu/

[Basili94] Basili, V.R., Caldiera, G., Rombach, H.D.: Goal Question Metric Paradigm. In: Marciniak, J.J. (ed.): Encyclopaedia of Software Engineering, pp. 528-532, Wiley, New York, 1994.

[Basili00] Basili, V., Green, S., Laitenberger, O., Shull, F., Sorumgaard, S., and Zelkowitz, M.: The Empirical Investigation of Perspective-Based Reading. Empirical Software Engineering: An International Journal, 1(2), pp. 133-164, October 1996.

[Beck99] Beck, K.: Extreme Programming Explained: Embrace Change. ISBN 0201616416, Addison-Wesley, Boston, 1999.

[Boehm88] Boehm, B.W.: A Spiral Model of Software Development and Enhancement. IEEE Computer, 21(5), pp. 61-72, May 1988.

[Boehm91] Boehm, B.W.: Software Risk Management: Principles and Practices. IEEE Software, 17(1), pp. 32-41, January 1991.

[Boehm03] Boehm, B.: Value-Based Software Engineering. ACM Software Engineering Notes, 28(2), pp. 1-12, March 2003.


[Bishop98] Bishop, P.G., Bloomfield, R.E.: A Methodology for Safety Case Development. Proceedings of the Safety-critical Systems Symposium, Birmingham, UK, Feb. 1998.

[Cekro99] Cekro, Z.: Quality of Service – Overview of Concepts and Standards. Report for COST 256, Free University of Brussels, April 1999, available from http://www.iihe.ac.be/internal-report/1999/COSTqos.doc.

[Charette05] Charette, R.N.: Why Software Fails. IEEE Spectrum, September 2005.

[Chillarege92] Chillarege, R., Bhandari, I.S., Chaar, J.K., Halliday, M.J., Moebus, D.S., Ray, B.K., Wong, M.-Y.: Orthogonal defect classification - a concept for in-process measurements. IEEE Transactions on Software Engineering, 18(11), pp. 943-956, Nov. 1992.

[Chillarege02] Chillarege, R., Prasad, K.R.: Test and development process retrospective - a case study using ODC triggers. Proceedings of the International Conference on Dependable Systems and Networks (DSN'02), pp. 669-678, Bethesda, USA, 2002.

[Conradi03] Conradi, R. (Ed.): Software engineering mini glossary. IDI, NTNU, available from http://www.idi.ntnu.no/grupper/su/se-defs.html, August 2003.

[Conradi07] Conradi, R. (Ed.): Mini-glossary of software quality terms, with emphasis on safety. IDI, NTNU, available from http://www.idi.ntnu.no/grupper/su/publ/ese/se-qual-glossary-v3_0-rc-4jun07.doc, June 2007.

[Crnkovic02] Crnkovic, I., Larsson, M.: Building reliable component-based software systems. Artech House, Boston, 2002.

[Dawkins97] Dawkins, S., Kelly, T.: Supporting the use of COTS in safety critical applications. IEE Colloquium on COTS and Safety Critical Systems (Digest No. 1997/013), pp. 8/1-8/4, 28 Jan. 1997.

[Dybå00] Dybå, T., Wedde, K.J., Stålhane, T., Moe, N.B., Conradi, R., Dingsøyr, T., Sjøberg, D.I.K., Jørgensen, M.: SPIQ Metodehåndbok. Department of Informatics, University of Oslo, Research Report (282), 2000.

[Eldh07] Eldh, S., Punnekkat, S., Hansson, H., Jönsson, P.: Component Testing Is Not Enough - A Study of Software Faults in Telecom Middleware. Proceedings of the 19th IFIP International Conference on Testing of Communicating Systems TESTCOM/FATES 2007, pp. 74-89, Tallinn, Estonia, June 2007.

[El Emam98] El Emam, K., Wieczorek, I.: The repeatability of code defect classifications. Proceedings of The Ninth International Symposium on Software Reliability Engineering, pp. 322-333, Paderborn, Germany, 4-7 Nov. 1998.


[Emstad03] Emstad, P.J., Helvik, B.E., Knapskog, S.J., Kure, Ø., Perkis, A., Swensson, P.: A Brief Introduction to Quantitative QoS. In Annual Report for 2003 from Q2S Centre of Excellence, NTNU, pp. 18-29, 2003.

[EVISOFT] EVISOFT project, available at: http://www.idi.ntnu.no/grupper/su/evisoft.html, 2006.

[Fairley85] Fairley, R.: Software Engineering Concepts. McGraw-Hill, 1985.

[Fenton97] Fenton, N., Pfleeger, S.L.: Software metrics (2nd ed.): a rigorous and practical approach. PWS Publishing Co., Boston, 1997.

[Firesmith03] Firesmith, D.G.: Common Concepts Underlying Safety, Security, and Survivability Engineering. Technical Note CMU/SEI-2003-TN-033, Software Engineering Institute, Pittsburgh, Pennsylvania, December 2003.

[Fowler04] Fowler, M.: UML Distilled. Third Edition, Addison-Wesley, 2004.

[Freimut01] Freimut, B.: Developing and using defect classification schemes. IESE-Report No. 072.01/E, Version 1.0, Fraunhofer IESE, Sept. 2001.

[Gamma95] Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design patterns: Elements of reusable object-oriented software. Addison Wesley, 1995.

[Glass94] Glass, R.L.: The Software Research Crisis. IEEE Software, 11(6), pp. 42-47, Nov. 1994.

[Grady92] Grady, R.: Practical Software Metrics for Project Management and Process Improvement. Prentice Hall, 1992.

[Heimdahl98] Heimdahl, M.P.E., Heitmeyer, C.L.: Formal methods for developing high assurance computer systems: working group report. Proceedings of the 2nd IEEE Workshop on Industrial Strength Formal Specification Techniques, pp. 60-64, 21-23 Oct. 1998.

[Heineman01] Heineman, G.T., Councill, W.T.: Component-Based Software Engineering. Addison-Wesley, Boston, 2001.

[Herrmann99] Herrmann, D.S., Peercy, D.E.: Software reliability cases: the bridge between hardware, software and system safety and reliability. Proceedings of the Annual Reliability and Maintainability Symposium, pp. 396-402, Washington, DC, USA, 18-21 Jan. 1999.


[Hongxia01] Hongxia, J., Santhanam, P.: An approach to higher reliability using software components. Proceedings of the 12th International Symposium on Software Reliability Engineering, pp. 2-11, Hong Kong, China, 27-30 Nov. 2001.

[IEC61508] IEC: Functional safety and IEC 61508 – A basic guide. 11 p., Geneva, Switzerland, available from http://www.iee.org/oncomms/pn/functionalsafety/HLD.pdf, Nov. 2002.

[IEEE 1228] IEEE: Standard for Software Safety Plans, IEEE STD 1228-1994. 17 logical p. of 23 physical pages.

[IEEE 1044] IEEE: Standard Classification for Software Anomalies, IEEE STD 1044-1993. December 2, 1993.

[IEEE 610.12] IEEE: IEEE Standard Glossary of Software Engineering Terminology, IEEE STD 610.12-1990. 84 p., created in 1990 and reaffirmed in 2002.

[ISO91] ISO: ISO/IEC 9126 - Information technology - Software evaluation – Quality characteristics and guidelines for their use. December 1991.

[ISO 9000] ISO: Quality management and quality assurance standards, Part 1: Guidelines for selection and use, ISO 9000-1. Geneva, 1994.

[ISO 9001] ISO: Quality Management Systems - Requirements for quality assurance, ISO 9001:2000. Geneva, 2000.

[ITU-T E.800] ITU: Telephone Network and ISDN, Quality of Service, Network Management and Traffic Engineering – Terms and Definitions Related to Quality of Service and Network Performance Including Dependability, ITU-T Recommendation E.800. 54 p., Geneva, Switzerland, August 1994.

[ITU-T X.902] ITU: Open Distributed Processing – Reference Model – Part 2: Foundations, ITU-T Recommendation X.902. 20 p., Geneva, Switzerland, 1995.

[Jarke93] Jarke, M., Bubenko, J.A., Rolland, C., Sutcliffe, A., Vassiliou, Y.: Theories Underlying Requirements Engineering: An Overview of NATURE at Genesis. Proceedings of the IEEE Symposium on Requirements Engineering, pp. 19-31, IEEE Computer Society Press, San Diego, January 1993.

[Kohl99] Kohl, R.J.: Establishing guidelines for suitability of COTS for a mission critical application. Proceedings of The Twenty-Third Annual International Computer Software and Applications Conference, COMPSAC '99, pp. 98-99, Phoenix, AZ, USA, 27-29 Oct. 1999.

[Kroll03] Kroll, P., Kruchten, P.: The Rational Unified Process Made Easy: A Practitioner's Guide to Rational Unified Process. Addison Wesley, Boston, 2003.


[Kropp98] Kropp, N.P., Koopman Jr., P.J., Siewiorek, D.P.: Automated Robustness Testing of Off-the-Shelf Software Components. Proceedings of the 29th Symposium on Fault-Tolerant Computing, pp. 230-239, Madison, Wisconsin, USA, June 15-18, 1999.

[Kruchten00] Kruchten, P.: The Rational Unified Process. An Introduction. Addison-Wesley, Boston, 2000.

[Laprie95] Laprie, J.-C.: Dependable computing and fault tolerance: Concepts and terminology. Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing, Pasadena, California, June 27-30, 1995.

[Leveson95] Leveson, N.: Safeware: System safety and computers. Addison Wesley, 1995.

[Leveson07] Leveson, N.: System Safety Engineering: Back To The Future (web version of updates to 1995 book), available from http://sunnyday.mit.edu/book2.pdf, 2007.

[Li06] Li, J., Bjoernson, F.O., Conradi, R., Kampenes, V.B.: An Empirical Study of Variations in COTS-based Software Development Processes in Norwegian IT Industry. Journal of Empirical Software Engineering, 11(3), pp. 433-461, 2006.

[Littlewood00] Littlewood, B., Strigini, L.: Software reliability and dependability: a roadmap. Proceedings of the Conference on The Future of Software Engineering, 22nd International Conference on Software Engineering, pp. 175-188, Limerick, Ireland, June 2000.

[Mohagheghi04] Mohagheghi, P., Conradi, R., Killi, O.M., Schwarz, H.: An Empirical Study of Software Reuse vs. Defect Density and Stability. In Proceedings of the 26th International Conference on Software Engineering (ICSE'04), pp. 282-292, Edinburgh, Scotland, May 2004.

[Mohagheghi04b] Mohagheghi, P.: The Impact of Software Reuse and Incremental Development on the Quality of Large Systems. PhD Thesis, NTNU 2004:95, ISBN 82-471-6408-6, 10 July 2004.

[Mohagheghi04c] Mohagheghi, P., Conradi, R.: Exploring Industrial Data Repositories: Where Software Development Approaches Meet. In Proceedings of the 8th ECOOP Workshop on Quantitative Approaches in Object-Oriented Software Engineering (QAOOSE'04), 9 p., Oslo, Norway, 15 June 2004.

[Mohagheghi06] Mohagheghi, P., Conradi, R., Børretzen, J.A.: Revisiting the Problem of Using Problem Reports for Quality Assessment. Proceedings of the 4th Workshop on Software Quality, held at ICSE'06, Shanghai, pp. 45-50, 21 May 2006.


[Moløkken04] Moløkken-Østvold, K.J., Jørgensen, M., Tanilkan, S.S., Gallis, H., Lien, A.C., Hove, S.E.: Results from the BEST-Pro (Better Estimation of Software Tasks and Process Improvement) survey. Simula Report 2004-03, 2004.

[Neumann07] Neumann, P.G.: The Risks Digest. Available from: http://catless.ncl.ac.uk/Risks/, 2007.

[Parnas03] Parnas, D.L., Lawford, M.: The role of inspection in software quality assurance. IEEE Transactions on Software Engineering, 29(8), pp. 674-676, Aug. 2003.

[Price99] Price, J.: Christopher Alexander's pattern language. IEEE Transactions on Professional Communication, 42(2), pp. 117-122, June 1999.

[Rational] Rational Software, available at: http://www-306.ibm.com/software/rational/, 2007.

[Rausand91] Rausand, M.: Risikoanalyse. Tapir Forlag, Trondheim, 1991.

[Riehle96] Riehle, D., Zullighoven, H.: Understanding and Using Patterns in Software Development. Theory and Practice of Object Systems, 2(1), pp. 3-13, 1996.

[Royce70] Royce, W.W.: Managing the Development of Large Software Systems. Proceedings of IEEE WESCON, pp. 1-9, August 1970.

[SAP] SAP AG: SAP ERP, http://www.sap.com/index.epx

[Schneidewind98] Schneidewind, N.F.: Methods for assessing COTS reliability, maintainability, and availability. Proceedings of the IEEE International Conference on Software Maintenance, pp. 224-225, Bethesda, Maryland, USA, 16-20 Nov. 1998.

[Seaman99] Seaman, C.B.: Qualitative Methods in Empirical Studies of Software Engineering. IEEE Transactions on Software Engineering, 25(4), pp. 557-572, July/August 1999.

[SEI] Carnegie Mellon Software Engineering Institute: Performance-Critical Systems (PCS) Introduction. Available from: http://www.sei.cmu.edu/pcs/introduction.html, 2007.

[Shull00] Shull, F., Russ, I., Basili, V.: How Perspective-Based Reading Can Improve Requirements Inspections. IEEE Computer, 33(7), pp. 73-79, July 2000.

[Solingen99] van Solingen, R., Berghout, E.: The Goal/Question/Metric Method. McGraw Hill, 1999.


[Sommerville04] Sommerville, I.: Software Engineering. 7th edition, Addison-Wesley, 2004.

[Stålhane02] Stålhane, T., Conradi, R., Sjøberg, D.: Proposal for BUCS project. pp. 1-29, October 2002.

[Stålhane03] Stålhane, T., Myhrer, P.T., Lauritsen, T., Børretzen, J.A.: Intervju med utvalgte norske bedrifter omkring utvikling av forretningskritiske systemer. Internal BUCS report, 6 pages, available at: http://www.idi.ntnu.no/grupper/su/bucs/files/BUCS-rapport-h03.doc, 2003.

[Strauss98] Strauss, A., Corbin, J.: Basics of Qualitative Research. Sage Publications, London, UK, 1998.

[Thomas96] Thomas, S.A., Hurley, S.F., Barnes, D.J.: Looking for the human factors in software quality management. Proceedings of the International Conference on Software Engineering: Education and Practice, pp. 474-480, Dunedin, New Zealand, 24-27 Jan. 1996.

[UKS] UKSMA – United Kingdom Software Metrics Association: http://www.uksma.co.uk

[Vinter00] Vinter, O., Lauesen, S.: Analyzing Requirements Bugs. Software Testing & Quality Engineering Magazine, Vol. 2-6, Nov/Dec 2000.

[Votta95] Votta, L.G., Zajak, M.L.: Design Process Improvement Case Study Using Process Waiver Data. Proceedings of the 5th European Software Engineering Conference, pp. 44-58, Barcelona, Spain, September 25-28, 1995.

[Wohlin00] Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén, A.: Experimentation in software engineering: an introduction. Kluwer Academic Publishers, Norwell, MA, USA, 2000.

[Yin03] Yin, R.K.: Case Study Research, Design and Methods. Sage Publications, 2003.

[Zelkowitz98] Zelkowitz, M.V., Wallace, D.R.: Experimental models for validating technology. IEEE Computer, 31(5), pp. 23-31, May 1998.


Appendix A: Papers

This section contains the seven papers P1-P7 as presented in section 1.5, as well as a proposed paper P8 presented as a technical report. It should be noted that the papers have been re-formatted from their original format to fit into this thesis.

P1. Safety activities during early software project phases

Jon Arvid Børretzen, Tor Stålhane, Torgrim Lauritsen, and Per Trygve Myhrer
Department of Computer and Information Science, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
Email: [email protected]

Abstract

This paper describes how methods taken from safety-critical practices can be used in the development of business-critical software. The emphasis is on the early phases of product development, and on use together with the Rational Unified Process. One important part of the early project phases is to define safety requirements for the system. This means that in addition to satisfying the need for functional system requirements, non-functional requirements about system safety must also be included. By using information that is already required or produced in the first phases of RUP together with some suitable "safety methods", we are able to produce a complete set of safety requirements for a business-critical system before the system design process is started.

1. Introduction

Software systems play an increasingly important role in our daily lives. Technological development has led to the introduction of software systems into an increasing number of areas. In many of these areas we become dependent on these systems, and their weaknesses could have grave consequences. There are areas where correctly functioning software is important for the health and well-being of humans, like air-traffic control and health systems. There are, however, other systems that we also expect and hope will run correctly because of the negative effects of failure, even if the consequences are mainly of an economic nature. This is what we call business-critical


systems, and business-critical software. The number of areas where functioning software is at the core of operation is steadily increasing. Both financial systems and e-business systems rely on increasingly larger and more complex software systems. In order to increase the quality and efficiency of such products we need methods, techniques and processes specifically aimed at improving the development, use and maintenance of this type of software.

In this paper, we will discuss methods that can be used together with the Rational Unified Process in the early parts of a development project. These methods are Safety Case, Preliminary Hazard Analysis and Hazard and Operability Analysis. Our contribution is to combine these methods into a comprehensive method for use early in the development of business-critical systems.

1.1 BUCS

The BUCS project is a research project funded by the Norwegian Research Council (NFR). The goal of the BUCS project is to help developers, users and customers to develop software that is safe to use. In a business environment this means that the system seldom or never behaves in such a way that it causes the customer or the customer's users to lose money or important information. We will use the term "business-safe" for this characteristic. The goal of the BUCS project is not to help developers to finish their development on schedule and at the agreed price. We are not particularly interested in delivered functionality or in how to identify or avoid process and project risk. This is not because we think that these things are not important – it is just that we have defined them out of the BUCS project.

The BUCS project is seeking to develop a set of integrated methods to improve support for analysis, development, operation, and maintenance of business-critical systems. Some methods will be taken from safety-critical software engineering practices, while others will be taken from general software engineering. Together they are tuned and refined to fit into this particular context and to be practical to use in a software development environment. The research will be based on empirical studies, where interviews, surveys and case studies will help us understand the needs and problems of the business-critical software developers.

Early in the BUCS project, we conducted a series of short interviews with eight software developing companies as a pre-study to find some important issues we should focus on [Stålhane03]. These interviews showed us that many companies used or wanted to use RUP or similar processes, and that a common concern in the industry was lack of communication, both internally and with the customers. On this basis, the BUCS project has decided to use RUP as the environment for our enhanced methods, and the methods used will be helpful in improving communication on requirements gathering, implementation and documentation in a software development project. Adaptation of methods from safety-critical development has to be done so that the methods introduced fit into RUP and are less complicated and time consuming than


when used in regular safety-critical development.

That a system is business-safe does not mean that the system is error free. What it means is that the system will have a low probability of causing losses for the users. In this respect, the system characteristic is close to the term "safe". This term is, however, wider, since it is concerned with all activities that can cause damage to people, equipment or the environment, or severe economic losses. Just as with general safety, business-safety is not a characteristic of the system alone – it is a characteristic of the system's interactions with its environment.

BUCS is considering two groups of stakeholders and wants to help them both:
• The customers and their users. They need methods that enable them to:
  o Understand the dangers that can occur when they start to use the system as part of their business.
  o Write or state requirements to the developers so that they can take care of the risks incurred when operating the system – product risk.
• The developers. They need help to implement the system so that:
  o It is business-safe.
  o They can create confidence by supporting their claims with analysis and documentation.
  o It is possible to change the systems so that when the environment changes, the systems are still business-safe.

This will not make it cheaper to develop the system. It will, however, help the developers to build a business-safe system without large increases in the development costs. Why should developing companies do something that costs extra – is this a smart business proposition? We definitely believe that the answer is "Yes", and for the following reasons:
• The only solution most companies have to offer to customers with business-safety concerns today is that the developers will be more careful and test more – which is not a good enough solution.
• By building a business-safe system the developers will help the customer achieve efficient operation of their business and thus build an image of a company that has its customers' interests in focus. Applying new methods to increase the products' business-safety must thus be viewed as an investment. The return on the investment will come as more business from large, important customers.

2. The Rational Unified Process

The Rational Unified Process (RUP) is a software engineering process. It provides a disciplined approach to assigning tasks and responsibilities within a development organization. Its goal is to ensure the production of high-quality software that meets the needs of its end users within a predictable schedule and budget.


RUP is developed and supported by Rational Software [Rational]. The framework is based on popular development methods used by leading actors in the software industry. RUP consists of four phases: inception, elaboration, construction and transition. The BUCS project has identified the first three phases as most relevant to our work, and will make proposals for introduction of safety methods for these phases. In this paper, we will concentrate on the inception phase.

Figure 1 - Rational Unified Process; © IBM [Rational]

Figure 1 shows the overall architecture of the RUP, and its two dimensions:
• The horizontal axis, which represents time and shows the lifecycle aspects of the process as it unfolds.
• The vertical axis, which represents disciplines that group the activities to be performed in each phase.

The first dimension represents the dynamic aspect of the process as it is enacted, and is expressed in terms of phases, iterations, and milestones. The second dimension represents the static aspect of the process: how it is described in terms of process components, disciplines, activities, workflows, artefacts, and roles [Kroll03] [Kruchten00]. The graph shows how the emphasis varies over time. For example, in early iterations, we spend more time on requirements, and in later iterations we spend more time on implementation.

The ideas presented in this paper are valid even if the RUP process is not used. An iterative software development process will in most cases be quite similar to a RUP process in broad terms, with phases and with certain events, artefacts and actions. Some companies also use other process frameworks that in principle differ from RUP mostly in name. Therefore, it is possible and beneficial to include and integrate the safety methods we propose into any iterative development process.

2.1 Inception


Early in a software development project, system requirements will always be on top of the agenda. In the same way as well thought-out plans are important for a system in general, well thought-out plans for system safety are important when trying to build a correctly functioning, safe system. Our goal is to introduce methods that are helpful for producing a safety requirements specification, which can largely be seen as one type of non-functional requirements. However, safety requirements also force us to include the system's environment. In RUP, with its use-case driven approach, this process can be seen as analogous to the process of defining general non-functional requirements, since use-case driven processes are not well suited for non-functional requirements specification. Because the RUP process itself does not explicitly mandate safety requirements, just as it does not mandate non-functional requirements, other methods have to be introduced for this purpose. On the other hand, the architecture-centric approach in RUP is helpful for producing non-functional requirements, as these requirements are strongly linked to a system's architecture. Considerations about system architecture will therefore influence non-functional and safety requirements.

Although designing safety into the system from the beginning (upstream protection) may incur some design trade-offs, eliminating or controlling hazards may result in lower costs during both development and the overall system lifetime, due to fewer delays and less need for redesign [Leveson95]. Working in the opposite direction, adding protection features to a completed design (downstream protection) may cut costs early in the design process, but will increase system costs, delays and risk to a much greater extent than the costs owing to early safety design.

The main goal of the inception phase is to achieve a common understanding among the stakeholders on the lifecycle objectives for the development project [Kruchten00]. You should decide exactly what to build, and, from a financial perspective, whether you should start building it at all. Key functionality should be identified early. The inception phase is important, primarily for new development efforts, in which there are significant project risks which must be addressed before the project can proceed. The primary objectives of the inception phase include (from [Kroll03] [Kruchten00]):
• Establishing the project's software scope and boundary conditions, including an operational vision, acceptance criteria and what is intended to be included in the product and what is not.
• Identifying the critical use cases of the system, the primary scenarios of operation that will drive the major design trade-offs. This also includes deciding which use cases are the most critical ones.
• Exhibiting, and maybe demonstrating, at least one candidate architecture against some of the primary scenarios.
• Estimating the overall cost and schedule for the entire project (and more detailed estimates for the elaboration phase that will immediately follow).
• Assessing risks and the sources of unpredictability.
• Preparing the supporting environment for the project.


3. Safety methods introduced by BUCS

Early in a project's life-cycle, many decisions have not yet been made, and we have to deal with a conceptual view or even just ideas for the forthcoming system. Therefore, much of the information we have to base our safety-related work on is at a conceptual level. The methods we can use will therefore be those that can use this kind of high-level information, and the ones that are suited to the early phases of software development projects. We have identified five safety methods that are suitable for the inception phase of a development project. Two of them, Safety Case and Intent Specification, are methods that are well suited for use throughout the development project [Adelard98] [Leveson00], as they focus on storing and combining information relevant to safety through the product's life-cycle. The other three, Preliminary Hazard Analysis, Hazard and Operability Analysis and Event Tree Analysis, are focused methods [Rausand91] [Leveson95], well suited for use in the inception phase, as they can be used on a project where many details are yet to be defined. In this paper, the Safety Case, Preliminary Hazard Analysis and Hazard and Operability Analysis methods are used as examples of how such methods can be used in a RUP context.

When introducing safety related development methods into an environment where the aim is to build a business-safe system, but not necessarily an error-free and completely safe one, we have to accept that usage of these methods will not be as stringent and effort demanding as in a safety-critical system. This entails that the safety methods used in business-critical system development will be adapted and simplified versions, in order to save time and resources.

3.1 Safety Case

A safety case is a documented body of evidence that provides a convincing and valid argument that a system is adequately safe for a given application in a given environment [Adelard98] [Bishop98]. The safety case method is a tool for managing safety claims, containing a reasoned argument that a system is or will be safe. It is manifested as a collection of data, metadata and logical arguments. The main elements of a safety case are shown in Figure 2:
• Claims about a property of the system or a subsystem (usually about safety requirements for the system)
• Evidence which is used as basis for the safety argument (facts, assumptions, sub-claims)
• Arguments linking the evidence to the claim
• Inference rules for the argument


The arguments can be:

• Deterministic: Application of predetermined rules to derive a true/false claim, or demonstration of a safety requirement.

• Probabilistic: Quantitative statistical reasoning, to establish a numerical level.
• Qualitative: Compliance with rules that have an indirect link to the desired attributes.

The safety case method can be used throughout a system's life-cycle, and divides a project into four phases: Preliminary, Architectural, Implementation, and Operation and Installation. This is similar to the phases of RUP, and makes it reasonable to tie a preliminary safety case to the inception phase of a development project. The development of a safety case does not follow a simple step-by-step process. The main activities interact with each other and iterate as the design proceeds and as the level of detail in the system design increases. This also fits well with the RUP process. The question the safety case documents will answer is in our case "How will we argue that this system can be trusted?" The safety case shows how safety requirements are decomposed and addressed, and will provide an appropriate answer to the question. The characteristics of the safety case elements in the inception phase are:

1. Establish the system context, whether the safety case is for a complete system or a component within a system.

2. Establish safety requirements and attributes for the current level of the design, and how these requirements and attributes are related to the system’s safety analysis.

3. Define important operational requirements and constraints such as maintenance levels, time to repair and issues related to the operating environment.

Figure 2 – How a safety case is built up (diagram: a claim supported by sub-claims and evidence, linked through inference rules into the argument structure)
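As an illustration of the structure sketched in Figure 2, the following is a minimal data-structure sketch of a safety case fragment. The class and field names are our own invention for illustration only; they are not part of the safety case method or of any standard notation.

```python
# Minimal sketch of the claim/evidence/argument structure in Figure 2.
# Class and field names are illustrative, not a standard safety case schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Evidence:
    description: str   # a fact, assumption or sub-claim used as support
    kind: str          # e.g. "fact", "assumption", "sub-claim"

@dataclass
class Argument:
    inference_rule: str                 # e.g. "deterministic", "probabilistic", "qualitative"
    evidence: List[Evidence] = field(default_factory=list)

@dataclass
class Claim:
    statement: str                      # usually a safety requirement for the system
    arguments: List[Argument] = field(default_factory=list)
    sub_claims: List["Claim"] = field(default_factory=list)

# Example fragment: one claim supported by a qualitative argument.
claim = Claim(
    statement="Credit information used in billing is correct",
    arguments=[Argument(
        inference_rule="qualitative",
        evidence=[Evidence("Consistency check on every update", "fact"),
                  Evidence("Prior experience with similar billing systems", "assumption")])])
```

Keeping the claims, arguments and evidence in such a structured form makes it easier to update the safety case as the design is refined through the RUP iterations.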


3.2 Preliminary Hazard Analysis and Hazard and Operability Analysis

Preliminary Hazard Analysis (PHA) is used in the early life cycle stages to identify critical system functions and broad system hazards. The identified hazards are assessed and prioritized, and safety design criteria and requirements are identified. A PHA is started early in the concept exploration phase so that safety considerations are included in trade-off studies and design alternatives. This process is iterative, with the PHA being updated as more information about the design is obtained and as changes are being made. The results serve as a baseline for later analysis and are used in developing system safety requirements and in the preparation of performance and design specifications. Since PHA starts at the concept formation stage of a project, little detail is available, and the assessments of hazard and risk levels are therefore qualitative. A PHA should be performed by a small group with good knowledge about the system specifications.

Both Preliminary Hazard Analysis and Hazard and Operability Analysis (HazOp) are performed to identify hazards and potential problems that the stakeholders see at the conceptual stage, and that could be created by the system after being put into operation. A HazOp study is a more systematic analysis of how deviations from the design specifications in a system can arise, and whether these deviations can result in hazards. Both analysis methods build on information that is available at an early stage of the project. This information can be used to reduce the severity of the identified hazards or to build safeguards against their effects.

HazOp is a creative team method, using a set of guidewords to trigger creative thinking among the stakeholders and the cross-functional team in RUP. The guidewords are applied to all parts and aspects of the system concept plan and early design documents, to find possible deviations from design intentions that have to be handled. Examples of guidewords are MORE and LESS, meaning an increase or decrease of some quantity. For example, by using the "MORE" guideword on "a customer client application", you would have "MORE customer client applications", which could spark ideas like "How will the system react if the servers get swamped with customer client requests?" and "How will we deal with many different client application versions making requests to the servers?" A HazOp study is conducted by a team consisting of four to eight persons with a detailed knowledge of the system to be analysed. A sketch of how such guideword prompts can be generated systematically is shown at the end of this section.

The main difference between a HazOp and a PHA is that PHA is a lighter method that needs less effort and less available information than the HazOp method. Since HazOp is a more thorough and systematic analysis method, the results will be more specific. If there is enough information available for a HazOp study, and the development team can spare the effort, a HazOp study will most likely produce more precise and more suitable results for the safety requirement specification definition.
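As the sketch referred to above, the following illustrates how guidewords can be combined mechanically with system elements to produce deviation prompts for a HazOp session. The guidewords and elements shown are illustrative examples only, not a complete HazOp checklist for the systems discussed in this thesis.

```python
# Sketch: generate HazOp-style deviation prompts by pairing guidewords with
# system elements. Guidewords and elements below are illustrative examples.
GUIDEWORDS = ["NO", "MORE", "LESS", "AS WELL AS", "OTHER THAN", "LATE"]
ELEMENTS = ["customer client requests", "credit information updates",
            "database replication"]

def deviation_prompts(elements, guidewords):
    for element in elements:
        for guideword in guidewords:
            yield f"{guideword} {element}: what could cause this deviation, and what is its effect?"

for prompt in deviation_prompts(ELEMENTS, GUIDEWORDS):
    print(prompt)
```

Such generated prompts are only starting points; the value of a HazOp study still lies in the team discussion that each prompt triggers.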


4. Integration: Using safety methods in the RUP Inception phase

In the inception phase we will focus on understanding the overall requirements and scoping the development effort. When a project goes through its inception phase, the following artifacts will be established/produced:
• Requirements, leading to a System Test Plan
• Identification of key functionality
• Proposals for possible solutions
• Vision documents
• Internal business case
• Proof of concept

The artifacts in bold are the ones that are interesting from a system-safe point of view, and the fact that the RUP inception phase requires development teams to produce such information eases the introduction of safety methods into the process. Because of RUP's demands on information collection, using these methods does not lead to extensive extra work for the development team. By using the safety methods we have proposed, we can produce safety requirements for the system. These are high-level requirements, and must be specified before the project goes from the inception to the elaboration phase. When the project moves on from the inception to the elaboration phase, identification of the business-critical aspects should be mostly complete, and we should have high confidence in having identified the requirements for those aspects. The safety work in the project continues into the elaboration phase, and some of the methods, like Safety Case and Intent Specification, will also be used when the project moves on to this phase.

4.1 Software Safety Case in a RUP context

According to [Bishop98], we need the following information when producing a safety case:

• Information used to construct the safety argument
• Safety evidence

As indicated in 3.1, to implement a safety case we need to:

• make an explicit set of claims about the system
• produce the supporting evidence
• supply a set of safety arguments linking the claims to the evidence, shown in Figure 2
• make clear the assumptions and judgements underlying the arguments

The safety case is broken down into claims about non-functional attributes for sub-systems, such as reliability, availability, fail-safety, response time, robustness to overload, functional correctness, accuracy, usability, security, maintainability, modifiability, and so on.


The evidence used to support a safety case argument comes from:

• The design itself
• The development processes
• Simulation of problem solution proposals
• Prior experience from similar projects or problems

Much of the work done early in conjunction with safety cases tries to identify possible hazards and risks, for instance by using methods like Preliminary Hazard Analysis (PHA) and Hazard and Operability Analysis (HazOp). These are especially useful in combination with the Safety Case method for identifying the risks and safety concerns that the safety case is going to handle. Also, methods like Failure Mode and Effects Analysis, Event Tree Analysis, Fault Tree Analysis and Cause Consequence Analysis can be used as tools to generate evidence for the safety case [Rausand91]. The need for concrete project artefacts as input to the safety case varies over the project phases, and is not strictly defined. Early on in a project, only a general system description is needed for making the safety requirements specification.

When used in the inception phase, the Safety Case method will support the definition of a safety requirements specification document by forcing the developers to "prove" that their intended system can be trusted. When doing that, they will have to produce a set of safety requirements that will follow the project through its phases, and which will be updated along with the safety case documents. The Safety Case method, when used to its full potential, will be too elaborate when not dealing with safety-critical projects. The main concept and structure will, however, help trace the connection between hazards and solutions through the design, from top level down to detailed-level implementation. Much of the work that has to be performed when constructing a software safety case is to collect information and arrange this information in a way that shows the reasoning behind the safety case. Thus, the safety case does not in itself bring much new information into the project; it is mainly a way of structuring the information.

4.2 Preliminary Hazard Analysis and Hazard and Operability Analysis in a RUP context

By performing a PHA or HazOp we can identify threats attached to both malicious actions and unintended design deviations, for instance as a result of unexpected use of the system or as a result of operators or users without the necessary skills executing an unwanted activity. To perform a PHA or HazOp, we only need a conceptual system description and a description of the system's environment. RUP encourages such information to be produced in the inception phase of a project. When a hazard is identified, either by PHA or HazOp, it is categorized and we have to decide if it is acceptable or if it needs further investigation. When trustworthiness is an issue, the hazard should be tracked in a hazard


log and subjected to review along the development process. This makes a basis for further analysis, and produces elements to be considered for the safety requirement specification. The result of a PHA or HazOp investigation is the identification of possible deviations from the intent of the system. For every deviation, the causes and consequences are examined and documented in a table. The results are used to focus work effort and to solve the problems identified. The results of PHA and HazOp are also incorporated into the safety case documents either as problems to be solved, or as evidence used in existing safety claim arguments.

4.3 Combining the methods

By introducing the use of Safety Case and PHA/HazOp into the RUP inception phase, we have a process where the system safety requirements are maintained in the safety case documents. PHA and HazOp studies on the system specification, together with its customer requirements and environment description, produce hazard identification logs that are incorporated into the safety case as issues to be handled. This also leads to revision of the safety requirements. Thus, the deviations found with PHA/HazOp will be covered by these requirements, as shown in Figure 3. From the inception phase of the development process, the safety requirements and safety case documents are used in the remaining phases, where the information is used in the implementation of the system.

Figure 3 – Combining PHA/HazOp and Safety Case (diagram: customer requirements and the environment description feed the PHA and/or HazOp, whose results are incorporated into the safety case and the safety requirements)

5. A small example

Let us assume a business needing a database containing information about their customers and the customers' credit information. When developing a computer system



for this business, not only should we ask the business representatives which functions they need and what operating system they would like to run their system on, but we should also use proper methods to improve the development process with regard to business-critical issues. An example of an important requirement for such a system would be ensuring the correctness and validity of customers’ credit information. Any problems concerning this information in a system would seriously impact a company’s ability to operate satisfactorily. The preliminary hazard analysis method will be helpful here, by making stakeholders think about each part of the planned system and any unwanted events that could occur. By doing this, we will get a list of possible hazards that have to be eliminated, reduced or controlled. This adds directly to the safety requirements specification. An example is the potential event that the customer information database becomes erroneous, corrupt or deleted. By using a preliminary hazard analysis, we can identify the possible causes that can lead to this unwanted event, and add the necessary safety requirements. We can use the system’s database as an example. In order to identify possible database problems – Dangers – we can consider each database item in turn and ask: “What will happen if this information is wrong or is missing?” If the identified effect could be dangerous for the system’s users or owner – Effects – we will have to consider how it could happen – Causes - and what possible barriers we could insert into the system. The PHA is documented in a table. The table, partly filled out for our example, is shown below in Table 1. Customer info management Danger Causes Effects Barriers

Wrong address inserted Update error

Check against name and public info, e.g. “Yellow pages”

Wrong address

Database error

Correspondence sent to wrong address

Testing Wrong credit info inserted

Manual check required

Update error Consistency check

Wrong credit info

Database error

Wrong billing. Can have serious consequences

Testing

Table 1 – PHA example
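The paper tracks such hazards in a hazard log that is reviewed throughout development, but does not prescribe any tool support for it. As a purely illustrative sketch (all class and field names below are hypothetical), the entries of Table 1 could be captured in a small structure like this, so that the serious ones can be fed into the safety requirements specification:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class HazardEntry:
    danger: str                       # unwanted event, e.g. "Wrong credit info"
    causes: List[str] = field(default_factory=list)
    effects: List[str] = field(default_factory=list)
    barriers: List[str] = field(default_factory=list)
    serious: bool = False             # set when judging the effects
    status: str = "open"              # reviewed along the development process

hazard_log = [
    HazardEntry(
        danger="Wrong credit info",
        causes=["Wrong credit info inserted", "Update error", "Database error"],
        effects=["Wrong billing. Can have serious consequences"],
        barriers=["Manual check required", "Consistency check", "Testing"],
        serious=True,
    ),
]

# Entries judged to have serious effects are carried into the safety
# requirements specification and the safety case.
serious_hazards = [entry for entry in hazard_log if entry.serious]
```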

When we have finished the PHA, we must show that each identified danger that can have a serious effect will be addressed during the development process. In BUCS we have chosen to use safety cases for this. When using the safety case method, the developers will have to show that the way they want to implement a function or some part of the system is trustworthy. This is done by producing evidence and a reasoned argument that this way of doing things will be safe. From Table 1, we see that for the customer's credit information, the safety case should be able to document what the developers are going to do to make sure that the credit information used in billing situations is correct. Figure 4 shows a high-level example of how this might look in a safety case diagram. The evidence may come from earlier experience with implementing such a solution, or the belief that their testing methods are sufficient to ensure safety.

The lowest level in the safety case in Figure 4 contains the evidence. In our case, this evidence gives rise to three types of requirements:

• Manual procedures. These are not realised in software, but the need to perform manual checks will put extra functional requirements onto the system.

• The software. An example in Figure 4 is the need to implement a credit information consistency check.

• The process. The safety case requires us to put extra effort into testing the database implementation. Most likely this will be realised either by allocating more effort to testing or by allocating a disproportionately large part of the testing effort to testing the database.

After using these methods for eliciting and documenting safety requirements, the developers will, in the next development stages, have to produce the evidence suggested in the diagram, show how the evidence supports the claims by making suitable arguments, and finally document that the claims are supported by the evidence and arguments. Some examples of evidence are trusted components from a component repository, statistical evidence from simulation, or claims about sub-systems that are supported by evidence and arguments in their own right. Examples of relevant arguments are a formal proof that two pieces of evidence together support a claim, quantitative reasoning to establish a required numerical level, or compliance with some rules that have a link to the relevant attributes.
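The claim-argument-evidence structure of Figure 4 is presented in the paper only as a diagram. As a minimal, hypothetical sketch of how that structure could be kept machine-checkable while the safety case is updated through the elaboration and construction phases, one might represent it like this (the names and the supported() rule are assumptions, not part of the Safety Case method):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Evidence:
    description: str
    validated: bool = False        # updated once implemented and tested

@dataclass
class Claim:
    statement: str
    arguments: List[str] = field(default_factory=list)
    evidence: List[Evidence] = field(default_factory=list)

    def supported(self) -> bool:
        # Treat the claim as validated once every piece of evidence is in place.
        return bool(self.evidence) and all(e.validated for e in self.evidence)

claim = Claim(
    statement="Credit info must be correct when sending invoice",
    arguments=[
        "Insertion and updating credit info is made trustworthy",
        "Credit info DB is sufficiently reliable",
    ],
    evidence=[
        Evidence("Implementation of credit info consistency check"),
        Evidence("Implementation of manual credit info check"),
        Evidence("Database implementation testing"),
    ],
)
```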

[Figure 4 – Safety Case example. The diagram shows the claim "Credit info must be correct when sending invoice", supported by the arguments "Insertion and updating credit info is made trustworthy" and "Credit info DB is sufficiently reliable", which in turn rest on the evidence "Implementation of credit info consistency check", "Implementation of manual credit info check" and "Database implementation testing".]

Further on in the development process, in the elaboration and construction phases, the evidence and arguments in the safety case will be updated as we gain more knowledge about the system. Each piece of evidence and argumentation should be directly linked to some part of the system implementation. The responsibility of the safety case is to show that the selected barriers and their implementation are sufficient to prevent the dangerous event from taking place. When the evidence and arguments in the safety case diagram are implemented and later tested in the development process, the safety case documentation is updated to show that the safety case claim has been validated. By using PHA to find potential hazards and deviations from intended operation, and Safety Case to document how we intend to solve these problems, we produce elements of the safety requirements specification which, without these methods, might have been missed.

6. Conclusion and further work

We have shown how the Preliminary Hazard Analysis, Hazard and Operability Analysis and Safety Case methods can be used together in the RUP inception phase to help produce a safety requirements specification. The example shown is simple, but demonstrates how the combination of these methods will work in this context. By building on information made available in an iterative development process like RUP, we can use the presented methods to improve the process for producing a safety requirements specification. As a development project moves into the subsequent phases, the need for safety effort will remain in order to ensure the development of a trustworthy system. The other RUP phases contain different development activities and therefore different safety activities. The BUCS project will make similar descriptions of the other RUP phases and show how safety-related methods can be used beneficially in these phases as well. BUCS will also continue the effort of working with methods for improving safety requirements collection, and will make contributions in the following areas:

• Proposals on adaptation of methods from safety development for business-critical system development.

• Guides and advice on business-critical system development.

• Tools supporting development of business-critical systems.

• Investigations on use of component-based development in the development of business-critical systems.


References

[Adelard98] ASCAD, Adelard Safety Case Development Manual, Adelard, 1998.
[Bishop98] P.G. Bishop, R.E. Bloomfield, "A Methodology for Safety Case Development", Safety-critical Systems Symposium (SSS 98), Birmingham, UK, Feb. 1998.
[Kroll03] P. Kroll, P. Krutchen, The Rational Unified Process Made Easy: A Practitioner's Guide to Rational Unified Process, Addison Wesley, Boston, 2003, ISBN: 0-321-16609-4.
[Krutchen00] P. Krutchen, The Rational Unified Process: An Introduction (2nd Edition), Addison Wesley, Boston, 2000, ISBN: 0-201-70710-1.
[Leveson95] N.G. Leveson, Safeware: System Safety and Computers, Addison Wesley, USA, 1995, ISBN: 0-201-11972-2.
[Leveson00] N.G. Leveson, "Intent specifications: an approach to building human-centered specifications", IEEE Transactions on Software Engineering, Volume 26, Issue 1, Jan. 2000, Pages 15-35.
[Rational] Rational Software, http://www.rational.com
[Rausand91] M. Rausand, Risikoanalyse, Tapir Forlag, Trondheim, 1991, ISBN: 82-519-0970-8.
[Stålhane03] T. Stålhane, T. Lauritsen, P.T. Myhrer, J.A. Børretzen, BUCS rapport - Intervju med utvalgte norske bedrifter omkring utvikling av forretningskritiske systemer, October 2003, available from: http://www.idi.ntnu.no/grupper/su/bucs/files/BUCS-rapport-h03.doc


P2. Results and Experiences from an Empirical Study of Fault Reports in Industrial Projects

Jon Arvid Børretzen, Reidar Conradi

Department of Computer and Information Science, Norwegian University of Science and Technology (NTNU), NO-7491 Trondheim, Norway

Abstract. Faults introduced into systems during development are costly to fix, and especially so for business-critical systems. These systems are developed using common development practices, but have high requirements for dependability. This paper reports on an ongoing investigation of fault reports from Norwegian IT companies, where the aim is to seek a better understanding of the faults that have been found during development and of how these may affect the quality of the system. Our objective in this paper is to investigate the fault profiles of four business-critical commercial projects to explore whether there are differences in the way faults appear in different systems. We have conducted an empirical study by collecting fault reports from several industrial projects, comparing findings from projects where components and reuse have been core strategies with more traditional development projects. Findings show that some specific fault types are generally dominant across reports from all projects, and that some fault types are rated as more severe than others.

1. Introduction

Producing high quality software is an important goal for most software developers. The notion of software quality is not trivial; different stakeholders will have different views on what software quality is. In the Business-Critical Software (BUCS) project [1] we are seeking to develop a set of methods to improve support for analysis, development, operation, and maintenance of business-critical systems. These are systems that we expect and hope will run correctly because of the possibly severe effects of failure, even if the consequences are mainly of an economic nature. In these systems, software quality is important, and the main target for developers will be to make systems that operate correctly all the time [1]. One important issue in developing these kinds of systems is to remove any possible causes for failure, which may lead to wrong operation of the system. The study presented here investigated fault reports from two software projects using components and reuse strategies, and two projects using a more traditional development process. It compares the fault profiles of the reuse-intensive projects with the other two along several dimensions: fault type, fault severity and fault location.

2. Previous studies on software faults and fault implications

Software quality is a notion that encompasses a great number of attributes. When speaking about business-critical systems, the critical quality attribute is often experienced as the dependability of the system. According to Littlewood et al. [2], dependability is a software quality attribute that encompasses several other attributes, the most important being reliability, availability, safety and security. Faults in the software lessen the software's quality, and by reducing the number of faults introduced during development you can improve the quality of the software. Faults are potential flaws in a software system that may later be activated to produce an error. An error is the execution of a fault, leading to a failure. A failure results in erroneous external behaviour, system state or data state. Known remedies for errors and failures aim to limit the consequences of a failure in order to resume service, but studies have shown that this kind of late protection is more expensive than removing the faults before they are introduced into the code [3]. Faults are also known as defects or bugs, and a more extensive concept is anomalies, which is used in the IEEE 1044 standard [4].

Orthogonal Defect Classification (ODC) is a way of studying defects in software systems [5, 6, 7, 8]. ODC is a scheme to capture the semantics of each software fault quickly. It has been debated whether faults can be tied to reliability in a cause-effect relationship. Some papers, like [6, 8], indicate that this is valid, while others, like [9], are more critical. Still, reducing the number of faults will make the system less prone to failure, so by removing faults without adding new ones there is a good case for the system reliability increasing. This is the idea behind "reliability-growth models", discussed by Hamlet in [9]. Avizienis et al. [10] state that fault prevention aims to provide the ability to deliver a service that can be trusted. Hence, by preventing faults and reducing their number and severity in a system, the quality of the system can be improved in the area of dependability.

3. Research design

Research questions. Initially we want to find which types of faults are most frequent, and whether some parts of the systems have more faults than others:

RQ1: Which types of faults are most typical for the different software parts?

When we know which types of faults dominate and where these faults appear in the systems, we can choose to concentrate on the most serious ones in order to identify the most important issues to target in improvement work:

RQ2: Are certain types of faults considered to be more severe than others by the developers?

Research method. This study is based on data mining, where the data consists of fault reports we have received from four commercial projects. The investigation has mostly been a bottom-up process, because of the initial uncertainty about the available data from potential participants. After establishing a dialogue with the projects and acquiring the fault reports, our initial research questions and goals were altered accordingly.

The metrics used. The metrics have been chosen based on what we wanted to investigate and on what data turned out to be available from the projects participating in the study. The frequency of detected faults is an indirect metric, obtained by counting the number of faults of a given type, in a given system part, and so on. The metrics used directly from the data in the reports are the type, severity and location of the fault.

3.1 Fault categories

There are several taxonomies for fault types; two examples are the ones used in the IEEE 1044 standard [4] and in a variant of the Orthogonal Defect Classification (ODC) scheme [6]. The fault types used in this study are shown in Table 1. They have been derived from the existing data material in the reports, combined with two taxonomies found in the literature, IEEE 1044 and ODC. Categorization of faults in this investigation has been performed partly by the projects themselves and completed by us as a part of this investigation, based on the fault reports' textual description and partial categorization. Also, grading the faults' consequences for the system and system environment enables fault severities to be defined. All severity grading has been done by the fault reporters in the projects.

Table 1. Fault types used in this study

Assignment fault, Checking fault, Data fault, Documentation fault, Environment fault, Functional fault - computation, Functional fault - logic, Functional fault - state, GUI fault, I/O fault, Interface fault, Memory fault, Missing data, Missing functionality, Missing value, Performance fault, Wrong functionality called, Wrong value used.
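The categorization described above was carried out partly by the projects and partly by the authors, based on each report's textual description; the paper does not mention tool support. Purely as an illustration of how a first-pass suggestion could be generated before such manual review, a keyword heuristic might look as follows (the keyword map and function name are hypothetical, not part of the study):

```python
# Hypothetical keyword map: the actual categorization in the study was manual.
KEYWORDS = {
    "GUI fault": ["screen", "button", "layout", "dialog"],
    "Memory fault": ["memory leak", "out of memory"],
    "Performance fault": ["slow", "timeout", "response time"],
    "Missing functionality": ["not implemented", "missing function"],
}

def suggest_fault_type(description: str) -> str:
    """Return a suggested fault type for a report's textual description."""
    text = description.lower()
    for fault_type, words in KEYWORDS.items():
        if any(word in text for word in words):
            return fault_type
    return "UNKNOWN"  # left for manual categorization

print(suggest_fault_type("Dialog layout breaks when the window is resized"))
# -> "GUI fault"
```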

3.2 Data collection

The data sample. We contacted over 40 different companies that we believed had relevant projects we could study. In the end, four projects fit our criteria and were willing to proceed with the study. The reasons for the low participation rate among the contacted companies were most likely issues like skepticism towards releasing sensitive information, lack of organized effort in fault handling and lack of resources. Table 2 contains information about the participating projects.

Table 2. Information about the participating projects

Project A: Financial system. Domain: Finance. Platform: MVS, OS/2. Number of fault reports: 52. Development effort: ~27400 hours.
Project B: Real-time embedded system. Domain: Security. Platform: VxWorks. Number of fault reports: 360. Development effort: ~32000 hours.
Project C: Public administration application. Domain: Public administration. Platform: J2EE, EJB. Number of fault reports: 1684. Development effort: ~17600 hours.
Project D: Task management system. Domain: Public administration. Platform: J2EE. Number of fault reports: 379. Development effort: 2165 hours.

Note that projects C and D have been developed using modern practices, including component-based development, while projects A and B have been developed using more traditional development practices.


4. Research results

RQ1 – Which types of faults are most typical? To answer RQ1, we look at the distribution of the fault type categories for the projects, shown in Table 3. For projects C and D, we see that functional logic faults are dominant, with 49% and 58% of the faults for those projects. Functional logic faults are also a large part of the faults in projects A and B. In the same manner, the distribution of faults with a severity rating of "high" is shown in Table 4. Functional logic faults are still dominant in projects C and D, with 45% and 69% of the faults, respectively. Project A is a special case here, as only one single fault was reported to be of high severity.

When looking at the distribution of faults, especially the high severity faults, we see that two categories dominate the picture, "Functional logic" and "Functional state". We also see that for all faults, "GUI" faults have a large share (around 8% for projects B, C and D) of the reports, while for the high severity faults the share of GUI faults is strongly reduced in projects C and D, to 2% and 0% respectively.

RQ2 – Are certain types of faults considered to be more severe? To answer RQ2, we need to look at the number of "high" severity rated faults for the different fault categories. Figure 1 shows the percentage of high severity faults found in some fault categories for three of the projects. Project A is left out because it has only one high severity fault reported. From Figure 1, we see that some fault types seem to be judged as more severe than others. In the projects that do report them, "Memory fault" stands out as a high severity type of fault. For projects C and D, "GUI faults" are not judged to be very severe, while project B rates them in line with other fault types. We also see that project B has generally rated more of its faults as highly severe than projects C and D.

Table 3. Distribution of all faults in fault type categories

Fault type        A      B      C      D
Assignment        7 %    4 %    1 %    1 %
Checking          4 %    3 %    2 %    1 %
Data              4 %    6 %    5 %    4 %
Documentation     0 %    1 %    6 %    3 %
Environment       0 %    2 %    1 %    0 %
Funct. comp.      13 %   1 %    1 %    0 %
Funct. logic      20 %   29 %   49 %   58 %
Funct. state      0 %    25 %   3 %    5 %
GUI               2 %    8 %    8 %    7 %
I/O               0 %    2 %    1 %    0 %
Interface         0 %    4 %    0 %    0 %
Memory            0 %    1 %    0 %    0 %
Missing data      2 %    0 %    1 %    2 %
Missing funct.    13 %   8 %    8 %    3 %
Missing value     4 %    1 %    1 %    1 %
Performance       0 %    1 %    3 %    1 %
Wrong funct.      0 %    1 %    2 %    1 %
Wrong value       27 %   3 %    3 %    4 %
UNKNOWN           2 %    2 %    5 %    8 %

Table 4. Distribution of high severity faults in fault type categories

Fault type        A       B      C      D
Assignment        100 %   1 %    0 %    0 %
Data              0 %     6 %    15 %   4 %
Documentation     0 %     0 %    2 %    0 %
Environment       0 %     4 %    5 %    0 %
Funct. logic      0 %     19 %   45 %   69 %
Funct. state      0 %     36 %   8 %    9 %
GUI               0 %     10 %   2 %    0 %
I/O               0 %     1 %    5 %    0 %
Interface         0 %     3 %    0 %    0 %
Memory            0 %     3 %    0 %    2 %
Missing data      0 %     0 %    2 %    4 %
Missing funct.    0 %     7 %    2 %    4 %
Missing value     0 %     1 %    2 %    0 %
Performance       0 %     3 %    9 %    0 %
Wrong funct.      0 %     0 %    0 %    2 %
Wrong value       0 %     6 %    6 %    4 %
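The percentages in Tables 3 and 4 are simple frequency counts over the categorized reports. A minimal sketch of how such distributions can be computed from a flat list of reports is shown below; the dictionary keys and the example records are assumptions for illustration, not the projects' actual report formats:

```python
from collections import Counter

# Toy records; in the study each report carried (at least) project, type, severity.
reports = [
    {"project": "C", "type": "Funct. logic", "severity": "high"},
    {"project": "C", "type": "GUI", "severity": "low"},
    {"project": "D", "type": "Funct. logic", "severity": "medium"},
]

def type_distribution(reports, project, severity=None):
    """Percentage of faults per fault type, optionally restricted to one severity."""
    selected = [r for r in reports
                if r["project"] == project
                and (severity is None or r["severity"] == severity)]
    counts = Counter(r["type"] for r in selected)
    total = sum(counts.values())
    return {t: 100.0 * n / total for t, n in counts.items()} if total else {}

all_faults_c = type_distribution(reports, "C")             # corresponds to a Table 3 column
high_severity_c = type_distribution(reports, "C", "high")  # corresponds to a Table 4 column
```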


Comparing the two projects that employed reuse strategies in development, C and D, with the other two projects gives no evidence that development with reuse has had any significant effect on fault distribution or severity.

[Figure 1. Percentage of high severity faults in some fault categories. A bar chart (scale 0-100 %) for projects B, C and D over the categories Assignment fault, Data fault, Environment fault, Function fault logic, Function fault state, GUI fault, I/O fault, Interface fault, Memory fault, Missing data, Missing functionality, Missing value, Performance fault, Wrong function called and Wrong value used.]

5. Discussion

A major issue when analyzing the collected data was its heterogeneity. These are four different companies where data collection had not been coordinated beforehand, and as each company used its own proprietary fault report system, no standards for reporting were followed. Another issue was cases of missing data in reports, e.g. missing information about fault location. Because the reports have been used for development rather than for research purposes, the developers have not always entered all data into the reports. A final issue was incompatibility between fault reports for one of the projects and other information concerning the project. No satisfactory link between the functional and structural modules was available in project D. This prevented us from separating the reused parts from the rest of the system, and hindered a valid comparison of reused and non-reused system parts at this time. Concerning validity, the most serious threats to external validity are the small number of projects under investigation and the fact that the chosen projects are not necessarily typical. As for conclusion validity, one possible threat is low reliability of measures, because of missing data or parts of the data.

6. Conclusion and future work

This paper has presented some preliminary results of an investigation of fault reports in industrial projects. The results answer our two questions:

RQ1: Which types of faults are most typical for the different software parts? Looking at all faults in all projects, "functional logic" faults were the dominant fault type. For high severity faults, "functional logic" and "functional state" faults were dominant.


RQ2: Are certain types of faults considered to be more severe than others? We have seen that some fault types are rated more severe than others, for instance "Memory fault", while the fault type "GUI fault" was rated as less severe for the two projects employing reuse in development.

Results from this study are preliminary, and the next step is to focus on the differences between reuse-based development projects and non-reuse projects. We will also try to incorporate fault report data from 2-3 other projects into the investigation in order to increase the validity of the study. Later, the BUCS project wants to focus on the most typical and serious faults, and describe how we can identify and prevent these at an earlier development stage. This may be in the form of a checklist for some hazard analysis scheme.

References

1. J. A. Børretzen, T. Stålhane, T. Lauritsen, P. T. Myhrer, "Safety activities during early software project phases", Proceedings, Norwegian Informatics Conference, 2004.
2. B. Littlewood, L. Strigini, "Software reliability and dependability: a roadmap", Proceedings of the Conference on The Future of Software Engineering, Limerick, Ireland, 2000, pages 175-188.
3. N. Leveson, Safeware: System Safety and Computers, Addison-Wesley, Boston, 1995.
4. IEEE Standard Classification for Software Anomalies, IEEE Std 1044-1993, December 2, 1993.
5. K. Bassin, P. Santhanam, "Managing the maintenance of ported, outsourced, and legacy software via orthogonal defect classification", Proceedings, IEEE International Conference on Software Maintenance, 7-9 Nov. 2001.
6. K. El Emam, I. Wieczorek, "The repeatability of code defect classifications", Proceedings, The Ninth International Symposium on Software Reliability Engineering, 4-7 Nov. 1998, pages 322-333.
7. R. Chillarege, I.S. Bhandari, J.K. Chaar, M.J. Halliday, D.S. Moebus, B.K. Ray, M.-Y. Wong, "Orthogonal defect classification - a concept for in-process measurements", IEEE Transactions on Software Engineering, Volume 18, Issue 11, Nov. 1992, pages 943-956.
8. R.R. Lutz, I.C. Mikulski, "Empirical analysis of safety-critical anomalies during operations", IEEE Transactions on Software Engineering, 30(3):172-180, March 2004.
9. D. Hamlet, "What is software reliability?", Proceedings of the Ninth Annual Conference on Computer Assurance (COMPASS '94), 'Safety, Reliability, Fault Tolerance, Concurrency and Real Time, Security', 27 June-1 July 1994, pages 169-170.
10. A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr, "Basic Concepts and Taxonomy of Dependable and Secure Computing", IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 1, January-March 2004.


P3. Revisiting the Problem of Using Problem Reports for Quality Assessment

Parastoo Mohagheghi, Reidar Conradi, Jon Arvid Børretzen

Department of Computer and Information Science, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway

{parastoo, conradi, borretze}@idi.ntnu.no

Abstract

In this paper, we describe our experience with using problem reports from industry for quality assessment. The non-uniform terminology used in problem reports, and the associated validity concerns, have been the subject of earlier research but are far from settled. To distinguish between terms such as defects or errors, we propose to answer three questions on the scope of a study, related to what (the problem's appearance or its cause), where (problems related to software, executable or not, or to the system), and when (problems recorded in all development life cycle phases or only some of them). Challenges in defining research questions and metrics, collecting and analyzing data, generalizing the results and reporting them are discussed. Ambiguity in defining problem report fields, and missing, inconsistent or wrong data, threaten the value of collected evidence. Some of these concerns could be settled by answering some basic questions related to the problem reporting fields and by improving data collection routines and tools.

Categories and Subject Descriptors D.2.8 [Software Engineering]: Metrics- product metrics, process metrics; D.2.4 [Software Engineering]: Software/Program Verification- reliability, validation.

General Terms Measurement, Reliability.

Keywords Quality, defect density, validity.

1. INTRODUCTION

Data collected on defects or faults (or, in general, problems) are used to evaluate software quality in several empirical studies. For example, our review of the extant literature on industrial software reuse experiments and case studies verified that problem-related measures were used in 70% of the reviewed papers, either to compare the quality of reused software components with that of non-reused ones, or to compare development with systematic reuse to development without it. However, the studies report several concerns in using data from problem reports, and we identified some common concerns as well. The purpose of this paper is to reflect on these concerns and generalize the experience, to get feedback from other researchers on the problems in using problem reports, and to discuss how they are handled or should be handled. In this paper, we use data from six large commercial systems, all developed by Norwegian industry. Although most quantitative results of the studies are already published [4, 12, 18], we felt that there is a need for summarizing the experience in using problem reports, identifying common questions and concerns, and raising the level of discussion by answering them. Examples from similar research are provided to further illustrate the points. The main goal is to improve the quality of future research on product or process quality using problem reports.

The remainder of this paper is organized as follows. Section 2 partly builds on the work of others; e.g., [14] has integrated IEEE standards with the Software Engineering Institute (SEI)'s framework and knowledge from four industrial companies to build an entity-relationship model of problem report concepts, and [9] has compared some attributes of a number of problem classification schemes (the Orthogonal Defect Classification - ODC [5], the IEEE Standard Classification for Software Anomalies (IEEE Std. 1044-1993) and a classification used by Hewlett-Packard). We have identified three dimensions that may be used to clarify the vagueness in defining and applying terms such as problem, anomaly, failure, fault or defect. In Section 3 we discuss why analyzing data from problem reports is interesting for quality assessment and who the users of such data are. Section 4 discusses practical problems in defining goals and metrics, collecting and analyzing data, and reporting the results, through some examples. Finally, Section 5 contains discussion and conclusion.

2. TERMINOLOGY

There is great diversity in the literature on the terminology used to report software or system related problems. The possible differences between problems, troubles, bugs, anomalies, defects, errors, faults or failures are discussed in books (e.g., [7]), in standards and classification schemes such as IEEE Std. 1044-1993, IEEE Std. 982.1-1988 and 982.2-1988 [13], the United Kingdom Software Metrics Association (UKSMA)'s scheme [24] and the SEI's scheme [8], and in papers; e.g., [2, 9, 14]. The intention of this section is not to provide a comparison and draw conclusions, but to classify the differences and discuss the practical impacts for research. We have identified the following three questions that should be answered to distinguish the above terms from one another, and call these problem dimensions:

What - appearance or cause: The terms may be used for the manifestation of a problem (e.g., to users or testers), for its actual cause, or for the human encounter with software. While there is consensus on "failure" as the manifestation of a problem and "fault" as its cause, other terms are used interchangeably. For example, "error" is sometimes used for the execution of a passive fault, and sometimes for the human encounter with software [2]. Fenton uses "defect" collectively for faults and failures [7], while Kajko-Mattsson defines "defect" as a particular class of cause that is related to software [14].

Where - software (executable or not) or system: The reported problem may be related to the software or to the whole system, including system configuration, hardware or network problems, tools, misuse of the system etc. Some definitions exclude non-software related problems while others include them. For example, the UKSMA's defect classification scheme is designed for software-related problems, while SEI uses two terms: "defects" are related to the software under execution or examination, while "problems" may be caused by misunderstanding, misuse, hardware problems or a number of other factors that are not related to software. Software related problems may also be recorded for executable software only or for all types of artefacts: "fault" is often used for an incorrect step, logic or data definition in a computer program (IEEE Std. 982.1-1988), while a "defect" or "anomaly" [13] may also be related to documentation, requirement specifications, test cases etc. In [14], problems are divided into static and dynamic ones (failures), where the dynamic ones are related to executable software.

When - detection phase: Sometimes problems are recorded in all life cycle phases, while in other cases they are recorded only in later phases such as system testing or field use. Fenton gives examples of when "defect" is used to refer to faults prior to coding [7], while according to IEEE Std. 982.1-1988, a "defect" may be found during early life cycle phases or in software mature for testing and operation (from [14]). SEI distinguishes the static finding mode, which does not involve executing the software (e.g., reviews and inspections), from the dynamic one.

Until there is agreement on the terminology used in reporting problems, we must be aware of these differences and answer the above questions when using a term. Some problem reporting systems cover enhancements in addition to corrective changes. For example, an "anomaly" in IEEE Std. 1044-1993 may be a problem or an enhancement request, and the same is true for a "bug" as defined by OSS (Open Source Software) bug reporting tools such as Bugzilla [3] or Trac [23]. An example of ambiguity in separating change categories is given by Ostrand et al. in their study of 17 releases of an AT&T system [20]. In this case, there was generally no identification in the database of whether a change was initiated because of a fault, an enhancement, or some other reason such as a change in the specifications. The researchers defined a rule of thumb that if only one or two files were changed by a modification request, then it was likely a fault, while if more than two files were affected, it was likely not a fault. We have seen examples where minor enhancements were registered as problems to accelerate their implementation, and where major problems were classified as enhancement requests (S5 and S6 in Section 4). In addition to the diversity in definitions of a problem, problem report fields such as Severity or Priority are also defined in multiple ways, as discussed in Section 4.

3. QUALITY VIEWS AND DEFECT DATA

In this section, we use the term "problem report" to cover all recorded problems related to software or to other parts of a system offering a service, to executable or non-executable artefacts, and detected in phases specified by an organization, and a "defect" for the cause of a problem. Kitchenham and Pfleeger refer to David Garvin's study on quality in different application domains [15]. It shows that quality is a complex and multifaceted concept that can be described from five perspectives: the user view (quality as fitness for purpose, or validation), the product view (tied to characteristics of the product), the manufacturing view (called the software process view here, or verification as conformance to specification), the value-based view (quality depends on the amount a customer is willing to pay for it), and the transcendental view (quality can be recognized but not defined). We have dropped the transcendental view, since it is difficult to measure, and added the planning view (quality as conformance to plans), as shown in Figure 1 and described below ("Q" stands for a quality view). While there are several metrics to evaluate quality in each of the above views, data from problem reports are among the few measures of quality applicable to most views.

[Figure 1. Quality views associated to defect data, and relations between them. The figure connects a central pool of defect data to the quality views Q1 (quality-in-use), Q2 (internal and external product quality metrics), Q3 (process quality metrics), Q4 (project progress and resource planning) and Q5 (value of corrections vs. cost of rework), and to the actors user, developers, quality manager and project leader.]

Q1. Evaluating product quality from a user's view. What truly represents software quality in the user's view can be elusive. Nevertheless, the number and frequency of defects associated with a product (especially those reported during use) are inversely proportional to the quality of the product [8], or more specifically to its reliability. Some problems are also more severe than others from the user's point of view.

Q2. Evaluating product quality from the organization’s (developers’) view. Product quality can be studied from the organization’s view by assuming that improved internal quality indicators such as defect density will result in improved external behavior or quality in use [15]. One example is the ISO 9126 definition of internal, external and quality-in-use metrics. Problem reports may be used to identify defect-prone parts and take actions to correct them and prevent similar defects.

Q3. Evaluating software process quality. Problem reports may be used to identify when most defects are injected, e.g., in requirement analysis or coding. Efficiency of Verification and Validation (V&V) activities in identifying defects and the organization’s efficiency in removing such defects are also measurable by defining proper metrics of defect data [5].

Q4. Planning resources. Unsolved problems represent work to be done. Cost of rework is related to the efficiency of the organization to detect and solve defects and to the maintainability of software. A problem database may be used to evaluate whether the product is ready for roll-out, to follow project progress and to assign resources for maintenance and evolution.

Q5. Value-based decision support. There should be a trade-off between the cost of repairing a defect and its presumed customer value. The number of problems and their criticality for users may also be used as a quality indicator for purchased or reused software.

Table 1. Relation between quality views and problem dimensions

Quality views: Q1-user, Q4-planning and Q5-value-based
  Problem dimensions: what - external appearance; where - system, executable software or not (user manuals); when - field use
  Examples of problem report fields to evaluate a quality view: IEEE Std. 1044-1993 sets Customer value in the recognition phase of a defect. It also asks about impacts on project cost, schedule or risk, and about correction effort, which may be used to assign resources. The count or density of defects may be used to compare software developed in-house with reused software.

Quality views: Q2-developer and Q3-process
  Problem dimensions: what - cause; where - software, executable or not; when - all phases
  Examples of problem report fields to evaluate a quality view: ODC is designed for in-process feedback to developers before operation. IEEE Std. 1044-1993 and the SEI's scheme cover defects detected in all phases and may be used to compare the efficiency of V&V activities. Examples of metrics are types of defects and the efficiency of V&V activities in detecting them.

Table 1 relates the dimensions defined in Section 2 to the quality views. E.g., in the first row, "what - external appearance" means that the external appearance of a problem is important for users, while the actual problem cause is important for developers (Q2-developer). Examples of problem report fields or metrics that may be used to assess a specific quality view are given. Mendonça and Basili [17] refer to identifying quality views as identifying data user groups. We conclude that the contents of problem reports should be adjusted to the quality views. We discuss the problems we faced in our use of problem reports in the next section.

4. INDUSTRIAL CASES

Our own and others' experience from using problem reports in assessment, control or prediction of software quality (the three quality functions defined in [21]) shows problems in defining measurement goals and metrics, collecting data from problem reporting systems, analyzing data and finally reporting the results. An overview of our case studies is shown in Table 2.

4.1 Research Questions and Metrics

The most common purpose of a problem reporting system is to record problems and follow their status (this maps to Q1, Q4 and Q5). However, as discussed in Section 3, they may be used for other views as well if proper data is collected. Sometimes quality views and measurement goals are defined top-down when initiating a measurement program (e.g., by using the Goal-Question-Metric paradigm [1]), while in most cases the top-down approach is followed by a bottom-up approach such as data mining or Attribute Focusing (AF) to identify useful metrics when some data is available; e.g., [17, 19, 22]. We do not intend to focus on the goals more than what is already discussed in Section 3, and refer to the literature on that. But we have encountered the same problem in several industrial cases, namely the difficulty of collecting data across several tools to answer a single question. Our experience suggests that questions that need measures from different tools are difficult to answer unless effort is spent to integrate the tools or data. Examples are:

− In S6, problems for systems not based on the reusable framework were not recorded in the same way as those based on it. Therefore it was not possible to evaluate whether defect density improved or not by introducing a reusable framework [12].

− In S5, correction effort was recorded in an effort reporting tool, and modified modules could be identified by analyzing change logs in the configuration management tool, without much interoperability between these tools and the problem reporting tool. This is observed in several studies. Although problem reporting systems often included fields for reporting correction effort and modifications, these data were not reliable or consistent with other data. Thus, evaluating correction effort or the number of modified modules per defect or type of defect was not possible.

Graves gives another example of the difficulty of integrating data [11]. The difference between two organizations' problem reporting systems within the same company led to a large discrepancy in the fault rates of modules developed by the two organizations, because the international organization would report an average of four faults for a problem that would prompt one fault for the domestic organization. To solve the problem, researchers often collect or mine industrial data, transform it and save it in a common database for further analysis. Examples are given in the next section.

Table 2. Case studies using data from problem reports

S1 - Financial system. Approximate size: not available (but large), in C, COBOL and COBOL II. Problem reports: 52. Releases reported on: 3.
S2 - Controller software for a real-time embedded system. Approximate size: 271 KLOC in C and C++. Problem reports: 360. Releases reported on: 4.
S3 - Public administration application. Approximate size: 952 KLOC in Java and XML. Problem reports: 1684. Releases reported on: 10.
S4 - Combined web system and task management system. Approximate size: not available (but large), in Java. Problem reports: 379. Releases reported on: 3.
S5 - Large telecom system. Approximate size: 480 KLOC in the latest studied release, in Erlang, C and Java. Problem reports: 2555. Releases reported on: 2.
S6 - A reusable framework for developing software systems for the oil and gas sector. Approximate size: 16 KLOC in Java. Problem reports: 223. Releases reported on: 3.


4.2 Collecting and Analyzing Data

Four problems are discussed in this section:

1. Ambiguity in defining problem report fields, even when the discussion on terminology is settled. A good example is the impact of a problem:

− The impact of a problem on the reporter (user, customer, tester etc.) is called Severity in [24], Criticality in [8] or even Product status in IEEE Std. 1044-1993. This field should be set when reporting a problem.

− The urgency of correction from the maintenance engineer's view is called Priority in [24], Urgency in [8] or Severity in IEEE Std. 1044-1993. It should be set during resolution.

Some problem reporting systems include one or the other, or do not even distinguish between these. Thus, the severity field may be set by the reporter and later changed by the maintenance engineer. Here are some examples of how these fields are used:

− For reports in S1 and S4 there was only one field (S1 used "Consequence", while S4 used "Priority"), and we do not know whether the value was changed between the first report and the time the fault was fixed.

− S2 used the terms "Severity" and "Priority" in the reports.

− S3 used two terms, "Importance" and "Importance Customer", but these were mostly judged to be the same.

In [14], it is recommended to use four fields: reporter and maintenance criticality, and reporter and maintenance priority. We have not seen examples of such detailed classification. In addition to the problem of ambiguity in definitions of severity or priority, there are other concerns:

− Ostrand et al. reported that severity ratings were highly subjective and sometimes inaccurate because of political considerations not related to the importance of the change to be made. A rating might be downplayed so that friends or colleagues in the development organization "looked better", provided they agreed to fix the fault with the speed and effort normally reserved for highest severity faults [20].

− Severity of defects may be downplayed to allow launching a release.

− Probably, most defects are set to medium severity, which reduces the value of such classification. E.g., 90% of problem reports in S1, 57% in S2, 72% in S3, 57% in S4, and 57% in release 2 of S5 (containing 1953 problem reports) were set to medium severity.

2. A second problem is related to release-based development. While most systems are developed incrementally or release-based, problem reporting systems and procedures may not be adapted to distinguishing between releases of a product. As an example, in S6 problem reports did not include a release number, only the date of reporting. The study assumed that problems are related to the latest release. In S5, we experienced that the size of software components (used to measure defect density) was not collected systematically on the date of a release. Problem report fields had also changed between releases, making data inconsistent.


3. The third problem is related to the granularity of data. The location of a problem, used to measure defect density or to count defects, may be given for large components or subsystems (as in S6), for fine-grained ones (software modules or functional modules, as in S4), or for both (as in S5). Too coarse data gives little information, while collecting fine-grained data needs more effort.

4. Finally, data is recorded in different formats and problem reporting tools. The commercial problem reporting tools used in industry in our case studies often did not help data collection and analysis. In S1, data were given to researchers as hardcopies of the problem reports, which were scanned and converted to digital form. In S2, the output of the problem reporting system was an HTML document. In S3 and S4, data were given to researchers as Microsoft Excel spreadsheets, which provide some facilities for analysis, but not for advanced analysis. In S5, problem reports were stored in text files and were transferred to a SQL database by the researchers. In S6, data were transferred to Microsoft Excel spreadsheets for further analysis. Thus, researchers had to transform data in most cases. In a large-scale empirical study to identify reuse success factors, data from 25 NASA software projects were inserted by researchers into a relational database for analysis [22]. One plausible conclusion is that the collected data were rarely analyzed by the organizations themselves, beyond collecting simple statistics.
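As a minimal sketch of the kind of transformation described above (the field names, the severity-field mapping and the table layout are all assumptions for illustration, not the actual schemas used in the studies), per-project exports could be loaded into one common SQLite table like this:

```python
import sqlite3

# Each system named its "impact" field differently (see problem 1 above).
SEVERITY_FIELD = {"S1": "Consequence", "S2": "Severity",
                  "S3": "Importance", "S4": "Priority"}

def load_reports(conn, system_id, rows):
    """rows: dicts already parsed from the project-specific export (Excel, HTML, text)."""
    severity_field = SEVERITY_FIELD.get(system_id, "Severity")
    conn.executemany(
        "INSERT INTO problem_report (system, release, location, type, severity) "
        "VALUES (?, ?, ?, ?, ?)",
        [(system_id, r.get("release"), r.get("location"),
          r.get("type"), r.get(severity_field)) for r in rows])

conn = sqlite3.connect("problem_reports.db")
conn.execute("CREATE TABLE IF NOT EXISTS problem_report "
             "(system TEXT, release TEXT, location TEXT, type TEXT, severity TEXT)")
load_reports(conn, "S3", [{"release": "7", "location": "module-x",
                           "type": "Funct. logic", "Importance": "high"}])
conn.commit()
```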

The main purpose for industry should always be to collect business-specific data and avoid "information graveyards". Unused data are costly, lead to poor data quality (low internal validity) and even to animosity among the developers. Improving tools and routines makes it possible to get a sense of the collected data and to give feedback.

4.3 Validity Threats

We have identified the following main validity threats in the studies:

1. Construct validity is related to using counts of problems (or defects) or their density as quality indicators. For example, high defect density before operation may be an indicator of thorough testing or of poor quality. Since this is discussed in the papers in the reference list, we refer to them and to [21] on validating metrics.

2. Internal validity: Missing, inconsistent or wrong data is a threat to internal validity. Table 3 shows the percentages of missing data in some studies. In Table 3, “Location” gives the defect-prone module or component, while “Type” has different classifications in different studies.

Table 3. Percentages of missing data per system (Severity / Location / Type)

S1: 0 / 0 / 0
S2: 4.4 / 25.1 / 2.5
S3: 20.0 / 20.0 / 8.6* (4.3)
S4: 0 / 0 / 9.0* (8.4)
S5: 0** / 22 for large subsystems, 46 for smaller blocks inside subsystems** / 44 for 12 releases in the dataset

Notes: * These are the sum of uncategorized data points (unknown, duplicate, not fault); in parentheses are "unknown" only. ** For release 2.

The data in Table 3 show large variation across the studies, but the problem is significant in some cases. Missing data is often related to a problem reporting procedure that allows reporting a problem, or closing it, without filling in all the fields. We wonder whether problem reporting tools could be improved to force developers to enter sufficient information. In the meantime, researchers have to discuss the introduced bias and how missing data is handled, for example by mean substitution or by verifying that data are missing at random. One observation is that most cases discussed in this paper collected data at least on product, location of a fault or defect, severity (reporter or developer or mixed) and type of problem. These data may therefore form a minimum basis for comparing systems and releases, but with sufficient care.

3. Conclusion validity: Most studies referred to in this paper have applied statistical tests such as the t-test, the Mann-Whitney test or ANOVA. In most cases there is no experimental design, nor any random allocation of subjects to treatments. Often all available data is analyzed, not samples of it. Preconditions of tests, such as the assumption of normality or of equal variances, should be discussed as well. Studies often chose a fixed significance level and did not discuss the effect size or the power of the tests (see [6]). The conclusions should therefore be evaluated with care (a small illustration of such a test follows after this list).

4. External validity or generalization: There are arguments for generalization based on the background of the cases, e.g., to products in the same company if the case is a probable one. But "formal" generalization, even to future releases of the same system, needs careful discussion [10]. Another type of generalization is to theories or models [16], which is seldom done. Results of a study may be considered relevant, which is different from being generalizable.
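To make the point about test preconditions and effect size concrete, the following hedged sketch applies the non-parametric Mann-Whitney test to two sets of per-module fault counts; the counts are invented example data and SciPy is assumed to be available:

```python
from scipy.stats import mannwhitneyu

# Hypothetical per-module fault counts for two systems (not data from the studies).
faults_per_module_s3 = [3, 0, 7, 2, 5, 1, 4]
faults_per_module_s4 = [1, 0, 2, 1, 0, 3, 1]

statistic, p_value = mannwhitneyu(faults_per_module_s3, faults_per_module_s4,
                                  alternative="two-sided")

# As the text recommends, report effect size and check test preconditions rather
# than only comparing p_value against a fixed significance level.
print(statistic, p_value)
```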

4.4 Publishing the Results

If a study manages to overcome the above barriers in metrics definition, data collection and analysis, there is still the barrier of publishing the results in major conferences or journals. We have faced the following:

1. The referees will justifiably ask for a discussion of the terminology and the relation between terms used in the study and standards or other studies. We believe that this is not an easy task, and hope that this paper can help clarify the issue.

2. Collecting evidence in the field requires comparing results across studies, domains and development technologies. We tried to collect such evidence for studies on software reuse and immediately faced the challenge of inconsistent terminology and ambiguous definitions. More effort should be put into meta-analysis or review-type studies to collect evidence and integrate the results of different studies.


3. Companies may resist publishing results or making data available to other researchers.

5. DISCUSSION AND CONCLUSION

We have described our experience with using problem reports for quality assessment in various industrial studies. While industrial case studies assure a higher degree of relevance, there is little control of the collected data. In most cases, researchers have to mine industrial data, transform or recode it, and cope with missing or inconsistent data. Relevant experiments can give more rigor (such as in [2]), but the scale is small. We summarize the contributions of this paper by answering the following questions:

1. What is the meaning of a defect versus other terms such as error, fault or failure? We identified three questions to answer in Section 2: what - whether the term applies to the manifestation of a problem or to its cause; where - whether problems are related to software only or also to the environment supporting it, and whether the problems are related to executable software or to all types of artifacts; and when - whether the problem reporting system records problems detected in all or only some life cycle phases. We gave examples of how standards and schemes use different terms and are intended for different quality views (Q1 to Q5).

2. How may data from problem reports be used to evaluate quality from different views? We used the model described in [15], which we extended in Section 3. Measures from problem or defect data are among the few measures used in all quality views.

3. How should data from problem reports be collected and analyzed? What are the validity concerns in using such reports for evaluating quality? We discussed these questions with examples in Section 4. The examples show challenges that researchers face in different phases of research.

One possible remedy to ensure consistent and uniform problem reporting is to use a common tool for this - cf. the OSS tools Bugzilla or Trac (which store data in SQL databases with search facilities). However, companies will need local relevance (tailoring) of the collected data and will require that such a tool can interplay with existing processes and tools, either for development or for project management - i.e., interoperability. Another problem is related to stability and logistics. Products, processes and companies are volatile entities, so longitudinal studies may be very difficult to perform. And given the popularity of sub-contracting/outsourcing, it is difficult to impose a standard measurement regime (or in general to reuse common artifacts) across subcontractors, possibly in different countries. Nevertheless, we are evaluating adapting an OSS tool and defining a common defect classification scheme for our research purposes, and collecting the results of several studies.

6. REFERENCES

[1] Basili, V.R., Caldiera, G. and Rombach, H.D. Goal Question Metrics Paradigm. In Encyclopedia of Software Engineering, Wiley, I (1994), 469-476.


[2] Basili, V.R., Briand, L.C. and Melo, W.L. How software reuse influences productivity in object-oriented systems. Communications of the ACM, 39, 10 (Oct. 1996), 104-116.

[3] The Bugzilla project: http://www.bugzilla.org/

[4] Børretzen, J.A. and Conradi, R. Results and experiences from an empirical study of fault reports in industrial projects. Accepted for publication in Proceedings of the 7th International Conference on Product Focused Software Process Improvement (PROFES'2006), 12-14 June, 2006, Amsterdam, Netherlands, 6 p.

[5] Chillarege, R. and Prasad, K.R. Test and development process retrospective- a case study using ODC triggers. In Proceedings of the International Conference on Dependable Systems and Networks (DSN’02), 2002, 669- 678.

[6] Dybå, T., Kampenes, V. and Sjøberg, D.I.K. A systematic review of statistical power in software engineering experiments. Accepted for publication in Journal of Information and Software Technology.

[7] Fenton, N.E. and Pfleeger, S.L. Software Metrics: A Rigorous & Practical Approach. International Thomson Computer Press, 1996.

[8] Florac, W. Software quality measurement: a framework for counting problems and defects. Software Engineering Institute, Technical Report CMU/SEI-92-TR-22, 1992.

[9] Freimut, B. Developing and using defect classification schemes. IESE- Report No. 072.01/E, Version 1.0, Fraunhofer IESE, Sept. 2001.

[10] Glass, R.L. Predicting future maintenance cost, and how we’re doing it wrong. IEEE Software, 19, 6 (Nov. 2002), 112, 111.

[11] Graves, T.L., Karr, A.F., Marron, J.S. and Siy, H. Predicting fault incidence using software change history. IEEE Trans. Software Eng., 26, 7 (July 2000), 653-661.

[12] Haug, M.T. and Steen, T.C. An empirical study of software quality and evolution in the context of software reuse. Student project report, Department of Computer and Information Science, NTNU, 2005.

[13] IEEE standards on http://standards.ieee.org

[14] Kajko-Mattsson, M. Common concept apparatus within corrective software maintenance. In Proceedings of 15th IEEE International Conference on Software Maintenance (ICSM'99), IEEE Press, 1999, 287-296.

[15] Kitchenham, B. and Pfleeger, S.L. Software quality: the elusive target. IEEE Software, 13, 10 (Jan. 1996), 12-21.

[16] Lee, A.S. and Baskerville, R.L. Generalizing generalizability in information systems research. Information Systems Research, 14, 3 (2003), 221-243.

[17] Mendonça, M.G. and Basili, V.R. Validation of an approach for improving existing measurement frameworks. IEEE Trans. Software Eng., 26, 6 (June 2000), 484-499.

[18] Mohagheghi, P., Conradi, R., Killi, O.M. and Schwarz, H. An empirical study of software reuse vs. defect-density and stability. In Proceedings of the 26th International Conference on Software Engineering (ICSE’04), IEEE Press, 2004, 282-292.

[19] Mohagheghi, P. and Conradi, R. Exploring industrial data repositories: where software development approaches meet. In Proceedings of the 8th ECOOP Workshop on Quantitative Approaches in Object-Oriented Software Engineering (QAOOSE’04), 2004, 61-77.

[20] Ostrand, T.J., Weyuker, E.J. and Bell, R.M. Where the bugs are. In Proceedings of the International Symposium on Software Testing and Analysis (ISSTA’04), ACM SIGSOFT Software Engineering Notes, 29, 4 (2004), 86–96.


[21] Schneidewind, N.F. Methodology for validating software metrics. IEEE Trans. Software Eng., 18, 5 (May 1992), 410-422.

[22] Selby, W. Enabling reuse-based software development of large-scale systems. IEEE Trans. Software Eng., 31, 6 (June 2005), 495-510.

[23] The Trac project: http://projects.edgewall.com/trac/

[24] UKSMA - United Kingdom Software Metrics Association: http://www.uksma.co.uk/


P4. Investigating the Software Fault Profile of Industrial Projects to Determine Process Improvement Areas: An Empirical Study

Jon Arvid Børretzen and Jostein Dyre-Hansen

Department of Computer and Information Science, Norwegian University of Science and Technology (NTNU), NO-7491 Trondheim, Norway
{borretze, dyrehans}@idi.ntnu.no

Abstract. Improving software processes relies on the ability to analyze previous projects and derive which parts of the process should be focused on for improvement. All software projects encounter software faults during development and have to put much effort into locating and fixing them. A lot of information is produced when handling faults, in the form of fault reports. This paper reports a study of fault reports from industrial projects, in which we seek a better understanding of the faults that have been reported during development and how they may affect the quality of the system. We investigated the fault profiles of five business-critical industrial projects by data mining to explore whether there were significant trends in the way faults appear in these systems. We wanted to see if any types of faults dominate, and whether some types of faults were reported as being more severe than others. Our findings show that one specific fault type is generally dominant across reports from all projects, and that some fault types are rated as more severe than others. From this we propose that the organization studied should increase effort in the design phase in order to improve software quality.

1. Introduction

Improving software quality is a goal most software development organizations aim for. This is not a trivial task, and different stakeholders will have different views on what software quality is. In addition, the character of the actual software will influence what are considered the most important quality attributes of that software. For many organizations, analyzing routinely collected data could help improve their process and product quality. Fault report data is one possible source of such data, and research shows that fault analysis can be a good approach to software process improvement [1]. The Business-Critical Software (BUCS) project [2] is seeking to develop a set of techniques to improve support for analysis, development, operation, and maintenance of business-critical systems. Aside from safety-critical systems, like air-traffic control and health care systems, there are other systems that we also expect will run correctly because of the possibly severe effects of failure, even if the consequences are mainly of an economic nature. This is what we call business-critical systems and software. In these systems, software quality is highly important, and the main target for developers will be to make systems that operate correctly [2]. One important issue in developing these kinds of systems is to remove any possible causes of failure, which may lead to wrong operation of the system. In a previous study [3], we investigated fault reports from four business-critical industrial software projects. Building on the results of that study, we here look at fault reports from five further projects.

The study presented here investigated fault reports from five industrial software projects. It investigates the fault profiles in two main dimensions: fault type and fault severity. The rest of this paper is organized as follows. Section 2 gives our motivation and related work. Section 3 describes the research design and research questions. Section 4 presents the results found, and Section 5 presents analysis and discussion of the results. The conclusion and further work are presented in Section 6.

2. Motivation and Related Work

The motivation for the work described in this paper is to further the knowledge gained from a previous study on fault reports from industrial projects. We also wanted to present empirical data on the results of fault classification and analysis, and show how this can be of use in a software process improvement setting. When considering quality improvement in terms of fault analysis, there are several related topics to consider. Several issues about fault reporting are discussed in [4] by Mohagheghi et al. General terminology in fault reporting is one problem mentioned; the validity of using fault reports as a means of evaluating software quality is another. One of its conclusions is that “There should be a trade-off between the cost of repairing a fault and its presumed customer value. The number of faults and their severity for users may also be used as a quality indicator for purchased or reused software.”

Software quality is a notion that encompasses a great number of attributes. The ISO 9126 standard defines many of these attributes as sub-attributes of the term “quality of use” [5]. When speaking about business-critical systems, the critical quality attribute is often experienced as the dependability of the system. In [6], Laprie states that “a computer system’s dependability is the quality of the delivered service such that reliance can justifiably be placed on this service.” According to Littlewood and Strigini [7], dependability is a software quality attribute that encompasses several other attributes, the most important being reliability, availability, safety and security. The term dependability can also be regarded subjectively as the “amount of trust one has in the system”.

Much effort is being put into reducing the probability of software failures, but this has not removed the need for post-release fault-fixing. Faults in the software are detrimental to the software’s quality, to a greater or lesser extent depending on the nature and severity of the fault. Therefore, one way to improve the quality of developed software is to reduce the number of faults introduced into the system during development. Faults are potential flaws in a software system that may later be activated to produce an error. An error is the execution of a “passive fault”, leading to a failure. A failure results in observable and erroneous external behaviour, system state or data state. The known remedies for errors and failures are to limit the consequences of an active error or failure, in order to resume service. This may be in the form of duplication, repair, containment


etc. These kinds of remedies do work, but as Leveson states in [8], studies have shown that this kind of downstream (late) protection is more expensive than preventing the faults from being introduced into the code. Faults that have been introduced into the system during implementation can be discovered either by inspection before the system is run, by testing during development, or when the application is run on site. The discovered faults are then reported in a fault reporting system, to be fixed later. Faults are also commonly known as defects or bugs, while another, similar but more extensive concept is anomalies, which is used in the IEEE 1044 standard [9].

Orthogonal Defect Classification (ODC) is one way of studying defects in software systems, and is mainly suited to design and coding defects. [10, 11, 12, 13, 14] are some papers on ODC and on using ODC in empirical studies. ODC is a scheme to capture the semantics of each software fault quickly. Several papers have discussed whether faults can be tied to reliability in a more or less cause-effect relationship. Some papers, like [12, 14, 15], indicate that this kind of connection is valid, while others, like [16], are more critical of this approach. Even if many of the studies point towards a connection being present between faults and reliability, they also emphasize that it is not easy to tie faults to reliability directly. Thus, it is not given that a system with a low number of faults necessarily has a higher reliability than a system with a high number of faults. Still, reducing the number of faults in a system will make the system less prone to failure, so if you can remove the faults you find without adding new ones, there is a good case for the reliability of the system being increased. This reasoning underlies “reliability-growth models”, which are discussed by Hamlet in [16] and by Paul et al. in [15]. Avizienis et al. state [17] that fault prevention and fault tolerance aim to provide the ability to deliver a service that can be trusted, while fault removal and fault forecasting aim to reach confidence in that ability by justifying that the functional, dependability and security specifications are adequate and that the system is likely to meet them. Hence, by working towards techniques that can prevent faults and reduce the number and severity of faults in a system, the quality of the system can be improved in the area of dependability. An example of results in a related study is the work done by Vinter and Lauesen [18]. This paper used a different fault taxonomy, proposed by Beizer [19], and reports that in their studied project close to a quarter of the faults found were of the type “Requirements and Features”.

3. Research design

This paper builds on a previous study [3] where we investigated the fault profiles of industrial projects, and it expands on those findings using a similar research design. We want to explore the fault profiles of the studied projects with respect to fault


types and fault severity. In order to study the faults, we categorized them into fault types as described in Section 3.2.

3.1 Research questions

Initially we want to find which types of faults are most frequent, and also the distribution of faults into the different fault types:

RQ1: Which types of faults are most common for the studied projects?

When we know which types of faults dominate and where these faults appear in the systems, we can choose to concentrate on the most serious ones in order to identify the most important issues to target in improvement work (note that the severity of the faults is judged by the developers who report the faults):

RQ2: Which fault types are rated as the most severe faults?

We also want to compare the results from this study with the results we found in the previous study on this topic [3]:

RQ3: How do the results of this study compare with our previous fault report study?

3.2 Fault categorization

There are several taxonomies for fault types; two examples are the ones used in the IEEE 1044 standard [9] and in a variant of the Orthogonal Defect Classification (ODC) scheme by El Emam and Wieczorek [12]. The fault reports we received were already categorized in some manner by the developers and testers, but using a very broad categorization scheme, which mainly placed each fault into one of the categories “fault caused by others”, “change request”, “test environment fault”, “analysis/design fault”, “test fault” and “coding fault”. The fault types used in this study are shown in Table 1. This scheme is very similar to the ODC scheme used in [12], but with the addition of a GUI fault type. The reason this classification scheme was used is that it is quite simple to apply but still discerns the fault types well. Further descriptions of the fault types used can be found in Chillarege et al. [13].

Table 1. Fault types used in this study

Fault types: Algorithm, Assignment, Checking, Data, Documentation, Environment, Function, GUI, Interface, Relationship, Timing/Serialization, Unknown

The categorization of faults in this investigation has been performed by the authors of this paper, based on the fault reports’ textual description and partial categorization. In addition, grading the faults’ consequences upon the system and system environment enables fault severities to be defined. All severity grading was done by the developers and testers performing the fault reporting in the projects. In the projects under study, the faults have been graded on a severity scale from 1 to 5, where 1 is “critical” and 5 is “change request”. The different severity classifications are shown in Table 2.


Table 2. Fault severity classification

1 - Critical
2 - Can not be circumvented
3 - Can be circumvented
4 - Cosmetic
5 - Change request
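Once each report carries a fault type and a severity rating, the distributions reported in Section 4 are straightforward tabulations over the classified records. The following is a minimal sketch of these tabulations; the record fields ("project", "fault_type", "severity") are illustrative assumptions about how the classified data could be represented, not the actual format used in the studied organization.

from collections import Counter

def fault_type_distribution(reports):
    """Counts and percentage share per fault type, as tabulated in Table 4."""
    counts = Counter(r["fault_type"] for r in reports)
    total = sum(counts.values())
    return {t: (n, 100.0 * n / total) for t, n in counts.most_common()}

def per_project_distribution(reports):
    """Fault type share (in %) within each project, as tabulated in Table 5."""
    per_project = {}
    for r in reports:
        per_project.setdefault(r["project"], Counter())[r["fault_type"]] += 1
    return {p: {t: 100.0 * n / sum(c.values()) for t, n in c.items()}
            for p, c in per_project.items()}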

3.3 The data sample

The data collected for this study comes from five different projects, all from the same company, but from development groups at various locations. The software systems developed in these projects are all on-line systems of a business-critical nature, and they have all been put into full or partial production. Altogether, we classified and analyzed 981 fault reports from the five projects. Table 3 contains information about the participating projects. The fault reports consisted of a fault summary, a severity rating, a coarse fault categorization, a description of the fault, and comments made by testers and developers while fixing the fault after it had been reported.

Table 3. Information about the participating projects

Project: P1 | P2 | P3 | P4 | P5
Project description: Registering data | Administration tool | Merging of applications | Administration tool | Transaction tool
Technical platform: J2EE | J2EE | Unix, Oracle | J2EE, Unix, Oracle | N/A
Development language: Java | Java | Java | Java | Java
Development effort (hours): N/A | 7900 | 14000 | 6000 | 2100
Number of fault reports: 490 | 212 | 42 | 34 | 123

4. Results

4.1 RQ1 – Which types of faults are most frequent?

To answer RQ1, we look at the distribution of the fault type categories for the different projects. Table 4 shows the distribution of fault types across all projects studied, while Table 5 shows the distribution of faults for each project. A plot of Table 5 is shown in Figure 1. We see that “function” and “GUI” faults are the most common fault types, with “assignment” also being quite frequent. Fault types like “documentation”, “relationship”, “timing/serialization” and “interface” are not frequent.


Table 4. Fault type distribution across all projects

Fault type              # of faults    %
Function                191            27.0 %
GUI                     138            19.5 %
Unknown                 87             12.3 %
Assignment              75             10.6 %
Checking                58             8.2 %
Data                    46             6.5 %
Algorithm               37             5.2 %
Environment             36             5.1 %
Interface               11             1.6 %
Timing/Serialization    11             1.6 %
Relationship            9              1.3 %
Documentation           8              1.1 %

Table 5. Fault type distribution for each project

Fault type              P1       P2       P3       P4       P5
Algorithm               1.1 %    12.0 %   4.9 %    6.7 %    8.6 %
Assignment              9.5 %    7.4 %    14.6 %   26.7 %   14.0 %
Checking                6.3 %    15.4 %   2.4 %    0.0 %    7.5 %
Data                    1.9 %    15.4 %   2.4 %    3.3 %    10.8 %
Documentation           1.4 %    0.6 %    0.0 %    0.0 %    2.2 %
Environment             4.6 %    7.4 %    2.4 %    3.3 %    4.3 %
Function                25.3 %   24.0 %   53.7 %   36.7 %   24.7 %
GUI                     29.9 %   5.7 %    14.6 %   6.7 %    10.8 %
Interface               0.3 %    1.1 %    0.0 %    10.0 %   5.4 %
Relationship            0.3 %    1.7 %    0.0 %    3.3 %    4.3 %
Timing/Serialization    1.4 %    2.3 %    2.4 %    0.0 %    1.1 %
Unknown                 18.2 %   6.9 %    2.4 %    3.3 %    6.5 %

Fig. 1. Fault type distribution for each project (stacked bars, 0-100 %, for projects P1-P5; one segment per fault type from Table 1)

If we focus only on the faults that are rated with “critical” severity (7.6% of all faults), the distribution is as shown in Figure 2. “Function” faults do not just dominate the total distribution, but also the distribution of “critical” faults. A very similar distribution is also the case for faults rated with “can not be circumvented” severity. When looking at the distribution of faults, especially the high severity faults, we see that “function” faults dominate the picture. We also see that across all faults, “GUI” faults have a large share (19.5% in total) of the reports, while for the critical severity faults the share of “GUI” faults is strongly reduced, to 1.5%.

Fig. 2. Distribution of faults rated as critical (bar chart, 0-40 %, per fault type: Function, Unknown, Assignment, Environment, Algorithm, Data, Relationship, Timing/Serialization, Checking, GUI, Interface, Documentation)

4.2 RQ2 – What types of faults are rated as most severe?

As for the severity of fault types, Figure 3 illustrates the distribution of severities for each fault type. The “relationship” fault type has the highest share of “critical” faults, and also the highest share when looking at the “critical” and “can not be circumvented” severities combined. The most numerous fault type, “function”, does not stand out as a particularly severe fault type compared with the others. The fault types rated as least severe are “GUI” and “data” faults.

Fig. 3. Distribution of severity with respect to fault types for all projects (stacked bars, 0-100 %, per fault type; severity legend: 1 - Critical, 2 - Can not be circumvented, 3 - Can be circumvented, 4 - Cosmetic, 5 - Enhancement)
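The per-type severity profile shown in Figure 3 is obtained by normalizing severity counts within each fault type, rather than across all faults as in Figure 2. A minimal sketch of this normalization follows; as before, the record fields are illustrative assumptions about the classified data.

from collections import Counter

def severity_shares_by_type(reports):
    """For each fault type, the percentage of its reports at each severity level (1-5)."""
    per_type = {}
    for r in reports:
        per_type.setdefault(r["fault_type"], Counter())[r["severity"]] += 1
    return {t: {s: 100.0 * n / sum(c.values()) for s, n in sorted(c.items())}
            for t, c in per_type.items()}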

4.3 RQ3 – How do the results compare with the previous study?

Previously, we conducted a similar study of fault reports from industrial projects, which is described in [3]. In the previous study, “function” faults were the dominant fault type, accounting for 33.3% to 61.3% of the reported faults in the four investigated projects. The percentage of “function” faults is lower for the five projects studied in this paper, but it is still the dominant fault type, accounting for 24.0% to 53.7% of the reported faults in P1 to P5, as shown in Table 5.


When looking at the highest severity rated faults reported, this study also shows that “function” faults are the most numerous of the “critical” severity rated faults, as shown in Figure 2, with 35.8%. This is in line with the previous study, where “function” faults were also dominant among the most severe faults reported, with 45.3%.

5. Analysis and discussion

5.1 Implications of the results

The results found in this study coincide with the results of the previous fault study we performed with different development organizations. In both studies the “function” faults have been the most numerous, both in general and among the faults rated as most severe. As “function” faults are mainly associated with the design process phase, as stated by Chillarege et al. in [13] and by Zheng et al. in [20], and as shown in Table 6, this indicates that a large number of faults had their origin in early phases of development. This is a sign that the design and specification process is not working as well as it should, making it the source of faults that are demanding and expensive to fix, as “function” faults will generally involve larger fixing efforts than pure code errors like “checking” and “assignment” types of faults. This means that we can recommend that the developers in the studied projects increase the effort used during design in order to reduce the total number of effort-demanding faults in their products. This finding is also similar to the one from the study of Vinter and Lauesen [18], where “Requirements and Features” faults were the dominating fault type.

Table 6. ODC fault types and development process phase associations [20]

Process association      Fault types
Design                   Function
Low Level Design         Interface, Checking, Timing/Serialization, Algorithm
Code                     Checking, Assignment
Library Tools            Relationship
Publications             Documentation
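Using the associations in Table 6, the overall fault counts in Table 4 can be aggregated per process phase to indicate where improvement effort is likely to pay off. The following sketch does exactly that; the assignment of "Checking" to the code phase is a simplifying choice made here (Table 6 lists it under both low-level design and code), and fault types without a phase in Table 6 are left unmapped.

from collections import Counter

# Fault type -> process phase, following Table 6; "Checking" is listed under both
# low-level design and code there, and is assigned to code in this sketch.
PHASE_OF = {
    "Function": "Design",
    "Interface": "Low Level Design",
    "Timing/Serialization": "Low Level Design",
    "Algorithm": "Low Level Design",
    "Checking": "Code",
    "Assignment": "Code",
    "Relationship": "Library Tools",
    "Documentation": "Publications",
}

# Fault counts from Table 4; GUI, Data, Environment and Unknown have no phase in Table 6.
TABLE_4_COUNTS = {
    "Function": 191, "GUI": 138, "Unknown": 87, "Assignment": 75, "Checking": 58,
    "Data": 46, "Algorithm": 37, "Environment": 36, "Interface": 11,
    "Timing/Serialization": 11, "Relationship": 9, "Documentation": 8,
}

per_phase = Counter()
for fault_type, count in TABLE_4_COUNTS.items():
    per_phase[PHASE_OF.get(fault_type, "Unmapped")] += count

# Among the mapped fault types, Design receives the largest count (the 191 "function"
# faults), which is what motivates the recommendation to strengthen the design phase.
print(per_phase.most_common())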

When looking at each fault type in Figure 3, we see which fault types tend to produce the most severe faults. One observation here is that although “function” faults dominate the picture for critical severity faults in Figure 2, it is the “relationship” and “timing/serialization” fault types that have the highest shares of critical severity rated faults. It can therefore be argued that “relationship” and “timing/serialization” faults are important to prevent, as these types of faults are likely to have greater consequences than, for instance, “GUI” and “data” type faults. “Function” faults are important to focus on preventing due to the sheer number of them, both in general and among the “critical” severity rated faults. Although “function”


faults do not stand out as a fault type where most faults are rated as “critical”, this fault type is still the biggest contributor to “critical” severity rated faults. When informing the organization involved of the results of this study, the feedback was anecdotal confirmation of our findings, as they informed us that they were indeed having issues with design and specification, even though their own fault statistics showed most faults to be coding faults. We would like to study this issue further in our future work on the subject.

In many cases, fault reporting is performed with one goal in mind: to fix faults that are uncovered through inspection and testing. Once the fault has been corrected, the fault report information is not used again. The available information can be employed in a useful fashion as long as future development projects are similar to, or based on, previous projects. By reusing the information that has been accumulated during fault discovery through testing and during production, we are able to learn about possible faults for new similar projects and for further development of current projects. Measuring quality and effects on quality in a software system is not a trivial matter. As presented in Section 2, opinions on how and whether this can be done are divided. One of the means Avizienis et al. suggest for attaining better dependability in a system is fault removal in order to reduce the number and severity of faults [17]. By identifying common fault types, developers can reduce a larger number of faults by focusing their efforts on preventing these types of faults. Also, identifying the most severe fault types enables developers to focus on preventing those faults that have the biggest detrimental impact on the system.

5.2 Further issues concerning fault reporting in this organization

In addition to our quantitative study results, we were able to identify some points of possible improvement in the studied organization's fault reporting. Two attributes that we found lacking, which should be possible to include in fault reporting, are fault location and fault fixing effort. The location of a fault should be readily known once a fault report has been dealt with, as fault fixing must have a target module or software part. This information would be very helpful if the organization wants to investigate which software modules produce the most serious faults; they can then make a reasoned argument about whether these modules are of a particularly critical type (like infrastructure or server components), or whether some modules are simply of poorer quality than others. Including fault fixing effort in the fault reports is also an issue that could be of great benefit when working to improve fault prevention processes. By recording such information, we can see which fault types produce the most expensive faults in terms of fixing effort. These issues will be presented to the organization under study. Their current process of testing and registering faults in a centralized way hinders the testers and developers from including this valuable information in the fault reports. The testers who initially produce the fault reports do not necessarily know which software modules the fault is located in, and developers fixing the fault do not communicate where it was located once it has been found and fixed.


5.3 Threats to validity

When performing an empirical study on industrial projects, it is not possible to control the environment or the data collected as we would in an experiment. The following is a short presentation of what we see as the main validity threats.

Internal validity. An issue here might be factors affecting the distribution of fault types. When the fault data was collected, the intended use was solely fault fixing; the data was not intended to be studied in this way. The coarse classification given by the developers could have been biased. Such bias or other inconsistencies were hopefully reduced by us classifying the fault reports with new fault types.

External validity. The small number of projects under investigation is a threat to external validity. However, the results of this study support the findings of a previous similar study of fault reports from other software development organizations. The projects under study may also not necessarily be the most typical, but this is hard to verify in any way.

Conclusion validity. One possible threat here is the reliability of measures, as the categorization of faults into fault types is a subjective task. To avoid placing faults we were unsure of into the wrong category, we used the type “unknown” to filter out the faults we were not able to confidently categorize.

6. Conclusion and future work

In this paper we have described the results of a study of fault reports from five software projects from a company developing business-critical software. The fault reports have been categorized and analyzed according to our research questions. From the research questions we have found that "function" faults, closely followed by "GUI" faults, are the fault types that occur most frequently in the projects. To reduce the number of faults introduced in the systems, the organization should focus on improving the processes that are most likely to contribute to these types of faults, namely the specification and design phases of development. Faults of the types "documentation", "relationship", "timing/serialization" and "interface" are the least frequently occurring fault types. The fault types that are most often rated as most severe are "relationship" and "timing/serialization" faults, while the fault types "GUI" and "documentation" are considered the least severe. Although "function" faults are not rated as the most severe type of fault, this fault type still dominates when looking at the distribution of highly severe faults only. In addition to these results, we observed that the organization's fault reporting process could be improved by adding some information to the fault reports. This would facilitate more effective targeting of fault types and locations in order to better focus future efforts for improvement.

In terms of future work, we want to continue studying the projects explored in this paper, using qualitative methods to further explain our quantitative results. Feedback


from the developers' organization would help us understand the source of these results, and help us suggest concrete measures for process improvement in the organization.

Acknowledgements

The authors would like to thank Reidar Conradi for careful reviewing and valuable input. We also thank the organization involved for their participation and cooperation during the study.

References

1. R. Grady, Practical Software Metrics for Project Management and Process Improvement, Prentice Hall, 1992
2. J. A. Børretzen; T. Stålhane; T. Lauritsen; P. T. Myhrer, “Safety activities during early software project phases”. Proceedings, Norwegian Informatics Conference, 2004
3. J. A. Børretzen; R. Conradi, “Results and Experiences From an Empirical Study of Fault Reports in Industrial Projects”. Proc. 7th International Conference on Product Focused Software Process Improvement (PROFES'2006), 12-14 June 2006, Amsterdam, Pages: 389-394
4. P. Mohagheghi; R. Conradi; J.A. Børretzen, "Revisiting the Problem of Using Problem Reports for Quality Assessment", Proc. 4th Workshop on Software Quality, held at ICSE'06, 21 May 2006, Shanghai, Pages: 45-50
5. ISO, ISO/IEC 9126 - Information technology - Software evaluation - Quality characteristics and guidelines for their use, ISO, December 1991
6. J.-C. Laprie, “Dependable computing and fault tolerance: Concepts and terminology”, Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995, 'Highlights from Twenty-Five Years', June 27-30, 1995
7. B. Littlewood; L. Strigini, “Software reliability and dependability: a roadmap”, Proceedings of the Conference on The Future of Software Engineering, Limerick, Ireland, 2000, Pages: 175-188
8. N. Leveson, Safeware: System Safety and Computers, Addison-Wesley, Boston, 1995
9. IEEE, IEEE Standard Classification for Software Anomalies, IEEE Std 1044-1993, December 2, 1993
10. K.A. Bassin; T. Kratschmer; P. Santhanam, “Evaluating software development objectively”, IEEE Software, 15(6): 66-74, Nov.-Dec. 1998
11. K. Bassin; P. Santhanam, “Managing the maintenance of ported, outsourced, and legacy software via orthogonal defect classification”, Proceedings of the IEEE International Conference on Software Maintenance, 2001, 7-9 Nov. 2001
12. K. El Emam; I. Wieczorek, “The repeatability of code defect classifications”, Proceedings of the Ninth International Symposium on Software Reliability Engineering, 1998, 4-7 Nov. 1998, Page(s): 322-333
13. R. Chillarege; I.S. Bhandari; J.K. Chaar; M.J. Halliday; D.S. Moebus; B.K. Ray; M.-Y. Wong, “Orthogonal defect classification - a concept for in-process measurements”, IEEE Transactions on Software Engineering, Volume 18, Issue 11, Nov. 1992, Page(s): 943-956
14. R.R. Lutz; I.C. Mikulski, “Empirical analysis of safety-critical anomalies during operations”, IEEE Transactions on Software Engineering, 30(3): 172-180, March 2004


15. R.A. Paul; F. Bastani; I-Ling Yen; V.U.B. Challagulla, “Defect-based reliability analysis for mission-critical software”, The 24th Annual International Computer Software and Applications Conference, 2000. COMPSAC 2000. 25-27 Oct. 2000 Page(s):439 - 444

16. D. Hamlet, “What is software reliability?”, Proceedings of the Ninth Annual Conference on Computer Assurance, 1994. COMPASS '94 'Safety, Reliability, Fault Tolerance, Concurrency and Real Time, Security', 27 June-1 July 1994 Page(s):169 – 170

17. A. Avizienis; J.-C. Laprie; B. Randell; and C. Landwehr, “Basic Concepts and Taxonomy of Dependable and Secure Computing”, IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 1, January-March 2004

18. O. Vinter; S. Lauesen, “Analyzing Requirements Bugs”, Software Testing & Quality Engineering Magazine, Vol. 2-6, Nov/Dec 2000.

19. B. Beizer, Software Testing Techniques. Second Edition, Van Nostrand Reinhold, New York, 1990

20. J. Zheng; L. Williams; N. Nagappan; W. Snipes; J.P. Hudepohl; M.A. Vouk, “On the value of static analysis for fault detection in software”, IEEE Transactions on Software Engineering, Volume 32, Issue 4, April 2006 Page(s):240 – 253


P5. The Empirical Studies on Quality Benefits of Reusing Software Components

(Position paper)

Jingyue Li, Anita Gupta, Jon Arvid Børretzen, and Reidar Conradi

Department of Computer and Information Science (IDI) Norwegian University of Science and Technology (NTNU)

{jingyue, anitaash, borretze, conradi}@idi.ntnu.no

Abstract

The benefits of reusing software components have been studied for many years. Several previous studies have concluded that reused components have fewer defects in general than non-reusable components. However, few of these studies have gone a step further, i.e., investigating which types of defects have been reduced because of reuse. Thus, it is often simply assumed that making a software component reusable will automatically improve its quality. This paper presents an on-going industrial empirical study on the quality benefits of reuse. We are going to compare the defect types, classified by ODC (Orthogonal Defect Classification), of the reusable components vs. the non-reusable components in several large and medium software systems. The intention is to figure out which defects have been reduced because of reuse and the reasons for the reduction.

1. Introduction

Software reuse is a management strategy for software evolution, in terms of development for and with reuse. Development for reuse refers to the generalization of components towards reuse, while development with reuse has to do with the inclusion of these reusable components in new and future development [1]. Understanding the issues related to software reuse, involving its purpose and promises, has been a focus since the 1970s. The focus has been on how to develop for/with reuse, technical/managerial/organizational aspects, measuring reuse in terms of quality and productivity, as well as reporting successes and failures of reuse practices. Although some studies have found that reusable software components have a lower defect density than non-reusable components [2]-[7], few have investigated why software defects have been reduced because of reuse.

In this study, we have first collected defect reports from several large and medium software systems, which include both reusable and non-reusable software components. Second, we will classify the defects in the defect reports using ODC (Orthogonal Defect Classification) [8]. Then, we will compare the defect density and severity of different defect types of the reusable components vs. the non-reusable components. We expect to figure out the types of defects that are not related to reuse (i.e., their presence is the same for both reusable and non-reusable components). In addition, we hope to identify defect types that may be more probable in reusable components than in non-reusable components. Finally, we will show the results of our defect density analysis to the project members building the reusable components and to


those building the non-reusable components. By discussing with and interviewing these project members, we expect to find out why making a component reusable has or has not helped to reduce certain defect types.

This paper is structured as follows: Section 2 introduces related work; Section 3 presents our research motivation and research questions; Section 4 illustrates the detailed research design; Section 5 shows the currently available data that we are going to analyze; Section 6 concludes the paper.

2. Related work

Several industrial empirical studies, shown in Table 1, conclude that reuse reduces the defect density, and therefore helps to improve the quality (especially the reliability) of the system. However, most studies focus on the quantity of defects, such as the number of defects or the defect density, without considering the defect type. Some studies have investigated why reusable components have better quality than non-reusable components. The study of Succi et al. [6] concludes that implementing a systematic reuse policy, such as the adoption of a domain-specific library, improves customer satisfaction. Results from the study of Selby [7] show that software modules reused without revision had the fewest faults. However, the results of these studies only show the connection between the reuse policy and the number of defects.

Table 1: Studies related to defect density and reuse

Quality focus: Reusable vs. non-reusable components [2]. Quality measures: No definition of what a defect is; defect density is given as defects per kilo non-comment source statements (KNCSS). Conclusion: Reuse can provide improved quality, increased productivity, shortened time-to-market, and enhanced economics.

Quality focus: Reusable vs. non-reusable components [3]. Quality measures: Defect density (number of defects/LOC) and stability (module size and size of modified code); size is in SLOC. Conclusion: Reused components had lower defect density than the non-reused ones, and they were also less modified (more stable) than non-reused ones. Reused components had a higher number of defects of the highest severity before delivery, but fewer defects post-delivery.

Quality focus: Reusable vs. non-reusable components [4]. Quality measures: Defect density (number of defects/SLOC) and change density (number of change requests/SLOC). Conclusion: The quality of the reusable framework improves and it becomes more stable over several releases.

Quality focus: Reusable vs. newly developed components [5]. Quality measures: Error/defect densities (errors/defects per thousand source statements); no definition for error/defect; size is in SLOC. Conclusion: Reuse provides an improvement in error density (more than a 90% reduction) compared to new development.

Quality focus: Code reuse [6]. Quality measures: Customer Complaint Density (CCD), the ratio of customer complaints to LOC, i.e. post-release defect density. Conclusion: Reuse is significantly positively correlated with customer satisfaction.

Quality focus: Reused, modified and newly developed modules [7]. Quality measures: Module fault rate, i.e. the number of faults in a module; an error correction may affect more than one module, and each module affected by an error is counted as having a fault; size is in SLOC. Conclusion: Software modules reused without revision had the fewest faults, fewest faults per source line, and lowest fault correction effort. Software modules reused with major revisions had the highest fault correction effort and highest fault isolation effort.


3. Research motivation and research questions

Our motivation here is to investigate the issues related to defects, namely classification and severity, in relation to software reuse. In other words, we are interested in the connection between the reuse policy and different kinds of defects. We want to know whether systematic reuse will help to reduce all kinds of defects or whether it only helps to minimize certain kinds of defects. We also suspect that reuse may increase certain types of defects. For example, we expect that familiarity with a component might prevent it from being tested thoroughly. The results of this study will help industrial practitioners to better understand the benefits of reuse. It can also help them to improve their reuse policy in order to get a better quality system. Our research questions are as follows:

• RQ1: What are the more common defect types in the reusable components vs. the non-reusable components?
• RQ2: What are the severities of defects for the reusable components vs. the non-reusable components?
• RQ3: What are the reasons for reduced defects in the reusable components?

4. Detailed research design

When abnormalities in the operation of a system are found, these are reported as failures. These failures are reported to developers through failure reports. A fault is an underlying problem in a software system that leads to a failure. Error is used to denote the execution of a passive fault which leads to incorrect behavior (with respect to the requirements) or system state [9], and also for any fault or failure resulting from human activity [10]. In our study, defect is used in place of fault, error or failure, without distinguishing the origin or whether it is active or passive.

To answer the first research question, RQ1, we plan to use the Orthogonal Defect Classification (ODC) scheme defined by IBM [8] to classify the defects into different types. The goal of IBM's ODC scheme is to categorize defects such that each defect type is associated with a specific stage of development. ODC has been used to evaluate and improve technology. For example, in order to investigate the value of automatic static analysis, the defects found and not found by static analysis can be classified [11]. Reuse is proposed as a mechanism to improve the efficiency and quality of software development. It is therefore reasonable to use ODC to analyze the quality improvement due to reuse. The attribute defect type in ODC captures the fix that was made to resolve the defect. For example, defects of type function are those that require a formal design change. Examples of the defect types are given in Table 2. Details of the other defect types are given in [12].


Table 2. Examples of defect types in ODC

Assignment. Description: Value(s) assigned incorrectly or not assigned at all. Example: An internal variable or a variable within a control block did not have the correct value, or did not have any value at all.

Checking. Description: Errors caused by missing or incorrect validation of parameters or data in conditional statements. Example: A value greater than 100 is not valid, but the check to make sure that the value was less than 100 was missing.

Algorithm or Method. Description: Efficiency or correctness problems that affect the task and can be fixed by (re)implementing an algorithm or local data structure without the need for requesting a design change. Example: The number and/or types of parameters of a method or an operation are incorrectly specified.
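As a concrete, hypothetical illustration of the Checking type described in Table 2, the fragment below first omits the range validation mentioned in the example and then shows the fix that adds it; the function and field names are made up for illustration only.

# Hypothetical illustration of a "Checking" defect as described in Table 2:
# the validation that a value must not exceed 100 is missing, and the fix adds it.

def set_discount(percent):
    # Faulty version: no range check (a Checking defect).
    return {"discount": percent}

def set_discount_fixed(percent):
    # Fixed version: the missing validation is added.
    if percent > 100:
        raise ValueError("discount must not exceed 100")
    return {"discount": percent}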

To answer RQ1, we will compare the density and distribution of the different defect types of the reusable components with those of the non-reusable components. To answer RQ2, we will compare the distribution of defect severities, which are usually defined by testers or developers, of the reusable components with that of the non-reusable components. After getting the results of RQ1 and RQ2, we will do a causal analysis by comparing the development processes, quality assurance, change management, application domain, and other contexts of the projects building the reusable components with those of the projects building non-reusable components, in order to answer RQ3. The causal analysis will be done by interviewing project managers, with supplemental documentation analysis. The purpose is to figure out why the reusable components have a lower or higher defect density and severity for certain defect types than the non-reusable components.
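The RQ1 and RQ2 comparisons amount to tabulating the classified defects per component group. The sketch below assumes each classified record carries a component group ("reusable" or "non-reusable"), an ODC defect type and a severity; the field names and example records are illustrative assumptions, not the actual data formats of the companies described in Section 5.

from collections import Counter

def distribution(defects, field):
    """Relative distribution of `field` (e.g. 'type' for RQ1, 'severity' for RQ2) per group."""
    groups = {}
    for d in defects:
        groups.setdefault(d["group"], Counter())[d[field]] += 1
    return {g: {value: n / sum(c.values()) for value, n in c.items()}
            for g, c in groups.items()}

# Example with made-up records:
defects = [
    {"group": "reusable", "type": "Checking", "severity": "high"},
    {"group": "non-reusable", "type": "Function", "severity": "critical"},
    {"group": "non-reusable", "type": "Assignment", "severity": "medium"},
]
print(distribution(defects, "type"))      # defect type distribution per component group (RQ1)
print(distribution(defects, "severity"))  # severity distribution per component group (RQ2)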

5. The available data

We currently have data from two software systems in companies A and B (the company names are not shown in the paper for confidentiality reasons). More data from several other companies will be collected in the near future.

5.1. Available data from Company A

Company A is a large Norwegian company in the Oil & Gas industry. The central IT department of the company is responsible for developing and delivering software, which is meant to give key business areas better flexibility in their operation. It is also responsible for the operation and support of IT systems. This department consists of approximately 100 developers, located mainly in Norway. Since 2003, a central IT strategy of the O&S (Oil Sales, Trading and Supply) business area has been to explore the potential benefits of reusing software systematically. Company A has developed a customized framework of reusable components, which is based on J2EE (Java 2 Enterprise Edition). The reusable components have been developed over three releases, totalling 56 KLOC. There are several applications using the functionality of the reusable


components. One such application is a document storage application, which includes several components. The components in this application are defined as non-reusable components in our study. The application also has three releases, with a total of 67 KLOC. In company A, the defects are recorded with the Rational ClearQuest tool. Each trouble report contains an ID, a headline description, a severity (indicating how critical the problem is, as evaluated by developers: critical, high, medium, or low), a classification (Error, Error in other system, Duplicate, Rejected and Postponed), estimated time to fix, remaining time to fix, subsystem location, as well as an updated action and timestamp record for each new state the defect enters in the workflow. There are 223 trouble reports for the reusable framework and 438 trouble reports for the non-reusable application.
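As a rough, illustrative indication of scale (not a result reported by the study itself), the raw counts and sizes above correspond to about 223/56 ≈ 4.0 trouble reports per KLOC for the reusable framework and 438/67 ≈ 6.5 trouble reports per KLOC for the non-reusable application. These figures are upper bounds on defect density, since the classification field shows that some reports are duplicates or rejected and therefore do not represent actual defects.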

5.2. Available data from Company B

Company B is a large Nordic company in the IT industry. They specialize in applications for workflow and process support, both for public and corporate purposes, as well as doing consultancy work for their customers. The company employs around 500 people. The project studied from company B is a combined web presentation system and task management system used in the administration of public information and application processing. The defect reports stem from three releases of the project, which occupied 6-7 developers. The project from company B is developed using a framework with reusable components and generated code. This project is also based on J2EE. Major parts of the reused code were automatically generated by a code generation tool, and the company did not report the number of lines of code. The defects have been reported in the Atlassian Jira bug tracking tool. The trouble reports contain an ID (Key), a short summary, type, status (Resolved/Closed/Open), severity evaluated by the developers (Blocker, Critical, Major, Normal, Minor, Trivial), resolution (Fixed, Cannot Reproduce, Won't Fix etc.), which person has been assigned responsibility (Assignee), who reported the defect (Reporter), time created, time updated, the version in which the defect was found, the version in which the defect should be fixed, and the subsystem location (Components). There are 379 trouble reports from company B, of which 286 trouble reports come from the reused parts and 93 trouble reports come from the non-reused parts.

6 Conclusion and future work

In this position paper, we present the research design of an on-going empirical study to investigate the benefits or costs of software reuse for software quality. By analyzing the defect reports of several software systems, which include both reusable and non-reusable components, we expect to deepen the understanding of why reuse improves the quality of software. The conclusions of this study will be given as guidelines for improving the reuse process to the companies involved in this study. In order to generalize our conclusions, future work is to collect data from projects with different contexts, such


as application domains, technologies, and development processes, in order to find common good practices and lessons learned about software reuse.

7 References

[1] G. Sindre, R. Conradi, and E. Karlsson, “The REBOOT Approach to Software Reuse”, Journal of Systems and Software, 30(3): 201-212, 1995.
[2] W. C. Lim, “Effect of Reuse on Quality, Productivity and Economics”, IEEE Software, 11(5): 23-30, Sept./Oct. 1994.
[3] P. Mohagheghi, R. Conradi, O. M. Killi, H. Schwarz, “An Empirical Study of Software Reuse vs. Defect Density and Stability”, Proc. 26th Int'l Conference on Software Engineering (ICSE'2004), 23-28 May 2004, Edinburgh, Scotland, pp. 282-291, IEEE-CS Press.
[4] A. Gupta, O. P. N. Slyngstad, R. Conradi, P. Mohagheghi, H. Rønneberg, and E. Landre, “An Empirical Study of Defect-Density and Change-Density and their Progress over Time in Statoil ASA”, Proc. 11th European Conference on Software Maintenance and Reengineering (CSMR'07), 21-23 March 2007, Amsterdam, The Netherlands, p. 10.
[5] W.M. Thomas, A. Delis and V.R. Basili, “An Analysis of Errors in a Reuse-Oriented Development Environment”, Journal of Systems and Software, 38(3): 211-224, September 2004.
[6] G. Succi, L. Benedicenti, and T. Vernazza, “Analysis of the Effects of Software Reuse on Customer Satisfaction in an RPG Environment”, IEEE Transactions on Software Engineering, 27(5): 473-479, May 2001.
[7] W. Selby, “Enabling Reuse-Based Software Development of Large-Scale Systems”, IEEE Transactions on Software Engineering, 31(6): 495-510, June 2005.
[8] R. Chillarege, I. S. Bhandari, J. K. Chaar, M. J. Halliday, D. S. Moebus, B. K. Ray, M.-Y. Wong, “Orthogonal Defect Classification - a Concept for in-Process Measurements”, IEEE Transactions on Software Engineering, 18(11): 943-956, Nov. 1992.
[9] IEEE Standard Glossary of Software Engineering Terminology, IEEE Standard 610.12, 1990.
[10] A. Endres and D. Rombach, “A Handbook of Software and Systems Engineering: Empirical Observations, Laws and Theories”, Addison-Wesley, 2004.
[11] J. Zheng, L. Williams, N. Nagappan, W. Snipes, J. P. Hudepohl, M. A. Vouk, “On the Value of Static Analysis for Fault Detection in Software”, IEEE Transactions on Software Engineering, 32(4): 240-253, April 2006.
[12] ODC defect type: http://www.research.ibm.com/softeng/ODC/DETODC.HTM#type


P6. Fault classification and fault management: Experiences from a software developer perspective

Jon Arvid Børretzen

Department of Computer and Information Science, Norwegian University of Science and Technology (NTNU),

NO-7491 Trondheim, Norway [email protected]

Abstract: In most software development projects, faults are unintentionally injected into the software, and are later found through inspection, testing or field use and reported in order to be fixed. The associated fault reports can have uses that go beyond just fixing the discovered faults. This paper presents the findings from interviews performed with representatives involved in fault reporting and correcting processes in different software projects. The main topics of the interviews were fault management and fault reporting processes. The objective was to present practitioners' views on fault reporting, and in particular fault classification, as well as to expand and deepen the knowledge gained from a previous study on the same projects. Through interviews and use of Grounded Theory we wanted to find potential weaknesses in a current fault reporting process and elicit improvement areas and their motivation. The results show that fault management could and should include steps to improve product quality. The interviews also supported our quantitative findings in previous studies on the same development projects, where much rework through fault fixing needs to be done after testing because areas of work in early stages of the projects have been neglected.

Keywords: Process improvement, Fault management, Fault classification, Software quality.

1. Introduction

An important goal for most software development organizations is improving software quality, typically reliability. There are several ways to go about this, but it is not a trivial task, and different stakeholders have different views on what software quality is. In addition, the application domain of the actual software will influence what quality attributes we consider to be most relevant. For many organizations, their routinely collected data is an untapped source for process analysis and possible process improvement. Indeed, leaving collected data largely unused can demotivate the developers and reduce data quality. Our conjecture, supported by previous research, is that fault analysis can be an effective approach to software process improvement (Grady, 1992).

The Business-Critical Software (BUCS) project (Børretzen et al., 2004) works to develop a set of techniques to improve support for analysis, development, operation, and maintenance of business-critical systems. Business-critical systems are systems that we expect will run correctly and safely, even if the consequences are mainly of a “mild”


economic nature. In these systems, the software is critical, and the main target for developers will be to make systems that operate correctly and without serious consequences in case of failures (erroneous behaviour vs. specifications). One important issue when developing these kinds of systems is to remove possible causes for failure, which may lead to wrong operation of the system.

In two previous studies (Børretzen and Conradi, 2006; Børretzen and Dyre-Hansen, 2007), we have investigated fault reports in nine business-critical industrial software projects. These were quantitative studies based on fault report analysis, where we mainly studied fault types and severity of faults. Building on the most recent study, we want to gain a better understanding of the results it gave us. A considerable share of the reported faults were of types associated with early software development phases, indicating flaws in the quality control in these phases. This paper presents the results from interviews done with representatives from some of the projects studied in (Børretzen and Dyre-Hansen, 2007), as well as two workshops with further representatives on fault reporting and fault classification.

The rest of this paper is organized as follows. Section 2 gives our motivation and related work. Section 3 describes the research, research questions and procedure. Section 4 presents the results from the interviews and workshop feedback, and Section 5 presents a discussion of the work and the results. The conclusion and further work are presented in Section 6.

2. Background and Motivation

The motivation for the work described in this paper is to expand on the knowledge gained from a previous quantitative study on fault reports from five projects in a software development organization. By performing an additional qualitative investigation with representatives from some of the same projects, we sought to better understand the reasons for the results we saw in the quantitative study, as well as to receive input from practitioners on how more thorough fault management can be part of a software process improvement initiative. "Value-based" software engineering says that developing models and measures that focus on value received enables us to make trade-off decisions which help us concentrate on the right issues in process improvement (Biffl et al., 2006). There are several motivations for preventing and correcting faults as early in the process as possible, but the main issue will usually be of an economical nature.

2.1 State-of-the-art

When considering quality improvement through use of fault analysis, there are many related topics to consider. Several issues in fault reporting are discussed by Mohagheghi et al. (2006). General terminology in fault reporting is one problem; the validity of using fault reports as a means for evaluating software quality is another. Important issues to consider are how to describe the fault by "what" – the cause of the fault, "where" – the location of the fault, and "when" – the detection phase of the fault. One of the conclusions in (Mohagheghi et al., 2006) is that "There should be a trade-off between the cost of repairing a fault and its presumed customer


value. The number of faults and their severity for users may also be used as a quality indicator for purchased or reused software." By using fault report analysis, one can get a step closer to understanding the cost of repairing faults of various categories. Fault reports are the records of information about faults that are discovered during development, testing and field use of a software system. These reports can contain a range of different information; common fault report attributes are fault description, fault severity, fault type and fault location. Their most obvious function is to be the link between fault discovery and fault correction, but they are also valuable when performing fault analysis of a system.
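To make the structure of such a fault report concrete, the sketch below shows a minimal record built around the attributes mentioned above; the field names and the example values are illustrative only and do not reflect the schema of any particular fault management tool.

```python
from dataclasses import dataclass

@dataclass
class FaultReport:
    """Minimal, illustrative fault report record (field names are assumptions)."""
    report_id: str
    description: str   # "what": how the fault shows itself
    fault_type: str    # e.g. an ODC-style type such as "Function"
    severity: str      # e.g. "critical", "major", "minor"
    location: str      # "where": component, module or file
    detected_in: str   # "when": phase such as "unit test", "system test", "field use"

# Example entry as it could appear after registration in a fault tracking system
example = FaultReport(
    report_id="FR-0042",
    description="Wrong interest rate used when the start date falls on a weekend",
    fault_type="Algorithm",
    severity="major",
    location="interest calculation module",
    detected_in="system test",
)
```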

One way to improve the quality of developed software is to reduce the number and severity of faults introduced into the system during development. Faults are potential flaws in a software system that may later be activated to produce an error. An error is the execution of a "passive fault", leading to a failure. A failure results in observable and erroneous external behaviour, i.e. an inconsistent system state. Faults that have been introduced into the system during implementation can be discovered either by inspection before the system is run, by testing during development, or when the application is run on site. The discovered faults are then reported in a fault reporting system, and will normally be fixed later. Faults are also commonly known as defects or bugs, and fall under the more general concept of anomalies.

Orthogonal Defect Classification – ODC – is a way of studying defects in software systems, and is mainly suited to design and coding defects. Some papers on ODC and its use in empirical studies are (Chillarege et al., 1992) and (El Emam and Wieczorek, 1998). ODC provides a classification scheme where the semantics of each software fault can be captured quickly and easily.

Avizienis et al. (2004) state that fault prevention and fault tolerance aim to provide the ability to deliver a service that can be trusted, while fault removal and fault forecasting aim to reach confidence in that ability by justifying that the functional and the dependability and security specifications are adequate and that the system is likely to meet them. Hence, by working towards techniques that can prevent faults and reduce the number and severity of faults in a system, the quality of the system can be improved in the area of dependability.

There are different perspectives and motivations for working to prevent and correct faults in software, but the most important motivation is that of cost. Correcting faults is costly and in many instances is nothing but redoing work that should have been done correctly in the first place.

2.2 Previous work

Previously, we had conducted a fault report analysis of five projects in a software development organization. We studied the projects with regard to reported faults, through analysing fault reports from the development of the applications. Looking at descriptions of the individual faults, as well as other data reported about the faults, we classified the faults into different fault types. By grouping faults into fault types, we tried to find indications of where reported faults originated in the development process. The analysis was based on ODC, and we categorized faults into fault types based on that technique. Table 1 shows the fault types that were used in that study. The


fault analysis of the five projects and the results are described further in (Børretzen and Dyre-Hansen, 2007). The results of the fault analysis were presented to the interviewees, and the basis for the interviews was therefore our analysis of projects in their organization as well as their own experiences on fault reporting in the organization. Quantitative results from the fault report study indicated that a large share of faults reported originated from early phases of development, and we wanted to explore this area further with qualitative feedback from people involved in the studied projects. An earlier study performed by us in four other companies had shown the same tendency of high numbers of early phase faults (Børretzen and Conradi, 2006). Another example of results in a related study is the work done by Vinter and Lauesen (2000). This paper used a different fault type taxonomy, but reports that in their studied project close to a quarter of the faults found were of the similar type “Requirements and Features”. In addition to building on the quantitative results, we wanted to learn about practitioners’ opinions and knowledge about fault reporting and management.

Table 1. Fault types used in (Børretzen and Dyre-Hansen, 2007)

Fault types: Algorithm, Assignment, Checking, Data, Documentation, Environment, Function, GUI, Interface, Relationship, Timing/serialization
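A minimal sketch of the kind of fault type tallying performed in the quantitative studies is shown below; the fault type names follow Table 1, while the classified reports themselves are invented for illustration.

```python
from collections import Counter

# Invented sample of already-classified fault reports (one fault type per report)
classified_faults = [
    "Function", "GUI", "Function", "Assignment", "Checking",
    "Function", "Interface", "GUI", "Data", "Function",
]

counts = Counter(classified_faults)
total = sum(counts.values())

# Print the distribution of fault types, largest share first
for fault_type, n in counts.most_common():
    print(f"{fault_type:12s} {n:3d}  ({100.0 * n / total:.1f} %)")
```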

Following on from our two previous quantitative studies on fault reports, we wanted to continue studying the projects explored in (Børretzen and Dyre-Hansen, 2007) using qualitative methods to further explain our quantitative results. Feedback from the developers' organization would help us understand the source of these results, and help us suggest concrete measures for process improvement in the organization.

2.3 Study Context

The development organization we have studied is part of a large company developing and maintaining business-critical applications in the financial sector. It has been involved in the user-driven EVISOFT research and development project concerning industrial process improvement since 2006 (EVISOFT, 2006). The organization under study is distributed over several locations, and it employs hundreds of software developers. Our study has mainly involved representatives from two locations in Norway.


Table 2. The organization's existing fault categorization scheme.

Fault types: Other/not fault, User error, Fault at subcontractor, Change request, Fault in generation, Wrong version, Fault in internal test systems, Fault when establishing test environment, Fault in sequencing, Fault in external test environment, Fault in test data, Fault in test specification, Specification/design fault, Code fault

This organization had been working on process improvement concerning their fault management routines for some time, and some changes in the way faults were reported and handled were on the verge of being introduced when our first quantitative study started. They had an existing fault categorization scheme, although this was mostly focused on issues concerning the test environment, and did not capture the semantics of specification, design and coding faults in a detailed manner. This existing scheme is shown in Table 2. The feedback they received from our study (Børretzen and Dyre-Hansen, 2007) prompted some further proposals for change, which were to be implemented in a pilot project. The organization used a common process for reporting and managing faults, with minor customizations for individual projects. Test leaders were responsible for dealing with fault reports and reporting the faults back to developers for fixing.

2.4 The Grounded Theory Approach

Grounded Theory is a systematic research methodology originating from the social sciences, developed by the sociologists Barney Glaser and Anselm Strauss and emphasizing generation of theory from data. When the principles of grounded theory are followed, a researcher using this approach will formulate a theory about the phenomena they are studying that can be evaluated. The grounded theory approach is a "qualitative research method that uses a systematic set of procedures to develop an inductively derived grounded theory about a phenomenon" (Strauss and Corbin, 1998). The methodology is designed to help researchers produce "conceptually dense" theories that consist of relationships among concepts representing "patterns of action and interaction between and among various types of social units" (Strauss and Corbin, 1998). Potential sources of data for developing grounded theory include interviews, field observations, documents, and videotapes. At the heart of the grounded theory methodology are three coding procedures: open coding, axial coding, and selective coding (Strauss and Corbin, 1998). These codes are generated and validated using the constant comparison method, and coding, at


each stage, terminates when theoretical saturation is achieved, with no further codes or relationships among codes emerging from the data. Open coding involves immersion in the data and generation of concepts with dimensionalized properties using constant comparison. This is done by "breaking down, examining, comparing, conceptualizing, and categorizing data", often in terms of properties and dimensions (Strauss and Corbin, 1998). The examination of data in order to fracture it and generate codes can proceed "line by line", by sentence or paragraph, or by a holistic analysis of an entire document. The open coding process, while procedurally guided and promoting a realist ontology, requires researchers to "include the perspectives and voices of the people" whom they study. Data for open coding is selected using a form of theoretical sampling known as "open sampling". Open sampling involves identifying situations or portions of the transcripts that lead to greater understanding of categories and their properties. Axial coding refers to the analytic activity of "making connections between a category and its sub-categories" developed during open coding, that is, reassembling fractured data by utilizing "a coding paradigm involving conditions, context, action/interactional strategies and consequences" (Strauss and Corbin, 1998). Selective coding involves the identification of the "core category" (the central phenomenon to be theorized about) and linking the different categories to the core category using the paradigm model (consisting of conditions, context, strategies, and consequences). Often, this integration takes the shape of a process model with the linking of action/interactional sequences. Although Grounded Theory is not the most common research method in computer science, several studies show that this way of building theory and drawing conclusions from qualitative data is highly applicable in this field as well as in the social sciences (Bryant, 2002; Hansen and Kautz, 2005; Sarker et al., 2000).

3. Research design

As this study is based on qualitative methods, we had not initially formulated any rigorous research questions before starting the study, just an interview guide. As the interview guide was prepared, though, we could put the different questions into related groups that would help answer some common questions. Interviews were carried out based on the interview guide, and the transcribed interview responses were coded and analyzed using the Grounded Theory method.

3.1 Research Questions

This investigation is based on the results we got from a previous study on fault reports. The main research questions for this study, derived from the researchers' viewpoint after the quantitative study, are the following. Firstly, we wished to hear if the experience of the practitioners involved in the projects we had analyzed was similar to the analysis results we had found in previous studies.

RQ1: How can the large number of faults originating in early development phases which was found in the quantitative study be explained?


Secondly, we wanted to draw on their experience to hear if they thought a fault (type) classification scheme could be helpful towards improving their development processes.

RQ2: Can the introduction of a fault classification scheme like ODC be useful to improve development processes?

We also wanted to hear their opinions on increasing effort in data collection and fault report analysis in order to improve their software development processes.

RQ3: Do they see feedback from fault report analysis as a useful software process improvement tool?

Lastly, we wanted to ask them where they thought there was the most potential for improvement in their fault management system, to elicit areas that they felt were lacking in their current fault reporting process.

RQ4: Do they see any potential improvement areas in their fault management system?

We designed a semi-structured interview using an interview guide containing seven topics and incorporating 32 questions.

3.2 Research Procedure

We started by defining our research goals and questions, drawing on the conclusions from our previous quantitative study in the organization. We proceeded to design the interview guide, with adjustment of the research questions accordingly. Figure 1 shows the main structure of our research procedure.

Figure 1. Research procedure: interview guide and research questions; conducting interviews; transcribing interviews and data coding; workshops to get developer feedback; presentation of results.

Interviews

The data needed to answer our research questions were collected through expert interviews. The interviews were conducted by the author, a PhD student, using an interview guide and a digital voice recorder. The interviews were subsequently transcribed and coded by the same person.


When selecting the interviewees, we wanted individuals who had been actively involved in some of the five projects we had studied in this organization before, and who also had hands-on experience of dealing with fault management in these projects. Based on these criteria, we contacted project managers and test managers from the five projects we had studied, in cooperation with our contact person in the organization. The outcome was interviews with three persons from projects we had studied, in addition to one other person who had worked in a similar project.

The interviews were conducted as open-ended, but structured, interviews. The same questions were asked in every interview, but the interviewees were given a lot of room to talk about what they felt was important within the topic of the question.

Interview data analysis

Each question in the interview guide was related to one or more research questions, and the different responses for each question were compared to extract answers related to the research question. Grouping the answers related to each research question, we could extract information helping to answer our four research questions. Following this, the answers to each question were coded according to Grounded Theory in order to be able to separate different views on the questions. In line with using the constant comparison method, we coded each answer into groups. The codes were postformed, i.e. constructed as a part of the coding process, since the interviews were open and we had not formed any expectations of how the interviewees would answer. After one round of coding had been performed, we went through the data once more in order to make sure that the responses grouped together actually said the same things. As Seaman (1999) states, the work of finding patterns and trends is largely creative, but as most of the responses in the interviews were rather direct, drawing general conclusions from the interview responses was not difficult.

Workshops

In addition, we later received feedback about the topic at hand through discussions and comments during two workshops that were held in the organization in conjunction with the fault report study and interviews. The participants of these workshops had the same job descriptions and responsibilities as the participants in the interviews, and all of them worked on similar business-critical application development projects. These comments and discussions were not formally recorded by voice recorder, but notes were taken as the workshops proceeded. This information was used to clarify and support the findings from the interviews.

3.3 Research Execution

The organization under study is a large Norwegian software development organization that develops and maintains applications in several business-critical domains. The interviewees had worked in projects which had been developed for external customers. The software systems developed in these projects are all on-line systems of a business-critical nature, and they have all been put into full or partial production. The organization used a commercially available and common fault management tool,


Mercury Quality Center, and had developed a fault report template that was used in all projects, with minor amendments as needed by the projects. Interviews were conducted with four representatives from the organization in which we had studied five development projects with respect to fault reports (Børretzen and Dyre-Hansen, 2007). Two of the interviewees had worked as test managers in the projects we had studied, and one interviewee had worked as a test manager in another, but similar, project in the organization. One interviewee had been the project manager for one of the studied projects.

Prior to the interview, the interviewees had been given information about our fault report study and our intentions with the interview. On the day of the interview, we gave a presentation of the findings in our study and our interpretations of these results.

There were four separate interviews, one with each interviewee, held at two separate locations. The duration of the interviews was from 40 to 55 minutes. The interviews were subsequently transcribed, coded and analyzed using the constant comparison method (Seaman, 1999), and following the grounded theory method as described in (Strauss and Corbin, 1998). The interview guide consisted of 32 questions from seven categories as shown in Table 3.

Table 3. Interview categories

• Interviewee background
• Results of the previous study of the same organization
• The organization's own measurements and analysis on faults
• The existing quality system and fault management system used in the organization
• Fault reporting and fault categorization
• Feedback on fault reporting to developers
• Attitude to process changes and quality improvement initiatives in the organization
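To illustrate the grouping step described in Section 3.2, the sketch below relates interview questions to research questions and collects the coded answers per research question; the question-to-RQ mapping and the codes shown are purely hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from interview-guide questions to research questions
question_to_rqs = {
    "Q07": ["RQ1"],          # causes of early-phase faults
    "Q15": ["RQ2", "RQ3"],   # usefulness of classification scheme / analysis feedback
    "Q28": ["RQ4"],          # improvement areas in the fault management system
}

# Hypothetical coded answers: (question, interviewee, assigned code)
coded_answers = [
    ("Q07", "I1", "poor specifications"),
    ("Q07", "I3", "poor specifications"),
    ("Q15", "I2", "positive, introduce stepwise"),
    ("Q28", "I4", "better effort registration"),
]

# Group the codes per research question; constant comparison is then
# applied within each group to merge codes that say the same thing
by_rq = defaultdict(list)
for question, interviewee, code in coded_answers:
    for rq in question_to_rqs[question]:
        by_rq[rq].append((interviewee, code))

for rq in sorted(by_rq):
    print(rq, by_rq[rq])
```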

In addition to the interviews, we received feedback from participants during two workshops on fault reporting and fault categorization that we had organized. The workshop participants were all test managers from the same organization where we had conducted the fault report study. The three test managers that had been interviewed also participated in these workshops. As the discussions and feedback from these workshops were very fruitful, we decided to include some of this elicited information in this study, to augment the knowledge we got from the interviews. The workshops also worked to confirm much of what had been talked about in the interviews.

4. Results

This section presents the findings of our study, together with the augmenting feedback we received from the workshops we arranged with the organization.


4.1 Interview response

The following presents a summary of how the interviewees responded to the interview questions related to each research question. This is based on the coded interview answers, and using the grounded theory method we can formulate some general findings for each of the research questions we had defined.

RQ1: How can the large number of faults originating in early process phases which was found in the quantitative study be explained?

In answering RQ1, there was consensus that work processes in early stages of development, i.e. specification and design, should be improved. Many of the faults were caused by poor specifications or design, or difficult access to specifications and design. Lack of information forwarded from analysts and designers to the developers was often cited as a cause of faults. Especially for more complex systems there was a need for more effort in early development phases. Poor guidelines for developers were also mentioned as a probable cause of the high number of faults related to specification and design. The interviewees agreed that the general results from the study of the five projects were also relevant for their individual projects. The findings in our previous quantitative fault analysis work supported the suspicion of the interviewees that the work processes in early development were not optimal, and that introducing fault analysis that could help pinpoint fault origins would be very useful.

RQ2: Can the introduction of a fault classification scheme like ODC be useful to improve development processes?

When answering questions related to RQ2, the response was that introducing a fault categorization scheme like ODC would be a good idea, given that the introduction of a new scheme was performed in steps and with the cooperation of everyone involved. They felt that by introducing a reporting scheme like this, it would be easier to document how development processes needed improvement, since the different fault types are quite distinct and descriptive. They were not particularly pleased with their current fault categorization scheme, which was not very detailed except for faults related to the test environment. Introducing a new scheme was seen as an easy technical task, as they used a very flexible fault management system.

RQ3: Do they see feedback from fault report analysis as a useful process improvement tool?

In answers to questions on RQ3, there was strong agreement that fault management could and should include steps to improve process and product quality. There had been scattered attempts at fault analysis in the organization, but they did not believe the correct metrics had been used to exploit it fully. Using fault report analysis more actively, with more descriptive fault reports, was seen as a very useful tool, but they also warned that the concept would have to be introduced in the right manner to the developers who were going to use and be affected by it, especially since more detailed fault reporting


could lay developers more open to "blame" for faults that had been introduced in development.

RQ4: Do they see any potential improvement areas in their fault management system?

The responses to the questions related to RQ4 indicated that information flow could be improved between testers discovering faults and developers fixing faults. Although all respondents initially said they were pleased with the fault reporting scheme in terms of what information was entered into it, they had some comments on expansion of the current scheme. The potential improvements that were mentioned were better registration of effort (hours) used in fault discovery and correction tasks, and better registration of fault location at component or code level.

4.2 Feedback and experience from workshops

In addition to the responses and results we got from the interviews, we would also like to include some of the information and feedback we received on this topic when we arranged workshops on fault reporting and fault categorization for representatives of the studied organization. These representatives were only shown the results from our quantitative study, not any information from the interviews, although some of the workshop participants had also been involved in the interview sessions.

Related to RQ1, on the large number of faults related to early process phases, the general reaction was again that in their experience a documented development process was lacking, and there were clear indications that improving specification and design processes and work would be a positive move. The major complaints were that design phases were too hasty, there were not enough reviews, and documentation was not good enough. Discussions around the topics for RQ2 told us that most of the people at the workshops could see the need for better fault classification. Still, several people were skeptical of introducing an all-new fault taxonomy without involving the people who were actually going to use it for classifying the faults found. On the topic of feedback from fault report analysis as a process improvement tool (RQ3), the general consensus was that it could be very useful, and that the completed quantitative study showed that it had been useful already. As for potential improvement of their fault management system (RQ4), they seemed to agree that the actual system in use was sufficient, but that the information put into the system should be improved.

5. Discussion

This section first discusses some issues concerning the results in view of our research questions, then the study's validity, and its relevance for the company. In RQ1, we wished to hear if the experience of the practitioners involved in the projects we had analyzed was similar to the analysis results we had found in previous studies. They generally agreed that the results from the quantitative study were valid and seemed relevant for their individual projects as well. Results also showed that


experience in the organization was in line with our previous findings: there were weaknesses in the early development phases, specification and design, that were origins of faults being introduced into the software. This is a similar conclusion to that presented by Vinter and Lauesen (2000). In RQ2, we wanted to draw on their experience to hear if they thought a fault type classification scheme could be helpful towards improving their development processes. The response was that introducing a better and more structured fault classification scheme would be a good idea, as long as it was a scheme that was clearly defined and usable for the developers. El Emam and Wieczorek (1998) claim that as long as fault type classes are well understood, developers are able to correctly categorize faults into fault types. In RQ3, we also wanted to hear their opinions on increasing effort in data collection and fault report analysis in order to improve their software development processes. The respondents were all very positive about introducing fault report analysis for process improvement into their fault management process. By using fault report analysis more actively, they would be able to produce better metrics for process improvement. As most software developing organizations produce large quantities of information about the faults in the systems they develop, it makes a lot of sense to utilize this information beyond the simplest and most obvious use, which is purely as a fault reporting and correction log. Lastly, in RQ4, we wanted to ask where they thought there was the most potential for improvement in their fault management system. This was to elicit areas that they felt were lacking in their current fault reporting process. The answers indicated that information flow between testers and developers should be improved, and that there were areas of improvement in the information stored in fault reports. This is an issue we have touched upon in our previous studies (Børretzen and Conradi, 2006; Børretzen and Dyre-Hansen, 2007).

5.1 Validity

Concerning the validity of this study, we have to consider internal validity - i.e., credibility, believability and plausibility of findings and results - and external validity - i.e., generalizability or applicability of the study's findings, results and conclusions to other circumstances. We also touch upon an issue concerning construct validity.

The main internal validity threat we have identified is that the interviews, transcription and information coding were all performed by the same person. This may introduce bias in how certain responses have been interpreted. The reason for having just one person perform all the tasks in the study was resource limitations. By using feedback from the workshops to augment the interview responses, we feel that the potential bias has been reduced. In addition, we had a dialogue with some of the interviewees to clear up some questions we had after the transcription of the interview recordings. Also, as Sarker et al. (2000) state, "grounded theory coding and sampling must never be delegated to hired assistants, but must be done by the researchers who have a stake in the theory emerging from the project."


The main threat to external validity is that interviews have all been carried out with interviewees from the same organization. This means that we have only investigated the opinions of people using the same form of fault management and reporting system. The reason for this was that we wanted to base our interviews on information gathered in the previous quantitative study, and we therefore had to interview the people who had been involved in the actual projects.

As for construct validity, one threat was the truthfulness of the responses in the interviews. The topics of faults in software and fault management are delicate ones, as they touch upon aspects concerning quality deficiencies in the product an organization delivers. The interviewees might feel that they were being evaluated, and present a better picture than was actually true. Nevertheless, we found the interviewees to be truthful and open-minded, and do not suspect that they were holding information back or presenting a more attractive situation in their organization than is actually the case. A more likely issue could be participants being more positive towards the ideas when presented with them in interviews than they would actually be in real life.

5.2 Relevance of results for the studied organization

The organization involved in the study was already working to improve its fault management processes, which meant that our suggestions and findings based on its fault report data would be welcome. Our previous quantitative fault report study led to a specific suggestion of new fault types being introduced into a pilot project, and the results of these interviews and workshops will most likely be used to fine-tune the selection of fault types used. Compared to their original fault type classification, the new classification scheme should be better suited for process improvement work.

6. Conclusion and Future Work

We have performed qualitative interviews with representatives of a software developing organization regarding fault reporting and fault classification. This has expanded our knowledge and builds on results from a quantitative study of five projects in the same organization. Our main contribution is showing that practitioners are motivated to use their existing knowledge of software faults in a more extensive manner to improve their work practices. By triangulation of both qualitative and quantitative methods, we have increased the validity of our studies. Our main findings are that:

• The interviewees agreed with our conclusions from the previous quantitative study (Børretzen and Dyre-Hansen, 2007), i.e. that the early phases in their development process had weaknesses that led to a high number of software faults originating from early development phases.

• They also expressed a need for better fault categorization in their fault reports, in order to analyze previous projects with the intention of improving their work processes.

• The proposed ODC fault types were seen as a useful basis for introducing a better fault classification scheme, although simplicity was important.


• They were positive to using fault report analysis feedback to improve development processes, although introducing such analysis for regular use would have to be done carefully in the organization.

• Finally, they revealed some areas in their fault reporting scheme that could be improved in order to make analysis more useful, for instance including attributes like fault-finding and correction effort and the component location of the fault. The knowledge was present; it was just not recorded formally.

In terms of future work, we would want to perform a second series of interviews in the organization after the new fault categorization scheme has been in use for some time. Through this we would be able to ascertain how this initiative has worked in the organization, and how it influences their project analyses and development process. We would also like to expand the generalizability of the study by including other software developing organizations using similar fault management processes.

Acknowledgements

The author would like to thank Reidar Conradi for careful reviewing and valuable input. We also thank the organization involved for their participation and cooperation during the study.

References

Avizienis, A., Laprie, J.-C., Randell, B., Landwehr, C., 2004. Basic Concepts and Taxonomy of Dependable and Secure Computing. IEEE Transactions on Dependable and Secure Computing, 1(1), January-March 2004.

Biffl, S., Aurum, A., Boehm, B., Erdogmus, H., Grünbacher, P., 2006. Value-Based Software Engineering, Springer, Berlin Heidelberg.

Bryant, A., 2002. Grounding systems research: re-establishing grounded theory. Proceedings of the 35th Annual Hawaii International Conference on System Sciences (HICSS'02), IEEE Computer Society, pp. 3446-3455, Big Island, Hawaii, 7-10 January 2002.

Børretzen, J.A., Stålhane, T., Lauritsen, T., Myhrer, P.T., 2004. Safety activities during early software project phases. Proceedings of the Norwegian Informatics Conference, pp. 180-191, Stavanger, Norway.

Børretzen, J.A., Conradi, R., 2006. Results and Experiences From an Empirical Study of Fault Reports in Industrial Projects. Proceedings of the 7th International Conference on Product Focused Software Process Improvement (PROFES'2006), pp. 389-394, Amsterdam, 12-14 June 2006.

Børretzen, J.A., Dyre-Hansen, J., 2007. Investigating the Software Fault Profile of Industrial Projects to Determine Process Improvement Areas: An Empirical Study. Proceedings of the European Systems & Software Process Improvement and Innovation Conference 2007 (EuroSPI07), pp. 212-223, Potsdam, Germany, 26-28 September 2007.

Chillarege, R., Bhandari, I.S., Chaar, J.K., Halliday, M.J., Moebus, D.S., Ray, B.K., Wong, M.-Y., 1992. Orthogonal defect classification - a concept for in-process measurements. IEEE Transactions on Software Engineering, 18(11), pp. 943-956, Nov. 1992.

El Emam, K., Wieczorek, I., 1998. The repeatability of code defect classifications. Proceedings of the Ninth International Symposium on Software Reliability Engineering, pp. 322-333, 4-7 November 1998.

EVISOFT user-driven R&D project on SPI, 2006. Available at: http://www.idi.ntnu.no/grupper/su/evisoft.html.

Grady, R., 1992. Practical Software Metrics for Project Management and Process Improvement, Prentice Hall.

Hansen, B.H., Kautz, K., 2005. Grounded Theory Applied - Studying Information Systems Development Methodologies in Practice. Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05), IEEE Computer Society, 10 p., Big Island, Hawaii, January 3-6 2005.

Hove, S.E., Anda, B., 2005. Experiences from conducting semi-structured interviews in empirical software engineering research. 11th IEEE International Symposium on Software Metrics, 10 pages, 19-22 Sept. 2005.

Leveson, N., 1995. Safeware: System Safety and Computers, Addison-Wesley, Boston.

Mohagheghi, P., Conradi, R., Børretzen, J.A., 2006. Revisiting the Problem of Using Problem Reports for Quality Assessment. Proceedings of the 4th Workshop on Software Quality, held at ICSE'06, pp. 45-50, Shanghai, 21 May 2006.

Sarker, S., Lau, F., Sahay, S., 2000. Building an inductive theory of collaboration in virtual teams: an adapted grounded theory approach. Proceedings of the 33rd Annual Hawaii International Conference on System Sciences, pp. 1-10, Hawaii, 4-7 Jan. 2000.

Seaman, C.B., 1999. Qualitative Methods in Empirical Studies of Software Engineering. IEEE Transactions on Software Engineering, 25(4), pp. 557-572, July 1999.

Strauss, A., Corbin, J., 1998. Basics of Qualitative Research, Sage Publications, London, UK.

Vinter, O., Lauesen, S., 2000. Analyzing Requirements Bugs. Software Testing & Quality Engineering Magazine, 2(6), Nov/Dec 2000.


P7. Using Hazard Identification to Identify Potential Software Faults: A Proposed Method and Case Study

Jon Arvid Børretzen

Department of Computer and Information Science, Norwegian University of Science and Technology (NTNU),

NO-7491 Trondheim, Norway [email protected]

Abstract: When designing a business-critical software system, early analysis with correction of software faults and hazards (commonly called anomalies) may improve the system's reliability and safety, respectively. We wanted to investigate if safety hazards, identified by Preliminary Hazard Analysis, could also be related to the actual system faults that had been discovered and documented in existing fault reports from testing and field use. A research method for this is the main contribution of this paper. For validation, a small web-based database for management of student theses was studied, using both Preliminary Hazard Analysis and analysis of fault reports. Our findings showed that Preliminary Hazard Analysis was suited to find potential specification and design faults in software.

1. Introduction

When developing a critical software system, much effort is put into ensuring that the system will have as few critical anomalies (faults and hazards) as possible in the context of its environment and modes of operation. Despite this effort, critical systems still fail due to software faults, reducing reliability and possibly safety. A goal for the research community is to develop and introduce processes and techniques that reduce the number of critical faults and hazards in software. In this paper we present a novel method, results and lessons learned from a study where we compared the findings from Preliminary Hazard Analysis (PHA) with findings from traditional analysis of system testing and field use fault reports, both applied to the same system. PHA is a review technique for safety-critical systems, and is used in early stages of development.

This paper is organized as follows. Section 2 gives our motivation and related work. Section 3 describes the research method, research questions and procedure. Section 4 presents the proposed method, the results from the hazard analysis and the fault report analysis. Section 5 presents the interpretation of these results and an evaluation of the work. The conclusion and further work are presented in Section 6.


2. Motivation and Background

When proposing a method in the border area between reliability and safety, we need to clarify some of the terminology. A fault is an incorrect part of the system (program, hardware, even "data"), i.e. a part whose possible later execution will violate stated requirements and cause a system failure (reliability dimension). A hazard is a state or set of conditions of a system or an object that, together with other conditions in the environment of the system or object, may lead to an accident (safety dimension) [1]. In this paper, we will investigate whether the hazard analysis technique PHA can help us to identify not only hazards, but also faults and in turn failures, and thereby counteract the loss of product reliability stemming from these failures.

Since hazard analysis e.g. by PHA typically is performed in earlier phases of the system development, we were motivated to investigate whether the PHA technique can be used to reveal faults early in the system development process of a given system. This paper describes a method and study where we analyzed fault reports from system testing and field use and compared them with results from hazard analysis of the same software system. In doing this we can compare the results from the PHA and the analysis of fault reports, to see if some faults could potentially have been identified and removed earlier.

2.1 State-of-the-art

Measuring quality and effects on quality in a software system is not a trivial matter. One of the means Avizienis et al. suggest for attaining dependability in a system is fault removal and fault prevention in order to reduce the number and severity of faults [2]. By identifying common fault types, developers can reduce the number of critical faults by focusing their efforts on preventing such faults. Also, identifying the most severe fault types makes developers able to focus on preventing those faults that have the biggest detrimental impact on the system. This concurs with Boehm’s concept of “value-based” software engineering and value-based testing, as presented in [3, 4]. Fault report analysis can thereby be of help in identifying the most important fault types, in order to focus quality improvement work on these in later projects.

Basili et al. have presented several experiments where inspection techniques are compared to testing, for example in [5]. In a related article, Shull et al. present Perspective-Based Reading and how this technique can improve requirements inspections [6]. Wagner has made a survey of the quality economics of defect-detection techniques in [7], where he presents some numbers on the costs of removing faults at different stages of development. In [8], Ciolkowski et al. state that the software review is a popular quality assurance method, and present a survey concluding that reviews should be integrated in the development process, performed systematically rather than ad hoc, and optimized for their target system.

The PHA method is used in the early life cycle stages to identify critical system functions and broad system hazards. The identified hazards are assessed and prioritized, and safety design criteria and requirements are identified. As Rausand states, a PHA is started early in the concept exploration phase, so that safety considerations are included in trade-off studies and design alternatives [9]. This process is iterative, with the PHA being updated as more information about the design is obtained and as changes are


being made. PHA is a relatively light-weight method; the information requirements are low, as high-level documentation like a concept and requirements specification is sufficient for an early PHA analysis. The method is also not very training-intensive, and practitioners can start using the method fairly quickly. The PHA sessions are performed as semi-structured brainstorming using the available documentation as the source of information. The results are sets of PHA sheets containing the identified hazards and further information about the hazards, e.g. the cause and effect of the hazard and also proposed measures for removing the hazard. This serves as a baseline for later analysis and is used in developing system safety requirements and in the preparation of performance and design specifications. Since PHA starts at the concept formation stage of a project, little detail is available, and the assessments of hazard and risk levels are therefore qualitative. A PHA should be performed by a small group with some knowledge about the system requirements [10]. PHA is usually performed in order to identify system hazards, translate system hazards into high-level system safety design constraints, assess hazards if necessary, and establish a hazard log. These system hazards are not equivalent to faults or failures. Failures (incorrect behaviour vs. requirement specifications) may contribute to hazards, but hazards are system states that, combined with certain environmental conditions, cause accidents regardless of whether requirement specifications are violated.
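As an illustration of what one row of a PHA sheet may contain, the sketch below combines the elements mentioned above (hazard, cause, effect, proposed measure) with a qualitative severity; the exact columns vary between organizations, so this field set is an assumption, and the example content is invented.

```python
from dataclasses import dataclass

@dataclass
class PhaEntry:
    """One row of a PHA sheet; the column set is illustrative, not a standard."""
    hazard: str            # the hazardous system state
    cause: str             # what may bring the system into that state
    effect: str            # possible consequence / accident
    severity: str          # qualitative level, e.g. "high", "medium", "low"
    proposed_measure: str  # design constraint or action to remove or mitigate the hazard

# Invented example from a generic business system
example = PhaEntry(
    hazard="The same payment can be submitted twice",
    cause="No check of transaction state when a resend is accepted",
    effect="Customer is charged twice for one order",
    severity="high",
    proposed_measure="Reject a new submission while an identical transaction is pending",
)
```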

More commonly used alternatives than the PHA method are different inspection techniques for specification, design and code. Table 1 shows in which development phases the different techniques are used to identify faults.

Table 1. Different techniques identify faults in different development phases.

Development phase    Inspections   PHA   Program execution
Requirements         ●             ●
Design               ●             ●
Coding               ●                   (●)
Testing                                  ●
Field use                                ●

2.2 The DAIM context

DAIM is a web-based database for delivery and processing of academic master theses. It is a small system developed internally at the Department of Computer and Information Science at the Norwegian University of Science and Technology (NTNU). The development process was small-scale, with strong user orientation. The specification and design process involved system users and administrators, and used interviews and paper prototyping to produce specification and design documents. The implementation was carried out by a small team, and consists mainly of a database implementation and a PHP-based web presentation application.

The system description contains 14 distinct use cases, with description of functionality for the different user types.


3. Research Method

This work proposes a method which combines two different analysis techniques, where PHA is applied in early stages of software development, and fault analysis of reports from testing or field use is performed late in the development process, typically after system testing or when the system is put into production. By comparing the results from a PHA performed on available documentation of system concepts and specifications with the results from analysis of fault reports from late testing and field use, we want to investigate how the PHA helps us in identifying hazards that are relevant to the faults actually found in the system.

The dotted lines in Figure 1 show the common view of how faults are related to reliability and hazards are related to safety, while the full line shows how our work proposes that findings from hazard analysis may also be related to reliability.

Figure 1. Linking hazards to reliability.

This results in a method as described in Section 4.1, of using safety reviews (like

PHA) on requirements and design documentation to not only find hazards but also find faults. The converse could also be considered, using reliability reviews and inspections of requirements, design and code documents to not only find faults but also hazards. In Figure 1, this would have been represented by an arrow linking faults to safety.

3.1 Research questions

The research questions we wanted to explore in this study were the following:

RQ1: What kind of faults, in terms of Orthogonal Defect Classification (ODC) fault types, does the PHA technique help elicit?

RQ2: How does the distribution of fault types found in the fault analysis compare to the one found in the PHA?

RQ3: Does the PHA technique identify potential hazards that also actually appear as faults in the software?

3.2 Hazard analysis by PHA

The hazard analysis was to be performed prior to studying the fault reports, so that we would not be influenced by the faults that had actually been reported. This is also the order the analyses would have in a practical project: the hazard analysis would be performed at an early stage of development, while the fault report analysis would be performed at the very end of the development process.



To be able to compare the results from the fault report analysis with those from the hazard analysis, we assigned one or more fault types to each of the hazards identified in the PHA. We had to assign several fault types to some hazards that were somewhat generic in nature and could correspond to several different fault types. Some hazards were not possible to relate to a fault type, for instance hazards related to human error or manual routines not directly related to the software under study. These were then classed as "Not fault" in accordance with our classification scheme.
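A minimal sketch of this mapping step is shown below; the hazards and the fault types assigned to them are invented, and a hazard may map to several fault types or to "Not fault".

```python
# Invented examples of identified hazards mapped to ODC-style fault types.
# A generic hazard may correspond to several fault types; hazards caused by
# human error or manual routines are classed as "Not fault".
hazard_to_fault_types = {
    "Missing deadline check, late delivery accepted": ["Checking"],
    "Inconsistent status shown to student and administrator": ["Data", "Function"],
    "Administrator forgets to register external examiner": ["Not fault"],
}

# Flatten to the list of potential fault types elicited by the PHA
potential_fault_types = [
    fault_type
    for fault_types in hazard_to_fault_types.values()
    for fault_type in fault_types
]
print(potential_fault_types)
```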

3.3 Fault analysis

The fault analysis was based on Orthogonal Defect Classification (ODC) [11, 12], and was performed by categorizing faults into fault types based on that technique. Table 2 shows the fault types that were used. The reason for using this categorization scheme was that we had already performed fault analysis of several projects using it, so we were accustomed to these fault types and felt they worked well. In addition to the actual fault types, we added two categories: "Unknown", which was used for faults that we could not classify with certainty into one of the fault types, and "Not fault", which was used when a reported fault was a false positive, i.e. reported as a fault, but not actually a fault with respect to the system specifications.

Table 2. Fault types used in fault analysis.

Fault types: Algorithm, Assignment, Checking, Data, Documentation, Environment, Function, GUI, Interface, Relationship, Timing/serialization, Unknown, Not fault

One property of the ODC fault types is that they can be associated with different

process phases, as stated by Chillarege et al. in [11] and also by Zheng et al. in [13]. Table 3 shows the associations as presented in [13]. This division of fault types into process phases cannot be considered unassailable, but it gives a good indication of where a fault of a certain type is most likely to have originated.

3.4 Research execution

This PHA was performed not as a part of analyzing specifications, but rather after the

DAIM system had been developed and been put in use for some time. Usually a PHA would be performed much earlier, but we chose to analyze a completed system in order to compare PHA results with fault analysis results. The hazard analysis was performed


in four sessions, each session concentrating on the use-cases for certain user types. These sessions were attended by five to six persons, of which one participant was a system expert, and the others had experience in performing PHA. One person was responsible for leading the sessions, and one person was scribe, recording the PHA elicitations to PHA sheets. In total, the PHA sessions consisted of 38 staff-hours of effort.

The fault analysis of the DAIM system was done by two researchers individually categorizing the fault reports using a fault categorization scheme based on the Orthogonal Defect Classification (ODC) scheme [11, 12]. We used the fault descriptions in the fault reports to categorize the faults into the fault types shown in Table 2. Afterwards, we compared our categorization results and came to a consensus on the reports where our initial categorizations differed.
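To make the consensus step concrete, the following small sketch (in Python, with made-up report identifiers; it is not the tooling used in the study) lists the reports where two independent categorizations disagree, which are then discussed until consensus is reached.

```python
# Illustrative sketch (made-up report identifiers, not the actual study records):
# two researchers categorize the same fault reports; disagreements are listed
# and then resolved through discussion.
researcher_a = {"FR-01": "Assignment", "FR-02": "Function", "FR-03": "Data"}
researcher_b = {"FR-01": "Assignment", "FR-02": "GUI", "FR-03": "Data"}

disagreements = {
    report_id: (researcher_a[report_id], researcher_b[report_id])
    for report_id in researcher_a
    if researcher_a[report_id] != researcher_b[report_id]
}
print(disagreements)  # {'FR-02': ('Function', 'GUI')} -> discussed until consensus
```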

Table 3. ODC fault types and development process phase associations.
Process phase association | Fault types
Design                    | Function
Low Level Design          | Interface, Checking, Timing/Serialization, Algorithm
Code                      | Checking, Assignment
Library Tools             | Relationship
Publications              | Documentation
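As an illustration of how the associations in Table 3 can be used in practice, the sketch below (our own illustration, not part of the study) maps categorized fault types to their associated process phases so that a set of fault reports can be summarized per phase; fault types without an association in Table 3 are grouped separately.

```python
# Illustrative sketch (not the thesis tooling): summarize categorized faults per
# process phase, using the fault type / phase associations of Table 3.
from collections import Counter

PHASE_OF_FAULT_TYPE = {
    "Function": "Design",
    "Interface": "Low Level Design",
    "Checking": "Low Level Design / Code",  # listed under both phases in Table 3
    "Timing/Serialization": "Low Level Design",
    "Algorithm": "Low Level Design",
    "Assignment": "Code",
    "Relationship": "Library Tools",
    "Documentation": "Publications",
}

def faults_per_phase(fault_types):
    """Count categorized faults per associated process phase.

    Fault types with no association in Table 3 (e.g. "GUI", "Unknown",
    "Not fault") are grouped under "Unassociated".
    """
    return Counter(PHASE_OF_FAULT_TYPE.get(ft, "Unassociated") for ft in fault_types)

# Hypothetical usage with a handful of categorized reports:
print(faults_per_phase(["Function", "Assignment", "Checking", "GUI", "Function"]))
```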

4. Results

The results are presented in the form of a description of the method we used for evaluating the use of PHA to identify faults in Section 4.1, followed by the results of this evaluation in Sections 4.2 - 4.5.

4.1 Method description: Using PHA to identify faults

1) Initially, we define and delimit the system to be studied. The information required is the same as for a PHA: a system description and requirements and design documentation, like use-cases, high-level class diagrams, or similar documentation. It is important to make clear the system context and the roles of the members of the PHA group: Are they to be independent of development, are they part of the development team, are they domain experts?

2) Executing the PHA session(s). This involves making sure the group understands the use of the PHA technique and that they have some knowledge of the system to be analyzed, like its main functionality and the actors involved in system use. This group meets and performs a systematic walkthrough of the available use-cases and system descriptions to identify possible hazards. These are decided upon through discussion and consensus and recorded in PHA tables, an example of which is shown in Table 5.

3) Next, the resulting hazards are considered in terms of which fault types they potentially may cause. This is not necessarily a one-to-one relation; a hazard can be the


potential origin of several faults. The fault types used should be the same as those used in the categorization task in step 4.

4) A collection of fault reports (from testing and field use) is compiled for the system. If the fault reports are not already categorized, the faults are categorized using the same fault type categorization scheme as in step 3. This categorization should be performed by persons who understand the fault type categories well. It also requires that the fault reports are descriptive enough to be properly categorized.

5) Finally, we perform a comparison of the fault reports with the possible faults from the PHA session, helped by the categorization of faults (a small illustrative sketch of this comparison is given after Table 4).

We sum up some attributes of this method in Table 4, which is based on a similar description of review methods by Laitenberger et al. in [14].

Table 4. Method attributes.
Goals        | Quality improvement through fault and hazard detection and removal.
Participants | Personnel familiar with the PHA method, and with at least high-level system knowledge.
Process:
  Prescription            | Walkthrough of system documentation, like use-cases and high-level design documents.
  Preparation for meeting | Participants have studied the system description and documentation, but need not have performed any individual inspection.
  Meeting                 | Group members suggest and discuss system hazards, with or without a designated moderator.
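As mentioned under step 5, the following sketch illustrates the core of steps 3-5 under assumed data layouts (lists of fault type labels; the function and variable names are ours): the fault type distribution derived from the hazards is computed and set against the distribution of the categorized fault reports, in the spirit of Figures 3, 4 and 6.

```python
# Minimal sketch (assumed data layout, our own names) of steps 3-5: compare the
# fault types assigned to PHA hazards with the fault types of reported faults.
from collections import Counter

def distribution(fault_types):
    """Percentage distribution over fault types."""
    counts = Counter(fault_types)
    total = sum(counts.values())
    return {ft: 100.0 * n / total for ft, n in counts.items()}

# Step 3: each hazard is assigned one or more fault types ("Not fault" allowed).
hazard_fault_types = ["Function", "Function", "Checking", "Algorithm", "Not fault"]

# Step 4: each fault report is categorized with one fault type.
reported_fault_types = ["Assignment", "Data", "Function", "Assignment", "Not fault"]

# Step 5: compare the two distributions per fault type.
hazard_dist = distribution(hazard_fault_types)
fault_dist = distribution(reported_fault_types)
for fault_type in sorted(set(hazard_dist) | set(fault_dist)):
    print(f"{fault_type:<12}  hazards: {hazard_dist.get(fault_type, 0.0):5.1f} %"
          f"  reported: {fault_dist.get(fault_type, 0.0):5.1f} %")
```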

4.2 Hazard analysis (PHA)

The result of the PHA sessions was a set of PHA sheets containing the potential hazards the group had elicited from the system description and documentation. Table 5 shows two short examples of hazard results from the PHA sessions.

In total, the PHA identified 33 hazards in the DAIM system. By assigning fault types to the hazards, with some hazards potentially causing several types of faults, we identified 43 potential faults in total. Six of these potential faults were later classified as "Not fault", bringing the actual number of identified potential faults to 37. Figure 3 shows the distribution of hazards represented as fault types.


Table 5. PHA sheet example. Actor: Student
Hazard description  | Cause                                                   | Effect              | Barriers/measures
Unauthorized access | Illegal username in DB                                  | Data destroyed      | System feedback
No access           | Username missing from DB because of faulty data import | Cannot perform work | Manual control routines
No access           | Too strict network policy                               | Cannot perform work | Use different login system

Figure 3. Distribution of hazards represented as fault types (%). [Bar chart over the fault type categories of Table 2.]

4.3 Fault analysis

In total, 117 fault reports collected by both human reporting during system testing and automatic failure log generation during system use were categorized using the ODC fault types shown in Table 2. Figure 4 shows the distribution of fault types. Of these 117 faults, 25 were categorized as "Not fault", giving us 92 actual faults found in the system.

The collected fault reports from the DAIM system were split into two groups: one from system testing, and one from the first months of field use. The two groups had different fault type distributions. Figure 5 shows the difference in distribution between the faults reported in field use and the faults reported in system testing.

We see that there are certain fault types that were only reported at system test level and not in field use, such as “documentation”, “function” and “GUI” type faults.


Figure 4. Distribution of fault types (%). [Bar chart over the fault type categories of Table 2.]

Figure 5. Distribution of fault reports in the two DAIM fault report collections (%). [Grouped bar chart over the fault type categories; series: System test, Field use.]

4.4 Comparing hazards and faults

The method employed here, combining hazard analysis with analysis of fault reports, can be described as a triangulation of techniques to show how hazard analysis can be used to identify possible faults in software.

Combining the distributions from Figures 3 and 4 in one graph, we get Figure 6, which shows that the distributions of hazards and faults are quite different from each other.

Figure 6. Distribution of hazards identified and faults reported (%). [Grouped bar chart over the fault type categories; series: Hazards (%), Faults (%).]


During the PHA, we found 33 hazards, of which 7 were classed as "Not fault", i.e. hazards that could not be connected to faults in the software with respect to its specifications. Of the remaining 26 hazards, we found 6 that could be linked to actual faults reported from testing and field use. Table 6 shows a short description of the hazards and faults that were connected.

In addition to these 6, there were 20 more hazards that could be assigned a fault type, i.e. they could potentially lead to a fault in the software. These could still exist in the system but they have not been discovered in testing or field use. In Figure 7 we illustrate this by adding the numbers to Figure 1. The six faults identified by hazard analysis are shown in bold, and signify the same faults for each arrow.

Table 6. Examples of hazards that were linked to faults in fault reports.
Fault type  | Hazard description                                      | Fault report
Data        | Missing/unauthorized access due to user import failure  | Missing username, user import error
Environment | E-mail addresses containing illegal characters          | Wrong character set in use
Function    | One person in group delivers thesis without consent     | Email was not sent when group was established
Function    | Group composition difficulties                          | Missing member of group
GUI         | Wrong type of values used in contract                   | Student used wrong entry fields
Timing/Ser. | Group composition difficulties                          | No group members assigned to master thesis

Figure 7. Linking hazards to faults.

[Figure 7 content: Fault report analysis: 92 faults. PHA: 26 hazards leading to faults. Potential hazards affecting safety: 33 hazards. Potential faults affecting reliability: 112 potential faults. Arrow annotations: 20 + 6, 20 + 6 + 7, 86 + 6.]

With only 6 of the 92 reported faults also being identified by the PHA, there are many faults that were not identified by the PHA. With many of the reported faults being pure coding faults, like those of the "assignment" fault type, this was to be expected. A typical example of a fault report description of this fault type was "Thesis-ID missing for document".

4.5 Efficiency of PHA for fault identification

Looking at staff-hours spent per fault identified, we get two figures, one for the faults that were actually found in system testing and field use, and one for the total number of potential faults in the system (including the actual faults). Table 7 shows these figures, and compares them with some mean numbers on inspection efficiency from [7].

Table 7. Staff-hours spent per fault found.
                                                                   | Staff-hours/fault
Reported faults from testing also identified by PHA in this study | 38/6 = 6.33
Potential faults identified by PHA in this study                  | 38/26 = 1.46
Mean cost of requirements inspection [7]                          | 1.06
Mean cost of design inspection [7]                                | 2.31
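The efficiency figures in Table 7 are plain ratios of the total PHA effort to the number of anomalies found; the following one-line checks reproduce the two values reported for this study.

```python
# Back-of-the-envelope check of the two ratios reported for this study in Table 7.
pha_effort_staff_hours = 38           # total effort of the four PHA sessions
reported_faults_also_found_by_pha = 6
potential_faults_identified_by_pha = 26

print(pha_effort_staff_hours / reported_faults_also_found_by_pha)   # ~6.33
print(pha_effort_staff_hours / potential_faults_identified_by_pha)  # ~1.46
```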

This result is based on four PHA sessions with five or six participants, but PHA can also be performed with as few as two participants and still produce good results in terms of the number of anomalies identified. This would have reduced the ratio of staff-hours per fault considerably.

5. Discussion

5.1 The results in terms of our Research Questions

Our main findings related to RQ1 were that PHA was most useful in eliciting hazards that were related to “function” faults. These types of faults are related to specification and design, as shown in Table 3 and stated in [11, 13]. Hazards related to the “checking” and “algorithm” fault types were also common. Our reasoning about this result is that when performing a PHA, you are mostly basing your analysis on documentation and artefacts for the early stages of development. This means that you are more likely to be able to elicit possible hazards that are related to more general design and specification. Other types of hazards are found as well, but as the system details are unclear, it is more difficult in the PHA to specify exactly what can go wrong technically.

For RQ2, we did not find any correlation between hazards elicited through PHA and faults found in the fault analysis. As for finding direct matches between PHA findings and fault analysis, as stated in RQ3, there was a very low match rate. Of the 92 fault reports, only 6 of them could be said to have been specifically elicited as hazards in the PHA.

In this instance we believe that an important reason for the lack of match between elicited hazards and faults reported is the nature of the system under study. Compared to other systems we have performed fault analyses of, the DAIM system has a very different fault distribution profile. Earlier, we have performed similar analyses of fault


reports, and these have had a distribution where "function" and "GUI" faults were the most frequent [15, 16].

5.2 Comparing fault distribution with previous studies

When comparing the fault distribution for DAIM with fault distributions we have found in previous studies, we see that the distribution for DAIM seems atypical. As an example, we compare the fault distributions of DAIM and that of another fault report study where five industrial projects were analyzed from system testing and field use [16]. Figure 8 shows the difference between the distributions.

Figure 8. Comparison of DAIM fault distribution with previous fault study (%). [Grouped bar chart over the fault type categories; series: DAIM, [16].]

This is one example of why we think the DAIM system has an atypical fault distribution, which is supported by findings we made in [15] and by Vinter et al. in [17]. As we see in Figure 8, the fault types "function" and "GUI", which were very numerous for the systems analyzed in [16], are not at all numerous in the fault reports for the DAIM system. This of course affects the ability to compare the fault reports in the DAIM system to the hazards found, where "function" and "GUI" were more numerous. It would have been more appropriate to perform a "post-mortem" hazard analysis on the systems studied in [16], as they had a fault profile more similar to the one that hazard analyses are likely to produce.

5.3 Method evaluation: Improving specification and design inspection

The comparison of possible hazards identified during PHA sessions with the faults found after system testing and field use is a novel approach for exploring how the PHA method can be used for eliciting possible faults in a software system. PHA is a very light-weight and easy-to-learn method, which is suited for use in very early phases of development, as shown in Table 1. Compared with the Perspective-Based Reading technique from [6], using PHA results in inspections where the readers take a perspective that emphasizes system safety.

Compared to the economics of other inspection methods, the efficiency results depend on whether we count the number of faults actually found or the number of potential faults identified. According to Wagner's literature survey, the mean inspection efficiency is 1.06 staff-hours per defect found for requirements and 2.31 staff-hours per


defect found for design [7]. Our study showed an efficiency of 6.33 staff-hours per defect for the actual faults found in fault reports, and 1.46 staff-hours per defect for the potential faults found, as shown in Table 7. As the DAIM system under study had not been injected with known faults, but was used because it was an accessible real-life system with available documentation and fault reports, it is difficult to say how many actual faults the system had.

Also, it should be noted that the fault distribution of the DAIM system was very different from that of several systems we have examined in previous studies. Another remark is that since PHA is a safety review technique, it will also catch potential safety hazards, as the technique is originally meant to do. When using the proposed method, the results for safety and fault review could therefore be combined, which would give a different efficiency measure: staff-hours per caught anomaly (hazards and faults).

5.4 Validity threats

The main validity threats in this study are the following.

Construct validity:

The main threat to construct validity is the difference between hazards and faults. Hazard analysis and fault report analysis do not produce the same type of reports. By converting the hazards found to fault types we were able to make a comparison between the two. As the results show, most of the hazards identified did not show up in the fault reports as actual faults. Still, many of the hazards identified through the PHA could have manifested themselves as faults if, over time and with diverse users, they had in fact occurred in the system. Whether those faults would then have manifested themselves as observable failures in some future execution context is another matter.

Internal validity:

One threat to internal validity is that the fault categorization was performed by us. Since such categorization is a subjective task, this can affect the reliability of measures. By having two persons independently categorize and then compare results, we feel we reduced this threat.

Similarly, PHA sessions are also based on subjective views of the description of a system. This threat is not possible to circumvent, as the PHA technique is based on personal ideas and collective "brainstorming". The quality of the PHA results is dependent on the participants' experience and knowledge.

Another threat is the issue of unconscious bias or "fishing". By comparing the results from the PHA and the fault analysis, we were looking to find connections between the results, and this could have led us to find "weak" connections which others would not have found.

The group of hazards that were compared to the reported faults may introduce a selection threat. The PHA sessions were time limited, so only the most obvious hazards were taken into account. Also, the PHA sessions were performed over a period of time, so some maturation, in the form of a better understanding of the actual system, may have occurred.


Another issue here is the time span of the study. It is possible that by studying too short a time span of fault report collection, some fault types were underreported in the collected fault reports.

External validity:

In our work we have analyzed data from only one, possibly atypical, software system, DAIM, and this limits the ability to generalize the results. Another issue is the size and simplicity of the software system studied, which may be smaller than many other web-based projects of a similar type. The development process for this system has also been rather small-scale, with few people involved in design, implementation and testing. This may influence the distribution of faults found, as the developers have had a less complex system to develop. The reason the DAIM system was chosen for study was that it was a system developed close to us, which gave us a lot of freedom with respect to documentation accessibility and the possibility for data collection and clarification with the developers.

6. Conclusion and further work

This paper has presented the description and an implementation of a novel method for identifying software faults using the PHA technique. Because of the nature of the system, the results did not turn out as clear as we had hoped. The fault reports were few and mostly limited to certain types. On the other hand, we did identify 6 faults that were actually found in the system as well as 20 potential faults that may be in the system. The hazard analysis also showed that there is a certain type of faults that analysis techniques such as PHA can help to uncover in an early process phase. Performing the PHA elicited many hazards that could have been found in the system as “function” faults. That is, faults which originate from early phases of system development, and are related to the specification and design of the system. From this we conclude that PHA can be useful for identifying hazards that are related to faults introduced early in software development.

As for finding direct ties between hazards found in PHA and faults reported in fault reports, we were not very successful. This, we feel, is mainly due to the studied system’s particular fault type profile which was very different from fault distribution profiles we had found in earlier studies. Some weak links were found, but the data did not support any systematic links.

The method we have proposed in this paper should be validated by performing similar studies in the future. Because of the circumstances and type of system analyzed here, interesting further work would be to execute a similar study on a larger system where the fault distribution is more similar to the other systems we have conducted fault report analyses of.

Acknowledgements

The author wishes to thank Professor Reidar Conradi for valuable input and reviewing. I would also like to thank Jostein Dyre-Hansen, Professor Tor Stålhane, Kai Torgeir


Dragland, Torgrim Lauritsen and Per Trygve Myhrer for their assistance during the execution of this study.

References

[1] N. Leveson, Safeware: System safety and computers, Addison-Wesley, Boston, 1995.

[2] A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr: “Basic Concepts and Taxonomy of Dependable and Secure Computing”, IEEE Transactions on Dependable and Secure Computing, (1)1, January-March 2004.

[3] L. Huang, B. Boehm: “How Much Software Quality Investment Is Enough: A Value-Based Approach”, IEEE Software, (23)5, pp. 88-95, Sept.-Oct. 2006.

[4] S. Biffl, A.Aurum, B. Boehm, H. Erdogmus, P. Grünbacher: Value-Based Software Engineering, Springer, Berlin Heidelberg, 2006.

[5] Basili, V.R., Selby, R.W.: “Comparing the Effectiveness of Software Testing Strategies”, IEEE Transactions on Software Engineering, (13)12, pp. 1278 – 1296, Dec. 1987.

[6] Shull, F., Rus, I., Basili, V.: How perspective-based reading can improve requirements inspections, IEEE Computer, (33)7, pp. 73-79, July 2000.

[7] Stefan Wagner: “A literature survey of the quality economics of defect-detection techniques”, Proceedings of the 2006 ACM/IEEE international symposium on International symposium on empirical software engineering, Rio de Janeiro, Brazil, September 21-22, 2006.

[8] Ciolkowski, M, Laitenberger, O, Biffl, S.: “Software reviews, the state of the practice”, IEEE Software, (20)6, pp. 46-51, Nov.-Dec. 2003.

[9] M. Rausand: Risikoanalyse, Tapir Forlag, Trondheim, 1991.

[10] J. A. Børretzen; T. Stålhane; T. Lauritsen; P. T. Myhrer, “Safety activities during early software project phases,” Proceedings of the Norwegian Informatics Conference, pp. 180-191, Stavanger, Norway, 2004.

[11] R. Chillarege, I.S. Bhandari, J.K. Chaar, M.J. Halliday, D.S. Moebus, B.K. Ray, M.-Y. Wong: “Orthogonal defect classification-a concept for in-process measurements”, IEEE Transactions on Software Engineering, (18)11, pp. 943-956, Nov. 1992.

[12] K. El Emam, I. Wieczorek: “The repeatability of code defect classifications”, Proceedings of The Ninth International Symposium on Software Reliability Engineering, pp. 322-333, 4-7 Nov. 1998.


[13] J. Zheng, L. Williams, N. Nagappan, W. Snipes, J.P. Hudepohl, M.A. Vouk: “On the value of static analysis for fault detection in software”, IEEE Transactions on Software Engineering, (32)4, pp. 240-253, April 2006.

[14] Oliver Laitenberger, Sira Vegas, Marcus Ciolkowski: “The State of the Practice of Review and Inspection Technologies in Germany”, Technical Report ViSEK/010/E, ViSEK, 2002.

[15] J. A. Børretzen, R. Conradi: “Results and Experiences From an Empirical Study of Fault Reports in Industrial Projects”, Proceedings of the 7th International Conference on Product Focused Software Process Improvement, pp. 389-394, Amsterdam, 12-14 June 2006.

[16] Jon Arvid Børretzen, Jostein Dyre-Hansen: “Investigating the Software Fault Profile of Industrial Projects to Determine Process Improvement Areas: An Empirical Study”, Proceedings of the 14th European Systems & Software Process Improvement and Innovation Conference, pp. 212-223, Potsdam, Germany, 26-28 Sept. 2007.

[17] O. Vinter, S. Lauesen: “Analyzing Requirements Bugs”, Software Testing & Quality Engineering Magazine, Vol. 2-6, Nov/Dec 2000.


Technical Report (P8)

Diverse Fault Management – a comment and prestudy of industrial practice

Jon Arvid Børretzen
Department of Computer and Information Science,
Norwegian University of Science and Technology (NTNU), NO-7491 Trondheim, Norway

[email protected]

Abstract: This report describes our experiences with fault reports and their processing in several organizations. Data from the investigated projects is presented in order to show the diversity, and at times the lack, of information in the reports used. We also show that although useful process information is readily available, it is seldom used or analyzed with process improvement in mind. An important challenge is to explain to practitioners why a standard description of faults would be advantageous, and to propose better use of the knowledge gained about faults. The main contribution is to explain why more effort should be put into the codifying of fault reports, and how this information can be used to improve the software development process.

1. Introduction

In all software development organizations there is a need for some minimum of fault logging and follow-up, to respond to faults discovered during development and testing, as well as to claimed fault reports (really failure reports) from customers and field use. Such reports typically contain fault attributes that are used to describe, classify, analyze, decide on and correct faults. There are many standards for this kind of information, although the original fault reports will be more of an ad hoc character than of a specified standard. A software development organization will, in addition to a fault report scheme, have defined its own customized metrics and processes related to fault management. Systematic fault management is often also motivated by certification efforts, for instance ISO 9000. Software Process Improvement (SPI) and Quality Assurance (QA) initiatives can also be a motivation for fault management improvement work. Despite this, in most organizations there is still much underused or even unused data, either from lack of knowledge about the subject or from lack of procedures to assist in using the available data. As Jørgensen et al. state, "no data is better than unused data" [Jør98]. This is because collection of data that is not used leads to wasted effort during data collection, poor data quality, and possibly even a negative attitude to any kind of measurement during development or other SPI and QA activities.


In this investigative pre-study, we report earlier experience from case studies and data mining in 8 Norwegian IT organizations and an Open Source Software community, where fault data has been under-reported and/or under-analyzed. That is, poor or wrongly coded classification of faults, missing fault information for the affected program module, no effort registration, and so on. There is also the issue of fragmented data representation: partial fault reports, Software Configuration Management logs, or merely comments in code. This paper describes our experiences from working with fault reports from several different organizations. Results from these studies have been published in papers like [Bor06, Bor07, Con99, Moh04, Moh06], but there is also a need to describe what we have learned from these studies in a descriptive manner.

In this field of study, different terminology is used in various sources. For this paper, we use the term fault in the same meaning as bug or defect. That is, a fault is the passive flaw in the system that could lead to an observable failure (vs. requirements) when executed. For fault report, other terms in use are problem report and trouble report.

2. Metrics

Our studies have given us insight into and knowledge about the practice and information available from fault report repositories in several commercial organizations. This section presents these organizations and some attributes of their repositories. Such information gives a quick insight into how fault reporting is performed and what possibilities are available in terms of analysis and process improvement. Table 1 shows an overview of the 8+1 involved organizations. Because of non-disclosure agreements with some organizations, their identities have been anonymized (O1-O8). We compare with the Open Source organization Gentoo [Gen].

Table 1. Organization information
Organization | O1 | O2 | O3 | O4 | O5 | O6 | O7 | O8 | Gentoo
Period under study | 1993-98 | 2000-04 | 2004-05 | 2004-05 | 2004-05 | 2004-05 | 2006-07 | 2007 | 2004-06
Domain | Telecom | Telecom | Finance | Knowledge and process management | Knowledge and process management | Security | Financial | Risk management | Operating system
Organization size (not just SW developers) | 400 | 300 | ~500 in Norway | ~150 in Norway | Over 500 in Scandinavia | ~320 in Norway | ~3900 in Norway and Sweden | ~7000 worldwide | ~320 active
Development language | SDL, PLEX | Erlang, C | C, COBOL, COBOL II | Java | Java | C, C++ | Java | N/A | Several
Information collected from | 1 project, 3 releases | 1 project, 3 releases | 1 project, 18 months | 1 project, 10 months | 1 project, 9 months | 1 project, 24 months | 5 projects, 6 months | 1 project, 12 months | 1 project
Studied by | Master students | Master and PhD students | PhD student | PhD student | PhD student | PhD student | Master and PhD students | Post.Doc. | PhD student


For each of these organizations, we have studied and analyzed fault reports in one or more development projects. From this, we have selected some relevant fault report attributes, and report the situation for each of the organizations. The attributes are the following:

• Fault report description: Whether the initial description of the fault is long or short, this indicates how well the fault has been described when found.

• Fault severity: How many levels of fault severity does the organization use to discern their faults?

• Fault type categorization: Does the organization classify faults according to type?

• Fault location: Does the organization describe where the fault is located, either structurally (i.e. which component) or functionally (what user function the fault relates to)?

• Release version of fault: Does the organization register in which release of the software the fault was found?

• Correction log: Does the organization keep a correction log for each fault, where developers can enter information relevant to the identification of fault cause and correction?

• Solution description: Does the organization record what the solution of the problem was and how the fault was corrected?

• Correction effort: Is information recorded about the effort needed to find and correct the fault?

• Mandatory completion: Are all fault report entry fields mandatory for completion?

• Specialized fault report system or change reports: Is the fault reporting system a separate entity, or is it used in combination with all change reports?

• Standard or custom fault reporting system: Does the organization use a standard available fault reporting system, or do they use a custom made system?

These attributes are shown for each organization in Table 2. Table 2 shows that there is a wide range of information used in the fault reports of these organizations. For instance the amount of information recorded differs from well described faults with correction log and solution description, to cases where the faults are scantily described and the only information about correction or solution is whether it has been solved or not.


Table 2. Fault report attributes for each organization
Organization | O1 | O2 | O3 | O4 | O5 | O6 | O7 | O8 | Gentoo
Fault report description | Long | Long | Long | Short | Short | Long (mostly) | Long | Long | Long
Fault severity | 2 levels | 2 levels | 3 levels | 3 levels | 6 levels | 3 levels | 5 levels | 5 levels | 7 levels
Fault status | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes
Fault type categorization | No | Yes | No | Coarse | No | No | Yes | No | No
Fault location (functional or structural component) | Structural | Structural (but many mis-spellings) | Funct. | Funct. | Mix | Structural comp. | Anecdotal description in some correction logs | Funct. | Structural comp.
Release version for fault | Yes | Yes | Yes (coarse) | Yes | Yes | Yes | Date | Date | Yes
Correction log or description | Yes | Yes | Yes | No | No | Yes | Yes | No | Yes
Solution description | No | Partly | Yes | No | No | Yes | Yes | No | No
Correction effort | ? | ? | No | No | No | No | No | No | No
Mandatory completion of fault reports | Yes | Yes | No | No | Yes | Yes | No | No | No
Specialized fault report system or common change report system | Fault reports | Change reports | Fault reports | Change reports | Change reports | Change reports | Change reports | Fault reports | Fault reports
Standard or custom fault reporting system | Custom | Standard: ClearCase | Custom | Custom | Standard: Jira | Custom: SQL and web | Standard: Mercury Quality Centre | Custom | Standard: Bugzilla
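To illustrate how a profile like a column of Table 2 could be used, the sketch below (our own, with hypothetical attribute names; the selection of "recommended" attributes is ours) records which attributes an organization's fault reports capture and flags the ones that are missing, here using O3 as read from Table 2.

```python
# Illustrative sketch (attribute names are ours, not from any tool): flag which
# recommended fault report attributes an organization does not record.
RECOMMENDED_ATTRIBUTES = {
    "fault_report_description", "fault_severity", "fault_status",
    "fault_type_categorization", "fault_location", "release_version",
    "correction_log", "solution_description", "correction_effort",
}

# Rough profile of organization O3, as read from Table 2.
o3_profile = {
    "fault_report_description", "fault_severity", "fault_location",
    "release_version", "correction_log", "solution_description",
}

missing = sorted(RECOMMENDED_ATTRIBUTES - o3_profile)
print("Attributes O3 does not record:", ", ".join(missing))
# -> correction_effort, fault_status, fault_type_categorization
```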

3. Process

Using fault report information to support process improvement can be a viable approach to certain parts of software process improvement [Gra92]. Some organizations have used this approach actively, while others have not. For the most part, organizations have done little work in this area until external researchers started studying (by data mining) their fault report repositories. For each organization, we describe the level of fault report use beyond fault correction, i.e. whether any analysis work has been performed by the organization itself, followed by what has been performed by us as researchers. This is shown in Table 3. As we see, as external researchers we have been able to exploit the available data in the companies to a much larger degree than the organizations themselves.


Table 3. Level of fault report work and external research
Organization | Organization's work beyond fault correction | External research performed
O1 | Marginal | Study of the relation between complexity/modification-rate and number of defects in different development phases, and whether defects found during design inspections can be used to predict defects in the same module in later phases and releases.
O2 | Planned only | Study of defect-density and stability of software components in the context of reuse.
O3 | None | Study of faults and fault types with the aim of locating potential for general process improvement, as well as identifying the software components with numerous faults.
O4 | Basic analysis | Study of faults and fault types with the aim of locating potential for general process improvement, as well as identifying the most numerous fault types and the fault types leading to severe faults.
O5 | Basic analysis | Study of faults and fault types with the aim of locating potential for general process improvement, as well as identifying the most numerous fault types and the fault types leading to severe faults.
O6 | Basic analysis | Study of faults and fault types with the aim of locating potential for general process improvement, as well as identifying the most numerous fault types, the fault types leading to severe faults, and the software components with numerous and severe faults.
O7 | Some analysis | Study of faults and fault types with the aim of locating potential for general process improvement, as well as identifying the most numerous fault types and the fault types leading to severe faults. Follow-up study on the organization's attitude to fault type classification.
O8 | N/A | N/A
O9 | None | N/A

One example of research results comes from organization O1, where one conclusion of the work performed by the external researchers was that performing software inspections was cost-effective for that organization. Inspections found about 70% of the recorded defects, but cost only 6-9% of the effort compared with testing. This yielded a saving of 21-34%. In addition, this study showed that the existing inspection practice was based on too short inspections. By increasing the length of inspections, there was a large saving of effort compared to the effort needed in testing. Figure 1 shows that by slowing down the inspection rate from 8 pages/hour to 5 pages/hour, they could find almost twice as many faults. Calculations showed that by spending 200 extra analysis hours and 1250 more inspection hours, they could save ca. 8000 test hours! In O7, the external researchers concluded through analysis of fault reports and fault types that the organization's development process had definite weaknesses in the specification and design phases, as a large percentage of the faults found during system testing were of types that originate mainly in these early phases. Additionally, this external research led the organization to alter the way they classified faults in a pilot project, in order to study these issues further.


Figure 1. Inspection rates/defects in organization O1. [Plot; recommended and actual inspection rates marked.]
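A back-of-the-envelope check of the O1 trade-off quoted above, using only the figures given in the text, shows that the extra analysis and inspection effort is small compared with the reported test-hour saving.

```python
# Rough check of the O1 trade-off, using only the figures quoted in the text.
extra_analysis_hours = 200
extra_inspection_hours = 1250
saved_test_hours = 8000              # "ca. 8000 test hours"

net_saving = saved_test_hours - (extra_analysis_hours + extra_inspection_hours)
print(f"Net saving: about {net_saving} staff-hours")  # about 6550 staff-hours
```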

Other results we have drawn from several studies are that the data material is not always well suited for analysis, mostly because of missing, incorrect or ambiguous data. It is apparent that since an organization generally does not use this data after recording it, the motivation for recording correct data is low. In O3, for instance, 97% of the fault reports were classed as "medium" severity faults. This was the default severity when recording a fault, and it was rarely altered even if the fault was actually of a more severe character. There are some interesting fault report attributes that are not in wide use, even though the information is most likely available. Such information could be very useful in process improvement initiatives, and the cost of collecting and analyzing this data is marginal. Some examples of such attributes are the following:

• Fault location: This attribute addresses where the fault is located, either as a functional location or a structural location. When the functional location of a fault is reported, this is mainly from the view of the users or testers; it tells us in which function or functional part of the system the fault was discovered through a failure. In the case of a structural location, the fault report points to a place (or several) in the code, an interface, or a component where the fault has been found. For analysis purposes, the structural location is often the more useful information.

• Fault injection/discovery phase: The fault injection phase describes when in the development process the fault has been introduced. Sometimes faults are injected in the specification phase, but the most common faults are introduced in design and implementation. Even during testing faults can be introduced, if you include test preparation as part of the system implementation. The fault discovery phase describes in which phase the fault has been discovered, and the gap between injection and discovery should preferably be as small as possible,



because the longer a fault is present in the system, the more effort it will take to remove it.

• Fault cost (effort): This shows how much effort has gone into finding and/or correcting a fault. Such information shows how expensive a fault has been for a project, and may be an indication of fault complexity or of areas where a project needs to improve its knowledge or work process.

By introducing and implementing a core set of fault data attributes (i.e. a metric) to be recorded and analyzed, we could establish a common process for fault reporting. Several schemes for recording and classifying faults already exist, like the Orthogonal Defect Classification scheme [Chi92] or the IEEE 1044 standard [IEEE 1044]. A core process could be customized for organizations that want a broader approach to the analysis of fault reports. Some organizations use custom-made tools for fault reporting, but a great many use standard commercial or open source tools. Introducing a core set of fault report attributes in these tools would help encourage organizations to record the most useful information, which can then be used as a basis for process improvement. Many tools already have functionality for analyzing the data sets they contain.
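As a sketch of what such a core set of fault report attributes might look like in a reporting tool, consider the record below; the field names are our own suggestion, loosely following the attributes discussed in this report and schemes like ODC and IEEE 1044, and are not taken from any particular standard or tool.

```python
# Sketch of a possible core fault record; field names are our own suggestion.
from dataclasses import dataclass
from typing import Optional

@dataclass
class FaultReport:
    report_id: str
    description: str                  # free-text description of the observed problem
    severity: str                     # e.g. "low" / "medium" / "high"
    fault_type: str                   # e.g. an ODC fault type
    location: str                     # structural location: component, module or file
    release_version: str              # release in which the fault was found
    injection_phase: Optional[str] = None        # phase where the fault was introduced
    discovery_phase: Optional[str] = None        # phase where the fault was discovered
    correction_effort_hours: Optional[float] = None  # effort to find and correct the fault
    solution_description: Optional[str] = None
```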

1. There is a gap between the state of the art (best theories) and the state of the practice (current practices). Therefore, most data gathered in companies’ repositories are not defined and collected following the GQM paradigm.

2. Many projects have been running for a while without having improvement programs and may later want to start one. The projects want to assess the usefulness of the data that is already collected and to relate data to goals (reverse GQM).

3. Even if a company has a measurement program with defined goals and metrics, these programs need improvements from bottom-up studies.


Another issue with data repositories is the ease with which data can be extracted for analysis. An example is from O1, where the researchers had to go to a great deal of effort to convert the fault data into a form that could be analyzed. In O3, the fault reports could only be accessed for analysis by printing hardcopies of the reports, which in turn had to be scanned and converted into analyzable data. To support process analysis in an efficient manner, the fault repositories should be available in a standard and well-kept form.

5. Discussion and conclusion

We have presented an overview of studies performed concerning fault reports, and shown the type of information that exists in, and is lacking from, such reports. What we have learnt from the studies of the fault report repositories of these organizations is that the data is in some cases under-reported, and in most cases under-analyzed. By including some of the information that the organization already has, more focused analyses could be made possible. For instance, specific information about fault location and fault correction effort is generally not reported even though this information is easy to register. One possibility is to introduce a standard for fault reporting, where the most important and useful fault information is mandatory.

A reasonable approach to improving fault reporting and using fault reports to support process improvement is to start by being pragmatic: at first, use the readily available data that has already been collected, and in time change the amount and type of data that is collected through development and testing to tune this process. We have learnt that the effort spent by external researchers to produce useful results based on the available data is quite small compared to the collective effort spent by developers recording this data. This shows that very little effort may give substantial effects for many software developing organizations. Finally, there are two main points we want to convey as a result of the studies we have done in these organizations:

• It is important to be able to approach the subject of fault data analysis with a bottom-up approach, at least in the early phases of such research and analysis initiatives. The data is readily available; the work that has to be performed is designing and carrying out a study of these data.

• Much of the recorded fault data is of poor quality. This is most likely because of the lack of interest in using the data.

References

[Bas94] Basili, V.R., Caldiera, G., Rombach, H.D.: Goal Question Metric Paradigm. In: Marciniak, J.J. (ed.): Encyclopaedia of Software Engineering, pp. 528-532, Wiley, New York, 1994.


[Bor06] Børretzen, J.A., Conradi, R.: Results and Experiences From an Empirical Study of Fault Reports in Industrial Projects. Proceedings of the 7th International Conference on Product Focused Software Process Improvement (PROFES'2006), pp. 389-394, Amsterdam, 12-14 June 2006.

[Bor07] Børretzen, J.A., Dyre-Hansen, J.: Investigating the Software Fault Profile of Industrial Projects to Determine Process Improvement Areas: An Empirical Study. Proceedings of the European Systems & Software Process Improvement and Innovation Conference 2007 (EuroSPI'07), pp. 212-223, Potsdam, Germany, 26-28 September 2007.

[Con99] Conradi, R., Marjara, A.S., Skåtevik, B.: An Empirical Study of Inspection and Testing Data at Ericsson. Proceedings of the International Conference on Product Focused Software Process Improvement (PROFES'99), pp. 263-284, Oulu, Finland, 22-24 June 1999.

[Chi92] Chillarege, R., Bhandari, I.S., Chaar, J.K., Halliday, M.J., Moebus, D.S., Ray, B.K., Wong, M.-Y.: Orthogonal defect classification - a concept for in-process measurements. IEEE Transactions on Software Engineering, (18)11, pp. 943-956, Nov. 1992.

[Gra92] Grady, R.: Practical Software Metrics for Project Management and Process Improvement, Prentice Hall, 1992.

[Gen] The Gentoo linux project, available from: http://www.gentoo.org/

[IEEE 1044] IEEE Standard Classification for Software Anomalies, IEEE Std 1044-1993, December 2, 1993.

[Jør98] Jørgensen, M., Sjøberg, D.I.K., Conradi, R.: Reuse of software development experience at Telenor Telecom Software. In Proceedings of the European Software Process Improvement Conference (EuroSPI'98), pp. 10.19-10.31, Gothenburg, Sweden, 16-18 November 1998.

[Moh04] Mohagheghi, P., Conradi, R.: Exploring Industrial Data Repositories: Where Software Development Approaches Meet. In Proceedings of the 8th ECOOP Workshop on Quantitative Approaches in Object-Oriented Software Engineering (QAOOSE'04), pp. 61-77, Oslo, Norway, 15 June 2004.

[Moh06] Mohagheghi, P., Conradi, R., Børretzen, J.A.: Revisiting the Problem of Using Problem Reports for Quality Assessment. Proceedings of the 4th Workshop on Software Quality, held at ICSE'06, pp. 45-50, Shanghai, 21 May 2006.

[Zel98] Zelkowitz, M.V., Wallace, D.R.: Experimental models for validating technology. IEEE Computer, (31)5, pp. 23-31, May 1998.


Appendix B: Interview guide

Questions for Test Managers

Background

1. Which responsibilities do you have in the organization?
2. How long have you been working in the company?
3. What was your involvement in the project under study?
4. Are you still involved in work with this project?

On the study results

1. The results from our study (both on the organization in general and on this project) show that many faults are of a character that points to them having been introduced in the specification and design phases. How does this compare to your impression of the faults that are found in your projects?

2. How do you feel the analysis results for this project compare to your experience of the project?

3. What do you think about the fault categorization scheme we have used, based on ODC?

On the organization’s own measurements and results

1. The organization uses its own way of categorizing faults today; how do you think this works?

2. Some results we have received from the organization indicate where in the development process faults have been introduced and where they have been discovered; does your project report this type of information?

3. How do you separate design faults and implementation faults where fault reporting is concerned? Do design faults sometimes get reported as change requests?

The quality system

1. What is the fault reporting process like in your organization, and who is responsible for quality?

2. Which tools do you use in fault reporting? Are they the same as in change request reporting?

3. What is the fault correction process like?
4. How much effort does it take to register a fault report? Do you think this task could or should be simplified?
5. Do the reporters of a fault have the same access to system and information as the ones who are going to correct it?


6. Do you think that all the necessary information is accessible when reporting a fault?

7. Do you think that all the necessary information is accessible when correcting a fault?

8. Is the fault reporting in any way used as a basis for process improvement, or is it only used as a log of faults that are to be corrected?

9. Do you register information about hours of effort for fault finding and correction? This is relevant for knowing which faults require the most resources.

10. Do you think the tool support for fault reporting is good enough?

Fault reports: Available information (amount of information, correct fields, number of fields)

1. Do you think the fields that are used in the fault reporting system are sufficient?
2. Are there any extraneous fields that are not used, or that are used without further use of that information?
3. Do you think that any fields are missing?
4. Do you have any information about fault location? In some projects you use the field "Testobjekt"; does this describe functional modules or structural modules which can be linked to code?

5. Do you have the necessary information available to tell which components are involved in a fault correction, or is this implicit knowledge that only the developers have?

6. Is it possible to later use the fault reporting tool to look up which components or code parts have been involved in a fault correction? For example, to find which components have the most severe faults, and so on.

Feedback from fault reporting

1. Do you have any sort of feedback to the developers based on what you find in your quality system?

2. How do you think feedback from what is being done can be used for improvement?

3. Have there been any changes for you, in terms of technical issues, development processes or your work as a systems developer, based on what is uncovered as faults during development of your systems?

Process changes

1. Do you think the organization is willing to change the reporting routines, with respect to adding information for use in analysis (or change in order to increase preciseness/correctness of the information)?

2. Do you think such changes would be useful in order to improve product quality?
3. How much effort and which actions do you think the company initiates in order to change processes related to process improvement?