L7 2019 Metrics TMM
fileadmin.cs.lth.se/cs/Education/ETSN20/lectures/L7_Metrics_TMM.pdf



Software Testing ETSN20
http://cs.lth.se/etsn20

Chapters 13, 17, 18
Management: Measure and improve

Prof. Per Runeson

1

Lecture

• Chapter 13 – Metrics in test execution

• Chapter 17 – Software Quality

• Chapter 18 – Process improvement and assessment

2


Purpose of metrics

• Project monitoring – check the status
• Project controlling – corrective actions
• Plan new projects
• Measure and analyze results
  – The profit of testing
  – The cost of testing
  – The quality of testing
  – The quality of the product
  – Basis of improvement, not only for the test process

CC Pink Sherbet Photography

4

Selecting the right metrics

• What is the purpose of the collected data?
  – What kinds of questions can they answer?
  – Who will use the data?
  – How is the data used?
• When and who needs the data?
  – Which forms and tools are used to collect the data?
  – Who will collect them?
  – Who will analyse the data?
  – Who has access to the data?

7


Example

Team              Alice   Bob
# test cases run    235    15
# defects found      12    12
# hours spent        86    56

Which team is best?

8
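Which team is "best" depends on which ratio is computed. A minimal Python sketch (numbers taken from the table above; the derived ratios are only illustrative):

    # Raw data from the example slide.
    teams = {
        "Alice": {"tests_run": 235, "defects": 12, "hours": 86},
        "Bob":   {"tests_run": 15,  "defects": 12, "hours": 56},
    }

    for name, d in teams.items():
        print(f"{name}:")
        print(f"  defects per hour      = {d['defects'] / d['hours']:.2f}")      # Bob looks better
        print(f"  defects per test case = {d['defects'] / d['tests_run']:.2f}")  # Bob looks better
        print(f"  test cases per hour   = {d['tests_run'] / d['hours']:.2f}")    # Alice looks better

The point of the example: without a stated goal (see GQM below), none of these ratios alone says which team performed better.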

Goal-Question-Metric Paradigm (GQM)

• Goals
  – What is the organization trying to achieve?
  – The objective of process improvement is to satisfy these goals
• Questions
  – Questions about areas of uncertainty related to the goals
  – You need process knowledge to derive the questions
• Metrics
  – Measurements to be collected to answer the questions

[van Solingen, Berghout, The Goal/Question/Metric Method, McGraw-Hill, 1999]

9
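As a minimal sketch, a GQM tree can be written down as plain data before any tooling is chosen; the goal and metric names below are illustrative, loosely based on the efficiency/effectiveness example on the next slide:

    # Goal -> questions -> metrics, as plain nested data.
    gqm = {
        "goal": "Analyze the test process in order to improve it, "
                "from the test manager's point of view",
        "questions": {
            "How efficient is testing?": ["staff hours spent", "# defects found"],
            "How effective is testing?": ["% defects found in test"],
        },
    }

    # Every metric must be traceable to a question, and every question to the goal.
    for question, metrics in gqm["questions"].items():
        print(question, "->", ", ".join(metrics))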


Goal-Question-Metric

[GQM tree example – Goal: analyze. Questions: How efficient? (metrics: time, # defects), How effective? (metric: % defects).]

11

Measurement basics

Basic data:
• Time
  – calendar time and
  – staff hours
• Defects (fault/failure)
• Size

Basic rules:
• Feedback to origin
• Use the data or don't measure
• The act of measurement will affect the metrics you get
  – Consequences of using LOC to measure programmer productivity?

12


Test metrics: Coverage

What?
• % statements covered
• % branches covered
• % data flow
• % requirements
• % equivalence classes

Why?
• Track completeness of testing

13
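Coverage metrics are all ratios of exercised items to total items. A minimal, tool-independent sketch (the counts are hypothetical; in practice they come from a coverage tool such as JaCoCo or coverage.py):

    def coverage(covered: int, total: int) -> float:
        """Percentage of items exercised by the test suite."""
        return 100.0 * covered / total if total else 0.0

    # Hypothetical counts for one build.
    print(f"statement coverage:    {coverage(1840, 2300):.1f} %")
    print(f"branch coverage:       {coverage(410, 620):.1f} %")
    print(f"requirements coverage: {coverage(57, 64):.1f} %")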

Test metrics: Development status

• Test case development status
  – Planned
  – Available
  – Unplanned (not planned for, but needed)
• Test harness (= automated test framework) development status
  – Planned
  – Available
  – Unplanned

14


[Figure 4. Checkstyle growth history view: cumulative growth (0–100 %) of production code, test code, production classes, test classes and test commands over releases v1.1–v4.3, with annotated periods (1)–(5).]

… considered an attention point that is monitored carefully.

Two other observations stand out. First, release 2.2 has an interesting phenomenon: a sudden sharp decline for class, method and statement coverage, with a mild drop of block coverage. Secondly, there is a decline in coverage (at all levels) between release 2.4 and 3.0. The version numbers suggest that the system has undergone major changes.

4.2 Internal evaluation

To evaluate these observations, we first contrasted them with log messages at key points.

"Up until #280 there is a single unit test". The single test with file ID 20 is called CheckerTest. Inspection of this file pointed out that this actually was not a typical unit test, but rather a system test [3]. CheckerTest receives a number of input files and checks the output of the tool against the expected output.

"Testing has been neglected before the release 2.2". Inspection reveals that this coverage drop is due to the introduction of a large number (39) of anonymous classes, that are not tested. These anonymous classes are relatively simple and only introduce a limited number of blocks per class, and therefore, their introduction has a limited effect on the block coverage level. Class coverage however, is more affected because the number of classes (29) has more than doubled with the 39 additional anonymous classes. In-depth inspection taught us that the methods called by the anonymous classes are tested separately. In the next version, all coverage levels increase because of the removal of most of the anonymous classes. The drop is thus due to irregularities in the coverage measurement, falsifying the statement.

[Figure 5. Checkstyle Test Quality Evolution]

"There is a period of pure testing right after release 2.2 and before 3.0". We sought for evidence that tests are neglected during this period, but instead we encountered logs for 2.2 such as Added [6 tests] to improve code coverage (#285), updating/improving the coverage of tests (#286 and #308) and even Added test that gets 100% code coverage (#309). The assumption of a test reinforcement period before 3.0 is backed up by several messages between #700 and #725 mentioning improving test coverage and adding or updating tests.

"From version v2.2 until beyond v2.4, synchronous co-evolution happens". To counter this, we looked for signs that pure development was happening, e.g., by new features being added. Investigation of the log messages around that time however showed that it concerns a period of bug fixing or patching (#354, #356, #357, #369, #370, #371, #415) and refactoring (#373, #374, #379, #397, #398, #412). Moreover, during this period production classes and test cases were committed together.

"Halfway between release 3.1 and 3.2 is a period of pure development". For this period, we could not find back the habit of committing corresponding test cases alongside production classes. Rather, a couple of large commits consisting of batches of production files occur, with log messages reporting the addition of certain functionality (#1410-#1420). Shortly after that, developers mention the addition of new tests (#143x and #1457).

"Between 3.4 and 3.5 testing happens more phased (ann. 4, Figure 4), followed by more synchronicity again". We could not really confirm this behavior nor distinguish both phases by means of the log messages, as we deduce that this period concerns mainly fixes of bugs, code style, spelling, build system and documentation.

"Around #670 and #780, developers were performing phased testing." The message of #687 mentions "Upgrading to JUnit 3.8.1", which makes us conclude that it concerns shallow changes. The same accounts for the period around #780: test cases are (i) modified to use a new test helper function; and (ii) rearranged across packages.

4.3 External evaluation

Two Checkstyle developers completed the survey we sent, sharing their opinions about our observations. As an answer to questions about the system's evolution and test process, they indicate that automated tests have always …


Example: Development status

[Zaidman et al ICST 2008]

[Figure 2. Example patterns of synchronous co-evolution: (a) Synchronous, (b) Time Delay, (c) Test Backlog.]

… Number of test classes (tClasses) and Number of test commands¹ (tCommands).

• Metrics are presented as a cumulative percentage chart up to the last considered version (which is depicted at 100%), as we are particularly interested in the co-evolution and not so much in the absolute growth.

• The X-axis is annotated with release points.

Interpretation. First of all, we can observe phases of relatively weaker or stronger growth throughout a system's history. Typically, in iterative software development new functionality is added during a certain period after a major release, after which a "feature freeze" prevents new functionality to be added. At that point, bugs get fixed, testing effort is increased and documentation written.

Secondly, the view allows us to study growth co-evolution. We observe (lack of) synchronization by studying how the measurements do or do not evolve together in a similar direction. The effort of writing production and test code is spent synchronously when the two curves are similar in shape (see Figure 2(a)). A horizontal translation indicates a time delay between one activity and a related one (2(b)), whereas a vertical translation signifies that a historical testing or development backlog has been accumulated over time (2(c)). Such a situation occurs, e.g., when the test-writing effort is lagging the production code writing effort for many subsequent releases. In the last version considered in the view, both activities reach the 100% mark, which is explained through the fact that we are measuring relative efforts for both activities.

Thirdly, the interaction between measurements yields valuable information as well. In Table 1 a number of these interactions are outlined. For example, the first line in Table 1 states that an increase in production code and a constant level of test code (with the other metrics being unspecified) points towards a "pure development" phase.

Technicalities. To separate production classes from test classes we use regular expressions to detect JUnit test case classes. As a first check, we look whether the class extends junit.framework.TestCase. If this fails, e.g., because of an indirect generic test case [21], we search for a combination of org.junit.* imports and setUp() methods.

Counting the number of test commands was done on the basis of naming conventions. More specifically, when we found a class to be a test case, we looked for methods that would start with test.

¹ A test command is a container for a single test [21].

[Table 1. Co-evolution scenarios: combinations of increasing vs. constant pLOC, tLOC, pClasses, tClasses and tCommands are interpreted as pure development, pure testing, co-evolution, test refinement, skeleton co-evolution, test case skeletons, test command skeletons, or test refactoring.]

We are aware that with the introduction of JUnit 4.0, this naming convention is no longer necessary, but the projects we considered still adhere to them.
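A sketch of the detection heuristic described above, not the authors' actual implementation: a file is classified as a JUnit test class if it extends junit.framework.TestCase, or imports org.junit.* and defines setUp(); test commands are counted as methods whose names start with test.

    import re

    def classify_java_file(source: str) -> dict:
        """Heuristic JUnit detection, roughly as described in the paper excerpt above."""
        is_test = bool(re.search(r"extends\s+(junit\.framework\.)?TestCase\b", source))
        if not is_test:
            # Fallback for indirect/generic test cases: org.junit imports plus a setUp() method.
            is_test = (re.search(r"^\s*import\s+org\.junit\.", source, re.MULTILINE) is not None
                       and re.search(r"\bvoid\s+setUp\s*\(", source) is not None)
        # JUnit 3 naming convention: one test command per method starting with "test".
        test_commands = re.findall(r"\bpublic\s+void\s+(test\w+)\s*\(", source)
        return {"is_test_class": is_test, "test_commands": len(test_commands)}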

Trade-off. Both the change history and growth history view are deduced from quantitative data on the development process. To contrast this with the resulting quality of the tests, we introduce a view incorporating test coverage.

2.3 Test Quality Evolution View

Goal. Test coverage is often seen as an indicator of "test quality" [24]. To judge the long-term "test health" of a software project, we draw the test coverage of the subject system in function of the fraction of test code, tLOCRatio = tLOC / (tLOC + pLOC), and in function of time.
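For illustration, with hypothetical sizes tLOC = 2,000 and pLOC = 8,000, tLOCRatio = 2000 / (2000 + 8000) = 0.20, i.e. 20% of the code base is test code; each release contributes one such point, plotted against its overall coverage.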

Description. In this view:

• We use an XY-chart representing tLOCRatio on the X-axis and the overall test coverage percentage on the Y-axis. Individual dots represent releases over time.

• We plot four coverage measures (distinguished by the color of the dots): class, method, statement and block² coverage.

Interpretation. Constant or growing levels of coverage over time indicate good testing health, as such a trend indicates that the testing process is under control. The fraction of test code, however, is expected to remain constant or increase slowly alongside coverage increases. Severe fluctuations or downward spirals in either measure imply weaker test health.

Technicalities. For now we only compute the test coverage for the major and minor releases of a software system. We do not compute coverage for every commit as: (i) we are specifically interested in long-term trends in contrast to fluctuations between releases due to the development process; (ii) computing test coverage (for a single release) is time-consuming; and (iii) automating this step for all releases proved difficult, due to changing build systems and …

² A basic block is a sequence of bytecode instructions without any jumps or jump targets, also see http://emma.sourceforge.net/faq.html (accessed April 13, 2007)


15

Test metrics: Test execution status

What?
• # defects
• # executed tests
• Requirements coverage

Why?
• Track progress of the test project
• Decide stopping criteria

D. J. Anderson, Making the Business Case for Agile Management – Simplifying the Complex System of Software Engineering. First published at the 2004 Motorola S3 Symposium, July 12-15, 2004.

… cannot be used in isolation as an introduction rate of 30 Features per day is not sustainable. It must be balanced with the WIP Inventory Control Chart [Figure 10] and the production rate plotted on Figure 2 as a trend line.

[Figure 9. Project Scope Control Chart: total scope in Features over time (10-Feb to 30-Mar), plotted between an upper and a lower limit.]

Figure 9 charts the buffer usage by project dark matter. At the beginning of the project there are 185 Features in scope. If the Feature count were to fall below 170 then the manager may decide that it is safe to add some Features from the backlog which did not make the prioritization cut during the planning stage. Additional Features above 185 represent buffer usage and above 220 the project buffer is consumed in full. This chart cannot be used in isolation to calculate buffer usage; the production rate as plotted on Figure 2 as a trend line must also be used to validate the anticipated delivery date.

The WIP Inventory Control Chart [Figure 10] shows us how much work is in progress. Again, the calculation of the control limits on the chart is outwith the scope of this paper. However, the upper control limit indicates whether the aggregate complexity is dangerously high and potentially out of control, as well as indicating whether or not the desired lead time of less than two weeks can be maintained. The lower control limit indicates that there is insufficient work-in-progress to maintain the desired production rate. The process is either stalling or starving from a lack of upstream material. A lagging trace of lead time can be maintained [Figure 11] and the data should remain within the control limits of 2 and 14 days, providing the WIP Inventory was not permitted to become too large or too small [Figure 10].

[Figure 11. Lead Time Control Chart: lead time in calendar days against features complete, with lower and upper control limits.]

Monthly Operations Review

Data from CFDs can be used to identify the productivity rates of different elements in the whole process of software engineering. This data can be presented at an Operations Review. CFDs are intended for day-to-day tactical control use. Operations Review is about learning and organizational feedback. Learning provides input for strategic and operational investment decisions. It provides the material for good governance. Figure 12 shows a process diagram labeled with element capacities per month, clearly identifying the current system constraint as System Test. According to the Theory of Constraints, maximum throughput (or productivity) is achieved by exploiting a constrained resource to the full and subordinating everything else in the system to the exploitation decision made. Continuous improvement is achieved by elevating the constraint, thus removing it as the system capacity constrained resource (CCR). The effect is to move the constraint elsewhere whilst the system goes on to achieve a higher throughput.

[Figure 10. WIP Inventory Control Chart: work in progress in Features (0–100) over time (10-Feb to 30-Mar), plotted between an upper and a lower control limit.]

The exploit decision made for this example is: all system testers have been relieved of all non-value-adding tasks. The subordination decisions made are: extra project managers have been assigned to complete those non-value-adding administrative tasks; system analysts have been assigned to write test plans for a future release, freeing up testers to work purely on …


16
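A minimal sketch of the bookkeeping behind such execution-status tracking (the record fields, numbers and thresholds are illustrative, not taken from the Anderson paper): record executed tests, open defects and requirements coverage per reporting period, and check them against an agreed stopping criterion.

    # One status record per reporting week (illustrative numbers).
    weekly_status = [
        {"week": 1, "tests_executed": 40,  "tests_planned": 200, "open_defects": 25, "req_coverage": 0.30},
        {"week": 2, "tests_executed": 110, "tests_planned": 200, "open_defects": 18, "req_coverage": 0.55},
        {"week": 3, "tests_executed": 195, "tests_planned": 200, "open_defects": 4,  "req_coverage": 0.96},
    ]

    def may_stop_testing(status, min_executed=0.95, max_open_defects=5, min_req_coverage=0.95):
        """Example stopping criterion: enough tests run, few open defects, requirements covered."""
        return (status["tests_executed"] / status["tests_planned"] >= min_executed
                and status["open_defects"] <= max_open_defects
                and status["req_coverage"] >= min_req_coverage)

    for s in weekly_status:
        print(f"week {s['week']}: stop = {may_stop_testing(s)}")   # False, False, True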


Test metrics: Defects (trouble reports)

What?
• # defects / size
• repair time
• lead time
  – Lead time 1 week
  – Repair time 1 hour
• root cause

Why?
• Monitor quality
• Monitor efficiency
• Improve
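A minimal sketch of how lead time and repair time can be derived from trouble-report timestamps; the field names and dates are illustrative, but the "1 week vs. 1 hour" contrast on the slide is exactly this distinction:

    from datetime import datetime

    report = {
        "reported":    datetime(2019, 10, 1, 9, 0),
        "fix_started": datetime(2019, 10, 7, 13, 0),
        "fix_done":    datetime(2019, 10, 7, 14, 0),
        "closed":      datetime(2019, 10, 8, 9, 0),
    }

    lead_time = report["closed"] - report["reported"]         # calendar time the report was open
    repair_time = report["fix_done"] - report["fix_started"]  # time actually spent fixing

    print(f"lead time:   {lead_time}")    # 7 days (~1 week)
    print(f"repair time: {repair_time}")  # 1 hour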

[Excerpt: C. Andersson and P. Runeson, Softw. Process Improve. Pract., 2007; 12: 125–140]

… identifies which status the defect had at the extraction date. Thereby task reports rejected, cancelled or not repeatable were excluded from the analysis.

The severity of the defects is not included as an attribute for classifying the defects in the analysis, i.e. all reported failures are treated equally, regardless of criticality. This approach is chosen since discussions indicated that all defects were important for planning of the proper effort to correct them. Thus, the analysis of each defect is performed regardless of its severity. In later stages of the analysis, a new perspective of the data in the form of defect severity could be of interest, although this is not yet implemented.

5. DATA ANALYSIS AND RESULTS

The data were presented to the representatives of the company as part of the feedback inherent in each cycle of the case study process. Its purpose was to motivate the company representatives to continue the analysis of the data within the organization, in terms of root cause analysis. This would in turn support them in identifying software process improvement possibilities. During these feedback meetings, different views of the data were examined to find the most valuable data formats for this organization. The data were presented both in the shape of graphs for the entire projects as well as separately for each feature group. Some of the views are presented below. Cycles 2 and 3 in the case study explored projects 1 and 2 in terms of defect distributions over time for all the projects and separated for the feature groups (Table 1). In the fourth cycle of the case study, which was confirmatory, project 3 was examined by the same means. Thus, the presentation in the following text includes all three projects, even though the analysis was conducted during different case study cycles.

The data are primarily given in three dimensions, based on time (date found for the defect) and test activity (based on which activity the reporter belonged to); both dimensions are derived from case study cycles 2 and 4, while feature group (which functionality was affected by the defect) is derived from case study cycles 3 and 4. The milestones, and other scheduled events that were common for the projects, were used as reference points of time when exploring the data for similarities.

5.1. Defect Detection over Time

5.1.1. Entire Project

The first exploratory case study cycle (cycle 2 in Table 1) resulted in a presentation format that was used for describing the current state of the projects (Figure 6) in terms of number of defects detected over time. To give an overview of the distributions of total defect detection, we show the number of defects detected per month over time. The distributions reflect defects reported from project start to project end, reported by FT, ST, CAT and Misc. The three projects were compared at specific dates. The ship dates are indicated in the figure, but other milestones following the technical development process that are comparable in the projects were also used.

As can be seen from Figure 6, the shape of the time distributions differ between the projects. The appearance of the defect distributions illustrated in Figure 6, and more specifically distributions for each test activity over time, were discussed and explained by the project staff during the feedback meetings. The response from the project participants was information about specific events that explain why a certain maximum is reached at a certain point, when the test activities are more intensive. An example of an event is when function testers run a regression test period. An unexpected minimum could be due to Christmas, when most employees take some days off. Hence, …

[Figure 6. Defect distributions, showing number of defects detected over time, for the three studied projects (Project 1, Project 2, Project 3). Ship dates are indicated with a vertical bar.]

17

18


Orthogonal Defect Classification – ODC

At discovery
• Activity
• Trigger
• Impact

At fixing time
• Target
• Defect type
• Qualifier (missing, incorrect or extra code)
• Source
• Age (new, base, rewritten, refixed)

Defect categories: Basic, Functionality, Robustness, Interoperability, Load and stability, Performance, Stress, Scalability, Regression, Documentation

19
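A minimal sketch of an ODC-style defect record carrying the attributes listed above; the example values are illustrative, not a complete ODC taxonomy:

    from dataclasses import dataclass

    @dataclass
    class OdcDefect:
        # Recorded when the defect is discovered.
        activity: str     # e.g. "system test"
        trigger: str      # e.g. "workload/stress"
        impact: str       # e.g. "reliability"
        # Recorded when the defect is fixed.
        target: str       # e.g. "code", "design", "documentation"
        defect_type: str  # e.g. "assignment", "interface", "algorithm"
        qualifier: str    # "missing", "incorrect" or "extra" code
        source: str       # e.g. "in-house", "outsourced", "reused"
        age: str          # "new", "base", "rewritten" or "refixed"

    d = OdcDefect("system test", "workload/stress", "reliability",
                  "code", "interface", "missing", "in-house", "base")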

Defect life cycles

13.2 MODELING DEFECTS

The key to a successful defect tracking system lies in properly modeling defects to capture the viewpoints of their many stakeholders, called cross-functionality groups. The cross-functionality groups in an organization are those groups that have different stakes in a product. For example, a marketing group, a customer support group, a development group, a system test group, and a product sustaining group are collectively referred to as cross-functionality groups in an organization. It is not enough to merely report a defect from the viewpoint of software development and product management and seek to understand it by means of reproduction before fixing it. In reality, a reported defect is an evolving entity that can be appropriately represented by giving it a life-cycle model in the form of a state transition diagram, as shown in Figure 13.1. The states used in Figure 13.1 are briefly explained in Table 13.1.

The state transition model allows us to represent each phase in the life cycle of a defect by a distinct state. The model represents the life cycle of a defect from its initial reporting to its final closing through the following states: new, assigned, open, resolved, wait, FAD, hold, duplicate, shelved, irreproducible, postponed, and closed. When a defect moves to a new state, certain actions are taken by the owner of the state. By "owner" of a state of a defect we mean the person or group of people who are responsible for taking the required actions in that state. Once the associated actions are taken in a state, the defect is moved to a new state.

Two key concepts involved in modeling defects are the levels of priority and severity. On one hand, a priority level is a measure of how soon the defect needs to be fixed, that is, urgency. On the other hand, a severity level is a measure of the extent of the detrimental effect of the defect on the operation of the product. Therefore, priority and severity assignments are separately done. In the following, four levels of defect priority are explained:

[Figure 13.1. State transition diagram representation of the life cycle of a defect. States: New, Assigned, Open, Resolved, Closed, Wait, Function as designed (FAD), Hold, Shelved, Duplicate, Irreproducible, Postponed.]

20
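A minimal sketch of the defect life cycle as a state machine; the state names come from Figure 13.1, but the transition set is an illustrative subset, not the book's complete diagram:

    # States from Figure 13.1 (FAD = "function as designed").
    TRANSITIONS = {
        "new":       {"assigned", "duplicate", "shelved"},
        "assigned":  {"open", "duplicate", "FAD"},
        "open":      {"resolved", "wait", "hold", "irreproducible", "postponed"},
        "resolved":  {"closed", "open"},   # reopened if the fix fails verification
        "wait":      {"open"},
        "hold":      {"open"},
        "postponed": {"open"},
        # duplicate, FAD, shelved, irreproducible, closed: terminal in this sketch
    }

    def move(state: str, new_state: str) -> str:
        """Allow only modeled transitions; everything else is rejected."""
        if new_state not in TRANSITIONS.get(state, set()):
            raise ValueError(f"illegal transition {state} -> {new_state}")
        return new_state

    state = "new"
    for nxt in ("assigned", "open", "resolved", "closed"):
        state = move(state, nxt)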


Survey of defect reporting in SW industry

Laukkanen E. I. and Mäntylä M. V., "Survey Reproduction of Defect Reporting in Industrial Software Development", in Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement (ESEM), 2011

21

Few defects found in testing -> What is the product quality?

[Figure: two possible explanations. Good test quality on a high-quality product: the product has few defects, and the tests find few ("Are we here?"). Poor test quality on a low-quality product: the product has many defects, but the tests still find few ("Or are we here?").]

22


Test metrics: defect types

… visualizing data is needed. The ODC analyst must provide regular feedback, say, on a weekly basis, to the software development team so that appropriate actions can be taken. Once the feedback is given to the software development team, they can then identify and prioritize actions to be implemented to prevent defects from recurring.

The ODC along with application of the Pareto analysis technique [9, 10] gives a good indication of the parts of the system that are error prone and, therefore, require more testing. Juran [9] stated the Pareto principle very simply as "concentrate on the vital few and not the trivial many." An alternative expression of the principle is to state that 80% of the problems can be fixed with 20% of the effort, which is generally called the 80–20 rule. This principle guides us in efficiently utilizing the effort and resources. As an example, suppose that we have data on test category and the frequency of occurrence of defects for a hypothetical Chainsaw system test project as shown in Table 13.12. The data are plotted on a Pareto diagram, shown in Figure 13.4, which is a bar graph showing the frequency of occurrence of defects with the most frequent ones appearing first. Note that the functionality and the basic groups have high concentration of defects. This information helps system test engineers to focus on the functionality and the basic category parts of the Chainsaw test project. The general guideline for applying Pareto analysis is as follows:

• Collect ODC and non-ODC data relevant to problem areas.
• Develop a Pareto diagram.
• Use the diagram to identify the vital few as issues that should be dealt with on an individual basis.
• Use the diagram to identify the trivial many as issues that should be dealt with as classes.

TABLE 13.12 Sample Test Data of Chainsaw Test Project

Category                 Number of Defect Occurrences
1. Basic                 25
2. Functionality         48
3. Robustness            16
4. Interoperability       4
5. Load and stability     6
6. Performance           12
7. Stress                 6
8. Scalability            7
9. Regression             3
10. Documentation         2

[Figure 13.4. Pareto diagram for the defect distribution shown in Table 13.12: defect frequency per category in decreasing order (categories 2, 1, 3, 6, 8, 5, 7, 4, 9, 10).]
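A minimal sketch of the Pareto analysis for the Chainsaw data in Table 13.12: sort the categories by defect count and report the cumulative share, so that the "vital few" stand out:

    # Defect counts from Table 13.12.
    defects = {
        "Basic": 25, "Functionality": 48, "Robustness": 16, "Interoperability": 4,
        "Load and stability": 6, "Performance": 12, "Stress": 6, "Scalability": 7,
        "Regression": 3, "Documentation": 2,
    }

    total = sum(defects.values())
    cumulative = 0
    for category, count in sorted(defects.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += count
        print(f"{category:20s} {count:3d}  cumulative {100 * cumulative / total:5.1f} %")
    # Functionality (48) and Basic (25) alone account for ~57% of the 129 defects.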

Remark. Vilfredo Federico Damaso Pareto was a French–Italian sociologist, economist, and philosopher. He made several important contributions, especially in the study of income distribution and in the analysis of individuals' choices. In 1906 he made the famous observation that 20% of the population owned 80% of the property in Italy, later generalized by Joseph M. Juran and others into the so-called Pareto principle (also termed the 80–20 rule) and generalized further to the concept of a Pareto distribution.

13.6 DEFECT CAUSAL ANALYSIS

The idea of defect causal analysis (DCA) in software development is effectively used to raise the quality level of products at a lower cost. Causal analysis can be traced back to the quality control literature [11] as one of the quality circle activities in the manufacturing sector. The quality circle concept is discussed in Section 1.1. The causes of manufacturing defects are analyzed by using the idea of a quality circle, which uses cause–effect diagrams and Pareto diagrams. A cause–effect diagram is also called an Ishikawa diagram or a fishbone diagram. Philip Crosby [12] described a case study of an organization called "HPA Appliance" involving the use of causal analysis to prevent defects on manufacturing lines. Causal analysis of software defects is practiced in a number of Japanese companies, usually in the context of quality circle activities [13, 14].

The idea of DCA was developed at IBM [15]. Defects are analyzed to (i) determine the cause of an error, (ii) take actions to prevent similar errors from …

23

Software Quality – Test Process Improvement

34


Five views of quality

• Transcendental – I know it when I see it
• User – personal and subjective elements
• Manufacturing – process control
• Product – quality comes from the inside
• Value – trade-off with willingness to pay

35

Process quality and product quality

• Quality in process –> quality in product

• Project: instantiated process

• Quality according to ISO 9126
  – Process quality contributes to improving product quality, which in turn contributes to improving quality in use
  – Compare to McCall's Quality Factors, Table 17.1

[Diagram: Process – Project – Product]

36


Principles

[Diagram: Test organisation – Maturity Model – Assess – Improve]

37

Process improvement models

• (Integrated) Capability maturity model (CMM, CMMI)
• Software process improvement and capability determination (SPICE)
• ISO 9001, Bootstrap
• Test maturity model (TMM)
• Test process improvement model (TPI)
• Test improvement model (TIM)
• Minimal Test Practice Framework (MTPF)

40


41

CMM

Level 1 – Initial

Level 2 – Repeatable
  Requirements management
  Software project planning
  Software project tracking and oversight
  Software subcontract management
  Software quality assurance
  Software configuration management

Level 3 – Defined
  Organization process focus
  Organization process definition
  Training programme
  Integrated software management
  Software product engineering
  Intergroup coordination
  Peer reviews

Level 4 – Managed
  Quantitative process management
  Software quality management

Level 5 – Optimizing
  Defect prevention
  Technology change management
  Process change management

43


Test Maturity Model (TMM)

• Levels
• Maturity goals and sub-goals
  – Scope, boundaries, accomplishments
  – Activities, tasks, responsibilities
• Assessment model
  – Maturity goals
  – Assessment guidelines
  – Assessment procedure

44

Level 2: Phase definition

• Institutionalize basic testing techniques and methods

• Initiate a test planning process
• Develop testing and debugging tools

45


Level 3: Integration

• Control and monitor the testing process

• Integrate testing into software life-cycle

• Establish a technical training program

• Establish a software test organization

46

Level 4: Management and Measurement

• Software quality evaluation
• Establish a test management program
• Establish an organization-wide review program

47


Level 5: Optimizing, Defect Prevention, and Quality Control

• Test process optimization
• Quality control
• Application of process data for defect prevention

48

Minimal Test Practice Framework

Categories: Problem & Experience Reporting, Roles & Organisation, Verification & Validation, Test Administration, Test Planning

Phase 1 (≈10 people): Define Syntax, Define Responsibility, Use Checklists, Basic Administration of Test Environment, Test Plan

Phase 2 (≈20 people): Create System, Define Roles, Perform Walkthroughs, Test Cases

Phase 3 (30+ people): Maintain & Evolve System, Define Teams, Perform Inspections, Risk Management, Coordinate Software Quality Assurance

[Karlström et al 2005]

50


Can the organization be too mature?

51

The castle (CMM) and the tiger (Agile)

The castle (CMM):
• U.S. Department of Defense
• Scientific management
• Statistical process control
• Management
• Control
• Large team & low skill

The tiger (Agile):
• Leading industry consultants
• Team creates own process
• Working software
• Software craftsmanship
• Productivity
• Small team & high skill

53


Plan-driven vs. Agile (Boehm & Turner, 2003, IEEE Computer, 36(6), pp 64-69)

54

Software quality assurance comparison: castle vs. tiger

Organisation – castle: independent QA team; tiger: integrated into the project team
Ensuring – castle: compliance to documented processes; tiger: applicability and improvement of the current processes and practices
Evaluation criteria – castle: against predefined criteria; tiger: identifying issues and problems
Focus – castle: documents & processes & control; tiger: productivity & quality & customer
Communication – castle: formal, reporting to management; tiger: informal, supporting the team

55


General advice

• Identify the real problems before starting an improvement program

• “What the customer wants is not always what it needs”

• Implement "easy changes" first
• Involve people
• Changes take time!

56

Summary of Testing – a multitude

[Diagram: the dimensions of testing.
Level of detail: unit, integration, system, acceptance.
Accessibility: white box, black box.
Characteristics: functionality, reliability, usability, efficiency, maintainability, portability.
Plus: Tools, Organization, Maturity, Domain.]

57


Remaining parts of the course

• Deadline for project report: Fri Dec 13
• Guest lectures from Axis and System Verification: Dec 16
• Project presentations
  – Dec 20, 9:15-12:00, E:3336
  – 10 min presentation + 5 min discussion by peer group
  – Follow the report structure + highlight the interesting parts
  – http://cs.lth.se/etsn20/project/project-conference/
• Labs
  – Any follow-up hand-ins: Jan 8, 12:00
• Exam
  – Wednesday Jan 15, 14:00-19:00, MA 9E, 9F

61

Recommended exercises

• Chapter 13 – 1, 3, 6, 10, 12
• Chapter 17 – 1, 2, 3, 6
• Chapter 18 – 4, 5, 9

62