L7 2019: Metrics & TMM — fileadmin.cs.lth.se/cs/education/etsn20/lectures/l7_metrics_tmm.pdf
TRANSCRIPT
Lund University / Faculty of Engineering/ Department of Computer Science / Software Engineering Research Group
Software Testing ETSN20 — http://cs.lth.se/etsn20
Chapters 13, 17, 18 — Management: Measure and improve
Prof. Per Runeson
Lecture
• Chapter 13 – Metrics in test execution
• Chapter 17 – Software Quality
• Chapter 18 – Process improvement and assessment
Purpose of metrics
• Project monitoring – check the status
• Project controlling – corrective actions
• Plan new projects
• Measure and analyze results
  – The profit of testing
  – The cost of testing
  – The quality of testing
  – The quality of the product
  – Basis of improvement, not only for the test process
Selecting the right metrics
• What is the purpose of the collected data?
  – What kinds of questions can they answer?
  – Who will use the data?
  – How is the data used?
• When and by whom is the data needed?
  – Which forms and tools are used to collect the data?
  – Who will collect the data?
  – Who will analyse the data?
  – Who has access to the data?
Example
Team              Alice   Bob
# test cases run    235    15
# defects found      12    12
# hours spent        86    56
Which team is best?
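The raw counts cannot answer this on their own; normalizing them into rates makes the trade-off visible. A minimal sketch using the numbers from the table above:

```python
# Raw counts from the table above.
teams = {
    "Alice": {"tests_run": 235, "defects": 12, "hours": 86},
    "Bob":   {"tests_run": 15,  "defects": 12, "hours": 56},
}

# Derive comparable rates: tests per hour, defects per hour, defects per test.
for name, t in teams.items():
    print(f"{name}: {t['tests_run'] / t['hours']:.2f} tests/h, "
          f"{t['defects'] / t['hours']:.3f} defects/h, "
          f"{t['defects'] / t['tests_run']:.3f} defects/test")
```

Bob finds defects at a higher rate per hour, while Alice executes far more tests per hour; which team is "best" depends on what the metric is supposed to answer.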
Goal-Question-Metric Paradigm (GQM)
• Goals
  – What is the organization trying to achieve?
  – The objective of process improvement is to satisfy these goals
• Questions
  – Questions about areas of uncertainty related to the goals
  – You need process knowledge to derive the questions
• Metrics
  – Measurements to be collected to answer the questions
[van Solingen, Berghout, The Goal/Question/Metric Method, McGraw-Hill, 1999]
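The goal–question–metric hierarchy can be written down as plain data, which keeps every collected metric traceable to a question and a goal. A sketch with hypothetical wording for the goal and questions:

```python
# A GQM tree as nested data: goal -> questions -> metrics (wording is illustrative).
gqm = {
    "goal": "Analyze the test process to evaluate its efficiency and effectiveness",
    "questions": {
        "How efficient is testing?": ["staff hours spent", "# defects found"],
        "How effective is testing?": ["% of defects found before release"],
    },
}

# Flatten: these are the only metrics worth collecting under this goal.
metrics = [m for ms in gqm["questions"].values() for m in ms]
print(metrics)
```

A metric that cannot be placed in such a tree answers no question and serves no goal, so it should not be collected.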
Goal-Question-Metric
[Figure: GQM tree for the goal "analyze". Questions: How efficient? (metrics: time, # defects) How effective? (metric: % defects)]
Measurement basics
Basic data:
• Time
  – calendar time and
  – staff hours
• Defects (fault/failure)
• Size
Basic rules:
• Feedback to origin
• Use data or don’t measure
• The act of measurement will affect the metrics you get
  – Consequences of using LOC to measure programmer productivity?
Test metrics: Coverage
What?
• % statements covered
• % branches covered
• % data flow
• % requirements
• % equivalence classes
Why?
• Track completeness of test
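Every coverage measure on the slide is the same ratio applied to a different unit. A minimal sketch (the counts are made up):

```python
def coverage(covered: int, total: int) -> float:
    """Percentage of items (statements, branches, requirements, ...) exercised."""
    return 100.0 * covered / total if total else 100.0

# Hypothetical counts from one test run:
print(coverage(180, 200))  # statement coverage: 90.0
print(coverage(45, 60))    # branch coverage: 75.0
```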
Test metrics: Development status
• Test case development status
  – Planned
  – Available
  – Unplanned (not planned for, but needed)
• Test harness (= automated test framework) development status
  – Planned
  – Available
  – Unplanned
[Figure 4. Checkstyle growth history view: cumulative growth (0–100%) of Production Code, Test Code, Production Classes, Test Classes and Test Commands across releases v1.1–v4.3, with annotation points (1)–(5).]
…considered an attention point that is monitored carefully.
Two other observations stand out. First, release 2.2 has an interesting phenomenon: a sudden sharp decline for class, method and statement coverage, with a mild drop of block coverage. Secondly, there is a decline in coverage (at all levels) between release 2.4 and 3.0. The version numbers suggest that the system has undergone major changes.
4.2 Internal evaluation
To evaluate these observations, we first contrasted them with log messages at key points.
“Up until #280 there is a single unit test”. The single test with file ID 20 is called CheckerTest. Inspection of this file pointed out that this actually was not a typical unit test, but rather a system test [3]. CheckerTest receives a number of input files and checks the output of the tool against the expected output.
“Testing has been neglected before the release 2.2”. Inspection reveals that this coverage drop is due to the introduction of a large number (39) of anonymous classes that are not tested. These anonymous classes are relatively simple and only introduce a limited number of blocks per class, and therefore their introduction has a limited effect on the block coverage level. Class coverage, however, is more affected because the number of classes (29) has more than doubled with the 39 additional anonymous classes. In-depth inspection taught us that the methods called by the anonymous classes are tested separately. In the next version, all coverage levels increase because of the removal of most of the anonymous classes. The drop is thus due to irregularities in the coverage measurement, falsifying the statement.
[Figure 5. Checkstyle Test Quality Evolution]
“There is a period of pure testing right after release 2.2 and before 3.0”. We sought for evidence that tests are neglected during this period, but instead we encountered logs for 2.2 such as Added [6 tests] to improve code coverage (#285), updating/improving the coverage of tests (#286 and #308) and even Added test that gets 100% code coverage (#309). The assumption of a test reinforcement period before 3.0 is backed up by several messages between #700 and #725 mentioning improving test coverage and adding or updating tests.
“From version v2.2 until beyond v2.4, synchronous co-evolution happens”. To counter this, we looked for signs that pure development was happening, e.g., by new features being added. Investigation of the log messages around that time however showed that it concerns a period of bug fixing or patching (#354, #356, #357, #369, #370, #371, #415) and refactoring (#373, #374, #379, #397, #398, #412). Moreover, during this period production classes and test cases were committed together.
“Halfway between release 3.1 and 3.2 is a period of pure development”. For this period, we could not find back the habit of committing corresponding test cases alongside production classes. Rather, a couple of large commits consisting of batches of production files occur, with log messages reporting the addition of certain functionality (#1410-#1420). Shortly after that, developers mention the addition of new tests (#143x and #1457).
“Between 3.4 and 3.5 testing happens more phased (ann. 4, Figure 4), followed by more synchronicity again”. We could not really confirm this behavior nor distinguish both phases by means of the log messages, as we deduce that this period concerns mainly fixes of bugs, code style, spelling, build system and documentation.
“Around #670 and #780, developers were performing phased testing.” The message of #687 mentions “Upgrading to JUnit 3.8.1”, which makes us conclude that it concerns shallow changes. The same accounts for the period around #780: test cases are (i) modified to use a new test helper function; and (ii) rearranged across packages.
4.3 External evaluation
Two Checkstyle developers completed the survey we sent, sharing their opinions about our observations. As an answer to questions about the system’s evolution and test process, they indicate that automated tests have always
Example: Development status
[Zaidman et al ICST 2008]
(a) Synchronous (b) Time Delay (c) Test Backlog
Figure 2. Example patterns of synchronous co-evolution.
Number of test classes (tClasses) and Number of test commands¹ (tCommands).
• Metrics are presented as a cumulative percentage chart up to the last considered version (which is depicted at 100%), as we are particularly interested in the co-evolution and not so much in the absolute growth.
• The X-axis is annotated with release points.
Interpretation. First of all, we can observe phases of relatively weaker or stronger growth throughout a system’s history. Typically, in iterative software development new functionality is added during a certain period after a major release, after which a “feature freeze” prevents new functionality from being added. At that point, bugs get fixed, testing effort is increased and documentation written.
Secondly, the view allows us to study growth co-evolution. We observe (lack of) synchronization by studying how the measurements do or do not evolve together in a similar direction. The effort of writing production and test code is spent synchronously when the two curves are similar in shape (see Figure 2(a)). A horizontal translation indicates a time delay between one activity and a related one (2(b)), whereas a vertical translation signifies that a historical testing or development backlog has been accumulated over time (2(c)). Such a situation occurs, e.g., when the test-writing effort is lagging the production code writing effort for many subsequent releases. In the last version considered in the view, both activities reach the 100% mark, which is explained through the fact that we are measuring relative efforts for both activities.
Thirdly, the interaction between measurements yields valuable information as well. In Table 1 a number of these interactions are outlined. For example, the first line in Table 1 states that an increase in production code and a constant level of test code (with the other metrics being unspecified) points towards a “pure development” phase.
Technicalities. To separate production classes from test classes we use regular expressions to detect JUnit test case classes. As a first check, we look whether the class extends junit.framework.TestCase. If this fails, e.g., because of an indirect generic test case [21], we search for a combination of org.junit.* imports and setUp() methods.
Counting the number of test commands was done on the basis of naming conventions. More specifically, when we found a class to be a test case, we looked for methods that
¹A test command is a container for a single test [21].
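The detection heuristic described above (first check for extends junit.framework.TestCase, then fall back to org.junit imports combined with a setUp() method, and count methods named test*) can be sketched with regular expressions. The patterns below illustrate the idea and are not the authors' actual code:

```python
import re

# Heuristics from the paper, as illustrative regular expressions.
EXTENDS_TESTCASE = re.compile(r"extends\s+(junit\.framework\.)?TestCase\b")
JUNIT_IMPORT = re.compile(r"import\s+org\.junit\.")
SETUP_METHOD = re.compile(r"\bvoid\s+setUp\s*\(")

def is_junit_test_class(source: str) -> bool:
    """Classify a Java source file as a JUnit test case class."""
    if EXTENDS_TESTCASE.search(source):
        return True
    # Fallback for indirect/generic test cases.
    return bool(JUNIT_IMPORT.search(source) and SETUP_METHOD.search(source))

def count_test_commands(source: str) -> int:
    """JUnit 3 naming convention: a test command is a void method named test*."""
    return len(re.findall(r"\bvoid\s+test\w*\s*\(", source))
```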
Table 1. Co-evolution scenarios (arrow glyphs reconstructed from a garbled extraction; ↑ = increase, → = constant, ↓ = decrease; columns: pLOC, tLOC, pClasses, tClasses, tCommands):
  pLOC ↑, tLOC → : pure development
  pLOC →, tLOC ↑ : pure testing
  pLOC ↑, tLOC ↑ : co-evolution
  tLOC ↑, tClasses →, tCommands → : test refinement
  tLOC →, tClasses ↑, tCommands ↑ : skeleton co-evolution
  tClasses ↑ : test case skeletons
  tCommands ↑ : test command skeletons
  tLOC ↓ : test refactoring
would start with test. We are aware that with the introduction of JUnit 4.0, this naming convention is no longer necessary, but the projects we considered still adhere to them.
Trade-off. Both the change history and growth history view are deduced from quantitative data on the development process. To contrast this with the resulting quality of the tests, we introduce a view incorporating test coverage.
2.3 Test Quality Evolution View
Goal. Test coverage is often seen as an indicator of “test quality” [24]. To judge the long-term “test health” of a software project, we draw the test coverage of the subject system as a function of the fraction of test code tLOCRatio (tLOCRatio = tLOC / (tLOC + pLOC)) and as a function of time.
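With the parentheses restored, tLOCRatio is a one-line computation; the counts below are illustrative:

```python
def tloc_ratio(tloc: int, ploc: int) -> float:
    """Fraction of the code base that is test code: tLOC / (tLOC + pLOC)."""
    return tloc / (tloc + ploc)

# E.g. 5,000 test LOC next to 20,000 production LOC:
print(tloc_ratio(5_000, 20_000))  # 0.2
```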
Description. In this view:
• We use an XY-chart representing tLOCRatio on the X-axis and the overall test coverage percentage on the Y-axis. Individual dots represent releases over time.
• We plot four coverage measures (distinguished by the color of the dots): class, method, statement and block² coverage.
Interpretation. Constant or growing levels of coverage over time indicate good testing health, as such a trend indicates that the testing process is under control. The fraction of test code, however, is expected to remain constant or increase slowly alongside coverage increases. Severe fluctuations or downward spirals in either measure imply weaker test health.
Technicalities. For now we only compute the test coverage for the major and minor releases of a software system. We do not compute coverage for every commit as: (i) we are specifically interested in long-term trends in contrast to fluctuations between releases due to the development process; (ii) computing test coverage (for a single release) is time-consuming; and (iii) automating this step for all releases proved difficult, due to changing build systems and
²A basic block is a sequence of bytecode instructions without any jumps or jump targets; also see http://emma.sourceforge.net/faq.html (accessed April 13, 2007)
Test metrics: Test execution status
What?
• # defects
• # executed tests
• Requirements coverage
Why?
• Track progress of test project
• Decide stopping criteria
[D. J. Anderson, Making the Business Case for Agile Management – Simplifying the Complex System of Software Engineering]
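Progress against a stopping criterion can be computed mechanically once the counts are collected. A sketch with a made-up criterion:

```python
def execution_status(executed: int, total: int, defects_open: int):
    """Return (% of planned tests executed, stopping criterion met?)."""
    progress = 100.0 * executed / total
    # Hypothetical stopping criterion: every planned test run, no open defects.
    return progress, (executed == total and defects_open == 0)

progress, stop = execution_status(executed=180, total=200, defects_open=4)
print(f"{progress:.0f}% executed, stop: {stop}")  # 90% executed, stop: False
```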
First published at 2004 Motorola S3 Symposium July 12-15, 2004
cannot be used in isolation as an introduction rate of 30 Features per day is not sustainable. It must be balanced with the WIP Inventory Control Chart [Figure 10] and the production rate plotted on Figure 2 as a trend line.
[Figure 9. Project Scope Control Chart: total scope in Features (150–230) against time (10-Feb to 30-Mar), with Upper limit, Total Scope and Lower Limit lines.]
Figure 9 charts the buffer usage by project dark matter. At the beginning of the project there are 185 Features in scope. If the Feature count were to fall below 170 then the manager may decide that it is safe to add some Features from the backlog which did not make the prioritization cut during the planning stage. Additional Features above 185 represent buffer usage and above 220 the project buffer is consumed in full. This chart cannot be used in isolation to calculate buffer usage; the production rate as plotted on Figure 2 as a trend line must also be used to validate the anticipated delivery date.
The WIP Inventory Control Chart [Figure 10] shows us how much work is in progress. Again, the calculation of the control limits on the chart is outwith the scope of this paper. However, the upper control limit indicates whether the aggregate complexity is dangerously high
and potentially out of control as well as indicating whether or not the desired lead time of less than two weeks can be maintained. The lower control limit indicates that there is insufficient work-in-progress to maintain the desired production rate. The process is either stalling or starving from a lack of upstream material. A lagging trace of lead time can be maintained [Figure 11] and the data should remain within the control limits of 2 and 14 days providing the WIP Inventory was not permitted to become too large or too small [Figure 10].
[Figure 11. Lead Time Control Chart: lead time in calendar days (0–16) against Features complete (10–190), with Lead Time, Lower Limit and Upper Limit lines.]
Monthly Operations Review
Data from CFDs can be used to identify the productivity rates of different elements in the whole process of software engineering. This data can be presented at an Operations Review. CFDs are intended for day-to-day tactical control use. Operations Review is about learning and organizational feedback. Learning provides input for strategic and operational investment decisions. It provides the material for good governance. Figure 12 shows a process diagram labeled with element capacities per month clearly identifying the current system constraint as System Test. According to the Theory of Constraints, maximum throughput (or productivity) is achieved by exploiting a constrained resource to the full and subordinating everything else in the system to the exploitation decision made. Continuous improvement is achieved by elevating the constraint thus removing it as the system capacity constrained resource (CCR). The effect is to move the constraint elsewhere whilst the system goes on to achieve a higher throughput.
[Figure 10. WIP Inventory Control Chart: work in progress in Features (0–100) against time (10-Feb to 30-Mar), with Upper limit, WIP Inventory and Lower Limit lines.]
The exploit decision made for this example is: all system testers have been relieved of all non-value-adding tasks. The subordination decisions made are: extra project managers have been assigned to complete those non-value-adding administrative tasks; system analysts have been assigned to write test plans for a future release, freeing up testers to work purely on
Copyright David J. Anderson 2004 Page 8 of 13
Test metrics: Defects (Trouble reports)
What?
• # defects/size
• repair time
• lead time
  – Lead time: 1 week
  – Repair time: 1 hour
• root cause
Why?
• Monitor quality
• Monitor efficiency
• Improve
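Lead time (calendar time from report to closure) and repair time (effort spent on the actual fix) are different quantities, which is why a one-hour repair can still show a one-week lead time. A sketch of the lead-time computation:

```python
from datetime import datetime

def lead_time_days(reported: str, closed: str) -> float:
    """Calendar days from defect report to closure (timestamps as strings)."""
    fmt = "%Y-%m-%d %H:%M"
    delta = datetime.strptime(closed, fmt) - datetime.strptime(reported, fmt)
    return delta.total_seconds() / 86400

# Reported Monday morning, closed the following Monday: one week of lead time,
# even if the repair itself took only one hour.
print(lead_time_days("2019-11-04 09:00", "2019-11-11 09:00"))  # 7.0
```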
Research Section C. Andersson and P. Runeson
identifies which status the defect had at the extraction date. Thereby task reports rejected, cancelled or not repeatable were excluded from the analysis.
The severity of the defects is not included as an attribute for classifying the defects in the analysis, i.e. all reported failures are treated equally, regardless of criticality. This approach is chosen since discussions indicated that all defects were important for planning of the proper effort to correct them. Thus, the analysis of each defect is performed regardless of its severity. In later stages of the analysis, a new perspective of the data in the form of defect severity could be of interest, although this is not yet implemented.
5. DATA ANALYSIS AND RESULTS
The data were presented to the representatives of the company as part of the feedback inherent in each cycle of the case study process. Its purpose was to motivate the company representatives to continue the analysis of the data within the organization, in terms of root cause analysis. This would in turn support them in identifying software process improvement possibilities. During these feedback meetings, different views of the data were examined to find the most valuable data formats for this organization. The data were presented both in the shape of graphs for the entire projects as well as separately for each feature group. Some of the views are presented below. Cycles 2 and 3 in the case study explored projects 1 and 2 in terms of defect distributions over time for all the projects and separated for the feature groups (Table 1). In the fourth cycle of the case study, which was confirmatory, project 3 was examined by the same means. Thus, the presentation in the following text includes all three projects, even though the analysis was conducted during different case study cycles.
The data are primarily given in three dimensions, based on time (date found for the defect) and test activity (based on which activity the reporter belonged to); both dimensions are derived from case study cycles 2 and 4, while feature group (which functionality was affected by the defect) is derived from case study cycles 3 and 4. The milestones, and other scheduled events that were common for the projects, were used as reference points of time when exploring the data for similarities.
5.1. Defect Detection over Time
5.1.1. Entire Project
The first exploratory case study cycle (cycle 2 in Table 1) resulted in a presentation format that was used for describing the current state of the projects (Figure 6) in terms of number of defects detected over time. To give an overview of the distributions of total defect detection, we show the number of defects detected per month over time. The distributions reflect defects reported from project start to project end, reported by FT, ST, CAT and Misc. The three projects were compared at specific dates. The ship dates are indicated in the figure, but other milestones following the technical development process that are comparable in the projects were also used.
As can be seen from Figure 6, the shape of the time distributions differs between the projects. The appearance of the defect distributions illustrated in Figure 6, and more specifically distributions for each test activity over time, were discussed and explained by the project staff during the feedback meetings. The response from the project participants was information about specific events that explain why a certain maximum is reached at a certain point, when the test activities are more intensive. An example of an event is when function testers run a regression test period. An unexpected minimum could be due to Christmas, when most employees take some days off. Hence,
[Figure 6. Defect distributions, showing number of defects detected over time, for the three studied projects (Project 1, Project 2, Project 3). Ship dates are indicated with a vertical bar.]
Copyright © 2006 John Wiley & Sons, Ltd. Softw. Process Improve. Pract., 2007; 12: 125–140. DOI: 10.1002/spip
Orthogonal Defect Classification – ODC
At discovery:
• Activity
• Trigger
• Impact
At fixing time:
• Target
• Defect type
• Qualifier (missing, incorrect or extra code)
• Source
• Age (new, base, rewritten, refixed)
Defect categories:
Base, Functionality, Robustness, Interoperability, Load and stability, Performance, Stress, Scalability, Regression, Documentation
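An ODC record is a fixed set of attributes, half filled in at discovery and half at fixing time; a dataclass makes that split explicit (the sample values are invented):

```python
from dataclasses import dataclass

@dataclass
class ODCRecord:
    # Filled in at discovery:
    activity: str     # what was being done when the defect surfaced
    trigger: str      # what made it surface
    impact: str       # effect on the customer
    # Filled in at fixing time:
    target: str
    defect_type: str
    qualifier: str    # "missing", "incorrect" or "extra" code
    source: str
    age: str          # "new", "base", "rewritten" or "refixed"

d = ODCRecord("system test", "workload/stress", "reliability",
              "code", "algorithm", "incorrect", "in-house", "base")
print(d.qualifier, d.age)
```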
Defect life cycles
13.2 MODELING DEFECTS
The key to a successful defect tracking system lies in properly modeling defects to capture the viewpoints of their many stakeholders, called cross-functionality groups. The cross-functionality groups in an organization are those groups that have different stakes in a product. For example, a marketing group, a customer support group, a development group, a system test group, and a product sustaining group are collectively referred to as cross-functionality groups in an organization. It is not enough to merely report a defect from the viewpoint of software development and product management and seek to understand it by means of reproduction before fixing it. In reality, a reported defect is an evolving entity that can be appropriately represented by giving it a life-cycle model in the form of a state transition diagram, as shown in Figure 13.1. The states used in Figure 13.1 are briefly explained in Table 13.1.
The state transition model allows us to represent each phase in the life cycle of a defect by a distinct state. The model represents the life cycle of a defect from its initial reporting to its final closing through the following states: new, assigned, open, resolved, wait, FAD, hold, duplicate, shelved, irreproducible, postponed, and closed. When a defect moves to a new state, certain actions are taken by the owner of the state. By “owner” of a state of a defect we mean the person or group of people who are responsible for taking the required actions in that state. Once the associated actions are taken in a state, the defect is moved to a new state.
Two key concepts involved in modeling defects are the levels of priority and severity. On one hand, a priority level is a measure of how soon the defect needs to be fixed, that is, urgency. On the other hand, a severity level is a measure of the extent of the detrimental effect of the defect on the operation of the product. Therefore, priority and severity assignments are done separately. In the following, four levels of defect priority are explained:
[Figure 13.1 State transition diagram representation of the life cycle of a defect. States: New, Assigned, Open, Resolved, Closed, Wait, Function as designed, Hold, Shelved, Duplicate, Irreproducible, Postponed.]
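A defect tracking tool enforces such a life cycle with a transition table; the transitions below are a plausible subset of Figure 13.1, not the book's complete diagram:

```python
# Allowed state transitions (illustrative subset of Figure 13.1).
TRANSITIONS = {
    "new":       {"assigned", "duplicate", "shelved"},
    "assigned":  {"open"},
    "open":      {"resolved", "wait", "fad", "hold", "irreproducible", "postponed"},
    "resolved":  {"closed", "open"},   # reopened if the fix fails verification
    "wait":      {"open"},
    "hold":      {"open"},
    "postponed": {"open"},
}

def can_move(state: str, new_state: str) -> bool:
    """True if the tracking system should accept the state change."""
    return new_state in TRANSITIONS.get(state, set())

print(can_move("new", "assigned"))  # True
print(can_move("new", "closed"))    # False: a defect is not closed unverified
```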
Survey of defect reporting in SW industry
Laukkanen E. I. and Mäntylä M. V., "Survey Reproduction of Defect Reporting in Industrial Software Development", in Proceedings of the 5th International Symposium on Empirical Software Engineering and Measurement (ESEM), 2011
Few defects found in testing -> What is the product quality?
[Matrix: test quality × product quality. High test quality + poor product → many defects found; every other combination → few defects found. "Are we here? Or are we here?" — few defects alone cannot distinguish a good product from a weak test.]
Test metrics: defect types
430 CHAPTER 13 SYSTEM TEST EXECUTION
visualizing data is needed. The ODC analyst must provide regular feedback, say, on a weekly basis, to the software development team so that appropriate actions can be taken. Once the feedback is given to the software development team, they can then identify and prioritize actions to be implemented to prevent defects from recurring.
The ODC along with application of the Pareto analysis technique [9, 10] gives a good indication of the parts of the system that are error prone and, therefore, require more testing. Juran [9] stated the Pareto principle very simply as “concentrate on the vital few and not the trivial many.” An alternative expression of the principle is to state that 80% of the problems can be fixed with 20% of the effort, which is generally called the 80–20 rule. This principle guides us in efficiently utilizing the effort and resources. As an example, suppose that we have data on test category and the frequency of occurrence of defects for a hypothetical Chainsaw system test project as shown in Table 13.12. The data are plotted on a Pareto diagram, shown in Figure 13.4, which is a bar graph showing the frequency of occurrence of defects with the most frequent ones appearing first. Note that the functionality and the basic groups have a high concentration of defects. This information helps system test engineers to focus on the functionality and the basic category parts of the Chainsaw test project. The general guideline for applying Pareto analysis is as follows:
• Collect ODC and non-ODC data relevant to problem areas.
• Develop a Pareto diagram.
• Use the diagram to identify the vital few as issues that should be dealt with on an individual basis.
• Use the diagram to identify the trivial many as issues that should be dealt with as classes.
TABLE 13.12 Sample Test Data of Chainsaw Test Project

  Category               Number of Defect Occurrences
  1. Basic                25
  2. Functionality        48
  3. Robustness           16
  4. Interoperability      4
  5. Load and stability    6
  6. Performance          12
  7. Stress                6
  8. Scalability           7
  9. Regression            3
  10. Documentation        2
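The 80–20 guideline can be applied directly to the Chainsaw data in Table 13.12: sort the categories by frequency and accumulate.

```python
# Defect counts from Table 13.12.
defects = {
    "Basic": 25, "Functionality": 48, "Robustness": 16, "Interoperability": 4,
    "Load and stability": 6, "Performance": 12, "Stress": 6, "Scalability": 7,
    "Regression": 3, "Documentation": 2,
}

total = sum(defects.values())  # 129
ranked = sorted(defects.items(), key=lambda kv: kv[1], reverse=True)

cumulative = 0
for category, count in ranked:
    cumulative += count
    print(f"{category:20s} {count:3d}  cum. {100 * cumulative / total:5.1f}%")
```

The top two categories (Functionality and Basic) account for 73 of 129 defects, about 57%: the "vital few" to handle individually.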
[Figure 13.4 Pareto diagram for defect distribution shown in Table 13.12: defect frequency (0–60) per category, bars ordered 2, 1, 3, 6, 8, 5, 7, 4, 9, 10.]
Remark. Vilfredo Federico Damaso Pareto was a French–Italian sociologist, economist, and philosopher. He made several important contributions, especially in the study of income distribution and in the analysis of individuals’ choices. In 1906 he made the famous observation that 20% of the population owned 80% of the property in Italy, later generalized by Joseph M. Juran and others into the so-called Pareto principle (also termed the 80–20 rule) and generalized further to the concept of a Pareto distribution.
13.6 DEFECT CAUSAL ANALYSIS
The idea of defect causal analysis (DCA) in software development is effectively used to raise the quality level of products at a lower cost. Causal analysis can be traced back to the quality control literature [11] as one of the quality circle activities in the manufacturing sector. The quality circle concept is discussed in Section 1.1. The causes of manufacturing defects are analyzed by using the idea of a quality circle, which uses cause–effect diagrams and Pareto diagrams. A cause–effect diagram is also called an Ishikawa diagram or a fishbone diagram. Philip Crosby [12] described a case study of an organization called “HPA Appliance” involving the use of causal analysis to prevent defects on manufacturing lines. Causal analysis of software defects is practiced in a number of Japanese companies, usually in the context of quality circle activities [13, 14].
The idea of DCA was developed at IBM [15]. Defects are analyzed to (i) determine the cause of an error, (ii) take actions to prevent similar errors from
Software Quality – Test Process Improvement
Five views of quality
• Transcendental – I know it when I see it
• User – personal and subjective elements
• Manufacturing – process control
• Product – quality comes from the inside
• Value – trade-off with willingness to pay
Process quality and product quality
• Quality in process –> quality in product
• Project: instantiated process
• Quality according to ISO 9126
  – Process quality contributes to improving product quality, which in turn contributes to improving quality in use
  – Compare to McCall’s Quality Factors, table 17.1
[Figure: Process → Project → Product]
Principles
Test organisation
Maturity Model
Assess
Improve
Process improvement models
• (Integrated) Capability maturity model (CMM, CMMI)
• Software process improvement and capability determination (SPICE)
• ISO 9001, Bootstrap
• Test maturity model (TMM)
• Test process improvement model (TPI)
• Test improvement model (TIM)
• Minimal Test Practice Framework (MTPF)
CMM levels and key process areas:
• Optimizing: Process change management; Technology change management; Defect prevention
• Managed: Software quality management; Quantitative process management
• Defined: Peer reviews; Intergroup coordination; Software product engineering; Integrated software management; Training programme; Organization process definition; Organization process focus
• Repeatable: Software configuration management; Software quality assurance; Software subcontract management; Software project tracking and oversight; Software project planning; Requirements management
• Initial: (no key process areas)
Test Maturity Model (TMM)
• Levels
• Maturity goals and sub-goals
  – Scope, boundaries, accomplishments
  – Activities, tasks, responsibilities
• Assessment model
  – Maturity goals
  – Assessment guidelines
  – Assessment procedure
Level 2: Phase definition
• Institutionalize basic testing techniques and methods
• Initiate a test planning process
• Develop testing and debugging tools
Level 3: Integration
• Control and monitor the testing process
• Integrate testing into software life-cycle
• Establish a technical training program
• Establish a software test organization
Level 4: Management and Measurement
• Software quality evaluation
• Establish a test management program
• Establish an organization-wide review program
Level 5: Optimizing, Defect Prevention, and Quality Control
• Test process optimization
• Quality control
• Application of process data for defect prevention
Minimal Test Practice Framework
[Table, reconstructed from a garbled extraction: practices introduced per phase, for organizations of growing size (people).
Categories: Problem & Experience Reporting; Roles & Organisation; Verification & Validation; Test Administration; Test Planning.
Phase 1 (≈10): Define Syntax; Define Responsibility; Use Checklists; Basic Administration of Test Environment; Test Plan.
Phase 2 (≈20): Create System; Define Roles; Perform Walkthroughs; Test Cases.
Phase 3 (30+): Maintain & Evolve System; Define Teams; Perform Inspections; Risk Management; Coordinate Software Quality Assurance.]
[Karlström et al 2005]
Can the organization be too mature?
The castle (CMM) and the tiger (Agile)
CMM (the castle):
• U.S. Department of Defense
• Scientific management
• Statistical process control
• Management
• Control
• Large team & low skill
Agile (the tiger):
• Leading industry consultants
• Team creates own process
• Working software
• Software craftsmanship
• Productivity
• Small team & high skill
Plan-driven vs. Agile (Boehm & Turner, 2003, IEEE Computer, 36(6), pp 64-69)
Software quality assurance comparison: castle vs. tiger
• Organisation: independent QA team (castle) vs. integrated into the project team (tiger)
• Ensuring: compliance to documented processes vs. applicability and improvement of the current processes and practices
• Evaluation criteria: against predefined criteria vs. identifying issues and problems
• Focus: documents & processes & control vs. productivity & quality & customer
• Communication: formal, reporting to management vs. informal, supporting the team
General advice
• Identify the real problems before starting an improvement program
• “What the customer wants is not always what it needs”
• Implement “easy changes” first
• Involve people
• Changes take time!
Summary of Testing – a multitude
• Level of detail: unit, integration, system, acceptance
• Accessibility: white box, black box
• Characteristics: functionality, reliability, usability, efficiency, maintainability, portability
• Also: Tools, Organization, Maturity, Domain
Remaining parts of the course
• Deadline for project report, Fri Dec 13
• Guest lectures from Axis and SystemVerification, Dec 16
• Project presentations
  – Dec 20, 9:15-12:00, E:3336
  – 10 min presentation + 5 min discussion by peer group
  – Follow report structure + highlight the interesting
  – http://cs.lth.se/etsn20/project/project-conference/
• Labs
  – Any follow-up hand-ins: Jan 8, 12:00
• Exam
  – Wednesday Jan 15, 14:00-19:00, MA 9E, 9F
Recommended exercises
• Chapter 13 – 1, 3, 6, 10, 12
• Chapter 17 – 1, 2, 3, 6
• Chapter 18 – 4, 5, 9