1 quiz t/f: tqm is a clearly defined quality management process standard. define the following:...

1

Quiz

• T/F: TQM is a clearly defined quality management process standard.

• Define the following:

– Defect Rate

– FPA

– Ratio Scale

– OO

– Ordinal Scale

• List at least 5 quality parameters/attributes used to measure software quality (from the customer perspective)

• Why is LOC a “poor” measure of code size?

2

Project Sample: OS X

• Project Replaced Carbon– and NeXT and Yellow Box and...

• Developers

• Customers

• The Media

• iCEO

10

Software Quality Engineering CS410

Class 3a

Measurement Theory

11

Measurement Theory

• “It is an undisputed statement that measurement is crucial to the progress of all sciences” (Kan 1995)

• “Scientific progress is made through observations and generalizations based on data and measurements, the derivation of theories as a result, and in turn the confirmation or refutation of theories via hypothesis testing…” (Kan 1995)

12

Measurement Theory

• Basic measurement theory steps

– Proposition

• an idea is proposed

– Definition

• components of the idea are defined

– Operational Definition

• operational characteristics of components are identified

– Metric definition

• metrics are identified based on the operational definition

13

Measurement Theory

– Hypothesis definitions

• hypotheses are drawn from a combination of the proposition and definitions

– Testing and metric gathering

• testing is performed and empirical data is collected

– Confirmation or refutation of hypothesis

• hypotheses are confirmed or refuted based on analysis of empirical data

14

Measurement Theory

• Example:

– Proposition - “the more rigorously the front end of the software development process is executed, the better the quality at the back end”

– Definitions

• Front end SW process = design through unit test

• Back end SW process = integration through system test

• Rigorous implementation = total adherence to process (assume the process designates 100% design and code inspections)

15

Measurement Theory

– Operational Definitions

• Rigorous implementation can be measured by the amount of design inspection and the number of lines of code (LOC) inspected

• Back end quality means a low number of defects found in system test

– Metric Definitions

• Design inspection coverage can be expressed as the percentage of designs inspected

• LOC inspection coverage can be expressed as the percentage of LOC inspected

• Back end quality can be expressed as defects per thousand lines of code (KLOC)

16

Measurement Theory

– Hypothesis definition(s)

• The higher the percentage of designs and code inspected, the lower the defect rate at system test.

– Testing and metric gathering (multiple projects)

• Track and record inspection coverage

• Track and record defects found in system testing

– Confirmation or refutation of hypothesis

• Analyze data

• Hypothesis supported?

17

Measurement Theory

• The operationalization (definition) process produces metrics and indicators for which data can be collected, and the hypotheses can be tested empirically.

• In other words, you have to gather, analyze, and compare data to determine whether the hypothesis is supported or refuted.

18

Level of Measurement

• How measurements are classified and compared:

– Nominal Scale

– Ordinal Scale

– Interval Scale

– Ratio Scale

• Scales are hierarchical: each higher-level scale possesses all of the properties of the lower ones.

• Operationalization should take advantage of the highest-level scale possible (i.e., don’t use low/medium/high if you can use 1…10)

19

Level of Measurement

• Nominal Scale

– Lowest level scale

– Classification of items (sort items into categories)

– Two requirements

• Jointly exhaustive (all items can be categorized)

• Mutually exclusive (only one category applies)

– Names of categories and sequence order bear no assumptions about relationships between categories

– Example:

• Categories of SW dev: Waterfall, Spiral, Iterative, OO

• Does not imply that Waterfall is ‘better/greater’ than Spiral

20

Level of Measurement

• Ordinal Scale

– Like nominal, except that comparison (ranking) can be applied

– But we cannot determine the magnitude of the difference

– Example:

• Categories of SW dev orgs based on CMM levels (1-5)

• We can state that dev orgs at level 2 are more mature than orgs at level 1, and so on...

• But we cannot state how much better 2 is than 1, or 3 is than 2, or 3 is than 1, and so on…

– A Likert rating scale is often used with this scale:

1 = completely dissatisfied

2 = somewhat dissatisfied

3 = neutral

4 = satisfied

5 = completely satisfied

21

Level of Measurement

• Interval Scale

– Like the ordinal scale, except that we can now determine exact differences between measurement points

– Can use addition/subtraction expressions

– Requires establishment of a well-defined, repeatable unit of measurement

– Example of interval scale

• Temperature in Fahrenheit (vs. cool, warm, hot)

• Day 1’s high temperature was 80 degrees

• Day 2’s high temperature was 87 degrees

• Day 2 was 7 degrees warmer than Day 1 (addition: 80 + 7 = 87)

• Day 1 was 7 degrees cooler than Day 2 (subtraction: 87 - 7 = 80)

22

Level of Measurement

• Ratio Scale

– Interval scale with an absolute, non-arbitrary zero point

– Highest level scale

– Can use multiplication and division

– Example

• MBNQA scores

• Company A scored 800 in the range of 0…1000

• Company B scored 400 in the range of 0…1000

• Company A’s score is double Company B’s (multiplication)

• Company B scored half as well as Company A (division)
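The hierarchy of scales can be sketched as a lookup of which operations are meaningful at each level, reusing the temperature and MBNQA examples. The `Scale` enum, the operation names, and `allowed` are hypothetical labels for illustration, not a standard API.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Levels of measurement, ordered so a higher level inherits every
    operation permitted at the lower levels."""
    NOMINAL = 1   # categories: equality only
    ORDINAL = 2   # adds ordering (<, >)
    INTERVAL = 3  # adds meaningful differences (+, -)
    RATIO = 4     # adds a true zero, so ratios (*, /) are meaningful


# Lowest scale at which each (hypothetically named) operation becomes meaningful.
_INTRODUCED_AT = {
    "equality": Scale.NOMINAL,
    "ordering": Scale.ORDINAL,
    "difference": Scale.INTERVAL,
    "ratio": Scale.RATIO,
}


def allowed(scale: Scale, operation: str) -> bool:
    """True if `operation` is meaningful for data measured on `scale`."""
    return scale >= _INTRODUCED_AT[operation]


# Interval example: the difference between two Fahrenheit readings is meaningful...
day1_f, day2_f = 80, 87
assert allowed(Scale.INTERVAL, "difference")
print(day2_f - day1_f)  # 7

# ...but "87/80 times as hot" is not, because 0 degrees F is an arbitrary zero.
assert not allowed(Scale.INTERVAL, "ratio")

# Ratio example: MBNQA scores on a 0..1000 scale.
company_a, company_b = 800, 400
assert allowed(Scale.RATIO, "ratio")
print(company_a / company_b)  # 2.0
```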

23

Basic Measures

• Measures are ways of analyzing and comparing data to extract meaningful information.

• Data vs. Information

– Data - raw numbers or facts

– Information

• relevant - related to the subject

• qualified - characteristics specified

• reliable - dependable, high confidence level

• Basic measures

– Ratio

– Proportion

– Percentage

– Rate

24

Basic Measures

• Ratio

– Result of dividing one quantity by another

– Best use is with two distinct groups

– Numerator and denominator are mutually exclusive

– Example 1:

• Developers = 10, Testers = 5

• Developer to Tester ratio = 10 / 5 x 100% = 200%

– Example 2:

• Developers = 5, Testers = 10

• Developer to Tester ratio = 5 / 10 x 100% = 50%
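A minimal sketch of the ratio calculation above, expressed as a percentage as in the examples; `ratio_percent` is a hypothetical helper name.

```python
def ratio_percent(numerator: int, denominator: int) -> float:
    """Ratio of two mutually exclusive groups, expressed as a percentage."""
    if denominator == 0:
        raise ValueError("denominator group must be non-empty")
    return numerator / denominator * 100


print(ratio_percent(10, 5))  # developers : testers -> 200.0
print(ratio_percent(5, 10))  # -> 50.0
```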

25

Basic Measures

• Proportion

– Best use is with multiple categories within one group

– For n categories (C) in the group (G):

– C1/G + C2/G + … + Cn/G = 1

– Proportion of a category = size of the category / total group size

• Example

– Number of customers surveyed = 50

– Number of satisfied customers = 30

– Proportion of satisfied customers = 30 / 50 = 0.6

– Proportion of unsatisfied customers = 20 / 50 = 0.4

– Satisfied (0.6) plus unsatisfied (0.4) = 1
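The proportion rule can be sketched as a function over the categories of one group; because every item falls into exactly one category, the proportions sum to 1. The function name is hypothetical.

```python
def proportions(counts: dict[str, int]) -> dict[str, float]:
    """Proportion of each category within one group; the values sum to 1."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


survey = {"satisfied": 30, "unsatisfied": 20}  # 50 customers surveyed
p = proportions(survey)
print(p)  # {'satisfied': 0.6, 'unsatisfied': 0.4}
```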

26

Basic Measures

• Percentage

– A proportion expressed in terms of per hundred units

– Percentages represent relative frequencies

– Total number of cases should always be included

– Total number of cases should be sufficiently large

– Example

• 200 bugs found in 8 KLOC

• 30 requirements bugs (30 / 200 x 100%) = 15%

• 50 design bugs (50 / 200 x 100%) = 25%

• 100 code bugs (100 / 200 x 100%) = 50%

• 20 other bugs (20 / 200 x 100%) = 10%
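A small sketch of the bug breakdown above. It returns the total number of cases alongside the percentages, following the advice that the total should always be reported; `percentage_breakdown` is a hypothetical name.

```python
def percentage_breakdown(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the total number of cases and each category's percentage of it."""
    total = sum(counts.values())
    # Multiply before dividing so these whole-number percentages come out exact.
    return total, {k: v * 100 / total for k, v in counts.items()}


bugs = {"requirements": 30, "design": 50, "code": 100, "other": 20}
total, pct = percentage_breakdown(bugs)
print(total, pct)
# 200 {'requirements': 15.0, 'design': 25.0, 'code': 50.0, 'other': 10.0}
```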

27

Basic Measures

• Rate

– Associated with dynamic changes of a quantity over time

– Change in y per unit of x

• x is usually a quantity of time

• the time unit of x must be expressed

– Example

• Opportunities for error (OFE) = 5000 (based on 5 KLOC; see note 1)

• Number of defects = 200 (after one year; see note 2)

• Defect rate = 200 / 5000 x 1K = 40 defects per KLOC

• Notes

1. It is extremely hard to determine OFE

2. It is hard to know when to measure
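A minimal sketch of the defect-rate arithmetic above, assuming (as in the example) that each line of code counts as one opportunity for error. The function name is hypothetical, and the time window still has to be stated alongside the result.

```python
def defect_rate_per_kloc(defects: int, opportunities_for_error: int) -> float:
    """Defects per 1,000 opportunities for error (per KLOC when each line
    of code is treated as one opportunity). The time window over which
    `defects` were counted must be reported separately."""
    return defects * 1000 / opportunities_for_error


# Example above: OFE = 5,000 (5 KLOC), 200 defects after one year.
print(defect_rate_per_kloc(200, 5000))  # 40.0 defects per KLOC (first year)
```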

28

Basic Measures

• Rate

– Six Sigma

– A specific defect rate of 3.4 defective parts per million (ppm), which has become an industry standard for the ultimate quality goal

– Sigma is the Greek letter used for standard deviation

– The more the variation in a process is reduced, the easier it is to achieve Six Sigma quality

– Some problems arise in SW engineering

• What are the parts: lines of source code? lines of assembly code?
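The 3.4 ppm figure is usually explained as the standard-normal tail beyond 4.5 sigma, that is, six sigma minus the conventional 1.5-sigma long-term process shift. The sketch below checks that number and shows defects per million opportunities (DPMO). Treating each line of source code as one "part" is an assumption, which is exactly the open question raised above; the function names are hypothetical.

```python
import math


def dpmo(defects: int, opportunities: int) -> float:
    """Defects per million opportunities."""
    return defects * 1_000_000 / opportunities


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Six sigma minus the conventional 1.5-sigma shift leaves a 4.5-sigma margin.
print(upper_tail(6 - 1.5) * 1_000_000)  # about 3.4 ppm

# Assuming one "part" per line of source code, the earlier example
# (200 defects in 5,000 lines) is far from six-sigma territory.
print(dpmo(200, 5000))  # 40000.0
```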

29

Reliability

• Reliability - consistency of a number of measurements taken using the same measurement method on the same subject

• High degree of reliability - repeated measurements are consistent

• Low degree of reliability - repeated measurements have large variations

• Operational definitions (specifics of how the measurement is taken) are key to achieving high degrees of reliability

30

Validity

• Validity is whether the measurement really measures what it is intended to measure

– Construct Validity - validity of a metric to represent a theory

• Difficult to validate abstract concepts

• Example:

Concept - Intelligent people attend college

Measurement - Sum of college enrollment

Conclusion - “The sum of the college enrollment is the number of intelligent people” - Not valid

31

Validity

– Criterion-related (predictive) Validity - validity of a metric to predict a theory or relationship

• Example:

Concept - Safe driving requires knowledge of the rules and regulations

Measurement - Driver’s license test

Conclusion - Those who have low scores on driver’s license tests are more likely to have an accident

– Content Validity - the degree to which a metric covers the meaning of the concept

• Example - A general math knowledge test needs to include more than just addition and subtraction.

32

Measurement Errors

• Two types of measurement errors

– Systematic errors - errors associated with validity

– Random errors - errors associated with reliability

• Example:

A bathroom scale that is off by 10 pounds

Each time the scale is used, the reading equals:

actual weight + 10 pounds + variation

(true value + systematic error + random error)

The systematic error makes the reading invalid

The random error makes the reading unreliable
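The bathroom-scale example can be simulated to show how the two error types behave differently: averaging many readings cancels most of the random error but leaves the systematic offset intact. The true weight (150 lb), noise level, seed, and helper name are invented for illustration. Note that exposing the bias requires knowing the true value from some other source.

```python
import random
import statistics


def read_scale(true_weight: float, bias: float, noise_sd: float,
               rng: random.Random) -> float:
    """One reading = true value + systematic error + random error."""
    return true_weight + bias + rng.gauss(0.0, noise_sd)


rng = random.Random(42)   # fixed seed so the run is repeatable
true_weight = 150.0       # hypothetical true weight in pounds
readings = [read_scale(true_weight, bias=10.0, noise_sd=1.5, rng=rng)
            for _ in range(1000)]

# The mean still sits about 10 lb high: the systematic error (a validity problem).
systematic = statistics.mean(readings) - true_weight
# The spread between repeated readings reflects the random error (a reliability problem).
random_spread = statistics.stdev(readings)
print(round(systematic, 1), round(random_spread, 1))
```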

33

Measurement Errors

• Ways of assessing reliability

– Test/Retest - one or more retests are performed and the results are compared to previous tests

• May expose random errors

– Alternative-form - acquire the same measurements using alternate testing means

• May expose systematic errors

34

Correlation

• Correlation - a statistical method for assessing relationships among observed or empirical data sets

• If the correlation coefficient between two variables is weak, there is little or no linear relationship (though a non-linear relationship may still exist)

• Example - negative linear relationship between LOC inspected and defects shipped

[Figure: defects shipped (y-axis, 0 to 12) vs. LOC Inspected (x-axis, 0 to 2000; independent variable)]
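A sketch of the Pearson correlation coefficient, the usual measure of linear correlation. The data points are hypothetical values shaped like the figure, not the original plot's data. The second call shows the caveat above: a perfect but U-shaped relationship yields r = 0. Python 3.10+ also provides `statistics.correlation`, but the formula is written out here for clarity.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: more LOC inspected, fewer defects shipped.
loc_inspected = [0, 500, 1000, 1500, 2000]
defects_shipped = [12, 9, 6, 4, 1]
print(round(pearson_r(loc_inspected, defects_shipped), 3))  # strongly negative

# A perfect but non-linear (U-shaped) relationship can still give r = 0.
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0
```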

35

Causality

• Identification of cause and effect relationships in experiments

• Three criteria for cause-effect:

1. Cause must precede effect

2. Two variables are empirically related (relationship can be measured)

3. Empirical relationship is direct (not a coincidence or an error)

36

Summary

• Operational definitions are valuable in determining levels and types of metrics to use

• Scales and measures have different characteristics and different intended uses

• Avoid using the wrong scale or measure

• Validity and reliability represent measurement quality

• Correlation and causality are goals of measurement (i.e., the quest to identify and prove a cause-effect relationship)

37

Follow-up:

• List at least 5 quality parameters/attributes used to measure software quality from the customer perspective

38

Pop Quiz

• What is the difference between validity and reliability?

• Why are software development process models important to the study of software quality?

• Define Six Sigma

• Define MTTF

• T/F Defect density and PUM combined represent a true measure of customer satisfaction.

• T/F If a hypothesis is refuted, then the wrong metrics were used.

39

Software Quality Engineering CS410

Class 3b

Product Quality Metrics

Process Quality Metrics

Function Point analysis

40

Software Quality Metrics

• Three kinds of Software Quality Metrics

– Product Metrics - describe the characteristics of the product

• size, complexity, design features, performance, and quality level

– Process Metrics - used for improving the software development/maintenance process

• effectiveness of defect removal, pattern of testing defect arrival, and response time of fixes

– Project Metrics - describe the project characteristics and execution

• number of developers, cost, schedule, productivity, etc.

• fairly straightforward

41

Software Quality Metrics

• Product Metrics

– Mean Time to Failure (MTTF)

– Defect Density

– Problems per User Month (PUM)

– Customer Satisfaction

• Process Metrics

– Defect density during machine testing

– Defect arrival patterns during machine testing

– Phase-based defect removal

– Defect removal effectiveness

42

Software Quality Metrics

• Some terminology:

– Error - a human mistake that results in incorrect (or incomplete) software

• faulty requirement, design flaw, coding error

– Fault (a.k.a. defect) - a condition within the system that causes a unit of the system to not function properly

• GPF, Abend, crash, lock-up, deadlock, error message, etc.

– Failure - a required function (i.e., the goal) cannot be performed

• An error results in a fault, which may cause one or more failures.

43

MTTF

• Mean Time To Failure (MTTF) - measures how long the software can run before it encounters a “crash”

• Difficult measurement to obtain because it is tied to the “real” use of the product

• Easier to define requirements for special-purpose software than for general-use software

• For these reasons, MTTF is not widely used by commercial software developers
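A simplified MTTF estimator, assuming each observation is the operating time from start (or restart) to a crash under realistic use. It ignores runs that ended without a failure, which a fuller reliability analysis would have to account for. The data and function name are hypothetical.

```python
def mttf(hours_to_failure: list[float]) -> float:
    """Mean time to failure: average operating time before each observed failure."""
    if not hours_to_failure:
        raise ValueError("no failures observed; MTTF cannot be estimated")
    return sum(hours_to_failure) / len(hours_to_failure)


# Hypothetical field observations: operating hours until each crash.
observed = [120.0, 340.0, 95.0, 410.0, 285.0]
print(mttf(observed))  # 250.0
```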

44

Defect Density

• Defect Density (a.k.a. Defect Rate) - the estimated number of defects relative to the size of the software (e.g., defects per KLOC)

• Estimated because defects are found throughout the entire life-cycle of the product

• Important for cost and resource estimates for the maintenance phase of the life cycle

45

Defect Density

• More specific

– Defect Density (rate) = number of defects / opportunities for errors during a specified time

– Number of defects can be approximated as equal to the number of unique causes of observed failures

– Opportunities for error can be expressed as KLOC

– Time frame (life of product, or LOP) varies

46

Defect Density

• Defect Density Example

– Product is one year old and is 10 KLOC

• Unique causes of observed failures = 50

• Current Defect Density = 50 / 10K x 1K = 5 defects per KLOC per year

– After the second year

• Unique causes of observed failures = 75

• Current Defect Density = 75 / 10K x 1K = 7.5 defects per KLOC over two years, or 3.75 per KLOC per year
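The example's arithmetic, sketched as a function that reports both the cumulative rate and the per-year rate so that products of different ages are not compared directly. The function name is hypothetical.

```python
def defect_density(unique_failure_causes: int, loc: int,
                   years: float) -> tuple[float, float]:
    """Return (defects per KLOC over the whole period, defects per KLOC per year)."""
    per_kloc = unique_failure_causes * 1000 / loc
    return per_kloc, per_kloc / years


print(defect_density(50, 10_000, 1))  # (5.0, 5.0)
print(defect_density(75, 10_000, 2))  # (7.5, 3.75)
```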

47

Defect Density

• Comparison Issues

– How LOC is calculated

• Count only executable lines

– Note - what is an executable line? HLL vs. Assembler

• Count executable lines, plus data definitions

• Count executable lines, plus data definitions, plus comments

• Count executable lines, plus data definitions, plus comments, plus job control language

• Count physical lines

• Count logical lines (terminated by ‘;’)

• Function Point Analysis (FPA) is an alternative measure of program size
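To show how much the counting rule matters, the sketch below counts the same snippet three ways. It is deliberately simplified and assumes C-style code in which comments occupy whole lines starting with `/*` or `//`. It does not handle semicolons inside strings, comments, or `for (;;)` headers. The function name and sample are hypothetical.

```python
def loc_counts(source: str) -> dict[str, int]:
    """Count the same source text three different ways (simplified rules)."""
    lines = source.splitlines()
    physical = len(lines)
    non_comment = sum(
        1 for ln in lines
        if ln.strip() and not ln.strip().startswith(("/*", "//"))
    )
    logical = source.count(";")  # statements terminated by ';'
    return {"physical": physical, "non_comment": non_comment, "logical": logical}


sample = """/* Module A */
int total = 0; int bad = 0; int good = 5;

total = good
      + bad;
"""
print(loc_counts(sample))  # {'physical': 5, 'non_comment': 3, 'logical': 4}
```

The same five physical lines yield three different "sizes," so defect rates per KLOC are only comparable when the counting rule is held constant.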

48

Defect Density

• Comparison Issues (cont.)

– Timeframes must be the same

• Cannot compare the (current) defect rate of a one-year-old product to the (current) defect rate of a four-year-old product

• IBM considers the life of a product to be 4 years

– Must account for new and modified code in the LOC count (otherwise the metric is skewed)

– LOC counting must remain consistent

– Defect rate should be calculated for each release (must use change flags)

49

Defect Density

• Change Flags Example:

/* Module A - Prolog */
/* Release 1.1 modifications 12/01/97 @R11 */
/* Fix for problem report #1127 03/15/98 @F1127 */
...
Total_Records = 0; /* Init records @R11A */
...
Bad_Records = Total_Records - Good_Records; /* Calculate num bad recs @F1127C */

• Flags (a.k.a. Change Control) - CMM level 2+

A - line added by release/fix

C - line changed by release/fix

M - line moved by release/fix

D - line deleted by release/fix (optional)
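Change flags make it possible to count new and changed lines per release or fix automatically. The sketch below assumes a hypothetical flag syntax modeled on the example: `@`, a release or fix id (`R11`, `F1127`), then an action letter (A/C/M/D). Prolog flags without an action letter, such as `@R11`, are ignored.

```python
import re
from collections import Counter

# Hypothetical flag syntax: '@' + release/fix id + action letter (A, C, M, D).
FLAG = re.compile(r"@([RF]\d+)([ACMD])\b")


def count_flags(source: str) -> Counter:
    """Count flagged lines per (release/fix id, action)."""
    counts: Counter = Counter()
    for line in source.splitlines():
        for change_id, action in FLAG.findall(line):
            counts[(change_id, action)] += 1
    return counts


module_a = """/* Release 1.1 modifications 12/01/97 @R11 */
Total_Records = 0; /* Init records @R11A */
Bad_Records = Total_Records - Good_Records; /* Calculate num bad recs @F1127C */
"""
print(count_flags(module_a))
```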

50

Defect Density

• IBM Example:

SSI (current release) = SSI (previous release) + CSI - Deleted - Changed

• SSI - Shipped Source Instructions

• CSI - Changed (and new) Source Instructions

Defect Rate Metrics for Current Release:

TVUA/KSSI - all APARs (defects) reported on the total release (inclusive of previous release)

TVUA/KCSI - all APARs (defects) reported on the new release code

– APAR - Authorized Program Analysis Report (Severity 1-4)

– TVUA - Total Valid Unique APARs
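A sketch of the SSI bookkeeping and the two defect-rate metrics. Because CSI includes changed as well as new instructions, the changed count is subtracted so those lines are not counted twice. All release figures are hypothetical.

```python
def shipped_source_instructions(ssi_previous: int, csi: int,
                                deleted: int, changed: int) -> int:
    """SSI(current) = SSI(previous) + CSI - deleted - changed."""
    return ssi_previous + csi - deleted - changed


def per_thousand(tvua: int, instructions: int) -> float:
    """Total valid unique APARs per thousand source instructions."""
    return tvua * 1000 / instructions


# Hypothetical release: 10,000 new/changed instructions, 2,000 of them changed.
ssi = shipped_source_instructions(ssi_previous=50_000, csi=10_000,
                                  deleted=500, changed=2_000)
print(ssi)                       # 57500
print(per_thousand(115, ssi))    # TVUA/KSSI (whole release) -> 2.0
print(per_thousand(40, 10_000))  # TVUA/KCSI (new/changed code only) -> 4.0
```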

51

Customer Problem Metrics

• In addition to valid defects, other issues are viewed as ‘problems’ by customers:

– Usability

– Unclear documentation/information

– Missing documentation/information

– Duplicate problems (counted as invalid)

– User errors (traps)

52

Customer Problem Metrics

• From the customers’ perspective, the total problem space is the combination of the defect-oriented problems and the non-defect-oriented problems. They all impact the customer, regardless of how the SW company classifies them.

• Total problems can be expressed as Problems per User Month (PUM)

PUM = Total Problems / License-Months

• License-Months = Total number of licenses x number of months in the calculation period

53

Customer Problem Metrics

• PUM example:

Total problems = 75, Licenses = 50, Months = 6

PUM = 75 / (50 x 6) = 0.25 problems/user month

• PUM is usually calculated for each month after a software release and averaged for the year.

• Note - PUM counts a defect multiple times, depending on how pervasive it is (i.e., defects in mainstream functions are costly)

• Ways to lower PUM:

– Improve the development process to reduce defects

– Reduce non-defect-oriented problems (better documentation, usability, etc.)

– Increase the number of licenses (?!)
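The PUM formula, plus the monthly-then-averaged usage described above. The twelve monthly problem counts and the constant installed base of 50 licenses are hypothetical.

```python
def pum(total_problems: int, licenses: int, months: int) -> float:
    """Problems per user-month = total problems / (licenses x months)."""
    return total_problems / (licenses * months)


print(pum(75, 50, 6))  # 0.25

# Monthly PUM after a release, averaged over the year (hypothetical counts).
monthly_problems = [30, 22, 15, 12, 10, 8, 7, 6, 5, 5, 4, 4]
licenses_in_force = [50] * 12  # assume a constant installed base
monthly_pum = [pum(p, n, 1) for p, n in zip(monthly_problems, licenses_in_force)]
annual_average = sum(monthly_pum) / len(monthly_pum)
print(annual_average)
```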

54

Customer Satisfaction

• PUM and Defect Rate are not true measurements of customer satisfaction, but they do contribute.

• Timing, availability, company image, services, and (customized) customer solutions also contribute.

• Customer satisfaction is usually measured on a five-point (Likert) scale via a customer survey:

1 = Very dissatisfied

2 = Dissatisfied

3 = Neutral

4 = Satisfied

5 = Very satisfied

55

Customer Satisfaction

• Common metrics for Customer Satisfaction:

– Percent of very satisfied customers

– Percent of satisfied customers (very satisfied and satisfied)

– Percent of dissatisfied customers (dissatisfied and very dissatisfied)

– Percent of non-satisfied customers (neutral, dissatisfied, and very dissatisfied)

• Scope of three quality metrics (defects, customer problems, customer satisfaction). Fig. 4.1 p. 94
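The four common satisfaction metrics can be computed directly from five-point survey answers. The 100 responses below are hypothetical.

```python
from collections import Counter


def satisfaction_metrics(responses: list[int]) -> dict[str, float]:
    """Percentages for the four common metrics, from 1..5 survey answers
    (1 = very dissatisfied ... 5 = very satisfied)."""
    counts = Counter(responses)
    n = len(responses)

    def pct(*levels: int) -> float:
        return 100 * sum(counts[level] for level in levels) / n

    return {
        "very_satisfied": pct(5),
        "satisfied": pct(4, 5),
        "dissatisfied": pct(1, 2),
        "non_satisfied": pct(1, 2, 3),
    }


survey = [5] * 20 + [4] * 45 + [3] * 15 + [2] * 12 + [1] * 8  # 100 hypothetical answers
print(satisfaction_metrics(survey))
# {'very_satisfied': 20.0, 'satisfied': 65.0, 'dissatisfied': 20.0, 'non_satisfied': 35.0}
```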

56

Defect Density During Machine Testing

• Machine Testing - testing after code is integrated into the system library (i.e., integration testing, function testing, system testing, regression testing)

• Commonly held beliefs:

– There is a positive correlation between defect rates found during testing and the number of defects injected during development.

– There is a positive correlation between the defect rates found during testing and the defect rate once the product is released.

• Counterargument: Better testing will uncover more defects (i.e., maybe the code is just being tested better)

57

Defect Density During Machine Testing

• Release quality:

If the defect rate during testing is the same as or lower than in the previous release, then:

– If testing for the current release is worse, then: testing needs to be improved (inconclusive about quality)

– Else if testing for the current release is the same (or better), then: the quality is better than in the previous release

If the defect rate during testing is higher than in the previous release, then:

– If the testing process was improved, then: the quality is the same as or better than in the previous release

– Else if the testing process was not improved, then: the quality is worse than in the previous release (more defects)
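The release-quality rules above can be encoded as a small decision function. How testing effectiveness is judged ("worse", "same", "better") is outside this sketch and is assumed to be supplied by the caller. The `Testing` enum and `assess_release` are hypothetical names.

```python
from enum import Enum


class Testing(Enum):
    """Current release's testing effectiveness relative to the previous release."""
    WORSE = "worse"
    SAME = "same"
    BETTER = "better"


def assess_release(defect_rate_current: float, defect_rate_previous: float,
                   testing: Testing) -> str:
    """Apply the release-quality decision rules for in-test defect rates."""
    if defect_rate_current <= defect_rate_previous:
        if testing is Testing.WORSE:
            return "inconclusive: testing needs to be improved"
        return "quality is better than previous release"
    if testing is Testing.BETTER:
        return "quality is the same or better than previous release"
    return "quality is worse than previous release"


print(assess_release(4.0, 5.0, Testing.SAME))
print(assess_release(6.0, 5.0, Testing.BETTER))
```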

58

Defect Arrival Rate During Machine Testing

• Defect arrival rate provides more information to supplement the defect density metric

• This metric shows the pattern of defect arrivals and the time between them.

• Different arrival patterns (can) indicate different quality levels in the software.

• Objective - to see declining and stabilizing arrival rates over time

– Supports the idea of “shake-out” testing, where you attempt to find all the highest-level bugs first so that additional testing is not impacted.

59

Defect Arrival Rate During Machine Testing

• Three different metrics for arrival rate:

– Raw defect arrivals (including duplicates and invalids) during the testing phase per some time interval (day, week, month, etc.)

– Valid defect arrivals during the testing phase per some time interval

– Defect backlog over time. This is a measure of workload, which could adversely affect quality.
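A sketch of the three arrival metrics computed from a defect log, bucketed by week. It assumes the backlog counts only valid defects that are still open at the end of each week. The `DefectReport` record and the six sample reports are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class DefectReport:
    opened_week: int
    valid: bool                        # False for duplicates and invalid reports
    closed_week: Optional[int] = None  # None while still open


def arrival_metrics(reports: list[DefectReport],
                    weeks: int) -> tuple[list[int], list[int], list[int]]:
    """Return (raw arrivals, valid arrivals, open valid backlog) per week 1..weeks."""
    raw = Counter(r.opened_week for r in reports)
    valid = Counter(r.opened_week for r in reports if r.valid)
    backlog = []
    for week in range(1, weeks + 1):
        backlog.append(sum(
            1 for r in reports
            if r.valid and r.opened_week <= week
            and (r.closed_week is None or r.closed_week > week)
        ))
    span = range(1, weeks + 1)
    return [raw[w] for w in span], [valid[w] for w in span], backlog


reports = [
    DefectReport(1, True, 2),
    DefectReport(1, True, 3),
    DefectReport(1, False),     # duplicate: counted as a raw arrival only
    DefectReport(2, True, 4),
    DefectReport(2, True),      # still open
    DefectReport(3, True, 3),
]
print(arrival_metrics(reports, 4))  # ([3, 2, 1, 0], [2, 2, 1, 0], [2, 3, 2, 1])
```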

60

Phase-Based Defect Removal Pattern

• An extension of the defect density metric.

• Defects are tracked at all (inspection/test) phases of the development cycle (design reviews, code reviews, unit test, integration test, function test, and system test).

• This metric can be correlated to inspection coverage, and test coverage metrics.

• Helps to identify the overall defect removal ability of the development process.

• Fig. 4.3 p. 103

61

Defect Removal Effectiveness

• Defect Removal Effectiveness (DRE):

• DRE = (Defects removed in the phase / total defects) x 100%

• Where total defects = defects removed in the phase + latent defects, and the latent defects are the defects introduced in that phase that are found in later phases and in the field (this is a constantly changing number)

62

Defect Removal Effectiveness

• Example - Defects per phase:

HLD (I0) review I0 = 5

(found = 5, latent = 4, total = 9), DRE = (5/9 x 100%) ≈ 55.6%

LLD (I1) review I0 = 3, I1 = 4

(found = 4, latent = 2, total = 6), DRE = (4/6 x 100%) ≈ 66.7%

Code inspection (I2) I0 = 1, I1 = 1, I2 = 10

(found = 10, latent = 6, total = 16), DRE = (10/16 x 100%) = 62.5%

Unit Test (UT) I0 = 0, I1 = 1, I2 = 5, UT = 3

(found = 3, latent = 1, total = 4), DRE = (3/4 x 100%) = 75%

Component Test (CT) I0 = 0, I1 = 0, I2 = 1, UT = 1, CT = 3

(found = 3, latent = 1, total = 4), DRE = (3/4 x 100%) = 75%

System Test (ST) I0 = 0, I1 = 0, I2 = 0, UT = 0, CT = 1, ST = 2

(found = 2, latent = 1, total = 3), DRE = (2/3 x 100%) ≈ 66.7%

Field = 2 I0 = 0, I1 = 0, I2 = 0, UT = 0, CT = 0, ST = 1

64

Defect Removal Effectiveness

• Notes

– Must account for where a defect was introduced.

– As the number of field bugs increases, DRE must be recalculated.

– Latent - present but not evident (at this phase).
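A sketch of the per-phase DRE calculation. It assumes each phase's DRE counts only defects introduced in that same phase, which is how the UT, CT, and ST lines of the example are computed. The table transcribes the injected-phase breakdown from the example. Because `dre` recomputes from the table on every call, adding newly found field defects to the `"Field"` row automatically updates the results. `found` and `dre` are hypothetical names.

```python
# found[phase_found][phase_injected] = defects found in phase_found that were
# introduced in phase_injected (breakdown transcribed from the example).
PHASES = ["I0", "I1", "I2", "UT", "CT", "ST"]
found = {
    "I0": {"I0": 5},
    "I1": {"I0": 3, "I1": 4},
    "I2": {"I0": 1, "I1": 1, "I2": 10},
    "UT": {"I0": 0, "I1": 1, "I2": 5, "UT": 3},
    "CT": {"I0": 0, "I1": 0, "I2": 1, "UT": 1, "CT": 3},
    "ST": {"I0": 0, "I1": 0, "I2": 0, "UT": 0, "CT": 1, "ST": 2},
    "Field": {"I0": 0, "I1": 0, "I2": 0, "UT": 0, "CT": 0, "ST": 1},
}


def dre(phase: str) -> float:
    """DRE (%) for defects introduced in `phase`: those removed in that same
    phase divided by (removed there + found later, including the field)."""
    removed_here = found[phase].get(phase, 0)
    total = sum(row.get(phase, 0) for row in found.values())
    return 100 * removed_here / total


for p in PHASES:
    print(p, round(dre(p), 1))
```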

65

Function Point Analysis (FPA)

• Alternative size measure to LOC

• Can measure productivity (function points per person), and quality (defects per function point)

• Idea: The defect rate should be measured against how many functions the software provides

• Functionality is independent of code size

66


• Function points are a weighted total of five major components:

– External inputs x 4

– External outputs x 5

– Logical internal files x 10

– External interface files x 7

– External inquiries x 4

67


• Low and high weighting factors are used to account for complexity:

– External inputs: low = 3, high = 6

– External outputs: low = 4, high = 7

– Logical internal files: low = 7, high = 15

– External interface files: low = 5, high = 10

– External inquiries: low = 3, high = 6

• Function Count (FC) is then calculated

– FC = sum over all components of (number of components x weighting factor)
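A sketch of the function count. It treats the base weights (4, 5, 10, 7, 4) as the "average" complexity level, alongside the low and high factors listed above. The component counts are hypothetical.

```python
# (low, average, high) weights per component type, as listed above.
WEIGHTS = {
    "external_input": (3, 4, 6),
    "external_output": (4, 5, 7),
    "logical_internal_file": (7, 10, 15),
    "external_interface_file": (5, 7, 10),
    "external_inquiry": (3, 4, 6),
}
COMPLEXITY = {"low": 0, "average": 1, "high": 2}


def function_count(counts: dict[tuple[str, str], int]) -> int:
    """FC = sum of (number of components x weighting factor) over every
    (component type, complexity) pair."""
    return sum(n * WEIGHTS[kind][COMPLEXITY[level]]
               for (kind, level), n in counts.items())


# Hypothetical counts for a small application.
counts = {
    ("external_input", "average"): 10,
    ("external_output", "average"): 8,
    ("logical_internal_file", "average"): 4,
    ("external_interface_file", "high"): 2,
    ("external_inquiry", "low"): 6,
}
print(function_count(counts))  # 158
```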

68


14 system characteristics are then assessed for their impact on a scale of 0 to 5:

1. Data communications

2. Distributed functions

3. Performance

4. Heavily used configuration

5. Transaction rate

6. On-line data entry

7. End-user efficiency

8. On-line update

9. Complex processing

10. Reusability

11. Installation ease

12. Operational ease

13. Multiple sites

14. Facilitation of change

69


• The Value Adjustment Factor (VAF) is then calculated (a.k.a. Processing Complexity Adjustment):

VAF = 0.65 + (0.01 x C)

where C is the sum of all the complexity ratings

• Then Function Points (FP) are calculated:

FP = FC x VAF

• The resulting value is the function point rating for the software. This number can also be converted to a LOC estimate for comparison purposes.
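Continuing the calculation: with 14 ratings of 0 to 5 each, the VAF ranges from 0.65 to 1.35. The sketch rates every characteristic as 3 and reuses an FC of 158 for illustration. Both values are hypothetical.

```python
def value_adjustment_factor(ratings: list[int]) -> float:
    """VAF = 0.65 + 0.01 * C, where C is the sum of the 14 ratings (0..5 each)."""
    if len(ratings) != 14 or any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("expected 14 ratings, each between 0 and 5")
    return 0.65 + 0.01 * sum(ratings)


def function_points(fc: int, ratings: list[int]) -> float:
    """FP = FC x VAF."""
    return fc * value_adjustment_factor(ratings)


ratings = [3] * 14  # hypothetical: every characteristic rated 3
print(round(value_adjustment_factor(ratings), 2))  # 1.07
print(round(function_points(158, ratings), 2))     # 169.06
```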

70

Summary

• Product Quality Metrics - focus on quality aspects of the product, both intrinsic and from the customer viewpoint

– Mean Time To Failure

– Defect Density

– Problems per User Month

– Customer Satisfaction

71

Summary (cont.)

• Process quality metrics - focus on the quality and effectiveness of the process

– Defect density during machine testing

– Defect arrival rate during machine testing

– Phase-based defect removal

– Defect removal effectiveness

• Function Point analysis

– An alternative method to LOC counting
