Quality Assurance - Northeastern University


QUALITY ASSURANCE

Michael Weintraub

Fall, 2015

• Understand what quality assurance means

• Understand QA models and processes

Unit Objective

• Software Assurance: The planned and systematic set of activities that ensures that software life cycle processes and products conform to requirements, standards, and procedures.

• Software Quality: The discipline of software quality is a planned and systematic set of activities to ensure quality is built into the software. It consists of software quality assurance, software quality control, and software quality engineering. As an attribute, software quality is (1) the degree to which a system, component, or process meets specified requirements, and (2) the degree to which a system, component, or process meets customer or user needs or expectations [IEEE 610.12, IEEE Standard Glossary of Software Engineering Terminology].

• Software Quality Assurance: The function of software quality that assures that the standards, processes, and procedures are appropriate for the project and are correctly implemented.

• Software Quality Control: The function of software quality that checks that the project follows its standards, processes, and procedures, and that the project produces the required internal and external (deliverable) products.

• Software Quality Engineering: The function of software quality that assures that quality is built into the software by performing analyses, trade studies, and investigations on the requirements, design, code, and verification processes and results to assure that reliability, maintainability, and other quality factors are met.

• Software Reliability: The discipline of software assurance that 1) defines the requirements for software-controlled system fault/failure detection, isolation, and recovery; 2) reviews the software development processes and products for software error prevention and/or controlled change to reduced-functionality states; and 3) defines the process for measuring and analyzing defects and defines/derives the reliability and maintainability factors.

• Verification: Confirmation by examination and provision of objective evidence that specified requirements have been fulfilled [ISO/IEC 12207, Software life cycle processes]. In other words, verification ensures that "you built it right".

• Validation: Confirmation by examination and provision of objective evidence that the particular requirements for a specific intended use are fulfilled [ISO/IEC 12207, Software life cycle processes]. In other words, validation ensures that "you built the right thing".

Definitions According to NASA

From: http://www.hq.nasa.gov/office/codeq/software/umbrella_defs.htm

Technology Objective: Designing a quality system and writing quality software

√ The tech team aims to deliver a correctly behaving system to the client

Software Quality Assurance is about assessing if the system meets expectations

Доверяй, но проверяй

(Russian Proverb - Doveryay, no proveryay)

Trust, but verify

Software Quality Assurance

Validation: Are we building the right product or service?

Verification: Are we building the product or service right?

Validation Versus Verification

Both involve testing – done at every stage

but "testing can only show the presence of errors, not their absence" (Dijkstra)

Product trials, user experience evaluation

Validation

Typically a client-leaning activity; after all, they are the ones who asked for the system

Optimist: it's about showing correctness/goodness

Pessimist: it's about identifying defects

Verification

[Diagram: good and bad inputs flow into the System; does bad input yield bad output, and does good input yield good output?]

Quality versus Reliability

Quality Assurance

Assessing whether a software component or system produces the expected/correct/accepted behavior or output for a given set of inputs

OR

Assessing features of the software

Reliability

Probability of failure-free software operation for a specified duration in a particular environment

Cool phrases

Five 9's (i.e., 99.999% availability)

No down-time

The First "Computer Bug". Moth found trapped between points at Relay # 70, Panel F, of the Mark II Aiken Relay Calculator while it was being tested at Harvard University, 9 September 1947.

The operators affixed the moth to the computer log, with the entry: "First actual case of bug being found". They put out the word that they had "debugged" the machine, thus introducing the term "debugging a computer program".

In 1988, the log, with the moth still taped by the entry, was in the Naval Surface Warfare Center Computer Museum at Dahlgren, Virginia. The log is now housed at the Smithsonian Institution’s National Museum of American History, who have corrected the date from 1945 to 1947. Courtesy of the Naval Surface Warfare Center, Dahlgren, VA., 1988. NHHC Photograph Collection, NH 96566-KN (Color).

Fun Story – First Computer Bug (1947)

From https://www.facebook.com/navalhistory/photos/a.77106563343.78834.76845133343/10153057920928344/

Other factors include

Quality of the Process

Quality of the Team

Quality of the Environment

Testing is Computationally Hard

The space is huge and it is generally infeasible to test anything completely

Assessing quality is an exercise in establishing confidence in a system

Or Minimizing Risks

[Diagram: a layered stack (Hardware, Host OS, VM, OS1, App1); each layer introduces risk]

• Component behavior

• Interactions between components

• System and sub-system behavior

• Interactions between sub-systems

• Negative path

• Behavior under load

• Behavior over time

• Usability

Lots to Consider

Static Evaluations

Making judgments without executing the code

Dynamic Evaluations

Involves executing the code and judging performance

Two Approaches

Often a formal process

Value: finding issues at design/definition time rather than waiting for the results of the step to complete

Highly effective, but does not replace the need for dynamic techniques

Static Technique - Reviews

Fundamental QA Technique

Peer(s) reviews artifact for correctness and clarity

Requirements

Architecture & Design

Implementation

Test Plans

• Single reviewer model - usually a "certified" / senior person

• Panel model - highly structured reviews; can take significant preparation

• Usually done at the design or development stage

• May introduce delay between when code is written and when it gets reviewed

One Extreme: Jury/Peer Reviews

Before anything is accepted, someone other than the creator must review it and approve it

Review Meeting

Value

Second opinion on clarity, effectiveness, and efficiency

Learning from others

Avoids "author blindness" in seeing flaws

Peer pressure to be neat and tie up loose ends

Reviews

Models exist for either the reviewer or the author to lead the discussion

The author usually provides participants materials to study in advance

Requires positive and open attitudes, and preparation

Roles: Author, Moderator, Scribe

Review panel: peers, experts, client(s)

Lightweight Peer Reviews

One person drives while the other watches/reviews

Derived from Extreme Programming; currently a favorite in agile

When compared to solo dev models, MAY cause higher initial cost per module created (time and resource), BUT higher quality and lower overall cost

Pair Programming

Continuous review

Shared problem solving

Better communications

Learning from Peer

Social!

Peer Pressure

See as an example: http://collaboration.csc.ncsu.edu/laurie/Papers/XPSardinia.PDF

Clarity

Can the reader easily and directly understand what the artifact is doing?

Correctness

Analysis of algorithm used

Common Code Faults

1. Data: initialization, value ranges, and type mismatches

2. Control: are all the branches really necessary (are the conditions properly and efficiently organized)? Do loops terminate?

3. Input: are all parameters or collected values used?

4. Output: is every output assigned a value?

5. Interface faults: parameter numbers, types, and order; structures and shared memory

6. Storage management: memory allocation, garbage collection, inefficient memory access

7. Exception handling: what can go wrong, what error conditions are defined, and how are they handled?

What do reviews look for?

List adapted from W. Arms: http://www.cs.cornell.edu/Courses/cs5150/2015fa/slides/H2-testing.pdf

You are asked to sort an array. There are many algorithms to sort an array. [You aren't going to use a library function, so you have to write this.]

Many choices exist. Suppose you are deciding between bubble sort, quicksort, and merge sort. All will work (sort an array), but which will be the better code?

Examples

Bubble sort is very easy to write: two loops. Slow on average, O(n²) comparisons - how big will n be? In-place, so only O(1) auxiliary memory.

Quicksort is complicated to write. O(n log n) on average, O(n²) worst case. Requires O(log n) auxiliary memory on average for the recursion. Very effective on in-memory data; most implementations are very fast.

Merge sort is moderate to write. O(n log n) worst case. Memory required is a function of the data structure (typically O(n) auxiliary for arrays). Very effective on data that requires external access.
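For concreteness, a minimal merge sort sketch (illustrative only; the function name and vector-of-int signature are assumptions, not from the slides):

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Merge sort: O(n log n) comparisons even in the worst case, at the
// cost of O(n) auxiliary memory for the temporary buffers.
std::vector<int> mergeSort(const std::vector<int>& a) {
    if (a.size() <= 1) return a;                 // base case: already sorted
    std::size_t mid = a.size() / 2;
    std::vector<int> left(a.begin(), a.begin() + mid);
    std::vector<int> right(a.begin() + mid, a.end());
    left = mergeSort(left);
    right = mergeSort(right);
    std::vector<int> out;
    out.reserve(a.size());
    // Standard two-way merge of the two sorted halves.
    std::merge(left.begin(), left.end(), right.begin(), right.end(),
               std::back_inserter(out));
    return out;
}
```

A reviewer judging correctness would check the base case, the split point, and that the merge preserves all elements.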

bool SquareRoot (double dValue,
                 double &dSquareRoot)
{
    bool bRetValue = false;
    if (dValue < 0) {
        dSquareRoot = 0.0;   // no real root; leave the output defined
        bRetValue = false;
    }
    else {
        dSquareRoot = pow(dValue, 0.5);
        bRetValue = true;
    }
    return bRetValue;
}

bool SquareRoot (double dValue,
                 double &dSquareRoot)
{
    dSquareRoot = 0.0;       // default for the failure path
    if (dValue < 0)
        return false;
    dSquareRoot = pow(dValue, 0.5);
    return true;
}

Expressively Logical…

Evaluate code modules automatically looking for errors or odd things

Loops or programs with multiple exits (more common) or entries (less common)

Undeclared, uninitialized, or unused variables

Unused functions/procedures, parameter mismatches

Unassigned pointers

Memory leaks

Show paths through code/system

Show how outputs depend on inputs

Static Program Analyzers

1. Write SIMPLE code

2. If code is difficult to read, RE-WRITE IT

3. Test implicit assumptions - check all parameters passed in from other modules

4. Eliminate all compiler warnings from code

5. It never hurts to check system states after modification

Rules of Defensive Programming

(taken from Bill Arms)

Based on Murphy’s Law:

Anything that can go wrong, will
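A small sketch of rule 3 in practice (the function and its checks are illustrative assumptions, not from the slides): every parameter arriving from another module is validated before use.

```cpp
#include <cmath>
#include <stdexcept>

// Defensive division: check the implicit assumptions (finite inputs,
// nonzero denominator) instead of trusting the caller.
double safeDivide(double numerator, double denominator) {
    if (!std::isfinite(numerator) || !std::isfinite(denominator))
        throw std::invalid_argument("safeDivide: non-finite input");
    if (denominator == 0.0)
        throw std::invalid_argument("safeDivide: division by zero");
    return numerator / denominator;
}
```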

Quick Terminology

• Mistake - a human action that results in an incorrect result

• Fault / Defect - an incorrect step, process, or data within the software

• Failure - the inability of the software to perform within performance criteria

• Error - the difference between the observed and the expected value or behavior

Objective

Write test cases and organize them into suites that cause failure and illuminate faults

Ideally you will fail in striving for this objective, but you will be surprised how successful you may be

Dynamic Evaluations

• Developers - good for exposing known risk areas

• Experienced outsiders and clients - good for finding gaps missed by developers

• Inexperienced users - good for finding other errors

• Mother Nature - always finds the hidden flaw

Who is a Tester?

1. Top Down

– System flows are tested

– Units are stubbed

2. Bottom Up

– Each unit is tested on its own

3. Stress – Test at or past design limits

Approaches

Especially useful in:
• UIs, UX
• Workflows
• Very large systems

Testing Flow

(Dynamic Evaluation)

[Diagram: testing flow. Unit Test; System Test (Integration, Functional, Performance, Installation); Operational Test (Soak / Operational Readiness); Client Test (Acceptance)]

Two Forms of Testing: Black Box and White Box

• No access to the internal workings of the system under test (SUT)

• Testing against specifications - the tester knows what the SUT's I/O or behavior should be

• The tester observes the results or behavior

• With software, this tests the interface:
  → What is input to the system?
  → What can you do from the outside to change the system?
  → What is output from the system?

Black Box Testing

Can a Component Developer Do Black Box Testing?

White Box Testing

• Have access to the internal workings of the system under test (SUT)

• Testing against specifications, with access to algorithms, data structures, and messaging

• The tester observes the results or behavior

• Testing evaluates logical paths through the code: conditionals, loops, branches

• It is impossible to exercise all paths completely, so you make compromises:
  - Focus on only the important paths (keeping components small is a big help here)
  - Focus on only the important data structures

Tests focus on an individual component:

1. Interfaces

2. Messages

3. Shared memory

4. Internal functions

Emphasizes adherence to the specifications

Code bases often include the code and the unit tests as a coherent piece; unit testing is usually done by the developers building the component

Unit tests decouple the developer from the code: individual code ownership is not required if unit tests protect the code

Unit tests enable refactoring: after each small change, the unit tests can verify that a change in structure did not introduce a change in functionality

Ground Floor – Unit Testing

What Makes for a Good Test

Test Perspective

• Either addresses a partition of inputs or tests for common developer errors

• Automated

• Runs fast - to encourage frequent use

• Small in scope - test one thing at a time

• When a failure occurs, it should pinpoint the issue and not require much debugging
  - Failure messages help make the issue clear
  - You should not have to refer to the test to understand the issue

Tester Perspective

Know why the test exists:

- It should target finding specific problems

- It should optimize the cost of defining and running the test against the likelihood of finding a fault/failure
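As a sketch of these properties, two assert-based unit tests for a square-root routine like the earlier example (the names and framework-free style are assumptions; a real project would likely use a test framework):

```cpp
#include <cassert>
#include <cmath>

// Stand-in for the component under test (mirrors the SquareRoot slide).
bool squareRoot(double value, double& result) {
    if (value < 0) return false;   // no real root for negative input
    result = std::sqrt(value);
    return true;
}

// Each test is small, fast, and checks exactly one behavior,
// so a failure pinpoints the issue without much debugging.
void testRejectsNegativeInput() {
    double r = 0.0;
    assert(!squareRoot(-1.0, r));
}

void testComputesExactRoot() {
    double r = 0.0;
    assert(squareRoot(9.0, r));
    assert(r == 3.0);              // 9.0 has an exact double square root
}
```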

Organizing Testing

Test Plan - describes test activities:

1. Scope

2. Approach

3. Resources

4. Schedule

Identifies

• What is to be tested

• The tasks required to do the testing

• Who will do each task

• The test environment

• The test design techniques

• Entry and exit criteria to be used

• Risk identification and contingency planning

Test Suite

A set of test cases and scripts to measure answers

Often the post-condition of one test is used as the precondition for the next one

OR

Tests may be executed in any order

Adapted from http://sqa.stackexchange.com/questions/9119/test-suite-vs-test-plan

An assessment of a defect’s impact

Can be a major source of contention between dev and test

Defect Severity

Critical Show stopper. The functionality cannot be delivered unless that defect is cleared. It does not have a workaround.

Major Major flaw in functionality but it still can be released. There is a workaround; but it is not obvious and is difficult.

Minor Affects minor functionality or non-critical data. There is an easy workaround.

Trivial Does not affect functionality or data. It does not even need a workaround. It does not impact productivity or efficiency. It is merely an inconvenience.

1. Document Purpose - short description of the objective

2. Application Overview - overview of the SUT

3. Testing Scope - describes the functions/modules in and out of scope for testing; also identifies what was omitted

4. Metrics - results of testing, including summaries:
   - Number of test cases planned vs. executed
   - Number of test cases passed/failed
   - Number of defects identified, with their status and severity
   - Distribution of defects

5. Types of testing performed - description of the tests run

6. Test Environment and Tools - description of the environment; helpful for recreating issues and understanding context

7. Recommendations - workaround options

8. Exit Criteria - statement of whether the SUT passes or not

9. Conclusion / Sign-Off - go / no-go recommendation

Test Exit Report – Input to Go/No Go Decision

• If a single value, try:
  - Negative values
  - Alternate types
  - Very small or very large inputs (overflow buffers if you can)
  - Null values

• If input is a sequence, try:
  - Using a single-valued sequence
  - Repeated values
  - Varying the length of sequences and the order of the data
  - Forcing situations where the first, last, and middle values are used

• Try to force each and every error message

• Try to force computational overflows or underflows

Testing Hint #1 – Mess With Inputs
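A sketch of input-messing in practice, against a hypothetical clampPercent component (the function is invented for illustration): probe negatives, boundaries, and extreme magnitudes.

```cpp
#include <limits>

// Hypothetical component under test: clamp a raw reading to [0, 100].
int clampPercent(long long value) {
    if (value < 0)   return 0;     // negative-input partition
    if (value > 100) return 100;   // very-large-input partition
    return static_cast<int>(value);
}
```

Good input tests hit each partition plus the boundary values 0 and 100, and push the type to its extremes.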

Each logical path - each execution path through the code - must be exercised at least once

• If…then…else = two paths

• Switch…case() = one path per case, plus one path if there is no catch-all case

• Repeat…Until ≥ two paths

• While…Do ≥ two paths

• Object member functions = one path per signature

Testing Hint #2 – Force Every Path
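A minimal illustration of path counting (the classify function is an invented example): an if…then…else has two paths, so full path coverage needs at least one test per branch.

```cpp
// Two execution paths: one test exercising each branch gives path coverage.
const char* classify(int n) {
    if (n % 2 == 0)
        return "even";   // path 1: the 'then' branch
    else
        return "odd";    // path 2: the 'else' branch
}
```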

• Remember, interfaces may involve:
  1. References to data or functions
     - Data may be passed by reference or by value
     - Methods only have data interfaces
  2. Shared memory
  3. Messages

• Set interface parameters to extremely low and high values

• Set pointer values to NULL

• Mis-type the parameters or violate value boundaries – e.g. set input as negative where the signature expects ≥ 0

• Call the component so it will fail and check the failure reactions

• Pass too few or too many parameters

• Bombard the interface with messages

• With shared memory, vary accessor instantiation and access activities

Testing Hint #3 – Mess With Interfaces

Internals: 1. Functions, 2. Data

Try to break the system by using data with extreme values to crash the system

Testing Hint #4 – Be Diabolical

• If unit testing is not thorough, all subsequent testing will likely be a waste of time.

• You should always take the time to do a good job with unit testing - even when the project is falling behind.

• The end of a project is almost always compressed - developers often defer testing-related tasks until as late as possible.

• Unit tests will be most needed when you have the least amount of time - unit tests should be created before they are needed, not when you need them.

Life Lessons

Like unit testing, activities focus on following uses and data:

1. Typical

2. Boundaries

3. Outliers

4. Failures

Unlike Unit Test

• Components may come from many, independent parties

• Bespoke development may meet Off-The-Shelf or reused components

• Testing becomes a group activity

• Testing may move to an independent team altogether

System Test

Integrating components and sub-systems to create the system

Testing checks component compatibility, interactions, correctly passing information, and timing

WARNING: integration testing will be a complete and utter waste if components are not thoroughly tested

Some behavior is only clear when you put components together. This has to be tested too, although it can be very hard to plan in advance!

Unlike Components, Systems Have Emergent Behavior

Integrating Multiple Parties May Introduce Conflict

System Integration

• Components may come from multiple, possibly independent, parties

• Bespoke development may meet off-the-shelf or reused components

• Testing becomes a group activity

• Testing may move to an independent team altogether

Implications

• Who controls integration readiness?
  - What does lab entry mean?
  - Are COTS components trusted?

• How to assign credit for test results, and who is responsible for repairs?
  - How to maintain momentum when everyone isn't at the table?
  - What if partner priorities are not shared?
  - What about open source?

Use Cases are a useful testing model

• Forces components to interact

• Sequence diagrams form a strong basis for designing these tests - they articulate the inputs required and the expected behaviors and outputs

Testing Focus

Emphasizes component compatibility, interactions, correctly passing information, and timing

Integration aims to find misunderstandings one component introduces when it interacts with other components

Two senses

1. Create tests incrementally

2. Run tests iteratively:
   a. On check-in and branch merge, test all affected modules
   b. On check-in, test all modules
   c. Per a schedule (e.g., daily), test all modules

Each change, especially after a bug fix, should mean adding at least one new test case

It is always best to test after each change as completely as you can, and to test completely before a release

Iterative Development Leads to Iterative Testing

Regression Testing


Picking the Subset

Selection based on company policy:

• Every statement must be executed at least once

• Every path must be exercised

• Crafted from specific end-user use cases (scenario testing)

Selection based on testing team experience

Your testing is good enough until a problem shows that it is not good enough

It is hard to know when you should feel enough confidence to release the system

Confidence comes, in part, from the subset of possible tests selected

Defects found during testing, by software quality and test quality:

• Low software quality, high test quality: many defects found
• Low software quality, low test quality: few defects found
• High software quality, either test quality: few defects found

defectDensity_release(i) = (bugs_pre-release(i) + bugs_post-release(i)) / codeMeasure(i)

If the density for the next release's additional code is within the ranges of prior releases, it is a candidate for release

Unless test or development practices have improved

Measuring Quality: Defect Density

Using the past to estimate the future: judges code stability by comparing past numbers of bugs per code measure (lines of code, number of modules,…) to presently measured levels
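The density calculation itself is simple arithmetic; a sketch (the function name and the KLOC-based code measure are assumptions, and the numbers are invented):

```cpp
// Defect density = (pre-release bugs + post-release bugs) / code measure.
// Here the code measure is thousands of lines of code (KLOC).
double defectDensity(int bugsPreRelease, int bugsPostRelease, double kloc) {
    return (bugsPreRelease + bugsPostRelease) / kloc;
}
```

For example, 30 pre-release and 10 post-release bugs over 8 KLOC gives a density of 5.0 defects/KLOC, to be compared against prior releases.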

[Chart: defect density (scale 0-10) across releases, with example values 7 and 9.5 for Releases 1 and 2, and regions labeled "Poor Test Coverage/Quality", "Poor Software Quality", and "Expected Quality"]

Using a known quantity as inference to the unknown

Judges code stability by intentionally inserting bugs into a program and then measuring how many get found, as an estimator for the actual number of bugs:

bugs_release(i) = (seededBugs_planted(i) / seededBugs_found(i)) * bugs_found(i)

Challenges

1. Seeding is not easy. Placing the right kinds of bugs in enough of the code is hard.
   - Bad seeding (bugs too easy or too hard to find) creates a false sense of confidence in your reviews and testing
     • Too easy: doesn't mean that most or all of the real bugs were found
     • Too hard: danger of looking past the good-enough ("Goodenov") line or for things that aren't there

2. Seeded code must be cleansed of any missed seeds before release. Post clean-up, the code must be tested to ensure nothing got accidentally broken.

Measuring Quality: Defect Seeding
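The estimator above is a one-line calculation; a sketch with invented numbers (the function name is an assumption):

```cpp
// Seeding estimator: if testing recovered seedsFound of seedsPlanted
// seeded bugs while also finding realBugsFound real bugs, estimate the
// total real-bug population by scaling up in the same proportion.
double estimateTotalBugs(int seedsPlanted, int seedsFound, int realBugsFound) {
    return static_cast<double>(seedsPlanted) / seedsFound * realBugsFound;
}
```

Planting 20 seeds, recovering 10 of them, and finding 35 real bugs estimates (20 / 10) * 35 = 70 real bugs in total, i.e., roughly half remain undiscovered.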

Applies the estimating technique used in predicting wildlife populations (Humphrey, Introduction to the Team Software Process, Addison-Wesley, 2000)

Uses data collected by two or more independent collectors, via reviews or tests

Example: Estimating Turtle Population

You tag 5 turtles and release them. You later catch 10 turtles; two have tags.

(total # of turtles) / (5 tagged turtles) ≈ (10 turtles caught) / (2 tagged turtles caught)

total # of turtles = (10 turtles * 5 turtles) / 2 turtles = 25 turtles

Measuring Quality: Capture-Recapture

Each collector finds some defects out of the total number of defects; some of the defects found will overlap.

Method:

1. Count the number of defects found by each collector (A, B)

2. Count the number of intersecting defects found by both collectors (C)

3. Calculate defects found = (A + B) - C

4. Estimate total defects = (A * B) / C

5. Estimate remaining defects = (A * B) / C - ((A + B) - C)

If there are multiple collectors, assign A to the highest collected number and set B to the rest of the collected defects. When multiple engineers find the same defect, count it just once.

Capture-Recapture
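The method above reduces to a few integer formulas; a sketch using the turtle numbers as collector counts (function names are assumptions):

```cpp
// A, B: defects found by each collector; C: defects found by both.
int defectsFound(int a, int b, int c)      { return a + b - c; }
int estimateTotal(int a, int b, int c)     { return (a * b) / c; }
int estimateRemaining(int a, int b, int c) {
    return estimateTotal(a, b, c) - defectsFound(a, b, c);
}
```

With A = 5, B = 10, C = 2 (the turtle example), the estimated total is 25 defects, 13 are already found, and about 12 remain.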

Performance - aims to assess compliance with non-functional requirements

Stress - identify defects that emerge only under load

Performance Testing

Measures the system's capacity to process load; involves creating and executing an operational profile that reflects the expected pattern of use

Endurance - measures reliability and availability

Ideally the system should degrade gracefully rather than collapse under load. Under load, other issues like protocol overhead or timing take center stage.