Software Quality: Testing and Verification I
© Lethbridge/Laganière 2001
Chapter 9: Architecting and designing software
Software Flaws are identified at three levels
1. A failure is an unacceptable behaviour exhibited by a system.
— The frequency of failures measures software reliability: a low failure rate means high reliability.
— Failures result from violation of a requirement.
2. A defect is a flaw that contributes to a failure.
— It might take several defects to cause one failure.
3. An error is a mistaken decision, made during software development, that leads to a defect.
Eliminating Failures: Testing vs Verification
Testing = running the program with a set of inputs to gain confidence that the software has few defects.
— Goal: reduce the frequency of failures.
— When done: after the programming is complete.
— Methodology: develop test cases; run the program with each test case.
Verification = formally proving that the software has no defects.
— Goal: eliminate failures.
— When done: before and after the programming is complete.
— Methodology: write separate specifications for the code; prove that the code and the specifications are mathematically equivalent.
Effective and Efficient Testing
Effective testing uncovers as many defects as possible.
Efficient testing finds defects using the fewest possible tests.
•Good testing is like detective work:
—The tester must try to understand how programmers and designers think, so as to better find defects.
—The tester must cover all the use case scenarios and options.
—The tester must be suspicious of everything.
—The tester must not take a lot of time.
•The tester is not the programmer.
Testing Methods
1. Black box: Testers run the software with a collection of inputs and observe the outputs.
— None of the source code or design documentation is available.
2. Glass box (aka ‘white-box’ or ‘structural’): Testers watch all the steps taken by the software during a run.
— Testers have access to the source code and documentation.
— Individual programmers often use glass-box testing to verify their own code.
Equivalence classes
•It is impossible to test a software product by brute force, using every possible input value.
•So a tester divides all the inputs into groups that will be treated similarly by the software.
—These groups are called equivalence classes.
—A representative from each group is called a test case.
—The assumption is that if the software has no defects for the test case, then it will have no defects for the entire equivalence class.
•This approach is practical, but it is also flawed: it will not find all defects.
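As a minimal sketch of this idea, assuming a hypothetical validator for a month number in the range 1-12, one representative test case is chosen per equivalence class:

```java
// Hedged sketch: a hypothetical validator illustrating one representative
// test case per equivalence class ([-inf..0], [1..12], [13..inf]).
public class MonthValidator {
    public static boolean isValidMonth(int month) {
        return month >= 1 && month <= 12;
    }

    public static void main(String[] args) {
        // Representatives: -1 and 45 should be rejected, 5 accepted.
        System.out.println(isValidMonth(-1)); // false
        System.out.println(isValidMonth(5));  // true
        System.out.println(isValidMonth(45)); // false
    }
}
```

If the validator handles each representative correctly, the assumption is that it handles the rest of that class correctly too; as noted above, that assumption can fail.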
Examples of equivalence classes
1. Valid input is a month number (1-12). Equivalence classes could be: [-∞..0], [1..12], [13..∞].
— E.g., the three test cases could be -1, 5, and 45.
2. Valid input is a course id, with a department name (e.g., CSCI), a 3-digit number (e.g., 260) in the range 001-499, and an optional section (e.g., A, B, C, D, or E). Equivalence classes (test cases) could be:
— A valid course id from each one of the 25 departments, each having a 3-digit number in the range 001-499.
— A valid course id with a section.
— A course id with an invalid department name.
— A course id with an invalid number.
— A course id with an invalid section.
Fighting combinatorial explosion
•Combinatorial explosion means that you cannot realistically use a test case from every combination of equivalence classes across the system.
—E.g., with just 5 inputs and 10 possible values each, the system has 10^5 = 100,000 combinations of equivalence classes.
•So…
—Make sure that at least one test case represents each equivalence class of every input.
—Include test cases just inside the boundaries of the input values.
—Include test cases just outside the boundaries.
—Include a few random test cases.
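The boundary-focused strategy above can be sketched for a hypothetical range check on [1..12]:

```java
// Hedged sketch: boundary-value cases for a hypothetical range check [1..12].
public class BoundaryCases {
    static boolean inRange(int n) {
        return n >= 1 && n <= 12;
    }

    public static void main(String[] args) {
        // Just inside the boundaries:
        System.out.println(inRange(1) && inRange(12)); // true
        // Just outside the boundaries:
        System.out.println(inRange(0) || inRange(13)); // false
        // A few random interior and exterior cases:
        System.out.println(inRange(7));   // true
        System.out.println(inRange(-40)); // false
    }
}
```

Boundary cases are emphasized because off-by-one defects (e.g., writing `>` instead of `>=`) only reveal themselves at the edges of a range.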
Common Programming Errors
1. Incorrect logical conditions on loops and conditionals.
E.g., the landing gear must be deployed whenever the plane is within 2 minutes from landing or takeoff, or within 2000 feet from the ground. If visibility is less than 1000 feet, then the landing gear must be deployed whenever the plane is within 3 minutes from landing or lower than 2500 feet.

if (!landingGearDeployed &&
    (min(now - takeoffTime, estLandTime - now) < (visibility < 1000 ? 180 : 120)
     || relativeAltitude < (visibility < 1000 ? 2500 : 2000))) {
  throw new LandingGearException();
}
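One way to make such a condition reviewable is to name its parts. The sketch below assumes the slide's variables (now, takeoffTime, estLandTime, visibility, relativeAltitude) and substitutes IllegalStateException for the slide's LandingGearException:

```java
public class LandingGearCheck {
    // Sketch only: same rule as the one-line condition, decomposed into
    // named booleans so each clause can be checked against the requirement.
    static void check(boolean landingGearDeployed,
                      long now, long takeoffTime, long estLandTime,
                      int visibility, int relativeAltitude) {
        boolean lowVisibility = visibility < 1000;
        long timeThresholdSec = lowVisibility ? 180 : 120;
        int altitudeThresholdFt = lowVisibility ? 2500 : 2000;
        long secsFromTakeoffOrLanding =
                Math.min(now - takeoffTime, estLandTime - now);
        boolean gearRequired =
                secsFromTakeoffOrLanding < timeThresholdSec
                || relativeAltitude < altitudeThresholdFt;
        if (!landingGearDeployed && gearRequired) {
            // Stand-in for the slide's LandingGearException.
            throw new IllegalStateException("landing gear must be deployed");
        }
    }
}
```

Each named boolean now corresponds to one clause of the requirement, so a reviewer (or an equivalence-class test) can check them independently.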
2. Performing a calculation in the wrong part of a control construct.
E.g.:
while (j < maximum) {
  k = someOperation(j);
  j++;
}
if (k == -1) signalAnError();
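The check after the loop only sees the last iteration's result (and fails outright if the loop never runs). A hedged corrected sketch, with a hypothetical someOperation that returns -1 to signal an error, moves the check inside the loop:

```java
public class LoopCheck {
    // Hypothetical operation: returns -1 to signal an error (here, for j == 3).
    static int someOperation(int j) {
        return j == 3 ? -1 : j * 2;
    }

    // The check belongs inside the loop, next to the calculation, so an
    // error from any iteration is caught, not just the last one.
    static boolean processAll(int maximum) {
        for (int j = 0; j < maximum; j++) {
            int k = someOperation(j);
            if (k == -1) {
                return false; // signal the error as soon as it occurs
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(processAll(3)); // true: j = 0..2 never fails
        System.out.println(processAll(5)); // false: j = 3 is caught
    }
}
```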
3. Not terminating a loop or recursive method properly.
E.g.:
while (i < courses.size())
  if (id.equals(courses.getElement(i))) … ;
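A corrected sketch of that search loop (using java.util.List in place of the slide's courses collection); the increment guarantees termination even when no element matches:

```java
import java.util.List;

public class CourseSearch {
    static int indexOf(List<String> courses, String id) {
        int i = 0;
        while (i < courses.size()) {
            if (id.equals(courses.get(i))) {
                return i;
            }
            i++; // forgetting this increment is the defect on the slide
        }
        return -1; // loop terminates even when id is absent
    }
}
```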
4. Not enforcing the preconditions (correctly) in a use case.
E.g., failure to check that a courseOffering is not full before adding a student to its class list.
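A hedged sketch of enforcing that precondition (class and method names are illustrative, not the book's):

```java
import java.util.ArrayList;
import java.util.List;

public class CourseOffering {
    private final int capacity;
    private final List<String> classList = new ArrayList<>();

    public CourseOffering(int capacity) {
        this.capacity = capacity;
    }

    // Precondition check: refuse to overfill the offering.
    public boolean addStudent(String studentId) {
        if (classList.size() >= capacity) {
            return false; // offering is full; reject instead of overfilling
        }
        classList.add(studentId);
        return true;
    }

    public int enrolled() {
        return classList.size();
    }
}
```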
5. Not handling null conditions (null references) properly.
E.g., a Student with no schedule.
6. Not handling singleton conditions (one or zero of something that is normally more than one).
E.g., a schedule with 0 courses in it.
7. Off-by-one errors.
E.g.: for (i = 1; i < arrayname.length; i++) { /* do something */ }
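In that loop, starting the index at 1 silently skips element 0, since Java arrays are zero-indexed. A corrected sketch:

```java
public class OffByOne {
    static int sum(int[] values) {
        int total = 0;
        // Java arrays are zero-indexed: start at 0, not 1.
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[]{5, 1, 2})); // 8 (starting at 1 would give 3)
    }
}
```

This is also why the earlier advice about boundary test cases matters: only an input that exercises the first element exposes this defect.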
8. Operator precedence errors.
E.g., x*y+z instead of x*(y+z).
9. Use of inappropriate standard algorithms.
E.g., a non-stable sort where a stable one is needed.
Defects in Numerical Algorithms
1. Not enough bits or digits (magnitude/overflow).
2. Not enough decimal places (precision).
3. Ordering operations poorly, allowing errors to propagate.
4. Assuming exact equality between two floating point values.
E.g., use abs(v1 - v2) < epsilon instead of v1 == v2.
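Point 4 can be demonstrated directly in Java, where 0.1 + 0.2 does not equal 0.3 exactly:

```java
public class FloatCompare {
    static final double EPSILON = 1e-9;

    // Compare floating-point values by tolerance, never by ==.
    static boolean approxEqual(double v1, double v2) {
        return Math.abs(v1 - v2) < EPSILON;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2; // actually 0.30000000000000004
        System.out.println(sum == 0.3);            // false: exact equality fails
        System.out.println(approxEqual(sum, 0.3)); // true: tolerance comparison works
    }
}
```

The right epsilon depends on the magnitudes involved; a fixed absolute tolerance like this one is a simplification.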
Defects in Timing and Co-ordination
Critical race
— One thread fails because another thread interferes with the ‘normal’ sequence of events.
— Critical races can be prevented by locking data so they cannot be accessed by another thread simultaneously. In Java, a synchronized method locks an object until the method terminates.
E.g., consider two students wanting to add the same courseOffering to their schedules at the same time. These two threads must be synchronized in order to prevent a critical race.
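A hedged sketch of that example (class and field names assumed): without synchronized, two student threads could both pass the "not full" check before either registers; with it, the check-then-increment sequence is atomic.

```java
public class SafeOffering {
    private final int capacity;
    private int enrolled = 0;

    public SafeOffering(int capacity) {
        this.capacity = capacity;
    }

    // synchronized locks this object for the whole method, so no other
    // thread can run addStudent (or enrolled) on it at the same time.
    public synchronized boolean addStudent(String studentId) {
        if (enrolled >= capacity) {
            return false;
        }
        enrolled++;
        return true;
    }

    public synchronized int enrolled() {
        return enrolled;
    }
}
```

With a capacity-1 offering, exactly one of two concurrent addStudent calls can succeed.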
Deadlock and livelock
— Deadlock is a situation where two or more threads are stopped, each waiting for the other to do something. The system hangs and the threads cannot do anything.
— Livelock is similar, except that the threads can still do some computation, even though the system as a whole makes no progress.
E.g., consider a student wanting to access a course that another student is adding to her schedule, and the other student suspends this action and goes to lunch. How can this kind of deadlock be prevented in StressFree?
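One answer is to acquire locks with a timeout, so a suspended action cannot block everyone else indefinitely. A sketch using java.util.concurrent's ReentrantLock.tryLock (the schedule-update names are assumed, not StressFree's actual design):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TimedScheduleAccess {
    private final ReentrantLock lock = new ReentrantLock();

    // Returns false instead of waiting forever if another user holds the lock.
    public boolean updateSchedule(Runnable action) {
        boolean acquired;
        try {
            // Give up after 2 seconds instead of blocking indefinitely.
            acquired = lock.tryLock(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!acquired) {
            return false; // report failure; the caller can retry later
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock(); // always release, even if the action fails
        }
    }
}
```

Other standard preventions include always acquiring multiple locks in a fixed global order, or releasing locks before any long-running or user-suspendible step.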
Defects in Handling Other Unusual Situations
1. Insufficient throughput or response time.
2. Incompatibility with specific hardware/software configurations.
3. Inability to handle peak loads or missing resources.
4. Inappropriate management of resources.
5. Inability to recover from a crash.
6. Ineffective documentation (user manual, reference manual or on-line help).
Strategies for Testing Large Systems
Big bang vs integration testing
• In big bang testing, you test the entire system as a unit.
• A better strategy is incremental testing:
— First test each individual subsystem alone (unit testing).
— Then add more and more subsystems and test them together (integration testing).
— This can be done horizontally or vertically, depending on the architecture (e.g., a client-server architecture allows horizontal testing: server side first and client side second).
Top-down vs Bottom-up testing
Top-down
1. Start by testing the user interface (GUI).
— Simulate the underlying functionality using stubs (code with the same interface but no functionality).
2. Then work downward, integrating lower and lower layers one at a time.
Bottom-up
1. Start by testing the very lowest levels of the software.
— Use drivers to test these modules (drivers are simple programs that call the modules at the lower layers).
2. Then work upward, replacing the drivers with the actual modules that call the lower-level modules.
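The two techniques can be sketched together (interface and class names are illustrative): a stub implements the real interface with canned behaviour for top-down testing, while a driver is a small program that exercises a lower-layer module directly for bottom-up testing.

```java
// Assumed interface standing in for a lower layer of the system.
interface CourseCatalog {
    boolean courseExists(String id);
}

// Stub (top-down): same interface, canned behaviour, no real functionality.
class CourseCatalogStub implements CourseCatalog {
    public boolean courseExists(String id) {
        return true; // pretend every course exists
    }
}

// Driver (bottom-up): a simple program that calls a lower-layer module
// and checks its answers.
public class CatalogDriver {
    static boolean exercise(CourseCatalog catalog) {
        return catalog.courseExists("CSCI260");
    }

    public static void main(String[] args) {
        System.out.println(exercise(new CourseCatalogStub())); // true
    }
}
```

In top-down testing the stub is later replaced by the real catalog; in bottom-up testing the driver is replaced by the real upper-layer caller.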
Strategies for incremental testing
The test-fix-test cycle
When testing exposes a failure:
1. A failure report is entered into a failure tracking system.
2. The failure is screened and assigned a priority.
3. Low-priority failures might be put on a known bugs list included with the software’s release notes.
4. Some failure reports might be merged if they seem to expose the same defects.
5. The failure is investigated.
6. The defect causing the failure is tracked down and fixed.
7. A new version of the software is created, and this cycle is repeated.
The ripple effect
Efforts to remove one defect will likely add new ones:
•The maintainer tries to fix problems without fully understanding the ramifications.
•The maintainer makes ordinary human errors.
•The system can regress into a more and more failure-prone state.
Regression testing re-runs a subset of the previously successful test cases at each iteration (i.e., it focuses on the trouble spots):
•It’s expensive to re-run every test case every time the software is updated.
•Regression test cases are carefully selected to cover as much of the system as possible.
So when do we stop testing?
Stop testing when:
1. All the level 1 test cases have been successfully executed.
2. A certain predefined percentage of level 2 and level 3 test cases have been executed successfully.
3. The targets have been achieved and are maintained for at least two build cycles, where a build involves compiling and integrating all the system’s components.
Failure rates fluctuate between builds because:
— Different sets of regression tests are used, and
— New defects are introduced as old ones are fixed.
Who is involved in testing?
1. Original developers conduct the first pass of unit and integration testing.
2. A separate group of developers conducts independent testing.
— They have no vested interest, and
— They have specific expertise in test case design and test tool utilization.
3. Users and clients:
— Alpha testing: performed under the supervision of the software development team.
— Beta testing: performed in a normal work environment. (An open beta release is the release of low-quality software to the general population.)
— Acceptance testing: performed by customers on their own initiative.
Inspections
An activity in which one or more people critically examine source code or documentation, looking for defects.
•Inspections are normally team activities, with roles:
— The author
— The moderator
— The secretary
— The paraphrasers, who try to explain the code
•A peer review process.
•Inspect only completed documents.
•Complementary to testing: inspections are better at finding maintainability or efficiency defects.
•Inspect before testing.
Quality Assurance: When things go wrong…
Perform root cause analysis.
Determine whether problems are caused by:
— Lack of training
— Schedules that are too tight
— Poor designs or choices of reusable components
Measure:
— the number of failures encountered by users
— the number of failures found when testing
— the number of failures found when inspecting
— the percentage of code that is reused
— the number of questions asked by users at the help desk (as a measure of usability and the quality of documentation)
Strive for continual improvement
Software Process standards
The personal software process (PSP):
• A disciplined approach that a developer can use to improve the quality and efficiency of his or her personal work.
• One of its key tenets is personally inspecting your own work.
The team software process (TSP):
• Describes how teams of software engineers can work together effectively.
The software capability maturity model (CMM):
• Contains five levels. Organizations start at level 1, and as their processes become better they can move up towards level 5.
ISO 9000-2:
• An international standard that describes how an organization can improve its overall software process.
Difficulties and Risks in Quality Assurance
It’s easy to forget to test some aspects of a software system:
— ‘Running the code a few times’ is not enough.
— Forgetting certain types of tests impacts quality.
There’s a natural conflict between quality and meeting deadlines. So…
— Create a separate department to oversee QA.
— Publish statistics about quality.
— Plan adequate time for all activities.
People have different skills, knowledge, and preferences when it comes to quality. So…
— Assign tasks that fit their strengths.
— Train people in testing and inspecting techniques.
— Provide feedback about performance vis-à-vis software quality.
— Require developers and maintainers to work alternately on a testing team.