International Conference On Software Testing, Analysis & Review
Nov 8-12, 1999, Barcelona, Spain

F9: Friday, Nov 12, 1999
Les Hatton
Testing, Complexity, Coupling, Diagnosis and Repetitive Failure



Title Slide

OAKWOOD COMPUTING - SURVIVAL AND AVOIDANCE STRATEGIES FOR SOFTWARE FAILURE

"Testing: the influence of complexity, coupling, repetitive failure and diagnosis"

EuroStar’99, Barcelona, 12th Nov, 1999

by

Les Hatton

Oakwood Computing, Surrey, U.K. and

Computing Laboratory, University of Kent, UK

[email protected]

Version 1.0: 30/Sep/1999

©Copyright, L.Hatton, 1999


© OC Copyright, 1999. EuroStar’99: v. 1.0, 30/Sep/1999, (slide 2)

Overview of talk

• Some observations on testing
• The nature of fault
• Repetitive failure and diagnosis
• Conclusions


Observations

• Why do we test?
• Growing problems
• Risk management


Why do we test?

• The classic view is to find fault. A successful test causes the system to fail.
• A more modern view is that testing does two things:-
  – Finds faults, and
  – Quantifies the run-time behaviour
      – Risk management
      – Demonstration of standard of care
      – To study severity of system failure


Growing problems for testing

• Testing is getting harder in modern systems because of:-
  – Increasing complexity
  – Increasing coupling
  – Failures in education
  – ‘Reduced time to market’
• The balance between fault finding and risk assessment is also changing because of:-
  – Increasing diagnostic problems


Increasing complexity

The amount of software in consumer electronic products is currently doubling about every 18 months.

• Line-scan TVs have ~250,000 lines of C.
• There are around 200,000 lines of C in a car.
• Modern commercial passenger aircraft have between 2 and 5 million lines of code in over a hundred computer systems.


Increasing complexity

Flat system

Function 1

Function 2

Function 3

Function 4

For M modules and N paths per module, complexity ~ M x N

Tests easy to construct but many required

This architecture is common in real-time systems


Increasing complexity

Deep system

Function 1
    Function 1.1
        Function 1.1.1
            Function 1.1.1.1
            Function 1.1.1.2
        Function 1.1.2
        Function 1.1.3
Function 2

For M modules and N paths per module, and binary fan-out of µ,

complexity ~ $N \, \dfrac{1 - (2\mu)^{\ln M / \ln 2}}{1 - 2\mu}$

Tests hard to construct but not so many required.

This architecture is common in conventional systems.
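
The flat and deep estimates can be compared numerically. The following is a minimal sketch, assuming the deep-system expression is the geometric series N(1 - (2µ)^(ln M / ln 2)) / (1 - 2µ) and that M is a power of two. The function names are illustrative and do not come from the talk.

```c
/* Flat system: M modules with N paths each, tested independently. */
double flat_complexity(unsigned m, double n)
{
    return (double)m * n;
}

/* Deep system, under the assumed geometric-series reading:
 * N * (1 + r + r^2 + ... + r^(L-1)), with r = 2*mu and L = log2(M) levels.
 * Summing term by term equals N * (1 - r^L) / (1 - r) when r != 1,
 * and avoids a division by zero when r == 1. */
double deep_complexity(unsigned m, double n, double mu)
{
    double r = 2.0 * mu;
    double sum = 0.0;
    double term = 1.0;

    /* One pass per level: halving M until 1 gives log2(M) passes. */
    for (unsigned k = m; k > 1; k /= 2) {
        sum += term;
        term *= r;
    }
    return n * sum;
}
```

For M = 16, N = 10 and µ = 0.25, the flat estimate is 160 and the deep estimate is 18.75, in line with the remark that the deep architecture needs fewer tests.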


Increasing coupling

Coupling is the degree of interdependence between otherwise separate systems.

• In telecommunications systems, coupling can be very high.
• In consumer appliances such as cars, many computer systems communicate with each other, giving high coupling.


Failures in education

Note the following quotations:-

“Our students graduate and move into industry without any substantial knowledge of how to go about testing a program. Moreover, we rarely have any advice to provide in our introductory courses on how a student should go about testing and debugging his or her exercises”.

“Every programmer and programming organisation could improve immensely by performing a detailed analysis of the detected errors, or at least a subset of them”

“An efficient program debugger should be able to pinpoint most errors without going near a computer”

The most depressing aspect about these is that they were made in 1979 by Glen Myers.


Overview of talk

• Some observations on testing
• The nature of fault
• Repetitive failure and diagnosis
• Conclusions


Easy faults ...

Dereference pointer contents 0x0 at
  strlen(...) called from
  line 126 of myc_constexpr.c called from
  line 247 of myc_evalexpr.c called from
  line 2459 of myc_expr.c

This is called a stack trace. It points unerringly at the responsible code line, and the fault usually takes a matter of moments to fix. This is why pointer failures account for a relatively small proportion of failure in released systems.
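
A trace like this typically ends at an unguarded library call. The following is a minimal sketch of the usual repair, assuming that a missing string may safely be treated as empty. The helper name `safe_length` is hypothetical and is not part of the code in the trace.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of s, or 0 when s is NULL,
 * instead of letting strlen() dereference address 0x0. */
size_t safe_length(const char *s)
{
    if (s == NULL) {
        return 0;
    }
    return strlen(s);
}
```

Whether a NULL string should be silently treated as empty or reported as an error is a design decision; the sketch only shows where the guard goes.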


... and hard faults

...

if ( tolerance == acceptable_tolerance )

...

This is a comparison of real-valued variables. It is broken in most programming languages. In 1982, the author and a colleague spent 14 weeks trying to find this in the middle of 70,000 lines of signal-processing software, because it occasionally behaved slightly differently on one machine than on another. An acceptance test found the symptoms.
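
One common repair replaces the exact comparison with a bounded one. The following is a minimal sketch, assuming that a combined relative and absolute tolerance suits the data. The function name `nearly_equal` and the tolerance values are illustrative and are not taken from the original software.

```c
#include <stdbool.h>

/* Returns true when a and b differ by no more than abs_eps, or by no more
 * than rel_eps times the larger of their magnitudes.
 * Any comparison involving NaN yields false. */
bool nearly_equal(double a, double b, double rel_eps, double abs_eps)
{
    double diff = a > b ? a - b : b - a;
    double mag_a = a < 0.0 ? -a : a;
    double mag_b = b < 0.0 ? -b : b;
    double largest = mag_a > mag_b ? mag_a : mag_b;

    return diff <= abs_eps || diff <= rel_eps * largest;
}
```

The test on the slide would then read `if (nearly_equal(tolerance, acceptable_tolerance, 1e-9, 1e-12))`, with the two bounds chosen to match the precision of the data rather than of the machine.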


How do faults lead to failure ?

• Ed Adams of IBM (1984) found that:-
  – ~33% of all faults failed less often than once every 5000 execution years.
  – The most common failures (more often than once every 5 years) were caused by only 2% of the faults.
  – Any correction had about a 15% chance of introducing a problem at least as big into the system.
• Pfleeger and Hatton (1997) found (amongst other things) that:-
  – static faults and dynamic failure were highly correlated in a high reliability system.


The relationship between fault and failure

[Figure: diagram with the labels "All faults" and "Those faults which fail".]


Fault mean free path

• As Voas (1998) points out:-
  – A significant number of deliberately injected faults caused no change in the external behaviour of the program.

Note also that inspections find fault and dynamic testing finds failure.


Inspection detectable faults in C applications

[Figure: chart of weighted faults per 1000 lines (vertical axis, 0 to 25) by application area: Graphics, General, Elec-eng, Design, System, Control, Database, Graphics, Parsing, Parsing, Insurance, Utilities, Utilities, Utilities, Control, Comms, Comms.]


Failure rate of statically detectable faults

[Figure: chart, data derived from CAA CDIS. Vertical axis 0 to 4; categories: average dynamic testing, thorough dynamic testing.]


The failure density U curve

For Ada, assembler, C, C++, Cobol, Fortran, Pascal, and PL/M systems:

[Figure: curve of defects per KLOC (vertical axis) against average component complexity (horizontal axis).]


The failure density U curve - invasive truncation

[Figure: curve of defects per KLOC (vertical axis) against average component complexity (horizontal axis).]

In those systems where excessive complexity has been restricted, the curve is truncated. Here a component specification issue is closely involved with deciding the eventual optimal test strategy.


Some observations on OO (Humphrey’s (1995) data)

[Figure: chart of relative time to fix defects in C++ v. Pascal (Humphrey). Vertical axis 0 to 60; categories: code review, unit testing, after unit testing; series: Pascal, C++.]

In OO systems, the cost-detection curve appears to rise much more quickly, suggesting that inspection-weighted testing will be far more effective.


Some observations on OO (Hatton’s (1998) data)

[Figure: chart comparing C++ and C. Vertical axis 0 to 100; categories: < 1 hour, < 2 hours, < 5 hours, < 10 hours, < 20 hours, < 50 hours, < 100 hours, < 200 hours.]

In these OO systems, around 5% of all failures took an extremely long time to fix, although they were found easily.


The nature of fault

• We can conclude:-
  – Certain classes of fault yield much more easily to testing than others.
  – Certain classes of fault have no effect on the program’s expected behaviour.
  – Certain classes of fault fail but are entirely avoidable.
  – Most faults never fail.
  – A knowledge of the design is necessary to determine the best way to test it.
  – The balance between fault finding and risk assessment changes with design.


Overview of talk

• Some observations on testing
• The nature of fault
• Repetitive failure and diagnosis
• Conclusions


An observable fact

Software systems are unique amongst engineering systems in that their behaviour is dominated by repetitive failure. Why?


Risk Management

• All studies of real systems failure show:-
  – It is overwhelmingly likely that systems will fail.
  – Systems are dominated by repetitive failure.
  – Relating failure to a responsible fault or faults is getting much harder, leading to a substantial “don’t know” category: the diagnosis problem.

All this leads us to recognise that failure is an inevitable property of software systems, and that we should assess the risk by test quantification and plan for its management.


Risk management

The Patriot missile problem, 1991

• The Patriot missile missed an incoming Scud in the 1991 Gulf War, which then killed 29 people.
• The tracking software failed because there were, anomalously, two different representations of the constant “0.1”. This in turn was multiplied by system uptime to give a spatial error.
• The defect was found in systems testing, too late to fix.
• The system carried the message, “System must be rebooted every 8 hours”. This keeps the spatial error small enough. The system was left up for 48 hours.
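
The way a small representation difference grows with uptime can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of the actual software: the clock is assumed to tick once every 0.1 seconds, and the two representations of 0.1 are assumed to be binary fractions truncated to different widths (23 and 48 fractional bits in the usage note). All names are hypothetical.

```c
/* Truncates x (0 <= x < 1) to frac_bits binary digits after the point. */
double truncate_fraction(double x, int frac_bits)
{
    double scale = 1.0;
    for (int i = 0; i < frac_bits; i++) {
        scale *= 2.0;
    }
    /* Conversion to an integer type discards the remaining digits. */
    long long whole = (long long)(x * scale);
    return (double)whole / scale;
}

/* Difference in seconds between two elapsed-time calculations after
 * `hours` of uptime, when each multiplies the same tick count by its
 * own truncated copy of 0.1. */
double clock_drift_seconds(double hours, int coarse_bits, int fine_bits)
{
    const double ticks_per_second = 10.0;  /* assumption: one tick per 0.1 s */
    double ticks = hours * 3600.0 * ticks_per_second;
    double coarse = truncate_fraction(0.1, coarse_bits);
    double fine = truncate_fraction(0.1, fine_bits);

    return ticks * (fine - coarse);
}
```

With the assumed widths, `clock_drift_seconds(8.0, 23, 48)` is roughly 0.03 seconds and `clock_drift_seconds(48.0, 23, 48)` is roughly 0.16 seconds. The drift is linear in uptime, which is why a periodic reboot bounds it.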


More evidence for repetitive failure

[Figure: chart of percent of post-delivery failures (vertical axis, 0 to 35) by category: Major, Medium, Minor, Testing, Documentation, Unassigned.]

This data suggests that the major / minor failure ratio is somewhere between 5% and 10%. Note that fully 1/3 were not diagnosable.


Diagnosis

Factors inhibiting diagnosis

• System complexity and coupling
• Engineer over-optimism leading to poor diagnostics and hence to poor diagnosis


Diagnosis

Difficulty of diagnosis, by diagnostic distance and diagnostic quality:

                     Poor quality    Good quality
Close distance       Moderate        Easy
Distant              Difficult       Moderate


Moderate, (distant/good)

An example from real life, Airbus A320 AF319, 25/8/88, (Mellor (1994)):-

• MAN PITCH TRIM ONLY, followed in quick succession by ...

• Fault in right main landing gear

• Fault in electrical flight control system computer 2

• Fault in alternate ground spoilers 1-2-3-5

• Fault in left pitch control green hydraulic circuit

• Loss of attitude protection

• Fault in Air Data System 2

• Autopilot 2 shown as engaged when it was disengaged

• LAVATORY SMOKE


Airbus A340 G-VAEL, Sept 1994

Programmer’s effort:-

Please wait ...

Symptom: The Flight Management System hung (and lots of other exciting things).

Translation into English:-

The Flight Management System has crashed and will take slightly less than N of your earth minutes to reboot. Try whistling.

(This is still a problem after a very large amount of effort).


The great local bar disaster

Programmer’s effort:-

System stressed ...

Symptom: The author’s local bar was unable to dispense beer.

Translation into English:-

The printer has run out of paper.

(Two hours of author and friend, entirely wasted).


The computer for the rest of us

Programmer’s effort:-

More than 64 TCP or UDP streams open ...

Symptom: The author’s lovely new G3 Mac would not log on to his Internet Service Provider.

Translation into English:-

The modem is not switched on.

(Nearly 3 hours wasted).


Moderate, (close/poor)

“Button push ignored”

• This appears on the Flight Management System of a McDonnell-Douglas MD-11, (Drury (1997)).

It is not clear what the programmer is trying to convey. “Paris is the capital of France” would have been equally useful.

• The pilot also noted “The airplane [computer system] manuals were written as though by creatures from another planet”.
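
The messages quoted in these examples report an internal state rather than a cause or a remedy. The following is a minimal sketch of a message format that carries the symptom, the likely cause and the next action together. The structure, its field names and the wording are all hypothetical.

```c
#include <stdio.h>

/* Hypothetical diagnostic record; every field name is illustrative. */
struct diagnostic {
    const char *component;  /* where the failure was detected */
    const char *symptom;    /* what the user observes */
    const char *cause;      /* most likely cause, or NULL if unknown */
    const char *action;     /* what the user should do next, or NULL */
};

/* Writes a one-line message into buf. Returns the number of characters
 * the full message needs, as snprintf() does, so the caller can detect
 * truncation by comparing the result with size. */
int format_diagnostic(char *buf, size_t size, const struct diagnostic *d)
{
    return snprintf(buf, size, "%s: %s. Likely cause: %s. Action: %s.",
                    d->component, d->symptom,
                    d->cause ? d->cause : "unknown",
                    d->action ? d->action : "report this message");
}
```

Applied to the dial-up example, a record with the cause "modem is switched off" and the action "switch the modem on and retry" would have taken seconds to act on rather than hours.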


The reasons for repetitive failure

The following factors very commonly appear:-

• Software engineers expect their systems to work rather than accepting their inevitable failure and planning accordingly
• Software systems are characterised by exceptionally poor diagnosis and frequently incomprehensible user manuals
• Software systems are getting much larger and more tightly coupled in general
• We don’t learn from our mistakes


Repetitive failure in the outside world

• Just to show that reluctance to learn isn’t solely the preserve of software engineers ...
  – The DC-10 cargo door saga
      – In the six months prior to the dreadful crash of the Turkish Airlines DC-10 in Paris in March 1974, there had been no fewer than 1000 cargo door incidents amongst the then 100-strong fleet of DC-10s. They were disregarded.
      – In this incident, the cargo door fell off, and the resulting depressurisation caused the cabin floor to collapse, severing vital control cables.


What we would like to do

[Figure: 2 x 2 grid of diagnostic distance (close, distant) against diagnostic quality (poor, good), with cells labelled Difficult, Easy, Moderate and Moderate, annotated with the labels Education and Design.]


The influence of poor diagnosis on testing

• In essence, poor diagnosis has the following implications for testing:
  – Testing priority switches in favour of risk management. Tests find defects which cannot be corrected.
  – Testing is not only assessing the reliability of the product but its sensitivity. Well-designed products respond easily to corrective maintenance (and don’t require much of it!). (Think of car engine evolution.)
  – One of the responsibilities of testing is to assess diagnosability.


Testing is an economic trade-off

[Figure: plot of ROCOF (rate of occurrence of failures) against usage time, showing measured points with a fitted curve, the measured reliability and the desired reliability.]

The point at which testing stops is the point at which it is economically viable to stop given the trade-off between early availability and the risk of failure. It is manifestly NOT an engineering decision.
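
The measured side of that trade-off can be computed directly. The following is a minimal sketch, assuming ROCOF is taken as failures observed per unit of usage time over a test interval. The function names are hypothetical, and the comparison only supplies one input to the economic decision; it does not make it.

```c
#include <stddef.h>

/* Rate of occurrence of failures: failures observed per unit of usage
 * time. Returns 0 when no usage time has been recorded. */
double rocof(size_t failures, double usage_time)
{
    return usage_time > 0.0 ? (double)failures / usage_time : 0.0;
}

/* Hypothetical check: returns 1 when some usage has been recorded and the
 * measured ROCOF is at or below the desired level, otherwise 0. */
int reliability_target_met(size_t failures, double usage_time,
                           double desired_rocof)
{
    return usage_time > 0.0 && rocof(failures, usage_time) <= desired_rocof;
}
```

For 5 failures in 100 hours of use, the measured rate is 0.05 failures per hour, which meets a desired level of 0.1 but not one of 0.01.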


Testing is an economic trade-off

• Late requirements change
• Ambition / capability clash
• Replacement of testers


Overview of talk

• Some observations on testing
• The nature of defect
• Repetitive failure and diagnosis
• Conclusions


Conclusions

• Testing is getting harder
• Testing is increasingly concerned with risk assessment
  – Diagnosis is much harder
  – Systems are much bigger and more tightly-coupled
• Testing is itself a diagnostic for the development process
• Testing should play a much bigger role in design


Les Hatton

Les Hatton is an independent consultant in software reliability. He is also Professor of Software Reliability at the Computing Laboratory, University of Kent, U.K. He holds a B.A. (1970) from King's College, Cambridge, an M.Sc. (1971) and Ph.D. (1973) from the University of Manchester, all in mathematics, an A.L.C.M. (1980) in guitar from the London College of Music, and an LL.M. in IT law from the University of Strathclyde (1999). He received a number of international prizes for geophysics in the 1970s and '80s, culminating in the 1987 European Conrad Schlumberger prize for his work in computational geophysics.

Shortly afterwards, he became interested in software reliability, and changed careers to study the design of high-integrity and safety-critical systems, on which he has been a keynote speaker at a number of software conferences. He is the author of numerous technical papers and books and is finally nearing completion of another book entitled "Software Failure: avoiding the avoidable and living with the rest". In October 1998, he was voted amongst the “world’s leading scholars of systems and software engineering” for the period 1993-1997 by the US Journal of Systems and Software.