safety case

28
1 SEA’99 Conference Verification & Validation of Safety Critical Software Verification & Validation of Safety Critical Software Dr Peter Lindsay Assistant Director Software Verification Research Centre School of Information Technology The University of Queensland THE UNIVERSITY OF QUEENSLAND SOFTWARE VERIFICATION RESEARCH CENTRE

Upload: vunguyet

Post on 04-Jan-2017

233 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Safety Case

1SEA’99 ConferenceVerification & Validation of Safety Critical Software

Verification & Validation of

Safety Critical Software

Dr Peter Lindsay

Assistant Director

Software Verification Research Centre

School of Information Technology

The University of Queensland

THE UNIVERSITY OF QUEENSLANDSOFTWARE VERIFICATION RESEARCH CENTRE

Page 2: Safety Case

2SEA’99 ConferenceVerification & Validation of Safety Critical Software

Abstract of talk (1)

The increasing trend towards systems

integration, and increased automation of critical

functions which were once performed by

humans, means that more and more reliance is

placed on software.

Procurers of safety-critical systems are

becoming more aware of the need for

appropriate levels of safety assurance, and are

increasingly requiring system developers to

produce a Safety Case to document the reasons

why a system is safe to be operated.

Page 3: Safety Case

3SEA’99 ConferenceVerification & Validation of Safety Critical Software

Abstract of talk (2)

This talk looks at recent and emerging standards

for safety-critical software, and will introduce

listeners to the key principles of safety

assurance, including:

– hazard and risk analysis

– safety integrity levels

– the structure and content of safety cases

– management of the safety process

Page 4: Safety Case

4SEA’99 ConferenceVerification & Validation of Safety Critical Software

Computer Aided Disasters

Therac 25 (1985-87, N. America) radiation therapy

machine delivers severe radiation overdoses (x6)

London Ambulance Service (1992) 20+ die

unnecessarilly when dispatch system fails

USS Vincennes (1988) shoots down Iran Air airliner

after faulty identification

Airbus A320 (1988-) various crashes

Ariane 5 (1996) software exception causes self-destruct

etc

See http://www.comlab.ox.ac.uk/archive/safety.html

http://www.csl.sri.com/risks.html

Page 5: Safety Case

5SEA’99 ConferenceVerification & Validation of Safety Critical Software

What’s Different About Software?

Broadly speaking, traditional safety engineering is

concerned with physical failures:

– e.g. wear-out, corrosion, faulty manufacture

– mitigations include: well-tried designs, safety margins,

redundant components, inspection, maintenance

– this has little relevance for software

On the other hand, software is typically:

– novel, complex, highly input-sensitive,

not designed by domain experts

Software demands a new approach to safety engineering

Page 6: Safety Case

6SEA’99 ConferenceVerification & Validation of Safety Critical Software

Talk outline

Define main terms & concepts in safety engineering as

they relate to software:

– hazards, risk, safety integrity levels, etc

Explain the basic principles of safety management

& the safety lifecycle for software systems

Outline 3 important safety analysis techniques

– Failure Modes Effects Analysis (FMEA)

– Fault Tree Analysis (FTA)

– Hazard and Operability Studies (HAZOP)

Summary

Page 7: Safety Case

7SEA’99 ConferenceVerification & Validation of Safety Critical Software

Reference Material

IEC 61508 “Functional Safety: Safety-related Systems”

(International Electrotechnical Commission, 1998)

Def(Aust) 5679 Australian Defence Standard for

Procurement of Computer-based Safety-critical Systems

UK MOD 00-55, 00-56, 00-58 Standards for software

development and hazard analysis of safety-critical

systems

Nancy Leveson Safeware: System Safety and

Computers

Page 8: Safety Case

8SEA’99 ConferenceVerification & Validation of Safety Critical Software

Safety

A system is unsafe if it can cause unacceptable harm.

Harm: loss of life, injury, damage to the environment, etc

Safety is a whole system issue

– only physical objects can cause harm

– need to consider all system components:

software, hardware, operators, procedures,

infrastructure,…

Safety is a whole lifecycle issue

– from concept through to decommissioning

Safety and reliability are two different things

Page 9: Safety Case

9SEA’99 ConferenceVerification & Validation of Safety Critical Software

Hazards

Hazard: a situation with the potential for harm

Hazards are a state of the system

– scope of system needs careful definition

– other factors (outside system control) may affect

whether hazard leads to an accident

Failure mode: the way in which something fails

Environment

System

FailureHazard

Accident

Page 10: Safety Case

10SEA’99 ConferenceVerification & Validation of Safety Critical Software

Risk

Absolute safety is generally unachievable

– instead, aim for acceptable risk

Risk: a combination of the severity of consequences

& likelihood of occurrence

Severity: the possible extent of harm

Likelihood: the probability/frequency of occurrence

– eg. probability of 10-6 that X fails on request;

mean-time-to-failure is 2 years;

probability of failure of 10-2 in lifetime of equipment

What constitutes acceptable risk is domain specific

Page 11: Safety Case

11SEA’99 ConferenceVerification & Validation of Safety Critical Software

Risk Assessment

1. Model the system:

– identify the major components and interfaces

2. Identify hazards & how they arise

– identify potential failure modes

– trace consequences and control measures

– build a cause-and-effect model of the system

3. Analyse and assess risk

– assess component failure rates

– assess likelihood & severity of hazards

If some risks are not tolerable, it’s back to the drawing board!

Page 12: Safety Case

12SEA’99 ConferenceVerification & Validation of Safety Critical Software

Likelihood of Software Failure?

Theory of failure-rate prediction is almost non-existent

for all but the simplest software

– same goes for complex hardware, operator

procedures, system design, ...

Design faults now overtaking physical failures in impact

on complex systems

Current best practice relies on the rigour of the

development process - the Safety Integrity Level (SIL)

Standards differ on exactly what SILs mean, and on

what processes are required

– but broadly speaking, SIL relates to degree to which

system safety depends on the component

Page 13: Safety Case

13SEA’99 ConferenceVerification & Validation of Safety Critical Software

TABLE 2 SAFETY INTEGRITY LEVELS : TARGET FAILUREMEASURES

SAFETYINTEGRITYLEVEL

DEMAND MODE OFOPERATION(Probability of failure toperform its design functionon demand)

CONTINUOUS/HIGHDEMAND MODE OFOPERATION(Probability of a dangerousfailure per year)

4 >=10-5

to < 10-4

>=10-9

to < 10-8

3 >=10-4

to < 10-3

>=10-8

to < 10-7

2 >=10-3

to < 10-2

>=10-7

to < 10-6

1 >=10-2

to < 10-1

>=10-6

to < 10-5

IEC 61508: Safety Integrity Levels

In IEC 61508, SILs correspond to acceptable failure rates:

Page 14: Safety Case

14SEA’99 ConferenceVerification & Validation of Safety Critical Software

Safety Management

Overall goal: to deliver a safe system, however

“Like justice, safety needs not only to be done,

but to be seen to be done.”

A Safety Case documents the claim that the system is

safe to be operated

Main ingredients of a Safety Case:

– identification of hazards, failure modes, failure

mechanisms, safety features, safety targets & SILs

– reasoned arguments for risk assessment

– supporting evidence, including: hazard analysis,

V&V results

Page 15: Safety Case

15SEA’99 ConferenceVerification & Validation of Safety Critical Software

10 11

NOTE 1 Activities relating to verification, management of functional safety and functional safety assessment are not shown for reasons of clarity but are relevent to all overall, E/E/PES and software safety lifecycle phases.

NOTE 2 The phases represented by boxes 10 and 11 are outside the scope of this standard.

NOTE 3 Parts 2 and 3 deal with box 9 (realisation) but they also deal, where relevant, with the programmable electronic (hardware and software) aspects of boxes 13, 14 and 15.

Concept1

Overall scope

definition2

Hazard and risk analysis3

Overall safety

requirements4

Safety requirements

allocation 5

Back to appropriate

overall safety lifecycle

phase

Overall safety validation13

Overall operation,

maintenance and repair

Overall modification and retrofit14 15

Decommissioning

or disposal16

Safety-related

systems:

E/E/PES

Realisation(see E/E/PES

safety

lifecycle)

9Safety-related

systems:

other

technology

Realisation

Overall installation

and commissioning12

8

Overall planning

OveralI

operation and

maintenance

planning

OveralI

installation and

commissioning

planning

Overall

safety

validation

planning

6 7 8

External risk reduction facilities

Realisation

Safety Management Lifecycle (1)

From IEC 61508:

Page 16: Safety Case

16SEA’99 ConferenceVerification & Validation of Safety Critical Software

10 11

NOTE 1 Activities relating to verification, management of functional safety and functional safety assessment are not shown for reasons of clarity but are relevent to all overall, E/E/PES and software safety lifecycle phases.

NOTE 2 The phases represented by boxes 10 and 11 are outside the scope of this standard.

NOTE 3 Parts 2 and 3 deal with box 9 (realisation) but they also deal, where relevant, with the programmable electronic (hardware and software) aspects of boxes 13, 14 and 15.

Concept1

Overall scope

definition2

Hazard and risk analysis3

Overall safety

requirements4

Safety requirements

allocation 5

Back to appropriate

overall safety lifecycle

phase

Overall safety validation13

Overall operation,

maintenance and repair

Overall modification and retrofit14 15

Decommissioning

or disposal16

Safety-related

systems:

E/E/PES

Realisation(see E/E/PES

safety

lifecycle)

9Safety-related

systems:

other

technology

Realisation

Overall installation

and commissioning12

8

Overall planning

OveralI

operation and

maintenance

planning

OveralI

installation and

commissioning

planning

Overall

safety

validation

planning

6 7 8

External risk reduction facilities

Realisation

Safety Management Lifecycle (2)

Page 17: Safety Case

17SEA’99 ConferenceVerification & Validation of Safety Critical Software

Software Engineering for Safety

All the regular good software-engineering practices

– thorough requirements analysis, reviews & testing

– configuration management

Involve all system stakeholders in safety management

Design for safety

– KISS (Keep It Simple, Stupid)

– no single point of failure

– isolate critical functions

– belts and braces

– diversity throughout design, implementation, review

Pay special attention to internal & external interfaces

Page 18: Safety Case

18SEA’99 ConferenceVerification & Validation of Safety Critical Software

Safety-Directed V&V

Safety Validation: are we building a safe system?

– all hazards & safety requirements identified

– safety targets are appropriate:

i.e., if met, will achieve acceptable risk

Safety Verification: are we achieving targets?

– safety requirements & targets are being flowed down

through design

– appropriate evidence is being gathered that safety

targets are being met (and no new hazards

introduced)

Safety Integrity Level determines the degree of rigour

to be applied

Page 19: Safety Case

19SEA’99 ConferenceVerification & Validation of Safety Critical Software

Important Safety V&V techniques

The broad goals of Safety V&V are to

– identify (& prioritize) all hazards and

– trace their resolution

Different techiques are applicable at different stages of

design, according to what design details are available

Will outline 3 techniques that apply well to software:

– Failure Modes & Effects Analysis (FMEA)

– Fault Tree Analysis

– Hazard & Operability Studies (HAZOP)

Page 20: Safety Case

20SEA’99 ConferenceVerification & Validation of Safety Critical Software

FMEA Example: Speed Sensor

gearbox

controller

sensor

signal processing unit

dashboard

gearbox

toothed wheel

Page 21: Safety Case

21SEA’99 ConferenceVerification & Validation of Safety Critical Software

FMEA Report: Speed Sensor

Comp-onent

FailureMode

LocalEffect

System Effect Hazard

Speedsensor

Breaks Speedcalculated

as zero

1. Speedometershows zero

2. Odometer notupdated

3. Wrong gearselected

1. Driver mislead,...

2. Maintenancedelayed, ...

3. Engine seizesat high speed,

...

Page 22: Safety Case

22SEA’99 ConferenceVerification & Validation of Safety Critical Software

FMEA - Summary

Failure Modes and Effects Analysis

Method: from known or predicted failure modes of

components, determine possible effects on system

Good for hazard identification early in development,

by considering possible failures of system functions:

– loss of function (omission failure)

– function performed incorrectly

– function performed when not required

(commision failure)

Not so good for mulitple failures

Page 23: Safety Case

23SEA’99 ConferenceVerification & Validation of Safety Critical Software

Example Fault Tree: tank-level sensors

Tank overflow

Inlet open

Inlet

valve failed

Outlet

closed

Wrong control

to inlet valve

Controller

failed Sensor

X

fails

Sensor

Y

fails

Outlet

Valve A

Inlet

Valve B

Controller

X

Y

AND

OR

OR

AND

Page 24: Safety Case

24SEA’99 ConferenceVerification & Validation of Safety Critical Software

Fault Tree Analysis - Summary

Method: trace faults stepwise back through system

design to possible causes

– a tree with a top event at the root

– logic gates at branches, linking each event with its

“immediate” causes

– initiating faults at leaves (eventually)

Good for tracing system hazards through to component

failures, and thus for allocating safety requirements

Good for checking completeness of safety requirements

but can be difficult, time-consuming, hard to maintain

Page 25: Safety Case

25SEA’99 ConferenceVerification & Validation of Safety Critical Software

HAZard and OPerability Studies

Developed by ICI in mid’60s for hazard identification for

chemical process plants

Method: given model of the system in terms of “flows”

between components

– consider possible deviations in flows, using guide

words to steer analysis:

no, more, less, as well as, part of, other than, reverse

– consider both causes and effects of deviations

Adapts well as a systematic design-review technique for

computer systems (CHAZOP)

– guidewords extended with: early, late, before, after

Page 26: Safety Case

26SEA’99 ConferenceVerification & Validation of Safety Critical Software

CHAZOP Example - Elevator

Data flow diagram showing internal structure of software

3

Sequenc

e

controlle

r

1

Lift panel

interface

2

Floor panel

interface

1

Lift panel

interface

2

Floor panel

interface

Request

Display

Request

Display

Feedback

Control

Feedback

Control

Lift request

Display

Floor request

Display

Movement

commands

Status

Door commands

Status

Pending

request

Page 27: Safety Case

27SEA’99 ConferenceVerification & Validation of Safety Critical Software

CHAZOP Example - Elevator Output

Intercon-nection

Attribute Guideword

Cause Consequences/implication

Indication /protection

Question /recommen

dationLift request(Hold-Door-Open)

Data flow No Failed buttonFailed wiringFailure of liftpanel interface

Sequencer doesnot receive doorhold request.Risk of injury ifsmall/soft item(e.g. scart) caughtin door

Loss of dataflow cannotbe detected.Sensors ondoors willprotect inmostcircumstance

Q: Couldbutton/wiringfailuremodes beavoided ifbutton waspush-to-break?

Lift request(Hold-Door-Open)

Data flow More Failure of liftpanel interface

Sequencereceives spuriousdoor hold request.Doors stay open –lift stuck

Nonepossible

-

Lift request(Hold-Door-Open)

Data flow OtherThan

Equivalent toNO in thiscontext

- - -

Page 28: Safety Case

28SEA’99 ConferenceVerification & Validation of Safety Critical Software

Talk Summary

Software Safety Engineering is a new discipline

Standards now require Safety Case prior to operation

Safety is a system-wide, whole lifecycle issue

Safety should be designed into a system,

rather than added on later

– start developing safety arguments from earliest

stages of design

– KISS, cost-effectiveness

Main goals of Safety V&V are to identify all hazards and

track their resolution