2016-04-28 - vu amsterdam - testing safety critical systems
TRANSCRIPT
Testing Safety Critical Systems
Theory and Experiences
[email protected] http://www.slideshare.net/Jaap_van_Ekris/
Agenda
• The Goal
• The requirements
• The challenge
• Go with the process flow
– Development Process
– System design
– Testing Techniques
• Trends
• Reality
Specifications…
• Specifications are extremely detailed
• Sometimes up to 20 binders
• After years, you still find contradictions
Goals of testing safety critical systems
• Verify contractually agreed functionality
• Verify correct functional safety-behaviour
• Verify safety-behaviour during degraded and failure conditions
Some people live on the edge…
How would you feel if you were getting ready to launch and knew you were
sitting on top of two million parts
-- all built by the lowest bidder on a government contract.
John Glenn
Until it is too late…
• February 1st, 1953
• A spring tide and heavy winds broke the dykes
• Killed 1,836 people and 30,000 animals
The battle against flood risk…
• Cost: €2,500,000,000
• The largest moving structure on the planet
• Defends
– 500 km2 of land
– 80,000 people
• Partially controlled by software
Nothing is flawless, by design…
No matter how good the design is:
• Some scenarios will be missed
• Some scenarios are too expensive to prevent:
– Accept the risk
– Communicate it to stakeholders
When is software good enough?
• Dutch Law on storm surge barriers
• Equalizes risk of dying due to unnatural causes across the Netherlands
Oosterschelde Storm Surge Barrier
• Chance of
– Failure to close: 10^-7 per usage
– Unexpected closure: 10^-4 per year
To put things in perspective…
• Having a drunk pilot: 10^-2 per flight
• Hurting yourself when using a chainsaw: 10^-3 per use
• Dating a supermodel: 10^-5 in a lifetime
• Drowning in a bathtub: 10^-7 in a lifetime
• Being hit by falling airplane parts: 10^-8 in a lifetime
• Being killed by lightning: 10^-9 in a lifetime
• Winning the lottery: 10^-10 in a lifetime
• Your house being hit by a meteor: 10^-15 in a lifetime
• Winning the lottery twice: 10^-20 in a lifetime
9/11…
• Identified a fundamental (new) risk to ATC systems
• Changed the ATC system dramatically
• Doubled our safety-critical scenarios
The industry statistics are against us…
• Capers Jones: at least 2 high-severity errors per 10 KLoC
• Industry consensus is that software will never be more reliable than
– 10^-5 per usage
– 10^-9 per operating hour
The value of testing
Program testing can be used to show the presence of bugs, but never to show
their absence!
Edsger W. Dijkstra
Is just testing enough?
• A 64-bit input isn’t that uncommon
• 2^64 is the global rice production of 1,000 years, measured in individual grains
• Fully testing all binary inputs of a simple 64-bit stimulus-response system just once takes centuries
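The magnitude of that claim is easy to check with a back-of-the-envelope calculation (sketched here in Python; the test rate of one billion cases per second is an assumption, not a figure from the talk):

```python
# How long would exhaustive testing of a 64-bit input space take?
# The throughput below is an assumed figure for illustration only.
TEST_RATE_PER_SECOND = 1_000_000_000  # assumed: 1e9 test cases/second
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def years_to_exhaust(bits: int, rate: int = TEST_RATE_PER_SECOND) -> float:
    """Years needed to run every possible input of a `bits`-wide stimulus once."""
    return (2 ** bits) / rate / SECONDS_PER_YEAR

print(f"{2**64:,} test cases")  # 18,446,744,073,709,551,616
print(f"about {years_to_exhaust(64):.0f} years at 1e9 tests/s")
```

Even at this optimistic rate the answer is measured in centuries, which is the point of the slide: exhaustive input testing is not an option.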
Design Principles
• Risk analysis drives design (decisions)
• Safety first (production later)
• Fail-to-safe
• There shall be no single source of (catastrophic) failure
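The "fail-to-safe" and "no single source of failure" principles are commonly realized with redundant sensors and majority voting. The sketch below is a hypothetical illustration of that pattern, not code from the actual barrier; the sensor interface, threshold, and voting rule are all assumptions:

```python
# Hypothetical 2-out-of-3 voting with a fail-to-safe bias: a sensor
# that raises an error is counted as voting for the safe action
# (closing), so no single sensor failure can keep the barrier open.
def close_barrier(sensor_reads, threshold_m=3.0):
    """Return True if the barrier should close, given three sensor callables."""
    votes_close = 0
    for read in sensor_reads:
        try:
            level = read()
        except Exception:
            votes_close += 1  # fail-to-safe: a broken sensor votes "close"
            continue
        if level > threshold_m:
            votes_close += 1
    return votes_close >= 2  # 2-out-of-3 majority

def broken_sensor():
    raise IOError("cable cut")  # e.g. digging or seagull damage

# One sensor broken, one above the threshold, one below: 2 votes -> close.
print(close_barrier([lambda: 3.4, lambda: 2.9, broken_sensor]))  # True
```

The design choice here is the direction of the bias: errors are mapped onto the action whose failure mode is least catastrophic, which for a storm surge barrier is closing.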
A simple design of a storm surge barrier
• Relay (€10.00 apiece)
• Water detector (€17.50)
• Design documentation (sponsored by Heineken)
Risk analysis
• Relay failure
– Chance: small
– Cause: aging
– Effect: catastrophic
• Water detector failure
– Chance: huge
– Causes: rust, driftwood, seagulls (eating, shitting)
– Effect: catastrophic
• Measurement errors
– Chance: colossal
– Causes: waves, wind
– Effect: false positive
• Broken cable
– Chance: medium
– Causes: digging, seagulls
– Effect: catastrophic
Typical risks identified
• Components making the wrong decisions
• Power failure
• Hardware failure of PLCs/servers
• Network failure
• Ships hitting water sensors
• Human maintenance error
Risk ≠ system crash
• Understandability of the GUI
• Incorrect functional behaviour
• Data accuracy
• Lack of response speed
• Tolerance towards illogical inputs
• Resistance to hackers
Stuurx::Functionality, initial global design
• Init: the system starts in the waiting state
• Wacht (“Wait”): while the water level < 3 metres, keep monitoring
• Start_D: when the water level > 3 metres, send the “Start” signal to the diesels
• W_O_D (“Wait on Diesels”): wait for the “Diesels ready” signal
• Sluit (“Close”): send the “Close barrier” command
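The state machine on this slide can be sketched as code. This is a toy reconstruction from the diagram's fragments (the Dutch state names and the 3-metre threshold are from the slide; everything else is assumed, and the Init/Start_D steps are collapsed for brevity):

```python
# Toy reconstruction of the slide's closure state machine; not the
# real barrier software, which was written in C/C++.
WACHT, W_O_D, SLUIT = "Wacht", "W_O_D", "Sluit"

class BarrierController:
    def __init__(self):
        self.state = WACHT
        self.commands = []  # commands sent towards the actuators

    def on_water_level(self, metres):
        # Wacht ("wait"): below 3 m nothing happens; above 3 m, start diesels
        if self.state == WACHT and metres > 3.0:
            self.commands.append('"Start" signal to diesels')
            self.state = W_O_D  # W_O_D: wait on the diesels

    def on_diesels_ready(self):
        if self.state == W_O_D:
            self.commands.append('"Close barrier"')
            self.state = SLUIT  # Sluit ("close")

ctrl = BarrierController()
ctrl.on_water_level(2.5)   # still waiting
ctrl.on_water_level(3.4)   # water too high: start the diesels
ctrl.on_diesels_ready()    # diesels report ready -> close the barrier
print(ctrl.state, ctrl.commands)
```

Note how each transition is guarded by the current state, so a stray "Diesels ready" signal while waiting is simply ignored; that kind of event tolerance is exactly what the later test phases probe.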
Design Validation and Verification
• Peer reviews by
– System architect
– 2nd designer
– Programmers
– Test manager of system testing
• Fault Tree Analysis / Failure Mode and Effect Analysis
• Performance modeling
• Static verification / dynamic simulation (by Twente University)
Programming (in C/C++)
• Coding standard:
– Based on “Safer C”, by Les Hatton
– May only use a safe subset of the compiler
– Verified by Lint and 5 other tools
• Code is peer-reviewed by a 2nd developer
• Certified and calibrated compiler
Unit tests
• Focus on conformance to specifications
• Required coverage: 100% with respect to:
– Code paths
– Input equivalence classes
• Boundary value analysis
• Probabilistic testing
• Execution:
– Fully automated scripts, running 24x7
– Creates 100 MB/hour of logs and measurement data
• Upon bug detection:
– 3 strikes is out: after 3 implementation errors, the unit is rebuilt by another developer
– 2 strikes is out: the need for a 2nd rebuild implies a redesign by another designer
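Equivalence classes and boundary value analysis can be made concrete with a small example. The function under test below is a hypothetical stand-in built around the 3-metre threshold from the design sketch, not the real unit under test:

```python
# Boundary value analysis for a hypothetical closing decision with a
# 3.0 m threshold: test well below, just below, exactly at, just above,
# and well above the boundary, covering both equivalence classes.
def must_close(water_level_m: float, threshold_m: float = 3.0) -> bool:
    return water_level_m > threshold_m

def test_boundary_values():
    cases = [
        (0.0, False),    # equivalence class: clearly safe
        (2.999, False),  # just below the boundary
        (3.0, False),    # exactly on the boundary (spec: strictly greater)
        (3.001, True),   # just above the boundary
        (9.9, True),     # equivalence class: clearly dangerous
    ]
    for level, expected in cases:
        assert must_close(level) == expected, f"failed at {level}"

test_boundary_values()
```

The off-by-epsilon cases around 3.0 are the ones that catch a `>=` written where the specification says `>`, which is precisely the class of error exhaustive-looking "round number" tests miss.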
Integration testing
• Focus on
– Functional behaviour of chains of components
– Failure scenarios based on the risk analysis
• Required coverage
– 100% coverage of input classes
• Probabilistic testing
• Execution:
– Fully automated scripts, running 24x7 at 10x speed
– Creates 250 MB/hour of logs and measurement data
• Upon detection
– Each bug gets a root-cause analysis
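The "probabilistic testing" mentioned at unit and integration level can be read as randomized input generation checked against an executable property. That reading is an assumption; the sketch below (with a hypothetical `must_close` stand-in) shows the idea:

```python
import random

def must_close(water_level_m: float) -> bool:
    """Hypothetical stand-in for the decision logic under test."""
    return water_level_m > 3.0

def check_monotone(samples: int = 10_000, seed: int = 7) -> bool:
    """Property: if the barrier closes at level x, it must also close
    at any higher level x + d. Violations would indicate a logic bug."""
    rng = random.Random(seed)  # fixed seed -> reproducible campaign
    for _ in range(samples):
        x = rng.uniform(0.0, 10.0)
        d = rng.uniform(0.0, 5.0)
        if must_close(x) and not must_close(x + d):
            return False
    return True

print(check_monotone())  # True for this implementation
```

Unlike the hand-picked boundary cases, this probes thousands of random points against an invariant, trading precision at known boundaries for breadth across the input space.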
Redundancy is a nasty beast
• You do get the functional behaviour of your entire system
• It is nearly impossible to see whether all components are working correctly
• Is EVERYTHING working OK, or is it the safety net?
System testing
• Focus on
– Functional behaviour
– Failure scenarios based on the risk analysis
• Required coverage
– 100% complete environment (simulation)
– 100% coverage of input classes
• Execution:
– Fully automated scripts, running 24x7 at 10x speed
– Creates 250 MB/hour of logs and measurement data
• Upon detection
– Each bug gets a root-cause analysis
Endurance testing
• Look for the “one in a million times” problem
• Challenge:
– Software is deterministic
– Execution is not (timing, transmission errors, system load)
• Have an automated script run it over and over again
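Such an endurance-test driver can be sketched in a few lines. Everything below is illustrative: `run_scenario` is a pretend system under test with a deliberately rare timing-dependent failure, and the seeded generator stands in for the perturbed execution conditions (timing, load) the slide mentions:

```python
import random

def run_scenario(jitter_ms: float) -> bool:
    """Pretend system under test: fails only in a rare timing window."""
    return not (4.999 < jitter_ms < 5.001)

def endurance_test(runs: int, seed: int = 0):
    """Run the scenario many times under randomized conditions,
    logging every failure with enough data to reproduce it."""
    rng = random.Random(seed)  # fixed seed -> the campaign is replayable
    failures = []
    for i in range(runs):
        jitter = rng.uniform(0.0, 1000.0)
        if not run_scenario(jitter):
            failures.append((i, jitter))  # run number + triggering conditions
    return failures

failures = endurance_test(1_000_000)
print(f"{len(failures)} failures in 1,000,000 runs")
```

The key design point is logging the seed and conditions of every failing run: a "one in a million" failure is useless to developers unless it can be replayed.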
Results of Endurance Tests
[Chart: Reliability Growth of Function M, Project S, plotting chance of failure on a logarithmic scale (10^0 down to 10^-5) against platform versions 4.35, 4.36 and 4.37]
Acceptance testing
1. Functional acceptance
2. Failure behaviour: all top-50 (FMECA) risks tested
3. A year of operational verification
• Execution:
– Tests performed on a working storm surge barrier
– Creates 250 MB/hour of logs and measurement data
• Upon detection
– Each bug gets a root-cause analysis
A risk limit to testing
• Some things are too dangerous to test
• Some tests introduce more risks than they try to mitigate
• There should always be a safe way out of a test procedure
GUI Acceptance testing
• Looking for
– Quality in use for interactive systems
– Understandability of the GUI
• Structural investigation of the performance of the man-machine interaction
• Looking for “abuse” by the users
• Looking at real-life handling of emergency operations
Avalanche testing
• Tests the capabilities of alarming and control
• Usually starts with one simple trigger
• Generally followed by millions of alarms
• Generally brings your network and systems to the breaking point
Crash and recovery procedure testing
• Validation of system behaviour after a massive crash and restart
• Usually identifies many issues with emergency procedures
• Sometimes identifies issues around power supply
• Usually identifies some (combinations of) systems incapable of unattended recovery…
Production has its challenges…
• Are equipment and processes optimally arranged?
• Are the humans up to their task?
• Does everything perform as expected?
Requires true commitment to results…
• The Romans put the architect under the arch when the scaffolding was removed
• Boeing and Airbus put all lead engineers on the first test flight
• Dijkstra put his “rekenmeisjes” (the women who did the strength calculations) on the opposite dock when launching ships
It is about keeping your back straight…
• Thomas Andrews, Jr.
• Naval architect in charge of the RMS Titanic
• He recognized that regulations were insufficient for a ship the size of Titanic
• Decisions “forced upon him” by the client:
– Limit the range of the double hull
– Limit the number of lifeboats
• He was on the maiden voyage to spot improvements
• He knowingly went down with the ship, saving as many as he could
It requires a specific breed of people
The fates of developers and testers are linked to safety-critical systems into eternity
Conclusion
• Stop reading newspapers
• Safety-critical testing is a lot of work, making sure nothing happens
• Technically it isn’t that much different; we’re just more rigorous and use a specific breed of people…
Questions?
• Questions/remarks: [email protected]
• View again: http://www.slideshare.net/Jaap_van_Ekris/