2016-04-28 - vu amsterdam - testing safety critical systems
TRANSCRIPT
Testing Safety Critical Systems
Theory and Experiences
[email protected] http://www.slideshare.net/Jaap_van_Ekris/
Agenda
• The Goal
• The requirements
• The challenge
• Go with the process flow
– Development Process
– System design
– Testing Techniques
• Trends
• Reality
Specifications…
• Specifications are extremely detailed
• Sometimes up to 20 binders
• After years, you still find contradictions
Goals of testing safety critical systems
• Verify contractually agreed functionality
• Verify correct functional safety-behaviour
• Verify safety-behaviour during degraded and failure conditions
Some people live on the edge…
How would you feel if you were getting ready to launch and knew you were
sitting on top of two million parts
-- all built by the lowest bidder on a government contract.
John Glenn
Until it is too late…
• February 1st, 1953
• A spring tide and heavy winds broke the dykes
• Killed 1,836 people and 30,000 animals
The battle against flood risk…
• Cost: €2,500,000,000
• The largest moving structure on the planet
• Defends
– 500 km2 of land
– 80,000 people
• Partially controlled by software
Nothing is flawless, by design…
No matter how good the design is:
• Some scenarios will be missed
• Some scenarios are too expensive to prevent:
– Accept the risk
– Communicate it to stakeholders
When is software good enough?
• Dutch Law on storm surge barriers
• Equalizes risk of dying due to unnatural causes across the Netherlands
Oosterschelde Storm Surge Barrier
• Chance of
– Failure to close: 10^-7 per usage
– Unexpected closure: 10^-4 per year
To put things in perspective…
• Having a drunk pilot: 10^-2 per flight
• Hurting yourself when using a chainsaw: 10^-3 per use
• Dating a supermodel: 10^-5 in a lifetime
• Drowning in a bathtub: 10^-7 in a lifetime
• Being hit by falling airplane parts: 10^-8 in a lifetime
• Being killed by lightning: 10^-9 in a lifetime
• Winning the lottery: 10^-10 in a lifetime
• Your house being hit by a meteor: 10^-15 in a lifetime
• Winning the lottery twice: 10^-20 in a lifetime
9/11…
• Identified a fundamental (new) risk to ATC systems
• Changed the ATC system dramatically
• Doubled our safety-critical scenarios
The industry statistics are against us…
• Capers Jones: at least 2 high-severity errors per 10 KLoC
• Industry consensus is that software will never be more reliable than
– 10^-5 per usage
– 10^-9 per operating hour
The value of testing
Program testing can be used to show the presence of bugs, but never to show
their absence!
Edsger W. Dijkstra
Is just testing enough?
• A 64-bit input isn’t that uncommon
• 2^64 is the global rice production of 1,000 years, measured in individual grains
• Fully testing all binary inputs of a simple 64-bit stimulus-response system just once takes centuries
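The magnitude of that claim is easy to check with a back-of-the-envelope calculation (sketched here in Python; the test rate of one billion cases per second is an assumption, not a figure from the talk):

```python
# How long would exhaustive testing of a 64-bit input space take?
# The throughput below is an assumed figure for illustration only.
TEST_RATE_PER_SECOND = 1_000_000_000  # assumed: 1e9 test cases/second
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def years_to_exhaust(bits: int, rate: int = TEST_RATE_PER_SECOND) -> float:
    """Years needed to run every possible input of a `bits`-wide stimulus once."""
    return (2 ** bits) / rate / SECONDS_PER_YEAR

print(f"{2**64:,} test cases")  # 18,446,744,073,709,551,616
print(f"about {years_to_exhaust(64):.0f} years at 1e9 tests/s")
```

Even at this optimistic rate the answer is measured in centuries, which is the point of the slide: exhaustive input testing is not an option.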
Design Principles
• Risk analysis drives design (decisions)
• Safety first (production later)
• Fail-to-safe
• There shall be no single source of (catastrophic) failure
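The "fail-to-safe" and "no single source of failure" principles are commonly realized with redundant sensors and majority voting. The sketch below is a hypothetical illustration of that pattern, not code from the actual barrier; the sensor interface, threshold, and voting rule are all assumptions:

```python
# Hypothetical 2-out-of-3 voting with a fail-to-safe bias: a sensor
# that raises an error is counted as voting for the safe action
# (closing), so no single sensor failure can keep the barrier open.
def close_barrier(sensor_reads, threshold_m=3.0):
    """Return True if the barrier should close, given three sensor callables."""
    votes_close = 0
    for read in sensor_reads:
        try:
            level = read()
        except Exception:
            votes_close += 1  # fail-to-safe: a broken sensor votes "close"
            continue
        if level > threshold_m:
            votes_close += 1
    return votes_close >= 2  # 2-out-of-3 majority

def broken_sensor():
    raise IOError("cable cut")  # e.g. digging or seagull damage

# One sensor broken, one above the threshold, one below: 2 votes -> close.
print(close_barrier([lambda: 3.4, lambda: 2.9, broken_sensor]))  # True
```

The design choice here is the direction of the bias: errors are mapped onto the action whose failure mode is least catastrophic, which for a storm surge barrier is closing.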
A simple design of a storm surge barrier
• Relay (€10.00 apiece)
• Water detector (€17.50)
• Design documentation (sponsored by Heineken)
Risk analysis
• Relay failure
– Chance: small
– Cause: aging
– Effect: catastrophic
• Water detector failure
– Chance: huge
– Causes: rust, driftwood, seagulls (eating, shitting)
– Effect: catastrophic
• Measurement errors
– Chance: colossal
– Causes: waves, wind
– Effect: false positive
• Broken cable
– Chance: medium
– Causes: digging, seagulls
– Effect: catastrophic
Typical risks identified
• Components making the wrong decisions
• Power failure
• Hardware failure of PLCs/servers
• Network failure
• Ships hitting water sensors
• Human maintenance error
Risk ≠ system crash
• Understandability of the GUI
• Incorrect functional behaviour
• Data accuracy
• Lack of response speed
• Tolerance towards illogical inputs
• Resistance to hackers
Stuurx::Functionality, initial global design
• Init: the system starts in the waiting state
• Wacht (“Wait”): while the water level < 3 metres, keep monitoring
• Start_D: when the water level > 3 metres, send the “Start” signal to the diesels
• W_O_D (“Wait on Diesels”): wait for the “Diesels ready” signal
• Sluit (“Close”): send the “Close barrier” command
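The state machine on this slide can be sketched as code. This is a toy reconstruction from the diagram's fragments (the Dutch state names and the 3-metre threshold are from the slide; everything else is assumed, and the Init/Start_D steps are collapsed for brevity):

```python
# Toy reconstruction of the slide's closure state machine; not the
# real barrier software, which was written in C/C++.
WACHT, W_O_D, SLUIT = "Wacht", "W_O_D", "Sluit"

class BarrierController:
    def __init__(self):
        self.state = WACHT
        self.commands = []  # commands sent towards the actuators

    def on_water_level(self, metres):
        # Wacht ("wait"): below 3 m nothing happens; above 3 m, start diesels
        if self.state == WACHT and metres > 3.0:
            self.commands.append('"Start" signal to diesels')
            self.state = W_O_D  # W_O_D: wait on the diesels

    def on_diesels_ready(self):
        if self.state == W_O_D:
            self.commands.append('"Close barrier"')
            self.state = SLUIT  # Sluit ("close")

ctrl = BarrierController()
ctrl.on_water_level(2.5)   # still waiting
ctrl.on_water_level(3.4)   # water too high: start the diesels
ctrl.on_diesels_ready()    # diesels report ready -> close the barrier
print(ctrl.state, ctrl.commands)
```

Note how each transition is guarded by the current state, so a stray "Diesels ready" signal while waiting is simply ignored; that kind of event tolerance is exactly what the later test phases probe.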
Design Validation and Verification
• Peer reviews by
– System architect
– 2nd designer
– Programmers
– Test manager of system testing
• Fault Tree Analysis / Failure Mode and Effect Analysis
• Performance modeling
• Static verification / dynamic simulation (by Twente University)
Programming (in C/C++)
• Coding standard:
– Based on “Safer C”, by Les Hatton
– May only use a safe subset of the compiler
– Verified by Lint and 5 other tools
• Code is peer-reviewed by a 2nd developer
• Certified and calibrated compiler
Unit tests
• Focus on conformance to specifications
• Required coverage: 100% with respect to:
– Code paths
– Input equivalence classes
• Boundary value analysis
• Probabilistic testing
• Execution:
– Fully automated scripts, running 24x7
– Creates 100 MB/hour of logs and measurement data
• Upon bug detection:
– 3 strikes is out: after 3 implementation errors, the unit is rebuilt by another developer
– 2 strikes is out: the need for a 2nd rebuild implies a redesign by another designer
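Equivalence classes and boundary value analysis can be made concrete with a small example. The function under test below is a hypothetical stand-in built around the 3-metre threshold from the design sketch, not the real unit under test:

```python
# Boundary value analysis for a hypothetical closing decision with a
# 3.0 m threshold: test well below, just below, exactly at, just above,
# and well above the boundary, covering both equivalence classes.
def must_close(water_level_m: float, threshold_m: float = 3.0) -> bool:
    return water_level_m > threshold_m

def test_boundary_values():
    cases = [
        (0.0, False),    # equivalence class: clearly safe
        (2.999, False),  # just below the boundary
        (3.0, False),    # exactly on the boundary (spec: strictly greater)
        (3.001, True),   # just above the boundary
        (9.9, True),     # equivalence class: clearly dangerous
    ]
    for level, expected in cases:
        assert must_close(level) == expected, f"failed at {level}"

test_boundary_values()
```

The off-by-epsilon cases around 3.0 are the ones that catch a `>=` written where the specification says `>`, which is precisely the class of error exhaustive-looking "round number" tests miss.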
Integration testing
• Focus on
– Functional behaviour of chains of components
– Failure scenarios based on the risk analysis
• Required coverage
– 100% coverage of input classes
• Probabilistic testing
• Execution:
– Fully automated scripts, running 24x7 at 10x speed
– Creates 250 MB/hour of logs and measurement data
• Upon detection
– Each bug gets a root-cause analysis
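The "probabilistic testing" mentioned at unit and integration level can be read as randomized input generation checked against an executable property. That reading is an assumption; the sketch below (with a hypothetical `must_close` stand-in) shows the idea:

```python
import random

def must_close(water_level_m: float) -> bool:
    """Hypothetical stand-in for the decision logic under test."""
    return water_level_m > 3.0

def check_monotone(samples: int = 10_000, seed: int = 7) -> bool:
    """Property: if the barrier closes at level x, it must also close
    at any higher level x + d. Violations would indicate a logic bug."""
    rng = random.Random(seed)  # fixed seed -> reproducible campaign
    for _ in range(samples):
        x = rng.uniform(0.0, 10.0)
        d = rng.uniform(0.0, 5.0)
        if must_close(x) and not must_close(x + d):
            return False
    return True

print(check_monotone())  # True for this implementation
```

Unlike the hand-picked boundary cases, this probes thousands of random points against an invariant, trading precision at known boundaries for breadth across the input space.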
Redundancy is a nasty beast
• You do get the functional behaviour of your entire system
• It is nearly impossible to see whether all components are working correctly
• Is EVERYTHING working OK, or is it the safety net?
System testing
• Focus on
– Functional behaviour
– Failure scenarios based on the risk analysis
• Required coverage
– 100% complete environment (simulation)
– 100% coverage of input classes
• Execution:
– Fully automated scripts, running 24x7 at 10x speed
– Creates 250 MB/hour of logs and measurement data
• Upon detection
– Each bug gets a root-cause analysis
Endurance testing
• Look for the “one in a million times” problem
• Challenge:
– Software is deterministic
– Execution is not (timing, transmission errors, system load)
• Have an automated script run it over and over again
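Such an endurance-test driver can be sketched in a few lines. Everything below is illustrative: `run_scenario` is a pretend system under test with a deliberately rare timing-dependent failure, and the seeded generator stands in for the perturbed execution conditions (timing, load) the slide mentions:

```python
import random

def run_scenario(jitter_ms: float) -> bool:
    """Pretend system under test: fails only in a rare timing window."""
    return not (4.999 < jitter_ms < 5.001)

def endurance_test(runs: int, seed: int = 0):
    """Run the scenario many times under randomized conditions,
    logging every failure with enough data to reproduce it."""
    rng = random.Random(seed)  # fixed seed -> the campaign is replayable
    failures = []
    for i in range(runs):
        jitter = rng.uniform(0.0, 1000.0)
        if not run_scenario(jitter):
            failures.append((i, jitter))  # run number + triggering conditions
    return failures

failures = endurance_test(1_000_000)
print(f"{len(failures)} failures in 1,000,000 runs")
```

The key design point is logging the seed and conditions of every failing run: a "one in a million" failure is useless to developers unless it can be replayed.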
Results of Endurance Tests
[Chart: Reliability Growth of Function M, Project S, plotting chance of failure on a logarithmic scale (10^0 down to 10^-5) against platform versions 4.35, 4.36 and 4.37]
Acceptance testing
1. Functional acceptance
2. Failure behaviour: all top-50 (FMECA) risks tested
3. A year of operational verification
• Execution:
– Tests performed on a working storm surge barrier
– Creates 250 MB/hour of logs and measurement data
• Upon detection
– Each bug gets a root-cause analysis
A risk limit to testing
• Some things are too dangerous to test
• Some tests introduce more risks than they try to mitigate
• There should always be a safe way out of a test procedure
GUI Acceptance testing
• Looking for
– Quality in use for interactive systems
– Understandability of the GUI
• Structural investigation of the performance of the man-machine interaction
• Looking for “abuse” by the users
• Looking at real-life handling of emergency operations
Avalanche testing
• Tests the capabilities of alarming and control
• Usually starts with one simple trigger
• Generally followed by millions of alarms
• Generally brings your network and systems to the breaking point
Crash and recovery procedure testing
• Validation of system behaviour after a massive crash and restart
• Usually identifies many issues with emergency procedures
• Sometimes identifies issues around power supply
• Usually identifies some (combinations of) systems incapable of unattended recovery…
Production has its challenges…
• Are equipment and processes optimally arranged?
• Are the humans up to their task?
• Does everything perform as expected?
Requires true commitment to results…
• The Romans put the architect under the arch when the scaffolding was removed
• Boeing and Airbus put all lead engineers on the first test flight
• Dijkstra put his “rekenmeisjes” (the women who did the strength calculations) on the opposite dock when launching ships
It is about keeping your back straight…
• Thomas Andrews, Jr.
• Naval architect in charge of the RMS Titanic
• He recognized that regulations were insufficient for a ship the size of Titanic
• Decisions “forced upon him” by the client:
– Limit the range of the double hull
– Limit the number of lifeboats
• He was on the maiden voyage to spot improvements
• He knowingly went down with the ship, saving as many as he could
It requires a specific breed of people
The fates of developers and testers are linked to safety-critical systems into eternity
Conclusion
• Stop reading newspapers
• Safety-critical testing is a lot of work, making sure nothing happens
• Technically it isn’t that much different; we’re just more rigorous and use a specific breed of people…
Questions?
• Questions/remarks: [email protected]
• View again: http://www.slideshare.net/Jaap_van_Ekris/