Software Safety Engineering (S2E) Program Status

Dan Fitch

March 7, 2001

Software Safety Program - Overview

General Safety Concepts - WHY

Software Safety and CLCS - HOW

Known Hazards

Designing for Safety

Safety & Reliability Thread

Current Status

Software Safety – What is it?

[Figure: a monitored value tracked against its limits. Labels: Anticipate, Detect, Control, Mitigate. Checks: Rate, Slope, Absolute Value. Responses: Prevent, Limit Damage, Return to Safe State.]
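The figure's idea of checking both the absolute value and the rate of change can be sketched as below. This is a minimal illustration, not CLCS code: the `LimitMonitor` class, the `Action` names, and the numeric limits are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"              # value and trend are within limits
    ANTICIPATE = "anticipate"  # rising fast enough to warrant early action
    MITIGATE = "mitigate"      # limit reached: return to a safe state


@dataclass
class LimitMonitor:
    absolute_limit: float  # hypothetical limit on the measured value
    rate_limit: float      # hypothetical limit on the slope, units per second

    def check(self, previous: float, current: float, dt: float) -> Action:
        """Classify one new sample taken dt seconds after the previous one."""
        if dt <= 0:
            raise ValueError("dt must be positive")
        # Absolute-value check: the limit has already been reached.
        if current >= self.absolute_limit:
            return Action.MITIGATE
        # Rate/slope check: still under the limit but climbing too fast.
        slope = (current - previous) / dt
        if slope >= self.rate_limit:
            return Action.ANTICIPATE
        return Action.NONE


monitor = LimitMonitor(absolute_limit=100.0, rate_limit=5.0)
slow_rise = monitor.check(previous=50.0, current=52.0, dt=1.0)
rapid_rise = monitor.check(previous=50.0, current=60.0, dt=1.0)
over_limit = monitor.check(previous=98.0, current=101.0, dt=1.0)
```

The rate check is what lets the software act before the absolute limit is reached, which is the "Anticipate" step in the figure.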

Software Safety – What is it?

Definitions

Functionally-critical: Mission completion

Safety-Critical: Humans = Life & Limb; Hardware = $10^6

Some set theory: Input versus output

Some Theory…

[Figure: a set of inputs mapped to a set of outputs. Regions: Known Safe, Known Unsafe, Assumed Safe, Unknowns.]

Sources: Normal Operation, Hardware Failures, Human Intervention, Models/Simulators

Software Safety – Why do it?

Direction:

DoD: Mil-Std-882D, DoD-Std-2167

NASA: NSTS-07700, NSS-8719.13, NASA-GB-1740.13, NSS-22206, NSS-22254, Direction from Dan Goldin

CLCS: 84K-00055, KDP-P-2901

Software Safety – Why do it?

Objective: Identify & Mitigate Risk

Known Fault Scenarios – by requirements, analyses & test

Possible Unknowns – by design approach & further test

“Knowns”

Hardware fault-driven scenarios

Legacy of hardware failure data available from the 1970s

Hardware-driven hazards

May be analyzed – the SSA

May be tested – specific fault injection

Identifies Risk & Yields Design Changes – Issues/ESRs

The Safety Case – Summary of Risk Findings

“Unknowns”

“Stuff” Happens

Software doesn’t fail – It just doesn’t do what we thought it would

Hardware and some functions (e.g., seeds & races) cause most random errors

Specification & Coding errors = Prime Cause

90% of errors are in the specifications

C++ and Java are inherently powerful, but dangerous

Ferengi Software Safety Rule #76

If it "touches*" hardware that can impact the safety of people or equipment, an SSA is absolutely necessary.

*(i.e., controls, monitors, or mitigates the risk of using)

SSA - What and When

Assessment of risk factors due to software

Hardware Hazards

SFMEA and SFTA

KDP-P-2901

Schedule:

30 days before the first interaction with Flight Hardware

In time for 5A/B Testing

Presented at TRR/ORR

KDP-P-2901 SSA Process

[Figure, built up over six slides: the System Safety Analysis running alongside the development life cycle.]

Development phases: Conceptual Design, Detail Design, Code Development, System Test, Val/Ver Test, Readiness Reviews

Milestones: IPT/DP-1, SRS/DP-2, DDS/ODS/DP-3, 3A/B, 4A/B, 5A/B (With Hdwr), TRR/ORR

System Safety Analysis elements: S-C Matrix, Hazards, PHA, FTA/FMEA, Risk Assessment, CHAWS*, Issues, Risk, CM-Driven Changes, SSA Report

*CHAWS = CLCS Hazard Analysis Worksheet

Software Fault Tree Analysis

Works backward from the fault to its root causes

Uses design details of the entire system

Leads to better understanding of causes and their prevention

Unknown fault events not considered

Fault Tree Analysis

[Figure: example fault tree]

Top Event: Fill Valve not closed

Events: Human did not notice pressure; S/W did not react to over pressure; S/W did not anticipate rapid pressure rise; Other Root Cause

Legend: Basic Fault Events, Intermediate Events, Causal Relationship, AND
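A fault tree like this one can be evaluated as Boolean logic over its basic events. The sketch below is illustrative only: the slide's gate wiring did not survive extraction, so the AND and OR placement here is an assumption, as are the function and key names.

```python
from typing import Dict

# Basic fault events named on the slide; True means the fault has occurred.
BasicEvents = Dict[str, bool]


def software_did_not_react(events: BasicEvents) -> bool:
    """Intermediate event: S/W did not react to over pressure."""
    # Assumed OR gate: either basic cause alone is enough.
    return (events["sw_did_not_anticipate_rapid_rise"]
            or events["other_root_cause"])


def fill_valve_not_closed(events: BasicEvents) -> bool:
    """Top event: Fill Valve not closed."""
    # Assumed AND gate: both the operator and the software must miss it.
    return (events["human_did_not_notice_pressure"]
            and software_did_not_react(events))


# Working backward from the top event: which combinations trigger it?
operator_alert = {
    "human_did_not_notice_pressure": False,
    "sw_did_not_anticipate_rapid_rise": True,
    "other_root_cause": False,
}
both_missed = dict(operator_alert, human_did_not_notice_pressure=True)

top_event_with_alert_operator = fill_valve_not_closed(operator_alert)
top_event_when_both_missed = fill_valve_not_closed(both_missed)
```

Under the assumed AND gate, a software fault alone does not produce the top event as long as the operator notices the pressure.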

Analysis & CLCS Architecture

[Figure: hazard analysis mapped onto the CLCS architecture layers]

Layers: Hardware Safing, System S/W, Sys Srvcs, Apps Srvcs, Applications

Labels: Hazardous Event, Detection & Anticipation, Control & Mitigation, Remaining Risk

The Software FMEA

Predicted hardware failures followed to their conclusion through the software

What can go wrong?

What happens when it does?

Must know system failures up front

Won’t prevent the unexpected
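One way to picture a software FMEA is a worksheet that pairs each predicted hardware failure with the software response and the end effect, then flags any failure with no response. The sketch below assumes that worksheet shape. The failure modes, responses, and names are hypothetical and are not taken from the CLCS analysis.

```python
from typing import Dict, List, NamedTuple


class FmeaRow(NamedTuple):
    failure_mode: str       # predicted hardware failure (what can go wrong)
    software_response: str  # what the software does when it happens
    end_effect: str         # resulting system state


# Hypothetical list of predicted failures, known up front.
PREDICTED_FAILURES: List[str] = [
    "sensor_stuck_high",
    "valve_fails_open",
    "link_dropout",
]

# Hypothetical worksheet rows traced through the software so far.
WORKSHEET: Dict[str, FmeaRow] = {
    "sensor_stuck_high": FmeaRow(
        "sensor_stuck_high", "switch to backup sensor", "safe"),
    "valve_fails_open": FmeaRow(
        "valve_fails_open", "command isolation valve closed", "safe"),
}


def unhandled_failures(predicted: List[str],
                       worksheet: Dict[str, FmeaRow]) -> List[str]:
    """Return predicted failures that have no traced software response."""
    return [mode for mode in predicted if mode not in worksheet]


gaps = unhandled_failures(PREDICTED_FAILURES, WORKSHEET)
```

The check only covers failures on the predicted list, which matches the slide's caveat that the FMEA will not prevent the unexpected.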

CLCS

Spiral Development Cultural Changes

Failure of software Test

SSA – Traditional Approach

Failure Modes & Effects Analysis

Fault Tree Analysis

Traditional Development

• All or most code available

• A lot known about the system

• Too late…

SSA - An Iterative Process

Safety Criticality Assessment

Engineering Design Changes

Failure Modes & Effects Analysis

Fault Tree Analysis

Spiral Development

S&MA will perform a Software Safety Analysis (SSA) for each delivery at every location, i.e., each time we step up to a new drop.

After the initial SSA, each modification to the safety-critical software will require an update of the analysis and a new SSA report.

SSA - Where

SSA - Planning

On a PERT chart, the SSA preparation activity will begin during the preparation of the design specifications and will have a finish-to-finish relationship with the validation/verification (4A/B) testing.

[Figure: timeline from Design Begin to Val/Ver Test. SSA activities: PHA, FTA, FMEA, Risk Assessment, SSA Report.]
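The planning constraints can be checked with simple date arithmetic. The sketch below makes two assumptions: that the finish-to-finish link means the SSA cannot finish before Val/Ver (4A/B) testing finishes, and that the SSA must also be complete 30 days before the first interaction with flight hardware, as the schedule slide states. The function name and the dates are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple


def ssa_finish_window(valver_finish: date,
                      first_hardware_use: date,
                      lead_days: int = 30) -> Tuple[date, date, bool]:
    """Return (earliest finish, latest finish, feasible) for the SSA."""
    # Finish-to-finish (assumed direction): SSA finishes no earlier
    # than the Val/Ver test it depends on.
    earliest_finish = valver_finish
    # Schedule rule: done lead_days before first flight-hardware use.
    latest_finish = first_hardware_use - timedelta(days=lead_days)
    return earliest_finish, latest_finish, earliest_finish <= latest_finish


# Hypothetical dates, for illustration only.
earliest, latest, feasible = ssa_finish_window(
    valver_finish=date(2001, 6, 1),
    first_hardware_use=date(2001, 7, 15),
)
```

If `feasible` comes back `False`, the two constraints conflict and either the test finish or the first hardware interaction has to move.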

Ferengi Software Safety Rule #304

The SSA isn’t enough.

CLCS

Spiral Development Cultural Changes

Failure of software Test

Paradigms

Software Failures:

“Software does not fail - it just does not perform as intended”

Dr Nancy Leveson, MIT

Paradigms

Design and test for functionality:

Also specify what the system should not do.

Then test it.
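A "shall not" requirement can be written and tested just like a functional one. The sketch below assumes a hypothetical valve command function and pressure limit. It shows one positive test and one negative test and is not drawn from CLCS requirements.

```python
PRESSURE_LIMIT = 100.0  # hypothetical limit, arbitrary units


def command_fill_valve(requested_open: bool, pressure: float) -> bool:
    """Return True if the fill valve is commanded open."""
    # "Shall not" requirement: never open the valve at or above the limit,
    # regardless of what the operator or application requested.
    if pressure >= PRESSURE_LIMIT:
        return False
    return requested_open


def test_opens_on_request_when_safe() -> None:
    # Functional requirement: what the system should do.
    assert command_fill_valve(True, pressure=40.0) is True


def test_never_opens_at_or_above_limit() -> None:
    # Negative requirement: what the system should not do.
    for pressure in (100.0, 100.1, 250.0):
        assert command_fill_valve(True, pressure) is False


test_opens_on_request_when_safe()
test_never_opens_at_or_above_limit()
```

The second test is the one that functionality-only test plans tend to leave out.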

Some Theory… 2nd Look

[Figure: a set of inputs mapped to a set of outputs. Regions: Known Safe, Known Unsafe, Assumed Safe, Unknowns.]

Sources: Normal Operation, Hardware Failures, Human Intervention, Models/Simulators

Fault Injection (added known)
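Fault injection turns some unknowns into knowns by feeding deliberately bad inputs to the software and observing the outcome. The sketch below assumes a hypothetical one-cycle controller with a plausible sensor range of 0 to 100. The injected faults are examples and do not form a complete fault set.

```python
import math
from typing import Callable, Dict


def controller_step(read_sensor: Callable[[], float]) -> str:
    """Hypothetical controller: return 'RUN' or 'SAFE' for one cycle."""
    try:
        value = read_sensor()
    except OSError:
        return "SAFE"  # lost sensor: fall back to the safe state
    if math.isnan(value) or not (0.0 <= value <= 100.0):
        return "SAFE"  # implausible reading
    return "RUN"


def sensor_dropout() -> float:
    raise OSError("injected sensor dropout")


# Each entry replaces the real sensor with a faulty one.
INJECTED_FAULTS: Dict[str, Callable[[], float]] = {
    "dropout": sensor_dropout,
    "nan": lambda: float("nan"),
    "over_range": lambda: 1.0e6,
    "negative": lambda: -5.0,
}

results = {name: controller_step(fault)
           for name, fault in INJECTED_FAULTS.items()}
nominal = controller_step(lambda: 42.0)
```

Every injected fault that ends in `SAFE` moves one input from the unknown region to the known region. Any that ends in `RUN` is a finding.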

Design for Safety

“Program and Project Responsibilities”

Dan Goldin message:

Safety is more than FMEA and FTA

Safety must be designed in at the earliest

Existing Specifications

Must include safety

Methods & techniques for mitigation of hazards

Requirements – Traceable and Testable

Initiatives

Dan Goldin: “Design for Safety”

Smart Practices applied early to designs

Early engineering changes are cheaper

Provide draft guidance for design of safety-critical software

Process changes

Design Guidelines – NASA-GB-1740.13

Peer reviews – enhanced checklist

Test development – Fault Injection for Robustness

Works to prevent unforeseen fault scenarios

Objectives

Known fault scenarios

Analysis

Redesign

Test – functionality and robustness

Unknowns

Design them out of the system

Test – fault injection

S/W Safety – Where we are.

Safety-Critical software identified & in engineering review

Software Safety Integration Team formed

Software FTA/FMEA in work

Will be recurring due to spiral development

Design for Safety concepts being integrated

Safety & Reliability Thread introduced

Post-SSA Analysis Tools being procured

S/W Safety – What’s Next?

Today

“Design for Safety” and “Known Fault Analyses”

Tomorrow

Recursive and bi-directional analyses

Reliability predictions, Markov, Numerical Integration, Weibull analysis techniques

Probabilistic fault injection techniques
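As a pointer to what a Weibull reliability prediction looks like, the sketch below evaluates the standard two-parameter Weibull reliability function R(t) = exp(-(t/scale)^shape). The parameter values are hypothetical and are not CLCS data.

```python
import math


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """Probability of surviving to time t under a two-parameter Weibull."""
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be >= 0; shape and scale must be > 0")
    return math.exp(-((t / scale) ** shape))


# Hypothetical parameters: shape > 1 models wear-out (rising failure rate),
# shape == 1 reduces to the exponential (constant failure rate) model.
r_wearout = weibull_reliability(t=500.0, shape=2.0, scale=1000.0)
r_constant = weibull_reliability(t=1000.0, shape=1.0, scale=1000.0)
```

With `shape=1.0` and `t` equal to `scale`, the result is exp(-1), the familiar value for an exponential model at its characteristic life.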

Summary

Life on the Leading Edge

Probably the “Largest real-time safety-critical control system on the planet”

Safety is our #1 core value

We are on front and center stage – The NASA team is watching
