safety management challenges for aviation cyber physical ... safety management v2.pdf · other...
TRANSCRIPT
Safety Management Challenges for Aviation Cyber Physical Systems
Prof. R. John Hansman & Roland Weibel
MIT International Center for Air Transportation
Challenges
Target Level of Safety Expectations
System Complexity
Prognostic vs Forensic Data Analysis
Safety Assurance and Operational Approval• New Systems and Procedures• Standards
Software Development and Certification
High Confidence Human-Systems Integration
Challenges
Target Level of Safety Expectations
System Complexity
Prognostic vs Forensic Data Analysis
Safety Assurance and Operational Approval• New Systems and Procedures• Standards
Software Development and Certification
High Confidence Human-Systems Integration
Increasing Stringency Of Safety Standards
Safety targets and assessment process reviewed for past changes• CAA ILS Requirements• EU Reduced Vertical Separation Minimums (RVSM)• North Atlantic Track (NAT) Separation – (2 cases)• Precision Runway Monitor (PRM)
UK CAA ILS Requirement1x10-7 accidents/approach
NAT Track Spacing1x10-7 midairs/hr
NAT Track Spacing2x10-8 midairs/hr
Precision Runway Monitor4x10-8 accidents/approach
EU RVSM2.5x10-9 accidents/hr
Approaches to Setting the Target Level of Safety
Parity: TLS set equal to the current accident rateExample: Precision Runway Monitor (PRM)
Time of change
Target level of safety
Acc
iden
ts/
expo
sure
Acc
iden
ts/
expo
sure
Target level of safety
Current accident rate
Projected time
Operation s
Target level of safety (acc/hr)
Accidents / yr
Acc
iden
ts/
expo
sure
Target level of safety
Extrapolation/Risk Ratio: TLS set by fixed improvement in risk, or continuance of extrapolated risk reduction
Examples: North Atlantic Longitudinal Spacing, TCAS
Homeostasis: TLS calculated to maintain constant annual accident frequency
Examples: Mineta Commission, SESAR targets
Absolute: TLS set regardless of accident frequencies
Examples: ATO Safety Management System
Continued Target Reductions in Accident Rates
2008 AVS Business Plan
Safety Risk Management
FAA Safety Management System (SMS)
Documented Guidelines for Performing Safety Risk Management
Primarily Directed to ATO Personnel
Stated Applicability to all systems related to ATC, navigation, and acquisition
Purpose of Risk Management: A structured process to examine potential causes of accidents and prioritize requirements to mitigate risk to an acceptable level
NAS Change Areas for Analysis of Safety Impacts
Source: FAA Safety Management System Manual, Version 1.1, May 21, 2004, p. 10.
Challenges
Target Level of Safety Expectations
System Complexity
Prognostic vs Forensic Data Analysis
Safety Assurance and Operational Approval• New Systems and Procedures• Standards
Software Development and Certification
High Confidence Human-Systems Integration
System Complexity “Simplified” NAS Architecture
ATC-Based Applications•Surveillance•Separation procedures•Trajectory-based operations
Distributed Air-Ground Systems (eg ADS-B)
Air Vehicle Component
Cockpit-Based Applications•Self-separation•Equivalent VFR operations•Traffic & runway awareness•Airspace, weather, terrain awareness•Precision Navigation
Ground Component
Coverage Volume
Operating Procedures Operating
Procedures
Air Traffic Control
Aircraft CockpitOther Aircraft
Cockpit
ATC
Global Navigation Satellite System
Avionics Integration Avionics
Integration
ATC Integration
ATC Integration
ADS-B OutPosition & intent broadcast from aircraft to ground or other aircraft
Air to Air
Air to Ground
ADS-B InInformation transmitted from ground to the aircraft
Radar Tracks
Challenges
Target Level of Safety Expectations
System Complexity
Prognostic vs Forensic Data Analysis
Safety Assurance and Operational Approval• New Systems and Procedures• Standards
Software Development and Certification
High Confidence Human-Systems Integration
Need for New Approaches to Data Analysis
Forensic vs Prognostic Approaches
As safety improves, signals of accident causes weaken• “Paradox of Almost Totally Safe Transportation Systems” – Rene
Amalberti
Current data approaches are generally based on simple excedance parameters• FOQA – envelope exceedance• Operations certificate – procedural non-compliance
Current data mining methods are not prognostic• Require hypothesis or identified problem• Forensic: after-accident investigation
Accidents and Precursors
Accidents
Incidents
Unsafe Acts
Latent Conditions
Decreasing Signal
Strength Toward Root
Causes
Measures
Midair collision ratesAccident RatesRunway Collisions
Loss of SeparationGround IncursionsNear-accidents
Operational ErrorsOperational DeviationsPilot Deviations
???
Modified from H.W. Heinrich, Industrial Accident Prevention, 1931
Confidence Intervals on Rate of Rare Event
fX x | λ,t( )= λ( )x e−λ
x!• Poisson Distribution: probability f of observing x events over
time t if true rate is λ• Alternate formulation (after applying Bayes rule): given x
observed events over time t, what is distribution g of true rate λ? g λ | x,t( )= λt( )x e−λt
x!
x = no. of eventst = observation timeλ
= true event ratet = 108 hours
Need for New Approaches to Data Analysis
Forensic vs Prognostic Approaches
As safety improves, signals of accident causes weaken• “Paradox of Almost Totally Safe Transportation Systems” – Rene
Amalberti
Current data approaches are generally based on simple excedance parameters• FOQA – envelope exceedance• Operations certificate – procedural non-compliance
Current data mining methods are not prognostic• Require hypothesis or identified problem• Forensic: after-accident investigation
Increasing Set of Potential Data Sources (Multiple Formats)
Flight Data Recorder (CVR) • 300 to 1000 states• 1/5 to 30 hz
Other Electronic Recordings• GPS, FMS, Instrumentation
Cockpit Voice Recorder (CVR)
Air Ground Communications• Voice, Data
Trajectory Data• Radar, Multilateration, ADS-B
Self Reports• Pilots, Controllers, Mechanics• ASAS, NASA ASRS
Accident, Incident Reports
Dispatch and Weather Data
Maintenance Data• Performance Tracking Data• Logbook writeups
Aircrew Data• Medical• Perfornece • Rest
Developmental Test Data
Video
Oversight• Air Carrier Oversight (ATOS)
Challenges
Target Level of Safety Expectations
System Complexity
Prognostic vs Forensic Data Analysis
Safety Assurance and Operational Approval• New Systems and Procedures• Standards
Software Development and Certification
High Confidence Human-Systems Integration
Severity/Likelihood Measure of Risk
*Source: FAA Safety Management System Manual, Version 1.1, May 21, 2004, p. 45.
From SMS:
High Risk-Unacceptable
Medium Risk-Minimum Acceptable
Low Risk-Target
Simplified Set of States Required to Achieve Operational Capability
Certified Installation
Certified Installation
Installed Equipment
Certified Avionics Certified Avionics
Operationa l Approval Operationa l Approval
Avionics Spec.
Designed Avionics
Operations/ Applications
Spec
Approved Infrastruct. Approved Infrastruct.
Trained Controller
Designed Procedures (Airborne)
Approved Equipment Approved Equipment
Designed Equipment
Designed Infrastruct.
In-Service Decision
In-Service Decision
Infrastruct.Spec.
Deployed Infrastruct.
Verified Equipment
Verified Equipment
System Operational Capability
System Operational Capability
Airborne Elements
Shared Procedures
Ground/ATC Elements
Published Procedures
Acquired Equipment
Airborne Operational Capability
Airborne Operational Capability
Procedural Operational Capability
Procedural Operational Capability
Ground Infrastructure
Capability
Ground Infrastructure
Capability
General Air/Ground Integrated System
Designed Procedures
(Control)
Operating Spec.
Operating Spec.
Trained Pilot
Operationa l Approval Operationa l Approval
Certification Process
Acquisition Process
System Operational
Approval
Level of Requirements for Future Functionality
Level of Requirements
Functionality
current requirements
future requirements
overspecification for requirement B
A
B
underspecification from requirement B
Challenges
Target Level of Safety Expectations
System Complexity
Prognostic vs Forensic Data Analysis
Safety Assurance and Operational Approval• New Systems and Procedures• Standards
Software Development and Certification
High Confidence Human-Systems Integration
CNS/ATM Software Assurance Based on Risk
A L 6 /E A L 5 /D A L 3 /C A L 2 /B A L 1 /A
A L 6
A L 6
A L 6
A L 2
A L 3
A L 4
A L 3
A L 4
A L 5
A L 4
A L 4
A L 5
A L 5
A L 5
A L 6E xtrem ely Im probable
E xtrem ely R em ote
R em ote
P robable (N ote : 2 )
N o S a fe ty E ffec t
M inor M ajor H azardous C atas troph ic
L IK E L IH O O D O F O C C U R R E N C E
C N S /A T M SW A L A ssignm ent M atrix
SEVE
RIT
Y
N ote :
1 . M in im a lly recom m ended S W assurance leve ls based on system r isk , any dev ia tion m ust be p re -approved by the app ropria te app roval/ce rtifica tion au tho rity .
2 . D O -278 equa tes to D O -178B fo r S W w hose functiona lity has a d irect im pact on a ircra ft opera tions (e .g ., ILS , W A A S ).
•S o ftwa re assurance is o ften used to con tro l risk by m itiga ting anom a lous so ftware behav io r.
•S o ftwa re assurance p rov ides the con fidence and a rtifac ts to ensure the system sa fe ty requ irem ents im p lem en ted in so ftwa re function as designed .
Frequent A L 1 /AA L 2 /BA L 3 /CA L 5 /DA L 6 /E
From FAA System Safety Management Program Manual, December 2004, p.42
DO-178B Software Design Assurance Levels (DALs)
Level Failure condition Objectives With
independence
A Catastrophic 66 25
B Hazardous 65 14
C Major 57 2
D Minor 28 2
E No effect 0 0
DO-178B Assurance Levels and Conditions
De facto standard for certification of safety-critical software systems
Currently in update: DO-178C
Challenges
Target Level of Safety Expectations
System Complexity
Prognostic vs Forensic Data Analysis
Safety Assurance and Operational Approval• New Systems and Procedures• Standards
Software Development and Certification
High Confidence Human-Systems Integration
Need to Consider Entire System
Software
Human
Hardware Environment
Mode Awareness
Mode Awareness is becoming a serious issues in Complex Automation Systems
• automation executes an unexpected action (commission), or fails to execute an action (omission) that is anticipated or expected by one or more of the pilots
Multiple accidents and incidents• Strasbourg A320 crash: incorrect vertical mode
selection• Orly A310 violent pitchup: flap overspeed• B757 speed violations: early leveloff conditions
Pilot needs to• Identify current state of automation• Understand implications of current state• Predict future states of automation
Operator Directed Process
AutomationModel
TrainingMaterial
SoftwareSpecification
Software
ConfigurationManagement
Training material is derived from Automation Model. Training Representation is created.
Automation Model is derived from Functional Analysis, operator and expert user input.
Software specification is derived from Training Material.
System is certified againstAutomation Model.
Specification changes must be consistent with Automation Model.
Certification
Configuration Management verifies and maintains consistency with Automation Model.
FunctionalAnalysis
Iterative Human-Centered Prototype Evaluation Stage
Challenges
Target Level of Safety Expectations
System Complexity
Prognostic vs Forensic Data Analysis
Safety Assurance and Operational Approval• New Systems and Procedures• Standards
Software Development and Certification
High Confidence Human-Systems Integration
Spectrum of Judgment in Approval Process Steps
Decision Aspects
Type of Analysis
judgment
structure/rules
mix
Expert Appraisal
Modeling & Quantification
system description
mitigations to reach sufficient safety
standards for evaluation
abstraction to potential hazards &
effects
Hazard Analysis
categorization of potential hazards
system description
mitigations to reach sufficient safety
hazard likelihood & effects
safety standard
mitigation design
hazard likelihood & effects
approval decision approval decision approval decision
abstraction
system description
Standard Compliance
compliance evidence
standards for evaluation
system description
compliance decision
Confidence Intervals on Rate of Rare Event
fX x | λ,t( )= λ( )x e−λ
x!• Poisson Distribution: probability f of observing x events over
time t if true rate is λ• Alternate formulation (after applying Bayes rule): given x
observed events over time t, what is distribution g of true rate λ? g λ | x,t( )= λt( )x e−λt
x!
x = no. of eventst = observation timeλ
= true event ratet = 108 hours
Addressing Multiple Hazards in System
Level of Risk Standard
(less safe)
Dominant Hazard
Dominant Hazard
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
Other Hazard(s) Unacceptable
risk
Dominant Hazard Case
Multiple Equal Hazards
All hazards of equal severity, therefore likelihood combines to overall level of risk
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
Other Hazard(s)
HazardHazard
Unacceptable risk