![Page 1: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/1.jpg)
The Role of Complexity in System Safety and How to Manage It
Nancy Leveson
![Page 2: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/2.jpg)
– You’ve carefully thought out all the angles
– You’ve done it a thousand times
– It comes naturally to you
– You know what you’re doing, it’s what you’ve been trained to do your whole life.
– Nothing could possibly go wrong, right?
![Page 3: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/3.jpg)
![Page 4: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/4.jpg)
What is the Problem?
• Traditional safety engineering approaches developed for relatively simple electro-mechanical systems
• New technology (especially software) is allowing almost unlimited complexity in the systems we are building
• Complexity is creating new causes of accidents
• Should build simplest systems possible, but usually unwilling to make the compromises necessary
1. Complexity related to the problem itself
2. Complexity introduced in the design of the solution to the problem
• Need new, more powerful safety engineering approaches to dealing with complexity and new causes of accidents
![Page 5: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/5.jpg)
What is Complexity?
• Complexity is subjective
– Not in system, but in minds of observers or users
– What is complex to one person or at one point in time may not be to another
• Relative
• Changes with time
• Many aspects of complexity: Will focus on aspects most relevant to safety
![Page 6: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/6.jpg)
Relation of Complexity to Safety
• In complex systems, behavior cannot be thoroughly
– Planned
– Understood
– Anticipated
– Guarded against
• Critical factor is intellectual manageability
• Leads to “unknowns” in system behavior
• Need tools to
– Stretch our intellectual limits
– Deal with new causes of accidents
![Page 7: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/7.jpg)
Types of Complexity Relevant to Safety
• Interactive Complexity: arises in interactions among system components
• Non-linear complexity: cause and effect not related in an obvious way
• Dynamic complexity: related to changes over time
• Decompositional complexity: related to how we decompose or modularize our systems
• Others ??
![Page 8: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/8.jpg)
Interactive Complexity
• Level of interactions has reached the point where they can no longer be thoroughly anticipated or tested
• Coupling causes interdependence
– Increases number of interfaces and potential interactions
– Software allows us to build highly coupled and interactively complex systems
• How does this affect safety engineering?
– Component failure vs. component interaction accidents
– Reliability vs. safety
![Page 9: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/9.jpg)
Accident with No Component Failures
![Page 10: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/10.jpg)
Software-Related Accidents
• Are usually caused by flawed requirements
– Incomplete or wrong assumptions about operation of controlled system or required operation of computer
– Unhandled controlled-system states and environmental conditions
• Merely trying to get the software “correct” or to make it reliable will not make it safer under these conditions.
![Page 11: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/11.jpg)
Types of Accidents
• Component Failure Accidents
– Single or multiple component failures
– Usually assume random failure
• Component Interaction Accidents
– Arise in interactions among components
– Related to interactive complexity and tight coupling
– Exacerbated by introduction of computers and software
![Page 12: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/12.jpg)
Safety ≠ Reliability
• Safety and reliability are NOT the same
– Sometimes increasing one can even decrease the other.
– Making all the components highly reliable will not prevent component interaction accidents.
• For relatively simple, electro-mechanical systems with primarily component failure accidents, reliability engineering can increase safety.
• But this is untrue for complex, software-intensive socio-technical systems
• Our current safety engineering techniques assume accidents are caused by component failures
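A minimal sketch of a component interaction accident, assuming a hypothetical batch-process plant. The names `Plant`, `controller_step`, and `run`, and the "hold all outputs on alarm" requirement, are invented for illustration and do not come from the slides. Every component behaves exactly as specified, yet the system ends in a hazardous state because the requirement did not anticipate when the alarm could arrive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Plant:
    catalyst_open: bool = False
    cooling_on: bool = False

    def hazardous(self) -> bool:
        # Assumed hazard: catalyst flowing while cooling is off.
        return self.catalyst_open and not self.cooling_on


def controller_step(plant: Plant, step: int, alarm: bool) -> None:
    """Hypothetical controller that implements its requirements exactly."""
    if alarm:
        # Requirement as written: on any alarm, leave all outputs unchanged.
        return
    if step == 0:
        plant.catalyst_open = True
    elif step == 1:
        plant.cooling_on = True


def run(alarm_at: Optional[int]) -> bool:
    """Run three control steps; return True if the plant ends in the hazard."""
    plant = Plant()
    for step in range(3):
        alarm = alarm_at is not None and step >= alarm_at
        controller_step(plant, step, alarm)
    return plant.hazardous()


normal_run_hazardous = run(alarm_at=None)
alarm_run_hazardous = run(alarm_at=1)
```

Nothing failed in the second run: the valve, the cooling system, and the software all did what they were asked. Making any of them more reliable would not remove the hazard; the flaw is in the requirement.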
![Page 13: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/13.jpg)
(From Rasmussen)
![Page 14: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/14.jpg)
Accident Causality Models
• Underlie all our efforts to engineer for safety
• Explain why accidents occur
• Determine the way we prevent and investigate accidents
• May not be aware you are using one, but you are
• Impose patterns on accidents
“All models are wrong, some models are useful”
George Box
![Page 15: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/15.jpg)
Chain-of-Events Model
• Explains accidents in terms of multiple events, sequenced as a forward chain over time.
– Simple, direct relationship between events in chain
• Events almost always involve component failure, human error, or energy-related event
• Forms the basis for most safety-engineering and reliability engineering analysis:
e.g., FTA, PRA, FMECA, Event Trees, etc.
and design:
e.g., redundancy, overdesign, safety margins, …
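As an illustration of the arithmetic behind chain-of-events analyses such as FTA, the sketch below evaluates a tiny fault tree. The tree shape, the event names, and the probabilities are all made up, and the gate formulas assume the basic events are statistically independent, which is exactly the assumption these slides question for complex systems.

```python
from math import prod


def p_and(*probabilities: float) -> float:
    """AND gate: all inputs occur (assumes independent events)."""
    return prod(probabilities)


def p_or(*probabilities: float) -> float:
    """OR gate: at least one input occurs (assumes independent events)."""
    return 1 - prod(1 - p for p in probabilities)


# Hypothetical basic-event probabilities per demand.
p_primary_pump_fails = 0.01
p_backup_pump_fails = 0.01
p_operator_error = 0.001

# Top event: loss of cooling = (both pumps fail) OR (operator error).
p_both_pumps_fail = p_and(p_primary_pump_fails, p_backup_pump_fails)
p_top_event = p_or(p_both_pumps_fail, p_operator_error)
```

The redundancy shows up as the product in the AND gate. The calculation has no place for an accident in which neither pump fails and the operator makes no error.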
![Page 16: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/16.jpg)
Reason’s Swiss Cheese Model
![Page 17: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/17.jpg)
![Page 18: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/18.jpg)
Swiss Cheese Model Limitations
• Focuses on “barriers” (from the process industry approach to safety) and omits other ways to design for safety
• Ignores common cause failures of barriers (systemic accident factors)
• Does not include migration to states of high risk: “Mickey Mouse Model”
• Assumes randomness in “lining up holes”
• Assumes some (linear) causality or precedence in the cheese slices
• Human error better modeled as a feedback loop than a “failure” in a chain of events
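A rough numerical sketch of why ignoring common causes matters. It assumes three barriers and a single hypothetical systemic condition (for example, maintenance cutbacks) that degrades all of them at once. The function names and every number here are invented for illustration.

```python
def p_all_fail_independent(p_barrier: float, n_barriers: int) -> float:
    """Chance that every barrier fails if the 'holes' line up at random."""
    return p_barrier ** n_barriers


def p_all_fail_common_cause(p_condition: float, p_fail_degraded: float,
                            p_fail_normal: float, n_barriers: int) -> float:
    """Chance that every barrier fails when one shared condition
    raises the failure probability of all barriers together."""
    degraded = p_condition * p_fail_degraded ** n_barriers
    normal = (1 - p_condition) * p_fail_normal ** n_barriers
    return degraded + normal


# Made-up numbers: the systemic condition is present 2% of the time.
p_condition, p_fail_degraded, p_fail_normal, n = 0.02, 0.5, 0.01, 3

# Average failure probability of a single barrier, as a reliability
# database might report it.
p_marginal = p_condition * p_fail_degraded + (1 - p_condition) * p_fail_normal

independent_estimate = p_all_fail_independent(p_marginal, n)
common_cause_estimate = p_all_fail_common_cause(
    p_condition, p_fail_degraded, p_fail_normal, n)
```

With these figures each barrier looks equally reliable in both calculations, but the common-cause estimate is more than a hundred times the independent one.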
![Page 19: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/19.jpg)
Non-Linear Complexity
• Definition: Cause and effect not related in an obvious way
• Systemic factors in accidents, e.g., safety culture
– Our accident models assume linearity (chain of events, Swiss cheese)
– Systemic factors affect events in non-linear ways
• John Stuart Mill (1806-1873): “Cause” is a set of necessary and sufficient conditions
– What about factors (conditions) that are not necessary or sufficient?
e.g., Smoking “causes” lung cancer
– Contrapositive: if A → B, then ~B → ~A
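The logic on this slide can be checked mechanically. The sketch below confirms the contrapositive by truth table, then tests a factor for necessity and sufficiency against a handful of invented (factor present, effect present) records. The records are fabricated for illustration and are not real data.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


# A -> B is equivalent to (not B) -> (not A) for every truth assignment.
contrapositive_holds = all(
    implies(a, b) == implies(not b, not a)
    for a, b in product([False, True], repeat=2)
)


def is_necessary(records) -> bool:
    """Necessary: the effect never appears without the factor."""
    return all(factor for factor, effect in records if effect)


def is_sufficient(records) -> bool:
    """Sufficient: the factor never appears without the effect."""
    return all(effect for factor, effect in records if factor)


# Invented records: (smoker, developed lung cancer)
records = [(True, True), (True, False), (False, True), (False, False)]

smoking_is_necessary = is_necessary(records)
smoking_is_sufficient = is_sufficient(records)
```

In these records the factor is neither necessary nor sufficient, so it falls outside a strict necessary-and-sufficient definition of "cause" even though it clearly matters.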
![Page 20: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/20.jpg)
Implications of Non-Linear Complexity for Operator Error
• Role of operators in our systems is changing
– Supervising rather than directly controlling
– Not simply following procedures
– Non-linear complexity makes it harder for operators to make real-time decisions
• Operator errors are not random failures
– All behavior affected by context (system) in which it occurs
– Human error a symptom, not a cause
– Human error better modeled as feedback loops
![Page 21: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/21.jpg)
Dynamic Complexity
• Related to changes over time
• Systems are not static, but we assume they are
• Systems migrate toward states of high risk under competitive and financial pressures [Rasmussen]
• Want flexibility but need to design ways to
– Prevent or control dangerous changes
– Detect when they occur during operations
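A toy sketch of migration toward high risk and its detection, under strong simplifying assumptions: the safety margin is a single number, pressure erodes it by a fixed fraction each period, and a monitor raises an alert when it falls below a threshold. The function `simulate_drift` and all of its parameters are hypothetical.

```python
def simulate_drift(margin=1.0, erosion=0.05, alert_below=0.5, periods=30):
    """Return the margin history and the first period in which the
    monitor would raise an alert (None if it never does)."""
    history = []
    alert_period = None
    for period in range(periods):
        margin *= 1 - erosion  # each small optimization removes some slack
        history.append(margin)
        if alert_period is None and margin < alert_below:
            alert_period = period
    return history, alert_period


history, alert_period = simulate_drift()
```

No single step looks dangerous, since each one removes only 5% of the remaining margin. The monitor catches the cumulative effect, which is the point of designing detection into operations.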
![Page 22: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/22.jpg)
Decompositional Complexity
• Definition: Structural decomposition not consistent with functional decomposition
• Harder for humans to understand and find functional design errors
• For safety, makes it difficult to determine whether system will be safe
– Safety is related to functional behavior of system and its components
– Not a function of the system structure
• No effective way to verify safety of object-oriented system designs
![Page 23: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/23.jpg)
Human Error, Safety, and Complexity
• Role of operators in our systems is changing
– Supervising rather than directly controlling
– Complexity is stretching limits of comprehensibility
– We design systems in which operator error is inevitable and then blame accidents on operators rather than designers
• Designers are unable to anticipate and prevent accidents
• Greatest need in safety engineering is to
– Limit complexity in our systems
– Practice restraint in requirements definition
– Do not add extra complexity in design
– Provide tools to stretch our intellectual limits
![Page 24: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/24.jpg)
It’s still hungry … and I’ve been stuffing worms into it all day.
![Page 25: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/25.jpg)
So What Do We Need to Do?
“Engineering a Safer World”
• Expand our accident causation models
• Create new hazard analysis techniques
• Use new system design techniques
– Safety-driven design
– Integrate safety analysis into system engineering
• Improve accident analysis and learning from events
• Improve control of safety during operations
• Improve management decision-making and safety culture
![Page 26: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/26.jpg)
STAMP (System-Theoretic Accident Model and Processes)
• A new, more powerful accident causation model
• Based on systems theory, not reliability theory
• Treats accidents as a control problem (vs. a failure problem)
“prevent failures”
↓
“enforce safety constraints on system behavior”
![Page 27: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/27.jpg)
STAMP (2)
• Safety is an emergent property that arises when system components interact with each other within a larger environment
– A set of constraints related to behavior of system components (physical, human, social) enforces that property
– Accidents occur when interactions violate those constraints (a lack of appropriate constraints on the interactions)
• Accidents are not simply an event or chain of events but involve a complex, dynamic process
• Most major accidents arise from a slow migration of the entire system toward a state of high risk
– Need to control and detect this migration
![Page 28: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/28.jpg)
STAMP (3)
• Treats safety as a dynamic control problem rather than a component failure problem.
– O-ring did not control propellant gas release by sealing gap in field joint of Challenger Space Shuttle
– Software did not adequately control descent speed of Mars Polar Lander
– Temperature in batch reactor not adequately controlled in system design
– Public health system did not adequately control contamination of the milk supply with melamine
– Financial system did not adequately control the use of financial instruments
![Page 29: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/29.jpg)
Example Safety Control Structure
![Page 30: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/30.jpg)
Safety Control in Physical Process
![Page 31: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/31.jpg)
Safety Constraints
• Each component in the control structure has
– Assigned responsibilities, authority, accountability
– Controls that can be used to enforce safety constraints
• Each component’s behavior is influenced by
– Context (environment) in which it operates
– Knowledge about current state of process
![Page 32: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/32.jpg)
Accidents occur when model of process is inconsistent with real state of process and controller provides inadequate control actions
[Figure: control loop in which a Controller, holding a Model of Process, sends Control Actions to the Controlled Process and receives Feedback]
Control processes operate between levels of control
Feedback channels are critical
– Design
– Operation
![Page 33: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/33.jpg)
Relationship Between Safety and Process Models (2)
• Accidents occur when models do not match process and
– Required control commands are not given
– Incorrect (unsafe) ones are given
– Correct commands given at wrong time (too early, too late)
– Control stops too soon
Explains software errors, human errors, component interaction accidents …
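A small sketch of the process-model idea, assuming a hypothetical tank-filling controller. `TankController`, `run`, and all numbers are invented for illustration. The controller decides from its believed level, not the real one, so when feedback stops arriving its model goes stale and a perfectly "correct" control algorithm issues unsafe commands.

```python
class TankController:
    """Hypothetical controller that acts only on its process model."""

    def __init__(self):
        self.believed_level = 0.0  # the controller's model of the process

    def update(self, measurement):
        # Feedback refreshes the model; with no feedback the model goes stale.
        if measurement is not None:
            self.believed_level = measurement

    def action(self, limit):
        return "fill" if self.believed_level < limit else "stop"


def run(feedback_fails_at, steps=10, limit=5.0, capacity=6.0):
    """Simulate the loop; return (real level, believed level, overflowed)."""
    level = 0.0
    controller = TankController()
    for t in range(steps):
        feedback_ok = feedback_fails_at is None or t < feedback_fails_at
        controller.update(level if feedback_ok else None)
        if controller.action(limit) == "fill":
            level += 1.0
    return level, controller.believed_level, level > capacity


with_feedback = run(feedback_fails_at=None)
without_feedback = run(feedback_fails_at=3)
```

In the second run the controller logic is unchanged and contains no coding error. The unsafe "fill" commands come from the mismatch between the believed level and the real one, which is why the feedback channel deserves as much design attention as the control algorithm.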
![Page 34: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/34.jpg)
Accident Causality Using STAMP
![Page 35: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/35.jpg)
Uses for STAMP
• More comprehensive accident/incident investigation and root cause analysis
• Basis for new, more powerful hazard analysis techniques (STPA)
• Supports safety-driven design (physical, operational, organizational)
– Can integrate safety into the system engineering process
– Assists in design of human-system interaction and interfaces
![Page 36: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/36.jpg)
Uses for STAMP (2)
• Organizational and cultural risk analysis
– Identifying physical and project risks
– Defining safety metrics and performance audits
– Designing and evaluating potential policy and structural improvements
– Identifying leading indicators of increasing risk (“canary in the coal mine”)
• Improve operations and management control of safety
![Page 37: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/37.jpg)
STPA (System-Theoretic Process Analysis)
• Identifies safety constraints (system and component safety requirements)
• Identifies scenarios leading to violation of safety constraints
– Includes scenarios (cut sets) found by Fault Tree Analysis
– Finds additional scenarios not found by FTA and other failure-oriented analyses
• Can be used on technical design and organizational design
• Evaluated and compared to traditional HA methods
– Found many more potential safety problems
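One way to picture the starting point of such an analysis is to cross each control action with the four kinds of inadequate control listed on the earlier process-model slide. The sketch below only generates worksheet prompts for an analyst to judge in context. It is not the full STPA procedure, and the control actions shown are hypothetical.

```python
from itertools import product

# The four kinds of inadequate control from the process-model slide.
INADEQUATE_CONTROL_TYPES = [
    "required command is not given",
    "incorrect (unsafe) command is given",
    "correct command is given at the wrong time (too early, too late)",
    "control stops too soon",
]


def worksheet_prompts(control_actions):
    """Return one question per (control action, inadequate-control type)."""
    return [
        f"Can '{action}' lead to a hazard when the {kind}?"
        for action, kind in product(control_actions, INADEQUATE_CONTROL_TYPES)
    ]


# Hypothetical control actions for an adaptive cruise control.
prompts = worksheet_prompts(["apply brakes", "increase throttle"])
```

Each prompt that an analyst answers "yes" becomes a candidate safety constraint, and asking why the controller might behave that way leads to the causal scenarios.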
![Page 38: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/38.jpg)
5 Missing or wrong communication with another controller
![Page 39: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/39.jpg)
Does it work? Is it practical?
Technical
• Safety analysis of new missile defense system (MDA)
• Safety-driven design of new JPL outer planets explorer
• Safety analysis of the JAXA HTV (unmanned cargo spacecraft to ISS)
• Incorporating risk into early trade studies (NASA Constellation)
• Orion (Space Shuttle replacement)
• NextGen (planned changes to air traffic control)
• Accident/incident analysis (aircraft, petrochemical plants, air traffic control, railroad, UAVs …)
• Proton Therapy Machine (medical device)
• Adaptive cruise control (automobiles)
![Page 40: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/40.jpg)
Does it work? Is it practical?
Social and Managerial
• Analysis of the management structure of the space shuttle program (post-Columbia)
• Risk management in the development of NASA’s new manned space program (Constellation)
• NASA Mission control: re-planning and changing mission control procedures safely
• Food safety
• Safety in pharmaceutical drug development
• Risk analysis of outpatient GI surgery at Beth Israel Deaconess Hospital
• UAVs in civilian airspace
• Analysis and prevention of corporate fraud
![Page 41: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/41.jpg)
Integrating Safety into System Engineering
• Hazard analysis must be integrated into design and decision-making environment. Needs to be available when decisions are made.
• Lots of implications for specifications:
– Relevant information must be easy to find
– Design rationale must be specified
– Must be able to trace from high-level requirements to system design to component requirements to component design and vice versa.
– Must include specification of what NOT to do
– Must be easy to review and find errors
![Page 42: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/42.jpg)
Intent Specifications
• Based on systems theory principles
• Designed to support
– System Engineering (including maintenance and evolution)
– Human problem solving
– Management of complexity (adds intent abstraction to standard refinement and decomposition)
– Model-Based development
– Specification principles from preceding slide
Leveson, Intent Specifications: An Approach to Building Human-Centered Specifications, IEEE Trans. on Software Engineering, Jan. 2000
![Page 43: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/43.jpg)
![Page 44: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/44.jpg)
Level 3 Modeling Language: SpecTRM-RL
• Combined requirements specification and modeling language. Supports model-based development.
• A state machine with a domain-specific notation on top of it
– Reviewers can learn to read it in 10 minutes
– Executable
– Formally analyzable
– Automated tools for creation and analysis (e.g., incompleteness, inconsistency, simulation)
– Black-box requirements only (no component design)
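To give a feel for what "executable" and "formally analyzable" can mean for a state-machine requirements model, here is a Python analogy. It is not SpecTRM-RL notation or tooling, and the heater example, state names, and inputs are invented. The check enumerates every state and input combination and reports where no transition is defined (incompleteness) or where conflicting transitions are enabled (inconsistency).

```python
from itertools import product

STATES = ["idle", "heating"]
# Inputs are (temperature_low, door_closed); enumerate all combinations.
INPUTS = list(product([False, True], repeat=2))

# Black-box behavior only: (current state, guard on inputs, next state).
TRANSITIONS = [
    ("idle", lambda low, closed: low and closed, "heating"),
    ("idle", lambda low, closed: not (low and closed), "idle"),
    ("heating", lambda low, closed: not closed, "idle"),
    ("heating", lambda low, closed: closed and low, "heating"),
    # Deliberately left out: heating, door closed, temperature no longer low.
]


def analyze(transitions):
    """Return (incomplete, inconsistent) lists of (state, inputs) cases."""
    incomplete, inconsistent = [], []
    for state, inputs in product(STATES, INPUTS):
        targets = {
            next_state
            for source, guard, next_state in transitions
            if source == state and guard(*inputs)
        }
        if not targets:
            incomplete.append((state, inputs))
        elif len(targets) > 1:
            inconsistent.append((state, inputs))
    return incomplete, inconsistent


incomplete, inconsistent = analyze(TRANSITIONS)
```

The analysis flags the omitted case, a heater with the door closed and the temperature no longer low, which is the kind of unhandled state the earlier slides identify as a typical requirements flaw.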
![Page 45: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/45.jpg)
SpecTRM-RL
• Black-box requirements only (no component design)
• Separates design from requirements
– Specify only black box, transfer function across component
– Reduces complexity by omitting information not needed at requirements evaluation time
• Separation of concerns is an important way for humans to deal with complexity
– Almost all software-related accidents caused by incomplete or inadequate requirements (not software design errors)
![Page 46: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/46.jpg)
![Page 47: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/47.jpg)
![Page 48: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/48.jpg)
Conclusions
• Traditional safety engineering techniques do not adequately handle complexity
– Interactive, non-linear, dynamic, and design (especially decompositional)
• Need to take a system engineering view of safety rather than the current component reliability view when building complex systems
– Include entire socio-technical system including safety culture and organizational structure
– Support top-down and safety-driven design
– Support specification and human review of requirements
![Page 49: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/49.jpg)
Conclusions
• Need a more realistic handling of human errors and human decision-making
• Need to include behavioral dynamics and changes over time
– Consider processes behind events and not just events
– Understand why controls drift into ineffectiveness over time and manage this drift
![Page 50: The Role of Complexity in System Safety and How to Manage It](https://reader035.vdocuments.us/reader035/viewer/2022062722/56813a21550346895da1fd87/html5/thumbnails/50.jpg)
Nancy Leveson
“Engineering a Safer World”
(Systems Thinking Applied to Safety)
MIT Press, December 2011
Available for free download from:
http://sunnyday.mit.edu/safer-world