how can usage monitoring improve resilience?

Complex Systems Design & Management November 12-14 2014 How Can Usage Monitoring Improve Resilience? Jean-René Ruault Frédéric Vanderhaegen Christophe Kolski [email protected]

Upload: jean-rene-ruault

Post on 07-Jul-2015


Category:

Engineering



DESCRIPTION

Resilience and systems engineering are key issues for critical systems. The operational usage and states of such systems are quite different from the reference ones, which generates drift and risks. This article suggests functional and physical architectures that fit resilience. Four functions relate to resilience (avoidance, resistance, recovery, adaptation). We develop the avoidance function and define a usage monitoring system that implements it. The case study concerns a railway accident that occurred at Aldershot, Canada. We explain the origin of the gap that led to the accident. The usage monitoring system would have allowed human operators to understand the situation and avoid the accident.

TRANSCRIPT

Page 1: How can usage monitoring improve resilience?

Complex Systems Design & Management

November 12-14 2014

How Can Usage Monitoring Improve Resilience?

Jean-René Ruault

Frédéric Vanderhaegen

Christophe Kolski

[email protected]

Page 2: How can usage monitoring improve resilience?

Summary

• Running outside the specified domain
• About systems engineering
• About resilience
• Proposition: Design pattern fit to resilient systems
• Railway case study
• Conclusion and perspectives

Page 3: How can usage monitoring improve resilience?

Context, train crashes

• Lac-Mégantic (Canada), 6 July 2013
  • 50 dead
• Brétigny-sur-Orge (France), 12 July 2013
  • 7 dead
  • 9 gravely injured
• Santiago de Compostela (Spain), 24 July 2013
  • 80 dead
  • 130 injured
• Granges-près-Marnand (Switzerland), 29 July 2013
  • 1 dead
  • 25 injured

Page 4: How can usage monitoring improve resilience?

Road map

• Running outside the specified domain
• About systems engineering
• About resilience
• Proposition: Design pattern fit to resilient systems
• Railway case study
• Conclusion and perspectives

Page 5: How can usage monitoring improve resilience?

Running outside the specified domain

Dynamic representation of barrier bypassing

[Figure: paths plotted against time. Labels: situation points A, B, C, D and X; items numbered 1 to 3; "Accident E". Legend: specified path, actual path, specified local variability, actual local variability, situation point, safety margin, barriers, barrier bypassing, deviation.]

Page 6: How can usage monitoring improve resilience?

Road map

• Running outside the specified domain
• About systems engineering
• About resilience
• Proposition: dissonance Management for resilient systems design
• Railway case study
• Conclusion and perspectives

Page 7: How can usage monitoring improve resilience?

Systems engineering main issues

• Definition
  • System: set of complex hardware, software, personnel and operational processes, organized so as to satisfy the needs and fulfil the expected services, in a given environment (prEN 9277: 2012)
  • Systems engineering: interdisciplinary approach governing the total technical and managerial effort required to transform a set of stakeholder needs, expectations, and constraints into a solution and to support that solution throughout its life (ISO/IEC/IEEE 15288, forthcoming)

[Diagram labels: Systems engineering; Life cycle; Operational, functional, physical architecture; Holism; Stakeholders; Interdisciplinary; Evidence; *ities]

Page 8: How can usage monitoring improve resilience?

Systems thinking (1/2)

[Diagram of questions: Why? What? How? How much? In which context? How efficient? What for? How long? ROI?]

Page 9: How can usage monitoring improve resilience?

Systems thinking (2/2)

[Diagram labels: A teller; Missions; Components; Acquisition cost; Ownership cost; Environment; Performances (duration of a transaction); Functions; Lifecycle; ROI?]

Page 10: How can usage monitoring improve resilience?

Example: the railroad network

• Trains (high speed, tilting…)
• Geography, geology, civil engineering, urban impacts, environmental impacts…
• Connection with other transportation means: end-to-end transportation
• Operations: operation of the network and infrastructure (electrical network, communication network…)
• Commercialization: billing policy, sales channels
• Network, equipment and infrastructure maintenance

Page 11: How can usage monitoring improve resilience?

Longer cycles

• USS Enterprise:
  • Keel laid in 1958
  • Ship launched in Sep. 1960
  • Commissioned in Nov. 1961
  • Inactivated Dec. 2012
  • Decommissioning currently underway (2016?)
• Super Frelon:
  • Designed in the 1950s
  • First flight Dec. 1962
  • In service Oct. 1965
  • Retired Apr. 2010
  • But still in service in China
• B-52:
  • Contract bid June 1946
  • Maiden flight Apr. 1952
  • Active service since 1955
  • Upgrades between 2013 and 2015
  • Expected to serve into the 2040s
• French Metro car MS61:
  • Ordered in 1963
  • Delivered Feb. 1967
  • First commercial trip Jun. 1967
  • Upgrades in the 1980s and 2000-2010
  • Still in service

Page 12: How can usage monitoring improve resilience?

Systems modelling

• Systems Modelling Language (SysML)
  • Adaptation from UML to systems
• Functional modelling
• Key issue for model-based systems engineering (MBSE)

Page 13: How can usage monitoring improve resilience?

Road map

• Running outside the specified domain
• About systems engineering
• About resilience
• Proposition: Design pattern fit to resilient systems
• Railway case study
• Conclusion and perspectives

Page 14: How can usage monitoring improve resilience?

Resilience definitions

• “safety and risks in complex organizations are emergent, not resultant properties: safety and risk cannot be predicted or modeled on the basis of constituent components and their interaction” (1)
• “we can only measure the potential of resilience, but not resilience itself” (2)
• resilience as a “management at the border of the domain of application…” (3)

1. S. Dekker: Resilience engineering: chronicling the emergence of confused consensus; in E. Hollnagel, D. Woods & N. Leveson (eds), Resilience Engineering. Concepts and precepts, Ashgate, Hampshire, Great Britain, 2006

2. E. Hollnagel & D. Woods: Epilogue – Resilience engineering precepts; in E. Hollnagel, D. Woods & N. Leveson (eds), Resilience Engineering. Concepts and precepts, Ashgate, Hampshire, Great Britain, 2006

3. D. Luzeaux: Engineering Large-scale Complex Systems; in D. Luzeaux, J.-R. Ruault & J.-L. Wippler, Complex Systems and Systems of Systems Engineering, ISTE Ltd and John Wiley & Sons Inc, 2011

Page 15: How can usage monitoring improve resilience?

About resilience (1/2)

• Resilience characteristics
  • Tolerance
  • Margin
  • Flexibility
  • Reserve (buffering capacity)
  • Multilevel interactions
• Mechanisms dealing with resilience
  • Dynamic process of “visual piloting”
  • Compensation/decompensation mechanism
  • Threats and appropriate responses
• Among factors of context
  • Work conditions
  • Crew’s training and experience
  • Availability of the resources
  • Human-system interfaces (HSI) quality
  • Barriers removal (costs; benefits)
  • Migration phenomenon (omerta)
  • Team collaboration quality
  • Methods and procedures accessibility and availability

Page 16: How can usage monitoring improve resilience?

About resilience (2/2)

Positive and negative effects of observing or not observing specified procedures (Hollnagel’s “dark matter” (1)):

• Procedure: non-observance
  • Negative consequence (accident)
    • Unsuited adaptation mechanism
    • Failure of a compensation mechanism
    • Signal indicating the probability of an accident
  • Positive consequence (no accident)
    • Sociotechnical system adaptation to the actual environment
    • Compensation mechanism
    • Enhanced vigilance because of decompensation risks
• Procedure: observance
  • Negative consequence (accident)
    • Actual environment different from the reference situation. Procedure observance generates a failure.
  • Positive consequence (no accident)
    • Sociotechnical system functioning in its specified domain

1. E. Hollnagel: Resilience – The challenge of the Unstable; in E. Hollnagel, D. Woods & N. Leveson (eds), Resilience Engineering. Concepts and precepts, Ashgate, Hampshire, Great Britain, 2006

Page 17: How can usage monitoring improve resilience?

Issues & aims

Resilience: capability of sociotechnical systems

• To cope with unpredictable, unforeseeable events
• To adjust when faced with disturbing events
• To adapt and learn adequate rules of adaptation
• To deal with disturbances outside the system’s adaptation mechanisms

Impacts of resilience upon sociotechnical systems:

• Systems engineering processes
• Systems engineering models
• Systems architecture (the issue of this article)
• Systems utilization processes

Page 18: How can usage monitoring improve resilience?


Four main resilience functions (1)

1. Avoidance (capacity for anticipation)

2. Resistance (capacity for absorption)

3. Adaptation (capacity for reconfiguration)

4. Recovery (capacity for restoration)

This paper deals with:

1. Avoidance

1. D. Luzeaux: Engineering Large-scale Complex Systems in D. Luzeaux, J.-R. Ruault & J.-L. Wippler, Complex Systems and Systems of Systems Engineering, ISTE Ltd and John Wiley & Sons Inc, 2011

Page 19: How can usage monitoring improve resilience?

Avoidance function decomposition

Acquiring information at the operators’ level in order to anticipate and avoid accidents:

• Obtain a representation of the environment
• Obtain a representation of the system’s dynamics
• Identify the environment states that were not envisioned
• Evaluate the instantaneous or trend drifts
• Evaluate the proximity of the state of the system compared to the zones of danger
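The last two steps of this decomposition (drift evaluation and proximity to the danger zone) can be illustrated with a minimal Python sketch. It is not taken from the presentation: the function name `assess_avoidance`, the single numeric monitored variable, and the way the trend is estimated (mean change between consecutive samples) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class AvoidanceAssessment:
    instantaneous_drift: float  # latest observation minus the reference value
    trend_drift: float          # average change per sample over the window
    hazard_proximity: float     # remaining distance to the danger threshold


def assess_avoidance(observations, reference, danger_threshold):
    """Evaluate drift and hazard proximity for one monitored variable.

    All parameters are hypothetical: `observations` is a time-ordered list
    of measurements, `reference` is the specified value, and
    `danger_threshold` is the value at which the danger zone begins.
    """
    if len(observations) < 2:
        raise ValueError("need at least two observations to estimate a trend")
    latest = observations[-1]
    # Differences between consecutive samples give a crude trend estimate.
    steps = [b - a for a, b in zip(observations, observations[1:])]
    return AvoidanceAssessment(
        instantaneous_drift=latest - reference,
        trend_drift=mean(steps),
        hazard_proximity=danger_threshold - latest,
    )


assessment = assess_avoidance(
    [10.0, 12.0, 15.0, 19.0], reference=10.0, danger_threshold=25.0
)
print(assessment)
```

In this made-up series the variable is already 9 units above its reference, grows by 3 units per sample on average, and has 6 units left before the danger zone, which is the kind of information the avoidance function is meant to put in front of operators.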

Page 20: How can usage monitoring improve resilience?

Road map

• Running outside the specified domain
• About systems engineering
• About resilience
• Proposition: Design pattern fit to resilient systems
• Railway case study
• Conclusion and perspectives

Page 21: How can usage monitoring improve resilience?

Functional decomposition of the avoidance function (block definition diagram)

bdd [Package] Resilience_Functions [Resilience_Functions]

• «Resilience_Functions» Resilience
  • «Resilience_Functions» Avoidance
    • «Resilience_Functions» Obtain_Representation_Environment
    • «Resilience_Functions» Obtain_Representation_System_Dynamic
    • «Resilience_Functions» Identify_Environment_States
    • «Resilience_Functions» Evaluate_Drifts
    • «Resilience_Functions» Evaluate_Proximity_Hazard
    • «Resilience_Functions» Alert_Opeartors
  • «Resilience_Functions» Adaption
  • «Resilience_Functions» Resistance
  • «Resilience_Functions» Recovery

Name: Resilience_Functions. Package: Resilience_Functions. Version: 1.0. Author: Ruault.

Page 22: How can usage monitoring improve resilience?


Allocation of resilience functions on the usage monitoring system components

Page 23: How can usage monitoring improve resilience?

Physical architecture of the usage monitoring system

bdd [Package] Usage_Monitoring_System [Usage_Monitoring_System]

«Usage_Monitoring_System» Usage_Monitoring_System
  + Gather_Usage()
  + Express_Warning() : Warning

«Usage_Monitoring_Component» Reference_States_Repositoty
  «Reference_State»
  + Numbrer_Authorization : Reference_State
  + Specified_Variability : Reference_State
  + Barrier_Characteristics : Reference_State
  + Set(Reference_State)

«Usage_Monitoring_Component» Current_States_Repository
  «Current_State»
  + Numbrer_Authorization : Current_State*
  + Current_Variability : Current_State
  + Barrier_Status : Current_State

«Usage_Monitoring_Component» States_Comparison_Engine
  + Assess_Drift(Current_States_Repository, Reference_States_Repositoty) : Drift
  + Assess_Warning_Level(Reference_States_Repositoty, Current_States_Repository) : Warning_Level
  + Assess_Safety_Margins(Current_States_Repository, Reference_States_Repositoty) : Safety_Margins

«Usage_Monitoring_Component» Usage_Sensor_Proxy
  «Usage»
  + Gather_Usage() : Usage

«Usage_Monitoring_Component» Usage_Sensor
  «Current_State»
  + Barrier_Status : Current_State
  + Number_Authorization : Current_State
  + Current_Variability : Current_State
  «Usage»
  + Gather_Usage() : Usage

«Usage_Monitoring_Component» User_Interface_Proxy
  + Show_Warning_Level(Warning_Level)
  + Show_Safety_Margins(Safety_Margins)
  + Show_Drift(Drift)

Multiplicities shown on the associations: 1..* (three times) and 1 (three times).

Name: Usage_Monitoring_System. Package: Usage_Monitoring_System. Version: 1.0. Author: Ruault.
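To make the role of the States_Comparison_Engine more concrete, here is a minimal Python sketch of its three assessments. It is only an illustration under stated assumptions: the diagram gives operation names but no algorithms, so the numeric fields (`nominal`, `limit`, `value`), the three-level `WarningLevel` scale, and the decision rules below are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class WarningLevel(Enum):
    NORMAL = 0
    CAUTION = 1
    ALERT = 2


@dataclass(frozen=True)
class ReferenceState:
    nominal: float                # specified value of the monitored quantity
    specified_variability: float  # tolerated deviation around the nominal value
    limit: float                  # value the barrier is meant to keep the system below


@dataclass(frozen=True)
class CurrentState:
    value: float          # value reported by a usage sensor
    barrier_active: bool  # whether the protective barrier is still in place


class StatesComparisonEngine:
    """Compares a current state with its reference state (assumed rules)."""

    def assess_drift(self, current, reference):
        return current.value - reference.nominal

    def assess_safety_margin(self, current, reference):
        return reference.limit - current.value

    def assess_warning_level(self, current, reference):
        # No margin left, or a bypassed barrier: highest level.
        if self.assess_safety_margin(current, reference) <= 0 or not current.barrier_active:
            return WarningLevel.ALERT
        # Drift beyond the specified variability: intermediate level.
        if abs(self.assess_drift(current, reference)) > reference.specified_variability:
            return WarningLevel.CAUTION
        return WarningLevel.NORMAL


engine = StatesComparisonEngine()
reference = ReferenceState(nominal=100.0, specified_variability=5.0, limit=120.0)
for state in (CurrentState(102.0, True), CurrentState(110.0, True), CurrentState(125.0, True)):
    print(state.value, engine.assess_warning_level(state, reference).name)
```

Keeping the comparison logic separate from the two state repositories mirrors the diagram, where the engine receives both repositories as arguments rather than owning the data.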

Page 24: How can usage monitoring improve resilience?

Interfaces and flows between operating system and usage monitoring system

ibd [Package] System_Structure [System_Structure]

:Usage_Monitoring_System
  + Gather_Usage()
  + Express_Warning() : Warning
  Interface labels: «Usage» Gather_Usage, «Warning» Express_Warning

:Operating_System «Operating»
  + Exhibit_Usage() : Usage
  Interface labels: «Usage» Exhibit_usage, «Warning» Express_Warning

Flow «Usage» Usage_Information
  «Current_State»
  + Barrier_Status : Current_State
  + Number_Authorization : Current_State
  + Current_Variability : Current_State

Flow «Warning» Warning
  + Warning_Level : Warning
  + Safety_Margins : Warning
  + Drift : Warning

Name: System_Structure. Package: System_Structure. Version: 1.0. Author: Ruault.
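The exchange between the two parts can be read as a small contract: the operating system exhibits usage, the usage monitoring system gathers it and expresses a warning. The Python sketch below illustrates that contract only. The field selection, the `WarningMessage` name, the two thresholds, and the `DemoOperatingSystem` class are assumptions introduced for the example and do not come from the model.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Usage:
    """Usage information exhibited by the operating system (fields simplified)."""
    barrier_status: bool        # True while the barrier is in place
    current_variability: float  # observed deviation from the specified path


@dataclass(frozen=True)
class WarningMessage:
    """Stands in for the «Warning» flow; renamed to avoid Python's built-in Warning."""
    warning_level: str
    safety_margin: float
    drift: float


class ExhibitsUsage(Protocol):
    def exhibit_usage(self) -> Usage: ...


class UsageMonitoringSystem:
    def __init__(self, specified_variability: float, danger_variability: float):
        self.specified_variability = specified_variability  # hypothetical threshold
        self.danger_variability = danger_variability        # hypothetical threshold
        self._usage: Optional[Usage] = None

    def gather_usage(self, source: ExhibitsUsage) -> None:
        self._usage = source.exhibit_usage()

    def express_warning(self) -> WarningMessage:
        if self._usage is None:
            raise RuntimeError("call gather_usage() before express_warning()")
        drift = self._usage.current_variability - self.specified_variability
        margin = self.danger_variability - self._usage.current_variability
        raised = drift > 0 or not self._usage.barrier_status
        return WarningMessage("raised" if raised else "none", margin, drift)


class DemoOperatingSystem:
    """Hypothetical operating system that always reports the same usage."""

    def exhibit_usage(self) -> Usage:
        return Usage(barrier_status=True, current_variability=7.5)


monitor = UsageMonitoringSystem(specified_variability=5.0, danger_variability=10.0)
monitor.gather_usage(DemoOperatingSystem())
print(monitor.express_warning())
```

Using a `Protocol` keeps the monitoring side independent of any concrete operating system, so any object that offers `exhibit_usage()` can be monitored.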

Page 25: How can usage monitoring improve resilience?

Road map

• Running outside the specified domain
• About systems engineering
• About resilience
• Proposition: Design pattern fit to resilient systems
• Railway case study
• Conclusion and perspectives

Page 26: How can usage monitoring improve resilience?

Aldershot accident case study (1/2)

• Context of the accident:
  • Main-track derailment of a train at Aldershot, Ontario, Canada
  • 3 deaths, 45 wounded
  • 4300 litres of diesel fuel released
• Summary
  • On 26 February 2012, the train VIA 92 was proceeding eastward from Niagara Falls to Toronto, Ontario, on track 2 of the Canadian National Oakville Subdivision near Burlington, Ontario
  • The track switches were lined to route the train from track 2 to track 3, through crossover No. 5, which had an authorized speed of 15 mph
  • The train VIA 92 entered crossover No. 5 while travelling at about 67 mph. Subsequently, the locomotive and all 5 coaches derailed.
  • The locomotive rolled onto its side and struck the foundation of a building adjacent to the track

Page 27: How can usage monitoring improve resilience?

Aldershot accident case study (2/2)

• Track schematic and site diagram

[Figure: track schematic and site diagram with callouts numbered 1 to 8]

Page 28: How can usage monitoring improve resilience?

Simulation of the accident

[Figure: simulation of the accident, with views numbered 1 to 4]

Page 29: How can usage monitoring improve resilience?

Main causes of this accident and synthesis

• The crew would normally go straight ahead at track speed, but the train was diverted through the switch because of the unexpected presence of a work crew on the tracks
• The control opted to route the train from track 2 to track 3 through crossover No. 5, which was authorized for a speed of 15 mph
• The control did not communicate this route change to the VIA 92 crew
• The crew expected to go straight ahead, as happens 99% of the time, and expectations drive perception and decision processes
• Synthesis
  • There was only one way to inform the crew: the signals along the track
  • There was no signal after Aldershot station to remind the crew of the speed
  • It is typically a misperception by the crew, due to the effects of biases and heuristics
  • Safety and resilience are not guaranteed

Page 30: How can usage monitoring improve resilience?

Relevance of a usage monitoring system

• A usage monitoring system would have been useful for detecting the major violations of safety:
  • Detecting the excess speed by comparing the speed to the accepted threshold, and presenting the evidence of overspeed to the operators
  • Managing the crossover from track 2 to track 3 by alerting the operators or presenting a visual display of the maneuver to be accomplished
  • Reporting the railway signals to the operators in real time in order to keep them informed and secure the situation awareness of the operating crew
  • Alerting on the specificity of this day, very different from the usual 99% “go straight ahead”

[Figure: paths plotted against time. Labels: situation points A, B, C, D and X; items numbered 1 to 3; "Accident E". Legend: specified path, actual path, specified local variability, actual local variability, situation point, safety margin, barriers, barrier bypassing, gap, hazard.]
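The last point, alerting the crew that today differs from the usual "go straight ahead", could be approached by comparing the route set for the day with how often each route was taken in the past. The sketch below is one possible illustration, not the authors' design: the function `route_rarity_alert`, the 5% rarity threshold, and the invented history of 100 runs (built only to mirror the 99% figure on the slide) are all assumptions.

```python
from collections import Counter


def route_rarity_alert(route_history, todays_route, rarity_threshold=0.05):
    """Return a message when today's route is rare in the history, else None.

    `route_history` is a hypothetical list of route labels for past runs.
    """
    if not route_history:
        return None
    share = Counter(route_history)[todays_route] / len(route_history)
    if share < rarity_threshold:
        return f"Unusual route today: '{todays_route}' was taken on {share:.0%} of past runs"
    return None


# Invented history: 99 runs straight ahead, 1 run through the crossover.
history = ["straight ahead"] * 99 + ["crossover"]
print(route_rarity_alert(history, "crossover"))
print(route_rarity_alert(history, "straight ahead"))
```

A frequency check of this kind targets the expectation bias described in the synthesis: the rarer the route, the more likely the crew's expectation is wrong, so the stronger the case for an explicit alert.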

Page 31: How can usage monitoring improve resilience?

A whole system containing an operating system part and a usage monitoring system part

bdd [Package] System_Structure [System_Structure]

«System» Whole_System

«Usage_Monitoring_Syste... Usage_Monitoring_System
  + Gather_Usage()
  + Express_Warning() : Warning

«Operating_System,block» Operating_System
  «Component» + Security_Device
  + Exhibit_Usage() : Usage

:Rail_Traffic_Management
  «Operating»
  + Manage_Traffic()
  + Exhibit_Usage() : Usage
  «Usage_Monitoring»
  + Gather_Usage()
  + Express_Warning() : Warning

:Trains
  «Operating»
  + Transport()
  + Exhibit_Usage() : Usage
  «Usage_Monitoring»
  + Gather_Usage()
  + Express_Warning() : Warning

:Railroad_Station
  «Operating»
  + Exhibit_Usage() : Usage
  «Usage_Monitoring»
  + Gather_Usage()
  + Express_Warning() : Warning

Name: System_Structure. Package: System_Structure. Version: 1.0. Author: Ruault.

Page 32: How can usage monitoring improve resilience?

Communication of specified states from the rail traffic management and the train

ibd [Package] System_Structure [System_Structure]

:Rail_Traffic_Management
  «Reference_State»
  + Specified_Speed
  + Crossover_Maximum_Speed = 15 mph
  «Operating»
  + Manage_Traffic()
  + Exhibit_Usage() : Usage
  «Usage_Monitoring»
  + Gather_Usage()
  + Express_Warning() : Warning

:Trains
  «Current_State»
  - Current_Speed
  «Operating»
  + Transport()
  + Exhibit_Usage() : Usage
  «Usage_Monitoring»
  + Gather_Usage()
  + Express_Warning() : Warning

«flow» «information» Specified_State
  «Specified_State»
  + Specified_Speed
  + Crossover_Maximum_Speed

Name: System_Structure. Package: System_Structure. Version: 1.0. Author: Ruault.
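Applied to the Aldershot figures, this flow amounts to sending the specified state to the train and comparing it with the current speed on board. The Python sketch below is a hedged illustration of that idea. The 67 mph and 15 mph values come from the case study; the 80 mph track speed, the method names `send_specified_state` and `receive_specified_state`, and the wording of the warning are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpecifiedState:
    specified_speed_mph: float
    # Set only when the route goes through a crossover (assumption).
    crossover_maximum_speed_mph: Optional[float] = None


class Train:
    def __init__(self, current_speed_mph):
        self.current_speed_mph = current_speed_mph
        self.specified_state = None

    def receive_specified_state(self, state):
        self.specified_state = state

    def express_warning(self):
        """Return an overspeed message, or None when within the limit."""
        if self.specified_state is None:
            return None
        limit = self.specified_state.specified_speed_mph
        if self.specified_state.crossover_maximum_speed_mph is not None:
            limit = min(limit, self.specified_state.crossover_maximum_speed_mph)
        excess = self.current_speed_mph - limit
        if excess > 0:
            return f"Overspeed: {excess:g} mph above the {limit:g} mph limit"
        return None


class RailTrafficManagement:
    def send_specified_state(self, train, state):
        train.receive_specified_state(state)


train = Train(current_speed_mph=67)
RailTrafficManagement().send_specified_state(
    train,
    SpecifiedState(specified_speed_mph=80, crossover_maximum_speed_mph=15),  # 80 is hypothetical
)
print(train.express_warning())
```

With the case-study values the train is 52 mph above the crossover limit, about 4.5 times the authorized speed, which is the gap the usage monitoring system is meant to surface to the crew in time.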

Page 33: How can usage monitoring improve resilience?

Road map

• Running outside the specified domain
• About systems engineering
• About resilience
• Proposition: Design pattern fit to resilient systems
• Railway case study
• Conclusion and perspectives

Page 34: How can usage monitoring improve resilience?

Conclusion and perspective

• Conclusion
  • Proposal: interconnect the operating system, which carries out the operational missions, with the usage monitoring system
  • To monitor the system’s states and usage
  • To estimate the gap between the current state of the system and the safe one, and the proximity of hazard, and to inform the operators
  • The goal is that the operators share a clear, reliable, relevant and updated representation of the operational context as well as the usage of the system, so that they can take the appropriate measures
• Perspective
  • Widening this architecture to observe the operational context and express it to the operators

Page 35: How can usage monitoring improve resilience?

References

• Systems engineering:
  • Friedenthal S, Moore A, & Steiner R (2011) A Practical Guide to SysML, Morgan Kaufmann; 2nd edition.
• Resilience:
  • Luzeaux D (2011) Engineering Large-scale Complex Systems, in D Luzeaux, J-R Ruault & J-L Wippler, Complex Systems and Systems of Systems Engineering, ISTE Ltd and John Wiley & Sons Inc.
  • Ruault J-R, Vanderhaegen F, Luzeaux D (2012) Sociotechnical systems resilience. 22nd Annual INCOSE International Symposium, 9-12 July, 2012, Rome.
  • Ruault J-R, Vanderhaegen F, Kolski C (2013) Sociotechnical systems resilience: a dissonance engineering point of view. 12th IFAC/IFIP/IFORS/IEA Symposium on Analysis, Design, and Evaluation of Human-Machine Systems (August 11-15), IFAC, Las Vegas, USA.
  • Zieba S, Polet P, Vanderhaegen F, Debernard S (2010) Principles of adjustable autonomy: a framework for resilient human machine cooperation. Cognition, Technology and Work, 12 (3), pp. 193-203.
  • Ouedraogo K-A, Enjalbert S, Vanderhaegen F (2013) How to learn from the resilience of Human–Machine Systems? Engineering Applications of Artificial Intelligence, volume 26, issue 1, pp. 24-34.

Page 36: How can usage monitoring improve resilience?

References

• Dominique Luzeaux & Jean-René Ruault (Ed.), Systems of Systems, ISTE-Wiley, London, 2011
• Dominique Luzeaux, Jean-René Ruault & Jean-Luc Wippler (Ed.), Complex Systems and Systems of Systems Engineering, ISTE-Wiley, London, 2011
• Patrick Millot (Ed.), Risk Management in Life critical Systems, ISTE-Wiley, London, 2014

Page 37: How can usage monitoring improve resilience?

THANK YOU VERY MUCH FOR YOUR ATTENTION