
Reducing Risk by Managing Software Related Failures in Networked Control Systems

Girish Baliga, Google, Inc

Scott Graham, Air Force Inst. of Technology (AFIT)

Carl A. Gunter, Dept. of Computer Science, UIUC

P. R. Kumar, Dept. of ECE and CSL, UIUC

Information Technology Convergence Lab

[Figure: vision sensors, automatic control, ad hoc network, planning and scheduling]

Networked Control Systems

[Figure: sensors, controllers, actuators, and plants (Sensor 1 and 2, Controller 1 and 2, Actuator 1 and 2, Plant 1 and 2) connected through a network, with a supervisor and a filter (Filter 1)]

Software-related failures

Programming errors
– Simple errors such as an incorrect storage size can be catastrophic
– E.g. the Ariane 5 failure was caused by an overflow when a 64-bit floating-point value was converted into a 16-bit integer variable!

Passive failures
– Software, node, and link failures can cut off sub-systems
– E.g. car controller failures can cause a car to collide with other cars

Active failures
– Faulty software can interfere with other sub-systems
– E.g. car controller or sensor errors can cause car collisions

Byzantine failures
– Malicious agents can actively interfere with system operation
– E.g. rogue cars can try to block intersections and collide with other cars
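The Ariane 5 example turns on a narrowing conversion. The minimal Python sketch below contrasts an unchecked conversion, which silently wraps, with a checked one, which rejects the value. The function names and the value 40000.0 are hypothetical, and this is not the Ariane 5 flight software; it only shows why an unchecked narrowing is dangerous and why even a checked one still needs a caller that handles the error.

```python
# Hypothetical illustration, not the Ariane 5 flight software: narrowing a
# floating-point value into a signed 16-bit integer.
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unchecked(x: float) -> int:
    """Keep only the low 16 bits, as an unchecked two's-complement cast would."""
    n = int(x) & 0xFFFF
    return n - 0x10000 if n > INT16_MAX else n


def to_int16_checked(x: float) -> int:
    """Reject any value that does not fit in a signed 16-bit integer."""
    n = int(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in a signed 16-bit integer")
    return n


if __name__ == "__main__":
    print(to_int16_unchecked(40000.0))  # wraps silently to a negative number
    try:
        to_int16_checked(40000.0)
    except OverflowError as err:
        # The error is detected, but the caller still has to decide what to do.
        print("rejected:", err)
```

Detection alone is not recovery: if the raised error is not handled, the component still stops, which is why the following slides treat software errors as something the system design must tolerate.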

Preventing software-related failures

Robust control laws
– Control laws can be designed to tolerate software failures
– But errors could exist in control law implementations!

Software verification using formal methods
– Formal methods could be used to verify software implementations
– But failures could occur in systems software, libraries, hardware, or links
– Also, software verification is very hard for large systems

The presence of software errors must be a basic assumption in system design

Component based design

[Figure: control system design (controller and plant) compared with component based software design (supervisor, sensor, controller, and actuator components around the plant)]

Component-based software design isolates programming errors
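A minimal sketch of the isolation idea, assuming each component exposes a single per-cycle step. The `Component` class and `run_cycle` function are hypothetical, and catching Python exceptions in one process is only a weak stand-in for real isolation; the slides do not say how Etherware isolates components.

```python
from typing import Callable, Dict, List


class Component:
    """Hypothetical component: does one unit of work per control cycle."""

    def __init__(self, name: str, step: Callable[[int], None]) -> None:
        self.name = name
        self.step = step
        self.failed = False


def run_cycle(components: List[Component], tick: int) -> Dict[str, str]:
    """Run each healthy component once; a fault is confined to its own component."""
    status: Dict[str, str] = {}
    for comp in components:
        if comp.failed:
            status[comp.name] = "failed"
            continue
        try:
            comp.step(tick)
            status[comp.name] = "ok"
        except Exception as err:
            # Only this component is marked failed; the others keep running.
            comp.failed = True
            status[comp.name] = f"failed: {type(err).__name__}"
    return status


if __name__ == "__main__":
    log: List[int] = []
    parts = [
        Component("sensor", log.append),
        Component("planner", lambda t: 1 // 0),  # deliberate programming error
        Component("controller", lambda t: log.append(-t)),
    ]
    print(run_cycle(parts, 1))
    print(run_cycle(parts, 2))
```

The faulty "planner" is marked failed on the first cycle, while "sensor" and "controller" keep running on every cycle.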

Virtual Collocation

Etherware (Baliga & Kumar ’03) manages all software components in a networked control system

Etherware
– Location independence
– Semantic addressing of components
– System startup and upgrade during execution
– Time translation
– Automatic migration of components for performance
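Location independence and semantic addressing can be illustrated with a small name service. In this sketch, senders address a component by a semantic name and the name is resolved only at send time, so migrating the component does not affect its senders. The `Registry` class, the name "car1/controller", and the hosts and ports are hypothetical; this is not Etherware's interface.

```python
from typing import Dict, List, Tuple

Address = Tuple[str, int]  # hypothetical (host, port) pair


class Registry:
    """Hypothetical name service mapping semantic names to current locations."""

    def __init__(self) -> None:
        self._where: Dict[str, Address] = {}
        self.delivered: List[Tuple[Address, str]] = []

    def register(self, name: str, host: str, port: int) -> None:
        self._where[name] = (host, port)

    def migrate(self, name: str, host: str, port: int) -> None:
        # Moving a component only updates the registry; senders never change.
        if name not in self._where:
            raise KeyError(f"unknown component {name!r}")
        self._where[name] = (host, port)

    def send(self, name: str, payload: str) -> Address:
        """Resolve the semantic name at send time and record the delivery."""
        addr = self._where[name]
        self.delivered.append((addr, payload))
        return addr


if __name__ == "__main__":
    reg = Registry()
    reg.register("car1/controller", "laptop-a", 5000)
    print(reg.send("car1/controller", "waypoint"))
    reg.migrate("car1/controller", "laptop-b", 5001)
    print(reg.send("car1/controller", "waypoint"))
```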

Etherware manages software failures
– Quick and efficient component restarts
– Maintain interconnections across failures
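The two failure-management bullets can be sketched together. A runtime keeps a factory and a state checkpoint for each named component, so a failed instance can be replaced quickly, and because senders address the component by name, their links survive the restart. `Runtime`, `CountingEstimator`, and the checkpoint-after-every-message policy are all hypothetical; the slides do not describe how Etherware checkpoints state.

```python
from typing import Any, Callable, Dict, Optional


class CountingEstimator:
    """Hypothetical component whose state is the number of readings handled."""

    def __init__(self, state: Optional[dict] = None) -> None:
        self.state = dict(state) if state else {"count": 0}

    def handle(self, reading: float) -> int:
        if reading < 0:
            raise ValueError("negative reading")  # stands in for a software bug
        self.state["count"] += 1
        return self.state["count"]


class Runtime:
    """Hypothetical runtime: restarts failed components under the same name."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[Optional[dict]], Any]] = {}
        self._instances: Dict[str, Any] = {}
        self._checkpoints: Dict[str, dict] = {}
        self.restarts = 0

    def start(self, name: str, factory: Callable[[Optional[dict]], Any]) -> None:
        self._factories[name] = factory
        self._instances[name] = factory(None)

    def deliver(self, name: str, message: Any) -> Optional[Any]:
        """Senders address components by name, so links survive a restart."""
        comp = self._instances[name]
        try:
            result = comp.handle(message)
        except Exception:
            # Restart from the last good checkpoint instead of the
            # possibly corrupted state of the failed instance.
            self._instances[name] = self._factories[name](self._checkpoints.get(name))
            self.restarts += 1
            return None
        self._checkpoints[name] = dict(comp.state)
        return result


if __name__ == "__main__":
    rt = Runtime()
    rt.start("estimator", CountingEstimator)
    print([rt.deliver("estimator", r) for r in (1.0, 2.0, -1.0, 3.0)])
```

The bad reading triggers one restart. The restarted instance resumes from the last checkpoint, so the next reading yields a count of 3 rather than 1.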

[Figure: protocol stack (application, transport, network, MAC, and physical layers); services (Service 2, Service 3, Timing, Discrete Event Scheduler); application components (Kalman filter, trajectory planner, car controller, model predictive controller, set point generation, image processing, control law optimization); a message stream connecting a sensor to a controller]

Message streams connect software components
- Message streams are set up and managed automatically by Etherware
- Message streams persist across component restarts
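One way a stream can persist across a restart is for the runtime, rather than either endpoint, to own it. While the receiver is down, the stream buffers messages and then flushes them to the restarted receiver. `MessageStream`, its methods, and the drop-oldest bounded buffer are hypothetical design choices for this sketch, not Etherware's implementation.

```python
from collections import deque
from typing import Callable, Deque, Optional

Receiver = Callable[[str], None]


class MessageStream:
    """Hypothetical stream owned by the runtime rather than by either endpoint,
    so it outlives a restart of the receiving component."""

    def __init__(self, max_pending: int = 100) -> None:
        # Bounded buffer: if the receiver is down for long, the oldest
        # messages are discarded first (stale control data has little value).
        self._pending: Deque[str] = deque(maxlen=max_pending)
        self._receiver: Optional[Receiver] = None

    def attach(self, receiver: Receiver) -> None:
        """Connect a (possibly restarted) receiver and flush buffered messages."""
        self._receiver = receiver
        while self._pending:
            receiver(self._pending.popleft())

    def detach(self) -> None:
        """Called when the receiving component fails or is being restarted."""
        self._receiver = None

    def send(self, msg: str) -> None:
        if self._receiver is None:
            self._pending.append(msg)
        else:
            self._receiver(msg)
```

Usage: with `max_pending=2`, sending "b", "c", and "d" while the receiver is detached keeps only "c" and "d", and the restarted receiver gets both as soon as it attaches. The sender never notices the restart.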

Etherware mechanisms for managing software-related failures

[Figure: a filter intercepting messages of a Kalman filter component]

Filters intercept messages
- Filters can be added to components and message streams
- Filters can be used to manage component interactions
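A minimal sketch of message interception, assuming a filter is a function that either returns a (possibly rewritten) message or returns None to drop it. Here the filters are chained on a stream. `FilteredStream`, `reject_non_finite`, `clamp_speed`, and the "speed" field are hypothetical examples, not Etherware's filter interface.

```python
import math
from typing import Callable, Dict, List, Optional

Message = Dict[str, float]
Filter = Callable[[Message], Optional[Message]]  # return None to drop


class FilteredStream:
    """Hypothetical stream that passes each message through a chain of filters."""

    def __init__(self, deliver: Callable[[Message], None]) -> None:
        self._deliver = deliver
        self._filters: List[Filter] = []

    def add_filter(self, f: Filter) -> None:
        self._filters.append(f)

    def send(self, msg: Message) -> bool:
        for f in self._filters:
            result = f(msg)
            if result is None:
                return False  # intercepted and dropped
            msg = result
        self._deliver(msg)
        return True


def reject_non_finite(msg: Message) -> Optional[Message]:
    """Drop messages carrying NaN or infinite values."""
    return msg if all(math.isfinite(v) for v in msg.values()) else None


def clamp_speed(limit: float) -> Filter:
    """Rewrite the hypothetical 'speed' field into [-limit, limit]."""
    def f(msg: Message) -> Optional[Message]:
        return {**msg, "speed": max(-limit, min(limit, msg["speed"]))}
    return f
```

Because filters are added from outside, a component's interactions can be constrained without changing the component's own code.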

Local temporal autonomy

[Figure: vision sensors 1 and 2, vision server, supervisor, Controller 1, and Actuator 1, with state estimators and a control buffer added]

Local temporal autonomy reduces component dependencies to tolerate passive failures
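The control buffer in the diagram can be sketched as a short queue of future commands that the controller refills whenever it can. If the controller or network stalls, the actuator keeps draining the buffer and falls back to a safe default once it is empty. `ControlBuffer`, the (speed, steering) command pair, and the stop-when-empty policy are hypothetical choices for this sketch.

```python
from collections import deque
from typing import Deque, Iterable, Tuple

Command = Tuple[float, float]  # hypothetical (speed, steering) pair
SAFE_STOP: Command = (0.0, 0.0)


class ControlBuffer:
    """Holds a short plan of future commands so the actuator can keep going
    for a while if the controller or network stalls (a passive failure)."""

    def __init__(self) -> None:
        self._plan: Deque[Command] = deque()

    def load_plan(self, commands: Iterable[Command]) -> None:
        # A fresh plan from the controller replaces whatever is left.
        self._plan = deque(commands)

    def next_command(self) -> Command:
        # Called once per actuator tick; falls back to a safe stop when empty.
        return self._plan.popleft() if self._plan else SAFE_STOP
```

Usage: after `load_plan([(0.5, 0.0), (0.5, 0.1), (0.4, 0.1)])`, four actuator ticks with no new plan yield the three planned commands and then `SAFE_STOP`. The buffer length sets how long the actuator can ride out a stalled controller.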

Component restarts


Collation

[Figure: vision sensors 1 and 2, vision server, supervisor, controller, and actuator, with a CA supervisor and a CA filter added]

Collation of multiple independent inputs safeguards against active failures
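The slides do not say how inputs are collated. One common technique is median voting over independent readings of the same quantity, sketched below as a swapped-in example. The `collate` function and its tolerance rule are hypothetical. With only two sources, a disagreement can be detected but not resolved, so the function returns None and leaves the fallback to the caller.

```python
import statistics
from typing import Optional, Sequence


def collate(readings: Sequence[float], tolerance: float) -> Optional[float]:
    """Combine independent readings of the same quantity.

    Returns the median of the readings that lie within `tolerance` of the
    overall median, or None when no strict majority agrees (the caller
    should then fall back to a safe action)."""
    if not readings:
        return None
    mid = statistics.median(readings)
    agreeing = [r for r in readings if abs(r - mid) <= tolerance]
    if 2 * len(agreeing) <= len(readings):
        return None
    return statistics.median(agreeing)
```

Usage: `collate([10.0, 10.5, 30.0], 1.0)` outvotes the faulty 30.0 and returns 10.25. `collate([10.0, 30.0], 1.0)` returns None because two disagreeing sources cannot outvote each other.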

Security overrides

[Figure: vision sensors 1 and 2, vision server, supervisor, controller, actuator, CA supervisor, and CA filter, with a security supervisor that can issue an override]

Security overrides are used to manage Byzantine failures

- Security overrides must preserve low-level safety mechanisms
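A minimal sketch of a safety-preserving override. The security supervisor's command may replace the controller's, but every command, whatever its source, still passes through the same low-level safety layer. `Command`, `MAX_SPEED`, `MIN_GAP`, the forward-only speed model, and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Command:
    speed: float  # hypothetical units (m/s); forward motion only in this sketch


MAX_SPEED = 2.0  # hypothetical actuator speed limit
MIN_GAP = 0.5    # hypothetical minimum distance to the car ahead (m)


def safety_layer(cmd: Command, gap_ahead: float) -> Command:
    """Low-level safety check applied to every command, whatever its source."""
    if gap_ahead < MIN_GAP:
        return Command(speed=0.0)
    return Command(speed=max(0.0, min(cmd.speed, MAX_SPEED)))


def actuate(controller_cmd: Command, override: Optional[Command], gap_ahead: float) -> Command:
    """The security supervisor's override replaces the controller's command,
    but it cannot bypass the safety layer."""
    chosen = override if override is not None else controller_cmd
    return safety_layer(chosen, gap_ahead)
```

Usage: `actuate(Command(1.0), Command(5.0), 3.0)` returns `Command(2.0)`, because the override wins but is capped at the speed limit. With a gap of 0.2 m, any command becomes `Command(0.0)`.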

Safety-preserving security overrides


Conclusions

Presence of software failures: a basic assumption of system design

– Component-based design isolates failures

– Etherware provides mechanisms to manage software failures

– Design principles to manage risk due to software failures:
» Component-based design to contain programming errors
» Local temporal autonomy to tolerate passive failures
» Collation to safeguard against active failures
» Safety-preserving security overrides to manage Byzantine failures

Contact information

Email: [email protected]

Website: http://decision.csl.uiuc.edu/~testbed/