8/8/2019 Self Healing Operating System
Self Healing Operating System
Neethu T. V
Roll No: 32
S7 Computer Science and Engineering
Government Engineering College
Sreekrishnapuram Palakkad
November 25, 2010
OVERVIEW
1 Introduction
2 Terminology
3 Error Detection
4 Error recovery
5 Error signaling
6 Error confinement
7 Error detection and recovery
8 Solaris 10 OS
9 Future scope
10 Conclusion
11 Reference
Introduction
All applications depend on the OS
When the OS dies, all running applications are lost
Resilience to errors is an important requirement of modern operating systems
Self healing enables systems to diagnose themselves and react to faults
Terminology
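Fault - Defect or flaw in hardware or software
Error - Deviation from the correct state
Failure - Inability to perform the expected task
Example: a software bug (fault) corrupts a kernel data structure (error), which later crashes the system (failure)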
Error Detection in Existing OSs
Custom Error Detection Code in OSs
  Linux - Deadlock Detection, Soft Lockup Detection, etc.
  Windows - Deadlock Detection, etc.
Hardware Memory Protection - MMU
Watchdog Timers - Linux, Windows, etc.
Software Memory Protection - SafeDrive, XFI
Periodic Consistency Checks - EROS
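Deadlock detection, listed above for Linux and Windows, is classically explained with a wait-for graph. In this graph, an edge from thread A to thread B means A is blocked on a lock that B holds, and any cycle means deadlock. The C++ sketch below is built on a hypothetical `WaitForGraph` class with integer thread IDs. It shows the textbook technique only and does not describe how either kernel actually implements its detector.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical wait-for graph: an edge waiter -> holder means thread `waiter`
// is blocked on a lock currently held by thread `holder`.
class WaitForGraph {
public:
    void add_wait(int waiter, int holder) {
        edges_[waiter].push_back(holder);
        edges_.try_emplace(holder);  // make sure every thread has an entry
    }

    // A cycle in the wait-for graph means the threads on it can never proceed.
    bool has_deadlock() const {
        std::unordered_map<int, Color> color;  // missing entries default to White
        for (const auto& entry : edges_) {
            if (color[entry.first] == Color::White && visit(entry.first, color)) {
                return true;
            }
        }
        return false;
    }

private:
    enum class Color { White, Grey, Black };  // unvisited, on DFS stack, finished

    // Depth-first search; reaching a Grey node again closes a cycle.
    bool visit(int node, std::unordered_map<int, Color>& color) const {
        color[node] = Color::Grey;
        for (int next : edges_.at(node)) {
            if (color[next] == Color::Grey) return true;
            if (color[next] == Color::White && visit(next, color)) return true;
        }
        color[node] = Color::Black;
        return false;
    }

    std::unordered_map<int, std::vector<int>> edges_;
};
```

For example, threads 1 -> 2 -> 3 form no deadlock, but adding the edge 3 -> 1 closes a cycle and `has_deadlock()` returns true.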
Error recovery in Existing OSs
Linux - Recovery by terminating thread
Restart Failed Component
  Windows Vista - Example: Video Card Driver
  Minix3
  Chorus
  Linux+Nooks
  IBM z/OS
Hardware Redundancy
Reboot Entire System
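Hardware redundancy masks errors rather than repairing them. A common form is triple modular redundancy (TMR): three replicas compute the same result and a voter takes the majority, so a single faulty replica is outvoted. The sketch below models the voter in software. `majority_vote` and the `replica` computation with an injected bit flip are hypothetical illustrations; real redundant hardware usually votes in circuitry.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Majority voter for triple modular redundancy: masks one faulty replica.
template <typename T>
std::optional<T> majority_vote(const T& a, const T& b, const T& c) {
    if (a == b || a == c) return a;
    if (b == c) return b;
    return std::nullopt;  // no two replicas agree: not correctable, escalate
}

// Hypothetical replicated computation; flip_mask simulates a bit flip in one replica.
std::uint32_t replica(std::uint32_t x, std::uint32_t flip_mask = 0) {
    return (x * 3u + 7u) ^ flip_mask;
}
```

When all three replicas disagree, the voter returns no value. The system must then fall back to a coarser recovery step, such as rebooting.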
Error signaling
C++ exception handling is used for unified error signaling
  Developer-defined exceptions
  Processor exceptions
Benefits of mapping processor exceptions to language exceptions
  Local error recovery using the C++ catch statement
  Generic handlers for all types of exceptions
  Generic handlers that just print an error message and halt the system
Run-time performance overhead is negligible when no exception is thrown
Provides developers with a flexible and powerful technique
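The following is a minimal sketch of unified error signaling, built on a hypothetical `KernelError` root class. Developer-defined exceptions and processor exceptions share one hierarchy, so OS code can recover locally with a specific `catch` or fall back to a generic handler. The trap path is simulated by an ordinary function that throws. A real kernel would have to build the exception from the CPU's fault context, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical root of the kernel's exception hierarchy.
class KernelError : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Developer-defined exception raised explicitly by OS code.
class DeviceTimeout : public KernelError {
public:
    using KernelError::KernelError;
};

// Processor exception mapped to a language exception.
class UndefinedInstruction : public KernelError {
public:
    explicit UndefinedInstruction(std::uintptr_t pc)
        : KernelError("undefined instruction"), pc_(pc) {}
    std::uintptr_t pc() const { return pc_; }
private:
    std::uintptr_t pc_;
};

// Stand-in for the low-level trap handler: here it simply throws.
[[noreturn]] void on_undefined_instruction_trap(std::uintptr_t pc) {
    throw UndefinedInstruction(pc);
}

// Hypothetical driver call that may time out.
int read_port(bool device_ready) {
    if (!device_ready) throw DeviceTimeout("serial port timed out");
    return 0x41;
}

// Local recovery: catch the specific error and retry once.
int read_port_with_retry(bool first_ready, bool second_ready) {
    try {
        return read_port(first_ready);
    } catch (const DeviceTimeout&) {
        return read_port(second_ready);  // may throw again to an outer handler
    }
}

// Generic handler: one catch clause covers every KernelError subtype.
std::string describe(void (*action)()) {
    try {
        action();
        return "ok";
    } catch (const KernelError& e) {
        return std::string("recovered: ") + e.what();
    }
}
```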
Error confinement
Isolate OS components
Used by microkernels: L4, Minix3
Nooks: device driver isolation in Linux
Objects in Choices can be placed in separate memory protection domains
Implemented using wrappers which inherit from target classes
Example protected objects: Serial Port Driver, File System Inodes, Timer Driver
Recovery can be targeted toward the affected component
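The wrapper approach can be sketched in C++ as follows. `SerialPortDriver`, `ProtectedSerialPortDriver` and `ComponentFailed` are hypothetical names. The sketch confines errors only at the language level: it does not switch memory protection domains as the slide describes for Choices. Because the wrapper inherits from the target class, a caller holding a `SerialPortDriver&` can be given the protected version without any changes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Raised by a wrapper when the component behind it has failed.
class ComponentFailed : public std::runtime_error {
public:
    explicit ComponentFailed(std::string name)
        : std::runtime_error("component failed: " + name), name_(std::move(name)) {}
    const std::string& component() const { return name_; }
private:
    std::string name_;
};

// Hypothetical target class: a serial port driver that rejects empty writes.
class SerialPortDriver {
public:
    virtual ~SerialPortDriver() = default;
    virtual std::size_t write(const std::string& data) {
        if (data.empty()) throw std::invalid_argument("empty write");
        return data.size();  // pretend every byte was sent
    }
};

// Wrapper that inherits from the target class, forwards each call to the
// wrapped instance, and confines any failure to that one component.
class ProtectedSerialPortDriver : public SerialPortDriver {
public:
    explicit ProtectedSerialPortDriver(std::unique_ptr<SerialPortDriver> target)
        : target_(std::move(target)) { assert(target_); }

    std::size_t write(const std::string& data) override {
        if (faulty_) throw ComponentFailed(kName);  // refuse calls until recovered
        try {
            // A real protection wrapper would switch into the component's
            // memory protection domain here; this sketch does not.
            return target_->write(data);
        } catch (const std::exception&) {
            faulty_ = true;  // remember which component failed
            throw ComponentFailed(kName);
        }
    }

    bool faulty() const { return faulty_; }

private:
    static constexpr const char* kName = "serial port driver";
    std::unique_ptr<SerialPortDriver> target_;
    bool faulty_ = false;
};
```

Since every failure is reported as `ComponentFailed` with the component's name, recovery code knows exactly which object to restart.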
Figure: Choices protected components
Error detection and recovery
Code Reloading
Component Micro-Reboots
Automatic Service Restarts
Watchdog-based Recovery
Process-level Recovery
Code reloading
Fault: Corruption of OS code by software bugs or hardware bit-flips (Single Event Upsets)
Proactive recovery: periodically checksum OS code and reload corrupted pages from stable storage
Reactive recovery: if an undefined instruction exception is raised, reload the relevant OS page from stable storage
Simple fault-injection experiments show 89% recovery
Example: an ARM-based microprocessor for mobile phones includes a Run Time Integrity Checker (RTIC)
Also used in EROS
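Both recovery modes can be sketched with a hypothetical `CodeImage`. It keeps the live code pages, pristine copies standing in for stable storage, and checksums recorded at load time. FNV-1a is used as the checksum purely for illustration, since the slide does not say which checksum is used. `scrub()` is the proactive path; `reload()` is what a reactive undefined-instruction handler would call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Page = std::vector<std::uint8_t>;

// 32-bit FNV-1a hash, used here as a stand-in page checksum.
std::uint32_t checksum(const Page& page) {
    std::uint32_t h = 2166136261u;
    for (std::uint8_t b : page) {
        h ^= b;
        h *= 16777619u;
    }
    return h;
}

// Hypothetical model of OS code in memory plus its copy on stable storage.
struct CodeImage {
    std::vector<Page> live;           // pages the CPU executes
    std::vector<Page> stable;         // pristine copies ("stable storage")
    std::vector<std::uint32_t> sums;  // checksums recorded at load time

    explicit CodeImage(std::vector<Page> pages)
        : live(pages), stable(std::move(pages)) {
        for (const Page& p : stable) sums.push_back(checksum(p));
    }

    // Proactive recovery: scan every page and reload any whose checksum changed.
    std::size_t scrub() {
        std::size_t reloaded = 0;
        for (std::size_t i = 0; i < live.size(); ++i) {
            if (checksum(live[i]) != sums[i]) {
                reload(i);
                ++reloaded;
            }
        }
        return reloaded;
    }

    // Reactive recovery: restore one page named by a fault handler.
    void reload(std::size_t page_index) {
        assert(page_index < live.size());
        live[page_index] = stable[page_index];
    }
};
```

Each FNV-1a step is a bijection on the 32-bit state, so any change confined to a single byte always changes the checksum. This includes a single bit flip.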
Component micro-reboots
Error: Unhandled exceptions in components
Recovery: similar to component restarts in existing systems; involves destroying and re-creating the C++ object
After a micro-reboot, the internal state may be error free
The request is retried after the micro-reboot
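A micro-reboot can be sketched as a helper that catches an exception escaping a component, destroys the object, constructs a fresh one and retries the request. `TimerDriver` and `call_with_microreboot` are hypothetical. The sketch assumes the component can be rebuilt with its default constructor, which only holds for components whose state can safely be discarded.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical component with internal state that a fault can corrupt.
class TimerDriver {
public:
    int next_tick() {
        if (corrupted_) throw std::logic_error("timer state corrupted");
        return ++ticks_;
    }
    void corrupt() { corrupted_ = true; }  // fault-injection hook for the demo

private:
    int ticks_ = 0;
    bool corrupted_ = false;
};

// Runs `request` against `component`. If an exception escapes, the component
// is micro-rebooted (old object destroyed, fresh one constructed) and the
// request is retried, up to `max_reboots` times. After that the exception
// propagates so a coarser mechanism (e.g. a service restart) can take over.
template <typename Component, typename Request>
auto call_with_microreboot(std::unique_ptr<Component>& component, Request request,
                           int max_reboots = 1) {
    assert(component);
    for (int reboots = 0;; ++reboots) {
        try {
            return request(*component);
        } catch (const std::exception&) {
            if (reboots >= max_reboots) throw;
            component = std::make_unique<Component>();  // the micro-reboot
        }
    }
}
```

After a micro-reboot the fresh `TimerDriver` starts counting from zero again. Discarding the object clears corrupted state, but it also discards any legitimate state the object held.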
Automatic service restarts
Error: Unhandled exception in a process
Recovery: automatically restart the process
Used when component-level restarts fail or when the error occurs outside components (framework code)
Fault injection experiments show 78.9% recovery for the process dispatcher (idle thread)
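Restarting a whole process can be sketched on a POSIX system with `fork()` and `waitpid()`. The function `run_with_restarts`, the exit status 70 used to mark an unhandled exception, and the restart budget are assumptions for illustration only. They are not part of the system the slides describe.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical exit status meaning "the service died from an unhandled exception".
constexpr int kUnhandledException = 70;

// Launches `service` in a child process and restarts it whenever the child does
// not exit cleanly. Returns the number of launches needed for a clean exit, or
// -1 if the restart budget is exhausted or a process cannot be created.
int run_with_restarts(const std::function<void(int attempt)>& service, int max_restarts) {
    assert(max_restarts >= 0);
    for (int attempt = 0; attempt <= max_restarts; ++attempt) {
        pid_t pid = fork();
        if (pid < 0) return -1;
        if (pid == 0) {
            // Child: run the service. Catching everything here stops an exception
            // from unwinding into the caller's frames, which the child inherited.
            try {
                service(attempt);
            } catch (...) {
                _exit(kUnhandledException);
            }
            _exit(0);
        }
        int status = 0;
        if (waitpid(pid, &status, 0) < 0) return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) return attempt + 1;
        // The service crashed: loop around and start a fresh process.
    }
    return -1;
}
```

Each restart begins from a clean copy of the parent's address space. Corruption inside the crashed child therefore cannot leak into the next attempt.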
Watchdog-based recovery
Error: Lockups inside the OS
Recovery: terminate the locked-up thread or dispatch an exception
Thread termination explored on Linux
A hardware watchdog works by setting a countdown timer running that the OS must periodically reset ("tickle"); if the computer malfunctions, the tickles stop and the watchdog eventually counts down to zero and automatically reboots the computer
Exceptions allow possible local recovery without any information loss (in contrast with thread termination)
Lockup fault injection experiments show about 70% recovery
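The countdown-and-tickle scheme can be modelled deterministically, which makes it easy to test. In the sketch below, `CountdownWatchdog` is a hypothetical class. A periodic timer calls `tick()`, healthy code calls `tickle()`, and the recovery action passed in runs once when the count reaches zero. Whether that action reboots, terminates the locked-up thread or dispatches an exception is left to the caller.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Software model of a countdown watchdog timer.
class CountdownWatchdog {
public:
    CountdownWatchdog(unsigned reload, std::function<void()> on_expire)
        : reload_(reload), remaining_(reload), on_expire_(std::move(on_expire)) {
        assert(reload_ > 0);
    }

    // Called by healthy OS code: reload the counter.
    void tickle() {
        if (!expired_) remaining_ = reload_;
    }

    // Called from a periodic timer interrupt: count down, fire once at zero.
    void tick() {
        if (expired_) return;
        if (remaining_ > 0) --remaining_;
        if (remaining_ == 0) {
            expired_ = true;
            on_expire_();  // recovery action chosen by the caller
        }
    }

    bool expired() const { return expired_; }

private:
    unsigned reload_;
    unsigned remaining_;
    std::function<void()> on_expire_;
    bool expired_ = false;
};
```

As long as tickles arrive more often than every `reload` ticks, the watchdog never fires. A lockup that stops the tickles triggers recovery within `reload` ticks.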
Process recovery
What to do when OS error recovery is not possible?
Last resort:
  Ensure minimal working subsystems - disk, recovery code
  Save individual process state
  Restore processes after a full reboot
Explored on Linux
Re-uses code for process checkpointing/migration support
Can recover from arbitrary OS corruption that does not affect user process state
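Saving and restoring process state can be sketched as follows: write a checkpoint to storage that survives the reboot, then read it back afterwards. `ProcessState`, `checkpoint_path` and the text format are hypothetical and deliberately tiny. A real checkpoint must also capture the address space, open files and other kernel objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified process state.
struct ProcessState {
    int pid = 0;
    std::string name;
    std::vector<std::uint64_t> registers;
};

// Hypothetical checkpoint location. In practice it must be on storage that
// survives the reboot; the temp directory is used here only for the demo.
std::filesystem::path checkpoint_path(int pid) {
    return std::filesystem::temp_directory_path() / ("proc-" + std::to_string(pid) + ".ckpt");
}

// Text format: pid, name, then register count followed by the registers.
bool save_checkpoint(const ProcessState& s, const std::filesystem::path& path) {
    std::ofstream out(path, std::ios::trunc);
    out << s.pid << '\n' << s.name << '\n' << s.registers.size();
    for (std::uint64_t r : s.registers) out << ' ' << r;
    out << '\n';
    out.close();
    return !out.fail();
}

// Returns the saved state, or nothing if the checkpoint is missing or malformed.
std::optional<ProcessState> load_checkpoint(const std::filesystem::path& path) {
    std::ifstream in(path);
    ProcessState s;
    std::size_t count = 0;
    if (!(in >> s.pid)) return std::nullopt;
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    if (!std::getline(in, s.name) || !(in >> count)) return std::nullopt;
    s.registers.resize(count);
    for (std::uint64_t& r : s.registers) {
        if (!(in >> r)) return std::nullopt;
    }
    return s;
}
```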
Solaris 10 OS
Introduces a new architecture for building and deploying systems and services capable of predictive self-healing
Solaris Fault Manager and Solaris Service Manager are the two main components of predictive self-healing
Fault Manager receives hardware and software errors and diagnoses the underlying faults automatically
Service Manager manages system services, permitting automatic self-healing
Service operations include start, stop and restart
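A toy simulation can make the split between the two managers concrete. This is a minimal sketch under stated assumptions, not Solaris code: `FaultManager`, `ServiceManager`, the service name `nfs`, and the diagnosis rule (declare a fault once a service reports `threshold` errors within a sliding `window` of seconds) are all hypothetical simplifications. It only shows error reports flowing into an automatic diagnosis step, which then asks the service manager to restart the affected service.

```python
from collections import defaultdict, deque


class ServiceManager:
    """Hypothetical service manager: tracks service state and supports start/stop/restart."""

    def __init__(self):
        self.state = {}                   # service name -> "online" or "offline"
        self.restarts = defaultdict(int)  # service name -> number of restarts performed

    def start(self, name):
        self.state[name] = "online"

    def stop(self, name):
        self.state[name] = "offline"

    def restart(self, name):
        # A restart is modelled simply as a stop followed by a start.
        self.stop(name)
        self.start(name)
        self.restarts[name] += 1


class FaultManager:
    """Hypothetical fault manager: diagnoses a fault when a service reports
    `threshold` errors within `window` seconds, then triggers a restart."""

    def __init__(self, service_manager, threshold=3, window=10.0):
        self.sm = service_manager
        self.threshold = threshold
        self.window = window
        self.errors = defaultdict(deque)  # service name -> timestamps of recent errors
        self.diagnoses = []               # (time, service) pairs, oldest first

    def report_error(self, service, now):
        times = self.errors[service]
        times.append(now)
        # Discard error reports that fall outside the sliding time window.
        while times and now - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            self.diagnoses.append((now, service))
            times.clear()                 # start counting afresh after a diagnosis
            self.sm.restart(service)      # automatic self-healing action
            return True
        return False


# Usage: the first error ages out of the window, then a burst of three triggers recovery.
sm = ServiceManager()
sm.start("nfs")
fm = FaultManager(sm, threshold=3, window=10.0)
results = [fm.report_error("nfs", now=t) for t in (0.0, 20.0, 21.0, 22.0)]
print(results)  # [False, False, False, True]
```

Counting errors within a time window, rather than reacting to every single error, lets an isolated transient error pass without a restart while a burst of errors triggers recovery.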
Future scope
Ongoing work on restructuring the OS to reduce error propagation and to prevent state loss during component micro-reboots
A framework for developer-specified policies that govern micro-reboots and service restarts
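As an illustration of what developer-specified policies might look like, the sketch below treats a policy as a per-component escalation rule. Everything in it is hypothetical: `RecoveryPolicy`, `RecoveryGovernor`, the limits, and the escalation order (restart the service, then micro-reboot the component, then escalate) are assumptions chosen for the example, not a design taken from the source.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryPolicy:
    """Hypothetical developer-specified policy for one component."""
    max_restarts: int = 2       # service restarts allowed before escalating
    max_microreboots: int = 1   # component micro-reboots allowed before giving up


@dataclass
class RecoveryGovernor:
    """Applies each component's policy to decide the next recovery action."""
    policies: dict
    history: dict = field(default_factory=dict)  # component -> [restarts, micro-reboots]

    def next_action(self, component):
        # Components without an explicit policy fall back to the defaults.
        policy = self.policies.get(component, RecoveryPolicy())
        counts = self.history.setdefault(component, [0, 0])
        if counts[0] < policy.max_restarts:
            counts[0] += 1
            return "restart"
        if counts[1] < policy.max_microreboots:
            counts[1] += 1
            return "micro-reboot"
        return "escalate"  # policy exhausted: hand the problem to an administrator


governor = RecoveryGovernor(policies={"fs": RecoveryPolicy(max_restarts=1, max_microreboots=1)})
actions = [governor.next_action("fs") for _ in range(4)]
print(actions)  # ['restart', 'micro-reboot', 'escalate', 'escalate']
```

Keeping the policy as plain data separates what the developer decides (the limits) from the mechanism that enforces it.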
Conclusion
Self-healing operating systems can be built by incorporating a variety of recovery techniques, each addressing a different fault model
Such systems can also detect, and attempt to recover from, system hangs that would otherwise go unnoticed
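A common way to notice hangs is a software watchdog: each component periodically sends a heartbeat, and a component that stays silent past a timeout is presumed hung and handed to a recovery routine. The sketch below assumes exactly that scheme; `Watchdog`, its parameters, and the component names are hypothetical, and the clock is injectable so the logic can be exercised without real delays.

```python
import time


class Watchdog:
    """Hypothetical software watchdog: components send heartbeats, and any
    component silent for longer than `timeout` seconds is treated as hung."""

    def __init__(self, timeout, on_hang, clock=time.monotonic):
        self.timeout = timeout
        self.on_hang = on_hang  # recovery callback, e.g. restart the component
        self.clock = clock      # injectable time source for testing
        self.last_beat = {}     # component -> time of its last heartbeat

    def heartbeat(self, component):
        self.last_beat[component] = self.clock()

    def check(self):
        now = self.clock()
        hung = [c for c, t in self.last_beat.items() if now - t > self.timeout]
        for component in hung:
            self.on_hang(component)
            # Reset the timer so recovery is not retriggered on every check.
            self.last_beat[component] = now
        return hung


# Usage with a fake clock: "net" keeps beating, "scheduler" goes silent.
fake_now = [0.0]
recovered = []
wd = Watchdog(timeout=5.0, on_hang=recovered.append, clock=lambda: fake_now[0])
wd.heartbeat("scheduler")
wd.heartbeat("net")
fake_now[0] = 3.0
wd.heartbeat("net")
fake_now[0] = 6.0
hung = wd.check()
print(hung)  # ['scheduler']
```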
Reference
1 ARM Integrator Family, http://www.arm.com/miscPDFs/8877.pdf [visited on November 10]
2 P. M. Chen, W. T. Ng, S. Chandra, C. Aycock, G. Rajamani, and D. Lowell. The Rio File Cache: Surviving Operating System Crashes. In Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 74-83, 1996
3 E. W. Dijkstra. Self-stabilizing systems in spite of distributed control. Communications of the ACM, 1974
4 M. Baker and M. Sullivan. The Recovery Box: Using Fast Recovery to Provide High Availability in the UNIX Environment. In USENIX, pages 31-44, Summer 1992
5 Building a Self-Healing Operating System, http://choices.cs.uiuc.edu/selfhealing.pdf [visited on November 6]
THANK YOU