
Time and Prediction Based Software Rejuvenation

By: Rajeev N.B (1RV08IS036)

Shah Smit (1RV08IS044)

Abhishek G (1RV08IS002)

Table of Contents

• Abstract

• Introduction

• Problem statement and objectives

• Design

• Implementation

• Testing

• Results and discussion

• Conclusion

• References

Abstract

Loopholes in existing systems :

• Present distributed systems are reactive.

• They can detect a failure and act on it, but cannot act before the failure occurs.

Proposed Solution

• Use time-based and prediction-based techniques to detect impending failures and rejuvenate the affected nodes before they crash the system.

• This makes the approach proactive rather than reactive.

Work Carried Out

● Detection is based on a set of predetermined key parameters :

● Buffer, cache, CPU load, memory and number of processes.

● Failures are detected using time-based and prediction-based techniques.


Outcome of the work :

● Nodes that are about to fail are detected and rejuvenated.

● The result is a distributed system that is less prone to crashing.

● Existing reactive measures can still be used as a fallback.

Introduction

Software Rejuvenation:

• Software rejuvenation is a proactive fault management technique aimed at cleaning up the system's internal state to prevent more severe crash failures in the future.

Time-Based Rejuvenation:

• At set intervals, the key parameters are checked to ensure they are not above their safety threshold levels.

Purely Prediction-Based Rejuvenation:

• A mathematical model uses the key parameters to predict whether a node is about to fail.

Existing solutions:

• Existing distributed systems are reactive: they act only after a failure has occurred.

• These systems also do not use rejuvenation techniques.


Advancement Proposed:

• Proactive techniques should be used in tandem with current reactive solutions.

• Time and prediction based rejuvenation can reduce system downtime by rejuvenating nodes that are about to crash.

Problem Statement with Objectives

Problem Statement :

• Performance degrades in distributed applications that run for a long time.

• Such applications are susceptible to crashes caused by data corruption, accumulation of numerical errors and exhaustion of OS resources.

• This leads to downtime and suboptimal performance.

Objectives :

• To simulate two software rejuvenation approaches : time-based and prediction-based.

● To effectively detect and rejuvenate failing nodes in a system using TPSRP (Time and Prediction based Software Rejuvenation Policy).

Design

● Level 0

● Level 1

● Level 2

[Level 0, Level 1 and Level 2 design diagrams not reproduced in this text]

Implementation

Petri Net

● A Petri net is one of several modeling languages for describing distributed systems.

● Other such modeling languages include UML and EPCs (event-driven process chains).

● A Petri net consists of places, transitions, and arcs. Arcs run from a place to a transition or vice versa, never between places or between transitions.

● We use Petri nets for our simulation.
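To make the place/transition/arc structure concrete, here is a minimal sketch of a place/transition net with the standard firing rule: a transition is enabled when each of its input places holds a token, and firing it consumes one token per input arc and produces one per output arc. The `PetriNet` class and the places `healthy` and `degraded` are hypothetical illustrations, not the simulator's actual model, and every arc is assumed to have weight 1.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """Place/transition net; `marking` maps each place to its token count."""

    marking: dict[str, int]
    # Transition name -> (input places, output places). Every arc has weight 1,
    # and arcs only ever link a place to a transition or a transition to a place.
    transitions: dict[str, tuple[list[str], list[str]]] = field(default_factory=dict)

    def add_transition(self, name: str, inputs: list[str], outputs: list[str]) -> None:
        self.transitions[name] = (list(inputs), list(outputs))

    def enabled(self, name: str) -> bool:
        # Enabled when each input place holds at least as many tokens as arcs from it.
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= inputs.count(p) for p in set(inputs))

    def fire(self, name: str) -> None:
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:   # consume one token per input arc
            self.marking[p] -= 1
        for p in outputs:  # produce one token per output arc
            self.marking[p] = self.marking.get(p, 0) + 1


# Hypothetical example: a node that ages into a degraded state and is rejuvenated.
net = PetriNet({"healthy": 1, "degraded": 0})
net.add_transition("age", ["healthy"], ["degraded"])
net.add_transition("rejuvenate", ["degraded"], ["healthy"])
net.fire("age")
print(net.marking)  # {'healthy': 0, 'degraded': 1}
```

Because `fire` refuses to run a disabled transition, a node can only be rejuvenated after it has actually entered the degraded place.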

Methodology

● We simulated a proactive approach to software rejuvenation.

● Petri net graphs were used to represent the nodes of the distributed system.

● Five key parameters (cache, memory, buffer, CPU load, processes) were considered in the simulation.

● Dijkstra’s algorithm is used to traverse all the nodes and maintain the state of transitions among nodes in the graph.

● Based on the values of the key parameters at each node, the simulator identifies the failing nodes in the graph.

● The simulation results, which show the key parameters of each node before and after rejuvenation using the time and prediction techniques, are written to a CSV (comma-separated values) file.
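The Dijkstra traversal in the methodology can be sketched with Python's `heapq` module. This is the standard shortest-path version, assuming the graph is stored as a dictionary mapping each node to its neighbours and non-negative link weights; the node IDs and weights in the example are made up, and how the project attached Petri net state to each visited node is not shown here.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from `source`; graph maps node -> {neighbour: weight >= 0}.

    Also returns the order in which nodes were settled, i.e. the traversal order.
    """
    dist = {source: 0}
    order = []
    visited = set()
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:  # stale entry left behind by an earlier, longer path
            continue
        visited.add(node)
        order.append(node)
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist, order


# Hypothetical four-node graph with made-up link weights.
graph = {1: {2: 4, 3: 1}, 2: {4: 1}, 3: {2: 2, 4: 5}}
dist, order = dijkstra(graph, 1)
print(dist)   # {1: 0, 2: 3, 3: 1, 4: 4}
print(order)  # [1, 3, 2, 4]
```

Every node reachable from the source is settled exactly once, which is what lets the traversal cover the whole graph.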

Lessons from Implementation

Double Buffering

● When building a simulator, it is important to make sure the display has no flicker, tearing or other artifacts.

● However, it is difficult to redraw a display directly so that no pixel changes more than once per frame. Double buffering solves this: each frame is drawn to an off-screen buffer and then shown on screen in a single step.
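A minimal sketch of the double-buffering idea, using character grids in place of real pixels. The `DoubleBuffer` class is hypothetical; a GUI toolkit would supply its own off-screen image and a single blit or swap call, but the pattern is the same: draw only into the back buffer, then swap.

```python
class DoubleBuffer:
    """Character-grid stand-in for a pixel display with front and back buffers."""

    def __init__(self, width, height, background="."):
        self.width = width
        self.height = height
        self.background = background
        self.front = [[background] * width for _ in range(height)]  # what is shown
        self.back = [[background] * width for _ in range(height)]   # what is being drawn

    def clear(self):
        # After a swap the back buffer holds an old frame, so reset it first.
        for row in self.back:
            row[:] = [self.background] * self.width

    def plot(self, x, y, ch):
        # Drawing only touches the off-screen buffer, never the visible one.
        self.back[y][x] = ch

    def present(self):
        # Swap the buffers in one step so a half-drawn frame is never visible.
        self.front, self.back = self.back, self.front

    def render(self):
        return "\n".join("".join(row) for row in self.front)


buf = DoubleBuffer(3, 1)
buf.clear()
buf.plot(1, 0, "#")
print(buf.render())  # prints "..." because the frame has not been presented yet
buf.present()
print(buf.render())  # .#.
```

Clearing the back buffer before each new frame matters, because after a swap it still holds the previous frame's contents.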

Handling Multiple Events

● Our simulator has multiple paint events and timer events; the key challenge was keeping them synchronized.

● To get accurate results, we slowed down the transmissions so that the timer events could work properly.


Parameters on Nodes with Threshold Values :

● Cache : 450

● Memory : 7500

● Buffer : 950

● CPU Load : 95

● # of Processes : 45


● Time-Based Rejuvenation Policy :

The nodes' parameters are inspected after a set interval of time.

When the timer expires, the nodes to be rejuvenated are chosen based on the current values of their parameters.
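A minimal sketch of the time-based policy, assuming a simulated clock that ticks once per step and a timer that expires every `interval` ticks. The threshold values come from the parameter list above, but the key names are hypothetical, and so is the rule of resetting a violated parameter to one tenth of its threshold (`SAFE_DIVISOR`). The slides do not say which "safer values" were used, so the reset values here differ from those in the results. The sample node values are the "Before" rows of nodes 5 and 14 from the results.

```python
# Threshold values from the slides; key names are hypothetical.
THRESHOLDS = {"buffer": 950, "cache": 450, "cpu_load": 95, "memory": 7500, "processes": 45}
SAFE_DIVISOR = 10  # hypothetical: reset a violated parameter to 1/10 of its threshold


def inspect_and_rejuvenate(nodes):
    """Called at timer expiry: rejuvenate every node with a parameter above its threshold."""
    rejuvenated = []
    for node_id, params in nodes.items():
        over = [name for name, limit in THRESHOLDS.items() if params[name] > limit]
        for name in over:
            params[name] = THRESHOLDS[name] // SAFE_DIVISOR
        if over:
            rejuvenated.append(node_id)
    return rejuvenated


def run_time_based(nodes, ticks, interval):
    """Advance a simulated clock; nodes are only inspected when the timer expires."""
    log = []
    for tick in range(1, ticks + 1):
        if tick % interval == 0:
            log.append((tick, inspect_and_rejuvenate(nodes)))
    return log


# "Before" values of nodes 5 and 14 from the results.
nodes = {
    5: {"buffer": 82, "cache": 183, "cpu_load": 12, "memory": 4346, "processes": 48},
    14: {"buffer": 361, "cache": 468, "cpu_load": 84, "memory": 3692, "processes": 27},
}
print(run_time_based(nodes, ticks=10, interval=5))  # [(5, [5, 14]), (10, [])]
print(nodes[5]["processes"], nodes[14]["cache"])    # 4 45
```

In the real simulator the parameters would keep changing between timer expiries; here they stay fixed to keep the sketch short. A node that degrades quickly between two checks is the weakness this policy leaves open.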


● Prediction-Based Rejuvenation Policy :

Failing nodes are predicted from two aspects of the parameters :

1. Whether a parameter's value is very close to its threshold value.

2. Whether the rate of change of any parameter is very high or is increasing exponentially.
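The two prediction criteria can be sketched as a check over a short history of samples for each parameter. The margins are hypothetical: "very close" is taken as at least 90% of the threshold (`NEAR_FRACTION`), "very high" as a single jump of at least 20% of the threshold (`RATE_FRACTION`), and "exponential" growth is approximated as each increase being at least twice the previous one (`GROWTH_RATIO`). The threshold values are from the slides; the parameter key names and sample histories are invented for illustration.

```python
# Threshold values from the slides; key names are hypothetical.
THRESHOLDS = {"buffer": 950, "cache": 450, "cpu_load": 95, "memory": 7500, "processes": 45}
NEAR_FRACTION = 0.9  # hypothetical: "very close" = at least 90% of the threshold
RATE_FRACTION = 0.2  # hypothetical: "very high" = latest jump >= 20% of the threshold
GROWTH_RATIO = 2     # hypothetical: each increase at least double the previous one


def predict_failure(history, threshold):
    """history holds recent samples of one parameter, oldest first."""
    # Criterion 1: the current value is very close to the threshold.
    if history[-1] >= NEAR_FRACTION * threshold:
        return True
    deltas = [later - earlier for earlier, later in zip(history, history[1:])]
    # Criterion 2a: the most recent rate of change is very high.
    if deltas and deltas[-1] >= RATE_FRACTION * threshold:
        return True
    # Criterion 2b: the increases keep growing geometrically (roughly exponential).
    if len(deltas) >= 2 and all(d > 0 for d in deltas):
        return all(b >= GROWTH_RATIO * a for a, b in zip(deltas, deltas[1:]))
    return False


def at_risk_parameters(node_history):
    """Names of the parameters predicted to push this node into failure."""
    return [name for name, limit in THRESHOLDS.items()
            if predict_failure(node_history[name], limit)]


# Invented sample histories for one node.
node_history = {
    "buffer": [82, 82, 82],         # flat
    "cache": [100, 200],            # one large jump
    "cpu_load": [10, 12, 16, 24],   # increases double each step
    "memory": [4000, 4100, 4150],   # slowing growth
    "processes": [30, 35, 41],      # close to the threshold of 45
}
print(at_risk_parameters(node_history))  # ['cache', 'cpu_load', 'processes']
```

Unlike the time-based check, none of the flagged parameters has crossed its threshold yet, which is what lets this policy act before a failure.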

Testing

● Testing coverage :

Unit Testing.

Integration Testing.

System Testing.

Unit Testing

Five key modules of our project were tested :

● Dijkstra’s algorithm.

● Time-based system.

● Prediction-based system.

● Timer event for monitoring.

● UI Module.

Integration Testing

● Integration testing combines individual modules and tests them together.

● The following module combinations were integrated and tested :

● Initialization of the graph.

● Simulation and rejuvenation.

● Drawing your own Petri net graphs.

Snapshots

Results

● The system detects failing nodes and rejuvenates them to safer values using TPSRP.

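ID            Buffer  Cache  CPU Load  Memory  Processes
Before (5)        82    183        12    4346         48
After (5)         82    183        12    4346          5
Before (14)      361    468        84    3692         27
After (14)       361     50        10    3692         27

In node 5, the number of processes (48) exceeded its threshold of 45; in node 14, the cache (468) exceeded its threshold of 450.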
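Writing these before/after values to a CSV file, as described in the methodology, can be done with Python's standard `csv` module. The column layout, including the separate `Stage` column, is an assumption; the rows are the values from the table above.

```python
import csv
import io

# Assumed layout: one row per node per stage, mirroring the results table.
FIELDS = ["ID", "Stage", "Buffer", "Cache", "CPU Load", "Memory", "Processes"]


def write_results(rows, stream):
    """Write a header line followed by one CSV line per (ID, stage, parameters...) row."""
    writer = csv.writer(stream)
    writer.writerow(FIELDS)
    writer.writerows(rows)


rows = [
    (5, "Before", 82, 183, 12, 4346, 48),
    (5, "After", 82, 183, 12, 4346, 5),
    (14, "Before", 361, 468, 84, 3692, 27),
    (14, "After", 361, 50, 10, 3692, 27),
]
out = io.StringIO()
write_results(rows, out)
print(out.getvalue())
```

To write to disk instead of memory, open a file (for example a hypothetical `results.csv`) with `open("results.csv", "w", newline="")` and pass it as `stream`; the `csv` module documentation recommends `newline=""` so that line endings are not translated twice.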

Conclusion

1 : We presented a new TPSRP to improve software availability and to detect failing nodes with higher probability.

2 : Numerical analysis shows that the time and prediction based policy not only outperforms the purely time-based and purely prediction-based strategies, but can also be easily applied to a practical system.

References

[1] J. Gray and D. P. Siewiorek, “High-Availability Computer Systems”, IEEE Computer, vol. 24, no. 9, 1991, pp. 39-48.

[2] K. Vaidyanathan and K. S. Trivedi, “Extended Classification of Software Faults Based on Aging”, 12th International Symposium on Software Reliability Engineering (ISSRE 2001), Hong Kong, November 2001, p. 99.

[3] Y. Huang, C. Kintala, N. Kolettis and N. D. Fulton, “Software Rejuvenation: Analysis, Module and Applications”, Proc. 25th IEEE Int’l Symp. on Fault-Tolerant Computing, IEEE Computer Society Press, Los Alamitos, CA, 1995, pp. 381-390.
