Time and Prediction Based Software Rejuvenation

By Rajeev N.B (1RV08IS036), Shah Smit (1RV08IS044), Abhishek G (1RV08IS002)

Upload: rajeev-bharshetty

Post on 12-Jul-2015



Table of Contents

• Abstract

• Introduction

• Problem statement and objectives

• Design

• Implementation

• Testing

• Results and discussion

• Conclusion

• References

Abstract

Loopholes in existing systems:

• Present distributed systems are reactive.

• They can detect a failure and act on it, but they cannot act beforehand.

Solution Proposed

• Use time based and prediction based techniques to detect nodes that are about to fail, and rejuvenate them before they crash the system.

• This makes fault handling proactive rather than reactive.

Work Carried Out

● Detection is based on predetermined key parameters:

● Buffer, Cache, CPU load, Memory and Number of processes.

● We detect failures using time based and prediction based techniques.


Abstract

Outcome of the work :

● Nodes that are about to fail are detected and rejuvenated.

● The result is a distributed system that is less prone to crashing.

● Existing reactive measures can still be used as a fallback.


Introduction

Software Rejuvenation:

• Software rejuvenation is a proactive fault management technique aimed at cleaning up the system internal state to prevent the occurrence of more severe crash failures in the future.

Time Based Prediction:

• At set intervals, we check whether any key parameter has risen above its safety threshold.

Purely Prediction Based:

• Using a mathematical model, we try to predict from the key parameters whether a node is about to fail.

Existing solutions:

• Distributed systems are reactive: they act only after a failure has occurred.

• These systems do not use rejuvenation techniques either.


Introduction

Advancement Proposed:

• Proactive techniques should be used in tandem with current reactive solutions.

• Time and prediction based rejuvenation can considerably reduce system downtime by rejuvenating nodes that are about to crash.


Problem Statement with Objectives

Problem Statement :

• Performance degrades in distributed applications that run for a long time.

• They are susceptible to crashes because of data corruption, numerical error accumulation and limited availability of OS resources.

• This leads to downtime and sub-optimal performance.

Objectives :

• To simulate two software rejuvenation approaches: time based and prediction based.

● To effectively detect and rejuvenate failing nodes in a system using the Time and Prediction based Software Rejuvenation Policy (TPSRP).

Design

● Level 0 [Figure: Level 0 design diagram]

● Level 1 [Figure: Level 1 design diagram]

● Level 2 [Figure: Level 2 design diagram]


Implementation

Petri net

● A Petri net is one of several modeling languages for the description of distributed systems, like UML and EPCs.

● A Petri net consists of places, transitions, and arcs. Arcs run from a place to a transition or vice versa, never between places or between transitions.

● We use Petri nets for our simulation.
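The place, transition and arc structure described above can be sketched in a few lines of Python. This is only an illustration, not the simulator's actual code: the class `PetriNet`, its methods, and the place names `healthy` and `degraded` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """Minimal place/transition net (hypothetical sketch, not the project's code)."""
    marking: dict = field(default_factory=dict)   # place -> token count
    inputs: dict = field(default_factory=dict)    # transition -> {place: arc weight}
    outputs: dict = field(default_factory=dict)   # transition -> {place: arc weight}

    def add_place(self, name, tokens=0):
        self.marking[name] = tokens

    def add_transition(self, name, inputs, outputs):
        # Arcs are stored only as place<->transition pairs, so a place-to-place
        # or transition-to-transition arc cannot be expressed at all.
        for place in list(inputs) + list(outputs):
            if place not in self.marking:
                raise ValueError(f"unknown place: {place}")
        self.inputs[name] = dict(inputs)
        self.outputs[name] = dict(outputs)

    def enabled(self, name):
        # A transition is enabled when every input place holds enough tokens.
        return all(self.marking[p] >= w for p, w in self.inputs[name].items())

    def fire(self, name):
        if not self.enabled(name):
            raise RuntimeError(f"transition {name} is not enabled")
        for p, w in self.inputs[name].items():
            self.marking[p] -= w
        for p, w in self.outputs[name].items():
            self.marking[p] += w


# Hypothetical two-place net: a node ages, then is rejuvenated.
net = PetriNet()
net.add_place("healthy", tokens=1)
net.add_place("degraded")
net.add_transition("age", inputs={"healthy": 1}, outputs={"degraded": 1})
net.add_transition("rejuvenate", inputs={"degraded": 1}, outputs={"healthy": 1})
net.fire("age")
print(net.marking)  # {'healthy': 0, 'degraded': 1}
```

Firing `rejuvenate` afterwards moves the token back to `healthy`, which mirrors the idea of returning a node to a clean state.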

Methodology

● We simulated a proactive approach to software rejuvenation.

● Petri net graphs were used to represent the nodes in the distributed system.

● Five key parameters (Cache, Memory, Buffer, CPU Load, Processes) were considered for the simulation.

● Dijkstra’s algorithm is used to traverse all the nodes and maintain the state of transitions among nodes in the graph.

● Based on the values of the key parameters in the nodes, the simulator decides which nodes in the graph are failing.

● The results of the simulation, which record the key parameters of the nodes before and after rejuvenation using the time and prediction techniques, are written to a CSV (comma-separated values) file.
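As an illustration of the traversal step, the sketch below runs Dijkstra's algorithm over a small weighted graph and records the order in which nodes are settled. The function name `dijkstra_order`, the graph layout and the edge weights are assumptions made for this example; the slides do not say how the simulator weights its edges.

```python
import heapq


def dijkstra_order(graph, source):
    """Return (visit order, shortest distances) for nodes reachable from source.

    graph maps node -> {neighbour: non-negative edge weight}.
    """
    dist = {source: 0}
    order = []
    done = set()
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale queue entry, node already settled
        done.add(node)
        order.append(node)
        for nxt, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(nxt, float("inf")):
                dist[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return order, dist


# Hypothetical four-node graph.
graph = {1: {2: 4, 3: 1}, 2: {4: 1}, 3: {2: 2, 4: 5}, 4: {}}
order, dist = dijkstra_order(graph, 1)
print(order)  # [1, 3, 2, 4]
print(dist)   # {1: 0, 2: 3, 3: 1, 4: 4}
```

Each node is settled exactly once, so the visit order can double as the sequence in which a simulator inspects node parameters.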

Lessons from Implementation

Double Buffering

● While making a simulator, it is important to make sure it has no flicker, tearing or other artifacts.

● But it is difficult to draw a display in which pixels do not change more than once, so we use double buffering.
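The idea behind double buffering can be shown without any GUI toolkit. The sketch below is a toolkit-independent illustration with hypothetical names (`DoubleBuffer`, `draw`, `swap`); the simulator itself presumably relies on its UI framework's own buffering support.

```python
class DoubleBuffer:
    """Draw into a hidden back buffer, then swap it with the visible one."""

    def __init__(self, width, height, blank=" "):
        self.width, self.height, self.blank = width, height, blank
        self.front = [[blank] * width for _ in range(height)]  # what the user sees
        self.back = [[blank] * width for _ in range(height)]   # where we draw

    def clear(self):
        # Reset only the hidden buffer; the visible frame is untouched.
        for row in self.back:
            row[:] = [self.blank] * self.width

    def draw(self, x, y, ch):
        self.back[y][x] = ch

    def swap(self):
        # One swap exposes the finished frame, so no half-drawn state is shown.
        self.front, self.back = self.back, self.front

    def visible(self):
        return ["".join(row) for row in self.front]


buf = DoubleBuffer(3, 1)
buf.draw(1, 0, "#")
before_swap = buf.visible()  # still the blank frame
buf.swap()
after_swap = buf.visible()   # the completed frame
```

Because every drawing call touches only the back buffer, the visible frame changes once per swap rather than once per drawing operation.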

Handling Multiple Events

● Our simulator handles multiple paint events and timer events; the key challenge was keeping them synchronized.

● Thus, to get accurate results, we slowed down the transmissions so that the timer events can work properly.


Implementation

Parameters on Nodes with Threshold Values :

● Cache : 450

● Memory : 7500

● Buffer : 950

● CPU Load : 95

● # of Processes : 45
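A threshold check over these five parameters might look like the sketch below. The dictionary keys and the function name `exceeded` are hypothetical, and treating "above the threshold" as a strict comparison is an assumption; the sample readings are the "Before" rows for nodes 5 and 14 from the Results slide.

```python
# Threshold values as listed on the slide; key names are illustrative.
THRESHOLDS = {
    "cache": 450,
    "memory": 7500,
    "buffer": 950,
    "cpu_load": 95,
    "processes": 45,
}


def exceeded(node):
    """Return the names of parameters whose value is above its threshold."""
    return [name for name, limit in THRESHOLDS.items() if node[name] > limit]


node_5 = {"buffer": 82, "cache": 183, "cpu_load": 12, "memory": 4346, "processes": 48}
node_14 = {"buffer": 361, "cache": 468, "cpu_load": 84, "memory": 3692, "processes": 27}

print(exceeded(node_5))   # ['processes']
print(exceeded(node_14))  # ['cache']
```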


Implementation

● Time based Rejuvenation Policy

The nodes are inspected for various conditions of the parameters after a certain interval of time.

Which nodes to rejuvenate is decided from the parameter values observed when the timer expires.
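A minimal sketch of such a timer-driven policy follows, under stated assumptions: time is modelled as discrete ticks, the `age` callback that makes parameters drift is hypothetical, and the reset values in `SAFE_VALUE` (one tenth of each threshold) are invented for illustration since the slides do not state what "safer values" are.

```python
THRESHOLDS = {"cache": 450, "memory": 7500, "buffer": 950, "cpu_load": 95, "processes": 45}
# Hypothetical reset values; the slides do not specify them.
SAFE_VALUE = {name: limit // 10 for name, limit in THRESHOLDS.items()}


def inspect_and_rejuvenate(nodes):
    """Reset every over-threshold parameter; return ids of rejuvenated nodes."""
    rejuvenated = []
    for node_id, params in nodes.items():
        over = [n for n, limit in THRESHOLDS.items() if params[n] > limit]
        for name in over:
            params[name] = SAFE_VALUE[name]
        if over:
            rejuvenated.append(node_id)
    return rejuvenated


def simulate(nodes, age, interval, total_ticks):
    """Age the nodes every tick and inspect them each time the timer expires."""
    log = []
    for tick in range(1, total_ticks + 1):
        age(nodes)
        if tick % interval == 0:  # timer expiry
            log.append((tick, inspect_and_rejuvenate(nodes)))
    return log


def leak_processes(nodes):
    # Hypothetical aging model: every node gains two processes per tick.
    for params in nodes.values():
        params["processes"] += 2


nodes = {5: {"buffer": 82, "cache": 183, "cpu_load": 12, "memory": 4346, "processes": 40}}
log = simulate(nodes, leak_processes, interval=5, total_ticks=10)
print(log)  # [(5, [5]), (10, [])]
```

Note that the node crosses the threshold at tick 3 but is only repaired at tick 5, which is the weakness of a purely time based policy that the prediction based policy is meant to address.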


Implementation

● Prediction Based Rejuvenation Policy :

Failing nodes in a system are predicted from two aspects of the parameters:

1. Whether the parameter values are very close to their threshold values.

2. Whether the rate of change of any parameter value is very high or is increasing at an exponential rate.
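The two criteria can be turned into a simple predicate over a parameter's recent samples. The cut-offs below (`near=0.9`, `max_step=0.2`, `growth=1.5`) and the function name `about_to_fail` are assumptions chosen for illustration; the slides do not give the mathematical model or its constants.

```python
def about_to_fail(history, limit, near=0.9, max_step=0.2, growth=1.5):
    """Heuristic failure prediction for one parameter.

    history: samples, oldest first; limit: the parameter's threshold.
    near, max_step and growth are illustrative tuning constants.
    """
    latest = history[-1]
    # Criterion 1: the value is very close to (or beyond) its threshold.
    if latest >= near * limit:
        return True
    # Criterion 2a: the latest jump is a large fraction of the threshold.
    if len(history) >= 2 and latest - history[-2] >= max_step * limit:
        return True
    # Criterion 2b: roughly exponential growth over the last three samples.
    if len(history) >= 3:
        a, b, c = history[-3:]
        if a > 0 and b >= growth * a and c >= growth * b:
            return True
    return False


CPU_LIMIT = 95  # CPU Load threshold from the slide
print(about_to_fail([10, 12, 11], CPU_LIMIT))  # False
print(about_to_fail([80, 84, 87], CPU_LIMIT))  # True, close to the threshold
print(about_to_fail([10, 12, 40], CPU_LIMIT))  # True, large jump
print(about_to_fail([4, 8, 16], CPU_LIMIT))    # True, doubling each sample
```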


Testing

● Testing coverage :

Unit Testing.

Integration Testing.

System testing.


Unit Testing

Five key modules of our project were unit tested:

● Dijkstra’s algorithm.

● Time based.

● Prediction based system.

● Timer event for monitoring.

● UI Module.

Integration Testing

● Integration testing covers modules that were combined and tested together.

● Initialization of the graph.

● Simulation and rejuvenation.

● Drawing custom Petri net graphs.

Snapshots

[Figures: snapshots of the simulator]


Results

● The system detects the failing nodes and rejuvenates them to safer values using TPSRP.

ID            Buffer  Cache  CPU Load  Memory  Processes
Before (5)        82    183        12    4346         48
After (5)         82    183        12    4346          5
Before (14)      361    468        84    3692         27
After (14)       361     50        10    3692         27
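Since the simulator writes these before and after readings to a CSV file, the table above could be produced by code along the following lines. The column names and the `state` column are an assumed layout, not the simulator's documented format; the sketch writes to an in-memory stream so it runs without touching the file system.

```python
import csv
import io

# Assumed column layout for the results file.
FIELDS = ["id", "state", "buffer", "cache", "cpu_load", "memory", "processes"]

# Values copied from the results table.
rows = [
    {"id": 5, "state": "before", "buffer": 82, "cache": 183, "cpu_load": 12, "memory": 4346, "processes": 48},
    {"id": 5, "state": "after", "buffer": 82, "cache": 183, "cpu_load": 12, "memory": 4346, "processes": 5},
    {"id": 14, "state": "before", "buffer": 361, "cache": 468, "cpu_load": 84, "memory": 3692, "processes": 27},
    {"id": 14, "state": "after", "buffer": 361, "cache": 50, "cpu_load": 10, "memory": 3692, "processes": 27},
]


def write_results(rows, stream):
    """Write one header line followed by one CSV line per reading."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


out = io.StringIO()
write_results(rows, out)
lines = out.getvalue().splitlines()
print(lines[1])  # 5,before,82,183,12,4346,48
```

To write a real file instead, the same function can be given a stream opened with `open(path, "w", newline="")`.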


Conclusion

1 : We presented a new time and prediction based software rejuvenation policy (TPSRP) that improves software availability and detects failing nodes with higher probability.

2 : Numerical analysis shows that the time and prediction based policy not only outperforms the purely time-based and purely prediction-based strategies, but can also be applied easily to a practical system.

References

[1] J. Gray and D. P. Siewiorek, “High-Availability Computer Systems”, IEEE Computer, Vol. 24, Issue 9, 1991, pp. 39-48.

[2] K. Vaidyanathan and K. S. Trivedi, “Extended Classification of Software Faults Based on Aging”, in 12th International Symposium on Software Reliability Engineering (ISSRE 2001), Hong Kong, November 2001, p. 99.

[3] Y. Huang, C. Kintala, N. Kolettis, and N. D. Fulton, “Software Rejuvenation: Analysis, Module and Applications”, Proc. 25th IEEE Int’l Symp. on Fault Tolerant Computing, IEEE Computer Society Press, Los Alamitos, CA, 1995, pp. 381-390.