
Fault-Tolerance Techniques for Mobile Agent Systems

Prepared by: Wong Tsz Yeung

Date: 11/5/2001


Introduction

Mobile agents have been proposed for different application domains:
- E-commerce
- Mobile computing

It is important to have:
- Fault detection
- Recovery


Mobile Agent Execution Model

- A mobile agent executes on a sequence of machines.
- A place provides a logical execution environment for the agent.
- Executing an agent at a place is called a stage of the agent execution.

[Figure: a place P_i and the corresponding stage S_i]


Mobile Agent Failure Model

We can classify failures into 3 classes:
- Agent failure
- Place failure
- Machine failure

We assume that agent failure and place failure will not happen.


Mobile Agent Failure Model

- When a machine failure happens, all agents executing on that machine are terminated.
- When an agent tries to travel to a failed host, an exception is raised.
- We assume that the agent is terminated in this case.


Problems of Failures

- An agent travels in the network.
- It is difficult to estimate the running time of an agent.
- Two problems:
  - The agent owner believes that the agent has been lost when, in fact, it has not.
  - The agent owner waits for the agent to finish its execution, but the agent has actually terminated abnormally.


Concerns in Protocol Design

Blocking-free
- Assume that we have a perfect failure detection mechanism.
- Suppose we checkpoint every agent at every host.
- If we detect that a host has failed, we restart that failed host.


Concerns in Protocol Design

- This kind of recovery is prone to blocking.
- While the recovery is taking place, the execution is blocked until the recovery finishes.


Concerns in Protocol Design

Exactly-once
- Suppose an agent is trapped inside a very busy network.
- If the owner launches another agent, we will have 2 instances in the network.
- This doubles the effect of a single agent if the actions are not idempotent (non-intrusive).
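
The duplication problem can be illustrated with a small sketch. It assumes a hypothetical `Account` resource and a hypothetical agent identifier `agent_id`, neither of which appears in the slides. A non-idempotent `withdraw` applied by two instances of the same agent takes effect twice, whereas remembering which identifiers were already served makes the repeated request harmless. This is only one possible way to approximate exactly-once behaviour, not a protocol proposed in this talk.

```python
class Account:
    """Hypothetical resource visited by agents (illustration only)."""

    def __init__(self, balance):
        self.balance = balance
        self.served = set()  # agent ids whose request was already applied

    def withdraw(self, amount):
        # Non-idempotent: every call changes the state again.
        self.balance -= amount

    def withdraw_once(self, agent_id, amount):
        # Deduplicated: a second instance of the same agent has no effect.
        if agent_id in self.served:
            return False
        self.served.add(agent_id)
        self.balance -= amount
        return True


naive = Account(100)
naive.withdraw(30)  # original agent
naive.withdraw(30)  # relaunched duplicate: the effect is doubled

dedup = Account(100)
dedup.withdraw_once("agent-1", 30)  # original agent
dedup.withdraw_once("agent-1", 30)  # duplicate is ignored
```

Here `naive.balance` ends at 40 while `dedup.balance` ends at 70, which is the difference between at-least-once and exactly-once effects.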


Server Failure Detection

A server fault-tolerance mechanism is two-fold:
- Agents have to stop traveling to failed servers.
- There should be global daemons detecting failures. Once a failure is detected, recovery should take place.


Server Failure Detection

A simple server fault-tolerance mechanism:
- When an agent finishes its computation, it checks whether the next server is available.
- If yes, it travels to that server.
- If no, it waits at its resident server until the next host is available.
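
The three steps above can be sketched as a loop over the agent's itinerary. The function name `travel_with_detection`, the `is_available` callback, and the polling parameters are assumptions made for illustration; the actual availability check depends on the agent platform.

```python
import time


def travel_with_detection(itinerary, is_available, poll_interval=0.0, max_polls=1000):
    """Visit servers in order, waiting at the current one while the next is down.

    `is_available` is a hypothetical callable standing in for the platform's
    availability check; it is not an API from the slides.
    """
    visited = []
    for server in itinerary:
        polls = 0
        while not is_available(server):
            polls += 1
            if polls >= max_polls:
                raise TimeoutError(f"{server} did not recover")
            time.sleep(poll_interval)  # wait at the resident server
        visited.append(server)  # migrate and run this stage
    return visited


# Toy availability check: server "B" is down for the first two probes.
probes = {"B": 0}


def is_available(server):
    if server == "B":
        probes["B"] += 1
        return probes["B"] > 2
    return True


route = travel_with_detection(["A", "B", "C"], is_available)
```

The `max_polls` bound is an addition of this sketch; the mechanism on the slide waits for as long as the next host is unavailable.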


Server Failure Detection

- The way to detect server failure depends on the agent platform in use, e.g. RMI and RPC.
- We run a daemon global to all the servers. This daemon can detect and recover failed servers.
- However, the daemon is a single point of failure. We should introduce multiple instances of this monitor daemon to ease the problem.


Experiment

- We have set up an experiment on server failure detection.
- The network:

[Figure: experiment network, including the Home node]


Experiment

- To introduce failures to the servers, we have a daemon running along with every server.
- The job of the daemon is to kill the servers randomly.
- We have set the failure probability to 0.1 per 2 minutes.
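
A rough model of this failure injection is sketched below. It assumes that each 2-minute period is an independent trial with kill probability 0.1; the slides do not state how the daemon draws its random decisions, so the function names and the independence assumption are illustrative only.

```python
import random


def simulate_kill_daemon(num_intervals, p_fail=0.1, seed=0):
    """Return the index of the interval in which the server is killed, or None.

    Each interval stands for one 2-minute period; the server is killed in a
    given period with probability `p_fail` (0.1 in the experiment).
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    for interval in range(num_intervals):
        if rng.random() < p_fail:
            return interval
    return None


def survival_probability(num_intervals, p_fail=0.1):
    """Probability that a server survives `num_intervals` consecutive periods."""
    return (1.0 - p_fail) ** num_intervals
```

Under this assumption, a server survives k consecutive periods with probability 0.9^k, so the longer an agent has to stay on one server, the more likely it is to be lost there.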


Experiment

We have 2 kinds of agents:
- One can detect the availability of the next server.
- The other cannot.

The former waits for recovery. The latter travels to failed servers and is terminated.


Experiment

- We have a global daemon.
- It detects and recovers server failures.
- It detects server failures by following a cyclic server list.

[Figure: cyclic server list M_1, M_2, M_3, ..., M_{n-1}, M_n]
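
The daemon's traversal of the cyclic list might look like the following sketch. The callbacks `is_alive` and `restart` and the `rounds` parameter are hypothetical; they stand in for whatever probing and recovery mechanism the platform provides.

```python
from itertools import cycle, islice


def monitor_rounds(servers, is_alive, restart, rounds=1):
    """Walk the cyclic server list, restarting every server found dead.

    `is_alive` and `restart` are hypothetical callbacks; a real daemon would
    loop forever instead of stopping after `rounds` passes.
    """
    recovered = []
    for server in islice(cycle(servers), rounds * len(servers)):
        if not is_alive(server):
            restart(server)
            recovered.append(server)
    return recovered


# Toy cluster state: M2 is down before the daemon runs.
alive = {"M1": True, "M2": False, "M3": True}
recovered = monitor_rounds(
    list(alive),
    is_alive=lambda s: alive[s],
    restart=lambda s: alive.__setitem__(s, True),
    rounds=2,
)
```

Because the daemon visits the servers one after another, a server that fails just after being checked has to wait for a full pass over the list before it is recovered.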


Experiment

Estimation of the time between a server failing and being recovered:
- Let $p$ be the probability that a server fails.
- Let $\tau$ be the time needed to perform the recovery process.
- Let $n$ be the number of servers.
- The worst-case time is $T = n p \tau$.


Experiment

Result

[Figure: percentage (0 to 100) versus number of servers (1, 5, 10, 15, 20, 25), with two curves: with server failure detection and without server failure detection]


Experiment

- Agents are still lost because the resident servers of the agents die while the agents are waiting.
- The time that an agent waits is linearly proportional to the number of servers.
- Therefore, the curve drops in a more or less linear manner.


Agent Failure Detection

Pull approach
- Pull information out of the agent periodically.
- The owner queries the agent.
- Use an agent proxy.
- Defect: if the agent is on its way to a server, it cannot respond.


Agent Failure Detection

Push approach
- The agent pushes information to the owner.
- The agent sends heartbeat messages to the owner periodically.
- Better than the pull approach: there is no need to know where the agent is.
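
On the owner's side, the push approach reduces to remembering when each agent was last heard from. The sketch below assumes a hypothetical `HeartbeatMonitor` class and a fixed timeout, and it takes timestamps as arguments so that it stays deterministic; none of these names come from the slides.

```python
class HeartbeatMonitor:
    """Owner-side view of the push approach (hypothetical helper class)."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds of silence tolerated
        self.last_seen = {}     # agent id -> time of last heartbeat

    def heartbeat(self, agent_id, now):
        # Called whenever a heartbeat message from the agent arrives.
        self.last_seen[agent_id] = now

    def suspected(self, now):
        # Agents that have been silent for longer than the timeout.
        return sorted(
            agent_id
            for agent_id, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=5.0)
monitor.heartbeat("agent-1", now=0.0)
monitor.heartbeat("agent-2", now=0.0)
monitor.heartbeat("agent-1", now=4.0)  # agent-2 stays silent
suspects = monitor.suspected(now=6.0)
```

A silent agent can only be suspected, not known to have failed: delayed heartbeats look exactly like a crashed agent to the owner.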


Agent Failure Detection

Defects of the above 2 approaches:
- They are centralized.
- They depend on the status of the network.
- They produce a lot of traffic on the network.


Agent Failure Detection

Cooperative agent approach
- 2 agents are sent at one time.
- One is called the actual agent.
- The other is called the rear guard.
- The rear guard always lags behind the actual agent.


Agent Failure Detection

[Figure: positions of the actual agent and the rear guard]


Agent Failure Detection

How does it work:
- When the actual agent arrives at a server, it sends a message to the rear guard: "I am in XXX".
- When the actual agent leaves a server, it sends a message to the rear guard: "I am leaving XXX".
- The rear guard will then travel to XXX.
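
The message exchange above can be traced with a minimal sketch. The `RearGuard` class, the message kinds `"in"` and `"leaving"`, and the starting location `"Home"` are illustrative assumptions; messages are delivered by direct method calls rather than over a network.

```python
class RearGuard:
    """Follows the actual agent one server behind (illustrative sketch)."""

    def __init__(self, start):
        self.location = start
        self.actual_location = None

    def on_message(self, kind, server):
        if kind == "in":  # "I am in XXX"
            self.actual_location = server
        elif kind == "leaving":  # "I am leaving XXX"
            self.location = server  # move to the server being vacated
            self.actual_location = None


def run_itinerary(itinerary, guard):
    """Drive the actual agent along `itinerary`, logging guard positions."""
    trace = []
    for server in itinerary:
        guard.on_message("in", server)
        trace.append((server, guard.location))  # (actual agent, rear guard)
        guard.on_message("leaving", server)
    return trace


guard = RearGuard(start="Home")
trace = run_itinerary(["A", "B", "C"], guard)
```

Each entry of `trace` pairs the actual agent's server with the rear guard's server at that moment, which shows the rear guard staying exactly one hop behind.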


Agent Failure Detection

How to detect and recover the agent:
- Assumption (1): the actual agent is checkpointed. The checkpoint is used to recover the actual agent.
- Assumption (2): an agent will not be lost while traveling. This eliminates the possibility that the rear guard cannot receive the "I am in XXX" message.


Agent Failure Detection

Case 1
- The rear guard does not receive the message "I am leaving XXX" within a timeout period.
- This implies that the actual agent has crashed.
- The rear guard can use the checkpointed actual agent to continue execution.
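
Case 1 amounts to a receive with a timeout followed by recovery from the checkpoint. The sketch below uses Python's standard `queue.Queue` as a stand-in for the rear guard's message inbox; the function names, the dictionary-shaped checkpoint, and the 0.05 s timeout are assumptions for illustration. As a simplification, any message other than the expected one is treated like a timeout.

```python
import queue


def await_leaving(inbox, server, timeout):
    """Wait for "I am leaving <server>"; return True if it arrives in time."""
    try:
        kind, where = inbox.get(timeout=timeout)
    except queue.Empty:
        return False
    # Simplification: an unexpected message counts as a failure too.
    return kind == "leaving" and where == server


def guard_step(inbox, server, checkpoint, timeout=0.05):
    """Return the state from which execution continues after this stage."""
    if await_leaving(inbox, server, timeout):
        return {"status": "ok", "state": None}
    # Timeout: assume the actual agent crashed and resume from its checkpoint.
    return {"status": "recovered", "state": dict(checkpoint)}


inbox = queue.Queue()
inbox.put(("leaving", "A"))
normal = guard_step(inbox, "A", checkpoint={"stage": 1})
crashed = guard_step(inbox, "A", checkpoint={"stage": 1})  # inbox is now empty
```

Choosing the timeout is the hard part in practice: too short and a slow stage is mistaken for a crash, which would create a second instance of the agent.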


Agent Failure Detection

Case 2
- The actual agent cannot send "I am in XXX" to the rear guard.
- This implies that the rear guard has crashed.
- The actual agent can transmit a rear guard to its previous server.


Agent Failure Detection

Advantages
- Decentralized.
- The probability that both the rear guard and the actual agent die is very small.
- Small number of messages compared to periodic messages.


Replicating Servers

[Figure: stage S0 with one agent/place pair (a0, p0); stage S1 with three replicated pairs (a1, p1); stage S2 with three replicated pairs (a2, p2)]


Replicating Servers

[Figure: stage S0 with one agent/place pair (a0, p0); stage S1 with three replicated pairs (a1, p1); stage S2 with two pairs (a2, p2); stage S2' with two pairs (a2', p2')]


Checkpointing and Rollback

- Not all data can be checkpointed easily.
- Two types of agent data:
  - Strongly reversible objects
  - Weakly reversible objects


Checkpointing and Rollback

Strongly reversible objects
- They can be compensated by means of an image of the objects.
- E.g. an information-retrieval agent.


Checkpointing and Rollback

Weakly reversible objects
- They may differ from the original data after compensation.
- E.g. electronic money.
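
The difference between the two kinds of objects can be made concrete with a toy example. It rests on one interpretation that the slides do not spell out: electronic money is modelled as coins with serial numbers, so that a refund restores the value but not the identical coins. The `mint` helper and the data layout are hypothetical.

```python
import copy
import itertools

_serial = itertools.count(1)


def mint(value):
    """Create a coin with a fresh serial number (hypothetical e-money model)."""
    return {"serial": next(_serial), "value": value}


# Strongly reversible: restoring the saved image gives back identical data.
results = ["page-1"]
image = copy.deepcopy(results)  # checkpoint taken before the stage
results.append("page-2")        # work done during the stage
results = copy.deepcopy(image)  # rollback

# Weakly reversible: a refund compensates the payment with different coins.
wallet = [mint(5), mint(5)]
before = copy.deepcopy(wallet)
spent = wallet.pop()                 # payment made during the stage
wallet.append(mint(spent["value"]))  # compensation: refund of equal value

same_total = sum(c["value"] for c in wallet) == sum(c["value"] for c in before)
same_coins = wallet == before
```

After the rollback, `results` is identical to its image, whereas the wallet holds the same total value (`same_total` is true) in coins that are no longer the original ones (`same_coins` is false).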


Conclusion and Future Work

- We will continue to focus on agent failure detection.
- The above failure detection schemes do not satisfy the exactly-once and blocking-free requirements. Further effort is still needed.


** END **