redundancy removal in multicast protocols

5
Information Processing Letters 73 (2000) 169–173 Redundancy removal in multicast protocols Steven McKellar a , Robert Davis b,* a Computing Laboratory, University of Cambridge, Cambridge CB2 3QG, UK b Department of Computing and Electrical Engineering, Heriot–Watt University, Edinburgh, EH14 4AS, UK Received 2 July 1999; received in revised form 5 January 2000 Communicated by R. Backhouse Keywords: Distributed systems; Concurrency; Causal ordering 1. Introduction In a distributed system, a group is a collection of processes that act together in some system or user specified way. The key property that all groups have is that when a message is sent to the group itself, all members of the group receive it. While this may be achieved using Remote Procedure Calls (RPC), such an approach has been shown to be inefficient [1]. Many systems therefore adopt a more basic means of communication called multicasting, in which message passing may be interpreted in the context of send and receive primitives. RPC uses the concept of synchronous procedure calls to communicate between sites, so the concept of message ordering has little or no meaning at all. Multicasting removes the need of RPC to send unary messages to each of the group members. However, it adds complexity to the distributed model by introducing the need to order messages received at system nodes. In a typical distributed system, the task of monitor- ing and controlling various aspects of the environment is divided among the system nodes. When a node ob- serves some event, it informs other relevant nodes of this through message passing. Due to unpredictable * Corresponding author. Email: [email protected]. communication delays, different nodes may learn of different events at different points in time, causing some nodes to have an incorrect view of the environ- ment. Hence, for correct and reliable operation of a distributed system, messages must be ordered so that their cause-and-effect relationship is preserved [2]. Causal multicast algorithms preserve the causal or- der by delaying the delivery of messages received at system nodes. The alternative is to deliver mes- sages unconditionally and to roll back the process state to preserve causality. A well-known protocol that implements the causal order with group multicast is CBCAST [3]. CBCAST derives from ISIS, a message passing system developed using NON-FIFO channels, which had important implications for distributed com- puting. While there has been considerable interest in causal ordering over recent years [4–8], there is lit- tle that focuses upon the redundancy issues discussed here other than in [9] where FIFO channels are as- sumed. The CBCAST algorithm places messages in a delay queue if immediate delivery would violate the causal order. The delay queue is sorted by vec- tor time [10], with concurrent messages ordered by time of receipt. After vector time update takes place the delay queue must then be checked to establish if any of the buffered messages may be delivered as a consequence. Based on how such a requirement is im- 0020-0190/00/$ – see front matter 2000 Elsevier Science B.V. All rights reserved. PII:S0020-0190(00)00021-1

Upload: steven-mckellar

Post on 02-Jul-2016

212 views

Category:

Documents


0 download

TRANSCRIPT

Information Processing Letters 73 (2000) 169–173

Redundancy removal in multicast protocols

Steven McKellara, Robert Davisb,∗a Computing Laboratory, University of Cambridge, Cambridge CB2 3QG, UK

b Department of Computing and Electrical Engineering, Heriot–Watt University, Edinburgh, EH14 4AS, UK

Received 2 July 1999; received in revised form 5 January 2000Communicated by R. Backhouse

Keywords:Distributed systems; Concurrency; Causal ordering

1. Introduction

In a distributed system, a group is a collection ofprocesses that act together in some system or userspecified way. The key property that all groups haveis that when a message is sent to the group itself, allmembers of the group receive it. While this may beachieved using Remote Procedure Calls (RPC), suchan approach has been shown to be inefficient [1].Many systems therefore adopt a more basic means ofcommunication called multicasting, in which messagepassing may be interpreted in the context of sendand receive primitives. RPC uses the concept ofsynchronous procedure calls to communicate betweensites, so the concept of message ordering has littleor no meaning at all. Multicasting removes the needof RPC to send unary messages to each of thegroup members. However, it adds complexity to thedistributed model by introducing the need to ordermessages received at system nodes.

In a typical distributed system, the task of monitor-ing and controlling various aspects of the environmentis divided among the system nodes. When a node ob-serves some event, it informs other relevant nodes ofthis through message passing. Due to unpredictable

∗ Corresponding author. Email: [email protected].

communication delays, different nodes may learn ofdifferent events at different points in time, causingsome nodes to have an incorrect view of the environ-ment. Hence, for correct and reliable operation of adistributed system, messages must be ordered so thattheir cause-and-effect relationship is preserved [2].

Causal multicast algorithms preserve the causal or-der by delaying the delivery of messages receivedat system nodes. The alternative is to deliver mes-sages unconditionally and to roll back the processstate to preserve causality. A well-known protocol thatimplements the causal order with group multicast isCBCAST [3]. CBCAST derives from ISIS, a messagepassing system developed using NON-FIFO channels,which had important implications for distributed com-puting. While there has been considerable interest incausal ordering over recent years [4–8], there is lit-tle that focuses upon the redundancy issues discussedhere other than in [9] where FIFO channels are as-sumed. The CBCAST algorithm places messages ina delay queue if immediate delivery would violatethe causal order. The delay queue is sorted by vec-tor time [10], with concurrent messages ordered bytime of receipt. After vector time update takes placethe delay queue must then be checked to establish ifany of the buffered messages may be delivered as aconsequence. Based on how such a requirement is im-

0020-0190/00/$ – see front matter 2000 Elsevier Science B.V. All rights reserved.PII: S0020-0190(00)00021-1

170 S. McKellar, R. Davis / Information Processing Letters 73 (2000) 169–173

Fig. 1. Buffering of concurrent messages.

plemented, we suggest that redundant buffering notstrictly required to ensure the correctness of the causalorder may result. Indeed, unless the algorithm pro-vides explicit marking of concurrent messages thenthis redundancy will exist by default. A modifiedbuffer insertion scheme is thus presented to removethis potential redundancy.

2. Approaches for delivery from the delay queue

Given the orderly buffer structure created byCBCAST, the easiest delivery method would be to at-tempt to deliver every buffer element in a sequentialmanner. Hence, in the following example:

(B1,B2,B3)

an attempt to deliver B1 would be made, then B2 andfinally B3. Such a scheme would ensure that no mes-sage is delayed beyond such time as is required forthe correctness of the casual order. However, assum-ing that B1 cannot be delivered, and furthermore, it iscausally related to all the elements that follow, con-sideration of the remainder of the buffer is redundant.It can be shown that buffer size is, in general, propor-tional to the number of nodes in the distributed system.The redundancy in this delivery scheme would there-fore increase with the system size.

An optimization could be to halt consideration ofthe buffer on the first unsuccessful element. Hence,a failure to deliver B1 would imply B2 and B3 areno longer considered. The comparison redundancy ofthe initial scheme is now avoided. But consider thescenario presented in Fig. 1, noting that for clarity onlythe relevant part of each broadcast is shown.

B3 is the first message to arrive at S3, but cannotbe delivered before B1. Hence, B3 is placed in theinitially empty buffer. B4 is next to arrive at S3. Ina similar manner, B4 cannot be delivered before B2.

The need therefore exists to buffer B4. B3 and B4 areconcurrent and are ordered based on arrival time. Thebuffer order is therefore:

(B3,B4).

After the immediate delivery of B2 at S3, the causalorder will allow the delayed delivery of B4. The cruxof the problem is that the current buffer order at S3effectively prevents this. B3 is at the head of the bufferand it relies not on B2 being delivered first, but ratherB1, which has not yet arrived at S3. Delivery from thedelay queue fails on the first element and B4 is notconsidered as a consequence. It is only after the arrivalof B1 that both B3 and B4 can be unbuffered. Thedelay is represented visually in Fig. 2. The optimaldelay for the causal order is shown in Fig. 3. Tosummarize the failing of this delivery method:

Concurrent messages in an arbitrary site’s buffer havethe potential to delay each other past the point wherethey could safely be delivered.

‘Safety’ is defined with respect to the causal order.

3. A modification to CBCAST

It has been shown that redundant buffering of mes-sages is potentially an issue with CBCAST’s imple-mentation of the delay queue. The mere presence,however, of concurrent messages in an arbitrary site’sbuffer does not always infer that this redundancy willexist. Three conditions must be established before thiscan be implied from the buffer order:(1) Firstly, there must be two or more buffered mes-

sages that are concurrent in the context of theirvector timestamps.

(2) These concurrent messages must be directly be-side each other in the buffer, with no causally re-lated messages in between them.

S. McKellar, R. Davis / Information Processing Letters 73 (2000) 169–173 171

Fig. 2. Redundant message delay.

Fig. 3. Optimal message delay.

(3) The concurrent message grouping must be at thehead of the buffer.

Given that these conditions are met, redundancy willexist to some degree in the context of buffered mes-sage delivery.

What is required is a modification that removes theconcurrent redundancy but maintains the minimalisticnature of the buffer comparisons suggested by the ini-tial optimization. The premise that underlies this re-quirement is a means whereby any grouping of mes-sages in a buffer can be identified as being concurrent,or otherwise. We suggest that this can be achieved byway of a simple tagging scheme implemented in par-allel with the existing buffer insertion method. There-fore, when a message has to be inserted into the delayqueue its position is chosen in-line with the associatedvector timestamp. This much remains unchanged. Ad-ditionally, two further comparisons are now made:• Check the buffer entry to the immediate left of the

selected insertion position (if it exists). Given that itis concurrent to the message to be newly inserted,the newly inserted message is given a marker tosignify its concurrency in the context of the messagedirectly to the left.• Check the buffer entry to the immediate right of the

selected insertion position (if it exists). Given that

it is concurrent to the message to be newly inserted,the message to the immediate right is given a markerto signify its concurrency in the context of the newlyinserted message.

Insertion into the buffer can take place in one of twoways:(a) Working from left to right.(b) Working from right to left.

The tagging scheme involves a comparison to eitherside of the chosen position. If CBCAST was prescrip-tive about which of the above schemes was used, thenthe need for two comparisons could be reduced to one.If (a) was implemented, there would be a need to makea comparison only to the right. If (b) was implemented,there would be a need to make a comparison only tothe left.

By illustration, consider the scenario in Fig. 4, andassume a right to left comparison of the buffer. Thefirst message to arrive is B3, which cannot be deliveredbefore B2. Hence the buffer state is:

(B3).

B5 is next to arrive at S3. In a similar manner, B5cannot be delivered before either B1 or B4, and is

172 S. McKellar, R. Davis / Information Processing Letters 73 (2000) 169–173

Fig. 4. Implementation of concurrent message identification.

buffered as a consequence. B5 is concurrent in relationto B3 in the buffer, so B5 receives a tag:

(B3,B5′).B4 is buffered on arrival because it is causally pre-ceded by B1. B4 causally precedes B5 so must appearbefore it in the buffer. Based on the chosen insertionposition, B4 receives a tag because it too is concurrentin relation to B3.

(B3,B4′,B5′).Hence both B4 and B5 are concurrentin relation toB3. This buffer arrangement does not say anythingabout the relationship that exists between B4 andB5. B4 may be causally related to B5, or it may beconcurrent. Even if the former case were true, theimplication of vector clocks is that, given that thisis so, no vector clock value may exist that wouldallow B5 to be delivered and not B4. The buffer isalways considered left to right so B4 would alwaysbe delivered before B5 anyway. The result is that noerror may arise in the context of the causal orderbased on the proposed marker scheme. Given that B4is causally related to B5 or otherwise, what can beinferred from the above example is that a failure todeliver B3 from the delay queue should not halt thebuffer comparisons. The current vector clock value atthe site may allow either B4 or B5 to be delivered, oreven both of them.

The course of events that should occur after asuccessful message delivery may be stated informallyas follows:(1) The delay queue should be considered to see if

buffered messages may be delivered based on thenew vector time.

(2) Delivery should continue until either the buffer isempty or a message is encountered which cannotbe delivered based on the current vector clockvalue.

(3) If buffer delivery was halted because a messagecould not be unbuffered, then consider the nextmessage on the immediate right hand side (if itexists).

(4) If it is unmarked, end consideration of the buffer.(5) If it is marked, this signifies it is concurrent to the

message at the head of the buffer. Try to deliver itbased on the current vector clock value.

(6) Even if delivery is not possible from (5), keepconsidering the messages further along the bufferuntil one of two conditions arise:(1) End of buffer is reached.(2) A position in the buffer is arrived at containing

a message that is unmarked.The optimal delay of Fig. 3 is now achievable with

the modified CBCAST protocol. As before, B3 isplaced in the delay queue upon its arrival at S3. In asimilar manner, so is B4, but this time it is assigned amarker before it is buffered. The resultant buffer stateis therefore:

(B3,B4′).

After delivery of B2 from the communication channel,the modified protocol now allows B4 to be unbuffered,even although it is not at the head of the delay queue.Hence:

(B3)

is the resultant delay queue at S3. B3 is correctlyunbuffered after the arrival of B1.

S. McKellar, R. Davis / Information Processing Letters 73 (2000) 169–173 173

It may be noted that, while technically possibleto envisage a scheme that does not use any tagging,this would result in an increasing whole system delayawaiting for complete state update and an inferiorthroughput in consequence. Essentially, unless thealgorithm provides explicit marking of concurrentmessages then this redundancy will exist by defaultand, without some form of marker scheme, the onlyalternative would be to consider all queued messages.

4. Conclusions

Potential redundancy inCBCAST’s implementationof the delay queue has been identified. This introducesredundant message buffering not strictly required toensure the correctness of the causal order. We havepresented a simple technique to remove this redun-dancy that would be of use in improving overall sys-tem performance.

Acknowledgement

The authors wish to acknowledge Professor K.P.Birman at Cornell University for debating messageordering issues from the ISIS toolkit.

References

[1] K.P. Birman, R. Van Renesse, RPC considered inadequate, in:Reliable Distributed Computing with the Isis Toolkit, IEEEComputer Society Press, Silver Spring, MD, 1994, pp. 68–78.

[2] C.J. Fidge, Fundamentals of distributed system observation,IEEE Software 13 (6) (1996) 77–83.

[3] K.P. Birman, A. Schiper, P. Stephenson, Lightweight causaland atomic group multicast, ACM Trans. Comput. Sys-tems 9 (3) (1991) 272–314.

[4] K. Ravindran, S. Samdarshi, A flexible causal broadcast com-munication interface for distributed applications, J. ParallelDistributed Program. 16 (2) (1992) 134–157.

[5] L.E.T. Rodrigues, P. Verissimo, Causal separators for large-scale multicast communication, in: Proc. 15th IEEE Internat.Conf. on Distributed Computer Systems, June 1995, pp. 83–91.

[6] S. Alagar, S. Venkatesan, Causal ordering in distributed mobilesystems, IEEE Trans. Comput. 46 (3) (1997) 353–361.

[7] F. Adelstein, M. Singhal, Real-time causal message ordering inmultimedia systems, Telecom. Systems 7 (1–3) (1997) 59–74.

[8] A.D. Kshemkalyani, Reasoning about causality between dis-tributed nonatomic events, Artificial Intelligence 92 (1–2)(1997) 301–315.

[9] A.D. Kshemkalyani, M. Singhal, Necessary and sufficientconditions on information for causal message ordering andtheir optimal solution, Distributed Comput. 11 (2) (1998) 91–111.

[10] C.J. Fidge, Logical time in distributed computing systems,IEEE Comput. 24 (8) (1991) 28–33.