determining the global progress of parallel simulation with fifo communication property

ELSEVIER

Information Processing Letters

Information Processing Letters 50 (1994) 13-17

Determining the global progress of parallel simulation with FIFO communication property

Yi-Bing Lin

Bellcore, 445 South Street, Morristown, NJ 07962-1910, USA

(Communicated by D. Gries; received 19 March 1993; revised 13 December 1993)

Abstract

In an optimistic parallel simulation, the global progress (i.e., the global virtual time or GVT) must be computed from time to time. We present a simple and efficient algorithm for computing the GVT in a distributed environment with FIFO communication delays. The accuracy and the complexity of the algorithm are discussed.

Key words: Design of algorithms; Analysis of algorithms; Parallel algorithms

1. Introduction

In a parallel discrete event simulation [4], the simulated system is partitioned into a set of sub- systems that interact through the scheduling of events. (In this paper, the terms event and message or data message have the same meaning.) The set of sub-systems is simulated by a set of processes that communicate by sending/ receiving timestamped messages. The scheduling of an event for a sub-system at time t is simulated by sending a message with timestamp t to the corresponding process. The global event list and global clock of a sequential simulation do not exist in the parallel counterpart. Each process has its own input message queue and local clock (the local clock of a process p at time t is the timestamp of the message being executed by p at t). To cor- rectly simulate a sub-system, the corresponding process must execute arriving messages in their timestamp order, as opposed to their real-time arrival order. To satisfy this causality constraint, a

synchronization mechanism is required. This paper considers the Time Warp synchronization mechanism [5].

Time Warp synchronization takes an optimistic approach in which a process executes every message as soon as it arrives. If a message with an earlier timestamp (a struggler) subsequently arrives, the process rolls back its state to the timestamp of the straggler and re-executes from that point. Thus, the state of each process must be saved regularly (regardless of whether or not rollbacks actually occur>. Also, for each process p, an output message queue is maintained to save messages that have been sent from p. When p is rolled back to a timestamp, all output messages with greater timestamps must be unsent. The information concerning “unsent messages” is provided by the output message queue.

The amount of storage used for state-saving (including the process state queue, the input message queue and the output message queue) grows as the simulation progresses. Jefferson observed

0020-0190/94/$07.00 0 1994 Elsevier Science B.V. AI1 rights reserved

SSDZ 0020.0190(94)00005-.I

14 Y-B. Lin /Information Processing Letters 50 (I 994) 13-17

[5] that at any real time there exists a global virtual time (GVT) such that all executed messages with timestamps earlier than GVT will not be rolled back. Thus the storage used for saving information with timestamp earlier than GVT can be reclaimed. In addition to garbage collec- tion, GVT can be useful in other areas of Time Warp simulation such as memory management, termination detection, snapshots and crash recov- ery, and input and output handling. An operational definition of GVT is given below (other operational definitions of GVT were discussed elsewhere [6]).

Definition 1. The global virtual time (GVT) at time t is the minimum of all local clocks at time t and of the timestamps of all unprocessed messages in the system at time t.

The task of finding GVT is not trivial in a distributed environment even if communication is FIFO (i.e., messages are delivered in the order they sent). The processes may not report their local clocks at the same time, and unprocessed messages may be in transit, and their timestamps cannot be accessed.

2. GVT algorithm design criterion

The main criterion for designing a GVT algorithm is simpkcity. A Time Warp system includes several distributed algorithms [4] (e.g., memory management algorithms, termination algorithms, fault tolerance algorithms, state saving algorithms, and cancellation selection algorithms). The implementation of these algorithms are sig- nificantly affected by the GVT algorithm (for example, GVT algorithms with acknowledgement [1,8] increase the implementation complexity). Unlike most distributed snapshot algorithms or distributed termination algorithms, where an important goal is to minimize the number of control messages sent during the distributed computation, the number of control messages sent in a GVT algorithm is not important. In a typical Time Warp simulation, 106-lo9 data messages are sent during execution. On the other hand, the

number of GVT computations is less than 100. Also, GVT computation is a background process in a Time Warp simulation, and the time to complete GVT computation is not critical. Thus, a simple GVT algorithm with O(P*) message complexity may be preferred to a complex algorithm with O(P) message complexity (where P is the number of processors).

Most GVI algorithms [1,8] require an acknowledgement for every data message even if the communication preserves the FIFO property. These algorithms may not be efficient (because extra lo”-lo9 acknowledgements must be sent in a typical Time Warp simulation). Also, the imple- mentations of these algorithms are not trivial (special data structures or hardware support are required).

Mattern [7] proposed a GVT algorithm with- out acknowledgement where communication is not FIFO. Mattern’s algorithm is simpler and more efficient than all existing algorithms in a non-FIFO communication environment.

Most distributed Time Warp simulators were implemented in an environment where communication is FIFO. This paper presents a simple yet efficient GVT algorithm with FIFO communication property. Our algorithm is designed for Time Warp running on a network of (lo-501 worksta- tions with FIFO communication [2].

3. The algorithm

This section presents our GVT algorithm. We assume that the FIFO property only applies to pairs of processors. ’ Different pairs of processors are not restricted by this property. For example, if processor pi sends a message to pj at time 10 and sends another message to pk at time 12, then pk may receive the message earlier than pj does.

‘Weassume that every process is assigned to a dedicated processor, and the terms process and processor are used interchangeably. When several processes are assigned to a

processor, it is trivial to modify our GVT algorithm such that

the participants in the GVT computation are processors, not

processes.

Y-B. Lin /Information Processing Letters 50 (1994) 13-17 15

(d)Step2 (Cont.) w step3

(b) step 2 (c)Step 2(Cont.)

mstep4

Fig. 1. Computing GVT among five processors (where pO is

the initiator).

The GVT computation is initiated by a processor called the initiator. All processors involve in the GVT computation are referred to as participants.

Step 1. The initiator broadcasts a begin-local- minimum-computation message m, to all participants (cf. Fig. l(a)).

Step 2. When participant pr receives m,, it sends a control message m, to participant pj if pi has sent a data message to pi since the last GI/T computation (cf. Fig. l(b), p3 sends m, to p2 and pJ. On receipt of m,, the participant pj replies with a control message m2 back to pi (cf. Fig. l(c)). After pi has received all m2 messages, it sends an acknowledgement message m3 to the initiator (cf. Fig. l(d)).

Step 3. After the initiator has received all acknowledgement messages m3 from the participants, it broadcasts an end-local-minimum- computation message m4 to all participants (cf. Fig. l(e)>.

Step 4. After participant pi receives message m4, it reports the local minimum 7i (the minimum value of the local clock in the time period [ti,O, ti,41 where ti.0 is the time when participant pi received message m,, and t+ is the time when pi received message m,) to the initiator (cf. Fig. l(f)).

Step 5. After the initiator has received the value ri from every participant pi, the global virtual time T is computed as r = min,,ri. Then the value T is broadcasted to all participants.

This algorithm assumes message preemption. That is, if a straggler arrives at a process pi, the computation at pi is interrupted and a rollback occurs. The value of p’s local clock is replaced by the timestamp of the straggler immediately. If message preemption is not supported, then the minimum timestamps of unprocessed messages in the input message queue must also be used in computing the local minimum during [ ti,O, ti,4] for process pi.

4. Correctness

Let GT/T(t) be the true GVT at time t. The correctness of a GVT algorithm is defined as follows.

Definition 2. Let T be the value computed by a GVT algorithm, and the GVT computation com- pletes at time t. The GVT algorithm is correct if and only if T G GlP(t).

A good GVT algorithm should compute a value T close to GlP(t). Let ti,a be the time when pi receives message m,. Let t, = minviti,“, and t, = max,,iti,o. We prove the main theorem for the correctness of the algorithm as follows. 2

Theorem 3. The value T computed in the algorithm satisfies GVTCt,) 6 T =G GVT(t,).

Proof. Let lci(t) be the local clock of pi at time t. Let ti,4 be the time when pi receives m4. Then the value 7i reported at Step 4 in the GVT algorithm is 7i = min,l,O ~ f ~ ti4ki(t). From the

*A distributed GVT computation algorithm should be safe and be. Definition 2 provides safety. Liveness follows from Theorem 3.

16 Y-B. Lin /Information Processing Letters 50 (1994) 13-17

monotonicity of GVT and Definition 1, for arbi- trary t, < t, < t,, we have

GI/T(t,) < min ki(t)

for any i. Since t, G ti,o < ti,4 for all i, we have

GVT( tF) < n$ min ki(t) = minTi = 7 f;,“<t<f,,h 1 Vi

Thus, GLT(t,) < T. Now, we prove r < GLT(t,). Suppose that at time t,, GV’T(t,) is the timestamp of a data message m sent from pi to pj.

From Definition 1, there are two cases. Case I: Message m is being processed by pj at

time t,. Then lcj(tL) = GV’T(t,). Since tj,o < t, <

tj,4, we have rj < GVT(t,) which implies T < GV’N,).

transit at time t,, t, < t,. The scheduling of m is due to the execution of a message m’ at time t,. Thus the timestamp of m’ is smaller than the timestamp of m, or lc,(t,> < GVT(tJ. Thus, Ti < k,(t,> < GVT(t,) and 7 < 7i < GW(t,).

From Cases I and II, we have r < GV’Rt,). 0

Case II: Message m is in transit at time t,. Since the value Suppose m is sent from pi at time t, and arrives pants after t,, the at pj at time t,. There are two cases. to Definition 2.

Case II(a): Message m is sent before ti,o (i.e., t, G t& Let ti,3 be the time when pi sends message m3 to the initiator. Then Step 2 of the algorithm guarantees that t, < ti,3 < tj,4. (Cf. Fig. 2(a). pi cannot send m3 before it receives m2 from pj, and pi cannot send m2 before it receives m,. The FIFO property ensures that m arrives at pj earlier than m, does.) Thus, 7j < GV’T(t,) and 7 < rj 6 GVT(t,).

5. Discussion

Case II(b): Message m is sent after ti,o (i.e.,

t, > li,O; cf. Fig. 2(b)). Since message m is in

Fig. 3 illustrates the timing of a GVT computation. In the figure, message m5 represents the local minimum ri sent from a participant pi to the initiator, and m6 represents the computed GVT T sent from the initiator to the participants. From Theorem 3, GV’T(t,) Q 7 Q GVT(t,). Since pi receives the computed GVI T at ti,6 > t,, the accuracy of the algorithm is determined by the quantity GV7’(ti,,) - GVT(t,) (assuming the worst case that 7 = GV’T(t,)). The smaller the quantity, the better the accuracy. Since ti,6 - t, is positively correlated with GV7Xti,,) - GVT(t,), a small (ti,6 - tF) yields a better T value. Assume that the communication delay for sending a message from pi to pj (for all i and j) is a random variable with mean p and variance c2. An upper bound O* of the expected value 0 = E[ti,6 - tl- E[t, - t] can be derived as follows. From [3],

ts $0 fL *r *i.3 5.4

(a) The first case in Case II.

ti,O G t, 43 $4 tr

(b) The second case in Case II.

Fig. 2. When the message with the smallest timestamp is in

transit at tL.

pi .

pi

initiator f t, C.0 G.3 ‘i.4 t1 $5

Fig. 3. The timing of each step of the GVT algorithm.

T will be used by the partici- algorithm is correct according

E[ t, - t ] = E min (the delay to send m, from 1 Vi

the initiator to pi) 1 (P- 1)u

GP- J2p-1 (1)

Y-B. Lin /Information Processing Letters 50 (1994) 13-17 17

60-

50-

0' 40

30-

20-

o=O.l

lo-l7 10 20 30 40 50 60

P

Fig. 4. O* against P (u = 1).

and

from a processors to pj) 1 [

(P- 1)U

G6 p+ J2p-1 . I

From (1) and (2),

@=E[t,-t] +E[fi,6-tl] -E[tF-t]

7(P- l)C7 <6/~+ Jzp-l =O*.

Fig. 4 plots O* as a function of P and u with a normalized p = 1. This figure indicates that the accuracy of the GVT algorithm is determined by

the variance (+’ of the communication delay dis- tribution and the number of processors P.

6. Acknowledgement

We learned how to write concise sentences from David Gries. The reviewer’s comments sig- nificantly improve the quality of this paper.

7. References

[l] S. Bellenot, Global virtual time algorithms, in: Proc. 1990 SCS Multiconference on Distributed Simulation (1990) 122- 130.

[2] C. Carothers, R.M. Fujimoto, Y.-B. Lin and P. England,

Distributed simulation of PCS networks using Time Warp,

in: Proc. Internat. Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (19941, to appear.

[3] H.A. David, Order Statistics (Wiley and Sons, New York,

2nd ed., 1981). [4] R.M. Fujimoto, Parallel discrete event simulation, Comm.

ACM 33 (10) (1990) 31-53.

[S] D. Jefferson, Virtual time, ACM Trans. Programming Languages Systems 7 (3) (1985) 404-425.

[6] Y.-B. Lin, Memory management algorithms for parallel

simulation, Inform. Sci. (1993), to appear.

[7] F. Mattern, Efficient distributed snapshots and global

virtual time algorithms for non-FIFO systems, J. Parallel Distributed Comput. 18 (4) (1993) 423-434.

[8] Reynolds, Panccerella and Srinivasan. Design and perfor- mance analysis of hardware support for parallel simula-

tion, J. Parallel Distributed Comput. 18 (4) (1993) 435-453.

determining the global progress of parallel simulation with fifo communication property

Documents