
CONCURRENCY PRACTICE AND EXPERIENCE, VOL. 5(2), 87-104 (APRIL 1993)

Parallel program design using high-level Petri nets

IAN GORTON
School of Computer Science and Engineering, University of New South Wales, P.O. Box 1, Kensington NSW 2033, Australia

SUMMARY
Petri nets are proposed as a general-purpose design and modelling tool for parallel programs. The advantages of Petri nets for this purpose are discussed, and a solution to the Dining Philosophers problem is developed using simple Place-Transition nets. The limitations of Place-Transition nets are described, and the Dining Philosophers problem is used to illustrate how Coloured Petri nets can overcome these limitations. A more complex example of a Coloured Petri net is then given, and it is shown how a collection of processes in the Occam programming language can be developed directly from the properties of the net. Another Petri net model of a simple process farm is given, and a solution is developed in Parallel C: this further highlights the suitability of Petri nets as a design tool for parallel programs.

1. PARALLEL PROGRAM DESIGN

When building software for multiprocessor computers, programs need to be appropriately structured in order to exploit the performance potential of the underlying hardware. The initial problem that must be tackled during software design is the decomposition of the problem domain into a number of co-operating concurrent processes. This requires the programmer firstly to understand and then to specify how, when and why the processes interact. To co-operate in an orderly manner, processes need to synchronise their behaviour, and error conditions such as deadlock and incorrect termination need to be guarded against. These additional complexities combine to make the task of constructing correct parallel software significantly more difficult than that of its purely sequential counterpart[1].

Traditional software design methods such as functional decomposition[2] and dataflow diagrams[3] are aimed at the development of sequential software systems. They do not contain any techniques or guidelines to aid a programmer designing for and dealing with the additional complexities of parallel software. These methods are therefore of limited use to the designer of parallel software.

There are, however, software design techniques which deal directly with concurrency. These include:

* CCS[4]
* CSP[5]
* CODE[6]

1040-3108/93 $14.00 © 1993 by John Wiley & Sons, Ltd.

Received 17 October 1991. Revised 22 May 1992.


* SCHEDULE[7]
* Petri Nets[8]

CSP and CCS each provide formal specification notations based upon the fundamental concept of a process. Individual process behaviours are specified in terms of their responses to external signals or messages. Complete systems are constructed by specifying precisely how collections of processes interact and synchronise to solve a given problem. Important properties of systems (for example, absence of deadlock and starvation) may then be proved by applying a range of powerful algebraic transformation rules. Consequently these notations can provide a design method and mathematical proof tools for constructing parallel systems. They do, however, require a high level of sophistication from the system designer in terms of familiarity with the underlying semantics and transformation rules of each notation.

In contrast, the difficulty of developing parallel software that is transportable across a range of machine architectures has motivated the development of the CODE and SCHEDULE design methodologies. Essentially, CODE and SCHEDULE enable software designers to graphically specify the data and execution dependencies between subroutines written in sequential programming languages (e.g. Fortran, C, Ada). This allows the inclusion of existing library subroutines into parallel program frameworks[9]. From an architecture-independent graphical representation, each methodology is supported by tools which produce parallel software for specific target parallel architectures. CODE uses machine-specific translators known as TOADS to generate implementations. SCHEDULE provides a uniform set of library calls which programs call to create and control parallel processes: portability is achieved by reimplementing the SCHEDULE library internals to exploit a particular architecture. Both methodologies provide graphical, easy-to-use approaches to building parallel software from sequential computational units. Still, they are limited by a lack of analysis techniques which could give designers increased levels of confidence in the correctness of their parallel solution.

Petri nets are also well suited for the design and verification of concurrent systems. In their simplest form, Petri nets provide an intuitive, graphical modelling tool for concurrent systems. The modelling power of Petri nets inherent in their simple execution semantics and graphical notation can be utilised to rapidly explore the implications of particular design alternatives, and quickly highlight deadlocks and termination problems before any code is written. Moreover, advances in Petri net theory have led to the development of several higher-level Petri net variants, for example[10,11]. These increase the power of expression in net models and simplify their analysis.

This paper firstly shows how Petri nets can be adopted by programmers to give new insights into their parallel software designs. It then goes on to illustrate how the behaviour captured in Petri net models can be translated directly into programming languages which explicitly support parallelism. To this end, Section 2 gives an informal and brief introduction to basic Place-Transition nets, and highlights their practical limitations as a design tool. Section 3 introduces a higher-level Petri net notation, namely Coloured Petri nets[12], which greatly increase the modelling potential of nets, and provide much simpler analysis methods. Section 4 takes an example of a Coloured Petri net and shows how it can be used to produce code in the Occam[13] parallel programming language directly from the properties of the net. Section 5 illustrates the use of nets to model a simple process farm structure which is implemented in Parallel C[14].


2. PLACE-TRANSITION NETS

Place-Transition (PT) nets[8] are the most well known and widely used Petri net variant. A PT net consists of two types of nodes, corresponding to places and transitions. A circle represents a place, and a bar represents a transition (see Figure 1). Directed arcs connect places and transitions. An arc directed from a place to a transition defines the place as an input to the transition. An arc directed from a transition to a place defines the place as an output place from the transition. Multiple inputs and outputs of transitions are represented by multiple arcs. To be precise, a PT net is a directed bipartite multigraph, since it is possible to have multiple arcs from one node of the graph to another, and arcs may only connect places to transitions, or transitions to places.

Figure 1. Place-Transition net for the Dining Philosophers

To enable Petri nets to model the dynamic properties of systems, tokens are assigned to places in a net. Like places and transitions, tokens are a primitive concept in Petri net theory. Tokens may be thought of as residing in particular places in a net. The number of tokens allowed in a place is unbounded, and will change, along with the exact positions of tokens, as the net is executed. The execution of a Petri net is controlled by the number and distribution of tokens within the net. A net executes by firing transitions. When a transition fires, tokens are removed from each of its input places and new tokens are created and placed on each of the transition's output places. For a transition to fire, it must be enabled. This means that, for the input places of the transition, there must be at least as many tokens in each place as there are arcs from the place to the transition. The tokens in the input places which enable a transition are known as its enabling tokens. A transition fires by removing its enabling tokens from its input places and depositing in its output places one token for each arc from the transition to the place. Multiple tokens are thus produced for multiple output arcs.

Firing a transition usually changes the distribution of tokens in the net. This distribution is known as the marking of the net. Therefore, as the net executes, its marking alters. Before the execution of a net begins, tokens must be assigned to certain places in the net, as governed by the requirements of the particular problem under consideration. This initial distribution of tokens is known as the initial marking of a net, and is important in characterising the precise properties of a Petri net model.
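The enabling and firing rules just described can be sketched in a few lines of Python (an illustrative helper, not part of the original paper). A marking is a multiset of tokens over places, and `pre`/`post` map a transition's input and output places to arc counts:

```python
from collections import Counter

def enabled(marking, pre):
    """A transition is enabled when every input place holds at least
    as many tokens as there are arcs from it to the transition."""
    return all(marking[p] >= n for p, n in pre.items())

def fire(marking, pre, post):
    """Remove the enabling tokens and deposit one token per output arc."""
    if not enabled(marking, pre):
        raise ValueError("transition not enabled")
    m = Counter(marking)
    m.subtract(pre)   # consume enabling tokens
    m.update(post)    # produce output tokens
    return +m         # unary plus drops zero-count places

# A two-place net: transition t moves a token from p1 to p2.
m0 = Counter({"p1": 1})
m1 = fire(m0, pre={"p1": 1}, post={"p2": 1})
```

Firing changes the marking exactly as the text describes: `m1` holds a single token on `p2`, and `t` is no longer enabled.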

As an example of a PT net, consider Figure 1, which illustrates a Petri net model of the Dining Philosophers problem. Five philosophers are seated at a large round table, and may alternately think or eat. Five forks are placed on the table between the philosophers: in order to eat, a philosopher needs two forks, and may only use the pair of forks immediately to their left and right. Therefore two neighbouring philosophers may not eat at the same time.

In Figure 1, each of the five philosophers is modelled by two places, labelled 'th' and 'e', representing thinking or eating, respectively. The five forks are represented by the places f0-f4 (f0 has been drawn twice to make the diagram more orderly). Initially, we assume all the philosophers are thinking, and consequently all the forks are available for use. This is represented by the net's initial marking, with a token in each of the five 'th' and 'f' places. In this state, transitions a0 to a4 are all enabled, meaning that any philosopher may start to eat. The execution rules of Petri nets do not tell us which transition to fire in such circumstances, thus making the order of transition firing non-deterministic. Generally, however, if we fire a transition aN, we remove tokens from the places thN, fN and f((N+1) mod 5), and place a new token on place eN. This would represent that philosopher N has ceased thinking and is now eating. Note how firing transition aN disables transitions a((N-1) mod 5) and a((N+1) mod 5), satisfying the requirement that neighbouring philosophers cannot eat simultaneously. Eventually, when philosopher N finishes eating and wishes to resume thinking, transition bN will fire, removing the token from place eN and placing tokens on thN, fN and f((N+1) mod 5). Transitions a((N-1) mod 5) and a((N+1) mod 5) are now enabled, giving philosophers ((N-1) mod 5) and ((N+1) mod 5) the opportunity to begin eating.
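The neighbour-disabling behaviour of Figure 1 can be checked mechanically with simple marking arithmetic. The sketch below is a hypothetical encoding (place names th0-th4, e0-e4 and f0-f4 as in the text); it fires a0 and confirms that a1 and a4 become disabled while a2 and a3 remain enabled:

```python
from collections import Counter

N = 5

# Transition aN consumes thN and the two adjacent forks,
# and produces a token on eN (philosopher N starts eating).
def pre_a(n):  return {f"th{n}": 1, f"f{n}": 1, f"f{(n + 1) % N}": 1}
def post_a(n): return {f"e{n}": 1}

def enabled(m, pre):
    return all(m[p] >= k for p, k in pre.items())

def fire(m, pre, post):
    m = Counter(m)
    m.subtract(pre)
    m.update(post)
    return +m

# Initial marking: everyone thinking, all forks free.
m0 = Counter({f"th{i}": 1 for i in range(N)} |
             {f"f{i}": 1 for i in range(N)})

m1 = fire(m0, pre_a(0), post_a(0))   # philosopher 0 picks up f0 and f1
```

In `m0` every aN is enabled; after firing a0, the neighbours a1 and a4 lose a shared fork and are disabled, exactly as the text argues.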

Therefore it is apparent that the places in the net represent the different states which the system may adopt. For any philosopher N, a token in place thN represents that N is thinking, and a token in place eN represents N eating. The presence of a token in the places f0-f4 represents that a fork is available for a philosopher to pick up, and the structure of the net ensures that both the required forks must be free and picked up simultaneously.

Similarly, the transitions in the net represent events that may occur in the system. The firing of one of the transitions a0-a4 represents a philosopher taking up the two required forks, and the firing of one of the transitions b0-b4 represents a philosopher putting down the forks after finishing eating.


Once a Petri net has been created to model a problem, it can be analysed to give insights into the behaviour of the system. Properties such as safeness, boundedness, liveness and conservation[8] of nets can be examined. Most of these properties can be analysed by applying techniques such as reachability analysis, firing sequences and the invariant method. Such techniques, though, are beyond the scope of this paper, and the interested reader is referred to References 8 and 12.

While PT nets are a powerful tool for modelling systems, they become, like flow charts for sequential programs, rather cumbersome when large systems are confronted. Large nets become difficult to construct, comprehend and analyse. For these reasons, higher level extensions, such as that in Reference 12, have been developed to increase the complexity of the problems which can be easily managed and analysed. High-level nets provide a more convenient notation for modelling, and greatly reduce the burden of analysing net properties.

3. COLOURED PETRI NETS

The central idea behind Coloured Petri Nets (CPNs) is to reduce the size, and thus the complexity, of nets by allowing individual tokens, representing different processes, to share the same subnet. This is done by folding similar places and transitions into single entities, and having the tokens in the net represent different values. In the Dining Philosophers example, this means being able to distinguish between the tokens representing both individual philosophers and individual forks. To do this, sets of values must be created for both the philosophers and forks: these are known as sets of colours, such as:

PH = {ph0, ph1, ph2, ph3, ph4}
F = {f0, f1, f2, f3, f4}

Given these two sets, the five places th0-th4 can be folded into a single place Think (see Figure 2), which is labelled by the set of colours PH, indicating that all tokens which reside on Think must be elements of PH. Analogously, the places e0-e4 can be replaced by a single place Eat with PH as its colour set, and the places f0-f4 can be replaced by a single place Free Forks, which must be occupied by elements of the set F. The next stage is to fold the transitions a0-a4 into the transition Take forks, and the transitions b0-b4 into the transition Put down forks. Both transitions are labelled by the set of colours PH, which indicates that the transitions must fire in a different manner according to which member of PH is responsible for the firing. The colour set attached to a transition can also be viewed as the set whose elements may initiate or trigger the firing of that transition. In other words, a philosopher decides when to pick up the forks, not vice versa!

To complete the CPN for the Dining Philosophers problem, the requirement that each philosopher may only access their two adjacent forks must be enforced. This is done by labelling the arcs from the place Free Forks with functions that indicate which members of the set F are to be removed or replaced when the transitions Take forks or Put down forks are fired. They are defined as:

LEFT(phN) = fN
RIGHT(phN) = f((N+1) mod 5)
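The effect of LEFT and RIGHT on a firing of Take forks can be sketched as follows. This is an illustrative encoding, not from the paper: plain integers 0-4 stand in for the colours phN and fN, and three sets model the places Think, Eat and Free Forks:

```python
def LEFT(ph):            # philosopher phN's left fork is fN
    return ph

def RIGHT(ph):           # philosopher phN's right fork is f((N+1) mod 5)
    return (ph + 1) % 5

def take_forks(ph, think, eat, free_forks):
    """Fire 'Take forks' with colour phN: move phN from Think to Eat
    and remove the two adjacent forks from Free Forks. Returns False
    if the transition is not enabled for this colour."""
    left, right = LEFT(ph), RIGHT(ph)
    if ph in think and left in free_forks and right in free_forks:
        think.remove(ph)
        eat.add(ph)
        free_forks -= {left, right}
        return True
    return False

think, eat, free = set(range(5)), set(), set(range(5))
take_forks(2, think, eat, free)   # philosopher 2 takes forks 2 and 3
```

After the firing, philosophers 1 and 3 cannot fire Take forks (each shares a fork with philosopher 2), while philosopher 0 still can.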

Figure 2. Coloured Petri Net for the Dining Philosophers

The functions indicate that a firing of the transition Take forks, with colour phN, removes phN from Think, adds phN to Eat, and removes two tokens from Free Forks with colours LEFT(phN) and RIGHT(phN), respectively. Arcs which are not labelled with a function by default represent the identity function of the set of colours attached to their transition. The identity function simply results in a member of this set of colours. Finally, the initial marking of the CPN must be stated. This is:

Think = {ph0, ph1, ph2, ph3, ph4}
Free Forks = {f0, f1, f2, f3, f4}
Eat = {}

The PT net of Figure 1 has now been transformed into an equivalent CPN in Figure 2. In fact, a CPN always has an equivalent PT net, and the process described above can easily be reversed[12]. Consequently the descriptive power of the two formalisms is equivalent in the respect that they can be used to model the same class of systems. However, there is little doubt that CPNs produce a much more useful and succinct problem description, and are therefore a more practical approach to system modelling.
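The unfolding direction of this equivalence is mechanical: each coloured place expands to one PT place per colour in its colour set, and each coloured transition to one PT transition per colour. A small sketch (with hypothetical naming) recovers the sizes of the Figure 1 net from the colour sets:

```python
PH = [f"ph{i}" for i in range(5)]   # philosopher colours
F  = [f"f{i}" for i in range(5)]    # fork colours

# One PT place per (coloured place, colour) pair:
pt_places = ([f"th_{c}" for c in PH] +      # Think unfolds to th0-th4
             [f"e_{c}" for c in PH] +       # Eat unfolds to e0-e4
             [f"free_{c}" for c in F])      # Free Forks unfolds to f0-f4

# One PT transition per (coloured transition, colour) pair:
pt_transitions = ([f"take_forks_{c}" for c in PH] +      # a0-a4
                  [f"put_down_forks_{c}" for c in PH])   # b0-b4
```

The counts match Figure 1 exactly: 15 places (ten philosopher places plus five fork places) and 10 transitions (a0-a4 and b0-b4).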

4. TRANSFORMING CPNS TO OCCAM

Occam is a high-level concurrent programming language. A typical Occam program comprises a number of concurrent processes, usually structured in a hierarchy.


Processes collaborate and synchronise by exchanging messages via unidirectional, synchronous communication channels. Occam channels connect exactly two processes. One process outputs messages to the channel using the output operator '!' and the other inputs messages using the input operator '?'. The two processes must execute their corresponding input and output operations simultaneously in order to exchange a message: if either process is not ready to communicate, the other process waits until it becomes ready. Global resources are not permitted in Occam; rather, each process is restricted to directly accessing its own local address space. When global resources need to be shared amongst a number of processes, the programmer must create server processes to control accesses to the shared data via channel communications. For a detailed description of the Occam language and its features, readers are referred to Reference 13.
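For readers unfamiliar with rendezvous-style communication, the blocking behaviour of '!' and '?' can be imitated with Python threads. This is a rough analogy only; real Occam channels are a language primitive, and the class below is an assumed illustration:

```python
import queue
import threading

class Channel:
    """Sketch of a synchronous, unidirectional channel:
    output blocks until a matching input completes."""
    def __init__(self):
        self._data = queue.Queue(maxsize=1)
        self._taken = queue.Queue(maxsize=1)

    def output(self, msg):        # Occam '!'
        self._data.put(msg)
        self._taken.get()         # wait for the reader's rendezvous

    def input(self):              # Occam '?'
        msg = self._data.get()
        self._taken.put(None)     # release the writer
        return msg

c = Channel()
writer = threading.Thread(target=c.output, args=("hello",))
writer.start()
received = c.input()              # rendezvous: both sides complete together
writer.join()
```

Either side arriving first simply waits for the other, mirroring the Occam semantics described above.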

In order to investigate how CPNs can be used to develop Occam programs, consider the following, slightly modified, problem originally from Reference 12. A set of worker processes all possess their own copies of a logically shared data space. Each worker process may make changes to its own copy of the shared data during its normal processing activity. When a worker does alter its data, it must communicate this change to each of the other worker processes in the system, so that all the processes have a consistent set of data. The process sends a message to the other workers, and waits until all these messages have been received and acknowledgements returned. Only then does the process continue with its normal activity. Further, only one worker may be communicating update messages at any instant, forcing processes to take turns at communicating with each other.

Consequently, the worker processes communicate via a fixed set of message buffers,

MB = { (s, r) | s, r ∈ P and s <> r }

where s is the message sender and r the receiver. A receiving process is required to acknowledge both the receipt of the message and the completion of the update. For the sake of brevity, the actual contents of the update message are ignored in both the CPN specification and the Occam implementation.

Each worker may be in three distinct states. These are:

inactive - neither communicating an update nor receiving an update. (In this state the worker will be carrying out the tasks which constitute its main purpose, and may cause it to update the shared data. It must, however, still be able to receive update messages from other processes.)
waiting - initiated an update and waiting for acknowledgements
updating - received update message and performing update to preserve data consistency.

In the same manner, each message buffer may be:

unused - no current update messages
sent - update initiated and messages sent to all worker processes


received - the message buffer has been received by the process it was sent to
acknowledged - acknowledgement of update completed.

A CPN specification of this system of processes is given in Figure 3. The place exclusion ensures that only one worker process may use the message buffers at any one time: its colour set E contains only one element e, and the function ABS essentially removes and replaces this token when the appropriate transitions are fired. The functions REC and MINE are defined as:

∀(s, r) ∈ MB [ REC((s, r)) = r ]
∀s ∈ P [ MINE(s) = Σ_{r <> s} (s, r) ]
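REC and MINE translate directly into executable form. The sketch below is hypothetical (it assumes four workers named p0-p3); it builds the buffer set MB and both functions as plain Python:

```python
P = [f"p{i}" for i in range(4)]

# MB = {(s, r) | s, r in P and s <> r}: one buffer per ordered
# pair of distinct processes.
MB = {(s, r) for s in P for r in P if s != r}

def REC(buf):
    """REC((s, r)) = r: the process a buffer is addressed to."""
    return buf[1]

def MINE(s):
    """MINE(s): all the buffers whose sender is s, i.e. one
    message to every other process."""
    return {(s, r) for r in P if r != s}
```

With four workers there are 4 x 3 = 12 buffers, and MINE(s) always yields a subset of MB with one entry per other worker.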

Figure 3. Coloured Petri Net for the shared data process network


The initial marking of the net is:

inactive = {p0, p1, ..., pN} = ΣP
unused = ΣMB
exclusion = {e}

It is left as an exercise for the reader to understand how the net correctly models the problem. An in-depth analysis, showing that the net cannot deadlock, is given in Reference 12.
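One round of the update protocol can be traced as a token game on this net. The following is a simplified, assumed encoding (three workers; the receive message and perform update transitions are collapsed into a single step for brevity):

```python
# States of the shared-data net for a hypothetical run with 3 workers.
P = ["p0", "p1", "p2"]
state = {p: "inactive" for p in P}
buffers = {(s, r): "unused" for s in P for r in P if s != r}
exclusion = {"e"}

def initiate_update(s):
    """'Initiate update': worker s takes the exclusion token, moves
    to waiting, and marks every buffer in MINE(s) as sent."""
    assert state[s] == "inactive" and exclusion
    exclusion.clear()
    state[s] = "waiting"
    for buf in [(s, r) for r in P if r != s]:
        buffers[buf] = "sent"

def receive_and_update(buf):
    """Receiving worker performs the update and acknowledges
    (receive + update collapsed into one step here)."""
    s, r = buf
    assert buffers[buf] == "sent" and state[r] == "inactive"
    buffers[buf] = "acknowledged"

def complete(s):
    """Once every buffer from s is acknowledged, s returns to
    inactive and the exclusion token is replaced."""
    mine = [(s, r) for r in P if r != s]
    assert all(buffers[b] == "acknowledged" for b in mine)
    for b in mine:
        buffers[b] = "unused"
    state[s] = "inactive"
    exclusion.add("e")

initiate_update("p0")
for r in ["p1", "p2"]:
    receive_and_update(("p0", r))
complete("p0")
```

After the round, the net has returned to its initial marking, so any worker (including p0 again) may initiate the next update.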

The first stage of transforming Figure 3 into Occam is to recognise that each colour set (in this case, P and MB) can be modelled by Occam processes executing in parallel. State transitions within the processes are controlled by passing messages, which can further be used for process synchronisation. Consequently, the processes will communicate via Occam channels. Firstly then, the code for the worker processes P can be developed: this is illustrated in Figure 4.

PROC P (CHAN OF ANY receive.update, operation.complete,
        start.update, ack.update.complete, VAL INT me)
  PAR
    WHILE TRUE
      INT any:
      BOOL global.data.updated:
      SEQ
        Main.Activity(global.data.updated)
        IF
          global.data.updated
            SEQ
              start.update ! me
              operation.complete ? any
              -- all acknowledgements received
          TRUE
            SKIP
    WHILE TRUE
      INT update.type:
      SEQ
        receive.update ? update.type
        Perform.Update(update.type)
        ack.update.complete ! me
:

Figure 4. The behaviour of P modelled in Occam

P comprises two parallel components, which represent the two different types of activity which a member of P must perform, namely initiating data updates and receiving update details from other processes. The occurrence of either event is non-deterministic, and P must be capable of receiving update messages from other workers at any time, whether the process is performing its main activity, or is possibly blocked waiting to initiate an update itself. Failure to achieve this could (and in reality would!) deadlock the system. This behaviour is clearly specified in the CPN in Figure 3. From the initial marking, the transitions initiate update or receive message may fire, the former under the control of a member of P (the transition is labelled P), the latter governed by the existence of the necessary messages (the transition is labelled MB). The use of the Occam PAR construct in P satisfies this requirement in a simple and natural manner.


In general, every transition in the CPN is represented by a channel communication in Occam. The transitions effectively specify the points when processes must either exchange data or synchronise, and the functions in the CPN specify exactly when and how processes communicate. For example, process P inputs a message informing it of an update (receive.update ? update.type): this enables it to proceed to a state in which it performs the necessary changes to its data (Perform.Update). When completed, it sends an acknowledgement to indicate that the update has been successfully carried out, and returns to a state in which it is receptive to subsequent update messages.

Figure 5 shows the Occam process which implements the message buffers in the CPN. In its initial state (unused), it simply waits to receive messages from the worker processes attached to the start.update channels. When it receives a message, it passes the information on to all the workers it is connected to, except for the one which initiated the operation: this is specified by the function MINE in the CPN and implemented by the procedure send.message.excluding in Figure 5. Once all the update messages have been sent out, the message buffer waits to receive the acknowledgements that all the updates have been performed. This is carried out by the procedure receive.message.excluding, which inputs acknowledgements from all the processes except the one responsible for the update. Again, this behaviour is specified in the function MINE. Finally, a synchronisation

PROC message.buffer ([]CHAN OF ANY start.update, send.messages,
                     ack.update.complete, operation.complete,
                     VAL INT no.procs)
  PROC send.message.excluding([]CHAN OF ANY out, VAL INT id)
    SEQ i = 0 FOR no.procs
      IF
        (i <> id)
          out[i] ! id
        TRUE
          SKIP
  :
  PROC receive.message.excluding([]CHAN OF ANY in, VAL INT id)
    INT any:
    SEQ i = 0 FOR no.procs
      IF
        (i <> id)
          in[i] ? any
        TRUE
          SKIP
  :
  WHILE TRUE
    INT any, id:
    ALT i = 0 FOR no.procs
      start.update[i] ? id
        SEQ
          send.message.excluding(send.messages, id)
          receive.message.excluding(ack.update.complete, id)
          operation.complete[id] ! any
:

Figure 5. The message buffer process


message is sent to the initiator, informing it that the updates have been successful. This message enables the message buffer to return to its initial, unused, state, in which it may accept update requests from any process in the system.

Note that the semantics of Occam's synchronised channel communication, combined with the non-deterministic ALT construct, ensure that only one worker at a time may broadcast update messages. Thus there is no need to introduce any special processing to handle the place exclusion in the CPN. The non-determinism inherent in the ALT construct also directly matches the rules of Petri net transition firing. This greatly reduces the conceptual gap between the specification and the implementation notations, and facilitates straightforward translation between the two.

All that remains is to 'wire up' the message buffer process with a number of worker processes. This is done in Figure 6. The constant no.procs specifies the number of P processes in this example to be 3: this is then passed to the message buffer to inform it of how many workers it must service. Channel vectors are used to connect each worker to the message buffer process, and each P process receives a value parameter giving it a distinct identity. Note how the folding of multiple worker processes in the CPN is simply represented by multiple instantiations of the process P, each with a different set of parameters.

VAL no.procs IS 3:
[no.procs]CHAN OF ANY start.update, send.message,
                      ack.update.complete, end.operation:
PAR
  P (send.message[0], end.operation[0], start.update[0],
     ack.update.complete[0], 0)
  P (send.message[1], end.operation[1], start.update[1],
     ack.update.complete[1], 1)
  P (send.message[2], end.operation[2], start.update[2],
     ack.update.complete[2], 2)
  message.buffer(start.update, send.message,
                 ack.update.complete, end.operation, no.procs)

Figure 6. Channel declarations and top-level PAR construct

An important feature of the above Occam system is the decoupling of the individual worker processes. The workers have no need to know how many other workers exist in the system; they merely broadcast update messages and receive update details from other processes. The message buffer process effectively provides a communication medium by which the workers can safely exchange information. This type of process architecture is typical of Occam applications, and leads to highly modular, maintainable and understandable software solutions.

5. TRANSFORMING CPNS TO PARALLEL C

To further illustrate the generality of CPNs, this section takes the example of a simple process farm and presents a solution in a shared-memory dialect of Parallel C. The particular features of Parallel C used in the example are the creation of multiple threads (processes) and the use of semaphores. All global data declared in the program are accessible to all threads. Consequently, access to shared data must be synchronised and controlled by using semaphores. For more details on Parallel C, a comprehensive description can be found in Reference 14.

Consider a system which repeatedly receives fixed-size packets of data which need to be processed in some way. In order to increase performance on a shared-memory multiprocessor, a number of worker processes may each simultaneously process distinct segments of the packet. When all the workers have completed, the processed packet is passed on to be used in another part of the system, and processing can begin on any new packets that are available. This system can be regarded as a typical producer-consumer arrangement incorporating some intermediate activity.

From the problem statement, the following colour sets can be defined:

Producer = {P}
Consumer = {C}
Worker = {w0, w1, ..., wN}, N > 0
Segment = {s0, s1, ..., sM}, M > 0

The CPN which represents this solution is given in Figure 7. The system has a single producer which inputs packets from some external environment (represented by the unconnected input arc Packet Source) and stores them in a buffer. Distinct, non-overlapping segments of the buffer are then allocated to a different worker process. When all the workers have completed processing, the consumer removes the complete packet from the buffer. This allows the producer to place the next packet to be processed in the buffer (if one is available). This behaviour is ensured by the place exclusion, which is defined exactly as in the previous example. The functions STORE, GET, PUT and REMOVE are defined as:

STORE(P) = Σs
GET(sM) = wN
PUT(wN) = sM
REMOVE(C) = Σs

The initial marking of Figure 7 is:

Input = ΣP = P
Inactive = Σw
Ready = ΣC = C
Exclusion = {e}

Note that in this system the invariant N = M must hold: the number of segments a packet is divided into is determined by the number of worker processes.
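The division of the buffer into the segment bounds passed to each worker is left to an undefined routine in the paper. As an illustration only (the even-split policy, the SIZE value and the calc_segment name are assumptions, not taken from the original), the calculation might be sketched as:

```c
#include <assert.h>

#define SIZE 1024   /* total packet size; value assumed for illustration */

/* Compute the half-open segment [*lower, *upper) of a SIZE-byte buffer
 * assigned to worker i of no_workers.  Any remainder bytes go to the
 * last worker, so the segments are distinct, non-overlapping and
 * together cover the whole buffer (the invariant N = M in token terms). */
void calc_segment(int i, int no_workers, int *lower, int *upper)
{
    int len = SIZE / no_workers;
    *lower = i * len;
    *upper = (i == no_workers - 1) ? SIZE : (i + 1) * len;
}
```

Each worker i would then be created with its (lower, upper) pair as actual parameters.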

To derive a Parallel C program to implement Figure 7, it is first necessary to define the processes that are needed. Following the guidelines of the previous example, each colour set can be represented by a C process. Transitions with multiple input arcs represent points at which processes must synchronise using (in this case) semaphores. This arrangement



Figure 7. A Coloured Petri net to represent a process farm

would yield four processes (Producer, Consumer, Worker, Segment) and three semaphores (put_data_in_buffer, start_processing, remove_data). While this approach could indeed be implemented and would function correctly, it is by no means optimal in a shared-memory language. The process Segment would have no interesting functionality except that of controlling the semaphores to indicate that processing may proceed and that processing has completed. It would therefore be more sensible and efficient to represent the Segment colour set as a shared-memory object, and delegate control of the semaphores amongst the processes which act upon the Segments. More specifically, this means that the Producer process would control the semaphore for the transition start_processing, and the Worker process would control the semaphore for the transition remove_data.

From this point, producing Parallel C code is relatively straightforward. Figure 8 shows the code for the Worker process. A worker is passed as arguments the upper and lower bounds of the segment of the shared buffer for which it is responsible. It then waits upon the start_processing semaphore, as specified by the GET function in Figure 7, until a new packet has been placed in the buffer. When the wait operation returns, the worker processes its segment of the buffer and signals completion using the remove_data semaphore.

void worker( int lower, int upper )
// Process shared buffer from 'upper' to 'lower'
{   // local variables

    for ( ; ; ) {
        sema_wait(&start_processing);
        ProcessSegment(upper, lower);    // specific to system
        sema_signal(&remove_data);
    }
}

Figure 8. The worker process

The Consumer process (Figure 9) merely has to wait until each of the workers has performed a signal operation on the remove_data semaphore. This behaviour is specified in the REMOVE function: the consumer must wait for all the segments to be available before it can copy the buffer. The sema_wait_n function is used to perform N wait operations on the remove_data semaphore, where N is the number of worker processes, and hence the number of segments that the buffer is divided into. When sema_wait_n returns, the consumer copies the data from the buffer, informs the producer that the buffer is empty by signalling on the put_data_in_buffer semaphore, passes the buffer on to the external environment and returns to its inactive state.
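Parallel C provides sema_wait_n as a primitive. On systems that offer only single-wait semaphores (POSIX semaphores are used below purely as an assumed stand-in for the Parallel C calls), the same effect can be sketched as a loop of single waits:

```c
#include <semaphore.h>
#include <assert.h>

/* Wait n times on sem: the caller proceeds only after n signal
 * operations have occurred, mirroring Parallel C's sema_wait_n. */
void sema_wait_n(sem_t *sem, int n)
{
    int i;
    for (i = 0; i < n; i++)
        sem_wait(sem);
}
```

This is adequate here because the consumer is the only process that ever waits on remove_data, so the n decrements cannot be interleaved with waits from another process.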

The initialisation of the system, together with the role of the Producer, is fulfilled by the main process (see Figure 10). Main firstly initialises the semaphores and starts the Worker and Consumer processes using the thread_create function. As in the previous example, multiple instantiations of the Worker process (using thread_create) with different actual parameters directly implement the multiple w tokens in the CPN.


void consumer( int no_workers )
{   // local variables

    while (TRUE) {
        sema_wait_n(&remove_data, no_workers);
        CopyBuffer();    // precise details not given
        sema_signal(&put_data_in_buffer);
    }
}

Figure 9. The consumer process

Main next assumes the role of the Producer. It enters a loop in which it gets a packet from the environment and waits on the put_data_in_buffer semaphore until the shared buffer is empty. (Note that on the first execution of the loop the wait operation will succeed immediately due to the initialisation of the semaphore to a value of 1, indicated in the CPN by the initial marking of the place Exclusion.) Once the packet has been copied into the shared buffer, the Producer performs N (N = no_workers) signal operations on the start_processing semaphore, as specified by the STORE function (the sema_signal_n function achieves this succinctly). The Worker processes may now operate concurrently on the shared buffer, and at the same time the Producer can be retrieving the next packet from the external environment. It may not, however, put any new packet into the shared buffer until the Consumer has removed the previous packet: this occurs after the Consumer's signal operation on the put_data_in_buffer semaphore.
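sema_signal_n admits the same treatment as its wait counterpart: where only a single-signal primitive exists, n posts in a loop (again sketched with assumed POSIX names, not the Parallel C library itself) release all the workers at once:

```c
#include <semaphore.h>
#include <assert.h>

/* Signal sem n times, mirroring Parallel C's sema_signal_n: one call
 * by the producer releases all n workers blocked on start_processing. */
void sema_signal_n(sem_t *sem, int n)
{
    int i;
    for (i = 0; i < n; i++)
        sem_post(sem);
}
```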

This completes the derivation of the Parallel C processes from the CPN design. The CPN clearly represents the interactions required between processes in the process farm, and this information has been directly captured in a Parallel C process framework. Still, the precise activities involved in receiving a packet, copying it to shared storage, processing the contents of the packet, and disposing of the altered packet have not been detailed. This is because it has been assumed that these functions are purely sequential units of computation. Hence they do not require co-operation between parallel processes and have no impact on the CPN design. The functions can therefore be written in standard C, or taken from existing libraries, and simply incorporated into the framework of processes that has been constructed. If this were not the case, and these functions contained some internal parallelism, this behaviour would have to be captured in the CPN design. This could be achieved either by expanding the CPN with additional places and transitions, or by indicating that certain places and transitions represent more complex behaviour specified in an underlying subnet. The latter option allows CPN designs to be structured hierarchically, with each level of the hierarchy representing the behaviour of the system at a given level of abstraction.


#define no_workers 2    // this system has just 2 workers
#define no_parms 2      // each worker has two parameters

// Shared global data items

SEMA remove_data, put_data_in_buffer, start_processing;
char buffer[SIZE];

void main()
{   // local variable declarations

    // Initialise the system
    sema_init(&remove_data, 0);
    sema_init(&put_data_in_buffer, 1);
    sema_init(&start_processing, 0);

    for ( i = 0; i < no_workers; i++ ) {
        Calc_Segment_Size(i);    // exact details not given
        thread_create(worker, WORKSIZE, no_parms, lower, upper);
    }

    thread_create(consumer, WORKSIZE, 1, no_workers);

    // now assume role of producer
    while (TRUE) {
        Get_Packet();    // interact with external sub-system
        sema_wait(&put_data_in_buffer);
        Put_Packet_In_Buffer();    // exact details not given
        sema_signal_n(&start_processing, no_workers);
    }    // end while
}

Figure 10. The producer process
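Figure 10 depends on Parallel C's proprietary thread and semaphore library. The whole farm can be approximated in portable code, with the caveat that everything below is an assumption rather than the paper's implementation: POSIX threads and semaphores replace the Parallel C calls, each worker is given its own start semaphore (so that one worker cannot consume two of the producer's signals in a single round, which the shared start_processing semaphore would permit), the run is bounded at three packets for demonstration, and "processing" is reduced to incrementing each byte of the segment.

```c
#include <pthread.h>
#include <semaphore.h>
#include <assert.h>

#define NO_WORKERS 2
#define SIZE       8          /* packet size, assumed for illustration */
#define NO_PACKETS 3          /* bounded run instead of while (TRUE)   */

static sem_t remove_data, put_data_in_buffer;
static sem_t start_w[NO_WORKERS];    /* one start semaphore per worker */
static char  buffer[SIZE];
static char  output[NO_PACKETS][SIZE];

struct bounds { int id, lower, upper; };

static void *worker(void *arg)       /* the role of Figure 8, bounded  */
{
    struct bounds *b = arg;
    int p, i;
    for (p = 0; p < NO_PACKETS; p++) {
        sem_wait(&start_w[b->id]);
        for (i = b->lower; i < b->upper; i++)
            buffer[i] += 1;          /* stands in for ProcessSegment   */
        sem_post(&remove_data);
    }
    return NULL;
}

static void *consumer(void *arg)     /* the role of Figure 9, bounded  */
{
    int p, i;
    (void)arg;
    for (p = 0; p < NO_PACKETS; p++) {
        for (i = 0; i < NO_WORKERS; i++)    /* sema_wait_n             */
            sem_wait(&remove_data);
        for (i = 0; i < SIZE; i++)          /* CopyBuffer              */
            output[p][i] = buffer[i];
        sem_post(&put_data_in_buffer);
    }
    return NULL;
}

void run_farm(void)                  /* Figure 10: main as producer    */
{
    pthread_t w[NO_WORKERS], c;
    struct bounds b[NO_WORKERS];
    int i, p, len = SIZE / NO_WORKERS;

    sem_init(&remove_data, 0, 0);
    sem_init(&put_data_in_buffer, 0, 1);    /* place Exclusion marked  */

    for (i = 0; i < NO_WORKERS; i++) {
        sem_init(&start_w[i], 0, 0);
        b[i].id = i;  b[i].lower = i * len;  b[i].upper = (i + 1) * len;
        pthread_create(&w[i], NULL, worker, &b[i]);
    }
    pthread_create(&c, NULL, consumer, NULL);

    for (p = 0; p < NO_PACKETS; p++) {
        sem_wait(&put_data_in_buffer);
        for (i = 0; i < SIZE; i++)
            buffer[i] = (char)p;            /* Put_Packet_In_Buffer    */
        for (i = 0; i < NO_WORKERS; i++)    /* sema_signal_n           */
            sem_post(&start_w[i]);
    }

    for (i = 0; i < NO_WORKERS; i++)
        pthread_join(w[i], NULL);
    pthread_join(c, NULL);
}
```

The synchronisation structure (producer blocked on the buffer, workers released together, consumer gated on all segments) is exactly that of the CPN; only the library calls and the bounded loop differ.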

6. FURTHER WORK

Petri nets would seem to be an appropriate design tool for parallel programs. However, it is a subject of further research to discover which high-level net formalism is most suitable. Potential candidates, apart from Coloured Petri Nets, include P-nets[11], Predicate/Event nets[15] and Predicate/Transition nets[16]. Other important features, such as incorporating time and inhibitor arcs into net models, also need to be investigated.

The short-term aim of the research is to incorporate Petri net concepts into a more general methodology for constructing parallel programs. The methodology would permit software engineers to design solutions in a hierarchical, architecture-independent notation.


The solutions could then be analysed using Petri net semantics to give a high degree of confidence in the correctness of the design. Logically correct solutions may then be targeted to a particular implementation technology by applying simple transformation rules. An important aim of the methodology is that it be applicable across a whole spectrum of application domains, ranging from dedicated high-performance parallel software running on multiprocessor supercomputers, to general-purpose applications running under the control of multithreaded, distributed operating systems.

7. CONCLUSIONS

Petri nets are suitable for the modelling and analysis of systems that involve communication, concurrency, synchronisation and co-operation. Petri net models can be represented graphically, and are amenable to simple analysis and simulation techniques. Importantly, Petri nets can be useful during a range of activities in parallel software design. For example, at a practical, informal level, Petri net models give considerable insight into the communications and synchronisation requirements of a problem. This should reduce the number of semantic errors in programs and lead to shorter development times for applications. At a more advanced level, Petri nets can provide a powerful formalism and associated analysis techniques to support the development of correct, reliable concurrent systems.

This paper has investigated the potential for using a high-level Petri net notation, namely Coloured Petri Nets, for designing parallel programs. High-level nets such as CPNs increase the modelling convenience of Petri nets and can ease the burden of analysis. Two example CPNs have been presented and implementations of these have been developed in Occam and Parallel C, respectively. The features of the two languages which facilitate the translation of the net representations directly into code have been highlighted. These features include explicit parallelism, multiple instantiations of single processes and non-deterministic synchronisation constructs. It is hoped that these examples have illustrated the applicability of the technique to both shared and non-shared-memory parallel languages, and that the same mechanisms may be used to produce implementations in other languages (e.g. Ada) which support appropriate features.

REFERENCES

1. P.A. Suhler et al., 'TDFL: A task-level dataflow language', J. Parallel and Distributed Computing, 9, 103-115 (1990).
2. O.J. Dahl, E.W. Dijkstra and C.A.R. Hoare, Structured Programming, London: Academic Press, 1972.
3. E. Yourdon and L.L. Constantine, Structured Design, Englewood Cliffs, NJ: Prentice-Hall, 1979.
4. A.J.R.G. Milner, A Calculus of Communicating Systems, Springer-Verlag, LNCS 92, 1980.
5. C.A.R. Hoare, Communicating Sequential Processes, Prentice-Hall International, 1985.
6. J.C. Browne, M. Azam and S. Sobek, 'CODE: A unified approach to parallel programming', IEEE Software, 6, (7), 10-18 (1989).
7. J.J. Dongarra and D.C. Sorenson, 'A portable environment for developing parallel FORTRAN programs', Parallel Computing, 5, 175-198 (1987).
8. J.L. Peterson, Petri Net Theory and the Modelling of Systems, Prentice-Hall, Inc., 1981.
9. J.J. Dongarra et al., 'Programming methodology and performance issues for advanced computer architectures', Parallel Computing, 8, 41-58 (1988).
10. F.J.W. Symons, 'Modelling and analysis of communication protocols using numerical Petri nets', PhD thesis, University of Essex, UK, 1978.
11. J. Billington, 'Extensions to coloured Petri nets', 3rd International Workshop on Petri Nets and Performance Models, Kyoto, Japan, December 1989.
12. K. Jensen, 'Coloured Petri nets and the invariant method', Theoretical Computer Science, 14, 317-336 (1981).
13. Inmos Ltd., Occam 2 Reference Manual, Prentice-Hall, 1988.
14. A.D. Culloch, 'Parallel programming toolkit for 3L C, FORTRAN and Pascal', Proceedings 8th Occam User Group Conference, Sheffield, UK, 1988, pp. 23-30.
15. W. Reisig, Petri Nets: An Introduction, EATCS Monographs on Theoretical Computer Science, vol. 4, Springer-Verlag, 1985.
16. H.J. Genrich and K. Lautenbach, 'The analysis of distributed systems by means of predicate/transition nets', Lecture Notes in Computer Science, 70, 123-146 (1979).