
Parallel Computing 2 (1985) 163-171, North-Holland

Parallel algorithms for tree traversals

N.C. KALRA and P.C.P. BHATT Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi 110016, India

Received June 1984 Revised November 1984 Communicated by Professor D.J. Evans

Abstract. Three commonly used traversal methods for binary trees (forests) are pre-order, in-order and post-order. It is well known that sequential algorithms for these traversals take O(N) time, where N is the total number of nodes. This paper establishes a one-to-one correspondence between the set of nodes that possess a right sibling and the set of leaf nodes (excluding the last node visited) for any forest. For the case of pre-order traversal, this result is shown to provide an alternate characterization that leads to a simple and elegant parallel algorithm of time complexity O(log N), with or without read-conflicts, on an N-processor SIMD shared-memory model, where N is the total number of nodes in a forest.

Keywords. Binary tree, forest, tree traversal, parallel algorithms, read-conflict.

1. Introduction

The notion of a tree structure is very important in computer science and the theory of graphs. Three commonly used traversal methods for binary trees are pre-order, in-order and post-order traversals. These methods have also been extended to traverse forests [1,2,3]. It is well known that sequential algorithms for these traversals take O(N) time, where N is the total number of nodes. It is therefore of interest to investigate parallel traversal algorithms. An approach followed in the past [4] has been to find parallel algorithms for traversing a binary tree and to apply these to the binary tree representation of a forest (tree). Here we present parallel algorithms for traversing a forest (tree) directly, using a novel alternate characterization. These algorithms can be used for traversing a forest (tree) represented by the left-child, right-child relations of its equivalent binary tree, by the left-most-child, right-sibling relations, or by the parent-of relation with an artificial ordering imposed among the children of each node [6]. We assume the availability of N processors in an SIMD shared-memory structure. The algorithms take O(log N) time, with or without read-conflicts, when the forest is represented by the left-child, right-child relations of its equivalent binary tree or by the left-most-child, right-sibling relations. Our complexity results are no different from those obtained in [4]. We leave open the problem of obtaining O(log N) time complexity when the forest is represented by the parent-of relation with an artificial ordering imposed among the children of each node [6]. However, this time complexity can be obtained easily with N * N processors.

We assume that read, write and standard arithmetic operations on integers can each be performed in unit time. The terms 'tree' and 'forest' in this paper stand for 'an ordered tree' and 'a set of ordered trees' respectively. Throughout this paper relations and functions are written in lower case letters, while the corresponding information structures are written in upper case letters.

In Section 2, we describe the pre-order traversal of a forest and its alternate characterization. In Section 3, a parallel algorithm for this traversal method based on this characterization is presented. In Section 4, we discuss the representational details needed to derive the relations required for this characterization. Similar characterizations for the other two methods, namely in-order and post-order, can be worked out easily and are, therefore, not discussed in this paper.

0167-8191/85/$3.30 © 1985, Elsevier Science Publishers B.V. (North-Holland)

2. Pre-order traversal characterization

The pre-order traversal of a forest has been defined recursively as follows [4]:

(i) Visit the root of the first tree;

(ii) Traverse the subtrees of the first tree in pre-order;

(iii) Traverse the remaining trees in pre-order.

We shall take the forest in Fig. 1 as a running example throughout this paper. The application of the above algorithm results in the traversal sequence depicted in Fig. 2. By observing this pre-order traversal we define a function 'follow' as given below.

Definition 2.1. On the set of nodes of a forest define a function 'follow' for the pre-order traversal such that:

$$\text{follow}(j) = \begin{cases} i & \text{if node } i \text{ is visited next after node } j, \\ j & \text{if node } j \text{ is the last node to be visited.} \end{cases}$$

For our example forest,

follow(5) = 6, follow(4) = 15, follow(13) = 13, etc.

Fig. 1. An example forest with arbitrary node labelling.

Fig. 2. Pre-order traversal sequence.


Observation 2.1. Since each node is visited exactly once, the function 'follow' threads the nodes of a forest in the order of pre-order traversal and, hence defines a linear list. Fig. 2 shows this function graphically.
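The recursive definition and the function 'follow' can be illustrated with a short sequential sketch. The 8-node forest below is a hypothetical example chosen for this illustration (it is not the forest of Fig. 1), and the names `preorder` and `follow_from_sequence` are assumptions introduced here. The value 0 stands for "no such node", and the roots of the trees are assumed to be chained through the right-sibling relation.

```python
# Hypothetical 8-node forest (NOT the paper's Fig. 1); 0 means "no such node".
# Tree 1: 1 -> (2, 3), 2 -> (4, 5), 5 -> (8).   Tree 2: 6 -> (7).
N = 8
LEFT_MOST_CHILD = [0, 2, 4, 0, 0, 8, 7, 0, 0]   # index 0 is unused
RIGHT_SIBLING = [0, 6, 3, 0, 5, 0, 0, 0, 0]     # roots 1 and 6 are chained
FIRST_ROOT = 1


def preorder(first):
    """Return the pre-order visiting sequence of the forest whose first root is `first`."""
    order = []
    node = first
    while node != 0:
        order.append(node)                              # (i) visit the root
        order.extend(preorder(LEFT_MOST_CHILD[node]))   # (ii) traverse its subtrees
        node = RIGHT_SIBLING[node]                      # (iii) go on to the remaining trees
    return order


def follow_from_sequence(order):
    """Build the function 'follow' of Definition 2.1 from a visiting sequence."""
    follow = [0] * (N + 1)
    for current, nxt in zip(order, order[1:]):
        follow[current] = nxt
    follow[order[-1]] = order[-1]    # the last node visited follows itself
    return follow


order = preorder(FIRST_ROOT)
follow = follow_from_sequence(order)
print(order)        # [1, 2, 4, 5, 8, 3, 6, 7]
print(follow[1:])   # [2, 4, 6, 5, 8, 7, 7, 3]
```

Starting at the first root and repeatedly applying `follow` visits the nodes in the same order as the recursive traversal, which is the linear list described in Observation 2.1.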

Our aim is to partition the set of nodes of a given forest into mutually disjoint sets such that the function 'follow' can be computed in parallel in constant time. At this stage we indicate that there exists a unique correspondence between the set of leaf nodes and the set of nodes which possess a right sibling, and that this correspondence forms the basis for an alternate characterization of the pre-order traversal. We need some definitions to exhibit this correspondence.

We assume the availability of two primitive relations over the set of nodes of a given forest, namely the left-most-child and right-sibling relations. For our example these relations are shown in the first two rows of Table 1. The two relations are self-explanatory; the latter relates the children of each node. As in the definitions that follow, a value of 0 indicates that no such node exists.

Definition 2.2. On the set of nodes of a forest define a function 'left-most-son' as follows:

$$\text{left-most-son}(i) = \begin{cases} i & \text{if } i \text{ is a leaf node}, \\ \text{left-most-child}(i) & \text{otherwise.} \end{cases}$$

Definition 2.3. On the set of nodes of a forest define a function 'right-brother' as follows:

$$\text{right-brother}(i) = \begin{cases} i & \text{if right-sibling}(i) = 0, \\ \text{right-sibling}(i) & \text{otherwise.} \end{cases}$$

Definition 2.4. On the set of nodes of a forest define a function 'right-most-son' as follows:

$$\text{right-most-son}(i) = \begin{cases} i & \text{if } i \text{ is a leaf node}, \\ j & \text{otherwise,} \end{cases}$$

where node j is reached by repeatedly applying the 'right-brother' relation to each node reached in turn, starting initially at left-most-son(i).

Definition 2.5. On the set of nodes of a forest define a function 'right-most-descendent' as follows:

right-most-descendent(i) = j where j is a leaf-node reached by applying the function right-most-son to each node reached in turn, starting initially at node i.
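Definitions 2.2 to 2.5 translate almost directly into code. The following is a sequential sketch only, written against a hypothetical 8-node forest (not the forest of Fig. 1); the Python function names mirror the paper's function names but are otherwise assumptions of this illustration.

```python
# Hypothetical 8-node forest (NOT the paper's Fig. 1); 0 means "no such node".
# Tree 1: 1 -> (2, 3), 2 -> (4, 5), 5 -> (8).   Tree 2: 6 -> (7).
N = 8
LEFT_MOST_CHILD = [0, 2, 4, 0, 0, 8, 7, 0, 0]   # index 0 is unused
RIGHT_SIBLING = [0, 6, 3, 0, 5, 0, 0, 0, 0]     # roots 1 and 6 are chained


def left_most_son(i):
    """Definition 2.2: a leaf maps to itself."""
    return i if LEFT_MOST_CHILD[i] == 0 else LEFT_MOST_CHILD[i]


def right_brother(i):
    """Definition 2.3: a node without a right sibling maps to itself."""
    return i if RIGHT_SIBLING[i] == 0 else RIGHT_SIBLING[i]


def right_most_son(i):
    """Definition 2.4: walk 'right-brother' from the left-most son until it stops moving."""
    if LEFT_MOST_CHILD[i] == 0:
        return i
    j = left_most_son(i)
    while right_brother(j) != j:
        j = right_brother(j)
    return j


def right_most_descendent(i):
    """Definition 2.5: walk 'right-most-son' from i until a leaf is reached."""
    j = i
    while right_most_son(j) != j:
        j = right_most_son(j)
    return j


nodes = range(1, N + 1)
RMS = {i: right_most_son(i) for i in nodes}
RMD = {i: right_most_descendent(i) for i in nodes}
print(RMS)   # {1: 3, 2: 5, 3: 3, 4: 4, 5: 8, 6: 7, 7: 7, 8: 8}
print(RMD)   # {1: 3, 2: 8, 3: 3, 4: 4, 5: 8, 6: 7, 7: 7, 8: 8}
```

Because leaves and nodes without a right sibling map to themselves, every chain ends in a fixed point, which is what later makes repeated pointer doubling applicable.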

Definition 2.6. A 'terminal-node' is a node that is visited last in the pre-order traversal.

Obviously it will be a unique leaf node. We shall denote it by 't'. All nodes other than the terminal node shall be referred to as 'non-terminal' nodes. For our example forest t is node 13.

Definition 2.7. Let I denote the index set over all leaf nodes. Define for each node i ∈ I a set S(i) as follows:

$$S(i) = \{\, j : \text{right-most-descendent}(j) = i \,\}.$$

For our example forest,

I = {2, 4, 6, 8, 11, 12, 13, 14} and

S(2) = {2}, S(4) = {4, 7}, S(6) = {5, 6, 10}, S(8) = {8},

S(11) = {1, 11}, S(12) = {12}, S(13) = {9, 13, 15} and S(14) = {3, 14}.


Observation 2.2. The cardinality of the set S(i) for each i ∈ I is always greater than or equal to one.

Observation 2.3. The index set I is also the set of right-most-descendents of all nodes of a forest.

Theorem 2.1. Let I' = I - {t}, where I is the index set over all leaf nodes and t is the terminal node. Then, for each i ∈ I',

1) there always exists a unique node n ∈ S(i) such that right-sibling(n) ≠ 0, and

2) this node, henceforth denoted by n(i), will occur at the highest ancestral level among all nodes of S(i).

Proof. By Observation 2.2, I' is the set of all right-most-descendents in a forest, excluding the terminal node. By Definition 2.5, the right-most-descendent of a node is obtained by chaining the right-most-son of all intermediate nodes reached in turn. Clearly, only the highest ancestor in such a chain formed out of the nodes of S(i) can possess a right sibling, since every other node in the chain is the right-most-son of its parent. For the terminal node, the root of the last tree will occur at the highest ancestral level among all nodes of S(t), and hence will not possess a right sibling.

Now we present the earlier indicated result in the form of a theorem.

Theorem 2.2. Let 'R' denote the set formed as follows:

$$R = \{\, i : \text{right-sibling}(i) \neq 0 \,\},$$

and let I and I' denote the sets as defined in Theorem 2.1. Then there exists a one-to-one correspondence between the sets R and I'.

Proof. Note that the group of sets S(i), where i ∈ I (Definition 2.7), forms a partition of the set of nodes of the complete forest, because the chains formed by linking the member nodes of each S(i) are distinct and non-intersecting. This is shown in Fig. 3. This partitioning, along with the result of the previous theorem, clearly implies a one-to-one correspondence between the sets R and I'.

For our example forest,

I' = I - {t} = {2, 4, 6, 8, 11, 12, 14}, and R = {1, 2, 3, 7, 8, 10, 12}.

Fig. 3. Partition induced by group of sets S(i).

Fig. 4. One-to-one correspondence between I' and R.

Figure 4 shows the one-to-one correspondence between these sets.
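Theorems 2.1 and 2.2 can be checked mechanically on a small case. The sketch below uses a hypothetical 8-node forest (not the forest of Fig. 1) whose right-most-descendent values were worked out by hand from Definition 2.5; the names `n_of`, `I_prime` and `last_root` are assumptions of this illustration. It is a check on one instance, not a proof.

```python
# Hypothetical 8-node forest (NOT the paper's Fig. 1); 0 means "no such node".
# Tree 1: 1 -> (2, 3), 2 -> (4, 5), 5 -> (8).   Tree 2: 6 -> (7).
N = 8
LEFT_MOST_CHILD = [0, 2, 4, 0, 0, 8, 7, 0, 0]   # index 0 is unused
RIGHT_SIBLING = [0, 6, 3, 0, 5, 0, 0, 0, 0]     # roots 1 and 6 are chained
RMD = [0, 3, 8, 3, 4, 8, 7, 7, 8]               # right-most-descendent, worked out by hand
FIRST_ROOT = 1

nodes = range(1, N + 1)
leaves = [i for i in nodes if LEFT_MOST_CHILD[i] == 0]        # index set I
S = {i: [j for j in nodes if RMD[j] == i] for i in leaves}   # Definition 2.7
R = [i for i in nodes if RIGHT_SIBLING[i] != 0]               # nodes with a right sibling

# The terminal node t is the right-most-descendent of the root of the last tree.
last_root = FIRST_ROOT
while RIGHT_SIBLING[last_root] != 0:
    last_root = RIGHT_SIBLING[last_root]
t = RMD[last_root]
I_prime = [i for i in leaves if i != t]

# n(i): the member of S(i) that possesses a right sibling (Theorem 2.1).
n_of = {}
for i in I_prime:
    candidates = [j for j in S[i] if RIGHT_SIBLING[j] != 0]
    assert len(candidates) == 1          # existence and uniqueness on this instance
    n_of[i] = candidates[0]

assert sorted(n_of.values()) == R        # the map i -> n(i) is onto R (Theorem 2.2)
print(S)      # {3: [1, 3], 4: [4], 7: [6, 7], 8: [2, 5, 8]}
print(n_of)   # {3: 1, 4: 4, 8: 2}
```

For this forest the sets S(i) partition the eight nodes, and the three leaves other than the terminal node 7 pair up with the three nodes 1, 4 and 2 that have a right sibling.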

Observation 2.4. In order to cover all nodes of a forest, the nodes can be partitioned into two mutually disjoint sets as follows:

1) Set of all non-leaf nodes,

2) Set of all leaf nodes.

The set of all leaf nodes can be further partitioned into two mutually disjoint sets as follows:

2.1) Set of all leaf nodes except the terminal node,

2.2) Set containing just the terminal node.

This partitioning is exploited in our alternate characterization.

Alternate characterization

For the pre-order traversal the function 'follow' can be obtained as follows:

case node j of
    non-leaf node: follow(j) = left-most-son(j)
    leaf node:     follow(j) = if node n exists as defined in Theorem 2.1
                               then right-brother(n)
                               else j    /* It is the terminal node */
end case.

Note that this characterization covers all nodes and hence the pre-order traversal of a forest. Figure 5 shows the traversal sequence with two types of arrows corresponding to the above two cases.
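The case analysis can be written as a per-node function that never looks at the traversal order itself. The sketch below is sequential and uses a hypothetical 8-node forest (not the forest of Fig. 1) with hand-computed right-most-descendent values; the name `follow_by_characterization` and the linear search for n(j) are assumptions made for readability, not part of the paper's algorithm.

```python
# Hypothetical 8-node forest (NOT the paper's Fig. 1); 0 means "no such node".
# Tree 1: 1 -> (2, 3), 2 -> (4, 5), 5 -> (8).   Tree 2: 6 -> (7).
N = 8
LEFT_MOST_CHILD = [0, 2, 4, 0, 0, 8, 7, 0, 0]   # index 0 is unused
RIGHT_SIBLING = [0, 6, 3, 0, 5, 0, 0, 0, 0]     # roots 1 and 6 are chained
RMD = [0, 3, 8, 3, 4, 8, 7, 7, 8]               # right-most-descendent, worked out by hand
FIRST_ROOT = 1


def follow_by_characterization(j):
    """Return follow(j) using only the case analysis of the alternate characterization."""
    if LEFT_MOST_CHILD[j] != 0:
        return LEFT_MOST_CHILD[j]            # non-leaf: go to the left-most son
    for n in range(1, N + 1):                # leaf: look for n(j) inside S(j)
        if RMD[n] == j and RIGHT_SIBLING[n] != 0:
            return RIGHT_SIBLING[n]          # right-brother(n(j))
    return j                                 # no such n: j is the terminal node


follow = [0] + [follow_by_characterization(j) for j in range(1, N + 1)]

# Walk the threaded list from the first root to recover the visiting order.
order = [FIRST_ROOT]
while follow[order[-1]] != order[-1]:
    order.append(follow[order[-1]])

print(follow[1:])   # [2, 4, 6, 5, 8, 7, 7, 3]
print(order)        # [1, 2, 4, 5, 8, 3, 6, 7]
```

The resulting order matches the recursive pre-order traversal of the same forest (1, 2, 4, 5, 8, 3, 6, 7). The linear search is only for illustration; the parallel algorithm avoids it by letting each node with a right sibling write into the entry of its own right-most-descendent.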

In the next section we give a parallel algorithm assuming that the functions left-most-son and right-sibling are available; their derivation from the original representation of a forest will be dealt with in Section 4.

Fig. 5. Pre-order traversal sequence.

3. Parallel algorithm for pre-order traversal

The alternate characterization for the pre-order traversal presented in Section 2 facilitates the computation of the function 'follow' in parallel in an N-element vector as described below:

/* Initialize array FOLLOW to take care of the terminal node */
PRE.1: for each node i pardo
           FOLLOW[i] := i

/* Compute the function 'follow' for the remaining two cases */
PRE.2: for each non-leaf node i pardo
           FOLLOW[i] := LEFT-MOST-SON[i]

PRE.3: for each node i with a RIGHT-BROTHER pardo
           FOLLOW[RIGHT-MOST-DESCENDENT[i]] := RIGHT-BROTHER[i]

There are no read-conflicts and it takes O(1) time to compute the entire function 'follow'. The 'right-most-descendent' function can be obtained from the 'right-most-son' function which in turn can be obtained from the 'right-brother' function. We show the derivation of the 'right-most-descendent' function in an N-element vector RMD from the 'right-most-son' function, the latter being assumed to be available in another N-element vector RMS. It takes O(log N) steps to compute the former as described below:

RMD.1: /* Initialization */
       for each node i pardo
           RMD[i] := RMS[i]

RMD.2: /* Find the 'right-most-descendent' function in log N steps */
       for log N iterations do
           for each node i pardo
               RMD[i] := RMD[RMD[i]]

We omit the details of obtaining the 'right-most-son' function as steps required to obtain it will be no different from the RMD.1 and RMD.2 steps.
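Steps RMD.1 and RMD.2 are an instance of pointer doubling, which the following sketch simulates sequentially. Each parallel round is modelled by building a new list from the old one, so that all reads happen before any write. The forest is a hypothetical 8-node example (not the forest of Fig. 1). The derivation of RMS shown here, via a 'right-most-brother' array obtained by the same doubling, is one plausible reading of the omitted detail and is an assumption of this sketch, as are the names `pointer_jump` and `RIGHT_MOST_BROTHER`.

```python
import math

# Hypothetical 8-node forest (NOT the paper's Fig. 1); 0 means "no such node".
# Tree 1: 1 -> (2, 3), 2 -> (4, 5), 5 -> (8).   Tree 2: 6 -> (7).
N = 8
LEFT_MOST_CHILD = [0, 2, 4, 0, 0, 8, 7, 0, 0]   # index 0 is unused
RIGHT_SIBLING = [0, 6, 3, 0, 5, 0, 0, 0, 0]     # roots 1 and 6 are chained


def pointer_jump(ptr):
    """Apply ptr[i] := ptr[ptr[i]] to every i for ceil(log2 N) synchronous rounds."""
    for _ in range(max(1, math.ceil(math.log2(N)))):
        ptr = [ptr[ptr[i]] for i in range(N + 1)]   # new list: reads precede writes
    return ptr


nodes = range(1, N + 1)
# Slot 0 points to itself so that the unused entry stays harmless.
RIGHT_BROTHER = [0] + [RIGHT_SIBLING[i] if RIGHT_SIBLING[i] != 0 else i for i in nodes]

# Assumed derivation of RMS: double over 'right-brother', then look up the
# right-most brother of each non-leaf node's left-most child.
RIGHT_MOST_BROTHER = pointer_jump(RIGHT_BROTHER)
RMS = [0] + [RIGHT_MOST_BROTHER[LEFT_MOST_CHILD[i]] if LEFT_MOST_CHILD[i] != 0 else i
             for i in nodes]

# RMD.1 and RMD.2: start from RMS and double until every node points to a leaf.
RMD = pointer_jump(RMS)

print(RMS[1:])   # [3, 5, 3, 4, 8, 7, 7, 8]
print(RMD[1:])   # [3, 8, 3, 4, 8, 7, 7, 8]
```

After k rounds each entry has advanced by up to 2^k links along its chain, so ceil(log2 N) rounds are enough for chains of at most N - 1 links; entries that have already reached a leaf stay fixed because a leaf points to itself.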

Finally, the ranking of the linear list FOLLOW is obtained by finding the number of nodes preceding each node. This is obtained in an N-element vector RANK in O(log N) steps as follows:

RANK.1: /* Initialization */
        for each node i pardo
            RANK[i] := if FOLLOW[i] = i /* for the terminal node */ then 0 else 1

RANK.2: /* Find the number of nodes following each node */
        for log N iterations do
            for each node i pardo
            begin
                RANK[i] := RANK[i] + RANK[FOLLOW[i]]
                FOLLOW[i] := FOLLOW[FOLLOW[i]]
            end

RANK.3: /* Achieve the pre-order node labelling */
        for each node i pardo
            RANK[i] := N - 1 - RANK[i]

The overall time complexity of this algorithm is O(log N). Note that there are read-conflicts in steps RMD.2 and RANK.2. These take the form of the last node(s) being referred to simultaneously by all predecessors (for the label of the last node). One way to avoid such read-conflicts is not to make reference to the last node [5]. There exists yet another type of read-conflict in step RANK.2, where the rank of the terminal node may be referred to simultaneously by all other nodes. Since the information sought is a constant (zero) used only in an addition operation, it need not be referred to by a node pointing to the terminal node. Thus this type of read-conflict can also be avoided. Therefore the time complexity of the algorithm remains unchanged without read-conflicts.
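Steps PRE.1 to PRE.3 and RANK.1 to RANK.3 can be put together in a sequential simulation. The sketch below uses a hypothetical 8-node forest (not the forest of Fig. 1) and assumes the right-most-descendent array is already available; each parallel round is modelled by building new lists from the old ones. The names `rank`, `nxt` and `label` are assumptions of this illustration, and `nxt` is a working copy so that FOLLOW itself is preserved.

```python
import math

# Hypothetical 8-node forest (NOT the paper's Fig. 1); 0 means "no such node".
# Tree 1: 1 -> (2, 3), 2 -> (4, 5), 5 -> (8).   Tree 2: 6 -> (7).
N = 8
LEFT_MOST_CHILD = [0, 2, 4, 0, 0, 8, 7, 0, 0]   # index 0 is unused
RIGHT_SIBLING = [0, 6, 3, 0, 5, 0, 0, 0, 0]     # roots 1 and 6 are chained
RMD = [0, 3, 8, 3, 4, 8, 7, 7, 8]               # assumed output of steps RMD.1-RMD.2

nodes = range(1, N + 1)

# PRE.1: every node follows itself (this stays true only for the terminal node).
FOLLOW = list(range(N + 1))
# PRE.2: a non-leaf node is followed by its left-most child.
for i in nodes:
    if LEFT_MOST_CHILD[i] != 0:
        FOLLOW[i] = LEFT_MOST_CHILD[i]
# PRE.3: each node with a right sibling writes into its right-most-descendent's slot.
# By Theorem 2.2 these writes go to distinct leaves, so they do not clash.
for i in nodes:
    if RIGHT_SIBLING[i] != 0:
        FOLLOW[RMD[i]] = RIGHT_SIBLING[i]

# RANK.1: one for every node except the terminal node.
rank = [0 if FOLLOW[i] == i else 1 for i in range(N + 1)]
nxt = list(FOLLOW)
# RANK.2: synchronous pointer doubling; new lists model simultaneous updates.
for _ in range(max(1, math.ceil(math.log2(N)))):
    rank = [rank[i] + rank[nxt[i]] for i in range(N + 1)]
    nxt = [nxt[nxt[i]] for i in range(N + 1)]
# RANK.3: turn "number of nodes following" into a 0-based pre-order label.
label = [0] + [N - 1 - rank[i] for i in nodes]

print(FOLLOW[1:])                       # [2, 4, 6, 5, 8, 7, 7, 3]
print(label[1:])                        # [0, 1, 5, 2, 3, 6, 7, 4]
print(sorted(nodes, key=label.__getitem__))   # [1, 2, 4, 5, 8, 3, 6, 7]
```

Sorting the nodes by label reproduces the pre-order sequence of the forest. In this simulation the repeated reads of the terminal node's entry are harmless; on a model that forbids concurrent reads they would have to be avoided as discussed above.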

4. Representational issues

Forests (trees) can be represented in various ways. More commonly, these are represented in the form of arrays using [6]:

(i) 'parent-of' relation with imposition of an ordered labelling of the children of each node,

(ii) 'left-most-child' and 'right-sibling' relations, and

(iii) 'left-child' and 'right-child' relations of their equivalent binary tree forms.

Note that the relations 'left-child' and 'right-child' for the binary tree equivalent representation of a forest are equivalent to the relations 'left-most-child' and 'right-sibling' respectively. It is straightforward to get the functions 'left-most-son' and 'right-brother' as shown below:

for each node i pardo
    LEFT-MOST-SON[i] := if LEFT-MOST-CHILD[i] = 0 then i
                        else LEFT-MOST-CHILD[i]

for each node i pardo
    RIGHT-BROTHER[i] := if RIGHT-SIBLING[i] = 0 then i
                        else RIGHT-SIBLING[i]

When the forest is represented by the 'parent-of' relation with imposition of an ordered labelling of the children of each node, it becomes difficult to obtain the 'left-most-son' and 'right-brother' functions in O(log N) time using just N processors. By increasing the number of processors to N * N, the same time complexity can be obtained easily. We omit the details for the 'parent-of' relation, with the remark that it is not a good representation for a forest in this setting until an O(log N) time algorithm using only N processors is found.

5. Conclusion

A one-to-one correspondence has been established between the set of nodes possessing a right sibling and the set of leaf nodes (excluding the terminal node) of any forest. This result has been utilised to provide an alternate characterization of the pre-order traversal of a forest. This characterization results in a simple and elegant parallel algorithm of time complexity O(log N), with or without read-conflicts, on an N-processor SIMD shared-memory model, where N is the total number of nodes in a forest. The same time complexity has been shown in [4]. This result holds for the two equivalent and more commonly used forms of forest representation, i.e. the 'left-most-child' and 'right-sibling' relations, and the 'left-child' and 'right-child' relations of the equivalent binary tree form. The problem of obtaining the same time complexity has been left open for the case when a forest is initially represented by the 'parent-of' relation with imposition of an ordered labelling of the children of each node [6]. Similar characterizations for the other tree traversals can be worked out quite easily.

Acknowledgement

The authors wish to thank the referees for their constructive comments which helped in making this paper more presentable and readable.

References

[1] D.E. Knuth, The Art of Computer Programming, Vol. 1, Fundamental Algorithms (Addison-Wesley, Reading, Mass., 1973).

[2] E. Horowitz and S. Sahni, Fundamentals of Data Structures (Computer Science Press, Inc., 1977).

[3] T.A. Standish, Data Structure Techniques (Addison-Wesley, Reading, Mass., 1980).

[4] J. Wyllie, The Complexity of Parallel Computations, Ph.D. Thesis, Cornell University, 1979.

[5] D. Nath and S.N. Maheshwari, Parallel algorithms for the connected components and minimal spanning tree problems, Inform. Process. Lett. 14(1) (1982) 7-11.

[6] A.V. Aho, J.E. Hopcroft and J.D. Ullman, Data Structures and Algorithms (Addison-Wesley, Reading, Mass., 1983).