
Parallel Computing 19 (1993) 993-1001, North-Holland

PARCO 786

A high performance interconnection network for multiprocessor systems

Hong Shen *

School of Computing and Information Technology, Griffith University, Nathan, QLD 4111, Australia

Received 14 September 1992

Abstract

This paper presents a high-performance interconnection network, the $\mathcal{P}$-network, constructed by the recursive expansion (RE) method on the basis of the Petersen graph. The network contains $10^{r+1}$ nodes and has degree 6, diameter $4r + 2$ and cost $24r + 12$ (the product of the degree and the diameter) for $0 \le r \le 10$. The cost of the network is considerably lower than that of the torus and the hypercube and is comparable with that of the CCC of the same size. The diameter is lower than that of the torus and the CCC and is comparable with that of the hypercube. In addition to low cost, the $\mathcal{P}$-network possesses other properties such as high scalability, a regular topology and efficient message routing.

Keywords. Interconnection network; size; degree; diameter; cost; performance; recursive expansion

1. Introduction

A key part of a multiprocessor system is its interconnection network, which determines the hardware configuration of the system and has a great impact on system performance [1]. The design of high-performance interconnection networks is thus a central task in multiprocessor system development.

Various measures of the performance of interconnection networks have been suggested in the literature [1]. An interconnection network has two key parameters: the network degree, which is the maximum node degree over all nodes in the network, and the network diameter, which is the maximum distance over all pairs of nodes in the network. The most important factor in measuring the performance of a static interconnection network [3] is the product of the network degree and the network diameter [9], which is called the cost of the network [6]. Other main factors include network scalability, simplicity of message routing, traffic uniformity and network reliability [1]. A network is said to have high performance if it has a low cost and meets the requirements with respect to the other factors taken into consideration.

The design of high-performance interconnection networks has been studied extensively in the literature [3,8]. A variety of elegant networks, such as the hypercube, the torus and cube-connected cycles (CCC) [1,3], have been developed and used in various current multiprocessor systems. Recently, a systematic design method called recursive expansion (RE for short) has been proposed for constructing high-performance interconnection networks of various topologies [6].

* Previously with Department of Computer Science, Åbo Akademi University, Finland.

0167-8191/93/$06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved


This paper describes how to construct a high-performance interconnection network of size $10^{r+1}$ (the size being the number of nodes in the network), $0 \le r \le 10$, by applying the RE method on the basis of the Petersen graph [2]. The proposed network, the $\mathcal{P}$-network, has a cost lower than that of the torus and the hypercube and comparable with that of the CCC of the same size. It also has other features such as a regular topology, high scalability and simple message routing [7].

2. The recursive expansion method

An interconnection network can be represented by a graph G(V, E) consisting of a set of nodes (V) and a set of edges (E) connecting the nodes. Without loss of generality, we assume the graph to be undirected.

Definition 1. Two networks $G(V, E)$ and $G'(V', E')$ are disjoint if $V \cap V' = \emptyset$ and $E \cap E' = \emptyset$. A collection of $m$ networks ($m \ge 2$) is disjoint if any two of them are disjoint. The disjoint union [4] of $m$ copies of a network $G(V, E)$, denoted by $m$ $G$-copies or $mG$, is a network consisting of $m$ disjoint networks, each a copy of (i.e. isomorphic to) $G$.
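As an illustration of Definition 1, the following minimal Python sketch builds the disjoint union $mG$ by tagging every node of copy $j$ with $j$. The tuple node labels and the function name `disjoint_union` are assumptions made for illustration, not notation from the paper.

```python
def disjoint_union(nodes, edges, m):
    """Return the disjoint union mG of m copies of G(V, E).

    Node v of copy j is relabelled (j, v), so no two copies share a node
    or an edge, and each copy is isomorphic to G.
    """
    union_nodes = [(j, v) for j in range(m) for v in nodes]
    union_edges = [((j, a), (j, b)) for j in range(m) for a, b in edges]
    return union_nodes, union_edges


# Example: three disjoint copies of a triangle.
triangle_nodes = [0, 1, 2]
triangle_edges = [(0, 1), (1, 2), (0, 2)]
nodes_3g, edges_3g = disjoint_union(triangle_nodes, triangle_edges, 3)
print(len(nodes_3g), len(edges_3g))  # 9 9
```

Relabelling by copy index is simply one convenient way to guarantee $V \cap V' = \emptyset$ between copies.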

Definition 2. A unit, denoted by $G_u$, is a (small) network. We say that an edge connects two networks if it connects a node in one network to a node in the other. Let $2G_u \subset G$. Any edge connecting the two copies of $G_u$ is referred to as an inter-unit connection for the $2G_u$. Similarly, assuming $2G' \subset G$ and $mG_u \subset G'$, we call a set of inter-unit connections that one-to-one connects the unit-copies in one $G'$ to the corresponding unit-copies in the other $G'$ an inter-unit connection set for all unit-copies between the $2G'$ in $G$.

Figure 1 shows an example of an inter-unit connection (a) and an inter-unit connection set (b).

Definition 3. Two nodes in a network are said to be adjacent if they are connected by an edge. Two adjacent networks are created if two adjacent nodes in another network are replaced by these two networks so that the edge incident to the two nodes is replaced with a set of edges connecting the two networks.

Fig. 1. An example of connections on unit-copies: (a) an inter-unit connection; (b) an inter-unit connection set.


Notation 1. We write (n, d, k)-G for a network, G, of size n, degree d and diameter k.

Notation 2. We write $\{(n_f, d_f, k_f)\text{-}G_f,\ (n_u, d_u, k_u)\text{-}G_u\} \xrightarrow{\mathrm{RE}} (n_r, d_r, k_r)\text{-}G_r$ for the $r$-phase RE from $(n_f, d_f, k_f)$-$G_f$ and $(n_u, d_u, k_u)$-$G_u$ to $(n_r, d_r, k_r)$-$G_r$, where the $i$th-phase RE produces $G_i$, $1 \le i \le r$.

We also use the notation $|P|$ for the size of a set $P$, i.e. the number of elements in $P$.

Definition 4. For $\{(n_f, d_f, k_f)\text{-}G_f,\ (n_u, d_u, k_u)\text{-}G_u\} \xrightarrow{\mathrm{RE}} (n_r, d_r, k_r)\text{-}G_r$, the pivot-set sequence, $PSS_{(s)}$ for short, is a sequence of $s$ disjoint node sets in $G_u$, $P_1, P_2, \dots, P_s$, which meets the following conditions:
(1) For any two nodes of $P_i$, there is a path in $G_u$ on which all nodes are in $P_i$, $1 \le i \le s$.
(2) $|P_i| = |P_j|$, $1 \le i, j \le s$.
(3) $1 \le |P_i| \le \min\{d_f, n_u/s\}$, $1 \le i \le s$.

New edges used for the inter-unit connections during the $i$th-phase RE are assigned to the nodes in $P_{1+(i-1) \bmod n_u}$ as evenly as possible, $1 \le i \le r$.
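Definition 4 and the assignment rule above can be checked mechanically. The Python sketch below is one possible reading, assuming the unit $G_u$ is given as an adjacency dictionary and each pivot set as a Python set; the names `is_pivot_set_sequence`, `induces_connected` and `pivot_index` are hypothetical. Following the text, the phase-to-pivot rule uses $\bmod\ n_u$.

```python
from collections import deque


def induces_connected(adj, part):
    """Condition (1): any two nodes of `part` are joined by a path inside `part`."""
    start = next(iter(part))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w in part and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == part


def is_pivot_set_sequence(adj, pivot_sets, d_f):
    """Check Definition 4 for pivot sets P_1..P_s of a unit G_u given as adjacency lists."""
    n_u, s = len(adj), len(pivot_sets)
    if s == 0 or any(not p or not p.issubset(adj) for p in pivot_sets):
        return False
    if sum(len(p) for p in pivot_sets) != len(set().union(*pivot_sets)):
        return False  # the sets must be pairwise disjoint
    size = len(pivot_sets[0])
    if any(len(p) != size for p in pivot_sets):
        return False  # condition (2): |P_i| = |P_j|
    if not (1 <= size <= d_f and size * s <= n_u):
        return False  # condition (3): 1 <= |P_i| <= min{d_f, n_u / s}
    return all(induces_connected(adj, p) for p in pivot_sets)  # condition (1)


def pivot_index(i, n_u):
    """1-based index of the pivot set that receives new edges in phase i."""
    return 1 + (i - 1) % n_u


# Example on a 6-cycle unit.
cycle6 = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
print(is_pivot_set_sequence(cycle6, [{0, 1}, {2, 3}, {4, 5}], d_f=2))  # True
print(is_pivot_set_sequence(cycle6, [{0, 2}, {1, 3}, {4, 5}], d_f=2))  # False
print([pivot_index(i, 10) for i in (1, 10, 11)])  # [1, 10, 1]
```

The second call fails because nodes 0 and 2 are not joined by a path inside $\{0, 2\}$, violating condition (1).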

We write $PSS_{(s^d)}$ and $PSS_{(s^k)}$ for the $PSS_{(s)}$ used for minimizing the degree and the diameter of $G_r$, respectively. The former is realized by making the size of each pivot set as close as possible to $d_f$, so that the number of new inter-unit edges assigned to each node in a pivot set is as small as possible (new edges are assigned to the nodes of each pivot set as evenly as possible, by Definition 4). The latter is realized by setting the size of each pivot set to 1, the minimum size, so that all new edges in each phase of RE are assigned to the single node of each pivot set. Inter-unit communication is then realized directly between the nodes of the pivot sets used in that phase, which eliminates any message passing between nodes within a pivot set.

For $\{(n_f, d_f, k_f)\text{-}G_f,\ (n_u, d_u, k_u)\text{-}G_u\} \xrightarrow{\mathrm{RE}} (n_r, d_r, k_r)\text{-}G_r$, we form the pivot-set sequence $P_1, P_2, \dots, P_s$ in $G_u$ by Definition 4. The inter-unit connections between each pair of adjacent $G_{i-1}$-copies in the $i$th-phase RE are realized by one-to-one connecting the nodes in the different $P_{1+(i-1) \bmod n_u}$ of all unit-copies in one $G_{i-1}$-copy to the corresponding ones in the other $G_{i-1}$-copy, where $1 \le i \le r$ and $G_0 = G_u$, as illustrated in Fig. 2.

The RE method can be sketched by the following algorithm:

Algorithm RE($n_r$, $\mathcal{R}$)

{* Construct a network of size $n_r$ with property requirements $\mathcal{R}$ *}
(1) Choose an $(n_f, d_f, k_f)$-$G_f$ and an $(n_u, d_u, k_u)$-$G_u$ as the frame and the unit, respectively, according to $\mathcal{R}$, where $n_u n_f^{r-1} < n_r \le n_u n_f^{r}$.
(2) Form the pivot-set sequence $P_1, P_2, \dots, P_s$ in $G_u$.
(3) For $i := 1$ to $r$ do
    (a) Copy $G_f$ into $G_i$;
    (b) One-to-one replace all nodes in $G_i$ with disjoint $G_{i-1}$-copies, and all edges in $G_i$ with inter-unit connection sets (each for a pair of $G_{i-1}$-copies, realized by one-to-one connecting the nodes in all different $P_{1+(i-1) \bmod n_u}$ in one $G_{i-1}$ to the nodes in the corresponding $P_{1+(i-1) \bmod n_u}$ in the other $G_{i-1}$).

Fig. 2. Inter-unit connections in the $i$th phase RE.
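To make the steps concrete, here is a minimal Python sketch of Algorithm RE for the special case in which every pivot set holds a single node (the $PSS_{(s^k)}$ choice). The tuple node labels, the function `re_expand` and its parameters are illustrative assumptions, and frame nodes are assumed to be numbered $0, \dots, n_f - 1$. The general case with larger pivot sets, where edges are spread as evenly as possible, is not modelled.

```python
def re_expand(n_u, unit_edges, n_f, frame_edges, r, pivot_node):
    """Run r phases of Algorithm RE with single-node pivot sets (|P_i| = 1).

    Nodes of G_i are tuples (c_i, ..., c_1, x): c_j is the frame node whose copy
    was chosen in phase j and x (0..n_u-1) is the node inside the unit-copy.
    pivot_node(i) returns the only node of the pivot set used in phase i.
    """
    nodes = [(x,) for x in range(n_u)]                 # G_0 = G_u
    edges = [((a,), (b,)) for a, b in unit_edges]
    for i in range(1, r + 1):
        p = pivot_node(i)
        # Steps (a)+(b): replace every frame node c by a disjoint G_{i-1}-copy tagged c.
        new_nodes = [(c,) + v for c in range(n_f) for v in nodes]
        new_edges = [((c,) + a, (c,) + b) for c in range(n_f) for a, b in edges]
        # Replace every frame edge by an inter-unit connection set: the pivot of
        # each unit-copy in one G_{i-1}-copy is joined to the pivot of the
        # corresponding unit-copy in the other G_{i-1}-copy.
        for c1, c2 in frame_edges:
            new_edges += [((c1,) + v, (c2,) + v) for v in nodes if v[-1] == p]
        nodes, edges = new_nodes, new_edges
    return nodes, edges


# Example: unit = frame = triangle K3, two phases, pivot of phase i = node (i-1) mod 3.
k3 = [(0, 1), (1, 2), (0, 2)]
g2_nodes, g2_edges = re_expand(3, k3, 3, k3, 2, lambda i: (i - 1) % 3)
print(len(g2_nodes), len(g2_edges))  # 27 45
```

In this toy run the maximum degree is $d_u + d_f = 4$, because phases 1 and 2 place their new edges on different unit nodes.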

The following theorem shows the relations of network size, degree and diameter between the initial networks, $G_f$ and $G_u$, and the target network, $G_r$, constructed by the RE method:

Theorem 1 [6]. For $\{(n_f, d_f, k_f)\text{-}G_f,\ (n_u, d_u, k_u)\text{-}G_u\} \xrightarrow{\mathrm{RE}} (n_r, d_r, k_r)\text{-}G_r$, where $\lambda n_u \le r < (\lambda + 1) n_u$ and $\lambda$ is an integer, the size, degree and diameter of $G_r$ satisfy the following formulae:

(1) $$n_r = n_u n_f^{r}. \tag{1}$$

(2) If $PSS_{(s^k)}$ is taken,

(a) $$d_r \le d_u + (\lambda + 1) d_f; \tag{2}$$

(b) $$k_r = k_u (r + 1) + k_f r. \tag{3}$$

(3) If $PSS_{(s^d)}$ is taken,

(a) $$d_r \le d_u + \lambda d_f + \frac{r - \lambda n_u}{n_u}; \tag{4}$$

(b) $$k_r = k_u (r + 1) + k_f \bigl(d_f r - (d_f - 1)\lambda n_u\bigr) - (d_f - 1)(r - \lambda n_u). \tag{5}$$
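Equations (1) to (3) are simple enough to evaluate directly. The Python helper below, whose name is our own, computes them for the $PSS_{(s^k)}$ case, taking $\lambda = \lfloor r / n_u \rfloor$ as implied by the condition $\lambda n_u \le r < (\lambda + 1) n_u$. Equations (4) and (5) are not covered by this sketch.

```python
def re_parameters_min_diameter(n_f, d_f, k_f, n_u, d_u, k_u, r):
    """Size, degree bound and diameter of G_r from Theorem 1, equations (1)-(3).

    lambda is the integer with lambda * n_u <= r < (lambda + 1) * n_u.
    """
    lam = r // n_u
    n_r = n_u * n_f ** r                 # equation (1)
    d_r_max = d_u + (lam + 1) * d_f      # equation (2): an upper bound on d_r
    k_r = k_u * (r + 1) + k_f * r        # equation (3)
    return n_r, d_r_max, k_r


# Petersen graph as both frame and unit: (n, d, k) = (10, 3, 2).
for r in (1, 3):
    print(r, re_parameters_min_diameter(10, 3, 2, 10, 3, 2, r))
# 1 (100, 6, 6)
# 3 (10000, 6, 14)
```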

3. Construction of the $\mathcal{P}$-network

The $\mathcal{P}$-network is constructed by the RE method on the basis of the Petersen graph, whose cost (degree × diameter) is probably optimal (lowest) for connecting 10 nodes [4]. The Petersen graph, denoted by $\mathcal{P}$, has degree 3, diameter 2 and size 10, and is depicted in Fig. 3.
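The stated parameters of $\mathcal{P}$ are easy to confirm by computer. The sketch below builds the Petersen graph with one standard labelling (outer 5-cycle, spokes, inner pentagram); this labelling is an assumption and need not match the node indices of Fig. 4. It then measures the network degree and, by breadth-first search from every node, the diameter.

```python
from collections import deque


def petersen_edges():
    """Edges of the Petersen graph on nodes 0..9 (a standard labelling)."""
    edges = []
    for i in range(5):
        edges.append((i, (i + 1) % 5))            # outer 5-cycle 0-1-2-3-4
        edges.append((i, i + 5))                  # spokes
        edges.append((5 + i, 5 + (i + 2) % 5))    # inner pentagram 5-7-9-6-8
    return edges


def degree_and_diameter(n, edges):
    """Return (network degree, network diameter) of a connected graph on nodes 0..n-1."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    diameter = 0
    for src in adj:                      # breadth-first search from every node
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        if len(dist) != n:
            raise ValueError("graph is disconnected")
        diameter = max(diameter, max(dist.values()))
    return max(len(nbrs) for nbrs in adj.values()), diameter


d, k = degree_and_diameter(10, petersen_edges())
print(d, k, d * k)  # 3 2 6
```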

Taking the Petersen graph as both the unit and the frame and applying the RE method to them, we can construct the $\mathcal{P}$-network, which preserves the low-cost property of the Petersen graph and expands the size to $10^{r+1}$, where $r \ge 0$.

Fig. 3. The Petersen graph $\mathcal{P}$.

Fig. 4. The pivot-set sequence $PSS_{(s^k)}$ in the unit $\mathcal{P}$: $P_i = \{\text{node } i-1\}$, $1 \le i \le 10$.

We use the pivot-set sequence $PSS_{(s^k)}$ in the unit $\mathcal{P}$ in order to minimize the diameter of the $\mathcal{P}$-network. Indexing the nodes in the unit from 0 to 9, we include one node in each pivot set and set $P_i = \{\text{node } i-1\}$, $1 \le i \le 10$, as shown in Fig. 4; pivot set $P_i$ accepts the new edges for the inter-unit connections during the $i$th-phase expansion.
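The pivot schedule and its effect on the degree can be tabulated in a few lines. In this Python sketch, `pivot_set`, `pivot_node_for_phase` and `max_degree` are hypothetical helper names; the degree count assumes, as with $|P_i| = 1$, that each phase adds $d_f = 3$ edges to the pivot node of every unit-copy (exactly 3 here, since $\mathcal{P}$ is 3-regular).

```python
N_U = 10  # nodes of the unit P are indexed 0..9


def pivot_set(i):
    """P_i = {node i-1} for 1 <= i <= 10 (Fig. 4)."""
    if not 1 <= i <= N_U:
        raise ValueError("pivot sets are P_1..P_10")
    return {i - 1}


def pivot_node_for_phase(i):
    """Single node receiving the new inter-unit edges in phase i: P_{1+(i-1) mod n_u}."""
    (node,) = pivot_set(1 + (i - 1) % N_U)
    return node


def max_degree(r, d_u=3, d_f=3):
    """Largest node degree after r phases: each phase adds d_f edges to its pivot node."""
    uses = [0] * N_U
    for i in range(1, r + 1):
        uses[pivot_node_for_phase(i)] += 1
    return d_u + d_f * max(uses)


print([pivot_node_for_phase(i) for i in range(1, 11)])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(max_degree(10), max_degree(11))  # 6 9
```

Because phases 1 to 10 use ten distinct pivot nodes, no node receives inter-unit edges twice while $r \le 10$; an eleventh phase would reuse node 0.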

The construction of the $\mathcal{P}_r$-network ($r \ge 0$) is realized by Algorithm RE and proceeds as follows:
• Let $\mathcal{P}_0 = \mathcal{P}$.
• Copy $\mathcal{P}$ into $\mathcal{P}_1$, replace each node in $\mathcal{P}_1$ with a $\mathcal{P}_0$-copy (all copies are disjoint) and replace each edge in $\mathcal{P}_1$ with an inter-unit connection between two adjacent $\mathcal{P}_0$-copies. This gives the first-phase expanded network $\mathcal{P}_1$, consisting of a set of $\mathcal{P}_0$-copies together with a set of inter-unit connections.
• Again copy $\mathcal{P}$ into $\mathcal{P}_2$, replace each node in $\mathcal{P}_2$ with a $\mathcal{P}_1$-copy (all copies are disjoint) and each edge with an inter-unit connection set between two adjacent $\mathcal{P}_1$-copies. This gives the second-phase expanded network $\mathcal{P}_2$, consisting of a set of $\mathcal{P}_1$-copies together with a set of inter-unit connections.
• Repeat the above expansion until $\mathcal{P}_r$ is obtained: copy $\mathcal{P}$ into $\mathcal{P}_r$, replace each node in $\mathcal{P}_r$ with a $\mathcal{P}_{r-1}$-copy constructed in the $(r-1)$th expansion (all copies are disjoint) and each edge with an inter-unit connection set between two adjacent $\mathcal{P}_{r-1}$-copies.

As an example, Fig. 5 depicts the $\mathcal{P}_1$-network.
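The construction above can be checked on small instances. The Python sketch below is an illustrative assumption rather than the paper's code: it uses a standard Petersen labelling (not necessarily that of Fig. 4), encodes nodes as digit tuples, uses node $(i-1) \bmod 10$ as the pivot of phase $i$, and measures degree and diameter by breadth-first search. For $r = 1$ it reports 100 nodes, degree 6 and diameter 6, in agreement with equations (7) and (8).

```python
from collections import deque


def petersen_edges():
    """Petersen graph on nodes 0..9: outer 5-cycle, spokes, inner pentagram."""
    edges = []
    for i in range(5):
        edges += [(i, (i + 1) % 5), (i, i + 5), (5 + i, 5 + (i + 2) % 5)]
    return edges


def build_p_network(r):
    """Construct P_r with P_0 = P; phase i uses the pivot node (i - 1) mod 10."""
    frame = petersen_edges()
    nodes = [(x,) for x in range(10)]              # P_0 = P
    edges = [((a,), (b,)) for a, b in frame]
    for i in range(1, r + 1):
        p = (i - 1) % 10
        # One disjoint P_{i-1}-copy per frame node c (prefix c to every label).
        new_edges = [((c,) + a, (c,) + b) for c in range(10) for a, b in edges]
        # Each frame edge becomes an inter-unit connection set between pivots.
        for c1, c2 in frame:
            new_edges += [((c1,) + v, (c2,) + v) for v in nodes if v[-1] == p]
        nodes = [(c,) + v for c in range(10) for v in nodes]
        edges = new_edges
    return nodes, edges


def degree_and_diameter(nodes, edges):
    """Network degree and diameter (BFS from every node) of a connected graph."""
    index = {v: k for k, v in enumerate(nodes)}
    adj = [[] for _ in nodes]
    for a, b in edges:
        adj[index[a]].append(index[b])
        adj[index[b]].append(index[a])
    diameter = 0
    for src in range(len(nodes)):
        dist = [-1] * len(nodes)
        dist[src] = 0
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        if min(dist) < 0:
            raise ValueError("network is disconnected")
        diameter = max(diameter, max(dist))
    return max(len(nbrs) for nbrs in adj), diameter


nodes, edges = build_p_network(1)
print(len(nodes), degree_and_diameter(nodes, edges))  # 100 (6, 6)
```

For $r = 2$ (1000 nodes) the degree is again 6; a check there should only require the diameter not to exceed $4r + 2 = 10$, since the exact figure depends on whether the chosen pivot nodes are adjacent in the labelling.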

4. Performance analysis of the $\mathcal{P}$-network

The performance of an interconnection network depends mainly on the network cost. Other factors, such as topology regularity, scalability and simplicity of message routing, also affect performance and therefore need to be taken into consideration in the design of the network. The cost of a network reflects both its hardware cost and its maximum message delay. Topology regularity implies traffic uniformity and network reliability under the hardware-cost restriction. Scalability shows the topological extendability of the network, and routing simplicity reflects the cost of message routing in the network.


Fig. 5. The $\mathcal{P}_1$-network.

Since the $\mathcal{P}$-network is constructed by the RE method, its cost can be computed by the formulae in Theorem 1. For simplicity of evaluation, we restrict $r$ to at most 10, so that the maximum size of the network is $10^{11}$. This upper bound on size should be sufficient for most practical uses of the network. Thus, by equations (1), (2) and (3) in Theorem 1, noting that $\lambda = 0$ when $n_u = n_f = 10$ and $0 \le r < 10$, we have the following theorem:

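Theorem 2. For $\{(10, 3, 2)\text{-}\mathcal{P},\ (10, 3, 2)\text{-}\mathcal{P}\} \xrightarrow{\mathrm{RE}} (n_r, d_r, k_r)\text{-}\mathcal{P}_r$, where $\mathcal{P}$ is the Petersen graph and $0 \le r \le 10$, the size, degree and diameter of $\mathcal{P}_r$ satisfy the following formulae: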

$$n_r = 10^{r+1}, \tag{6}$$

$$d_r = 6, \tag{7}$$

$$k_r = 4r + 2. \tag{8}$$

Theorem 2 shows that the $\mathcal{P}$-network ($0 \le r \le 10$) has a cost

$$Cost_r = d_r \times k_r = 24r + 12. \tag{9}$$

For comparison, we now consider the costs $Cost_r^{torus}$, $Cost_r^{hypercube}$ and $Cost_r^{CCC}$ of building an interconnection network of the same size ($n_r = 10^{r+1}$, $0 \le r \le 10$) in three well-known configurations: torus, hypercube and cube-connected cycles (CCC), as shown in Fig. 6, where CCC has the lowest cost among the known configurations used in static interconnection networks.

Fig. 6. The configurations of torus (a), hypercube (b) and CCC (c).

Fig. 7. Cost analysis of $\mathcal{P}$, torus, hypercube and CCC (cost, 0 to 1400, versus $r$ for network size $n_r = 10^{r+1}$).

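Since a torus of size $n_r$ has degree 4 and diameter $2\sqrt{n_r} - 1$, and a hypercube of size $n_r$ has degree $\lceil \log n_r \rceil$ and diameter $\lceil \log n_r \rceil$ (logarithms to base 2), their costs for $n_r = 10^{r+1}$ can be expressed as follows:

$$Cost_r^{torus} = 4\left(2 \cdot 10^{\frac{r+1}{2}} - 1\right), \tag{10}$$

$$Cost_r^{hypercube} = \lceil (r + 1)\log 10 \rceil^2. \tag{11}$$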

For a CCC of size $n2^n$, it is known that it has degree 3 and diameter $(2n - 1) + \lfloor n/2 \rfloor$ (the maximum message delay under the most efficient routing scheme [5,8]). When the number of nodes, $N$, is not equal to $n2^n$, a CCC can be constructed by finding the smallest $n'$ such that $n' \ge n$ and $N \le n'2^n$, and letting each cycle have size $n'$ and contain $n' - n$ nodes of degree 2 (without an inter-cycle connection link). Thus a CCC of size $n'2^n$ ($n' \ge n$) has degree 3 and diameter $(2n - 1) + \lfloor n'/2 \rfloor$, where the first term of the diameter is the number of routing steps for a message from the source cycle to the destination cycle and the second term is the number of routing steps within the destination cycle for the message to reach the destination node. Since $10^{r+1} = (5/4)^{r+1} 2^{3(r+1)} \le \lceil 5^{r+1}/4^r \rceil 2^{3r+1}$, where $\lceil 5^{r+1}/4^r \rceil$ is the smallest $n'$ satisfying $10^{r+1} \le n' 2^{3r+1}$ and is not smaller than $n = 3r + 1$ for $0 \le r \le 10$, the smallest CCC to connect the $10^{r+1}$ nodes must have size $n'2^n = \lceil 5^{r+1}/4^r \rceil 2^{3r+1}$, and thus has the following cost, where $6r + 1 + \lfloor \lceil 5^{r+1}/4^r \rceil / 2 \rfloor$ is the diameter:

$$Cost_r^{CCC} = 3\left(6r + 1 + \left\lfloor \left\lceil 5^{r+1}/4^r \right\rceil / 2 \right\rfloor\right). \tag{12}$$

For $r = 0, 1, 2, \dots, 10$, we compute the costs of $\mathcal{P}$, torus, hypercube and CCC for connecting $10^{r+1}$ nodes, and plot them in Fig. 7 for comparison.
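The comparison behind Figs. 7 and 8 can be reproduced from equations (7) to (12). The Python sketch below is our own tabulation, not the author's program; it assumes base-2 logarithms for the hypercube, computes $\lceil \log_2 n_r \rceil$ exactly with integer arithmetic, and reads the outer bracket in equation (12) as a floor. Degrees and diameters are kept separately so that the same table also serves for the diameter comparison.

```python
import math


def ccc_cycle_length(r):
    """n' = ceil(5^(r+1) / 4^r), the smallest n' with 10^(r+1) <= n' * 2^(3r+1)."""
    return -(-(5 ** (r + 1)) // 4 ** r)


def degree_diameter(r):
    """(degree, diameter) of each configuration with n_r = 10^(r+1) nodes."""
    n = 10 ** (r + 1)
    cube_dim = (n - 1).bit_length()        # exact ceil(log2 n) for n >= 2
    return {
        "P-network": (6, 4 * r + 2),                          # equations (7), (8)
        "torus": (4, 2 * math.sqrt(n) - 1),                   # as used in (10)
        "hypercube": (cube_dim, cube_dim),                    # as used in (11)
        "CCC": (3, 6 * r + 1 + ccc_cycle_length(r) // 2),    # as used in (12)
    }


def costs(r):
    """Cost = degree x diameter for each configuration."""
    return {name: d * k for name, (d, k) in degree_diameter(r).items()}


for r in range(11):
    c = costs(r)
    print(r, c["P-network"], round(c["torus"]), c["hypercube"], c["CCC"])
```

For example, at $r = 10$ the sketch gives 252 for both $\mathcal{P}$ and the CCC and 1369 for the hypercube, while the torus cost has long since left the range of Fig. 7.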

Fig. 7 shows that the cost of $\mathcal{P}$ is considerably lower than that of the torus and the hypercube and is comparable with that of the CCC.

In order to see the maximum message delays in the above networks, we also compute their diameters and plot them in Fig. 8, which shows that the diameter of $\mathcal{P}_r$ is smaller than that of the torus and the CCC and is comparable with that of the hypercube.

Being built on the basis of the Petersen graph, $\mathcal{P}_r$ preserves the topological properties of the Petersen graph well and has a regular topology. Being constructed by the recursive expansion method, the $\mathcal{P}$-network is highly scalable. The network can easily be expanded from size $10^i$ to size $10^{i+1}$ by expanding $\mathcal{P}_{i-1}$ by just one more phase. It is also possible to construct a network of size $10^i + 10^j$, $j < i$, by 'partially' expanding $\mathcal{P}_{i-1}$ as needed.

Fig. 8. Diameter analysis of $\mathcal{P}$, torus, hypercube and CCC (diameter, 0 to 100, versus $r$ for network size $n_r = 10^{r+1}$).

With a proper node-indexing scheme, message routing in $\mathcal{P}_r$ can be realized simply, and the routing algorithm has a time complexity of $O(\log n_r)$, as described in [7].

5. Concluding remarks

Recursive expansion (RE) is a method for systematically constructing high-performance interconnection networks [6]. Based on two chosen small networks, a unit $G_u$ and a frame $G_f$, the RE method constructs $G_{i+1}$ by replacing each node in a $G_f$-copy with a $G_i$-copy and each edge in the $G_f$-copy with a set of inter-unit connections, for $i = 0, 1, \dots$, where $G_0 = G_u$, until a network of the desired size is obtained.

As an application of the RE method, we have presented in this paper a high-performance interconnection network, the $\mathcal{P}$-network, constructed by the RE method on the basis of the Petersen graph. We have shown that the $\mathcal{P}$-network of size $10^{r+1}$ has degree 6, diameter $4r + 2$ and cost $24r + 12$ for $0 \le r \le 10$. The cost of the network is considerably lower than that of the torus and the hypercube and is comparable with that of the CCC of the same size. The diameter of the network is lower than that of the torus and the CCC and is comparable with that of the hypercube. The proposed network also has other features such as a regular topology, high scalability and simple message routing.

In the $\mathcal{P}$-network, we chose the pivot-set sequence $PSS_{(s^k)}$, which minimizes the network diameter, in order to simplify the performance analysis; this does not prevent the other pivot-set sequence, $PSS_{(s^d)}$, which minimizes the network degree, from being chosen when desired. If $PSS_{(s^d)}$ is used, the degree and diameter of the network follow equations (4) and (5) in Theorem 1.

References

[1] G.S. Almasi and A. Gottlieb, Highly Parallel Computing (Benjamin/Cummings, Menlo Park, CA, 1988).

[2] L.W. Beineke and R.J. Wilson, Selected Topics in Graph Theory 2 (Academic Press, New York, 1983).

[3] T. Feng, A survey of interconnection networks, Computer 14(12) (1981) 12-27.

[4] W. Leland, R. Finkel, L. Qiao, M. Solomon and L. Uhr, High density graphs for processor interconnection, Inform. Process. Lett. 12(3) (1981) 117-120.

[5] F.P. Preparata and J. Vuillemin, The cube-connected cycles: a versatile network for parallel processing, Comm. ACM 24(5) (1981) 300-309.

[6] H. Shen and R.-J. Back, Construction of large-size interconnection networks with high performance, to appear in Networks.

[7] H. Shen, Efficient message routing in $\mathcal{P}$-network, submitted for publication.

[8] L.D. Wittie, Communication structures for large networks of microcomputers, IEEE Trans. Comput. C-30(4) (1981) 264-273.

[9] N.S. Woo and A. Agrawala, A symmetric tree structure interconnection network and its message traffic, IEEE Trans. Comput. C-34(8) (1985) 765-769.