parallel routing

Parallel Routing

Bruce, Chiu-Wing Sham

Overview

• Background

• Routing in parallel computers

• Routing in hypercube networks
  – Bit-fixing routing algorithm
  – Randomized routing algorithm

Parallel Computer Architectures

• Parallel computers consist of multiple processing elements interconnected by a specific interconnection topology

• Examples:
  – linear array
  – hypercube
  – mesh
  – fat tree

Interconnection Topologies

[Figure: a linear array, a mesh, a 4-level fat tree, and a 3-dimensional hypercube]

Routing in Parallel Computers

• Parallel computers are modeled by directed graphs

• All communication between processors (nodes) occurs in synchronous steps

• Each link can carry at most one unit message (packet) in one step

• During a step, a node can send at most one packet to each of its neighbors

• Each node is uniquely identified by a number between 1 and N

Permutation Routing Problem

• A network of N nodes, {1, …, N}

• Each node i initially holds one packet $v_i$ that must be routed to its destination node d(i)

• The destinations d(i), for $1 \le i \le N$, form a permutation of {1, …, N}, i.e., every node is the destination of exactly one packet

Oblivious Routing Algorithm

• Properties:
  – A route is specified between each node i and its destination node d(i)
  – The route between node i and node d(i) depends on i and d(i) only, not on the destinations of other packets

Oblivious Routing Algorithm

• Theorem 1:
  – For any deterministic oblivious permutation routing algorithm on a network of N nodes, each of degree d, there is an instance of permutation routing requiring $\Omega(\sqrt{N/d})$ steps

• Proof:
  – See C. Kaklamanis, D. Krizanc, T. Tsantilas, “Tight Bounds for Oblivious Routing in the Hypercube”, Proc. of the ACM Symposium on Parallel Algorithms and Architectures, 1990

Hypercube Topology

[Figure: 1-cube, 2-cube and 3-cube; addressing in the 3-cube, with nodes labelled 000, 001, 010, 011, 100, 101, 110, 111]

Hypercube Network

• An n-dimensional hypercube network:
  – Number of nodes: $N = 2^n$
  – Degree: n
  – Node i with address $(i_1, i_2, \ldots, i_n) \in \{0, 1\}^n$ and node j with address $(j_1, j_2, \ldots, j_n) \in \{0, 1\}^n$ are connected if and only if the Hamming distance between the two addresses is 1
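The adjacency rule above can be sketched in a few lines. This is only an illustration: it assumes node addresses are stored as n-bit Python integers, and the helper names `hamming_distance` and `neighbors` are hypothetical, not part of the slides.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which addresses a and b differ."""
    return bin(a ^ b).count("1")


def neighbors(node: int, n: int) -> list:
    """All nodes of the n-cube adjacent to `node` (flip exactly one of n bits)."""
    return [node ^ (1 << k) for k in range(n)]


n = 3
for node in range(2 ** n):
    nbrs = neighbors(node, n)
    # Every node has degree n, and each neighbor is at Hamming distance 1.
    assert len(nbrs) == n
    assert all(hamming_distance(node, m) == 1 for m in nbrs)
    print(format(node, f"0{n}b"), [format(m, f"0{n}b") for m in nbrs])
```

Flipping a single bit with XOR is what makes the degree equal to n: there is exactly one neighbor per dimension.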

Bit-Fixing Routing Algorithm

• Algorithm:
  – Given the destination address d(i) of a packet and the address of the intermediate node currently holding it
  – Compare the bits of the two addresses from left to right
  – Identify the first bit position at which the two addresses differ
  – Route the packet to the neighbor of the current node whose address differs from it only in this bit position
  – Repeat until the packet reaches d(i)


• Example:
  – Source: (0, 0, 0, 0, 0, 0)
  – Destination: (1, 0, 1, 0, 1, 1)
  – Route: (0, 0, 0, 0, 0, 0) → (1, 0, 0, 0, 0, 0) → (1, 0, 1, 0, 0, 0) → (1, 0, 1, 0, 1, 0) → (1, 0, 1, 0, 1, 1)
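The example route can be reproduced with a short sketch of the bit-fixing rule. It assumes addresses are tuples of 0/1 bits written left to right; the function name `bit_fixing_route` is hypothetical.

```python
def bit_fixing_route(source, dest):
    """Return the list of nodes visited from source to dest.

    Addresses are tuples of 0/1 bits; differing bits are corrected
    one at a time, scanning from left to right.
    """
    current = list(source)
    route = [tuple(current)]
    for pos in range(len(current)):
        if current[pos] != dest[pos]:
            current[pos] = dest[pos]      # cross the edge for this dimension
            route.append(tuple(current))
    return route


route = bit_fixing_route((0, 0, 0, 0, 0, 0), (1, 0, 1, 0, 1, 1))
for node in route:
    print(node)
```

A single left-to-right scan is equivalent to repeatedly looking for the first differing bit, because a corrected bit never changes again.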


• Corollary 1:
  – On an n-dimensional hypercube, there is an instance of permutation routing (e.g. the transpose permutation) requiring $\Omega(\sqrt{2^n / n})$ steps for the bit-fixing routing algorithm
  – This matches Theorem 1 with $N = 2^n$ and $d = n$


• Proof:
  – Let (i.j) be the address of a node, where i and j are two binary strings, each of length n/2, and “.” is the string concatenation operation
  – In the transpose permutation, the packet stored at node (i.j) is routed to the destination node (j.i); consider only the sources with j = 0


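• Proof (continued):
  – These packets travel (i.0) → (0.i)
  – If i is odd, the packet must pass through node (1.0), i.e. the node whose left half is 0…01 and whose right half is all zeros, and then cross the edge from (1.0) to (0.0)
  – Number of such sources: $2^{n/2}/2$
  – Only one packet can be routed on the same edge at a time
  – Lower bound: $2^{n/2}/2$ steps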
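The congestion argument can be checked numerically on small cubes. The sketch below assumes tuple addresses and left-to-right bit-fixing; `bit_fixing_edges` and `transpose_congestion` are hypothetical helper names. It counts how many routes of the transpose permutation use each directed edge and reports the maximum next to the bound $2^{n/2}/2$.

```python
from collections import Counter
from itertools import product


def bit_fixing_edges(source, dest):
    """Directed edges (u, v) used by left-to-right bit-fixing."""
    current = list(source)
    edges = []
    for pos in range(len(current)):
        if current[pos] != dest[pos]:
            u = tuple(current)
            current[pos] = dest[pos]
            edges.append((u, tuple(current)))
    return edges


def transpose_congestion(n):
    """Largest number of routes sharing one directed edge under (i.j) -> (j.i)."""
    half = n // 2
    load = Counter()
    for addr in product((0, 1), repeat=n):
        dest = addr[half:] + addr[:half]       # swap the two halves
        load.update(bit_fixing_edges(addr, dest))
    return max(load.values())


for n in (4, 6, 8):
    # Columns: n, measured maximum edge load, the bound 2^(n/2)/2.
    print(n, transpose_congestion(n), 2 ** (n // 2) // 2)
```

Since an edge carries one packet per step, the maximum edge load is a lower bound on the number of steps for this instance.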

Randomized Routing Algorithm

• For i = 1 to N:
  – Route packet $v_i$ by executing the following two phases independently of all the other packets
    • Phase 1: choose a random intermediate destination $t_i$ uniformly from {1, …, N}, and route $v_i$ from i to $t_i$ using the bit-fixing algorithm
    • Phase 2: route $v_i$ from $t_i$ to its final destination d(i) using the bit-fixing algorithm

• Queuing: packets waiting for the same edge are served in FIFO order, so a packet may be delayed
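A small simulation can make the two-phase scheme concrete. The sketch below rests on several assumptions that are not in the slides: nodes are numbered 0 to N − 1 as n-bit integers, bits are fixed from the most significant bit down, each directed edge has its own FIFO queue, and Phase 2 starts only after every packet has finished Phase 1. All function names are hypothetical.

```python
import random
from collections import deque


def bit_fixing_path(src, dst, n):
    """Nodes visited from src to dst, fixing bits from the most significant down."""
    path, cur = [src], src
    for k in reversed(range(n)):
        mask = 1 << k
        if (cur ^ dst) & mask:
            cur ^= mask
            path.append(cur)
    return path


def route_all(paths):
    """Synchronous routing with one FIFO queue per directed edge.

    Returns the number of steps until every packet has finished its path.
    """
    queues = {}                      # (u, v) -> deque of packet ids
    pos = [0] * len(paths)           # index of each packet's current node
    for pid, path in enumerate(paths):
        if len(path) > 1:
            queues.setdefault((path[0], path[1]), deque()).append(pid)
    steps = 0
    while any(queues.values()):
        steps += 1
        # Each edge forwards at most one packet per step.
        moved = [q.popleft() for q in queues.values() if q]
        for pid in moved:
            pos[pid] += 1
            path = paths[pid]
            if pos[pid] + 1 < len(path):
                edge = (path[pos[pid]], path[pos[pid] + 1])
                queues.setdefault(edge, deque()).append(pid)
    return steps


def randomized_route(dest, n, rng):
    """Total steps of Phase 1 plus Phase 2 for the permutation `dest`."""
    N = 1 << n
    mid = [rng.randrange(N) for _ in range(N)]      # random intermediate nodes
    phase1 = route_all([bit_fixing_path(i, mid[i], n) for i in range(N)])
    phase2 = route_all([bit_fixing_path(mid[i], dest[i], n) for i in range(N)])
    return phase1 + phase2


n = 8
half = n // 2
low = (1 << half) - 1
transpose = [((i & low) << half) | (i >> half) for i in range(1 << n)]
det_steps = route_all([bit_fixing_path(i, transpose[i], n) for i in range(1 << n)])
rand_steps = randomized_route(transpose, n, random.Random(0))
print("deterministic bit-fixing:", det_steps)
print("randomized two-phase:   ", rand_steps)
```

Ties between packets arriving at a queue in the same step are broken arbitrarily here; the slides only specify FIFO service.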


• Lemma 1:
  – If the bit-fixing algorithm is used to route packet $v_i$ from i to $t_i$ and packet $v_j$ from j to $t_j$, then their routes do not rejoin after they separate


• Proof (Lemma 1):
  – Assume, for contradiction, that k is the node at which the two routes separate and l is the node at which they rejoin
  – Under the bit-fixing scheme, the route taken by $v_i$ and by $v_j$ from k to l depends only on the bit representations of k and l
  – Hence $v_i$ and $v_j$ must follow the same route from k to l
  – This contradicts the assumption that the routes separate at k
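Lemma 1 can also be checked exhaustively on a small cube. The sketch below is a brute-force check, not a proof. It assumes integer addresses with bits fixed from the most significant down, and it interprets "do not rejoin" as: the directed edges two routes have in common form one unbroken run. The names `edges` and `shared_is_contiguous` are hypothetical.

```python
from itertools import product


def edges(src, dst, n):
    """Directed edges of the bit-fixing route, most significant bit first."""
    out, cur = [], src
    for k in reversed(range(n)):
        if (cur ^ dst) & (1 << k):
            out.append((cur, cur ^ (1 << k)))
            cur ^= 1 << k
    return out


def shared_is_contiguous(p, q):
    """True if the edges of p that also lie on q form one unbroken run."""
    on_q = set(q)
    idx = [a for a, e in enumerate(p) if e in on_q]
    return not idx or idx[-1] - idx[0] + 1 == len(idx)


n = 4
nodes = range(1 << n)
routes = [edges(s, t, n) for s, t in product(nodes, repeat=2)]
violations = sum(not shared_is_contiguous(p, q) for p in routes for q in routes)
print("pairs checked:", len(routes) ** 2, "violations:", violations)
```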


• Let the route of packet $v_i$ follow the sequence of edges $p_i = (e_1, e_2, \ldots, e_k)$

• Let S be the set of packets (other than $v_i$) whose routes pass through at least one of $\{e_1, e_2, \ldots, e_k\}$

• Lemma 2:
  – The delay incurred by $v_i$ is at most |S|


• Proof (Lemma 2):
  – Define the lag of a packet w that is ready to follow edge $e_j$ at time t as $l = t - j$

[Figure: timeline along $p_i$ for packet $v_i$; packet w passes through $e_j$ at time $t_j$ and has lag $t_j - j$]

  – If the lag of $v_i$ increases from l to l + 1, some packet in S with lag l must be crossing the edge that $v_i$ is waiting for


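• Proof (Lemma 2, continued):
  – Let $t_j$ be the last time step at which any packet in S has lag l
  – Then some packet w in S follows the edge $e_j$ at time $t_j$, where $l = t_j - j$, and w must leave $p_i$ at time $t_j + 1$; otherwise it would follow $e_{j+1}$ at time $t_j + 1$, still with lag l, contradicting the choice of $t_j$

[Figure: timeline along $p_i$ for packet $v_i$; packet w passes through $e_j$ at time $t_j$ and must leave $p_i$ at time $t_j + 1$]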


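• Proof (Lemma 2, continued):
  – Hence, each time the lag of $v_i$ increases from l to l + 1, some packet in S leaves $p_i$ with lag l
  – By Lemma 1, the routes of different packets do not rejoin after they separate, so a packet that has left $p_i$ never returns to it
  – Each member of S is therefore charged for at most one unit of delay of $v_i$, and the total delay is at most |S|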


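• Define a random variable $H_{ij}$ as:

$$H_{ij} = \begin{cases} 1 & \text{if } p_i \text{ and } p_j \text{ share at least one edge} \\ 0 & \text{otherwise} \end{cases}$$

• Let $\mathrm{delay}_i$ be the total delay incurred by $v_i$, then:

$$\mathrm{delay}_i \le \sum_{j=1}^{N} H_{ij}$$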

• From linearity of expectation:

$$E[\mathrm{delay}_i] \le E\Big[\sum_{j=1}^{N} H_{ij}\Big] = \sum_{j=1}^{N} E[H_{ij}]$$


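• For an edge e of the hypercube, let the random variable T(e) be the number of routes that pass through e. If $p_i = (e_1, \ldots, e_k)$, then:

$$\sum_{j=1}^{N} H_{ij} \le \sum_{l=1}^{k} T(e_l)$$

• We have:

$$E\Big[\sum_{j=1}^{N} H_{ij}\Big] \le E\Big[\sum_{l=1}^{k} T(e_l)\Big] = \sum_{l=1}^{k} E[T(e_l)]$$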


• All edges in the hypercube are symmetric
  – $E[T(e_l)] = E[T(e_m)]$ for any two edges $e_l$ and $e_m$
  – Total number of (directed) edges: Nn
  – The expected length of each route is n/2
  – The expected total length of all N routes is Nn/2
  – Therefore $E[T(e)] = (Nn/2)/(Nn) = 1/2$ for all edges
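The value E[T(e)] = 1/2 can be confirmed exactly for a small cube by enumerating every source and every possible intermediate destination. The sketch assumes integer addresses, directed edges, a uniformly random intermediate destination for each source, and bit-fixing from the most significant bit down; `edges` and `expected_load` are hypothetical names.

```python
from collections import Counter
from fractions import Fraction


def edges(src, dst, n):
    """Directed edges of the bit-fixing route, most significant bit first."""
    out, cur = [], src
    for k in reversed(range(n)):
        if (cur ^ dst) & (1 << k):
            out.append((cur, cur ^ (1 << k)))
            cur ^= 1 << k
    return out


def expected_load(n):
    """Exact E[T(e)] for every directed edge in Phase 1."""
    N = 1 << n
    count = Counter()
    for src in range(N):
        for dst in range(N):
            count.update(edges(src, dst, n))
    # Each source picks a given intermediate destination with probability 1/N.
    return {e: Fraction(c, N) for e, c in count.items()}


load = expected_load(4)
print("directed edges:", len(load), "distinct expected loads:", set(load.values()))
```

Using `Fraction` keeps the result exact, so the symmetry claim can be checked without rounding.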

• We have:

$$E\Big[\sum_{j=1}^{N} H_{ij}\Big] \le \sum_{l=1}^{k} E[T(e_l)] = \frac{k}{2} \le \frac{n}{2}$$


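• Theorem 2 (Chernoff bound):
  – Let $X_1, X_2, \ldots, X_n$ be independent Poisson trials such that, for $1 \le i \le n$, $\Pr[X_i = 1] = p_i$, where $0 < p_i < 1$
  – Let $X = \sum_{i=1}^{n} X_i$ and $\mu = E[X] = \sum_{i=1}^{n} p_i$. Then, for any $\delta > 0$:

$$\Pr[X > (1+\delta)\mu] < \left[\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right]^{\mu}$$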


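$$\Pr[X > (1+\delta)\mu] < \left[\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right]^{\mu} < \left[\frac{e^{1+\delta}}{(1+\delta)^{(1+\delta)}}\right]^{\mu} = \left[\frac{e}{1+\delta}\right]^{(1+\delta)\mu}$$

$$\text{If } \delta > 2e - 1, \text{ i.e. } \frac{e}{1+\delta} < \frac{1}{2}, \text{ then } \Pr[X > (1+\delta)\mu] < 2^{-(1+\delta)\mu}$$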


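• We have:

$$E\Big[\sum_{j=1}^{N} H_{ij}\Big] \le \frac{n}{2}$$

• By using:

$$\Pr[X > (1+\delta)\mu] < 2^{-(1+\delta)\mu}$$

• Put $\delta = 11$ (which satisfies $\delta > 2e - 1$) with $\mu$ bounded by $n/2$, so that the threshold $(1+\delta) \cdot n/2$ equals $6n$:

$$\Pr\Big[\sum_{j=1}^{N} H_{ij} > 6n\Big] < 2^{-6n}$$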


• Theorem 3:
  – With probability at least $1 - 2^{-5n}$, every packet $v_i$ reaches $t_i$ in 7n or fewer steps

• Proof:
  – Since the total number of packets is $2^n$, the probability that any of them has a delay exceeding 6n is less than $2^n \cdot 2^{-6n} = 2^{-5n}$
  – Each packet requires at most n additional steps to traverse its route from the source to the intermediate destination, giving 6n + n = 7n steps in total


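• Theorem 4:
  – Every packet reaches its destination in 14n or fewer steps with probability larger than 1 − 1/N

• Proof:
  – Phase 2 of Valiant's scheme is identical to Phase 1 with the roles of source and destination reversed, so it also takes at most 7n steps with probability at least $1 - 2^{-5n}$
  – Failure probability: at most $2 \cdot 2^{-5n} < 2^{-n} = 1/N$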

Conclusion

• A deterministic oblivious routing algorithm can perform very poorly on specific instances (e.g. bit-fixing on the transpose permutation)

• The randomized routing algorithm routes every permutation in O(n) steps with high probability
