8/3/2019 Array Processors CH4


ARRAY PROCESSORS

SIMD Computer Organization



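SIMD Computer Organization, Configuration I:

[Figure: SIMD Configuration I. Components: I/O for instructions and data, control processor (CP), control unit (CU) with CU memory, control lines and data bus, processing elements PE0 to PEN-1 each with its own local memory PEM0 to PEMN-1, and an interconnection network linking the PEs.]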

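Configuration II:

[Figure: SIMD Configuration II. Components: I/O, control unit (CU) with CU memory, processing elements PE0 to PEN-1, an alignment network, shared memory modules M0 to Mp-1, and a data bus.]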

Formally, an SIMD computer is characterized by the following set of parameters:

$C = (N, F, I, M)$

N: the number of PEs in the system. Examples: Illiac IV has 64; the MPP (Massively Parallel Processor) has 16,384.

F: the set of data-routing functions provided by the interconnection network. Examples: the routing functions of mesh, star, Omega, and butterfly networks.

I: the set of instructions for scalar, vector, data-routing, and network-manipulation operations.

M: the set of masking schemes, where each mask partitions the PEs into two disjoint subsets: enabled PEs and disabled PEs.

Components in a PE:

Ai, Bi, and Ci are general-purpose registers.

Only the contents of Ri are transferred to other PEs during data transfer. If there are $N = 2^m$ PEs (m = the number of bits required to identify a PE), then Di holds the m-bit address of the destination PE.

Each PEi is either active or inactive during an instruction cycle:

Si = 1: active; Si = 0: inactive.

[Figure: structure of PEi, showing the ALU, general-purpose registers Ai, Bi, Ci, status register Si, index register Ii, destination register Di, and data-routing register Ri.]


Necessity of Data Routing:

Consider an array:

$A = (A_0, A_1, \dots, A_{n-1})$

Now compute the partial sums:

$S(k) = \sum_{i=0}^{k} A_i, \quad k = 0, 1, \dots, n-1$

so that $S(n-1)$ is the sum of the whole array.

For n = 8 with N = 8 PEs, the additions are performed in $\log_2 N$ steps, i.e., 3 steps.

In SISD, the same computation would take 8 steps (loop iterations) of:

sum = sum + A[i]

Making it faster would mean building this loop into dedicated hardware.

But the same computation takes ONLY 3 steps in SIMD!

SIMD (N = 8, with A_i = i + 1, i.e., the values 1 to 8):

PE    Initial   Step 1    Step 2     Step 3      Result
PE0   1         1         1          1           S(0) = 1
PE1   2         1+2=3     3          3           S(1) = 3
PE2   3         2+3=5     1+5=6      6           S(2) = 6
PE3   4         3+4=7     3+7=10     10          S(3) = 10
PE4   5         4+5=9     5+9=14     1+14=15     S(4) = 15
PE5   6         5+6=11    7+11=18    3+18=21     S(5) = 21
PE6   7         6+7=13    9+13=22    6+22=28     S(6) = 28
PE7   8         7+8=15    11+15=26   10+26=36    S(7) = 36

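Algorithm:

Step 1 ($A_i$ transfers its data into $R_i$, which is then routed to the next PE):
$A_i \to R_i$, $i = 0, \dots, 6$
$R_i \to R_{i+1}$, $i = 0, \dots, 6$
$A_i + R_i \to A_i$, $i = 1, \dots, 7$

Step 2:
$A_i \to R_i$, $i = 0, \dots, 5$
$R_i \to R_{i+2}$, $i = 0, \dots, 5$
$A_i + R_i \to A_i$, $i = 2, \dots, 7$

Step 3:
$A_i \to R_i$, $i = 0, \dots, 3$
$R_i \to R_{i+4}$, $i = 0, \dots, 3$
$A_i + R_i \to A_i$, $i = 4, \dots, 7$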
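The three steps generalize to a routing distance $d = 2^k$ in step k. The following is a minimal Python sketch, not the machine's actual instruction sequence: plain lists stand in for the $A_i$ and $R_i$ registers, a PE outside a step's index range is simply skipped (masked), and the name `recursive_doubling` is hypothetical. The input 1 to 8 reproduces the values in the SIMD figure.

```python
def recursive_doubling(values):
    """Simulate the SIMD partial-sum steps on len(values) PEs (one element per PE)."""
    n = len(values)
    A = list(values)   # A_i register of PE_i
    R = [0] * n        # R_i data-routing register of PE_i
    d, steps = 1, 0
    while d < n:
        # A_i -> R_i for the PEs enabled for routing (i = 0 .. n-1-d)
        for i in range(n - d):
            R[i] = A[i]
        # R_i -> R_{i+d}: every enabled PE sends at once, so read a snapshot first
        sent = R[:]
        for i in range(n - d):
            R[i + d] = sent[i]
        # A_i + R_i -> A_i for the PEs enabled for addition (i = d .. n-1)
        for i in range(d, n):
            A[i] += R[i]
        d *= 2
        steps += 1
    return A, steps


sums, steps = recursive_doubling([1, 2, 3, 4, 5, 6, 7, 8])
print(sums, steps)   # [1, 3, 6, 10, 15, 21, 28, 36] 3
```

With 16 inputs the same loop takes 4 steps, matching $\log_2 N$.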



Masking Scheme:

During data routing:
Step 1: PE7 is disabled.
Step 2: PE6 and PE7 are disabled.
Step 3: PE4 to PE7 are disabled.

During addition:
Step 1: PE0 is not involved.
Step 2: PE0 and PE1 are not involved.
Step 3: PE0 to PE3 are not involved.
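The masks follow a pattern: with routing distance $d = 2^k$ in step k (counting from 0), PEs $N-d$ to $N-1$ have no destination and are disabled for routing, while PEs 0 to $d-1$ receive nothing and sit out the addition. A small sketch; the helpers `masks_for_step` and `disabled` are hypothetical and encode each mask as the $S_i$ bits (1 = active).

```python
def masks_for_step(n, k):
    """(routing_mask, addition_mask) as S_i bits for step k (0-based), distance d = 2**k."""
    d = 2 ** k
    routing = [1 if i + d < n else 0 for i in range(n)]   # PE_i routes only if PE_{i+d} exists
    addition = [1 if i >= d else 0 for i in range(n)]     # PE_i adds only if it received data
    return routing, addition


def disabled(mask):
    """Indices of PEs whose status bit S_i is 0."""
    return [i for i, s in enumerate(mask) if s == 0]


N = 8
for k in range(3):
    routing, addition = masks_for_step(N, k)
    print(f"Step {k + 1}: routing disabled {disabled(routing)}, addition idle {disabled(addition)}")
```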



SIMD Interconnection Networks

Interconnection networks
  Static networks: 1-D, 2-D, 3-D, and hypercube
  Dynamic networks
    Bus-based
    Switch-based: single-stage, multistage, crossbar

Static Networks (1D): Linear Array

n nodes connected by n-1 links.
Internal nodes have degree 2.
End nodes have degree 1.
Diameter = n-1.

2D Ring Network

Like a linear array, but the two end nodes are connected by an nth link.
Can be unidirectional or bidirectional.
Node degree (d) = 2.
Diameter (D):
  Unidirectional = n-1
  Bidirectional = $\lfloor n/2 \rfloor$

Star Network

Central node degree = n-1.
Leaf nodes have degree 1.
Network diameter = 2.
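The degree and diameter figures for the linear array, ring, and star can be checked by brute force. A minimal Python sketch, assuming nodes are numbered 0 to n-1 (node 0 as the star's centre is a hypothetical choice); `diameter` runs a breadth-first search from every node and reports the largest hop count, following edge direction for the unidirectional ring.

```python
from collections import deque


def diameter(n, neighbors):
    """Largest shortest-path distance between any two nodes (BFS from every node)."""
    worst = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors(u):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


def linear_array(n):
    return lambda u: [v for v in (u - 1, u + 1) if 0 <= v < n]


def ring(n, bidirectional):
    if bidirectional:
        return lambda u: [(u + 1) % n, (u - 1) % n]
    return lambda u: [(u + 1) % n]


def star(n):
    # Hypothetical numbering: node 0 is the centre, nodes 1 .. n-1 are the leaves.
    return lambda u: list(range(1, n)) if u == 0 else [0]


n = 8
print(diameter(n, linear_array(n)))                  # 7 = n - 1
print(diameter(n, ring(n, bidirectional=False)))     # 7 = n - 1
print(diameter(n, ring(n, bidirectional=True)))      # 4 = n / 2
print(diameter(n, star(n)))                          # 2
```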



Static Interconnection Networks (2D)

An n-dimensional mesh (torus, or wraparound mesh) is an extension of the linear array (ring).
Degree (2D mesh): 2 to 4.
Examples: Intel Paragon (2D mesh).

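Mesh Interconnection Network (Illiac IV Network)

 0   1   2   3
 4   5   6   7
 8   9  10  11
12  13  14  15

[Figure: 4 x 4 Illiac IV mesh; the boundary links labeled a to h wrap around, joining 12 to 0, 13 to 1, 14 to 2, 15 to 3 (columns) and 3 to 4, 7 to 8, 11 to 12, 15 to 0 (rows), as given by the routing functions i ± 4 and i ± 1 (mod 16).]

It is a chordal ring network.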

In the Illiac IV, each processor i was connected to processors (example: N = 16):

$\{i+1,\ i-1,\ i+4,\ i-4\} \pmod{16}$

The routing functions are:

$R_{+1}(i) = (i + 1) \bmod N$
$R_{-1}(i) = (i - 1) \bmod N$
$R_{+r}(i) = (i + r) \bmod N$
$R_{-r}(i) = (i - r) \bmod N$

where $r = \sqrt{N}$ and $0 \le i \le N-1$.

The diameter of an Illiac IV mesh is $\sqrt{N} - 1$.
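A short check of the Illiac IV routing functions and diameter for N = 16, as a sketch: `illiac_neighbors` and `eccentricity` are hypothetical helper names, and the diameter is found by breadth-first search rather than by formula.

```python
import math
from collections import deque


def illiac_neighbors(i, n):
    """Apply R_{+1}, R_{-1}, R_{+r}, R_{-r} with r = sqrt(n)."""
    r = math.isqrt(n)
    return {(i + 1) % n, (i - 1) % n, (i + r) % n, (i - r) % n}


def eccentricity(src, n):
    """Largest BFS distance from src to any other PE."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in illiac_neighbors(u, n):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


N = 16
print(sorted(illiac_neighbors(0, N)))             # [1, 4, 12, 15]
print(max(eccentricity(s, N) for s in range(N)))  # 3 = sqrt(16) - 1
```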



The chordal ring network is the Illiac IV network. It is also called a partially connected network.

[Figure: the 16-node Illiac IV network drawn as a ring of nodes 0 to 15 with chords.]

By adding extra links (e.g., chords across a circle), the node degree is increased and the network diameter is reduced.

Barrel Shifting Network:

A barrel shifter is sometimes called a plus-minus-$2^i$ network.

Routing functions:

$B_{+i}(j) = (j + 2^i) \bmod N$
$B_{-i}(j) = (j - 2^i) \bmod N$

where $0 \le j \le N-1$ and $0 \le i < \log_2 N$.

For N = 16 these are $B_{+0}, B_{-0}, B_{+1}, B_{-1}, B_{+2}, B_{-2}, B_{+3}$, and $B_{-3}$.

In general, the diameter of a barrel shifter is about $(\log_2 N)/2$.
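The barrel shifter's routing functions can be checked the same way. In this hypothetical sketch for N = 16, `barrel_neighbors` applies every $B_{+i}$ and $B_{-i}$ to a node; note that $B_{+3}$ and $B_{-3}$ coincide when N = 16 (both add 8 mod 16), so each node has 7 distinct neighbours rather than 8. A breadth-first search then gives the diameter.

```python
from collections import deque


def barrel_neighbors(j, n):
    """Apply B_{+i} and B_{-i} for 0 <= i < log2(n); n is assumed to be a power of 2."""
    log_n = n.bit_length() - 1
    out = set()
    for i in range(log_n):
        out.add((j + 2 ** i) % n)
        out.add((j - 2 ** i) % n)
    return out


def barrel_diameter(n):
    """Largest BFS distance between any two nodes of the barrel shifter."""
    worst = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in barrel_neighbors(u, n):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


print(sorted(barrel_neighbors(0, 16)))   # [1, 2, 4, 8, 12, 14, 15]
print(barrel_diameter(16))               # 2 = (log2 16) / 2
```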



How is the barrel shifting network an enhancement over the mesh interconnection network?

In the mesh network (N = 16, cycle notation):

R+1 = (0 1 2 3 ... 15)    R-1 = (15 14 13 ... 0)
R+4 = (0 4 8 12) (1 5 9 13) (2 6 10 14) (3 7 11 15)
R-4 = (15 11 7 3) (14 10 6 2) (13 9 5 1) (12 8 4 0)

In the barrel shifting network:

B+0 = (0 1 2 3 ... 15)    B-0 = (15 14 13 ... 0)
B+1 = (0 2 4 ... 14) (1 3 5 ... 15)    B-1 = (15 13 11 ... 1) (14 12 ... 0)
B+2 = (0 4 8 12) (1 5 9 13) (2 6 10 14) (3 7 11 15)
B-2 = (15 11 7 3) (14 10 6 2) (13 9 5 1) (12 8 4 0)
B+3 = (0 8) (1 9) (2 10) (3 11) (4 12) (5 13) (6 14) (7 15)
B-3 = (15 7) (14 6) (13 5) (12 4) (11 3) (10 2) (9 1) (8 0)

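Contd.

Comparing the two networks:

B+0 = R+1    B-0 = R-1
B+2 = R+4    B-2 = R-4

In general, since $B_{\pm n/2}$ shifts by $2^{n/2} = \sqrt{N} = r$ (for even n):

$B_{+n/2} = R_{+r}, \quad B_{-n/2} = R_{-r}, \quad \text{where } n = \log_2 N$

So the barrel shifter provides every mesh routing function plus additional ones (here B+1, B-1, B+3, and B-3).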
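The cycle lists above can be generated mechanically. This sketch assumes N = 16 and uses hypothetical names (`shift`, `cycles`, `mesh`, `barrel`): each routing function is stored as a permutation list, so the equalities between mesh and barrel-shifter functions become list comparisons.

```python
import math


def shift(n, k):
    """Routing permutation j -> (j + k) mod n, stored as a list."""
    return [(j + k) % n for j in range(n)]


def cycles(perm):
    """Disjoint-cycle decomposition, each cycle starting from its smallest element."""
    seen, result = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, j = [], start
        while j not in seen:
            seen.add(j)
            cycle.append(j)
            j = perm[j]
        result.append(tuple(cycle))
    return result


N = 16
n = N.bit_length() - 1   # n = log2 N = 4
r = math.isqrt(N)        # r = sqrt(N) = 4

mesh = {"+1": shift(N, 1), "-1": shift(N, -1), "+r": shift(N, r), "-r": shift(N, -r)}
barrel = {f"{s}{i}": shift(N, sign * 2 ** i)
          for i in range(n) for s, sign in (("+", 1), ("-", -1))}

print(barrel["+0"] == mesh["+1"], barrel["-0"] == mesh["-1"])                  # True True
print(barrel[f"+{n // 2}"] == mesh["+r"], barrel[f"-{n // 2}"] == mesh["-r"])  # True True
print(cycles(barrel["+2"]))   # [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
```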



3D: Fully Connected

[Figure: fully connected network of 6 nodes, numbered 1 to 6.]

In the limit, we obtain a fully connected network, with a node degree of n-1 and a diameter of 1.

3D Networks: 3-Cube

[Figure: 3-cube with nodes labeled 000, 001, 010, 011, 100, 101, 110, and 111.]

A hypercube is a generalized cube.
In a hypercube, there are $2^n$ nodes, for some n.
Each node is connected to all other nodes whose numbers differ from it in exactly one bit position.
The node degree of an n-cube equals n, and so does the network diameter.
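A brute-force check of these hypercube properties, as a minimal sketch: node labels are n-bit integers, neighbours are found by flipping one bit with XOR, and the diameter comes from breadth-first search. `cube_neighbors` and `cube_diameter` are hypothetical names.

```python
from collections import deque


def cube_neighbors(x, n):
    """Nodes whose n-bit labels differ from x in exactly one bit position."""
    return [x ^ (1 << b) for b in range(n)]


def cube_diameter(n):
    """Largest BFS distance between any two nodes of the n-cube."""
    worst = 0
    for src in range(2 ** n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in cube_neighbors(u, n):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


for n in (3, 4):
    print(f"{n}-cube: {2 ** n} nodes, degree {len(cube_neighbors(0, n))}, diameter {cube_diameter(n)}")
```

Flipping bit 3 of a 4-cube label is exactly the extra link that joins corresponding nodes of two 3-cubes.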



3-Cube-Connected Cycles (3-CCC) Network:

It is obtained from the 3-cube network. The idea is to cut off each corner node of the 3-cube and replace it with a cycle of 3 nodes.

In general, a k-CCC network is constructed from a k-cube by replacing each of its $n = 2^k$ nodes with a cycle of k nodes.

Hence a 3-cube can be transformed into a 3-CCC with $k \times 2^k = 3 \times 2^3 = 24$ nodes.
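One common way to label a k-CCC, assumed here rather than taken from the slides, names each node (x, i), where x is the k-bit label of the replaced cube corner and i is the node's position on that corner's cycle; node (x, i) keeps the cube link along dimension i. A minimal sketch under that labeling (hypothetical names), valid for k ≥ 3, since for k = 2 the two cycle neighbours coincide:

```python
def ccc_neighbors(node, k):
    """Neighbours of node (x, i): two on its own cycle, one across cube dimension i."""
    x, i = node
    return {
        (x, (i + 1) % k),     # next node on the same cycle
        (x, (i - 1) % k),     # previous node on the same cycle
        (x ^ (1 << i), i),    # cube link along dimension i
    }


def build_ccc(k):
    """Adjacency sets for every node of the k-CCC."""
    nodes = [(x, i) for x in range(2 ** k) for i in range(k)]
    return {v: ccc_neighbors(v, k) for v in nodes}


graph = build_ccc(3)
print(len(graph))                               # 24 = 3 * 2**3
print({len(nbrs) for nbrs in graph.values()})   # {3}: every node has degree 3
```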



4D Hypercube

A 4D hypercube consists of two 3D hypercubes with an additional link connecting each pair of corresponding processors.

Multistage Interconnection Network: Switch Modules

An A x B switch module has A inputs and B outputs.
In practice, A = B = a power of 2.
Each input is connected to one or more outputs (conflicts must be avoided).
One-to-one (permutation) and one-to-many connections are allowed.


A 2 x 2 switch can be configured for:
  Straight-through
  Crossover
  Upper broadcast (upper input to both outputs)
  Lower broadcast (lower input to both outputs)

Binary Switch

2 x 2 switch: legitimate states = 4.

Perfect-Shuffle Interconnection:

This interconnection network is defined by the routing function

$S(a_{n-1} a_{n-2} \dots a_1 a_0)_2 = (a_{n-2} \dots a_1 a_0 a_{n-1})_2$

[Figure: Perfect Shuffle and Inverse Shuffle connections between inputs 0 to 7 and outputs 0 to 7.]
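Because S moves the most significant bit to the least significant end, the perfect shuffle is a one-bit left rotation of the address, and the inverse shuffle is a one-bit right rotation. A minimal sketch for N = 8 (n = 3); the function names are hypothetical.

```python
def shuffle(x, n):
    """Perfect shuffle S: rotate the n-bit address left by one bit."""
    msb = (x >> (n - 1)) & 1
    return ((x << 1) & ((1 << n) - 1)) | msb


def inverse_shuffle(x, n):
    """Inverse shuffle: rotate the n-bit address right by one bit."""
    lsb = x & 1
    return (x >> 1) | (lsb << (n - 1))


n = 3   # N = 2**n = 8
print([shuffle(x, n) for x in range(8)])          # [0, 2, 4, 6, 1, 3, 5, 7]
print([inverse_shuffle(x, n) for x in range(8)])  # [0, 4, 1, 5, 2, 6, 3, 7]
```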

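A shuffle network is not a complete interconnection network. This can be seen by looking at what happens as data is recirculated through the network: addresses 0 (all zeros) and N-1 (all ones) are mapped to themselves by S, so data at those nodes can never reach any other node.

An exchange permutation can be added to a shuffle network to make it into a complete interconnection structure.

[Figure: shuffle-exchange network on nodes 0 to 7.]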

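With a shuffle-exchange network, arbitrary cyclic shifts of an N-element array can be performed in $\log_2 N$ steps. The exchange routing function complements the least significant address bit:

$E(a_{n-1} \dots a_1 a_0)_2 = (a_{n-1} \dots a_1 \bar{a}_0)_2$

The following diagram shows a multistage Omega network for N = 8.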


[Figure: multistage Omega network for N = 8. Inputs 0 to 7 pass through three stages, each a perfect shuffle (Shuffle 1, 2, 3) followed by a column of 2 x 2 exchange switches (Exch. 1, 2, 3), to outputs 0 to 7.]
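The completeness argument for the shuffle-exchange network can be simulated directly. In this sketch (hypothetical names, N = 8), `reachable` collects every address that data can visit from a source node by recirculating through the shuffle alone, or through the shuffle plus the exchange E.

```python
def shuffle(x, n):
    """Perfect shuffle: rotate the n-bit address left by one bit."""
    return ((x << 1) & ((1 << n) - 1)) | (x >> (n - 1))


def exchange(x):
    """Exchange E: complement the least significant address bit."""
    return x ^ 1


def reachable(src, n, use_exchange):
    """Addresses data can reach from src by repeated passes through the network."""
    seen, frontier = {src}, [src]
    while frontier:
        x = frontier.pop()
        nxt = [shuffle(x, n)] + ([exchange(x)] if use_exchange else [])
        for y in nxt:
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen


n = 3
print(sorted(reachable(0, n, use_exchange=False)))   # [0]: 0 is a fixed point of S
print(sorted(reachable(0, n, use_exchange=True)))    # [0, 1, 2, 3, 4, 5, 6, 7]
```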

Omega network features:

There are $\log_2 p$ stages, each with p/2 switching elements, for a total of $(p/2) \log_2 p$ switches.

Simple routing algorithm:
At each stage, look at the corresponding bit (starting with the MSB) of the source and destination addresses.
If the bits are the same, the message passes straight through; otherwise, it is crossed over.
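The bit-comparison rule can be simulated in a few lines. This sketch assumes the usual Omega wiring: a perfect shuffle in front of every stage of 2 x 2 switches, where a switch's two outputs differ only in the least significant address bit, so "straight" keeps that bit and "cross" flips it. `omega_route` is a hypothetical name; tracking the link address stage by stage shows the message ends at its destination.

```python
def omega_route(src, dst, n):
    """Route src -> dst through an n-stage Omega network with N = 2**n ports.

    Returns ([(stage, switch, setting), ...], output reached).
    """
    mask = (1 << n) - 1
    pos = src
    path = []
    for stage in range(n):
        pos = ((pos << 1) & mask) | (pos >> (n - 1))   # perfect shuffle in front of the stage
        bit = n - 1 - stage                              # compare bits starting with the MSB
        same = ((src >> bit) & 1) == ((dst >> bit) & 1)
        if not same:
            pos ^= 1                                     # crossover moves to the other output
        path.append((stage + 1, pos >> 1, "straight" if same else "cross"))
    return path, pos


p = 8
stages = p.bit_length() - 1
print(stages, p // 2, stages * (p // 2))   # 3 4 12
path, out = omega_route(0b010, 0b111, stages)
print(path, out)
```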


[Figure: Path Contention in an 8 x 8 Omega network, inputs and outputs 0 to 7.]


Extra Problems



Find the following for a 16 x 16 Omega network:

a) The number of stages.
b) The number of 2 x 2 switches needed in each stage.
c) Draw a 16-input Omega network using 2 x 2 switches as building blocks.
d) Show the switch settings for routing a message from node 1101 to node 0101 and from node 0111 to node 1001 simultaneously, and from node 2 to node 7 and from node 6 to node 4 simultaneously.
e) Does blocking exist in the above two cases?
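Parts (d) and (e) can be explored with the bit-comparison routing rule: two messages block each other when, at the same stage, they need the same switch output link. A sketch assuming a perfect shuffle in front of each of the $\log_2 N$ stages, and treating the second pair (2 to 7 and 6 to 4) as routed in the same 16 x 16 network; `omega_path` and `blocked` are hypothetical names.

```python
def omega_path(src, dst, n):
    """Per stage (switch index, setting, output link) for src -> dst with N = 2**n ports."""
    mask = (1 << n) - 1
    pos, path = src, []
    for stage in range(n):
        pos = ((pos << 1) & mask) | (pos >> (n - 1))   # perfect shuffle into the stage
        bit = n - 1 - stage                              # compare address bits from the MSB
        cross = ((src >> bit) & 1) != ((dst >> bit) & 1)
        if cross:
            pos ^= 1                                     # crossover: take the other output
        path.append((pos >> 1, "cross" if cross else "straight", pos))
    return path


def blocked(pair_a, pair_b, n):
    """True if two simultaneous messages need the same output link at some stage."""
    path_a = omega_path(pair_a[0], pair_a[1], n)
    path_b = omega_path(pair_b[0], pair_b[1], n)
    return any(a[2] == b[2] for a, b in zip(path_a, path_b))


n = 4   # 16 x 16 network: 4 stages of 8 switches
for src, dst in [(0b1101, 0b0101), (0b0111, 0b1001), (2, 7), (6, 4)]:
    print(f"{src:04b} -> {dst:04b}:", omega_path(src, dst, n))
print(blocked((0b1101, 0b0101), (0b0111, 0b1001), n))   # False
print(blocked((2, 7), (6, 4), n))                       # True
```

Under these assumptions the first pair never shares a link, while 2 to 7 and 6 to 4 both need output link 1001 after stage 2, so that pair blocks.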



Show the necessity of data routing and masking mechanisms during

the addition of 15 numbers. Assume each PE holds one element.

Find the number of steps required to add 16 elements.

Calculate the different routing functions.

Show the routing and addition in each step.
