Parallel Computer Models
TRANSCRIPT
-
7/28/2019 parallel computer Models
Parallel Computer Models
CEG 4131 Computer Architecture III
Miodrag Bolic
Overview
Flynn's taxonomy
Classification based on the memory arrangement
Classification based on communication
Classification based on the kind of parallelism
- Data-parallel
- Function-parallel
Flynn's Taxonomy
The most universally accepted method of classifying computer systems
Published in the Proceedings of the IEEE in 1966
Any computer can be placed in one of 4 broad categories
SISD: Single instruction stream, single data stream
SIMD: Single instruction stream, multiple data streams
MIMD: Multiple instruction streams, multiple data streams
MISD: Multiple instruction streams, single data stream
SISD
[Figure: a control unit issues an instruction stream (IS) to a single processing element (PE), which exchanges a data stream (DS) with main memory (M).]
SIMD
Applications:
- Image processing
- Matrix manipulations
- Sorting
SIMD Architectures
Fine-grained
- Image processing applications
- Large number of PEs
- Minimum-complexity PEs
- Programming language is a simple extension of a sequential language
Coarse-grained
- Each PE is of higher complexity and is usually built with commercial devices
- Each PE has local memory
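The distinction above can be made concrete with a small sketch. This is a conceptual illustration in plain Python, not real SIMD hardware: the point is that one "instruction" (an elementwise add) is applied across multiple data streams at once, whereas an SISD-style machine handles one datum per instruction step.

```python
# Conceptual sketch only: SIMD applies a single instruction to many
# data elements; SISD processes one element per instruction step.

def simd_add(a, b):
    """One logical 'instruction' over multiple data streams."""
    return [x + y for x, y in zip(a, b)]

def sisd_add(a, b):
    """SISD-style equivalent: each loop iteration handles one datum."""
    out = []
    for i in range(len(a)):
        out.append(a[i] + b[i])
    return out

print(simd_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```

Both functions compute the same result; the difference being illustrated is the execution model, not the answer.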
MIMD
MISD
Applications:
- Classification
- Robot vision
Flynn taxonomy
Advantages of Flynn's taxonomy
- Universally accepted
- Compact notation
- Easy to classify a system (?)
Disadvantages of Flynn's taxonomy
- Very coarse-grain differentiation among machine systems
- Comparison of different systems is limited
- Interconnections, I/O, and memory are not considered in the scheme
Classification based on memory arrangement
[Figure: two organizations. Shared memory (multiprocessors): processors PE1…PEn connect through an interconnection network to a common shared memory. Message passing (multicomputers): nodes, each a processor (P1…Pn) with its own memory (M1…Mn) and I/O, connect through an interconnection network.]
Shared-memory multiprocessors
- Uniform Memory Access (UMA)
- Non-Uniform Memory Access (NUMA)
- Cache-Only Memory Architecture (COMA)
Memory is common to all the processors.
Processors communicate easily by means of shared variables.
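The shared-variable communication style can be sketched in a few lines. This is a minimal illustration using Python threads standing in for processors: all threads update one common variable, and a lock serializes access to the shared location (the same role a cache-coherence/synchronization mechanism plays in hardware).

```python
# Minimal sketch of the shared-memory model: "processors" (threads)
# communicate through a shared variable guarded by a lock.
import threading

counter = 0                  # shared variable, visible to all threads
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:           # serialize updates to the shared location
            counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4000
```

Without the lock, the four workers could interleave their read-modify-write sequences and lose updates, which is exactly the hazard shared-variable communication introduces.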
The UMA Model
Tightly coupled systems (high degree of resource sharing)
Suitable for general-purpose and time-sharing applications by multiple users.
[Figure: processors P1…Pn, each with a cache ($), connect through an interconnection network to shared memory modules.]
Symmetric and asymmetric multiprocessors
Symmetric:
- all processors have equal access to all peripheral devices
- all processors are identical
Asymmetric:
- one processor (master) executes the operating system
- other processors may be of different types and may be dedicated to special tasks
The NUMA Model
The access time varies with the location of the memory word.
Shared memory is distributed to local memories.
All local memories form a global address space accessible by all processors.
[Figure: distributed-memory (NUMA) organization — processors P1…Pn, each with a cache ($) and a local memory module, connected by an interconnection network.]
Access time: cache, local memory, remote memory
COMA: Cache-Only Memory Architecture
Distributed-memory multicomputers
- Multiple computers (nodes)
- Message-passing network
- Local memories are private, each with its own program and data
- No memory contention, so the number of processors can be very large
- The processors are connected by communication lines, and the precise way in which the lines are connected is called the topology of the multicomputer.
- A typical program consists of subtasks residing in all the memories.
[Figure: nodes, each a processing element (PE) with its own local memory (M), connected by an interconnection network.]
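The message-passing style above can be sketched in software. This is a simulation, not a real multicomputer: Python threads stand in for nodes, each keeps its data private, and queues play the role of the communication lines, so results travel only as messages rather than through shared variables.

```python
# Sketch of the message-passing model: "nodes" hold private data and
# communicate only via explicit messages (queues as communication lines).
import threading
import queue

def node(outbox, work):
    partial = sum(work)      # computed entirely in the node's private memory
    outbox.put(partial)      # the result leaves the node only as a message

link = queue.Queue()         # communication line back to the "master" node
chunks = [list(range(0, 50)), list(range(50, 100))]
nodes = [threading.Thread(target=node, args=(link, chunk)) for chunk in chunks]
for t in nodes:
    t.start()
for t in nodes:
    t.join()

total = link.get() + link.get()  # master gathers the two partial sums
print(total)  # 4950
```

Contrast this with the shared-memory sketch: here there is no shared counter and no lock; coordination happens entirely through send and receive.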
Classification based on type of interconnections
Static networks
Dynamic networks
Interconnection Network [1]
- Mode of operation (synchronous vs. asynchronous)
- Control strategy (centralized vs. decentralized)
- Switching techniques (packet switching vs. circuit switching)
- Topology (static vs. dynamic)
Classification based on the kind of parallelism [3]
Parallel architectures (PAs)
- Data-parallel architectures (DPs)
  - Vector architectures
  - Associative and neural architectures
  - SIMDs
  - Systolic architectures
- Function-parallel architectures
  - Instruction-level PAs (ILPs)
    - Pipelined processors
    - VLIWs
    - Superscalar processors
  - Thread-level PAs
  - Process-level PAs (MIMDs)
    - Distributed-memory MIMD (multicomputer)
    - Shared-memory MIMD (multiprocessor)
References
1. Hesham El-Rewini and Mostafa Abd-El-Barr, Advanced Computer Architecture and Parallel Processing, John Wiley and Sons, 2005. http://ca.wiley.com/WileyCDA/WileyTitle/productCd-0471467405.html
2. K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill, 1993.
3. Dezső Sima, Terence Fountain, and Péter Kacsuk, Advanced Computer Architectures: A Design Space Approach, Pearson, 1997.
Speedup
S = Speed(new) / Speed(old)
S = [Work/time(new)] / [Work/time(old)]
S = time(old) / time(new)
S = time(before improvement) / time(after improvement)
Speedup
Time (one CPU): T(1)
Time (n CPUs): T(n)
Speedup: S
S = T(1)/T(n)
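The definition S = T(1)/T(n) is a one-liner in code. A minimal sketch, with the example times chosen arbitrarily for illustration:

```python
def speedup(t_one, t_n):
    """Speedup of a parallel run: S = T(1) / T(n)."""
    return t_one / t_n

# Hypothetical timings: 70 s on one CPU, 40 s on n CPUs.
print(speedup(70, 40))  # 1.75
```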
Amdahl's Law
The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
Example
Travel 200 miles from A to B. Regardless of the mode chosen for the 200 miles, 20 hours must be spent walking.
Modes for the 200 miles:
- Walk: 4 miles/hour
- Bike: 10 miles/hour
- Car-1: 50 miles/hour
- Car-2: 120 miles/hour
- Car-3: 600 miles/hour
Example
200 miles from A to B, plus the 20 hours that must be walked:
- Walk, 4 miles/hour: 50 + 20 = 70 hours, S = 1
- Bike, 10 miles/hour: 20 + 20 = 40 hours, S = 1.8
- Car-1, 50 miles/hour: 4 + 20 = 24 hours, S = 2.9
- Car-2, 120 miles/hour: 1.67 + 20 = 21.67 hours, S = 3.2
- Car-3, 600 miles/hour: 0.33 + 20 = 20.33 hours, S = 3.4
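The table's numbers can be reproduced directly. A short sketch that recomputes each total time (200 miles at the given speed, plus the fixed 20-hour walk) and the speedup relative to walking everything:

```python
# Recompute the slide's example: the 20-hour walk is the "serial"
# part that no faster mode of travel can remove.
MILES, WALK_HOURS = 200, 20
modes = {"Walk": 4, "Bike": 10, "Car-1": 50, "Car-2": 120, "Car-3": 600}

baseline = MILES / modes["Walk"] + WALK_HOURS       # 70 hours on foot
for name, mph in modes.items():
    total = MILES / mph + WALK_HOURS
    print(f"{name}: {total:.2f} hours, S = {baseline / total:.1f}")
```

Note how even the 600 mph mode only reaches S = 3.4: the fixed 20 hours dominates, which is exactly Amdahl's point.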
Amdahl's Law (1967)
α: the fraction of the program that is naturally serial
(1 - α): the fraction of the program that is naturally parallel
S = T(1)/T(N)
T(N) = T(1)·α + T(1)·(1 - α)/N
S = 1 / (α + (1 - α)/N) = N / (α·N + (1 - α))
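The closed form translates directly into code. A minimal sketch, using an illustrative serial fraction of 5%:

```python
def amdahl_speedup(alpha, n):
    """Amdahl's law: S = N / (alpha*N + (1 - alpha)),
    where alpha is the serial fraction and n the number of processors."""
    return n / (alpha * n + (1 - alpha))

# Illustrative values: a 5% serial fraction caps speedup well below n.
print(amdahl_speedup(0.05, 10))    # ~6.9, not 10
print(amdahl_speedup(0.05, 1000))  # ~19.6, approaching the 1/alpha = 20 limit
```

As N grows, S approaches 1/α, echoing the walking example: the serial fraction sets a hard ceiling on the achievable speedup.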
Amdahl's Law