Parallel Computer Models
TRANSCRIPT
-
7/28/2019 parallel computer Models
Parallel Computer Models
CEG 4131 Computer Architecture III
Miodrag Bolic
Overview
Flynn's taxonomy
Classification based on the memory arrangement
Classification based on communication
Classification based on the kind of parallelism
- Data-parallel
- Function-parallel
Flynn's Taxonomy
The most universally accepted method of classifying computer systems
Published in the Proceedings of the IEEE in 1966
Any computer can be placed in one of 4 broad categories
SISD: Single instruction stream, single data stream
SIMD: Single instruction stream, multiple data streams
MIMD: Multiple instruction streams, multiple data streams
MISD: Multiple instruction streams, single data stream
SISD
[Figure: a control unit issues an instruction stream (IS) to a single processing element (PE), which exchanges a data stream (DS) with main memory (M).]
SIMD
Applications:
- Image processing
- Matrix manipulations
- Sorting
SIMD Architectures
Fine-grained
- Image processing applications
- Large number of PEs
- Minimum-complexity PEs
- Programming language is a simple extension of a sequential language
Coarse-grained
- Each PE is of higher complexity and is usually built with commercial devices
- Each PE has local memory
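The distinction above can be made concrete with a small sketch. This is a conceptual illustration in plain Python, not real SIMD hardware: the point is that one "instruction" (an elementwise add) is applied across multiple data streams at once, whereas an SISD-style machine handles one datum per instruction step.

```python
# Conceptual sketch only: SIMD applies a single instruction to many
# data elements; SISD processes one element per instruction step.

def simd_add(a, b):
    """One logical 'instruction' over multiple data streams."""
    return [x + y for x, y in zip(a, b)]

def sisd_add(a, b):
    """SISD-style equivalent: each loop iteration handles one datum."""
    out = []
    for i in range(len(a)):
        out.append(a[i] + b[i])
    return out

print(simd_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```

Both functions compute the same result; the difference being illustrated is the execution model, not the answer.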
MIMD
MISD
Applications:
- Classification
- Robot vision
Flynn taxonomy
Advantages of Flynn's taxonomy
- Universally accepted
- Compact notation
- Easy to classify a system (?)
Disadvantages of Flynn's taxonomy
- Very coarse-grain differentiation among machine systems
- Comparison of different systems is limited
- Interconnections, I/O, and memory are not considered in the scheme
Classification based on memory arrangement
[Figure: two organizations. Shared memory (multiprocessors): processors PE1…PEn connect through an interconnection network to a common shared memory. Message passing (multicomputers): nodes, each a processor (P1…Pn) with its own memory (M1…Mn) and I/O, connect through an interconnection network.]
Shared-memory multiprocessors
- Uniform Memory Access (UMA)
- Non-Uniform Memory Access (NUMA)
- Cache-Only Memory Architecture (COMA)
Memory is common to all the processors.
Processors communicate easily by means of shared variables.
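The shared-variable communication style can be sketched in a few lines. This is a minimal illustration using Python threads standing in for processors: all threads update one common variable, and a lock serializes access to the shared location (the same role a cache-coherence/synchronization mechanism plays in hardware).

```python
# Minimal sketch of the shared-memory model: "processors" (threads)
# communicate through a shared variable guarded by a lock.
import threading

counter = 0                  # shared variable, visible to all threads
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:           # serialize updates to the shared location
            counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4000
```

Without the lock, the four workers could interleave their read-modify-write sequences and lose updates, which is exactly the hazard shared-variable communication introduces.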
The UMA Model
Tightly coupled systems (high degree of resource sharing)
Suitable for general-purpose and time-sharing applications by multiple users.
[Figure: processors P1…Pn, each with a cache ($), connect through an interconnection network to shared memory modules.]
Symmetric and asymmetric multiprocessors
Symmetric:
- all processors have equal access to all peripheral devices
- all processors are identical
Asymmetric:
- one processor (master) executes the operating system
- other processors may be of different types and may be dedicated to special tasks
The NUMA Model
The access time varies with the location of the memory word.
Shared memory is distributed to local memories.
All local memories form a global address space accessible by all processors.
[Figure: distributed-memory (NUMA) organization — processors P1…Pn, each with a cache ($) and a local memory module, connected by an interconnection network.]
Access time: cache, local memory, remote memory
COMA: Cache-Only Memory Architecture
Distributed-memory multicomputers
- Multiple computers (nodes)
- Message-passing network
- Local memories are private, each with its own program and data
- No memory contention, so the number of processors can be very large
- The processors are connected by communication lines, and the precise way in which the lines are connected is called the topology of the multicomputer.
- A typical program consists of subtasks residing in all the memories.
[Figure: nodes, each a processing element (PE) with its own local memory (M), connected by an interconnection network.]
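The message-passing style above can be sketched in software. This is a simulation, not a real multicomputer: Python threads stand in for nodes, each keeps its data private, and queues play the role of the communication lines, so results travel only as messages rather than through shared variables.

```python
# Sketch of the message-passing model: "nodes" hold private data and
# communicate only via explicit messages (queues as communication lines).
import threading
import queue

def node(outbox, work):
    partial = sum(work)      # computed entirely in the node's private memory
    outbox.put(partial)      # the result leaves the node only as a message

link = queue.Queue()         # communication line back to the "master" node
chunks = [list(range(0, 50)), list(range(50, 100))]
nodes = [threading.Thread(target=node, args=(link, chunk)) for chunk in chunks]
for t in nodes:
    t.start()
for t in nodes:
    t.join()

total = link.get() + link.get()  # master gathers the two partial sums
print(total)  # 4950
```

Contrast this with the shared-memory sketch: here there is no shared counter and no lock; coordination happens entirely through send and receive.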
Classification based on type of interconnections
Static networks
Dynamic networks
Interconnection Network [1]
- Mode of operation (synchronous vs. asynchronous)
- Control strategy (centralized vs. decentralized)
- Switching techniques (packet switching vs. circuit switching)
- Topology (static vs. dynamic)
Classification based on the kind of parallelism [3]
Parallel architectures (PAs)
- Data-parallel architectures (DPs)
  - Vector architectures
  - Associative and neural architectures
  - SIMDs
  - Systolic architectures
- Function-parallel architectures
  - Instruction-level PAs (ILPs)
    - Pipelined processors
    - VLIWs
    - Superscalar processors
  - Thread-level PAs
  - Process-level PAs (MIMDs)
    - Distributed-memory MIMD (multicomputer)
    - Shared-memory MIMD (multiprocessor)
References
1. Hesham El-Rewini and Mostafa Abd-El-Barr, Advanced Computer Architecture and Parallel Processing, John Wiley and Sons, 2005. http://ca.wiley.com/WileyCDA/WileyTitle/productCd-0471467405.html
2. K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill, 1993.
3. Dezső Sima, Terence Fountain, and Péter Kacsuk, Advanced Computer Architectures: A Design Space Approach, Pearson, 1997.
Speedup
S = Speed(new) / Speed(old)
S = [Work/time(new)] / [Work/time(old)]
S = time(old) / time(new)
S = time(before improvement) / time(after improvement)
Speedup
Time (one CPU): T(1)
Time (n CPUs): T(n)
Speedup: S
S = T(1)/T(n)
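The definition S = T(1)/T(n) is a one-liner in code. A minimal sketch, with the example times chosen arbitrarily for illustration:

```python
def speedup(t_one, t_n):
    """Speedup of a parallel run: S = T(1) / T(n)."""
    return t_one / t_n

# Hypothetical timings: 70 s on one CPU, 40 s on n CPUs.
print(speedup(70, 40))  # 1.75
```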
Amdahl's Law
The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
Example
Travel 200 miles from A to B. Regardless of the mode chosen for the 200 miles, 20 hours must be spent walking.
Modes for the 200 miles:
- Walk: 4 miles/hour
- Bike: 10 miles/hour
- Car-1: 50 miles/hour
- Car-2: 120 miles/hour
- Car-3: 600 miles/hour
Example
200 miles from A to B, plus the 20 hours that must be walked:
- Walk, 4 miles/hour: 50 + 20 = 70 hours, S = 1
- Bike, 10 miles/hour: 20 + 20 = 40 hours, S = 1.8
- Car-1, 50 miles/hour: 4 + 20 = 24 hours, S = 2.9
- Car-2, 120 miles/hour: 1.67 + 20 = 21.67 hours, S = 3.2
- Car-3, 600 miles/hour: 0.33 + 20 = 20.33 hours, S = 3.4
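The table's numbers can be reproduced directly. A short sketch that recomputes each total time (200 miles at the given speed, plus the fixed 20-hour walk) and the speedup relative to walking everything:

```python
# Recompute the slide's example: the 20-hour walk is the "serial"
# part that no faster mode of travel can remove.
MILES, WALK_HOURS = 200, 20
modes = {"Walk": 4, "Bike": 10, "Car-1": 50, "Car-2": 120, "Car-3": 600}

baseline = MILES / modes["Walk"] + WALK_HOURS       # 70 hours on foot
for name, mph in modes.items():
    total = MILES / mph + WALK_HOURS
    print(f"{name}: {total:.2f} hours, S = {baseline / total:.1f}")
```

Note how even the 600 mph mode only reaches S = 3.4: the fixed 20 hours dominates, which is exactly Amdahl's point.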
Amdahl's Law (1967)
α: the fraction of the program that is naturally serial
(1 - α): the fraction of the program that is naturally parallel
S = T(1)/T(N)
T(N) = T(1)·α + T(1)·(1 - α)/N
S = 1 / (α + (1 - α)/N) = N / (α·N + (1 - α))
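The closed form translates directly into code. A minimal sketch, using an illustrative serial fraction of 5%:

```python
def amdahl_speedup(alpha, n):
    """Amdahl's law: S = N / (alpha*N + (1 - alpha)),
    where alpha is the serial fraction and n the number of processors."""
    return n / (alpha * n + (1 - alpha))

# Illustrative values: a 5% serial fraction caps speedup well below n.
print(amdahl_speedup(0.05, 10))    # ~6.9, not 10
print(amdahl_speedup(0.05, 1000))  # ~19.6, approaching the 1/alpha = 20 limit
```

As N grows, S approaches 1/α, echoing the walking example: the serial fraction sets a hard ceiling on the achievable speedup.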
Amdahl's Law