13. Multiprocessing
Multiprocessor & Multicomputer Organisation
Parallel and Distributed Computing
Multiprocessing :: Slide 1 of 25David Rye :: MTRX 3700
Multiprocessors and Multicomputers
A multiprocessor system has more than one processor, with common memory shared between processors
A multicomputer system has more than one processor, with each processor having local memory
In either case, processors may be on a common bus (closely coupled), or distributed on a network (loosely coupled)

Multiprocessing Systems
Generally accepted definition of a multiprocessing/multicomputing system:
  Multiple processors, each with its own CPU and memory
  Interconnection hardware
  Processors fail independently
  There exists a shared state
  Appears to users as a single system

Flynn’s Taxonomy
Computer system organisation described by two characteristics
  Number of instruction streams
  Number of data streams
SISD (PC)
SIMD (Supercomputer)
MISD (??)
MIMD (network of processors or network of computers)
  Tightly coupled (backplane)
  Loosely coupled (network)
Limited usefulness, but serves to categorise…

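The two characteristics above are enough to name the four classes. As a minimal sketch (the function name `flynn_class` and the example labels are illustrative assumptions, not part of the lecture), the classification can be written as:

```python
def flynn_class(instruction_streams, data_streams):
    """Return the Flynn class name for the given numbers of streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"  # Single or Multiple instruction streams
    d = "S" if data_streams == 1 else "M"         # Single or Multiple data streams
    return f"{i}I{d}D"


# Hypothetical example machines, labelled after the categories on the slide.
examples = {
    "PC": flynn_class(1, 1),
    "array processor with 64 data streams": flynn_class(1, 64),
    "network of 8 computers": flynn_class(8, 8),
}
```

Only "one" versus "more than one" matters for each stream count, which is part of why the taxonomy is coarse.
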
SISD
Single Instruction stream, Single Data stream
All conventional uniprocessor systems are SISD, from PCs to mainframes
Examples: 8080, M6800, M68000, i8086, etc.

SISD
Can include Harvard memory organisation, pipelined units
May execute more than one instruction simultaneously (superscalar processor)
[Figure: Processor ‘P’ with fetch, decode and execute units, connected to instruction memory Minstr, data memory Mdata, and to I/O]

SIMD
Single Instruction stream, Multiple Data stream
Often called “Array Processor” or “Vector Architecture”
One instruction unit that fetches an instruction, then commands many processing elements to execute the same instruction simultaneously on many different data sets

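To make the single instruction unit concrete, here is a small sketch of one instruction being broadcast to several processing elements. The class `ProcessingElement` and the function `broadcast` are hypothetical names. The loop runs sequentially in Python, whereas real SIMD hardware executes the elements simultaneously.

```python
class ProcessingElement:
    """One processing element holding a single value in its local memory."""

    def __init__(self, local_data):
        self.local_data = local_data

    def execute(self, instruction):
        # Apply the broadcast instruction to this element's own data.
        self.local_data = instruction(self.local_data)


def broadcast(instruction, elements):
    """Instruction unit: issue the same instruction to every processing element."""
    for pe in elements:  # sequential stand-in for simultaneous execution
        pe.execute(instruction)


elements = [ProcessingElement(x) for x in [1, 2, 3, 4]]
broadcast(lambda x: x * 2, elements)  # one instruction, four data items
broadcast(lambda x: x + 1, elements)
results = [pe.local_data for pe in elements]
```
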
SIMD
Organisation is usually in the form of a network of processing elements with local memory
Various topologies are used, and may be dynamically configured - e.g. 64k processors in the CM-2
[Figure: Master CPU with Memory and I/O, controlling Processing Elements with Local Memory]

[Figure: Nearest neighbour network, a grid of processing elements P11 to Pxy; may be end-around connected]
[Figure: 3-cube network of processing elements P000 to P111]

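The two networks in the figure can be described by simple neighbour rules. The sketch below assumes 0-based row and column indices (the figure labels start at 1) and uses the hypothetical names `cube_neighbours` and `grid_neighbours`. In the cube, neighbours differ in exactly one address bit. In the grid, end-around connection wraps the edges.

```python
def cube_neighbours(node, dimensions=3):
    """Neighbours of a node in a binary cube: flip each address bit in turn."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


def grid_neighbours(row, col, rows, cols, end_around=False):
    """Nearest neighbours (up, down, left, right) of a grid element."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    result = []
    for r, c in candidates:
        if end_around:
            result.append((r % rows, c % cols))  # wrap around the edges
        elif 0 <= r < rows and 0 <= c < cols:
            result.append((r, c))                # drop positions off the grid
    return result


p000 = cube_neighbours(0b000)                          # P001, P010, P100
corner_open = grid_neighbours(0, 0, 4, 4)              # corner element, no wrap
corner_wrapped = grid_neighbours(0, 0, 4, 4, end_around=True)
```
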
SIMD
Examples - mainly Supercomputers
  Goodyear Aerospace MPP (Massively Parallel Processor)
  ICL DAP (Distributed Array Processor)
  Thinking Machines Corp CM-1 and CM-2
Uses are computational rather than for control
Comment: In 2011, only 1 of the world’s top 500 supercomputers (see TOP500) had a vector architecture

Dead (Super) Computer Society
  ACRI
  Alliant
  American Supercomputer
  Ametek
  Applied Dynamics
  Astronautics
  BBN
  CDC
  Convex
  Cray Computer
  Cray Research
  Culler-Harris
  Culler Scientific
  Cydrome
  Dana/Ardent/Stellar/Stardent
  Denelcor
  Elxsi
  ETA Systems
  Evans and Sutherland Computer Division
  Floating Point Systems
  Galaxy YH-1
  Goodyear Aerospace MPP
  Gould NPL
  Guiltech
  Intel Scientific Computers
  International Parallel Machines
  Kendall Square Research
  Key Computer Laboratories
  MasPar
  Meiko
  Multiflow
  Myrias
  Numerix
  nCube
  Prisma
  Thinking Machines
  Saxpy
  Scientific Computer Systems (SCS)
  Soviet Supercomputers
  Supertek
  Supercomputer Systems (SSI)
  Suprenum
  Vitesse Electronics
(from http://www.paralogos.com/DeadSuper/ ; see also their Architectural Themes page)

MISD
Multiple Instruction stream, Single Data stream
No true implementations
Pipelined processors are sometimes regarded as MISD (each data element is processed by sequential segments of the pipeline)
  Pipeline stages: Fetch, Decode, Execute, Write
Examples: Cray-1, CDC Cyber 205, PIC18...

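The idea that each element passes through sequential pipeline segments can be sketched as a cycle-by-cycle schedule. This assumes an ideal pipeline with one cycle per stage and no stalls. The function name `pipeline_schedule` is hypothetical.

```python
STAGES = ["Fetch", "Decode", "Execute", "Write"]


def pipeline_schedule(n_items, stages=STAGES):
    """Map each clock cycle to the (item, stage) pairs active in that cycle."""
    schedule = {}
    for item in range(n_items):
        for offset, stage in enumerate(stages):
            cycle = item + offset  # each item enters one cycle after the previous one
            schedule.setdefault(cycle, []).append((item, stage))
    return schedule


schedule = pipeline_schedule(3)
total_cycles = max(schedule) + 1  # equals n_items + len(STAGES) - 1 for an ideal pipeline
```

In cycle 2 of this example, three different stages are busy at once, each working on a different item.
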
MIMD
Multiple Instruction stream, Multiple Data stream
Essentially a group of independent computers
All distributed systems are MIMD

Parallel and Distributed Computers
[Figure: A taxonomy of parallel & distributed computer systems]
Parallel & distributed computers
  Multiprocessors (shared memory), tightly coupled
    Bus: Sequent, Encore
    Switched: Ultracomputer, RP3
  Multicomputers (private memory), loosely coupled
    Bus: Workstations on a LAN
    Switched: Hypercube, Transputer

Structural Classification
Computer system is essentially
  ‘p’ processing elements = (CPU + registers + cache)
  ‘m’ memory units
  joined by an interconnection network
Memory may be local to a processor, shared or both
[Figure: ‘p’ Processors P1, P2, ..., Pp and ‘m’ Memories M1, M2, ..., Mm joined by an Interconnection Network]

Shared Memory (Multiprocessor)
[Figure: ‘p’ Processors P1, P2, ..., Pp connected through Interconnection Network N to shared Memory M]
Distributed Memory (Multicomputer or distributed computer system)
[Figure: ‘c’ Computers C1, C2, ..., Cc (c = P and M), with processors P1, P2, ..., Pc and local memories M1, M2, ..., Mc, connected through Interconnection Network N]

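With distributed memory, a write to one computer's local memory is not visible to another computer until something is sent over the interconnection network. The sketch below is one way to picture this. The classes `Computer` and `Network` and the (address, value) message format are assumptions made for illustration.

```python
from collections import deque


class Computer:
    """A processor with its own local memory and an inbox for network messages."""

    def __init__(self, name, size=4096):
        self.name = name
        self.memory = bytearray(size)  # local memory, not visible to other computers
        self.inbox = deque()

    def handle_messages(self):
        # Apply each received (address, value) message to local memory.
        while self.inbox:
            address, value = self.inbox.popleft()
            self.memory[address] = value


class Network:
    """Interconnection network N: delivers messages between computers by name."""

    def __init__(self, computers):
        self.computers = {c.name: c for c in computers}

    def send(self, destination, address, value):
        self.computers[destination].inbox.append((address, value))


c1, c2 = Computer("C1"), Computer("C2")
network = Network([c1, c2])

c1.memory[2000] = 0x55           # local write on C1
before = c2.memory[2000]         # C2 does not see it
network.send("C2", 2000, 0x55)   # explicit message over the network
c2.handle_messages()
after = c2.memory[2000]
```
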
Shared Memory
If processor A writes 0x55 to its address 2000, then processor B will read 0x55 from its address 2000. This is a multiprocessor
Obviously, some mechanism is needed to resolve contention for the shared resource

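The behaviour described above can be imitated with two threads sharing one block of memory. This is only a sketch. Threads stand in for processors, a `threading.Lock` stands in for whatever hardware mechanism resolves contention, and the `threading.Event` is added so that the read happens after the write.

```python
import threading

shared_memory = bytearray(4096)  # one memory visible to both "processors"
memory_lock = threading.Lock()   # assumed contention-resolution mechanism
written = threading.Event()      # signals that processor A has finished writing
read_back = []


def processor_a():
    with memory_lock:            # only one access to the shared resource at a time
        shared_memory[2000] = 0x55
    written.set()


def processor_b():
    written.wait()               # wait until A's write has completed
    with memory_lock:
        read_back.append(shared_memory[2000])


threads = [threading.Thread(target=processor_b), threading.Thread(target=processor_a)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
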
Multiprocessor interconnections may be
Bussed (time shared)
  only one bus write at any time
  must prevent bus contention at the bus interface ports
  BREQ signals etc
  limited to about 64 processors
Switched
  multiple simultaneous writes
  requires fast (parallel) bus switches - not cheap!

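On a time-shared bus, some arbiter has to choose one requester per cycle. The lecture does not say which policy is used, so the sketch below assumes a simple fixed-priority scheme in which the lowest-numbered processor asserting its bus request wins. The names `arbitrate` and `run_bus` are hypothetical.

```python
def arbitrate(requests):
    """Grant the bus to one requester: lowest index wins (assumed fixed priority)."""
    for processor, requesting in enumerate(requests):
        if requesting:
            return processor
    return None  # nobody is requesting the bus


def run_bus(pending_writes):
    """Serialise writes on a single bus; return the order in which grants are issued."""
    remaining = list(pending_writes)
    grants = []
    while any(remaining):
        breq = [count > 0 for count in remaining]  # one bus-request line per processor
        winner = arbitrate(breq)
        grants.append(winner)                      # exactly one bus write per cycle
        remaining[winner] -= 1
    return grants


grants = run_bus([1, 2, 1])  # processors 0, 1, 2 want 1, 2 and 1 writes
```

The number of bus cycles equals the total number of writes, which is why a single bus becomes a bottleneck as processors are added.
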
Bussed Systems
Single shared bus
  widely used
[Figure: ‘p’ Processors P1, P2, ..., Pp and ‘m’ Memories M1, M2, ..., Mm on a single system bus B]
Multiple busses
  relieve bus contention
  provides some redundancy
[Figure: ‘p’ Processors P1, P2, ..., Pp and ‘m’ Memories M1, M2, ..., Mm on multiple busses B1, B2, ..., Bb]

Switched Systems
Crossbar switch
  up to min(m,p) writes at any time (each processor issues one write, each memory accepts one)
  requires fast m x p bus switch
[Figure: Crossbar network connecting ‘p’ Processors P1, P2, ..., Pp to ‘m’ Memories M1, M2, ..., Mm]

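A crossbar lets several processors reach different memories in the same cycle, but each memory can still serve only one processor at a time. The sketch below models one cycle under an assumed lowest-index-wins rule for conflicts. The names `crossbar_cycle` and `crosspoints` are hypothetical.

```python
def crossbar_cycle(requests, m):
    """Grant requests for one cycle.

    requests maps processor index -> requested memory index (0-based).
    Each memory connects to at most one processor per cycle.
    """
    granted = {}
    busy = set()
    for processor in sorted(requests):  # assumed priority: lowest index first
        memory = requests[processor]
        if not 0 <= memory < m:
            raise ValueError(f"memory index {memory} out of range")
        if memory not in busy:
            busy.add(memory)
            granted[processor] = memory
    return granted


def crosspoints(p, m):
    """Number of switch points in a p-by-m crossbar."""
    return p * m


# Four processors, four memories; processors 0 and 2 both want memory 2.
granted = crossbar_cycle({0: 2, 1: 0, 2: 2, 3: 1}, m=4)
```

Processor 2 loses the conflict and must retry in a later cycle, while the other three transfers proceed in parallel.
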
Switched Systems
Crosspoint switch
  cheaper but slower!!
  2x2 switches used in “Omega” networks
[Figure: Processors P1 to P4 connected to Memories M1 to M4 through 2x2 switches]

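One common way to route through an omega network is destination-tag routing: each stage applies a perfect shuffle to the line number, then the 2x2 switch picks its upper or lower output from one bit of the destination address. The sketch below assumes 0-based port numbers and a power-of-two port count. The function names are hypothetical.

```python
def rotate_left(value, n_bits):
    """Rotate an n_bits-wide value left by one bit (the perfect-shuffle wiring)."""
    mask = (1 << n_bits) - 1
    return ((value << 1) | (value >> (n_bits - 1))) & mask


def omega_route(source, destination, n_bits):
    """Return the line number reached after each stage of a 2**n_bits-port omega network."""
    position = source
    path = []
    for stage in range(n_bits):
        position = rotate_left(position, n_bits)
        dest_bit = (destination >> (n_bits - 1 - stage)) & 1  # most significant bit first
        position = (position & ~1) | dest_bit  # 2x2 switch: upper (0) or lower (1) output
        path.append(position)
    return path


def switch_counts(n_ports):
    """Compare switching hardware for an omega network and a crossbar of the same size."""
    n_bits = n_ports.bit_length() - 1  # log2 of a power-of-two port count
    return {
        "omega_2x2_switches": (n_ports // 2) * n_bits,
        "crossbar_crosspoints": n_ports ** 2,
    }


path = omega_route(source=1, destination=2, n_bits=2)  # 4-port network, 2 stages
counts = switch_counts(4)
```

The omega network needs far fewer switching elements, but every transfer passes through log2(N) stages in series and two transfers can block each other inside the network, which is the "cheaper but slower" trade-off.
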
Interconnections (topology) may be either
Static – fixed by hardware
Dynamic – re-configurable in software, perhaps even during program execution

Static Topologies
Common arrangements are array, ring, star, cube, tree, and complete interconnection of processors.
[Figure: Linear Array, Ring, Star, Cube, Tree, and Fully connected topologies]

Static Topology
Cube (or hypercube) gives good balance between
  internode length (communications latency)
  number of neighbouring nodes (cost of switching circuitry).
Several commercial hypercube implementations exist

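The balance mentioned above can be checked numerically by building a few static topologies and measuring the largest number of neighbours per node and the longest shortest path (the diameter). This is a sketch with hypothetical helper names. The tree is omitted for brevity, and 16 nodes is an arbitrary choice.

```python
from collections import deque


def linear_array(n):
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


def ring(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def star(n):
    graph = {i: {0} for i in range(1, n)}  # node 0 is the hub
    graph[0] = set(range(1, n))
    return graph


def fully_connected(n):
    return {i: set(range(n)) - {i} for i in range(n)}


def hypercube(dims):
    return {i: {i ^ (1 << b) for b in range(dims)} for i in range(2 ** dims)}


def max_degree(graph):
    """Largest number of neighbours of any node."""
    return max(len(neighbours) for neighbours in graph.values())


def diameter(graph):
    """Longest shortest path, found by breadth-first search from every node."""
    worst = 0
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


topologies = {
    "linear array": linear_array(16),
    "ring": ring(16),
    "star": star(16),
    "fully connected": fully_connected(16),
    "hypercube": hypercube(4),
}
summary = {name: (max_degree(g), diameter(g)) for name, g in topologies.items()}
```

For 16 nodes the hypercube has 4 neighbours per node and a diameter of 4, sitting between the ring (few neighbours, long paths) and the fully connected network (short paths, many neighbours).
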
Dynamic Topology
Single bus, multiple bus, crossbar-switched and omega networks are all examples of dynamic topologies.