Computer System Architecture © Korea Univ. of Tech. & Edu., Dept. of Info. & Comm.
Chap. 13 Multiprocessors
13-1 Characteristics of Multiprocessors
Multiprocessor system = MIMD
An interconnection of two or more CPUs with memory and I/O equipment
» A single CPU with one or more IOPs is usually not counted as a multiprocessor system, unless the IOP has computational facilities comparable to a CPU
Computation can proceed in parallel in one of two ways
» 1) Multiple independent jobs can be made to operate in parallel
» 2) A single job can be partitioned into multiple parallel tasks
Classified by memory organization
» 1) Shared memory, or tightly coupled system : local memory + shared memory; supports a higher degree of interaction between tasks
» 2) Distributed memory, or loosely coupled system : local memory + message-passing scheme (packet or message transmission); most efficient when the interaction between tasks is minimal
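The two memory organizations above can be contrasted with a small sketch. It assumes Python threads inside one process as stand-ins for separate processors; the function names `shared_memory_sum` and `message_passing_sum` are hypothetical and chosen only for illustration.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Tightly coupled style: every task adds into one shared variable."""
    total = [0]                      # shared writable data
    lock = threading.Lock()          # guards the shared variable

    def task(chunk):
        partial = sum(chunk)         # work on local data
        with lock:
            total[0] += partial      # interaction through shared memory

    threads = [threading.Thread(target=task, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def message_passing_sum(chunks):
    """Loosely coupled style: tasks keep data local and send one message each."""
    mailbox = queue.Queue()          # stands in for the message channel

    def task(chunk):
        mailbox.put(sum(chunk))      # the only interaction is this message

    threads = [threading.Thread(target=task, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)
```

Both functions partition a single job into parallel tasks; the second one interacts only once per task, which is the case where a loosely coupled system is most efficient.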
13-2 Interconnection Structure
Components that make up a multiprocessor system, i.e. the components that interconnect the CPUs, IOPs, and memory units with one another
» 1) Time-shared common bus
» 2) Multi-port memory
» 3) Crossbar switch
» 4) Multistage switching network
» 5) Hypercube system
Time-shared Common Bus
Time-shared single common bus system : Fig. 13-1
» Only one processor can communicate with the memory or another processor at any given time
» While one processor is communicating with the memory, all other processors are either busy with internal operations or idle, waiting for the bus
Dual common bus system : Fig. 13-2
» System bus + local bus
» Shared memory : the memory connected to the common system bus is shared by all processors
» System bus controller : links each local bus to the common system bus
[Fig. 13-1: a single common bus linking the memory unit, CPU 1 to CPU 3, and IOP 1 and IOP 2 (tightly coupled system)]
Multi-port memory : Fig. 13-3
Multiple paths between processors and memory
» Advantage : a high transfer rate can be achieved
» Disadvantage : expensive memory control logic and a large number of cables and connectors
Crossbar Switch : Fig. 13-4
A crossbar switch must be used when each memory module has only one I/O port
Block diagram of crossbar switch : Fig. 13-5
[Fig. 13-3 and Fig. 13-4: CPU 1 to CPU 4 connected to memory modules MM 1 to MM 4]
[Fig. 13-5: a memory module driven by multiplexers and arbitration logic (data, address, read/write, and memory enable lines), which receive data, address, and control from CPU 1 to CPU 4]
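The role of the arbitration logic in a crossbar switch can be sketched as follows. This is a minimal model, not the circuit of Fig. 13-5: it assumes a fixed priority in which the lower-numbered CPU wins a conflict, and the function name `crossbar_cycle` is hypothetical.

```python
def crossbar_cycle(requests):
    """Resolve one memory cycle of a crossbar switch.

    requests maps a CPU number to the memory module it wants this cycle.
    Returns (granted, stalled): the CPU -> module connections that are made,
    and the CPUs that must wait because their module is already taken.
    """
    granted = {}
    stalled = []
    busy_modules = set()
    # Assumption: fixed priority, lowest CPU number first.
    for cpu in sorted(requests):
        module = requests[cpu]
        if module in busy_modules:
            stalled.append(cpu)          # conflict on the same module
        else:
            busy_modules.add(module)     # each module serves one CPU per cycle
            granted[cpu] = module
    return granted, stalled
```

For example, `crossbar_cycle({1: 3, 2: 1, 3: 3, 4: 2})` connects CPU 1, 2, and 4 at the same time and stalls only CPU 3, which competes with CPU 1 for module 3. Transfers to different modules proceed simultaneously, which is what distinguishes the crossbar from a single common bus.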
Example of crossbar switch use : crossbar hierarchies
[Figure: crossbar hierarchies. Surviving labels: cluster, crossbar, node, PU, CU, network interface, I/O, local memory]
Multistage Switching Network
Controls the communication between a number of sources and destinations
» Tightly coupled system : between PUs and MMs
» Loosely coupled system : between PUs and PUs
Basic component of a multistage switching network : a two-input, two-output interchange switch : Fig. 13-6
Example : two processors (P1 and P2) are connected through switches to 8 memory modules (000 - 111) : Fig. 13-7
Omega Network : Fig. 13-8
» Builds an N-input x N-output network topology from 2 x 2 interchange switches
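[Fig. 13-6: the four states of a 2 x 2 interchange switch with inputs A, B and outputs 0, 1: A connected to 0, A connected to 1, B connected to 0, B connected to 1]
[Fig. 13-7: binary tree of 2 x 2 switches leading to memory modules 000 to 111]
[Fig. 13-8: 8 x 8 omega network with lines 0 to 7 and destinations 000 to 111]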
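Routing through an omega network can be sketched in a few lines. The sketch assumes the standard omega wiring (a perfect shuffle in front of every stage) and that each 2 x 2 switch selects output 0 or 1 from one bit of the destination address, most significant bit first. The function name `omega_route` is hypothetical.

```python
def omega_route(src, dst, n=3):
    """Trace a path through a 2**n x 2**n omega network.

    Returns the line number occupied after each of the n switch stages.
    """
    size = 1 << n
    if not (0 <= src < size and 0 <= dst < size):
        raise ValueError("src and dst must be in range(2**n)")
    pos = src
    path = []
    for stage in range(n):
        # Perfect shuffle: rotate the n-bit line number left by one position.
        pos = ((pos << 1) | (pos >> (n - 1))) & (size - 1)
        # The switch sends the message to output 0 or 1 according to one
        # destination bit, starting with the most significant bit.
        bit = (dst >> (n - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path
```

With n = 3, `omega_route(5, 3)` visits lines 2, 5, and 3: after three stages every source bit has been shifted out and replaced by a destination bit, so the message always arrives at `dst`. The sketch traces a single message and does not model two messages blocking each other in the same switch.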
Hypercube Interconnection : Fig. 13-9 : one-cube, two-cube, three-cube
Used in loosely coupled systems
Example hypercube architecture : Intel iPSC (n = 7, 128 nodes; an n-cube has 2^n nodes)
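The addressing behind an n-cube can be sketched as follows: node addresses are n-bit numbers, and two nodes are directly linked when their addresses differ in exactly one bit. The function names `neighbors` and `route` are hypothetical, and correcting the differing bits from the lowest bit upward is only one of several valid orders.

```python
def neighbors(node, n):
    """Nodes directly linked to `node` in an n-cube (one bit flipped each)."""
    return [node ^ (1 << k) for k in range(n)]


def route(src, dst, n):
    """Path from src to dst, fixing one differing address bit per hop."""
    path = [src]
    current = src
    differing = src ^ dst            # a 1 marks an axis on which the nodes differ
    for k in range(n):               # assumption: lowest bit first
        if differing & (1 << k):
            current ^= 1 << k        # move along axis k
            path.append(current)
    return path
```

In a three-cube, `route(0b010, 0b001, 3)` goes 010, 011, 001, taking two hops because the addresses differ in two bits. For n = 7, each of the 2^7 = 128 nodes has 7 neighbors.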
13-3 Interprocessor Arbitration : Bus Control
Single bus system : address bus, data bus, control bus
Multiple bus system : memory bus, I/O bus, system bus
» System bus : the bus that connects the CPUs, IOPs, and memory in a multiprocessor system (bus controller/arbiter)
Data transfer methods over the system bus
» Synchronous bus : the transfer is achieved by driving both units from a common clock source
» Asynchronous bus : each transfer is accompanied by handshaking control signals
Example system bus : IEEE Standard 796 Multibus
86 signal lines : Tab. 13-1
» Bus arbitration signal lines : BREQ, BUSY, …
Bus arbitration algorithms : static / dynamic
Static : priority is fixed
» Serial (daisy-chain) arbitration : Fig. 13-10
» Parallel arbitration : Fig. 13-11
Dynamic : priority is flexible
» Time slice (fixed-length time)
» Polling
» LRU
» FIFO
» Rotating daisy-chain
* Uses the bus busy line : if this line is inactive, no other processor is using the bus
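The difference between static and dynamic priority can be sketched by comparing a serial daisy chain with a rotating daisy chain. This is a simplified model of the selection rule only, not of the Multibus signals; the function names are hypothetical, and index 0 is assumed to be the arbiter at the head of the chain.

```python
def daisy_chain(requests):
    """Static priority: the requesting arbiter closest to the head wins.

    requests is a list of booleans; returns the winning index or None.
    """
    for index, wants_bus in enumerate(requests):
        if wants_bus:
            return index
    return None


def rotating_daisy_chain(requests, last_granted):
    """Dynamic priority: the search starts just after the previous bus owner,
    so the arbiter that released the bus drops to the lowest priority."""
    count = len(requests)
    for offset in range(1, count + 1):
        index = (last_granted + offset) % count
        if requests[index]:
            return index
    return None
```

If all three processors keep requesting, `daisy_chain([True, True, True])` always returns 0, while `rotating_daisy_chain([True, True, True], 0)` returns 1, then 2 after that grant, then 0, so no processor is starved.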
13-4 Interprocessor Communication & Synchronization
Interprocessor Communication
Shared memory : tightly coupled system
» Accessible to all processors : common memory
» Acts as a message center, similar to a mailbox
No shared memory : loosely coupled system
» Message passing through I/O channel communication
Interprocessor Synchronization
Enforces the correct sequence of processes and ensures mutually exclusive access to shared writable data
Mutual Exclusion
» Protects data from being changed simultaneously by two or more processors
Mutual Exclusion with Semaphore
» Critical section : once begun, must complete execution before another processor accesses the same shared resource
» Semaphore : indicates whether or not a processor is executing a critical section
» Hardware lock : a processor-generated signal that prevents other processors from using the system bus
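The interplay of semaphore and hardware lock can be sketched with a test-and-set operation. In this sketch a `threading.Lock` stands in for the hardware lock signal and Python threads stand in for processors; the class `SharedMemory` and the function `run` are hypothetical names, and real hardware performs the test and the set as two locked memory cycles rather than as Python statements.

```python
import threading
import time


class SharedMemory:
    """Hypothetical model: one semaphore word plus a bus lock."""

    def __init__(self):
        self.sem = 0                      # 0 = free, 1 = critical section in use
        self._bus = threading.Lock()      # stands in for the hardware lock signal

    def test_and_set(self):
        """Read the semaphore and set it to 1 while the bus is locked."""
        with self._bus:
            r = self.sem                  # test semaphore
            self.sem = 1                  # set semaphore
        return r

    def release(self):
        self.sem = 0                      # leave the critical section


def run(processors=4, rounds=500):
    mem = SharedMemory()
    counter = [0]                         # shared writable data

    def processor():
        for _ in range(rounds):
            while mem.test_and_set() == 1:    # already set: try again
                time.sleep(0)                 # yield so the holder can finish
            counter[0] += 1                   # critical section
            mem.release()

    threads = [threading.Thread(target=processor) for _ in range(processors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0], mem.sem
```

Because only the processor that reads 0 enters the critical section, `run(4, 500)` ends with the counter at exactly 2000 and the semaphore cleared.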
How to use shared memory with a semaphore
1) Execute the TSL SEM instruction (Test and Set while Locked)
» Tests the SEM bit while asserting the hardware lock signal
» Requires 2 memory cycles
   R ← M[SEM] : Test semaphore (read the semaphore into register R)
   M[SEM] ← 1 : Set semaphore (bars other processors from using the shared memory)
2) R = 0 : shared memory is available
   R = 1 : the processor cannot access shared memory (semaphore originally set)
13-5 Cache Coherence
Conditions for Incoherence : Fig. 13-12, 13
Multiprocessor system with private caches
» Write-through : P2, P3 incoherent
» Write-back : P2, P3, main memory incoherent
[Figure: cache contents when P1 writes 120 to X. (a) With write-through cache policy: main memory X = 120; caches of P1, P2, P3 hold X = 120, 52, 52. (b) With write-back cache policy: main memory X = 52; caches of P1, P2, P3 hold X = 120, 52, 52.]
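The two incoherence conditions can be reproduced with a small model. It assumes that every copy of X starts at 52, as in the figure, and that no coherence mechanism is present; the function name `write_x` is hypothetical.

```python
def write_x(policy, value=120):
    """P1 writes `value` to X; return the copies that are stale afterwards."""
    memory = {"X": 52}
    caches = {"P1": {"X": 52}, "P2": {"X": 52}, "P3": {"X": 52}}

    caches["P1"]["X"] = value             # P1 updates its own cache
    if policy == "write-through":
        memory["X"] = value               # main memory is updated on every write
    elif policy != "write-back":
        raise ValueError(f"unknown policy: {policy}")
    # write-back: main memory is updated only later (not modeled here)

    stale = [name for name, cache in caches.items() if cache["X"] != value]
    if memory["X"] != value:
        stale.append("main memory")
    return stale
```

`write_x("write-through")` reports P2 and P3 as stale, while `write_x("write-back")` additionally reports main memory.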
Solution to the Cache Coherence Problem
Software approaches
» 1) Shared writable data are non-cacheable
» 2) Writable data exist in only one cache : centralized global table
Hardware approaches
» 1) Monitor possible write operations : snoopy cache controller
References
» "Synchronization, coherence, and event ordering in multiprocessors," IEEE Computer, Feb. 1988
» "A survey of cache coherence schemes for multiprocessors," IEEE Computer, June 1990
Snoopy Cache Controller
Watches the bus for write operations to the shared memory
Invalidates its cache entry if the write address appears
[Figure: a cache controller and the shared memory attached to the address/data bus]
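The snooping behavior can be sketched as follows. The sketch assumes a write-through policy, so that main memory always holds the latest value, and models invalidation by dropping the entry from the cache; the classes `SnoopyCache` and `Bus` are hypothetical.

```python
class SnoopyCache:
    def __init__(self, name):
        self.name = name
        self.lines = {}                   # address -> value; absent means invalid

    def snoop(self, address, writer):
        """Called for every write seen on the bus."""
        if writer is not self and address in self.lines:
            del self.lines[address]       # invalidate the stale entry


class Bus:
    def __init__(self, memory):
        self.memory = memory              # shared memory: address -> value
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def write(self, writer, address, value):
        writer.lines[address] = value
        self.memory[address] = value      # assumption: write-through
        for cache in self.caches:         # every controller watches the bus
            cache.snoop(address, writer)

    def read(self, reader, address):
        if address not in reader.lines:   # miss: fetch from shared memory
            reader.lines[address] = self.memory[address]
        return reader.lines[address]
```

A short usage example: attach three caches to `Bus({"X": 52})`, let each read X, then let the first write 120. The other two caches lose their entry for X, and their next read returns 120 from shared memory instead of the stale 52.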
SMP : single OS, shared memory, memory interconnect
» OpenMP API : http://www.openmp.org/
MPP : multiple OS, distributed memory, processor interconnect
» MPI API : http://www.mpi-forum.org/
Cluster : cluster of IA32 nodes (1 or 2 CPUs), node interconnect
Constellation : cluster of SMP nodes, node interconnect
[Figure labels: CPU, memory, node, node interconnect, memory interconnect]
* Clusters in top500.org *
» "Simple" cluster : 1 processor in each node
» Cluster of small SMPs : small # of processors / node
» Constellations : large # of processors / node
www.top500.org
MPP : Massively Parallel Processors
» Loosely coupled system; clusters are "rising"
Clusters
» "Simple" cluster (1 processor in each node)
» Cluster of small SMPs (small # of processors / node)
» Constellations (large # of processors / node)
Older Architectures
» SIMD : Single Instruction Multiple Data
» Vector processors (old Cray machines)
Beowulf Clusters
http://www.beowulf.org
http://www.scyld.com
http://linuxhpc.org
Cloud computing
Internet-based computing, whereby shared resources, software and information are provided to computers and other devices on-demand.
The term "cloud" is used as a metaphor for the Internet, based on the cloud drawing used in the past to represent the telephone network, and later to depict the Internet in computer network diagrams. - Wikipedia -