![Page 1: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/1.jpg)
Parallel Computing
Department Of Computer Engineering
Ferdowsi University
Hossain Deldari
![Page 2: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/2.jpg)
•Parallel Processing
•Super Computer
•Parallel Computer
•Amdahl’s Law, Speedup, Efficiency
•Parallel Machine Architecture
•Computational Model
•Concurrency Approach
•Parallel Programming
•Cluster Computing
Lecture organization
![Page 3: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/3.jpg)
•It is the division of work into smaller tasks
•Assigning many smaller tasks to multiple workers to work on simultaneously
•Parallel processing is the use of multiple processors to execute different parts of the same program simultaneously
•Difficulties: coordinating, controlling and monitoring the workers
•The main goals of parallel processing are:
 – to solve much bigger problems much faster
 – to reduce the wall-clock execution time of computer programs
 – to increase the size of computational problems that can be solved
What is Parallel Processing?
![Page 4: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/4.jpg)
What is a Supercomputer?
A supercomputer is a computer that is far faster than the computers in everyday use.
Note: This is a time-dependent definition.
TOP500 Lists
June 1993:

| Manufacturer | Computer / Procs | Rmax (GFlop/s) | Rpeak (GFlop/s) | Installation Site | Country/Year |
| --- | --- | --- | --- | --- | --- |
| TMC | CM-5/1024 / 1024 | 59.70 | 131.00 | Los Alamos National Laboratory | USA/ |
Supercomputer & parallel computer
![Page 5: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/5.jpg)
June 2003:

| Manufacturer | Computer / Procs | Rmax (GFlop/s) | Rpeak (GFlop/s) | Installation Site | Country/Year |
| --- | --- | --- | --- | --- | --- |
| NEC | Earth-Simulator / 5120 | 35860.00 | 40960.00 | Earth Simulator Center | Japan |

Rmax: maximal LINPACK performance achieved
Rpeak: theoretical peak performance
LINPACK is a benchmark.
![Page 6: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/6.jpg)
![Page 7: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/7.jpg)
Amdahl’s Law
$$\mathrm{speedup}(p) = \frac{\mathrm{Time}(1)}{\mathrm{Time}(p)}$$
where Time(p) is the execution time on p processors.
Amdahl’s Law, Speedup, Efficiency
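The speedup bound usually stated as Amdahl's law can be sketched numerically. The function and parameter names below (`amdahl_speedup`, `parallel_fraction`, `processors`) are illustrative assumptions and do not come from the slides; the formula is the standard one, 1 / ((1 - f) + f / p), for a program whose fraction f can be parallelized.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Predicted speedup for a fixed-size problem under Amdahl's law.

    parallel_fraction: share of the sequential run time that can be
                       parallelized (hypothetical name, between 0 and 1).
    processors:        number of processors p (at least 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unchanged; the parallel part shrinks by a factor p.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # With 90% parallel work the speedup can never reach 1 / 0.1 = 10.
    for p in (1, 2, 4, 16, 1024):
        print(p, round(amdahl_speedup(0.9, p), 2))
```

The loop shows the point of the law: adding processors helps less and less, because the serial fraction sets an upper limit of 1 / (1 - f).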
![Page 8: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/8.jpg)
![Page 9: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/9.jpg)
Efficiency is a measure of the fraction of time that a processor spends performing useful work.
Efficiency
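Efficiency is commonly computed as speedup divided by the number of processors. The following is a minimal sketch under that assumption; the names `speedup`, `efficiency`, `time_1` and `time_p` are hypothetical, and the timings in the example are made up for illustration.

```python
def speedup(time_1: float, time_p: float) -> float:
    """Ratio of the one-processor run time to the p-processor run time."""
    return time_1 / time_p


def efficiency(time_1: float, time_p: float, processors: int) -> float:
    """Speedup per processor; 1.0 would mean every processor is fully useful."""
    return speedup(time_1, time_p) / processors


if __name__ == "__main__":
    # Illustrative numbers only: 100 s sequentially, 40 s on 4 processors.
    print(speedup(100.0, 40.0))        # 2.5
    print(efficiency(100.0, 40.0, 4))  # 0.625
```

An efficiency of 0.625 here would mean that, on average, each of the four processors spends 62.5% of its time on useful work.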
![Page 10: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/10.jpg)
Shunt Operation
![Page 11: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/11.jpg)
• SIMD • MIMD • MISD • Clusters
Parallel and Distributed Computers
![Page 12: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/12.jpg)
SIMD (Single Instruction Multiple Data)
![Page 13: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/13.jpg)
MISD (Multiple Instruction Single Data)
![Page 14: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/14.jpg)
MIMD (Multiple Instruction Multiple Data)
![Page 15: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/15.jpg)
MIMD(cont.)
![Page 16: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/16.jpg)
•Shared memory model
 – Bus-based
 – Switch-based
 – NUMA
•Distributed memory model
•Distributed shared memory model
 – Page-based
 – Object-based
 – Hardware
Parallel machine architecture
![Page 17: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/17.jpg)
Shared memory model
![Page 18: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/18.jpg)
- Shared memory machine, or multiprocessor
- OpenMP is a standard programming interface for this model (C/C++/Fortran)
Advantage:
- Easy programming
Disadvantages:
- Design complexity
- Not scalable
Shared memory model(cont.)
![Page 19: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/19.jpg)
- The bus is a bottleneck
- Not scalable
Bus-based shared memory model
![Page 20: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/20.jpg)
- Maintenance is difficult.
- Expensive
- Scalable
Switch-based shared memory model
![Page 21: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/21.jpg)
•NUMA stands for Non-Uniform Memory Access.
•Simulated shared memory
•Better scalability
NUMA model
![Page 22: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/22.jpg)
•Multicomputer
•MPI (Message Passing Interface)
•Easy design
•Low cost
•High scalability
•Difficult programming
Distributed memory model
![Page 23: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/23.jpg)
[Figure: example topologies: linear array, ring, mesh, and fully connected]
Examples of Network Topology
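The topologies named on this slide can be compared by two simple measures: how many links they need and their diameter (the longest shortest path between two nodes). The sketch below builds each topology as an adjacency list and measures both; all function names and the node counts used are illustrative assumptions, not taken from the slides.

```python
from collections import deque


def linear_array(n):
    """Nodes 0..n-1 in a line; interior nodes have two neighbors."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def ring(n):
    """A linear array whose two ends are also connected."""
    return {i: sorted({(i - 1) % n, (i + 1) % n} - {i}) for i in range(n)}


def mesh(rows, cols):
    """2-D grid without wraparound; nodes are (row, col) pairs."""
    graph = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            graph[(r, c)] = [(a, b) for a, b in candidates
                             if 0 <= a < rows and 0 <= b < cols]
    return graph


def fully_connected(n):
    """Every node is linked directly to every other node."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def link_count(graph):
    """Each undirected link appears in two adjacency lists."""
    return sum(len(neighbors) for neighbors in graph.values()) // 2


def diameter(graph):
    """Longest shortest path, found by breadth-first search from every node."""
    longest = 0
    for start in graph:
        distance = {start: 0}
        frontier = deque([start])
        while frontier:
            node = frontier.popleft()
            for neighbor in graph[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    frontier.append(neighbor)
        longest = max(longest, max(distance.values()))
    return longest


if __name__ == "__main__":
    for name, graph in [("linear array", linear_array(6)), ("ring", ring(6)),
                        ("mesh 3x3", mesh(3, 3)),
                        ("fully connected", fully_connected(6))]:
        print(name, "links:", link_count(graph), "diameter:", diameter(graph))
```

The trade-off is visible in the numbers: a fully connected network has diameter 1 but needs the most links, while a linear array needs the fewest links but has the largest diameter.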
![Page 24: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/24.jpg)
[Figure: hypercube of dimension d = 4, with nodes labeled by 4-bit addresses from 0000 to 1111]
Hypercubes
Examples of Network Topology(cont.)
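In a hypercube, two nodes are neighbors exactly when their binary addresses differ in one bit. The sketch below uses that property to list neighbors and to build a route that corrects one differing bit per hop (often called dimension-order routing). The function names are hypothetical and the routing order (lowest bit first) is an assumption made for the example.

```python
from typing import List


def hypercube_neighbors(node: int, d: int) -> List[int]:
    """Flip each of the d address bits in turn to get the d neighbors."""
    return [node ^ (1 << bit) for bit in range(d)]


def hamming_distance(a: int, b: int) -> int:
    """Number of address bits in which a and b differ."""
    return bin(a ^ b).count("1")


def route(src: int, dst: int, d: int) -> List[int]:
    """Path from src to dst that fixes differing bits from lowest to highest."""
    path = [src]
    current = src
    for bit in range(d):
        if (current ^ dst) & (1 << bit):
            current ^= 1 << bit  # move along dimension `bit`
            path.append(current)
    return path


if __name__ == "__main__":
    d = 4
    print([format(n, "04b") for n in hypercube_neighbors(0b0000, d)])
    print([format(n, "04b") for n in route(0b0000, 0b1011, d)])
```

The number of hops equals the Hamming distance between the two addresses, so no route in a d-dimensional hypercube needs more than d hops.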
![Page 25: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/25.jpg)
•Simpler abstraction
•Sharing data
• easier portability
•Easy design with easy programming
•Low performance (for communication-heavy workloads)
Distributed shared memory model
![Page 26: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/26.jpg)
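[Figure: spectrum of architectures from SIMD through SMP and NUMA to Cluster, grouped as SIMD vs. MIMD and as shared memory vs. distributed memory. Along the spectrum, degree of coupling goes from tight to loose, supported grain size from fine to coarse, and communication speed from fast to slow.]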
Parallel and Distributed Architecture (Leopold, 2001)
![Page 27: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/27.jpg)
•RAM
•PRAM
•BSP
•LogP
•MPI
Computational Model
![Page 28: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/28.jpg)
RAM Model
![Page 29: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/29.jpg)
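• Synchronized read-compute-write cycle
• Memory access variants:
 – EREW (exclusive read, exclusive write)
 – ERCW (exclusive read, concurrent write)
 – CREW (concurrent read, exclusive write)
 – CRCW (concurrent read, concurrent write)
[Figure: a control unit drives processors P1, P2, ..., Pp, each with its own private memory, all connected to a global memory]
Parallel Random Access Machine (PRAM) Model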
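The synchronized read-compute-write cycle can be illustrated with a sequential simulation of a parallel sum. This is a sketch, not an algorithm from the slides: `pram_sum` is a hypothetical name, the list `cells` stands in for global memory, and each loop iteration plays the role of one synchronous PRAM step in which all active processors first read, then compute, then write. No cell is read or written by more than one processor in a step, so the access pattern fits the EREW variant.

```python
def pram_sum(values):
    """Simulate a tree-style parallel sum; returns (total, number_of_steps)."""
    cells = list(values)  # stands in for the PRAM global memory
    n = len(cells)
    if n == 0:
        return 0, 0
    steps = 0
    stride = 1
    while stride < n:
        # Processor i is active if it owns cell i and has a partner cell.
        active = range(0, n - stride, 2 * stride)
        # Read phase: every active processor reads its two cells.
        reads = [(i, cells[i], cells[i + stride]) for i in active]
        # Compute phase: purely local additions.
        results = [(i, left + right) for i, left, right in reads]
        # Write phase: each processor writes back to its own cell only.
        for i, total in results:
            cells[i] = total
        stride *= 2
        steps += 1
    return cells[0], steps


if __name__ == "__main__":
    print(pram_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # (36, 3)
```

With n values the simulation needs about log2(n) steps, compared with n - 1 additions performed one after another on a sequential RAM.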
![Page 30: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/30.jpg)
• Generalization of PRAM Model
• Processor-Memory Pairs
• Communication Network
• Barrier Synchronization
[Figure: one superstep: processes execute, communications take place, then barrier synchronization]
Bulk Synchronous Parallel (BSP) Model
![Page 31: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/31.jpg)
• Cost of superstep = $w + \max(h_s, h_r) \cdot g + l$
 – $w$: maximum number of local operations
 – $h_s$: maximum number of packets sent
 – $h_r$: maximum number of packets received
 – $g$: communication throughput
 – $l$: synchronization latency
• $p$: number of processors
BSP Space Complexity
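The superstep cost formula on this slide, w + max(hs, hr) * g + l, can be evaluated directly. In the sketch below the function names and all numeric values are illustrative assumptions; summing the cost over supersteps to get a program cost is the usual BSP convention rather than something stated on the slide.

```python
def superstep_cost(w, hs, hr, g, l):
    """Cost of one superstep: local work + communication + barrier.

    w:  maximum number of local operations on any processor
    hs: maximum number of packets sent by any processor
    hr: maximum number of packets received by any processor
    g:  communication throughput parameter (cost per packet)
    l:  synchronization latency
    """
    return w + max(hs, hr) * g + l


def program_cost(supersteps, g, l):
    """Total cost of a sequence of supersteps given as (w, hs, hr) tuples."""
    return sum(superstep_cost(w, hs, hr, g, l) for w, hs, hr in supersteps)


if __name__ == "__main__":
    # Made-up machine parameters and workload, for illustration only.
    print(superstep_cost(w=100, hs=10, hr=4, g=2, l=50))          # 170
    print(program_cost([(100, 10, 4), (50, 0, 0)], g=2, l=50))    # 270
```

Note that a superstep with no communication still pays the synchronization latency l, which is why BSP programs tend to favor few, large supersteps.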
![Page 32: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/32.jpg)
• Closely related to BSP
• It models asynchronous execution
• New parameters:
 – L: The message latency.
 – o: The overhead, defined as the length of time that a processor is engaged in the transmission or reception of each message. During this time the processor cannot perform other operations.
 – g: The gap, defined as the minimum time interval between consecutive message transmissions or receptions. The reciprocal of g corresponds to the available per-processor bandwidth.
 – P: The number of processor/memory modules.
LogP Model
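The LogP parameters can be combined into simple time estimates. The sketch below follows a common reading of the model: one small message costs the sender's overhead, the network latency, and the receiver's overhead. The function names are hypothetical, the stream estimate assumes consecutive sends are spaced by max(g, o), and the model's limit on messages in flight is ignored.

```python
def single_message_time(L, o):
    """End-to-end time for one small message: send overhead, latency,
    receive overhead."""
    return o + L + o


def message_stream_time(k, L, o, g):
    """Time until the last of k back-to-back messages has been received.

    Assumption: the sender starts a new message every max(g, o) time units,
    so the last message starts (k - 1) * max(g, o) after the first.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return (k - 1) * max(g, o) + single_message_time(L, o)


if __name__ == "__main__":
    # Made-up parameter values, in arbitrary time units.
    print(single_message_time(L=6, o=2))            # 10
    print(message_stream_time(k=3, L=6, o=2, g=4))  # 18
```

Because the messages overlap in the network, three messages take 18 units here rather than 3 * 10, which is the kind of pipelining the gap parameter g is meant to capture.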
![Page 33: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/33.jpg)
LogP (cont.)
![Page 34: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/34.jpg)
What Is MPI?
•A message-passing library specification
 – message-passing model
 – not a compiler specification
 – not a specific product
•For parallel computers, clusters, and heterogeneous networks
•Full-featured
•Designed to permit (unleash?) the development of parallel software libraries
•Designed to provide access to advanced parallel hardware for
 – end users
 – library writers
 – tool developers
MPI(Message Passing Interface)
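The message-passing model behind MPI can be illustrated without MPI itself. The sketch below is not MPI and uses no MPI calls: it imitates two tasks that share no variables and cooperate only through explicit send and receive operations, using Python threads and queues as stand-ins. The names `run_exchange`, `send`, `recv` and the rank numbers 1 and 2 are assumptions made for the example.

```python
import queue
import threading


def run_exchange(numbers):
    """Task 1 sends a list to task 2; task 2 replies with the sum."""
    # One mailbox per task rank; tasks communicate only through these.
    inbox = {1: queue.Queue(), 2: queue.Queue()}
    result = {}

    def send(dest, message):
        inbox[dest].put(message)

    def recv(rank):
        return inbox[rank].get()  # blocks until a message arrives

    def task1():
        send(2, list(numbers))    # ship the data to task 2
        result["sum"] = recv(1)   # wait for the reply

    def task2():
        data = recv(2)            # wait for the data
        send(1, sum(data))        # send the result back

    threads = [threading.Thread(target=task1), threading.Thread(target=task2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result["sum"]


if __name__ == "__main__":
    print(run_exchange([1, 2, 3, 4]))  # 10
```

In a real MPI program the two tasks would be separate processes, possibly on different nodes, and the library would carry the messages; the structure of matching sends and receives would stay the same.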
![Page 35: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/35.jpg)
[Figure: Task 1 on Node 1 and Task 2 on Node 2, each with Application, MPI, and Comm. layers, showing the virtual communication and real communication paths]
MPI Layer
![Page 36: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/36.jpg)
Matrix Multiplication Example
![Page 37: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/37.jpg)
PRAM Matrix Multiplication
Cost Of PRAM Algorithm
![Page 38: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/38.jpg)
BSP Matrix Multiplication
Cost of algorithm
![Page 39: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/39.jpg)
Concurrency Approach
•Control Parallel
•Data Parallel
![Page 40: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/40.jpg)
Control Parallel
![Page 41: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/41.jpg)
Data Parallel
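The two concurrency approaches can be contrasted in a few lines. In this sketch, data parallelism applies the same operation to different pieces of data, while control parallelism runs different operations at the same time. The function names are hypothetical, and Python threads are used only to show the structure; they are not expected to give a real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def data_parallel(values, workers=4):
    """Same operation (square) applied to many data items concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))  # results keep input order


def control_parallel(values, workers=3):
    """Different operations (min, max, sum) run concurrently on the data."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            "min": pool.submit(min, values),
            "max": pool.submit(max, values),
            "sum": pool.submit(sum, values),
        }
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    print(data_parallel([1, 2, 3, 4]))
    print(control_parallel([3, 1, 2]))
```

Data parallelism scales with the amount of data, whereas control parallelism is limited by the number of independent tasks the program contains.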
![Page 42: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/42.jpg)
The Best granularity for programming
![Page 43: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/43.jpg)
•Explicit Parallel Programming
 – Occam, MPI, PVM
•Implicit Parallel Programming
 – Parallel functional programming: ML,…
 – Concurrent object-oriented programming: COOL,…
 – Data parallel programming: Fortran 90, HPF,…
Parallel Programming
![Page 44: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/44.jpg)
• A cluster system is
 – a parallel multicomputer built from high-end PCs and a conventional high-speed network
 – able to support parallel programming
Cluster Computing
![Page 45: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/45.jpg)
Applications
• Scientific computing
 – Simulation, CFD, CAD/CAM, weather prediction, processing large volumes of data
• Super server systems
 – Scalable internet/web server
 – Database server
 – Multimedia, video, audio server
Cluster Computing(cont.)
![Page 46: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/46.jpg)
Cluster System Building Block
[Figure: layered cluster stack, from bottom to top: high-speed network; per-node hardware (HW); per-node operating system (OS); single system image layer; system tool layer; application layer]
Cluster Computing(cont.)
![Page 47: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/47.jpg)
Why cluster computing?
• Scalability
 – Build a small system first, grow it later
• Low cost
 – Hardware based on the COTS (commercial off-the-shelf) model
 – Software based on freeware from the research community
• Easier to maintain
• Vendor independent
Cluster Computing(cont.)
![Page 48: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari](https://reader036.vdocuments.us/reader036/viewer/2022062520/56649f045503460f94c18100/html5/thumbnails/48.jpg)
The End