PARALLEL PROCESSING COMPARATIVE STUDY
CONTEXT
How can a job be finished in a short time?
Solution
Use a faster worker.
Drawbacks:
A worker's speed has a limit
Inadequate for long jobs
CONTEXT
How can a calculation be finished in a short time?
Solution
Use a faster calculator (processor). [1960-2000]
Drawbacks:
Processor speed has reached a limit
Inadequate for long calculations
CONTEXT
How can a job be finished in a short time?
Solution
1. Use a faster worker (inadequate for long jobs)
2. Use more than one worker concurrently
CONTEXT
How can a calculation be finished in a short time?
Solution
1. Use a faster processor (inadequate for long calculations)
2. Use more than one processor concurrently
Parallelism
CONTEXT
Definition
Parallelism is the concurrent use of more than one processing unit (CPUs, processor cores, GPUs, or combinations of them) in order to carry out calculations more quickly.
PROJECT GOAL
Parallelism needs
1. A parallel computer (more than one processor)
2. Adapting the calculation to the parallel computer
THE GOAL
Parallel Computer
Several parallel computers are available on the hardware market
They differ in their architecture
Several classifications exist:
Based on the instruction and data streams (Flynn classification)
Based on the memory sharing degree ….
THE GOAL
Flynn Classification
A. Single Instruction and Single Data stream (SISD)
B. Single Instruction and Multiple Data stream (SIMD)
C. Multiple Instruction and Single Data stream (MISD)
D. Multiple Instruction and Multiple Data stream (MIMD)
THE GOAL
Memory Sharing Degree Classification
A. Shared memory
B. Distributed memory
C. Hybrid distributed-shared memory
THE GOAL
Parallelism needs
1. A parallel computer (more than one processor)
2. Adapting the calculation to the parallel computer:
Dividing the calculation and data between the processors
Defining the execution scenario (how the processors cooperate)
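As a rough illustration of the first step, dividing the data between the processors, here is a minimal Python sketch. The function name `split_evenly` and the near-equal contiguous-block policy are assumptions made for illustration; the slides do not specify a partitioning scheme.

```python
def split_evenly(items, num_processors):
    """Split items into contiguous blocks whose sizes differ by at most one."""
    if num_processors < 1:
        raise ValueError("need at least one processor")
    base, extra = divmod(len(items), num_processors)
    blocks = []
    start = 0
    for p in range(num_processors):
        # The first `extra` processors each take one additional item.
        size = base + (1 if p < extra else 0)
        blocks.append(items[start:start + size])
        start += size
    return blocks


# Hypothetical usage: ten rows of work shared by three processors.
rows = list(range(10))
blocks = split_evenly(rows, 3)
```

Each processor then works on its own block; how the partial results are combined belongs to the execution scenario.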
THE GOAL
Adapting the calculation to the parallel computer
Is called parallel processing
Depends closely on the architecture
THE GOAL
Goal : A comparative study between
1. Shared Memory Parallel Processing approach
2. Distributed Memory Parallel Processing approach
PLAN
1. Distributed Memory Parallel Processing approach
2. Shared Memory Parallel Processing approach
3. Case study problems
4. Comparison results and discussion
5. Conclusion
DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH
Distributed-Memory Computers (DMC) = Distributed Memory System (DMS) = Massively Parallel Processor (MPP)
DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH
• Distributed-memory computers architecture
DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH
• Architecture of nodes
Nodes can be:
Identical processors → pure DMC
Different types of processors → hybrid DMC
Different types of nodes with different architectures → heterogeneous DMC
DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH
• Architecture of the interconnection network
No shared memory space between nodes
The network is the only means of communication between nodes
Network performance directly influences the performance of a parallel program on a DMC
Network performance depends on:
1. Topology
2. Physical connectors (such as wires…)
3. Routing technique
The evolution of DMCs depends closely on the evolution of networking
DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH
The DMC used in our comparative study
• Heterogeneous DMC
• Modest cluster of workstations
• Three nodes:
• Sony laptop: i3 processor
• HP laptop: i3 processor
• HP laptop: Core 2 Duo processor
• Communication network: 100 MByte Ethernet
DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH
Parallel Software Development for DMC
Designer main tasks:
1. Global Calculation decomposition and tasks assignment
2. Data decomposition
3. Communications scheme Definition
4. Synchronization Study
DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH
Parallel Software Development for DMC
Important considerations for efficiency:
1. Minimize Communication
2. Avoid barrier synchronization
DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH
Implementation environments
Several implementation environments exist:
PVM (Parallel Virtual Machine)
MPI (Message Passing Interface)
DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH
MPI Application Anatomy
All the nodes execute the same code
The nodes do not all do the same work
This is possible using the SPMD application form
SPMD :....
The processes are organized as one controller and several workers
Contradiction
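The apparent contradiction (same code, different work) can be illustrated with a small Python emulation. This is not MPI: threads and `queue.Queue` mailboxes stand in for MPI processes and messages, and the names `spmd_main`, `run_spmd`, and the summing task are assumptions chosen only for illustration.

```python
import queue
import threading


def spmd_main(rank, size, mailboxes, results):
    """Every rank runs this same function; the branch on rank divides the roles."""
    if rank == 0:
        # Controller: hand one slice of the data to each worker, then collect.
        numbers = list(range(1, 101))
        workers = size - 1
        for w in range(1, size):
            mailboxes[w].put(numbers[w - 1::workers])
        results["total"] = sum(mailboxes[0].get() for _ in range(workers))
    else:
        # Worker: receive a slice, compute a partial sum, send it back.
        chunk = mailboxes[rank].get()
        mailboxes[0].put(sum(chunk))


def run_spmd(size=3):
    """Start `size` emulated processes (one controller plus size - 1 workers)."""
    if size < 2:
        raise ValueError("need one controller and at least one worker")
    mailboxes = [queue.Queue() for _ in range(size)]
    results = {}
    threads = [
        threading.Thread(target=spmd_main, args=(rank, size, mailboxes, results))
        for rank in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["total"]
```

In a real MPI program the rank would come from the MPI runtime rather than being passed in as an argument, but the branching structure is the same idea.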
SHARED MEMORY PARALLEL PROCESSING APPROACH
Several SMPCs are on the market
Multi-core PCs: Intel i3, i5, i7; AMD
Which SMPC do we use?
- The GPU, originally designed for image processing
- The GPU now: a domestic supercomputer
Characteristics:
• Cheapest and fastest shared-memory parallel computer
• Difficult parallel design
SHARED MEMORY PARALLEL PROCESSING APPROACH
The GPU Architecture
The implementation environment
SHARED MEMORY PARALLEL PROCESSING APPROACH
GPU Architecture
Like a classical processing unit, the Graphics Processing Unit is composed of two main components:
A. Calculation units
B. Storage unit
SHARED MEMORY PARALLEL PROCESSING
The GPU Architecture
The implementation environment
1. CUDA: for GPUs manufactured by NVIDIA
2. OpenCL: independent of the GPU architecture
SHARED MEMORY PARALLEL PROCESSING
CUDA Program Anatomy
SHARED MEMORY PARALLEL PROCESSING
Q: How are the code fragments to be parallelized executed on the GPU?
A: By calling a kernel.
Q: What is a kernel?
A: A kernel is a function callable from the host and executed on the device by many threads in parallel.
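To make the kernel idea concrete, here is a small Python emulation, not CUDA code. The helper `launch` and the kernel `vector_add_kernel` are hypothetical names; the nested loops run sequentially and only stand in for the threads that a GPU would run in parallel. In real CUDA the indices come from the built-in variables `blockIdx`, `blockDim`, and `threadIdx`.

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Body executed once per emulated thread: each thread handles one element."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(out):                        # guard threads beyond the data
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Emulated kernel launch: call the kernel for every (block, thread) pair."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def vector_add(a, b, block_dim=4):
    """Host-side code: prepare the output, choose the grid size, launch the kernel."""
    n = len(a)
    out = [0] * n
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n
    launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
    return out
```

With five elements and four threads per block, two blocks are launched and the three surplus threads in the second block do nothing because of the bounds check.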
KERNEL LAUNCH
SHARED MEMORY PARALLEL PROCESSING
SHARED MEMORY PARALLEL PROCESSING
Design recommendations
Use shared memory to reduce the time spent accessing global memory.
Reduce the number of idle threads (control divergence) to fully utilize the GPU's resources.
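The first recommendation can be illustrated by counting memory accesses in a tiled matrix multiplication. The sketch below is a plain-Python emulation under stated assumptions: `tiled_matmul` and `naive_global_reads` are hypothetical names, the local tile lists play the role of shared memory, and the tile size is assumed to divide the matrix size.

```python
def tiled_matmul(a, b, tile):
    """Multiply square matrices tile by tile, counting emulated global-memory reads."""
    n = len(a)
    if n % tile != 0:
        raise ValueError("this sketch assumes the tile size divides n")
    c = [[0] * n for _ in range(n)]
    global_reads = 0
    for bi in range(0, n, tile):        # one emulated thread block per output tile
        for bj in range(0, n, tile):
            for bk in range(0, n, tile):
                # Stage one tile of A and one tile of B in emulated shared memory.
                a_tile = [row[bk:bk + tile] for row in a[bi:bi + tile]]
                b_tile = [row[bj:bj + tile] for row in b[bk:bk + tile]]
                global_reads += 2 * tile * tile
                # Every multiply-add in this phase touches only the staged tiles.
                for i in range(tile):
                    for j in range(tile):
                        for k in range(tile):
                            c[bi + i][bj + j] += a_tile[i][k] * b_tile[k][j]
    return c, global_reads


def naive_global_reads(n):
    """Reads made when every multiply-add fetches both operands from global memory."""
    return 2 * n ** 3
```

Under this counting model the tiled version performs 2n³/tile global reads instead of 2n³, which is why larger tiles (as far as shared memory allows) reduce global-memory traffic.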
CASE STUDY PROBLEM
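Square matrix multiplication problem
• ALGORITHM: MatrixMultiplication(A, B)
// Input: Two n × n matrices A and B
// Output: Matrix C = AB
for i ← 1 to n do
    for j ← 1 to n do
        C[i, j] ← 0
        for k ← 1 to n do
            C[i, j] ← C[i, j] + A[i, k] × B[k, j]
return C
• Complexity:
In big-O notation, the complexity is $O(n^3)$.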
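A sequential reference version of the triple-loop algorithm might look like the following in Python. It is a minimal sketch: the name `matrix_multiply` and the list-of-lists representation are assumptions, and it makes no attempt at performance.

```python
def matrix_multiply(a, b):
    """Multiply two n x n matrices given as lists of rows (classic triple loop)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):          # row of the result
        for j in range(n):      # column of the result
            for k in range(n):  # dot product of row i of a and column j of b
                c[i][j] += a[i][k] * b[k][j]
    return c
```

The innermost statement runs n × n × n times, which is where the cubic cost comes from.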
CASE STUDY PROBLEM
Pi approximation
• ALGORITHM: PiApprox(n)
// Input: n, the number of bins
// Output: an approximation of π
for i ← 1 to n do
    ...
return the approximation
• Complexity:
In big-O notation, the complexity is $O(n)$.
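One common way to approximate π with a number of bins is the midpoint rule applied to the integral of 4/(1+x²) over [0, 1], which equals π. Whether the study used exactly this formula is an assumption; the function name `pi_approx` is likewise hypothetical.

```python
def pi_approx(num_bins):
    """Midpoint-rule estimate of the integral of 4 / (1 + x^2) over [0, 1]."""
    if num_bins < 1:
        raise ValueError("need at least one bin")
    width = 1.0 / num_bins
    total = 0.0
    for i in range(num_bins):
        x = (i + 0.5) * width          # midpoint of bin i
        total += 4.0 / (1.0 + x * x)   # height of the curve at the midpoint
    return total * width
```

The loop does a constant amount of work per bin, so the running time grows linearly with the number of bins, and the bins are independent of each other, which makes the sum easy to split across processing units.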
COMPARISON
• Comparison criteria
• Analysis and conclusion
COMPARISON
Criterion 1: Time-Cost Factor
$TCF = PET \times HC$
PET: Parallel Execution Time (in milliseconds)
HC: Hardware Cost (in Saudi Arabian Riyals, SAR)
The hardware costs (HC):
GPU: HC = 5000 SAR
Cluster of workstations: HC = 9630 SAR
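The time-cost factor is simply the product of the parallel execution time and the hardware cost. A minimal Python sketch follows; the hardware costs are the ones quoted on the slide, while the function name and the execution times in the usage lines are hypothetical, not measurements from the study.

```python
# Hardware costs in SAR, as quoted on the slide.
HARDWARE_COST_SAR = {"gpu": 5000, "cluster": 9630}


def time_cost_factor(pet_ms, platform):
    """TCF = parallel execution time (ms) multiplied by hardware cost (SAR)."""
    return pet_ms * HARDWARE_COST_SAR[platform]


# Hypothetical execution times, for illustration only.
example_pet_ms = {"gpu": 12.0, "cluster": 8.0}
example_tcf = {name: time_cost_factor(ms, name) for name, ms in example_pet_ms.items()}
```

A lower TCF is better: a platform can win either by running faster or by costing less.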
COMPARISON
[Figure: Time-Cost Factor for the matrix multiplication problem; x-axis: matrix size, y-axis: TCF; series: GPU, cluster]
[Figure: Time-Cost Factor for the Pi approximation problem; x-axis: number of bins, y-axis: TCF; series: GPU, cluster]
COMPARISON
Conclusion:
The GPU is the better choice when we need to perform a large number of calculations that each have a small number of iterations.
However, when we need to perform a calculation with a large number of iterations, the cluster of workstations is the better choice.
COMPARISON
Criterion 2: Required memory
Matrix multiplication problem
Graphics Processing Unit
The global-memory-based method requires:
$\text{Total required memory} = 6 \times n \times n \times \text{sizeof(float)}$
The shared-memory-based method requires:
$\text{Total required memory} = 8 \times n \times n \times \text{sizeof(float)}$
Cluster of workstations
The cluster used contains three nodes:
$\text{Total required memory} = \frac{19}{3} \times n \times n \times \text{sizeof(float)}$
COMPARISON
Criterion 2: Required memory
Pi approximation problem
• Graphics Processing Unit
The size of these arrays depends on the number of threads used
$\text{Required memory} = 2 \times \text{number of threads} \times \text{sizeof(double)}$
• Cluster of workstations
A small amount of memory is used on each node, approximately $15 \times \text{sizeof(double)}$
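The memory formulas for both problems can be evaluated with a few lines of Python. This is a sketch under assumptions: the type sizes (4-byte float, 8-byte double) are typical values rather than figures from the slides, and the function names are hypothetical.

```python
SIZEOF_FLOAT = 4   # bytes; assumes a 32-bit float
SIZEOF_DOUBLE = 8  # bytes; assumes a 64-bit double


def matmul_memory_bytes(n):
    """Total memory for n x n matrix multiplication under each approach."""
    elements = n * n
    return {
        "gpu_global": 6 * elements * SIZEOF_FLOAT,
        "gpu_shared": 8 * elements * SIZEOF_FLOAT,
        "cluster": 19 * elements * SIZEOF_FLOAT / 3,  # three-node cluster
    }


def pi_memory_bytes(num_threads):
    """Memory for the Pi approximation: GPU total and cluster per-node figure."""
    return {
        "gpu": 2 * num_threads * SIZEOF_DOUBLE,
        "cluster_per_node": 15 * SIZEOF_DOUBLE,
    }
```

For matrix multiplication the three figures grow at the same rate (n²) and differ only by a constant factor, whereas for Pi the GPU figure depends on the thread count and the cluster figure is constant.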
COMPARISON
Criterion 2: Required memory
Conclusion:
We cannot judge which parallel approach is better on the required-memory criterion. This criterion depends on the intrinsic characteristics of the problem at hand.
COMPARISON
Criterion 3: The Gap between the Theoretical Complexity and Effective Complexity
• The gap between the theoretical complexity and the effective complexity is calculated by:
$Gap = \left(\frac{EPT}{TPT} - 1\right) \times 100$
EPT: Experimental Parallel Time
TPT: Theoretical Parallel Time
$TPT = \frac{ST}{N}$
ST: Sequential Time
N: Number of processing units
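The two formulas above translate directly into code. In this minimal Python sketch the function names are hypothetical, and any timings passed in are made-up values rather than results from the study.

```python
def theoretical_parallel_time(sequential_time, num_units):
    """TPT = ST / N: the ideal time if the work divided perfectly."""
    return sequential_time / num_units


def gap_percent(experimental_parallel_time, sequential_time, num_units):
    """Gap = ((EPT / TPT) - 1) * 100, as a percentage of the ideal time."""
    tpt = theoretical_parallel_time(sequential_time, num_units)
    return (experimental_parallel_time / tpt - 1) * 100
```

A positive gap means the measured run was slower than the ideal, zero means it matched, and a negative gap means the parallel run beat the theoretical time.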
COMPARISON
Criterion 3: The Gap between the Theoretical Complexity and Effective Complexity
CLUSTER OF WORKSTATIONS
[Figure: The gap between the theoretical complexity and effective complexity for the matrix multiplication problem, cluster of workstations; x-axis: matrix size, y-axis: Gap]
[Figure: The gap between the theoretical complexity and effective complexity for the Pi approximation problem, cluster of workstations; x-axis: bins, y-axis: Gap]
COMPARISON
Criterion 3: The Gap between the Theoretical Complexity and Effective Complexity
GRAPHICS PROCESSING UNIT
[Figure: The gap between the theoretical complexity and effective complexity for the matrix multiplication problem, GPU; x-axis: matrix size, y-axis: Gap]
[Figure: The gap between the theoretical complexity and effective complexity for the Pi approximation problem, GPU; x-axis: bins, y-axis: Gap]
COMPARISON
Criterion 3: The Gap between the Theoretical Complexity and Effective Complexity
Conclusion
On the GPU, the execution time of the parallel program can be lower than the theoretically expected time. This is impossible to achieve with a cluster of workstations because of the communication overhead.
To minimize the gap in the cluster of workstations, or to keep it constant, the designer has to keep the number and sizes of communicated messages as constant as possible when the problem size increases.
COMPARISON
Criterion 4: Efficiency
$Efficiency = \frac{ST}{PT \times N}$
ST: Sequential Time
PT: Parallel Time
N: Number of processing units
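A minimal Python sketch of this criterion, assuming the usual textbook definitions (speedup as sequential time over parallel time, efficiency as speedup per processing unit). The function names are hypothetical, and the study may have computed its plotted values differently.

```python
def speedup(sequential_time, parallel_time):
    """How many times faster the parallel run is than the sequential one."""
    return sequential_time / parallel_time


def efficiency(sequential_time, parallel_time, num_units):
    """Speedup divided by the number of processing units used."""
    return speedup(sequential_time, parallel_time) / num_units
```

Under these definitions an efficiency of 1 means every processing unit contributed its full share, and values above 1 indicate a parallel run faster than the ideal division of the sequential time.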
COMPARISON
CRITERIA 4: EFFICIENCY
[Figure: Efficiency for the matrix multiplication problem; x-axis: matrix size, y-axis: efficiency; series: cluster, GPU]
[Figure: Efficiency for the Pi approximation problem; x-axis: number of bins, y-axis: efficiency; series: cluster, GPU]
COMPARISON
Criteria 4: Efficiency
• Conclusion: The efficiency (speedup) is much better in the GPU than in the cluster of workstations.
IMPORTANT NOTIFICATION
[Figure: Matrix sequential solution, time in ms for one process (CPU) versus one thread (GPU); series: 32×32, 128×128, 512×512, 1000×1000, 1805×1805]
[Figure: Pi sequential solution, time in ms for one process (CPU) versus one thread (GPU); series: 100, 1000, 10000]
COMPARISON
• Criterion 5: Difficulty of development
• CUDA
• MPI
COMPARISON
• Criterion 6: Necessary hardware and software
• GPU (NVIDIA GT 525M)
• Cluster of workstations (3 PCs, switch, internet modem, and cables)
CONCLUSION
Parallel Processing Comparative Study
GPU and cluster are the two main components of the world's fastest computers (such as Shahin).
To compare them we use two different problems (matrix multiplication and Pi approximation) and six measurement criteria.

| Shared Memory Parallel Processing Approach | Distributed Memory Parallel Processing Approach |
| --- | --- |
| Graphics Processing Unit (GPU) | Cluster of workstations |
| More adequate for data-level parallelism | More adequate for task-level parallelism |
| A large number of small calculations | One big calculation |
| Memory requirement depends on the problem characteristics | Memory requirement depends on the problem characteristics |
| Run time can be better than expected | A null or negative gap is impossible |
| Complicated design and programming | Less complicated |
| Implementation environment very practical | Complicated |