Increasing Cluster Performance by Combining rCUDA with Slurm
TRANSCRIPT

Federico Silla
Technical University of Valencia, Spain
HPC Advisory Council Switzerland Conference 2016 2/56
Outline
rCUDA … what’s that?
Basics of CUDA
[Figure: a CUDA application running on a node with a local GPU.]

rCUDA … remote CUDA
[Figure: the same application running on a node with no GPU, using a GPU located in a remote node.]
rCUDA … remote CUDA
A software technology that enables a more flexible use of GPUs in computing facilities.
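In practice the application itself is not modified: it is pointed at the rCUDA client library and told where the remote GPU lives. A minimal sketch; the environment-variable names, host name and install path below are assumptions for illustration, so check the rCUDA user guide for the exact interface of your release:

```shell
# Sketch: run an unmodified CUDA binary against one remote GPU.
# Variable names, host name and paths are assumptions for illustration.
export RCUDA_DEVICE_COUNT=1            # number of remote GPUs the application will see
export RCUDA_DEVICE_0=gpuserver01:0    # GPU 0 served by host "gpuserver01" (hypothetical)
export LD_LIBRARY_PATH=/opt/rCUDA/lib:$LD_LIBRARY_PATH   # rCUDA's CUDA-compatible library
# ./my_cuda_app                        # the binary runs unchanged; CUDA calls are forwarded
echo "CUDA calls will be forwarded to $RCUDA_DEVICE_0"
```

Because the rCUDA library exposes the standard CUDA API, the same binary runs locally or remotely depending only on this environment.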
Basics of rCUDA
[Figures: rCUDA client/server architecture — the application on the client node links against the rCUDA library, which forwards CUDA calls over the network to an rCUDA server daemon running on the node that hosts the real GPU.]
Cluster vision with rCUDA

rCUDA allows a new vision of a GPU deployment, moving from the usual cluster configuration, where each node attaches its CPUs, main memory, network adapter and GPUs through its own PCIe bus, to a logical configuration in which all the GPUs of the cluster form a shared pool that any node can reach through logical connections over the interconnection network.

[Figure: physical configuration — identical nodes (CPU, main memory, PCIe, network, GPUs) joined by the interconnection network; logical configuration — the same nodes, with every GPU moved into a cluster-wide pool.]
Outline
Two questions:
• Why should we need rCUDA?
• rCUDA … slower CUDA?
Concern with rCUDA
The main concern with rCUDA is the reduced bandwidth to the remote GPU.
Using InfiniBand networks
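A back-of-envelope comparison shows why InfiniBand matters here. These are raw link rates, ignoring encoding and protocol overheads, so they are rough upper bounds rather than measurements:

```shell
# Rough link-rate comparison (assumption: raw signalling rates, no overheads).
fdr_gbps=56        # FDR InfiniBand 4x, Gbit/s
edr_gbps=100       # EDR InfiniBand 4x, Gbit/s
pcie2_x16_GBps=8   # approx. PCIe 2.0 x16 (the K20's own local link), GB/s
fdr_GBps=$((fdr_gbps / 8))
edr_GBps=$((edr_gbps / 8))
echo "FDR ~${fdr_GBps} GB/s, EDR ~${edr_GBps} GB/s, local PCIe 2.0 x16 ~${pcie2_x16_GBps} GB/s"
```

With FDR already close to the GPU's own PCIe 2.0 link rate, and EDR above it, the network stops being the obvious bottleneck for remote GPU access.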
Initial transfers within rCUDA
[Figure: host-to-device (H2D) and device-to-host (D2H) bandwidth for pageable and pinned memory, comparing original and optimized rCUDA transfers over EDR and FDR InfiniBand.]
Performance depending on network
CUDASW++: bioinformatics software for Smith-Waterman protein database searches.
[Figure: rCUDA overhead (%) and execution time (s) versus sequence length (144 to 5478), for rCUDA over FDR, QDR and Gigabit Ethernet, compared with local CUDA.]
Optimized transfers within rCUDA
[Figure: H2D and D2H bandwidth for pageable and pinned memory, original vs. optimized rCUDA over EDR and FDR InfiniBand; pinned transfers, both H2D and D2H, reach almost 100% of the available bandwidth.]
rCUDA optimizations on applications
• Several applications executed with CUDA and rCUDA
• K20 GPU and FDR InfiniBand
• K40 GPU and EDR InfiniBand
[Figure: execution-time comparison; lower is better.]
Outline
Two questions:
• Why should we need rCUDA?
• rCUDA … slower CUDA?
Outline
rCUDA improves
cluster performance
Test bench for studying rCUDA+Slurm
• Dual-socket Intel Xeon E5-2620 v2 nodes + 32 GB RAM + K20 GPU
• FDR InfiniBand based cluster
• Cluster sizes: 4+1, 8+1 and 16+1 GPU nodes (the additional node runs the Slurm scheduler)
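Jobs on this test bench are submitted through a Slurm modified to schedule remote GPUs. A hedged sketch of what a submission script could look like; the `rgpu` gres name, install path and binary name are assumptions for illustration, not a documented interface:

```shell
#!/bin/bash
#SBATCH --job-name=lammps-rcuda
#SBATCH --ntasks=1
#SBATCH --gres=rgpu:4                 # assumed gres name for rCUDA-managed remote GPUs
# Point the job at the rCUDA client library (assumed install path):
export LD_LIBRARY_PATH=/opt/rCUDA/lib:$LD_LIBRARY_PATH
srun ./lmp_gpu -in in.lj              # hypothetical LAMMPS binary; runs unmodified
```

The scheduler, not the job, decides which physical GPUs (local or remote) back the four requested devices.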
Applications for studying rCUDA+Slurm

Applications used for tests:
• GPU-Blast (21 seconds; 1 GPU; 1599 MB)
• LAMMPS (15 seconds; 4 GPUs; 876 MB)
• MCUDA-MEME (165 seconds; 4 GPUs; 151 MB)
• GROMACS (non-GPU; 2 nodes; 167 seconds)
• NAMD (non-GPU; 4 nodes; 11 minutes)
• BarraCUDA (10 minutes; 1 GPU; 3319 MB)
• GPU-LIBSVM (5 minutes; 1 GPU; 145 MB)
• MUMmerGPU (5 minutes; 1 GPU; 2804 MB)

Short-execution-time applications form Set 1; long-execution-time applications form Set 2. Three workloads were used: Set 1, Set 2, and Set 1 + Set 2.
Workloads for studying rCUDA+Slurm (I)
Performance of rCUDA+Slurm (I)
Workloads for studying rCUDA+Slurm (II)
Performance of rCUDA+Slurm (II)
Outline
Why does rCUDA improve
cluster performance?
1st reason for improved performance
• Non-accelerated applications keep GPUs idle in the nodes where they use all the cores.
• Hybrid MPI/shared-memory non-accelerated applications usually span all the cores of the n nodes they run on.
• A CPU-only application spreading over these nodes makes their GPUs unavailable for accelerated applications.
[Figure: n cluster nodes, each with CPUs, RAM, network and a PCIe-attached GPU, joined by the interconnection network; a CPU-only job occupies all the cores while the GPUs sit idle.]
2nd reason for improved performance (I)
• Accelerated applications keep CPUs idle in the nodes where they execute.
• An accelerated application using just one CPU core may prevent other jobs from being dispatched to its node.
[Figure: the same cluster nodes; a GPU job pins one core while the remaining cores stay unused.]
2nd reason for improved performance (II)
• Accelerated applications keep CPUs idle in the nodes where they execute.
• An accelerated MPI application using just one CPU core per node may keep a large part of the cluster busy.
[Figure: the same cluster nodes; the MPI job occupies one core in every node, blocking the remaining cores for other jobs.]
3rd reason for improved performance
• Do applications fully squeeze the GPUs available in the cluster?
• When a GPU is assigned to an application, the computational resources inside the GPU may not be fully used:
  • the application presents a low level of parallelism
  • CPU code is being executed (GPU assigned ≠ GPU working)
  • GPU cores stall due to lack of data
  • etc.
GPU usage of GPU-Blast
[Figure: GPU utilization trace of GPU-Blast; two intervals where the GPU is assigned but not used are highlighted.]
GPU usage of CUDA-MEME
[Figure: GPU utilization trace of CUDA-MEME; utilization stays far from the maximum.]
GPU usage of LAMMPS
[Figure: GPU utilization trace of LAMMPS; an interval where the GPU is assigned but not used is highlighted.]
GPU allocation vs GPU utilization
[Figure: aggregate trace showing long periods in which GPUs are assigned but not used.]
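Traces like these can be reproduced with plain `nvidia-smi` sampling. A sketch: the commented line collects real samples once per second; below it, hypothetical sample values stand in for GPU output so the filtering step can be shown:

```shell
# One sample per second of (GPU utilization %, memory used MB):
# nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits -l 1 > samples.csv
# Hypothetical samples standing in for real output:
samples="0,1599
97,1599
0,1599"
# Count samples where memory is allocated but the GPU is idle:
idle=$(printf '%s\n' "$samples" | awk -F, '$1 == 0 && $2 > 0 { n++ } END { print n }')
echo "$idle sample(s): GPU assigned but not used"
```

Memory in use with zero utilization is exactly the "assigned but not used" condition the slides highlight.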
Sharing a GPU among jobs: GPU-Blast
[Figure: GPU utilization of two concurrent instances of GPU-Blast; one instance required about 51 seconds.]
Sharing a GPU among jobs: GPU-Blast
[Figure: the same two concurrent instances; the first instance is highlighted.]
Sharing a GPU among jobs: GPU-Blast
[Figure: the same two concurrent instances; first and second instances are highlighted.]
Sharing a GPU among jobs
Memory footprints on a K20 GPU:
• LAMMPS: 876 MB
• mCUDA-MEME: 151 MB
• BarraCUDA: 3319 MB
• MUMmerGPU: 2104 MB
• GPU-LIBSVM: 145 MB
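Whether jobs can share a GPU is mostly a memory question. A quick check using the footprints from the slide against the 5 GB of a Tesla K20 (the choice of these four jobs is illustrative):

```shell
# Sum of four co-schedulable footprints from the slide, in MB:
k20_mb=5120   # Tesla K20 memory: 5 GB
total=$((876 + 151 + 3319 + 145))   # LAMMPS + mCUDA-MEME + BarraCUDA + GPU-LIBSVM
echo "combined footprint: ${total} MB of ${k20_mb} MB"
```

These four jobs fit together in one K20, so sharing the GPU is limited by utilization, not by memory capacity.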
Outline
Other reasons for
using rCUDA?
Cheaper cluster upgrade
• Suppose a cluster without GPUs needs to be upgraded to use GPUs.
• GPUs require large power supplies: are the power supplies already installed in the nodes large enough?
• GPUs require a large amount of space: does the current form factor of the nodes allow GPUs to be installed?
The answer to both questions is usually "NO".
Cheaper cluster upgrade
Approach 1: augment the cluster with some CUDA GPU-enabled nodes → only those GPU-enabled nodes can execute accelerated applications.
Cheaper cluster upgrade
Approach 2: augment the cluster with some rCUDA servers → all nodes can execute accelerated applications.
Cheaper cluster upgrade
• Dual-socket Intel Xeon E5-2620 v2 + 32 GB RAM + K20 GPU
• FDR InfiniBand based cluster
• 16 nodes without GPU + 1 node with 4 GPUs
More workloads for studying rCUDA+Slurm
Performance
[Figure: results relative to the CUDA configuration — changes of −68%, −60%, −63% and −56%, and increases of +131% and +119%.]
Outline
Additional reasons for
using rCUDA?
#1: More GPUs for a single application
64 GPUs!
#1: More GPUs for a single application
MonteCarlo Multi-GPU (from the NVIDIA samples), on FDR InfiniBand + NVIDIA Tesla K20.
[Figure: execution time (lower is better) and throughput (higher is better) as the number of GPUs grows.]
#2: Virtual machines can share GPUs
• With PCI passthrough, the GPU is exclusively assigned to a single virtual machine.
• Concurrent usage of the GPU is not possible.
#2: Virtual machines can share GPUs
[Figure: with rCUDA, virtual machines reach the GPUs through the network, whether a high-performance or a low-performance network is available.]
#3: GPU task migration
• Box A has 4 GPUs but only one is busy; Box B has 8 GPUs but only two are busy.
1. Move the jobs from Box B to Box A and switch off Box B.
2. Migration should be transparent to applications (decided by the global scheduler).
• Migration is performed at GPU granularity.
#3: GPU task migration
[Figure: migration at job granularity instead of GPU granularity.]
Outline
… in summary …
Pros and cons of rCUDA
• Pros:
1. Many GPUs for a single application
2. Concurrent GPU access from virtual machines
3. Increased cluster throughput
4. Similar performance with a smaller investment
5. Easier (cheaper) cluster upgrade
6. Migration of GPU jobs
7. Reduced energy consumption
8. Increased GPU utilization
• Cons:
1. Reduced bandwidth to the remote GPU (really a concern?)
Get a free copy of rCUDA at http://www.rcuda.net
@rcuda_
More than 650 requests worldwide.
rCUDA is a development by Technical University of Valencia.
Thanks!
Questions?