Heterogeneous² Computing with rCUDA
Federico Silla
Universitat Politècnica de València
HPC Advisory Council Swiss Conference 2018
- What is heterogeneous² computing?
- What is rCUDA?
F. Silla @ HPC Advisory Council Swiss Conference 2018
Outline: What is rCUDA?
Basics of GPU computing
Basic behavior of CUDA
Remark: GPUs can only be used within the node they are attached to.
A different approach: remote GPU virtualization
rCUDA (remote CUDA): a software technology that enables a more flexible use of GPUs in computing facilities.
rCUDA is a development by Universitat Politècnica de València.
Basics of rCUDA
Access to remote GPUs is transparent to applications: no source code modification is needed.
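One common way to obtain this kind of transparency is API remoting: the application keeps calling the same interface, and a client-side layer forwards each call to a server that owns the device. The toy sketch below illustrates only that idea. It is not rCUDA's implementation, and `LocalDevice`, `Server`, `RemoteDevice`, the method names and the JSON message format are all hypothetical names introduced for illustration; a plain function call stands in for the network.

```python
import json


class LocalDevice:
    """Hypothetical stand-in for a locally attached accelerator (not a CUDA binding)."""

    def __init__(self):
        self._buffers = {}
        self._next_id = 0

    def alloc(self, n):
        handle = self._next_id
        self._next_id += 1
        self._buffers[handle] = [0.0] * n
        return handle

    def upload(self, handle, values):
        self._buffers[handle] = list(values)

    def scale(self, handle, factor):
        self._buffers[handle] = [v * factor for v in self._buffers[handle]]

    def download(self, handle):
        return list(self._buffers[handle])


class Server:
    """Hypothetical server side: owns the device and executes forwarded calls."""

    def __init__(self, device):
        self._device = device

    def handle(self, request_bytes):
        request = json.loads(request_bytes)
        result = getattr(self._device, request["call"])(*request["args"])
        return json.dumps({"result": result}).encode()


class RemoteDevice:
    """Hypothetical client side: same methods as LocalDevice, each one forwarded."""

    def __init__(self, transport):
        # transport is any callable taking request bytes and returning reply bytes;
        # here it is a direct function call, standing in for a network link.
        self._transport = transport

    def __getattr__(self, name):
        def forward(*args):
            request = json.dumps({"call": name, "args": list(args)}).encode()
            return json.loads(self._transport(request))["result"]
        return forward


def application(device):
    """Application code: identical whether the device is local or remote."""
    buf = device.alloc(3)
    device.upload(buf, [1.0, 2.0, 3.0])
    device.scale(buf, 2.0)
    return device.download(buf)


local_result = application(LocalDevice())
remote_result = application(RemoteDevice(Server(LocalDevice()).handle))
print(local_result, remote_result)
```

The `application` function is the point of the sketch: it is never edited, yet it runs against either backend.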
rCUDA vision
rCUDA allows a new vision of GPU deployment, moving from the usual cluster configuration …
[Figure: physical configuration]
… to the following one:
[Figure: logical configuration]
Performance of rCUDA
Lower is better
• K20 GPU and FDR InfiniBand
• K40 GPU and EDR InfiniBand
Performance of rCUDA
P100 GPU and EDR InfiniBand
[Figures: CUDA-MEME and BarraCUDA; lower is better]
Outline: Benefits of rCUDA?
Providing many GPUs to applications with rCUDA
MonteCarlo multi-GPU program running on 14 NVIDIA Tesla K20 GPUs
Lower is better
K20 GPUs and FDR InfiniBand
Providing many GPUs to applications with rCUDA
64 GPUs !!
Server consolidation with rCUDA
[Figure: GPU utilization (%), axis from 0 to 100; entries labeled 1, 3, 7, 9, 12, 13 and 14, plus seven entries marked "off"]
Server consolidation with rCUDA
• The GPU-Blast application is migrated up to 5 times among K40 GPUs
• The aggregated volume of GPU data is 1300 MB (consisting of 9 memory regions)
The “Reference” line is the execution time of the application when using CUDA with a local GPU and without any migration
Lower is better
Outline: Heterogeneous² environments
rCUDA availability
rCUDA is available for x86, POWER and ARM processors.
Outline: Performance of rCUDA on POWER systems
From x86 to POWER with rCUDA
Several testbeds used
[Figure: testbeds #1, #2 and #3, showing CUDA and rCUDA (rCUDA client and rCUDA server)]
x86 @ 2.1 GHz, DDR3 1600 MHz
x86 @ 3.5 GHz, DDR4 2400 MHz
Network fabric is EDR InfiniBand
Performance of data movements to/from GPU
[Figure labels: rCUDA, CUDA]
[Figures: H2D (host to device) and D2H (device to host); higher is better]
Performance of data movements among GPUs
[Figure: P2P (peer to peer); higher is better]
Performance of data movements among GPUs
[Figure labels: CUDA, rCUDA, rCUDA scenario 1, rCUDA scenario 2]
Performance of data movements among GPUs
Higher is better
Application performance
Several applications have been analyzed in this study:
1. BarraCUDA
2. CUDA-MEME
3. CUDASW++
4. GPU-Blast
5. Gromacs
6. GPU-LIBSVM
7. Magma
8. NAMD
Application performance
Unfortunately, we could not run all the applications on the Minsky system:
1. BarraCUDA: this application includes intrinsics headers
2. CUDA-MEME: successfully compiled and executed
3. CUDASW++: this application includes intrinsics headers
4. GPU-Blast: we were not able to compile it
5. Gromacs: successfully compiled and executed
6. GPU-LIBSVM: successfully compiled and executed
7. Magma: successfully compiled and executed
8. NAMD: we were not able to compile it
Application performance
Lower is better
Outline: Throughput instead of performance
One rCUDA box serves multiple clients
1. BarraCUDA
2. CUDA-MEME
3. CUDASW++
4. GPU-Blast
5. Gromacs
6. Magma
[Figure: lower is better; annotation: -58%]
[Figure annotation: GPU assigned but not used]
Outline: Performance of rCUDA on ARM systems
From ARM to x86 with rCUDA
ThunderX
Application performance
Work in progress. A couple of applications have already been analyzed:
1. Cloverleaf: a mini-app that solves the compressible Euler equations on a Cartesian grid
2. Flow: a mini-app that implements a 2D hydrodynamics simulator
Application performance: Cloverleaf
[Figure: single-node executions and estimation over multiple nodes; lower is better]
ThunderX TDP = 80 watts
P100 TDP = 250 watts
Xeon TDP = 140 watts
40*80 versus 80+3*250+2*140
3200 watts versus 1110 watts
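The power comparison above can be reproduced with a few lines. This is a minimal sketch that assumes the terms on the slide mean 40 ThunderX processors on one side and one ThunderX, three P100 GPUs and two Xeon processors on the other; `TDP_WATTS` and `total_tdp` are names introduced here for illustration. TDP is a design limit, so these totals are upper-bound estimates rather than measured consumption.

```python
# TDP figures quoted on the slide, in watts.
TDP_WATTS = {"ThunderX": 80, "P100": 250, "Xeon": 140}


def total_tdp(counts):
    """Sum the TDP of a configuration given as {component: count}."""
    return sum(TDP_WATTS[name] * n for name, n in counts.items())


# Assumption: the "40*80" term is 40 ThunderX processors.
arm_only = total_tdp({"ThunderX": 40})

# Assumption: "80+3*250+2*140" is 1 ThunderX, 3 P100 GPUs and 2 Xeon processors.
with_rcuda = total_tdp({"ThunderX": 1, "P100": 3, "Xeon": 2})

print(arm_only, with_rcuda)              # 3200 1110
print(round(arm_only / with_rcuda, 2))   # 2.88
```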
Application performance: Flow
[Figure: single-node executions and estimation over multiple nodes; lower is better]
ThunderX TDP = 80 watts
P100 TDP = 250 watts
Xeon TDP = 140 watts
60*80 versus 80+3*250+2*140
4800 watts versus 1110 watts
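Written out term by term, and assuming the slide's terms stand for 60 ThunderX processors versus one ThunderX, three P100 GPUs and two Xeon processors, the Flow comparison is as follows (the subscripts are labels introduced here, and the values are TDP-based estimates, not measurements):

```latex
\begin{align*}
P_{\text{ThunderX only}} &= 60 \times 80\,\mathrm{W} = 4800\,\mathrm{W} \\
P_{\text{with rCUDA}} &= 80\,\mathrm{W} + 3 \times 250\,\mathrm{W} + 2 \times 140\,\mathrm{W}
                       = 80\,\mathrm{W} + 750\,\mathrm{W} + 280\,\mathrm{W} = 1110\,\mathrm{W} \\
\frac{P_{\text{ThunderX only}}}{P_{\text{with rCUDA}}} &= \frac{4800}{1110} \approx 4.32
\end{align*}
```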
Get a free copy of rCUDA at
http://www.rcuda.net
@rcuda_
More than 900 requests worldwide
rCUDA is a development by Universitat Politècnica de València, Spain
· Tony Díaz · Pablo Higueras · Javier Prades · Jaime Sierra
· Cristian Peñaranda · Federico Silla · Carlos Reaño