Download - rCUDA: desde máquinas virtuales a clústers mixtos CPU-GPUdesde máquinas virtuales a clústers mixtos CPU-GPU Federico Silla Universitat Politècnica de València HPC ADMINTECH 2018

rCUDA:

desde máquinas virtuales a

clústers mixtos CPU-GPU

Federico Silla

Universitat Politècnica de València

HPC ADMINTECH 2018

rCUDA:

from virtual machines to

hybrid CPU-GPU clusters

HPC ADMINTECH 2018

Federico Silla

Universitat Politècnica de València

What is rCUDA?

Outline

F. Silla @ HPC ADMINTECH 2018

Basics of GPU computing

Remark:

GPUs can only be used within

the node they are attached to

Basic behavior of CUDA

GPU


Remark:

GPUs can only be used within

the node they are attached to

GPU

Basics of GPU computing

Basic behavior of CUDA


No GPU

A different approach: remote GPU virtualization


A software technology that enables a more flexible

use of GPUs in computing facilities

rCUDA is a development by Universitat Politècnica de València

rCUDA … remote CUDA

A different approach: remote GPU virtualization

No GPU

Access to remote GPU is

transparent to applications:

no source code

modification is needed


Basics or rCUDA


Basics or rCUDA

Access to remote GPU is

transparent to applications:

no source code

modification is needed


rCUDA supports RDMA transfers

Physical

configuration

rCUDA allows a new vision of

a GPU deployment, moving from

the usual cluster configuration …

… to the following one:

Logical

configuration


rCUDA envision

Perfomance of rCUDA?

Outline


Loweris better

• K20 GPU and FDR InfiniBand

• K40 GPU and EDR InfiniBand


Performance of rCUDA

CUDA-MEMEBarraCUDA

Loweris better

Loweris betterP100 GPU and EDR InfiniBand



rCUDA scenario 1

rCUDA scenario 2

rCUDACUDA


Performance of data movements among GPUs

Higher is better


Performance of data movements among GPUs

rCUDACUDA


Performance of data movements to/from GPUs

CPU to GPU

GPU to CPU

Higheris better



Higheris better



CPU to GPU

GPU to CPU

New communication module in progress



Benefits of rCUDA?

Outline


Benefits of rCUDA:

1. Many GPUs for an application

2. Server consolidation

3. GPU acceleration for virtual machines

4. Increased cluster throughput

Outline



Providing many GPUs to an application with rCUDA

Loweris better

MonteCarlo multi-GPU program running in 14 NVIDIA Tesla K20 GPUs

K20 GPUs and FDR InfiniBand



64 GPUs !!



K20 GPUs

Work in progress!!

non-optimized (yet) version of rCUDA!!!

GPU

1

GPU

2

GPU

3

GPU

4

GPU

5GPU

6

GPU

7

GPU

8

GPU

9

GPU

10

GPU

11GPU

12

GPU

13

GPU

14

GPU

15

GPU

16



1

3

7

9

12

13

14

GPU utilization (%)

20 40 60 80 1000

off

off

off

off

off

off

off


Server consolidation with rCUDA

• The GPU-Blast application is migrated up to 5 times among K40 GPUs

• The aggregated volume of GPU data is 1300 MB (consisting of 9 memory regions)

The “Reference” line is the execution time of the application when using CUDA with a local GPU and without any migration

Loweris better


Server consolidation with rCUDA

• How to access the GPU in the native domain from inside of virtual machines?


Virtual machines may need access to GPUs

• The GPU is assigned by using PCI passthrough exclusively to a single virtual machine

• Concurrent usage of the GPU is not possible


Virtual machines may need access to GPUs

• If InfiniBand is available, the rCUDA server can be placed in another node

• Several GPUs can be provided to the VMs, either in a single remote node or in several remote nodes

High performance network fabric available


Using rCUDA to access the GPU

• When InfiniBand is not available, the rCUDA server may be placed in the native domain and the rCUDAclient would be placed inside the VMs

High performance network is not

available

• The virtual network provided by the hypervisor would be used to exchange data between the rCUDA clients and the rCUDA server

• This configuration allows the use of more than one GPU at the host



...

...

One rCUDA box serves multiple clients


Increased cluster throughput

1. BarraCUDA

2. CUDA-MEME

3. CUDASW++

4. GPU-Blast

5. Gromacs

6. Magma

Lower

is better

- 58%



GPU assigned but not used

GPU assigned but not used



One more benefit:

Heterogeneous2 environments

Outline


rCUDA is available for the x86,

POWER and ARM processors


rCUDA availability


on ARM systems

Outline


ThunderX


From ARM to x86 with rCUDA

Work in progress. A couple of applications have been already

analyzed:

1. Cloverleaf: a mini-app that solves the compressible Euler

equations on a Cartesian grid

2. Flow: a mini-app that implements a 2D hydrodynamics

simulator


Application performance

Lower

is

better

Single node

executions

Estimation over

multiple nodes


Application performance: Cloverleaf

Rough energy estimation:

ThunderX TDP = 80 watts

P100 TDP = 250 watts

Xeon TDP = 140 watts

40*80 versus 1*80+3*250+2*140

3200 watts versus 1110 watts

Application performance: Cloverleaf

Lower

is

better

Single node

executions

Estimation over

multiple nodes


Lower

is

better

Single node

executions


Application performance: Flow

Estimation over

multiple nodes

Rough energy estimation:

ThunderX TDP = 80 watts

P100 TDP = 250 watts

Xeon TDP = 140 watts

60*80 versus 1*80+3*250+2*140

4800 watts versus 1110 watts

Application performance: Flow

Lower

is

better

Single node

executions


Estimation over

multiple nodes

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

Hig

h d

en

sit

y A

RM

-ba

se

d n

od

es


Hybrid CPU-GPU clusters



No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

Hig

h d

en

sit

y A

RM

-ba

se

d n

od

es

rCUDA serversrCUDA clients



No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

No GPU

Hig

h d

en

sit

y A

RM

-ba

se

d n

od

es

Get a free copy of rCUDA at

http://www.rcuda.net

@rcuda_

More than 900 requests world wide

rCUDA is a development by Universitat Politècnica de València, Spain

·Tony Díaz · Pablo Higueras · Javier Prades · Jaime Sierra

· Cristian Peñaranda · Federico Silla · Carlos Reaño

rCUDA is a development by Universitat Politècnica de València, Spain

Download - rCUDA: desde máquinas virtuales a clústers mixtos CPU-GPUdesde máquinas virtuales a clústers mixtos CPU-GPU Federico Silla Universitat Politècnica de València HPC ADMINTECH 2018

Top Related