heterogeneous computing with rcuda - hpc …...basics of gpu computing remark: gpus can only be used...

43
Heterogeneous 2 Computing with rCUDA Federico Silla Universitat Politècnica de València HPC Advisory Council Swiss Conference 2018

Upload: others

Post on 16-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Heterogeneous2 Computing

with rCUDA

Federico Silla

Universitat Politècnica de València

HPC Advisory Council Swiss Conference 2018

Page 2: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

- What is heterogeneous2 computing?

- What is rCUDA?F. Silla @ HPC Advisory Council Swiss Conference 2018

Heterogeneous2 Computing

with rCUDA

Page 3: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

What is rCUDA?

Outline

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 4: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Basics of GPU computing

Remark:

GPUs can only be used within

the node they are attached to

Basic behavior of CUDA

GPU

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 5: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Basics of GPU computing

Remark:

GPUs can only be used within

the node they are attached to

Basic behavior of CUDA

GPU

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 6: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

A different approach: remote GPU virtualization

No GPU

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 7: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

A different approach: remote GPU virtualization

A software technology that enables a more flexible

use of GPUs in computing facilities

No GPU

rCUDA is a development by Universitat Politècnica de València

rCUDA … remote CUDA

Page 8: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Access to remote GPU is

transparent to applications:

no source code

modification is needed

Basics or rCUDA

rCUDA is a development by Universitat Politècnica de València

Page 9: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Basics or rCUDA

rCUDA is a development by Universitat Politècnica de València

Access to remote GPU is

transparent to applications:

no source code

modification is needed

Page 10: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Physical

configuration

rCUDA allows a new vision of a GPU

deployment, moving from the usual cluster

configuration …

… to the following one:

Logical

configuration

rCUDA envision

Page 11: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Performance of rCUDA

Loweris better

• K20 GPU and FDR InfiniBand

• K40 GPU and EDR InfiniBand

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 12: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Performance of rCUDA

CUDA-MEME

BarraCUDA

Loweris better

Loweris betterP100 GPU and EDR InfiniBand

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 13: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Benefits of rCUDA?

Outline

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 14: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Providing many GPUs to applications with rCUDA

Loweris better

MonteCarlo multi-GPU program running in 14 NVIDIA Tesla K20 GPUs

K20 GPUs and FDR InfiniBand

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 15: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Providing many GPUs to applications with rCUDA

64 GPUs !!

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 16: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Server consolidation with rCUDA

1

3

7

9

12

13

14

GPU utilization (%)

20 40 60 80 1000

off

off

off

off

off

off

off

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 17: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Server consolidation with rCUDA

• The GPU-Blast application is migrated up to 5 times among K40 GPUs

• The aggregated volume of GPU data is 1300 MB (consisting of 9 memory regions)

The “Reference” line is the execution time of the application when using CUDA with a local GPU and without any migration

Loweris better

Page 18: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Heterogeneous2 environments

Outline

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 19: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

rCUDA availabilityrCUDA is available for the x86,

POWER and ARM processors

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 20: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Performance of rCUDA

on POWER systems

Outline

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 21: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

From x86 to POWER with rCUDA

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 22: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

CUDA

rCUDArCUDA client rCUDA server

x86 @ 2.1 GHz

DDR3 1600 MHz

x86 @ 3.5 GHz

DDR4 2400 MHz

network fabric is

EDR InfiniBand

#1

#2

#3

Several testbeds used

Page 23: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

rCUDA

CUDA

Performance of data movements to/from GPU

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 24: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

H2DD2H

Higher

is better

Higher

is better

Performance of data movements to/from GPU

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 25: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

P2P

Performance of data movements among GPUs

Higher

is better

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 26: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Performance of data movements among GPUs

rCUDA scenario 1

rCUDA scenario 2

rCUDA

CUDA

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 27: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Performance of data movements among GPUs

Higher is better

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 28: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Several applications have been analyzed in this study:

1. BarraCUDA

2. CUDA-MEME

3. CUDASW++

4. GPU-Blast

5. Gromacs

6. GPU-LIBVSM

7. Magma

8. NAMD

Application performance

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 29: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Unfortunately, we could not run all the applications in the Minsky system:

1. BarraCUDA: this application includes intrinsics headers

2. CUDA-MEME: successfully compiled and executed

3. CUDASW++: this application includes intrinsics headers

4. GPU-Blast: we were not able to compile it

5. Gromacs: successfully compiled and executed

6. GPU-LIBVSM: successfully compiled and executed

7. Magma: successfully compiled and executed

8. NAMD: we were not able to compile it

Application performance

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 30: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Application performance

Lower

is better

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 31: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Throughput instead of

performance

Outline

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 32: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

...

...

One rCUDA box serves multiple clients

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 33: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

1. BarraCUDA

2. CUDA-MEME

3. CUDASW++

4. GPU-Blast

5. Gromacs

6. Magma

Lower

is better

One rCUDA box serves multiple clients

F. Silla @ HPC Advisory Council Swiss Conference 2018

- 58%

Page 34: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

One rCUDA box serves multiple clients

F. Silla @ HPC Advisory Council Swiss Conference 2018

GPU assigned but not used

GPU assigned but not used

Page 35: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Performance of rCUDA

on ARM systems

Outline

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 36: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

From ARM to x86 with rCUDA

ThunderX

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 37: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Work in progress. A couple of applications have been already analyzed:

1. Cloverleaf: a mini-app that solves the compressible Euler equations

on a Cartesian grid

2. Flow: a mini-app that implements a 2D hydrodynamics simulator

Application performance

F. Silla @ HPC Advisory Council Swiss Conference 2018

Page 38: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Application performance: Cloverleaf

Lower

is better

F. Silla @ HPC Advisory Council Swiss Conference 2018

Single node

executions

Estimation over

multiple nodes

Page 39: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Application performance: Cloverleaf

Lower

is better

F. Silla @ HPC Advisory Council Swiss Conference 2018

Single node

executions

Estimation over

multiple nodes

ThunderX TDP = 80 watts

P100 TDP = 250 watts

Xeon TDP = 140 watts

40*80 versus 80+3*250+2*140

3200 watts versus 1110 watts

Page 40: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Application performance: Flow

Lower

is better

F. Silla @ HPC Advisory Council Swiss Conference 2018

Single node

executions

Estimation over

multiple nodes

Page 41: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Application performance: Flow

Lower

is better

F. Silla @ HPC Advisory Council Swiss Conference 2018

Single node

executions

Estimation over

multiple nodes

ThunderX TDP = 80 watts

P100 TDP = 250 watts

Xeon TDP = 140 watts

60*80 versus 80+3*250+2*140

4800 watts versus 1110 watts

Page 42: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

Get a free copy of rCUDA at

http://www.rcuda.net

@rcuda_

More than 900 requests world wide

rCUDA is a development by Universitat Politècnica de València, Spain

Page 43: Heterogeneous Computing with rCUDA - HPC …...Basics of GPU computing Remark: GPUs can only be used within the node they are attached to Basic behavior of CUDA GPU F. Silla @ HPC

·Tony Díaz · Pablo Higueras · Javier Prades · Jaime Sierra

· Cristian Peñaranda · Federico Silla · Carlos Reaño

rCUDA is a development by Universitat Politècnica de València, Spain