Post on 20-May-2020

Attention Developers:

The Top Six Advantages of CUDA-Ready Clusters and Clouds

Ian Lumb

Bright Evangelist

CUDA-Ready Clusters and Clouds

1. You focus on coding, not infrastructure
   • You view infrastructure as your API

2. [...] entire CUDA environment

3. You cross-develop with confidence and ease
   • You make use of different versions of the CUDA Toolkit

4. You choose CUDA or OpenCL or OpenACC

5. [...] or Big Data
   • You access Hadoop services alongside HPC

6. You make use of public and private clouds
   • You extend into AWS and deploy OpenStack

CUDA-ready clusters and clouds are GPU developer-ready

Cluster Health Management

Provide a problem-free environment for running jobs

Four elements

1. Cluster management automation

2. Regular health checks

3. Pre-job health checks

4. Hardware stability & performance tests

All elements above are configurable and extensible

Bright Cluster Manager CUDA Environment

[Architecture diagram; labels recovered from the slide]

• Interfaces: Cluster Management Shell, Cluster Management GUI, User Portal

• SSL / SOAP / X509 / IPtables

• Cluster Management Daemon

• Hardware: Disk, Ethernet, Interconnect, IPMI / iLO, PDU, CPU, GPUs, Memory

• Workload managers: Slurm, PBS Pro, Torque/Maui, Torque/MOAB, Grid Engine, LSF

• Monitoring, Automation, Health Checks, Management, Provisioning

• Compilers, Libraries, Debuggers, Profilers

• Operating systems: SLES / RHEL / CentOS / SL

Innovation characterizes the entire history and evolution of GPU programmability through CUDA

• People
  – Proactively maintaining business and technical relationships

• Process
  – Hands-[...]
  – Preliminary to fully productized implementations

• Product
  – Bright Cluster Manager released once per year
  – Updates flow continuously

Available Versions of the CUDA Toolkit

Using CUDA 6.0
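
When several toolkit versions are installed side by side, it helps to confirm which one is active before building. The sketch below is not taken from the slides. It assumes that `nvcc` is on the PATH once a toolkit has been selected, and that its `--version` output contains a `release X.Y` string. The helper names and the sample text are illustrative.

```python
import re
import subprocess
from typing import Optional


def parse_nvcc_release(version_text: str) -> Optional[str]:
    """Return the toolkit release (such as "6.0") found in `nvcc --version` text."""
    match = re.search(r"release\s+(\d+\.\d+)", version_text)
    return match.group(1) if match else None


def active_cuda_release() -> Optional[str]:
    """Ask the nvcc currently on PATH which toolkit release it belongs to."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # no usable toolkit in this environment
    return parse_nvcc_release(result.stdout)


# Sample text in the shape nvcc prints; the build number is illustrative.
SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 6.0, V6.0.1
"""

if __name__ == "__main__":
    print("sample release:", parse_nvcc_release(SAMPLE))
    print("active release:", active_cuda_release())
```

A build script could compare the returned release with the version a project expects and stop early on a mismatch.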

Programming GPUs

CUDA

OpenCL

OpenACC

MPI

Tools

• CUDA-GDB

• nvidia-smi

• CUDA Utility Library

• Examples

• 3rd party: Allinea, Rogue Wave
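
As an illustration of how `nvidia-smi` can feed a script, the following sketch lists the GPUs on a node. It is not from the slides. It assumes an `nvidia-smi` that supports the `--query-gpu` and `--format=csv,noheader,nounits` options. The function names and the sample rows are hypothetical.

```python
import csv
import io
import subprocess
from typing import Dict, List

# Fields requested from nvidia-smi, in the order they appear in each CSV row.
QUERY_FIELDS = ["index", "name", "memory.total"]


def parse_gpu_csv(text: str) -> List[Dict[str, str]]:
    """Turn CSV rows from nvidia-smi into one dict per GPU."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [
        dict(zip(QUERY_FIELDS, (cell.strip() for cell in row)))
        for row in rows
        if row  # skip blank lines
    ]


def query_gpus() -> List[Dict[str, str]]:
    """Run nvidia-smi on this node; return an empty list if it is unavailable."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(out)


# Illustrative rows only; real values depend on the installed hardware.
SAMPLE = "0, Tesla K40m, 11519\n1, Tesla K40m, 11519\n"

if __name__ == "__main__":
    for gpu in parse_gpu_csv(SAMPLE):
        print(gpu["index"], gpu["name"], gpu["memory.total"], "MiB")
```

Keeping the parsing separate from the subprocess call means the logic can be exercised on a machine without a GPU.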

Case Study: TUAT (1)

The Customer

• Engages in materials-science research
  – Compares computational models with physical experiments

• High-resolution, 3D phase-field modeling at large scales using GPUs

The Challenge

• Make available the latest innovations in GPU technology without distracting focus from research

Case Study: TUAT (2)

The Solution

• Laboratory GPU cluster designed and implemented by HPCTech Corp.

• Bright Cluster Manager deployed by HPCTech
  – Use Bright to fully manage the entire CUDA environment, including regular updates
  – Use the modules environment via Bright to manage multiple CUDA environments

• Prototype simulations using the laboratory HPC cluster
  – Includes debugging and tuning code

• Execute large-scale simulations using TSUBAME

Large-Scale Grain Growth Simulation

• Number of computational grids: 1024 x 1024 x 1024

• 3 hours with 128 GPUs

Simulation conditions

• Number of grains: 32768

• Size of domain: 512 mm³

• Time: 8182 s (16000 steps)

[Figure: grain growth simulation, color scale labeled "Grain Number" from 1 to 32768]

Yamanaka Lab: http://www.tuat.ac.jp/~yamanaka/

Case Study: TUAT (3)

“We scientists are time-constrained,” said Dr. Yamanaka. “Our priority is our research, not managing our clusters. Bright is intuitive to use, and with it I can effectively manage my cluster without wasting time writing scripts, or synchronizing management tool revisions. Provisioning is fast and easy too. I prefer this approach over open source toolkits.”

Booth # 34

Additional Slides

NVIDIA GPU Boost via Bright Cluster Manager

Cluster Health Management

Goal: provide a problem-free environment for running jobs

Four elements

1. Cluster management automation

2. Regular health checks
   • Actions that return PASS, FAIL or UNKNOWN
   • Can be associated with a settable severity and a message
   • Can launch an action based on any response value

3. Pre-job health checks
   • Let the workload manager hold the job very briefly
   • Check the health of each reserved node
   • If unhealthy, take the node offline and inform the system administrator
   • Let the workload manager reschedule the job to a different set of nodes

4. Hardware stability & performance tests
   • Very wide range of tests
   • May include disk overwrites and reboot(s)

All elements above are configurable and extensible
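
The health-check and pre-job flow described above can be sketched in a few lines. This is a minimal illustration, not the Bright Cluster Manager API. Every class, function and field name here is hypothetical, the severity scale is invented, and treating any result other than PASS as unhealthy is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Result(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    UNKNOWN = "UNKNOWN"


@dataclass
class HealthCheck:
    name: str
    probe: Callable[[str], Result]  # takes a node name, returns a Result
    severity: int = 10              # settable; the scale is hypothetical
    message: str = ""
    # Optional action per response value, e.g. notify an administrator on FAIL.
    actions: Dict[Result, Callable[[str], None]] = field(default_factory=dict)


def run_check(check: HealthCheck, node: str) -> Result:
    """Run one check on one node and fire the action bound to its result."""
    try:
        result = check.probe(node)
    except Exception:
        result = Result.UNKNOWN  # a probe that crashes tells us nothing
    action = check.actions.get(result)
    if action is not None:
        action(node)
    return result


def pre_job_check(
    reserved: List[str], checks: List[HealthCheck]
) -> Tuple[List[str], List[str]]:
    """Split reserved nodes into (healthy, to_take_offline).

    Assumption: anything other than PASS counts as unhealthy.
    """
    healthy, offline = [], []
    for node in reserved:
        if all(run_check(c, node) is Result.PASS for c in checks):
            healthy.append(node)
        else:
            offline.append(node)
    return healthy, offline


# Hypothetical probe: pretend node002 has lost its GPU.
def gpu_visible(node: str) -> Result:
    return Result.FAIL if node == "node002" else Result.PASS


notified: List[str] = []  # stands in for informing the system administrator
checks = [
    HealthCheck(
        name="gpu_visible",
        probe=gpu_visible,
        severity=20,
        message="GPU not visible on node",
        actions={Result.FAIL: notified.append},
    )
]

healthy, offline = pre_job_check(["node001", "node002", "node003"], checks)
print("healthy:", healthy)
print("offline:", offline)
print("notified about:", notified)
```

In a real deployment the workload manager would hold the job while this runs, then reschedule it onto healthy nodes only. Here the two returned lists stand in for that decision.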

Bright API

HPC and Hadoop

Use GPUs for HPC and Big Data Analytics

Introduce GPUs into Hadoop clusters

Make use of Hadoop services

GPUs in the Cloud? The Top Four Reasons

1. You can realize possibilities using the cloud
   • You can scale up and scale out

2. You still realize the promise of GPU programmability

3. Your use of the cloud is transparent
   • Constraints apply for MPI apps

4. Your go-to apps still work in the cloud

Cloud Utilization: Scenario I

[Diagram: head node with node001, node002 and node003]

Cloud Utilization: Scenario II

[Diagram: head node with node001, node002 and node003, plus node004, node005, node006 and node007]

Case Study: Oil and Gas Exploration (1)

The Customer

• Acquires and processes significant volumes of seismic data for multinational clients

• Refactoring existing algorithms to make use of GPUs
  – Want to take advantage of the latest innovations
  – Decrease time to results through increased performance

The Challenge

• Introduce GPU-based enhancements without disrupting [...]

Case Study: Oil and Gas Exploration (2)

The Solution

• Wholeheartedly adopting GPU technology
  – Latest GPUs in a variety of hardware configurations, including ultradense GPU units
  – Embracing latest innovations in the CUDA toolkit

• Deployed Bright Cluster Manager
  – Use Bright to fully manage the entire CUDA environment, from NVIDIA Tesla K40 GPU accelerators to the CUDA toolkit
  – Use the modules environment via Bright to manage multiple CUDA environments for R&D and production processing
  – [...] change
  – Includes in-house seismic processing applications (e.g., RTM)

The Results

• Realizing > 10X performance gains in certain cases

• GPU technology transforming data-processing business
