Attention Developers:

The Top Six Advantages of CUDA-Ready Clusters and Clouds

Ian Lumb

Bright Evangelist

CUDA-Ready Clusters and Clouds

1. You focus on coding, not infrastructure

• You view infrastructure as your API

2. entire CUDA environment

3. You cross-develop with confidence and ease

• You make use of different versions of the CUDA Toolkit

4. You choose CUDA or OpenCL or OpenACC

5. or Big Data

• You access Hadoop services alongside HPC

6. You make use of public and private clouds

• You extend into AWS and deploy OpenStack

CUDA-ready clusters and clouds are GPU developer-ready

Cluster Health Management

Provide problem-free environment for running jobs

Four elements

1. Cluster management automation

2. Regular health checks

3. Pre-job health checks

4. Hardware stability & performance tests

All elements above are configurable and extensible

2.environment

Bright Cluster Manager CUDA Environment

Cluster Management Shell, Cluster Management GUI, User Portal

SSL / SOAP / X509 / IPtables

Cluster Management Daemon

Slurm, PBS Pro, Torque/Maui, Torque/MOAB, Grid Engine, LSF

Monitoring, Automation, Health Checks, Management

Compilers, Libraries, Debuggers, Profilers

Provisioning

SLES / RHEL / CentOS / SL

Innovation characterizes the entire history and evolution of GPU programmability through CUDA

• People

Proactively maintaining business and technical relationships

• Process

`Hands-

– Preliminary to fully productized implementations

• Product

Bright Cluster Manager released once per year

– Updates flow continuously

Available Versions of the CUDA Toolkit

Using CUDA 6.0
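When several toolkit versions are installed side by side, it helps to confirm which one is active before building. The sketch below is one hedged way to do that: it looks for `nvcc` on the `PATH` and reads the release number from `nvcc --version`. The helper names `parse_release` and `active_cuda_release` are hypothetical, and the sample line is illustrative rather than captured from a real cluster.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_release(version_text: str) -> Optional[str]:
    """Extract the 'release X.Y' number from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", version_text)
    return match.group(1) if match else None


def active_cuda_release() -> Optional[str]:
    """Return the release of the nvcc found on PATH, or None if there is none."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    try:
        out = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_release(out.stdout)


# Illustrative sample line in the format nvcc prints (an assumption, not real output).
sample = "Cuda compilation tools, release 6.0, V6.0.1"
print(parse_release(sample))    # 6.0
print(active_cuda_release())    # depends on the machine; None when nvcc is absent
```

Running this after switching toolkits (for example, after loading a different environment module) shows whether the change took effect in the current shell.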

Programming GPUs

OpenCL

OpenACC

Tools

• cuda-gdb

• nvidia-smi

• CUDA Utility Library

• Examples

• 3rd Party

Allinea

Rogue Wave

Case Study: TUAT (1)

The Customer

• Engages in materials-science research

Compares computational models with physical experiments

• High-resolution, 3D phase field modeling at large scales using GPUs

The Challenge

• Make available the latest innovations in GPU technology without distracting focus from research

The Solution

• Laboratory GPU cluster designed and implemented by HPCTech Corp.

• Bright Cluster Manager deployed by HPCTech

Use Bright to fully manage the entire CUDA environment including regular updates

Use modules environment via Bright to manage multiple CUDA environments

• Prototype simulations using laboratory HPC cluster

Includes debugging and tuning code

• Execute large-scale simulations using TSUBAME

Large-Scale Grain Growth Simulation

• Number of computational grids: 1024 x 1024 x 1024

• 3 hours with 128 GPUs

Simulation conditions

# of grains: 32768

Size of domain: 512 mm^3

Time: 8182 s (16000 steps)

[Figure: grain growth simulation result; color scale labeled "Grain Number"]
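To put the run above in perspective, here is a back-of-the-envelope sketch. It rests on two assumptions that the slide does not state: that the 3-hour wall-clock time covers all 16000 steps, and that the grid is divided evenly across the 128 GPUs.

```python
# Rough figures for the grain growth run described above.
# Assumptions (not stated on the slide): the 3-hour wall-clock time covers
# all 16000 time steps, and the grid is split evenly across the 128 GPUs.

GRID_POINTS = 1024 ** 3      # 1024 x 1024 x 1024 computational grid
NUM_GPUS = 128
WALL_SECONDS = 3 * 3600      # 3 hours
NUM_STEPS = 16_000

points_per_gpu = GRID_POINTS // NUM_GPUS
seconds_per_step = WALL_SECONDS / NUM_STEPS
updates_per_second = GRID_POINTS * NUM_STEPS / WALL_SECONDS

print(f"grid points per GPU : {points_per_gpu:,}")          # 8,388,608
print(f"wall time per step  : {seconds_per_step:.3f} s")    # 0.675 s
print(f"grid-point updates/s: {updates_per_second:.2e}")    # aggregate, all GPUs
```

Under those assumptions, each GPU holds about 8.4 million grid points and the cluster advances one step in roughly two thirds of a second.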

Yamanaka Lab: http://www.tuat.ac.jp/~yamanaka/

“We scientists are time-constrained,” said Dr. Yamanaka. “Our priority is our research, not managing our clusters. Bright is intuitive to use, and with it I can effectively manage my cluster without wasting time writing scripts, or synchronizing management tool revisions. Provisioning is fast and easy too. I prefer this approach over open source toolkits.”

Booth # 34

Additional Slides

NVIDIA GPU Boost via Bright Cluster Manager

Cluster Health Management

Goal: provide problem-free environment for running jobs

Four elements

1. Cluster management automation

2. Regular health checks

• Actions that return PASS, FAIL or UNKNOWN

• Can be associated with a settable severity and a message

• Can launch an action based on any response value

3. Pre-job health checks

• Let the workload manager hold the job very briefly

• Check the health of each reserved node

• If unhealthy, take the node offline, inform the system administrator

• Let the workload manager reschedule the job to a different set of nodes

4. Hardware stability & performance tests

• Very wide range of tests

• May include disk overwrites and reboot(s)

All elements above are configurable and extensible
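The health-check behavior listed above can be sketched in a few lines. This is an illustration of the idea only, not Bright Cluster Manager's API: the names `Result`, `HealthCheck`, and `prejob_check`, the numeric severity scale, and the rule that only FAIL marks a node unhealthy are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Result(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    UNKNOWN = "UNKNOWN"


@dataclass
class HealthCheck:
    """Hypothetical health check: a probe plus severity, message, and actions."""
    name: str
    probe: Callable[[str], Result]   # takes a node name, returns a Result
    severity: int = 10               # settable severity (scale is an assumption)
    message: str = ""
    # Optional action to launch for any response value.
    actions: Dict[Result, Callable[[str], None]] = field(default_factory=dict)

    def run(self, node: str) -> Result:
        try:
            result = self.probe(node)
        except Exception:
            result = Result.UNKNOWN  # a probe that cannot answer is UNKNOWN
        action = self.actions.get(result)
        if action is not None:
            action(node)
        return result


def prejob_check(nodes, checks, take_offline, notify) -> List[str]:
    """Check each reserved node; return the nodes that are fit to run the job.

    Nodes with a FAIL result are taken offline and reported, so the workload
    manager can reschedule the job on a different set of nodes.
    """
    healthy = []
    for node in nodes:
        failed = [c for c in checks if c.run(node) is Result.FAIL]
        if failed:
            take_offline(node)
            worst = max(failed, key=lambda c: c.severity)
            notify(f"{node} offline: {worst.message} (severity {worst.severity})")
        else:
            healthy.append(node)
    return healthy


# Usage with a stand-in probe that fails on one node.
def gpu_probe(node: str) -> Result:
    return Result.FAIL if node == "node002" else Result.PASS


offline: List[str] = []
alerts: List[str] = []
check = HealthCheck("gpu_visible", gpu_probe, severity=20, message="GPU not visible")
usable = prejob_check(["node001", "node002", "node003"], [check],
                      offline.append, alerts.append)
print(usable)   # ['node001', 'node003']
print(offline)  # ['node002']
```

In a real deployment the probe would query the node and the callbacks would talk to the workload manager; here they are plain Python callables so the flow is easy to follow.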

Bright API

5.Data

• You access Hadoop services alongside HPC

HPC and Hadoop

Use GPUs for HPC and Big Data Analytics

Introduce GPUs into Hadoop clusters

Make use of Hadoop services

6. You make use of public and private clouds

• Amazon Web Services and OpenStack

GPUs in the Cloud? The Top Four Reasons

1. You can realize possibilities using the cloud

• You can scale up and scale out

2. You still realize the promise of GPU programmability

3. Your use of the cloud is transparent

• Constraints apply for MPI apps

4. Your go-to apps still work in the cloud

Scenario I

[Diagram: head node, node001, node002, node003]

Cloud Utilization

Scenario II

[Diagram: head node, node001 through node007]

Cloud Utilization

Case Study: Oil and Gas Exploration (1)

The Customer

• Acquires and processes significant volumes of seismic data for multinational clients

• Refactoring existing algorithms to make use of GPUs

Want to take advantage of the latest innovations

– Decrease time to results through increased performance

The Challenge

• Introduce GPU-based enhancements without disrupting

Case Study: Oil and Gas Exploration (2)

The Solution

• Wholeheartedly adopting GPU technology

Latest GPUs in a variety of hardware configurations

– Including ultradense GPU units

Embracing latest innovations in the CUDA toolkit

• Deployed Bright Cluster Manager

Use Bright to fully manage the entire CUDA environment

– From NVIDIA Tesla K40 GPU accelerators to the CUDA toolkit

Use modules environment via Bright to manage multiple CUDA environments for R&D and production processing

change – Includes in-house seismic processing applications (e.g., RTM)

The Results

• Realizing > 10X performance gains in certain cases

• GPU technology transforming data-processing business
