
MATLAB Parallel Computing with GPUs

November 2011

Jörg Krall

Sr. Business Development Manager

Professional Solutions Group

MATLAB


NVIDIA FACTS:

Founded in 1993

Fastest semiconductor company to reach $1 billion in revenue

FY11: $3.5 billion in revenue

6,800 employees in 20 countries

1,900 patents

Headquartered in Santa Clara, Calif.


NVIDIA Brands

GeForce, Quadro, Tegra, Tesla

Add GPUs - Accelerate Computing

[Figure: speedup from adding a GPU to a CPU]

The Performance gap widens further


CPU Pizza Delivery

Process: Delivery truck delivers one pizza and then moves to next house

Original idea by Jedox (http://www.jedox.com/)

NVIDIA GPU Pizza Delivery

Process: Many deliveries to many houses

Original idea by Jedox (http://www.jedox.com/)

CUDA Developer Community Growth

[Chart: NVIDIA GPGPU papers and articles per year, 2005 to 2009; vertical axis from 0 to 3000]

CUDA Capable GPUs: 350,000,000
CUDA Toolkit Downloads: 1,000,000
Active CUDA Developers: 150,000
Universities Teaching CUDA: 470
OEMs offering CUDA GPU PCs: 100%

Tesla GPUs Power Top Supercomputers

Tianhe-1A: 7168 Tesla GPUs, 2.5 PFLOPS
Nebulae: 4650 Tesla GPUs, 1.2 PFLOPS
Tsubame 2.0: 4224 Tesla GPUs, 1.194 PFLOPS

“We not only created the world's fastest computer, but also implemented a heterogeneous computing architecture incorporating CPU and GPU, this is a new innovation.”
Premier Wen Jiabao, public comments acknowledging Tianhe-1A

NCSA Mixes GPUs into Blue Waters

“NCSA is excited about the inclusion of NVIDIA's Tesla GPUs in Blue Waters. GPUs provide extraordinary capabilities for numerically-intensive computations and a cost-effective, energy-efficient way to build tomorrow's petascale supercomputers.”
Thom Dunning, Director, NCSA

Titan at Oak Ridge

World’s Top Open Science Computing Research Facility

2x Faster, 3x More Energy Efficient than Current #1 (K Computer)

18,000 Tesla GPUs

20+ PetaFlops

~90% of flops from GPUs


Options for Targeting GPUs with MATLAB

Built-in MATLAB functions
User-defined MATLAB functions
User-defined CUDA kernels

(Listed from greatest ease of use at the top to greater control at the bottom.)

Example: Solving 2D Wave Equation
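
The slide names the problem but not the numerical method. The following is only a minimal sketch, assuming an explicit finite-difference (leapfrog) scheme on a square grid with the boundary held at zero. The names `step_wave` and `simulate` are hypothetical, and pure Python is used for illustration rather than MATLAB. It shows why this kind of problem may suit a GPU: every interior grid point is updated from its four neighbours, independently of all other points.

```python
import math


def step_wave(u_prev, u_curr, courant_sq):
    """One leapfrog step of u_tt = c^2 (u_xx + u_yy) on an n x n grid.

    courant_sq is (c * dt / dx) ** 2. Boundary values are held at zero.
    """
    n = len(u_curr)
    u_next = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # Five-point Laplacian. Each (i, j) update reads only the two
            # previous time levels, so all points can be updated in parallel.
            lap = (u_curr[i + 1][j] + u_curr[i - 1][j]
                   + u_curr[i][j + 1] + u_curr[i][j - 1]
                   - 4.0 * u_curr[i][j])
            u_next[i][j] = 2.0 * u_curr[i][j] - u_prev[i][j] + courant_sq * lap
    return u_next


def simulate(n=16, steps=20, courant=0.5):
    """Run `steps` leapfrog steps starting from a sine-shaped bump at rest.

    courant = c * dt / dx is an assumed value; it must stay at or below
    1 / sqrt(2) for this explicit 2D scheme to remain stable.
    """
    dx = 1.0 / (n - 1)
    u_curr = [[math.sin(math.pi * i * dx) * math.sin(math.pi * j * dx)
               for j in range(n)] for i in range(n)]
    u_prev = [row[:] for row in u_curr]  # zero initial velocity (first order)
    for _ in range(steps):
        u_prev, u_curr = u_curr, step_wave(u_prev, u_curr, courant ** 2)
    return u_curr


if __name__ == "__main__":
    grid = simulate()
    print("centre value after 20 steps:", grid[8][8])
```

The work per time step grows with the number of grid points, so larger grids offer more independent updates per step.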


Benchmark: Solving 2D Wave Equation, CPU vs. GPU

Intel Xeon Processor X5650, NVIDIA Tesla C2050 GPU

Grid Size | CPU (s) | GPU (s) | Speedup
64 x 64 | 0.1004 | 0.3553 | 0.28
128 x 128 | 0.1931 | 0.3368 | 0.57
256 x 256 | 0.5888 | 0.4217 | 1.4
512 x 512 | 2.8163 | 0.8243 | 3.4
1024 x 1024 | 13.4797 | 2.4979 | 5.4
2048 x 2048 | 74.9904 | 9.9567 | 7.5

* Note: data displayed on log scale
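
The Speedup column appears to be the ratio of CPU time to GPU time. A minimal Python sketch, with illustrative names that do not come from the slides, recomputes that ratio from the timings in the benchmark and finds the smallest listed grid at which the GPU overtakes the CPU:

```python
# (grid edge length, CPU seconds, GPU seconds) copied from the benchmark.
TIMINGS = [
    (64, 0.1004, 0.3553),
    (128, 0.1931, 0.3368),
    (256, 0.5888, 0.4217),
    (512, 2.8163, 0.8243),
    (1024, 13.4797, 2.4979),
    (2048, 74.9904, 9.9567),
]


def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU time to GPU time; values above 1 favour the GPU."""
    return cpu_seconds / gpu_seconds


def break_even_grid(timings):
    """Smallest listed grid size whose speedup exceeds 1, or None."""
    for n, cpu_s, gpu_s in timings:
        if speedup(cpu_s, gpu_s) > 1.0:
            return n
    return None


if __name__ == "__main__":
    for n, cpu_s, gpu_s in TIMINGS:
        print(f"{n} x {n}: speedup {speedup(cpu_s, gpu_s):.2f}")
    print("GPU first wins at grid size", break_even_grid(TIMINGS))
```

On these figures the GPU is slower than the CPU below 256 x 256, which would be consistent with fixed overheads dominating on small problems, although the slides do not state the cause.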


MATLAB GPU Computing Examples

• 4x speedup in adaptive filtering routine (part of acoustic tracking algorithm)
• 4x speedup in wave equation solving (part of seismic data processing algorithm)
• 3x speedup in estimating 7.6 million contract prices using Black-Scholes model
• 14x speedup in template matching routine (part of cancer cell image analysis)
• 10x speedup in data clustering via K-means clustering algorithm
• 17x speedup in simulating the movement of 3072 celestial objects

GPUs for MathWorks Parallel Computing Toolbox™ and Distributed Computing Server™

Workstation: MATLAB Parallel Computing Toolbox (PCT)

• PCT enables high performance through parallel computing on workstations
• NVIDIA GPU acceleration available now

Compute Cluster: MATLAB Distributed Computing Server (MDCS)

• MDCS allows a MATLAB PCT application to be submitted and run on a compute cluster
• NVIDIA GPU acceleration available now

Resources

MATLAB Digest Article: GPU Programming in MATLAB
http://www.mathworks.com/company/newsletters/articles/gpu-programming-in-matlab.html

GPU Computing with MATLAB webinar
http://www.mathworks.com/company/events/webinars/wbnr59816.html

MATLAB benchmarking examples on the GPU
http://www.mathworks.com/matlabcentral/fileexchange/?term=gpu
http://www.mathworks.com/products/parallel-computing/demos.html?file=/products/demos/shipping/distcomp/paralleldemo_gpu_backslash.html
http://www.mathworks.com/products/distriben/demos.html?file=/products/demos/distribtb/MapDemo/MapDemo.html

GPUs for Parallel Computing Toolbox
http://www.mathworks.com/gpu
http://www.nvidia.com/object/tesla-matlab-accelerations.html
http://www.mathworks.com/products/parallel-computing/
http://www.mathworks.com/products/datasheets/pdf/parallel-computing-toolbox.pdf

http://www.mathworks.com/products/parallel-computing/description5.html
http://developer.download.nvidia.com/compute/cuda/docs/GTC_2010_Archives.htm

Product Trial
http://www.mathworks.com/programs/trials/trial_request.html?s_cid=SA_prod_distcomp_parallel_computing_ipspot_trial&prodcode=DM,ML&eventid=673640837

Special MATLAB-user pricing of GPU-enabled HP workstations

GPU-enabled workstations for MATLAB users
http://www.tsa.com/promotions/promotional_details.php?id=54

Tesla Data Center & Workstation GPU Solutions

Servers & Blades: Tesla M-series GPUs (M2090 | M2070 | M2050)
Workstations: Tesla C-series GPUs (C2070 | C2050)

Model | M2090 | M2070 | M2050 | C2070 | C2050
Cores | 512 | 448 | 448 | 448 | 448
Memory | 6 GB | 6 GB | 3 GB | 6 GB | 3 GB
Memory bandwidth (ECC off) | 177.6 GB/s | 150 GB/s | 148.8 GB/s | 148.8 GB/s | 148.8 GB/s
Peak performance, single precision (Gflops) | 1331 | 1030 | 1030 | 1030 | 1030
Peak performance, double precision (Gflops) | 665 | 515 | 515 | 515 | 515

Tesla Data Center & Workstation GPU Solutions

Workstations, 2 to 4 Tesla GPUs: Tesla C-series GPUs C2075, C2070, C2050
Integrated CPU-GPU Servers & Blades: Tesla M-series GPUs M2090, M2070, M2050

OEM Servers with Tesla M20xx GPUs

2 Tesla GPUs: SuperServer, 2 CPUs + 2 GPUs in 1U
4 Tesla GPUs: Tetra, 2 CPUs + 4 GPUs in 1U
8 Tesla GPUs: B7015, 2 CPUs + 8 GPUs in 4U
10 Tesla GPUs: GreenBlade, 10 CPUs + 10 GPUs in 5U


Key Systems from Global OEMs

System Name (or “codename”) | # of GPUs | # of CPU Sockets | Ratio (GPU:CPU) | Minimum rack config | Effective GPUs/1RU | Ship Date
Dell C410x | 16 | None (pairs with external host) | 2:1 now, 8:1 w/ new hosts | 5U (3U for C410x + 2U for hosts) | 3.2 | Now
Dell “orca” | 2 | 2 | 1:1 | 2U | 1 | Nov 2011
Dell “Ghostrider” | 4 | 2 | 2:1 | 4U | 1 | Nov 2011
HP SL 390 “2U tray” | 3 | 2 | 3:2 | 4U | 3 | Now
HP SL 390 “4U tray” | 8 | 2 | 4:1 | 4U | 4 | Feb 2011
IBM idataplex (now) | 2 | 2 | 1:1 | 2U | 1.3 (non-standard rack depth) | Now
IBM idataplex w/ SXM (redesign) | 4 | 2 | 2:1 | 2U | 2.6 (non-standard rack depth) | Q4 2011
IBM Blade | 1-4 | 2 | Variable: 2:1 max | 9U | 1.2 | Now

Key Systems from Specialist OEMs

System Name (or “codename”) | # of GPUs | # of CPU Sockets | Ratio (GPU:CPU) | Minimum rack config | Effective GPUs/1RU | Ship Date
Appro Tetra | 4 | 2 | 2:1 | 1U | 4 | Now
Appro Hydra | 8 | 2 | 4:1 | 2U | 4 | Feb 2011
Bull Blade | 2 | 2 | 2:1 | 7U | 2.6 | Now
Cray XE6 GPU blade | 2 | 2 | 1:1 | Full rack | 2.2 (custom cabinet) | Q2 2011
NextIO vCORE 2070 | 4 | N/A | Same as S2050 | 2U (1U for host) | 2 | Now
Supermicro 1U | 2 | 2 | 1:1 | 1U | 2 | Now
Supermicro GPU TwinBlade | 2 | 2 | 1:1 | 7U | 2.8 | Now
Tplatforms | 2 | 2 | 1:1 | 2U | 4.6 | Now
Tyan 2U | 3 | 2 | 3:2 | 2U | 1.5 | Now
Tyan 4U | 8 | 2 | 4:1 | 4U | 2 | Now

Thanks!
