cuda programming - cilvr at nyu · 2013. 2. 27. · • mkl 10.3.6 on intel sandybridge e5-2687w @...

12
CUDA Programming By Matthew Zeiler Big Data, Large Scale Machine Learning John Langford and Yann LeCun New York University

Upload: others

Post on 12-Mar-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CUDA Programming - CILVR at NYU · 2013. 2. 27. · • MKL 10.3.6 on Intel SandyBridge E5-2687W @ 3.10GHz 6 . cuRAND Performance Double Precision RNGs 16 14 12 8 10 6 MRG32k3a CPU

CUDA

ProgrammingBy Matthew Zeiler

Big Data, Large Scale Machine Learning

John Langford and Yann LeCun

New York University

Page 2: CUDA Programming - CILVR at NYU · 2013. 2. 27. · • MKL 10.3.6 on Intel SandyBridge E5-2687W @ 3.10GHz 6 . cuRAND Performance Double Precision RNGs 16 14 12 8 10 6 MRG32k3a CPU

GPU Programming

• OpenCL

• Works on CPU and GPU

• Supported by AMD, Intel, Nvidia, and others

• CUDA

• Works on Nvidia GPUs only

• Wide developer adoption

Page 3: CUDA Programming - CILVR at NYU · 2013. 2. 27. · • MKL 10.3.6 on Intel SandyBridge E5-2687W @ 3.10GHz 6 . cuRAND Performance Double Precision RNGs 16 14 12 8 10 6 MRG32k3a CPU

Why CUDA

• Pros:

• Massively parallel architecture

• Immense speedups over CPUs

• Cons:

• Different programming style

• Very difficult to get optimal performance

Page 4: CUDA Programming - CILVR at NYU · 2013. 2. 27. · • MKL 10.3.6 on Intel SandyBridge E5-2687W @ 3.10GHz 6 . cuRAND Performance Double Precision RNGs 16 14 12 8 10 6 MRG32k3a CPU

Latest Architecture

Page 5: CUDA Programming - CILVR at NYU · 2013. 2. 27. · • MKL 10.3.6 on Intel SandyBridge E5-2687W @ 3.10GHz 6 . cuRAND Performance Double Precision RNGs 16 14 12 8 10 6 MRG32k3a CPU

SM(X)

Page 6: CUDA Programming - CILVR at NYU · 2013. 2. 27. · • MKL 10.3.6 on Intel SandyBridge E5-2687W @ 3.10GHz 6 . cuRAND Performance Double Precision RNGs 16 14 12 8 10 6 MRG32k3a CPU

DEMO

• CIFAR-10 Image Classification

• 10 Classes:• airplane

• automobile

• bird

• cat

• deer

• dog

• frog

• horse

• ship

• truck

Page 7: CUDA Programming - CILVR at NYU · 2013. 2. 27. · • MKL 10.3.6 on Intel SandyBridge E5-2687W @ 3.10GHz 6 . cuRAND Performance Double Precision RNGs 16 14 12 8 10 6 MRG32k3a CPU

Intro to CUDA

• See Slides

Page 8: CUDA Programming - CILVR at NYU · 2013. 2. 27. · • MKL 10.3.6 on Intel SandyBridge E5-2687W @ 3.10GHz 6 . cuRAND Performance Double Precision RNGs 16 14 12 8 10 6 MRG32k3a CPU

Fast Math

• Use special function units in hardware

• Replace exp(), cos(), etc. with:

• __expf(), __cosf(), etc.

• OR, compile with “–use_fast_math” flag

Page 9: CUDA Programming - CILVR at NYU · 2013. 2. 27. · • MKL 10.3.6 on Intel SandyBridge E5-2687W @ 3.10GHz 6 . cuRAND Performance Double Precision RNGs 16 14 12 8 10 6 MRG32k3a CPU

cuBLAS

Cuda Performance Report, Jan25, 2013

Page 10: CUDA Programming - CILVR at NYU · 2013. 2. 27. · • MKL 10.3.6 on Intel SandyBridge E5-2687W @ 3.10GHz 6 . cuRAND Performance Double Precision RNGs 16 14 12 8 10 6 MRG32k3a CPU

cuRAND

Cuda Performance Report, Jan25, 2013

Page 11: CUDA Programming - CILVR at NYU · 2013. 2. 27. · • MKL 10.3.6 on Intel SandyBridge E5-2687W @ 3.10GHz 6 . cuRAND Performance Double Precision RNGs 16 14 12 8 10 6 MRG32k3a CPU

Max

Pooling

CPU

Page 12: CUDA Programming - CILVR at NYU · 2013. 2. 27. · • MKL 10.3.6 on Intel SandyBridge E5-2687W @ 3.10GHz 6 . cuRAND Performance Double Precision RNGs 16 14 12 8 10 6 MRG32k3a CPU

Max

Pooling

GPU