![Page 1: CUDA Programming - CILVR at NYU · 2013. 2. 27. · • MKL 10.3.6 on Intel SandyBridge E5-2687W @ 3.10GHz 6 . cuRAND Performance Double Precision RNGs 16 14 12 8 10 6 MRG32k3a CPU](https://reader035.vdocuments.us/reader035/viewer/2022070214/6114ddee61272b6e2306dd98/html5/thumbnails/1.jpg)
CUDA Programming
By Matthew Zeiler
Big Data, Large Scale Machine Learning
John Langford and Yann LeCun
New York University
GPU Programming
• OpenCL
• Works on CPU and GPU
• Supported by AMD, Intel, Nvidia, and others
• CUDA
• Works on Nvidia GPUs only
• Wide developer adoption
Why CUDA
• Pros:
• Massively parallel architecture
• Immense speedups over CPUs
• Cons:
• Different programming style
• Very difficult to get optimal performance
Latest Architecture
SM(X)
DEMO
• CIFAR-10 Image Classification
• 10 Classes:
• airplane
• automobile
• bird
• cat
• deer
• dog
• frog
• horse
• ship
• truck
Intro to CUDA
• See Slides
Fast Math
• Use special function units in hardware
• Replace exp(), cos(), etc. with:
• __expf(), __cosf(), etc.
• Or compile with the "-use_fast_math" flag
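As a rough illustration of the trade-off, the sketch below evaluates the same inputs with the standard `expf()` and with the `__expf()` intrinsic, then reports the largest relative difference. The kernel and helper names (`exp_compare`, `max_rel_error_fast_exp`) are made up for this example, error checking is omitted for brevity, and a CUDA-capable GPU is assumed.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: evaluates exp(x) two ways for each element.
__global__ void exp_compare(const float* x, float* accurate, float* fast, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        accurate[i] = expf(x[i]);    // standard single-precision library function
        fast[i]     = __expf(x[i]);  // hardware intrinsic: faster, lower accuracy
    }
}

// Hypothetical host helper: runs the kernel on n samples from [-5, 5) and
// returns the largest relative difference between the two results.
float max_rel_error_fast_exp(int n) {
    std::vector<float> h_x(n), h_acc(n), h_fast(n);
    for (int i = 0; i < n; ++i) h_x[i] = -5.0f + 10.0f * i / n;

    size_t bytes = n * sizeof(float);
    float *d_x, *d_acc, *d_fast;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_acc, bytes);
    cudaMalloc(&d_fast, bytes);
    cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    exp_compare<<<blocks, threads>>>(d_x, d_acc, d_fast, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_acc.data(), d_acc, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_fast.data(), d_fast, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_acc);
    cudaFree(d_fast);

    float worst = 0.0f;
    for (int i = 0; i < n; ++i) {
        float rel = std::fabs(h_fast[i] - h_acc[i]) / h_acc[i];
        if (rel > worst) worst = rel;
    }
    return worst;
}
```

Built with plain nvcc, the two results should differ slightly. Built with -use_fast_math, `expf()` is itself compiled to the intrinsic, so the difference should drop to zero.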
cuBLAS
CUDA Performance Report, Jan 25, 2013
cuRAND
CUDA Performance Report, Jan 25, 2013
Max Pooling CPU
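The slide's own code is not preserved here, so the following is only a minimal sketch of what a CPU max-pooling reference might look like. It assumes a single-channel, row-major image, non-overlapping pool x pool windows, and dimensions that are exact multiples of the pool size. The function name `max_pool_cpu` is hypothetical.

```cuda
#include <cfloat>
#include <vector>

// Hypothetical CPU reference: non-overlapping pool x pool max pooling over a
// single-channel, row-major image of size height x width.
// Assumes height and width are multiples of pool.
std::vector<float> max_pool_cpu(const std::vector<float>& in,
                                int height, int width, int pool) {
    int out_h = height / pool;
    int out_w = width / pool;
    std::vector<float> out(out_h * out_w);
    for (int oy = 0; oy < out_h; ++oy) {
        for (int ox = 0; ox < out_w; ++ox) {
            float best = -FLT_MAX;
            // Scan the pool x pool window that maps to output (oy, ox).
            for (int dy = 0; dy < pool; ++dy) {
                for (int dx = 0; dx < pool; ++dx) {
                    float v = in[(oy * pool + dy) * width + (ox * pool + dx)];
                    if (v > best) best = v;
                }
            }
            out[oy * out_w + ox] = best;
        }
    }
    return out;
}
```

The four nested loops run sequentially, but every output element is independent of the others, which is what makes the operation a natural fit for one-thread-per-output parallelism.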
Max Pooling GPU
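The slide's own kernel is not preserved here. As a hedged sketch of one way to write it, the code below assigns one thread to each output element and lets that thread scan its own pooling window. It assumes a single-channel, row-major image, non-overlapping pool x pool windows, and dimensions that are multiples of the pool size. The names `max_pool_kernel` and `max_pool_gpu` are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cfloat>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread computes one output element.
__global__ void max_pool_kernel(const float* in, float* out,
                                int width, int out_h, int out_w, int pool) {
    int ox = blockIdx.x * blockDim.x + threadIdx.x;
    int oy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ox >= out_w || oy >= out_h) return;  // guard threads past the edge

    float best = -FLT_MAX;
    for (int dy = 0; dy < pool; ++dy)
        for (int dx = 0; dx < pool; ++dx)
            best = fmaxf(best, in[(oy * pool + dy) * width + (ox * pool + dx)]);
    out[oy * out_w + ox] = best;
}

// Hypothetical host wrapper: copies the image to the GPU, launches the
// kernel, and copies the pooled result back.
std::vector<float> max_pool_gpu(const std::vector<float>& in,
                                int height, int width, int pool) {
    int out_h = height / pool;
    int out_w = width / pool;
    std::vector<float> out(out_h * out_w);

    float *d_in, *d_out;
    cudaMalloc(&d_in, in.size() * sizeof(float));
    cudaMalloc(&d_out, out.size() * sizeof(float));
    cudaMemcpy(d_in, in.data(), in.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    dim3 threads(16, 16);
    dim3 blocks((out_w + threads.x - 1) / threads.x,
                (out_h + threads.y - 1) / threads.y);
    max_pool_kernel<<<blocks, threads>>>(d_in, d_out, width, out_h, out_w, pool);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), d_out, out.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

This is the simplest mapping rather than a tuned one: each thread reads its window straight from global memory, which is usually acceptable for small non-overlapping windows because no input element is read twice.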