CS 380 - GPU and GPGPU Programming
Lecture 24: Additional Stuff, Part 1
Markus Hadwiger, KAUST
Reading Assignment #14 (until May 11)
Read (required):
• Programming Massively Parallel Processors book, Chapter 10 (Sparse Matrix-Vector Multiplication)
• Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors, Nathan Bell and Michael Garland
  http://www.nvidia.com/docs/IO/77944/sc09-spmv-throughput.pdf
Read (optional):
• CUSPARSE library description in the CUDA SDK
• CUSP library: http://cusplibrary.github.io/
• Incomplete-LU and Cholesky Preconditioned Iterative Methods Using CUSPARSE and CUBLAS, Maxim Naumov
  http://developer.nvidia.com/sites/default/files/akamai/cuda/files/psts_white_paper_final.pdf
  (this is also included in the CUDA SDK!)
Reading Assignment ++
“Modern GPU” library and links to other libraries at:
• https://nvlabs.github.io/moderngpu/intro.html
About occupancy and latency
• https://nvlabs.github.io/moderngpu/performance.html
Latency data obtained via micro benchmarking
• http://lpgpu.org/wp/wp-content/uploads/2013/05/poster_andresch_acaces2014.pdf
Warp-aggregated atomics
• http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/
Fast Histograms Using Shared Atomics on Maxwell
• http://devblogs.nvidia.com/parallelforall/gpu-pro-tip-fast-histograms-using-shared-atomics-maxwell/
CUDA 7 and C++11
• http://devblogs.nvidia.com/parallelforall/power-cpp11-cuda-7/
PTX virtual assembly and SASS machine assembly
• PTX in CUDA SDK: ptx_isa_4.2.pdf
• SASS in CUDA SDK: CUDA_Binary_Utilities.pdf