Exploiting Parallelism on GPUs
Matt Mukerjee, David Naylor
Parallelism on GPUs
• $100 NVIDIA video card → 192 cores
– (Build Blacklight for ~$2000 ???)
• Incredibly low power
• Ubiquitous
• Question: use for general computation?
– General Purpose GPU (GPGPU)
GPU Hardware
• Very specific constraints
– Designed to be SIMD (e.g. shaders)
– Zero-overhead thread scheduling
– Little caching (compared to CPUs)
• Constantly stalled on memory access
• MASSIVE # of threads / core
• Much finer-grained threads (“kernels”)
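To make the bullets above concrete, here is a minimal sketch of a CUDA kernel launch. The kernel name, sizes, and the use of managed memory are illustrative assumptions, not from the slides; the point is the massive thread count, which lets the zero-overhead scheduler hide memory stalls.

```cuda
#include <cstdio>

// A "kernel" is the function every thread runs; each thread
// handles one element -- the fine-grained threads described above.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)        // guard: the grid may be larger than n
        x[i] *= a;
}

int main() {
    const int n = 1 << 20;          // illustrative size
    float *x;
    cudaMallocManaged(&x, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = 1.0f;

    // Launch ~1M threads: far more threads than cores, so while
    // some threads stall on memory, others run.
    scale<<<(n + 255) / 256, 256>>>(x, 2.0f, n);
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);
    cudaFree(x);
    return 0;
}
```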
CUDA Architecture
Thread Blocks
• GPUs are SIMD
• How does multithreading work?
• Threads that branch are halted, then run
• Single Instruction Multiple….?
CUDA is a SIMT architecture
• Single Instruction, Multiple Thread
• Threads in a block execute the same instruction
(Diagram: multi-threaded instruction unit)
Observation
Fitting the data structures needed by the threads in one multiprocessor requires application-specific tuning.
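The tuning mentioned above usually means hand-sizing `__shared__` arrays to fit one multiprocessor's on-chip memory. A hedged sketch (the reduction kernel and the tile size of 256 are assumptions for illustration, not from the slides):

```cuda
#define TILE 256  // tuned per application to fit the SM's shared memory

// Block-level sum: stage data in fast on-chip shared memory,
// which lives on a single multiprocessor.
__global__ void block_sum(const float *in, float *out, int n) {
    __shared__ float tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction within the block
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            tile[threadIdx.x] += tile[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = tile[0];
}
```

If the working set per block exceeds the SM's shared memory, the block either fails to launch or fewer blocks run concurrently, which is why the sizing is application-specific.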
Example: MapReduce on CUDA
Too big for the cache on one SM!
Problem
Only one code branch within a block executes at a time.
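A sketch of the branch-serialization problem, with a divergence-free rewrite. Both kernels and the warp size of 32 are assumptions for illustration:

```cuda
__global__ void divergent(float *x) {
    int i = threadIdx.x;
    // Adjacent threads take different branches, so the hardware
    // serializes the paths: the "if" threads run while the "else"
    // threads are halted, then the roles swap.
    if (i % 2 == 0)
        x[i] = x[i] * 2.0f;   // even threads
    else
        x[i] = x[i] + 1.0f;   // odd threads
}

// Branch on a warp-aligned boundary instead, so all threads that
// execute together take the same path and nothing is halted.
__global__ void uniform(float *x) {
    int i = threadIdx.x;
    if ((i / 32) % 2 == 0)    // whole groups of 32 threads agree
        x[i] = x[i] * 2.0f;
    else
        x[i] = x[i] + 1.0f;
}
```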
Enhancing SIMT
Problem
If two multiprocessors share a cache line, there are more memory accesses than necessary.
Data Reordering
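One common form of data reordering is switching from an array-of-structs layout to a struct-of-arrays layout, so that adjacent threads touch adjacent words and their accesses coalesce. A minimal sketch; the struct names and kernels are illustrative assumptions:

```cuda
// Array-of-structs: thread i reads p[i].x, so neighboring threads
// access memory 12 bytes apart and pull in more lines than needed.
struct PointAoS { float x, y, z; };

__global__ void shift_aos(PointAoS *p, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i].x += 1.0f;   // strided access
}

// Struct-of-arrays after reordering: thread i reads x[i], so the
// threads that execute together load consecutive 4-byte words,
// which coalesce into far fewer memory transactions.
struct PointsSoA { float *x, *y, *z; };

__global__ void shift_soa(PointsSoA p, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p.x[i] += 1.0f;   // consecutive access
}
```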