Download - CUDA Synchronizationdevietti/classes/cis601-spring2017/slide… · I had to ask Nvidia Corp: ... other threads in the GPU For communication between threads in different CTAs or even

CUDA Synchronization

atomics

2

memory fences

– Robert Frost, “Mending Wall”

“Good fences make good neighbors.”

3

without fences

same (lack of) guarantees for reads

4

ORLY?

5

__threadfence_block

6

PTX membar.cta

7

__threadfence

8

PTX membar.gl

9

__threadfence_system

10

volatile

11

CUDA spinlock?

12

– “Robert Frost”, The Thread Not Taken

“Two threads diverged in a CUDA warp,And sorry I had become untwinedfrom my PC by a branch so sharp,

I had to ask Nvidia Corp:the order of paths was undefined.”

13

__syncthreads()

14

intra-warp synchronization

15

PTX

❖ virtual ISA for Nvidia GPUs

❖ RISC-like ISA, load-store, 3-operand

❖ destination register is on the left

PTX load/store cachingqualifier meaning load store

.ca cache at all levels default

.wb write back caching default

.cg cache at L2 (global cache) yes yes

.cs streaming (mark as LRU) yes yes

.lu last use (read & invalidate) yes

18

PTX tidbits

19

Homework 2

CUDA Kernel Timeout == good

21

__managed____device__ __managed__ int d_counter = 0;

void main() { d_counter = 10;

myKernel<<<8,16>>>();

cudaStatus = cudaDeviceSynchronize(); checkCudaErrors(cudaStatus);

printf(“%d”, d_counter); }

22

C++ virtual functions

23

Download - CUDA Synchronizationdevietti/classes/cis601-spring2017/slide… · I had to ask Nvidia Corp: ... other threads in the GPU For communication between threads in different CTAs or even

Top Related