AMD/ATI GPU hardware
Antti P Miettinen
February 15, 2010
Antti P Miettinen AMD/ATI GPU hardware
ATI, AMD & GPGPU
◮ ATI incorporated 1985 ¹
◮ AMD acquired ATI 2006
◮ GPUs for GPGPU
  ◮ R600: first generation with unified shader model
  ◮ R700: first generation with OpenCL support
  ◮ Latest: R800/Evergreen
◮ GPGPU software
  ◮ Close-to-metal (CTM)
  ◮ Brook, Brook+
  ◮ OpenCL
¹ NVIDIA released its first product in 1995
Overview
[Figure: R600 block diagram. Labels: host application; system-memory address space (commands, instructions, constants, inputs, outputs); R600 local memory (commands, instructions, constants, inputs, outputs); R600 with command processor, data parallel processor (DPP) array, memory controller and memory-mapped R600 registers; interrupts; commands, instructions and data.]
More details
[Figure: ATI stream processor block diagram. Labels: host application; compute driver; system memory and stream processor local memory (each with commands, instructions and constants, inputs and outputs); command processor; ultra-threaded dispatch processor; program counters (four shown); instruction and constant cache; L1 input cache; L2 input cache; memory read and write cache; output cache; memory controller; DMA.]
DPP array
[Figure: DPP array. Labels: ultra-threaded dispatch processor; SIMD engines (four shown); thread processor; stream cores; T-stream core; branch execution unit; general-purpose registers; instruction and control flow.]
Memory hierarchy
◮ registers
  ◮ GPRs
  ◮ constant registers
◮ caches
  ◮ instruction caches per instruction type (CF, ALU etc.)
  ◮ constant cache
  ◮ texture cache per SIMD
  ◮ L2 input cache per memory channel
  ◮ read/write cache
◮ local data share per SIMD
◮ global data share across SIMDs
◮ local memory: memory accessible to the GPU
R700 data sharing
Terminology
AMD/ATI term       NVIDIA term        Description
SIMD engine        multiprocessor     GPU subunit that has a program counter
thread processor   scalar processor   GPU execution subunit
local memory       device memory      memory accessible to the GPU
wavefront          warp               set of threads running in lockstep
local data share   shared memory      memory that can be shared by a thread block
Instruction set
◮ Control flow instructions
  ◮ initiate ALU clauses, vertex/texture fetch etc.
  ◮ loops
  ◮ calls, jumps
◮ ALU clauses
  ◮ no control flow (but can use predication)
  ◮ instruction group: 1-5 instructions, 0-2 literals
  ◮ 5-way VLIW: X/Y/Z/W and Trans ALUs
◮ texture/vertex fetch
◮ export (actually read/write)
◮ data share (separate clauses before R800)
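The grouping rule for ALU clauses (1-5 instructions per group, issued to the X/Y/Z/W and Trans ALUs) can be illustrated with a toy packer. This is only a sketch under loud assumptions: `op_kind` and `count_groups` are hypothetical names, all operations are treated as independent, plain operations are assumed to be allowed in any of the five slots while transcendental ones are restricted to Trans, and literals and the real ISA encoding are not modeled. It is not the scheduler of any actual compiler.

```c
#include <assert.h>

/* Toy model (assumption): a plain op may use X, Y, Z, W or Trans;
   a transcendental op may use the Trans slot only. */
enum op_kind { OP_PLAIN, OP_TRANSCENDENTAL };

/* Greedily packs n independent ops, in order, into instruction groups
   of at most 4 vector slots plus 1 Trans slot. Returns the group count. */
int count_groups(const enum op_kind *ops, int n)
{
    int groups = 0;
    int vec_used = 0;    /* X/Y/Z/W slots taken in the current group */
    int trans_used = 0;  /* 1 if the Trans slot is taken */

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int fits;
        if (ops[i] == OP_TRANSCENDENTAL)
            fits = !trans_used;
        else
            fits = (vec_used < 4) || !trans_used;

        if (groups == 0 || !fits) {  /* open a new instruction group */
            groups++;
            vec_used = 0;
            trans_used = 0;
        }

        if (ops[i] == OP_TRANSCENDENTAL)
            trans_used = 1;
        else if (vec_used < 4)
            vec_used++;
        else
            trans_used = 1;          /* plain op spills into Trans */
    }
    return groups;
}
```

Under these assumptions five independent plain operations fill one group, while two transcendental operations need two groups because they compete for the single Trans slot.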
Thread state
◮ program counter is shared by threads within a SIMD
◮ loop state (constant, index)
◮ stack (loop nesting, predicates)
◮ GPRs
  ◮ thread private
  ◮ clause temporary
  ◮ SIMD global
◮ constant registers
◮ previous vector, previous scalar
◮ predication state
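Because the program counter is shared, a branch inside a wavefront cannot send threads down different paths at the same time; both sides are issued and a per-thread predicate decides which lanes take effect. The following host-side C sketch emulates that idea. `WAVEFRONT_SIZE` and `wavefront_abs_or_double` are hypothetical names, the size of 4 is chosen only for readability, and the code models the concept rather than the hardware's actual stack and predicate handling.

```c
#include <assert.h>

#define WAVEFRONT_SIZE 4  /* illustrative only; not a real hardware value */

/* Emulates out[i] = (in[i] < 0) ? -in[i] : 2 * in[i] for one wavefront.
   Each loop below stands for one step issued once for all lanes,
   as happens when the lanes share a single program counter. */
void wavefront_abs_or_double(const int *in, int *out)
{
    int pred[WAVEFRONT_SIZE];

    assert(in != 0 && out != 0);

    /* Step 1: every lane evaluates the condition and sets its predicate. */
    for (int lane = 0; lane < WAVEFRONT_SIZE; lane++)
        pred[lane] = in[lane] < 0;

    /* Step 2: "then" side; lanes with a false predicate are masked off. */
    for (int lane = 0; lane < WAVEFRONT_SIZE; lane++)
        if (pred[lane])
            out[lane] = -in[lane];

    /* Step 3: "else" side, issued with the inverted predicate. */
    for (int lane = 0; lane < WAVEFRONT_SIZE; lane++)
        if (!pred[lane])
            out[lane] = 2 * in[lane];
}
```

The cost of a divergent branch in this model is the sum of both sides, since every lane waits through steps 2 and 3 regardless of its own predicate.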
Simple vector addition
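__kernel void
vectorAddition(__global float *output,
               __global const float *input0,
               __global const float *input1,
               const uint width)
{
    /* Global index = work-group index * work-group size + index in group. */
    uint bx = get_group_id(0);
    uint tx = get_local_id(0);
    uint idx = bx * get_local_size(0) + tx;

    /* Skip work-items past the end when width is not a multiple
       of the work-group size. */
    if (idx < width)
        output[idx] = input0[idx] + input1[idx];
}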
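To see why the kernel needs the bounds check, the index arithmetic can be replayed on the host in plain C. This is a sketch, not OpenCL host code: `vector_add_item` and `vector_add_emulated` are hypothetical helpers that pass the group and local ids explicitly and run the work-items one after another, and rounding the launch up to whole work-groups is an assumption about how the kernel would be enqueued.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for one work-item of the kernel; the ids that OpenCL would
   supply through get_group_id() etc. are passed in as arguments. */
static void vector_add_item(float *out, const float *in0, const float *in1,
                            unsigned width, size_t group_id,
                            size_t local_id, size_t local_size)
{
    size_t idx = group_id * local_size + local_id;
    if (idx < width)                 /* same guard as in the kernel */
        out[idx] = in0[idx] + in1[idx];
}

/* Runs every work-item sequentially. The launch size is rounded up to a
   whole number of work-groups, so some work-items fall past width. */
void vector_add_emulated(float *out, const float *in0, const float *in1,
                         unsigned width, size_t local_size)
{
    assert(local_size > 0);
    size_t num_groups = (width + local_size - 1) / local_size;

    for (size_t g = 0; g < num_groups; g++)
        for (size_t t = 0; t < local_size; t++)
            vector_add_item(out, in0, in1, width, g, t, local_size);
}
```

With `width = 5` and a work-group size of 4, two groups (eight work-items) are launched and the guard discards the last three, so nothing beyond element 4 is read or written.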
RV710 code