Brook for GPUs

Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Pat Hanrahan

February 10th, 2003

February 11th, 2004 2

Brook: general purpose streaming language

• developed for PCA Program/Merrimac
– compiler: RStream (Reservoir Labs)
– DARPA PCA Program: Stanford SmartMemories, UT Austin TRIPS, MIT RAW
– Brook version 0.2 spec: http://merrimac.stanford.edu
– Brook for GPUs: http://brook.sourceforge.net

[Figure: stream processor block diagram: Stream Execution Unit, Stream Register File, Scalar Execution Unit, Memory System, Network Interface, DRDRAM, Network]


Brook: general purpose streaming language

• stream programming model
– enforce data parallel computing: streams
– encourage arithmetic intensity: kernels
• C with streams


Brook for GPUs

• demonstrate GPU streaming coprocessor
– make programming GPUs easier
• hide texture/pbuffer data management
• hide graphics-based constructs in Cg/HLSL
• hide rendering passes
• virtualize resources
– performance!
• … on applications that matter
– highlight GPU areas for improvement
• features required for general purpose stream computing


system outline

• .br: Brook source files
• brcc: source-to-source compiler
• brt: Brook run-time library


Brook language

streams

• streams

– collection of records requiring similar computation

• particle positions, voxels, FEM cell, …

float3 positions<200>;

float3 velocityfield<100,100,100>;

– encourage data parallelism


Brook language

kernels

• kernels
– functions applied to streams
• similar to for_all construct

kernel void foo (float a<>, float b<>, out float result<>) {
    result = a + b;
}

float a<100>;
float b<100>;
float c<100>;

foo(a,b,c);

has the same effect as the C loop

for (i=0; i<100; i++)
    c[i] = a[i]+b[i];

– no dependencies between stream elements
• encourage high arithmetic intensity


Brook language

kernels

• Ray Triangle Intersection

kernel void krnIntersectTriangle(Ray ray<>, Triangle tris[],
                                 RayState oldraystate<>, GridTrilist trilist[],
                                 out Hit candidatehit<>) {
    float idx, det, inv_det;
    float3 edge1, edge2, pvec, tvec, qvec;
    if (oldraystate.state.y > 0) {
        // tris[] and trilist[] are gather streams, indexed with []
        idx = trilist[oldraystate.state.w].trinum;
        edge1 = tris[idx].v1 - tris[idx].v0;
        edge2 = tris[idx].v2 - tris[idx].v0;
        pvec = cross(ray.d, edge2);
        det = dot(edge1, pvec);
        inv_det = 1.0f/det;
        tvec = ray.o - tris[idx].v0;
        candidatehit.data.y = dot(tvec, pvec) * inv_det;   // u
        qvec = cross(tvec, edge1);
        candidatehit.data.z = dot(ray.d, qvec) * inv_det;  // v
        candidatehit.data.x = dot(edge2, qvec) * inv_det;  // t
        candidatehit.data.w = idx;                         // triangle index
    } else {
        // w = -1 marks "no candidate hit"
        candidatehit.data = float4(0,0,0,-1);
    }
}


Brook language

additional features

• reductions
– scalar
– stream
• stride & repeat
• GatherOp & ScatterOp
– a[i] += p
– p = a[i]++


brcc compiler

infrastructure

• based on ctool
– http://ctool.sourceforge.net
• parser
– build code tree
– extend C grammar to accept Brook
• convert
– tree transformations
• codegen
– generate Cg & HLSL code
– call cgc, fxc
– generate stub function


Applications

• Ray-tracer
• FFT
• Segmentation
• Linear Algebra:
– BLAS, LINPACK, LAPACK


Brook Performance


GPU Gotchas

[Figure: time vs. registers used]


GPU Gotchas

[Figure: NVIDIA NV3x: register usage vs. time]


GPU Gotchas

NVIDIA:
• Register Penalty
• Render to Texture Limitation
– Requires explicit copy or heavy pbuffer solution
– Superbuffer extension needed: http://mirror.ati.com/developer/SIGGRAPH03/Percy_OpenGL_Extensions SIG03.pdf


GPU Gotchas

ATI Radeon 9800 Pro
• Limited dependent texture lookup
• 96 instructions
• 24-bit floating point
– s16e7: integers up to 131,072 (s23e8: 16,777,216)

[Figure: chain of four dependent texture lookups, alternating memory references and math ops]


GPU Catch-Up!

• Integer & Bit Ops & Double Precision
• Memory Addressing
• CGC/FXC Performance
– Hand-code performance-critical code
• No native reduction support
• No native scatter support
– p[i] = a (indirect write)
• No programmable blend
– GatherOp / ScatterOp
• Limited 4x4 output
– Brook virtualizes kernel outputs
• Readback still slow
– NV35 OpenGL: 600 MB/sec download, 170 MB/sec readback
– ATI DirectX: 550 MB/sec download, 50 MB/sec readback


GPUs of the future (we hope)

• Complete Instruction Sets
– Integers, Bit Ops, Doubles, Mem Access
• Integration
– Streaming coprocessor, not just a rendering device

• Streaming architectures

[Figure: streaming architecture: SDRAM banks, Stream Register File, ALU clusters]


Brook for GPUs

• Release v0.3 available on SourceForge
• Project Page
– http://graphics.stanford.edu/projects/brook
• Source
– http://www.sourceforge.net/projects/brook
• Over 4K downloads!
• Questions?

Fly-fishing fly images from The English Fly Fishing Shop