programming gpus cs 446: real-time rendering & game technology

33
Programming GPUs CS 446: Real-Time Rendering & Game Technology David Luebke University of Virginia

Upload: oki

Post on 06-Feb-2016

36 views

Category:

Documents


0 download

DESCRIPTION

Programming GPUs CS 446: Real-Time Rendering & Game Technology. David Luebke University of Virginia. Demo. Today: Jiayuan Meng, ATI demos Mar 14: Sean Arietta, Super Mario Sunshine. Assignments. Assn 3 due tomorrow at midnight Assignment 4 now out on web page - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Programming GPUs

CS 446: Real-Time Rendering& Game Technology

David LuebkeUniversity of Virginia

Page 2: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 2 David Luebke

Demo

• Today: Jiayuan Meng, ATI demos• Mar 14: Sean Arietta, Super Mario Sunshine

Page 3: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 3 David Luebke

Assignments

• Assn 3 due tomorrow at midnight• Assignment 4 now out on web page

– Individual assignment w/ group “integration” deadline– Due Mar 21 (individual), Mar 23 (group)– Next 2 assignments will also follow this pattern

Page 4: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 4 David Luebke

GPU

The Modern Graphics Pipeline

CPU

Application Transform& Light Rasterize Shade Video

Memory(Textures)

Graphics State

Render-to-texture

AssemblePrimitives

Vertices (3D)

Final Pixels (Color, Depth)

Fragments (pre-pixels)

Screenspace triangles (2D)

Xformed, Lit Vertices (2D)

• Programmable vertex processor!

• Programmable pixel processor!

FragmentProcessor

VertexProcessor

Page 5: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 5 David Luebke

GPU vendor differences

• Note: this slide (March 2006) will be dated quickly!• NVIDIA: as described in previous lecture• ATI GPUs (1900XT current high-end part):

– No vertex texture fetch (but good render-to-vertex-array)– Far fewer levels of computed texture coordinates– Better at fine-grained (less coherent) dynamic branching

• ATI Xenos (Xbox) has some cool design features:– Unified shader model: vertex proc == pixel proc– Scatter support: shaders can write arbitrary memory loc

Page 6: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 6 David Luebke

Cg – “C for Graphics”

• Cg is a high-level GPU programming language• Designed by NVIDIA and Microsoft• Competes with the (quite similar)

GL Shading Language, a.k.a GLslang

Page 7: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 7 David Luebke

Programming in assembly is painful

• Easier to read and modify• Cross-platform• Combine pieces• etc.

Assembly…FRC R2.y, C11.w; ADD R3.x, C11.w, -R2.y; MOV H4.y, R2.y; ADD H4.x, -H4.y, C4.w; MUL R3.xy, R3.xyww, C11.xyww; ADD R3.xy, R3.xyww, C11.z; TEX H5, R3, TEX2, 2D; ADD R3.x, R3.x, C11.x; TEX H6, R3, TEX2, 2D;…

…L2weight = timeval – floor(timeval);L1weight = 1.0 – L2weight;ocoord1 = floor(timeval)/64.0 + 1.0/128.0;ocoord2 = ocoord1 + 1.0/64.0;L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0));L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0));…

Cg

Page 8: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 8 David Luebke

Some points in the design space

• CPU languages– C – close to the hardware; general purpose– C++, Java, lisp – require memory management– RenderMan – specialized for shading

• Real-time shading languages– Stanford shading language– Creative Labs shading language

Page 9: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 9 David Luebke

Design strategy

• Start with C (and a bit of C++)– Minimizes number of decisions– Gives you known mistakes instead of unknown ones

• Allow subsetting of the language• Add features desired for GPU’s

– To support GPU programming model– To enable high performance

• Tweak to make it fit together well

Page 10: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

How are GPUs different from CPUs?

1. GPU is a stream processor– Multiple programmable processing units– Connected by data flows

Application VertexProcessor

FragmentProcessor

Assem

bly &R

asterization

Framebuffer

Operations

Framebuffer

Textures

Page 11: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 11 David Luebke

Cg separates vertex & fragment programs

Application VertexProcessor

FragmentProcessor

Assem

bly &R

asterization

Framebuffer

Operations

Framebuffer

Textures

Program Program

Page 12: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Cg programs have two kinds of inputs

• Varying inputs (streaming data)– e.g. normal vector – comes with each vertex– This is the default kind of input

• Uniform inputs (a.k.a. graphics state)– e.g. modelview matrix

• Note: Outputs are always varying

vout MyVertexProgram( float4 normal, uniform float4x4 modelview) {…

Page 13: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Binding VP outputs to FP inputs

a) Let compiler do it– Define a single structure– Use it for vertex-program output– Use it for fragment-program input

struct vout { float4 color; float4 texcoord; …};

Page 14: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Binding VP outputs to FP inputs

b) Do it yourself– Specify register bindings for VP outputs– Specify register bindings for FP inputs– May introduce HW dependence– Necessary for mixing Cg with assembly

struct vout { float4 color : TEX3 ; float4 texcoord : TEX5; …};

Page 15: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Some inputs and outputs are special

• E.g. the position output from vert prog– This output drives the rasterizer– It must be marked

struct vout { float4 color; float4 texcoord; float4 position : HPOS;};

Page 16: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

How are GPUs different from CPUs?

2. Greater variation in basic capabilities– Most processors don’t yet support branching– Vertex processors don’t support texture mapping– Some processors support additional data types

• Compiler can’t hide these differencesCompiler can’t hide these differences• Least-common-denominator is too restrictiveLeast-common-denominator is too restrictive• Cg exposes differences via language Cg exposes differences via language profilesprofiles

(list of capabilities and data types)(list of capabilities and data types)• Over time, profiles will convergeOver time, profiles will converge

Page 17: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

How are GPUs different from CPUs?

3. Optimized for 4-vector arithmetic– Useful for graphics – colors, vectors, texcoords– Easy way to get high performance/cost

• C philosophy says: expose these HW data C philosophy says: expose these HW data typestypes

• Cg has vector data types and operationsCg has vector data types and operationse.g. e.g. float2float2, , float3float3, , float4float4

• Makes it obvious how to get high performanceMakes it obvious how to get high performance• Cg also has matrix data typesCg also has matrix data types

e.g. e.g. float3x3float3x3, , float3x4float3x4, , float4x4float4x4

Page 18: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Some vector operations

//// Clamp components of 3-vector to [minval,maxval] range//float3 clamp(float3 a, float minval, float maxval) { a = (a < minval.xxx) ? Minval.xxx : a; a = (a > maxval.xxx) ? Maxval.xxx : a; return a;}

Swizzle – replicate and/or rearrange components.

? : is per-component for vectors

Comparisons between vectorsare per-component, andproduce vector result

Page 19: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Cg has arrays too

• Declared just as in C• But, arrays are distinct from

built-in vector types: float4 != float[4]

• Language profiles may restrict array usage

vout MyVertexProgram( float3 lightcolor[10], …) { …

Page 20: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 20 David Luebke

How are GPUs different from CPUs?

4. No support for pointers– Arrays are first-class data types in Cg

5. No integer data type– Cg adds “bool” data type for boolean operations– This change isn’t obvious except when declaring vars

Page 21: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 21 David Luebke

Cg basic data types

• All profiles:– float– bool

• All profiles with texture lookups:– sampler1D, sampler2D, sampler3D,

samplerCUBE• NV_fragment_program profile:

– half -- half-precision float– fixed -- fixed point [-2,2)

Page 22: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 22 David Luebke

Other Cg capabilities

• Function overloading• Function parameters are value/result

– Use “out” modifier to declare return value

• “discard” statement – fragment kill

void foo (float a, out float b) { b = a;}

if (a > b) discard;

Page 23: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 23 David Luebke

Cg Built-in functions

• Texture lookups (in fragment profiles)• Math

– Dot product– Matrix multiply– Sin/cos/etc.– Normalize

• Misc– Partial derivative (when supported)

• See spec for more details

Page 24: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 24 David Luebke

Cg Example

• The following fragment program implements a (very) simple toon shader

– Flat 3-tone shading• Highlight• Base color• Shadow

– Black silhouettes

Page 25: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Cg Example – part 1// In:// eye_space position = TEX7// eye space T = (TEX4.x, TEX5.x, TEX6.x) denormalized// eye space B = (TEX4.y, TEX5.y, TEX6.y) denormalized// eye space N = (TEX4.z, TEX5.z, TEX6.z) denormalizedfragout frag program main(vf30 In) {

float m = 30; // power float3 hiCol = float3( 1.0, 0.1, 0.1 ); // lit color float3 lowCol = float3( 0.3, 0.0, 0.0 ); // dark color float3 specCol = float3( 1.0, 1.0, 1.0 ); // specular color // Get eye-space eye vector. float3 e = normalize( -In.TEX7.xyz );

// Get eye-space normal vector. float3 n = normalize(float3(In.TEX4.z, In.TEX5.z, In.TEX6.z));

Page 26: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

float edgeMask = (dot(e, n) > 0.4) ? 1 : 0; float3 lpos = float3(3,3,3); float3 l = normalize(lpos - In.TEX7.xyz); float3 h = normalize(l + e);

float specMask = (pow(dot(h, n), m) > 0.5) ? 1 : 0;

float hiMask = (dot(l, n) > 0.4) ? 1 : 0; float3 ocol1 = edgeMask * (lerp(lowCol, hiCol, hiMask) + (specMask *specCol));

fragout O; O.COL = float4(ocol1.x, ocol1.y, ocol1.z, 1); return O;}

Cg Example – part 2

Page 27: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 27 David Luebke

New vector operators

• Swizzle – replicate/rearrange elementsa = b.xxyy;

• Write mask – selectively over-writea.w = 1.0;

• Vector constructor builds vector a = float4(1.0, 0.0, 0.0, 1.0);

Page 28: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 28 David Luebke

Change to constant-typing mechanism

• In C, it’s easy to accidentally use high precision

half x, y;x = y * 2.0; // Double-precision multiply!

• Not in Cg

x = y * 2.0; // Half-precision multiply

• Unless you want to

x = y * 2.0f; // Float-precision multiply

Page 29: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 29 David Luebke

• Dot product– dot(v1,v2); // returns a scalar

• Matrix multiplications:– matrix-vector: mul(M, v); // returns a vector– vector-matrix: mul(v, M); // returns a vector– matrix-matrix: mul(M, N); // returns a matrix

Dot product, matrix multiply

Page 30: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 30 David Luebke

Demos and Examples

Page 31: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 31 David Luebke

Cg runtime API

• Compile a program• Select active programs for rendering• Pass “uniform” parameters to program• Pass “varying” (per-vertex) parameters• Load vertex-program constants• Other housekeeping

Page 32: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 32 David Luebke

Runtime is split into three libraries

• API-independent layer – cg.lib– Compilation– Query information about object code

• API-dependent layer –cgGL.lib and cgD3D.lib– Bind to compiled program– Specify parameter values– etc.

Page 33: Programming GPUs  CS 446: Real-Time Rendering & Game Technology

Real-Time Rendering 33 David Luebke

More Information…

• The Cg Toolkit (includes Cg user’s manual):– http://developer.nvidia.com/object/cg_toolkit.html

• NVIDIA’s GPU Programming Guide:– http://developer.nvidia.com/object/gpu_programming_guide.html

• Cg Tutorial Book • The CgShader class

– See http://www.cs.virginia.edu/~rs5ea/tools/ • Hello, Cg (haven’t checked out myself):

– http://developer.nvidia.com/object/hello_cg_tutorial.html