programmable graphics hardware

23
David Luebke 1 06/23/22 Programmable Graphics Hardware

Upload: clare-kinney

Post on 31-Dec-2015

26 views

Category:

Documents


1 download

DESCRIPTION

Programmable Graphics Hardware. Admin. Handout: Cg in two pages. Acknowledgement & Aside. The bulk of this lecture comes from slides from Bill Mark’s SIGGRAPH 2002 course talk on NVIDIA’s programmable graphics technology - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Programmable Graphics Hardware

David Luebke 1 04/19/23

ProgrammableGraphics Hardware

Page 2: Programmable Graphics Hardware

David Luebke 2 04/19/23

Admin

● Handout: Cg in two pages

Page 3: Programmable Graphics Hardware

David Luebke 3 04/19/23

Acknowledgement & Aside

● The bulk of this lecture comes from slides from Bill Mark’s SIGGRAPH 2002 course talk on NVIDIA’s programmable graphics technology

● For this reason, and because the lab is outfitted with GeForce 3 cards, we will focus on NVIDIA tech

Page 4: Programmable Graphics Hardware

David Luebke 4 04/19/23

Outline

● Programmable graphics■ NVIDIA’s next-generation technology:

GeForceFX (code name NV30)

● Programming programmable graphics■ NVIDIA’s Cg language

Page 5: Programmable Graphics Hardware

David Luebke 5 04/19/23

GPU Programming Model

ApplicationVertexProcessor

FragmentProcessor

Assem

bly &R

asterization

Fram

ebufferO

perations

Fram

ebuffer

GPUCPU

Textures

Page 6: Programmable Graphics Hardware

David Luebke 6 04/19/23

● Framebuffer● Textures● Fragment processor● Vertex processor● Interpolants

32-bit IEEE floating-pointthroughout pipeline

Page 7: Programmable Graphics Hardware

David Luebke 7 04/19/23

Hardware supports several other data types

● Fragment processor also supports:■ 16-bit “half” floating point■ 12-bit fixed point■ These may be faster than 32-bit on some HW

● Framebuffer/textures also support:■ Large variety of fixed-point formats■ E.g., classical 8-bit per component■ These formats use less memory bandwidth than FP32

Page 8: Programmable Graphics Hardware

David Luebke 8 04/19/23

Vertex processor capabilities

● 4-vector FP32 operations, as in GeForce3/4● True data-dependent control flow

■ Conditional branch instruction■ Subroutine calls, up to 4 deep■ Jump table (for switch statements)

● Condition codes● New arithmetic instructions (e.g. COS)● User clip-plane support

Page 9: Programmable Graphics Hardware

David Luebke 9 04/19/23

Vertex processor has high resource limits

● 256 instructions per program(effectively much higher w/branching)

● 16 temporary 4-vector registers● 256 “uniform” parameter registers● 2 address registers (4-vector)● 6 clip-distance outputs

Page 10: Programmable Graphics Hardware

David Luebke 10 04/19/23

Fragment processor has clean instruction set

● General and orthogonal instructions● Much better than previous generation● Same syntax as vertex processor:

MUL R0, R1.xyz, R2.yxw;

● Full set of arithmetic instructions:RCP, RSQ, COS, EXP, …

Page 11: Programmable Graphics Hardware

David Luebke 11 04/19/23

Fragment processor hasflexible texture mapping

● Texture reads are just another instruction(TEX, TXP, or TXD)

● Allows computed texture coordinates,nested to arbitrary depth

● Allows multiple uses of a singletexture unit

● Optional LOD control – specify filter extent● Think of it as…

A memory-read instruction,with optional user-controlled filtering

Page 12: Programmable Graphics Hardware

David Luebke 12 04/19/23

Additional fragment processor capabilities

● Read access to window-space position● Read/write access to fragment Z● Built-in derivative instructions

■ Partial derivatives w.r.t. screen-space x or y■ Useful for anti-aliasing

● Conditional fragment-kill instruction● FP32, FP16, and fixed-point data

Page 13: Programmable Graphics Hardware

David Luebke 13 04/19/23

Fragment processor limitations

● No branching■ But, can do a lot with condition codes

● No indexed reads from registers■ Use texture reads instead

● No memory writes

Page 14: Programmable Graphics Hardware

David Luebke 14 04/19/23

Fragment processor has high resource limits

● 1024 instructions● 512 constants or uniform parameters

■ Each constant counts as one instruction

● 16 texture units■ Reuse as many times as desired

● 8 FP32 x 4 perspective-correct inputs● 128-bit framebuffer “color” output

(use as 4 x FP32, 8 x FP16, etc…)

Page 15: Programmable Graphics Hardware

David Luebke 15 04/19/23

NV30 CineFX Technology Summary

ApplicationVertexProcessor

FragmentProcessor

Assem

bly &R

asterization

Fram

ebufferO

perations

Fram

ebuffer

Textures

•FP32 throughout pipeline

•Clean instruction sets

•True branching in vertex processor

•Dependent texture in fragment processor

•High resource limits

Page 16: Programmable Graphics Hardware

David Luebke 16 04/19/23

Programming in assembly is painful

…FRC R2.y, C11.w; ADD R3.x, C11.w, -R2.y; MOV H4.y, R2.y; ADD H4.x, -H4.y, C4.w; MUL R3.xy, R3.xyww, C11.xyww; ADD R3.xy, R3.xyww, C11.z; TEX H5, R3, TEX2, 2D; ADD R3.x, R3.x, C11.x; TEX H6, R3, TEX2, 2D;…

…FRC R2.y, C11.w; ADD R3.x, C11.w, -R2.y; MOV H4.y, R2.y; ADD H4.x, -H4.y, C4.w; MUL R3.xy, R3.xyww, C11.xyww; ADD R3.xy, R3.xyww, C11.z; TEX H5, R3, TEX2, 2D; ADD R3.x, R3.x, C11.x; TEX H6, R3, TEX2, 2D;…

…L2weight = timeval – floor(timeval);L1weight = 1.0 – L2weight;ocoord1 = floor(timeval)/64.0 + 1.0/128.0;ocoord2 = ocoord1 + 1.0/64.0;L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0));L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0));…

…L2weight = timeval – floor(timeval);L1weight = 1.0 – L2weight;ocoord1 = floor(timeval)/64.0 + 1.0/128.0;ocoord2 = ocoord1 + 1.0/64.0;L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0));L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0));… • Easier to read and modify

• Cross-platform• Combine pieces• etc.

Assembly

Page 17: Programmable Graphics Hardware

David Luebke 17 04/19/23

Quick Demo

Page 18: Programmable Graphics Hardware

David Luebke 18 04/19/23

Cg – C for Graphics

● Cg is a GPU programming language● Designed by NVIDIA and Microsoft● Compilers available in beta versions from both

companies

Page 19: Programmable Graphics Hardware

David Luebke 19 04/19/23

Design goals for Cg

● Enable algorithms to be expressed…■ Clearly, and■ Efficiently

● Provide interface continuity■ Focus on DX9-generation HW and beyond■ But provide support for DX8-class HW too■ Support both OpenGL and Direct3D

● Allow easy, incremental adoption

Page 20: Programmable Graphics Hardware

David Luebke 20 04/19/23

Easy adoption for applications

● Avoid owning the application’s data■ No scene graph■ No buffering of vertex data

● Compiler sits on top of existing APIs■ User can examine assembly-code output■ Can compile either at run time, or at application-development

time● Allow partial adoption

e.g. Use Cg vertex program with assembly fragment program● Support current hardware

Page 21: Programmable Graphics Hardware

David Luebke 21 04/19/23

Some points inthe design space

● CPU languages■ C – close to the hardware; general purpose■ C++, Java, lisp – require memory management■ RenderMan – specialized for shading

● Real-time shading languages■ Stanford shading language■ Creative Labs shading language

Page 22: Programmable Graphics Hardware

David Luebke 22 04/19/23

Design strategy

● Start with C(and a bit of C++)

■ Minimizes number of decisions■ Gives you known mistakes instead of unknown ones

● Allow subsetting of the language● Add features desired for GPU’s

■ To support GPU programming model■ To enable high performance

● Tweak to make it fit together well

Page 23: Programmable Graphics Hardware

David Luebke 23 04/19/23

How are current GPU’s different from CPU?

1. GPU is a stream processor■ Multiple programmable processing units■ Connected by data flows

ApplicationVertexProcessor

FragmentProcessor

Assem

bly &R

asterization

Fram

ebufferO

perations

Fram

ebuffer

Textures