![Page 1: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/1.jpg)
A User-Programmable Vertex Engine
Erik Lindholm
Mark Kilgard
Henry Moreton
NVIDIA Corporation
Presented by Han-Wei Shen
![Page 2: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/2.jpg)
Where does the Vertex Engine fit?
Traditional Graphics Pipeline: Transform & Lighting → setup / rasterizer → texture blending → frame-buffer anti-aliasing
![Page 3: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/3.jpg)
GeForce 3 Vertex Engine
Pipeline: Transform & Lighting or Vertex Program → setup / rasterizer → texture blending → frame-buffer anti-aliasing
![Page 4: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/4.jpg)
API Support
• Designed to fit into the OpenGL and D3D APIs
• Program mode vs. fixed-function mode
• Load and bind program
• Simple to add to old D3D and OpenGL programs
![Page 5: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/5.jpg)
Programming Model
• Enable vertex program: glEnable(GL_VERTEX_PROGRAM_NV);
• Create vertex program object
• Bind vertex program object
• Execute vertex program object
![Page 6: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/6.jpg)
Create Vertex Program
• Programs (assembly) are defined inline as character strings
static const GLubyte vpgm[] = "!!VP1.0 \
    DP4 o[HPOS].x, c[0], v[0]; \
    DP4 o[HPOS].y, c[1], v[0]; \
    DP4 o[HPOS].z, c[2], v[0]; \
    DP4 o[HPOS].w, c[3], v[0]; \
    MOV o[COL0], v[3]; \
    END";
![Page 7: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/7.jpg)
Create Vertex Program (2)
• Load and bind vertex programs, similar to texture objects
glLoadProgramNV(GL_VERTEX_PROGRAM_NV, 7, strlen(programString), programString);
…
glBindProgramNV(GL_VERTEX_PROGRAM_NV, 7);
![Page 8: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/8.jpg)
Invoke Vertex Program
• The vertex program is initiated when a vertex is given, i.e., when the application issues
glBegin(…)
glVertex3f(x,y,z)
…
glEnd()
![Page 9: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/9.jpg)
Let’s look at the sample program
static const GLubyte vpgm[] = "!!VP1.0 \
    DP4 o[HPOS].x, c[0], v[0]; \
    DP4 o[HPOS].y, c[1], v[0]; \
    DP4 o[HPOS].z, c[2], v[0]; \
    DP4 o[HPOS].w, c[3], v[0]; \
    MOV o[COL0], v[3]; \
    END";
o[HPOS] = M(c[0], c[1], c[2], c[3]) * v[0] (what is HPOS?)
o[COL0] = v[3] (what is COL0?)
The program calculates the clip-space position of the vertex and assigns v[3] to the vertex as its diffuse color
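To make the trace concrete, here is a minimal CPU-side sketch in plain C of what the sample program computes for one vertex. It is an emulation for study, not how the hardware or driver works; the names Vec4, VertexOut, dp4, and run_sample_program are hypothetical and appear nowhere in the OpenGL API.

```c
/* A quad-float register: every vertex-program register holds x, y, z, w. */
typedef struct { float x, y, z, w; } Vec4;

/* Hypothetical container for the two outputs the sample program writes. */
typedef struct { Vec4 hpos; Vec4 col0; } VertexOut;

/* DP4: four-component dot product of two registers. */
static float dp4(Vec4 a, Vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Emulates the sample program for one vertex.
   c[0..3] are treated as the rows of a 4x4 matrix, v[0] is the input
   position, and v[3] is the input color. */
static VertexOut run_sample_program(const Vec4 c[4], const Vec4 v[16]) {
    VertexOut o;
    o.hpos.x = dp4(c[0], v[0]);  /* DP4 o[HPOS].x, c[0], v[0]; */
    o.hpos.y = dp4(c[1], v[0]);  /* DP4 o[HPOS].y, c[1], v[0]; */
    o.hpos.z = dp4(c[2], v[0]);  /* DP4 o[HPOS].z, c[2], v[0]; */
    o.hpos.w = dp4(c[3], v[0]);  /* DP4 o[HPOS].w, c[3], v[0]; */
    o.col0 = v[3];               /* MOV o[COL0], v[3];         */
    return o;
}
```

Each DP4 produces one component of the matrix-vector product, so four DP4 instructions together apply a full 4x4 transform to the position.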
![Page 10: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/10.jpg)
Programming Model
• Vertex Source: 16x4 registers, v[0] … v[15]
• Program Constants: 96x4 registers, c[0] … c[95]
• Temporary Registers: 12x4 registers, R0 … R11
• Vertex Program: 128 instructions
• Vertex Output: 15x4 registers, o[HPOS], o[COL0], o[COL1], o[FOGP], o[PSIZ], o[TEX0] … o[TEX7]
• All registers are quad floats
![Page 11: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/11.jpg)
Input Vertex Attributes
• v[0] – v[15]
• Aliased (tracked) with conventional per-vertex attributes (Table 3)
• Use glVertexAttribNV() to explicitly assign values
• Can also specify a scalar value to the vertex attribute array - glVertexAttributesNV()
• Can change values inside or outside a glBegin()/glEnd() pair
![Page 12: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/12.jpg)
Program Constants
• Can only change values outside a glBegin()/glEnd() pair
• No automatic aliasing
• Can be used to track OpenGL matrices (modelview, projection, texture, etc.)
• Example:
glTrackMatrixNV(GL_VERTEX_PROGRAM_NV, 0, GL_MODELVIEW_PROJECTION_NV, GL_IDENTITY_NV)
- tracks 4 contiguous program constants starting with c[0]
![Page 13: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/13.jpg)
Program Constants (cont’d)
DP4 o[HPOS].x, c[0], v[OPOS]
DP4 o[HPOS].y, c[1], v[OPOS]
DP4 o[HPOS].z, c[2], v[OPOS]
DP4 o[HPOS].w, c[3], v[OPOS]
What does it do?
![Page 14: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/14.jpg)
Program Constants (cont’d)
glTrackMatrixNV(GL_VERTEX_PROGRAM_NV, 4, GL_MODELVIEW, GL_INVERSE_TRANSPOSE_NV)
DP3 R0.x, c[4], v[NRML]
DP3 R0.y, c[5], v[NRML]
DP3 R0.z, c[6], v[NRML]
What does it do?
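One way to answer the question is to emulate the three DP3 instructions on the CPU. The sketch below assumes that c[4], c[5], and c[6] hold the first three rows of the inverse-transpose modelview matrix, as the tracking call requests; under that assumption the instructions transform the normal into eye space. Vec4, dp3, and transform_normal are hypothetical names used only for this illustration.

```c
/* A quad-float register. */
typedef struct { float x, y, z, w; } Vec4;

/* DP3: three-component dot product; the w components are ignored. */
static float dp3(Vec4 a, Vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* rows[0..2] stand in for c[4], c[5], c[6]; n stands in for v[NRML].
   Returns the contents of R0 after the three DP3 instructions.
   R0.w is never written by them, so it is left at zero here. */
static Vec4 transform_normal(const Vec4 rows[3], Vec4 n) {
    Vec4 r0 = {0.0f, 0.0f, 0.0f, 0.0f};
    r0.x = dp3(rows[0], n);  /* DP3 R0.x, c[4], v[NRML] */
    r0.y = dp3(rows[1], n);  /* DP3 R0.y, c[5], v[NRML] */
    r0.z = dp3(rows[2], n);  /* DP3 R0.z, c[6], v[NRML] */
    return r0;
}
```

For a modelview matrix that scales x by 2, the inverse transpose scales x by 0.5, so the normal (1, 1, 0) becomes (0.5, 1, 0). The result is generally not unit length and would still need normalizing before lighting.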
![Page 15: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/15.jpg)
Hardware Block Diagram
Vertex In → Vertex Attribute Buffer (VAB) → Vector FP Core → Vertex Out
![Page 16: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/16.jpg)
Vertex Attribute Buffer (VAB)
[Figure: VAB with entries 0, 1, …, 14, 15, each 128 bits (32 x 4), plus dirty bits, feeding the IB]
![Page 17: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/17.jpg)
HW Block Diagram
[Figure: IB 0 … n-1 feed a SIMD Vector Unit and a Special Function Unit; sources pass through swizzle/negate (sw/neg) and results through a write mask; the units are connected to Constant Memory, Instruction Memory, and Registers, and results go to OB 0 … n-1]
![Page 18: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/18.jpg)
Data Path
[Figure: three source operands (X Y Z W each) pass through Negate/Swizzle stages into the FPU Core; the result (X Y Z W) passes through the Write Mask]
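The source modifiers and the write mask in this data path can be sketched in plain C as follows. This is only an illustration of the semantics; Vec4, swizzle, negate, write_masked, and the MASK_ constants are hypothetical names, and the array layout f[0..3] = x, y, z, w is an assumption made for compact indexing.

```c
/* A quad-float register stored as an array: f[0..3] = x, y, z, w. */
typedef struct { float f[4]; } Vec4;

/* Swizzle: each result component may be taken from any source component.
   Indices 0..3 select x, y, z, w. */
static Vec4 swizzle(Vec4 s, int ix, int iy, int iz, int iw) {
    Vec4 r = {{ s.f[ix], s.f[iy], s.f[iz], s.f[iw] }};
    return r;
}

/* Negate: flips the sign of all four components of a source. */
static Vec4 negate(Vec4 s) {
    Vec4 r = {{ -s.f[0], -s.f[1], -s.f[2], -s.f[3] }};
    return r;
}

/* One bit per destination component. */
enum { MASK_X = 1, MASK_Y = 2, MASK_Z = 4, MASK_W = 8 };

/* Write mask: only the components whose bit is set are updated;
   the others keep their previous value. */
static Vec4 write_masked(Vec4 dst, Vec4 result, unsigned mask) {
    for (int i = 0; i < 4; ++i) {
        if (mask & (1u << i)) {
            dst.f[i] = result.f[i];
        }
    }
    return dst;
}
```

For example, an instruction of the form MOV R1.xz, -R0.yzxw corresponds to write_masked(r1, negate(swizzle(r0, 1, 2, 0, 3)), MASK_X | MASK_Z).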
![Page 19: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/19.jpg)
Instruction Set: The ops
• 17 instructions total
• MOV, MUL, ADD, MAD, DST
• DP3, DP4
• MIN, MAX, SLT, SGE
• RCP, RSQ, LOG, EXP, LIT
• ARL
![Page 20: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/20.jpg)
Instruction Set: The Core Features
• Immediate access to sources
• Swizzle/negate on all sources
• Write mask on all destinations
• DP3, DP4 most common graphics ops
• Cross product is MUL+MAD with swizzling
• LIT instruction implements Phong lighting
![Page 21: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/21.jpg)
Dot Product Instruction
DP3 R0.x, R1, R2
R0.x = R1.x * R2.x + R1.y * R2.y + R1.z * R2.z
DP4 R0.x, R1, R2
4-component dot product
![Page 22: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/22.jpg)
MUL instruction
MUL R1, R0, R2 (component-wise multiplication)
R1.x = R0.x * R2.x
R1.y = R0.y * R2.y
R1.z = R0.z * R2.z
R1.w = R0.w * R2.w
![Page 23: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/23.jpg)
MAD instruction
MAD R1, R2, R3, R4
R1 = R2 * R3 + R4
*: component-wise multiplication
Example:
MAD R1, R0.yzxw, R2.zxyw, -R1
What does it do?
![Page 24: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/24.jpg)
Cross Product Coding Example
# Cross product R2 = R0 x R1
MUL R2, R0.zxyw, R1.yzxw;
MAD R2, R0.yzxw, R1.zxyw, -R2;
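The two-instruction cross product can be checked with a small CPU emulation. The sketch below mirrors the MUL and MAD steps one for one; Vec4 and the helper names (mul, mad, neg, zxyw, yzxw, cross_mul_mad) are hypothetical and exist only for this illustration.

```c
/* A quad-float register. */
typedef struct { float x, y, z, w; } Vec4;

/* MUL: component-wise product. */
static Vec4 mul(Vec4 a, Vec4 b) {
    return (Vec4){ a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w };
}

/* MAD: component-wise a * b + c. */
static Vec4 mad(Vec4 a, Vec4 b, Vec4 c) {
    return (Vec4){ a.x * b.x + c.x, a.y * b.y + c.y,
                   a.z * b.z + c.z, a.w * b.w + c.w };
}

/* Source negation. */
static Vec4 neg(Vec4 a) {
    return (Vec4){ -a.x, -a.y, -a.z, -a.w };
}

/* The two swizzles used by the example. */
static Vec4 zxyw(Vec4 a) { return (Vec4){ a.z, a.x, a.y, a.w }; }
static Vec4 yzxw(Vec4 a) { return (Vec4){ a.y, a.z, a.x, a.w }; }

/* R2 = R0 x R1, following the two instructions on the slide. */
static Vec4 cross_mul_mad(Vec4 r0, Vec4 r1) {
    Vec4 r2 = mul(zxyw(r0), yzxw(r1));      /* MUL R2, R0.zxyw, R1.yzxw;      */
    r2 = mad(yzxw(r0), zxyw(r1), neg(r2));  /* MAD R2, R0.yzxw, R1.zxyw, -R2; */
    return r2;
}
```

Written out per component, the result is (y0*z1 - z0*y1, z0*x1 - x0*z1, x0*y1 - y0*x1), which is the standard cross product. The w component works out to w0*w1 - w0*w1 = 0.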
![Page 25: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/25.jpg)
Lighting instruction
LIT R1, R0 (Phong light model)
Input: R0 = (diffuse, specular, ??, shininess)
Output: R1 = (1, diffuse, specular^shininess, 1)
Usually followed by
DP3 o[COL0], c[21], R1 (assuming c[21] is used)
where c[xx] = (ka, kd, ks, ??)
![Page 26: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/26.jpg)
Ready to trace some programs?
![Page 27: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/27.jpg)
Previous Work: Geometry Engine
• High bandwidth + lots of Flops
• Low clock rate
• No architectural continuity
• VERY hard to program
• Some high-level language support (maybe)
• A compromise solution (vtx, prim, pix, …)
![Page 28: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/28.jpg)
Alternative: The CPU
• Low bandwidth + reasonable Flops
• High clock rate
• Excellent architectural continuity
• VERY hard to use efficiently
• Excellent high-level language support
• Flexible, but often too slow
![Page 29: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/29.jpg)
New Design: The Vertex Engine
• Simple hardware for a commodity GPU
• Allows user to manipulate vertex transform
• Simple-to-use programming model
• Superset of fixed-function mode
![Page 30: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/30.jpg)
Why Vertex Processing?
• Very parallel
• Use single vertex programming model
• Hardware can batch or interleave
• KISS
![Page 31: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/31.jpg)
Why Not Primitive Processing?
• Face culling and clipping break parallelism
• Complicates memory accesses
• Inefficient (control takes time)
• Let hardware designers optimize
![Page 32: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/32.jpg)
Programming Model: Vertex I/O
• Streaming vertex architecture
• Source data converted to floats
• Source data loaded
• Run program
• Destination data drained
• Destination data re-formatted for hw
![Page 33: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/33.jpg)
Hardware Implementation
• Vector SIMD Unit + Special Function Unit
• Multithreaded and pipelined to hide latency
• Any one instruction/cycle
• All instructions have equal latency
• Free swizzling/negate/write mask support
![Page 34: A User-Programmable Vertex Engine](https://reader035.vdocuments.us/reader035/viewer/2022062314/5681444d550346895db0ea95/html5/thumbnails/34.jpg)
Conclusion
• Very simple, efficient implementation
• Allows vertex programming continuity
• Stanford Imagine Architecture
• A work in progress, lots more to come…
• We welcome your feedback