Programming graphics hardware
Gustav TaxénResearch Lead
Avalanche Studios
• Sweden’s largest independent game developer
• Office in Stockholm
• ~140 employees
• Just Cause for PS2, PC, XBox, and 360 in 2006
• Just Cause 2 + two other projects in pipeline, different publishers, ~30 million dollars
Our games
• Large, open environments
• Sandbox-gameplay
• High audiovisual standards
• Multiplatform
• PlayStation 3
• XBox 360
• PC
Our technology
• Proprietary game engine
• R&D-intensive development process
• Task forces
• Close to scientific SOTA
• Collaboration with universities
• Contact us for [email protected]
ShadingEye position
Surface positionSurface normalLight position
Diffuse color layersSpecularityBump map
Environment map...
Pixel color
Fixed-function pipelines
Application
CPU
Per-vertexoperations
Primitiveassembly
Rasteriza-tion
Per-fragmentoperations
Framebuffer
Graphics hardware
Programmable pipelines
ApplicationCPU
Programmablevertex units Primitive
assemblyRasteriza-
tion
Programmablefragment units
Framebuffer
Graphics hardware (GPU)
Render target
Uniform vs. varying data
Vertex units
Vertex unit
Vertex shader
Vertex units
Vertex unit
Vertex position (object coord.)
Texture coords
Vertex position (eye coord.)Normal (eye coord.)
Color (RGB)
Color (RGB)
Normal (object coord.)
Matrices
User data
Vertex units
N2
L
float3 main(float3 N : NORMAL, uniform float3 L){ float d = dot(N, L); return float3(d, d, d);}
N1
L
Fragment/pixel units
Pixelunit
Pixel shader
Fragment/pixel units
Pixel unit
Interpolated normal
Interpolated texture coords
Color (RGBA)
Interpolated colorN
L
User data
Fragment/pixel units
float3 main(float2 uv : TEXCOORD0, sampler2D normalMap, uniform float3 L){ float3 N = tex2D(normalMap, uv); float d = dot(N, L); return float3(d, d, d);}
http://nehe.gamedev.net
Stream processors
Stream processors
Data in
Raster ops
Stream processors
VU
PU
VU VU VU
PU PU PU PU PU PU PU
4 Vertex Units + 8 Pixel Units
Stream processors
VU
PU
VU VU VU
PU PU PU PU PU PU PU
Some of the PU:s starve!
Vertex-intensive application
Stream processors
VU
PU
VU VU VU
PU PU PU PU PU PU PU
Some of the VU:s starve!
Pixel-intensive application
Stream processors
12 stream processors
VU VU PU PU
PU PU PU PU
PU PU PU PU
Units assume different ”roles”.Allows all to be used!
Stream processors
• Limited instruction set
• Optimized for 4D float vectors
• Inefficient branching
• Small, fast RAM
• Small, fast cache
• Excellent for SIMD ops
Stream processor
Data
Stream processor
...
nVidia GeForce 8800
• 128 stream processors @ 1.35 GHz
• 2560 x 1600
• 16x antialiasing using multisampling
• Up to 8 render targets, up to FP32 (128 bit) formats
• SLI (x2)
• Full support for Direct3D 10, OpenGL 2.0 and CUDA
• Blue-Ray-, HD-DVD-codecs
• API for physics (Quantum Effects)
nVidia GeForce 8800
nVidia GeForce 8800
Stream processors
Texture lookup andfiltering
Level 1 cache
nVidia GeForce 8800
Level 2 cache
Frame buffer
Raster ops
General purposeregisters
Direct3D 10
Input Assembler
Vertex Shader
Geometry Shader
Rasterizer
Pixel Shader
Output Merger
Stream output
Video Memory
Vertex buffer
Index buffer
Texture
Texture
Stream buffer
Texture
Texture
Depth/Stencil
Render target
Direct3D 10
Geometry shader
IBM/Sony/Toshiba Cell Broadband Engine
• PowerPC Processor Element (PPE)
• Memory management, task management
• Synergistic Processor Unit (SPU)
• Special instruction set
• 128 bytes memory (prog + data)
• 8 Synergistic Processing Elements (SPE)
• Designed for SIMD-algorithms
• Element Interconnect Bus (EIC)
• Primary memory bus
IBM/Sony/Toshiba Cell Broadband Engine
IBM/Sony/Toshiba Cell Broadband Engine
Cell BE uses a vectory datatype with 16 bytesExtended C/C++ provides access to it
IBM/Sony/Toshiba Cell Broadband Engine
int a[4] = { 1, 3, 5, 7 }; int b[4] = { 2, 4, 6, 8 }; int c[4];
c[0] = a[0] + b[0]; c[1] = a[1] + b[1]; c[2] = a[2] + b[2]; c[3] = c[3] + c[3];
int a[4] __attribute__((aligned(16))) = { 1, 3, 5, 7 }; int b[4] __attribute__((aligned(16))) = { 2, 4, 6, 8 }; int c[4] __attribute__((aligned(16))); __vector signed int *va = (__vector signed int *) a; __vector signed int *vb = (__vector signed int *) b; __vector signed int *vc = (__vector signed int *) c; *vc = vec_add(*va, *vb);
IBM/Sony/Toshiba Cell Broadband Engine
This is the reason whyconditionals are expensive on GPU:s!
Programmable pipelines (recap)
ApplicationCPU
Programmablevertex units Primitive
assemblyRasteriza-
tion
Programmablefragment units
Framebuffer
Graphics hardware (GPU)
Render target
Shader assembler
OpenGL Vertex Program Assembler
MOV R1, R2.yzwx;MUL R1.xyz, R2, R3;ADD R1, R2, -R3;RCP R1, R2.w;
dp3 r7.w, VECTOR_VERTEXTOLIGHT, VECTOR_VERTEXTOLIGHTrsq VECTOR_VERTEXTOLIGHT.w, r7.wdst r7, r7.wwww, VECTOR_VERTEXTOLIGHT.wwww mul r6, r5, ATTENUATION.w
D3D Vertex Shader Assembler
High-level shading languages
• Direct3D High-Level Shading Language (HLSL)
• nVidia Cg
• OpenGL Shading Language (GLSL)
• Pixar RenderMan
• Stanford Real-Time Shading Language
• ...
Shaders mean more work...
• Replaces automatic T&L...
• Must transform from object- to eye coords ourselves!
• Must perform lighting computations ourselves!
• Replaces blending
• Must combine multiple textures and alpha-blending ourselves!
Hardware capabilities
• For HLSL, function set is defined by the DirectX version
• Cg specifies a number of profiles with different function sets
• GLSL programs use #extension tags to specify the capabilities they need
Demos
Shader language shoot-out
• GLSL
• Pros: Simple to use, integrates well with OpenGL
• Cons: Many confusing reserved words in syntax
• Cg
• Pros: D3D/OpenGL cross-compability, used in games industry
• Cons: Slightly awkward syntax
• HLSL
• Pros: Games industry standard, integrates well with D3D, D3D 10
• Cons: Slightly awkward syntax, Windows platform only
Questions?
[email protected]@avalanchestudios.sewww.avalanchestudios.se