Technology Behind AMD’s “Leo Demo”
Jay McKee
MTS Engineer, AMD
Why Forward Rendering?
● Complex materials
● Multiple light types
● Supports hardware anti-aliasing
● Efficient memory usage
● Supports transparency
● BUT, previously could not support a large number of lights
Forward+ Rendering
● Modified forward renderer. Add a compute shader for light culling. Modify the main light loop.
● Lighting and shading done in the same place, all information is preserved.
Forward+ Rendering (continued)
● No limits on parameters for lights and materials
  ● Omni
  ● Spot
  ● Cinematic (arbitrary falloffs, barndoor)
  ● BRDF per material instance
● Simple design, concentrate on rendering, not engine maintenance.
Important DX11 features
● Compute Shaders
● UAV support
Compute Shaders
● In the Leo demo we use two compute shaders:
  ● One for culling lights.
  ● Another for spawning Virtual Point Lights (VPLs) for indirect lighting.
● Culling 3,072 lights takes 1.7 ms on a high-end GPU.
UAVs
● Array(s) of scene light information.
● Array of u32 light indices for storing start/end lights per-tile.
● Array of material instance data
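The per-tile light lists described above can be pictured as one flat index array plus a start/end pair per tile. The following is a minimal CPU-side sketch of that layout, not the demo's code; the names `TileLightLists`, `pack_tile_lists` and `lights_for_tile` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TileLightLists:
    """Per-tile light lists packed into flat arrays (hypothetical layout)."""
    light_start: List[int]    # per tile: first slot in light_indices
    light_end: List[int]      # per tile: one past the last slot
    light_indices: List[int]  # concatenated per-tile lists of scene light indices


def pack_tile_lists(per_tile: List[List[int]]) -> TileLightLists:
    """Concatenate each tile's list of light indices and record its range."""
    starts, ends, flat = [], [], []
    for lights in per_tile:
        starts.append(len(flat))
        flat.extend(lights)
        ends.append(len(flat))
    return TileLightLists(starts, ends, flat)


def lights_for_tile(lists: TileLightLists, tile: int) -> List[int]:
    """Return the scene light indices that affect one tile."""
    return lists.light_indices[lists.light_start[tile]:lists.light_end[tile]]
```

A tile with no lights simply gets an empty range (start equal to end), so the pixel shader's loop runs zero iterations for it.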
Algorithm summary
● Depth Pre-Pass
● Light Culling
  ● Screen divided into tiles. Launch compute shader per tile.
  ● Light info such as position, radius, direction, length passed to light culling compute shader.
  ● Light culling shader projects light bounds to screen-space tiles. Uses scene depth from z pre-pass for z testing against light volumes.
  ● Outputs to UAV describing per-tile light list start/end along with a large UAV of u32 array of light indices.
  ● Output UAVs are passed to main light shaders for looping through lights per-pixel.
Algorithm summary continued
● Render scene materials
  ● Base light accumulation function
    ● Use screen x, y location to determine tileID
    ● From tileID, get light start and end indices
    ● From start index to end index, loop
    ● Entry is index into light array.
    ● Accumulate light hitting pixel
    ● Returns total direct and indirect light hitting pixel.
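The accumulation steps listed above can be sketched on the CPU as follows. This is an illustration under stated assumptions rather than the demo's shader: `intensity` is a hypothetical precomputed scalar contribution per light (a real shader would evaluate falloff, N·L and shadows), and indirect lights (VPLs) are assumed to be stored after the direct lights in the light array.

```python
def accumulate_light(px, py, width, tile_res, light_start, light_end,
                     light_indices, intensity, num_direct_lights):
    """Sum the light reaching pixel (px, py); returns (direct, indirect).

    intensity[i] is a hypothetical stand-in for the shaded contribution of
    light i at this pixel.
    """
    tiles_x = (width + tile_res - 1) // tile_res           # tiles per row
    tile = (px // tile_res) + (py // tile_res) * tiles_x   # tileID from x, y
    direct = 0.0
    indirect = 0.0
    for slot in range(light_start[tile], light_end[tile]):
        light = light_indices[slot]      # entry is an index into the light array
        if light >= num_direct_lights:   # assumed: VPLs follow the direct lights
            indirect += intensity[light]
        else:
            direct += intensity[light]
    return direct, indirect
```

The cost per pixel depends only on the length of its tile's list, not on the total number of lights in the scene.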
Algorithm summary continued
● Material shader
  ● Decides what to do with total incoming light
  ● Passed into material’s BRDF for example
  ● Uses light accumulation building blocks
● Env. lighting, base light accumulation, BRDF, etc. are put together for final pixel color.
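Light Culling Shader Details (1/3)
// 1. prepare
float4 frustum[4];
float minZ, maxZ;
{
    // build the side planes of this tile's frustum
    ConstructFrustum( frustum );
    // each thread reduces the depth samples it handles
    minZ = thread_REDUCE(MIN, depth );
    maxZ = thread_REDUCE(MAX, depth );
    // then reduce across threads into LDS so every thread sees the tile's depth range
    ldsMinZ = SIMD_REDUCE(MIN, minZ );
    ldsMaxZ = SIMD_REDUCE(MAX, maxZ );
    minZ = ldsMinZ;
    maxZ = ldsMaxZ;
}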
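As a serial illustration of the depth reduction in step 1, the sketch below computes one tile's depth range from a row-major depth buffer. The function name, the buffer layout and the requirement that the tile lies inside the screen are assumptions made for this example; the shader performs the same reduction in parallel.

```python
def tile_depth_bounds(depth, width, height, tile_res, tile_x, tile_y):
    """Return (minZ, maxZ) over the pixels of one tile.

    depth is a row-major list of width * height depth values; the tile
    (tile_x, tile_y) is assumed to overlap the screen.
    """
    x0, y0 = tile_x * tile_res, tile_y * tile_res
    # Clamp so partial tiles at the right and bottom edges stay in bounds.
    x1, y1 = min(x0 + tile_res, width), min(y0 + tile_res, height)
    samples = [depth[y * width + x] for y in range(y0, y1) for x in range(x0, x1)]
    return min(samples), max(samples)
```

Lights whose depth extent lies entirely outside this range cannot touch any visible surface in the tile, which is what makes the z test in step 2 useful.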
Light Culling Shader Details (2/3)
__local u32 ldsNLights = 0;
__local u32 ldsLightBuffer[MAX];
// 2. overlap check, accumulate in LDS
// each thread tests every WG_SIZE-th light against the tile
for(int i=threadIdx; i<nLights; i+=WG_SIZE)
{
    Light light = fetchAndTransform( lightBuffer[ i ] );
    if( overlaps( light, frustum ) && overlaps ( light, minZ, maxZ ) )
    {
        AtomicAppend( ldsLightBuffer, i );
    }
}
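Light Culling Shader Details (3/3)
// 3. export to global
__local u32 ldsOffset;
if( threadIdx == 0 )
{
    // thread 0 reserves this tile's range in the global index buffer
    ldsOffset = AtomAdd( ldsNLights );
    globalLightStart[tileIdx] = ldsOffset;
    globalLightEnd[tileIdx] = ldsOffset + ldsNLights;
}
// (a work-group barrier is needed here so all threads see ldsOffset)
// all threads copy the tile's light indices from LDS to the reserved range
for(int i=threadIdx; i< ldsNLights; i+=WG_SIZE)
{
    int dstIdx = ldsOffset + i;
    globalLightIndexBuffer[dstIdx] = ldsLightBuffer[i];
}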
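The three steps above can be mirrored by a serial CPU reference, which is handy for checking a GPU implementation. This sketch makes simplifying assumptions that are not from the slides: each light is already projected to a screen-space circle with a view-space depth extent (the hypothetical `ScreenLight` type), a circle-versus-rectangle test stands in for the frustum-plane test, and plain loops replace the atomic append.

```python
from typing import NamedTuple


class ScreenLight(NamedTuple):
    """Hypothetical pre-projected light: screen circle plus depth extent."""
    x: float       # projected center, in pixels
    y: float
    radius: float  # projected radius, in pixels
    z_near: float  # nearest depth covered by the light volume
    z_far: float   # farthest depth covered by the light volume


def circle_overlaps_rect(light, x0, y0, x1, y1):
    """True if the light's screen circle touches the rectangle [x0,x1] x [y0,y1]."""
    cx = min(max(light.x, x0), x1)  # closest point of the rectangle to the center
    cy = min(max(light.y, y0), y1)
    dx, dy = light.x - cx, light.y - cy
    return dx * dx + dy * dy <= light.radius * light.radius


def cull_lights(lights, tile_bounds, width, height, tile_res):
    """Build per-tile light lists; tile_bounds[tile] = (minZ, maxZ) from the pre-pass."""
    tiles_x = (width + tile_res - 1) // tile_res
    tiles_y = (height + tile_res - 1) // tile_res
    light_start, light_end, light_indices = [], [], []
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            min_z, max_z = tile_bounds[ty * tiles_x + tx]
            x0, y0 = tx * tile_res, ty * tile_res
            x1, y1 = x0 + tile_res, y0 + tile_res
            light_start.append(len(light_indices))
            for i, light in enumerate(lights):
                in_xy = circle_overlaps_rect(light, x0, y0, x1, y1)
                in_z = light.z_near <= max_z and light.z_far >= min_z
                if in_xy and in_z:
                    light_indices.append(i)
            light_end.append(len(light_indices))
    return light_start, light_end, light_indices
```

Because tiles are processed in order here, each tile's range is contiguous by construction; on the GPU the atomic add on a global counter provides the same guarantee without a fixed tile order.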
// BaseLighting.inc
// THIS INC FILE IS ALL THE COMMON LIGHTING CODE
StructuredBuffer<float4> LightParams : register(u0);
StructuredBuffer<uint> LowerBoundLights : register(u1);
StructuredBuffer<uint> UpperBoundLights : register(u2);
StructuredBuffer<int2> LightIndexBuffer : register(u3);

uint GetTileIndex(float2 screenPos)
{
    float tileRes = (float)m_tileRes;
    uint numCellsX = (m_width + m_tileRes - 1)/m_tileRes;
    uint tileIdx = floor(screenPos.x/tileRes)+floor(screenPos.y/tileRes)*numCellsX;
    return tileIdx;
}
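To make the tile index arithmetic concrete, here is a small CPU-side equivalent of GetTileIndex. The parameter names and the 8-pixel tile size used in the note below are assumptions for illustration; the slides do not state the tile size.

```python
def get_tile_index(screen_x, screen_y, width, tile_res):
    """Map a pixel position to its row-major tile index."""
    # Tiles per row, rounded up so a partial tile at the right edge is counted.
    num_cells_x = (width + tile_res - 1) // tile_res
    return int(screen_x // tile_res) + int(screen_y // tile_res) * num_cells_x
```

With a 1280-pixel-wide target and 8-pixel tiles there are 160 tiles per row, so the pixel at (17.5, 9.5) falls in column 2 of row 1, which is tile 162.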
Light Accumulation Pseudo-code
Light Accumulation (2):
StartHLSL BaseLightLoopBegin // THIS IS A MACRO, INCLUDED IN MATERIAL SHADERS
    uint tileIdx = GetTileIndex( pixelScreenPos );
    uint startIdx = LowerBoundLights[tileIdx];
    uint endIdx = UpperBoundLights[tileIdx];

    [loop]
    for ( uint lightListIdx = startIdx; lightListIdx < endIdx; lightListIdx++ )
    {
        int lightIdx = LightIndexBuffer[lightListIdx];

        // Set common light parameters
        float ndotl = max(0, dot(normal, lightVec));

        float3 directLight = 0;
        float3 indirectLight = 0;
Light Accumulation (3):
        if( lightIdx >= numDirectLightsThisFrame )
        {
            CalculateIndirectLight(lightIdx, indirectLight);
        }
        else
        {
            if( IsConeLight( lightIdx ) ) // <<== Can add more light types here
            {
                CalculateDirectSpotlight(lightIdx, directLight);
            }
            else
            {
                CalculateDirectSpherelight(lightIdx, directLight);
            }
        }

        float3 incomingLight = (directLight + indirectLight)*ndotl;
        float shadowTerm = CalcShadow();
EndHLSL

StartHLSL BaseLightLoopEnd
    }
EndHLSL
Material Shader Template:
#include "BaseLighting.inc"

float4 PS ( PSInput i ) : SV_TARGET
{
    float3 totalDiffuse = 0;
    float3 totalSpec = GetEnvLighting();

    $include BaseLightLoopBegin

    // unique material code goes here!! Light accumulation on the pixel for a given light
    // we have total incoming light and direct/indirect light components as well as material params and shadow term
    // use these building blocks to integrate lighting terms
    totalDiffuse += GetDiffuse(incomingLight);
    totalSpec += CalcPhong(incomingLight);

    $include BaseLightLoopEnd

    float3 finalColor = totalDiffuse + totalSpec;
    return float4( finalColor, 1 );
}
Debug Mode Demo
Benchmark
3k dynamic lights
Compute-based Deferred vs. Forward+
[Figure: bar chart of time in ms (axis 0 to 20) for Forward+(L), Forward+(H), Deferred(L) and Deferred(H), broken down into prepass, light processing and final shading.]
Takahiro Harada, Jay McKee, Jason C. Yang, "Forward+: Bringing Deferred Lighting to the Next Level", Eurographics Short Paper (2012)
Depth Pre-Pass Critical
● Pixel overdraw cripples this technique so depth pre-pass is required.
● Depth pre-pass is a good opportunity to use MRT to generate other full-screen data needed for post-fx and other render fx (optional).
Other important points
● Xbox 360 has good bandwidth, so given the limitations of forward rendering, deferred makes a lot of sense there.
● However, ALU throughput is growing at a faster rate than bandwidth. It is more and more feasible to just do the calculations rather than read/write so much data.
● Dynamic branching penalties are not nearly as bad as before. As an optimization, the compute shader can sort lights by type, for example, to minimize penalties.
● All that "light management" CPU side code to decide which lights hit each object for setting constant registers can be ditched!
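The sort-by-light-type optimization mentioned above can be sketched as reordering each tile's slice of the index buffer so that lights of the same type sit next to each other, which keeps consecutive loop iterations on the same branch. This is only an illustration: the function name and the `light_type` array (one integer type id per light) are hypothetical, and the demo's compute shader may do this differently.

```python
def sort_tile_lights_by_type(light_indices, light_start, light_end, light_type):
    """Return a copy of light_indices with each tile's range grouped by light type."""
    sorted_indices = list(light_indices)
    for start, end in zip(light_start, light_end):
        # Sort by type first, then by light index to keep the order deterministic.
        sorted_indices[start:end] = sorted(
            light_indices[start:end], key=lambda i: (light_type[i], i)
        )
    return sorted_indices
```

The start/end arrays are unchanged by the sort, since each tile's range keeps the same length and position.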
Summary
● Modified forward renderer that handles scenes with 1000s of lights.
● Hardware anti-aliasing (MSAA) “automatic”
● Bandwidth friendly.
● Makes the most of the GPU's ALU power (which is growing faster than bandwidth)
Thanks!
Leo Demo website: http://developer.amd.com/samples/demos/pages/AMDRadeonHD7900SeriesGraphicsReal-TimeDemos.aspx
Eurographics 2012: 'Forward+: Bringing Deferred Lighting to the Next Level'