Vertex Shader Tricks New Ways to Use the Vertex Shader to Improve Performance Bill Bilodeau Developer Technology Engineer, AMD

Post on 01-Apr-2015

Page 1: Vertex Shader Tricks New Ways to Use the Vertex Shader to Improve Performance Bill Bilodeau Developer Technology Engineer, AMD

Vertex Shader Tricks

New Ways to Use the Vertex Shader to Improve Performance

Bill Bilodeau
Developer Technology Engineer, AMD

Page 2

Topics Covered

● Overview of the DX11 front-end pipeline
● Common bottlenecks
● Advanced Vertex Shader Features
● Vertex Shader Techniques
● Samples and Results

Page 3

DX11 Front-End Pipeline

● VS – vertex data
● HS – control points
● Tessellator
● DS – generated vertices
● GS – primitives
● Write to UAV at all stages
  ● Starting with DX11.1

[Figure: pipeline stages (Input Assembler, Vertex Shader, Hull Shader, Tessellator, Domain Shader, Geometry Shader, Stream Out) and a block labeled "CB, SRV, or UAV", shown next to the graphics hardware. The hardware is drawn as repeated units, each labeled: Vector GPRs (256 2048-bit registers), Vector ALU (1 64-way single precision operation every 4 clocks), Scalar ALU (1 operation every 4 clocks), Scalar GPRs (256 64-bit registers), and a vector/scalar cross communication bus.]

Page 4

Bottlenecks - VS

● VS Attributes
  ● Limit outputs to 4 attributes (AMD)
    ● This applies to all shader stages (except PS)
● VS Texture Fetches
  ● Too many texture fetches can add latency
    ● Especially dependent texture fetches
    ● Group fetches together for better performance
    ● Hide latency with ALU instructions

Page 5

Bottlenecks - VS

● Use the caches wisely
  ● Avoid large vertex formats that waste pre-VS cache space
  ● DrawIndexed() allows for reuse of processed vertices saved in the post-VS cache
    ● Vertices with the same index only need to get processed once

[Figure: Input Assembler, Pre-VS Cache (hides latency), Vertex Shader, Post-VS Cache (vertex reuse)]
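To make the vertex reuse point concrete, here is a small CPU-side sketch in C++ that counts how many times the vertex shader would have to run for a given index stream. The function names are hypothetical, and the indexed count is a best case that assumes every repeated index still hits the post-VS cache; real caches are finite, so actual reuse can be lower.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Non-indexed Draw(): every vertex in the stream is shaded,
// even when it repeats an earlier one.
std::size_t shadedVerticesNonIndexed(const std::vector<std::uint32_t>& stream)
{
    return stream.size();
}

// DrawIndexed(), best case: each distinct index is shaded once.
// Assumption: the post-VS cache never evicts a vertex before it is reused.
std::size_t shadedVerticesIndexedBestCase(const std::vector<std::uint32_t>& indices)
{
    std::unordered_set<std::uint32_t> distinct(indices.begin(), indices.end());
    return distinct.size();
}
```

For a quad drawn as two triangles with indices 0, 1, 2, 2, 1, 3, the non-indexed path shades 6 vertices while the indexed best case shades 4.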

Page 6

Bottlenecks - GS

● GS
  ● Can add or remove primitives
  ● Adding new primitives requires storing new vertices
    ● Going off chip to store data can be a bandwidth issue
  ● Using the GS means another shader stage
    ● This means more competition for shader resources
    ● Better if you can do everything in the VS

Page 7

Advanced Vertex Shader Features

● SV_VertexID, SV_InstanceID
● UAV output (DX11.1)
● NULL vertex buffer
  ● VS can create its own vertex data

Page 8

SV_VertexID

● Can use the vertex ID to decide what vertex data to fetch
● Fetch from SRV, or procedurally create a vertex

VSOut VertexShader(uint id : SV_VertexID)
{
    float3 vertex = g_VertexBuffer[id];
    …
}

Page 9

UAV buffers

● Write to UAVs from a Vertex Shader
  ● New feature in DX11.1 (UAV at any stage)
● Can be used instead of stream-out for writing vertex data
  ● Triangle output not limited to strips
    ● You can use whatever format you want
● Can output anything useful to a UAV

Page 10

NULL Vertex Buffer

● DX11/DX10 allows this
  ● Just set the number of vertices in Draw()
  ● VS will execute without a vertex buffer bound
● Can be used for instancing
  ● Call Draw() with the total number of vertices
  ● Bind mesh and instance data as SRVs

Page 11

Vertex Shader Techniques

● Full Screen Triangle
● Vertex Shader Instancing
  ● Merged Instancing
● Vertex Shader UAVs

Page 12

Full Screen Triangle

● For post-processing effects
  ● Triangle has better performance than quad
● Fast and easy with VS generated coordinates
  ● No IB or VB is necessary
● Something you should be using for full screen effects

[Figure: full screen triangle with clip space coordinates (-1, -1, 0), (-1, 3, 0), (3, -1, 0)]

Page 13

Full Screen Triangle: C++ code

// Null VB, IB
pd3dImmediateContext->IASetVertexBuffers( 0, 0, NULL, NULL, NULL );
pd3dImmediateContext->IASetIndexBuffer( NULL, (DXGI_FORMAT)0, 0 );
pd3dImmediateContext->IASetInputLayout( NULL );

// Set Shaders
pd3dImmediateContext->VSSetShader( g_pFullScreenVS, NULL, 0 );
pd3dImmediateContext->PSSetShader( … );
pd3dImmediateContext->PSSetShaderResources( … );

pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );

// Render 3 vertices for the triangle
pd3dImmediateContext->Draw(3, 0);

Page 14

Full Screen Triangle: HLSL Code

VSOutput VSFullScreenTest(uint id:SV_VERTEXID)
{
    VSOutput output;

    // generate clip space position
    output.pos.x = (float)(id / 2) * 4.0 - 1.0;
    output.pos.y = (float)(id % 2) * 4.0 - 1.0;
    output.pos.z = 0.0;
    output.pos.w = 1.0;

    // texture coordinates
    output.tex.x = (float)(id / 2) * 2.0;
    output.tex.y = 1.0 - (float)(id % 2) * 2.0;

    // color
    output.color = float4(1, 1, 1, 1);

    return output;
}

Clip space coordinates: (-1, -1, 0), (-1, 3, 0), (3, -1, 0)
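As a quick check of the vertex ID arithmetic, the same expressions can be evaluated on the CPU. The sketch below is a C++ transliteration of the slide's shader math only; the struct and function names are hypothetical, and it is meant for verifying the numbers, not as a replacement for the HLSL.

```cpp
#include <cstdint>

// Hypothetical CPU-side mirror of the shader output (position and texcoord only).
struct FullScreenVertex {
    float x, y, z, w;  // clip space position
    float u, v;        // texture coordinate
};

// Same arithmetic as the HLSL: integer divide/modulo on the vertex ID
// selects one of three corners of an oversized triangle.
FullScreenVertex fullScreenVertex(std::uint32_t id)
{
    FullScreenVertex out{};
    out.x = static_cast<float>(id / 2) * 4.0f - 1.0f;
    out.y = static_cast<float>(id % 2) * 4.0f - 1.0f;
    out.z = 0.0f;
    out.w = 1.0f;
    out.u = static_cast<float>(id / 2) * 2.0f;
    out.v = 1.0f - static_cast<float>(id % 2) * 2.0f;
    return out;
}
```

IDs 0, 1, 2 give positions (-1, -1), (-1, 3), (3, -1) with texture coordinates (0, 1), (0, -1), (2, 1). Interpolating across the triangle, the visible [-1, 1] square receives texture coordinates in [0, 1], and the parts of the triangle outside that square are clipped.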

Page 15

VS Instancing: Point Sprites

● Often done on GS, but can be faster on VS
  ● Create an SRV point buffer and bind to VS
  ● Call Draw or DrawIndexed to render the full triangle list.
  ● Read the location from the point buffer and expand to vertex location in quad
  ● Can be used for particles or Bokeh DOF sprites
  ● Don’t use DrawInstanced for a small mesh

Page 16

Point Sprites: C++ Code

pd3d->IASetIndexBuffer( g_pParticleIndexBuffer, DXGI_FORMAT_R32_UINT, 0 );

pd3d->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );

pd3dImmediateContext->DrawIndexed( g_particleCount * 6, 0, 0);
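The slides do not show how g_pParticleIndexBuffer is filled. One plausible layout, sketched below in C++, gives each particle four vertex IDs and six indices, which matches the `id / 4` and `id % 4` decoding in the shader and the `g_particleCount * 6` index count above. The function name and the corner order 0, 1, 2, 2, 1, 3 are assumptions; a different winding may be needed depending on the cull mode.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds 32-bit indices (to match DXGI_FORMAT_R32_UINT) for a list of quads.
// Particle p owns vertex IDs 4p .. 4p+3; two triangles share corners 1 and 2.
std::vector<std::uint32_t> buildParticleIndexBuffer(std::uint32_t particleCount)
{
    // Assumed corner order for the two triangles of one quad.
    static const std::uint32_t kQuad[6] = {0, 1, 2, 2, 1, 3};

    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(particleCount) * 6);
    for (std::uint32_t p = 0; p < particleCount; ++p) {
        for (std::uint32_t corner : kQuad) {
            indices.push_back(p * 4 + corner);
        }
    }
    return indices;
}
```

The resulting vector would be uploaded as the contents of the index buffer. Because only four distinct indices appear per sprite, the vertex shader can run four times per sprite instead of six.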

Page 17

Point Sprites: HLSL Code

VSInstancedParticleDrawOut VSIndexBuffer(uint id:SV_VERTEXID)
{
    VSInstancedParticleDrawOut output;

    uint particleIndex = id / 4;
    uint vertexInQuad = id % 4;

    // calculate the position of the vertex
    float3 position;
    position.x = (vertexInQuad % 2) ? 1.0 : -1.0;
    position.y = (vertexInQuad & 2) ? -1.0 : 1.0;
    position.z = 0.0;
    position.xy *= PARTICLE_RADIUS;

    position = mul( position, (float3x3)g_mInvView ) + g_bufPosColor[particleIndex].pos.xyz;
    output.pos = mul( float4(position,1.0), g_mWorldViewProj );
    output.color = g_bufPosColor[particleIndex].color;

    // texture coordinate
    output.tex.x = (vertexInQuad % 2) ? 1.0 : 0.0;
    output.tex.y = (vertexInQuad & 2) ? 1.0 : 0.0;

    return output;
}

Page 18

Point Sprite Performance

Time in ms

Method, sprite count     AMD Radeon R9 290X   Nvidia Titan
Indexed, 500K            0.52                 0.52
Non-Indexed, 500K        0.77                 0.87
GS, 500K                 1.38                 0.83
DrawInstanced, 500K      1.77                 5.1
Indexed, 1M              1.02                 1.5
Non-Indexed, 1M          1.53                 1.92
GS, 1M                   2.7                  1.6
DrawInstanced, 1M        3.54                 10.3

Page 19

Point Sprite Performance

● DrawIndexed() is the fastest method
● Draw() is slower but doesn’t need an IB
● Don’t use DrawInstanced() for creating sprites on either AMD or NVIDIA hardware
  ● Not recommended for a small number of vertices

Page 20

Merge Instancing

● Combine multiple meshes that can be instanced many times
  ● Better than normal instancing which renders only one mesh
  ● Instance nearby meshes for smaller bounding box
● Each mesh is a page in the vertex data
  ● Fixed vertex count for each mesh
    ● Meshes smaller than page size use degenerate triangles
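A minimal C++ sketch of the page padding, assuming each mesh is stored as a non-indexed triangle list of positions and that padding vertices are all zero so that every padding triangle has zero area. The type and function names are hypothetical, and the sketch assumes both the mesh vertex count and the page size are multiples of 3.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Float3 { float x, y, z; };

// Pads a triangle-list mesh to exactly pageSize vertices.
// Padding vertices are (0, 0, 0); three identical vertices form a
// degenerate triangle that produces no pixels.
std::vector<Float3> padMeshToPage(std::vector<Float3> vertices, std::size_t pageSize)
{
    if (vertices.size() > pageSize) {
        throw std::length_error("mesh does not fit in one page");
    }
    vertices.resize(pageSize, Float3{0.0f, 0.0f, 0.0f});
    return vertices;
}
```

Concatenating the padded meshes gives a vertex data buffer in which mesh m starts at vertex m * pageSize.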

Page 21

Merge Instancing

[Figure: Mesh Vertex Data is a sequence of fixed length pages (Mesh Data 0, Mesh Data 1, Mesh Data 2, ...). Each page holds Vertex 0, Vertex 1, Vertex 2, Vertex 3, ... followed by zero entries (0 0 0) that form degenerate triangles. Mesh Instance Data is a sequence of instances, each storing a mesh index (Instance 0: Mesh Index 2, Instance 1: Mesh Index 0, ...).]

Page 22

Merged Instancing using VS

● Use the vertex ID to look up the mesh to instance
  ● All meshes are the same size, so (id / SIZE) can be used as an offset to the mesh
  ● Faster than using DrawInstanced()
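The lookup can be sketched on the CPU as follows. This C++ function mirrors what a vertex shader might compute from SV_VertexID when mesh and instance data are bound as SRVs; the names are hypothetical, and the per-instance table is assumed to hold one mesh index per instance, as in the instance data diagram.

```cpp
#include <cstdint>
#include <vector>

// Result of decoding one vertex ID in a merged-instancing draw.
struct MergedFetch {
    std::uint32_t instance;         // which instance this vertex belongs to
    std::uint32_t vertexDataIndex;  // where to read the vertex in the mesh data
};

// pageSize is the fixed vertex count per mesh (SIZE on the slide).
// instanceMeshIndex[i] is the mesh index stored for instance i.
MergedFetch decodeMergedVertex(std::uint32_t id,
                               std::uint32_t pageSize,
                               const std::vector<std::uint32_t>& instanceMeshIndex)
{
    const std::uint32_t instance = id / pageSize;      // id / SIZE
    const std::uint32_t vertexInPage = id % pageSize;  // offset inside the page
    const std::uint32_t meshIndex = instanceMeshIndex.at(instance);
    return MergedFetch{instance, meshIndex * pageSize + vertexInPage};
}
```

With this layout a single Draw() of (instance count * page size) vertices covers every instance, and the instance number can also index per-instance transforms.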

Page 23

Merge Instancing Performance

● Instancing performance test by Cloud Imperium Games for Star Citizen
● Renders 13.5M triangles (~40M verts)
● DrawInstanced version calls DrawInstanced() and uses instance data in a vertex buffer
● Soft Instancing version uses vertex instancing with Draw() calls and fetches instance data from SRV

[Figure: bar chart of time in ms (axis 0 to 30) comparing DrawInstanced and Soft Instancing on AMD Radeon R9 290X and Nvidia GTX 780]

Page 24

Vertex Shader UAVs

● Random access Read/Write in a VS
● Can be used to store transformed vertex data for use in multi-pass algorithms
● Can be used for passing constant attributes between any shader stage (not just from VS)

Page 25

Skinning to UAV

● Skin vertex data then output to UAV
  ● Instance the skinned UAV data multiple times
● Can also be used for non-instanced data
  ● Multiple passes can reuse the transformed vertex data – Shadow map rendering
● Performance is about the same as stream-out, but you can do more …

Page 26

Bounding Box to UAV

● Can calculate and store Bbox in the VS
  ● Use a UAV to store the min/max values (6)
  ● InterlockedMin/InterlockedMax determine min and max of the bbox
    ● Need to use integer values with atomics
● Use the stored bbox in later passes
  ● GPU physics (collision)
  ● Tile based processing

Page 27

Bounding Box: HLSL Code

void UAVBBoxSkinVS(VSSkinnedIn input, uint id:SV_VERTEXID )
{
    // skin the vertex
    . . .
    // output the max and min for the bounding box
    int x = (int) (vSkinned.Pos.x * FLOAT_SCALE); // convert to integer
    int y = (int) (vSkinned.Pos.y * FLOAT_SCALE);
    int z = (int) (vSkinned.Pos.z * FLOAT_SCALE);

    InterlockedMin(g_BBoxUAV[0], x);
    InterlockedMin(g_BBoxUAV[1], y);
    InterlockedMin(g_BBoxUAV[2], z);
    InterlockedMax(g_BBoxUAV[3], x);
    InterlockedMax(g_BBoxUAV[4], y);
    InterlockedMax(g_BBoxUAV[5], z);
    . . .
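The same fixed-point min/max scheme can be emulated on the CPU to see how the six slots behave. The C++ sketch below stands in for the UAV with std::atomic<int> and builds atomic min/max from compare-exchange loops. Two things here are assumptions not stated on the slides: the scale value kFloatScale (a stand-in for FLOAT_SCALE) and the initialization of the min slots to INT_MAX and the max slots to INT_MIN before any vertex is processed.

```cpp
#include <atomic>
#include <climits>

constexpr float kFloatScale = 1000.0f;  // hypothetical stand-in for FLOAT_SCALE

// Atomic min/max built from compare-exchange, playing the role of
// InterlockedMin / InterlockedMax on an integer UAV element.
inline void atomicMin(std::atomic<int>& target, int value)
{
    int current = target.load();
    while (value < current && !target.compare_exchange_weak(current, value)) {}
}

inline void atomicMax(std::atomic<int>& target, int value)
{
    int current = target.load();
    while (value > current && !target.compare_exchange_weak(current, value)) {}
}

// Slots 0..2 hold min x, y, z; slots 3..5 hold max x, y, z (same layout as g_BBoxUAV).
struct BoundingBox {
    std::atomic<int> slot[6];
    BoundingBox()
    {
        for (int i = 0; i < 3; ++i) {
            slot[i].store(INT_MAX);      // assumed clear value for min slots
            slot[i + 3].store(INT_MIN);  // assumed clear value for max slots
        }
    }
};

// Called once per skinned vertex; safe to call from several threads.
inline void accumulate(BoundingBox& box, float x, float y, float z)
{
    const int q[3] = { static_cast<int>(x * kFloatScale),
                       static_cast<int>(y * kFloatScale),
                       static_cast<int>(z * kFloatScale) };
    for (int i = 0; i < 3; ++i) {
        atomicMin(box.slot[i], q[i]);
        atomicMax(box.slot[i + 3], q[i]);
    }
}
```

A later pass would divide the stored integers by the same scale to recover float bounds. The scale trades range for precision, since the scaled positions must fit in a 32-bit integer.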

Page 28

Particle System UAV

● Single pass GPU-only particle system
● In the VS:
  ● Generate sprites for rendering
  ● Do Euler integration and update the particle system state to a UAV

Page 29

Particle System: HLSL Code

uint particleIndex = id / 4;
uint vertexInQuad = id % 4;

// calculate the new position of the vertex
float3 oldPosition = g_bufPosColor[particleIndex].pos.xyz;
float3 oldVelocity = g_bufPosColor[particleIndex].velocity.xyz;

// Euler integration to find new position and velocity
float3 acceleration = normalize(oldVelocity) * ACCELLERATION;
float3 newVelocity = acceleration * g_deltaT + oldVelocity;
float3 newPosition = newVelocity * g_deltaT + oldPosition;
g_particleUAV[particleIndex].pos = float4(newPosition, 1.0);
g_particleUAV[particleIndex].velocity = float4(newVelocity, 0.0);

// Generate sprite vertices
. . .
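For reference, the integration step can be reproduced on the CPU. The C++ sketch below follows the same update order as the shader (velocity first, then position using the new velocity). The type and function names are hypothetical, and the zero-length guard is an addition that the slide code does not have; the HLSL normalizes the old velocity directly.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

struct Particle {
    Vec3 pos;
    Vec3 velocity;
};

// One Euler step. accelMagnitude plays the role of ACCELLERATION and
// deltaT the role of g_deltaT in the shader.
Particle integrate(const Particle& old, float accelMagnitude, float deltaT)
{
    const Vec3& v = old.velocity;
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);

    // Acceleration points along the current velocity.
    // Guard (assumption): a particle at rest gets no acceleration instead of NaN.
    Vec3 accel{0.0f, 0.0f, 0.0f};
    if (len > 0.0f) {
        const float s = accelMagnitude / len;
        accel = Vec3{v.x * s, v.y * s, v.z * s};
    }

    Particle next;
    next.velocity = Vec3{v.x + accel.x * deltaT,
                         v.y + accel.y * deltaT,
                         v.z + accel.z * deltaT};
    next.pos = Vec3{old.pos.x + next.velocity.x * deltaT,
                    old.pos.y + next.velocity.y * deltaT,
                    old.pos.z + next.velocity.z * deltaT};
    return next;
}
```

One point to keep in mind when moving this into the vertex shader: with four vertex IDs per particle, the update runs for each of them, so every invocation should compute the same result from the same input state.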

Page 30

Conclusion

● Vertex shader “tricks” can be more efficient than more commonly used methods
  ● Use SV_VertexID for smarter instancing
    ● Sprites
    ● Merge Instancing
  ● UAVs add lots of freedom to vertex shaders
    ● Bounding box calculation
    ● Single pass VS particle system

Page 31

Demos

● Particle System
● UAV Skinning
  ● Bbox

Page 32

Acknowledgements

● Merge Instancing
  ● Emil Persson, “Graphics Gems for Games” SIGGRAPH 2011
  ● Brendan Jackson, Cloud Imperium
● Thanks to
  ● Nick Thibieroz, AMD
  ● Raul Aguaviva (particle system UAV), AMD
  ● Alex Kharlamov, AMD

Page 33

Questions

[email protected]