Grass, Fur and all Things Hairy - AMD at GDC14

Upload: amd-developer-central

Post on 16-Apr-2017

TRANSCRIPT

Page 1: Grass, Fur and all Things Hairy - AMD at GDC14

Grass, Fur and all things hairy

Nicolas Thibieroz, Gaming Engineering Manager, AMD
Karl Hillesland, Senior Research Engineer, AMD

Page 2

Next-gen Grass, Fur and Hair

● The time for next-gen quality is now
● Tomb Raider pioneered next-gen hair
  ● Even on PS4/XB1
● Users expect this level of quality for next-gen titles
● You need to start thinking about this
● This talk is about making high-quality fur, grass and hair run at real-time performance

Page 3

TressFX applied to Grass, Fur and Hair

● Variations of the same technique can be used for all those applications
● In all cases the core principles of next-gen quality are still needed:
  ● Compute simulations
  ● Anti-aliasing
  ● Transparency
  ● Volumetric self-shadowing
  ● A good lighting model

Page 4

Forward Rendering Pipeline – a refresher

● Consists of three steps:
  ● Hair simulation
  ● Shade and store fragments into buffers
  ● Fetch shaded fragments, sort and render

Page 5

Per-Pixel Linked Lists

// Retrieve current pixel count and increase counter
uint uPixelCount = LinkedListUAV.IncrementCounter();
uint uOldStartOffset;

// Exchange indices in LinkedListHead texture corresponding to pixel location
InterlockedExchange(LinkedListHeadUAV[address], uPixelCount, uOldStartOffset);

// Append new element at the end of the Fragment and Link Buffer
Element.uNext = uOldStartOffset;
LinkedListUAV[uPixelCount] = Element;

● Head UAV
  ● Each pixel location has a “head pointer” to a linked list in the PPLL UAV
● PPLL UAV
  ● As new fragments are rendered, they are added to the next open location in the PPLL (using the UAV counter)
  ● A link is created to the fragment pointed to by the head pointer
  ● The head pointer then points to the new fragment

[Diagram: Head UAV and PPLL UAV]
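To make the insertion logic above concrete, here is a minimal CPU-side model in Python, assuming fragments are inserted one at a time. The class and method names (PPLL, insert, fragments_at) and the NULL end-of-list value are hypothetical; on the GPU, the counter increment and the head swap are the atomic IncrementCounter() and InterlockedExchange() calls shown in the HLSL.

```python
from dataclasses import dataclass

NULL = 0xFFFFFFFF  # assumed "end of list" value the Head UAV is cleared to


@dataclass
class Node:
    depth: float
    data: int
    next: int  # index of the previous head of this pixel's list


class PPLL:
    """CPU model of a per-pixel linked list; names are hypothetical."""

    def __init__(self, width, height, capacity):
        self.width = width
        self.head = [NULL] * (width * height)  # Head UAV: one pointer per pixel
        self.nodes = []                         # PPLL UAV: shared node pool
        self.capacity = capacity                # fixed size, guessed up front

    def insert(self, x, y, depth, data):
        count = len(self.nodes)                 # models IncrementCounter()
        if count >= self.capacity:
            return False                        # pool exhausted: fragment is dropped
        address = y * self.width + x
        old_start = self.head[address]          # models InterlockedExchange(head, count, old)
        self.head[address] = count
        self.nodes.append(Node(depth, data, old_start))
        return True

    def fragments_at(self, x, y):
        """Walk the list from the head pointer (newest fragment first)."""
        out, i = [], self.head[y * self.width + x]
        while i != NULL:
            out.append(self.nodes[i])
            i = self.nodes[i].next
        return out
```

Because each new node links to the previous head, walking a list returns fragments newest first, in no particular depth order, which is why the resolve pass has to sort. When the pool is full, insert() returns False, which models the “missing fragments” case discussed later.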

Page 6

Forward Rendering Pipeline – a refresher: Hair Simulation

[Diagram: input geometry (model space) and simulation parameters feed a chain of compute shaders (CS, CS, CS) that write post-simulation geometry (world space) to a UAV]

Page 7

Forward Rendering Pipeline – a refresher: Shade and Store Fragments into Buffers

[Diagram: the VS extrudes line segments (world space) into non-indexed triangles (homogeneous clip space); the PS computes coverage, lighting and shadows, then stores each fragment (depth, color, coverage, next) in the PPLL UAV and updates the Head UAV and stencil, rendering to a null RT]

Page 8

Forward Rendering Pipeline – a refresher: Fetch Shaded Fragments, Sort and Render

[Diagram: a full-screen quad (VS, PS) with stencil test; the PS reads the Head UAV and PPLL UAV, performs fragment sorting and manual blending, and writes to the render target]
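The resolve pass can be sketched per pixel as a sort followed by manual back-to-front blending. This Python sketch assumes straight (non-premultiplied) RGBA and that a larger depth value means farther away; the function names are illustrative, not taken from TressFX.

```python
def over(rgb, alpha, dst):
    """Straight-alpha 'over' blend of one color onto dst."""
    return tuple(c * alpha + d * (1.0 - alpha) for c, d in zip(rgb, dst))


def resolve_pixel(fragments, background):
    """fragments: (depth, rgb, alpha) in linked-list order; larger depth = farther."""
    result = background
    for _, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        result = over(rgb, alpha, result)  # manual blending, back to front
    return result
```

Since the fragments are sorted first, the result does not depend on the order in which they were appended to the list.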

Page 9

Forward Rendering Performance

● Main cost in forward rendering mode is in the shading part
  ● All fragments are lit and shadowed before being stored
  ● PPLL storing is typically not the bottleneck!
● Don’t need maximum quality on all fragments
  ● “Tail” fragments need only “good enough” quality
● Solution: use shader LOD

Page 10

Forward vs Deferred Rendering Pipeline

Deferred rendering pipeline
● Hair simulation
● Store fragment properties into buffers
● Fetch fragment properties, sort, shade and render
  ● Full shading on K frontmost fragments
  ● “Tail” fragments are shaded with a simpler light equation and shadowing algorithm

Forward rendering pipeline
● Hair simulation
● Full shading and store fragments into buffers
● Fetch shaded fragments, sort and render

Page 11

Deferred Rendering Pipeline: Hair Simulation – unchanged!

[Diagram: input geometry (model space) and simulation parameters feed a chain of compute shaders (CS, CS, CS) that write post-simulation geometry (world space) to a UAV]

Page 12

Deferred Rendering Pipeline – a refresher: Store Fragment Properties into Buffers

[Diagram: the VS expands line segments (world space) into an indexed triangle list (homogeneous clip space) using an index buffer; the PS computes coverage and stores fragment properties (depth, tangent, coverage, next) in the PPLL UAV and updates the Head UAV and stencil, rendering to a null RT]

Page 13

Deferred Rendering Pipeline: Fetch Fragments, Sort, Shade and Render

[Diagram: a full-screen quad (VS, PS) with stencil test; the PS reads the Head UAV and PPLL UAV, applies lighting and shadows, and writes to the render target]

● K frontmost fragments: full shading, sorting and manual blending
● Tail fragments: cheap shading, no sorting and manual blending
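A sketch of the deferred resolve with shading LOD, assuming smaller depth = nearer and straight alpha. heapq.nsmallest selects the K frontmost fragments, the tail is shaded with a cheaper function and blended in arrival order, and only the front K are sorted and fully shaded. The shading callbacks and function names are hypothetical placeholders for the real lighting and shadowing code.

```python
import heapq


def over(rgb, alpha, dst):
    """Straight-alpha 'over' blend of one color onto dst."""
    return tuple(c * alpha + d * (1.0 - alpha) for c, d in zip(rgb, dst))


def resolve_with_shading_lod(fragments, k, full_shade, cheap_shade, background):
    """fragments: (depth, properties) in arbitrary order; smaller depth = nearer.
    full_shade / cheap_shade map properties to (rgb, alpha)."""
    # Select the K frontmost fragments without sorting the whole list.
    front = heapq.nsmallest(k, range(len(fragments)), key=lambda i: fragments[i][0])
    front_set = set(front)
    result = background
    # Tail: cheap shading, blended in arrival order (no sorting).
    for i, (_, props) in enumerate(fragments):
        if i not in front_set:
            rgb, alpha = cheap_shade(props)
            result = over(rgb, alpha, result)
    # Front K: full shading, sorted back to front, blended over the tail.
    for i in sorted(front, key=lambda i: fragments[i][0], reverse=True):
        rgb, alpha = full_shade(fragments[i][1])
        result = over(rgb, alpha, result)
    return result
```

Only min(K, n) fragments per pixel pay for full shading, which is where the savings of the deferred approach come from.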

Page 14

Deferred Rendering Shading LOD Optimization

● Deferred approach allows a reduction in shading cost: “Shader LOD”
  ● Only sort and shade K frontmost fragments at high quality
  ● “Simple” shading and out-of-order rendering on tail fragments
  ● Single-tap shadowing on tail fragments
● Very little quality difference compared to full shading
  ● But much better performance!

Technique                    Cost
Out of order, no shading     1.31 ms
Out of order, shading        2.80 ms
Forward PPLL, shading        3.38 ms
Deferred PPLL, shading       2.13 ms

Fur model with ~130,000 fur strands, running on AMD Radeon 7970 @ 1080p

● Shading cost is ~1.5 ms
● PPLL cost is ~0.58 ms

Fast!
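One way to read the ~1.5 ms and ~0.58 ms callouts is as differences between rows of the table: shading cost as the gap between the two out-of-order rows, and PPLL cost as the gap between forward PPLL and out-of-order shading. That interpretation is an assumption, but the arithmetic matches:

```python
# Costs from the table above, in milliseconds.
out_of_order_no_shading = 1.31
out_of_order_shading = 2.80
forward_ppll_shading = 3.38
deferred_ppll_shading = 2.13

shading_cost = out_of_order_shading - out_of_order_no_shading   # ~1.5 ms
ppll_cost = forward_ppll_shading - out_of_order_shading         # ~0.58 ms
deferred_saving = forward_ppll_shading - deferred_ppll_shading  # shader LOD gain

print(f"shading {shading_cost:.2f} ms, PPLL {ppll_cost:.2f} ms, "
      f"deferred saves {deferred_saving:.2f} ms "
      f"({deferred_saving / forward_ppll_shading:.0%} of forward PPLL)")
```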

Page 15

Shading LOD

[Comparison images: full-quality shading forced on for all fragments vs. shading LOD]

Page 16

Geometry Optimizations

● A great portion of time was spent in the GPU front-end
  ● 920,000 line segments for fur model
● Expansion from line segments to triangles was done in the GS, and then in the VS with Draw()
  ● Each segment would create a quad (two triangles) with 6 vertices

DrawIndexed() method
[Diagram: line segments expanded into quads; consecutive quads share vertices]
Indexed triangle list = { (0, 1, 2), (2, 1, 3), (2, 3, 4), (4, 3, 5), (…) };

Draw() method
[Diagram: line segments expanded into quads; each quad has its own 6 vertices]
Triangle list = { (0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (…) };

● Offline creation of the index buffer plus DrawIndexed() maximizes post-vertex-cache use!
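The indexed triangle list above follows a simple pattern if each strand vertex i is extruded into two vertices, 2i and 2i + 1 (an assumed vertex layout that reproduces the slide's indices). This Python sketch builds the index buffer offline and, for comparison, the non-indexed Draw() list; the function names are hypothetical.

```python
def strand_indices(num_segments, base_vertex=0):
    """Indexed triangle list for one strand expanded into a ribbon.
    Assumes strand vertex i is extruded into vertices 2*i and 2*i + 1."""
    tris = []
    for s in range(num_segments):
        a = base_vertex + 2 * s
        tris.append((a, a + 1, a + 2))
        tris.append((a + 2, a + 1, a + 3))
    return tris


def build_index_buffer(num_strands, segments_per_strand):
    """Offline: concatenate the index lists of all strands into one flat buffer."""
    verts_per_strand = 2 * (segments_per_strand + 1)
    indices = []
    for strand in range(num_strands):
        for tri in strand_indices(segments_per_strand, strand * verts_per_strand):
            indices.extend(tri)
    return indices


def draw_triangles(num_segments):
    """Non-indexed Draw(): every quad gets its own 6 vertices."""
    return [(3 * t, 3 * t + 1, 3 * t + 2) for t in range(2 * num_segments)]
```

For a strand with n segments, the indexed form references 2(n + 1) unique vertices instead of 6n, and consecutive quads reuse the two vertices they share, which is what lets the post-transform vertex cache help.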

Page 17

Distance-based LOD System Optimization

● Input line segments have a random order
● Just render fewer (but thicker) fragments when far away!
● Needs shading adjustments to ensure smooth quality transitions
● Increase alpha threshold for fragment inclusion when far away
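A minimal sketch of such a LOD curve. The linear falloff, the near/far distances and the 25% floor are assumptions for illustration; the talk does not give the actual curve, and the alpha-threshold adjustment is not modeled. The sketch also assumes whole strands are shuffled, so that drawing a prefix thins the model evenly.

```python
def lod_params(distance, near, far, min_fraction=0.25):
    """Hypothetical LOD curve: full density up to `near`, linear falloff to
    `min_fraction` of the strands at `far`; width scales up to compensate."""
    t = min(max((distance - near) / (far - near), 0.0), 1.0)
    fraction = 1.0 - t * (1.0 - min_fraction)
    width_scale = 1.0 / fraction  # fewer but thicker strands keep coverage similar
    return fraction, width_scale


def strands_to_draw(total_strands, fraction):
    # Assumes strands are stored in random order, so drawing only the first N
    # thins the model uniformly instead of leaving bald patches.
    return max(1, int(total_strands * fraction))
```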

Page 18

Other Optimizations

● PPLL Head UAV uses a RWTexture2D instead of a Buffer
  ● Results in more efficient caching for UAV accesses
● Avoid GPR indexing for sorting
  ● Sorting K frontmost fragments required an array of General Purpose Registers with random indexing into it
  ● Used an ALU-based indexing approach to improve performance
● TO DO: compute shader simulation optimizations
  ● Currently a set of multiple compute shaders
  ● Looking at combining some of these, optimizing shaders and output formats

Page 19

Per-Pixel Linked Lists UAV Memory Considerations

● How much memory is needed?
  ● Guesstimate for a given usage model
  ● Max (hair pixels × average overdraw) fragments
● What happens when I run out?
  ● Missing fragments
● What can be done about it?
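The guesstimate above translates into a simple budget calculation. The 12-byte node size is the PPLL node size quoted later in the talk (depth, data, next); the 4-byte head pointer per screen pixel and the example inputs are assumptions.

```python
PPLL_NODE_BYTES = 12  # depth, data, next (node size quoted later in the talk)
HEAD_BYTES = 4        # assumption: one 32-bit head pointer per screen pixel


def ppll_budget(hair_pixels, average_overdraw, width, height):
    """Guesstimate of PPLL memory for a given usage model.
    Returns (max fragments, total bytes for node pool + head buffer)."""
    max_fragments = int(hair_pixels * average_overdraw)
    node_pool = max_fragments * PPLL_NODE_BYTES
    head_buffer = width * height * HEAD_BYTES
    return max_fragments, node_pool + head_buffer
```

For example, 300,000 hair pixels at an average overdraw of 10 need 3,000,000 nodes, or about 44 MB including a 1080p head buffer.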

Page 20

k-Buffer in Memory

Page 21

[Diagram: PP Linked-List (PPLL) vs. k-Buffer. The PPLL node pool holds all fragments, so its size must be guessed (“How big?”). The k-Buffer is a fixed-size array of k entries per pixel: a simple memory bound]

Page 22

The Front k

Approximation to avoid massive sorting
● Only sort the front k fragments per pixel
● Blend the rest out-of-order

If deferring for shader LOD … also
● Full quality shade on front k
● Cheap shade on rest

[Image: 20 frags/pixel (ave), red = over 100; k is 4, 8, 16]

Page 23

[Diagram: each pixel’s fragments split into the k-Buffer and the Tail]

Can’t know the front k until all fragments are processed

Page 24

k-Buffer

For each fragment in each pixel:
[Diagram: new fragment, k-buffer with the index of its furthest entry, tail fragment, blend, tail color]

Page 25

If New Fragment in k

1. Swap with furthest
2. Find new furthest
3. Blend with tail

[Diagram: new fragment, k-buffer with the index of its furthest entry, tail fragment, blend, tail color]

Page 26

If Not in k

1. Blend with tail

[Diagram: new fragment, k-buffer with the index of its furthest entry, tail fragment, blend, tail color]
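The per-fragment update on the last three slides can be modeled in Python for one pixel. Assumptions: smaller depth = nearer, straight-alpha colors, empty slots cleared to far depth with a transparent color (so blending an evicted empty slot into the tail is a no-op), and the tail accumulated in arrival order as premultiplied color plus alpha. The names are hypothetical.

```python
import math

EMPTY = (math.inf, (0.0, 0.0, 0.0), 0.0)  # cleared slot: far depth, transparent color


def over(rgb, alpha, dst):
    """Straight-alpha 'over' blend of one color onto dst."""
    return tuple(c * alpha + d * (1.0 - alpha) for c, d in zip(rgb, dst))


def blend_tail(tail, frag):
    """Accumulate a fragment into the tail (premultiplied rgb, alpha) in arrival order."""
    (tail_rgb, tail_a), (_, rgb, a) = tail, frag
    return over(rgb, a, tail_rgb), a + tail_a * (1.0 - a)


def kbuffer_pixel(fragments, k, background):
    """fragments: (depth, rgb, alpha) in arbitrary order; smaller depth = nearer."""
    kbuf = [EMPTY] * k
    furthest = 0                           # index of the furthest k-buffer entry
    tail = ((0.0, 0.0, 0.0), 0.0)          # empty tail: no color, no coverage
    for frag in fragments:
        if frag[0] < kbuf[furthest][0]:    # new fragment is in the front k
            evicted, kbuf[furthest] = kbuf[furthest], frag      # 1. swap with furthest
            furthest = max(range(k), key=lambda t: kbuf[t][0])  # 2. find new furthest
            tail = blend_tail(tail, evicted)                    # 3. blend with tail
        else:                              # not in k
            tail = blend_tail(tail, frag)                       # 1. blend with tail
    # Composite: background, then the unsorted tail, then the sorted front k.
    tail_rgb, tail_a = tail
    result = tuple(t + b * (1.0 - tail_a) for t, b in zip(tail_rgb, background))
    for _, rgb, alpha in sorted(kbuf, key=lambda f: f[0], reverse=True):
        result = over(rgb, alpha, result)  # cleared slots have alpha 0: no-op
    return result, kbuf
```

When a pixel has no more than k fragments, nothing reaches the tail and the result equals a full sort; beyond k, only the tail is blended out of order.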

Page 27

From PPLL to k-Buffer

PPLL
● 1st pass: write frags to mem
● 2nd pass, for each pixel: for each fragment, read fragment from mem, update k-buffer (reg), blend tail fragment (reg); then sort and blend k-buffer (reg)

k-Buffer in memory
● 1st pass, for each fragment in each pixel: update k-buffer (mem), blend tail fragment (mem)
● 2nd pass, for each pixel: read k-buffer from mem, sort and blend k-buffer (reg)

Page 28

k-Buffer

[Diagram: k-buffer as a screen width × screen height × k array]

● 8 bytes each (depth and data)
● PPLL nodes were 12 bytes (depth, data, next)
● k = 4, 8, 16
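Because the k-buffer is a fixed array, its size follows directly from the screen size and k. This sketch uses the 8-byte entries above plus the 32-bit per-pixel mutex/count/index word described on a later slide; the PPLL function counts node-pool bytes only. Function names are hypothetical.

```python
KBUF_ENTRY_BYTES = 8    # depth and data
CONTROL_WORD_BYTES = 4  # mutex/count/index word per pixel (32 bits)
PPLL_NODE_BYTES = 12    # depth, data, next


def kbuffer_bytes(width, height, k):
    """Fixed cost, independent of how many fragments are rendered."""
    return width * height * (k * KBUF_ENTRY_BYTES + CONTROL_WORD_BYTES)


def ppll_node_bytes(num_fragments):
    """Grows with the number of fragments that must be stored."""
    return num_fragments * PPLL_NODE_BYTES
```

At 1920 × 1080, k = 8 gives 141,004,800 bytes, while a PPLL pool for 3.5 M fragments takes 42,000,000 bytes. The k-buffer's main benefit is that the bound is known in advance; whether it also uses less memory depends on resolution, k and how much headroom the PPLL pool needs.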

Page 29

PPLL: 2nd Pass

[Diagram: new fragment, index of furthest, tail fragment, blend, tail color; the k-buffer is held in registers]

Page 30

k-Buffer in Memory: 1st Pass

[Diagram: new fragment, index of furthest, tail fragment, blend unit, tail color; the k-buffer and a per-pixel word (mutex, index, …) live in memory]

Page 31

Mutex/Count/Index Buffer

[Diagram: one 32-bit word per pixel (screen width × screen height). Fields from the high bit: Mutex bit, Initialized bit, Max Index (4 bits), Count (remainder)]
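This sketch packs the per-pixel control word using one reading of the diagram: Mutex in bit 31, Initialized in bit 30, Max Index in bits 26 to 29 and Count in bits 0 to 25. These bit positions are an assumption; the release code on the next slide shifts the index left by 28, so the layout in the actual implementation may differ. Function names are hypothetical.

```python
MUTEX_BIT = 1 << 31
INIT_BIT = 1 << 30
INDEX_SHIFT = 26
INDEX_MASK = 0xF << INDEX_SHIFT        # 4-bit max index (k up to 16)
COUNT_MASK = (1 << INDEX_SHIFT) - 1    # count uses the remaining low bits


def pack(locked, initialized, max_index, count):
    if not (0 <= max_index < 16 and 0 <= count <= COUNT_MASK):
        raise ValueError("field out of range")
    return ((MUTEX_BIT if locked else 0) | (INIT_BIT if initialized else 0)
            | (max_index << INDEX_SHIFT) | count)


def unpack(word):
    return (bool(word & MUTEX_BIT), bool(word & INIT_BIT),
            (word & INDEX_MASK) >> INDEX_SHIFT, word & COUNT_MASK)
```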

Page 32

Spinlock Mutex

[allow_uav_condition]
for (; i < MAX_LOOP_COUNT && !bStop; ++i)
{
    uint oldID;
    InterlockedExchange(tRWMutex[vScreenAddress], RESERVED, oldID);
    if ((oldID & RESERVED) != RESERVED)
    {
        [[ … Do work ]]
        DeviceMemoryBarrier();
        tRWMutex[vScreenAddress] = (new_max_id << 28) + INITED;
        bStop = true;
    } // end mutex check
} // end spinlock loop

[Callouts: Paranoia, Try, Do Work, Release]

Page 33

Find New Max Depth

uint new_max_depth = u_inDepth;
[unroll] for (int t = 0; t < KBUFFER_SIZE; t++)
{
    uint element_depth = DEPTH(vScreenAddress, t);
    if (element_depth > new_max_depth)
    {
        new_max_depth = element_depth;
        new_max_id = t;
    }
}

Generally more memory traffic than PPLL

Page 34

Initialization: The first k

Options:
● Clear k-buffer fullscreen (0,1)
● Clear k-buffer stenciled, 3rd pass
● Clear on first fragment
● Count

[Diagram: control word fields from the high bit: Mutex bit, Initialized bit, Max Index (4 bits), Count (remainder)]

Page 35

The first k

InterlockedAdd(tRWMutex[vScreenAddress], 1, oldCount);
[allow_uav_condition]
if (oldCount < KBUFFER_SIZE)
{
    DATA(vScreenAddress, oldCount) = u_inData;
    DEPTH(vScreenAddress, oldCount) = u_inDepth;
    return uint2(u_outDepth, u_outData);
}

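A single-threaded Python model of this “first k” path, with hypothetical names. On the GPU the read-and-increment is a single InterlockedAdd, so every fragment gets a distinct slot without taking the mutex. Assuming the Initialized and Mutex flags live in the high bits, once either is set the returned count is at least KBUFFER_SIZE and the fragment falls through to the spinlock path.

```python
def try_store_first_k(control, kbuf_depth, kbuf_data, pixel, depth, data, k):
    """Model of the 'first k' path for one fragment (not thread-safe).
    On the GPU, the two lines below are one InterlockedAdd on the control word."""
    old_count = control[pixel]
    control[pixel] = old_count + 1
    if old_count < k:
        kbuf_depth[pixel][old_count] = depth  # slot old_count belongs to this fragment
        kbuf_data[pixel][old_count] = data
        return True                           # stored directly, no mutex needed
    return False                              # k-buffer already full: use the spinlock path
```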

Page 36

Models
● 2k polygons
● ~20k hairs
● ~130k hairs

Stats
● 2-3.5 M fragments
● 200-300k pixels

Shading
● One point light & shadow
● 2 shifted specular lobes

Page 37

Depth Complexity

[Heat map legend: grey = 1, blue = 8, green = 50, red = 100+]

Page 38

Contention

[Heat map: max attempts per pixel, k = 4. Dark blue = 1, aqua ≤ 4, bright aqua ≤ 8]

Page 39

Performance

Time ratio to out-of-order blending:
● Forward PPLL: 1.02 to 1.4
● Forward k-Buffer: 1.2 to 1.4
● Deferred PPLL: 0.7 to 0.9
● Deferred k-Buffer: 0.9 to 1.6

Page 40

K-Buffer in Memory

● Simple memory bound
● Can be less memory
● Usually slower
  ● Increased memory traffic

Page 41

Simulation

Page 42

Hair Simulation

● Length Constraint
● Local Constraint
● Global Constraint
● Model Transform
● Collision Shapes
● External Forces (wind, gravity, etc.)

Page 43

Fur Simulation

● Length Constraint
● Local Constraint
● Global Constraint
● Model Transform
● Collision Shapes
● External Forces (wind, gravity, etc.)

Page 44

Grass Simulation

● Length Constraint
● Local Constraint (1D)
● Global Constraint
● Model Transform
● Collision Shapes
● External Forces (wind, gravity, etc.)

Page 45

Constraint Method (iterative)

● Used for length, local and global constraints
● Length is most difficult to converge
  ● Particularly under large movement

[Diagram: strand vertices p0, p1, …, pn-1 linked by constraints C0, C1, …, Cn-2]
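A minimal sketch of an iterative length constraint, written as a generic position-based projection (an assumption; the talk does not show TressFX's exact update). The root vertex is pinned and each iteration walks the strand once. With a large initial stretch, a single iteration leaves the first segment stretched, which illustrates why length is the hardest constraint to converge.

```python
import math


def segment_lengths(points):
    return [math.dist(a, b) for a, b in zip(points, points[1:])]


def satisfy_length_constraints(points, rest_lengths, iterations):
    """Iteratively project each segment back to its rest length (in place).
    points: list of [x, y, z]; points[0] is the pinned root."""
    for _ in range(iterations):
        for i, rest in enumerate(rest_lengths):
            a, b = points[i], points[i + 1]
            delta = [bj - aj for aj, bj in zip(a, b)]
            length = math.sqrt(sum(c * c for c in delta))
            if length == 0.0:
                continue
            correction = (length - rest) / length
            # The root is pinned, so segment 0 moves only its child vertex;
            # other segments split the correction between both ends.
            w_a, w_b = (0.0, 1.0) if i == 0 else (0.5, 0.5)
            for j in range(3):
                a[j] += w_a * delta[j] * correction
                b[j] -= w_b * delta[j] * correction
    return points
```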

Page 46

Tridiagonal Matrix Formulation

● Direct solve for length constraint
  ● Almost zero stretch
  ● Limited to smaller time steps (stability)
● Still cheap
  ● Leverages matrix structure of strands
  ● Two sweeps of strand
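The “two sweeps” correspond to the structure of a tridiagonal solve. Below is the generic Thomas algorithm (forward elimination, then back substitution) in Python; it is a stand-in to show the two-sweep, O(n) structure, not the specific matrix from the VRIPHYS 2013 paper. The function name is hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system, O(n).
    lower[0] and upper[-1] are unused. Assumes the system is well conditioned
    (e.g. diagonally dominant), so no pivoting is needed."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    # Sweep 1: forward elimination from root to tip.
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Sweep 2: back substitution from tip to root.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```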

Page 47

Tridiagonal Matrix Formulation

“Tridiagonal Matrix Formulation for Inextensible Hair Strand Simulation”, VRIPHYS, 2013

Page 48

Demos

Page 49

Summary

● Next-gen look is possible now!
● Deferred rendering for shading LOD is fastest
● k-buffer in memory is an option for memory-constrained situations
● High-quality grass and fur simulation with compute

Upcoming TressFX 2 SDK sample update with fur scenario at http://developer.amd.com/tools-and-sdks/graphics-development/amd-radeon-sdk/

Page 50

Questions?

Page 51

Extras

Page 52

Isoline Tessellation for hair/fur? 1/2

● Isoline tessellation has two tess factors
  ● First is line density (lines per invocation)
  ● Second is line detail (segments per line)
● In theory provides an easy LOD system
  ● Variable line density and detail by increasing both tessellation factors based on distance

[Images: Tess = (1,1), Tess = (2,1), Tess = (2,2), Tess = (2,3), Tess = (3,3)]

Page 53

Isoline Tessellation for hair/fur? 2/2

● In practice isoline tessellation is not cost effective for this scenario
● Lines are always 1-pixel thick
  ● Need GS to extrude them into triangles for smooth edges
    ● Major impact on performance!
  ● Alternative is to enable MSAA
    ● Most engines are deferred, so this causes a large performance impact
  ● No extrusion for smoothing edges and no MSAA = poor quality!
● Bottom line: a pure vertex shader solution is faster
  ● LOD benefit is easily done in VS (more on this later)
  ● Curvature is rarely a problem (dependent on vertices per strand at authoring time)

Page 54

AA, Self-shadowing and Transparency

[Comparison images: Basic Rendering; Antialiasing; Antialiasing + Self Shadowing; Antialiasing + Self Shadowing + Transparency]