(or: “dx11 – unpacking the box”) jon jansen - developer technology engineer, nvidia dx11...
TRANSCRIPT
![Page 1: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/1.jpg)
![Page 2: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/2.jpg)
(Or: “DX11 – unpacking the box”)
Jon Jansen - Developer Technology Engineer, NVIDIA
DX11 Performance Gems
![Page 3: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/3.jpg)
DX11 Performance Gems
Topics CoveredCase study: Opacity Mapping– Using tessellation to accelerate
lighting effects– Accelerating up-sampling with
GatherRed()– Playing nice with AA using
SV_SampleIndex– Read-only depth for soft particles
![Page 4: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/4.jpg)
DX11 Performance Gems
Topics Covered (cont)Deferred Contexts
– How DX11 can help you pump the API harder than you thought possible
– Much viscera - very gory!
![Page 5: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/5.jpg)
Not Covered‘Aesthetic’ tessellation
DirectCompute
!!! Great talks on these topics coming up !!!
DX11 Performance Gems
![Page 6: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/6.jpg)
DEFERRED CONTEXTS
DX11 Performance Gems
![Page 7: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/7.jpg)
Deferred Contexts• What are your options when your API
submission thread is a bottleneck?• What if submission could be done on multiple
threads, to take advantage of multi-core?– This is what Deferred Contexts solve in DX11
DX11 Performance Gems
![Page 8: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/8.jpg)
• So why not just submit directly to an API from multiple threads and be done?
Thread 2:
Deferred Contexts
DX11 Performance Gems
D3D:
Thread 1:
![Page 9: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/9.jpg)
• So why not just submit directly to an API from multiple threads and be done?
Thread 2:
Deferred Contexts
DX11 Performance Gems
D3D:
Thread 1:
![Page 10: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/10.jpg)
• So why not just submit directly to an API from multiple threads and be done?
Thread 2:
Deferred Contexts
DX11 Performance Gems
D3D:
Thread 1:
![Page 11: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/11.jpg)
• So why not just submit directly to an API from multiple threads and be done?
Thread 2:
Deferred Contexts
DX11 Performance Gems
D3D:
Thread 1:
![Page 12: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/12.jpg)
• So why not just submit directly to an API from multiple threads and be done?
Thread 2:
Deferred Contexts
DX11 Performance Gems
D3D:
Thread 1:
WHOOPS!!
![Page 13: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/13.jpg)
Deferred Contexts• A Deferred Context is a device-like interface
for building command-lists
DX11 Performance Gems
// Creation is very straightforwardID3D11DeviceContext* pDC = NULL;hr = pD3DDevice->CreateDeferredContext(0,&pDC);
![Page 14: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/14.jpg)
Deferred Contexts• DX11 uses the same ID3D11DeviceContext
interface for ‘immediate’ API calls• Immediate context is the only way to finally
submit work to the GPU• Access it via ID3D11Device::GetImmediateContext()• ID3D11Device has no submission API
DX11 Performance Gems
![Page 15: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/15.jpg)
DX11 Performance Gems
Thread 2:
DC1:
DC2:
Thread 1:
D3DIM
M D
C:
Render submission calls
Render submission calls
![Page 16: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/16.jpg)
DX11 Performance Gems
Thread 2:
DC1:
DC2:
FinishCommandList()
FinishCommandList()
Thread 1:
D3DIM
M D
C:
![Page 17: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/17.jpg)
DX11 Performance Gems
Thread 2:
DC1:
DC2:
FinishCommandList()
FinishCommandList()
Inter-thread sync, collate/order/buffer CL’s
Thread 1:
D3DIM
M D
C:
![Page 18: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/18.jpg)
DX11 Performance Gems
Thread 2:
DC1:
DC2:
FinishCommandList()
FinishCommandList()
Thread 1:
D3DIM
M D
C:
Inter-thread sync, collate/order/buffer CL’s
![Page 19: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/19.jpg)
DX11 Performance Gems
Thread 2:
DC1:
DC2:
FinishCommandList()
FinishCommandList()
Thread 1:
D3DIM
M D
C:
Inter-thread sync, collate/order/buffer CL’s
Start of new CL
![Page 20: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/20.jpg)
...etc
DX11 Performance Gems
Thread 2:
DC1:
DC2:
FinishCommandList()
...etc
...etc
...etc
FinishCommandList()
FinishCommandList() FinishCommandList()
Inter-thread sync, collate/order/buffer CL’s
Thread 1:
D3DIM
M D
C:
![Page 21: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/21.jpg)
...etc
DX11 Performance Gems
Thread 2:
DC1:
DC2:
‘RenderMain’ Thread
D3DFinishCommandList()
...etc
...etc
...etc
FinishCommandList()
FinishCommandList() FinishCommandList()
ExecuteCommandList()
Inter-thread sync, collate/order/buffer CL’s
Thread 1:
IMM
DC
:
![Page 22: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/22.jpg)
...etc
DX11 Performance Gems
Thread 2:
DC1:
DC2:
‘RenderMain’ Thread
D3DFinishCommandList()
ExecuteCommandList()
...etc
...etc
...etc
FinishCommandList()
FinishCommandList() FinishCommandList()
ExecuteCommandList()
Inter-thread sync, collate/order/buffer CL’s
Thread 1:
IMM
DC
:
![Page 23: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/23.jpg)
DX11 Performance Gems
...etcThread 2:
DC1:
DC2:
‘RenderMain’ Thread
D3DFinishCommandList()
ExecuteCommandList()
ExecuteCommandList()
ExecuteCommandList()
...etc
...etc
...etc
FinishCommandList()
FinishCommandList() FinishCommandList()
ExecuteCommandList()
Inter-thread sync, collate/order/buffer CL’s
Thread 1:
IMM
DC
:
![Page 24: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/24.jpg)
Deferred Contexts• Flexible DX11 internals
– DX11 runtime has built-in implementation– BUT: the driver can take charge, and use its own
implementation• e.g. command lists could be built at a lower level,
moving more of the CPU work onto submission threads
DX11 Performance Gems
![Page 25: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/25.jpg)
Deferred Contexts: Perf• Try to balance workload over contexts/threads
– but submission workloads are seldom predictable– granularity helps (if your submission threads are
able to pick up work dynamically)– if possible, do heavier submission workloads first– ~12 CL’s per core, ~1ms per CL is a good target
DX11 Performance Gems
![Page 26: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/26.jpg)
Deferred Contexts: Perf• Target reasonable command list sizes
– think of # of draw calls in a command list much like # triangles in a draw call
– i.e. each list has overhead~equivalent to a few dozen API calls
DX11 Performance Gems
![Page 27: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/27.jpg)
Deferred Contexts: Perf• Leave some free CPU time!
– having all threads busy can cause CPU saturation and prevent “server” thread from rendering*
– ‘busy’ includes busy-waits (i.e. polling)
*this is good general advice: never use more than N-1 CPU coresfor your game engine. Always leave one for the graphics driver
DX11 Performance Gems
![Page 28: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/28.jpg)
Deferred Contexts: Perf• Mind your memory!
– each Map() call associates memory with the CL– releasing the CL is the only way to release the
memory– could get tight in a 2GB virtual address space!
DX11 Performance Gems
![Page 29: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/29.jpg)
Deferred Contexts: wrapping up• DC’s + multi-core = pump the API HARD• Real-world specifics coming up in Dan’s talk
CALL TO ACTION: If you wish you could submit more batches and you’re not already using DC’s, then experiment!
DX11 Performance Gems
![Page 30: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/30.jpg)
CASE STUDY: OPACITY MAPPING
DX11 Performance Gems
![Page 31: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/31.jpg)
Case study: Opacity Mapping
DX11 Performance Gems
![Page 32: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/32.jpg)
Case study: Opacity MappingGOALS:
– Plausible lighting for a game-typical particle system (16K largish translucent billboards)
– Self-shadowing (using opacity mapping)• Also receive shadows from opaque objects
– 3 light sources (all with shadows)
DX11 Performance Gems
![Page 33: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/33.jpg)
Case study: Opacity Mapping
DX11 Performance Gems
+ opacity mapping*
=
*e.g. [Jansen & Bavoil, 2010]
![Page 34: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/34.jpg)
Case study: Opacity Mapping• Brute force (per-pixel lighting/shadowing) is
not performant– 5 to 10 FPS on GTX 560 Ti* or HD 6950*
• Not surprising considering amount of overdraw...
DX11 Performance Gems
*1680x1050, 4xAA
![Page 35: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/35.jpg)
Case study: Opacity Mapping
DX11 Performance Gems
![Page 36: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/36.jpg)
Case study: Opacity Mapping– Vertex lighting? Faster, but shows significant delta
from ‘ground truth’ of per-pixel lighting...
DX11 Performance Gems
PS lighting VS lighting
5 to 10 FPS 60 to 65 FPS
![Page 37: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/37.jpg)
Case study: Opacity Mapping• Use DX11 tessellation to calculate lighting at
an intermediate ‘sweet-spot’ rate in the DS• High-frequency components can remain at
per-pixel or per-sample rates, as required– opacity– visibility
DX11 Performance Gems
![Page 38: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/38.jpg)
Case study: Opacity Mapping
DX11 Performance Gems
Surface placement
Light attenuationOpaque shadowsOpacity shadows
Visibility
VSrate
PSrate
samplerate
Texturing
PS lighting
![Page 39: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/39.jpg)
Case study: Opacity MappingPS lighting DS lighting
DX11 Performance Gems
Surface placement
Light attenuationOpaque shadowsOpacity shadows
Visibility
DSrate
VSrate
PSrate
samplerate
VSrate
PSrate
samplerate
Texturing
![Page 40: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/40.jpg)
Case study: Opacity Mapping
DX11 Performance Gems
DS lightingPS lighting (VS lighting)
5 to 10 FPS 60 to 65 FPS 40 to 45 FPS
![Page 41: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/41.jpg)
Case study: Opacity Mapping• Adaptive tessellation gives best of both worlds
– VS-like calculation frequency– PS-like relationship with screen pixel frequency
• 1:15 works well in this case
• Applicable to any slowly-varying shading result– GI, other volumetric algos
DX11 Performance Gems
![Page 42: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/42.jpg)
Case study: Opacity Mapping• Main bottleneck is fill-rate following tess-opt• So... render particles to low-res offscreen buffer*
– significant benefit, even with tess opt (1.2x to 1.5x for GTX 560 Ti / HD 6950)
– BUT: simple bilinear up-sampling from low-res can lead to artifacts at edges...
DX11 Performance Gems *[Cantlay, 2007]
![Page 43: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/43.jpg)
DX11 Performance Gems
Case study: Opacity MappingGround truth (full res) Bilinear up-sample (half-res)
![Page 44: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/44.jpg)
Case study: Opacity Mapping• Instead, we use nearest-depth up-sampling*
– conceptually similar to cross-bilateral filtering**– compares high-res depth with neighbouring low-
res depths– samples from closest matching neighbour at depth
discontinuities (bilinear otherwise)
DX11 Performance Gems
*[Bavoil, 2010] **[Eisemann & Durand, 2004] [Petschnigg et al, 2004]
![Page 45: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/45.jpg)
Case study: Opacity Mapping
DX11 Performance Gems Near-Z
Far-Z
full-respixel
lo-resneighbours
Z10(@UV10)
ZFull(@UV)
if( abs(Z00-ZFull) < kDepthThreshold && abs(Z10-ZFull) < kDepthThreshold && abs(Z01-ZFull) < kDepthThreshold && abs(Z11-ZFull) < kDepthThreshold ){ return loResColTex.Sample(sBilin,UV);} else{ return loResColTex.Sample(sPoint,NearestUV);}
Z00(@UV00)
![Page 46: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/46.jpg)
Case study: Opacity Mapping
DX11 Performance Gems Near-Z
Far-Z if( abs(Z00-ZFull) < kDepthThreshold && abs(Z10-ZFull) < kDepthThreshold && abs(Z01-ZFull) < kDepthThreshold && abs(Z11-ZFull) < kDepthThreshold ){ return loResColTex.Sample(sBilin,UV);} else{ return loResColTex.Sample(sPoint,NearestUV);}
NearestUV = UV10
Z10(@UV10)
ZFull(@UV)
Z00(@UV00)
![Page 47: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/47.jpg)
DX11 Performance Gems
Case study: Opacity MappingGround truth (full res) Nearest-depth up-sample
![Page 48: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/48.jpg)
Case study: Opacity Mapping• Use SM5 GatherRed() to efficiently fetch 2x2 low-res
depth neighbourhood in one go
DX11 Performance Gems
float4 zg = g_DepthTex.GatherRed(g_Sampler,UV);float z00 = zg.w; // w: floor(uv)float z10 = zg.z; // z: ceil(u),floor(v)float z01 = zg.x; // x: floor(u),ceil(v)float z11 = zg.y; // y: ceil(uv)
![Page 49: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/49.jpg)
Case study: Opacity Mapping• Nearest-depth up-sampling plays nice with AA
when run per-sample– and surprisingly performant! (FPS hit < 5%)
DX11 Performance Gems
float4 UpsamplePS( VS_OUTPUT In,uint uSID :
SV_SampleIndex) : SV_Target
![Page 50: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/50.jpg)
Case study: Opacity Mapping• Soft particles (depth-based alpha fade)
– requires read from scene depth– for < DX11, this used to mean...
• EITHER: sacrificing depth-test (along with any associated acceleration)
• OR: maintaining two depth surfaces (along with any copying required)
DX11 Performance Gems
![Page 51: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/51.jpg)
DX11 solution:
Case study: Opacity Mapping
DX11 Performance Gems
depthtexture
‘traditional’DSV
depth textureSRVCreateShaderResourceView()
CreateDepthStencilView()
DX11 read-only DSV
CreateDepthStencilView()+ D3D11_DSV_READ_ONLY_DEPTH
NEW!!!in DX11
![Page 52: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/52.jpg)
STEP 1: render opaque objects to depth texture
Case study: Opacity Mapping
DX11 Performance Gems
pDC->OMSetRenderTargets(...)
// Render opaque objects
depth textureSRV
DX11 read-only DSV
‘traditional’DSV
![Page 53: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/53.jpg)
STEP 2: render soft particles with depth-test
Case study: Opacity Mapping
DX11 Performance Gems
depth textureSRV
DX11 read-only DSV
pDC->OMSetRenderTargets(...)pDC->PSSetShaderResources(...)// (Valid D3D state!)
// Render soft particles
‘traditional’DSV
![Page 54: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/54.jpg)
Case study: Opacity Mapping‘Hard’ particles Soft particles
DX11 Performance Gems
![Page 55: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/55.jpg)
Case study: wrapping up• 5x to 10x overall speedup• DX11 tessellation gave us most of it• But rendering at reduced-res alleviates fill-rate
and lets tessellation shine thru• GatherRed() and RO DSV also saved cycles
DX11 Performance Gems
![Page 56: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/56.jpg)
Case study: wrapping up
CALL TO ACTION: Go light some particles!DX11 Performance Gems
![Page 57: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/57.jpg)
End of tour!• Questions?
jjansen at nvidia dot com
DX11 Performance Gems
![Page 58: (Or: “DX11 – unpacking the box”) Jon Jansen - Developer Technology Engineer, NVIDIA DX11 Performance Gems](https://reader038.vdocuments.us/reader038/viewer/2022110206/56649cdf5503460f949a88d9/html5/thumbnails/58.jpg)
ReferencesBAVOIL, L. 2010. Modern Real-Time Rendering Techniques. From Future Game On
Conference, September 2010.CANTLAY, I. 2007. High-speed, off-screen particles. In GPU Gems 3. 513-528EISEMANN, E., & DURAND, F. 2004. Flash Photography Enhancement via Intrinsic
Relighting. In ACM Trans. Graph. (SIGGRAPH) 23, 3, 673–678.JANSEN, J., AND BAVOIL, L. 2010. Fourier opacity mapping. In Proceedings of the
Symposium on Interactive 3D Graphics and Games, 165–172.PETSCHNIGG, G., AGRAWALA, M., HOPPE, H., SZELISKI, R., COHEN, M., & TOYAMA, K.
2004. Digital photography with flash and no-flash image pairs. In ACM Trans. Graph. (SIGGRAPH) 23, 3, 664–672.
DX11 Performance Gems