practical parallel rendering with directx 9 and 10 -...
TRANSCRIPT
![Page 1: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/1.jpg)
Practical Parallel Rendering with DirectX 9 and 10
Windows PC Command Buffers
Vincent Scheib
Architect, Gamebryo
Emergent Game Technologies
![Page 2: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/2.jpg)
Foundational technology, over 200 shipped titles,
more than 13 genres, and multiple platforms.
TITLES
Munch's Oddysee
Sid Meier's Pirates!
Barbie Digital
Makeover
Dark Age of Camelot
Futurama
Tetris Worlds
Crash Racing
Elder Scrolls
Sim Patient
Zero Cup Soccer
Civilization 4
PLATFORMS
PC
Xbox 360
PS3
WiiXbox
PS2
GC
GENRES
Action
Adventure
Family
MMO
Platformer
PuzzleRacing
RPG
Vis / Sim
Sports
Strategy
![Page 3: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/3.jpg)
Customers
![Page 4: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/4.jpg)
Introduction
• Take advantage of multiple cores with parallel rendering
• Performance should scale by number of cores
0
1
2
3
4
Single
Core
Dual
Core
- Quad
Core
Pe
rfo
rma
nce
Ra
tio
Observed data
from this project,
details follow
![Page 5: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/5.jpg)
Presentation Outline
• Motivation and problem definition
• Command buffers
– Requirements
– Implementation
– Handling effects and resources
• Application models
• Integrating to existing code
• Prototype results
• Future work
![Page 6: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/6.jpg)
Motivation
• Take advantage of multi core machines
– 40% machines have 2+ physical CPUs (steamJul08)
• Rendering can have high CPU cost
• Direct3D 11 display lists coming, but want
support for Direct3D 9 and 10 now
– Currently 81% DX9 HW, 9% DX10 HW (steamJul08)
– Rough DX9 HW forecast: 2011 ~30% (emergent)
– Asia HW trends lag somewhat
![Page 7: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/7.jpg)
Multithreaded DX Device?
• DirectX 9 and 10 primarily designed for
single-threaded game architectures
• Multithreaded mode incurs overhead
– Cuts FPS roughly in half on DX9
for a CPU render call bound application
• DX is Stateful
– Requires additional synchronization for parallel
rendering
![Page 8: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/8.jpg)
Ideal Scenario
• One thread per hardware thread
• Application manages dispatching work to multiple
threads
• Rendering data completely prepared, ready to be
sent to single-threaded D3D device
– Function calls, conditionals, and final matrix multiplies are
wasted time on a D3D device thread
![Page 9: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/9.jpg)
Reality
• Update()
– Seldomly generates coherent data in API specific format.
• Render()
– Some work done between calls to DirectX API
![Page 10: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/10.jpg)
Going Wide
Update
Render
Main
Thread
Worker
Thread
Worker
Thread
Worker
Thread
![Page 11: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/11.jpg)
Command Buffers
• Record calls to D3D
– Store in a command buffer
– Can be done concurrently on multiple threads, to multiple
command buffers
• Playback D3D commands
– Efficiently on main thread
– Exact data for DX API
– Coherent in memory
• Clean and modular point to
integrate to application
![Page 12: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/12.jpg)
Command Buffer Requirements
• Minimal modifications to rendering code
– Most code uses pointer to D3DDevice
– Parameters from stack, e.g., D3DRECT
– Support most of the device API
• Draw calls, setting state, constants, shaders, textures,
stream source, and so on
– Support effects
• Playback does not modify buffer
• Playback is ideal performance
![Page 13: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/13.jpg)
Command Buffer Allowances
• No support for:
– Create methods
– Get methods
– Miscellaneous other functions that return values
• QueryInterface, ShowCursor
![Page 14: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/14.jpg)
Command Buffer: Nice to Have
• Buffers played back multiple times
• Optimization of buffers
– Remove redundant state calls
• Offload main thread by doing this on recorder threads
– Reordering of sort independent draw calls
![Page 15: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/15.jpg)
Design: Recording
• Wrap every API call
– Unsupported calls, return error
– Supported calls
• Store enumeration for call into buffer
• Store parameters into buffer
• Make copies of non-reference counted objects such as
D3DMATRIX, D3DRECT, shader constants, and so on
![Page 16: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/16.jpg)
Design: Playback
• Playback, read from buffer, and
– select function call pointer from table given token
– each playback function unpacks parameters buffer
![Page 17: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/17.jpg)
Recording Example
virtual HRESULT STDMETHODCALLTYPE DrawPrimitive(
D3DPRIMITIVETYPE PrimitiveType,
UINT StartVertex,
UINT PrimitiveCount)
{
m_pCommandBuffer->Put(CBD3D_COMMANDS::DrawPrimitive);
m_pCommandBuffer->Put(PrimitiveType);
m_pCommandBuffer->Put(StartVertex);
m_pCommandBuffer->Put(PrimitiveCount);
return D3D_OK;
}
![Page 18: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/18.jpg)
Playback Example
void CBPlayer9::DoDrawPrimitive()
{
D3DPRIMITIVETYPE arg1;
m_pCommandBuffer->Get(&arg1);
UINT arg2;
m_pCommandBuffer->Get(&arg2);
UINT arg3;
m_pCommandBuffer->Get(&arg3);
if(FAILED(m_pDevice->DrawPrimitive(arg1, arg2, arg3)))
OutputDebugStringA(__FUNCTION__ " failed in playback\n");
}
![Page 19: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/19.jpg)
Effects: Problem
• Effect takes pointer to device at creation
• Effect then creates resources
• At render, effect should use our recorder
• Our recording device cannot create
resources
![Page 20: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/20.jpg)
Effects: Solutions
1. Create FX with command buffer device
• Fails: needs real device for initialization
2. Wrap and record FX calls and play them back
• Inefficient
3. Give FX EffectStateManager class to redirect calls to
command buffer, give it real device for initialization
• Disables FX use of state blocks
4. Create redirecting device
• Acts as real device at init, command buffer device at render time
![Page 21: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/21.jpg)
Resource Management
• Multiple threads wish to:
– Create resources (e.g., background loading)
– Update resources (e.g., dynamic geometry)
• App must use playback thread only to modify
resources
– App specific logic
• Deferred creation, double buffering
– Support in command buffers (next slide)
![Page 22: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/22.jpg)
Resource Management (2)
• Command buffer library could encapsulate details
– (This is Future Work)
• Gamebryo Volatile Type Buffers
– D3DUSAGE: WRITEONLY | DYNAMIC
D3DLOCK: NOOVERWRITE, DISCARD
– Lock() is stored into command buffer
– Memory allocated from command buffer, returned from Lock()
– At playback, true lock is performed
• Gamebryo Mutable Type Buffers:
– CPU read and infrequent access
– Backing store required, copied on each Lock()
![Page 23: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/23.jpg)
Implementation Considerations
• Ease of changing implementation
– Macros provide implementation
– Preprocessor & Beautifier produce debuggable code
– Many macro permutations required (~40) for different
argument count and return type
• Generated from Excel
– Function overloading to store non ref counted parameters
• Everything but shader constants then stored with same function
signature.
![Page 24: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/24.jpg)
Application Models
• Command buffers can be used in various
ways by applications
– Fork and join
– Fork and join, frame deferred
– Work queue
– …
• Record once, play back several times
![Page 25: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/25.jpg)
Fork & Join
…Update
Render
…
Main
Thread
Worker
Thread
Signal threads to start
Record command buffer
Wait for signal
Signal command buffer completeWait for command buffers
Record command buffer
Playback command buffers Starve!
![Page 26: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/26.jpg)
Fork & Join, Frame Deferred
…Update
Render
…
Main
Thread
Worker
Thread
Signal threads to start
Record command buffer
Wait for signal
Signal command buffer complete
Record command buffer
Playback command buffers
Update… …
Next Frame
![Page 27: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/27.jpg)
Work Queue
Update
Pla
y
Record
Main
Thread
Worker
Thread
Update
Pla
y
Record
Record U
pdate
![Page 28: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/28.jpg)
Adapting to an Existing Codebase
• Refactor code to take pointer to device that can be
changed easily
– Easy if pointer passed on stack
– Thread local storage if used from heap
• Add ownership of recording devices, playback
class, and pool of command buffers
• Determine application model, and add high-level
logic to parcel out rendering work.
• Manage resources over
recording and playback
![Page 29: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/29.jpg)
Integration into DX Samples
• Instancing
– Effects, shader constants
• Textures tutorial
– Simple, added multithreading
• Stress test
– Fork and join multithreading, with optional:
• Frame delay of playback
• Draw call count
• CPU and memory access
• Recorder thread count
![Page 30: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/30.jpg)
Stress Test Information
• Render call contains:
– Matrices computed with D3DX calls * 3
– SetTransform * 3
– SetRenderState
– SetTexture
– SetTextureStageState * 8
– SetStreamSource
– SetFVF
– DrawPrimitive
![Page 31: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/31.jpg)
CPU Busy Loops
• Draw call CPU cost varies in real applications
• Stress test simulates cost with CPU Busy Loops
– Scattered reads from a large buffer in memory
– Perform some logic, integer, and floating point operations
• Gamebryo render on DX9: 100-200 μs
• (on a Pentium 4, 3 GHz, nVidia 7800)
• Stress test can simulate Gamebryo render calls with
0-200 loops.
![Page 32: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/32.jpg)
DX Sample Stress Test Demo
![Page 33: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/33.jpg)
DX Call Cost vs. Recorder Cost
• Render call cost with DirectX device is
13 times as expensive as command buffer
recorder
– DX: 92μs
– Recorder: 7μs
• (on a Pentium 4, 3 GHz, nVidia 7800)
![Page 34: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/34.jpg)
Thread Profiler Quadcore
1 Recorder Thread
• CPU Busy Loops: 110
PlaybackRecord
![Page 35: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/35.jpg)
Thread Profiler Quadcore
4 Recorder Threads
![Page 36: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/36.jpg)
FPS by Threads and Computer
0
10
20
30
40
50
60
70
0 1 2 3 4 5
2 - XP-A - Intel G965
Express
2 - XP-A - NVIDIA GeForce
7800 GTX
2 - XP-B - NVIDIA GeForce
8800 GTS 512
4 - XP-C - NVIDIA GeForce
8800 GT
4 - Vista-A - NVIDIA
GeForce 8800 GTX
CPU Busy Loops 150 DrawPrimitives 1936
Sum of FPS
Threads
Cores
Computer
GPU
![Page 37: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/37.jpg)
Definition: Performance Ratio
• Charts that follow use
Performance Ratio = FPS test / FPS baseline
• Normalized result
• Useful for comparisons while varying
– Number of draw calls
– CPU busy loops
![Page 38: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/38.jpg)
Perf by Threads & Busy Loops
0
0.5
1
1.5
2
2.5
3
3.5
4
0 1 2 3 4 5
0
50
100
150
200
250
Computer XP-C Cores 4 Draw Primitives 1936
Average of FPSPerfRatio
Threads
CPU Busy Loops
![Page 39: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/39.jpg)
Perf by Threads & Busy Loops
0
0.5
1
1.5
2
2.5
3
3.5
4
0 1 2 3 4 5
100 - 0
100 - 50
100 - 100
100 - 150
100 - 200
100 - 250
196 - 0
196 - 50
196 - 100
196 - 150
196 - 200
196 - 250
289 - 0
289 - 50
289 - 100
289 - 150
289 - 200
289 - 250
400 - 0
Computer XP-C Cores 4
Average of FPSPerfRatio
Threads
Draw Primitives
CPU Busy Loops
![Page 40: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/40.jpg)
Perf by Draws & Busy Loops
0
0.5
1
1.5
2
2.5
3
3.5
4
100
196
289
400
484
576
676
784
900
961
1089
1156
1296
1369
1444
1600
1681
1764
1849
1936
0
50
100
150
200
250
Computer XP-C Cores 4 Threads 4
Average of FPSPerfRatio
DrawPrimitives
CPU Busy Loops
![Page 41: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/41.jpg)
Perf by Busy Loops & Draws
0
0.5
1
1.5
2
2.5
3
3.5
4
0 50 100 150 200 250
100
400
784
1156
1600
1936
Computer XP-C Cores 4 Threads 4
Average of FPSPerfRatio
CPU Busy Loops
DrawPrimitives
![Page 42: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/42.jpg)
Dual Core Results
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
0 1 2 3
Intel G965 Express - 50
Intel G965 Express - 150
NVIDIA GeForce 7800 GTX - 50
NVIDIA GeForce 7800 GTX - 150
NVIDIA GeForce 8800 GTS 512 - 50
NVIDIA GeForce 8800 GTS 512 - 150
Cores 2 DrawPrimitives 1936
Average of FPSPerfRatio
Threads
GPU
CPU Busy Loops
![Page 43: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/43.jpg)
Future Work
• Resource management facilitated through
command buffer, instead of application logic
• Optimization of command buffers by reordering
order independent draw calls
• DirectX10
![Page 44: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/44.jpg)
Open Source Library
• Emergent has open sourced the command buffer
library
– Command buffer serialization
– Recording device
– Playback class
– Redirecting device
– EffectStateManager
– DX9 only so far
![Page 45: Practical Parallel Rendering with DirectX 9 and 10 - AMDdeveloper.amd.com/.../2012/10/S2008-Scheib-ParallelRenderingSiggr… · Practical Parallel Rendering with DirectX 9 and 10](https://reader033.vdocuments.us/reader033/viewer/2022051604/5ff8166569d40830c541a59a/html5/thumbnails/45.jpg)
Thank You. Questions?
– Co-Developer: Bo Wilson
• For code & presentation
Google: parallel rendering scheib