Cross Platform Development Best Practices
Matt Lee, Kev Gee
Microsoft Game Technology Group
Agenda
Code Considerations
  CPU Considerations
  GPU Considerations
  IO Considerations
Content Considerations
  Data Build System
  Geometry Formats
  Texture Formats
  Shaders
  Audio Considerations
Compiler Comparison
VS 2005 front end used for both platforms
Preprocessor benefits both platforms
Debugger experience is the same
Full VS 2005 IDE support coming
Xbox 360 optimizing back end added with XDK install
Single solution / MSBuild file can target both platforms
PC CPUs
Intel Pentium D / AMD Athlon64 X2
Programming Model
2 Cores running @ around 3.20 GHz
12K-µop execution trace cache
16-KB L1 cache, 1 MB L2 cache
Deep Branch Prediction
Dynamic data flow analysis
Speculative Execution
Little-endian byte ordering
SIMD instructions
Quad Core announced for early 2007
Xbox 360 Custom CPU
Custom IBM Processor
3 64-bit PowerPC cores running at 3.2 GHz
Two hardware threads per core
32-KB L1 instruction cache & data cache, per core
Shared 1-MB L2 cache
128-byte cache lines on all caches
Big-endian byte ordering
VMX128 SIMD
Lots of Registers
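Because every Xbox 360 cache uses 128-byte lines, two hardware threads writing different variables that happen to share a line will contend for it (false sharing). A minimal C++11 sketch, assuming per-thread counters as the hot data (`kCacheLine`, `PerThreadCounter` and `g_counters` are illustrative names, not from the talk), pads each counter to a full line:

```cpp
#include <cstddef>
#include <cstdint>

// Line size used by every Xbox 360 cache level.
constexpr std::size_t kCacheLine = 128;

// Hypothetical per-thread counter. alignas pads the struct to a whole line,
// so counters owned by different hardware threads never share a line.
struct alignas(kCacheLine) PerThreadCounter {
    std::uint64_t value = 0;
};

static_assert(sizeof(PerThreadCounter) == kCacheLine, "one counter per cache line");
static_assert(alignof(PerThreadCounter) == kCacheLine, "counters start on a line boundary");

// One slot per Xbox 360 hardware thread (3 cores x 2 threads).
PerThreadCounter g_counters[6];
```

Since 128 is a multiple of the 64-byte lines common on PC CPUs, the same layout also avoids false sharing there, at the cost of extra memory per slot.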
Performance Tools
Profiling approaches are very similar between PC and Xbox 360
PIX for Xbox 360 & PIX for Windows
Being developed by the same team now
Use instrumented tools on Xbox 360
XbPerfView / Tracedump
Xbox 360 does not have a sampling profiler yet
Use PC profiling tools
Intel VTune / AMD CodeAnalyst / VS Team System Profiler
Attend the Performance Hands-On training!
Focus Your Efforts
Use performance tools to guide work
Areas where we have seen platform specific efforts reap rewards
Single Data Pass engine design
High Frequency Game API Layers
Use your profiler tools to target the hot spots
Math Library - Bespoke vs XGMath vs D3DXMath
Impact on Code Design
Designing Cross Platform APIs
Use of Virtual Functions
Parameter passing mechanisms
Pass by reference vs. pass by value
Typedef vector types and intrinsics
Math Library Design Case Study
Use of inlining
Use of Virtual Functions
Be careful when using virtual functions to hide platform differences
Virtual function performance on Xbox 360
Adds a branch instruction which is always mispredicted!
Compiler limited in optimizing these
Make a concrete implementation for Xbox 360
Avoid virtual functions in inner loops
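One way to follow the inner-loop advice is to make the virtual interface operate on whole batches, so the indirect call is paid once per batch rather than once per element. This sketch assumes a simple particle update; `Particle`, `IParticleUpdater`, `EulerUpdater` and `StepParticles` are hypothetical:

```cpp
#include <cstddef>
#include <vector>

struct Particle { float x, vx; };

// Hypothetical interface: one virtual call per batch instead of per particle.
class IParticleUpdater {
public:
    virtual ~IParticleUpdater() = default;
    virtual void UpdateBatch(Particle* particles, std::size_t count, float dt) = 0;
};

class EulerUpdater final : public IParticleUpdater {
public:
    void UpdateBatch(Particle* particles, std::size_t count, float dt) override {
        // The inner loop contains no indirect calls, so it can be inlined and unrolled.
        for (std::size_t i = 0; i < count; ++i)
            particles[i].x += particles[i].vx * dt;
    }
};

void StepParticles(IParticleUpdater& updater, std::vector<Particle>& particles, float dt) {
    // A single (possibly mispredicted) indirect branch covers the whole batch.
    updater.UpdateBatch(particles.data(), particles.size(), dt);
}
```

Marking the concrete class `final` also lets the compiler devirtualize calls made through an `EulerUpdater&` directly.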
Cross Platform Render Example
IRenderSystem: semi-abstract base class
D3D9: overrides virtual base
Xbox 360: concrete implementation
D3D10: overrides virtual base
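Cross Platform Render Example (ctd.)
class IRenderSystem
{
    // ……
public:
#if !defined(_XBOX)
    // PC: pure virtual, so the D3D9 and D3D10 back ends can override it
    virtual void Draw() = 0;
#else
    // Xbox 360: ordinary member function, no virtual dispatch cost
    void Draw();
#endif
};

#if defined(_XBOX)
// Only compiled for Xbox 360, where Draw() is the concrete implementation
void IRenderSystem::Draw()
{
    // 360 Implementation
    // ……
}
#endif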
D3D9 & D3D10 implementations subclass for specialization
Beware Big Constructors
Ctors can dominate execution time
Ctors are often hidden from the casual observer
Copy ctors add objects to containers
Arrays of C++ objects are constructed
Overloaded operators may construct temporaries
Consider: should the ctor init data?
Example: matrix class zeroing all data
Prefer array initialization = { … }
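A sketch of the constructor trade-off, assuming a plain 4x4 float matrix (`Matrix44`, `ZeroingMatrix44` and `kIdentity` are hypothetical names): leaving out the constructor keeps the type trivially constructible, and aggregate initialization supplies values only where they are needed.

```cpp
#include <type_traits>

// Hypothetical 4x4 matrix with no constructor: arrays of it cost nothing to create.
struct Matrix44 {
    float m[4][4];
};

// For contrast: a constructor that zeroes all 64 bytes runs for every
// array element, container copy and temporary.
struct ZeroingMatrix44 {
    float m[4][4];
    ZeroingMatrix44() : m{} {}
};

static_assert(std::is_trivially_default_constructible<Matrix44>::value,
              "building a Matrix44 array does no per-element work");
static_assert(!std::is_trivially_default_constructible<ZeroingMatrix44>::value,
              "every ZeroingMatrix44 pays for zeroing");

// When values are needed, aggregate initialization writes exactly those values.
const Matrix44 kIdentity = {{
    {1.0f, 0.0f, 0.0f, 0.0f},
    {0.0f, 1.0f, 0.0f, 0.0f},
    {0.0f, 0.0f, 1.0f, 0.0f},
    {0.0f, 0.0f, 0.0f, 1.0f},
}};
```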
Inlining
Careful inlining is in general a Good Thing
Plan to spend time ensuring the compiler is inlining the right stuff
Use Perf Tools such as VTune / Trace recorder
Try the “inline any suitable” option
Enable link-time code generation
Consider profile-guided optimization
Use __forceinline only where necessary
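`__forceinline` is an MSVC keyword, so a header shared with other compilers needs a per-compiler spelling. A minimal sketch, assuming a project macro named `FORCE_INLINE` (hypothetical) and the GCC/Clang `always_inline` attribute elsewhere; `Lerp` is an illustrative helper:

```cpp
// Hypothetical project macro: MSVC (Windows and Xbox 360 toolchains) uses
// __forceinline, GCC and Clang use the always_inline attribute.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

// Reserve it for tiny, hot helpers where profiling shows the compiler declined to inline.
FORCE_INLINE float Lerp(float a, float b, float t) {
    return a + (b - a) * t;
}
```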
Consider Passing Native Types by Value
Xbox 360 has large registers
Native 64-bit PC code does too
Pass and return these types by value:
int, __int64, float
Consider these types if targeting SSE / VMX:
__m128 / __vector4, XMVECTOR, XMMATRIX
Pass structs by pointer or reference
Help the compiler using __restrict
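A sketch of these guidelines together, with hypothetical `Transform` and `ApplyTransform` names: scalars travel by value, the struct by const reference, and `__restrict` (accepted by MSVC, GCC and Clang) tells the compiler the two arrays never overlap, which frees it to keep values in registers and vectorize.

```cpp
#include <cstddef>

struct Transform { float scale, bias; }; // hypothetical struct, passed by reference

// Scalars (count) by value, the struct by const reference, and restrict-qualified
// array pointers promising dst and src never alias.
void ApplyTransform(float* __restrict dst, const float* __restrict src,
                    std::size_t count, const Transform& xf) {
    // Copy once so the loop body only touches registers, regardless of how far
    // the compiler trusts __restrict with respect to xf.
    const float scale = xf.scale;
    const float bias = xf.bias;
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = src[i] * scale + bias;
}
```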
Math Library Header (Xbox 360)
#if defined( _XBOX )
#include <ppcintrinsics.h>
#include <vectorintrinsics.h>
typedef __vector4 XVECTOR;
typedef const XVECTOR XVECTOR_PARAM;
typedef XVECTOR& XVECTOR_OUTPARAM;
#define XMATHAPI inline
#define VMX128_INTRINSICS
#endif
Pass by value
Math Library Header (Windows)
#if defined( _WIN32 )
#include <xmmintrin.h>
typedef __m128 XVECTOR;
typedef const XVECTOR& XVECTOR_PARAM;
typedef XVECTOR& XVECTOR_OUTPARAM;
#define XMATHAPI inline
#define SSE_INTRINSICS
#endif
Pass by reference
Math Library Function
XVECTOR XMATHAPI XVectorAdd( XVECTOR_PARAM vA,
XVECTOR_PARAM vB )
{
#if defined( VMX128_INTRINSICS )
return __vaddfp( vA, vB );
#elif defined( SSE_INTRINSICS )
return _mm_add_ps( vA, vB );
#endif
}
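The same pattern can carry a plain C++ fallback so the math library also builds, and can be unit-tested, on hosts with neither VMX128 nor SSE. This sketch assumes SSE detection via `__SSE__` (GCC/Clang) or `_M_X64` / `_M_IX86_FP` (MSVC); the `NO_INTRINSICS` scalar `XVECTOR` struct and the helpers `XVectorSet` and `XVectorStore` are hypothetical additions, not part of the talk's library.

```cpp
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SSE_INTRINSICS
typedef __m128 XVECTOR;
typedef const XVECTOR& XVECTOR_PARAM;
#else
#define NO_INTRINSICS  // hypothetical scalar fallback
struct XVECTOR { float v[4]; };
typedef const XVECTOR& XVECTOR_PARAM;
#endif

inline XVECTOR XVectorSet(float x, float y, float z, float w) {
#if defined(SSE_INTRINSICS)
    return _mm_set_ps(w, z, y, x); // _mm_set_ps takes elements from high to low
#else
    XVECTOR r = {{x, y, z, w}};
    return r;
#endif
}

inline void XVectorStore(float out[4], XVECTOR_PARAM v) {
#if defined(SSE_INTRINSICS)
    _mm_storeu_ps(out, v); // unaligned store, so out needs no 16-byte alignment
#else
    for (int i = 0; i < 4; ++i) out[i] = v.v[i];
#endif
}

inline XVECTOR XVectorAdd(XVECTOR_PARAM vA, XVECTOR_PARAM vB) {
#if defined(SSE_INTRINSICS)
    return _mm_add_ps(vA, vB);
#else
    XVECTOR r;
    for (int i = 0; i < 4; ++i) r.v[i] = vA.v[i] + vB.v[i];
    return r;
#endif
}
```

The scalar path doubles as a reference implementation: tests can compare its results against the SSE or VMX128 builds.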
Threading
Why Multithread?
Necessary to take full advantage of modern CPUs
Attend the multi-threading talk later today
Covers synchronization primitives and lockless sync methods
See Also: Talks from Intel and AMD (GDC2005 / GDC-E)
OpenMP: C-style pragma model, useful in limited circumstances
Concur – C++, see
http://microsoft.sitestream.com/PDC05/TLN/TLN309_files/Default.htm#nopreload=1&autostart=1
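The basic shape of data-parallel work on both platforms is to give each hardware thread (2 on a dual-core PC, 6 on Xbox 360) a disjoint slice of the data. This sketch uses C++11 `std::thread` as a portable stand-in for the native threading APIs; `ParallelSum` is a hypothetical example, and the caller chooses the worker count (for instance from `std::thread::hardware_concurrency()`, which may return 0).

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

double ParallelSum(const std::vector<double>& data, unsigned workerCount) {
    if (workerCount == 0) workerCount = 1;
    const std::size_t chunk = (data.size() + workerCount - 1) / workerCount;
    std::vector<double> partial(workerCount, 0.0); // one result slot per worker
    std::vector<std::thread> threads;
    threads.reserve(workerCount);
    for (unsigned w = 0; w < workerCount; ++w) {
        threads.emplace_back([&data, &partial, chunk, w] {
            const std::size_t begin = std::min(data.size(), w * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            // Disjoint input range, private output slot (written once): no locks needed.
            partial[w] = std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                                         data.begin() + static_cast<std::ptrdiff_t>(end), 0.0);
        });
    }
    for (std::thread& t : threads) t.join();
    // Final reduction runs on the calling thread after every worker has finished.
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```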
D3D Architectural Differences
D3D9 draw call cost is higher on Windows than on Xbox 360
Xbox 360 D3D is optimized for a single GPU target
D3D10 improves draw call cost by design on Windows
Very important to carefully manage the number of batches submitted
This can have an impact on content creation
This work will help with 360 performance too
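Sorting draws by render state is one common way to cut state changes and expose draws that can be merged; the talk does not prescribe a specific method. A sketch with a hypothetical `DrawItem` record, sorting by shader first on the assumption that shader switches cost more than texture switches:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record: the state that decides whether two draws can share a batch.
struct DrawItem {
    std::uint32_t shaderId;
    std::uint32_t textureId;
    std::uint32_t meshId;
};

// Sort so identical (shader, texture) pairs are adjacent, then count the state
// groups that remain. Each group is one state change and a candidate batch.
std::size_t SortAndCountStateGroups(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(), [](const DrawItem& a, const DrawItem& b) {
        if (a.shaderId != b.shaderId) return a.shaderId < b.shaderId;
        return a.textureId < b.textureId;
    });
    std::size_t groups = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].shaderId != items[i - 1].shaderId ||
            items[i].textureId != items[i - 1].textureId)
            ++groups;
    }
    return groups;
}
```

Fewer groups means fewer state changes per frame; reducing them often requires content changes such as shared materials or texture atlases, which is where the impact on content creation comes from.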
Agenda
Code Considerations
  CPU Considerations
  GPU Considerations
  IO Considerations
Content Considerations
  Data Build System
  Geometry Formats
  Texture Formats
  Shaders
  Audio Considerations
PC GPUs
Wide variety of available Direct3D 9 hardware
CAPs and Shader Models abstract over feature differences
GPUs with performance approximately equivalent to the Xbox 360 GPU:
ATI X1900 / NVIDIA 7800 GTX
Shader Model 3.0 support
Direct3D 10 standardizes the feature set
Hardware scales on performance instead
Xbox 360 Custom GPU
Direct3D 9.0+ compatible
High-Level Shader Language (HLSL) 3.0+ support
10 MB embedded DRAM (EDRAM) frame buffer with 256 GB/sec bandwidth
Hardware scaling for display resolution matching
48 shader ALUs shared between pixel and vertex shading (unified shaders)
Up to 8 simultaneous contexts (threads) in flight at once
Changing shaders or render state can be cheap, since a new context can be started up easily
Hardware tessellator
N-patches, triangular patches, and rectangular patches
For non-continuous / adaptive cases, trade memory for this feature on PC systems
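The 10 MB of EDRAM holds the render target being drawn, so a frame buffer that does not fit must be rendered in several passes over screen tiles (predicated tiling on Xbox 360). The arithmetic below is a rough sketch that assumes 32-bit color plus 32-bit depth/stencil per sample and ignores tile-alignment rules; `RenderTargetBytes` and `EstimatedTiles` are hypothetical helpers.

```cpp
#include <cstdint>

constexpr std::uint64_t kEdramBytes = 10ull * 1024 * 1024; // 10 MB

// Assumes 4 bytes of color and 4 bytes of depth/stencil per sample.
constexpr std::uint64_t RenderTargetBytes(std::uint32_t width, std::uint32_t height,
                                          std::uint32_t msaaSamples) {
    return std::uint64_t(width) * height * msaaSamples * (4 /*color*/ + 4 /*depth*/);
}

// Ceiling division: a rough tile count that ignores alignment constraints.
constexpr std::uint64_t EstimatedTiles(std::uint64_t bytes) {
    return (bytes + kEdramBytes - 1) / kEdramBytes;
}

// 1280x720 with 4x MSAA needs 29,491,200 bytes, about 2.8x the EDRAM: ~3 tiles.
static_assert(RenderTargetBytes(1280, 720, 4) == 29491200ull, "");
static_assert(EstimatedTiles(RenderTargetBytes(1280, 720, 4)) == 3, "");
```

Under the same assumptions, 720p without MSAA (7,372,800 bytes) fits in a single tile, while 2x MSAA already needs two.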
Explicit Resolve Control
Copies surface data from EDRAM to a texture in system memory
Required for render-to-texture and presentation to the screen
Can perform MSAA sample averaging or resolve individual samples
Can perform format conversions and biasing
Cannot do rescaling or resampling of any kind
This can impact your Xbox 360 engine design, as it adds an extra step to common operations.
Agenda
Code Considerations
  CPU Considerations
  GPU Considerations
  IO Considerations
Content Considerations
  Geometry
  Textures
  Shaders
  Audio data
Use Native File I/O Routines
Only native routines support key features:
Asynchronous I/O
Completion routines
Prefer CreateFile and ReadFile
Guaranteed to be as fast as or faster than any other alternative
Avoid fopen, fread, C++ iostreams
Use Asynchronous File I/O
File read/write operations block by default
Async operations allow the game to do other interesting work
CreateFile with FILE_FLAG_OVERLAPPED
Use FILE_FLAG_NO_BUFFERING, too
Guarantees no intermediate buffering (requires sector-aligned buffers, sizes, and file offsets)
Use the OVERLAPPED struct to determine when the operation is complete
See CreateFile docs for details
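A sketch of the overlapped read pattern, with a hypothetical `ReadFileAsync` helper: open with `FILE_FLAG_OVERLAPPED`, start `ReadFile`, do other work, then collect the result with `GetOverlappedResult`. It leaves out `FILE_FLAG_NO_BUFFERING` to avoid the alignment requirements, and the non-Windows branch (POSIX `pread` on a `std::async` worker) exists only so the sketch can be built and tested off Windows.

```cpp
#include <cstddef>
#include <vector>

#if defined(_WIN32)
#include <windows.h>

// Reads up to buffer.size() bytes from offset 0, runs otherWork() while the
// read is in flight, then waits for completion and trims buffer to the bytes read.
template <typename Work>
bool ReadFileAsync(const char* path, std::vector<char>& buffer, Work&& otherWork) {
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_FLAG_OVERLAPPED, nullptr);
    if (file == INVALID_HANDLE_VALUE) return false;
    OVERLAPPED ov = {};                                      // Offset/OffsetHigh = 0
    ov.hEvent = CreateEventA(nullptr, TRUE, FALSE, nullptr);
    DWORD bytesRead = 0;
    BOOL ok = ReadFile(file, buffer.data(), static_cast<DWORD>(buffer.size()), nullptr, &ov);
    if (ok || GetLastError() == ERROR_IO_PENDING) {
        otherWork();                                         // overlap game work with the I/O
        ok = GetOverlappedResult(file, &ov, &bytesRead, TRUE); // TRUE: block until done
    }
    if (ov.hEvent) CloseHandle(ov.hEvent);
    CloseHandle(file);
    if (ok) buffer.resize(bytesRead);
    return ok != FALSE;
}
#else
#include <fcntl.h>
#include <future>
#include <unistd.h>

// Non-Windows stand-in: a POSIX pread on a worker thread plays the role of ReadFile.
template <typename Work>
bool ReadFileAsync(const char* path, std::vector<char>& buffer, Work&& otherWork) {
    std::future<long> pending = std::async(std::launch::async, [path, &buffer]() -> long {
        const int fd = open(path, O_RDONLY);
        if (fd < 0) return -1;
        const ssize_t n = pread(fd, buffer.data(), buffer.size(), 0);
        close(fd);
        return static_cast<long>(n);
    });
    otherWork();                                             // overlap game work with the I/O
    const long n = pending.get();                            // wait for the worker
    if (n < 0) return false;
    buffer.resize(static_cast<std::size_t>(n));
    return true;
}
#endif
```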
Memory Mapped File I/O
Fastest way to load data on Windows
However, the 32-bit address space is getting tight
This is a great 64-bit feature addition!
Memory-mapped I/O is not supported on Xbox 360
No HDD-backed virtual memory management system
Universal Gaming Controller
XInput is the same API for Xbox 360 and Windows
The Microsoft universal controller is a reference design which can be leveraged by other hardware manufacturers
XP driver available from Windows Update
Support is built into Xbox 360 and Windows Vista
Agenda
Code Considerations
  CPU Considerations
  GPU Considerations
  IO Considerations
Content Considerations
  Data Build System
  Geometry Formats
  Texture Formats
  Shaders
  Audio Considerations
Data Build System
Add a data build / processing phase to your production system
Compile, optimize, and compress data according to multiple target platform requirements
Easier and faster to handle endianness and other format conversions offline
Data packing can occur here too
Invest time in making the build fast
Artists need to rapidly iterate to make quality content
Incremental builds can really help reduce the build time
Try the XNA build tools
Copies of the XNA build CTP are available NOW!
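Handling endianness in the build tool comes down to writing every multi-byte value in the target's byte order, whatever the host's order. A minimal sketch with hypothetical `Endian`, `WriteU32` and `WriteF32` names; floats are serialized through their IEEE 754 bit pattern rather than swapped as float values:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

enum class Endian { Little, Big }; // Xbox 360 is Big, x86/x64 PCs are Little

// Append a 32-bit value in the target's byte order; the shifts make this
// independent of the build machine's own byte order.
inline void WriteU32(std::vector<std::uint8_t>& out, std::uint32_t v, Endian target) {
    for (int i = 0; i < 4; ++i) {
        const int shift = (target == Endian::Big) ? 24 - 8 * i : 8 * i;
        out.push_back(static_cast<std::uint8_t>(v >> shift));
    }
}

// Floats go through their bit pattern, copied with memcpy to avoid aliasing issues.
inline void WriteF32(std::vector<std::uint8_t>& out, float f, Endian target) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    WriteU32(out, bits, target);
}
```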
Geometry Compression
Offline compression of geometry provides wins across all platforms
Disk I/O wins as well as GPU wins
The compression approach is likely to be target specific
PC is usually a superset of the consoles in this area
D3D9 CAPs / limitations to consider
16-bit normals: D3DDECLTYPE_FLOAT16_2
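`D3DDECLTYPE_FLOAT16_2` stores two IEEE 754 half-precision floats, so the offline tool must convert 32-bit normal components to 16 bits. A sketch of the conversion, sized for components in [-1, 1]; `FloatToHalf` and `HalfToFloat` are hypothetical names, values too small for a normalized half are flushed to zero, and NaN inputs are not handled.

```cpp
#include <cstdint>
#include <cstring>

// Float -> binary16. Rounds half-up on the 13 dropped mantissa bits.
inline std::uint16_t FloatToHalf(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    const std::uint16_t sign = static_cast<std::uint16_t>((bits >> 16) & 0x8000u);
    const int exponent = static_cast<int>((bits >> 23) & 0xFFu) - 127 + 15; // rebias 8 -> 5 bits
    const std::uint32_t mantissa = bits & 0x7FFFFFu;
    if (exponent <= 0) return sign;                                        // underflow: signed zero
    if (exponent >= 31) return static_cast<std::uint16_t>(sign | 0x7C00u); // overflow: infinity
    std::uint32_t half = (static_cast<std::uint32_t>(exponent) << 10) | (mantissa >> 13);
    if (mantissa & 0x1000u) ++half; // round; a carry correctly bumps the exponent
    return static_cast<std::uint16_t>(sign | half);
}

// Inverse, useful offline for measuring quantization error.
inline float HalfToFloat(std::uint16_t h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    const std::uint32_t mantissa = h & 0x3FFu;
    std::uint32_t bits;
    if (exponent == 0) bits = sign; // zero (FloatToHalf never emits denormals)
    else if (exponent == 31) bits = sign | 0x7F800000u | (mantissa << 13);
    else bits = sign | ((exponent - 15 + 127) << 23) | (mantissa << 13);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

With 10 mantissa bits the relative error is at most about 0.05%, which is usually acceptable for lighting normals.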
Compressing Textures
Wide variety of texture compression tools
ATI Compressonator
DirectX SDK DDS tools
NVIDIA Photoshop DDS export
Compression tools for Xbox 360 (xgraphics.lib)
Supports endian swap of texture formats
Build your own too!
Make them fit your content.
Texture Formats
DXT* / DXGI_FORMAT_BC*
BC == Block Compressed
Standard DXT* formats across all platforms
DXN / DXGI_FORMAT_BC5 / BC5u
2-component format with 8 bits of precision per component
Great for normal maps
DXT3A / DXT5A
Single component textures made from a DXT3/DXT5 alpha block
4 bits of precision
Xbox 360 / D3D9 Only
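Block-compressed formats store every 4x4 texel block in a fixed size: 8 bytes for DXT1/BC1 and 16 bytes for DXT3/BC2, DXT5/BC3 and DXN/BC5. A small size calculator makes the savings concrete; `BlockFormat`, `BytesPerBlock` and `MipLevelBytes` are hypothetical names, and dimensions are assumed to be at least 1.

```cpp
#include <cstdint>

enum class BlockFormat { DXT1_BC1, DXT3_BC2, DXT5_BC3, DXN_BC5 };

constexpr std::uint32_t BytesPerBlock(BlockFormat f) {
    return f == BlockFormat::DXT1_BC1 ? 8u : 16u; // BC2, BC3 and BC5 use 16-byte blocks
}

// Size of one mip level. (n + 3) / 4 rounds up, so a 1x1 or 2x2 mip
// still occupies one full block.
constexpr std::uint64_t MipLevelBytes(std::uint32_t width, std::uint32_t height, BlockFormat f) {
    return std::uint64_t((width + 3) / 4) * ((height + 3) / 4) * BytesPerBlock(f);
}
```

A 1024x1024 DXT1 level takes 524,288 bytes, one eighth of the 4,194,304 bytes needed for 32-bit RGBA; DXN takes twice the DXT1 size and stores two channels.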
Texture Arrays
Texture arrays
Generalized version of cube maps
D3D9: emulate using a texture atlas
Xbox 360
Up to 64 surfaces within a texture, optional MIP maps for each surface
Surface is indexed with a [0..1] z coordinate in a 3D texture fetch
D3D10 supports this as a standard feature
Up to 512 surfaces within a texture
Bindable as a render target, with per-primitive array index selection
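The D3D9 atlas emulation needs a remap from (u, v, slice) into the atlas. This sketch assumes slices laid out in a square grid of equal cells (the layout, `UV` and `AtlasUV` are hypothetical); in an engine the same math would run in the shader, and real atlases pad cells because filtering and mipmapping can bleed across cell borders.

```cpp
#include <cassert>

struct UV { float u, v; };

// Maps a per-slice coordinate in [0, 1] into cell `slice` of a
// gridSize x gridSize atlas, filled row by row.
UV AtlasUV(UV uv, unsigned slice, unsigned gridSize) {
    assert(gridSize > 0 && slice < gridSize * gridSize);
    const float cell = 1.0f / static_cast<float>(gridSize);
    const unsigned col = slice % gridSize;
    const unsigned row = slice / gridSize;
    return UV{(static_cast<float>(col) + uv.u) * cell,
              (static_cast<float>(row) + uv.v) * cell};
}
```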
Custom Vertex Fetch / Vertex Texture
D3D9 vertex texture implementations use intrinsics
tex2Dlod()
Xbox 360 supports explicit instructions for this
D3D10 supports this as a standard feature
Load() from buffer (VB, IB, etc.) at any stage
Sample() from texture at any stage
Effects
D3DX FX and FX Lite co-exist easily
#define around the texture sampler differences
Preshaders are not supported on FX Lite
We advise that preshaders be optimized into native code for D3D9 Effects
HLSL Development
Set up your engine and tools for rapid shader development and iteration
Compile shaders offline for performance; maybe allow run-time recompilation during development
Be careful with shader generation tools
Perf needs to be considered
Schedule / plan work for this
Cross-Platform HLSL Considerations
Texture access instruction considerations
Xbox 360 has native tfetch / getWeights features
Constant texel offsets (-8.0 to 7.5 in 0.5 increments)
Independent of texture size
Direct3D 10 supports integer texture offsets when fetching
Direct3D 10 supports GetDimensions() natively
Supplies the texture size needed to derive getWeights results
Direct3D 9 can emulate tfetch & getWeights behavior using a shader constant for texture dimensions
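HLSL Example
// Texture dimensions supplied as a shader constant (512x512 here)
float2 g_invTexSize = float2( 1/512.0f, 1/512.0f );

// Emulates getWeights2D: fractional position between the four texels that
// bilinear filtering blends. Texel centers sit at (i + 0.5) / size, hence the -0.5.
float2 getWeights2D( float2 texCoord )
{
    return frac( texCoord / g_invTexSize - 0.5f );
}

// Emulates a constant-offset tfetch: offset is given in texels
float4 tex2DOffset( sampler t, float2 texCoord, float2 offset )
{
    texCoord += offset * g_invTexSize;
    return tex2D( t, texCoord );
}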
Shader Management
Find a balance between übershaders and specialized shader libraries
Dynamic/static branching versus static compilation
Small shader libraries can be built and stored inside a single Effect file
One technique per shader configuration
Larger shader libraries
Hash table populated with configurations
Streaming code can load shader groups on demand
Profile-guided content generation
Avoid compiling shaders at run time
Compiled shaders compress very well
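A sketch of the hash-table approach for larger libraries, assuming shader configurations are identified by a bit mask of features; `ShaderFeature`, `CompiledShader` and `ShaderCache` are hypothetical, and the loader callback stands in for reading an offline-compiled, compressed shader blob rather than compiling at run time.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical feature bits that select a shader configuration.
enum ShaderFeature : std::uint32_t {
    kSkinning  = 1u << 0,
    kNormalMap = 1u << 1,
    kFog       = 1u << 2,
};

struct CompiledShader { std::uint32_t key; }; // stand-in for precompiled bytecode

class ShaderCache {
public:
    explicit ShaderCache(std::function<CompiledShader(std::uint32_t)> loader)
        : loader_(std::move(loader)) {}

    // Returns the shader for a feature mask, loading it on first use only.
    const CompiledShader& Get(std::uint32_t featureMask) {
        auto it = cache_.find(featureMask);
        if (it == cache_.end())
            it = cache_.emplace(featureMask, loader_(featureMask)).first;
        return it->second; // unordered_map references stay valid across rehashing
    }

    std::size_t LoadedCount() const { return cache_.size(); }

private:
    std::function<CompiledShader(std::uint32_t)> loader_;
    std::unordered_map<std::uint32_t, CompiledShader> cache_;
};
```

Logging which masks are actually requested during play-testing is one way to feed the profile-guided content generation mentioned above.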
Audio Considerations
XACT (Microsoft Cross-Platform Audio Creation Tool)
API and authoring tool parity: author once, deploy to both platforms
Primary difference is wave compression: ADPCM on Windows vs. native XMA support on Xbox 360
XMA: controllable quality setting (varies, typically ~6-14:1)
ADPCM: static ~3.5:1 compression
Likely need to trade memory for bit rate
On Windows, hard disk streaming can offset the lower compression ratio if needed
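Rough numbers show why the compression difference matters. Assuming a 48 kHz, 16-bit stereo source and a 10:1 XMA setting (both example values), one minute of audio is 11,520,000 bytes as PCM, about 3.29 MB as ~3.5:1 ADPCM, and about 1.15 MB as XMA. `PcmBytes` and `CompressedBytes` are hypothetical helpers:

```cpp
#include <cstdint>

// Uncompressed size of a PCM clip.
constexpr std::uint64_t PcmBytes(std::uint32_t sampleRate, std::uint32_t channels,
                                 std::uint32_t bytesPerSample, std::uint32_t seconds) {
    return std::uint64_t(sampleRate) * channels * bytesPerSample * seconds;
}

// Approximate size after compression at a given ratio (truncated to whole bytes).
constexpr std::uint64_t CompressedBytes(std::uint64_t pcmBytes, double ratio) {
    return static_cast<std::uint64_t>(pcmBytes / ratio);
}
```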
Call To Action!
Design your games, engines and production systems with cross platform development in mind
(PC / Xbox 360 / Other)
Invest in making your data build system fast
Take advantage of each platform's strengths
Target a D3D10 content design point and fall back to D3D9+, D3D9, …
Provide feedback on how we can make production easier
Attend the XACT, HLSL, SM4.0 and Performance Hands-On Labs
Questions?