common c++ performance mistakes in games pete isensee xbox advanced technology group
TRANSCRIPT
![Page 1: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/1.jpg)
Common C++ Performance Mistakes in Games
Pete IsenseeXbox Advanced Technology Group
![Page 2: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/2.jpg)
About the Data• ATG reviews code to find
bottlenecks and make perf recommendations– 50 titles per year– 96% use C++– 1 in 3 use “advanced” features like
templates or generics
![Page 3: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/3.jpg)
Why This Talk Is Important• The majority of Xbox games are
CPU bound• The CPU bottleneck is often a
language or C++ library issue• These issues are not usually
specific to the platform
![Page 4: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/4.jpg)
Format• Definition of the problem• Examples• Recommendation• For reference
– A frame is 17 or 33 ms (60fps / 30fps)– Bottlenecks given in ms per frame
![Page 5: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/5.jpg)
Issue: STL• Game using std::list
– Adding ~20,000 objects every frame– Rebuilding the list every frame– Time spent: 6.5 ms/frame!– ~156K overhead (2 pointers per
node)– Objects spread all over the heap
![Page 6: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/6.jpg)
std::set and map• Many games use set/map as
sorted lists• Inserts are slow (log(N))• Memory overhead: 3 ptrs + color• Worst case in game: 3.8 ms/frame
![Page 7: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/7.jpg)
std::vector• Hundreds of push_back()s per
frame• VS7.1 expands vector by 50%• Question: How many reallocations
for 100 push_back()s?• Answer: 13!
(1,2,3,4,5,7,10,14,20,29,43,64,95)
![Page 8: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/8.jpg)
Clearly, the STL is Evil
![Page 9: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/9.jpg)
Use the Right Tool for the Job
• The STL is powerful, but it’s not free
• Filling any container is expensive• Be aware of container overhead• Be aware of heap fragmentation
and cache coherency• Prefer vector, vector::reserve()
![Page 10: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/10.jpg)
The STL is Evil, Sometimes• The STL doesn’t solve every problem• The STL solves some problems poorly• Sometimes good old C-arrays are the
perfect container• Mike Abrash puts it well:
– “The best optimizer is between your ears”
![Page 11: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/11.jpg)
Issue: NIH Syndrome• Example: Custom binary tree
– Sorted list of transparent objects– Badly unbalanced– 1 ms/frame to add only 400 items
• Example: Custom dynamic array class– Poorer performance than std::vector– Fewer features
![Page 12: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/12.jpg)
Optimizations that Aren’tvoid appMemcpy( void* d, const void* s, size_t b )
{
// lots of assembly code here ...
}
appMemcpy( pDest, pSrc, 100 ); // bottleneck
• appMemcpy was slower than memcpy for anything under 64K
![Page 13: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/13.jpg)
Invent Only What You Need
• std::set/map more efficient than the custom tree by 10X– Tested and proven– Still high overhead
• An even better solution– Unsorted vector or array– Sort once– 20X improvement
![Page 14: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/14.jpg)
Profile• Run your profiler
– Rinse. Repeat.– Prove the improvement.
• Don’t rewrite the C runtime or STL just because you can. There are more interesting places to spend your time.
![Page 15: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/15.jpg)
Issue: Tool Knowledge• If you’re a programmer, you use
C/C++ every day• C++ is complex• CRT and STL libraries are complex• The complexities matter• Sometimes they really matter
![Page 16: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/16.jpg)
vector::clear• Game reused global vector in frame loop• clear() called every frame to empty the
vector• C++ Standard
– clear() erases all elements (size() goes to 0)– No mention of what happens to vector
capacity
• On VS7.1/Dinkumware, frees the memory• Every frame reallocated memory
![Page 17: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/17.jpg)
Zero-Initializationstruct Array { int x[1000]; };
struct Container {
Array arr;
Container() : arr() { }
};
Container x; // bottleneck
• Costing 3.5 ms/frame• Removing : arr() speeds this by
20X
![Page 18: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/18.jpg)
Know Thine Holy Standard• Use resize(0) to reduce container
size without affecting capacity• T() means zero-initialize PODs. Don’t
use T() unless you mean it.• Get a copy of the C++ Standard.
Really.– www.techstreet.com; search on 14882– Only $18 for the PDF
![Page 19: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/19.jpg)
Issue: C Runtimevoid BuildScore( char* s, int n )
{
if( n > 0 )
sprintf( s, “%d”, n );
else
sprintf( s, “” );
}
• n was often zero• sprintf was a hotspot
![Page 20: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/20.jpg)
qsort• Sorting is important in games• qsort is not an ideal sorting
function– No type safety– Comparison function call overhead– No opportunity for compiler inlining
• There are faster options
![Page 21: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/21.jpg)
Clearly, the CRT is Evil
![Page 22: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/22.jpg)
Understand Your Options• itoa() can replace sprintf( s, “%d”, n
) • *s = ‘\0’ can replace sprintf( s, “” )• std::sort can replace qsort
– Type safe– Comparison can be inlined
• Other sorting options can be even faster: partial_sort, partition
![Page 23: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/23.jpg)
Issue: Function Calls• 50,000-100,000 calls/frame is normal• At 60Hz, Xbox has 12.2M cycles/frame• Function call/return averages 20 cycles• A game calling 61,000 functions/frame
spends 10% CPU (1.7 ms/frame) in function call overhead
![Page 24: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/24.jpg)
Extreme Function-ality• 120,000 functions/frame• 140,000 functions/frame• 130,000 calls to a single
function/frame (ColumnVec<3,float>::operator[])
• And the winner:– 340,000 calls per frame!– 9 ms/frame of call overhead
![Page 25: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/25.jpg)
Beware Elegance• Elegance → levels of indirection →
more functions → perf impact• Use algorithmic solutions first
– One pass through the world– Better object rejection– Do AI/physics/networking less often
than once/frame
![Page 26: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/26.jpg)
Inline Judiciously• Remember: inline is a suggestion• Try “inline any suitable” compiler
option– 15 to 20 fps– 68,000 calls down to 47,000
• Try __forceinline or similar keyword– Adding to 5 funcs shaved 1.5 ms/frame
• Don’t over-inline
![Page 27: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/27.jpg)
Issue: for loops// Example 1: Copy indices to push buffer
for( DWORD i = 0; i < dwIndexCnt; ++i )
*pPushBuffer++ = arrIndices[ i ];
// Example 2: Initialize vector array
for( DWORD i = 0; i < dwMax; ++i )
mVectorArr[i] = XGVECTOR4(0,0,0,0);
// Example 3: Process items in world
for( itr i = c.begin(); i < c.end(); ++i )
Process( *i );
![Page 28: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/28.jpg)
Watch Out For For• Never copy/clear a POD with a for
loop• std::algorithms are optimized; use
them
memcpy( pPushBuffer, arrIndices,
dwIndexCnt * sizeof(DWORD) );
memset( mVectorArr, 0, dwMax * sizeof(XGVECTOR4) );
for_each( c.begin(), c.end(), Process );
![Page 29: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/29.jpg)
Issue: Exception Handling• Most games never throw• Most games never catch• Yet, most games enable EH• EH adds code to do stack unwinding
– A little bit of overhead to a lot of code– 10% size increase is common– 2 ms/frame in worst case
![Page 30: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/30.jpg)
Disable Exception Handling
• Don’t throw or catch exceptions• Turn off the C++ EH compiler
option• For Dinkumware STL
– Define “_HAS_EXCEPTIONS=0”– Write empty _Throw and _Raise_handler; see
stdthrow.cpp and raisehan.cpp in crt folder– Add #pragma warning(disable: 4530)
![Page 31: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/31.jpg)
Issue: Strings• Programmers love strings• Love hurts• ~7000 calls to stricmp in frame
loop– 1.5 ms/frame
• Binary search of a string table– 2 ms/frame
![Page 32: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/32.jpg)
Avoid strings• String comparisons don’t belong in
the frame loop• Put strings in an table and
compare indices• At least optimize the comparison
– Compare pointers only– Prefer strcmp to stricmp
![Page 33: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/33.jpg)
Issue: Memory Allocation• Memory overhead
– Xbox granularity/overhead is 16/16 bytes
– Overhead alone is often 1+ MB
• Too many allocations– Games commonly do thousands of
allocations per frame– Cost: 1-5 ms/frame
![Page 34: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/34.jpg)
Hidden Allocations• push_back(), insert() and friends
typically allocate memory• String constructors allocate• Init-style calls often allocate• Temporary objects, particularly
string constants that convert to string objects
![Page 35: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/35.jpg)
Minimize Per-Frame Allocations
• Use memory-friendly data structures, e.g. arrays, vectors
• Reserve memory in advance• Use custom allocators
– Pool same-size allocations in a single block of memory to avoid overhead
• Use the explicit keyword to avoid hidden temporaries
• Avoid strings
![Page 36: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/36.jpg)
Other Tidbits• Compiler settings: experiment• dynamic_cast: just say no• Constructors: performance killers• Unused static array space: track this• Loop unrolling: huge wins, sometimes• Suspicious comments: watch out
– “Immensely slow matrix multiplication”
![Page 37: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/37.jpg)
Wrap Up• Use the Right Tool for
the Job• The STL is Evil,
Sometimes• Invent Only What You
Need• Profile• Know Thine Holy
Standard• Understand Your Options
• Beware Elegance• Inline Judiciously• Watch Out For For• Disable Exception
Handling• Avoid Strings• Minimize Per-frame
Allocations
![Page 38: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/38.jpg)
Call to Action: Evolve!• Pass the rubber chicken
– Share your C++ performance mistakes with your team
• Mentor junior programmers– So they only make new mistakes
• Don’t stop learning– You can never know enough C++
![Page 39: Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group](https://reader035.vdocuments.us/reader035/viewer/2022062312/5518d07d550346991f8b5c8b/html5/thumbnails/39.jpg)
Questions• Fill out your feedback forms• Email: [email protected]• This presentation:
www.tantalon.com/pete.htm