molecular-matters.com
Game Connection 2012
Memory Management Strategies
Master Class

About myself
Studied computer science at VUT, Austria
Working in the games industry since 2004
  PC, Xbox 360, PS2, PS3, Wii, DS
Specialization in low-level programming (threading, debugging, optimization)
Teaching
Founder & CTO @ Molecular Matters
  Middleware for the games industry

Master class
Participation
Exchange of experiences
Discussion
There is no perfect way of doing things
  There are many „rights“ & „wrongs“
Let us talk about past experiences, mistakes, improvements
Share ideas! Ask questions!

Agenda
C++ new/delete/placement syntax
Virtual memory
Allocators
Allocation strategies
Debugging facilities
  Fill patterns
  Bounds checking
  Memory tracking

Agenda (cont'd)
Custom memory system
Relocatable allocations
Run-time defragmentation
Debugging memory-related bugs
  Stack overflow
  Memory overwrites

What's wrong with that?
void* operator new(size_t size, unsigned int align)
{
    // align memory by some means
    return _aligned_malloc(size, align);
}

NonPOD* nonPod = new (32) NonPOD;
NonPOD* nonPodAr = new (32) NonPOD[10];
Addresses of nonPod and array?

C++ new
How do we allocate memory?
  Using the new operator (keyword new)
    T* instance = new T;
What happens behind the scenes?
  Calls operator new to allocate storage for a T
  Calls the constructor for non-POD types

C++ delete
How do we free memory?
  Using the delete operator (keyword delete)
    delete instance;
What happens behind the scenes?
  Calls the destructor for non-POD types
  Calls operator delete to free storage

C++ new, placement syntax
Keyword new supports placement syntax
Canonical form called placement new
  Calls operator new(size_t, void*)
  Returns the given pointer, does not allocate memory
  Constructs an instance in-place
    T* instance = new (memory) T;
  Destructor needs to be called manually
    instance->~T();
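The canonical placement new described on this slide can be sketched as follows. This is a minimal illustration assuming a C++11 compiler (for alignas); the Counter type and the PlacementNewRoundTrip function are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <new>      // declares placement operator new(size_t, void*)

// Hypothetical type used only for this illustration.
struct Counter
{
    static int liveInstances;
    int value;
    explicit Counter(int v) : value(v) { ++liveInstances; }
    ~Counter() { --liveInstances; }
};
int Counter::liveInstances = 0;

// Constructs a Counter in caller-provided storage, then destroys it manually.
// Returns true if the object lived at exactly the given address and the
// constructor/destructor pair ran once each.
bool PlacementNewRoundTrip()
{
    alignas(Counter) unsigned char memory[sizeof(Counter)];

    // No allocation happens here: operator new(size_t, void*) returns 'memory'.
    Counter* instance = new (memory) Counter(42);
    const bool sameAddress = (static_cast<void*>(instance) == static_cast<void*>(memory));
    const bool constructed = (instance->value == 42) && (Counter::liveInstances == 1);

    // There is no matching "placement delete" keyword: call the destructor by hand.
    instance->~Counter();
    return sameAddress && constructed && (Counter::liveInstances == 0);
}
```

The storage itself is owned by the caller, so nothing is freed after the destructor call.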

C++ new, placement syntax (cont'd)
Placement syntax supports N parameters
The compiler maps keyword new to the corresponding overload of operator new
  T* instance = new (10, 20, 30) T;
  calls
  void* operator new(size_t, int, int, int);
First argument must always be of type size_t
  sizeof(T) is inserted by the compiler

C++ new, placement syntax (cont'd)
Very powerful! Custom overloads for operator new
Each overload must offer a corresponding operator delete
Can store arbitrary arguments for each call to new An operator is just a function
Can be called directly if desired Needs manual constructor call using placement new
Can use templates
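A custom overload with its corresponding operator delete, and a direct call to that operator, might look like the sketch below. The tag parameter, g_lastTag, Widget and AllocateWithTag are assumptions made up for the illustration; the overload simply forwards to malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical value recorded by the overload below; not part of any real API.
static int g_lastTag = 0;

// Custom placement overload: the compiler passes sizeof(T) as the first argument.
void* operator new(std::size_t size, int tag)
{
    g_lastTag = tag;
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc();
    return p;
}

// Corresponding overload; called automatically only if the constructor throws.
void operator delete(void* ptr, int /*tag*/) noexcept
{
    std::free(ptr);
}

struct Widget
{
    int x;
    explicit Widget(int v) : x(v) {}
};

// Allocates through the overload, then destroys and frees by hand.
int AllocateWithTag()
{
    Widget* w = new (7) Widget(3);     // calls operator new(sizeof(Widget), 7)
    const int result = g_lastTag * 10 + w->x;

    // Keyword delete has no placement form, so both steps are explicit.
    w->~Widget();
    operator delete(w, 7);             // an operator is just a function
    return result;
}
```

Because the memory came from the custom overload, it is released through the matching overload rather than through plain delete.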

C++ delete, placement syntax
Keyword delete does not support placement syntax
  delete (instance, 10, 20);
  Treated as a statement using the comma operator
Overloads of operator delete are used when an exception is thrown upon a call to new
Overloads can also be called directly
  Needs manual destructor call

C++ new[]
Creates an array of instances
Similar to keyword new, calls operator new[]
Calls the constructor for each non-POD instance
Supports placement syntax
  Custom overloads of operator new[] possible
First sizeof() argument is compiler-specific
  POD vs. non-POD!

C++ new[] (cont'd)
For non-PODs, constructors are called
delete[] needs to call destructors
  How many destructors to call?
  Compiler needs to store the number of instances
Most compilers add an extra 4 bytes to the allocation size
  sizeof(T)*N + 4 (non-POD)
  sizeof(T)*N (POD)

C++ new[] (cont'd)
Important!
Address returned by operator new[] != address of first instance in the array
  Source of confusion
  Compiler-specific behaviour, makes it almost impossible to call overloads of operator delete[] directly
    Do we need to go back 4 bytes or not?
  Makes support for custom alignment harder
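The extra bytes can be observed by giving a non-POD type its own operator new[] and comparing the raw block with the array pointer. This is only a sketch: the globals and MeasureArrayOverhead are hypothetical names, and the actual overhead is compiler-specific (on 64-bit targets it is commonly larger than the 4 bytes mentioned above).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Record what the compiler requested and what the raw allocation returned.
static std::size_t g_requestedBytes = 0;
static void* g_rawBlock = nullptr;

struct NonPOD
{
    int value;
    NonPOD() : value(0) {}
    ~NonPOD() {}   // non-trivial destructor: delete[] must know the element count

    static void* operator new[](std::size_t size)
    {
        g_requestedBytes = size;
        g_rawBlock = std::malloc(size);
        if (!g_rawBlock) throw std::bad_alloc();
        return g_rawBlock;
    }
    static void operator delete[](void* ptr) { std::free(ptr); }
};

// Returns the number of bytes between the raw block and the first element.
std::size_t MeasureArrayOverhead()
{
    NonPOD* array = new NonPOD[10];
    const std::size_t offset = static_cast<std::size_t>(
        reinterpret_cast<unsigned char*>(array) -
        static_cast<unsigned char*>(g_rawBlock));
    delete[] array;
    return offset;
}
```

On compilers that store the count in front of the array, the returned offset is non-zero and the requested size exceeds sizeof(NonPOD) * 10 by at least that amount.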

C++ delete[]
Deletes an array of instances Similar to keyword delete, calls operator delete[] Calls the destructors for each non-POD instance
in reverse order Again, POD vs. non-POD
Number of instances to destruct is stored by the compiler for non-POD types

C++ new vs. delete mismatch
Allocating with new, deleting with delete[]
  operator delete[] expects the number of instances
  May crash
Allocating with new[], deleting with delete
  More subtle bugs, only one destructor will be called
Visual Studio heap implementation is smart enough to detect both mismatches

Summary
new != operator new
delete != operator delete
new[]/delete[] are compiler-specific
Never mix new/delete[] and new[]/delete
new offers powerful placement syntax

Virtual memory
Each process = virtual address space Not to be confused with paging to hard disk Virtual memory != physical memory Address translation done by MMU OS allocates/reserves memory in pages
Page sizes: 4KB, 64KB, 1MB, ...

Virtual memory (cont'd)
Virtual addresses are mapped to physical memory addresses Contiguous virtual addresses != contiguous
physical memory A single page is the smallest amount of
memory that can be allocated Access restrictions on a per-page level
Read, write, execute, ...

Virtual memory (cont'd)
Simplest address translation: Virtual address = page directory + offset Page directory = physical memory page +
additional info Page directory entries set by OS
In practice: Multi-level address translation See „What every programmer should know about
memory“ by Ulrich Drepper http://lwn.net/Articles/253361/

Virtual memory (cont'd)
Address translation is expensive Several accesses to memory
„Page walk“
Result of address translation is cached Translation Look-aside Buffer (TLB)
Multiple levels, like D$ or I$ TLB = Global resource per processor

Virtual memory (cont'd)
Allows allocating contiguous memory even if the physical memory is not contiguous
Available on many architectures (PC, Mac, Linux, almost all consoles)
Used by CPU only GPU, sound hardware, etc. needs contiguous
physical memory E.g. XPhysicalAlloc

Virtual memory (cont'd)
Growing allocators can account for worst-case scenarios more easily when using VM
Different address ranges for different purposes Heap, stack, code, write-combined, ... Helps with debugging!

Summary
Virtual memory nice to have, but not a necessity Can help tremendously with debugging
Virtual memory made available to CPU, not GPU or other hardware
Virtual memory address range >> RAM

Why different allocators?
No silver bullet, many allocation qualities Size Fragmentation Wasted space Performance Thread-safety Cache-locality Fixed size vs. growing

Common allocators
Linear Stack, double-ended stack Pool Micro One-frame, two-frame temporary Double-buffered I/O General-purpose

Linear allocator
+ Supports any size and alignment
+ Extremely fast, simply bumps a pointer
+ No fragmentation
+ No wasted space
+ Lock-free implementation possible
+ Allocations live next to each other
- Must free all allocations at once
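A minimal sketch of the pointer-bumping idea, assuming a caller-provided buffer and power-of-two alignments. The class and method names are illustrative, not taken from the talk, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal linear allocator over a caller-provided buffer (illustrative names).
class LinearAllocator
{
public:
    LinearAllocator(void* start, std::size_t size)
        : m_start(static_cast<unsigned char*>(start))
        , m_end(m_start + size)
        , m_current(m_start)
    {}

    // 'alignment' must be a power of two. Returns nullptr when out of memory.
    void* Allocate(std::size_t size, std::size_t alignment)
    {
        const std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(m_current);
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (raw + mask) & ~mask;
        const std::size_t padding = static_cast<std::size_t>(aligned - raw);

        if (padding + size > static_cast<std::size_t>(m_end - m_current))
            return nullptr;

        unsigned char* result = m_current + padding;
        m_current = result + size;     // the "bump"
        return result;
    }

    // Individual allocations cannot be freed; everything goes at once.
    void Reset() { m_current = m_start; }

    std::size_t GetUsedSize() const
    {
        return static_cast<std::size_t>(m_current - m_start);
    }

private:
    unsigned char* m_start;
    unsigned char* m_end;
    unsigned char* m_current;
};
```

A stack allocator would differ mainly in remembering the previous value of the current pointer so that the most recent allocation can be rolled back.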

Stack allocator
+ Supports any size and alignment
+ Extremely fast, simply bumps a pointer
+ No fragmentation
+ No wasted space
+ Lock-free implementation possible
+ Allocations live next to each other
+/- Must free allocations in reverse-order

Double-ended stack allocator
Similar to stack allocator Can allocate from bottom or top Bottom for resident allocations Top for temporary allocations
Mostly used for level loading

Pool allocator
- Supports one allocation size only
+ Very fast, simple pointer exchange
+ Fragments, but can always allocate
+ No wasted space
+ Lock-free implementation possible
- Holes between allocations
+ Memory can be allocated/freed in any order

Pool allocator (cont'd)
In-place free list No extra memory for book-keeping Re-use memory of freed allocations
Point to next free entry
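The in-place free list can be sketched as below: each free slot stores the pointer to the next free slot inside its own memory. Names are illustrative; the sketch assumes the element size is at least sizeof(void*) and a multiple of the required alignment, and it is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Minimal fixed-size pool with an in-place free list (illustrative names).
class PoolAllocator
{
public:
    PoolAllocator(void* memory, std::size_t elementSize, std::size_t elementCount)
        : m_head(nullptr)
    {
        unsigned char* p = static_cast<unsigned char*>(memory);
        // Push slots back to front so that slot 0 ends up at the head.
        for (std::size_t i = elementCount; i > 0; --i)
            Free(p + (i - 1) * elementSize);
    }

    // Pops the head of the free list; returns nullptr when the pool is empty.
    void* Allocate()
    {
        if (!m_head)
            return nullptr;
        FreeNode* node = m_head;
        m_head = node->next;
        return node;
    }

    // Re-uses the freed slot's own memory to store the link to the next free slot.
    void Free(void* ptr)
    {
        m_head = new (ptr) FreeNode{m_head};
    }

private:
    struct FreeNode { FreeNode* next; };
    FreeNode* m_head;
};
```

Note that the most recently freed slot is handed out first, which is exactly why stale members in re-used pool entries are a common source of bugs.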

Micro allocator
Similar to pool allocator, but different pools for different sizes
+ Very fast, lookup & simple pointer exchange
+ Fragments, but can always allocate
- Some wasted space depending on size
+ Can use pool-local critical sections / lock-free
- Holes between allocations
+ Memory can be allocated/freed in any order

One-frame temporary allocator
Similar to linear allocator Used for scratchpad allocations during a frame
Another alternative is to use stack memory Fixed-size alloca()

Two-frame temporary allocator
Similar to one-frame temporary allocator Ping-pong between two one-frame allocators
Results from frame N persist until frame N+1 Useful for operations with 1 frame latency
Raycasts

Double-buffered I/O allocator
Two ping-pong buffers Read into buffer A, consume from buffer B Initiate reads while consuming Useful for async. sequential reads from disk
Interface offers Consume() only Async. reads done transparently & interleaved

General-purpose
Must cope with small & large allocations Used for 3rd party libraries Properties
- Slow - Fragmentation - Wasted memory, allocation overhead - Must use heavy-weight synchronization

General-purpose (cont'd)
Common implementations „High Performance Heap Allocator“ in GPG7 Doug Lea's „dlmalloc“ Emery Berger's „Hoard“

Growing allocators
With virtual memory Reserve worst-case up front Backup with physical memory when growing Less hassle during development Can grow without relocating allocations
Without virtual memory Resize allocator for e.g. each level Needs adjustment during development

Allocators
Separate how from where
  How = allocator
  Where = heap or stack
Offers more possibilities
  Allows using the stack with different allocators

Summary
No allocator fits all purposes Each allocator has different pros/cons Ideally, for each allocation think about
Size Frequency Lifetime Threading

Why do we need a strategy?
Using a general-purpose allocator everywhere leads to Fragmented memory Wasted memory Somewhat unclear memory ownership Excessive clean-up before shipping
We can do better!

Decision criteria
Lifetime Application lifetime Level lifetime Temporary

Decision criteria (cont'd)
Purpose Temporary while loading a level Temporary during a frame Purely visual (e.g. bullet holes)
LRU scheme Streaming I/O Gameplay critical

Decision criteria (cont'd)
Frequency Once Each level load Each frame N times per frame
Should be avoided in the first place

Where would you put those?
Application-wide singleton allocations Render queue/command buffer allocations Level assets Particles Bullets, collision points, … 3rd party allocations Strings

Strategy
Classify allocations based on lifetime first Classify allocations based on
purpose/size/frequency second Again, no silver bullet
Make everybody on the team think Make people aware of consequences &
alternatives Provide alternatives to general-purpose allocators

Why?
People make mistakes In relation to memory allocations, mistakes
can be fatal Provide tools for finding mistakes as early as
possible

Common mistakes
Forgetting to initialize class members Code then relies on garbage or stale values
Writing out-of-bounds Reading out-of-bounds Memory leaks Dangling pointers

Possible outcomes
From best to worst Compile error, warning Analysis tool error, warning Assertion Consistent crash or corruption Random, unpredictable crash Crashes after N hours of gameplay Seemingly works, crashes at certification

Class member initialization
Non-initialized members get garbage Whatever the memory content is
Lucky case Leads to a crash later
Unlucky case Works, but not as intended Subtle bugs N frames later

Class member initialization (cont'd)
Extra painful with pool allocators Most recently freed entry is re-used Members point to stale data

Class member initialization (cont'd)
Remedy Crank up compiler warning level Fill memory with distinct pattern when allocating Fill memory with distinct pattern when freeing Use static code analysis tools

Fill patterns
If you can, do not fill with zero Hides bugs and uninitialized pointer members
E.g. delete 0 still works, but pointer was never initialized Often not possible when dealing with legacy code
Fill with „uncommon“ values Numerically odd Unused address range E.g. 0xCD, 0xFD, 0xAB

Fill patterns (cont'd)
Different patterns for New allocation Freed allocation Guard bytes for allocation info Any other state
Fill patterns should be supported by each allocator More on that later

Writing out-of-bounds
Could stomp non-critical data Could stomp allocator book-keeping data Could write to non-committed memory Lucky case
Immediate crash Unlucky case
Weird behaviour N frames later

Writing out-of-bounds (cont'd)
Remedy Bounds checking Use static code analysis tools Use page access protection
More on that later

Bounds checking
Add guard bytes at the front and back of an allocation
Fill guard bytes with distinct pattern Different check levels
Check front and back upon freeing an allocation Walk list of all allocations upon allocating & freeing Walk list of all allocations each frame
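Guard bytes combined with a fill pattern for new allocations can be sketched as a thin wrapper around malloc. The guard size, the pattern values and all function names are arbitrary choices for this illustration; the sketch stores the user size in front of the block and does not handle custom alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Illustrative constants; any distinct, "uncommon" values would do.
static const std::size_t GUARD_SIZE = 8;
static const unsigned char GUARD_PATTERN = 0xFD;
static const unsigned char NEW_PATTERN = 0xCD;

// Layout: [size_t userSize][front guard][user memory][back guard]
void* GuardedAllocate(std::size_t size)
{
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(std::size_t) + GUARD_SIZE + size + GUARD_SIZE));
    if (!raw)
        return nullptr;

    std::memcpy(raw, &size, sizeof(size));
    unsigned char* user = raw + sizeof(std::size_t) + GUARD_SIZE;
    std::memset(user - GUARD_SIZE, GUARD_PATTERN, GUARD_SIZE);
    std::memset(user, NEW_PATTERN, size);          // marks uninitialized memory
    std::memset(user + size, GUARD_PATTERN, GUARD_SIZE);
    return user;
}

// Returns true if neither the front nor the back guard has been overwritten.
bool GuardsIntact(const void* ptr)
{
    const unsigned char* user = static_cast<const unsigned char*>(ptr);
    std::size_t size = 0;
    std::memcpy(&size, user - GUARD_SIZE - sizeof(std::size_t), sizeof(size));

    for (std::size_t i = 0; i < GUARD_SIZE; ++i)
    {
        if (*(user - GUARD_SIZE + i) != GUARD_PATTERN) return false;
        if (*(user + size + i) != GUARD_PATTERN) return false;
    }
    return true;
}

// Checks the guards upon freeing and reports whether they were intact.
bool GuardedFree(void* ptr)
{
    const bool intact = GuardsIntact(ptr);
    std::free(static_cast<unsigned char*>(ptr) - GUARD_SIZE - sizeof(std::size_t));
    return intact;
}
```

The stricter check levels listed above would call GuardsIntact for every live allocation, which requires keeping a list of them.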

Bounds checking (cont'd)
Useful for finding simple overwrites Intrusive
Changes the memory layout Needs more memory Cannot use debug memory on consoles

Reading out-of-bounds
Reads garbage data Garbage could lead to crash later Read operation itself could also crash
Very rare, reading from protected page Worst case
Goes unnoticed for months Seldom crashes

Reading out-of-bounds (cont'd)
Remedy Bounds checking, somewhat insufficient
At least the garbage read is always the same Use static code analysis tools Use page access protection
More on that later

Memory leaks
Lead to out-of-memory conditions after hours of playing the game
Can go unnoticed for weeks „User-code“ leaks API leaks System memory leaks Logical leaks

Memory leaks (cont'd)
„User-code“ leaks Calls to new, malloc, or similar Never free the allocation
Memory can never be reclaimed Easiest to track

Memory leaks (cont'd)
API leaks Platform-specific leaks, e.g. D3D
Release() decreases reference-count Some API calls internally increase reference-count Reference-count != 0 upon releasing ownership
Harder to detect APIs often use platform-specific allocation functions
XPhysicalAlloc
Harder to track

Memory leaks (cont'd)
System memory leaks Platform-specific system leaks, e.g. live kernel
objects Synchronization primitives File handles ...
Hardest to detect Hardest to track

Memory leaks (cont'd)
Logical leaks No visible leaks at program termination Leaks at e.g. each level restart
Ever-growing singleton Singleton releases memory upon termination

Memory leaks (cont'd)
Remedy Memory tracking Soak tests Use static code analysis tools

Memory tracking
Two possibilities Allocate & free memory with new & delete only
Override global new and delete operators Static initialization order fiasco
Memory tracker must be instantiated first Platform-specific tricks needed
Still need to hook other functions malloc & free Weak functions or linker function wrapping
Problems with 3rd party libraries doing the same

Memory tracking (cont'd)
Static initialization order
  Top to bottom in a translation unit
  Undefined across translation units
Platform-specific tricks
  #pragma init_seg(compiler/lib/user) (MSVC)
  __attribute__ ((init_priority (101))) (GCC)
  Custom code/linker sections

Memory tracking (cont'd)
Other possibility Disallow new & delete completely
Use custom functions instead Allows more fine-grained tracking in allocators
Demand explicit startup & shutdown Harder to do with legacy codebase
Need to hook platform-specific functions Easier on consoles Code-patching on PC

Memory tracking (cont'd)
Hooking functions Weak functions (if supported) Implemented at compiler-level
XMemAlloc (Xbox360) Function wrapping at linker-level
--wrap (GCC) __wrap_* & __real_*
Code-patching (Windows) Disassemble & patch at run-time

Memory tracking (cont'd)
Storing allocations Intrusive
In-place linked list Changes memory layout Cannot use debug memory on consoles
External Associative container, e.g. hash map Same memory layout as with tracking disabled Can use debug memory on consoles

Memory tracking (cont'd)
Levels of sophistication Simple counter File & line & function, size Full callstack, ID, allocator name, size, …
Tracking granularity Global, e.g. #ifdef/#endif Per allocator
More on that later

Memory tracking (cont'd)
Simple counter Very fast Near-zero memory footprint Can be enabled in all builds during development Switch to a different tracker when a leak has been
found

Memory tracking (cont'd)
File & line & function Fast Moderate overhead & footprint per allocation Provides enough info most of the time Tricky cases
Arriving at low-level functions & containers from many different code paths

Memory tracking (cont'd)
Full callstack & additional info Slow High overhead & footprint per allocation Can pinpoint tricky leaks

Memory tracking (cont'd)
Finding logical leaks Snapshot functionality
Capture all live allocations Compare snapshots e.g. upon level start Report snapshot differences
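Snapshot comparison on top of an external (non-intrusive) tracker might be sketched like this. All names are hypothetical, and std::map is used purely for brevity; a production tracker would draw its own storage from dedicated or debug memory, and would likely key on an allocation ID as well, since addresses get re-used.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Per-allocation info kept outside the allocation itself.
struct AllocationInfo
{
    std::size_t size;
    std::string description;
};

class MemoryTracker
{
public:
    typedef std::map<const void*, AllocationInfo> Snapshot;

    void OnAllocation(const void* ptr, std::size_t size, const char* description)
    {
        m_live[ptr] = AllocationInfo{size, description};
    }

    void OnDeallocation(const void* ptr) { m_live.erase(ptr); }

    // Captures all live allocations.
    Snapshot TakeSnapshot() const { return m_live; }

    // Reports allocations that are live in 'after' but were not live in 'before'.
    static std::vector<AllocationInfo> Diff(const Snapshot& before, const Snapshot& after)
    {
        std::vector<AllocationInfo> added;
        for (Snapshot::const_iterator it = after.begin(); it != after.end(); ++it)
        {
            if (before.find(it->first) == before.end())
                added.push_back(it->second);
        }
        return added;
    }

private:
    Snapshot m_live;
};
```

Taking a snapshot at every level start and diffing it against the previous one surfaces allocations that accumulate across restarts even though they are released at shutdown.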

Memory tracking (cont'd)
Out-of-memory conditions Stop all threads Report all allocations Memory browser
Display allocations & info on screen Support browsing via input device Needs fail-safe (CPU) rendering
Never allocate just a single byte No strings, no hidden allocations

Requirements
No global new and delete Custom functions used instead
Want to keep keyword new syntax Want to have placement-like syntax for delete Want to have per-allocator features like
tracking and bounds checking Provide as much extra info as possible

Requirements (cont'd)
No global new and delete No static initialization order fiasco Plays nice with 3rd party libraries
Custom functions Can make use of additional parameters Can be specialized for certain types

Requirements (cont'd)
Want to keep keyword new syntax It's what people know Allows passing arbitrary arguments to the
constructor Want to have placement-like syntax for delete
Additional parameters

Requirements (cont'd)
Want to have per-allocator features like tracking and bounds checking Better granularity of checks Better performance Less overhead Smaller memory footprint

Proposed implementation
3 tiers Low-level allocators High-level arenas new & delete replacement macros

First tier
Low-level allocators Allocate & free raw memory
Similar to malloc & free Do not call constructors Do not call destructors Used by low-level code
E.g. building this frame's command buffer No interface, no virtual functions

Second tier
High-level arenas Use an allocator internally Call constructors, support array construction Call destructors, support array deletion Used by replacement macros Simple abstract interface Provide (optional) debugging facilities

Second tier (cont'd)
Optional facilities Fill patterns Bounds checking Memory tracking Out-of-memory conditions
Two C++ approaches Piggyback (templated) implementations Policy-based design

Second tier (cont'd)
Piggyback implementations
  Allocators taking others by template
  Each piggyback allocator adds one functionality
    ThreadSafe<MemoryTracker<BoundsChecker<LinearAllocator> > > allocator;
  Memory arena then simply takes an allocator

Second tier (cont'd)
Policy-based design
  Each facility becomes a policy
  Each policy becomes a template parameter
  Different policy implementations
    template <class AllocationPolicy, class BoundsCheckingPolicy, class MemoryTrackingPolicy>
    class MemoryArena
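A reduced sketch of the policy-based arena, with only an allocation policy and a tracking policy (bounds checking is left out to keep it short). Every policy name here is a hypothetical example rather than a class from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocation policy that forwards to malloc/free.
struct MallocPolicy
{
    void* Allocate(std::size_t size) { return std::malloc(size); }
    void Free(void* ptr) { std::free(ptr); }
};

// Tracking policy for retail builds: every call compiles to nothing.
struct NoMemoryTracking
{
    void OnAllocation(void*, std::size_t) {}
    void OnDeallocation(void*) {}
    std::size_t GetAllocationCount() const { return 0; }
};

// Tracking policy implementing the "simple counter" level.
struct CountingMemoryTracking
{
    CountingMemoryTracking() : m_count(0) {}
    void OnAllocation(void*, std::size_t) { ++m_count; }
    void OnDeallocation(void*) { --m_count; }
    std::size_t GetAllocationCount() const { return m_count; }
    std::size_t m_count;
};

template <class AllocationPolicy, class MemoryTrackingPolicy>
class MemoryArena
{
public:
    void* Allocate(std::size_t size)
    {
        void* ptr = m_allocator.Allocate(size);
        if (ptr)
            m_tracker.OnAllocation(ptr, size);
        return ptr;
    }

    void Free(void* ptr)
    {
        m_tracker.OnDeallocation(ptr);
        m_allocator.Free(ptr);
    }

    std::size_t GetAllocationCount() const { return m_tracker.GetAllocationCount(); }

private:
    AllocationPolicy m_allocator;
    MemoryTrackingPolicy m_tracker;
};
```

Because the policies are template parameters, each arena can pick its own level of checking, and disabled facilities cost nothing at run time.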

Third tier
Why not templates? Pre-C++11 code needs tons of permutations for
providing constructor arguments By-value By-reference By-pointer const & volatile qualifiers All combinations thereof

Third tier (cont'd)
Why macros? Allows us to „expand“ upon the keyword new
syntax Constructor arguments are handled automatically Allows us to change the implementation later on Allows us to provide extra info without having to
rely on RTTI

Third tier (cont'd)
Macro approach Use ordinary placement new
Allocate memory from arena Use returned pointer as argument to placement new
Provide extra info Human-readable type name Human-readable ID = description __FILE__, __LINE__, __FUNCTION__

Third tier (cont'd)
#define ME_NEW(type, arena, ID) \
    new ((arena)->Allocate(sizeof(type), __alignof(type), ID, #type, __FILE__, __LINE__, __FUNCTION__)) type

DataStruct* data = ME_NEW(DataStruct, someArena, "A data structure")(arg1, arg2, arg3);

Third tier (cont'd)
The lone „type“ at the end of the macro enables us to pass constructor arguments First parentheses are macro arguments Second parentheses are constructor arguments

Third tier (cont'd)
ME_DELETE(ptr, arena) macro Calls destructor Frees memory owned by the arena
Arena known, no need to store & access somewhere
Similar macros for allocating & freeing arrays Optimizations
Call constructors & destructors for non-PODs only Type traits Type-based dispatch
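Putting the two macros together, a compilable sketch could look as follows. It swaps the compiler-specific __alignof and __FUNCTION__ for the standard alignof and __func__ (an assumption for portability), and SimpleArena, its Free function and the Delete helper are hypothetical stand-ins for a real arena.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical arena: forwards to malloc and remembers the last type name it saw.
class SimpleArena
{
public:
    SimpleArena() : m_liveCount(0), m_lastTypeName("") {}

    void* Allocate(std::size_t size, std::size_t /*alignment*/, const char* /*id*/,
                   const char* typeName, const char* /*file*/, int /*line*/,
                   const char* /*function*/)
    {
        ++m_liveCount;
        m_lastTypeName = typeName;
        return std::malloc(size);   // sketch: relies on malloc's default alignment
    }

    void Free(void* ptr) { --m_liveCount; std::free(ptr); }

    bool LastTypeWas(const char* name) const
    {
        return std::strcmp(m_lastTypeName, name) == 0;
    }

    int m_liveCount;
    const char* m_lastTypeName;
};

// The trailing 'type' lets the caller append constructor arguments.
#define ME_NEW(type, arena, ID) \
    new ((arena)->Allocate(sizeof(type), alignof(type), ID, #type, __FILE__, __LINE__, __func__)) type

// Calls the destructor, then returns the memory to the arena that owns it.
template <typename T, class Arena>
void Delete(T* object, Arena* arena)
{
    if (object)
    {
        object->~T();
        arena->Free(object);
    }
}

#define ME_DELETE(ptr, arena) Delete((ptr), (arena))

struct DataStruct
{
    int sum;
    DataStruct(int a, int b, int c) : sum(a + b + c) {}
};
```

Usage then reads ME_NEW(DataStruct, &arena, "A data structure")(1, 2, 3) followed later by ME_DELETE(data, &arena); the template helper deduces the type, so the destructor call needs no RTTI.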

Relocatable allocations
What for? Asset hot-reloading Streaming Run-time defragmentation

Relocatable allocations (cont'd)
Using raw pointers Constant pointee size
Destruct & construct in-place Asset hot-reloading, only content changes
E.g. CPU texture header, GPU data
Register for callback/event Get notified upon change Can quickly become a mess

Relocatable allocations (cont'd)
Handles Never return raw pointers, only handles Index into a (global) table Table holds pointer to „real“ object
Store extra bits for lifetime/ID in handle & table slot Can identify dangling references to deleted objects
Extra indirection But objects can be packed tightly
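The handle scheme can be sketched as an index plus a generation counter that is bumped whenever a slot is released. Bit widths, the table size and all names are arbitrary choices for the illustration, and the linear search in Create is for brevity only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative handle: table index plus extra bits identifying the slot's lifetime.
struct Handle
{
    std::uint32_t index : 20;
    std::uint32_t generation : 12;
};

class HandleTable
{
public:
    static const std::uint32_t MAX_ENTRIES = 64;

    HandleTable()
    {
        for (std::uint32_t i = 0; i < MAX_ENTRIES; ++i)
        {
            m_entries[i].object = nullptr;
            m_entries[i].generation = 0;
        }
    }

    // 'object' must not be null. Uses the first free slot.
    Handle Create(void* object)
    {
        for (std::uint32_t i = 0; i < MAX_ENTRIES; ++i)
        {
            if (m_entries[i].object == nullptr)
            {
                m_entries[i].object = object;
                Handle h;
                h.index = i;
                h.generation = m_entries[i].generation;
                return h;
            }
        }
        assert(false && "handle table full");
        return Handle();
    }

    // Bumping the generation invalidates every outstanding copy of the handle.
    void Destroy(Handle h)
    {
        Entry& e = m_entries[h.index];
        if (e.generation == h.generation)
        {
            e.object = nullptr;
            e.generation = (e.generation + 1) & 0xFFFu;
        }
    }

    // Returns nullptr for dangling handles.
    void* Resolve(Handle h) const
    {
        const Entry& e = m_entries[h.index];
        return (e.generation == h.generation) ? e.object : nullptr;
    }

    // Relocation only touches the table; handles held by users stay valid.
    void Relocate(Handle h, void* newAddress)
    {
        if (Resolve(h))
            m_entries[h.index].object = newAddress;
    }

private:
    struct Entry { void* object; std::uint32_t generation; };
    Entry m_entries[MAX_ENTRIES];
};
```

A defragmenter would call Relocate after moving an object, which is what makes the extra indirection worthwhile.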

Relocatable allocations (cont'd)
Function arguments? Handles
Less error-prone, user never sees raw pointer Lots of extra indirections
Raw pointers Allow conversion from handle to raw pointer Pointers only valid for one frame
More error-prone Needs more discipline from the team

Run-time defragmentation
Needs relocatable allocations Run for N ms each frame
Wait until v-sync = global sync. point Run on low memory conditions Run on level transitions Run when streaming assets

Run-time defragmentation (cont'd)
Needs to play nice with streaming Priority-based Heuristics-based
Distance Size of streaming data User-controlled
Asynchronous I/O

Run-time defragmentation (cont'd)
Generic defragmentation Differently sized blocks Move allocations
Create large contiguous memory blocks

Run-time defragmentation (cont'd)
Chunk-based defragmentation Define upper-bound for chunk-size, e.g. 1 MB Each single asset must fit within chunk-size Only allocate in chunks Only move chunks
More predictable performance cost Chunks built by content pipeline
Shifts defragmentation problem to off-line tools

Run-time defragmentation (cont'd)
Chunk-based defragmentation (cont'd) Easier to implement Easier to cope with streaming
Stream in chunks only Leads to more wasted memory
Optimizations Make use of platform-specific features
E.g. move stuff in VRAM using the GPU (PS3)

What's wrong with that?
class String
{
public:
    explicit String(const char* str);
    String(const String& other);
    String& operator=(const String& other);
    ~String(void);
    const char* c_str(void) const;
    operator const char*(void) const;
};

String CopyString(const String& str)
{
    String result(str.c_str());
    // …
    return result;
}

What's wrong with that? (cont'd)
String someString("Hello World");
const char* copiedString = CopyString(someString);
const String& otherCopiedString = CopyString(someString);
String& otherCopiedString2 = CopyString(someString);

Memory-related problems
Stack overflows Seldom in main thread More often in other threads
Memory overwrites Off-by-one Wrong casting Reference & pointer to temporary Dangling pointers

Stack overflows
An overflow can stomp single bytes, not just big blocks Thread stack memory comes from heap Can stomp unrelated class members
Often lead to immediate crash upon writing E.g. in function prolog

Stack overflows (cont'd)
Debugging help Find stack start & end
Highly platform-specific Linker-defined symbols for executable
Fill stack with pattern Check for pattern at start & end
Each frame In offending functions

Memory overwrites
Off-by-one
  Write past the end of an array
  0-based vs. 1-based
Debugging tools
  Bounds checking
    Finds those cases rather quickly
  Static code analysis tools

Memory overwrites (cont'd)
Wrong casting
  Multiple inheritance
  Implicit conversions to void*
    No offsets added
  reinterpret_cast
    No offsets added
  static_cast
    Correct offsets
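The difference between the casts shows up as soon as a class has a second non-empty base. The types below are hypothetical; the exact object layout is implementation-specific, but on common ABIs the second base sits at a non-zero offset.

```cpp
#include <cassert>

// Hypothetical hierarchy with two non-empty, polymorphic bases.
struct Renderable { int renderData; virtual ~Renderable() {} };
struct Collidable { int collisionData; virtual ~Collidable() {} };
struct Entity : public Renderable, public Collidable { int entityData; };

// Returns true if static_cast moved the pointer to the Collidable sub-object,
// i.e. the result differs from the address obtained via conversion to void*.
bool StaticCastAdjustsPointer(Entity* entity)
{
    Collidable* viaStatic = static_cast<Collidable*>(entity);   // offset added
    void* raw = entity;                                          // no offset added
    return static_cast<void*>(viaStatic) != raw;
}

// Casting back down with static_cast subtracts the same offset again.
Entity* RoundTrip(Entity* entity)
{
    Collidable* base = static_cast<Collidable*>(entity);
    return static_cast<Entity*>(base);
}
```

Storing an Entity* as void* and later casting that void* straight to Collidable* skips the adjustment, so reads and writes through the result land on the wrong members.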

Memory overwrites (cont'd)
Reference & pointer to temporary Most compilers will warn
Crank up the warning level Compilers are tricked easily Reference vs. const reference

Memory overwrites (cont'd)
Dangling pointers
  Point to already freed allocation
  Lucky case
    Memory page has been decommitted
    Immediate crash upon access
  Unlucky case
    Other allocation sits at the same address
    Nasty random crashes
    Reading garbage data → even further delayed crash

Debugging overwrites
Memory window is your friend Never lies!
Crashes in optimized builds printf() slows down the application
Hides data races Local variables mostly gone
Not on stack, but in registers (PowerPC!) Volatile vs. non-volatile registers

Debugging overwrites (cont'd)
Knowing stack frame layout helps Parameter passing Dedicated vs. volatile vs. non-volatile registers
Knowing assembly helps Knowing class layout helps
V-table pointer, padding, alignment, … Verify your assumptions!

Debugging overwrites (cont'd)
Familiarize yourself with Function prologue Function epilogue Function calls Virtual function calls Different address ranges
Heap, stack, code, write-combined, …
Study compiler-generated code

Debugging overwrites (cont'd)
Help from the debugger Cast any address to pointer-type in watch window Cast zero into any instance
Work out member offsets Pseudo variables
@eax, @r1, @err, platform-specific ones Going back in time
„Set next instruction“

Debugging overwrites (cont'd)
Help from the debugger (cont'd) Crash dumps
Registers Work backwards from point of crash Find offsets using debugger Find this-pointer (+offset) in register, cast in debugger
Memory contents On-the-fly coding
Nop'ing out instructions Changing branches

Debugging overwrites (cont'd)
Simplest method
  Add padding bytes surrounding the stomped variable
  Deduce whether single bytes or a whole block is overwritten
  Changes memory layout

Debugging overwrites (cont'd)
Hardware breakpoints Supported by debuggers Supported by all major platforms
x86: Debug registers (DR0-DR7) PowerPC: Data Access Breakpoint Register (DABR)
Helps finding reads & writes Setting from code helps finding random stomps CPU only
GPU/SPU/DMA/IO access doesn't trigger

Debugging overwrites (cont'd)
Page protection Move allocations in question to the start or end of
a page Restrict access to surrounding pages Needs a lot more memory
Protect free pages Walk allocations upon freeing memory Don't re-use recently freed allocations

Debugging overwrites (cont'd)
Page protection (cont'd) Needs virtual memory system If not available, most platforms offer similar
features

If all else fails...
Go home, get some sleep Let your brain do the work
Grab a colleague Pair debugging Fresh pair of eyes
Talk to somebody Non-programmers, other team members Your family, rubber ducks, ...