

molecular-matters.com

Game Connection 2012

Memory Management Strategies

Master Class

About myself
* Studied computer science at VUT, Austria
* Working in the games industry since 2004
  * PC, Xbox 360, PS2, PS3, Wii, DS
* Specialization in low-level programming (threading, debugging, optimization)
* Teaching
* Founder & CTO @ Molecular Matters
  * Middleware for the games industry

Master class
* Participation
* Exchange of experiences
* Discussion
* There is no perfect way of doing things
  * There are many „rights“ & „wrongs“
* Let us talk about past experiences, mistakes, improvements
* Share ideas!
* Ask questions!

Agenda
* C++ new/delete/placement syntax
* Virtual memory
* Allocators
* Allocation strategies
* Debugging facilities
  * Fill patterns
  * Bounds checking
  * Memory tracking

Agenda (cont'd)
* Custom memory system
* Relocatable allocations
* Run-time defragmentation
* Debugging memory-related bugs
  * Stack overflow
  * Memory overwrites

C++ new/delete/placement syntax

What's wrong with that?

    void* operator new(size_t size, unsigned int align)
    {
        // align memory by some means
        return _aligned_malloc(size, align);
    }

    NonPOD* nonPod = new (32) NonPOD;
    NonPOD* nonPodAr = new (32) NonPOD[10];

* Addresses of nonPod and array?

C++ new
* How do we allocate memory?
  * Using the new operator (keyword new)
  * T* instance = new T;
* What happens behind the scenes?
  * Calls operator new to allocate storage for a T
  * Calls the constructor for non-POD types

C++ delete
* How do we free memory?
  * Using the delete operator (keyword delete)
  * delete instance;
* What happens behind the scenes?
  * Calls the destructor for non-POD types
  * Calls operator delete to free storage

C++ new, placement syntax
* Keyword new supports placement syntax
* Canonical form called placement new
  * Calls operator new(size_t, void*)
  * Returns the given pointer, does not allocate memory
  * Constructs an instance in-place
  * T* instance = new (memory) T;
* Destructor needs to be called manually
  * instance->~T();
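A minimal sketch of canonical placement new followed by the manual destructor call. The Widget type and the ConstructUseDestroy helper are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <new>      // declares operator new(size_t, void*)

// Hypothetical type that counts live instances so that construction and
// destruction can be observed from the outside.
struct Widget {
    static int liveCount;
    int value;
    explicit Widget(int v) : value(v) { ++liveCount; }
    ~Widget() { --liveCount; }
};
int Widget::liveCount = 0;

// Constructs a Widget in caller-provided storage, reads it, destroys it.
// Returns the value read while the object was alive.
inline int ConstructUseDestroy(int v) {
    alignas(Widget) unsigned char storage[sizeof(Widget)];

    // No memory is allocated here, only the constructor runs.
    Widget* instance = new (storage) Widget(v);
    assert(static_cast<void*>(instance) == static_cast<void*>(storage));

    const int result = instance->value;

    // There is no "placement delete" expression, so the destructor
    // has to be called by hand.
    instance->~Widget();
    return result;
}
```

Calling ConstructUseDestroy(42) returns 42 and leaves Widget::liveCount at zero, because every constructor call is paired with an explicit destructor call.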

C++ new, placement syntax (cont'd)
* Placement syntax supports N parameters
* The compiler maps keyword new to the corresponding overload of operator new
  * T* instance = new (10, 20, 30) T;
    calls
    void* operator new(size_t, int, int, int);
* First argument must always be of type size_t
  * sizeof(T) is inserted by the compiler
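The mapping from the keyword to an overload can be sketched as follows. The overload records its arguments so the compiler-inserted sizeof(T) becomes visible. LastNewCall, g_lastNewCall, Vec3, MakeVec3 and DestroyVec3 are hypothetical names, and malloc/free merely stand in for a real allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical record of the most recent call, for illustration only.
struct LastNewCall { std::size_t size; int a, b, c; };
static LastNewCall g_lastNewCall = {0, 0, 0, 0};

// Custom overload: the first parameter must be size_t, the rest is free-form.
void* operator new(std::size_t size, int a, int b, int c) {
    g_lastNewCall = LastNewCall{size, a, b, c};
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc();
    return p;
}

// Matching overload. The compiler calls it only if the constructor throws.
void operator delete(void* ptr, int, int, int) noexcept { std::free(ptr); }

struct Vec3 {
    float x, y, z;
    Vec3() : x(0), y(0), z(0) {}
};

inline Vec3* MakeVec3() {
    // Calls operator new(sizeof(Vec3), 10, 20, 30), then Vec3::Vec3().
    return new (10, 20, 30) Vec3;
}

inline void DestroyVec3(Vec3* v) {
    v->~Vec3();                      // manual destructor call
    operator delete(v, 10, 20, 30);  // an operator is just a function
}
```

The keyword delete cannot pass the extra arguments, which is why DestroyVec3 calls the destructor and the matching operator delete by hand.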

C++ new, placement syntax (cont'd)
* Very powerful!
* Custom overloads for operator new
  * Each overload must offer a corresponding operator delete
  * Can store arbitrary arguments for each call to new
* An operator is just a function
  * Can be called directly if desired
  * Needs manual constructor call using placement new
  * Can use templates

C++ delete, placement syntax
* Keyword delete does not support placement syntax
  * delete (instance, 10, 20);
  * Treated as a statement using the comma operator
* Overloads of operator delete are used when an exception is thrown upon a call to new
* Overloads can also be called directly
  * Needs manual destructor call

C++ new[]
* Creates an array of instances
* Similar to keyword new, calls operator new[]
* Calls the constructor for each non-POD instance
* Supports placement syntax
  * Custom overloads of operator new[] possible
* First sizeof() argument is compiler-specific
  * POD vs. non-POD!

C++ new[] (cont'd)
* For non-PODs, constructors are called
* delete[] needs to call destructors
  * How many destructors to call?
  * Compiler needs to store the number of instances
  * Most compilers add an extra 4 bytes to the allocation size
    * sizeof(T)*N + 4 (non-POD)
    * sizeof(T)*N (POD)

C++ new[] (cont'd)
* Important!
* Address returned by operator new[] != address of the first instance in the array
  * Source of confusion
  * Compiler-specific behaviour, makes it almost impossible to call overloads of operator delete[] directly
    * Do we need to go back 4 bytes or not?
  * Makes support for custom alignment harder
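The hidden element count can be made visible with a small experiment. This is only a sketch: TraceTag, g_rawArrayPtr, g_rawArraySize and MeasureArrayOverhead are hypothetical names, and the measured offset is compiler- and platform-specific (it is not necessarily 4 bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical tag type so that this overload is only selected explicitly.
struct TraceTag {};

static void*       g_rawArrayPtr  = nullptr;  // what operator new[] returned
static std::size_t g_rawArraySize = 0;        // what the compiler asked for

void* operator new[](std::size_t size, TraceTag) {
    g_rawArraySize = size;
    g_rawArrayPtr  = std::malloc(size);
    if (!g_rawArrayPtr) throw std::bad_alloc();
    return g_rawArrayPtr;
}
void operator delete[](void* ptr, TraceTag) noexcept { std::free(ptr); }

// The user-provided destructor makes the compiler store the element count.
struct NonPOD {
    int value;
    NonPOD() : value(7) {}
    ~NonPOD() {}
};

// Returns the distance in bytes between the raw allocation and the
// address of the first array element.
inline std::size_t MeasureArrayOverhead(std::size_t count) {
    TraceTag tag;
    NonPOD* array = new (tag) NonPOD[count];
    const std::size_t offset = static_cast<std::size_t>(
        reinterpret_cast<unsigned char*>(array) -
        static_cast<unsigned char*>(g_rawArrayPtr));

    // delete[] cannot pass the tag, so destroy in reverse order by hand
    // and free the raw pointer that operator new[] actually returned.
    for (std::size_t i = count; i > 0; --i) array[i - 1].~NonPOD();
    operator delete[](g_rawArrayPtr, tag);
    return offset;
}
```

On the mainstream compilers this was written against, the requested size equals sizeof(NonPOD) * count plus the measured offset. An aligned pointer returned by operator new[] therefore does not guarantee an aligned first element.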

C++ delete[]
* Deletes an array of instances
* Similar to keyword delete, calls operator delete[]
* Calls the destructors for each non-POD instance in reverse order
* Again, POD vs. non-POD
  * Number of instances to destruct is stored by the compiler for non-POD types

C++ new vs. delete mismatch
* Allocating with new, deleting with delete[]
  * delete[] expects the number of instances to be stored in front of the array
  * May crash
* Allocating with new[], deleting with delete
  * More subtle bugs, only one destructor will be called
* Visual Studio heap implementation is smart enough to detect both mismatches

Summary
* new != operator new
* delete != operator delete
* new[]/delete[] are compiler-specific
* Never mix new/delete[] and new[]/delete
* new offers powerful placement syntax

Virtual memory

Virtual memory
* Each process = virtual address space
* Not to be confused with paging to hard disk
* Virtual memory != physical memory
* Address translation done by MMU
* OS allocates/reserves memory in pages
  * Page sizes: 4KB, 64KB, 1MB, ...

Virtual memory (cont'd)
* Virtual addresses are mapped to physical memory addresses
  * Contiguous virtual addresses != contiguous physical memory
* A single page is the smallest amount of memory that can be allocated
* Access restrictions on a per-page level
  * Read, write, execute, ...

Virtual memory (cont'd)
* Simplest address translation:
  * Virtual address = page directory + offset
  * Page directory = physical memory page + additional info
  * Page directory entries set by OS
* In practice:
  * Multi-level address translation
  * See „What every programmer should know about memory“ by Ulrich Drepper
    http://lwn.net/Articles/253361/

Virtual memory (cont'd)
* Address translation is expensive
  * Several accesses to memory
  * „Page walk“
* Result of address translation is cached
  * Translation Look-aside Buffer (TLB)
  * Multiple levels, like D$ or I$
  * TLB = Global resource per processor

Virtual memory (cont'd)
* Allows allocating contiguous memory even if the physical memory is not contiguous
* Available on many architectures (PC, Mac, Linux, almost all consoles)
* Used by CPU only
  * GPU, sound hardware, etc. need contiguous physical memory
  * E.g. XPhysicalAlloc

Virtual memory (cont'd)
* Growing allocators can account for worst-case scenarios more easily when using VM
* Different address ranges for different purposes
  * Heap, stack, code, write-combined, ...
  * Helps with debugging!

Summary
* Virtual memory nice to have, but not a necessity
  * Can help tremendously with debugging
* Virtual memory made available to CPU, not GPU or other hardware
* Virtual memory address range >> RAM

Allocators

Why different allocators?
* No silver bullet, many allocation qualities
  * Size
  * Fragmentation
  * Wasted space
  * Performance
  * Thread-safety
  * Cache-locality
  * Fixed size vs. growing

Common allocators
* Linear
* Stack, double-ended stack
* Pool
* Micro
* One-frame, two-frame temporary
* Double-buffered I/O
* General-purpose

Linear allocator
+ Supports any size and alignment
+ Extremely fast, simply bumps a pointer
+ No fragmentation
+ No wasted space
+ Lock-free implementation possible
+ Allocations live next to each other
- Must free all allocations at once
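A minimal single-threaded sketch of the properties listed above. The class layout and the names Allocate, Reset and GetUsedSize are assumptions made for this illustration, and no lock-free variant is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bump-pointer allocator over a caller-provided memory range.
class LinearAllocator {
public:
    LinearAllocator(void* start, std::size_t size)
        : m_start(static_cast<unsigned char*>(start)),
          m_end(m_start + size),
          m_current(m_start) {}

    // 'alignment' must be a power of two. Returns nullptr when exhausted.
    void* Allocate(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

        const std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(m_current);
        const std::uintptr_t aligned =
            (raw + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
        const std::size_t padding   = static_cast<std::size_t>(aligned - raw);
        const std::size_t remaining = static_cast<std::size_t>(m_end - m_current);
        if (padding > remaining || size > remaining - padding) return nullptr;

        unsigned char* user = m_current + padding;
        m_current = user + size;   // the "bump"
        return user;
    }

    // Individual frees are not supported, everything is released at once.
    void Reset() { m_current = m_start; }

    std::size_t GetUsedSize() const {
        return static_cast<std::size_t>(m_current - m_start);
    }

private:
    unsigned char* m_start;
    unsigned char* m_end;
    unsigned char* m_current;
};
```

The only space lost is the padding needed to satisfy the requested alignment. A stack allocator could be built on the same idea by additionally remembering the previous value of the current pointer for each allocation.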

Stack allocator
+ Supports any size and alignment
+ Extremely fast, simply bumps a pointer
+ No fragmentation
+ No wasted space
+ Lock-free implementation possible
+ Allocations live next to each other
+/- Must free allocations in reverse-order

Double-ended stack allocator
* Similar to stack allocator
* Can allocate from bottom or top
  * Bottom for resident allocations
  * Top for temporary allocations
* Mostly used for level loading

Pool allocator
- Supports one allocation size only
+ Very fast, simple pointer exchange
+ Fragments, but can always allocate
+ No wasted space
+ Lock-free implementation possible
- Holes between allocations
+ Memory can be allocated/freed in any order

Pool allocator (cont'd)
* In-place free list
  * No extra memory for book-keeping
  * Re-use memory of freed allocations
    * Point to next free entry
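The in-place free list can be sketched as below. This is a single-threaded illustration under the stated assumptions (slot size at least sizeof(void*), suitably aligned buffer). The names PoolAllocator and FreeNode are chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Fixed-size pool. Free slots store the pointer to the next free slot
// inside their own memory, so no extra book-keeping memory is needed.
class PoolAllocator {
public:
    // Assumes 'start' is suitably aligned and elementSize >= sizeof(void*).
    PoolAllocator(void* start, std::size_t elementSize, std::size_t elementCount)
        : m_head(nullptr) {
        assert(elementSize >= sizeof(FreeNode));
        unsigned char* memory = static_cast<unsigned char*>(start);
        // Thread the list back to front so the first slot ends up at the head.
        for (std::size_t i = elementCount; i > 0; --i) {
            FreeNode* node = new (memory + (i - 1) * elementSize) FreeNode;
            node->next = m_head;
            m_head = node;
        }
    }

    // Pops the head of the free list. Returns nullptr when the pool is full.
    void* Allocate() {
        if (!m_head) return nullptr;
        FreeNode* node = m_head;
        m_head = node->next;
        return node;
    }

    // Pushes the slot back onto the free list, in any order.
    void Free(void* ptr) {
        FreeNode* node = new (ptr) FreeNode;
        node->next = m_head;
        m_head = node;
    }

private:
    struct FreeNode { FreeNode* next; };
    FreeNode* m_head;
};
```

Note that the most recently freed slot is handed out first. That is the reason uninitialized members in pooled objects tend to see stale data from the previous occupant.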

Micro allocator
* Similar to pool allocator, but different pools for different sizes
+ Very fast, lookup & simple pointer exchange
+ Fragments, but can always allocate
- Some wasted space depending on size
+ Can use pool-local critical sections / lock-free
- Holes between allocations
+ Memory can be allocated/freed in any order

One-frame temporary allocator
* Similar to linear allocator
* Used for scratchpad allocations during a frame
* Another alternative is to use stack memory
  * Fixed-size alloca()

Two-frame temporary allocator
* Similar to one-frame temporary allocator
* Ping-pong between two one-frame allocators
  * Results from frame N persist until frame N+1
* Useful for operations with 1 frame latency
  * Raycasts
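The ping-pong scheme can be sketched with two small bump buffers. FrameBuffer, TwoFrameAllocator and BeginFrame are hypothetical names, the capacity is arbitrary, and alignment handling is left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>

// Minimal bump buffer used as a building block (no alignment handling).
class FrameBuffer {
public:
    FrameBuffer() : m_used(0) {}

    void* Allocate(std::size_t size) {
        if (size > sizeof(m_memory) - m_used) return nullptr;
        void* ptr = m_memory + m_used;
        m_used += size;
        return ptr;
    }
    void Reset() { m_used = 0; }

private:
    alignas(16) unsigned char m_memory[1024];
    std::size_t m_used;
};

// Alternates between two buffers. Memory allocated during frame N stays
// valid during frame N+1 and is recycled at the start of frame N+2.
class TwoFrameAllocator {
public:
    TwoFrameAllocator() : m_current(0) {}

    void* Allocate(std::size_t size) { return m_buffers[m_current].Allocate(size); }

    // Call once at the start of every frame.
    void BeginFrame() {
        m_current ^= 1;
        m_buffers[m_current].Reset();
    }

private:
    FrameBuffer m_buffers[2];
    int m_current;
};
```

A raycast issued in frame N can therefore write its result into memory that is still readable when the game consumes it in frame N+1.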

Double-buffered I/O allocator
* Two ping-pong buffers
  * Read into buffer A, consume from buffer B
  * Initiate reads while consuming
* Useful for async. sequential reads from disk
* Interface offers Consume() only
  * Async. reads done transparently & interleaved

General-purpose
* Must cope with small & large allocations
* Used for 3rd party libraries
* Properties
  - Slow
  - Fragmentation
  - Wasted memory, allocation overhead
  - Must use heavy-weight synchronization

General-purpose (cont'd)
* Common implementations
  * „High Performance Heap Allocator“ in GPG7
  * Doug Lea's „dlmalloc“
  * Emery Berger's „Hoard“

Growing allocators
* With virtual memory
  * Reserve worst-case up front
  * Backup with physical memory when growing
  * Less hassle during development
  * Can grow without relocating allocations
* Without virtual memory
  * Resize allocator for e.g. each level
  * Needs adjustment during development

Allocators
* Separate how from where
  * How = allocator
  * Where = heap or stack
* Offers more possibilities
  * Allows using the stack with different allocators

Summary
* No allocator fits all purposes
* Each allocator has different pros/cons
* Ideally, for each allocation think about
  * Size
  * Frequency
  * Lifetime
  * Threading

Allocation strategies

Why do we need a strategy?
* Using a general-purpose allocator everywhere leads to
  * Fragmented memory
  * Wasted memory
  * Somewhat unclear memory ownership
  * Excessive clean-up before shipping
* We can do better!

Decision criteria
* Lifetime
  * Application lifetime
  * Level lifetime
  * Temporary

Decision criteria (cont'd)
* Purpose
  * Temporary while loading a level
  * Temporary during a frame
  * Purely visual (e.g. bullet holes)
    * LRU scheme
  * Streaming I/O
  * Gameplay critical

Decision criteria (cont'd)
* Frequency
  * Once
  * Each level load
  * Each frame
  * N times per frame
    * Should be avoided in the first place

Where would you put those?
* Application-wide singleton allocations
* Render queue/command buffer allocations
* Level assets
* Particles
* Bullets, collision points, …
* 3rd party allocations
* Strings

Strategy
* Classify allocations based on lifetime first
* Classify allocations based on purpose/size/frequency second
* Again, no silver bullet
  * Make everybody on the team think
  * Make people aware of consequences & alternatives
  * Provide alternatives to general-purpose allocators

Debugging facilities

Why?
* People make mistakes
* In relation to memory allocations, mistakes can be fatal
* Provide tools for finding mistakes as early as possible

Common mistakes
* Forgetting to initialize class members
  * Code then relies on garbage or stale values
* Writing out-of-bounds
* Reading out-of-bounds
* Memory leaks
* Dangling pointers

Possible outcomes
* From best to worst
  * Compile error, warning
  * Analysis tool error, warning
  * Assertion
  * Consistent crash or corruption
  * Random, unpredictable crash
  * Crashes after N hours of gameplay
  * Seemingly works, crashes at certification

Class member initialization
* Non-initialized members get garbage
  * Whatever the memory content is
* Lucky case
  * Leads to a crash later
* Unlucky case
  * Works, but not as intended
  * Subtle bugs N frames later

Class member initialization (cont'd)
* Extra painful with pool allocators
  * Most recently freed entry is re-used
  * Members point to stale data

Class member initialization (cont'd)
* Remedy
  * Crank up compiler warning level
  * Fill memory with distinct pattern when allocating
  * Fill memory with distinct pattern when freeing
  * Use static code analysis tools

Fill patterns
* If you can, do not fill with zero
  * Hides bugs and uninitialized pointer members
    * E.g. delete 0 still works, but pointer was never initialized
  * Often not possible when dealing with legacy code
* Fill with „uncommon“ values
  * Numerically odd
  * Unused address range
  * E.g. 0xCD, 0xFD, 0xAB

Fill patterns (cont'd)
* Different patterns for
  * New allocation
  * Freed allocation
  * Guard bytes for allocation info
  * Any other state
* Fill patterns should be supported by each allocator
  * More on that later

Writing out-of-bounds
* Could stomp non-critical data
* Could stomp allocator book-keeping data
* Could write to non-committed memory
* Lucky case
  * Immediate crash
* Unlucky case
  * Weird behaviour N frames later

Writing out-of-bounds (cont'd)
* Remedy
  * Bounds checking
  * Use static code analysis tools
  * Use page access protection
    * More on that later

Bounds checking
* Add guard bytes at the front and back of an allocation
  * Fill guard bytes with distinct pattern
* Different check levels
  * Check front and back upon freeing an allocation
  * Walk list of all allocations upon allocating & freeing
  * Walk list of all allocations each frame
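A minimal sketch of the cheapest check level, verifying the guards when an allocation is freed. The pattern value, the guard size and the function names are arbitrary choices for this example, and malloc/free stand in for a real allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Arbitrary choices for this sketch.
static const unsigned char kGuardPattern = 0xFD;
static const std::size_t   kGuardSize    = 8;

// Layout: [size_t size][front guard][user memory][back guard]
// Alignment of the user memory is not handled in general here.
inline void* GuardedAlloc(std::size_t size) {
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(std::size_t) + 2 * kGuardSize + size));
    if (!raw) return nullptr;

    std::memcpy(raw, &size, sizeof(size));          // remember the user size
    unsigned char* front = raw + sizeof(std::size_t);
    unsigned char* user  = front + kGuardSize;
    std::memset(front, kGuardPattern, kGuardSize);
    std::memset(user + size, kGuardPattern, kGuardSize);
    return user;
}

// Returns true if both guards still hold the pattern.
inline bool GuardsIntact(const void* ptr) {
    const unsigned char* user  = static_cast<const unsigned char*>(ptr);
    const unsigned char* front = user - kGuardSize;
    std::size_t size = 0;
    std::memcpy(&size, front - sizeof(std::size_t), sizeof(size));

    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (front[i] != kGuardPattern || user[size + i] != kGuardPattern) {
            return false;
        }
    }
    return true;
}

// Checks the guards, frees the block, and reports whether it was intact.
inline bool GuardedFree(void* ptr) {
    const bool intact = GuardsIntact(ptr);
    std::free(static_cast<unsigned char*>(ptr) - kGuardSize - sizeof(std::size_t));
    return intact;
}
```

The more thorough check levels would call GuardsIntact for every live allocation, either on each allocation and free or once per frame, which requires keeping a list of allocations.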

Bounds checking (cont'd)
* Useful for finding simple overwrites
* Intrusive
  * Changes the memory layout
  * Needs more memory
  * Cannot use debug memory on consoles

Reading out-of-bounds
* Reads garbage data
  * Garbage could lead to crash later
* Read operation itself could also crash
  * Very rare, reading from protected page
* Worst case
  * Goes unnoticed for months
  * Seldom crashes

Reading out-of-bounds (cont'd)
* Remedy
  * Bounds checking, somewhat insufficient
    * At least the garbage read is always the same
  * Use static code analysis tools
  * Use page access protection
    * More on that later

Memory leaks
* Lead to out-of-memory conditions after hours of playing the game
* Can go unnoticed for weeks
* „User-code“ leaks
* API leaks
* System memory leaks
* Logical leaks

Memory leaks (cont'd)
* „User-code“ leaks
  * Calls to new, malloc, or similar
  * Never free the allocation
    * Memory can never be reclaimed
  * Easiest to track

Memory leaks (cont'd)
* API leaks
  * Platform-specific leaks, e.g. D3D
    * Release() decreases reference-count
    * Some API calls internally increase reference-count
    * Reference-count != 0 upon releasing ownership
  * Harder to detect
  * APIs often use platform-specific allocation functions
    * XPhysicalAlloc
  * Harder to track

Memory leaks (cont'd)
* System memory leaks
  * Platform-specific system leaks, e.g. live kernel objects
    * Synchronization primitives
    * File handles
    * ...
  * Hardest to detect
  * Hardest to track

Memory leaks (cont'd)
* Logical leaks
  * No visible leaks at program termination
  * Leaks at e.g. each level restart
    * Ever-growing singleton
    * Singleton releases memory upon termination

Memory leaks (cont'd)
* Remedy
  * Memory tracking
  * Soak tests
  * Use static code analysis tools

Memory tracking
* Two possibilities
* Allocate & free memory with new & delete only
  * Override global new and delete operators
  * Static initialization order fiasco
    * Memory tracker must be instantiated first
    * Platform-specific tricks needed
  * Still need to hook other functions
    * malloc & free
    * Weak functions or linker function wrapping
  * Problems with 3rd party libraries doing the same

Memory tracking (cont'd)
* Static initialization order
  * Top to bottom in a translation unit
  * Undefined across translation units
  * Platform-specific tricks
    * #pragma init_seg(compiler/lib/user) (MSVC)
    * __attribute__ ((init_priority (101))) (GCC)
    * Custom code/linker sections

Memory tracking (cont'd)
* Other possibility
* Disallow new & delete completely
  * Use custom functions instead
  * Allows more fine-grained tracking in allocators
  * Demand explicit startup & shutdown
  * Harder to do with legacy codebase
* Need to hook platform-specific functions
  * Easier on consoles
  * Code-patching on PC

Memory tracking (cont'd)
* Hooking functions
  * Weak functions (if supported)
    * Implemented at compiler-level
    * XMemAlloc (Xbox 360)
  * Function wrapping at linker-level
    * --wrap (GCC)
    * __wrap_* & __real_*
  * Code-patching (Windows)
    * Disassemble & patch at run-time

Memory tracking (cont'd)
* Storing allocations
  * Intrusive
    * In-place linked list
    * Changes memory layout
    * Cannot use debug memory on consoles
  * External
    * Associative container, e.g. hash map
    * Same memory layout as with tracking disabled
    * Can use debug memory on consoles

Memory tracking (cont'd)
* Levels of sophistication
  * Simple counter
  * File & line & function, size
  * Full callstack, ID, allocator name, size, …
* Tracking granularity
  * Global, e.g. #ifdef/#endif
  * Per allocator
    * More on that later

Memory tracking (cont'd)
* Simple counter
  * Very fast
  * Near-zero memory footprint
  * Can be enabled in all builds during development
  * Switch to a different tracker when a leak has been found

Memory tracking (cont'd)
* File & line & function
  * Fast
  * Moderate overhead & footprint per allocation
  * Provides enough info most of the time
  * Tricky cases
    * Arriving at low-level functions & containers from many different code paths

Memory tracking (cont'd)
* Full callstack & additional info
  * Slow
  * High overhead & footprint per allocation
  * Can pinpoint tricky leaks

Memory tracking (cont'd)
* Finding logical leaks
  * Snapshot functionality
    * Capture all live allocations
    * Compare snapshots e.g. upon level start
    * Report snapshot differences
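Snapshot comparison can be sketched with an external (non-intrusive) tracker. MemoryTracker, AllocationInfo and DiffAgainst are hypothetical names. std::map is used for brevity even though it allocates from the global heap itself, so a real tracker would presumably use its own memory. Comparing by address alone also misses a block that was freed and re-allocated at the same address, which per-allocation IDs would catch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// File & line & size level of detail.
struct AllocationInfo {
    std::size_t size;
    const char* file;
    int         line;
};

class MemoryTracker {
public:
    typedef std::map<const void*, AllocationInfo> Snapshot;

    void OnAllocation(const void* ptr, std::size_t size, const char* file, int line) {
        m_live[ptr] = AllocationInfo{size, file, line};
    }
    void OnDeallocation(const void* ptr) { m_live.erase(ptr); }

    // Captures all live allocations.
    Snapshot TakeSnapshot() const { return m_live; }

    // Returns the allocations that are live now but were not in 'before'.
    std::vector<AllocationInfo> DiffAgainst(const Snapshot& before) const {
        std::vector<AllocationInfo> added;
        for (Snapshot::const_iterator it = m_live.begin(); it != m_live.end(); ++it) {
            if (before.find(it->first) == before.end()) {
                added.push_back(it->second);
            }
        }
        return added;
    }

private:
    Snapshot m_live;
};
```

Taking a snapshot at each level start and diffing against the previous one reports allocations that survive a level restart even though they would be released at program termination.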

Memory tracking (cont'd)
* Out-of-memory conditions
  * Stop all threads
  * Report all allocations
  * Memory browser
    * Display allocations & info on screen
    * Support browsing via input device
    * Needs fail-safe (CPU) rendering
  * Must never allocate even a single byte
    * No strings, no hidden allocations

Custom memory system

Requirements
* No global new and delete
  * Custom functions used instead
* Want to keep keyword new syntax
* Want to have placement-like syntax for delete
* Want to have per-allocator features like tracking and bounds checking
* Provide as much extra info as possible

Requirements (cont'd)
* No global new and delete
  * No static initialization order fiasco
  * Plays nice with 3rd party libraries
* Custom functions
  * Can make use of additional parameters
  * Can be specialized for certain types

Requirements (cont'd)
* Want to keep keyword new syntax
  * It's what people know
  * Allows passing arbitrary arguments to the constructor
* Want to have placement-like syntax for delete
  * Additional parameters

Requirements (cont'd)
* Want to have per-allocator features like tracking and bounds checking
  * Better granularity of checks
  * Better performance
  * Less overhead
  * Smaller memory footprint

Proposed implementation
* 3 tiers
  * Low-level allocators
  * High-level arenas
  * new & delete replacement macros

First tier
* Low-level allocators
  * Allocate & free raw memory
    * Similar to malloc & free
  * Do not call constructors
  * Do not call destructors
  * Used by low-level code
    * E.g. building this frame's command buffer
  * No interface, no virtual functions

Second tier
* High-level arenas
  * Use an allocator internally
  * Call constructors, support array construction
  * Call destructors, support array deletion
  * Used by replacement macros
  * Simple abstract interface
  * Provide (optional) debugging facilities

Second tier (cont'd)
* Optional facilities
  * Fill patterns
  * Bounds checking
  * Memory tracking
  * Out-of-memory conditions
* Two C++ approaches
  * Piggyback (templated) implementations
  * Policy-based design

Second tier (cont'd)
* Piggyback implementations
  * Allocators taking others by template
  * Each piggyback allocator adds one functionality
  * ThreadSafe<MemoryTracker<BoundsChecker<LinearAllocator> > > allocator;
  * Memory arena then simply takes an allocator

Second tier (cont'd)
* Policy-based design
  * Each facility becomes a policy
  * Each policy becomes a template parameter
  * Different policy implementations

    template <class AllocationPolicy,
              class BoundsCheckingPolicy,
              class MemoryTrackingPolicy>
    class MemoryArena
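A reduced sketch of the policy-based arena with only two of the policies (allocation and memory tracking), the bounds checking policy would be wired in the same way. All policy names and member functions below are assumptions made for this illustration, not the interface of the system described in the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Allocation policy: hypothetical stand-in that forwards to malloc/free.
struct MallocPolicy {
    void* Allocate(std::size_t size) { return std::malloc(size); }
    void  Free(void* ptr)            { std::free(ptr); }
};

// Tracking policy that compiles down to nothing.
struct NoMemoryTracking {
    void OnAllocation(void*, std::size_t) {}
    void OnDeallocation(void*) {}
    std::size_t GetLiveCount() const { return 0; }
};

// Tracking policy implementing the "simple counter" level.
struct CountingMemoryTracking {
    CountingMemoryTracking() : m_liveCount(0) {}
    void OnAllocation(void*, std::size_t) { ++m_liveCount; }
    void OnDeallocation(void*)            { --m_liveCount; }
    std::size_t GetLiveCount() const      { return m_liveCount; }
    std::size_t m_liveCount;
};

// The arena only orchestrates, each facility lives in its policy.
template <class AllocationPolicy, class MemoryTrackingPolicy>
class MemoryArena {
public:
    void* Allocate(std::size_t size) {
        void* ptr = m_allocator.Allocate(size);
        if (ptr) m_tracker.OnAllocation(ptr, size);
        return ptr;
    }
    void Free(void* ptr) {
        m_tracker.OnDeallocation(ptr);
        m_allocator.Free(ptr);
    }
    std::size_t GetLiveCount() const { return m_tracker.GetLiveCount(); }

private:
    AllocationPolicy     m_allocator;
    MemoryTrackingPolicy m_tracker;
};
```

A development build could instantiate MemoryArena<MallocPolicy, CountingMemoryTracking> while a retail build uses NoMemoryTracking for the same arena, which gives the per-allocator granularity listed in the requirements.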

Third tier
* Why not templates?
  * Pre-C++11 code needs tons of permutations for providing constructor arguments
    * By-value
    * By-reference
    * By-pointer
    * const & volatile qualifiers
    * All combinations thereof

Third tier (cont'd)
* Why macros?
  * Allows us to „expand“ upon the keyword new syntax
  * Constructor arguments are handled automatically
  * Allows us to change the implementation later on
  * Allows us to provide extra info without having to rely on RTTI

Third tier (cont'd)
* Macro approach
  * Use ordinary placement new
    * Allocate memory from arena
    * Use returned pointer as argument to placement new
  * Provide extra info
    * Human-readable type name
    * Human-readable ID = description
    * __FILE__, __LINE__, __FUNCTION__

Third tier (cont'd)

    #define ME_NEW(type, arena, ID) \
        new ((arena)->Allocate(sizeof(type), __alignof(type), ID, #type, \
                               __FILE__, __LINE__, __FUNCTION__)) type

    DataStruct* data = ME_NEW(DataStruct, someArena, "A data structure")(arg1, arg2, arg3);

Third tier (cont'd)
* The lone „type“ at the end of the macro enables us to pass constructor arguments
  * First parentheses are macro arguments
  * Second parentheses are constructor arguments

Third tier (cont'd)
* ME_DELETE(ptr, arena) macro
  * Calls destructor
  * Frees memory owned by the arena
    * Arena known, no need to store & access somewhere
* Similar macros for allocating & freeing arrays
* Optimizations
  * Call constructors & destructors for non-PODs only
    * Type traits
    * Type-based dispatch
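The macro pair can be sketched end to end as follows. SimpleArena and the Delete helper are hypothetical, the standard alignof and __func__ are used in place of the compiler-specific __alignof and __FUNCTION__, malloc's alignment is assumed to be sufficient, and out-of-memory handling is omitted. The type-traits optimization is not shown, Delete always calls the destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical arena that records the debug info of the last allocation.
class SimpleArena {
public:
    SimpleArena() : lastTypeName(nullptr), lastId(nullptr), lastLine(0), liveCount(0) {}

    void* Allocate(std::size_t size, std::size_t /*alignment*/, const char* id,
                   const char* typeName, const char* /*file*/, int line,
                   const char* /*function*/) {
        lastTypeName = typeName;
        lastId = id;
        lastLine = line;
        ++liveCount;
        return std::malloc(size);
    }
    void Free(void* ptr) {
        --liveCount;
        std::free(ptr);
    }

    const char* lastTypeName;
    const char* lastId;
    int lastLine;
    int liveCount;
};

// Calls the destructor, then returns the memory to the arena it came from.
template <typename T, class Arena>
void Delete(T* object, Arena& arena) {
    if (object) {
        object->~T();
        arena.Free(object);
    }
}

#define ME_NEW(type, arena, ID) \
    new ((arena)->Allocate(sizeof(type), alignof(type), ID, #type, \
                           __FILE__, __LINE__, __func__)) type

#define ME_DELETE(ptr, arena) Delete(ptr, *(arena))

struct DataStruct {
    int a, b, c;
    DataStruct(int a_, int b_, int c_) : a(a_), b(b_), c(c_) {}
};
```

Usage mirrors the slide: ME_NEW(DataStruct, &arena, "A data structure")(1, 2, 3) constructs the object, and ME_DELETE(data, &arena) destroys it. Because Delete is a function template, the object's type is deduced without RTTI.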

Relocatable allocations

Relocatable allocations
* What for?
  * Asset hot-reloading
  * Streaming
  * Run-time defragmentation

Relocatable allocations (cont'd)
* Using raw pointers
  * Constant pointee size
    * Destruct & construct in-place
    * Asset hot-reloading, only content changes
      * E.g. CPU texture header, GPU data
  * Register for callback/event
    * Get notified upon change
    * Can quickly become a mess

Relocatable allocations (cont'd)
* Handles
  * Never return raw pointers, only handles
  * Index into a (global) table
  * Table holds pointer to „real“ object
    * Store extra bits for lifetime/ID in handle & table slot
    * Can identify dangling references to deleted objects
  * Extra indirection
    * But objects can be packed tightly
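A compact sketch of a handle table with a generation counter per slot. The 16/16 bit split, the fixed capacity and all names (Handle, HandleTable, Relocate) are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 32-bit handle: table index plus the generation the slot had when
// the handle was created.
struct Handle {
    std::uint32_t index      : 16;
    std::uint32_t generation : 16;
};

template <typename T, std::size_t N>
class HandleTable {
    static_assert(N <= 65535, "index must fit into 16 bits");

public:
    HandleTable() : m_freeCount(N) {
        for (std::size_t i = 0; i < N; ++i) {
            m_slots[i].object = nullptr;
            m_slots[i].generation = 0;
            m_freeIndices[i] = static_cast<std::uint16_t>(N - 1 - i);
        }
    }

    Handle Add(T* object) {
        assert(m_freeCount > 0);
        const std::uint16_t index = m_freeIndices[--m_freeCount];
        m_slots[index].object = object;
        Handle handle;
        handle.index = index;
        handle.generation = m_slots[index].generation;
        return handle;
    }

    void Remove(Handle handle) {
        assert(Lookup(handle) != nullptr);
        m_slots[handle.index].object = nullptr;
        ++m_slots[handle.index].generation;   // invalidates outstanding handles
        m_freeIndices[m_freeCount++] = static_cast<std::uint16_t>(handle.index);
    }

    // Returns nullptr for handles that refer to a deleted object.
    T* Lookup(Handle handle) const {
        const Slot& slot = m_slots[handle.index];
        return (slot.generation == handle.generation) ? slot.object : nullptr;
    }

    // Relocation only patches the table, all handles stay valid.
    void Relocate(Handle handle, T* newAddress) {
        assert(Lookup(handle) != nullptr);
        m_slots[handle.index].object = newAddress;
    }

private:
    struct Slot { T* object; std::uint16_t generation; };
    Slot          m_slots[N];
    std::uint16_t m_freeIndices[N];
    std::size_t   m_freeCount;
};
```

A defragmentation pass would move the object and call Relocate, and every later Lookup through an existing handle returns the new address. With 16 generation bits a slot that is recycled 65536 times wraps around, so stale handles are detected with high probability rather than with certainty.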

Relocatable allocations (cont'd)
* Function arguments?
  * Handles
    * Less error-prone, user never sees raw pointer
    * Lots of extra indirections
  * Raw pointers
    * Allow conversion from handle to raw pointer
    * Pointers only valid for one frame
    * More error-prone
    * Needs more discipline from the team

Run-time defragmentation

Run-time defragmentation
* Needs relocatable allocations
* Run for N ms each frame
  * Wait until v-sync = global sync. point
* Run on low memory conditions
* Run on level transitions
* Run when streaming assets

Run-time defragmentation (cont'd)
* Needs to play nice with streaming
  * Priority-based
  * Heuristics-based
    * Distance
    * Size of streaming data
  * User-controlled
* Asynchronous I/O

Run-time defragmentation (cont'd)
* Generic defragmentation
  * Differently sized blocks
  * Move allocations
    * Create large contiguous memory blocks

Run-time defragmentation (cont'd)
* Chunk-based defragmentation
  * Define upper-bound for chunk-size, e.g. 1 MB
  * Each single asset must fit within chunk-size
  * Only allocate in chunks
  * Only move chunks
    * More predictable performance cost
  * Chunks built by content pipeline
    * Shifts defragmentation problem to off-line tools

Run-time defragmentation (cont'd)
* Chunk-based defragmentation (cont'd)
  * Easier to implement
  * Easier to cope with streaming
    * Stream in chunks only
  * Leads to more wasted memory
* Optimizations
  * Make use of platform-specific features
    * E.g. move stuff in VRAM using the GPU (PS3)

Debugging memory-related bugs

What's wrong with that?

    class String
    {
    public:
        explicit String(const char* str);
        String(const String& other);
        String& operator=(const String& other);
        ~String(void);
        const char* c_str(void) const;
        operator const char*(void) const;
    };

    String CopyString(const String& str)
    {
        String result(str.c_str());
        // …
        return result;
    }

What's wrong with that? (cont'd)

    String someString("Hello World");

    const char* copiedString = CopyString(someString);

    const String& otherCopiedString = CopyString(someString);

    String& otherCopiedString2 = CopyString(someString);

Memory-related problems
* Stack overflows
  * Seldom in main thread
  * More often in other threads
* Memory overwrites
  * Off-by-one
  * Wrong casting
  * Reference & pointer to temporary
  * Dangling pointers

Stack overflows
* An overflow can stomp single bytes, not just big blocks
  * Thread stack memory comes from heap
  * Can stomp unrelated class members
* Often lead to immediate crash upon writing
  * E.g. in function prologue

Stack overflows (cont'd)
* Debugging help
  * Find stack start & end
    * Highly platform-specific
    * Linker-defined symbols for executable
  * Fill stack with pattern
  * Check for pattern at start & end
    * Each frame
    * In offending functions

Memory overwrites
* Off-by-one
  * Write past the end of an array
  * 0-based vs. 1-based
* Debugging tools
  * Bounds checking
    * Finds those cases rather quickly
  * Static code analysis tools

Memory overwrites (cont'd)
* Wrong casting
  * Multiple inheritance
  * Implicit conversions to void*
    * No offsets added
  * reinterpret_cast
    * No offsets added
  * static_cast
    * Correct offsets
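The difference between the casts can be observed by comparing addresses. Renderable, Collidable and Entity are hypothetical types. The exact base-class layout is ABI-specific, but the common compilers place the second non-empty base at a non-zero offset.

```cpp
#include <cassert>

struct Renderable { int renderData;    virtual ~Renderable() {} };
struct Collidable { int collisionData; virtual ~Collidable() {} };

// Collidable is the second base, so its sub-object does not start at the
// address of the Entity itself on the usual ABIs.
struct Entity : Renderable, Collidable { int entityData; };

// static_cast knows the class layout and adds the offset of the
// Collidable sub-object.
inline Collidable* ToCollidableAdjusted(Entity* entity) {
    return static_cast<Collidable*>(entity);
}

// A conversion to void* (just like reinterpret_cast) keeps the address of
// the Entity. Treating the result as a Collidable* later would be
// undefined behaviour, it is only used for address comparison here.
inline void* ToCollidableUnadjusted(Entity* entity) {
    return static_cast<void*>(entity);
}
```

Storing the unadjusted pointer in a void* user-data slot and casting it back to Collidable* on the other side is the typical way this turns into a memory overwrite: every member access is off by the size of the first base.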

Memory overwrites (cont'd)
* Reference & pointer to temporary
  * Most compilers will warn
    * Crank up the warning level
  * Compilers are tricked easily
  * Reference vs. const reference
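The lifetime rules behind the String quiz earlier in this section can be made observable with an instance counter. Tracked and the two helper functions are hypothetical. Binding a temporary to a const reference extends its lifetime to the end of the scope, a pointer obtained from a temporary dangles after the full expression, and binding a temporary to a non-const reference is ill-formed in standard C++ (some compilers accept it as an extension).

```cpp
#include <cassert>

// Counts live instances so the lifetime of temporaries becomes visible.
struct Tracked {
    static int live;
    int value;
    explicit Tracked(int v) : value(v) { ++live; }
    Tracked(const Tracked& other) : value(other.value) { ++live; }
    ~Tracked() { --live; }
    const int* Data() const { return &value; }
};
int Tracked::live = 0;

inline Tracked MakeTracked(int v) { return Tracked(v); }

// The temporary bound to the const reference is still alive when the
// return value is computed.
inline int LiveWhileConstRefInScope() {
    const Tracked& ref = MakeTracked(1);
    (void)ref;
    return Tracked::live;
}

// The temporary is destroyed at the end of the full expression, so the
// pointer dangles immediately and must never be dereferenced.
inline int LiveAfterTakingPointer() {
    const int* dangling = MakeTracked(2).Data();
    (void)dangling;
    return Tracked::live;
}
```

This corresponds to the three cases of the quiz: the const char* taken from the returned String dangles, the const String& is fine, and the String& only compiles where a compiler extension allows it.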

Memory overwrites (cont'd)
* Dangling pointers
  * Point to already freed allocation
  * Lucky case
    * Memory page has been decommitted
    * Immediate crash upon access
  * Unlucky case
    * Other allocation sits at the same address
    * Nasty random crashes
    * Reading garbage data → even further delayed crash

Debugging overwrites
* Memory window is your friend
  * Never lies!
* Crashes in optimized builds
  * printf() slows down the application
    * Hides data races
  * Local variables mostly gone
    * Not on stack, but in registers (PowerPC!)
    * Volatile vs. non-volatile registers

Debugging overwrites (cont'd)
* Knowing stack frame layout helps
  * Parameter passing
  * Dedicated vs. volatile vs. non-volatile registers
* Knowing assembly helps
* Knowing class layout helps
  * V-table pointer, padding, alignment, …
* Verify your assumptions!

Debugging overwrites (cont'd)
* Familiarize yourself with
  * Function prologue
  * Function epilogue
  * Function calls
  * Virtual function calls
  * Different address ranges
    * Heap, stack, code, write-combined, …
* Study compiler-generated code

Debugging overwrites (cont'd)
* Help from the debugger
  * Cast any address to pointer-type in watch window
  * Cast zero into any instance
    * Work out member offsets
  * Pseudo variables
    * @eax, @r1, @err, platform-specific ones
  * Going back in time
    * „Set next instruction“

Debugging overwrites (cont'd)
* Help from the debugger (cont'd)
  * Crash dumps
    * Registers
    * Work backwards from point of crash
    * Find offsets using debugger
    * Find this-pointer (+offset) in register, cast in debugger
    * Memory contents
  * On-the-fly coding
    * Nop'ing out instructions
    * Changing branches

Debugging overwrites (cont'd)
* Simplest method
  * Add padding bytes surrounding the stomped variable
  * Deduce whether single bytes or a whole block is overwritten
  * Changes memory layout

Debugging overwrites (cont'd)
* Hardware breakpoints
  * Supported by debuggers
  * Supported by all major platforms
    * x86: Debug registers (DR0-DR7)
    * PowerPC: Data Access Breakpoint Register (DABR)
  * Helps finding reads & writes
  * Setting from code helps finding random stomps
  * CPU only
    * GPU/SPU/DMA/IO access doesn't trigger

Debugging overwrites (cont'd)
* Page protection
  * Move allocations in question to the start or end of a page
  * Restrict access to surrounding pages
  * Needs a lot more memory
* Protect free pages
  * Walk allocations upon freeing memory
  * Don't re-use recently freed allocations

Debugging overwrites (cont'd)
* Page protection (cont'd)
  * Needs virtual memory system
  * If not available, most platforms offer similar features

If all else fails...
* Go home, get some sleep
  * Let your brain do the work
* Grab a colleague
  * Pair debugging
  * Fresh pair of eyes
* Talk to somebody
  * Non-programmers, other team members
  * Your family, rubber ducks, ...

Yes, rubber ducks!

That's it!

Questions?

Contact: stefan.reinalter@molecular-matters.com