4developers 2015: gamedev-grade debugging - leszek godlewski

Gamedev-grade debugging Leszek Godlewski, The Astronauts Source: http://igetyourfail.blogspot.com/2009/01/reaching-out-tale-of-failed-skinning.html

Upload: proidea

Post on 15-Jul-2015


TRANSCRIPT

Page 1: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Gamedev-grade debugging
Leszek Godlewski, The Astronauts

Source: http://igetyourfail.blogspot.com/2009/01/reaching-out-tale-of-failed-skinning.html

Page 2: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

● Engine Programmer, The Astronauts (Nov 2014 – present)

– PS4 port of The Vanishing of Ethan Carter

● Programmer, Nordic Games (early 2014 – Nov 2014)

● Freelance Programmer (Sep 2013 – early 2014)

● Generalist Programmer, The Farm 51 (Mar 2010 – Aug 2013)

Who is this guy?

Page 3: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Agenda

● How is gamedev different?
● Bug species
● Case studies
● Conclusions

Page 4: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

[Flowchart: the game loop. Nodes: Start, Update, Draw, Exit? (Yes/No), End]

How is gamedev different?

Page 5: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

33 milliseconds

● How much time you have to get shit done™
– 30 Hz → 33⅓ ms per frame
– 60 Hz → 16⅔ ms per frame

[Diagram labels: Editor, Level tools, Asset tools, Engine, Physics, Rendering, Audio, Network, Platform, Input, Network back-end, Game, UI, Logic, AI]
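The frame budget arithmetic above can be sketched in a few lines of C++. This is only an illustration: `frameBudgetMs` and `fitsBudget` are hypothetical helper names, not part of any engine mentioned in the talk.

```cpp
#include <cassert>
#include <chrono>

// Frame budget in milliseconds for a given refresh rate, e.g. 30 Hz -> 33.33 ms.
inline double frameBudgetMs(double hz) {
    assert(hz > 0.0 && "refresh rate must be positive");
    return 1000.0 / hz;
}

// Hypothetical helper: runs one frame's work and reports whether it fit the budget.
template <typename Work>
bool fitsBudget(Work&& work, double hz) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    work();  // stands in for one frame's update + draw
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() <= frameBudgetMs(hz);
}
```

Everything the game does in a frame has to fit inside that single number, which is why a debugging aid that costs a few milliseconds can change the behaviour being observed.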

Page 6: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Interdisciplinary working environment

● Designers
– Game, Level, Quest, Audio…
● Artists
– Environment, Character, 2D, UI, Concept…
● Programmers
– Gameplay, Engine, Tools, UI, Audio…
● Writers
● Composers
● Actors
● Producers
● PR & Marketing Specialists
● …
All of them: tightly woven teams

Page 7: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Severe, fixed hardware constraints

● Main reason for extensive use of native code

Page 8: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Different trade-offs

[Chart: trade-offs between Robustness, Cost, Performance and Fun/Coolness, compared for Enterprise/B2B/webdev and Gamedev]

Page 9: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Indeterminism & complexity

● Leads to poor testability
– Parts make no sense in isolation
– What exactly is correct?
– Performance regressions?

Source: https://github.com/memononen/recastnavigation

Page 10: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Aversion to general software engineering

● Modelling
● Object-Oriented Programming
● Design patterns
● C++ STL
● Templates in general
● …

Page 11: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Agenda

● How is gamedev different?
● Bug species
● Case studies
● Conclusions

Page 12: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Source: http://benigoat.tumblr.com/post/100306422911/press-b-to-crouch

Bug species

Page 13: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

General programming bugs

● Memory access violations
● Memory stomping/buffer overflows
● Infinite loops
● Uninitialized variables
● Reference cycles
● Floating point precision errors
● Out-Of-Memory/memory fragmentation
● Memory leaks
● Threading errors

Page 14: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Bad maths

● Incorrect transform order
– Matrix multiplication not commutative
– AB ≠ BA
● Incorrect transform space

Source: http://leadwerks.com/wiki/index.php?title=TFormQuat
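A small worked example of AB ≠ BA, using integer 2x2 matrices so the results can be compared exactly. The matrices (a 90-degree rotation and a scale along X) are chosen here purely for illustration.

```cpp
#include <array>
#include <cassert>

using Mat2 = std::array<std::array<int, 2>, 2>;

// Plain row-by-column 2x2 matrix product.
Mat2 mul(const Mat2& a, const Mat2& b) {
    Mat2 r{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Rotation by 90 degrees counter-clockwise, and a scale of 2 along X.
const Mat2 kRotate90 = {{ {{0, -1}}, {{1, 0}} }};
const Mat2 kScaleX2  = {{ {{2, 0}}, {{0, 1}} }};
```

Here `mul(kRotate90, kScaleX2)` yields rows (0, -1) and (2, 0), while `mul(kScaleX2, kRotate90)` yields rows (0, -2) and (1, 0): the same two transforms, applied in a different order, give a different result.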

Page 15: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Temporal bugs

● Incorrect update order

– for (int i = 0; i < entities.size(); ++i)
      entities[i].update();
● Incorrect interpolation/blending
– Bad alpha term
– Bad blending mode (additive/modulate)
● Deferred effects
– After n frames
– After n times an action happens
– n may be random, indeterministic

Page 16: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Graphical glitches

● Incorrect render state
● Shader code bugs
● Precision

Source: http://igetyourfail.blogspot.com/2009/01/visit-lake-fail-this-weekend.html

Page 17: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Content bugs

● Incorrect scripts
● Buggy assets

Source: http://www.polycount.com/forum/showpost.php?p=1263124&postcount=10466

Page 18: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Worst part?

● Most cases are two or more of the aforementioned, intertwined

Page 19: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Agenda

● How is gamedev different?
● Bug species
● Case studies
● Conclusions

Page 20: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Most material captured by

Case studies

Page 21: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Video settings not updating

Page 22: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect weapon after demon mode foreshadowing

Page 23: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Post-death sprint camera anim

Page 24: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Corpses teleported on death

Page 25: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Corpses teleported on death

● In normal gameplay, pawns have simplified movement
– Sweep the actor's collision primitive through the world
– Slide along slopes, stop against walls

Source: http://udn.epicgames.com/Three/PhysicalAnimation.html

Page 26: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Corpses teleported on death

● Upon death, pawns switch to physics-based movement (ragdoll)

Source: http://udn.epicgames.com/Three/PhysicalAnimation.html

Page 27: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Corpses teleported on death (cont.)

● Physics bodies have separate state from the game actor
– Actor does not drive physics bodies, unless requested
– If the actor is driven by physics simulation, its location is synchronized to the hips bone body's

Source: http://udn.epicgames.com/Three/PhysicalAnimation.html

Page 28: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Corpses teleported on death (cont.)

● Idea: breakpoint in FarMove()?
– One function because world octree is updated
– Function gets called a gazillion times per frame
– Terrible noise
● Breakpoint condition?
– Teleport from arbitrary point A to arbitrary point B
– Distance?
● Breakpoint sequence?
– Break on death instead
– When that breakpoint is hit, enable the one in FarMove()
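The breakpoint sequence can also be expressed in code when the debugger cannot chain breakpoints. The sketch below is an assumption-laden illustration: `Actor`, `onDeath`, `farMove` and `debugTrap` are hypothetical stand-ins, not UE3 API.

```cpp
#include <cassert>

// All names here are hypothetical stand-ins, not UE3 API.
struct Actor {
    bool watchFarMove = false;  // armed by the first "breakpoint"
    int trappedMoves = 0;       // how often the second one fired
    float x = 0.f, y = 0.f, z = 0.f;
};

// Stand-in for a debugger break; a real build would trigger an actual break here.
inline void debugTrap(Actor& a) { ++a.trappedMoves; }

// Step 1: death arms the watch instead of breaking on every move.
void onDeath(Actor& a) { a.watchFarMove = true; }

// Step 2: the noisy function only traps for actors that have been armed.
void farMove(Actor& a, float nx, float ny, float nz) {
    if (a.watchFarMove) debugTrap(a);
    a.x = nx; a.y = ny; a.z = nz;
}
```

The thousands of ordinary `farMove` calls per frame pass through silently. Only moves made after the death event reach the trap.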

Page 29: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Corpses teleported on death (cont.)

● Cause: physics body driving the actor with out-of-date state

● Fix: request physics body state synchronization to animation before switching to ragdoll

Page 30: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Weapons floating away from the player

Page 31: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Weapons floating away from the player

Page 32: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Weapons floating away from the player

● Extremely rare, only encountered on consoles
– Reproduction rate somewhere around 1 in 50 attempts
– And never on developer machines
● Player pawn in a special state for the rollercoaster ride
– Many things could go wrong
● For the lack of repro, sprinkled the code with debug logs

Page 33: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Weapons floating away from the player (cont.)

● Cause: incorrect update order

– for (int i = 0; i < entities.size(); ++i)
      entities[i].update();
– Player pawn forced to update after rollercoaster car
– Possible for weapons to be updated before player pawns

● Fix: enforce weapon update after player pawns
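One way to enforce such an order is to give every entity an explicit update group and sort before the update loop. This is a sketch of the general idea only. The slides do not say how the fix was implemented, and `Entity`, `UpdateOrder` and `updateAll` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical update groups for the case above: lower values update first.
enum UpdateOrder { kVehicle = 0, kPawn = 1, kWeapon = 2 };

// Hypothetical entity carrying its update group.
struct Entity {
    std::string name;
    int order;
};

// Returns the entity names in the order they would be updated.
std::vector<std::string> updateAll(std::vector<Entity> entities) {
    // stable_sort keeps the original relative order inside each group.
    std::stable_sort(entities.begin(), entities.end(),
                     [](const Entity& a, const Entity& b) { return a.order < b.order; });
    std::vector<std::string> visited;
    for (const Entity& e : entities) visited.push_back(e.name);  // stands in for e.update()
    return visited;
}
```

With the groups above, a weapon that happens to sit before its owner in the entity array is still updated after the car and the pawn.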

Page 34: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Characters with “rapiers”

Page 35: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Characters with “rapiers”

● UE3 has “content cooking” as part of game build pipeline
– Redistributable builds are “cooked” builds
● Artifact appears only in cooked builds

Page 36: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Characters with “rapiers” (cont.)

● Logs contained assertions for “out-of-bounds vertices”
● Mesh vertex compression scheme
– 32-bit float → 16-bit short int (~50% savings)
– Find bounding sphere for all vertices
– Normalize all vertices to said sphere radius
– Map [-1; 1] floats to [-32768; 32767] 16-bit integers
● Assert condition
– for (int i = 0; i < 3; ++i)
      assert(v[i] >= -1.f && v[i] <= 1.f, "Out-of-bound vertex!");
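The last step of the compression scheme, mapping [-1; 1] to 16-bit integers, might look roughly like this. The engine's exact scaling and rounding are not given in the slides, so the formula below is an assumption made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Maps a normalized coordinate in [-1, 1] to a signed 16-bit integer.
// Assumed mapping: -1 -> -32768 and 1 -> 32767, rounding to nearest.
std::int16_t quantize(float v) {
    assert(v >= -1.f && v <= 1.f && "Out-of-bound vertex!");
    const float scaled = (v + 1.f) * 0.5f * 65535.f - 32768.f;
    return static_cast<std::int16_t>(std::lround(scaled));
}

// Inverse mapping back to [-1, 1].
float dequantize(std::int16_t q) {
    return (static_cast<float>(q) + 32768.f) / 65535.f * 2.f - 1.f;
}
```

A position stored as three such values takes 6 bytes instead of 12, which is where the roughly 50% saving comes from. The range assertion guards the conversion, because a value outside [-1; 1] would not fit in 16 bits.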

Page 37: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Characters with “rapiers” (cont.)

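● v[i] was NaN
– Interesting property of NaN: every ordered or equality comparison with it evaluates to false
– Even with itself
  float f = nanf("");
  bool b = (f == f); // b is false
– So both halves of the range check fail, and the assertion fires
● How did it get there?!
● Tracked the NaN all the way down to the raw engine asset!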
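Because a NaN fails the range check without being above or below the range, an “out-of-bounds” message is misleading. A check that names the NaN case explicitly could be sketched as follows (`VertexCheck` and `classify` are hypothetical names):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

const float kNaN = std::numeric_limits<float>::quiet_NaN();

// The same range test as the assertion: false for NaN as well as for out-of-range values.
bool inUnitRange(float v) { return v >= -1.f && v <= 1.f; }

enum class VertexCheck { Ok, OutOfRange, NotANumber };

// Tells "out of range" apart from "not a number".
VertexCheck classify(float v) {
    if (std::isnan(v)) return VertexCheck::NotANumber;
    if (v < -1.f || v > 1.f) return VertexCheck::OutOfRange;
    return VertexCheck::Ok;
}
```

Note that this relies on IEEE 754 comparison semantics. Aggressive floating-point optimization flags may break NaN checks.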

Page 38: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Characters with “rapiers” (cont.)

● Cause: ???
● Fix: re-export the mesh from 3D software
– Magic!

Page 39: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Meta-case: undeniable assertion

Page 40: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Undeniable assertion

● Happened while debugging “rapiers”
● Texture compression library without sources
● Flood of non-critical assertions
– For almost every texture
– Could not ignore in bulk
– Terrible noise
● Solution suggestion taken from [SINILO12]

Page 41: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Undeniable assertion (cont.)

● Enter disassembly

Page 42: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Undeniable assertion (cont.)

● Locate assert message function call instruction

Page 43: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Undeniable assertion (cont.)

● Enter memory view and look up the address
– 0xE8 is the opcode of the x86 near relative CALL
– Followed by a 4-byte address argument (a relative offset), 5 bytes in total

Page 44: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Undeniable assertion (cont.)

● NOP it out!
– 0x90 is the NOP opcode; overwrite all 5 bytes of the CALL
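In the talk the patch was made by hand in the debugger's memory view. The same byte-level operation can be sketched on a plain buffer. The function name and the buffer-based setup are assumptions for illustration; this does not patch a live process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kCallRel32 = 0xE8;  // x86 near relative CALL
constexpr std::uint8_t kNop = 0x90;        // x86 NOP
constexpr std::size_t kCallSize = 5;       // opcode + 4-byte operand

// Overwrites the CALL starting at 'offset' with NOPs.
// Returns false and leaves the buffer untouched if no complete CALL starts there.
bool nopOutCall(std::vector<std::uint8_t>& code, std::size_t offset) {
    if (offset > code.size() || code.size() - offset < kCallSize) return false;
    if (code[offset] != kCallRel32) return false;
    for (std::size_t i = 0; i < kCallSize; ++i) code[offset + i] = kNop;
    return true;
}
```

All five bytes have to go: leaving part of the operand behind would make the CPU decode the leftover bytes as instructions.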

Page 45: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Undeniable assertion (cont.)

Page 46: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect player movement

Page 47: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect player movement

Page 48: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect player movement

● Recreating player movement from one engine in another (Pain Engine → Unreal Engine 3)

● Different physics engines (Havok vs PhysX)
● Many nuances

– Air control

– Jump and fall heights

– Slope & stair climbing & sliding down

Page 49: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect player movement (cont.)

● Main nuance: capsule vs cylinder

Page 50: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect player movement (cont.)

● Switching our pawn collision to capsule-based was not an option

● Emulate by sampling the ground under the cylinder instead

● No clever way to debug, just make it “bug out” and break in debugger

Page 51: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect player movement (cont.)

● Situation when getting stuck
● Cause: vanilla UE3 code sent a player locked between non-walkable surfaces into the “falling” state
● Fix: keep the player in the “walking” state

Page 52: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect player movement (cont.)

● Situation when moving without player intent
● Added visualization of sampling, turned on collision display
● Cause: undersampling
● Fix: increase radial sampling resolution
[Figure: two panels labelled 1) and 2)]
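Radial sampling under the cylinder could be generated along these lines. The game's actual sampling pattern is not shown in the slides, so this is a guess at the general shape, with `radialSamples` and `Vec2` as hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Evenly spaced sample points on a circle of the given radius around 'center'.
// A higher 'count' means a higher radial sampling resolution.
std::vector<Vec2> radialSamples(Vec2 center, float radius, int count) {
    assert(count > 0);
    const float kTwoPi = 6.28318530718f;
    std::vector<Vec2> points;
    points.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        const float angle = kTwoPi * static_cast<float>(i) / static_cast<float>(count);
        points.push_back({center.x + radius * std::cos(angle),
                          center.y + radius * std::sin(angle)});
    }
    return points;
}
```

Each point would then be used as the origin of a ground probe. With too few points, a narrow ledge can fall between two neighbouring samples, which is the undersampling described above.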

Page 53: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Blinking full-screen damage effects

Page 54: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Blinking full-screen damage effects

● Post-process effects are organized in one-way chains

Page 55: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Blinking full-screen damage effects (cont.)

● No debugger available to observe the PP chain
● Rolled my own overlay that walked and dumped the chain contents

MaterialEffect 'Vignette' Param 'Strength' 0.83 [IIIIIIII  ]
MaterialEffect 'FilmGrain' Param 'Strength' 0.00 [          ]
UberPostProcessEffect 'None' SceneHighLights (X=0.80,Y=0.80,Z=0.80) SceneMidTones (X=0.80,Y=0.80,Z=0.80) …
MaterialEffect 'Blood' Param 'Strength' 1.00 [IIIIIIIIII]
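The text bars in such a dump are cheap to produce. A minimal sketch follows, assuming a 10-cell bar that rounds down; both are inferred from the dump above, not taken from the tool's source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Renders a strength in [0, 1] as a fixed-width text bar.
// Out-of-range and NaN inputs are clamped so the bar never overflows its brackets.
std::string strengthBar(float strength, int cells = 10) {
    if (!(strength >= 0.f)) strength = 0.f;  // also catches NaN
    if (strength > 1.f) strength = 1.f;
    const int filled = static_cast<int>(strength * static_cast<float>(cells));
    return "[" + std::string(static_cast<std::size_t>(filled), 'I') +
           std::string(static_cast<std::size_t>(cells - filled), ' ') + "]";
}
```

Drawn every frame next to each effect name, a bar like this makes a blinking parameter visible at a glance, which a log file of numbers would not.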

Page 56: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Blinking full-screen damage effects (cont.)

● Cause: entire PP chain override
– Breakpoint in chain setting revealed the level script as the source
– Overeager level designer ticking one checkbox too many when setting up thunderstorm effects
● Fix: disable chain overriding altogether
– No use case for it in our game anyway

Page 57: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect animation states

Page 58: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect animation states

Page 59: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect animation states

Page 60: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect animation states

● Animation in UE3 is done by evaluating a tree
– Branches are weight-blended (either replacement or additive blend)
– Sequences (raw animations) for whole-skeleton poses
– Skeletal controls for fine-tuning of individual bones

Source: http://udn.epicgames.com/Three/AnimTreeEditorUserGuide.html

Page 61: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect animation states (cont.)

● Prominent case for domain-specific debuggers
● No tools for that in UE3, rolled my own visualizer
– Walks the animation tree and dumps active branches
– Allows inspection of states, but not transitions
– Conventional debugging still required, but greatly narrowed down

Page 62: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect animation states (cont.)

● Animation bug “checklist”
● Inspect the animation state in slow motion
– Is the correct blending mode used?
● Inspect the AI and cutscene state
– Capable of full animation overrides
● Inspect the assets (animation sequences)
– Is the root bone correctly oriented?
– Is the root bone motion correct?
– Are inverse kinematics targets present and correctly placed?
– Is the mesh skeleton complete and correct?

Page 63: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Incorrect animation states (cont.)

● Incorrect blend of reload animation
– Cause: bad root bone orientation in animation sequence
● Left hand off the weapon
– Cause: left hand inverse kinematics was off
– Fix: revise IK state control code
● Left hand incorrectly oriented
– Cause: bad IK target marker orientation on weapon mesh

Page 64: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Viewport stretched when portals are in view

Page 65: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Viewport stretched when portals are in view

● Graphics debugging is:
– Tracing & recording graphics API calls
– Replaying the trace
– Reviewing the renderer state and resources
● Trace may be somewhat unreadable at first…

Page 66: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Viewport stretched when portals are… (cont.)

● Traces may be annotated for clarity
– Direct3D: ID3DUserDefinedAnnotation
– OpenGL: GL_KHR_debug (more info: [GODLEWSKI01])

Page 67: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Viewport stretched when portals are… (cont.)

● Quick renderer state inspection revealed that viewport dimensions were off
– 1024x1024, 1:1 aspect ratio instead of 1280x720, 16:9
– Looks like shadow map resolution?
● Found the latest glViewport() call
– Shadow map code indeed
● Why wasn't the viewport updated for main scene rendering?

Page 68: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Viewport stretched when portals are… (cont.)

● Renderer state changes are expensive
– New state needs to be validated
– Modern graphics APIs are asynchronous
– State reading may require synchronization → stalls
● Cache the current renderer state to avoid redundant calls
– Cache ↔ state divergence → bugs!

Page 69: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Viewport stretched when portals are… (cont.)

● Cause: cache ↔ state divergence
– Difference between Direct3D and OpenGL: viewport dimensions as part of render target state, or global state

● Fix: tie viewport dimensions to render target in the cache
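A sketch of what tying the viewport to the render target in the cache could mean. The real renderer code is not shown in the slides. `FakeDevice` stands in for an OpenGL-like API where the viewport is global state, and all names are hypothetical.

```cpp
#include <cassert>
#include <map>

struct Viewport {
    int width = 0, height = 0;
    bool operator==(const Viewport& o) const { return width == o.width && height == o.height; }
};

// Stand-in for an OpenGL-like device: the viewport is global and survives target switches.
struct FakeDevice {
    Viewport viewport;
    int boundTarget = 0;
    int viewportCalls = 0;
    void bindTarget(int id) { boundTarget = id; }
    void setViewport(Viewport v) { viewport = v; ++viewportCalls; }
};

// Cache that remembers the wanted viewport per render target and re-applies it on bind.
class StateCache {
public:
    explicit StateCache(FakeDevice& device) : device_(device) {}

    void setViewport(int target, Viewport v) {
        wanted_[target] = v;
        if (target == device_.boundTarget) apply(v);
    }

    void bindTarget(int target) {
        device_.bindTarget(target);
        const auto it = wanted_.find(target);
        if (it != wanted_.end()) apply(it->second);
    }

private:
    void apply(Viewport v) {
        if (v == applied_) return;  // skip redundant device calls
        device_.setViewport(v);
        applied_ = v;
    }

    FakeDevice& device_;
    std::map<int, Viewport> wanted_;
    Viewport applied_;  // mirrors the device's single global viewport
};
```

Switching from the shadow map target back to the main scene target now restores 1280x720 without the caller having to remember it, while repeated binds of the same target still cost no extra device calls.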

Page 70: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Black artifacts

Page 71: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Black artifacts

Page 72: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Black artifacts

Page 73: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Black artifacts

Page 74: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Black artifacts

Page 75: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Black artifacts

● First thing to do is to inspect the state
● Nothing suspicious found, turned to shaders
● On OpenGL 4.2+, shaders could be debugged in NSight…
● OpenGL 2.1, so had to resort to early returns from shader with debug colours
– Shader equivalent of debug logs, a.k.a. “Your Mum's Debugger”
● “Shotgun debugging” with is*() functions
– isnan(), isinf()
  ● isnan() returned true!

Page 76: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Black artifacts (cont.)

● Cause: undefined behaviour in NVIDIA's pow() implementation
– “Results are undefined if x < 0. Results are undefined if x = 0 and y <= 0.” [GLSL120]
– Undefined means the implementation is free to do whatever
  ● NVIDIA returns QNaN the Barbarian (displayed as black, poisoning all involved calculations)
  ● Other vendors usually return 0
● Fix: for all pow() calls, clamp either:
– Arguments to their proper ranges
– Output to [0; ∞)
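The argument-clamping variant of the fix has a direct C++ analogue, since `std::pow` also yields NaN for a negative base with a non-integer exponent. This is a CPU-side illustration, not the shader code from the game, and `safePow` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Clamps the base to [0, inf) before raising it to the power,
// so a slightly negative input can no longer produce NaN.
float safePow(float base, float exponent) {
    return std::pow(std::max(base, 0.0f), exponent);
}
```

In a shader the negative input typically comes from something harmless, such as a dot product dipping just below zero. Clamping it costs almost nothing compared with a NaN propagating through every later calculation.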

Page 77: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Mysterious crash

Page 78: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Mysterious crash

● Game in content lock (feature freeze) for a while
● PlayStation 3 port nearly done
● Crash ~3-5 frames after entering a specific room
● First report included a perfectly normal callstack but no obvious reason
● QA reassigned to another task, could not pursue more
● Concluded it must've been an OOM crash

Page 79: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Mysterious crash (cont.)

● Bug comes back, albeit with wildly different callstack
● Asked QA to reproduce multiple times, including other platforms
– No crashes on X360 & Windows!
● Totally different callstack each time
● Confusion!
– OOM? Even in 512 MB developer mode (256 MB in retail units)?
– Bad content?
– Console OS bug?
– Audio thread?
– ???

Page 80: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Mysterious crash (cont.)

● Reviewed a larger sample of callstacks
● Most ended in dlmalloc's integrity checks
– Assertions triggered upon allocations and frees
● Memory stomping…? Could it be…?

Page 81: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Mysterious crash (cont.)

● Started researching memory debugging
– No tools provided by Sony
● Tried using debug allocators (dmalloc et al.)
– Most use the concept of memory fences
– Difficult to hook up to UE3
[Diagram: memory returned by malloc for a regular allocation vs a fenced allocation]
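The memory fence concept can be sketched in a few dozen lines: pad every allocation with a known byte pattern on both sides and verify the pattern later. This is a simplified illustration, not how dmalloc is implemented. The fence size, the pattern and all function names are arbitrary choices made here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

constexpr std::size_t kHeaderSize = 16;     // stores the user size, keeps 16-byte alignment
constexpr std::size_t kFenceSize = 16;
constexpr unsigned char kFenceByte = 0xFD;  // arbitrary pattern chosen for this sketch
static_assert(sizeof(std::size_t) <= kHeaderSize, "header must fit the size field");

// Layout: [header][front fence][user bytes][back fence]
void* fencedMalloc(std::size_t size) {
    auto* raw = static_cast<unsigned char*>(std::malloc(kHeaderSize + 2 * kFenceSize + size));
    if (!raw) return nullptr;
    std::memcpy(raw, &size, sizeof(size));
    std::memset(raw + kHeaderSize, kFenceByte, kFenceSize);
    std::memset(raw + kHeaderSize + kFenceSize + size, kFenceByte, kFenceSize);
    return raw + kHeaderSize + kFenceSize;
}

// Returns true if both fences are intact, i.e. nothing wrote just outside the block.
bool fencesIntact(const void* user) {
    const auto* front = static_cast<const unsigned char*>(user) - kFenceSize;
    std::size_t size = 0;
    std::memcpy(&size, front - kHeaderSize, sizeof(size));
    const unsigned char* back = front + kFenceSize + size;
    for (std::size_t i = 0; i < kFenceSize; ++i)
        if (front[i] != kFenceByte || back[i] != kFenceByte) return false;
    return true;
}

void fencedFree(void* user) {
    if (!user) return;
    assert(fencesIntact(user) && "memory stomp detected");
    std::free(static_cast<unsigned char*>(user) - kFenceSize - kHeaderSize);
}
```

A stomp is only noticed when the fences are checked, so the report arrives some time after the offending write. That is still far closer to the culprit than a crash in an unrelated allocation several frames later.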

Page 82: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Mysterious crash (cont.)

● Found and integrated a community-developed tool, Heap Inspector [VANDERBEEK14]
– Memory analyzer
– Focused on consumption and usage patterns monitoring
– Records callstacks for allocations and frees
● Several reproduction attempts revealed a correlation
– Crash address
– Construction of a specific class
● Gotcha!

Page 83: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Mysterious crash (cont.)

// class declaration

class Crasher extends ActorComponent;

var int DummyArray[1024];

// in ammo consumption code

Crash = new class'Crasher';

Comp = new class'ActorComponent'(Crash);

Page 84: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Mysterious crash (cont.)

● Cause: buffer overflow vulnerability in UnrealScript VM
– No manifestation on X360 & Windows due to larger allocation alignment (8 vs 16 bytes)

● Fix: make copy-construction fail when template is a subclassed object

● I wish I had Valgrind! [GODLEWSKI02]

Page 85: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Agenda

● How is gamedev different?
● Bug species
● Case studies
● Conclusions

Page 86: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Takeaway

● Time is of the essence!
● Always on a tight schedule
● Constantly in motion
– Temporal visualization is key
– Custom, domain-specific tools
● Complex and indeterministic
– Difficult to automate testing
– Wide knowledge required
● Prone to bugs outside the code
– Custom, domain-specific tools, again

Page 87: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Takeaway (cont.)

● Rendering is a whole separate beast
– Absolutely custom tools in isolation from the rest of the game
– Still far from ideal usability
● Good to know your machine down to the metal
● Good memory debugging tools make a world of difference
● You are never safe, not even in managed languages!

Page 89: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

Thank you!

Page 90: 4Developers 2015: Gamedev-grade debugging - Leszek Godlewski

References

● SINILO12 – Sinilo, M. “Coding in a debugger” [link]
● GODLEWSKI01 – Godlewski, L. “OpenGL (ES) debugging” [link]
● GLSL120 – Kessenich, J. “The OpenGL® Shading Language”, Language Version: 1.20, Document Revision: 8, p. 57 [link]
● VANDERBEEK14 – van der Beek, J. “Heap Inspector” [link]
● GODLEWSKI02 – Godlewski, L. “Advanced Linux Game Programming” [link]