Unity Internals: Memory and Performance
DESCRIPTION
In this presentation we will provide in-depth knowledge about the Unity runtime. The first part focuses on memory and how to deal with fragmentation and garbage collection. The second part covers performance profiling and optimization. Finally, there is an overview of the debugging and profiling improvements in the newly announced Unity 5.0.
TRANSCRIPT
Unity Internals: Memory and Performance
Moscow, 16/05/2014
Marco Trivellato – Field Engineer
Page 6/9/14 2
This Talk Goals and Benefits

Who Am I?
• Now: Field Engineer @ Unity
• Previously: Software Engineer
  • Mainly worked on game engines
  • Shipped several video games:
    • Captain America: Super Soldier
    • FIFA '07 – FIFA '10
    • Fight Night: Round 3

Topics
• Memory Overview
• Garbage Collection
• Mesh Internals
• Scripting
• Job System
• How to use the Profiler
Memory Overview

Memory Domains
• Native (internal)
  • Asset Data: Textures, AudioClips, Meshes
  • Game Objects & Components: Transform, etc.
  • Engine Internals: Managers, Rendering, Physics, etc.
• Managed (Mono)
  • Script objects (managed DLLs)
  • Wrappers for Unity objects: Game objects, assets, components
• Native DLLs
  • User's DLLs and external DLLs (for example: DirectX)

Native Memory: Internal Allocators
• Default
• GameObject
• Gfx
• Profiler
5.x: We are considering exposing an API for using a native allocator in DLLs

Managed Memory
• Value types (bool, int, float, struct, ...)
  • Live on the stack when used as local variables. De-allocated when removed from the stack. No garbage.
• Reference types (classes)
  • Live on the heap and are handled by the Mono/.NET GC. Removed when no longer referenced.
• Wrappers for Unity objects:
  • GameObject
  • Assets: Texture2D, AudioClip, Mesh, …
  • Components: MeshRenderer, Transform, MonoBehaviour

Mono Memory Internals
• Allocates system heap blocks for its internal allocator
• Will allocate new heap blocks when needed
• Heap blocks are kept in Mono for later use
  • Memory can be given back to the system after a while
  • …but it depends on the platform → don't count on it
• Garbage collector cleans up
• Fragmentation can cause new heap blocks even though memory is not exhausted
Garbage Collection

Unity Object Wrapper
• Some objects used in scripts have large native backing memory in Unity
• Memory is not freed until finalizers have run
[Figure: a WWW object shown across Managed and Native memory, with the labels Decompression buffer, Compressed file, Decompressed file]

Mono Garbage Collection
• GC.Collect
  • Runs on the main thread when
    • Mono exhausts the heap space
    • Or the user calls System.GC.Collect()
• Finalizers
  • Run on a separate thread
  • Controlled by Mono
  • Can have several seconds delay
• Unity native memory
  • Dispose() cleans up internal memory
  • Eventually called from the finalizer
  • Manually call Dispose() to clean up
[Figure: timeline with two lanes. Main thread: www = null; new(someclass); // no more heap -> GC.Collect(); Finalizer thread: www.Dispose(); .....]
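The "manually call Dispose()" advice can be sketched as follows. This is a minimal sketch, assuming the Unity 4.x `WWW` class; the class name `DownloadExample` and the URL are hypothetical and only serve the illustration.

```csharp
using System.Collections;
using UnityEngine;

// Hypothetical component: downloads a file and releases the native
// buffers behind the WWW wrapper as soon as the data has been used.
public class DownloadExample : MonoBehaviour
{
    // Placeholder URL, not taken from the talk.
    public string url = "http://example.com/data.bin";

    IEnumerator Start()
    {
        WWW www = new WWW(url);
        yield return www; // wait for the download to finish

        if (string.IsNullOrEmpty(www.error))
        {
            // www.bytes copies the payload into a managed array.
            byte[] payload = www.bytes;
            Debug.Log("Downloaded " + payload.Length + " bytes");
        }
        else
        {
            Debug.LogWarning("Download failed: " + www.error);
        }

        // Free the native memory now instead of waiting for the
        // finalizer thread, which may run several seconds later.
        www.Dispose();
    }
}
```

Setting `www = null` alone only makes the wrapper eligible for collection; the native memory stays allocated until the finalizer eventually runs.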

Garbage Collection
• Roots are not collected in a GC.Collect
  • Thread stacks
  • CPU registers
  • GC handles (used by Unity to hold onto managed objects)
  • Static variables!!
• Collection time scales with managed heap size
  • The more you allocate, the slower it gets

GC: does data layout matter?
    struct Stuff {
        int a;
        float b;
        bool c;
        string leString;
    }
    Stuff[] arrayOfStuff;   // everything is scanned: GC takes more time
VS
    int[] As;
    float[] Bs;
    bool[] Cs;
    string[] leStrings;     // only this array is scanned: GC takes less time

GC: Best Practices
• Reuse objects → use object pools
• Prefer stack-based allocations → use struct instead of class
• System.GC.Collect can be used to trigger collection
  • Calling it 6 times returns the unused memory to the OS
• Manually call Dispose to clean up immediately
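The "reuse objects" bullet can be illustrated with a small pool. This is a minimal sketch in plain C#, not a Unity API; the `ObjectPool<T>` type and its member names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical generic pool: hands out previously released instances
// instead of allocating new ones, so steady-state use creates no garbage.
public sealed class ObjectPool<T> where T : class
{
    private readonly Stack<T> inactive = new Stack<T>();
    private readonly Func<T> create;   // builds a new instance when the pool is empty
    private readonly Action<T> reset;  // optional cleanup applied on release

    public ObjectPool(Func<T> create, Action<T> reset)
    {
        if (create == null) throw new ArgumentNullException("create");
        this.create = create;
        this.reset = reset;
    }

    public int InactiveCount
    {
        get { return inactive.Count; }
    }

    // Returns a pooled instance if one is available, otherwise allocates.
    public T Get()
    {
        return inactive.Count > 0 ? inactive.Pop() : create();
    }

    // Resets the instance and stores it for later reuse.
    public void Release(T item)
    {
        if (item == null) throw new ArgumentNullException("item");
        if (reset != null) reset(item);
        inactive.Push(item);
    }
}
```

A typical use would be a pool of scratch lists, for example `new ObjectPool<List<int>>(() => new List<int>(256), list => list.Clear())`, where `Get()` and `Release()` bracket each temporary use.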

Avoid temp allocations
• Don't use FindObjects or LINQ
• Use StringBuilder for string concatenation
• Reuse large temporary work buffers
• ToString() (allocates a new string)
• .tag (allocates a new string) → use CompareTag() instead
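Two of these points, the reused StringBuilder and CompareTag(), can be sketched in one script. This is an illustrative sketch only; the `ScoreTracker` class, the "Coin" tag and the field names are hypothetical.

```csharp
using System.Text;
using UnityEngine;

// Hypothetical component showing two ways to avoid temporary strings.
public class ScoreTracker : MonoBehaviour
{
    // One builder reused for every update instead of "a" + b + "c" chains.
    private readonly StringBuilder builder = new StringBuilder(32);
    private int score;
    private string label = "";

    void OnTriggerEnter(Collider other)
    {
        // other.tag == "Coin" would allocate a string for the tag;
        // CompareTag performs the comparison without that allocation.
        if (!other.CompareTag("Coin"))
            return;

        score++;

        builder.Length = 0;          // clear without allocating a new builder
        builder.Append("Score: ");
        builder.Append(score);
        label = builder.ToString();  // the only string allocated per pickup
    }

    void OnGUI()
    {
        GUI.Label(new Rect(10, 10, 200, 20), label);
    }
}
```

The label is rebuilt only when the score changes, so no string is allocated on frames where nothing happens.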

Unity API Temporary Allocations
Some examples:
• GetComponents<T>
• Vector3[] Mesh.vertices
• Camera[] Camera.allCameras
• foreach
  • Does not allocate by definition
  • However, there can be a small allocation, depending on the implementation of .GetEnumerator()
5.x: We are working on new non-allocating versions
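Because a property such as Mesh.vertices returns a fresh array on every read, a common workaround is to read it once and reuse the result. The following is a minimal sketch under that assumption; the `Wobble` class and its fields are hypothetical, and the mesh is assumed to have Read/Write enabled.

```csharp
using UnityEngine;

// Hypothetical component: animates vertices without reading
// Mesh.vertices (and allocating a new array) every frame.
[RequireComponent(typeof(MeshFilter))]
public class Wobble : MonoBehaviour
{
    public float amplitude = 0.1f;

    private Mesh mesh;
    private Vector3[] baseVertices;  // copied from the mesh once
    private Vector3[] workVertices;  // reused scratch buffer

    void Start()
    {
        mesh = GetComponent<MeshFilter>().mesh;
        baseVertices = mesh.vertices;               // single allocation here
        workVertices = new Vector3[baseVertices.Length];
    }

    void Update()
    {
        for (int i = 0; i < baseVertices.Length; i++)
        {
            Vector3 v = baseVertices[i];
            v.y += Mathf.Sin(Time.time + v.x) * amplitude;
            workVertices[i] = v;
        }

        // Assigning uploads the data; the managed array is not replaced.
        mesh.vertices = workVertices;
    }
}
```

Writing `mesh.vertices[i]` inside the loop instead would copy the whole vertex array on every iteration.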

Memory fragmentation
• Memory fragmentation is hard to account for
  • Fully unload dynamically allocated content
  • Switch to a blank scene before proceeding to the next level
    • This scene could have a hook where you may pause the game long enough to sample if there is anything significant in memory
  • Ensure you clear out variables so GC.Collect will remove as much as possible
• Avoid allocations where possible
• Reuse objects where possible within a scene play
  • Clear them out for map load to clean the memory

Unloading Unused Assets
• Resources.UnloadUnusedAssets will trigger asset garbage collection
  • It looks for all unreferenced assets and unloads them
  • It's an async operation
  • It's called internally after loading a level
• Resources.UnloadAsset is preferable
  • You need to know exactly what you need to unload
  • Unity does not have to scan everything
• Unity 5.0: Multi-threaded asset garbage collection
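Both unloading calls can be sketched side by side. This is a minimal sketch, assuming a script that holds a reference to one large texture; the `LevelCleanup` class and the `bigTexture` field are hypothetical.

```csharp
using System.Collections;
using UnityEngine;

// Hypothetical helper showing targeted and global asset unloading.
public class LevelCleanup : MonoBehaviour
{
    // Placeholder for an asset the script knows it no longer needs.
    public Texture2D bigTexture;

    // Targeted: unload one known asset; nothing else is scanned.
    public void ReleaseKnownAsset()
    {
        if (bigTexture == null)
            return;

        Texture2D texture = bigTexture;
        bigTexture = null;               // drop our own reference first
        Resources.UnloadAsset(texture);
    }

    // Global: collect managed garbage, then let Unity look for
    // unreferenced assets. The call is asynchronous, so it is yielded.
    public IEnumerator ReleaseEverythingUnused()
    {
        System.GC.Collect();
        yield return Resources.UnloadUnusedAssets();
    }
}
```

`ReleaseEverythingUnused` would be started with `StartCoroutine`, for example from the blank transition scene mentioned earlier in the talk.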
Mesh Internals Memory vs. Cycles

Mesh Read/Write Option
• It allows you to modify the mesh at run-time
• If enabled, a system-memory copy of the Mesh will remain in memory
• It is enabled by default
• In some cases, disabling this option will not reduce the memory usage
  • Skinned meshes
  • iOS
Unity 5.0: disabled by default (under consideration)

Non-Uniformly Scaled Meshes
We need to correctly transform vertex normals
• Unity 4.x:
  • Transform the mesh on the CPU
  • Create an extra copy of the data
• Unity 5.0:
  • Scaled on GPU
  • Extra memory no longer needed

Static Batching
What is it?
• It's an optimization that reduces the number of draw calls and state changes
How do I enable it?
• In the player settings + tag the object as static

Static Batching
How does it work internally?
• Build-time: Vertices are transformed to world-space
• Run-time: Index buffer is created with indices of visible objects
Unity 5.0:
• Re-implemented static batching without copying of index buffers

Dynamic Batching
What is it?
• Similar to Static Batching but it batches non-static objects at run-time
How do I enable it?
• In the player settings
• No need to tag, it auto-magically works…

Dynamic Batching
How does it work internally?
• Objects are transformed to world space on the CPU
• Temporary VB & IB are created
• Rendered in one draw call
Unity 5.x: we are considering exposing per-platform parameters

Mesh Skinning
Different implementations depending on platform:
• x86: SSE
• iOS/Android/WP8: Neon optimizations
• D3D11/XBoxOne/GLES3.0: GPU
• XBox360, WiiU: GPU (memexport)
• PS3: SPU
• WiiU: GPU w/ stream out
Unity 5.0: Skinned meshes use less memory by sharing index buffers between instances
Scripting

Unity 5.0: Mono
• No upgrade
• Mainly bug fixes
• New tech in WebGL: IL2CPP
  • http://blogs.unity3d.com/2014/04/29/on-the-future-of-web-publishing-in-unity/
  • Stay tuned: there will be a blog post about it

GetComponent<T>
It asks the GameObject for a component of the specified type:
• The GO contains a list of Components
• Each Component type is compared to T
• The first Component of type T (or that derives from T) will be returned to the caller
• Not too much overhead, but it still needs to call into native code
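Since every GetComponent<T> call crosses into native code, a common pattern is to look the component up once and keep the reference. This is a minimal sketch of that pattern; the `Thruster` class and its `force` field are hypothetical.

```csharp
using UnityEngine;

// Hypothetical component: resolves its Rigidbody once in Awake
// instead of calling GetComponent<Rigidbody>() every physics step.
[RequireComponent(typeof(Rigidbody))]
public class Thruster : MonoBehaviour
{
    public float force = 10f;

    private Rigidbody body; // cached component reference

    void Awake()
    {
        body = GetComponent<Rigidbody>(); // single lookup
    }

    void FixedUpdate()
    {
        body.AddForce(transform.forward * force);
    }
}
```

The `RequireComponent` attribute makes the editor add a Rigidbody alongside the script, so the cached reference is not expected to be null.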

Unity 5.0: Property Accessors
• Most accessors will be removed in Unity 5.0
• The objective is to reduce dependencies and therefore improve modularization
• Transform will remain
• Existing scripts will be converted. Example:
in 5.0:

Transform Component
• this.transform is the same as GetComponent<Transform>()
• transform.position/rotation needs to:
  • Find the Transform component
  • Traverse the hierarchy to calculate the absolute position
  • Apply translation/rotation
• transform internally stores the position relative to the parent
  • transform.localPosition = new Vector3(…) → simple assignment
  • transform.position = new Vector3(…) → costs the same if there is no parent; otherwise it needs to traverse the hierarchy upwards to transform the absolute position into a local one
• Finally, other components (collider, rigid body, light, camera, etc.) will be notified via messages
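The points above suggest caching the Transform and preferring localPosition where the parent does not matter. This is a minimal sketch under those assumptions; the `Mover` class and its `velocity` field are hypothetical.

```csharp
using UnityEngine;

// Hypothetical component: caches the Transform and moves in local
// space, so each frame does one read and one write of the position.
public class Mover : MonoBehaviour
{
    public Vector3 velocity = new Vector3(1f, 0f, 0f);

    private Transform cachedTransform;

    void Awake()
    {
        cachedTransform = transform; // avoid the lookup on every access
    }

    void Update()
    {
        // localPosition is stored directly; setting position on a
        // parented object would first convert world space to local.
        Vector3 position = cachedTransform.localPosition;
        position += velocity * Time.deltaTime;
        cachedTransform.localPosition = position;
    }
}
```

Each assignment still notifies other components on the object, so it helps to accumulate changes in a local variable and assign once per frame.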

Instantiate
API:
• Object Instantiate(Object, Vector3, Quaternion);
• Object Instantiate(Object);
Implementation:
• Clone GameObject hierarchy and Components
• Copy properties
• Awake
• Apply new Transform (if provided)

Instantiate (continued)
• Awake can be expensive
• AwakeFromLoad (main thread)
  • Clear states
  • Internal state caching
  • Pre-compute
Unity 5.0:
• Allocations have been reduced
• Some inner loops for copying the data have been optimized
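Given that cloning and Awake carry the cost, one way to keep it out of gameplay is to instantiate up front and reactivate instances later. This is an illustrative sketch, not something shown in the talk; the `BulletSpawner` class, `bulletPrefab` and `prewarmCount` are hypothetical.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical spawner: pays the Instantiate/Awake cost at start-up
// and reuses deactivated instances during gameplay.
public class BulletSpawner : MonoBehaviour
{
    public GameObject bulletPrefab; // placeholder prefab, assigned in the editor
    public int prewarmCount = 32;

    private readonly Stack<GameObject> inactive = new Stack<GameObject>();

    void Start()
    {
        for (int i = 0; i < prewarmCount; i++)
        {
            GameObject bullet = (GameObject)Instantiate(bulletPrefab);
            bullet.SetActive(false);
            inactive.Push(bullet);
        }
    }

    public GameObject Spawn(Vector3 position, Quaternion rotation)
    {
        // Fall back to a real Instantiate only when the pool is empty.
        GameObject bullet = inactive.Count > 0
            ? inactive.Pop()
            : (GameObject)Instantiate(bulletPrefab);

        Transform bulletTransform = bullet.transform;
        bulletTransform.position = position;
        bulletTransform.rotation = rotation;
        bullet.SetActive(true);
        return bullet;
    }

    public void Despawn(GameObject bullet)
    {
        bullet.SetActive(false);
        inactive.Push(bullet);
    }
}
```

Scripts on the prefab need to reset their own state in OnEnable, because Awake runs only once per instance.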

JIT Compilation
What is it?
• The process in which machine code is generated from CIL code during the application's run-time
Pros:
• It generates optimized code for the current platform
Cons:
• Each time a method is called for the first time, the application will suffer a certain performance penalty because of the compilation

JIT Compilation Spikes
What about pre-JITting?
• RuntimeHelpers.PrepareMethod does not work:
…better to use MethodHandle.GetFunctionPointer()
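The GetFunctionPointer() suggestion can be sketched with reflection. This is a minimal sketch, assuming a JIT platform (it is not meaningful where code is compiled ahead of time, as on iOS); the `PreJit` helper and its `PrepareType` method are hypothetical names.

```csharp
using System;
using System.Reflection;

// Hypothetical helper: asks the runtime for the native entry point of
// each method declared on a type, which forces compilation up front
// (for example during a loading screen) instead of on first call.
public static class PreJit
{
    // Returns the number of methods that were prepared.
    public static int PrepareType(Type type)
    {
        const BindingFlags flags =
            BindingFlags.DeclaredOnly |
            BindingFlags.Public | BindingFlags.NonPublic |
            BindingFlags.Instance | BindingFlags.Static;

        int prepared = 0;
        foreach (MethodInfo method in type.GetMethods(flags))
        {
            // Abstract methods have no body; open generics cannot be compiled.
            if (method.IsAbstract || method.ContainsGenericParameters)
                continue;

            method.MethodHandle.GetFunctionPointer();
            prepared++;
        }
        return prepared;
    }
}
```

A call such as `PreJit.PrepareType(typeof(SomeGameplayClass))` would go into loading code, where `SomeGameplayClass` stands for whichever type shows first-call spikes in the profiler.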
Job System

Unity 5.0: Job System (internal)
The goals of the job system:
• Make it easy to write very efficient job-based multithreaded code
• The jobs should be able to run safely in parallel to script code

Job System: Why?
Modern architectures are multi-core:
• XBox 360: 3 cores
• PS4/Xbox One: 8 cores
…which includes mobile devices:
• iPhone 4S: 2 cores
• Galaxy S3: 4 cores

Job System: What is it?
• It's a framework that we are going to use in existing and new sub-systems
• We want to have Animation, NavMesh, Occlusion, Rendering, etc. run as much as possible in parallel
• This will ultimately lead to better performance

Unity 5.0: Profiler Timeline View
It's a tool that allows you to analyse the execution of internal (native) threads for a specific frame

Unity 5.0: Frame Debugger
Conclusions

Budgeting Memory
How much memory is available?
• It depends…
• For example, on 512 MB devices running iOS 6.0: ~250 MB. A bit less with iOS 7.0
What's the baseline?
• Create an empty scene and measure memory
• Don't forget that the profiler requires some memory
• For example, on Android: 15.5 MB (+ 12 MB profiler)

Profiling
• Don't make assumptions
• Profile on target device
  • Editor != Player
  • Platform X != Platform Y
• Managed memory is not returned to native land!
For best results:
• Profile early and regularly
Questions ? [email protected] - Twitter: @m_trive