CS 3214
Computer Systems
Godmar Back
Automatic Memory Management/GC
MEMORY MANAGEMENT
Part 2
CS 3214 Spring 2020
Some of the following slides are taken with permission from
Complete Powerpoint Lecture Notes for
Computer Systems: A Programmer's Perspective (CS:APP)
Randal E. Bryant and David R. O'Hallaron
http://csapp.cs.cmu.edu/public/lectures.html
Dynamic Memory Allocation
• Explicit vs. Implicit Memory Allocator
– Explicit: application allocates and frees space
• E.g., malloc and free in C
– Implicit: application allocates, but does not free space
• E.g., garbage collection in Java, ML, or Lisp
• Allocation
– The memory allocator provides an abstraction of memory as a set of blocks or, in type-safe languages, as objects
– Doles out free memory blocks to application
• Will discuss automatic memory management today
[Figure: the application sits on top of the dynamic memory allocator, which manages heap memory]
Implicit Memory Management
• Motivation: manually (or explicitly) reclaiming memory is difficult:
– Too early: risk access-after-free errors
– Too late: memory leaks
• Requires principled design
– Programmer must reason about ownership of objects
• Difficult & error prone, especially in the presence of object sharing
• Complicates design of APIs
Concept Map
[Concept map: Implicit/Automatic Memory Management — motivated by the lack of robustness of explicit schemes. Branches: reference-counting-based approaches (manual, smart pointers); garbage collection mechanisms (reachability graph, mark/sweep, evacuation/scavenging, generational, incremental, concurrent, barriers); policies & tuning (generation sizing, triggers, heap expansion policy, etc.); efficiency considerations (memory overhead relative to live heap size, allocation rate, program throughput, GC throughput); programming issues (churn, bloat, leaks).]
Manual Reference Counting
• Idea: keep track of how many references there are to each object, in a reference counter stored with each object
– Copying a reference to an object (globalvar = q) increments the count: “addref”
– Removing a reference (p = NULL) decrements the count: “release”
• Uses a set of rules programmers must follow
– E.g., must ‘release’ a reference obtained from an OUT parameter in a function call
– Must ‘addref’ when storing into a global
– May not have to use addref/release for references copied within one function
• Programmer must use addref/release correctly
– Still somewhat error prone, but the rules are such that correctness of the code can be established locally, without consulting the API documentation of the functions being called; parameter annotations (IN, INOUT, OUT, return value) imply the reference counting rules
• Used in Microsoft COM & Netscape XPCOM
Automatic Reference Counting
• Idea: force automatic reference count updates when pointers are assigned/copied
• Most common variant: C++ “smart pointers”
– C++ allows the programmer to interpose on assignments and copies via operator overloading and special-purpose constructors
• Disadvantage of all reference counting schemes is their inability to handle cycles
– But a great advantage is immediate reclamation: no “drag” between last access & reclamation
Garbage Collection
• Determine which objects may be accessed in the future
– Don’t know which ones will be accessed, but can determine those that can’t be, because there are no pointers to them
– Requires that all pointers are identifiable (e.g., no pointer/int conversion)
• Invented in 1960 by McCarthy for LISP
Reachability Graph
• Roots are commonly
– Global variables (static fields in Java)
– Local variables that contain references (to any object or array in Java). Local variables are stored in the currently active stack frames of each running thread; they change constantly as the thread calls new methods and returns from calls
– Internal roots pinned down by the JVM
• The following slides visualize this. Note that in the actual implementation, objects are not tagged with a thread id (this is just for visualization purposes)
Reachability Graph
[Figure sequence: six snapshots of the heap’s reachability graph with a root set spanning threads A, B, and C. Objects are labeled A, B, or C for visualization only. As threads C and then B terminate, their stack roots disappear and the objects reachable only through them become garbage; in the final snapshot only the objects reachable from thread A’s roots remain.]
GC Design Choices
• Determining which objects are reachable
– “marking” live objects: cost generally proportional to the number of live objects in the area considered, or
– “evacuating”/“scavenging” objects: copying live objects into a new area (if objects are movable)
• Deallocating unreachable objects
– “sweeping”: essentially calling “free()” on all unreachable objects; cost proportional to the amount of dead objects (garbage)
– more efficient if it’s possible to evacuate all live objects from an area: in theory, constant cost; in practice, dominated by the need to zero memory
Memory Allocation Time-Profile / Modeling Memory Allocation
[Figures: allocated memory plotted over time, from start time ts to end time te; total allocated memory stays below Amax and is divided into live objects and garbage.]
Execution Time vs. Memory
[Figure sequence: execution time (ts to te) plotted against memory for program runs under progressively different maximum heap sizes.]
Heap Size vs. GC Frequency
• All else being equal, smaller maximum heap sizes necessitate more frequent collections
– Old rule of thumb: need between 1.5x and 2.5x the size of the live heap to limit collection overhead to 5-15% for applications with reasonable allocation rates
– [Hertz 2005] finds that GC outperforms explicit MM when given 5x memory, is 17% slower with 3x, and 70% slower with 2x
– Performance degradation occurs when live heap size approaches maximum heap size
Kattis.com example
• In ICPC judging, Java programs are subjected to both a total memory limit (via –Xmx) and a total CPU consumption limit (which includes all JVM threads, including those devoted to GC!)
• Given that the JVM is unaware it’s being timed and tries to adhere to its default policies, what’s the fairest way to run the JVM under these conditions?
– Hypothesis (1): set the start heap size to the max heap size to tell the JVM it’s ok to ask for this much memory from the OS. (Otherwise, it’ll try to GC before growing its heap.)
– Hypothesis (2): even though the JVM “senses” free cores and enables concurrent GC, force it to use serial GC instead: GC happens in the context of the mutator thread.
Kattis.com data
• Sample of submissions to open.kattis.com written in Java
• X-axis: total CPU consumption (mutator + GC threads)
• Y-axis: log(CPU_standard/CPU_better)
– CPU_standard: -Xmx{memlimit}
• Tries to use parallel GC
– CPU_better: -Xms{memlimit} -Xmx{memlimit} -XX:+UseSerialGC
“Real-World” data point: Kattis.com
Conclusion of kattis.com
• Setting -Xms works better most of the time (and so is now the default in Kattis)
• Flipside: a larger start heap size forces a larger Eden size and less frequent GC in the nursery, resulting in bad locality
– For one benchmark, a slowdown of 400% from setting -Xms vs. not setting it
Infant Mortality
[Figure: distribution of object lifetimes, showing that most objects die young]
Source: http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html
Generational Collection
• Observation: “most objects die young”
• Allocate objects in a separate area (“nursery”, “Eden space”); collect that area when it runs out of space
– Will typically have to evacuate few survivors
– “minor garbage collection”
• But: must treat all pointers into Eden as roots
– Typically requires cooperation of the mutator threads to record assignments: if ‘b’ is young and ‘a’ is old, a.x = b must add a root for ‘b’
• Aka “write barrier”
When to collect
• “Stop-the-world”
– All mutators stop while collection is ongoing
• Incremental
– Mutators perform small chunks of marking during each allocation
• Concurrent/Parallel
– Garbage collection happens in a concurrently running thread; requires some kind of synchronization between mutator & collector
Example: G1GC
• See Oracle tutorial and InfoQ
• Source: [Beckwith 2013]
Trade-Offs
• For a good discussion of other trade-offs related to GC, see this post related to claims about the Go GC
Precise vs. Conservative Collectors
• Precise collectors keep alive only objects that are in fact part of the reachability graph
• Conservative collectors may keep objects alive that aren’t
– Typically because they do not know where pointers are stored, and must conservatively guess
• In-between forms: some systems assume precise knowledge of heap objects, but not of stack frame layouts
– Can be expensive to keep track of where references are stored on the stack, particularly in fully preemptive environments
• Conservatism makes GC usable for languages such as C, but prevents moving/compacting of objects
Application Programmer’s Perspective
• Dealing with Memory Leaks
– Avoiding bloat
– Avoiding churn
• Tuning garbage collection parameters
• Garbage collection in mixed language
environments
– C code must coordinate with the garbage
collection system in place
Programmer’s Perspective
• Your program is running out of memory. What do you do?
• Possible reasons:
– Leak
– Bloat
• Your program is running slowly and unpredictably
– Churn
– “GC Thrashing”
Memory Leaks
• Objects that remain reachable, but will not be accessed in the future
– Due to application semantics
• Will ultimately lead to out-of-memory condition
– But will degrade performance before that
• Common problem, particularly in multi-layer frameworks
– Containers are a frequent culprit
– Heap profilers can help
Bloat and Churn
• Bloat: use of inefficient, pointer-intensive data structures increases overall memory consumption [Chis et al 2011]
– E.g., HashMaps for small objects
• Churn: frequent and avoidable allocation of objects that turn to garbage quickly
• Caches:
– How to implement and size caches