Garbage Collection
Introduction and Overview
Excerpted from presentation by Christian Schulte
Programming Systems Lab
Universität des Saarlandes, Germany
[email protected]
Garbage Collection…
…is concerned with the automatic reclamation of dynamically allocated memory after its last use by a program
Garbage collection…
Dynamically allocated memory
Last use by a program
Examples for automatic reclamation
Kinds of Memory Allocation
static int i;
void foo(void) {
    int j;
    int* p = (int*) malloc(…);
}
Static Allocation
By compiler (in the static data area), e.g. static int i
Available through entire runtime
Fixed size
Automatic Allocation
Upon procedure call (on stack), e.g. int j
Available during execution of call
Fixed size
Dynamic Allocation
Dynamically allocated at runtime (on heap), e.g. the block returned by malloc(…)
Available until explicitly deallocated
Dynamically varying size
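The three kinds of allocation can be seen side by side in a small C sketch. The names call_count, calls_made, make_squares and allocation_demo are made up for this illustration and do not come from the slides.

```c
#include <assert.h>
#include <stdlib.h>

static int call_count;  /* static: allocated once, available for the entire run */

/* Returns a heap array holding 0*0, 1*1, ..., (n-1)*(n-1), or NULL on failure.
   The array outlives this call, so the caller must free() it. */
int *make_squares(size_t n) {
    int *p = malloc(n * sizeof *p);  /* dynamic: on the heap until freed */
    call_count++;
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)   /* i and p are automatic: on the stack */
        p[i] = (int)(i * i);
    return p;
}

/* Reports how often make_squares has been called so far. */
int calls_made(void) {
    return call_count;
}

/* Uses the array and deallocates it after its last use. */
int allocation_demo(void) {
    int *squares = make_squares(4);
    assert(squares != NULL);         /* acceptable for a sketch only */
    int last = squares[3];           /* last use of the heap block */
    free(squares);                   /* explicit deallocation */
    return last;
}
```

Only the heap block needs an explicit free; the static counter and the automatic variables are reclaimed without any action by the programmer.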
Dynamically Allocated Memory
Also: heap-allocated memory
Allocation: malloc, new, …
– before first usage
Deallocation: free, delete, dispose, …
– after last usage
Needed for
– C++, Java: objects
– SML: datatypes, procedures
– anything that outlives a procedure call
Getting it Wrong
Forget to free (memory leak)
– program eventually runs out of memory
– long running programs: OSs, servers, …
Free too early (dangling pointer)
– lucky: illegal access detected by OS
– horror: memory reused, in simultaneous use
• programs can behave arbitrarily
• crashes might happen much later
Estimates of programming effort spent on memory management
– Up to 40%! [Rovner, 1985]
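Both mistakes are easy to write in C. The following is a minimal sketch; copy_string, leaky_length and careful_length are hypothetical names chosen for this example. The dangling-pointer variant is shown only in a comment, because executing it would be undefined behavior.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of s; the caller owns it and must free() it. */
char *copy_string(const char *s) {
    char *c = malloc(strlen(s) + 1);
    if (c != NULL)
        strcpy(c, s);
    return c;
}

/* Memory leak: the only pointer to the block is lost on return. */
size_t leaky_length(const char *s) {
    char *c = copy_string(s);
    if (c == NULL)
        return 0;
    return strlen(c);           /* BUG: c is never freed */
}

/* Correct: free after the last use, then drop the pointer. */
size_t careful_length(const char *s) {
    char *c = copy_string(s);
    if (c == NULL)
        return 0;
    size_t n = strlen(c);       /* last use of c */
    free(c);
    c = NULL;                   /* no dangling pointer left behind */
    return n;
}

/* Dangling pointer (shown, not executed):
       free(c);
       return strlen(c);        undefined behavior: c points to freed memory
*/
```

leaky_length returns the right answer every time, which is why leaks tend to go unnoticed until a long running program runs out of memory.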
Nodes and Pointers
Node n
– Memory block, cell
Pointer p
– Link to node
– Node access: *p
Children children(n)
– set of pointers to nodes referred to by n
[Figure: a pointer p referring to a node n]
Mutator
Abstraction of program
– introduces new nodes with pointers
– redirects pointers, creating garbage
Shared Nodes
Nodes referred to by several pointers
Makes manual deallocation hard
– local decision impossible
– respect other pointers to node
Cycles are an instance of sharing
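Why a local decision is impossible can be illustrated with a small sketch. The cell type and the helper count_referrers are assumptions made for this example: before a shared cell may be freed, every other cell has to be inspected for pointers to it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct cell {
    int value;
    struct cell *next;
} cell;

/* Counts how many of the n cells in cells[] point to target.
   Deciding whether target may be freed needs this global view;
   looking at a single referring cell is not enough. */
size_t count_referrers(cell *cells[], size_t n, const cell *target) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (cells[i] != NULL && cells[i]->next == target)
            count++;
    return count;
}
```

With two lists x -> s and y -> s that share the tail s, freeing s while tearing down x would leave y with a dangling pointer. A cell whose next field points to itself is the smallest cycle and is counted as its own referrer.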
Last Use by a Program
Question: When is node M no longer used by program?
– Let P be any program not using M
– New program sketch:
    Execute P; Use M;
– Hence:
    M used ⇔ P terminates
– We are doomed: halting problem!
So “last use” is undecidable!
Safe Approximation
Decidable and also simple
What does safe mean?
– only unused nodes freed
What does approximation mean?
– some unused nodes might not be freed
Idea
– keep nodes that can be accessed by mutator
Reachable Nodes
Reachable from root set
– processor registers
– static variables
– automatic variables (stack)
Reachable from reachable nodes
[Figure: graph of nodes reachable from root]
Summary: Reachable Nodes
A node n is reachable, iff
– n is element of the root set, or
– n is element of children(m) and m is reachable
Reachable node also called “live”
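The two rules can be read as a fixpoint computation: start with the root set and keep adding children of live nodes until nothing changes. The sketch below follows that reading directly. The node layout with at most two children and the function compute_live are assumptions for this example, and all roots and children are assumed to point into the array all[].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 2

typedef struct node {
    struct node *child[MAX_CHILDREN];  /* children(n); NULL means no child */
    bool live;
} node;

/* Computes the least set closed under the two rules:
   roots are live, and children of live nodes are live.
   Returns the number of live nodes among the count nodes in all[]. */
size_t compute_live(node all[], size_t count, node *roots[], size_t nroots) {
    for (size_t i = 0; i < count; i++)
        all[i].live = false;
    for (size_t i = 0; i < nroots; i++)
        if (roots[i] != NULL)
            roots[i]->live = true;           /* rule 1: root set */

    bool changed = true;
    while (changed) {                        /* repeat until nothing new is found */
        changed = false;
        for (size_t i = 0; i < count; i++) {
            if (!all[i].live)
                continue;
            for (int c = 0; c < MAX_CHILDREN; c++) {
                node *m = all[i].child[c];
                if (m != NULL && !m->live) { /* rule 2: child of a live node */
                    m->live = true;
                    changed = true;
                }
            }
        }
    }

    size_t live = 0;
    for (size_t i = 0; i < count; i++)
        if (all[i].live)
            live++;
    return live;
}
```

A cycle that no root leads to stays outside the live set, because neither rule ever applies to its nodes.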
Mark and Sweep
Compute set of reachable nodes
Free nodes known to be not reachable
Reachability: Safe Approximation
Safe
– access to not reachable node impossible
– depends on language semantics
– but C/C++? later…
Approximation
– reachable node might never be accessed
– programmer must know about this!
– have you been aware of this?
Example Garbage Collectors
Mark-Sweep
Others
– Mark-Compact
– Reference Counting
– Copying
– see Chapters 1 and 2 of [Jones&Lins, 96]
The Mark-Sweep Collector
Compute reachable nodes: Mark
– tracing garbage collector
Free not reachable nodes: Sweep
Run when out of memory: Allocation
First used with LISP [McCarthy, 1960]
Allocation
node* new() {
    if (free_pool is empty)
        mark_sweep();
    return allocate();
}
The Garbage Collector
void mark_sweep() {
    for (r in roots)
        mark(r);
    /* all live nodes marked */
    …
}
Recursive Marking
void mark(node* n) {
    if (!is_marked(n)) {
        set_mark(n);
        for (m in children(n))
            mark(m);
    }
}
After the call: nodes reachable from n marked
i-th recursion: nodes on path with length i marked
The Garbage Collector
void mark_sweep() {
    for (r in roots)
        mark(r);
    sweep();
    /* all nodes on heap live and not marked */
    …
}
Eager Sweep
void sweep() {
    node* n = heap_bottom;
    while (n < heap_top) {
        if (is_marked(n)) clear_mark(n);       /* live: reset mark for next collection */
        else free(n);                          /* garbage: return node to free pool */
        n = (node*) ((char*) n + sizeof(*n));  /* advance by one node size in bytes */
    }
}
The Garbage Collector
void mark_sweep() {
    for (r in roots)
        mark(r);
    sweep();
    if (free_pool is empty)
        abort("Memory exhausted");
}
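The pieces above can be put together into a small runnable collector. This is a sketch under simplifying assumptions that are not in the slides: a fixed heap of eight equal-sized nodes, at most two children per node, an explicit roots array standing in for registers, statics and the stack, and a free list threaded through child[0]. The names HEAP_NODES, release, heap_init, new_node and free_nodes are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define HEAP_NODES   8
#define MAX_CHILDREN 2
#define MAX_ROOTS    4

typedef struct node {
    struct node *child[MAX_CHILDREN];
    bool marked;
} node;

static node heap[HEAP_NODES];  /* contiguous heap of equal-sized nodes */
static node *free_pool;        /* free list, linked through child[0] */
node *roots[MAX_ROOTS];        /* root set, maintained by the mutator */

static void release(node *n) { /* plays the role of free(n) in sweep */
    n->child[0] = free_pool;
    free_pool = n;
}

void heap_init(void) {
    free_pool = NULL;
    for (size_t i = 0; i < HEAP_NODES; i++) {
        heap[i].marked = false;
        release(&heap[i]);
    }
    for (size_t i = 0; i < MAX_ROOTS; i++)
        roots[i] = NULL;
}

static void mark(node *n) {
    if (n == NULL || n->marked)
        return;
    n->marked = true;
    for (int i = 0; i < MAX_CHILDREN; i++)
        mark(n->child[i]);
}

static void sweep(void) {
    free_pool = NULL;              /* rebuild the free list from scratch */
    for (node *n = heap; n < heap + HEAP_NODES; n++) {
        if (n->marked)
            n->marked = false;     /* live: clear mark for the next cycle */
        else
            release(n);            /* garbage: back to the free pool */
    }
}

static void mark_sweep(void) {
    for (size_t i = 0; i < MAX_ROOTS; i++)
        mark(roots[i]);
    sweep();
    if (free_pool == NULL) {
        fprintf(stderr, "Memory exhausted\n");
        abort();
    }
}

/* Allocates a node with all child fields cleared; collects when needed. */
node *new_node(void) {
    if (free_pool == NULL)
        mark_sweep();
    node *n = free_pool;
    free_pool = n->child[0];
    for (int i = 0; i < MAX_CHILDREN; i++)
        n->child[i] = NULL;
    return n;
}

/* Number of nodes currently in the free pool. */
size_t free_nodes(void) {
    size_t count = 0;
    for (node *n = free_pool; n != NULL; n = n->child[0])
        count++;
    return count;
}
```

In this sketch a node is live only if it can be reached from roots[]; a pointer held only in a C local variable does not protect its node. After heap_init, allocating eight nodes empties the pool, and the ninth call to new_node triggers a collection that reclaims everything not reachable from roots[], including unreachable cycles.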
Assumptions
Nodes can be marked
Size of nodes known
Heap contiguous
Memory for recursion available
Child fields known!
Assumptions: Realistic
Nodes can be marked
Size of nodes known
Heap contiguous
Memory for recursion available
Child fields known
Assumptions: Conservative
Nodes can be marked
Size of nodes known
Heap contiguous
Memory for recursion available
Child fields known
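The assumption that memory for recursion is available matters because the recursive mark can nest as deeply as the longest path in the heap. One common way to bound that memory, not taken from the slides, is to mark with an explicit stack of fixed size. The sketch below assumes a node layout with at most two children; STACK_SIZE and mark_iterative are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 2
#define STACK_SIZE   64

typedef struct node {
    struct node *child[MAX_CHILDREN];
    bool marked;
} node;

/* Marks all nodes reachable from root using a bounded explicit stack.
   Returns false if the stack overflowed, in which case marking is
   incomplete and the caller needs a fallback. */
bool mark_iterative(node *root) {
    node *stack[STACK_SIZE];
    size_t top = 0;

    if (root == NULL || root->marked)
        return true;
    root->marked = true;
    stack[top++] = root;

    while (top > 0) {
        node *n = stack[--top];
        for (int i = 0; i < MAX_CHILDREN; i++) {
            node *m = n->child[i];
            if (m == NULL || m->marked)
                continue;
            if (top == STACK_SIZE)
                return false;     /* out of marking memory */
            m->marked = true;     /* mark on push, so each node is pushed once */
            stack[top++] = m;
        }
    }
    return true;
}
```

For a long singly linked chain the explicit stack never holds more than one entry, whereas the recursive version would need one call frame per node.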
Mark-Sweep Properties
Covers cycles and sharing
Time depends on
– live nodes (mark)
– live and garbage nodes (sweep)
Computation must be stopped
– non-interruptible stop/start collector
– long pause
Nodes remain unchanged (as not moved)
Heap remains fragmented
Software Engineering Issues
Design goal in SE:
• decompose systems
• in orthogonal components
Clashes with letting each component do its memory management
• liveness is global property
• leads to “local leaks”
• lacking power of modern gc methods
Typical Cost
Early systems (LISP)
up to 40% [Steele, 75] [Gabriel, 85]
• “garbage collection is expensive” myth
Well engineered system of today
10% of entire runtime [Wilson, 94]
Areas of Usage
Programming languages and systems
– Java, C#, Smalltalk, …
– SML, Lisp, Scheme, Prolog, …
– Perl, Python, PHP, JavaScript
– Modula 3, Microsoft .NET
Extensions
– C, C++ (Conservative)
Other systems
– Adobe Photoshop
– Unix filesystem
– Many others in [Wilson, 1996]
Understanding Garbage Collection: Benefits
Programming garbage collection
– programming systems
– operating systems
Understand systems with garbage collection (e.g. Java)
– memory requirements of programs
– performance aspects of programs
– interfacing with garbage collection (finalization)
References
Garbage Collection. Richard Jones and Rafael Lins, John Wiley & Sons, 1996.
Uniprocessor garbage collection techniques. Paul R. Wilson, ACM Computing Surveys. To appear.
• Extended version of IWMM 92, St. Malo.