GARBAGE COLLECTION IN AN UNCOOPERATIVE ENVIRONMENT
Hans-Juergen Boehm, Computer Science Dept., Rice University, Houston
Mark Weiser, Xerox Corporation, Palo Alto
Presented by Srilakshmi Swati Pendyala
Outline
Introduction
  Garbage Collection in Different Languages
Problem Domain
  Need for conservative garbage collection in uncooperative environments
Overview of the proposed Garbage Collector
Use of the proposed GC as a debugging tool
Implementation
Results
Conclusion
Introduction
Garbage Collection in Different Languages
  Java (illustration: http://www.folgmann.com/en/gc.html)
  .NET, VB, C#: mark and sweep, generational
  Perl, Python: reference counting
  C, C++: no garbage collection; managed options available
  Ada, Modula-3: manual and automated garbage collection
Introduction
JAVA, .NET etc.
  Automatic garbage collection
  No memory management effort for the programmer
  At run time, the program must tell the GC which memory objects are still in use
C, C++ etc.
  The program must "free" the allocated memory
  Prone to memory leaks etc.
Both cases lead to additional effort from the program/compiler.
GC affects the performance of the program.
Better performance can be achieved (in some cases) when the program doesn't worry about GC at all.
What is the need to avoid cooperation?
Programmers don't want to pay for GC unless needed
Disadvantages of tagging integers
  Reduces the number of available bits
  Makes it difficult to manipulate the standard machine representation of data
  Requires interfacing routines
To implement specific programming languages such as Russell
To enable garbage collection in conjunction with C, Pascal etc.
It is difficult to design compilers that always preserve garbage collection invariants
Need for a garbage collector that expects less from the program/compiler
Uncooperative Environment
Program/compiler does not provide information to recognize pointers
  Every register/word is a potential pointer
Not all storage reachable from the stack, registers etc. is needed by the program
  Compiled code may fail to destroy references (for performance reasons or because of bugs)
  Particular run-time representations may involve unnecessary references not intended by the programmer
Difficult to tell if an object is actually required by the program
  Can lead to program failure if necessary objects are deleted
Need for CONSERVATIVE Garbage Collection
Imagine doing a mark and sweep GC, but not knowing for sure if a cell has a pointer in it or some other data.
If it looks like a pointer (that is, is a valid word-aligned address within heap memory bounds), assume that it IS a pointer, and trace that and other pointers in that record too.
Any heap data that is not marked in this way is garbage and can be collected. (There are no pointers to it.)
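The idea above can be sketched in Python over a simulated memory. This is only an illustration under stated assumptions: the heap is modeled as a dict from addresses to lists of words, and WORD, HEAP_START, HEAP_END, looks_like_pointer and conservative_mark are hypothetical names, not part of the collector described in the slides.

```python
WORD = 4                                  # assumed word size in bytes
HEAP_START, HEAP_END = 0x1000, 0x2000     # assumed heap bounds

def looks_like_pointer(word, heap):
    """Treat a word as a pointer if it is a word-aligned address inside the
    heap bounds that names the start of an allocated record (assumption)."""
    return (isinstance(word, int)
            and HEAP_START <= word < HEAP_END
            and word % WORD == 0
            and word in heap)

def conservative_mark(roots, heap):
    """Return the set of record addresses reachable from the root words."""
    marked = set()
    pending = [w for w in roots if looks_like_pointer(w, heap)]
    while pending:
        addr = pending.pop()
        if addr in marked:
            continue
        marked.add(addr)
        # Any word of the record may be a pointer, so trace all of them.
        pending.extend(w for w in heap[addr] if looks_like_pointer(w, heap))
    return marked

heap = {
    0x1000: [0x1010, 7],   # record holding a real pointer to 0x1010
    0x1010: [42, 0],       # plain data
    0x1020: [1, 2],        # unreachable: garbage
    0x1030: [3, 4],        # unreachable, but kept alive by a false pointer
}
# 0x1030 below is an integer that merely looks like a heap address.
roots = [0x1000, 0x1030, 5]
marked = conservative_mark(roots, heap)
garbage = sorted(set(heap) - marked)      # only 0x1020 can be collected
```

The record at 0x1030 is retained even though the program never uses it as a pointer, which is the cost of being conservative.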
Conservative Garbage Collection
Discussion
Is conservative garbage collection needed in cooperative systems?
Disadvantages of conservative garbage collection?
  Some amount of inaccessible memory is not reclaimed.
How can we reduce the memory lost because of conservative garbage collection?
  Better checks to detect false pointers
How does the Garbage Collector work?
Uses a mark-sweep, stop-the-world garbage collection algorithm
Procedure:
  Scan all objects referenced directly by pointer variables (roots) from the stack and registers
  Verify that pointers are actually pointing to intended objects (validity check) and mark the objects referenced by validated pointers
  Mark objects directly reachable from newly marked objects.
  Finally, identify unmarked objects and free them (sweep), e.g. put them on free lists and reuse them to satisfy allocation requests.
Objects are not moved.
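The procedure above can be sketched as a small simulation. It is a minimal sketch, not the collector's implementation: objects are slots in a Python list, a "pointer" is a slot index, and the Heap class with its allocate, is_valid and collect methods is a hypothetical name chosen for this example.

```python
class Heap:
    def __init__(self, n_objects, fields_per_object=2):
        # Object i lives at fixed index i and is never moved.
        self.n_fields = fields_per_object
        self.fields = [[None] * fields_per_object for _ in range(n_objects)]
        self.allocated = [False] * n_objects
        self.marked = [False] * n_objects
        self.free_list = list(range(n_objects))

    def allocate(self):
        if not self.free_list:
            raise MemoryError("heap exhausted")
        i = self.free_list.pop()
        self.allocated[i] = True
        self.fields[i] = [None] * self.n_fields
        return i

    def is_valid(self, word):
        # Validity check: the word must name an allocated object.
        return (isinstance(word, int)
                and 0 <= word < len(self.fields)
                and self.allocated[word])

    def collect(self, roots):
        # Mark phase: start from validated roots, then follow the words
        # stored in each newly marked object.
        self.marked = [False] * len(self.fields)
        stack = [w for w in roots if self.is_valid(w)]
        while stack:
            i = stack.pop()
            if self.marked[i]:
                continue
            self.marked[i] = True
            stack.extend(w for w in self.fields[i] if self.is_valid(w))
        # Sweep phase: unmarked allocated objects go back on the free list.
        freed = []
        for i in range(len(self.fields)):
            if self.allocated[i] and not self.marked[i]:
                self.allocated[i] = False
                self.free_list.append(i)
                freed.append(i)
        return freed

heap = Heap(4)
a, b, c = heap.allocate(), heap.allocate(), heap.allocate()
heap.fields[a][0] = b          # a points to b
freed = heap.collect([a])      # c is unreachable and gets swept
reused = heap.allocate()       # the freed slot satisfies the next request
```

Because the sweep only relinks free slots, surviving objects keep their addresses, so a misidentified pointer never has to be updated.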
Mark/Sweep illustration
Stack w/ pointer variables
Mark/Sweep illustration (2)
Stack w/ pointer variables
Allocator design
Allocation scheme obtains “chunks” of memory.
Chunks are always multiples of 4k in size.
Separate free lists for each object size.
Characteristics:
  No per-object space overhead (except mark bits)
  Partial sweeps are possible.
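A minimal sketch of such an allocator, under simplifying assumptions: every chunk is exactly 4k (the slides allow multiples of 4k), addresses are plain integers, and the per-chunk object size is kept in a side table rather than in a header inside the chunk. ChunkAllocator and its method names are hypothetical.

```python
CHUNK_SIZE = 4096   # from the slides: chunks are multiples of 4k

class ChunkAllocator:
    def __init__(self, heap_start=0x10000):
        # heap_start is assumed to be a multiple of CHUNK_SIZE.
        self.next_chunk = heap_start
        self.free_lists = {}          # object size -> list of free addresses
        self.chunk_object_size = {}   # chunk start -> size of its objects

    def _new_chunk(self, size):
        start = self.next_chunk
        self.next_chunk += CHUNK_SIZE
        self.chunk_object_size[start] = size
        # Carve the chunk into equal-sized slots; the size is stored once
        # per chunk, so individual objects carry no header.
        slots = [start + i * size for i in range(CHUNK_SIZE // size)]
        self.free_lists.setdefault(size, []).extend(reversed(slots))

    def allocate(self, size):
        if not 0 < size <= CHUNK_SIZE:
            raise ValueError("size must be between 1 and CHUNK_SIZE")
        if not self.free_lists.get(size):
            self._new_chunk(size)
        return self.free_lists[size].pop()

    def release(self, addr):
        # The object size is recovered from the chunk, not from the object.
        chunk = addr - addr % CHUNK_SIZE
        size = self.chunk_object_size[chunk]
        self.free_lists[size].append(addr)

alloc = ChunkAllocator()
a = alloc.allocate(16)   # first 16-byte slot of a new chunk
b = alloc.allocate(16)   # next slot in the same chunk
c = alloc.allocate(64)   # a different size gets its own chunk
alloc.release(a)
d = alloc.allocate(16)   # reuses the slot that was just released
```

Keeping one object size per chunk is what lets a collector find object boundaries from an address alone.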
Heap layout
[Figure: heap layout; labels: Freelists, Heap Data, 4k size chunks]
Data Structure for Chunks
A list of allocated chunks contains pointers to the beginning of each chunk
Contents of a chunk C:
  Size of objects in the chunk
  A pointer to the entry for C in the list of allocated chunks
  An area reserved for mark bits corresponding to the objects in the chunk
  Data objects
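The chunk contents listed above could be modeled roughly as follows. This is a sketch under assumptions: the back pointer is represented as an index into a Python list, the mark bits are packed into one integer, and ChunkHeader and register_chunk are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class ChunkHeader:
    object_size: int     # size of objects in the chunk
    list_index: int      # entry for this chunk in the list of allocated chunks
    mark_bits: int = 0   # one bit per object slot in the chunk

    def set_mark(self, slot):
        self.mark_bits |= 1 << slot

    def is_marked(self, slot):
        return bool((self.mark_bits >> slot) & 1)

    def clear_marks(self):
        self.mark_bits = 0

def register_chunk(start, object_size, allocated_chunks, headers):
    """Append the chunk to the list of allocated chunks and record a header
    whose back pointer refers to that new entry."""
    header = ChunkHeader(object_size, len(allocated_chunks))
    allocated_chunks.append(start)
    headers[start] = header
    return header

allocated_chunks, headers = [], {}
h = register_chunk(0x10000, 16, allocated_chunks, headers)
h.set_mark(3)   # mark the fourth object in the chunk
```

The list entry and the header point at each other, which is what a later validity check can exploit.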
Is it better than “tagging” integers ?
Finding Roots & Pointers
Possible roots: registers, stack, static areas
No cooperation from compiler
  treat every word as a potential pointer
  ignore interior pointers (standard)
  prefer marking from false pointers over ignoring valid pointers
Conservative Pointer Identification: given word p,
  does p refer to the collected heap?
  does it point into a heap block allocated by the collector?
  does it point to the beginning of an object in that block?
  if yes, mark the object in the block header and push the object onto the mark stack
Sweep: if a chunk is completely empty, return it to the chunk allocator
Pointer Validity Check
Goal: to minimize marking from false pointers
The pointer "p" should refer to an address within the heap address range for it to correspond to an object
If it does, the pointer stored in the header of the chunk containing "p" should refer to an entry in the list of allocated chunks, and that entry should point back to the chunk
The offset of the supposed object from the chunk header should be a multiple of the object size given by the chunk header, and the object should end within the chunk
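The three checks can be sketched as one function over a simulated heap. The details are assumptions made for illustration: chunks are 4k, headers are modeled as dicts keyed by chunk start, HEADER_SIZE is an invented value, and is_valid_pointer is a hypothetical name.

```python
CHUNK_SIZE = 4096
HEADER_SIZE = 64   # assumed space taken by the header at the start of a chunk

def is_valid_pointer(p, heap_start, heap_end, headers, allocated_chunks):
    # Check 1: p must lie in the heap address range.
    if not (heap_start <= p < heap_end):
        return False
    # Check 2: the header's back pointer must name a list entry that
    # points back to this chunk.
    chunk = p - (p - heap_start) % CHUNK_SIZE
    header = headers.get(chunk)
    if header is None:
        return False
    index = header["list_index"]
    if not (0 <= index < len(allocated_chunks)):
        return False
    if allocated_chunks[index] != chunk:
        return False
    # Check 3: the offset must be a multiple of the object size, and the
    # object must end within the chunk.
    size = header["object_size"]
    offset = p - (chunk + HEADER_SIZE)
    if offset < 0 or offset % size != 0:
        return False
    return p + size <= chunk + CHUNK_SIZE

heap_start, heap_end = 0x10000, 0x12000
allocated_chunks = [0x10000]
headers = {
    0x10000: {"object_size": 16, "list_index": 0},
    # Stale header: its list entry belongs to a different chunk.
    0x11000: {"object_size": 16, "list_index": 0},
}
args = (heap_start, heap_end, headers, allocated_chunks)
ok = is_valid_pointer(0x10000 + HEADER_SIZE + 32, *args)
misaligned = is_valid_pointer(0x10000 + HEADER_SIZE + 33, *args)
stale = is_valid_pointer(0x11000 + HEADER_SIZE, *args)
outside = is_valid_pointer(0x20000, *args)
```

Each check is cheap, and together they reject most integers that only happen to fall inside the heap range.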
Garbage Collector as a Debugging Tool
Use the GC to identify allocated memory that is no longer needed by the program but has not yet been freed by it.
Use a tracer to track memory leaks back to the subroutine responsible for them.
Procedure: an allocation-and-free tracer.
  The stack of subroutine names is recorded with every call to "malloc".
  Storage is marked as freed when "free" is called.
  When the collector runs, storage that has no pointers to it and was never explicitly deallocated with a "free" call is a likely storage leak.
Running the collector together with the tracer can find most of the storage that is left unmarked by the collector but was never explicitly freed.
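The tracer procedure can be sketched as below. This is an illustration only: the set of marked addresses is assumed to come from a collector run, call stacks are passed in explicitly as lists of subroutine names, and AllocationTracer and its methods are hypothetical names.

```python
class AllocationTracer:
    def __init__(self):
        self.next_addr = 1
        self.records = {}   # address -> {"stack": names, "freed": bool}

    def malloc(self, call_stack):
        # Record the subroutine names active at the time of the call.
        addr = self.next_addr
        self.next_addr += 1
        self.records[addr] = {"stack": tuple(call_stack), "freed": False}
        return addr

    def free(self, addr):
        self.records[addr]["freed"] = True

    def likely_leaks(self, marked):
        """Storage the collector left unmarked (no pointers to it) that was
        never explicitly freed, with the call stack that allocated it."""
        return {addr: rec["stack"]
                for addr, rec in self.records.items()
                if addr not in marked and not rec["freed"]}

tracer = AllocationTracer()
a = tracer.malloc(["main", "load"])
b = tracer.malloc(["main", "parse"])
c = tracer.malloc(["main", "parse", "tmp_buf"])
tracer.free(b)
marked = {a}                          # assumed result of a collector run
leaks = tracer.likely_leaks(marked)   # only c: unreachable and never freed
```

The recorded stack points at the subroutine that allocated the leaked storage, which is where the missing "free" call is most likely to belong.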
Experimental Results
Mark phase of Russell collector took 1.9 seconds per megabyte of accessible memory in the heap. Sweep phase took 0.4 seconds per megabyte.
Garbage Collection was added to TimberWolf and SDI.
The systems were re-linked so that calls to Unix allocation routines instead called the allocator.
SunView presented problems because of dynamic allocated memory remapping and ‘notifier’.
Programming styles involving disguised pointers will not work with the collector method.
Use of the proposed GC as debugging tool has also been demonstrated on SunView system.
Conclusions
GC is effective for traditional imperative languages with minimum cooperation from the program/compiler
Realistic alternative to explicit memory management for most applications
May not be suitable for real-time applications
No big constraints on coding style, except the hidden pointer problem
GC'ing allocators are competitive even with code not written for GC
The same GC can be used as a debugging tool for programs that do manual garbage collection
An implementation of this garbage collector can be downloaded online