oldest-first garbage collection - university of new mexicodarko/public/files/oldest-first... ·...

Oldest-First Garbage CollectionComputer Science Technical Report 98-81

Darko Stefanovic J. Eliot B. Moss Kathryn S. McKinleyDepartment of Computer Science

University of Massachusetts

Revised, April 1998

Note. Some of the information in this document is

superseded by later results. In particular, we now

view the oldest-first collection described in this re-

port as a special case of renewal-age oldest-first

collection. We also believe that excess promotion of

objects can be a significant factor for performance

in practice, but it is not modelled in this study. This

circumstance diminishes the utility of mathematical

analyses and simulations based solely on object life-

times. April 1998.

Abstract. An oldest-first generational garbage col-

lector leaves intact the most recently allocated object

space, and instead collects the remaining, older ob-

jects. Because these older objects have had more

time to die, an oldest-first copying collector will

generally do less copying than a traditional gen-

erational collector (which operates youngest-first),

a non-generational collector, and even Clinger and

Hansen’s non-predictive collector (which does wait a

little while for objects to die). To explore the perfor-

mance of oldest-first collection, we present a math-

ematical analysis, simulation results for a variety of

mature object lifetime distributions, and simulation

results for mature object lifetimes drawn from real

programs and for real mature object traces. These

results demonstrate that oldest-first collection does

perform significantly better than youngest-first or

non-generational collection for mature objects. Al-

though some previous work pointed in this direction,

it provided very little evidence of our conclusion.

We also find that oldest-first collection works well

for lifetime distributions that satisfy the generational

hypothesis, which suggests we should also consider

oldest-first collection in the young object space.

1 Introduction

Garbage collection, automatic dynamic memory manage-ment, frees programmers from explicitly managing dataallocation and reclamation. Its software engineering ben-efits are well known, but its influence on performance isa subject of debate. To date, the best-performing garbagecollectors in wide use are generational copying collectors.According to common wisdom, these collectors rely onthe fact that the youngest objects die quickly, and concen-trate their collection efforts on them. For longer-lived ob-jects, there is little evidence that the younger ones die morequickly than the older ones. Nevertheless, we show that,if collection cost is determined by object lifetimes, then a

non-standard generational “oldest-first” collector can effi-ciently collect these longer-lived, mature objects, and ingeneral, objects that do not die quickly.

Figure 1 illustrates a collector that divides objects intoyoung and old, using a traditional generational collectorfor the young space, and a mature space collector. Thecollector promotes objects that survive the oldest genera-tion of young space into the mature space. In this paper,we consider three alternative organizations for the maturespace collector: (1) non-generational, (2) youngest-first,and (3) oldest-first. The non-generational collector simplycollects the entire space every time it fills up. It serves asa base line for comparison. The youngest-first collector isa traditional two-generation copying collector. It collectsthe young objects every time the heap fills up. The oldest-first collector instead collects the old objects every timethe heap fills up.

To explore these alternatives, we study a variety of ob-ject allocation traces which exhibit different lifetime dis-tributions. We analyze the three collectors mathematicallyand through simulation using widely different object life-time distributions, and show that the oldest-first collectoris superior to the others. We take the lifetime traces formature objects from a number of analytical distributionsand distributions built from actual programs, and we sim-ulate the collectors on actual mature object traces. We arenot aware of any other work that reports accurate objectlifetime information for mature objects.

The remainder of the paper is organized as follows.We first discuss related work that prompted our effort.Section 3 then defines the object lifetime terminology.Section 4 provides a simply motivating example for whyoldest-first should perform well. We then present a set ofanalytical object lifetime distributions we use to study thethree collectors in Section 5. The remaining sections de-scribe collection space and time costs, and provide mathe-matical and simulation analyses of these costs for the threecollectors, including their behavior on actual mature objecttraces.

2 Related work

The primary related work is a recent study by Clingerand Hansen [Clinger and Hansen, 1997]. They consid-ered the behavior of heaps under the exponential distri-bution (radioactive decay model), and proposed a non-predictive collector. Rather than collecting the youngestdata like a traditional generational collector, the non-predictive collector processes theyoungestportion of the

nursery and young space collector mature space collectornew objects mature objects

Figure 1: Structure of collection.

older generation, so it differs from our oldest-first collec-tor, which processes theoldestportion. Because oldest-first copies the oldest portion it will do less copying thannon-generational, youngest-first, or Clinger and Hansen’snon-predictive collector, forany object survival function.Clinger and Hansen presented their results as being con-fined to the exponential distribution.

Because the analysis is simpler, Clinger and Hansenonly explored values ofg < 0:5, whereg is the fractionof the heapnot collected at the next collection. We ex-tend the analysis to the full range 0< g< 1 and find thatvalues ofg close to 1 perform well. Other ways in whichwe go beyond Clinger and Hansen’s work are: we ana-lyze youngest-first collection; we consider other analyticaldistributions; we simulate collector behavior with matureobject distributions determined from actual traces; and wesimulate collector behavior on actual mature object traces.

There have been a number of papers about ob-ject lifetimes and the generational hypotheses;Clinger and Hansen offer a good overview of these[Clinger and Hansen, 1997, Section 9 (pp. 106–107)]. Wecould not find any recent studies of mature object lifetimedistributions that go beyond anecdotal reports, with thesignal exception of Hayes’ data [Hayes, 1993].

One of us has worked on collection techniques di-rected at mature objects, the Mature Object Space(MOS) collector, also called the Train Algorithm[Hudson and Moss, 1992]. The objective of that work,though, is not so much the minimization of total garbagecollection cost as avoidance of disruptive pauses. Selig-mann and Grarup [Seligmann and Grarup, 1995] imple-mented MOS and found it to work quite well. It bothkeeps pauses short and keeps heap usage very close to theamount of live data. What makes MOS relevant here is thatif we ignore the fact that it will tend to reorganize objectsaccording to their reachability, it is an oldest-first collectorfor mature objects. Thus the present work provides addi-tional support for the MOS approach.

3 Background and definitions

There is no consensus on terminology for garbage collec-tion analysis, so we define the terms as we use them. Thetimebetween two events as seen by a collector is expressedas the amount of data allocated in between the events andmanaged by that collector. For a mature space collector,the time reflects the amount of data allocatedinto the ma-ture space, i.e., promoted out of the young space. Thelifetime of an object is the time elapsed between the ob-ject’s allocation and when the object becomes unreach-able. Given a sequence of new or mature objects, we char-acterize some statistical properties of the sequence usingsurvival-related functions.

Following standard practice [Cox and Oakes, 1984,Elandt-Johnson and Johnson, 1980], we describe the dis-tribution of lifetimes using thesurvivor function, theden-sity function, and themortality (or hazard) function. If weview the object lifetime as a random variableT, then thesurvivor function issT(t) =℘fT > tg, and it expressesthe fraction of original allocation that is still live afteraninterval t. It is a monotone non-increasing function. Thedensity function isfT(t) = �s0T(t), and it expresses thedistribution of object lifetimes. The mortality function ismT(t) =

fT(t)sT(t)

= �

ddt logsT(t), and it expresses the age-

specific death rate. (SubscriptsT are dropped in the fol-lowing.)

4 Why oldest-first is better: a simpleanalysis

Consider an arbitrary survivor functions(t), as illustratedin Figure 2. The only certain property ofs(t) is that it is amonotone non-increasing function. Suppose that spaceVis available for allocating new objects, and that followinga period of allocation into this space we collect the live ob-jects, copying them elsewhere. The volume of objects col-lected is

RV0 s(t)dt, or the area under the curves(t). Col-

lections occur at regular intervals of lengthV, so the copy-

ing cost of collection is proportional toRV

0 s(t)dtV . If instead

we look at only one half of the allocated space, namely

3

��

��

0 tV/2 V0

1

s(t)

Figure 2: Fundamentals of collection: survivor function.

that allocated earlier, i.e, with greater current age, thenthe volume collected is

RVV=2s(t)dt, or the area hatched

in the figure. We can implement this strategy by collect-ing twice as frequently, and each time collecting not theareaV=2 just allocated, but the one allocated in the previ-ous cycle. The copying cost of collection is then propor-

tional to2RV

V=2 s(t)dtV . Now, becauses(t) is non-increasing, it

will always be the case thatRVV=2s(t)dt 6

RV=20 s(t)dt, and

2RVV=2s(t)dt6

RV=20 s(t)dt+

RVV=2s(t)dt =

RV0 s(t)dt. This

difference is indicated by the shaded region on the top ofthe left half of the figure. Collecting twice as frequently aregion half as big, butwith postponement, makes the copy-ing cost lower (or in the worst case does not change it),regardlessof object lifetimes.

The foregoing reasoning can be generalized to configu-rations in which a portion smaller than one-half of the vol-ume is considered each time, and postponement is by morethan one collection cycle. These configurations promiseeven lower copying costs, and we use the termoldest-firstcollection for all of them. Hence, we can summarize bysaying that oldest-first collection copies less and is thusbetter than non-generational collection. By similar argu-ments it is also better than traditional two-generational col-lection, which repeatedly examines the most recently allo-cated region, and can thus be termed youngest-first. Wenow examine how much better it is, and refine the analysisto include also the cost of invoking the collector.

5 Object lifetimes

Generational collection has traditionally been justified andexplained using certain hypotheses about the distribution

of object lifetimesin real systems. Theweak generationalhypothesisstates that young objects die fast, i.e., that mor-tality m(t) is high for smallt. The strong generationalhypothesisstates furthermore that even among older ob-jects, the relatively younger ones die faster, i.e., thatm(t)is a monotone decreasing function. In the context of thesehypotheses, the exponential distribution,1 for which themortality is constant, is the boundary case between dis-tributions favorable to generational collection, withm(t)decreasing, and those unfavorable to it, withm(t) increas-ing [Baker, 1993].

We show that these assumptions are not intrinsicallynecessary for the efficient operation of a generational col-lector, but that the form of the distribution does affect thebest organization of the collector.

We were interested in analyzing distributions reflectiveof actual mature object lifetime distributions, but couldfind no previous studies of mature object lifetimes. There-fore we gathered object allocation traces from 25 long-running programs in Smalltalk and in SML/NJ, and ex-tracted traces of allocation into mature space. Some ofthe object lifetime distributions satisfy the generational hy-potheses and some do not. To cover the space of possi-ble shapes of the mortality function, we chose three rep-resentative analytical distributions: the exponential distri-bution (mortality is constant), the square-root-exponential(mortality decreases with age), and the square-exponential(mortality increases with age).2

Exponential survival: The survivor function iss(t) =e�λt , mortality ism(t) = λ, and the probability den-sity function is f (t) = λe�λt . The mortality is con-stant for all ages. For this distribution, we also haveanalytical solutions for collector performance. Weused them to validate the simulation procedure.

Square-root-exponential survival: The survivor func-

tion is s(t) = e�p

βt , m(t) =

q

βt

2 , and f (t) =q

βt e�

p

βt

2 . The mortality is a decreasing function ofage. This distribution satisfies the generational hy-potheses and corresponds well to the lifetime distri-butions of young objects in real systems.

Square-exponential (semi-normal) survival: The sur-vivor function is s(t) = e�(βt)2, m(t) = 2β2t, andf (t) = 2β2te�(βt)2. The mortality is an increasingfunction of age.

1Also known as the radioactive decay model [Baker, 1993,Clinger and Hansen, 1997].

2The questions of modelling the observed lifetime distributions us-ing particular analytical distributions are beyond the scope of this paper.

4

0

0.2

0.4

0.6

0.8

1

1 10 100 1000 10000 100000 1e+06

Sur

vivo

r fu

nctio

n

Age

ExponentialSqrt-exponential

Square-exponential

(a) Survivor function

0

2e-05

4e-05

6e-05

8e-05

0.0001

0.00012

0.00014

1 10 100 1000 10000 100000 1e+06

Mor

talit

y

Age


Square-exponential

(b) Mortality function

0

2e-05

4e-05

6e-05

8e-05

0.0001

0.00012

0.00014

1000 10000 100000

Pro

babi

lity

dens

ity

Age


Square-exponential

(c) Density function

Figure 3: Analytical distributions.

On opposite sides of the exponential distribution wehave: the square-root-exponential with mortality decreas-ing for all ages, so it should be ideal for youngest-first col-lection; and the square-exponential, with mortality startingat 0 and increasing for all ages. Figure 3 illustrates thesesurvivor functions, and the dramatic differences in theirmortality functions.

In the remainder of the paper, we develop and comparethe total cost of garbage collection for each scheme (non-generational, youngest-first, and oldest-first), for the an-alytical distributions, for real distributions, and for realtraces. We make the comparison of the collection costoverheads of different schemes fair by allowing eachscheme an equal amount of space overhead. We firstdevelop mathematical formulae describing the time andspace cost of collection (Section 6). We continue by de-scribing how each collector works in Section 7 and givinga summary of the mathematical analysis for the exponen-tial distribution in Section 8. We offer simulation resultsfor the analytical distributions and traces synthesized us-ing real distributions in Section 9.1, with results from ac-tual traces in Section 9.2. These results, based on objectlifetimes, show that for a broad range of operating con-ditions, which we expect to be representative both of realworkloads and of real collector implementation costs, theoldest-first organization is superior to the youngest-firstor-ganization. However, the results based on heap pointerstructure show a different and ambiguous picture, as wediscuss in Section 10; we present our final conclusions inSection 11.

6 Time and space overheads of collec-tion

Time. The main cost of copying garbage collection is thecost of copying live objects, whether they are copied else-where (pure copying) or compacted within the region (slid-ing). This cost partly depends on thenumberof objectscopied, but it is roughly proportional to the totalvolumecopied. Themark/cons ratio, defined asµ= Volume copied

Volume allocated,is a good measure of the copying cost and we use it tocompare copying costs. It represents the average num-ber of times an object is copied, and thus measures thetime “wasted” by the collector. If a program allocates anamountA, and the cost ofcopyingone word iscc timesthe cost ofallocatingone word, wherecc is a small num-ber, perhaps in the range 1-3, then the copying cost for theprogram isccµA.

In an earlier study, we measured the cost of one garbage

5

collection in a deployed system (SML/NJ v0.93) to beC = 5183+81:2w, whereC is the cost in cycles andw isthe amount copied in words, and the second term alwaysdominates. The first term, which we callCS, is the costof an invocation of the garbage collector. The number ofinvocations is inversely proportional to the amount of allo-cation between two collections, i.e., the amountF freed byeach collection. The total garbage collection startup costfor a program that allocates an amountA is CS

AF ; the to-

tal cost isccµA+CSAF ; and the cost per word allocated is

c= ccµ+CS1F .

In addition to these costs, which we model, there aresome costs that are beyond the scope of this study. In bothanalysis and simulation, we assume that perfect knowl-edge of object reachability is available. In practice, thatknowledge is derived by the collector itself by means oftracing. There is clearly an up-front cost of the tracingoperation, but in a copying collector that cost can be ac-counted for as part of the copying cost by suitably increas-ing cc. However, tracing in a collector with multiple heapregions depends on the availability ofremembered sets—records of pointers that cross region boundaries, in partic-ular, pointers into the region collected. The maintenancecost of these sets is paid in part by the collector and in partby the mutator. In addition, whenever a region is not col-lected, the collector assumes that all data in it are live, andthat all pointers emanating from that region and into theregion collected are root pointers. Thus, the collector as-sumes that more data are live than is actually the case, andthe copying cost is correspondingly higher than it wouldbe given perfect reachability knowledge.

Space. We useV to denote the volume available for theheap. It is possible in real collectors to vary this size toadapt to the workload, and since using more space tendsto reduce the time overhead, in our study we fixV to com-pare different schemes equitably by giving each the samespace constraints. We note that pure copying collectionhas higher peak space demands than sliding compaction.Here we consider only sliding collection.

When considering a heap in equilibrium, in which therate of allocation equals the rate of object demise, we usev to denote the steady-state amount of live data.3 The ra-tio L =

Vv expresses the space overhead of the collector

configuration: how many times the available heap space is

3The notion of a steady state is convenient for mathematical treat-ment of heap organizations. Moreover, real programs that do reach aquasi-steady state can run long and go through a large number of col-lections, so the effects of the fortuitous choice of collection instants arereduced.

gretaer than the amount of live data. The smallest possi-ble heap, with no space overhead, would haveV = v andL = 1. But for any collector organization, asL is madeto approach 1, the copying costµ tends to infinity. In or-der not to deal with such large values ofµ for small L, itis often convenient when visualizing and comparing copy-ing costs to normalize with respect to a non-generationalcollector, and we define therelative mark/cons ratioρ =

µ=µnon-generational. The value of this metric will turn out tobe justρ = µ(L�1), in other words it measures thetime-space product overhead.

We denote by f the fraction of the heap a collec-tion frees: f = F

V . The total cost of collection is thusc= ccµ+CS

1f Lv. Normalizing, we can use the costc0 =

µ+CS1

cc f Lv = µ+ Xf L . HereX =

CSccv captures the cost of

collection startup relative to the cost of copying.CS andcc

depend on garbage collector implementation, andv is spe-cific to the application program. The number of collectorinvocations differs in the collector organizations we con-sider, thus relative collector performance depends on thecost of collection startup. We therefore present compari-son results withX as parameter for a wide range of valuesof X. But we must then ask—what are reasonable val-ues ofX that one could encounter in practice? Considerour earlier measurements: the cost of copying one wordis about 80 cycles, but the cost of allocating one word isabout 2 cycles, socc � 40; the cost of startupCS is about5000 cycles. Then for a program with a smallish steady-state live amount of 50,000 words,X � 0:0025. Our com-parison results will show that for reasonable values ofXandL the best configuration of the oldest-first collector haslower cost than the best configuration of the youngest-firstcollector.

7 Collector details

We now describe each collector in more detail. These de-scriptions correspond to the way in which we simulatedthe collectors’ behavior.

Non-generational collector: The total space availableis V and after collectioni, εiV data is live andfiV is free.Initially the whole heap is empty:ε0 = 0 and f0 = 1. Eachcollection processes the entire spaceV. It does not mat-ter in which direction objects are compacted or in whichdirection they are allocated.

Youngest-first collector: The total space available isV. It is divided into a young generation of sizegV andan older generation of size(1�g)V. Allocation proceedsin the young generation until it is exhausted, which trig-

6

gers aminor collection. The minor collection processesthe young generation, copying survivors into the old gen-eration. Eventually the old generation fills up, and the nexttime the young generation fills, the entire heap is collected,with survivors put back into the old generation. We assumethat g is chosen such that the old generation exactly fillsup in the steady state. That is, if the old generation “over-fills”, we increaseV to hold the additional survivors, whilekeeping the young generation’ssizeconstant (implyinggis smaller than first assumed).

Oldest-first collector: We proceed circularly throughthe space, which we visualize here as proceeding from leftto right. Just after collectioni� 1, there is fi�1V spaceavailable for allocation (see Figure 4). We allocate fromleft to right until that space fills. We then choose to collectsome part of the area of very old objects just ahead (to theright) of the allocation pointer. In the steady state, thatspace is(1�g)V. In any case, we collect it, slidingεiVsurvivors to the left and creatingfiV free space. We setthe allocation pointer to the left end of that space and wecan resume allocation. Note that we will cycle through theentire space inK =

V(1�g)V collections. Note thatK need

not be an integer, so analysis is intricate.

8 Mathematical analysis for the caseof the exponential distribution (sum-mary)

We summarize here the results of the mathematical analy-sis of the three collectors described in the preceding sec-tion.

Non-generational collector: The mark/cons ra-tio is µnon-generational=

1L�1 regardless of distribution

[Appel, 1987, Jones and Lins, 1996].Youngest-first collector: Under exponential distribu-

tion, the number of minor collections per major collec-

tion cycle is ζ =

L(1�g)�e�Lg

1�e�Lg . The mark/cons ratio is

µyoungest-first=1�gζg . The full derivation is given in Ap-

pendix A.Oldest-first collector: Even under exponential distri-

bution the analysis is quite complicated, and the detailsof the general solution are given in Appendix B. In thesimplest case, whenK from previous section is an inte-ger, the mark/cons ratio isµoldest-first=

1�g� ff , where f

is found as the solution of the nonlinear equation:(1�K f )L

�

1�e�K f L�

= Ke�K f L�

ef L�1

�

.Under exponential distribution, and for any configura-

tion, µoldest-first< µnon-generational< µyoungest-first.

9 Simulation results

We produced synthetic object allocation traces, with objectlifetimes as independent identically distributed randomvariables. We normalized the steady-state live amount inthe heap, i.e., the expected value of lifetime, to 50,000,and used traces of at least 1,000,000 objects allocated, sothat a steady-state is clearly entered. For the three distribu-tions reported here, we have: for exponential,λ= 2�10�5,and generation of the synthetic trace lifetimesT by in-version from a uniformly-distributed random variateU ac-cording to formulaT =�

1λ lnU (see Ref. [Devroye, 1986,

pp. 27ff]); for square-root-exponential,β = 4 � 10�5, andT =

1β (lnU)2; for square-exponential,β = 1:772454�

10�5, andT =

1βp

� lnU.We built simulators for both youngest-first and oldest-

first collection, according to the description given in Sec-tion 7. Under the assumption that the allocation trace en-ters a steady state, the simulator reports the steady-statevalue of the live amount of data, of the amounts collected,mark/cons ratio, frequency of collection, etc. We validatedthe simulators against the mathematical model for the ex-ponential distribution, by calculating the mark/cons ratiofor each collector configurationL, g used in simulation.

The average relative error ise=q

∑( µsimulatedµcalculated

�1)2, where

the summation is over all configurations. For youngest-first collection,e= 0:0048 over 6630 configurations. Foroldest-first collection,e= 0:0076 over 10587 configura-tions. With agreement within 1%, we are satisfied that thesimulator is accurate.

9.1 Simulation results for analytical distribu-tions

We first present the results of the simulations of analyticaldistributions varying the configuration parametersL (heapspace overhead) andg (fraction considered young). Forbetter readability, we use the mark/cons ratioρ, normal-ized to the non-generational collector. We show in Fig-ure 5 the copying cost of youngest-first collection on theleft, and that of oldest-first collection on the right. Eachplot has the configuration parameterg along the horizon-tal axis, and includes several curves for selected values ofthe space overhead parameterL: 1.3, 1.5, 1.7, 2.0, 2.5,3.0, and 4.0. Note first that the scale on the vertical axisis different on the left and on the right: the curves foryoungest-first lie mostly above 1, and those for oldest-firstmostly below 1: youngest-first is mostly worse than non-generational, and oldest-first is mostly better. It is onlyfor the square-root-exponential distribution, and only for

7

εiV

εiV

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

allocation

allocation pointer

free area

new free area

collect

iVf

i-1V

+ = (1-g)ViV

f

f

area chosen for collection

Figure 4: Oldest-first collection.

small values ofL and ofg that the opposite is true. Second,the relative advantage of oldest-first over non-generationalcollection grows with more spaceL. The relative disad-vantage of youngest-first against non-generational collec-tion also grows with more spaceL. Third, for youngest-first collection the cost diminishes with increasingg. As gincreases, the numberζ of minor collections between ma-jor collections decreases, and the collector behaves morelike a non-generational one. Fourth, for oldest-first col-lection the cost has a downward trend with increasingg;it decreases monotonically for the exponential but non-monotonically for the two other distributions. (The case-analysis in Appendix B provides an intuition for this pat-tern.) Fifth, and quite surprising, the variance of costof youngest-first collection for the three distributions isnot great. This variance grows withL, but for a modestvalueL = 1:5 for example, the cost on the square-root-exponential distribution is at least 82% of the cost on theexponential distribution (achieved withg� 0:1), and thecost on the square-exponential distribution is at most 106%(with g� 0:16).

We see that the mark/cons ratio of a collector is heav-ily dependent on the configuration parameterg. Let usassume, however, that a collector is capable of adaptingto the workload by changing its configuration, within afixed amount of heap space. In other words, we are in-terested in the best choice ofg for a givenL. The op-timization is done with respect to the total cost measurec0 = µ+ X

f L . If both the youngest-first and the oldest-first collector achieve their best configuration, then we canfairly compare them, by asking which one has the lowercost for the same space overhead (measured byL) and thesame relative cost of collector invocation (measured byX).

For the three distributions, the answer is presented in Fig-ure 6. In the region indicated by shading, where the cost ofinvocationX is high, and in the region of extremely tightheaps whereL is close to 1, the youngest-first collectoris better (but the absolute performance of either schemeis poor). However, for reasonable values ofL, and rea-sonable values ofX, as discussed previously, the oldest-first collector is better. To see how much better, observethe contour lines drawn at 20% increments of the costratio

c0youngest-first

c0oldest-first. The advantage of the oldest-first collec-

tor increases with diminishingX and with increasingL.However, for the square-root-exponential distribution onthe left, the contours are widely spaced, which indicatesa shallow grade, and only modest improvement in the ad-vantage of the oldest-first collector. For the exponentialdistribution and even more so for the square-exponentialdistribution, this improvement is rapid. The break-evencontour also shifts a bit with the change in distribution.

9.2 Simulation results for real traces

In this section, we consider copying costs forreal programs. We instrumented three systems torecord object lifetimes: a Smalltalk virtual machine[Hosking et al., 1992], a custom version of the SML/NJcompiler [Stefanovic and Moss, 1994], and a Java virtualmachine. We use our language-independent garbagecollector toolkit [Hudson et al., 1991] to record objectallocation and to report the demise of objects at eachcollection. We configured the collector to collect veryfrequently (each 40,000 words of allocation for Smalltalk,and 125,000 words for SML) and to collect the wholeheap each time. This setup enables accurate (in relationto trace length) measurement of object lifetimes. We

8

0

0.5

1

1.5

2

2.5

0 0.2 0.4 0.6 0.8 1

rela

tive

mar

k/co

ns r

atio

rho

g

Square-root-exponential

L=1.3L=1.5L=1.7

L=2.0

L=2.5L=3.0

L=4.0

0

0.2

0.4

0.6

0.8

1

1.2

0 0.2 0.4 0.6 0.8 1

rela

tive

mar

k/co

ns r

atio

rho

g

Square-root-exponential

L=1.3L=1.5L=1.7L=2.0L=2.5L=3.0L=4.0

Youngest-first Oldest-first(A) Square-root-exponential distribution

0

0.5

1

1.5

2

2.5

3

3.5

0 0.2 0.4 0.6 0.8 1

rela

tive

mar

k/co

ns r

atio

rho

g

Exponential

L=1.3

L=1.7

L=2.0L=2.5

L=3.0

L=4.0

0

0.2

0.4

0.6

0.8

1

0 0.2 0.4 0.6 0.8 1

rela

tive

mar

k/co

ns r

atio

rho

g

Exponential

L=1.3L=1.5L=1.7

L=2.0

L=2.5L=3.0

L=4.0

Youngest-first Oldest-first(B) Exponential distribution

0

0.5

1

1.5

2

2.5

3

3.5

0 0.2 0.4 0.6 0.8 1

rela

tive

mar

k/co

ns r

atio

rho

g

Square-exponential

L=1.3L=1.5

L=1.7

L=2.0

L=2.5

L=3.0

L=4.0

0

0.2

0.4

0.6

0.8

1

0 0.2 0.4 0.6 0.8 1

rela

tive

mar

k/co

ns r

atio

rho

g

Square-exponential

L=1.3

L=1.5

L=1.7

L=2.0L=2.5L=3.0L=4.0

Youngest-first Oldest-first(C) Square-exponential distribution

Figure 5: Copying costs of different distributions.

9

1.03 1.2 1.41 1.65 1.94 2.27 2.66 3.12 3.65 4.28 5.01 5.870.0002

0.0004

0.0008

0.0016

0.0032

0.0064

0.0128

0.0256

0.0512

0.102

0.205

0.41

0.819

1.64

3.28

6.55

13.1

26.2

L

X

1.03 1.18 1.35 1.56 1.79 2.05 2.36 2.71 3.11 3.58 4.11 4.720.0002

0.0004

0.0008

0.0016

0.0032

0.0064

0.0128

0.0256

0.0512

0.102

0.205

0.41

0.819

1.64

3.28

6.55

13.1

26.2

L

X

1.03 1.18 1.36 1.56 1.79 2.06 2.36 2.72 3.12 3.58 4.120.0002

0.0004

0.0008

0.0016

0.0032

0.0064

0.0128

0.0256

0.0512

0.102

0.205

0.41

0.819

1.64

3.28

6.55

13.1

26.2

L

X

Square-root-exponential Exponential Square-exponential

Figure 6: Comparison of youngest-first and oldest-first collection.

obtained traces for a number of long-running benchmarkprograms and interactive sessions (17 in Smalltalk, 8 inSML, 19 in Java). The longest trace allocates 310 millionwords and comes from an hour-long Smalltalk session.

Since our emphasis is on collection in mature space,we used a filtering mechanism to extract subtraces of onlymature objects. In practice, mature objects will be thosethat are promoted out of a young-space collector. Al-though the objects promoted out of the young-space col-lector vary in age, depending on the construction of thecollector, it can safely be assumed for the purposes of ourevaluation that they are of the same age. This thresholdage can be chosen somewhat arbitrarily; we made severalchoices for each original object trace, making sure thatthe thresholds were beyond the first knee of the survivorfunction—beyond the region of very young objects withvery high mortality. 4 We then focused on those matureobject traces that approach a quasi-steady state. Here wereport on two SML programs, Tomcatv and Swim, whichare adaptations of two SPEC95 matrix calculation bench-marks, and one Smalltalk heap-intensive benchmark pro-gram, Tree-replace, which repeatedly replaces subtrees ofa large tree by newly allocated subtrees.

We show in Figure 7(a,b,c) the comparison of oldest-first and youngest-first collection for three real programtraces of mature space objects. Plots of their survivor func-tions are in Figure 7(e). The interpretation of the contourgraphs is as in the preceding section: oldest-first collec-

4It is not correct, however, simply to reject objects with lifetimesunder the threshold: the lifetimes of the remaining objects must betransformed to correspond to the allocation amounts as seen by maturespace.

tion is preferred in the lower region of the plot, for smallerX. However, recall thatX =

CSccv

. For real traces the valueof the live amountv in the quasi-steady state is known.Thus, with the suggested realistic values ofCS andcc, wehave actual values forX as indicated underneath the fig-ures. For these values ofX, oldest-first is better, regard-less ofL. Why then provide the plots for a range ofXvalues? CS and cc values are different among collectorimplementations, and soX could be somewhat higher orlower. Note, however, thatCS

ccwould have to be at least

1000 times higher than we have estimated for the operat-ing points to move up into the regions where youngest-firstis preferred.

We never expected simulations of real program tracesto produce smooth results like those for synthetic traces ofanalytical distributions; after all they do not enter a trulysteady state. Nevertheless, we thought the plots in the toprow of Figure 7 to be very jagged. We considered twopossible sources of the irregularity: first, that the distri-bution of lifetimes may be far from smooth (with sharppeaks in mortality at critical ages); and second, that thereis significant correlation between lifetimes of successivelyallocated objects. Both effects can be expected in a traceof any iterative algorithm.

To eliminate the second effect, we generated a synthetictrace having the same distribution of object lifetimes asthe real trace, but with lifetimes drawn independently. Weshow in Figure 7(d) the outcome for such a synthetic tracebased on the Tomcatv benchmark. The appearance is stillslightly jagged, so this effect must be caused by the dis-tribution itself. The contours change very little, but inthe relatively flat region at the top of the plot, the bound-

10

ary shifts somewhat downward. We cannot claim, how-ever, that the correspondence between the real trace andits synthetic version is close for all traces, and this ques-tion awaits further investigation.

10 Simulation results for real traceswith pointer structure

11 Discussion and conclusions

We found that oldest-first collection almost always out-performs youngest-first generational collection and non-generational collection. Now for the caveats! The reportedanalyses are for sliding compaction; pure copying may ex-hibit some differences. However, since oldest-first doesless copying, it presumably will also need less additionalspace for pure copying. Our comparisons are for a costmodel that includes copying and collector startup costs,but the model ignores remembered set maintenance andwrite-barrier issues, and further assumes that the memoryhierarchy affects all collectors equally.

Changing the relationships between the regions col-lected and not collected might have large effects on re-membered set sizes and their processing times. On theother hand, if locality of reference between objects is cor-related with original allocation time, then the effects mightnot be large. We observe that the oldest-first collector ef-fectively inserts new survivors between batches of old sur-vivors, so it maydecreaselocality. This subject demandsexperimental measurement, and so does cache behavior.

Our simulations assumed perfect knowledge of reach-ability, too, and actual generational collectors will tendto copy a bit more because they rely on the rememberedsets, which are an upper bound on live pointers into the re-gion to be collected, but are not necessarily precise. A re-lated concern is that straightforward oldest-first collectionmight fail to collect cycles of garbage. The Mature ObjectSpace collector [Hudson and Moss, 1992] addresses thatconcern, but necessarily reorganizes objects.

We also observe that real systems often have long-liveddata, not likely to be discarded, and special techniques areneeded to prevent copying such data repeatedly. An ex-ample are class files for heavily used language or libraryclasses in Java. On the other hand, classes dynamicallyloaded by a net browser should perhaps be subject to col-lection as the user changes attention to different tasks.

Another obvious area for future exploration is applyingthe oldest-first strategy to collection of young objects. Inany case, we conclude that oldest-first collection is quite

promising, and look forward to evaluating itin vivo.

References

[Appel, 1987] Appel, A. W. (1987). Garbage collectioncan be faster than stack allocation.Information Pro-cessing Letters, 25(4):275–279.

[Baker, 1993] Baker, H. G. (1993). ‘Infant Mortality’and generational garbage collection.SIGPLAN Notices,28(4):55–57.

[Clinger and Hansen, 1997] Clinger, W. D. and Hansen,L. T. (1997). Generational garbage collection and theradioactive decay model.SIGPLAN Notices, 32(5):97–108. Proceedings of the ACM SIGPLAN ’97 Confer-ence on Programming Language Design and Imple-mentation.

[Cox and Oakes, 1984] Cox, D. R. and Oakes, D. (1984).Analysis of Survival Data. Chapman and Hall, London.

[Devroye, 1986] Devroye, L. (1986).Non-Uniform Ran-dom Variate Generation. Springer-Verlag, New York.

[Elandt-Johnson and Johnson, 1980] Elandt-Johnson,R. C. and Johnson, N. L. (1980).Survival Models andData Analysis. Wiley, New York.

[Hayes, 1993] Hayes, B. (1993).Key Objects in GarbageCollection. PhD thesis, Stanford University, Stanford,California.

[Hosking et al., 1992] Hosking, A. L., Moss, J. E. B.,and Stefanovic, D. (1992). A comparative perfor-mance evaluation of write barrier implementations.In Proceedings of the Conference on Object-OrientedProgramming Systems, Languages, and Applications,pages 92–109, Vancouver, Canada.SIGPLAN Notices27, 10 (October 1992).

[Hudson and Moss, 1992] Hudson, R. L. and Moss, J.E. B. (1992). Incremental collection of mature ob-jects. In Bekkers, Y. and Cohen, J., editors,Interna-tional Workshop on Memory Management, number 637in Lecture Notes in Computer Science, pages 388–403,St. Malo, France. Springer-Verlag.

[Hudson et al., 1991] Hudson, R. L., Moss, J. E. B., Di-wan, A., and Weight, C. F. (1991). A language-independent garbage collector toolkit. Technical Report91-47, University of Massachusetts, Amherst.

11

1.38 1.49 1.75 1.97 2.22 2.43 2.82 3.03 3.53 3.96 4.550.0002

0.0004

0.0008

0.0016

0.0032

0.0064

0.0128

0.0256

0.0512

0.102

0.205

0.41

0.819

1.64

3.28

6.55

13.1

26.2

L

X

1.33 1.63 1.97 2.27 2.59 3.14 3.53 4.39 4.78 6.05 7.2 8.180.0002

0.0004

0.0008

0.0016

0.0032

0.0064

0.0128

0.0256

0.0512

0.102

0.205

0.41

0.819

1.64

3.28

6.55

13.1

26.2

L

X

1.06 1.11 1.15 1.19 1.24 1.29 1.34 1.4 1.45 1.51 1.57 1.64 1.70.0002

0.0004

0.0008

0.0016

0.0032

0.0064

0.0128

0.0256

0.0512

0.102

0.205

0.41

0.819

1.64

3.28

6.55

13.1

26.2

L

X

(a) SML SPEC101-Tomcatv(v� 13396,X � 0:01)

(b) SML SPEC102-Swim(v� 25094,X � 0:005)

(c) Smalltalk Tree-replace(v� 187966,X � 0:0007)

1.38 1.55 1.75 1.97 2.22 2.5 2.82 3.17 3.57 4.02 4.530.0002

0.0004

0.0008

0.0016

0.0032

0.0064

0.0128

0.0256

0.0512

0.102

0.205

0.41

0.819

1.64

3.28

6.55

13.1

26.2

L

X

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 10 100 1000 10000 100000 1e+06 1e+07

Sur

vivo

r fu

nctio

n

Age

SML SPEC101-TomcatvSML SPEC102-Swim

Smalltalk Tree-replace

(d) Synthetic trace with distribution as inSML SPEC101-Tomcatv

(e) Survivor function of the mature object lifetimedistribution

Figure 7: Comparison of youngest-first and oldest-first collection (real traces).

12

[Jones and Lins, 1996] Jones, R. and Lins, R. (1996).Garbage Collection: Algorithms for Automatic Dy-namic Memory Management. John Wiley, Chichester.

[Seligmann and Grarup, 1995] Seligmann, J. and Grarup,S. (1995). Incremental mature garbage collection usingthe train algorith. In Nierstras, O., editor,Proceedingsof 1995 European Conference on Object-Oriented Pro-gramming, Lecture Notes in Computer Science, Uni-versity of Aarhus. Springer-Verlag.

[Stefanovic and Moss, 1994] Stefanovic, D. and Moss, J.E. B. (1994). Characterisation of object behaviour inStandard ML of New Jersey. In1994 ACM Confer-ence on Lisp and Functional Programming, Orlando,Florida.

13

Appendix

A Copying cost in the youngest-first collector

Let the younger generation (0) occupygV and older generation (1) occupy(1�g)V.The exact time between two minor collections isτ0 = gV. The expected live amount in the younger generation at

collection isε0V =

R τ00 so(t)dt = 1

λ�

1�e�λgV�

. Live data from generation 0 are promoted into generation 1.Upon some numberζ�1 of minor collections, the older generation fills up. On the next minor collection it becomes

necessary to collect the older generation as well in a major collection. Letε1V be the expected live amount in bothgenerations at the time of major collection. The expected available space in generation 1 between collections isα1V =

(1�g)V � ε1V. The expected number of minor collections between two majorcollections isζ� 1 = α1ε0

. For thepurposes of analysis, we take the values of the free parameters g andL such thatζ is an integer. (It can be shown thatthis assumption is more than just a convenience for analysis.) At collection time, the younger generation contains anarea of sizeε1V that was last established live at timeζτ0 ago, andζ�1 areas of sizeε0V that were last established liveat times(ζ�1)τ0; : : : ;τ0 ago. Generation 0 also contains new live data, as at any generation 0 collection. Owing to thememoryless property of the object lifetime distribution, the expected amount of live data in generation 1 is then:

ε1V = ε1Vs0(ζτ0)+

ζ�1

∑i=1

ε0Vs0(iτ0)+ ε0V = ε1Vs0(ζτ0)+

ζ�1

∑i=0

ε0Vs0(iτ0) = ε1Ve�λζτ0+ ε0V

ζ�1

∑i=0

e�λiτ0

or,

ε1 = ε0∑ζ�1

i=0 e�λigV

1�e�λζgV=

ε0

1�e�λgV:

In one generation 1 cycle, the amount allocated isζgV, and the amount copied is(ζ�1)ε0V+ε1V, so the mark/consratio is

µ=(ζ�1)ε0V + ε1V

ζgV=

ε0V�

ζ�1+ 11�e�λgV

�

ζgV=

�

1�e�λgV�

�

ζ�1+ 11�e�λgV

�

λζgV=

�

1�e�Lg�

(ζ�1)+1

ζLg

On the other hand,

ζ�1=α1

ε0=

(1�g)ε0

�

ε1

ε0=

(1�g)Lλ

1λ�

1�e�λgV�

�

1

1�e�λgV=

L(1�g)�1

1�e�Lg :

Thus,

ζ =L(1�g)�e�Lg

1�e�Lg :

With a given factorL, for particular (integer) values ofζ, this equation can be solved numerically forg, and for suchpairsL;g, the preceding analysis will be valid. Simplifying further,

µ=1�g

ζg:

14

B Copying cost in the oldest-first collector

The analysis of the oldest-first collector is somewhat involved; we present it for the case of the exponential distribution,which allows us to simplify the presentation by appeal to thememoryless property of the distribution: each time anobject is copied in collection, it is in effect, reborn. (Forany other distribution, it is possible to formulate similarequations involving infinite sums of integrals of the survivor function.) Our analysis is exact, but the result is not inclosed form, as it is a solution of a nonlinear equation.

εV

��

��

��

��

region logically moved region collected

copymove

gV (1-g)V

fV gV

before

afterallocation

Figure 8: Operation of the oldest-first collector.

The analysis will be made easier if we view the collector as inFigure 8, and first ask what is in the region of size(1�g)V that is collected. The young region is of sizegV, and it therefore shifts by an amount(1�g)V on eachcollection. On each collection, an amountεV of collected data is placed to the left of the shifted data; then an amountfV of new data is allocated in the leftmost, free, region. This pair of areasfV andεV are moved rightwards by(1�g)V

on each successive collection. It takesN(g) =l

g1�g

m

collections for this pair to move entirely pastgV, into the region

collected. In other words, ifN is an integer, then inN collections this pair becomes precisely the region collected.However, ifN is not an integer, the analysis is more complicated. We first note that ifg 2

�

n�1n ;

nn+1

�

;n 2 N, thenN(g) = n.

With respect to the contents of the region collected, there are four possibilities to consider, illustrated in Figure 9.Lety= 1�N(1�g). Case 3 obtains wheny= 1�g, in other words whenN is an integer. Otherwise, case 1 obtains whenf > y, case 2 whenf < y, and case 4 whenf = y. In cases 1, 3, and 4, anεV region is wholly within the collectedregion, and thus older data from an entire sequence of rangesof allocation time will remain together. However, in case2, two parts of two differentεV regions lie in the collected region; thus it matters which data are in these parts, andwhich outside. In other words, it matters in which order the copied data are placed into the “to-space” on collection: ina sliding implementation, it is the original order of the data.

We proceed with the analysis for the exponential distribution of object lifetimes, which allows a simpler treatmentof case 2, thanks to the memoryless property of the distribution.

Case 1.The collected region contains parts of two “new” areas, the younger of size( f �y)V, and the older of sizeyV; and an area with survivors of a previous collection that happenedN fV ago. The live amount in the collected regionis

Z N fV

(N�1) fV+yVe�λtdt+

Z N fV+yV

N fVe�λtdt+ εVe�λN fV

=

1

λe�λ(N f+1�N(1�g))V

�

eλ fV�1

�

+ εVe�λN fV:

This live amount can be equated withεV, thus:

εV =

1

λe�λ(N f+1�N(1�g))V

�

eλ fV�1

�

+ εVe�λN fV:

15

εV

=0.11ε

��

��

��

��

��

��

��

��

��

��

Case 4

gV (1 - g) V

fV

g=0.63 f=0.26

εVεV

ε=0.11

��

��

��

��

��

��

��

��

��

��

��

��

��

��

Case 2

gV (1 - g) V

fV

g=0.65 f=0.25

εV

=0.096ε

��

��

��

��

��

��

��

��

��

��

��

��

Case 3

gV (1 - g) V

fV

f=0.24g=2/3

εV

��

��

��

��

��

��

��

��

��

��

��

��

Case 1

gV

fVfV

(1 - g) Vregion collected

f=0.31 ε=0.15g=0.54

Figure 9: Operation of the oldest-first collector: contentsof collected region (exponential distribution,L = 2).

UsingL =Vλ, we have:

εL�

1�e�N f L�= e�(N f+1�N(1�g))L �ef L

�1�

:

Finally, ε = 1�g� f , and the equation, to be solved numerically forf , givenL andg, is:

(1�g� f )L�

1�e�N f L�= e�(N f+1�N(1�g))L �ef L

�1�

:

Case 2.The collected region contains one “new” area of sizefV, and parts of two areas with survivors of previouscollections; one, of size(N(1�g)�g)V, from a collectionN fV ago, and the other, of sizeεV� (N(1�g)�g)V,from a collection(N+1) fV ago. The live amount in the collected region is

Z

(N+1) fV

N fVe�λtdt+(N(1�g)V�gV)e�λN fV

+(εV�N(1�g)V+gV)e�λ(N+1) fV

=

1

λ

�

e�λN fV�e�λ(N+1) fV

�

+(N(1�g)V�gV)e�λN fV+(εV�N(1�g)V+gV)e�λ(N+1) fV

:

Again, this live amount can be equated withεV:

εV =

1

λ

�


�

+(N(1�g)V�gV)e�λN fV+(εV�N(1�g)V+gV)e�λ(N+1) fV

:

UsingL =Vλ, we have:

εL = e�N f L(1+N(1�g)L�gL)+e�(N+1) f L

(�1+ εL�N(1�g)L+gL);

or:

εL�

1�e�(N+1) f L�

= (1+N(1�g)L�gL)e�N f L�1�e� f L�:

16

With ε = 1�g� f :

(1�g� f )L�

1�e�(N+1) f L�

= (1+N(1�g)L�gL)e�N f L�1�e� f L�:

Again, this equation must be solved numerically for any given L andg.Case 3.The fV region contains data with ages betweenN fV and(N+1) fV, and theεV region contains data that

were last collected(N+1) fV ago. The equation is then:

εV =

Z

(N+1) fV

N fVe�λtdt+ εVe�λ(N+1) fV

=

1

λ

�


�

+ εVe�λ(N+1) fV;

which, withL =Vλ, K = N+1= 11�g simplifies to:

(1�K f )L�

1�e�K f L�= Ke�K f L �ef L

�1�

:

This models a collector withK areas of equal size, which functions as a queue withK stations, similar to atrain ofRef. [Hudson and Moss, 1992].

Case 4.This is the case where the solutions for cases 1 and 2 coincide, at such pointsg?N that f = y. The equationgoverning these points is:

(N� (N+1)g?N)L�

1�e�N(1�N(1�g?N))L�

= e�(N+1)(1�N(1�g?N))L�

e(1�N(1�g?N))L�1

�

:

Once f is found, we computeε = 1�g� f , µ= εf , ρ = µ(L�1). Efficient computation first determines the values

of N of interest, corresponding to the interval�

n�1n ;

nn+1

�

, and for eachN andL finds the pointg?N, within this interval.To the left ofg?N, case 1 applies, and to the right ofg?N, case 2 applies.

As g! 1�

andL! ∞, the performance approaches that of an ideal object queue,µ= e�L.

17

oldest-first garbage collection - university of new mexicodarko/public/files/oldest-first... ·...

Documents