ulterior reference counting fast gc without the wait
Post on 06-Feb-2016
53 Views
Preview:
DESCRIPTION
TRANSCRIPT
Ulterior Reference CountingFast GC Without The Wait
Steve Blackburn – Kathryn McKinley
Presented by: Dimitris Prountzos
Slides adapted from presentation by Steve Blackburn
Outline
• Throughput-Responsiveness problem• Reference counting & optimizations• Ulterior in detail• BG-RC in action• Experimental evaluation• Conclusion
• GC and mutator share CPU
– Throughput: net GC/mutator ratio– Responsivness: length of GC pauses
GCmutator
CPU Utilization (time)
mutator poor responsiveness maximum pause
Throughput/Responsiveness Trade-off
The Ulterior approach• Match mechanisms to object demographics– Copying nursery (young space)
• Highly mutated, high mortality young objects• Ignores most mutations• GC time proportional to survivors, space efficient
– RC mature space• Low mutation, low mortality old objects• GC time proportional to mutations, space efficient
• Generalize deferred RC to heap objects– Defer fields of highly mutated objects & enumerate them quickly– Reference count only infrequently mutated fields
Pure Reference Counting• Tracks mutations: RCM(p)
RCM(p) generates a decrement and an increment for the before and after values of p:
RCM(p) RC(pbefore)--, RC(pafter)++
• If RC==0, Free
RC space
b1a1
Pure Reference Counting• Tracks mutations: RCM(p)
RCM(p) generates a decrement and an increment for the before and after values of p:
RCM(p) RC(pbefore)--, RC(pafter)++
• If RC==0, Free
RC space
b0a1
c1
Pure Reference Counting• Tracks mutations: RCM(p)
RCM(p) generates a decrement and an increment for the before and after values of p:
RCM(p) RC(pbefore)--, RC(pafter)++
• If RC==0, Free
RC space
b0a1
c1
Pure Reference Counting• Tracks mutations: RCM(p)
RCM(p) generates a decrement and an increment for the before and after values of p:
RCM(p) RC(pbefore)--, RC(pafter)++
• If RC==0, Free
RC space
a1
c1
RCM(p) for every mutation is very expensive
RC Optimizations
• Buffering: apply RC(p)--, RC(p)++ later
• Coalescing: apply RCM(p) only for the initial and final values of p (coalesce intermediate values):
{RCM(p), RCM(p1), ... RCM(pn)} RC(pinitial)--, RC(pfinal)++
• Deferral of RCM events
Deferred Reference CountingGoal: Ignore RCM(p) for stacks & registers
• Deferral of p– A mutation of p does not generate an RCM(p)– Correctness: • For all deferred p: RCR(p) at each GC
• Retain Event: RCR(p)– po temporarily retains o regardless of RC(o)• Deutsch/Bobrow use a Zero Count Table• Bacon et al. use a temporary increment
Classic DeferralIn deferral phase: Ignore RCM(p) for stacks &
registers
RC space
b1
a0
Stacks & Regs
Classic DeferralIgnore RCM(p) for stacks & registers
RC space
b0
a0
c1
Stacks & Regs
Breaks RC==0 Invariant
Classic Deferral (Bacon et al.)• Divide execution in epochs
• Store information in buffers• Root buffer (RB): Store 1st level objects • Increment buffer (IB): Store increments to 1st level objects• Decrement buffer (DB): Store decrements to 1st level objects
• At GC time do:– Look at RB and apply temporary increments to all objects there– Process IB of this epoch– Look at RB of previous epoch and apply decrements to all objects there– Process DB of previous epoch
• During DB processing recycle o if RC(o)=0• Avoid race conditions by
• Processing IB before DB• Processing DB of one epoch behind
Classic Deferral (Bacon et al.)
RC space
b1
a1
c1
Stacks & Regs
root bufdec buf
ba
At GC time, RCR(p)for root pointers
applies temporaryincrements.
RC space
b1
a1
c1
Stacks & Regs
root bufdec buf
ba
At next GC,apply decrements
Classic Deferral (Bacon et al.)
RC space
b1
a1
c1
Stacks & Regs
root bufdec buf
ba
At next GC,apply decrements
Classic Deferral (Bacon et al.)Key: Efficient enumeration of deferred pointers
RC space
b1
a1
c1
Stacks & Regs
root bufdec buf
Classic Deferral (Bacon et al.)
Better, but not good enough!
Ulterior Reference Counting
• Idea: Extend deferral to select heap pointers– e.g. All pointers within nursery objects
• Deferral is not a fixed property of p– e.g. A nursery object gets promoted
Integrate Event I(p)– Changes p from deferred to not deferred
BG-RCBounded Nursery Generational - RC
• Heap organization– Bounded copying nursery
• Ignore mutations to nursery pointer fields– RC old space
• Object remembering, coalescing, buffering
• Collection– Process roots– Nursery phase promotes live p to old space and I(p)– RC phase processes object buffer, dec buffer
View of heap in Ulterior RC
Stacks Regs
RC space
non-RC space
d1a1 b1
e1s
t
r
•How can we efficiently•Enumerate all deferred pointer fields ?•Remember old to young pointers ?
remember
defer
defer
Bringing it Together• Deferral:– Defer nursery & roots– Perform I(p) on nursery promotion• Piggyback on copying nursery collection
• Coalescing:– Remember mutated RC objects• Upon first mutation, dec each referent• At GC time, inc each referent• Piggyback remset onto this mechanism
BG-RC Write Barrier
10 private void writeBarrierSlow(VM_Address srcObj)11 throws VM_PragmaNoInline {12 if (attemptToLog(srcObj)) {13 modifiedBuffer.push(srcObj);14 enumeratePointersToDecBuffer(srcObj); // trade-off for sparsely15 setLogState(srcObj, LOGGED); // modified objects16 }17 }
1 private void writeBarrier(VM_Address srcObj,2 VM_Address srcSlot,3 VM_Address tgtObj)4 throws VM_PragmaInline { 5 if (getLogState(srcObj) != LOGGED) 6 writeBarrierSlow(srcObj);7 VM_Magic.setMemoryAddress(srcSlot, tgtObj);8 }9 }
// unsync check for uniqueness
BG-RCMutation Phase
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
d1
a1
b0
e1
BG-RCMutation Phase
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
d1
a1
b0
e1
bed
BG-RCMutation Phase
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
d1
a1
b0
e1
bed
BG-RCMutation Phase
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
d1
a1
b0
e1
bed
r
BG-RCMutation Phase
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
d1
a1
b0
e1
s
bed
r
BG-RCMutation Phase
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
d1
a1
b0
e1
s
t
bed
r
BG-RCMutation Phase
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
d1
a1
b0
e1
s
t
bed
r
BG-RCNursery Collection: Scan Roots
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
d1
a1
b1
e1
s
t
bed
r
b
BG-RCNursery Collection: Scan Roots
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
d1
a1
b1
e1
s
t
bed
r
bs
s1
BG-RCNursery Collection: Scan Roots
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
d1
a1
b1
e2
s
t
bed
r
bs
s1
t1
BG-RCNursery Collection: Process Object Buffer
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
d1
a2
b1
e3
s
t
bed
r
bs
s1
t1
r1
BG-RCNursery Collection: Reclaim Nursery
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
d1
a2
b1
e3
s
t
ed
r
bs
s1
t1
r1
Reclaim
BG-RCRC Collection: Process Decrement Buffer
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
d0
a2
b1
e3
ed b
s
s1
t1
r1
BG-RCRC Collection: Recursive Decrement
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
d0
a1
b1
e3
e bs
s1
t1
r1
free
BG-RCRC Collection: Process Decrement Buffer
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
a1
b1
e2
e bs
s1
t1
r1
BG-RCCollection Complete!
Stacks Regs
root buf
RC space
obj buf dec buf
non-RC space
a1
b1
e2
bs
s1
t1
r1
b
s
Controlling Pause Times• Modest bounded nursery size
• Meta Data– Decrement and modified object buffers– Trigger a collection if too big
• RC time cap– Limits time recursively decrementing RC obj & in cycle detection
• Cycles - pure RC is incomplete– Use Bacon/Rajan trial deletion algorithm
Experimental evaluation
• Jikes RVM with MMTK• Compare MS, BG-MS, BG-RC, RC• Examine various heap sizes• Collection triggers– Each 4MB of allocation for BG-RC (1 MB for RC)– Time cap of 60 ms– Cycle detection at 512 KB
1751.53530.982101.002141.23mean
1211.14430.961781.001851.05mpeg
431.11591.012441.002381.01db
2971.33531.002811.002641.00pjbb
720.93680.881751.00160.98cmpress
1301.75491.041801.002411.29mtrt
1331.71491.031841.002031.31raytrace
721.66440.941851.001841.52jack
5801.78681.002851.002681.01javac
1312.36440.991811.001821.91jess
max pause
norm time
max pause
norm time
max pause
norm time
max pause
norm time
RCBG-RCBG-MSMS
Throughput/Pause time Moderate Heap Size
Throughput & Responsiveness_228_jack
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
10 100 1000 10000Window (msec)
Uti
liza
tion
BG-MS
RC
BG-RC
Conclusion
• Ulterior design based on careful study of object demographics and making collector aware of them
• Extends deferred RC to heap objects
• Practically shows that high throughput & low pause times are compatible
top related