ulterior reference counting fast gc without the wait

43
Ulterior Reference Counting Fast GC Without The Wait Steve Blackburn – Kathryn McKinley Presented by: Dimitris Prountzos Slides adapted from presentation by Steve Blackburn

Upload: jolene

Post on 06-Feb-2016

53 views

Category:

Documents


0 download

DESCRIPTION

Ulterior Reference Counting Fast GC Without The Wait. Steve Blackburn – Kathryn McKinley Presented by: Dimitris Prountzos Slides adapted from presentation by Steve Blackburn. Outline. Throughput-Responsiveness problem Reference counting & optimizations Ulterior in detail BG-RC in action - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Ulterior Reference Counting Fast GC Without The Wait

Ulterior Reference CountingFast GC Without The Wait

Steve Blackburn – Kathryn McKinley

Presented by: Dimitris Prountzos

Slides adapted from presentation by Steve Blackburn

Page 2: Ulterior Reference Counting Fast GC Without The Wait

Outline

• Throughput-Responsiveness problem• Reference counting & optimizations• Ulterior in detail• BG-RC in action• Experimental evaluation• Conclusion

Page 3: Ulterior Reference Counting Fast GC Without The Wait

• GC and mutator share CPU

– Throughput: net GC/mutator ratio– Responsivness: length of GC pauses

GCmutator

CPU Utilization (time)

mutator poor responsiveness maximum pause

Throughput/Responsiveness Trade-off

Page 4: Ulterior Reference Counting Fast GC Without The Wait

The Ulterior approach• Match mechanisms to object demographics– Copying nursery (young space)

• Highly mutated, high mortality young objects• Ignores most mutations• GC time proportional to survivors, space efficient

– RC mature space• Low mutation, low mortality old objects• GC time proportional to mutations, space efficient

• Generalize deferred RC to heap objects– Defer fields of highly mutated objects & enumerate them quickly– Reference count only infrequently mutated fields

Page 5: Ulterior Reference Counting Fast GC Without The Wait

Pure Reference Counting• Tracks mutations: RCM(p)

RCM(p) generates a decrement and an increment for the before and after values of p:

RCM(p) RC(pbefore)--, RC(pafter)++

• If RC==0, Free

RC space

b1a1

Page 6: Ulterior Reference Counting Fast GC Without The Wait

Pure Reference Counting• Tracks mutations: RCM(p)

RCM(p) generates a decrement and an increment for the before and after values of p:

RCM(p) RC(pbefore)--, RC(pafter)++

• If RC==0, Free

RC space

b0a1

c1

Page 7: Ulterior Reference Counting Fast GC Without The Wait

Pure Reference Counting• Tracks mutations: RCM(p)

RCM(p) generates a decrement and an increment for the before and after values of p:

RCM(p) RC(pbefore)--, RC(pafter)++

• If RC==0, Free

RC space

b0a1

c1

Page 8: Ulterior Reference Counting Fast GC Without The Wait

Pure Reference Counting• Tracks mutations: RCM(p)

RCM(p) generates a decrement and an increment for the before and after values of p:

RCM(p) RC(pbefore)--, RC(pafter)++

• If RC==0, Free

RC space

a1

c1

RCM(p) for every mutation is very expensive

Page 9: Ulterior Reference Counting Fast GC Without The Wait

RC Optimizations

• Buffering: apply RC(p)--, RC(p)++ later

• Coalescing: apply RCM(p) only for the initial and final values of p (coalesce intermediate values):

{RCM(p), RCM(p1), ... RCM(pn)} RC(pinitial)--, RC(pfinal)++

• Deferral of RCM events

Page 10: Ulterior Reference Counting Fast GC Without The Wait

Deferred Reference CountingGoal: Ignore RCM(p) for stacks & registers

• Deferral of p– A mutation of p does not generate an RCM(p)– Correctness: • For all deferred p: RCR(p) at each GC

• Retain Event: RCR(p)– po temporarily retains o regardless of RC(o)• Deutsch/Bobrow use a Zero Count Table• Bacon et al. use a temporary increment

Page 11: Ulterior Reference Counting Fast GC Without The Wait

Classic DeferralIn deferral phase: Ignore RCM(p) for stacks &

registers

RC space

b1

a0

Stacks & Regs

Page 12: Ulterior Reference Counting Fast GC Without The Wait

Classic DeferralIgnore RCM(p) for stacks & registers

RC space

b0

a0

c1

Stacks & Regs

Breaks RC==0 Invariant

Page 13: Ulterior Reference Counting Fast GC Without The Wait

Classic Deferral (Bacon et al.)• Divide execution in epochs

• Store information in buffers• Root buffer (RB): Store 1st level objects • Increment buffer (IB): Store increments to 1st level objects• Decrement buffer (DB): Store decrements to 1st level objects

• At GC time do:– Look at RB and apply temporary increments to all objects there– Process IB of this epoch– Look at RB of previous epoch and apply decrements to all objects there– Process DB of previous epoch

• During DB processing recycle o if RC(o)=0• Avoid race conditions by

• Processing IB before DB• Processing DB of one epoch behind

Page 14: Ulterior Reference Counting Fast GC Without The Wait

Classic Deferral (Bacon et al.)

RC space

b1

a1

c1

Stacks & Regs

root bufdec buf

ba

At GC time, RCR(p)for root pointers

applies temporaryincrements.

Page 15: Ulterior Reference Counting Fast GC Without The Wait

RC space

b1

a1

c1

Stacks & Regs

root bufdec buf

ba

At next GC,apply decrements

Classic Deferral (Bacon et al.)

Page 16: Ulterior Reference Counting Fast GC Without The Wait

RC space

b1

a1

c1

Stacks & Regs

root bufdec buf

ba

At next GC,apply decrements

Classic Deferral (Bacon et al.)Key: Efficient enumeration of deferred pointers

Page 17: Ulterior Reference Counting Fast GC Without The Wait

RC space

b1

a1

c1

Stacks & Regs

root bufdec buf

Classic Deferral (Bacon et al.)

Better, but not good enough!

Page 18: Ulterior Reference Counting Fast GC Without The Wait

Ulterior Reference Counting

• Idea: Extend deferral to select heap pointers– e.g. All pointers within nursery objects

• Deferral is not a fixed property of p– e.g. A nursery object gets promoted

Integrate Event I(p)– Changes p from deferred to not deferred

Page 19: Ulterior Reference Counting Fast GC Without The Wait

BG-RCBounded Nursery Generational - RC

• Heap organization– Bounded copying nursery

• Ignore mutations to nursery pointer fields– RC old space

• Object remembering, coalescing, buffering

• Collection– Process roots– Nursery phase promotes live p to old space and I(p)– RC phase processes object buffer, dec buffer

Page 20: Ulterior Reference Counting Fast GC Without The Wait

View of heap in Ulterior RC

Stacks Regs

RC space

non-RC space

d1a1 b1

e1s

t

r

•How can we efficiently•Enumerate all deferred pointer fields ?•Remember old to young pointers ?

remember

defer

defer

Page 21: Ulterior Reference Counting Fast GC Without The Wait

Bringing it Together• Deferral:– Defer nursery & roots– Perform I(p) on nursery promotion• Piggyback on copying nursery collection

• Coalescing:– Remember mutated RC objects• Upon first mutation, dec each referent• At GC time, inc each referent• Piggyback remset onto this mechanism

Page 22: Ulterior Reference Counting Fast GC Without The Wait

BG-RC Write Barrier

10 private void writeBarrierSlow(VM_Address srcObj)11 throws VM_PragmaNoInline {12 if (attemptToLog(srcObj)) {13 modifiedBuffer.push(srcObj);14 enumeratePointersToDecBuffer(srcObj); // trade-off for sparsely15 setLogState(srcObj, LOGGED); // modified objects16 }17 }

1 private void writeBarrier(VM_Address srcObj,2 VM_Address srcSlot,3 VM_Address tgtObj)4 throws VM_PragmaInline { 5 if (getLogState(srcObj) != LOGGED) 6 writeBarrierSlow(srcObj);7 VM_Magic.setMemoryAddress(srcSlot, tgtObj);8 }9 }

// unsync check for uniqueness

Page 23: Ulterior Reference Counting Fast GC Without The Wait

BG-RCMutation Phase

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

d1

a1

b0

e1

Page 24: Ulterior Reference Counting Fast GC Without The Wait

BG-RCMutation Phase

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

d1

a1

b0

e1

bed

Page 25: Ulterior Reference Counting Fast GC Without The Wait

BG-RCMutation Phase

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

d1

a1

b0

e1

bed

Page 26: Ulterior Reference Counting Fast GC Without The Wait

BG-RCMutation Phase

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

d1

a1

b0

e1

bed

r

Page 27: Ulterior Reference Counting Fast GC Without The Wait

BG-RCMutation Phase

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

d1

a1

b0

e1

s

bed

r

Page 28: Ulterior Reference Counting Fast GC Without The Wait

BG-RCMutation Phase

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

d1

a1

b0

e1

s

t

bed

r

Page 29: Ulterior Reference Counting Fast GC Without The Wait

BG-RCMutation Phase

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

d1

a1

b0

e1

s

t

bed

r

Page 30: Ulterior Reference Counting Fast GC Without The Wait

BG-RCNursery Collection: Scan Roots

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

d1

a1

b1

e1

s

t

bed

r

b

Page 31: Ulterior Reference Counting Fast GC Without The Wait

BG-RCNursery Collection: Scan Roots

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

d1

a1

b1

e1

s

t

bed

r

bs

s1

Page 32: Ulterior Reference Counting Fast GC Without The Wait

BG-RCNursery Collection: Scan Roots

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

d1

a1

b1

e2

s

t

bed

r

bs

s1

t1

Page 33: Ulterior Reference Counting Fast GC Without The Wait

BG-RCNursery Collection: Process Object Buffer

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

d1

a2

b1

e3

s

t

bed

r

bs

s1

t1

r1

Page 34: Ulterior Reference Counting Fast GC Without The Wait

BG-RCNursery Collection: Reclaim Nursery

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

d1

a2

b1

e3

s

t

ed

r

bs

s1

t1

r1

Reclaim

Page 35: Ulterior Reference Counting Fast GC Without The Wait

BG-RCRC Collection: Process Decrement Buffer

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

d0

a2

b1

e3

ed b

s

s1

t1

r1

Page 36: Ulterior Reference Counting Fast GC Without The Wait

BG-RCRC Collection: Recursive Decrement

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

d0

a1

b1

e3

e bs

s1

t1

r1

free

Page 37: Ulterior Reference Counting Fast GC Without The Wait

BG-RCRC Collection: Process Decrement Buffer

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

a1

b1

e2

e bs

s1

t1

r1

Page 38: Ulterior Reference Counting Fast GC Without The Wait

BG-RCCollection Complete!

Stacks Regs

root buf

RC space

obj buf dec buf

non-RC space

a1

b1

e2

bs

s1

t1

r1

b

s

Page 39: Ulterior Reference Counting Fast GC Without The Wait

Controlling Pause Times• Modest bounded nursery size

• Meta Data– Decrement and modified object buffers– Trigger a collection if too big

• RC time cap– Limits time recursively decrementing RC obj & in cycle detection

• Cycles - pure RC is incomplete– Use Bacon/Rajan trial deletion algorithm

Page 40: Ulterior Reference Counting Fast GC Without The Wait

Experimental evaluation

• Jikes RVM with MMTK• Compare MS, BG-MS, BG-RC, RC• Examine various heap sizes• Collection triggers– Each 4MB of allocation for BG-RC (1 MB for RC)– Time cap of 60 ms– Cycle detection at 512 KB

Page 41: Ulterior Reference Counting Fast GC Without The Wait

1751.53530.982101.002141.23mean

1211.14430.961781.001851.05mpeg

431.11591.012441.002381.01db

2971.33531.002811.002641.00pjbb

720.93680.881751.00160.98cmpress

1301.75491.041801.002411.29mtrt

1331.71491.031841.002031.31raytrace

721.66440.941851.001841.52jack

5801.78681.002851.002681.01javac

1312.36440.991811.001821.91jess

max pause

norm time

max pause

norm time

max pause

norm time

max pause

norm time

RCBG-RCBG-MSMS

Throughput/Pause time Moderate Heap Size

Page 42: Ulterior Reference Counting Fast GC Without The Wait

Throughput & Responsiveness_228_jack

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

10 100 1000 10000Window (msec)

Uti

liza

tion

BG-MS

RC

BG-RC

Page 43: Ulterior Reference Counting Fast GC Without The Wait

Conclusion

• Ulterior design based on careful study of object demographics and making collector aware of them

• Extends deferred RC to heap objects

• Practically shows that high throughput & low pause times are compatible