Post on 03-Jan-2016
© Imperial College London
Exploring the Barrier to Entry
Incremental Generational Garbage Collection for Haskell
Andy Cheadle & Tony Field, Imperial College London
Simon Marlow & Simon Peyton Jones, Microsoft Research, Cambridge, UK
Lyndon While, The University of Western Australia, Perth
Introduction
We focus on Haskell with the intent of building an:
• Efficient
• Barrierless
• Hybrid
• Incremental
• Generational
• …
garbage collector for GHC
Investigate pause time bounds and mutator utilisation.
Explore application to other dynamic dispatch systems.
Highlights
• Improving Non-Stop Haskell
– Incremental GC read-barrier optimisation without the per-object space overhead
• Bridging the Generation Gap
– Generational GC write-barrier optimisation
• Consistent Mutator Utilisation
– Time-based versus work-based scheduling
Barriers: Friend or Foe - Summary
• Blackburn & Hosking - ISMM 2004
• Conditional read-barrier
– AMD: 21.24%, P4: 15.91%, PPC: 6.49%
– Incremental GC: Standard Baker read-barrier
• Unconditional read-barrier
– AMD: 8.05%, P4: 5.04%, PPC: 0.85%
– Brooks indirection read-barrier
– Metronome ‘Eager’ barrier ~ 4%
– BUT: space overhead -> increased GC count
• Must consider GC cost!!!
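The two barrier styles listed above can be sketched as follows. This is a minimal illustration in Python, not code from the cited study; `Obj`, `evacuate`, `baker_read` and `brooks_read` are hypothetical names. The conditional (Baker-style) barrier tests the object on every read, while the unconditional (Brooks-style) barrier always follows a forwarding word, and that word is the per-object space overhead.

```python
# Illustrative sketch only; all names here are hypothetical.

class Obj:
    def __init__(self, payload, space):
        self.payload = payload
        self.space = space      # "from" or "to"
        self.forward = self     # Brooks-style indirection word, initially self


def evacuate(obj):
    """Copy a from-space object into to-space and leave a forwarding pointer."""
    if obj.forward is not obj:  # already evacuated earlier
        return obj.forward
    copy = Obj(obj.payload, "to")
    obj.forward = copy
    return copy


def baker_read(obj):
    """Conditional barrier: test on every read, evacuate from-space objects."""
    if obj.space == "from":
        return evacuate(obj)
    return obj


def brooks_read(obj):
    """Unconditional barrier: always follow the indirection, with no test."""
    return obj.forward
```

The test in `baker_read` is paid on every read even when no collection is in progress, whereas `brooks_read` pays one extra load and one extra word per object.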
Non-Stop Haskell
• Implementing Baker’s incremental collector typically introduces high overheads
– The software read-barrier
• We have shown that this can be done efficiently in systems with dynamic dispatching
Caveat: Dynamic dispatching already “costs” something; we show that incremental garbage collection comes at virtually no extra cost.
Dynamic Dispatch and the STG Machine
• The STG machine is a model for the compilation of lazy functional languages
• All objects are represented on the heap as closures:
• To compute function ‘f’ applied to arguments ‘a b c d’ jump to Entry code
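[Figure: heap closure for f with fields 0 to 4 (heap pointers and immediate values) holding the arguments a, b, c, d. Its first word points to a static info table containing an entry labelled "2, 2", other fields, and the entry code.]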
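The dispatch through the info table can be sketched as below. This is a simplified Python model rather than the STG machine itself; `InfoTable`, `Closure`, `enter` and the example closure type are hypothetical names chosen for illustration.

```python
# Simplified model of closures and info tables; names are hypothetical.

class InfoTable:
    """Static, per-closure-type data, including the code run on entry."""
    def __init__(self, name, entry):
        self.name = name
        self.entry = entry


class Closure:
    """Heap object: an info pointer followed by its fields."""
    def __init__(self, info, fields):
        self.info = info
        self.fields = list(fields)


def enter(closure):
    """Jump to the entry code via the info pointer; no type test is made."""
    return closure.info.entry(closure)


def add_entry(closure):
    a, b = closure.fields
    return a + b


ADD_INFO = InfoTable("add", add_entry)
thunk = Closure(ADD_INFO, [2, 3])
result = enter(thunk)
```

Because every evaluation already goes through `closure.info`, changing that one pointer changes what happens on entry, which is the hook the following slides rely on.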
The Read-Barrier Invariant
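[Figure: from-space and to-space, with the stack top marked. Item 1 is scavenged; items 2 and 3 are unscavenged; the labels 1r, 2r and 3r accompany them. Two annotations, "Problem 1" and "Problem 2", mark the two cases treated on the following slides.]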
Invariant Problem 1: Scavenging Closures
• When the garbage collector is on, make info pointers point to code that scavenges evacuated closures before entering them
• At all other times the system operates with no read barrier!
[Figure: closure layout as before, with the info pointer referring to an info table whose code is the self-scav code.]
Q: How do we restore the original info pointer?
A: We remember it when the closure is evacuated
Non-Stop Haskell:
• Use an extra word in to-space
• Note: the space overhead applies only to objects copied from from-space but effectively reduces to-space by 30%
• Freshly allocated objects carry no space overhead
[Figure: closure layout as before, with an extra word at offset -1. Two info tables are shown: the original one with the entry code, and one with the self-scav code.]
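The extra-word scheme can be sketched as follows. This is a minimal Python model under stated assumptions: `saved_info` stands in for the extra word, `scavenge` is a placeholder that only records the call, and every name is hypothetical rather than taken from GHC.

```python
# Sketch of self-scavenging closures with an extra word; names are hypothetical.

class InfoTable:
    def __init__(self, name, entry):
        self.name = name
        self.entry = entry


class Closure:
    def __init__(self, info, fields):
        self.info = info
        self.fields = list(fields)
        self.saved_info = None   # the "extra word", used only by evacuated copies


def enter(closure):
    return closure.info.entry(closure)


scavenged_log = []


def scavenge(closure):
    """Placeholder for evacuating everything the closure's fields refer to."""
    scavenged_log.append(closure)


def self_scav_entry(closure):
    """Entry code installed by the collector: scavenge, restore, then enter."""
    scavenge(closure)
    closure.info = closure.saved_info
    closure.saved_info = None
    return enter(closure)


SELF_SCAV = InfoTable("self_scav", self_scav_entry)


def evacuate(closure):
    """Copy into to-space, remembering the original info pointer."""
    copy = Closure(SELF_SCAV, closure.fields)
    copy.saved_info = closure.info
    return copy
```

After the first entry the copy behaves like an ordinary closure again, so later entries pay nothing for the collector.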
Q: How do we restore the original info pointer?
A: We remember it when the closure is evacuated
In production:
• Specialise every closure type at compile time
• Runtime space overhead is replaced by a static one of ~ 25%
[Figure: closure layout as before, without the extra word. The specialised info table holds the self-scav code followed by "JMP Entry code", shown alongside the original info table and its entry code.]
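One way to picture the specialised variant is that every closure type is generated together with a self-scavenging twin that already knows which entry code to jump to. The sketch below is a Python model of that idea only; `specialise`, `scav_variant` and the way an evacuated closure finds its twin are assumptions made for illustration, not a description of GHC's code generator.

```python
# Sketch of per-type specialisation; all names and the pairing are hypothetical.

class InfoTable:
    def __init__(self, name, entry):
        self.name = name
        self.entry = entry
        self.scav_variant = None   # filled in by specialise()


class Closure:
    def __init__(self, info, fields):
        self.info = info
        self.fields = list(fields)  # no extra word is needed


scavenged = []   # placeholder record of scavenging work


def specialise(name, entry):
    """Build a closure type together with its self-scavenging twin."""
    normal = InfoTable(name, entry)

    def self_scav_entry(closure):
        scavenged.append(closure)   # stands in for scavenging the fields
        closure.info = normal       # the original is known statically
        return entry(closure)       # corresponds to "JMP Entry code"

    normal.scav_variant = InfoTable(name + "_self_scav", self_scav_entry)
    return normal


def evacuate(closure):
    """Copy into to-space, selecting the pre-built self-scavenging table."""
    return Closure(closure.info.scav_variant, closure.fields)


def enter(closure):
    return closure.info.entry(closure)
```

The duplicated tables and code exist once per closure type rather than once per object, which matches the trade of a runtime space overhead for a static one.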
Invariant Problem 2: Stack Scavenging
• STG machine stack frames look just like closures
• Before returning to the caller frame we ‘hijack’ the caller’s return address, replacing it with a pointer to self-scavenging code for that frame
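[Figure: stack frames during incremental scavenging. Frame 1 is scavenged; frames 2 and 3 are unscavenged, with return addresses 2r and 3r. Once frame 2 has been scavenged, the sequence shown for it is "scav; mod 3r; update; return", the next frame shows "scav; mod 4r; update; return", and the original code is "update; return".]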
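The return-address hijacking can be sketched as below. This is a small Python simulation, assuming a stack modelled as a list with its top at the end; `Frame`, `hijack`, `start_collection` and `return_to_caller` are hypothetical names, and scavenging is reduced to setting a flag.

```python
# Simulation of stack-frame hijacking; names and structure are hypothetical.

class Frame:
    def __init__(self, name, ret):
        self.name = name
        self.ret = ret            # return address: code run when control returns here
        self.saved_ret = None     # original return address while hijacked
        self.scavenged = False


def hijack(frame):
    """Replace a frame's return address with the self-scavenging code."""
    frame.saved_ret = frame.ret
    frame.ret = self_scav_return


def self_scav_return(stack, frame):
    frame.scavenged = True            # scavenge this frame
    frame.ret = frame.saved_ret       # restore the original return address
    frame.saved_ret = None
    below = stack.index(frame) - 1
    if below >= 0:
        hijack(stack[below])          # hijack the next caller down the stack
    return frame.ret(stack, frame)    # continue with the original return code


def start_collection(stack):
    """Scavenge the top frame eagerly and hijack the frame beneath it."""
    stack[-1].scavenged = True
    if len(stack) > 1:
        hijack(stack[-2])


def return_to_caller(stack):
    """Pop the current frame and transfer control to the caller's return address."""
    stack.pop()
    caller = stack[-1]
    return caller.ret(stack, caller)
```

Only one unscavenged frame is ever hijacked at a time, so the mutator never runs in a frame that has not been scavenged.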
Background Scavenging
• GHC’s heap is block allocated. So, scavenge at:
– Every Allocation (EA)
– Every Block allocation (EB)
• Reduce forced-completions via block chaining
• Incremental scavenger pauses are allocation-dependent
• Exploit GHC’s lightweight scheduler to implement a time-scheduled scavenger (Jikes RVM Metronome)
– Consistent mutator utilisation
– Increase in forced-completions due to allocation bursts
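The difference between allocation-driven and time-driven scavenging can be made concrete with a small mutator-utilisation calculation. The sketch below is illustrative only: the function names, the pause lengths and the allocation trace are invented for the example and are not measurements from this work.

```python
# Illustrative mutator-utilisation calculation; all values are made up.

def mutator_utilisation(pauses, start, length):
    """Fraction of the window [start, start + length) not spent in collector pauses.

    pauses is a list of (begin, end) intervals, assumed not to overlap.
    """
    end = start + length
    gc_time = sum(max(0, min(e, end) - max(b, start)) for b, e in pauses)
    return 1 - gc_time / length


def time_based(total_ms, period_ms, quantum_ms):
    """Collector runs for quantum_ms at the start of every period_ms."""
    return [(t, t + quantum_ms) for t in range(0, total_ms, period_ms)]


def work_based(alloc_times_ms, pause_ms):
    """Collector runs for pause_ms at every allocation event."""
    return [(t, t + pause_ms) for t in alloc_times_ms]


steady = time_based(20, 10, 2)                 # 2 ms of collection in every 10 ms
bursty = work_based([0, 1, 2, 3, 4], 0.5)      # an allocation burst in the first 5 ms
```

With these inputs the time-based schedule gives the same utilisation in each 10 ms window, while the work-based schedule gives a low utilisation during the burst and full utilisation afterwards, which is the allocation dependence noted above.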
Results – Binary Sizes
Application  Stop-copy (KB)  Baker (EA)  Baker (EB)  NSH (EA)  NSH (EB)  SPS (EA)  SPS (EB)  Metronome 10 ms
circsim      287             12.99%      8.67%       14.15%    9.73%     32.20%    26.72%    34.71%
lcss         239             12.76%      8.75%       15.83%    11.67%    30.83%    26.65%    34.02%
symalg       175             12.82%      9.76%       19.25%    16.01%    29.89%    26.63%    32.14%
wave4main    424             12.84%      7.93%       11.49%    6.51%     28.03%    22.95%    32.11%
x2n1         325             12.71%      8.34%       13.04%    8.55%     28.46%    23.95%    32.04%
Min 36                       12.12%      7.93%       9.00%     2.98%     26.98%    22.04%    29.05%
Max 36                       14.40%      9.67%       20.22%    16.90%    31.90%    27.83%    34.74%
All 36                       12.98%      8.81%       14.87%    10.54%    29.59%    25.23%    32.95%
Results – Runtimes
Application  Stop-copy (seconds)  Baker (EA)  Baker (EB)  NSH (EA)  NSH (EB)  SPS (EA)  SPS (EB)  Metronome 10 ms
circsim      39.64                29.33%      23.30%      20.29%    13.94%    8.70%     5.58%     9.73%
lcss         54.24                56.80%      34.05%      35.47%    17.74%    23.66%    8.03%     18.75%
symalg       181.41               16.56%      17.25%      7.06%     5.57%     2.88%     3.76%     -13.52%
wave4main    26.28                7.18%       0.51%       8.74%     0.68%     7.77%     0.39%     2.20%
x2n1         218.39               6.52%       4.10%       9.45%     8.28%     9.77%     9.13%     6.09%
Min 36                            -0.11%      -0.08%      -4.04%    -3.40%    -3.10%    -2.54%    -13.52%
Max 36                            79.02%      73.56%      35.47%    37.26%    24.20%    18.65%    70.89%
All 36                            27.34%      23.61%      11.22%    8.26%     7.94%     4.88%     13.63%
The Generational Write-barrier
[Figure: generation N and generation N - 1 with the root set. An inter-generational pointer from generation N into generation N - 1 forms part of the root set for generation N - 1.]
Depending on the number of updates, the write-barrier can impose an overhead of 8 – 24% (NJ/ML and Clean).
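A conventional generational write barrier can be sketched as follows. This is a generic Python illustration, not code from any of the systems named above; `Obj`, `write_field` and `remembered_set` are hypothetical names.

```python
# Generic conditional write barrier; names are hypothetical.

class Obj:
    def __init__(self, generation):
        self.generation = generation   # 0 is the youngest generation
        self.fields = {}


remembered_set = []   # older objects that may point into a younger generation


def write_field(obj, name, value):
    """Every pointer store goes through this test before the actual write."""
    if isinstance(value, Obj) and obj.generation > value.generation:
        if obj not in remembered_set:
            remembered_set.append(obj)
    obj.fields[name] = value
```

The generation comparison runs on every store, including the many stores that can never create an old-to-young pointer, which is where the overhead comes from.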
Bridging the Generation Gap
We implement in GHC a mechanism that again exploits dynamic dispatch to eliminate unnecessary write-barriers:
[Figure: generation 0 containing THUNK_SELECT, THUNK_1 and THUNK_2, with the root set and the root set for generation 0. Step shown: promote to generation 1.]
Bridging the Generation Gap
[Figure: generation 1 and generation 0 with THUNK_SELECT, THUNK_1, THUNK_2 and two IND_PRE_UPD closures, plus the root set and the root set for generation 0. Step shown: force THUNK selectee evaluation.]
Bridging the Generation Gap
[Figure: as on the previous slide, with one of the two IND_PRE_UPD closures now shown as IND_UPD.]
Bridging the Generation Gap
[Figure: generation 1 and generation 0 with THUNK_SELECT, THUNK_1, IND_UPD and IND_PRE_UPD. IND_OLDGEN now appears where THUNK_2 was shown, together with CONSTR_2 and an inter-generational pointer.]
Preliminary benchmarks suggested a reduction of 5 - 9%; in production it is actually around 2 - 3%.
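The general idea of replacing the write-barrier test with dynamic dispatch can be sketched loosely as below. This is a Python illustration of the principle only: it does not model GHC's actual closure types (such as IND_PRE_UPD or IND_OLDGEN) or its update protocol, and `Thunk`, `promote`, `young_update` and `old_update` are hypothetical names.

```python
# Loose illustration of a dispatch-based write barrier; names are hypothetical.

remembered_set = []


def young_update(thunk, value):
    """Update code for thunks in the young generation: no test, no barrier."""
    thunk.value = value
    thunk.evaluated = True


def old_update(thunk, value):
    """Update code for promoted thunks: records the thunk unconditionally."""
    thunk.value = value
    thunk.evaluated = True
    remembered_set.append(thunk)


class Thunk:
    def __init__(self, compute):
        self.compute = compute
        self.value = None
        self.evaluated = False
        self.update = young_update   # dispatch slot, playing the role of an info pointer


def promote(thunk):
    """Called by the collector when the thunk moves to the old generation."""
    thunk.update = old_update


def force(thunk):
    """Evaluate at most once; the update is reached by dispatch, not by a test."""
    if not thunk.evaluated:
        thunk.update(thunk, thunk.compute())
    return thunk.value
```

The mutator never asks which generation a thunk lives in; the collector makes that decision once, at promotion time, by changing the code the thunk dispatches to.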
Ongoing Work
Application of read-barrier optimisation to Java
• Unfortunately Java programs are not “pure” in their use of dynamic dispatch
– Field access via get() / set() methods
– Inlining must be disallowed
Investigating within Jikes RVM:
• Inter- and intra-class inlining
• Code bloat arising from get() / set() methods, restricted inlining and additional per-class VMT
• Cost of VMT TIB pointer flip
Conclusion
Removal of collector-specific barriers and tests:
• Yields cheaper ‘vanilla’ collectors
• Allows the efficient hybridisation of multiple collector algorithms
Time-based scheduling is massively attractive, but:
• Complete decoupling from the allocator is problematic*
• A hybrid approach looks promising:
– Parameterised by mutator utilisation
– Sensitive to allocation rate
Elimination of per-object overhead:
• Mandatory for our production collector