Steve Blackburn
Department of Computer Science
Australian National University

Perry Cheng
TJ Watson Research Center
IBM Research

Kathryn McKinley
Department of Computer Sciences
University of Texas at Austin

IBM Research

Myths & Realities
The Performance Impact of Garbage Collection

Tuesday, April 18, 2023
Myths & Realities: The performance impact of garbage collection

Background

• No prior apples-to-apples comparisons
• MMTk
  – Canonical policies implemented (SS, MS, RC, genX, etc.)
  – Shared mechanisms
  – Good performance (match/beat old Watson GCs)
  – Ideal platform for apples-to-apples comparisons

Some Questions

• Architecture
  – How well do modern OO languages play to modern architectures?
• Collection
  – Is generational GC “a waste of time”?
  – Are write barriers expensive?
• Allocation
  – Free list or bump pointer?
• “Locality is everything”
  – Really???
  – Is it different for young & old? Why?
• Locality and architecture
  – What is the impact, what is the trend?

Methodology

• Jikes RVM & MMTk
• Platforms
  • 1.6GHz G5 (PowerPC 970)
  • 1.9GHz AMD Athlon 2600+
  • 2.6GHz Intel P4
  • Linux 2.6.0 with perfctr patch & libraries
    – Separate accounting of GC & Mutator perf counts
• SPECjvm98 & pseudojbb

Architecture

Relative Performance

             Athlon 2600+   P4        G5
             1.9GHz         2.6GHz    1.6GHz
compress     0.93           1.00      1.18
jess         0.88           1.00      1.20
raytrace     0.71           1.00      0.73
db           0.97           1.00      1.68
javac        0.67           1.00      1.37
mtrt         0.69           1.00      0.75
jack         0.62           1.00      1.11
pseudojbb    0.77           1.00      1.24

Architecture - Q & A

How big is the mismatch between modern arch & modern languages???

Allocation

Allocation Choices

• Bump pointer
  – ~70 bytes IA32 instructions, 726MB/s
• Free list
  – ~140 bytes IA32 instructions, 654MB/s
• Bump pointer 11% faster in tight loop
  – < 1% in practical setting
  – No significant difference (?)
• Second order effects?
  – Locality??
  – Collection mechanism??
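To make the two allocation choices concrete, here is a minimal Python sketch of a bump-pointer allocator and a size-segregated free-list allocator. It is an illustration only, not MMTk's code: the class names, the size classes, and the 8-byte alignment are assumptions made for the example. It shows why locality is a plausible second-order effect: consecutive bump-pointer allocations are adjacent in memory, while free-list allocations of different sizes land in different size classes.

```python
class BumpPointer:
    """Contiguous allocation: hand out the next free bytes in a region."""

    def __init__(self, start, size):
        self.cursor = start
        self.limit = start + size

    def alloc(self, nbytes, align=8):
        addr = (self.cursor + align - 1) & ~(align - 1)  # round up to alignment
        if addr + nbytes > self.limit:
            return None  # region exhausted; a collector would run here
        self.cursor = addr + nbytes
        return addr


class SegregatedFreeList:
    """One list of fixed-size cells per size class (sizes are hypothetical)."""

    SIZE_CLASSES = (16, 32, 64, 128)

    def __init__(self, start, cells_per_class):
        self.free = {}
        addr = start
        for size in self.SIZE_CLASSES:
            # Store cells so that pop() returns the lowest address first.
            cells = [addr + i * size for i in range(cells_per_class)]
            self.free[size] = cells[::-1]
            addr += size * cells_per_class

    def alloc(self, nbytes):
        for size in self.SIZE_CLASSES:
            if nbytes <= size:
                cells = self.free[size]
                return cells.pop() if cells else None
        return None  # larger than the biggest size class

    def release(self, addr, size):
        self.free[size].append(addr)  # freed cell is reused first (LIFO)


bp = BumpPointer(0, 1024)
fl = SegregatedFreeList(0, 4)
print([bp.alloc(n) for n in (24, 40)])  # adjacent addresses
print([fl.alloc(n) for n in (24, 40)])  # addresses in different size classes
```

The tight-loop figures on the slide are consistent with the stated 11%: 726 / 654 is roughly 1.11.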

Implications for Locality

• Compare SS & MS mutator
  – Mutator time = total – GC time
  – Mutator memory performance: L1, L2 & TLB
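The mutator-time definition above is simple enough to sketch directly. The timings below are hypothetical, and normalizing each series to its best (smallest) value is an assumption about how the "normalized" axes were produced, not something the slides state.

```python
def mutator_time(total_time, gc_time):
    """Mutator time = total time - GC time (the slide's definition)."""
    return total_time - gc_time


def normalize_to_best(values):
    """Scale a series so its smallest value becomes 1.0 (assumed baseline)."""
    best = min(values)
    return [v / best for v in values]


# Hypothetical (total, GC) timings in seconds at three heap sizes.
runs = [(12.0, 4.0), (10.0, 2.5), (9.0, 1.0)]
mutator = [mutator_time(total, gc) for total, gc in runs]
print(normalize_to_best(mutator))
```

The same subtraction applies to any per-phase counter (L1, L2, or TLB misses), provided GC and mutator counts are accounted separately as described in the methodology.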

[Figures: jess. Four plots of MarkSweep vs. SemiSpace against normalized heap size (1 to 6): normalized mutator time, normalized L1 misses, normalized L2 misses, normalized TLB misses.]

[Figures: javac. Four plots of MarkSweep vs. SemiSpace against normalized heap size (1 to 6): normalized mutator time, normalized L1 misses, normalized L2 misses, normalized TLB misses.]

[Figures: pseudojbb. Four plots of MarkSweep vs. SemiSpace against normalized heap size (1 to 6): normalized mutator time, normalized L1 misses, normalized L2 misses, normalized TLB misses.]

[Figures: db. Four plots of MarkSweep vs. SemiSpace against normalized heap size (1 to 6): normalized L1 misses, normalized mutator time, normalized L2 misses, normalized TLB misses.]

Locality

Bump Pointer & Free List

• Is the locality differential age-dependent?
• Re-run experiment with GenCopy & GenMS
  – Generational variants of MarkSweep & SemiSpace
  – Young objects treated identically
  – Mature objects either SemiSpace or MarkSweep

Bump Pointer & Free List

             Mutator Time    Mutator L1      Mutator L2      Mutator TLB
             MS/SS           MS/SS           MS/SS           MS/SS
             Whole   Gen     Whole   Gen     Whole   Gen     Whole   Gen
jess         1.26    1.02    1.73    0.87    2.27    0.53    1.91    1.07
javac        1.13    1.05    1.32    1.02    1.38    1.25    1.53    1.20
pseudojbb    1.15    1.08    1.25    1.14    1.44    1.22    1.45    1.26
db           1.10    1.10    1.07    1.09    1.01    1.05    1.17    1.17

• Why?
• Mature space locality
  • Nursery absorbs most allocs – lower frag
  • Relatively frequent copying in SS
• Contiguous allocation in nursery
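One way to read the table is to ask, for each benchmark, in how many of the four measurements the generational MS/SS ratio sits closer to parity (1.0) than the whole-heap ratio. The sketch below does that count using the slide's numbers. The "distance from 1.0" measure and the function names are choices made for this example, not part of the original analysis.

```python
# (Whole, Gen) MS/SS ratio pairs, left to right as they appear on the slide.
TABLE = {
    "jess": [(1.26, 1.02), (1.73, 0.87), (2.27, 0.53), (1.91, 1.07)],
    "javac": [(1.13, 1.05), (1.32, 1.02), (1.38, 1.25), (1.53, 1.20)],
    "pseudojbb": [(1.15, 1.08), (1.25, 1.14), (1.44, 1.22), (1.45, 1.26)],
    "db": [(1.10, 1.10), (1.07, 1.09), (1.01, 1.05), (1.17, 1.17)],
}


def gap(ratio):
    """Distance of an MS/SS ratio from parity (1.0)."""
    return abs(ratio - 1.0)


def narrowed(pairs):
    """Count the pairs where the Gen ratio is strictly closer to 1.0."""
    return sum(1 for whole, gen in pairs if gap(gen) < gap(whole))


for bench, pairs in TABLE.items():
    print(bench, narrowed(pairs), "of", len(pairs))
```

Note that a Gen ratio below 1.0 (as in two of the jess entries) means MarkSweep did better than SemiSpace in that measurement; the gap measure treats that only as a distance from parity.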

Bump Pointer & Free List

Run SS & MS in “infinite” heap

             MarkSweep                   SemiSpace
jess          3.37    3.44    1.02       2.63    3.00    1.14
javac         8.51    8.34    0.98       7.38    7.60    1.03
pseudojbb    10.82   11.04    1.02       9.58    9.68    1.01
db           14.12   14.40    1.02      13.06   11.88    0.91
geomean                       1.01                       1.02

(Surviving column headers: “1.5X” and “1.5/”, once per collector. In each group the third column is the second divided by the first.)

• Infinite heap does not degrade locality (!?)
  – Exceptions: jess (degrades), db (improves). Why?
  – Is spatial locality unimportant in mature space???
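The geomean row of the table can be reproduced from the timing columns. The sketch below takes each collector's two timing columns in the order they appear, forms the ratio as second divided by first (which matches the slide's ratio column), and takes the geometric mean across the four benchmarks. The helper names are illustrative.

```python
import math

# Timing columns per collector, in slide order: (first, second).
MARK_SWEEP = {
    "jess": (3.37, 3.44),
    "javac": (8.51, 8.34),
    "pseudojbb": (10.82, 11.04),
    "db": (14.12, 14.40),
}
SEMI_SPACE = {
    "jess": (2.63, 3.00),
    "javac": (7.38, 7.60),
    "pseudojbb": (9.58, 9.68),
    "db": (13.06, 11.88),
}


def ratios(table):
    """Second timing column divided by the first, per benchmark."""
    return [second / first for first, second in table.values()]


def geomean(values):
    """Geometric mean, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


for name, table in (("MarkSweep", MARK_SWEEP), ("SemiSpace", SEMI_SPACE)):
    print(name, round(geomean(ratios(table)), 2))
```

Rounded to two places this gives 1.01 for MarkSweep and 1.02 for SemiSpace, agreeing with the slide.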

BP & FL Locality Implications

• Is spatial locality unimportant in mature space??
  – No [Huang et al OOPSLA 2004]
  – But perhaps temporal locality is more significant
• Seems clear contiguous allocation is good
  – Vast majority of objects < cache line
  – h/w prefetcher may be significant
• Hard to improve over alloc order, easy to mess up?
  – Unlikely to be true: MarkSweep < Compacting < SemiSpace

Locality & Architecture

MS/SS Crossover

[Figure: normalized total time vs. heap size relative to minimum (1 to 6) for SemiSpace and MarkSweep, built up one platform per slide: 1.6GHz PPC, 1.9GHz AMD, 2.6GHz P4, 3.2GHz P4. The final build adds per-platform annotations (1.6GHz, 1.9GHz, 2.6GHz, 3.2GHz) and the labels “locality” and “space”.]

Conclusions

• Need for (re)evaluation of GC performance
  – Key GC insights > 20yrs old
  – Technology has changed
  – Absence of apples-to-apples comparisons
  – Highly architecturally sensitive
• MMTk + perf counters
  – High performance infrastructure
  – Multiple GCs, shared mechanisms
• Some myths exposed & new realities
