

1

Garbage Collection Advantage:

Improving Program Locality

Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT)

J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM)

2

Motivation

• Memory gap problem
• OO programs are becoming more popular
• OO programs exacerbate the memory gap problem
– Automatic memory management
– Pointer data structures
– Many small methods

Goal: improve OO program locality

3

Cache Performance Matters

[Figure: total cycles (in billions) for _213_javac]

4

Opportunity

• Generational copying garbage collector reorders objects at runtime

5

Copying of Linked Objects

[Figure: an object graph with nodes 1 to 7 and the order in which a breadth-first copying collector places them]

6

Copying of Linked Objects

[Figure: the same object graph copied in breadth-first and in depth-first order]

7

Copying of Linked Objects

[Figure: breadth-first, depth-first, and online object reordering copy orders of the same object graph]
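Breadth-first and depth-first copy orders can be compared with a small sketch. It assumes a hypothetical `Node` class and a seven-node binary tree (not necessarily the exact graph drawn on these slides) and records the order in which objects would be placed in to-space. A Cheney-style scan that copies objects when first discovered and scans them in FIFO order gives breadth-first order; a stack of pending objects gives depth-first order. A `HashSet` stands in for forwarding pointers so that each object is copied only once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CopyOrder {
    // Hypothetical heap object: an id plus outgoing references.
    static final class Node {
        final int id;
        final List<Node> children = new ArrayList<>();
        Node(int id) { this.id = id; }
    }

    // Breadth-first (Cheney-style): an object is copied when first discovered,
    // and copied objects are scanned in FIFO order, so siblings end up adjacent.
    static List<Integer> breadthFirst(Node root) {
        List<Integer> toSpace = new ArrayList<>();
        Set<Node> copied = new HashSet<>();   // stands in for forwarding pointers
        Deque<Node> scan = new ArrayDeque<>();
        copied.add(root);
        toSpace.add(root.id);
        scan.addLast(root);
        while (!scan.isEmpty()) {
            Node n = scan.removeFirst();
            for (Node c : n.children) {
                if (copied.add(c)) {          // first visit: copy and queue for scanning
                    toSpace.add(c.id);
                    scan.addLast(c);
                }
            }
        }
        return toSpace;
    }

    // Depth-first: a stack of pending objects places each parent next to its
    // first child, and that child next to its own first child.
    static List<Integer> depthFirst(Node root) {
        List<Integer> toSpace = new ArrayList<>();
        Set<Node> copied = new HashSet<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (!copied.add(n)) continue;     // already copied via another path
            toSpace.add(n.id);
            // Push children in reverse so the first child is copied next.
            for (int i = n.children.size() - 1; i >= 0; i--) stack.push(n.children.get(i));
        }
        return toSpace;
    }

    // Assumed example graph: 1 -> {2, 3}, 2 -> {4, 5}, 3 -> {6, 7}.
    static Node sampleTree() {
        Node[] n = new Node[8];
        for (int i = 1; i <= 7; i++) n[i] = new Node(i);
        n[1].children.add(n[2]); n[1].children.add(n[3]);
        n[2].children.add(n[4]); n[2].children.add(n[5]);
        n[3].children.add(n[6]); n[3].children.add(n[7]);
        return n[1];
    }

    public static void main(String[] args) {
        System.out.println("breadth-first: " + breadthFirst(sampleTree()));
        System.out.println("depth-first:   " + depthFirst(sampleTree()));
    }
}
```

On this tree, breadth-first yields `[1, 2, 3, 4, 5, 6, 7]` and depth-first yields `[1, 2, 4, 5, 3, 6, 7]`. Breadth-first keeps siblings together, depth-first keeps a parent next to one child, and neither order adapts to which fields the program actually traverses.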

8

Outline

• Motivation
• Online Object Reordering (OOR)
• Methodology
• Experimental Results
• Conclusion

9

Online Object Reordering

• Where are the cache misses?
• How to identify hot field accesses at runtime?
• How to reorder the objects?

10

Where Are The Cache Misses?

• Heap structure:

[Figure: heap layout showing VM objects, stack, older generation, and nursery (not to scale)]

11

Where Are The Cache Misses?

[Figure: total accesses (in millions) for _209_db, split into L2 hits and L2 misses]

12

Where Are The Cache Misses?

• Two opportunities to reorder objects in the older generation:
– When nursery objects are promoted
– During full-heap collections

13

How to Find Hot Fields?

• Runtime info (intercept every read)?
• Compiler analysis?
• Runtime information + compiler analysis

Key: Low-overhead estimation

14

Which Classes Need Reordering?

Step 1: Compiler analysis
– Excludes cold basic blocks
– Identifies field accesses

Step 2: JIT adaptive sampling identifies hot methods
– Marks the field accesses in hot methods as hot

Key: Low-overhead estimation

15

Example: Compiler Analysis

Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  } catch(Exception e) {
    …a.c
  }
}

Compiler:
• Hot BB: collect access info
• Cold BB: ignore

Access List:
1. A.b
2. …

16

Example: Adaptive Sampling

Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  } catch(Exception e) {
    …a.c
  }
}

Adaptive Sampling: Foo is hot

Foo Accesses:
1. A.b
2. …

A.b is hot

[Figure: A's type information, listing fields b and c, with b recorded as hot]
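The two-step analysis on the previous slides, applied to the Foo example, can be sketched as a small access-information database. This is a minimal sketch under assumptions: field accesses are identified by strings such as "A.b", and the hypothetical hooks `recordMethod` (called by the compiler with the accesses it found outside cold basic blocks) and `methodBecameHot` (called when adaptive sampling flags a method) are illustrative names, not Jikes RVM APIs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AccessInfoDatabase {
    // Step 1 output: per method, the field accesses found in its non-cold basic blocks.
    private final Map<String, List<String>> accessesByMethod = new HashMap<>();
    // Step 2 output: per class, the fields accessed from hot methods.
    private final Map<String, Set<String>> hotFieldsByClass = new HashMap<>();

    // Hypothetical compiler hook, called after cold basic blocks have been skipped.
    void recordMethod(String method, List<String> fieldAccesses) {
        accessesByMethod.put(method, List.copyOf(fieldAccesses));
    }

    // Hypothetical sampling hook: every access recorded for a hot method becomes hot.
    void methodBecameHot(String method) {
        for (String access : accessesByMethod.getOrDefault(method, List.of())) {
            int dot = access.lastIndexOf('.');
            String cls = access.substring(0, dot);
            String field = access.substring(dot + 1);
            hotFieldsByClass.computeIfAbsent(cls, k -> new HashSet<>()).add(field);
        }
    }

    // What the collector would query when it copies an object of class cls.
    boolean isHot(String cls, String field) {
        return hotFieldsByClass.getOrDefault(cls, Set.of()).contains(field);
    }

    public static void main(String[] args) {
        AccessInfoDatabase db = new AccessInfoDatabase();
        // Foo reads a.b in a hot block; a.c sits in the catch block, treated as cold.
        db.recordMethod("Foo", List.of("A.b"));
        System.out.println("A.b hot before sampling? " + db.isHot("A", "b"));
        db.methodBecameHot("Foo");
        System.out.println("A.b hot after sampling?  " + db.isHot("A", "b"));
        System.out.println("A.c hot after sampling?  " + db.isHot("A", "c"));
    }
}
```

Splitting the work this way keeps the per-access cost at zero: the compiler records accesses once per compiled method, and the marking step only runs for the few methods that sampling flags as hot.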

17

Copying of Linked Objects

[Figure: online object reordering uses type information to copy objects into a hot space and a cold space]
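The hot-space/cold-space copying on this slide can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's exact algorithm: objects whose class has a hot field are copied into a hot space, the referent of each hot field is copied immediately after its parent, and all other references are traversed breadth-first into a cold space. `HotColdCopy`, `Obj`, and the "Class.field" naming of hot fields are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class HotColdCopy {
    // Hypothetical heap object: a class name, an object name, and reference fields.
    static final class Obj {
        final String type, name;
        final Map<String, Obj> fields = new LinkedHashMap<>(); // declaration order
        Obj(String type, String name) { this.type = type; this.name = name; }
    }

    private final Set<String> hotFields;              // assumed keys such as "A.b"
    private final Set<Obj> copied = new HashSet<>();   // stands in for forwarding pointers
    private final Deque<Obj> pending = new ArrayDeque<>();
    final List<String> hotSpace = new ArrayList<>();   // to-space regions, in copy order
    final List<String> coldSpace = new ArrayList<>();

    HotColdCopy(Set<String> hotFields) { this.hotFields = hotFields; }

    private boolean isHot(Obj o, String field) {
        return hotFields.contains(o.type + "." + field);
    }

    private boolean hasHotField(Obj o) {
        for (String f : o.fields.keySet()) if (isHot(o, f)) return true;
        return false;
    }

    // Copies o at most once. The referent of each hot field is copied right away,
    // so it lands next to o in the hot space (recursion kept for brevity).
    private void copy(Obj o, boolean hot) {
        if (!copied.add(o)) return;
        (hot ? hotSpace : coldSpace).add(o.name);
        for (Map.Entry<String, Obj> e : o.fields.entrySet()) {
            Obj target = e.getValue();
            if (target == null) continue;
            if (isHot(o, e.getKey())) copy(target, true);
            else pending.addLast(target);              // cold edges: breadth-first
        }
    }

    void collect(List<Obj> roots) {
        pending.addAll(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.removeFirst();
            copy(o, hasHotField(o));                   // classes with hot fields go hot
        }
    }

    // Assumed example: r -> {a1, a2}; each A has a hot field b and a cold field c.
    static Obj sampleHeap() {
        Obj r = new Obj("R", "r");
        Obj a1 = new Obj("A", "a1"), a2 = new Obj("A", "a2");
        r.fields.put("x", a1);
        r.fields.put("y", a2);
        a1.fields.put("b", new Obj("B", "b1"));
        a1.fields.put("c", new Obj("C", "c1"));
        a2.fields.put("b", new Obj("B", "b2"));
        a2.fields.put("c", new Obj("C", "c2"));
        return r;
    }

    public static void main(String[] args) {
        HotColdCopy gc = new HotColdCopy(Set.of("A.b"));
        gc.collect(List.of(sampleHeap()));
        System.out.println("hot space:  " + gc.hotSpace);   // [a1, b1, a2, b2]
        System.out.println("cold space: " + gc.coldSpace);  // [r, c1, c2]
    }
}
```

With `A.b` marked hot, each `A` is followed directly by its `B` in the hot space, while the root and the `C` objects go to the cold space. A production collector would use an explicit work stack rather than recursion, since a hot field such as a list's `next` pointer can form very long chains.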

18

OOR System Overview

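[Figure: OOR system overview. Jikes RVM components: baseline compiler, optimizing compiler, adaptive sampling, GC. OOR additions: access info database, registration of hot field accesses, advice to the GC. Source code is compiled into executing code; adaptive sampling identifies hot methods; the optimizing compiler adds entries to the access info database and hot field accesses are registered. In unmodified Jikes RVM the GC copies objects in a way that affects locality; with OOR the GC looks up the registered hot fields as advice when copying objects, which improves locality.]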

19

Outline

• Motivation
• Online Object Reordering
• Methodology
• Experimental Results
• Conclusion

20

Methodology: Virtual Machine

• Jikes RVM
– VM written in Java
– High performance
– Timer-based adaptive sampling
– Dynamic optimization

• Experiment setup
– Pseudo-adaptive
– 2nd iteration [Eeckhout et al.]

21

Methodology: Memory Management

• Memory Management Toolkit (MMTk):
– Allocators and garbage collectors
– Multi-space heap
  • Boot image
  • Large object space (LOS)
  • Immortal space

• Experiment setup
– Generational copying GC with a 4 MB bounded nursery

22

Overhead: OOR Analysis Only

Benchmark    Base execution time (sec)    With only OOR analysis (sec)    Overhead
jess                   4.39                        4.43                    0.84%
jack                   5.79                        5.82                    0.57%
raytrace               4.63                        4.61                   -0.59%
mtrt                   4.95                        4.99                    0.70%
javac                 12.83                       12.70                   -1.05%
compress               8.56                        8.54                    0.20%
pseudojbb             13.39                       13.43                    0.36%
db                    18.88                       18.88                   -0.03%
antlr                  0.94                        0.91                   -2.90%
hsqldb               160.56                      158.46                   -1.30%
ipsixql               41.62                       42.43                    1.93%
jython                37.71                       37.16                   -1.44%
ps-fun               129.24                      128.04                   -1.03%
Mean                                                                      -0.19%
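The overhead column appears to be the relative change in execution time, (time with OOR analysis - base time) / base time, so a negative value means the run with OOR analysis was slightly faster. The sketch below encodes that assumed formula; `OverheadCheck` is a hypothetical name. Recomputing from the rounded times in the table gives values that differ slightly from the published column (for jess, about 0.91% rather than 0.84%), presumably because the original percentages were computed from unrounded measurements.

```java
class OverheadCheck {
    // Assumed definition: relative slowdown in percent; negative means faster.
    static double overheadPercent(double baseSec, double oorSec) {
        return (oorSec - baseSec) / baseSec * 100.0;
    }

    public static void main(String[] args) {
        // Rounded times taken from the table above.
        System.out.printf("jess:  %.2f%%%n", overheadPercent(4.39, 4.43));
        System.out.printf("javac: %.2f%%%n", overheadPercent(12.83, 12.70));
    }
}
```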

23

Detailed Experiments

• Separate application and GC time
• Vary thresholds for method heat
• Vary thresholds for cold basic blocks
• Three architectures
– x86, AMD, PowerPC
• x86 performance counters:
– DL1, trace cache, L2, DTLB, ITLB

24

Performance javac

25

Performance db

26

Performance jython

Any static ordering leaves you vulnerable to pathological cases.

27

Phase Changes

28

Related Work

• Evaluate static orderings [Wilson et al.]
– Large performance variation
• Static profiling [Chilimbi et al., and others]
– Lack of flexibility
• Instance-based object reordering [Chilimbi et al.]
– Too expensive

29

Conclusion

• Static traversal orders vary in performance by up to 25%
• OOR improves on or matches the best static ordering
• OOR has very low overhead
• Past predicts future

30

Questions?

Thank you!

31

OOR System Overview

• Records object accesses in each method (excludes cold basic blocks)

• Finds hot methods by adaptive sampling

• Reorders objects with hot fields in older generation during GC

• Copies hot objects into separate region
