Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Safe and Efficient Hybrid Memory Management for Java
Codruț Stancu Christian Wimmer Oracle Labs
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement The following is intended to provide some insight into a line of research in Oracle Labs. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described in connection with any Oracle product or service remains at the sole discretion of Oracle. Any views expressed in this presentation are my own and do not necessarily reflect the views of Oracle.
3
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Truffle System Structure
Common API separates language implementation and optimization system
Language agnostic dynamic compiler
AST Interpreter for every language
Integrate with Java applications
Low-footprint VM, also suitable for embedding
Substrate VM
Graal
JavaScript Ruby Python R
Graal VM
…
Truffle
4
Your language should be here!
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Substrate VM is …
5
an embeddable VM
for, and written in, a subset of Java
optimized to execute Truffle languages
ahead-of-time compiled using Graal
integrating with native development tools.
…
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Substrate VM: Execution Model
6
Ahead-of-Time Compilation
Static Analysis
Substrate VM
Truffle Language
JDK
Reachable methods, fields, and classes
Machine Code
Initial Heap
All Java classes from Truffle language
(or any application), JDK, and Substrate VM
Application running without dependency on JDK and without Java class loading
DWARF Info
ELF / MachO Binary
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 7
Microbenchmark for Startup and Peak Performance (1) function benchmark(n) { var obj = {i: 0, result: 0}; while (obj.i <= n) { obj.result = obj.result + obj.i; obj.i = obj.i + 1; } return obj.result; }
Function benchmark is invoked in a loop by harness (0 to 40000 iterations)
n fixed to 50000 for all iterations
JavaScript VM Version Command Line Flags
Google V8 Version 3.26.31 [none]
Mozilla Spidermonkey Version JavaScript-C36.0a1 [none]
Nashorn JDK 8 build 1.8.0-b132 -J-Xmx256M
Nashorn JDK8 update 40 build 1.8.0_40-ea-b12 -J-Xmx256M
Truffle on HotSpot VM Changeset bb0f4716f830 from Nov 14, 2014 -J-Xmx256M
Truffle on Substrate VM SVM changeset 1015c4a5458e from Nov 14, 2014 based on Graal and Truffle 0.5
[none]
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 8
00.10.20.30.40.50.60.70.80.9
0 2 4 6 8 10
Exec
utio
n Ti
me
[Sec
onds
]
Iterations
00.10.20.30.40.50.60.70.8
0
Google V8Mozilla SpidermonkeyNashorn JDK 8Nashorn JDK 8u40 b12Truffle on HotSpot VMTruffle on Substrate VM
00.20.40.60.8
11.21.41.6
0 50 100 150 200Iterations
0123456789
10
0 10000 20000 30000 40000Iterations
0
20
40
60
80
100
120
140
0 2 4 6 8 10
Mem
ory
Foot
prin
t [M
Byte
]
020406080
100120140160180200
0 50 100 150 200
Microbenchmark for Startup and Peak Performance (2)
Blocking compilation at iteration 4
Background compilation finished
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
How can execution pattern knowledge be exploited to optimize memory utilization?
9
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Region-Based Memory Management
10
Static Analysis
Runtime Data Structures
Region analysis is built on top of our static analysis
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Automatic Memory Reclamation
11
• Modern VMs approach: one size fits all Garbage Collection
• GC has performance and memory footprint overhead
• Region allocation – Divide application in scoped phases – Release memory when phase exits
• GC as backup: hybrid memory management
Efficient memory reclamation
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Region Allocation • Coarse grain regions, matching method borders • Annotation based
• Static Region Analysis
– Determine allocation region for each allocation site
12
@RegionScope(name = "R0") public void foo() { // ... }
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Points-to Analysis • Context sensitive • Object abstraction: type & allocation site • Invocation context is the receiver object abstraction
– 2obj + 1H (2-object-sensitive with 1-context-sensitive heap)
• Generates: – Call graph – Points-to sets for reference variables and fields
13
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Region Analysis • The points-to analysis uses the region annotations as additional context
elements • A method is analyzed separately for each different region scope from
which it is invoked • 2obj + 1H analysis becomes 2obj + 1H + 1R (2-object-sensitive with 1-context-sensitive heap and 1-region-context)
14
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Region Analysis • Iterates annotated call graph, depth first, and builds a region tree.
15
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Region Analysis • Iterates over program statements • For each object records definitions and uses
– Definitions: new instance, new array – Uses: store, load, invoke, etc. – Return causes object to escape to caller – Store to static causes object to be allocated in the GC heap
16
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Region analysis results: region mapping function
• m(A) = {(d, a, o), where m is the region mapping function, A is an allocation site, d is definition region context, a is allocation region, o is offset in runtime region stack}
17
Efficient allocation m(x) = {(D2,A,2)}
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Region Analysis Example
18
m(x) = {(F1,B,2)}
m(x) = {(F1,B,2), (F2,B,2)}
m(x) = {(F1,B,2), (F2,B,2), (F3,C,1)}
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Region Allocation
19
@Snippet public static Object newInstanceSnippet(DynamicHub hub, RegionMapping mapping) { Region regionContext = RegionManager.regionStack().top(); int offset = mapping.lookup(regionContext); // table lookup Region allocationRegion = RegionManager.regionStack().get(offset); return bump-pointer allocation in allocationRegion }
m(x) = {(F1,B,2), (F2,B,2), (F3,C,1)}
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Region Analysis Normalization Example
20
m(x) = {(F1,B,2), (F2,B,2), (F3,C,1)} after normalization: m(x) = {(F1,B,2), (F2,B,2), (F3,A,2)}
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Offset Normalization Optimization
21
@Snippet public static Object newInstanceSnippet(DynamicHub hub, RegionMapping mapping) { Region regionContext = RegionManager.regionStack().top(); int offset = mapping.lookup(regionContext); // table lookup Region allocationRegion = RegionManager.regionStack().get(offset); return bump-pointer allocation in allocationRegion }
@Snippet public static Object newInstanceSnippet(DynamicHub hub, int constantOffset) { Region allocationRegion = RegionManager.regionStack().get(constantOffset); return bump-pointer allocation in allocationRegion }
m(x) = {(F1,B,2), (F2,B,2), (F3,C,1)}
m(x) = {(F1,B,2), (F2,B,2), (F3,A,2)}
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 22
Memory Management Memory Chunks Fixed size, aligned on size-granularity, contains all GC tables e.g., 1 MByte, aligned on 1 MByte granularity
Card Marks
Offset Table
Objects
Young Generation (nursery space) GC evacuates all live objects to old generation, no aging, live objects identified using card marks
Old Generation Stop&Copy algorithm for first implementation
Large Arrays Memory requested per object from OS Never moved
Regions (identified by static analysis) No object referenced from outside Can reference outside objects Usually freed without GC Processed by Young GC
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Experiments
Full GC # normal GC 2 hybrid normalized 0
Incremental GC # normal GC 219 hybrid normalized 49
Full GC time (s) normal GC 0.95 hybrid normalized 0.00
Incremental GC time (s) normal GC 4.68 hybrid normalized 1.64
23
Lines of code 12,581
Annotations 7
• Benchmark: SPECjbb2005 • Configuration:
– Normal GC & hybrid normalized – 256MB young gen.
Region enter # hybrid normalized 3000030 Region deallocation # hybrid normalized 2999952 Max stack depth hybrid normalized 3
Allocated memory (MByte) all configurations 56777
GC freed memory (MByte) normal GC 56497 hybrid normalized 12124
Region freed memory (MByte) hybrid normalized 43625 Region freed memory (%) hybrid normalized 77.62% Max region stack size (KByte) hybrid normalized 583
Execution time (s) normal GC 127.43 hybrid normalized 124.12
Speedup (%) hybrid normalized 3%
• 78% of memory region allocatable • 583 KB maximum region stack size
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Execution and GC time, varying young generation size
24
0
50
100
150
200
1 2 4 8 16 32 64 128 256
Tim
e (se
cond
s)
Young Generation Size (MByte)
Execution Time: Normal GC Execution Time: Hybrid NormalizedGC Time: Normal GC GC Time: Hybrid Normalized
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Thank You!
• More information on hybrid memory management – Codrut Stancu, Christian Wimmer, Stefan Brunthaler, Per Larsen, Michael Franz:
Safe and Efficient Hybrid Memory Management for Java. In Proceedings of the International Symposium on Memory Management, pages 81–92. ACM Press, 2015. doi:10.1145/2754169.2754185
• More information on Graal and Truffle: – https://wiki.openjdk.java.net/display/Graal/Publications+and+Presentations
25
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 26
27