ISMM 2008 1
Department of Computer Sciences
No Bit Left Behind: The Limits of Heap Data Compression
Jennifer B. Sartor*, Martin Hirzel†, Kathryn S. McKinley*
*U Texas at Austin, †IBM Watson
Current State
  Managed languages ubiquitous
  Embedded devices
  Multicore
  Need memory efficiency!
  [Figure: cache hierarchy diagrams: one CPU with L1 and L2; two CPUs, each with its own L1, sharing one L2]
Memory Efficiency of Managed Languages
  COST
    8-94% information content in heap in 37 benchmarks [Mitchell & Sevitsky, OOPSLA 07]
    Boxed objects
    Trailing zeros in arrays
    Redundant objects
    Extra bit-width
    Data structure back-bones
  bzip2: 86%
  OPPORTUNITY
    Memory layout abstraction
    (Location + size) != identity
Related Work
  Ananian & Rinard, LCTES 03: Dom value field hash
  Appel & Goncalves, Tech Report 93: Eql obj sharing, Const field elide, Bit-width reduction
  Chen, Kandemir & Irwin, VEE 05: Dom value field elide
  Chen, et al., OOPSLA 03: Zero compr, Trail zero trim
  Cooprider & Regehr, PLDI 07: Value set indirection
  Marinov & O’Callahan, OOPSLA 03: Eql obj sharing
  Stephenson, Babb & Amarasinghe, PLDI 00: Const field elide, Bit-width reduction
  Titzer, et al., PLDI 07: Value set indirection
  Zilles, ISMM 07: Bit-width reduction
Limit Study
  Quantitatively compare heap data compression
    Surveyed literature
    Savings equations
    Methodology for evaluation
    Apples-to-apples comparison
    Future work: implementation
  Hybrid techniques
  Findings: array & hybrid compression
  58%
Hybrid Array Compression
  x0001 x0001 x0058 x0001 x0004 x0001 x0000 x0001
  Redundancy
    Equal array sharing
  x0001 x0001 x0058 x0001 x0004 x0001 x0000 x0001
Equal Object Sharing
  Marinov & O’Callahan, OOPSLA 03; Appel & Goncalves, Tech Report 93
  Two objects are equal if they have the same class & all fields have the same value
    Strictly-equal: pointer fields identical
    Deep: the objects' pointer targets are equal
  JVM stores only 1 copy in a hashtable
  Class C, N objects, D distinct; save:
    $$(N - D) \times \mathrm{sizeof}(C) - \mathrm{hashTableSize}(D, \mathrm{pntrSize})$$
  14%
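The savings expression for equal object sharing can be tried out with a minimal sketch. The slide does not say how hashTableSize is computed, so the default below (one pointer-sized slot per distinct object) is only an assumption, as are the 16-byte object size and the tuple model of an object's fields.

```python
def equal_sharing_savings(objects, sizeof_c, ptr_size=4, hash_table_size=None):
    """Limit savings for one class C:
    (N - D) * sizeof(C) - hashTableSize(D, pntrSize)."""
    if hash_table_size is None:
        # Assumption (not from the slides): one pointer-sized slot
        # per distinct object kept in the sharing table.
        hash_table_size = lambda d, p: d * p
    n = len(objects)        # N: all objects of class C
    d = len(set(objects))   # D: distinct objects (same field values)
    return (n - d) * sizeof_c - hash_table_size(d, ptr_size)


# Hypothetical class with two int fields, modelled as tuples of field values.
points = [(1, 2), (1, 2), (3, 4), (1, 2), (3, 4)]
savings = equal_sharing_savings(points, sizeof_c=16)
print(savings)  # (5 - 2) * 16 - 2 * 4 = 40
```

Comparing field tuples directly corresponds to the strictly-equal variant; the deep variant would first have to canonicalize pointer targets.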
Hybrid Array Compression
  x0001 x0001 x0058 x0001 x0004 x0001 x0000 x0001
  Redundancy
    Equal array sharing
    Value set indirection
  x0001 x0001 x0058 x0001 x0004 x0001 x0000 x0001
  Dictionary: x0001 x0058 x0004 x0000
  Indices: 0 0 1 0 2 0 3 0
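The dictionary and index row in this example can be reproduced with a short sketch. It assigns dictionary slots in order of first appearance, which is an assumption that happens to match the slide; the function name is hypothetical.

```python
def value_set_indirect(values):
    """Replace each element by a small index into a dictionary of distinct values."""
    dictionary = []   # distinct values, in order of first appearance
    index_of = {}     # value -> position in dictionary
    indices = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices


array = [0x0001, 0x0001, 0x0058, 0x0001, 0x0004, 0x0001, 0x0000, 0x0001]
dictionary, indices = value_set_indirect(array)
print([hex(v) for v in dictionary])  # ['0x1', '0x58', '0x4', '0x0']
print(indices)                       # [0, 0, 1, 0, 2, 0, 3, 0]

# Decoding is a plain lookup, so the original array is recoverable.
decoded = [dictionary[i] for i in indices]
```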
Value Set Indirection & Caching
  Cooprider & Regehr; Titzer, et al. PLDI 07
  For object field or array elements with large range of values
  Dictionary (or cache) of 256 most frequent values, instance stores small 1 byte indices
  If > 256 values, 255 in dictionary, 256th says to store rest (M) in hashtable w/ objectID
    $$\Big(\sum_{a \in T[]} a.\mathrm{length} \times (\mathrm{sizeof}(T) - 1)\Big) - \mathrm{arrayHdrSize} - 256 \times \mathrm{sizeof}(T) - \mathrm{hashTableSize}'(M, \mathrm{sizeof}(T))$$
  14%
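A rough sketch of evaluating this savings expression for all arrays of one element type T follows. Several details are assumptions not given on the slide: the dictionary cost is charged once per type, M is counted as the number of element occurrences that miss the 255 cached values, the array header is 12 bytes, and each overflow entry costs an 8-byte key plus the value.

```python
from collections import Counter


def value_set_savings(arrays, sizeof_t, array_hdr_size=12, hash_table_size=None):
    """Savings for arrays of type T when elements become 1-byte dictionary indices."""
    if hash_table_size is None:
        # Assumption: each overflow entry stores an 8-byte key plus the value.
        hash_table_size = lambda m, size: m * (8 + size)
    counts = Counter(v for a in arrays for v in a)
    if len(counts) <= 256:
        m = 0  # every value fits in the dictionary
    else:
        # 255 most frequent values are cached; index 255 is the escape code.
        cached = {v for v, _ in counts.most_common(255)}
        m = sum(c for v, c in counts.items() if v not in cached)
    per_element = sum(len(a) * (sizeof_t - 1) for a in arrays)
    dictionary_cost = array_hdr_size + 256 * sizeof_t
    return per_element - dictionary_cost - hash_table_size(m, sizeof_t)


few_values = [[i % 4 for i in range(1000)], [7] * 500]
many_values = [list(range(300))]
print(value_set_savings(few_values, sizeof_t=4))   # 4500 - 1036 - 0 = 3464
print(value_set_savings(many_values, sizeof_t=4))  # 900 - 1036 - 540 = -676
```

The second call shows that the expression can go negative when the value set is too large for the dictionary to pay for itself.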
Hybrid Array Compression 2
  x00A0 x0073 x0002 x0001 x0101 x0000 x0000 x0000
  Remove zeros
    Trim trailing zeros: x00A0 x0073 x0002 x0001 x0101 8 5
    Bit width reduce:    x0A0 x073 x002 x001 x101 8 5
    Zero compress:       x0A x73 x2 x001 x101 8 5 xAF (10101111)
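The first two steps of this example can be sketched as follows. The reading of the trailing "8 5" as original and trimmed lengths is an interpretation of the slide, and the helper names and the rounding to whole hex digits are assumptions made for illustration. Values are assumed non-negative.

```python
def trim_trailing_zeros(values):
    """Drop the run of zeros at the end; keep the original length for recovery."""
    end = len(values)
    while end > 0 and values[end - 1] == 0:
        end -= 1
    return values[:end], len(values)


def min_bit_width(values, round_to=1):
    """Smallest bit width that holds every value, rounded up to a multiple of round_to."""
    bits = max((v.bit_length() for v in values), default=0)
    bits = max(bits, 1)  # store at least one bit per element
    return -(-bits // round_to) * round_to  # ceiling to a multiple of round_to


array = [0x00A0, 0x0073, 0x0002, 0x0001, 0x0101, 0x0000, 0x0000, 0x0000]
trimmed, original_length = trim_trailing_zeros(array)
width = min_bit_width(trimmed, round_to=4)

print(original_length, len(trimmed))  # 8 5
print(min_bit_width(trimmed))         # 9, since 0x101 needs 9 bits
print(width)                          # 12, i.e. three hex digits as drawn on the slide
```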
Zero-based Object Compression
  Chen, et al. OOPSLA 03
  Remove bytes that are entirely zero
  Per object bit-map: 1 bit per byte
  Store only non-zero bytes
  45%
  Savings:
    $$\sum_{o \in \mathrm{objects}} \Big( \mathrm{zeroBytes}(o) - \Big\lceil \frac{\mathrm{totalBytes}(o)}{8} \Big\rceil \Big)$$
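A minimal sketch of zero-based compression on one object's bytes, together with the per-object term of the savings sum, is given below. The most-significant-bit-first bitmap layout and the function names are assumptions; the slide only specifies one bitmap bit per byte.

```python
import math


def zero_compress(data: bytes):
    """Return (bitmap, nonzero_bytes); bit i of the bitmap is 1 if byte i is non-zero."""
    bitmap = bytearray(math.ceil(len(data) / 8))
    nonzero = bytearray()
    for i, b in enumerate(data):
        if b != 0:
            bitmap[i // 8] |= 1 << (7 - i % 8)  # assumed MSB-first bit order
            nonzero.append(b)
    return bytes(bitmap), bytes(nonzero)


def zero_decompress(bitmap: bytes, nonzero: bytes, total_bytes: int) -> bytes:
    """Rebuild the original bytes from the bitmap and the stored non-zero bytes."""
    out = bytearray(total_bytes)
    stored = iter(nonzero)
    for i in range(total_bytes):
        if bitmap[i // 8] & (1 << (7 - i % 8)):
            out[i] = next(stored)
    return bytes(out)


def object_savings(data: bytes) -> int:
    """Per-object term: zeroBytes(o) - ceil(totalBytes(o) / 8)."""
    return data.count(0) - math.ceil(len(data) / 8)


obj = bytes.fromhex("00a0007300020001 0101")
bitmap, nonzero = zero_compress(obj)
print(bitmap.hex(), nonzero.hex())  # 55c0 a07302010101
print(object_savings(obj))          # 4 zero bytes - 2 bitmap bytes = 2
```

Summing `object_savings` over every object in a heap snapshot gives the total in the savings expression.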
Hybrid Array Compression 2
  x00A0 x0073 x0002 x0001 x0101 x0000 x0000 x0000
  Remove zeros
    Trim trailing zeros: x00A0 x0073 x0002 x0001 x0101 8 5
    Bit width reduce:    x0A0 x073 x002 x001 x101 8 5
    Zero compress:       x0A x73 x2 x001 x101 8 5 xAF
Methodology
  [Figure: pipeline from program run, through garbage collection snapshots, to a heap dump series, then an analysis representation, then Model 1 ... Model n, yielding limit savings]
Experimental Details
  Jikes Research Virtual Machine
    Java-in-Java
  DaCapo benchmarks + pseudojbb
  20-25 heap snapshots per benchmark
  MarkSweep with 2x min heap
  Analysis
    Per class
    Objects and arrays separated
    JVM+app vs application (separated in paper)
    Per heap snapshot, and over all snapshots
Technique                       Class  Array  GC/Run
Lempel-Ziv compression          X             GC
Strictly-equal object sharing   Obj    Type   GC
Deep-equal object sharing       Obj    Type   GC
Zero-based object compression   Obj    Inst   GC
Trailing zero array trimming           Inst   GC
Bit-width reduction             Fld    Inst   GC/Run
Dominant-value field hashing    Fld           GC
Lazy invariant computation      Fld           GC
Value set indirection           Fld    Type   GC
Value set caching               Fld    Type   GC
Constant field elision          Fld           Run
Dominant-value field elision    Fld           Run
[Figure panels: Value Indirection & Cache; Deep Equal Sharing; Zero Compression; Hybrid Compression]
Stability of Savings
  [Figure: fop, snapshots over time; y-axis 0% to 25%; series: combArr, maxArr, zeroArr, combCls, maxCls, bitwArr, zeroCls, hashFld, indirFld, cacheArr, strEqArr, bitwFld, charArr, trailArr, strEqCls, lazyFld, cacheFld, indirArr, boolArr]
Conclusions
  Limit study: apples-to-apples comparison of heap data compression techniques
  Potential to reduce memory inefficiencies in managed languages
    Arrays
    Hybrids
  Future: save space
    Challenge: efficient detection & recovery
Thank you!