© Richard Jones ISMM 2008, Tucson, AZ 001
A Study of Java Object Demographics
Richard Jones
Chris Ryder
Computing Laboratory
University of Kent
Overview
Motivation & Contribution
Object demographics examples
Data capture
Clustering
Program inputs
Calling context
Related work
Tomorrow Never Dies?
Exploiting program behaviour
Object segregation
• Generations
• Older-First
• Immortal region
• Large object areas
Idea:
• Segregate objects by age, size, type, mortality, etc.
• Collect regions under different policies and mechanisms.
Choice of GC
• Select the best GC for the application a priori.
• Hot-swap running GC.
Idea:
• Different applications have different demographics.
• Respond to phase changes.
Dr. No ‘one size fits all’
Most systems manage objects uniformly
• E.g. allocate all objects in a nursery and collect all nursery objects at the same time, promoting to the same older generation.
Pre-tenuring GC uses a very simple classification
• E.g. short-lived, long-lived, immortal.
Contributions
A detailed study of Java object demographics reveals
• A richer landscape than short/long/immortal.
• Distinct behaviour of application, library and JVM objects.
• Clusters of allocation sites, stable across program inputs.
• A small number of clusters dominate.
• Context is an important predictor for library allocation.
The Living Daylights
_213_javac, speed 100
Compiles JavaLex scanner 4 times.
Allocates [char] for a GNU classpath internal String constructor.
6% of total allocation.
85% of these objects are very short-lived.
A few are immortal.
Some survive to the end of the phase.
A few are long-lived.
[Figure: plot with axes Age and ToD; annotations: Lifespan, no-go area (ToD < Age)]
Die Another Day
DaCapo hsqldb, default input.
4 sites
17% volume
95% space rental [volume x lifetime]
Scarcely any objects are very short-lived.
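The contrast between volume and space rental on this slide can be illustrated with a small sketch. It assumes space rental is the sum of size x lifetime over a site's objects, as the slide's "[volume x lifetime]" suggests; the site names and numbers are hypothetical, not measurements from hsqldb.

```python
def space_rental(objects):
    """Sum of size x lifetime over (size, lifetime) pairs."""
    return sum(size * lifetime for size, lifetime in objects)

# Hypothetical (size in bytes, lifetime) records per allocation site
sites = {
    "long_lived_site": [(64, 1000), (64, 1200)],
    "short_lived_site": [(64, 2)] * 10,
}

total_volume = sum(size for objs in sites.values() for size, _ in objs)
total_rental = sum(space_rental(objs) for objs in sites.values())

for name, objs in sites.items():
    volume_share = sum(size for size, _ in objs) / total_volume
    rental_share = space_rental(objs) / total_rental
    print(f"{name}: {volume_share:.0%} volume, {rental_share:.0%} space rental")
```

In this made-up data the long-lived site allocates the smaller share of bytes yet accounts for almost all of the space rental, which is the shape of result the slide reports.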
Live And Let Die
DaCapo fop, default input
18 sites
19% volume
• 8.29% short-lived
• 9.27% immortal
16% space rental
For Your Eyes Only
MemTrace compiles method to…
• Record allocation sites.
• Modify allocation routines.
  – Tag object header with site & position in calling context tree.
  – Emit allocation record.
• Benefit: same framework as for method specialisation [ISMM07].
MemTrace profiles using…
• Baseline compiler: focus on application objects.
• Forced full collections (64K granularity).
  – GCspy framework to log death events.
  – Exaggerates lifetimes of short-lived objects.
Casino Royale
Aim
• Characterise lifetimes of objects allocated by a site.
• Identify sites with similar lifetimes.
We call the cumulative frequency curve the lifetime distribution function (ldf) of the site.
• Expect collaborating sites to have similar ldf’s.
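A minimal sketch of an ldf as an empirical cumulative frequency curve follows. The function name `ldf`, the sample data and the unit of lifetime are assumptions made for illustration; the slides do not give an implementation.

```python
from bisect import bisect_right

def ldf(lifetimes):
    """Return F, where F(t) is the fraction of a site's objects with lifetime <= t."""
    ordered = sorted(lifetimes)
    n = len(ordered)
    if n == 0:
        raise ValueError("site allocated no objects")

    def F(t):
        # bisect_right counts how many recorded lifetimes are <= t
        return bisect_right(ordered, t) / n

    return F

# Hypothetical lifetimes for one allocation site (units are an assumption)
site_a = [10, 12, 15, 20, 5000]
F = ldf(site_a)
print(F(20))    # 0.8
print(F(5000))  # 1.0
```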
From Russia With Love
Compare ldf’s statistically for some confidence n%
• Kolmogorov-Smirnov Two Sample test
• D = the maximum difference between 2 frequency distributions Ei(t)
• p(D is significant) < n?
• Benefit: non-parametric, distribution-free, cheap.
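The D statistic described above can be sketched in a few lines. This is an illustrative implementation, not the authors' code: `ks_statistic` and `critical_value` are hypothetical names, and the threshold uses the standard large-sample approximation for the two-sample test, which may differ from how the study decided significance.

```python
import math

def ks_statistic(xs, ys):
    """D = max over t of |E1(t) - E2(t)| for two empirical distributions."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(xs[i], ys[j])
        # Step both curves past every sample equal to t
        while i < n and xs[i] == t:
            i += 1
        while j < m and ys[j] == t:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def critical_value(n, m, alpha):
    """Large-sample threshold: D above this suggests different distributions."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))

# Hypothetical lifetimes from two allocation sites
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

Two sites would be treated as having the same ldf when D stays below the threshold for the chosen confidence level.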
Thunderball
Greedy, gravitational clustering
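The slide names the technique without giving the algorithm. The sketch below is one hypothetical reading, not the authors' procedure: sites are visited heaviest first, each joins the first cluster whose pooled lifetimes lie within a Kolmogorov-Smirnov distance `threshold`, and a cluster grows "heavier" as it absorbs sites. All names, the ordering rule and the threshold are assumptions.

```python
def ks_distance(xs, ys):
    """Maximum gap between the two empirical cumulative distributions."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(xs[i], ys[j])
        while i < n and xs[i] == t:
            i += 1
        while j < m and ys[j] == t:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def greedy_cluster(sites, threshold):
    """sites maps a site name to its list of lifetimes; returns lists of names."""
    # Assumption: heavier sites (more objects) seed clusters first
    order = sorted(sites, key=lambda s: len(sites[s]), reverse=True)
    clusters = []
    for name in order:
        for c in clusters:
            if ks_distance(sites[name], c["lifetimes"]) <= threshold:
                c["members"].append(name)
                c["lifetimes"].extend(sites[name])  # cluster gains mass
                break
        else:
            clusters.append({"members": [name], "lifetimes": list(sites[name])})
    return [c["members"] for c in clusters]

# Hypothetical data: A and B behave alike, C does not
sites = {"A": [1, 2, 3, 4, 5, 6], "B": [1, 2, 3, 5], "C": [100, 200, 300]}
print(greedy_cluster(sites, 0.3))  # [['A', 'B'], ['C']]
```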
DaCapo: all allocation
Immortal cluster: always cluster 0
DaCapo: application packages
Immortal cluster: always cluster 0
You Only Live Twice
Does an allocation site generate the same lifetime behaviour regardless of input?
Do allocation sites share the same cluster from one input to another?
• i.e. continue to behave in the same way as each other?
• Compare cluster membership with Adjusted Rand Index

      antlr              jython             pmd                ps
      Co   Ap   Li   VM  Co   Ap   Li   VM  Co   Ap   Li   VM  Co   Ap   Li   VM
SD    0.9  0.5  1.0  1.0 0.7  1.0  1.0  1.0 0.9  1.0  1.0  1.0 1.0  1.0  1.0  0.9
SL    0.8  0.4  0.9  1.0 0.7  0.7  1.0  1.0 0.7  1.0  1.0  1.0 0.8  0.8  1.0  0.9
DL    1.0  1.0  1.0  1.0 1.0  1.0  1.0  1.0 0.8  1.0  1.0  1.0 0.6  0.7  1.0  0.9
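For readers unfamiliar with the Adjusted Rand Index, here is a small self-contained sketch of the standard formula. It is not taken from the study: the function name and the example labels are hypothetical, with each list giving the cluster assigned to the same sequence of allocation sites under one program input.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI of two clusterings of the same items, given as parallel label lists."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # nothing to disagree about
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both clusterings have one cluster
    return (sum_ij - expected) / (max_index - expected)

# Hypothetical cluster ids for four sites under two inputs
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

A value of 1.0 means the sites are grouped identically under both inputs, even if the cluster ids differ; values near 0 indicate agreement no better than chance.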
The World Is Not Enough?
Earlier studies
• Java: site is sufficient [Blackburn et al, OOPSLA01]
• C: more context required [Zorn & Seidl, ASPLOS98]
Calling context
• <site+method0, method1, method2, …>
• Increasing depth of context splits an ldf into 1 or more.
Compare the variance of site ldf’s
• Variance of program = weighted sum of the variances of its ldf’s
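The effect of adding context can be sketched as follows. Two things are assumptions here, since the slide does not define them: the variance of an ldf is taken to be the variance of the lifetimes behind it, and each ldf is weighted by its share of allocated objects. The data are hypothetical.

```python
from statistics import pvariance

def program_variance(groups):
    """Weighted sum of per-ldf variances; weight = share of objects (assumed)."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * pvariance(g) for g in groups)

# One site whose lifetimes depend on the caller (hypothetical data)
via_caller_x = [1, 2, 3]
via_caller_y = [100, 101, 102]

depth0 = [via_caller_x + via_caller_y]  # site only: one mixed ldf
depth1 = [via_caller_x, via_caller_y]   # site + one level of calling context

print(program_variance(depth0) > program_variance(depth1))  # True
```

When a deeper context separates a mixed ldf into tighter ones, the weighted variance falls; when it splits an ldf into identical copies, it does not.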
Context (2)
[Figure panels: All, Jikes RVM, Library, Application]
Variance as a multiple of depth =
A View To A Kill
Related work: choice of GC
• Fitzgerald and Tarditi [ISMM00].
• Hot-swapping: Printezis [JVM01]; Soman, Krintz, Bacon [ISMM04]; Singer, Brown & Watson [ISMM07].
• Thomas [Inf Proc Letters '95] tailors GC to the program.
Demographics
• Dieckman and Holzle [ECOOP98] focus on reference densities, proportion of arrays, etc.
• DaCapo [OOPSLA06] characterise benchmarks by heap-related metrics.
• Pretenuring: Cheng, Harper, Lee [PLDI98]; Harris [ISMM00]; Blackburn et al [TOPLAS07]; Marion, Jones, Ryder [ISMM07].
• Merlin [SIGMETRICS02].
Conclusions
No one size of collector fits all.
Programs exhibit only a few distinct object lifetime distributions.
These are richer than short/long/immortal.
A very small number of clusters dominate.
Clusterings are stable across inputs.
Calling context is important for libraries.
http://www.cs.kent.ac.uk/projects/gc/demographics
Questions?