Garbage First and You The new* Garbage Collector in the JVM Kai Koenig @AgentK

Upload: kai-koenig

Post on 08-Aug-2015


Garbage First and You

The new* Garbage Collector in the JVM

Kai Koenig @AgentK

Me

Web/Mobile Developer since the late 1990s

Interested in: Java & JVM, CFML, Functional Programming, Go, JS, Android, Raspberry Pi

And this is my view of the world…

Agenda

1. The JVM and Garbage Collection in 5 mins

2. Academic ideas behind G1

3. The G1 collector

4. Tuning G1 and practical implications

5. Further changes in Java 8

1. JVM and GC in 5 minutes

Fundamentals

The most simplistic view of the JVM:


“Java virtual machine (JVM) interprets compiled Java binary code (called bytecode) for a computer’s processor (or hardware platform) so that it can perform a Java program's instructions.”

1.1 JVM Architecture

High Level view (1)

Bytecode in .class files has the same semantics as the .java code

High Level view (II)

Java stack vs heap memory

Each method call creates a new stack frame, which has an operand stack, an array of local variables and a program counter. → Seen ‘Stack Traces’ in a Java error before?

Exception in thread "main" java.lang.NullPointerException
        at com.example.myproject.Book.getTitle(Book.java:16)
        at com.example.myproject.Author.getBookTitles(Author.java:25)
        at com.example.myproject.Bootstrap.main(Bootstrap.java:14)
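A rough illustration of the "one frame per method call" idea: the sketch below counts the frames on the current thread's stack at two recursion depths. The class and method names (StackFrameDemo, depthAt, extraFrames) are made up for this example; Thread.getStackTrace() is the standard API.

```java
final class StackFrameDemo {

    // Recurse `remaining` more times, then report how many frames are on this thread's stack.
    static int depthAt(int remaining) {
        if (remaining == 0) {
            return Thread.currentThread().getStackTrace().length;
        }
        return depthAt(remaining - 1);
    }

    // Frames added by `calls` nested invocations, measured against a zero-depth baseline.
    static int extraFrames(int calls) {
        return depthAt(calls) - depthAt(0);
    }

    public static void main(String[] args) {
        System.out.println("Extra frames for 5 nested calls: " + extraFrames(5));
    }
}
```

Every nested call shows up as one more "at ..." line in a stack trace like the one above.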

Stack and Heap

1.2 Garbage Collection

Heap management

The JVM has no way of knowing the lifespan of a certain object in advance.

Generational Memory Management is a solution to overcome this issue and fragmentation:

- Young Generation

- Old Generation / Tenured Generation

- Sometimes: Permanent Generation
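To see which generations your own JVM actually uses, you can ask it at runtime. This is a minimal sketch using the standard java.lang.management API; the class name HeapPools is made up, and the pool names you get back depend on the JVM version and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

final class HeapPools {

    // Names of the memory pools of the given type (HEAP or NON_HEAP) in the running JVM.
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Typically lists eden, survivor and old/tenured spaces, named after the active collector.
        System.out.println("Heap pools: " + poolNames(MemoryType.HEAP));
    }
}
```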

Contiguous generational heap

Garbage Collector selection criteria

Efficiency / Throughput

Concurrency

Overhead

JVM version you’re on


YG Collectors: Parallel

Parallel Mark-and-Copy (MaC, since Java 1.4.2) distributes the marking and copying phases over multiple threads.

The actual collection is still stop-the-world, but for a much shorter period of time.

Default YG collector since Java 5 on machines with 2+ cores or CPUs; otherwise enable it with -XX:+UseParallelGC.


OG Collectors: Concurrent

Up to Java 6/7, Concurrent Mark-and-Sweep (CMS) is the preferred OG collector if you want to minimise stop-the-world collections.

CMS via -XX:+UseConcMarkSweepGC

Well suited for larger heaps (but be aware of fragmentation). There’s an ‘incremental’ mode for systems with 1-2 CPU cores.
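If you are unsure which YG and OG collectors a JVM ended up with, the garbage collector MXBeans will tell you. A minimal sketch, assuming only the standard java.lang.management API; the class name ActiveCollectors is made up and the reported collector names vary by JVM and flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectors {

    // One line per active collector: name, number of collections so far, accumulated time.
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```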

Stop-the-world and concurrent collections

2. Academic ideas behind G1

“Garbage-First Garbage Collection”

Research paper originally published in 2004 by David Detlefs, Christine Flood, Steve Heller and Tony Printezis of Sun Research.


The actual research project started in the late 1990s to overcome common issues in Garbage Collection techniques known and used at the time.

Core ideas

Four core elements:

- SATB concurrent marking algorithm

- Better way to achieve a real-time goal

- Get rid of a contiguous heap and use regions

- Compacting and predictable

Snapshot-at-the-beginning

SATB performs a periodic analysis of global reachability (liveness) and provides completeness.

Results:

- Accurate counts of live data in each region

- Completeness: garbage is eventually identified

- Very low pause time
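The snapshot idea is easier to see in a toy model. The sketch below is not G1's implementation: it is a tiny single-threaded simulation in which every object has one reference field, and a pre-write barrier logs the overwritten reference while marking is active. All names (SatbSketch, Node, write, step) are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SatbSketch {

    static final class Node {
        final String name;
        Node ref;        // a single outgoing reference keeps the model small
        boolean marked;
        Node(String name, Node ref) { this.name = name; this.ref = ref; }
    }

    private final Deque<Node> pending = new ArrayDeque<>();
    private boolean marking;

    // Begin a cycle: the snapshot is whatever is reachable from the roots right now.
    void start(Node... roots) {
        marking = true;
        for (Node root : roots) pending.push(root);
    }

    // One unit of marking work: mark a node and queue its referent.
    void step() {
        Node n = pending.poll();
        if (n == null || n.marked) return;
        n.marked = true;
        if (n.ref != null) pending.push(n.ref);
    }

    // Mutator write with a pre-write barrier: the overwritten value is queued so it still gets marked.
    void write(Node holder, Node newRef) {
        if (marking && holder.ref != null) pending.push(holder.ref);
        holder.ref = newRef;
    }

    void finish() {
        while (!pending.isEmpty()) step();
        marking = false;
    }

    // Scenario: root -> a -> b, and the edge a -> b is removed while marking is in progress.
    static boolean everythingInSnapshotMarked() {
        Node b = new Node("b", null);
        Node a = new Node("a", b);
        Node root = new Node("root", a);
        SatbSketch gc = new SatbSketch();
        gc.start(root);
        gc.step();          // marks root, queues a
        gc.write(a, null);  // mutator drops a -> b; the barrier queues b
        gc.finish();
        return root.marked && a.marked && b.marked;
    }
}
```

Without the barrier, b would be missed even though it was live when marking started. With it, b is kept for this cycle and, if it really is garbage by now, identified in a later one.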


‘Soft’ real-time goal and regions

Before G1, garbage collectors tried to achieve hard real-time goals by:

- making collection interruptible

- working on the granularity of object levels.

G1 works on a coarser granularity of regions:

- Chooses regions to collect that match goal

- Collection of a region can be delayed
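"Chooses regions that match the goal" can be sketched as a greedy selection. This is a simplified illustration, not G1's real heuristic: it sorts regions by reclaimable garbage and picks those whose predicted cost still fits the pause budget, leaving the rest for a later collection. Region, garbageBytes and costMillis are hypothetical names and numbers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CollectionSetSketch {

    static final class Region {
        final int id;
        final long garbageBytes;  // reclaimable space, known from marking
        final double costMillis;  // predicted time to collect this region
        Region(int id, long garbageBytes, double costMillis) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.costMillis = costMillis;
        }
    }

    // Garbage first: take the regions with the most garbage that still fit the pause budget.
    static List<Integer> choose(List<Region> regions, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingLong((Region r) -> r.garbageBytes).reversed());
        List<Integer> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.costMillis <= pauseBudgetMillis) {
                chosen.add(r.id);
                spent += r.costMillis;
            }
            // Regions that do not fit are simply delayed to a later collection.
        }
        return chosen;
    }
}
```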


3. The G1 Collector

3.1 Basics

G1 (Garbage First)

G1 is a ‘replacement’ for CMS in Java 7+

Benefits:

- Consistently low-pause

- Adaptable

- Less fragmentation than CMS

- Less need for ongoing tuning

- Best collector for a really large heap

Fundamental ideas (I)

Aimed at heaps of 6 GB or more; below that, consider staying with CMS

Enable: -XX:+UseG1GC

Provide minimal set of expectations and let G1 do the job:

- Heap size (min/max)

- How much CPU time can the application use?

- How much CPU time can G1 use?

Fundamental ideas (II)

‘Main’ setup parameter: -XX:MaxGCPauseMillis=<n>

G1 is not an OG-only collector like CMS

G1 splits the whole area of heap memory:

- ~2000 regions

- Size between 1 and 32 MB each, usually chosen automatically by the JVM
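The relation between heap size, region count and region size is simple arithmetic. The sketch below is an approximation that assumes the JVM aims for about 2048 regions and power-of-two region sizes; the exact ergonomics may differ between JVM versions, and RegionSizeSketch is a made-up name.

```java
final class RegionSizeSketch {

    static final long MB = 1024L * 1024L;
    static final long TARGET_REGIONS = 2048;  // assumption: roughly the "~2000 regions" above

    // Approximate region size: heap / target count, rounded down to a power of two, kept within 1-32 MB.
    static long regionSizeBytes(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGIONS, MB);
        long powerOfTwo = Long.highestOneBit(raw);
        return Math.min(powerOfTwo, 32 * MB);
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * MB;  // 8 GB heap as an example
        System.out.println(regionSizeBytes(heap) / MB + " MB regions");
    }
}
```

If the automatic choice does not suit you, the size can be set by hand with -XX:G1HeapRegionSize.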


Region setup in G1

Note: There is another region type H (humongous)

3.2 G1 YoungGen

A YG collection in G1 (before)

[Diagram legend: Non-Allocated, Old Generation, Young Generation, Recently copied YG]

A YG collection in G1 (stop-the-world)

[Diagram legend: Non-Allocated, Old Generation, Young Generation, Recently copied YG]

A YG collection in G1 (result)

[Diagram legend: Non-Allocated, Old Generation, Young Generation, Recently copied YG]

YG in G1 - Summary

The YG is a set of non-contiguous regions, which helps resizing after a collection.

YG collections in G1 are stop-the-world events and all application threads will stop.

YG collections are done in multiple, parallel threads.

Leftover (alive) objects → move to a survivor or OG region.

3.3 G1 OldGen

G1 and the OG - overview (I)

1. Initial Mark (stop-the-world and piggybacking on a YG collection)

2. Root Region Scan (blocks YG from happening)

3. Concurrent Marking

4. Remark (stop-the-world; much faster than in CMS thanks to a new algorithm)

G1 and the OG - overview (II)

5. Cleanup (partly stop-the-world, partly concurrent)

6. Copying (stop-the-world, piggybacking on YG collections)

An OG collection in G1 (initial marking)

Marks root regions with references to OG objects.

[Diagram legend: Non-Allocated, Old Generation, Young Generation, Recently copied YG, Recently copied OG]

An OG collection in G1 (concurrent marking)

Marks empty regions and calculates object ‘liveness’.

[Diagram legend: Non-Allocated, Old Generation, Young Generation, Recently copied YG, Recently copied OG]

An OG collection in G1 (remark)

Empty regions are removed and reclaimed.

[Diagram legend: Non-Allocated, Old Generation, Young Generation, Recently copied YG, Recently copied OG]

An OG collection in G1 (cleanup & copy)

Regions with the lowest liveness get collected together with YG collections (‘mixed’ collections).

[Diagram legend: Non-Allocated, Old Generation, Young Generation, Recently copied YG, Recently copied OG]

An OG collection in G1 (result)

Collection is done and leftovers are compacted.

[Diagram legend: Non-Allocated, Old Generation, Young Generation, Recently copied YG, Recently copied OG]

OG in G1 - Summary

Concurrent Marking:

- Liveness info determines where to collect

- No sweeping phase like in CMS

Remark:

- SATB algorithm is much faster than the approach used in CMS

- Completely empty regions are reclaimed

Cleanup: optimised for ‘mixed’ collections

4. Tuning G1

Do not trust consultants, blog posts, mailing list discussions etc. telling you what the ‘best’ JVM settings would be.


There is no such thing as global best settings. JVM settings depend on the environment, the application and the projected/actual usage.

JVM settings and logging

How do you find out what’s happening in your JVM?

-XX:+PrintGC

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

or

-XX:+PrintGCDateStamps
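When you inherit a server, it is worth checking which of these flags the JVM was really started with. A minimal sketch: the filter method is plain string matching, and main feeds it the arguments reported by the standard RuntimeMXBean. The class name GcLoggingFlags is made up.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcLoggingFlags {

    // Keep only the -XX:+PrintGC... style flags from a list of JVM arguments.
    static List<String> printGcFlags(List<String> jvmArgs) {
        List<String> found = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:+PrintGC")) {
                found.add(arg);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Arguments the running JVM was launched with (excluding the main class and its arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("GC logging flags in use: " + printGcFlags(jvmArgs));
    }
}
```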

[GC 64781K->22983K(71360K), 0.0242084 secs]
[GC 68487K->25003K(77888K), 0.0194041 secs]
[Full GC 25003K->20302K(89600K), 0.1713420 secs]
[GC 70670K->21755K(90048K), 0.0054093 secs]
[GC 71913K->46558K(94912K), 0.0295257 secs]
[Full GC 46558K->45267K(118336K), 0.2144038 secs]
[GC 88214K->84651K(133056K), 0.0674443 secs]
[Full GC 84651K->84633K(171648K), 0.1739369 secs]
[GC 117977K->115114K(180736K), 0.0623399 secs]
[GC 158613K->157136K(201152K), 0.0591171 secs]
[Full GC 157136K->157098K(254784K), 0.1868453 secs]
[GC 160678K->160455K(261184K), 0.0536678 secs]
01/24 19:36:22 Debug [scheduler-1] - Next mail spool run in 15 seconds.
[GC 202912K->200819K(268288K), 0.0625820 secs]
[Full GC 200819K->200776K(332224K), 0.2121724 secs]
[GC 213293K->212423K(339520K), 0.0426462 secs]
[GC 259465K->256115K(340288K), 0.0645039 secs]
[Full GC 256115K->255462K(418432K), 0.3226731 secs]
[GC 281947K->279651K(421760K), 0.0530268 secs]
[GC 331073K->323785K(422720K), 0.0695117 secs]
[Full GC 323785K->323697K(459264K), 0.2139458 secs]
[Full GC 364365K->361525K(459264K), 0.2180439 secs]
[Full GC 400859K->400859K(459264K), 0.1702890 secs]
[Full GC 400859K->43989K(274112K), 0.2642407 secs]
[GC 95197K->93707K(273216K), 0.0338568 secs]
[GC 146978K->140363K(276032K), 0.0664380 secs]
[GC 193696K->189635K(277952K), 0.0630006 secs]
[Full GC 189635K->189604K(425920K), 0.1913979 secs]
[GC 219773K->205157K(426048K), 0.0442126 secs]
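Each entry reads as: kind of collection, heap used before -> after (total heap size), pause duration. A small parser makes that explicit. This is a sketch for this simple one-line format only (more detailed log formats look different), and SimpleGcLogLine is a made-up class name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimpleGcLogLine {

    // Matches e.g. "[GC 64781K->22983K(71360K), 0.0242084 secs]" or the "Full GC" variant.
    private static final Pattern LINE = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    final boolean full;    // true for "Full GC" entries
    final long beforeKb;   // heap used before the collection
    final long afterKb;    // heap used after the collection
    final long heapKb;     // total heap size at that point
    final double seconds;  // pause duration

    private SimpleGcLogLine(Matcher m) {
        full = m.group(1).equals("Full GC");
        beforeKb = Long.parseLong(m.group(2));
        afterKb = Long.parseLong(m.group(3));
        heapKb = Long.parseLong(m.group(4));
        seconds = Double.parseDouble(m.group(5));
    }

    // Returns null for lines that are not GC entries, such as application log output.
    static SimpleGcLogLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        return m.matches() ? new SimpleGcLogLine(m) : null;
    }

    long reclaimedKb() {
        return beforeKb - afterKb;
    }
}
```

A Full GC that reclaims next to nothing while the heap keeps growing is usually the first thing to look for in output like this.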

The two main tuning parameters for G1

-XX:MaxGCPauseMillis

Soft goal target for maximum GC pause time - default 200ms

-XX:InitiatingHeapOccupancyPercent

Percentage of heap occupancy to start concurrent GC cycle
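To get a feeling for the second parameter, it helps to turn the percentage into bytes. The sketch below assembles the flags as strings and computes the occupancy at which a concurrent cycle would start. The class G1Settings is invented, and the value 45 is only an example occupancy percentage, not a recommendation.

```java
import java.util.Arrays;
import java.util.List;

final class G1Settings {

    // JVM arguments for G1 with the two main tuning parameters.
    static List<String> jvmArgs(int maxPauseMillis, int ihopPercent) {
        return Arrays.asList(
                "-XX:+UseG1GC",
                "-XX:MaxGCPauseMillis=" + maxPauseMillis,
                "-XX:InitiatingHeapOccupancyPercent=" + ihopPercent);
    }

    // Heap occupancy (in bytes) at which the concurrent cycle is started.
    static long markingThresholdBytes(long heapBytes, int ihopPercent) {
        return heapBytes * ihopPercent / 100;
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024;  // example: 8 GB heap
        System.out.println(String.join(" ", jvmArgs(200, 45)));
        System.out.println("Concurrent cycle starts at " + markingThresholdBytes(heap, 45) + " bytes");
    }
}
```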

Good practice

Avoid setting absolute generation sizes with G1:

- Breaks self-optimisation and target times

- Causes issues in region sizing & distribution

Avoid evacuation failures (‘space overflow’):

- Increase the heap promotion ceiling (default 10): -XX:G1ReservePercent

- Increase the number of marking threads: -XX:ConcGCThreads

Real-world observations (I)

G1 has a noticeable tradeoff between latency and throughput:

- G1: ~90-92% throughput goal

- Parallel Hotspot GC: ~98-99% goal

If you want higher throughput - relax the pause time goal.
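The throughput goal is just the share of time not spent in GC. With -XX:GCTimeRatio=<N> the JVM aims to spend about 1/(1+N) of total time collecting, so the goal can be computed as below. The ratios 9 and 99 in main are assumptions picked to match the ~90% and ~99% figures above; ThroughputGoal is a made-up class name.

```java
final class ThroughputGoal {

    // Target share of time spent in application code for a given GCTimeRatio value N.
    static double goal(int gcTimeRatio) {
        return gcTimeRatio / (1.0 + gcTimeRatio);
    }

    // Throughput actually achieved, from total pause time and elapsed wall-clock time.
    static double measured(double totalPauseSeconds, double elapsedSeconds) {
        return 1.0 - totalPauseSeconds / elapsedSeconds;
    }

    public static void main(String[] args) {
        System.out.println("Ratio 9:  " + goal(9));
        System.out.println("Ratio 99: " + goal(99));
        System.out.println("5 s of pauses in 100 s: " + measured(5, 100));
    }
}
```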

Real-world observations (II)

‘Mixed’ GCs are on the more expensive end in G1. You can tamper with the criteria through experimental settings*:

-XX:G1MixedGCLiveThresholdPercent

-XX:G1HeapWastePercent

* Might not be available on your platform, CPU architecture or JVM version.

Real-world observations (III)

CPU usage tends to increase ~5-15% when using G1 vs. CMS.

G1 seems to be better at reclaiming the maximum heap size used.

The more uniform your object size distribution is, the better CMS does compared to G1. With a very heterogeneous object size distribution, G1 tends to be better.

5. Changes in Java 8

Most of the previous is valid for Java 7 and 8 -

but…

G1 string de-duplication

Java 8u20 brings a new String de-duplication optimisation.

The G1 collector can now identify strings that are duplicated across the heap and repoint them to the ‘same’ internal char[] representation:

-XX:+UseStringDeduplication
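What counts as a "duplicated" string? Two String objects with equal content that were built separately and therefore carry their own backing arrays. The sketch below creates such a pair; DuplicateStrings is a made-up name. De-duplication works on the internal representation only, so equals() and == behave the same with or without the flag.

```java
final class DuplicateStrings {

    // Builds a new String at runtime, so each call yields a separate object with its own backing array.
    static String build(String left, String right) {
        return new StringBuilder(left).append(right).toString();
    }

    // True when the two strings have the same content but are different objects.
    static boolean equalButDistinct() {
        String a = build("garbage", "-first");
        String b = build("garbage", "-first");
        return a.equals(b) && a != b;
    }

    public static void main(String[] args) {
        System.out.println("Equal content, distinct objects: " + equalButDistinct());
    }
}
```

Applications that parse lots of repeated text (XML, JSON, database rows) tend to hold many such pairs, which is where the optimisation can pay off.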

Java 8 and the PermGen

It’s gone and it’s been replaced by a Metaspace (Oracle’s JRockit actually never had a PermGen). Class metadata is now stored in native memory.


A word of warning: Oracle tries to sell the Metaspace as the new piece of awesomeness that ‘just works’, but it still needs observation and tuning!
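Observation can start small. The sketch below reads the current usage of the memory pool named "Metaspace" through the standard java.lang.management API. The pool name is an assumption about how HotSpot labels it, so the method returns -1 when no such pool exists; MetaspaceWatch is a made-up class name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class MetaspaceWatch {

    // Bytes currently used by the pool named "Metaspace", or -1 if this JVM exposes no such pool.
    static long usedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1 : usage.getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("Metaspace used: " + usedBytes() + " bytes");
    }
}
```

Logging this value over time shows whether class metadata keeps growing, for example after repeated redeployments.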

Retired JVM GC combinations

Some rarely-used combinations of garbage collectors have been deprecated:

http://openjdk.java.net/jeps/173


Important: iCMS has been deprecated!


Additional Resources

Garbage Collection with Garbage First Research Paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.63.6386&rep=rep1&type=pdf

Understanding G1 logs: https://blogs.oracle.com/poonam/entry/understanding_g1_gc_logs

“The JVM is your friend” - my more general GC talk at cf.Objective() 2014: http://www.slideshare.net/AgentK/jvm-isyourfriend

Java Performance: The Definitive Guide http://www.amazon.com/Java-Performance-The-Definitive-Guide/dp/1449358454


Photo credits

https://www.flickr.com/photos/aigle_dore/6973012997/
https://www.flickr.com/photos/65694112@N05/6147388744
https://www.flickr.com/photos/teclasorg/2852716491
https://www.flickr.com/photos/salendron/5390633053
https://www.flickr.com/photos/tim_ellis/2269499855
https://www.flickr.com/photos/apocalust/5729262611
https://www.flickr.com/photos/fkhuckel/16995618202
https://www.flickr.com/photos/poetprince/3389459474
https://www.flickr.com/photos/mario-mancuso/8331716569
https://www.flickr.com/photos/openindiana/16277077790/

Get in touch

Kai Koenig

[email protected] www.bloginblack.de Twitter: @AgentK