inside the jvm - follow the white rabbit!

Inside the JVM Follow the white rabbit! Sylvain Wallez - @bluxte Toulouse JUG - 2017-04-26

Upload: sylvain-wallez

Post on 21-Jan-2018


Page 1: Inside the JVM - Follow the white rabbit!

Inside the JVM
Follow the white rabbit!

Sylvain Wallez - @bluxte
Toulouse JUG - 2017-04-26

Page 2: Inside the JVM - Follow the white rabbit!

Who’s this guy?

Software engineer at Elastic (cloud team)

Previously:

● IoT tech lead at OVH
● CEO at Actoboard
● Backend architect at Sigfox
● CTO at Goojet/Scoop it
● Lead architect at Joost
● Member of the Apache Software Foundation
● Cofounder & CTO at Anyware Technologies (now part of Sierra Wireless)

Page 3: Inside the JVM - Follow the white rabbit!

Agenda

● How it started: let’s optimize 6 lines of (hot) code!
● Profiling memory usage
● What’s in a class file?
● Micro-benchmarking with JMH
● Exploration of the OpenJDK source code

Page 4: Inside the JVM - Follow the white rabbit!

How it started
Let’s optimize 6 lines of (hot) code

Page 5: Inside the JVM - Follow the white rabbit!

On the Couchbase blog...
“JVM Profiling - Lessons from the trenches”: optimize the conversion of a protocol error code into a readable message.

public enum KeyValueStatus {

    UNKNOWN((short) -1, "Unknown code"),
    SUCCESS((short) 0x00, "The operation completed successfully"),
    ERR_NOT_FOUND((short) 0x01, "The key does not exists"),
    ERR_EXISTS((short) 0x02, "The key exists in the cluster"),
    ERR_TOO_BIG((short) 0x03, "The document exceeds the maximum size"),
    ERR_INVALID((short) 0x04, "Invalid request"),
    ERR_NOT_STORED((short) 0x05, "The document was not stored"),
    ...

    private final short code;
    private final String description;

    KeyValueStatus(short code, String description) {
        this.code = code;
        this.description = description;
    }

    public static KeyValueStatus valueOf(final short code) {
        for (KeyValueStatus value: values()) {
            if (value.code() == code) return value;
        }
        return UNKNOWN;
    }
    ...
}

Page 6: Inside the JVM - Follow the white rabbit!

On the Couchbase blog...
Finding: values() is allocating memory

public static KeyValueStatus valueOf(final short code) {
    for (KeyValueStatus value: values()) {
        if (value.code() == code) return value;
    }
    return UNKNOWN;
}

Optimization: fast path on common values

public static KeyValueStatus valueOf(final short code) {
    if (code == SUCCESS.code) {
        return SUCCESS;
    } else if (code == ERR_NOT_FOUND.code) {
        return ERR_NOT_FOUND;
    } else if (code == ERR_EXISTS.code) {
        return ERR_EXISTS;
    } else if (code == ERR_NOT_MY_VBUCKET.code) {
        return ERR_NOT_MY_VBUCKET;
    }

    for (KeyValueStatus value : values()) {
        if (value.code() == code) {
            return value;
        }
    }
    return UNKNOWN;
}

If something goes wrong, it’ll make it worse! Any code outside the fast path pays for the extra comparisons and then still runs the allocating loop.

Page 9: Inside the JVM - Follow the white rabbit!

Profiling memory usage

Page 10: Inside the JVM - Follow the white rabbit!

Various kinds of memory optimization

● Memory usage / memory leaks
  ○ My application needs tons of heap
  ○ How many objects are held active?
  → Memory profiler / jmap

● Garbage collection pressure
  ○ My application spends a lot of time in the GC
  ○ How often are objects allocated?
  → Java Mission Control / jmap

Page 11: Inside the JVM - Follow the white rabbit!

jmap histograms

jmap -histo

 num #instances #bytes class name

----------------------------------------------

1: 4217124 674740720 [Lnet.bluxte.experiments.couchbase_keyvalue.KeyValueStatus;

2: 486 14947912 [I

3: 5855 493864 [C

4: 1461 166752 java.lang.Class

5: 5848 140352 java.lang.String

6: 503 136440 [B

7: 968 62480 [Ljava.lang.Object;

8: 1255 40160 java.util.HashMap$Node

9: 991 39640 java.util.LinkedHashMap$Entry

10: 258 30720 [Ljava.util.HashMap$Node;

11: 259 22792 java.lang.reflect.Method

12: 441 20952 [Ljava.lang.String;

13: 229 16488 java.lang.reflect.Field

14: 171 9576 java.util.LinkedHashMap

15: 291 9312 java.util.concurrent.ConcurrentHashMap$Node

16: 160 7680 java.util.HashMap

17: 178 7120 java.lang.ref.SoftReference

18: 89 7120 java.net.URI

19: 102 6528 java.net.URL

20: 256 6144 java.lang.Long

21: 76 6080 java.lang.reflect.Constructor

22: 258 5896 [Ljava.lang.Class;

23: 265 5680 [S

24: 166 5312 java.util.Hashtable$Entry

25: 94 5264 java.lang.Class$ReflectionData

code available on GitHub

Page 12: Inside the JVM - Follow the white rabbit!

jmap histograms

jmap -histo:live (performs a full GC first)

 num #instances #bytes class name

----------------------------------------------

1: 5855 493864 [C

2: 1461 166752 java.lang.Class

3: 5848 140352 java.lang.String

4: 503 136440 [B

5: 967 62456 [Ljava.lang.Object;

6: 1255 40160 java.util.HashMap$Node

7: 991 39640 java.util.LinkedHashMap$Entry

8: 258 30720 [Ljava.util.HashMap$Node;

9: 259 22792 java.lang.reflect.Method

10: 441 20952 [Ljava.lang.String;

11: 283 19272 [I

12: 229 16488 java.lang.reflect.Field

....................

51: 35 1400 javax.management.MBeanOperationInfo

52: 3 1360 [Lnet.bluxte.experiments.couchbase_keyvalue.KeyValueStatus;

53: 55 1320 java.io.ExpiringCache$Entry

54: 29 1304 [Ljava.lang.reflect.Field;

55: 36 1152 net.bluxte.experiments.couchbase_keyvalue.KeyValueStatus

56: 27 1080 java.security.ProtectionDomain

Page 13: Inside the JVM - Follow the white rabbit!

Java Mission Control / Java Flight Recorder

Lightweight monitoring agent

● Integrated into the (Oracle) JVM
● Very low overhead

Continuously samples diagnostics data

● Thread activity
● GC activity
● Memory allocations

Page 14: Inside the JVM - Follow the white rabbit!

Java Mission Control / Java Flight Recorder

Available only with Oracle JDK

● Free for development
● Commercial for use in production

How to enable it?

● at launch time: java -XX:+UnlockCommercialFeatures
● after launch: jcmd <pid> VM.unlock_commercial_features

Page 15: Inside the JVM - Follow the white rabbit!

Original code
Simple loop on the enum values

public static KeyValueStatus valueOf(final short code) {
    for (KeyValueStatus value: values()) {
        if (value.code() == code) return value;
    }
    return UNKNOWN;
}

Page 16: Inside the JVM - Follow the white rabbit!

Original code - Memory stats

Looks good! No leak!

Hmm… growing fast!

Page 17: Inside the JVM - Follow the white rabbit!

Original code - Allocations

Page 18: Inside the JVM - Follow the white rabbit!

Original code - GC activity

Page 19: Inside the JVM - Follow the white rabbit!

Iteration on constant array
Still trivial, but reuse the values array

private static final KeyValueStatus[] VALUES = values();

public static KeyValueStatus valueOf(final short code) {
    for (KeyValueStatus value: VALUES) {
        if (value.code() == code) return value;
    }
    return UNKNOWN;
}

Page 20: Inside the JVM - Follow the white rabbit!

Constant array - Allocations

Page 21: Inside the JVM - Follow the white rabbit!

Constant array - GC activity

Page 22: Inside the JVM - Follow the white rabbit!

GC pressure: collateral damage

Full GC clears weak references
→ clears some caches
→ additional load to repopulate them!

Page 23: Inside the JVM - Follow the white rabbit!

Enum.values()?
Exploring the bytecode

Page 24: Inside the JVM - Follow the white rabbit!

Enum.values() – a generated method

The compiler automatically adds some special methods when it creates an enum. For example, they have a static values method that returns an array containing all of the values of the enum in the order they are declared.

– The Java Tutorial

/**
 * Returns an array containing the constants of this enum
 * type, in the order they're declared. This method may be
 * used to iterate over the constants as follows:
 *
 *    for(E c : E.values())
 *        System.out.println(c);
 *
 * @return an array containing the constants of this enum
 * type, in the order they're declared
 */
public static E[] values();

– The Java Language Specification

Page 25: Inside the JVM - Follow the white rabbit!

Show me the (byte)code!

public class SimpleMain {
    public static void main(String[] args) {
        System.out.println("Hello world!");
    }
}

public class net.bluxte.experiments.talk.SimpleMain {
  public net.bluxte.experiments.talk.SimpleMain();
    Code:
       0: aload_0
       1: invokespecial #1 // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc #3 // String Hello world!
       5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return
}

Default constructor

javap -c SimpleMain.class

or IntelliJ’s bytecode plugin

Page 26: Inside the JVM - Follow the white rabbit!

Show me the (byte)code!

public class SimpleMain {
    static String hello = "Hello";
    static String world = "world";

    public static void main( String[] args ) {
        System.out.println( hello + " " + world );
    }
}

public class net.bluxte.experiments.talk.SimpleMain {
  static java.lang.String hello;

  static java.lang.String world;

  public net.bluxte.experiments.talk.SimpleMain();
    Code:
       0: aload_0
       1: invokespecial #1 // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
       3: new #3 // class java/lang/StringBuilder
       6: dup
       7: invokespecial #4 // Method java/lang/StringBuilder."<init>":()V
      10: getstatic #5 // Field hello:Ljava/lang/String;
      13: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      16: ldc #7 // String " "
      18: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      21: getstatic #8 // Field world:Ljava/lang/String;
      24: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      27: invokevirtual #9 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      30: invokevirtual #10 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      33: return

  static {};
    Code:
       0: ldc #11 // String Hello
       2: putstatic #5 // Field hello:Ljava/lang/String;
       5: ldc #12 // String world
       7: putstatic #8 // Field world:Ljava/lang/String;
      10: return
}

String concat with StringBuilder

Static initializer

Page 27: Inside the JVM - Follow the white rabbit!

Show me the (byte)code!

public enum SimpleEnum {
    FIRST_ENUM,
    SECOND_ENUM
}

...
  public static net.bluxte.experiments.talk.SimpleEnum[] values();
    Code:
       0: getstatic #1 // Field $VALUES:[Lnet/bluxte/experiments/talk/SimpleEnum;
       3: invokevirtual #2 // Method "[Lnet/bluxte/experiments/talk/SimpleEnum;".clone:()Ljava/lang/Object;
       6: checkcast #3 // class "[Lnet/bluxte/experiments/talk/SimpleEnum;"
       9: areturn

  public static net.bluxte.experiments.talk.SimpleEnum valueOf(java.lang.String);
    Code:
       0: ldc #4 // class net/bluxte/experiments/talk/SimpleEnum
       2: aload_0
       3: invokestatic #5 // Method java/lang/Enum.valueOf:(Ljava/lang/Class;Ljava/lang/String;)Ljava/lang/Enum;
       6: checkcast #4 // class net/bluxte/experiments/talk/SimpleEnum
       9: areturn
...

Aha! We found the culprit!

Page 28: Inside the JVM - Follow the white rabbit!

But why the clone?

Java arrays are mutable

The caller can mess with the returned array, which would break other users
→ Perform a defensive copy every time

How could it be prevented?

Return an immutable List, but that is probably too high level here
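
A minimal sketch of the defensive copy in action. The Color enum and its ALL constant are hypothetical names used only for illustration; the unmodifiable list stands in for the "immutable List" alternative and is an assumption about how one might write it, not something the JDK generates.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class EnumValuesDemo {
    // Hypothetical enum, used only for illustration.
    enum Color {
        RED, GREEN, BLUE;

        // Built once at class initialization; callers share a read-only view
        // instead of receiving a fresh array on every call.
        static final List<Color> ALL =
            Collections.unmodifiableList(Arrays.asList(values()));
    }

    // values() clones the hidden $VALUES array, so two calls never
    // return the same array instance.
    static boolean returnsFreshArray() {
        return Color.values() != Color.values();
    }

    // Tampering with one returned array does not affect later callers.
    static Color firstAfterTampering() {
        Color[] copy = Color.values();
        copy[0] = Color.BLUE;
        return Color.values()[0];
    }

    public static void main(String[] args) {
        System.out.println(returnsFreshArray());   // true
        System.out.println(firstAfterTampering()); // RED
        System.out.println(Color.ALL);             // [RED, GREEN, BLUE]
    }
}
```

The list view costs one allocation at class initialization and none afterwards, which is the same trade-off as caching the array in a private static field.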

Page 29: Inside the JVM - Follow the white rabbit!

More on the bytecode

A class file is composed of:

● constant pool: strings, fields/methods name+type, class names, etc.
● fields and methods definitions and code
  ○ Access flags and attributes
  ○ Code
  ○ Line number table
  ○ Local variable table (type and name)
  ○ Exception table
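
To make the layout a bit more concrete, here is a small sketch that reads the first fields of a class file: the magic number, the version, and the constant pool count, which come before the constant pool itself. The ClassFileHeader class is a hypothetical helper written for this illustration; it only reads the header and does not parse the pool.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ClassFileHeader {
    final int magic;             // always 0xCAFEBABE
    final int minorVersion;
    final int majorVersion;      // e.g. 52 for Java 8
    final int constantPoolCount; // number of pool entries plus one

    private ClassFileHeader(int magic, int minor, int major, int cpCount) {
        this.magic = magic;
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = cpCount;
    }

    // Reads the header of the class file that defines the given class.
    static ClassFileHeader read(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();             // u4 magic
            int minor = data.readUnsignedShort();   // u2 minor_version
            int major = data.readUnsignedShort();   // u2 major_version
            int cpCount = data.readUnsignedShort(); // u2 constant_pool_count
            return new ClassFileHeader(magic, minor, major, cpCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassFileHeader h = read(String.class);
        System.out.printf("magic=0x%08X version=%d.%d constant_pool_count=%d%n",
            h.magic, h.majorVersion, h.minorVersion, h.constantPoolCount);
    }
}
```

For the full picture, javap -v prints the whole constant pool along with the fields and methods.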

Page 30: Inside the JVM - Follow the white rabbit!

But wait…
...why would I want to know about this?

● Better understand low level diagnostics
● Check generated code
  ○ Java: enum values (!), for loops, etc
  ○ Scala, Kotlin: implementation of higher level constructs
  ○ Hibernate & co: how do they mangle your code?
● Grasping low level stuff allows writing better high-level code

Page 31: Inside the JVM - Follow the white rabbit!

Constant pool for SimpleMain

   #1 = Methodref #6.#20 // java/lang/Object."<init>":()V
   #2 = Fieldref #21.#22 // java/lang/System.out:Ljava/io/PrintStream;
   #3 = String #23 // Hello world
   #4 = Methodref #24.#25 // java/io/PrintStream.println:(Ljava/lang/String;)V
   #5 = Class #26 // net/bluxte/experiments/talk/SimpleMain
   #6 = Class #27 // java/lang/Object
   #7 = Utf8 <init>
   #8 = Utf8 ()V
   #9 = Utf8 Code
  #10 = Utf8 LineNumberTable
  #11 = Utf8 LocalVariableTable
  #12 = Utf8 this
  #13 = Utf8 Lnet/bluxte/experiments/talk/SimpleMain;
  #14 = Utf8 main
  #15 = Utf8 ([Ljava/lang/String;)V
  #16 = Utf8 args
  #17 = Utf8 [Ljava/lang/String;
  #18 = Utf8 SourceFile
  #19 = Utf8 SimpleMain.java
  #20 = NameAndType #7:#8 // "<init>":()V
  #21 = Class #28 // java/lang/System
  #22 = NameAndType #29:#30 // out:Ljava/io/PrintStream;
  #23 = Utf8 Hello world
  #24 = Class #31 // java/io/PrintStream
  #25 = NameAndType #32:#33 // println:(Ljava/lang/String;)V
  #26 = Utf8 net/bluxte/experiments/talk/SimpleMain
  #27 = Utf8 java/lang/Object
  #28 = Utf8 java/lang/System
  #29 = Utf8 out
  #30 = Utf8 Ljava/io/PrintStream;
  #31 = Utf8 java/io/PrintStream
  #32 = Utf8 println
  #33 = Utf8 (Ljava/lang/String;)V

public class net.bluxte.experiments.talk.SimpleMain {
  public net.bluxte.experiments.talk.SimpleMain();
    Code:
       0: aload_0
       1: invokespecial #1 // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc #3 // String Hello world!
       5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return
}

Page 32: Inside the JVM - Follow the white rabbit!

Type encoding
What is (Ljava/lang/String;)V ???

L<class path>; → class name
I, J, S, B, C → integer, long, short, byte, char
F, D → float, double
Z → boolean
[ → array of the type that follows
V → void (return type only)

public String foo(int a, char[] b, List<Integer> c, boolean d)

(I[CLjava/util/List;Z)Ljava/lang/String;
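
One way to check a descriptor without compiling anything is to ask the JDK to build it. This sketch uses java.lang.invoke.MethodType; the DescriptorDemo class and its method names are hypothetical. Note that the generic parameter of List<Integer> is erased, so only java/util/List shows up.

```java
import java.lang.invoke.MethodType;
import java.util.List;

final class DescriptorDemo {
    // Descriptor of: public String foo(int a, char[] b, List<Integer> c, boolean d)
    static String fooDescriptor() {
        return MethodType
            .methodType(String.class, int.class, char[].class, List.class, boolean.class)
            .toMethodDescriptorString();
    }

    // Descriptor of: void println(String s)
    static String printlnDescriptor() {
        return MethodType
            .methodType(void.class, String.class)
            .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(fooDescriptor());     // (I[CLjava/util/List;Z)Ljava/lang/String;
        System.out.println(printlnDescriptor()); // (Ljava/lang/String;)V
    }
}
```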

Page 33: Inside the JVM - Follow the white rabbit!

The bytecode “language”

Stack-based machine

● Easier to target a large variety of CPUs (Android/Dalvik is register based)

Object-oriented assembler

● Method calls (static / virtual / interface / special)

Controlled memory access

● Local variables
● Object fields

Page 34: Inside the JVM - Follow the white rabbit!

The bytecode “language”

Very simple instruction set: about 200 instructions

Instruction groups:

● Load and store
● Arithmetic and logic
● Type conversion
● Object creation and manipulation
● Operand stack management
● Control transfer
● Method invocation and return

Only addition since 1996: invokedynamic in Java 7

Page 35: Inside the JVM - Follow the white rabbit!

The bytecode “language”

public static void main(String[] args) throws InterruptedException {
    long start = System.nanoTime();

    while(System.nanoTime() - start < MAX_NANOS) {
        for (int i = 0; i < 1_000_000; i++) {
            resolved = resolve((short)rnd.nextInt(0x100));
        }
        Thread.sleep(100);
    }
}

   0: invokestatic #2 // Method java/lang/System.nanoTime:()J
   3: lstore_1
   4: invokestatic #2 // Method java/lang/System.nanoTime:()J
   7: lload_1
   8: lsub
   9: getstatic #3 // Field MAX_NANOS:J
  12: lcmp
  13: ifge 55
  16: iconst_0
  17: istore_3
  18: iload_3
  19: ldc #4 // int 1000000
  21: if_icmpge 46
  24: getstatic #5 // Field rnd:Ljava/util/Random;
  27: sipush 256
  30: invokevirtual #6 // Method java/util/Random.nextInt:(I)I
  33: i2s
  34: invokestatic #7 // Method resolve:(S)Lnet/bluxte/experiments/couchbase_keyvalue/KeyValueStatus;
  37: putstatic #8 // Field resolved:Lnet/bluxte/experiments/couchbase_keyvalue/KeyValueStatus;
  40: iinc 3, 1
  43: goto 18
  46: ldc2_w #9 // long 100l
  49: invokestatic #11 // Method java/lang/Thread.sleep:(J)V
  52: goto 4
  55: return

LocalVariableTable:
  Start  Length  Slot  Name  Signature
     18      28     3  i     I
      0      56     0  args  [Ljava/lang/String;
      4      52     1  start J

Page 36: Inside the JVM - Follow the white rabbit!

Benchmarking with JMH
(Back to good old Java)

Page 37: Inside the JVM - Follow the white rabbit!

Improving our solution

We fixed the memory issue, but it’s clearly not optimal

Let’s benchmark it!

private static final KeyValueStatus[] VALUES = values();

public static KeyValueStatus valueOf(final short code) {
    for (KeyValueStatus value: VALUES) {
        if (value.code() == code) return value;
    }
    return UNKNOWN;
}

O(n) on constant data!

Page 38: Inside the JVM - Follow the white rabbit!

JMH: an OpenJDK project

● Provides drivers and guidance for writing tests
● Takes care of pre-warming the JVM, collecting results and computing stats
● Provides a Maven artifact type for benchmarking projects

“JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targetting the JVM.”

Page 39: Inside the JVM - Follow the white rabbit!

Benchmark code

@State(Scope.Benchmark)
public class ValueOfBenchmark {

    @Param({
        "0",    // 0x00, Success
        "1",    // 0x01, Not Found
        "134",  // 0x86 Temporary Failure
        "255",  // undefined
        "1024"  // undefined, out of bounds
    })
    public short code;

    @Benchmark
    public KeyValueStatus loopNoFastPath() {
        return KeyValueStatus.valueOfLoop(code);
    }

    @Benchmark
    public KeyValueStatus loopFastPath() {
        return KeyValueStatus.valueOf(code);
    }
    ...
}

mvn clean install
java -jar target/benchmarks.jar

# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: net.bluxte.experiments.couchbase_keyvalue.ValueOfBenchmark.loopNoFastPath
# Parameters: (code = 0)

# Run progress: 0,00% complete, ETA 04:53:20
# Fork: 1 of 10
# Warmup Iteration 1: 152063982,769 ops/s
# Warmup Iteration 2: 149808416,787 ops/s
# Warmup Iteration 3: 210436722,740 ops/s
# Warmup Iteration 4: 202906403,960 ops/s
# Warmup Iteration 5: 204518647,481 ops/s
# Warmup Iteration 6: 209602101,373 ops/s
# Warmup Iteration 7: 204717066,594 ops/s
# Warmup Iteration 8: 209156212,425 ops/s
# Warmup Iteration 9: 215544157,049 ops/s
# Warmup Iteration 10: 213919676,979 ops/s
# Warmup Iteration 11: 211316588,650 ops/s
# Warmup Iteration 12: 212046920,091 ops/s
# Warmup Iteration 13: 212198820,202 ops/s
# Warmup Iteration 14: 207165911,202 ops/s
# Warmup Iteration 15: 209400520,248 ops/s
# Warmup Iteration 16: 210509892,206 ops/s
# Warmup Iteration 17: 207094517,640 ops/s
# Warmup Iteration 18: 208435049,739 ops/s
# Warmup Iteration 19: 208275287,735 ops/s
# Warmup Iteration 20: 209727353,731 ops/s
Iteration 1: 210860563,039 ops/s
Iteration 2: 213258677,632 ops/s
Iteration 3: 210171275,812 ops/s
Iteration 4: 212516810,343 ops/s

Page 40: Inside the JVM - Follow the white rabbit!

Benchmark-driven optimization

Initial implementation

public static KeyValueStatus valueOfLoop(final short code) {
    for (KeyValueStatus value: values()) {
        if (value.code() == code) return value;
    }
    return UNKNOWN;
}

Benchmark       (code)  Mode  Samples   Score  Score error  Units
loopNoFastPath       0  avgt       10  19.383        0.331  ns/op
loopNoFastPath       1  avgt       10  19.243        0.376  ns/op
loopNoFastPath     134  avgt       10  24.855        0.651  ns/op
loopNoFastPath     255  avgt       10  30.587        0.833  ns/op
loopNoFastPath    1024  avgt       10  30.619        1.209  ns/op

Time grows linearly with the position of the value in the enum, even for out-of-bounds values, which scan the whole array

Page 41: Inside the JVM - Follow the white rabbit!

Benchmark-driven optimization

Reuse the constant array

private static final KeyValueStatus[] VALUES = values();

public static KeyValueStatus valueOf(short code) {
    for (KeyValueStatus value: VALUES) {
        if (value.code() == code) return value;
    }
    return UNKNOWN;
}

Benchmark            (code)  Mode  Samples   Score  Score error  Units
loopOnConstantArray       0  avgt       10   2.975        0.086  ns/op
loopOnConstantArray       1  avgt       10   3.035        0.080  ns/op
loopOnConstantArray     134  avgt       10  10.215        0.269  ns/op
loopOnConstantArray     255  avgt       10  16.856        0.679  ns/op
loopOnConstantArray    1024  avgt       10  17.015        0.577  ns/op

Still linear, removed ~15 ns allocation overhead

Page 42: Inside the JVM - Follow the white rabbit!

Benchmark-driven optimization

Prepare a hashmap, then simple lookup

private static final Map<Short, KeyValueStatus> code2statusMap = new HashMap<>();

static {
    for (KeyValueStatus value: values()) {
        code2statusMap.put(value.code(), value);
    }
}

public static KeyValueStatus valueOf(final short code) {
    return code2statusMap.getOrDefault(code, UNKNOWN);
}

Benchmark  (code)  Mode  Samples  Score  Score error  Units
lookupMap       0  avgt       10  4.954        0.134  ns/op
lookupMap       1  avgt       10  4.036        0.125  ns/op
lookupMap     134  avgt       10  5.597        0.157  ns/op
lookupMap     255  avgt       10  4.006        0.144  ns/op
lookupMap    1024  avgt       10  6.752        0.228  ns/op

More or less constant
Worse on small values
Way better on larger values

Page 43: Inside the JVM - Follow the white rabbit!

Oh wait… autoboxing!

public static net.bluxte.experiments.couchbase_keyvalue.KeyValueStatus valueOfLookupMap(short);
  Code:
     0: getstatic #17 // Field code2statusMap:Ljava/util/HashMap;
     3: iload_0
     4: invokestatic #18 // Method java/lang/Short.valueOf:(S)Ljava/lang/Short;
     7: getstatic #15 // Field UNKNOWN:Lnet/bluxte/experiments/couchbase_keyvalue/KeyValueStatus;
    10: invokevirtual #19 // Method java/util/HashMap.getOrDefault:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
    13: checkcast #4 // class net/bluxte/experiments/couchbase_keyvalue/KeyValueStatus
    16: areturn

public static KeyValueStatus valueOf(final short code) {
    return code2statusMap.getOrDefault(code, UNKNOWN);
}
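
The boxing can be spelled out in source form. In this sketch the BoxingDemo class, its NAMES map and the method names are hypothetical; the point is only that the implicit and explicit forms are the same call. Per its Javadoc, Short.valueOf always caches values from -128 to 127, so boxing a value outside that range may allocate a new object.

```java
import java.util.HashMap;
import java.util.Map;

final class BoxingDemo {
    // Hypothetical stand-in for code2statusMap, with String values for brevity.
    static final Map<Short, String> NAMES = new HashMap<>();

    static {
        NAMES.put((short) 0, "SUCCESS");
        NAMES.put((short) 1, "ERR_NOT_FOUND");
    }

    // What we write: the short argument is silently boxed.
    static String implicitBoxing(short code) {
        return NAMES.getOrDefault(code, "UNKNOWN");
    }

    // What the compiler generates, written out by hand.
    static String explicitBoxing(short code) {
        return NAMES.getOrDefault(Short.valueOf(code), "UNKNOWN");
    }

    // Related pitfall: an int literal boxes to Integer, and an Integer
    // never equals a Short key, so the lookup misses.
    static String lookupWithIntLiteral() {
        return NAMES.getOrDefault(1, "UNKNOWN");
    }

    public static void main(String[] args) {
        System.out.println(implicitBoxing((short) 1)); // ERR_NOT_FOUND
        System.out.println(explicitBoxing((short) 1)); // ERR_NOT_FOUND
        System.out.println(lookupWithIntLiteral());    // UNKNOWN
    }
}
```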

Page 44: Inside the JVM - Follow the white rabbit!

Oh wait… autoboxing!

Using Carrot HPPC (high performance primitive collections) avoids this

Page 45: Inside the JVM - Follow the white rabbit!

Benchmarking variations

Prepare a lookup array, then simple lookup

private static final KeyValueStatus[] code2status = new KeyValueStatus[0x100];

static {
    Arrays.fill(code2status, UNKNOWN);
    for (KeyValueStatus keyValueStatus : values()) {
        if (keyValueStatus != UNKNOWN) {
            code2status[keyValueStatus.code()] = keyValueStatus;
        }
    }
}

public static KeyValueStatus valueOfLookupArray(short code) {
    if (code >= 0 && code < code2status.length) {
        return code2status[code];
    } else {
        return UNKNOWN;
    }
}

Benchmark    (code)  Mode  Samples  Score  Score error  Units
lookupArray       0  avgt       10  3.061        0.126  ns/op
lookupArray       1  avgt       10  3.048        0.127  ns/op
lookupArray     134  avgt       10  3.070        0.084  ns/op
lookupArray     255  avgt       10  3.035        0.113  ns/op
lookupArray    1024  avgt       10  3.034        0.113  ns/op

Constant fast time
No GC overhead
w00t!
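
Before comparing timings, it can be worth checking that the variants return the same answers. The sketch below does that over the whole short range on a trimmed-down, hypothetical Status enum (the ERR_TEMP_FAIL name and the method names are assumptions made for this example, not the Couchbase code).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class LookupVariants {
    // Hypothetical, trimmed-down stand-in for KeyValueStatus.
    enum Status {
        UNKNOWN((short) -1),
        SUCCESS((short) 0x00),
        ERR_NOT_FOUND((short) 0x01),
        ERR_TEMP_FAIL((short) 0x86);

        final short code;

        Status(short code) {
            this.code = code;
        }

        private static final Status[] VALUES = values();
        private static final Map<Short, Status> BY_CODE = new HashMap<>();
        private static final Status[] TABLE = new Status[0x100];

        static {
            Arrays.fill(TABLE, UNKNOWN);
            for (Status s : VALUES) {
                BY_CODE.put(s.code, s);
                if (s != UNKNOWN) TABLE[s.code] = s;
            }
        }

        // Linear scan over the cached array.
        static Status byLoop(short code) {
            for (Status s : VALUES) {
                if (s.code == code) return s;
            }
            return UNKNOWN;
        }

        // Hash lookup (boxes the key).
        static Status byMap(short code) {
            return BY_CODE.getOrDefault(code, UNKNOWN);
        }

        // Direct index into a 256-entry table, with a bounds check.
        static Status byArray(short code) {
            return (code >= 0 && code < TABLE.length) ? TABLE[code] : UNKNOWN;
        }
    }

    // Compares the three variants for every possible short value.
    static boolean variantsAgree() {
        for (int c = Short.MIN_VALUE; c <= Short.MAX_VALUE; c++) {
            short code = (short) c;
            Status expected = Status.byLoop(code);
            if (Status.byMap(code) != expected || Status.byArray(code) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(variantsAgree()); // true
    }
}
```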

Page 46: Inside the JVM - Follow the white rabbit!

Dangers of JMH

Benchmark-driven iterations

● Can drive you to partial incremental improvements
● Take a step back, think outside of the box

Optimizing for the sake of optimizing

● Time consuming
● No real effect if not on “hot” code

Page 47: Inside the JVM - Follow the white rabbit!

Diving into OpenJDK
(This gets scary!)

Page 48: Inside the JVM - Follow the white rabbit!

The VM does a lot of things

Interpreter

C1 “client” compiler

C2 “server” compiler

Garbage collector

Page 49: Inside the JVM - Follow the white rabbit!

Finding your way in OpenJDK

Main website http://openjdk.java.net/

Get the code:

hg clone http://hg.openjdk.java.net/jdk8/jdk8/hotspot/

hg clone http://hg.openjdk.java.net/jdk8/jdk8/jdk/

Mercurial still alive!

Page 50: Inside the JVM - Follow the white rabbit!

garbage collectors

bytecode interpreter

server compiler (c2 / opto)

client compiler (c1)

LLVM-based JIT

OS and/or CPU specific code

root of shared code

Page 51: Inside the JVM - Follow the white rabbit!

CPU-independent target (works with shark JIT)

Additional support in JDK9:

● ARM 32 & 64 bits
● PowerPC
● S390
● AIX

Page 52: Inside the JVM - Follow the white rabbit!

Intrinsic methods

What you see is not what you get

● The JVM “intercepts” some method calls
  ○ String / StringBuffer methods, Math, Unsafe, array manipulation, etc.
● Replaced inline with native (assembly) code
  ○ Extremely fast and optimized
  ○ Not even JNI overhead
● Find them in hotspot/src/share/vm/classfile/vmSymbols.hpp

Page 53: Inside the JVM - Follow the white rabbit!

Intrinsic methods

// IndexOf for constant substrings with size >= 8 chars
// which don't need to be loaded through stack.
void MacroAssembler::string_indexofC8(Register str1, Register str2,
                                      Register cnt1, Register cnt2,
                                      int int_cnt2, Register result,
                                      XMMRegister vec, Register tmp) {
  ShortBranchVerifier sbv(this);
  assert(UseSSE42Intrinsics, "SSE4.2 is required");

  // This method uses pcmpestri inxtruction with bound registers
  //   inputs:
  //     xmm - substring
  //     rax - substring length (elements count)
  //     mem - scanned string
  //     rdx - string length (elements count)
  //     0xd - mode: 1100 (substring search) + 01 (unsigned shorts)
  //   outputs:
  //     rcx - matched index in string
  assert(cnt1 == rdx && cnt2 == rax && tmp == rcx, "pcmpestri");

  Label RELOAD_SUBSTR, SCAN_TO_SUBSTR, SCAN_SUBSTR,
        RET_FOUND, RET_NOT_FOUND, EXIT, FOUND_SUBSTR,
        MATCH_SUBSTR_HEAD, RELOAD_STR, FOUND_CANDIDATE;

  // Note, inline_string_indexOf() generates checks:
  // if (substr.count > string.count) return -1;
  // if (substr.count == 0) return 0;
  assert(int_cnt2 >= 8, "this code isused only for cnt2 >= 8 chars");

  // Load substring.
  movdqu(vec, Address(str2, 0));
  movl(cnt2, int_cnt2);
  movptr(result, str1); // string addr

Example: String.indexOf on x86

Page 54: Inside the JVM - Follow the white rabbit!

In JDK9 beta, String.indexOf(String) is faster than String.indexOf(char)!

This is because one is intrinsic, and not yet the other

Intrinsic methods

Benchmark                             Mode   Cnt       Score      Error  Units

# JDK 8u121
IndexOfBenchmark.StringIndexOfChar    thrpt    5  141857.332 ± 5530.472  ops/s
IndexOfBenchmark.StringIndexOfString  thrpt    5  113091.517 ± 2241.533  ops/s

# JDK 9b152
IndexOfBenchmark.StringIndexOfChar    thrpt    5  154525.343 ± 3796.818  ops/s
IndexOfBenchmark.StringIndexOfString  thrpt    5  185917.059 ± 3391.230  ops/s

(from the jdk9-dev mailing-list)

Page 55: Inside the JVM - Follow the white rabbit!

Intrinsic methods

● “I can do it better than JDK source” – think twice!

→ Have a look at vmSymbols.hpp first!

● Can sometimes be indirect (esp with strings and arrays)

● When in doubt, benchmark (with the same JVM)

Page 56: Inside the JVM - Follow the white rabbit!

Conclusion

Page 57: Inside the JVM - Follow the white rabbit!

Conclusion

● Know your tools

● Be curious, and follow the white rabbit from time to time, you’ll learn a lot

● However… don’t go overboard and waste (too much) time!

Page 58: Inside the JVM - Follow the white rabbit!

Thanks!
Questions?

Sylvain Wallez - @bluxte
Toulouse JUG - 2017-04-26