Machine Code Snippets in Java

Vladimir Ivanov, HotSpot JVM Compiler, Oracle Corp.
JVM Language Summit 2016


Copyright © 2016, Oracle and/or its affiliates. All rights reserved


Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


    0x11529c8c0: mov %eax,-0x16000(%rsp)
    0x11529c8c7: push %rbp
    0x11529c8c8: sub $0x20,%rsp
    0x11529c8cc: mov %rdx,(%rsp)
    0x11529c8d0: mov %rsi,%rbp
    0x11529c8d3: movabs $0x7c0013d10,%rsi
    0x11529c8dd: nop
    0x11529c8de: nop
    0x11529c8df: nop
    0x11529c8e0: vzeroupper
    0x11529c8e3: callq 0x00000001152418a0
    0x11529c8e8: mov %rax,%rbx
    0x11529c8eb: mov (%rsp),%r10
    0x11529c8ef: vmovdqu 0x10(%r10),%ymm1
    0x11529c8f5: vmovdqu 0x10(%rbp),%ymm0
    0x11529c8fa: vpaddd %ymm0,%ymm1,%ymm0
    0x11529c8fe: vmovdqu %ymm0,0x10(%rbx)
    0x11529c903: mov %rbx,%rax
    0x11529c906: vzeroupper
    0x11529c909: add $0x20,%rsp
    0x11529c90d: pop %rbp
    0x11529c90e: test %eax,-0xb786914(%rip)
    0x11529c914: retq

    0x11529d240: mov %eax,-0x16000(%rsp)
    0x11529d247: push %rbp
    0x11529d248: sub $0x30,%rsp
    0x11529d24c: mov %rcx,%rbp
    0x11529d24f: vmovdqu 0x10(%rsi),%ymm0
    0x11529d254: vmovdqu 0x10(%rdx),%ymm1
    0x11529d259: vpaddd %ymm0,%ymm1,%ymm0
    0x11529d25d: vmovdqu %ymm0,(%rsp)
    0x11529d262: movabs $0x7c0013d10,%rsi
    0x11529d26c: vzeroupper
    0x11529d26f: callq 0x00000001152418a0
    0x11529d274: mov %rax,%rbx
    0x11529d277: vmovdqu 0x10(%rbp),%ymm1
    0x11529d27c: vmovdqu (%rsp),%ymm0
    0x11529d281: vpaddd %ymm0,%ymm1,%ymm0
    0x11529d285: vmovdqu %ymm0,0x10(%rbx)
    0x11529d28a: mov %rbx,%rax
    0x11529d28d: vzeroupper
    0x11529d290: add $0x30,%rsp
    0x11529d294: pop %rbp
    0x11529d295: test %eax,-0xb78729b(%rip)
    0x11529d29b: retq

x86 Assembly


The Plan

§ Background

§ Machine Code Snippets
  – the concept & its evolution

§ Vectors
  – box elimination, C2 optimizations, GC


Vector ISA Extensions

§ 100s of vector instructions on x86
§ Intel intrinsic instructions
  – MMX: ~120
  – SSE: ~130
  – SSE2/3/SSSE3/4.1/4.2: ~260
  – AVX/AVX2: ~380


Vector ISA Extensions

§ 1000s of vector instructions on x86
§ Intel intrinsic instructions
  – MMX: ~120
  – SSE: ~130
  – SSE2/3/SSSE3/4.1/4.2: ~260
  – AVX/AVX2: ~380
  – AVX-512: ~3800


Motivation

§ Vector API
  – expose data-parallel operations through a cross-platform API

§ How to bind to particular machine instructions in the implementation?

§ Existing solutions
  – JVM intrinsics
  – JNI / NativeMethodHandles (in Project Panama)


JVM Intrinsics

“A method is intrinsified if the HotSpot VM replaces the annotated method with hand-written assembly and/or hand-written compiler IR -- a compiler intrinsic -- to improve performance.”

@HotSpotIntrinsicCandidate JavaDoc

    public final class java.lang.Class<T> implements … {
        @HotSpotIntrinsicCandidate
        public native boolean isInstance(Object obj);


JNI @since 1.1


JNI

    class Lib {
        static native void m();
    }

    void JNICALL Java_Lib_m(JNIEnv* env, jclass c) {
        m();
    }

Usage scenario


Native Method Handles

    MethodType mt = MethodType.methodType(void.class);
    MethodHandle mh = MethodHandles.lookup().findNative("m", mt);
    mh.invokeExact();

Project Panama


Native Method Handles Project Panama

                      Java                          Native
Construction          Lookup.findVirtual() et al    Lookup.findNative()
Reference (typed)     DirectMethodHandle            NativeMethodHandle
Reference (direct)    MemberName                    NativeEntryPoint
Linker                MH.linkToVirtual() et al      MH.linkToNative()
Invocation            indy, MH.invoke(), MH.invokeExact()

“Making native calls from the JVM” by John Rose http://cr.openjdk.java.net/~jrose/panama/native-call-primitive.html
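The Java column of the table can be tried with the standard java.lang.invoke API. The sketch below covers only that column: Lookup.findNative() belongs to the Panama prototype and is not assumed to be available. The class name DirectHandleDemo is hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class DirectHandleDemo {
    // Construction: look up String.length() as a method handle of type (String)int.
    static final MethodHandle LENGTH;
    static {
        try {
            LENGTH = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Invocation: invokeExact requires the call site to match (String)int exactly,
    // hence the static type String for the argument and the (int) cast on the result.
    static int length(String s) {
        try {
            return (int) LENGTH.invokeExact(s);
        } catch (Throwable t) {
            throw new Error(t);
        }
    }
}
```

Keeping the handle in a static final field mirrors the pattern used for the snippet handles later in the deck, where the JIT can treat the handle as a constant.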


Native Method Handles Project Panama

callq 0x1057b2eb0 ; native method entry

getpid
  JNI            13.7 ± 0.5 ns
  Direct call     3.4 ± 0.2 ns


Native code vs JVM Intrinsics

§ Native method
  + arbitrary native code
  - too much ceremony
  - opaque to the JVM

§ JVM Intrinsics
  + powerful, lightweight, and flexible
  - high development costs


Machine Code Snippets


New breed:

NativeMethodHandle + JVM intrinsic

Idea (1st iteration)

Wrap raw machine code in a method handle

The Idea


Machine Code Snippets

§ Use case: prototyping
  1. minimize implementation costs
  2. decent performance
  3. up to a dozen instructions in size

§ Existing solutions
  – JVM intrinsics: 1. no / 2. yes / 3. yes
  – JNI / NMH: 1. yes / 2. no / 3. yes

Motivation / Goals


Vectorized Memory Copy

    vmovdqu mem, reg   // 256-bit load
    vmovdqu reg, mem   // 256-bit store


Vectorized Memory Copy Machine Code as a Method Handle

    mov256MH.invokeExact(src, off1, dst, off2);

    C4 E1 7E 6F 04 37  C4 E1 7E 7F 04 0A

    MH(LJLJ)V


Vectorized Memory Copy Machine Code as a Method Handle

    vmovdqu (?,?,1),%ymm0
    vmovdqu %ymm0,(?,?,1)

    MH(LJLJ)V
    mov256MH.invokeExact(src, off1, dst, off2);


Vectorized Memory Copy Machine Code as a Method Handle

    vmovdqu (%rdi,%rsi,1),%ymm0
    vmovdqu %ymm0,(%rdx,%rcx,1)

    /* (rdi, rsi, rdx, rcx) */
    mov256MH.invokeExact(src, off1, dst, off2);

    MH(LJLJ)V


Machine Code Snippet

§ 2 execution modes
  – optimized:
    § embedded in generated code
  – non-optimized, interpreted
    § invokes stand-alone version

[Diagram; labels: Java, Native, Safe wrapper, Unsafe wrapper, MethodHandle, Stand-alone, Embedded, User-defined, produced by j.l.i]


Machine Code Snippet

§ matches native ABI
  – same machine code in all execution modes

§ System V AMD64 ABI: “Function Calling Sequence”
  – first 6 integer arguments: RDI, RSI, RDX, RCX, R8, R9
  – first 8 FP arguments: XMM0, …, XMM7
  – return registers: RAX/RDX (integer), XMM0/XMM1 (FP)
  – …

Calling Convention
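As a rough illustration of the register assignment listed above, the sketch below maps the parameters of a MethodType to System V AMD64 argument registers. It is a simplification made for this example: every non-floating-point type, including object references, is treated as an integer-class argument, and aggregates are ignored. The class SysVArgs and its method are hypothetical names.

```java
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

final class SysVArgs {
    static final String[] INT_REGS = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    static final String[] FP_REGS =
            {"xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"};

    /** Returns the register (or "stack") that each parameter of mt would occupy. */
    static List<String> assign(MethodType mt) {
        List<String> out = new ArrayList<>();
        int nextInt = 0;
        int nextFp = 0;
        for (Class<?> p : mt.parameterList()) {
            if (p == float.class || p == double.class) {
                // Floating-point arguments consume XMM registers independently.
                out.add(nextFp < FP_REGS.length ? FP_REGS[nextFp++] : "stack");
            } else {
                // Assumption of this sketch: everything else is an integer-class argument.
                out.add(nextInt < INT_REGS.length ? INT_REGS[nextInt++] : "stack");
            }
        }
        return out;
    }
}
```

For a signature of shape (Object, long, Object, long), this yields rdi, rsi, rdx, rcx, the same assignment used by the vectorized memory copy example.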


Native Method Handle

    package java.lang.invoke;

    /*non-public*/ class NativeMethodHandle extends MethodHandle {
        final NativeEntryPoint nativeFunc;

    /*non-public*/ class NativeEntryPoint {
        final long addr;
        final MethodType type;

    /*non-public*/ class MachineCodeSnippet extends NativeEntryPoint {
        final byte[] code;


Machine Code Snippet

    public static MethodHandle make(String name,
                                    MethodType mt,
                                    boolean isSupported,
                                    byte... machineCode) {...}

How To Use


Vectorized Memory Copy

    static final MethodHandle mov256MH = MachineCodeSnippet.make(
        "copy256",
        MethodType.methodType(void.class,                                // return type
                              Object.class /*rdi*/, long.class /*rsi*/,  // src
                              Object.class /*rdx*/, long.class /*rcx*/), // dst
        requires(AVX),
        0xC4, 0xE1, 0x7E, 0x6F, 0x04, 0x37,   // vmovdqu (%rdi,%rsi,1),%ymm0
        0xC4, 0xE1, 0x7E, 0x7F, 0x04, 0x0A);  // vmovdqu %ymm0,(%rdx,%rcx,1)

MethodHandle
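To see how the twelve bytes relate to the two vmovdqu instructions in the comments, the following sketch decodes one 6-byte instruction of exactly this shape (3-byte VEX prefix, opcode, ModRM, SIB). It is a deliberately narrow illustration, not a general disassembler; the class Vex3Decoder is hypothetical and rejects anything outside the pattern used on the slide.

```java
final class Vex3Decoder {
    // 64-bit general-purpose registers in x86 encoding order (numbers 0-7).
    static final String[] GPR = {"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"};

    /** Decodes C4 vex1 vex2 opcode ModRM SIB for vmovdqu with a (base,index,scale) operand. */
    static String decode(int... b) {
        if (b.length != 6 || b[0] != 0xC4) {
            throw new IllegalArgumentException("expected 3-byte VEX, opcode, ModRM, SIB");
        }
        boolean noExt = (b[1] & 0xE0) == 0xE0; // inverted R, X, B all set: registers 0-7 only
        int map = b[1] & 0x1F;                 // 1 selects the 0F opcode map
        int l = (b[2] >> 2) & 1;               // vector length: 1 = 256-bit (ymm)
        int pp = b[2] & 3;                     // 2 stands for the F3 prefix
        int opcode = b[3];                     // 6F = load, 7F = store
        int mod = b[4] >> 6;
        int reg = (b[4] >> 3) & 7;
        int rm = b[4] & 7;
        int scale = 1 << (b[5] >> 6);
        int index = (b[5] >> 3) & 7;
        int base = b[5] & 7;

        boolean supported = noExt && map == 1 && pp == 2
                && (opcode == 0x6F || opcode == 0x7F)
                && mod == 0 && rm == 4       // memory operand described by a SIB byte
                && index != 4 && base != 5;  // skip the "no index" and "no base" encodings
        if (!supported) {
            throw new IllegalArgumentException("encoding outside the scope of this sketch");
        }

        String vec = "%" + (l == 1 ? "ymm" : "xmm") + reg;
        String mem = "(%" + GPR[base] + ",%" + GPR[index] + "," + scale + ")";
        return opcode == 0x6F ? "vmovdqu " + mem + "," + vec
                              : "vmovdqu " + vec + "," + mem;
    }
}
```

Calling `Vex3Decoder.decode(0xC4, 0xE1, 0x7E, 0x6F, 0x04, 0x37)` reproduces the first comment, and the second byte group reproduces the store.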


Stand-alone version

    $ java … -XX:+PrintCodeSnippets ...

    Decoding code snippet "move256" @ 0x10f05d6a0
    # parm0: rdi:rdi = 'java/lang/Object'
    # parm1: rsi:rsi = long
    # parm2: rdx:rdx = 'java/lang/Object'
    # parm3: rcx:rcx = long
    <+0>:  push %rbp
    <+1>:  mov %rsp,%rbp
    <+4>:  vmovdqu (%rdi,%rsi,1),%ymm0   ; c4e17e6f0437
    <+10>: vmovdqu %ymm0,(%rdx,%rcx,1)   ; c4e17e7f040a
    <+16>: leaveq
    <+17>: retq


Vectorized Memory Copy

    static final MethodHandle mov256MH = ...;

    // Unsafe wrapper
    static void move256(Object src, long off1, Object dst, long off2) {
        try {
            mov256MH.invokeExact(src, off1, dst, off2);
        } catch (Throwable e) {
            throw new Error(e);
        }
    }


Vectorized Memory Copy

    4159181 MachineCodeSnippetSamples::move256 (27 bytes)
      @ 8   LambdaForm$MH::invokeExact_MT (29 bytes)   force inline by annotation
        @ 11   Invokers::checkExactType (17 bytes)   force inline by annotation
          @ 1   MethodHandle::type (5 bytes)   accessor
        @ 15   Invokers::checkCustomized (23 bytes)   force inline by annotation
          @ 1   MethodHandleImpl::isCompileConstant (2 bytes)   (intrinsic)
        @ 25   LambdaForm$NMH::invokeNative_LJLJ_V (27 bytes)   force inline by ann…
          @ 7   NativeMethodHandle::internalNativeEntryPoint (8 bytes)   force inline…
          @ 23   MethodHandle::linkToNative(LJLJL)V (0 bytes)   direct native call


    # {method} 'move256' ...
    # parm0: rsi:rsi = 'java/lang/Object'
    # parm1: rdx:rdx = long
    # parm2: rcx:rcx = 'java/lang/Object'
    # parm3: r8:r8   = long
    <+0>:  mov %eax,-0x16000(%rsp)
    <+7>:  push %rbp
    <+8>:  sub $0x10,%rsp
    <+12>: mov %rsi,%rdi
    <+15>: mov %rdx,%rsi
    <+17>: mov %rcx,%rdx
    <+21>: mov %r8,%rcx
    <+24>: vmovdqu (%rdi,%rsi,1),%ymm0   ; c4e17e6f0437
    <+30>: vmovdqu %ymm0,(%rdx,%rcx,1)   ; c4e17e7f040a
    <+36>: add $0x10,%rsp
    <+40>: pop %rbp
    <+41>: test %eax,-0x4d3d58f(%rip)
    <+47>: retq


Calling Convention

hotspot/src/cpu/x86/vm/sharedRuntime_x86_64.cpp:
“The Java calling convention is a "shifted" version of the C ABI. By skipping the first C ABI register we can call non-static jni methods with small numbers of arguments without having to shuffle the arguments at all. Since we control the java ABI we ought to at least get some advantage out of it.”

Java vs C

arg    1st  2nd  3rd  4th  5th  6th  …
C      RDI  RSI  RDX  RCX  R8   R9   stack
Java   RSI  RDX  RCX  R8   R9   RDI  stack

System V AMD64 ABI
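The "shifted" relationship in the table can be expressed as a rotation by one over the six C integer argument registers. The sketch below, with hypothetical names ShiftedAbi, javaReg, cReg and javaToCMoves, derives the Java column from the C column and lists the register moves needed to re-present up to five Java integer arguments in C ABI order. It models only integer arguments and is not HotSpot's actual shuffling code.

```java
import java.util.ArrayList;
import java.util.List;

final class ShiftedAbi {
    static final String[] C_INT_REGS = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

    /** Register of the i-th (0-based) integer argument under the C ABI. */
    static String cReg(int i) {
        return i < 6 ? C_INT_REGS[i] : "stack";
    }

    /** Register of the i-th integer argument under the Java convention: C's, shifted by one. */
    static String javaReg(int i) {
        return i < 6 ? C_INT_REGS[(i + 1) % 6] : "stack";
    }

    /**
     * Moves that turn Java argument registers into C argument registers.
     * Ascending order is safe for up to 5 arguments because each move reads a
     * register that no earlier move has written; 6 arguments would form a cycle
     * through rdi and need a temporary, which this sketch does not handle.
     */
    static List<String> javaToCMoves(int argCount) {
        if (argCount < 0 || argCount > 5) {
            throw new IllegalArgumentException("sketch handles 0 to 5 integer arguments");
        }
        List<String> moves = new ArrayList<>();
        for (int i = 0; i < argCount; i++) {
            moves.add("mov %" + javaReg(i) + ",%" + cReg(i));
        }
        return moves;
    }
}
```

For four arguments this gives the same four register moves that precede the embedded snippet in the compiled move256 listing.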


Native Method Linker MH.linkToNative

    # {method} 'linkToNative' '(LJLJL)V'
    # parm0: rsi:rsi = 'java/lang/Object'
    # parm1: rdx:rdx = long
    # parm2: rcx:rcx = 'java/lang/Object'
    # parm3: r8:r8   = long
    # parm4: r9:r9   = '.../NativeEntryPoint'
    <+0>:  push %rbp
    <+1>:  mov %rsp,%rbp
    <+4>:  mov %rsi,%rdi
    <+7>:  mov %rdx,%rsi
    <+10>: mov %rcx,%rdx
    <+13>: mov %r8,%rcx
    <+16>: mov %r9,%r8
    <+19>: callq *0x10(%r8)
    <+23>: leaveq
    <+24>: retq


Vectorized Memory Copy

    // Safe wrapper
    static void copy256(byte[] src, int idx1,
                        byte[] dst, int idx2) {
        // Array bounds checks
        Objects.checkIndex(idx1 + 32, src.length);
        Objects.checkIndex(idx2 + 32, dst.length);
        // Offset computations
        long off1 = Unsafe.ARRAY_BYTE_BASE_OFFSET + idx1;
        long off2 = Unsafe.ARRAY_BYTE_BASE_OFFSET + idx2;
        move256(src, off1, dst, off2); // Unsafe wrapper
    }
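The safe-wrapper pattern (validate in Java first, then hand raw offsets to the unchecked code) can be tried on a stock JDK with a stand-in for the snippet. In the sketch below, System.arraycopy replaces the machine code snippet, and the bounds check uses Objects.checkFromIndexSize, a variation on the slide's check that accepts a range ending exactly at the array end and rejects negative indices. The class SafeCopy is a hypothetical name.

```java
import java.util.Objects;

final class SafeCopy {
    static final int WIDTH = 32; // 256 bits, the width of one ymm register

    /** Copies 32 bytes from src[idx1..] to dst[idx2..] after validating both ranges. */
    static void copy256(byte[] src, int idx1, byte[] dst, int idx2) {
        // Bounds checks: throw IndexOutOfBoundsException before any memory is touched.
        Objects.checkFromIndexSize(idx1, WIDTH, src.length);
        Objects.checkFromIndexSize(idx2, WIDTH, dst.length);
        // Stand-in for the unsafe wrapper: the real prototype would invoke the snippet here.
        System.arraycopy(src, idx1, dst, idx2, WIDTH);
    }
}
```

Keeping the checks in plain Java leaves them visible to the JIT, which may hoist or eliminate them, while the unchecked code stays minimal.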


Vectors


VPADDD Add Packed Integers


    vpaddd %ymm1,%ymm0,%ymm0

    vector /*ymm0*/ vpaddd(vector v1 /*ymm0*/, vector v2 /*ymm1*/)

    MH(??)? /* (ymm0,ymm1)ymm0 */


JVM vs Hardware Impedance Mismatch

size (bits)   8    16   32    64    128    256    512    …
x86 regs      AL   AX   EAX   RAX   XMM0   YMM0   ZMM0   -
JVM           B    S    I     J     -      -      -      …


Vectors

§ java.lang.Long2 / Long4 / Long8 / …
  – represent 128/256/512-bit values
  – value-based class (but should be a value class!)

§ “well-known” to the JVM
  – special treatment in the JVM
  – C2 knows how to map the values to appropriate vector registers
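To make the idea of a 256-bit value carried by an ordinary Java object concrete, here is a scalar stand-in. Long4Sketch is a hypothetical class, not the prototype's java.lang.Long4; it stores four longs and emulates what a packed 32-bit integer addition (the operation performed by VPADDD) computes on them, lane by lane and without carry between lanes.

```java
final class Long4Sketch {
    final long l0, l1, l2, l3; // 4 x 64 bits = 256 bits

    Long4Sketch(long l0, long l1, long l2, long l3) {
        this.l0 = l0;
        this.l1 = l1;
        this.l2 = l2;
        this.l3 = l3;
    }

    /** Adds the two 32-bit lanes held in a 64-bit word independently of each other. */
    private static long addLanes(long a, long b) {
        long lo = (a + b) & 0xFFFFFFFFL;            // low lane: carry out is discarded
        long hi = ((a >>> 32) + (b >>> 32)) << 32;  // high lane: overflow shifts out of the word
        return hi | lo;
    }

    /** Scalar emulation of a packed add over eight 32-bit lanes. */
    static Long4Sketch vpaddd(Long4Sketch a, Long4Sketch b) {
        return new Long4Sketch(addLanes(a.l0, b.l0), addLanes(a.l1, b.l1),
                               addLanes(a.l2, b.l2), addLanes(a.l3, b.l3));
    }
}
```

The difference from a plain 64-bit add shows up when a low lane overflows: 0x00000001_FFFFFFFF plus 1 gives 0x00000001_00000000 here, because the carry does not propagate into the neighbouring lane.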


JVM vs Hardware Impedance Mismatch

size (bits)   8    16   32    64    128         256         512         …
x86 regs      AL   AX   EAX   RAX   XMM0        YMM0        ZMM0        -
JVM           B    S    I     J     j.l.Long2   j.l.Long4   j.l.Long8   …


Valhalla JVM vs Hardware

size (bits)   8    16   32    64    128         256         512         …
x86 regs      AL   AX   EAX   RAX   XMM0        YMM0        ZMM0        -
JVM           B    S    I     J     j.l.Long2   j.l.Long4   j.l.Long8   …


VPADDD Add Packed Integers

    vpaddd %ymm1,%ymm0,%ymm0

    Long4 /*rax*/ vpaddd(Long4 v1 /*rsi*/, Long4 v2 /*rdx*/)

    MH(L4L4)L4 /* (rdi,rsi)rax */



VPADDD Add Packed Integers

    vmovdqu 0x10(%rdi),%ymm0    ; unbox
    vmovdqu 0x10(%rsi),%ymm1    ; unbox
    vpaddd  %ymm1,%ymm0,%ymm0
    vmovdqu %ymm0,0x10(%rax)    ; box

    Long4 /*rax*/ vpaddd(Long4 v1 /*rsi*/, Long4 v2 /*rdx*/)

    MH(L4L4)L4 /* (rdi,rsi)rax */


VPADDD Add Packed Integers

    vmovdqu 0x10(%rdi),%ymm0
    vmovdqu 0x10(%rsi),%ymm1
    vpaddd  %ymm1,%ymm0,%ymm0
    vmovdqu %ymm0,0x10(%rax)

    Long4 /*rax*/ vpaddd(Long4 v1 /*rsi*/, Long4 v2 /*rdx*/)

    MH(L4L4)L4 /* (rdi,rsi)rax */


VPADDD Add Packed Integers

    vmovdqu 0x10(%rsi),%ymm0
    vmovdqu 0x10(%rdx),%ymm1
    vpaddd  %ymm1,%ymm0,%ymm0
    mov     %rdi,%rax
    vmovdqu %ymm0,0x10(%rax)

    Long4 /*rax*/ vpaddd(Long4 box /*rsi*/,
                         Long4 v1 /*rdx*/, Long4 v2 /*rcx*/)

    MH(L4L4L4)L4 /* (rdi,rsi,rdx)rax */


VPADDD Add Packed Integers

    Long4 /*rax*/ vpaddd(Long4 v1 /*rsi*/, Long4 v2 /*rdx*/) {
        Object box = Long4.make();
        return (Long4) MHm256_vpaddd.invokeExact(box, v1, v2);
    }

    vmovdqu 0x10(%rsi),%ymm0
    vmovdqu 0x10(%rdx),%ymm1
    vpaddd  %ymm1,%ymm0,%ymm0
    mov     %rdi,%rax
    vmovdqu %ymm0,0x10(%rax)

    MH(L4L4L4)L4 /* (rdi,rsi,rdx)rax */


VPADDD Add Packed Integers

    // findStatic(Long?.class, "make") + collectArguments():
    //   T adapter(A... a) {
    //     Long? b = Long?::make();
    //     return target(b, a...);
    //   }

    vmovdqu 0x10(%rsi),%ymm0
    vmovdqu 0x10(%rdx),%ymm1
    vpaddd  %ymm1,%ymm0,%ymm0
    mov     %rdi,%rax
    vmovdqu %ymm0,0x10(%rax)

    MH(L4L4L4)L4 /* (rdi,rsi,rdx)rax */

    MH(L4L4)L4
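The adapter described in the comment can be reproduced with the standard MethodHandles.collectArguments combinator. In the sketch below, long[] arrays stand in for Long4, the static method make() stands in for Long4.make(), and addInto() stands in for the snippet that writes its result into a preallocated box. All names in BoxAdapterDemo are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class BoxAdapterDemo {
    /** Stand-in for Long4.make(): allocates the result box. */
    static long[] make() {
        return new long[4];
    }

    /** Stand-in for the snippet: stores a + b into box and returns box. */
    static long[] addInto(long[] box, long[] a, long[] b) {
        for (int i = 0; i < box.length; i++) {
            box[i] = a[i] + b[i];
        }
        return box;
    }

    // (box, a, b) -> result   becomes   (a, b) -> result, with box = make() on every call.
    static final MethodHandle ADAPTER;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle target = lookup.findStatic(BoxAdapterDemo.class, "addInto",
                    MethodType.methodType(long[].class, long[].class, long[].class, long[].class));
            MethodHandle maker = lookup.findStatic(BoxAdapterDemo.class, "make",
                    MethodType.methodType(long[].class));
            // A zero-argument filter at position 0 supplies the first argument of target.
            ADAPTER = MethodHandles.collectArguments(target, 0, maker);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static long[] add(long[] a, long[] b) {
        try {
            return (long[]) ADAPTER.invokeExact(a, b);
        } catch (Throwable t) {
            throw new Error(t);
        }
    }
}
```

The resulting handle has two parameters instead of three, which corresponds to the step from MH(L4L4L4)L4 to MH(L4L4)L4 on the slide.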


Stand-alone version

    $ java … -XX:+PrintCodeSnippets ...

    Decoding code snippet "m256_vpaddd" @ 0x10f06dd20
    # parm0: rdi:rdi = 'java/lang/Object'
    # parm1: rsi:rsi = 'java/lang/Long4'
    # parm2: rdx:rdx = 'java/lang/Long4'
    <+0>:  push %rbp
    <+1>:  mov %rsp,%rbp
    <+4>:  vmovdqu 0x10(%rsi),%ymm0
    <+9>:  vmovdqu 0x10(%rdx),%ymm1
    <+14>: vpaddd %ymm1,%ymm0,%ymm0
    <+18>: mov %rdi,%rax
    <+21>: vmovdqu %ymm0,0x10(%rax)
    <+26>: leaveq
    <+27>: retq


    # {method} 'vpaddd' '(L4L4)L4'
    # parm0: rsi:rsi = 'java/lang/Long4'
    # parm1: rdx:rdx = 'java/lang/Long4'
    <+0>:  ... preamble ...
    <+12>: mov %rdx,(%rsp)
    <+16>: mov %rsi,%rbp
    ...
    <+35>: callq _new_instance_Java
    <+40>: mov %rax,%rbx
    <+43>: mov (%rsp),%r10
    <+47>: vmovdqu 0x10(%r10),%ymm1
    <+53>: vmovdqu 0x10(%rbp),%ymm0
    <+58>: vpaddd %ymm0,%ymm1,%ymm0
    <+62>: vmovdqu %ymm0,0x10(%rbx)
    <+67>: mov %rbx,%rax
    <+70>: ... prologue ...


VPADDD

    public static Long4 testVAdd4(Long4 v1, Long4 v2,
                                  Long4 v3, Long4 v4) {
        Long4 t1 = vpaddd(v1, v2);
        Long4 t2 = vpaddd(v3, v4);
        Long4 t3 = vpaddd(t1, t2);
        return vpaddd(t3, v1);
    }

Nested Sum

[Diagram: expression tree ((v1 + v2) + (v3 + v4)) + v1]


    0x112ea1640:
    # {method} 'testVAdd4' ...
    # parm0: rsi:rsi = 'java/lang/Long4'
    # parm1: rdx:rdx = 'java/lang/Long4'
    # parm2: rcx:rcx = 'java/lang/Long4'
    # parm3: r8:r8   = 'java/lang/Long4'
    ... preamble ...
    <+12>:  mov %r8,%r13
    <+15>:  mov %rcx,%rbx
    <+18>:  mov %rsi,%rbp
    <+21>:  vmovdqu 0x10(%rsi),%ymm0     ; unbox
    <+26>:  vmovdqu 0x10(%rdx),%ymm1     ; unbox
    <+31>:  vpaddd %ymm0,%ymm1,%ymm0     ; snippet
    <+35>:  vmovdqu %ymm0,(%rsp)
    <+40>:  vmovdqu 0x10(%rbx),%ymm0     ; unbox
    <+45>:  vmovdqu 0x10(%r13),%ymm1     ; unbox
    <+51>:  vpaddd %ymm0,%ymm1,%ymm0     ; snippet
    <+55>:  vmovdqu %ymm0,%ymm1
    <+59>:  vmovdqu (%rsp),%ymm0
    <+64>:  vpaddd %ymm0,%ymm1,%ymm0     ; snippet
    <+68>:  vmovdqu %ymm0,(%rsp)
    ... object allocation ...
    <+87>:  callq _new_instance_Java
    <+92>:  mov %rax,%rbx
    <+95>:  vmovdqu 0x10(%rbp),%ymm1     ; unbox
    <+100>: vmovdqu (%rsp),%ymm0
    <+105>: vpaddd %ymm0,%ymm1,%ymm0     ; snippet
    <+109>: vmovdqu %ymm0,0x10(%rbx)     ; box
    <+114>: mov %rbx,%rax
    ... prologue ...
    <+131>: retq


Register Allocator-aware Snippets


RA-aware Snippets

Let user know about RA decisions!

Idea (2nd iteration):

Use snippet “recipe” instead of raw machine code.


Register Masks

    vpaddd _, _, _

    Long4 vpaddd(Long4 v1, Long4 v2)

    MH(L4L4)L4
    ([%ymm0-15], [%ymm0-15]) [%ymm0-15]


JVMCI


    package jdk.vm.ci.amd64;

    /** Represents the AMD64 architecture. */
    public class AMD64 extends Architecture {
        // General purpose CPU registers
        public static final Register rax = new Register(0, 0, "rax", CPU);
        public static final Register rcx = new Register(1, 1, "rcx", CPU);
        public static final Register rdx = new Register(2, 2, "rdx", CPU);
        public static final Register rbx = new Register(3, 3, "rbx", CPU);
        public static final Register rsp = new Register(4, 4, "rsp", CPU);
        ...

    /** Represents a target machine register. */
    public final class Register implements Comparable<Register> {

Platform Definitions


RA-aware Snippets

    package jdk.vm.ci.panama;

    public class MachineCodeSnippet {
        public static MethodHandle make(String name,
                                        MethodType mt,
                                        boolean isSupported,
                                        Register[][] regMasks,
                                        SnippetGenerator codeProducer) {...}

        @FunctionalInterface
        public interface SnippetGenerator {
            int[] getCode(Register... regs);
        }



VPADDD

    static final MethodHandle MHm256_vpaddd = MachineCodeSnippet.make(
        "mm256_vpaddd",
        MethodType.methodType(Long4.class, Long4.class, Long4.class),
        requires(AVX2),
        new Register[][]{xmmRegs, xmmRegs, xmmRegs},
        (Register[] regs) -> {
            Register out = regs[0], in1 = regs[1], in2 = regs[2];
            int vex2 = vex2(/*R*/ 1, in2.encoding(), /*L*/ 1, /*pp*/ 1);
            return new int[]{0xC5, vex2, 0xFE, modRM(out, in1)};
        });

RA-Aware Version
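The slide does not show the bodies of vex2 and modRM. The sketch below is one plausible reading based on the x86 VEX encoding rules, with plain register numbers in place of JVMCI Register objects; the helper bodies, their parameter meaning (here the first parameter of vex2 is the extension bit of the destination register, which the helper inverts itself), and the class name VpadddEncoder are assumptions of this example. It produces the four bytes of a 256-bit vpaddd for registers chosen by the register allocator.

```java
final class VpadddEncoder {
    /** Second byte of a 2-byte VEX prefix: inverted R, inverted vvvv, L, pp. */
    static int vex2(int regExt, int vvvv, int l, int pp) {
        return ((~regExt & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 3);
    }

    /** ModRM byte in register-direct form (mod = 11). */
    static int modRM(int reg, int rm) {
        return 0xC0 | ((reg & 7) << 3) | (rm & 7);
    }

    /**
     * Encodes a 256-bit vpaddd with destination ymm{out}, the ModRM.rm source ymm{in1},
     * and the VEX.vvvv source ymm{in2}, following the operand placement on the slide.
     */
    static int[] vpaddd256(int out, int in1, int in2) {
        if (out < 0 || out > 15 || in2 < 0 || in2 > 15) {
            throw new IllegalArgumentException("register number must be 0 to 15");
        }
        if (in1 < 0 || in1 > 7) {
            // The 2-byte VEX form has no B bit, so the rm operand is limited to ymm0-ymm7.
            throw new IllegalArgumentException("rm operand needs the 3-byte VEX form");
        }
        return new int[] {0xC5, vex2(out >> 3, in2, /*L*/ 1, /*pp*/ 1), 0xFE, modRM(out, in1)};
    }
}
```

With out = 0, in1 = 1, in2 = 0 the result is C5 FD FE C1, the encoding of `vpaddd %ymm1,%ymm0,%ymm0`.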


    0x112ea1640:
    # {method} 'testVAdd4' ...
    # parm0: rsi:rsi = 'java/lang/Long4'
    # parm1: rdx:rdx = 'java/lang/Long4'
    # parm2: rcx:rcx = 'java/lang/Long4'
    # parm3: r8:r8   = 'java/lang/Long4'
    ... preamble ...
    <+12>:  mov %r8,%r13                 ; ???
    <+15>:  mov %rcx,%rbx                ; ???
    <+18>:  mov %rsi,%rbp                ; ???
    <+21>:  vmovdqu 0x10(%rsi),%ymm0     ; unbox
    <+26>:  vmovdqu 0x10(%rdx),%ymm1     ; unbox
    <+31>:  vpaddd %ymm0,%ymm1,%ymm0     ; snippet
    <+35>:  vmovdqu 0x10(%rbx),%ymm1     ; unbox
    <+40>:  vmovdqu 0x10(%r13),%ymm2     ; unbox
    <+46>:  vpaddd %ymm1,%ymm2,%ymm1     ; snippet
    <+50>:  vpaddd %ymm0,%ymm1,%ymm0     ; snippet
    <+54>:  vmovdqu %ymm0,(%rsp)         ; spill
    ... object allocation ...
    <+75>:  callq _new_instance_Java
    <+80>:  mov %rax,%rbx                ; ???
    <+83>:  vmovdqu 0x10(%rbp),%ymm0     ; unbox
    <+88>:  vmovdqu (%rsp),%ymm1         ; fill
    <+93>:  vpaddd %ymm1,%ymm0,%ymm0     ; snippet
    <+97>:  vmovdqu %ymm0,0x10(%rbx)     ; box
    <+102>: mov %rbx,%rax                ; ???
    ... prologue ...
    <+119>: retq


Effects

• Memory
• Killed registers


Register Preservation

vpaddd _, _, _

Long4 vpaddd(Long4 v1, Long4 v2)

MT(L4L4)L4 ([ymm0-15], [ymm0-15]) [ymm0-15]

/* no memory effects */
/* KILL: [%rax, %rcx, %rdx, …, %xmm0, ..., %xmm15] */

Calling Convention


Register Preservation Explicit KILLs

vpaddd _, _, _

Long4 vpaddd(Long4 v1, Long4 v2)

MT(L4L4)L4 ([ymm0-15], [ymm0-15]) [ymm0-15]

/* no memory effects */
KILL: [/* empty */]


...nmethod preamble...
<+12>: mov %rsi,%rbp             ; reg-to-reg move
<+15>: vmovdqu 0x10(%rsi),%ymm0  ; unbox
<+20>: vmovdqu 0x10(%rdx),%ymm1  ; unbox
<+25>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+29>: vmovdqu 0x10(%rcx),%ymm1  ; unbox
<+34>: vmovdqu 0x10(%r8),%ymm2   ; unbox
<+40>: vpaddd %ymm1,%ymm2,%ymm1  ; snippet
<+44>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+48>: vmovdqu %ymm0,(%rsp)      ; spill
...allocation...
<+67>: callq _new_instance_Java
<+72>: vmovdqu 0x10(%rbp),%ymm0  ; unbox
<+77>: vmovdqu (%rsp),%ymm1      ; fill
<+82>: vpaddd %ymm1,%ymm0,%ymm0  ; snippet
<+86>: vmovdqu %ymm0,0x10(%rax)  ; box
...nmethod prologue...

# parm0: rsi:rsi = 'java/lang/Long4'
# parm1: rdx:rdx = 'java/lang/Long4'
# {method} 'vpaddd' ...


Vectors and Box Elimination


package java.lang;

// 128-bit vector.
public /* value */ final class Long2 {
    private final long l1, l2; // FIXME

    private Long2() { throw new UnsupportedOperationException(); }

    @HotSpotIntrinsicCandidate
    public static native Long2 make(long lo, long hi);

    @HotSpotIntrinsicCandidate
    public native long extract(int i);

    @HotSpotIntrinsicCandidate
    public boolean equals(Long2 v) {...}
    ...


Vector Box Layout

[Figure: j.l.Long2 box layout. Field labels: mark, klass, l1, l2. Offset labels: 0, +8, +12, +16, +20, +32.]


Vector Box Layout

[Figure: the same j.l.Long2 layout (mark, klass, l1, l2; offsets 0, +8, +12, +16, +20, +32), annotated with vmovdqu.]


Vector Box Layout: Endianness

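[Figure: j.l.Long2 layout (mark, klass, l1, l2; offsets 0, +8, +12, +16, +20, +32), annotated with vmovdqu and with LSB/MSB markers shown in both orders (LSB MSB and MSB LSB).]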


Vector Box Layout: Endianness

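[Figure: j.l.Long2 layout without the l1/l2 field labels (mark, klass; offsets 0, +8, +12, +16, +32), annotated with vmovdqu and with LSB/MSB markers shown in both orders.]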


Vector Box Layout: Alignment

[Figure: j.l.Long2 layout (mark, klass; offsets 0, +8, +12, +16, +20, +32), annotated with the question "vmovdqa?".]


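Vector Box Layout: Alignment

[Figure: two j.l.Long2 placements (mark, klass; offsets +8, +12, +16, +20, +32), each marked with an 8-byte label; one is annotated with vmovdqa and the other with vmovdqu.]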
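The alignment question can be illustrated with simple address arithmetic. This is a minimal sketch, assuming 8-byte object alignment (the HotSpot default) and a 16-byte alignment requirement for an aligned 128-bit move such as vmovdqa; the class `BoxAlignment` and its methods are hypothetical names used only for illustration.

```java
// Hypothetical helper: checks whether a vector payload inside a box is suitably aligned.
final class BoxAlignment {
    /** True if address is a multiple of alignment (alignment must be a power of two). */
    static boolean isAligned(long address, int alignment) {
        return (address & (alignment - 1)) == 0;
    }

    /** True if the payload at objectBase + payloadOffset meets the given alignment. */
    static boolean payloadAligned(long objectBase, int payloadOffset, int alignment) {
        return isAligned(objectBase + payloadOffset, alignment);
    }

    public static void main(String[] args) {
        // Two 8-byte-aligned object addresses that differ by 8 bytes.
        long[] bases = {0x1000L, 0x1008L};
        for (long base : bases) {
            System.out.printf("base=0x%x payload@+16 aligned to 16: %b%n",
                    base, payloadAligned(base, 16, 16));
        }
    }
}
```

With only 8-byte object alignment, a payload at a fixed offset is 16-byte aligned for some object addresses and not for others, so an unaligned move (vmovdqu) is the safe choice unless the allocator guarantees a stricter alignment.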


Snippet Result Boxing C2 IR


Nested Vector Operations C2 IR


...
<+48>: vmovdqu %ymm0,(%rsp)      ; spill
...allocation...
<+67>: callq _new_instance_Java  ; SAFEPOINT / DEOPT
<+72>: vmovdqu 0x10(%rbp),%ymm0  ; unbox
<+77>: vmovdqu (%rsp),%ymm1      ; fill
...

PcDesc(pc=<+67> offset=48 bits=4):
  ...
  j.l.i.LambdaForm$BMH::reinvoke@20
  Locals
   - l0: a '.../BMH$Species_LL'{0x...}
   - l1: obj[20]
  ...
  Objects
   - 20: java.lang.Long4 stack[0]
  .

# parm0: rsi:rsi = 'java/lang/Long4'
# parm1: rdx:rdx = 'java/lang/Long4'

No Boxing Across Safepoints


...nmethod preamble...
<+12>: mov %rsi,%rbp             ; reg-to-reg move
<+15>: vmovdqu 0x10(%rsi),%ymm0  ; unbox
<+20>: vmovdqu 0x10(%rdx),%ymm1  ; unbox
<+25>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+29>: vmovdqu 0x10(%rcx),%ymm1  ; unbox
<+34>: vmovdqu 0x10(%r8),%ymm2   ; unbox
<+40>: vpaddd %ymm1,%ymm2,%ymm1  ; snippet
<+44>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+48>: vmovdqu %ymm0,(%rsp)      ; spill
...allocation...
<+67>: callq _new_instance_Java
<+72>: vmovdqu 0x10(%rbp),%ymm0
<+77>: vmovdqu (%rsp),%ymm1      ; fill
<+82>: vpaddd %ymm1,%ymm0,%ymm0  ; snippet
<+86>: vmovdqu %ymm0,0x10(%rax)  ; box
...nmethod prologue...

Allocation Placement


...nmethod preamble...
<+12>: mov %rsi,%rbp             ; reg-to-reg move
<+15>: vmovdqu 0x10(%rsi),%ymm0  ; unbox
<+20>: vmovdqu 0x10(%rdx),%ymm1  ; unbox
<+25>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+29>: vmovdqu 0x10(%rcx),%ymm1  ; unbox
<+34>: vmovdqu 0x10(%r8),%ymm2   ; unbox
<+40>: vpaddd %ymm1,%ymm2,%ymm1  ; snippet
<+44>: vpaddd %ymm0,%ymm1,%ymm0  ; snippet
<+48>: vmovdqu %ymm0,(%rsp)      ; spill
...allocation...
<+67>: callq _new_instance_Java
<+72>: vmovdqu 0x10(%rbp),%ymm0  ; repeated unbox
<+77>: vmovdqu (%rsp),%ymm1      ; fill
<+82>: vpaddd %ymm1,%ymm0,%ymm0  ; snippet
<+86>: vmovdqu %ymm0,0x10(%rax)  ; box
...nmethod prologue...

Repeated Unboxing


Hash

h_{i+1} = 31 * h_i + v_i
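As a point of reference for the vectorized version, the recurrence can be written as a plain scalar loop. This is a minimal sketch, assuming an initial value of 0 and signed byte inputs; the class `ScalarHash` is a hypothetical name, not code from the talk.

```java
// Hypothetical scalar baseline for the recurrence h_{i+1} = 31 * h_i + v_i.
final class ScalarHash {
    /** Polynomial hash over buf[off .. off+len), starting from 0; int overflow wraps. */
    static int hash(byte[] buf, int off, int len) {
        int h = 0;
        for (int i = off; i < off + len; i++) {
            h = 31 * h + buf[i];
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] data = "abc".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        // For ASCII input this agrees with String.hashCode(), which uses the same recurrence.
        System.out.println(hash(data, 0, data.length) == "abc".hashCode());
    }
}
```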


Vectorized Hash

[Figure: one vectorized hash step on eight 32-bit lanes. Each accumulator lane a7..a0 is multiplied by 31^8, each input lane i7..i0 is multiplied by one of the powers 31^7..31^0, and the two products are added lane-wise to give b7..b0.]


Vectorized Hash

Long4 vhash8(Long4 acc, long ch8) {
    acc = mullo_epi32(acc, pow88);
    Long4 cv8 = load_v8qi_to_v8si(ch8);
    cv8 = mullo_epi32(cv8, pow8);
    acc = add_epi32(acc, cv8);
    return acc;
}

int vectorized_hash(byte[] buf, int off, int len) {
    Long4 acc = Long4.ZERO;
    for (; len >= 8; off += 8, len -= 8) {
        long v = U.getLong(buf, Unsafe.ARRAY_BYTE_BASE_OFFSET + off);
        acc = vhash8(acc, v);
    }
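The lane arithmetic can be checked without any vector hardware by emulating the eight 32-bit lanes with an int array. This is a sketch under stated assumptions: lane k is assumed to hold byte k of each 8-byte block and to be weighted by 31^(7-k), bytes are assumed to be sign-extended, and the final horizontal sum and scalar tail are additions that the slide does not show. `LaneHash`, `POW8` and `POW88` are hypothetical names that only mirror `pow8` and `pow88` above.

```java
// Hypothetical scalar emulation of the 8-lane hash step; not the snippet-based implementation.
final class LaneHash {
    static final int LANES = 8;
    static final int[] POW8 = new int[LANES]; // assumed per-lane weights: 31^7 ... 31^0
    static final int POW88;                   // 31^8, applied to every accumulator lane

    static {
        int p = 1;
        for (int k = LANES - 1; k >= 0; k--) {
            POW8[k] = p;
            p *= 31;
        }
        POW88 = p;
    }

    /** One step: acc[k] = acc[k] * 31^8 + buf[off + k] * 31^(7 - k). */
    static void vhash8(int[] acc, byte[] buf, int off) {
        for (int k = 0; k < LANES; k++) {
            acc[k] = acc[k] * POW88 + buf[off + k] * POW8[k];
        }
    }

    /** Lane-wise hash of full 8-byte blocks, lane sum, then a scalar loop for the tail. */
    static int hash(byte[] buf) {
        int[] acc = new int[LANES];
        int off = 0;
        int len = buf.length;
        for (; len >= LANES; off += LANES, len -= LANES) {
            vhash8(acc, buf, off);
        }
        int h = 0;
        for (int a : acc) {
            h += a;
        }
        for (; len > 0; off++, len--) {
            h = 31 * h + buf[off];
        }
        return h;
    }

    /** Scalar reference: h = 31 * h + v for every byte. */
    static int scalarHash(byte[] buf) {
        int h = 0;
        for (byte b : buf) {
            h = 31 * h + b;
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] data = "The quick brown fox jumps".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println(hash(data) == scalarHash(data));
    }
}
```

The two results agree because a byte at position 8t+k of n = 8m bytes needs the weight 31^(n-1-8t-k) = 31^(8(m-1-t)) * 31^(7-k), which is exactly what lane k accumulates; 32-bit overflow wraps identically in both versions.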


Long4 acc = Long4.ZERO;
for (; len >= 8; off += 8, len -= 8) {
    long v = U.getLong(...);
    acc = vhash8(acc, v);
}


Escape Analysis in C2

“The test case shows limitation of the current EA implementation. Objects will not be eliminated if there is merge point in which it is undefined which object is referenced.”

JDK-6853701

“[The] address may point to more then one object. This may produce the false positive result (set not scalar replaceable) since the flow-insensitive escape analysis can't separate the case when stores overwrite the field's value from the case when stores happened on different control branches.”

hotspot/src/share/vm/opto/escape.cpp#l1733

Limitations
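The shape of code that runs into this limitation can be shown with an ordinary immutable class. This is an illustrative sketch only: `MergePointDemo` and `Vec2` are hypothetical stand-ins for the vector box types, and the code demonstrates where the merge point arises rather than measuring whether a particular C2 build eliminates the allocation.

```java
// Hypothetical illustration of a loop-carried box that meets at a merge point.
final class MergePointDemo {
    /** Small immutable box, standing in for a vector box such as Long2. */
    static final class Vec2 {
        final long a, b;

        Vec2(long a, long b) {
            this.a = a;
            this.b = b;
        }

        Vec2 add(Vec2 other) {
            return new Vec2(a + other.a, b + other.b); // fresh box on every call
        }
    }

    static long sum(long[] data) {
        Vec2 acc = new Vec2(0, 0); // allocation before the loop
        for (int i = 0; i + 1 < data.length; i += 2) {
            // At the loop header, acc is either the initial box or the box created
            // in the previous iteration, so the compiler sees a merge of two allocations.
            acc = acc.add(new Vec2(data[i], data[i + 1]));
        }
        return acc.a + acc.b;
    }

    public static void main(String[] args) {
        System.out.println(sum(new long[] {1, 2, 3, 4}));
    }
}
```

The accumulator loop in the vectorized hash has the same structure: `acc` merges `Long4.ZERO` with the result of the previous `vhash8` call.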


Escape Analysis in C2

§ Phi node blocks allocation elimination
§ No vector load hoisting out of a loop
§ No constant folding of vector values
§ Repeated unboxing
§ Box allocation placement

Observations


Summary


Current Status

§ Almost feature complete
§ C2 support on Linux/Solaris/Mac x86-64
§ Extensively used for Vector API experiments
§ Early adopters (thanks!)
  – Paul Sandoz (Oracle)
  – Ian Graves (Intel)


Remaining Work

§ Feature work
  – temporary registers
  – instruction alignment
  – multiple return values
  – improve error diagnostics
§ Support non-x86 ISAs
  – SPARC (VIS), ARM64 (NEON)
§ Support in other JIT-compilers (Graal, C1)
  – implement snippet embedding
§ User-friendly API
§ EA enhancements


Multiple Return Values cpuid

“CPUID can be executed at any privilege level to serialize instruction execution. Serializing instruction execution guarantees that any modifications to flags, registers, and memory for previous instructions are completed before the next instruction is fetched and executed.”


Multiple Return Values

static final MethodHandle MHcpuid = MachineCodeSnippet.builder("cpuid")
        .effects(CONTROL, READ_MEMORY, WRITE_MEMORY)
        .argument(int.class, rsi)
        .argument(int.class, rdx)
        .returns(Long2.class, xmm0) // MT(II)L2
        .kills(rax, rbx, rcx, rdx)
        .code(0x8B, 0xC6,                          // mov %esi,%eax
              0x8B, 0xCA,                          // mov %edx,%ecx
              0x0F, 0xA2,                          // cpuid
              0x66, 0x0F, 0x3A, 0x22, 0xC0, 0x00,  // pinsrd $0x0,%eax,%xmm0
              0x66, 0x0F, 0x3A, 0x22, 0xC3, 0x01,  // pinsrd $0x1,%ebx,%xmm0
              0x66, 0x0F, 0x3A, 0x22, 0xC1, 0x02,  // pinsrd $0x2,%ecx,%xmm0
              0x66, 0x0F, 0x3A, 0x22, 0xC2, 0x03)  // pinsrd $0x3,%edx,%xmm0
        .make();

cpuid


Multiple Return Values

# parm0: rsi = 'I'
# parm1: rdx = 'I'
mov %esi,%eax
mov %edx,%ecx
cpuid
pinsrd $0x0,%eax,%xmm0
pinsrd $0x1,%ebx,%xmm0
pinsrd $0x2,%ecx,%xmm0
pinsrd $0x3,%edx,%xmm0
pextrd $0x2,%xmm0,%eax
retq

cpuid(0x0,0x0).ecx

# parm0: rsi = 'I'
# parm1: rdx = 'I'
mov %esi,%eax
mov %edx,%ecx
cpuid
mov %ecx,%eax
retq
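On the Java side, the four registers arrive packed into one 128-bit value. The following sketch shows how the lanes could be unpacked, assuming the 128-bit result is viewed as two longs (`lo` holding lanes 0 and 1, `hi` holding lanes 2 and 3) with the lane order eax, ebx, ecx, edx set by the pinsrd sequence above. `CpuidLanes` and its methods are hypothetical helpers, and the sample values are hard-coded rather than read from a real CPU.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for unpacking a cpuid result packed as eax, ebx, ecx, edx in lanes 0..3.
final class CpuidLanes {
    /** Extracts 32-bit lane i (0..3) from a 128-bit value given as two longs. */
    static int lane(long lo, long hi, int i) {
        long half = (i < 2) ? lo : hi;
        return (int) (half >>> (32 * (i & 1)));
    }

    /** Leaf 0 vendor string: the bytes of ebx, edx, ecx in that order, least significant byte first. */
    static String vendor(long lo, long hi) {
        int[] regs = {lane(lo, hi, 1), lane(lo, hi, 3), lane(lo, hi, 2)};
        byte[] out = new byte[12];
        for (int r = 0; r < regs.length; r++) {
            for (int b = 0; b < 4; b++) {
                out[4 * r + b] = (byte) (regs[r] >>> (8 * b));
            }
        }
        return new String(out, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Sample leaf 0 values, hard-coded for illustration.
        long lo = (0x756e6547L << 32) | 0xDL;        // ebx : eax
        long hi = (0x49656e69L << 32) | 0x6c65746eL; // edx : ecx
        System.out.println(vendor(lo, hi));
        System.out.println(Integer.toHexString(lane(lo, hi, 2))); // the .ecx lane
    }
}
```

Extracting lane 2 corresponds to the `cpuid(0x0,0x0).ecx` access in the listings above.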


Summary

§ Evolution of machine code snippet prototype
  – raw code, vectors, code “recipes”, effects
§ Vector values
  – should be value classes (w/ “heisenboxes”)
    § identity-less
    § aggressive boxing/unboxing
  – EA is interim solution
    § limitations of EA implementation in C2


Links

§ Project Panama repo: http://hg.openjdk.java.net/panama/panama/
§ Machine Code Snippets API:
  – jdk.vm.ci/vm/ci/panama/MachineCodeSnippet.java
§ Samples
  – http://hg.openjdk.java.net/panama/panama/jdk/file/tip/test/panama/snippets


Thank you!

@iwan0www
