financial portfolio management with java on steroids - jax finance 2016

Marcus Gründler | aixigo AG

Financial Portfolio Management on Steroids

About me

Marcus Gründler, @marcusgruendler

Head of Portfolio Management Systems

Architect at aixigo AG, Germany

www.aixigo.de

JAX Finance 2016 - London

Agenda

• Financial portfolio management

• Hardware

• Java memory

• Programming patterns

• Scaling

What exactly is Financial Portfolio Management?

Portfolio Management

• Transactions, prices, securities

• Financial algorithms

• Historical analysis

• Time series

Portfolio Management

• No extreme low latency

• High data throughput (1 million records/sec)

• Low response times (100 ms per 10,000 records)

• Very large datasets

Is high performance computing possible with Java?

Yes ...

... read the fine print!

Matrix Sum

23 101 2 34 88 120 4

44 12 234 211 112 189 11

33 1 86 201 3 11 22

65 32 62 22 34 15 67

43 178 105 138 192 38 41

11 58 35 25 27 16 21

Row major access

Matrix Sum

23 101 2 34 88 120 4

44 12 234 211 112 189 11

33 1 86 201 3 11 22

65 32 62 22 34 15 67

43 178 105 138 192 38 41

11 58 35 25 27 16 21

Column major access

Matrix Sum (10,000 x 10,000 elements)

[Chart: ops/sec, row major access vs. column major access]
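The two access patterns compared on the Matrix Sum slides can be sketched as follows. This is a minimal illustration, not the benchmark code from the talk: the class name MatrixSum, the int element type and the int[][] representation are assumptions. Both methods compute the same sum; they differ only in the order in which memory is visited.

```java
final class MatrixSum {

    // Row major: the inner loop walks along one row array,
    // so consecutive reads hit adjacent memory.
    static long sumRowMajor(int[][] m) {
        long sum = 0;
        for (int row = 0; row < m.length; row++) {
            for (int col = 0; col < m[row].length; col++) {
                sum += m[row][col];
            }
        }
        return sum;
    }

    // Column major: the inner loop switches to a different row array
    // on every step. Assumes a rectangular matrix.
    static long sumColumnMajor(int[][] m) {
        long sum = 0;
        int cols = m.length == 0 ? 0 : m[0].length;
        for (int col = 0; col < cols; col++) {
            for (int row = 0; row < m.length; row++) {
                sum += m[row][col];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // The small example matrix shown on the slides.
        int[][] m = {
            {23, 101, 2, 34, 88, 120, 4},
            {44, 12, 234, 211, 112, 189, 11},
            {33, 1, 86, 201, 3, 11, 22},
            {65, 32, 62, 22, 34, 15, 67},
            {43, 178, 105, 138, 192, 38, 41},
            {11, 58, 35, 25, 27, 16, 21}
        };
        System.out.println(sumRowMajor(m) == sumColumnMajor(m));
    }
}
```

On a matrix this small the difference is not measurable; the effect only shows up once the data no longer fits into the CPU caches.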

Tool Support - JMH

• OpenJDK JMH (Java Microbenchmark Harness)

• Reduces measurement inaccuracy

• Statistically robust measurements

• Maven and Jenkins support

http://openjdk.java.net/projects/code-tools/jmh/

Memory Access

CPU ↔ RAM

Cached Memory Access

CPU ↔ Cache ↔ RAM

Multi Level Caches

CPU ↔ L1 ↔ L2 ↔ L3 ↔ RAM

Latency Numbers

                       CPU cycles    Time              Size
L1 latency             ~4-5          1.5 ns            32 KB
L2 latency             ~12           3.5 ns            256 KB
L3 latency             ~36           10.6 ns           8-40 MB
RAM latency            ~230          67.6 ns           256 GB
2 KB over 1 Gbit                     20,000 ns
1 MB from RAM                        250,000 ns
Disk seek                            10,000,000 ns
Roundtrip US-EU-US                   150,000,000 ns

Intel i7-4770 (Haswell), 3.4 GHz

Sources:

http://www.7-cpu.com/cpu/Haswell.html

http://norvig.com/21-days.html#answers

Cache-oblivious Algorithms

• Optimized for minimal memory transfer

• All computation within L1 cache

• Cache-"oblivious": no knowledge about the cache hierarchy

• Keeps the CPU permanently "under pressure"

• Empowers cache prefetching

Plain Matrix Transpose

1 2 3 ...        1 4 7 ...
4 5 6 ...   →    2 5 ...
7 ...            3 6 ...

Cache-oblivious Transpose
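A cache-oblivious transpose is commonly written as a recursive divide-and-conquer: split the larger dimension in half until the block is small, then transpose the block directly. The sketch below follows that general scheme and is not the code from the talk. The class name Transpose, the flat row-major double[] layout and the BASE cutoff are assumptions; the cutoff only limits recursion overhead and is not tuned to any particular cache size.

```java
final class Transpose {

    // Assumed cutoff: blocks of at most BASE x BASE elements are copied directly.
    private static final int BASE = 16;

    // Plain transpose: reads src sequentially, but every write to dst
    // jumps 'rows' elements ahead.
    static void plain(double[] src, double[] dst, int rows, int cols) {
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                dst[c * rows + r] = src[r * cols + c];
            }
        }
    }

    // Recursive transpose: src is rows x cols, dst is cols x rows, both row-major.
    static void cacheOblivious(double[] src, double[] dst, int rows, int cols) {
        recurse(src, dst, rows, cols, 0, rows, 0, cols);
    }

    private static void recurse(double[] src, double[] dst, int rows, int cols,
                                int r0, int r1, int c0, int c1) {
        int dr = r1 - r0;
        int dc = c1 - c0;
        if (dr <= BASE && dc <= BASE) {
            // Small block: source and destination both stay within a few cache lines.
            for (int r = r0; r < r1; r++) {
                for (int c = c0; c < c1; c++) {
                    dst[c * rows + r] = src[r * cols + c];
                }
            }
        } else if (dr >= dc) {
            // Split the row range in half.
            int mid = r0 + dr / 2;
            recurse(src, dst, rows, cols, r0, mid, c0, c1);
            recurse(src, dst, rows, cols, mid, r1, c0, c1);
        } else {
            // Split the column range in half.
            int mid = c0 + dc / 2;
            recurse(src, dst, rows, cols, r0, r1, c0, mid);
            recurse(src, dst, rows, cols, r0, r1, mid, c1);
        }
    }
}
```

Because the recursion keeps halving the problem, at some depth the working block fits into each cache level without the code ever knowing the cache sizes.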

Matrix Transpose (4096 x 4096 elements)

[Chart: ops/sec]

Matrix Transpose (4096 x 4096 elements)

[Chart: ops/sec by number of cores]

• Memory access patterns matter

• Avoid main memory jumps

• Algorithms should support prefetching

How much memory do we consume?

How large is an object?

double = 8 bytes

Double = ? bytes

BigDecimal = ? bytes

java.lang.Double

Offset (bytes)    Content
0-8               Object header: "Mark word" (hash code, flags)
8-12              Object header: class pointer
12-16             Padding (8 byte alignment)
16-24             Data

double = 8 bytes

Double = 24 bytes

BigDecimal = ? bytes

double array

Offset (bytes)    Content
0-8               Object header: "Mark word"
8-12              Object header: class pointer
12-16             Array length
16-24             Data 0
24-32             Data 1
...

BigDecimal

BigDecimal object: Σ 40 bytes

ref to BigInteger: Σ 40 bytes

ref to int[ ]: Σ >16 bytes

Total: >96 bytes

double = 8 bytes

Double = 24 bytes

BigDecimal = >96 bytes

Tool Support - JOL

• OpenJDK tool – JOL (Java Object Layout)

• Insight into memory layout

• Heap dump analysis

• Exact memory usage

• Graphical layout visualization

• Maven module

http://openjdk.java.net/projects/code-tools/jol/

Tool Support - JOL

java -jar jol-cli.jar internals java.math.BigDecimal

or

java -cp jol-cli.jar:my-own.jar org.openjdk.jol.Main internals foo.MyClass

Tool Support - JOL

java.math.BigDecimal object internals:

OFFSET  SIZE  TYPE        DESCRIPTION                  VALUE
     0    12              (object header)              N/A
    12     4  int         BigDecimal.scale             N/A
    16     8  long        BigDecimal.intCompact        N/A
    24     4  int         BigDecimal.precision         N/A
    28     4  BigInteger  BigDecimal.intVal            N/A
    32     4  String      BigDecimal.stringCache       N/A
    36     4              (loss due to the next object alignment)

Instance size: 40 bytes (estimated, the sample instance is not available)

Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

Data Locality

56 + 520 = 576 bytes

64 bytes

Memory Layout

TransactionPlain[100]

TransactionCompact[100]

276 kB

• Keep data compact

• Think about data types

• Keep memory allocations low

• Keep garbage collection rate low
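The slides name TransactionPlain and TransactionCompact but do not show their fields, so the following is only a sketch of the idea under stated assumptions. All field names, the numeric security id and the fixed-point scale of six decimal places are hypothetical. The point is that the compact variant holds primitives only, so one object carries all its data without references to further heap objects.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.LocalDate;

// Hypothetical "plain" transaction: every field is a reference to another heap object.
final class TransactionPlain {
    final String isin;
    final LocalDate date;
    final BigDecimal quantity;
    final BigDecimal price;

    TransactionPlain(String isin, LocalDate date, BigDecimal quantity, BigDecimal price) {
        this.isin = isin;
        this.date = date;
        this.quantity = quantity;
        this.price = price;
    }
}

// Hypothetical compact transaction: primitives only, no object references.
final class TransactionCompact {
    private static final int SCALE = 6; // assumed fixed-point precision

    final long securityId;     // numeric id instead of a String (assumed lookup elsewhere)
    final int epochDay;        // days since 1970-01-01 instead of a LocalDate
    final long quantityMicros; // quantity * 10^6
    final long priceMicros;    // price * 10^6

    TransactionCompact(long securityId, int epochDay, long quantityMicros, long priceMicros) {
        this.securityId = securityId;
        this.epochDay = epochDay;
        this.quantityMicros = quantityMicros;
        this.priceMicros = priceMicros;
    }

    static TransactionCompact from(TransactionPlain t, long securityId) {
        return new TransactionCompact(
            securityId,
            Math.toIntExact(t.date.toEpochDay()),
            toMicros(t.quantity),
            toMicros(t.price));
    }

    // Rounds to six decimal places; throws ArithmeticException if the value overflows a long.
    private static long toMicros(BigDecimal value) {
        return value.setScale(SCALE, RoundingMode.HALF_EVEN).unscaledValue().longValueExact();
    }

    LocalDate date() {
        return LocalDate.ofEpochDay(epochDay);
    }

    BigDecimal price() {
        return BigDecimal.valueOf(priceMicros, SCALE);
    }
}
```

Whether fixed-point longs are precise enough depends on the instruments involved; the scale here is chosen for illustration only.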

And...

Which patterns should I use?

Tree Model

Flattened Data Model

Streaming Data Access


Decoupled Algorithms

• Prefer primitives over classes

• Prefer arrays over object trees

• Process one array at a time
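One way to read the three rules above: store each attribute in its own primitive array and let every algorithm make a single sequential pass over the arrays it needs. The sketch below assumes hypothetical column arrays for quantities and prices and uses double for brevity; none of these names come from the talk.

```java
final class FlattenedPositions {

    // First pass: combines two input columns into one output column.
    // Each array is read or written strictly front to back.
    static double[] marketValues(double[] quantities, double[] prices) {
        if (quantities.length != prices.length) {
            throw new IllegalArgumentException("columns must have the same length");
        }
        double[] values = new double[quantities.length];
        for (int i = 0; i < values.length; i++) {
            values[i] = quantities[i] * prices[i];
        }
        return values;
    }

    // Second pass, decoupled from the first: it only knows about one array.
    static double total(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] quantities = {2.0, 3.0};
        double[] prices = {10.0, 5.0};
        System.out.println(total(marketValues(quantities, prices)));
    }
}
```

Keeping the passes separate means each loop touches a small, predictable set of arrays, which is the access pattern hardware prefetchers handle best.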

What about cluster replication?

MVCC (Multi Version Concurrency Control)

V1 V2 V3 V4

View 1

View 1+2

View 1+2+3

View 1+2+3+4
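The version and view diagram can be read as an append-only log: each commit adds an immutable version, and a view is simply "all versions up to N". The sketch below illustrates that reading and is not the implementation from the talk; the class name VersionedLog and the use of long[] batches are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class VersionedLog {

    // Version i is stored at index i - 1. Stored batches are never modified.
    private final List<long[]> versions = new ArrayList<>();

    // Appends a new version and returns its 1-based version number.
    synchronized int commit(long[] batch) {
        versions.add(batch.clone());
        return versions.size();
    }

    synchronized int latestVersion() {
        return versions.size();
    }

    // Returns a snapshot containing versions 1..version.
    // Later commits do not change a view that has already been handed out.
    synchronized List<long[]> view(int version) {
        if (version < 0 || version > versions.size()) {
            throw new IllegalArgumentException("unknown version " + version);
        }
        return new ArrayList<>(versions.subList(0, version));
    }
}
```

Readers working on "View 1+2" keep a consistent picture of the data while a writer commits version 3, because nothing they reference is ever changed in place.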

Compact Data

V1 V2 V3 V4

View 1

View 1+2

View 1+2+3

View 1+2+3+4

Off heap RAM

On disk

From Disk to RAM

V1

V2

V3

V1 V2 V3

MappedByteBuffer
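A minimal sketch of bringing a file from disk into RAM through a MappedByteBuffer, using only java.nio from the JDK. The class name MappedColumn and the file layout (a plain sequence of big-endian doubles) are assumptions for illustration; the talk does not specify its on-disk format.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.DoubleBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedColumn {

    // Writes the values as consecutive 8-byte doubles (ByteBuffer default: big-endian).
    static void write(Path file, double[] values) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Double.BYTES);
        buffer.asDoubleBuffer().put(values);
        Files.write(file, buffer.array());
    }

    // Maps the file read-only and sums its contents without copying it onto the Java heap.
    static double sum(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            DoubleBuffer doubles = mapped.asDoubleBuffer();
            double sum = 0.0;
            while (doubles.hasRemaining()) {
                sum += doubles.get();
            }
            return sum;
        }
    }
}
```

The mapped region lives outside the Java heap, so it adds no garbage collection load, and the operating system pages data in from disk on demand.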

Data Distribution

Apache Kafka

Data Distribution

Summary

• Large speedups through cache-optimized algorithms

• Memory layout is crucial

• Scale by cache replication

@marcusgruendler

Thank you!
