Financial Portfolio Management with Java on Steroids - JAX Finance 2016
About me
Marcus Gründler
@marcusgruendler
Head of Portfolio Management Systems
Architect at aixigo AG, Germany
www.aixigo.de
JAX Finance 2016 - London
Portfolio Management
• Transactions, prices, securities
• Financial algorithms
• Historical analysis
• Time series
Portfolio Management
• No extreme low-latency requirements
• High data throughput (1 million records/s)
• Low response times (100 ms per 10,000 records)
• Very large datasets
Matrix Sum
23 101 2 34 88 120 4
44 12 234 211 112 189 11
33 1 86 201 3 11 22
65 32 62 22 34 15 67
43 178 105 138 192 38 41
11 58 35 25 27 16 21
Row major access
Matrix Sum
23 101 2 34 88 120 4
44 12 234 211 112 189 11
33 1 86 201 3 11 22
65 32 62 22 34 15 67
43 178 105 138 192 38 41
11 58 35 25 27 16 21
Column major access
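The two access patterns above can be sketched in plain Java. This is a minimal illustration, not code from the talk: the class and method names are hypothetical, and the matrix is the 6 x 7 example from the slides. In Java an int[][] is an array of row arrays, so the row-major loop walks consecutive memory while the column-major loop touches a different row array on every step.

```java
// Sketch: sum the same matrix in row-major and in column-major order.
// Class and method names are illustrative assumptions.
final class MatrixSum {

    // Row-major: the inner loop walks the consecutive elements of one row array.
    static long sumRowMajor(int[][] m) {
        long sum = 0;
        for (int row = 0; row < m.length; row++) {
            for (int col = 0; col < m[row].length; col++) {
                sum += m[row][col];
            }
        }
        return sum;
    }

    // Column-major: the inner loop jumps to another row array on every step.
    // Assumes a rectangular matrix (all rows have the same length).
    static long sumColumnMajor(int[][] m) {
        long sum = 0;
        int cols = m.length == 0 ? 0 : m[0].length;
        for (int col = 0; col < cols; col++) {
            for (int row = 0; row < m.length; row++) {
                sum += m[row][col];
            }
        }
        return sum;
    }

    // The example matrix shown on the slides.
    static int[][] slideMatrix() {
        return new int[][] {
            {23, 101, 2, 34, 88, 120, 4},
            {44, 12, 234, 211, 112, 189, 11},
            {33, 1, 86, 201, 3, 11, 22},
            {65, 32, 62, 22, 34, 15, 67},
            {43, 178, 105, 138, 192, 38, 41},
            {11, 58, 35, 25, 27, 16, 21}
        };
    }

    public static void main(String[] args) {
        int[][] m = slideMatrix();
        System.out.println("row major:    " + sumRowMajor(m));
        System.out.println("column major: " + sumColumnMajor(m));
    }
}
```

Both methods return the same sum. The difference only shows up in run time, and only once the matrix is much larger than the CPU caches, so a meaningful comparison needs a large matrix and a proper benchmark harness rather than this tiny example.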
Tool Support - JMH
• OpenJDK JMH (Java Microbenchmark Harness)
• Guards against common sources of measurement inaccuracy
• Statistically robust measurements
• Maven and Jenkins support
http://openjdk.java.net/projects/code-tools/jmh/
Latency Numbers
                      CPU cycles      Time              Size
L1 latency            ~ 4-5 cycles    1.5 ns            32 KB
L2 latency            ~ 12 cycles     3.5 ns            256 KB
L3 latency            ~ 36 cycles     10.6 ns           8-40 MB
RAM latency           ~ 230 cycles    67.6 ns           256 GB
2 KB over 1 GBit                      20,000 ns
1 MB from RAM                         250,000 ns
Disk seek                             10,000,000 ns
Roundtrip US-EU-US                    150,000,000 ns
Intel i7-4770 (Haswell), 3.4 GHz
Sources:
http://www.7-cpu.com/cpu/Haswell.html
http://norvig.com/21-days.html#answers
Cache-oblivious Algorithms
• Optimized for minimal memory transfer
• Ideally, all computation runs out of the L1 cache
• Cache-"oblivious": no knowledge about the cache hierarchy required
• Keeps the CPU permanently "under pressure"
• Enables cache prefetching
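As an illustration of the idea, here is a minimal sketch of a well-known cache-oblivious algorithm, a recursive matrix transpose. It is not taken from the talk. The class name and the BASE_CASE cutoff are assumptions; the cutoff only limits recursion overhead and is not tuned to any cache size. The recursion keeps halving the larger dimension, so at some depth each sub-block fits into every cache level without the code knowing how large those levels are.

```java
// Sketch of a cache-oblivious transpose on row-major double arrays.
// Names and the cutoff value are illustrative assumptions.
final class CacheObliviousTranspose {

    // Blocks at most this wide and high are transposed with plain loops.
    private static final int BASE_CASE = 16;

    // Transposes the rows x cols matrix src into the cols x rows matrix dst.
    static void transpose(double[] src, double[] dst, int rows, int cols) {
        transpose(src, dst, rows, cols, 0, rows, 0, cols);
    }

    // Transposes the sub-block [r0, r1) x [c0, c1) of src into dst.
    private static void transpose(double[] src, double[] dst, int rows, int cols,
                                  int r0, int r1, int c0, int c1) {
        int height = r1 - r0;
        int width = c1 - c0;
        if (height <= BASE_CASE && width <= BASE_CASE) {
            for (int r = r0; r < r1; r++) {
                for (int c = c0; c < c1; c++) {
                    dst[c * rows + r] = src[r * cols + c];
                }
            }
        } else if (height >= width) {
            // Split the taller dimension in half and recurse on both parts.
            int mid = r0 + height / 2;
            transpose(src, dst, rows, cols, r0, mid, c0, c1);
            transpose(src, dst, rows, cols, mid, r1, c0, c1);
        } else {
            int mid = c0 + width / 2;
            transpose(src, dst, rows, cols, r0, r1, c0, mid);
            transpose(src, dst, rows, cols, r0, r1, mid, c1);
        }
    }
}
```

A caller would allocate dst with rows * cols elements and call transpose(src, dst, rows, cols). The same divide-and-conquer shape applies to other matrix-style computations.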
java.lang.Double
double = 8 bytes
Double = ? bytes
BigDecimal = ? bytes
[Figure: memory layout of a java.lang.Double, byte offsets 0 to 28. Object header ("mark word" with hash code and flags, followed by the class pointer), data, and padding for 8-byte alignment.]
double array
double = 8 bytes
Double = 24 bytes
BigDecimal = ? bytes
[Figure: memory layout of a double array, byte offsets 0 to 28. "Mark word" (hash code, flags), class pointer, array length, then Data 0, Data 1, ...]
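BigDecimal
double = 8 bytes
Double = 24 bytes
BigDecimal = ? bytes
[Figure: memory layout of a BigDecimal, byte offsets 0 to 40, holding a ref to a BigInteger, which in turn holds a ref to an int[ ].]
BigDecimal: Σ 40 bytes
BigInteger: Σ 40 bytes
int[ ]: Σ > 16 bytes
Total: > 96 bytes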
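To make these sizes tangible, the arithmetic for a whole array of values can be sketched as follows. The per-object numbers are taken from the slides (24 bytes per Double, more than 96 bytes per BigDecimal, 16-byte array header); the 4-byte reference size assumes compressed references as in the JOL output, and the class and method names are hypothetical. The estimate also assumes that every element is a distinct object and ignores alignment padding of the array itself.

```java
// Sketch: rough heap footprint of n numeric values in different representations.
// All constants are assumptions based on the sizes shown on the slides.
final class FootprintEstimate {

    static final long ARRAY_HEADER = 16;    // mark word + class pointer + array length
    static final long REFERENCE = 4;        // one compressed reference per element
    static final long DOUBLE_OBJECT = 24;   // one java.lang.Double instance
    static final long BIG_DECIMAL_MIN = 96; // lower bound for one BigDecimal instance

    // double[]: the values are stored inline, 8 bytes each.
    static long primitiveArray(long n) {
        return ARRAY_HEADER + 8 * n;
    }

    // Double[]: one reference plus one separate object per element.
    static long boxedArray(long n) {
        return ARRAY_HEADER + (REFERENCE + DOUBLE_OBJECT) * n;
    }

    // BigDecimal[]: one reference plus at least 96 bytes per element.
    static long bigDecimalArrayMin(long n) {
        return ARRAY_HEADER + (REFERENCE + BIG_DECIMAL_MIN) * n;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("double[]     " + primitiveArray(n) + " bytes");
        System.out.println("Double[]     " + boxedArray(n) + " bytes");
        System.out.println("BigDecimal[] " + bigDecimalArrayMin(n) + " bytes (lower bound)");
    }
}
```

Under these assumptions one million values take roughly 8 MB as double[], 28 MB as Double[] and at least 100 MB as BigDecimal[], which is the gap the slides point at.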
Tool Support - JOL
• OpenJDK tool – JOL (Java Object Layout)
• Insight into memory layout
• Heap dump analysis
• Exact memory usage
• Graphical layout visualization
• Maven module
http://openjdk.java.net/projects/code-tools/jol/
Tool Support - JOL
java -jar jol-cli.jar \
    internals java.math.BigDecimal
or
java -cp jol-cli.jar:my-own.jar \
    org.openjdk.jol.Main \
    internals foo.MyClass
Tool Support - JOL
java.math.BigDecimal object internals:
 OFFSET  SIZE        TYPE  DESCRIPTION                              VALUE
      0    12              (object header)                          N/A
     12     4         int  BigDecimal.scale                         N/A
     16     8        long  BigDecimal.intCompact                    N/A
     24     4         int  BigDecimal.precision                     N/A
     28     4  BigInteger  BigDecimal.intVal                        N/A
     32     4      String  BigDecimal.stringCache                   N/A
     36     4              (loss due to the next object alignment)
Instance size: 40 bytes (estimated, the sample instance is not available)
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
• Keep data compact
• Think about data types
• Keep memory allocations low
• Keep garbage collection rate low
• Large speedups through cache-optimized algorithms
• Memory layout is crucial
• Scale by cache replication
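One way to read the points about compact data and low allocation rates is to keep a time series in parallel primitive arrays instead of one object per data point. The following is a minimal sketch under that assumption. The class PriceSeries, its fields and the day-number encoding of dates are hypothetical and not taken from the talk.

```java
// Sketch: a price time series stored as two parallel primitive arrays.
// All names are illustrative assumptions.
final class PriceSeries {

    private final long[] epochDays; // observation dates as day numbers, ascending
    private final double[] prices;  // prices[i] belongs to epochDays[i]

    PriceSeries(long[] epochDays, double[] prices) {
        if (epochDays.length != prices.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        this.epochDays = epochDays.clone();
        this.prices = prices.clone();
    }

    // Mean price over fromDay <= day <= toDay, or NaN if the range is empty.
    // Scans a contiguous slice of the arrays and allocates nothing.
    double meanPrice(long fromDay, long toDay) {
        double sum = 0;
        int count = 0;
        for (int i = lowerBound(fromDay); i < epochDays.length && epochDays[i] <= toDay; i++) {
            sum += prices[i];
            count++;
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    // Binary search for the index of the first observation on or after day.
    private int lowerBound(long day) {
        int lo = 0;
        int hi = epochDays.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (epochDays[mid] < day) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

A query such as new PriceSeries(days, prices).meanPrice(from, to) then reads both arrays sequentially, which suits the cache and creates no garbage per data point.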
@marcusgruendler