
The LMAX Architecture with Disruptors: 6M Transactions per Second

Stephan Schmidt, Vice CTO, brands4friends

Me: Stephan Schmidt, Vice CTO, brands4friends

@codemonkeyism www.codemonkeyism.com

brands4friends
•  No. 1 shopping club in Germany
•  > 360k daily visitors
•  > 4.5M users
•  eBay company

20.04.12 5 WJAX 2011

Development at brands4friends
•  Team: Java and web developers, data warehouse developers
•  Process: Scrum since 2009, Kanban for DWH since 2012

LMAX - The London Multi-Asset Exchange


"We aim to build the highest performance financial exchange in the world"

High Performance Transaction Processing


Service / Transaction Processor

Receive, Unmarshal, Replicate, Journal, Business Logic, Marshal, Send



[Diagram: pipeline with six queues]

[Chart labels: GHz CPU, Cores]


Actors? SEDA?

Stuff that did not work for various reasons


1.  RDBMS

2.  Actors

3.  SEDA

4.  J2EE …




LMAX Architecture





Queue as a data structure

[Diagram: Linked List Queue (Size; Node, Node, Node, Node; Add, Remove) and Array Queue (Cache Line, Cache Line; Add, Remove)]

Problems with Queues

1.  Reading (Take) and Writing (Add) are both write accesses => Write Contention

2.  Write Contention is solved with Locks
    1.  Other solutions include Deques

3.  Locks lead to context switches to the kernel
    1.  Context switches lead to CPU cache misses etc.
    2.  Kernel might use the opportunity to do other stuff as well
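
To make the first point concrete, here is a minimal sketch of a lock-based bounded array queue. The class `LockedArrayQueue` and its field names are hypothetical and not taken from the talk. Both `add` and `take` modify the shared `size` field, so producer and consumer contend on the same memory even though they work at opposite ends of the queue.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration: a bounded array queue guarded by a single lock.
final class LockedArrayQueue<T> {
    private final Object[] items;
    private final ReentrantLock lock = new ReentrantLock();
    private int head;  // next slot to take from
    private int tail;  // next slot to add to
    private int size;  // written by BOTH add() and take()

    LockedArrayQueue(int capacity) {
        items = new Object[capacity];
    }

    // Returns false when the queue is full.
    boolean add(T item) {
        lock.lock();
        try {
            if (size == items.length) {
                return false;
            }
            items[tail] = item;
            tail = (tail + 1) % items.length;
            size++;  // write access on the producer side
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Returns null when the queue is empty.
    @SuppressWarnings("unchecked")
    T take() {
        lock.lock();
        try {
            if (size == 0) {
                return null;
            }
            T item = (T) items[head];
            items[head] = null;
            head = (head + 1) % items.length;
            size--;  // write access on the consumer side as well
            return item;
        } finally {
            lock.unlock();
        }
    }
}
```

Even a pure reader of the queue therefore has to take the lock, which is where the context switches in point 3 come from under contention.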

Lock costs according to the LMAX paper

Method                              Time in ms
Single thread                              300
Single thread with lock                 10,000
Two threads with lock                  224,000
Single thread with CAS                   5,700
Two threads with CAS                    30,000
Single thread with volatile write        4,700

•  CAS = “Compare And Swap”
•  AtomicReference etc. in Java => no context switch
•  Memory read/write barrier
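
The difference between a lock and CAS can be sketched with a shared counter. This is an illustration only, not the benchmark from the LMAX paper; the class `CounterSketch` and its method names are made up for this example. `AtomicLong.compareAndSet` is the standard JDK call.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: two ways to increment a shared counter.
final class CounterSketch {
    private long lockedCounter;                              // guarded by the monitor of "this"
    private final AtomicLong casCounter = new AtomicLong();  // updated with compare-and-swap

    // Lock-based increment: a contended monitor may park the thread in the kernel.
    synchronized long incrementWithLock() {
        return ++lockedCounter;
    }

    // CAS-based increment: retry in user space until no other thread interfered.
    long incrementWithCas() {
        while (true) {
            long current = casCounter.get();
            long next = current + 1;
            if (casCounter.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    // Runs two threads that each perform perThread CAS increments; returns the final count.
    static long casWithTwoThreads(int perThread) {
        CounterSketch counter = new CounterSketch();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.incrementWithCas();
            }
        };
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.casCounter.get();
    }

    public static void main(String[] args) {
        System.out.println(casWithTwoThreads(100_000));
    }
}
```

No increment is lost with CAS, but a failed `compareAndSet` costs a retry, which is consistent with the two-thread CAS row being slower than the single-thread one.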

LMAX Data Structure – Ring Buffer

[Diagram: Ring Buffer, Publisher, Event Processor]

Pre-Allocation of Buckets

[Diagram: Ring Buffer with 2^5 = 32 slots numbered 0 to 31, Publisher, Event Processor]

•  No (less) GC problems
•  Objects are near each other in memory => cache friendly
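
A minimal sketch of pre-allocation, assuming a power-of-two size such as the 2^5 slots on the slide. `PreallocatedRing` and `ValueEvent` are hypothetical names for this example, not the Disruptor's own classes. All entries are created once up front and then reused, and a sequence number is mapped to a slot with a bit mask instead of a modulo.

```java
import java.util.function.Supplier;

// Hypothetical mutable event that is reused instead of being reallocated.
final class ValueEvent {
    long value;
}

// Hypothetical ring of pre-allocated entries; size must be a power of two.
final class PreallocatedRing<E> {
    private final Object[] slots;
    private final int mask;

    PreallocatedRing(int size, Supplier<E> factory) {
        if (Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        slots = new Object[size];
        mask = size - 1;
        // Allocate every entry once; the ring itself creates no garbage afterwards.
        for (int i = 0; i < size; i++) {
            slots[i] = factory.get();
        }
    }

    // Maps an ever-increasing sequence number to its slot.
    @SuppressWarnings("unchecked")
    E get(long sequence) {
        return (E) slots[(int) (sequence & mask)];
    }
}
```

With 32 slots, sequences 0 and 32 resolve to the same object, so a publisher overwrites the fields of an existing entry rather than allocating a new one. Allocating the entries back to back makes it likely, though the JVM does not guarantee it, that they end up close together in memory.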

Coordination

[Diagram: Ring Buffer with 2^5 slots, Publisher, Event Processor]

Claim Strategy

1.  Claim

2.  Write

3.  Make public by advancing the sequence

Wait Strategy
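
The three claim steps and a wait strategy can be sketched for the simplest case of one publisher and one event processor. This is an assumption-laden illustration, not the Disruptor implementation: the class `SpscRingSketch`, its fields, and the yielding wait loop are all chosen for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-publisher, single-processor ring of long values.
final class SpscRingSketch {
    private final long[] slots;
    private final int mask;
    private long claimed = -1;                                // touched by the publisher thread only
    private final AtomicLong published = new AtomicLong(-1);  // highest sequence visible to the processor
    private final AtomicLong consumed = new AtomicLong(-1);   // highest sequence the processor has finished

    SpscRingSketch(int size) {
        if (Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        slots = new long[size];
        mask = size - 1;
    }

    // Publisher side.
    void publish(long value) {
        long sequence = ++claimed;                  // 1. claim the next sequence
        long wrapPoint = sequence - slots.length;
        while (consumed.get() < wrapPoint) {        // do not overwrite an unread slot
            Thread.yield();
        }
        slots[(int) (sequence & mask)] = value;     // 2. write into the claimed slot
        published.set(sequence);                    // 3. make public by advancing the sequence
    }

    // Event processor side.
    long take() {
        long next = consumed.get() + 1;
        while (published.get() < next) {            // wait strategy: spin and yield
            Thread.yield();
        }
        long value = slots[(int) (next & mask)];
        consumed.set(next);
        return value;
    }

    // Publishes 1..count on a second thread and sums the values on the calling thread.
    static long sumThroughRing(int count) {
        SpscRingSketch ring = new SpscRingSketch(32);
        Thread publisher = new Thread(() -> {
            for (int i = 1; i <= count; i++) {
                ring.publish(i);
            }
        });
        publisher.start();
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += ring.take();
        }
        return sum;
    }
}
```

Neither side takes a lock: each sequence counter has exactly one writer, and the volatile semantics of `AtomicLong` make the slot contents visible before the advanced sequence is. Swapping the `Thread.yield()` loop for a blocking wait would trade latency for lower CPU usage, which is the kind of choice a wait strategy encapsulates.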




Latency

[Diagram: Receive Message, Journal, Replicate, Unmarshal, Business Logic]



LMAX Architecture

[Diagram: Input Disruptor with Receiver, Journaler, Replicator and Un-Marshaller; Business Logic Handler; Output Disruptors with Marshaller and Publisher. Further labels: Data structure, HA Node, File System]

Each stage can have multiple threads

[Diagram: Ring Buffer positions 31, 24, 19, 18 with Receiver, Journaler, Replicator]

The Receiver writes to slot 31. The Journaler and Replicator are reading slot 24 and can advance up to sequence 30.

The Business Logic Handler needs to stay behind all others.

The Un-Marshaller can move beyond the Journaler and Replicator, up to 30.
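
The rule that the Business Logic Handler stays behind all others amounts to taking the minimum of the upstream sequences. A minimal sketch, with the helper `highestAvailable` and the concrete sequence values chosen purely for illustration:

```java
// Hypothetical helper: how far a dependent handler may advance.
final class GatingSketch {
    // Returns the lowest upstream sequence; the dependent handler must not pass it.
    // With no upstream sequences the result is Long.MAX_VALUE, meaning "unconstrained".
    static long highestAvailable(long... upstreamSequences) {
        long minimum = Long.MAX_VALUE;
        for (long sequence : upstreamSequences) {
            minimum = Math.min(minimum, sequence);
        }
        return minimum;
    }

    public static void main(String[] args) {
        long journaler = 24;     // assumed positions, for illustration only
        long replicator = 24;
        long unMarshaller = 27;
        System.out.println(highestAvailable(journaler, replicator, unMarshaller));
    }
}
```

The Journaler, Replicator and Un-Marshaller only depend on the Receiver, so each of them is gated by the published sequence alone and they can run in parallel.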


Java API


[Diagrams: producer/consumer topologies with producers P1, P2 and consumers C1 to C4]


Demo

LMAX Low Level Ideas


1.  Simple Code

2.  Everything in memory

3.  Single threaded per CPU for business logic

4.  Business logic has no I/O, I/O is done somewhere else

5.  Scheduler “knows” dependencies of handlers
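
Points 2 to 4 can be sketched as a handler that keeps all state in memory, is called by exactly one thread, and hands its results to a later stage instead of doing I/O itself. The account and deposit domain and the class `InMemoryLogicSketch` are invented for this example and are not LMAX's domain model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical business logic handler: single-threaded, in memory, no I/O.
final class InMemoryLogicSketch {
    private final Map<String, Long> balances = new HashMap<>();  // all state lives in memory
    private final List<String> outbox = new ArrayList<>();       // results for a later output stage

    // Must only be called from one thread, so no locks are needed.
    void onDeposit(String account, long amount) {
        long updated = balances.merge(account, amount, Long::sum);
        outbox.add(account + "=" + updated);  // record the result; sending happens elsewhere
    }

    long balanceOf(String account) {
        return balances.getOrDefault(account, 0L);
    }

    List<String> outbox() {
        return outbox;
    }
}
```

Because the handler never blocks on a disk or a socket, its throughput is bounded by CPU and memory access, not by I/O latency.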

6M TPS? How did LMAX do it?

10K+ TPS
•  If you don't do anything stupid
•  3 billion instructions on a modern CPU

x 10

100K+ TPS
•  Clean, organized code
•  Standard libraries

x 10

1000K+ TPS
•  Custom, cache-friendly collections
•  Performance testing
•  Controlled GC
•  Very well modeled domain

We’re looking for very good developers

Thanks! @codemonkeyism


Images CC from Flickr: nimboo, imjustcreative, gremionis, justonlysteve, John_Scone, Matthias Wicke, irisgodd3ss, TunnelBug, alandd, seasonal wanderer, raulbarraltamayo, Gilmoth, Dunechaser, graftedno1

Sources


“Disruptor: High performance alternative to bounded queues for exchanging data between concurrent threads”, Martin Thompson, Dave Farley, Michael Barker, Patricia Gee, Andrew Stewart, 2011

“The LMAX Architecture”, Martin Fowler, 2011, http://martinfowler.com/articles/lmax.html

“How to do 100K+ TPS at less than 1ms latency”, Martin Thompson, Michael Barker, 2010
