Report: High Performance Trading - FIX Messaging Testing for Low Latency
Abstract:
FIX is the de facto standard protocol used extensively for electronic communication between the buy-side, sell-side and execution venues, where the performance requirements of algorithmic and high frequency trading are extreme and/or the benefits of STP (straight-through processing) are sought from electronic connectivity.
January 2012
WITH THANKS TO THE TEAM AT INTEL FASTERLAB UK
STATEMENT OF CONFIDENTIALITY / DISCLAIMER This document has been prepared by the consortium of companies described herein. No part of this document shall be reproduced without consultation with these parties and acknowledgement of its source. Contact can be made to [email protected].
FIX Messaging Low Latency Testing
January 2012
TABLE OF CONTENTS
1. SUMMARY .............................................................................................................................................................. 1
2. INTRODUCTION ................................................................................................................................................... 2
2.1 PURPOSE ...................................................................................................................................................................... 2
2.2 ROLES AND RESPONSIBILITIES ............................................................................................................................. 2
2.3 CONDUCT AND PRESENTATION ............................................................................................................................ 3
3. METHOD ................................................................................................................................................................. 4
3.1 TEST HARNESS – SOFTWARE DESIGN .................................................................................................................. 4
3.2 TEST HARNESS - HARDWARE DESIGN ................................................................................................................. 5
3.3 THE MESSAGE PASSING PROCESS ............................................................................................................. 7
3.4 TIMINGS ...................................................................................................................................................................... 8
3.5 POST-TEST DATA PROCESSING .............................................................................................................................. 8
3.6 TEST SCENARIOS ....................................................................................................................................................... 9
4. RESULTS AND OBSERVATIONS ..................................................................................................................... 10
4.1 EFFECTIVENESS OF KERNEL BYPASS ................................................................................................................ 10
4.2 MAIN RESULTS ........................................................................................................................................................ 10
5. DISCUSSION ......................................................................................................................................................... 16
5.1 VALUE OF THE EXERCISE TO THE ELECTRONIC FINANCIAL TRADING COMMUNITY .......................... 16
5.2 PERFORMANCE OF THE TEST RIG ....................................................................................................................... 16
5.3 RAISING THE TEST RIG TO PRODUCTION STANDARD ................................................................................... 17
5.4 EXPLOITING THE RESULTS ................................................................................................................................... 19
6. CONCLUSION ...................................................................................................................................................... 20
APPENDICES ............................................................................................................................................................ 21
1A TECHNOLOGY MEMBERS ...................................................................................................................................... 21
1. Summary
This briefing paper reports on the activity of a consortium of leading IT vendors that have joined forces to
create demonstrable high performance solution stacks to address common business requirements in
financial trading. The initial focus of the consortium is on a reference-able technology stack of products
and services to support FIX protocol communication functions. The paper describes the test environment,
documents a set of benchmark tests performed on both commercial and open source FIX engine offerings,
and details and interprets the representative latency and throughput figures achieved.
The objective is to create transparency in, and capability around, comparing performance statistics for key functions along the trading life cycle. The tests used business workloads and were deliberately aligned to reflect the market's current interest in the measurement of interparty latency across the trade life cycle, using FIX-formatted messages for defined legs.
An on-going objective is to provide the market with useful data in order to support decisions in
technology investment. Therefore, a range of technologies and application software has been addressed.
Approaches were made to a number of application vendors, with ultimate agreement to test FIX engines covering both C++ and Java implementations, from EPAM Systems' B2BITS unit and Rapid Addition respectively. As a datum point for comparison, the open source QuickFIX, in both its C++ and Java variants, was used.
OnX Enterprise Solutions Ltd is leading a consortium whose charter members include Intel, Dell, Arista
Networks and Solarflare Communications, with additional services provided by Edge Technology Group,
GreySpark Partners and Equinix. The foundation objective is to create transparent comparative performance statistics for key functions along the trading life cycle using business workloads – FIX being used on a number of legs of the typical trade life cycle.
A series of tests was undertaken that demonstrates the value of commercial software (versus open source) and of specialist technologies in a low latency infrastructure. The consortium approach recognises the reality that the creation of high performance solutions requires the interaction of many leading edge technologies and the integration of components from several vendors. These parties must work together in order to specify the correct parts and then to tune them together such that a complete and reliable solution is available through a collective single channel.
Results for the tests showed that both B2BITS' and Rapid Addition's commercial FIX engines outperformed the open source QuickFIX offerings (C++ and Java) in a range of tests, being between 4 and 16 times faster in generating messages during a standardised simulated trade. The average latency for the commercial engines was 11 to 12 microseconds, whereas the open source engines were between 45 and 180 microseconds. The variation in results was equally stark: the frequency distributions from the commercial engines were bell-curved, while the open source results had a long, fat tail. This indicated that the commercial solutions significantly reduced the effect of network jitter and, with it, the undesired variance of performance.
The B2BITS FIX Antenna engine tested was the C++ version; Rapid Addition's Cheetah engine was Java. Both demonstrated similar performance characteristics over a range of tests and workloads. The similarity of results between the commercial C++ and Java engines stood in contrast to the open source equivalents, demonstrating that Java can perform as well as C++ when implemented in an optimised fashion.
2. Introduction
In the online and co-location based financial trading markets, performance, in terms of both latency and throughput, is paramount. It is the difference between a firm being 'in the market' or not. Complete trading systems are built from many complex elements, including market data capture, trading algorithms, trade execution and in-flow risk analysis. These elements run on critical infrastructure components – hardware, software, network and connectivity – all of which must interoperate with each other.
Today there is a lack of industry-recognised benchmarks with which designers can demonstrate that solutions have 'high performance' characteristics. To achieve performance and agility, with low up-front and on-going operating costs, trade infrastructure implementation teams need to source the best available components from different innovative specialist vendors, integrate them and tune their interoperability.
FIX is the de facto standard protocol used extensively for electronic communication between the buy-side, sell-side and execution venues, where the performance requirements of algorithmic and high frequency trading are extreme and/or the benefits of STP (straight-through processing) are sought from electronic connectivity.
2.1 Purpose
FIX message generation is an increasingly important leg in automated trading and can be a source of significant latency and jitter, which can adversely impact the success of business and trading strategies. As trading strategies require access to a greater diversity of execution venues, communication over the standard FIX protocol is more cost effective than accessing markets via the diverse proprietary protocols of the various venues.
Infrastructure deployment teams have to select appropriate components, then integrate, commission and deploy them for maximum performance. This can be an extreme challenge: it requires a combination of knowledge, skills, experience and deployment ability that is today scarce and expensive in the market.
The testing undertaken by OnX in the Intel lab with support from the consortium was to investigate these
assertions:
1. Using commercial FIX engines would achieve lower latency and less jitter.
2. Using specialist low latency network techniques would have a significant impact on latency.
Full results for each environment and latency improvement are available on request.
2.2 Roles and Responsibilities
OnX Enterprise Solutions, as a "solution facilitator", led a collaborative approach through the creation of a consortium of IT vendors focused on high performance infrastructure designs specifically for financial trading systems. OnX consultants provided input into the hardware selection, the conduct of the tests and the post-test analysis.
The benchmarks were conducted at Intel's fasterLAB in the UK. Intel engineers screened hardware and software performance for optimization, performed the tests, recorded the results and carried out post-test processing to produce the averages tables and graph outputs.
Software suppliers Rapid Addition and B2BITS EPAM Systems provided their FIX engines. The test
harness was designed by Rapid Addition and the open source implementations in Java and C++ were
supplied by Rapid Addition and B2BITS respectively.
2.2.1 Consortium Members
A number of technology and services providers have invested as charter members of the consortium. However, the initiative is open, and further participant members may be added in the future. Between them, these members provide a complete infrastructure capability; they created the reference architecture, each drawing on specific expertise, while OnX provided the integration and build capability.
The charter group members directly involved in building the initial technology stack and in the
performance benchmark testing comprise:
Lead:
OnX Enterprise Solutions – Product procurement and architecture design
Infrastructure component providers:
Arista Networks – Network Switch
Dell – X86 Servers
Intel – Intel® Xeon® processors, and lab environment
Solarflare Communications – Network Interface Card
Implementation and deployment Services:
Edge Technology Group – Buy-side solutions
Equinix – Trading ecosystem hosting
GreySpark Partners – Capital markets business, management and technology consulting services
Applications under test:
Rapid Addition – FIX engine
B2BITS EPAM – FIX engine
QuickFIX – Open Source FIX engine
2.3 Conduct and Presentation
Tests were performed by Intel engineers and preliminary results shared with the software suppliers who
were then given an opportunity to optimize their code. A second round of testing was then conducted, the
results of which were used in the preparation of this paper.
The software houses had access to the test harness prior to testing, in order to agree and finalize the
methodology – but no access or amendment was allowed during the test runs. All results were captured by
Intel and shared with OnX. Only results from Rapid Addition were shared with Rapid Addition and
likewise the results from B2BITS EPAM were shared only with B2BITS EPAM.
3. Method
3.1 Test Harness – Software Design
The test harness used to perform the benchmarks was designed by Rapid Addition (audited by B2BITS
EPAM) with implementations for C++ and Java written by B2BITS EPAM and Rapid Addition,
respectively.
The test software was implemented across two servers. One ran simulators for a market data (MD) feed and an execution venue (EV); the other represented a typical software implementation of a real-life algorithmic trading system, with all applications run on a single server to minimise latency, and with 'in-process' linkage of the algorithmic application logic and the FIX engine under test.
The tests measured recognisable legs in the trading life cycle, mapping to real life workflow scenarios and
researching current industry interest in the measurement of interparty latency over discrete legs of a
trading cycle.
The very simple logic of the simulated algorithmic trading component minimises latency and jitter, so
allowing the focus of the benchmark to be on the FIX engines themselves.
The benchmarking of the FIX engines focused on their ability to process (a) FIX-formatted market data and (b) order processing messages, for 2 defined stages in the trade cycle, at different throughput rates over both burst and prolonged periods.
The companies under scrutiny were given controlled access to the test rig with the ability to run tests, analyse results, tune and re-test. This activity was supported by skilled Intel engineers, who were also available to assist the companies in optimising their code for the target hardware stack.
The diagram below illustrates the test harness with its simulated market data and execution venue.
Figure 1: Test Harness Overview
3.2 Test Harness - Hardware Design
In order to conduct benchmark tests on the FIX engines, the reference architecture was specified and built
by OnX at the fasterLAB in the UK. OnX also analysed and interpreted the benchmarks, and provided an
independent audit of the test activities by the FIX engine vendors. These vendors accessed the test rig via
remote access under pre-approved and agreed conditions. The main components of the reference
architecture are shown below:
Figure 2: Reference Architecture Components
3.2.1 CPU and Servers
The test harness server was a Dell PowerEdge R710 server. This occupies 2U of rack space and incorporates energy efficient technologies to reduce power consumption and cooling requirements. Servers of this type are typically deployed in co-location environments, where space and power can be limited.
The market data simulator and execution venue server included 2 x Intel® Xeon® processors X5677, each with 4 cores at 3.47GHz, and 16GB of RAM, running Microsoft Windows Server 2008. This configuration was sufficient for the test harness task of generating a suitable trading workload.
Dell also provided the monitoring server that hosted the network monitoring service. This comprised an Endace network monitor; timings were uploaded to the operating system only for post-test processing.
The test harness server housed the algorithmic trading system simulator and FIX engine. This server included a single Intel® Xeon® processor X5698 (dual-core), clocked at 4.4 GHz, with 12MB of L3 cache and 96GB of RAM (12 x 8GB), running Red Hat Enterprise Linux (RHEL 6.0). This processor was designed based on feedback from Intel's field teams close to financial trading, for applications where the fastest single-thread instruction execution is required. Performance increases of more than 20% compared to other Intel® Xeon® processor X5600 series parts were noted.
Preliminary tests were undertaken to select the most appropriate processor for the workload by comparing the Intel® Xeon® processor X5698 (4.4GHz) against the Intel® Xeon® processor X5680 (3.33GHz). The preliminary test, using a message rate of 100,000 messages a second, showed the X5698 to have 36% better latency performance than the X5680. The clock speed difference between the processors was 32%, indicating that the X5698 was scaling better than linearly under test and was better suited to the FIX engine workload. On the basis of this preliminary test, the Intel® Xeon® processor X5698 was selected for the test environment.
3.2.2 Network Design
The test harness used a switched network design rather than incorporating network taps at the points of measurement. Network taps are often deployed to measure latency across certain trade processing legs; however, they can introduce instability and unreliability into the network. Instead, port mirroring was used to forward packet data to the Endace network monitor. This is a much more common network implementation in production trading environments.
Two types of network switch were considered:
1. Cut-through switch. This switch starts forwarding a network packet before the whole packet has
been received, normally as soon as the destination address is processed. This reduces latency at
the switch but decreases reliability as corrupted packets may be forwarded.
2. Store and Forward switch. This design buffers the whole packet before processing it. This enables
the switch to validate the integrity of the packet before forwarding it. There is a consequential
delay as a result of the buffering process, which increases latency.
Knowing that timings were likely to be in the range of 5 microseconds to 300 microseconds, the low latency cut-through switch design was selected. The delay of switching packets using a cut-through switch is of the order of 300 to 1000 nanoseconds, depending on manufacturer. The delay of store and forward switching is between 500 and 1000 microseconds, again depending on manufacturer.
Therefore, the cut-through design was adopted, with the switch from Arista Networks (7124SX) built into the stack. The 7124SX uses a low latency application-specific integrated circuit (ASIC), switching at a 250 nanosecond rate. The ASIC is from Fulcrum Microsystems (an Intel company). This network switch runs an Extensible Operating System (EOS), which can support additional features such as PTP (Precision Time Protocol) and can also use Arista's latency analyser utility, known as LANZ.
The switched design depended on a feature called port mirroring, which monitors traffic by sending the information on a specified physical port to another interface. In this case it was essential that port mirroring copied both received and transmitted packets from the mirror source to the mirror destination. In this configuration, the source was the port connecting the FIX engine server and the destination was the server hosting the Endace network monitor card.
Low latency network interface cards (NICs) were selected for all servers. With a non-specialized network card, latencies are around 20 microseconds. Empirical evidence from Solarflare Communications indicates that this can be reduced by 50% by using a specialized low latency network card, and by a further 50% using a technique referred to as 'kernel bypass'. Solarflare is a recognized provider of low latency NICs offering kernel bypass support for both Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) traffic. Typically, market data is broadcast using stateless UDP and trade execution uses TCP. The model selected was the dual interface SFN5122-F.
3.3 The Message Passing Process
The diagram below illustrates how messages pass through the test harness.
Figure 3: Execution Flow of Messages Through the Test Harness
Referring to figure 3 above, the message flow in detail is now described:
1. The market data simulator created Market Data incremental refresh messages (tag 35 = X), assigning an MD Entry Price (tag 270) that was incremented in a sawtooth pattern from 0.001 in 0.001 increments, cycling through a small, in-memory list of stocks (tag 55); with each cycle of the sawtooth pattern the integer part of the price was incremented.
(Figure 3 key: MD Simulator, FIX Engine Under Test, Algo Simulator and EV Simulator, connected via FIX Session-1 and Session-2. Decision points test whether the message is a bid (269=0), whether the MD Entry Price ends .00, whether the "buy" has traded and whether an Execution Report is "New". Timing points are taken at the network card.)
2. The FIX engine listened to this stream of messages on a single FIX session (Session-1) and handed each message up to the algorithmic trading simulator.
3. The algorithm simulator interrogated the data and, when a bid (tag 269=0) had an MD Entry Price ending in ".000" (e.g. 270=56.000), instructed the FIX engine to create and send a new Order Single message (tag 35=D) to buy 100 lots (tag 38=100) of the symbol (tag 55), directed to an execution venue simulator on a second FIX session (Session-2). Since each market data message had a unique price, market data messages could be correlated with the order messages that they triggered.
4. The execution venue simulator automatically filled the order by creating two Execution Reports
(tag 35=8). The first had an Order Status of “New” (tag 39=0); the second, “Filled” (tag 39=2).
These were returned on the same FIX session (Session-2).
5. On receipt of the fill (tag 35=8; tag 39=2), the algo simulator instructed the FIX engine to send another Order Single (tag 35=D) to sell 100 lots (tag 38=100) of the same symbol (tag 55).
6. Again, the execution venue simulator automatically filled the order by creating two Execution Reports (tag 35=8). The first had an Order Status of "New" (tag 39=0); the second, "Filled" (tag 39=2).
7. Note: Tests were performed without use of persistent storage.
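The generation and decision logic of steps 1-5 can be sketched in a few lines. This is an illustrative reconstruction, not the consortium's harness code: the helper names, the stock list and the simplified message layout (tag=value pairs joined by the FIX SOH delimiter, with session-level fields omitted) are all assumptions.

```python
def fix_msg(fields):
    """Join tag=value pairs with the FIX SOH (0x01) delimiter."""
    return "\x01".join(f"{tag}={val}" for tag, val in fields) + "\x01"

def md_incremental_refresh(symbol, price, entry_type="0"):
    # 35=X Market Data Incremental Refresh; 269=0 marks a bid entry.
    return fix_msg([(35, "X"), (55, symbol), (269, entry_type), (270, f"{price:.3f}")])

def should_buy(entry_type, price):
    # Step 3: buy only on a bid (269=0) whose MD Entry Price ends in ".000".
    return entry_type == "0" and round(price * 1000) % 1000 == 0

def order_single(symbol, side):
    # 35=D New Order Single; 54=1 buy, 54=2 sell; 38=100 lots (per step 3).
    return fix_msg([(35, "D"), (55, symbol), (54, side), (38, 100)])

# Sawtooth price generation (step 1): 0.001 increments, rotating through a
# small in-memory stock list; every message is treated as a bid for simplicity.
stocks = ["AAA", "BBB", "CCC"]          # assumed symbols for illustration
orders = []
for tick in range(1, 3001):
    price = tick * 0.001
    sym = stocks[tick % len(stocks)]
    msg = md_incremental_refresh(sym, price)
    if should_buy("0", price):
        orders.append(order_single(sym, "1"))

print(len(orders))   # one buy trigger per 1,000 price ticks -> 3
```

The one-in-a-thousand trigger also explains the order rates in section 3.6: 50,000 market data messages per second yield 50 orders per second.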
3.4 Timings
Since timestamps within the test harness hardware components lacked sufficient microsecond-level accuracy, timings were recorded on an Endace network monitor.
Three timestamps were recorded for each benchmark process:
1. Receipt of the market data message from the market data simulator (T1).
2. Transmission of each of the 2 single order messages to the execution venue simulator (T2).
3. Receipt of the execution confirmation from the execution venue simulator (T3).
3.5 Post-Test Data Processing
The timings collected on the Endace card were uploaded to a Unix workstation for post-processing. Depending on the test run parameters and duration, between 3,000 and 24 million timings were recorded (generating 24MB of test result data per second). Scripts were used to analyze the timestamps to give 2 performance measurements:
1. The FIX engine's ability to process market data messages and create single order messages as a result, calculated as T2-T1 for each buy order.
2. The FIX engine's ability to generate single order messages after receipt of an order filled message, calculated as T3-T2 for each sell order.
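The two calculations can be stated as a minimal sketch; the function name and the sample timestamp values (in microseconds) are invented for illustration:

```python
def latency_legs(t1, t2, t3):
    """Return the two latency legs defined in the text:
    T2-T1 (market data in to buy order out) and T3-T2."""
    return t2 - t1, t3 - t2

# One simulated benchmark cycle, timestamps in microseconds.
t1, t2, t3 = 100.0, 111.5, 160.0
buy_leg, sell_leg = latency_legs(t1, t2, t3)
print(buy_leg, sell_leg)   # 11.5 48.5
```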
3.6 Test Scenarios
The benchmark process was repeated as part of 3 different test cycles covering short to extended duration periods. Each set of benchmark cycles was repeated 3 times in order to establish mean latency figures for the FIX engines.
The 3 intended test cycles were:
1. Burst test – where 50,000 market data messages per second were generated by the market data
simulator for a period of 5 minutes.
2. Sustained test – where market data message rates were increased from 10,000 to 100,000 per
second, by 10,000 every 4 minutes, for a total of 40 minutes.
3. Extended sustained test – where market data rates were increased from 10,000 to 50,000 per
second, by 10,000 every 10 minutes, and then held at 50,000 for a total time of 4 hours.
At 50,000 market data messages per second, 50 orders per second were generated by the algo simulator,
and at 100,000 market data messages, 100 orders per second were generated.
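The sustained-test ramp and the resulting order rates can be tabulated directly. A short sketch, where the variable names and tabulation are assumptions for illustration:

```python
# Sustained test: market data rates step from 10,000 to 100,000 msgs/sec,
# by 10,000 every 4 minutes, for a total of 40 minutes.
rates = [10_000 * step for step in range(1, 11)]      # 10k .. 100k msgs/sec
schedule = [(4 * i, r) for i, r in enumerate(rates)]  # (start minute, rate)

# Order rate is 1/1,000th of the market data rate (one buy per 1,000 ticks):
# e.g. 50,000 MD msgs/sec -> 50 orders/sec.
orders_per_sec = [r // 1_000 for r in rates]

print(schedule[-1], orders_per_sec[-1])   # (36, 100000) 100
```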
Different execution venue simulator delays were tested since not all matching engines at different
exchanges are equal. The delays were varied for the burst test as follows:
Delay (Microseconds) Packets Per Second
10 100,000
14 71,429
20 50,000
50 20,000
100 10,000
200 5,000
1,000 1,000
2,000 500
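The packet rates in the table above are simply the reciprocal of the simulated matching delay: packets per second = 1,000,000 / delay in microseconds, rounded to the nearest packet. A quick check reproducing the table values:

```python
# Reproduce the delay/packet-rate table: pps = 1,000,000 / delay_us.
delays_us = [10, 14, 20, 50, 100, 200, 1000, 2000]
for d in delays_us:
    print(d, round(1_000_000 / d))
```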
4. Results and Observations
A total of 84 test runs were conducted and analyzed, out of an anticipated 96, across the FIX engines tested: B2BITS EPAM and QuickFIX C++; Rapid Addition and QuickFIX Java. Performance issues with the QuickFIX C++ engine prevented it from operating when the execution venue's matching engine was set to respond faster than 50 microseconds.
4.1 Effectiveness of Kernel Bypass
A preliminary design test was conducted; the results indicated that the use of kernel bypass (Solarflare's Open Onload product) had an impact on the commercial FIX engines from B2BITS and Rapid Addition across all tests. No observable impact was recorded when testing the open source variants.
With this observation established, it was decided that kernel bypass would be enabled for all tests, irrespective of whether the application design was capable of taking advantage of it.
4.2 Main Results
The results of the tests showed varying performance characteristics between the Java, C++ and open source code streams.
This included outright latency when delivering message workloads, and the level of jitter displayed by the
engines as they performed their tasks across the period of the test workloads.
The graphs below show a selection of performance characteristics. Full detailed figures for each
environment can be seen in the C++ and Java results reports respectively, where each commercial engine
is compared with its open source equivalent.
4.2.1 Test Results and Observations
The two graphs above show the latency of the workload completion over a 300 microsecond range,
comparing open source against the commercially available Java and C++ FIX engines, respectively.
Buy Orders – Execution Venue simulating a 50 microsecond order matching delay
The two graphs above show the same results over a 60 microsecond range, comparing open source
against the commercially available Java and C++ FIX engines, respectively.
The two graphs above show the latency of the workload completion over a 300 microsecond range,
comparing open source against the commercially available Java and C++ FIX engines, respectively. Note
the absence of performance test results from the open source C++ engine under these test conditions.
Buy Orders – Execution Venue simulating a 14 microsecond order matching delay
The two graphs above show the same results over a 60 microsecond range, comparing open source
against the commercially available Java and C++ FIX engines, respectively.
1. The commercial FIX engines completed the messaging tasks between 30 and 50 microseconds more quickly than the QuickFIX engines.
2. The QuickFIX engines had outlying results out to 300 microseconds (some runs did not complete their task inside this time), a source of jitter (unpredictability).
3. QuickFIX C++ was unable to perform with the exchange simulator set at 14 microseconds.
4. Across the range of tests, each commercial engine exhibited different characteristics, with differences in outright latency and jitter; these showed no common theme and are hence considered to be within experimental error. This assertion is demonstrated when examining the whole result set.
5. The open source Java and C++ QuickFIX engines showed random variation between themselves; the C++ version could not perform at the 14 microsecond load level.
6. The commercial FIX engines were consistent and deterministic throughout the tests.
The commercial engines showed a normal distribution pattern and calculations of standard deviation were
undertaken. The results for the QuickFIX engines showed a large number of outlying results (which
translates to poor reliability in handling trading workloads) and did not fit the normal distribution model.
The commercial FIX engines showed a much tighter distribution range of 4 microseconds, as opposed to
50 microseconds.
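The tight-versus-fat-tail contrast can be quantified with simple distribution statistics. The sketch below uses invented, seeded pseudo-random samples (not the measured results) to show the kind of mean, standard deviation and tail-percentile comparison described above:

```python
import random
import statistics

random.seed(42)
# Tight bell curve, loosely modelled on the commercial engines (~4 us spread).
tight = [random.gauss(11.5, 1.0) for _ in range(10_000)]
# Long-tailed sample, loosely modelled on QuickFIX: 5% outliers out to 300 us.
long_tail = ([random.gauss(50, 8) for _ in range(9_500)]
             + [random.uniform(100, 300) for _ in range(500)])

for name, sample in [("tight", tight), ("long tail", long_tail)]:
    mean = statistics.fmean(sample)
    stdev = statistics.stdev(sample)
    p999 = sorted(sample)[int(0.999 * len(sample))]   # 99.9th percentile
    print(f"{name}: mean={mean:.1f} stdev={stdev:.1f} p99.9={p999:.1f}")
```

The standard deviation alone understates the difference; the tail percentile is what exposes the outliers that did not fit the normal distribution model.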
The sample run below illustrates the point. Note the difference in microsecond range on the X axis of
each graph below.
(Sample run histograms: "QuickFIX - open source" and "Sample commercial engine", each plotting Number of Samples against Time in µs.)
5. Discussion
5.1 Value of the Exercise to the Electronic Financial Trading Community
The testing exercise has informed the debate among practitioners who look to quantify the benefits of
commercial FIX engines over their open source counterparts. It is clear that the commercial engines
outperform open source versions by an order of magnitude – and also have significantly higher
consistency in performance, an essential feature for the execution of certain trading strategies. While the
open source model is widely successful as a driver for innovation, in the case of FIX it is clearly
important to select software products based on the required workload and performance characteristics.
The Java based FIX engine closely matched the native C++ code – with each engine showing individual
characteristics.
Finally, the exercise has demonstrated the value of optimized high performance infrastructure when
deploying automated electronic trading systems.
5.2 Performance of the Test Rig
The test rig in the Intel fasterLAB did not prove to be a limiting factor in the testing process. The infrastructure showed itself to be reliable (with no failed components over the test cycle) when running at the extremes of performance, including running the CPUs at 100% capacity for prolonged periods.
An enhancement being pursued for future tests is to implement the Precision Time Protocol (PTP), which is accurate to 500 nanoseconds. PTP-enabled NICs will be tested, including Solarflare's SFN5322F, which has an accurate oscillator that can act as a grandmaster clock. Other network components are then synchronized, provided they implement the PTP network daemon, which is available for both Red Hat and the Arista EOS switch operating system.
Latency can be measured at the switch using the LANZ feature from Arista. This will reduce the number
of components required to accurately time stamp network packets generated during the trade cycle.
This enhancement will continue to ensure the integrated trading test suite remains at the leading edge of
network component innovation.
5.3 Raising the Test Rig to Production Standard
5.3.1 Deploying Production-Quality Infrastructure
A focus of this series of tests has been to illustrate the importance of design in the technical infrastructure
and its direct and positive impact on performance.
Moving from a lab experiment to a stable production system, which can support live trading execution
strategies that rely on speed and reliability, can be expensive and time consuming.
Deploying high performance infrastructure requires prudent engineering discipline, which has to be
accommodated in any implementation plan. This is characterized by the non-functional requirements
listed below:
Reliability – the ability of a system to reproduce the same results under the same conditions on
an on-going basis with minimal intervention.
Availability – the ability to continue operation, with failover/disaster recovery when one or more
components fail.
Testability – the ability to scrutinize and assert the integrity of the system as fit for the
purpose planned and required.
Manageability – control of the system: the ability to start, stop and vary its control parameters
using planned resources, be they in-house or outsourced to a service provider.
Performance – closely aligned to reliability – the ability of the system to work within the required
functional constraints and meet operational expectations.
Security – ensuring the access control, audit and privacy of the system are maintained – with the
required audit trails – and access to information for internal prudence and external compliance.
Scalability – the system can maintain performance requirements and/or accommodate spikes in
demand as workloads increase within defined boundaries.
Extensibility – the ease of change of a component without consequential change to adjacent
components – the ability to extend the scope of the system to support additional business
functions, e.g. adjacent and/or new roles such as risk reporting, compliance, etc.
Project governance is required across the implementation of a high performance trading infrastructure.
This begins with an analysis of the current environment, whether it is a green field deployment or a
complete replacement of existing systems.
A critical component is to ensure that any new system can integrate effectively with existing systems
(SOR, risk, market data, etc.). Across the financial services technology landscape, these skills and
competencies are typically spread across multiple parties with differing and often overlapping areas of
competence and responsibility. This can introduce variance in the quality of the trading
infrastructure, which can impact the overall effectiveness of the deployment project.
The consortium has been assembled to create teaming amongst parties, who can carry the resource loads
of planning and designing suitable infrastructure within the context of each firm's current and ongoing
environment. This approach can be equally applied whether the deployment is in-house or at a co-
location facility in proximity to market liquidity.
5.3.2 Implementing Commercial FIX Engines
Implementing a FIX engine is a non-trivial exercise, which can be split into two parts.
1. Application Integration: The market data, algorithm and trade execution components of a
trading platform need to be linked to the FIX engine through linked libraries. This requires a level
of programming skill that depends on the complexity of the trading platform, level and quality of
the FIX engine documentation and the number of FIX engine touch points to the trading
application.
Even the simplified test rig had three implementation stages: 1) Planning the application
integration; 2) Executing and optimizing the applications and infrastructure for optimum
performance; and 3) Commissioning and deploying the infrastructure. These stages can be
accelerated while identifying and containing elements of risk by engaging a suitable specialist
systems integrator, such as GreySpark Partners.
2. Execution Venue Integration: Each execution venue will have its own rules and tests to allow
market participants to join the market. These tests typically require validation within test
environments with a prescribed test schedule. Passing the venue integration test requires planning
and logistical rigor.
5.3.3 Implementing Production-Ready Networks
Building the network infrastructure to production standard is a pre-requisite of the integration work. Four
areas of infrastructure design and operation need to be considered.
1. Assessment of and elimination of single points of failure: the test rig had one network link to a
single switch. Building redundant links between servers and having a redundant switch is
common practice, which is recommended when deploying this type of infrastructure. In network
terms this is “Multi-chassis, link aggregation groups” or MLAG. See figure 4 below.
2. Application failover: the network enables the software components of a system to restart on a
standby server (for High Availability). Clustering techniques reduce the time failover takes to
complete; assessing the relative cost of the outage period determines the appropriate complexity
of the clustering solution.
3. Backup and Restore: every solution should have a tested backup and restore mechanism to protect
the business from system failure. Since some trading platforms tend to be stateless, the restore
mechanism will resemble the original commissioning steps (having recorded configuration details
and files). Designing a backup and restoration mechanism requires the same business input as
application failover, specifically guided by the cost of outage.
4. Operational Management: which encompasses all aspects of change and configuration
management, systems monitoring and maintenance of operational integrity.
Figure 4: Reliable Network Connectivity (MLAG pairs joined by a peer link, with private cluster links between servers)
5.4 Exploiting the Results
The choice of application software for financial services is extensive. Making an appropriate selection is a
challenge for the business – be it buy-side, sell-side or execution venue. For those applications addressing
the trading functions there is a lack of transparency and consistency in measuring and assessing
performance of solutions from individual vendors, or solution sets of interoperating elements.
This consortium based testing program is an exercise in collaboration to achieve operating solutions with
FIX messaging as the initial focus. The composition of the group models the reality facing trading firms.
Production systems come from the assembly of many parts from many entities and with resources from
different sources. This exercise has provided a basis for consistent testing and comparison of how
technologies handle the (FIX) business workloads.
Its success in achieving a granular and detailed set of results comes in major part from the facilitation that
OnX Enterprise Solutions brings through its product distribution and architecture design capabilities –
and Intel's objective to support testing of solution scenarios on its Xeon processors. Across the team, each
party to the consortium has volunteered its core capability, whether it is product or service IP.
The combined resources effectively anticipate the exercise that trading firms would have to address in
selection, procurement and commissioning of systems. Remarkable levels of co-operation, and open
sharing have been displayed with visible useful results. This output can feed directly into the technology
selection processes of firms.
6. Conclusion
The major result from the testing exercise was the collaboration between parties to create a robust and
representative testing environment, which was able to produce results simulating real-life conditions and
their effect on the key function of FIX message transmission.
The commercial FIX engines were between 4 and 16 times faster (depending on load) than the open
source QuickFIX equivalent engines, with an average latency test result of 11 microseconds, as opposed
to 180 microseconds. This was even more evident when the performance of the execution venue was
increased to reflect faster matching (sub-50 microseconds).
Under different stress conditions, each engine exhibited different performance characteristics. The
commercial engines' performance was vastly superior to that of the open source models: the standard
deviation from the mean for a commercial engine was only 1 microsecond.
The open source software exhibited results which, translated to a production environment, would
not be considered sufficiently robust to support automated trading strategies. The major factors
affecting the open source variants are poor performance under high load, higher levels of network
jitter and trade execution outliers of up to 300 microseconds.
Tuning the Network Interface Cards with kernel-bypass technology improved the performance of both
commercial engines, demonstrating a 50% reduction in latency. This translated into a round-trip saving
in latency that would have a material impact on the trading strategy being executed. Engineering an
integrated trading platform was proven to deliver incremental benefits in reducing overall latency.
Both Java and C++ environments in open source and commercial form exhibited individual
characteristics across the various code streams in the applications. This indicates the on-going scope for
improvement in the software, which can lead to improvements in overall performance.
The test results demonstrated that trading strategies which rely on minimising response times should be
deployed on a high performance infrastructure. This is integral in obtaining enhanced levels of
performance and reliability. Each layer in the technology stack has a role to play with incremental
enhancements being possible when implementing options, such as kernel bypass.
Appendices
1. Consortium
The consortium comprises a group of companies whose combined capability maps to the provision of
trading technology solutions. This is not a closed group – it is fully open to input from additional
parties on an on-going basis.
1a Technology Members
A number of technology and services providers have invested as charter members of the consortium.
However the initiative is open, and further participant members may be added in the future. Between
them, these members provide a complete infrastructure capability and created the reference architecture,
each drawing on specific expertise, while OnX provided the integration and build capability.
The charter group members with technology product directly involved building the rig and in the
performance benchmark testing comprise:
OnX Enterprise Solutions – As consortium lead, OnX selected vendors for the benchmark test stack,
built the test rig by integrating the product components and interpreted the results of the tests.
Arista Networks – Provided its 7124SX network switch to connect servers for the benchmark and its
LANZ (Latency Analyzer) capability for tuning.
Dell – The benchmark was run on two Dell PowerEdge R710 servers, one of which was equipped with an
Intel® Xeon® processor X5698.
Intel – The benchmarks were conducted at Intel's fasterLAB in the UK. Intel® Xeon® processors X5677
and X5698 were installed in the Dell servers. Intel engineers screened hardware and software
performance for optimum utilisation of IA (Intel Architecture) features, including use of the Intel Compiler.
Solarflare Communications – SFN5122F 10 gigabit Ethernet network adaptors were installed in each of
the Dell servers, offering kernel-bypass communications.
Other consortium members, which can provide services for deployment in real life production scenarios –
be they Co-Lo, onsite or other – include:
Edge Technology Group – Provides integration and managed services, in particular for buy-
side participants.
Equinix – Runs financial services data centres around the globe supporting high-performance
trading across multiple asset classes on a deep mix of trading venues. Trading participants are
connected inside the data centre using cross-connects to reduce network latency and enable
price discovery, order routing and execution at the highest possible performance levels.
GreySpark Partners – Provides 'top down' trading strategy and technology consulting, and
integration services, with a focus on assessing requirements and designing 'technology bundles'
for high performance.
1b Application Providers Being Tested
Rapid Addition - FIX engine "Cheetah" - in Java - and QuickFIX Java harness.
B2BITS EPAM - FIX engine "FIX Antenna" 2.7 - in C++ - and QuickFIX C++ harness.
QuickFIX – Open source FIX engine in C++ and Java.