Arista @ HPC on Wall Street 2012
Post on 10-Jun-2015
HPC on Wall Street - 2012
20-25% CAGR in market volumes
Competitive advantage hinges on speed, transparency, and proximity to data sources. The application must be in the data path – seamlessly
Quest to balance risk/compliance with performance
10GbE Switches for the Virtualized Data Center, but a software company at the core
• >1300 Customers
• >325 Employees
• Profitable, self-funded, pre-IPO network infrastructure provider
• Open Linux-based OS
• Fully automated testing, and SW development
• Couples ultra-low latency switch with next generation programmable FPGA and memory subsystem
• Customer programmable FPGA and Control Plane provides total control over the network, forwarding, inspection, redirection, etc.
• Targeted for early adopters of hardware accelerated applications such as risk analysis, data arbitrage, order routing
Arista Application Switch - 7124FX
Exegy believes…
• Exegy believes in continually challenging the status quo of market data delivery systems and trading platforms.
  – First to market with hardware-accelerated market data appliances based on FPGA technology.
  – Best of breed solutions for major use cases faced by low-latency, high-capacity consumers of financial market data feeds.
• Exegy believes that delivery and consumption of quality market data should be as easy and painless as possible.
  – Fully managed and constantly monitored appliances to assure optimal performance and the best customer experience.
  – A passion to help our customers succeed in the face of escalating complexity and the increasing demands placed on them.
Converting C to multiple streaming hardware processes ain’t that hard.
Focus on reducing clock cycles
Verify as you go
Iterate, iterate, iterate (no “magic button”)
The tool flow is a bit awkward for first timers.
Visual Studio or equivalent
Impulse C co-development, analysis & compile
Altera Quartus II for place & route into FPGA
Things you can do to get up to speed quickly:
Work from known good sw modules
Get up-front training or factory engineering
Impulse C, Custom FPGA-Accelerated Solutions for the Arista 7124FX
Brian Durwood, Co-founder
Programming With Impulse C
• Not a new language
  – Based on standard ANSI C
• C-language for FPGA programming
  – For embedded and HPC applications
  – Supports standard C development tools
  – Supports multi-process partitioning
• A software-to-hardware compiler
  – Optimizes C code for parallelism
  – Generates HDL, ready for FPGA synthesis
  – Also generates hardware/software interfaces
• Purpose
  – Describe hardware accelerators using C
  – Move compute-intensive functions to FPGAs
[Figure labels: C language applications; C software libraries; Generate accelerator hardware; Generate hardware interfaces; Generate software interfaces; HDL files; Arista's on-board FPGA]
www.ImpulseAccelerated.com
Reference slides follow
www.ImpulseC.com
Converting C to Multiple Streaming Hardware Processes
FPGAs – Advantages Over Software
• Massive parallelism
  – At system level, loop level, instruction level
• One FPGA can replace multiple CPUs
  – For specific tasks/algorithms, using much lower power
• No need for separate NIC card
  – Enable in line processing at near line speed
• Minimize OS interference in filtering
  – Especially during high transaction load events
  – Reduces jitter and other interference
• Offloads standard CPUs with customized pre-processors
  – e.g. select limited analysis of X message types that meet X criteria for X symbols
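The pre-processor idea in the last advantage above (forward only certain message types, meeting certain criteria, for certain symbols) can be sketched in plain C. This is only an illustration under stated assumptions: the `msg_t` layout, the `filter_cfg_t` fields, the minimum-quantity criterion, and the function `msg_passes_filter` are all hypothetical and are not taken from any Arista or Impulse API or from a real feed format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical normalized message; real feed formats differ. */
typedef struct {
    char     type;       /* message type code, e.g. 'A' */
    char     symbol[8];  /* space-padded ticker symbol, not NUL-terminated */
    uint32_t quantity;
} msg_t;

/* Hypothetical filter configuration (all fields are assumptions). */
typedef struct {
    const char  *types;          /* NUL-terminated list of accepted type codes */
    const char (*symbols)[8];    /* accepted symbols, space-padded */
    size_t       symbol_count;
    uint32_t     min_quantity;   /* example criterion */
} filter_cfg_t;

/* Returns true if the message should be passed on to the CPU. */
bool msg_passes_filter(const msg_t *m, const filter_cfg_t *cfg)
{
    /* Reject unknown types; guard '\0' because strchr would match the terminator. */
    if (m->type == '\0' || strchr(cfg->types, m->type) == NULL)
        return false;
    if (m->quantity < cfg->min_quantity)
        return false;
    for (size_t i = 0; i < cfg->symbol_count; i++) {
        if (memcmp(m->symbol, cfg->symbols[i], sizeof m->symbol) == 0)
            return true;
    }
    return false;
}
```

In an FPGA version the symbol loop would typically become a parallel comparison or a hash lookup; the sequential loop here only shows the decision being made.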
3 Popular FPGA Configurations
• Usage Option 1: Create a hardware module
• Usage Option 2: Accelerate an embedded CPU
• Usage Option 3: Accelerate an external/host CPU or computing cluster
[Figure labels: generated hardware module; embedded CPU core with embedded hardware accelerators; FPGA coprocessor with generated hardware accelerators; host processor or cluster]
Configurations Can Be Combined
Combining streaming, embedded processor, and host processor
FPGA strategies can be coded using C for hardware and for embedded CPU, with shared RAM for hash table lookup or other local data
[Figure labels: 10G Ethernet; stream processing and parsing; matching algorithm and strategy; host message generation; embedded CPU for configuration; embedded and shared RAM]
Impulse C Programming Model
Communicating C-Language Processes
• Supports dataflow and message-based communications
• Supports parallelism at the application level and at the level of individual processes
• Allows simulation and debugging of parallel software processes.
[Figure labels: S/W process; H/W process]
Parallelism via Multiple Processes
• Spatial parallelism
• Temporal parallelism (system-level pipelining)
[Figure: multiple C processes]
An Impulse C Process
Processes are independently synchronized
Multiple methods of process-to-process communications are supported
[Figure labels around a C process: stream inputs; stream outputs; register inputs; register outputs; signal inputs; signal outputs; shared memory block reads/writes; App Monitor outputs]
Compile and Optimize
• Optimize the results using interactive tools
  – Pipeline analysis
  – Loop unrolling
  – Instruction scheduling
• Generate FPGA hardware
  – VHDL or Verilog
  – Low level interfaces to memory, I/O and busses.
  – ModelSim Test bench
Debug and Verify
• Use C tools for application debugging
  – Source-level debuggers
  – C-language testing
• Test and analyze parallel dataflow with the Impulse Application Monitor
• Automatically generate VHDL or Verilog test benches
Constructs Familiar to C Programmers
• co_stream_create: Used in configuration
• co_stream_open: Open the stream (clear eos)
• co_stream_close: Close the stream (set eos)
• co_stream_eos: Check end of stream (eos)
• co_stream_read: Read from stream (with rdy, en)
• co_stream_write: Write to stream (with rdy, en)
• co_stream_read_nb: Non-blocking read (no rdy)
• co_stream_write_nb: Non-blocking write (no rdy)
Concept is similar to getc(), putc() in C for I/O
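To make the getc()/putc() analogy concrete, here is a minimal plain-C sketch of a process that reads values from one stream and writes results to another. It does not use the Impulse C library: `toy_stream` and every `toy_stream_*` function are hypothetical stand-ins, modelled as a single-threaded FIFO, and the real `co_stream_*` calls have different signatures and run between concurrently executing processes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STREAM_DEPTH 16

/* Toy single-threaded FIFO standing in for a hardware stream (assumption). */
typedef struct {
    int32_t buf[TOY_STREAM_DEPTH];
    size_t  head, tail, count;
    bool    eos;  /* cleared by open, set by close */
} toy_stream;

void toy_stream_open(toy_stream *s)  { s->head = s->tail = s->count = 0; s->eos = false; }
void toy_stream_close(toy_stream *s) { s->eos = true; }

/* Non-blocking write: returns false when the FIFO is full. */
bool toy_stream_write_nb(toy_stream *s, int32_t v)
{
    if (s->count == TOY_STREAM_DEPTH) return false;
    s->buf[s->tail] = v;
    s->tail = (s->tail + 1) % TOY_STREAM_DEPTH;
    s->count++;
    return true;
}

/* Non-blocking read: returns false when no data is available. */
bool toy_stream_read_nb(toy_stream *s, int32_t *v)
{
    if (s->count == 0) return false;
    *v = s->buf[s->head];
    s->head = (s->head + 1) % TOY_STREAM_DEPTH;
    s->count--;
    return true;
}

/* End of stream: the writer has closed and all data has been drained. */
bool toy_stream_eos(const toy_stream *s) { return s->eos && s->count == 0; }

/* A "process": read each available value, scale it, write it out.
   Stops early if the output is full so that no value is lost.
   Returns the number of values processed. */
size_t scale_process(toy_stream *in, toy_stream *out, int32_t factor)
{
    int32_t v;
    size_t n = 0;
    while (out->count < TOY_STREAM_DEPTH && toy_stream_read_nb(in, &v)) {
        toy_stream_write_nb(out, v * factor);
        n++;
    }
    if (toy_stream_eos(in)) toy_stream_close(out);  /* propagate end of stream */
    return n;
}
```

The loop has the same shape as `while ((c = getc(f)) != EOF) putc(c, g);`: read until the source is exhausted, write each result, then signal end of stream downstream.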
Credible Solution in use by:
Multiple Confidential Financial
NDA Covered Financial Teams
Impulse Platform Support Package
PSP generates HW/SW wrappers between FPGA core & system elements
Produces
• Extensions (scripts and wrapper generators)
• Platform-specific library functions
• Documentation and tutorials
• Current ready to run examples for platform
[Figure labels: Impulse CoDeveloper™; FPGA fabric processing core; Ethernet; host interfaces; memory resources; FPGA embedded processor; other I/O]
Examples of FPGA processing:
• Financial feed kernel bypass or Full Hardware based trading
  – Direct handling of financial feeds
  – Parsing incoming feeds and triggering outbound orders – your strategy in hardware
• Normalization or Protocol Conversion
  – Gateway sending a sub-feed of data
• Pre-Trade Risk Checking
  – Low Latency Broker Dealer Compliance
• Financial valuations
  – Co-processor off-loading for Monte Carlo and other algorithms
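As an illustration of the pre-trade risk checking example above, the following plain-C sketch applies three limits to an outbound order before it would be released. The slides do not specify any particular checks; the `order_t` and `risk_limits_t` layouts, the three limits, and `pre_trade_check` are hypothetical, and the sketch assumes position values stay far from the 64-bit range so the arithmetic cannot overflow.

```c
#include <stdint.h>

typedef enum {
    RISK_OK = 0,
    RISK_QTY_LIMIT,
    RISK_NOTIONAL_LIMIT,
    RISK_POSITION_LIMIT
} risk_result;

/* Hypothetical order representation. */
typedef struct {
    int32_t  side;         /* +1 = buy, -1 = sell */
    uint32_t quantity;     /* shares */
    uint32_t price_cents;  /* limit price in cents */
} order_t;

/* Hypothetical per-account limits. */
typedef struct {
    uint32_t max_order_qty;
    uint64_t max_order_notional_cents;
    int64_t  max_abs_position;
} risk_limits_t;

/* Checks one order against the limits, given the current signed position. */
risk_result pre_trade_check(const order_t *o, const risk_limits_t *lim,
                            int64_t current_position)
{
    if (o->quantity > lim->max_order_qty)
        return RISK_QTY_LIMIT;

    /* 32-bit x 32-bit product always fits in 64 bits. */
    uint64_t notional = (uint64_t)o->quantity * o->price_cents;
    if (notional > lim->max_order_notional_cents)
        return RISK_NOTIONAL_LIMIT;

    int64_t projected = current_position + (int64_t)o->side * (int64_t)o->quantity;
    if (projected > lim->max_abs_position || projected < -lim->max_abs_position)
        return RISK_POSITION_LIMIT;

    return RISK_OK;
}
```

Each check is a pure comparison with no loops or memory allocation, which is the kind of logic that maps naturally onto a fixed-latency hardware stage.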
Stand-Alone Feed Handling Solution
[Figure labels: 1G or 10G Ethernet; MAC; RX Adapter (Verilog); TX Adapter (Verilog); Feed Handler and Outbound UDP (Impulse C); Usage Option 3]
Network Processing Pipeline
UDP and TCP/IP implemented directly in FPGA hardware for low latency
[Figure labels: FPGA (MAC 1/10GigE; Enet Filter; UDP Parser and/or TCP/IP Stack; Custom Filtering Application; Embedded CPU; Host I/O Interface); Host System (Driver; User Application; Host Memory)]
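The UDP parsing stage of such a pipeline can be sketched in plain C. The 8-byte header layout (source port, destination port, length, checksum, all big-endian) is standard UDP; the names `udp_header`, `read_be16`, and `udp_parse` are illustrative only and do not come from the slides. The sketch assumes `buf` already points at the UDP header, with Ethernet and IP headers stripped, and it does not verify the checksum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;    /* header plus payload, in bytes */
    uint16_t checksum;
} udp_header;

/* Read a 16-bit big-endian (network order) value. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parses the UDP header at buf. Returns false if the buffer is shorter than
   a header or the length field is inconsistent with the buffer size.
   On success, *payload and *payload_len describe the UDP payload. */
bool udp_parse(const uint8_t *buf, size_t len, udp_header *h,
               const uint8_t **payload, size_t *payload_len)
{
    if (len < 8) return false;

    h->src_port = read_be16(buf);
    h->dst_port = read_be16(buf + 2);
    h->length   = read_be16(buf + 4);
    h->checksum = read_be16(buf + 6);

    if (h->length < 8 || h->length > len) return false;

    *payload     = buf + 8;
    *payload_len = (size_t)h->length - 8;
    return true;
}
```

A filter stage would typically compare `dst_port` against the ports of interest and hand only matching payloads to the custom filtering application.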
Complex Order Support
Incoming Outgoing
Replace NIC Apply Trade Logic
Processing without OS
Revert feed to exchange formats
Hardwire potential X required responses
Normalizing Across Feeds
Ultra-fast pattern matching
Message Management With Exchanges
Decompression
Pull and Present Opportunities
Decryption
Produce Sub-Feed
Replace NIC
Manage Risk
Insert risk limitations awaiting confirm
[Figure labels: Exchanges, feed handlers, order data sources; Standard and Custom Feed Handler Formats, e.g.: ITCH, OUCH, OPRA, BATS, & Generic UDP; Direct connection Impulse UDP/TCP; 10 Gb/s Ethernet; FPGA or FPGA-Based Board; Adapters: RMDS, Bloomberg and Custom]
Three Ways To Get Started
• Learn the tools
  – Acquire an Impulse CoDeveloper license.
  – Work from the included reference designs.
  – Experiment with ways to optimize your algorithms to run efficiently as multiple streaming processes in FPGA.
• Turn Key System (“Bump in the Wire”)
  – License above +
  – UDP or other network attached FPGA-enabled reference design.
  – FPGA-based accelerator platform.
  – Impulse factory engineers to help get your system on line.
• Turn Key System Running A Target Algorithm
  – License above +
  – Turn Key System above +
  – Impulse Engineers, under NDA, refactor your target algorithm(s) for efficient compilation to FPGA.
  – Impulse Engineers train your team on how the refactoring works.
About Impulse
• Most widely used C to FPGA tool
• Pure ANSI C
  – No PAR or HW statements inserted
• Founded in 2002
  – By part of the original ABEL team
Additional Resources
• Engineering consultation: info@ImpulseAccelerated.com
• Tutorials: www.ImpulseAccelerated.com/Tutorials
• Book: Practical FPGA Programming in C
Compute, Storage, Memory, I/O, Application Acceleration – Together
Arista Application Switch – Systems Design
Application Switching for Cloud Networks
Platform Details
• High Availability:
  – Dual Hot-swappable Power Supplies
  – Multiple Hot-swappable Fan Units
• Designed for Data Center + Colocation:
  – Flexible Front-to-Rear or Rear-to-Front Airflow
  – Choice of AC or DC Power Supplies
• 24 Wirespeed 1G/10G SFP/SFP+ Ports
[Figure labels: Air Vents; Console Port; Management Port; USB Port; 16 Base SFP/SFP+ Ports; 8 FX SFP/SFP+ Ports; Clock Input]
Ultra Low Latency 24 port 10GbE Switch
• 16 10GbE ports connected to LLE ASIC
• 8 10GbE ports connected through Stratix V FPGA
• Built in 50GB SSD
• Optional Chip-Scale Atomic Clock and External Clock Source
Arista Application Switch - 7124FX
Application Switch Markets
Instrument transaction performance at high resolution
Offload line arbitration to dramatically improve application performance
Reducing system latency increases performance of trading strategies
Financial Services Applications
Algorithmic trading
Feed Handling and A/B Arbitration
Real-time Data analysis
Order Protocol Conversion: Convert or normalize multiple order entry formats to a common format
Order Execution Routing: Set order policies for best execution
Inline Risk Analysis: Low Latency Broker Dealer Compliance
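A/B arbitration, listed above under feed handling, can be sketched as follows. The slides do not describe the algorithm; this sketch assumes the common arrangement in which the same messages arrive on two redundant lines, each tagged with an increasing sequence number, and the arbiter forwards whichever copy arrives first. The names `ab_arbiter`, `ab_init`, and `ab_accept` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t next_seq;  /* lowest sequence number not yet forwarded */
    uint64_t gaps;      /* sequence numbers skipped because neither line delivered them in time */
} ab_arbiter;

void ab_init(ab_arbiter *a, uint64_t first_seq)
{
    a->next_seq = first_seq;
    a->gaps = 0;
}

/* Call for every message from either line.
   Returns true if the message should be forwarded, false if it is a
   duplicate (or arrived after a later message was already forwarded). */
bool ab_accept(ab_arbiter *a, uint64_t seq)
{
    if (seq < a->next_seq)
        return false;                 /* already seen on the other line */
    a->gaps += seq - a->next_seq;     /* count anything skipped over */
    a->next_seq = seq + 1;
    return true;
}
```

This version never reorders or buffers: a message that shows up after a higher sequence number has been forwarded is dropped and remains counted as a gap. A production arbiter might hold a small window to recover such late messages; that is a design choice outside what the slides state.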
Application Switching for Cloud Networks March 19, 2011
Developing on the Application Switch
Application Switch Development Partners
Complete integrated appliance model
• Novasparks: 100% Hardware market data solution
• Exegy: Appliance based robust ticker plant
System integrators and development support
• Impulse C: C to RTL tools
• Enyx: Customer trading solutions and IP blocks
A new category of product that provides a network accelerated platform for high performance app vendors to develop on
Combining a true network switch, with full routing and switching protocols, with fully programmable hardware creates a new market for the most demanding applications
Application logic inserted into real-time environments with complete transparency
Arista Application Switch 7124FX