Source PDF: files.meetup.com/18240221/Babanin_FD_FEB_2015.pdf

Introduction to kdb+ and kdb+ usage in Deutsche Bank

Andrey Babanin, TCAKDB team/Mercury team 12/02/2015

Global Technology Deutsche Bank

Exchange data increase

• Over the last 10 years, data volumes from the vast majority of exchanges have increased by several orders of magnitude

[Chart: number of trades and quotes on NYSE per day, in billions, 2004 to 2012; vertical axis from 0.5 to 2.0]

Background

It became more obvious each year that traditional relational DBMSs cannot cope effectively with such huge amounts of data:

• Traditional RDBMSs cannot effectively receive and save tens or hundreds of thousands of ticks per second

• RDBMSs lack special functionality for operating on time series (data ordered by date and time)

• Traditional languages are also inconvenient for handling huge arrays of data and have no built-in support for such structures

KDB+ technology

• KDB+ is the flagship product of KX Systems (www.kx.com)

• KDB+ provides the same techniques for data in memory and on disk

• KDB+ comes with a powerful interpreted language, Q

• Over years of development and promotion, KX has won most of the top banks and investment companies as clients

KDB+ feature set

• Column-based in-memory DB

• Stream data processing capability

• Integrated development and querying language

• Compression rates up to 10x

• No performance penalties on dynamically run queries

• Optimized in-memory and disk processing

• Interfacing capabilities with Python, R, Matlab and others

• Built-in map-reduce for many internal functions

• Data volumes of over 2 billion records per day

Performance

• KDB+ is developed specifically for 64-bit architectures

• Multi-threading is built in

• Program code for KDB+ is quite compact and optimized

• KDB+ utilizes a broader range of Intel processor instructions

• Tests show that KDB+ performance is many times higher than that of other DBMSs

Data storage features

• KDB+ is a non-relational, column-oriented database

• KDB+ applies the same algorithms to data in memory and on disk (same data handling methods for different storage models)

• Built-in database clustering (partitioning) over several parameters
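As a rough illustration of the column-oriented model, the q sketch below builds a small in-memory table and reads it back column by column. The table name `trade` and its contents are hypothetical and chosen only for this example.

```q
/ Hypothetical sample table: each column is one contiguous typed list
trade:([] sym:`DBK`DBK`SAP; px:28.5 28.25 61.5; qty:100 200 50)

/ A whole column is retrieved as a single list
pxs:trade`px                / 28.5 28.25 61.5

/ A table is the flip of a dictionary that maps column names to lists
colDict:flip trade

/ Aggregating a column is one vector operation over that list
total:sum trade`qty         / 350
```

Because a column is just a list, the same expressions apply whether the table was built in memory or mapped from disk.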

Data handling

• Built-in clustering (partitioning) over dates, physical partitions and columns

• KDB+ optimizes complicated insert and update queries

• Temporal data types, from nanoseconds to years, are built-in first-class data types

• KDB+ uses dynamic indexing for string data and also allows customized indexing scenarios

Two string types are available in KDB+:

• fast strings (symbols), which are indexed automatically and aliased with integers;

• simple char lists, which are slow.
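The points above can be sketched in a few lines of q. The values and the table `t` are hypothetical; the grouped attribute (`g#`) is shown as one example of a customized index, not as the only option.

```q
/ The two string types: a symbol list and a plain char list
s:`DBK`SAP`DBK
c:"DBK"
type s                      / 11h, symbol list
type c                      / 10h, char list

/ Temporal values are ordinary first-class atoms
d:2015.02.12
ts:2015.02.12D09:30:00.000000000
d+1                         / 2015.02.13
`month$d                    / 2015.02m
`date$ts                    / 2015.02.12
ts+0D00:00:00.000000001     / one nanosecond later

/ A customized index: apply the grouped attribute to a symbol column
t:([] sym:`DBK`SAP`DBK; px:28.5 61.5 28.25)
t:update `g#sym from t
attr t`sym                  / `g
```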

Data transmission

• KDB+ is designed for a chained architecture

• Unified file and network I/O

• KDB+ also supports HTTP and WebSockets

Program interfaces and extensions

• KDB+ supports C code natively

• There are extension libraries for C/C++, Java, .NET, R, Matlab, Perl and Python

• KDB+ has base functionality for working with Thomson Reuters and Bloomberg

• It has libraries supporting protocols such as Tibco, LBM and MAMA/MAMDA

• Many KDB+/Q extensions are also collected on code.kx.com

• Several Windows clients/IDEs are available for working with KDB+ databases

Comparing KDB+ to an RDBMS

• KDB+ is based on lists, which are ordered collections allowing duplicates, whereas SQL is based on sets, which are unordered collections of distinct elements.

• KDB+ stores data as contiguous items in column lists, whereas an RDBMS stores data as fields within non-contiguous rows.

• KDB+ table operations are vector operations on columns, whereas SQL operates on individual fields and rows.
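A small q sketch of these three points, using hypothetical data: a list keeps its order and duplicates, and a derived column is computed with one vector multiplication rather than row by row.

```q
/ A list is ordered and may contain duplicates
l:3 1 3 2
first l                     / 3
distinct l                  / 3 1 2

/ Hypothetical table; px and qty are whole column lists
t:([] sym:`DBK`DBK`SAP; px:28.5 28.25 61.5; qty:100 200 50)

/ px*qty multiplies the two columns item by item in a single operation
r:update notional:px*qty from t
r`notional                  / 2850 5650 3075f
```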

Major differences between an RDBMS and KDB+

Table Creation

• Traditional RDBMS: Tables defined declaratively using DDL and created on disk.

• KDB+ database: Tables created functionally in the q language.

Data Persistence

• Traditional RDBMS: Tables and related metadata held in an opaque repository. Tables are stored by row.

• KDB+ database: Serialized q entities stored in the O/S file system. No separate table metadata. Tables are stored by column.

Data Access

• Traditional RDBMS: Access to stored information is via DDL for metadata and SQL for data. Must retrieve via a query into program.

• KDB+ database: Data directly accessible in q. Provides query and functional forms for table manipulation.

Memory Residency

• Traditional RDBMS: Tables reside on disk; query result sets reside in program memory.

• KDB+ database: Tables live in memory but can be persisted to disk. Column subsets are page faulted into memory for mapped tables.

Data Format

• Traditional RDBMS: Based on sets, which are unordered collections of distinct items. Data is stored in fields within rows, which are not contiguous.

• KDB+ database: Based on lists, which are ordered collections allowing duplicates. Data is stored as contiguous items in column lists.

Data Modification

• Traditional RDBMS: Persisted table modifiable via SQL (INSERT, UPDATE, etc.)

• KDB+ database: Memory-resident tables modifiable via q and q-sql. Persisted table modifiable only with append (upsert).

Data Programming

• Traditional RDBMS: SQL is declarative relational. Programs, called stored procedures, written in proprietary procedural language.

• KDB+ database: Programs written in integrated vector functional language q. Tables are first-class entities in q.

Transactions

• Traditional RDBMS: Support for transactions via COMMIT and ROLLBACK.

• KDB+ database: No built-in transaction support.
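The persistence and modification rows can be illustrated with a short q sketch. The scratch path `/tmp/kdb_intro_demo_t` and the table contents are assumptions made for this example; it serializes a table to a single file, reads it back, and appends one record with upsert.

```q
/ Hypothetical scratch file; adjust the path for your system
f:`:/tmp/kdb_intro_demo_t

/ Create a table functionally and serialize it to the file system
t:([] sym:`DBK`SAP; px:28.5 61.5)
f set t

/ Read it back: the file holds the q entity itself, no separate metadata
t2:get f

/ A persisted table is modified by appending records
f upsert (`BMW;95.25)
n:count get f               / 3

/ Remove the scratch file
hdel f
```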

Q language

• Design based on set theory

• Terse language with single-symbol operators and functions

• Functional language with lambda calculus

• Contains its own SQL implementation

• Supports namespaces

• Supports automatic data types

• Contains a garbage collector
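A few of these language traits in a minimal q sketch. All names here (`v`, `sq`, `.demo.scale`, `a`) are hypothetical and exist only for illustration.

```q
/ Single-character operators act on whole lists
v:1 2 3 4 5
doubled:v*2                 / 2 4 6 8 10
total:(+/) v                / 15, + folded over the list

/ Lambdas: x and y are implicit parameters
sq:{x*x}
sq 4                        / 16
{x+y}[3;4]                  / 7

/ Namespaces: assigning to .demo.scale creates the demo context
.demo.scale:{x*10}
.demo.scale 3               / 30

/ Automatic typing: a variable takes the type of whatever is assigned
a:42
ta:type a                   / -7h, long atom
a:`kdb
tb:type a                   / -11h, symbol atom
```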

More Q examples

Q features

• Q works with big data directly

• Lists, dictionaries and tables are base data types

• Temporal data types are built-in

• Tables have a special set of attributes

• Q is interpreted

• Q has good network connectivity

• Highly integrated with Unix

• Development in Q is much faster than in other languages
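To make the base data types concrete, here is a small q sketch with hypothetical values: a dictionary, a table whose rows are dictionaries, a keyed table, and an aggregate over a million-item list.

```q
/ Dictionary: maps a list of keys to a list of values
d:`DBK`SAP!28.5 61.5
d`SAP                       / 61.5

/ Table: each row is a dictionary from column names to values
t:([] sym:`DBK`SAP; px:28.5 61.5)
row:first t

/ Keyed table: looked up by key like a dictionary
kt:([sym:`DBK`SAP] px:28.5 61.5)
rec:kt`SAP

/ Large lists are handled with the same primitives
big:til 1000000
s:sum big                   / 499999500000
```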

Functional forms

The functional forms of select, update and delete can be used in any situation but are especially useful for programmatically generated queries, such as when column names are dynamically produced. The functional forms are:

?[t;c;b;a] / select

![t;c;b;a] / update and delete

where t is a table, a is a dictionary of aggregates, b is a dictionary of groupbys and c is a list of constraints.

The q interpreter parses the syntactic forms of select, exec, update and delete into their equivalent functional forms, so there is no performance difference.
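As a sketch of how the arguments t, c, b and a line up with a q-sql query, the example below runs the same query in both forms on a hypothetical table. The table and column names are assumptions for illustration.

```q
/ Hypothetical table
t:([] sym:`DBK`DBK`SAP; px:28.5 28.25 61.5)

/ q-sql form
r1:select mx:max px by sym from t where px>28

/ Equivalent functional form: ?[t;c;b;a]
/ c: list of constraints, b: dictionary of group-bys, a: dictionary of aggregates
r2:?[t; enlist(>;`px;28); (enlist`sym)!enlist`sym; (enlist`mx)!enlist(max;`px)]
same:r1~r2                  / 1b

/ parse shows the functional form the interpreter derives from q-sql
p:parse "select mx:max px by sym from t where px>28"

/ Functional update, same as: update px:px*2 from t
r3:![t; (); 0b; (enlist`px)!enlist(*;`px;2)]

/ Functional delete of a column, same as: delete px from t
r4:![t; (); 0b; enlist`px]
```

When there is no grouping, b is passed as 0b, and an empty constraint list is passed as ().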

Asof join

The asof join is so named because it is often used to join tables along time columns, but this is not a restriction.

In general, the triadic function aj can be used to join two tables along common columns. Significantly, there is no requirement for any of the join columns to be keys. The syntax of asof join is:

aj[c1...cn;t1;t2]

where c1...cn is a symbol list of common column names for the join and t1 and t2 are the tables to be joined. The result is a table containing records from the left outer join of t1 and t2 along the specified columns.

For each record in t1, the result has one record containing all the items in t1. A record in t2 matches when its values equal those of the t1 record in every specified column except the last, and its value in the last column (usually the time) is less than or equal to that of the t1 record. If no record in t2 matches, the t2 columns of the result record are null.

If there are matching records in t2, the items of the last (in row order) matching record are appended to those of the t1 record in the result.
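A minimal sketch of an asof join on hypothetical trade and quote tables. The quotes are sorted by time within each sym; each trade picks up the latest quote at or before its own time, and a trade with no earlier quote gets a null bid.

```q
/ Hypothetical tables; quote is sorted by time within each sym
trade:([] sym:`a`a`b`b; time:09:30:01 09:30:05 09:30:01 09:30:03; px:10 11 20 21f)
quote:([] sym:`a`a`b`b; time:09:30:00 09:30:04 09:30:02 09:30:06; bid:9.9 10.9 19.9 20.1)

/ Exact match on sym, then the last quote with time <= the trade time
r:aj[`sym`time;trade;quote]

/ The third trade (b at 09:30:01) precedes every b quote, so its bid is null
r`bid                       / 9.9 10.9 0n 19.9
```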

Q weaknesses

• Single-threaded for most tasks

• No built-in traditional user access control functionality

• No genuine debugger

• Many database management functions have to be implemented by yourself

• The same Q code is often not compatible across different Q versions

Main KDB+ usage scenario in DB

• KDB+ is the base technology for Mercury, the global market data capture system

• The global data storage system is used simultaneously for real-time calculations and for historical analysis

• Almost 100% of the logic is written in Q

[Diagram: Mercury asset stack layout, with user queries]

Market data capturing in numbers

• 4 global locations: LND, NY, HK, TK

• 30 bln. incoming messages per day

• 2.5 mln. active market subscriptions

• 2 petabytes of disk space

• 3000 active processes working 24x7

• More than 200 active servers

• More than 70 mln. user queries per day

• More than 1000 users across DB (>100 downstream applications)

• The system environment consists of 4 clusters: PROD and active DR, UAT and DEV in all regions

• There are also several satellite systems built over Mercury

KTS - DB’s KDB+ Development Framework

Over the past 7 years, the bank has gained substantial knowledge of the KDB+ technology through various implementations. This experience is key for newer projects that will implement new solutions. Without a shared framework, teams building new KDB+-based applications would have had to build the same foundational features themselves, resulting in duplication and inconsistent adoption of the technology. KTS was architected to be generic enough to address most current use cases and extensible enough to address future needs.

KTS Highlights

• Pre-built / Pre-tested: Increases the reliability of new applications and reduces programming and testing effort and time to market

• Application / Framework Independence: Isolating application code from core kdb+ code decouples releases

• Data Capture functionality out of the box

• Command and Control: Supervisory process for management, state monitoring and administration. Control remote processes from a single point

• Data Flow Segregation: Data flows can be separated into stacks, each loading its own code, configuration and business logic. Capturing data in one stack and generating derived data in another increases performance

• Code Inheritance: Core functionality is inherited from the KTS base functions and can be extended by the client as needed. Code can be inherited at the application, stack, region or cluster level

• Access Control: An application can limit access to a process, stack, cluster and even query

• Load balancing functionality and process replication

• Slice and Dice Data: Real-time and historical data can be joined and accessed through Gateways (Both Async and Sync)

• Java API allows seamless publishing and subscription

KTS component overview

[Diagram: KTS components by group]

Utilities

High Data Volume Management: Load Balancing, Data Clustering

Application Operation Management: Process Management, Data Recovery & Replication

Developer Frameworks

Code Management: Code Loader, Configuration Manager, Code inheritance

Event Processing: Event Engine, Event Scheduler

Operational Libraries

ACL – Access Control

Logging

Data Access & Storage

Real-time Database

Historical Database

Multi Source Publication/Subscription

Mercury Plug-in

KaaMS (KDB as a managed service)

Mission

The Bank-standard managed service for realizing KDB+ application solutions.

Objectives

• Reduce the cost of implementing KDB+ solutions

• Enforce KTS Solution governance and best practices to simplify application support and stability

• Reduce time to market for KDB+ solutions

KaaMS Highlights

• Guided Engagement: End to end expert guidance for adoption of KTS libraries

• HW Advisory: Facilitate hardware capacity planning expertise

• KTS Training: To assist application teams with KTS component and framework usage

• KTS Advisory: Provide expert level KTS consulting / optimization / best practice review

• Specialist KDB Support: Provide experienced KDB Support with a global support model

KaaMS Client Onboarding Workflow

Initiation

• Problem Evaluation

• Solution Design Review

• Technology applicability

Hardware Advisory

• Capacity Planning Guidance & Recommendations

• RfS Request Guidance

Business Functionality

• Consulting Services

• Actual development is done by the application team

Support

• Flexible SLAs

• L1 Support Option

• L2 Support Option

• Geneos Monitoring

• KTS Library Support

Production Onboarding

• Implementation best practices

• Recommendations

KDB+ in High Frequency Trading system

TCA reporting

• The TCA client reporting system consumes and parses Algo order flow for real-time enrichment with pricing data provided by the main KDB storage (sources are EBS, Reuters and internal DB rates)

• The actual transaction cost analysis work includes customized TWAP, VWAP, market impact and internal values benchmarking for Algo orders and fills, resulting in real-time graphical reporting for traders

• The TCA reporting system covers global FX/Listed Derivatives transactions made through the Algo platform

• Transaction cost analysis/business intelligence client reporting increases transparency and client trust, along with better values for the Algo trading platform

The TCA project is a real-time client reporting engine supporting the current FX/LD instrument line (FX spots and crosses, index and commodity futures, future spreads, future baskets, options).
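For readers unfamiliar with the benchmarks named above, the q sketch below computes textbook VWAP and TWAP on hypothetical fills. It does not reproduce the customized versions used in the TCA system; the table, the window end `windowEnd` and all values are assumptions for illustration.

```q
/ Hypothetical fills for one instrument
fills:([] sym:3#`EURUSD; time:09:30:00 09:30:10 09:30:40; px:1.1 1.2 1.3; qty:100 300 100)

/ VWAP: average price weighted by traded quantity, about 1.2 here
vwapBySym:select vwap:qty wavg px by sym from fills

/ TWAP: average price weighted by how long each price was in effect,
/ measured up to an assumed window end of 09:31:00
windowEnd:09:31:00
tm:fills`time
dur:`long$(1_tm,windowEnd)-tm   / seconds in effect: 10 30 20
twap:dur wavg fills`px          / about 1.2167
```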

Real-time TCA work-flow

TCA event processor

FX Algo updates LD Algo updates

Real-time order enriching state machine (stack of functions which implements required business logic on order level)

FX/LD order disk cache (persistent cache used to restore processing state after a restart)

Updates for order entry

In-memory buffer (MIG updates grouped over parent order ID)

Order entries and fill counts

Thread 1 – collect new messages from TP (from HDB in case of restoration)

Thread 2 – match new orders and corresponding fills and form cache entries

Market data GWs

Market data requests/ results

Real-time fill processor (business logic for individual executions)

Updates on fill count

Thread 3 – enrich next FX/LD order, publish when ready

Thread 4 – enrich group of pending FX/LD fills, publish when ready

Send order message to publisher

Send fill messages to EMS publisher/bus

process 2 process 1 …

HFT analytical platform

• Capture generated client pricing, client orders, executions, platform hedging activity, client pricing settings

• Generate enriched datasets that fall into one of the following categories: (a) market impact, (b) trade valuation, (c) client reaction to pricing changes

• Provide tools for experimentation with parameters and functions used to generate derived datasets

• Provide a basic API to retrieve data as well as sophisticated wrapper functions for quantitative analysis

• Provide visual toolkits to study the data described above

KDB+ market maker engine

• Market maker application aligned to the European Government business

• The KDB+ application is the key market-making application; it has a backend, which runs on the KX side, and a frontend, which runs on the trade engine side

• Analytical application: a native application that runs on the trader's desktop and is operated via an Excel spreadsheet

Q debugger (by Andrey Kozyrev)

http://code.kx.com/wsvn/code/contrib/akozyrev/debug/

Qpad - KDB+/Q client and IDE (by Oleg Zakharov)

http://www.qinsightpad.com/

Thank you!

Andrey Babanin, TCAKDB team/Mercury team