AquaQ Analytics Kx Event - Data Direct Networks Presentation

Getting the most out of multi-year and multi-source trading history Glenn Wright, EMEA Systems Architect DDN June 2014

Upload: aquaq-analytics

Post on 29-Nov-2014


TRANSCRIPT

Page 1: AquaQ Analytics Kx Event - Data Direct Networks Presentation

Getting the most out of multi-year and multi-source trading history
Glenn Wright, EMEA Systems Architect, DDN

June 2014

Page 2:

© 2013 DataDirect Networks, Inc.

ddn.com

Agenda

Uh? Who is DDN?

The Evolution of Data Handling in Market Systems

The Big Analytics Crunch

What’s hot, what’s not… It’s Parallel Performance, stupid!

Page 3:

DDN | The “Big” In Big Data

800%

PayPal accelerates stream processing and fraud analytics by 8x with DDN, saving $100Ms.

1TB/s

The world’s fastest file system, to power the US’s fastest supercomputer, is powered by DDN.

A Tier 1 CDN accelerates the world’s video traffic using DDN technology to exceed customer SLAs.


Page 4:

DDN | The Technology Behind The World’s Leading Data-Driven Organizations

HPC & Big Data Analysis

Cloud & Web Infrastructure

Professional Media

Security

Page 5:

Big Data & Cloud Infrastructure: DDN’s Award-Winning Product Portfolio

Analytics Reference Architectures

EXAScaler™: Petascale Lustre® Storage
• 10Ks of clients, 1TB/s+, HSM
• Linux HPC clients; NFS & CIFS [2014]

GRIDScaler™: Enterprise Scale-Out File Storage
• ~10K clients, 1TB/s+, HSM
• Linux/Windows HPC clients; NFS & CIFS

Storage Fusion Architecture™ Core Storage Platforms
• SFA™12KX: 48GB/s, 1.7M IOPS, 1,680 drives in 2 racks, optional embedded computing
• SFA7700: 12.5GB/s, 450K IOPS, 60 drives in 4U, 228 drives in 12U
• Flexible drive configuration: SATA, SSD, SAS
• SFX™ Automated Flash Caching: adaptive transparent flash cache; SFX API gives users control [pre-staging, alignment, by-pass]

WOS® 3.0: Geo-Replicated Cloud Storage
• 32 trillion unique objects, 256 million objects/second
• Self-healing cloud, parallel Boolean search
• Cloud foundation
• WOS7000: 60 drives in 4U, self-contained servers

Big Data Platform Management
• DirectMon™, Cloud Tiering

Infinite Memory Engine™ [Tech Preview]
• Distributed file system buffer cache

Page 6:

[Chart: worldwide exchange trading volumes, 1990-2011, by region: Total, Americas, Asia-Pacific, Europe-Africa-Middle East]

Evolution of Market Systems

SOURCE: World Federation of Exchanges 2011 Annual Report and Statistics

Storage evolution: DASD, F DASD, Scale-out NAS, Parallel File System

Page 7:

UNDERLYING ISSUE: Gaping Performance Bottlenecks

• Moore’s Law has outstripped improvements to disk drive technology by two orders of magnitude over the last decade

• Analytics moved to HPC clusters

• Today’s servers are hopelessly unbalanced between the CPU’s need for data and the HDD’s ability to keep up

[Chart: HDD vs. CPU relative performance improvement, 2005-2014, diverging by roughly 20,000x]

Page 8:

Welcome to the Big Analytics Crunch

• 500TB to >2PB of historical data for one TZ

• Distributed cache: the online model reads data at 100s of GB/s IO (tick DB application such as kdb+)

• 3D “cube” of in-memory distributed data, online, realtime

• 100s of services/servers working together in memory: low-latency analytics with the simplicity of persistent file system semantics

• Burst-buffer low-latency operation is going mainstream in FSI
► Real-time back testing
► Real-time intra-day risk positioning

Page 9:

Why DDN & Why Parallel?

In Production

Many systems deployed worldwide at global investment banks and hedge funds

Performance and Consolidation

A back test that completes in a few seconds is much closer to the trade event

Mix online history and real-time trade analytics

Consolidate in-memory databases against one copy of data

Flash at scale is NOT scale at capacity

Single namespace, history and real-time

Page 10:

Limitless Scale up and Scale out with kdb+…

[Diagram: 16 kdb+ instances, KDB+ (1) through KDB+ (16), on a compute fabric, backed by Lustre: primary and replica MDS with an MDT, and OSS1-OSS4, served from two DDN SFA7700 arrays]

Page 11:

What we changed:

export SLAVECOUNT=160   # number of kdb+ client tasks
export CLIENTCOUNT=10   # number of processes per kdb+ server

Q script query:

\l beforeeach.q
R1S:rrdextras flip`k`v!(" S*";",")0:`:rrd.csv
/ year-hibid output
t:"YRHIBID";
fn:{[f;s;d] flip`date`sym`a!flip raze(f each s)peach d};
NRS:.tasks.rxsg[H;`$t;1;(fn[hb];apickAs[R1S;`Symbol];reverse ALLDATES2011)];
\l aftereach.q

symbols:

glenn$ head rrd.csv
1,Symbol,LKQQ
1,Symbol,LHDE
1,Symbol,LNJO
1,Symbol,LLTR
1,Symbol,LRFC
1,Symbol,LQGA
1,Symbol,LTNQ
1,Symbol,LSAG
1,Symbol,LQIA
1,Symbol,LKSJ

… x850 symbols vs 84
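For experimenting with symbol-count scaling like the 850-vs-84 run above, a synthetic rrd.csv of the same shape can be generated. A minimal sketch, assuming the `1,Symbol,XXXX` row layout shown in the listing; the derived four-letter symbol names are made up, not the real universe:

```shell
# build a test rrd.csv with 850 rows in the 1,Symbol,XXXX format
> rrd.csv
for i in `seq 1 850`; do
  # derive a deterministic 4-letter symbol from the row number
  # (digits 0-9 are mapped to letters A-J)
  sym=$(printf 'L%03d' "$i" | tr '0-9' 'A-J')
  echo "1,Symbol,$sym" >> rrd.csv
done
wc -l < rrd.csv
```

The same loop with a different row count reproduces the smaller 84-symbol file.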

Page 12:

glenn$ more hostport.txt
127.0.0.1:5000
127.0.0.1:5001
127.0.0.1:5002
127.0.0.1:5003
127.0.0.1:5004
127.0.0.1:5005
127.0.0.1:5006
127.0.0.1:5007
127.0.0.1:5008
127.0.0.1:5009

What we changed (2):

# replace $QEXEC initdb.k -g 1 -p $((baseport+i)) </dev/null &>log$((baseport+i)).log &
for i in `seq 20000 20009`
do
  for j in `seq 0 15`
  do
    echo ssh server-$j "cd $HOME;QHOME=/home/glenn/q $HOME/l64/q initdb.k -p $i -g 1 </dev/null &> $i-$j.log &"
    ssh gp-2-$j "cd $HOME;QHOME=/home/mpiuser/q $HOME/l64/q initdb.k -p $i -g 1 </dev/null &> $i-$j.log &"
    while ! nc -z "gp-2-$j" $i; do sleep 0.1; done
  done
done
# get ready
echo `date -u` $SLAVECOUNT slave tasks started
# then start the servers aimed at the slaves
baseport=5000
for ((i=0; i<$CLIENTCOUNT; i++));
do
  $QEXEC initdb.k -g 1 -s -$SLAVECOUNT -p $((baseport+i)) </dev/null &>log$((baseport+i)).log &
  while ! nc -z localhost $((baseport+i)); do sleep 0.1; done
done

# check that everything can startup : $QEXEC startdb.q -s -$SLAVECOUNT -q
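The `while ! nc -z … sleep 0.1` polling used in the launch script can spin forever if a q process fails to come up. A deadline-bounded variant is sketched below; the function name and timeout parameter are my own additions, not from the presentation (assumes bash for `$SECONDS`):

```shell
# wait_for_port HOST PORT TIMEOUT_S: poll until the port accepts
# connections, giving up after roughly TIMEOUT_S seconds
wait_for_port() {
  local host=$1 port=$2 deadline=$((SECONDS + $3))
  while ! nc -z "$host" "$port" 2>/dev/null; do
    [ "$SECONDS" -ge "$deadline" ] && return 1
    sleep 0.1
  done
}

# a port with no listener now fails fast instead of hanging forever
wait_for_port 127.0.0.1 59999 1 || echo "timed_out"
```

Dropping this in for the bare `nc -z` loops would let a failed slave abort the startup script with a clear error rather than stalling it.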

Page 13:

What we changed (3):

startdb.q:

/ check all servers are there
{hopen(x;500)}each("I"$getenv`BASEPORT)+til"I"$getenv`SLAVECOUNT;
{hopen(x;2500)}each hsym`$read0`:slavehostport.txt;
\l initdb.k
{hopen(x;500)}each 5000+til"I"$getenv`CLIENTCOUNT;
\\

glenn$ cat slavehostport.txt
192.168.3.51:20000
192.168.3.51:20001
192.168.3.51:20002
192.168.3.51:20003
192.168.3.51:20004
192.168.3.51:20005
192.168.3.51:20006
192.168.3.51:20007
192.168.3.51:20008
192.168.3.51:20009
192.168.3.52:20000
192.168.3.52:20001
192.168.3.52:20002
…. 160 times
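The 160-entry slavehostport.txt can be generated rather than maintained by hand. A sketch, assuming 16 consecutive hosts 192.168.3.51 through 192.168.3.66 with ten slave ports each (the listing above only confirms .51 and .52, so the full host range is an assumption):

```shell
# generate slavehostport.txt: 16 hosts x 10 ports = 160 entries
> slavehostport.txt
for h in `seq 51 66`; do          # hosts 192.168.3.51 .. 192.168.3.66
  for p in `seq 20000 20009`; do  # ten q slave ports per host
    echo "192.168.3.$h:$p" >> slavehostport.txt
  done
done
wc -l < slavehostport.txt
```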

Page 14:

Q clients:

[Diagram: slaves Slave (1), Slave (2), Slave (3) … Slave n, grouped ten per server, all mounting the Lustre/DDN service at /mnt/onefilesystem]

Up to 1TB/sec… “n”-way server striping, or striping by date/sym
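The two layout strategies on the slide can be expressed with standard Lustre tooling; the following is a config sketch, where the paths and stripe counts are illustrative rather than taken from the presentation:

```shell
# stripe each new file in the hdb across all servers ("n"-way striping),
# so a single large kdb+ object can reach the aggregate I/O rate
lfs setstripe -c -1 /mnt/onefilesystem/hdb

# or keep each date partition on a single server (allocation rotates
# round-robin across servers as partitions are created)
lfs setstripe -c 1 /mnt/onefilesystem/hdb/2011.01.03
```

The stripe count is inherited by files created under the directory, so the choice can be made per date/sym partition without touching the kdb+ side.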

Page 15:

Results of scaling the service…

[Chart: latency reduction (number of seconds per query), single thread vs. Lustre, scale 0-250; lower is better]

The parallel FS solution shows a near-linear scalability model for one instance running over many nodes, as measured from kdb+. Latency is the time taken to complete a kdb+ query over 245GB of data. To put this in context, these nodes were equipped with only 64GB of memory.

Page 16:

Some of the many benefits of kdb+ on a parallel FS

1. Significant decrease in operational latency per kdb+ query, especially when running queries that search through significant amounts of historical market information. Achieved by balancing content across multiple file system servers.

2. Parallelization of kdb+ query “threads” in a single shared namespace, allowing a user to treat any data workload independently from other data workloads. The “query from hell” on a production system is now OK.

3. Simultaneous read/write operations on a single namespace for the entire database and for any number of kdb+ clients (e.g. end-of-day data consolidations into an hdb instance).

4. Sharing of data amongst different independent hdb/rdb instances. Many instances of kdb+ can view the same data, meaning that strategies for data sharing and private data segments may be consolidated onto the same space. This avoids the need for kdb+ admins to physically copy data around the network or disks.

5. kdb+ content can be “striped” across all FS servers, or allocated in a round-robin fashion against each server. Striping allows the opportunity for some files to attain maximal I/O rates for a single kdb+ “object”.

Page 17:

Next Steps?

Page 18:

Thanks!

[email protected]/Big Data

Glenn Wright, EMEA Systems Architect, DDN
June 2014