accelerating real time big data applications...big data analytics off-line analysis of huge data...

16
PRESENTATION TITLE GOES HERE Accelerating Real Time Big Data Applications Bob Hansen

Upload: others

Post on 22-May-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

PRESENTATION TITLE GOES HERE

Accelerating

Real Time Big Data Applications

Bob Hansen

Page 2: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

2 2015 Data Storage Innovation Conference. © Apeiron Data Systems All Rights Reserved.

Apeiron Data Systems

Apeiron is developing a VERY high performance Flash storage system

that alters the economics of Big Data and HPC applications. Our

storage systems dramatically improve application performance while

simultaneously reducing the size of the cluster.

Page 3: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

3 2015 Data Storage Innovation Conference. © Apeiron Data Systems All Rights Reserved.

Agenda

“Real Time, Big Data” What is that?

Enhanced user experience drives high IOP performance

Storage perf = $$

Scale out, in-memory compute/storage architecture evolution

In-memory => in-box flash => external flash

The Ideal, Very High Performance scale out system

VHP System Vision

What is possible?

Page 4: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

4 2015 Data Storage Innovation Conference. © Apeiron Data Systems All Rights Reserved.

Big Data

Big Data –

Structured or unstructured

Too big or too fast for traditional data base and SW techniques

Drove the development of scale-out applications using DAS

Big Data Analytics

Off-line analysis of huge data sets

Storage requirements = high capacity and BW

Real Time Big Data (web personalization)

Very fast lookups, small records, large data sets

Storage requirements

Large block writes, high IOPs,

very low, predictable, consistent latency

with linear performance scaling

Page 5: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

5 2015 Data Storage Innovation Conference. © Apeiron Data Systems All Rights Reserved.

> Customer personalization and

simplified data management

> Fortune 500 companies mid-layer

meta cache rapidly growing

> Kayak

Caching aged airline quotes to

speed service

> Netflix

Personalization for >50M

customers

> Amadeus

3.7 Million Bookings per Day

High IOP Application

Enhanced User Experience

Customer

personalization

NoSQL

Cluster

Web / App Layer

Structured data store

Web Personalization

Page 6: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

6 2015 Data Storage Innovation Conference. © Apeiron Data Systems All Rights Reserved.

Ad Tech Example

Deliver Ad

within 200ms Web page request Match

request

to user

Fast Lookup (<1ms)

Read consumer profile

Update profile

Lookup Available Ads

Rank adds for best match

Match

user to

ad

Determine best ROI

Bid for

ad

Storage IOPs / latency = $$ >1 billion consumers

>3 billion devices

Log

results

Lose

Log

transaction

Deliver

ad

Win

Page 7: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

7 2015 Data Storage Innovation Conference. © Apeiron Data Systems All Rights Reserved.

High Performance (IOPs)

Compute / Storage

Architecture Evolution

NoSQL and other “in-memory” data applications

Page 8: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

8 2015 Data Storage Innovation Conference. © Apeiron Data Systems All Rights Reserved.

NoSQL Solution – Scale Out nodes

with the data set In-memory

Scale-out in-memory goodness

Shared nothing compute nodes

scale well

Data base is “sharded” evenly

across all nodes

Data set in-memory is VERY FAST

To scale – just add another node,

shard the DB again and go

Application Servers

Central

Processing Unit

(CPU)

Compute Node

DRAM NIC

Central

Processing Unit

(CPU)

Compute Node

DRAM

Central

Processing Unit

(CPU)

Compute Node

DRAM

Central

Processing Unit

(CPU)

Compute Node

DRAM

. . . N

IC

NIC

NIC

switch

Issues

DRAM can be VERY expensive

Node failure = very long recovery time

• Data at risk during recovery

As data set grows more servers must be

added

• = higher cost and foot print

CPU to mem ratio can not be optimized

This all breaks before you approach 100TB

Page 9: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

9 2015 Data Storage Innovation Conference. © Apeiron Data Systems All Rights Reserved.

Expensive DRAM?

Add Internal Flash

Scale-out in-memory goodness

Shared nothing compute nodes

scale well

Data base is “sharded” evenly

across all nodes

Data set in-memory is VERY FAST

Data in flash is FAST

To scale – just add another node,

shard the DB again and go

Application Servers

Issues

Flash size must be equal on all nodes

• Adding storage = downtime

Node failure = very long recovery time

• Data at risk during recovery

As data set grows more nodes must be

added

• = higher cost and foot print

CPU to mem ratio can not be optimized

Central

Processing Unit

(CPU)

Compute Node

DRAM NIC

Central

Processing Unit

(CPU)

Compute Node

Central

Processing Unit

(CPU)

Compute Node

DRAM

Central

Processing Unit

(CPU)

Compute Node

DRAM

. . .

NIC

NIC

NIC

Flash Flash Flash

switch

DRAM

Flash

Storage Management is a Pain!

Page 10: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

10 2015 Data Storage Innovation Conference. © Apeiron Data Systems All Rights Reserved.

VERY High Performance (VHP)

External Storage is the Answer

Shared DAS Goodness

CPU and Storage scale

independently

Minimize cost / rack space

Improved CPU utilization

Fine Grain, On-line provisioning

Server failures don’t take out data

Minimize failure recovery time

Application Servers

Central

Processing Unit

(CPU)

Compute Node

DRAM NIC

Central

Processing Unit

(CPU)

Compute Node

Central

Processing Unit

(CPU)

Compute Node

DRAM

Central

Processing Unit

(CPU)

Compute Node

DRAM

. . . N

IC

NIC

NIC

switch

DRAM

HBA HBA HBA HBA

Storage Fabric

Issues

Performance

• IOPs and Predictable Latency

Availability

• HA design and Replicas

Scale –

• PBs and 100s of nodes

Page 11: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

11 2015 Data Storage Innovation Conference. © Apeiron Data Systems All Rights Reserved.

Storage technology choices

IOPs / latency performance

15K HDD – 210 IOPs

6Gb SATA SSD – 90K IOPs*

12Gb SAS SSD – 155K IOPs*

NVMe SSD >> 600K IOPs*

* Typical 4K Random Reads

SATA and SAS

can’t cut it!

SSD Performance

Objectives

Performance

Availability

Scale

CPU is the

Bottleneck

Driver

HBA

Fabric

CPU

NVMe

SSD

Networked NVMe

Driver

HBA

Point to

Point

SSD

Get Out of the Box!

JBOD

Fast but

Doesn’t

Scale

Kills Performance or

Adds Cost$$

Objectives

Performance

Availability

Scale

Objectives

Performance

Availability

Scale

Page 12: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

12 2015 Data Storage Innovation Conference. © Apeiron Data Systems All Rights Reserved.

The ideal, VHP

persistent storage solution

Shared Direct Attached Storage

Best performing persistent storage media

Standard NVMe SSDs – also best cost

Bare metal, commodity storage network HW

Low cost, industry standard networking (Enet) – also best cost

Add value where you get best ROI

Data path optimization

SSD Virtualization

High availability

Best in class management

On-line provisioning and failure recovery

Storage performance statistics / predictive modeling

Deliver raw NVMe performance to the application

Page 13: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

13 2015 Data Storage Innovation Conference. © Apeiron Data Systems All Rights Reserved.

Why not PCIe on a rope?

PCIe is not a network

PCIe is an evolution and extension to a parallel system bus

Initially scoped to support a handful of devices

PCIe was not designed to be resilient

Bus errors = panic

Failure isolation is a work in progress

There are currently no PCIe networking standards or components

Yes, there are switches designed for in-box PCI extensions

A native PCIe storage network is possible

but faces several challenges -

Why re-invent PCIe as a high cost, very complex,

non-standard fabric?

Page 14: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

14 2015 Data Storage Innovation Conference. © Apeiron Data Systems All Rights Reserved.

VHP System Architecture Vision

. . .

. . .

Switched Storage Fabric

. . .

Switched Storage Fabric

40Gb Bare Metal Ethernet Storage Network

Very High Performance Network to SSD interface

Simple, scalable architecture with better than in-box flash performance

Highly available, shared storage using standard SSDs and networking

components

Virtualized storage, on-line provisioning, failure isolation

Application Servers

Central

Processing Unit

(CPU)

Compute Node

DRAM

Central

Processing Unit

(CPU)

Compute Node

Central

Processing Unit

(CPU)

Compute Node

DRAM

Central

Processing Unit

(CPU)

Compute Node

DRAM

. . .

switch

DRAM

HBA HBA HBA HBA Storage Processing Offload On Application Servers

Deliver raw NVMe performance to the application

Page 15: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

15 2015 Data Storage Innovation Conference. © Apeiron Data Systems All Rights Reserved.

What is possible?

Industry leading IOPs and BW performance

Industry leading performance density

Linear scaling to 100s of servers and Petabytes of

storage

Dramatically lower solution cost

CapEx

OpEx

An easy to manage, extensible solution – ready for the

next gen flash, networking and PCIe technology

“Watch this Space!”

Page 16: Accelerating Real Time Big Data Applications...Big Data Analytics Off-line analysis of huge data sets Storage requirements = high capacity and BW Real Time Big Data (web personalization)

PRESENTATION TITLE GOES HERE

www.apeirondata.com

[email protected]

Thank You!