Red Hat Storage: Emerging Use Cases


Upload: redhatstorage

Post on 16-Feb-2017


© RED HAT, INC.

Red Hat Storage - Emerging Use Cases

Narendra N. Narang

Sr. Cloud Storage Solutions Architect

[email protected]

January 2016


Agenda

Based on discussions, customer presentations and information, we now highlight some emerging use cases for our software-defined-storage products:

● Use Case 1: Historical Tick Data

● Use Case 2: Analytics

● Use Case 3: Storage for Network Function Virtualization (NFV)

● Use Case 4: Storage for IoT, Edge Computing

● Future Use Cases

Use Case 1: Historical Tick Data

What is a Tick?

A “tick” is the minimum upward or downward movement (any change) in the price of a security as measured over a period of time.

An "uptick" refers to a trade where the current transaction occurred at a price higher than the previous transaction, and a "downtick" refers to a transaction that has occurred at a lower price than the previous transaction. Consequently, a “zerotick” refers to a trade where the current transaction occurred at the same price as the previous transaction.
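A minimal Python sketch of this classification, assuming trades arrive as a plain sequence of prices and treating a zerotick as a trade at an unchanged price. The names `TickDirection`, `classify_tick` and `classify_series` are illustrative only and do not come from any market-data library.

```python
from enum import Enum


class TickDirection(Enum):
    UPTICK = "uptick"
    DOWNTICK = "downtick"
    ZEROTICK = "zerotick"


def classify_tick(previous_price, current_price):
    """Classify a trade relative to the previous transaction price."""
    if current_price > previous_price:
        return TickDirection.UPTICK
    if current_price < previous_price:
        return TickDirection.DOWNTICK
    # Unchanged price: neither an uptick nor a downtick.
    return TickDirection.ZEROTICK


def classify_series(prices):
    """Label every trade after the first one in a sequence of trade prices."""
    return [classify_tick(prev, cur) for prev, cur in zip(prices, prices[1:])]
```

The first trade in a series has no predecessor, so `classify_series` returns one label fewer than the number of prices it is given.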

What is Tick Data?

Tick data is time series data containing price, volume and many other dimensions (bid/ask prices, bid/ask sizes, quote time, trade time, exchange information) for each point of granularity.

Tick Data and Storage

The higher the resolution of the tick data collected, the larger the dataset and hence the greater the storage capacity required.
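The relationship between resolution and capacity can be sketched as simple arithmetic. Every number in the example below (symbol count, tick rate, record size, trading hours, trading days) is an assumption chosen for illustration, not a figure from this presentation.

```python
def estimate_tick_storage_bytes(symbols, ticks_per_second, bytes_per_tick,
                                trading_seconds_per_day, trading_days):
    """Raw capacity estimate, before replication, compression or indexes."""
    return (symbols * ticks_per_second * bytes_per_tick
            * trading_seconds_per_day * trading_days)


def to_tib(num_bytes):
    """Convert bytes to tebibytes (2**40 bytes)."""
    return num_bytes / 2**40


# Hypothetical workload: 5,000 symbols, 10 ticks/s each, 64-byte records,
# a 6.5-hour session (23,400 s) and 252 trading days per year.
yearly_bytes = estimate_tick_storage_bytes(5_000, 10, 64, 23_400, 252)
print(f"{to_tib(yearly_bytes):.2f} TiB per year (raw)")
```

Because the estimate is a plain product, doubling the tick rate doubles the raw capacity, and any replica count or erasure-coding overhead multiplies it further.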

High-level Tick Data Workflow

[Diagram: Data Feed 1 through Data Feed N, together with news and social media, feed the Market Data Servers (aggregation of feed handlers). Intraday data goes to the in-memory KDB+ TickDB (real-time) and to the tick log file; End of Day (EOD) data goes to the Historical Tick Database.]

EOD data is stored as a distinct Historical Database Partitioned Format (“hdpf”) file for that day. This file is typically written as a large sequential stream of blocks.
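The end-of-day step described above can be sketched as writing one file per trading day in a single sequential pass. This is only an illustration: CSV stands in for the real historical database format, and the directory layout and the names `eod_partition_path` and `write_eod_partition` are assumptions, not the KDB+ on-disk layout.

```python
import csv
from datetime import date
from pathlib import Path


def eod_partition_path(hdb_root, trading_day):
    """One directory per trading day, e.g. <root>/2016-01-15/ticks.csv."""
    return Path(hdb_root) / trading_day.isoformat() / "ticks.csv"


def write_eod_partition(hdb_root, trading_day, ticks):
    """Stream the day's ticks to a single file in one sequential pass."""
    target = eod_partition_path(hdb_root, trading_day)
    target.parent.mkdir(parents=True, exist_ok=True)
    temporary = target.with_suffix(".tmp")
    # A large buffer keeps the writes big and sequential.
    with temporary.open("w", newline="", buffering=1024 * 1024) as handle:
        writer = csv.writer(handle)
        writer.writerow(["time", "symbol", "price", "size"])
        for tick in ticks:
            writer.writerow(tick)
    # Publish the finished file only after the whole day has been written.
    temporary.replace(target)
    return target
```

Here `hdb_root` would be a directory on the storage volume, for example a mounted file system path; any such path is site-specific and is not given in the slides.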

Historical Tick Data on Red Hat Gluster Storage

[Diagram: the same workflow (Data Feed 1 through Data Feed N, news and social media, Market Data Servers, tick log file, intraday and EOD paths), with the historical data held on a GlusterFS volume spanning multiple RHGS nodes at Site A and replicated by asynchronous geo-replication (async geo-rep) to a second set of RHGS nodes at Site B. The in-memory KDB+ (real-time) tier and mathematical/algorithmic workloads are also shown.]

Use Case 2: Analytics

● Splunk on Red Hat Gluster Storage

● Hadoop is typically employed to run batch analytics against data residing in HDFS. Incidentally, Red Hat Gluster Storage functions as a Hadoop Compatible File System (HCFS)

● With the MapReduce (MR) framework, clusters are typically built for high throughput, with many disks and colocated data and compute

A better way...

● Employ the Spark core analytical processing engine

● Directly access data stored in Red Hat Gluster Storage.
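To illustrate what direct access means in practice: a Gluster volume mounted on a host appears as an ordinary POSIX directory, so an analytics process can read files in place without first loading them into HDFS. The sketch below uses plain Python rather than Spark to stay self-contained; the function name and the one-event-per-line log layout are assumptions made for this example.

```python
from collections import Counter
from pathlib import Path


def count_by_first_field(data_dir):
    """Count lines by their first whitespace-separated field across *.log files."""
    counts = Counter()
    for log_file in sorted(Path(data_dir).glob("*.log")):
        with log_file.open() as handle:
            for line in handle:
                fields = line.split(maxsplit=1)
                if fields:  # skip blank lines
                    counts[fields[0]] += 1
    return counts
```

A call such as `count_by_first_field("/mnt/glusterfs/logs")` would run against a mounted volume; that mount point is hypothetical. A Spark job could be pointed at the same kind of path, provided the volume is mounted on every worker.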

Splunk on Red Hat Gluster Storage

Scale-out operational analytics built on affordable, industry-standard infrastructure

Splunk Storage Approaches


Analytics on Red Hat Gluster Storage

Figure 1. Shared Storage

[Diagram: Spark instances run in containers on hosts (physical, virtual or cloud), each host with a Linux kernel and the RHEL Atomic container engine. Kubernetes orchestrates the Docker containers, a message bus connects the microservices, and all Spark instances share a Red Hat Gluster Storage cloud.]

Figure 2. Containerized Storage

[Diagram: the same host stack, with Storage containers running alongside the Spark containers on the hosts. Underlying media: SSD, HDD 15K, HDD 7.2K (self-healing, tiering, replicas, geo-replication, erasure coding). Kubernetes orchestrates the Docker containers and a message bus connects the microservices.]


Key Benefits

● Containerize and orchestrate Spark computation instances

● Scale computation instances elastically and independently

● Affine key storage resources and elastically scale computation microservices

● Run the same batch analytics with fewer MR shuffles

● Flexibility to run both batch and streaming analytics within the same framework

● Ability to spill data over to disk or to an in-memory filesystem e.g. Tachyon.


Use Case 3: Storage for Network Function Virtualization (NFV)

Moving beyond the business of Network Function Virtualization (NFV) by implementing containers.

At a high level, NFV extends virtualization technology to network functions that may subsequently be connected and organized via the concept of “service chaining” to create an end-to-end network service.

Example: Instead of deploying physical load balancers or firewalls, employ virtual network functions (VNFs), within containers, that may be orchestrated to create a “chain of service” to deliver an end-to-end network service.

Leverage the ability to deploy network functions as microservices that may be orchestrated to scale elastically and on demand.

The infrastructure on which NFV functions and operates is the NFVi.
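Service chaining can be pictured as composing small functions, each standing in for one virtual network function. The sketch below is a toy model under that assumption: packets are plain dictionaries, and `firewall`, `load_balancer` and `service_chain` are illustrative names, not part of any NFV framework.

```python
import zlib


def firewall(blocked_ports):
    """Return a VNF that drops packets addressed to a blocked port."""
    def apply(packet):
        return None if packet["dst_port"] in blocked_ports else packet
    return apply


def load_balancer(backends):
    """Return a VNF that pins each source address to one backend."""
    def apply(packet):
        # crc32 is stable across runs, unlike Python's randomized str hash.
        index = zlib.crc32(packet["src_ip"].encode()) % len(backends)
        return {**packet, "backend": backends[index]}
    return apply


def service_chain(*functions):
    """Compose VNFs in order; a packet dropped by one stops the chain."""
    def run(packet):
        for function in functions:
            packet = function(packet)
            if packet is None:
                return None
        return packet
    return run


chain = service_chain(firewall({23}), load_balancer(["web-1", "web-2"]))
```

Calling `chain` with a packet bound for port 23 returns `None`, while any other packet comes back annotated with a backend. Reordering or extending the chain is a matter of changing the arguments to `service_chain`.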


Red Hat Storage for Containerized VNFs

[Diagram: application containers (App1 through App5), VNF containers (VNF1 through VNF4) and Storage containers run side by side on hosts (physical, virtual or cloud), each host with a Linux kernel and the RHEL Atomic container engine. Underlying media: SSD, HDD 15K, HDD 7.2K (self-healing, tiering, replicas, geo-replication, erasure coding). Kubernetes orchestrates the Docker containers and a message bus connects the microservices.]


Key Benefits

● Orchestrate a true microservices architecture, where application, storage and network services are delivered elastically and on-demand

● Architect a containerized platform within the realm of OpenStack infrastructure or independently

● Deliver end-to-end network functions to a multitenant environment at hyperscale

● Segregate network function from session “stateful” information. Store “state” information on a scalable, performant distributed storage platform

● Virtual network functions are now rendered stateless and may be scaled independently

● Distributed storage delivers the performance, resiliency and enterprise features like tiering, replicas, snapshots, quotas, geo-replication and self-healing

● Choice of block or file implementations based on Red Hat Ceph Storage or Red Hat Gluster Storage

● Operate within more deterministic parameters and with increased cost and performance efficiencies.
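The separation of network function from session state can be sketched as follows. A dictionary-backed `SessionStore` stands in for the distributed storage platform, and both class names are hypothetical; the point is only that any VNF instance can serve any session because no state lives inside the instance.

```python
class SessionStore:
    """Stand-in for shared distributed storage; a plain dict in this sketch."""

    def __init__(self):
        self._sessions = {}

    def load(self, session_id):
        # Return a copy so callers cannot mutate stored state by accident.
        return dict(self._sessions.get(session_id, {"packets": 0}))

    def save(self, session_id, state):
        self._sessions[session_id] = dict(state)


class StatelessVnf:
    """A network function instance that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id):
        state = self.store.load(session_id)
        state["packets"] += 1
        state["last_handled_by"] = self.name
        self.store.save(session_id, state)
        return state
```

Two instances created with the same store can alternate on one session and still see a consistent packet count, which is what allows instances to be added or removed independently of the sessions in flight.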


Use Case 4: Storage for IoT, Edge Computing

The Internet of Things (IoT) is a network of physical objects embedded with electronics, software, sensors, and network connectivity, which enables these objects to collect and exchange data.

IDC Predictions:

● By 2018, 20% of all IoT intelligent gateways will have “container technology” for packaging IoT application code in a thin containerized environment thus accelerating IoT microservices

● By 2019, 45% of IoT-created data will be stored, processed, analyzed and acted upon close to, or at the edge, of the network

Sources:
http://event.lvl3.on24.com/event/10/38/69/4/rt/1/documents/resourceList1444759978563/idc_iot_futurescape_wc.pdf
http://www.idc.com/getdoc.jsp?containerId=prUS25291514

Edge Computing – Computation and/or analysis of data is performed at the edge devices of a network rather than at a centralized location.


The Broader Implications

● Within three years, 50% of IT networks will transition from having excess capacity to handle the additional IoT devices to being network constrained, with nearly 10% of sites being overwhelmed

● By 2017, 90% of datacenter and enterprise systems management will rapidly adopt new business models to manage non-traditional infrastructure and BYOD device categories

● Within five years all industries will have rolled out IoT initiatives

Source: http://www.idc.com/Predictions2015.


Newer Architectures, Hardware & Software Implementations, Containerization

[Diagram labels: New Insights; New Value (IoT); Data Collection & Storage]

Not just a capacity issue any more!

It’s about enabling the edge nodes to filter, index and compute on the data stream, and about equipping the edge with sufficient computation and the appropriate storage combination for both latency-sensitive and cold/archival data.

Questions that really need answering:

* How much data actually needs to be stored?

* How much of the data is transient?

* How much data needs to be stored, but can’t be stored owing to cost/capacity constraints?

* Within cold storage, what are the access patterns and appropriate mediums?
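One way to picture filtering and computing at the edge: a node reduces a window of raw sensor readings to a small summary plus any out-of-range values, and only that is forwarded or stored. This is a sketch under assumed inputs (a list of numeric readings and a fixed valid range); `summarize_at_edge` is a hypothetical name.

```python
from statistics import mean


def summarize_at_edge(readings, low, high):
    """Reduce a window of readings to a summary plus out-of-range values."""
    if not readings:
        return {"count": 0, "mean": None, "min": None, "max": None,
                "anomalies": []}
    return {
        "count": len(readings),
        "mean": mean(readings),
        "min": min(readings),
        "max": max(readings),
        # Only readings outside [low, high] are kept verbatim.
        "anomalies": [value for value in readings if not low <= value <= high],
    }
```

Under this scheme the in-range readings are transient: they contribute to the summary and are then discarded, so the volume sent upstream depends on the window size and the anomaly rate rather than on the raw sampling rate.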

[Diagram labels: New Revenue; Value]


Key Benefits

● Improved QoS and reduced latency of data analytics

● Less data traversing networks

● Higher resiliency, since data is pushed to redundant components at the edge of the network

● Employ better “swarm intelligence” for diffusion of data across networks

● Optimized data placement for higher energy efficiency

● Agility in infrastructure with containerization

● Higher cost efficiency based on commoditized and open source implementations.


Stay tuned...

● Database Workloads on Ceph

● Latency sensitive, high IOPS workloads on Ceph RBD

● CephFS workloads in production

● Ceph iSCSI target implementation with HA gateways

● Hyperconverged architectures

● Containerization of storage services

● Support for ARM processors in newer architectures


Driving Value in a Hyperscale Model

IMHO, some predictions for incremental gains at hyperscale:

* Increased prevalence and dominance of open-source software-defined-storage technology

* Proliferation of containerization for heavy densities, availability and elasticity of microservices delivery in the cloud

* Implementation of all-flash technologies and tiering for IOPS-intensive and mixed workloads

* Increased prominence of ARM processors in scale-out architectures

* Use of key-value drives.

It will become imperative to leverage a combination of these technologies to remain competitive and to maintain cost and operational efficiency.