© RED HAT, INC.1
Red Hat Storage - Emerging Use Cases
Narendra N. Narang
Sr. Cloud Storage Solutions Architect
January 2016
Agenda
Based on discussions, customer presentations and information, we now highlight some emerging use cases for our software-defined-storage products:
● Use Case 1: Historical Tick Data
● Use Case 2: Analytics
● Use Case 3: Storage for Network Function Virtualization (NFV)
● Use Case 4: Storage for IoT, Edge Computing
● Future Use Cases
Use Case 1: Historical Tick Data
What is a Tick?
A “tick” is the minimum upward or downward movement (any change) in the price of a security as measured over a period of time.
An "uptick" refers to a trade where the current transaction occurred at a price higher than the previous transaction, and a "downtick" refers to a transaction that occurred at a lower price than the previous transaction. Consequently, a “zerotick” refers to a trade where the current transaction occurred at the same price as the previous transaction.
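The three definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_ticks` and the sample prices are hypothetical and do not come from any market data product.

```python
from typing import List


def classify_ticks(prices: List[float]) -> List[str]:
    """Label every trade after the first one relative to the previous trade price."""
    labels = []
    for prev, curr in zip(prices, prices[1:]):
        if curr > prev:
            labels.append("uptick")      # traded higher than the previous transaction
        elif curr < prev:
            labels.append("downtick")    # traded lower than the previous transaction
        else:
            labels.append("zerotick")    # traded at the same price as the previous transaction
    return labels


# Hypothetical sequence of trade prices for one security.
trade_prices = [100.00, 100.01, 100.01, 99.99]
print(classify_ticks(trade_prices))
```

The first trade has no predecessor, so the result has one label fewer than the number of prices.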
What is Tick Data?
Tick data is time series data containing price, volume and many other dimensions (bid/ask prices, bid/ask sizes, quote time, trade time, exchange information) for each point of granularity.
Tick Data and Storage
The higher the resolution of the tick data collected, the larger the dataset and, hence, the storage capacity required.
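A back-of-the-envelope calculation shows how resolution drives capacity. Every number below (symbol count, tick rates, session length, record size) is an assumed placeholder for illustration, not a measurement from any exchange or customer.

```python
def daily_tick_bytes(symbols: int, ticks_per_symbol_per_sec: float,
                     session_seconds: int, bytes_per_record: int) -> int:
    """Raw, uncompressed bytes generated per trading day."""
    records = symbols * ticks_per_symbol_per_sec * session_seconds
    return int(records * bytes_per_record)


# Assumptions: 5,000 symbols, a 6.5-hour session, 64 bytes per tick record.
SESSION_SECONDS = int(6.5 * 3600)

for rate in (1, 10, 100):  # assumed ticks per second per symbol
    size = daily_tick_bytes(5000, rate, SESSION_SECONDS, 64)
    print(f"{rate:>3} ticks/s per symbol -> {size / 1e9:.1f} GB/day")
```

Under these assumptions the daily volume grows linearly with the tick rate, before accounting for replication, erasure coding overhead or compression.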
High-level Tick Data Workflow
[Diagram components: Data Feed 1, Data Feed 2, Data Feed 3 ... Data Feed N; News, Social Media; Market Data Servers (Aggregation of Feed Handlers); KDB+ In-memory Tick DB (Real-time); Tick Log File; Historical Tick Database; Intraday and End of Day (EOD) paths.]
EOD data is stored as a distinct Historical Database Partitioned Format (“hdpf”) file for that day. This file is typically written as a large sequential stream of blocks.
Historical Tick Data on Red Hat Gluster Storage
[Diagram components: the same workflow as before (Data Feeds 1 to N; News, Social Media; Market Data Servers (Aggregation of Feed Handlers); In-memory KDB+ (Real-time); Tick Log File; Intraday and EOD paths; Mathematical, Algorithmic), with GlusterFS clusters of RHGS nodes at Site A and Site B linked by asynchronous geo-replication.]
Use Case 2: Analytics
● Splunk on Red Hat Gluster Storage
● Hadoop is typically employed to run batch analytics against data residing in HDFS. Incidentally, Red Hat Gluster Storage functions as a Hadoop Compatible File System (HCFS)
● With the MapReduce (MR) framework, clusters are typically high throughput, with many disks and colocated data and compute
A better way...
● Employ the Spark core analytical processing engine
● Directly access data stored in Red Hat Gluster Storage.
Splunk on Red Hat Gluster Storage
Scale-out operational analytics built on affordable, industry-standard infrastructure
Analytics on Red Hat Gluster Storage
[Figure 1. Shared Storage: Spark containers on RHEL Atomic hosts (Linux Kernel, Container Engine; Physical, Virtual, Cloud) sharing a Red Hat Gluster Storage Cloud; Kubernetes Orchestration for Docker Containers; Message Bus for Microservices.]
[Figure 2. Containerized Storage: Spark and Storage containers on the same RHEL Atomic hosts (Linux Kernel, Container Engine; Physical, Virtual, Cloud); SSD, HDD 15K, HDD 7.2K (Self-healing, Tiering, Replicas, Geo-replication, Erasure Coding); Kubernetes Orchestration for Docker Containers; Message Bus for Microservices.]
Key Benefits
● Containerize and orchestrate Spark computation instances
● Scale computation instances elastically and independently
● Pin key storage resources by affinity while elastically scaling computation microservices
● Run the same batch analytics with fewer MapReduce shuffles
● Flexibility to run both batch and streaming analytics within the same framework
● Ability to spill data over to disk or to an in-memory filesystem e.g. Tachyon.
Use Case 3: Storage for Network Function Virtualization (NFV)
Moving beyond the business of Network Function Virtualization (NFV) by implementing containers.
At a high level, NFV extends virtualization technology to network functions that may subsequently be connected and organized via the concept of “service chaining” to create an end-to-end network service.
Example: Instead of deploying physical load balancers or firewalls, employ virtual network functions, within containers, that may be orchestrated to create a “chain of service” to deliver an end-to-end network service.
Leverage the ability to deploy network functions as microservices that may be orchestrated to scale elastically and on demand.
The infrastructure on which NFV functions and operates is the NFV Infrastructure (NFVI).
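The idea of service chaining can be sketched as plain function composition. This is a toy model, not a real VNF framework: the names `firewall`, `load_balancer`, `service_chain`, the blocked port and the backend addresses are all hypothetical.

```python
import zlib
from typing import Callable, Dict, List, Optional

Packet = Dict[str, object]
NetworkFunction = Callable[[Packet], Optional[Packet]]

BLOCKED_PORTS = {23}                 # assumed policy: drop telnet traffic
BACKENDS = ["10.0.0.1", "10.0.0.2"]  # assumed backend pool


def firewall(packet: Packet) -> Optional[Packet]:
    """Drop packets addressed to a blocked port, pass everything else."""
    if packet["dst_port"] in BLOCKED_PORTS:
        return None
    return packet


def load_balancer(packet: Packet) -> Optional[Packet]:
    """Choose a backend from a stable hash of the source address."""
    index = zlib.crc32(str(packet["src_ip"]).encode()) % len(BACKENDS)
    return dict(packet, backend=BACKENDS[index])


def service_chain(functions: List[NetworkFunction]) -> NetworkFunction:
    """Compose network functions into one end-to-end service."""
    def run(packet: Packet) -> Optional[Packet]:
        current: Optional[Packet] = packet
        for function in functions:
            if current is None:      # an earlier function dropped the packet
                return None
            current = function(current)
        return current
    return run


chain = service_chain([firewall, load_balancer])
print(chain({"src_ip": "192.0.2.7", "dst_port": 443}))
print(chain({"src_ip": "192.0.2.7", "dst_port": 23}))
```

In a containerized deployment each function would be its own microservice and the orchestrator would wire the chain together; here a Python list stands in for that wiring.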
Red Hat Storage for Containerized VNFs
[Diagram components: App1 to App5, VNF1 to VNF4 and Storage containers distributed across RHEL Atomic hosts (Linux Kernel, Container Engine; Physical, Virtual, Cloud); SSD, HDD 15K, HDD 7.2K (Self-healing, Tiering, Replicas, Geo-replication, Erasure Coding); Kubernetes Orchestration for Docker Containers; Message Bus for Microservices.]
Key Benefits
● Orchestrate a true microservices architecture, where application, storage and network services are delivered elastically and on-demand
● Architect the containerized platform within the realm of OpenStack infrastructure or independently
● Deliver end-to-end network functions to a multitenant environment at hyperscale
● Segregate network function from session “stateful” information. Store “state” information on a scalable, performant distributed storage platform
● Virtual network functions are now rendered stateless and may be scaled independently
● Distributed storage delivers the performance, resiliency and enterprise features like tiering, replicas, snapshots, quotas, geo-replication and self-healing
● Choice of block or file implementations based on Red Hat Ceph Storage or Red Hat Gluster Storage
● Operate within more deterministic parameters and with increased cost and performance efficiencies.
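A minimal sketch of the "stateless VNF, external state" pattern described in the list above. It assumes the shared state lives as JSON files in a directory, with a temporary directory standing in for a mounted distributed volume. The class names `SessionStore` and `StatelessVnf` are hypothetical, and the sketch deliberately omits the locking a real deployment would need.

```python
import json
import tempfile
from pathlib import Path


class SessionStore:
    """Keeps per-session state as JSON files under a shared directory."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def load(self, session_id: str) -> dict:
        path = self.root / f"{session_id}.json"
        if path.exists():
            return json.loads(path.read_text())
        return {"packets": 0}

    def save(self, session_id: str, state: dict) -> None:
        (self.root / f"{session_id}.json").write_text(json.dumps(state))


class StatelessVnf:
    """Holds no session data itself; every request reads and writes the store."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def handle(self, session_id: str) -> int:
        state = self.store.load(session_id)
        state["packets"] += 1
        state["last_instance"] = self.name
        self.store.save(session_id, state)  # no locking: illustration only
        return state["packets"]


# A temporary directory stands in for the mounted distributed volume.
shared_volume = Path(tempfile.mkdtemp())
store = SessionStore(shared_volume)
vnf_a = StatelessVnf("vnf-a", store)
vnf_b = StatelessVnf("vnf-b", store)

vnf_a.handle("session-42")
count = vnf_b.handle("session-42")  # a different instance continues the same session
print(count, store.load("session-42"))
```

Because neither instance keeps session data in memory, either one can be added, removed or replaced without losing the session.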
Use Case 4: Storage for IoT, Edge Computing
The Internet of Things (IoT) is a network of physical objects embedded with electronics, software, sensors, and network connectivity, which enables these objects to collect and exchange data.
IDC Predictions:
● By 2018, 20% of all IoT intelligent gateways will have “container technology” for packaging IoT application code in a thin containerized environment thus accelerating IoT microservices
● By 2019, 45% of IoT-created data will be stored, processed, analyzed and acted upon close to, or at, the edge of the network
Sources:
http://event.lvl3.on24.com/event/10/38/69/4/rt/1/documents/resourceList1444759978563/idc_iot_futurescape_wc.pdf
http://www.idc.com/getdoc.jsp?containerId=prUS25291514
Edge Computing – Computation and/or analysis of data is performed at the edge devices of a network rather than at a centralized location.
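As a rough illustration of computing at the edge, the sketch below reduces a window of sensor readings to a small summary before anything is sent upstream. The function name `summarize_at_edge`, the threshold and the readings are assumptions made for the example.

```python
from statistics import mean
from typing import Dict, List


def summarize_at_edge(readings: List[float], threshold: float) -> Dict[str, object]:
    """Reduce a window of raw readings to a summary plus any out-of-range values."""
    return {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
        "anomalies": [r for r in readings if r > threshold],
    }


# Hypothetical temperature readings collected by one edge gateway.
window = [21.0, 21.5, 22.0, 35.0, 21.5]
summary = summarize_at_edge(window, threshold=30.0)
print(summary)
```

Only the summary and the flagged readings would need to cross the network; the raw window can be discarded or kept on local storage, depending on retention requirements.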
The Broader Implications
● Within three years, 50% of IT networks will transition from having excess capacity to handle the additional IoT devices to being network constrained, with nearly 10% of sites being overwhelmed
● By 2017, 90% of datacenter and enterprise systems management will rapidly adopt new business models to manage non-traditional infrastructure and BYOD device categories
● Within five years all industries will have rolled out IoT initiatives
Source: http://www.idc.com/Predictions2015.
Newer Architectures, Hardware & Software Implementations, Containerization
[Diagram labels: DATA COLLECTION & STORAGE, NEW INSIGHTS, NEW VALUE (IoT), NEW REVENUE; axis: Value]
Not just a capacity issue any more!
It’s about enabling the edge nodes to filter, index and compute the data stream, and equipping the edge with sufficient computation and the appropriate storage combination for both latency-sensitive and cold/archival storage.
Questions that really need answering:
* How much data actually needs to be stored?
* How much of the data is transient?
* How much data needs to be stored, but can’t be stored owing to cost/capacity constraints?
* Within cold storage, what are the access patterns and appropriate media?
Key Benefits
● Improved QoS and reduced latency of data analytics
● Less data traversing networks
● Higher resiliency, since data is pushed to redundant components at the edge of the network
● Employ better “swarm intelligence” for diffusion of data across networks
● Optimized data placement for higher energy efficiency
● Agility in infrastructure with containerization
● Higher cost efficiency based on commoditized and open source implementations.
Stay tuned...
● Database Workloads on Ceph
● Latency sensitive, high IOPS workloads on Ceph RBD
● CephFS workloads in production
● Ceph iSCSI target implementation with HA gateways
● Hyperconverged architectures
● Containerization of storage services
● Support for ARM processors in newer architectures
Driving Value in a Hyperscale Model
IMHO, some predictions for incremental gains at hyperscale:
* Increased prevalence and dominance of open-source software-defined-storage technology
* Proliferation of containerization for high density, availability and elasticity of microservices delivery in the cloud
* Implementation of all-flash technologies and tiering for IOPS-intensive and mixed workloads
* Increased prominence of ARM processors in scale-out architectures
* Use of key-value drives.
It will become imperative to leverage a combination of these technologies to remain competitive and to maintain cost and operational efficiency.