anilvasudeva nextgen storage big data
TRANSCRIPT
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
1/60
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
2/60
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
3/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
33
Abstract
NextGen Infrastructure forBig DataThis session will appeal to Business Planning, Marketing, Technology System Integrators and Data CenterManagers seeking to understand the drivers behind the demand for and rise of Big Data.
AbstractThe internet has spawned an explosion in data growth in the form of data sets, called Big Data, that areso large they are difficult to store, manage and analyze using traditional RDBMS which are tuned forOnline Transaction Processing (OLTP) only. Not only is this new data heavily unstructured, voluminousand streams rapidly and difficult to harness but even more importantly, the infrastructure cost of HW andSW required to crunch it using traditional RDBMS, to derive any analytics or business intelligence online
(OLAP) from it, is prohibitive.To capitalize on the Big Data trend, a new breed of Big Data technologies (such as Hadoop and others)many companies have emerged which are leveraging new parallelized processing, commodity hardware,open source software and tools to capture and analyze these new data sets and provide aprice/performance that is 10 times better than existing Database/Data Warehousing/Business IntelligenceSystems.
Learning ObjectivesThe presentation will illustrate the existing operational challenges businesses face today using RDBMSsystems despite using fast access in-memory and solid state storage technologies. It details how IT isharnessing the emergent Big Data to manage massive amounts of data and new techniques such asparallelization and virtualization to solve complex problems in order to empower businesses withknowledgeable decision-making.It lays out the rapidly evolving big data technology ecosystem - different big data technologies fromHadoop, Distributed File Systems, emerging NoSQL derivatives for implementation in private and hybridcloud-based environments, Storage Infrastructure Requirements to Store, Access, Secure, Prepare foranalytics and visualization of data while manipulating it rapidly to derive business intelligence online, to runbusinesses smartly.
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
4/60
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
5/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
55
NextGen IT Infrastructure
Enterprise VZ Data CenterOn-Premise Cloud
Home Networks
Web 2.0Social Ntwks .Facebook,
Twitter, YouTubeCable/DSL
Cellular
Wireless
Internet ISP
CoreOptical
Edge
ISP
ISPISP
ISP
ISP
Supplier/Partners
Remote/Branch Office
Public CloudCenter
Servers VPNIaaS, PaaS
SaaS
VerticalClouds
ISP
Tier-3Data Base
ServersTier-2 Apps
ManagementDirectory Security Policy
Middleware Platform
Switches: Layer 4-7,Layer 2, 10GbE, FC Stg.
Caching, Proxy,FW, SSL, IDS, DNS,
LB, Web Servers
App licat ion ServersHA, File/Print, ERP,
SCM, CRM Servers
Database Servers,Middleware, DataMgmt.
Tier-1Edge Apps
FC/ IPSANs
Request for data from a remote client to a Data Center or Cloud crosses a myriad ofsystems and devices. Key is identifying bott lenecks & improving performance
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
6/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Harnessing Big Data for Business Insights
6
Majority of data growth is beingdriven by unstruc tured data
and billions of large objects
Information is at the center ofNew Wave of opportunity
80% of worlds data is unstructureddriven by rise in Mobility devices,
collaboration machine generated data.
Data
Sources
Big Data
Infrastructure
Business
Insights
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
7/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Unstructured Big Data can provide NextGen Analytics to help businesses makeinformed, better decision in:Product StrategyTargeting SalesJust-In-Time Supply-Chain EconomicsBusiness Performance OptimizationPredictive Analytics &RecommendationsCountry Resources Management
Corporate Need: Business Perf... Optimization
7
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
8/60
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
9/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Corporate Need: Business Insights
.
Store
Information ExplodingVolume: Digital Content doubling every 18months. Velocity: >80% growth driven fromunstructured data. Variety: sources of datachanging
A unif ied in formation/contentstorage methodologythat enablesusers to manage the volume, velocity andvariety of information from multiple sources
Manage
Complexity in "managing"information.- Need to classify,
synchronize, aggregate, integrate, share,transform, profile, move, cleanse, protect,retire
A solution port fo lio of tools andservices to manage all types of
information in a hybrid storage environment
AnalyzeCurrent solutions limited to BI toolsfocused on structured and lagginginformation
Build/buy packaged Real-TimePredictive Analytical Solutionsforunstructured analytics tools
Collaborate Multiple access methodsneeded to
meet needs of a diverse audience.Centralized share, collaborateandact on insights anytime, anywhere on anydevice.
Model/ AdaptAbility to understand how theinformation impacts the business.How to transfer to action.
Model Information on currentoperations w/potential strategyimpact.Leverage Tech. to adapt.
Item Issue Solution
9
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
10/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Opportunity: Converting Big Data Delugeinto Predictive Analytics & Insights
Personal Location Services Data Generated by TB/Year
Navigation Devices 600
Navigation Apps on Phones 20Smart Phone Opt-In Tracking 1000
Geo Targeted Ads 20
People Locator (Emergency Calls/Search..) 10
Location based Services (e.g.Games) 5
Other 45
Total (Est.) 1700
Big Data Predict ive Analytics
10
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
11/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Issues with Existing RDBMS
Key Issues with RDBMS Technologies
Handling Mixed Unstructured Data- RDBMs dont handle non-tabular data
(Notorious for doing a poor job on recursive data structure)
Legacy Archaic Architecture- RDBMS dont parallelize well to accommodatecommodity HW clusters
Speed- Seek time of physical Storage has not kept pace withnetwork speed improvementsScale- Difficult to scale-out RDBMS efficiently Clusteringbeyond few servers notoriously hardIntegration- Data processing tasks need to combine data from non-related sources, over a networkVolume- Data volumes have grown from 10s GB >100s TB >PBs in recent years. Existing Tabular RDBMS canthandle such large DBs
11
FaultTolerance
Avai labil ityDeep
Insights
EU AdhocAnalytics
Cost
UnstructuredData
Latency
Scalability
Big DataAnalytics
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
12/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Issues with Existing RDBMS
Present RDBMS struggling to Store & Analyze Big Data
12
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
13/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Big Data - Database Solutions
13
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
14/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Big Data The New Face of DBs
Big Data Paradigm - The New face of DB Systems Adopts Schema-Free Architecture
Can do away with Legacy Relational DB SystemsSome data have sparse attributes, do not need relational property Key Oriented Queries
Some data stored/retrieved mainly by primary key, w/o complex joins
Trade-off of Consistency, Availability & Partition Tolerance
Scale Out, not up, - Online Load balancing cluster growth
14
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
15/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Analytics The Next Frontier in IT
15
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
16/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Key Innovations: HW Technologies
16
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
17/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
17
Key Innovations Solid State Storage
Source: IMEX Research SSD Industry Report 2011
Note: 2U storage rack, 2.5 HDD max cap = 400GB / 24 HDDs, de-stroked to 20%, 2.5 SSD max cap = 800GB / 36 SSDs
0
300
600
2009 2010 2011 2012 2013 2014
Un
its
(Millio
ns
)
0
8
IOPS/GB
IOPS/GB HDDs
HDD
SSD
17
Key to Database performance are random IOPS. SSDs outshine HDD in IO price/performance
a major reason, besides better space and power, for their explosive growth.
Storage - IOPS/GB & Price Erosion - HDD vs. SSDs
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
18/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Innovations DB SW Technologies
Tech Innovation 1985 1990 1995 2000 2005 2010 2015
OLTP TransactionsDB SW Rows Locking Optimizer Parallel Query Clustering XML Grid Open Source /Hadoop
OLAP- AnalyticsDB SW
Indexing Partitioning ColumnarMaterialized
ViewBit Mapped
IndexIn-Memory Query Binding
Hardware 32 bit SMP NUMA 64 bitMulti-core/Blades
Flash MPP
Big Data Multi-coreColumnarIn-Memory
MPPVisualization
OLTP Database Innovation Progress
0.1
1
10
100
1000
10000
100000
1985 1990 1995 2000 2005 2010 2015
0.01
0.1
1
10
100
1000
10000
$/TPMc
TPMc/Processor$/TPMc
TPMc/Processor
Big Data:Analy tics DB Technology Impact
0
0
1
10
100
1,000
10,000
100,000
1985 1990 1995 2000 2005 2010 2015
0.0001
0.001
0.01
0.1
1
10
100
1000
10000
100000
.1
.01
DW Size TB$/GB
18
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
19/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Big Data - Key Requirements
Types of Data Organizations Analyze59%
44%
69%
64%
41%
33%
51%
28%
46%
36%
36%
18%18%
21%
8%
3%
3%
68%
68%
37%
23%
33%
34%
26%
32%
21%
15%
11%
15%
11%
9%
3%
6%
5%
Customer/member data
Transactional data from applications
Application Logs
Other Types of Event Data
Network Monitoring/Network Traffic
Online Retail TransactionsOther Log Files
Call Data Records
Web Logs
Text data from social media and online
Search logs
Trade/quote data
Intelligence/defense data
Multimedia (audio/video/images)
Weather
Smartmeter data
Other (please specify)
Hadoop
Non Hadoop
19
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
20/60
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
21/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
21
Big Data - Market Requirements
Unified system: Pre-integrated for Ease of Installation and Management Platform Large Scale Indexing Pre-integrated using Hadoop Foundation, Integrated Text Analytics -Address Unstructured Data Usability- User Friendly Admin Console including HDFS Explorer, Query
Languages Enterprise Class Features Provisioning, Storage, Scheduler, Advance Security
Supports search-centric, document-based XML data modelo store documents within a transactional repository.
Schema-Free:o No advance knowledge of the document structure (its "schema") needed
o Index words and values from each of the loaded documents together withits document structure.
Standard commodity hardware leveraged
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
22/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Big Data Market Requirements
Architectural Shared-nothing clustered DB architecture
o programmable and extensible application servers.
Support massive scalabili tyto petabytes of source data Support open-source XQuery- and XSLT-driven architecture Simple to Deploy, Develop and Manage (UI & Restful Interface)
Support extreme mixed workloads- a wide variety of data types includingarbitrarily hierarchical data structures, images, waveforms, data logs etc.
Support thousands of geographically dispersed on--line usersandprogramsexecuting variety of requests from ad hoc queries to strategic analysis
Loading data before declaring or discovering its structure Load data in batch and streaming fashion
Integrate data from multiple sourcesduring load process at very high ratesSpread I/O and data across instances
Provide consistent performance with linear cost Leverage Open Source SWLo Costs, Multiple Sources, Hadoop Foundation Tools Connectivity with Oracle DB, Teradata Warehouse, JDBC Connectivity,
22
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
23/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Real Time Analytics Execution Execute streaming analytic queries in real timeon incoming load data Updating data in placeat full load speeds Scheduling and execution of complex multi-hundred node workflows Join a billion row dimension table to a trillion row fact tablewithout pre-
clustering the dimension table with the fact tablePerformance Analyze data, at very high rates >GB/sec Predictable Sub-ms response timefor highly constrained standard SQL
queries
Availability Ability to configure without any single point of failure Auto-Failover Extreme High Availability
o Automated failover and process continuation without operationalinterruption when processing nodes fail
23
Big Data - Market Requirements
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
24/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Big Data Product Metrics Choices
Big Data-
Product
Metrics
Data Set Size
PB
TB
GB
Data Structure
Transaction
Machine
Unstructured
Other
Access/Use
Transaction
Search
Analytics
Parallel
Processing
Appliance
Cluster < 1K
Cluster > 1K
MemoryIn-Memory
Flash
DB Technique
Columnar
Zero Sharing
No SQL
Data
Cataloging
SW
Text
Image
Audio
Video
24
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
25/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Advantage: Big Data Products
Characteristic Legacy Paradigm Big Data Paradigm
Structure Transactional/Corporate Unstructured/Derivative/Internet
Mode Data Collection Data Analysis
Focus Find Answers Find Questions
Facility Reportive / What Happened? Analytic / Why did it Happen?Predictive / What will Happen Next?
Opportunity Very Small Growth Massive Growth
Players Legacy Players Agile Start Ups, well funded
Impact Analyze Existing Businesses Create New Businesses
25
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
26/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Characteristic Tradit ional RDBMS Big Data/MapReduce
Data Size GB PB
Access Interactive Batch/Near Real-Time
Latency Low High
Data Updates Read & Write Many Times Write Once Read Many Times
Schema/Structure Static Schema Dynamic Schema
Language SQL UQL/Procedural (Java,C++..)
Integrity High Not 100%
Works Well for Process Intensive Jobs Data Intensive Jobs
Works Well w Data Size Gigabytes Petabytes
Data/ProcessingInteractions
Low Latency/High BW precursor tosuccess. Ntwk. BW can be abottleneck causing nodes to be idle
Sends Code to Data, instead ofSending Data to other Nodes(Requiring Lower BW in Cluster)
Fault Tolerance Coordinating Processes with NodeFailures a challenge
Fault Tolerant for HW/SW Failures
Access Interactive Batch/Near Real-time
Scaling Non-linear Linear
Pgm-Distribution of Jobs Difficult Simple & Effective
26
Advantage: Big Data Products
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
27/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Big Data Ecosystem
27
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
28/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Big Data Stack
Collector
Admin Center
Data
Importer
Data
Importer
RHive
EnterprisePerfMon, Query
Plan
Hive WorkflowOracle-to-Hive
Search
Rest/
JSON
API
Data
exporter
Databases
Advanced
Analytics
OLAP
Server
OLTP
Server
ETL
Ad-hoc query
DataSources
Data Store(Hadoop)
OracleIBM
Teradata
Real-timeQueries
DBA
StreamingData
Devices
Analyt ics Plat form Data
SourcesAppl ications
Merging Hadoop innovations into Nextgen DBMS
28
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
29/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Hadoops Fit in Enterprise Stack
29
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
30/60
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
31/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Big Data Connectors to EDW/BI
Servers
Operating System
Hypervisor/VMs
Big Data Storage Framework(HDFS)
Big Data Processing Framework(MapReduce)
Big Data Access Framework
Pig Hive Sqoop
Big Data(Connectors)
Big Data Orchestration Framework
HBase Avro Flume ZooKeeper
BI APPLICATIONS(Query, Analytics, Reporting, Statistics)
EDW
Backup&
Recovery
Management
Security
Network
BI Framework - Interoperable with Enterprise Data Warehousing
31
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
32/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Big Data Infrastructure Map Reduce
32
Map Reduce
A Distributed Computing Model Typical Pipeline:
Input>Map>Shuffle/Sort>Reduce>Output Easy to Use , Developer writes few functions,
Moves compute to Data Schedules work on HDFS node with data Scans through data, reducing seeks Automatic Reliability and re-execution on failure
Column-Stg(HBase)
DataFlow(Pig)
SQL
(Hive)
Metadata(HCatalog)CV
Coord
ination
(Zookeeper)
Manag
ement
(HMS)
Hadoop
Distributed
File System(HDFS)
Metadata
(HCatalog)CV
MapReduce
Computing
Framework)
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
33/60
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
34/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Hadoop Distributed File System
HDFS Archi tecture
HDFS Characteristics Based on Google GFS (Google
File System) Redundant Storage for massive
amounts of data
Data is distributed across allnodes at load time efficientMapReduce processing
Runs on commodi ty hardwareassumes high failure rate forcomponents
Works well with lots o f large files Built around Write once Read
many times Large Streaming Reads Not
random access High Throughtput more important
than low latency
34
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
35/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Hadoop Architecture - Overview
Hadoop Data Processing Architecture
35
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
36/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Key Technologies Required for Big Data
Key Technologies Required for Big Data Cloud Infrastructure Virtualization Networking
Storageo In-Memory Data Base (Solid State Memory)o Tiered Storage Software (Performance Enhancement)o Deduplication (Cost Reduction)o Data Protection (Back Up, Archive & Recovery)
36
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
37/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Cloud Infrastructure for Big Data
37
Examples
eMail - Yahoo!,Google
Collaboration- Facebook,Twi tter Bus.Apps- SalesForce, GoogleApps, Intuit
ExamplesAmazon EC2Force.comNavitaire
ExamplesAmazon S3Nirvanix
Infrastructure HW &Services
- Servers, Network, Storage- Management, Reporting
SaaS
PaaS
IaaS
Platform Tools & Services- Deploy developed platformsready for Application SW on
Cloud Aware Infrastructure
Software-as-a-Service- Servers, Network, Storage
- Management, Reporting
ServiceProviders
Examples
Public - BT, Telstra, T-Systems France TelecomPrivate Hybrid IBM/Cloudburst,
Cloud Services ProvidersPublic Mut itenancy,OnDemandPrivate - On Premises, EnterpriseHybrid Interoperable P2P
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
38/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Platform Tools & Services
Operating Systems
Cloud ComputingPublic Cloud Service
ProvidersPrivate Cloud
Enterprise
AppSLA
SaaSApplications
...Net
Python
EJB
Ruby
PHP ..
PaaS
IaaS
SaaS
Virtualization
Resources (Servers, Storage, Networks)
HybridCloud
App SLA
App SLA
App SLA
App SLA
Managemen
t
Cloud Infrastructure for Big Data
Applications SLA dictates the Resources Required to meet specific
requirements of Availabil ity, Performance, Cost, Security, Manageabil ity etc.
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
39/60
Vi t li ti W kl d C lid ti
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
40/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Virtualization: Workloads Consolidation
Source: Dan Olds & IMEX Research 2009
A single server 1.5x largerthan
standard 2-way server will handleconsolidated load of 6 servers. VZ manages the workloads +important apps get the computeresourcesthey need automaticallyw/o operator intervention. Physical consolidation of 15-20:1is easily possible Reasonable goal for VZ x86servers 40-50% utilizationonlarge systems (>4way), rising asdual/quad core processorsbecomes available
Savingsresult in Real Estate,Power & Cooling, High Availability,Hardware, Management
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
41/60
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
42/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Storage Infrastructure for Big Data
42
Storage
Efficiency
Virtualization Mapping P > V, VM Management
Performance In-Memory DB, Auto-Tiering-SSD/HDD
Costs Reduction Thin Provisioning
Deduplication
AvailabilityRAID/Auto recover HA, Snapshots, CDP, Cloning, DRS
SecurityEncryption/DLP
Service
Efficiency
Storage -as-a Service Service Catalogs by Workloads etc.
Policy Infrastructure
Service Level Attributes
Service Measurements Performance Analytics
IOPS/Response Time, Bandwidth Automation
Unified SAN/NAS Protocols Auto learning Workload Forensics Provisioning to Match Workloads Assured Auto recovery
St A hit t I t f VZ
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
43/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Storage Architecture - Impact from VZ
Source: IMEX Research SSD Industry Report 2011
43
Replication
RAID 0,1,5,6,10
Virtual Tape
Back Up/Archive/DR
Data Protection
Virtualization
MAID
Deduplication
Thin Provisioning
Storage Efficiency
Auto Tier ing
Virtualization (VZ)requires Shared Storage for- VMotion
- Storage VMotion- HA/DRS- Fault Tolerance
Addi tional CapacityConsumed for- VZ snapshots,
- VM Kernel etc.
St I & S l ti
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
44/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Storage Issues & Solutions
TraditionalData Growth
StorageCostsReduction
Capacity Requirements
Snapshots ~ 75%
Thin Provisioning ~30%
DeDupl ication ~ 25-95%
Auto-Tier ing 65-95%
Thin Replication ~ 95%
RAID*DP ~ 40% vs. R10
Virtual Clones ~80%
CAPACITY
SAVINGS
~ xx %
Technologies Reducing Storage Costs
St A hit t I ti Bi D t
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
45/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Storage Architecture Impacting Big Data
Auto-Tiering Systemusing Flash SSDs
Data Class
(Tiers 0,1,2,3)
Storage Media Type
(Flash/Disk/Tape)
Policy Engines(Workload Mgmt.)
Transparent Migration
(Data Placement)
File Virtualization(Uninterrupted App.Opns.in Migration)
Source: IMEX Research SSD Industry Report 2011
Replication
RAID 0,1,5,6,10
Virtual Tape
Back Up/Archive/DR
Data Protection
Storage Virtualization
MAID
Deduplication
Thin Provisioning
Storage Efficiency
Auto-Tier ing
45
DRAM
FlashSSD
PerformanceDisk
Capacity Disk
Tape
Data Storage: Hierarchical Usage
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
46/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
4646
I/O Access Frequency vs. Percent of Corporate Data
SSD Logs
Journals
Temp Tables
Hot Tables
FCoE/
SASArrays
Tables
Indices
Hot Data
CloudStorage
SATABack Up Data
Archived Data
Offsi te DataVault
2% 10% 50% 100%1%% of Corporate Data
65%
75%
95%
%o
fI/OAccesses
Data Storage: Hierarchical Usage
Source:: IMEX Research - Cloud Infrastructure Report 2009-12
SSD St Filli P i /P f G
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
47/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
4747
SSD Storage: Filling Price/Perf.Gaps
HDD
Tape
DRAM
CPUSDRAM
PerformanceI/OAccess Latency
HDD becomingCheaper, not faster
DRAM gettingFaster (to feed faster CPUs) &Larger (to feed Multi -cores &
Multi-VMs from Virtualization)
SCM
NOR
NANDPCIeSSD
SATA
SSD
Price$/GB
Source: IMEX Research SSD Industry Report 2010-12
SSD segmenting intoPCIe SSD Cache
- as backend to DRAM&SATA SSD- as front end to HDD
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
48/60
Workloads Characterization
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
49/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Workloads Characterization
*IOPS for a required response time ( ms)*=(#Channels*Latency-1)
(RAID - 0, 3)
500100
MB/sec101 505
Data
Warehousing
OLAP
Business
Intelligence(RAID - 1, 5, 6)
IOPS*(*La
tency-1
)
Web 2.0Audio
Video
Scientific Computing
Imaging
HPC
TPHPC
10K
100 K
1K
100
10
1000 K
OLTP
eCommerce
Transaction
Processing
Source:: IMEX Research - Cloud Infrastructure Report 2009-12
Workloads Characterization
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
50/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Storage performance, management and costs
are big issues in running Databases
Data Warehousing Workloads are I/O intensive Predominantly read based with low hit ratios on buffer pools High concurrent sequential and random read levels
Sequential Reads requires high level of I/O Bandwidth (MB/sec)
Random Reads require high IOPS) Write rates driven by life cycle management and sort operations
OLTP Workloadsare strongly random I/O intensive Random I/O is more dominant
Read/write ratios of 80/20 are most common but can be 50/50 Can be difficult to build out test systems with sufficient I/O characteristics
Batch Workloadsare more write intensive Sequential Writes requires high level of I/O Bandwidth (MB/sec)
Backup & Recoverytimes are crit ical for these workloads Backup operations drive high level of sequential IO Recovery operation drives high levels of random I/O
Workloads Characterization
Source: IMEX Research SSD Industry Report 2011 50
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
51/60
Best Practices: I/O Forensics in Storage Tiering
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
52/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
5252
Best Practices: I/O Forensics in Storage-Tiering
Source: IBM & IMEX Research SSD Industry Report 2011 IMEX 2010-12
LBA Monitoring and Tiered Placement
Every workload has unique I/O access signature Historical performance data for a LUN can
identify performance skews & hot data regionsby LBAs
Storage-Tiering at LBA/Sub-LUN Level
Storage-Tiered Virtualization
Physical Storage Logical Volume
SSDsArrays
HDDsArrays
Hot Data
Cold Data
AutomaticMigration
(Policy Based)
Best Practices: Cached Storage
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
53/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Best Practices: Cached Storage
App/DBServer
LSI MegaRAID CacheCade Pro 2.0
Application Improvement over Cached
vs.HDD onlyOracle OLTP Benchmarks 681%
SQL Server OLTP
Benchmark
1251%
Neoload (Web Server
Simulation
533%
SysBench (MySQL OLTP
Server)
150%
0
373
655
0
100
200
300
400
500
600
700
All HDD Smart FlashCache
Persist Dataon Warpdrive
TPS
0
330
660
0
100
200
300
400
500
600
700
All HDD Smart FlashCache
Persist Dataon
Warpdrive
Response Time
53
Big Data Targets: Analytics
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
54/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Big Data Targets: Analytics
Key Industries Benefitt ing f rom Big Data Analytics
CAGR%
(2010-15
)
Global IT Spending by Industry Verticals 2010-15 $B
5 Year Cum Global IT Spending 2010-15 ($B)
54
Bi D T S I f
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
55/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Big Data Targets Storage Infrastructure
Data Intensity by Industry Vertical
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Banking & Financial Services
Media & Entrertainment
Healthcare Providers
Professional Services
TelecommunicationsPharma, Life Sciences & Medical Pdcts
Retail & Wholesales
Utilities
Ind'l Elex & Electrical Equipment
SW Publishing & Internet Srvcs
Consumer Products
Insurance
Transportation
Energy
Installed Terabytes/$M Rev 2010
Value Potential of Using Big Data by Data Intensive Verticals
55
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
56/60
Big Data Targets: Savings w Open Source
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
57/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Big Data Targets: Savings w Open Source
Legacy BI vs. Open Source Big Data Analytics
57
Rise of Big Data Adoption
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
58/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-12
Rise of Big Data Adoption
58
Key Takeaways
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
59/60
NextGen Infrastructu re forBig Data 2012 Storage Networking Industry Association. All Rights Reserved.Source: IMEX Research Big Data Industry Report 2011-1259
Key Takeaways
Big Data creating paradigm shift in IT Industry
Leverage the opportunity to optimize your computing infrastructure with Big DataInfrastructure after making a due diligence in selection of vendors/products, industry testing
and interoperability.
Apply best storage technologies listed in this presentation and elsewhere
Optimize Big Data Analytics for Query Response Time vs. # of Users
Improving Query Response time for a given number of users (IOPs) or Serving more users
(IOPS) for a given query response timeSelect Automated Storage Management Software
Data Forensics and Tiered Placement
Every workload has unique I/O access signature
Historical performance data for a LUN can identify performance skews & hot data regions by
LBAs.Non-disruptively migrate hot data using auto-tiering Software
Optimize Infrastructure to meet needs ofApplications/SLA
Performance Economics/Benefits
Typically 4-8% of data becomes a candidate and when migrated for higher performance
tiering can provide response time reduction of ~65% at peak loads. Many industry Verticals
and Applications will benefit using Big Data
Source: IMEX Research SSD Industry Report 2011
Q&A / Feedback
-
8/12/2019 AnilVasudeva NextGen Storage Big Data
60/60
Q&A / Feedback
Many thanks to the following individuals
for their contributions to this tutorial.
Source: IMEX Research
Joseph White
Anil Vasudeva
Send any questions or comments on this presentation to SNIA: [email protected]
mailto:[email protected]:[email protected]