VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cloud-scale Apps
VMworld 2013. Chris Greer, FedEx; Richard McDougall, VMware. Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
Beyond Mission Critical: Virtualizing Big-Data,
Hadoop, HPC, Cloud-scale Apps
Chris Greer, FedEx
Richard McDougall, VMware
VAPP5402
#VAPP5402
© 2013 VMware Inc. All rights reserved
Beyond Mission Critical: Virtualizing Big-Data, Hadoop and Cloud Apps
Richard McDougall, CTO, Storage and Application Services
Chris Greer, Enterprise Architect, FedEx
3
Virtualize Everything: Next Generation Apps
Virtual Storage
Arrays
vSphere
SAN/NAS Object / BLOB
Traditional Applications
• Traditional enterprise storage
• HW-based resiliency, QoS
Next Gen Cloud Apps
• Scale out, flash, DAS
• Application specific storage
All SSD
Array
Server-side
Flash
4
The complexity enterprise IT and developers face today
An Idea for a cool app
Spec a server config
Justify server costs
Procurement process
Wait for HW to arrive
Wait for IT ops to Image the server
Install a Database
LOB Architecture approval
Central IT Architectural
approval
Justify more server for scale
testing
Wait for more HW
Configure ACLs and LBs
New infrastructures
New Languages and
Frameworks
New Devices
and Domains
New Data types and
requirements
5
Micro Clouds
Cloud Foundry – Announced Today on vSphere
Data
Services
Other
Services
Msg
Services
.js
Public Clouds
Private Clouds
6
Big Data - Not Just for the Web Giants – Now the Intelligent Enterprise
7
Market Segment Analysis
Real-time analysis allows instant understanding of market dynamics.
Personalized Customer Targeting
Retailers can have an intimate understanding of their customers' needs and use direct targeted marketing.
8
The Emerging Pattern of Big Data Systems: Retail Example
Real-Time
Streams
Exa-scale Data Store
Parallel Data Processing
Real-Time
Processing
Machine
Learning
Data Science
Cloud Infrastructure
9
Storage: Plan for Peta-scale Data Storage and Processing
Analytics Rapidly Outgrows Traditional Data Size by 100x
[Chart: PB of data (log scale, 0.01 to 1000) by year, 2000 to 2015; series: Online Apps, Analytics]
10
Unprecedented Scale
“Data transparency,
amplified by Social Networks
generates data at a
scale never seen before” - The Human Face of Big Data
We are creating an Exabyte
of data every minute in 2013
Yottabyte by 2030
11
A single GE Jet Engine produces
10 Terabytes of data in one hour
– 90 Petabytes per year.
Enabling early detection of
faults, common mode failures,
product engineering feedback.
Post Mortem Proactively Maintained Connected Product
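The yearly figure follows from the hourly rate if the engine is assumed to run continuously, which the slide does not state. A quick check in Python, using decimal units (1 PB = 1000 TB); the function name is illustrative:

```python
# Sanity check of the slide's figures, assuming continuous operation
# (24 hours a day, 365 days a year). That assumption is not on the slide.
TB_PER_HOUR = 10
HOURS_PER_YEAR = 24 * 365   # 8760 hours
TB_PER_PB = 1000            # decimal units


def petabytes_per_year(tb_per_hour: float) -> float:
    """Convert an hourly data rate in TB into a yearly volume in PB."""
    return tb_per_hour * HOURS_PER_YEAR / TB_PER_PB


print(petabytes_per_year(TB_PER_HOUR))  # 87.6, which the slide rounds to 90
```

An engine that flies fewer hours per year would produce proportionally less data.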
12
Cloud Infrastructure Supports Mixed Big Data Workloads
Machine Learning Hadoop
Real-Time Analytics
Cloud Infrastructure
Machine Learning
Hadoop
Real-Time Analytics
Management
Network/Security
Storage/Availability
Compute
13
Cloud Infrastructure Supports Multiple Tenants
Cloud Infrastructure
Management
Network/Security
Storage/Availability
Compute
Web User
Analytics
Financial
Analysis
Historical Customer
Behavior
14
Software-defined Datacenter: Compute
1. Agility / Rapid deployment
2. Lower Capex
3. Isolation for resource control and security
4. Operational efficiency
Management
The Core Values of Virtualization Apply to Big Data
Network/Security
Storage/Availability
Compute
15
Strong Isolation between Workloads is Key
Hungry
Workload 1
Reckless
Workload 2
Nosy
Workload 3
Cloud Infrastructure
16
Virtualizing Hadoop
Elasticity & Multi-tenancy
• Shrink and expand cluster on demand
• Independent scaling of compute and data
• Strong multi-tenancy
High Availability
• High availability for entire Hadoop stack
• One click to set up
• Battle-tested
Operational Simplicity
• Rapid deployment
• One-stop command center
• Easy to configure/reconfigure
17
Big Data Extensions: Core Components
Core is Open Source
Serengeti
• Tool to simplify virtualized Hadoop deployment & operations
Hadoop Virtualization Extensions (HVE)
• Virtualization changes for core Hadoop
• Contributed back to Apache Hadoop
Virtual Hadoop Manager (VHM)
• Advanced resource management on vSphere
18
Big Data Family of Frameworks
Compute layer
• Hadoop: batch analysis
• HBase: real-time queries
• NoSQL: Cassandra, Mongo, etc.
• Big SQL: Impala, Pivotal HawQ
• Other: Spark, Shark, Solr, Platfora, etc.
File System/Data Store
Virtualization
Host Host Host Host Host Host Host
19
Traditional Hadoop vs. Elastic Hadoop
Traditional Hadoop: Converged Compute/Storage
Elastic Hadoop: Elastic Compute, Scale-out Network Storage
20
Management
Software-defined Datacenter: Storage
Requirements of Next Generation Storage
Network/Security
Storage/Availability
Compute
1. 10x lower cost of storage
2. Handle explosive data growth
3. Support a variety of application types
4. Solve the privacy and security issues
21
HDFS Model
[Diagram labels: ESX hosts; HDFS or MAPR VM on each host; Local Disks; SAN/NAS; Non-Hadoop VMs; Hadoop Compute VMs; Hadoop HDFS VMs; VirtualCenter Management Server; DRS; VHM]
JT: JobTracker
TT: TaskTracker
NN: NameNode
VHM: Virtual Hadoop Manager
22
Big-Data using Local Disks
[Diagram: rack of hosts connected to a Top of Rack Switch]
Servers with Local Disks
• 16-24 core server
• 12-24 SATA 2-4TB Disks
• 10 GbE adapter
• iSCSI/NFS for Shared Storage for vMotion, etc.
High Performance 10GbE Switch per Rack
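The disk counts and sizes above bound the raw capacity of each server. The sketch below works out that range and, as an assumption not taken from the slide, the capacity left after HDFS's default three-way replication. The function names are illustrative, and the estimate ignores filesystem overhead and space reserved for temporary data.

```python
# Raw and replicated capacity per server, from the slide's disk figures.
HDFS_DEFAULT_REPLICATION = 3  # HDFS default replication factor; not from the slide


def raw_capacity_tb(disks: int, tb_per_disk: float) -> float:
    """Raw capacity of one server in TB."""
    return disks * tb_per_disk


def usable_capacity_tb(raw_tb: float,
                       replication: int = HDFS_DEFAULT_REPLICATION) -> float:
    """Capacity left for unique data once every block is stored `replication` times."""
    return raw_tb / replication


low = raw_capacity_tb(12, 2)    # smallest configuration on the slide
high = raw_capacity_tb(24, 4)   # largest configuration on the slide
print(f"raw per server: {low} to {high} TB")
print(f"after 3x replication: {usable_capacity_tb(low)} to {usable_capacity_tb(high)} TB")
```

Under these assumptions a server holds 24 to 96 TB raw, or 8 to 32 TB of unique data.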
23
Scale-out Storage for Big Data
[Chart: Cost per GB ($0 to $5.50) versus Petabytes Deployed (0.5 to 128). Series: Traditional SAN/NAS; Scale-out NAS (Isilon, NTAP); Distributed Object Storage (HDFS, MAPR, CEPH)]
24
Big Data Storage
Scale-out Network Storage
Elastic Compute
Scale-out Network Storage
• Hadoop Protocol
• Snapshots
• Posix Apps
• Full NFS Access
• Replication
• Erasure Coding
25
Big Data with Scale-out NAS
[Diagram: two racks of hosts, each with a Top of Rack Switch, connected to Scale-out NAS]
Temp Data: Local Disk or SSD in each Host, for Transient Data
Shared Data: Isilon Scale-out NAS
26
Chris Greer, FedEx Services
27
Breakthrough Use Cases
Web Log Analysis
Initial exploration was around detection of mobile devices accessing the
website.
Analysis of 570 billion web server log entries took approximately 9 minutes to
complete on a small cluster.
ZIP code Analysis
Analysis of data to determine which ZIP codes are the highest source or
destination for shipments.
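A minimal sketch of this kind of ZIP code analysis, assuming shipment records have been reduced to (origin ZIP, destination ZIP) pairs. The record layout, function name, and sample values are hypothetical and do not represent FedEx data or FedEx's implementation; on a Hadoop cluster the same counting would run as a distributed job rather than in a single process.

```python
from collections import Counter
from typing import Iterable, List, Tuple

ZipCount = Tuple[str, int]


def top_zip_codes(shipments: Iterable[Tuple[str, str]],
                  n: int = 3) -> Tuple[List[ZipCount], List[ZipCount]]:
    """Return the n most frequent origin ZIPs and the n most frequent destination ZIPs."""
    origins: Counter = Counter()
    destinations: Counter = Counter()
    for origin, destination in shipments:
        origins[origin] += 1
        destinations[destination] += 1
    return origins.most_common(n), destinations.most_common(n)


# Hypothetical sample records, for illustration only.
sample = [
    ("38118", "10001"),
    ("38118", "94105"),
    ("60601", "10001"),
    ("38118", "10001"),
]
top_origins, top_destinations = top_zip_codes(sample, n=1)
print(top_origins)       # [('38118', 3)]
print(top_destinations)  # [('10001', 3)]
```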
Shipment Analysis
Analysis of shipment information to determine patterns
that may delay a package.
28
Agile Big Data at FedEx
Security
• Trusted Isolation
• Well known auditable platform
Agility
• Deploy in minutes
• Optimize for shift in workload characteristics
Elasticity
• Create true multi-tenancy
• Mixed workloads
29
Hadoop Service at FedEx: vSphere + Isilon Storage
Scale-out Isilon Cluster
- Shared Data
- NAS + Hadoop
Elastic vSphere Cluster
- Mixed Workloads
- vSphere
- Existing Rack Mount
Servers
30
Agility: Automation of Hadoop Cluster Management
Deploy
Resize
Elastic scaling
Customize
Incorporate
best practices
Manage
Tune configuration
Run
Execute jobs
Access HDFS
31
Monitoring
Agility: Ease of Management Due to Consolidation
Cluster setup
and provisioning
Monitoring
HW procurement
and sizing
Cluster setup
and provisioning
HW procurement
and sizing
32
Elasticity: Mixed Workloads on a Shared Platform
Production
Test
Experimentation
Dept A: Marketing Dept B: Operations
Production
Test
Experimentation
Log files
Social data Transaction data Historical data
Common Infrastructure
Common infrastructure can be shared by multiple logical Hadoop clusters and prioritized with VMware resource pools.
Data Segregation
Data that should not be shared can be kept separate, leveraging VMware security controls for isolation.
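To illustrate the prioritization: when sibling resource pools contend for capacity, vSphere divides it in proportion to their shares. The sketch below models only that proportional split. The pool names, share values, and cluster capacity are hypothetical, and reservations and limits are ignored.

```python
from typing import Dict


def allocate_by_shares(capacity_mhz: float, shares: Dict[str, int]) -> Dict[str, float]:
    """Split contended capacity among pools in proportion to their shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total shares must be positive")
    return {name: capacity_mhz * s / total for name, s in shares.items()}


# Hypothetical pools for three logical Hadoop clusters on one vSphere cluster.
pools = {"production": 8000, "test": 4000, "experimentation": 2000}
allocation = allocate_by_shares(28000, pools)
for name, mhz in allocation.items():
    print(f"{name}: {mhz:.0f} MHz")
```

With these values production receives 16000 MHz, test 8000 MHz, and experimentation 4000 MHz under full contention; when a pool is idle, its capacity is available to the others.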
33
Security
Known Security Model
• VMs provide the required levels of Isolation for different workloads
Trusted Auditable Platform
• Leverage virtualization as the platform
• Known to auditors
• Accepted as a valid deployment model
34
Summary
35
Customers Winning from Consolidated Big Data Platforms
“Dedicated hardware makes no sense”
“Software-defined Datacenter enables rapid deployment multiple tenants and labs”
“Our mixed workloads include Hadoop, Database, ETL and App-servers”
“Any performance penalties are minor”
Management
Network/Security
Storage/Availability
Compute
36
Q&A
37
Other VMware Activities Related to This Session
HOL-SDC-1309 - vSphere Big Data Extensions
VAPP5484 – Big Data Extensions Advanced Features
VAPP5626 – Big Data Panel
VAPP5402
THANK YOU