TRANSCRIPT
#vmworld
HCI1941BU
Architecting Hadoop Workloads on HCI Powered by VMware vSAN
Palanivenkatesan Murugan, VMware, Inc. David Boone, VMware, Inc.
VMworld 2019 Content: Not for publication or distribution
©2019 VMware, Inc.
Disclaimer
This presentation may contain product features or functionality that are currently under development.
This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new features/functionality/technology discussed or presented have not been determined.
The information in this presentation is for informational purposes only and may not be incorporated into any contract. There is no commitment or obligation to deliver any items presented herein.
©2019 VMware, Inc.
Agenda
01 Hadoop Overview
02 Why Hadoop on vSAN
03 Hadoop Deployment Options on vSAN
   - Small Hadoop clusters
   - Large Hadoop clusters
   - Using the vSAN Host Affinity feature
04 Technical Recommendations for Deployment
Introduction
Hadoop on VMware vSAN
Hadoop deployment architecture on vSAN
Use features of vSAN and Hadoop to complement each other
Design choices and recommendations
- Hardware, network, and software configuration (storage policy management, HVE)
- Design the solution for availability, performance, and capacity (TCO)
Why VMware HCI for Hadoop

Hadoop requirements:
- Compute and storage intensive
- Massive storage capacity
- Flexibility
- Linear performance increase with scale

VMware solution:
- HCI (vSphere and vSAN)
- Scale up and scale out
- Software innovation
- Next-generation hardware support
- Greater availability: vMotion, vSphere HA, DRS
- Simplified management with the vSphere ecosystem: VUM, vRA, vROps, VCF
- Ease of hardware refresh
- Multi-cloud-ready platform
vSAN Terminology
- vSAN cluster: a vSphere cluster with the vSAN service enabled
- vSAN node: an ESXi host
- Storage Policy, or SPBM (Storage Policy Based Management)
- FTT (Failures to Tolerate)
Hadoop Terminology
- Infrastructure nodes (Master VM, Gateway VM)
- Data nodes (aka Worker VMs)
- Replication factor (HDFS default RF = 3)
- Hadoop rack awareness
- Hadoop Virtualization Extension (HVE)
Hadoop Rack Awareness: separate nodes by physical racks
- Maximum performance is obtained when Hadoop is aware of the network topology (racks)
- Replicas can be placed more intelligently to trade off performance and resilience
- Rack awareness scripts are provided in the respective Hadoop distributions
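A rack awareness script follows a simple contract: Hadoop invokes the script configured under `net.topology.script.file.name` in core-site.xml with one or more data node names or IPs as arguments, and reads one rack path per argument from stdout. A minimal sketch, with a hypothetical host-to-rack mapping:

```python
#!/usr/bin/env python3
"""Minimal sketch of a Hadoop rack-topology script.

Hadoop calls this script with data node names as arguments and
expects one rack path per argument on stdout.
"""
import sys

# Hypothetical mapping of data node hosts to physical racks.
HOST_TO_RACK = {
    "datanode1.example.com": "/rack1",
    "datanode2.example.com": "/rack1",
    "datanode3.example.com": "/rack2",
}

DEFAULT_RACK = "/default-rack"  # Hadoop's convention for unknown hosts


def resolve(hosts):
    """Return the rack path for each host, in argument order."""
    return [HOST_TO_RACK.get(h, DEFAULT_RACK) for h in hosts]


if __name__ == "__main__":
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

Each distribution ships its own variant of this idea; the mapping table here would be generated from the vSphere inventory in practice.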
Hadoop Virtualization Extension (HVE)
- The HVE feature extends Hadoop's topology awareness to account for the virtualization layer
- Example: avoid having VMs on the same physical host store the same replica of a file
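Enabling HVE is a configuration change in the Hadoop stack. As a sketch, the node-group-aware topology and block placement policy are typically switched on with properties along these lines (exact property names and classes can vary by Hadoop version and distribution, so treat this as illustrative):

```xml
<!-- core-site.xml: use the node-group-aware network topology -->
<property>
  <name>net.topology.impl</name>
  <value>org.apache.hadoop.net.NetworkTopologyWithNodeGroup</value>
</property>
<property>
  <name>net.topology.nodegroup.aware</name>
  <value>true</value>
</property>

<!-- hdfs-site.xml: place block replicas with node-group awareness -->
<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeGroup</value>
</property>
```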
Traditional use of HVE (graphical representation)

[Figure: Hadoop cluster topology tree. DC 1 contains Rack 1, with Node 1 (Hosts 1 and 2) and Node 2 (Host 3), and Rack 2, with Node 3 (Host 4) and Node 4 (Hosts 5 and 6). DC 2 contains Rack 3, with Node 5 (Hosts 7 and 8) and Node 6 (Hosts 9 and 10). In this traditional use, each ESXi host is an HVE node group, and the "hosts" in the Hadoop tree are VMs.]
Use of HVE with vSAN (graphical representation)

[Figure: the same topology tree as the previous slide: DC 1 with Rack 1 (Node 1: Hosts 1 and 2; Node 2: Host 3) and Rack 2 (Node 3: Host 4; Node 4: Hosts 5 and 6), DC 2 with Rack 3 (Node 5: Hosts 7 and 8; Node 6: Hosts 9 and 10). With vSAN, each vSAN cluster is a node group, and the "hosts" in the Hadoop tree are VMs.]
Power of vSAN Storage Policies
- Policies are applied to a VM or VMDK, not an entire array
  - Unlike traditional storage
  - Prescriptive
- Change existing policies, or apply new ones, on the fly
- Easily view whether a VM or VMDK is compliant with a new policy
- Flexibility to meet any Hadoop admin requirement

[Screenshots: apply policy; view result]
[Series of slides annotating the Hortonworks custom-install cheat sheet with the equivalent vSAN policy settings. The recoverable annotations are:
- FTT=1, FTM=mirroring, stripe width = 1 (also shown as 1 to 12 and 2 to 12 for some disk layouts)
- FTT=0, stripe width = 1 (one object per JBOD disk) or 2 to 12 (multiple objects), also shown as 1 to 12]

Source: https://community.hortonworks.com/articles/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto.html
Small Deployment: Single-Cluster Design
FTT=0, single cluster

[Figure: four-host vSAN cluster; each host has one disk group with a cache device and three capacity devices. One node VM (Node1-VM to Node4-VM) runs per host, and each VM has separate OS, data, and log objects. With FTT=0, each object exists as a single copy placed wherever vSAN chooses, which is often not the host running the VM.]
FTT=0, single cluster – failure scenario

[Figure: the same layout with a failure on one host. Because FTT=0 keeps only one copy of each object, and objects belonging to several different node VMs share each host, a single host or disk failure takes out data belonging to multiple Hadoop nodes at once.]
Better solution for single cluster: FTT=1

[Figure: the same four-host cluster with FTT=1. Every OS, data, and log object now has a mirrored second copy on a different host, so each component appears twice across the cluster.]
Better solution: FTT=1 – failure scenario

[Figure: with FTT=1, when a host fails every object it held still has a surviving mirror copy on another host, so no Hadoop data is lost, and the node VM that was running on the failed host can be restarted on a surviving host.]
Large Deployments: Multi-Cluster Design
Hadoop nodes on vSAN clusters – HDFS with rack awareness

- Three vSAN clusters, one per rack
- Hadoop rack awareness with default replication factor = 3:
  - 1st replica on the local data node, in the same rack as the writer
  - 2nd replica on a data node in a different rack
  - 3rd replica on a different data node, but in the same rack as the 2nd

[Figure: RACK01, RACK02, and RACK03, each a vSAN cluster (vSAN CLUSTER01 to CLUSTER03) of four ESXi hosts (C0x-ESX01 to C0x-ESX04). The first host in each cluster runs a Master VM, the second a Gateway VM, and every host runs Data Node VMs; markers 1 to 3 trace the three replica placements.]
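The three placement rules above can be sketched as a small function. This is a simplification: the real HDFS placement policy also weighs node load and free space, and the remote rack here is simply chosen at random:

```python
import random


def choose_replica_racks(writer_rack, all_racks, rf=3):
    """Pick racks for RF replicas following the HDFS default placement:
    1st on the writer's rack, 2nd on a different rack, 3rd on the
    same rack as the 2nd (on a different node). Extra replicas
    beyond the 3rd are spread randomly (simplified)."""
    if rf < 3 or len(all_racks) < 2:
        raise ValueError("sketch assumes RF >= 3 and at least 2 racks")
    remote_rack = random.choice([r for r in all_racks if r != writer_rack])
    extra = [random.choice(all_racks) for _ in range(rf - 3)]
    return [writer_rack, remote_rack, remote_rack] + extra
```

With three racks and RF = 3, this always yields one replica local to the writer and two on a single remote rack, which is exactly the pattern the figure traces.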
Topology Map – Rack Awareness

vSAN terms: each node (cdh-wn-vm'x') is a Data Node VM on a vSAN cluster; each rack is a vSAN/vSphere cluster.

Data node VM | Rack ID | vSAN Cluster Name
cdh-wn-vm1   | 1       | vSAN 1
cdh-wn-vm2   | 1       | vSAN 1
cdh-wn-vm3   | 1       | vSAN 1
cdh-wn-vm4   | 2       | vSAN 2
cdh-wn-vm5   | 2       | vSAN 2
cdh-wn-vm6   | 2       | vSAN 2
cdh-wn-vm7   | 3       | vSAN 3
cdh-wn-vm8   | 3       | vSAN 3
Hadoop nodes on vSAN clusters – rack awareness and HVE

- Three vSAN clusters, one per rack
- Hadoop rack awareness plus HVE (Hadoop Virtualization Extension)
- Default Hadoop replication factor = 3
- Each node group is a vSAN cluster
- Improved availability: HVE node groups force the 3 replicas onto 3 different racks
- Cross-rack network utilization may not be a concern given the high bandwidth and leaf-spine architectures used in data centers

[Figure: RACK01/HVE Nodegroup1 (vSAN CLUSTER01), RACK02/HVE Nodegroup2 (vSAN CLUSTER02), and RACK03/HVE Nodegroup3 (vSAN CLUSTER03), each with four ESXi hosts running Master, Gateway, and Data Node VMs; markers 1 to 3 trace the three replica placements.]
Topology Map – Rack Awareness with HVE (3 racks)

vSAN terms: each node (cdh-wn-vm'x') is a Data Node VM on a vSAN cluster; each HVE node group is a vSAN/vSphere cluster.

Data node VM | Rack ID | Node group ID and vSAN cluster name
cdh-wn-vm1   | 1       | 1 - vSAN 1
cdh-wn-vm2   | 1       | 1 - vSAN 1
cdh-wn-vm3   | 1       | 1 - vSAN 1
cdh-wn-vm4   | 2       | 2 - vSAN 2
cdh-wn-vm5   | 2       | 2 - vSAN 2
cdh-wn-vm6   | 2       | 2 - vSAN 2
cdh-wn-vm7   | 3       | 3 - vSAN 3
cdh-wn-vm8   | 3       | 3 - vSAN 3
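With HVE enabled, each topology path gains a node-group segment. The mapping in the table above can be returned by a node-group-aware topology script; a sketch follows, where the `/rackN/nodegroupN` path shape follows the HVE convention and the VM names and IDs come from the table:

```python
#!/usr/bin/env python3
"""Sketch of an HVE-aware topology script for the 3-rack layout.

Each path carries a node-group segment (/rack<ID>/nodegroup<ID>),
and every node group corresponds to one vSAN cluster.
"""
import sys

# From the topology table: VM name -> (rack ID, node group / vSAN cluster ID).
VM_TOPOLOGY = {
    "cdh-wn-vm1": (1, 1),
    "cdh-wn-vm2": (1, 1),
    "cdh-wn-vm3": (1, 1),
    "cdh-wn-vm4": (2, 2),
    "cdh-wn-vm5": (2, 2),
    "cdh-wn-vm6": (2, 2),
    "cdh-wn-vm7": (3, 3),
    "cdh-wn-vm8": (3, 3),
}


def resolve_hve(vm_name):
    """Return the /rack/nodegroup path for a data node VM."""
    if vm_name not in VM_TOPOLOGY:
        return "/default-rack/default-nodegroup"
    rack, group = VM_TOPOLOGY[vm_name]
    return f"/rack{rack}/nodegroup{group}"


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(resolve_hve(name))
```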
Hadoop nodes on vSAN clusters – HVE rack awareness, 2 racks

- Four vSAN clusters, two per rack
- Hadoop rack awareness plus HVE (Hadoop Virtualization Extension)
- Default Hadoop replication factor = 3
- Each node group is a vSAN cluster
- Improved availability: HVE node groups place the 3 replicas on 3 different node groups/vSAN clusters
- Avoids increased cross-rack network bandwidth, since 2 node groups are available within each rack to place the replicas

[Figure: RACK01 holds HVE Nodegroup1 (vSAN CLUSTER01) and HVE Nodegroup2 (vSAN CLUSTER02); RACK02 holds HVE Nodegroup3 (vSAN CLUSTER03) and HVE Nodegroup4 (vSAN CLUSTER04). Each cluster has four ESXi hosts (C0x-ESX01 to C0x-ESX04) running Master, Gateway, and Data Node VMs; markers 1 to 3 trace the three replica placements.]
Topology Map – Rack Awareness with HVE (2 racks)

vSAN terms: each node (cdh-wn-vm'x') is a Data Node VM on a vSAN cluster; each HVE node group is a vSAN/vSphere cluster.

Data node VM | Rack ID | Node group ID and vSAN cluster name
cdh-wn-vm1   | 1       | 1 - vSAN 1
cdh-wn-vm2   | 1       | 1 - vSAN 1
cdh-wn-vm3   | 1       | 2 - vSAN 2
cdh-wn-vm4   | 1       | 2 - vSAN 2
cdh-wn-vm5   | 2       | 3 - vSAN 3
cdh-wn-vm6   | 2       | 3 - vSAN 3
cdh-wn-vm7   | 2       | 4 - vSAN 4
cdh-wn-vm8   | 2       | 4 - vSAN 4
Challenges and Mitigation
Multiple vSAN cluster deployment with Hadoop HVE

Challenge: A single disk failure can impact multiple Hadoop nodes in the same vSAN cluster, increasing the number of data nodes that require HDFS hot swap and rebuild.
Mitigation:
- Avoid large vSAN clusters; when possible, create a larger number of small vSAN clusters
- Avoid large vSAN stripe widths (possible thanks to rack awareness and HVE)
- or use FTT greater than 0

Challenge: Managing multiple vSAN clusters.
Mitigation: Simplify Day 2 operations using vROps for single-pane multi-cluster management; VMware Cloud Foundation (VCF) provides SDDC Manager for centralized multi-cluster lifecycle management.
vSAN Host Affinity (RPQ)
Why vSAN Host Affinity
For next-generation applications with built-in resiliency (RPQ only):
- Data local to VMs
- FTT=0 for space efficiency
- High performance
- The application determines availability

[Figure: Hadoop Data Node VMs on vSphere/vSAN, each with its objects on the local portion of the vSAN datastore.]
FTT=0 with and without Host Affinity – graphical representation

[Figure: the same four-host, four-node-VM layout as before, with FTT=0. Without Host Affinity, each VM's OS, data, and log objects may land on any host; with Host Affinity, each VM's objects are kept on the host where the VM runs.]
Limitations of vSAN FTT=0 with Host Affinity
- Cannot use other FTT policies for objects in the same cluster
- Cannot perform ESXi host maintenance by migrating VMs to other hosts
- vSphere DRS and HA must be turned off
- vSAN encryption cannot be used
- vSAN deduplication and compression cannot be used
Summary – Deployment Options on vSAN

Option 1: vSAN mirror to protect 1 failure (FTT=1); RF = 3; use HVE if multiple VMs per ESXi host
- Benefits: simple management; HA, vMotion, and DRS available; vSAN SPBM advantage; cluster can be shared with other workloads
- Tradeoffs: some tradeoff in performance; requires 2x storage

Option 2: vSAN mirror to protect 1 failure (FTT=1); RF = 2; use HVE if multiple VMs per ESXi host
- Benefits: simple management; HA, vMotion, and DRS available; vSAN SPBM advantage; capacity savings by reducing copies in HDFS; improved write performance
- Tradeoffs: minimal tradeoff in performance; potential HDFS read-optimization benefit lost due to the reduction in HDFS copies

Option 3: no data redundancy (FTT=0), minimum of 3 vSAN clusters; RF = 3; must use Hadoop Virtualization Extension (HVE)
- Benefits: suitable for large deployments; fast; avoids additional vSAN storage capacity for data redundancy; vSAN SPBM advantage
- Tradeoffs: disk failures may impact more Hadoop nodes; requires additional planning to reduce the impact of failures

Option 4: no data redundancy (FTT=0) with vSAN Host Affinity; RF = 3; use HVE if multiple VMs per ESXi host
- Benefits: faster; avoids additional vSAN storage capacity for data redundancy; storage local to VMs
- Tradeoffs: reduced SPBM advantage; all objects in the vSAN cluster use FTT=0; no support for vSphere features like HA and live vMotion; RPQ-only solution
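The storage costs in these options compound multiplicatively: raw capacity is roughly usable data × HDFS replication factor × vSAN copies (FTT + 1 for mirroring). A quick sketch of that arithmetic, ignoring metadata overhead, slack space, and the cache tier:

```python
def raw_capacity_tb(usable_tb, hdfs_rf, vsan_ftt):
    """Approximate raw capacity needed for a given usable HDFS dataset.

    vSAN mirroring with FTT=f keeps f+1 copies of every object, and
    HDFS keeps hdfs_rf copies of every block; the two multiply.
    Ignores metadata, slack space, and cache-tier capacity.
    """
    vsan_copies = vsan_ftt + 1  # mirroring only; erasure coding not sketched
    return usable_tb * hdfs_rf * vsan_copies


# 100 TB of data under the options above:
# FTT=1, RF=3 -> 600 TB raw; FTT=1, RF=2 -> 400 TB; FTT=0, RF=3 -> 300 TB
```

This is why Option 2 (RF = 2 over FTT=1) and the FTT=0 options exist: each trims one multiplier at the cost of either HDFS read locality or vSAN-level redundancy.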
Recommendations for Hadoop on vSAN

vSAN storage policy and data services:
- vSAN mirroring is preferred
- Avoid vSAN deduplication and compression
- Use Hadoop-level data efficiency features instead

Hardware choice:
- Storage controllers that can sustain high outstanding I/O
- NVMe cache device for vSAN
- At least 2 vSAN disk groups per host
- Performance class of drives matters

Network:
- For large deployments: 25 Gbps uplinks, LACP
- Multi-rack deployments: leaf-spine architecture recommended; carefully plan oversubscription
- Jumbo frames are an advantage
- For multi-DC Hadoop replication, use physical NICs separate from vSAN traffic
Conclusion: Hadoop on VMware vSAN
- Choice of deployment options: small Hadoop cluster, large Hadoop cluster, test and dev Hadoop cluster
- Pros and cons of deployment options: availability, performance, capacity (TCO)
- Technical recommendations: server and storage hardware, software (vSAN and Hadoop), network
References
- Cloudera Distribution Including Apache Hadoop on VMware vSAN – Reference Architecture: https://storagehub.vmware.com/t/vmware-vsan/cloudera-distribution-including-apache-hadoop-on-vmware-vsan-tm/
- vSAN Design and Sizing Guide: https://storagehub.vmware.com/t/vmware-vsan/vmware-r-vsan-tm-design-and-sizing-guide-2/
- Generic reference architecture for Cloudera Enterprise running in a private cloud: https://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_private_cloud.pdf
- vSAN Network Design Guide: https://storagehub.vmware.com/t/vmware-vsan/vmware-r-vsan-tm-network-design/
- Cloudera Networking Requirements: https://www.cloudera.com/documentation/enterprise/6/latest/topics/cm_vpc_networking.html
To continue the conversation:
Palani Murugan [email protected] @palani_vm
David Boone [email protected] @DavidBoone007