
Page 1: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

#vmworld

HCI1941BU

Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Palanivenkatesan Murugan, VMware, Inc. David Boone, VMware, Inc.

#HCI1941BU

VMworld 2019 Content: Not for publication or distribution

Page 2: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

©2019 VMware, Inc.

Disclaimer

This presentation may contain product features or functionality that are currently under development.

This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.

Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Technical feasibility and market demand will affect final delivery.

Pricing and packaging for any new features/functionality/technology discussed or presented, have not been determined.

The information in this presentation is for informational purposes only and may not be incorporated into any contract. There is no commitment or obligation to deliver any items presented herein.

Page 3: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Agenda

01 Hadoop Overview
02 Why Hadoop on vSAN
03 Hadoop Deployment Options on vSAN
   • Small Hadoop clusters
   • Large Hadoop clusters
   • Using vSAN Host Affinity Feature
04 Technical Recommendation for Deployment

Page 4: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Introduction

Hadoop on VMware vSAN

• Hadoop deployment architecture on vSAN
   - Use features of vSAN and Hadoop to complement each other
• Design choices and recommendations
   - Hardware, network, and software configuration (Storage Policy Management, HVE)
• Design the solution for availability, performance, and capacity (TCO)

Page 5: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Why VMware HCI for Hadoop

Hadoop Requirements
• Compute and Storage Intensive
• Massive Storage Capacity
• Flexibility
• Linear Performance increase with Scalability

VMware Solution
• HCI (vSphere & vSAN)
• Scale up and scale out
• Software innovation
• Next Generation Hardware support
• Greater Availability
   - vMotion, vSphere HA, DRS
• Simplified Management with vSphere ecosystem
   - VUM, vRA, vROps, VCF
• Ease of Hardware refresh
• Multi Cloud ready Platform

Page 6: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

vSAN Terminology

• A vSAN cluster is a vSphere cluster with the vSAN service enabled
• A vSAN node is an ESXi host
• Storage Policy, or SPBM (Storage Policy Based Management)
• FTT (Failures To Tolerate)

Page 7: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Hadoop Terminology

• Infrastructure Nodes (Master VM, Gateway VM)
• Data Nodes (aka Worker VMs)
• Replication Factor (HDFS default RF = 3)
• Hadoop Rack Awareness
• Hadoop Virtualization Extension (HVE)

Page 8: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Hadoop Rack Awareness: Separate Nodes by Physical Racks

• Maximum performance is obtained if Hadoop is aware of the network topology (racks)
• Replicas can be placed more intelligently to trade off performance and resilience
• Rack awareness scripts are provided in the respective Hadoop distribution

Page 9: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Hadoop Virtualization Extension (HVE)

• The HVE feature extends Hadoop topology awareness to account for the virtualization layer
• Example:
   - Avoid having VMs on the same physical host store replicas of the same file block.
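The HVE example above can be illustrated with a small placement check. This is a minimal sketch, not Hadoop's own placement code: the VM names, the ESXi host names, and the function `shared_node_groups` are all hypothetical, and each ESXi host is treated as one node group.

```python
from collections import Counter

# Hypothetical mapping of data node VMs to the ESXi host (node group) they run on.
VM_TO_NODE_GROUP = {
    "vm1": "esxi-1",
    "vm2": "esxi-1",
    "vm3": "esxi-2",
    "vm4": "esxi-3",
}

def shared_node_groups(replica_vms, vm_to_group=VM_TO_NODE_GROUP):
    """Return the node groups that hold more than one replica of a block."""
    counts = Counter(vm_to_group[vm] for vm in replica_vms)
    return sorted(group for group, n in counts.items() if n > 1)

# vm1 and vm2 share a physical host, which is the situation HVE is meant to avoid.
print(shared_node_groups(["vm1", "vm2", "vm3"]))  # ['esxi-1']
print(shared_node_groups(["vm1", "vm3", "vm4"]))  # []
```

An empty result means every replica sits in a different node group, so a single physical host failure removes at most one copy.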

Page 10: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Traditional use of HVE

HVE – Graphical Representation

Hadoop Cluster
   DC 1
      Rack 1
         Node 1: Host 1, Host 2
         Node 2: Host 3
      Rack 2
         Node 3: Host 4
         Node 4: Host 5, Host 6
   DC 2
      Rack 3
         Node 5: Host 7, Host 8
         Node 6: Host 9, Host 10

Each ESXi host is an HVE Node Group. “Hosts” are VMs.

Page 11: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Use of HVE with vSAN

HVE with vSAN – Graphical Representation

Hadoop Cluster
   DC 1
      Rack 1
         Node 1: Host 1, Host 2
         Node 2: Host 3
      Rack 2
         Node 3: Host 4
         Node 4: Host 5, Host 6
   DC 2
      Rack 3
         Node 5: Host 7, Host 8
         Node 6: Host 9, Host 10

Each vSAN cluster is a Node Group. “Hosts” are VMs.

Page 12: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Power of vSAN Storage Policies

• Policies are applied to a VM or VMDK, not an entire array
   - Unlike traditional storage
   - Prescriptive
• Change existing policies, or apply new ones, on the fly
• Easily view whether a VM or VMDK is compliant with the new policy
• Flexibility to meet any Hadoop admin requirements

[Screenshots: Apply Policy, View Result]

Page 13: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Figure annotations (vSAN terms):
• FTT=1, FTM=mirroring, Stripe Width=1
• FTT=1, FTM=mirroring, Stripe Width=1
• FTT=0, Stripe Width=1 (1 Object JBOD) or Stripe Width=2 to 12 (Multiple Objects)
• FTT=0, Stripe Width=1 to 12

Source: https://community.hortonworks.com/articles/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto.html

Page 14: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Figure annotations (vSAN terms):
• FTT=1, FTM=mirroring, Stripe Width=1 to 12
• FTT=0, Stripe Width=1 to 12
• FTT=1, FTM=mirroring, Stripe Width=2 to 12

Source: https://community.hortonworks.com/articles/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto.html

Page 15: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Figure annotations (vSAN terms):
• FTT=1, FTM=mirroring, Stripe Width=1 to 12
• FTT=1, FTM=mirroring, Stripe Width=2 to 12
• FTT=0, Stripe Width=1 to 12

Source: https://community.hortonworks.com/articles/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto.html

Page 16: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Figure annotations (vSAN terms):
• FTT=1, FTM=mirroring, Stripe Width=1 to 12
• FTT=1, FTM=mirroring, Stripe Width=2 to 12
• FTT=0, Stripe Width=1 to 12

Source: https://community.hortonworks.com/articles/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto.html

Page 17: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Figure annotations (vSAN terms):
• FTT=1, FTM=mirroring, Stripe Width=1 to 12
• FTT=0, Stripe Width=1 to 12

Source: https://community.hortonworks.com/articles/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto.html

Page 18: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Small Deployment: Single Cluster Design

Page 19: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

FTT=0, single cluster

[Figure: four hosts (HOST 1 to HOST 4), each with one cache device and three capacity devices. VMs Node1-VM to Node4-VM run on the hosts. A single copy of each VM's OS, data, and log object (Node1-OS, Node1-data, Node1-log, and so on through Node4) is placed across the hosts' capacity devices.]

Page 20: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

FTT=0, single cluster – Failure scenario

[Figure: the same four-host layout (HOST 1 to HOST 4, each with one cache device and three capacity devices) with a single copy of each VM's OS, data, and log object, shown in a failure scenario.]

Page 21: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Better solution for single cluster: FTT=1

[Figure: four hosts (HOST 1 to HOST 4), each with one cache device and three capacity devices. VMs Node1-VM to Node4-VM run on the hosts. Each VM's OS, data, and log object appears twice (two mirrored copies), placed across the hosts' capacity devices.]

Page 22: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Better solution: FTT=1 – Failure Scenario

[Figure: the four-host FTT=1 layout (two copies of each VM's OS, data, and log object) shown in a failure scenario. Node2-VM appears a second time in the figure.]
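The difference between the FTT=0 and FTT=1 failure scenarios can be sketched in a few lines. The object-to-host placements below are hypothetical (they are not read off the figures), `unavailable_vms` is an illustrative helper, and the sketch ignores the witness components that vSAN mirroring also uses.

```python
# Hypothetical placement: object name -> hosts that hold a copy of it.
# FTT=0 keeps one copy per object; FTT=1 with mirroring keeps two.
FTT0 = {
    "Node1-OS": {"host4"}, "Node1-data": {"host1"}, "Node1-log": {"host3"},
    "Node2-OS": {"host1"}, "Node2-data": {"host2"}, "Node2-log": {"host4"},
}
FTT1 = {
    "Node1-OS": {"host4", "host2"}, "Node1-data": {"host1", "host3"},
    "Node1-log": {"host3", "host1"}, "Node2-OS": {"host1", "host4"},
    "Node2-data": {"host2", "host3"}, "Node2-log": {"host4", "host2"},
}

def unavailable_vms(placement, failed_host):
    """Return the VMs that have at least one object with no surviving copy."""
    lost = set()
    for obj, hosts in placement.items():
        if not hosts - {failed_host}:
            lost.add(obj.split("-")[0])  # "Node1-data" -> "Node1"
    return sorted(lost)

print(unavailable_vms(FTT0, "host1"))  # ['Node1', 'Node2']
print(unavailable_vms(FTT1, "host1"))  # []
```

With a single copy per object, one host failure can take down VMs that run on other hosts, because their objects may live on the failed host. With two copies on different hosts, every object keeps a surviving copy.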

Page 23: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Large Deployments: Multi-cluster Design

Page 24: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Hadoop nodes on vSAN Clusters – HDFS with Rack awareness

- Three vSAN clusters, one per rack
- Hadoop rack awareness with default replication factor = 3
   - 1st replica on the same data node/rack as the writer
   - 2nd replica on a data node in a different rack
   - 3rd replica on a different data node in the same rack as the 2nd

RACK01 / vSAN CLUSTER01
   C01-ESX01: Master X, Data Node X
   C01-ESX02: Gateway X, Data Node X
   C01-ESX03: Data Node X, Data Node X
   C01-ESX04: Data Node X, Data Node X

RACK02 / vSAN CLUSTER02
   C02-ESX01: Master X, Data Node X
   C02-ESX02: Gateway X, Data Node X
   C02-ESX03: Data Node X, Data Node X
   C02-ESX04: Data Node X, Data Node X

RACK03 / vSAN CLUSTER03
   C03-ESX01: Master X, Data Node X
   C03-ESX02: Gateway X, Data Node X
   C03-ESX03: Data Node X, Data Node X
   C03-ESX04: Data Node X, Data Node X

[Figure markers 1, 2, and 3 indicate the three replicas.]
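The three placement rules on this slide can be sketched as a small simulation. This is a simplified illustration, not HDFS's actual block placement code: the topology, the node names, and `place_replicas` are hypothetical, and the sketch assumes every remote rack has at least two data nodes.

```python
import random

# Hypothetical topology: rack name -> data nodes in that rack.
TOPOLOGY = {
    "rack1": ["dn1", "dn2"],
    "rack2": ["dn3", "dn4"],
    "rack3": ["dn5", "dn6"],
}

def place_replicas(writer, topology, rng=random):
    """Choose three data nodes following the rule described on the slide."""
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # 1st replica: the writer's own data node.
    first = writer
    # 2nd replica: a data node in a different rack.
    remote_rack = rng.choice([r for r in topology if r != rack_of[writer]])
    second = rng.choice(topology[remote_rack])
    # 3rd replica: a different data node in the same rack as the 2nd.
    third = rng.choice([n for n in topology[remote_rack] if n != second])
    return [first, second, third]

print(place_replicas("dn1", TOPOLOGY))
```

Under this rule only two racks ever hold copies of a given block, which is the behavior the HVE-based layouts later in the deck set out to change.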

Page 25: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Topology Map

Rack Awareness

vSAN terms:
• Each node (cdh-wn-vm'x') is a Data Node VM on a vSAN cluster
• Each rack corresponds to a vSAN/vSphere cluster name

Data node VM    Rack ID    vSAN Cluster Name
cdh-wn-vm1      1          vSAN 1
cdh-wn-vm2      1          vSAN 1
cdh-wn-vm3      1          vSAN 1
cdh-wn-vm4      2          vSAN 2
cdh-wn-vm5      2          vSAN 2
cdh-wn-vm6      2          vSAN 2
cdh-wn-vm7      3          vSAN 3
cdh-wn-vm8      3          vSAN 3
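A rack awareness script for this topology map could look like the sketch below. Hadoop invokes the configured topology script with one or more node identifiers as arguments and reads one rack path per argument from standard output. The rack path names (`/rack1` and so on) and the choice to key the table by hostname are assumptions for illustration; depending on configuration, Hadoop may pass IP addresses instead, and a real deployment would normally start from the script shipped with the distribution.

```python
#!/usr/bin/env python3
import sys

# Rack assignment taken from the topology map above; path names are assumed.
RACK_BY_NODE = {
    "cdh-wn-vm1": "/rack1", "cdh-wn-vm2": "/rack1", "cdh-wn-vm3": "/rack1",
    "cdh-wn-vm4": "/rack2", "cdh-wn-vm5": "/rack2", "cdh-wn-vm6": "/rack2",
    "cdh-wn-vm7": "/rack3", "cdh-wn-vm8": "/rack3",
}
DEFAULT_RACK = "/default-rack"  # fallback for nodes missing from the table

def resolve(names):
    """Map each node name to its rack path, in the order given."""
    return [RACK_BY_NODE.get(name, DEFAULT_RACK) for name in names]

if __name__ == "__main__":
    # Print one rack path per argument.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

Because each rack ID maps to one vSAN cluster in this design, the same table also records which vSAN cluster every data node VM lives in.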

Page 26: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Hadoop nodes on vSAN Clusters – Rack Awareness and HVE

- Three vSAN clusters, one per rack
- Hadoop rack awareness and HVE (Hadoop Virtualization Extension)
- Default Hadoop replication factor = 3
- Each nodegroup is a vSAN cluster
- Improved availability: nodegroup (HVE) forces the 3 replicas onto 3 different racks
- Network utilization across racks may not be a concern, given the high bandwidth and leaf-spine architecture used in data centers

RACK01 / HVE Nodegroup1 / vSAN CLUSTER01
   C01-ESX01: Master X, Data Node X
   C01-ESX02: Gateway X, Data Node X
   C01-ESX03: Data Node X, Data Node X
   C01-ESX04: Data Node X, Data Node X

RACK02 / HVE Nodegroup2 / vSAN CLUSTER02
   C02-ESX01: Master X, Data Node X
   C02-ESX02: Gateway X, Data Node X
   C02-ESX03: Data Node X, Data Node X
   C02-ESX04: Data Node X, Data Node X

RACK03 / HVE Nodegroup3 / vSAN CLUSTER03
   C03-ESX01: Master X, Data Node X
   C03-ESX02: Gateway X, Data Node X
   C03-ESX03: Data Node X, Data Node X
   C03-ESX04: Data Node X, Data Node X

[Figure markers 1, 2, and 3 indicate the three replicas.]

Page 27: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Topology Map

Rack Awareness with HVE (3 Racks)

vSAN terms:
• Each node (cdh-wn-vm'x') is a Data Node VM on a vSAN cluster
• Each HVE node group corresponds to a vSAN/vSphere cluster name

Data node VM    Rack ID    Node group ID and vSAN Cluster name
cdh-wn-vm1      1          1 - vSAN 1
cdh-wn-vm2      1          1 - vSAN 1
cdh-wn-vm3      1          1 - vSAN 1
cdh-wn-vm4      2          2 - vSAN 2
cdh-wn-vm5      2          2 - vSAN 2
cdh-wn-vm6      2          2 - vSAN 2
cdh-wn-vm7      3          3 - vSAN 3
cdh-wn-vm8      3          3 - vSAN 3
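With HVE, the topology path for a data node gains a node group layer under the rack. The sketch below builds such paths from the rows of the topology map; the `/rackN/nodegroupN` naming and the helper `topology_path` are assumptions for illustration, not names taken from the presentation.

```python
# (data node VM, rack ID, node group ID) rows from the topology map above.
ROWS = [
    ("cdh-wn-vm1", 1, 1), ("cdh-wn-vm2", 1, 1), ("cdh-wn-vm3", 1, 1),
    ("cdh-wn-vm4", 2, 2), ("cdh-wn-vm5", 2, 2), ("cdh-wn-vm6", 2, 2),
    ("cdh-wn-vm7", 3, 3), ("cdh-wn-vm8", 3, 3),
]

def topology_path(rack_id, nodegroup_id):
    """Build a rack/node-group path; the naming scheme is an assumption."""
    return f"/rack{rack_id}/nodegroup{nodegroup_id}"

TOPOLOGY = {vm: topology_path(rack, group) for vm, rack, group in ROWS}

for vm, path in TOPOLOGY.items():
    print(vm, path)
```

In this three-rack layout each rack holds exactly one node group, so placing three replicas in three different node groups also spreads them over three racks.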

Page 28: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Hadoop nodes on vSAN Cluster - HVE Rack awareness - 2-Rack

- Four vSAN clusters, two per rack
- Hadoop rack awareness and HVE (Hadoop Virtualization Extension)
- Default Hadoop replication factor = 3
- Each nodegroup is a vSAN cluster
- Improved availability: nodegroup (HVE) places the 3 replicas on 3 different nodegroups/vSAN clusters
- Avoids an increase in network bandwidth across racks, because 2 nodegroups are available within each rack to place the replicas

RACK01
   HVE Nodegroup1 / vSAN CLUSTER01
      C01-ESX01: Master Node X, Data Node X
      C01-ESX02: Data Node X, Data Node X
      C01-ESX03: Data Node X, Data Node X
      C01-ESX04: Data Node X, Data Node X
   HVE Nodegroup2 / vSAN CLUSTER02
      C02-ESX01: Gateway Node, Data Node X
      C02-ESX02: Data Node X, Data Node X
      C02-ESX03: Data Node X, Data Node X
      C02-ESX04: Data Node X, Data Node X

RACK02
   HVE Nodegroup3 / vSAN CLUSTER03
      C03-ESX01: Master Node X, Data Node X
      C03-ESX02: Data Node X, Data Node X
      C03-ESX03: Data Node X, Data Node X
      C03-ESX04: Data Node X, Data Node X
   HVE Nodegroup4 / vSAN CLUSTER04
      C04-ESX01: Gateway Node, Data Node X
      C04-ESX02: Data Node X, Data Node X
      C04-ESX03: Data Node X, Data Node X
      C04-ESX04: Data Node X, Data Node X

[Figure markers 1, 2, and 3 indicate the three replicas.]

Page 29: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Topology Map

Rack Awareness with HVE (2 Racks)

vSAN terms:
• Each node (cdh-wn-vm'x') is a Data Node VM on a vSAN cluster
• Each HVE node group corresponds to a vSAN/vSphere cluster name

Data node VM    Rack ID    Nodegroup ID and vSAN Cluster name
cdh-wn-vm1      1          1 - vSAN 1
cdh-wn-vm2      1          1 - vSAN 1
cdh-wn-vm3      1          2 - vSAN 2
cdh-wn-vm4      1          2 - vSAN 2
cdh-wn-vm5      2          3 - vSAN 3
cdh-wn-vm6      2          3 - vSAN 3
cdh-wn-vm7      2          4 - vSAN 4
cdh-wn-vm8      2          4 - vSAN 4
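A quick enumeration shows what the two-rack layout implies for replica spread. The sketch below takes the node group to rack mapping from the table above and lists every way to put three replicas in three different node groups; `racks_spanned` is an illustrative helper, and the sketch only counts combinations rather than modeling Hadoop's placement logic.

```python
from itertools import combinations

# Node group ID -> rack ID, from the two-rack topology map above.
NODEGROUP_RACK = {1: 1, 2: 1, 3: 2, 4: 2}

def racks_spanned(nodegroups):
    """Return the set of racks covered by the given node groups."""
    return {NODEGROUP_RACK[g] for g in nodegroups}

# Every way to choose 3 distinct node groups for the 3 replicas.
placements = list(combinations(NODEGROUP_RACK, 3))
for p in placements:
    print(p, "-> racks", sorted(racks_spanned(p)))
```

All four combinations cover both racks: two replicas share a rack (in different vSAN clusters) and the third sits in the other rack, so each rack always holds at least one copy while two of the three replicas stay within one rack.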

Page 30: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Challenges and Mitigation

Multiple vSAN cluster deployment option with Hadoop HVE

Challenge: A single disk failure could impact more Hadoop nodes in the same vSAN cluster, increasing the number of nodes that require HDFS datanode hot swap and rebuild.
Mitigation:
• Avoid large vSAN clusters; when possible, create a larger number of small vSAN clusters
• Avoid large vSAN stripe width
• Thanks to rack awareness and HVE
- or -
• FTT greater than 0

Challenge: Managing multiple vSAN clusters.
Mitigation: Simplify Day 2 operations using
• vROps for single-pane multi-cluster management
• VMware Cloud Foundation (VCF), which provides SDDC Manager for centralized multi-cluster life cycle management

Page 31: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

vSAN Host Affinity (RPQ)

Page 32: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Why vSAN Host Affinity

For next-generation applications with built-in resiliency

• Local data to VMs
• FTT=0 for space efficiency
• High performance
• Application determines availability

* (RPQ only)

[Figure: Hadoop Data Node VMs on vSphere with vSAN, backed by the vSAN datastore]

Page 33: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

FTT=0 “with” and “without” Host Affinity – Graphical Representation

[Figure: four hosts (HOST 1 to HOST 4), each with one cache device and three capacity devices. VMs Node1-VM to Node4-VM run on the hosts, each with a single copy of its OS, data, and log objects; the figure contrasts object placement with and without host affinity.]

Page 34: Architecting Hadoop Workloads on HCI Powered by VMware vSAN


Limitations with vSAN FTT = 0 with Host affinity

• Cannot use other FTT policies for Objects in same Cluster.

• Cannot perform maintenance of ESXi host by migrating VMs to other Hosts.

• vSphere DRS and HA must be turned off

• vSAN Encryption cannot be used

• vSAN Deduplication and compression cannot be used


Page 35: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Summary - Deployment options in vSAN

Option 1
vSAN Policy: vSAN mirror to protect 1 failure (FTT=1)
Hadoop Config: Replication Factor (RF) = 3; use HVE if multiple VMs per ESXi host
Benefits:
• Simple management; HA, vMotion, and DRS available
• vSAN SPBM advantage; share cluster with other workloads
Tradeoffs:
• Some tradeoff in performance
• Requires 2x storage

Option 2
vSAN Policy: vSAN mirror to protect 1 failure (FTT=1)
Hadoop Config: Replication Factor (RF) = 2; use HVE if multiple VMs per ESXi host
Benefits:
• Simple management; HA, vMotion, and DRS available
• vSAN SPBM advantage
• Capacity savings by reducing copies in HDFS
• Improved write performance
Tradeoffs:
• Minimal tradeoff in performance
• Potential HDFS read optimization benefit lost due to the reduction in HDFS copies

Option 3
vSAN Policy: No data redundancy (FTT=0); minimum of 3 vSAN clusters
Hadoop Config: Replication Factor (RF) = 3; must use Hadoop Virtualization Extension (HVE)
Benefits:
• Suitable for large deployments
• Performance (fast)
• Avoids additional vSAN storage capacity for data redundancy
• vSAN SPBM advantage
Tradeoffs:
• Disk failures may impact more Hadoop nodes
• Requires additional planning to potentially reduce impact during failure

Option 4
vSAN Policy: No data redundancy (FTT=0) with vSAN Host Affinity
Hadoop Config: Replication Factor (RF) = 3; use HVE if multiple VMs per ESXi host
Benefits:
• Performance (faster)
• Avoids additional vSAN storage capacity for data redundancy
• Storage local to VMs
Tradeoffs:
• Reduced SPBM advantage
• All objects in the vSAN cluster use FTT=0
• No support for vSphere features like HA and live vMotion
• RPQ-only solution
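The capacity side of these tradeoffs comes down to simple multiplication: HDFS keeps RF copies of the data, and vSAN mirroring keeps FTT+1 copies of each of those. The sketch below works through that arithmetic for a hypothetical 100 TB dataset. It assumes RAID-1 mirroring and ignores metadata, slack space, and other overheads, so the figures are lower bounds rather than sizing guidance.

```python
def raw_capacity_tb(dataset_tb, hdfs_rf, vsan_ftt):
    """Raw capacity consumed: HDFS copies times vSAN mirror copies (FTT + 1)."""
    return dataset_tb * hdfs_rf * (vsan_ftt + 1)

# (label, HDFS replication factor, vSAN FTT) for the options in the summary.
OPTIONS = [
    ("FTT=1, RF=3", 3, 1),
    ("FTT=1, RF=2", 2, 1),
    ("FTT=0, RF=3", 3, 0),
]

for label, rf, ftt in OPTIONS:
    print(f"{label}: {raw_capacity_tb(100, rf, ftt)} TB raw for 100 TB of data")
# FTT=1, RF=3: 600 TB raw for 100 TB of data
# FTT=1, RF=2: 400 TB raw for 100 TB of data
# FTT=0, RF=3: 300 TB raw for 100 TB of data
```

This is where the "Requires 2x storage" tradeoff comes from: FTT=1 doubles whatever HDFS replication already consumes, while dropping RF from 3 to 2 claws back a third of that.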

Page 36: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Recommendation for Hadoop on vSAN

vSAN Storage Policy and Data Services
• vSAN mirror is preferred
• Avoid vSAN deduplication and compression
• Use Hadoop-level data efficiency features

Hardware Choice
• Storage controllers that sustain high outstanding I/O
• NVMe cache device for vSAN
• At least 2 vSAN disk groups per host
• Performance class of drives matters

Network
• For large deployments
   − 25 Gbps uplinks
   − LACP
• Multi-rack deployment
   − Leaf-spine architecture recommended
   − Carefully plan oversubscription
• Jumbo frames are an advantage
• Multi-DC Hadoop replication
   − Separate physical NICs from vSAN traffic

Page 37: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

Conclusion: Hadoop on VMware vSAN

Choice of Deployment Options
• Small Hadoop Cluster
• Large Hadoop Cluster
• Test and Dev Hadoop Cluster

Pros and Cons of Deployment Options
• Availability
• Performance
• Capacity (TCO)

Technical Recommendations
• Server and Storage Hardware
• Software (vSAN and Hadoop)
• Network

Page 38: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

References

• Cloudera Distribution Including Apache Hadoop on VMware vSAN – Reference Architecture
   https://storagehub.vmware.com/t/vmware-vsan/cloudera-distribution-including-apache-hadoop-on-vmware-vsan-tm/
• vSAN Design and Sizing Guide
   https://storagehub.vmware.com/t/vmware-vsan/vmware-r-vsan-tm-design-and-sizing-guide-2/
• Generic Reference Architecture for Cloudera Enterprise Running in a Private Cloud
   https://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_private_cloud.pdf
• vSAN Network Design Guide
   https://storagehub.vmware.com/t/vmware-vsan/vmware-r-vsan-tm-network-design/
• Cloudera Networking Requirements
   https://www.cloudera.com/documentation/enterprise/6/latest/topics/cm_vpc_networking.html


Page 40: Architecting Hadoop Workloads on HCI Powered by VMware vSAN

To continue the conversation

Palani Murugan, @palani_vm
David Boone, @DavidBoone007
