Big Data/Hadoop Infrastructure Considerations



DESCRIPTION

Planning big-data infrastructure, implications for

TRANSCRIPT

Page 1: Big Data/Hadoop Infrastructure Considerations

© 2009 VMware Inc. All rights reserved

Big Data in a Software-Defined Datacenter

Richard McDougall

Chief Architect, Storage and Application Services

VMware, Inc

@richardmcdougll

Page 2: Big Data/Hadoop Infrastructure Considerations

2

Trend: New Data Growing at 60% Y/Y

Source: The Information Explosion, 2009

Data categories driving growth: medical imaging, sensors; CAD/CAM, appliances, machine data, digital movies; digital photos; digital TV; audio; camera phones, RFID; satellite images, logs, scanners, Twitter.

Exabytes of information stored: 20 zettabytes by 2015, 1 yottabyte by 2030. Yes, you are part of the yotta generation…

Page 3: Big Data/Hadoop Infrastructure Considerations

3

Trend: Big Data – Driven by Real-World Benefit

Page 4: Big Data/Hadoop Infrastructure Considerations

4

Big Data Family of Frameworks

[Diagram: a compute layer running Hadoop batch analysis, HBase real-time queries, NoSQL (Cassandra, Mongo, etc.), Big SQL (Impala, Pivotal HawQ), and others (Spark, Shark, Solr, Platfora, etc.) over Distributed Resource Management, backed by a File System/Data Store spanning the hosts.]

Page 5: Big Data/Hadoop Infrastructure Considerations

5

Big (Data) problems: making management easier

Page 6: Big Data/Hadoop Infrastructure Considerations

6

Broad Application of Hadoop Technology

Horizontal use cases: log processing / click stream analytics; machine learning / sophisticated data mining; web crawling / text processing; Extract Transform Load (ETL) replacement; image / XML message processing; general archiving / compliance.

Vertical use cases: financial services; mobile / telecom; internet retail; scientific research; pharmaceutical / drug discovery; social media.

Hadoop’s ability to handle large unstructured data affordably and efficiently makes it a valuable toolkit for enterprises across a number of applications and fields.

Page 7: Big Data/Hadoop Infrastructure Considerations

7

How does Hadoop enable parallel processing?

Source: http://architects.dzone.com/articles/how-hadoop-mapreduce-works

!  A framework for distributed processing of large data sets across clusters of computers using a simple programming model.
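The simple programming model can be illustrated with a toy word count. This is plain Python mimicking the map, shuffle, and reduce phases of the framework, not Hadoop API code:

```python
from collections import defaultdict

def map_phase(records):
    # map: emit a (key, 1) pair for every word in every input record
    for line in records:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: aggregate each key's list of values
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big cluster"])))
# {'big': 2, 'data': 1, 'cluster': 1}
```

In Hadoop the same three stages run in parallel: map tasks process splits on many hosts at once, and the shuffle moves intermediate pairs across the network to the reducers.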

Page 8: Big Data/Hadoop Infrastructure Considerations

8

Hadoop System Architecture

! MapReduce: Programming framework for highly parallel data processing

!  Hadoop Distributed File System (HDFS): Distributed data storage

! Other Distributed Storage Options: •  Alternatives to HDFS •  MapR (software), Isilon (hardware)

Page 9: Big Data/Hadoop Infrastructure Considerations

9

Job Tracker Schedules Tasks Where the Data Resides

[Diagram: a Job is submitted to the Job Tracker; an input file is divided into Split 1, Split 2, and Split 3 (64 MB each); each split is scheduled as Task 1, 2, or 3 on the Task Tracker of Host 1, 2, or 3, where the local DataNode holds the corresponding Block 1, 2, or 3 (64 MB each).]
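The placement logic above can be sketched in a few lines. The helper name and structures are hypothetical, not Hadoop internals: prefer a host that already stores a replica of the split's block, and fall back to a remote read only when no data-local slot is free.

```python
def schedule_task(replica_hosts, free_slots):
    """Pick a host for a task given the hosts holding the split's block
    replicas and a map of host -> free task slots. Hypothetical sketch."""
    # Prefer a data-local host: one that stores a replica of the block
    for host in replica_hosts:
        if free_slots.get(host, 0) > 0:
            return host, "data-local"
    # Otherwise run anywhere with a free slot and read the block remotely
    for host, slots in free_slots.items():
        if slots > 0:
            return host, "remote"
    return None, "wait"

# Split 1's block lives on host1 and host2; host1 has a free slot
host, placement = schedule_task(["host1", "host2"], {"host1": 1, "host3": 2})
# -> ("host1", "data-local")
```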

Page 10: Big Data/Hadoop Infrastructure Considerations

10

Hadoop Data Locality and Replication

Page 11: Big Data/Hadoop Infrastructure Considerations

11

Hadoop Topology Awareness

Page 12: Big Data/Hadoop Infrastructure Considerations

12

Rules of Thumb: Sizing for Hadoop

!  Disk •  Provide about 50 MB/sec of disk bandwidth per core •  If using SATA, that’s about one disk per core

!  Network •  Provide about 200 Mbits/sec of aggregate network bandwidth per core

!  Memory •  Use a memory:core ratio of about 4 GB per core

!  Example: 100-node cluster •  100 x 16 cores = 1,600 cores •  1,600 x 50 MB/sec = 80 GB/sec(!) •  1,600 x 200 Mbits = 320 Gbits of network traffic
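The rules of thumb above can be wrapped in a quick sizing calculator. This is a sketch: the function and its defaults simply encode the slide's per-core numbers.

```python
def size_cluster(nodes, cores_per_node, disk_mb_per_core=50,
                 net_mbit_per_core=200, mem_gb_per_core=4):
    """Apply the per-core rules of thumb to a whole cluster."""
    cores = nodes * cores_per_node
    return {
        "cores": cores,
        "sata_disks": cores,                               # ~1 disk per core
        "disk_gb_per_sec": cores * disk_mb_per_core / 1000,
        "net_gbits": cores * net_mbit_per_core / 1000,
        "memory_gb": cores * mem_gb_per_core,
    }

sizing = size_cluster(nodes=100, cores_per_node=16)
# 1,600 cores -> 80 GB/sec of disk bandwidth, 320 Gbits of network traffic
```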

Page 13: Big Data/Hadoop Infrastructure Considerations

13

Big Data Frameworks and Characteristics

Framework | Scale of Data | Scale of Cluster | Computable Data? | Local Disks?
File System (Gluster, Isilon, HDFS, etc.) | 10s of PB | 10s to 100s | Some | Yes, for cost
Map-Reduce (Hadoop) | 100s of PB | 10s to 1,000s | Yes | Yes, for cost, bandwidth and availability
Big SQL (HawQ, Aster Data, Impala, …) | PBs | 10s to 100s | Some | Yes, for cost and bandwidth
NoSQL (Cassandra, HBase, …) | Trillions of rows | 10s to 100s | Some | Yes, for cost and availability
In-Memory (Redis, GemFire, Membase, …) | Billions of rows | 10s to 100s | Yes | Primarily memory

Page 14: Big Data/Hadoop Infrastructure Considerations

14

Big Data Virtual Infrastructure

Page 15: Big Data/Hadoop Infrastructure Considerations

15

Virtualization enables a Common Infrastructure for Big Data

Single purpose clusters for various business applications lead to cluster sprawl.

Virtualization Platform

!  Simplify •  Single hardware infrastructure •  Unified operations

!  Optimize •  Shared resources = higher utilization •  Elastic resources = faster on-demand access

[Diagram: cluster sprawl (separate MPP DB, Hadoop, and HBase clusters) consolidated onto a single virtualization platform running Scale-out SQL, Hadoop, and HBase.]

Page 16: Big Data/Hadoop Infrastructure Considerations

16

Native versus Virtual Platforms, 24 hosts, 12 disks/host

[Chart: elapsed time in seconds (lower is better) for TeraGen, TeraSort, and TeraValidate, comparing Native against 1 VM, 2 VMs, and 4 VMs per host.]

Page 17: Big Data/Hadoop Infrastructure Considerations

17

Why Virtualize Hadoop?

Mixed Workloads
!  Shrink and expand cluster on demand
!  Resource guarantees
!  Independent scaling of compute and data

Highly Available
!  No more single point of failure
!  One click to set up
!  High availability for MR jobs

Simple to Operate
!  Rapid deployment
!  Unified operations across the enterprise
!  Easy cloning of clusters

Page 18: Big Data/Hadoop Infrastructure Considerations

18

In-house Hadoop as a Service: “Enterprise EMR” (Hadoop + Hadoop)

[Diagram: a shared HDFS data layer spanning the hosts; the compute layer runs separate Hadoop clusters for ad hoc data mining, a production recommendation engine, and production ETL of log files, all on the virtualization platform.]

Page 19: Big Data/Hadoop Infrastructure Considerations

19

Integrated Hadoop and Webapps (Hadoop + Other Workloads)

[Diagram: an HDFS data layer across the hosts; the compute layer runs a short-lived Hadoop compute cluster alongside web servers for an ecommerce site on the virtualization platform.]

Page 20: Big Data/Hadoop Infrastructure Considerations

20

Integrated Big Data Production (Hadoop + Other Big Data)

[Diagram: a shared HDFS data layer across the hosts; the compute layer runs Hadoop batch analysis, HBase real-time queries, NoSQL (Cassandra, Mongo, etc.), Big SQL (Impala), and others (Spark, Shark, Solr, Platfora, etc.) on the virtualization platform.]

Page 21: Big Data/Hadoop Infrastructure Considerations

21

Serengeti: Deploy a Hadoop Cluster in under 30 Minutes

Step 1: Deploy the Serengeti virtual appliance (vHelper OVF) to vSphere.

Step 2: A few simple commands stand up the Hadoop cluster: select a configuration template; select compute, memory, storage, and network; automate deployment. Done.

Page 22: Big Data/Hadoop Infrastructure Considerations

22

Why Virtualize Hadoop?

Mixed Workloads
!  Shrink and expand cluster on demand
!  Resource guarantees
!  Independent scaling of compute and data

Highly Available
!  No more single point of failure
!  One click to set up
!  High availability for MR jobs

Simple to Operate
!  Rapid deployment
!  Unified operations across the enterprise
!  Easy cloning of clusters

Page 23: Big Data/Hadoop Infrastructure Considerations

23

vMotion, HA, FT enables High Availability for the Hadoop Stack

[Diagram: the Hadoop stack (HDFS Hadoop Distributed File System, HBase key-value store, MapReduce job scheduling/execution, Pig data flow, Hive SQL, HCatalog, plus BI reporting and ETL tools) and the services that vMotion, HA, and FT protect: Management Server, Zookeeper (coordination), RDBMS, Namenode, Jobtracker, Hive MetaDB, and HCatalog MDB server.]

Page 24: Big Data/Hadoop Infrastructure Considerations

24

Why Virtualize Hadoop?

Mixed Workloads
!  Shrink and expand cluster on demand
!  Resource guarantees
!  Independent scaling of compute and data

Highly Available
!  No more single point of failure
!  One click to set up
!  High availability for MR jobs

Simple to Operate
!  Rapid deployment
!  Unified operations across the enterprise
!  Easy cloning of clusters

Page 25: Big Data/Hadoop Infrastructure Considerations

25

Containers with Isolation are a Tried and Tested Approach

[Diagram: a hungry workload, a reckless workload, and a nosy workload, each isolated in its own container across hosts running vSphere with DRS.]

Page 26: Big Data/Hadoop Infrastructure Considerations

26

Mixing Workloads: Three big types of Isolation are Required

!  Resource Isolation • Control the greedy noisy neighbor • Reserve resources to meet needs

!  Version Isolation •  Allow concurrent OS, App, Distro versions

!  Security Isolation •  Provide privacy between users/groups

• Runtime and data privacy required


Page 27: Big Data/Hadoop Infrastructure Considerations

27

Elasticity Enables Sharing of Resources

Page 28: Big Data/Hadoop Infrastructure Considerations

28

“Time Share”

[Diagram: during the day the hosts run other VMs; at night Hadoop VMs take over the same hosts, with HDFS and vHelper on VMware vSphere.]

While existing apps run during the day to support business operations, Hadoop batch jobs kick off at night to conduct deep analysis of data.

Page 29: Big Data/Hadoop Infrastructure Considerations

29

Big Data Impact to Storage

Page 30: Big Data/Hadoop Infrastructure Considerations

30

Traditional Storage – VMDKs on SAN/NAS

[Diagram: App and DB VMs with self-contained boot and data VMDKs (block) on VMFS over SAN or NFS (EMC, NetApp); archive handled at the VM level.]

• VMs have just a few self-contained VMDKs • Data is not shared between VMs • VMs have limited individual bandwidth needs

Page 31: Big Data/Hadoop Infrastructure Considerations

31

Rapid Growth of Big Data Capacity Necessitates New Storage

Source: The Information Explosion, 2009

Data categories driving growth: medical imaging, sensors; CAD/CAM, appliances, machine data, digital movies; digital photos; digital TV; audio; camera phones, RFID; satellite images, logs, scanners, Twitter.

Exabytes of information stored: 20 zettabytes by 2015, 1 yottabyte by 2030. Yes, you are part of the yotta generation…

Page 32: Big Data/Hadoop Infrastructure Considerations

32

Use Local Disk where it’s Needed

SAN Storage: $2 - $10/GB. $1M gets: 0.5 PB, 200,000 disk IOPS, 8 GB/sec.

NAS Filers: $1 - $5/GB. $1M gets: 1 PB, 200,000 disk IOPS, 10 GB/sec.

Local Storage: $0.05/GB. $1M gets: 10 PB, 400,000 disk IOPS, 250 GB/sec.
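The dollar figures reduce to a simple budget calculation. A sketch: note that at $0.05/GB the raw math gives 20 PB for $1M, so the slide's 10 PB figure presumably nets out replication or other overhead.

```python
def capacity_pb(budget_usd, usd_per_gb):
    """Raw capacity a budget buys at a given cost per gigabyte."""
    return budget_usd / usd_per_gb / 1_000_000  # GB -> PB

san_pb = capacity_pb(1_000_000, 2.0)     # 0.5 PB, matching the slide
nas_pb = capacity_pb(1_000_000, 1.0)     # 1 PB, matching the slide
local_pb = capacity_pb(1_000_000, 0.05)  # 20 PB raw; ~10 PB after 2x copies
```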

Page 33: Big Data/Hadoop Infrastructure Considerations

33

Storage Economics

[Chart: cost per GB ($0 to $5.50) versus petabytes deployed (0.5 to 128), comparing traditional SAN/NAS, scale-out NAS (Isilon, NetApp), VMware vSAN, and distributed object storage (HDFS, MapR, Ceph).]

Page 34: Big Data/Hadoop Infrastructure Considerations

34

Scalable Storage Architecture: Hadoop Demands

!  Each node needs approximately one disk of bandwidth per core •  A 16-core node needs ~1 GB/sec of bandwidth

!  A 1,000-node Hadoop cluster needs 1 TB/sec of aggregate bandwidth •  Local disks: about $800 of local disks per node

•  Traditional SAN: would require an estimated $50k of SAN storage per node

[Diagram: data nodes with co-located compute tasks spread across the cluster.]

Page 35: Big Data/Hadoop Infrastructure Considerations

35

Storage-Servers: Configurations and Capabilities

Typical Storage Server: 16-24 core server, 12-24 SATA 2-4 TB disks, 10 GbE adapter. Throughput: 2.5 GB/sec. Capacity: 96 TB.

High Capacity Server: 16-24 core server, 80 SATA 2-4 TB disks, 10 GbE adapter. Throughput: 2.5 GB/sec. Capacity: 320 TB.
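The throughput figures are consistent with roughly 100 MB/sec of sequential bandwidth per SATA spindle, capped by the server's controller and network path; both the per-spindle figure and the 2.5 GB/sec cap are assumptions inferred from the slide, not stated facts.

```python
def server_throughput_gb(disks, mb_per_disk=100, cap_gb=2.5):
    """Aggregate sequential disk bandwidth: ~100 MB/sec per SATA spindle
    (assumed), limited by a controller/network cap (assumed 2.5 GB/sec)."""
    return min(disks * mb_per_disk / 1000, cap_gb)

# 24 disks -> 2.4 GB/sec (spindle-limited)
# 80 disks -> 2.5 GB/sec (cap-limited, which is why the high-capacity
#             server shows the same throughput despite more disks)
```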

Page 36: Big Data/Hadoop Infrastructure Considerations

36

Scalable Storage Bandwidth is Important for Big Data

[Chart: aggregate bandwidth (GB/sec, 0 to 120) versus number of hosts (1 to 50), comparing scalable storage with a single SAN/NAS.]

Page 37: Big Data/Hadoop Infrastructure Considerations

37

Hadoop has Significant Ephemeral Data

[Diagram: a job's map tasks read DFS input data from HDFS (~12% of bandwidth) and write spills, logs (spill*.out*), and map output files (file.out*) locally; the shuffle (Map_*.out*), sort, and combine (Intermediate.out*) stages move intermediate files between map and reduce tasks, accounting for ~75% of disk bandwidth; reduce tasks write DFS output data back to HDFS (~12% of bandwidth).]

Page 38: Big Data/Hadoop Infrastructure Considerations

38

Impact of Temporary Data

!  Leverage local disks when shared storage is used •  As much as 75% of the bandwidth will be transient •  No need for reliable, replicated storage •  Just use unprotected local disk

[Diagram: HDFS shared data on shared network storage; temporary data on local disks.]
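A consequence of the split above is that only a small fraction of total disk traffic needs protected shared storage. A sketch using the slide's estimates (~75% transient, ~12% each for DFS input and output):

```python
def bandwidth_split_gb(total_gb_per_sec):
    """Divide cluster disk bandwidth per the slide's rough estimates."""
    return {
        "transient_local": total_gb_per_sec * 0.75,  # spills, shuffle, logs
        "dfs_input": total_gb_per_sec * 0.12,
        "dfs_output": total_gb_per_sec * 0.12,
    }

split = bandwidth_split_gb(80)
# An 80 GB/sec cluster needs only ~19 GB/sec from shared storage;
# the remaining ~60 GB/sec can come from unprotected local disks
```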

Page 39: Big Data/Hadoop Infrastructure Considerations

39

Extend Virtual Storage Architecture to Include Local Disk

!  Shared Storage: SAN or NAS •  Easy to provision •  Automated cluster rebalancing

!  Hybrid Storage •  SAN for boot images, VMs, other workloads •  Local disk for Hadoop & HDFS •  Scalable bandwidth, lower cost/GB

[Diagram: hosts running a mix of Hadoop VMs and other VMs.]

Page 40: Big Data/Hadoop Infrastructure Considerations

40

Scale-out Storage for Big Data: Local Disk or Scale-out-NAS

[Diagram: left, servers with local disks behind top-of-rack switches; right, big data using scale-out NAS, with temp data on the hosts and shared data on the scale-out NAS.]

Page 41: Big Data/Hadoop Infrastructure Considerations

41

Big-Data using Local Disks

[Diagram: servers with local disks behind a top-of-rack switch.]

16-24 core servers, 12-24 SATA 2-4 TB disks, 10 GbE adapter; iSCSI/NFS shared storage for vMotion, etc. A high-performance 10 GbE switch per rack.

Page 42: Big Data/Hadoop Infrastructure Considerations

42

Big Data with Scale-out-NAS

[Diagram: big data using scale-out NAS (Isilon) for shared data, with local disk or SSD in each host for transient data, behind top-of-rack switches.]

Page 43: Big Data/Hadoop Infrastructure Considerations

43

Big Data Impact to Networking

Page 44: Big Data/Hadoop Infrastructure Considerations

44

Hadoop has Specific Network Demands that must be Considered

Source: http://architects.dzone.com/articles/how-hadoop-mapreduce-works

!  A framework for distributed processing of large data sets across clusters of computers using a simple programming model.

Page 45: Big Data/Hadoop Infrastructure Considerations

45

A Typical Network Architecture

[Diagram: hosts connect to top-of-rack switches; racks aggregate through L2 switches into aggregating switches.]

Page 46: Big Data/Hadoop Infrastructure Considerations

46

A Typical Network Architecture

[Diagram: hosts connect to top-of-rack switches; racks aggregate through L2 switches into aggregating switches.]

!  Hadoop Requirements •  Hadoop can generate up to 200 Mbits/sec per core of network traffic •  Bursts of bandwidth during remote data reads and shuffle phases •  Data locality can help minimize network reads, but shuffle and reduce are global

!  Inside-Rack Network •  Today’s core counts require 10 GbE to hosts inside the rack •  High-throughput switches are required to meet aggregate bandwidth •  Switch network buffer sizing is important due to coordinated bursts

!  Rack-to-Rack Network •  Two-level switch aggregation can be a bottleneck •  Easy to saturate, especially during shuffle phases

!  Link-Speed Sizing •  Hadoop was originally designed around 4 cores and a 1 Gbit uplink •  Now, 16-24 cores per box means 10 GbE is required for the uplink
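The link-speed sizing rule can be checked mechanically. A sketch with a hypothetical helper; the 200 Mbits/sec per core figure comes from the slide.

```python
def uplink_sufficient(cores_per_host, uplink_gbits, mbit_per_core=200):
    """Compare per-host Hadoop traffic against the host's uplink speed.
    Returns (demand in Gbits/sec, whether the uplink can carry it)."""
    demand_gbits = cores_per_host * mbit_per_core / 1000
    return demand_gbits, demand_gbits <= uplink_gbits

# 4 cores on 1 GbE: 0.8 Gbits/sec of demand, fits
# 16 cores: 3.2 Gbits/sec, which overruns 1 GbE and needs a 10 GbE uplink
```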

Page 47: Big Data/Hadoop Infrastructure Considerations

47

Two Level Aggregated Network Characteristics

[Diagram: hosts connect to top-of-rack switches; racks aggregate through L2 switches into aggregating switches.]

Top of rack: overprovisioning possible; a high-end 10 GbE switch provides >500 Gbits/sec. Between racks: bandwidth can be saturated.
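The rack-to-rack saturation above is a function of the oversubscription ratio at each switch tier. A sketch; the host and uplink counts in the example are hypothetical.

```python
def oversubscription(host_ports, port_gbits, uplinks, uplink_gbits):
    """Ratio of host-facing bandwidth to upstream bandwidth at a switch."""
    return (host_ports * port_gbits) / (uplinks * uplink_gbits)

ratio = oversubscription(40, 10, 4, 40)
# A rack of 40 hosts at 10 GbE with 4x40 GbE uplinks is 2.5:1 oversubscribed,
# so an all-to-all shuffle can drive inter-rack links well past capacity
```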

Page 48: Big Data/Hadoop Infrastructure Considerations

48

Future Network Designs, Optimized for Big Data

Traditional tree-structured networks •  Cons: high oversubscription of the upper-level (aggregation) and core switches; hard to get full-bisection bandwidth

Clos network between aggregation and intermediate (core) switches •  Pros: full-bisection bandwidth, non-blocking switching fabric, path diversity (formalized in 1953) •  LA: (physical) locator address; AA: (flat, virtual) application address

Images courtesy Greenberg et al.: http://research.microsoft.com/pubs/80693/vl2-sigcomm09-final.pdf and http://www.stanford.edu/class/ee384y/Handouts/clos_networks.pdf

Page 49: Big Data/Hadoop Infrastructure Considerations

49

In Conclusion

!  The Software-Defined Datacenter enables a unified big data platform

!  Plan to build a software-defined big data datacenter •  Enable Hadoop and other big data ecosystem components

!  Values •  Consolidated hardware, reduced SKUs •  Higher utilization: leverage a shared cluster for bursty workloads •  Rapid deployment: use APIs to deploy software-defined clusters rapidly •  Flexible cluster use: share the same infrastructure across different workload types

!  References •  www.projectserengeti.org •  vmware.com/hadoop •  VMware Hadoop Performance Whitepaper •  VMware Hadoop HA/FT Paper

Page 50: Big Data/Hadoop Infrastructure Considerations

© 2009 VMware Inc. All rights reserved

Big Data in a Software-Defined Datacenter

Richard McDougall

Chief Architect, Storage and Application Services

VMware, Inc

@richardmcdougll