VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cloud-scale Apps

Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cloud-scale Apps Chris Greer, FedEx Richard McDougall, VMware VAPP5402 #VAPP5402

Uploaded by vmworld on 22-Jan-2015


DESCRIPTION

VMworld 2013: Chris Greer, FedEx; Richard McDougall, VMware. Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare

TRANSCRIPT

Page 1: VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cloud-scale Apps

Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cloud-scale Apps

Chris Greer, FedEx

Richard McDougall, VMware

VAPP5402

#VAPP5402

Page 2

© 2013 VMware Inc. All rights reserved

Beyond Mission Critical: Virtualizing Big-Data, Hadoop and Cloud Apps

Richard McDougall

CTO, Storage and Application Services

Chris Greer

Enterprise Architect, FedEx

Page 3

Virtualize Everything: Next Generation Apps

Diagram: Virtual Storage; Arrays; vSphere; SAN/NAS; Object / BLOB; All-SSD Array; Server-side Flash

Traditional Applications

• Traditional enterprise storage

• HW-based resiliency, QoS

Next Gen Cloud Apps

• Scale-out, flash, DAS

• Application-specific storage

Page 4

The Complexity Enterprise IT and Developers Face Today

An idea for a cool app, followed by:

• Spec a server config

• Justify server costs

• Procurement process

• Wait for HW to arrive

• Wait for IT ops to image the server

• Install a database

• LOB architecture approval

• Central IT architectural approval

• Justify more servers for scale testing

• Wait for more HW

• Configure ACLs and LBs

Meanwhile: new infrastructures, new languages and frameworks, new devices and domains, new data types and requirements.

Page 5

Cloud Foundry: Announced Today on vSphere

Diagram: Data Services; Msg Services; Other Services; .js; Micro Clouds; Private Clouds; Public Clouds

Page 6

Big Data: Not Just for the Web Giants, Now for the Intelligent Enterprise

Page 7

Market Segment Analysis: Real-time analysis allows instant understanding of market dynamics.

Personalized Customer Targeting: Retailers can gain an intimate understanding of their customers' needs and use direct, targeted marketing.

Page 8

The Emerging Pattern of Big Data Systems: Retail Example

Diagram: Real-Time Streams; Real-Time Processing; Parallel Data Processing; Exa-scale Data Store; Machine Learning; Data Science; Cloud Infrastructure

Page 9

Storage: Plan for Peta-scale Data Storage and Processing

Chart: PB of data (log scale, 0.01 to 1,000) by year, 2000 to 2015, for Online Apps and Analytics. Analytics rapidly outgrows traditional data size by 100x.

Page 10

Unprecedented Scale

“Data transparency, amplified by Social Networks, generates data at a scale never seen before.” (The Human Face of Big Data)

We are creating an Exabyte of data every minute in 2013

Yottabyte by 2030

Page 11

A single GE jet engine produces 10 terabytes of data in one hour, or 90 petabytes per year, enabling early detection of faults and common-mode failures, and product engineering feedback.

Diagram: Post Mortem; Proactively Maintained; Connected Product
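The per-year figure follows from the hourly rate only if data is counted around the clock. A minimal Python sketch of that arithmetic, assuming continuous 24x365 operation and decimal units (1 PB = 1,000 TB); the function name is illustrative, not from the slide:

```python
HOURS_PER_YEAR = 24 * 365  # assumes continuous, year-round operation


def petabytes_per_year(tb_per_hour: float, hours: int = HOURS_PER_YEAR) -> float:
    """Convert an hourly data rate in terabytes to petabytes per year (decimal units)."""
    return tb_per_hour * hours / 1000


if __name__ == "__main__":
    # 10 TB/hour from the slide: 10 * 8,760 h = 87,600 TB, about 87.6 PB
    print(petabytes_per_year(10))
```

The result, roughly 87.6 PB, rounds to the slide's 90 PB; fewer operating hours per year would scale the total down proportionally.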

Page 12

Cloud Infrastructure Supports Mixed Big Data Workloads


Diagram: Machine Learning, Hadoop, and Real-Time Analytics workloads on Cloud Infrastructure (Management; Network/Security; Storage/Availability; Compute)

Page 13

Cloud Infrastructure Supports Multiple Tenants


Diagram: Web User Analytics, Financial Analysis, and Historical Customer Behavior tenants on Cloud Infrastructure (Management; Network/Security; Storage/Availability; Compute)

Page 14

Software-defined Datacenter: Compute

The Core Values of Virtualization Apply to Big Data

1. Agility / rapid deployment

2. Lower capex

3. Isolation for resource control and security

4. Operational efficiency

Diagram: Management; Network/Security; Storage/Availability; Compute

Page 15

Strong Isolation between Workloads Is Key

Diagram: Workload 1 (Hungry), Workload 2 (Reckless), and Workload 3 (Nosy) sharing Cloud Infrastructure

Page 16

Virtualizing Hadoop

Elasticity & Multi-tenancy

• Shrink and expand cluster on demand

• Independent scaling of compute and data

• Strong multi-tenancy

High Availability

• High availability for entire Hadoop stack

• One click to set up

• Battle-tested

Operational Simplicity

• Rapid deployment

• One-stop command center

• Easy to configure/reconfigure
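Shrinking and expanding the compute tier on demand, independently of the data tier, comes down to a sizing decision. The sketch below is a hypothetical policy for illustration, not the Virtual Hadoop Manager's actual algorithm: it assumes each compute VM offers a fixed number of task slots and that an operator sets minimum and maximum VM counts. `target_compute_vms` and its parameters are invented names.

```python
import math


def target_compute_vms(pending_tasks: int, slots_per_vm: int,
                       min_vms: int, max_vms: int) -> int:
    """Return how many compute VMs to keep powered on (hypothetical policy).

    Only compute VMs are resized; data (HDFS) VMs are left untouched, which is
    what lets compute scale independently of storage.
    """
    if slots_per_vm <= 0:
        raise ValueError("slots_per_vm must be positive")
    needed = math.ceil(pending_tasks / slots_per_vm)
    # Clamp to the operator-defined floor and ceiling.
    return max(min_vms, min(max_vms, needed))


if __name__ == "__main__":
    print(target_compute_vms(0, 4, 2, 10))    # idle: shrink to the floor
    print(target_compute_vms(13, 4, 2, 10))   # 13 tasks need 4 VMs
    print(target_compute_vms(100, 4, 2, 10))  # burst: capped at the ceiling
```

A production controller would likely also weigh contention from other workloads on the same hosts, shrinking the compute tier when higher-priority VMs need the capacity.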

Page 17

Big Data Extensions: Core Components

Core is Open Source

• Serengeti: tool to simplify virtualized Hadoop deployment & operations

• Hadoop Virtualization Extensions (HVE): virtualization changes for core Hadoop, contributed back to Apache Hadoop

• Virtual Hadoop Manager (VHM): advanced resource management on vSphere
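One goal of HVE is to make Hadoop aware that several DataNode VMs can share one physical host, so that all replicas of a block are not lost when a single ESX host fails. The sketch below illustrates only that constraint; it is not HVE's code. It assumes a hypothetical topology map from VM name to (rack, physical host) and spreads replicas so that no two land on the same host, preferring hosts on racks other than the writer's.

```python
from typing import Dict, List, Tuple

# Hypothetical topology: DataNode VM name -> (rack, physical ESX host)
Topology = Dict[str, Tuple[str, str]]


def place_replicas(writer: str, topo: Topology, replicas: int = 3) -> List[str]:
    """Choose DataNode VMs for a block so no two replicas share a physical host.

    The first replica stays on the writer's VM; the rest prefer VMs on other
    racks, then fall back to the writer's rack. Returns fewer than `replicas`
    VMs if there are not enough distinct hosts.
    """
    chosen = [writer]
    used_hosts = {topo[writer][1]}
    writer_rack = topo[writer][0]
    # Sort so off-rack VMs come first (False sorts before True), then by name.
    candidates = sorted(topo, key=lambda vm: (topo[vm][0] == writer_rack, vm))
    for vm in candidates:
        if len(chosen) == replicas:
            break
        host = topo[vm][1]
        if host in used_hosts:
            continue  # another replica already lives on this physical host
        chosen.append(vm)
        used_hosts.add(host)
    return chosen


if __name__ == "__main__":
    topology = {
        "vm1": ("rack1", "esx1"),
        "vm2": ("rack1", "esx1"),  # same physical host as vm1
        "vm3": ("rack1", "esx2"),
        "vm4": ("rack2", "esx3"),
        "vm5": ("rack2", "esx3"),  # same physical host as vm4
    }
    print(place_replicas("vm1", topology))  # ['vm1', 'vm4', 'vm3']
```

Without the host constraint, vm2 would look like an acceptable target even though it shares esx1 with vm1, so a single host failure could destroy two copies.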

Page 18

Big Data Family of Frameworks

Compute layer:

• Hadoop: batch analysis

• HBase: real-time queries

• NoSQL: Cassandra, Mongo, etc.

• Big SQL: Impala, Pivotal HAWQ

• Other: Spark, Shark, Solr, Platfora, etc.

Below the compute layer: File System/Data Store, then Virtualization, then Hosts

Page 19

Traditional Hadoop vs. Elastic Hadoop

• Traditional Hadoop: converged compute/storage

• Elastic Hadoop: elastic compute on scale-out network storage

Page 20

Software-defined Datacenter: Storage

Requirements of Next Generation Storage

1. 10x lower cost of storage

2. Handle explosive data growth

3. Support a variety of application types

4. Solve the privacy and security issues

Diagram: Management; Network/Security; Storage/Availability; Compute

Page 21

HDFS Model

Diagram: ESX hosts, each running an HDFS or MapR VM on local disks (Hadoop HDFS VMs); Hadoop compute VMs (JobTracker, TaskTrackers); non-Hadoop VMs on SAN/NAS; DRS; VHM; VirtualCenter Management Server

Legend: JT: JobTracker; TT: TaskTracker; NN: NameNode; VHM: Virtual Hadoop Manager

Page 22

Big Data Using Local Disks

Diagram: a rack of hosts behind a top-of-rack switch

Servers with local disks:

• 16-24 core server

• 12-24 SATA 2-4TB disks

• 10 GbE adapter

• iSCSI/NFS for shared storage for vMotion, etc.

• High-performance 10GbE switch per rack
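The disk configuration above implies a range of capacity per server. A minimal sketch of the arithmetic, assuming HDFS's default replication factor of 3 and ignoring filesystem overhead and space reserved for temporary data (both would reduce usable capacity further); the function name is illustrative:

```python
from typing import Tuple


def capacity_tb(disks: int, tb_per_disk: float, replication: int = 3) -> Tuple[float, float]:
    """Return (raw TB, usable TB after n-way replication) for one server."""
    raw = disks * tb_per_disk
    return raw, raw / replication


if __name__ == "__main__":
    for disks, size in [(12, 2), (24, 4)]:  # low and high ends of the slide's range
        raw, usable = capacity_tb(disks, size)
        print(f"{disks} x {size} TB: raw {raw} TB, usable {usable:.0f} TB")
```

Under these assumptions a single host contributes roughly 8 to 32 TB of usable HDFS capacity.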

Page 23

Scale-out Storage for Big Data

Chart: cost per GB ($0 to $5.50) vs. petabytes deployed (0.5 to 128, doubling scale) for Traditional SAN/NAS; Scale-out NAS (Isilon, NTAP); and Distributed Object Storage (HDFS, MapR, Ceph)

Page 24

Big Data Storage

Elastic compute on scale-out network storage, which provides:

• Hadoop protocol

• Snapshots

• POSIX apps

• Full NFS access

• Replication

• Erasure coding
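Replication and erasure coding trade raw capacity differently: n-way replication stores n full copies, while a k+m erasure code stores k data fragments plus m parity fragments. A short sketch comparing the raw capacity each needs; the 3-way and 6+3 parameters are illustrative assumptions, not figures from the slide.

```python
def raw_for_replication(logical_pb: float, copies: int = 3) -> float:
    """Raw capacity needed to store logical_pb as n full copies."""
    return logical_pb * copies


def raw_for_erasure_coding(logical_pb: float, data: int = 6, parity: int = 3) -> float:
    """Raw capacity for a data+parity erasure code (overhead = (data + parity) / data)."""
    return logical_pb * (data + parity) / data


if __name__ == "__main__":
    logical = 1.0  # PB of user data
    print(raw_for_replication(logical))     # 3.0 PB raw, tolerates 2 lost copies
    print(raw_for_erasure_coding(logical))  # 1.5 PB raw, tolerates 3 lost fragments
```

Both layouts in this example survive multiple failures, but the erasure code halves the raw footprint, at the cost of extra computation and network traffic when lost fragments are reconstructed.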

Page 25

Big Data with Scale-out NAS

Diagram: two racks of hosts, each behind a top-of-rack switch, attached to Isilon scale-out NAS holding shared data; local disk or SSD in each host holds temp (transient) data

Page 26

Chris Greer, FedEx Services

Page 27

Breakthrough Use Cases

Web Log Analysis

Initial exploration focused on detecting mobile devices accessing the website.

Analysis of 570 billion web server log entries took approximately 9 minutes to complete on a small cluster.
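To put the web-log result in perspective, the quoted figures imply an aggregate rate of roughly a billion log entries per second. A quick check of that arithmetic, treating "approximately 9 minutes" as exactly 540 seconds:

```python
def entries_per_second(entries: float, minutes: float) -> float:
    """Aggregate processing rate implied by a job's size and wall-clock time."""
    return entries / (minutes * 60)


if __name__ == "__main__":
    rate = entries_per_second(570e9, 9)
    print(f"{rate:.3g} entries/s")  # about 1.06e9 entries per second
```

This is a cluster-wide figure; per-node rates depend on the cluster size, which the slide does not give.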

ZIP code Analysis

Analysis to determine which ZIP codes are the largest sources or destinations of shipments.
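A ZIP code ranking like this maps naturally onto Hadoop's map and reduce phases. The following single-process Python sketch shows the pattern only and is not FedEx's implementation: it assumes a hypothetical shipment record of (origin ZIP, destination ZIP), emits one count per role in the map phase, sums counts per key in the reduce phase, and runs on made-up sample data.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Shipment = Tuple[str, str]  # hypothetical record: (origin_zip, destination_zip)
Key = Tuple[str, str]       # (role, zip_code), role is "origin" or "destination"


def map_phase(shipments: Iterable[Shipment]) -> Iterator[Tuple[Key, int]]:
    """Emit a (key, 1) pair for each ZIP code's role in each shipment."""
    for origin, destination in shipments:
        yield ("origin", origin), 1
        yield ("destination", destination), 1


def reduce_phase(pairs: Iterable[Tuple[Key, int]]) -> Counter:
    """Sum the counts for each key, as a reducer would per key group."""
    totals: Counter = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def top_zips(shipments: Iterable[Shipment], role: str, k: int = 3) -> List[Tuple[str, int]]:
    """Return the k busiest ZIP codes for the given role, ties broken by ZIP."""
    totals = reduce_phase(map_phase(shipments))
    ranked = [(zip_code, n) for (r, zip_code), n in totals.items() if r == role]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    sample = [("11111", "22222"), ("11111", "33333"),
              ("44444", "22222"), ("11111", "22222")]  # made-up data
    print(top_zips(sample, "origin", 1))       # [('11111', 3)]
    print(top_zips(sample, "destination", 2))  # [('22222', 3), ('33333', 1)]
```

On a real cluster the map and reduce functions would run in parallel across many nodes, with the framework grouping pairs by key between the two phases.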

Shipment Analysis

Analysis of shipment information to determine patterns that may delay a package.

Page 28

Agile Big Data at FedEx

Security

• Trusted isolation

• Well-known auditable platform

Agility

• Deploy in minutes

• Optimize for shifts in workload characteristics

Elasticity

• Create true multi-tenancy

• Mixed workloads

Page 29

Hadoop Service at FedEx: vSphere + Isilon Storage

Scale-out Isilon Cluster

- Shared Data

- NAS + Hadoop

Elastic vSphere Cluster

- Mixed Workloads

- vSphere

- Existing Rack-Mount Servers

Page 30

Agility: Automation of Hadoop Cluster Management

• Deploy

• Resize: elastic scaling

• Customize: incorporate best practices

• Manage: tune configuration

• Run: execute jobs, access HDFS

Page 31

Agility: Ease of Management Due to Consolidation

Diagram: HW procurement and sizing; cluster setup and provisioning; monitoring (each shown in two side-by-side panels)

Page 32

Elasticity: Mixed Workloads on a Shared Platform

Diagram: Dept A (Marketing) and Dept B (Operations), each with Production, Test, and Experimentation clusters, over log files, social data, transaction data, and historical data

Common Infrastructure: Common infrastructure can be shared by multiple logical Hadoop clusters and prioritized with VMware resource pools.

Data Segregation: Data that should not be shared can be kept separate, leveraging VMware security controls for isolation.
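Prioritizing logical clusters with resource pools relies on proportional shares: when pools contend for the same hosts, each receives capacity in proportion to its share value. A simplified sketch, assuming every pool wants more than it can get, ignoring reservations and limits, and using share values in a 4:2:1 ratio (the ratio of vSphere's High/Normal/Low presets); the pool names mirror the slide's Production, Test, and Experimentation tiers, and the 70 GHz figure is made up.

```python
from typing import Dict


def allocate_by_shares(capacity: float, shares: Dict[str, int]) -> Dict[str, float]:
    """Split contended capacity among pools in proportion to their shares.

    Simplification: assumes every pool is saturated and ignores the
    reservations and limits a real resource pool may also carry.
    """
    total = sum(shares.values())
    return {pool: capacity * s / total for pool, s in shares.items()}


if __name__ == "__main__":
    pools = {"production": 4, "test": 2, "experimentation": 1}
    print(allocate_by_shares(70.0, pools))
    # {'production': 40.0, 'test': 20.0, 'experimentation': 10.0}
```

Shares only matter under contention; when the production cluster is idle, its unused capacity remains available to the test and experimentation clusters.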

Page 33

Security

Known Security Model

• VMs provide the required levels of isolation for different workloads

Trusted Auditable Platform

• Leverage virtualization as the platform

• Known to auditors

• Accepted as a valid deployment model

Page 34

Summary

Page 35

Customers Winning from Consolidated Big Data Platforms

“Dedicated hardware makes no sense”

“Software-defined Datacenter enables rapid deployment, multiple tenants and labs”

“Our mixed workloads include Hadoop, Database, ETL and App-servers”

“Any performance penalties are minor”

Diagram: Management; Network/Security; Storage/Availability; Compute

Page 36

Q&A

Page 37

Other VMware Activities Related to This Session

HOL-SDC-1309 - vSphere Big Data Extensions

VAPP5484 – Big Data Extensions Advanced Features

VAPP5626 – Big Data Panel


Page 38

THANK YOU
