vmworld 2013: beyond mission critical: virtualizing big-data, hadoop, hpc, cloud-scale apps

Post on 22-Jan-2015

99 Views

Category:

Technology

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

VMworld 2013 Chris Greer, FedEx Richard McDougall, VMware Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare

TRANSCRIPT

Beyond Mission Critical: Virtualizing Big-Data,

Hadoop, HPC, Cloud-scale Apps

Chris Greer, FedEx

Richard McDougall, VMware

VAPP5402

#VAPP5402

© 2013 VMware Inc. All rights reserved

Beyond Mission Critical: Virtualizing Big-Data, Hadoop and Cloud Apps

Richard McDougall

CTO, Storage and Application Services

Chris Greer,

Enterprise Architect, FedEx

3

Virtualize Everything: Next Generation Apps

Virtual Storage

Arrays

vSphere

SAN/NAS Object / BLOB

Traditional Applications

• Traditional enterprise storage

• HW-based resiliency, QoS

Next Gen Cloud Apps

• Scale out, flash, DAS

• Application specific storage

All SSD

Array

Server-side

Flash

4

The complexity enterprise IT and developers face today

An Idea for a cool app

Spec a server config

Justify server costs

Procurement process

Wait for HW to arrive

Wait for IT ops to Image the server

Install a Database

LOB Architecture approval

Central IT Architectural

approval

Justify more server for scale

testing

Wait for more HW

Configure ACLs and LBs

New infrastructures

New Languages and

Frameworks

New Devices

and Domains

New Data types and

requirements

5

Micro Clouds

Cloud Foundry – Announced Today on vSphere

Data

Services

Other

Services

Msg

Services

.js

Public Clouds

Private Clouds

6

Big Data - Not Just for the Web Giants – Now the Intelligent Enterprise

7

Real-time analysis allows

instant understanding of

market dynamics.

Retailers can have intimate

understanding of their

customers needs and use

direct targeted marketing.

Market Segment Analysis Personalized Customer Targeting`

8

The Emerging Pattern of Big Data Systems: Retail Example

Real-Time

Streams

Exa-scale Data Store

Parallel Data Processing

Real-Time

Processing

Machine

Learning

Data Science

Cloud Infrastructure

9

Storage: Plan for Peta-scale Data Storage and Processing

0.01

0.1

1

10

100

1000

2000 2003 2006 2009 2012 2015

Online Apps

AnalyticsPB of

Data

Analytics Rapidly Outgrows Traditional Data Size

by 100x

10

Unprecedented Scale

“Data transparency,

amplified by Social Networks

generates data at a

scale never seen before” - The Human Face of Big Data

We are creating an Exabyte

of data every minute in 2013

Yottabyte by 2030

11

A single GE Jet Engine produces

10 Terabytes of data in one hour

– 90 Petabytes per year.

Enabling early detection of

faults, common mode failures,

product engineering feedback.

Post Mortem Proactively Maintained Connected Product

12

Cloud Infrastructure Supports Mixed Big Data Workloads

Machine Learning Hadoop

Real-Time Analytics

Change workload types to Real-time

Analytics, Machine Learning , Hadoop

above cloud infra, too

Cloud Infrastructure

Machine Learning

Hadoop

Real-Time Analytics

Management

Network/Security

Storage/Availability

Compute

13

Cloud Infrastructure Supports Multiple Tenants

Change workload types to Real-time

Analytics, Machine Learning , Hadoop

above cloud infra, too

Cloud Infrastructure

Management

Network/Security

Storage/Availability

Compute

Web User

Analytics

Financial

Analysis

Historical Customer

Behavior

14

Software-defined Datacenter: Compute

Agility / Rapid deployment

Lower Capex

Isolation for resource control

and security

1

2

3

Operational efficiency 4

Management

The Core Values of Virtualization Apply to Big Data

Network/Security

Storage/Availability

Compute

15

Strong Isolation between Workloads is Key

Hungry

Workload 1

Reckless

Workload 2

Nosy

Workload 3

Cloud Infrastructure

16

Virtualizing Hadoop

Shrink and expand

cluster on demand

Independent scaling of

Compute and data

Strong multi-tenancy

Elasticity & Multi-tenancy

High availability for

entire Hadoop stack

One click to setup

Battle-tested

High Availability

Rapid deployment

One stop command

center

Easy to

configure/reconfigure

Operational Simplicity

17

Serengeti

Virtual Hadoop Manager (VHM)

Hadoop Virtualization Extensions

(HVE)

Big Data Extensions: Core Components

Core is Open Source

Tool to simplify virtualized Hadoop deployment & operations

Serengeti

Virtualization changes for core Hadoop

Contributed back to Apache Hadoop

Advanced resource management on vSphere

18

Hadoop

batch analysis

Big Data Family of Frameworks

File System/Data Store

Host Host Host Host Host Host

HBase

real-time queries

NoSQL Cassandra,

Mongo, etc Big SQL

Impala,

Pivotal HawQ

Compute

layer

Virtualization

Host

Other Spark,

Shark,

Solr,

Platfora,

Etc,…

19

Traditional Hadoop vs. Elastic Hadoop

Scale-out Network Storage

Traditional Hadoop:

Converged

Compute/Storage Elastic Compute

Scale-out Network Storage

20

Management

Software-defined Datacenter: Storage

Requirements of Next Generation Storage

Network/Security

Storage/Availability

Compute

10x lower cost of storage

Handle explosive data growth

Support a variety of

application types

1

2

3

Solve the privacy and

security issues 4

21

HDFS Model

ESX ESX ESX

J

T

HDFS or MAPR VM HDFS or MAPR VM HDFS or MAPR VM

Local Disks

SAN/NAS Non-Hadoop VMs

Hadoop Compute VMs

JT: JobTracker

TT: TaskTracker

NN: NameNode

VHM: Virtual Hadoop Manager

N

N

T

T

T

T T

T

VirtualCenter Management Server

DRS DRS DRS DRS DRS

VHM

Hadoop HDFS VMs

T

T

T

T T

T

J

T

22

Big-Data using Local Disks

Host

Host

Host

Host

Host

Host

Host

Top of Rack Switch

Servers with

Local Disks

16-24 core server

12-24 SATA 2-4TB Disks

10 GbE adapter

iSCSI/NFS for Shared

Storage for vMotion etc,…

High Performance 10GBE

Switch per Rack

23

Scale-out Storage for Big Data

$-

$0.50

$1.00

$1.50

$2.00

$2.50

$3.00

$3.50

$4.00

$4.50

$5.00

$5.50

0.5 1 2 4 8 16 32 64 128

Cost per GB

Petabytes Deployed

Traditional

SAN/NAS

Distributed

Object

Storage HDFS

MAPR

CEPH

Scale-out NAS Isilon, NTAP

24

Big Data Storage

Scale-out Network Storage

Elastic Compute

Scale-out Network Storage

• Hadoop Protocol

• Snapshots

• Posix Apps

• Full NFS Access

• Replication

• Erasure Coding

25

Big Data with Scale-out-NAS

Big-Data using Scale-out NAS

Host

Host

Host

Host

Host

Host

Top of Rack Switch

Scale-out NAS

Host

Host

Host

Host

Host

Host

Top of Rack Switch

Scale-out NAS

Temp

Data

Shared

Data

Isilon

Scale-out

NAS

Local

Disk or SSD

In each Host

For Transient Data

26

Chris Greer, FedEx Services

27

Breakthrough Use Cases

Web Log Analysis

Initial exploration was around detection of mobile devices accessing the

website.

Analysis of 570 billion web server log entries took approximately 9 minutes to

complete on a small cluster.

ZIP code Analysis

Analysis of data to determine which ZIP codes are the highest source or

destination for shipments.

Shipment Analysis

Analysis of shipment information to determine patterns

that may delay a package.

28

Agile Big Data at FedEx

• Trusted Isolation

• Well known auditable platform

Security

• Deploy in minutes

• Optimize for shift in workload characteristics

Agility

• Create true multi-tenancy

• Mixed workloads

Elasticity

29

Hadoop Service at FedEx: vSphere + Isilon Storage

Scale-out Isilon Cluster

- Shared Data

- NAS + Hadoop

Elastic vSphere Cluster

- Mixed Workloads

- vSphere

- Existing Rack Mount

Servers

30

Agility: Automation of Hadoop Cluster Management

Deploy

Resize

Elastic scaling

Customize

Incorporate

best practices

Manage

Tune configuration

Run

Execute jobs

Access HDFS

31

Monitoring

Agility: Ease of Management Due to Consolidation

Cluster setup

and provisioning

Monitoring

HW procurement

and sizing

Cluster setup

and provisioning

HW procurement

and sizing

32

Elasticity: Mixed Workloads on a Shared Platform

Production

Test

Experimentation

Dept A: Marketing Dept B: Operations

Production

Test

Experimentation

Log files

Social data Transaction data Historical data

Common Infrastructure Common Infrastructure

can be shared by multiple

logical Hadoop clusters

and prioritized with

VMWare resource pools.

Data Segregation Data that should not be

shared can be kept

separate and leverage

VMWare security controls

for isolation.

33

Security

Known Security Model

• VMs provide the required levels of Isolation for different workloads

Trusted Auditable Platform

• Leverage virtualization as the platform

• Known to auditors

• Accepted as a valid deployment model

34

Summary

35

Customers Winning from Consolidated Big Data Platforms

“Dedicated hardware makes no

sense”

“Software-defined Datacenter

enables rapid deployment

multiple tenants and labs”

“Our mixed workloads include

Hadoop, Database, ETL and

App-servers”

“Any performance penalties are

minor” Management

Network/Security

Storage/Availability

Compute

36

Q&A

37

Other VMware Activities Related to This Session

HOL-SDC-1309 - vSphere Big Data Extensions

VAPP5484 – Big Data Extensions Advanced Features

VAPP5626 – Big Data Panel

VAPP5402

THANK YOU

Beyond Mission Critical: Virtualizing Big-Data,

Hadoop, HPC, Cloud-scale Apps

Chris Greer, FedEx

Richard McDougall, VMware

VAPP5402

#VAPP5402

top related