cisco - presentation at hortonworks booth - strata 2014

Hadoop as a Service: HDP 2.0 with OpenStack on Cisco UCS Servers. Karthik Kulkarni, TME, Big Data Solutions Architect. Date: 10.17.14

Upload: hortonworks

Post on 05-Dec-2014


Category:

Software



DESCRIPTION

Hadoop has become a strategic data platform embraced by mainstream enterprises, as it offers the fastest path for businesses to unlock value in big data while maximizing existing investments. Hadoop as a Service (HaaS) is gaining traction among service providers and enterprise IT organizations as a way to offer Hadoop to a larger audience, whether internal users or customers, in an automated and easier-to-manage fashion. This session focuses on using OpenStack to manage the VM lifecycle on Cisco UCS Common Platform Architecture version 2 (CPAv2), deploying Hortonworks Data Platform 2.0 on top of the VMs spawned through OpenStack, and some performance results compared to bare metal.

TRANSCRIPT

Page 1: CISCO - Presentation at Hortonworks Booth - Strata 2014

Hadoop as a Service: HDP 2.0 with OpenStack on Cisco UCS Servers

Karthik Kulkarni, TME, Big Data Solutions Architect

Date: 10.17.14


Page 2: CISCO - Presentation at Hortonworks Booth - Strata 2014

Cisco Confidential 2 © 2013-2014 Cisco and/or its affiliates. All rights reserved.


•  Hadoop as a Service (HaaS) refers to a cloud computing solution for Hadoop, in which Hadoop is essentially virtualized

•  HaaS is a managed Hadoop cluster in which the nitty-gritty details of the underlying services are hidden from the user

Page 3: CISCO - Presentation at Hortonworks Booth - Strata 2014


By combining OpenStack with Hadoop, we bring the following benefits to Hadoop seamlessly:

•  Self-service provisioning

•  Elastic scaling

•  Support for multi-tenancy

•  Improved infrastructure utilization

•  Pay based on use

Page 4: CISCO - Presentation at Hortonworks Booth - Strata 2014


OpenStack provides a free and open-source cloud computing software platform

Source: openstack.org

OpenStack provides Infrastructure as a Service (IaaS)

Page 5: CISCO - Presentation at Hortonworks Booth - Strata 2014


OpenStack has a modular architecture with various code names for its components.

Source: openstack.org

Service | Project name | Description

Dashboard | Horizon | Provides a web-based self-service portal to interact with underlying OpenStack services, such as launching an instance, assigning IP addresses and configuring access controls.

Compute | Nova | Manages the lifecycle of compute instances in an OpenStack environment. Responsibilities include spawning, scheduling and decommissioning of machines on demand.

Networking | Neutron | Enables network connectivity as a service for other OpenStack services, such as OpenStack Compute. Has a pluggable architecture that supports many popular networking vendors and technologies.

Page 6: CISCO - Presentation at Hortonworks Booth - Strata 2014


OpenStack has a modular architecture with various components.

Source: openstack.org

Page 7: CISCO - Presentation at Hortonworks Booth - Strata 2014


Source: openstack.org

OpenStack defines three roles for the underlying nodes (host OS):

•  Controller node – The main management node for OpenStack; it controls the compute and storage nodes.

•  Compute node – These nodes host the spawned VMs.

•  Storage node – These nodes host the storage for the VMs. In this HaaS architecture, storage is ephemeral, that is, local to the VM. Hence the compute nodes also act as storage nodes, and there are no separate storage nodes.

Page 8: CISCO - Presentation at Hortonworks Booth - Strata 2014

Cisco UCS Common Platform Architecture for Big Data


Page 9: CISCO - Presentation at Hortonworks Booth - Strata 2014


Provisioning

Monitoring

Maintenance

Growth

UCSM provides:

•  Speed

•  Consistency

•  Simplicity

•  Visibility

Common Platform Architecture (CPA) is a highly scalable architecture designed to meet a variety of scale-out application demands.

LAN, SAN, Management

UCS Manager

UCS 6200 Series Fabric Interconnects: High-speed connectivity and management, integration with enterprise applications on blades

Nexus 2232 Fabric Extenders: Scalability at lower cost

UCS C240 Servers: Compute, storage

Page 10: CISCO - Presentation at Hortonworks Booth - Strata 2014


Consistent Management at Scale

Single Rack Single Domain

Multiple Domains

UCS Manager

Page 11: CISCO - Presentation at Hortonworks Booth - Strata 2014

HaaS with OpenStack on UCS

Page 12: CISCO - Presentation at Hortonworks Booth - Strata 2014


The following hardware and software infrastructure was used for the HaaS solution on UCS:

•  Cisco UCS Common Platform Architecture for Big Data Version 2 (CPAv2) with the Capacity Optimized configuration

•  Ubuntu 12.04 LTS for Host and Guest OS

•  OpenStack release - Havana

•  Hortonworks Data Platform (HDP) 2.0.6 - installed manually on the guest VMs

Page 13: CISCO - Presentation at Hortonworks Booth - Strata 2014


The following OpenStack components are used:

•  Keystone - Identity service

•  Glance - VM image service

•  Nova - Compute (KVM as the hypervisor)

•  Storage - Ephemeral storage (if a VM is deleted, all data associated with the VM is lost)

•  Networking - nova-network (flat network)

•  Horizon - OpenStack Dashboard

Page 14: CISCO - Presentation at Hortonworks Booth - Strata 2014


•  One of the nodes is the Controller node

•  All other nodes are Compute nodes

•  The Hadoop NameNode runs as a single VM on the controller node

•  The Hadoop Resource Manager runs as a single VM on one of the compute nodes

Controller Compute Compute Compute …

Name node Resource Mgr DN … DN DN … DN
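
The layout above can be sketched as a small placement plan. This is only an illustration under stated assumptions: the `VmPlacement` type, the `plan_cluster` function, the VM names, and the choice of the first compute node for the Resource Manager are all hypothetical, and placing Data Node VMs only on the remaining compute nodes is one reading of the diagram rather than something the slide states.

```python
from dataclasses import dataclass


@dataclass
class VmPlacement:
    vm_name: str  # hypothetical VM name, not taken from the slides
    role: str     # "namenode", "resourcemanager", or "datanode"
    host: str     # physical node the VM should run on


def plan_cluster(controller, compute_nodes, datanodes_per_host):
    """Return a list of VM placements that follows the slide's layout."""
    if not compute_nodes:
        raise ValueError("at least one compute node is required")
    # The NameNode runs as a single VM on the controller node.
    plan = [VmPlacement("namenode-vm", "namenode", controller)]
    # The Resource Manager runs as a single VM on one compute node
    # (assumption: the first one in the list).
    plan.append(VmPlacement("resourcemgr-vm", "resourcemanager", compute_nodes[0]))
    # Assumption: Data Node VMs are spread over the remaining compute nodes.
    for host in compute_nodes[1:]:
        for index in range(datanodes_per_host):
            plan.append(VmPlacement(f"dn-{host}-{index}", "datanode", host))
    return plan


if __name__ == "__main__":
    for placement in plan_cluster("controller", ["compute1", "compute2"], 2):
        print(placement)
```

A plan like this only records intent; actually pinning a VM to a host is what the scheduler hints on the next slide are for.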

Page 15: CISCO - Presentation at Hortonworks Booth - Strata 2014


Controller Compute Compute Compute …

Name node Resource Mgr DN … DN DN … DN

Pass the --hint option to the "nova boot" command with same_host or different_host.

In nova.conf, add:

scheduler_default_filters=SameHostFilter,DifferentHostFilter

# nova boot --flavor 1 --key_name mykey --image <image-id> \
    --security_group default --hint different_host=<vm-id>

# nova boot --flavor 1 --key_name mykey --image <image-id> \
    --security_group default --hint same_host=<vm-id>
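
When many VMs have to be booted with placement hints, the commands can be generated rather than typed. The sketch below only builds the argument list and does not call OpenStack. The helper name `nova_boot_command` and all example values are hypothetical. The trailing server name is an addition that does not appear in the slide's commands; it is included because the nova client expects a name for the new instance.

```python
def nova_boot_command(server_name, image_id, hint_key=None, hint_vm_id=None,
                      flavor="1", key_name="mykey", security_group="default"):
    """Build the argument list for a `nova boot` call with an optional hint."""
    if hint_key not in (None, "same_host", "different_host"):
        raise ValueError("hint_key must be 'same_host' or 'different_host'")
    if hint_key is not None and not hint_vm_id:
        raise ValueError("a scheduler hint needs the ID of an existing VM")
    # Same options as on the slide, in the same order.
    cmd = ["nova", "boot",
           "--flavor", flavor,
           "--key_name", key_name,
           "--image", image_id,
           "--security_group", security_group]
    if hint_key is not None:
        # e.g. --hint different_host=<vm-id>
        cmd += ["--hint", f"{hint_key}={hint_vm_id}"]
    cmd.append(server_name)  # positional instance name (added here, see above)
    return cmd


if __name__ == "__main__":
    # Hypothetical IDs; keep a second Data Node VM off the first one's host.
    print(" ".join(nova_boot_command("dn-2", "<image-id>",
                                     hint_key="different_host",
                                     hint_vm_id="<vm-id>")))
```

The returned list can be passed to `subprocess.run` on a machine where the nova client is installed and credentials are loaded.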

Additional details: www.cisco.com/go/bigdata_design

Page 16: CISCO - Presentation at Hortonworks Booth - Strata 2014


Page 17: CISCO - Presentation at Hortonworks Booth - Strata 2014


Category | Workloads

Micro Benchmarks | WordCount (per node), TeraSort (cluster), Sort (per node)

Machine Learning | Mahout Bayesian Classification (Bayes), Mahout K-means clustering (kmeans)

HDFS Benchmark | EnhancedDFSIO (dfsioe)

Hive Query Benchmark | Hive Bench

Page 18: CISCO - Presentation at Hortonworks Booth - Strata 2014


Hardware / Software | Configuration

Servers | 20 x UCS C240 M3 LFF (1 Name node, 1 Secondary Name node, 18 Data nodes)

Processor | 2 x Intel® Xeon® Processor E5-2680 v2 (25M Cache, 2.80 GHz), 10 Cores (Each)

Hard disk drives | 12 x 4TB SATA 7200RPM HDDs, RAID 10

Memory | 256 GB RAM

Network | 2 x 10 Gigabit Ethernet, Cisco VIC 1225 NIC

Operating system | Ubuntu 14.04 LTS (Host OS and Guest OS)

Hadoop Version | Hortonworks HDP 2.0.6

HiBench | HiBench 2.2

Page 19: CISCO - Presentation at Hortonworks Booth - Strata 2014


Name | vCPU | RAM (MB) | Root Disk (GB) | Ephemeral (GB) | VM Filesystem

hadoop.8vm.ephemeral | 2 | 28250 | 50 | 2000 | ext3

hadoop.4vm.ephemeral | 4 | 56500 | 50 | 4000 | xfs

hadoop.2vm.ephemeral | 8 | 113000 | 50 | 8000 | xfs

hadoop.1vm.ephemeral | 16 | 226000 | 50 | 16000 | xfs

hadoop.master | 16 | 226000 | 50 | 20000 | xfs
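
The flavor sizes appear to be chosen so that each configuration uses the same share of a host. The check below is a sketch under one assumption that the slides do not state outright: that the N in a flavor name such as hadoop.8vm.ephemeral is the number of VMs of that flavor placed on one physical server. The function names are hypothetical.

```python
# Flavor rows copied from the table: (name, vCPU, RAM in MB, ephemeral disk in GB).
FLAVORS = [
    ("hadoop.8vm.ephemeral", 2, 28250, 2000),
    ("hadoop.4vm.ephemeral", 4, 56500, 4000),
    ("hadoop.2vm.ephemeral", 8, 113000, 8000),
    ("hadoop.1vm.ephemeral", 16, 226000, 16000),
]


def vms_per_host(flavor_name):
    """Assumption: 'hadoop.<N>vm.ephemeral' means N such VMs share one host."""
    middle = flavor_name.split(".")[1]   # e.g. "8vm"
    return int(middle[:-len("vm")])      # e.g. 8


def per_host_totals(name, vcpu, ram_mb, ephemeral_gb):
    """Total vCPU, RAM (MB) and ephemeral disk (GB) used per host."""
    count = vms_per_host(name)
    return count * vcpu, count * ram_mb, count * ephemeral_gb


if __name__ == "__main__":
    for row in FLAVORS:
        print(row[0], per_host_totals(*row))
```

Under that assumption every row works out to 16 vCPUs, 226000 MB of RAM and 16000 GB of ephemeral disk per host, so the configurations differ only in how the same resources are sliced. The hadoop.master flavor is left out because its name does not encode a VM count.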

Page 20: CISCO - Presentation at Hortonworks Booth - Strata 2014


[Chart; y-axis: Seconds]

This workload sorts its text input data (24 GB); results are per node.

Page 21: CISCO - Presentation at Hortonworks Booth - Strata 2014


[Chart; y-axis: Seconds]

TeraSort is a standard benchmark created by Jim Gray. Its input data (1 TB) is generated by the Hadoop TeraGen example program.
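
TeraGen is sized by row count rather than by bytes, and each row it writes is 100 bytes, so a 1 TB input corresponds to 10 billion rows. The sketch below does that arithmetic and builds the matching command line. It assumes a decimal terabyte (10^12 bytes); the helper names, the jar file name, and the output directory are placeholders, not values from the slides.

```python
TERAGEN_ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def teragen_rows(total_bytes):
    """Number of rows TeraGen must generate to produce total_bytes of input."""
    return total_bytes // TERAGEN_ROW_BYTES


def teragen_command(total_bytes, output_dir, examples_jar):
    """Argument list for running TeraGen through `hadoop jar`."""
    return ["hadoop", "jar", examples_jar, "teragen",
            str(teragen_rows(total_bytes)), output_dir]


if __name__ == "__main__":
    one_tb = 10 ** 12  # assumption: decimal terabyte
    # Placeholder jar name and HDFS path; adjust to the local installation.
    print(" ".join(teragen_command(one_tb, "/benchmarks/terasort-input",
                                   "hadoop-mapreduce-examples.jar")))
```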

Page 22: CISCO - Presentation at Hortonworks Booth - Strata 2014


[Chart; y-axis: Seconds]

This workload counts the occurrences of each word in the input data, which is generated using the Hadoop RandomTextWriter (32 GB/node).
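
The charts report run times in seconds, so a virtualized run can be compared with bare metal as a relative overhead. The helper below is a generic sketch with a hypothetical name, and the timings in the usage lines are made-up placeholders rather than results from this deck.

```python
def relative_overhead(bare_metal_seconds, virtualized_seconds):
    """Fractional slowdown of a virtualized run versus bare metal.

    0.0 means identical run time; 0.25 means the virtualized run took
    25% longer; negative values mean it was faster.
    """
    if bare_metal_seconds <= 0:
        raise ValueError("bare-metal run time must be positive")
    return (virtualized_seconds - bare_metal_seconds) / bare_metal_seconds


if __name__ == "__main__":
    # Placeholder timings for illustration only, not measured values.
    overhead = relative_overhead(bare_metal_seconds=100.0,
                                 virtualized_seconds=110.0)
    print(f"{overhead:.0%}")
```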

Page 23: CISCO - Presentation at Hortonworks Booth - Strata 2014


Summary

While mainstream Hadoop is still expected to run on bare metal, Hadoop as a Service with OpenStack holds great promise and is gaining popularity with service providers, IT departments offering HaaS internally within an organization, and testing and development environments, to name a few.

Additional details: www.cisco.com/go/bigdata_design

Cisco Validated Design: Hadoop as a Service (HaaS) with Cisco UCS CPA v2 for Big Data and OpenStack

Page 24: CISCO - Presentation at Hortonworks Booth - Strata 2014

Thank You

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.