Cisco - Presentation at Hortonworks Booth - Strata 2014
DESCRIPTION
Hadoop has become a strategic data platform embraced by mainstream enterprises, as it offers the fastest path for businesses to unlock value in big data while maximizing existing investments. Hadoop as a Service (HaaS) is gaining traction among service providers and enterprise IT organizations as a way to offer Hadoop to a larger audience, whether internal users or customers, in an automated and easier-to-manage fashion. This session focuses on using OpenStack to manage the VM lifecycle on Cisco UCS Common Platform Architecture version 2 (CPAv2), deploying Hortonworks Data Platform 2.0 on the VMs spawned through OpenStack, and some performance results compared to bare metal.
TRANSCRIPT
Hadoop as a Service: HDP 2.0 with OpenStack on Cisco UCS Servers
Karthik Kulkarni, TME, Big Data Solutions Architect
Date: 10.17.14
Cisco Confidential 2 © 2013-2014 Cisco and/or its affiliates. All rights reserved.
• Hadoop as a Service (HaaS) refers to a cloud computing solution for Hadoop; in essence, Hadoop running on virtualized infrastructure
• HaaS is a managed Hadoop cluster in which the details of the underlying services are hidden from the user
By combining the innovation of OpenStack with Hadoop, we bring the following benefits to Hadoop seamlessly:
• Self-service provisioning
• Elastic scaling
• Support for multi-tenancy
• Improved infrastructure utilization
• Pay based on use
OpenStack is a free and open-source cloud computing software platform
Source: openstack.org
OpenStack provides Infrastructure as a Service (IaaS)
OpenStack has a modular architecture with various code names for its components.
Source: openstack.org
Service | Project name | Description
Dashboard | Horizon | Provides a web-based self-service portal to interact with underlying OpenStack services, such as launching an instance, assigning IP addresses and configuring access controls.
Compute | Nova | Manages the lifecycle of compute instances in an OpenStack environment. Responsibilities include spawning, scheduling and decommissioning of machines on demand.
Networking | Neutron | Enables network connectivity as a service for other OpenStack services, such as OpenStack Compute. Has a pluggable architecture that supports many popular networking vendors and technologies.
OpenStack has a modular architecture with various components.
Source: openstack.org
Source: openstack.org
OpenStack has three roles for the underlying nodes (Host OS):
• Controller node: the main management node for OpenStack, which controls the compute and storage nodes.
• Compute node: these nodes host the spawned VMs.
• Storage node: these nodes host the storage for the VMs.
In this HaaS architecture, storage is ephemeral, that is, local to the VM. Hence the compute nodes also act as storage nodes, and there are no separate storage nodes.
Cisco UCS Common Platform Architecture for Big Data
Provisioning • Monitoring • Maintenance • Growth
UCSM provides:
• Speed
• Consistency
• Simplicity
• Visibility
Common Platform Architecture (CPA) is a highly scalable architecture designed to meet a variety of scale-out application demands
LAN, SAN, Management
UCS Manager
UCS 6200 Series Fabric Interconnects: High-speed connectivity and management, integration with enterprise applications on blades
Nexus 2232 Fabric Extenders: Scalability at lower cost
UCS C240 Servers: Compute, storage
Consistent Management at Scale
Single Rack Single Domain
Multiple Domains
UCS Manager
HaaS with OpenStack on UCS
The following hardware and software infrastructure was used for the HaaS solution on UCS:
• Cisco UCS Common Platform Architecture for Big Data Version 2 (CPAv2) with the Capacity Optimized configuration
• Ubuntu 12.04 LTS for Host and Guest OS
• OpenStack release: Havana
• Hortonworks Data Platform (HDP) 2.0.6, installed manually on the guest VMs
The OpenStack components used are as follows:
• Keystone - Identity service
• Glance - VM image service
• Nova - Compute (KVM as the hypervisor)
• Storage - Ephemeral storage (if a VM is deleted, all data associated with the VM is lost)
• Networking - nova-network (flat network)
• Horizon - OpenStack Dashboard
• One of the nodes is the Controller node
• All other nodes are Compute nodes
• The Hadoop NameNode runs as a single VM on the controller node
• The Hadoop ResourceManager runs as a single VM on one of the compute nodes
Controller Compute Compute Compute …
Name node Resource Mgr DN … DN DN … DN
Pass the --hint option to the "nova boot" command with same_host or different_host.
In nova.conf, add:
    scheduler_default_filters=SameHostFilter,DifferentHostFilter

    # nova boot --flavor 1 --key_name mykey --image <image-id> \
        --security_group default --hint different_host=<vm-id>

    # nova boot --flavor 1 --key_name mykey --image <image-id> \
        --security_group default --hint same_host=<vm-id>
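The placement hints above can be scripted when many VMs have to be launched. The following is a minimal Python sketch that only assembles the argument list for the "nova boot" command shown on the slide; it does not contact OpenStack. The function name build_nova_boot, the optional instance name, and the sample IDs are hypothetical and not part of the original deck. The flags are copied from the slide as written.

```python
import shlex
from typing import List, Optional

# Scheduler hints named on the slide; the matching filters must be enabled in nova.conf.
VALID_HINTS = ("same_host", "different_host")


def build_nova_boot(image_id: str, hint: str, vm_id: str,
                    flavor: str = "1", key_name: str = "mykey",
                    security_group: str = "default",
                    name: Optional[str] = None) -> List[str]:
    """Assemble the 'nova boot' argument list from the slide (hypothetical helper)."""
    if hint not in VALID_HINTS:
        raise ValueError(f"hint must be one of {VALID_HINTS}, got {hint!r}")
    args = [
        "nova", "boot",
        "--flavor", flavor,
        "--key_name", key_name,
        "--image", image_id,
        "--security_group", security_group,
        "--hint", f"{hint}={vm_id}",
    ]
    # Assumption: the slide omits the instance name, so it is only appended if given.
    if name is not None:
        args.append(name)
    return args


# Example with made-up IDs: place a VM on a different host than VM "vm-1234".
cmd = build_nova_boot("img-5678", "different_host", "vm-1234", name="datanode-02")
print(shlex.join(cmd))
```

Returning a list rather than a single string keeps quoting out of the picture, so the result could be handed to subprocess.run without a shell if the nova client is installed.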
Additional details: www.cisco.com/go/bigdata_design
Category | Workloads
Micro Benchmarks | WordCount (per node), TeraSort (cluster), Sort (per node)
Machine Learning | Mahout Bayesian Classification (Bayes), Mahout K-means clustering (kmeans)
HDFS Benchmark | EnhancedDFSIO (dfsioe)
Hive Query Benchmark | Hive Bench
Hardware / Software | Configuration
Servers | 20 x UCS C240 M3 LFF (1 Name node, 1 Secondary Name node, 18 Data nodes)
Processor | 2 x Intel® Xeon® Processor E5-2680 v2 (25M Cache, 2.80 GHz), 10 cores each
Hard disk drives | 12 x 4 TB SATA 7200 RPM HDDs, RAID 10
Memory | 256 GB RAM
Network | 2 x 10 Gigabit Ethernet NIC (Cisco VIC 1225)
Operating system | Ubuntu 14.04 LTS (Host OS and Guest OS)
Hadoop version | Hortonworks HDP 2.0.6
HiBench | HiBench 2.2
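As a quick sanity check on the configuration above, the per-server and cluster totals can be derived with simple arithmetic. The sketch below is in Python and uses only figures from the table. It assumes the standard RAID 10 property that usable capacity is half of raw capacity, and it ignores filesystem overhead and HDFS replication; the variable and function names are illustrative.

```python
# Figures copied from the hardware table; everything derived is plain arithmetic.
DATA_NODES = 18
SOCKETS_PER_SERVER = 2
CORES_PER_SOCKET = 10
DISKS_PER_SERVER = 12
DISK_SIZE_TB = 4


def raid10_usable_tb(disks: int, disk_size_tb: int) -> int:
    """RAID 10 mirrors every disk, so usable capacity is half of raw capacity."""
    if disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return disks * disk_size_tb // 2


cores_per_server = SOCKETS_PER_SERVER * CORES_PER_SOCKET
raw_tb_per_server = DISKS_PER_SERVER * DISK_SIZE_TB
usable_tb_per_server = raid10_usable_tb(DISKS_PER_SERVER, DISK_SIZE_TB)
# Usable disk across the data nodes, before HDFS replication is taken into account.
usable_tb_data_nodes = DATA_NODES * usable_tb_per_server

print(f"Physical cores per server: {cores_per_server}")
print(f"Raw / usable disk per server: {raw_tb_per_server} TB / {usable_tb_per_server} TB")
print(f"Usable disk across {DATA_NODES} data nodes: {usable_tb_data_nodes} TB")
```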
Name | vCPU | RAM (MB) | Root Disk (GB) | Ephemeral (GB) | VM Filesystem
hadoop.8vm.ephemeral | 2 | 28250 | 50 | 2000 | ext3
hadoop.4vm.ephemeral | 4 | 56500 | 50 | 4000 | xfs
hadoop.2vm.ephemeral | 8 | 113000 | 50 | 8000 | xfs
hadoop.1vm.ephemeral | 16 | 226000 | 50 | 16000 | xfs
hadoop.master | 16 | 226000 | 50 | 20000 | xfs
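The flavor sizes appear to be chosen so that each physical host carries the same total resources whether it runs 1, 2, 4 or 8 VMs. The Python sketch below checks that arithmetic. It assumes that the number in a flavor name (for example "8vm") is the number of VMs per compute host; the deck does not state this explicitly, and the helper names are illustrative.

```python
from typing import NamedTuple, Tuple


class Flavor(NamedTuple):
    name: str
    vcpus: int
    ram_mb: int
    ephemeral_gb: int


# Worker flavors copied from the table (hadoop.master is left out on purpose).
FLAVORS = [
    Flavor("hadoop.8vm.ephemeral", 2, 28250, 2000),
    Flavor("hadoop.4vm.ephemeral", 4, 56500, 4000),
    Flavor("hadoop.2vm.ephemeral", 8, 113000, 8000),
    Flavor("hadoop.1vm.ephemeral", 16, 226000, 16000),
]


def vms_per_host(flavor: Flavor) -> int:
    """Assumption: 'hadoop.<N>vm.ephemeral' means N VMs on each compute host."""
    middle = flavor.name.split(".")[1]   # e.g. "8vm"
    return int(middle[:-len("vm")])      # e.g. 8


def host_totals(flavor: Flavor) -> Tuple[int, int, int]:
    """Total vCPUs, RAM (MB) and ephemeral disk (GB) one host would carry."""
    n = vms_per_host(flavor)
    return (n * flavor.vcpus, n * flavor.ram_mb, n * flavor.ephemeral_gb)


for f in FLAVORS:
    print(f.name, vms_per_host(f), host_totals(f))
```

Under that assumption, every worker flavor adds up to 16 vCPUs, 226000 MB of RAM and 16000 GB of ephemeral disk per host, so the benchmark runs differ in VM granularity rather than in total allocated resources.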
[Chart: y-axis in seconds]
This workload sorts its text input data (24 GB); results are per node.
[Chart: y-axis in seconds]
TeraSort is a standard benchmark created by Jim Gray. Its input data (1 TB) is generated by the Hadoop TeraGen example program.
[Chart: y-axis in seconds]
This workload counts the occurrences of each word in the input data, which is generated using the Hadoop RandomTextWriter (32 GB per node).
Summary
While mainstream Hadoop is still expected to run on bare metal, Hadoop as a Service with OpenStack holds great promise and is expected to gain popularity with service providers, IT organizations offering HaaS internally, and testing and development environments, to name a few.
Additional details: www.cisco.com/go/bigdata_design
Cisco Validated Design: Hadoop as a Service (HaaS) with Cisco UCS CPA v2 for Big Data and OpenStack
Thank You
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.