How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
Ian Lumb
Bright Evangelist
Developed originally for a Bright Computing webinar delivered November 5, 2014.
Key Takeaways
The Apache Project
• 4-step upgrade process for its Hadoop distro
Upgrade processes for the Hadoop stack
• Apache Ambari
• Other management tools
Bright roles for Hadoop
• Service definition, assignment and composition
The 1-step, 0-downtime Bright upgrade process
• Hadoop distros and the analytics stack
Why Upgrade Hadoop?
Gain access to new capabilities
• Enhancements - new features and/or functionalities
• Improvements – maintenance (e.g., security)
Transitioning from pilot to production
Maintain compatibility
• Between sites within an organization
• Between project participants
Other reasons?
4-Step Rolling Upgrade Process: Overview
1. Prepare the rolling upgrade
• Snapshot HDFS metadata
2. Upgrade active and standby NameNode services
3. Upgrade DataNodes
4. Finalize the rolling upgrade
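Under the hood, the four steps map onto a handful of `hdfs dfsadmin` invocations. The following Python sketch models the sequence with a stubbed command runner, so no cluster is required; the host names and the DataNode IPC port are hypothetical, and a real driver would execute each command and check its output before proceeding.

```python
# Sketch of the Apache 4-step rolling upgrade sequence.
# run() is a stub that records commands instead of executing them.

issued = []

def run(cmd):
    """Record the command that would be issued on the cluster."""
    issued.append(cmd)

def rolling_upgrade(datanodes):
    # 1. Prepare: snapshot HDFS metadata (creates the rollback image)
    run("hdfs dfsadmin -rollingUpgrade prepare")
    run("hdfs dfsadmin -rollingUpgrade query")  # repeat until ready to proceed

    # 2. Upgrade the standby NameNode, fail over to it, then upgrade the former active
    run("restart standby NameNode with '-rollingUpgrade started'")
    run("fail over to the upgraded NameNode")
    run("restart former active NameNode with '-rollingUpgrade started'")

    # 3. Upgrade DataNodes (a few at a time on a real cluster)
    for dn in datanodes:
        run(f"hdfs dfsadmin -shutdownDatanode {dn}:8010 upgrade")
        run(f"restart upgraded DataNode on {dn}")

    # 4. Finalize: after this point, rollback is no longer possible
    run("hdfs dfsadmin -rollingUpgrade finalize")

rolling_upgrade(["node001", "node002"])
print(len(issued))  # → 10
```

Note that step 1 is what makes rollback possible, and step 4 is what gives it up; everything in between can proceed while the cluster serves traffic.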
4-Step Rolling Upgrade Process: Considerations
High Availability (HA)?
• If not: downtime!
Federated clusters?
• Repeat for each namespace
Out of scope
• JournalNodes
• ZooKeeperNodes
• Analytics stack
Other Options
Manual
• CDH
– Hadoop plus analytics stack (Step 8)
• HDP
Automated
• Cloudera Manager
– Some manual steps required …
• Apache Ambari
– Redeploy entire distros (e.g., HDP)
– Analytics stack upgrades – a planned enhancement
What Makes Hadoop Upgrades Challenging?
HDFS is the underlying platform
• YARN and analytics apps depend upon HDFS
Complexity
• Interdependencies
– HDFS services plus the rest of the Hadoop stack
• Highly distributed
• Scale
Bright Cluster Manager and Hadoop Upgrades
Bright roles
• Facilitate service definition, assignment and composition
• Almost any service can be made highly available
– Run redundant copies on different nodes
Bright CMSH
• Cluster-Management SHell
Bright Concepts - Role
Device: An entity in the cluster-management infrastructure that represents a physical device in the cluster.
Category: A group of nodes sharing the same configuration. A node must always be a member of exactly one category.
Node group: A group of nodes, not necessarily sharing the same configuration. A node can be a member of 0 or more node groups.
Role: A task that can be assigned to a node.
For example, a node can be assigned the Provisioning role, which makes it a provisioning node.
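The cardinalities among these concepts can be made concrete in a small data model. This is a hypothetical illustration, not Bright's actual API: a node belongs to exactly one category, any number of node groups, and carries whatever roles are assigned to it.

```python
# Hypothetical data model of the Bright concepts above (illustration only,
# not Bright Cluster Manager's real API).
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    category: str                                 # exactly one category
    groups: set = field(default_factory=set)      # zero or more node groups
    roles: set = field(default_factory=set)       # tasks assigned to the node

n = Node("node001", category="hadoop-datanodes")
n.groups.add("rack-a")
n.roles.add("Provisioning")   # assigning this role makes node001 a provisioning node
n.roles.add("DataNode")

print(n.category, sorted(n.roles))  # → hadoop-datanodes ['DataNode', 'Provisioning']
```

The category name and group name here are made up; the point is only the one-category / many-groups / many-roles shape that the definitions above describe.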
Hadoop-Related Roles in Bright Cluster Manager
Bright Cluster Management Interfaces
Three ways to manage cluster:
CMSH
• Command-line interface to cluster
• Usually runs on head node, but can also be used remotely
• Can be used interactively and from scripts
• Powerful tool but takes some time to get familiar with …
CMGUI
• Desktop GUI application (supported: Windows, Linux, OS X)
(installable packages in /cm/shared/apps/cmgui/dist)
• Can also be run on head node through SSH with X-forwarding
• Intuitive and easy to use
SOAP / JSON API
• Python and C++ interfaces available which hide SOAP / JSON
Bright Cluster Management Shell (CMSH)
Features:
Modular interface
Command completion using tab key
Command line history
Output redirection to file or shell command
Scriptable in batch mode
Support for looping over objects
Example:
[demo]% device
[demo->device]% status
demo ................ [ UP ]
node001 ............. [ UP ]
node002 ............. [ UP ]
Bright Hadoop Upgrades
Single script captures the Apache Project’s 4 steps
Enhancements
• Automated deployment of updated software
– Ensures configured instances of Hadoop are updated
• DataNodes can be upgraded simultaneously
– Distributed provisioning (large-cluster option)
• JournalNodes are upgraded without downtime
• Automated testing of the upgrade prior to commitment
– Validation of the Hadoop setup
– TeraGen, TeraSort and TeraValidate are executed
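The validation stage chains the three standard Hadoop benchmarks, each consuming the previous job's HDFS output. A minimal sketch of the sequence follows; the row count and HDFS paths are hypothetical, and the exact examples-jar name and path vary by distro.

```python
# Sketch of the post-upgrade validation chain: TeraGen -> TeraSort -> TeraValidate.
# Each stage reads the HDFS directory written by the one before it.

EXAMPLES_JAR = "hadoop-mapreduce-examples.jar"  # exact name/path varies by distro

def tera_commands(rows=10000, base="/benchmarks/tera"):
    """Return the three benchmark jobs in dependency order."""
    gen_dir, sort_dir, report_dir = f"{base}/in", f"{base}/out", f"{base}/report"
    return [
        f"hadoop jar {EXAMPLES_JAR} teragen {rows} {gen_dir}",
        f"hadoop jar {EXAMPLES_JAR} terasort {gen_dir} {sort_dir}",
        f"hadoop jar {EXAMPLES_JAR} teravalidate {sort_dir} {report_dir}",
    ]

for cmd in tera_commands():
    print(cmd)
```

If TeraValidate reports no errors, the upgraded stack has demonstrably generated, sorted and verified data end to end before the upgrade is finalized.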
DEMO …
Cascading Upgrade
Bright Support for Apache Hadoop
FULLY INTEGRATED — Bright Cluster Manager bundles, installs and manages the 'product' completely. Nothing else is needed.
INTEGRATED — Bright Cluster Manager installs and manages some aspects of the 'product', but something else is needed for COMPLETE support.
COMPATIBLE — Bright Cluster Manager doesn't install or manage the 'product', but it can be installed on a cluster that is itself Bright-managed.
INCOMPATIBLE — Bright Cluster Manager doesn't work with the 'product' at all.
Hadoop Support
FULLY INTEGRATED
• Apache Hadoop, CDH & HDP
HDFS and its services
– HBase, NameNode, DataNode & JournalNode
• ZooKeeper
INTEGRATED
• YARN
• Pig, Hive, Accumulo & Spark
COMPATIBLE
• E.g., Giraph
INCOMPATIBLE
Note: HA YARN available soon.
Compatible Support Example: Giraph
Bright Maintenance of Hadoop
Innovation characterizes the entire history and
evolution of Big Data Analytics via Hadoop
• BUT … introduces challenges and opportunities …
Bright Computing’s approach leverages
• People
Proactively maintaining business and technical relationships
• Process
'Hands-on engineering' begins with each release
– Preliminary to fully enterprise-ready implementations
• Product
Bright Cluster Manager released once per year
– Compatible updates flow continuously via YUM …
Further Discussion
Upgrade scenarios
Migrating distros
Hadoop stack
Key Takeaways
The Apache Project
• 4-step upgrade process for its Hadoop distro
Upgrade processes for the Hadoop stack
• Apache Ambari
• Other management tools
Bright roles for Hadoop
• Service definition, assignment and composition
The 1-step, 0-downtime Bright upgrade process
• Hadoop distros and the analytics stack
Q & A
Ian Lumb, [email protected]
Additional Slides
Customer Needs
Quickly build and deploy a Hadoop cluster
• Needed yesterday for an important project?
Build a PoC cluster to test drive Hadoop
• Unsure about taking the Hadoop plunge?
Build a hybrid HPC/Hadoop cluster
• HPC and Hadoop required
Feature – Benefit
• Installs on bare metal – Pallet to production in less time
• Simple deployment process – Running right, first time, every time
• Comprehensive monitoring and health checking – Know how your cluster is running
• Deploys multiple distributions – Make the choice that best fits your needs
• Operate multiple Hadoop instances simultaneously – Accommodate multiple choices at the same time
• Integrated HDFS management operations – Easily allocate storage resources to users
Product Details
Product Differentiation
• Installs on bare metal through to the Hadoop distro
• Works with almost any Hadoop distro
• Single-pane-of-glass management interface
– Addresses the physical cluster and Hadoop
• Fully manages Hadoop services
– HDFS, YARN, etc.
• Customized monitoring and health checks
• Multiple instances of Hadoop
Architected specifically for Hadoop
• Simultaneous, independent instances on dedicated hardware
• Time-sliced instances on shared hardware
• HPC and Hadoop together