
    Scaling Hadoop Map-Reduce in version 0.23

As businesses become more and more technology dependent to gain competitive advantage, their systems generate large amounts of feedback and usage data. While this data can unlock hugely valuable, business-critical information and help fine-tune services, it often lies unused in backend systems on disks and tapes. This is mainly because most of this data is unstructured and vast, and there are few well-established, cost-effective alternatives for analyzing it.

This is where Hadoop comes to the rescue. As Hadoop deployments grow larger, its main components, HDFS and Map-Reduce, are confronted with scalability concerns. The Hadoop team is addressing some of these aspects in release 0.23, planned for release in early 2012. One of the major changes in this effort is a rearchitecture of the Hadoop Map-Reduce infrastructure, focused on debottlenecking the Job Tracker, one of the major components of the Map-Reduce infrastructure.

This blog compares the new Map-Reduce architecture with the existing one and highlights how it addresses the associated scale issues.

The image below shows the four main components of the Map-Reduce infrastructure: the Client, the Job Tracker, the Task Tracker and the Task Runner.

Client creates a job, splits the data that needs Map-Reduce processing into smaller chunks, submits it to the Job Tracker for execution, and probes the Job Tracker for job status.

    Job Tracker (JT) Mainly responsible for -



Cluster resource management - Keeps track of the available Task Trackers in the cluster and each Task Tracker's capacity to execute concurrent map and reduce tasks.

Job management - Complete management of job state and its execution, including creating and scheduling map/reduce tasks on Task Tracker nodes, guiding Task Trackers through the various state transitions associated with a task's execution, tracking task progress, and running speculative tasks (speculative tasks are run in lieu of slow-running tasks).

Monitoring the health of Task Trackers and recovery of failed tasks/jobs when a Task Tracker fails in the middle of task execution.

Maintaining job progress, history and diagnostics information.

Task Tracker (TT) Responsible for initiating map/reduce task execution on a cluster node. It does so by spawning a new process for task execution, called the task runner.

Task Runner Executes the actual map/reduce tasks and reports progress to the Task Tracker, which in turn reports it to the Job Tracker; the Job Tracker guides the various state transitions related to task management.
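As a quick illustration of this reporting chain, here is a minimal sketch of a classic-API mapper as the task runner executes it. The LineCountMapper class and its key/value types are assumptions for this example, but OutputCollector, MapReduceBase and the Reporter handle are the standard org.apache.hadoop.mapred mechanisms through which a task feeds progress back to its Task Tracker, which in turn heartbeats it to the Job Tracker.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Runs inside the task runner process spawned by the Task Tracker.
public class LineCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, LongWritable> output,
                    Reporter reporter) throws IOException {
        // Emit one count per input line.
        output.collect(new Text("lines"), new LongWritable(1));

        // Progress and status flow upward:
        // task runner -> Task Tracker heartbeat -> Job Tracker.
        reporter.setStatus("processed record at offset " + offset.get());
        reporter.progress();
    }
}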

There can only be one instance of the Job Tracker in a Hadoop cluster, which makes it a single point of scale bottleneck and failure. If the Job Tracker fails, all information related to currently running jobs and tasks is lost, and all jobs need to be executed again once the Job Tracker recovers. A large Hadoop cluster could have thousands of Task Trackers, and cluster size is mostly limited by how many Task Trackers a single Job Tracker instance can manage. As per some of the industry benchmarks available, a Job Tracker instance can manage at most around 4,000 Task Trackers at the moment, and the new Hadoop version aims to scale this to tens of thousands of nodes.

Let us see how the new version aims to achieve this scale.

Let us get introduced to the new roles in the new architecture:

A significant chunk of the processing- and memory-intensive activity of the Job Tracker and Task Tracker is moved into a new entity named the Application Master (AM).

The Job Tracker is mainly left responsible for resource management and for allocating resources when demanded by an AM. Resource management mainly includes:

o Tracking the nodes (NMs; see below for the definition of NM) that are available for task execution.

o Keeping track of the available resources on these nodes (at the moment, the memory available for task execution on each node).

The component owning this reduced functionality of the Job Tracker is named the Resource Manager (RM) in the new architecture.


The Application Master (AM) is responsible for initiating and tracking map/reduce task execution. For every map/reduce task that needs to be executed, the AM asks the RM to allocate resources on any of the available task execution nodes (NMs). While asking for a resource allocation for a task, the AM specifies the resource requirements of that task. For every job there is one and only one AM in the cluster.

The Task Tracker is mainly left responsible for the allocation and deallocation of Task Runners. The Task Runner is referred to generically as a Container in the new architecture. This reduced functionality of the Task Tracker is named the Node Manager (NM).
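To make this division of labour concrete, here is a rough Java sketch of the AM-side logic. ResourceManagerClient, NodeManagerClient, ContainerLease, MapReduceAppMaster, the 1024 MB figure and the command string are all illustrative assumptions for this blog, not the actual Hadoop 0.23 protocol classes.

// Illustrative stand-ins only; these are not the real Hadoop 0.23 interfaces.
interface ResourceManagerClient {
    // The AM asks the RM for a container with a stated memory requirement,
    // optionally preferring the node that holds the data split.
    ContainerLease requestContainer(int memoryMb, String preferredHost);
}

interface NodeManagerClient {
    // The AM asks the granted NM to launch the task process (the "container").
    void launchContainer(ContainerLease lease, String taskCommand);
}

class ContainerLease {
    final String nodeHost;
    final String containerId;

    ContainerLease(String nodeHost, String containerId) {
        this.nodeHost = nodeHost;
        this.containerId = containerId;
    }
}

class MapReduceAppMaster {
    private final ResourceManagerClient rm;
    private final NodeManagerClient nm;

    MapReduceAppMaster(ResourceManagerClient rm, NodeManagerClient nm) {
        this.rm = rm;
        this.nm = nm;
    }

    // One container per map task: the AM owns task scheduling and tracking,
    // while the RM only hands out resources and the NM only launches processes.
    void runMapTasks(java.util.List<String> splitLocations) {
        for (String splitHost : splitLocations) {
            ContainerLease lease = rm.requestContainer(1024, splitHost);
            nm.launchContainer(lease, "run map task for split on " + splitHost);
        }
    }
}

The point of the split is visible in the sketch: the RM's involvement ends once the container lease is granted, so its per-job state stays small, while all task-level bookkeeping lives in the per-job AM.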

The following image segregates the different roles of the Job Tracker and Task Tracker in its upper half and shows how those roles are mapped to the components of the new architecture in its lower half:

[Figure: mapping of current Map-Reduce components (Client, Job Tracker, Task Tracker, Task Runner) to the new components (Client, RM, AM, NM, Container)]

o Client remains the Client in both architectures.

o Job Tracker: tracks the available TTs and their task execution capacity, and assigns tasks to a TT based on available capacity. In the new architecture this becomes the Resource Manager (RM).

o Job Tracker: maintains the progress and status of all running jobs and their tasks, and interacts with TTs for task scheduling, task status, progress and state transitions; this function consumes the majority of the JT. In the new architecture this moves to the Application Master (AM).

o Task Tracker: allocates and deallocates Task Runners based on task scheduling requests from the JT. In the new architecture this becomes the Node Manager (NM).

o Task Tracker: coordinates and tracks the execution of tasks and their progress with the Task Runner and feeds the status back to the JT. In the new architecture this moves to the AM.

o Task Runner: executes the map/reduce tasks. In the new architecture this becomes the Container.


The following image explains in detail how the various components in the new architecture interact with each other to execute a Map-Reduce job; the numbered steps shown in the image are described in detail below.

[Figure: Map-Reduce job execution sequence across Client, RM, NM, AM and Container (map/reduce task runner)]


1. As new NM instances are provisioned in the cluster, they register their presence with the RM along with the available resources (memory) on their host machines. Once registered, an NM periodically heartbeats the RM with its own status and the status of the containers running on its host. If an NM fails to heartbeat within the expected time window, the RM considers it failed and starts the recovery procedures (a small bookkeeping sketch of this follows the steps below).

2. The Map-Reduce client defines the job execution environment (the AM executable, which in this case is the Map-Reduce executable; the resource requirements of the AM and of the map/reduce tasks; and the data that needs to be analyzed with Map-Reduce), splits the data into smaller, independently analyzable chunks (typically each chunk is the size of an HDFS block), and submits the job to the RM for execution.

3. On receiving a job submission, the RM selects one of the registered NMs to host the AM for that job instance. There can only be one AM instance per job in the cluster. Once initiated, the AM is completely responsible for the Map-Reduce execution of that job.

4. When instructed by the RM, the NM launches a Java process to run the AM executable.

5. A reference to the AM process is passed back to the Map-Reduce client, which uses the AM interface to administer and track/inquire about the status of the Map-Reduce job.

6. Each data split is mapped by a separate map task. For each map and reduce task, the AM asks the RM to allocate a container (a JVM that will execute the map/reduce task) in the cluster.

7. The RM selects the appropriate NM hosts for container allocation based on the resource availability at each NM and the Map-Reduce tasks' resource requirements. This list of selected NM hosts is returned to the AM.

8. The AM coordinates with the NMs to launch the containers and passes on the details of the Map-Reduce executables and their execution environment to the NMs.

9. The NM launches a new Map-Reduce task execution container based on the AM's specification.

10. Once a container is launched, the AM and the container interact directly with each other for task execution, status tracking, recovery from failed/slow tasks, etc.
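The heartbeat-based failure detection from step 1 amounts to simple bookkeeping on the RM side. The class name, the ten-minute window and the data structure below are assumptions for illustration, not the RM's actual internals.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of heartbeat bookkeeping on the RM side.
class NodeLivenessTracker {
    // Assumed expiry window; the real value is a configurable timeout.
    private static final long EXPIRY_MS = 10 * 60 * 1000L;

    private final Map<String, Long> lastHeartbeat =
            new ConcurrentHashMap<String, Long>();

    // Called whenever an NM registers or sends a periodic heartbeat.
    void recordHeartbeat(String nodeId) {
        lastHeartbeat.put(nodeId, System.currentTimeMillis());
    }

    // Scanned periodically; a node past the window is treated as failed and
    // the tasks in its containers become candidates for re-execution.
    boolean isExpired(String nodeId) {
        Long last = lastHeartbeat.get(nodeId);
        return last == null || System.currentTimeMillis() - last > EXPIRY_MS;
    }
}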

With this reduced responsibility, the load on the RM (the erstwhile Job Tracker) is significantly lower; hence it is able to support a much larger cluster of NMs and many more concurrent jobs.

Notice that the AM is a generic entity, not restricted to running only Map-Reduce tasks. The interfaces are generic enough to host other styles of data processing, e.g. Master-Worker. In the new architecture, Map-Reduce becomes a user-land library that is executed by the AM.
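Because Map-Reduce is just a library running inside an AM, the client-side code barely changes. Here is a minimal sketch using the newer org.apache.hadoop.mapreduce API: the job name and input/output paths are placeholders, the identity Mapper/Reducer base classes stand in for real implementations, and the mapreduce.framework.name setting is how later releases select the YARN-based runtime, so treat that detail as an assumption here.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class YarnMapReduceClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: selects the YARN-based runtime in later releases.
        conf.set("mapreduce.framework.name", "yarn");

        Job job = Job.getInstance(conf, "log-analysis");
        job.setJarByClass(YarnMapReduceClient.class);

        // Identity mapper/reducer placeholders; real jobs plug in their own.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);

        // With the default TextInputFormat the identity mapper emits
        // (byte offset, line) pairs, hence these output types.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("/data/input"));
        FileOutputFormat.setOutputPath(job, new Path("/data/output"));

        // The framework submits the job, launches the per-job AM and
        // polls it for status until completion.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}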

There are many other changes in release 0.23. Some of these are HDFS federation (which aims at scaling the HDFS NameNode and reducing the impact of NameNode failure), improved resource utilization in HDFS (e.g. connection pooling with Data Nodes), job recovery if the RM/AM fails mid-run, improved Map-Reduce performance, and node-local Map-Reduce tasks for smaller-sized jobs. I will be covering them in detail in other blogs.