
    Scaling Hadoop Map-Reduce in version 0.23

As businesses become more and more technology dependent to gain competitive advantage, their systems generate large amounts of feedback and usage data. While this data can unlock hugely valuable, business-critical information and help fine-tune services, it often lies unused in backend systems on disks and tapes. This is mainly because most of this data is unstructured and vast, and there are few well-established, cost-effective alternatives for analyzing it.

This is where Hadoop comes to the rescue. As Hadoop deployments grow larger, its main components, HDFS and Map-Reduce, are confronted with scalability concerns. The Hadoop team is addressing some of these aspects in release 0.23, planned for release in early 2012. One of the major changes in this effort is a rearchitecture of the Hadoop Map-Reduce infrastructure, focused on debottlenecking the Job Tracker, one of the major components of the Map-Reduce infrastructure.

This blog compares the new Map-Reduce architecture with the existing one and highlights how it addresses the associated scale issues.

The image below shows the four main components of the Map-Reduce infrastructure: the Client, the Job Tracker, the Task Tracker and the Task Runner.

Client creates a job, splits the data that needs Map-Reduce processing into smaller chunks, submits it to the Job Tracker for execution, and probes the Job Tracker for job status.

    Job Tracker (JT) Mainly responsible for -



Cluster resource management - Keeps track of the available Task Trackers in the cluster and each Task Tracker's capacity to execute concurrent map and reduce tasks.

Job management - Complete management of job state and its execution, including creating and scheduling map/reduce tasks on Task Tracker nodes, guiding Task Trackers through the various state transitions associated with a task's execution, tracking task progress, and running speculative tasks (speculative tasks are run in lieu of slow-running tasks).

Monitoring the health of Task Trackers and recovery of failed tasks/jobs when a Task Tracker fails in the middle of task execution.

Maintaining job progress, history and diagnostics information.

Task Tracker (TT) Responsible for initiating map/reduce task execution on a cluster node. It does so by spawning a new process for task execution, called the task runner.

Task Runner Executes the actual map/reduce tasks and reports progress to the Task Tracker, which in turn reports it to the Job Tracker; the Job Tracker guides the various state transitions related to task management.
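As a quick illustration of this reporting chain, here is a minimal sketch of a classic-API mapper as the task runner executes it. The LineCountMapper class and its key/value types are assumptions for this example, but OutputCollector, MapReduceBase and the Reporter handle are the standard org.apache.hadoop.mapred mechanisms through which a task feeds progress back to its Task Tracker, which in turn heartbeats it to the Job Tracker.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Runs inside the task runner process spawned by the Task Tracker.
public class LineCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, LongWritable> output,
                    Reporter reporter) throws IOException {
        // Emit one count per input line.
        output.collect(new Text("lines"), new LongWritable(1));

        // Progress and status flow upward:
        // task runner -> Task Tracker heartbeat -> Job Tracker.
        reporter.setStatus("processed record at offset " + offset.get());
        reporter.progress();
    }
}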

There can only be one instance of the Job Tracker in a Hadoop cluster, which makes it a single point of scale bottleneck and failure. If the Job Tracker fails, all information related to currently running jobs and tasks is lost, and all jobs need to be executed again once the Job Tracker recovers. A large Hadoop cluster could have thousands of Task Trackers, and cluster size is mostly limited by how many Task Trackers a single Job Tracker instance can manage. As per some of the industry benchmarks available, a Job Tracker instance can manage at most around 4,000 Task Trackers at the moment, and the new Hadoop version aims to scale this to tens of thousands of nodes.

Let us see how the new version aims to achieve this scale.

Let us get introduced to the new roles in the new architecture:

A significant chunk of the processing- and memory-intensive activity of the Job Tracker and Task Tracker is moved into a new entity named the Application Master (AM).

The Job Tracker is mainly left responsible for resource management and for allocating resources when demanded by an AM. Resource management mainly includes:

o Tracking the nodes (NMs; see below for the definition of NM) that are available for task execution.

o Keeping track of the available resources on these nodes (at the moment, the memory available for task execution on each node).

The component owning this reduced functionality of the Job Tracker is named the Resource Manager (RM) in the new architecture.


The Application Master (AM) is responsible for initiating and tracking map/reduce task execution. For every map/reduce task that needs to be executed, the AM asks the RM to allocate resources on any of the available task execution nodes (NMs). While asking for a resource allocation for a task, the AM specifies the resource requirements of that task. For every job there is one and only one AM in the cluster.

The Task Tracker is mainly left responsible for the allocation and deallocation of Task Runners. The Task Runner is referred to generically as a Container in the new architecture. This reduced functionality of the Task Tracker is named the Node Manager (NM).
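To make this division of labour concrete, here is a rough Java sketch of the AM-side logic. ResourceManagerClient, NodeManagerClient, ContainerLease, MapReduceAppMaster, the 1024 MB figure and the command string are all illustrative assumptions for this blog, not the actual Hadoop 0.23 protocol classes.

// Illustrative stand-ins only; these are not the real Hadoop 0.23 interfaces.
interface ResourceManagerClient {
    // The AM asks the RM for a container with a stated memory requirement,
    // optionally preferring the node that holds the data split.
    ContainerLease requestContainer(int memoryMb, String preferredHost);
}

interface NodeManagerClient {
    // The AM asks the granted NM to launch the task process (the "container").
    void launchContainer(ContainerLease lease, String taskCommand);
}

class ContainerLease {
    final String nodeHost;
    final String containerId;

    ContainerLease(String nodeHost, String containerId) {
        this.nodeHost = nodeHost;
        this.containerId = containerId;
    }
}

class MapReduceAppMaster {
    private final ResourceManagerClient rm;
    private final NodeManagerClient nm;

    MapReduceAppMaster(ResourceManagerClient rm, NodeManagerClient nm) {
        this.rm = rm;
        this.nm = nm;
    }

    // One container per map task: the AM owns task scheduling and tracking,
    // while the RM only hands out resources and the NM only launches processes.
    void runMapTasks(java.util.List<String> splitLocations) {
        for (String splitHost : splitLocations) {
            ContainerLease lease = rm.requestContainer(1024, splitHost);
            nm.launchContainer(lease, "run map task for split on " + splitHost);
        }
    }
}

The point of the split is visible in the sketch: the RM's involvement ends once the container lease is granted, so its per-job state stays small, while all task-level bookkeeping lives in the per-job AM.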

The following image segregates the different roles of the Job Tracker and Task Tracker in its upper half and shows how those roles are mapped to the components of the new architecture in its lower half:

[Figure: mapping of current Map-Reduce components (Client, Job Tracker, Task Tracker, Task Runner) to the new components (Client, RM, AM, NM, Container)]

o Client remains the Client in both architectures.

o Job Tracker: tracks the available TTs and their task execution capacity, and assigns tasks to a TT based on available capacity. In the new architecture this becomes the Resource Manager (RM).

o Job Tracker: maintains the progress and status of all running jobs and their tasks, and interacts with TTs for task scheduling, task status, progress and state transitions; this function consumes the majority of the JT. In the new architecture this moves to the Application Master (AM).

o Task Tracker: allocates and deallocates Task Runners based on task scheduling requests from the JT. In the new architecture this becomes the Node Manager (NM).

o Task Tracker: coordinates and tracks the execution of tasks and their progress with the Task Runner and feeds the status back to the JT. In the new architecture this moves to the AM.

o Task Runner: executes the map/reduce tasks. In the new architecture this becomes the Container.


The following image explains in detail how the various components in the new architecture interact with each other to execute a Map-Reduce job; the numbered steps shown in the image are described in detail below.

[Figure: Map-Reduce job execution sequence across Client, RM, NM, AM and Container (map/reduce task runner)]


1. As new NM instances are provisioned in the cluster, they register their presence with the RM along with the available resources (memory) on their host machines. Once registered, an NM periodically heartbeats the RM with its own status and the status of the containers running on its host. If an NM fails to heartbeat within the expected time window, the RM considers it failed and starts the recovery procedures (a small bookkeeping sketch of this follows the steps below).

2. The Map-Reduce client defines the job execution environment (the AM executable, which in this case is the Map-Reduce executable; the resource requirements of the AM and of the map/reduce tasks; and the data that needs to be analyzed with Map-Reduce), splits the data into smaller, independently analyzable chunks (typically each chunk is the size of an HDFS block), and submits the job to the RM for execution.

3. On receiving a job submission, the RM selects one of the registered NMs to host the AM for that job instance. There can only be one AM instance per job in the cluster. Once initiated, the AM is completely responsible for the Map-Reduce execution of that job.

4. When instructed by the RM, the NM launches a Java process to run the AM executable.

5. A reference to the AM process is passed back to the Map-Reduce client, which uses the AM interface to administer and track/inquire about the status of the Map-Reduce job.

6. Each data split is mapped by a separate map task. For each map and reduce task, the AM asks the RM to allocate a container (a JVM that will execute the map/reduce task) in the cluster.

7. The RM selects the appropriate NM hosts for container allocation based on the resource availability at each NM and the Map-Reduce tasks' resource requirements. This list of selected NM hosts is returned to the AM.

8. The AM coordinates with the NMs to launch the containers and passes on the details of the Map-Reduce executables and their execution environment to the NMs.

9. The NM launches a new Map-Reduce task execution container based on the AM's specification.

10. Once a container is launched, the AM and the container interact directly with each other for task execution, status tracking, recovery from failed/slow tasks, etc.
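The heartbeat-based failure detection from step 1 amounts to simple bookkeeping on the RM side. The class name, the ten-minute window and the data structure below are assumptions for illustration, not the RM's actual internals.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of heartbeat bookkeeping on the RM side.
class NodeLivenessTracker {
    // Assumed expiry window; the real value is a configurable timeout.
    private static final long EXPIRY_MS = 10 * 60 * 1000L;

    private final Map<String, Long> lastHeartbeat =
            new ConcurrentHashMap<String, Long>();

    // Called whenever an NM registers or sends a periodic heartbeat.
    void recordHeartbeat(String nodeId) {
        lastHeartbeat.put(nodeId, System.currentTimeMillis());
    }

    // Scanned periodically; a node past the window is treated as failed and
    // the tasks in its containers become candidates for re-execution.
    boolean isExpired(String nodeId) {
        Long last = lastHeartbeat.get(nodeId);
        return last == null || System.currentTimeMillis() - last > EXPIRY_MS;
    }
}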

With this reduced responsibility, the load on the RM (the erstwhile Job Tracker) is significantly lower; hence it is able to support a much larger cluster of NMs and many more concurrent jobs.

Notice that the AM is a generic entity, not restricted to running only Map-Reduce tasks. The interfaces are generic enough to host other styles of data processing, e.g. Master-Worker. In the new architecture, Map-Reduce becomes a user-land library that is executed by the AM.
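Because Map-Reduce is just a library running inside an AM, the client-side code barely changes. Here is a minimal sketch using the newer org.apache.hadoop.mapreduce API: the job name and input/output paths are placeholders, the identity Mapper/Reducer base classes stand in for real implementations, and the mapreduce.framework.name setting is how later releases select the YARN-based runtime, so treat that detail as an assumption here.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class YarnMapReduceClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: selects the YARN-based runtime in later releases.
        conf.set("mapreduce.framework.name", "yarn");

        Job job = Job.getInstance(conf, "log-analysis");
        job.setJarByClass(YarnMapReduceClient.class);

        // Identity mapper/reducer placeholders; real jobs plug in their own.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);

        // With the default TextInputFormat the identity mapper emits
        // (byte offset, line) pairs, hence these output types.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("/data/input"));
        FileOutputFormat.setOutputPath(job, new Path("/data/output"));

        // The framework submits the job, launches the per-job AM and
        // polls it for status until completion.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}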

There are many other changes in release 0.23. Some of these are HDFS federation (which aims at scaling the HDFS NameNode and reducing the impact of NameNode failure), improved resource utilization in HDFS (e.g. connection pooling with Data Nodes), job recovery if the RM/AM fails mid-run, improved Map-Reduce performance, and node-local Map-Reduce tasks for smaller-sized jobs. I will be covering them in detail in other blogs.