Understanding YARN - Pune Apex Meetup, Jan 06 2016

Apache Apex Meetup Building YARN Application Priyanka Gugale January 06, 2016

Upload: priyanka-gugale

Post on 12-Apr-2017


Apache Apex Meetup

Building YARN Application

Priyanka Gugale, January 06, 2016


Agenda

● Understanding YARN
  ○ Why YARN
  ○ Introducing YARN
  ○ YARN architecture
  ○ Beyond batch
  ○ Application lifecycle

● Building YARN application


Why YARN

Hadoop v1 (MR1) Architecture

● Job Tracker
  ○ Manages cluster resources
  ○ Job scheduling
● Task Tracker
  ○ Per-node agent
  ○ Manages tasks

[Figure: Hadoop v1 (MR1) architecture. Components shown: two Clients, one JobTracker, and three TaskTrackers, each running Tasks. Arrow types in the legend: "MapReduce Status" and "Job Submission".]


Why YARN (Cont…)

Limitations with MR1

● Scalability
  ○ Maximum cluster size: 4,000 nodes
  ○ Maximum concurrent tasks: 40,000
● Availability
● Resource utilization
● Running non-MapReduce applications
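The resource-utilization limitation can be illustrated with a toy calculation. In MR1 each TaskTracker is configured with a fixed number of map slots and reduce slots, and a map task cannot use an idle reduce slot. The sketch below is plain Java, not Hadoop code: the class name, method names, and all numbers are assumptions chosen for illustration, and containers are treated as equal-sized for simplicity. It only counts how many pending tasks could start under each model.

```java
// Toy model, not Hadoop code: all names and numbers are illustrative assumptions.
class SlotUtilizationSketch {

    // MR1 style: map tasks may only use map slots, reduce tasks only reduce slots.
    static int startableWithFixedSlots(int mapSlots, int reduceSlots,
                                       int pendingMaps, int pendingReduces) {
        return Math.min(mapSlots, pendingMaps) + Math.min(reduceSlots, pendingReduces);
    }

    // YARN style: capacity is one shared pool; any task type may take a container.
    static int startableWithSharedContainers(int totalContainers,
                                             int pendingMaps, int pendingReduces) {
        return Math.min(totalContainers, pendingMaps + pendingReduces);
    }

    public static void main(String[] args) {
        int pendingMaps = 10;    // map-heavy phase of a job
        int pendingReduces = 0;  // no reduce work is ready yet
        int fixed = startableWithFixedSlots(6, 4, pendingMaps, pendingReduces);
        int shared = startableWithSharedContainers(10, pendingMaps, pendingReduces);
        System.out.println("fixed slots:       " + fixed + " tasks start");
        System.out.println("shared containers: " + shared + " tasks start");
    }
}
```

With the same total capacity of ten, the fixed split leaves the four reduce slots idle during a map-only phase, while the shared pool can start all ten tasks.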


Introducing YARN

● YARN - Yet Another Resource Negotiator

● Framework that facilitates writing arbitrary distributed processing frameworks and applications.

● YARN applications/frameworks: e.g. MapReduce2, Apache Spark, Apache Giraph, Apache Apex, etc.


Hadoop beyond Batch

YARN for better resource utilization

More applications than MapReduce


Introducing YARN

[Figure: MapReduce 1 components side by side with their YARN counterparts. MapReduce 1: Job Tracker, Task Tracker, Map Slot, Reduce Slot. YARN: Resource Manager, Application Master, Timeline Server, Node Manager.]

Proprietary and Confidential


Hadoop v2 (YARN) Architecture

● Resource Manager
  ○ Manages and allocates cluster resources
  ○ Application scheduling
  ○ Applications Manager
● Node Manager
  ○ Per-machine agent
  ○ Manages life-cycle of containers
  ○ Monitors resources
● Application Master
  ○ Per-application
  ○ Manages application scheduling and task execution

[Figure: Hadoop v2 (YARN) architecture. Components shown: two Clients, one ResourceManager, and three NodeManagers hosting App Masters and containers (Cntr). Arrow types in the legend: "MapReduce Status", "Job Submission", "Node Status", "Resource Request".]
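The division of labour between the Resource Manager and the Node Managers can be sketched as a small in-memory model. This is a simplification under stated assumptions: the class and method names are hypothetical (they are not Hadoop classes), memory is the only resource tracked, and placement is naive first-fit, whereas the real scheduler is considerably more elaborate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the RM/NM split; hypothetical names, not Hadoop classes.
class ResourceManagerSketch {

    // Free memory (MB) per node, as last reported by that node's NM.
    private final Map<String, Integer> freeMemoryMb = new LinkedHashMap<>();

    // An NM reports its capacity when it joins the cluster.
    void registerNode(String node, int memoryMb) {
        freeMemoryMb.put(node, memoryMb);
    }

    // Grants a container on the first node with enough free memory,
    // or returns null when no node can fit the request.
    String allocate(int memoryMb) {
        for (Map.Entry<String, Integer> entry : freeMemoryMb.entrySet()) {
            if (entry.getValue() >= memoryMb) {
                entry.setValue(entry.getValue() - memoryMb);
                return entry.getKey();
            }
        }
        return null;
    }

    // Called when a container finishes and its memory is handed back.
    void release(String node, int memoryMb) {
        freeMemoryMb.merge(node, memoryMb, Integer::sum);
    }

    public static void main(String[] args) {
        ResourceManagerSketch rm = new ResourceManagerSketch();
        rm.registerNode("node-1", 2048);
        rm.registerNode("node-2", 4096);
        System.out.println(rm.allocate(1024)); // fits on node-1
        System.out.println(rm.allocate(2048)); // node-1 has 1024 left, so node-2
        System.out.println(rm.allocate(4096)); // no node has 4096 free
    }
}
```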


Application Submission workflow

[Figure: a YarnClient, a node running the RM (ApplicationsManager + Scheduler), and two nodes each running an NM, hosting the Application Master and two Containers.]

Legend: RM = Resource Manager, NM = Node Manager, AM = Application Master. Heartbeats are drawn with their own line style.

1) Submit application
2) Launch Application Master
3) AM registers with RM
4) AM negotiates for containers
5) Launch container (shown once per container in the figure)
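The numbered steps can be walked through in order with a toy event log. This is a minimal sketch, not Hadoop code: the class name, the node names, and the round-robin placement are all assumptions made for illustration, and heartbeats are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy walk-through of the five numbered steps. These are NOT Hadoop classes;
// every name here is a hypothetical stand-in.
class SubmissionWorkflowSketch {

    // Returns the ordered event log for one application.
    static List<String> run(String appName, List<String> nodes, int containersWanted) {
        List<String> log = new ArrayList<>();

        // 1) The client hands the application to the RM.
        log.add("1) client submits " + appName + " to RM");

        // 2) The RM picks a node and asks its NM to start the AM.
        log.add("2) RM launches AM on " + nodes.get(0));

        // 3) The AM announces itself to the RM.
        log.add("3) AM registers with RM");

        // 4) The AM asks the RM for containers; the RM answers with node assignments.
        List<String> granted = new ArrayList<>();
        for (int i = 0; i < containersWanted; i++) {
            granted.add(nodes.get(i % nodes.size())); // naive round-robin placement
        }
        log.add("4) AM negotiates " + granted.size() + " containers");

        // 5) The AM asks the NM on each granted node to launch a container.
        for (String node : granted) {
            log.add("5) launch container on " + node);
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-1", "node-2", "node-3");
        for (String event : run("demo-app", nodes, 2)) {
            System.out.println(event);
        }
    }
}
```

Requesting two containers produces two step-5 events, matching the two "Launch Container" arrows in the figure.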


Building YARN application


Sample YARN application - Client

1. Start the service - YarnClient
   - YarnClient.start()
2. Create Application object - YarnClientApplication
   - YarnClient.createApplication()
3. Set up App Context - ApplicationSubmissionContext
   - ApplicationSubmissionContext represents the information needed by the ResourceManager to launch the ApplicationMaster
   - Fields include AppName, Priority, ContainerLaunchContext, …
4. Submit application to Resource Manager
   - YarnClient.submitApplication(ApplicationSubmissionContext)
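The order of the four client calls can be sketched as runnable Java. To keep the example free of Hadoop dependencies, the nested types below are minimal stand-ins that borrow the class and method names from the slide; they are not the real org.apache.hadoop.yarn classes, and the fields, the `submit` helper, and the AM command are assumptions. A real client would also need to be created and initialized with a configuration before `start()`.

```java
import java.util.ArrayList;
import java.util.List;

// Runnable outline of the four client steps. The nested types are minimal
// stand-ins named after the classes on the slide so that this compiles
// without Hadoop; they are NOT the real org.apache.hadoop.yarn classes.
class ClientStepsSketch {

    // Stand-in: what the RM needs in order to launch the AM.
    static class ApplicationSubmissionContext {
        String applicationName;
        int priority;
        // Stands in for ContainerLaunchContext: the commands that start the AM.
        final List<String> amCommands = new ArrayList<>();
    }

    // Stand-in: handle returned by createApplication().
    static class YarnClientApplication {
        private final ApplicationSubmissionContext context = new ApplicationSubmissionContext();

        ApplicationSubmissionContext getApplicationSubmissionContext() {
            return context;
        }
    }

    // Stand-in client that only records what was submitted.
    static class YarnClient {
        boolean started;
        final List<ApplicationSubmissionContext> submitted = new ArrayList<>();

        void start() {
            started = true;
        }

        YarnClientApplication createApplication() {
            if (!started) throw new IllegalStateException("call start() first");
            return new YarnClientApplication();
        }

        void submitApplication(ApplicationSubmissionContext context) {
            if (!started) throw new IllegalStateException("call start() first");
            submitted.add(context);
        }
    }

    // Hypothetical helper that performs the four steps in order.
    static YarnClient submit(String appName, String amCommand) {
        YarnClient client = new YarnClient();
        client.start();                                          // step 1
        YarnClientApplication app = client.createApplication();  // step 2
        ApplicationSubmissionContext context =
                app.getApplicationSubmissionContext();           // step 3
        context.applicationName = appName;
        context.priority = 0;
        context.amCommands.add(amCommand);
        client.submitApplication(context);                       // step 4
        return client;
    }

    public static void main(String[] args) {
        // The AM command is a placeholder, not a real class name.
        YarnClient client = submit("demo-app", "java MyAppMaster");
        System.out.println("submitted: " + client.submitted.get(0).applicationName);
    }
}
```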


Sample YARN Application - App Master

1. Register App Master with Resource Manager
   - AMRMClient.registerApplicationMaster
2. Negotiate containers from Resource Manager
   - Provides ContainerRequest - request for container resources
   - AMRMClient.addContainerRequest
3. Build ContainerLaunchContext
   - Uses container returned by Resource Manager
   - ContainerLaunchContext represents the information needed by the Node Manager to launch a container
   - Fields include ContainerId, Commands, Environment, LocalResources, …


Sample YARN Application - App Master (cont…)

4. Launch container using NMClient.startContainer
5. Wait till all containers are done
   - AllocateResponse.getCompletedContainersStatuses
6. Unregister application from Resource Manager
   - AMRMClient.unregisterApplicationMaster
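The six Application Master steps form a loop: request containers, start each one as it is granted, and keep polling until all have completed. The sketch below shows that control flow in plain Java. `FakeCluster` and its `AllocateResponse` are hypothetical in-memory stand-ins, not the Hadoop AMRMClient, NMClient, or AllocateResponse classes; only the method names follow the slides. The stub grants every request on the next poll and reports a started container as completed one poll later, which is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Runnable outline of the six Application Master steps. FakeCluster is a
// hypothetical stand-in that plays both RM and NM; it is NOT a Hadoop class.
class AppMasterStepsSketch {

    // Stand-in for the result of one poll of the RM.
    static class AllocateResponse {
        final List<String> allocatedContainers = new ArrayList<>();
        final List<String> completedContainers = new ArrayList<>();
    }

    static class FakeCluster {
        boolean registered;
        int pendingRequests;
        int nextId = 1;
        final List<String> running = new ArrayList<>();

        void registerApplicationMaster() { registered = true; }

        void addContainerRequest() { pendingRequests++; }

        AllocateResponse allocate() {
            AllocateResponse response = new AllocateResponse();
            // Containers started before this poll are reported as finished.
            response.completedContainers.addAll(running);
            running.clear();
            // Every pending request is granted at once.
            while (pendingRequests > 0) {
                response.allocatedContainers.add("container_" + nextId++);
                pendingRequests--;
            }
            return response;
        }

        void startContainer(String containerId, String command) { running.add(containerId); }

        void unregisterApplicationMaster() { registered = false; }
    }

    // Returns the number of containers that ran to completion.
    static int runAppMaster(FakeCluster cluster, int containersWanted, String command) {
        cluster.registerApplicationMaster();                 // step 1
        for (int i = 0; i < containersWanted; i++) {
            cluster.addContainerRequest();                   // step 2
        }
        int completed = 0;
        while (completed < containersWanted) {               // step 5
            AllocateResponse response = cluster.allocate();
            for (String containerId : response.allocatedContainers) {
                // step 3: here the launch context is just the command string
                cluster.startContainer(containerId, command); // step 4
            }
            completed += response.completedContainers.size();
        }
        cluster.unregisterApplicationMaster();               // step 6
        return completed;
    }

    public static void main(String[] args) {
        int done = runAppMaster(new FakeCluster(), 2, "echo hello"); // placeholder command
        System.out.println("completed containers: " + done);
    }
}
```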


References

● Simple YARN code example
  ○ https://github.com/hortonworks/simple-yarn-app
● Document references
  ○ https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
  ○ http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/
  ○ http://www.slideshare.net/


Resources


Apache Apex Community Page

Apache Apex LinkedIn Group

