Resource Management with YARN: YARN Past, Present and Future


TRANSCRIPT

Page 1: Resource Management with YARN: YARN Past, Present and  Future


Resource Management with YARN: YARN Past, Present and Future
Anubhav Dhoot
Software Engineer
Cloudera

Page 2: Resource Management with YARN: YARN Past, Present and  Future

Resource Management

[Diagram: MapReduce, Impala, and Spark running on top of YARN (dynamic resource management)]

Page 3: Resource Management with YARN: YARN Past, Present and  Future

YARN (Yet Another Resource Negotiator)

Traditional Operating System
  Storage: File System
  Execution/Scheduling: Processes / Kernel Scheduler

Hadoop
  Storage: Hadoop Distributed File System (HDFS)
  Execution/Scheduling: Yet Another Resource Negotiator (YARN)

Page 4: Resource Management with YARN: YARN Past, Present and  Future

Overview of Talk

• History of YARN
• Recent features
• Ongoing features
• Future

Page 5: Resource Management with YARN: YARN Past, Present and  Future

WHY YARN

Page 6: Resource Management with YARN: YARN Past, Present and  Future

Traditional Distributed Execution Engines

[Diagram: Clients submit jobs to a single Master, which runs Tasks on a set of Workers]

Page 7: Resource Management with YARN: YARN Past, Present and  Future

MapReduce v1 (MR1)

[Diagram: Clients submit jobs to the JobTracker, which schedules Map and Reduce tasks on TaskTrackers]

JobTracker tracks every task in the cluster!

Page 8: Resource Management with YARN: YARN Past, Present and  Future

MR1 Utilization

[Diagram: a 4 GB TaskTracker divided into fixed 1024 MB Map and Reduce slots]

Fixed-size slot model forces slots large enough for the biggest task!

Page 9: Resource Management with YARN: YARN Past, Present and  Future

Running multiple frameworks…

[Diagram: three separate master/worker clusters, one per framework, each with its own Clients, Master, Workers, and Tasks]

Page 10: Resource Management with YARN: YARN Past, Present and  Future

YARN to the rescue!

• Scalability: Track only applications, not all tasks.
• Utilization: Allocate only as many resources as needed.
• Multi-tenancy: Share resources between frameworks and users.
• Physical resources – memory, CPU, disk, network

Page 11: Resource Management with YARN: YARN Past, Present and  Future

YARN Architecture

[Diagram: Clients submit applications to the ResourceManager, which tracks Cluster State and Applications State; NodeManagers host the AppMaster and its Containers]

Page 12: Resource Management with YARN: YARN Past, Present and  Future

MR1 to YARN/MR2 functionality mapping

• JobTracker is split into
  • ResourceManager – cluster management, scheduling, and application state handling
  • ApplicationMaster – handles tasks (containers) per application (e.g. an MR job)
  • JobHistoryServer – serves MR history
• TaskTracker maps to NodeManager

Page 13: Resource Management with YARN: YARN Past, Present and  Future

EARLY FEATURES

Page 14: Resource Management with YARN: YARN Past, Present and  Future

Handling faults on Workers

[Diagram: when a NodeManager fails, the ResourceManager reschedules the affected AppMaster and Containers on the remaining NodeManagers]

Page 15: Resource Management with YARN: YARN Past, Present and  Future

Master Fault-tolerance - RM Recovery

[Diagram: the ResourceManager persists Cluster State and Applications State to an RM Store and reloads them on restart]

Page 16: Resource Management with YARN: YARN Past, Present and  Future

Master Node Fault Tolerance: High Availability (Active / Standby)

[Diagram: an Active ResourceManager and a Standby ResourceManager share an RM Store; ZooKeeper-based Electors determine which instance is Active, and Clients and NodeManagers talk to the Active one]

Page 17: Resource Management with YARN: YARN Past, Present and  Future

Master Node Fault Tolerance: High Availability (Active / Standby)

[Diagram: the same Active / Standby setup after a failover: the former Standby ResourceManager has taken over as Active, and NodeManagers and Clients reconnect to it]
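For reference, a rough yarn-site.xml sketch of an Active/Standby pair with ZooKeeper-based election; the property names are the standard RM HA ones, while the rm ids, hostnames, and ZooKeeper quorum are made up for illustration:

<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>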

Page 18: Resource Management with YARN: YARN Past, Present and  Future

Scheduler

• Inside ResourceManager
• Decides who gets to run, when, and where
• Uses “Queues” to describe organization needs
• Applications are submitted to a queue
• Two schedulers out of the box (see the configuration sketch below)
  • Fair Scheduler
  • Capacity Scheduler
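The scheduler is chosen in yarn-site.xml; a minimal sketch selecting the Fair Scheduler (the class name below is the stock Hadoop one, and the Capacity Scheduler is selected the same way with its own class):

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>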

Page 19: Resource Management with YARN: YARN Past, Present and  Future

Fair Scheduler Hierarchical Queues

Root (Mem Capacity: 12 GB, CPU Capacity: 24 cores)
  Marketing (Fair Share Mem: 4 GB, Fair Share CPU: 8 cores)
  R&D (Fair Share Mem: 4 GB, Fair Share CPU: 8 cores)
  Sales (Fair Share Mem: 4 GB, Fair Share CPU: 8 cores)
    Jim’s Team (Fair Share Mem: 2 GB, Fair Share CPU: 4 cores)
    Bob’s Team (Fair Share Mem: 2 GB, Fair Share CPU: 4 cores)
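A hierarchy like this is normally declared in the Fair Scheduler allocations file (fair-scheduler.xml). The sketch below is illustrative only; the queue names mirror the example above, and the equal weights (not taken from the slide) are what give each top-level queue an equal share:

<allocations>
  <queue name="marketing">
    <weight>1.0</weight>
  </queue>
  <queue name="rnd">
    <weight>1.0</weight>
  </queue>
  <queue name="sales">
    <weight>1.0</weight>
    <queue name="jims_team"/>
    <queue name="bobs_team"/>
  </queue>
</allocations>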

Page 20: Resource Management with YARN: YARN Past, Present and  Future

Fair Scheduler Queue Placement Policies

<queuePlacementPolicy>
  <rule name="specified" />
  <rule name="primaryGroup" create="false" />
  <rule name="default" />
</queuePlacementPolicy>
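The rules are evaluated in order: "specified" uses the queue named in the application submission, "primaryGroup" places the app in a queue named after the user's primary Unix group (create="false" means the rule only matches if that queue already exists), and "default" is the final fallback.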

Page 21: Resource Management with YARN: YARN Past, Present and  Future

Multi-Resource Scheduling

● Node capacities expressed in both memory and CPU
● Memory in MB and CPU in terms of vcores (see the configuration sketch below)
● Scheduler uses dominant resource for making decisions
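For context, per-node capacities come from the NodeManager configuration; a sketch with the standard property names and arbitrary example values:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>12288</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>24</value>
</property>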

Page 22: Resource Management with YARN: YARN Past, Present and  Future

Multi-Resource Scheduling

Queue 1 Usage: 12 GB (33% of memory capacity), 3 cores (25% of CPU capacity)
Queue 2 Usage: 10 GB (28% of memory capacity), 6 cores (50% of CPU capacity)

Queue 1's dominant share is memory (33%); Queue 2's dominant share is CPU (50%), so the scheduler treats Queue 2 as the heavier user when deciding who gets the next allocation.

Page 23: Resource Management with YARN: YARN Past, Present and  Future

Multi-Resource Enforcement

● YARN kills containers that use too much memory
● CGroups for limiting CPU
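CPU limiting with CGroups is typically wired up through the LinuxContainerExecutor and its CGroups resource handler; a hedged yarn-site.xml sketch (the property and class names are the usual Hadoop 2.x ones; additional cgroups mount settings may be needed on a real cluster):

<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>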

Page 24: Resource Management with YARN: YARN Past, Present and  Future

RECENTLY ADDED FEATURES

Page 25: Resource Management with YARN: YARN Past, Present and  Future

RM recovery without losing work

• Preserving running containers on RM restart
• NM no longer kills containers on resync
• AM made to register on resync with RM
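Work-preserving RM restart is driven by ResourceManager configuration; a sketch using the ZooKeeper-backed state store (the property names below are the standard ones, though defaults and availability vary by release):

<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>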

Page 26: Resource Management with YARN: YARN Past, Present and  Future

RM recovery without losing work

[Diagram: on restart, the ResourceManager rebuilds Cluster State and Applications State from the RM Store while the AppMaster and Containers keep running on the NodeManagers]

Page 27: Resource Management with YARN: YARN Past, Present and  Future

NM Recovery without losing work

• NM stores container and its associated state in a local store
• On restart, reconstructs state from the store
• Default implementation using LevelDB
• Supports rolling restarts with no user impact
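A minimal sketch of the NodeManager-side settings (standard property names; the recovery directory path is just a placeholder):

<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/hadoop-yarn/nm-recovery</value>
</property>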

Page 28: Resource Management with YARN: YARN Past, Present and  Future

NM Recovery without losing work

[Diagram: the NodeManager persists container state to a local State Store and reconstructs it after a restart; the ResourceManager, AppMaster, Containers, and Clients are unaffected]

Page 29: Resource Management with YARN: YARN Past, Present and  Future

Fair Scheduler Dynamic User Queues

Root (Mem Capacity: 12 GB, CPU Capacity: 24 cores)
  Marketing (Fair Share Mem: 4 GB, Fair Share CPU: 8 cores)
  R&D (Fair Share Mem: 4 GB, Fair Share CPU: 8 cores)
  Sales (Fair Share Mem: 4 GB, Fair Share CPU: 8 cores)
    Moe (Fair Share Mem: 4 GB, Fair Share CPU: 8 cores) while Moe is the only active user
    Larry (Fair Share Mem: 2 GB, Fair Share CPU: 4 cores) and Moe (Fair Share Mem: 2 GB, Fair Share CPU: 4 cores) once both users are active

Page 30: Resource Management with YARN: YARN Past, Present and  Future

ONGOING FEATURES

Page 31: Resource Management with YARN: YARN Past, Present and  Future

Long Running Apps on Secure Clusters (YARN-896)

● Update tokens of running applications
● Reset AM failure count to allow multiple failures over a long time
● Need to access logs while application is running
● Need a way to show progress

Page 32: Resource Management with YARN: YARN Past, Present and  Future

Application Timeline Server (YARN-321, YARN-1530)

● Currently we have a JobHistoryServer for MapReduce history
● Generic history server
● Gives information even while job is running

Page 33: Resource Management with YARN: YARN Past, Present and  Future

Application Timeline Server

● Store and serve generic data like when containers ran, container logs
● Apps post app-specific events
  o e.g. MapReduce Attempt Succeeded/Failed
● Pluggable framework-specific UIs
● Pluggable storage backend
  o Default: LevelDB

Page 34: Resource Management with YARN: YARN Past, Present and  Future

Disk scheduling (YARN-2139)

● Disk as a resource in addition to CPU and Memory
● Expressed as virtual disk, similar to vcore for CPU
● Dominant resource fairness can handle this on the scheduling side
● Use CGroups blkio controller for enforcement

Page 35: Resource Management with YARN: YARN Past, Present and  Future

Reservation-based Scheduling (YARN-1051)

Page 36: Resource Management with YARN: YARN Past, Present and  Future

Reservation-based Scheduling

Page 37: Resource Management with YARN: YARN Past, Present and  Future

FUTURE FEATURES

Page 38: Resource Management with YARN: YARN Past, Present and  Future

Container Resizing (YARN-1197)

● Change container’s resource allocation
● Very useful for frameworks like Spark that schedule multiple tasks within a container
● Follow same paths as for acquiring and releasing containers

Page 39: Resource Management with YARN: YARN Past, Present and  Future

Admin labels (YARN-796)

● Admin tags nodes with labels (e.g. GPU)
● Applications can include labels in container requests

[Diagram: an Application Master asking “I want a GPU” is matched to the NodeManager labeled [GPU, beefy] rather than the one labeled [Windows]]

Page 40: Resource Management with YARN: YARN Past, Present and  Future

Container Delegation (YARN-1488)

● Problem: single process wants to run work on behalf of multiple users.

● Want to count resources used against users that use them.

● E.g. Impala or HDFS caching

Page 41: Resource Management with YARN: YARN Past, Present and  Future

Container Delegation (YARN-1488)

● Solution: let apps “delegate” their containers to other containers on the same node.

● Delegated container never runs
● Framework container gets its resources
● Framework container responsible for fairness within itself

Page 42: Resource Management with YARN: YARN Past, Present and  Future

Questions?

Page 43: Resource Management with YARN: YARN Past, Present and  Future


Thank You!
Anubhav Dhoot, Software Engineer, [email protected]