field notes: yarn meetup at linkedin
DESCRIPTION
Notes from a variety of speakers at the YARN Meetup at LinkedIn on Sept 27, 2013
TRANSCRIPT
YARN Meetup, Sep 2013 @ LinkedIn
By (lots of speakers)
(Editable) Agenda
• Hadoop 2.0 beta – Vinod Kumar Vavilapalli
  – YARN APIs stability
  – Existing applications
• Application History Server – Mayank Bansal
• RM reliability – Bikas Saha, Jian He, Karthik Kambatla
  – RM restartability
  – RM fail-over
• Apache Tez – Hitesh Shah, Siddharth Seth
• Apache Samza
• Apache Giraph
• Apache Helix
• gohadoop: Go YARN application – Arun
• Llama – Alejandro
Hadoop 2.0 beta
Vinod Kumar Vavilapalli
Hadoop 2.0 beta
• Stable YARN APIs
• MR binary compatibility
• Testing with the whole stack
• Ready for prime time!
YARN API stability
• YARN-386
• Broke APIs for one last time (hopefully)
• Exceptions/method names
• Security: tokens used irrespective of Kerberos
• Read-only IDs, factories for creating records
• Protocols renamed!
• Client libraries
Compatibility for existing applications
• MAPREDUCE-5108
• Old mapred APIs are binary compatible (see the sketch below)
• New mapreduce APIs are source compatible
• Pig, Hive, Oozie, etc. work with the latest versions; no need to rewrite your scripts.
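To make the compatibility point concrete, here is a minimal sketch of a driver written purely against the old org.apache.hadoop.mapred API; per MAPREDUCE-5108, a jar like this compiled against Hadoop 1.x should run unchanged on a 2.0 YARN cluster. The class name and paths are placeholders, and the default IdentityMapper/IdentityReducer are assumed so no user classes are needed.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

// Hypothetical legacy driver: everything here is the old mapred API, and the
// default IdentityMapper/IdentityReducer are used so no user classes appear.
public class LegacyMapredDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(LegacyMapredDriver.class);
    conf.setJobName("legacy-identity-job");
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path("/tmp/in"));   // placeholder input
    FileOutputFormat.setOutputPath(conf, new Path("/tmp/out")); // placeholder output
    JobClient.runJob(conf); // same call as on Hadoop 1.x; now lands on YARN
  }
}
```

The new org.apache.hadoop.mapreduce API, in contrast, is only source compatible, so code using it may need a recompile against the 2.0 jars.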
Application History Server
Mayank Bansal
Contributions
Mayank Bansal
Zhijie Shen
Devaraj K
Vinod Kumar Vavilapalli
YARN-321
Why do we need an AHS?
• The Job History Server is MR-specific
• Jobs which are not MR
• RM restart
• Hard-coded limits on the number of jobs
• Longer-running jobs
AHS
• Different process, or embedded in the RM
• Contains generic application data
  • Application
  • Application attempts
  • Containers
• Client interfaces
  • Web UI
  • Client interface
  • Web services

AHS History Store
• Pluggable history store (see the sketch below)
• Storage format is PB (protobuf)
  • Backward compatible
  • Much easier to evolve the storage
• HDFS implementation
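The slides do not show the store API itself; the following is only a hypothetical sketch of what a pluggable history store could look like (interface and method names are invented for illustration, not the actual YARN-321 interface). A protobuf/HDFS-backed implementation would serialize each record to files, and other back-ends could plug in behind the same interface.

```java
// Purely illustrative: a hypothetical shape for a pluggable history store.
// The actual YARN-321 interface and record types may differ; a PB/HDFS
// implementation would serialize each record to files under a history dir.
public interface AppHistoryStore {
  // Write path, driven by the RM as events happen
  void applicationStarted(String appId, String queue, long startTime);
  void applicationFinished(String appId, String finalStatus, long finishTime);
  void attemptStarted(String appId, String attemptId, String amContainerId);
  void attemptFinished(String appId, String attemptId, String trackingUrl);
  void containerFinished(String containerId, int exitStatus, String diagnostics);

  // Read path, driven by the AHS web UI / RPC / web services
  String getApplicationReport(String appId);
}
```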
[Architecture diagram: on app submission and app completion the RM writes generic application data to the history store through a store write interface; the AHS reads it back through a store reading interface and serves it via the web app, RPC, and web services.]
Remaining Work
• Security
• Command Line Interface
Next Phase
• Application-specific data ???
• Long-running services
DEMO
RM reliability
Bikas Saha, Jian He, Karthik Kambatla
RM reliability
• Restartability
• High availability
RM restartability
Jian He, Bikas Saha
Design and Work Plan
• YARN-128 – RM Restart
  – Creates the framework to store and load state information; forms the basis of failover and HA. Work is close to completion and being actively tested.
• YARN-149 – RM HA
  – Adds an HA service to the RM in order to fail over between instances. Work in active progress.
• YARN-556 – Work-preserving RM Restart
  – Supports loss-less recovery of cluster state when the RM restarts or fails over
  – Design proposal is up
• All the work is being done in a carefully planned manner directly on trunk. Code is always stable and ready.
RM Restart (YARN-128)
• Current state of the implementation
• Internal details
• Impact on applications/frameworks
• How to use
RM Restart
• Supports ZooKeeper, HDFS, and the local FileSystem as the underlying store.
• ClientRMProxy – all clients of the RM (NMs, AMs, clients) have the same retry behavior while the RM is down.
• RM restart now works in a secure environment!
Internal details
RMStateStore
• Two types of state info:
  – Application-related state info (stored asynchronously):
    • ApplicationState – ApplicationSubmissionContext (AM ContainerLaunchContext, queue, etc.)
    • ApplicationAttemptState – AM container, AMRMToken, ClientTokenMasterKey, etc.
  – RMDelegationTokenSecretManager state (not application specific; stored synchronously):
    • RMDelegationToken
    • RMDelegationToken master key
    • RMDelegationToken sequence number
RM Recovery Workflow
• Save the app on app submission
  – User-provided credentials (HDFSDelegationToken)
• Save the attempt on AM attempt launch
  – AMRMToken, ClientToken
• RMDelegationTokenSecretManager
  – Save the token and sequence number on token generation
  – Save the master key when it rolls
• RM crashes….
What happens after the RM restarts?
• Instruct the old AM to shut down
• Load the ApplicationSubmissionContext
  – Submit the application
• Load the earlier attempts
  – Load the attempt credentials (AMRMToken, ClientToken)
• Launch a new attempt
Impact on applications/frameworks
Consistency between downstream consumers of the AM and YARN
• The AM should notify its consumers that the job is done only after YARN reports it is done
  – FinishApplicationMasterResponse.getIsUnregistered()
  – The user is expected to retry this API until it becomes true (see the sketch below)
  – Similarly for kill-application (fix in progress)
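As a rough sketch (assuming an already-created ApplicationMasterProtocol proxy to the RM; the retry interval is arbitrary), the AM-side retry could look like this:

```java
import org.apache.hadoop.yarn.api.ApplicationMasterProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterRequest;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;

public class UnregisterHelper {
  // Keep calling finishApplicationMaster until the RM confirms the app has
  // been unregistered (i.e. removed from its state store); only then should
  // the AM tell downstream consumers that the job is done.
  public static void unregister(ApplicationMasterProtocol rm) throws Exception {
    FinishApplicationMasterRequest req = FinishApplicationMasterRequest
        .newInstance(FinalApplicationStatus.SUCCEEDED,
            "" /* diagnostics */, "" /* tracking URL */);
    while (!rm.finishApplicationMaster(req).getIsUnregistered()) {
      Thread.sleep(1000); // arbitrary back-off before retrying
    }
  }
}
```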
For the MR AM
• Races:
  – JobClient: the AM crashes after the JobClient sees FINISHED, but before the RM removes the app when the app finishes
    • Bug: relaunching a FINISHED application (succeeded, failed, or killed)
  – HistoryServer: history files are flushed before the RM removes the app when the app finishes
How to use?
How to use: 3 steps (a programmatic sketch follows the list)
1. Enable RM restart
  – yarn.resourcemanager.recovery.enabled
2. Choose the underlying store you want (HDFS, ZooKeeper, or local FileSystem)
  – yarn.resourcemanager.store.class
  – FileSystemRMStateStore / ZKRMStateStore
3. Configure the address of the store
  – yarn.resourcemanager.fs.state-store.uri
  – hdfs://localhost:9000/rmstore
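A minimal programmatic sketch of the three steps, using the FileSystem store and the HDFS URI from the slide; the fully qualified store class name is an assumption, and the same keys can equally go straight into yarn-site.xml.

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RMRestartConfigSketch {
  public static YarnConfiguration build() {
    YarnConfiguration conf = new YarnConfiguration();
    // 1. Enable RM restart
    conf.setBoolean("yarn.resourcemanager.recovery.enabled", true);
    // 2. Choose the underlying store (here the FileSystem-backed store)
    conf.set("yarn.resourcemanager.store.class",
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore");
    // 3. Point the store at its storage location
    conf.set("yarn.resourcemanager.fs.state-store.uri",
        "hdfs://localhost:9000/rmstore");
    return conf;
  }
}
```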
YARN – Failover
Karthik Kambatla
RM HA (YARN-149)
● Architecture
● Failover / Admin
● Fencing
● Config changes
● FailoverProxyProvider
Architecture
● Active / Standby
  ○ Standby is powered up, but doesn’t have any state
● Restructure RM services (YARN-1098)
  ○ Always-on services
  ○ Active services (e.g. Client <-> RM, AM <-> RM)
● RMHAService (YARN-1027)
Failover / Admin
● CLI: yarn rmhaadmin
● Manual failover
● Automatic failover (YARN-1177)
  ○ Use ZKFC
  ○ Start it as an RM service instead of a separate daemon
  ○ Re-visit and strip out unnecessary parts
Fencing (YARN-1222)
● Implicit fencing through the ZK RM StateStore
● ACL-based fencing on store.load() during the transition to active (see the sketch below)
  ○ Shared read-write-admin access to the store
  ○ Claim exclusive create-delete access
  ○ All store operations create and delete a fencing node
  ○ The other RM can’t write to the store anymore
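A rough sketch of the ACL scheme described above, expressed with plain ZooKeeper ACL objects; the digest strings are placeholders, and the real ZKRMStateStore wiring is more involved than this.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.zookeeper.ZooDefs.Perms;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;

public class FencingAclsSketch {
  // Both RMs share read/write/admin on the store root, but only the currently
  // active RM holds the credentials behind the create/delete ACL. Since every
  // store operation creates and deletes a fencing node, a fenced-out RM fails
  // on its next write instead of silently corrupting state.
  public static List<ACL> storeAcls(String sharedDigest, String exclusiveDigest) {
    Id shared = new Id("digest", sharedDigest);       // known to both RMs
    Id exclusive = new Id("digest", exclusiveDigest); // known only to this RM
    return Arrays.asList(
        new ACL(Perms.READ | Perms.WRITE | Perms.ADMIN, shared),
        new ACL(Perms.CREATE | Perms.DELETE, exclusive));
  }
}
```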
Config changes (YARN-1232, YARN-986)
1. <name>yarn.resourcemanager.address</name><value>clusterid</value>
2. <name>yarn.resourcemanager.ha.nodes.clusterid</name><value>rm1,rm2</value>
3. <name>yarn.resourcemanager.ha.id</name><value>rm1</value>
4. <name>yarn.resourcemanager.address.clusterid.rm1</name><value>host1:23140</value>
5. <name>yarn.resourcemanager.address.clusterid.rm2</name><value>host2:23140</value>
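For illustration only, the five properties above expressed programmatically, using the proposed key names exactly as listed; the cluster id, RM ids, hosts, and ports are the slide's placeholder values.

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RMHAConfigSketch {
  public static YarnConfiguration build() {
    YarnConfiguration conf = new YarnConfiguration();
    conf.set("yarn.resourcemanager.address", "clusterid");
    conf.set("yarn.resourcemanager.ha.nodes.clusterid", "rm1,rm2");
    conf.set("yarn.resourcemanager.ha.id", "rm1"); // which RM this process is
    conf.set("yarn.resourcemanager.address.clusterid.rm1", "host1:23140");
    conf.set("yarn.resourcemanager.address.clusterid.rm2", "host2:23140");
    return conf;
  }
}
```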
FailoverProxyProvider
● ConfiguredFailoverProxyProvider (YARN-1028)
  ○ Uses alternate RMs from the config during retry (see the sketch below)
  ○ ClientRMProxy
    ■ addresses client-facing RPC addresses
  ○ ServerRMProxy
    ■ addresses server-facing RPC addresses
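As a rough client-side sketch: a YARN client obtains its RM proxy through ClientRMProxy, and with a failover proxy provider configured the same proxy is expected to retry against the alternate RM transparently. The cluster-metrics call below is just there to exercise the proxy.

```java
import org.apache.hadoop.yarn.api.ApplicationClientProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.GetClusterMetricsRequest;
import org.apache.hadoop.yarn.client.ClientRMProxy;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RMClientSketch {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    // ClientRMProxy resolves the client-facing RM address(es) from the config;
    // the configured failover proxy provider decides how to retry across RMs.
    ApplicationClientProtocol rm =
        ClientRMProxy.createRMProxy(conf, ApplicationClientProtocol.class);
    int nodes = rm.getClusterMetrics(GetClusterMetricsRequest.newInstance())
        .getClusterMetrics().getNumNodeManagers();
    System.out.println("Active NodeManagers: " + nodes);
  }
}
```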
Apache TEZ
Hitesh Shah
What is Tez?
• A data processing framework that can execute a complex DAG of tasks.
Tez DAG and Tasks
TEZ as a YARN Application
• No deployment of TEZ jars required on all nodes in the cluster
  – Everything is pushed from either the client or from HDFS to the cluster using YARN’s LocalResource functionality
  – Ability to run multiple different versions
• TEZ Sessions
  – A single AM can handle multiple DAGs (“jobs”)
  – Amortize and hide platform latency
• Exciting new features
  – Support for complex DAGs – broadcast joins (Hive map joins)
  – Support for lower latency – container reuse and shared objects
  – Support for dynamic concurrency control – determine reduce parallelism at runtime
TEZ: Community
• Early adopters and contributors welcome
  – Adopters to drive more scenarios; contributors to make them happen.
  – The Hive and Pig communities are on board and making great progress – see HIVE-4660 and PIG-3446 for Hive-on-Tez and Pig-on-Tez
• Meetup group – please sign up to know more
  – http://www.meetup.com/Apache-Tez-User-Group
• Useful links:
  – Website: http://tez.incubator.apache.org/
  – Code: http://git-wip-us.apache.org/repos/asf/incubator-tez.git
  – JIRA: https://issues.apache.org/jira/browse/TEZ
  – https://issues.apache.org/jira/browse/TEZ-65
  – Mailing lists:
    – [email protected]
    – [email protected]
Apache Tez, Apache Samza, Apache Giraph, Apache Helix
YARN usage @LinkedIn
YARN Go demo
https://github.com/hortonworks/gohadoop
Arun C Murthy
Llama
Alejandro Abdelnur