Running YARN at Scale

Our Experience Running YARN at Scale, by Bobby Evans

Uploaded by hadoopsummit, posted 23-Dec-2014

DESCRIPTION

At Yahoo! over the past year we have helped migrate hundreds of our grids' users to YARN. Our YARN clusters have in aggregate run over 18 million jobs with more than 3 billion tasks, consuming over 10 thousand years of compute time, with one single cluster running 90 thousand jobs a day. From this experience we would like to share what we have learned about running YARN well, how this differs from running a 1.0-based cluster, and what it takes to migrate your jobs from 1.0 to YARN.

TRANSCRIPT

Page 1: Running Yarn at Scale

Our Experience Running YARN at Scale
Bobby Evans

Page 2: Running Yarn at Scale

Agenda

• Who We Are
• Some Background on YARN and YARN at Yahoo!
• What Was Not So Good
• What Was Good

Page 3: Running Yarn at Scale

Who I Am

Robert (Bobby) Evans
• Technical Lead @ Yahoo!
• Apache Hadoop Committer and PMC Member
• Past
  – Hardware Design
  – Linux Kernel and Device Driver Development
  – Machine Learning on Hadoop
• Current
  – Hadoop Core Development (MapReduce and YARN)
  – TEZ, Storm and Spark

Page 4: Running Yarn at Scale

Who I Represent

• Yahoo! Hadoop Team
  – We are over 40 people developing, maintaining and supporting a complete Hadoop stack including Pig, Hive, HBase, Oozie, and HCatalog.
• Hadoop Users @ Yahoo!

• Hadoop Users @ Yahoo!

Page 5: Running Yarn at Scale

Agenda

• Who We Are
• Some Background on YARN and YARN at Yahoo!
• What Was Not So Good
• What Was Good

Page 6: Running Yarn at Scale

Hadoop Releases

Source: http://hadoop.apache.org/releases.html http://is.gd/axRlgJ

Page 7: Running Yarn at Scale

Yahoo! Scale

• About 40,000 nodes running Hadoop.
• Around 500,000 Map/Reduce jobs a day.
• Consuming in excess of 230 compute years every single day.
• Over 350 PB of storage.
• On 0.23.X we have over 20,000 years of compute time under our belts.

http://www.flickr.com/photos/doctorow/2699158636/

Page 8: Running Yarn at Scale

YARN Architecture

http://www.flickr.com/photos/bradhoc/7343761514/

Page 9: Running Yarn at Scale

Agenda

• Who We Are
• Some Background on YARN and YARN at Yahoo!
• What Was Not So Good
• What Was Good

Page 10: Running Yarn at Scale

The AM Runs on Unreliable Hardware

• Split Brain/AM Recovery (FIXED for MR, but not perfect)
  – For anyone else writing a YARN app, be aware you have to handle this.

Page 11: Running Yarn at Scale

The AM Runs on Unreliable Hardware

• Debugging the AM is hard when it does crash.
• The AM can get overwhelmed if it is on a slow node or the job is very large.
• Tuning the AM is difficult to get right for large jobs.
  – Be sure to tune the heap/container size. A 1 GB heap can fit about 100,000 task attempts in memory (25,000 tasks worst case).

http://www.flickr.com/photos/cushinglibrary/3963200463/
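The rule of thumb above can be turned into a quick sizing sketch. This is illustrative only, under the slide's stated assumptions; `am_heap_mb` is a hypothetical helper, not a Hadoop API:

```python
# Sizing sketch based on the slide's heuristic (an assumption, not an
# official formula): ~100,000 task attempts fit in a 1 GB AM heap in the
# typical case, ~25,000 in the worst case.
def am_heap_mb(task_attempts, worst_case=False):
    """Estimate the MapReduce AM heap (MB) needed to track a job's attempts."""
    attempts_per_gb = 25_000 if worst_case else 100_000
    gbs = max(1, -(-task_attempts // attempts_per_gb))  # ceiling division, min 1 GB
    return gbs * 1024

print(am_heap_mb(250_000))                    # 3072  (typical case)
print(am_heap_mb(250_000, worst_case=True))   # 10240 (worst case)
```

In practice the values this suggests would go into the AM container size (`yarn.app.mapreduce.am.resource.mb`) and the AM JVM heap (`yarn.app.mapreduce.am.command-opts`), with the container sized somewhat larger than the heap.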

Page 12: Running Yarn at Scale

Lack of Flow Control

• Both AM and RM based on an asynchronous event framework that has no flow control.

http://www.flickr.com/photos/iz4aks/4085305231/
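To illustrate what the missing flow control costs, here is a minimal bounded-queue sketch (the event names are invented, and this is not YARN's actual dispatcher code): with a bounded queue, a fast producer blocks until the consumer catches up instead of growing the heap without limit.

```python
import queue
import threading

# A bounded queue provides backpressure: put() blocks when the queue is
# full, so a flood of events cannot exhaust memory. An unbounded queue
# (the situation the slide describes) just keeps growing.
events = queue.Queue(maxsize=100)

def produce(n):
    for i in range(n):
        events.put(("TASK_UPDATE", i))  # blocks whenever the queue is full
    events.put(None)  # sentinel: no more events

def consume(results):
    while True:
        ev = events.get()
        if ev is None:
            break
        results.append(ev)

results = []
producer = threading.Thread(target=produce, args=(1000,))
producer.start()
consume(results)
producer.join()
print(len(results))  # 1000: every event handled, queue never exceeded 100
```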

Page 13: Running Yarn at Scale

Name Node Load

• YARN launches tasks faster than 1.0
• MR keeps a running history log for recovery
• Log Aggregation
  – 7 days of aggregated logs used up approximately 30% of the total namespace.
• 50% higher write load on HDFS for the same jobs
• 160% more rename operations
• 60% more create, addBlock and fsync operations

Page 14: Running Yarn at Scale

Web UI

• Resource Manager and History Server forget apps too quickly
• Browser/JavaScript heavy
• Follows the YARN model, so it can be confusing for those used to the old UI.

Page 15: Running Yarn at Scale

Binary Incompatibility

• Map/Reduce APIs are not binary compatible between 1.0 and 0.23. They are source compatible, though, so a simple recompile is all that is required.

Page 16: Running Yarn at Scale

Agenda

• Who We Are
• Some Background on YARN and YARN at Yahoo!
• What Was Not So Good
• What Was Good

Page 17: Running Yarn at Scale

Operability

“The issues were not with incompatibilities, but coupling between applications and check-offs.”
-- Rajiv Chittajallu

Page 18: Running Yarn at Scale

Performance

Tests run on a 350 node cluster on top of JDK 1.6.0

                                         1.0.2    0.23.3   Improvement
Sort (GB/s throughput)                   2.26     2.35     4%
Sort with compression (GB/s throughput)  4.5      4.5      0%
Shuffle (mean shuffle time secs)         303.8    263.5    13%
Scan (GB/s throughput)                   25.2     22.9     -9%
GridMix 3 replay (runtime secs)          2817     2668     5%

Page 19: Running Yarn at Scale

Web Services/Log Aggregation

• No more scraping of web pages needed
  – Resource Manager
  – Node Managers
  – History Server
  – MR App Master

• Deep analysis of log output using Map/Reduce
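These daemons expose their state as JSON over REST (for example, the Resource Manager's `/ws/v1/cluster/apps` endpoint). A sketch of consuming that output, using an invented sample payload of the same shape so no live cluster is required:

```python
import json

# Invented sample with the shape of the RM's /ws/v1/cluster/apps response
# (application IDs and timings here are made up for illustration).
sample = json.dumps({"apps": {"app": [
    {"id": "application_1388_0001", "state": "FINISHED", "elapsedTime": 120000},
    {"id": "application_1388_0002", "state": "RUNNING",  "elapsedTime": 30000},
]}})

# Instead of scraping HTML, just parse the JSON and filter.
apps = json.loads(sample)["apps"]["app"]
finished = [a["id"] for a in apps if a["state"] == "FINISHED"]
print(finished)  # ['application_1388_0001']
```

Against a real cluster, the same parsing would apply to the body of an HTTP GET on the Resource Manager's web services port.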

Page 20: Running Yarn at Scale

Non Map Reduce Applications*

• Storm• TEZ• Spark• …

* Coming Soon

Page 21: Running Yarn at Scale

Total Capacity

Our most heavily used cluster was able to increase from 80,000 jobs a day to 125,000 jobs a day.

That is more than a 50% increase. It is as if we had bought over 1000 new servers and added them to the cluster.

This is primarily due to the removal of the artificial split between maps and reduces, but also because the Job Tracker could not keep up with tracking/launching all the tasks.
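As a quick sanity check of the numbers above (plain arithmetic, nothing cluster-specific):

```python
# Jobs per day before and after the upgrade, from the slide.
before, after = 80_000, 125_000
increase = (after - before) / before
print(f"{increase:.2%}")  # 56.25%, i.e. "more than a 50% increase"
```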

Page 22: Running Yarn at Scale

Conclusion

Upgrading to 0.23 from 1.0 took a lot of planning and effort. Most of that was stabilization and hardening of Hadoop for the scale that we run at, but it was worth it.

Page 23: Running Yarn at Scale

Questions?