YARN at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. ENGINEERING

Welcome to YARN Meetup, September 2013

Post on 22-Sep-2014

Category:

Technology


DESCRIPTION

YARN meetup at LinkedIn, September 2013. How is LinkedIn using YARN? What is the future of YARN at LinkedIn? The new Giraph AM on YARN and experiences of running the LinkedIn graph.

TRANSCRIPT

Page 1: Yarn at LinkedIn

Welcome to YARN Meetup, September 2013

Page 2: Yarn at LinkedIn

YARN @ LinkedIn: State of the Art

Mohammad Islam

Page 3: Yarn at LinkedIn

Early Adopter

YARN is a good fit for many LinkedIn problems
Many initiatives by multiple teams
LI engineers enjoy the fun of emerging technologies

Page 4: Yarn at LinkedIn

Early Adopter

Samza: Real-time stream processing system
– Developed by a LinkedIn team
– Apache Incubator project
– Uses YARN and Kafka
– Detailed presentation coming later today

Page 5: Yarn at LinkedIn

Early Adopter

Helix – Generic cluster management system

– Built and used at LinkedIn
– Apache Incubator project
– Incorporating YARN resource management
– Stay tuned to learn more today

Page 6: Yarn at LinkedIn

Early Adopter

Not yet open sourced
– A few projects are incubating at LI
– Mostly around custom and near-real-time execution engines
– Status: some are in POC and some are in design

Page 7: Yarn at LinkedIn

Early Adopter

Administering YARN:
– One of the pioneers of a 2.1.0-beta prod-like deployment
– Led by our Ops/Dev team
– Found a lot of issues
    Kerberos auth (YARN-621 & others)
– Contributing back to Apache to stabilize YARN
    Streamlined operational tools (HADOOP-9902)

Page 8: Yarn at LinkedIn

Early Adopter

Pig on Tez: Actively working with the Pig community
Hosted a small “Pig on Tez” dev meeting
– Participants included Yahoo, Hortonworks, Netflix and LinkedIn
Developed a high-level implementation plan

Page 9: Yarn at LinkedIn

Apache Giraph on YARN

Page 10: Yarn at LinkedIn

Overview of Giraph

A distributed graph processing framework
– Master/slave architecture
– In-memory computation
– Vertex-centric high-level programming model
– Based on Bulk Synchronous Parallel (BSP)
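To make the vertex-centric BSP model above concrete, here is a minimal single-process sketch of Page Rank written as supersteps. It is only an illustration of the idea, not Giraph's API: the function name `pagerank_bsp`, the `inbox`/`outbox` dictionaries, the damping factor, and the three-vertex graph are all assumptions made for this example.

```python
# Toy, single-process illustration of vertex-centric BSP. Not Giraph's API.
def pagerank_bsp(out_edges, supersteps=20, damping=0.85):
    """out_edges maps each vertex id to the list of vertices it points to."""
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    inbox = {v: [] for v in out_edges}

    for step in range(supersteps):
        outbox = {v: [] for v in out_edges}
        for v, targets in out_edges.items():
            # Per-vertex compute: fold in messages delivered at the last barrier.
            if step > 0:
                rank[v] = (1.0 - damping) / n + damping * sum(inbox[v])
            # Send this vertex's rank share along every out-edge.
            for t in targets:
                outbox[t].append(rank[v] / len(targets))
        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
    return rank


# Hypothetical three-vertex graph in which every vertex has an out-edge.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_bsp(graph)
```

Each vertex only reads its own inbox and writes to its neighbours' outboxes, which is what lets a real system spread the inner loop across workers and synchronize once per superstep.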

Page 11: Yarn at LinkedIn

Quick History

Hortonworks/LinkedIn intern (Eli) wrote the early version of the Giraph AM
Based on 2.0.3
Since then YARN has evolved a lot! API overhauled

Action: Overhaul Giraph on YARN

Page 12: Yarn at LinkedIn

Giraph on YARN

[Diagram: Client, Resource Manager, ZooKeeper, and three Node Managers hosting the App Mstr, the Master, and four Workers]

Page 13: Yarn at LinkedIn

New Giraph AM

Giraph AM: Nearly a complete rewrite by LinkedIn Hadoop dev
– Uses the new stable API
– Adopts the new asynchronous/event-based model
– Status: Patch ready

Client
– Uses the new API
– Status: Patch ready

Security
– Added Kerberos support for Giraph YARN client and AM
– Status: Testing
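As a rough illustration of what an asynchronous, event-based AM looks like compared with a polling loop, the sketch below dispatches events to handler methods. Everything in it is hypothetical: `ToyAppMaster`, `run_event_loop`, the handler names, and the container ids are invented for this example and are not the YARN client API or the Giraph patch.

```python
# Toy callback-driven application master. All class and method names here
# are hypothetical; this is not the YARN client API.
from collections import deque


class ToyAppMaster:
    def __init__(self, workers_needed):
        self.workers_needed = workers_needed
        self.running = set()
        self.finished = 0

    # Handlers are invoked by the event loop; none of them blocks or polls.
    def on_containers_allocated(self, container_ids):
        for cid in container_ids:
            if len(self.running) + self.finished < self.workers_needed:
                self.running.add(cid)  # a real AM would launch a worker here

    def on_containers_completed(self, container_ids):
        for cid in container_ids:
            if cid in self.running:
                self.running.remove(cid)
                self.finished += 1

    def done(self):
        return self.finished == self.workers_needed


def run_event_loop(am, events):
    """Dispatch (event_name, payload) pairs to the matching handler."""
    queue = deque(events)
    while queue and not am.done():
        name, payload = queue.popleft()
        getattr(am, "on_" + name)(payload)
    return am.done()


am = ToyAppMaster(workers_needed=2)
ok = run_event_loop(am, [
    ("containers_allocated", ["c1", "c2", "c3"]),  # c3 is surplus and ignored
    ("containers_completed", ["c1"]),
    ("containers_completed", ["c2"]),
])
```

The point of the design is that the AM's state changes only inside handlers, so allocation, completion, and failure events can arrive in any order without a hand-written polling loop.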

Page 14: Yarn at LinkedIn

Memory Footprint - Page Rank Algorithm

[Chart: heap memory by iteration]
Iteration 3: Reachable 1.5 GB, Unreachable 3 GB
Iteration 27: Reachable 1.5 GB, Unreachable 6 GB

Page 15: Yarn at LinkedIn

Challenges in Giraph

Memory-intensive Java-based system
Various (GC) knobs to tune the system and application
Depends heavily on skillful application developers
Performance degradation from scaling up
Not a good player for multi-tenant systems

Page 16: Yarn at LinkedIn

Future Direction

Option 1: “Worker” in C++
– C++ provides direct control over memory management
– No need to rewrite the whole of Giraph
Issue: Adoption barrier
– Writing C++ applications
– Possible solution: Giraph scripting language, like Hive or Pig

Option 2: Off-heap memory usage

Option 3: Leave it alone!

Page 17: Yarn at LinkedIn

Final Thoughts on Giraph

LinkedIn is the 1st player of Giraph on YARN
Successfully executed a full LinkedIn graph run
– Page Rank algorithm
– 200M+ vertices and XX billion edges
– On a 40-node cluster with 650GB memory
– Total time taken: 28 minutes
Ready to go!
Scope for improvements utilizing YARN’s flexibility
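For a sense of scale, the figures from the run above can be turned into a quick back-of-the-envelope calculation. This assumes the 650GB is the cluster-wide total (the slide does not say whether it is per node) and that 1 GB is 10**9 bytes; since the vertex count is given as "200M+", the per-vertex number is an upper bound and covers each vertex together with its edges and messages.

```python
# Back-of-the-envelope figures for the full-graph run.
# Assumptions: 650 GB is the cluster-wide total, and 1 GB = 10**9 bytes.
nodes = 40
total_memory_gb = 650
vertices = 200_000_000  # "200M+", so a lower bound on the real count

memory_per_node_gb = total_memory_gb / nodes
# Upper bound on memory available per vertex, including its edges.
bytes_per_vertex = total_memory_gb * 10**9 / vertices

print(memory_per_node_gb, bytes_per_vertex)
```

Under those assumptions each node contributes about 16 GB, and the whole job has roughly 3 KB of memory per vertex to work with.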

Page 18: Yarn at LinkedIn

Challenges in YARN

Failover of various components (RM/AM etc.)
API stabilization – almost there!
Representative examples for quick dev ramp-up
Better documentation
– Book on its way!
Operations-friendly
– Centralized logging
– SLA support – timed resource constraint

Page 19: Yarn at LinkedIn

Concluding on YARN

YARN is the way to go forward!
Reduce the innovation barrier
Support non-MR execution platforms
Improved utilization/performance
– By removing the split of map/reduce slots
– Through distribution of JT responsibility
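The utilization gain from removing the map/reduce slot split can be shown with a toy calculation. The functions and all numbers below are hypothetical, and the sketch assumes every task fits in one equally sized container, which real workloads need not satisfy.

```python
# Toy comparison of fixed map/reduce slots versus one shared pool.
# All capacities and task counts below are hypothetical.
def running_with_slots(map_slots, reduce_slots, map_tasks, reduce_tasks):
    """Each task type can only use slots of its own kind."""
    return min(map_slots, map_tasks) + min(reduce_slots, reduce_tasks)


def running_with_pool(containers, map_tasks, reduce_tasks):
    """Any free container can run either kind of task."""
    return min(containers, map_tasks + reduce_tasks)


# A node sized for 8 map slots and 4 reduce slots, during a map-heavy phase.
slots = running_with_slots(8, 4, map_tasks=12, reduce_tasks=0)
pool = running_with_pool(8 + 4, map_tasks=12, reduce_tasks=0)
print(slots, pool)
```

With fixed slots the four reduce slots sit idle while map tasks wait; with a shared pool the same capacity runs all twelve tasks at once.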

Page 20: Yarn at LinkedIn

Q&A

Thanks for coming!

Page 21: Yarn at LinkedIn

Giraph Architecture

Master / Workers
ZooKeeper

[Diagram: one Master and nine Workers]