high availability in yarn

25
High Availability in YARN ID2219 Project Presentation Arinto Murdopo ([email protected])

Upload: arinto-murdopo

Post on 29-Nov-2014

2.830 views

Category:

Documents


4 download

DESCRIPTION

Project presentation for High Availability in YARN project. We propose to use MySQL Cluster (NDB) to tackle High Availability issue in YARN. We also developed benchmark framework to investigate whether MySQL Cluster (NDB) is better than Apache's proposed storage (ZooKeeper and HDFS) Full project report will be uploaded after I finish it.

TRANSCRIPT

Page 1: High Availability in YARN

High Availability in YARN

ID2219 Project PresentationArinto Murdopo ([email protected])

Page 2: High Availability in YARN

• Mário A. (site – 4khnahs #at# gmail)• Arinto M. (site – arinto #at# gmail)• Strahinja L. (strahinja1984 #at# gmail)• Umit C.B. (ucbuyuksahin #at# gmail)

• Special thanks– Jim Dowling (SICS, supervisor)– Vasiliki Kalavri (EMJD-DC, supervisor)– Johan Montelius (Course teacher)

The team!

12/6/2012 2

Page 3: High Availability in YARN

• Define: YARN• Why is it not highly available (H.A.)?• Providing H.A in YARN• What storage to use?• Here comes NDB• What we have done so far?• Experiment result• What’s next?• Conclusions

Outline

12/6/2012 3

Page 4: High Availability in YARN

• YARN = Yet Another Resource Negotiator

• Is NOT ONLY MapReduce 2.0, but also…

• Framework to develop and/or execute distributed processing applications

• Example: MapReduce, Spark, Apache HAMA, Apache Giraph

Define: YARN

12/6/2012 4

Page 5: High Availability in YARN

Define: YARN

12/6/2012 5

Split JobTracker’s responsibilities

Per-App AppMaster

Generic containers

Page 6: High Availability in YARN

What is it not highly available (H.A.)?

12/6/2012 6

ResourceManager is Single Point of Failure (SPoF)

Page 7: High Availability in YARN

Providing H.A. in YARN

12/6/2012 7

Proposed approach• store and reload state • failure model:

1. Recovery 2. Failover 3. Stateless

Page 8: High Availability in YARN

Failure Model#1: Recovery

12/6/2012 8

Store statesLoad states

1. RM stores states when needed2. RM failure happens3. Clients keep retrying4. RM restarts and loads states5. Clients successfully connect to

resurrected RM6. Downtime exists!

Page 9: High Availability in YARN

Failure Model#2: Failover

12/6/2012 9

Resource Manager

Standby Resource Manager

• Utilize Standby RM• Little Downtime

Store

Load

Page 10: High Availability in YARN

Failure Model#3: Stateless

12/6/2012 10

Resource Manager

Resource Manager

AppMasterNode

ManagerClient

Store all states in storage, example:1. NM Lists2. App Lists

Page 11: High Availability in YARN

Apache proposed• Hadoop Distributed File System (HDFS)– Fault-tolerant, large datasets, streaming

access to data and more

• ZooKeeper– Highly reliable distributed coordination– Wait-free, FIFO client ordering,

linearizables writes and more

What storage to use?

12/6/2012 11

Page 12: High Availability in YARN

NDB MySQL Cluster is a scalable, ACID-compliant transactional database

Some features• Designed for availability (No SPoF)• In-memory distributed database• Horizontal scalability (auto-sharding, no downtime

when adding new node)• Fast R/W rate• Fine grained locking• SQL and NoSQL Interface

Here comes NDB

12/6/2012 12

Page 13: High Availability in YARN

Here comes NDB

12/6/2012 13

Client

Page 14: High Availability in YARN

Here comes NDB

12/6/2012 14

Linear horizontal scalability

Up to 4.3 Billion reads/minute!

MySQL Cluster version 7.2

Page 15: High Availability in YARN

• Phase 1: The Ndb-storage-class– Apache proposed failure model– We developed NdbRMStateStore, that has

H.A!

• Phase 2 : The Framework– Apache created ZK and FS storage classes– We developed a framework for storage

benchmarking

What we have done so far?

12/6/2012 15

Page 16: High Availability in YARN

Apache – implemented Memory Store for Resource Manager

(RM) recovery (MemoryRMStateStore)– Application State and Application Attempt are stored– Restart app when RM is resurrected – It’s not really H.A.!

We – Implemented NDB Mysql Cluster Store (NdbRMStateStore)using clusterj

– Implemented TestNdbRMRestart, to prove the H.A. in YARN

Phase 1: The Ndb-storage-class

12/6/2012 16

Page 17: High Availability in YARN

TestNdbRM-Restart

Phase 1: The-Ndb-storage-class

12/6/2012 18

Restart all unfinished jobs

Page 18: High Availability in YARN

Apache– Implemented Zookeeper Store (ZKRMStateStore)– Implemented File System Store

(FileSystemRMStateStore)

We– Developed a storage-benchmark-framework to benchmark both performances with our store– https://github.com/4knahs/zkndb

Phase 2: The Framework

12/6/2012 19

Page 19: High Availability in YARN

zkndb = framework for storage benchmarking

Phase 2: The Framework

12/6/2012 20

Page 20: High Availability in YARN

zkndb extensibility

Phase 2: The Framework

12/6/2012 21

Page 21: High Availability in YARN

• ZooKeeper– Three nodes in SICS cluster– Each ZK process has max memory of 5GB

• HDFS– Three DataNodes and one Namenode– Each HDFS DN and NN process has max memory

of 5GB

• NDB– Three-node cluster

Experiment Setup

12/6/2012 22

Page 22: High Availability in YARN

Experiment Result #1

12/6/2012 23

Load Setup#1:1 node12 threads60 seconds

Each node:Dual six-core [email protected]

All clusters consist of 3 nodes

Utilize Hadoop code for ZK and HDFS

ZK is limited by its store

implementation

Not good for small

files!

Page 23: High Availability in YARN

Experiment Result #2

12/6/2012 24

Load Setup#2:3 nodes@12 threads30 seconds

Each node:Dual six-core [email protected]

All clusters consist of 3 nodes

Utilize Hadoop code for ZK and HDFS

ZK could scale a bit more!

Get even worse due to root lock in NameNode!

Page 24: High Availability in YARN

• Scheduler and ResourceTracker Analysis

• Stateless Architecture

• Study the overhead of writing state to NDB

What’s next?

12/6/2012 25

Page 25: High Availability in YARN

• NDB has higher throughput than ZK and HDFS

• NDB is the suitable storage for Stateless Failure Model

• but ZK and HDFS are not for Stateless Failure Model!

Conclusions

12/6/2012 26