Running Services on YARN


Page 1: Running Services on YARN

Running Services on YARN
Munich, April 2017

Varun Vasudev

Page 2: Running Services on YARN

About myself

⬢ Apache Hadoop contributor since 2014

⬢ Apache Hadoop committer and PMC member

⬢ Currently working for Hortonworks

[email protected]

Page 3: Running Services on YARN

Introduction to Apache Hadoop YARN

⬢ Architectural center of big data workloads

⬢ Enterprise adoption
  – Secure mode is popular
  – Multi-tenant support

⬢ SLAs
  – Tolerance for slow-running jobs decreasing
  – Consistent performance desired

⬢ Diverse workloads increasing
  – LLAP on Slider

Page 4: Running Services on YARN

Introduction to Apache Hadoop YARN

[Diagram: YARN – Data Operating System (Cluster Resource Management) – running on HDFS (Hadoop Distributed File System), with engines on top for batch, interactive & real-time data access: Script (Pig) and SQL (Hive) on Tez, Java/Scala (Cascading) on Tez, Others (ISV engines), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo) on Slider, and In-Memory (Spark).]

YARN: The Architectural Center of Hadoop
• Common data platform, many applications
• Support multi-tenant access & processing
• Batch, interactive & real-time use cases

Page 5: Running Services on YARN

Several important trends in the age of Hadoop 3.0+

[Diagram: YARN and other platform services – storage, resource management, security, service discovery, management, monitoring, alerts, and governance – supporting an IoT assembly (Kafka, Storm, HBase, Solr), existing frameworks (MR, Tez, Spark, …), and innovating frameworks (Flink, DL (TensorFlow), etc.), across various environments: on premise, private cloud, and public cloud.]

Page 6: Running Services on YARN

Services workloads becoming more popular

⬢ Users are running more and more long-running services like LLAP, HiveServer, HBase, etc.

⬢ Service workloads are gaining more importance
  – Need a webserver to serve results from an MR job
  – New YARN UI can be run in its own container
  – ATSv2 would like to launch ATS reader containers as well
  – Applications want to run their own shuffle service

Page 7: Running Services on YARN

Application Lifecycle

[Diagram: The Application 1 AM requests containers from the ResourceManager (active), which allocates them. On Node 1, the NodeManager (128G, 16 vcores) launches the AM process and the container processes (Container 1, Container 2) via a ContainerExecutor – DCE, LCE, or WSCE – and monitors/isolates memory and CPU. Logs are aggregated to HDFS, and history is kept by the History Server (ATS – leveldb, JHS – HDFS).]

Page 8: Running Services on YARN

Application Lifecycle

⬢ Designed for batch jobs
  – Jobs run for hours or days
  – Jobs use frameworks (like MR, Tez, Spark) which are aware of YARN
  – Container failure is bad, but frameworks have logic to handle it
    • Local container state loss is handled
  – Jobs are chained/pipelined using application ids
  – Debugging is an offline event

⬢ Doesn't carry over cleanly for services
  – Services run for longer periods of time
  – Services may or may not be aware of YARN
  – Container loss is a bigger problem and can have really bad consequences
  – Services would like to discover other services
  – Debugging is an online event

Page 9: Running Services on YARN

Enabling Services on YARN

Page 10: Running Services on YARN

Enabling Services on YARN

⬢ AM to manage services

⬢ Service discovery

⬢ Container lifecycle

⬢ Scheduler changes

⬢ YARN UI

⬢ Application upgrades

⬢ Other issues
  – Log collection
  – Support for monitoring

Page 11: Running Services on YARN

AM to manage services

⬢ Any service/job on YARN requires an AM
  – AMs are hard to write
  – Different services will re-implement the same functionality
  – AM has to keep up with changes in Apache Hadoop

⬢ Native YARN framework layer for services (YARN-5079)
  – Provides an AM that ships as part of Apache Hadoop and can be used to manage services
  – AM comes from the Apache Slider project
  – AM provides REST APIs to manage applications
  – Has support for functionality such as port scheduling and flexing the number of containers
  – Maintained by the Apache Hadoop developers, so it's kept up to date with the rest of YARN
  – New YARN REST APIs to launch services

Page 12: Running Services on YARN

YARN REST API to launch services

{ "name": "vvasudev-druid-2017-03-16","resource": {

"cpus": 16, "memory": "51200"

}, "components" : [

{ "name": "vvasudev-druid", "dependencies": [ ], "artifact": { "id": ”druid-image:0.1.0.0-25", "type": "DOCKER"

}, "configuration": { "properties": {

"env.CUSTOM_SERVICE_PROPERTIES": "true", "env.ZK_HOSTS": ”zkhost1:2181,zkhost2:2181,zkhost3:2181"

} }

} ],

"number_of_containers": 5, "launch_command": "/opt/druid/start-druid.sh", "queue" : ”yarn-services”

}
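
As a rough illustration of how a spec like the one above might be submitted, the sketch below POSTs it to a services REST endpoint on the ResourceManager using only java.net classes. The host, port, endpoint path (/app/v1/services), and the local file name are assumptions for illustration; the exact URL depends on the Hadoop release and cluster configuration.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SubmitService {
  public static void main(String[] args) throws Exception {
    // Read the service spec shown above from a local file (hypothetical path).
    byte[] spec = Files.readAllBytes(Paths.get("druid-service.json"));

    // Assumed endpoint: the services REST API on the ResourceManager.
    // Path and port are illustrative and may differ by release/configuration.
    URL url = new URL("http://rm-host:8088/app/v1/services");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);

    try (OutputStream out = conn.getOutputStream()) {
      out.write(spec);
    }

    // A 2xx response indicates the launch request was accepted.
    System.out.println("Response code: " + conn.getResponseCode());
  }
}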

Page 13: Running Services on YARN

Service discovery

⬢ Long running services require a way to discover them
  – Application ids are constant for the lifetime of the application
  – Container ids are constant for the lifetime of the container, but containers will come up and go down

⬢ Add support for discovery of long running services using DNS and the Registry Service (YARN-4757) – see the lookup sketch below
  – DNS is well understood
  – Registry service will have a record of the application to DNS name
  – YARN has a DNS server, but currently this is for testing and experimentation only
  – YARN will need to add support for DNS updates to fit into existing DNS solutions
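
To show why DNS-based discovery keeps clients simple, here is a minimal lookup sketch: once the registry's records are exposed over DNS, a client only needs an ordinary name resolution. The name format (component.service.user.domain) and the domain are assumptions for illustration and will depend on how the registry DNS is configured.

import java.net.InetAddress;

public class ServiceLookup {
  public static void main(String[] args) throws Exception {
    // Hypothetical registry DNS name; the naming convention and domain are assumptions.
    String name = "vvasudev-druid-0.vvasudev-druid-2017-03-16.vvasudev.example.com";

    // An ordinary DNS lookup is all a client needs to locate the container.
    InetAddress addr = InetAddress.getByName(name);
    System.out.println(name + " -> " + addr.getHostAddress());
  }
}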

Page 14: Running Services on YARN

Service Discovery

[Diagram: service discovery components – the user, a DNS Server, the Registry Service (backed by a Zookeeper ensemble), the ResourceManager, the NodeManagers, and the ApplicationManager; the DNS server answers the user's lookups from the registry's records.]

Page 15: Running Services on YARN

Container lifecycle

⬢ When a container exits, the NodeManager (NM) reclaims all of its resources immediately
  – NM also cleans up any local state that the container maintained

⬢ AM may or may not be able to get a container back on the same node
  – NM has to download any private resources again for the container, leading to delays in restarts

⬢ Added support for first-class container re-tries (YARN-4725) – see the sketch below
  – AM can specify a retry policy when starting the container
  – On process exit, the NM will not clean up any state or resources
  – Instead it will attempt to retry the container
  – AM can specify limits on the number of retries as well as the delay between retries
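
A minimal sketch of how an AM might request in-place retries, assuming a Hadoop release where YARN-4725 is available (2.9+/3.x). The ContainerRetryContext/ContainerRetryPolicy names follow the API as it landed, but the launch command and retry values here are purely illustrative, and exact record/setter names may vary by release.

import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.ContainerRetryContext;
import org.apache.hadoop.yarn.api.records.ContainerRetryPolicy;

public class RetryingLaunchContext {

  // Builds a launch context whose container the NM will retry in place
  // instead of tearing it down and reclaiming resources on exit.
  static ContainerLaunchContext buildLaunchContext() {
    ContainerLaunchContext clc = ContainerLaunchContext.newInstance(
        Collections.emptyMap(),                                   // local resources
        Collections.emptyMap(),                                   // environment
        Collections.singletonList("/opt/druid/start-druid.sh"),   // command
        null, null, null);                                        // service data, tokens, ACLs

    // Retry on any non-zero exit code, at most 3 times, waiting 5s between tries.
    ContainerRetryContext retry = ContainerRetryContext.newInstance(
        ContainerRetryPolicy.RETRY_ON_ALL_ERRORS,
        null,    // specific error codes (unused with RETRY_ON_ALL_ERRORS)
        3,       // max retries
        5000);   // retry interval in ms
    clc.setContainerRetryContext(retry);
    return clc;
  }
}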

Page 16: Running Services on YARN

Container Lifecycle

[Diagram: a container process on a NodeManager, with application/container data spread across local disks (Disk 1, Disk 2, Disk 3) and HDFS.]

Page 17: Running Services on YARN

Scheduler improvements

⬢ In the case of services, affinity and anti-affinity become important
  – Affinity and anti-affinity apply at a container and an application level – e.g. don't schedule two HBase region servers on the same node, but schedule the Spark containers on the same nodes as the region servers

⬢ Support is being added for affinity and anti-affinity in the RM (YARN-5907) – see the sketch below
  – Slider AM already has some basic support for container affinity and anti-affinity via re-tries
  – RM can do a better job of container placement if it has first class support
  – AMs can specify affinity and anti-affinity policies to get the right placement they need
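
At the time of this talk the first-class RM support was still being added, so the sketch below is only indicative: it uses the PlacementConstraints helper style that later shipped in Hadoop 3.x, and the class/method names and the allocation tags ("hbase-rs", "spark") should all be treated as assumptions rather than the committed API.

import org.apache.hadoop.yarn.api.resource.PlacementConstraint;
import org.apache.hadoop.yarn.api.resource.PlacementConstraints;

import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.NODE;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.allocationTag;

public class PlacementExamples {

  // Anti-affinity: never place two containers tagged "hbase-rs" on the same node.
  static PlacementConstraint regionServerAntiAffinity() {
    return PlacementConstraints.targetNotIn(NODE, allocationTag("hbase-rs")).build();
  }

  // Affinity: place containers tagged "spark" on nodes that already run an "hbase-rs" container.
  static PlacementConstraint sparkNearRegionServers() {
    return PlacementConstraints.targetIn(NODE, allocationTag("hbase-rs")).build();
  }
}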

Page 18: Running Services on YARN

Scheduler improvements - Affinity and Anti-affinity

⬢ Anti-Affinity
  – Some services don't want their daemons to run on the same host/rack, for better fault recovery or performance
  – For example, don't run more than one HBase region server in the same fault zone

Page 19: Running Services on YARN

Scheduler Improvements - Affinity and Anti-affinity

⬢ Affinity
  – Some services want to run their daemons on the same host/rack, etc. for performance
  – For example, run Storm workers as close as possible for better data-exchange performance (SW = Storm Worker)

Page 20: Running Services on YARN

YARN UI (YARN-3368)

Page 21: Running Services on YARN

YARN UI - Services

Page 22: Running Services on YARN

Application upgrades

⬢ YARN has no support for container or application upgrades
  – Container upgrade support needs to be added in the NM
  – Application upgrade support has to be added in the RM

⬢ Support added for container upgrade and rollback (YARN-4726) – see the sketch below
  – Application upgrade support still to be carried out
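
A minimal sketch of the container upgrade flow from an AM's point of view, assuming a release where YARN-4726 has landed. The NMClient re-initialize/commit/rollback method names follow the client API as it later appeared and should be treated as assumptions rather than definitive usage.

import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class ContainerUpgrade {

  // Re-initializes a running container with a new launch context (e.g. a new
  // artifact or command). With autoCommit=false the AM can later commit the
  // upgrade, or roll it back if the new version misbehaves.
  static void upgrade(NMClient nmClient, ContainerId containerId,
      ContainerLaunchContext newLaunchContext) throws YarnException, IOException {
    nmClient.reInitializeContainer(containerId, newLaunchContext, false);
  }

  static void commit(NMClient nmClient, ContainerId containerId)
      throws YarnException, IOException {
    nmClient.commitLastReInitialization(containerId);
  }

  static void rollback(NMClient nmClient, ContainerId containerId)
      throws YarnException, IOException {
    nmClient.rollbackLastReInitialization(containerId);
  }
}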

Page 23: Running Services on YARN

Other issues

⬢ Log rotation
  – Log rotation used to run on application completion
  – Support has been added to fetch the logs for running containers

⬢ Support for container monitoring/health checks

Page 24: Running Services on YARN

In Conclusion

⬢ Services workloads are becoming more and more popular on YARN

⬢ The fundamental pieces to add support for services are in place, but a few additional pieces remain

Page 25: Running Services on YARN

Thank you!