yarn services

Posted on 02-Jul-2015


DESCRIPTION

Talk at: ApacheCon EU 2014

TRANSCRIPT

Hadoop YARN Services

Steve Loughran – Hortonworks

stevel at hortonworks.com

@steveloughran

ApacheCon EU, November 2014

Apache Hadoop + YARN:

An OS for data

An OS can do more than SQL statements

An OS can do more than run admin-installed apps

An OS lets you run whatever you want!

An OS Offers

• Persistent Storage

• Execution of code

• jobs & services

• scheduling

• Communications

• Security

YARN Services:

Long-lived applications within a Hadoop cluster

[Diagram: a cluster of nodes, each running HDFS and a YARN Node Manager; one node also runs the YARN Resource Manager, “The RM”]

• Servers run YARN Node Managers (NM)

• NMs heartbeat to the Resource Manager (RM)

• RM schedules work over the cluster

• RM allocates containers to apps

• NMs start containers

• NMs report container health
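The heartbeat-and-allocate loop in these bullets can be sketched as a toy model. This is not the YARN API — class and method names here are purely illustrative — but it shows the division of labour: NMs report capacity via heartbeats, the RM matches pending application requests to that capacity.

```java
import java.util.*;

// Toy model of the control loop above: Node Managers heartbeat their
// free capacity to the Resource Manager, which assigns pending
// container requests to nodes with spare slots. Illustrative only.
class ToyResourceManager {
    private final Map<String, Integer> freeSlots = new LinkedHashMap<>();
    private final Deque<String> pendingApps = new ArrayDeque<>();

    // An NM heartbeat reports how many container slots it has free.
    void heartbeat(String nodeId, int slots) {
        freeSlots.put(nodeId, slots);
    }

    void requestContainer(String appId) {
        pendingApps.add(appId);
    }

    // Allocate pending requests onto nodes with spare capacity;
    // returns appId -> nodeId assignments.
    Map<String, String> schedule() {
        Map<String, String> assigned = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> node : freeSlots.entrySet()) {
            while (node.getValue() > 0 && !pendingApps.isEmpty()) {
                assigned.put(pendingApps.poll(), node.getKey());
                node.setValue(node.getValue() - 1);
            }
        }
        return assigned;
    }
}
```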

Background: YARN

Client creates App Master

[Diagram: the client asks the RM to launch an Application Master in a container on one of the cluster nodes]

“AM” requests containers

[Diagram: the AM asks the RM for containers, which Node Managers then start across the cluster]

Short lived applications

• failure: clean restart

• logs: collect at end

• placement: by data

• security: Kerberos delegation tokens

• discovery: launcher app can track

Long-lived services

• failure: stay available

• logs: ongoing collection

• placement: availability, performance

• security: ??

• discovery: ???

YARN-896: Support for YARN services

Log aggregation

Service registration & discovery

Windowed failure tracking

Anti-affinity placement

Gang scheduling

Applications to continue over AM restart

Container resource flexing

Container reuse

Kerberos token renewal

Container signalling

Net & Disk resources

Labelled nodes & queues
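"Windowed failure tracking" from the list above is easy to sketch: rather than counting every failure an application has ever had, only failures inside a sliding time window count towards the restart limit, so a service that fails occasionally over months is not condemned. A minimal sketch, with illustrative names (not the YARN implementation):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of windowed failure tracking: failures older than the window
// age out, so long-lived services are only stopped when failures
// cluster in time. Names are illustrative, not a real YARN class.
class WindowedFailureTracker {
    private final long windowMillis;
    private final int maxFailures;
    private final Deque<Long> failures = new ArrayDeque<>();

    WindowedFailureTracker(long windowMillis, int maxFailures) {
        this.windowMillis = windowMillis;
        this.maxFailures = maxFailures;
    }

    // Record a failure; return true if a restart is still permitted.
    boolean onFailure(long nowMillis) {
        failures.addLast(nowMillis);
        // Drop failures that have aged out of the window.
        while (!failures.isEmpty()
                && nowMillis - failures.peekFirst() > windowMillis) {
            failures.removeFirst();
        }
        return failures.size() <= maxFailures;
    }
}
```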

YARN-896

REST

Log aggregation

Service registration & discovery

Windowed failure tracking

Anti-affinity placement

Gang scheduling

Applications to continue over AM restart

Container resource flexing

Container reuse

Kerberos token renewal

Container signalling

Net & Disk resources

Labelled nodes & queues

Hadoop 2.6

(Docker)

REST

YARN-913 Service Registry

$ slider resolve --path ~/services/org-apache-slider/storm1

{
  "type" : "JSONServiceRecord",
  "external" : [ {
    "api" : "http://",
    "addressType" : "uri",
    "protocolType" : "webui",
    "addresses" : [ { "uri" : "http://nn.example.com:46132" } ]
  }, {
    "api" : "classpath:org.apache.slider.publisher.configurations",
    "addressType" : "uri",
    "protocolType" : "REST",
    "addresses" : [ { "uri" : "http://nn.example.com:46132/ws/v1/slider/publisher/slider" } ]
  } ]
}

Internal and external

"internal" : [ {
  "api" : "classpath:org.apache.slider.agents.secure",
  "addressType" : "uri",
  "protocolType" : "REST",
  "addresses" : [ { "uri" : "https://nn.example.com:47749/ws/v1/slider/agents" } ]
} ]
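A client that has fetched such a record picks an endpoint by scope ("external" vs "internal") and protocol type, then takes one of its addresses. The walk can be sketched over plain nested maps and lists (as any JSON parser would produce); real clients use the YARN registry client library, and the names below are illustrative:

```java
import java.util.List;
import java.util.Map;

// Toy walk over a service record shaped like the JSON above, given as
// nested Maps/Lists. Returns the first URI of the first endpoint in
// the requested scope whose protocolType matches. Illustrative only.
class ServiceRecordLookup {
    @SuppressWarnings("unchecked")
    static String firstUri(Map<String, Object> record,
                           String scope, String protocolType) {
        List<Map<String, Object>> endpoints =
                (List<Map<String, Object>>) record.getOrDefault(scope, List.of());
        for (Map<String, Object> ep : endpoints) {
            if (protocolType.equals(ep.get("protocolType"))) {
                List<Map<String, Object>> addrs =
                        (List<Map<String, Object>>) ep.get("addresses");
                if (addrs != null && !addrs.isEmpty()) {
                    return (String) addrs.get(0).get("uri");
                }
            }
        }
        return null; // no endpoint of that protocol in that scope
    }
}
```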

Failures

[Diagram: the AM and three containers running on nodes across the cluster]

Failures

[Diagram: the node hosting the AM fails; two containers keep running]

Failures

[Diagram: a new AM instance comes up and rebinds to the surviving containers 1 and 2; container 3 is lost]

Easy: enabling

// Client
amLauncher.setKeepContainersOverRestarts(true);
amLauncher.setMaxAppAttempts(8);

// Server
List<Container> liveContainers =
    amRegistrationData.getContainersFromPreviousAttempts();

Harder: rebuilding state

Node Map

Placement History

Specification

Container Queues

Component Map

Event History

Each of these is either persisted, rebuilt on restart, or transient.
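One concrete piece of that rebuild: after restart the AM gets the still-live containers via getContainersFromPreviousAttempts(), maps each back to the component it was serving, and re-requests only the shortfall. A minimal sketch, assuming the AM persisted its container-to-component assignments somewhere (e.g. HDFS); all names here are illustrative:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of rebuilding the component map after an AM restart.
// Illustrative names; a real AM would reload componentOf from its
// own persisted state.
class ComponentStateRebuilder {
    // componentOf: persisted mapping containerId -> component name
    // desired:     component name -> number of containers wanted
    // returns:     component name -> containers still to request
    static Map<String, Integer> shortfall(List<String> liveContainers,
                                          Map<String, String> componentOf,
                                          Map<String, Integer> desired) {
        Map<String, Integer> missing = new HashMap<>(desired);
        for (String id : liveContainers) {
            String component = componentOf.get(id);
            if (component != null) {
                // One surviving container means one fewer to request.
                missing.merge(component, -1, Integer::sum);
            }
        }
        missing.values().removeIf(n -> n <= 0);
        return missing;
    }
}
```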

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

Log Aggregation

$ yarn rmadmin ...
  -addToClusterNodeLabels [label1,label2,label3]
  -removeFromClusterNodeLabels [label1,label2,label3]
  -replaceLabelsOnNode [node1:port,label1,label2]
  -directlyAccessNodeLabelStore

Labels

Labels offer

• Separation of workloads

• Separation of service roles

• Separation of production & dev code

• Allocation to specific hardware classes

Security

• Token expiry is a core Kerberos feature

• Token expiry is inimical to service longevity

• Specifically: delegation tokens

Security

YARN:

AM/RM token renewal

NM HDFS access for AM container relaunch

You: embrace keytabs, test lots

…so you can now

• Write long lived apps

• with failure resilience

• centralised log viewing

• labelled/isolated placement

• in secure clusters

Why not just use Mesos?

Hadoop is everywhere!

Log aggregation

Service registration & discovery

Windowed failure tracking

Anti-affinity placement

Gang scheduling

Applications to continue over AM restart

Container resource flexing

Container reuse

Kerberos token renewal

Container signalling

Net & Disk resources

Labelled nodes & queues

Hadoop 2.7+

REST

Docker

Questions?

http://hadoop.apache.org
