Introduction to YARN Apps

Intro to YARN Apps Sandy Ryza

Upload: cloudera-inc

Post on 19-Aug-2015


Page 1: Introduction to YARN Apps

Intro to YARN Apps
Sandy Ryza

Page 2: Introduction to YARN Apps

Introduction

•  What’s YARN?
•  YARN apps
•  Building YARN apps

Page 3: Introduction to YARN Apps

The OS analogy

Traditional Operating System

Storage: File System

Execution/Scheduling: Processes/Kernel Scheduler

Page 4: Introduction to YARN Apps

The OS analogy

Hadoop

Storage: Hadoop Distributed File System (HDFS)

Execution/Scheduling: YARN!

Page 5: Introduction to YARN Apps

Goal: Multitenancy

•  Different types of applications on the same cluster

•  Different users and organizations on the same cluster

Page 6: Introduction to YARN Apps

ResourceManager (RM)

•  Central service that tracks
   o  Nodes
      §  Resources
   o  Applications
   o  Containers
•  Houses the scheduler, which is in charge of all container placement decisions

Page 7: Introduction to YARN Apps

NodeManager (NM)

•  One on every node
•  Launches container processes
•  Enforces resource allocations
•  Monitors liveness

Page 8: Introduction to YARN Apps

Application Master (AM)

•  User/application code
•  Every application instance has one
•  Runs inside a container on the cluster
•  Requests resources from the ResourceManager

Page 9: Introduction to YARN Apps

YARN

[Diagram: a ResourceManager, two NodeManagers hosting containers (a Map Task, the Application Master, a Reduce Task), a JobHistoryServer, and a Client]

Page 10: Introduction to YARN Apps

Processing Frameworks / YARN apps

•  MapReduce
   o  Batch processing, fault tolerant
•  Impala
   o  Low latency SQL on Hadoop
•  Spark
   o  Load data into memory, great for iterative algorithms
•  Storm
   o  Stream processing

Page 11: Introduction to YARN Apps

YARN app models

•  Application master (AM) per job
•  Simplest for batch
•  Used by MapReduce

Page 12: Introduction to YARN Apps

YARN app models

•  Application master per session
•  Runs multiple jobs on behalf of the same user
•  Recently added in Tez
•  Spark interactive mode

Page 13: Introduction to YARN Apps

YARN app models

•  Singleton AM as permanent service
•  Always on, waits around for jobs to come in
•  Used for Impala

Page 14: Introduction to YARN Apps

YARN/MR Scheduling

•  ResourceManager (Fair Scheduler): decides which jobs to give resources to
•  MapReduce Application Master: decides which tasks to give resources to within a job
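
The split on this slide can be sketched as a toy model in plain Java. This is not the YARN API: `TwoLevelScheduling`, `pickJob`, and `pickTask` are hypothetical names, and the "fewest running containers first" rule is only a crude stand-in for what the Fair Scheduler actually does.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of two-level scheduling; the names are illustrative, not YARN classes. */
class TwoLevelScheduling {

    /** Level 1 (cluster scheduler's role): choose which job gets the next free container. */
    static String pickJob(Map<String, Integer> runningContainersPerJob) {
        String best = null;
        for (Map.Entry<String, Integer> e : runningContainersPerJob.entrySet()) {
            // Assumed policy: the job with the fewest running containers goes first.
            if (best == null || e.getValue() < runningContainersPerJob.get(best)) {
                best = e.getKey();
            }
        }
        return best; // null when there are no jobs
    }

    /** Level 2 (application master's role): choose which of the job's own tasks to run. */
    static String pickTask(Deque<String> pendingTasks) {
        return pendingTasks.pollFirst(); // null when the job has nothing left to run
    }

    public static void main(String[] args) {
        Map<String, Integer> running = new LinkedHashMap<>();
        running.put("jobA", 3);
        running.put("jobB", 1);

        Map<String, Deque<String>> pending = new LinkedHashMap<>();
        pending.put("jobA", new ArrayDeque<>(Arrays.asList("map-7")));
        pending.put("jobB", new ArrayDeque<>(Arrays.asList("map-2", "reduce-0")));

        String job = pickJob(running);            // cluster-level decision
        String task = pickTask(pending.get(job)); // job-level decision
        System.out.println(job + " runs " + task);
    }
}
```

The point of the sketch is the separation of concerns: the cluster-level function never looks at task names, and the job-level function never looks at other jobs.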

Page 15: Introduction to YARN Apps

Scheduling on Hadoop

[Diagram: ResourceManager, Application Master 1, Application Master 2, Node 1, Node 2, Node 3]

Page 16: Introduction to YARN Apps

Scheduling on Hadoop

[Diagram: ResourceManager, Application Master 1, Application Master 2, Node 1, Node 2, Node 3]

Speech bubble: I want 2 containers with 1024 MB and 1 core each

Page 17: Introduction to YARN Apps

Scheduling on Hadoop

[Diagram: ResourceManager, Application Master 1, Application Master 2, Node 1, Node 2, Node 3]

Speech bubble: Noted

Page 18: Introduction to YARN Apps

Scheduling on Hadoop

[Diagram: ResourceManager, Application Master 1, Application Master 2, Node 1, Node 2, Node 3]

Speech bubble: I’m still here

Page 19: Introduction to YARN Apps

Scheduling on Hadoop

[Diagram: ResourceManager, Application Master 1, Application Master 2, Node 1, Node 2, Node 3]

Speech bubble: I’ll reserve some space on node1 for AM1

Page 20: Introduction to YARN Apps

Scheduling on Hadoop

[Diagram: ResourceManager, Application Master 1, Application Master 2, Node 1, Node 2, Node 3]

Speech bubble: Got anything for me?

Page 21: Introduction to YARN Apps

Scheduling on Hadoop

[Diagram: ResourceManager, Application Master 1, Application Master 2, Node 1, Node 2, Node 3]

Speech bubble: Here’s a security token to let you launch a container on Node 1

Page 22: Introduction to YARN Apps

Scheduling on Hadoop

[Diagram: ResourceManager, Application Master 1, Application Master 2, Node 1, Node 2, Node 3]

Speech bubble: Hey, launch my container with this shell command

Page 23: Introduction to YARN Apps

Scheduling on Hadoop

[Diagram: ResourceManager, Application Master 1, Application Master 2, Node 1, Node 2, Node 3, now with a Container]
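
The exchange walked through on the preceding slides can be replayed as a small simulation. This is a toy model in plain Java, not the YARN API: `ToyResourceManager`, `ToyNodeManager`, the string tokens, and the "free slots" notion are all hypothetical simplifications. The one property it tries to show is that a request is only recorded when it arrives, is placed when a node heartbeats, and reaches the application master on its next heartbeat.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy ResourceManager; illustrative only, not a YARN class. */
class ToyResourceManager {
    private int pendingRequests = 0;                         // containers still wanted
    private final List<String> granted = new ArrayList<>();  // tokens awaiting pickup
    private int nextToken = 1;

    /** AM to RM: "I want N containers". The RM only records the ask ("Noted"). */
    void requestContainers(int count) {
        pendingRequests += count;
    }

    /** NM to RM: "I'm still here". Free capacity on the node lets the RM place pending asks. */
    void nodeHeartbeat(String node, int freeSlots) {
        while (freeSlots > 0 && pendingRequests > 0) {
            granted.add("token-" + nextToken++ + "@" + node);
            freeSlots--;
            pendingRequests--;
        }
    }

    /** AM to RM: "Got anything for me?". Returns the tokens granted since the last call. */
    List<String> amHeartbeat() {
        List<String> out = new ArrayList<>(granted);
        granted.clear();
        return out;
    }
}

/** Toy NodeManager; illustrative only, not a YARN class. */
class ToyNodeManager {
    final List<String> running = new ArrayList<>();

    /** AM to NM: "launch my container with this shell command". */
    void startContainer(String token, String command) {
        running.add(token + " :: " + command);
    }
}

class SchedulingWalkthrough {
    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        ToyNodeManager node1 = new ToyNodeManager();

        rm.requestContainers(2);                 // the ask is noted, nothing is granted yet
        System.out.println("before node heartbeat: " + rm.amHeartbeat());

        rm.nodeHeartbeat("node1", 1);            // node1 reports room for one container
        List<String> tokens = rm.amHeartbeat();  // the AM picks up its grant
        for (String token : tokens) {
            node1.startContainer(token, "sleep 1000");
        }
        System.out.println("running on node1: " + node1.running);
    }
}
```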

Page 24: Introduction to YARN Apps

Should you build a YARN app?

•  MapReduce can’t run arbitrary DAGs?
   o  Use Spark

Page 25: Introduction to YARN Apps

Should you build a YARN app?

•  MapReduce can’t store data in memory?
   o  Use Spark

Page 26: Introduction to YARN Apps

Should you build a YARN app?

•  Iterative processing?
   o  Use Spark

Page 27: Introduction to YARN Apps

Should you build a YARN app?

•  Have an existing distributed app that runs all tasks at once?
   o  Use distributed shell

Page 28: Introduction to YARN Apps

When to build a YARN app

•  Allocating and releasing containers dynamically

•  Weird scheduling requirements
   o  Gang
   o  Complex locality
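
As an illustration of the "gang" case, here is a minimal sketch of an application master holding back containers until a whole gang is available and only then launching them together. It is plain Java with containers modeled as strings; `GangLauncher` and its methods are hypothetical names, not part of YARN.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy gang launcher: buffers containers until a full gang can start at once. Illustrative only. */
class GangLauncher {
    private final int gangSize;
    private final List<String> held = new ArrayList<>();
    private final List<List<String>> launched = new ArrayList<>();

    GangLauncher(int gangSize) {
        if (gangSize <= 0) {
            throw new IllegalArgumentException("gangSize must be positive");
        }
        this.gangSize = gangSize;
    }

    /** Meant to be called from an allocation callback with the newly granted containers. */
    void onContainersAllocated(List<String> containers) {
        held.addAll(containers);
        // Launch only complete gangs; leftovers stay held for the next callback.
        while (held.size() >= gangSize) {
            List<String> gang = new ArrayList<>(held.subList(0, gangSize));
            held.subList(0, gangSize).clear();
            launched.add(gang);
        }
    }

    int heldCount() {
        return held.size();
    }

    List<List<String>> launchedGangs() {
        return launched;
    }
}
```

A usage example under the same assumptions: with `new GangLauncher(3)`, a first callback carrying two containers launches nothing, and a second callback carrying two more launches one gang of three and leaves one container held. A real application master would probably also want to release held containers after a timeout, so that idle containers do not sit unused while other applications wait.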

Page 29: Introduction to YARN Apps

What YARN does for you

•  Deploys your bits
•  Runs your processes
•  Monitors your processes
•  Kills your processes when they misbehave

Page 30: Introduction to YARN Apps

What YARN does not do for you

•  Communication between your processes

Page 31: Introduction to YARN Apps

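AMRMClientAsync

// Callbacks the client invokes as the ResourceManager responds.
AMRMClientAsync.CallbackHandler handler = new AMRMClientAsync.CallbackHandler() {
  @Override
  public void onContainersAllocated(List<Container> containers) {
    for (Container container : containers) {
      startTask(container);
    }
  }
  [... more methods]
};

// Heartbeat to the ResourceManager every 1000 ms.
AMRMClientAsync<ContainerRequest> amClient =
    AMRMClientAsync.createAMRMClientAsync(1000, handler);
// The client is a Hadoop service and has to be initialized and started before use.
// Assumption: conf is a YarnConfiguration created elsewhere.
amClient.init(conf);
amClient.start();

// Arguments: AM host name, RPC port (-1 for none), tracking URL (empty for none).
amClient.registerApplicationMaster(NetUtils.getHostname(), -1, "");

// Ask for a container with 1024 MB and 1 core, naming preferred nodes and racks.
amClient.addContainerRequest(
    new ContainerRequest(
        Resource.newInstance(1024, 1),
        new String[] {"node1", "node2"},
        new String[] {"rack1"},
        Priority.newInstance(2)));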

Page 32: Introduction to YARN Apps

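NMClientAsync

// Callbacks the client invokes as NodeManagers report container events.
NMClientAsync.CallbackHandler nmHandler = new NMClientAsync.CallbackHandler() {
  [... listen for containers stopped and started]
};

NMClientAsync nmClient = NMClientAsync.createNMClientAsync(nmHandler);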

Page 33: Introduction to YARN Apps

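Launching Containers

public void startContainer(Container container) {
  // Describes everything the NodeManager needs to launch the process.
  ContainerLaunchContext launchContext =
      ContainerLaunchContext.newInstance(
          localResources,               // files to make available on the node
          environment,                  // environment variables for the process
          Arrays.asList("sleep 1000"),  // shell command(s) to run
          serviceData,                  // data for NodeManager auxiliary services
          tokens,                       // security tokens
          acls);                        // application access control lists
  // Asynchronously ask the container's NodeManager to start it.
  nmClient.startContainerAsync(container, launchContext);
}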

Page 34: Introduction to YARN Apps

Local resources

[Diagram: HDFS holding file.txt, and two nodes, each running two containers and holding its own copy of file.txt]
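
The idea in this diagram can be sketched as a toy model: a file lives in shared storage, and each node fetches its own local copy, which the containers on that node then read. This is plain Java, not the YARN API. `ToyLocalizer`, the `/local/...` paths, and the "fetch once per node" rule are assumptions made for illustration; YARN's actual sharing behavior depends on how each resource is declared.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of per-node file localization; illustrative only, not a YARN class. */
class ToyLocalizer {
    private final Map<String, Set<String>> nodeCache = new HashMap<>();
    private int downloads = 0;

    /**
     * A container on the given node asks for a file from shared storage.
     * The file is fetched only the first time it is requested on that node.
     * Returns a hypothetical local path for the copy.
     */
    String localize(String node, String file) {
        Set<String> cached = nodeCache.computeIfAbsent(node, n -> new HashSet<>());
        if (cached.add(file)) {
            downloads++; // first request on this node: copy from shared storage
        }
        return "/local/" + node + "/" + file;
    }

    /** Number of copies made from shared storage so far. */
    int downloads() {
        return downloads;
    }
}
```

In this sketch, two containers on the same node asking for file.txt trigger one download, while a container on a second node triggers another, which matches the one-copy-per-node picture in the diagram.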