Transcript
Page 1: Writing Yarn Applications Hadoop Summit 2012

© Hortonworks Inc. 2011

Writing Application Frameworks on Apache Hadoop YARN

Hitesh Shah

[email protected]

Page 2: Writing Yarn Applications Hadoop Summit 2012

Hitesh Shah - Background

• Member of Technical Staff at Hortonworks Inc.

• Committer for Apache MapReduce and Ambari

• Earlier, spent 8+ years at Yahoo! building various infrastructure pieces, all the way from data storage platforms to high-throughput online ad-serving systems.

Page 3: Writing Yarn Applications Hadoop Summit 2012

Agenda

• YARN Architecture and Concepts
• Writing a New Framework

Page 4: Writing Yarn Applications Hadoop Summit 2012

YARN Architecture

• Resource Manager
  – Global resource scheduler
  – Hierarchical queues

• Node Manager
  – Per-machine agent
  – Manages the life-cycle of containers
  – Container resource monitoring

• Application Master
  – Per-application
  – Manages application scheduling and task execution
  – E.g. the MapReduce Application Master

Page 5: Writing Yarn Applications Hadoop Summit 2012

YARN Architecture

Page 6: Writing Yarn Applications Hadoop Summit 2012

YARN Concepts

• Application ID
  – Application Attempt IDs

• Container
  – ContainerLaunchContext

• ResourceRequest
  – Host/Rack/Any match
  – Priority
  – Resource constraints

• Local Resource (see the sketch after this list)
  – File/Archive
  – Visibility: public/private/application
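For illustration, a minimal sketch (in the same Hadoop 2.0.0-alpha API style as the appendix code) of turning a jar that is already in HDFS into an application-visible LocalResource; the path and link name below are hypothetical, and conf is a Hadoop Configuration as in the appendix snippets.

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.util.Records;

Path jarPath = new Path("hdfs:///apps/myapp/AppMaster.jar"); // hypothetical location
FileStatus jarStatus = FileSystem.get(conf).getFileStatus(jarPath);

LocalResource amJar = Records.newRecord(LocalResource.class);
amJar.setType(LocalResourceType.FILE);                    // FILE or ARCHIVE
amJar.setVisibility(LocalResourceVisibility.APPLICATION); // public/private/application
amJar.setResource(ConverterUtils.getYarnUrlFromPath(jarPath));
amJar.setTimestamp(jarStatus.getModificationTime());      // must match the copy in HDFS
amJar.setSize(jarStatus.getLen());

Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
localResources.put("AppMaster.jar", amJar); // key = link name in the container work dir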

Page 7: Writing Yarn Applications Hadoop Summit 2012

What you need for a new Framework

• Application Submission Client
  – For example, the MR Job Client

• Application Master
  – The core framework library

• Application History (optional)
  – History of all previously run instances

• Auxiliary Services (optional)
  – Long-running, application-specific services running on the NodeManager (see the sketch after this list)
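As a sketch of what such a service can look like, here is a skeleton against the public AuxiliaryService API that later Hadoop 2.x releases expose (at the time of this deck the hook was still NodeManager-internal); the class and service name are hypothetical, and the MapReduce ShuffleHandler is the real-world example.

import java.nio.ByteBuffer;
import org.apache.hadoop.yarn.server.api.ApplicationInitializationContext;
import org.apache.hadoop.yarn.server.api.ApplicationTerminationContext;
import org.apache.hadoop.yarn.server.api.AuxiliaryService;

public class MyAuxService extends AuxiliaryService {

  public MyAuxService() {
    super("my_aux_service"); // name the NodeManager registers the service under
  }

  @Override
  public void initializeApplication(ApplicationInitializationContext initCtx) {
    // called when the first container of an application starts on this node
  }

  @Override
  public void stopApplication(ApplicationTerminationContext stopCtx) {
    // called when the application finishes on this node; clean up per-app state
  }

  @Override
  public ByteBuffer getMetaData() {
    // handed back to the ApplicationMaster when it starts containers on this node
    return ByteBuffer.allocate(0);
  }
}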

Page 8: Writing Yarn Applications Hadoop Summit 2012

Use Case: Distributed Shell

• Take a user-provided script or application and run it on a set of nodes in the cluster

• Input:
  – User script to execute
  – Number of containers to run on
  – Variable arguments for each different container
  – Memory requirements for the shell script
  – Output location/dir

[Diagram: the DS AppMaster runs on one NodeManager and launches shell-script containers on the other NodeManagers]

Page 9: Writing Yarn Applications Hadoop Summit 2012

Client: RPC calls

• Uses the ClientRMProtocol

• Get a new Application ID from the RM

• Application Submission

• Application Monitoring

• Kill the Application?

Page 10: Writing Yarn Applications Hadoop Summit 2012

Client

• Registration with the RM
  – New Application ID

• Application Submission
  – User information
  – Scheduler queue
  – Define the container for the Distributed Shell AppMaster via the ContainerLaunchContext

• Application Monitoring
  – AppMaster host details with tokens if needed, tracking URL
  – Application status (submitted/running/finished)

Page 11: Writing Yarn Applications Hadoop Summit 2012

Defining a Container

• ContainerLaunchContext class
  – Can run a shell script, a Java process, or launch a VM

• Command(s) to run

• Local resources needed for the process to run
  – Dependent jars, native libs, data files/archives

• Environment to set up (see the sketch after this list)
  – Java classpath

• Security-related data
  – Container Tokens
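A minimal sketch of the environment and command pieces, in the same snippet style as the appendix code: amContainer is the ContainerLaunchContext from the App Submission example, localResources comes from the earlier LocalResource sketch, and MyAppMaster, the heap size, the classpath entries, and the arguments are hypothetical.

Map<String, String> env = new HashMap<String, String>();
env.put("CLASSPATH", "${CLASSPATH}:./*:./lib/*"); // jars localized into the container dir

List<String> commands = new ArrayList<String>();
commands.add("${JAVA_HOME}/bin/java"
    + " -Xmx256m"
    + " MyAppMaster"
    + " arg1 arg2"
    + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"   // <LOG_DIR> is
    + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"); // expanded at launch

amContainer.setEnvironment(env);
amContainer.setCommands(commands);
amContainer.setLocalResources(localResources);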

Page 12: Writing Yarn Applications Hadoop Summit 2012

Application Master: RPC calls

• Uses the AMRM and ContainerManager (CM) protocols

• Register AM with RM

• Ask RM to allocate resources

• Launch tasks on allocated containers

• Manage tasks to final completion

• Inform RM of completion

Page 13: Writing Yarn Applications Hadoop Summit 2012

Application Master

• Set up RPC to handle requests from the client and/or from tasks launched on containers

• Register and send regular heartbeats to the RM (see the registration sketch after this list)

• Request resources from the RM.

• Launch the user shell script on containers as and when they are allocated.

• Monitor the status of the user script on remote containers and manage failures by retrying if needed.

• Inform the RM of completion when the application is done.
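A minimal registration sketch, in the same 2.0.0-alpha API style as the appendix code: resourceManager is the AMRMProtocol proxy used there, and appHostName, appHostPort and appTrackingUrl are placeholder values describing this AM instance.

RegisterApplicationMasterRequest regRequest =
    Records.newRecord(RegisterApplicationMasterRequest.class);
regRequest.setApplicationAttemptId(appAttemptID);
regRequest.setHost(appHostName);
regRequest.setRpcPort(appHostPort);
regRequest.setTrackingUrl(appTrackingUrl);

RegisterApplicationMasterResponse regResponse =
    resourceManager.registerApplicationMaster(regRequest);
// the response carries the cluster's min/max resource capabilities,
// useful for sanity-checking container requests

// Every AMRMProtocol#allocate call (see "AM: Ask RM for Containers") also acts as a
// heartbeat, so the AM calls allocate periodically even with nothing new to ask for.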

Page 14: Writing Yarn Applications Hadoop Summit 2012

AMRM#allocate

• Request:
  – Containers needed
    – Not a delta protocol
    – Locality constraints: Host/Rack/Any
    – Resource constraints: memory
    – Priority-based assignments
  – Containers to release (extra/unwanted)
    – Only non-launched containers

• Response:
  – Allocated containers
    – Launch or release
  – Completed containers
    – Status of completion

Page 15: Writing Yarn Applications Hadoop Summit 2012

YARN Applications

• Data Processing:
  – OpenMPI on Hadoop
  – Spark (UC Berkeley)
    – Shark (Hive-on-Spark)
  – Real-time data processing
    – Storm (Twitter)
    – Apache S4
  – Graph processing
    – Apache Giraph

• Beyond data:
  – Deploying Apache HBase via YARN (HBASE-4329)
  – HBase co-processors via YARN (HBASE-4047)

Page 16: Writing Yarn Applications Hadoop Summit 2012

References

• Doc on writing new applications:
  – WritingYarnApplications.html (available at http://hadoop.apache.org/common/docs/r2.0.0-alpha/)

Page 17: Writing Yarn Applications Hadoop Summit 2012

Questions?

Thank You!

Hitesh Shah
[email protected]

Page 18: Writing Yarn Applications Hadoop Summit 2012

Appendix: Code Examples

Page 19: Writing Yarn Applications Hadoop Summit 2012

Client: Registration

ClientRMProtocol applicationsManager;
YarnConfiguration yarnConf = new YarnConfiguration(conf);
InetSocketAddress rmAddress = NetUtils.createSocketAddr(
    yarnConf.get(YarnConfiguration.RM_ADDRESS));
applicationsManager = (ClientRMProtocol) rpc.getProxy(
    ClientRMProtocol.class, rmAddress, appsManagerServerConf);

// Ask the RM for a new application id
GetNewApplicationRequest request =
    Records.newRecord(GetNewApplicationRequest.class);
GetNewApplicationResponse response =
    applicationsManager.getNewApplication(request);

Page 20: Writing Yarn Applications Hadoop Summit 2012

Client: App Submission

ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class);
appContext.setApplicationId(appId); // id obtained via getNewApplication()

ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
amContainer.setLocalResources(localResources); // Map<String, LocalResource>
amContainer.setEnvironment(env);               // Map<String, String>

String command = "${JAVA_HOME}/bin/java" + " MyAppMaster" + " arg1 arg2";
amContainer.setCommands(Collections.singletonList(command));

Resource capability = Records.newRecord(Resource.class);
capability.setMemory(amMemory);
amContainer.setResource(capability);

appContext.setAMContainerSpec(amContainer);

SubmitApplicationRequest appRequest = Records.newRecord(SubmitApplicationRequest.class);
appRequest.setApplicationSubmissionContext(appContext);
applicationsManager.submitApplication(appRequest);

Page 21: Writing Yarn Applications Hadoop Summit 2012

Client: App Monitoring

• Get Application Status

GetApplicationReportRequest reportRequest =
    Records.newRecord(GetApplicationReportRequest.class);
reportRequest.setApplicationId(appId);

GetApplicationReportResponse reportResponse =
    applicationsManager.getApplicationReport(reportRequest);
ApplicationReport report = reportResponse.getApplicationReport();

• Kill the application

KillApplicationRequest killRequest =
    Records.newRecord(KillApplicationRequest.class);
killRequest.setApplicationId(appId);
applicationsManager.forceKillApplication(killRequest);

Page 22: Writing Yarn Applications Hadoop Summit 2012

AM: Ask RM for Containers

ResourceRequest rsrcRequest = Records.newRecord(ResourceRequest.class);
rsrcRequest.setHostName("*"); // hostname, rack, or "*" wildcard
rsrcRequest.setPriority(pri);

Resource capability = Records.newRecord(Resource.class);
capability.setMemory(containerMemory);
rsrcRequest.setCapability(capability);
rsrcRequest.setNumContainers(numContainers);

List<ResourceRequest> requestedContainers = new ArrayList<ResourceRequest>();
requestedContainers.add(rsrcRequest);
List<ContainerId> releasedContainers = new ArrayList<ContainerId>();

AllocateRequest req = Records.newRecord(AllocateRequest.class);
req.setResponseId(rmRequestID);
req.addAllAsks(requestedContainers);
req.addAllReleases(releasedContainers);
req.setProgress(currentProgress);

AllocateResponse allocateResponse = resourceManager.allocate(req);

Page 23: Writing Yarn Applications Hadoop Summit 2012

AM: Launch Containers

AMResponse amResp = allocateResponse.getAMResponse();
List<Container> allocatedContainers = amResp.getAllocatedContainers();

for (Container allocatedContainer : allocatedContainers) {
  // connect to the NodeManager hosting this container
  // (cmAddress is derived from allocatedContainer.getNodeId())
  ContainerManager cm = (ContainerManager) rpc.getProxy(
      ContainerManager.class, cmAddress, conf);

  ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
  ctx.setContainerId(allocatedContainer.getId());
  ctx.setResource(allocatedContainer.getResource());
  // set env, command, local resources, …

  StartContainerRequest startReq = Records.newRecord(StartContainerRequest.class);
  startReq.setContainerLaunchContext(ctx);
  cm.startContainer(startReq);
}

Page 24: Writing Yarn Applications Hadoop Summit 2012

AM: Monitoring Containers

• Running containers

GetContainerStatusRequest statusReq =
    Records.newRecord(GetContainerStatusRequest.class);
statusReq.setContainerId(containerId);
GetContainerStatusResponse statusResp = cm.getContainerStatus(statusReq);

• Completed containers

AMResponse amResp = allocateResponse.getAMResponse();
List<ContainerStatus> completedContainers =
    amResp.getCompletedContainersStatuses();
for (ContainerStatus containerStatus : completedContainers) {
  // containerStatus.getContainerId()
  // containerStatus.getExitStatus()
  // containerStatus.getDiagnostics()
}

Page 25: Writing Yarn Applications Hadoop Summit 2012

AM: I am done

FinishApplicationMasterRequest finishReq =
    Records.newRecord(FinishApplicationMasterRequest.class);
finishReq.setAppAttemptId(appAttemptID);
finishReq.setFinishApplicationStatus(FinalApplicationStatus.SUCCEEDED); // or FAILED
finishReq.setDiagnostics(diagnostics);
resourceManager.finishApplicationMaster(finishReq);

