Scheduling in HPC Resource Management Systems: Queuing vs. Planning. Matthias Hovestadt, Odej Kao, Alex Keller, and Achim Streit. 2003 Job Scheduling Strategies for Parallel Processing (JSSPP) Workshop. Jerry Chou, 8/29/2005.


Scheduling in HPC Resource Management Systems: Queuing vs. Planning

Matthias Hovestadt, Odej Kao, Alex Keller, and Achim Streit
2003 Job Scheduling Strategies for Parallel Processing (JSSPP) Workshop
Jerry Chou, 8/29/2005


Outline

• Background
• Queuing and Planning Systems
• Advanced Planning Functions
• Example: Computing Center Software
• Conclusion
• Discussion


Background

HPC systems are operated by resource management systems (RMS) based on the queuing approach
• PBS, SGE, LoadLeveler, etc.

Grid middleware emerges between resource management systems and applications
• Globus, vgES, etc.

High-level functions (e.g., co-allocation) need features from the RMS
• Advance reservation, quality of service

These features are hard to realize with a queuing-based RMS because it considers only present resource usage

=> This paper proposes planning systems to close the gap


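Big Picture

[Figure: layered diagram. Applications run on top of grid middleware (Globus, vgES); the middleware sits on top of the RMSs (PBS, LoadLeveler, SGE, Condor), which manage the resources. Labels between the layers: co-allocation, QoS, advance reservation.]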


Queuing and Planning Systems

• Queuing Systems
• Planning Systems
• Queuing vs. Planning Systems


Queuing Systems

Queues have different limits on the resource requests
• Number of resources requested
• Execution time
• Interactive/batch jobs

Jobs in each queue are sorted by the scheduling policy
• The highest-priority request is the queue head

If more than one queue head can be started, further criteria are needed, such as queue priority

If no queue head can be started, the idle resources may be utilized with backfilling
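
The queue-head rule and the backfilling fallback can be illustrated with a minimal Python sketch. It assumes a single queue, a node count as the only resource, and a "larger number means higher priority" convention; `Job` and `start_jobs` are hypothetical names, not part of any RMS. Because it uses no runtime estimates, a backfilled job may delay the queue head, which is the situation that estimate-based backfilling variants avoid.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int     # number of nodes requested
    priority: int  # larger value = higher priority (assumed convention)


def start_jobs(queue, free_nodes):
    """Return the jobs started right now and the number of nodes left idle.

    Jobs are visited in priority order, so the queue head is tried first.
    If it does not fit, lower-priority jobs that fit into the idle nodes
    are started instead (naive backfilling without runtime estimates).
    """
    started = []
    for job in sorted(queue, key=lambda j: -j.priority):
        if job.nodes <= free_nodes:
            started.append(job)
            free_nodes -= job.nodes
    return started, free_nodes


queue = [Job("A", 8, 3), Job("B", 2, 2), Job("C", 4, 1)]
started, idle = start_jobs(queue, free_nodes=6)
print([j.name for j in started], idle)  # head A (8 nodes) waits; B and C are backfilled
```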


Planning Systems - Replanning

Request parameters
• Requested start time
• Estimated run time

When
• A new request is submitted
• A running request ends before its estimated end time

How
• Delete all non-reservations from the schedule
• Sort non-reservations according to the scheduling policy
• Arrange reservations into the schedule
• Insert non-reservations into the schedule at the earliest possible start time
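
The replanning steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: the machine is a pool of identical nodes, time is an integer, reservations are assumed not to conflict with each other, and every request fits on the machine. `Request`, `fits`, and `replan` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    name: str
    nodes: int
    runtime: int                 # estimated run time
    start: Optional[int] = None  # a fixed start time marks a reservation
    priority: int = 0            # larger value = planned earlier (assumed policy)


def fits(plan, total_nodes, start, req):
    """True if req can run in [start, start + runtime) beside the planned entries."""
    end = start + req.runtime
    # Node usage only rises where a planned entry starts, so checking `start`
    # and every entry start inside the interval is sufficient.
    points = [start] + [s for s, _, _ in plan if start < s < end]
    for t in points:
        used = sum(r.nodes for s, e, r in plan if s <= t < e)
        if used + req.nodes > total_nodes:
            return False
    return True


def replan(requests, total_nodes, now=0):
    """Rebuild the whole schedule as a list of (start, end, request) tuples."""
    # Steps 1 and 2: take all non-reservations out and sort them by policy.
    others = sorted((r for r in requests if r.start is None),
                    key=lambda r: -r.priority)
    # Step 3: reservations go into the schedule at their fixed start times.
    plan = [(r.start, r.start + r.runtime, r)
            for r in requests if r.start is not None]
    # Step 4: each non-reservation gets the earliest feasible start, which is
    # either `now` or the end time of an already planned entry.
    for r in others:
        candidates = sorted({now} | {e for _, e, _ in plan if e > now})
        start = next(t for t in candidates if fits(plan, total_nodes, t, r))
        plan.append((start, start + r.runtime, r))
    return sorted(plan, key=lambda p: p[0])


requests = [
    Request("R", nodes=4, runtime=5, start=10),   # reservation
    Request("A", nodes=2, runtime=8, priority=2),
    Request("B", nodes=2, runtime=12, priority=1),
    Request("C", nodes=4, runtime=2, priority=0),
]
plan = replan(requests, total_nodes=4)
print([(r.name, s) for s, _, r in plan])
```

In this example B cannot start at time 0 or 8 without colliding with the reservation R, so it is planned after R, while the short request C fills the gap in front of R.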


Queuing vs. Planning Systems

                                    Queuing          Planning
Planning time frame                 Present          Present and future
Submission of resource requests     Insert in queue  Replanning
Assignment of proposed start time   No               All requests
Runtime estimates                   Not necessary    Yes
Reservation                         Not possible     Yes
Backfilling                         Optional         Yes


Advanced Planning Functions

• Requesting Resources
• Dynamic Aspects
• Service Level Agreements


Requesting Resources

Diffuse requests
• Give a range: “need 32 to 128 CPUs”
• Let the RMS optimize: “need as many nodes as possible”

Negotiation
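
A range request can be resolved against the currently free CPUs with a few lines of Python. This is only a sketch of one possible policy (grant as much of the range as is free); `resolve_diffuse` and its parameters are hypothetical, and a real RMS could equally trade the CPU count against an earlier start time.

```python
def resolve_diffuse(min_cpus, max_cpus, free_cpus):
    """Return the CPU count to grant, or None if even the minimum is not free."""
    if free_cpus < min_cpus:
        return None
    return min(max_cpus, free_cpus)


print(resolve_diffuse(32, 128, free_cpus=100))  # grants what is free within the range
print(resolve_diffuse(32, 128, free_cpus=200))  # capped at the upper bound
print(resolve_diffuse(32, 128, free_cpus=16))   # minimum not available
```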


Dynamic Aspects

Variable Reservations
• Make a reservation ASAP
• Different from reserved jobs: no fixed start time
• Different from non-reserved jobs: never planned later than its first planned start time

Resource Reclaiming
• Replace requested resources at run time

Automatic Duration Extension
• Extend the runtime of jobs while they are running
• How long can it be extended?
• How many times can it be extended?
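
The two open questions on automatic duration extension (how long, how many times) amount to limits that the planner checks before moving a job's end time. The Python sketch below assumes such limits exist as site-configurable values and that the only obstacle is the next planned entry on the same resources; `RunningJob`, `try_extend`, `max_extra`, and `max_extensions` are hypothetical names and defaults.

```python
from dataclasses import dataclass


@dataclass
class RunningJob:
    end: int                  # currently planned end time
    extensions_used: int = 0  # how often the job has been extended so far


def try_extend(job, extra, next_planned_start, max_extra=30, max_extensions=2):
    """Extend job.end by `extra` time units if the limits and the plan allow it."""
    if job.extensions_used >= max_extensions or extra > max_extra:
        return False  # assumed policy limits on count and length
    if next_planned_start is not None and job.end + extra > next_planned_start:
        return False  # would collide with the next entry in the schedule
    job.end += extra
    job.extensions_used += 1
    return True


job = RunningJob(end=100)
print(try_extend(job, 20, next_planned_start=130), job.end)
print(try_extend(job, 20, next_planned_start=130), job.end)
```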


Dynamic Aspects (Cont.)

Automatic Restart
• It can utilize short time slots in the schedule

Space-sharing “Cycle Stealing”
• Run as a background job to steal resources in a space-sharing system (like Condor)

Deployment Servers
• RMS plans both the requested resources and the time to reconfigure the hardware


Service Level Agreements (SLA)

SLAs have to be considered not only in the scheduling process but also at runtime

At runtime the scheduler is not responsible for measuring the fulfillment of the SLA, but for providing all granted resources


Computing Center Software (CCS)

Architecture

User Interface (UI): provides a single access point to one or more systems

Access Manager (AM): manages the user interfaces and is responsible for authentication, authorization, and accounting

Planning Manager (PM): plans the user requests onto the machine

Machine Manager (MM): provides machine-specific features

Island Manager (IM): provides CCS internal services and watchdog facilities to keep the island in a stable condition


Process Flow

User: specifies the expected duration of their requests

PM: re-plans the schedule
• Fix-time request: reserves resources for a given time
• Var-time request: can move to an earlier time slot when replanning

MM: maps the schedule to machines
• Verifies whether a schedule can be realized with the available hardware

[Figure: flowchart. Requests go to the PM, which produces a schedule that is passed to the MM. Node labels: "Conflict list?" (No/Yes), "Send conflict list to PM", "Find alternative time", "Can PM accept?" (No/Yes), "Done".]


Conclusion

Classify and compare queuing systems with planning systems

Present possible advanced planning functionality

The aim of the paper is to show the benefit of planning systems for managing HPC machines


Discussion

Does a planning system solve all the problems?
• What if most jobs want to run ASAP?
• What if runtimes are not estimated precisely?

What is the performance and utilization comparison between queuing systems and planning systems?
• If you are a resource provider, will you use it?

Which features could be provided by vgES?
• Diffuse requests
• Resource reclaiming
• Variable reservation
• Negotiation