Scheduling in HPC Resource Management System: Queuing vs. Planning
Matthias Hovestadt, Odej Kao, Alex Keller, and Achim Streit
2003 Job Scheduling Strategies for Parallel Processing (JSSPP) Workshop
Jerry Chou, 8/29/2005
Outline
• Background
• Queuing and Planning Systems
• Advanced Planning Functions
• Example: Computing Center Software
• Conclusion
• Discussion
Background
• HPC systems are operated by resource management systems (RMS) based on the queuing approach: PBS, SGE, LoadLeveler, etc.
• Grid middleware (Globus, vgES, etc.) is emerging between the resource management systems and the applications
• High-level functions such as co-allocation need features from the RMS: advance reservation and quality of service (QoS)
• These features are hard to realize with queuing-based RMS, because they only consider the present resource usage
• => This paper proposes planning systems to close the gap
Big Picture (figure: layered architecture, top to bottom)
• Application
• Grid middleware (Globus, vgES): co-allocation, which needs QoS and advance reservation from the RMS layer
• RMS: PBS, LoadLeveler, SGE, Condor
• Resources
Queuing and Planning Systems
• Queuing Systems
• Planning Systems
• Queuing vs. Planning Systems
Queuing Systems
• Queues have different limits on resource requests: number of requested resources, execution time, interactive/batch jobs
• Jobs in a queue are sorted by the scheduling policy; the highest-priority request is the queue head
• If the heads of more than one queue can be started, further criteria such as queue priority are needed
• If no queue head can be started, idle resources may be utilized by backfilling
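The queue-based decision above can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic of PBS, SGE, or LoadLeveler: the `Job`/`Queue` classes, the priority-sort policy, and the EASY-style backfilling rule (a waiting job may start only if it does not delay the earliest possible start of the top blocked queue head) are choices made for the example. Note that the head-starting step looks only at present free nodes; runtime estimates are needed only for backfilling.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    nodes: int
    runtime: int       # user estimate; in this sketch only backfilling uses it
    priority: int = 0


@dataclass
class Queue:
    name: str
    priority: int
    max_nodes: int     # limit on the number of requested resources
    max_runtime: int   # limit on the execution time
    jobs: list = field(default_factory=list)

    def submit(self, job: Job) -> None:
        if job.nodes > self.max_nodes or job.runtime > self.max_runtime:
            raise ValueError(f"{job.name} violates the limits of queue {self.name}")
        self.jobs.append(job)
        # Assumed policy: highest job priority first (stable sort keeps FCFS among equals).
        self.jobs.sort(key=lambda j: -j.priority)


def schedule_now(queues, now, total_nodes, running):
    """running: list of (end_time, nodes) of executing jobs. Returns jobs started at `now`."""
    free = total_nodes - sum(n for _, n in running)
    started = []
    ordered = sorted(queues, key=lambda q: -q.priority)  # queue priority breaks ties

    def start(q, job):
        nonlocal free
        q.jobs.remove(job)
        free -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job.name)

    # 1. Start queue heads while any of them fits into the present free nodes.
    while True:
        q = next((q for q in ordered if q.jobs and q.jobs[0].nodes <= free), None)
        if q is None:
            break
        start(q, q.jobs[0])

    # 2. No head fits: compute when the top blocked head could start (shadow time).
    heads = [q.jobs[0] for q in ordered if q.jobs]
    if not heads:
        return started
    blocked = heads[0]
    avail, shadow = free, None
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= blocked.nodes:
            shadow = end
            break
    if shadow is None:
        return started  # blocked head can never fit; no backfilling in this sketch
    extra = avail - blocked.nodes  # nodes the blocked head will not need at shadow time

    # 3. Backfill waiting jobs that do not delay the blocked head.
    for q in ordered:
        for job in list(q.jobs):
            if job is blocked or job.nodes > free:
                continue
            ends_before_shadow = now + job.runtime <= shadow
            if ends_before_shadow or job.nodes <= extra:
                if not ends_before_shadow:
                    extra -= job.nodes
                start(q, job)
    return started
```

With 6 of 8 nodes busy until t=10, a 4-node head job must wait, but a 2-node job that finishes by t=10 can be backfilled without delaying it.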
Planning Systems - Replanning
• Each request specifies: requested start time, estimated run time
• When is the schedule replanned?
  - A new request is submitted
  - A running request ends before its estimated end time
• How:
  1. Delete all non-reservations from the schedule
  2. Sort the non-reservations according to the scheduling policy
  3. Arrange the reservations in the schedule
  4. Insert the non-reservations into the schedule at the earliest possible start time
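A minimal sketch of the four replanning steps, assuming a single machine with `total` interchangeable nodes, integer times, first-come-first-served as the scheduling policy, and "as soon as possible" as the requested start of every non-reservation; `Request`, `fits`, and `replan` are hypothetical names. The earliest possible start of a non-reservation is always either `now` or the end of something already planned, so only those candidate times are tested.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    name: str
    nodes: int
    runtime: int                 # estimated run time, mandatory in a planning system
    submit: int                  # submission time, used by the FCFS policy below
    start: Optional[int] = None  # set => reservation with a fixed start time


def fits(plan, total, t, runtime, nodes):
    """True if `nodes` nodes are free during the whole interval [t, t + runtime)."""
    # Usage only rises where a planned interval starts, so checking t and
    # every start inside the window is enough.
    points = [t] + [s for s, _, _, _ in plan if t < s < t + runtime]
    return all(
        sum(n for s, e, n, _ in plan if s <= p < e) + nodes <= total for p in points
    )


def replan(requests, now, total, policy=lambda r: r.submit):
    # 1. Delete all non-reservations: the plan is rebuilt from scratch.
    reservations = [r for r in requests if r.start is not None]
    # 2. Sort the non-reservations according to the scheduling policy.
    others = sorted((r for r in requests if r.start is None), key=policy)
    # 3. Arrange the reservations at their fixed start times.
    plan = [(r.start, r.start + r.runtime, r.nodes, r.name) for r in reservations]
    # 4. Insert each non-reservation at its earliest possible start time
    #    (assumes every request needs at most `total` nodes).
    for r in others:
        candidates = sorted({now} | {e for _, e, _, _ in plan if e > now})
        t = next(c for c in candidates if fits(plan, total, c, r.runtime, r.nodes))
        plan.append((t, t + r.runtime, r.nodes, r.name))
    return {name: s for s, _, _, name in plan}
```

Both triggers amount to calling `replan` again: once with the new request added, or with `now` set to the time at which a running request ended early. In this sketch a running request is passed back in as an entry with `start` set and `runtime` equal to its actual duration; for example, if request A (4 nodes) ends at t=4 instead of t=10, a 6-node request that previously had to wait behind an 8-node reservation at t=20 can now be planned at t=4.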
Queuing vs. Planning Systems
Feature                              Queuing          Planning
Planning time frame                  Present          Present and future
Submission of resource requests      Insert in queue  Replanning
Assignment of proposed start time    No               All requests
Runtime estimates                    Not necessary    Yes
Reservations                         Not possible     Yes
Backfilling                          Optional         Yes
Advanced Planning Functions
• Requesting Resources
• Dynamic Aspects
• Service Level Agreements
Requesting Resources
• Diffuse requests
  - Give a range: "need 32-128 CPUs"
  - Let the RMS optimize: "need as many nodes as possible"
• Negotiation
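One way an RMS could resolve a diffuse request is to try every node count in the requested range and keep the placement that finishes earliest. The sketch below assumes a free-node profile given as `(time, free_nodes)` breakpoints and perfectly linear speedup, i.e. runtime = ceil(work / nodes); the speedup model, the tie-break in favor of fewer nodes, and the function names are illustrative assumptions, and negotiation is not modeled.

```python
import math


def free_at(profile, t):
    """profile: sorted (time, free_nodes) breakpoints; each value holds until the next one."""
    free = 0
    for time, nodes in profile:
        if time > t:
            break
        free = nodes
    return free


def min_free(profile, start, end):
    """Smallest number of free nodes anywhere in [start, end)."""
    points = [start] + [time for time, _ in profile if start < time < end]
    return min(free_at(profile, p) for p in points)


def place_diffuse(profile, lo, hi, work):
    """Return (start, nodes, finish) with the earliest finish for lo <= nodes <= hi.

    Assumes perfect speedup: runtime = ceil(work / nodes).
    """
    best = None
    for nodes in range(lo, hi + 1):
        runtime = math.ceil(work / nodes)
        for start, _ in profile:  # the earliest feasible start is always a breakpoint
            if min_free(profile, start, start + runtime) >= nodes:
                finish = start + runtime
                # Strict '<' keeps the smaller node count on ties (assumed tie-break).
                if best is None or finish < best[2]:
                    best = (start, nodes, finish)
                break
    return best
```

For "need 32-128 CPUs" with 32 CPUs free now and 128 free from t=10, waiting for 128 CPUs finishes first; if 64 CPUs are free now and 128 only at t=100, starting immediately on 64 wins.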
Dynamic Aspects
• Variable reservations: a reservation made as soon as possible (ASAP)
  - Unlike reserved jobs: no fixed start time
  - Unlike non-reserved jobs: never planned later than its first planned start time
• Resource reclaiming: change the requested resources at run time
• Automatic duration extension: extend the runtime of jobs while they are running
  - How long can a job be extended?
  - How many times can it be extended?
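The variable-reservation rule can be expressed as an extra constraint during replanning. In this hypothetical sketch (the names `Req`, `kind`, and `first_start` are illustrative), fixed reservations are placed first, then variable reservations, then ordinary jobs. A variable reservation takes the earliest slot when it is first planned, and that start becomes an upper bound for every later replanning; if no slot at or before the bound exists, the sketch raises an error, which an RMS would have to prevent by admission control (an assumption, not something the slides specify). The sketch assumes the variable reservation has not started yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Req:
    name: str
    nodes: int
    runtime: int
    kind: str                          # "fixed", "variable" or "normal" (illustrative labels)
    start: Optional[int] = None        # fixed start time, used only when kind == "fixed"
    first_start: Optional[int] = None  # variable reservations: start from the first planning


def fits(plan, total, t, d, n):
    """True if n nodes are free during [t, t + d); plan holds (start, end, nodes)."""
    points = [t] + [s for s, _, _ in plan if t < s < t + d]
    return all(sum(k for s, e, k in plan if s <= p < e) + n <= total for p in points)


def replan(reqs, now, total):
    plan, starts = [], {}

    def place(r, t):
        plan.append((t, t + r.runtime, r.nodes))
        starts[r.name] = t

    def candidates():
        return {now} | {e for _, e, _ in plan if e > now}

    for r in reqs:
        if r.kind == "fixed":
            place(r, r.start)
    # Already planned variable reservations first (by their bound), new ones last.
    variable = sorted((r for r in reqs if r.kind == "variable"),
                      key=lambda r: (r.first_start is None, r.first_start or 0))
    for r in variable:
        cands = candidates()
        if r.first_start is not None:
            # Never later than the first planned start, but earlier is allowed.
            cands = {c for c in cands if c <= r.first_start} | {r.first_start}
        ok = [c for c in sorted(cands) if fits(plan, total, c, r.runtime, r.nodes)]
        if not ok:
            raise RuntimeError(f"{r.name} would move past its first planned start")
        if r.first_start is None:
            r.first_start = ok[0]  # made ASAP; this start becomes its upper bound
        place(r, ok[0])
    for r in reqs:
        if r.kind == "normal":
            t = next(c for c in sorted(candidates())
                     if fits(plan, total, c, r.runtime, r.nodes))
            place(r, t)
    return starts
```

On a 4-node machine, a variable reservation first planned at t=10 (behind a fixed reservation) moves forward to t=6 when that reservation ends early, and it is placed ahead of non-reserved jobs each time.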
Dynamic Aspects (Cont.)
• Automatic restart: restartable jobs can utilize short time slots in the schedule
• Space sharing "cycle stealing": run as a background job that steals idle resources in a space-sharing system (like Condor)
• Deployment servers: the RMS plans both the requested resources and the time needed to reconfigure the hardware
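To illustrate how automatic restart lets a job use short time slots, the sketch below splits the remaining work of a restartable (checkpointing) job across idle gaps in the plan. It assumes the planner can list the gaps as `(start, end, free_nodes)`, that every start or restart costs a fixed `restart_overhead`, and that a gap is useful only if it is longer than that overhead; `fill_gaps` and its parameters are hypothetical.

```python
def fill_gaps(gaps, nodes, work, restart_overhead):
    """Plan runs of a restartable job in idle gaps.

    gaps: (start, end, free_nodes) idle slots in the plan.
    Returns the planned (start, end) runs and the work still left over.
    """
    runs = []
    remaining = work
    for start, end, free in sorted(gaps):
        if remaining <= 0:
            break
        usable = end - start - restart_overhead  # every (re)start costs the overhead
        if free < nodes or usable <= 0:
            continue  # too few nodes, or the gap is shorter than the restart cost
        chunk = min(remaining, usable)
        runs.append((start, start + restart_overhead + chunk))
        remaining -= chunk
    return runs, remaining
```

A 4-node job with 20 units of work and a restart cost of 1 would run in the gaps [0, 5), [10, 12), and [20, 36), skipping a gap that offers only 2 free nodes.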
Service Level Agreements (SLA)
• The SLA has to be considered not only in the scheduling process but also at run time
• At run time, the scheduler is not responsible for measuring whether the SLA is fulfilled, but for providing all granted resources
Computing Center Software (CCS)
Architecture
• User Interface (UI): provides a single access point to one or more systems
• Access Manager (AM): manages the user interface and is responsible for authentication, authorization, and accounting
• Planning Manager (PM): plans the user requests onto the machine
• Machine Manager (MM): provides machine-specific features
• Island Manager (IM): provides CCS-internal services and watchdog facilities to keep the island in a stable condition
Process Flow
• User: specifies the expected duration of each request
• PM: (re-)plans the schedule
  - Fixed-time request: reserves resources for a given time
  - Variable-time request: can move to an earlier time slot when replanning
• MM: maps the schedule onto the machine
Flow (figure): Requests → PM builds the schedule → MM verifies whether the schedule can be realized with the available hardware (yes: done; no: send a conflict list to the PM) → Can the PM accept? (yes: done; no: find an alternative time)
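The PM/MM interplay in the flow above can be sketched as a loop: the PM plans with node counts only, the MM tries to map the plan onto the real hardware and returns a conflict list, and the PM looks for an alternative time for each conflicting request. Everything concrete here is a hypothetical stand-in: the hardware rule (a request must fit inside one partition), the unit time steps, the "try one step later" alternative-time strategy, and the names `pm_plan`, `mm_verify`, and `ccs_schedule`. It is not the CCS implementation, and it assumes every request fits into one partition.

```python
def count_fits(plan, total, t, dur, nodes):
    """PM view: only the number of busy nodes per time step matters."""
    return all(
        sum(n for _, s, e, n in plan if s <= p < e) + nodes <= total
        for p in range(t, t + dur)
    )


def pm_plan(requests, total, not_before):
    """PM: place each (name, nodes, duration) at its earliest count-feasible start."""
    plan = []
    for name, nodes, dur in requests:
        t = not_before.get(name, 0)
        while not count_fits(plan, total, t, dur, nodes):
            t += 1  # unit time steps keep the sketch simple
        plan.append((name, t, t + dur, nodes))
    return plan


def mm_verify(plan, partitions):
    """MM: map every request onto a single partition (hypothetical hardware rule)."""
    placed, mapping, conflicts = [], {}, []
    for name, s, e, nodes in sorted(plan, key=lambda x: x[1]):
        for idx, size in enumerate(partitions):
            if all(
                sum(n for i, a, b, n in placed if i == idx and a <= p < b) + nodes <= size
                for p in range(s, e)
            ):
                placed.append((idx, s, e, nodes))
                mapping[name] = idx
                break
        else:
            conflicts.append((name, s))  # cannot be realized on this hardware
    return mapping, conflicts


def ccs_schedule(requests, partitions, max_rounds=100):
    total = sum(partitions)
    not_before = {}
    for _ in range(max_rounds):
        plan = pm_plan(requests, total, not_before)
        mapping, conflicts = mm_verify(plan, partitions)
        if not conflicts:
            return plan, mapping
        # PM reacts to the conflict list by looking for an alternative (later) time.
        for name, start in conflicts:
            not_before[name] = start + 1
    raise RuntimeError("no realizable schedule found")
```

With two 4-node partitions and requests of 3, 3, and 2 nodes, the PM's count-based plan starts all three at t=0, but the MM cannot place the 2-node request in either partition, so after the conflict rounds it is moved to t=5.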
Conclusion
Classify and compare queuing systems with planning systems
Present possible advanced planning functionality
The aim of the paper is to show the benefit of planning systems for managing HPC machines
Discussion
• Do planning systems solve all the problems?
  - What if most jobs want to run ASAP?
  - What if runtimes are not estimated precisely?
• How do queuing and planning systems compare in performance and utilization? If you were a resource provider, would you use a planning system?
• Which features could be provided by vgES? Diffuse requests, resource reclaiming, variable reservation, negotiation