Condor and Multi-core Scheduling
Dan Bradley
University of Wisconsin
with generous support from the National Science Foundation and productive collaboration with Red Hat
Dan Bradley (Wisconsin) @ CERN multi-core workshop, 10 Oct 2008
What Exists Today
• Parallel scheduling
  – Designed for MPI-type jobs
  – No way to limit matchmaking to slots on the same machine
• Custom Batch Slots
  – One “slot” need not equal one core
  – Slot policy can depend on type of job
Example: Custom Batch Slots
• Slots 1, 2, 3
  – only accept 1-core jobs
• Slot 4
  – when claimed by normal 1-core job, behaves normally
  – when claimed by 4-core job
    • stops accepting jobs for slots 1, 2, and 3
    • suspends 4-core job until slots 1, 2, and 3 drain
Related Condor How-to: http://nmi.cs.wisc.edu/node/1482
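The slot policy in this example can be sketched as a small Python model. This is only an illustration of the rules listed above, not Condor configuration or Condor code; the names `Job`, `Machine`, `can_accept`, and `slot4_state` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Job:
    name: str
    cores: int


@dataclass
class Machine:
    # Slot number -> job currently claiming that slot (None when unclaimed).
    slots: Dict[int, Optional[Job]] = field(
        default_factory=lambda: {1: None, 2: None, 3: None, 4: None}
    )

    def whole_machine_job(self) -> Optional[Job]:
        """Return the 4-core job holding slot 4, if there is one."""
        job = self.slots[4]
        return job if job is not None and job.cores == 4 else None

    def can_accept(self, slot: int, job: Job) -> bool:
        if self.slots[slot] is not None:
            return False
        if slot in (1, 2, 3):
            # Slots 1-3 take only 1-core jobs, and stop accepting jobs
            # while a 4-core job holds slot 4.
            return job.cores == 1 and self.whole_machine_job() is None
        # Slot 4 takes either a normal 1-core job or a 4-core job.
        return job.cores in (1, 4)

    def claim(self, slot: int, job: Job) -> bool:
        if not self.can_accept(slot, job):
            return False
        self.slots[slot] = job
        return True

    def release(self, slot: int) -> None:
        self.slots[slot] = None

    def slot4_state(self) -> str:
        job = self.slots[4]
        if job is None:
            return "idle"
        others_busy = any(self.slots[s] is not None for s in (1, 2, 3))
        # A 4-core job stays suspended until slots 1-3 have drained.
        if job.cores == 4 and others_busy:
            return "suspended"
        return "running"


m = Machine()
m.claim(1, Job("small", 1))
m.claim(4, Job("big", 4))
print(m.slot4_state())  # suspended
m.release(1)            # the 1-core job finishes; slot 1 drains
print(m.slot4_state())  # running
```

In this model the 4-core job pays for the drain time while suspended, which is one reason the approach gets awkward as the number of cores grows.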
Shortcomings
• Awkward to extend to N-core jobs, but doable
• Very awkward to extend to N-core jobs and also support dynamic partitioning of memory, etc.
• Requires custom configuration by admins; no standard JDL for submitting grid jobs to sites
• Accounting does not charge multi-core job at higher rate than 1-core job
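To make the accounting point concrete, here is a minimal sketch comparing a flat charge with one weighted by core count. The function `charge` and the choice of scaling linearly with cores are assumptions for illustration; the slides do not say what rate multi-core jobs should be charged at.

```python
def charge(wall_hours: float, cores: int = 1, weight_by_cores: bool = True) -> float:
    """Return the usage charged to a job.

    With weight_by_cores=False every job is charged per slot-hour, so a
    4-core job costs the same as a 1-core job (the shortcoming above).
    With weight_by_cores=True the charge scales with the cores held.
    """
    if wall_hours < 0 or cores < 1:
        raise ValueError("wall_hours must be >= 0 and cores must be >= 1")
    return wall_hours * (cores if weight_by_cores else 1)


print(charge(10, cores=4, weight_by_cores=False))  # 10
print(charge(10, cores=4))                         # 40
```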
In Development: Dynamic Slots
• Start with one “batch slot” representing whole machine
• Trim slot to what job requires (cores, memory, disk, network, etc.)
• Leftovers assigned to new slot
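The three steps above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Condor implementation: `Resources` and `carve` are hypothetical names, and only cores, memory, and disk are tracked (network and other resources are omitted).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Resources:
    cores: int
    memory_mb: int
    disk_mb: int

    def covers(self, other: "Resources") -> bool:
        """True if this slot has at least what `other` asks for."""
        return (
            self.cores >= other.cores
            and self.memory_mb >= other.memory_mb
            and self.disk_mb >= other.disk_mb
        )

    def minus(self, other: "Resources") -> "Resources":
        return Resources(
            self.cores - other.cores,
            self.memory_mb - other.memory_mb,
            self.disk_mb - other.disk_mb,
        )


def carve(slot: Resources, request: Resources) -> Optional[Tuple[Resources, Resources]]:
    """Trim `slot` to `request`; return (job_slot, leftover_slot).

    Returns None when the request does not fit in the slot.
    """
    if not slot.covers(request):
        return None
    return request, slot.minus(request)


# Start with one slot representing the whole machine.
machine = Resources(cores=8, memory_mb=16384, disk_mb=100_000)
job_slot, leftover = carve(machine, Resources(cores=4, memory_mb=4096, disk_mb=10_000))
print(leftover)  # Resources(cores=4, memory_mb=12288, disk_mb=90000)
```

The leftover slot can be carved again for the next job, so the partitioning follows whatever mix of requests arrives rather than a fixed per-core layout.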
TODO
• improve accounting and any other monitoring to support multi-core slots
• support standard JDL for requesting multiple cores
• deal with preemption and/or mechanism for preventing starvation of multi-core jobs
• speed up dynamic slot creation
Questions
• Is it desirable to have the capability to lock a job to a CPU?
• Does Globus provide RSL attributes with well-defined semantics for multi-core jobs?
  – example: (count=N)(jobtype=single)
  – in my experience, this is not well-defined
  – can mean 1 job with N cores or N copies of the job