condor and mpi paradyn/condor week madison, wi 2001
Derek Wright
Computer Sciences Department
University of Wisconsin-Madison
http://www.cs.wisc.edu/condor
Condor and MPI
Paradyn/Condor Week
Madison, WI 2001
www.cs.wisc.edu/condor
Overview
› MPI and Condor: Why Now?
› Dedicated and Opportunistic Scheduling
› How Does it All Work?
› Specific MPI Implementations
› Future Work
What is MPI?
› MPI is the "Message Passing Interface"
› Basically, a library for writing parallel applications that use message passing for inter-process communication
› MPI is a standard with many different implementations
MPI and Condor: Why Haven't We Supported It Until Now?
› MPI's model is a static world
› We have always seen the world as dynamic, opportunistic, and ever-changing
› We focused our parallel support on PVM, which supports a dynamic environment
MPI With Condor: Why Now?
› More and more Condor pools are being formed from dedicated resources
› MPI's API is also starting to move towards supporting a dynamic world (e.g. LAM, MPI-2)
› Few schedulers (if any) handle both opportunistic and dedicated resources at the same time
Dedicated and Opportunistic Scheduling
› Resources can move between 'dedicated' and 'opportunistic' status
› Users submit jobs that are either dedicated (e.g. Universe = MPI) or opportunistic (e.g. Universe = standard)
Dedicated and Opportunistic (Cont'd)
› Condor leaves all resources as opportunistic unless it sees dedicated jobs to service
› The Dedicated Scheduler ('DS') claims opportunistic resources and turns them into dedicated ones, so it can schedule them into the future
Dedicated and Opportunistic (Cont'd)
› When the DS has no more jobs, it releases the resources, which go back to serving opportunistic jobs
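The resource lifecycle in the last few slides can be modeled as a small state machine. The sketch below is a toy Python model, not Condor code: `Resource`, `Status`, and `DedicatedScheduler.reconcile` are hypothetical names, each queued job is represented only by the number of nodes it needs, and the assumed policy is to claim enough opportunistic resources for the largest waiting job and to release everything once the queue is empty.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPPORTUNISTIC = "opportunistic"
    DEDICATED = "dedicated"


@dataclass
class Resource:
    name: str
    status: Status = Status.OPPORTUNISTIC  # resources start out opportunistic


@dataclass
class DedicatedScheduler:
    pool: list                                 # Resource objects the DS may claim
    queue: list = field(default_factory=list)  # node counts of waiting dedicated jobs

    def reconcile(self) -> None:
        # Enough nodes for the largest waiting job (0 if the queue is empty).
        needed = max(self.queue, default=0)
        dedicated = [r for r in self.pool if r.status is Status.DEDICATED]
        # Claim opportunistic resources until that need is met.
        for r in self.pool:
            if len(dedicated) >= needed:
                break
            if r.status is Status.OPPORTUNISTIC:
                r.status = Status.DEDICATED
                dedicated.append(r)
        # No dedicated jobs left: release everything back to opportunistic use.
        if needed == 0:
            for r in dedicated:
                r.status = Status.OPPORTUNISTIC


if __name__ == "__main__":
    pool = [Resource(f"node{i}") for i in range(4)]
    ds = DedicatedScheduler(pool, queue=[3])  # one 3-node MPI job waiting
    ds.reconcile()
    print([r.status.value for r in pool])
    # ['dedicated', 'dedicated', 'dedicated', 'opportunistic']
    ds.queue.clear()                          # job finished, nothing else queued
    ds.reconcile()
    print([r.status.value for r in pool])
    # ['opportunistic', 'opportunistic', 'opportunistic', 'opportunistic']
```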
Dedicated Scheduling and "Back-Filling"
› There will always be "holes" in the dedicated schedule: sets of resources that can't be filled with dedicated jobs for certain periods of time
› The traditional solution is "back-filling" the holes with smaller dedicated jobs
› However, these might not be preemptable
Back-Filling (Cont'd)
› Instead of back-filling with dedicated jobs, we give the resources to Condor's opportunistic scheduler
› Condor runs preemptable opportunistic jobs until the DS decides it needs the resources again and reclaims them
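A toy timeline may make this back-filling idea concrete. This Python sketch is an illustration under stated assumptions, not the DS's actual algorithm: one idle node has a hole in the dedicated schedule from `hole_start` until `reclaim_at`, the hypothetical `backfill` function runs preemptable opportunistic jobs in that hole in order, and whatever is still running when the DS reclaims the node is preempted. Work is counted in abstract time units.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OpportunisticJob:
    name: str
    work_left: int  # remaining work, in abstract time units


def backfill(hole_start: int, reclaim_at: int, jobs: list[OpportunisticJob]) -> list[str]:
    """Fill one node's schedule hole [hole_start, reclaim_at) with
    opportunistic jobs, preempting whichever job is running at reclaim_at."""
    events = []
    now = hole_start
    for job in jobs:
        if now >= reclaim_at:
            break  # the DS has reclaimed the node; remaining jobs wait
        ran = min(job.work_left, reclaim_at - now)
        job.work_left -= ran
        now += ran
        if job.work_left == 0:
            events.append(f"t={now}: {job.name} finished")
        else:
            events.append(f"t={now}: {job.name} preempted, DS reclaims node")
    return events


if __name__ == "__main__":
    jobs = [OpportunisticJob("a", 4), OpportunisticJob("b", 8)]
    for event in backfill(0, 10, jobs):
        print(event)
    # t=4: a finished
    # t=10: b preempted, DS reclaims node
```

The contrast with traditional back-filling is that job `b` does not have to fit inside the hole: because opportunistic jobs are preemptable, the DS never has to delay its own reservation to let a back-filled job finish.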
Dedicated Resources are Opportunistic Resources
› Even "dedicated" resources are really opportunistic
  Hardware failures, software failures, etc.
  Condor handles these failures better than traditional dedicated schedulers, since our system already deals with them after years of opportunistic scheduling experience
How Does MPI Support in Condor Really Work?
› Changes to the resource agent (condor_startd)
› Changes to the job scheduling agent (condor_schedd)
› Changes to the rest of the Condor system
How Do You Make a Resource Dedicated in Condor?
› You just have to change a few config file settings; no new startd binary is required
› Add an attribute to the classad saying which scheduler, if any, this resource is willing to become dedicated to
Other Configuration Changes for the startd
› In addition, you must change the policy expressions:
  The startd must always be willing to run jobs from the DS
  While the resource is claimed by the DS, the startd should never suspend or preempt jobs
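In Condor these policies are written as ClassAd expressions in the startd's config file. The Python functions below only model the logic those expressions need to encode; they are not Condor syntax. `DEDICATED_SCHEDULER` stands in for the classad attribute naming the scheduler this machine will dedicate itself to, and both its value and the "owner is active" fallback used for ordinary jobs are hypothetical placeholders.

```python
from typing import Optional

# Hypothetical value of the classad attribute naming the scheduler this
# machine is willing to become dedicated to (None means "not willing").
DEDICATED_SCHEDULER: Optional[str] = "DedicatedScheduler@submit.example.edu"


def claimed_by_ds(scheduler: str) -> bool:
    return DEDICATED_SCHEDULER is not None and scheduler == DEDICATED_SCHEDULER


def start(scheduler: str, owner_active: bool) -> bool:
    # Always willing to run DS jobs; other jobs only while the owner is away
    # (a placeholder for whatever opportunistic policy the machine already has).
    return claimed_by_ds(scheduler) or not owner_active


def suspend(scheduler: str, owner_active: bool) -> bool:
    # Never suspend a job while the resource is claimed by the DS.
    return not claimed_by_ds(scheduler) and owner_active


def preempt(scheduler: str, owner_active: bool) -> bool:
    # Never preempt a job while the resource is claimed by the DS.
    return not claimed_by_ds(scheduler) and owner_active


if __name__ == "__main__":
    print(start(DEDICATED_SCHEDULER, owner_active=True))  # True
    print(suspend("other-schedd", owner_active=True))     # True
```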
Submitting Dedicated Jobs
› Requires a new "contrib" version of the condor_schedd
› Condor "wakes up" the dedicated scheduler logic inside the condor_schedd when MPI jobs are submitted
How Does Your Job Get Resources?
› The DS does a query to find all resources that are willing to become dedicated to it
› DS sends out "resource request" classads and negotiates for resources with the negotiator (the opportunistic scheduler)
How Does Your Job Get Resources? (Cont’d)
› The DS then claims the resources directly
› Once resources are available, the DS schedules and spawns jobs
› When jobs complete, if more MPI jobs can be serviced with the same resources, the DS holds onto them and uses them immediately
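The acquisition steps above can be sketched as a simple loop. This is a hypothetical Python model: the negotiation with the negotiator is folded into a plain filter on a `willing_ds` field, `Machine` and `run_dedicated_jobs` are illustrative names, and jobs are represented only by their node counts. It shows the two properties the slides emphasize: the DS only considers resources willing to become dedicated to it, and it keeps its claims between MPI jobs, releasing them only when no more jobs remain.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    name: str
    willing_ds: Optional[str]  # scheduler this machine will dedicate to, if any


def run_dedicated_jobs(ds_name: str, machines: list[Machine], job_sizes: list[int]) -> list[str]:
    log: list[str] = []
    # 1. Query for resources willing to become dedicated to this DS.
    candidates = [m.name for m in machines if m.willing_ds == ds_name]
    claimed: list[str] = []  # always a prefix of candidates
    for size in job_sizes:
        if size > len(candidates):
            log.append(f"job needs {size} nodes, only {len(candidates)} willing; skipped")
            continue
        # 2. Claim more resources only if the ones already held are too few.
        while len(claimed) < size:
            claimed.append(candidates[len(claimed)])
        # 3. Schedule and spawn the job on the claimed resources.
        log.append(f"spawn {size}-node job on {claimed[:size]}")
        # 4. Keep the claims so the next MPI job can use them immediately.
    # 5. No more MPI jobs: release the resources for opportunistic use.
    log.append(f"release {claimed}")
    return log


if __name__ == "__main__":
    pool = [Machine("m0", "ds"), Machine("m1", None), Machine("m2", "ds"), Machine("m3", "ds")]
    for line in run_dedicated_jobs("ds", pool, [2, 3]):
        print(line)
    # spawn 2-node job on ['m0', 'm2']
    # spawn 3-node job on ['m0', 'm2', 'm3']
    # release ['m0', 'm2', 'm3']
```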
Changes to the rest of Condor?
› Very few other changes are required
› Users can use all the same tools, interfaces, etc.
› We just need a new condor_starter to actually spawn MPI jobs (it will also be offered as a contrib module)
Specific MPI Implementations
› MPICH
› LAM
› Others?
Condor and MPICH
› Currently we support MPICH on Unix
› Working on adding MPICH-NT support
  NT's MPICH has a different mechanism for spawning jobs than the Unix MPICH...
Condor + LAM = "LAMdor"
› LAM's API is better suited for a dynamic environment, where hosts can come and go from your MPI universe
› LAM has a different mechanism for spawning jobs than MPICH
› Condor is working to support LAM's methods for spawning
LAMdor (Cont'd)
› LAM is working to understand, expand, and fully implement the dynamic scheduling calls in its API
› LAM is also considering using Condor's libraries to support checkpointing of MPI computations
MPI-2 Standard
› The MPI-2 standard contains calls to handle dynamic resources
› Not yet fully implemented by anyone
› When it is, we'll support it
Other MPI Implementations
› What are people using?
› Do you want to see Condor support any other MPI implementations?
› If so, send us email and let us know
Future Work
› Implementing more advanced dedicated scheduling algorithms
› Support for all sorts of MPI implementations (LAM, MPICH-NT, MPI-2, others)
More Future Work
› Solving problems with MPI on the Grid
  "Flocking" MPI jobs to remote pools, or even spanning pools with a single computation
  Resolving resource ownership on the Grid (e.g. how do you handle multiple dedicated schedulers on the Grid that all want to control a given resource?)
More Future Work
› Checkpointing entire MPI computations
› An "MW" implementation on top of Condor-MPI
More Future Work
› Support for other kinds of dedicated jobs
  Generic dedicated jobs (we just gather and schedule the resources, then call your program, give it the list of machines, and let the program spawn itself)
  LINDA
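The generic dedicated job idea amounts to a small launch protocol: gather and schedule the machines, then hand the list to the user's program and let it spawn itself. A minimal Python sketch, assuming (hypothetically) that the list is passed as a file of host names whose path is appended to the program's arguments; the function name and file format are illustrations, not a planned Condor interface.

```python
from __future__ import annotations

import os
import subprocess
import sys
import tempfile


def launch_generic_dedicated_job(program: list[str], machines: list[str]) -> int:
    """Write the scheduled machines to a temporary file, run the user's
    program with that file's path as its last argument, and return the
    program's exit status. Spawning onto the hosts is left to the program."""
    with tempfile.NamedTemporaryFile("w", suffix=".machines", delete=False) as f:
        f.write("\n".join(machines) + "\n")
        machine_file = f.name
    try:
        return subprocess.run([*program, machine_file]).returncode
    finally:
        os.unlink(machine_file)  # clean up once the program has exited


if __name__ == "__main__":
    # Stand-in "user program": prints how many hosts it was given.
    counter = "import sys; print(len(open(sys.argv[1]).read().split()))"
    launch_generic_dedicated_job([sys.executable, "-c", counter], ["node1", "node2", "node3"])
```

Passing a file rather than a long argument list keeps the interface the same no matter how many machines were scheduled; the program alone decides how to start processes on those hosts.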
How do I start using MPI with Condor?
› MPI support is still in alpha and not yet ready for production use
› A beta release should be out soon as a contrib module
› Check the web site www.cs.wisc.edu/condor
Thanks for Listening!
› Questions?
› For more information:
  http://www.cs.wisc.edu/condor