11. Grid Scheduling and Resource Management
GRID COMPUTING
Grid Scheduling & Resource Management
Sandeep Kumar Poonia
Head of Dept. CS/IT, Jagan Nath University, Jaipur
B.E., M. Tech., UGC-NET
LM-IAENG, LM-IACSIT, LM-CSTA, LM-AIRCC, LM-SCIEI, AM-UACEE
11/9/2013
OUTLINE
Introduction
Scheduling Paradigms
How Scheduling Works
A Review of Condor, SGE, PBS and LSF
Grid Scheduling with QoS
Introduction
Grid scheduling is the process of mapping Grid jobs to resources across multiple administrative domains.
A Grid job can be split into many small tasks.
The scheduler is responsible for selecting resources and scheduling jobs so that the user and application requirements are met, in terms of overall execution time (throughput) and the cost of the resources used.
Jobs can be submitted via Globus to systems managed by Condor, the Sun Grid Engine (SGE), the Portable Batch System (PBS) and the Load Sharing Facility (LSF).
Scheduling Paradigms
Centralized Scheduling
Hierarchical Scheduling
Distributed Scheduling
Centralized Scheduling
In a centralized scheduling environment, a central
machine (node) acts as a resource manager to
schedule jobs to all the surrounding nodes that are
part of the environment.
This scheduling paradigm is often used in situations
like a computing centre where resources have
similar characteristics and usage policies.
Centralized Scheduling
Here, jobs are first submitted to the central scheduler, which then
dispatches the jobs to the appropriate nodes. Those jobs that
cannot be started on a node are normally stored in a central job
queue for a later start.
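The dispatch-or-queue behaviour described above can be sketched as follows. This is only a minimal illustration, not the design of any particular scheduler: the `CentralScheduler` class, its method names and the slot-based capacity model are assumptions made for the example.

```python
from collections import deque


class CentralScheduler:
    """Hypothetical central scheduler: dispatch if a node has room, else queue."""

    def __init__(self, node_slots):
        # node_slots maps node name -> number of free job slots (assumed model).
        self.free = dict(node_slots)
        self.queue = deque()      # central job queue for jobs that cannot start
        self.running = {}         # job name -> node it was dispatched to

    def submit(self, job):
        # Pick the first node that still has a free slot, if any.
        node = next((n for n, slots in self.free.items() if slots > 0), None)
        if node is None:
            self.queue.append(job)   # no capacity anywhere: hold for a later start
            return None
        self.free[node] -= 1
        self.running[job] = node
        return node

    def finish(self, job):
        # Release the slot, then try to start the oldest queued job.
        node = self.running.pop(job)
        self.free[node] += 1
        if self.queue:
            self.submit(self.queue.popleft())


sched = CentralScheduler({"node1": 1, "node2": 1})
placements = [sched.submit(j) for j in ("job1", "job2", "job3")]
sched.finish("job1")   # frees node1, so the queued job3 can start there
```

Every submission and completion passes through the one `CentralScheduler` object, which is why this paradigm has complete information but also a single bottleneck.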
Centralized Scheduling: Advantage & Disadvantage
Advantage: a centralized scheduling system may produce better scheduling decisions because it has all the necessary, up-to-date information about the available resources.
Disadvantage: centralized scheduling does not scale well as the environment it manages grows.
The scheduler itself may become a bottleneck, and any hardware or software failure on the scheduler’s server stops scheduling for the whole environment, making it a single point of failure.
Distributed Scheduling
There is no central scheduler responsible for managing all the jobs.
Instead, multiple localized schedulers interact with each other in order to dispatch jobs to the participating nodes.
There are two mechanisms for a scheduler to communicate with other schedulers:
Direct Communication
Indirect Communication
Distributed Scheduling: Direct Communication
Each local scheduler can communicate directly with other schedulers for job dispatching.
Each scheduler has a list of remote schedulers that it can interact with, or there may be a central directory that maintains all the information related to each scheduler.
If a job cannot be dispatched to its local resources, its scheduler will communicate with remote schedulers to find resources that are appropriate and available for executing the job.
Each scheduler may maintain one or more local job queues for job management.
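A rough sketch of direct communication is given below. The `LocalScheduler` class, the CPU-count capacity model and the site names are hypothetical, and real schedulers would exchange messages over a network rather than call each other's methods.

```python
class LocalScheduler:
    """Hypothetical local scheduler that knows a list of remote peers."""

    def __init__(self, name, free_cpus):
        self.name = name
        self.free_cpus = free_cpus
        self.peers = []          # remote schedulers this one may contact
        self.local_queue = []    # jobs waiting for resources

    def try_run(self, cpus_needed):
        # Accept the job only if local resources can hold it right now.
        if cpus_needed <= self.free_cpus:
            self.free_cpus -= cpus_needed
            return True
        return False

    def submit(self, job, cpus_needed):
        # Prefer local resources, then ask each known peer directly.
        if self.try_run(cpus_needed):
            return self.name
        for peer in self.peers:
            if peer.try_run(cpus_needed):
                return peer.name
        self.local_queue.append((job, cpus_needed))  # nobody can take it yet
        return None


a = LocalScheduler("siteA", free_cpus=2)
b = LocalScheduler("siteB", free_cpus=8)
a.peers.append(b)
where_small = a.submit("small", 2)   # fits locally
where_big = a.submit("big", 4)       # siteA is now full, forwarded to siteB
where_huge = a.submit("huge", 16)    # fits nowhere, stays in siteA's queue
```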
Distributed Scheduling: Indirect Communication
Communication via a central job pool
In this scenario, jobs that cannot be executed immediately are sent to a central job pool.
Distributed Scheduling: Indirect Communication
Communication via a central job pool
Compared with direct communication, the local schedulers can potentially choose suitable jobs to schedule on their resources.
Policies are required so that all the jobs in the pool are executed at some time.
This method can be modified so that all jobs are pushed directly into the job pool after submission.
This way, small jobs that require few resources can be used to fill free resources on any machine.
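The central job pool can be sketched as a shared list from which local schedulers pull work that fits their free resources. The `JobPool` class and the oldest-fitting-job rule are assumptions for illustration; that rule favours old jobs but does not by itself guarantee that every job eventually runs.

```python
class JobPool:
    """Hypothetical shared pool; local schedulers pull jobs that fit."""

    def __init__(self):
        self.jobs = []   # (job name, CPUs needed), oldest first

    def push(self, name, cpus):
        self.jobs.append((name, cpus))

    def pull(self, free_cpus):
        # Hand out the oldest job that fits into the caller's free CPUs.
        for i, (name, cpus) in enumerate(self.jobs):
            if cpus <= free_cpus:
                return self.jobs.pop(i)[0]
        return None


pool = JobPool()
pool.push("large", 16)
pool.push("small", 1)
pool.push("medium", 4)

picked_by_small_site = pool.pull(free_cpus=2)    # only "small" fits
picked_by_big_site = pool.pull(free_cpus=32)     # oldest fitting job: "large"
```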
Hierarchical scheduling
In hierarchical scheduling, a centralized scheduler interacts with local schedulers for job submission. The centralized scheduler is a kind of meta-scheduler that dispatches submitted jobs to local schedulers.
Similar to the centralized scheduling paradigm,
hierarchical scheduling can have scalability and
communication bottlenecks.
However, compared with centralized scheduling,
one advantage of hierarchical scheduling is that
the global scheduler and local scheduler can have
different policies in scheduling jobs.
HOW SCHEDULING WORKS
Grid scheduling involves four main stages: resource discovery, resource selection, schedule generation and job execution.
Resource discovery
Goal: identify a list of authenticated resources that are available for job submission.
In order to cope with the dynamic nature of the Grid, a scheduler needs to have some way of incorporating dynamic state information about the available resources into its decision-making process.
A Grid environment typically uses a pull model, a push model or a push–pull model for resource discovery.
Resource discovery : The pull model
A single daemon associated with the scheduler can query Grid resources and collect state information such as CPU loads or the available memory.
Resource discovery : The pull model
The pull model for gathering resource information incurs relatively small communication overhead, but unless it requests resource information frequently, the information it provides tends to be stale and potentially misleading.
In centralized scheduling, the resource discovery/query process can be rather intrusive and take a significant amount of time as the monitored environment grows.
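The staleness problem of the pull model can be illustrated with a small sketch. The `PullMonitor` class, the probe callables and the `max_age` threshold are hypothetical; a real daemon would query remote resources over the network.

```python
import time


class PullMonitor:
    """Hypothetical scheduler-side daemon that polls resources for state."""

    def __init__(self, probes, max_age):
        self.probes = probes      # resource name -> callable returning its state
        self.max_age = max_age    # seconds before a cached sample counts as stale
        self.cache = {}           # resource name -> (timestamp, state)

    def poll(self, now=None):
        # One polling round: query every resource and record when we asked.
        now = time.time() if now is None else now
        for name, probe in self.probes.items():
            self.cache[name] = (now, probe())

    def is_stale(self, name, now=None):
        now = time.time() if now is None else now
        sampled_at, _ = self.cache[name]
        return now - sampled_at > self.max_age


load = {"node1": 0.2}
monitor = PullMonitor({"node1": lambda: {"cpu_load": load["node1"]}}, max_age=30)
monitor.poll(now=0)
load["node1"] = 0.9                       # the resource changes after the poll
cached = monitor.cache["node1"][1]        # the scheduler still sees the old load
```

Polling more often shrinks the window in which `cached` is wrong, at the price of more queries per resource.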
Resource discovery : The push model
Each resource in the environment has a daemon for gathering local state information, which is sent to a centralized scheduler that maintains a database to record each resource’s activity.
If the updates are frequent, an accurate view of the system state can be maintained over time; however, frequent updates to the database are intrusive and consume network bandwidth.
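The trade-off between update frequency and accuracy in the push model can be sketched as below. `StateDatabase`, `run_daemon` and the load readings are made up for the example, and the message count only stands in for real network traffic.

```python
class StateDatabase:
    """Hypothetical scheduler-side store filled by resource daemons."""

    def __init__(self):
        self.latest = {}     # resource name -> most recent reported state
        self.updates = 0     # number of messages received (a proxy for traffic)

    def report(self, resource, state):
        # Called by the daemon running on each resource.
        self.latest[resource] = state
        self.updates += 1


def run_daemon(db, resource, samples, every):
    """Push every `every`-th local sample to the central database."""
    for tick, state in enumerate(samples):
        if tick % every == 0:
            db.report(resource, state)


samples = [{"cpu_load": x / 10} for x in range(10)]   # made-up load readings

frequent, sparse = StateDatabase(), StateDatabase()
run_daemon(frequent, "node1", samples, every=1)   # accurate but chatty
run_daemon(sparse, "node1", samples, every=5)     # cheap but out of date
```

After both runs, `frequent` holds the last reading at the cost of ten messages, while `sparse` used two messages and still shows the reading from tick 5.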
Resource discovery : The push–pull model
The push–pull model lies somewhere between the pull model and the push model.
Resource discovery : The push–pull model
Each resource in the environment runs a daemon that collects state information.
Instead of sending this information directly to a central scheduler, intermediate nodes run daemons that aggregate state information from different sub-resources and respond to queries from the scheduler.
A challenge of this model is to find out what information is most useful, how often it should be collected and how long it should be kept.
Resource Selection
The second phase of the scheduling process: select those resources that best suit the constraints and conditions imposed by the user, such as CPU usage, RAM available or disk storage.
The result of resource selection is a resource list R_selected in which all resources meet the minimum requirements for a submitted job or a job list.
The relationship between the resources available, R_available, and the resources selected, R_selected, is:
R_selected ⊆ R_available
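Resource selection as a filter over the available resources can be sketched as follows. The field names (`cpu_ghz`, `ram_mb`, `disk_gb`) and all values are assumptions for illustration only.

```python
def select_resources(available, requirements):
    """Return the resources that meet every minimum in `requirements`."""
    return [
        r for r in available
        if all(r[key] >= minimum for key, minimum in requirements.items())
    ]


# Made-up resource descriptions; the field names are assumptions.
r_available = [
    {"name": "r1", "cpu_ghz": 1.8, "ram_mb": 256, "disk_gb": 40},
    {"name": "r2", "cpu_ghz": 0.8, "ram_mb": 512, "disk_gb": 80},
    {"name": "r3", "cpu_ghz": 1.2, "ram_mb": 512, "disk_gb": 10},
]
job_minimums = {"cpu_ghz": 1.0, "ram_mb": 256, "disk_gb": 20}

r_selected = select_resources(r_available, job_minimums)
selected_names = [r["name"] for r in r_selected]
```

Because the function only ever drops entries, the result is always a subset of the input list, which is the relationship stated above.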
Schedule Generation
The generation of schedules involves two
steps,
selecting jobs and
producing resource selection strategies.
Schedule Generation : Resource Selection
The resource selection process is used to choose resource(s) from the resource list R_selected for a given job.
Since all resources in the list R_selected meet the minimum requirements imposed by the job, an algorithm is needed to choose the best resource(s) to execute the job.
Although random selection is a choice, it is not an ideal resource selection policy.
The resource selection algorithm should take into account the current state of resources and choose the best one based on a quantitative evaluation.
Schedule Generation : Resource Selection
A resource selection algorithm that only takes CPU and RAM into account could be designed as follows:
[Formula image not preserved]
where:
WCPU – the weight allocated to CPU speed;
CPUload – the current CPU load;
CPUspeed – real CPU speed;
CPUmin – minimum CPU speed;
WRAM – the weight allocated to RAM;
RAMusage – the current RAM usage;
RAMsize – original RAM size; and
RAMmin – minimum RAM size.
Schedule Generation : Resource Selection
Example: Suppose that the total weighting used in the algorithm is 10, where the CPU weight is 6 and the RAM weight is 4. The minimum CPU speed is 1 GHz and the minimum RAM size is 256 MB. The resource information matrix is as follows:
[Resource information table not preserved]
Find the best resource for the submitted job.
Schedule Generation : Resource Selection
Then, evaluation values for the resources can be calculated using the three formulas.
From the results we know Resource3 is the best choice for the submitted job.
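The formulas and the resource table for this example are not present in the transcript. The sketch below therefore uses an assumed scoring form that is merely consistent with the symbols listed earlier (weighted free capacity relative to the minimum); it should not be read as the author's exact formula. The weights and minimums come from the example, while the three resource rows are illustrative placeholders.

```python
W_CPU, W_RAM = 6, 4                   # weights from the example (total 10)
CPU_MIN_GHZ, RAM_MIN_MB = 1.0, 256    # minimums from the example


def evaluate(cpu_speed, cpu_load, ram_size, ram_usage):
    """Assumed scoring form: weighted free capacity relative to the minimums."""
    eval_cpu = W_CPU * (1 - cpu_load) * cpu_speed / CPU_MIN_GHZ
    eval_ram = W_RAM * (1 - ram_usage) * ram_size / RAM_MIN_MB
    return (eval_cpu + eval_ram) / (W_CPU + W_RAM)


# Placeholder values only: (CPU GHz, CPU load, RAM MB, RAM usage).
resources = {
    "Resource1": (1.8, 0.50, 256, 0.50),
    "Resource2": (2.6, 0.70, 512, 0.60),
    "Resource3": (1.2, 0.40, 512, 0.30),
}

scores = {name: evaluate(*spec) for name, spec in resources.items()}
best = max(scores, key=scores.get)    # resource with the highest evaluation
```

Under this assumed form, a lightly loaded resource with a slower CPU can outrank a faster but busier one, which is the point of evaluating current state rather than raw capacity.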
Schedule Generation : Job Selection
The goal of job selection is to select a job from a
job queue for execution. Four strategies that can
be used to select a job are given below.
First come first serve
Random Selection
Priority-based Selection
Backfilling Selection
Schedule Generation : Job Selection
First come first serve:
The scheduler selects jobs for execution in the order of their submission.
If there is no resource available for the selected job, the scheduler waits until the job can be started. The other jobs in the job queue have to wait.
There are two main drawbacks with this type of job selection.
1. It may waste resources when, for example, the selected job needs more resources to become available before it can start, which results in a long waiting time.
2. Jobs with high priorities cannot be dispatched immediately if a job with a low priority needs more time to complete.
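The first drawback can be seen in a small sketch of first-come-first-serve selection. The function name `fcfs_start`, the CPU-count model and the job list are hypothetical.

```python
from collections import deque


def fcfs_start(queue, free_cpus):
    """Start jobs strictly in submission order; stop at the first that does not fit."""
    started = []
    while queue and queue[0][1] <= free_cpus:
        name, cpus = queue.popleft()
        free_cpus -= cpus
        started.append(name)
    return started, free_cpus


# (job name, CPUs needed) in submission order; values are made up.
queue = deque([("a", 2), ("b", 8), ("c", 1)])
started, idle_cpus = fcfs_start(queue, free_cpus=4)
```

Here job `b` blocks the queue: two CPUs stay idle even though job `c` would fit, because `c` was submitted after `b`.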
Schedule Generation : Job Selection
Random selection:
The next job to be scheduled is randomly selected from the job queue.
Apart from the two drawbacks of the first-come-first-serve strategy, job selection is not fair, and a job submitted earlier may not be scheduled until much later.
Schedule Generation : Job Selection
Priority-based selection:
Jobs submitted to the scheduler have different priorities.
The next job to be scheduled is the job with the highest priority in the job queue.
A job priority can be set when the job is submitted.
One drawback of this strategy is that it is hard to set an optimal criterion for a job priority.
A job with the highest priority may need more resources than are available, which may result in a long waiting time and poor use of the available resources.
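Priority-based selection maps naturally onto a priority queue. The sketch below uses Python's standard `heapq` module; the `PriorityJobQueue` class, the job names and the rule that ties go to the earlier submission are assumptions for the example.

```python
import heapq
import itertools


class PriorityJobQueue:
    """Hypothetical job queue that always yields the highest-priority job."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()   # tie-breaker: earlier submission first

    def submit(self, name, priority):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._order), name))

    def next_job(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


q = PriorityJobQueue()
q.submit("nightly-report", priority=1)
q.submit("urgent-fix", priority=9)
q.submit("also-urgent", priority=9)
order = [q.next_job() for _ in range(3)]
```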
Schedule Generation : Job Selection
Backfilling selection:
The backfilling strategy requires knowledge of
the expected execution time of a job to be
scheduled.
If the next job in the job queue cannot be
started due to a lack of available resources,
backfilling tries to find another job in the queue
that can use the idle resources.
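A simple backfilling rule can be sketched as follows. The text above only states that expected execution times are needed; the specific condition used here, that a backfilled job must be expected to finish before the head job could start, is an assumption borrowed from common backfilling practice. The function name, the tuple layout and the values are hypothetical.

```python
def pick_backfill(queue, free_cpus, head_start_in):
    """Return the name of the job to start now under a simple backfilling rule.

    queue: list of (name, cpus, expected_runtime) in submission order.
    head_start_in: time until enough resources free up for the head job
                   (assumed known from the running jobs' expected runtimes).
    """
    if not queue:
        return None
    head = queue[0]
    if head[1] <= free_cpus:
        return head[0]                      # the head job fits: no backfill needed
    for name, cpus, runtime in queue[1:]:
        # A later job may jump ahead only if it fits in the idle CPUs
        # and is expected to finish before the head job could start.
        if cpus <= free_cpus and runtime <= head_start_in:
            return name
    return None


queue = [("big", 8, 60), ("long-small", 2, 120), ("short-small", 2, 20)]
chosen = pick_backfill(queue, free_cpus=4, head_start_in=30)
```

The runtime check is what keeps the idle resources busy without pushing back the job at the head of the queue.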
Job execution
Once a job and a resource are selected, the next
step is to submit the job to the resource for
execution.
Job execution may be as easy as running a single
command or as complicated as running a series
of scripts that may, or may not, include set up or
staging.