11. Grid Scheduling and Resource Management

GRID COMPUTING

Grid Scheduling & Resource Management

Sandeep Kumar Poonia

Head of Dept. CS/IT, Jagan Nath University, Jaipur

B.E., M. Tech., UGC-NET

LM-IAENG, LM-IACSIT, LM-CSTA, LM-AIRCC, LM-SCIEI, AM-UACEE

11/9/2013, Sandeep Kumar Poonia

OUTLINE

Introduction

Scheduling Paradigms

How Scheduling Works

A Review of Condor, SGE, PBS and LSF

Grid Scheduling with QoS

Introduction

Grid scheduling is the process of mapping Grid jobs to resources over multiple administrative domains.

A Grid job can be split into many small tasks.

The scheduler is responsible for selecting resources and scheduling jobs in such a way that the user and application requirements are met, in terms of overall execution time (throughput) and the cost of the resources utilized.

Jobs can be submitted, via Globus, to systems managed by Condor, the Sun Grid Engine (SGE), the Portable Batch System (PBS) and the Load Sharing Facility (LSF).


Scheduling Paradigms

Centralized Scheduling

Hierarchical Scheduling

Distributed Scheduling


Centralized Scheduling

In a centralized scheduling environment, a central machine (node) acts as a resource manager to schedule jobs to all the surrounding nodes that are part of the environment.

This scheduling paradigm is often used in situations like a computing centre, where resources have similar characteristics and usage policies.


Here, jobs are first submitted to the central scheduler, which then dispatches the jobs to the appropriate nodes. Those jobs that cannot be started on a node are normally stored in a central job queue for a later start.
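The dispatch step can be sketched as follows. This is a minimal illustration, assuming a hypothetical model in which each node advertises a number of free CPU slots and each job requests a number of slots; the names `dispatch`, `free_slots` and the first-fit placement rule are assumptions, not part of the slides.

```python
from collections import deque


def dispatch(jobs, free_slots):
    """Central scheduler: place each job on the first node with enough
    free slots; jobs that cannot start go to a central job queue."""
    placements = {}          # job name -> node name
    central_queue = deque()  # jobs waiting for a later start
    for name, slots_needed in jobs:
        for node, free in free_slots.items():
            if free >= slots_needed:
                free_slots[node] = free - slots_needed
                placements[name] = node
                break
        else:
            # No node could start the job right now.
            central_queue.append(name)
    return placements, central_queue


nodes = {"node1": 4, "node2": 2}                 # hypothetical free slots
jobs = [("jobA", 3), ("jobB", 2), ("jobC", 4)]   # (name, slots requested)
placements, waiting = dispatch(jobs, nodes)
```

Here `jobA` and `jobB` are placed immediately, while `jobC` stays in the central queue because no node has four free slots.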

Centralized Scheduling: Advantage & Disadvantage

A centralized scheduling system may produce better scheduling decisions because it has all the necessary, up-to-date information about the available resources.

However, centralized scheduling does not scale well as the environment it manages grows.

The scheduler itself may become a bottleneck, and any hardware or software failure on the scheduler’s server makes it a single point of failure in the environment.

Distributed Scheduling

There is no central scheduler responsible for managing all the jobs.

Instead, multiple localized schedulers interact with each other in order to dispatch jobs to the participating nodes.

There are two mechanisms for a scheduler to communicate with other schedulers:

Direct Communication

Indirect Communication

Distributed Scheduling: Direct Communication

Each local scheduler can directly communicate with other schedulers for job dispatching.

Each scheduler has a list of remote schedulers that it can interact with, or there may exist a central directory that maintains all the information related to each scheduler.

If a job cannot be dispatched to its local resources, its scheduler will communicate with other remote schedulers to find resources that are appropriate and available for executing the job.

Each scheduler may maintain one or more local job queues for job management.
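A minimal sketch of direct communication, assuming each scheduler keeps a list of peers and a count of free slots. The class `LocalScheduler` and its methods are hypothetical names chosen for illustration.

```python
class LocalScheduler:
    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = free_slots
        self.peers = []        # remote schedulers this one can contact
        self.local_queue = []  # jobs waiting at this scheduler

    def try_run(self, slots_needed):
        """Start a job on local resources if enough slots are free."""
        if self.free_slots >= slots_needed:
            self.free_slots -= slots_needed
            return True
        return False

    def submit(self, job, slots_needed):
        """Return the name of the scheduler that started the job,
        or None if the job had to be queued locally."""
        if self.try_run(slots_needed):
            return self.name
        for peer in self.peers:  # ask remote schedulers directly
            if peer.try_run(slots_needed):
                return peer.name
        self.local_queue.append(job)
        return None


site_a = LocalScheduler("siteA", free_slots=1)
site_b = LocalScheduler("siteB", free_slots=4)
site_a.peers = [site_b]

r1 = site_a.submit("job1", 1)  # fits locally
r2 = site_a.submit("job2", 3)  # forwarded to siteB
r3 = site_a.submit("job3", 2)  # fits nowhere, queued at siteA
```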


Distributed Scheduling: Indirect Communication

Communication via a central job pool

In this scenario, jobs that cannot be executed immediately are sent to a central job pool.


Compared with direct communication, the local schedulers can potentially choose suitable jobs to schedule on their resources.

Policies are required so that all the jobs in the pool are executed at some time.

This method can be modified so that all jobs are pushed directly into the job pool after submission.

This way, small jobs requiring few resources can be used to fill free resources on all machines.
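A minimal sketch of a local scheduler choosing jobs from a shared pool, assuming jobs are `(name, slots)` pairs held in submission order. The function `pull_from_pool` is a hypothetical name. Scanning oldest-first favours older jobs but does not by itself guarantee that every job eventually runs, so a real pool would still need an explicit policy for that.

```python
def pull_from_pool(pool, free_slots):
    """Take jobs from the shared pool that fit into this scheduler's
    free slots, scanning the pool in submission order."""
    taken = []
    for job in list(pool):  # iterate over a copy; pool is mutated below
        name, slots_needed = job
        if slots_needed <= free_slots:
            pool.remove(job)
            taken.append(name)
            free_slots -= slots_needed
    return taken


# Hypothetical pool contents: (job name, slots requested)
pool = [("big", 8), ("small1", 1), ("medium", 3), ("small2", 1)]
taken_by_site_a = pull_from_pool(pool, free_slots=2)
taken_by_site_b = pull_from_pool(pool, free_slots=8)
```

The site with only two free slots picks up the two small jobs, which would otherwise leave those slots idle.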

Hierarchical Scheduling

In hierarchical scheduling, a centralized scheduler interacts with local schedulers for job submission. The centralized scheduler is a kind of meta-scheduler that dispatches submitted jobs to local schedulers.

Similar to the centralized scheduling paradigm, hierarchical scheduling can have scalability and communication bottlenecks.

However, compared with centralized scheduling, one advantage of hierarchical scheduling is that the global scheduler and the local schedulers can have different policies for scheduling jobs.
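The point about different policies can be illustrated with a small sketch. The global policy (send each job to the site with the shortest queue) and the two local policies (shortest job first, highest priority first) are assumptions chosen for illustration, as are the class names.

```python
class LocalScheduler:
    def __init__(self, name, order_key):
        self.name = name
        self.order_key = order_key  # local policy: how this site orders jobs
        self.queue = []

    def accept(self, job):
        self.queue.append(job)
        self.queue.sort(key=self.order_key)


class MetaScheduler:
    def __init__(self, sites):
        self.sites = sites

    def submit(self, job):
        # Global policy (an assumption): pick the site with the shortest queue.
        target = min(self.sites, key=lambda site: len(site.queue))
        target.accept(job)
        return target.name


# Jobs are (name, estimated_runtime, priority) tuples.
site_a = LocalScheduler("siteA", order_key=lambda job: job[1])   # shortest first
site_b = LocalScheduler("siteB", order_key=lambda job: -job[2])  # priority first
meta = MetaScheduler([site_a, site_b])

submitted = [("j1", 50, 1), ("j2", 10, 5), ("j3", 5, 2), ("j4", 30, 9)]
targets = [meta.submit(job) for job in submitted]
```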


HOW SCHEDULING WORKS

Grid scheduling involves four main stages: resource discovery, resource selection, schedule generation and job execution.

Resource discovery

Goal: identify a list of authenticated resources that are available for job submission.

In order to cope with the dynamic nature of the Grid, a scheduler needs some way of incorporating dynamic state information about the available resources into its decision-making process.

A Grid environment typically uses one of the following for resource discovery:

a pull model,

a push model, or

a push–pull model.

Resource discovery : The pull model

A single daemon associated with the scheduler can query Grid resources and collect state information such as CPU loads or the available memory.

The pull model for gathering resource information incurs relatively small communication overhead, but unless it requests resource information frequently, the information it provides tends to be stale and potentially misleading.

In centralized scheduling, the resource discovery/query process can be rather intrusive and can take a significant amount of time as the environment being monitored grows.
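A minimal sketch of the pull model and its staleness problem. The probe callables stand in for real queries to Grid resources, and the names `poll_resources`, `is_stale` and `max_age` are assumptions for illustration; time is passed in explicitly so the example is deterministic.

```python
def poll_resources(probes, now):
    """Pull model: a single collector queries every resource in turn
    and records when each answer was obtained."""
    return {
        name: {"state": probe(), "polled_at": now}
        for name, probe in probes.items()
    }


def is_stale(entry, now, max_age):
    """An entry is stale once it is older than max_age seconds."""
    return now - entry["polled_at"] > max_age


# Hypothetical probes returning fixed state information.
probes = {
    "node1": lambda: {"cpu_load": 0.20, "free_mem_mb": 2048},
    "node2": lambda: {"cpu_load": 0.85, "free_mem_mb": 512},
}
snapshot = poll_resources(probes, now=100.0)
fresh = not is_stale(snapshot["node1"], now=105.0, max_age=10.0)
stale = is_stale(snapshot["node1"], now=130.0, max_age=10.0)
```

Polling less often reduces traffic but widens the window in which the scheduler acts on outdated state.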

Resource discovery : The push model

Each resource in the environment has a daemon for gathering local state information, which is sent to a centralized scheduler that maintains a database to record each resource’s activity.

If the updates are frequent, an accurate view of the system state can be maintained over time; however, frequent updates to the database are intrusive and consume network bandwidth.
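A minimal sketch of the push model, assuming an in-memory dictionary in place of the scheduler's database. `StateDatabase` and `run_daemon` are hypothetical names; the update counter is there only to make the cost of frequent updates visible.

```python
class StateDatabase:
    """Central scheduler's record of the last state pushed by each resource."""

    def __init__(self):
        self.records = {}
        self.updates_received = 0

    def report(self, resource, state):
        self.records[resource] = state
        self.updates_received += 1


def run_daemon(db, resource, samples):
    """Local daemon: push every locally gathered sample to the database."""
    for state in samples:
        db.report(resource, state)


db = StateDatabase()
run_daemon(db, "node1", [{"cpu_load": 0.1}, {"cpu_load": 0.6}, {"cpu_load": 0.9}])
run_daemon(db, "node2", [{"cpu_load": 0.3}])
```

The database always holds the latest sample per resource, and the number of messages grows with both the number of resources and the update frequency.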

Resource discovery : The push–pull model

The push–pull model lies somewhere between the pull model and the push model.

Each resource in the environment runs a daemon that collects state information.

Instead of sending this information directly to a central scheduler, the resources send it to intermediate nodes, which run daemons that aggregate state information from different sub-resources and respond to queries from the scheduler.

A challenge of this model is to find out what information is most useful, how often it should be collected and how long it should be kept.
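A minimal sketch of the push–pull arrangement, assuming one aggregator per group of resources. `Aggregator` and `scheduler_view` are hypothetical names, and keeping only the latest sample per resource is one possible retention choice.

```python
class Aggregator:
    """Intermediate node: resources push state to it,
    and the scheduler pulls the aggregated view from it."""

    def __init__(self):
        self.latest = {}

    def push(self, resource, state):
        # Keep only the most recent sample per resource (an assumption).
        self.latest[resource] = state

    def query(self):
        return dict(self.latest)


def scheduler_view(aggregators):
    """Scheduler side: pull from each aggregator and merge the results."""
    view = {}
    for aggregator in aggregators:
        view.update(aggregator.query())
    return view


rack1, rack2 = Aggregator(), Aggregator()
rack1.push("node1", {"cpu_load": 0.2})
rack1.push("node1", {"cpu_load": 0.4})  # newer sample replaces the old one
rack2.push("node2", {"cpu_load": 0.7})
view = scheduler_view([rack1, rack2])
```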

Resource Selection

The second phase of the scheduling process: select those resources that best suit the constraints and conditions imposed by the user, such as CPU usage, RAM available or disk storage.

The result of resource selection is a resource list Rselected in which all resources can meet the minimum requirements for a submitted job or a job list.

The relationship between the resources available, Ravailable, and the resources selected, Rselected, is:

Rselected ⊆ Ravailable
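Filtering Ravailable down to Rselected can be sketched as below. The attribute names (`cpu_ghz`, `ram_mb`, `disk_gb`) and the sample resources are hypothetical.

```python
def select_resources(available, min_cpu_ghz, min_ram_mb, min_disk_gb):
    """Return the resources that meet all minimum requirements of a job."""
    return {
        name: res
        for name, res in available.items()
        if res["cpu_ghz"] >= min_cpu_ghz
        and res["ram_mb"] >= min_ram_mb
        and res["disk_gb"] >= min_disk_gb
    }


# Hypothetical resource descriptions.
available = {
    "node1": {"cpu_ghz": 2.4, "ram_mb": 4096, "disk_gb": 100},
    "node2": {"cpu_ghz": 0.8, "ram_mb": 8192, "disk_gb": 500},  # CPU too slow
    "node3": {"cpu_ghz": 3.0, "ram_mb": 128, "disk_gb": 50},    # too little RAM
}
selected = select_resources(available, min_cpu_ghz=1.0, min_ram_mb=256, min_disk_gb=10)
```

Because the function only removes entries, the result is always a subset of the input.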

Schedule Generation

The generation of schedules involves two steps:

selecting jobs and

producing resource selection strategies.

Schedule Generation: Resource Selection

The resource selection process is used to choose resource(s) from the resource list Rselected for a given job.

Since all resources in the list Rselected meet the minimum requirements imposed by the job, an algorithm is needed to choose the best resource(s) to execute the job.

Although random selection is a choice, it is not an ideal resource selection policy.

The resource selection algorithm should take into account the current state of the resources and choose the best one based on a quantitative evaluation.


A resource selection algorithm that only takes CPU and RAM into account could be designed as follows:

where:

WCPU – the weight allocated to CPU speed;

CPUload – the current CPU load;

CPUspeed – real CPU speed;

CPUmin – minimum CPU speed;

WRAM – the weight allocated to RAM;

RAMusage – the current RAM usage;

RAMsize – original RAM size; and

RAMmin – minimum RAM size.
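The symbols defined above are consistent with a weighted score of the following form. This is an assumption, not necessarily the exact formula intended here: the CPU term is WCPU × (1 − CPUload) × CPUspeed / CPUmin, the RAM term is WRAM × (1 − RAMusage) × RAMsize / RAMmin, and the overall value is their sum divided by WCPU + WRAM. The sketch below uses that form with hypothetical resource data; the default weights and minimums are borrowed from the example that follows.

```python
def evaluate(cpu_load, cpu_speed, ram_usage, ram_size,
             w_cpu=6, w_ram=4, cpu_min=1.0, ram_min=256):
    """Weighted evaluation of one resource (assumed formula).

    cpu_load and ram_usage are fractions in [0, 1];
    cpu_speed and cpu_min are in GHz; ram_size and ram_min are in MB.
    """
    eval_cpu = w_cpu * (1 - cpu_load) * cpu_speed / cpu_min
    eval_ram = w_ram * (1 - ram_usage) * ram_size / ram_min
    return (eval_cpu + eval_ram) / (w_cpu + w_ram)


# Hypothetical resources, not the ones from the slide's example.
resources = {
    "nodeA": {"cpu_load": 0.50, "cpu_speed": 2.0, "ram_usage": 0.50, "ram_size": 512},
    "nodeB": {"cpu_load": 0.75, "cpu_speed": 3.0, "ram_usage": 0.25, "ram_size": 256},
}
scores = {name: evaluate(**res) for name, res in resources.items()}
best = max(scores, key=scores.get)
```

With these inputs `nodeA` scores 1.0 and `nodeB` scores 0.75: the faster CPU of `nodeB` is outweighed by its higher load and smaller memory.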


Example: Suppose that the total weighting used in the algorithm is 10, where the CPU weight is 6 and the RAM weight is 4. The minimum CPU speed is 1 GHz and the minimum RAM size is 256 MB. The resource information matrix is as follows:

Find the best resource for the submitted job.

Then, evaluation values for the resources can be calculated using the three formulas:

From the results we know Resource3 is the best choice for the submitted job.

Schedule Generation: Job Selection

The goal of job selection is to select a job from a job queue for execution. Four strategies that can be used to select a job are given below.

First come first serve

Random Selection

Priority-based Selection

Backfilling Selection


First come first serve: The scheduler selects jobs for execution in the order of their submission. If there is no resource available for the selected job, the scheduler will wait until the job can be started. The other jobs in the job queue have to wait.

There are two main drawbacks with this type of job selection.

1. It may waste resources when, for example, the job selected needs more resources to be available before it can start, which results in a long waiting time.

2. Jobs with high priorities cannot get dispatched immediately if a job with a low priority needs more time to complete.
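A minimal first-come-first-serve sketch that makes the first drawback visible, assuming jobs are `(name, slots)` pairs and resources are counted in free slots (both hypothetical).

```python
from collections import deque


def fcfs_start(queue, free_slots):
    """Start jobs strictly in submission order and stop at the first
    job that does not fit, even if later jobs would."""
    started = []
    while queue and queue[0][1] <= free_slots:
        name, slots = queue.popleft()
        free_slots -= slots
        started.append(name)
    return started, free_slots


queue = deque([("j1", 2), ("j2", 8), ("j3", 1)])
started, idle = fcfs_start(queue, free_slots=4)
```

Only `j1` starts. Two slots stay idle because `j2` blocks the queue, although `j3` would fit.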


Random selection: The next job to be scheduled is randomly selected from the job queue. Apart from the two drawbacks of the first-come-first-serve strategy, job selection is not fair, and a job submitted earlier may not be scheduled until much later.


Priority-based selection: Jobs submitted to the scheduler have different priorities. The next job to be scheduled is the job with the highest priority in the job queue. A job priority can be set when the job is submitted. One drawback of this strategy is that it is hard to set an optimal criterion for a job priority. A job with the highest priority may need more resources than are available, which may also result in a long waiting time and an inability to make good use of the available resources.
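Priority-based selection maps naturally onto a heap. The sketch below uses Python's standard `heapq` module; breaking ties by submission order is an assumption, and the job names and priorities are hypothetical.

```python
import heapq


def pop_highest_priority(heap):
    """Remove and return the name of the highest-priority job."""
    _neg_priority, _seq, name = heapq.heappop(heap)
    return name


heap = []
submissions = [("j1", 1), ("j2", 5), ("j3", 5)]  # (name, priority)
for seq, (name, priority) in enumerate(submissions):
    # heapq is a min-heap, so the priority is negated; seq breaks ties
    # in favour of the job submitted earlier.
    heapq.heappush(heap, (-priority, seq, name))

order = [pop_highest_priority(heap) for _ in range(len(submissions))]
```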


Backfilling selection:

The backfilling strategy requires knowledge of the expected execution time of a job to be scheduled.

If the next job in the job queue cannot be started due to a lack of available resources, backfilling tries to find another job in the queue that can use the idle resources.
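A simplified backfilling sketch. It assumes the scheduler already knows how long the blocked head job must wait (`head_start_in`, a hypothetical parameter derived from the expected completion of running jobs) and only backfills jobs that fit the idle slots and are expected to finish within that window, so the head job is not delayed.

```python
def backfill(queue, free_slots, head_start_in):
    """queue: list of (name, slots, expected_runtime); queue[0] is the
    blocked head job. Return the later jobs that can run now."""
    chosen = []
    for name, slots, runtime in queue[1:]:
        if slots <= free_slots and runtime <= head_start_in:
            chosen.append(name)
            free_slots -= slots
    return chosen


queue = [
    ("head", 8, 100),  # blocked: needs 8 slots, only 3 are idle
    ("j2", 2, 30),
    ("j3", 1, 90),     # fits the slots but would run past the window
    ("j4", 1, 20),
]
chosen = backfill(queue, free_slots=3, head_start_in=60)
```

This is why the expected execution time is needed: without it the scheduler cannot tell that `j3` would hold a slot past the point at which the head job is due to start.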

Job execution

Once a job and a resource are selected, the next step is to submit the job to the resource for execution.

Job execution may be as easy as running a single command or as complicated as running a series of scripts that may, or may not, include set up or staging.
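The "series of scripts" case can be sketched with Python's standard `subprocess` module. The three steps and their placeholder commands are hypothetical; a real job would copy files and launch the application instead of printing.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order and stop at the first failure.
    Returns the names of the steps that completed."""
    done = []
    for name, cmd in steps:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            break
        done.append(name)
    return done


py = sys.executable  # use the current interpreter for the placeholder commands
steps = [
    ("stage_in", [py, "-c", "print('copy input files')"]),
    ("execute", [py, "-c", "print('run the job')"]),
    ("stage_out", [py, "-c", "print('collect results')"]),
]
completed = run_steps(steps)
```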
