global community


Upload: anakin

Post on 05-Jan-2016



TRANSCRIPT

Page 1: Global Community


Global Community

Slide Courtesy of Ian Foster

Page 2: Global Community

Resource Management in Grid Computing

AZIZOL ABDULLAH, PhD
DEPARTMENT OF COMMUNICATION TECHNOLOGY AND NETWORK

Page 3: Global Community

Resource Management
• What needs to be managed: Resources
– Physical resources (computers, disks, databases, networks, scientific instruments).
– Logical resources (jobs, executing applications, complex workflows, etc.).
• What is the goal?
– Resources must be available and meet performance criteria.

Page 4: Global Community

Resource Management (Cont.)
• What is management?
– The process of locating various types of capability, arranging for their use, utilizing them and monitoring their state.
– Maintenance of resources and environment
– Monitoring their state and performance
– Reacting to internal and external changes in a resource or its environment
– Initiating routine operations: initialization, start/stop and tuning

Page 5: Global Community

What is Resource Management?

• Mechanisms for locating and allocating computational resources
– Authentication
– Process creation
– Remote job submission
– Scheduling
• Other resources that can be managed:
– Memory
– Disk
– Networks

Page 6: Global Community

Resource Management Issues for Grid Computing

• Site autonomy
– Resources owned by different organizations, in different administrative domains
– Local policies for use, scheduling, security
• Heterogeneous substrate
– Different local resource management systems
• Policy extensibility
– Local sites need the ability to customize their resource management policies

Page 7: Global Community

More Issues for Grid Computing

• Co-allocation
– May need resources at several sites
– Mechanism for allocating multiple resources, initiating computation, monitoring and managing
• On-line control
– Adapt application requirements to resource availability

Page 8: Global Community

Manageability
• The ability of a resource to be managed
• Manageability interfaces support common operations (control and monitor)
• Manageability standards specify standard interfaces
• Problem:
– Existing interfaces are generally resource-specific
– Almost impossible to add standard interfaces to legacy resources
– New standards may require additional interfaces
• Solution:
– Common standards
– Based on service orientation, integration and virtualization.

Page 9: Global Community

Service orientation

• Software services
– A service provides some capability to its clients through message exchanges
– Represent the physical manageable entities
– Understand the unique interfaces for the entities they represent
– Implement applicable standard interfaces
• Integration
– Applications encapsulated in services become integrable building blocks

Page 10: Global Community

Service orientation (Cont.)
• The management process
– Manager invokes the operation (service’s standard interface)
– Service performs operation on managed entity (resource’s unique interface)
– Service returns result to manager (through the standard interface)
• Problem
– Need a common way to implement services

• Solution: Web Services
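The three-step management process on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the names `ManageableService`, `LegacyDiskArray`, `spin_up` and `query_state` are hypothetical and stand in for whatever standard and resource-specific interfaces a real deployment exposes.

```python
from typing import Protocol


class ManageableService(Protocol):
    """Hypothetical standard interface that every service offers the manager."""

    def start(self) -> str: ...
    def status(self) -> str: ...


class LegacyDiskArray:
    """Stand-in for a resource that only has its own, resource-specific interface."""

    def __init__(self) -> None:
        self.spinning = False

    def spin_up(self) -> None:
        self.spinning = True

    def query_state(self) -> int:
        return 1 if self.spinning else 0


class DiskArrayService:
    """Service that represents the resource and translates between interfaces."""

    def __init__(self, resource: LegacyDiskArray) -> None:
        self._resource = resource

    def start(self) -> str:
        self._resource.spin_up()   # step 2: operate via the resource's unique interface
        return self.status()       # step 3: return the result via the standard interface

    def status(self) -> str:
        return "running" if self._resource.query_state() == 1 else "stopped"


def manager_start(service: ManageableService) -> str:
    """Step 1: the manager only ever calls the standard interface."""
    return service.start()
```

The manager never sees `spin_up` or `query_state`, which is the point of the design: a second resource type needs a new service class, not a new manager.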

Page 11: Global Community

Virtualization

[Figure: virtualization layers. Labels: manager; common interfaces; Web services; service providers (computers, disks, telescopes, other); resource-specific interfaces; physical resources (cluster, IBM mainframe, blades).]

Page 12: Global Community

Traditional Resource Management

• Batch schedulers, workflow engines, operating systems

• Designed and operated under the assumption that:

– They have complete control over a resource

– They can implement the mechanisms and policies needed for effective use of that resource in isolation

• This is not the case for Grid resource management
– Separate administrative domains
– Resource heterogeneity
– Lack of control and different policies

Page 13: Global Community

Grid Resource Management
• What is Grid Resource Management?
– Identifying application requirements, resource specification
– Matching resources to applications
– Allocating/scheduling and monitoring those resources and applications over time in order to run as effectively as possible.

Page 14: Global Community

Grid Resource Management (Cont.)

• Challenges in Grid Resource Management
– Resources are heterogeneous in nature
  • Processors, disks, data, networks, other services.
– Applications have to compete for resources
– Lack of available data about current systems, needs of users, resource owners and administrators

Page 15: Global Community

Grid RM Mechanisms
• Resource Information Dissemination
– Published by the resource (push) or gathered by GIS (pull)
– On-demand dissemination (by agents)
• Resource Discovery
– Centralized or distributed queries, agents, distributed queries + agents
– Resources are described in schema/language or objects
• Resource Scheduling/Job execution
– Assigning resources: centralized, hierarchical, distributed
• Resource Monitoring and Re-Scheduling
– Monitoring can be done by application (polling) or by resource (notification to the app or periodic status updates).

Page 16: Global Community

Grid Resource Brokerage

• Discovering suitable resources for a user's job
• Current scenario: manual or semi-manual
– Users manually target their work at machines that are already known to them.
• For larger grids, a manual solution is not feasible
• Solution is a Grid Resource Broker:
– The user describes their needs to a third party (software),
– which searches for suitable resources and passes the result(s) back to the user.

Page 17: Global Community

Grid Resource Brokerage

• Role of the Broker in a Management System
– Resource discovery
  • Authorization filtering, application definition, minimum requirement filtering
– System selection
  • Dynamic information gathering, system selection
– Allocation and advance reservation
• Grid Information System
– Organize a set of sensors on resources so that a client or broker can have easy access to data (static or dynamic)
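The broker stages listed on this slide (authorization filtering, minimum requirement filtering, then system selection on dynamic information) can be sketched as a small pipeline. The record fields (`allowed_users`, `memory_mb`, `load`) and the "least loaded wins" rule are assumptions made for illustration; a real broker would obtain this data from the Grid Information System.

```python
from typing import Any, Dict, List, Optional

Resource = Dict[str, Any]  # hypothetical resource record


def broker_select(user: str, min_memory_mb: int,
                  resources: List[Resource]) -> Optional[str]:
    """Return the name of the chosen resource, or None if nothing qualifies."""
    # Resource discovery, stage 1: authorization filtering.
    authorized = [r for r in resources if user in r["allowed_users"]]
    # Resource discovery, stage 2: minimum requirement filtering (static data).
    adequate = [r for r in authorized if r["memory_mb"] >= min_memory_mb]
    if not adequate:
        return None
    # System selection: use dynamic information (here, current load).
    best = min(adequate, key=lambda r: r["load"])
    return best["name"]


resources = [
    {"name": "c1", "allowed_users": {"alice"}, "memory_mb": 256, "load": 0.9},
    {"name": "c2", "allowed_users": {"alice", "bob"}, "memory_mb": 128, "load": 0.2},
    {"name": "c3", "allowed_users": {"bob"}, "memory_mb": 512, "load": 0.0},
]
```

Filtering on cheap static attributes before consulting dynamic load keeps the expensive, fast-changing queries to the few resources that could actually be used.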

Page 18: Global Community

Matchmaking

• Process of selecting resources based on application requirements

• Symmetric matchmaking
– Attribute-based matching
  • Resource provider and resource user have to agree on a schema, attribute names and value ranges
  • Syntax based, like ClassAds
• Asymmetric matchmaking
– Ontology-based matching
  • Ontologies, domain background knowledge, matchmaking rules
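Attribute-based matching can be illustrated with a minimal sketch in which both sides publish attributes plus a requirements predicate over the other side's attributes. This is not ClassAd syntax; the dictionary layout and the attribute names (`memory_mb`, `arch`, `owner`) are assumptions chosen for the example, and they show why provider and user must agree on a schema.

```python
from typing import Any, Callable, Dict

Ad = Dict[str, Any]  # hypothetical advertisement: attributes plus "requirements"


def symmetric_match(request: Ad, offer: Ad) -> bool:
    """Both sides must accept each other.

    The request's requirements are evaluated against the offer, and the
    offer's requirements are evaluated against the request.
    """
    request_ok: Callable[[Ad], bool] = request["requirements"]
    offer_ok: Callable[[Ad], bool] = offer["requirements"]
    return request_ok(offer) and offer_ok(request)


machine = {
    "memory_mb": 128,
    "arch": "x86_64",
    # Owner policy: do not run jobs submitted by "guest".
    "requirements": lambda job: job["owner"] != "guest",
}
job = {
    "owner": "alice",
    "requirements": lambda m: m["memory_mb"] >= 64 and m["arch"] == "x86_64",
}
```

If the machine advertised `mem` instead of `memory_mb`, the job's predicate would fail with a `KeyError`, which is the practical cost of purely syntactic matching.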

Page 19: Global Community

Specifying Resource and Job Requirements

• Resource requirements:
– Machine type
– Number of nodes
– Memory
– Network
• Job or scheduler parameters:
– Directory
– Executable
– Arguments
– Environment
– Maximum time required

Page 20: Global Community

Resource and Job Specification

• Globus: Resource Specification Language (RSL)
    &(executable=myprog)
     (|(&(count=5)(memory>=64))
       (&(count=10)(memory>=32)))
• Condor: Classified ads (ClassAds)
– Resource owners advertise abilities and constraints
– Applications advertise resource requests
– Matchmaking: match offers and requests

Page 21: Global Community

Components of Globus Resource Management Architecture

• Resource specification using RSL
• Resource brokers: translate resource requirements into specifications
• Co-allocators: break down requests for multiple sites
• Local resource managers: apply local, site-specific resource management policies
• Information about available compute resources and their characteristics

Page 22: Global Community

Resource Specification Language

Common notation for exchange of information between components

API provided for manipulating RSL

Page 23: Global Community

RSL Syntax

• Elementary form: parenthesis clauses
    (attribute op value [ value … ])
• Operators supported:
    <, <=, =, >=, >, !=
• Some supported attributes:
– executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact, resourceManagerName
• Unknown attributes are passed through
– May be handled by subsequent tools

Page 24: Global Community

Constraints: “&”

For example:

    & (count>=5) (count<=10)
      (max_time=240) (memory>=64)
      (executable=myprog)

“Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”
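To make the conjunction concrete, here is a small sketch that parses a flat "&" request into (attribute, operator, value) relations and checks them against one machine's attributes. It is not the Globus RSL API: the function names are hypothetical, nested expressions are not handled, and attributes the machine record does not know are skipped, loosely mirroring the "unknown attributes are passed through" behaviour described earlier.

```python
import operator
import re
from typing import Dict, List, Tuple

OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt, "!=": operator.ne}

# One flat relation: (attribute op value). Two-character operators come first.
RELATION = re.compile(r"\(\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*([^()\s]+)\s*\)")

Relation = Tuple[str, str, str]


def parse_conjunction(rsl: str) -> List[Relation]:
    """Split a flat '&' request into its relations."""
    text = rsl.strip()
    if not text.startswith("&"):
        raise ValueError("expected a conjunction starting with '&'")
    return RELATION.findall(text)


def satisfies(machine: Dict[str, object], relations: List[Relation]) -> bool:
    """True if the machine meets every relation it has an attribute for."""
    for attr, op, value in relations:
        if attr not in machine:
            continue  # unknown attribute: left for a later tool to handle
        have = machine[attr]
        want = type(have)(value)  # compare in the type of the machine's value
        if not OPS[op](have, want):
            return False
    return True


request = "& (count>=5) (count<=10) (max_time=240) (memory>=64) (executable=myprog)"
relations = parse_conjunction(request)
```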

Page 25: Global Community

Multirequest: “+”

A multirequest allows us to specify multiple resource needs, for example

    + (& (count=5)(memory>=64)
         (executable=p1))
      (&(network=atm) (executable=p2))

– Execute 5 instances of p1 on a machine with at least 64 MB of memory
– Execute p2 on a machine with an ATM connection

Multirequests are central to co-allocation
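Because each top-level clause of a multirequest is an independent request, a co-allocator's first job is to split it. The sketch below does that by tracking parenthesis depth; `split_multirequest` is a hypothetical helper written for this illustration, not a Globus function.

```python
from typing import List


def split_multirequest(rsl: str) -> List[str]:
    """Return the top-level subrequests of a '+' multirequest."""
    text = rsl.strip()
    if not text.startswith("+"):
        raise ValueError("expected a multirequest starting with '+'")
    parts: List[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i          # opening paren of a top-level subrequest
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
            if depth == 0:
                parts.append(text[start + 1:i].strip())
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return parts


multi = "+ (& (count=5)(memory>=64)(executable=p1)) (&(network=atm) (executable=p2))"
components = split_multirequest(multi)
```

Each element of `components` is itself a "&" conjunction that can be handed to a separate resource manager.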

Page 26: Global Community

Resource Broker

• Takes high-level RSL specification
• Transforms into concrete specifications through “specialization” process
– Locate resources that meet requirements
• Multiple brokers may service a single request
• Application-specific brokers translate application requirements
• Output: complete specification of locations of resources; given to co-allocator

Page 27: Global Community

Examples of Resource Brokers

• Nimrod-G
– Automates creation and management of large parametric experiments
– Run application under wide range of input conditions and aggregate results
– Queries MDS to find resources
– Generates number of independent jobs
– GRAM allocates jobs to computational nodes
– Higher-level broker: allows user to specify time and cost constraints

Page 28: Global Community

Examples of Resource Brokers

• AppLeS
– Application Level Scheduler
– Map large number of independent tasks to dynamically varying pool of available computers
– Use GRAM to locate resources and initiate and manage computation

Page 29: Global Community

Resource Management Architecture

[Figure: resource management architecture. Labels: application; RSL; broker (RSL specialization); information service (queries and info); ground RSL; co-allocator; simple ground RSL; three GRAM instances; local resource managers (LSF, EASY-LL, NQE).]

Page 30: Global Community

Resource co-allocators

• May request resources at multiple sites
– Two or more computers and networks
• Break multi-request into components
• Pass each component to resource manager
• Provide means for monitoring job status or terminating job
• Complex:
– Two or more resource managers
– Global state, like availability of resources, is difficult to determine

Page 31: Global Community

Different co-allocation services

1. Require all resources to be available before job proceeds; fail globally if failure occurs at any resource

2. Allocate at least N out of M resources and return

3. Return immediately, but gradually return more resources as they become available

Each useful for some class of applications
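The first two services can be contrasted in a short sketch. The `try_allocate` callback is a hypothetical stand-in for a request to one site's resource manager; the third service (return immediately and deliver resources as they appear) is inherently asynchronous and is not shown.

```python
from typing import Callable, List, Sequence


def allocate_all_or_nothing(sites: Sequence[str],
                            try_allocate: Callable[[str], bool]) -> List[str]:
    """Service 1: every site must succeed, otherwise the request fails globally."""
    granted: List[str] = []
    for site in sites:
        if not try_allocate(site):
            # A real co-allocator would also release everything in `granted` here.
            return []
        granted.append(site)
    return granted


def allocate_at_least(sites: Sequence[str],
                      try_allocate: Callable[[str], bool],
                      n: int) -> List[str]:
    """Service 2: succeed if at least n of the requested sites can be allocated."""
    granted = [site for site in sites if try_allocate(site)]
    return granted if len(granted) >= n else []


# Simulated availability for three sites.
available = {"siteA": True, "siteB": False, "siteC": True}


def simulated_allocate(site: str) -> bool:
    return available[site]
```

A tightly coupled parallel code would need the first form, while a parameter sweep that can use whatever it gets fits the second.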

Page 32: Global Community

Concurrent Allocation

• If advance reservations are available:
– Obtain list of available time slots from each participating resource manager and choose timeslot
• Without reservations:
– Optimistically allocate resources
– Hope desired set will be available at future time
– Use information service (MDS) to determine current availability of resources
– Construct RSL request that is likely to succeed
– If allocation fails, all started jobs must be terminated
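For the reservation case, choosing a timeslot amounts to finding a start time at which every participating resource manager has a free slot long enough for the job. A minimal sketch, assuming each manager reports its free slots as (start, end) pairs in the same time unit; the function name and data layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Slot = Tuple[int, int]  # (start, end) of a free interval


def earliest_common_slot(slots_by_manager: Dict[str, List[Slot]],
                         duration: int) -> Optional[int]:
    """Earliest start at which all managers are free for `duration`, else None."""
    # The earliest feasible start always coincides with the start of some slot,
    # so only slot starts need to be tried.
    candidates = sorted({start
                         for slots in slots_by_manager.values()
                         for start, _ in slots})
    for t in candidates:
        if all(any(s <= t and t + duration <= e for s, e in slots)
               for slots in slots_by_manager.values()):
            return t
    return None


free = {
    "managerA": [(0, 4), (6, 12)],
    "managerB": [(2, 5), (7, 10)],
}
```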

Page 33: Global Community

Disadvantages of Concurrent Allocation Scheme

Computational resources wasted while waiting for all requested resources to become available

Application must be altered to perform barrier to synchronize startup across components

Detecting failure of a resource is difficult, e.g. in queue-based local resource managers

Page 34: Global Community

Local Resource Managers

Implemented with Globus Resource Allocation Manager (GRAM)

1. Processing RSL specifications representing resource requests
– Deny request
– Create one or more processes (jobs) that satisfy request

2. Enable remote monitoring and management of jobs

3. Periodically update MDS information service with current availability and capabilities of resources

Page 35: Global Community

GRAM (cont.)

• Interface between grid environment and entity that can create processes
– E.g., parallel scheduler or Condor pool
• GRAM may schedule resource itself
• More commonly, maps resource specification into a request to a local resource allocation mechanism
– E.g., Condor, LoadLeveler, LSF
• Co-exists with local mechanisms

Page 36: Global Community

GRAM (cont.)

GRAM API has functions for:
– Submitting a job request: produces globally unique job handle
– Canceling a job request
– Asking when job request is expected to run
– Upon submission, can request that progress be signaled asynchronously to callback URL

Page 37: Global Community

GRAM Scheduling Model

Jobs are either:
– Pending: resources have not yet been allocated to the job
– Active: resources allocated, job running
– Done: when all processes have terminated and resources have been deallocated
– Failed: job terminates due to:
  • explicit termination
  • error in request format
  • failure in resource management system
  • denial of access to resource
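The four job states can be modelled as a small state machine. The state names come from the slide; the set of allowed transitions and the callback signature are assumptions made for this sketch and are not taken from the GRAM API.

```python
from enum import Enum
from typing import Callable, Optional


class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"


# Assumed transitions: a job may fail before or after it becomes active,
# and DONE and FAILED are terminal.
ALLOWED = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}

Callback = Callable[[JobState, JobState], None]


class Job:
    """Tracks one job and notifies an optional callback on each state change."""

    def __init__(self, callback: Optional[Callback] = None) -> None:
        self.state = JobState.PENDING
        self._callback = callback

    def transition(self, new_state: JobState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        old_state, self.state = self.state, new_state
        if self._callback is not None:
            self._callback(old_state, new_state)
```

The callback plays the role of the asynchronous progress signal that a client can request when it submits a job.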

Page 38: Global Community

GRAM Components

Gatekeeper

Responds to a request:

1. Performs mutual authentication of user and resource

2. Determines local user name for remote user

3. Starts a job manager that executes as local user and handles request

Page 39: Global Community

GRAM Components (cont.)

Job manager
– Creates processes requested by user
– Submits resource allocation requests to underlying resource management system (or does fork)
– Monitors state of created processes
– Notifies callback contact of state transitions
– Implements control operations like termination

Page 40: Global Community

GRAM Components (cont.)

GRAM reporter

Responsible for storing into MDS (information service) info about:
– Scheduler structure
  • Support reservations?
  • Number of queues
– Scheduler state
  • Currently active jobs
  • Expected wait time in queue
  • Total number of nodes and available nodes

Page 41: Global Community

Job Submission Interfaces

Globus Toolkit includes several command line programs for job submission:
– globus-job-run: Interactive jobs
– globus-job-submit: Batch/offline jobs
– globusrun: Flexible scripting infrastructure

Page 42: Global Community

Scheduling in Grid

• Optimize performance: execution time, throughput, fairness, etc. (QoS)
• Load balancing.
• Help to design an effective program model.
• Ubiquity: process scheduling in operating systems, task scheduling in parallel computing, and scheduling in real life too.

Page 43: Global Community

Scheduling in GRID

• Application level.
• Resources, e.g. data, communication bandwidth.
• Models: scheduling policy, program model, performance model, performance measurement.
• Current performance measure: minimize execution time.

Page 44: Global Community

Requirements on GRID scheduling model

• Adaptive to the dynamic environment.
• Adaptive to the varying performance metrics over the course of application execution.
• Performance predictions over time.
• Coarse and fine tuning of the component parameters.

Page 45: Global Community

Techniques commonly employed

• Parameterize the components in an application.
• Make use of dynamic information, e.g. percentage of CPU slots available, percentage of network bandwidth available.
• Compositional scheduling model: structural character of the application and dynamic interaction with the grid environment.

Page 46: Global Community

Scheduling Policy

• Choose a set of resources to achieve the performance goal.
• First Come, First Served.
• Preemptive.
• Fair Queuing.
• Etc.
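As a concrete instance of one policy, First Come, First Served on a single resource can be simulated in a few lines. The job tuple layout (name, arrival time, run time) is an assumption for this sketch; ties in arrival time keep their input order.

```python
from typing import Dict, List, Tuple

JobSpec = Tuple[str, int, int]  # (name, arrival time, run time)


def fcfs(jobs: List[JobSpec]) -> Dict[str, Tuple[int, int]]:
    """Return {job name: (start, finish)} for one non-preemptive resource."""
    schedule: Dict[str, Tuple[int, int]] = {}
    clock = 0
    # Serve strictly in arrival order; sorted() is stable, so ties keep input order.
    for name, arrival, runtime in sorted(jobs, key=lambda job: job[1]):
        start = max(clock, arrival)   # wait for the resource or for the job to arrive
        finish = start + runtime
        schedule[name] = (start, finish)
        clock = finish
    return schedule


jobs = [("a", 0, 5), ("b", 1, 2), ("c", 10, 1)]
plan = fcfs(jobs)
```

Job "b" arrives at time 1 but has to wait behind the longer job "a", which is the behaviour that preemptive and fair-queuing policies are meant to improve on.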

Page 47: Global Community

AppLeS: Application-Level Scheduler

• Everything is evaluated in terms of its impact on the application, so resources are evaluated in terms of their predicted capacities and their potential for meeting requirements.
• No resource manager is assumed.
• Runs at user level; no specific privilege required.
• Heterogeneous and cross-organization.
• Depends on the Network Weather Service for dynamic resource load and availability.

Page 48: Global Community

AppLeS (Cont’d)

• Information gathered by the Network Weather Service is used to parameterize performance models and to predict the state of grid resources at the time the application will be scheduled.
• Time balancing: all processors are assigned some possibly nonuniform amount of work, with the goal that they will all finish at roughly the same time.
• Compositional component models are deployed.
• Adaptive scheduling scheme.
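Time balancing can be illustrated with a simple proportional split: if each host receives work in proportion to its predicted speed, all hosts are predicted to finish at the same time. This is a sketch of the idea only, not the AppLeS implementation; the predicted speeds are supplied here as plain numbers, whereas a real scheduler would derive them from forecasts such as those of the Network Weather Service.

```python
from typing import Dict


def time_balance(total_work: float,
                 predicted_speed: Dict[str, float]) -> Dict[str, float]:
    """Split total_work so that work / speed is the same for every host."""
    total_speed = sum(predicted_speed.values())
    if total_speed <= 0:
        raise ValueError("at least one host must have positive predicted speed")
    return {host: total_work * speed / total_speed
            for host, speed in predicted_speed.items()}


# Hypothetical forecast: "fast" is predicted to be three times as fast as "slow".
shares = time_balance(100.0, {"slow": 1.0, "fast": 3.0})
```

With these numbers the slow host gets a quarter of the work and both hosts are predicted to finish after 25 time units, assuming the forecasts hold for the duration of the run.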

Page 49: Global Community

Conclusion

• Scheduling is the key to performance in a grid environment.
• Coordinating resources in a grid environment.
• Most advanced grid applications are targeted to specific resources.
• High-performance scheduling evolution.

Page 50: Global Community

Open issues
• Multiple layers of schedulers
– The higher-level scheduler has less information about the remote resources; local resource managers actually control the resources
• Lack of control over resources
– Grid scheduler does not have ownership or control over the resources
• Shared resources and variance
– No dedicated access to the resources (resources are shared)
– This results in a high degree of variance and unpredictability
• Conflicting performance goals
– Many participants have different/conflicting preferences
– Many different local policies, cost models, security