
Page 1: The KOALA Grid Scheduler over DAS-3 and Grid’5000

DAS-3/Grid’5000 meeting: 4th December 2006


The KOALA Grid Scheduler over DAS-3 and Grid’5000: Processor and data co-allocation in grids

Dick Epema, Alexandru Iosup, Mathieu Jan, Hashim Mohamed, Ozan Sonmez

Parallel and Distributed Systems Group

Page 2: The KOALA Grid Scheduler over DAS-3 and Grid’5000


Contents

• Our context: grid scheduling and co-allocation

• The design of the KOALA co-allocating scheduler

• Some performance results

• KOALA over Grid’5000 and DAS-3

• Conclusion & future work

Page 3: The KOALA Grid Scheduler over DAS-3 and Grid’5000


Grid scheduling environment

• System
  • Grid schedulers usually do not own resources themselves
  • Grid schedulers have to interface to different local schedulers (see the adapter sketch below this list)
    • Sun Grid Engine (SGE 6.0) on DAS-2/DAS-3
    • OAR on Grid’5000
• Workload
  • Various kinds of applications
  • Various requirements
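The interface to a local scheduler can be pictured as a small adapter layer. The sketch below is illustrative only and is not KOALA’s actual code: the class and method names (LocalScheduler, SGEAdapter, OARAdapter, idle_processors, submit) are hypothetical, and the SGE/OAR calls are left as stubs.

# Hypothetical adapter layer for talking to different local schedulers;
# not KOALA's actual interface.
from abc import ABC, abstractmethod


class LocalScheduler(ABC):
    """What a grid scheduler minimally needs from a local scheduler."""

    @abstractmethod
    def idle_processors(self) -> int:
        """Number of processors currently idle on this cluster."""

    @abstractmethod
    def submit(self, executable: str, num_procs: int) -> str:
        """Submit a job component and return the local job id."""


class SGEAdapter(LocalScheduler):
    """Would wrap Sun Grid Engine (qsub/qstat) on DAS-2/DAS-3."""

    def idle_processors(self) -> int:
        raise NotImplementedError("query SGE here")

    def submit(self, executable: str, num_procs: int) -> str:
        raise NotImplementedError("call qsub here")


class OARAdapter(LocalScheduler):
    """Would wrap OAR (oarsub/oarstat) on Grid'5000."""

    def idle_processors(self) -> int:
        raise NotImplementedError("query OAR here")

    def submit(self, executable: str, num_procs: int) -> str:
        raise NotImplementedError("call oarsub here")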

Page 4: The KOALA Grid Scheduler over DAS-3 and Grid’5000


Co-Allocation (1)

• In grids, jobs may use multiple types of resources in multiple sites: co-allocation or multi-site operation
• Without co-allocation, a grid is just a big load-sharing device:
  • Find a suitable candidate system for running a job
  • If the candidate is not suitable anymore, migrate

[Figure: multiple separate jobs submitted to the grid (load sharing)]

Page 5: The KOALA Grid Scheduler over DAS-3 and Grid’5000


Co-Allocation (2)

• With co-allocation:
  • Use available resources (e.g., processors)
  • Access and/or process geographically spread data
  • Application characteristics (e.g., simulation in one location, visualization in another)
• Problems:
  • More difficult resource-discovery process
  • Need to coordinate the allocations of local schedulers
  • Slowdown due to wide-area communications

[Figure: a single global job co-allocated across clusters in the grid]

Page 6: The KOALA Grid Scheduler over DAS-3 and Grid’5000


A model for co-allocation: schedulers

[Figure: the scheduler model. A global queue with the grid scheduler (KOALA) sits on top of the clusters; each cluster has a local queue with its local scheduler (LS). Local jobs enter the local queues directly; global jobs are handled by KOALA, which either load-shares them to a single cluster (non-local jobs) or co-allocates them over several clusters.]

Page 7: The KOALA Grid Scheduler over DAS-3 and Grid’5000


A model for co-allocation: job types

[Figure: the three job types, all of the same total job size, split into job components]
• Fixed job: the placement of the job components is fixed
• Non-fixed job: the scheduler decides on the component placement
• Flexible job: the scheduler decides on the split-up and the placement
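As a minimal illustration (not KOALA’s actual data model; all names below are made up), the three job types could be represented as follows.

# Illustrative representation of the three job types; not KOALA's data model.
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class JobType(Enum):
    FIXED = auto()      # the user fixes the cluster of every component
    NON_FIXED = auto()  # the scheduler places the given components
    FLEXIBLE = auto()   # the scheduler also decides how to split up the job


@dataclass
class JobComponent:
    num_procs: int
    cluster: Optional[str] = None  # set in advance only for FIXED jobs


@dataclass
class Job:
    job_type: JobType
    components: List[JobComponent]

    @property
    def total_size(self) -> int:
        # The same total job size can be split up differently for FLEXIBLE jobs.
        return sum(c.num_procs for c in self.components)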

Page 8: The KOALA Grid Scheduler over DAS-3 and Grid’5000


A model for co-allocation: policies

• Placement policies dictate where the components of a job go
• Placement policies for non-fixed jobs:
  • Load-aware: Worst Fit (WF) (balance the load in the clusters; see the sketch below this list)
  • Input-file-location-aware: Close-to-Files (CF) (reduce file-transfer times)
  • Communication-aware: Cluster Minimization (CM) (reduce the number of wide-area messages)
• Placement policy for flexible jobs:
  • Communication- and queue-time-aware: Flexible Cluster Minimization (FCM) (CM + reduce queue wait time)
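A minimal sketch of the load-aware Worst Fit idea for a non-fixed job: each component goes to the cluster that currently has the most idle processors. This is an illustration, not KOALA’s implementation; the function name, the largest-component-first ordering, and the cluster names in the example are assumptions.

# Hypothetical sketch of Worst Fit (WF) placement for a non-fixed job;
# not KOALA's implementation.
from typing import Dict, List, Optional


def worst_fit_place(component_sizes: List[int],
                    idle_procs: Dict[str, int]) -> Optional[Dict[int, str]]:
    """Map each component index to a cluster, or return None if placement fails."""
    idle = dict(idle_procs)  # local copy that we decrement as we place components
    placement: Dict[int, str] = {}
    # Place the largest components first so that a failure shows up early.
    for i in sorted(range(len(component_sizes)),
                    key=lambda i: component_sizes[i], reverse=True):
        # Worst Fit: pick the cluster with the most idle processors left.
        cluster = max(idle, key=idle.get)
        if idle[cluster] < component_sizes[i]:
            return None  # no cluster can hold this component; the job must be retried
        placement[i] = cluster
        idle[cluster] -= component_sizes[i]
    return placement


# Example: a job with components of 16, 8, and 8 processors over three clusters.
print(worst_fit_place([16, 8, 8], {"cluster-a": 24, "cluster-b": 20, "cluster-c": 10}))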

Page 9: The KOALA Grid Scheduler over DAS-3 and Grid’5000


KOALA: a Co-Allocating grid scheduler

• Main goals:
  1. Processor co-allocation: non-fixed/flexible jobs
  2. Data co-allocation: move large input files to the locations where the job components will run, prior to execution
  3. Load sharing: in the absence of co-allocation
• KOALA:
  • Runs alongside the local schedulers
  • Is a scheduler independent from Globus
  • Uses Globus components (e.g., RSL and GridFTP)
  • Uses its own mechanisms or Globus DUROC for launching jobs
  • Has been deployed on the DAS-2 since September 2005

Page 10: The KOALA Grid Scheduler over DAS-3 and Grid’5000


KOALA: the architecture

[Figure: the KOALA architecture, on top of the local schedulers (SGE?)]
• PIP/NIP: information services
• RLS: replica location service
• CO: co-allocator
• PC: processor claimer
• RM: run monitor
• RL: runners listener
• DM: data manager
• Ri: runners

Page 11: The KOALA Grid Scheduler over DAS-3 and Grid’5000


KOALA: the runners

• The KOALA runners are adaptation modules for different application types (see the sketch below):
  • Set up communication
  • Launch applications
• Current runners:
  • KRunner: the default KOALA runner, which only co-allocates processors
  • DRunner: DUROC runner for co-allocated MPI applications
  • IRunner: runner for applications using the Ibis Java library for grid applications
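A runner can be pictured as a small adapter around the launch step. The sketch below is illustrative only and does not reflect how the real runners are structured; the class and method names are made up.

# Hypothetical sketch of what a runner does; the real KOALA runners differ
# and the names used here are made up.
from typing import Dict


class KRunnerSketch:
    """Stand-in for the default runner: set up nothing application-specific
    and start a plain executable on every claimed component."""

    def setup_communication(self, placement: Dict[int, str]) -> None:
        # A DRunner/IRunner-like runner would exchange contact information
        # between the components here (e.g., for MPI or Ibis).
        pass

    def launch(self, executable: str, placement: Dict[int, str]) -> None:
        for component, cluster in placement.items():
            print(f"would start component {component} of {executable} on {cluster}")


# Usage example with a made-up placement of two components:
runner = KRunnerSketch()
runner.setup_communication({0: "cluster-a", 1: "cluster-b"})
runner.launch("/home/user/app", {0: "cluster-a", 1: "cluster-b"})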

Page 12: The KOALA Grid Scheduler over DAS-3 and Grid’5000


KOALA: job flow with four phases

[Figure: the job flow. A new submission enters the placement queue; after a successful placement the input files are transferred and the job moves to the claiming queue; once its processors are claimed, the runners launch the job. Failed placement and claiming attempts are retried.]

• Phase 1: job placement
• Phase 2: file transfer
• Phase 3: claim processors
• Phase 4: launch job (by the runners)
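A schematic rendering of the four phases with their retries follows; the helper functions, the retry intervals, and the control flow are placeholders, not KOALA’s actual logic.

# Schematic sketch of the four-phase job flow; all helpers are placeholders.
import time

PLACEMENT_RETRY_INTERVAL = 30  # seconds; illustrative value
CLAIM_RETRY_INTERVAL = 30      # seconds; illustrative value


def try_place(job):
    """Phase 1 helper: return a placement (component -> cluster) or None."""
    return {0: "some-cluster"}  # stub


def transfer_input_files(job, placement):
    """Phase 2 helper: would move the input files, e.g. with GridFTP."""


def try_claim(placement):
    """Phase 3 helper: try to claim the processors; they may be gone by now."""
    return True  # stub


def launch(job, placement):
    """Phase 4 helper: hand the job over to its runner."""
    print("launching", job, "on", placement)


def run_job(job):
    # Phase 1: job placement, retried until a placement is found
    placement = try_place(job)
    while placement is None:
        time.sleep(PLACEMENT_RETRY_INTERVAL)
        placement = try_place(job)

    # Phase 2: file transfer to the chosen execution sites
    transfer_input_files(job, placement)

    # Phase 3: claim the processors, retried if the claim fails
    while not try_claim(placement):
        time.sleep(CLAIM_RETRY_INTERVAL)

    # Phase 4: the runner launches the job components
    launch(job, placement)


run_job("my-job")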

Page 13: The KOALA Grid Scheduler over DAS-3 and Grid’5000


KOALA: job time line

• If advance reservations are not supported, don’t claim processors immediately after placing a job, but wait until close to the estimated job start time
• So processors are left idle (processor gained time)
• Placing and claiming may have to be retried multiple times

[Figure: job time line, with the job submission, the job placement, the estimated file-transfer time, the claiming time, the estimated start time, the processor gained time, and the processor wasted time marked on it.]
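One hypothetical way to read the time line: claim only a small margin before the estimated start time, which itself cannot be earlier than the placement time plus the estimated file-transfer time. The margin and the formulas below are illustrative assumptions, not taken from the presentation.

# Hypothetical illustration of deferred claiming; the margin and the formulas
# are assumptions, not KOALA's actual heuristics.
def estimated_start_time(placement_time: float, est_file_transfer_time: float) -> float:
    # The job cannot start before its input files have arrived.
    return placement_time + est_file_transfer_time


def claiming_time(est_start_time: float, margin: float = 60.0) -> float:
    # Claim only shortly before the estimated start, so the processors
    # stay unclaimed (and usable by local jobs) until then.
    return max(0.0, est_start_time - margin)


est_start = estimated_start_time(placement_time=0.0, est_file_transfer_time=300.0)
print("estimated start:", est_start, "claim at:", claiming_time(est_start))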

Page 14: The KOALA Grid Scheduler over DAS-3 and Grid’5000


KOALA: performance results (1)

• With replication (3 copies of the input files, which are 2, 4, or 6 GB)
• Offer a 30% co-allocation load during two hours
• Try to keep the background load between 30% and 40%

[Figure 1: utilization (%) over time (s), showing the KOALA workload, the background load, the processor gained time, and the processor wasted time.]

[Figure 2: number of placement and claiming tries of CF and WF per job size (number of components x component size: 1x8, 2x8, 4x8, 1x16, 2x16, 4x16).]

See, e.g.: H.H. Mohamed and D.H.J. Epema, “An Evaluation of the Close-to-Files Processor and Data Co-Allocation Policy in Multiclusters,” IEEE Cluster 2004.

Page 15: The KOALA Grid Scheduler over DAS-3 and Grid’5000


KOALA: performance results (2)

• Communication-intensive applications
• Workload 1: low load
• Workload 2: high load
• Background load: 15-20%

[Figure: for workloads 1 and 2, the average wait time (s) and the average execution time (s) under the WF, CM, and FCM placement policies, and the average middleware overhead (s) versus the number of job components (1-5).]

See: O. Sonmez, H.H. Mohamed, and D.H.J. Epema, “Communication-Aware Job-Placement Policies for the KOALA Grid Scheduler,” 2nd IEEE Int’l Conf. on e-Science and Grid Computing, Dec. 2006.

Page 16: The KOALA Grid Scheduler over DAS-3 and Grid’5000


Grid’5000 and DAS-3 interconnection: scheduling issues

• Preserve each system’s usage
• Characterize jobs (especially for Grid’5000)
• Usage policies
• Allow simultaneous use of both testbeds
• One more level of hierarchy in latencies
• Co-allocation of jobs
• Various types of applications: PSAs, GridRPC, etc.

Page 17: The KOALA Grid Scheduler over DAS-3 and Grid’5000


KOALA over Grid’5000 and DAS-3

• Goal: testing KOALA policies …
  • … in a heterogeneous environment
  • … with different workloads
  • … with OAR reservation capabilities
• Grid’5000 from DAS-3:
  • “Virtual” clusters inside KOALA
  • Used whenever DAS-3 is overloaded
• How: deployment of the DAS-3 environment on Grid’5000

Page 18: The KOALA Grid Scheduler over DAS-3 and Grid’5000


KOALA over Grid’5000 and DAS-3: how

[Figure: the DAS-3 environment deployed on the Grid’5000 sites Lyon, Orsay, and Rennes (with a file server and OAR), alongside the DAS-3 clusters.]

Page 19: The KOALA Grid Scheduler over DAS-3 and Grid’5000


Using DAS-3 from Grid’5000

• Authorize Grid’5000 users to submit jobs …
  • … via SGE directly, OARGrid, or KOALA
  • Usage policies?
• Deployment of environments on DAS-3 as in Grid’5000?
  • When: during nights and weekends?
  • Deployment at the grid level
    • KOALA submits kadeploy jobs

Page 20: The KOALA Grid Scheduler over DAS-3 and Grid’5000


Current progress

• Collected traces of Grid’5000 [done]
  • OAR tables of 15 clusters
  • OARGrid tables
  • LDAP database
  • Analysis in progress
• KOALA over Grid’5000 [in progress]
  • KOALA communicates with OAR for its information service [done]
  • GRAM interface to OAR
  • “DAS-2” image on Grid’5000: Globus, KOALA, OAR

Page 21: The KOALA Grid Scheduler over DAS-3 and Grid’5000


Conclusion

• KOALA is a grid resource management system
• It supports processor and data co-allocation
• It offers several job placement policies (WF, CF, CM, FCM)

Future work

• Use bandwidth and latency in job placements (lightpaths?)
• Deal with more application types (PSAs, …)
• A decentralized P2P KOALA

Page 22: The KOALA Grid Scheduler over DAS-3 and Grid’5000


Information

• Publications: see the PDS publication database at www.pds.ewi.tudelft.nl
• Web site: KOALA at www.st.ewi.tudelft.nl/koala

Page 23: The KOALA Grid Scheduler over DAS-3 and Grid’5000


Slowdown due to wide-area communications

• Co-allocated applications are less efficient due to the relatively slow wide-area communications
• Extension factor of a job = (service time on the multicluster) / (service time on a single cluster), usually > 1
• Co-allocation is beneficial when the extension factor is at most 1.20
• Unlimited co-allocation is no good
• Communication libraries may be optimized for wide-area communication
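As a worked example with made-up service times (not measurements from the presentation): a job that takes 100 s on a single cluster and 115 s when co-allocated over two clusters has an extension factor of 115/100 = 1.15 ≤ 1.20, so co-allocating it is still worthwhile by the rule of thumb above; at 130 s (extension factor 1.30) it would not be.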

See, e.g.: A.I.D. Bucur and D.H.J. Epema, “Trace-Based Simulations of Processor Co-Allocation Policies in Multiclusters,” HPDC 2003.