
Upload: cory-hood

Post on 28-Dec-2015


Page 1: Resource Management and Accounting Working Group Working Group Scope and Components Progress made Current issues being worked Next steps Discussions involving

Resource Management and Accounting Working Group

• Working Group Scope and Components

• Progress made

• Current issues being worked

• Next steps

• Discussions involving larger group


Working Group Scope

The Resource Management Working Group encompasses the areas of resource management, scheduling and accounting.

This working group will focus on the following software components:

• Job Manager (/Queue Manager)

• Scheduler

• Allocation Manager (and accounting)

• Meta Scheduler


Proposed Component Architecture

[Figure: proposed component architecture. Components: Job/Queue Manager, Allocation Manager, Collector, Meta Scheduler, Scheduler, Node Manager, Process Manager, Security System, Information Service, Discovery Service.]

[Color key, by working group: Resource Management and Accounting; Execution Management and Monitoring; Node Config and Infrastructure.]


Proposed Component Architecture

[Figure: component diagram. Boxes: Scheduler, PBS server, PBS Mom, Queue Manager, Process Manager, Collector, Node Monitor, Job Manager. Groupings: Job Management, Node Management. Connector labels: 1, 2, 3, 4, a, b.]


Component Interaction Diagram

Job submitted to Queue Manager

[Figure: interaction diagram. Components: User Interface, Node Manager, Meta Scheduler, Job Manager, Allocation Manager, Scheduler, Process Manager. Arrows are numbered 1 through 11.]


Component Interaction Trace

Job submitted to Queue Manager

1. A user submits a job to the Queue Manager
2. The Queue Manager does a sanity balance check with the Bank
3. The Queue Manager notifies the Scheduler that a new job has arrived
4. The Scheduler queries node and job status until job can run
5. A bank reservation is made with the Allocation Manager
6. The Scheduler requests the Queue Manager to run the job
7. The Queue Manager passes job control to the Process Manager
8. The Process Manager notifies Queue Manager of job completion
9. The Queue Manager notifies Scheduler of job completion
10. A bank withdrawal is made with the Allocation Manager
11. The user is notified of job completion
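The eleven steps of the trace can be walked through in a few lines of Python. This is only an illustrative sketch: the `Bank` class, the `run_trace` function, and the flat job cost are assumptions made for the example, not part of any working group interface.

```python
from dataclasses import dataclass


@dataclass
class Bank:
    """Hypothetical stand-in for the Allocation Manager ("the Bank")."""
    balance: float
    reserved: float = 0.0

    def has_funds(self, amount):
        return self.balance - self.reserved >= amount

    def reserve(self, amount):
        if not self.has_funds(amount):
            raise ValueError("insufficient funds for reservation")
        self.reserved += amount

    def withdraw(self, amount):
        # Release the hold and debit the account.
        self.reserved -= amount
        self.balance -= amount


def run_trace(bank, cost):
    """Walk steps 1-11 in order and return a log of the messages."""
    log = ["1 user submits job to queue manager"]
    if not bank.has_funds(cost):
        # The sanity check in step 2 rejects the job before it is scheduled.
        log.append("2 queue manager balance check with bank: rejected")
        return log
    log.append("2 queue manager balance check with bank: ok")
    log.append("3 queue manager notifies scheduler of new job")
    log.append("4 scheduler queries node and job status until job can run")
    bank.reserve(cost)
    log.append("5 bank reservation made with allocation manager")
    log.append("6 scheduler requests queue manager to run job")
    log.append("7 queue manager passes job control to process manager")
    log.append("8 process manager notifies queue manager of completion")
    log.append("9 queue manager notifies scheduler of completion")
    bank.withdraw(cost)
    log.append("10 bank withdrawal made with allocation manager")
    log.append("11 user notified of job completion")
    return log
```

Calling `run_trace(Bank(balance=100.0), 30.0)` returns all eleven log lines and leaves the balance at 70.0, while a bank with too little balance stops at step 2.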


General Progress

• Creation of XML marshaller/unmarshaller

• Establishment of CVS repository

• Prototype demonstration: Scheduler makes a deposit to allocation manager using XML interface
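As a rough illustration of what marshalling a deposit request could look like, the sketch below uses Python's standard `xml.etree.ElementTree`. The element and attribute names (`Request`, `action`, `Account`, `Amount`) are hypothetical and are not taken from the working group's schema.

```python
import xml.etree.ElementTree as ET


def marshal(action, fields):
    """Serialize a flat request into an XML string (element names are assumed)."""
    root = ET.Element("Request", {"action": action})
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def unmarshal(xml_text):
    """Parse the XML string back into (action, fields); values come back as strings."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: child.text for child in root}
    return root.get("action"), fields


# Scheduler side: build the deposit request.
wire = marshal("Deposit", {"Account": "chemistry", "Amount": 5000})

# Allocation manager side: recover the action and its fields.
action, fields = unmarshal(wire)
```

Note that the round trip is not type-preserving: `Amount` goes in as an integer and comes back as the string `"5000"`, which is one reason a schema is useful on the receiving side.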


Scheduler Progress

• Creation of SSS Resource Manager interface (RMType SSS – half-open sockets)

• Creation of SSS Allocation Manager interface

• Creation of allocation manager and resource manager objects for management of arbitrary attributes

• Integration of XML marshaller/unmarshaller

• Maui enhancements to link with C++ libs (Xerces)

• Additional regression tests


Meta Scheduler Progress

• Added support for data-staging interface

• Added support for network proximity optimization

• Initial support for checkpoint/restart

– Checkpoint-aware statistics

– Checkpoint-aware preemption optimizations

• Sqsub client created allowing PBS-style jobs to be submitted and metascheduled

• Initial work on translation library (PBS->silver & silver->RS2)

• Stability enhancements


Job Manager Progress

• Initial job manager specification defined

• Interacted with process manager working group and drafted specification proposals for task manager and node manager and how they will interact with RMWG components

• Initial study of PBS to assess the viability of dissecting it and of enhancing its functionality


Allocation Manager Progress

• Draft requirements document underway

• XML schema version 0.3 reworked to have explicit request & response elements

• From-scratch allocation manager being used as prototype to test XML interface

• Implemented create, query, modify and delete for user, account and membership objects (interacting with database over JDBC)


Allocation Manager Progress (contd)

• Stubbed in dummy withdrawal and successfully demo’d XML interface with scheduler (validating against schema)

• Logging, config files, error handling

• General purpose dcecp-like client allows output formatting by utilizing metadata from queries
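One way a client might format output purely from query metadata is sketched below. The metadata layout (a list of column descriptions with `name` and `align` keys) is an assumption made for illustration; the slides do not describe the actual metadata the client uses.

```python
def format_table(metadata, rows):
    """Render rows as aligned text columns, driven only by column metadata.

    metadata: list of dicts with hypothetical keys "name" and "align".
    rows: list of dicts keyed by column name.
    """
    # Each column is as wide as its header or its longest cell.
    widths = []
    for col in metadata:
        cells = [str(row.get(col["name"], "")) for row in rows]
        widths.append(max([len(col["name"])] + [len(c) for c in cells]))

    def pad(text, width, align):
        return text.rjust(width) if align == "right" else text.ljust(width)

    lines = ["  ".join(pad(c["name"], w, "left")
                       for c, w in zip(metadata, widths))]
    for row in rows:
        lines.append("  ".join(
            pad(str(row.get(c["name"], "")), w, c.get("align", "left"))
            for c, w in zip(metadata, widths)))
    # Trailing padding carries no information, so strip it.
    return "\n".join(line.rstrip() for line in lines)


metadata = [{"name": "Name", "align": "left"},
            {"name": "Balance", "align": "right"}]
rows = [{"Name": "chem", "Balance": 1200},
        {"Name": "physics", "Balance": 90}]
table = format_table(metadata, rows)
```

Because the formatter knows nothing about users, accounts, or memberships, the same function can serve any object type whose query returns column metadata alongside the rows.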


Current Issues

• Job Manager/Queue Manager as separate or unified components

• How to split up PBS (if at all) and at what levels (if any) to refit with XML interface

• Working with Software Engineering Working Group to decide on test framework


Next Work

• All components under CVS

• Establish initial resource management interface specifications for release

• Scheduler demos by next face-to-face:

– Scheduler to process manager (over XML)

– Scheduler to node manager (over XML)

– Scheduler to job manager (over XML)

– Drive an end-to-end checkpoint request

– Scheduler talks to registry and discovery service


Next Work (contd)

• Job manager/queue manager milestones

– Submission client submits job to queue manager and queue manager reports status to user client

– Scheduler implements query to obtain job info from queue manager

– Scheduler starts a job (requires implementation of task manager interface) and can also cancel a job

– No prolog or epilog initially. Batch only. Simple single-step jobs. Supports polling mode only. No data-staging.
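The milestones above (submit, report status, query, start, cancel, polling only) can be pictured with a toy queue manager. Every class, method, and state name here is hypothetical; a real implementation would hand started jobs to the task manager rather than just flipping a state.

```python
import itertools


class QueueManager:
    """Toy queue manager: batch, single-step jobs, polling only."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def submit(self, command):
        """Submission client entry point; returns the new job id."""
        job_id = next(self._ids)
        self._jobs[job_id] = {"command": command, "state": "queued"}
        return job_id

    def status(self, job_id):
        """What the user client asks for."""
        return self._jobs[job_id]["state"]

    def query_jobs(self, state=None):
        """What the scheduler polls: job ids, optionally filtered by state."""
        return [j for j, info in self._jobs.items()
                if state is None or info["state"] == state]

    def start(self, job_id):
        if self._jobs[job_id]["state"] != "queued":
            raise ValueError("only queued jobs can be started")
        self._jobs[job_id]["state"] = "running"

    def cancel(self, job_id):
        if self._jobs[job_id]["state"] in ("completed", "cancelled"):
            raise ValueError("job already finished")
        self._jobs[job_id]["state"] = "cancelled"


def poll_once(qm):
    """One scheduler polling cycle: start every job currently queued."""
    started = qm.query_jobs(state="queued")
    for job_id in started:
        qm.start(job_id)
    return started
```

In polling mode the queue manager never calls the scheduler; the scheduler would simply run `poll_once` on a timer and pick up whatever has been queued since the last cycle.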


Next Work (contd)

• Allocation manager

– Completion of XML schema for remaining objects/services

– Review of requirements (SDSC, NCSA …)

– Complete (1st draft of) initial requirements

– Implement machine class, allocations, reservations, withdrawals, transaction register, simple charging algorithm
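To make the planned pieces concrete, here is a minimal sketch of an account with an allocation, reservations, withdrawals, and a transaction register. The charging rule (processors × wall-clock seconds × a per-machine-class rate) and all names are assumptions for illustration; the slides do not specify the actual algorithm.

```python
from dataclasses import dataclass, field

# Hypothetical charge rates per machine class (credits per processor-second).
RATES = {"cluster": 1, "smp": 2}


def charge(machine_class, processors, wall_seconds):
    """Assumed charging rule: processor-seconds scaled by the machine class rate."""
    return processors * wall_seconds * RATES[machine_class]


@dataclass
class Account:
    allocation: int                                    # credits still owned
    reservations: dict = field(default_factory=dict)   # job_id -> held credits
    register: list = field(default_factory=list)       # (kind, job_id, amount)

    def available(self):
        """Credits not currently held by a reservation."""
        return self.allocation - sum(self.reservations.values())

    def reserve(self, job_id, amount):
        if amount > self.available():
            raise ValueError("insufficient allocation")
        self.reservations[job_id] = amount
        self.register.append(("reserve", job_id, amount))

    def withdraw(self, job_id, amount):
        """Drop the job's hold and debit what was actually used."""
        self.reservations.pop(job_id, None)
        self.allocation -= amount
        self.register.append(("withdraw", job_id, amount))


acct = Account(allocation=10000)
# Reserve for the requested wall-clock limit, then charge for actual use.
acct.reserve("job1", charge("cluster", 4, 600))
acct.withdraw("job1", charge("cluster", 4, 450))
```

Reserving against the requested limit and withdrawing the actual usage keeps concurrent jobs from overdrawing the account, and the register records both events for later accounting.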


Issues requiring inter-group coordination

• Need to solidify SSS-wide standards for packaging, revision control, documentation, problem tracking, online project schedule… and establish mechanisms and places to host them.