OSG Area Coordinator’s Report: Workload Management, Maxim Potekhin, BNL, 631-344-3621, May 8th, 2008

OSG Area Coordinator’s Report:

Workload Management

Maxim Potekhin, BNL

[email protected]

May 8th, 2008

Overview: Workload Management

• Accomplishments Since Last Report

Code changes to the glexec-enabled Panda Pilot committed to SVN and tested at both the BNL and Fermilab sites

Understood the environment set-up issues that arise when using glexec, covering both the OS environment variables and the dynamic change of the working directory

Code enhancements made to the Panda Pilot to accommodate the specifics of MPI clusters, with tests done at Purdue and NERSC

Finalized the Panda Pilot Factory

EGEE interoperability: held consultations and met in person with EGEE/WLCG personnel regarding the status of LCMAPS/LCAS deployment, which is a prerequisite for using glexec-enabled pilot jobs in setuid mode (note: in OSG we use the GUMS plugin, whose setup had already been largely understood). Developed a testing plan for EGEE sites, pending further development of the LCMAPS network API.
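The glexec environment issue noted above can be illustrated with a minimal sketch. It assumes that the identity switch starts the payload with a scrubbed environment in a different directory, so the pilot writes a small wrapper script that re-exports the variables the job needs and changes into the job's working directory before running the payload. The helper `build_payload_wrapper`, the variable `PANDA_SITE` and the payload `runjob.py` are hypothetical; this is not the actual Panda Pilot code, and how the wrapper is handed to glexec is deliberately left out.

```python
from __future__ import annotations

import re
import shlex

# POSIX shell variable names: letters, digits and underscores, not starting with a digit.
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_payload_wrapper(env: dict[str, str], workdir: str, command: list[str]) -> str:
    """Return a /bin/sh script that restores selected environment variables
    and the working directory, then runs the payload command.

    Hypothetical helper: assumes the identity switch starts the payload with a
    scrubbed environment in a different directory.
    """
    lines = ["#!/bin/sh", "set -e"]
    # Re-export each variable the payload needs; sorted for reproducible output.
    for name in sorted(env):
        if not _VALID_NAME.fullmatch(name):
            raise ValueError(f"invalid environment variable name: {name!r}")
        lines.append(f"export {name}={shlex.quote(env[name])}")
    # Change into the job's working directory, which may differ from the
    # directory the switched-identity process starts in.
    lines.append(f"cd {shlex.quote(workdir)}")
    # exec replaces the shell, so the payload's exit code is passed straight back.
    lines.append("exec " + shlex.join(command))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(build_payload_wrapper({"PANDA_SITE": "BNL_TEST"}, "/tmp/job 1",
                                ["python", "runjob.py", "--id", "42"]), end="")
```

Run directly, the example prints a five-line script ending in `cd '/tmp/job 1'` and `exec python runjob.py --id 42`. Quoting every value with `shlex.quote` keeps paths with spaces or shell metacharacters from breaking the wrapper.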

Overview: Workload Management

• Current Initiatives:

User support

With the Panda Pilot code adapted to MPI running mode and initial testing done at Purdue and NERSC, we will shortly contact the CHARMM team to coordinate pre-production validation

Security

Continuing to work out configuration and other issues related to glexec integration; will work to expand testing to WLCG/EGEE sites

Increasing robustness of the Panda job aggregation and submission service

Working on Panda Server code review, refactoring, versioning and improvements to installation and configuration procedures

With a new set of hardware available at RACF as a dedicated test platform, we are working towards a comprehensive stress test of the Panda service (work in progress)

Overview: Workload Management

• Issues / Concerns

Current priorities in the OSG Workload Management effort continue to be scalability and security

EGEE interoperability with respect to glexec remains an open question; we need to keep up the effort and cooperation with EGEE

While Panda already features a comprehensive set of monitoring tools, we need to move towards a more user-friendly, efficient Panda interface for VOs and individual users, to increase OSG’s ability to engage new organizations and researchers

WMS in WBS

| WBS | Task Information | In Charge | Finish Date | Comment |
|---|---|---|---|---|
| 4.1.2.1 | Deliver phase 1 improvements into OSG 1.0 | Wenaus | 12/07/07 | |
| 4.1.9 | Support security effort in facility (including GUMS) | Wenaus, Potekhin | 09/30/08 | Well under way |
| 4.2.1 | Support OSG VOs in building, deploying and operating Workload Management Systems (WMS) that are based on just-in-time job scheduling, and the integration of tools used by these WMS into the VDT | Potekhin | 09/30/08 | MPI integration work under way |
| 4.2.1.1 | Deliver phase 1 into OSG 1.0 | Potekhin, Chiu | 03/17/08? | Testing in production to commence |
| 4.2.1.2 | Deliver phase 2 into OSG 1.2 | Potekhin, Caballero | 06/07/08 | In progress |
| 4.2.2 | Manage the compute and storage resources allocated to the OSG-ET by OSG sites and/or external resource providers | Potekhin | 09/30/08 | Planning phase |
| 4.2.3 | Operate and support the hardware on which the WMS service for the OSG VO is instantiated | Ernst | 09/30/08 | In progress |
| 4.2.4 | Operate and support the WMS service for the OSG VO | Potekhin, Caballero | 09/30/08 | In progress |
| 4.3.3.1 | Job submission, execution and management | Green | 09/30/08 | |
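To illustrate the just-in-time (late-binding) job scheduling model referred to in task 4.2.1, here is a minimal single-process sketch: jobs wait in a central queue and are bound to a resource only when a pilot already running on a worker node asks for work that fits its slot. The `JobServer`, `Job` and `run_pilot` names and the core-count matching rule are hypothetical; this is a generic illustration of the pilot pull model, not the Panda server's actual dispatch protocol or API.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    cores: int  # cores the job needs (hypothetical matching criterion)


@dataclass
class JobServer:
    """Hypothetical central queue standing in for a pilot-based WMS server."""
    pending: deque[Job] = field(default_factory=deque)

    def get_job(self, free_cores: int) -> Job | None:
        # Late binding: a job is chosen only when a live pilot reports its slot,
        # taking the first pending job that fits that slot.
        for i, job in enumerate(self.pending):
            if job.cores <= free_cores:
                del self.pending[i]
                return job
        return None


def run_pilot(server: JobServer, slot_cores: int) -> list[int]:
    """Pull jobs until none fits the slot; return the IDs of the jobs taken."""
    ran: list[int] = []
    while (job := server.get_job(slot_cores)) is not None:
        ran.append(job.job_id)  # a real pilot would execute the payload here
    return ran


if __name__ == "__main__":
    server = JobServer(deque([Job(1, 8), Job(2, 1), Job(3, 4), Job(4, 1)]))
    print(run_pilot(server, 4))  # a 4-core pilot takes jobs 2, 3 and 4
    print(run_pilot(server, 8))  # an 8-core pilot later takes job 1
```

The design point is that no job is committed to a site in advance: a job that does not fit one pilot's slot simply stays queued until a suitable pilot asks for work.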