Technical Status of the Project by Bob Jones

Bob Jones –May 16, 2022- n° 1 Technical Status of the Project Bob Jones

Page 1: Technical status of the project by Bob Jones

Bob Jones –April 12, 2023- n° 1

Technical Status of the Project

Bob Jones

Page 2: Technical status of the project by Bob Jones

Overview

Testbed status

Application status

Project retreat: issues and actions for software process, current (EDG 1.2) and future releases

Tutorials

Summary

Page 3: Technical status of the project by Bob Jones

Testbed Status

Application testbed:

Running EDG 1.2 on 5 core sites (and a couple of others)

Since first week of August (several months later than initially planned)

Users guide and release notes available (installation guide will come later)

Being used for application tests

Current “show-stopper” issues found:

Long job status problem (can’t retrieve output)

Long file transfers problem (20 mins limit)

Development testbed:

Testing urgent updates to EDG 1.2

More recent beta release of Globus 2

Various fixes for data management chain

Page 4: Technical status of the project by Bob Jones

Application Status

WP8: High Energy Physics

LHC experiments doing tests now

ATLAS task force

WP9: Earth Observation

Installation of EDG 1.2 at ESA done

Testing to start in September

WP10: Biology

Initial tests made with EDG 1.2

Overall comments:

General confusion about how best to use data mgmt tools

Software not yet stable enough and insufficient diagnostics information available

Too difficult to configure

Concern that EDG 1.2 in its current configuration will not scale easily to ~40 sites

Page 5: Technical status of the project by Bob Jones

ATLAS Task Force

Task force with ATLAS & EDG people (led by Oxana Smirnova)

http://cern.ch/smirnova/atlas-edg

ATLAS is eager to use Grid tools for the Data Challenges

ATLAS Data Challenges are already on the Grid (NorduGrid, iVDGL)

The DC1/phase2 (to start in October) is expected to be done mostly using the Grid tools

By September 16 (ATLAS SW week) evaluate the usability of EDG for the DC tasks

The task: to process 5 input partitions of the Dataset 2000 at the EDG Testbed + one non-EDG site (Karlsruhe)

Intensive activity has meant they could process some partitions, but problems with long-running jobs are still an issue

Data Management chain is proving difficult to use and sometimes unreliable

Need to clarify policy for distribution/installation of applications software

Ongoing activity on a very short timescale: highest-priority task

Page 6: Technical status of the project by Bob Jones

Project Retreat

Project retreat held last week (27 & 28 August) at Chavannes

~45 participants: work package managers, architecture group, quality group, applications groups, mware experts, representatives from LCG, DataTAG, Globus & Condor

Agenda and material on the web: http://documents.cern.ch/age?a021130

Photos by Jeff Templon: http://www.nikhef.nl/~templon/chavannes/index.html

3 sessions addressing the most important aspects of the project’s current work:

Software Release Process

Release 1.2

Testbed 2

Page 7: Technical status of the project by Bob Jones

Software Process

Over-simplification of the current situation:

1. Mware groups develop software in isolation

2. ITeam assembles it as best it can

3. Site managers are asked to install it

4. Application groups are asked to test it

Problems:

No place for the mware groups to integrate software before delivering it to the ITeam

Inadequate software testing – leads to installation/configuration/execution faults

We are running blind – no way to control or reliably plan software delivery

Page 8: Technical status of the project by Bob Jones

Software Process: Autobuild

A release manager will be nominated with overall responsibility for ensuring the procedure is followed

Make autobuild tools the basis of the daily work of the mware groups and ITeam

Nightly build from CVS repository for all software

Problems must be fixed ASAP – checked by Quality Group reps

Mware groups give ITeam CVS tags instead of RPMs

Tagged software must be documented

Mware group must perform and supply unit tests

Integrated with nightly build

Tagged software that fails integration or testing, or is inadequately documented, will be rejected

Mware group is responsible for fixing it
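
The acceptance rule on this slide can be sketched as a small gate over a tagged release. This is a minimal illustration only, not the project's autobuild tooling: the `TaggedRelease` fields, the function names and the example package and tag names are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaggedRelease:
    """A CVS tag handed to the ITeam (all fields are hypothetical)."""
    package: str
    tag: str
    builds: bool           # nightly build and integration succeeded
    unit_tests_pass: bool  # supplied unit tests ran and passed
    documented: bool       # documentation accompanies the tag


def rejection_reasons(release: TaggedRelease) -> List[str]:
    """Return every reason the tag would be rejected (empty list means accept)."""
    reasons = []
    if not release.builds:
        reasons.append("fails integration (nightly build)")
    if not release.unit_tests_pass:
        reasons.append("fails testing (unit tests)")
    if not release.documented:
        reasons.append("inadequately documented")
    return reasons


def review(release: TaggedRelease) -> str:
    """Produce a one-line verdict to send back to the mware group."""
    reasons = rejection_reasons(release)
    if reasons:
        return f"REJECTED {release.package} {release.tag}: " + "; ".join(reasons)
    return f"ACCEPTED {release.package} {release.tag}"


if __name__ == "__main__":
    # Hypothetical package and tag names, for illustration only.
    print(review(TaggedRelease("wp2-replica", "v1_2_1", True, False, False)))
```

Collecting all reasons, rather than stopping at the first, gives the responsible mware group the full list of what to fix in one pass.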

Page 9: Technical status of the project by Bob Jones

Software Process: Quality Group

Recently formed Quality Group, convened by Gabriel Zaquine, is responsible for ensuring quality issues are addressed within the WPs

Ensure unit test plans are complete and followed

Follow up on problems reported in Bugzilla & nightly builds

Organise running of code checking tools on all EDG software

Agree on adopted project developer-guidelines etc.

http://eu-datagrid.web.cern.ch/eu-datagrid/QAG/default.htm

Page 10: Technical status of the project by Bob Jones

Software Process: Testing

Strengthen the Testing Group

Identify leader and a small number of full-time testers

Assemble and maintain test suite integrated with autobuild tools

Automate installation and configuration of software releases

To permit auto testing need to be able to auto install & configure a release on a pre-defined small example site

Needs improvements by mware WPs to simplify and complete installation & configuration of their sw

Site managers have good overview about how to do this

Need to clarify the work involved during this conference

Set-up certificate testbed

Used for testing activities

Involves several sites
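
The dependency described on this slide (automatic tests cannot run until a release can be installed and configured automatically) can be sketched as an ordered pipeline that stops at the first failing stage. The stage names and the `run_pipeline` helper are assumptions made for illustration; the real build, install and configure steps are not shown in the slides.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a name plus an action that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Optional[str]:
    """Run stages in order; return the first failing stage name, or None."""
    for name, action in stages:
        if not action():
            return name  # later stages depend on this one, so stop here
    return None


if __name__ == "__main__":
    # Placeholder actions; a real setup would call the site's own tools.
    stages = [
        ("build", lambda: True),
        ("install", lambda: True),
        ("configure", lambda: False),  # simulated configuration failure
        ("test", lambda: True),        # never reached in this example
    ]
    failed = run_pipeline(stages)
    print("all stages passed" if failed is None else f"stopped at: {failed}")
```

Stopping at the first failure keeps the report unambiguous: a test failure is only meaningful once installation and configuration are known to have succeeded.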

Page 11: Technical status of the project by Bob Jones

Technical Management

Architecture group documenting testbed 2 architecture draft: http://doc.cern.ch/archive/electronic/other/agenda/a021130/a021130s4t1/TB2Arch_v0_1.doc

Meets once a month (next meeting tomorrow)

Project Tech. Board addresses deliverables and relationships with other projects

Meets once per quarter (next meeting 2nd October @ CERN)

http://documents.cern.ch/AGE/current/displayLevel.php?fid=3l131

Need more frequent technical management forum

Authority to make technical & architectural decisions affecting sw development in WPs

Include WP managers, chaired by the Technical Coordinator

Can call on mware experts according to needs of the themed agenda

Meets frequently to ensure issues are addressed rapidly

Associate with WP managers weekly meeting

Relationship with Architecture Group and Project Tech. Board needs to be clarified

Page 12: Technical status of the project by Bob Jones

Testbed Support

Strengthen user support group

Ensure people involved have sufficient knowledge of the software

Emphasis on the accuracy and usefulness of the responses provided

Tools used for support are a secondary issue

Federate with equivalent groups from other projects

Provides support on the application testbed

Clarify & document procedures

Creating a new CA (CA group)

Need to reduce time involved (currently 3 months)

Site Installation (site managers & ITeam)

Steps for system manager and requirements for a site to join the testbed

Creating & Managing a Virtual Organisation (site managers & ITeam)

Steps involved and tasks of a VO manager

Page 13: Technical status of the project by Bob Jones

Release Development

[Diagram: release development flow. Labels: CVS, Autobuild, nightly build, incremental improvements, continuous support, patches, 1.2 branch, 2.0 branch, 2.x]

Milestones shown: current “show-stoppers” fixed; incremental improvements from mware WPs; application tests satisfied; migrate sites; etc.

Changes foreseen for 1.3 & 1.4 become “incremental improvements”

Page 14: Technical status of the project by Bob Jones

Incremental Steps from EDG 1.2

1. Fix “show-stoppers” for application groups – mware WPs (continuous)

2. Build EDG 1.2.x with autobuild tools - ITeam

3. Integrate testing framework and limited automatic tests with autobuild tools - testing group

4. Automatic installation & configuration procedure for pre-defined site (can’t auto test without it)

5. Start autobuild server for RH 7.2 and attempt build of release 1.2 – Yannick Patois

6. New LCFG - WP4

7. GridFTP server access to MSS - WP5

8. Giggle & Reptor – WP2

9. LCAS with dynamic plug-in modules – WP4

10. NetworkCost Function – WP7

11. Integrate mapcentre (nordugrid?) and R-GMA – WP3

12. GLUE modified info providers/consumers – WP1,4,5

13. Res. Broker – WP1

14. LCFG for RH 7.2 – WP4

15. Integration with Condor as batch system – WP4

What do we do about: Space mgmt, VOMS, SlashGrid?

End Sept 2002

Expect this list to be discussed/updated this week

Page 15: Technical status of the project by Bob Jones

EDG Tutorial

DAY1

Tutorial introduction

Introduction to Grid computing and overview of the DataGrid project

Security

Testbed overview

Job Submission

lunch

hands-on exercises: job submission

DAY2

Data Management

LCFG, fabric mgmt & sw distribution & installation

Applications and Use cases

Future Directions

lunch

hands-on exercises: data mgmt

The tutorials are aimed at users wishing to "gridify" their applications using EDG software and are organized over 2 full consecutive days

http://hep-proj-grid-tutorials.web.cern.ch/hep-proj-grid-tutorials/dry.asp

user: griduser passwd: tutorials123

Page 16: Technical status of the project by Bob Jones

Tutorial rehearsal

Rehearsal at CERN, 29 & 30 August

19 participants (members of project or closely related) to check material & approach

Lessons learnt:

Can’t cover as much material as we hoped (goes too fast)

Explain why, not just how

Avoid details – can read them from references afterwards

Need as many helpers as possible for hands-on exercises

Participants have difficulties with certificate management

All participants must have a certificate ready for them and be in the same VO

Generated a lot of enthusiasm in the participants and EDG people doing the hands-on

Found genuine bugs during hands-on exercises

Recommend mware WPs send developers to help with hands-on exercises

New project people should follow the tutorial

Thanks to:

Mario Reale, Elisabetta Ronchieri, Akos Frohner, Erwin Laure, Peter Kunszt, Antony Wilson, Steve Fisher, Maite Barroso Lopez, Owen Synge, Emanuele Leonardi, Steve Traylen, Frank Bonnassieux, Christophe Jacquet, Sophie Nicoud, Karin Burghauser & CERN training people

Page 17: Technical status of the project by Bob Jones

Tutorial Schedule

CERN School of Computing, Naples, 23-27 September

80 participants. Hands-on exercises only (presentations by Carl Kesselman & Ian Foster)

ALL EDG people attending should do exercises first and help others at the school

CERN, October 3 & 4

NeSC, Edinburgh, December

Dates still moving. Maximum 30 participants (more for the presentations)

We could accommodate more sites in December, January etc.

Sites must provide support and handle logistics

Organisers/helpers must attend tutorial at another site first

The tutorial does represent some load on the testbed (own VO & cert. creation)

For the future:

Hands-on exercises are a test suite - automate and run with the nightly checks

The material must be kept up to date with each public release of the software

We need to nominate people for the different chapters of the tutorial, responsible for ensuring the slides and exercises are kept up to date
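
The idea of treating the hands-on exercises as a test suite can be sketched as a runner that executes each exercise as a command and records whether it succeeded. This is an illustration under stated assumptions: the `run_exercises` helper and the exercise names are hypothetical, and the commands below are stand-ins rather than real EDG job-submission or data-management commands.

```python
import subprocess
import sys
from typing import Dict, List, Tuple


def run_exercises(
    exercises: List[Tuple[str, List[str]]], timeout_s: float = 60.0
) -> Dict[str, bool]:
    """Run each exercise command; map exercise name to pass/fail."""
    results: Dict[str, bool] = {}
    for name, command in exercises:
        try:
            completed = subprocess.run(
                command, capture_output=True, timeout=timeout_s
            )
            results[name] = completed.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # A missing command or a hung exercise counts as a failure.
            results[name] = False
    return results


if __name__ == "__main__":
    # Stand-in commands; a nightly check would list the real exercise steps.
    exercises = [
        ("job-submission", [sys.executable, "-c", "pass"]),
        ("data-mgmt", [sys.executable, "-c", "raise SystemExit(1)"]),
    ]
    for name, passed in run_exercises(exercises).items():
        print(f"{name}: {'ok' if passed else 'FAILED'}")
```

Treating a timeout as a failure matters here, since the show-stoppers reported on the testbed include long-running jobs that never return output.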

Page 18: Technical status of the project by Bob Jones

Summary

Addressing the serious bugs found by the application groups on the testbed is the task with the highest priority

Testing activities need more resources

Test-bed support is becoming a more important task

Future releases must continue to address the needs of the application groups

We need to clarify the following points during this conference:

Autobuild status & how to automate installation & configuration

Contents of the test-suites

Release plans until the next EU review

In short: What we are doing is right, we are just going about it in a sloppy manner

Need to go one step at a time and ensure each step works