EU 2nd Year Review – 04-05 Feb. 2003 – Title – n° 1 WP8: Progress and testbed evaluation F Harris (Oxford/CERN) (WP8 coordinator) f.harris@cern.ch


Page 1: EU 2nd Year Review – 04-05 Feb. 2003 – Title – n° 1 WP8: Progress and testbed evaluation F Harris (Oxford/CERN) (WP8 coordinator ) f.harris@cern.ch

EU 2nd Year Review – 04-05 Feb. 2003 – Title – n° 1

WP8: Progress and testbed evaluation

F Harris (Oxford/CERN)

(WP8 coordinator)

f.harris@cern.ch


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 2

Outline of the presentation

Overview of the objectives for the 2nd project year, and the corresponding achievements

Activities of funded and unfunded effort

Ongoing work on use cases

Data Challenge work with Atlas and CMS

Comments on the key points of work in the other 4 WP8 experiments

The organisation for D 8.3 ‘Testbed assessment for HEP applications’

The planning for the 3rd project year, and some associated issues

QUESTIONS


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 3

The objectives for 2nd project year, and the corresponding achievements

OBJECTIVES

Use and exploitation of Testbed1

Validation of releases + feedback

Participation in the Architecture group (ATF), and the elaboration of use cases

ACHIEVEMENTS

Babar and D0 have joined the 4 LHC experiments, and NA48 will soon join. 5 experiments have used the applications testbed. All WP8 experiments have continued to develop their distributed computing infrastructure in Europe and USA

Both EIPs and the experiments have given continual feedback to middleware from both generic and experiment specific evaluations

The ATF is very active and runs regular ‘scenario playing’ reviews. Use case documents have been produced and will develop further in the context of EDG/LCG


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 4

Overview of objectives for 2nd project year, and the corresponding achievements

OBJECTIVES

Design of a common middleware layer for WP8 experiments

Use of EDG middleware in experiment Data Challenges (DCs)

Development of tutorials and documentation for the user community

ACHIEVEMENTS

Design of the common middleware layer has moved into the LHC Computing Grid (LCG) project

Atlas and then CMS experiments have achieved significant pioneering work in the use of EDG middleware for DCs, and in producing detailed evaluations

WP8 has played a substantial role in course design, implementation and delivery


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 5

Activities of funded and unfunded effort

WP8 used 51 funded man-months instead of the projected 43.5 (January - November)

Complemented by 350 unfunded man-months from the experiments, which have largely concentrated on experiment-specific activities

The EIPs (Experiment Independent Persons) have been involved in:

Functionality* and stress testing

Middleware debugging campaigns*

Configuration and testing of Storage Elements and Virtual Organisations*

Data Challenges of the ATLAS and CMS experiments

Organisation of WP8 Integration Team* and Architectural Task Force

* Activities unforeseen in original mandate


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 6

Ongoing work on use cases

‘Common Use Cases for A HEP Common Application Layer’ (HEPCAL)

(Document produced for LCG; chaired and largely manned by WP8 people,

and only possible thanks to WP8 experience)

General (authorisation, login, browse resources): 4 use cases

Data Management (metadata and data operations): 19 use cases

Job Management (submission, control, monitoring, errors, resource estimation, job splitting…): 16 use cases

VO Management (resource reservation, user rights, software publishing…): 4 use cases

EDG 1.4.3 satisfies use cases for a basic system (authorisation/authentication, data handling, job submission)

EDG 2.0 will satisfy more advanced data handling (e.g. metadata) and HEP data transformation

There are other areas for discussion, e.g. virtual data, experiment s/w publishing

This work will continue within EDG and LCG

In the ATF there is regular scenario playing of use cases to check existing and future designs


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 7

Overview of data challenge work

ATLAS (pioneers!)

Specific Goals

Compare results with those obtained without the Grid in previous months for ~100 ‘long’ detector simulation jobs

Make a prioritised list of recommendations to EDG for bug fixes and future developments in an evaluation report

Organization

Joint Atlas/EDG/LCG effort

Resources used (and functions)

Sites: CERN, RAL, Lyon, Nikhef, CNAF + (Karlsruhe)

Several UIs: Milan, CERN, Cambridge

RB: CERN (shared)

RC: originally shared with CMS; finally a separate one at CNAF

CMS

Specific Goals

Aim for as many simulated events as possible for physics analysis, with 1000’s of ‘short’ event generation and ‘long’ detector simulation jobs, using the full production system

Measure performance, efficiency and reasons for job failures, to give detailed feedback to middleware providers in a report

Organization

This was a joint effort involving CMS, EDG, EDT and LCG people

Resources used (and functions)

Sites: CERN, RAL, Lyon, Nikhef, CNAF + (Legnaro, Padova, Ecole Polytechnique, IC)

Several UIs: CNAF, Padova, Ecole Polytechnique, IC

Several RBs: CNAF (CMS), CNAF (shared), CERN (CMS), IC (CMS+Babar)

RC: originally shared with Atlas; finally a separate one at CNAF


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 8

History: relating applications work to TB versions

Version Date

1.1.2 27 Feb 2002

1.1.3 02 Apr 2002

1.1.4 04 Apr 2002

1.2.a1 11 Apr 2002

1.2.b1 31 May 2002

1.2.0 12 Aug 2002

1.2.1 04 Sep 2002

1.2.2 09 Sep 2002

1.2.3 25 Oct 2002

1.3.0 08 Nov 2002

1.3.1 19 Nov 2002

1.3.2 20 Nov 2002

1.3.3 21 Nov 2002

1.3.4 25 Nov 2002

1.4.0 06 Dec 2002

1.4.1 07 Jan 2003

1.4.2 09 Jan 2003

1.4.3 14 Jan 2003

RC Changes

Mixed Globus 2.0/2.2

RB/JSS Upgrade

Known Problems: GASS Cache Coherency; Race Conditions in Gatekeeper; Unstable MDS

Successes: Improved MDS Stability; FTP Transfers OK

Known Problems: Interactions with RC

Real Use by Applications!

Limitations: Resource Exhaustion; Size of Logical Collections

Successes: Matchmaking/Job Mgt.; Basic Data Mgt.

Known Problems: High Rate Submissions; Long FTP Transfers

ATLAS commence phase 1 tests

CMS start stress tests Nov 30, which continue till Dec 20

Problems with long jobs; Instability in MDS; Long file transfers unreliable

CMS and Atlas evaluate 1.4.3


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 9

Atlas evaluations (August and Dec/Jan) (DETAILED PAPER IN PREPARATION)

RESULTS (see Atlas jobs in DEMO tomorrow)

Atlas software was used in the EDG Grid environment

Several hundred simulation jobs of length 4-24 hours were executed, and data was replicated using grid tools

Results of simulation agreed with ‘non-Grid’ runs

OBSERVATIONS

Good interaction with EDG middleware providers and with WP6/8

With a substantial effort it was possible to perform the jobs

Showed up bugs and performance limitations (fixed or to be fixed in EDG 2.0)

WP1: Many ‘Long Jobs’ failed (now much better)

WP2: Replication Tools were difficult to use and unreliable

WP3: Information Service based on MDS gave poor performance (affected WP1)

WP4: We need to separate out application and system software installations (fixed in 1.4.3)

We need EDG 2.0 release for use in large scale data challenges

RECOMMENDATIONS (see combined ATLAS/CMS recommendations…)


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 10

CMS production components interfaced to EDG middleware (more details in DEMO)

[Diagram: a UI running IMPALA/BOSS, with RefDB and the BOSS DB; the Workload Management System (jobs described in JDL); the Replica Manager; CEs and WNs with CMS software installed; SEs. Arrow labels: parameters, data registration, job output filtering, runtime monitoring, input data location, push data or info, pull info.]

CMS production tools on UI: job creation, job submission and monitoring

CMS software installed from RPMs on CEs/WNs


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 11

Main results and observations from CMS work (detailed doc in preparation)

RESULTS

Could distribute and run CMS s/w in EDG environment

Generated ~250K events for physics with ~10,000 jobs in 3 week period

OBSERVATIONS

Were able to quickly add new sites to provide extra resources

Fast turnaround in bug fixing and installing new software

Test was labour intensive (since software was developing and the overall system was fragile)

WP1: At the start there were serious problems with long jobs (recently improved)

WP2: Replication Tools were difficult to use and not reliable, and the performance of the Replica Catalogue was unsatisfactory

WP3: The Information System based on MDS performed poorly with increasing query rate

The system is sensitive to hardware faults and site/system mis-configuration

The user tools for fault diagnosis are limited

EDG 2.0 should fix the major problems (see talks by R Jones and E Laure) providing a system suitable for full integration in distributed production


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 12

CMS event production in December 2002 using EDG software and applications TB

[Plot: number of events versus time]

http://cmsdoc.cern.ch/cms/production/www/html/general/index.html


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 13

CMS/EDG Summary of Stress Test

Preliminary Analysis

CMKIN jobs (short jobs)

Status                   EDG evaluation   CMS evaluation   EDG ver 1.4.3 (after Stress Test – Jan 03)
Finished correctly       5518             4601             604
Crashed or bad status    818              1099             65
Total number of jobs     6336             5700             669
Efficiency               0.87             0.81             0.90

CMSIM jobs (long jobs)

Status                   EDG evaluation   CMS evaluation   EDG ver 1.4.3 (after Stress Test – Jan 03)
Finished correctly       1678             2147             394
Crashed or bad status    2662             934              104
Total number of jobs     4340             3081             498
Efficiency               0.39             0.70             0.79
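The efficiency figures in the stress-test summary appear to be the number of jobs that finished correctly divided by the total number of jobs. A minimal sketch that recomputes them from the counts on the slide is given below. The dictionary layout and the function names are illustrative assumptions and are not part of any EDG or CMS tool.

```python
# Job counts (finished correctly, crashed or bad status) copied from the
# stress-test summary. The dictionary layout and names are illustrative only.
STRESS_TEST_COUNTS = {
    "CMKIN (short)": {
        "EDG evaluation": (5518, 818),
        "CMS evaluation": (4601, 1099),
        "EDG ver 1.4.3": (604, 65),
    },
    "CMSIM (long)": {
        "EDG evaluation": (1678, 2662),
        "CMS evaluation": (2147, 934),
        "EDG ver 1.4.3": (394, 104),
    },
}


def efficiency(finished, crashed):
    """Return the fraction of jobs that finished correctly."""
    total = finished + crashed
    if total == 0:
        raise ValueError("no jobs recorded")
    return finished / total


def efficiency_table(counts):
    """Map each job type and evaluation to its efficiency, rounded to 2 places."""
    return {
        job_type: {
            evaluation: round(efficiency(finished, crashed), 2)
            for evaluation, (finished, crashed) in columns.items()
        }
        for job_type, columns in counts.items()
    }


if __name__ == "__main__":
    for job_type, columns in efficiency_table(STRESS_TEST_COUNTS).items():
        for evaluation, value in columns.items():
            print(f"{job_type:14s} {evaluation:15s} {value:.2f}")
```

Keeping the raw counts rather than the rounded ratios makes it easy to re-derive the totals and to compare short and long jobs on the same footing.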


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 14

EDG reasons for failure (categories)

Preliminary analysis of pre-Xmas (1.4.0)

CMKIN (short) jobs

Status                                                   Totals
Crashed jobs                                             818

Reasons of failure for crashed jobs
No matching resource found                               509
Generic Failure: MyProxyServer not found in JDL expr.    102
Running                                                  74
Failure while executing job wrapper                      37
Other failures                                           96

CMSIM (long) jobs

Status                                                   Totals
Crashed jobs                                             2662

Reasons of failure for crashed jobs
Failure while executing job wrapper                      1476
No matching resource found                               722
Globus Failure: Globus down/Submit to globus failed      144
Running                                                  116
Globus Failure                                           90
Other failures                                           114
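To see which failure categories dominate, the counts above can be ranked and expressed as a share of all crashed jobs. The following is a small sketch under that assumption. The names `CMKIN_FAILURES`, `CMSIM_FAILURES` and `failure_shares` are hypothetical helpers introduced here for illustration, and the numbers are copied from the slide.

```python
# Failure counts per category, copied from the preliminary analysis (EDG 1.4.0).
# Variable and function names are illustrative, not taken from any EDG tool.
CMKIN_FAILURES = {
    "No matching resource found": 509,
    "Generic Failure: MyProxyServer not found in JDL expr.": 102,
    "Running": 74,
    "Failure while executing job wrapper": 37,
    "Other failures": 96,
}

CMSIM_FAILURES = {
    "Failure while executing job wrapper": 1476,
    "No matching resource found": 722,
    "Globus Failure: Globus down/Submit to globus failed": 144,
    "Running": 116,
    "Globus Failure": 90,
    "Other failures": 114,
}


def failure_shares(failures):
    """Return (reason, count, percent of crashed jobs), largest count first."""
    total = sum(failures.values())
    ranked = sorted(failures.items(), key=lambda item: item[1], reverse=True)
    return [
        (reason, count, round(100 * count / total, 1))
        for reason, count in ranked
    ]


if __name__ == "__main__":
    for label, failures in (("CMKIN", CMKIN_FAILURES), ("CMSIM", CMSIM_FAILURES)):
        print(f"{label}: {sum(failures.values())} crashed jobs")
        for reason, count, percent in failure_shares(failures):
            print(f"  {percent:5.1f}%  {count:5d}  {reason}")
```

The category totals add up to the crashed-job counts on the slide (818 and 2662), which is a useful consistency check on the transcription.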


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 15

Joint recommendations from Atlas/CMS work

There are essential developments (see EDG 2.0) needed in Data Management (robustness and functionality)

Information Systems (robustness and scalability)

Workload Management (scalability for high rates, batch submissions,output file specification)

Mass Storage Support (gridified support due in EDG 2.0)

We must maintain and strengthen joint Experiment/EDG work in the evaluation of system components AND the architecture (both will need to evolve – GRID developments are R/D)

Once middleware providers have done their ‘unit tests’ the applications must work with them in the areas of:

Performance evaluation for the user with increasing rates of job submission and data handling, and an expanding TB configuration

Streamlining procedures for feedback to middleware providers

EDG should provide site validation and monitoring procedures

EDG should provide good user tools for fault detection and diagnosis (what is the job status? why did it fail? …)


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 16

Some key points of work in the other experiments

ALICE

Developed scripts for the installation of ALICE software on EDG/CEs

Developed a web interface to automatically submit jobs to the testbed and evaluate its "efficiency" (currently in use)

Currently developing the AliEn/EDG interface (including effort from DataTAG):

Able to send jobs to EDG via AliEn

Currently completing the tests for registering/accessing data on/from both catalogues (AliEn and EDG), which is required for interoperability

LHCb

Consolidation of basic job submission capability, demonstrated at the EU review and at the opening of the National e-Science Centre, Edinburgh, 25 April

Made RPMs for the LHCb environment

Included DataGrid in the new LHCb distributed production system (DIRAC) and demonstrated that short DataGrid jobs can be submitted and managed via DIRAC


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 17

BaBar

Deployment of the BaBar VO: VO and RC at Manchester; RB at IC; CE/SE/WN at SLAC, In2p3, RAL and Ferrara

Deployment and adaptation of EDG software at SLAC (the EDG scripts had to be modified for the WN inside the Internet Free Zone)

Successfully tested BaBar analysis and simulation jobs within the EDG framework

Next step is to run full-scale analysis on the Grid

D0

A D0 replica catalogue and VO server have been set up at Nikhef

A 124-CPU farm has been successfully used with EDG s/w

D0 support was added to the official EDG release, and several sites now support D0 jobs and have installed the RPMs

Will try the newer release (and true Grid production) when RH 7.2 support appears


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 18

The key content for D 8.3 ‘Testbed assessment for HEP applications’

‘Datagrid as HEP production environment’

Detailed evaluations of Atlas and CMS Task Forces

Evaluations by other LHC experiments (Alice, LHCb)

Evaluations from non-LHC experiments (Babar, D0)

Mapping of evaluations to the ‘common use cases’

General use cases

Data management

Job Management

VO management

Summary of lessons learned for future EDG development, and statement of priorities for the experiments


EU 2nd Year Review – 04-05 Feb. 2003 – WP8 progress and testbed evaluation – n° 19

The planning for the 3rd project year, and associated issues

PLANNING

Continue work with experiments using the successful Task Force Model for Data Challenges

Complete D8.3 for end March 2003 (based on release 1.4.3)

Continue architecture work in ATF, and participate in LCG use case/architecture activities

Evaluate EDG 2.0 software, and port it to experiment software environments for use in the data challenges

Complete D8.4 by Dec 2003 (based on release 2.x)

SOME IMPORTANT ISSUES

Must organise detailed test sessions involving experiments and the providers of middleware for information systems, data management and mass storage handling in the context of moving to EDG 2.0

We look for improved diagnostic information from middleware in case of problems

WP8 will work increasingly with experiments rather than in generic testing, which will be taken up by the WP6 Testing Group

We must relate EDG/WP8 work to the use by experiments of the forthcoming LCG Prototype, in terms of software, hardware and user support

We should re-activate the inter-application WG (8+9+10)