wlcg middleware validation

14
IT-SDC : Support for Distributed Computing WLCG Middleware Validation Markus Schulz IT/SDC

Upload: jam

Post on 19-Jan-2016

23 views

Category:

Documents


0 download

DESCRIPTION

WLCG Middleware Validation. Markus Schulz IT/SDC. Landscape after EMI. Summary of the GDB presentation: https://indico.cern.ch/conferenceDisplay.py?confId= 197806 EGI produces UMD releases see Tiziana’s presentation at the GDB INFN (Cristina) populates the emi repository periodically - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: WLCG Middleware Validation

IT-SDC : Support for Distributed Computing

WLCGMiddleware Validation

Markus SchulzIT/SDC

Page 2: WLCG Middleware Validation

2IT-SDC

Landscape after EMI

Summary of the GDB presentation: https://indico.cern.ch/conferenceDisplay.py?confId=197806

EGI produces UMD releases see Tiziana’s presentation at the GDB

INFN (Cristina) populates the emi repository periodically “blind” copy of binary RPMs (dependencies can

break) this will end March 2014

Simplified view: UMD == EMIrepo + Staged Rollout With EMIrepo == PTs + Cristina

7/10/13

Page 3: WLCG Middleware Validation

3IT-SDC

Other Services

ETICS ends in August (no impact) WLCG Repository

Managed by WLCG CERN (Maarten) HEP_OS libs, xrootd monitoring, info-xx,

yaim, vobox.... Mostly things that don’t fit into EPEL UMD does NOT integrate these

packages

7/10/13

Page 4: WLCG Middleware Validation

4IT-SDC

What do sites do?

(UMD or emi) + WLCG + PT packages “WLCG Baseline” defines minimal

versions EGI + WLCG Operations Coordination

drive transitions developments are driven by the WLCG

community

7/10/13

Page 5: WLCG Middleware Validation

5IT-SDC

Production Readiness Now

EGI Staged Rollout ensures that material that is in UMD can be installed and doesn’t fall over finds certain issues +++

mainly deployment related smoke testing

doesn’t cover all major WLCG deployment scenarios

doesn’t cover all experiment use cases

7/10/13

Page 6: WLCG Middleware Validation

6IT-SDC

Problems

PTs release directly through the EPEL path no emi QA and testing no established inter product tests focus is on self consistency within EPEL RPMs might work or not

EPEL is based on continuous independent releases UMD is based on snapshots

Not all material is in EPEL WLCG repository emi repository no consistency test

Transition from EPEL-test to EPEL-stable is time driven without active intervention the transition happens within 2

weeks

7/10/13

Page 7: WLCG Middleware Validation

7IT-SDC

What can WLCG do?

Fill the gap.... Model: emi-1/2 WN verification

https://twiki.cern.ch/twiki/bin/view/LCG/WorkerNodeTesting

6 contributing sites covering all SE flavours all experiments all standard workflows using a fraction of their resources

7/10/13

Page 8: WLCG Middleware Validation

8IT-SDC

How?

Turn the ad hoc solution into continuous operation Adapt to the future release process

driven by EPEL and WLCG Repositories EPEL-Test + WLCG-Test

Update frequently a small fraction of the resources 10-50 cores/site

One instance of every service (globally) Exercise these resources with experiment workloads

Best: inclusion into the production systems small fraction of a small fraction of tasks will fail

Alternative: Invest in HammerCloud like testing maybe more work and diverge after a while

7/10/13

Page 9: WLCG Middleware Validation

9IT-SDC

Current flow of middleware

7/10/13

EPELStable

EPELTest

STOP!

PT

SOURCE

EPELKoji

(mock)

PT

Binary

EMI WLCG

PT

Binary

UMD

site

PT

Binary

Updates are driven by WLCG Baseline, EGI, UsersSite needs cristina

Page 10: WLCG Middleware Validation

10IT-SDC

Proposed flow of middleware

7/10/13

EPELStable

EPELTest

STOP!

PT

Source

EPELKoji

(mock)

UMD

site

PT

Binary

EGI/WLCGVerification

PT

Source

WLCGStable

WLCGTest

GO!

WLCGKoji

(mock)CERN Agile Build?

discouraged as the standard path( OK for dCache etc.)

Fast Track

Page 11: WLCG Middleware Validation

11IT-SDC

EGI/WLCG Verification

7/10/13

EPELTest

WLCGTest

site A site B site C

Frequent updates, notification on re-config need, only on a fraction of the prod resources

FTSetc.

Part of standard workflows. Problem reports to WLCG Ops Coordination

Additional validation instances of central services (mostly covered by common practice)

Page 12: WLCG Middleware Validation

12IT-SDC

What is needed

Coordination top level: WLCG Ops Coordination and EGI Staged Rollout launch: Taskforce (WLCG+XXXXXX*)

Resources hardware negligible (10-30 cores/site) human effort

0.1 FTE per participating site (not too many updates per month) follow releases, re-config as needed, report issues.....

Sites Candidates: T0/T1s and experienced T2s (about 6 sites needed) need to participate in coordination too (rota on watching for re-config, first deployment

etc.)

Experiments targeting the validation resources monitor the behaviour (might need small changes) report issues

in general already happening, minor adjustments needed 0.1 FTE per experiment

7/10/13* a good name

Page 13: WLCG Middleware Validation

13IT-SDC

Is this additional effort?

Probably not.. We have done this in an ad hoc

fashion harder to coordinate sometimes missing changes complex communications ------

7/10/13

Page 14: WLCG Middleware Validation

14IT-SDC

Timeline

Spring 2014 it has to work Taskforce should start September

first activity: identify suitable sites liaise with experiments

Resource commitments from sites latest by October

Taskforce will then coordinate the setup and development of procedures and follow up on operations

7/10/13