cern lcg middleware certification and support maarten litmaath cern it/gd gridpp workshop 2-4 june...

10
CERN LCG Middleware LCG Middleware Certification and Support Certification and Support Maarten Litmaath CERN IT/GD GridPP Workshop 2-4 June 2004

Upload: hannah-smith

Post on 28-Mar-2015

214 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: CERN LCG Middleware Certification and Support Maarten Litmaath CERN IT/GD GridPP Workshop 2-4 June 2004

CERN

LCG MiddlewareLCG Middleware Certification and Support Certification and Support

LCG MiddlewareLCG Middleware Certification and Support Certification and Support

Maarten LitmaathCERN IT/GD

GridPP Workshop2-4 June 2004

Page 2: CERN LCG Middleware Certification and Support Maarten Litmaath CERN IT/GD GridPP Workshop 2-4 June 2004

Maarten Litmaath, GridPP meeting, 2004/06/03

CERN

Where is the code?Where is the code?Where is the code?Where is the code?

• CERN central CVS system autobuild (see next page)– EDG/LCG code

• http://isscvs.cern.ch:8180/cgi-bin/cvsweb.cgi/?cvsroot=lcgware– LCG configuration:

• http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi

• Everything else is “external”– Simplifies build– Complicates debugging

• Need at least the sources

• All RPMs under /afs/cern.ch/project/gd/RpmDir

• LCG code guidelines adapted from EDG– http://grid-deployment.web.cern.ch/grid-deployment/cgi-bin/

index.cgi– Documentation menu

Page 3: CERN LCG Middleware Certification and Support Maarten Litmaath CERN IT/GD GridPP Workshop 2-4 June 2004

Maarten Litmaath, GridPP meeting, 2004/06/03

CERN

The BuildsThe BuildsThe BuildsThe Builds

• EDG autobuild system has been ported to LCG– http://lxshare0297.cern.ch/LCG/autobuild/– Allows nightly build of latest compliant CVS tag per package – Build-on-demand tag triggers immediate build

• Currently only RH 7.3 supported– Porting to RH Enterprise Linux underway in GD group

• WN being tested– Collaboration with CERN OpenLab to port code + build recipes

to IA-64• CE + WN already included in EIS testbed

– Other platforms being considered:• Fedora• RH 9• RH 6.2• Solaris• IRIX• …

Page 4: CERN LCG Middleware Certification and Support Maarten Litmaath CERN IT/GD GridPP Workshop 2-4 June 2004

Maarten Litmaath, GridPP meeting, 2004/06/03

CERN

The CertificationThe CertificationThe CertificationThe Certification

• Resulting middleware must be integrated and then certified on all supported platforms– Also verify interoperability of all platforms

• Complicates certification exponentially– Goal is production quality:

• Stability, robustness, performance, scalability• Easy configuration, operation, maintenance

– A lot of effort has been going into debugging• Get feedback from production system (e.g. rollout mailing list)• Send feedback to developers, but apply in-house patches in the

meantime• See next talk by David Smith

• Current “big” certification testbed shown on next page– Only RH 7.3 for now– Remote sites to be added (again)

• Madison (VDT), Taipei, Budapest, …– Simulates multiple realistic configurations

• Can test multiple platforms at the same time

Page 5: CERN LCG Middleware Certification and Support Maarten Litmaath CERN IT/GD GridPP Workshop 2-4 June 2004

Maarten Litmaath, GridPP meeting, 2004/06/03

CERN

h234RB_a

h243BDII_a

h235CE_a

h236SE_a

h240RB_b

h281BDII_b

h241CE_b

h277CE_2_a

h278SE_2_a

h290CE_3_a

h291SE_3_a

h286CE_4

h287SE_4

h238WN_a1

h244WN_b1

h280WN_2_a1

h300WN_3_a1

h288WN_3_a2

h296WN_4_a1

rlscert02RLS_Oracle

Cluster_1 Cluster_2 Cluster_3 Cluster_4h275UI_1

h285UI_4

h237CE_5Condor

h270WN_5_1

Cluster_5

lxs5243CE_6LSF

h246MyProxy

h271WN_a2

h272WN_2_a2

h248pool

dcache

h279WN_a3

h273WN_2_a3

h274WN_2_a4

h230WN_3_a3

h294WN_4_a2

lxs5238

No home sharing

No home sharing

No home sharing

sharelocal /home

h247SE_2_bdcache

h206WN_5_2

lxs5239

lxs5240

lxs5241

lxs5242

h282SE_cdcache

No home sharingh303SE_dCastor

h229SE_3_bCastor

Certification & Testing Testbed

h289WN_b3

h245WN_b2

h239RB_3

h276UI_3

h284BDII_3

Page 6: CERN LCG Middleware Certification and Support Maarten Litmaath CERN IT/GD GridPP Workshop 2-4 June 2004

Maarten Litmaath, GridPP meeting, 2004/06/03

CERN

The TestsThe TestsThe TestsThe Tests

• Feature testing– Workload Management, Data Management, Information System, …

• Job distribution with and w/o data constraints, resource saturation, proxy renewal

• Data access, replica services

• Different architectures/configurations– Try to simulate the production system to some extent

• Stress tests– Performance should degrade gracefully, no crashes

• Explicit error injection– Study system reaction

• Security– One should not be able to bypass it

• Experiments integration testing done by GD/EIS on their testbed

Page 7: CERN LCG Middleware Certification and Support Maarten Litmaath CERN IT/GD GridPP Workshop 2-4 June 2004

Maarten Litmaath, GridPP meeting, 2004/06/03

CERN

Certification, Testing & Release CycleCertification, Testing & Release CycleCertification, Testing & Release CycleCertification, Testing & Release CycleCERTIFICATION

TESTING DEPLOYMENT

LCGC&T sectionadd featuresfix problems

transmit problems

EGEEfix problemsnew releases

VDTfix problemsnew releases

Integrate

BasicFunctionality

Tests

RunSpecial Tests

RunCertification

Matrix

Releasecandidate

tagged

RE

LE

AS

EP

RE

-DE

PL

OY

ME

NT

GE

NE

RA

L R

EL

EA

SE

EXPERIMENTSINTEGRATION

Experimentssoftware

installation

Testingexperiments

specificfeatures

Certifiedrelease

tag

Page 8: CERN LCG Middleware Certification and Support Maarten Litmaath CERN IT/GD GridPP Workshop 2-4 June 2004

Maarten Litmaath, GridPP meeting, 2004/06/03

CERN

• Errors reflect ongoing development

• Details available through links

• An LCG release candidate must not have any serious errors reported by the test suites

Typical Certification MatrixTypical Certification MatrixTypical Certification MatrixTypical Certification Matrix

Page 9: CERN LCG Middleware Certification and Support Maarten Litmaath CERN IT/GD GridPP Workshop 2-4 June 2004

Maarten Litmaath, GridPP meeting, 2004/06/03

CERN

The TasksThe TasksThe TasksThe Tasks

• Web page to open bugs and tasks:– https://savannah.cern.ch/projects/lcgoperation/

• Main task: stabilize LCG-2– Allow serious work to get done efficiently– Minor remaining inconveniences should be tolerable

• To be addressed by EGEE/ARDA

• Main ingredients– dCache– Porting to RH 7.3 successors– Redo Replica Manager core– Flexible info providers corresponding changes in WP1/WP2

code– Shield CE against overload risk – …

Page 10: CERN LCG Middleware Certification and Support Maarten Litmaath CERN IT/GD GridPP Workshop 2-4 June 2004

Maarten Litmaath, GridPP meeting, 2004/06/03

CERN

More TasksMore TasksMore TasksMore Tasks

• Try and follow Globus releases (via VDT)• Use the VDT more:

– Helps EU-US interoperability– Try more functionality already provided by VDT

• Condor as default batch system?• PacMan?

– Try and put more into the VDT

• Try R-GMA for monitoring– Combine with GridICE

• Get rid of MDS completely• LCFGng Quattor• …