CERN
LCG Middleware Certification and Support
Maarten Litmaath, CERN IT/GD
GridPP Workshop, 2-4 June 2004
Maarten Litmaath, GridPP meeting, 2004/06/03
Where is the code?
• CERN central CVS system → autobuild (see next page)
  – EDG/LCG code
    • http://isscvs.cern.ch:8180/cgi-bin/cvsweb.cgi/?cvsroot=lcgware
  – LCG configuration:
    • http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi
• Everything else is “external”
  – Simplifies build
  – Complicates debugging
    • Need at least the sources
• All RPMs under /afs/cern.ch/project/gd/RpmDir
• LCG code guidelines adapted from EDG
  – http://grid-deployment.web.cern.ch/grid-deployment/cgi-bin/index.cgi
  – Documentation menu
The Builds
• EDG autobuild system has been ported to LCG
  – http://lxshare0297.cern.ch/LCG/autobuild/
  – Allows nightly build of latest compliant CVS tag per package
  – Build-on-demand tag triggers immediate build
• Currently only RH 7.3 supported
  – Porting to RH Enterprise Linux underway in GD group
    • WN being tested
  – Collaboration with CERN OpenLab to port code + build recipes to IA-64
    • CE + WN already included in EIS testbed
  – Other platforms being considered:
    • Fedora
    • RH 9
    • RH 6.2
    • Solaris
    • IRIX
    • …
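The nightly selection rule (latest compliant CVS tag per package, plus an on-demand trigger) can be sketched in Python. This is an illustration only: the tag pattern `<package>_R_<major>_<minor>_<patch>`, the `BUILD_NOW` marker, the package names and the function names are all assumptions made for this sketch, not the actual autobuild conventions.

```python
import re

# Hypothetical compliance rule: "<package>_R_<major>_<minor>_<patch>".
# The real autobuild rule is not stated on the slide.
COMPLIANT = re.compile(r"^(?P<pkg>[a-z0-9-]+)_R_(?P<ver>\d+_\d+_\d+)$")
ON_DEMAND_MARKER = "BUILD_NOW"  # hypothetical build-on-demand marker


def latest_compliant_tags(tags):
    """Return {package: tag} holding the highest compliant version per package."""
    best = {}
    for tag in tags:
        match = COMPLIANT.match(tag)
        if match is None:
            continue  # non-compliant tags are skipped by the nightly build
        # Compare versions numerically so that 2_1_10 sorts after 2_1_9.
        version = tuple(int(part) for part in match.group("ver").split("_"))
        pkg = match.group("pkg")
        if pkg not in best or version > best[pkg][0]:
            best[pkg] = (version, tag)
    return {pkg: tag for pkg, (version, tag) in best.items()}


def needs_immediate_build(tags):
    """True if any tag carries the (hypothetical) on-demand marker."""
    return any(ON_DEMAND_MARKER in tag for tag in tags)
```

Comparing versions as integer tuples rather than strings is the one design point worth noting: a plain string sort would rank `2_1_9` above `2_1_10`.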
The Certification
• Resulting middleware must be integrated and then certified on all supported platforms
  – Also verify interoperability of all platforms
    • Complicates certification exponentially
  – Goal is production quality:
    • Stability, robustness, performance, scalability
    • Easy configuration, operation, maintenance
  – A lot of effort has been going into debugging
    • Get feedback from production system (e.g. rollout mailing list)
    • Send feedback to developers, but apply in-house patches in the meantime
    • See next talk by David Smith
• Current “big” certification testbed shown on next page
  – Only RH 7.3 for now
  – Remote sites to be added (again)
    • Madison (VDT), Taipei, Budapest, …
  – Simulates multiple realistic configurations
    • Can test multiple platforms at the same time
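To illustrate why cross-platform interoperability makes certification grow so quickly, the sketch below counts the deployments in which each service type may run on any supported platform. The role list follows the node types used on the certification testbed; the assumption that every role can run on every platform, and the function names, are illustrative only.

```python
from itertools import product

# Service roles as they appear on the certification testbed.
ROLES = ["RB", "BDII", "CE", "SE", "WN", "UI"]


def configurations(platforms, roles=ROLES):
    """Yield every assignment of one platform to each role."""
    for combo in product(platforms, repeat=len(roles)):
        yield dict(zip(roles, combo))


def count_configurations(platforms, roles=ROLES):
    """Number of assignments: len(platforms) ** len(roles)."""
    return len(platforms) ** len(roles)
```

Under these assumptions, one platform gives a single configuration, two platforms give 2^6 = 64, and three give 3^6 = 729, which is why a testbed that exercises several platforms at the same time is valuable.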
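Certification & Testing Testbed
[Diagram: hosts and services as labelled on the slide]
• Clusters: Cluster_1, Cluster_2, Cluster_3, Cluster_4, Cluster_5
• RB: h234 (RB_a), h240 (RB_b), h239 (RB_3)
• BDII: h243 (BDII_a), h281 (BDII_b), h284 (BDII_3)
• CE: h235 (CE_a), h241 (CE_b), h277 (CE_2_a), h290 (CE_3_a), h286 (CE_4), h237 (CE_5, Condor), lxs5243 (CE_6, LSF)
• SE: h236 (SE_a), h278 (SE_2_a), h247 (SE_2_b, dcache), h291 (SE_3_a), h229 (SE_3_b, Castor), h287 (SE_4), h282 (SE_c, dcache), h303 (SE_d, Castor)
• WN: h238 (WN_a1), h271 (WN_a2), h279 (WN_a3), h244 (WN_b1), h245 (WN_b2), h289 (WN_b3), h280 (WN_2_a1), h272 (WN_2_a2), h273 (WN_2_a3), h274 (WN_2_a4), h300 (WN_3_a1), h288 (WN_3_a2), h230 (WN_3_a3), h296 (WN_4_a1), h294 (WN_4_a2), h270 (WN_5_1), h206 (WN_5_2)
• UI: h275 (UI_1), h276 (UI_3), h285 (UI_4)
• Other services: rlscert02 (RLS_Oracle), h246 (MyProxy), h248 (pool, dcache)
• Hosts without a service label: lxs5238, lxs5239, lxs5240, lxs5241, lxs5242
• Home directory annotations: “No home sharing” (four times), “sharelocal /home” (once)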
The Tests
• Feature testing
  – Workload Management, Data Management, Information System, …
    • Job distribution with and w/o data constraints, resource saturation, proxy renewal
    • Data access, replica services
• Different architectures/configurations
  – Try to simulate the production system to some extent
• Stress tests
  – Performance should degrade gracefully, no crashes
• Explicit error injection
  – Study system reaction
• Security
  – One should not be able to bypass it
• Experiments integration testing done by GD/EIS on their testbed
Certification, Testing & Release Cycle
[Diagram: stages and labels in the order shown on the slide]
• Inputs
  – LCG C&T section: add features, fix problems, transmit problems
  – EGEE: fix problems, new releases
  – VDT: fix problems, new releases
• CERTIFICATION / TESTING
  – Integrate
  – Basic Functionality Tests
  – Run Special Tests
  – Run Certification Matrix
  – Release candidate tagged
• EXPERIMENTS INTEGRATION
  – Experiments software installation
  – Testing experiments specific features
  – Certified release tag
• DEPLOYMENT
  – RELEASE PRE-DEPLOYMENT
  – GENERAL RELEASE
Typical Certification Matrix
• Errors reflect ongoing development
• Details available through links
• An LCG release candidate must not have any serious errors reported by the test suites
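The release-candidate rule can be expressed as a simple check over the matrix. The sketch below assumes the matrix is stored as a mapping from (test suite, node) to a status string; the status values, the suite names and the definition of "serious" are hypothetical, since the slide does not define them.

```python
# Hypothetical status values; the slide does not define severity levels.
SERIOUS = {"error", "crash"}


def serious_failures(matrix):
    """Return the sorted (suite, node) cells whose status counts as serious."""
    return sorted(cell for cell, status in matrix.items() if status in SERIOUS)


def is_release_candidate(matrix):
    """A candidate qualifies only if no test suite reports a serious error."""
    return not serious_failures(matrix)


if __name__ == "__main__":
    # Illustrative matrix; suite names are made up for this example.
    matrix = {
        ("job-submission", "CE_a"): "ok",
        ("replica-services", "SE_a"): "warning",
        ("proxy-renewal", "RB_a"): "error",
    }
    print(serious_failures(matrix))
    print(is_release_candidate(matrix))
```

Returning the failing cells, not just a yes/no answer, mirrors the slide's point that details should be reachable from the matrix itself.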
The Tasks
• Web page to open bugs and tasks:
  – https://savannah.cern.ch/projects/lcgoperation/
• Main task: stabilize LCG-2
  – Allow serious work to get done efficiently
  – Minor remaining inconveniences should be tolerable
    • To be addressed by EGEE/ARDA
• Main ingredients
  – dCache
  – Porting to RH 7.3 successors
  – Redo Replica Manager core
  – Flexible info providers → corresponding changes in WP1/WP2 code
  – Shield CE against overload risk
  – …
More Tasks
• Try and follow Globus releases (via VDT)
• Use the VDT more:
  – Helps EU-US interoperability
  – Try more functionality already provided by VDT
    • Condor as default batch system?
    • PacMan?
  – Try and put more into the VDT
• Try R-GMA for monitoring
  – Combine with GridICE
• Get rid of MDS completely
• LCFGng → Quattor
• …