ATLAS DC2 seen from Prague Tier2 center - some remarks
ATLAS software workshop
September 2004
Hardware in Prague available for ATLAS
• Golias:
  • 32 dual CPU nodes PIII 1.13 GHz, 1 GB RAM
  • upgraded since July: + 49 dual CPU Xeon 3.06 GHz, 2 GB RAM (WN)
  • 3 TB disk space reserved for ATLAS
  • PBSPro batch system
  • lcgatlasprod queue reserved for ATLAS VO members, high priority
• Skurut:
  • 16 dual CPU nodes PIII 700 MHz, 1 GB RAM
  • OpenPBS batch system
  • queues: lcgpbs-short, long, infinite, used mainly by ATLAS
• 2 independent CEs in LCG2
Jobs waiting for input or output replication, sometimes hanging ‘forever’. Example:

| Job Id | Queue | User | Node | CPUTime | WallTime |
|---|---|---|---|---|---|
| 34031.golias | lcgatlasprod | atlas001 | golias30 | 03:09:28 | 43:30:39 |
| 34035.golias | lcgatlasprod | atlas002 | golias03 | 04:17:38 | 43:19:18 |
| 34113.golias | lcgatlasprod | atlas002 | golias10 | 03:00:41 | 41:52:11 |
| 34127.golias | lcgatlasprod | atlas001 | golias11 | 04:19:11 | 41:21:46 |
| 34583.golias | lcgatlasprod | atlassgm | goliasx56 | 00:00:17 | 26:01:14 |

...

Not yet cured: running jobs, 20.9.2004:

| Job Id | Queue | User | Node | CPUTime | WallTime |
|---|---|---|---|---|---|
| 55162.golias | lcgatlasprod | atlassgm | goliasx42 | 00:00:03 | 102:19:45 |
| 58528.golias | lcgatlasprod | atlas001 | golias02 | 11:22:40 | 11:33:13 |
| 58529.golias | lcgatlasprod | atlas001 | golias03 | 00:00:16 | 11:33:49 |

...

Usually such long jobs are killed either by the administrator or by the PBS time limit.
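
Hanging jobs of this kind show up as a very low ratio of CPU time to wall time. A minimal Python sketch of how a site could flag them from a listing in the column layout shown above follows. The thresholds (`min_wall_hours`, `max_efficiency`) and all function names are illustrative assumptions, not something used in the Prague setup.

```python
from dataclasses import dataclass


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string (hours may exceed 99) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class Job:
    job_id: str
    queue: str
    user: str
    node: str
    cpu_s: int
    wall_s: int

    @property
    def efficiency(self) -> float:
        """CPU time divided by wall time (0.0 if no wall time yet)."""
        return self.cpu_s / self.wall_s if self.wall_s else 0.0


def parse_jobs(listing: str) -> list[Job]:
    """Parse lines of 'JobId Queue User Node CPUTime WallTime'.

    Lines that do not have exactly six whitespace-separated fields
    (for example a header or a '...' line) are skipped.
    """
    jobs = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        job_id, queue, user, node, cpu, wall = fields
        jobs.append(Job(job_id, queue, user, node,
                        to_seconds(cpu), to_seconds(wall)))
    return jobs


def stalled(jobs, min_wall_hours=6.0, max_efficiency=0.25):
    """Return jobs that ran long but used little CPU (assumed thresholds)."""
    return [
        job for job in jobs
        if job.wall_s >= min_wall_hours * 3600
        and job.efficiency <= max_efficiency
    ]


# Rows taken from the "running jobs, 20.9.2004" listing above.
SAMPLE = """\
55162.golias lcgatlasprod atlassgm goliasx42 00:00:03 102:19:45
58528.golias lcgatlasprod atlas001 golias02 11:22:40 11:33:13
58529.golias lcgatlasprod atlas001 golias03 00:00:16 11:33:49
"""

if __name__ == "__main__":
    for job in stalled(parse_jobs(SAMPLE)):
        print(job.job_id, job.node, f"{job.efficiency:.4f}")
```

With these assumed thresholds, the two jobs that accumulated only seconds of CPU time are reported, while the job on golias02, which is clearly computing, is not.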
July 1 – September 21

| GOLIAS | jobs | CPU (days) | Elapsed (days) |
|---|---|---|---|
| all | 4811 | 1653 | 1992 |
| long (cpu>100s) | 2377 | 1653 | 1881 |
| short | 2434 | 0.4 | 111 |

| SKURUT | jobs | CPU (days) | Elapsed (days) |
|---|---|---|---|
| all | 1446 | 1507 | 1591 |
| long (cpu>100s) | 870 | 1507 | 1554 |
| short | 576 | 0.2 | 37 |

number of jobs in DQ: 1349 done, 1231 failed = 2580 jobs
number of jobs in DQ: 362 done, 572 failed = 934 jobs
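
The totals above allow a quick estimate of CPU efficiency (CPU days divided by elapsed days) and of the fraction of DQ jobs that finished. The short Python sketch below does only that arithmetic. The function names are illustrative, and the two DQ lines are kept in their original order without assigning them to a cluster, since the transcript does not say which is which.

```python
# "all" rows of the two tables above (July 1 - September 21).
CLUSTERS = {
    "GOLIAS": {"cpu_days": 1653, "elapsed_days": 1992},
    "SKURUT": {"cpu_days": 1507, "elapsed_days": 1591},
}

# The two "number of jobs in DQ" lines, in the order they appear.
DQ_LINES = [
    {"done": 1349, "failed": 1231},
    {"done": 362, "failed": 572},
]


def cpu_efficiency(cpu_days: float, elapsed_days: float) -> float:
    """Fraction of elapsed (wall-clock) time spent on the CPU."""
    return cpu_days / elapsed_days


def success_rate(done: int, failed: int) -> float:
    """Fraction of jobs recorded as done among done + failed."""
    return done / (done + failed)


if __name__ == "__main__":
    for name, totals in CLUSTERS.items():
        print(f"{name}: CPU efficiency {cpu_efficiency(**totals):.1%}")
    for index, counts in enumerate(DQ_LINES, start=1):
        print(f"DQ line {index}: {success_rate(**counts):.1%} done")
```

On these numbers SKURUT spends a noticeably larger share of elapsed time on the CPU than GOLIAS, which is consistent with the jobs waiting for replication seen on GOLIAS.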
Job distribution
• almost always not enough jobs on GOLIAS
• SKURUT usage much better
[Plot label: ATLAS]
Memory usage
ATLAS jobs on GOLIAS, July – September (part) 2004
CPU Time
[Plots: CPU time in hours; panels PIII 1.13 GHz, Xeon 3.06 GHz, PIII 700 MHz]
queue limit: 48 hours, later changed to 72 hours
Miscellaneous
• no job name in the local batch system - difficult to identify
• no (?) documentation where to look for log files, which logs are relevant
• lost jobs due to CPU time limit - no warning
• lost jobs due to one misconfigured node - spotted from local logs and by Simone too
• some jobs loop forever - where to send this information?
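
For the jobs lost to the CPU time limit without warning, one possible safeguard is a periodic check that flags jobs approaching the queue limit. The Python sketch below is only an illustration. The 48 and 72 hour limits are the queue limits quoted on the CPU time slide, while the 90% warning fraction, the function names and the sample CPU time are assumptions.

```python
def cpu_hours(hms: str) -> float:
    """Convert an HH:MM:SS CPU time string to hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


def near_limit(cpu_time: str, limit_hours: float,
               warn_fraction: float = 0.9) -> bool:
    """True if the job has used at least warn_fraction of the CPU limit.

    warn_fraction is an assumed threshold, not a PBS setting.
    """
    return cpu_hours(cpu_time) >= warn_fraction * limit_hours


if __name__ == "__main__":
    sample_cpu_time = "44:00:00"  # hypothetical job, for illustration only
    for limit in (48, 72):        # queue limits quoted in the slides
        flag = near_limit(sample_cpu_time, limit)
        print(f"limit {limit} h: warn={flag}")
```

A job at 44 CPU hours would be flagged under the original 48 hour limit but not under the later 72 hour one.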