![Page 1: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/1.jpg)
A Statistical Analysis of Job Performance on LCG Grid
David Colling, Olivier van der Aa, Mona Aggarwal, Gidon Moont
(Imperial College, London)
![Page 2: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/2.jpg)
Introduction
http://gridportal.imperial.ac.uk
![Page 3: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/3.jpg)
Introduction
We decided to keep the data that we gather and to perform some statistical analysis on it. In this talk I will briefly discuss:
• What it can tell us about the different usage of the system by the different VOs
• What it can tell us about the performance of the individual components and the system as a whole
• We now produce daily reports (available from the website)
• In general I will just describe what we see rather than trying to interpret it. That is the next step.
This is still very much work in progress
![Page 4: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/4.jpg)
Caveats
• This view of the LCG is that of the RBs (well actually the LBs)
• We don’t see any jobs that are submitted by local users
• We don’t see any grid jobs that are submitted via RBs to which we do not have access (small effect)
• We do not see grid jobs that are submitted directly, without using an LCG RB. Specifically, we do not see jobs submitted by Rod Walker’s CondorG submission system.
• These stats are only for the last quarter.
![Page 5: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/5.jpg)
The system
• The LCG is an operational Grid currently running over 200 sites in 36 countries, offering its users access to nearly 14,000 CPUs and approximately 8PB of storage.
• Defining meaningful metrics and monitoring the performance of such a system is a challenging exercise, but it is important for successful operation.
• The primary motivation for this research is to analyze LCG performance through a statistical analysis of the lifecycles of all jobs on the grid.
• In this paper we define metrics that describe typical job lifecycles. The statistical analysis of these metrics enables us to gain insight into the workload management characteristics of the LCG Grid [2]. Finally, we will show how those metrics can be used to spot Grid failures by identifying statistical changes over time in the monitored metrics.
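One simple way to look for a statistical change over time in a monitored metric is to compare each new value with a trailing window of earlier values. The sketch below is only an illustration and is not the method used for the daily reports: the function name `flag_anomalies`, the 7-point window, the 3-sigma threshold and the sample values are all assumptions made for this example.

```python
from statistics import mean, stdev

def flag_anomalies(values, window=7, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` sample standard deviations.

    Hypothetical helper: window length and threshold are illustrative.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Made-up daily efficiency values with one obvious drop on day 7.
daily_efficiency = [0.80, 0.82, 0.79, 0.81, 0.80, 0.83, 0.81, 0.35, 0.80]
print(flag_anomalies(daily_efficiency))  # [7]
```

Note that the outlier itself inflates the spread of later windows, so a sustained failure would need a more robust estimator (for example a median-based one).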
![Page 6: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/6.jpg)
Analysis Dataset
• The dataset is obtained from:
– The information published by about 28 Grid Resource Brokers (RBs) across the EGEE grid.
– Job lifecycles obtained from the RB log files.
– Data taken from Sept 2005 to Jan 2006.
– More than 3 million jobs.
• The performance metrics are measured for the four main LHC VOs:
– ALICE
– ATLAS
– LHCB
– CMS
• Metrics are defined to measure performance and effectiveness from three perspectives:
– User
– Resource
– Grid
![Page 7: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/7.jpg)
So what can we see?
![Page 8: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/8.jpg)
• Number of Active Users in a system at a given time.
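As a rough sketch of how this quantity could be computed, a user can be counted as active at a given time if at least one of their jobs has been submitted but has not yet finished. That definition, the tuple layout `(user, submitted, finished)` and the sample data are assumptions for illustration only.

```python
from datetime import datetime

def active_users(jobs, at):
    """Count distinct users with at least one job in the system at time `at`.

    `jobs` is an iterable of (user, submitted, finished) tuples
    (a hypothetical layout chosen for this example).
    """
    return len({user for user, submitted, finished in jobs
                if submitted <= at < finished})

# Made-up jobs on a single day.
jobs = [
    ("u1", datetime(2005, 9, 1, 10, 0), datetime(2005, 9, 1, 12, 0)),
    ("u1", datetime(2005, 9, 1, 11, 0), datetime(2005, 9, 1, 13, 0)),
    ("u2", datetime(2005, 9, 1, 11, 0), datetime(2005, 9, 1, 11, 30)),
    ("u3", datetime(2005, 9, 1, 14, 0), datetime(2005, 9, 1, 15, 0)),
]
print(active_users(jobs, datetime(2005, 9, 1, 11, 15)))  # 2
```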
![Page 9: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/9.jpg)
• Distribution of Job Run Time (h) for the LHC VOs.
![Page 10: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/10.jpg)
• Distribution of Job Run Time (h) weighted by Job Run Time (h), i.e. where the CPU hours are used.
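The difference between the plain and the weighted distribution can be sketched as follows: the plain histogram adds 1 per job to its run-time bin, while the weighted one adds the job's run time, so a few long jobs can dominate even when short jobs are far more numerous. The function name, the 1-hour bin width and the sample run times are assumptions for this example.

```python
def runtime_histograms(run_hours, bin_width=1.0):
    """Return (counts, hours): jobs per run-time bin and run hours per bin.

    Bins are keyed by their integer index, so bin k covers
    [k * bin_width, (k + 1) * bin_width).
    """
    counts, hours = {}, {}
    for h in run_hours:
        b = int(h // bin_width)
        counts[b] = counts.get(b, 0) + 1      # unweighted: one entry per job
        hours[b] = hours.get(b, 0.0) + h      # weighted by the run time itself
    return counts, hours

# Made-up sample: three very short jobs and one long job.
counts, hours = runtime_histograms([0.1, 0.2, 0.3, 10.5])
print(counts)  # {0: 3, 10: 1}
```

In this sample the first bin holds three of the four jobs but only about 0.6 of the 11.1 run hours.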
![Page 11: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/11.jpg)
• Distribution of Job Efficiency for each LHC VO (efficiency = time spent running successfully / total time in system)
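A minimal sketch of this per-job efficiency, assuming that each job has a submission, start and finish timestamp and that a job which did not succeed contributes zero successful running time. The argument names and that treatment of failed jobs are assumptions, not taken from the analysis itself.

```python
from datetime import datetime

def job_efficiency(submitted, started, finished, succeeded):
    """Efficiency = time spent running successfully / total time in system."""
    total = (finished - submitted).total_seconds()
    if total <= 0:
        return 0.0
    # Assumption: a failed job counts as zero successful running time.
    running = (finished - started).total_seconds() if succeeded else 0.0
    return running / total

# A job that waited one hour and then ran successfully for three hours.
eff = job_efficiency(datetime(2005, 9, 1, 10), datetime(2005, 9, 1, 11),
                     datetime(2005, 9, 1, 14), succeeded=True)
print(eff)  # 0.75
```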
![Page 12: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/12.jpg)
• Job Efficiency versus Job Run Time (h).
![Page 13: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/13.jpg)
RB Load
• Number of Jobs on a given RB.
![Page 14: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/14.jpg)
CE Jobs Distribution
Number of Jobs
![Page 15: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/15.jpg)
CE Hours distribution
Job Hours
![Page 16: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/16.jpg)
Number of Jobs LCG
Efficiency=N Success/N total
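The count-based efficiency used on these slides can be sketched per VO as below. The input layout `(vo, succeeded)` and the sample jobs are assumptions for illustration; how a job is classified as successful is not specified here.

```python
from collections import Counter

def success_fraction_by_vo(jobs):
    """Return {vo: N success / N total} from (vo, succeeded) pairs."""
    total, ok = Counter(), Counter()
    for vo, succeeded in jobs:
        total[vo] += 1
        if succeeded:
            ok[vo] += 1
    return {vo: ok[vo] / total[vo] for vo in total}

# Made-up jobs.
jobs = [("cms", True), ("cms", False), ("atlas", True)]
print(success_fraction_by_vo(jobs))  # {'cms': 0.5, 'atlas': 1.0}
```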
![Page 17: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/17.jpg)
Number of Jobs Alice
Efficiency=N Success/N total
![Page 18: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/18.jpg)
Number of Jobs Atlas
Efficiency=N Success/N total
![Page 19: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/19.jpg)
Number of Jobs CMS
Efficiency=N Success/N total
![Page 20: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/20.jpg)
Number of Jobs LHCB
Efficiency=N Success/N total
![Page 21: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/21.jpg)
Grid Load
• Number of Job Hours submitted at a given time
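One possible reading of this quantity is the total run hours of the jobs submitted in each time bin. The sketch below uses daily bins; the bin size, the `(submitted, run_hours)` layout and the sample values are assumptions for this example.

```python
from collections import defaultdict
from datetime import date, datetime

def job_hours_by_day(jobs):
    """Sum run hours of jobs by their submission date.

    `jobs` is an iterable of (submitted, run_hours) pairs (hypothetical layout).
    """
    totals = defaultdict(float)
    for submitted, run_hours in jobs:
        totals[submitted.date()] += run_hours
    return dict(totals)

# Made-up jobs over two days.
jobs = [
    (datetime(2005, 9, 1, 9, 0), 2.0),
    (datetime(2005, 9, 1, 17, 30), 4.5),
    (datetime(2005, 9, 2, 8, 0), 1.0),
]
print(job_hours_by_day(jobs))
```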
![Page 22: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/22.jpg)
Efficiency CMS and Atlas
• Efficiency = Total Successful Hours / Total Hours
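This hours-based efficiency can differ strongly from a count-based one, because each job is weighted by how long it ran. A minimal sketch, with an assumed `(hours, succeeded)` layout and made-up numbers:

```python
def hours_efficiency(jobs):
    """Efficiency = total successful hours / total hours.

    `jobs` is an iterable of (hours, succeeded) pairs (hypothetical layout).
    """
    jobs = list(jobs)
    total = sum(hours for hours, _ in jobs)
    good = sum(hours for hours, ok in jobs if ok)
    return good / total if total else 0.0

# One long successful job and two short failures.
jobs = [(10.0, True), (0.5, False), (0.5, False)]
print(round(hours_efficiency(jobs), 3))  # 0.909
```

Here only one job in three succeeds, yet about 91% of the hours were spent in a successful job.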
![Page 23: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/23.jpg)
Efficiency LHCB & Alice
• Efficiency = Total Successful Hours / Total Hours
![Page 24: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/24.jpg)
Grid Load
• Number of jobs in the system at a given time.
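The number of jobs in the system as a function of time can be sketched with a sweep over submission and completion events. This is only an illustration: the `(submitted, finished)` interval layout, the use of plain numbers for times and the sample values are assumptions.

```python
def jobs_in_system(intervals):
    """Return a step series [(time, jobs in system)] from (submitted, finished) pairs."""
    events = []
    for submitted, finished in intervals:
        events.append((submitted, 1))    # job enters the system
        events.append((finished, -1))    # job leaves the system
    # Tuples sort by time first; at equal times, departures (-1) sort before arrivals (+1).
    events.sort()
    series, count = [], 0
    for t, delta in events:
        count += delta
        if series and series[-1][0] == t:
            series[-1] = (t, count)      # merge events that share a timestamp
        else:
            series.append((t, count))
    return series

# Made-up intervals, with times in hours since the start of the sample.
print(jobs_in_system([(0, 4), (1, 3), (3, 5)]))
# [(0, 1), (1, 2), (3, 2), (4, 1), (5, 0)]
```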
![Page 25: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/25.jpg)
Grid Load
• Number of jobs in the system at a given time.
![Page 26: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/26.jpg)
Grid Load
• Number of jobs in the system at a given time.
![Page 27: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/27.jpg)
RB Load
• Job scheduling (Match Time) versus load (mean number of jobs/sec during the matching)
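As a hedged sketch of how such points could be derived, take the match time of a job as the interval between its arrival at the RB and its match, and the load as the number of jobs arriving at the same RB during that interval divided by its length. These definitions, the `(submitted, matched)` layout in seconds and the sample values are assumptions, not the definitions used for the plots.

```python
from bisect import bisect_left, bisect_right

def match_time_and_load(jobs):
    """Return [(match_time_s, jobs_per_second)] for jobs on a single RB.

    `jobs` is a list of (submitted, matched) times in seconds (hypothetical layout).
    """
    arrivals = sorted(submitted for submitted, _ in jobs)
    points = []
    for submitted, matched in jobs:
        match_time = matched - submitted
        if match_time <= 0:
            continue  # skip jobs without a usable matching interval
        # Arrivals within [submitted, matched], including the job itself.
        n = bisect_right(arrivals, matched) - bisect_left(arrivals, submitted)
        points.append((match_time, n / match_time))
    return points

# Made-up jobs on one RB.
print(match_time_and_load([(0, 10), (2, 6), (5, 25)]))
# [(10, 0.3), (4, 0.5), (20, 0.05)]
```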
![Page 28: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/28.jpg)
RB Load
• Job scheduling (Match Time) versus load (mean number of jobs/sec during the matching)
![Page 29: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/29.jpg)
RB Load
• Job scheduling (Match Time) versus load (mean number of jobs/sec during the matching)
• RB: gdrb04
![Page 30: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/30.jpg)
Conclusions
• We have started to analyse the distribution of jobs submitted to the LCG
• Distinct usage patterns are beginning to emerge for each VO
• These usage patterns have different efficiencies
• There are many more plots that I could have shown and there is a lot more work to do to try to understand what we see
![Page 31: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/31.jpg)
References
• [1] GridPP-UK Computing for Particle Physics: http://www.gridpp.ac.uk/
• [2] Crosby P, Colling D, Waters D, Efficiency of resource brokering in grids for high-energy physics computing, IEEE Transactions on Nuclear Science, 2004, vol. 51, pages 884-891, ISSN: 0018-9499
![Page 32: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/32.jpg)
Backup slides
![Page 33: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/33.jpg)
Grid Load
• Number of Job Hours submitted at a given time
![Page 34: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/34.jpg)
Grid Load
• Number of Job Hours submitted at a given time
![Page 35: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/35.jpg)
Grid Load
• Number of Job hours submitted at a given time
![Page 36: A Statistical Analysis of Job Performance on LCG Grid](https://reader031.vdocuments.us/reader031/viewer/2022032708/56812ca8550346895d915142/html5/thumbnails/36.jpg)
Grid Load
• Number of Job hours submitted at a given time