Future of Batch Processing at CERN
Jerome Belleman, Ulrich Schwickerath, Iain Steers – IT-PES-PS
HEPiX Spring 2015
Outline
Context
For Now: Pilot Service
Next Up: Local Jobs
Context
Goals and Concerns
| Goals | Concerns with LSF |
| --- | --- |
| 30 000 to 50 000 nodes | 6 500 nodes max |
| Cluster dynamism | Adding/removing nodes requires reconfiguration |
| 10 to 100 Hz dispatch rate | Transient dispatch problems |
| 100 Hz query scaling | Slow query/submission response times |
Evaluating Alternatives to LSF
After HEPiX Fall 2013 – Ann Arbor:
- LSF 8/9 is claimed to scale only marginally higher
- SLURM showed scalability problems too
- Son of Grid Engine only briefly reviewed, as...
- ...HTCondor looked promising
Settling on Condor
After HEPiX Spring 2014 – Annecy:
- Condor scaled encouragingly
- Focus on functions (grid, fairshare, auth, AFS)
- Pleasant experience
Pilot Service
After HEPiX Fall 2014 – Lincoln:
- Grid submissions only
- Setting up a CREAM CE
- Reviewing security
→ Consolidating pilot service
For Now: Pilot Service
Setting Up an ARC CE (I)
- CREAM is heavy and opaque
- Heard good things about ARC
- Simple configuration in a single file
Setting Up an ARC CE (II)
Now we have:
- Condor setup accepting and running jobs
- User-to-VO/role mapping
- Static/dynamic information published to BDII
- HEPSPEC06 normalisation
- Puppetised configuration
Setting Up an ARC CE (III)
TODO:
- GLUE validation fails with ARC, waiting for fix
- Scale job wall time by worker node attributes
- Single queue to accept jobs
- Security review
And then evaluate HTCondor-CE?
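The TODO item on scaling job wall time by worker node attributes, together with the HEPSPEC06 normalisation mentioned earlier, can be illustrated with a minimal Python sketch. It assumes each worker node advertises a per-core HEPSPEC06 rating and that limits are expressed in seconds on a reference machine; the reference value and both function names are hypothetical and do not come from the slides.

```python
# Hypothetical reference power per core in HEPSPEC06 (HS06); not a CERN figure.
REFERENCE_HS06_PER_CORE = 10.0


def scaled_wall_time_limit(requested_seconds: float, node_hs06_per_core: float) -> float:
    """Turn a limit given in reference-machine seconds into the limit
    to enforce on a node with the given per-core HS06 rating."""
    if node_hs06_per_core <= 0:
        raise ValueError("node_hs06_per_core must be positive")
    # A faster node (higher HS06) needs less real time for the same work.
    return requested_seconds * REFERENCE_HS06_PER_CORE / node_hs06_per_core


def normalised_wall_time(used_seconds: float, node_hs06_per_core: float) -> float:
    """Turn wall time used on a node into reference-machine seconds,
    for example for accounting."""
    if node_hs06_per_core <= 0:
        raise ValueError("node_hs06_per_core must be positive")
    return used_seconds * node_hs06_per_core / REFERENCE_HS06_PER_CORE
```

The two functions are inverses of each other: a job allowed one reference hour gets half an hour on a node twice as fast as the reference, and that half hour is accounted as one reference hour.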
On the Condor Front (I)
- Fairshare groups and quotas
- Accounting group injected into job submission
TODO:
- Accounting (How do you store it?)
Monitoring
- Still our Ganglia instance, but also...
- ...central manager, schedds, workers in Kibana
Next Up: Local Jobs
AFS Token Management
- There is Kerberos ticket passing
- Forging valid AFS tokens from expired ones
- Risk of credential theft
- Independence from AFS
Job Submissions and Queries
Query a job no matter where it was submitted from:
- A schedd to answer all queries?
- Protection against heavy query loads
Or:
- <username>.condor.cern.ch aliases
- Job IDs hashed to schedds
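The idea of hashing job IDs (or user names) to schedds can be sketched as follows. This is only an illustration of deterministic key-to-schedd mapping, not the scheme CERN adopted: the host names are hypothetical, and plain modulo hashing is an assumption made for brevity.

```python
import hashlib

# Hypothetical schedd host names, for illustration only.
SCHEDDS = ["schedd01.example.org", "schedd02.example.org", "schedd03.example.org"]


def schedd_for(key: str, schedds=SCHEDDS) -> str:
    """Map a key (a job ID or a user name) to one schedd, deterministically."""
    if not schedds:
        raise ValueError("at least one schedd is required")
    # Use a stable hash: Python's built-in hash() is salted per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(schedds)
    return schedds[index]
```

Any client can then compute which schedd to query from the key alone, wherever the job was submitted. With plain modulo hashing, most keys move to a different schedd when the list changes; consistent hashing would reduce that churn.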
Group Membership Enforcement
- Submit on behalf of the group you belong to
- Post-submission checks?
- There might be plans upstream
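A post-submission check of this kind could look like the sketch below. It assumes a membership table mapping users to the groups they belong to and dot-separated sub-group names; the function name, the table and the group names are all hypothetical, and where the membership data would come from is left open.

```python
def may_submit_as(user: str, claimed_group: str, memberships: dict) -> bool:
    """Return True if `claimed_group` is one of the user's groups,
    or a dot-separated sub-group of one of them."""
    allowed = memberships.get(user, set())
    return any(
        claimed_group == group or claimed_group.startswith(group + ".")
        for group in allowed
    )


# Hypothetical membership data, for illustration only.
MEMBERSHIPS = {"alice": {"group_atlas"}, "bob": {"group_cms", "group_lhcb"}}
```

A job claiming a group for which `may_submit_as` returns False could then be held or removed after submission. Matching on `group + "."` rather than on a bare prefix stops `group_atlas2` from passing as a sub-group of `group_atlas`.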
Replacement for LSF Queues
- Interface between users and resources
- Opportunity to review what users should see
- ClassAds
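To show how ClassAd-style matchmaking can stand in for named queues, here is a toy model in plain Python. It is not the HTCondor ClassAd language or its Python bindings: ads are dictionaries, requirements are ordinary functions, and every attribute name is hypothetical. It only keeps the core idea that a job and a machine match when each one's requirements accept the other.

```python
def matches(job: dict, machine: dict) -> bool:
    """Symmetric match: each ad's requirements must accept the other ad."""
    return job["requirements"](machine) and machine["requirements"](job)


# Hypothetical job ad: states what the job needs instead of naming a queue.
job = {
    "max_wall_hours": 8,
    "requirements": lambda m: m["memory_mb"] >= 2048 and m["os"] == "linux",
}

# Hypothetical machine ad: states which jobs the node is willing to run.
machine = {
    "memory_mb": 4096,
    "os": "linux",
    "requirements": lambda j: j["max_wall_hours"] <= 24,
}

# The same node with too little memory for the job above.
small_machine = dict(machine, memory_mb=1024)
```

Here `matches(job, machine)` holds while `matches(job, small_machine)` does not. Users describe their job, and the resource side decides what it accepts.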
Worker Node Admission
- Explicit machine list for now...
- ...with the aim of becoming more dynamic
- Adding nodes easy, removing them not so much
High Availability/Scalability
- How many schedulers?
- Multiple pools?
- Hierarchical collectors?
Conclusion
Collaboration
- European HTCondor Site Admins Meeting 2014
- Enthusiastic chats with lead developers
- HTCondor Week
- Help from RAL
- Sharing with PIC too
Outlook
- Became interested in other CE solutions
- So did experiments
- Some progress in implementing fairshare
- Grid submissions for early adopters