![Page 2: Open Science Grid: More compute power Alan De Smet chtc@cs.wisc](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815294550346895dc0ba66/html5/thumbnails/2.jpg)
chtc.cs.wisc.edu
(CPU days each day averaged over one month)
CHTC Cores In Use
1,500
![Page 3: Open Science Grid: More compute power Alan De Smet chtc@cs.wisc](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815294550346895dc0ba66/html5/thumbnails/3.jpg)
(CPU days each day averaged over one month)
OSG Cores In Use
60,000
![Page 4: Open Science Grid: More compute power Alan De Smet chtc@cs.wisc](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815294550346895dc0ba66/html5/thumbnails/4.jpg)
Open Science Grid
![Page 5: Open Science Grid: More compute power Alan De Smet chtc@cs.wisc](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815294550346895dc0ba66/html5/thumbnails/5.jpg)
CHTC and OSG usage
(CPU days each day)
![Page 6: Open Science Grid: More compute power Alan De Smet chtc@cs.wisc](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815294550346895dc0ba66/html5/thumbnails/6.jpg)
Challenges Solved
We worry about all of this.
You don’t have to.
› Authentication: X.509 certificates, certificate authorities, VOMS
› Interface: Globus, GridFTP, Grid universe
› Validation: Linux distribution, glibc version, basic libraries
![Page 7: Open Science Grid: More compute power Alan De Smet chtc@cs.wisc](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815294550346895dc0ba66/html5/thumbnails/7.jpg)
Using OSG
› Before
universe = vanilla
executable = myjob
log = myjob.log
queue
![Page 8: Open Science Grid: More compute power Alan De Smet chtc@cs.wisc](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815294550346895dc0ba66/html5/thumbnails/8.jpg)
Using OSG
› After
universe = vanilla
executable = myjob
log = myjob.log
+WantGlidein = true
queue
![Page 9: Open Science Grid: More compute power Alan De Smet chtc@cs.wisc](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815294550346895dc0ba66/html5/thumbnails/9.jpg)
Challenge: Opportunistic
› OSG computers go away without notice
› Solutions
  Condor restarts automatically
  Sub-hour jobs
  Self-checkpointing
  Automated checkpointing
    • Condor’s standard universe
    • DMTCP http://dmtcp.sourceforge.net/
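As an illustration of the self-checkpointing option, here is a minimal Python sketch of a job that saves its own progress and resumes from it after a restart. The file name `checkpoint.json`, the helper names, and the trivial "work" loop are all hypothetical stand-ins, not part of the original slides.

```python
import json
import os
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical file name


def load_checkpoint(path=CHECKPOINT):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_step": 0, "total": 0}


def save_checkpoint(state, path=CHECKPOINT):
    """Write the state to a temp file, then rename it over the old one,
    so an eviction in the middle of a write cannot corrupt the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(steps, path=CHECKPOINT, every=10):
    """Do `steps` units of work, resuming from the checkpoint if present."""
    state = load_checkpoint(path)
    for step in range(state["next_step"], steps):
        state["total"] += step  # stand-in for the real computation
        state["next_step"] = step + 1
        if state["next_step"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state["total"]
```

If the machine goes away partway through, the next run picks up at the last saved step instead of step 0. For this to help on OSG, the checkpoint file also has to come back with the evicted job; in a Condor submit file that is typically arranged with `when_to_transfer_output = ON_EXIT_OR_EVICT`.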
![Page 10: Open Science Grid: More compute power Alan De Smet chtc@cs.wisc](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815294550346895dc0ba66/html5/thumbnails/10.jpg)
Challenge: Local Software
![Page 11: Open Science Grid: More compute power Alan De Smet chtc@cs.wisc](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815294550346895dc0ba66/html5/thumbnails/11.jpg)
Challenge: Local Software
› Bare-bones Linux systems
› Solution
  Bring everything with you
  CHTC provided MATLAB and R packages
    • RunDagEnv/mkdag
![Page 12: Open Science Grid: More compute power Alan De Smet chtc@cs.wisc](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815294550346895dc0ba66/html5/thumbnails/12.jpg)
Challenge: Erratic Failures
› Complex systems fail sometimes
› Solution
  Expect failures and automatically retry
  DAGMan for retries
  DAGMan POST scripts to detect problems
    • RunDagEnv/mkdag
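A POST script of the kind mentioned above could look roughly like the following Python sketch. It assumes the script receives the job's exit code and the path of an expected output file as arguments; the script name `post_check.py`, the function names, and the "output must be non-empty" rule are hypothetical choices for illustration, not what RunDagEnv/mkdag actually does.

```python
import os
import sys


def check_job(job_exit_code, output_path, min_bytes=1):
    """Return 0 if the job looks successful, 1 if it should count as failed."""
    if job_exit_code != 0:
        return 1  # the job itself reported a failure
    if not os.path.isfile(output_path):
        return 1  # the job "succeeded" but produced no output file
    if os.path.getsize(output_path) < min_bytes:
        return 1  # output exists but is suspiciously small
    return 0


def main(argv):
    """Expected arguments: <job exit code> <output file>."""
    if len(argv) != 3:
        return 2  # usage error
    return check_job(int(argv[1]), argv[2])


# When saved as a standalone script, finish with:
#     sys.exit(main(sys.argv))
```

In the DAG file, a node `A` (hypothetical name) would then be wired up with lines such as `SCRIPT POST A post_check.py $RETURN A.out` and `RETRY A 3`. A non-zero exit from the POST script marks the node as failed, which is what makes DAGMan retry it.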
![Page 13: Open Science Grid: More compute power Alan De Smet chtc@cs.wisc](https://reader035.vdocuments.us/reader035/viewer/2022062410/56815294550346895dc0ba66/html5/thumbnails/13.jpg)
Challenge: Bandwidth
› Solutions
  Only send what you need
  Store large, shared files in our web cache
  Read small amounts of data on the fly
    • Condor’s standard universe
    • Parrot http://www.cse.nd.edu/~ccl/software/parrot/
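To make "read small amounts of data on the fly" concrete, here is a small Python sketch that pulls a single fixed-size record out of a large file instead of reading the whole thing. It assumes the data is reachable through an ordinary file path (the kind of access tools like Parrot aim to provide for remote data); the 16-byte record size and the function name are hypothetical.

```python
RECORD_SIZE = 16  # hypothetical fixed record length, in bytes


def read_record(path, index, record_size=RECORD_SIZE):
    """Read only record number `index` from the file at `path`."""
    with open(path, "rb") as f:
        f.seek(index * record_size)  # jump straight to the record
        data = f.read(record_size)   # read just that record
    if len(data) != record_size:
        raise IndexError("record %d is past the end of the file" % index)
    return data
```

Only the requested bytes are read by the job, so a task that needs a handful of records from a multi-gigabyte input does not have to ship the entire file with every job.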