The Open Science Grid
Miron LivnyOSG Facility Coordinator
University of Wisconsin-Madison
Some History and Background …
U.S. “Trillium” Grid Partnership: Trillium = PPDG + GriPhyN + iVDGL
• Particle Physics Data Grid (PPDG): $18M (DOE) (1999–2006)
• GriPhyN: $12M (NSF) (2000–2005)
• iVDGL: $14M (NSF) (2001–2006)
Basic composition (~150 people):
• PPDG: 4 universities, 6 labs
• GriPhyN: 12 universities, SDSC, 3 labs
• iVDGL: 18 universities, SDSC, 4 labs, foreign partners
• Experiments: BaBar, D0, STAR, JLab, CMS, ATLAS, LIGO, SDSS/NVO
Complementarity of projects:
• GriPhyN: CS research, Virtual Data Toolkit (VDT) development
• PPDG: “end to end” Grid services, monitoring, analysis
• iVDGL: Grid laboratory deployment using the VDT
• Experiments provide frontier challenges
• Unified entity when collaborating internationally
From Grid3 to OSG
[Timeline: 11/03 → 2/05 → 4/05 → 9/05 → 12/05 → 2/06 → 4/06 → 7/06, marking the releases OSG 0.2.1, OSG 0.4.0, OSG 0.4.1 and OSG 0.6.0]
What is OSG?
The Open Science Grid is a US national distributed computing facility that supports scientific computing through an open collaboration of science researchers, software developers, and providers of computing, storage and networking. The OSG Consortium builds and operates the OSG, bringing together resources and researchers from universities and national laboratories, and cooperates with other national and international infrastructures to give scientists from a broad range of disciplines access to shared resources worldwide.
The OSG Project
Co-funded by DOE and NSF at an annual rate of ~$6M for five years, starting in FY07.
Currently the main stakeholders are from physics: the US LHC experiments, LIGO, the STAR experiment, the Tevatron Run II experiments, and astrophysics experiments.
A mix of DOE-lab and campus resources.
An active “engagement” effort adds new domains and resource providers to the OSG Consortium.
OSG Consortium
OSG Project Execution
OSG PI: Miron Livny
Executive Director: Ruth Pordes
Deputy Executive Directors: Rob Gardner, Doug Olson
Facility Coordinator: Miron Livny
Resources Managers: Paul Avery, Albert Lazzarini
Applications Coordinators: Torre Wenaus, Frank Würthwein
Education, Training, Outreach Coordinator: Mike Wilde
Engagement Coordinator: Alan Blatecky
Operations Coordinator: Leigh Grundhoefer
Software Coordinator: Alain Roy
Security Officer: Don Petravick
Plus the OSG Executive Board and External Projects.
(In the original chart, √ marks the roles that include provision of middleware.)
OSG Principles
Characteristics:
• Provide guaranteed and opportunistic access to shared resources.
• Operate a heterogeneous environment, both in the services available at any site and for any VO, with multiple implementations behind common interfaces.
• Interface to campus and regional grids.
• Federate with other national and international grids.
• Support multiple software releases at any one time.
Drivers - delivery to the schedule, capacity and capability of LHC and LIGO:
• Contributions to/from and collaboration with the US ATLAS, US CMS and LIGO software and computing programs.
• Support for and collaboration with other physics and non-physics communities.
• Partnerships with other grids, especially EGEE and TeraGrid.
• Evolution by deployment of externally developed new services and technologies.
Grid of Grids - from Local to Global
[Diagram: Community, Campus and National grids, nested from local to global]
Who are you?
A resource can be accessed by a user via the campus, community or national grid.
A user can access a resource with a campus, community or national grid identity.
OSG sites
[Chart: running (and monitored) “OSG jobs” in 06/06]
Example GADU run in 04/06
CMS Experiment - an exemplar community grid
Data & jobs move locally, regionally & globally within the CMS grid, transparently across grid boundaries from campus to global.
[Map: CMS sites spanning OSG (Caltech, Florida, MIT, Purdue, UCSD, UNL, Wisconsin in the USA) and EGEE (CERN, France, Germany, Italy, Taiwan, UK)]
The CMS Grid of Grids
Job submission: 16,000 jobs per day submitted across EGEE & OSG via the INFN Resource Broker (RB).
Data transfer: peak I/O of 5 Gbps from FNAL to 32 EGEE and 7 OSG sites.
• All 7 OSG sites have reached the 5 TB/day goal.
• 3 OSG sites (Caltech, Florida, UCSD) exceeded 10 TB/day.
CMS Xfer on OSG
All sites have exceeded 5 TB per day in June.
CMS Xfer FNAL to World
The US CMS center at FNAL transfers data to 39 sites worldwide in the CMS global transfer challenge. Peak transfer rates of ~5 Gbps are reached.
EGEE–OSG inter-operability
• Agree on a common Virtual Organization Management System (VOMS).
• Active joint security groups, leading to common policies and procedures.
• Condor-G interfaces to multiple remote job execution services (GRAM, Condor-C).
• File transfers use GridFTP; SRM V1.1 for managed storage access, with SRM V2.1 in test.
• Publish the OSG BDII to a shared BDII so Resource Brokers can route jobs across the two grids.
• Automate ticket routing between GOCs.
OSG Middleware Layering
Infrastructure layers:
• NSF Middleware Initiative (NMI): Condor, Globus, MyProxy.
• Virtual Data Toolkit (VDT) Common Services: NMI + VOMS, CEMon (common EGEE components), MonALISA, Clarens, AuthZ.
• OSG Release Cache: VDT + configuration, validation, VO management.
Above these sit the application stacks: ATLAS Services & Framework, CMS Services & Framework, LIGO Data Grid, CDF/D0 SamGrid & Framework, …
OSG Middleware Pipeline
• Domain science requirements.
• OSG stakeholders and middleware developer (joint) projects (Condor, Globus, EGEE etc.).
• Test on a “VO specific grid”.
• Integrate into a VDT release.
• Deploy on the OSG integration grid; test interoperability with EGEE and TeraGrid.
• Provision in an OSG release & deploy to OSG production.
The Virtual Data Toolkit
Alain Roy
OSG Software Coordinator
Condor Team, University of Wisconsin-Madison
What is the VDT?
A collection of software:
• Grid software (Condor, Globus and lots more)
• Virtual Data System (origin of the name “VDT”)
• Utilities
An easy installation:
• Goal: push a button, everything just works.
• Two methods:
– Pacman: installs and configures it all
– RPM: installs some of the software, no configuration
A support infrastructure.
How much software?
Who makes the VDT?
The VDT is a product of the Open Science Grid (OSG); the VDT is used on all OSG grid sites.
OSG is new, but the VDT has been around since 2002.
Originally, the VDT was a product of GriPhyN/iVDGL; the VDT was used on all Grid2003 sites.
Who makes the VDT?
Miron Livny
Alain Roy
Tim Cartwright
Andy Pavlo
1 mastermind + 3 FTEs
Who uses the VDT?
Open Science Grid
LIGO Data Grid
LCG - the LHC Computing Grid, from CERN
EGEE - Enabling Grids for E-sciencE
Why should you care?
The VDT gives insight into the technical challenges of building a large grid:
• What software do you need?
• How do you build it?
• How do you test it?
• How do you deploy it?
• How do you support it?
What software is in the VDT?
Security: VOMS (VO membership), GUMS (local authorization), mkgridmap (local authorization), MyProxy (proxy management), GSI SSH, CA CRL updater
Monitoring: MonALISA, gLite CEMon
Accounting: OSG Gratia
Job management: Condor (including Condor-G & Condor-C), Globus GRAM
Data management: GridFTP (data transfer), RLS (replica location), DRM (storage management), Globus RFT
Information services: Globus MDS, GLUE schema & providers
Note: the type, quantity and variety of software matter more to this talk than the specific packages named.
What software is in the VDT? (continued)
Client tools: Virtual Data System, SRM clients (V1 and V2), UberFTP (a GridFTP client)
Developer tools: PyGlobus, PyGridWare
Testing: NMI Build & Test, VDT tests
Support software: Apache Tomcat, MySQL (with MyODBC), non-standard Perl modules, Wget, Squid, Logrotate, configuration scripts
And more!
Building the VDT
• We distribute binaries: expecting everyone to build from source is impractical.
• It is essential to be able to build on many platforms, and to replicate builds.
• We build all binaries with the NMI Build and Test infrastructure.
Building the VDT
[Diagram: sources (CVS) and contributed binaries are patched, then built, tested and packaged on the NMI Build & Test Condor pool (70+ computers); the resulting VDT binaries are published as RPM downloads and as a Pacman cache, which users install and test]
Testing the VDT
Every night, we test:
• the full VDT install
• subsets of the VDT
• the current release - you might be surprised how often things break after release!
• the upcoming release
• on all supported platforms
“Supported” means “we test it every night”; the VDT works on some unsupported platforms.
We care about interactions between the software.
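A nightly matrix of this kind - every supported platform against the full install, subsets, the current release and the upcoming release - can be sketched as follows. The platform and configuration names, and the harness itself, are illustrative inventions, not the actual VDT test infrastructure:

```python
# Hypothetical sketch of a nightly test matrix in the spirit of the VDT's
# testing. Names are invented; this is not the real VDT harness.
from itertools import product

PLATFORMS = ["RHAS 3", "RHAS 4", "Debian 3.1", "Fedora Core 4"]
CONFIGS = ["full-install", "client-subset", "current-release", "upcoming-release"]

def run_test(platform, config):
    """Stand-in for a real install-and-validate run; returns pass/fail."""
    return True  # a real harness would install the VDT here and run its checks

def nightly_report():
    # Exercise every platform x configuration pair and summarize failures.
    results = {(p, c): run_test(p, c) for p, c in product(PLATFORMS, CONFIGS)}
    failures = [k for k, ok in results.items() if not ok]
    lines = [f"{len(results)} tests, {len(failures)} failures"]
    lines += [f"FAIL: {p} / {c}" for p, c in failures]
    return "\n".join(lines)

print(nightly_report())
```

A real harness would replace `run_test` with an actual install on the target machine and mail the report out each morning, as the "daily reminder" slide describes.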
Supported Platforms
RedHat 7, RedHat 9, Debian 3.1, RHAS 3, RHAS 3/ia64, RHAS 3/x86-64, RHAS 4, Scientific Linux 3, Fedora Core 3, Fedora Core 4, Fedora Core 4/x86-64, ROCKS 3.3, SuSE 9/ia64
• The number of Linux distributions grows constantly, and they have important differences.
• People ask for new platforms, but rarely ask to drop platforms.
• System administration for heterogeneous systems is a lot of work.
Tests: results are published on the web and sent via email - a daily reminder!
Deploying the VDT
We want to support root and non-root installations, we want to assist with configuration, and we want it to be simple.
Our solution: Pacman
• Developed by Saul Youssef, BU
• Downloads and installs with one command
• Asks questions during install (optionally)
• Does not require root
• Can install multiple versions at the same time
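The installer properties listed above (one command, no root, side-by-side versions) can be illustrated with a small sketch. This is not Pacman's code or interface; the function, version numbers and layout are invented to show the design point that per-version directories under a user-writable prefix let versions coexist without root:

```python
# Illustrative sketch (not Pacman itself) of a no-root, multi-version install.
import os
import tempfile

def install(package, version, prefix):
    """Install each version into its own subdirectory so versions coexist."""
    target = os.path.join(prefix, f"{package}-{version}")
    os.makedirs(target, exist_ok=False)  # re-installing the same version fails loudly
    # a real installer would now fetch, unpack, and configure the software
    return target

prefix = tempfile.mkdtemp()           # user-writable prefix: no root required
a = install("vdt", "1.3.10", prefix)  # hypothetical version numbers
b = install("vdt", "1.3.11", prefix)  # the older version remains untouched
print(sorted(os.listdir(prefix)))     # both versions present side by side
```

Keeping each version in its own directory is also what makes "preserve the old configuration while making big changes" tractable, a challenge a later slide returns to.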
Challenges we struggle with
How should we smoothly update a production service?
• In-place vs. on-the-side.
• Preserve the old configuration while making big changes.
• As easy as we try to make it, it still takes hours to fully install and set up from scratch.
How do we support more platforms?
• It's a struggle to keep up with the onslaught of Linux distributions.
• Mac OS X? Solaris?
More challenges
Improving testing: we care about interactions between the software: “When using a VOMS proxy with Condor-G, can we run a GT4 job with GridFTP transfer, keeping the proxy in MyProxy, while using PBS as the backend batch system…”
Some people want native packaging formats: RPM, Deb.
What software should we have? New storage management software.
One more challenge
Hiring
• We need high-quality software developers.
• Creating the VDT involves all aspects of software development.
• But developers prefer writing new code to:
– writing lots of little bits of code
– thorough testing
– lots of debugging
– user support
Where do you learn more?
http://vdt.cs.wisc.edu
Support:
• Alain Roy: [email protected]
• Miron Livny: [email protected]
• Official support: [email protected]
Security Infrastructure
Identity: X.509 certificates.
• OSG is a founding member of the US TAGPMA.
• DOEGrids provides script utilities for bulk requests of host certs, CRL checking, etc.
• The VDT downloads CA information from the IGTF.
Authentication and authorization use VOMS extended attribute certificates.
• DN → account mapping is done at the site (multiple CEs, SEs) by GUMS.
• Standard authorization callouts to PRIMA (CE) and gPlazma (SE).
Security Infrastructure
The security process is modeled on NIST procedural controls, starting from an inventory of the OSG assets:
• Management: risk assessment, planning, service auditing and checking.
• Operational: incident response, awareness and training, configuration management.
• Technical: authentication and revocation, auditing and analysis.
End-to-end trust in the quality of code executed on a remote CPU - signatures?
User and VO Management
VO registers with the Operations Center:
• Provides the URL of its VOMS service, to be propagated to the sites.
• Several VOMSes are shared with EGEE as part of WLCG.
Users register through VOMRS or a VO administrator:
• The user is added to the VOMS of one or more VOs.
• The VO is responsible for users signing the AUP, and for VOMS service support.
Site registers with the Operations Center:
• Signs the service agreement.
• Decides which VOs to support (striving for default admit).
• Populates GUMS from the VOMSes of all supported VOs; chooses an account/UID policy for each VO & role.
VOs and sites provide Support Center contacts and joint operations.
For WLCG: the US ATLAS and US CMS Tier-1s are directly registered with WLCG; other support centers are propagated through the OSG GOC to WLCG.
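The GUMS step above - mapping a certificate DN plus VO and role to a site-local account - can be sketched like this. The policy table, account names and function are invented for illustration and are not the GUMS API:

```python
# Hedged sketch of the kind of DN -> account mapping GUMS performs at a site.
# The site's policy keys on (VO, role); the table below is invented.
def map_to_account(dn, vo, role, policy):
    """Return the local account for a (VO, role) pair, or None to deny."""
    return policy.get((vo, role))

policy = {
    ("cms", "production"): "cmsprod",  # e.g. one shared account per VO role
    ("cms", "user"): "cmsuser",
    ("ligo", "user"): "ligo",
}

acct = map_to_account("/DC=org/DC=doegrids/OU=People/CN=Jane Doe",
                      "cms", "user", policy)
print(acct)
```

Keying the policy on (VO, role) rather than on the individual DN is what lets a site admit a whole VO by default while still applying a per-role UID policy.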
Operations and User Support
• Virtual Organization (VO): a group of one or more researchers.
• Resource Provider (RP): operates Compute Elements and Storage Elements.
• Support Center (SC): provides support for one or more VOs and/or RPs.
• VO support centers provide end-user support, including triage of user-related trouble tickets.
• Community support: a volunteer effort that provides an SC for RPs and for VOs without their own SC, plus a general help discussion mailing list.
Operations Model
Real support organizations often play multiple roles.
Lines represent communication paths and, in our model, agreements; we have not progressed very far with agreements yet.
Gray shading indicates that OSG Operations is composed of effort from all the support centers.
OSG Release Process
Applications → Integration → Provision → Deploy
Integration Testbed (ITB): 15–20 sites. Production (OSG): 50+ sites, including São Paulo, Taiwan and S. Korea.
Integration Testbed
As reported in the GridCat status catalog: per-site ITB release, services, facility and status; Ops map; Tier-2 sites.
Release Schedule
[Chart: functionality vs. time, 01/06 – 9/08]
OSG 0.4.0 → OSG 0.4.1 → OSG 0.6.0 → OSG 0.8.0 → OSG 1.0.0!
Incremental updates (minor releases) between each major release.
Milestones along the timeline: SC4, CMS CSA06, WLCG service commissioned, ATLAS cosmic ray run, Advanced LIGO.
OSG Release Timeline
[Timeline, 2/05 – 7/06:
Integration releases: ITB 0.1.2, ITB 0.1.6, ITB 0.3.0, ITB 0.3.4, ITB 0.3.7, ITB 0.5.0
Production releases: OSG 0.2.1, OSG 0.4.0, OSG 0.4.1, OSG 0.6.0]
Deployment and Maintenance
• Distribute software through the VDT and OSG caches.
• Progress technically via weekly VDT office hours - problems, help, planning - fed from multiple sources (Ops, Integration, VDT support, mail, phone).
• Publish plans and problems through the VDT “to do” list, the Integration Twiki, and ticket systems.
• Critical updates and patches follow standard operating procedures.
Release Functionality
OSG 0.6 (Fall 2006):
• Accounting.
• Squid (web caching in support of software distribution + database information).
• SRM V2 + AuthZ.
• CEMon ClassAd-based resource selection.
• Support for MDS-4.
OSG 0.8 (Spring 2007):
• VM-based Edge Services.
• Just-in-time job scheduling; pull-mode Condor-C.
• Support for sites to run pilot jobs and/or glide-ins, using gLExec for identity changes.
OSG 1.0: end of 2007.
Inter-operability with Campus grids
FermiGrid is an interesting example of the challenges we face when making the resources of a campus grid (in this case, a DOE laboratory) accessible to the OSG community.
OSG Principles
Characteristics:
• Provide guaranteed and opportunistic access to shared resources.
• Operate a heterogeneous environment, both in the services available at any site and for any VO, with multiple implementations behind common interfaces.
• Interface to campus and regional grids.
• Federate with other national and international grids.
• Support multiple software releases at any one time.
Drivers - delivery to the schedule, capacity and capability of LHC and LIGO:
• Contributions to/from and collaboration with the US ATLAS, US CMS and LIGO software and computing programs.
• Support for and collaboration with other physics and non-physics communities.
• Partnerships with other grids, especially EGEE and TeraGrid.
• Evolution by deployment of externally developed new services and technologies.
OSG Middleware Layering
Infrastructure layers:
• NSF Middleware Initiative (NMI): Condor, Globus, MyProxy.
• Virtual Data Toolkit (VDT) Common Services: NMI + VOMS, CEMon (common EGEE components), MonALISA, Clarens, AuthZ.
• OSG Release Cache: VDT + configuration, validation, VO management.
Above these sit the application stacks: ATLAS Services & Framework, CMS Services & Framework, LIGO Data Grid, CDF/D0 SamGrid & Framework, …
Summary
• The OSG facility opened July 22nd, 2005.
• The facility is under steady use: ~2,000–3,000 jobs at all times; mostly HEP, with occasionally large Bio/Eng/Med loads; moderate other physics (Astro/Nuclear); LIGO is expected to ramp up.
• The OSG project: a 5-year proposal to DOE & NSF, funded starting 9/06, covering the facility plus improve/expand/extend/interoperate work and E&O.
• Off to a running start … but lots more to do:
– Routinely exceeding 1 Gbps at 3 sites; scale by ×4 by 2008, with many more sites.
– Routinely exceeding 1,000 running jobs per client; scale by at least ×10 by 2008.
– Have reached a 99% success rate for 10,000-jobs-per-day submission; need to reach this routinely, even under heavy load.
EGEE–OSG inter-operability
• Agree on a common Virtual Organization Management System (VOMS).
• Active joint security groups, leading to common policies and procedures.
• Condor-G interfaces to multiple remote job execution services (GRAM, Condor-C).
• File transfers use GridFTP; SRM V1.1 for managed storage access, with SRM V2.1 in test.
• Publish the OSG BDII to a shared BDII so Resource Brokers can route jobs across the two grids.
• Automate ticket routing between GOCs.
What is FermiGrid?
• Integrates resources across most (soon all) owners at Fermilab.
• Supports jobs from Fermilab organizations running on any accessible campus (FermiGrid) and national (Open Science Grid) resources.
• Supports jobs from the OSG being scheduled onto any Fermilab sites.
• Provides unified and reliable common interfaces and services through the FermiGrid gateway, including security, job scheduling, user management and storage.
• More information is available at http://fermigrid.fnal.gov
Job Forwarding and Resource Sharing
• The gateway currently interfaces to 5 Condor pools with diverse file systems and >1,000 job slots; plans call for growth to 11 clusters (8 Condor, 2 PBS and 1 LSF).
• Job scheduling policies and in-place sharing agreements allow fast response to changes in the resource needs of Fermilab and OSG users.
• The gateway provides the single bridge between the OSG wide-area distributed infrastructure and the FermiGrid local sites; it consists of a Globus gatekeeper and a Condor-G, while each cluster has its own Globus gatekeeper.
• Storage and job execution policies are applied through site-wide managed security and authorization services.
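The gateway's forwarding decision can be sketched as a simple match between a job's VO and the clusters willing to run it, with free slots as the tie-break. Cluster names, VO sets and numbers below are invented, and real FermiGrid scheduling policy is certainly richer:

```python
# Invented sketch of a gateway forwarding choice across per-cluster pools.
def forward(vo, clusters):
    """clusters: list of (name, supported_vos, free_slots).
    Pick the eligible cluster with the most free slots, or None to reject."""
    eligible = [(name, free) for name, vos, free in clusters if vo in vos]
    if not eligible:
        return None
    return max(eligible, key=lambda t: t[1])[0]

clusters = [
    ("cdf",    {"cdf"},        120),
    ("dzero",  {"dzero"},       40),
    ("shared", {"cdf", "osg"}, 300),  # the shared pool also admits OSG VOs
    ("cms",    {"cms"},        500),
]
print(forward("osg", clusters))  # only the shared pool admits OSG here
```

The point of the single bridge is that an OSG user sees one gatekeeper; policy like the table above decides which local cluster actually runs the job.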
Access to FermiGrid
[Diagram: OSG general users, Fermilab users and OSG “agreed” users enter through the FermiGrid Gateway (a Globus gatekeeper, GT-GK, plus a Condor-G), which forwards jobs to the per-cluster gatekeepers in front of the CDF, DZero, CMS and shared Condor pools]
GLOW: UW Enterprise Grid
• Condor pools at various departments, integrated into a campus-wide grid:
– the Grid Laboratory of Wisconsin (GLOW)
• Older private Condor pools at other departments:
– ~1000 ~1 GHz Intel CPUs at CS
– ~100 ~2 GHz Intel CPUs at Physics
– …
• Condor jobs flock from on-campus and off-campus to GLOW.
• Excellent utilization, especially when the Condor Standard Universe is used:
– preemption, checkpointing, job migration
Grid Laboratory of Wisconsin
A 2003 initiative funded by NSF/UW. Six GLOW sites:
• Computational Genomics, Chemistry
• Amanda, IceCube, Physics/Space Science
• High Energy Physics/CMS, Physics
• Materials by Design, Chemical Engineering
• Radiation Therapy, Medical Physics
• Computer Science
GLOW phases 1–2 plus non-GLOW-funded nodes total ~1000 Xeons + 100 TB of disk.
How does it work?
• Each of the six sites manages a local Condor pool with its own collector and matchmaker.
• Through the High Availability Daemon (HAD) service offered by Condor, one of these matchmakers is elected to manage all GLOW resources.
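A minimal sketch of such an election, with invented site names and priorities: among the matchmakers that are still reachable, the one with the highest fixed priority wins, so exactly one manages all GLOW resources at a time. This illustrates the failover idea only, not Condor's HAD protocol:

```python
# Invented sketch of a HAD-style matchmaker election across six sites.
def elect(candidates, alive):
    """candidates: list of (site, priority); alive: set of reachable sites.
    Return the highest-priority live site, or None if none are reachable."""
    live = [c for c in candidates if c[0] in alive]
    return max(live, key=lambda c: c[1])[0] if live else None

sites = [("cs", 6), ("physics", 5), ("cheme", 4),
         ("medphys", 3), ("lmcg", 2), ("icecube", 1)]

print(elect(sites, {"cs", "physics", "cheme"}))  # cs wins while it is up
print(elect(sites, {"physics", "cheme"}))        # cs down: physics takes over
```

Fixed priorities make the election deterministic: when the elected matchmaker returns, leadership reverts without ever leaving two active managers.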
GLOW Deployment
• GLOW is fully commissioned and in constant use.
– CPU:
• 66 GLOW + 50 ATLAS + 108 other nodes @ CS
• 74 GLOW + 66 CMS nodes @ Physics
• 93 GLOW nodes @ ChemE
• 66 GLOW nodes @ LMCG, MedPhys, Physics
• 95 GLOW nodes @ MedPhys
• 60 GLOW nodes @ IceCube
• Total CPU: ~1339
– Storage:
• Head nodes at all sites
• 45 TB each @ CS and Physics
• Total storage: ~100 TB
• GLOW resources are used at the 100% level - the key is having multiple user groups.
• GLOW continues to grow.
GLOW Usage
• GLOW nodes are always running hot!
– CS + guests: largest user; serving guests - many cycles delivered to guests!
– ChemE: largest community
– HEP/CMS: production for the collaboration; production and analysis for local physicists
– LMCG: Standard Universe
– Medical Physics: MPI jobs
– IceCube: simulations
GLOW Usage 3/04 – 9/05
Over 7.6 million CPU-hours (865 CPU-years) served!
• Takes advantage of “shadow” jobs.
• Takes advantage of checkpointing jobs.
• Leftover cycles are available for “others”.
Example Uses
• ATLAS: over 15 million proton collision events simulated, at 10 minutes each.
• CMS: over 70 million events simulated, reconstructed and analyzed (total ~10 minutes per event) in the past year.
• IceCube / Amanda: data filtering used 12 years of GLOW CPU in one month.
• Computational Genomics: Prof. Shwartz asserts that GLOW has opened up a new paradigm of work patterns in his group - they no longer think about how long a particular computational job will take, they just do it.
• Chemical Engineering: students do not know where the computing cycles are coming from - they just do it - the largest user group.
Open Science Grid & GLOW
• OSG jobs can run on GLOW:
– The gatekeeper routes jobs to the local Condor cluster.
– Jobs flock campus-wide, including to the GLOW resources.
– The dCache storage pool is also a registered OSG storage resource.
– Beginning to see some use.
• Now actively working on rerouting GLOW jobs to the rest of OSG:
– Users do NOT have to adapt to the OSG interface or separately manage their OSG jobs.
– New Condor code development.
www.cs.wisc.edu/~miron
Elevating from GLOW to OSG
[Diagram: a schedd with a “Schedd On The Side” - a specialized scheduler operating on the schedd's jobs. The job queue holds Job 1 … Job 5, plus Job 4*, a copy of Job 4]
The Grid Universe
[Diagram: a schedd submits many “RandomSeed” jobs through a gatekeeper at site X to remote startds, alongside vanilla jobs at site X]
• Easier to live with private networks.
• May use non-Condor resources.
• Restricted Condor feature set (e.g. no standard universe over grid).
• Must pre-allocate jobs between the vanilla and grid universes.
Dynamic Routing Jobs
[Diagram: the schedd runs “RandomSeed” jobs on local startds (vanilla, site X) while a “Schedd On The Side” routes others through gatekeepers to sites Y and Z]
• Dynamic allocation of jobs between the vanilla and grid universes.
• Not every job is appropriate for transformation into a grid job.
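The routing decision can be sketched as a filter over the job queue: only suitable idle vanilla jobs are mirrored into the grid universe (the “Job 4*” copy), while the original stays queued so it can still run locally. The suitability criteria below are invented for illustration, except that standard-universe jobs must stay local, as the slide notes:

```python
# Sketch of the "schedd on the side" routing idea; criteria are invented.
def routable(job):
    # The standard universe is unavailable over grid, so those jobs stay local;
    # running jobs and jobs needing the local file system are also skipped.
    return (job["universe"] == "vanilla"
            and job["status"] == "idle"
            and not job["needs_local_fs"])

def route(queue):
    """Return grid-universe copies (like 'Job 4*') of the suitable jobs."""
    return [dict(job, universe="grid", copy_of=job["id"])
            for job in queue if routable(job)]

queue = [
    {"id": 1, "universe": "standard", "status": "idle",    "needs_local_fs": False},
    {"id": 2, "universe": "vanilla",  "status": "running", "needs_local_fs": False},
    {"id": 3, "universe": "vanilla",  "status": "idle",    "needs_local_fs": True},
    {"id": 4, "universe": "vanilla",  "status": "idle",    "needs_local_fs": False},
]
print([j["copy_of"] for j in route(queue)])  # only job 4 qualifies
```

Because the routed job is a copy rather than a move, whichever instance starts first wins, which is what gives the dynamic allocation between universes.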
Final Observation …
A production grid is the product of a complex interplay of many forces: resource providers, users, software providers, hardware trends, commercial offerings, funding agencies, the culture of all parties involved, …