CHEP 2003 Summary
Grid Architecture, Infrastructure, & Middleware
Monitoring & Security
Andrew Hanushevsky
Stanford Linear Accelerator Center
March 25, 2003 2: CHEP 2003
Legal Disclaimer
This summary is from one perspective
  It is not representative of any particular view
    Other than the presenter's
This summary is not warranted for any purpose whatsoever
  Participants assume all direct and indirect (consequential or inconsequential) damages
Do you want to stay?
Grid Deployment
Track I talks referenced grid “deployment”
  Deployment has many meanings
Minimally, if you have it working it had better be usable
  Is it production ready?
Production Grids
LCG Experience Suggests It Is Difficult
  Packaging, Installation, Configuration, & Validation Issues
“These issues (and more) make the difference between the research project ending with a demo and the product to be used for production.”
  -- Zdenek Sekera
Assume the LCG (T#184) interpretation of production
  Harsh, but we need a benchmark
What is “production quality”?
It is all of the following, in no particular order:
  availability 24 x 7
  performance
  stability, robustness
  user friendliness
  maintainability
  user support
From LCG T#184
So Where Are We?
Let’s take a look at the presented “grid” projects in alphabetical order
  From Grid to Grid-Like
Disclaimer! This is not representative of all such projects
AliEn (M#253)
Distributed environment with Grid interface
SASL (includes GSI) EDG-compatible authentication
Distributed RDBMS-based file catalog
Condor-like job scheduling
Attempts to unify grid infrastructures
Adopted by MammoGrid (M#66)
Amanda (M#110)
Ostensibly production ready
  Condor + Bypasses + Local Tools (Grid Navigator)
Uses central s/w and data repositories
Runs a specific application software suite
Plan to integrate Globus middleware as it matures
DIRAC (M#253)
Distributed environment
  Essentially a roll-your-own grid-like solution
Interface to EDG now in test
  EDG stability considered problematic
Successfully deployed on 17 sites
EDG
Workload Management WP1 (M#132 & 137)
Deployed for 18 months
  Still pre-production stage
  Various problems in reliability & scalability
Numerous improvements planned
  DAGMan integration
  Grid Accounting
  Resource reservation & co-allocation
    • Globus GARA Approach
EDG (continued)
Data Management WP2 (T#249 & 490)
Basic use cases satisfied
  Not proven in a “real user environment”
  Pre-production
Numerous additions planned
  Logical collection
  Enhanced security
    Authorization and delegation
  OGSA direction with future compliance
NorduGrid (M#109)
Modified/Extended Globus + EDG RLS
Pre-production stage
Additional EDG integration as stability improves
Web Services (OGSA) plans
SAM (T#335)
Successful for D0 and CDF
Work under way to integrate with grid middleware
  Production D0 release of SAMGrid (JIM+Condor-G) scheduled for April
One of the arguably successful grid-like projects
  Largely dealing with data management issues
STAR (T#442)
Distributed environment
  Essentially a roll-your-own grid-like solution
Interface to Condor-G
  Uses LBL HRM/DRM
Successful (but limited) deployment
  NERSC & BNL
Storage Resource Broker (T#211)
Successful deployment across multiple fields
Work under way to integrate with Globus data management
One of the arguably successful grid-like projects
  Limited to data management
The Successes
Few projects have achieved “production” status
Those which have are focused and grid-like
  SAM, SRB
  Soon to follow: AliEn, DIRAC, & STAR
It is not clear why this is so
  Historical timeline?
  Immediate need for results?
  Funding model?
  Grid protocols in flux (e.g., Globus 2 vs Globus 3)?
  Open software/collaboration issues?
  Sociological phenomena?
Fortunately, many plan to integrate with the “standard” grid
  Time will tell….
The Fast Trackers
These projects have incorporated only some grid middleware
  Amanda & NorduGrid
Many difficult issues have been avoided, but….
Are we entering the OSI model of development?
  Pick and choose from a bag of protocols & tools
  This does not bode well for interoperability
The Simmering
“These” projects have embraced the grid
  EDG (parallels and derivatives)
Problems not being avoided
  Adopted the long range view (2 or more years)
Will this be to the benefit of the HEP community?
  Depends on your view of next generation computing
  It seems that all projects are hedging their bets
You wonder where we would be if all the hundreds of current FTEs were focused on making this model really work
State of Security
Three dominant themes
  Private Key Management
    KCA (T#422), VSC etc. (T#81)
  Virtual Organization Management
    VOMS (T#317) & GUMS (T#363)
  Authorization (a.k.a. Access Control)
    GACL (T#190), SAZ (T#423), Akenti (T#426), CAS (T#441, 518)
Security Convergence
Other than X.509 there is little common ground
But does there need to be any common ground?
  Key management is a matter of trust policy
  VO administration is a site or multi-lateral prerogative
  Authorization is largely a local issue
It seems that if you can agree on the credentials (i.e., X.509 + endorsements), the rest is relegated to collaboration policy, irrespective of implementation
This appears to be the direction
  Even if it’s not obvious at the moment
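The split described on this slide (shared credentials, local authorization) can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Credential` fields stand in for an X.509 subject plus VO endorsements, and the policy dictionaries are an invented format, not the format used by GACL, SAZ, Akenti, CAS, or any other presented tool. No certificate parsing or signature checking is attempted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """Hypothetical stand-in for an already-verified X.509 identity plus endorsements."""
    subject_dn: str
    vo: str
    roles: frozenset = frozenset()


def authorize(cred, action, site_policy):
    """Apply one site's own policy to a commonly understood credential.

    site_policy maps VO name -> {role or "*": set of allowed actions}.
    The "*" entry applies to every member of the VO.
    """
    vo_rules = site_policy.get(cred.vo, {})
    if action in vo_rules.get("*", ()):
        return True
    return any(action in vo_rules.get(role, ()) for role in cred.roles)


# Two hypothetical sites accept the same credential but decide differently.
user = Credential("/O=Example/CN=Some User", "vo_x", frozenset({"production"}))
site_one = {"vo_x": {"*": {"read"}, "production": {"write"}}}
site_two = {"vo_x": {"*": {"read"}}}
```

Both sites understand the same credential, yet `authorize(user, "write", site_one)` and `authorize(user, "write", site_two)` can legitimately differ, which is the sense in which authorization stays a local issue.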
Grid Monitoring
There is much activity
  Much of it overlapping
  BOSS (M#84), GMA (M#403), GridMonitor (M#321), MonALISA (M#103), PerfMC (M#522), & R-GMA (M#407)
Some convergence
  Minimum set of events
  Format (XML, yet no “lingua franca” agreement)
This is an area to watch!
  GGF is likely the stomping ground for agreement
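To make the "XML, yet no lingua franca" point concrete, here is a small sketch in which two invented monitoring tools report the same measurement in well-formed XML but with different vocabularies. The element and attribute names are assumptions made up for illustration; they are not the formats of BOSS, GMA, MonALISA, R-GMA, or any other presented system.

```python
import xml.etree.ElementTree as ET

# The same measurement as two hypothetical tools might encode it.
TOOL_A = '<event host="node01" metric="cpu_load" value="0.75"/>'
TOOL_B = ('<sample><node>node01</node><name>cpu_load</name>'
          '<reading>0.75</reading></sample>')


def from_tool_a(xml_text):
    """Adapter for the attribute-based (hypothetical) vocabulary."""
    elem = ET.fromstring(xml_text)
    return {"host": elem.get("host"),
            "metric": elem.get("metric"),
            "value": float(elem.get("value"))}


def from_tool_b(xml_text):
    """Adapter for the element-based (hypothetical) vocabulary."""
    elem = ET.fromstring(xml_text)
    return {"host": elem.findtext("node"),
            "metric": elem.findtext("name"),
            "value": float(elem.findtext("reading"))}
```

Both documents parse with the same XML library, but a consumer still needs one adapter per producer; a shared syntax alone does not remove that cost.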
The Ultimate Highlights
Virtual Data
XML
Distributed File Systems
Job Scheduling
Peer to Peer Computing
“The” Award
The Innovation Most At Risk
Virtual Data (T#106 & 114)
Great concept at technological mercy
The OptIPuter is the menace. Consider….
  Unlimited bandwidth
  Ever decreasing storage costs
  Constant software changes
  Sociological problems of capturing the processing path
Together these may make VD untenable
Things to Watch For I
XML
  This is rapidly becoming the common syntax
  Yet little effort in developing a common language
    Assumption, perhaps misguided, that WSDL repositories will address the problem
    • Diamonds (iKnow) architecture (Java RMI + JINI)
Distributed Grid File Systems
  Minimal data movement with global access
  AlienFS (R#254)
  There are many others that were not presented
Things to Watch For II
Job to Data Scheduling
  Algorithms to place a job near the data
  Minimize data movement
Peer To Peer Computing
  Marxist scheduling aiming for 100% utilization
  Not yet addressed by current grid architectures
  Ad hoc protocols
  Subversive in that this may be the “real” next thing
Augernome (R#293)
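The job-to-data idea on this slide can be sketched as a greedy placement: send the job to the free site that already holds the most input bytes, so the least data has to move. This is one simple heuristic under stated assumptions, not the algorithm of any presented project; the function name, the catalog layout, and the site names are all hypothetical.

```python
from collections import defaultdict


def place_job(input_files, replica_catalog, file_sizes, free_slots):
    """Return (site, bytes_to_move) for a job, or (None, None) if no site is free.

    input_files     : list of distinct file names the job reads
    replica_catalog : file name -> set of sites holding a replica
    file_sizes      : file name -> size in bytes
    free_slots      : site -> number of idle job slots
    """
    # Total input bytes already resident at each site.
    local_bytes = defaultdict(int)
    for name in input_files:
        for site in replica_catalog.get(name, ()):
            local_bytes[site] += file_sizes[name]

    candidates = [site for site, slots in free_slots.items() if slots > 0]
    if not candidates:
        return None, None

    # Prefer the most local data; break ties by the number of free slots.
    best = max(candidates, key=lambda s: (local_bytes[s], free_slots[s]))
    total = sum(file_sizes[name] for name in input_files)
    return best, total - local_bytes[best]


# Hypothetical example data.
catalog = {"run1.root": {"site_a", "site_b"},
           "run2.root": {"site_b"},
           "run3.root": {"site_c"}}
sizes = {"run1.root": 10, "run2.root": 20, "run3.root": 5}
slots = {"site_a": 4, "site_b": 0, "site_c": 2}
```

In this example `site_b` holds the most data but has no free slot, so `place_job(list(sizes), catalog, sizes, slots)` falls back to `site_a`. A real scheduler would also weigh network cost, queue depth, and replica freshness.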
Summarizer’s Award
The project that makes innovative yet practical use of existing grid protocols
Grid Brick (R#493)
  Parallel ROOT-based query using Globus scheduling
  Uncomplicated and practical needs-based approach
  It’s so obvious you wonder why you didn’t do it first
  It works within a standard grid environment!
  Load balancing and fault tolerance to be explored
Conclusions
Grid efforts are still meandering
  Great for innovation
  Dismal for standardization
Security is a bright spot
  Rapid convergence on authentication issues
  Authorization is more fuss than fury
  There is a light at the end of the tunnel
Monitoring situation is disappointing
  The need is recognized but there is no agreement on how to proceed
  Cross-grid monitoring is in serious jeopardy