C. Loomis – Status of European DataGrid – May 23, 2002 – 1
Status of European DataGrid
Charles Loomis
CNRS/LAL
NorduGrid Workshop
May 23, 2002
Introduction & Outline
European DataGrid
3-year EU-funded project
Goals:
—develop grid middleware
—deploy onto working testbed
—demonstrate grid technology with working applications
Strong application component unique!
Current Software
—Machine Tour
—Status
Testbed
—Deployed software
—Present & Future Sites
Near-term Developments
—EDG v1.2
—Latest Globus Release
—EDG License
Longer-term Developments
—Testing & Support Infrastructure
—Enhanced EDG Features
—Interoperability
Further Information
User Interface
Lightweight access to grid
Access from Laptop
No host certificate needed.
Some open questions about CRLs (certificate revocation lists).
Limitations
Cannot run ftp daemon here.
Services:
UserInterface (CLI)
Globus GSI
globus-url-copy (client)
Development libraries
—BrokerInfo
—Replica Catalog APIs
—GDMP client interface
Resource Broker
Finds resources, submits & tracks jobs:
Heavyweight machine.
Talks to Replica Catalog (RC) and MDS.
Acts as users’ network presence.
Talks to proxy server.
Potential bottleneck
Can be replicated, but is that enough?
Services:
Resource Broker
Job Submission Service
—built on Condor-G
Information Index
Logging & Bookkeeping
GSI-ftp daemon
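The broker's job of "finding resources" is essentially matchmaking: keep only the Computing Elements that satisfy a job's requirements, then pick the best one by some rank. The sketch below illustrates that idea under stated assumptions: the `ComputingElement` fields, the `match` helper and the host names are hypothetical and are not the EDG broker's API or the MDS schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ComputingElement:
    # Hypothetical CE description; field names are illustrative, not the MDS schema.
    name: str
    free_cpus: int
    max_wall_minutes: int
    close_storage: set[str] = field(default_factory=set)


def match(
    ces: list[ComputingElement],
    requirements: Callable[[ComputingElement], bool],
    rank: Callable[[ComputingElement], float],
) -> Optional[ComputingElement]:
    """Return the highest-ranked CE that satisfies `requirements`, or None."""
    candidates = [ce for ce in ces if requirements(ce)]
    if not candidates:
        return None
    return max(candidates, key=rank)


# Usage: a job needing 5 hours of wall time and input data on se01.example.org.
ces = [
    ComputingElement("ce01.example.org", 4, 600, {"se01.example.org"}),
    ComputingElement("ce02.example.org", 12, 120),
    ComputingElement("ce03.example.org", 0, 1440, {"se01.example.org"}),
]
best = match(
    ces,
    requirements=lambda ce: (
        ce.free_cpus > 0
        and ce.max_wall_minutes >= 300
        and "se01.example.org" in ce.close_storage
    ),
    rank=lambda ce: ce.free_cpus,  # prefer the CE with the most free CPUs
)
print(best.name if best else "no match")  # prints ce01.example.org
```

Separating the requirements filter from the rank keeps "can this CE run the job at all?" apart from "which acceptable CE is preferable?", which is why a broker that loses its information source (e.g. a dead Information Index) can find no match at all.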
Computing Element
Accepts & Executes Jobs:
Gatekeeper
—acts as public interface to computing resources
Worker Node(s)
—provide all software needed for applications
—accessible via batch system (PBS, LSF, …)
Services:
Gatekeeper
GSI-ftp daemon
GIIS/GRIS
Storage Element
Generic interface to storage:
Gatekeeper
—should go away
GSIFTP
RFIO
Services:
Gatekeeper
GDMP
GSI-ftp daemon
RFIO daemon
Replica Catalog
Provides information about replicas:
Catalog Service
—accessed via RB or directly
Services:
LDAP
GIIS/GRIS
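Conceptually, the replica catalog maps one logical file name (LFN) to the physical file names (PFNs) of its copies on different Storage Elements. The real catalog is an LDAP service; the in-memory class below is only a sketch of that mapping, and the class name, method names, `lfn:` prefix and URLs are hypothetical.

```python
class ReplicaCatalog:
    """Toy in-memory replica catalog: logical file name -> physical file names."""

    def __init__(self) -> None:
        self._replicas: dict[str, list[str]] = {}

    def add_replica(self, lfn: str, pfn: str) -> None:
        """Register a physical copy of a logical file (duplicates are ignored)."""
        pfns = self._replicas.setdefault(lfn, [])
        if pfn not in pfns:
            pfns.append(pfn)

    def list_replicas(self, lfn: str) -> list[str]:
        """Return all known physical copies of `lfn` (empty list if unknown)."""
        return list(self._replicas.get(lfn, []))


rc = ReplicaCatalog()
rc.add_replica("lfn:run42.dat", "gsiftp://se01.example.org/data/run42.dat")
rc.add_replica("lfn:run42.dat", "gsiftp://se02.example.org/data/run42.dat")
print(rc.list_replicas("lfn:run42.dat"))
```

A client (or the Resource Broker) only needs the LFN; which physical copy to use is decided afterwards.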
Authorization/Authentication System
All based on GSI (PKI):
Certification Authorities
Virtual Organization Servers
Services:
LDAP for VO servers
Various software for CAs
mkgridmap (grid-mapfile generation software)
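mkgridmap turns VO membership into a grid-mapfile, in which each line maps a quoted certificate subject DN to a local account. The sketch below shows only that transformation, assuming the VO LDAP query has already been done and its result is passed in as a plain dictionary; the function name, VO names, DNs and account names are hypothetical.

```python
def make_gridmap(vo_members: dict[str, list[str]], vo_accounts: dict[str, str]) -> str:
    """Build grid-mapfile text: one line per DN, '"<subject DN>" <local account>'.

    vo_members:  VO name -> certificate subject DNs (stand-in for the VO LDAP query)
    vo_accounts: VO name -> local account that members of that VO are mapped to
    """
    lines = []
    seen = set()
    for vo in sorted(vo_members):  # deterministic output order
        account = vo_accounts[vo]
        for dn in vo_members[vo]:
            if dn in seen:  # a DN maps to exactly one account; first VO wins
                continue
            seen.add(dn)
            lines.append(f'"{dn}" {account}')
    return "".join(line + "\n" for line in lines)


members = {
    "cms": ["/O=Grid/O=Example/CN=Bob", "/O=Grid/O=Example/CN=Alice"],
    "atlas": ["/O=Grid/O=Example/CN=Alice"],
}
accounts = {"atlas": "atlas001", "cms": "cms001"}
print(make_gridmap(members, accounts), end="")
```

The "first VO wins" rule for users in several VOs is an assumption of this sketch; a real site would choose its own policy.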
Software Distribution & Installation
Storage:
Package repository
CVS server
Distribution
HTTP downloads
wget with rpm lists
(the most primitive link in the chain)
Installation
LCFG (LCFG-lite)
Only works for RH6.2
Software on “Production” Testbed
Stopped work on 1.1-series to focus on 1.2.
Deployed v1.1.4 plus patches; version not uniform across sites.
Significant functionality missing for applications:
—Replica Management
—Access to mass storage.
Difficult for middleware to support this version.
Testbed works, but…
Known stability problems:
—Information Index dies regularly.
—Broker needs to be restarted often.
Support limited:
—Maintenance reduced to life support.
—Effort for new sites limited to “available effort.”
Production Testbed Sites
Production Sites
Most have dedicated hardware.
—Lyon running on main batch system.
Typically a few to tens of machines.
LCFG for installation & configuration.
—Lyon again the exception.
Limitations to Expansion
Information systems unreliable.
—manual registration neither scalable nor dynamic
How to add countries without a CA?
—OK for users (CNRS CA)
—Not OK for host certificates.
Site              Location
Catania           Catania (I)
CC-IN2P3          Lyon (F)
CERN              Geneva (CH)
CNAF              Bologna (I)
Imperial College  London (UK)
MSU               Moscow (Russia)
NIKHEF            Amsterdam (NL)
Padova            Padova (I)
RAL               Rutherford (UK)
Torino            Torino (I)
Additional countries: Croatia, Taiwan, United States
EDG Release 1.2
New Features in 1.2 Release (10)
Replica Management API
—first implementation has limited API
Access to Mass Storage Systems
—authorization linked to user account mapping
Auto-resubmission of failed jobs
—will help with stability problems (but is not a solution!)
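Auto-resubmission amounts to a bounded retry loop around job submission: transient failures (for example a gatekeeper crash) are retried a limited number of times, and the user only sees the failure once the limit is reached. The sketch below assumes a hypothetical `JobFailed` exception and a fake flaky job; it is not the WP1 implementation, and it illustrates why resubmission hides rather than fixes the underlying stability problems.

```python
from typing import Callable


class JobFailed(Exception):
    """Hypothetical error raised when a submitted job fails (e.g. a gatekeeper crash)."""


def submit_with_resubmission(
    submit: Callable[[], str], max_resubmissions: int = 3
) -> tuple[str, int]:
    """Run submit(); on JobFailed, resubmit up to max_resubmissions times.

    Returns (job result, number of resubmissions that were needed).
    Re-raises the last JobFailed once the limit is exhausted.
    """
    resubmissions = 0
    while True:
        try:
            return submit(), resubmissions
        except JobFailed:
            if resubmissions >= max_resubmissions:
                raise  # give up and report the failure to the user
            resubmissions += 1


def make_flaky_job(failures: int) -> Callable[[], str]:
    """Fake job that fails `failures` times before succeeding (for illustration)."""
    remaining = [failures]

    def submit() -> str:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise JobFailed("gatekeeper crash")
        return "Done"

    return submit


print(submit_with_resubmission(make_flaky_job(2)))  # prints ('Done', 2)
```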
Current Problems
GASS cache file locking problems (failed job submissions)
OpenLDAP timeout (Information Index hangs; complete loss of MDS information)
FTree interfering with gatekeeper (causes crashes; failed submissions)
Expected Schedule
[Timeline chart, May to June 2002. Milestones shown: ITeam at CERN (1.2 alpha); RAL/CNAF test (3 sites); refine alpha; GASS/MDS problems; JJ/Ingo tests (<1% error rate); application testing; 1.3 code; license info; deployment decision; ESRIN demo; core site deployment; general deployment; 1.2 beta.]
Upgrade to Latest Globus Release
EDG Globus beta-21 is based on the first Globus2 beta.
Includes some patches for security.
Some EDG-specific patches.
(Larger changes for EDG 1.2.)
Upgrade to current Globus2 release depends on:
Desire of the applications groups
—only known critical problem is with file transfers lasting > 20 min.
Whether it contains fixes for GASS/MDS problems.
When EDG software for release 1.2 is deemed stable.
EDG 2.0 release in fall will be based on Globus2!
OGSA being evaluated, but no wholesale move yet.
Some new EDG software functions as a “Web Service.”
Testing & Support
Testing Group
Goal: Intensive testing of releases
Provide framework for:
—unit tests
—integration tests
—stress tests
Provide material for objective evaluation of the software for the EU review.
Use tests for:
—checking software quality
—verifying functionality
—checking configuration of new sites
Has started with EDG 1.2 (10).
—should provide feedback for the EDG 1.2 deployment decision
Support Infrastructure
Provide email-based support for both end-users and system administrators.
—ITeam and other experts
—New system administrator group
Tracking & follow-up of problems.
Create “knowledge base” for FAQs and typical problems.
Interact with LCG and CrossGrid to share the support effort.
System in place shortly; fully functional for Testbed2.
EDG Software License
EDG software license will be in BSD family (see EDG website):
OpenSource license.
Developments may be put back into code base.
Allows commercial use of code.
Standard license for most Grid projects
—Exception: ClassAds and Condor-G will be LGPL.
EDG audit of external packages:
Necessary to ensure we can apply our own license.
Necessary to ensure that we properly attribute other groups’ work.
Need to be especially careful with GPL code.
—Ensure that core functionality is consistent with the license.
—LCFG will likely be GPL license rather than the EDG license.
Release Schedule
Moved to iterative releases:
Keep developments compatible.
Provide intermediate checks on progress.
Allow applications to evaluate functionality.
Not all intermediate releases will be deployed!
Release 2.0 is hard deadline; others somewhat flexible.
Details in “Release Plan” document on web site, highlights…
Release  Date
1.1 Jan. 31
1.2 March 31
1.3 May 31
1.4 July 31
2.0 Sept. 30
Release 1.2
General
Emphasis on stability.
Deploy as production release.
Globus
Uses first Globus2 beta (beta-21)
Plus EDG patches.
Workload Management (WP1)
Proxy renewal for long jobs.
Auto-resubmission of failed jobs.
Data Management (WP2)
Replica Manager (first impl.)
GDMP 3.0
Fabric Management (WP4)
Updated LCFG
EDG Gatekeeper (LCAS)
Storage Element (WP5)
Access to existing data in MSS.
Networking (WP7)
Publish network data into MDS.
Release 1.3
General
Autobuild all EDG packages.
Copyright and license for code.
Globus
Update to latest Globus2 release
Workload Management (WP1)
C APIs
MPICH support.
Data Management (WP2)
Replica Manager
Replica Location Service (giggle)
Grid Mon./Info. Services (WP3)
R-GMA deployed in parallel with MDS
Fabric Management (WP4)
EDG JobManager
Storage Element (WP5)
RFIO with GSI
Prototype GridFTP with MSS access.
Networking (WP7)
Network cost function.
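A network cost function becomes useful when the Replica Manager has several copies of a file and must choose which one to transfer. The sketch below assumes a deliberately crude cost model (transfer time from published bandwidth plus one round trip) and hypothetical site names, URLs and metric values; it is not the WP7 cost function or the WP2 optimiser.

```python
from typing import Optional


def transfer_cost_seconds(size_mb: float, bandwidth_mbit_s: float, rtt_ms: float) -> float:
    """Crude estimate: time to push size_mb through the link plus one round trip."""
    return size_mb * 8 / bandwidth_mbit_s + rtt_ms / 1000


def cheapest_replica(
    replicas: list[tuple[str, str]],
    dest_site: str,
    size_mb: float,
    metrics: dict[tuple[str, str], tuple[float, float]],
) -> Optional[str]:
    """Pick the replica whose estimated transfer to dest_site is cheapest.

    replicas: (site, pfn) pairs
    metrics:  (source_site, dest_site) -> (bandwidth in Mbit/s, round-trip time in ms)
    Replicas at sites without published metrics are skipped.
    """
    best_pfn, best_cost = None, float("inf")
    for site, pfn in replicas:
        if site == dest_site:
            return pfn  # local copy: no wide-area transfer needed
        if (site, dest_site) not in metrics:
            continue
        bandwidth, rtt = metrics[(site, dest_site)]
        cost = transfer_cost_seconds(size_mb, bandwidth, rtt)
        if cost < best_cost:
            best_pfn, best_cost = pfn, cost
    return best_pfn


replicas = [
    ("RAL", "gsiftp://se.ral.example.org/data/f1"),
    ("CNAF", "gsiftp://se.cnaf.example.org/data/f1"),
]
metrics = {("RAL", "CERN"): (100.0, 20.0), ("CNAF", "CERN"): (155.0, 15.0)}
choice = cheapest_replica(replicas, "CERN", 1000, metrics)
print(choice)  # prints gsiftp://se.cnaf.example.org/data/f1
```

With these made-up numbers the 1000 MB file costs about 80 s from RAL and about 52 s from CNAF, so the CNAF copy is chosen.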
Release 1.4
General
Support RH6.2, RH7.2
GLUE Schema
New authorization scheme.
Workload Management (WP1)
Interactive jobs.
Job dependencies.
Triggered file transfers.
Data Management (WP2)
Replica Manager with Optimiser
SpitFire beta release.
Grid Mon./Info. Services (WP3)
Better integration of R-GMA.
Unified (GLUE) schema.
Fabric Management (WP4)
KickStart translator.
Monitoring & Alarms.
Condor supported.
Storage Element (WP5)
DiskManager for disk-only SE.
Testbed (WP6)
New authorization scheme.
Networking (WP7)
Publication of network metrics.
Release 2.0
General
Support RH6.2, RH7.2, Solaris?
Workload Management (WP1)
Job checkpointing.
Accounting.
Advance reservation.
Data Management (WP2)
Full integration of components.
Grid Mon./Info. Services (WP3)
R-GMA WebServices
Fabric Management (WP4)
HLD templates.
Credential service (LCMAPS).
Storage Element (WP5)
DiskManager access to all HSM.
Reservation, pinning, quotas.
Testbed (WP6)
Laptop based UI machine.
Networking (WP7)
Network cost for all sites.
Interoperability
Working with GriPhyN, PPDG, iVDGL, DataTag, CrossGrid,
First concrete example is GLUE schema.
Potential areas of conflict:
Information systems
Agreed interfaces
Further Information
Interesting web sites:
EDG: http://www.eu-datagrid.org/
—general information about EDG project
—links to all work package web sites
WP6: http://marianne.in2p3.fr/
—support information (contacts, bug reporting, documentation, mailing lists)
—meeting agenda/minutes
—links to source code in CVS; packages in package repository
Bleeding-edge information:
Warning: this is a high-volume list!