An overview of the EGEE infrastructure and middleware
EGEE is funded by the European Union under contract IST-2003-508833
Elena Slabospitskaya, IHEP
NA3 manager for Russia
Sources of information
LCG-2 User Guide: https://edms.cern.ch/file/454439//LCG-2-UserGuide.html
LCG Releases: http://grid-deployment.web.cern.ch/grid-deployment/cgi-bin/index.cgi?var=releases
LCG-2 Install Notes (for administrators)
LCG-2 Manual Installation Guide (for administrators): https://edms.cern.ch/file/434070//LCG2Install.html
Site with EDG Tutorials: http://hep-proj-grid-tutorials.web.cern.ch/hep-proj-grid-tutorials/
Overall
1. GSI – Grid Security Infrastructure
2. Information System
3. Job Management
4. Data Management
5. Monitoring System
Conclusions
Main Logical Machine Types (1)
[Figure: grid sites at Protvino (IHEP), Dubna (JINR), Moscow (SINP MSU) and Moscow (ITEP) with UI, CE and SE nodes; central services labelled RMS (CERN), PS, RB (MSU) and BDII (MSU).]
Distributed system - A collection of (probably heterogeneous) automata whose distribution is transparent to the user so that the system appears as one local machine.
Main Logical Machine Types (2)
UI – User Interface
CE – Compute Element: a Grid Gate (GG) and Worker Nodes
GG – Grid Gate: runs the Globus Gatekeeper, the Globus Resource Allocation Manager, the master server of the Local Resource Management System and a local Logging and Bookkeeping server
SE – Storage Element. A classic SE is a GridFTP server. An SE may control large disk arrays or a Mass Storage System (MSS). These storage resources are managed by a Storage Resource Manager (SRM), which interacts with the OS, the MSS and the transfer protocols to perform file transfer operations. As MSS, LCG-2 supports the dCache disk pool (GridFTP and rfio) and the tape archiving systems Castor (GridFTP and rfio) and Enstore (GridFTP).
RB – Resource Broker
RMS – Replica Management System
BDII – Berkeley DB Information Index
PS – Proxy Server
How do I log in to the Grid?
Two basic concepts:
Authentication: Who am I? "Equivalent" to a passport, ID card, etc.
Authorisation: What can I do? Certain permissions, duties, etc.
The Grid Security Infrastructure (GSI) in LCG-2 enables secure authentication and communication over an open network. GSI is based on public key encryption, X.509 certificates, and the Secure Sockets Layer (SSL) communication protocol.
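The difference between the two concepts can be illustrated with a small Python sketch. It is not how GSI is implemented: the certificate is modelled as a plain dictionary, no signature is verified, and all names (`authenticate`, `authorise`, the subject and issuer strings, the `permissions` table) are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def authenticate(cert: dict, trusted_issuers: set, now: datetime) -> Optional[str]:
    """Authentication ("Who am I?"): return the subject name if the
    certificate names a trusted issuer and has not expired, else None.
    A real system would also verify the issuer's signature."""
    if cert["issuer"] not in trusted_issuers:
        return None
    if now >= cert["not_after"]:
        return None
    return cert["subject"]


def authorise(subject: Optional[str], action: str, permissions: dict) -> bool:
    """Authorisation ("What can I do?"): check the action against the
    permissions granted to an already authenticated subject."""
    if subject is None:
        return False
    return action in permissions.get(subject, set())


# Hypothetical example data, for illustration only.
cert = {
    "subject": "/O=Grid/OU=example.org/CN=Alice Example",
    "issuer": "/O=Grid/CN=Example CA",
    "not_after": datetime(2005, 1, 1, tzinfo=timezone.utc),
}
trusted = {"/O=Grid/CN=Example CA"}
permissions = {cert["subject"]: {"submit_job"}}
now = datetime(2004, 6, 1, tzinfo=timezone.utc)

who = authenticate(cert, trusted, now)
print(who, authorise(who, "submit_job", permissions))
```

The point of the split is that a valid identity alone grants nothing: `authorise` is a separate decision made after `authenticate` succeeds.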
Information System
- Provides information about grid resources and their status
- GLUE (Grid Laboratory for a Uniform Environment) schema – common conceptual data model for CE, SE and the CE-SE binding
- MDS (Monitoring and Discovery Service) from Globus has been adopted as the provider of the IS
- The IS implements the GLUE schema using OpenLDAP (LDAP – Lightweight Directory Access Protocol)
- GRIS – Grid Resource Information Service – local on CE and SE
- GIIS – Grid Index Information Service – site (CE)
- BDII – Berkeley DB Information Index
Information system in LCG-2
An LDAP information system is based on entries. Each entry describes an object (a person, a computer, etc.) and has a unique Distinguished Name (DN). The kind of information that can be stored in each entry is specified in an LDAP schema.
Directory Information Tree
Directory Information Tree (DIT) – a tree of directory entries
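How Distinguished Names arrange entries into a tree can be sketched in a few lines of Python. This is a simplified illustration, not an LDAP client: the splitting is naive (it assumes no escaped commas inside a DN), and the example DNs and host names are made up for the purpose.

```python
def split_dn(dn):
    """Split a DN into its components, leaf first.
    Naive: assumes no escaped commas inside values."""
    return [part.strip() for part in dn.split(",") if part.strip()]


def build_dit(dns):
    """Build a nested-dict tree from a list of DNs.
    The rightmost DN component is the root of the tree."""
    tree = {}
    for dn in dns:
        node = tree
        for component in reversed(split_dn(dn)):
            node = node.setdefault(component, {})
    return tree


def render(tree, depth=0):
    """Return the tree as indented text lines, one entry per line."""
    lines = []
    for component in sorted(tree):
        lines.append("  " * depth + component)
        lines.extend(render(tree[component], depth + 1))
    return lines


# Illustrative DNs only; the site and host names are hypothetical.
dns = [
    "mds-vo-name=local,o=grid",
    "mds-vo-name=site-a,mds-vo-name=local,o=grid",
    "GlueCEUniqueID=ce.example.org,mds-vo-name=site-a,mds-vo-name=local,o=grid",
]
tree = build_dit(dns)
print("\n".join(render(tree)))
```

Each entry's DN is its own name followed by the DN of its parent, which is why reading the DN right to left walks down from the root of the DIT.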
LDAP directory of an LCG-2 BDII
Job management
The Workload Management System (WMS) services usually run on the Resource Broker machine:
- Network Server (NS), which accepts incoming job requests from the UI and provides the job control functionality.
- Workload Manager, which is the core component of the system.
- Match-Maker (also called Resource Broker), which finds the best resource matching the requirements of a job (match-making process).
- Job Adapter, which prepares the environment for the job and its final description before passing it to the Job Control Service.
- Job Control Service (JCS), which performs the actual job management operations (job submission, removal...).
- Logging and Bookkeeping service (LB), which logs all job management Grid events; these can then be retrieved by users or system administrators for monitoring or troubleshooting.
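The match-making step can be pictured as "filter, then rank". The Python sketch below is only an illustration under stated assumptions: the `ComputeElement` fields, the requirement and rank functions, and the resource names are all hypothetical, and the real Match-Maker works on information published by the Information System rather than on a hard-coded list.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ComputeElement:
    """Hypothetical, heavily reduced description of a CE."""
    name: str
    free_cpus: int
    max_wall_minutes: int


def match_make(
    resources: Iterable[ComputeElement],
    requirements: Callable[[ComputeElement], bool],
    rank: Callable[[ComputeElement], float],
) -> Optional[ComputeElement]:
    """Keep the resources that satisfy the job requirements,
    then return the one with the highest rank (None if nothing matches)."""
    candidates = [ce for ce in resources if requirements(ce)]
    if not candidates:
        return None
    return max(candidates, key=rank)


# Hypothetical resources and job description.
resources = [
    ComputeElement("ce-a", free_cpus=2, max_wall_minutes=60),
    ComputeElement("ce-b", free_cpus=8, max_wall_minutes=720),
    ComputeElement("ce-c", free_cpus=20, max_wall_minutes=30),
]
best = match_make(
    resources,
    requirements=lambda ce: ce.max_wall_minutes >= 60,  # job needs one hour
    rank=lambda ce: ce.free_cpus,                       # prefer idle sites
)
print(best.name if best else "no matching resource")
```

Keeping the requirement and the rank as separate functions mirrors the two questions the Match-Maker answers: which resources can run the job at all, and which of those is preferred.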
A Job Submission Example
[Figure sequence. Components shown: UI (JDL), Logging & Bookkeeping (LB), Resource Broker (RB), Job Submission Service (JSS), Storage Element (SE), Compute Element (CE), Information Service (IS), Replica Catalogue (RC). Each frame adds one job status:]
1. submitted (Job Submit Event, Input Sandbox)
2. waiting
3. ready
4. scheduled (BrokerInfo)
5. running (Input Sandbox)
6. done
7. outputready (Output Sandbox)
8. cleared (Output Sandbox)
Possible Job States
SUBMITTED
WAITING
READY
SCHEDULED
RUNNING
DONE (ok)
DONE (failed)
OUTPUTREADY
CLEARED
ABORTED
DONE (cancelled)
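The states above can be read as a small state machine. The following Python sketch encodes only the main chain from the submission example; the three DONE variants are collapsed into a single DONE, and the rule that a job may be ABORTED from any state before DONE is an assumption of this sketch, not something the slides specify.

```python
# Main chain of job states, in the order shown in the submission example.
MAIN_CHAIN = [
    "SUBMITTED", "WAITING", "READY", "SCHEDULED",
    "RUNNING", "DONE", "OUTPUTREADY", "CLEARED",
]

# Assumption of this sketch: a job can be aborted from any state before DONE.
ABORTABLE = set(MAIN_CHAIN[:MAIN_CHAIN.index("DONE")])


def allowed_next(state):
    """Return the set of states that may follow `state` in this simplified model.
    States outside the main chain (such as ABORTED) have no successors."""
    if state not in MAIN_CHAIN:
        return set()
    followers = set()
    position = MAIN_CHAIN.index(state)
    if position + 1 < len(MAIN_CHAIN):
        followers.add(MAIN_CHAIN[position + 1])
    if state in ABORTABLE:
        followers.add("ABORTED")
    return followers


def is_valid_history(history):
    """Check that a sequence of states starts at SUBMITTED and
    only uses transitions permitted by allowed_next."""
    if not history or history[0] != "SUBMITTED":
        return False
    return all(b in allowed_next(a) for a, b in zip(history, history[1:]))


print(sorted(allowed_next("RUNNING")))
print(is_valid_history(["SUBMITTED", "WAITING", "READY"]))
```

A check like `is_valid_history` is the kind of consistency test one could run over a sequence of status values reported for a job.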
Data Management: Data Naming
Different filenames in LCG-2
SURL – Storage URL. A SURL is a locator for a physical file. A SURL is often called a PFN (Physical File Name).
srm://lxshare0282.cern.ch:8443/castor/cern.ch/home/dteam/generated/2004-02-11/filed8f59bcf-5c85-11d8-bbf3-c59c9bed1519
UUID – Universally Unique IDentifier. A UUID is a 128-bit number.
GUID – Grid Unique IDentifier. A UUID generated by the Replica Management System.
guid:e4fbe9b0-5c85-11d8-bbf3-c59c9bed1519
LFN – Logical File Name. A Logical File Name is a user-defined alias to a GUID.
lfn:anjita-demo0236-2004-11-02
TURL – Transport URL. A Transport URL is returned by an SRM in response to a request for a way to access a SURL.
rfio://lxshare0282.cern.ch//data/dt/stage/filec0fabd63-5cba-11d8-ba4c-e2aa3666572b.4003
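The four kinds of name can be told apart by their prefix. The Python sketch below classifies the example names from this slide; the function names are hypothetical, and the scheme table covers only the prefixes that appear here (`lfn`, `guid`, `srm`, `rfio`), so it is an illustration rather than a complete list of what LCG-2 accepts.

```python
import uuid

# Only the prefixes that appear in the examples above.
KIND_BY_SCHEME = {
    "lfn": "LFN",    # logical file name, user-defined alias
    "guid": "GUID",  # grid unique identifier
    "srm": "SURL",   # storage URL, locator for a physical file
    "rfio": "TURL",  # transport URL, returned for access to a SURL
}


def classify(name):
    """Return 'LFN', 'GUID', 'SURL' or 'TURL' based on the name's prefix."""
    scheme, separator, rest = name.partition(":")
    if not separator or not rest:
        raise ValueError(f"not a prefixed grid file name: {name!r}")
    try:
        return KIND_BY_SCHEME[scheme.lower()]
    except KeyError:
        raise ValueError(f"unknown prefix {scheme!r}") from None


def guid_to_uuid(guid):
    """Parse the UUID part of a 'guid:...' name (a 128-bit number)."""
    if classify(guid) != "GUID":
        raise ValueError("expected a guid: name")
    return uuid.UUID(guid.split(":", 1)[1])


names = [
    "lfn:anjita-demo0236-2004-11-02",
    "guid:e4fbe9b0-5c85-11d8-bbf3-c59c9bed1519",
    "srm://lxshare0282.cern.ch:8443/castor/cern.ch/home/dteam/generated/2004-02-11/filed8f59bcf-5c85-11d8-bbf3-c59c9bed1519",
    "rfio://lxshare0282.cern.ch//data/dt/stage/filec0fabd63-5cba-11d8-ba4c-e2aa3666572b.4003",
]
for name in names:
    print(classify(name), name)
```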
REPLICA MANAGEMENT SYSTEM (RMS)
The main services offered by the RMS are the Replica Location Service (RLS) and the Replica Metadata Catalog (RMC).
The RLS maintains information about the physical location of the replicas (mapping with the GUIDs). It is composed of several Local Replica Catalogs (LRCs), which hold the information about replicas for a single VO.
The RMC stores the mapping between GUIDs and the respective aliases (LFNs) associated with them, and maintains other metadata information (sizes, dates, ownerships...).
The last component of the Data Management framework is the Replica Manager. The Replica Manager presents a single interface for the RMS to the user, and interacts with the other services.
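The two mappings kept by the RMS can be sketched with two dictionaries: one for the RMC (LFN to GUID) and one for a single LRC (GUID to SURLs). The class and method names below are hypothetical and do not correspond to the Replica Manager's actual interface; the second SURL uses a made-up host.

```python
class ReplicaCatalogSketch:
    """Toy model of the RMC and one LRC behind a single interface."""

    def __init__(self):
        self.rmc = {}  # RMC: LFN alias -> GUID
        self.lrc = {}  # LRC: GUID -> list of SURLs (physical replicas)

    def register(self, lfn, guid, surl):
        """Record an alias for a GUID and one physical replica of it."""
        self.rmc[lfn] = guid
        replicas = self.lrc.setdefault(guid, [])
        if surl not in replicas:
            replicas.append(surl)

    def list_replicas(self, lfn):
        """Resolve LFN -> GUID via the RMC, then GUID -> SURLs via the LRC."""
        guid = self.rmc.get(lfn)
        if guid is None:
            return []
        return list(self.lrc.get(guid, []))


catalog = ReplicaCatalogSketch()
lfn = "lfn:anjita-demo0236-2004-11-02"
guid = "guid:e4fbe9b0-5c85-11d8-bbf3-c59c9bed1519"
catalog.register(lfn, guid, "srm://lxshare0282.cern.ch:8443/castor/cern.ch/home/dteam/generated/2004-02-11/filed8f59bcf-5c85-11d8-bbf3-c59c9bed1519")
# Second replica on a hypothetical storage element.
catalog.register(lfn, guid, "srm://se.example.org:8443/data/dteam/copy-of-file")

print(catalog.list_replicas(lfn))
```

The user only ever supplies the LFN; the GUID is the stable key that ties the alias to however many physical copies exist.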
Interactions of the RM with other grid components
CONCLUSIONS
The EGEE Grid requires resources, an infrastructure and middleware that allow for:
- Authentication and Authorization
- Information services
- Job and Data Management
- Monitoring and fault recovery
Appendix. Data Management Services
SRM – Storage Resource Manager. A high-level interface to a storage system.
RLS – Replica Location Service. The distributed service providing the mappings between GUIDs and SURLs. An RLS has two components: LRC and RLI.
LRC – Local Replica Catalog. The catalog storing GUID to SURL mappings, along with SURL attributes, for a given site or a single Storage Resource Manager at a site.
RLI – Replica Location Index. The catalog storing information about which Local Replica Catalogs have GUID to SURL mappings for a particular GUID. It thus provides the link between different LRCs, allowing for distributed indexing and querying of the catalogs.
RMC – Replica Metadata Catalog. The catalog storing LFN aliases for GUIDs, as well as attributes on GUIDs and LFNs.
ROS – Replica Optimization Service. A service providing information to guide selection between replicas located at different sites. This is based on network information collected from available network monitors.
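The role of the RLI as a link between several LRCs can be shown with a short Python sketch: the index says which catalogs know a GUID, and only those catalogs are then queried. The site names, SURLs and the `locate` function are hypothetical; this is an illustration of the lookup pattern, not of the RLS implementation.

```python
def locate(guid, rli, lrcs):
    """Return all SURLs for a GUID.
    rli:  GUID -> set of LRC names that hold a mapping for it
    lrcs: LRC name -> {GUID -> list of SURLs}
    Only the LRCs named by the index are consulted."""
    surls = []
    for lrc_name in sorted(rli.get(guid, ())):
        surls.extend(lrcs[lrc_name].get(guid, []))
    return surls


guid = "guid:e4fbe9b0-5c85-11d8-bbf3-c59c9bed1519"

# Hypothetical per-site catalogs.
lrcs = {
    "site-a": {guid: ["srm://se-a.example.org:8443/data/file1"]},
    "site-b": {guid: ["srm://se-b.example.org:8443/data/file1"]},
    "site-c": {},
}
# Index built from the catalogs: which LRCs know which GUID.
rli = {}
for lrc_name, mappings in lrcs.items():
    for known_guid in mappings:
        rli.setdefault(known_guid, set()).add(lrc_name)

print(locate(guid, rli, lrcs))
```

Without the index, every LRC would have to be asked about every GUID; with it, a query touches only the catalogs that can answer.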
http://lspitsky.home.cern.ch/lspitsky/
MDS – Monitoring and Discovery Service
LCFG – Local ConFiguration System (Edinburgh)