Grid Projects: EU DataGrid and LHC Computing Grid
Oxana Smirnova, Lund University
October 29, 2003, Košice
2003-10-29 [email protected] 2
Outline
Precursors: attempts to meet the demands of HEP computing
EDG: the first global Grid development project
LCG: deploying a computing environment for the LHC experiments
2003-10-29 [email protected] 3
Characteristics of HEP computing

Event independence
Data from each collision are processed independently: trivial parallelism, a mass of independent problems with no information exchange

Massive data storage
Modest event size: 1–10 MB (not ALICE, though), but the total is very large: Petabytes for each experiment

Mostly read-only
Data are never changed after recording to tertiary storage, but are read often: a tape is mounted at CERN every second!

Resilience rather than ultimate reliability
Individual components should not bring down the whole system; jobs are rescheduled on failed equipment

Modest floating-point needs
HEP computations involve decision making rather than calculation
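The trivial parallelism above lends itself to a few lines of code: since events share no state, they can be farmed out to independent workers. A minimal Python sketch, in which the event format and the "reconstruction" step are invented purely for illustration:

```python
from multiprocessing import Pool

def reconstruct(event):
    # Toy "reconstruction": each event is handled on its own,
    # with no communication between workers (trivial parallelism).
    return {"id": event["id"], "tracks": sum(event["hits"]) % 7}

def process_run(n_events, workers=4):
    """Process n_events independent events in a worker pool."""
    events = [{"id": i, "hits": [i, i + 1, i + 2]} for i in range(n_events)]
    with Pool(workers) as pool:
        return pool.map(reconstruct, events)
```

Because the workers never exchange information, the same pattern scales from a multi-core desktop to a batch farm: the hard problem is distributing the work and the data, not communication.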
2003-10-29 [email protected] 4
MONARC: hierarchical regional centres model

[Diagram: CERN as Tier 0 at the top; Tier 1 regional centres (FNAL, RAL, IN2P3) connected at 2.5 Gbps / 622 Mbps; Tier 2 sites (Lab a, Uni. b, Lab c, …, Uni. n) connected at 622 Mbps / 155 Mbps; below them the department and desktop level.]

MONARC report: http://home.cern.ch/~barone/monarc/RCArchitecture.html
2003-10-29 [email protected] 5
EU DataGrid project

In certain aspects initiated as a MONARC follow-up, introducing Grid technologies
Started on January 1, 2001, to deliver by the end of 2003
Aim: to develop Grid middleware suitable for High Energy Physics, Earth Observation and biomedical applications, and for live demonstrations
9.8 MEuros of EU funding over 3 years
Development based on existing tools, e.g., Globus, LCFG, GDMP etc.
Maintains development and applications testbeds, which include several sites across Europe
2003-10-29 [email protected] 6
EDG overview: main partners

CERN – International (Switzerland/France)
CNRS – France
ESA/ESRIN – International (Italy)
INFN – Italy
NIKHEF – The Netherlands
PPARC – UK

Slide by EU DataGrid
2003-10-29 [email protected] 7
EDG overview: assistant partners

Industrial partners:
Datamat (Italy)
IBM-UK (UK)
CS-SI (France)

Research and academic institutes:
CESNET (Czech Republic)
Commissariat à l'énergie atomique (CEA) – France
Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI)
Consiglio Nazionale delle Ricerche (Italy)
Helsinki Institute of Physics – Finland
Institut de Fisica d'Altes Energies (IFAE) – Spain
Istituto Trentino di Cultura (IRST) – Italy
Konrad-Zuse-Zentrum für Informationstechnik Berlin – Germany
Royal Netherlands Meteorological Institute (KNMI)
Ruprecht-Karls-Universität Heidelberg – Germany
Stichting Academisch Rekencentrum Amsterdam (SARA) – Netherlands
Swedish Research Council – Sweden

Slide by EU DataGrid
2003-10-29 [email protected] 8
EDG work packages

WP1: Workload Management System
WP2: Data Management
WP3: Grid Monitoring / Grid Information Systems
WP4: Fabric Management
WP5: Storage Element, MSS support
WP6: Testbed and demonstrators
WP7: Network Monitoring
WP8: High Energy Physics Applications
WP9: Earth Observation
WP10: Biology
WP11: Dissemination
WP12: Management
2003-10-29 [email protected] 9
Simplified Grid deployment approach

Homogeneous structure:
All sites must run the same OS and kernel (Linux, RedHat 7.3)
Recommended central installation via the LCFG service (installs the entire machine from scratch on each reboot)
Exceptions are possible, but not supported

Invasive installation:
Requires massive re-configuration of existing clusters
Needs to be installed on every compute node
2003-10-29 [email protected] 10
Basic EDG services

Workload management:
Resource Broker (RB) and Job Submission Service (JSS)
Logging and Bookkeeping Service (L&B)
Information Index (II)
User Interface (UI)

Data management:
Replica Location Service (RLS)
Replica Metadata Catalog (RMC)
Replica Optimization Service (ROS)

Information and monitoring service:
Relational Grid Monitoring Architecture (R-GMA)

Fabric management
Mass storage management
Virtual Organization management
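To make the data-management services above concrete, here is a toy sketch of what a replica catalog does: map a logical file name to its physical replicas and pick one by a simple cost preference. The catalog contents and the cost numbers are invented; the real RLS and ROS are network services, not an in-memory dictionary.

```python
# Toy replica catalog: logical file name -> list of (site, cost) replicas.
# In EDG the RLS resolves the mapping and the ROS picks the "best" replica;
# here both are collapsed into a dict lookup and a min() call.
catalog = {
    "lfn:higgs-candidates.root": [
        ("cern.ch", 10),    # cost numbers are invented for illustration
        ("nikhef.nl", 25),
        ("ral.uk", 40),
    ],
}

def best_replica(lfn):
    """Return the physical location with the lowest access cost."""
    replicas = catalog.get(lfn, [])
    if not replicas:
        raise KeyError("no replica registered for " + lfn)
    site, _cost = min(replicas, key=lambda r: r[1])
    return site
```

The separation matters: locating replicas (RLS) and choosing among them (ROS) are distinct services, so a site can change its cost model without touching the catalog.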
2003-10-29 [email protected] 11
Typical EDG site composition

Site-specific:
User Interface (UI)
Computing Element or Service (CE): Gatekeeper (GK) and Worker Nodes (WN), which do have client APIs for accessing EDG services and information
Storage Element (SE)
Monitoring Node (MON): R-GMA servlets for the site, ROS

Common:
Resource Broker (RB)
RLS: Local Replica Catalog (LRC)
RMC
Information Catalog (IC)
2003-10-29 [email protected] 12
Organization of user access

Users must have valid personal Globus-style certificates; group or anonymous certificates are not allowed
The issuing Certificate Authority (CA) must be endorsed by the EDG Security Group; if there is no approved CA in your country/region, France catches all
Users must belong to one of the accepted Virtual Organizations (VO): the LHC experiments, biomedical and Earth Observation applications, and some EDG teams
VO lists are managed by experiment/team representatives
Users can belong to several VOs
Users with identical names, or a user with several certificates, cannot belong to the same VO
Local system administrators still have full control
To “log into the Grid”, users make use of the private certificate to issue a public proxy
Grid sites accept requests only from users whose certificates are signed by CAs that the site accepts
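At a site, the authorization step above typically ends in a grid-mapfile: a list mapping certificate subject names (DNs) to local accounts. The following toy parser sketches the idea; the DNs and account names are invented, and a real gatekeeper uses library code rather than a hand-rolled parser.

```python
# Toy grid-mapfile: each line maps a quoted certificate subject (DN)
# to a local Unix account, as a site gatekeeper might do.
GRIDMAP = '''
"/C=CH/O=CERN/CN=John Smith" atlas001
"/C=SE/O=Lund University/CN=Jane Doe" cms007
'''

def parse_gridmap(text):
    mapping = {}
    for line in text.strip().splitlines():
        # The DN is quoted and may contain spaces; the local
        # account name follows after the closing quote.
        dn, _, account = line.rpartition(" ")
        mapping[dn.strip('"')] = account
    return mapping

def authorize(dn, mapping):
    """Return the local account for a DN, or None if the site rejects it."""
    return mapping.get(dn)
```

This is where "local system administrators still have full control" becomes concrete: a site admits a user simply by listing (or refusing to list) that user's DN.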
2003-10-29 [email protected] 13
EDG applications testbed

EDG is committed to creating a stable testbed to be used by applications for real tasks
This started to materialize in August 2002, and coincided with the ATLAS DC1; CMS joined in December; ALICE and LHCb ran smaller-scale tests
At the moment (October 2003) it consists of ca. 15 sites in 8 countries
Most sites are installed from scratch using the EDG tools (which require/install RedHat 7.3); some have installations on top of existing resources; a lightweight EDG installation is available
Central element: the Resource Broker (RB) distributes jobs between the resources; most often a single RB is used; some tests used RBs “attached” to User Interfaces; in the future there may be an RB per Virtual Organization (VO) and/or per user
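The Resource Broker's matchmaking can be sketched as ranking the Computing Elements that satisfy a job's requirements. The sites, queue lengths and ranking rule below are invented for illustration; the real RB matches JDL requirements against information published by the sites.

```python
# Toy Resource Broker: pick the matching Computing Element with the
# shortest queue, mimicking the EDG matchmaking step in miniature.
CES = [
    {"site": "cern.ch",    "os": "RedHat7.3", "free_cpus": 12, "queued": 40},
    {"site": "lu.se",      "os": "RedHat7.3", "free_cpus": 4,  "queued": 2},
    {"site": "legacy.org", "os": "RedHat6.2", "free_cpus": 30, "queued": 0},
]

def match(job, ces):
    """Return the CEs satisfying the job's requirements, best-ranked first."""
    ok = [ce for ce in ces
          if ce["os"] == job["requires_os"] and ce["free_cpus"] >= job["cpus"]]
    # Rank: fewer queued jobs is better (a crude stand-in for the RB's rank).
    return sorted(ok, key=lambda ce: ce["queued"])

job = {"requires_os": "RedHat7.3", "cpus": 1}
```

Note that requirements and rank play different roles: requirements filter out unusable sites entirely (here the RedHat 6.2 one), while rank merely orders the survivors.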
2003-10-29 [email protected] 14
EDG Applications Testbed snapshot
2003-10-29 [email protected] 15
Basic EDG functionality as of today

[Diagram: the User Interface (UI) submits a JDL job description to the Resource Broker (RB), which forwards it as RSL to one of several Computing Elements (CE); the Replica Manager (RM) replicates data registered in the RLS and stages files to/from CASTOR with rfcp; input and output sandboxes move via NFS, with R-GMA providing the information flow.]
2003-10-29 [email protected] 16
EDG status

EDG1 was not a very satisfactory prototype:
Highly unstable behavior
Somewhat late deployment
Many missing features and functionalities

EDG2 was released and deployed for applications on October 20, 2003:
Many services have been re-written since EDG1
Some functionality has been added, but some has been lost
Stability is still an issue, especially the Information System performance
Little has been done to streamline the deployment of application environments
No production-scale tasks have been shown to perform reliably yet

No development will be done beyond this point:
Bug fixing will continue for a while
Some “re-engineering” is expected to be done by the next EU-sponsored project, EGEE
2003-10-29 [email protected] 17
The future: LCG

LCG = LHC Computing Grid
Goal: to deploy an adequate information and computational infrastructure for the LHC experiments
Means of achieving it: modern distributed computing and data analysis tools and utilities, i.e. the Grid
Resources: large computing centers around the world as the basic elements; research institutes, laboratories and universities are also members of the data analysis chain; there is no need to concentrate the computing power at CERN
2003-10-29 [email protected] 18
LCG Timeline

September 2001: the project is approved by the CERN Council; duration: 2002 to 2008
Phase 1: prototyping, testing
Phase 2: deployment of the LHC computing infrastructure
November 2003: a functioning LCG-1 prototype (a criterion: 30 consecutive days of non-stop operation); includes 10 regional centers
May 2004: research labs and institutes join with their resources
December 2004: LCG-3, 50% of the performance expected by 2007

[Timeline: project approval IX/01; Phase 1 from 2002 with milestones XI/03, V/04 and XII/04; Phase 2 from 2006 to 2008.]
2003-10-29 [email protected] 19
LCG organization

Financing:
CERN and other states participating in LHC projects
Business partners
LHC experiments
National research foundations and computing centers
Projects financed by the EU and other international funds

Structure:
Applications
CERN fabric
Grid technology
Grid deployment

For more info: http://cern.ch/lcg
2003-10-29 [email protected] 20
First priority: LCG-1

Major components and levels (top to bottom):

High level services: user interfaces, applications – LCG, experiments
Active services: global scheduler, data management, information system – EU DataGrid
Passive services: user access, security, data transfer, information schema – VDT (Globus, GLUE)
System software: operating system (RedHat Linux), file system (NFS, …), local scheduler (PBS, Condor, LSF, …)
Hardware: computing cluster, network resources, data storage (HPSS, CASTOR, …) – a closed system (?)
2003-10-29 [email protected] 21
LHC Grid: what became of the MONARC hierarchy

[Diagram by [email protected]: the LHC Computing Centre at CERN as Tier 0 (with a CERN Tier 1), surrounded by Tier 1 centres in Germany, the USA, the UK, France, Italy, Taiwan and Japan; Tier 2 labs and universities (Lab a…m, Uni a…y) forming a grid for a regional group and a grid for a physics study group; Tier 3 at the physics department and desktop level.]
2003-10-29 [email protected] 22
LCG status

Grid component: almost entirely the EDG solution
Major difference: LCG-1 still has the “old” MDS for the information system
Deployed on the LCG testbed, which in general does not overlap with the EDG one and includes non-EU countries such as the US, Russia and Taiwan
More stable so far than EDG (thanks to MDS?..)
Little or no Grid development
In the future, alternative Grid solutions such as AliEn may be considered (though this is unlikely)
The Grid Technology area is on the verge of being dismissed, as LCG will not be doing Grid development

LHC Applications component:
A lot of very serious development
Many areas are covered, from generators to Geant4 to data management etc.
Unfortunately, there is little interaction and co-operation with Grid developers
2003-10-29 [email protected] 23
LCG-1 Testbed
2003-10-29 [email protected] 24
Summary

Initiated by CERN, EDG was the first global Grid R&D project aiming at deploying working services
Sailing in uncharted waters, EDG ultimately provided a set of services that allow a Grid infrastructure to be constructed
Perhaps the most notable EDG achievement is the introduction of authentication and authorization standards, now recognized worldwide
LCG took a bold decision to deploy EDG as the Grid component of the LCG-1 release
Grid development does not stop with EDG: LCG is open to new solutions, with a strong preference towards OGSA