LCG
Denis Linglin
Updated: 9/02/03 07:24
LHC Computing Grid Project
Status Report
12 February 2003
Project Goals
• applications – environment, common tools, frameworks, persistency, ...
• computing system
– data recording, reconstruction, managed storage (CERN)
– global grid service of collaborating computer centres
– global analysis environment
• central role of data challenges
– deploy & evolve
– experience, confidence
Goal – Prepare and deploy the LHC computing environment to help the experiments analyse the data coming from the detectors
Two Phases
Phase 1 – 2002-05 – R&D
– Applications: prototyping, development
– Develop and operate a Grid Service
– Computing Services TDR – July 2005
Phase 2 – 2006-08 – Construction & operation
– Installation, commissioning and operation of the initial global LHC data analysis Grid
Requirements & Implementation
• SC2 brings together the four experiments and the Tier 1 Regional Centres
• it identifies common domains and sets requirements for the project
– may use an RTAG (Requirements and Technical Assessment Group)
– limited scope, two-month lifetime with intermediate report
– one member per experiment + experts
• PEB manages the implementation
– organizing projects, work packages
– coordinating between the Regional Centres
– collaborating with Grid projects
– organizing grid services
• SC2 approves the work plan, monitors progress
Info from SC2
[Organisation chart: LHCC, Computing RRB, Overview Board, Software and Computing Committee (SC2), Project Execution Board (PEB)]
SC2 Requirements Specification: status of RTAGs
– On applications (date of final report)
• data persistency – Apr 02
• software support process – May 02
• mathematical libraries – May 02
• detector geometry description – Oct 02
• Monte Carlo generators – Oct 02
• applications architectural blueprint – Oct 02
• detector simulation – Dec 02
– On Fabrics
• mass storage requirements – May 02
– On Grid technology and deployment area
• Grid technology use cases – Jun 02
• Regional Centre categorisation – Jun 02
– Current status of RTAGs (and available reports) on www.cern.ch/lcg/sc2
Info from SC2
Work Planning Status
• High level planning paper prepared and presented to LHCC in July
• Level 1 and 2 milestones agreed with LHCC referees – November 2002
• PBS/WBS agreed with experiments – December 2002
• see www.cern.ch/lcg/peb, Planning
• Formal work plans agreed for
– Data Persistency (POOL)
– Support for the Software Process & Infrastructure (SPI)
– Mass Storage
– Core software services (SEAL)
• Work plans in preparation:
– Mathematical Libraries
– Physics Interfaces (PI)
• LHC Global Grid Service
– First service definition in preparation February 2002
Info from SC2
LCG Level 1 Milestones
[Timeline chart, 2002 to 2005 by quarter (Q1 to Q4). Legend: applications, grid service, launch workshop. A "Here we are" marker shows the date of this report.]
• Hybrid Event Store available for general users
• Distributed production using grid services
• First Global Grid Service (LCG-1) available
• Distributed end-user interactive analysis
• Full Persistency Framework
• LCG-1 reliability and performance targets
• “50% prototype” (LCG-3) available
• LHC Global Grid TDR
LCG Project Implementation
PEB: 4 Areas of Work
• Applications – Torre Wenaus
• Grid deployment – Ian Bird
• Fabrics – Bernd Panzer
• Provision of Grid Technology – David Foster
Applications Area
Area manager – Torre Wenaus
• Importance of RTAGs to define scope
• Open weekly applications area meetings
• Software Architects Forum
– process for taking LCG-wide software decisions
• Staffing of projects
– CERN, experiments, other institutes
– CERN resources being merged into a single group (EP/SFT), with people moving together into building 32
Simulation
• RTAGs have defined formal requirements for LCG for:
– detector geometry description
– MC generators
– detector simulation
• Support required for both GEANT4 and FLUKA
• GEANT4
– independent collaboration, including HEP institutes, LHC and other experiments, other sciences
– significant LHC related resources (including CERN)
– MoU being re-defined now
– need to ensure long-term support
– CERN resources will be under the direction of the project
– process for agreeing common LHC priorities
Grid Deployment
Area Manager – Ian Bird
• Planning, building, commissioning, operating a stable, reliable, manageable Grid for Data Challenges and the general analysis workload
• Integrating fabrics from many Regional Centres and CERN
Distributed Analysis must work
• CERN will provide the data reconstruction & recording service (Tier 0), but only a small part of the analysis capacity
Summary of Computing Capacity Required for all LHC Experiments in 2008
Resource | CERN Tier 0 | CERN Tier 1 | CERN Total | Other Tier 1 | Total Tier 1 | CERN as % of Tier 1 | Total Tier 0 + 1 | CERN as % of total Tier 0 + 1
Processing (K SI2000) | 12,000 | 8,000 | 20,000 | 49,000 | 57,000 | 14% | 69,000 | 29%
Disk (PetaBytes) | 1.1 | 1.0 | 2.1 | 8.7 | 9.7 | 10% | 10.8 | 20%
Magnetic tape (PetaBytes) | 12.3 | 1.2 | 13.5 | 20.3 | 21.6 | 6% | 33.9 | 40%
Current planning for capacity at CERN + principal Regional Centres
– 2002: 650 KSI2000, <1% of capacity required in 2008
– 2005: 6,600 KSI2000, <10% of 2008 capacity
KSI2000 at CC-IN2P3: March 2002 ~190, Nov. 2002 ~275, March 2003 ~700
% CPU (LHC/∑CC-IN2P3) = 16% in 2002
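The percentages in the capacity summary can be re-derived from the Tier 0, CERN Tier 1 and other Tier 1 columns. The sketch below is only an illustrative cross-check: the function names and the dictionary layout are assumptions made for this example, and the results match the slide only to within about one percentage point because the published inputs are themselves rounded.

```python
# Capacity required in 2008, copied from the summary table above.
# Each entry: (CERN Tier 0, CERN Tier 1, other Tier 1), in the units of that row.
CAPACITY_2008 = {
    "Processing (K SI2000)": (12_000, 8_000, 49_000),
    "Disk (PetaBytes)": (1.1, 1.0, 8.7),
    "Magnetic tape (PetaBytes)": (12.3, 1.2, 20.3),
}


def cern_shares(tier0, cern_tier1, other_tier1):
    """Return (CERN as % of Tier 1, CERN as % of total Tier 0 + 1)."""
    total_tier1 = cern_tier1 + other_tier1
    total = tier0 + total_tier1
    pct_of_tier1 = 100 * cern_tier1 / total_tier1
    pct_of_total = 100 * (tier0 + cern_tier1) / total
    return pct_of_tier1, pct_of_total


def share_of_2008(planned_ksi2000, required_ksi2000=69_000):
    """Planned processing capacity as a percentage of the 2008 requirement."""
    return 100 * planned_ksi2000 / required_ksi2000


if __name__ == "__main__":
    for name, row in CAPACITY_2008.items():
        pct_tier1, pct_total = cern_shares(*row)
        print(f"{name}: CERN is {pct_tier1:.1f}% of Tier 1, "
              f"{pct_total:.1f}% of Tier 0 + 1")
    for year, planned in ((2002, 650), (2005, 6_600)):
        print(f"{year}: {share_of_2008(planned):.1f}% of 2008 processing capacity")
```

The same helper confirms the planning figures: 650 KSI2000 is below 1% and 6,600 KSI2000 is below 10% of the 69,000 KSI2000 required in 2008.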
Data Challenges in 2002
Spring02: CPU Resources
[Pie chart: share of CPU resources by site]
• Wisconsin 18%
• INFN 18%
• CERN 15%
• IN2P3 10%
• Moscow 10%
• FNAL 8%
• RAL 6%
• IC 6%
• UFL 5%
• Caltech 4%
• UCSD 3%
• Bristol 3%
• HIP 1%
Most resources not at CERN (CERN not even the biggest single resource)
• 6 million events
• ~20 sites
• grid tools used at 11 sites
Grid Deployment
• Experiments can do (and are doing) their event production using distributed resources with a variety of solutions
– classic distributed production: send jobs to specific sites, simple bookkeeping
– some use of Globus, and some of the HEP Grid tools
– other integrated solutions (ALIEN)
• The hard problem for distributed computing is data analysis – ESD and AOD
– chaotic workload
– unpredictable data access patterns
This is the problem that the LCG has to solve, and this is where Grid technology should really help
Deploying the LHC Grid
• The priority for 2003 is to move from testbeds to a SERVICE
• We need to learn how to OPERATE a Grid
Service Quality and Reliability are as important as functionality
Grid Deployment Board
• Grid Deployment Board – chair Mirco Mazzucato
– representatives from the experiments and from each country with an active Regional Centre taking part in the LCG Grid Service
– forges the agreements, takes the decisions, and defines the standards and policies that are needed to set up and manage the LCG Global Grid Services
– coordinates the planning of resources for physics and computing data challenges
• First meeting 4 October in Milano
• First task is the detailed definition of LCG-1, the initial LCG Global Grid Service
Grid Deployment - The Strategy
Get a basic grid service into production so that we know what works, what doesn’t, and what the priorities are, and evolve from there to the full LHC service
• Agree on a common set of middleware to be used for the first LCG grid service – LCG-1
• targets
– full definition of LCG-1 by February 2003
– LCG-1 in operation mid-2003
– LCG-1 in full service by end of 2003
• this will be conservative (stability before functionality) and will not satisfy all of the HEPCAL requirements
• but must be sufficient for the data challenges scheduled in 2004
Centres taking part in LCG-1
around the world, around the clock
Centres taking part in LCG-1
Centres that have declared resources – Dec. 2002
Tier 0
• CERN
Tier 1 Centres
• Brookhaven National Lab
• CNAF Bologna
• Fermilab
• FZK Karlsruhe
• IN2P3 Lyon
• Rutherford Appleton Lab (UK)
• University of Tokyo
• CERN
Other Centres
• Academia Sinica (Taipei)
• Barcelona
• Caltech
• GSI Darmstadt
• Italian Tier 2s (Torino, Milano, Legnaro)
• Manno (Switzerland)
• Moscow State University
• NIKHEF Amsterdam
• Ohio Supercomputing Centre
• Sweden (NorduGrid)
• Tata Institute (India)
• Triumf (Canada)
• UCSD
• UK Tier 2s
• University of Florida – Gainesville
• University of Prague
• ……
LCG-1 as a service for LHC experiments
• Mid-2003
– 5-10 of the larger regional centres
– available as one of the services used for simulation campaigns
• 2H03
– add more capacity at operational regional centres
– add more regional centres
– activate operations centre, user support infrastructure
• Early 2004
– principal service for physics data challenges
Grid Technology in LCG
LCG expects to obtain Grid Technology, along with maintenance and support, from projects funded by national and regional e-science initiatives and, later, from industry
Coordination by the project CTO – David Foster
This area of the project is concerned with
• ensuring that the LCG requirements are known to current and potential Grid projects
• active lobbying for suitable solutions – influencing plans and priorities
• evaluating potential solutions
• negotiating support for tools developed by Grid projects
• developing a plan to supply solutions that do not emerge from other sources
BUT this must be done with caution
– important to avoid HEP-SPECIAL solutions
– important to migrate to standards as they emerge (avoid emotional attachment to prototypes)
Grid Technology Status
• A base set of requirements has been defined (HEPCAL, HEP common application layer):
– 43 use cases
– ~2/3 of which should be satisfied ~2003 by currently funded projects
• Good experience of working with Grid projects in Europe and the United States
• Practical results from testbeds used for physics simulation campaigns
• GLUE initiative has shown how to integrate the EDG and VDT toolkits
• An initial agreement is being made on a joint toolkit for LCG-1
Grid Technology Status
• We are still solving basic reliability & functionality problems
– This is worrying as we still have a long way to go to get to a solid service
– At end 2002, a solid service in mid-2003 looks (surprisingly) ambitious
• HEP needs to limit divergence in developments
– Complexity adds cost
• We have not yet addressed system level issues
– How to manage and maintain the Grid as a system providing a high-quality reliable service
– Current developments offer few tools for, and little treatment of, problem determination, error recovery, fault tolerance etc.
• Some of the advanced functionality we will need is only being thought about now
– Comprehensive data management, SLAs, reservation schemes, interactive use
• Many, many initiatives are underway and more are coming
How do we manage the complexity of all this?
Establishing Priorities
• We need to create a basic infrastructure that works well
– LHC needs a systems architecture and high-quality middleware – reliable and fault tolerant
– Tools for systems administration
– Focus on mainline physics requirements and robust data handling
– Simple end-user tools that deal with the complexity
• Need to look at the overall picture of what we are trying to do and focus resources on key priority developments
We must simplify and make the simple things work well. It is easy to expand scope, much harder to contract it!
Grid Technology – Next Steps
• leverage the considerable investments being made
– proposals being prepared for EU 6th Framework Programme, NSF-DoE funding round, various national science infrastructure funding opportunities
• priority target: hardening/re-engineering of current prototypes, with correctly funded maintenance and support
• but expect several major architectural changes before things mature
Target for the end of the decade
LHC data analysis using
“global collaborative environments integrating large-scale, globally distributed computational systems and complex data collections linking tens of thousands of computers and hundreds of terabytes of storage”
The researchers concentrating on science, unaware of the details and complexity of the environment they are exploiting
Success will be when the scientist does not mention the Grid
A few things to keep in mind
A global grid infrastructure needs a coordinated management structure
Middleware for a global infrastructure
– International development programme
– World-wide support & maintenance
– Regional and national sensitivities
Avoid HEP specials
– Basic middleware for global science, not just for HEP
– Plan for convergence with industrial solutions
Collaborative, complementary development projects
– partnership of computer science, software engineering, scientists
– funding from multiple agencies – national, regional, ..
Grid Technology Summary
• many R&D projects funded
– to develop and demonstrate middleware
– limited duration – many already in mid-life
• excellent initial experience
– shows the potential for science grids
– has given a lot of insight
– but we are coming to understand that this is very hard to do
• consolidation of the results and coordination of future efforts is now needed to build a solution for LHC
• a priority now is to
– harden/re-implement the current prototypes and pilot products
– understand support issues
– add the essential features for a production environment that are still missing because they were not part of the R&D projects
Fabric Area
Area Manager – Bernd Panzer
• CERN Tier 0+1 centre
– high performance data recording
– automated systems management & operation
– integration in LHC Grid
• Tier 1,2 centre collaboration
– develop/share experience on installing and operating a Grid
– exchange information on planning and experience of large fabric management
– look for areas for collaboration and cooperation
– use HEPiX as the communications forum
• Technology tracking & costing
– new technology assessment (PASTA III) just completed (Feb 03)
– re-costing of Phase II will be done 1H03 in light of
  • PASTA III
  • re-assessment of experiment trigger rates, event sizes (LHCC)
  • but no significant re-assessment of the analysis model
Mass Storage Requirements
• Current mass storage requirements defined by ALICE for high performance data recording
– 350 MB/sec in 2002
– 750 MB/sec in 2005
– 1.2 GB/sec in 2008
• Attempt to define requirements for mass storage support for analysis stalled
– analysis model not clear enough
– worrying for Tier 1 centres
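To give a feel for what the data recording rates above imply for mass storage, the sketch below converts a sustained rate into a daily volume. It is an illustration only: it assumes decimal units (1 TB = 1,000,000 MB, 1.2 GB/sec = 1200 MB/sec), and the `duty_cycle` parameter is a hypothetical knob introduced here, not a figure from the requirements.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Recording rates quoted above, in MB/sec (1.2 GB/sec taken as 1200 MB/sec).
RATES_MB_PER_SEC = {2002: 350, 2005: 750, 2008: 1200}


def daily_volume_tb(rate_mb_per_sec, duty_cycle=1.0):
    """Terabytes written per day at a sustained rate.

    duty_cycle is a hypothetical fraction of the day spent recording;
    1.0 means recording around the clock.
    """
    return rate_mb_per_sec * SECONDS_PER_DAY * duty_cycle / 1_000_000


if __name__ == "__main__":
    for year, rate in RATES_MB_PER_SEC.items():
        print(f"{year}: {rate} MB/sec is about "
              f"{daily_volume_tb(rate):.1f} TB per day if sustained")
```

Under these assumptions, 350 MB/sec sustained for a full day is roughly 30 TB, and 1.2 GB/sec is roughly 104 TB.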
Resources in Regional Centres
• Estimates of resources in Regional Centres being gathered by Grid Deployment Board
• Expect to be complete this month
• Then we will compare with Data Challenge requirements
• Delivery efficiency is a key factor – hard to estimate at present
Resources at CERN
LCG Phase 1 - Externally-Funded Personnel Profile at CERN
[Chart, 2001 to 2005: FTE weighted by experience, stacked by funding source: EU, USA, CERNMat, Sweden, Israel, Hungary, Portugal, Switzerland, Spain, France, Germany, Italy, UK]
[Chart, 2002 to 2005: FTE requested, committed, and cumulative balance]
Computing Materials at CERN: Infrastructure + Physics
[Chart, 2002 to 2010, in MCHF per year. Series: Engineering and Control Systems; Infrastructure (non-physics); CC preparation; Prototype Tier 0+1; Tier 0+1 installation, commissioning and operation; Short term staff for Phase 2; Staff for Tier 0/1 (20 FTE); Infrastructure (LHC experiments); Production Computing (LEP/Fixed Target); Production Computing (LHC Experiments). Reference lines: External Income + MTP; Medium Term Plan.]
Challenges - I
General background -
• Complexity of the project – Regional Centres, Grid projects, experiments, funding sources and funding motivation
• The project is operating in an environment where
– there is already a great deal of activity: applications software, data challenges, grid testbeds
– requirements are changing as understanding and experience develop
• Fundamental technologies are evolving independently of the project and LHC
Challenges - II
Going well -
• Obtaining agreement on common requirements between the LHC experiments
• Integrating all of the players in implementation teams
– CERN staff and visitors, experiments, other institutes
• Resources in Regional Centres
– but we need to understand delivery efficiency
Going reasonably well -
• Influence on external projects to which LCG supplies resources – GEANT4, ROOT
• Influence on grid projects and evolution
Challenges - III
Still in question -
• Production quality service on a Grid - harder than it looks
– Proceed with caution - realistic targets
– Urgent to establish how well middleware works, get suppliers focused on support, stability
• Grids imply operation and management by the community – evolution from empires to a federation
• We are a long way from demonstrating that we can do effective ESD analysis on a Grid