-
ALICE: Offline Planning and Personnel Resources
LHCC Manpower Review of Computing
September 3, 2003
ALICE : planning & resources
-
Questions to be answered
Profile of available and required manpower at CERN / Regional Centers / Institutes
Other resources existing and potential
Computing elements which will not be provided in case the required manpower and resources are not available
Measures of progress in producing necessary software
Management tools to track the progress
Verification of the quality of the LCG software
-
Foreword
Lack of personnel in LHC computing (experiment & common HW/SW infrastructure) was emphasized by the LHC Computing Review (2001) and judged extremely worrying
CERN and the Collaborations together must do all that they can to provide the HR that are needed for Core Software development
The shortage has been alleviated for the LCG project by an influx of computing professionals funded by member countries
No such mechanism exists yet for the experiments, where the personnel shortage remains a problem
ALICE has re-profiled its planning
The data to be shown represent a bare minimum below which the readiness for data processing cannot be guaranteed.
-
Menu: Planning & Resources
ALICE Offline organization & management
Strategy for the Offline project, DC & milestones
Personnel resources: available and requests
Answer to questions & conclusions
-
Organization
Offline project mandate:
  Prepare software and computing infrastructure for the experiment's data processing (+DAQ, +HLT projects)
  Provide and maintain a complete infrastructure for simulation, reconstruction and analysis already during the construction phase
Offline personnel for software developments:
  Core Offline project: minority, full time, located at CERN
  Detector projects: most of the personnel, part time (preparation of apparatus), located in collaboration institutes
LCG provides common hardware and software infrastructure for LHC computing.
Strict coordination required to make the best use of the personnel available.
-
Organization: Management structure
[Organization chart] Boxes shown: Project Leader & Deputy; Resources Coordination; Planning Coordination; Production Environment Coordination; Framework & Infrastructure Coordination; Simulation Coordination; Reconstruction & Physics Coordination; Core Offline; Offline Board; US Grid Coordination; EU Grid Coordination; Software projects; Detector projects; DAQ; HLT; Int. Comp. Board; Regional Tiers; LCG SC2, GDB, POB
-
Core Offline Work Packages
Framework and infrastructure coordination
Simulation coordination
Reconstruction and physics coordination
Production environment coordination
-
Organization: Management structure
Lightweight, single structure
Efficient use of available personnel
High adaptability to rapidly changing technology
Merge framework developers (service providers) & physics algorithm developers (consumers)
Maximize communication
Economy of personnel (polymorphism of software experts)
Rapid feedback to users' requirements
-
Planning
Strategy
  Dynamic management of the work schedule
  Develop a long-term software infrastructure
  Maintain the infrastructure in working state during detector construction
Constraints
  Depend on the planning of external projects (LCG, EDG, EGEE)
  Most developers belong to detector projects
  Take advantage of latest developments in fast-evolving technology
  No personnel available for in-depth planning activity
  Majority of personnel in the Core Offline project is temporary and with unpredictable skills
Lightweight and opportunistic strategy with flexible data challenges as high-level milestones
-
Core team @ CERN
A choice, not a necessity
Need for a strong and centralized team of experts
To facilitate coordination in all detector projects and all regional centers
CERN, more than other ALICE groups, has the critical mass of people with the right skills
Benefit from co-habitation with ALICE management
And with LCG management
Benefit from the attraction CERN exercises on young people with the right profile
-
Development strategy
Minimize the effective amount of development
Choose mature and well-tested products
  ROOT: common HEP solution for data persistency at the file level, interface to various libraries, visualization, graphical user interface, virtual Monte-Carlo, geometrical modeler
  AliEn: the ALICE distributed computing environment, made entirely with Open Source components based on Open Standards; 2 FTE for development, 0.5 for operation; in production since 2002
Reduce staff and rely on temporary personnel
  However, there is a threshold for staff
Delegate well-identified and modular packages to teams outside the Core group
  Detector database
  EDG/EGEE test bed
-
Data Challenges
Stress-test the ALICE data model, DAQ hardware and software infrastructure with prototypes of increasing complexity until the 2007 objectives are reached.
Computing DC: record HI data at 1.2 GBytes/s and export quasi-online processing outside CERN
Physics DC: provide the infrastructure for organized Monte-Carlo production and world-wide random data analysis
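To give a sense of scale for the Computing DC target, the following minimal sketch converts the 1.2 GBytes/s recording rate into a daily data volume. It assumes decimal units (1 TB = 1000 GBytes) and uninterrupted recording, neither of which is stated on the slide; the function name is illustrative only.

```python
# Sketch: daily data volume at a constant recording rate.
# Assumptions (not from the slides): decimal units, uninterrupted recording.

SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_gbytes_per_s: float) -> float:
    """Return the volume in TB written in one day at the given rate in GBytes/s."""
    return rate_gbytes_per_s * SECONDS_PER_DAY / 1000.0


if __name__ == "__main__":
    # 1.2 GBytes/s is the Computing DC recording target quoted above.
    print(f"{daily_volume_tb(1.2):.2f} TB per day")
```

Under these assumptions the target corresponds to roughly a hundred TB per day of sustained recording.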
-
Computing Data Challenge
ALICE & IT: assess the MS requirements and evaluate available products (1998)
Evaluate functions of DAQ, Offline, HLT projects
Large-scale high-throughput distributed DC (4) to:
  Prototype the DAQ, Offline, HLT computing systems
  Verify their integration
  Assess technologies and computing models
  Test hardware and software components in a realistic environment
  Achieve an early integration of the overall computing infrastructure
-
Milestones
-
Physics Data Challenge
Objectives: prototype and test scalability of the components needed to simulate, reconstruct, and analyze data on distributed computing resources
Three interlinked components:
  ROOT
  AliRoot
  AliEn
-
Milestones
* Fraction of events simulated in one year of standard data taking
-
PDC-III Resources estimate
Simulation
  10^5 Pb-Pb + 10^7 p-p
  Distributed production, (partial) data replication at CERN
Reconstruction and analysis
  Data source is CERN: 5 × 10^6 Pb-Pb + 10^7 p-p
  Reconstruction at CERN and outside depending on resource availability
Resources (CPU and storage)
  2004 Q1: 1354 kSI2k and 165 TB
  2004 Q2: 1400 kSI2k and 301 TB
Bandwidth
  Simulation in 2004 Q1: ~90 TB will be shipped to CERN in about 2 months
  ~10 days using 10% of the CERN bandwidth
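The bandwidth figures above can be turned into average transfer rates. The sketch below does this for the ~90 TB of simulated data, once for a 60-day window (our reading of "about 2 months", an assumption) and once for the 10-day figure. Decimal units are assumed and the function name is hypothetical.

```python
# Sketch: average transfer rate needed to ship a data volume in a given time.
# Assumptions (not from the slides): decimal units, 60 days for "about 2 months".

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_mbytes_per_s(volume_tb: float, days: float) -> float:
    """Average rate in MBytes/s needed to move volume_tb within the given days."""
    return volume_tb * 1_000_000 / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    for days in (60, 10):
        rate = average_rate_mbytes_per_s(90, days)
        print(f"{days:>3} days: {rate:.1f} MBytes/s on average")
```

These are averages over the whole window; peak rates during an actual production would differ.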
-
PDC-III resources profile
-
PDC-III resources
Details in the ALICE Data Challenges paper, taking into account:
  Results of previous PDC
  Estimation of simulations in a standard year (2009)
Storage: 200 TB must be kept beyond the PDC end!
The numbers indicating the LCG resources for ALICE assume simultaneous use of the resources by all the experiments!
A dynamic resource allocation would easily solve the deficit
USA quota to be confirmed
PDC-III requirements and LCG declared capacity for ALICE
                                                   04Q1    04Q2
CPU requirements (kSI2k)                           1354    1400
LCG declared capacity for ALICE (kSI2k)             941     941
Storage requirements, total active data (TB)        165     301
LCG declared capacity for ALICE, disk (TB)          192     192
LCG declared capacity for ALICE, tapes (TB)         578     578
LCG declared capacity for ALICE, total (TB)         770     770
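The deficit mentioned on this slide can be made explicit by comparing the requirement and declared-capacity figures above. The sketch below is only an illustration: the assignment of the values to 2004 Q1 and Q2 follows the PDC-III resources estimate, and treating tape as usable for active data is an assumption exposed as a parameter.

```python
# Sketch: shortfall between PDC-III requirements and LCG declared capacity.
# Figures are copied from the slides; the data layout and names are hypothetical.

REQUIRED = {
    "2004 Q1": {"cpu_ksi2k": 1354, "storage_tb": 165},
    "2004 Q2": {"cpu_ksi2k": 1400, "storage_tb": 301},
}
DECLARED = {"cpu_ksi2k": 941, "disk_tb": 192, "tape_tb": 578}


def shortfall(needed: int, available: int) -> int:
    """Return how much is missing, or 0 when capacity covers the need."""
    return max(needed - available, 0)


def cpu_shortfalls() -> dict:
    """CPU shortfall per quarter in kSI2k."""
    return {q: shortfall(r["cpu_ksi2k"], DECLARED["cpu_ksi2k"])
            for q, r in REQUIRED.items()}


def storage_shortfalls(include_tape: bool = True) -> dict:
    """Storage shortfall per quarter in TB, with or without tape capacity."""
    capacity = DECLARED["disk_tb"] + (DECLARED["tape_tb"] if include_tape else 0)
    return {q: shortfall(r["storage_tb"], capacity) for q, r in REQUIRED.items()}


if __name__ == "__main__":
    print("CPU shortfall (kSI2k):", cpu_shortfalls())
    print("Storage shortfall, disk + tape (TB):", storage_shortfalls())
    print("Storage shortfall, disk only (TB):", storage_shortfalls(include_tape=False))
```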
-
Tracking progress
Milestones set by the needs to prepare the Physics Performance Report
  Full and fast simulation
  Detector reconstruction
  Global reconstruction
Progress monitored by Physics DC
Central coordination at CERN (architect, librarian, multi-platform compatibility)
Offline Board decides on framework evolution and reviews progress
Developers implement during Offline week
Code reviewed by experts
-
Verification of LCG software quality Grid technology area
-
Verification of LCG software quality Grid deployment area
-
Verification of LCG software quality Fabric area
-
ALICE Offline Planning
Today
-
Personnel Profile (task oriented)
4 permanent staff persons
The profile is built on the assumption that temporary personnel are NOT replaced*
Evolution reported since 1998
* Unrealistic scenario to emphasize fragility of the structure
-
Personnel Profile (task oriented) - 1/5
Activity      98    99    00    01    02    03    04    05    06    07    08    09
Off-line Coordination
  Avail.     0.8   1.0   1.0   1.0   1.0   1.7   1.5   1.0   1.0   1.0   1.0   1.0
  Needed     1.0   1.0   1.0   1.0   2.0   2.0   2.0   2.0   2.0   2.0   2.0   2.0
  Missing    0.3   0.0   0.0   0.0   1.0   0.3   0.5   1.0   1.0   1.0   1.0   1.0
DB and distributed computing infrastructure
  Avail.     0.6   2.2   1.6   1.5   1.8   2.0   2.0   2.4   2.0   0.8   0.0   0.0
  Needed     2.0   2.0   2.0   2.0   2.0   2.0   2.0   2.0   2.0   2.0   2.0   2.0
  Missing    1.5   0.2   0.4   0.5   0.3   0.0   0.0   0.0   0.0   1.2   2.0   2.0
Framework Development
  Avail.     0.4   0.4   0.3   0.8   1.8   2.3   1.9   1.3   1.3   0.8   0.3   0.3
  Needed     1.0   1.0   1.5   1.5   1.5   2.0   2.0   2.0   2.0   2.0   2.0   2.0
  Missing    0.6   0.6   1.2   0.7   0.3   0.3   0.1   0.7   0.7   1.2   1.7   1.7
Simulation framework
  Avail.     1.9   2.0   2.8   3.0   3.3   3.0   2.8   2.0   1.5   1.0   1.0   1.0
  Needed     3.0   3.0   3.0   3.0   3.0   3.0   3.0   2.0   2.0   1.5   1.0   1.0
  Missing    1.1   1.0   0.3   0.0   0.3   0.0   0.3   0.0   0.5   0.5   0.0   0.0
-
Personnel Profile (task oriented) - 2/5
-
Personnel Profile (task oriented) - 3/5
Activity      98    99    00    01    02    03    04    05    06    07    08    09
Radiation Studies
  Avail.     0.5   0.3   0.8   1.0   1.0   1.0   0.0   0.0   0.0   0.0   0.0   0.0
  Needed     0.5   0.5   1.0   1.0   1.0   1.0   1.0   1.0   0.5   0.5   0.5   0.5
  Missing    0.0   0.2   0.2   0.0   0.0   0.0   1.0   1.0   0.5   0.5   0.5   0.5
System support
  Avail.     1.0   1.8   1.5   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0
  Needed     1.0   1.0   1.5   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0
  Missing    0.0   0.8   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
Analysis support
  Avail.     0.0   0.0   0.3   1.0   1.2   1.4   0.8   0.0   0.0   0.0   0.0   0.0
  Needed     0.0   0.0   0.5   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0
  Missing    0.0   0.0   0.2   0.0   0.2   0.4   0.2   1.0   1.0   1.0   1.0   1.0
-
Personnel Profile (task oriented) - 4/5 Summary Core Offline team
              98    99    00    01    02    03    04    05    06    07    08    09
Avail.       6.8   9.8  11.8  13.7  16.1  18.4  14.9  10.0   8.0   5.6   4.3   4.3
Needed      11.5  11.5  15.7  16.5  17.5  18.0  18.5  17.5  17.0  16.5  16.0  16.0
Missing      4.8   1.7   3.9   2.8   1.4   0.4   3.7   7.5   9.0  10.9  11.7  11.7
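One way to read the Core Offline summary is as a coverage ratio, available over needed, per year. The sketch below computes it from the Avail. and Needed rows; it does not recompute the slide's own Missing row, which appears to be built from the per-activity tables rather than from these totals. Expanding the two-digit year labels to 1998-2009 and all names are our assumptions.

```python
# Sketch: yearly coverage (available / needed) of the Core Offline team.
# Values are copied from the summary rows; years 98..09 are read as 1998..2009.

YEARS = list(range(1998, 2010))
AVAILABLE = [6.8, 9.8, 11.8, 13.7, 16.1, 18.4, 14.9, 10.0, 8.0, 5.6, 4.3, 4.3]
NEEDED = [11.5, 11.5, 15.7, 16.5, 17.5, 18.0, 18.5, 17.5, 17.0, 16.5, 16.0, 16.0]


def coverage_by_year() -> dict:
    """Map each year to the fraction of needed personnel that is available."""
    return {year: avail / need
            for year, avail, need in zip(YEARS, AVAILABLE, NEEDED)}


if __name__ == "__main__":
    for year, fraction in coverage_by_year().items():
        print(f"{year}: {fraction:6.1%}")
```

With no replacement of temporary personnel, the ratio peaks around 2003 and falls to roughly a quarter of the need by 2008-2009, which is the fragility the profile is meant to show.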
-
Personnel Profile (task oriented) - 5/5
Long build-up time
Must sustain plateau after 2003
-
Personnel Profile (post oriented)
4 permanent CERN staff
Temporary CERN personnel (no replacement assumed*)
  Staff LD
  Technical and Physics students
  CERN Fellows
Temporary CERN Project Associates (direct contribution from collaboration institutes + ALICE CERN exploitation budget; no replacement assumed*)
* Unrealistic scenario to emphasize fragility of the structure
-
Personnel Profile (post oriented) - 1/5
Mostly temporary personnel
Substantial contribution from collaboration institutes
ROOT effect in 1999, AliEn effect in 2003
-
Personnel Profile (post oriented) - 2/5
Only 25% permanent personnel
More than 60% are short/medium-term personnel
-
Out-sourced projects - 1/3
Detector DB by the Physics Department and Computer Science Department @ Warsaw University: a single DB (economy of personnel) common to all detectors in the experiment
-
Out-sourced projects - 2/3
EDG testbed validation and participation in various GRID projects by ALICE/Italy, ALICE/US, and the EDG/DataTAG project; to be continued with EGEE
-
Out-sourced projects - 3/3
AliEn: basis of the ALICE distributed computing infrastructure. Coordination and main development by the Core Offline group, but several specific sub-tasks are delegated to individuals at remote places
-
Resources summary
Distribution of personnel for common offline activities
About 40% of the work is distributed outside CERN
-
HLT Software
Only personnel working on algorithms and simulation in collaboration with the Offline project
Part of the missing personnel should come from PhD students
-
LCG projects in application area
ALICE has already made most of the choices for critical issues (persistency, data DB, tracking, geometry descriptor, distributed computing, etc.)
Does not need to rely on common LCG applications
To come: AliEn coupled with PROOF as generic architecture for LCG interactive analysis
However, ALICE contributes to common developments:
  GANIS ????
-
Other resources
EU project: one person to work full time on EDG for ALICE
Industry: Do not remember who???? : Code checker
Ericson: AliEn what exactly ????
NASA: one person full time on the Virtual Monte-Carlo ?????
-
Offline in detector projects - 1/3
AliRoot: an object-oriented framework which directly uses ROOT and provides:
  Many event generators
  Tracking using Virtual Monte-Carlo
  IO infrastructure
  Steering functionalities
  Global reconstruction
  Detector (13) tracking and reconstruction
  Analysis
-
Offline in detector projects - 2/3
No full-time dedicated developers
Schedule defined by global milestones (DC)
Planning is task oriented rather than personnel oriented
-
Offline in detector projects - 3/3
Summary
Total       39.7  37.3  35.8  35.8
Needed       8.6  13.3  14.4  14.4
-
Personnel resources in Offline project
About 16% of the personnel are at CERN and the remainder in collaboration institutes; there are no experiment-dedicated personnel at regional centers.
-
Personnel resources in Offline project
Outside institutes (84%)
CERN (16%)
Analysis: 38
Subdetector projects: 51
HLT: 4
Core offline, not CERN: 10
Core offline, CERN: 30
-
How to mitigate the lack of personnel
The ALICE Offline project is committed to providing the collaboration with adequate software to take and analyze data starting in 2007.
The project has already adapted its strategy to the lack of personnel and aims at a bare minimum that still allows it to fulfill its tasks.
The Core team cannot afford to lose more personnel without endangering its goals.
The severe lack of personnel in the detector projects will translate into a lack of readiness, both in the accuracy of the algorithms and in the availability of whole categories of algorithms. Such a deplorable situation will have a negative impact on the quality of physics results.
-
ALICE priorities - 1/4
Core Offline group at CERN:
  Less than 1/4 of personnel in the Core Offline group at CERN are permanent
  More than 50% are temporary personnel
  Dependence on availability of short-term CERN positions
  Uncertainty on renewals
  Loss of knowledge, difficulty of knowledge transfer
  Difficulty to cover key positions with people with the appropriate profile
  Competition within ALICE in a fixed-quota situation
-
ALICE priorities - 2/4
Core Offline group at CERN:
  Have at least 1/3 of long-term personnel, limit use of fellows and students to 1/2, without changing the target number of FTEs
  Ensure the coverage of key areas by converting two area coordinators (Production Environment, Framework & Infrastructure), now on temporary positions, into CERN permanent staff
  Alleviate the volatility of the Core Offline team with at least two long-term (6 years, LD-like) positions at CERN to replace short-term ones (detaching LCG personnel to ALICE would be a natural solution)
  Which profile/task????
-
ALICE priorities - 3/4
Core Offline group at CERN:
-
ALICE priorities - 4/4
Detector Offline at collaboration institutes:
  About 10 FTEs missing in the subdetector projects for software developments
  This is a responsibility of the institutes in charge of the subdetector projects
  We are working hard to find these people
  Additional resources from funding agencies will have to be discussed case by case
-
Answer to questions - 1/4
Profile of available and required manpower at CERN / Regional Centers / Institutes
Core Offline group: 2 CERN staff + 2 long-term personnel would create satisfactory working conditions
We have reached an equilibrium that allows all the assigned tasks to be fulfilled; however, the equilibrium is fragile.
-
Answer to questions - 2/4
Profile of available and required manpower at CERN / Regional Centers / Institutes
Detector groups: most of the groups are understaffed; personnel (about 10 FTE) dedicated to detector projects are systematically needed in the institutes
A solution is to be found within the collaboration, with the help of funding agencies on a case-by-case basis
-
Answer to questions - 3/4
Other resources existing and potential
  A few occasional collaborations with industries
Computing elements which will not be provided in case the required manpower and resources are not available
  Lack of readiness of algorithms or accuracy in algorithms
  Serious difficulties to interface ALICE software to LCG middleware
  Quasi-impossibility to adopt new LCG common software
-
Answer to questions - 4/4
Measures of progress in producing necessary software:
  Because of the scarce personnel available for the Offline project, a lightweight and dynamic organization has been adopted
  This organization has so far been successful in producing a framework, detector software and a grid environment routinely used by the collaboration for detector design and physics validation.
  LCG software will be considered as soon as stable versions outperforming the software presently in use become available.
  Milestones to test LCG middleware and fabric will be closely watched.
-
Conclusions - 1/2
The core team of the Offline project has adapted to the reduced personnel available and established its tasks and objectives accordingly.
The edifice is fragile: any additional cut in (temporary) personnel might hinder the availability in due time of the software needed for data taking and analysis.
Securing two staff positions is instrumental for the project's success.
-
Conclusions - 2/2
Adding 2-3 long-term personnel to the core team would alleviate the unstable situation by making it less dependent on temporary personnel
The lack of personnel fully dedicated to software development in the detector projects is worrisome, as the lack of indispensable algorithms might dramatically delay first physics results from LHC.
The needed personnel (not necessarily computer specialists) must be recruited by the institutes of the collaboration.
-
DAQ-HLT Data Flow
[Data-flow diagram] Components shown: Detector RO, DDL SIU, DDL DIU, D-RORC, H-RORC, Detector LDC, HLT LDC, FEP (HLT algorithm), HLT, Trigger, Event Building Network (raw, HLT data, decisions), Storage Network, GDCs. Labels: ~400 DDL; ~300 DDL (TPC, TRD, MUON, ITS); 10 DDL; pre(co)-processing; Mode A; Mode B&C.
-
Tasks of offline
Framework and infrastructure coordination
  Framework development (simulation, reconstruction, analysis)
  Persistency technology
  Computing/Physics data challenges with DAQ/HLT
  Industrial joint projects
  Technology tracking
  Librarian, CVS maintenance, test procedures
  QA tools
  Support and documentation
Core
-
Tasks of offline
Simulation coordination
  Detector simulation
  Physics simulation
  Physics validation
  G4 integration
  Fluka integration
  Radiation studies
  Geometrical modeler
Core
-
Tasks of offline
Reconstruction and physics coordination
  Tracking
  Detector reconstruction
  Global reconstruction
  Analysis tools
  Analysis algorithms
  Physics data challenges
  Calibration and alignment algorithms
Core
-
Tasks of offline
Production environment coordination
  Production environment for simulation, reconstruction and analysis
  Distributed computing environment
  Databases organization
Core
-
Tasks of offline
World computing coordination
  Planning and resources coordination for LCG1&2
  Relations with national/international Grid projects
Core
-
Computing needs for PDC III
Flexibility of distributed computing model
Alternative scenarios
-
LCG resources pledged for ALICE in 2003
USA quota to be confirmed
-
Objectives of PDC 3
The estimated number of events is essentially defined by the jet study
  10^5 events for jets with pt ~10-20 GeV/c
  10^4 to 10^5 events for studies on particle correlations and simple and double strangeness (,)
  10^6 events for high-pt jets (~10^5 underlying events), charmonium and bottomonium into e+e-; similar statistics, the same underlying events can be reused
Centrality: 50% central events (b
-
Relation with HLT
In ALICE, Offline, HLT and DAQ are three distinct projects
Cooperation between HLT and Offline is good
HLT is using the common AliRoot framework to do simulation
Some of the HLT algorithms have been integrated in the Offline framework for testing
Like Offline, HLT coordinates the activities of the different subdetector projects
The main thrust of HLT is the definition of the HLT architecture (HW and SW) and some seminal work on algorithms
More work on algorithms is done in collaboration with the subdetector projects
Integration and testing of the three projects is also performed during DCs
NOT included is manpower for the online infrastructure, i.e. cluster management, process communication infrastructure, monitoring, FPGA coprocessor interface.
-
ALICE Data challenges
[Diagram] Labels shown: Simulated Data, Raw Data, DAQ, ROOT, ROOT I/O, CASTOR, CERN Tier 0 / Tier 1, Regional Tier 1, Tier 2, GRID
-
ADC IV Hardware Setup
Total: 192 CPU servers (96 on GbE, 96 on FE), 36 disk servers, 10 tape servers
[Network diagram] Labels shown: backbone (4 Gbps); 4 Gigabit switches; 3 Gigabit switches; 4 Gigabit switches; 2 Fast Ethernet switches; TOTAL: 32 ports; TOTAL: 18 ports; CPU servers on FE; 10 tape servers (distributed); fibers; node ranges TBED0001-12, 13-24, 25-36, 37-48, 49-60, 61-72, 73-76, 77-88, 89-112; TBED0007D, 01D-12D, 13D-24D, 25D-36D; LXSHARE
-
ALICE DC BW
-
ADC IV performances
Event building with flat data traffic
  No recording
  5 days non-stop
  1800 MBytes/s sustained
Event building and data recording
  With ALICE-like data traffic
  Recording to CASTOR
  4.5 days non-stop
  To disk: ~140 TBytes
  350 MBytes/s sustained
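The recording figures above can be cross-checked with simple arithmetic: a sustained rate multiplied by the run duration should reproduce the quoted volume. The sketch below assumes decimal units and a perfectly constant rate over the 4.5 days, which the slide does not state; the function name is illustrative.

```python
# Sketch: data volume accumulated at a sustained rate over a run.
# Assumptions (not from the slides): decimal units, constant rate throughout.

SECONDS_PER_DAY = 24 * 60 * 60


def recorded_volume_tbytes(rate_mbytes_per_s: float, days: float) -> float:
    """Return the volume in TBytes written at the given rate for the given days."""
    return rate_mbytes_per_s * days * SECONDS_PER_DAY / 1_000_000


if __name__ == "__main__":
    volume = recorded_volume_tbytes(350, 4.5)
    print(f"350 MBytes/s for 4.5 days: {volume:.1f} TBytes")
```

The result, about 136 TBytes, is consistent with the ~140 TBytes quoted for the recording run.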