1 alice grid status david evans the university of birmingham gridpp 16 th collaboration meeting qmul...
1
ALICE Grid Status
David Evans
The University of Birmingham
GridPP 16th Collaboration Meeting, QMUL, 27-29 June 2006
2
Outline of Talk
The ALICE Experiment
ALICE computing requirements
ALICE Grid – AliEn
Analysis using AliEn
Status of ALICE Data Challenge 2006
Summary and Outlook
3
The ALICE Experiment
ALICE is one of the four main LHC experiments at CERN.
It is the only one dedicated to heavy-ion physics.
– Study of QCD under extreme conditions
~1000 collaborators
~100 institutions
Birmingham is the only UK institute involved
4
ALICE Requirements
Data taking (each year)
– 1 month of Pb-Pb data: ~1 PByte
– p-p data for the rest of the year: ~1 PByte
Large-scale simulation effort
– 1 Pb-Pb event: ~8 hrs (3 GHz)
Data reconstruction
Data analysis
A smaller collaboration than ATLAS or CMS, but with similar computing requirements.
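To give a feel for the simulation figure above, here is a rough back-of-the-envelope sketch. Only the ~8 CPU-hours per Pb-Pb event comes from the slide; the event count and CPU count passed to the function are hypothetical inputs chosen by the caller, and the estimate assumes every CPU is a 3 GHz machine kept fully busy.

```python
HOURS_PER_PBPB_EVENT = 8.0  # from the slide: ~8 hrs per event on a 3 GHz CPU


def simulation_wall_clock_days(n_events: int, n_cpus: int,
                               hours_per_event: float = HOURS_PER_PBPB_EVENT) -> float:
    """Estimate wall-clock days to simulate n_events on n_cpus.

    Assumes perfect parallelism and 100% CPU availability, which a real
    Grid production will not achieve, so treat the result as a lower bound.
    """
    if n_events < 0 or n_cpus <= 0:
        raise ValueError("n_events must be >= 0 and n_cpus must be > 0")
    total_cpu_hours = n_events * hours_per_event
    wall_clock_hours = total_cpu_hours / n_cpus
    return wall_clock_hours / 24.0
```

For example, a hypothetical sample of 3000 events spread over 1000 CPUs needs 24,000 CPU-hours, i.e. about one day of wall-clock time under these idealised assumptions.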
5
Profile of CPU requirements
[Figure: CPU requirements over time (time marks: Jan 07, Sept 08, Nov 09; labelled value: 35 MSI2K). Curves: Total, CERN T0, CERN T1, Ext Tier 1, Ext Tier 2.]
6
Tier Hierarchy
MONARC Model
– Tier 0: RAW data master copy; data reconstruction (1st pass); prompt analysis
– Tier 1: copy of RAW; reconstruction; scheduled analysis
– Tier 2: MC production; partial copy of ESD; data analysis
‘Cloud Model’ (tier-free) used in ALICE data challenges (native AliEn sites; for LCG sites we comply with the Tier model)
7
ALICE Grid – AliEn
AliEn (ALICE Environment): Grid framework developed by ALICE, used in production for ~5 years.
Based on web services and standard protocols.
Built around open source code
– Less than 5% is native AliEn code (mainly Perl).
To date, > 500,000 ALICE jobs have been run under AliEn control worldwide.
8
AliEn ‘Pull’ Protocol
One of the major differences between the AliEn and LCG grids is that AliEn uses the ‘pull’ rather than the ‘push’ protocol.
[Diagram: EDG/Globus model and AliEn model, each showing user, server and Resource Broker, with arrows labelled "job" and "list"]
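The pull idea can be illustrated with a minimal sketch: jobs wait in a central queue, and each site asks for work only when it has free slots, rather than a broker deciding where to send each job. All class, method and field names here (`CentralTaskQueue`, `Site`, `request_job`, `slots_needed`) are hypothetical and chosen for illustration; this is not AliEn's actual interface.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    job_id: int
    slots_needed: int = 1  # hypothetical requirement, for illustration only


class CentralTaskQueue:
    """Holds pending jobs; it never pushes them anywhere."""

    def __init__(self) -> None:
        self.pending = deque()

    def submit(self, job: Job) -> None:
        self.pending.append(job)

    def request_job(self, free_slots: int) -> Optional[Job]:
        """Called by a site: return the first pending job that fits, else None."""
        for job in list(self.pending):
            if job.slots_needed <= free_slots:
                self.pending.remove(job)
                return job
        return None


@dataclass
class Site:
    name: str
    free_slots: int
    running: list = field(default_factory=list)

    def pull(self, queue: CentralTaskQueue) -> None:
        """Keep asking the queue for work while this site has free slots."""
        while self.free_slots > 0:
            job = queue.request_job(self.free_slots)
            if job is None:
                break
            self.free_slots -= job.slots_needed
            self.running.append(job.job_id)


def demo():
    """Five jobs, one site with two free slots and one with a single slot."""
    queue = CentralTaskQueue()
    for i in range(5):
        queue.submit(Job(i))
    site_a, site_b = Site("site_a", 2), Site("site_b", 1)
    site_a.pull(queue)
    site_b.pull(queue)
    return site_a.running, site_b.running, [j.job_id for j in queue.pending]
```

In this sketch the queue needs no knowledge of site state: a site that is down or full simply does not ask, and the remaining jobs stay queued until some site pulls them.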
9
LCG / gLite
ALICE is committed to using as many common Grid applications as possible.
Changes have been made to make AliEn work with LCG
– E.g. changes to the File Catalogue (FC) → LFC (Local File Catalogue or LCG File Catalogue)
– VO box at each Tier 1 and Tier 2
– Globus/GSI-compatible authentication
AliEn-gLite interface in development
10
Analysis
The core of the ALICE computing model is AliRoot
– Uses the ROOT framework
Couple AliEn with ROOT for Grid-based analysis.
– Use PROOF (Parallel ROOT Facility)
– To the user it’s like using ROOT
4-tier architecture:
– ROOT client session, API server (AliEn + PROOF), site PROOF master servers, PROOF slave servers.
Data from DC2006 only accessible via the Grid
11
PROOF
Each node has a PROOF slave
Each site has a PROOF master server
Uses ‘pull’ protocol, i.e. the slaves ask the master for work packets. Slower slaves get smaller work packets, etc.
[Diagram: Client API, API Server, AliEn FC…., "List of sites with data"]
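The work-packet behaviour described above can be sketched as a small simulation: each slave asks the master for work whenever it becomes idle, and the master sizes the packet so that it takes roughly the same time on any slave. The sizing rule (`rate * target_seconds`), the function name and the rates are assumptions made for illustration; this is not PROOF's actual packetizer.

```python
import heapq


def run_pull_master(total_events: int, slave_rates: dict,
                    target_seconds: float = 10.0) -> dict:
    """Simulate slaves pulling work packets from a master.

    slave_rates maps a slave name to its processing rate in events/second.
    Returns a dict mapping each slave name to the list of packet sizes it received.
    """
    remaining = total_events
    packets = {name: [] for name in slave_rates}
    # Priority queue of (time at which the slave becomes idle, slave name);
    # every slave is idle at t = 0 and therefore asks for work immediately.
    idle = [(0.0, name) for name in sorted(slave_rates)]
    heapq.heapify(idle)
    while remaining > 0:
        now, name = heapq.heappop(idle)  # next slave to ask for work
        rate = slave_rates[name]
        # Slower slaves get smaller packets: size scales with the slave's rate.
        size = min(remaining, max(1, int(rate * target_seconds)))
        remaining -= size
        packets[name].append(size)
        heapq.heappush(idle, (now + size / rate, name))
    return packets


example = run_pull_master(5000, {"fast": 100.0, "slow": 25.0})
```

With these assumed rates the fast slave receives packets of 1000 events and the slow one packets of 250, so both finish each packet in about the same time and neither holds up the end of the query.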
12
ALICE Data Challenge 2006 (PDC’06)
Last ‘challenge’ before the start of data taking
Test of all Grid components
– AliEn as the ALICE interface to the Grid, and much, much more
– LCG/gLite baseline services (WMS, DMS)
Test of the computing centres’ infrastructure
Major test of the stability of all of the above
13
Grid software deployment and running
LCG sites are operated through the VO-box framework
– All ALICE sites need one
– Relatively extended deployment cycle; a lot of configuration and version update issues had to be solved
– Situation is quite routine now
Data management
– This year: xrootd as disk pool manager on all sites
– The installation/configuration procedures have just been released
– Integration of xrootd into other storage management solutions (CASTOR, DPM, dCache) is under development
Data replication (FTS)
– We use it for scheduled replication of data between the computing centres (RAW from T0->T1, MC production T2->T1, etc.)
– Fully incorporated in the AliEn FTD, to be extensively tested in July
14
VO box support and operation
In addition to the standard LCG components, the VO-box runs ALICE-specific software components
– VO-boxes now at RAL Tier 1 and Birmingham Tier 2
– Birmingham ALICE students are testing AliEn for analysis purposes through Birmingham Tier 2.
The installation and maintenance of these is entirely our responsibility:
– Support for the UK VO-box supplied by CERN (no UK manpower available)
Site-related problems are handled by the site admins
LCG service problems are reported to GGUS
15
Operation status
Running in a continuous mode since 24/05
VO-boxes:
– Monthly releases of AliEn (currently v.2-10), LCG 2.7.0 and soon tests of gLite 3.0
Central ALICE services:
– AliEn machinery and API Service are developed, deployed and maintained by the AliEn team
Site services:
– Stability testing of both AliEn and LCG components
– The AliEn-LCG/gLite interfaces are still in development
– A gLite VO-box has already been provided at CERN and first tests performed.
16
Running status – one month
17
Site contributions in the past 2 months
60% T1, 40% T2 (almost half from 2 T2 sites!)
RAL: 0.7%
18
Running status – site averages
Pledged resources: 4000 CPUs
Our average is at the 12% level
– Due to central and site services malfunctions
– Mostly due to sites providing fewer CPUs than pledged
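As a quick check of what the 12% level means in absolute terms, the following sketch converts the two figures on this slide into CPU counts. It assumes the 12% refers to the average number of CPUs in use relative to the 4000 pledged; the variable names are illustrative only.

```python
PLEDGED_CPUS = 4000      # from the slide
AVERAGE_USE_PERCENT = 12  # from the slide, read as a share of the pledge

# Integer arithmetic avoids floating-point rounding in the percentage.
average_cpus_in_use = PLEDGED_CPUS * AVERAGE_USE_PERCENT // 100
unused_pledged_cpus = PLEDGED_CPUS - average_cpus_in_use
```

Under that reading, about 480 CPUs were in use on average, leaving roughly 3520 of the pledged CPUs not delivered or not usable.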
19
Stability improvements
This is a data challenge, so there is always room for improvement:
– AliEn is undergoing gradual fixes and new features are being added
– The LCG software will undergo a quantum leap: the move from LCG to gLite
– Site infrastructure (VO-box, etc.) also needs solidification, especially at the T2s
– Monitoring and control: new features are continuously being added
20
Outlook
PDC’06 has started as planned
– This is the last exercise before the beam!
– It is a test of all Grid tools/services we will use in 2007
» If they are not in PDC’06, there is a good chance they will not be ready
– It is also a large-scale test of the computing infrastructure: computing, storage and network performance
21
Outlook (2)
We have all the pieces needed to run production on the Grid (some untested).
The exercise started 2 months ago and will continue until the end of the year
At the moment, we are optimising the use of resources, attempting to get the promised resources from the sites
The next phase of the plan is a test of the LCG file transfer utilities (FTS) and their integration with the AliEn FTD
In parallel, we will run event production as usual
22
Summary
AliEn is a Grid framework developed by ALICE using 95% open source code (e.g. SOAP) and 5% AliEn-specific (Perl) code.
AliEn is evolving to take into account the EGEE/gLite framework and to work with LCG.
– New user interfaces developed
– PROOF for analysis developed
– Better authentication/authorisation developed
Data Challenge 2006, running since April, is going well
VO boxes at RAL T1 and B’ham T2
Lack of computing resources is a worry.