10/03/2008 A.Minaenko 1
ATLAS computing in Russia
A.MinaenkoInstitute for High Energy Physics, Protvino
JWGC meeting 10/03/08
ATLAS RuTier-2 tasks
• The Russian Tier-2 (RuTier-2) computing facility is planned to supply computing resources to all 4 LHC experiments, including ATLAS. It is a distributed computing centre currently comprising the computing farms of 6 institutions: ITEP, KI, SINP (all Moscow), IHEP (Protvino), JINR (Dubna), PNPI (St. Petersburg)
• The main RuTier-2 task is to provide facilities for physics analysis using AOD, DPD and user-derived data formats such as ROOT trees
• The full current AOD and 30% of the previous AOD version should be available
• Development of reconstruction algorithms, which requires subsets of ESD and RAW data, should be possible
• All data used for analysis should be stored on disk servers (SE); unique data (user and group DPD) as well as the previous AOD/DPD version should also be saved on tape
• The second important task is the production and storage of MC simulated data
• The planned RuTier-2 resources should be sufficient to fulfil these goals
ATLAS RuTier-2 resource evolution

Year         2007  2008  2009  2010  2011  2012
CPU (kSI2k)   320   780  1500  2800  3800  4800
Disc (TB)     150   280   610  1400  2200  3000
Tape (TB)       -    70   160   370   580   780
• The table above was included in Russia's pledge to LCG and reflects our current understanding of the resources needed. It may be corrected in the future as we understand our needs better
• Not taken into account: AOD size increase due to inclusive streaming, a changed MC event rate (30% instead of 20%), a possible increase of the AOD event size (100 KB assumed), and an increase of the total DPD size (0.5 of AOD assumed)
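The slide's stated assumptions (100 KB per AOD event, total DPD = 0.5 of AOD, 30% of the previous AOD version kept) allow a back-of-envelope check of the disk numbers. The yearly event count below is an illustrative assumption (roughly 200 Hz over ~1e7 s of data taking), not a figure from the slide:

```python
# Back-of-envelope disk estimate for AOD/DPD storage at a Tier-2.
# Assumed inputs; only the first three constants come from the slide itself.
AOD_EVENT_SIZE_KB = 100      # slide assumes 100 KB per AOD event
DPD_FRACTION = 0.5           # slide assumes total DPD = 0.5 x AOD
PREV_AOD_FRACTION = 0.3      # 30% of the previous AOD version kept on disk
EVENTS_PER_YEAR = 2e9        # illustrative: ~200 Hz over ~1e7 s of data taking

aod_tb = EVENTS_PER_YEAR * AOD_EVENT_SIZE_KB * 1e3 / 1e12   # KB -> bytes -> TB
dpd_tb = DPD_FRACTION * aod_tb
total_tb = aod_tb * (1 + PREV_AOD_FRACTION) + dpd_tb

print(f"full AOD: {aod_tb:.0f} TB, DPD: {dpd_tb:.0f} TB, total: {total_tb:.0f} TB")
```

Under these assumptions one AOD copy is ~200 TB, which is the right order of magnitude for the 2009-2010 disk pledges in the table once MC data and user formats are added.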
Current RuTier-2 resources for all experiments
        CPU slots    CPU, kSI2k   Disc, TB
IHEP    172          260          45
ITEP    146          250          52
JINR    240 (160)    670 (430)    83
KI      400          1000         30 (250)
PNPI    188          280          52
SINP    176          280          9 (50)
Total   1322 (160)   2740 (430)   271 (300)
• Figures in parentheses (shown in red on the slide) will be available in 1-2 months
• ATLAS request for 2008 = 780 kSI2k, 280 TB
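The table can be cross-checked against the 2008 ATLAS request. Reading the parenthesised CPU figures as included in the totals (pending delivery) and the parenthesised disk figures as additional pending capacity is an interpretation of the slide, not something it states explicitly:

```python
# Quick check of current RuTier-2 capacity against the 2008 ATLAS request
# (780 kSI2k CPU, 280 TB disk). All numbers copied from the slide table.
cpu_total_ksi2k = 260 + 250 + 670 + 1000 + 280 + 280   # = 2740, incl. 430 pending
disk_now_tb = 45 + 52 + 83 + 30 + 52 + 9               # = 271 installed
disk_pending_tb = 250 + 50                             # KI and SINP additions

print("CPU total (all experiments):", cpu_total_ksi2k, "kSI2k")
print("Disk incl. pending:", disk_now_tb + disk_pending_tb, "TB")
```

CPU comfortably exceeds the ATLAS share of the 2008 request, while the installed disk (271 TB, shared by 4 experiments) falls short of the 280 TB ATLAS request until the pending 300 TB arrives, consistent with the disk-shortage remarks later in the talk.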
Normalized CPU time (hour*kSI2k)
RuTier-2 for ATLAS in 2007
ATLAS share – 21% (846 kh*kSI2k)
Site contributions to ATLAS in 2007
ATLAS RuTier-2 in the SARA cloud
• The sites of RuTier-2 are associated with the ATLAS Tier-1 SARA
• Now 5 sites (IHEP, ITEP, JINR, SINP, PNPI) are included in the TiersOfAtlas list, and FTS channels have been tuned for these sites
• 4 sites (IHEP, ITEP, JINR, PNPI) successfully participated in the 2007 data transfer functional tests (next slide). This is a coherent data transfer test Tier-0 → Tier-1s → Tier-2s for all clouds, using existing SW to generate and replicate data and to monitor the data flow
• Another 2007 ATLAS activity is the replication of produced MC AOD from Tier-1s to Tier-2s according to the ATLAS computing model, done using FTS and the subscription mechanism. RuTier-2 sites (except ITEP) did not participate in this activity because of a severe lack of free disk space
• 4 sites (IHEP (15%), ITEP (20%), JINR (100%), PNPI (20%)) participated in the replication of M4 data; the percentages show the fraction of the data requested for replication. Only JINR obtained all the data; the other sites were limited by the amount of free disk space
• During the one-week M4 exercise (Aug-Sep 2007), about two million real muon events were detected, written to disk and tape, and reconstructed at the ATLAS Tier-0. The reconstructed data (ESD) were then exported in quasi-real time to the Tier-1s and their associated Tier-2s. The whole chain worked as it should during real LHC data taking. This was the first successful experience of this sort for ATLAS
• Two slides (10, 11) illustrate the M4 exercises; the second one shows the results for the SARA cloud: practically all subscribed data were successfully transmitted
Activities. Functional Tests
Tier1     Tier2s
ASGC      AU-ATLAS, TW-FTT
BNL       AGLT2, BU, MWT2, OU, SLAC, UTA, WISC
CNAF      LNF, MILANO, NAPOLI, ROMA1
FZK       CSCS, CYF, DESY-HH, DESY-ZN, FZU, FREIBURG, LRZ, WUP
LYON      BEIJING, CPPM, LAL, LAPP, LPC, LPNHE, NIPNE, SACLAY, TOKYO
NDGF
PIC       IFIC, UAM, IFAE, LIP
RAL       GLASGOW, LANC, MANC, QMUL, DUR, EDINBURGH, OXF, CAM, LIV, BRUN, RHUL
SARA      IHEP, ITEP, JINR, PNPI
TRIUMF    ALBERTA, MONTREAL, SFU, TORONTO, UVIC
[Timeline of functional tests: Sep 06, Oct 06, Nov 06, Sep 07, Oct 07]
[New DQ2 SW releases marked on the timeline: DQ2 0.2.12; DQ2 0.3 (Jun 2007); DQ2 0.4 (Oct 2007)]
10 Tier-1s and 46 Tier-2s participated
M4 Data Replication Activity Summary for All Sites
[Plots: datasets subscribed, complete and incomplete replicas - summaries for all Tier-1 sites and all Tier-2 sites]
IHEP, ITEP, JINR, PNPI
M4 Data Replication Activity Summary for SARA Cloud
Transfer status:
IHEP: 1 trouble file (0.2%)
JINR: 1 trouble file (0.3%)
ITEP: no troubles
PNPI: no troubles
ESD data only
M5 Data Replication Activity Summary
ITEP, IHEP, JINR, PNPI participated; delay in replication < 24h
[Plots: total subscriptions and completed transfers for IHEP, ITEP, JINR, PNPI]
Russian contribution to the central ATLAS sw/computing
• Russia's contribution to the ATLAS M&O Category A budget amounted to 0.5 FTE this year. Two of our colleagues (I.Kachaev, V.Kabachenko) were involved in central ATLAS activities at CERN concerning core SW maintenance. They fulfilled a number of tasks:
• Support of the [email protected] list, i.e. managing user quotas, scratch space distribution, and user requests/questions concerning AFS space, access rights, etc.
• Support of the [email protected] list, i.e. managing the central ATLAS CVS
• Official ATLAS SW release builds: releases 13.0.20, 13.0.26, 13.0.28 and 13.0.30 have been built, and 13.0.40 is under construction
• Corresponding documentation updates: release pages, librarian documentation
• ATLAS AFS management
• A lot of scripts have been written to support release builds, release copy and move, a command line interface to TagCollector, CVS tag search and comparison in TagCollector, etc.
Russian contribution to the central ATLAS sw/computing
• Two of our colleagues (A.Zaytsev, S.Pirogov) visited CERN (4+4 months) to contribute to the activity of the ATLAS Distributed Data Management (DDM) group. Their tasks included the corresponding SW development as well as participation in central ATLAS DDM operations such as support of the data transfer functional tests, the M4 exercises, etc. Special attention was given to the SARA cloud, to which the Russian sites are attached
• During the visits the following main tasks were fulfilled:
• Development of the LFC/LRC Test Suite and its application to measuring the performance of the updated version of the production LFC server and of a new GSI-enabled LRC testbed
• Extending the functionality of, and documenting, the DDM Data Transfer Request Web Interface
• Installing and configuring a complete PanDA server and a new implementation of the PanDA Scheduler Server (Autopilot) at CERN, and assisting the LYON Tier-1 site to do the same
• Contributing to the recent DDM/DQ2 Functional Tests (Aug 2007) activity, developing tools for statistical analysis of the results and applying them to the data gathered during the tests
• All the results were reported at ATLAS internal meetings and at the computing conference CHEP 2007
• Part of the activity (0.3 FTE) was accounted as Russia's contribution to the ATLAS M&O Category A budget (Central Operations part)
Challenges in 2008
• FDR-1
  – 10 hrs of data taking @ 200 Hz, a few days in a row
• CCRC-1
  – 4 weeks of operation of the full Computing Model
  – All 4 LHC experiments simultaneously
• Sub-detector runs
• M6
  – First week of March
• FDR-2 Simulation Production
  – 100M events in 90 days plus merging
  – Using new release
• CCRC-2
  – Like CCRC-1 but the whole month of May
• FDR-2
  – Like FDR-1 but at higher luminosity
  – Timing uncertain now
• M7 ?
Planned ATLAS activity in 2008
ATLAS Production Tiers (Feb 08, Full Dress Rehearsal)

Tier1     Tier2s
ASGC      AU-ATLAS, TW-FTT
BNL       AGLT2, BU, IU, OU, SMU, SWT2, SLAC, UMICH, WISC, UC
CNAF      LNF, MILANO, NAPOLI, ROMA1
FZK       CSCS, CYF, DESY-HH, DESY-ZN, FREIBURG, FZU, HEPHY-UIBK, LRZ, WUP
LYON      BEIJING, CPPM, LAL, LAPP, LPC, LPNHE, NIPNE_02, NIPNE_07, SACLAY, TOKYO
NDGF      IJST2
PIC       IFAE, IFIC, LIP-COIMBRA, LIP-LISBON, UAM
RAL       BHAM, BRUN, CAM, DUR, EDINBURGH, ECDF, GLASGOW, LANCS, LIV, MANC, OXF, SHEF, QMUL, ICL, RALPP
SARA      IHEP, ITEP, JINR, NIKHEF, PNPI, SINP
TRIUMF    ALBERTA, MCGILL, SFU, TORONTO, UVIC
10 Tier-1s and 56 “Tier-2s” participate
Metric for T1 success: 100% of data transferred (from CERN, from Tier-1s and to Tier-2s)
Metric for T2/T3 success: 95+% of data transferred (transfers within the cloud)
Metric for cloud success: 75% of sites participated in the test and 75% of them passed it
Status legend: done / partly done / failed / no test
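The quoted success metrics are simple thresholds and can be sketched directly. The site names and transfer fractions below are illustrative placeholders, not real test results:

```python
# Sketch of the FDR success metrics quoted on the slide: a Tier-2 passes if it
# received >= 95% of its subscribed data; a cloud passes if >= 75% of its sites
# participated and >= 75% of the participants passed.
def t2_passed(fraction_transferred: float) -> bool:
    return fraction_transferred >= 0.95

def cloud_success(results: dict) -> bool:
    """results maps site name -> fraction transferred, or None if it did not participate."""
    participants = {s: f for s, f in results.items() if f is not None}
    if len(participants) < 0.75 * len(results):
        return False  # too few sites took part
    n_passed = sum(t2_passed(f) for f in participants.values())
    return n_passed >= 0.75 * len(participants)

# Hypothetical SARA-cloud example: 5 of 6 sites take part, 4 of the 5 pass.
sara = {"IHEP": 0.998, "ITEP": 1.0, "JINR": 0.997,
        "PNPI": 1.0, "SINP": 0.60, "NIKHEF": None}
print(cloud_success(sara))
```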
CCRC08-1 results at RuTier-2
Activity Summary ('2008-02-24 08:50' to '2008-03-01 12:50')

Site   Efficiency   Throughput   Files done   Datasets done   Services (DQ2, Grid)   Transfer errors   Central errors
IHEP    7%           0 MB/s          74            33                 OK                   995               0
ITEP    6%           0 MB/s          99            55                 OK                  1563               0
JINR    2%           0 MB/s          24            13                 OK                  1277               0
PNPI   16%           0 MB/s         102            68                 OK                   523               0
SINP    3%           0 MB/s          14             7                 OK                   416               0
Structure of ATLAS data used for physics analysis
• The streaming of ATLAS data is under discussion now and a final decision has not been taken yet
• Streaming is based on the trigger decision, and the assignment of a given event to a stream cannot change over time (it does not depend on offline procedures)
• There will be 4-7 RAW/ESD physics streams
• One or a few AOD streams per ESD stream, with about 10 final AOD streams
• There are two possible types of streaming:
  – Inclusive streaming – one and the same event can be assigned to several streams if it has the corresponding trigger types
  – Exclusive streaming – a given event can be assigned to only one stream; if it has signatures permitting its assignment to more than one stream, it goes to a special overlap stream
• At present inclusive streaming is considered preferable
• A given DPD is intended for a given type (or types) of analysis and can collect events from different streams. A DPD contains only the set of events needed for a given analysis, and only the needed part of the event information
• Physics analysis will be carried out using the AOD streams and (mainly) different DPDs, including specific user-created formats (such as ROOT trees)
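The difference between the two streaming policies can be shown with a toy example. The trigger and stream names below are made up; the trigger-to-stream mapping stands in for the real trigger-based stream definitions:

```python
# Toy illustration of inclusive vs exclusive streaming as described above.
# STREAMS maps a (hypothetical) stream name to the trigger signatures it accepts.
STREAMS = {"egamma": {"e25i", "g20"}, "muon": {"mu20"}, "jet": {"j160"}}

def inclusive_streams(triggers: set) -> set:
    # Inclusive: the event is copied into every stream whose triggers it fired.
    return {s for s, trigs in STREAMS.items() if triggers & trigs}

def exclusive_stream(triggers: set) -> str:
    # Exclusive: exactly one stream; multi-stream events go to the overlap stream.
    matched = inclusive_streams(triggers)
    if len(matched) == 1:
        return matched.pop()
    return "overlap" if matched else "unassigned"

event = {"e25i", "mu20"}            # fired both an electron and a muon trigger
print(inclusive_streams(event))     # copied into both egamma and muon
print(exclusive_stream(event))      # routed to the overlap stream
```

The example makes the trade-off visible: inclusive streaming duplicates overlap events (increasing total AOD size, as noted on the resource slide), while exclusive streaming forces analyses to also read the overlap stream.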
Possible scenarios of data distribution and analysis in RuTier-2
• Scenario A: a given AOD stream (or DPD) is kept in its entirety at a given Tier-2 site:
  – advantage – easy to implement with the present ATLAS DDM and analysis tools
  – disadvantage – very hard to ensure a uniform CPU load: at some sites (with “popular” data) the CPUs will be overloaded, while at others CPUs will sit idle
• Scenario B: each AOD stream (large DPD) is split between all the sites:
  – advantage – uniform CPU load
  – disadvantage – i) possible difficulties with subscriptions providing an automated split of the data (?); ii) will analysis grid sub-jobs be able to find the sites holding the needed data (?)
• From the functionality point of view scenario B is preferable, but the question is whether the existing ATLAS tools permit realizing it (present answer – yes, but this needs to be tested in practice)
• AOD and DPD are to be distributed between the participating sites proportionally to their CPU (kSI2k)
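The proportional split of scenario B can be sketched as follows. The kSI2k figures come from the “current resources” slide; the largest-remainder rule for distributing leftover files is an illustrative choice of mine, not an ATLAS tool:

```python
# Sketch of scenario B: files of one AOD stream (or large DPD) are assigned to
# sites in proportion to their CPU capacity in kSI2k (from the resources slide).
SITE_CPU = {"IHEP": 260, "ITEP": 250, "JINR": 670,
            "KI": 1000, "PNPI": 280, "SINP": 280}

def split_files(n_files: int, capacity: dict) -> dict:
    total = sum(capacity.values())
    quota = {s: n_files * c / total for s, c in capacity.items()}
    alloc = {s: int(q) for s, q in quota.items()}        # integer part of each quota
    # Hand the remaining files to the sites with the largest fractional deficit.
    rest = n_files - sum(alloc.values())
    for s in sorted(quota, key=lambda s: quota[s] - alloc[s], reverse=True)[:rest]:
        alloc[s] += 1
    return alloc

alloc = split_files(1000, SITE_CPU)
print(alloc)                           # KI gets ~365 of 1000 files, etc.
```

Every file ends up at exactly one site, so the CPU load per stored file is uniform by construction; the open questions from the slide (automated splitting via subscriptions, and sub-jobs locating the data) are about implementing this mapping in DDM, not about the arithmetic.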