DØ Data Handling Operational Experience
GridPP8
Sep 22-23, 2003
Rod Walker, Imperial College London
Roadmap of Talk
• Computing Architecture
• Operational Statistics
• Challenges and Future Plans
• Regional Analysis Centres
• Computing activities
• Summary
[Map: remote Monte Carlo production sites, all feeding fnal.gov]
• Great Britain: 200
• France: 100
• Texas: 64
• Netherlands: 50
• Czech Republic: 32
• All sites: Monte Carlo production
DØ computing/data handling/database architecture
[Diagram: fnal.gov site architecture]
• Online: L3 nodes, RIP data logger, collector/router, data logger hosts d0ola,b,c (3× DEC4000), fiber to experiment switch; a: production, c: development
• Robotic tape storage: ADIC AML/2 and STK 9310 Powderhorn, ENSTORE movers
• Database and server hosts: d0ora1 (SUN 4500), d0lxac1 (Linux quad), d0dbsrv1 (Linux), other UNIX hosts
• Central analysis: SGI Origin2000, 128 R12000 processors, 27 TB fibre channel disks
• Central Analysis Backend (CAB): 160 dual 2 GHz Linux nodes, 35 GB cache each
• LINUX farm: 300+ dual PIII/IV nodes
• ClueDØ: Linux desktop user cluster, 227 nodes, in the experimental hall/office complex
• Network: CISCO switches, STARTAP link to Chicago
SAM Data Management System
• SAM is Sequential data Access via Meta-data. Est. 1997. http://d0db.fnal.gov/sam
• Flexible and scalable distributed model
• Field-hardened code
• Reliable and fault tolerant
• Adapters for mass storage systems: Enstore (HPSS and others planned)
• Adapters for transfer protocols: cp, rcp, scp, encp, bbftp, GridFTP
• Useful in many cluster computing environments: SMP with compute servers, desktop, private network (PN), NFS shared disk, …
• Ubiquitous for DØ users
SAM Station
1. Collection of SAM servers which manage data delivery and caching for a node or cluster
2. The node or cluster hardware itself
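To make the adapter idea concrete, here is a minimal sketch (not SAM code; the function and the adapter ordering are assumptions) of a station-style fetch that tries the transfer tools named above in a configured order and falls back if one fails:

```python
import subprocess

# Illustrative only: a station-style file fetch that tries transfer adapters
# in a configured order. The tools mirror those named on the slide; the
# function name and ordering are assumptions, not SAM code.
ADAPTERS = {
    "cp":  lambda src, dst: ["cp", src, dst],
    "rcp": lambda src, dst: ["rcp", src, dst],
    "scp": lambda src, dst: ["scp", src, dst],
}

def fetch(src, dst, preferred=("cp", "rcp", "scp")):
    """Try each configured adapter in turn until one succeeds."""
    for name in preferred:
        if subprocess.call(ADAPTERS[name](src, dst)) == 0:
            return name
    raise RuntimeError(f"all adapters failed for {src}")
```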
Overview of DØ Data Handling
Registered Users: 600
Number of SAM Stations: 56
Registered Nodes: 900
Total Disk Cache: 40 TB
Number of Files (physical): 1.2 M
Number of Files (virtual): 0.5 M
Robotic Tape Storage: 305 TB
[Map legend: Regional Center, Analysis site]
Summary of DØ Data Handling
[Plot: Integrated files consumed vs month (DØ), Mar 2002 to Mar 2003 – 4.0 M files consumed]
[Plot: Integrated GB consumed vs month (DØ), Mar 2002 to Mar 2003 – 1.2 PB consumed]
[Plot: Data in and out of Enstore (robotic tape storage), daily, Aug 16 to Sep 20 – 5 TB outgoing, 1 TB incoming; shutdown starts]
Consumption
• Applications “consume” data
• In the DH system:
  – consumers can be hungry or satisfied
  – allowing for the consumption rate, the next course is delivered before it is asked for (see the sketch below)
• 180 TB consumed per month
• 1.5 PB consumed in 1 year
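A minimal sketch of the prefetch idea (not SAM code; the staging and processing functions are hypothetical stand-ins): while the consumer works on the current file, the next one is already being staged, so a well-fed consumer never waits.

```python
import threading, queue

# Illustrative prefetching consumer: stage file N+1 while file N is processed.
# stage_file() and process_file() are hypothetical stand-ins for the DH system
# delivering a file to cache and the application consuming it.
def run_project(filenames, stage_file, process_file, lookahead=1):
    staged = queue.Queue(maxsize=lookahead)

    def stager():
        for name in filenames:
            staged.put(stage_file(name))   # blocks once 'lookahead' files are ready
        staged.put(None)                   # signal end of the file list

    threading.Thread(target=stager, daemon=True).start()
    while (path := staged.get()) is not None:
        process_file(path)                 # consumer stays "satisfied", not "hungry"
```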
Challenges
• Getting SAM to meet the needs of DØ in the many configurations is, and has been, an enormous challenge. Some examples include:
  – File corruption issues. Solved with CRC (see the sketch after this list).
  – Preemptive distributed caching is prone to race conditions and log jams, or Gridlock. These have been solved.
  – Private networks sometimes require “border” services. This is understood.
  – NFS shared cache configuration provides additional simplicity and generality, at the price of scalability (star configuration). This works.
  – Global routing completed.
  – Installation procedures for the station servers have been quite complex. They are improving and we plan to soon have “push button” and even “opportunistic deployment” installs.
  – Lots of details with opening ports on firewalls, OS configurations, registration of new hardware, and so on.
  – Username clashing issues. Moving to GSI and Grid Certificates.
  – Interoperability with many MSS.
  – Network-attached files. Sometimes the file does not need to move to the user.
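As an illustration of the CRC fix (a sketch, not the SAM implementation; the checksum choice and function names are assumptions), a transfer is accepted only when the checksum computed at the destination matches the value recorded with the file’s metadata:

```python
import zlib

# Illustrative end-to-end integrity check: compare the destination file's CRC
# against the value stored with the file's metadata when it was declared.
def crc_of(path, chunk=1 << 20):
    crc = 0
    with open(path, "rb") as f:
        while data := f.read(chunk):
            crc = zlib.crc32(data, crc)
    return crc & 0xFFFFFFFF

def transfer_ok(dest_path, declared_crc):
    """Accept the transfer only if the checksums agree; otherwise re-fetch."""
    return crc_of(dest_path) == declared_crc
```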
RAC: Why Regions are Important
1. Opportunistic use of ALL computing resources within the region
2. Management for resources within the region
3. Coordination of all processing efforts is easier
4. Security issues within the region are similar: CAs, policies, …
5. Increases the technical support base
6. Speak the same language
7. Share the same time zone
8. Frequent face-to-face meetings among players within the region
9. Physics collaboration at a regional level to contribute to results at the global level
10. A little spirited competition among regions is good
Summary of Current & Soon-to-be RACs

RAC: GridKa @FZK
  IACs: Aachen, Bonn, Freiburg, Mainz, Munich, Wuppertal
  CPU: 52 GHz (518 GHz total*)   Disk: 5.2 TB (50 TB total*)   Archive: 10 TB (100 TB total*)
  Schedule: Established as RAC

RAC: SAR @UTA (Southern US)
  IACs: AZ, Cinvestav (Mexico City), LA Tech, Oklahoma, Rice, KU, KSU
  CPU: 160 GHz (320 GHz total*)   Disk: 25 TB (50 TB total*)
  Schedule: Summer 2003

RAC: UK @tbd
  IACs: Lancaster, Manchester, Imperial College, RAL
  CPU: 46 GHz (556 GHz total*)   Disk: 14 TB (170 TB total*)   Archive: 44 TB
  Schedule: Active, MC production

RAC: IN2P3 @Lyon
  IACs: CCin2p3, CEA-Saclay, CPPM-Marseille, IPNL-Lyon, IRES-Strasbourg, ISN-Grenoble, LAL-Orsay, LPNHE-Paris
  CPU: 100 GHz   Disk: 12 TB   Archive: 200 TB
  Schedule: Active, MC production

RAC: DØ @FNAL (Northern US)
  IACs: Farm, cab, clued0, Central-analysis
  CPU: 1800 GHz   Disk: 25 TB   Archive: 1 PB
  Schedule: Established as CAC

*Numbers in () represent totals for the center or region; other numbers are DØ’s current allocation.
UK RAC
[Diagram: UK RAC sites – Manchester, Lancaster, LeSC, Imperial (CMS); RAL with 3.6 TB; FNAL MSS, 25 TB]
Global File Routing
• FNAL throttles transfers
• Direct access unnecessary
  – Firewalls, policies, …
• Configurable, with fail-overs (see the sketch below)
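A minimal sketch of what such a routing configuration could look like (illustrative only; the station names, keys, and lookup function are assumptions, not the actual SAM station configuration): each remote station reaches FNAL through a designated router station, with a fall-back route if the primary is down.

```python
# Illustrative routing table: remote stations do not pull directly from the
# FNAL mass storage; they go through a "border" router station, with a
# fail-over route if the primary is unavailable. Names are hypothetical.
ROUTES = {
    "lancaster":    {"primary": "ral-router",   "failover": "fnal-gateway"},
    "manchester":   {"primary": "ral-router",   "failover": "fnal-gateway"},
    "lyon-ccin2p3": {"primary": "fnal-gateway", "failover": None},
}

def next_hop(station, is_up):
    """Pick the first route for 'station' whose hop is currently up."""
    route = ROUTES[station]
    for hop in (route["primary"], route["failover"]):
        if hop is not None and is_up(hop):
            return hop
    raise RuntimeError(f"no usable route for {station}")
```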
From RAC’s to RichesSummary and Future
• We feel that the RAC approach is important to more effectively use remote resources
• Management and organization in each region is as important as the hardware.
• However…– Physics group collaboration will transcend regional boundaries– Resources within each region will be used by the experiment at
large (Grid computing Model)– Our models of usage will be revisited frequently. Experience
already indicates that the use of thumbnails differs from that of our RAC model (skims).
– No RAC will be completely formed at birth.
• There are many challenges ahead. We are still learning…
Stay Tuned for SAM-Grid
The best is yet to come…
CPU intensive activities
• Primary reconstruction
  – On-site, with local help to keep up
• MC production
  – Anywhere. No input data
• Re-reconstruction (reprocessing) – first on SAMGrid
  – Must be fast to be useful
  – Use all resources
• Thumbnail skims
  – 1 per physics group
• Common skim – OR of the group skims (see the sketch after this list)
  – Ends up with all events if the triggers are good
  – Defeats the object, i.e. small datasets
• User analysis – not a priority (CAB can satisfy demand)
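A toy illustration of why OR-ing the group skims defeats the object (purely made-up event IDs and group selections): once several overlapping selections are unioned, the result grows toward the full event sample rather than a small dataset.

```python
# Toy example: the union (OR) of per-group skims grows toward the full sample.
# Event IDs and group selections are invented purely for illustration.
all_events = set(range(1000))
group_skims = {
    "top":   {e for e in all_events if e % 3 == 0},
    "higgs": {e for e in all_events if e % 4 == 0},
    "qcd":   {e for e in all_events if e % 5 == 0},
    "new_phenomena": {e for e in all_events if e % 7 == 0},
}
common_skim = set().union(*group_skims.values())
print(f"common skim keeps {len(common_skim) / len(all_events):.0%} of all events")
```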
Current Reprocessing of DØ Run II
• Why now and fast?
  – Improved tracking for Spring conferences
  – Tevatron shutdown – include the reconstruction farm
• Reprocess all Run II data
  – 40 TB of DST data
  – 40k files (the basic unit of data handling)
  – 80 million events
• How
  – Many sites in US and Europe, inc. UK RAC
  – qsub initially, but UK will lead the move to SAMGrid
  – NIKHEF (LCG)
  – Will gather statistics and report
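For scale, these figures imply an average of roughly 1 GB per file (40 TB / 40k files) and about 2,000 events per file (80 million events / 40k files).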
Runjob and SAMGrid
• Runjob workflow manager
  – Maintained by Lancaster. Mainstay of DØ MC production
  – No difference between MC production and data (re)processing
• SAMGrid integration
  – Was done for GT 2.0, e.g. Tier-1A via EDG 1.4 CE
  – Job bomb: 1 grid job to many local batch-system jobs, i.e. the job has structure (see the sketch below)
  – Options: request a 2.0 gatekeeper (0 months), write custom Perl jobmanagers (2 months), or use DAGMan to absorb the structure (3 months)
  – Pressure to use grid submission – want 2.0 for now
• 4 UK sites, 0.5 FTEs – need to use SAMGrid
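To illustrate the “job bomb” structure (a sketch under assumptions; the wrapper script, chunk size, and environment variable are hypothetical, not the SAMGrid jobmanager): one incoming grid job is expanded at the site into many local batch jobs, here one qsub submission per chunk of input files.

```python
import subprocess

# Illustrative fan-out of one grid job into many local batch jobs ("job bomb").
# The wrapper script name, chunk size, and CHUNK_FILES variable are
# hypothetical; the point is only that one grid job carries internal structure.
def submit_job_bomb(input_files, chunk_size=25, wrapper="run_reco_chunk.sh"):
    job_ids = []
    for i in range(0, len(input_files), chunk_size):
        chunk = input_files[i:i + chunk_size]
        # One local batch job per chunk: the batch system sees many jobs,
        # while the grid layer sees only the single job that spawned them.
        out = subprocess.run(
            ["qsub", "-v", "CHUNK_FILES=" + ",".join(chunk), wrapper],
            capture_output=True, text=True, check=True,
        )
        job_ids.append(out.stdout.strip())
    return job_ids
```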
Conclusions
• SAM enables PB-scale HEP computing today
• Details are important in a production system
  – PNs, NFS, scaling, cache management (free space = zero, always), gridlock, …
• Official and semi-official tasks dominate CPU requirements
  – Reconstruction, reprocessing, MC production, skims
  – By definition these are structured and repeatable – good for the grid. User analysis runs locally (still needs DH) or centrally. (Still a project goal – just not mine)
• SAM experience is valuable – see the report on reprocessing. Have LCG seen how good it is?