DØ Data Handling Operational Experience
GridPP8
Sep 22-23, 2003
Rod Walker, Imperial College London
Roadmap of Talk
• Computing Architecture
• Operational Statistics
• Challenges and Future Plans
• Regional Analysis Centres
• Computing activities
• Summary
[Map: remote Monte Carlo production sites, all feeding fnal.gov]
• Great Britain: 200
• France: 100
• Texas: 64
• Netherlands: 50
• Czech Republic: 32
• All sites: Monte Carlo production
DØ computing/data handling/database architecture
[Diagram: fnal.gov site architecture]
• Online: L3 nodes, RIP data logger, collector/router, data logger hosts d0ola,b,c (3× DEC4000), fiber to experiment switch; a: production, c: development
• Robotic tape storage: ADIC AML/2 and STK 9310 Powderhorn, ENSTORE movers
• Database and server hosts: d0ora1 (SUN 4500), d0lxac1 (Linux quad), d0dbsrv1 (Linux), other UNIX hosts
• Central analysis: SGI Origin2000, 128 R12000 processors, 27 TB fibre channel disks
• Central Analysis Backend (CAB): 160 dual 2 GHz Linux nodes, 35 GB cache each
• LINUX farm: 300+ dual PIII/IV nodes
• ClueDØ: Linux desktop user cluster, 227 nodes, in the experimental hall/office complex
• Network: CISCO switches, STARTAP link to Chicago
SAM Data Management System
• SAM is Sequential data Access via Meta-data. Est. 1997. http://d0db.fnal.gov/sam
• Flexible and scalable distributed model
• Field-hardened code
• Reliable and fault tolerant
• Adapters for mass storage systems: Enstore (HPSS and others planned)
• Adapters for transfer protocols: cp, rcp, scp, encp, bbftp, GridFTP
• Useful in many cluster computing environments: SMP with compute servers, desktop, private network (PN), NFS shared disk, …
• Ubiquitous for DØ users
SAM Station
1. Collection of SAM servers which manage data delivery and caching for a node or cluster
2. The node or cluster hardware itself
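To make the adapter idea concrete, here is a minimal sketch (not SAM code; the function and the adapter ordering are assumptions) of a station-style fetch that tries the transfer tools named above in a configured order and falls back if one fails:

```python
import subprocess

# Illustrative only: a station-style file fetch that tries transfer adapters
# in a configured order. The tools mirror those named on the slide; the
# function name and ordering are assumptions, not SAM code.
ADAPTERS = {
    "cp":  lambda src, dst: ["cp", src, dst],
    "rcp": lambda src, dst: ["rcp", src, dst],
    "scp": lambda src, dst: ["scp", src, dst],
}

def fetch(src, dst, preferred=("cp", "rcp", "scp")):
    """Try each configured adapter in turn until one succeeds."""
    for name in preferred:
        if subprocess.call(ADAPTERS[name](src, dst)) == 0:
            return name
    raise RuntimeError(f"all adapters failed for {src}")
```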
Overview of DØ Data Handling
Registered Users: 600
Number of SAM Stations: 56
Registered Nodes: 900
Total Disk Cache: 40 TB
Number of Files (physical): 1.2 M
Number of Files (virtual): 0.5 M
Robotic Tape Storage: 305 TB
[Map legend: Regional Center, Analysis site]
Summary of DØ Data Handling
[Plot: Integrated files consumed vs month (DØ), Mar 2002 to Mar 2003 – 4.0 M files consumed]
[Plot: Integrated GB consumed vs month (DØ), Mar 2002 to Mar 2003 – 1.2 PB consumed]
[Plot: Data in and out of Enstore (robotic tape storage), daily, Aug 16 to Sep 20 – 5 TB outgoing, 1 TB incoming; shutdown starts]
Consumption
• Applications “consume” data
• In the DH system:
  – consumers can be hungry or satisfied
  – allowing for the consumption rate, the next course is delivered before it is asked for (see the sketch below)
• 180 TB consumed per month
• 1.5 PB consumed in 1 year
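A minimal sketch of the prefetch idea (not SAM code; the staging and processing functions are hypothetical stand-ins): while the consumer works on the current file, the next one is already being staged, so a well-fed consumer never waits.

```python
import threading, queue

# Illustrative prefetching consumer: stage file N+1 while file N is processed.
# stage_file() and process_file() are hypothetical stand-ins for the DH system
# delivering a file to cache and the application consuming it.
def run_project(filenames, stage_file, process_file, lookahead=1):
    staged = queue.Queue(maxsize=lookahead)

    def stager():
        for name in filenames:
            staged.put(stage_file(name))   # blocks once 'lookahead' files are ready
        staged.put(None)                   # signal end of the file list

    threading.Thread(target=stager, daemon=True).start()
    while (path := staged.get()) is not None:
        process_file(path)                 # consumer stays "satisfied", not "hungry"
```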
Challenges
• Getting SAM to meet the needs of DØ in the many configurations is, and has been, an enormous challenge. Some examples include:
  – File corruption issues. Solved with CRC (see the sketch after this list).
  – Preemptive distributed caching is prone to race conditions and log jams, or Gridlock. These have been solved.
  – Private networks sometimes require “border” services. This is understood.
  – NFS shared cache configuration provides additional simplicity and generality, at the price of scalability (star configuration). This works.
  – Global routing completed.
  – Installation procedures for the station servers have been quite complex. They are improving and we plan to soon have “push button” and even “opportunistic deployment” installs.
  – Lots of details with opening ports on firewalls, OS configurations, registration of new hardware, and so on.
  – Username clashing issues. Moving to GSI and Grid Certificates.
  – Interoperability with many MSS.
  – Network-attached files. Sometimes the file does not need to move to the user.
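As an illustration of the CRC fix (a sketch, not the SAM implementation; the checksum choice and function names are assumptions), a transfer is accepted only when the checksum computed at the destination matches the value recorded with the file’s metadata:

```python
import zlib

# Illustrative end-to-end integrity check: compare the destination file's CRC
# against the value stored with the file's metadata when it was declared.
def crc_of(path, chunk=1 << 20):
    crc = 0
    with open(path, "rb") as f:
        while data := f.read(chunk):
            crc = zlib.crc32(data, crc)
    return crc & 0xFFFFFFFF

def transfer_ok(dest_path, declared_crc):
    """Accept the transfer only if the checksums agree; otherwise re-fetch."""
    return crc_of(dest_path) == declared_crc
```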
RAC: Why Regions are Important
1. Opportunistic use of ALL computing resources within the region
2. Management for resources within the region
3. Coordination of all processing efforts is easier
4. Security issues within the region are similar: CAs, policies, …
5. Increases the technical support base
6. Speak the same language
7. Share the same time zone
8. Frequent face-to-face meetings among players within the region
9. Physics collaboration at a regional level to contribute to results at the global level
10. A little spirited competition among regions is good
Summary of Current & Soon-to-be RACs

RAC: GridKa @FZK
  IACs: Aachen, Bonn, Freiburg, Mainz, Munich, Wuppertal
  CPU: 52 GHz (518 GHz total*)   Disk: 5.2 TB (50 TB total*)   Archive: 10 TB (100 TB total*)
  Schedule: Established as RAC

RAC: SAR @UTA (Southern US)
  IACs: AZ, Cinvestav (Mexico City), LA Tech, Oklahoma, Rice, KU, KSU
  CPU: 160 GHz (320 GHz total*)   Disk: 25 TB (50 TB total*)
  Schedule: Summer 2003

RAC: UK @tbd
  IACs: Lancaster, Manchester, Imperial College, RAL
  CPU: 46 GHz (556 GHz total*)   Disk: 14 TB (170 TB total*)   Archive: 44 TB
  Schedule: Active, MC production

RAC: IN2P3 @Lyon
  IACs: CCin2p3, CEA-Saclay, CPPM-Marseille, IPNL-Lyon, IRES-Strasbourg, ISN-Grenoble, LAL-Orsay, LPNHE-Paris
  CPU: 100 GHz   Disk: 12 TB   Archive: 200 TB
  Schedule: Active, MC production

RAC: DØ @FNAL (Northern US)
  IACs: Farm, cab, clued0, Central-analysis
  CPU: 1800 GHz   Disk: 25 TB   Archive: 1 PB
  Schedule: Established as CAC

*Numbers in () represent totals for the center or region; other numbers are DØ’s current allocation.
UK RAC
[Diagram: UK RAC sites – Manchester, Lancaster, LeSC, Imperial (CMS); RAL with 3.6 TB; FNAL MSS, 25 TB]
Global File Routing
• FNAL throttles transfers
• Direct access unnecessary
  – Firewalls, policies, …
• Configurable, with fail-overs (see the sketch below)
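A minimal sketch of what such a routing configuration could look like (illustrative only; the station names, keys, and lookup function are assumptions, not the actual SAM station configuration): each remote station reaches FNAL through a designated router station, with a fall-back route if the primary is down.

```python
# Illustrative routing table: remote stations do not pull directly from the
# FNAL mass storage; they go through a "border" router station, with a
# fail-over route if the primary is unavailable. Names are hypothetical.
ROUTES = {
    "lancaster":    {"primary": "ral-router",   "failover": "fnal-gateway"},
    "manchester":   {"primary": "ral-router",   "failover": "fnal-gateway"},
    "lyon-ccin2p3": {"primary": "fnal-gateway", "failover": None},
}

def next_hop(station, is_up):
    """Pick the first route for 'station' whose hop is currently up."""
    route = ROUTES[station]
    for hop in (route["primary"], route["failover"]):
        if hop is not None and is_up(hop):
            return hop
    raise RuntimeError(f"no usable route for {station}")
```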
From RAC’s to RichesSummary and Future
• We feel that the RAC approach is important to more effectively use remote resources
• Management and organization in each region is as important as the hardware.
• However…– Physics group collaboration will transcend regional boundaries– Resources within each region will be used by the experiment at
large (Grid computing Model)– Our models of usage will be revisited frequently. Experience
already indicates that the use of thumbnails differs from that of our RAC model (skims).
– No RAC will be completely formed at birth.
• There are many challenges ahead. We are still learning…
Stay Tuned for SAM-Grid
The best is yet to come…
CPU intensive activities
• Primary reconstruction
  – On-site, with local help to keep up
• MC production
  – Anywhere. No input data
• Re-reconstruction (reprocessing) – first on SAMGrid
  – Must be fast to be useful
  – Use all resources
• Thumbnail skims
  – 1 per physics group
• Common skim – OR of the group skims (see the sketch after this list)
  – Ends up with all events if the triggers are good
  – Defeats the object, i.e. small datasets
• User analysis – not a priority (CAB can satisfy demand)
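A toy illustration of why OR-ing the group skims defeats the object (purely made-up event IDs and group selections): once several overlapping selections are unioned, the result grows toward the full event sample rather than a small dataset.

```python
# Toy example: the union (OR) of per-group skims grows toward the full sample.
# Event IDs and group selections are invented purely for illustration.
all_events = set(range(1000))
group_skims = {
    "top":   {e for e in all_events if e % 3 == 0},
    "higgs": {e for e in all_events if e % 4 == 0},
    "qcd":   {e for e in all_events if e % 5 == 0},
    "new_phenomena": {e for e in all_events if e % 7 == 0},
}
common_skim = set().union(*group_skims.values())
print(f"common skim keeps {len(common_skim) / len(all_events):.0%} of all events")
```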
Current Reprocessing of DØ Run II
• Why now and fast?
  – Improved tracking for Spring conferences
  – Tevatron shutdown – include the reconstruction farm
• Reprocess all Run II data
  – 40 TB of DST data
  – 40k files (the basic unit of data handling)
  – 80 million events
• How
  – Many sites in US and Europe, inc. UK RAC
  – qsub initially, but UK will lead the move to SAMGrid
  – NIKHEF (LCG)
  – Will gather statistics and report
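For scale, these figures imply an average of roughly 1 GB per file (40 TB / 40k files) and about 2,000 events per file (80 million events / 40k files).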
Runjob and SAMGrid
• Runjob workflow manager
  – Maintained by Lancaster. Mainstay of DØ MC production
  – No difference between MC production and data (re)processing
• SAMGrid integration
  – Was done for GT 2.0, e.g. Tier-1A via EDG 1.4 CE
  – Job bomb: 1 grid job to many local batch-system jobs, i.e. the job has structure (see the sketch below)
  – Options: request a 2.0 gatekeeper (0 months), write custom Perl jobmanagers (2 months), or use DAGMan to absorb the structure (3 months)
  – Pressure to use grid submission – want 2.0 for now
• 4 UK sites, 0.5 FTEs – need to use SAMGrid
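To illustrate the “job bomb” structure (a sketch under assumptions; the wrapper script, chunk size, and environment variable are hypothetical, not the SAMGrid jobmanager): one incoming grid job is expanded at the site into many local batch jobs, here one qsub submission per chunk of input files.

```python
import subprocess

# Illustrative fan-out of one grid job into many local batch jobs ("job bomb").
# The wrapper script name, chunk size, and CHUNK_FILES variable are
# hypothetical; the point is only that one grid job carries internal structure.
def submit_job_bomb(input_files, chunk_size=25, wrapper="run_reco_chunk.sh"):
    job_ids = []
    for i in range(0, len(input_files), chunk_size):
        chunk = input_files[i:i + chunk_size]
        # One local batch job per chunk: the batch system sees many jobs,
        # while the grid layer sees only the single job that spawned them.
        out = subprocess.run(
            ["qsub", "-v", "CHUNK_FILES=" + ",".join(chunk), wrapper],
            capture_output=True, text=True, check=True,
        )
        job_ids.append(out.stdout.strip())
    return job_ids
```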
Conclusions
• SAM enables PB-scale HEP computing today
• Details are important in a production system
  – PNs, NFS, scaling, cache management (free space = zero, always), gridlock, …
• Official and semi-official tasks dominate CPU requirements
  – Reconstruction, reprocessing, MC production, skims
  – By definition these are structured and repeatable – good for the grid. User analysis runs locally (still needs DH) or centrally. (Still a project goal – just not mine)
• SAM experience is valuable – see the report on reprocessing. Have LCG seen how good it is?