TRANSCRIPT
Jean-Yves Nief
CC-IN2P3, Lyon
HEPiX-HEPNT, Fermilab October 22nd – 25th, 2002
HEPiX-HEPNT Conference, Fermilab, October 22nd-25th 2002
Talk's outline
1) Overview of BaBar: motivation for a Tier A.
2) Hardware available for the CC-IN2P3 Tier A (servers, storage, batch workers, network).
3) Software issues (maintenance, data import).
4) Resource usage (CPU used…).
5) Problems encountered (hardware, software).
6) BaBar Grid and future developments.
BaBar: a short overview
• Study of CP violation using B mesons; the experiment is located at SLAC.
• Since 1999, more than 88 million B-Bbar events collected; ~660 TB of data stored (real data + simulation).
How is it handled?
• Object-oriented techniques: C++ software and an OO database system (Objectivity).
• For data analysis at SLAC: 445 batch workers (500 CPUs), 127 Objy servers + ~50 TB of disk + HPSS.
But: heavy user demand (> 500 physicists) => saturation of the system; collaborators spread world-wide (America, Europe).
Idea: create mirror sites where data analysis and simulation production can be done.
CC-IN2P3 Tier A: hardware (I)
• 19 Objectivity servers (Sun machines):
  - 8 Sun Netra 1405T (4 CPUs).
  - 2 Sun 4500 (4 CPUs).
  - 1 Sun 1450 (4 CPUs).
  - 8 Sun 250 (2 CPUs).
• Roles: 9 servers for data access for analysis jobs; 2 database catalog servers; 6 servers handling database transactions; 1 server for Monte-Carlo production; 1 server for data import/export.
• 20 TB of disk.
Hardware (II): storage system
• Mass storage system: only 20% of the data fits on disk => automatic staging required (> 100 TB in HPSS).
• Storage for private use:
  - Temporary storage: 200 GB of NFS space.
  - Permanent storage:
    · small files (log files…): Elliot archiving system.
    · large files (ntuples…) > 20 GB: HPSS (2% of the total occupancy).
Hardware (III): the network
• Massive data import from SLAC (~80 TB in one year).
• Data must be available in Lyon within a short time (max: 24-48 hours) => large bandwidth between SLAC and IN2P3 required.
• 2 routes:
  - CC-IN2P3 - Renater - US: 100 Mb/s.
  - CC-IN2P3 - CERN - US: 155 Mb/s (until this week).
  - CC-IN2P3 - Geant - US: 1 Gb/s (from now on).
• Full potential never reached (not understood).
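As a back-of-the-envelope check (my arithmetic, not from the slides), the quoted ~80 TB per year translates into a modest average rate, well below even the 100 Mb/s Renater route; the real constraint is the 24-48 hour freshness requirement and sustaining the rate in practice:

```python
# Average bandwidth needed to import ~80 TB from SLAC in one year.
# Uses decimal units; the concrete numbers are illustrative.

TB = 1e12  # bytes in a decimal terabyte

yearly_volume_bytes = 80 * TB
seconds_per_day = 86_400
days_per_year = 365

daily_volume_bytes = yearly_volume_bytes / days_per_year        # ~219 GB/day
avg_rate_mbps = daily_volume_bytes * 8 / seconds_per_day / 1e6  # ~20 Mb/s

print(f"daily volume : {daily_volume_bytes / 1e9:.0f} GB")
print(f"average rate : {avg_rate_mbps:.0f} Mb/s")
```

So a sustained ~20 Mb/s average would suffice on paper, which is consistent with the observation that the links' full potential was never the binding factor in principle, only in practice.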
Hardware (IV): the batch and interactive farm
• The batch farm (shared):
– 20 Sun Ultra 60 dual processor.
– 96 Linux PIII-750 MHz dual processor, NetFinity 4000R.
– 96 Linux PIII-1GHz dual processor, IBM X-series.
=> 424 CPUs in total.
• The interactive farm (shared):
– 4 Sun machines.
– 12 Linux machines.
Software (I): BaBar releases, Objectivity
• BaBar releases:
– Need to keep up with the evolution of the BaBar software at SLAC: new releases have to be installed as soon as they are available.
• Objectivity and related issues:
– Development of tools to monitor server activity, HPSS and batch resources, and to survey the Objectivity processes on the servers (« sick » daemons, transaction locks…).
– Maintenance: software upgrades, load balancing of the servers.
– Debugging Objy problems on both the client and server side.
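A watchdog in the spirit of the survey tools described above might look like the following sketch. The daemon names and the stale-lock threshold are illustrative assumptions, not the actual CC-IN2P3 tooling:

```python
# Hypothetical "sick daemon" / stale-lock watchdog sketch.
# WATCHED_DAEMONS and MAX_LOCK_AGE_S are assumed values, not real config.
import subprocess
import time

WATCHED_DAEMONS = ["ooams", "oolockserver"]  # assumed Objy process names
MAX_LOCK_AGE_S = 3600                        # assumed stale-lock threshold

def running_processes():
    """Return the set of command names currently running (via `ps`)."""
    out = subprocess.run(["ps", "-eo", "comm="],
                         capture_output=True, text=True).stdout
    return {line.strip() for line in out.splitlines()}

def check_daemons(procs, watched=WATCHED_DAEMONS):
    """Report which watched daemons are missing from the process list."""
    return [name for name in watched if name not in procs]

def stale_locks(locks, now=None, max_age=MAX_LOCK_AGE_S):
    """locks: (db_name, acquired_epoch) pairs from a lock dump;
    return the databases whose locks have been held too long."""
    now = time.time() if now is None else now
    return [db for db, t in locks if now - t > max_age]

# Example with fabricated data: only ooams running => lock server flagged.
print(check_daemons({"ooams"}))  # → ['oolockserver']
```

In production such checks would feed an alarm rather than a print, but the structure (poll processes, inspect lock ages, flag anomalies) is the essence of surveying "sick" daemons.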
Software (II): data import mechanism
• Data catalog available to users through a MySQL database.
• Two routes: (1) SLAC - CERN - IN2P3; (2) SLAC - Renater - IN2P3.
• <size of the dbs> ~ 500 MB.
• Multi-stream transfers (bbftp: designed for big files).
• Extraction when new or updated dbs are available.
• Import in Lyon launched when the extraction at SLAC is finished.
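The import step above could be driven by a small wrapper like the following sketch. The bbftp flags shown (`-u` user, `-p` parallel streams, `-e` transfer directive) follow common bbftp usage, but the host name, paths and stream count are assumptions for illustration:

```python
# Illustrative driver: once extraction at SLAC finishes, pull each new
# database file with a multi-stream bbftp transfer.
import subprocess

BBFTP_HOST = "bbftp.slac.stanford.edu"  # hypothetical server name
STREAMS = 10                            # number of parallel TCP streams

def build_bbftp_cmd(remote_path, local_path, user="babar",
                    host=BBFTP_HOST, streams=STREAMS):
    """Assemble the bbftp command line for one ~500 MB database file."""
    return ["bbftp",
            "-u", user,
            "-p", str(streams),                       # multi-stream transfer
            "-e", f"get {remote_path} {local_path}",  # transfer directive
            host]

def import_databases(new_dbs):
    """Launch one transfer per newly extracted (remote, local) pair."""
    for remote, local in new_dbs:
        subprocess.run(build_bbftp_cmd(remote, local), check=True)
```

Multi-stream transfer matters here because a single TCP stream over a long fat pipe rarely fills the available bandwidth; several parallel streams per file get closer to it.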
Resource usage (I)
• Tier A officially opened last fall.
• ~200-250 analysis jobs running in parallel (the batch system can handle up to 600 jobs in parallel).
• ~60-70 MC production jobs running in parallel:
– already ~50 million events produced in Lyon;
– now represents ~10-15% of the total weekly BaBar MC production;
– ~1/3 of the running jobs are BaBar jobs.
• Up to 4500 jobs queued during the busiest periods.
Resource usage (II)
• BaBar: top CPU-consuming group at IN2P3 over the last 4 months.
• Second CPU consumer since the beginning of the year (*).
• MC production represents 25-30% of the total CPU time used.
• ~25-30% of the analysis CPU is used by remote users.
(*) 1 unit = 1/8 hour on a PIII, 1 GHz.
Resource usage (III)
• 20% of the data on disk => dynamic staging via HPSS (RFIO interface):
– ~80 s per staging request;
– up to 3000 staging requests possible per day.
• Not a limitation for CPU efficiency.
• Needs less disk space, which saves money.
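A rough check (my arithmetic, not from the slides) of why an ~80 s staging latency does not hurt CPU efficiency: the aggregate wait is spread over a few hundred concurrent jobs, so each job loses only about a percent of its wall time.

```python
# Staging-overhead estimate using the figures quoted on the slide;
# CONCURRENT_JOBS is my assumed mid-range of the 200-250 analysis jobs.

STAGE_LATENCY_S = 80     # per staging request
REQUESTS_PER_DAY = 3000  # quoted upper bound
CONCURRENT_JOBS = 225    # assumed mid-range of 200-250
DAY_S = 86_400

total_wait_s = STAGE_LATENCY_S * REQUESTS_PER_DAY  # 240,000 s of waiting
wait_per_job_s = total_wait_s / CONCURRENT_JOBS    # ~1,070 s per job
wall_fraction = wait_per_job_s / DAY_S             # ~1.2% of the day

print(f"wait per job: {wait_per_job_s:.0f} s (~{wall_fraction:.1%} of a day)")
```

At ~1% wall-time overhead, keeping only 20% of the data on disk is a clear win: the same analyses run while the disk budget stays small.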
Problems encountered
• A few problems with the availability of data in Lyon, due to the complexity of the export/import procedure.
• Network bandwidth for data import somewhat erratic; the maximum was never reached.
• Objectivity-related bugs (most of them due to Objy server problems).
• Some HPSS outages, system overloaded (software-related + hardware limitations): solved => better performance now.
• During peak activity (e.g. before the summer conference), huge backlog on the batch system.
The Tier A and the outer world: BaBar Grid @ IN2P3
• Involvement of BaBar in using Grid technologies.
• Storage Resource Broker (SRB) and metadata catalog (MCAT) software installed and tested at IN2P3:
– allows access to data sets and resources based on their attributes rather than their physical locations;
– the future for data distribution between SLAC and IN2P3.
• Tests at IN2P3 of the EDG software using BaBar analysis applications: possible to remotely submit a job at IN2P3 to RAL and SLAC.
• Prototype of a tool for remote job submission: December 2002.
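The attribute-based access idea can be illustrated with a toy catalog (this is not the SRB/MCAT API; all names and paths below are made up):

```python
# Toy metadata catalog: a client asks for data sets by attributes, and
# the catalog resolves them to physical replicas, so users never need to
# know where the data actually lives.

CATALOG = [
    {"name": "db-run14-ex1", "run": 14, "kind": "real",
     "replicas": ["slac:/objy/run14/db1", "in2p3:/objy/run14/db1"]},
    {"name": "db-mc-b0b0",   "run": 14, "kind": "simulation",
     "replicas": ["slac:/objy/mc/b0b0"]},
]

def find_replicas(catalog, **attributes):
    """Resolve data sets by attribute, not by physical location."""
    hits = [entry for entry in catalog
            if all(entry.get(k) == v for k, v in attributes.items())]
    return [replica for entry in hits for replica in entry["replicas"]]

# A user asks for "real data from run 14" without naming any server:
print(find_replicas(CATALOG, run=14, kind="real"))
```

With real SRB/MCAT the catalog lives in a central database and the replicas span sites, but the lookup-by-attribute pattern is the same: the caller gets back locations it can then read through the broker.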
CC-IN2P3 Tier A: future developments
• 2 new Objy servers + new disks (near future):
– 1 allocated to MC production => goal: double the MC production;
– fewer staging requests to HPSS.
• 72 new Linux batch workers (PIII, 1.4 GHz) => CPU power increased by 50% (shared with others).
• Compression of the databases on disk (client- or server-side decompression on the fly) => HPSS load decreased.
• Installation of a dynamic load-balancing system on the Objy servers => more efficient (next year).
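The "compress on disk, decompress on the fly" idea can be sketched with zlib. This is not Objectivity's actual mechanism, just a minimal demonstration of the space/CPU trade-off on an illustrative payload:

```python
# Databases sit compressed on disk / in HPSS; clients (or the server)
# decompress transparently at read time, cutting storage and HPSS load
# at the cost of some CPU.
import zlib

def store(payload: bytes) -> bytes:
    """What would sit on disk: the compressed database pages."""
    return zlib.compress(payload, level=6)

def read(stored: bytes) -> bytes:
    """On-the-fly decompression at read time (client or server side)."""
    return zlib.decompress(stored)

if __name__ == "__main__":
    pages = b"event-header " * 10_000  # repetitive data compresses well
    on_disk = store(pages)
    assert read(on_disk) == pages      # lossless round trip
    print(f"stored {len(on_disk)}/{len(pages)} bytes "
          f"({len(on_disk) / len(pages):.1%} of original)")
```

For staged data the trade is doubly attractive: smaller files mean fewer bytes moved per HPSS staging request as well as less disk.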
Conclusion
• BaBar Tier A in Lyon running at full steam.
• ~25-30% of the CPU consumed by analysis jobs is used by remote users.
• Significant resources at CC-IN2P3 dedicated to BaBar (CPU: 2nd biggest user this year; HPSS: first staging requester).
• Contribution to the overall BaBar effort increasing thanks to:
– new Objy servers and disk space;
– new batch workers (72 new Linux this year, ~200 next year);
– new HPSS tape drives;
– database compression and dynamic load balancing of the servers.