
Page 1: Status of LHCb-INFN Computing CSN1, Catania, September 18, 2002 Domenico Galli, Bologna

Status of LHCb-INFN Computing

CSN1, Catania, September 18, 2002

Domenico Galli, Bologna

Page 2:

LHCb Computing Constraints

Urgent need for the production and analysis of a large number of MC data sets in a short time:

LHCb-light detector design.

Trigger design, TDRs.

Need to optimize the hardware and software configuration to minimize dead time and system administration effort.

Page 3:

LHCb Farm Architecture (I)

Article in press in Computer Physics Communications:

“A Beowulf-class computing cluster for the Monte Carlo production of the LHCb experiment”.

Disk-less computing nodes, with operating systems centralized on a file server (Operating System Server).

Very flexible configuration, allows adding and removing nodes from the system without any local installation.

Useful for computing resources shared among different experiments.

Extremely stable system: no side effects at all in more than one year of operation.

System administration duties minimized.

Page 4:

LHCb Farm Architecture (II)

Security: use of private IP addresses and a Virtual LAN.

High level of isolation from the Internet network.

External accesses (AFS servers, bookkeeping database, CASTOR library at CERN) go through Network Address Translation (NAT) on a Gateway node.

Potential system "single points of failure" are equipped with redundant disk configurations.

RAID-5 (2 NAS).

RAID-1 (Gateway and Operating System Server).

Page 5:

LHCb Farm Architecture (III)

LHCb Farm Architecture (III)

[Diagram: the farm sits on a private VLAN behind a Gateway (Red Hat 7.2, kernel 2.4.18; DNS, NAT/IP masquerading) facing the public VLAN. A Fast Ethernet switch (with uplink) connects: a disk-less Control Node (CERN Red Hat 6.1, kernel 2.2.18; PBS master, MC control server, farm monitoring); disk-less Processing Nodes 1...n (CERN Red Hat 6.1, kernel 2.2.18; PBS slaves); a Master Server (Red Hat 7.2) hosting the OS file systems and various services (home directories, PXE remote boot, DHCP, NIS); two NAS units (1 TB RAID-5 each); and an Ethernet-controlled power distributor driven by the Control Node. The Gateway and the Master Server use mirrored (RAID-1) disks.]

Page 6:

[Photos: Fast Ethernet switch; 1 TB NAS; Ethernet-controlled power distributor (32 channels); rack of 1U dual-processor motherboards.]

Page 7:

Data Storage

Files containing reconstructed events (OODST-ROOT format) are transferred to CERN using bbftp and automatically stored in the CASTOR tape library.

Data transfer from CNAF to CERN is performed with a maximum throughput of 70 Mb/s (on a 100 Mb/s link), to be compared with ~15 Mb/s using ftp.
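As a rough illustration of the gain from bbftp over plain ftp, the quoted rates translate into transfer times as follows (the 1 GB batch size is a hypothetical example, not a figure from the slides):

```python
# Compare transfer times at the two quoted CNAF -> CERN rates:
# ~70 Mb/s with bbftp vs ~15 Mb/s with plain ftp.

def transfer_time_s(size_mb: float, rate_mbit_s: float) -> float:
    """Seconds needed to move size_mb megabytes at rate_mbit_s megabits/s."""
    return size_mb * 8 / rate_mbit_s

size_mb = 1000.0  # hypothetical 1 GB batch of OODST files
t_bbftp = transfer_time_s(size_mb, 70.0)  # ~114 s
t_ftp = transfer_time_s(size_mb, 15.0)    # ~533 s
print(f"bbftp: {t_bbftp:.0f} s, ftp: {t_ftp:.0f} s, "
      f"speed-up: {t_ftp / t_bbftp:.1f}x")
```

The ~4.7x speed-up simply mirrors the ratio of the two link rates.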

Page 8:

2002 Monte Carlo Production Target

Production of large event statistics for the design of the LHCb-light detector and of the trigger system (trigger TDR).

Software: simulation (FORTRAN) and reconstruction (C++) code to be used in the production, supplied in July.

LHCb Data Challenge ongoing (August-September). Participating computing centers: CERN, INFN-CNAF, Liverpool, IN2P3-Lyon, NIKHEF, RAL, Bristol, Cambridge, Oxford, ScotGrid (Glasgow & Edinburgh).

Page 9:

Status of Summer LHCb-Italy Monte Carlo Production (Data Challenge)

Events produced in Bologna (August 1 - September 12): 1,053,500

Bd0 -> pi+ pi- 79,000

Bd0 -> D*-(D0_bar(K+ pi-) pi-) pi+ 19,000

Bd0 -> K+ pi- 55,500

Bs0 -> K- pi+ 8,000

Bs0 -> K+ K- 8,000

Bs0 -> J/psi(mu+ mu-) eta(gamma gamma) 8,000

Bd0 -> phi(K+ K-) Ks0(pi+ pi-) 8,000

Bs0 -> mu+ mu- 8,000

Bd0 -> D+(K- pi+ pi+) D-(K+ pi- pi-) 8,000

Bs0 -> Ds-(K+ K- pi-) K+ 8,000

Bs0 -> J/psi(mu+ mu-) phi(K+ K-) 8,000

Bs0 -> J/psi(e+ e-) phi(K+ K-) 8,000

Minimum bias 47,500

c c_bar -> inclusive (at least one c hadron in 400 mrad) 275,500

b b_bar -> inclusive (at least one b hadron in 400 mrad) 505,000

Page 10:

Distribution of Produced Events Among Production Centers (August 1 - September 12)

[Bar chart: share of produced events per center for CERN, INFN-CNAF, IN2P3-Lyon and RAL; y-axis from 0% to 50%.]

The other centres mentioned above started late relative to the Data Challenge start date.

Page 11:

Usage of the CNAF Tier-1 Computing Resources

Computing, control and service nodes:

130 PIII CPUs (clocks ranging from 866 MHz to 1.4 GHz).

Disk storage servers:

1 TB NAS (14 x 80 GB IDE disks + hot spare, in RAID-5).

1 TB NAS (7 x 170 GB SCSI disks + hot spare, in RAID-5).

All of this equipment is running at a very high duty cycle.

[Plot: CPU load.]

Page 12:

Plan for Analysis Activities

In autumn, the analysis of the data produced during the Data Challenge is foreseen.

The complete porting to Bologna of the development environment of the analysis code (DaVinci C++ code) has already been performed; it has been in use on a mini-farm for 2 months.

The analysis mini-farm needs to be extended to a greater number of nodes to meet the needs of the Italian LHCb collaboration.

Data produced in Bologna are kept stored on Bologna disks; data produced in the other centers need to be transferred to Bologna on user demand with an automatic procedure.

Analysis jobs (on ~100 CPUs) need an I/O throughput (~100 MB/s) greater than that supplied by a NAS (~10 MB/s).
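The ~100 MB/s requirement can be checked with a back-of-envelope estimate; the per-job read rate below is an assumed figure chosen to illustrate the arithmetic, not a number from the slides:

```python
# Back-of-envelope check of the aggregate I/O requirement: ~100 analysis
# jobs, each streaming OODST data at an assumed ~1 MB/s, far exceed what a
# single 100Base-T NAS (~10 MB/s) can supply.

n_jobs = 100                 # analysis jobs running in parallel
per_job_mb_s = 1.0           # assumed sustained read rate per DaVinci job
nas_mb_s = 10.0              # ~10 MB/s from a single 100Base-T NAS

required_mb_s = n_jobs * per_job_mb_s
shortfall = required_mb_s / nas_mb_s
print(f"required ~{required_mb_s:.0f} MB/s; one NAS supplies "
      f"~{nas_mb_s:.0f} MB/s ({shortfall:.0f}x too little)")
```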

Page 13:

High Performance I/O System (I)

An I/O parallelization system (based on a parallel file system) was successfully tested: PVFS (Parallel Virtual File System).

File striping of data among the local disks of several I/O servers (IONs).

Scalable system (throughput ~ 100 Mbit/s x n_ION).

[Diagram: client nodes CN 1...CN m access I/O nodes ION 1...ION n and a management node (MGR) over the network.]
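The scaling rule above can be sketched numerically; this is an idealized model (linear scaling with one 100 Mbit/s Fast Ethernet NIC per ION, no switch or client bottleneck), not measured data:

```python
# Idealized PVFS scaling: with files striped across n I/O servers (IONs),
# each behind a 100 Mbit/s NIC, aggregate throughput grows linearly with n.

def aggregate_mb_s(n_ion: int, nic_mbit_s: float = 100.0) -> float:
    """Ideal aggregate read throughput (MB/s) with n_ion striped servers."""
    return n_ion * nic_mbit_s / 8.0  # 8 bits per byte

for n in (1, 5, 10):
    print(f"{n:2d} IONs -> ~{aggregate_mb_s(n):.1f} MB/s ideal")
```

With 10 IONs the ideal limit is ~125 MB/s, consistent with the 110 MB/s aggregate actually measured.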

Page 14:

High Performance I/O System (II)

With 10 IONs we were able to reach an aggregate I/O of 110 MB/s (30 client nodes reading data).

To be compared with: 20-40 MB/s (local disk)

10 MB/s (100Base-T NAS)

50 MB/s (1000Base-T NAS)

With a single file hierarchy.

[Plot: PVFS performance with 10 I/O servers; aggregate I/O (MB/s, 0-120) vs. number of clients (0-30).]

Page 15:

Test of a PVFS-Based Analysis Facility (I)

Test performed using the OO DaVinci algorithm for the B0 -> pi+ pi- selection.

Analyzed 44.5k signal events and 484k bb inclusive events in 25 minutes (to be compared with 2 days on a single PC).

Completely performed with the Bologna farm, parallelizing the analysis algorithm over 106 CPUs (80 x 1.4 GHz PIII CPUs + 26 x 1 GHz PIII CPUs).

DaVinci processes read OODST from PVFS.
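The quoted times imply the following speed-up (a simple arithmetic check; "2 days" is taken at face value as 2880 minutes):

```python
# Worked check of the speed-up quoted above: 44.5k signal plus 484k bb
# inclusive events analysed in 25 minutes on 106 CPUs, vs ~2 days on one PC.

events = 44_500 + 484_000            # 528,500 events in total
farm_minutes = 25
single_pc_minutes = 2 * 24 * 60      # ~2 days = 2880 minutes

speedup = single_pc_minutes / farm_minutes
print(f"{events} events; speed-up ~{speedup:.0f}x on 106 CPUs")
```

A ~115x speed-up on 106 CPUs indicates the analysis parallelized close to linearly.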

Page 16:

Test of a PVFS-Based Analysis Facility (II)

[Diagram: 106 DaVinci client nodes (CN 1...CN 106) read OODSTs via PVFS from 10 I/O nodes (ION 1...ION 10) coordinated by a management node (MGR); the resulting n-tuples are collected on a login node.]

Page 17:

Test of a PVFS-Based Analysis Facility (III)

106 DaVinci processes reading from PVFS.

968 files (500 OODST events each) x 120 MB.

116 GB read and processed in 1500 s.
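These figures can be cross-checked with a quick computation (using 1 GB = 1000 MB, as the slide does):

```python
# Consistency check of the throughput figures on this slide:
# 968 files of ~120 MB each, read and processed in 1500 s.

n_files = 968
file_mb = 120
total_mb = n_files * file_mb         # 116,160 MB, i.e. ~116 GB
rate_mb_s = total_mb / 1500          # ~77 MB/s aggregate read rate
print(f"~{total_mb / 1000:.0f} GB in 1500 s -> ~{rate_mb_s:.0f} MB/s")
```

The resulting ~77 MB/s sits comfortably within the ~110 MB/s that the 10-ION PVFS setup was measured to sustain.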

Page 18:

B0 -> pi+ pi-: Pion Momentum Resolution

[Plots: Δp/p for identified pions coming from the B0 (FWHM ~ 0.01), and |Δp/p| vs. p (GeV/c).]

Page 19:

B0 Mass Plots

Cuts: pt > 800 MeV/c; d/σd > 1.6; lB0 > 1 mm.

[Plots (mass in MeV/c2): all pi+ pi- pairs with no cuts (3425 events); all pi+ pi- pairs with all cuts, magnified (105 events, FWHM ~ 66 MeV).]

Page 20:

bb Inclusive Background Mass Plot

All pi+ pi- pairs with all cuts. Total number of events: 484k.

Only events with a single interaction are taken into account at the moment: ~240k.

213 events in the mass region after all cuts; 32 of the 213 are ghosts.

[Plot: mass in GeV/c2.]

Page 21:

Signal Efficiency and Mass Plots for Tighter Cuts

Final efficiency (tighter cuts) at zero bb inclusive background (240k events): 871/22271 ≈ 4%.

Rejection against the bb inclusive background: > 1 - 1/240000 = 99.9996%.

Cuts: pt > 2.8 GeV/c; d/σd > 2.5; lB0 > 0.6 mm.

[Plots (mass in GeV/c2): 871 signal events in the mass region; 16 background events from the signal sample in the mass region (all ghosts).]
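The efficiency and rejection figures follow from simple arithmetic:

```python
# Arithmetic behind the efficiency and rejection figures on this slide.

signal_eff = 871 / 22271             # ~0.039, i.e. ~4%
rejection = 1 - 1 / 240000           # background rejection > 99.9996%
print(f"efficiency ~{signal_eff:.1%}; rejection > {rejection:.4%}")
```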

Page 22:

Conclusions

The MC production farm has been running stably (with increasing resources) for more than a year.

The INFN Tier-1 is the second most active LHCb MC production centre (after CERN). The collaboration with the CNAF staff is excellent.

We are not yet using GRID tools in production, but we plan to move to them as soon as the detector design is stable.

An analysis mini-farm for interactive work has been running for more than a month; we plan to extend the number of nodes depending on the availability of resources.

A massive analysis system architecture has already been tested, using a parallel file system and 106 CPUs.

We need at least to keep the present computing power at CNAF (though more resources, to keep production running in parallel with massive analysis activities, would be welcome) in order to supply the analysis facility to the Italian LHCb collaboration.