1
The D0 NIKHEF Farm
Kors Bos
Ton Damen
Willem van Leeuwen
Fermilab, May 23 2001
2
Layout of this talk
• D0 Monte Carlo needs
• The NIKHEF D0 farm
• The data we produce
• The SAM data base
• A Grid intermezzo
• The network
• The next steps
3
D0 Monte Carlo needs
• D0 trigger rate is 100 Hz, 10^7 seconds/yr → 10^9 events/yr
• We want 10% of that to be simulated → 10^8 events/yr
• To simulate 1 QCD event takes ~3 minutes (size ~2 Mbyte)
  – On an 800 MHz PIII
• So 1 cpu can produce ~10^5 events/yr (~200 Gbyte)
  – Assuming a 60% overall efficiency
• So our 100 cpu farm can produce ~10^7 events/yr (~20 Tbyte)
  – And this is only 10% of the goal we set ourselves
  – Not counting the Nijmegen D0 farm yet
• So we need another 900 cpu's
  – UTA (50), Lyon (200), Prague (10), BU (64),
  – Nijmegen (50), Lancaster (200), Rio (25),
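These numbers hang together if a farm cpu keeps simulating the whole calendar year; a quick back-of-envelope check (a sketch, assuming ~3.15e7 wall-clock seconds per year and the quoted 60% efficiency):

# Rough check of the MC production numbers quoted above (assumption:
# a farm cpu is available the full calendar year, ~3.15e7 s, at 60% efficiency).
SECONDS_PER_YEAR = 3.15e7      # wall-clock seconds in a year
TIME_PER_EVENT   = 3 * 60      # ~3 minutes per QCD event on an 800 MHz PIII
EFFICIENCY       = 0.60        # overall farm efficiency
EVENT_SIZE_MB    = 2           # ~2 Mbyte per simulated event

events_per_cpu  = SECONDS_PER_YEAR / TIME_PER_EVENT * EFFICIENCY
events_per_farm = 100 * events_per_cpu          # 100 cpu's in the farm

print(f"per cpu : {events_per_cpu:.1e} events/yr, "
      f"{events_per_cpu * EVENT_SIZE_MB / 1e6:.1f} Tbyte")   # ~1e5 events, ~0.2 Tbyte
print(f"per farm: {events_per_farm:.1e} events/yr, "
      f"{events_per_farm * EVENT_SIZE_MB / 1e6:.0f} Tbyte")  # ~1e7 events, ~20 Tbyte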
4
How it looks
5
The NIKHEF D0 Farm
[Diagram: the farm nodes hang off a switch at 100 Mbit/s; the farm server and the file server (1.5 TB disk cache, SAM station) connect to the switch at 1 Gbit/s; the NIKHEF network reaches SURFnet at 1 Gbit/s, the SARA network with the tape robot @ SARA at 1 Gbit/s, and the SAM meta data @ Fermilab at 155 Mbit/s.]
6
50 Farm nodes (100 cpu's)
Dell Precision Workstation 220
• Dual Pentium III processor, 800 MHz / 256 kB cache each
• 512 MB PC800 ECC RDRAM
• 40 GB (7200 rpm) ATA-66 disk drive
• No screen, no keyboard, no mouse
• Wake-on-LAN functionality
7
The File Server
Elonex EIDE Server
• Dual Pentium III 700 MHz
• 512 MB SDRAM
• 20 GByte EIDE disk
• 1.2 TByte in 75 GB EIDE disks
• 2 x Gigabit Netgear GA620 network cards
The Farm Server
Dell Precision 620 workstation
• Dual Pentium III Xeon 1 GHz
• 512 MB RDRAM
• 72.8 GByte SCSI disk
• Will also serve as D0 software server for the NIKHEF/D0 people
8
Software on the farm
• Boot via the network
• Standard RedHat Linux 6.2
• ups/upd on the server
• D0 software on the server
• FBSNG on the server, daemon on the nodes
• SAM on the file server
• Used to test new machines …
9
What we run on the farm
• Particle generator: Pythia or Isajet
• Geant detector simulation: d0gstar
• Digitization, adding min.bias: psim
• Check the data: mc_analyze
• Reconstruction: preco
• Analysis: reco_analyze
10
Example: Min.bias
• Did a run with 1000 events on all cpu's
  – Took ~2 min./event
  – So ~1.5 days for the whole run
  – Output file size ~575 MByte
• We left those files on the nodes
  – One reason for having enough local disk space
• Intend to repeat that "sometimes"
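A quick check of the run time and per-event size quoted above:

# Back-of-envelope check of the min.bias run quoted above.
events        = 1000
min_per_event = 2                                    # ~2 minutes per event
print(f"run time     : ~{events * min_per_event / 60 / 24:.1f} days")   # ~1.4 days
print(f"size per evt : ~{575 / events:.2f} MByte")   # ~575 MByte for 1000 events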
11
Output data
-rw-r--r--  1 a03  computer        298 Nov  5 19:25 RunJob_farm_qcdJob308161443.params
-rw-r--r--  1 a03  computer 1583995325 Nov  5 10:35 d0g_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r--  1 a03  computer        791 Nov  5 19:25 d0gstar_qcdJob308161443.params
-rw-r--r--  1 a03  computer        809 Nov  5 19:25 d0sim_qcdJob308161443.params
-rw-r--r--  1 a03  computer   47505408 Nov  3 16:15 gen_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r--  1 a03  computer       1003 Nov  5 19:25 import_d0g_qcdJob308161443.py
-rw-r--r--  1 a03  computer        912 Nov  5 19:25 import_gen_qcdJob308161443.py
-rw-r--r--  1 a03  computer       1054 Nov  5 19:26 import_sim_qcdJob308161443.py
-rw-r--r--  1 a03  computer        752 Nov  5 19:25 isajet_qcdJob308161443.params
-rw-r--r--  1 a03  computer        636 Nov  5 19:25 samglobal_qcdJob308161443.params
-rw-r--r--  1 a03  computer  777098777 Nov  5 19:24 sim_mcp03_psim01.02.00_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-poisson-2.5_p1.1_308161443_2000
-rw-r--r--  1 a03  computer       2132 Nov  5 19:26 summary.conf
12
Output data translated
0.047 Gbyte  gen_*
1.5 Gbyte    d0g_*
0.7 Gbyte    sim_*

import_gen_*.py
import_d0g_*.py
import_sim_*.py

isajet_*.params
RunJob_farm_*.params
d0gstar_*.params
d0sim_*.params
samglobal_*.params
summary.conf

12 files for generator + d0gstar + psim
But of course only 3 big ones
Total ~2 Gbyte
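The long file names in the listing are built from underscore-separated fields; a minimal sketch of pulling one apart (the field meanings given in the comments are our reading of the naming convention, not an official definition):

# Split one of the output file names into its fields.
name = ("d0g_mcp03_pmc03.00.01_nikhef.d0farm_isajet_"
        "qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000")
for field in name.split("_"):
    print(field)
# d0g                -> data tier (gen_* = generator, d0g_* = geant hits, sim_* = digis)
# mcp03, pmc03.00.01 -> MC production and release (assumed)
# nikhef.d0farm      -> producing site
# isajet             -> generator
# qcd-incl-PtGt2.0   -> physics process
# mb-none            -> min.bias treatment (mb-poisson-2.5 for the sim_* file)
# p1.1, 308161443, 2000 -> pass, request/job id and trailing tag (assumed)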
13
Data management
[Diagram: on the NIKHEF D0 farm, import_gen.py (generator data), import_d0g.py (geant data, hits), import_sim.py (sim data, digis) and import_reco.py (reconstructed data), plus the parameters, are handed to SAM; SAM moves the files to d0mino at Fermilab and to TERAS at SARA.]
14
Automation
• mc_runjob (modified)
  – Prepares MC jobs (gen+sim+reco+anal)
    • (e.g.) 300 events per job/cpu
    • Repeat (e.g.) 500 times
  – Submits them into the batch (FBS)
    • Run on the nodes
  – Copy to fileserver after completion
    • A separate batch job on the fileserver
  – Submits them into SAM
    • SAM does the file transfers to Fermilab and SARA
• Runs for a week …
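A minimal sketch of that chain as a driver loop; the command lines and option names below are placeholders standing in for the mc_runjob and FBS tools named above, not the farm's actual scripts:

#!/usr/bin/env python
# Sketch of the automation chain described above; the command lines and
# option names are placeholders, not the real mc_runjob / FBS invocations.
import subprocess

N_JOBS   = 500     # repeat (e.g.) 500 times
N_EVENTS = 300     # (e.g.) 300 events per job/cpu

for job in range(N_JOBS):
    # mc_runjob prepares one MC job (gen+sim+reco+anal) together with an FBS
    # job description holding three sections: mcc (runs on a node), rcp
    # (copies the output to the file server) and sam (stores the files).
    subprocess.run(["mc_runjob", "--events", str(N_EVENTS),        # placeholder flags
                    "--name", f"qcdJob{job}"], check=True)
    # Submit the whole chain into the farm batch system; SAM then does the
    # file transfers to Fermilab and SARA once the sam section runs.
    subprocess.run(["fbs", "submit", f"qcdJob{job}.jdf"], check=True)   # placeholder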
15
[Diagram: an fbs job has three sections: 1 mcc, 2 rcp, 3 sam. The mcc request runs on a node (40 GB local disk) and produces the mcc output; fbs(rcp) copies it to the file server (1.2 TB); fbs(sam) registers it in the SAM DB and moves the data to the datastores at FNAL and SARA. Control, data and metadata flows are shown between the farm server, the file server and the 50+ nodes.]
16
This is a grid!
[Diagram: the NIKHEF, in2p3 and KUN D0 farms are all connected through SAM to d0mino at Fermilab and TERAS at SARA.]
17
The Grid
• Not just D0, but for the LHC expts.
• Not just SAM, but for any database
• Not just farms, but any cpu resource
• Not just SARA, but any mass storage
• Not just FBS, but any batch system
• Not just HEP, but any science, EO, …
18
European Datagrid Project
• 3 yr. project for 10 M€
• Manpower to develop grid tools
• CERN, IN2P3, INFN, PPARC, ESA, FOM
• NIKHEF + SARA + KNMI
  – Farm management
  – Mass storage management
  – Network management
  – Testbed
  – HEP & EO applications
19
LHC - Regional Centres
[Diagram: the tiered LHC computing model for Atlas, LHCb and Alice: CERN as Tier 0; Tier 1 centres FNAL, NIKHEF/SARA, IN2P3, RAL, INFN and possibly KEK and BNL; Tier 2 sites such as Vrije Univ. Amsterdam, Brussel, Leuven, Utrecht and Nijmegen connected over SURFnet; then department and desktop level.]
20
DataGrid : Test bed sites
[Map of test bed sites (legend distinguishes HEP sites and ESA sites): Dubna, Moscow, RAL, Lund, Lisboa, Santander, Madrid, Valencia, Barcelona, Paris, Berlin, Lyon, Grenoble, Marseille, Brno, Prague, Torino, Milano, BO-CNAF, PD-LNL, Pisa, Roma, Catania, ESRIN, CERN, IPSL, Estec, KNMI, QMW, Bristol, Edinburgh, Manchester, Oxford, Nikhef.]
21
The NL-Datagrid Project
22
NL-Datagrid Goals
• National test bed for middleware development
  – WP4, WP5, WP6, WP7, WP8, WP9
• To become an LHC Tier-1 center
  – ATLAS, LHCb, Alice
• To use it for the existing program
  – D0, Antares
• To use it for other sciences
  – EO, Astronomy, Biology
• For tests with other (Trans-Atlantic) grids
  – D0
  – PPDG, GriPhyN
23
NL-Datagrid Testbed Sites
[Diagram: the NL testbed sites Nijmegen Univ. (Atlas), Univ. Utrecht (Alice), Vrije Univ. (LHCb) and Univ. Amsterdam (Atlas), linked to CERN, RAL, FNAL and ESA.]
24
Dutch Grid topology
[Diagram: SURFnet links NIKHEF, Free Univ., SARA, KNMI, Utrecht Univ. and Nijmegen Univ. (hosting D0, Atlas, LHCb and Alice) to CERN Geneva, FNAL, ESA and D-PAF München.]
25
End of the Grid intermezzo
Back to
The NIKHEF D0 farm and Fermilab:
The network
26
Network bandwidth
• NIKHEF – SURFnet: 1 Gbit
• SURFnet: Amsterdam – Chicago: 622 Mbit
• ESnet: Chicago – Fermilab: 155 Mbit ATM
• But ftp gives us ~4 Mbit/sec
• bbftp gives us ~25 Mbit/sec
• bbftp processes in parallel: ~45 Mbit/sec

For 2002
• NIKHEF – SURFnet: 2.5 Gbit
• SURFnet: Amsterdam – Chicago: 622 Mbit
• SURFnet: Amsterdam – Chicago: 2.5 Gbit optical
• Chicago – Fermilab: ? but more ..
27
ftp++
• ftp gives you 4 Mbit/s to Fermilab
• bbftp: increased buffer, # streams
• gsiftp: with security layer, increased buffer, ..
• grid_ftp: increased buffer, # streams, # sockets, fail-over protection, security
• bbftp: ~20 Mbit/s
• grid_ftp: ~25 Mbit/s
• Multiple ftp's in parallel: factor 2 seen
• Should get to > 100 Mbit/sec
• Or ~1 Gbyte/minute
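The last two bullets are the same target in different units; a one-line check:

# 100 Mbit/s expressed in Gbyte per minute.
print(f"{100 / 8 * 60 / 1000:.2f} Gbyte/minute")   # -> 0.75, i.e. roughly 1 Gbyte/minute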
28
SURFnet5 access capacity
[Chart: access capacity (10 Mbit/s to 100 Gbit/s) against year, 1999-2002: SURFnet4 grows from 155 Mbit/s to 1.0 Gbit/s; SURFnet5 from 2.5 Gbit/s via 10 Gbit/s to 20 Gbit/s.]
29
TA access capacity
[Diagram: the national research networks (NL SURFnet, UK SuperJANET4, Fr Renater, It GARR-B) connect through GEANT and Geneva to New York at 622 Mb and 2.5 Gb, and on via STAR-TAP / STAR-LIGHT to Abilene, ESNET and MREN.]
30
Network load last week
• Needed for 100 MC cpu's: ~10 Mbit/s (200 GB/day)
• Available to Chicago: 622 Mbit/s
• Available to FNAL: 155 Mbit/s

• Needed next year (double capacity): ~25 Mbit/s
• Available to Chicago: 2.5 Gbit/s: factor 100 more !!
• Available to FNAL: ??
31
New nodes for D0
• In a 2u 19" mounting
• Dual 1 GHz PIII
• 1 Gbyte RAM
• 40 Gbyte disk
• 100 Mbit ethernet
• Cost ~k$2
• Dell machines were ~k$4 (tax incl.)
FACTOR 2 cheaper!!
• Assembly time: 1/hour
• 1 switch: k$2.5 (24 ports)
• 1 rack: k$2 (46u high)
• Requested for 2001: k$60
  – 22 dual cpu's
  – 1 switch
  – 1 19" rack
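A rough check of that request against the per-item prices above (a sketch; the gap to k$60 is assumed to cover tax, assembly and spares):

# Back-of-envelope cost of the 2001 request, in k$, using the prices quoted above.
nodes  = 22 * 2.0    # 22 dual-cpu nodes at ~k$2 each
switch = 2.5         # one 24-port switch
rack   = 2.0         # one 46u 19" rack
print(f"hardware total ~k${nodes + switch + rack:.1f} of the k$60 requested")  # ~k$48.5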
32
33
The End
Kors Bos
Fermilab, May 23 2001