1 Data Management D0 Monte Carlo needs The NIKHEF D0 farm The data we produce The SAM data base The network Conclusions Kors Bos, NIKHEF, Amsterdam Fermilab, May 23 2001


Data Management

D0 Monte Carlo needs
The NIKHEF D0 farm
The data we produce
The SAM data base
The network
Conclusions

Kors Bos, NIKHEF, Amsterdam
Fermilab, May 23 2001


D0 Monte Carlo needs

• D0 trigger rate is 100 Hz, 10^7 seconds/yr: 10^9 events/yr
• We want at least 10% of that to be simulated: 10^8 events/yr
• To simulate 1 QCD event takes ~3 minutes (size ~2 Mbyte)
  – On an 800 MHz PIII
• So 1 cpu can produce ~10^5 events/yr (~200 Gbyte)
  – Assuming a 60% overall efficiency
• So our 100 cpu farm can produce ~10^7 events/yr (~20 Tbyte)
  – And this is only 10% of the goal we set ourselves
  – Not counting the Nijmegen D0 farm yet
• So we need at least an order of magnitude more
  – UTA (50), Lyon (200), Prague (10), BU (64),
  – Nijmegen (50), Lancaster (200), Rio (25),
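The estimate on this slide is plain arithmetic and can be rechecked with a few lines of Python. This is a minimal sketch that uses only the figures quoted above; the variable and function names are illustrative, and taking a year as 365 days of wall-clock time is an assumption.

```python
# Back-of-the-envelope check of the Monte Carlo production estimate.
# Inputs are the figures quoted on the slide; all names are illustrative.

MINUTES_PER_YEAR = 365 * 24 * 60   # assumption: a wall-clock year
MINUTES_PER_EVENT = 3.0            # ~3 minutes per QCD event (800 MHz PIII)
EVENT_SIZE_MBYTE = 2.0             # ~2 Mbyte per event
EFFICIENCY = 0.60                  # overall efficiency
N_CPUS = 100                       # cpus in the farm

TRIGGER_RATE_HZ = 100              # D0 trigger rate
LIVE_SECONDS_PER_YEAR = 1e7        # seconds of data taking per year
SIMULATED_FRACTION = 0.10          # we want 10% of the events simulated


def events_per_cpu_year():
    """Events one cpu can simulate in a year at the quoted efficiency."""
    return MINUTES_PER_YEAR / MINUTES_PER_EVENT * EFFICIENCY


def farm_events_per_year(n_cpus=N_CPUS):
    """Events the whole farm can simulate in a year."""
    return n_cpus * events_per_cpu_year()


def goal_events_per_year():
    """Simulated events wanted per year."""
    return TRIGGER_RATE_HZ * LIVE_SECONDS_PER_YEAR * SIMULATED_FRACTION


if __name__ == "__main__":
    per_cpu = events_per_cpu_year()
    farm = farm_events_per_year()
    goal = goal_events_per_year()
    print(f"events per cpu per year : {per_cpu:.2e}")
    print(f"data per cpu per year   : {per_cpu * EVENT_SIZE_MBYTE / 1e3:.0f} Gbyte")
    print(f"events per farm per year: {farm:.2e}")
    print(f"data per farm per year  : {farm * EVENT_SIZE_MBYTE / 1e6:.0f} Tbyte")
    print(f"fraction of the goal    : {farm / goal:.0%}")
    print(f"cpus needed for the goal: {goal / per_cpu:.0f}")
```

With these inputs one cpu yields slightly more than 10^5 events per year, so the farm covers roughly a tenth of the 10^8 events wanted, which matches the rounded numbers on the slide.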


Example: Min.bias

• Did a run with 1000 events on all cpu’s
  – Took ~2 min./event
  – So ~1.5 days for the whole run
  – Output file size ~575 MByte
• We left those files on the nodes
  – Reason for enough local disk space!
• Intend to repeat that “sometimes”


Output data

-rw-r--r-- 1 a03 computer        298 Nov 5 19:25 RunJob_farm_qcdJob308161443.params
-rw-r--r-- 1 a03 computer 1583995325 Nov 5 10:35 d0g_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer        791 Nov 5 19:25 d0gstar_qcdJob308161443.params
-rw-r--r-- 1 a03 computer        809 Nov 5 19:25 d0sim_qcdJob308161443.params
-rw-r--r-- 1 a03 computer   47505408 Nov 3 16:15 gen_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer       1003 Nov 5 19:25 import_d0g_qcdJob308161443.py
-rw-r--r-- 1 a03 computer        912 Nov 5 19:25 import_gen_qcdJob308161443.py
-rw-r--r-- 1 a03 computer       1054 Nov 5 19:26 import_sim_qcdJob308161443.py
-rw-r--r-- 1 a03 computer        752 Nov 5 19:25 isajet_qcdJob308161443.params
-rw-r--r-- 1 a03 computer        636 Nov 5 19:25 samglobal_qcdJob308161443.params
-rw-r--r-- 1 a03 computer  777098777 Nov 5 19:24 sim_mcp03_psim01.02.00_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-poisson-2.5_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer       2132 Nov 5 19:26 summary.conf


Output data translated

0.047 Gbyte   gen_*
1.5 Gbyte     d0g_*
0.7 Gbyte     sim_*

import_gen_*.py
import_d0g_*.py
import_sim_*.py

isajet_*.params
RunJob_farm_*.params
d0gstar_*.params
d0sim_*.params
samglobal_*.params
summary.conf

12 files for generator+d0gstar+psim
But of course only 3 big ones
Total ~2 Gbyte

Per day, on 100 cpu’s
Total 200 Gbyte/day !
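The totals on this slide can be cross-checked against the byte counts in the directory listing. The sketch below copies the sizes from that listing; the grouping into "big" and "small" files, the function names, the assumption of one such job per cpu per day, and the use of decimal units (1 Gbyte = 10^9 bytes) are illustrative choices, not part of the original slides.

```python
# Sum the file sizes of one job's output, as shown in the `ls -l` listing.
# Byte counts are copied from the listing; names and grouping are illustrative.

BIG_FILES_BYTES = {
    "gen_*": 47_505_408,
    "d0g_*": 1_583_995_325,
    "sim_*": 777_098_777,
}

# The nine small files: *.params, import_*.py and summary.conf.
SMALL_FILES_BYTES = [298, 791, 809, 1003, 912, 1054, 752, 636, 2132]


def job_total_bytes():
    """Total size in bytes of all 12 output files of one job."""
    return sum(BIG_FILES_BYTES.values()) + sum(SMALL_FILES_BYTES)


def farm_gbyte_per_day(n_cpus=100, jobs_per_cpu_per_day=1):
    """Daily output in Gbyte, assuming one such job per cpu per day."""
    return n_cpus * jobs_per_cpu_per_day * job_total_bytes() / 1e9


if __name__ == "__main__":
    for name, size in BIG_FILES_BYTES.items():
        print(f"{name:6s} {size / 1e9:6.3f} Gbyte")
    print(f"small files: {sum(SMALL_FILES_BYTES)} bytes in total")
    print(f"one job    : {job_total_bytes() / 1e9:.2f} Gbyte")
    print(f"100 cpu's  : {farm_gbyte_per_day():.0f} Gbyte/day")
```

The three big files dominate completely; the nine small ones add only a few kilobytes, so the slide's rounding to ~2 Gbyte per job and 200 Gbyte/day is set by the big files alone.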


Automation

• mc_runjob (modified)
  – Prepares MC jobs (gen+sim+reco+anal)
    • e.g. 300 events per job/cpu
    • Repeat e.g. 500 times
  – Submits them into the batch system (FBSNG)
    • Runs on the nodes
    • Moves the executable and some files to the nodes
  – Copies the output to the fileserver after completion
    • A separate batch job on the fileserver
    • Data moves between nodes and server
  – Submits them into SAM
    • SAM does the file transfers to Fermilab and SARA
• Runs for a week …
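The bookkeeping behind such a production run can be sketched in a few lines. This is a hypothetical illustration only: none of the names below come from mc_runjob, FBSNG or SAM, and the three section labels (mcc, rcp, sam) are taken from the "fbs job" legend of the farm diagram. The 300 events per job and 500 repeats are the example values from the slide.

```python
# Hypothetical sketch of the job plan described above. None of these names
# are real mc_runjob, FBSNG or SAM interfaces; they only show the bookkeeping.
from dataclasses import dataclass

# Sections of one batch job, in order: produce the events (mcc),
# copy the output to the file server (rcp), store it through SAM (sam).
SECTIONS = ("mcc", "rcp", "sam")


@dataclass(frozen=True)
class JobSpec:
    job_id: int
    n_events: int
    sections: tuple = SECTIONS


def plan_jobs(events_per_job=300, n_jobs=500):
    """Build the list of identical job specifications for one production run."""
    return [JobSpec(job_id=i, n_events=events_per_job) for i in range(n_jobs)]


def total_events(jobs):
    """Total number of events the planned jobs will produce."""
    return sum(job.n_events for job in jobs)


if __name__ == "__main__":
    jobs = plan_jobs()
    print(len(jobs), "jobs,", total_events(jobs), "events in total")
    for step, section in enumerate(jobs[0].sections, start=1):
        print(f"job {jobs[0].job_id}: section {step} = {section}")
```

With the example values this plan amounts to 150,000 events spread over 500 jobs, each passing through the same three sections.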


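[Figure: data flow on the farm. Components: farm server, file server, node (label "50 +"), SAM DB, datastore at SARA, datastore at FNAL. Disk labels: 1.2 TB and 40 GB. An fbs job has three sections: 1 mcc, 2 rcp, 3 sam, shown as fbs(mcc), fbs(rcp) and fbs(sam). Flows: mcc request, mcc input, mcc output. Legend: control, data, metadata.]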


Network bandwidth

• NIKHEF to SURFnet: 1 Gbit/s
• SURFnet, Amsterdam to Chicago: 622 Mbit/s
• ESnet, Chicago to Fermilab: 155 Mbit/s ATM
• But ftp gives us ~4 Mbit/s
• bbftp gives us ~25 Mbit/s
• bbftp processes in parallel: ~45 Mbit/s

For 2002
• NIKHEF to SURFnet: 2.5 Gbit/s
• SURFnet, Amsterdam to Chicago: 622 Mbit/s
• SURFnet, Amsterdam to Chicago: 2.5 Gbit/s optical
• Chicago to Fermilab: ? More than 155 Mbit/s
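To see what the measured throughputs mean in practice, one can compute how long it takes to ship one day of farm output (200 Gbyte, the figure from the output data slide) with each tool. This is a minimal sketch; the function name is illustrative and decimal units are assumed (1 Gbyte = 10^9 bytes, 1 Mbit = 10^6 bits).

```python
# Time needed to ship one day of farm output at the measured throughputs.
# Assumption: decimal units (1 Gbyte = 1e9 bytes, 1 Mbit = 1e6 bits).

DAILY_OUTPUT_GBYTE = 200  # farm output per day, from the output data slide

MEASURED_MBIT_PER_S = {
    "ftp": 4,
    "bbftp": 25,
    "bbftp, parallel processes": 45,
}


def transfer_hours(gbyte, mbit_per_s):
    """Hours needed to move `gbyte` of data at a sustained `mbit_per_s`."""
    bits = gbyte * 1e9 * 8
    seconds = bits / (mbit_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    for tool, rate in MEASURED_MBIT_PER_S.items():
        hours = transfer_hours(DAILY_OUTPUT_GBYTE, rate)
        verdict = "keeps up" if hours <= 24 else "falls behind"
        print(f"{tool:26s} {rate:3d} Mbit/s  {hours:6.1f} h  ({verdict})")
```

At the ftp rate a single day of output would take several days to transfer, while both bbftp figures fit within 24 hours, so the choice of transfer tool matters more here than the nominal link capacity.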


[Figure: network capacity internally. Vertical axis: access capacity, from 10 Mbit/s to 100 Gbit/s. Horizontal axis: 1999 to 2002. Series: SURFnet4 and SURFnet5. Data labels: 155 Mbit/s, 1.0 Gbit/s, 2.5 Gbit/s, 10 Gbit/s, 20 Gbit/s.]


[Figure: TA network capacity. Networks shown: NL SURFnet, UK SuperJANET4, It GARR-B, Fr Renater, GEANT, Abilene, ESnet, MREN. Locations: Geneva, New York, STAR-TAP, STAR-LIGHT. Link labels: 622 Mb and 2.5 Gb.]


Network load last week

• Needed for 100 MC CPU’s: ~10 Mbit/s (200 GB/day)
• Available to Chicago: 622 Mbit/s
• Available to FNAL: 155 Mbit/s

• Needed next year (double capacity): ~25 Mbit/s
• Available to Chicago: 2.5 Gbit/s, a factor 100 more !!
• Available to FNAL: ??
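The conversion from a daily data volume to a sustained network rate is easy to reproduce. The sketch below assumes decimal units and a transfer spread evenly over 24 hours; the function names are illustrative. Converted exactly, 200 GB/day is about 18.5 Mbit/s, the same order of magnitude as the rounded figure on the slide and still a small fraction of either link.

```python
# Sustained rate needed to move a daily data volume, and the share of a link
# it would occupy. Assumptions: decimal units, transfers spread over 24 hours.

SECONDS_PER_DAY = 86_400


def sustained_mbit_per_s(gbyte_per_day):
    """Average rate in Mbit/s needed to move `gbyte_per_day` every day."""
    return gbyte_per_day * 1e9 * 8 / SECONDS_PER_DAY / 1e6


def link_fraction(needed_mbit_per_s, available_mbit_per_s):
    """Fraction of the available link capacity that the transfer would use."""
    return needed_mbit_per_s / available_mbit_per_s


if __name__ == "__main__":
    needed = sustained_mbit_per_s(200)  # 200 GB/day from 100 MC cpus
    print(f"needed: {needed:.1f} Mbit/s")
    links = {"Amsterdam to Chicago": 622, "Chicago to FNAL": 155}
    for name, capacity in links.items():
        share = link_fraction(needed, capacity)
        print(f"{name:22s} {capacity:4d} Mbit/s  uses {share:.1%}")
```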


Conclusions

• Producing a lot of data is easy
• Storing a lot of data is less easy, but still easy
• Moving a lot of data is even less easy, but still easy

So what is the problem?
• Managing a lot of data is difficult: metadata dbase
• The network around Fermilab/CERN is getting tight
• Otherwise there is enough bandwidth!

Conclusion: do the easiest thing.
Don’t store or move: recalculate !!