1
Data Management
• D0 Monte Carlo needs
• The NIKHEF D0 farm
• The data we produce
• The SAM database
• The network
• Conclusions
Kors Bos, NIKHEF, Amsterdam
Fermilab, May 23, 2001
2
D0 Monte Carlo needs
• D0 trigger rate is 100 Hz, 10^7 seconds/yr → 10^9 events/yr
• We want at least 10% of that to be simulated → 10^8 events/yr
• To simulate 1 QCD event takes ~3 minutes (size ~2 Mbyte)
  – On an 800 MHz PIII
• So 1 cpu can produce ~10^5 events/yr (~200 Gbyte)
  – Assuming a 60% overall efficiency
• So our 100-cpu farm can produce ~10^7 events/yr (~20 Tbyte)
  – And this is only 10% of the goal we set ourselves
  – Not counting the Nijmegen D0 farm yet
• So we need at least an order of magnitude more:
  – UTA (50), Lyon (200), Prague (10), BU (64),
  – Nijmegen (50), Lancaster (200), Rio (25),
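The arithmetic on this slide can be checked with a short back-of-the-envelope script (a sketch; the 3.15×10^7 s wall-clock year, ~3 min/event, ~2 Mbyte/event, and 60% efficiency are the slide's own assumptions):

```python
# Back-of-the-envelope check of the MC production numbers on this slide.
SECONDS_PER_YEAR = 3.15e7   # wall-clock seconds in a year
EFFICIENCY = 0.60           # overall farm efficiency (slide assumption)
SIM_TIME_S = 3 * 60         # ~3 minutes per QCD event on an 800 MHz PIII
EVENT_SIZE_GB = 2e-3        # ~2 Mbyte per event
N_CPUS = 100                # size of the NIKHEF D0 farm

events_per_cpu = SECONDS_PER_YEAR * EFFICIENCY / SIM_TIME_S  # ~1e5 events/yr
gbytes_per_cpu = events_per_cpu * EVENT_SIZE_GB              # ~200 Gbyte/yr
farm_events = events_per_cpu * N_CPUS                        # ~1e7 events/yr
farm_tbytes = gbytes_per_cpu * N_CPUS / 1000                 # ~20 Tbyte/yr

print(f"{events_per_cpu:.0f} events/cpu/yr, {gbytes_per_cpu:.0f} GB/cpu/yr")
print(f"{farm_events:.2e} events/yr, {farm_tbytes:.0f} TB/yr on the farm")
```

This reproduces the ~10^5 events (~200 Gbyte) per cpu and ~10^7 events (~20 Tbyte) per year for the 100-cpu farm.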
3
Example: Min.bias
• Did a run with 1000 events on all cpu's
  – Took ~2 min./event
  – So ~1.5 days for the whole run
  – Output file size ~575 Mbyte
• We left those files on the nodes
  – A reason for enough local disk space!
• Intend to repeat that "sometimes"
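The run numbers are easy to verify (a sketch; the assumption that all 100 cpus took part comes from the farm size quoted earlier):

```python
# Sanity check on the minimum-bias run: duration, and disk left on the nodes.
N_EVENTS = 1000        # events per cpu in this run
MIN_PER_EVENT = 2      # ~2 minutes per event
FILE_MB = 575          # output file size per node
N_NODES = 100          # assumption: all 100 farm cpus took part

run_days = N_EVENTS * MIN_PER_EVENT / 60 / 24   # ~1.4 days
local_gb = FILE_MB * N_NODES / 1000             # ~57 GB parked on local disks

print(f"run took ~{run_days:.1f} days, ~{local_gb:.0f} GB left on the nodes")
```

The ~57 GB parked across the nodes is exactly why the slide stresses having enough local disk space.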
4
Output data
-rw-r--r-- 1 a03 computer        298 Nov  5 19:25 RunJob_farm_qcdJob308161443.params
-rw-r--r-- 1 a03 computer 1583995325 Nov  5 10:35 d0g_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer        791 Nov  5 19:25 d0gstar_qcdJob308161443.params
-rw-r--r-- 1 a03 computer        809 Nov  5 19:25 d0sim_qcdJob308161443.params
-rw-r--r-- 1 a03 computer   47505408 Nov  3 16:15 gen_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer       1003 Nov  5 19:25 import_d0g_qcdJob308161443.py
-rw-r--r-- 1 a03 computer        912 Nov  5 19:25 import_gen_qcdJob308161443.py
-rw-r--r-- 1 a03 computer       1054 Nov  5 19:26 import_sim_qcdJob308161443.py
-rw-r--r-- 1 a03 computer        752 Nov  5 19:25 isajet_qcdJob308161443.params
-rw-r--r-- 1 a03 computer        636 Nov  5 19:25 samglobal_qcdJob308161443.params
-rw-r--r-- 1 a03 computer  777098777 Nov  5 19:24 sim_mcp03_psim01.02.00_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-poisson-2.5_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer       2132 Nov  5 19:26 summary.conf
5
Output data translated
0.047 Gbyte  gen_*
1.5 Gbyte    d0g_*
0.7 Gbyte    sim_*
             import_gen_*.py
             import_d0g_*.py
             import_sim_*.py
             isajet_*.params
             RunJob_farm_*.params
             d0gstar_*.params
             d0sim_*.params
             samglobal_*.params
             summary.conf
12 files for generator + d0gstar + psim
But of course only 3 big ones
Total ~2 Gbyte
Per day, on 100 cpu's: total 200 Gbyte/day!
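The daily total follows directly from the per-job sizes above (a sketch using this slide's rounded sizes and its assumption of one job set per cpu per day):

```python
# Extrapolate the per-job output volume to a daily farm total.
SIZES_GB = {"gen_*": 0.047, "d0g_*": 1.5, "sim_*": 0.7}  # the 3 big files
N_CPUS = 100   # one job set per cpu per day (slide assumption)

per_job_gb = sum(SIZES_GB.values())   # ~2 GB: "Total ~2 Gbyte"
per_day_gb = per_job_gb * N_CPUS      # ~200 GB/day on the farm

print(f"~{per_job_gb:.1f} GB per job set, ~{per_day_gb:.0f} GB/day on 100 cpus")
```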
6
Automation
• Mc_runjob (modified)
  – Prepares MC jobs (gen+sim+reco+anal)
    • (e.g.) 300 events per job/cpu
    • Repeat (e.g.) 500 times
  – Submits them into the batch system (FBSNG)
    • Run on the nodes
    • Moves the executable to the nodes + some files
  – Copies to the fileserver after completion
    • A separate batch job on the fileserver
    • Data moves between nodes and server
  – Submits them into SAM
    • SAM does the file transfers to Fermi and SARA
• Runs for a week …
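The chain above runs per job set in a fixed order; schematically (a pure-Python sketch of the control flow only — the stage names mirror the fbs job steps, but the run_stage helper is hypothetical, not the actual mc_runjob/FBSNG code):

```python
# Schematic of the production chain: each job set passes through three
# batch stages in order: mcc (generate+simulate on a node), rcp (copy the
# output to the fileserver), sam (declare files and trigger transfers).
STAGES = ["mcc", "rcp", "sam"]

def run_stage(job_id: int, stage: str, log: list) -> None:
    # Hypothetical stand-in for submitting one FBSNG batch step.
    log.append((job_id, stage))

def produce(n_jobs: int) -> list:
    log = []
    for job_id in range(n_jobs):
        for stage in STAGES:   # rcp only after mcc, sam only after rcp
            run_stage(job_id, stage, log)
    return log

history = produce(2)
print(history)
```

The point of the ordering is the dependency: the copy to the fileserver can only run once the simulation output exists, and the SAM declaration only once the files sit on the server.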
7
[Diagram: MC production dataflow. An mcc request enters the farm server, which launches an fbs job with three steps: 1 mcc (runs on the 50+ nodes, 40 GB local disk), 2 rcp (copies the mcc output to the 1.2 TB file server), 3 sam (registers the files in the SAM DB and ships them to the datastores at SARA and FNAL). Control, data, and metadata flows connect the farm server, nodes, file server, and SAM DB.]
8
Network bandwidth
• NIKHEF - SURFnet: 1 Gbit
• SURFnet: Amsterdam - Chicago 622 Mbit
• ESnet: Chicago - Fermilab 155 Mbit ATM
• But ftp gives us ~4 Mbit/sec
• bbftp gives us ~25 Mbit/sec
• bbftp processes in parallel: ~45 Mbit/sec
For 2002:
• NIKHEF - SURFnet: 2.5 Gbit
• SURFnet: Amsterdam - Chicago 622 Mbit
• SURFnet: Amsterdam - Chicago 2.5 Gbit optical
• Chicago - Fermilab: ? More than 155
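The measured rates put the ~2 Gbyte per job set in perspective (a sketch; the rates are the ftp/bbftp figures from this slide):

```python
# Time to ship one ~2 GB job set at the measured transfer rates.
JOB_GB = 2.0
RATES_MBIT = {"ftp": 4, "bbftp": 25, "parallel bbftp": 45}

times_min = {tool: JOB_GB * 8e9 / (mbit * 1e6) / 60
             for tool, mbit in RATES_MBIT.items()}

for tool, minutes in times_min.items():
    print(f"{tool:>15}: ~{minutes:.0f} min per job set")
```

Plain ftp needs over an hour per job set; parallel bbftp brings it down to a few minutes, which is what makes shipping 100 job sets a day feasible.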
9
Network capacity internally
[Plot: access capacity, 1999-2002, on a log scale from 10 Mbit/s to 100 Gbit/s: from 155 Mbit/s and 1.0 Gbit/s on SURFnet4 to 2.5 Gbit/s, 10 Gbit/s, and 20 Gbit/s on SURFnet5.]
10
TA network capacity
[Map: transatlantic network capacity. National research networks (NL SURFnet, UK SuperJANET4, Fr Renater, It GARR-B) link via GEANT (Geneva) to New York at 622 Mb and 2.5 Gb, connecting through STAR-TAP / STAR-LIGHT to Abilene, ESNET, and MREN.]
11
Network load last week
• Needed for 100 MC cpu's: ~10 Mbit/s (200 GB/day)
• Available to Chicago: 622 Mbit/s
• Available to FNAL: 155 Mbit/s
• Needed next year (double capacity): ~25 Mbit/s
• Available to Chicago: 2.5 Gbit/s: a factor 100 more!!
• Available to FNAL: ??
12
Conclusions
• Producing a lot of data is easy
• Storing a lot of data is less easy, but still easy
• Moving a lot of data is even less easy, but still easy
So what is the problem?
• Managing a lot of data is difficult: metadata, database
• The network around Fermilab/CERN is getting tight
• Otherwise there is enough bandwidth!
Conclusion: do the easiest thing: don't store or move, recalculate!!