
1

Data Management

• D0 Monte Carlo needs
• The NIKHEF D0 farm
• The data we produce
• The SAM data base
• The network
• Conclusions

Kors Bos, NIKHEF, Amsterdam
Fermilab, May 23, 2001

2

D0 Monte Carlo needs

• D0 trigger rate is 100 Hz, 10^7 seconds/yr → 10^9 events/yr
• We want at least 10% of that to be simulated → 10^8 events/yr
• To simulate 1 QCD event takes ~3 minutes (size ~2 MByte)
  – On an 800 MHz PIII
• So 1 CPU can produce ~10^5 events/yr (~200 GByte)
  – Assuming a 60% overall efficiency
• So our 100-CPU farm can produce ~10^7 events/yr (~20 TByte)
  – And this is only 10% of the goal we set ourselves
  – Not counting the Nijmegen D0 farm yet
• So we need at least an order of magnitude more:
  – UTA (50), Lyon (200), Prague (10), BU (64),
  – Nijmegen (50), Lancaster (200), Rio (25),
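As a sanity check, the slide's order-of-magnitude figures can be reproduced with a few lines of Python (the per-CPU number comes out at ~3×10^4 events/yr with the stated 60% efficiency; the slide rounds up to ~10^5, so the farm totals land within a factor of a few):

```python
# Back-of-envelope check of the D0 Monte Carlo capacity figures on this slide.
trigger_rate_hz = 100
seconds_per_year = 1e7                    # canonical accelerator year
events_per_year = trigger_rate_hz * seconds_per_year   # -> 1e9 events/yr

sim_fraction = 0.10                       # we want >= 10% simulated
sim_target = sim_fraction * events_per_year            # -> 1e8 events/yr

sim_time_s = 3 * 60                       # ~3 min per QCD event on an 800 MHz PIII
efficiency = 0.60                         # overall farm efficiency
events_per_cpu = seconds_per_year * efficiency / sim_time_s

event_size_mb = 2
farm_cpus = 100
farm_events = farm_cpus * events_per_cpu
farm_output_tb = farm_events * event_size_mb / 1e6

print(f"target:  {sim_target:.0e} events/yr")
print(f"per CPU: {events_per_cpu:.1e} events/yr")
print(f"farm:    {farm_events:.1e} events/yr, ~{farm_output_tb:.0f} TB")
```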

3

Example: min. bias

• Did a run with 1000 events on all CPUs
  – Took ~2 min/event
  – So ~1.5 days for the whole run
  – Output file size ~575 MByte
• We left those files on the nodes
  – A reason to have enough local disk space!
• We intend to repeat that occasionally
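The quoted run time follows directly from the per-event cost, since every CPU works through its own 1000 events in parallel:

```python
# Sanity check of the min.bias run estimate: 1000 events per CPU at ~2 min/event.
events_per_cpu = 1000
minutes_per_event = 2
run_minutes = events_per_cpu * minutes_per_event
run_days = run_minutes / (60 * 24)   # close to the ~1.5 days quoted on the slide
print(f"{run_days:.1f} days")
```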

4

Output data

-rw-r--r-- 1 a03 computer        298 Nov 5 19:25 RunJob_farm_qcdJob308161443.params
-rw-r--r-- 1 a03 computer 1583995325 Nov 5 10:35 d0g_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer        791 Nov 5 19:25 d0gstar_qcdJob308161443.params
-rw-r--r-- 1 a03 computer        809 Nov 5 19:25 d0sim_qcdJob308161443.params
-rw-r--r-- 1 a03 computer   47505408 Nov 3 16:15 gen_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer       1003 Nov 5 19:25 import_d0g_qcdJob308161443.py
-rw-r--r-- 1 a03 computer        912 Nov 5 19:25 import_gen_qcdJob308161443.py
-rw-r--r-- 1 a03 computer       1054 Nov 5 19:26 import_sim_qcdJob308161443.py
-rw-r--r-- 1 a03 computer        752 Nov 5 19:25 isajet_qcdJob308161443.params
-rw-r--r-- 1 a03 computer        636 Nov 5 19:25 samglobal_qcdJob308161443.params
-rw-r--r-- 1 a03 computer  777098777 Nov 5 19:24 sim_mcp03_psim01.02.00_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-poisson-2.5_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer       2132 Nov 5 19:26 summary.conf

5

Output data translated

0.047 GByte  gen_*
1.5   GByte  d0g_*
0.7   GByte  sim_*

Plus the small files: import_gen_*.py, import_d0g_*.py, import_sim_*.py, isajet_*.params, RunJob_farm_*.params, d0gstar_*.params, d0sim_*.params, samglobal_*.params, summary.conf

12 files for generator + d0gstar + psim, but of course only 3 big ones. Total ~2 GByte.

Per day, on 100 CPUs: total 200 GByte/day!
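The long data-file names encode their own metadata as underscore-separated fields. A minimal sketch of pulling one apart (the field names here are guesses from inspecting the listing, not the official SAM naming schema):

```python
# Hypothetical decoder for the underscore-delimited MC file names shown above.
# The key names are guesses from inspection, not an official schema.
def parse_mc_filename(name: str) -> dict:
    keys = ["tier", "mc_project", "app_version", "site",
            "generator", "process", "minbias", "prod_pass",
            "job_id", "n_events"]
    return dict(zip(keys, name.split("_")))

info = parse_mc_filename(
    "d0g_mcp03_pmc03.00.01_nikhef.d0farm_isajet_"
    "qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000")
print(info["tier"], info["job_id"], info["n_events"])
```

The same split works for the gen_* and sim_* files, since dots and hyphens only appear inside fields, never as separators.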

6

Automation

• mc_runjob (modified)
  – Prepares MC jobs (gen + sim + reco + anal)
    • e.g. 300 events per job/CPU
    • Repeated e.g. 500 times
  – Submits them to the batch system (FBSNG)
    • They run on the nodes
    • Moves the executable plus some files to the nodes
  – Copies output to the file server after completion
    • A separate batch job on the file server
    • Data moves between nodes and server
  – Declares the files to SAM
    • SAM does the file transfers to Fermilab and SARA
• Runs for a week …
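The four stages above form a simple pipeline per job. A rough sketch of the control flow (the function bodies are placeholders, not real mc_runjob, FBSNG or SAM interfaces):

```python
# Sketch of the four-stage farm pipeline: prepare -> batch -> copy -> SAM.
# All functions are hypothetical stand-ins for the real tools named on the slide.
EVENTS_PER_JOB = 300    # e.g. values from the slide
N_JOBS = 500

def prepare(job_id):          # mc_runjob: write params for gen+sim+reco+anal
    return {"id": job_id, "events": EVENTS_PER_JOB}

def submit_to_batch(job):     # FBSNG: run the job on a farm node
    job["state"] = "done"

def copy_to_fileserver(job):  # separate batch job on the file server
    job["on_server"] = True

def declare_to_sam(job):      # SAM then replicates to FNAL and SARA
    job["in_sam"] = True

jobs = [prepare(i) for i in range(N_JOBS)]
for job in jobs:
    submit_to_batch(job)
    copy_to_fileserver(job)
    declare_to_sam(job)

print(sum(j["events"] for j in jobs), "events produced")
```

With the slide's example numbers (300 events × 500 jobs) one such pass yields 150,000 events, which is why a full cycle "runs for a week".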

7

[Diagram: farm data-flow architecture. An mcc request arrives at the farm server, which submits an fbs job with three steps: 1 mcc, 2 rcp, 3 sam. Step mcc runs on the 50+ nodes (40 GB local disk each), reading mcc input and writing mcc output; step rcp (fbs(rcp)) moves the output from the nodes to the 1.2 TB file server; step sam (fbs(sam)) ships it to the datastores at FNAL and SARA and registers it in the SAM DB. Control, data, and metadata flows connect the farm server, file server, nodes, and SAM DB.]

8

Network bandwidth

• NIKHEF to SURFnet: 1 Gbit
• SURFnet, Amsterdam to Chicago: 622 Mbit
• ESnet, Chicago to Fermilab: 155 Mbit (ATM)

• But ftp gives us ~4 Mbit/s
• bbftp gives us ~25 Mbit/s
• bbftp processes in parallel: ~45 Mbit/s

For 2002:
• NIKHEF to SURFnet: 2.5 Gbit
• SURFnet, Amsterdam to Chicago: 622 Mbit
• SURFnet, Amsterdam to Chicago: 2.5 Gbit optical
• Chicago to Fermilab: ? More than 155 Mbit
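The measured rates matter more than the nominal link capacities: at the farm's 200 GByte/day output, a quick calculation shows plain ftp cannot keep up at all, while parallel bbftp just fits within a day:

```python
# Time to ship the farm's 200 GByte/day at the measured transfer rates.
daily_bits = 200 * 1e9 * 8   # 200 GByte/day in bits

for tool, mbit_s in [("ftp", 4), ("bbftp", 25), ("parallel bbftp", 45)]:
    hours = daily_bits / (mbit_s * 1e6) / 3600
    print(f"{tool:>15}: {hours:6.1f} h per day's data")
```

ftp would need over 100 hours for one day's data; parallel bbftp needs about 10, leaving headroom.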

9

Network capacity internally

[Chart: access capacity vs. year, 1999-2002, log scale from 10 Mbit/s to 100 Gbit/s. SURFnet4 grows from 155 Mbit/s to 1.0 Gbit/s; SURFnet5 from 2.5 Gbit/s through 10 Gbit/s to 20 Gbit/s.]

10

TA network capacity

[Map: transatlantic (TA) network capacity. European national networks (SURFnet in NL, SuperJANET4 in the UK, Renater in France, GARR-B in Italy) interconnect via GEANT at Geneva; links of 622 Mb and 2.5 Gb run to New York and on to STAR-TAP/STAR-LIGHT in Chicago, where Abilene, ESNET and MREN attach.]

11

Network load last week

• Needed for 100 MC CPUs: ~10 Mbit/s (200 GB/day)
• Available to Chicago: 622 Mbit/s
• Available to FNAL: 155 Mbit/s

• Needed next year (double capacity): ~25 Mbit/s
• Available to Chicago: 2.5 Gbit/s, a factor 100 more!!
• Available to FNAL: ??

12

Conclusions

• Producing a lot of data is easy
• Storing a lot of data is less easy, but still easy
• Moving a lot of data is even less easy, but still easy

So what is the problem?
• Managing a lot of data is difficult: metadata, database
• The network around Fermilab/CERN is getting tight
• Otherwise there is enough bandwidth!

Conclusion: do the easiest thing. Don't store or move: recalculate!!
