
Valencia, October 22nd

[email protected]

Distributed Data Management


• The basic DDM course tries to answer the questions:
  • What is what?
  • What can and can't I do?
  • Where can I ask for more information?

• The DQ2 hands-on session aims to give a basic but sufficient grounding in the most common dq2 tools.


Outline

• Motivation
• What is what?!
• Space Tokens
• Data Distribution and availability
• Data Policies
• Analysis Policies
• Data access for analysis purposes
• @ CERN


Motivation

The ATLAS detector will produce large volumes of data:

• RAW ~ 3.2 PB/year
• ESD ~ 2 PB/year + copies + versions
• AOD ~ 180 TB/year + copies + versions
• TAG ~ 2 TB/year + copies + versions
• DPD ~ 2 TB/year + copies + versions
• + MC production data...

Data is transferred hierarchically from T0 to T1 to T2/T3.
Transfers between a T1 and T2s happen only within the same cloud.
Transfers between T2s are also done only within the same cloud.

Jobs should go where the data is.
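The following is a minimal Python sketch that makes the transfer rules above concrete. It encodes only what this slide states: T0 to T1, then T1 to T2/T3 and T2 to T2 inside a single cloud. The `Site` class, the `transfer_allowed` function and the example cloud labels are hypothetical. Rejecting every route the slide does not mention is a simplifying assumption, not how DDM actually schedules transfers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    tier: int   # 0, 1, 2 or 3
    cloud: str  # cloud label, e.g. "ES" or "FR" (illustrative values)


def transfer_allowed(src: Site, dst: Site) -> bool:
    """Toy version of the hierarchical transfer rules on the slide."""
    if src.tier == 0:
        return dst.tier == 1                  # T0 -> T1
    if src.tier == 1 and dst.tier in (2, 3):
        return src.cloud == dst.cloud         # T1 -> T2/T3, only within the cloud
    if src.tier == 2 and dst.tier == 2:
        return src.cloud == dst.cloud         # T2 <-> T2, only within the cloud
    return False                              # routes the slide does not mention
```

For example, take `pic = Site("PIC", 1, "ES")` and `ific = Site("IFIC", 2, "ES")`. Then `transfer_allowed(pic, ific)` is `True`, while the same T1 sending to a T2 labelled `"FR"` is rejected.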


What is what?

[Diagram: overview of the DDM vocabulary: DQ2, Files, Datasets, Containers, Sites, Cloud, Space Tokens, Tier of Atlas, Replicas]


What is what?

[Diagram: DQ2 → Containers → Datasets → Files]

Datasets can be open, closed or frozen.

• Open datasets can be changed at any time but not replicated.
• Closed datasets can be replicated but not changed. They can be versioned, though.
• Frozen datasets can be replicated and are immutable.

Container datasets are the logical equivalent of physics datasets: they contain the same files. The only difference is that the container concept lets DDM handle file grouping more reliably.
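The open/closed/frozen rules can be summarised as a tiny state model. This Python sketch is hypothetical: the `Dataset` class and its methods are not part of DQ2. It deliberately leaves out versioning of closed datasets and only checks the change and replication permissions listed above.

```python
from enum import Enum


class State(Enum):
    OPEN = "open"
    CLOSED = "closed"
    FROZEN = "frozen"


class Dataset:
    """Hypothetical toy model of the open/closed/frozen rules (not DQ2 code)."""

    def __init__(self, name: str):
        self.name = name
        self.state = State.OPEN          # new datasets start open
        self.files: list[str] = []

    def add_file(self, lfn: str) -> None:
        # Only open datasets can be changed.
        if self.state is not State.OPEN:
            raise PermissionError(f"{self.name} is {self.state.value}")
        self.files.append(lfn)

    def can_replicate(self) -> bool:
        # Closed and frozen datasets can be replicated; open ones cannot.
        return self.state is not State.OPEN

    def close(self) -> None:
        if self.state is State.FROZEN:
            raise PermissionError(f"{self.name} is frozen (immutable)")
        self.state = State.CLOSED

    def freeze(self) -> None:
        # Frozen is final: no further changes of any kind.
        self.state = State.FROZEN
```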


What is what?


When a replica has all the files of the dataset, it is "complete". When a replica has only part of the files of the dataset, it is marked as "incomplete". This indicates that either:

• the dataset is not frozen,
• its replication is ongoing, or
• there was a problem in its replication.
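A replica's completeness, as defined above, is a set comparison between the dataset's file list and the files present at the site. A minimal sketch, assuming both are available as sets of logical file names (the helper name `replica_status` is hypothetical):

```python
def replica_status(dataset_files: set[str], replica_files: set[str]) -> str:
    """Return "COMPLETE" or "INCOMPLETE" for one replica of a dataset."""
    unknown = replica_files - dataset_files
    if unknown:
        # A replica cannot hold files that do not belong to the dataset.
        raise ValueError(f"files not in dataset: {sorted(unknown)}")
    return "COMPLETE" if replica_files == dataset_files else "INCOMPLETE"
```

An incomplete result does not tell you which of the three causes applies. You still need the dataset state or the transfer monitoring for that.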

[Diagram: Replicas live in several Clouds, each with its Sites and Space Tokens. All this info is in ToA (Tiers of ATLAS), also LFC names, FTS channels,...]


What is what?

Container:
mc08.105200.T1_McAtNlo_Jimmy.merge.AOD.e357_s462_r635_t53/

Datasets:
mc08.105200.T1_McAtNlo_Jimmy.merge.AOD.e357_s462_r635_t53_tid064819
mc08.105200.T1_McAtNlo_Jimmy.merge.AOD.e357_s462_r635_t53_tid064823
mc08.105200.T1_McAtNlo_Jimmy.merge.AOD.e357_s462_r635_t53_tid064822
...

Files (File Name, GUID, CheckSum, Size):
AOD.064822._00173.pool.root.1 0CEA9055-263B-DE11-BFFB-001A6478706C ad:67f1fb10 2997029350
AOD.064822._00164.pool.root.1 30D056FC-273B-DE11-B542-001A64789448 ad:dec9491c 2980618608
....

Replicas: the data is located at IFIC_MCDISK (ES cloud), but also at IN2P3_MCDISK (FR cloud).
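The names above follow the ATLAS dataset nomenclature `project.datasetNumber.physicsShort.prodStep.dataType.tags`. A `_tidNNNNNN` suffix marks a task-level dataset, and a trailing `/` marks a container. The Python sketch below splits such names with a regular expression. The `DatasetName` fields and the pattern are assumptions written for the examples in these slides. User datasets (e.g. `user.<name>....`) and other unusual names are rejected rather than parsed.

```python
import re
from dataclasses import dataclass
from typing import Optional

# project.datasetNumber.physicsShort.prodStep.dataType.tags[_tidNNNNNN][/]
_NAME = re.compile(
    r"^(?P<project>[^.]+)\.(?P<number>\d+)\.(?P<physics>[^.]+)\."
    r"(?P<step>[^.]+)\.(?P<dtype>[^.]+)\.(?P<tags>[^./]+?)"
    r"(?:_tid(?P<tid>\d+))?(?P<slash>/)?$"
)


@dataclass
class DatasetName:
    project: str            # e.g. mc08
    number: int             # e.g. 105200
    physics_short: str      # e.g. T1_McAtNlo_Jimmy
    prod_step: str          # e.g. merge
    data_type: str          # e.g. AOD
    tags: str               # e.g. e357_s462_r635_t53
    task_id: Optional[str]  # e.g. "064819"; None when there is no _tid suffix
    is_container: bool      # True when the name ends with "/"


def parse_dataset_name(name: str) -> DatasetName:
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"not a production dataset name: {name!r}")
    return DatasetName(
        project=m["project"],
        number=int(m["number"]),
        physics_short=m["physics"],
        prod_step=m["step"],
        data_type=m["dtype"],
        tags=m["tags"],
        task_id=m["tid"],
        is_container=m["slash"] is not None,
    )
```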


Space Tokens

DATADISK and MCDISK: at all Tier-1s and Tier-2s, for detector data (including reprocessing) and simulated data, respectively.

SCRATCHDISK: at all Tier-1s and Tier-2s (also Tier-3s); volatile disk space for user analysis. Be careful: it is cleaned after 30 days.

GROUPDISK: at some Tier-1s and Tier-2s; disk space for group activities.

LOCALGROUPDISK: at many Tier-3s; permanent, NON-PLEDGED space for user data. It is deployed at IFAE, IFIC and UAM.


Data Distribution and availability

Detector Data
• RAW data replicated to one Tier-1 (DATATAPE).
• ESD data replicated to two Tier-1s and BNL (DATADISK).
• AOD data replicated to ALL Tier-1s and many Tier-2s.
  • 20 copies world-wide at the moment.
  • At least one copy per cloud, shared across Tier-2s.

Reprocessed Detector Data
• ESD at the custodial Tier-1, plus replicated to BNL (DATADISK).
• AOD data replicated to ALL Tier-1s and many Tier-2s.

Simulated Data
• ESD replicated to BNL (DATADISK).
• AOD replicated to ALL Tier-1s and many Tier-2s.


Data Policies

General limitations
• RAW data are generally on TAPE, and users can NOT access TAPE.
• ESD data are only at some Tier-1s; some Tier-1s allow organized group analysis.

Users/Groups can request data replication on demand:
• http://panda.cern.ch:25980/server/pandamon/query?mode=ddm_req
• This request will trigger a DDM subscription.

Space Tokens
• Data of general interest are in DATADISK and MCDISK. Authorized by the Physics Coordinator.
• Data of interest to a group are in GROUPDISK. Authorized by the group manager.
• LOCALGROUPDISK for permanent storage.
• SCRATCHDISK for a limited time (30 days).


Analysis Policies

JOBS ARE SENT TO DATA
• Analysis tools figure out the best site to run your code.

Output is always stored in volatile space (SCRATCHDISK).
• Your data will be removed after 30 days!

You can use dq2-get to download data from SCRATCHDISK.
• Or you can ask for replication to permanent storage via

http://panda.cern.ch:25980/server/pandamon/query?mode=ddm_req
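SCRATCHDISK output is removed after 30 days, so it helps to know the date by which you must have copied or replicated it elsewhere. This is a minimal sketch. It assumes the 30 days are counted from the dataset creation date; the actual cleaning policy may count differently. The helper names are hypothetical.

```python
from datetime import date, timedelta

SCRATCH_LIFETIME = timedelta(days=30)  # cleaning period quoted on the slides


def scratch_deletion_date(created: date) -> date:
    """Date on which a SCRATCHDISK dataset becomes eligible for deletion."""
    return created + SCRATCH_LIFETIME


def days_left(created: date, today: date) -> int:
    """Days remaining to copy or replicate the data elsewhere (0 if already due)."""
    return max(0, (scratch_deletion_date(created) - today).days)
```

For output created on 2009-10-21, this gives 2009-11-20 as the last safe day to act.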


Data access for analysis purposes


Direct upload from CPU to a Space Token:
• for LOCALGROUPDISK, highly discouraged;
• for GROUPDISK, strictly forbidden.


@ CERN

• All the kinds of storage space mentioned before are available, except LOCALGROUPDISK.

Castor
• There is a local space for users and groups, but it is not known to DDM.
• 100 TB for users and 100 TB for groups, both with quotas.
• There are tools to upload/download files between the Grid and your local space.
• Default quota: 0.5 TB per user and 5 TB per group.
• https://twiki.cern.ch/twiki/bin/view/Atlas/CastorPools
• [email protected]
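The default CASTOR quotas quoted above (0.5 TB per user, 5 TB per group) can be checked with simple arithmetic before copying a dataset. This Python sketch assumes decimal terabytes (10^12 bytes). Its example is the 316993996566-byte dataset listed later with `dq2-ls -f`. The function name is hypothetical.

```python
TB = 10**12  # assumption: quotas counted in decimal terabytes

# Default CASTOR quotas from the slide: 0.5 TB per user, 5 TB per group.
DEFAULT_QUOTA_BYTES = {"user": TB // 2, "group": 5 * TB}


def fits_in_quota(kind: str, used_bytes: int, new_bytes: int) -> bool:
    """True if new_bytes still fit in the default quota for 'user' or 'group'."""
    return used_bytes + new_bytes <= DEFAULT_QUOTA_BYTES[kind]
```

One such dataset fits in an empty user quota. A second copy would not, since that brings the total to 633987993132 bytes.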


Monitoring of your data

Monitoring Link


I Need a Break!


Don Quijote 2


TUTORIAL


Outline

• Setting up and/or installing dq2-tools
• List of the most useful dq2-tools
• dq2-ls + OPTIONS
• dq2-get + OPTIONS
• dq2-put + OPTIONS
• If you have...
• Exercises


Don't forget the Proxy!


[jnadal@atlasui02 ~]$ voms-proxy-init --voms atlas
Cannot find file or dir: /nfs/pic.es/user/j/jnadal/.glite/vomses
Enter GRID pass phrase:
Your identity: /DC=es/DC=irisgrid/O=ifae/CN=Jordi.Nadal
Creating temporary proxy ......................................................................................................................... Done
Contacting lcg-voms.cern.ch:15001 [/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch] "atlas" Done
Creating proxy ......................................................... Done
Your proxy is valid until Mon Oct 19 23:32:13 2009

export DQ2_LOCAL_SITE_ID='ROAMING'
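Grid commands stop working once the proxy has expired, so it can be useful to check how long it is still valid before starting a long download. This minimal sketch parses the last line printed by `voms-proxy-init` in the session above. It assumes exactly that output format and English day/month names. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

PREFIX = "Your proxy is valid until "


def proxy_expiry(line: str) -> datetime:
    """Parse the last line printed by voms-proxy-init, as in the session above."""
    if not line.startswith(PREFIX):
        raise ValueError(f"unexpected voms-proxy-init line: {line!r}")
    # e.g. "Mon Oct 19 23:32:13 2009" (English day/month names, default C locale)
    return datetime.strptime(line[len(PREFIX):].strip(), "%a %b %d %H:%M:%S %Y")


def proxy_still_ok(line: str, now: datetime, margin: timedelta = timedelta(hours=1)) -> bool:
    """True if the proxy stays valid for at least `margin` after `now`."""
    return proxy_expiry(line) - now >= margin
```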


Setting up and/or installing dq2-tools

For IFIC people, just do:
source /afs/ific.uv.es/project/atlas/software/ddm/DQ2Clients/setup.sh

For UAM people, just do:
source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh

It is very easy to install dq2-tools yourself; just do:

rm -rf DQ2Clients
mkdir DQ2Clients
cd DQ2Clients
pacman -trust-all-caches -allow tar-overwrite -get http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/project/ddm/releases/pacman/cache:DQ2Clients
source setup.sh
export DQ2_LOCAL_SITE_ID=ROAMING

To get pacman:

curl -O http://physics.bu.edu/pacman/sample_cache/tarballs/pacman-latest.tar.gz
tar xfz pacman-latest.tar.gz
rm pacman-latest.tar.gz
cd pacman-3.25
source setup.sh


List of the most useful dq2-tools

[jnadal@atlasui03 ~]$ dq2-
dq2-check-replica-consistency      dq2-get-number-files                 dq2-list-subscription            dq2-register-subscription
dq2-close-dataset                  dq2-get-replica-metadata             dq2-list-subscription-info       dq2-register-subscription-container
dq2-delete-datasets                dq2-list-dataset                     dq2-list-subscription-site       dq2-register-version
dq2-delete-files                   dq2-list-dataset-by-creationdate     dq2-ls                           dq2-reset-subscription
dq2-delete-replicas                dq2-list-dataset-replicas            dq2-metadata                     dq2-sample
dq2-delete-subscription            dq2-list-dataset-replicas-container  dq2-ping                         dq2-set-metadata
dq2-delete-subscription-container  dq2-list-datasets-container          dq2-put                          dq2-set-replica-metadata
dq2-destinations                   dq2-list-dataset-site                dq2-register-container           dq2-sources
dq2-erase                          dq2-list-erased-datasets             dq2-register-dataset             dq2-usage
dq2-freeze-dataset                 dq2-list-file-replicas               dq2-register-datasets-container
dq2-get                            dq2-list-files                       dq2-register-files
dq2-get-metadata                   dq2-list-replica-history             dq2-register-location
[jnadal@atlasui03 ~]$ dq2-

45 dq2 tools, but...


...we can manage with 5 of them!


dq2-ls

Querying a dataset:

[jnadal@atlasui02 ~]$ dq2-ls mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53 ?
[jnadal@atlasui02 ~]$ dq2-ls mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061353
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061353
[jnadal@atlasui02 ~]$

Querying all datasets with a common pattern:

[jnadal@atlasui02 ~]$ dq2-ls mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53*
...
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061351
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53/
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061353
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061352
...

[jnadal@atlasui02 ~]$ dq2-ls mc*106051*merge*AOD* | grep -v tid
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53/
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_s462_r635_t53/
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_s462_s520_d153_r643_t53/
[jnadal@atlasui02 ~]$

only in bash

Querying all datasets in a container:

[jnadal@atlasui02 ~]$ dq2-list-datasets-container mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53/
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061352
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061350
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061346
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061361
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061358
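In the pattern query above, `dq2-ls` resolves the wildcard against the DQ2 catalogue. The `| grep -v tid` step then keeps only names without a task ID, which in this example are the containers. The Python sketch below reproduces just that local filtering on a list of names you already have. The function names are hypothetical, and `fnmatchcase` is assumed to treat `*` the same way as the pattern used here.

```python
from fnmatch import fnmatchcase


def matching_names(names: list[str], pattern: str) -> list[str]:
    """Names matching a shell-style pattern, like the wildcard given to dq2-ls."""
    return [n for n in names if fnmatchcase(n, pattern)]


def without_task_id(names: list[str]) -> list[str]:
    """Equivalent of `| grep -v tid`: drop every name containing 'tid'."""
    return [n for n in names if "tid" not in n]
```

Note that `grep -v tid` is a plain substring filter. It would also drop any container whose physics name happened to contain "tid".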


dq2-ls -r

[jnadal@atlasui02 ~]$ dq2-ls -r mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53/
...
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061356:
    INCOMPLETE: WISC_MCDISK
    COMPLETE: AGLT2_MCDISK,BNL-OSG2_MCDISK,CERN-PROD_MCDISK,DESY-ZN_MCDISK,GRIF-LAL_MCDISK,IFAE_MCDISK,IN2P3-CC_MCDISK,IN2P3-LAPP_MCDISK,INFN-NAPOLI-ATLAS_MCDISK,JINR-LCG2_MCDISK,LRZ-LMU_MCDISK,MWT2_UC_MCDISK,NDGF-T1_MCDISK,NET2_MCDISK,SFU-LCG2_MCDISK,SLACXRD_MCDISK,SWT2_CPB_MCDISK,TAIWAN-LCG2_MCDISK,TOKYO-LCG2_MCDISK,TRIUMF-LCG2_MCDISK,UKI-LT2-RHUL_MCDISK,UKI-SCOTGRID-GLASGOW_MCDISK
....
Container name: mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53/
Total datasets: 20
Summary:
SITE / # COMPLETE / # INCOMPLETE / TOTAL
----------------------------------------------------------------
RU-PROTVINO-IHEP_MCDISK       4   0   4
UKI-SCOTGRID-GLASGOW_MCDISK  20   0  20
WISC_MCDISK                  18   2  20
NET2_MCDISK                  16   0  16
NDGF-T1_MCDISK               20   0  20
SLACXRD_MCDISK               20   0  20
SARA-MATRIX_MCDISK            5   0   5
VICTORIA-LCG2_MCDISK          2   0   2
MWT2_UC_MCDISK               20   0  20
...
[jnadal@atlasui02 ~]$

[jnadal@atlasui02 ~]$ dq2-list-dataset-replicas-container mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53/
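With many replicas, the summary at the end of `dq2-ls -r` is the quickest way to see which sites hold a complete copy of every dataset in a container. Below is a minimal Python sketch that parses those summary rows. It assumes the four-column layout shown above, and the helper names are hypothetical.

```python
def parse_summary(text: str) -> dict[str, tuple[int, int, int]]:
    """Extract SITE -> (#complete, #incomplete, total) rows from `dq2-ls -r` output."""
    rows: dict[str, tuple[int, int, int]] = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows are exactly: SITE COMPLETE INCOMPLETE TOTAL
        if len(parts) == 4 and all(p.isdigit() for p in parts[1:]):
            complete, incomplete, total = (int(p) for p in parts[1:])
            rows[parts[0]] = (complete, incomplete, total)
    return rows


def sites_with_full_container(rows: dict[str, tuple[int, int, int]], n_datasets: int) -> list[str]:
    """Sites holding a complete replica of every dataset in the container."""
    return sorted(site for site, (complete, _, _) in rows.items() if complete == n_datasets)
```

Applied to the summary above with `Total datasets: 20`, it returns `MWT2_UC_MCDISK`, `NDGF-T1_MCDISK`, `SLACXRD_MCDISK` and `UKI-SCOTGRID-GLASGOW_MCDISK`. WISC_MCDISK is left out because two of its datasets are still incomplete.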


dq2-ls -s

[jnadal@atlasui02 ~]$ dq2-ls -s IFAE_MCDISK
mc08.105222.Pythia_LSTC_Wgamma_Wenue_aT315.merge.AOD.e395_s462_r635_t53_tid065565
mc08.105259.Pythia_LRSM_WR_800_500_emu.merge.AOD.e402_a84_t53_tid061405
mc08.109126.PythiaAtautauMA120TB20.merge.AOD.e377_s462_r635_t53_tid061262
mc08.109126.PythiaAtautauMA120TB20.merge.AOD.e377_s462_r635_t53_tid061263
...

[jnadal@atlasui02 ~]$ dq2-list-dataset-site -n mc*106051*merge*AOD* IFAE_MCDISK
....
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_s462_s520_d153_r643_t53_tid078914
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061346
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_s462_s520_d153_r643_t53_tid078912
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_s462_r635_t53_tid064416
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061356
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061358
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_s462_r635_t53_tid064401
mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_s462_s520_d153_r643_t53_tid078917

IFIC-LCG2_TOKEN
UAM-LCG2_TOKEN

http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/project/ddm/releases/TiersOfATLASCache.py
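DDM endpoint names combine a site name and a space token, as in `IFIC-LCG2_TOKEN` above. This is a minimal sketch for splitting and building such names. It assumes the token is whatever follows the last underscore, which holds for every endpoint in these slides, including `MWT2_UC_MCDISK`. The authoritative list of sites and tokens is in TiersOfATLAS; these helpers are hypothetical.

```python
def split_endpoint(endpoint: str) -> tuple[str, str]:
    """Split a DDM endpoint like 'IFIC-LCG2_MCDISK' into (site, space token)."""
    site, sep, token = endpoint.rpartition("_")  # token = text after the LAST "_"
    if not sep or not site or not token:
        raise ValueError(f"not a SITE_TOKEN endpoint: {endpoint!r}")
    return site, token


def make_endpoint(site: str, token: str) -> str:
    """Inverse of split_endpoint, e.g. ('UAM-LCG2', 'LOCALGROUPDISK')."""
    return f"{site}_{token}"
```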


dq2-ls -f

Columns: FILENAME GUID CHECKSUM SIZE

[jnadal@atlasui02 ~]$ dq2-ls -f mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061352 | tail
[ ] AOD.061352._01528.pool.root.1 5295E94A-AA3B-DE11-9699-00145EC2B87E ad:70087564 1261093394
[ ] AOD.061352._01568.pool.root.1 58C6E38C-AA3B-DE11-993A-00145EC2CBCE ad:81accb87 1285009053
[ ] AOD.061352._01686.pool.root.1 507AA0F4-A23B-DE11-B6D8-001A645A0642 ad:e612c831 1273407484
[ ] AOD.061352._01628.pool.root.2 4E3FEF5F-B73B-DE11-AA69-00145EC2C7E4 ad:7083d479 1271626496
[ ] AOD.061352._01749.pool.root.1 F40A11E6-AC3B-DE11-8DBB-00145EC2CA1E ad:ede67af6 1274913773
[ ] AOD.061352._01519.pool.root.1 78F7A012-A83B-DE11-8AFC-00145E3CC121 ad:1c434f38 1252563893
total files: 250
local files: 0
total size: 316993996566 (= 316 GB)
date: 2009-05-08 13:51:34
[jnadal@atlasui02 ~]$

[jnadal@atlasui02 ~]$ export DQ2_LOCAL_SITE_ID='IFAE_MCDISK'
[jnadal@atlasui02 ~]$ dq2-ls -f mc08.106051.PythiaZmumu_1Lepton.merge.AOD.e347_a84_t53_tid061352 | tail
[X] AOD.061352._01528.pool.root.1 5295E94A-AA3B-DE11-9699-00145EC2B87E ad:70087564 1261093394
[X] AOD.061352._01568.pool.root.1 58C6E38C-AA3B-DE11-993A-00145EC2CBCE ad:81accb87 1285009053
[X] AOD.061352._01686.pool.root.1 507AA0F4-A23B-DE11-B6D8-001A645A0642 ad:e612c831 1273407484
[X] AOD.061352._01628.pool.root.2 4E3FEF5F-B73B-DE11-AA69-00145EC2C7E4 ad:7083d479 1271626496
[X] AOD.061352._01749.pool.root.1 F40A11E6-AC3B-DE11-8DBB-00145EC2CA1E ad:ede67af6 1274913773
[X] AOD.061352._01519.pool.root.1 78F7A012-A83B-DE11-8AFC-00145E3CC121 ad:1c434f38 1252563893
total files: 250
local files: 250
total size: 316993996566
date: 2009-05-08 13:51:34
[jnadal@atlasui02 ~]$
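The `[ ]`/`[X]` column shows whether each file is present at the site named by `DQ2_LOCAL_SITE_ID`. This Python sketch recomputes the file count, the number of local files and the total size from listing lines like those above. The regular expression is an assumption based only on the layout shown here, and the function name is hypothetical. With `| tail`, only the last lines are visible, whereas the totals printed by `dq2-ls` itself cover the whole dataset.

```python
import re
from typing import NamedTuple

# "[X] LFN GUID ad:CHECKSUM SIZE", where [X] marks a file present at the local site
_LINE = re.compile(
    r"^\[(?P<mark>[ X])\]\s+(?P<lfn>\S+)\s+(?P<guid>[0-9A-Fa-f-]{36})"
    r"\s+ad:(?P<adler>[0-9A-Fa-f]{8})\s+(?P<size>\d+)$"
)


class ListingSummary(NamedTuple):
    files: int
    local_files: int
    total_bytes: int


def summarize_listing(text: str) -> ListingSummary:
    """Count files, local files and bytes in `dq2-ls -f` output lines."""
    files = local = total = 0
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m is None:
            continue  # header, totals, prompt or any other non-file line
        files += 1
        if m["mark"] == "X":
            local += 1
        total += int(m["size"])
    return ListingSummary(files, local, total)
```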


dq2-ls -f

[jnadal@atlasui02 ~]$ dq2-ls -L IFIC-LCG2_MCDISK -f mc08.209722.ZH115llbb_Herwig.merge.AOD.e419_a84_t53_tid068641 | tail

mc08.209722.ZH115llbb_Herwig.merge.AOD.e419_a84_t53_tid068641
[X] AOD.068641._00001.pool.root.2 5C3F91C5-FB4F-DE11-A70F-001E0B472C96 ad:830393bc 1083500442
total files: 1
local files: 1
total size: 1083500442
date: 2009-06-03 05:39:45
[jnadal@atlasui02 ~]$

[jnadal@atlasui02 ~]$ dq2-ls -L IFIC-LCG2_MCDISK -fp mc08.209722.ZH115llbb_Herwig.merge.AOD.e419_a84_t53_tid068641 | tail

mc08.209722.ZH115llbb_Herwig.merge.AOD.e419_a84_t53_tid068641
srm://srmv2.ific.uv.es/lustre/ific.uv.es/grid/atlas/atlasmcdisk/mc08/AOD/mc08.209722.ZH115llbb_Herwig.merge.AOD.e419_a84_t53_tid068641/AOD.068641._00001.pool.root.2
total files: 1
total size: 1083500442
date: 2009-06-03 05:39:45
[jnadal@atlasui02 ~]$


How to create a PoolFileCatalog.xml

Be careful: only files registered at this site will be included in the Pool File Catalogue, since PFNs are needed for the Pool File Catalogue. If there already is a PoolFileCatalog.xml in the directory, the given files will be appended.

If the program that will read the Pool File Catalogue cannot use the SRM interface, you will have to mangle the path to match your local setup. Since many different tools read Pool File Catalogues, this option is deliberately very generic.

[jnadal@atlasui02 ~]$ dq2-ls -L IFIC-LCG2_MCDISK -P mc08.209722.ZH115llbb_Herwig.merge.AOD.e419_a84_t53_tid068641
Querying DQ2 central catalogues to resolve datasetname mc08.209722.ZH115llbb_Herwig.merge.AOD.e419_a84_t53_tid068641
Processing mc08.209722.ZH115llbb_Herwig.merge.AOD.e419_a84_t53_tid068641 with PoolFileCatalog.xml
[jnadal@atlasui02 ~]$ ls -l PoolFileCatalog.xml
-rw-r--r-- 1 jnadal atlas 498 Oct 21 10:34 PoolFileCatalog.xml
[jnadal@atlasui02 ~]$

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE POOLFILECATALOG SYSTEM "InMemory">
<POOLFILECATALOG>
  <File ID="5C3F91C5-FB4F-DE11-A70F-001E0B472C96">
    <physical>
      <pfn filetype="ROOT_All" name="srm://srmv2.ific.uv.es/lustre/ific.uv.es/...../_Herwig.merge.AOD.e419_a84_t53_tid068641/AOD.068641._00001.pool.root.2"/>
    </physical>
    <logical>
      <lfn name="AOD.068641._00001.pool.root.2"/>
    </logical>
  </File>
</POOLFILECATALOG>

For a program that cannot use SRM, the mangled pfn entry for this file looks like:

<pfn filetype="ROOT_All" name="file:///lustre/ific.uv.es/...../_Herwig.merge.AOD.e419_a84_t53_tid068641/AOD.068641._00001.pool.root.2"/>
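The path mangling described above can be scripted. This Python sketch rewrites every `pfn` name that starts with a given SRM prefix. In the IFIC example, `srm://srmv2.ific.uv.es` becomes `file://`. The function name is hypothetical. `ElementTree` drops the XML declaration and DOCTYPE when serialising, so the sketch re-adds the header shown in the catalogue above.

```python
import xml.etree.ElementTree as ET

# ElementTree does not write these lines back, so they are re-added by hand.
HEADER = (
    '<?xml version="1.0" encoding="UTF-8" standalone="no" ?>\n'
    '<!DOCTYPE POOLFILECATALOG SYSTEM "InMemory">\n'
)


def localize_pfc(xml_text: str, srm_prefix: str, local_prefix: str) -> str:
    """Rewrite pfn names starting with srm_prefix so they start with local_prefix."""
    root = ET.fromstring(xml_text)
    for pfn in root.iter("pfn"):
        name = pfn.get("name", "")
        if name.startswith(srm_prefix):
            pfn.set("name", local_prefix + name[len(srm_prefix):])
    return HEADER + ET.tostring(root, encoding="unicode")
```

A typical use would be to read PoolFileCatalog.xml, call `localize_pfc(text, "srm://srmv2.ific.uv.es", "file://")`, and write the result back before running the job.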

• More info: https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2ClientsHowTo#create_a_Pool_File_Catalogue_in


dq2-get

• Both dq2-get and DDM subscriptions will access ONLY files registered in DDM.

• dq2-get creates "local" copies of files, which will not be known to DDM and will not be accessible with dq2-* commands. The Grid/DDM information about the files will not be kept in the local files. If you plan to publish these data on the Grid later from the target storage, run a subscription.

• A DDM subscription will copy all the files belonging to the dataset to a storage known by DDM. The files at the destination will be registered in DDM and accessible with dq2-* commands.

• Users cannot dq2-get files from DDM sites associated with TAPE (xxx_MCTAPE and xxx_DATATAPE, CERN-PROD_TZERO and CERN-PROD_DAQ). To access data on tape, request replication of the dataset to DISK storage through a DDM request.

[jnadal@atlasui02 ~]$ mkdir /tmp/$USER
[jnadal@atlasui02 ~]$ dq2-get -H /tmp/$USER mc08.209722.ZH115llbb_Herwig.merge.AOD.e419_a84_t53_tid068641
Using ROAMING profile
Querying DQ2 central catalogues to resolve datasetname mc08.209722.ZH115llbb_Herwig.merge.AOD.e419_a84_t53_tid068641
Datasets found: 1
...
/tmp/AOD.068641._00001.pool.root.2: 0/1083500442 transferred

[jnadal@atlasui02 ~]$ dq2-get -f AOD.068641._00001.pool.root.2 mc08.209722.ZH115llbb_Herwig.merge.AOD.e419_a84_t53_tid068641

[jnadal@atlasui02 ~]$ dq2-get -f AOD.068641._00001.pool.root.2 mc08.209722.ZH115llbb_Herwig.merge.AOD.e419_a84_t53/

[jnadal@atlasui02 ~]$ dq2-get -f file1,file2,file3,... mc08.209722.ZH115llbb_Herwig.merge.AOD.e419_a84_t53_tid068641
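The `ad:` values printed by `dq2-ls -f` appear to be Adler-32 checksums written as 8 hex digits, which makes it possible to verify a file after `dq2-get`. Below is a minimal Python sketch under that assumption. It reads the file in chunks so multi-GB AOD files are never loaded whole. The helper names are hypothetical.

```python
import zlib
from typing import Iterable


def adler32_of_chunks(chunks: Iterable[bytes]) -> str:
    """Adler-32 over a stream of byte chunks, formatted like 'ad:xxxxxxxx'."""
    value = 1                                  # Adler-32 initial value
    for chunk in chunks:
        value = zlib.adler32(chunk, value)     # continue the running checksum
    return f"ad:{value & 0xFFFFFFFF:08x}"


def adler32_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Checksum a (possibly multi-GB) file without reading it all into memory."""
    with open(path, "rb") as fh:
        return adler32_of_chunks(iter(lambda: fh.read(chunk_size), b""))


def download_ok(path: str, expected: str) -> bool:
    """Compare a downloaded file with the checksum shown by `dq2-ls -f`."""
    return adler32_of_file(path) == expected.strip().lower()
```

For instance, `download_ok("AOD.068641._00001.pool.root.2", "ad:830393bc")` would compare the downloaded file with the checksum listed for it earlier.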


dq2-get

To download n random files:

[jnadal@atlasui02 ~]$ dq2-get -n 10 mc08.209722.ZH115llbb_Herwig.merge.AOD.e419_a84_t53_tid068641

To download a dataset from a specific site:

[jnadal@atlasui02 ~]$ dq2-get -L ROAMING -s UAM-LCG2_MCDISK mc08.209646.PythiaB_bb_J6x_1lepPt15.merge.AOD.e415_a84_t53_tid068084

To download several datasets at once:

[jnadal@atlasui02 ~]$ cat datasets_to_download_from_IFIC
mc09_valid.107194.singlepart_e_E60_etaphibin_6.recon.AOD.e437_s566_r738_tid076909
mc09_valid.107196.singlepart_e_E60_etaphibin_8.recon.AOD.e437_s566_r738_tid076911
[jnadal@atlasui02 ~]$ dq2-get -F datasets_to_download_from_IFIC

To download several datasets/files at once:

[jnadal@atlasui02 ~]$ cat datasets_to_download_from_IFIC
mc09_valid.107194.singlepart_e_E60_etaphibin_6.recon.AOD.e437_s566_r738_tid076909
file1
file2
mc09_valid.107196.singlepart_e_E60_etaphibin_8.recon.AOD.e437_s566_r738_tid076911
fileA
fileB


dq2-put

Registering a dataset:

[jnadal@atlasui02 ~]$ ls -l /tmp/jnadal/
total 107820
-rw-r--r-- 1 jnadal atlas 110289278 Oct 21 11:59 Note.UE.Reco.A.5k.2bo.out
[jnadal@atlasui02 ~]$ dq2-put -s /tmp/jnadal/ user.jordinadal.TUTORIAL.test.20091021_IFAE
[...]

[jnadal@atlasui02 ~]$ dq2-ls -f user.jordinadal.TUTORIAL.test.20091021_IFAE
user.jordinadal.TUTORIAL.test.20091021_IFAE
[X] Note.UE.Reco.A.5k.2bo.out c43c7be1-410e-4ec9-b1e8-e4ca58c3f27c ad:7c73ab3b 110289278
total files: 1
local files: 1
total size: 110289278
date: 2009-10-21 10:20:22
[jnadal@atlasui02 ~]$

Registering a dataset at your favorite Tier-2:

[jnadal@atlasui02 ~]$ dq2-put -L TIER2_LOCALGROUPDISK -s /tmp/jnadal/ user.jordinadal.TUTORIAL.test.20091021_TIER2

[jnadal@atlasui03 ~]$ export DQ2_LOCAL_SITE_ID='UAM-LCG2_LOCALGROUPDISK'
[jnadal@atlasui03 ~]$ dq2-put -s /tmp/jnadal/ user.jordinadal.TUTORIAL.test.20091021_UAM

To add files to your dataset, just put those files in the directory and repeat the command:

[jnadal@atlasui02 ~]$ ls -l /tmp/jnadal/
total 215640
-rw-r--r-- 1 jnadal atlas 110289278 Oct 21 11:59 Note.UE.Reco.A.5k.2bo.out
-rw-r--r-- 1 jnadal atlas 110289278 Oct 21 12:42 Note.UE.Reco.A.5k.2bo.out_2
[jnadal@atlasui02 ~]$ dq2-put -s /tmp/jnadal/ user.jordinadal.TUTORIAL.test.20091021_IFAE


After creating your data

What do you do after creating your data?

If the dataset is not frozen, any replication will be suspended.
If you want to add another set of files, use containers:

dq2-register-datasets-container CONTAINERNAME DATASET1 DATASET2
dq2-delete-datasets CONTAINERNAME DATASET1

If you may still want to add files to your dataset later, consider closing it (a closed dataset can still be versioned):

dq2-close-dataset DATASET1

If everything is done, freeze your dataset:

dq2-freeze-dataset DATASET1
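Containers are the way to add another set of files once a dataset is frozen: create a new dataset and register it in the same container. The Python sketch below is a hypothetical in-memory stand-in, not DQ2 code. It mirrors the two container commands above, registering datasets in a container and removing them from it. It assumes container names end with `/`, as in all the examples in these slides.

```python
class Container:
    """Hypothetical in-memory stand-in for a DQ2 container (not DQ2 code)."""

    def __init__(self, name: str):
        if not name.endswith("/"):
            raise ValueError("container names end with '/'")
        self.name = name
        self.datasets: list[str] = []

    def register(self, *datasets: str) -> None:
        """Like dq2-register-datasets-container CONTAINERNAME DATASET1 DATASET2."""
        for ds in datasets:
            if ds not in self.datasets:      # registering twice is a no-op here
                self.datasets.append(ds)

    def delete(self, *datasets: str) -> None:
        """Like dq2-delete-datasets CONTAINERNAME DATASET1: drop names from this container."""
        self.datasets = [d for d in self.datasets if d not in datasets]
```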

If you are using Ganga, your dataset will be stored in the _SCRATCHDISK of the site where your jobs are running.

THESE DATASETS WILL BE DELETED AFTER 30 DAYS!!!

Two options to maintain your dataset:

1. Ask for replication to a _LOCALGROUPDISK space.
2. Use dq2-get to download the files to your local disk. Again, these files will not be available in DDM after the deletion.


Exercises

Exercise 1:
Find a dataset with these requirements: mc08, AOD, e419_a84_t53.
List the files in this dataset.
Find it at another site.
List the files of this dataset at that site. Do they match?

Exercise 2:

Get a random set of 5 files of the above dataset from a specific site (your choice!).
Upload this dataset to your favorite site and create the PoolFileCatalog.xml.

Exercise 3:

Do what you want until 13:00!


If you have ...

... problems with a site: don't hesitate to open a ticket here:

https://gus.fzk.de/pages/home.php

... problems with dq2-tools, DDM, ...: just open a DDM Savannah bug here:

https://savannah.cern.ch/bugs/?group=dq2-ddm-ops

... doubts: send an email to the DDM experts:

[email protected]

... problems with grid/athena/ganga/ddm/sites/... etc.: after reading the relevant twikis, send mail to:

[email protected]