Page 1: Implementing Metadata Using RLS/LCG

Implementing Metadata Using RLS/LCG

James Cunha Werner

University of Manchester

http://www.hep.man.ac.uk/u/jamwer/

Page 2: Implementing Metadata Using RLS/LCG

Metadata Meeting - Grenoble 2005

James Werner [email protected]

BaBar Experiment

• The BaBar experiment studies the differences between matter and antimatter, to throw light on the problem, posed by Sakharov, of how the matter-antimatter symmetric Big Bang can have given rise to today’s matter-dominated universe.

• High energy collisions between electrons and positrons produce other elementary particles, giving tracks and clusters which are recorded by several high granularity detectors and from which the properties of the short-lived particles can be deduced.

Page 3: Implementing Metadata Using RLS/LCG

Each recorded collision, called an event, comprises a large volume of data, and thousands of millions of events are recorded, giving a total dataset size of hundreds of thousands of Gigabytes (or hundreds of Terabytes).

Page 4: Implementing Metadata Using RLS/LCG

Sources of Data in Babar

Page 5: Implementing Metadata Using RLS/LCG

Amount of data

Run                  # Files   Size (TB)   Events (Million)
Run1                   6,972         2.0                593
Run2                  11,527         6.3              1,925
Run3                   7,383         3.2                951
Run4                  16,671        12.2              3,999
Run5 (2xRun4) ???     32,000        24                    8
Run6 (2xRun5) ???     64,000        48                   16
Run7 (2xRun6) ???    128,000       100                   32

SuperBabar ! Systematic errors >>> statistical errors

Same amount of Monte Carlo Generated data!
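
As a quick check on the recorded runs, the Run1 to Run4 rows can be totalled with a few lines of awk. This is only an illustrative sketch: the function name sum_runs is an assumption, and the projected rows (Run5 to Run7, marked "???") are left out on purpose.

```shell
#!/bin/bash
# Sum the recorded runs (Run1 to Run4) from the "Amount of data" table.
# Input columns: run name, number of files, size in TB, events in millions.
# sum_runs is a hypothetical helper name used only for this sketch.

sum_runs() {
    awk '{ files += $2; tb += $3; ev += $4 }
         END { printf "%d files, %.1f TB, %d million events\n", files, tb, ev }'
}

sum_runs <<'EOF'
Run1 6972 2.0 593
Run2 11527 6.3 1925
Run3 7383 3.2 951
Run4 16671 12.2 3999
EOF
# prints: 42553 files, 23.7 TB, 7468 million events
```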

Page 6: Implementing Metadata Using RLS/LCG

Data Structure

• The user interface to the eventstore is the event "collection". Each collection represents an ordered series of N events, and a user can choose to read the events from the first one in the sequence or from any given offset into the sequence.

• Data components:
– hdr - event header
– usr - user data
– tag - tag information
– cnd - candidate information
– aod - "analysis object data"
– tru - MC truth data (only in MC data)
– esd - "event summary data"
– sim - "sim" data from BgsApp or MooseApp like GHits/GVertices (only in MC data)
– raw - subset of raw data from xtc persisted in the Kanga eventstore

Page 7: Implementing Metadata Using RLS/LCG

Data organisation

How data are stored (level of detail):
• micro = hdr + usr + tag + cnd + aod (+ tru)
• mini = micro + esd

Data access:
• collections - these are "logical" names that users use to configure their jobs. They are site-independent, so (assuming the site has imported the data) the same collection name should work at any site.

• logical file names (LFN) - these are site-independent names given to all files in the eventstore. Any references within the event data itself _must_ use LFNs so that they remain valid when the files are moved from site to site.

• physical file names (PFN) - these are file names that vary from site to site. In practice they are usually derived from the LFNs by adding a prefix that encapsulates how the data is accessed at that site.
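
The LFN-to-PFN relationship described above amounts to adding or stripping a site prefix, which can be sketched in shell as follows. The two site prefixes, the example LFN and the function names are hypothetical, chosen only for illustration; each real site defines its own access prefix.

```shell
#!/bin/bash
# Sketch: derive a site-specific PFN from a site-independent LFN by
# prepending a site prefix, and recover the LFN by stripping it again.
# Prefixes, LFN and function names are hypothetical examples.

lfn_to_pfn() {
    local prefix="$1"   # how the data is accessed at this site (assumed)
    local lfn="$2"      # site-independent logical file name
    printf '%s%s\n' "$prefix" "$lfn"
}

pfn_to_lfn() {
    local prefix="$1"
    local pfn="$2"
    # Remove the leading site prefix to get back the LFN.
    printf '%s\n' "${pfn#"$prefix"}"
}

lfn="/store/example/collection-0001.root"            # hypothetical LFN
lfn_to_pfn "/data/site-a" "$lfn"                     # local disk at site A
lfn_to_pfn "root://server.site-b.example/" "$lfn"    # remote access at site B
```

Because only the prefix differs, references stored as LFNs stay valid at every site that has imported the data.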

Page 9: Implementing Metadata Using RLS/LCG

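Feeding RLS with metadata

Generation of the basic metadata files, with file selection:

#!/bin/bash
# List the datasets known at the local site.
BbkDatasetTcl --dbsite=local > MetaLista.txt
# Write one BbkDatasetTcl command per dataset into a helper script, then run it.
cat MetaLista.txt | awk '// {print "BbkDatasetTcl --site local --nolocal \""$1"\"";}' >> geratcl
chmod 700 geratcl
./geratcl

Feeding RLS with the basic files:

#!/bin/bash
# For each .tcl file, write an edg-rm command that copies and registers it
# under lfn:<name>, saving the command output in <name>.rlstok.
ls *.tcl | awk '// {split($1,a,"."); print "edg-rm --vo babar cr file:///home/jamwer/PgmCM2/MetaData/"$1 " -l lfn:"a[1] " > " a[1]".rlstok";}' >> alimrls
chmod 700 alimrls
./alimrls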
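
Both scripts above follow the same pattern: awk prints one command per input line into a helper script, which is then executed. The sketch below previews the registration commands without touching the Grid by printing them instead of running them. The function name gen_register_cmds, the dataset file names and the base directory are hypothetical.

```shell
#!/bin/bash
# Dry run: print the edg-rm registration command for each .tcl metadata
# file instead of executing it. gen_register_cmds, the file names and the
# base directory are hypothetical and used only for illustration.

gen_register_cmds() {
    local base_dir="$1"
    # File names are read from stdin, one per line.
    awk -v dir="$base_dir" '{
        split($1, a, ".")    # a[1] is the file name without its extension
        printf "edg-rm --vo babar cr file://%s/%s -l lfn:%s > %s.rlstok\n", dir, $1, a[1], a[1]
    }'
}

printf '%s\n' "DatasetA.tcl" "DatasetB.tcl" | gen_register_cmds /home/user/MetaData
```

Reviewing the printed commands before piping them into a helper script makes it easier to catch a wrong path or LFN before anything is registered.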

Page 10: Implementing Metadata Using RLS/LCG

Conformity CE catalogue

Run evaluation software to establish CE conformity and perform the catalogue update.

#!/bin/bash
# Query the BDII for every CE that accepts the babar VO.
ldapsearch -x -H ldap://lcgbdii02.gridpp.rl.ac.uk:2170 -b 'Mds-vo-name=local,o=Grid' '(&(objectClass=GlueCE)(GlueCEAccessControlBaseRule=VO:babar))' | grep "GlueCEUniqueID:" > cenames.txt
# Write one "./catal <CE>" call per CE into a helper script, then run it.
cat cenames.txt | awk '// {print "./catal "$2;}' > subload.sh
chmod 700 subload.sh
./subload.sh
# Append the submission log to the history file and save each job handle
# (the https line) in a .tok file named after the submission.
cat loadrlssubm >> $1.histo
cat $1.histo | awk ' /Sub/ {FileName=$2} /https/ {HandleName=$2; print "echo " HandleName "> " FileName".tok " }' >> gridtok
chmod 700 gridtok
./gridtok

Page 11: Implementing Metadata Using RLS/LCG

Conformity validation

Verify that the site follows the experiment standards:

#!/bin/bash
echo Hostname `/bin/hostname`
echo Start time: `/bin/date`
echo
local=`pwd`
echo "Babar initialisation "
. $VO_BABAR_SW_DIR/babar-grid-setup-env.sh
echo
echo "Environment variables"
printenv
echo
cd $local
echo Available files: $local
ls
echo
echo " - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - "
echo
# Set up the BaBar release.
cd $BFDIST/releases/14.5.2
srtpath 14.5.2 Linux24RH72_i386_gcc2953
cd $local
# Generate the .tcl metadata files for the datasets available at this site.
BbkDatasetTcl --dbsite=local > MetaLista.txt
cat MetaLista.txt | awk '// {print "BbkDatasetTcl --site local \""$1"\"";}' >> geratcl
chmod 700 geratcl
./geratcl
# Register an alias lfn:<dataset>.<CE name> for each dataset found.
export CE_NAME=$1
# Pass the value of CE_NAME to awk (not the literal string "CE_NAME").
ls *.tcl | awk -v site="$CE_NAME" '// {split($1,a,"."); print "edg-rm --vo babar addAlias `cat " $1"` lfn:"a[1]"."site ;}' >> alimrls
chmod 700 alimrls
./alimrls
echo
echo " - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - "
echo
echo End time: `/bin/date`

Page 12: Implementing Metadata Using RLS/LCG

Analysis Submission to Grid

• Single command: ./easygrid dataset_name

• Performs handler management and submission

• Configurable to achieve user’s requirements

• Software based on a state machine

– Verify that skim data are available:

• If not available, run BbkDatasetTcl to generate the skim data. Each file will be a job.

– Verify whether there are handlers pending:

• If not, generate the scripts (gera.c) with edg-job-submit and ClassAds, and execute them. Nest for submission policy and optimisation.

• If yes, verify the job status. When all the jobs have ended, recover the results into the user’s folder.

(Prototype)
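
The decision flow listed on this slide can be sketched as a small state function. This is not the easygrid source: the state names and the three yes/no inputs are assumptions standing in for the real checks on skim data, pending handlers and job status.

```shell
#!/bin/bash
# Sketch of the easygrid decision flow as a state machine.
# The state names and the yes/no inputs are assumptions; a real version
# would inspect the skim data, the stored handlers and the job status.

next_state() {
    local skim_ready="$1"        # yes/no: skim data (.tcl files) available?
    local handlers_pending="$2"  # yes/no: handlers left by an earlier submission?
    local all_done="$3"          # yes/no: have all pending jobs ended?

    if [ "$skim_ready" != yes ]; then
        echo GENERATE_SKIMDATA   # run BbkDatasetTcl; each file becomes a job
    elif [ "$handlers_pending" != yes ]; then
        echo SUBMIT              # generate the scripts (gera) and submit
    elif [ "$all_done" = yes ]; then
        echo RETRIEVE            # recover the results into the user folder
    else
        echo WAIT                # jobs still running; check again later
    fi
}

next_state no  no  no    # GENERATE_SKIMDATA
next_state yes no  no    # SUBMIT
next_state yes yes no    # WAIT
next_state yes yes yes   # RETRIEVE
```

Since the next action depends only on what is already on disk and in the job records, the same command can be rerun at any time and picks up where it left off.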

Page 13: Implementing Metadata Using RLS/LCG

Job Submission system, metadata and data

Page 14: Implementing Metadata Using RLS/LCG

Metadata/Event files and Computer elements

For each dataset there is a metadata file containing the names of the event files. These physical files are registered with the RLS, with several logical file names in the format datasetname_CEJobQueue assigned to them as aliases, showing the CEs which contain copies of that dataset. Searching all the aliases for a dataset name provides a list of CEs to which jobs can be submitted.
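
The alias search can be sketched as a prefix match over a list of aliases. How that list is read from the RLS is outside this sketch; the function name ces_for_dataset and the sample dataset and CE names are hypothetical, and only the datasetname_CEJobQueue format comes from the slide.

```shell
#!/bin/bash
# Sketch: given LFN aliases (one per line) in the datasetname_CEJobQueue
# format, list the CE job queues that hold a given dataset.
# ces_for_dataset and all sample names are hypothetical.

ces_for_dataset() {
    local dataset="$1"
    # Keep aliases that start with "lfn:<dataset>_" and strip that prefix.
    awk -v p="lfn:${dataset}_" 'index($0, p) == 1 { print substr($0, length(p) + 1) }'
}

aliases='lfn:DatasetA_ce01.site-a.example:2119/jobmanager-pbs-babar
lfn:DatasetA_ce02.site-b.example:2119/jobmanager-pbs-babar
lfn:DatasetB_ce01.site-a.example:2119/jobmanager-pbs-babar'

# Lists the two CE job queues that hold DatasetA.
printf '%s\n' "$aliases" | ces_for_dataset DatasetA
```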

Page 15: Implementing Metadata Using RLS/LCG

Managing large files in Grid

• The analysis executable is stored on the SE and its logical file name (LFN) is also catalogued in the RLS, so each WN needs to download it only once.

• Metadata is used not only for data, but also to support other files.
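
The "download only once" idea can be sketched as a cache check on the worker node. The function fetch_from_se is a placeholder for the real Grid copy command, which the slides do not show; the LFN and cache directory are also hypothetical.

```shell
#!/bin/bash
# Sketch: fetch the analysis executable to the worker node only if it is
# not already cached there. fetch_from_se is a placeholder (assumption)
# for the real Grid copy command; here it just creates an empty file.

fetch_from_se() {
    local lfn="$1" dest="$2"
    echo "fetching $lfn" >&2
    : > "$dest"
}

ensure_executable() {
    local lfn="$1" cache_dir="$2"
    local dest="$cache_dir/${lfn#lfn:}"   # local name = LFN without "lfn:"
    if [ ! -f "$dest" ]; then
        mkdir -p "$cache_dir"
        fetch_from_se "$lfn" "$dest"
    fi
    echo "$dest"
}

cache="$(mktemp -d)"
ensure_executable lfn:analysis_app "$cache"   # first call fetches
ensure_executable lfn:analysis_app "$cache"   # second call reuses the copy
rm -rf "$cache"
```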

Page 16: Implementing Metadata Using RLS/LCG

Gera

• Generation of all the information needed to submit the jobs on the Grid:
– Job Description Language (JDL) files
– the script with all necessary tasks to run the analysis remotely at a WN
– some grid dependent analysis parameters.

• The JDL files define the input sandbox with all necessary files to be transferred.

• A WN load-balancing algorithm matches requirements so that the task is performed optimally.
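
As an illustration of what a generated JDL file might contain, the sketch below writes one job description per metadata file. It is not the output of gera: the function name write_jdl, the script name, the file names and the CE identifier are hypothetical, and the Requirements expression is only one assumed way to pin a job to a CE.

```shell
#!/bin/bash
# Sketch: print a JDL job description for one metadata file.
# write_jdl, the script and file names, and the CE id are hypothetical;
# the real gera program may generate different attributes.

write_jdl() {
    local script="$1" tcl="$2" ce="$3"
    cat <<EOF
Executable    = "$script";
Arguments     = "$tcl";
StdOutput     = "std.out";
StdError      = "std.err";
InputSandbox  = {"$script", "$tcl"};
OutputSandbox = {"std.out", "std.err"};
Requirements  = other.GlueCEUniqueID == "$ce";
EOF
}

write_jdl run_analysis.sh DatasetA-0001.tcl "ce01.site-a.example:2119/jobmanager-pbs-babar"
```

The input sandbox lists exactly the files the job needs on the WN, so it stays small; large files such as the analysis executable are fetched through their LFN instead.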

Page 17: Implementing Metadata Using RLS/LCG

Running analysis programs

When the task is delivered to the WN, scripts start running to initialise the specific BaBar environment, and the analysis software is downloaded.

Page 18: Implementing Metadata Using RLS/LCG

Benchmarks

• The different behavior of electrons, hadrons, and muons can be distinguished.

• Performing this analysis takes 7 days using one computer 24 hours a day.

• Using 10 CPUs in parallel, accessed via the Grid, it took only 8 hours.

Behavior of particles in the BaBar Electromagnetic Calorimeter (EMC)
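
The two timings quoted in the benchmark bullets give the following ratio. The helper name speedup is an assumption, and the calculation uses only the figures on the slide (7 days on one computer, 8 hours on the Grid).

```shell
#!/bin/bash
# Ratio of single-computer time to Grid wall-clock time, both in hours.
# speedup is a hypothetical helper name; it uses integer division.

speedup() {
    local serial_hours="$1" grid_hours="$2"
    echo $(( serial_hours / grid_hours ))
}

serial_hours=$(( 7 * 24 ))   # 7 days, 24 hours a day = 168 hours
speedup "$serial_hours" 8    # prints 21
```

The ratio of 21 is larger than the 10 CPUs used; the slide does not say what accounts for the difference.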

Page 19: Implementing Metadata Using RLS/LCG

Pi+- N Pi0 decays, with N= 1, 2, 3 and 4

• Invariant masses of pairs of gammas, as measured by the EMC, from Pi0 decay produce a mass peak at 135 MeV (the peak in the plot). All other combinations are spread randomly over all energies (background).

• There were 81,700,000 events in the dataset and it took 4 days to run in production, with 26 jobs in parallel; running it on a single computer would take more than 3 months.
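
The "more than 3 months" estimate can be checked from the figures on the slide. The sketch assumes that every job ran for the full 4 days and that the single computer is as fast as one Grid CPU; the helper names are hypothetical.

```shell
#!/bin/bash
# Back-of-envelope check of the production figures.
# Assumes each of the 26 jobs ran for the full 4 days and that a single
# computer matches the speed of one Grid CPU. Helper names are hypothetical.

events=81700000
jobs=26
days=4

events_per_job() { echo $(( $1 / $2 )); }   # integer division
cpu_days()       { echo $(( $1 * $2 )); }

echo "events per job: $(events_per_job "$events" "$jobs")"   # 3142307
echo "serial estimate: $(cpu_days "$jobs" "$days") days"     # 104 days
```

A serial estimate of 104 days is about three and a half months, which agrees with the figure quoted on the slide.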

Page 20: Implementing Metadata Using RLS/LCG

Summary

• Easygrid is working and provides the full job submission structure using the LCG grid, RLS and metadata management.

• Provides handler management transparent to the user.

• Easy to use!

• Configurable to achieve user’s requirements and maybe for other experiments as well.

• See homepage http://www.hep.man.ac.uk/u/jamwer/ for more details.

Thanks for the opportunity!