Analysis with PROOF in ALICE
Arsen Hayrapetyan, Yerevan Physics Institute; CERN [email protected]
ALICE Offline Tutorial
March 22, 2013
Outline
Part 1: Theory
• PROOF, AAF, CAF
• PROOF analysis: merging, submerging
• PROOF terminology
• The structure of the PROOF analysis task
• Analysis data: trees, chains, datasets
• AliROOT usage options
Part 2: Practice
• Exercise 0: Connecting to CAF, listing analysis packages and data
• Exercise 1: ESD analysis on real data on CAF
• Exercise 2: ESD analysis on MC data on CAF
• Exercise 3: Combining exercises 1 and 2
• Exercise 4: AOD analysis on real data on CAF
• Exercise 5: Staging datasets on CAF
• Exercise 6: Processing staged datasets on CAF
PROOF, AAF, CAF
• PROOF (Parallel ROOT Facility) is an extension of ROOT for interactive analysis of large sets of ROOT files in parallel, either on clusters of computers (an analysis facility) or on multi-core machines (PROOF Lite).
• AAF (ALICE Analysis Facilities) is a group of analysis facilities dedicated to prompt analysis of relatively small (compared to the grid) amounts of pp and PbPb data (all AFs), as well as to reconstruction of samples of raw data during data taking (CAF).
• CAF (CERN Analysis Facility) is a PROOF cluster with 464 CPU cores, 1.4 TB total RAM and 162 TB total disk space. The analysis data is normally staged (copied from the grid) to CAF by users or an administrator before it is analysed.
PROOF analysis schema
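[Diagram: on the client (local PC), root runs ana.C; ana.C is sent to the PROOF master on the remote PROOF cluster, which passes the work to PROOF slaves (root processes on node1 to node4), each reading its own data and producing a result; stdout and the result are returned to the client.]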
Event based parallelism
Merging of the results
• Option 1: Merging on the Master
  • The results produced on the Workers are all sent to the Master and merged there
• Option 2: Submerging
  • Certain nodes are selected to be submergers; the results produced on the Workers are divided between them and merged (producing smaller output), and the submerger outputs are then sent to the Master for the final merge
  • The number of submergers can be chosen automatically (default PROOF behaviour) or specified by the user
• A standard merging implementation for histograms is available
• Other classes need to implement Merge(TCollection*)
• When no merging function is available, individual objects are returned
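The two merging options can be illustrated with a small stand-alone sketch. It does not use ROOT: Hist, RunWorkers, MergeOnMaster and MergeWithSubmergers are hypothetical names invented for this illustration, and each "event" is reduced to a bin index. The round-robin assignment of events to workers and of workers to submergers is also an assumption, not a description of how PROOF schedules work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a mergeable output object (not a ROOT class):
// a histogram reduced to its bin contents.
struct Hist {
    std::vector<long> bins;

    // Plays the role of Merge(TCollection*): add the bin contents of all
    // partial histograms to this one.
    void Merge(const std::vector<Hist>& others) {
        for (const Hist& h : others) {
            assert(h.bins.size() == bins.size());
            for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += h.bins[i];
        }
    }
};

// Event-based parallelism: each worker gets a share of the events and fills
// its own partial histogram.
std::vector<Hist> RunWorkers(const std::vector<std::size_t>& events,
                             std::size_t nWorkers, std::size_t nBins) {
    assert(nWorkers > 0 && nBins > 0);
    std::vector<Hist> partial(nWorkers, Hist{std::vector<long>(nBins, 0)});
    for (std::size_t i = 0; i < events.size(); ++i)
        partial[i % nWorkers].bins[events[i] % nBins] += 1;
    return partial;
}

// Option 1: every partial result is merged in one place (the Master).
Hist MergeOnMaster(const std::vector<Hist>& partial, std::size_t nBins) {
    Hist total{std::vector<long>(nBins, 0)};
    total.Merge(partial);
    return total;
}

// Option 2: the partial results are divided between nSub submergers, each
// submerger merges its share, and the Master merges the submerger outputs.
Hist MergeWithSubmergers(const std::vector<Hist>& partial,
                         std::size_t nSub, std::size_t nBins) {
    assert(nSub > 0);
    std::vector<std::vector<Hist>> share(nSub);
    for (std::size_t w = 0; w < partial.size(); ++w)
        share[w % nSub].push_back(partial[w]);
    std::vector<Hist> intermediate;
    for (const std::vector<Hist>& s : share)
        intermediate.push_back(MergeOnMaster(s, nBins));
    return MergeOnMaster(intermediate, nBins);
}
```

Both options give the same merged histogram; submerging only changes where the intermediate additions happen, so that the Master receives fewer and already combined objects.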
PROOF terminology
• PROOF cluster
  • A set of computers working in a coordinated way following the PROOF protocol
• Node
  • A single computer in a PROOF cluster
• Client
  • A process created within the ROOT session on your machine that is connected to the Master
• Master
  • A process on a dedicated node coordinating work between the Workers
• Worker or Slave
  • A process on a node that processes data and is connected to the Master
• Query
  • A job submitted from the Client to the Master. The query consists of a selector and a chain
• Selector
  • A class containing the analysis code
  • In ALICE we use the Analysis Framework, and the Selector is usually derived from AliAnalysisTaskSE
• Chain
  • A list of files (trees) to process
How does AAF analysis work
• In ALICE, we use the Analysis Framework to write PROOF-enabled programs
• The analysis task is written as a class derived from AliAnalysisTaskSE
• The data to be analysed is normally staged on AAF (if not, you can ask for it to be staged or stage it yourself). The dataset name is then specified in the steering macro
• If you work with PROOF Lite, you can put the files containing the data into a chain and specify the TChain object in the steering macro
• If you need libraries not contained in AliROOT, pack them in PAR (PROOF Archive) packages, upload them to AAF and enable them in the steering macro
The structure of the PROOF analysis task
• Constructor: once on the Client
• UserCreateOutputObjects(): once on each Worker
• ConnectInputData(): for each tree
• UserExec(): for each event
• Terminate(): once on the Client
The (ALICE-specific) analysis task is defined via a class derived from AliAnalysisTaskSE.
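The call pattern above can be mimicked with a mock that only counts calls. This is a sketch under stated assumptions: MockTask, CallCounts and RunQuery are hypothetical names, nothing here uses AliRoot, and the workers are simulated sequentially with a single task object, whereas on a real cluster each Worker holds its own copy of the task.

```cpp
#include <vector>

// Counters for how often each method of the mock task is called.
struct CallCounts {
    int constructor = 0;
    int createOutput = 0;
    int connectInput = 0;
    int exec = 0;
    int terminate = 0;
};

// Hypothetical mock that only mirrors the method names of AliAnalysisTaskSE.
class MockTask {
public:
    explicit MockTask(CallCounts& counts) : fCounts(counts) { ++fCounts.constructor; }
    void UserCreateOutputObjects() { ++fCounts.createOutput; }
    void ConnectInputData() { ++fCounts.connectInput; }
    void UserExec() { ++fCounts.exec; }
    void Terminate() { ++fCounts.terminate; }

private:
    CallCounts& fCounts;
};

// workers[w] lists the trees handed to worker w; each tree is represented
// only by its number of events.
CallCounts RunQuery(const std::vector<std::vector<int>>& workers) {
    CallCounts counts;
    MockTask task(counts);                    // once on the Client
    for (const std::vector<int>& trees : workers) {
        task.UserCreateOutputObjects();       // once on each Worker
        for (int nEvents : trees) {
            task.ConnectInputData();          // for each tree
            for (int e = 0; e < nEvents; ++e)
                task.UserExec();              // for each event
        }
    }
    task.Terminate();                         // once on the Client
    return counts;
}
```

For two workers, the first given trees of 10 and 5 events and the second a tree of 7 events, the mock records 2 calls to UserCreateOutputObjects(), 3 to ConnectInputData() and 22 to UserExec(). This is why output objects belong in UserCreateOutputObjects() and per-event work in UserExec().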
Trees
• The tree (an object of the ROOT class TTree) is a container for data storage
• Consists of several branches
  • Can be stored in one or several files
  • Stored contiguously
  • Can be switched off during data reading (hence a speed-up)
• Content visualisation with helper functions: Draw(), Scan()
• Compressed
[Diagram: a tree of "point" objects with members x, y and z; in the file each member is stored as its own branch (all x values together, then all y values, then all z values).]
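The idea of branch-wise storage and of switching branches off can be sketched without ROOT. PointTree below is a hypothetical toy class, not TTree; the method name SetBranchStatus is borrowed from TTree for familiarity, but the signature and behaviour here are simplified assumptions.

```cpp
#include <cstddef>
#include <initializer_list>
#include <map>
#include <string>
#include <vector>

// Toy model of a tree of points: one contiguous column per branch.
class PointTree {
public:
    PointTree() {
        for (const char* name : {"x", "y", "z"}) {
            fBranches[name] = std::vector<double>();
            fActive[name] = true;
        }
    }

    // Append one point; each member goes to the end of its own branch.
    void Fill(double x, double y, double z) {
        fBranches["x"].push_back(x);
        fBranches["y"].push_back(y);
        fBranches["z"].push_back(z);
    }

    // Switch a branch on or off for reading (throws for unknown names).
    void SetBranchStatus(const std::string& name, bool on) { fActive.at(name) = on; }

    std::size_t GetEntries() const { return fBranches.at("x").size(); }

    // Read entry i from the active branches only; returns how many
    // branches were actually read.
    std::size_t GetEntry(std::size_t i, std::map<std::string, double>& out) const {
        out.clear();
        std::size_t nRead = 0;
        for (const auto& branch : fBranches) {
            if (!fActive.at(branch.first)) continue;  // skipped: no read cost
            out[branch.first] = branch.second.at(i);
            ++nRead;
        }
        return nRead;
    }

private:
    std::map<std::string, std::vector<double>> fBranches;
    std::map<std::string, bool> fActive;
};
```

With all branches active, GetEntry() reads three values per entry. After SetBranchStatus("z", false) it reads only two, which is the source of the speed-up mentioned above.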
Chains
• The chain (an object of the ROOT class TChain) is a collection of files containing trees (TTree objects)
• The visualisation functions Draw() and Scan() can be used, as with trees (they iterate over all elements of the chain)
• The data to be analysed is normally put in a tree or chain for local analysis. For analysis on a PROOF cluster one uses datasets.
[Diagram: a chain made of Tree1 (File 1), Tree2 (File 2), Tree3 (File 3), Tree4 (File 4) and Tree5 (File 5).]
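Iterating "over all elements of the chain" amounts to mapping a global entry number onto a tree and a local entry within it. The sketch below illustrates that bookkeeping in plain C++; MiniChain, FileTree and Locate are hypothetical names for this illustration and are not part of ROOT.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One element of the chain: a file and the number of entries in its tree.
struct FileTree {
    std::string file;
    std::size_t entries;
};

// Toy chain: an ordered list of trees addressed by a global entry number.
class MiniChain {
public:
    void Add(const std::string& file, std::size_t entries) {
        fTrees.push_back({file, entries});
    }

    std::size_t GetEntries() const {
        std::size_t total = 0;
        for (const FileTree& t : fTrees) total += t.entries;
        return total;
    }

    // Translate a global entry number into (tree index, local entry).
    std::pair<std::size_t, std::size_t> Locate(std::size_t global) const {
        assert(global < GetEntries());
        std::size_t tree = 0;
        while (global >= fTrees[tree].entries) {
            global -= fTrees[tree].entries;  // skip over this whole tree
            ++tree;
        }
        return {tree, global};
    }

private:
    std::vector<FileTree> fTrees;
};
```

For a chain of three files with 100, 50 and 75 entries, global entry 100 is the first entry of the second tree and global entry 224 is the last entry of the third.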
Datasets
• The dataset is a named list of files (containing trees in the case of ALICE data), including metadata about the files' locations.
• Staged to AAF by cluster administrators or users.
  • If staged by an administrator, the name starts with /alice/data or /alice/sim, e.g.:
    • /alice/data/LHC10d_000126285_p2
    • /alice/sim/LHC10e13_118507
  • If staged by a user, the name starts with /<user_group>/<user_grid_login_name>, e.g.:
    • /PWG4/esicking/LHC10e20_130840_AOD060
    • /VZERO/cheynis/run136104_pass1
    • Users who do not belong to any PWG or detector group have <user_group>=default, e.g.:
      • /default/poghos/LHC11a_000146746_pass3_with_SDD
• Can be listed with TProof::ShowDataSets() or via http://aaf.cern.ch -> Favourite links -> AAF datasets
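The naming convention can be checked mechanically. The following sketch splits a dataset name into its three components and classifies it by the prefixes listed above; ParseDataSetName, DataSetName and StagedBy are hypothetical helpers written for this illustration, and the assumption that a name always has exactly three components is taken from the examples, not from the PROOF documentation.

```cpp
#include <cstddef>
#include <string>

enum class StagedBy { Administrator, User };

// Components of a name of the form /<group>/<user>/<name>.
struct DataSetName {
    StagedBy stagedBy;
    std::string group;
    std::string user;
    std::string name;
};

// Returns false if the string does not have exactly three non-empty
// components separated by '/', with a leading '/'.
bool ParseDataSetName(const std::string& full, DataSetName& out) {
    if (full.empty() || full[0] != '/') return false;
    const std::size_t s1 = full.find('/', 1);
    if (s1 == std::string::npos) return false;
    const std::size_t s2 = full.find('/', s1 + 1);
    if (s2 == std::string::npos) return false;

    out.group = full.substr(1, s1 - 1);
    out.user = full.substr(s1 + 1, s2 - s1 - 1);
    out.name = full.substr(s2 + 1);
    if (out.group.empty() || out.user.empty() || out.name.empty()) return false;
    if (out.name.find('/') != std::string::npos) return false;

    // Administrator-staged datasets live under /alice/data or /alice/sim.
    const bool admin = out.group == "alice" && (out.user == "data" || out.user == "sim");
    out.stagedBy = admin ? StagedBy::Administrator : StagedBy::User;
    return true;
}
```

Applied to the examples above, /alice/data/LHC10d_000126285_p2 is classified as administrator-staged, while /PWG4/esicking/LHC10e20_130840_AOD060 is user-staged with group PWG4 and login name esicking.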
Workflow summary
[Diagram: the input chain (Tree1 in File1, Tree2 in File2, Tree3 and Tree4 in File3, Tree5 in File4) is fed to three proof processes, each running the analysis (AliAnalysisTask).]
[Diagram: each of the three proof processes running the analysis (AliAnalysisTask) produces its own output; the three outputs are combined into the merged output.]
AliROOT usage options
• As the main analysis software package, AliROOT should be loaded into the memory of the Workers on AAF before processing the data. This is done via the method TProof::EnablePackage(), e.g.:
  • gProof->EnablePackage("VO_ALICE@AliRoot::v5-03-21-AN")
• TProof::EnablePackage() accepts one of the pre-defined string constants as its second parameter:
  • "default" – loads the basic analysis libraries (libVMC, libTree, libPhysics, libMatrix, libMinuit, libXMLParser, libGui, libSTEERBase, libESD, libAOD, libANALYSIS, libOADB, libANALYSISalice), e.g.:
    • gProof->EnablePackage("VO_ALICE@AliRoot::v5-03-21-AN", "default")
  • "ALIROOT" – same as "default", loads libraries defined in $ALICE_ROOT/macros/loadlibs.C
    • gProof->EnablePackage("VO_ALICE@AliRoot::v5-03-21-AN", "ALIROOT")
  • "REC" – suited for reconstruction, loads libraries defined in $ALICE_ROOT/macros/loadlibsrec.C
    • gProof->EnablePackage("VO_ALICE@AliRoot::v5-03-21-AN", "REC")
  • "SIM" – suited for simulation, loads libraries defined in $ALICE_ROOT/macros/loadlibssim.C
    • gProof->EnablePackage("VO_ALICE@AliRoot::v5-03-21-AN", "SIM")
• The list of available packages can be displayed with TProof::ShowPackages(), e.g.:
  • gProof->ShowPackages()
More information
• AAF user documentation
  • http://aaf.cern.ch/node/89
• PROOF documentation
  • http://root.cern.ch/drupal/content/proof
• Analysis framework documentation
  • http://aliweb.cern.ch/Offline/Activities/Analysis/AnalysisFramework/index.html
Part 2: Hands-on exercises
• Set up your credentials
  • $> mkdir .globus
  • Put your certificate and private key there
• Download the tutorial files from the agenda and unpack them
Exercise 0: Connecting to CAF, listing analysis packages and data
• $> root -l
• root> gEnv->SetValue("XSec.GSI.DelegProxy", "2")
• root> TProof::Open([email protected])
• root> gProof->ShowPackages()
• root> gProof->ShowDataSets()
Exercise 1: ESD analysis on real data on CAF
• $> cd ex1
• Inspect the files (steering macro and analysis task)
• Run the analysis
  • root -l ex1.cxx
Exercise 2: ESD analysis on MC data on CAF
• $> cd ex2
• Inspect the files (steering macro and analysis task)
• Run the analysis
  • root -l ex2.cxx
Exercise 3: Combining exercises 1 and 2
• $> cd ex3
• Inspect the files (steering macro and analysis task)
• Run the analysis
  • root -l ex3.cxx
Exercise 4: AOD analysis on real data on CAF
• $> cd ex4
• Inspect the files (steering macro and analysis task)
• Run the analysis
  • root -l ex4.cxx
Exercise 5: Staging datasets
• Reference: http://aaf.cern.ch/node/224
• mkdir ex5 && cd ex5
• wget http://afdsmgrd.googlecode.com/svn/tags/v1.0.6/macros/CreateDataSetFromAliEn.C
• Task: Edit the file CreateDataSetFromAliEn.C to stage a dataset containing at most 10 files.
  • Use ESD pass2 data for run #188359 (alien path: /alice/data/2012/LHC12g/000188359/ESDs/pass2)
  • Stage root_archive.zip files, use "AliESDs.root" as anchor
  • Specify "/esdTree" for the tree name
  • Name the dataset "testDS"
  • Test your modifications with "root -l CreateDataSetFromAliEn.C" to make sure the dataset satisfies the conditions above.
  • Once it is OK, enable actual staging with the "commit" option
• Monitor the staging progress with gProof->ShowDataSets()
• Solution available on the agenda page
Exercise 6: Analysing staged dataset
• $> cd ex1
• Task: Modify the ex1.cxx file to analyse the dataset you have staged for Exercise 5.