Nurcan Ozturk
University of Texas at Arlington
SCHOOL ON HEP@TR-GRID
April 30 – May 2, 2008
Turkish Atomic Energy Authority (TAEA), Ankara, Turkey
Data Discovery Tools, DQ2 Enduser Tools and Physics Analysis Tools
May 2, 2008Nurcan Ozturk 2
Outline
User’s work-flow for Data Analysis
Data Discovery Tools
  AMI - ATLAS Metadata Interface
  TAG Browser - ELSSI
DQ2 Enduser Tools
ATLAS Analysis Model
  Analysis Model Forum Recommendations
  Derived Physics Data (DPD)
Analyzing the Data (inside or outside Athena)
  AthenaRootAccess (ARA)
  EventView
User’s Work-flow for Data Analysis
Locate the data
Set up the analysis code
Set up the analysis job
Submit to the Grid
Retrieve the results
Analyze the results
Data Discovery Tools
ATLAS Metadata Interface (AMI)
http://ami3.in2p3.fr:8080/opencms/opencms/AMI/www/index.html
AMI is a bookkeeping project.
AMI is a generic cataloging system (a database application).
The majority of datasets currently catalogued in AMI are Monte Carlo datasets.
AMI reads information from the task request system, and correlates it with information read from the production database.
AMI contains the physics metadata for:
  2008 real data
  2008 FDR exercise
  2007 Cosmics runs (M5 data)
  2006/2007 service challenge datasets
  StreamTest
  Data Challenges DC1 and DC2 / Rome
  Production System
  Combined Test Beam
AMI also powers the TagCollector release management tool.
AMI Tutorial
http://ami3.in2p3.fr:8080/opencms/opencms/AMI/www/Tutorial/
Or
http://ami3.in2p3.fr:8080/opencms/opencms/AMI/www/Tutorial/FastTrackTutorial.html
What is AMI?
Where does AMI get its information?
How do I search for a dataset?
Which information can I get from the result of an AMI dataset search?
What is the schema of the AMI dataset catalogue?
Why can I sometimes not find a dataset when I can see its existence in other catalogues?
Can I refine the search?
Can I simply browse all of the information in AMI?
Can I bookmark an AMI page?
Why doesn't the back button of my browser work?
Can I use AMI without going through the web interface?
How can I extract information from AMI?
How do I write to AMI?
How Do I Search For A Dataset? – Simple Search
Follow the link to the “simple search interface” from the tutorial page:
[Screenshot of the simple search interface; annotation: "type here"]
Results From Simple Search (1)
[Screenshot of the search results; annotations mark a pull-down menu and several links]
Results From Simple Search (2)
When you click on the Provenance link, it shows which version of the Athena software was used in making evgen/digit/reco.
Results From Simple Search (3)
When you click on the DQ2 link, it shows the DQ2 dataset metadata, the existing replicas of the dataset, and a link to the PanDA monitor.
Results From Simple Search (4)
When you click on the PANDA link, it takes you to the dataset browser.
How Do I Search For A Dataset? – Advanced Search
Follow the link to the “Advanced search interface” from the tutorial page:
Results From Advanced Search
TAG
ATLAS will produce petabytes of data, so a system of event-level metadata is needed to quickly identify and select the events that are of interest for a given analysis. This is provided by TAG files and the TAG database.
TAG files are built from AOD according to offline analysis-style code. TAG files are then loaded into the TAG database.
TAG files store information about the status of each sub-detector, the trigger, and physics object ID.
For instance, for FDR-1 data TAGs contain:
Event information:
  Run number, event number, luminosity block, number of vertices and tracks, primary vertex position. (Luminosity has an entry but is not filled.)
  Variables such as the summed cell Et, missing Et magnitude, and phi
Trigger information:
  BitMasks encode pass, pass after prescale for each trigger item/chain
Physics objects:
  Multiplicity of physics objects and the Pt, eta, phi for the highest Pt objects
  A tightness criterion for e/mu/gamma is included, as are b-tag likelihoods and the tau candidate likelihood.
PhysWords: 32-bit TAG Word. For b-physics for instance: Bit 0: HighPtMuonPair, Bit 1: J/Psi candidate, Bit 2: Upsilon candidate.
See more details for FDR & TAGs from a talk by James Frost, April Exotics Working Group meeting
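To make the PhysWord idea concrete, here is a minimal Python sketch of testing individual bits in a 32-bit word. Only the three b-physics bit assignments come from the slide; the function names and the example word value are hypothetical, and this is plain Python rather than any ATLAS tool.

```python
# Bit assignments quoted on the slide for the b-physics PhysWord.
BPHYS_BITS = {
    0: "HighPtMuonPair",
    1: "J/Psi candidate",
    2: "Upsilon candidate",
}


def has_flag(word, bit):
    """Return True if the given bit is set in the word."""
    return bool(word & (1 << bit))


def decode_physword(word, bit_names=BPHYS_BITS):
    """Return the names of all known flags set in a 32-bit TAG word."""
    if not 0 <= word < 2**32:
        raise ValueError("a TAG word must fit in 32 bits")
    return [name for bit, name in sorted(bit_names.items()) if has_flag(word, bit)]


# Hypothetical word with bits 0 and 2 set.
example_word = 0b101
flags = decode_physword(example_word)
print(flags)
```

Packing several yes/no decisions into one integer lets a single query such as "bit 1 is set" pick out candidate events without reading the full event data.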
How Does TAG Selection Work?
Use the TAG file as an input to EventSelector or PoolTAGInput.
Make sure the matching Pool file (e.g. AOD) is in the PoolFileCatalog.
Define your query of the TAG content.
Run the job.
Very flexible:
  Can use the TAG to preselect the events from an AOD in which you are interested, passing only those to an analysis algorithm.
  Can use the TAG to write out an AOD (or ESD, RDO) of only the selected events.
How to learn more? Good tutorials are available already:
  https://twiki.cern.ch/twiki/bin/view/Atlas/FeedBackForTags
  https://twiki.cern.ch/twiki/bin/view/Atlas/TagForEventSelection
  https://twiki.cern.ch/twiki/bin/view/Atlas/TagForEventSelection#Building_Tags_Under_12_0_31 (create tag files)
  https://twiki.cern.ch/twiki/bin/view/Atlas/PhysicsAnalysisWorkBookTAG
  https://twiki.cern.ch/twiki/bin/view/Atlas/PhysicsAnalysisWorkBookTAGAnalysis
  https://twiki.cern.ch/twiki/bin/view/Atlas/TopFdrTag
  http://twiki.mwt2.org/bin/view/Main/TutorialTag080318 (All the above links are available from this one.)
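The preselection idea can be illustrated with a toy sketch in plain Python. This is not Athena job-option code: the attribute names, the query, and the in-memory "AOD" are all hypothetical stand-ins, chosen only to show how a query on small per-event TAG records decides which full events get passed on.

```python
# Toy TAG records: a small dictionary of metadata per event.
# All attribute names and values are hypothetical.
tag_records = [
    {"run": 3051, "event": 1, "n_electrons": 2, "missing_et": 12.0},
    {"run": 3051, "event": 2, "n_electrons": 0, "missing_et": 55.0},
    {"run": 3051, "event": 3, "n_electrons": 1, "missing_et": 41.5},
]

# Toy "AOD": the full event payloads, keyed by (run, event).
aod_events = {(3051, n): {"payload": "full event %d" % n} for n in (1, 2, 3)}


def query(tag):
    """Hypothetical selection: at least one electron and missing Et above 20."""
    return tag["n_electrons"] >= 1 and tag["missing_et"] > 20.0


def preselect(tags, events, predicate):
    """Yield only the full events whose TAG record passes the predicate."""
    for tag in tags:
        if predicate(tag):
            yield events[(tag["run"], tag["event"])]


selected = list(preselect(tag_records, aod_events, query))
print(selected)
```

The point of the design is that the predicate only ever touches the lightweight records; the large event payloads are looked up after the decision has been made.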
TAG Browser – ELSSI (1)
TAGs are accessed by users via a web interface called ELSSI, the ATLAS Event Level Selection Service Interface.
For FDR-1 data (tutorial): https://atldbdev01.cern.ch/tagservices/tutorial/index.htm
For FDR-1 data: https://atldbdev01.cern.ch/tagservices/fdr/index.htm
You need Firefox to see this page, as Jack Cranshaw informed me.
TAG Browser – ELSSI (2)
How to use ELSSI:
Define a query to select runs, streams, data quality, trigger chains,…
Review the query
Execute the query and retrieve the TAG file (a root file)
DQ2 Enduser Tools
The Client Tools to Retrieve Data
DQ2 enduser tools
  Includes dq2_xxx (dq2_ls, dq2_get, etc) commands
  Available to download from: https://twiki.cern.ch/twiki/bin/view/Atlas/UsingDQ2#Download
  The setup files are edited to accommodate local needs (dq2.sh, setup.sh)
  Available on AFS at CERN:
    source /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid_env.sh
    source /afs/cern.ch/atlas/offline/external/GRID/ddm/endusers/setup.sh.CERN
gLite UI (User Interface)
  Includes lcg-cp, egee-gridftp-xxx
  Available on AFS (BNL and CERN paths):
    source /afs/usatlas.bnl.gov/lcg/current/etc/profile.d/grid_env.sh
    source /afs/cern.ch/project/gd/LCG-share/current/external/etc/profile.d/grid-env.sh
Why the gLite UI may be needed in OSG:
  dq2_put/get may use some gLite commands depending on the site they interact with (TiersOfATLASCache.py description): lcg-lg, lcg-rf, glite-gridftp-ls, lcg-gt
More Info:
https://twiki.cern.ch/twiki/bin/view/Atlas/DDMEndUserTutorial
DQ2 Enduser Tools
dq2_ls: returns a list of datasets matching a given pattern
dq2_ls fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1
dq2_get: copies the files from DQ2 to a local area
dq2_get -rv fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1
dq2_put: registers datasets to DQ2
dq2_poolFCjobO: creates PoolFileCatalog and Athena job-option for DQ2 datasets
dq2_register: uploads and registers external generator input files to DQ2
dq2_cleanup: deletes a dataset from a site's catalog and storage.
dq2_sample: copies a portion of an existing dataset and registers it to DQ2
More info:
https://twiki.cern.ch/twiki/bin/view/Atlas/UsingDQ2#DQ2_end_user_tools
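As a small illustration, the Python sketch below splits the example dataset name into its dot-separated fields and assembles the two command lines shown above. The field labels (project, run, stream, step, data type, version) are an assumption about the naming convention and are not stated on the slide; the helper names are hypothetical, and the commands are only built as argument lists, not executed.

```python
from collections import namedtuple

# Field labels are an assumption about the naming convention.
DatasetName = namedtuple(
    "DatasetName", "project run stream step data_type version"
)


def parse_dataset_name(name):
    """Split a six-field, dot-separated dataset name into labelled parts."""
    parts = name.split(".")
    if len(parts) != 6:
        raise ValueError("expected 6 dot-separated fields, got %d" % len(parts))
    return DatasetName(*parts)


def dq2_ls_command(pattern):
    """Argument list for listing datasets that match a pattern."""
    return ["dq2_ls", pattern]


def dq2_get_command(dataset):
    """Argument list for copying a dataset locally, with the -rv options from the slide."""
    return ["dq2_get", "-rv", dataset]


name = "fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1"
info = parse_dataset_name(name)
print(info.data_type, int(info.run))
print(" ".join(dq2_get_command(name)))
```

On a machine where the DQ2 enduser tools are set up, such an argument list could be handed to `subprocess.run`; that step is left out here because it needs a Grid environment.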
ATLAS Analysis Model
Analysis Model Forum Recommendations on the Analysis Model
[Diagram of the recommended analysis model; annotation: "includes metadata + simple UserData"]
Derived Physics Data - DnPD
Primary D1PD: POOL-based DPD produced by the GRID production system. There are expected to be O(10) primary DPDs, so the contents will not be very specific to an analysis. Compared to the AOD, it is expected to be skimmed (keeping only interesting events), slimmed (keeping only interesting objects, for example electrons and muons), and thinned (keeping only the subset of information inside objects that is relevant in future steps).
  An example job options file: AODtoDPD.py (see CVS)
  Packages in CVS: TopDPDMaker, TauDPDMaker, BPhysicsDPDMaker, SUSYDPDMaker
Secondary D2PD: POOL-based DPD with more analysis-specific information. Typically, this is produced from the primary DPD and may be created using an Athena tool like EventView.
  SimpleThinningExample
  HighPtViewDPDThinningTutorial
Tertiary D3PD: Does not need to be POOL-based; it includes flat ntuples.
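The three reduction steps named for the primary DPD can be shown with a toy Python sketch. It follows the definitions given on this slide (skim = drop events, slim = drop object containers, thin = drop information inside objects); the event layout, container names, and selection are hypothetical and unrelated to the actual DPDMaker packages.

```python
# Toy events: each event holds containers of objects with several fields.
# The layout and all values are hypothetical.
events = [
    {
        "electrons": [{"pt": 45.0, "eta": 0.3, "cluster_cells": [1, 2, 3]}],
        "muons": [],
        "jets": [{"pt": 80.0, "eta": 1.1, "cluster_cells": [4, 5]}],
    },
    {
        "electrons": [],
        "muons": [],
        "jets": [{"pt": 30.0, "eta": -2.0, "cluster_cells": [6]}],
    },
]


def skim(events, keep_event):
    """Keep only the interesting events."""
    return [ev for ev in events if keep_event(ev)]


def slim(events, containers):
    """Keep only the interesting object containers in each event."""
    return [{name: ev[name] for name in containers} for ev in events]


def thin(events, fields):
    """Keep only the chosen fields inside each object."""
    return [
        {name: [{f: obj[f] for f in fields} for obj in objs]
         for name, objs in ev.items()}
        for ev in events
    ]


def has_lepton(ev):
    return bool(ev["electrons"] or ev["muons"])


dpd = thin(slim(skim(events, has_lepton), ["electrons", "muons"]), ["pt", "eta"])
print(dpd)
```

Each step shrinks the data along a different axis, which is why the three are usually applied together when deriving a smaller format from the AOD.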
Analyzing the Data
Inside Athena
  Interactive or batch, using C++ or Python code.
  Needs a part of Athena (depends on user needs).
  Provides full access to all tools and services.
Outside Athena – AthenaRootAccess (ARA)
  CINT, Python, or compiled C++ code.
  Does not need a full Athena installation (expected 1 GB).
  Not all classes are available (for example, calo cells).
Important: both methods use the same files as input.
ARA - AthenaRootAccess
Allows you to read an AOD in ROOT as you would read a normal ntuple (without using Athena).
The goal is to seamlessly use Athena tools.
One can use identical code/tools to run on ESDs, AODs, DPDs.
The names of the variables in the AOD ROOT tree are the same as in the AOD.
Limitations:
  It uses the transient classes and converters of the ATLAS software, so a portion of the offline software is needed: a ~1 GB distribution including Athena libraries.
  Tools and data that need detector description, conditions, B-field, etc. cannot be called in ARA. However, this type of info can be put in UserData in DPD.
  Gaudi-based classes (like AlgTools, Services) don’t work in ARA. Wrapping machinery is needed to reuse the code in Athena/ARA.
ARA Examples (1)
CINT macros
  Easy development (change code and run)
  Run time is slow: ~10x that of compiled C++ code
C++ compiled code
  Slower development (change code, recompile, cannot reload libs)
  Fastest runtime
  Integrates easily back into Athena
Python scripts
  Easy development (change code, reload and run)
  Simple example shows runtime ~3x that of compiled C++ code
  May be able to compile Python
  Integration of developed code into Athena?
Examples on Twiki and in Release:
  https://twiki.cern.ch/twiki/bin/view/Atlas/AthenaROOTAccess
  PhysicsAnalysis/AthenaROOTAccessExamples
ARA Examples (2)
Available in CVS under PhysicsAnalysis/AthenaROOTAccessExamples
Need a python script to open the file and set up the transient tree:
lxplus:~> get_files AthenaROOTAccess/test.py
Compiled C++ Example:
lxplus:~> root
root [0] TPython::Exec("execfile('test.py')");
root [1] CollectionTree_trans = (TTree *)gROOT->Get("CollectionTree_trans");
root [2] ClusterExample ce; // Example class in AthenaROOTAccessExamples
root [3] ce.plot(CollectionTree_trans);
root [4] TruthInfo ti;
root [5] ti.truth_info(CollectionTree_trans);
test.py takes about 20 secs to load the necessary dictionaries
One can recompile and then restart from the beginning
ARA Examples (3)
CINT Example:
lxplus:~> root
root [0] TPython::Exec("execfile('test.py')");
root [1] CollectionTree_trans = (TTree *)gROOT->Get("CollectionTree_trans");
root [2] gROOT->LoadMacro("AthenaROOTAccessExamples/macros/cluster_example.C");
root [3] plot(CollectionTree_trans);
One can now edit cluster_example.C and re-run LoadMacro
Python Example:
lxplus:~> python -i test.py
>>> import AthenaROOTAccessExamples.cluster_example
>>> AthenaROOTAccessExamples.cluster_example.plot(tt)
One can now edit cluster_example.py and re-run:
>>> reload(AthenaROOTAccessExamples.cluster_example)
>>> AthenaROOTAccessExamples.cluster_example.plot(tt)
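The edit, reload, and re-run cycle of the Python example can be tried without Athena using the self-contained sketch below. It writes a small stand-in module to a temporary directory, imports it, rewrites it, and reloads it. The module name `toy_cluster_example`, its `plot` function, and the list standing in for the tree `tt` are hypothetical; `importlib.reload` is the Python 3 counterpart of the `reload` built-in used on the slide.

```python
import importlib
import pathlib
import sys
import tempfile

# Do not cache bytecode, so every reload recompiles the edited source.
sys.dont_write_bytecode = True

# Hypothetical stand-in for cluster_example.py, written to a temp directory.
workdir = pathlib.Path(tempfile.mkdtemp())
module_file = workdir / "toy_cluster_example.py"
module_file.write_text(
    "def plot(tree):\n"
    "    return 'plotted %d entries' % len(tree)\n"
)
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

mod = importlib.import_module("toy_cluster_example")
tt = [0.5, 1.2, 3.4]  # a plain list standing in for the transient tree
first = mod.plot(tt)

# "Edit" the module on disk, then reload it in the same session.
module_file.write_text(
    "def plot(tree):\n"
    "    return 'plotted %d entries, max %.1f' % (len(tree), max(tree))\n"
)
mod = importlib.reload(mod)
second = mod.plot(tt)

print(first)
print(second)
```

The session, and anything expensive already loaded into it, stays alive across the edit; only the module being developed is re-executed.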
Analysis Frameworks: EventView (1)
This framework provides general tools for common analysis tasks like
particle selection
overlap removal
observable calculation
combinatorics
recalibration
systematics evaluation
generating ntuples
Users can perform a great deal of their analyses in Athena by chaining and configuring a set of these tools and producing an ntuple for further analysis in ROOT.
Twiki page:
https://twiki.cern.ch/twiki/bin/view/Atlas/EventView
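The idea of chaining configurable tools can be sketched in plain Python. This is a toy illustration, not the EventView API: every name below (`select`, `remove_overlap`, `count_objects`, `run_chain`), the thresholds, and the event content are hypothetical. It chains a particle selection, an overlap removal based on the distance in (eta, phi), and an observable calculation that fills one ntuple row.

```python
import math


def delta_r(a, b):
    """Distance in (eta, phi), with the phi difference wrapped to [-pi, pi]."""
    dphi = math.remainder(a["phi"] - b["phi"], 2 * math.pi)
    return math.hypot(a["eta"] - b["eta"], dphi)


def select(min_pt):
    """Configurable tool: keep electrons and jets above a pt threshold."""
    def tool(view):
        view["electrons"] = [p for p in view["electrons"] if p["pt"] > min_pt]
        view["jets"] = [p for p in view["jets"] if p["pt"] > min_pt]
        return view
    return tool


def remove_overlap(max_dr):
    """Configurable tool: drop jets that lie within max_dr of a selected electron."""
    def tool(view):
        view["jets"] = [
            j for j in view["jets"]
            if all(delta_r(j, e) > max_dr for e in view["electrons"])
        ]
        return view
    return tool


def count_objects(view):
    """Observable calculation: fill one ntuple row with multiplicities."""
    view["ntuple"] = {"n_el": len(view["electrons"]), "n_jet": len(view["jets"])}
    return view


def run_chain(event, tools):
    """Apply the tools in order to a shallow copy of the event."""
    view = dict(event)
    for tool in tools:
        view = tool(view)
    return view


# Hypothetical event: the first jet is the electron reconstructed as a jet.
event = {
    "electrons": [{"pt": 40.0, "eta": 0.5, "phi": 1.0}],
    "jets": [
        {"pt": 42.0, "eta": 0.52, "phi": 1.01},
        {"pt": 60.0, "eta": -1.2, "phi": -2.0},
        {"pt": 8.0, "eta": 2.0, "phi": 0.3},
    ],
}

chain = [select(min_pt=20.0), remove_overlap(max_dr=0.3), count_objects]
row = run_chain(event, chain)["ntuple"]
print(row)
```

Because each tool takes and returns the same kind of object, an analysis is changed by editing the list of tools and their parameters rather than by rewriting the tools themselves.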
Analysis Frameworks: EventView (2)
Though this style of "modular" analysis usually does not require writing C++, the EventView framework is completely extensible, so if necessary users can easily develop and mix their own C++ tools with the common EventView tools and share their configurations and tools with other collaborators.
Most users are introduced to EventView through one of the "View" packages (eg TopView, SusyView, HighPtView) which for the most part collect configurations of EventView tools for a specific set of analyses and produce a standard ntuple output.
These users typically start by analyzing the View ntuples produced by the various physics working groups, and then go on to reconfigure and re-run the respective View package if they require additional tuning for their specific analyses.
There are also efforts to evolve (the persistent piece of) EventView in the context of AthenaROOTAccess.
We will practice with the tools during the tutorial.