[email protected] cms data analysis school@ntu, taiwan 11-15 sep 2012 introduction to cms software,...
TRANSCRIPT
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
Introduction to CMS Software, Computing Model and Analysis Tools
Sudhir Malik
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 2
Three parts• CMS Offline Software concepts and organization• Computing Model • Analysis Tools
Outline
CMSDAS pre-exercises• Taste of a bit of each of the above• Get you in CMSDAS-mode to understand physics objects and
perform physics analysis during the school
• Remarks
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 3
CMS Offline Software concepts and organization
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 4
• The full suite of CMS offline software is called CMSSW • Based on Event Data Model (EDM)• Used for online data taking, simulation, primary reconstruction, and physics analysis• The two thousand (or so) packages that make up the project are organized into sub- systems with names that should be suggestive of their purpose
CMSSW
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 5
Documentation• WorkBook - https://twiki.cern.ch/twiki/bin/view/CMS/WorkBook
• source of one-stop shopping organization, update frequently• initial starting point for people new to CMS software and computing• intended for people to read through it, and work out the
examples and tutorials it contains • CMS tutorials are found here
• SWGuide - https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuide• “industrial” strength shopping needs, expert level information• The software architecture, detailed descriptions of the algorithms, instructions
for analysis and validation• Not updated as frequently
• Reference Manual - http://cmssdt.cern.ch/SDT/doxygen/•Reference Manual - mixture of auto-generated, Doxygen and hand written pages, always check the last edit date…
• Glossary of acronyms - https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookGlossary
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 6
• A CMS Event* starts as a collection of the RAW data from a detector or MC event• As the event data is processed, products are stored in the Event as reconstructed (RECO) data objects• Event contains = triggered physics event + derived data + metadata(software config) + condition and calibration data• Products are stored as C++ containers
* single readout of the detector electronics and the signals that will (in general) have been generated by particles, tracks, energy deposits, present in a number of bunch crossings
CMS Event Data Model (EDM)
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 7
A Software Bus Model• Start from the raw or generated data, read from an input root file• Producers are scheduled to operate on the event data and produce their output which is written into the event• Because it’s modular you can inspect/debug the job at any point in the processing path. The contents of the event can be examined outside of the context of the process that made it.• The FW automatically tracks the provenance of what is produced.• Several instances of the same module can be run in the same application and you will still be able to uniquely identify their products, eg. JetProducer with different cone sizes. Identified by C++ type, producer label, instance name, process name
Modular architecture of FW
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 8
Concept of ProvenanceWhy should you care?Provenance helps answer the question “how was this root file made?”, and “Why does my plot look different then Joe’s,” or simply “How is this file made?”
All “tracked” parameters values used in the job that created the file are recorded in the file.
• Any result should be reproducible starting from• the input data file,• the plugins, and• the output provenance
Another source of provenance is the “Dataset Aggregation Service”,DAS, which stores the top level configuration for each file.
Others are: The DB global tag combined with ORCOFF, CVS, cmsTC
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 9
Framework modules Modules are configurable and communicate via the Event
• Source• Reads the event in a root file
• OutputModule • Write events to a file. Can use filter decisions
• EDAnalyzer (read)• Reading data only
• Creating histograms
• the standard use case
• EDProducer (read/write)• You want to create new products
• You want to share your reconstruction code with others
• EDFilter (read/write)• you want to know if an object could be produced
• you want to control the analysis flow or make skimming
• Non-Event Data - Geometry, Calibration, MessageLogger –• ESSource, ESProducer
• More info can be found at • https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookCMSSWFramework
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 10
What is stored in the event files?
In CMSSW a set of standard data formats is defined, they are collections of several products managed centrally in CMSSW• RAW
•Data like they come from the detector• RECO (Reconstruction):
•Output of the event reconstruction• AOD (Analysis Object Data):
• Subset of data needed for standard analysis• RAWSIM, RECOSIM, AODSIM:
• with additional simulation information
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 11
Input - Source, Output - OutputModule
process.source = cms.Source("PoolSource”,
fileNames = cms.untracked.vstring("file:input.root”)
)
INPUT
process.out = cms.OutputModule("PoolOutputModule", fileName = cms.untracked.string("output.root")
) OUTPUTProcess.out.outputCommands = cms.untracked.vstring(
"keep *","drop *_*_*_HLT", "keep FEDRawDataCollection_*_*_*”
)
Configurable Event Content:What to drop? What to keep?
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 12
Inspect configuration: the ConfigBrowser
Tree View Graphical representation
Property View Box
https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideConfigBrowser
You can inspect your config file using a graphical tool as well: the ConfigBrowser
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 13
What are the stored products?
vector<reco::MET> "tcMet" "" "RECO." vector<reco::Muon> "muons" "" "RECO." vector<reco::Muon> "muonsFromCosmics" "" "RECO." vector<reco::Muon> "muonsFromCosmics1Leg" "" "RECO." vector<reco::PFCandidate> "particleFlow" "" "RECO." vector<reco::PFCandidate> "particleFlow" "electrons” "RECO." vector<reco::PFJet> "ak5PFJets" "" "RECO."
edmDumpEventContent <filename>
Handle<reco::MuonCollection> muons; Event.getByLabel(”muons”,muons );
reco::MuonCollection is a typedef for vector<reco::Muon>
Access the singleProduct in the
framework module
C++ class type product alias label process name
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 14
Accessing Event Data
# by module and default product labelHandle<reco::MuonCollection> muons; iEvent.getByLabel(”muons", muons );
# by module and product labelHandle<vector<reco::PFCandidate> > particleFlow; iEvent.getByLabel("particleFlow", ”electrons" , particleFlow_electrons );
We can access the productsin the module using the Handle
Framework modules are written in C++ , you can find a basic C++ guide at:
https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookBasicCPlusPlus
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 15
CMSSW ReleaseCMSSW_4_2_X - Was the release used for prompt reconstruction for the majority of 2011. It struggled to process 2011B data with pileup of 16 or more. CMSSW_4_2_8_patch7 is the currently recommended analysis release for data and MC reconstructed with 4.2.x.
CMSSW_4_4_X - Fixed all of the technical performance problems, while keeping the same physics performance as 4_2_X. It was used to reprocess all of the 2011 data and MC at the end of data taking and for the prompt reconstruction of the 2011 HI run. CMSSW_4_4_4 is the currently recommended analysis release for data and MC reconstructed with 4.4.x.
CMSSW_5_2_X - is the high pileup data taking release for 2012. CMSSW_5_2_5 is the currently recommended Analysis Release for 2012 Data/MC.
CMSSW_6_1_X - is the integration release for the software being developed for the upgraded detector.
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookWhichRelease
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 16
CMS Computing model
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
Computing and Analysis Facilities
17
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
TT33TT33
Analysis in CMS is performed on a globally distributed collection of computing facilities
Tier 0 (T0) at CERN (20% of all CMS computing resources) • Primary reconstruction
• First archival copy of the data
• Only central processing, no user access
Tier 1 (T1) - each in 7 countries (40% of all CMS computing resources)
• Share of raw data for custodial storage
• Data reporocessing
• Skim data to reduce size make it more easily handleable
• Data selection
• Archive simulated data from Tier 2, no user access
Tier 2 (T2) ~ 50 Tier2s• Half devoted MC production, half user analysis• Physics Group data storage• Primary Resource for Analysis
Tier 3 (T3)• Entirely controlled by the providing institution• Used for analysis
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 18
The US has a Tier-1 Center at FNAL• FNAL is the largest Tier-1 center in CMS
The US has 7 Tier-2 computing facilities• Caltech, Florida, MIT, Nebraska, Purdue, UCSD, Wisconsin• All have at least 300 TB of disk
In addition FNAL has an analysis farm called the LPC CAF• Like a very large Tier-3 in terms of analysis• Not grid accessible
US Tier sites (example)
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 19
Data Storage
• In CMS jobs go to the data.• The challenging part is making sure the right data is distributed broadly
•30TB is identified for use by the local group• Local community controlled space
20TB is identified for storing user produced files and making them grid accessible
Tier-3s • Important analysis resource, entirely controlled by the community that provides them, minimum to be a registered Tier-3 is a PhEDEx instance for data Transfers• Currently over 50 Tier-3s registered, 22 in the US• A number of Tier-3s have a full grid interface and accept analysis jobs• Currently the Tier-3s can receive data from anywhere large variations in sizes
Tier-2s
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 20
Data Names/Management/Movement• Dataset convention is: /PrimaryDataset/AcquisitionEra-ProcVersion/DataTier
• PrimaryDataset: Physics channel based on triggers• AcquisitionEra: A period of data taking (BeamCommisioning09)• ProcVersion: Describes processing conditions, _v1, _v2• DataTier: What objects are in the dataset? RAW, RECO, AOD
• CMS data is divided into• Datasets - A group of events that can be accessed together• Data Blocks - A group of files that is tracked by the data transfer system• Logical file name (LFN) – A single file in a block, uniform name• Physical file name (PFN) – Actual file name on a site
• Search data - Dataset Aggregation Service (DAS) page - https://cmsweb.cern.ch/das/• Find data (talk to your physics group which data is relevant for your analysis)• Wildcard searches, select on software releases
• Data is also divided by Event Content (Tier)• RAW, RECO, AOD , may be skimmed and reduced
• Data Movement between Tiers in CMS is handled by a tool called PhEDEx• http://cmsdoc.cern.ch/cms/aprom/phedex/prod/Activity::RatePlots?view=globa• Any user can request data transfer, site managers approve the request
• For fetching a file or two, one can use Filemover• https://cmsweb.cern.ch/filemover, useful to develop analysis code, debug etc.,
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 21
CMS, Grid and CRAB• The GRID
• Interconnects world-wide computing resources (batch farms, storagesystems, ... ) in a transparent way•Provides every user with access to these resources
•Authorization, job submission, ...• CMS uses the GRID to provide data access for all collaborators.• CMS uses GRID certificates and a dedicated Virtual Organization (VO) management to have better access control for specific tasks/groups• You need a certificate for making Phedex requests, submitting jobs, and can use to access Indico and the TWiki• Your GRID certificate is important, follow all rules, don’t let it expire!• CRAB enables the user to submit CMSSW jobs to all CMS datasets within the data- location driven CMS computing structure
• Aims to hide as much of the complexity of the GRID as possible from the end user• Provides a user front-end to
• Create and manage proxies from your certificate• Split user jobs into manageable pieces• Transport user analysis code to the data location for execution (compiled on submitting node)• Execute user jobs, check status and retrieve output
• CRAB Support – https://hypernews.cern.ch/HyperNews/CMS/get/crabFeedback• CRAB documentation – SWGuideCrab, SWGuideCrabFaq
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 22
Analysis Tools and Physics Analysis Toolkit
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 23
Development Comprehension
Execution
Seek to Minimize steps in the loop
Analysis Workflow
Analysis Tools are there to help you
• Configuration tools• Distribution kit
• Full Framework• FWLite• Grid• PAT
• Fireworks• FWLite• Validated Code
PATSelectorsPOG/PAG Software
•Validate physics results
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 24
Getting Started With an Analysis
What do you want to do? What do you really(!) have to do to achieve this?
• Selecting events.• Understanding data.• Understanding corrections.• Convincing others of the results• Writing notes and papers!
• Writing your own histogram plotting tool• Writing/maintaining your own ntupelizer• Convincing others about your variable definitions
Physicsis is not
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 25
Analysis in CMSSW• Typical case – someone hands you code, you select, modify, use it• If you are an experienced HEP • person, you are used to TTree analyzer• At CMS we have the same concept, not more difficult• We have CMS specific way of reading and writing Ttrees
• same semantics, different syntax
CMSSW is exactly what you are used to, only geared towards CMS
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 26
Guideline for s/w code used in CMS
We even have an official document containing some guidelines for you:https://cms-physics.web.cern.ch/cms-physics/Analysis-Code-guidelines.pdfhttps://twiki.cern.ch/twiki/bin/view/CMS/Internal/Publications
• Stick to event provenance!• Leave the EDM as late as possible (if at all)• Use official tools/code where possible.• Use PAT (as this includes already the first three points)
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 27
The Event
root -l
[] TFile f(“AOD.root”)
[] new TBrowser()
• Data inside the event are called “Product” • moduleLabel : productInstanceLabel : processName
• Example: recoTracks_generalTracks_RECO
• Each Event data file is made of several independent ROOT trees • one for each object class
• Can be inspected with ROOT
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 28
• Connected via 'SmartPointer' relations • edm::Ref, edm::Ptr, …
• Strong hierarchy• There is no direct connection between basic objects and super structures
• Composed object with minimal space overhead• Central structure:
• No connection between basic objects themselves• TTrees have provenance
• can tell what happened to the file without having access to configuration file• edmProvDump and edmProvDiff can save you days, weeks, months!!
The TTree Objects
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 29
Events can be visualized
In CMS there are visualization tools, one of them is:Fireworks is the light weight event display for analysis. It can be installed on your laptop. You can find it at: https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookFireworks
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 30
The Event Content• Tiered data structure: RECO / AOD• View the event content as a complicated (grown) organic structure • Add/keep/drop trees to adapt content and size • some decisions have already have been taken for you
• via the data tier definitions
• Different levels of granularity (resemble the order of creation)
RECO Tier
AOD Tier The data tier guarantees what information will be available
The minimum I want to know
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 31
FWLite: A light Version of EDMSometimes people think the full EDM framework (FWFull) is a bit heavy…Even for those people CMS has something to offer:
• Python configuration, edm::Handle, TFileService, data access equivalent to EDM• Also PAT is fully compliant with (and even especially supports) FWLite.• NO writing to the event content!• Full framework ↔ FWLite: This is NOT an exclusive OR!• Have a look at: WorkBookFWLite• AT provides BasicAnalyzer class and a wrapper Classes to transform this class into an EDAnalyzer or a FWLiteAnalyzer executable ( see twiki WorkBookFWLiteExamples)
FWLite: This is bare Root with known data formats (with the same performance!)
Typical FWLite config file
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 32
Typical CMS Analysis Workflow
• Prompt reconstruction at Tier-0.• Central skims at Tier-1's.• Users run cmsRun at Tier-2's:
• Perform high level analysis steps.• Preselect events.• Write their own user defined• EventContent to private T2/T3 space
• The latter step might be iterated.• Copy reduced datasets to your favorite machine.• Run your final analysis/produce plots
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 33
• Use cmsRun with crab• Use cmsRun on batch systems or FWLite• Use cmsRun to write persistent datasets to your disc space and for more complex analysis tasks• Write PAT Tuples• Use FWLite for testing/interactive analysis on a complexity level comparable with complex analysis tasks• Write flat n-Tuples if you really think that you need them. Use EDM-ntuples instead of bare root TTrees!• Rather have at look at: SWGuideEDMNtuples
PAT helps you create a user-definedEventContent
AT Recommendation
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 34
Difficulties with the EDM
The EDM bares two main problems from the analysist's point of view:
• Calculation/retrieval of high level analysis information (complicated pointer arithmetic! what do I need? Where do I find it?)
• Reduction only to the relevant high level analysis information (where is the dropped data used throughout the event?)
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 35
Analyst's Issues• I want use muons only. I dropped everything else from the event. Now I cannot access the 2/ndof?• I found out that the GsfElectron collection consists of many objects but only 10% can be real. There must be some electronID, how can I find and apply it?• I found out that CaloJets are uncorrected. How can I apply the JetMET calibration for the absolute JES to them? Do I need DB access for that?• I want to match reco objects to generated particles. I want to write my own matching. Is there an official recommendation for algorithms and parameters?• I want to match trigger muons to offline reconstructed muons. How do I do this?• How to reduce the event content?
• Skim events (keep dijects with pt > 50?), dropping objects (no muons,, keep jets), reducing object size (keep 4-vector, drop the rest)
PAT comes in here and help you
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 36
What is the Physics Analysis Toolkit
PAT is a toolkit that is an integral part of the CMSSW framework
• It is an interface between the sometimes complicated EDM and the simple mind of the common user.
• It serves as well tested and supported common ground for group and user analyses
• It facilitates reproducibility and comprehensibility of analyses
• You can view it as a common language between CMS analysts
• If another CMS analyst describes you a PAT analysis you can easily know what he/she is talking about
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 37
Three aspects of PAT
Interface Common Tool
Common Format
• b/w RECO expertise & Analysis Level contacts• simplifies access via DataFormats• canalizes expertise (via POG & PAG• crossing point between POGs & PAGs ('vertical integration')
• approved algorithms & sensible defaults• synergy (everybody can profit from recent developments)• quick start into analysis for the beginners
• facilitates transfer & comparisons• PAG common configurations• sustained provenance
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 38
The PAT Data Formats
• All pat::Objects inherit from their corresponding reco::RecoCandidates• A PAT Candidate is a reco::RecoCandidate + more
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 39
Combine Flexibility and User Friendliness
Flexibility User Friendliness
Maximal Configurability
• You can choose yourself whether you really need all the extra information that the PAT Candidates provide• Still you don't need to know, how EDM/PAT manages this access for you under the hood
• The key is: configuration of DataFormats by cfi file! (e.g. for pat::Jets)
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 40
Configuration of PAT DataFormatsYou can configure the content of the DataFormats yourself (example: pat::Jet)!
Size: 14 kb/event ( for ttbar)
• This can be slimmed even further
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 41
The concept of Maximal Configuration
DataFormatsobject embedding
EventContentpat event content
WorkFlowsworkflow tools
Selectionspat selectors
• Configure your own DataFormats via embedding
• Add any extra info you need to the EventContent
• Apply selections via the StringCutParser
• Configure your workflow via tools that PAT provides
Maximal Configurability
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 42
• Particle flow (PF) provides a full unambiguous reconstructed particle based event interpretation, more sophisticated ways to do objection disambiguation
• PF is plugged in PAT as follows and is called PF2PAT
• SWGuidePF2PAT/WorkBookPF2PAT
Particle Flow
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 43
• Can run PF2PAT and standard PAT in parallel• Example, standard PAT muons will still be stored in patMuons and the muons found by PF2PAT will be stored in patMuonsPOSTFIX
PF2PAT
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 44
The Code Location
• DataFormats/PatCandidates• Definition of all PAT Candidates. • pat::Photon, pat::Electron, pat::Muon, pat::Tau, pat::Jet, pat::MET, …
• PhysicsTools/PatAlgos• Implementation and filling of all data formats.• Definition of common workflow and PAT tools
• PhysicsTools/PatUtils• Definition of common tools and helper functions used in PatAlgos
• PhysicsTools/PatExamples• Location of many examples e.g. all non-trivial examples used during this Tutorial
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 45
Development
Have a look at: SWGuidePATRecipes
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 46
Support for AT
Check the the main entry page of PAT in the software guide: SWGuidePAT
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 47
PAT Documentation
• SWGuidePAT/WorkBookPAT Main documentation pages• WorkBookPATDataFormats Description of all PAT Candidate• WorkBookPATWorkflow Description of the PAT workflow• WorkBookPATConfiguration Description of the configuration of PAT• SWGuidePATTools Description of all PAT tools• WorkBookPATTutorial Tutorials and examples to get started• SWGuidePATRecipes Installation recipes• SWGuidePATEventSize Tools for event size estimate• SWGuidePF2PAT/WorkBookPF2PAT Main documentation PF2PAT
• WorkBookPATTutorial PAT Tutorial
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 48
Concluding remarks• You have experienced a pre-view of these tools, concepts doing CMSDAS pre-exercises• Use the above to understand the physics objects and do physics analysis during this school
• SHORT and LONG exercises• Participate in:
• PAT tutorials to know about PAT and PF2PAT• CRAB, Fireworks, DAS tutorials etc.• CMS Data Analysis School(s)
• Contribute to physics tools and tutorials – service credit given• https://twiki.cern.ch/twiki/bin/view/CMS/Tutorials• [email protected] (analysis tools releated – PAT, PF2PAT,tag and probe etc.)• [email protected] (crab related)• Analysis tools meetings (announced in the hn above, link to all, below)
• https://twiki.cern.ch/twiki/bin/view/CMS/AnalysisToolsMeetings
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 49
BACKUP
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 50
CMS Event Data Model (EDM) II• The CMS software CMSSW is based on ROOT• All CMS EDM output files, from RAW data files just from the HLT to analysis PATtuples are proper ROOT files• System is designed so that the same root file can be used in three contexts:
• cmsRun - input for a full framework application, could be used for a reprocessing pass or creation of more refined analysis objects, like PAT objects.• FWLite – a small set of CMS defined loadable libraries to allow more sophisticated use with root, for the EDM this is a read-only application.• Bare root.exe –allows browsing of simple objects (floats, ints, and composites of them) from the root TBrowser GUI is the ONLY recommended use case
• Pros • Single File Structure, Reconstruction of detector object, physics objects (POGs)• Physics Analysis ( PAGs)• Track event from initial recording to “final” paper - Provenance
• Challenge • strong software requirements, flexible, easy to extend, not be space expensive
• More info can be found at WorkBookCMSSWFrameworkWorkBookAnalysisOverviewIntroduction
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 51
• Applications in CMSSW are built from special shared object libraries called plugins. In practice this means:
• cmsRun <some-configuration-file>• Configurations are written in the Python language.• There are two types of plugins:• Module Plugins
• EDAnalyzers: e.g., making histograms• EDProducers: e.g., making tracks from hits and saving tracks in event• EDFilters: e.g., Skimming - Deciding or not to keep a given event• Others: ESSources, ESProducers.• ED* process event data, ES* process event setup data
• Data Object Plugins – also known as “root dictionaries” as they can also be loaded directly into the root application. These are most of the products of the above work, and form the elements of the EDM
• cmsRun workflows can mix standard plugins with user defined ones
Concept of Framework (FW)
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 52
Workflow MC Workflow
The set of processes that both MC and collision data flow through
Collision data
Scientific analysis, simulation requires theprocessing and generation of millions events,calibration, skims, express analyses
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 53
Integration Build (IB)
http://cmssdt.cern.ch/SDT/html/showIB.html
IB - Tests that CMSSW code compiles, standard jobs run without errors - do some code quality and run quality testscmsDriver.py is a utility used to generate a full configuration file, which can be run with cmsRun for the above tests and production quality configuration files. If you generate MC events you will come across it too (WorkBookGeneration)
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 54
Framework Subtleties: Example• I want to use muons only. I dropped everything else from the event. Now I cannot access the 2/ndof?
• You dropped the generalTracks collection from the event content, did not consider that reco::Muon internally points to the inner track points in generalTracks collection it was created from
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 55
Facilitated Access to Event Information
• Do you know how to access this event information within the EDM?
• With PAT Candidates you get this just by calling member functions!• Note: Each PAT Candidate IS a corresponding reco::RecoCandidate (+ more)
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 56
EventContent of the default PAT Tuple
• A default patTemplate_cfg.py:
• But decide yourself how your PAT Tuple should look like (add reco::Tracks or reco::GenParticles to the Event Content or BTag information to the jets, etc ... )
Size: 14kb/event (for ttbar)
[email protected] CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 57
PAT Candidate Member Functions
Check the Documentation: WorkBookPATDataFormats