
Malik@fnal.gov, CMS Data Analysis School@NTU, Taiwan, 11-15 Sep 2012

Introduction to CMS Software, Computing Model and Analysis Tools

Sudhir Malik


Outline

Three parts
• CMS Offline Software concepts and organization
• Computing Model
• Analysis Tools

CMSDAS pre-exercises
• A taste of each of the above
• Get you into CMSDAS mode to understand physics objects and perform physics analysis during the school

• Remarks


CMS Offline Software concepts and organization


CMSSW

• The full suite of CMS offline software is called CMSSW
• Based on the Event Data Model (EDM)
• Used for online data taking, simulation, primary reconstruction, and physics analysis
• The two thousand (or so) packages that make up the project are organized into subsystems with names that should be suggestive of their purpose


Documentation

• WorkBook - https://twiki.cern.ch/twiki/bin/view/CMS/WorkBook
  • One-stop shopping source, updated frequently
  • Initial starting point for people new to CMS software and computing
  • Intended for people to read through it and work out the examples and tutorials it contains
  • CMS tutorials are found here
• SWGuide - https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuide
  • "Industrial" strength shopping needs, expert level information
  • The software architecture, detailed descriptions of the algorithms, instructions for analysis and validation
  • Not updated as frequently
• Reference Manual - http://cmssdt.cern.ch/SDT/doxygen/
  • Mixture of auto-generated Doxygen pages and hand-written pages; always check the last edit date
• Glossary of acronyms - https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookGlossary


CMS Event Data Model (EDM)

• A CMS Event* starts as a collection of the RAW data from a detector or MC event
• As the event data is processed, products are stored in the Event as reconstructed (RECO) data objects
• The Event contains: triggered physics event + derived data + metadata (software configuration) + condition and calibration data
• Products are stored as C++ containers

* An Event is a single readout of the detector electronics, together with the signals that will (in general) have been generated by particles, tracks, and energy deposits present in a number of bunch crossings


Modular architecture of FW

A Software Bus Model
• Start from the raw or generated data, read from an input ROOT file
• Producers are scheduled to operate on the event data and produce their output, which is written into the event
• Because the framework (FW) is modular, you can inspect/debug the job at any point in the processing path. The contents of the event can be examined outside the context of the process that made it.
• The FW automatically tracks the provenance of what is produced.
• Several instances of the same module can run in the same application and you will still be able to uniquely identify their products, e.g. JetProducer with different cone sizes. Products are identified by C++ type, producer label, instance name, and process name
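The four-part product identification above can be illustrated with a toy model in plain Python. This is not CMSSW code: `ToyEvent`, `put`, `get`, and `toy_jet_producer` are hypothetical names, and the cone sizes are illustrative values. The sketch only shows why a key made of (type, producer label, instance name, process name) keeps the products of two instances of the same producer distinct.

```python
from typing import Any, Dict, Tuple

# Key mirrors the four identifiers named on the slide.
ProductKey = Tuple[str, str, str, str]  # (type, module label, instance, process)


class ToyEvent:
    """Toy stand-in for the event: modules communicate only through it."""

    def __init__(self, process_name: str) -> None:
        self.process_name = process_name
        self._products: Dict[ProductKey, Any] = {}
        self.provenance: Dict[ProductKey, dict] = {}

    def put(self, type_name, label, product, instance="", parameters=None):
        key = (type_name, label, instance, self.process_name)
        if key in self._products:
            raise KeyError(f"product {key} already exists")
        self._products[key] = product
        # Toy provenance: remember the parameters that made this product.
        self.provenance[key] = dict(parameters or {})

    def get(self, type_name, label, instance="", process=None):
        key = (type_name, label, instance, process or self.process_name)
        return self._products[key]


def toy_jet_producer(event, label, cone_size):
    """Hypothetical producer: writes one 'jet' product into the event."""
    jets = [f"jet(R={cone_size})"]
    event.put("Jets", label, jets, parameters={"coneSize": cone_size})


# Two instances of the same producer, distinguished by their labels.
event = ToyEvent(process_name="RECO")
toy_jet_producer(event, "smallConeJets", 0.5)
toy_jet_producer(event, "largeConeJets", 0.7)
```

Both jet collections coexist in the toy event because their labels differ, and the recorded parameters play the role of provenance for each one.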


Concept of Provenance

Why should you care?
Provenance helps answer questions such as "How was this ROOT file made?" and "Why does my plot look different than Joe's?"

All "tracked" parameter values used in the job that created the file are recorded in the file.

• Any result should be reproducible starting from
  • the input data file,
  • the plugins, and
  • the output provenance

Another source of provenance is the Data Aggregation System (DAS), which stores the top-level configuration for each file.

Others are: the DB global tag combined with ORCOFF, CVS, cmsTC


Framework modules

Modules are configurable and communicate via the Event

• Source
  • Reads the event from a ROOT file
• OutputModule
  • Writes events to a file. Can use filter decisions
• EDAnalyzer (read)
  • Reading data only
  • Creating histograms
  • The standard use case
• EDProducer (read/write)
  • You want to create new products
  • You want to share your reconstruction code with others
• EDFilter (read/write)
  • You want to know if an object could be produced
  • You want to control the analysis flow or do skimming
• Non-Event Data (Geometry, Calibration, MessageLogger)
  • ESSource, ESProducer
• More info can be found at
  • https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookCMSSWFramework


What is stored in the event files?

In CMSSW a set of standard data formats is defined. They are collections of several products, managed centrally in CMSSW
• RAW
  • Data as they come from the detector
• RECO (Reconstruction)
  • Output of the event reconstruction
• AOD (Analysis Object Data)
  • Subset of data needed for standard analysis
• RAWSIM, RECOSIM, AODSIM
  • With additional simulation information


Input - Source, Output - OutputModule

    import FWCore.ParameterSet.Config as cms

    process = cms.Process("USER")  # process name chosen for illustration

INPUT

    process.source = cms.Source("PoolSource",
        fileNames = cms.untracked.vstring("file:input.root")
    )

OUTPUT

    process.out = cms.OutputModule("PoolOutputModule",
        fileName = cms.untracked.string("output.root")
    )

Configurable Event Content: What to drop? What to keep?

    process.out.outputCommands = cms.untracked.vstring(
        "keep *",
        "drop *_*_*_HLT",
        "keep FEDRawDataCollection_*_*_*"
    )
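The effect of an ordered keep/drop list can be sketched in plain Python with shell-style wildcards. This is a toy model, not the framework's implementation: `apply_output_commands` is a hypothetical helper, the branch names are made up for illustration, and the sketch assumes that every branch starts as dropped and that the last matching command decides.

```python
from fnmatch import fnmatchcase


def apply_output_commands(branches, commands):
    """Return the branches kept after applying keep/drop commands in order.

    Assumptions of this sketch: a branch starts as dropped, and the last
    command whose pattern matches the branch name decides its fate.
    """
    kept = []
    for branch in branches:
        keep = False
        for command in commands:
            verb, pattern = command.split()
            if verb not in ("keep", "drop"):
                raise ValueError(f"unknown command: {command!r}")
            if fnmatchcase(branch, pattern):
                keep = (verb == "keep")
        if keep:
            kept.append(branch)
    return kept


# Illustrative branch names in type_module_instance_process form.
example_branches = [
    "recoMuons_muons__RECO",
    "edmTriggerResults_TriggerResults__HLT",
    "FEDRawDataCollection_rawDataCollector__HLT",
]
example_commands = [
    "keep *",
    "drop *_*_*_HLT",
    "keep FEDRawDataCollection_*_*_*",
]
kept_branches = apply_output_commands(example_branches, example_commands)
```

With the three commands from the slide, everything from the HLT process is dropped except the raw data collection, which the final, more specific `keep` brings back.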


Inspect configuration: the ConfigBrowser

You can inspect your config file using a graphical tool as well: the ConfigBrowser

[Screenshot: ConfigBrowser showing the Tree View, the Graphical representation, and the Property View Box]

https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideConfigBrowser


What are the stored products?

edmDumpEventContent <filename>

    vector<reco::MET>          "tcMet"                 ""           "RECO."
    vector<reco::Muon>         "muons"                 ""           "RECO."
    vector<reco::Muon>         "muonsFromCosmics"      ""           "RECO."
    vector<reco::Muon>         "muonsFromCosmics1Leg"  ""           "RECO."
    vector<reco::PFCandidate>  "particleFlow"          ""           "RECO."
    vector<reco::PFCandidate>  "particleFlow"          "electrons"  "RECO."
    vector<reco::PFJet>        "ak5PFJets"             ""           "RECO."

Columns: C++ class type, module label, product instance label, process name

Access a single product in the framework module:

    Handle<reco::MuonCollection> muons;
    iEvent.getByLabel("muons", muons);

reco::MuonCollection is a typedef for vector<reco::Muon>
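A listing like the one on this slide can be split into its four fields with a few lines of Python. This is a minimal sketch under stated assumptions: `ProductEntry` and `parse_dump_line` are hypothetical names, the line layout is taken from the listing as shown here (a type followed by three quoted strings), and types containing spaces are not handled.

```python
import shlex
from typing import NamedTuple


class ProductEntry(NamedTuple):
    cpp_type: str
    module_label: str
    instance_label: str
    process_name: str


def parse_dump_line(line):
    """Split one listing line into (type, module, instance, process).

    Assumes the C++ type contains no whitespace and that the three
    labels are double-quoted, as in the listing on the slide.
    """
    fields = shlex.split(line)
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    cpp_type, module_label, instance_label, process_name = fields
    # The listing shows a trailing dot after the process name; strip it.
    return ProductEntry(cpp_type, module_label, instance_label,
                        process_name.rstrip("."))


pf_electrons = parse_dump_line(
    'vector<reco::PFCandidate> "particleFlow" "electrons" "RECO."'
)
```

The module label and instance label recovered this way are the same strings that are passed to `getByLabel` in a framework module.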


Accessing Event Data

We can access the products in the module using the Handle

    // by module label and default product instance label
    Handle<reco::MuonCollection> muons;
    iEvent.getByLabel("muons", muons);

    // by module label and product instance label
    Handle<vector<reco::PFCandidate> > particleFlow_electrons;
    iEvent.getByLabel("particleFlow", "electrons", particleFlow_electrons);

Framework modules are written in C++. You can find a basic C++ guide at:

https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookBasicCPlusPlus


CMSSW Release

CMSSW_4_2_X - The release used for prompt reconstruction for the majority of 2011. It struggled to process 2011B data with pileup of 16 or more. CMSSW_4_2_8_patch7 is the currently recommended analysis release for data and MC reconstructed with 4.2.x.

CMSSW_4_4_X - Fixed all of the technical performance problems while keeping the same physics performance as 4_2_X. It was used to reprocess all of the 2011 data and MC at the end of data taking and for the prompt reconstruction of the 2011 HI run. CMSSW_4_4_4 is the currently recommended analysis release for data and MC reconstructed with 4.4.x.

CMSSW_5_2_X - The high pileup data taking release for 2012. CMSSW_5_2_5 is the currently recommended analysis release for 2012 data/MC.

CMSSW_6_1_X - The integration release for the software being developed for the upgraded detector.

https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookWhichRelease


CMS Computing model


Computing and Analysis Facilities

[Figure: diagram of the distributed computing sites; only repeated "T3" (Tier-3) labels survive in this transcript]

Analysis in CMS is performed on a globally distributed collection of computing facilities

Tier 0 (T0) at CERN (20% of all CMS computing resources)
• Primary reconstruction
• First archival copy of the data
• Only central processing, no user access

Tier 1 (T1) - one in each of 7 countries (40% of all CMS computing resources)
• Share of raw data for custodial storage
• Data reprocessing
• Skim data to reduce its size and make it easier to handle
• Data selection
• Archive simulated data from Tier 2
• No user access

Tier 2 (T2) - about 50 Tier-2s
• Half devoted to MC production, half to user analysis
• Physics Group data storage
• Primary resource for analysis

Tier 3 (T3)
• Entirely controlled by the providing institution
• Used for analysis


US Tier sites (example)

The US has a Tier-1 center at FNAL
• FNAL is the largest Tier-1 center in CMS

The US has 7 Tier-2 computing facilities
• Caltech, Florida, MIT, Nebraska, Purdue, UCSD, Wisconsin
• All have at least 300 TB of disk

In addition FNAL has an analysis farm called the LPC CAF
• Like a very large Tier-3 in terms of analysis
• Not grid accessible


Data Storage

• In CMS, jobs go to the data.
• The challenging part is making sure the right data is distributed broadly

Tier-2s
• 30 TB is identified for use by the local group
  • Local community controlled space
• 20 TB is identified for storing user produced files and making them grid accessible

Tier-3s
• Important analysis resource, entirely controlled by the community that provides them; the minimum to be a registered Tier-3 is a PhEDEx instance for data transfers
• Currently over 50 Tier-3s registered, 22 in the US
• A number of Tier-3s have a full grid interface and accept analysis jobs
• Currently the Tier-3s can receive data from anywhere
• Large variations in size


Data Names/Management/Movement

• Dataset convention is: /PrimaryDataset/AcquisitionEra-ProcVersion/DataTier
  • PrimaryDataset: Physics channel based on triggers
  • AcquisitionEra: A period of data taking (e.g. BeamCommissioning09)
  • ProcVersion: Describes processing conditions (_v1, _v2)
  • DataTier: What objects are in the dataset? RAW, RECO, AOD
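The naming convention above can be made concrete with a small parser. This is a sketch, not an official CMS tool: `DatasetName` and `parse_dataset_name` are hypothetical names, the example dataset string is made up for illustration, and the sketch assumes the acquisition era itself contains no hyphen so that the first hyphen separates it from the processing version.

```python
from typing import NamedTuple


class DatasetName(NamedTuple):
    primary_dataset: str
    acquisition_era: str
    processing_version: str
    data_tier: str


def parse_dataset_name(path):
    """Split /PrimaryDataset/AcquisitionEra-ProcVersion/DataTier into parts."""
    parts = path.split("/")
    # A valid name has a leading slash and exactly three non-empty fields.
    if len(parts) != 4 or parts[0] != "" or not all(parts[1:]):
        raise ValueError(f"not a three-part dataset name: {path!r}")
    primary, processed, tier = parts[1:]
    # Assumption: the first hyphen separates era from processing version.
    era, separator, version = processed.partition("-")
    if not separator or not version:
        raise ValueError(f"missing processing version in {processed!r}")
    return DatasetName(primary, era, version, tier)


# Illustrative name only; not taken from a real catalogue query.
example = parse_dataset_name("/MinimumBias/BeamCommissioning09-v1/RECO")
```

Keeping the four fields separate makes it easy to, for example, select all datasets of one data tier or one acquisition era from a list of names.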

• CMS data is divided into
  • Datasets - A group of events that can be accessed together
  • Data Blocks - A group of files that is tracked by the data transfer system
  • Logical file name (LFN) - A single file in a block, uniform name
  • Physical file name (PFN) - Actual file name on a site
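The difference between a logical and a physical file name can be sketched as a site-specific translation. The function below is a toy: `lfn_to_pfn` is a hypothetical helper, the site prefixes are invented, and real sites define their own mapping rules, which need not be a simple prefix. The only CMS convention relied on is that logical file names begin with `/store/`.

```python
def lfn_to_pfn(lfn, site_prefix):
    """Turn a uniform logical file name into a site-specific physical name.

    Assumption of this sketch: the site mapping is a plain prefix.
    """
    if not lfn.startswith("/store/"):
        raise ValueError(f"logical file names start with /store/: {lfn!r}")
    return site_prefix.rstrip("/") + lfn


# The same LFN resolves to different PFNs at two made-up sites.
lfn = "/store/data/example/file.root"
pfn_site_a = lfn_to_pfn(lfn, "/pnfs/site-a.example.org/cms/")
pfn_site_b = lfn_to_pfn(lfn, "root://se.site-b.example.org/cms")
```

The point of the indirection is that analysis configurations can refer to the LFN everywhere, and each site resolves it to wherever the file actually lives.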

• Search data - Data Aggregation System (DAS) page - https://cmsweb.cern.ch/das/
  • Find data (ask your physics group which data is relevant for your analysis)
  • Wildcard searches, select on software releases
• Data is also divided by Event Content (Tier)
  • RAW, RECO, AOD; may be skimmed and reduced
• Data movement between Tiers in CMS is handled by a tool called PhEDEx
  • http://cmsdoc.cern.ch/cms/aprom/phedex/prod/Activity::RatePlots?view=globa
  • Any user can request a data transfer; site managers approve the request
• For fetching a file or two, one can use Filemover
  • https://cmsweb.cern.ch/filemover, useful for developing and debugging analysis code


CMS, Grid and CRAB

• The GRID
  • Interconnects world-wide computing resources (batch farms, storage systems, ...) in a transparent way
  • Provides every user with access to these resources
    • Authorization, job submission, ...
• CMS uses the GRID to provide data access for all collaborators.
• CMS uses GRID certificates and a dedicated Virtual Organization (VO) management to have better access control for specific tasks/groups
• You need a certificate for making PhEDEx requests and submitting jobs, and you can use it to access Indico and the TWiki
• Your GRID certificate is important: follow all rules, don't let it expire!
• CRAB enables the user to submit CMSSW jobs to all CMS datasets within the data-location-driven CMS computing structure
  • Aims to hide as much of the complexity of the GRID as possible from the end user
  • Provides a user front-end to
    • Create and manage proxies from your certificate
    • Split user jobs into manageable pieces
    • Transport user analysis code to the data location for execution (compiled on the submitting node)
    • Execute user jobs, check status and retrieve output
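"Split user jobs into manageable pieces" can be pictured as cutting one large task into event ranges. The sketch below is an illustration of that idea only and says nothing about how CRAB is implemented; `split_events` is a hypothetical helper and the numbers are arbitrary.

```python
def split_events(total_events, events_per_job):
    """Return (first_event, n_events) pairs covering total_events."""
    if total_events < 0 or events_per_job <= 0:
        raise ValueError("need total_events >= 0 and events_per_job > 0")
    jobs = []
    first = 0
    while first < total_events:
        # The last job takes whatever is left over.
        n_events = min(events_per_job, total_events - first)
        jobs.append((first, n_events))
        first += n_events
    return jobs


jobs = split_events(total_events=2500, events_per_job=1000)
```

Each pair could then be handed to an independent job, and the outputs merged once all jobs have finished.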

• CRAB Support - https://hypernews.cern.ch/HyperNews/CMS/get/crabFeedback
• CRAB documentation - SWGuideCrab, SWGuideCrabFaq


Analysis Tools and Physics Analysis Toolkit


Analysis Workflow

Analysis tools are there to help you. The workflow is a loop of Development, Execution, and Comprehension: seek to minimize the steps in the loop.

• Configuration tools
• Distribution kit

• Full Framework
• FWLite
• Grid
• PAT

• Fireworks
• FWLite
• Validated Code: PAT, Selectors, POG/PAG Software
• Validate physics results


Getting Started With an Analysis

What do you want to do? What do you really(!) have to do to achieve this?

Physics is
• Selecting events
• Understanding data
• Understanding corrections
• Convincing others of the results
• Writing notes and papers!

Physics is not
• Writing your own histogram plotting tool
• Writing/maintaining your own ntuplizer
• Convincing others about your variable definitions


Analysis in CMSSW

• Typical case: someone hands you code; you select, modify, and use it
• If you are an experienced HEP person, you are used to a TTree analyzer
• At CMS we have the same concept, and it is not more difficult
• We have a CMS-specific way of reading and writing TTrees
  • same semantics, different syntax

CMSSW is exactly what you are used to, only geared towards CMS


Guidelines for s/w code used in CMS

We even have an official document containing some guidelines for you:
https://cms-physics.web.cern.ch/cms-physics/Analysis-Code-guidelines.pdf
https://twiki.cern.ch/twiki/bin/view/CMS/Internal/Publications

• Stick to event provenance!
• Leave the EDM as late as possible (if at all)
• Use official tools/code where possible.
• Use PAT (as this already includes the first three points)


The Event

• Data inside the event are called "Products"
  • moduleLabel : productInstanceLabel : processName
  • Example: recoTracks_generalTracks__RECO (the product instance label is empty here, hence the double underscore)
• Each Event data file is made of several independent ROOT trees
  • one for each object class
• Can be inspected with ROOT

    root -l
    [] TFile f("AOD.root")
    [] new TBrowser()
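The product naming scheme on this slide can be unpacked with a short helper. This is a sketch under stated assumptions: `split_branch_name` is a hypothetical function, and it assumes a branch name has exactly four underscore-separated fields (type, module label, product instance label, process name) with no underscores inside any field, so an empty instance label shows up as two consecutive underscores.

```python
def split_branch_name(branch):
    """Split type_module_instance_process into a dict of its four fields."""
    # Tolerate a trailing dot, which some listings append to branch names.
    fields = branch.rstrip(".").split("_")
    if len(fields) != 4:
        raise ValueError(f"expected 4 underscore-separated fields: {branch!r}")
    keys = ("type", "module_label", "instance_label", "process_name")
    return dict(zip(keys, fields))


tracks = split_branch_name("recoTracks_generalTracks__RECO")
```

Reading the fields back this way is a quick check of which module and which process produced a branch seen in the TBrowser.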


The TTree Objects

• Connected via 'SmartPointer' relations
  • edm::Ref, edm::Ptr, …
• Strong hierarchy
  • There is no direct connection between basic objects and super structures
• Composed objects with minimal space overhead
• Central structure:
  • No connection between basic objects themselves
• TTrees have provenance
  • You can tell what happened to the file without having access to the configuration file
  • edmProvDump and edmProvDiff can save you days, weeks, months!!


Events can be visualized

CMS has several visualization tools. One of them is Fireworks, the lightweight event display for analysis. It can be installed on your laptop. You can find it at: https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookFireworks


The Event Content

• Tiered data structure: RECO / AOD
• View the event content as a complicated (grown) organic structure
• Add/keep/drop trees to adapt content and size
• Some decisions have already been taken for you
  • via the data tier definitions
• Different levels of granularity (resembling the order of creation)

The data tier guarantees what information will be available

[Figure: RECO Tier and AOD Tier; "The minimum I want to know"]


FWLite: A Light Version of EDM

Sometimes people think the full EDM framework (FWFull) is a bit heavy… Even for those people CMS has something to offer:

FWLite: this is bare ROOT with known data formats (with the same performance!)

• Python configuration, edm::Handle, TFileService, data access equivalent to EDM
• PAT is also fully compliant with (and even especially supports) FWLite.
• NO writing to the event content!
• Full framework ↔ FWLite: This is NOT an exclusive OR!
• Have a look at: WorkBookFWLite
• AT provides a BasicAnalyzer class and wrapper classes to transform this class into an EDAnalyzer or a FWLiteAnalyzer executable (see TWiki WorkBookFWLiteExamples)

[Figure: typical FWLite config file]


Typical CMS Analysis Workflow

• Prompt reconstruction at Tier-0.
• Central skims at Tier-1s.
• Users run cmsRun at Tier-2s:
  • Perform high level analysis steps.
  • Preselect events.
  • Write their own user-defined EventContent to private T2/T3 space
• The latter step might be iterated.
• Copy reduced datasets to your favorite machine.
• Run your final analysis/produce plots


AT Recommendation

• Use cmsRun with CRAB
• Use cmsRun on batch systems or FWLite
• Use cmsRun to write persistent datasets to your disk space and for more complex analysis tasks
• Write PAT Tuples
• Use FWLite for testing/interactive analysis on a complexity level comparable with complex analysis tasks
• Write flat n-tuples only if you really think that you need them. Use EDM-ntuples instead of bare ROOT TTrees!
• Rather have a look at: SWGuideEDMNtuples

PAT helps you create a user-defined EventContent


Difficulties with the EDM

The EDM poses two main problems from the analyst's point of view:

• Calculation/retrieval of high level analysis information (complicated pointer arithmetic! what do I need? Where do I find it?)

• Reduction only to the relevant high level analysis information (where is the dropped data used throughout the event?)


Analyst's Issues

• I want to use muons only. I dropped everything else from the event. Now I cannot access the χ²/ndof?
• I found out that the GsfElectron collection consists of many objects but only 10% can be real. There must be some electronID; how can I find and apply it?
• I found out that CaloJets are uncorrected. How can I apply the JetMET calibration for the absolute JES to them? Do I need DB access for that?
• I want to match reco objects to generated particles. I want to write my own matching. Is there an official recommendation for algorithms and parameters?
• I want to match trigger muons to offline reconstructed muons. How do I do this?
• How to reduce the event content?
  • Skim events (keep dijets with pt > 50?), drop objects (no muons, keep jets), reduce object size (keep 4-vector, drop the rest)

PAT comes in here and helps you
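The three kinds of reduction named in the last issue (skimming events, dropping objects, slimming objects) can be sketched in plain Python. This is a toy model and not PAT: `ToyJet`, `ToyAnalysisEvent`, and the helper functions are hypothetical, and "dijets with pt > 50" is read here as "at least two jets above the threshold", which is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyJet:
    pt: float
    eta: float
    phi: float
    mass: float
    constituents: List[int] = field(default_factory=list)  # bulky extra info


@dataclass
class ToyAnalysisEvent:
    jets: List[ToyJet]
    muons: List[str]


def passes_dijet_skim(event, min_pt=50.0):
    """Skim: keep the event only if at least two jets exceed min_pt."""
    return sum(1 for jet in event.jets if jet.pt > min_pt) >= 2


def drop_muons(event):
    """Drop objects: same jets, no muons."""
    return ToyAnalysisEvent(jets=event.jets, muons=[])


def slim_jets(event):
    """Reduce object size: keep only (pt, eta, phi, mass) per jet."""
    return [(jet.pt, jet.eta, jet.phi, jet.mass) for jet in event.jets]


def reduce_events(events, min_pt=50.0):
    """Apply skim, drop, and slim in that order."""
    return [slim_jets(drop_muons(event))
            for event in events if passes_dijet_skim(event, min_pt)]
```

Skimming first means the dropping and slimming steps only run on events that survive the selection, which is also the cheapest order when most events are rejected.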


What is the Physics Analysis Toolkit

PAT is a toolkit that is an integral part of the CMSSW framework

• It is an interface between the sometimes complicated EDM and the simple mind of the common user.

• It serves as well tested and supported common ground for group and user analyses

• It facilitates reproducibility and comprehensibility of analyses

• You can view it as a common language between CMS analysts

• If another CMS analyst describes a PAT analysis to you, you can easily know what he/she is talking about


Three aspects of PAT

Interface
• Between RECO expertise & analysis-level contacts
• Simplifies access via DataFormats
• Canalizes expertise (via POG & PAG)
• Crossing point between POGs & PAGs ('vertical integration')

Common Tool
• Approved algorithms & sensible defaults
• Synergy (everybody can profit from recent developments)
• Quick start into analysis for beginners

Common Format
• Facilitates transfer & comparisons
• PAG common configurations
• Sustained provenance


The PAT Data Formats

• All pat::Objects inherit from their corresponding reco::RecoCandidates• A PAT Candidate is a reco::RecoCandidate + more


Combine Flexibility and User Friendliness

Flexibility + User Friendliness = Maximal Configurability

• You can choose for yourself whether you really need all the extra information that the PAT Candidates provide
• Still, you don't need to know how EDM/PAT manages this access for you under the hood
• The key is: configuration of DataFormats by cfi file! (e.g. for pat::Jets)


Configuration of PAT DataFormats

You can configure the content of the DataFormats yourself (example: pat::Jet)!

Size: 14 kB/event (for ttbar)

• This can be slimmed even further


The concept of Maximal Configuration

Maximal Configurability
• DataFormats (object embedding): configure your own DataFormats via embedding
• EventContent (PAT event content): add any extra info you need to the EventContent
• Selections (PAT selectors): apply selections via the StringCutParser
• WorkFlows (workflow tools): configure your workflow via tools that PAT provides


Particle Flow

• Particle flow (PF) provides a full, unambiguous, reconstructed-particle-based event interpretation and more sophisticated ways to do object disambiguation
• PF is plugged into PAT in a configuration called PF2PAT
• SWGuidePF2PAT/WorkBookPF2PAT


PF2PAT

• PF2PAT and standard PAT can run in parallel
• Example: standard PAT muons will still be stored in patMuons, and the muons found by PF2PAT will be stored in patMuonsPOSTFIX


The Code Location

• DataFormats/PatCandidates
  • Definition of all PAT Candidates.
  • pat::Photon, pat::Electron, pat::Muon, pat::Tau, pat::Jet, pat::MET, …
• PhysicsTools/PatAlgos
  • Implementation and filling of all data formats.
  • Definition of common workflow and PAT tools
• PhysicsTools/PatUtils
  • Definition of common tools and helper functions used in PatAlgos
• PhysicsTools/PatExamples
  • Location of many examples, e.g. all non-trivial examples used during this tutorial


Development

Have a look at: SWGuidePATRecipes


Support for AT

Check the main entry page of PAT in the software guide: SWGuidePAT


PAT Documentation

• SWGuidePAT/WorkBookPAT: Main documentation pages
• WorkBookPATDataFormats: Description of all PAT Candidates
• WorkBookPATWorkflow: Description of the PAT workflow
• WorkBookPATConfiguration: Description of the configuration of PAT
• SWGuidePATTools: Description of all PAT tools
• WorkBookPATTutorial: Tutorials and examples to get started
• SWGuidePATRecipes: Installation recipes
• SWGuidePATEventSize: Tools for event size estimate
• SWGuidePF2PAT/WorkBookPF2PAT: Main documentation for PF2PAT


Concluding remarks

• You have had a preview of these tools and concepts while doing the CMSDAS pre-exercises
• Use the above to understand the physics objects and do physics analysis during this school
  • SHORT and LONG exercises
• Participate in:
  • PAT tutorials to learn about PAT and PF2PAT
  • CRAB, Fireworks, DAS tutorials etc.
  • CMS Data Analysis School(s)
• Contribute to physics tools and tutorials (service credit given)
• https://twiki.cern.ch/twiki/bin/view/CMS/Tutorials
• [email protected] (analysis tools related: PAT, PF2PAT, tag and probe etc.)
• [email protected] (CRAB related)
• Analysis tools meetings (announced in the HyperNews forum above; link to all meetings below)
  • https://twiki.cern.ch/twiki/bin/view/CMS/AnalysisToolsMeetings


BACKUP


CMS Event Data Model (EDM) II

• The CMS software CMSSW is based on ROOT
• All CMS EDM output files, from RAW data files straight from the HLT to analysis PAT tuples, are proper ROOT files
• The system is designed so that the same ROOT file can be used in three contexts:
  • cmsRun: input for a full framework application; could be used for a reprocessing pass or for the creation of more refined analysis objects, such as PAT objects
  • FWLite: a small set of CMS-defined loadable libraries that allow more sophisticated use with ROOT; for the EDM this is a read-only application
  • Bare root.exe: browsing simple objects (floats, ints, and composites of them) from the ROOT TBrowser GUI is the ONLY recommended use case

• Pros
  • Single file structure
  • Reconstruction of detector objects, physics objects (POGs)
  • Physics analysis (PAGs)
  • Track an event from initial recording to the “final” paper: provenance

• Challenge
  • Strong software requirements: flexible, easy to extend, not space expensive

• More info can be found at WorkBookCMSSWFramework and WorkBookAnalysisOverviewIntroduction


Concept of Framework (FW)

• Applications in CMSSW are built from special shared object libraries called plugins. In practice this means:
  • cmsRun <some-configuration-file>
  • Configurations are written in the Python language
• There are two types of plugins:
  • Module plugins
    • EDAnalyzers: e.g., making histograms
    • EDProducers: e.g., making tracks from hits and saving the tracks in the event
    • EDFilters: e.g., skimming, i.e. deciding whether or not to keep a given event
    • Others: ESSources, ESProducers
    • ED* modules process event data; ES* modules process event setup data
  • Data object plugins, also known as “ROOT dictionaries” because they can also be loaded directly into the ROOT application. These are most of the products of the above work, and they form the elements of the EDM
• cmsRun workflows can mix standard plugins with user-defined ones
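The division of labour between the three ED* module types can be pictured with a small toy model. This is only a sketch in plain Python: the `Event` class, the three modules and `run_path` are hypothetical stand-ins invented for illustration and are not the CMSSW API. It assumes a producer adds a product to the event, a filter decides whether the rest of the path runs, and an analyzer only reads.

```python
# Toy model only: every name here is hypothetical, none of it is CMSSW code.

class Event(dict):
    """An event is modelled as a dict from product label to data."""


def track_producer(event):
    # EDProducer-like step: builds "tracks" from "hits" and stores them in the event.
    event["tracks"] = [sum(hit_group) for hit_group in event["hits"]]
    return True


def two_track_filter(event):
    # EDFilter-like step: returning False stops the path for this event.
    return len(event["tracks"]) >= 2


class TrackCounter:
    # EDAnalyzer-like step: reads the event and fills a simple histogram.
    def __init__(self):
        self.histogram = {}

    def __call__(self, event):
        n_tracks = len(event["tracks"])
        self.histogram[n_tracks] = self.histogram.get(n_tracks, 0) + 1
        return True


def run_path(path, events):
    """Run the modules in order; a False return skips the rest of the path."""
    for event in events:
        for module in path:
            if not module(event):
                break


counter = TrackCounter()
events = [
    Event(hits=[[1, 2], [3, 4]]),
    Event(hits=[[5, 6]]),
    Event(hits=[[1], [2], [3]]),
]
run_path([track_producer, two_track_filter, counter], events)
print(counter.histogram)
```

The second event has a single track, so the filter rejects it and the analyzer never sees it, although the producer has already added its product to that event.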


Workflow

[Figure: MC workflow and collision data workflow]

• The workflow is the set of processes that both MC and collision data flow through
• Scientific analysis and simulation require the processing and generation of millions of events, calibration, skims, and express analyses


Integration Build (IB)

http://cmssdt.cern.ch/SDT/html/showIB.html

IB: tests that CMSSW code compiles and that standard jobs run without errors, and performs some code quality and run quality tests.

cmsDriver.py is a utility used to generate a full configuration file, which can be run with cmsRun, both for the above tests and for production-quality configuration files. If you generate MC events you will come across it too (WorkBookGeneration).


Framework Subtleties: Example

• I want to use muons only, so I dropped everything else from the event. Now I cannot access the χ²/ndof?

• You dropped the generalTracks collection from the event content without considering that reco::Muon internally points to its inner track in the generalTracks collection it was created from
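Why the lookup fails can be sketched with a toy model in plain Python. The classes `Ref`, `TrackToy` and `MuonToy` are hypothetical and exist only for this illustration; they are not CMSSW classes. The assumption being illustrated is that the muon stores a reference (collection label plus index) rather than a copy of its track.

```python
# Toy model only: Ref, TrackToy and MuonToy are hypothetical, not CMSSW classes.

class Ref:
    """Stores where a product lives (collection label + index), not the product."""

    def __init__(self, collection, index):
        self.collection = collection
        self.index = index

    def get(self, event):
        if self.collection not in event:
            raise LookupError(f"collection '{self.collection}' is not in the event")
        return event[self.collection][self.index]


class TrackToy:
    def __init__(self, chi2, ndof):
        self.chi2 = chi2
        self.ndof = ndof


class MuonToy:
    def __init__(self, inner_track_ref):
        self.inner_track_ref = inner_track_ref

    def normalized_chi2(self, event):
        # The track has to be fetched from the event every time it is needed.
        track = self.inner_track_ref.get(event)
        return track.chi2 / track.ndof


full_event = {
    "generalTracks": [TrackToy(chi2=12.0, ndof=10)],
    "muons": [MuonToy(Ref("generalTracks", 0))],
}
# Keep only the muons, as in the question above.
slimmed_event = {"muons": full_event["muons"]}

muon = slimmed_event["muons"][0]
print(muon.normalized_chi2(full_event))  # the referenced track is still present
try:
    muon.normalized_chi2(slimmed_event)
except LookupError as err:
    print("failed:", err)
```

In this picture the remedy is to keep the referenced collection alongside the muons, or to copy the needed quantities into an object that owns them.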


Facilitated Access to Event Information

• Do you know how to access this event information within the EDM?

• With PAT Candidates you get this just by calling member functions!
• Note: each PAT Candidate IS a corresponding reco::RecoCandidate (+ more)


EventContent of the default PAT Tuple

• A default patTemplate_cfg.py:

• But decide for yourself what your PAT Tuple should look like (add reco::Tracks or reco::GenParticles to the Event Content, or BTag information to the jets, etc.)

Size: 14kb/event (for ttbar)
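Choosing the event content amounts to applying an ordered list of keep and drop rules to the branch names. The sketch below mimics that idea in plain Python, assuming that rules are applied in order and the last matching rule wins. The function `select_branches`, the branch names and the rules are illustrative assumptions rather than the actual default PAT event content, and Python's `fnmatch` is used as a stand-in for the framework's own wildcard matching.

```python
from fnmatch import fnmatchcase


def select_branches(branches, commands):
    """Apply 'keep <pattern>' / 'drop <pattern>' rules in order; the last match wins."""
    selected = []
    for branch in branches:
        keep = False
        for command in commands:
            action, pattern = command.split()
            if fnmatchcase(branch, pattern):
                keep = (action == "keep")
        if keep:
            selected.append(branch)
    return selected


# Illustrative branch names in the form type_label_instance_process.
branches = [
    "patMuons_selectedPatMuons__PAT",
    "patJets_selectedPatJets__PAT",
    "recoTracks_generalTracks__RECO",
    "recoGenParticles_genParticles__SIM",
]

# Start from nothing, keep the PAT objects, then add the tracks on top.
commands = [
    "drop *",
    "keep *_selectedPat*_*_*",
    "keep *_generalTracks_*_*",
]

for name in select_branches(branches, commands):
    print(name)
```

Adding one more keep rule for the generator particles would extend the tuple in the same way, at the cost of a larger size per event.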


PAT Candidate Member Functions

Check the Documentation: WorkBookPATDataFormats