Physics & Data Quality Monitoring at CMS
• Emilio Meschi (original design, run-control, mentoring)
• CL (core functionality, rules & alarms library, tech support)
• Dimitrios Tsirigkas (Web interface), Giulio Eulisse (Qt-GUI)
• Ilaria Segoni (specialized clients, coordination with detector groups)
Online Computing
CHEP 2006 – February 13-17, 2006, Mumbai
Christos Leonidopoulos, CERN
on behalf of the CMS-DQM group
Physics & Data Quality Monitoring at CMS Christos Leonidopoulos 2
DQM: Outline
• Rationale
− Build a product that various CMS groups can use (now & later); functionality that every subdetector needs: save resources
− Provide people w/ infrastructure (relatively) early in the game
− CMS is in “Magnet Test” mode (magnet & subdetector electronics commissioning): use “real-world” conditions to get feedback
• Provide a general, homogeneous monitoring solution at CMS
− Flexibility & customization to be usable across experiment
− Content: Trigger/Physics performance, subdetector data quality
Monitoring the High Level Trigger
• “Filter Farm” at CMS runs the High Level Trigger
− 1000 dual-CPU PC farm replacing traditional L2+L3
− Input from L1: 100 kHz, Output: 150 Hz
− HLT runs all reconstruction algorithms
• Monitoring needs at HLT
− “Keep an eye” on 1000 machines
○ HLT inputs
○ Physics objects
○ Trigger rates
− Monitoring should not slow down main application (HLT algorithm)
− Collect & process information “centrally”
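As a rough illustration of the numbers on this slide, the sketch below works out the rejection factor and the average time budget per event. It assumes (this is not stated on the slide) that the L1 input is spread evenly over all CPUs and that each CPU processes one event at a time.

```python
# Back-of-the-envelope numbers for the Filter Farm, using the slide's figures.
L1_INPUT_HZ = 100_000   # input rate from L1 (100 kHz)
HLT_OUTPUT_HZ = 150     # output rate after the HLT (150 Hz)
N_PCS = 1000            # PCs in the farm
CPUS_PER_PC = 2         # dual-CPU machines

# Assumption (not from the slide): events are shared evenly among all CPUs.
n_cpus = N_PCS * CPUS_PER_PC
rejection_factor = L1_INPUT_HZ / HLT_OUTPUT_HZ
rate_per_cpu_hz = L1_INPUT_HZ / n_cpus
time_budget_ms = 1000.0 / rate_per_cpu_hz

print(f"rejection factor: about {rejection_factor:.0f}")
print(f"input rate per CPU: {rate_per_cpu_hz:.0f} Hz")
print(f"average time budget per event: {time_budget_ms:.0f} ms")
```

Under these assumptions each CPU has roughly 20 ms per event on average, which gives a feel for why monitoring code running inside the HLT process must stay cheap.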
The Big Picture
[Diagram: monitoring producers (farm CPUs publishing inputs, physics objects, triggers, etc.), the DQM infrastructure (collectors/servers), and monitoring consumers (clients)]
DQM principle: use same code to serve different customers
DQM from a client’s perspective
[Diagram: “DQM” delivers monitoring information to the client; the client draws on a database (configuration, reference objects, historic plots, etc.) and on tools (“comparison-to-reference”, collation of similar objects) and outputs “Alarm” or “System ok”. Connections are labeled “TCP/IP” and “TCP/IP or http”.]
• Clear separation of creation of monitoring information from collection, processing
• Too much work not to share with rest of CMS!
Core Features
• Support for all the “usual stuff”: static and dynamic sets of objects
− 1-, 2-, 3-D histograms, 1-, 2-D profiles, integers, floats, strings (ROOT objects behind the scenes)
• Support for unix-like directory structures
• Support for “monitoring producers” (Publish) & “monitoring consumers” (Subscribe)
− Clients can subscribe to (sub)directories, or “à la carte”
• Support for root-tuples
− Create and save root-tuples w/ monitoring structure on the fly
• Stability
− Be able to handle connecting/disconnecting producers, clients at run-time: robust behavior and support for dynamic lists of nodes
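The publish/subscribe idea with unix-like paths can be sketched in a few lines. This is a toy, in-memory illustration only: the `Collector` class, its method names, and the example paths are hypothetical and are not the CMS DQM API; a trailing `/` is assumed here to mark a directory subscription.

```python
from collections import defaultdict


class Collector:
    """Toy in-memory stand-in for a DQM collector (hypothetical, not the CMS API)."""

    def __init__(self):
        self.objects = {}                       # full path -> latest value
        self.subscriptions = defaultdict(set)   # client name -> subscribed targets

    def publish(self, path, value):
        """Called by a monitoring producer to publish or update an object."""
        self.objects[path] = value

    def subscribe(self, client, target):
        """Subscribe to a directory (trailing '/') or to a single object path."""
        self.subscriptions[client].add(target)

    def disconnect(self, client):
        """Clients may leave at run time; unknown clients are simply ignored."""
        self.subscriptions.pop(client, None)

    def updates_for(self, client):
        """Return every published object matching the client's subscriptions."""
        wanted = {}
        for target in self.subscriptions.get(client, ()):
            for path, value in self.objects.items():
                in_directory = target.endswith("/") and path.startswith(target)
                if in_directory or path == target:
                    wanted[path] = value
        return wanted


collector = Collector()
collector.publish("/HLT/Muon/pt", [0, 3, 5, 1])        # a 4-bin histogram
collector.publish("/HLT/Muon/eta", [2, 2, 2, 2])
collector.publish("/HLT/Rates/L1Accept", 100000)       # an integer

collector.subscribe("shifter", "/HLT/Muon/")           # a whole (sub)directory
collector.subscribe("expert", "/HLT/Rates/L1Accept")   # a single object

shifter_view = collector.updates_for("shifter")
expert_view = collector.updates_for("expert")
print(sorted(shifter_view))
print(expert_view)
```

Keeping subscriptions per client, rather than per object, makes it cheap to drop a client that disconnects at run time.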
Tools
• “Soft-reset”: reset the contents accumulated at t < t0
− Does not permanently erase contents
• “Accumulate”: sum up contents over multiple monitoring periods
• “Collate”: add up monitoring data from multiple sources
− Sum up (same-format) contributions from different sources
− Interface supports search-strings with wildcards (?, *)
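A minimal sketch of how a soft-reset and a wildcard collation could behave. It assumes that a soft-reset is implemented by remembering a baseline and subtracting it from what the client sees; the slide does not say how it is done. The `Histogram` class, `collate` function, and paths are hypothetical, and Python's `fnmatch` wildcards stand in for the search-string interface.

```python
from fnmatch import fnmatchcase


class Histogram:
    """Minimal 1-D histogram: a list of bin contents (illustrative only)."""

    def __init__(self, nbins):
        self.bins = [0.0] * nbins       # accumulates over all monitoring periods
        self.baseline = [0.0] * nbins   # contents hidden by a soft-reset

    def fill(self, bin_index, weight=1.0):
        self.bins[bin_index] += weight

    def soft_reset(self):
        """Hide everything accumulated so far (t < t0) without erasing it."""
        self.baseline = list(self.bins)

    def undo_soft_reset(self):
        """Bring the hidden contents back."""
        self.baseline = [0.0] * len(self.bins)

    def visible(self):
        """Contents as seen by a client: total minus the soft-reset baseline."""
        return [total - base for total, base in zip(self.bins, self.baseline)]


def collate(objects, pattern):
    """Sum same-format histograms whose path matches a wildcard pattern."""
    matched = [h.visible() for path, h in objects.items() if fnmatchcase(path, pattern)]
    if not matched:
        return []
    if len({len(bins) for bins in matched}) != 1:
        raise ValueError("cannot collate histograms with different binning")
    return [sum(column) for column in zip(*matched)]


h = Histogram(3)
h.fill(0)
h.fill(1)
h.soft_reset()
h.fill(1)
after_reset = h.visible()   # only what was filled after the soft-reset
h.undo_soft_reset()
restored = h.visible()      # the earlier contents are still there

sources = {
    "/node1/Muon/occupancy": Histogram(3),
    "/node2/Muon/occupancy": Histogram(3),
    "/node1/Muon/timing": Histogram(3),
}
sources["/node1/Muon/occupancy"].fill(0)
sources["/node1/Muon/occupancy"].fill(2)
sources["/node2/Muon/occupancy"].fill(2)
sources["/node1/Muon/timing"].fill(1)
total = collate(sources, "/node?/Muon/occupancy")
```

Because the soft-reset only moves a baseline, undoing it recovers the full accumulated contents, which matches the "does not permanently erase" behavior described above.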
More toys: Web interface
Select information to visualize on the fly…
Web interface in (real) action
• “Monitoring producer” (and collector): CERN
• “Monitoring consumers” (clients): one at CERN, one in Florida (US)
• You are looking at a web browser running in a Florida office
Live cosmic test data for end-cap muon detector
Even more toys: Qt GUI
Qt GUI in (real) action
Cosmic test data for calorimeter detector (reading from file)
More Tools: Quality Tests
• Library with “rules” for assigning “quality” value to tests
− Comparison to reference (χ², Kolmogorov tests)
− Contents within range ([xmin, xmax], [ymin, ymax])
− Exact match
− Mean of (e.g. Gaussian) distribution “near” expected value
− Flat occupancy
− Etc…
• “Alarm” library
− Warnings & error messages should propagate to all clients downstream
• Group monitoring in sets with links to status, messages
− Create intuitive interface for quick problem spotting (see next slide)
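To make the idea of "rules" concrete, here is a small sketch of three such tests and a mapping from a quality value to an alarm level. The function names, the simplified chi-square (which assumes observed and reference histograms share the same binning and normalization), and the thresholds are all illustrative assumptions, not the DQM library itself; a real comparison would more likely rely on ROOT's own statistical tests.

```python
def fraction_in_range(values, lo, hi):
    """'Contents within range': fraction of entries inside [lo, hi]."""
    if not values:
        return 0.0
    return sum(lo <= v <= hi for v in values) / len(values)


def mean_near_expected(values, expected, tolerance):
    """'Mean near expected value': 1.0 if |mean - expected| <= tolerance, else 0.0."""
    mean = sum(values) / len(values)
    return 1.0 if abs(mean - expected) <= tolerance else 0.0


def chi2_per_bin(observed, reference):
    """Simplified chi-square per bin, skipping empty reference bins."""
    terms = [(o - r) ** 2 / r for o, r in zip(observed, reference) if r > 0]
    return sum(terms) / len(terms) if terms else 0.0


def status(quality, warn_below=0.9, error_below=0.5):
    """Map a quality value in [0, 1] to an alarm level (thresholds are made up)."""
    if quality < error_below:
        return "ERROR"
    if quality < warn_below:
        return "WARNING"
    return "OK"


readings = [4.8, 5.1, 5.0, 7.9, 5.2]
range_quality = fraction_in_range(readings, 4.5, 5.5)    # 4 of 5 entries pass
mean_quality = mean_near_expected(readings, 5.0, 0.5)     # mean is pulled up by 7.9
shape_chi2 = chi2_per_bin([12, 20, 27], [10, 20, 30])

print(status(range_quality), status(mean_quality), round(shape_chi2, 3))
```

Returning a number in [0, 1] from each rule, and deciding on OK/WARNING/ERROR in a separate step, keeps the thresholds configurable per client.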
Organizing tests, results
[Screenshot: test-results panels, one titled “Data Integrity Checks vs. L1 Accept. Number”, each with a “Compare to reference” ON/OFF switch; every message has “Move to Hidden Warnings” and “Take Action” controls]
• ERROR1: L1 Acc Increment = 0 (5%)
• WARNING1: L1 Acc Increment = 2 (10%)
• WARNING1: DDU Trailer Problems (7%)
• STATUS OK
Time measurements
• 50-bin ROOT histograms (floats)
• 1 Gb/s Ethernet connection
• Mini-farm prototype at CMS site
[Plots: shipping-time measurements for N = 1000 histograms; <t_ship> = 6.0 ± 0.2 ms]
Could use this info to adjust update rate on the fly…
ROOT v5.08.00b
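One way the measured shipping time could feed back into the update rate is sketched below. It assumes that the quoted 6.0 ms is the time needed to ship one full update, and that shipping should take at most a fixed fraction of wall-clock time; both the interpretation and the 1% budget are assumptions made for illustration, and `update_period_s` is a hypothetical helper.

```python
def update_period_s(ship_time_ms, max_time_fraction):
    """Shortest update period that keeps shipping below a given time fraction."""
    if not 0.0 < max_time_fraction <= 1.0:
        raise ValueError("max_time_fraction must be in (0, 1]")
    return (ship_time_ms / 1000.0) / max_time_fraction


measured_ship_time_ms = 6.0   # <t_ship> quoted on the slide
period = update_period_s(measured_ship_time_ms, max_time_fraction=0.01)
print(f"update at most every {period:.2f} s")
```

If the measured shipping time grows (more histograms, a busier network), the same rule stretches the update period automatically.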
Summary
• DQM: a homogeneous monitoring solution for CMS
− Content: Trigger/Physics performance, subdetector data quality
− Environment:
○ “HLT” processes in Filter Farm
○ Monitoring processes fed by “live streams” or local DAQ
○ Batch jobs (potentially “production” validation)
• Makes use of general framework and services
− ROOT
− Transfer protocols: TCP/IP, http
− Tools for processing of monitoring information
− Visualization: Web & Qt-GUI
• Ongoing development
− Database requirements & interface design
− Organization of alarms & quick problem spotting
− Components for client customization
Backup Slides
High Level Trigger: Event Filter Farm
Subdetectors & The Magnet Test
• All subdetector groups have set up test sites (cosmic-ray tests, radioactive sources, laser beams) where monitoring programs are used (custom or not)
• All subdetector groups either plan to port (or are porting) their monitoring code to the DQM infrastructure, or are already using it
Rationale for additional interface
• Question: Why not give users direct access to ROOT objects?
• An abstract interface does not bind the user access methods to a specific analysis framework. In our particular case, the transfer mechanism and the “true” format (ROOT) could change in the future, without breaking the customized programs.
• Having an abstract interface that hides the raw monitoring data from the user is good OO practice. The set of allowed operations on the monitoring objects should be defined by the (abstract) user interface, not the framework used for the implementation.
• Additional functionality can be added to the monitoring objects (e.g. alarms) without directly inheriting from ROOT classes.
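The argument for an abstract interface can be illustrated with a short sketch. All class and function names here are hypothetical; the list-backed backend merely stands in for a ROOT-based one, and alarms are added at the interface level to show that extra functionality need not come from the backend's classes.

```python
from abc import ABC, abstractmethod


class MonitorElement(ABC):
    """Operations users may perform on a monitoring object (hypothetical interface)."""

    def __init__(self, name):
        self.name = name
        self.alarms = []   # extra functionality, independent of any backend

    def raise_alarm(self, message):
        self.alarms.append(message)

    @abstractmethod
    def fill(self, x):
        """Add one value."""

    @abstractmethod
    def entries(self):
        """Number of values filled so far."""

    @abstractmethod
    def mean(self):
        """Mean of the filled values."""


class ListBackedElement(MonitorElement):
    """One possible backend; another could replace it without touching user code."""

    def __init__(self, name):
        super().__init__(name)
        self._values = []

    def fill(self, x):
        self._values.append(x)

    def entries(self):
        return len(self._values)

    def mean(self):
        return sum(self._values) / len(self._values) if self._values else 0.0


def check_pedestal(element: MonitorElement, expected, tolerance):
    """User code written against the interface only, never against the backend."""
    ok = abs(element.mean() - expected) <= tolerance
    if not ok:
        element.raise_alarm(f"{element.name}: mean outside {expected} +/- {tolerance}")
    return ok


pedestal = ListBackedElement("pedestal")
for value in (9.8, 10.1, 10.3):
    pedestal.fill(value)
ok = check_pedestal(pedestal, expected=10.0, tolerance=0.2)
```

Since `check_pedestal` only uses methods declared on `MonitorElement`, swapping the storage or transfer format behind it would not break such customized programs.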
DQM documentation
• Release notes: http://cmsevf.web.cern.ch/cmsevf/DQM_doc/Release_Notes_v010.txt
• Documentation: http://cmsevf.web.cern.ch/cmsevf/DQM_doc/DQM_instructions.html
• Archive with presentations from DQM group: http://cmsevf.web.cern.ch/cmsevf/DQMMeetings.html
• DQM status for subdetector groups: https://uimon.cern.ch/twiki/pub/CMS/DQMInfrastructure/DQMDetectorStatus.html
• Draft on DQM “requirement & design” CMS note: under preparation