Physics & Data Quality Monitoring at CMS Emilio Meschi (original design, run-control, mentoring) CL (core functionality, rules & alarms library, tech support) Dimitrios Tsirigkas (Web interface), Giulio Eulisse (Qt-GUI) Ilaria Segoni (specialized clients, coordination with detector groups) Online Computing CHEP 2006 – February 13-17, 2006, Mumbai Christos Leonidopoulos CERN on behalf of the CMS-DQM group


Physics & Data Quality Monitoring at CMS

• Emilio Meschi (original design, run-control, mentoring)

• CL (core functionality, rules & alarms library, tech support)

• Dimitrios Tsirigkas (Web interface), Giulio Eulisse (Qt-GUI)

• Ilaria Segoni (specialized clients, coordination with detector groups)

Online Computing
CHEP 2006 – February 13-17, 2006, Mumbai

Christos Leonidopoulos
CERN

on behalf of the CMS-DQM group


DQM: Outline

• Rationale
− Build a product that various CMS groups can use (now & later)
− Functionality that every subdetector needs: Save resources
− Provide people w/ infrastructure (relatively) early in the game
− CMS is in “Magnet Test” mode (magnet & subdetector electronics commissioning): Use “real-world” conditions to get feedback

• Provide a general, homogeneous monitoring solution at CMS
− Flexibility & customization to be usable across experiment
− Content: Trigger/Physics performance, subdetector data quality


Monitoring the High Level Trigger

• “Filter Farm” at CMS runs the High Level Trigger
− 1000 dual-CPU PC farm replacing traditional L2+L3
− Input from L1: 100 kHz, Output: 150 Hz
− HLT runs all reconstruction algorithms

• Monitoring needs at HLT
− “Keep an eye” on 1000 machines
○ HLT inputs
○ Physics objects
○ Trigger rates
− Monitoring should not slow down main application (HLT algorithm)
− Collect & process information “centrally”
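
The farm size and rates quoted above imply a rough per-machine budget. The sketch below works through that arithmetic. It assumes events are spread evenly over all machines and CPUs, which the slide does not state; the function and variable names are illustrative only.

```python
# Back-of-the-envelope budget from the numbers on the slide.
# Assumption (not from the talk): perfectly even load across the farm.
L1_INPUT_HZ = 100_000     # L1 accept rate into the Filter Farm
HLT_OUTPUT_HZ = 150       # HLT output rate
N_MACHINES = 1000         # dual-CPU PCs
CPUS_PER_MACHINE = 2


def hlt_budget(l1_hz, out_hz, machines, cpus_per_machine):
    """Return (rejection factor, events/s per machine, ms available per event per CPU)."""
    rejection = l1_hz / out_hz
    per_machine_hz = l1_hz / machines
    per_cpu_hz = per_machine_hz / cpus_per_machine
    ms_per_event = 1000.0 / per_cpu_hz
    return rejection, per_machine_hz, ms_per_event


rejection, per_machine_hz, ms_per_event = hlt_budget(
    L1_INPUT_HZ, HLT_OUTPUT_HZ, N_MACHINES, CPUS_PER_MACHINE
)
print(f"rejection ~{rejection:.0f}, {per_machine_hz:.0f} Hz per machine, "
      f"{ms_per_event:.0f} ms per event per CPU")
```

Under these assumptions each CPU has a budget of a few tens of milliseconds per event, which is why monitoring must not slow down the HLT algorithm itself.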


The Big Picture

DQM principle: use same code to serve different customers

[Diagram: monitoring producers (a row of CPUs providing inputs, physics objects, triggers, etc.), the DQM infrastructure (collectors/servers), and monitoring consumers (clients)]


DQM from a client’s perspective

[Diagram: “DQM” delivers monitoring information to the Client. The Client uses a Database (configuration, reference objects, historic plots, etc.) and Tools (“comparison-to-reference”, collation of similar objects), and reports “Alarm” or “System ok”. Link labels: TCP/IP; TCP/IP or http]

• Clear separation of creation of monitoring information from collection, processing
• Too much work not to share with rest of CMS!


Core Features

• Support for all the “usual stuff”: static and dynamic sets of objects
− 1,2,3-D histograms, 1,2-D profiles, integers, floats, strings (ROOT objects behind the scenes)

• Support for unix-like directory structures

• Support for “monitoring producers” (Publish) & “monitoring consumers” (Subscribe)
− Clients can subscribe to (sub)directories, or “à la carte”

• Support for root-tuples
− Create and save root-tuples w/ monitoring structure on the fly

• Stability
− Be able to handle connecting/disconnecting producers, clients at run-time: robust behavior and support for dynamic lists of nodes
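
The publish/subscribe model with unix-like directories can be illustrated with a toy in-memory store. This is a minimal sketch, not the DQM API: the class `MonitorStore`, its method names, and the example paths are all hypothetical, and plain Python values stand in for ROOT objects.

```python
from collections import defaultdict


class MonitorStore:
    """Toy collector: producers publish objects under paths, clients subscribe."""

    def __init__(self):
        self._objects = {}              # full path -> latest published value
        self._subs = defaultdict(set)   # client name -> subscribed paths/directories

    def publish(self, path, value):
        """Producer side: create or update the object stored at `path`."""
        self._objects[path] = value

    def subscribe(self, client, target):
        """Client side: `target` is a single object ("a la carte") or a (sub)directory."""
        self._subs[client].add(target.rstrip("/"))

    def disconnect(self, client):
        """Clients may leave at run-time; unknown clients are ignored."""
        self._subs.pop(client, None)

    def updates_for(self, client):
        """Everything the client subscribed to, directly or via a parent directory."""
        targets = self._subs.get(client, set())
        return {
            path: value
            for path, value in self._objects.items()
            if any(path == t or path.startswith(t + "/") for t in targets)
        }
```

Matching on `t + "/"` rather than on a bare prefix keeps a subscription to `HLT/Muon` from also picking up a sibling directory such as `HLT/MuonExtra`.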


Tools

• “Soft-reset”: reset the contents accumulated before t0 (t < t0)
− Does not permanently erase contents

• “Accumulate”: sum up contents over multiple monitoring periods

• “Collate”: add multiple monitoring data
− Sum-up (same-format) contributions from different sources
− Interface supports search-strings with wildcards (?, *)
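
The three tools can be sketched on a toy histogram. This is an illustration under stated assumptions rather than the DQM implementation: `Histo`, `accumulate`, and `collate` are hypothetical names, the soft reset is modelled as subtracting a snapshot taken at t0, and wildcard matching uses Python's `fnmatch`, which also accepts `[seq]` patterns beyond the `?` and `*` named on the slide.

```python
from fnmatch import fnmatchcase


class Histo:
    """Toy fixed-binning histogram with a non-destructive soft reset."""

    def __init__(self, nbins):
        self.bins = [0.0] * nbins        # everything filled so far
        self._baseline = [0.0] * nbins   # snapshot taken at soft-reset time t0

    def fill(self, i, weight=1.0):
        self.bins[i] += weight

    def soft_reset(self):
        """Hide contents filled before now; the raw bins are kept."""
        self._baseline = list(self.bins)

    def undo_soft_reset(self):
        """Show the full contents again."""
        self._baseline = [0.0] * len(self.bins)

    def view(self):
        """Contents as seen by the client: raw bins minus the snapshot."""
        return [b - r for b, r in zip(self.bins, self._baseline)]


def accumulate(total, period):
    """Add one monitoring period's bin contents to a running total."""
    if len(total) != len(period):
        raise ValueError("binning mismatch")
    return [t + p for t, p in zip(total, period)]


def collate(histos, pattern):
    """Bin-by-bin sum of all same-format histograms whose name matches `pattern`."""
    selected = [h for name, h in histos.items() if fnmatchcase(name, pattern)]
    if not selected:
        return []
    nbins = len(selected[0].bins)
    if any(len(h.bins) != nbins for h in selected):
        raise ValueError("collation needs same-format histograms")
    return [sum(column) for column in zip(*(h.bins for h in selected))]
```

For example, `collate(histos, "node?/occupancy")` would sum the occupancy histograms published by `node1`, `node2`, and so on, assuming the sources use that naming scheme.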


More toys: Web interface

Select information to visualize on the fly…


Web interface in (real) action

• “Monitoring producer” (and collector): CERN
• “Monitoring consumers” (clients): one at CERN, one at Florida (US)
• You are looking at web browser running in Florida office

Live cosmic test data for end-cap muon detector


Even more toys: Qt GUI


Qt GUI in (real) action

Cosmic test data for calorimeter detector (reading from file)


More Tools: Quality Tests

• Library with “rules” for assigning “quality” value to tests
− Comparison to reference (χ2, Kolmogorov tests)
− Contents within range ([xmin, xmax], [ymin, ymax])
− Exact match
− Mean of (e.g. gaussian) distribution “near” expected value
− Flat occupancy
− Etc…

• “Alarm” library
− Warnings & error messages should propagate to all clients downstream

• Group monitoring in sets with links to status, messages
− Create intuitive interface for quick problem spotting (see next slide)
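
Two of the rules listed above, “contents within range” and “mean near expected value”, can be sketched as functions that return a quality value between 0 and 1, which is then mapped to a status. The function names and the warning/error thresholds below are assumptions made for illustration; the talk does not specify how quality values translate into alarms.

```python
def fraction_in_range(bin_contents, ymin, ymax):
    """Quality = fraction of bins whose content lies within [ymin, ymax]."""
    if not bin_contents:
        return 0.0
    ok = sum(1 for c in bin_contents if ymin <= c <= ymax)
    return ok / len(bin_contents)


def mean_near_expected(bin_centers, bin_contents, expected, tolerance):
    """Quality = 1.0 if the histogram mean is within `tolerance` of `expected`, else 0.0."""
    total = sum(bin_contents)
    if total == 0:
        return 0.0
    mean = sum(x * c for x, c in zip(bin_centers, bin_contents)) / total
    return 1.0 if abs(mean - expected) <= tolerance else 0.0


def status(quality, warn_below=0.95, error_below=0.90):
    """Map a quality value to a status string (thresholds are hypothetical)."""
    if quality < error_below:
        return "ERROR"
    if quality < warn_below:
        return "WARNING"
    return "OK"
```

A comparison-to-reference rule (χ2 or Kolmogorov) would fit the same shape: take the monitored object and a reference, return a quality value, and let a common `status` step decide what propagates downstream.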


Organizing tests, results

[Screenshot: “Data Integrity Checks vs. L1 Accept. Number”, shown with “Compare to reference” ON/OFF toggles and a status list; each entry offers “Move to Hidden Warnings” and “Take Action”]

• ERROR1: L1 Acc Increment = 0 (5%)
• WARNING1: L1 Acc Increment = 2 (10%)
• WARNING1: DDU Trailer Problems (7%)
• STATUS OK


Time measurements

• 50-bin ROOT histograms (floats)

• 1 Gb/s Ethernet connection

• Mini-farm prototype at CMS site

<tship> = 6.0 ± 0.2 ms, N = 1000 histograms

N = 1000 histograms: could use this info to adjust update rate on the fly…

ROOT v5.08.00b
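
One way the measured shipping time could feed back into the update rate is to cap the fraction of wall time spent shipping. The sketch below is only an illustration of that idea: the function names and the 1% cap are hypothetical, and reading the quoted 6.0 ms as the cost of one update cycle is an assumption about how the slide's number should be interpreted.

```python
def min_update_period(t_ship_s, max_ship_fraction):
    """Shortest update period (s) that keeps shipping below the given fraction of wall time."""
    if t_ship_s < 0 or not 0 < max_ship_fraction <= 1:
        raise ValueError("need t_ship_s >= 0 and 0 < max_ship_fraction <= 1")
    return t_ship_s / max_ship_fraction


def adjust_period(current_period_s, t_ship_s, max_ship_fraction=0.01):
    """Lengthen the update period only if shipping would otherwise exceed its budget."""
    return max(current_period_s, min_update_period(t_ship_s, max_ship_fraction))


# Assumed reading of the slide: 6.0 ms to ship one update.
print(adjust_period(current_period_s=1.0, t_ship_s=0.006))
```

With these inputs a one-second period already satisfies the cap, so it is returned unchanged; a much shorter requested period would be stretched instead.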


Summary

• DQM: a homogeneous monitoring solution for CMS
− Content: Trigger/Physics performance, subdetector data quality
− Environment:
○ “HLT” processes in Filter Farm
○ Monitoring processes fed by “live streams” or local DAQ
○ Batch jobs (potentially “production” validation)

• Makes use of general framework and services
− ROOT
− Transfer protocols: TCP/IP, http
− Tools for processing of monitoring information
− Visualization: Web & Qt-GUI

• Ongoing development
− Database requirements & interface design
− Organization of alarms & quick problem spotting
− Components for client customization


Backup Slides


High Level Trigger: Event Filter Farm


Subdetectors & The Magnet Test

• All subdetector groups have set up test sites (cosmic-ray tests, radioactive sources, laser beams) where monitoring programs are used (custom or not)

• All subdetector groups either plan to port/are porting their monitoring code or are already using the DQM infrastructure


Rationale for additional interface

• Question: Why not give users direct access to ROOT objects?

• An abstract interface does not bind the user access methods to a specific analysis framework. In our particular case, the transfer mechanism and the “true” format (ROOT) could change in the future, without breaking the customized programs.

• Having an abstract interface that hides the raw monitoring data from the user is a good OO practice. The set of allowed operations on the monitoring objects should be defined by the (abstract) user interface, not the framework used for the implementation.

• Additional functionality can be added to the monitoring objects (e.g. alarms) without directly inheriting from ROOT classes.
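
The argument above can be made concrete with a small sketch. The interface `MonitorElement`, the stand-in backend `ListBackedElement`, and the helper `check_mean` are hypothetical names, not the DQM classes; a plain Python list plays the role that ROOT plays in the real system.

```python
from abc import ABC, abstractmethod


class MonitorElement(ABC):
    """User-facing interface: the allowed operations are defined here, not by the backend."""

    def __init__(self, name):
        self.name = name
        self._alarms = []   # extra functionality that lives in the interface layer

    @abstractmethod
    def fill(self, x): ...

    @abstractmethod
    def entries(self): ...

    @abstractmethod
    def mean(self): ...

    def raise_alarm(self, message):
        self._alarms.append(message)

    def alarms(self):
        return list(self._alarms)


class ListBackedElement(MonitorElement):
    """Stand-in backend; a ROOT-backed class could replace it without touching user code."""

    def __init__(self, name):
        super().__init__(name)
        self._values = []

    def fill(self, x):
        self._values.append(float(x))

    def entries(self):
        return len(self._values)

    def mean(self):
        return sum(self._values) / len(self._values) if self._values else 0.0


def check_mean(element, expected, tolerance):
    """User code written against the interface only; returns True if no alarm is set."""
    if abs(element.mean() - expected) > tolerance:
        element.raise_alarm(f"{element.name}: mean outside {expected} +/- {tolerance}")
    return not element.alarms()
```

Because `check_mean` only touches `MonitorElement` methods, swapping the storage format or transfer mechanism behind the interface would leave such customized programs unchanged.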


DQM documentation

• Release notes: http://cmsevf.web.cern.ch/cmsevf/DQM_doc/Release_Notes_v010.txt

• Documentation: http://cmsevf.web.cern.ch/cmsevf/DQM_doc/DQM_instructions.html

• Archive with presentations from DQM group: http://cmsevf.web.cern.ch/cmsevf/DQMMeetings.html

• DQM status for subdetector groups: https://uimon.cern.ch/twiki/pub/CMS/DQMInfrastructure/DQMDetectorStatus.html

• Draft on DQM “requirement & design” CMS note: under preparation