
R. Jacobsson 1

Pit Operation - “Luminosity Production” - is in good hands with many devoted and competent people from experts to shifters

• But as the conclusion will state, we need more people to guarantee quality physics

Experts would also like to be able to devote a bit of time to physics analysis

• Also, luckily we left behind a very good team at CERN meeting the challenges of this week with nominal bunch intensities!

R. Jacobsson

Concentrate on global topics that are of concern and interest to the entire collaboration

• Will not discuss the status of the individual sub-detectors unless affecting global operation

• In the past, presented a lot on how we followed and participated in the beam commissioning

Main topics
• Operation up to now
• Operational status and efficiency
• Luminosity
• Data Quality
• First experience with nominal bunches
• Trigger
• Organization
• Tools to follow operation
• Shifter situation, the working model, and the needs for the future

2

R. Jacobsson

Machine and Experiment Availability
• Extremely low average failure rates × extremely high number of vital systems = 0.50 (see the sketch below)
• Thunderstorms daily now!
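A minimal sketch of the arithmetic behind this bullet: even with a very small failure rate per system, the availability of the whole chain is the product over a very large number of vital systems, and quickly falls towards the ~0.50 quoted. The per-system availability and the system count below are illustrative assumptions, not measured LHCb numbers.

```python
# Combined availability of a long chain of vital systems (illustrative numbers).
# If any single system is down, data taking stops, so the availabilities multiply.

def combined_availability(per_system_availability: float, n_systems: int) -> float:
    """Probability that all n_systems are available at the same time."""
    return per_system_availability ** n_systems

# Hypothetical: 700 vital systems, each available 99.9% of the time.
print(combined_availability(0.999, 700))   # ~0.50, i.e. ~50% overall availability
```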

Tripped LHCb magnet already twice

3

Wednesday:
1. AFS problem
2. SPS down
3. Thunderstorm
4. VELO motion
5. …

A lot for one day… Still we took 1 h of physics!
• Wrong fill number for 30 min!

R. Jacobsson

Where the plan stopped at the RC report in March (planning timeline, weeks w1–w12, January – March):
• Day shifts & day piquet; 24h shifts from 11/2
• LHCb PostMortem meeting; LHC Beam Commissioning Workshop; LHC Chamonix Workshop; meeting on calibration run needs
• Power-up & Online upgrades; detector calibrations & dataflow?
• Sub-detector stand-alone work and tests; continuous global “Heat Run” with rate ramp (first detector calibrations)
• Cosmics 13/2–14/2; TED run 18/2
• LHC + experiments dry run; first beam in LHC 28/2; 450 GeV technical fills?
• Determining colliding beams? LHCb magnet (V−) with beam; LHCb magnet (V+) with beam?
• LHC beam commissioning; MD mode = off; no beam = tests

4

R. Jacobsson 5

[Plot of the running periods up to now; recoverable labels:]
• Nominal bunches; B down; HLT1 rejection; HLT2 pass-through
• B up ~5 nb⁻¹, B down ~7.6 nb⁻¹
• Minimum bias, HLT pass-through
• MB < 1 kHz, HLT1 rejection
• MB < 100 Hz (Hz/b)

R. Jacobsson

Cumulative (in-)efficiency logging implemented since fill 1089 (see the sketch below)
• Breakdown into HV, VELO, DAQ, DAQ livetime (trigger throttling)
• Entered into the Run Database
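A minimal sketch of how such a per-category breakdown combines into one cumulative efficiency number, assuming each category is logged as a fraction of delivered luminosity lost while that system was not ready; the category names follow the slide, the numbers are invented.

```python
# Illustrative combination of per-category inefficiencies into an overall
# operational efficiency (categories as on the slide, numbers invented).

fractions_lost = {
    "HV not ready":    0.02,   # fraction of delivered lumi lost to HV ramping
    "VELO not closed": 0.03,
    "DAQ not running": 0.04,
    "DAQ livetime":    0.05,   # trigger throttling / deadtime
}

efficiency = 1.0
for category, lost in fractions_lost.items():
    efficiency *= (1.0 - lost)
    print(f"{category:16s} -{lost:.1%}")

print(f"overall operational efficiency ~ {efficiency:.1%}")
```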

6

Operational luminosity (in-)efficiencies May 10 – June 5

R. Jacobsson

LHCb dependence on LHC:
• Short-hand page for LHC Operators and EICs

• Completely automated for LHCb shifters, requiring ‘only’ confirmations
 Also Voice Assistance
 VELO still to be fully integrated
 Very advanced compared to ATLAS and CMS…

7

R. Jacobsson

LHCb State Control

8

R. Jacobsson

Shifter Voice Assistance
• Draw attention to new information or changes
 LHC Page 1, injection, optimization scans, etc.
• Instructions for LHCb State Control handling
 HV/LV handling, BCM rearm, etc.
• Undesired events…
 Beams lost, run stopped, magnet trip, clock loss
• DSS alarms and histogram alarms to be added, and voice quality to be improved

• Related work in progress: clean up shifter instructions on the consoles and add a help button to all displays

Collapse of separation bumps simultaneous between all experiments
• Golden orbit established with improved reproducibility
 Good luminosity already during ADJUST
 Optimization scan right at the start of Stable Beams, starting with the experiment with the lowest luminosity

Full VELO powering during ADJUST (TCTs at physics settings and separation bumps collapsed)
 Powering of the VELO by central shifters is the next step
• Future of VELO closure by shifter/expert being discussed
 Closing Manager now very user friendly
 Aim to have an “on-call” shifter for closing, preferably the same person as the piquet

End-of-fill calibrations – automation?

9

R. Jacobsson

Work on automatic recovery from DAQ problems in progress
• Added one after the other
• Start testing the Autopilot

Majority mechanism when configuring the farm to start a run being looked into
• Farm SW and storage SW crashes: still room for improvement

Exclusion/recovery of problematic (sub)farms on the fly while taking data
• Routine procedure for shifters

Recovery of monitoring/reconstruction/calibration farms while taking data

Faster recovery of sub-detectors without stopping the run (only trigger) becoming routine maneuvers for most shifters

10

R. Jacobsson 11

Two numbers for trigger deadtime counting (see the sketch below)
• TriggerLivetime(L0) @ bb-crossings
• TriggerLivetime(Lumi) @ bb-crossings
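A minimal sketch of what such a livetime number represents, assuming it is formed from counters of triggers offered versus accepted at bb-crossings; the counter names and values are illustrative, not the actual ODIN registers.

```python
# Illustrative livetime / deadtime bookkeeping at bb-crossings (invented counters).

l0_triggers_offered  = 1_000_000   # L0 accepts seen at bb-crossings
l0_triggers_accepted =   900_000   # of which the DAQ could actually read out

livetime_l0 = l0_triggers_accepted / l0_triggers_offered
print(f"TriggerLivetime(L0) = {livetime_l0:.1%}, deadtime = {1 - livetime_l0:.1%}")
```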

Also major improvements made on monitoring of the HLT (histograms and trends)
• Both technical parameters and physics retention

R. Jacobsson

Problem with DAQ and control switch seems solved

Storage problem earlier this year also solved

Purchase of farm nodes during 2010 Q3, to be installed in November during the ion run

Some outstanding subdetector problems:
• Dying VCSELs in the subdetectors are a worry
• SPECS connections in the OT
• Control of the ISEG HV for the VELO seems solved by changing from Systec to Peak
• L0 derandomizer emulation of the Beetle – about to be addressed
• …

System diagnostics tools have been like an AOB on every agenda since forever…
• Alarm screen and log viewer
• Well, better to solve the problem than to add the alarm, if that works!

12

R. Jacobsson

Data Quality is of the highest importance now (together with the trigger)
• Main problem: we need more interest/participation from people doing physics analysis

Discover and document which problems are tolerable and which are not
• Impact on data quality, in order to know the urgency of solving a problem versus operational efficiency
 Aiming at perfect data obviously counts for more than 100% operational efficiency
 But recoveries should be well thought through, well planned and swift, and to the extent possible coordinated with other pending recoveries!

• How to classify data quality problems for different physics analyses

Establish a routine for use of the Problem Database
• Checking in and checking out entries, fast feedback

Procedure outlined for decision on detector interventions which may have an impact on data quality

Working group set up to address Online Data Quality tools and follow-up
• Improvements of the histogram presenter, histogram analysis, alarms, etc.
• Need for trend plots, a trend presenter and a trend database being looked into
• Documenting quality problems and their impacts/recoveries
• Reconstruction farm and associated histograms
• More interest from subdetectors would be welcome

13

R. Jacobsson

Shifter catalogue
• Most important/significant histograms with descriptions and references
• Several iterations, still need improvements and links to severity/actions

Alarm panel from automatic histogram analysis
• Associate sound/voice to alarms

14

R. Jacobsson

The tool for registering data quality problems – the Problem Database
• Shared between Online and Offline
• http://lbproblems.cern.ch/ (“Problem DB” from the LHCb Welcome page)

15

R. Jacobsson

Three sources of luminosity online
• Counted by ODIN using the non-prescaled L0Calo or L0Muon trigger from the L0DU
 Getting the average number of interactions per crossing and the pileup from the fraction of empty crossings, and correcting the luminosity in real time (see the sketch after this list)
 Recorded luminosity
• Beam Loss Scintillators, acceptance determined relative to L0Calo
 Luminosity corrected for pileup
• LHC collision rate monitors (BRANs)
 Not yet calibrated, but in principle only used for cross-checking
• Combination gives the delivered luminosity
• Recorded in the Online archive, the Run Database, LHC displays and logging, and the LHC Programme Coordinator plots (delivered) for overall machine performance

Optimization scans are based on this combined luminosity

For offline use, lumi triggers containing luminosity counters are “nanofied”
• Tool being finalized to obtain the integrated luminosity on analyzed files
• Constantly at 1 kHz
• Careful when changing thresholds/prescaling on the sources of the lumi counters
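A minimal sketch of the zero-counting method mentioned in the first bullet: with Poisson pileup the probability of an empty bb-crossing is exp(−μ), so μ = −ln(f_empty), and the instantaneous luminosity follows as μ · f_rev · n_colliding / σ_vis. The empty fraction, the number of colliding pairs and the visible cross-section below are illustrative assumptions, not LHCb calibration values.

```python
import math

# Zero-counting: average number of visible interactions per crossing from the
# fraction of empty bb-crossings, then a pileup-corrected luminosity.

LHC_REV_FREQ = 11_245.0        # LHC revolution frequency [Hz]

def mu_from_empty_fraction(f_empty: float) -> float:
    """P(0) = exp(-mu)  =>  mu = -ln(f_empty)."""
    return -math.log(f_empty)

def luminosity(mu: float, n_colliding: int, sigma_vis_mb: float) -> float:
    """Instantaneous luminosity in cm^-2 s^-1 (1 mb = 1e-27 cm^2)."""
    return mu * LHC_REV_FREQ * n_colliding / (sigma_vis_mb * 1e-27)

# Hypothetical example: 30% of bb-crossings give no L0Calo trigger,
# 8 colliding bunch pairs, ~60 mb visible cross-section (cf. the vdM slide).
mu = mu_from_empty_fraction(0.30)
print(f"mu = {mu:.2f} interactions/crossing")
print(f"L  = {luminosity(mu, 8, 60.0):.2e} cm^-2 s^-1")
```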

16

R. Jacobsson

http://lbrundb.cern.ch/ (“RunDB” on the LHCb Welcome page)
• Tool for anybody in the collaboration to get a rough idea of the data collected
• Help/documentation should be linked

17

R. Jacobsson

Van der Meer scans
• To a large extent automatic, with ODIN connected directly to the scan data received from the LHC in real time and flagging the scan steps in the data
 Allows easy offline analysis

Has allowed a first determination of the length scales (LHC/VELO) and of the absolute luminosity:
• Visible L0Calo cross-section of 60 ± 6 mb (preliminary)

• From MC: σ(L0Calo) = σ(L0) × 0.937 = 63.7 mb × 0.937 = 59.7 mb

• Many things still to be verified; another vdM scan is in our planning
 Also allows another method to extract the beam shapes and the VELO resolution
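For context, the standard van der Meer relation that such an absolute calibration rests on (not spelled out on the slide), assuming Gaussian overlap widths Σx, Σy fitted from the scan curves and bunch populations N1, N2:

```latex
% van der Meer calibration of the visible cross-section (per colliding bunch pair)
% mu_vis^max      : visible interactions per crossing at the head-on point
% Sigma_x, Sigma_y: overlap widths from the horizontal and vertical scans
% N_1, N_2        : bunch populations
\[
  \sigma_{\mathrm{vis}} \;=\;
  \frac{\mu_{\mathrm{vis}}^{\max}\, 2\pi\, \Sigma_x\, \Sigma_y}{N_1\, N_2}
\]
```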

18

R. Jacobsson

Access to the experiment conditions archive in the online system
• Machine settings

• Beam parameters measured by machine and LHCb

• Backgrounds measured by machine and LHCb

• Trigger rates, luminosities, VELO luminous region, bunch profiles

• Run performance numbers, etc

Tool also produces LPC files for luminosity, luminous region and bunch profile data

19

R. Jacobsson

1. Arrived at a dead-end with Qbunch ~ 2E10 (max 4-5E10)

2. More to understand with increasing Qbunch than Nbunch

3. Summer months with not all experts present

4. Keep up luminosity ladder for this year

June 9 - June 25 (16 days!)

20

7x7@5E10

13x13@2E10

[email protected]

[email protected]

[email protected]

R. Jacobsson

Increasing number of nominal bunches through July–August
• 170 kJ → 1.5 MJ
• Gain experience
• Understand the already strange bunch/beam behaviour
• LHC Operation does not feel ready for 0.5 – 1 MJ yet; work in progress

21

Scheme   N/bunch  Bunches/beam  Colliding in LHCb  kJ/beam  Peak lumi  Int. lumi (fills)
2x2      1e11      2             1                  112      2.5E29     0.005 (1 fill)
3x3      1e11      3             2                  168      5.0E29     0.03 (3 fills)
6x6      1e11      6             4                  336      1.0E30     0.7 (10 fills)
12x12    1e11     12             8                  672      2.0E30     2.1 (10 fills)
24x24    1e11     24            16                 1344      4.0E30     4.9 (10 fills)
Trains needed…

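A small cross-check of the table above, under the assumption that the 112…1344 column is the stored energy per beam at 3.5 TeV (each 1e11-proton bunch then carries ~56 kJ) and that the peak luminosity simply scales with the number of colliding pairs; a sketch, not the official bookkeeping.

```python
# Cross-check of the bunch-scheme table (assumes 3.5 TeV beams; illustrative only).
E_PROTON_J = 3.5e12 * 1.602e-19        # energy of one 3.5 TeV proton [J]
LUMI_PER_PAIR = 2.5e29                 # cm^-2 s^-1 per colliding pair (from the 2x2 row)

schemes = [
    # (label, protons/bunch, bunches/beam, colliding pairs in LHCb)
    ("2x2",   1e11,  2,  1),
    ("3x3",   1e11,  3,  2),
    ("6x6",   1e11,  6,  4),
    ("12x12", 1e11, 12,  8),
    ("24x24", 1e11, 24, 16),
]

for label, n_per_bunch, n_bunches, n_colliding in schemes:
    stored_kj = n_bunches * n_per_bunch * E_PROTON_J / 1e3
    peak_lumi = n_colliding * LUMI_PER_PAIR
    print(f"{label:6s} {stored_kj:6.0f} kJ/beam   {peak_lumi:.1e} cm^-2 s^-1")
```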

R. Jacobsson

Complete two-day internal review of the Machine and Experiment Protection
• >1.5 (3) MJ
• Long list of actions
• Will be followed by a complete external review

Dump following lightning strike and power blackout!

22

R. Jacobsson

Four fills with 3x3
• #fill  Qbunch    L0Calo  Pileup  PeakLumi  Efficiency
 1179   0.8E11     7500   1.2     0.15      78% (VELO lumi-monitoring/BPM/new conf)
 1182   0.9E11    16000   1.7     0.46      68% (deadtime, HLT blocked)
 1185   1.15E11   19300   2.3     0.73      85% (RICH, VELO, …)
 1186             10000   1.3     0.22      To be patched (wrong fill number but stable)
 1188             16000   1.7     0.46      65% (Storage, HLT, VELO, Trigger OK)

23

Rocky start!…
 Old L0 settings + HLT1 + HLT2Express (stable but 15% deadtime)
 Reconfiguring: new L0 settings + HLT1 + HLT2Full (30 min)

Memory and combinatorics – run died and tasks stuck…

2 hours to recover/reconfigure New L0 + HLT1 + HLT2Express

Completely stable through entire night

R. Jacobsson

We’ve been sailing in a light breeze up to now

Not only interaction pileup but also problem pileup
• Pileup 2.3!

• Occupancies
 E.g. problem with the MTU size for UKL1

• Event size 85 kB (used to be 35 kB)

• Storage backpressure
 Running with 10% – 20% deadtime at 1500 – 2000 Hz at 85 kB (peak!)
 Suspicion is that the MD5 checksum calculation limits the output (again) to 1 Gb/s (see the sketch below)

• Lurking instabilities in weak individual electronics boards?
 Desynchronizations, data corruption, strange errors at the beginning of fills…
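A quick back-of-the-envelope check of why a 1 Gb/s ceiling would show up exactly as this kind of backpressure; the event size and rates are taken from the bullets above, and the 1 Gb/s figure is the suspicion quoted there.

```python
# Does 1500-2000 Hz at 85 kB/event fit through a 1 Gb/s output limit?
EVENT_SIZE_BYTES = 85_000
LIMIT_GBPS = 1.0                     # suspected MD5-limited storage output

for rate_hz in (1500, 2000):
    required_gbps = rate_hz * EVENT_SIZE_BYTES * 8 / 1e9
    throttled = max(0.0, 1.0 - LIMIT_GBPS / required_gbps)
    print(f"{rate_hz} Hz -> {required_gbps:.2f} Gb/s needed, "
          f"~{throttled:.0%} would have to be throttled")
```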

24

R. Jacobsson

Peak occupancies 22%! Average >7.5% as compared to 5% in the past

25

R. Jacobsson

(0x2710 → 0x1F)
• L0-MB (CALO, MUON, minbias, SPD, SPD40, PU, PU20): prescale by 100
• Physics thresholds:

 Electron   700 MeV → 1400 MeV
 Hadron    1220 MeV → 2260 MeV
 Muon       320 MeV → 1000 MeV
 Dimuon  320/80 MeV →  400 MeV
 Photon    2400 MeV → 2400 MeV

Yet another configuration prepared
• L0×HLT1 retention 2%; including HLT2, it would allow going to 200 kHz
• Would prefer not to use it, even if we have to run with a bit of deadtime

Changed to solve the 10% – 20% deadtime problem
• System completely stable with deadtime, but slow to stop in case of problems…

10 kHz of random bb-, be-, eb- and ee-crossings according to
• Weighting {bb: 0.7, eb: 0.15, be: 0.1, ee: 0.05} (see the sketch below)
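A minimal sketch of what such a weighted random trigger amounts to, assuming the weights above are the fractions of the 10 kHz budget given to each crossing type (the real implementation lives in ODIN firmware; this only makes the arithmetic concrete).

```python
import random

# Weighted random trigger over crossing types (weights as on the slide).
RATE_HZ = 10_000
WEIGHTS = {"bb": 0.70, "eb": 0.15, "be": 0.10, "ee": 0.05}

# Expected rate per crossing type
for xing, w in WEIGHTS.items():
    print(f"{xing}: {w * RATE_HZ:,.0f} Hz")

# One second's worth of random triggers drawn with those weights
types, weights = zip(*WEIGHTS.items())
sample = random.choices(types, weights=weights, k=RATE_HZ)
print({t: sample.count(t) for t in types})
```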

26

R. Jacobsson

Technical problems in the HLT
• HLT1 (3D) OK with 7.5% retention

• HLT2Express stable, but contains only J/ψ, …, KS, Ds, D*→D0, BeamHalo

• HLT2Full (150++ lines): serious problems and surely a lot of unnecessary overlap
• HLT2Core (81 lines): validated with FEST and with data taken during the weekend

 Configured in pass-through now to test it and check the output before we have to switch on rejection at >6x6

 Best compromise we have for the moment, together with L0 TCK 0x1F
 First impression is that it was running stably during this night’s fill

• Processing time for the HLT with HLT2Express observed to be 140 ms… 450 nodes × 8 tasks / 140 ms ≈ 26 kHz! (see the sketch after this list)
 To be followed up
 Should see how this developed with HLT2Core during this night’s fill

• Two measures to partly solve the bad memory behaviour and the stuck tasks already done
 Activating swap space on the local disk of the farm nodes improved the situation significantly
 Automatic script prepared which would kill the leader

Requires careful tuning and testing since memory spread is narrow

Memory/disk in Westmere machines?
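The capacity arithmetic quoted above, spelled out: the node and task counts and the 140 ms per event come from the slide; comparing against a nominal 1 MHz L0 output rate is my own illustration.

```python
# HLT farm capacity estimate from the numbers on the slide.
nodes            = 450
tasks_per_node   = 8
time_per_event_s = 140e-3            # measured with HLT2Express

capacity_hz = nodes * tasks_per_node / time_per_event_s
print(f"HLT capacity ~ {capacity_hz / 1e3:.0f} kHz")     # ~26 kHz

# Illustration: at a 1 MHz L0 output rate the same farm would have to process
# each event in only a few milliseconds.
l0_rate_hz = 1_000_000
budget_ms = nodes * tasks_per_node / l0_rate_hz * 1e3
print(f"time budget at 1 MHz L0: {budget_ms:.1f} ms/event")
```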

27

R. Jacobsson

We managed to take a lot of data containing the full natural mixture of pileup
• Invaluable for testing, validating and debugging the HLT
• Lucky we got nominal intensity now with few bunches!…

We aim hard to be flexible and should keep this spirit
• But converge quickly on a compromise between physics and technical limitations

 Most of all, solve bugs and tune the system

• Avoid cornering ourselves in phase space now, in panic, with severe cuts
• Exploring and understanding is now or never

Procedure for the release of new TCKs now works well and efficiently
• But should not be abused!

FEST is an indispensable tool for testing/debugging/validating the HLT
• Make sure it satisfies the needs for the future
• More HLT real-time diagnostics tools to be developed

Effect of the L0 derandomizer and trains…
• No proper emulation for the Beetle, and we are forced to exploit only half of the buffer
• We currently accept all crossings… Filling scheme for autumn → 25% L0 deadtime

28

R. Jacobsson

Two possibilities to reduce the luminosity per bunch
• Back off on β*

 Requires several days to a week of machine commissioning

• Collision offset in the vertical plane
 Beam-beam interaction with an offset between the beams can result in emittance growth
 Follow ongoing tests for ALICE to reduce the luminosity by a factor of 30 (see the sketch below)

• Hoped for detailed news from the ALICE beam-offset tests
 Attempt during an end-of-fill study this morning, but not completed due to control software

HOT NEWS while I was in the plane: Seems to work fine
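For orientation, a minimal sketch of how a transverse offset reduces the per-bunch luminosity for round Gaussian beams, L(d)/L(0) = exp(−d²/4σ²); the beam size below is an illustrative assumption, and beam-beam effects such as the emittance growth mentioned above are ignored.

```python
import math

# Luminosity reduction from a vertical collision offset d between two Gaussian
# beams of equal transverse size sigma: L(d)/L(0) = exp(-d^2 / (4 sigma^2)).

def lumi_reduction(offset_um: float, sigma_um: float) -> float:
    return math.exp(-offset_um**2 / (4 * sigma_um**2))

sigma_um = 45.0                      # illustrative transverse beam size [um]
for d in (0, 50, 100, 150, 200):
    print(f"offset {d:3d} um -> L/L0 = {lumi_reduction(d, sigma_um):.3f}")

# Offset needed for a factor-30 reduction (cf. the ALICE tests above):
d_needed = 2 * sigma_um * math.sqrt(math.log(30))
print(f"factor 30 needs ~{d_needed:.0f} um (~{d_needed / sigma_um:.1f} sigma)")
```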

29

R. Jacobsson

Daily Run Meeting, ~30 minutes
• EVO every day
• Chaired by the Run Chief
• 24h summary with the Run Summary attached to the agenda (lumi, efficiency, beam, background)
• LHC status and plan
• Round table where experts comment on problems
• Internal Plan of the Day

Minutes from the Run Meeting and other postings on Run News serve two purposes
• Expert follow-up on problems
• Inform the collaboration about daily operation – strive for public language in the 24h summary and the plan for the next 24h

Improve
• Systematic follow-up on data quality
• Check lists
• Check-up on piquet routines
• Invite more Run Chiefs – already discussed with several candidates
• Meetings three days a week when we are ready for this (Monday – Wednesday – Friday)

 Requires more discipline from piquets and efficient exchange of information directly with involved people

• Synchronize piquet take-over with overlaps

30

R. Jacobsson

http://lhcbproject.web.cern.ch/lhcbproject/online/comet/Online/• (“Status” from LHCb welcome page)

31

R. Jacobsson

1. Shifter Intro

1. Introduction

2. Pit Area 8 – LHCb

3. Control Room

4. Cavern

5. Access to Cavern

6. Shift Organization

7. Safety

8. Calling Experts

9. Coordinators

10. Experts

11. Online computers

12. Shifter Duties

13. Shift Logbook

14. LHCb Status

15. LHC Status

16. LHC Logbook

17. Documentation

18. Conclusion

R. Jacobsson

1. Introduction for LHCb Shifters


2. SLIMOS

1. Role of SLIMOS

2. Safety Systems

3. Level 3 Alarms

4. L3 Alarm and fire brigade

5. L3 and SLIMOS duties

6. Emergency Panel

7. Detector Safety System

8. DSS Panel

9. Contacts

R. Jacobsson

2. SLIMOS


32

3. Basic Concepts

1. Introduction

2. LHCb at LHC

3. Coordinate Systems

4. Insertion Region 8

5. Injection

6. Filling Schemes

7. Collimation

8. Beam Dump

9. Fill Procedure

10. Crossing Angle

11. Timing

12. LHCb Detector

13. Readout System

14. Trigger

15. Luminosity

16. Backgrounds

R. Jacobsson

3. Basic Concepts for Shifters


4. Running LHCb

1. Introduction

2. Running LHCb

3. Operational Phases

4. Shifter Interfaces

5. LHC Page 1

6. LHCb Overview

7. LHC/LHC Op.View

8. Intensity&Luminosity

9. Backgrounds

10. LHCb Beam Dumps

11. Beam Pos. Monitor

12. Timing

13. Trigger Rates

14. Run Change

15. Run Performance

16. Experiment Status

17. Magnet

18. Cavern Radiation

R. Jacobsson

4. Running LHCb


5. Data Manager

1. Introduction

2. Data Manager Duties

3. Quality Checking

4. Problem Reporting

5. Data Monitoring

6. Histogram Presenter

7. Trend Presenter

8. Event Display

9. Run & File Status

10. Problem Database

11. Logbook

R. Jacobsson

5. Data Manager


6. Shift Leader

1. Introduction

2. SL Duties

3. Golden Rules

4. Operational Procedure

5. Mode Handshakes

6. Cold Start

7. LHCb State Control

8. Clock Switching

9. End of Fill

10. Machine Development

11. Run Control

12. System Allocation

13. System Configuration

14. Run/File Status

15. Farm Node Status

16. Dead Time

17. Error

18. Slow Control

19. Access

R. Jacobsson

6. Shift Leader


Shifter Training
• Completely overhauled and updated training slides
• Refresher course now as well

 With EVO in the future
 Invite piquets to go through the Shifter Histograms with the Data Managers

• Insist more on newcomers taking shifts together with already experienced shifters

R. Jacobsson

In my view the experiment consists of, broadly, three levels of activities:
1. Maintaining and developing everything from the electronics to the last bit of software, in the common interest of the experiment.

2. Producing the data we use for analysis, basically carried out by four types of shifters: Shift Leader, Data Manager, Production Manager, Data Quality checker

3. Consuming the data and producing physics results

• Activities 1 and 2 should not be compared and counted in the same “sum”
• Activities 2 and 3 are instead coupled:

 “I contribute to producing the data that I analyze”

• Huge benefit in taking regular shifts: you learn about data quality and have the opportunity to discuss and exchange information about problems met in your analysis of real data

Shifter situation “far from satisfactory” – what does it mean?
• It means that “the situation is vital to improve” by:

 1. Maintaining current commitments

 2. Making an additional effort which is relatively modest when spread across all of LHCb!

33

R. Jacobsson

Shifter model based on the idea of “volunteers”
• Not synonymous with “offering a favour” to people heavily involved in operating LHCb
• Based on the idea of feeling responsible, in particular for your own data
• We need people interested in learning about the detectors and the data they are hopefully going to use
 Each group would normally find the representatives themselves, also to a large extent meaning an Experiment Link Person

• Why this model? Because we have neither the tools, nor the time and strength, to be bureaucratic

• However, up to now we have not been sufficiently clear on the size of the required commitments

November 2009 – July 2010          #/24h   #Shifters   #Shifts
• Active Shift Leaders                3        30          660
• Active Data Managers                3        61 (– Dec)  564
• Active Production Managers          2        27          408
• Active Data Quality Checkers        1        11           13
 Total                                9       129         1768

34

R. Jacobsson 35

[Plot: current normalized shift contribution per institute, November 2009 – July 2010]

R. Jacobsson 36

[Plot: shift contributions per institute, November 2009 – July 2010]

R. Jacobsson 37

[Plots of shift coverage: Nov 09 – July 10 (3 / 24h), Nov 09 – July 10 (3 / 24h), Nov 09 – Dec 10 (2 / 24h), Nov 09 – July 10 (1 / 24h)]

R. Jacobsson

Assuming
• Perfect uniform availability (no exclusion of weekends, nights)
• Immediate replacement of people leaving and no lag in training new people

38

R. Jacobsson

Change in subdetector piquet coverage: increasingly assured by non-experts instead of experts

Should free up the people with the ideal profile for Shift Leader shifts this year

39

“One available shifter taking 4-6 shifts every 2 months per 3 authors”,

Recruited 2010–2011 from:
1. Shift Leaders: a pool of 50–100 people with experience in commissioning/operation of LHCb

2. Data Managers: All authors making physics analysis

3. Production Managers: A pool of 50-100 people with experience with analysis on Grid

4. Data Quality: All authors making physics analysis
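A rough illustration of what the quoted rule of thumb implies, combining the 9 shifter slots per 24 h from the statistics slide with “4-6 shifts every 2 months” per active shifter; the exact numbers and the assumption of perfectly uniform coverage are mine, and only meant to show the order of magnitude.

```python
# Rough sizing of the shifter pool implied by the rule of thumb above.
SLOTS_PER_DAY       = 9      # 3 SL + 3 DM + 2 PM + 1 DQ (cf. the statistics slide)
SHIFTS_PER_2_MONTHS = 5      # middle of the quoted "4-6 shifts every 2 months"

shifts_needed_per_year = SLOTS_PER_DAY * 365
shifts_per_person_year = SHIFTS_PER_2_MONTHS * 6

pool = shifts_needed_per_year / shifts_per_person_year
print(f"{shifts_needed_per_year} shifts/year -> pool of ~{pool:.0f} active shifters")
# At "one available shifter per 3 authors", ~110 shifters maps to ~330 authors.
```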

R. Jacobsson 40

R. Jacobsson

Experiment Conditions are good, machine is very clean

Data Quality
• Requires fast reaction time and feedback/good communication with offline
• Establish the habit and routine
• No offline Data Quality now for two weeks!

Finding the appropriate compromise for the trigger, and solving the technical issues, is of the absolute highest priority

• Dedicate time/luminosity intelligently now

System stability: individually it is good, but multiplied by the number of systems…
• Make everybody aware to react to any anomaly and act quickly
• Big step from 10 years of MC to real data

Masochistic exercise to produce the shifter statistics
• Need improvements and functions in the ShiftDB tool

Great teamwork, spirit and perseverance
• Join us to produce Your data!

LHC bunch evolution until the end of August
• Up to 24 bunches with 16 colliding in LHCb = 1.55 MJ/beam

41

R. Jacobsson

Regular opportunities for access up to now
 OT opened 3 times to change a FE box

• Impact on data quality

Procedure for filing and handling access requests works well
• Taken care of very well by the shifters, Run Chiefs and Access Piquet/RPs

Issue:
• Still no instruments for measuring radioactivity in a magnetic field!

 Complicates access where in principle the magnet could be left on

42