CPass0/CPass1 on LHC12e/d/c, updated at 10:00 on 20/08


Post on 22-Jan-2016


Page 1: CPass0/CPass1 on LHC12e/d/c Updated at  10:00  on  20/08


C. Zampolli

Ever tried. Ever failed. No matter. Try Again. Fail again. Fail better.

(S. Beckett)

Page 2: LHC12f

Page 3: Summary table on 20/08 at ~10:00, LHC12f

25 runs in logbook. Filters used: LHC12f, PHYSICS, Good Run, GRP ok, at least one of [SDD, TPC, TRD, TOF, T0]

CPass0, completed:
Snapshot: 26 (run 186687, 2 min, marked as bad later)
Reco+CalibTrain: 26
Merging+OCDB: 25 (186845 still running in the reco), 23 needed, 19 ok

CPass1, completed:
Snapshot: 19
Reco+CalibTrain: 19
Merging+OCDB: 19

Page 4: Summary table on 20/08 at ~10:00, CPass0, LHC12f

COSMICS: 0 (failure expected)
EMCAL/PHOS/MUON: 2 (failure expected)
No triggers: 0 (failure expected, too short run)
EE/EV/Expired: 0 (memory issue during the merging, under investigation)
Running: 0
Others (detectors): 4 (186855, 186816, 186694, 186687)
Successful: 19

Success rate: 19/(19+4) = 82.6%
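The success rate on this slide is the number of successful runs divided by the runs that were expected to succeed (successful plus unexpected failures); categories marked "failure expected" are left out of the denominator. A minimal sketch of that arithmetic, with a hypothetical function name chosen only for illustration:

```python
def success_rate(successful, unexpected_failures):
    """Percentage of runs that succeeded among those expected to succeed.

    Runs in the "failure expected" categories (COSMICS, EMCAL/PHOS/MUON,
    no triggers) are assumed to be excluded by the caller.
    """
    total = successful + unexpected_failures
    if total == 0:
        raise ValueError("no runs to evaluate")
    return 100.0 * successful / total


# LHC12f numbers from the slide: 19 successful, 4 detector failures
print(f"{success_rate(19, 4):.1f}%")  # 82.6%
```

The same function reproduces the rates quoted for the other periods when given their successful and failed counts.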

Page 5: Summary table on 20/08 at ~10:00, CPass0, LHC12f

Failure reason: TRD (4)

Run Number   Duration
186687       2 min
186694       12 min
186816       6 min
186855       7 min

All failures are due to too short runs

Page 6: Summary table on 20/08 at ~10:00, CPass0, LHC12f

Failure reason: EMCAL/MUON/PHOS runs (2)
Run Number: 186805, 186834

Page 7: Summary table on 20/08 at ~10:00, CPass1, LHC12f

Of the 19 successful runs:
19 at CPass1 reco+CalibTrain
19 at CPass1 merging+OCDB

Page 8: LHC12e

Page 9: Summary table on 20/08 at ~10:00, LHC12e

27 runs in logbook. Filters used: LHC12e, PHYSICS, Good Run, GRP ok, at least one of [SDD, TPC, TRD, TOF, T0]

CPass0, completed:
Snapshot: 27
Reco+CalibTrain: 27
Merging+OCDB: 27, 21 useful, 11 ok

CPass1, completed:
Snapshot: 11
Reco+CalibTrain: 11
Merging+OCDB: 11

Page 10: Summary table on 20/08 at ~10:00, CPass0, LHC12e

COSMICS: 0 (failure expected)
EMCAL/PHOS/MUON: 6 (failure expected)
No triggers: 0 (failure expected, too short run)
EE/EV/Expired: 0 (memory issue during the merging, under investigation)
Running: 0
Others (detectors): 10
Successful: 11

Success rate: 11/(11+10) = 52.4%

Page 11: Summary table on 20/08 at ~10:00, CPass0, LHC12e

Failure reason: TRD (8)
Run Number: 186428 (*), 186429 (*), 186453 (*), 186456 (**), 186459 (**), 186507 (*), 186508 (**), 186598 (*)

Failure reason: TRD + T0 (1)
Run Number: 186600 (**)

Failure reason: T0 (1)
Run Number: 186601

TRD:
(*) suffered from a missing class (CSPI8WU-S-NOPF-ALL) in the configuration during data taking. Fixed manually using CINT8WU-S-NOPF-ALL. CPass0/1 should be re-run.
(**) suffered from low statistics (186459 has CSPI8WU-S-NOPF-ALL, but with zero triggers).

T0 suffers from high background, but limits will be increased. Re-running will be ok (but CPass1 should be triggered manually if Rev < Rev-23 will be used).

Page 12: Summary table on 20/08 at ~10:00, CPass0, LHC12e

Failure reason: EMCAL/MUON/PHOS runs (6)
Run Number: 186383, 186405, 186425, 186448, 186503, 186589

Page 13: Summary table on 20/08 at ~10:00, CPass1, LHC12e

Of the 11 successful runs:
11 at CPass1 reco+CalibTrain
11 at CPass1 merging+OCDB

Page 14: Actions

CPass0 completed on the available runs. 10 runs failed.

2 failed in T0 (1 in common with TRD). CPass1 can be triggered manually at any time. If re-running everything with Rev > Rev-23 (the next to come), everything should be ok; otherwise CPass0 will fail again, and CPass1 will need to be triggered manually.

9 failed in TRD (1 in common with T0). 5 runs did not have the right class in the configuration: fixed manually, waiting for OCDB update to re-run. 4 runs have too little statistics.

CPass1 completed on the available runs.

In summary, 6 runs can be recovered.
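The failure counts for LHC12e overlap: 9 runs failed in TRD and 2 in T0, with 1 run in common, giving 10 failed runs in total. A small sketch using the run numbers listed in the LHC12e CPass0 failure table makes the bookkeeping explicit; the variable names are chosen here for illustration only:

```python
# Run numbers taken from the LHC12e CPass0 failure table
trd_failed = {186428, 186429, 186453, 186456, 186459,
              186507, 186508, 186598, 186600}
t0_failed = {186600, 186601}

common = trd_failed & t0_failed      # runs failing in both detectors
all_failed = trd_failed | t0_failed  # each failed run counted once

# Inclusion-exclusion: |TRD| + |T0| - |TRD and T0| = |TRD or T0|
assert len(trd_failed) + len(t0_failed) - len(common) == len(all_failed)

print(len(trd_failed), len(t0_failed), len(common), len(all_failed))  # 9 2 1 10
```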

Page 15: LHC12d

Page 16: Summary table on 20/08 at ~10:00, LHC12d

224 runs in logbook. Filters used: LHC12d, PHYSICS, Good Run, GRP ok, at least one of [SDD, TPC, TRD, TOF, T0]

CPass0 completed:
Snapshot: 220
Reco+CalibTrain: 220
Merging+OCDB: 220, 176 needed, 147 ok

CPass1 completed:
Snapshot: 148 (1 more than CPass0, triggered manually after CPass0)
Reco+CalibTrain: 148
Merging+OCDB: 148, 148 needed

Page 17: Difference between logbook and snapshot in MonALISA

In logbook, but not in MonALISA:
184370 (EMCAL), 184645 (EMCAL), 185345 (ACORDE trigger), 185347 (ACORDE trigger), 185467 still in the migration process, checking with offline

In MonALISA, but not in the logbook:
185190 (short run, the quality flag was changed)
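The differences listed on this slide account for the gap between the 224 runs in the logbook and the 220 in the MonALISA snapshot: 5 runs are only in the logbook and 1 is only in MonALISA, so 224 - 5 + 1 = 220. The sketch below shows how such a comparison could be made with sets; the function name and the toy run lists in the usage lines are hypothetical, and only the six run numbers and the two totals come from the slides:

```python
def compare_run_lists(logbook, monalisa):
    """Return (runs only in the logbook, runs only in MonALISA)."""
    logbook, monalisa = set(logbook), set(monalisa)
    return logbook - monalisa, monalisa - logbook


# Differences quoted on the slide for LHC12d
logbook_only = {184370, 184645, 185345, 185347, 185467}
monalisa_only = {185190}

# Consistency check against the totals of the LHC12d summary table
n_logbook = 224
n_snapshot = n_logbook - len(logbook_only) + len(monalisa_only)
print(n_snapshot)  # 220

# Toy usage with made-up lists, just to show the function
only_a, only_b = compare_run_lists([1, 2, 3], [2, 3, 4])
print(sorted(only_a), sorted(only_b))  # [1] [4]
```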

Page 18: Summary table on 20/08 at ~10:00, CPass0, LHC12d

COSMICS: 9 (failure expected)
EMCAL/PHOS/MUON: 33 (failure expected)
No triggers: 2 (failure expected, too short run)
EE/EV/Expired: 1 (memory issue during the merging, but then merged manually)
Running: 0
Others (detectors): 28
Successful: 147

Success rate: 147/(147+28+1) = 83.5%
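The "needed" figure in the period summaries appears to be the number of merged runs minus the categories marked "failure expected" (COSMICS, EMCAL/PHOS/MUON, no triggers). This relation is not stated in the slides; it is inferred from the numbers, which it reproduces for all four periods. A sketch under that assumption, with hypothetical names:

```python
def needed_runs(merged, cosmics, calo_muon, no_triggers):
    """Runs expected to calibrate successfully.

    Assumption (inferred, not stated in the slides): runs whose failure
    is expected are subtracted from the merged runs.
    """
    return merged - cosmics - calo_muon - no_triggers


# LHC12d: 220 merged, 9 COSMICS, 33 EMCAL/PHOS/MUON, 2 without triggers
print(needed_runs(220, 9, 33, 2))  # 176
```

For LHC12d the result also equals 147 successful + 28 detector failures + 1 EV failure, the denominator of the success rate.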

Page 19: Summary table on 20/08 at ~10:00, CPass0, LHC12d

Failure reason: TPC Gain Threshold (1)
Run Number: 185460 (also TRD)
16 recovered by rerunning with looser constraints for validation (run 185460 not retried, since it failed anyway in TRD)

Failure reason: COSMICS (9)
Run Number: 184880, 184882, 184885, 184886, 184889, 184910, 184914, 184918, 186264

Page 20: Summary table on 20/08 at ~10:00, CPass0, LHC12d

Failure reason: T0 (20)
Run Number: 185687, 185692, 185695, 185697, 185698, 185699, 185700, 185701, 185734, 185735, 185738, 185756, 185757, 185764, 185765, 185768, 185775, 185776, 185778, 185784

Hardware problem, fixed now

Page 21: Summary table on 20/08 at ~10:00, CPass0, LHC12d

Failure reason: EMCAL/MUON/PHOS runs (33)
Run Number: 184443, 184481, 184663, 184664, 184709, 184716, 184719, 184762, 184780, 185024, 185148, 185186, 185341, 185456, 185559, 185560, 185562, 185631, 185647, 185677, 185731, 185934, 185994, 185998, 186036, 186062, 186063, 186159, 186192, 186224, 186225, 186232, 186316

Page 22: Summary table on 20/08 at ~10:00, CPass0, LHC12d

Failure reason: No triggers (2)
Run Number: 183915, 185190

Failure reason: TRD (8)
Run Number: 184190, 185133, 185378, 185460 (also TPC), 185915, 185916, 186319, 186320

Failure reason: EV (1)
Run Number: 184673 (merged manually)

Page 23: Summary table on 20/08 at ~10:00, CPass1, LHC12d

Of the 147 successful runs:
148 at CPass1 reco+CalibTrain (1 more than CPass0, since CPass0 was merged manually and the objects were uploaded manually in the OCDB: 184673)
148 at CPass1 merging+OCDB, of which 147 successful (ignore the red TPC color) and 1 failed in TRD (184145)

Different statistics for CPass0 and CPass1: 480/480 chunks at CPass0, 472/480 chunks at CPass1

Page 24: TRD issue

Due to a problem in the TRD reconstruction, some wrong OCDB entries were produced at CPass0; it is not possible to get the correct ones without re-running CPass0.

Some manual OCDB update is needed (after LHC12d is fully processed; ongoing for completed runs).

Then CPass0/CPass1 should be re-run with a Rev > Rev-18.

Will the failed runs be recovered? Waiting for experts' reply.

Page 25: Actions

CPass0 completed.

20 runs failed at CPass0 due to T0 hardware problems. CPass1 should be triggered manually for these runs. To be done after reprocessing, since now it would be useless (they all contain TRD).

8 runs failed in TRD. TRD needs LHC12d reprocessing (only for the runs it was in). Will these 8 runs be recovered, or is the failure reason something that won't be fixed when re-running?

Run 184673 failed in CPass0 merging (EV); its CPass0 entries were produced manually by Raphaelle and uploaded in the OCDB. CPass1 was run, and everything seems ok.

Page 26: Actions II

CPass1 completed.

1 run failed in TRD due to lower statistics at CPass1 reconstruction. Should we try to recover it? Will the TRD people fix it manually before VPass? Probably not needed, since we should re-run everything for TRD anyway.

In summary, we are waiting to re-run for TRD.

Page 27: LHC12c

Page 28: Summary table on 20/08 at ~10:00, LHC12c

205 runs in logbook. Filters used: LHC12c, PHYSICS, Good Run, GRP ok, at least one of [SDD, TPC, TRD, TOF, T0]. They do not coincide with those in MonALISA, since runs were queued manually for CPass0.

CPass0 completed:
Snapshot: 208, 1 should be ignored (179444)
Reco+CalibTrain: 207
Merging+OCDB: 207, 109 needed, 93 ok

CPass1 completed:
Snapshot: 93
Reco+CalibTrain: 93
Merging+OCDB: 93

Page 29: Summary table on 20/08 at ~10:00, CPass0, LHC12c

COSMICS: 37 (failure expected)
EMCAL/PHOS/MUON: 58 (failure expected)
No triggers: 3 (failure expected, too short or not the right trigger configuration)
EE/EV/Expired: 0
Others (detectors): 16
Successful: 93

Success rate: 93/(93+16) = 85.3%

Page 30: Summary table on 20/08 at ~10:00, CPass0, LHC12c

Failure reason: COSMICS (37)
Run Number: 179658, 179712, 179713, 179717, 179723, 179725, 179730, 179736, 179740, 179742, 179743, 179746, 179747, 179758, 179766, 179941, 179943, 179944, 179946, 179948, 179950, 179951, 179960, 180164, 180979, 180980, 180981, 180983, 180984, 180985, 180986, 180987, 180988, 180991, 180992, 182749, 182750

Page 31: Summary table on 20/08 at ~10:00, CPass0, LHC12c

Failure reason: EMCAL/MUON/PHOS runs (58)
Run Number: 179595, 179603, 179604, 179685, 179687, 180552, 180559, 180616, 180643, 180644, 180692, 180704, 181026, 181040, 181046, 181328, 181339, 181344, 181360, 181546, 181558

Page 32: Summary table on 20/08 at ~10:00, CPass0, LHC12c

Failure reason: EMCAL/MUON/PHOS runs (58), continued
Run Number: 181580, 181625, 181631, 181954, 181956, 181984, 182003, 182094, 182100, 182103, 182195, 182198, 182200, 182226, 182316, 182403, 182405, 182410, 182449, 182451, 182452, 182470, 182471, 182475, 182477

Page 33: Summary table on 20/08 at ~10:00, CPass0, LHC12c

Failure reason: EMCAL/MUON/PHOS runs (58), continued
Run Number: 182499, 182502, 182504, 182609, 182610, 182612, 182640, 182641, 182681, 182712, 182717, 182721

Page 34: Summary table on 20/08 at ~10:00, CPass0, LHC12c

Failure reason: No triggers (3)
Run Number: 180934, 181609, 182639

Failure reason: TRD (7)
Run Number: 180716 (*), 180717 (*), 182325 (*), 182509 (*), 182508 (*), 182513 (*), 182724 (*)

Failure reason: TPC+TRD (9)
Run Number: 181617 (**), 181618 (**), 181619 (**), 181620 (**), 181652 (**), 181694 (**), 181698 (**), 181701 (**), 181703 (**)

(*) Low statistics, recoverable
(*) Low statistics, not recoverable
(**) No SSD/SDD: number of contributors to Vertex Track = 0, TRD calibration failing; TRD fix in place; what about TPC?

Page 35: Summary table on 20/08 at ~10:00, CPass1, LHC12c

Of the 93 successful runs:
93 at CPass1 reco+CalibTrain
93 at CPass1 merging+OCDB, of which 84 successful in CPass1 (ignore the red TPC color) and 9 failed in T0, but are MUON runs: they should not have gone through (different AliRoot, some changes in T0)

As soon as CPass1 is completed, 1 week of time will be given for manual update. If too little (QM, holidays), we'll increase it. Then VPass should start.

Page 36: Actions

CPass0 completed.

9 runs failed in TPC and TRD. TRD failed due to missing SDD/SSD; what about TPC? TRD provided a code fix; would TPC try to recover these runs? If both TPC and TRD can recover, should we wait for this and then run CPass0/CPass1 again?

7 runs failed in TRD due to low statistics. TRD can recover them manually, but no CPass1 would be run after those. How will the other detectors mark these runs?
TOF, T0 bad
Mean Vertex good
TRP? TRD?

CPass1 completed on the available runs.

In summary, we need to know whether the 9 runs that failed in TRD+TPC should be reprocessed; need a statement from TPC.

Page 37: Further comments

Page 38: Interdependencies

Under discussion: do EMCAL runs need calibration triggers? (PHOS does not.) Seems not!

Page 39: Further issues

Some reconstruction jobs fail with bad_alloc; under investigation.
Grid tests with gdb ongoing: not much information retrievable, the jobs ran successfully.
Valgrind test ongoing: did not show anything significant.
Trying with Rev-21 on LHC12c, LHC12e: many errors, but FPE, not bad_alloc; stack trace available.
I could not reproduce the problem, still investigating.

Page 40: PPass

LHC12a and LHC12b VPass validated, ready for PPass.
A patched Rev-16 was created to fix the TRD QA issue, to be used to run PPass.
LHC12a completed, waiting for QA feedback.
LHC12b completed, waiting for QA feedback.

Page 41: Calibration of old data

GRP/CTP/Aliases entries to be created, after defining the classes to be used for the reconstruction.

It might be necessary to apply some downscale, min(max(nevents/10,30000),nevents)/nevents, but we need to define nevents.
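The downscale expression keeps one tenth of the events, but never fewer than 30000 and never more than the events available, and then expresses the kept events as a fraction of the total. The sketch below evaluates it for a few values of nevents; since the slide leaves nevents undefined, the inputs here are arbitrary illustration values, and the function and parameter names are hypothetical:

```python
def downscale_fraction(nevents, min_events=30000, divisor=10):
    """Evaluate min(max(nevents/10, 30000), nevents) / nevents."""
    if nevents <= 0:
        raise ValueError("nevents must be positive")
    kept = min(max(nevents / divisor, min_events), nevents)
    return kept / nevents


# Fewer than 30000 events: everything is kept
print(downscale_fraction(1000))       # 1.0
# Between 30000 and 300000 events: exactly 30000 are kept
print(downscale_fraction(100000))     # 0.3
# Above 300000 events: one tenth is kept
print(downscale_fraction(1000000))    # 0.1
```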