2nd September 2008, Richard Hawkings / Paul Laycock

Conditions data handling in FDR2c

Tag hierarchies set up (largely by Paul) and communicated in advance
  No real problems uploading data to the correct tag

Calibration experts starting to deal with ‘real’ IOVs (data valid for a calibration period)

New POOL file registration scripts worked fine
  Calibration users need to be in AFS group atlcond:poolcond

Consider doing calibration uploads from a ‘calibration’ account, not personal ones?

No instances of data in COOL with a missing or wrong POOL file upload

No use of run-signoff database pages yet
  System was not ready and integrated yet (holidays; too busy with other things)
  But only one set of runs, and all calibrations were ‘accepted’ - no real test

Handling of detector status information works technically
  Merging and transfer to LBSUMM folder (for ESD/AOD) still done by hand
  Limited mapping of DQ histograms to status flags restricts usefulness

Need to make sure this improves for real data

Need to clarify how detector status flags are dealt with in ES1, ES2 processing
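As an illustration of what dealing with ‘real’ IOVs means in practice, here is a minimal Python sketch of an interval-of-validity lookup. It is not the COOL API: the function name `find_payload`, the run numbers and the payload labels are all made up for the example, and validity is keyed on run number only, with an exclusive upper bound.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# One IOV: (since_run, until_run, payload). until_run is exclusive.
# All values below are illustrative, not real calibration periods.
IOV = Tuple[int, int, str]


def find_payload(iovs: List[IOV], run: int) -> Optional[str]:
    """Return the payload whose [since, until) interval contains `run`.

    `iovs` must be sorted by since_run and must not overlap.
    """
    starts = [since for since, _, _ in iovs]
    # Index of the last IOV that starts at or before `run`.
    idx = bisect_right(starts, run) - 1
    if idx < 0:
        return None  # run is before the first IOV
    since, until, payload = iovs[idx]
    return payload if since <= run < until else None


calib_iovs = sorted([
    (100, 200, "calib_period_A"),
    (200, 350, "calib_period_B"),
])

print(find_payload(calib_iovs, 150))  # calib_period_A
print(find_payload(calib_iovs, 200))  # calib_period_B
print(find_payload(calib_iovs, 350))  # None (outside every interval)
```

The point of the sketch is that a calibration uploaded with an interval matching its calibration period is picked up only by runs inside that period, rather than by every later run.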

Conditions DB access problems

Big problems in Tier-0 conditions DB access Thursday night / Friday morning
  Combination of several factors

2 of the 4 Oracle server nodes got into trouble and restarted
  Kernel patch being applied this week, some interdependencies not fully understood yet
  Server full of ‘stuck’ connections which were never released or cleaned up - deadlock

Very high load due to FDR2 bulk reprocessing and cosmics reprocessing going on in parallel, plus FCT, ATN, RTT, TCT tests, plus user jobs

All jobs accessing Oracle directly, no use of SQLite replicas at present
  Replica only useful once the run is ended online - applicable to ES2, bulk reco only

Vulnerability in that ALL Athena jobs accessing Oracle use the same reader account
  Limit of 800 concurrent sessions, now changed to 4 x 800
  Each Athena job holds O(10) connections in parallel (one per subdetector schema) until end of first event - typically for 5 minutes or so. Vulnerable to ‘deadlock’

Further actions being pursued
  Deploy SQLite replica for bulk processing (but not for cosmics / express stream)
  Use a dedicated COOL reader account for Tier-0 jobs - guarantee # connections
  Reduce connection load from Athena jobs (short/long term actions)
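To put the session limit in perspective, the figures quoted above can be turned into a rough capacity estimate. The Python sketch below is back-of-the-envelope arithmetic only: it assumes exactly 10 connections per job and exactly 5 minutes of hold time (the slides give "O(10)" and "5 minutes or so"), and the function names are invented for this example.

```python
def max_jobs_in_startup(session_limit: int, connections_per_job: int) -> int:
    """Jobs that can be in their connection-holding phase at the same time."""
    return session_limit // connections_per_job


def sustainable_start_rate(session_limit: int,
                           connections_per_job: int,
                           hold_minutes: float) -> float:
    """Steady-state job starts per minute before the session limit is hit.

    Uses Little's law: concurrent jobs = start rate x hold time.
    """
    return max_jobs_in_startup(session_limit, connections_per_job) / hold_minutes


# Assumed round numbers: 10 connections per job, held for 5 minutes.
for limit in (800, 4 * 800):
    jobs = max_jobs_in_startup(limit, 10)
    rate = sustainable_start_rate(limit, 10, 5.0)
    print(f"limit={limit}: {jobs} jobs in startup, {rate:.0f} job starts/min")
# limit=800: 80 jobs in startup, 16 job starts/min
# limit=3200: 320 jobs in startup, 64 job starts/min
```

Under these assumptions the original limit is exhausted by about 80 jobs starting within the same five minutes, across all Athena users of the shared reader account. This is why a dedicated Tier-0 account and a lower per-job connection count both help.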

Next steps - discussion needed

Work on conditions DB access problems
  Deployment of SQLite replicas to be used where possible

Start to set up tag hierarchies for first data
  Separate top-level tags to be used by HLT, monitoring, Tier-0, reprocessing

Define calibration loop model for first data
  Cosmics processing has no calibration loop, and several ‘express’ streams
  Same plan for single beam running, or move to ‘calibration loop’

The 24 hr calibration window might be needed for code fixes even if no prompt calibration can be done yet; there might be multiple processings at Tier-0

What to do for first collisions?
  Sign-off tool and Tier-0/conditions integration to support all this ..?
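The proposal of separate top-level tags for HLT, monitoring, Tier-0 and reprocessing can be pictured as a lookup from a (top-level tag, folder) pair to a folder-level tag. The Python sketch below only illustrates that idea. Every tag name and folder path in it is hypothetical, and it does not use the COOL tagging API.

```python
from typing import Dict

# Hypothetical hierarchy: top-level tag -> {folder path -> folder-level tag}.
# None of these names are real ATLAS tags or folders.
HIERARCHY: Dict[str, Dict[str, str]] = {
    "EXAMPLE-HLT":   {"/EXAMPLE/Calib": "ExampleCalib-HLT-00"},
    "EXAMPLE-MON":   {"/EXAMPLE/Calib": "ExampleCalib-HLT-00"},
    "EXAMPLE-TIER0": {"/EXAMPLE/Calib": "ExampleCalib-T0-01"},
    "EXAMPLE-REPRO": {"/EXAMPLE/Calib": "ExampleCalib-REPRO-02"},
}


def resolve_tag(top_level_tag: str, folder: str) -> str:
    """Return the folder-level tag that a top-level tag selects for a folder."""
    try:
        return HIERARCHY[top_level_tag][folder]
    except KeyError as exc:
        raise LookupError(
            f"no tag defined for folder {folder!r} under {top_level_tag!r}"
        ) from exc


# Each consumer configures one top-level tag and gets a consistent set of
# folder tags, so Tier-0 can move to a new calibration without touching HLT.
print(resolve_tag("EXAMPLE-TIER0", "/EXAMPLE/Calib"))  # ExampleCalib-T0-01
print(resolve_tag("EXAMPLE-HLT", "/EXAMPLE/Calib"))    # ExampleCalib-HLT-00
```

Keeping the top-level tags separate means a calibration update for reprocessing changes only the entries under the reprocessing tag, at the cost of maintaining more than one hierarchy.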