db2 v12 migration @kbc - gse young professionals v12 migration @kbc.pdf · • db2 tries to store...
TRANSCRIPT
Internal
Agenda
Company 2
1. Time line
2. Premigration checks
3. Migration to V12
4. Measurements
5. New Features
6. Lessons learned
Internal
Time line
Company 3
• Project of 1 year, 300 Mandays, described in a Project Charter, monthly reporting to steering committee.
• Migration in phases (starting from V11 FFM)
• TREL Nov 2017 : Db2 tools
• TREL Jan 2018 : early code
• TREL May 2018 : V12R1M100 : 1/3 Dev, 15/3 Acpt, 5/5 Prod
• TREL June 2018 : V12R1M500 : 17/5, 3/6 Acpt, 16/6 Prod
• Sandbox3 : to M100 and M500 : todo
1.
Internal
Time line : Db2s to migrate
Company 4
# subsyst. 2-way DS 4-way DS
Sandbox 4 2
DEV 14 2
ACPT 14 4 4
PROD 14 4 4
1.
Internal
Premigration checks
Company 5
• Cfr. Installation & Migration Guide for checklist
• Release incompatibilities : simple TS, explain tables in old format, Packages that access the catalog
• BIND/REBIND Packages/Plans ?
• DBAs were involved
• They got output of premigration checks
• They schedule (re)binds in applicative releases
• All products/tools (>40) that use/access Db2
• were checked for V12 tolerance/compatibility
• were upgraded if needed at TRELs
• Remark : >6 months elapsed, a lot of effort !
2.
Internal
Challenges
Company 6
• Planned outage because of CATMAINT job ?
• In V11 we did, both at CM and FFM
• Window between 01:00 and 03:00 including migration and testing by DBA’s and Business
• In V12 we didn’t
• 20 sec for M100 for our largest catalog
• But… (to be continued…)
3.
Internal
Migration to M100 on DEV
Company 7
• DB2 dumps when executing query in V12R1M100 : Usermod (applied immediatly), APARFIX (with M500), UI55366
• Query : select count(1) from (select NAME,(casewhen EXISTS(select 1 from sysibm.sysdummy1) then '1' else '2' end) from sysibm.systables where NAME = 'X')
• DBM1 creates a 0C4‐00000004 : UI55990
4.1.
Internal
Migration to M100 on ACPT
Company 8
• During migration Catmaint V12R1M100 fails with REASON 00C90101 ERQUAL 5020 (OMG !)
• PMR opened : looks like PI93894 : bad value in the SCA record for table space DSNDB08.D8TS1 before and after Db2 start.
• Fallback to V11 for 2 non-data sharing Db2s (scenarios were in place and tested before !). We didn’t migrate the 3 others.
• Solved via usermod 10 days later : migrate to M100 on non-data sharing Db2s (IBM Dev worked around the clock to deliver AI94553)
• UI55070
• Remark :
• -TERM UTIL needed if catmaint is not restarted (not documented)
• Be sure all utilities are terminated in V11 (no V11 UTIL can be terminated under V12 so fallback to V11 needed)
4.2.
Internal
After migration to M100 on ACPT
Company 9
• Issues with UR :
• UR ? : “We always use UR, also for reading in our programs, to have a minimal impact to online workload regarding locks.”
• Impacted : all UNLOAD utilities with SHRLEVEL CHANGE ISOLATION UR
• Not all records are read (= records altered by 1 subsystem of a Data Sharing group and thus registeredin the Coupling Facility)
• Workaround : mass update UNLOAD jobs with SHRLEVEL CHANGE ISOLATION UR REGISTER YES
• 1100 jobs in PROD
• APAR PI99075 / UI57005
4.3
.
Internal
After migration to M100 on ACPT
Company 10
• S0C1 dump when executing query in V12R1M100 : workaround : add ‘rtrim’ function to query + alter hidden zparm. APARFIX PI95129 applied. Afterwards UI55397
• DIST address space dumps after migrating to V12R1M100 : CASE expression : UI53834 + alter hidden zparm
• DB2 supplied java stored procedures address space loops : workaround : set the NUMTCB to 1
• Resource unavailable for EDM DYNAMIC STATEMENT SPACE
• dynamic statement cache cleanup mechanism is not working anymore
• Increased size to 4G but not sufficient
• workaround: disable the dynamic statement cache (zparm ACPT warehouse)
• Solution : PI91054 / UI57563
4.3.
Internal
Migration to M100 on PROD
Company 11
• Executed on Saturday evening
• Sunday morning : issues with impact during 3 hours
• PMR Sev 1 opened
• lock requests (the vast majority) are issued for Space Map pages
• Db2 tries to store the V0 OBDREC info in a system page (new V12 feature and retrofitted to V11) and that the page set partition does not have enough space to store this info.
• online reorgs of the UTS PBG TS during the morning solved the issue
4.4.
Internal
Migration to M100 on PROD
Company 12
• Monday morning : issues with P-locks, severe impact !
• PMR Sev 1 opened
• 3 online reorgs of the PBG UTS TS during the morning to solve the issue
• 46 online reorgs during the evening + testing by Business
• V12 : UI57257 (TREL1811)
4.4.
Internal
Migration to M100 on PROD
Company 13
• Page p-locks in batch jobs
• Very high impact
• Cause : overflow rows
• Bypass : reorg the involved TS and ALTER TS PCTFREE FOR UPDATE (to reserve space)
• Solution :
• usermod (TREL1806)
• APARfix (TREL1811)
4.4.
Internal
Migration to M100 on PROD
Company 14
• Excessive getpages after initialisation of FTB.
• Very high impact
• Cause : ?
• Not reproducable
• re-enabling FTB is not an option
• Bypass : turn of FTB for this particular subsystem
• Solution :
• UI53014 for other FTB issue
4.4.
Internal
Migration to M100 on PROD
Company 15
• Data inconsistency : 00C90101 : data of 1 index is corrupt
• High impact
• Cause : regression on spacemap page
• Bypass : rebuild IX
• Solution : UI57598 (TREL1811)
4.4.
Internal
Migration to M100 on PROD
Company 16
• Select in batch gets ‐811 but there are no duplicates
• High impact
• Cause : wrong leaf page when FTB enabled
• Bypass : disable FTB (was not an option for us)
• Solution : UI53014 (TREL1806)
4.4.
Internal
Migration to M500 on ACPT
Company 18
• All load utilities on PBG UTS that result in an extra partition : resource unavailable with resource type 00000000 and no name filled in
• Also called ‘DBD puffing'
• First work around : ALTER TABLE...ADD PARTITION (but generates unneccesary partitions)
• Second workaround : ALTER TS PRIQTY
• Our solution : ALTER ALL PBG and PBR UTS TS SEQ QTY
• 57.000 TS on PROD !
• UI56192
4.6.
Internal
Migration to M500 on PROD
Company 19
• Issues with medium impact :
• problem with reorg job
• Hung remote thread consumes a lot of cpu
• runstats ends with 04E-00C90101
• Db2 12 message DSNI011I PAGE TEMPORARILY UNAVAILABLE
• DBM1 creates a 0C4‐00000004
4.7.
Internal
Problem solving
Company 20
• Follow-on PMRs : each TREL we roll out the most recent maintenance
• On the planning : ‘customer profiling’
• Sending Catalog/Directory and certain queries to IBM
• A workshop with experts from IBM and KBC to explain ourenvironment
• We asked IBM if it is possible to collect Db2 usage info and share that (so, if you get that question, please respond)
4.8.
Internal
Measurements : expectations
Company 21
• System and OLTP :
- 2-3% CPU reduction without Index In-Memory feature
- 5-10% CPU reduction using Index In-Memory feature
• Query performance :
- 0-20% without changing access path
- 10-40% with new access path
• A baseline measurement is done upfront
5.
Measurements : resultso V12R1M100 (TREL1805 on 5/5) and V12R1M500 (TREL1806 on
16/6)
• We didn’t see any effect on average TRX CPU usage
o 26/5, 23/6 and 25/8 : RAC on PROD :
o more effect on average TRX CPU usage
o A very hard interpretation of the numbers because of a lot of issues after M100 and all necessary workarounds and (temporary) maintenance
o 18/08 : TREL1808 on PROD
o Extra maintenance because Performance issues
o Dynamic Statement Cache issues (went to 4Gig and was not cleared)
Measurements : RID pool
o = internal Record Identifier
o Expected increase of 40% because of extension from 5 to 7 bytes.
o Only visible after M500
o With TREL1808 ZPARM MAXRBLK on PROD increased (with60% on PROD)
Measurements : Internal latchcontention relief
Latch contention classes : in TYPE 100 records
• LC14 buffer manager latch : QVLSLC14
• LC19 log latch : QVLSLC19 : removed ??
• LC23 page latch timer : QVLSLC23
• LC24 EDM latch : QVLSLC24
InternalLC14 buffer manager latch
35
Remark. : decrease after M100 on May 5
Remark : increase LC14 buffer manager latch : QVLSLC14 since M500 ! (no explanation yet)
InternalLC24 EDM latch
Remark : strong decrease after M100 on May 5 = normal because EDM pool is on other location and therefore handled by Storage Manager
Internal
New features :
Company 38
• We start by using defaults and out-of-the box coding
• New features are subject of study (not in project scope !)
• In-memory bufferpools (PGSTEAL(NONE) option of ALTER BP)
• In-memory index optimization using index Fast Traverse Block (FTB)
• Application related features (study by DBAs):
• SQL performance enhancements
• Implementation only at TREL or RAC
• Still an open question : how to deal with Continuous Delivery
6.
Internal
Summary
Company 39
• Hardest upgrade ever for (just the four of) us
• Had to open 4 SEV1, 12 SEV2 and 7 SEV3 PMRs for a lot of new issues
• We had to convince all management levels to proceed theupgrade to M500
• We were backed up by the management of our department
• A lot of (technical) explaining to do, even to our business
• Great cooperation between system engineers and DBAs, theinvolved development teams, the colleagues of our department
• Good cooperation with IBM
• Issues picked up immediatly
• Got an advocate (Bart) assigned
• Conf call and meeting with Jeff Josten + follow up of status of PMRs
7
Internal
Lessons learned:
Company 40
• Have fallback scenarios ready and tested
• Be current on maintenance
• Allthough some PTF which caused problems were already retrofitted in V11 (would have popped up during our half year maintenance)
• Have a good relationship with ‘advocate Bart’
8.