in the beginning, there was the log - webinar with martin hubel
TRANSCRIPT
Agenda
1. The role of the database recovery log2. How backups fit in3. Recovery scenarios4. Planning for recovery
2
Disclaimer
• Every attempt has been made to be accurate with product information. Trademarks are the property of their respective owners.
• I work mainly on DB2, but have used and supported other products.
• All DBMS mentioned in today’s webinar are commercial grade, used by hundreds of companies to manage their enterprises. No attempt is made here to compare DBMS products for functionality.
Copyright©2014 Martin Hubel Consulting Inc. 3
Assumptions
• This presentation is about OLTP applications• The need for backups is known• The need for recovery is also understood
• Databases are enabled for recovery• BI / DW have different recovery requirements
• Data is recovered / rebuilt / reload from another source
Copyright©2014 Martin Hubel Consulting Inc. 4
The database recovery log
• Log contains information necessary to:• Redo completed units of work• Undo incomplete units of work• Rebuild system status
• Today’s presentation will deal mainly with units of work (transactions) and their effect on the recovery log
Copyright©2014 Martin Hubel Consulting Inc. 5
A successful transaction
• Single transaction can contain many update actions• All updates must complete • Then a commit makes changes permanent• Successful termination is recorded in the recovery log
• Any failure rolls changes back to beginning• The DB is considered consistent before the transaction and after the
commitCopyright©2014 Martin Hubel Consulting Inc. 6
•Begin UR •End UR
•INS A •UPD B •UPD C •INS A•DEL D•INS E
•One Business Transaction
Unit of work
• Updates performed in the subsystem between two points of consistency
• Point of consistency (commit point) is a point where all recoverable data for an application is consistent with other data• Equivalent terms: Synchpoint, Logical Unit of Work or
Unit of Recovery (UR)• Some environments require a 2 phase commit
Copyright©2014 Martin Hubel Consulting Inc. 7
Batch transactions
• Batch jobs handle transaction can contain many update actions• Many business transactions are included in a single unit of work• The commit makes changes permanent for all transactions
• Any failure rolls changes back to beginning• Many tens, hundreds or even thousands of transactions are processed in
a single UR
Copyright©2014 Martin Hubel Consulting Inc. 8
•Begin UR •End UR
Rollback
Copyright©2014 Martin Hubel Consulting Inc. 9
• During rollback, data changes are not removed• The log is read and the changes reversed• The data is returned to its last consistent state (last commit)
• Some badly designed applications issue rollback statements following if validation fails• Can be a lot of log activity for nothing• Always complete all validation prior to performing first update
•Begin UR •End UR
•INS A •UPD B •UPD C •INS E
•One Business Transaction
•DEL E •UPD C` •UPD B` •DEL A
•Failure
Performance of new application
• A new financial application had a light transaction processing load• Transactions were complex and had a lot of validation and
updates to perform• Users regularly had to correct input before the transaction was
ready for processing• Performance was poor. Design showed:
• After each validation check, the SQL updates to that point were performed
• If a further check failed, the updates were reversed via rollback• This problem was found via a high volume of log activity
• Much of it for no benefitCopyright©2014 Martin Hubel Consulting Inc. 10
Transaction Recovery/Backout
• Some DBMS products default to this type of recovery (rollback of incomplete URs only)• Logging (log retain) in these products must be turned on for
full recoveryCopyright©2014 Martin Hubel Consulting Inc. 11
TIMEOUT
Work RollbackWait
Work Rollback
DEADLOCK
Work
Wait
Continue
a b
b a
Log files
• “In the beginning, there was the log”• DBMS writes to the active log file
• May be dual logs or mirrored by DBMS May be on RAID devices at hardware level
• Logs use buffered I/O For changes to data
• Logs are written synchronously at commit time• In this way, the log is close to being correct if failure occurs• Design I/O system to avoid waits for log writes
• DBMS must be able to write changes to the log
Copyright©2014 Martin Hubel Consulting Inc. 12
Archiving logs
• Current log is called the active log• Log files fill up and switch automatically to new files
• Called archive logs• Archive files may remain in same directory for some period• May be automatically offloaded to some facility like TSM
• Problems can occur in the archival process• If log file system fills, the DBMS stops!
Active log
Copyright©2014 Martin Hubel Consulting Inc. 13
Other storage
Archive logs
Batch, long running, and large transactions
• Thousands (or more) updates can be performed, e.g. by: • A batch job • A single SQL statement
• Long-running transactions cause problems:• Many locks taken during the updates• Many log records and log files written before a commit
• Commit frequency must be managed• Rollback is also problematic
• Requires log to be read, and to write the opposite action to reverse the change
Copyright©2014 Martin Hubel Consulting Inc. 14
“The audit table is too big”
• The new DBA/Linux sysadmin notices the size of the audit table is 30 million rows• Issues a DELETE FROM AUDIT_TABLE
WHERE DATE < ‘60 days ago’• 10M rows will be deleted
• The effect:• All users locked out of the application • No estimate of completion time
Copyright©2014 Martin Hubel Consulting Inc. 15
“The audit table is too big” - 2
• After many complaints and about 2 hours, the DBA cancels the Delete
• Rollback starts:• Rollback has to read the log to finds the updates and
write the opposite• As such, rollback typically takes twice as long
• The users continue to be unhappy and are now more vocal
Copyright©2014 Martin Hubel Consulting Inc. 16
“The audit table is too big” - 3
• The DBA, using his sysadmin kills, starts killing DB2 processes• Kill -9 has immediate effects to stop DB2 and the rollback
• But now, the rollback is incomplete• At restart, DB2 must remove the effect of the incomplete
rollback and start the full rollback afterwards• Estimated full recovery time is 24 hours
• Some behind-the-scenes involvement reduces the outage to 8 hours on a 24x7 system
• CIO asks: “What should we do with the DBA?”
Copyright©2014 Martin Hubel Consulting Inc. 17
... And Now
• A word from Embarcadero...
Copyright©2014 Martin Hubel Consulting Inc. 18
Can the log be kept forever?
• Hypothetically yes, the answer is no• Not practical• Recovery would take too long for one thing• Log retention would also be impractical
• Backups taken within the DBMS are recorded and provide starting points for recovery
• Once all databases or tables (tablespaces) have been copied, the oldest logs can be deleted
Copyright©2014 Martin Hubel Consulting Inc. 19
Backup copies
• Backups provide a starting point for the recovery process• Need at least one full backup• Two generations are safer
• DBMS options for backups:• Database• Table or tablespace• Incremental or delta• Online or offline
• Once all databases have a valid backup, the older log files can be deleted• Set generous retention periods
Copyright©2014 Martin Hubel Consulting Inc. 20
Gaps in the log
• A big recovery consideration is to ensure there are no gaps within the log range• A gap indicates that some log records are missing,
and DBMS is unable to determine what those records might have contained
• As such, forward log processing stops• It is advantageous to turn logging off for certain
situations or operations• E.g, large loads
Copyright©2014 Martin Hubel Consulting Inc. 21
Implications of turning off logging
• Logging is not free• Significant performance impact for certain operations
such as batch updates or load operations• Not logged operations affect the ability to forward
recover objects using the log• DBMS does not know what or how data was changed
• Therefore, DBMS requires a backup following not logged operations to re-establish a recovery point
Copyright©2014 Martin Hubel Consulting Inc. 22
Production mortgage system• DBA group was responsible for backups
• Forgot to take regular backups• Last backup was January 26
• System group managed the logs• Set a “reasonable” retention period of 60 days• On April 1, they deleted the logs older than Feb 1
• On April 8, a hardware problem occurred• Production was recovered to January 26
Copyright©2014 Martin Hubel Consulting Inc. 23
Application recovery scenarios
• Transaction recovery/backout• Application backout• Point-in-time recovery• Checkpoint restart• Data synchronization• Data inconsistencies• Out of space• Media failure• Accidentally deleted data
Copyright©2014 Martin Hubel Consulting Inc. 24
Recovery guidelines
• Test your recovery procedures• Make a backup copy prior to recovery - things can get
worse• Make a copy after recovery if you think it might
happen again• Make two copies using the DBMS backup utility• Periodically run integrity checking utilities to ensure
recoverability
Copyright©2014 Martin Hubel Consulting Inc. 25
Common user errors - 1
• Delete or write over archive log or image copy• Without necessary backups, recovery may be
impossible• Possible storage management issue
• Misuse of disk level dump/restore• Valid for disaster planning, but dump everything – Logs DBMS system objects User data
Copyright©2014 Martin Hubel Consulting Inc. 26
Common user errors - 2
• Common Programming Errors• Logical unit of work identification• Commit/CHKP placed incorrectly• Too many commits• Coding methods may vary throughout application,
e.g. Use of separate read and write routines Checking of SQL return code codes
Copyright©2014 Martin Hubel Consulting Inc. 27
Application Backout
• Required when multiple jobs running concurrently against same database
• Work was committed against database• Backout may not be possible or practical in all cases
Copyright©2014 Martin Hubel Consulting Inc. 28
Startof Job
Time
NormalJob End Job 1
Job 2
Job 3
Job 4
Backout, repair, and rerun
SQL change in production
• “Oops, we thought that was the test system”• An SQL error changed all addresses to the same
value• 780,000 rows in table
• “Martin, please change them back”• No, you can’t restore from a backup New customers have signed up Old customers may have updated their information
Copyright©2014 Martin Hubel Consulting Inc. 29
SQL change in production
• While this was pressing, it was not a “show stopper”• Fortunately, I had unloaded this table for some reason
• This became a source for the correct addresses• Alternatives might have been a redirected restore /
transportable tablespace• We were able to load the backup table
• Found SQL on internet and adapted it• Ultimately updated 362,000 addresses
Copyright©2014 Martin Hubel Consulting Inc. 30
Summary
• All update activity is recorded in the DBMS log• Log performance is part of overall DBMS
performance• Backups and logs are both used for recovery• Plan and test recovery scenarios prior to needing them
Copyright©2014 Martin Hubel Consulting Inc. 31