Download - Databases: Backup and Recovery
Database Recovery
Recovery
Nothing works 100% of the time => procedures are required for failure recovery
Recovering in a DBS works as follows, if the current state is incorrect (or suspect) the DB must be restored to a state known to be correct
Causes of Failure (I)
Programming errors - in the application in the O.S. in the DBS
Hardware errors - on the device on the channel on the CPU
Causes of Failure (II)
Operator errors (e.g. mounting wrong tape)
Fluctuations in the power supply
Fire, sabotage, etc.
Underlying Principle : REDUNDANCY
Ensure that a given piece of information can be reconstructed from some other information stored (redundantly) in the system.
So, periodically (e.g. once a day), the entire DB is copied (dumped) to archive storage.
For each change made, a record containing the old and new values are written to a log
Recovering from Failure
If failure occurs, there are two possibilities if the DB is damaged, e.g. disk head crash, the
DB can be restored by loading the archive copy and using the log to redo all changes since the copy was made
If the DB is O.K., but the contents are unreliable (prog crashing in mid-execution), the log is used to undo all unreliable changes.
Duplexing
Maintain 2 copies of the DB, and apply all updates to both simultaneously,
Advantage increased reliablity
Disadvantages requires double the storage log still need to undo changes 2 copies need independent “failure modes” (e.g.
on different channels)
Transactions (I)
A unit of work consisting of the execution of an application specific sequence of operations, ending with a COMMIT or ROLLBACK operation
Transactions cannot be nested all recoverable operations must be
done within the bounds of a transaction
Transactions (II)
Transaction are an all-or-nothing proposition and must be executed only once
Recovery manager is responsible for ensuring the reliability of the recovery techniques
On-Line Log
Log can be very large (200Megs not unusual) => not wise to keep log permanently on-line
Possible approach is to maintain an active log as a direct access data set when this data set is full, Log manager switches
to a second data set and dumps the first to archive storage
Since automatic rollback is sometimes required, all active transactions must reside in active log
Active log
To active safety the second data set must be opened as soon as the first data set is 95% full; thereafter, all the new transactions are written to the second data set, while transactions active at the time of the switch continue to be written to the first data set
UNDO Logic
Rollback itself can be subject to failure
=> UNDO(UNDO(UNDO….(X))))
= UNDO (X)
Log Compression Archive version of the log does not have to
be identical to the on-line version Convenient to use data compression
techniques so, unnecessary to retain log records for
transactions that have failed to COMMIT, since they are rolledback.
Changes can be consolidated, only final value of data item need be stored.
System Failures
System failure is any event which causes the system to stop and thus require a subsequent system restart
DB is not damaged, but all of the transactions active at the time of the failure must be rolled back
The Recovery Manager
How does the RECOVERY MANAGER know which transactions were active ?
Could go through the log searching for all BEGIN TRANSACTIONs with no corresponding COMMIT/ROLLBACK
But, time can be saved using CHECKPOINTS...
Checkpoints Checkpoints - the 4 step process
1. Forcing the content of the log buffers out to the log data set (forcing any log records the are still in main storage out to the actual log)
2. Forcing a “Checkpoint record” out to the log data set
3. Forcing the contents of the database buffers out to the database
4. Write the address of the checkpoint record within the log data set into a restart file
Force-writing
Performing a log/DB write is not atomic it involves
moving the O/P record into a buffer in main storage
writing the contents of that buffer out to secondary storage when the buffer is full
Force writing empties the buffer even when it is not full
Checkpoint Record
A list of all active transactions at the time of the checkpoint is recorded
The address within the log of each transactions most recent record log is also recorded
At Restart...
The Recovery Manager obtains the address of the most recent
checkpoint recorded from the restart file locates that record in the log searches through the log from that point
to the end
Thus...
The recovery manager is able to determine both the transactions that need to be undone and the transactions that need to be redo
for the purposes of the restart, transactions can be classified, as in the following example :
Example
Example - discussion
A system failure occurred at Tf the most recent checkpoint is Tc transactions of type T1 were
completed before Tc transactions of type T2 started before
Tc and finished before Tf transactions of type T3 started before
Tc and did not complete by Tf
Example - discussion
Transactions of type T4 started after Tc and completed before Tf
tranasctions of type T5 started after Tc and did not complete before Tf
Example - What to be done ?
At restart, T3 and T5 must be undone T2 and T4 must be redone T1 is not involved since the were
forced onto the output when the checkpoint was taken