Download - Databases: Backup and Recovery

Database Recovery

Recovery

Nothing works 100% of the time => procedures are required for failure recovery

Recovering in a DBS works as follows, if the current state is incorrect (or suspect) the DB must be restored to a state known to be correct

Causes of Failure (I)

Programming errors - in the application in the O.S. in the DBS

Hardware errors - on the device on the channel on the CPU

Causes of Failure (II)

Operator errors (e.g. mounting wrong tape)

Fluctuations in the power supply

Fire, sabotage, etc.

Underlying Principle : REDUNDANCY

Ensure that a given piece of information can be reconstructed from some other information stored (redundantly) in the system.

So, periodically (e.g. once a day), the entire DB is copied (dumped) to archive storage.

For each change made, a record containing the old and new values are written to a log

Recovering from Failure

If failure occurs, there are two possibilities if the DB is damaged, e.g. disk head crash, the

DB can be restored by loading the archive copy and using the log to redo all changes since the copy was made

If the DB is O.K., but the contents are unreliable (prog crashing in mid-execution), the log is used to undo all unreliable changes.

Duplexing

Maintain 2 copies of the DB, and apply all updates to both simultaneously,

Advantage increased reliablity

Disadvantages requires double the storage log still need to undo changes 2 copies need independent “failure modes” (e.g.

on different channels)

Transactions (I)

A unit of work consisting of the execution of an application specific sequence of operations, ending with a COMMIT or ROLLBACK operation

Transactions cannot be nested all recoverable operations must be

done within the bounds of a transaction

Transactions (II)

Transaction are an all-or-nothing proposition and must be executed only once

Recovery manager is responsible for ensuring the reliability of the recovery techniques

On-Line Log

Log can be very large (200Megs not unusual) => not wise to keep log permanently on-line

Possible approach is to maintain an active log as a direct access data set when this data set is full, Log manager switches

to a second data set and dumps the first to archive storage

Since automatic rollback is sometimes required, all active transactions must reside in active log

Active log

To active safety the second data set must be opened as soon as the first data set is 95% full; thereafter, all the new transactions are written to the second data set, while transactions active at the time of the switch continue to be written to the first data set

UNDO Logic

Rollback itself can be subject to failure

=> UNDO(UNDO(UNDO….(X))))

= UNDO (X)

Log Compression Archive version of the log does not have to

be identical to the on-line version Convenient to use data compression

techniques so, unnecessary to retain log records for

transactions that have failed to COMMIT, since they are rolledback.

Changes can be consolidated, only final value of data item need be stored.

System Failures

System failure is any event which causes the system to stop and thus require a subsequent system restart

DB is not damaged, but all of the transactions active at the time of the failure must be rolled back

The Recovery Manager

How does the RECOVERY MANAGER know which transactions were active ?

Could go through the log searching for all BEGIN TRANSACTIONs with no corresponding COMMIT/ROLLBACK

But, time can be saved using CHECKPOINTS...

Checkpoints Checkpoints - the 4 step process

1. Forcing the content of the log buffers out to the log data set (forcing any log records the are still in main storage out to the actual log)

2. Forcing a “Checkpoint record” out to the log data set

3. Forcing the contents of the database buffers out to the database

4. Write the address of the checkpoint record within the log data set into a restart file

Force-writing

Performing a log/DB write is not atomic it involves

moving the O/P record into a buffer in main storage

writing the contents of that buffer out to secondary storage when the buffer is full

Force writing empties the buffer even when it is not full

Checkpoint Record

A list of all active transactions at the time of the checkpoint is recorded

The address within the log of each transactions most recent record log is also recorded

At Restart...

The Recovery Manager obtains the address of the most recent

checkpoint recorded from the restart file locates that record in the log searches through the log from that point

to the end

Thus...

The recovery manager is able to determine both the transactions that need to be undone and the transactions that need to be redo

for the purposes of the restart, transactions can be classified, as in the following example :

Example

Example - discussion

A system failure occurred at Tf the most recent checkpoint is Tc transactions of type T1 were

completed before Tc transactions of type T2 started before

Tc and finished before Tf transactions of type T3 started before

Tc and did not complete by Tf

Example - discussion

Transactions of type T4 started after Tc and completed before Tf

tranasctions of type T5 started after Tc and did not complete before Tf

Example - What to be done ?

At restart, T3 and T5 must be undone T2 and T4 must be redone T1 is not involved since the were

forced onto the output when the checkpoint was taken

Download - Databases: Backup and Recovery

Top Related