Ingres® Backup and Recovery

Bruno Bompar, Senior Manager Customer Support

Abstract

Proper backup is crucial in any production DBMS installation, and Ingres is no exception. Backups are useless unless you can recover from them. This session explains how Ingres backup and recovery work. We will also cover some ideas on how best to do a regular backup and how to do a safe recovery.

Agenda

• Why backup and recovery?

• Disaster scenarios

• Ingres features

• Housekeeping

• Customisation

• Issues to Consider

• Tips and cautions

Why backup and recovery?

• Insurance

• What if?

• Cost to business

• Critical functionality

• One part of overall process

Scenarios to Consider

• System Crash

• Database Corruption

• Lost Table

• Accidental Transaction

System Crash

• Automated Recovery

• After a crash Ingres will
  • Scan the transaction log file
  • Roll back uncompleted transactions
  • Apply completed transactions

• Databases will be consistent
  • Depends on the crash

Database Corruption

• Databases can be recovered

• Only if valid Ingres backup is available!

• ckpdb command to backup

• rollforwarddb to recover

Backup Mechanisms

• OS backup
  • invalid unless done with Ingres shut down cleanly
  • important for backing up Ingres installation, journals, checkpoints, dumps
  • useless for backing up databases unless you can guarantee a clean shutdown

• unloaddb
  • an archiving or porting tool, not a backup tool
  • no way to ensure a consistent snapshot without locking out all users (an "offline" archive)

Backup Mechanisms

• In order to get the most out of a backup mechanism, two things are needed:
  • a way to take a static snapshot of the database without interfering too greatly with active users
  • a way to record incremental changes since that static snapshot

• Ingres does both via checkpoints and journals
  • a checkpoint is the static backup or snapshot
  • the journals are the ongoing change records

Backup Mechanisms

• Terminology note! Ingres differs from other DBMSs in its use of the word "checkpoint"

• Ingres:
  • a checkpoint is a backup snapshot
  • a consistency point (CP) is a buffer and log flush

• Other DBMSs:
  • a checkpoint means a buffer flush
  • a backup is just called a backup

Database Checkpoints

• Backup the whole database

• Online or Offline

• Enable / Disable journaling

• Can be performed in parallel

• Written to
  • Tape
  • Disk

• Don’t forget iidbdb!!

Online versus Offline

• Offline
  • Requires exclusive access to database

• Online
  • Users carry on working
  • No DDL statements
  • Slower than offline
  • Can cause transaction log file to fill

Online Checkpointing

• An online checkpoint (the ckpdb command) has three phases:
  • quiescing the database
  • file copying with change logging
  • completion recording

Online Checkpointing

• File copying is controlled by the checkpoint template (cktmpl.def)
  • can be modified by Ingres administrator
  • change copy command, add file compression, etc
  • amazing things are possible

• DML allowed during file copying
  • but not DDL - no file creation/deletion

• Changes during file copying are specially logged
  • before-images sent to dump files

Checkpointing

• After copying is complete, the checkpoint success or failure is recorded in the database config file
  • aaaaaaaa.cnf
  • another copy left in cnnnnnnn.dmp in dump location
  • note that the checkpoint itself does not contain a record of the checkpoint completion

• Config file records last N checkpoint attempts
  • successful or not
  • N = 99 for recent releases of Ingres
  • N = 16 for older versions (2.0 and older)

Online Checkpointing

• When it's all over, you have
  • one or more checkpoint files (one for each data location)
    • in disk checkpoint area, or on tape
  • zero or more dump files containing changes made while file-copying
  • an updated database config file
    • plus an updated copy in the dump location
  • a new set of journal files
    • a fresh journal file is started at the end of the database quiescent phase

Checkpointing

• What to save after the checkpoint completes:
  • the checkpoint and dump locations
    • you need both
  • infodb output (human readable listing of the database config file)
  • output of: select * from iifile_info
    • for manual table level recovery and emergencies
    • optional but recommended

Journals

• Audit trail of all changes made to selected tables
  • written in batches by the archiver (dmfacp)

• Default for tables is journaling ON
  • journaling also needs to be enabled for the database using ckpdb +j
  • this is an offline checkpoint; no users allowed

• Journal files grow to a target size, then a new one is started
  • current expected size and sequence number are stored in the database config file
  • each checkpoint starts a fresh set of journal files

Database Checkpoint - Examples

• Command line
  • Online checkpoint
      ckpdb dbname
  • Offline checkpoint – enabling journaling
      ckpdb +j dbname '#m3'
  • Offline checkpoint – disabling journaling
      ckpdb -j dbname
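A scheduled job built on the online form above might look like the sketch below. The wrapper name `checkpoint_all` is hypothetical, and the script assumes that `ckpdb` exits with a non-zero status when a checkpoint fails; verify that on your release before relying on it.

```shell
# Sketch: take an online checkpoint of each database, iidbdb included, and
# report which ones failed. checkpoint_all is an illustrative name.
checkpoint_all() {
    failed=""
    # Always include the master database iidbdb alongside the user databases
    for db in iidbdb "$@"; do
        if ckpdb "$db"; then
            echo "checkpoint ok: $db"
        else
            echo "checkpoint FAILED: $db" >&2
            failed="$failed $db"
        fi
    done
    # Non-zero status if any checkpoint failed, so a scheduler can alert
    [ -z "$failed" ]
}
```

For example, `checkpoint_all salesdb hrdb` (placeholder database names) checkpoints iidbdb first and then the two named databases.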

Database Checkpoint - Examples

• Visual DBA

Recovery

• Recovery is a two step process
  • one command (rollforwarddb) with two distinct phases

• First, restore the database to a point in time (a checkpoint)

• Second, replay journals
  • optional
  • all journals, or stop at a given time

Recovery

• The database must exist before it can be recovered

• All required data locations must exist

• A valid config file must be available
  • recovery looks in the data location first, then the dump location
  • config file is renamed to aaaaaaaa.rfc

• The last checkpoint must be valid
  • can ask for an earlier checkpoint with #cn option

When Recovery Is Needed

• Stay calm!
  • you have practiced recovery, right?
  • haste makes mistakes
  • turn off the mobile phone, pager, etc
  • the database will be ready when it's ready

• Save your current database config
  • ideally, make a copy of the dump location and the data location aaaaaaaa.cnf
  • as a minimum save aaaaaaaa.cnf
  • allows you to try again if something goes wrong
  • if you have time, save everything in sight
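Saving the config before a recovery attempt can be as simple as the sketch below. The function name `save_recovery_state` is made up, and all three directories are site-specific paths the caller has to supply; nothing here is derived from Ingres itself beyond the file name aaaaaaaa.cnf.

```shell
# Sketch: copy the config file and the dump location aside before recovery.
# All three directory arguments are site-specific paths given by the caller.
save_recovery_state() {
    data_dir=$1   # database data location holding aaaaaaaa.cnf
    dump_dir=$2   # database dump location
    save_dir=$3   # new, empty directory for the safety copies

    mkdir -p "$save_dir" || return 1

    # Minimum: the current config file
    cp -p "$data_dir/aaaaaaaa.cnf" "$save_dir/aaaaaaaa.cnf" || return 1

    # Ideally: the whole dump location as well
    if [ -d "$dump_dir" ]; then
        cp -Rp "$dump_dir" "$save_dir/dump" || return 1
    fi
}
```

Use a fresh `save_dir` for every recovery attempt so that an earlier safety copy is never overwritten.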

Database Recovery

• Point in time recovery
  • Last checkpoint only
  • Last checkpoint + 10 hours work
  • 5 checkpoints ago

• Based on available files

Database Recovery - Examples

• Command Line
  • Last checkpoint only, no journals
      rollforwarddb +c -j dbname
  • Last checkpoint, journals to 12:32 on 10/05/02
      rollforwarddb +c +j dbname -e10-may-2002:12:32:00
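Because a mistyped stop time is easy to produce under pressure, the second example could be wrapped as in this sketch. The wrapper name `rollforward_to` is hypothetical; the only flags used are the ones shown above, and the accepted time format is inferred from the slide example (dd-mmm-yyyy:hh:mm:ss).

```shell
# Sketch: restore the last checkpoint and replay journals up to a stop time.
# rollforward_to is an illustrative wrapper, not an Ingres command.
rollforward_to() {
    db=$1
    stop=$2
    # Reject anything that does not look like 10-may-2002:12:32:00
    if ! printf '%s\n' "$stop" |
        grep -Eq '^[0-9]{2}-[A-Za-z]{3}-[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2}$'
    then
        echo "bad stop time: $stop" >&2
        return 2
    fi
    # +c restores the checkpoint, +j replays journals, -e sets the end time
    rollforwarddb +c +j "$db" "-e$stop"
}
```

A call would look like `rollforward_to dbname 10-may-2002:12:32:00`. The check only validates the shape of the argument, not whether the date itself is sensible.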

Database Recovery - Examples

• Visual DBA
  • Last checkpoint only, no journals
  • Last checkpoint, journals to 12:32 on 10/05/02

Recovery Scenarios

• Data area is lost
  • shut down Ingres if it's not down
  • restore data directories with db config file
  • restart Ingres
    • transaction log contents can be moved to journals only if a valid config file is available!
  • rollforwarddb
  • up-to-the-minute recovery should be possible

Recovery Scenarios

• Transaction log is lost
  • wasn't it mirrored?
  • recreate transaction log
  • rollforwarddb
  • most recent transactions not moved to journals will be lost

Recovery Scenarios

• Checkpoint or dump location is lost
  • recreate location directories
  • take fresh checkpoint
  • loss of checkpoint area should not affect running database

Recovery Scenarios

• Journal location is lost
  • installation will continue to run until transaction log fills up
  • recreate journal directory
  • alterdb -disable_journaling to halt journaling
  • restart archiver which will have stopped due to inability to write journals
  • ckpdb +j to restart journaling
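The sequence above could be sketched as two small steps with the archiver restart done by hand in between. Both function names are hypothetical, `journal_dir` is a site-specific path, and the argument order `alterdb dbname -disable_journaling` is an assumption to check against the Command Reference Guide for your release.

```shell
# Sketch of the slide's sequence for one database after the journal location
# is lost. Restarting the archiver is left to the operator between the steps.

# Step 1: recreate the journal directory and halt journaling
journaling_halt() {
    db=$1
    journal_dir=$2   # site-specific journal location for this database
    mkdir -p "$journal_dir" || return 1
    alterdb "$db" -disable_journaling
}

# Step 2 (after the archiver is running again): offline checkpoint that
# turns journaling back on
journaling_resume() {
    db=$1
    ckpdb +j "$db"
}
```

Keeping the two steps separate makes it harder to run `ckpdb +j` before the archiver has actually been restarted.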

Recovery Scenarios

• Software or human error is discovered

• If mistake is discovered immediately:
  • crash/restart Ingres, or remove all user sessions
  • rollforwarddb with -e option to replay journals, stopping short of the time of mistake

• If mistake isn't discovered until later, recovery is more complicated
  • Ingres Journal Analyzer (IJA) can help

Accidental Transaction

• AuditDB
  • Filter against
    • Table
    • Users
    • Time
  • Scan Journal files
  • Generate SQL
  • Execute

Accidental Transaction

• Ingres Journal Analyzer
  • Auditdb with Knobs on…
  • Connect to remote servers
  • Force Log Flush
  • Point and Click

Recovery Scenarios

• Disaster

• Use OS backups to restore Ingres system directories, all data, work, checkpoint, dump, journal directories

• rollforwarddb iidbdb
  • you have been checkpointing iidbdb, right?
  • restores users, locations, database privileges, etc

• rollforwarddb databases
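The ordering matters here: iidbdb comes back first, then everything else. A sketch of that ordering, assuming the OS-level restore has already been done and that `rollforwarddb` returns a non-zero status on failure (the function name `recover_installation` is made up):

```shell
# Sketch: recover iidbdb first, then the user databases given as arguments.
recover_installation() {
    # iidbdb holds users, locations and database privileges, so stop at once
    # if it cannot be recovered
    if ! rollforwarddb iidbdb; then
        echo "iidbdb recovery failed" >&2
        return 1
    fi

    status=0
    for db in "$@"; do
        if ! rollforwarddb "$db"; then
            echo "recovery failed: $db" >&2
            status=1
        fi
    done
    return $status
}
```

A failure in one user database does not stop the others from being attempted, but the overall status still reports it.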

Recovery Scenarios

• Rollforwarddb failure
  • restore the config or dump info you saved before attempting rollforwarddb
  • rename aaaaaaaa.rfc back to aaaaaaaa.cnf if it exists
  • cure any other rollforwarddb complaints
  • try again

• Last checkpoint didn't work
  • use rollforwarddb #cn to restore an older one
  • you do have more than one checkpoint around, right?
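The rename step after a failed rollforwarddb could be sketched as below. The function name `restore_saved_config` and both directory arguments are placeholders, and parking the leftover aaaaaaaa.cnf in a separate directory is an extra precaution of this sketch, not something the slide asks for.

```shell
# Sketch: put aaaaaaaa.rfc back as aaaaaaaa.cnf after a failed rollforwarddb.
restore_saved_config() {
    data_dir=$1    # database data location (site-specific path)
    aside_dir=$2   # where to park files left by the failed attempt

    # Nothing to do unless rollforwarddb renamed the config file
    [ -f "$data_dir/aaaaaaaa.rfc" ] || return 0

    mkdir -p "$aside_dir" || return 1
    if [ -f "$data_dir/aaaaaaaa.cnf" ]; then
        mv "$data_dir/aaaaaaaa.cnf" "$aside_dir/aaaaaaaa.cnf.failed" || return 1
    fi
    mv "$data_dir/aaaaaaaa.rfc" "$data_dir/aaaaaaaa.cnf"
}
```

Run it with Ingres users locked out of the database, then fix whatever rollforwarddb complained about before trying again.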

Lost Table

• Table can be recovered

• From table checkpoint only

• Enforce logical consistency

• Journaling must be enabled

Table Checkpoints - Examples

• Command line
  • Checkpoint table t1
      ckpdb dbname -table=t1
  • Checkpoint table t1 and t2
      ckpdb dbname -table=t1,t2

Table Recovery - Examples

• From table checkpoint only

• Command line
  • Recover table t1
      rollforwarddb dbname -table=t1
  • Recover table t1 and t2
      rollforwarddb dbname -table=t1,t2

Housekeeping Ingres

• Infodb

• Checkpoints

• Dumps

• Journals

Infodb / aaaaaaaa.cnf

• Shows meta-data about database
  • Locations
  • Checkpoint sequence
    • Valid / Invalid
  • Dump / Journal sequence
  • Counters
    • Last table id
  • Last valid checkpoint

Infodb / aaaaaaaa.cnf

• Info stored in aaaaaaaa.cnf

• Three copies
  • Primary database location
  • Dump location as aaaaaaaa.cnf
  • Dump location as cxxxx.dmp

• Infodb reads CNF file in database area

• Copy to dump area with every change
  • II_DUMP
  • database own dump area

Checkpoint files

• Stored in 1 location
  • II_CHECKPOINT
  • Database defined checkpoint area

• One file for each location

• Format depends on archiver used

Dump files

• Changes during ONLINE checkpoint

• Required for recovery

• Single location
  • II_DUMP
  • Database defined dump area

Journal Files

• Record of changes
  • Table configuration

• Facilitates point in time recovery

• Files stored in single location
  • II_JOURNAL
  • Database defined journal area

Backing up the backup files

• OFFLINE Checkpoint
  • Database aaaaaaaa.cnf
  • Dump aaaaaaaa.cnf
  • Output from infodb
  • Checkpoint
  • Journals

• ONLINE Checkpoint
  • All above
  • Dump files

Cleaning up

• ckpdb -d
  • Deletes all but the last checkpoint
  • Dump, journal files deleted as well

• alterdb -delete_oldest_ckp
  • Deletes the oldest checkpoint only
  • Maintain set of checkpoints
  • Dump, journal files deleted as well

Customisation

• cktmpl.def
  • $II_SYSTEM/ingres/files

• Defines actions
  • Before / During / After
  • Tape
  • Disk

• II_CKTMPL_FILE
  • ingsetenv only

• Most common entries to change:
  • WSDD: work phase of regular checkpoint
  • WRDD: work phase of regular rollforward

• Some things you can do:
  • add compression/decompression
  • use a different utility (eg star instead of tar)
  • wild and crazy stuff

• Test both checkpoint and restore after modifying the template
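One cautious way to start customising is to leave the shipped template alone and point II_CKTMPL_FILE at a copy, as in this sketch. The function name `use_custom_template` and the custom path are placeholders; the sketch assumes `II_SYSTEM` is set in the environment and that `ingsetenv NAME VALUE` is the calling form on your release. The template entries themselves are not shown because their syntax should be taken from the shipped cktmpl.def.

```shell
# Sketch: work on a copy of the shipped template and register it with
# ingsetenv. use_custom_template and the custom path are illustrative.
use_custom_template() {
    custom=$1
    default="$II_SYSTEM/ingres/files/cktmpl.def"

    if [ ! -f "$default" ]; then
        echo "no default template at $default" >&2
        return 1
    fi

    # Start from the shipped template; never overwrite an existing custom copy
    if [ ! -e "$custom" ]; then
        cp -p "$default" "$custom" || return 1
    fi

    # Edit "$custom" by hand (for example the WSDD and WRDD entries), then
    # point Ingres at it
    ingsetenv II_CKTMPL_FILE "$custom"
}
```

Keeping the original file untouched gives an easy way back if the modified template turns out not to restore cleanly.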

Issues To Consider

• Files
  • Ingres supports large files
  • OS archiver utility may not
    • POSIX standard
    • tar
    • cpio
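A quick way to see whether this issue applies is to list the data files that exceed whatever per-file limit your archiver has. The sketch below uses only POSIX `find`; the function name `files_over_limit` is made up, and the limit is passed in by the caller because it depends on the archiver and format in use (for instance, 8589934591 bytes for the 8 GiB ceiling of the classic ustar tar header).

```shell
# Sketch: list files under a directory that are larger than a byte limit.
files_over_limit() {
    dir=$1
    limit_bytes=$2
    # -size +Nc matches regular files larger than N bytes
    find "$dir" -type f -size +"${limit_bytes}c" -print
}
```

Running it against each data location before the first checkpoint with a new template shows whether the chosen utility will cope.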

Tips and Cautions

• Hardware "solutions" aren't solutions
  • "I don't need to back up, I have the magic solution of the moment"
  • RAID 5, mirroring, whatever
  • you aren't protected against software failures
  • you aren't protected against human failures
  • you aren't protected against disasters
  • you may not be protected against multiple hardware failures
  • you are putting all your eggs in one basket

Tips and Cautions

• Backups are no good if they don't work
  • make sure that ckpdb works
  • automatic verification is better than manual verification
  • not ensuring that checkpoints are working may be the #1 cause of recovery failure

• Automate as much as possible
  • error checking
  • disk space checking
  • old-checkpoint deletion
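The disk space check is the easiest of these to automate. A minimal sketch, using the POSIX output format of `df`; the function name `enough_space` and the threshold are placeholders to be sized from your own checkpoint history:

```shell
# Sketch: succeed only if the file system holding DIR has at least NEED_KB
# kilobytes available.
enough_space() {
    dir=$1
    need_kb=$2
    # df -Pk prints one portable line per file system; field 4 is available KB
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$need_kb" ]
}
```

A scheduled job could call it on the checkpoint location before running ckpdb and raise an alert instead of starting a checkpoint that is bound to fail.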

Tips and Cautions

• A choice of checkpoints is better than just one
  • avoid ckpdb -d (delete all prior checkpoints)
  • alterdb -delete_oldest_ckp is better
  • manual (or scripted) deletion of old checkpoints is often best
    • maintains checkpoint history in the config file

• Keep as many checkpoints as you can
  • gives you more recovery options
  • don't skimp on checkpoint disk space (disks are cheap!)
  • you can delete checkpoints but keep journals
  • it's all on OS backups, right??

Tips and Cautions

• Be wary of checkpointing to tape
  • nasty, unreliable devices they are
  • "oops, there wasn't a tape in the drive"
  • if you must use tape, verify your backups regularly
    • tape drives have been known to write unreadable tapes

• Keep checkpoint and dump locations together
  • on the same file system or drive
  • keep them on the same OS backup schedule
  • checkpoints are worthless without the dump info

Tips and Cautions

• Practice is essential
  • not just once, but regularly
  • practice on look-alike installation if production is not available
  • practice on production at least occasionally
    • clean Ingres shutdown
    • OS backup everything in sight
    • verify the OS backup, then run your recovery tests
  • you need hardware resources to support your recovery practice

Tips and Cautions

• Document your recovery procedures
  • let someone else do a trial recovery
  • keep the procedures up to date
  • make sure that more than one person knows how to do a recovery
  • make sure that more than one person knows where to find the documentation
    • keep a copy offsite or in a safe place

Tips and Cautions

• Backing up and archiving are different
  • a backup has a short useful lifetime
  • an archive (unload) is good indefinitely

• Backup planning and disaster recovery planning are different
  • recoverable backups are just one aspect of a complete disaster recovery plan

More Information

• Ingres DBA guide
  • Chapter 15 (2.6)

• Ingres Command Reference Guide

• Compressed Checkpoints
  • Servicedesk Doc ID 409751

Summary

• Backups deserve more than lip service

• Ensuring 100% recoverable backups takes time, effort, and money

• Ingres checkpoint and rollforward capabilities are simple yet powerful and customisable

• With proper practice and procedures, a recovery is nothing to be afraid of

Questions & Answers

?