database corruption be prepared, not scared. richard banville fellow, openedge development progress...

44
Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

Upload: letitia-may

Post on 17-Dec-2015

283 views

Category:

Documents


7 download

TRANSCRIPT

Page 1: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

Database Corruption

Be prepared, not scared.

Richard BanvilleFellow, OpenEdge DevelopmentProgress Software

Page 2: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.2

Dealing With Database Corruption

•Prepare for the worst, hope for the best

Preparation

•Stopping corruption before it happens

•Avoiding foolish behavior

Prevention

•Identifying you have a problem

•Pinpointing the cause

Detection

•Resolving corruption with least impact

Reaction

Page 3: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.3

Types Of Corruption

Corruption can be small or widespread

• User based corruption

• Internal system based corruption

Physical

• Block level corruption

• Hardware: Bad disk, memory, etc

Logical

• Missing Data

– Relational issues

• Data access

– Index issues

Page 4: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.4

Be Prepared

Modern Release (all facets of deployment)

Backups – perform regularly

• Backup database AND application

– Perform large backups with split mirrors

– Run online backup with –Bp

• TEST your backups with restore & access or hot stand-by

– prorest – Validates data written successfully (not proper data written)

– prorest –vf: Compares against original, but who wants to be down that long?

• Use offsite storage

Run with AI enabled

• Put AI files on separate disk/separate controller

• AI management tool makes AI management easy

prorest <db> -vp

prorest <db> -vf

Page 5: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.5

Be Really Prepared

Keep hot standby

• Continually roll forward AI files

• OpenEdge Replication

Have a comprehensive recovery strategy

• Audit changes

• Plan for natural disasters

• Plan for not so natural disasters

Document and test your recovery strategy

• Educate at all levels of organization

Implement redundancy

• Failover Clusters

• Have a duplicate remote site

Page 6: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.6

Database Consistency Checking

Index name in customer for recid 16991 could not be deleted.

Wrong key in index 10 for record 2010.

Invalid size of an index entry.

Seen these messages before?

Page 7: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.7

Database Consistency Checking

Invalid RM block for area 10

rmdoins: pbk->free went negative dbkey 4096

bkwrite: bktbl dbk 4096 not equal to bkbuf dbk -1234

bkaddr called with negative blkaddr:  -1234

Or how about these…

Page 8: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.8

Database Consistency Checking

Stop shared memory problems before they happen

Memory overwrite protection

• -MemCheck

• Ensure block changes written to proper shm location

Buffer 1

Buffer 2

Buffer 1

Buffer 2

Insert new key entry

Oops! Miscalculation results in memory stomp of next block header.

Page 9: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.9

Database Consistency Checking

Stop database corruption from becoming persistent

Physical block consistency checking

• -DbCheck

• Validates record and index blocks after each update operation

– -AreaCheck “area name”

– -Index Check “index name”

– -TableCheck ‘table name”

Typically the result of a bug

Available for OLTP and roll forward

Page 10: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.10

Enabling Database Consistency Checking

Database startup parameter (-MemCheck, -DbCheck)

Managed via promon R&D, 4. Admin Functions

• 8. Block level consistency check

Current consistency check status:

1. -MemCheck: enabled

2. -DbCheck: enabled

3. –AreaCheck: disabled

4. -IndexCheck: disabled

5. -TableCheck: disabled

Enter the option to enable/disable a consistency check:

Page 11: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.11

Database Consistency Checking Performance Impact

Memory checking: unnoticeable impact

Block level checking: still reasonable

On error, get .lg file to Progress Technical Support

Current consistency check status:

1. -MemCheck: enabled

2. -DbCheck: enabled

3. –AreaCheck: disabled

4. -IndexCheck: disabled

5. -TableCheck: disabled

Enter the option to enable/disable a consistency check:

~5%

< 1%

Page 12: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.12

Identifying Problem Types and Reacting

There are many ways for data to get corrupted

Identifying corruption type

• Key word association can help direct recovery effort

• Understanding process can also help

Quickest way to recovery

• Knowing the tools & which to use is key

• Practice recovery efforts before needed

Let’s examine a few

Page 13: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.13

Index Issues

• Index• (ix, cx, ky)• Root block

• B-tree• Key entry• Cursor

Index Messaging

• Index <i> in <t> for recid <r> could not be deleted. (1422) 

– Logical corruption: Missing entries or record not found

• Index <i>, block <b>, element no. 1: bad compression size. (4423)

– Physical corruption: Storage format of index is incorrect

How to proceed

Page 14: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.14

Index Validation Tools

• Idxcheck online validation levels

– Physical/Block corruption

o Physical consistency

– Logical/key entry corruption

o Keys to records

o Records to keys

o Validate key order

– Lock table option

• New index rebuild may be faster!

proutil <db> -C idxcheck

Page 15: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.15

Index Validation & Repair Tools

proutil <db> -C idxfix

Select one of the following:

All (a/A) - Fix all the indexes Some (s/S) - Fix only some of the indexesBy Area (r/R) - Fix indexes in selected areasBy Schema (c/C) - Fix indexes by schema ownersBy Table (t/T) - Fix indexes in selected tablesBy Activation (v/V) - Fix selected active or inactive indexes

Index Fix Utility

1. Scan records for missing index entries.2. Scan indexes for invalid index entries.3. Both 1 and 2 above.4. Cross-reference check of multiple indexes for a table.5. Build indexes from existing indexes.6. Delete one record and it's index entries.7. Quit.

Fix indexes on Scan.

Is this correct? (y/n)

Page 16: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.16

Index Validation & Repair Tools

• Online operation

– Transactions are relatively small

– Does not fix physical block corruption

– One concurrent idxfix process per table

proutil <db> -C idxfix

Index Fix Utility

1. Scan records for missing index entries.2. Scan indexes for invalid index entries.3. Both 1 and 2 above.4. Cross-reference check of multiple indexes for a table.5. Build indexes from existing indexes.6. Delete one record and it's index entries.7. Quit.

Page 17: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.17

Using Index Fix: Record but no index entry

proutil <db> -C idxfix

bbbb Field2 Field3 Field4

richb Field2 Field3 Field4

aaaa Field2 Field3 Field4166891669016691

10 11 12

OLTP (.lg and screen):

Index name in customer for recid 16691 could not be deleted.

1. Scan records for missing index entries:

Index 12 (customer, name): couldn't find key <RICHB> recid 16691.

Option #1: Add key entry to index

• 1. Scan records for missing index entries.

• Fix indexes on Scan. Yes

NOTE: 2. Scan indexes for invalid index entries.

• Would NOT report an error!

Page 18: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.18

Find first cust where recid(cust) = 16691. display cust

Using Index Fix: Record but no index entry

proutil <db> -C idxfix

richb Field2 Field3 Field4

bbbb Field2 Field3 Field4

aaaa Field2 Field3 Field4166891669016691

OLTP (.lg and screen):

Index name in customer for recid 16691 could not be deleted.

1. Scan records for missing index entries:

Index 12 (customer, name): couldn't find key <RICHB> recid 16691.

Option #2: Delete record and its key entry in table’s other indexes

6. Delete one record and it's index entries.

• Type the recid to delete: 16691

• Type the area (number) for the recid(s): 8

– Look in the .st file to match area number and area name.

10 11 12

Page 19: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.19

Using Index Fix: Record but no index entry

proutil <db> -C idxfix

Often no runtime error reported.

2. Scan indexes for invalid index entries:

Index 12 (customer, name): found invalid key <RICHB> recid 16691.

Only option: remove invalid key entry

• 2. Scan indexes for invalid index entries

• Fix indexes on Scan. Yes

NOTE: 1. Scan records for missing index entries.

• Would NOT report an error!

10 11 12

Page 20: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.20

Fixing Index Corruption (continued)

Missing key entries or record not found (logical corruption)

• Index fix

• Action based on record removal or index entry insert/delete

Index <i>, block <b>, element no. 1: bad compression size

• Physical b-tree corruption

• Must rebuild index to recover

Page 21: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.21

Index Repair Tools

• Offline utility

• Performance improvements since 10.2b06

• Will repair:

– Index block corruption (physical)

– Orphan index blocks

– Adds missing index entries

o Assumes record data is correct

• Flexible options (db, area, table, index)

• Truncates existing BI file

– Does not record idxbuild changes into BI file

proutil <db> -C idxbuild

Page 22: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.23

Index Build/Repair Tools

• Builds and activates index

• Online

– One concurrent idxactivate process per table

– Requires client schema re-cache

– Transaction size based on “recs” parameter

• Deactivate requires exclusive access

• Repair logical and physical index corruption

• Assumes valid record data

• *** Static queries require recompile to consider new index

proutil <db> -C idxactivate <i1> useindex <i2>

Page 23: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.24

Record Issues

• Record • (rm, bf, rec)• recid

• rowid• field

Record Fragment 1

Field1 Field2 Field3 Field4

Record Messaging

• bffld: nxtfld: scan past last field. (16)

Looking for field #5 but only 4 fields exist

• Record continuation not found, fragment recid <r> area <a>. (10831)

Pointer to next record fragment is invalid

How to proceed

Page 24: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.25

Checking For Inconsistencies Online

• Reads record for statistics purposes

Physical Validation

5. Read or Validate Database Block(s)

• Validation levels

– 0: Block header info only

– 1: Record header & record size

– 2: Record overlap checking

Logical Validation w/schema

3. Record Validation

4. Record Version Validation

proutil <db> -C dbanalys | tabanalys

dbtool <db>

Page 25: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.26

Record Repair Tools

dbtool <db>

proutil <db> -C idxfix

bffld: nxtfld: scan past last field. (16)

• Online and multi-threaded

6. Record Fixup

– Adds missing fields

– Removes invalid “end-rec” indicator

6. Delete one record and it’s index entries

Page 26: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.27

Record Repair Tools

proutil <db> -C dbrpr

Record Fragment 1

Record continuation not found, fragment recid <r> area <a>.

3. Remove Bad Record Fragment

14. Display Record Contents

• Exclusive access

• Truncate bi file

Page 27: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.28

Warning:

The use of dbrpr to fix problems in the database should be done with the assistance

of Progress Technical Support.

More Record Repair Tools

Record continuation not found, fragment recid <r> area <a>

Record Fragment 1

Page 28: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.29

Dbrpr Record Fix-up Example – Last resort

Before you do anything: Validate current backup

Options:

1. Database Scan Menu 2. Test One or More Indexes 3. Remove Bad Record Fragment 4. Dump Block 5. Load Block 6. Copy Bytes Between Files 7. Load RM Dump File 8. Reformat Block to a Free Block 9. Change Current Working Area

10. Display the Free Chain 11. Display the RM Chain 12. Display the Index Delete

Chain 13. Display Block Contents 14. Display Record Contents 15. Display Cluster Chain 16. Scan/Fix block checksum

proutil <db> -C dbrprproutil <db> -C truncate bi

Page 29: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.30

Dbrpr Record Fix-up Example – Last resort

proutil <db> -C dbrprproutil <db> -C truncate bi

1. Report Bad Blocks

3. Fix Bad Blocks

4. Report Bad Records

5. Delete Bad Records

6. Dump Records to RM File

7. Rebuild Free Chain

8. Rebuild RM Chain

9. Rebuild Index Delete Chain

10. Change Current Working Area

11. Fix Cluster Chains in Type II Area

Record continuation not found, fragment recid 16896 area 8 3.

Before you do anything: Validate current backup

Validate bad record info

1. Database Scan Menu

Page 30: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.31

Dbrpr Record Fix-up Example – Last resort

proutil <db> -C dbrprproutil <db> -C truncate bi

Offset Len Hex Ascii

19 1 0x64 d

21 5 0x72696368620 richb

30 0 0 “”

35 2 0x6d61 MA

35 3 0x626262 BBB

Record continuation not found, fragment recid 16896 area 8 3.

Get a view of what you are going to delete:

• 9. Change Current Working Area

• 13. Display Block Contents

– 1. Dump Data Block Details

– 6. Start Dbkey

Delete partial record

• 9. Change Current Working Area

• 3. Remove Bad Record Fragment

Re-validate (see previous screen)

Page 31: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.32

Other Record Oriented Repair Tools

Binary dump

• Online & multi-threaded

• Binary record format

– May not fix individual record corruption

– May fail when encountering physical corruption

– Use selective binary dump to dump in ranges

• -index defaults to primary index

– Use different index if primary cannot be used

– Use –index 0 if no valid index exists (Type II storage area)

proutil <db> -C dump <table> . -index <i>

Page 32: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.33

Other Record Repair Mechanisms

Dump records in “PUB” schema by rowid

Manual Ascii dump and load “repair”

Reload w/bulk load or ABL import

Specify index to use or TABLE-SCAN

DEFINE VARIABLE ix AS INTEGER NO-UNDO.

FIND _file "item".

OUTPUT TO item.d.

DO ix = 1 TO 10000:

FIND item WHERE RECID(item) = ix NO-ERROR.

IF AVAILABLE item AND ix <> INTEGER(_file._template) THEN

EXPORT item.

END.

Make sure Large enough!

Page 33: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.34

Block Issues

• bkio, bk, bm• Dbkey• Block

• Buffer• Area• Extent

Block and shared memory buffer messages

• Wrong dbkey in block. Found <x>, should be <y> in area <z>. (1124)

– Read, write, modify, release

– Most often O/S File System issue

– Reboot often fixes this error – but why?

• bkioWrite:Unknown O/S error during write, errno 2, fd <x>, len <y>, offset <z>, filename <s> database <t>. (14676)

• Attempt to read block <n> which does not exist in area <a>. (201)

– Often index rebuild will fix this error. (rebuild on area level)

Page 34: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.35

Block Repair Tools

proutil <db> -C dbrpr

• Master block: 1• Record block: 3• Free block: 4

• Index block: 5

Checksum validation of dbkey <d> block type 4 in area <a> does not match data. Expected: <e> received <r>. (14410)

1. Report Bad Checksum

2. Fix Bad Checksum

16. Scan/Fix block checksum (Type II Area)

• Ignore for free blocks (block type 4)

• Validate database by other means prior to “fixing”

• True corruption will require a database rebuild

– dump and load

– restore/roll forward

Page 35: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.36

Block Chain Repair Tools

RM RM RM RM RM

RM RM RM RM RM

RM FREE RM RM RM

RM chain count inconsistency.

20 Blocks indicated on record free chain (actually 5)

RM block found not on RM chain, but flagged RM chain.

RM block free chain link error

<type> Block <number> with invalid chain type <number> on RM chain

Free block marked on free chain but linked into RM chain  

Page 36: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.37

Block Chain Repair Tools

proutil <db> -C dbrpr

RM chain count inconsistency.

RM block found not on RM chain, but flagged RM chain.

<name> Block <number> with invalid chain type <number> on RM chain 

1. Database Scan Menu

7. Rebuild Free Chain

8. Rebuild RM Chain

9. Rebuild Index Delete Chain

11. Fix Cluster Chains in Type II Area

• Rebuild free chains/rm chains from dbrpr

• Seek help from support

Page 37: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.38

Recovery Manager Issues

• Recovery (rl)• Redo• Undo• Retry

• Before image• After image• ai, a<n>• bi, b<n>

• Transaction (tm)

Recovery Messages

** The after-image file expected Tue Feb 26 16:47:29 2013. (832)

** Those dates don't match, so you have the wrong copy of one of them. (833)

Undo failed to reproduce the record in area <a> with rowid <r> and return code  -1. (10566)

Invalid block <x> for file <y>.a3, max is 1024 (2329)

How to proceed

• Restore / roll forward

• Switch to hot standby

Page 38: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.39

Recovering From Recovery Failures

I’ve got no backup & crash recovery won’t work?

• Looks further back in BI.

• Should no longer be needed but its worth a try!

**** As a very last resort, force truncate

• What are the side effects of skipping crash recovery?

– -F: How bad could it be?

• Dump and re-load into new database

• Reconcile data contents and relationships after load

• Backup & enable AI

• Maintain hot standby

proutil <db> -C truncate bi -F

proutil <db> -C truncate bi –G 120

Page 39: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.40

Structural Repair

Those dates don't match, so you have the wrong copy of one of them.

• Usually the result of an OS copy or move

• Make sure all right pieces in place & .st file identifies them correctly

• Does NOT repair corrupt database

• Updates path names to those specified in .st file

• Use “sparingly”

• Does NOT repair corrupt database

• Patches date mismatch & creates dummy extents

• Use to recover what ever data remains when no backup exists

prostrct repair <db> <x>.st

prostrct unlock <db> <x>.st

Page 40: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.41

rm x.db - Ooops!

• Rebuild database “control area” (.db file) from .st file

• Changes to control area are not logged

– Cancelling a txn that changes control area may require builddb

• May force re-base for OpenEdge Replication

• Always have an up to date .st file

Structural Repair

prostrct list <db>

prostrct builddb <db> <db>.st

Page 41: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.42

Summary

The many faces of corruption

• Corruption shows itself in many different ways

• Hard and soft corruption

• Memory and disk. Record, index, block and db structure

Some repair tools are a loaded gun

• In the wrong hands they can produce havoc

Preparation is your best way to recovery

• Standard disaster recovery preparations

• Knowing options before problems occur

Page 42: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

© 2013 Progress Software Corporation. All rights reserved.43

Questions?

Page 43: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software

October 6–9, 2013 • Boston #PRGS13

www.progress.com/exchange-pug

Special low rate of $495 for PUG Challenge attendees with the code PUGAM

And visit the Progress booth to learn more about the Progress App Dev Challenge!

Page 44: Database Corruption Be prepared, not scared. Richard Banville Fellow, OpenEdge Development Progress Software