firebird anti-corruption approach

34
ANTI-CORRUPTION How to prevent Firebird database corruption Alexey Kovyazin, IBSurgeon, [email protected]

Upload: nataly-polyanskaya

Post on 18-Dec-2014

3.892 views

Category:

Technology


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Firebird Anti-Corruption Approach

ANTI-CORRUPTIONHow to prevent Firebird database corruption

Alexey Kovyazin,IBSurgeon,

[email protected]

Page 2: Firebird Anti-Corruption Approach

IBSurgeon – 8 years!

• Products• IBFirstAID/FBFirstAID, etc• FBScanner• FBDataGuard

• Clients• Carl Zeiss Meditec, USA• Vneshtorgbank, Russia• Wells Fargo Bank, USA• Watermark Software, UK• Bas-X, Australia• Victoria University, New

Zealand• Kingsway Management, UK

• Team• Dmitry Kuzmenko• Alexey Kovyazin• Sergey Nikitin• Oleg Mateveev & team

• Consultants• Dmitry Yemanov, Vlad

Khorsun, Alex Peshkoff

• Partners• IBPhoenix

Page 3: Firebird Anti-Corruption Approach

Alexey Kovyazin

Borland Russia CodeGear Embarcadero Microsoft IBSurgeon

Hosting & cloud

partners

In 2007 we sold 3mln of Delphi to all Russian

schools

2006 2008 2009 2010

Yes, it’s me! http://ru.linkedin.com/in/kovyazin

Page 4: Firebird Anti-Corruption Approach

Agenda• Why bother?• Why corruption happens?

• Reasons• Symptoms

• What things are to monitor to recognize problem?• Problems with server • Problems with environment• Problems with database

• Maintenance improvements to prevent corruptions.• Backups

• Why we created FBDataGuad?

Page 5: Firebird Anti-Corruption Approach

Why bother?• Firebird databases become bigger and bigger every year

• Information inside Firebird can costs $XXXXXX• Outage (corruption, backup/restore breaks) can costs $XXXXX too

• Real-world examples• Bas-X, Australia – Firebird 2.x, 250Gb, no BLOBs, 250 users• Watermark Software, UK – Firebird 2.x, up to 400Gb, with BLOBs • Profitmed, Russia, medical distribution, Firebird 1.5, 65Gb, 250 users

• 1 Terabyte Firebird 2.1 database• http://www.ib-aid.com/articles/item104 • 3.8Billions of records in the biggest table

Page 6: Firebird Anti-Corruption Approach

Corruption reasons & symptons

•Reasons• Misadministration• Hardware failures• Bugs

• The actual reason often remains undiscovered

• Symptoms• Repeatable• Trackable• Complimentary

We can prevent corruption if see its symptoms.

Page 7: Firebird Anti-Corruption Approach

DatabaseFirebird

Server

Backups

Copy of backups

Sample Firebird environment

Page 8: Firebird Anti-Corruption Approach

What to monitor at Firebird instance level

General parameters

Server version

Logs

How much RAM?

Temp files?

Is Server online?

# Mb

Records to analyze 6 levels

Is it recommended?

Bugs, issues

Size of logs

Page 9: Firebird Anti-Corruption Approach

Firebird instance key parameters

Need to watch for 7 key parameters which can indicate possible or actual problems

1. Server availability

2. Consumed RAM

3. Temp files size

4. Temp files quantity

5. Records in logs

6. Logs’ size

7. Server version related issues

Page 10: Firebird Anti-Corruption Approach

What to monitor-1

• General database checks• Database availability -> Outages, firewalls, stability• Log records related with database in firebird.log -> early

symptoms• Check metadata – validate all metadata -> early showings

• Transactions• Transaction markers monitoring (garbage problems)• Limit (2 billions between backup/restore)

• Users• Min/max/avg users –> peaks problems, design of application

Page 11: Firebird Anti-Corruption Approach

What to monitor-2

• Database files• Single volume and multi-volume -> Volumes in bin• Paths – where to stored (not at the same drive with temp files and

backups!)• Sizes and growth limits -> Warnings about growth

• Delta-files (nbackup)• Life-time and sizes -> Huge/aged delta problems

• Backup files• Existence, sizes and growth limits -> Backup could kill database

Page 12: Firebird Anti-Corruption Approach

What to monitor-3

• Number of formats per table• No more than 255 -> corruption• Less formats in production -> performance problem

• Non-activated and deactivated indices• Deactivated – explicitly deactivated (why deactivated?)• Non-activated – indicates problems during restore

Page 13: Firebird Anti-Corruption Approach

What to monitor-4

• Periodical statistics (gstat) -> deep look into database• Firebird server version

• Examples - problems with nbackup• Latest patches are recommended

• Firebird fbclient.dll version• If fbclient.dll <> fbserver - > Problems (disconnects, 10054, errors)

• Firebird installation size• Default database place is %Firebird%\Bin

• Firebird logs size and paths• Big logs quickly exhaust space -> corruption

Page 14: Firebird Anti-Corruption Approach

Maintenance-1

• Backups• Revolver (days, week, month

copies) backups• Backup depth• Checking restore (need to check

results)• Growth prognosis (if not enough

space, backup should be canceled)• Control backup time (too long

backup indicates problems)

Today

Yesterday

Weekly

5..7

Page 15: Firebird Anti-Corruption Approach

Big database requires individual maintenance plan • Maintenance plan depends on size of database and work

mode (8x5, 24x7)• Backups scheme is not simple• Perform test restores separately• To be checked

• Errors – in firebird.log and run error checking quries on live database

• Metadata – check integrity • Metadata limits• Data & BLOBs – walk through data, check segmentation• Indices – check indices health• Transactions – any gaps, garbage growth, other problems

Page 16: Firebird Anti-Corruption Approach

Everyday minimal (!) maintenance plan for big database

Database is online?

Backup done Ok?

Checking restore done

Ok?

Errors in firebird.log?

All Indices Ok?

All Indices statistics done?

Enough space for database?

Enough space for backups?

Transaction markers Ok?

Page 17: Firebird Anti-Corruption Approach

Example of backup plan for big Firebird database

Firebird database

Nbackup copy Gbak -b

Checking restore

And each step should be confirmed and reported.

Maintenance serverMain server

Page 18: Firebird Anti-Corruption Approach

Maintenance-2

• Indices• Recalculate indices statistics -> Performance

• Selected or excluded

• Check index status – active/in-active/non-activated -> Problems, corruptions

• Check physical index health• Early showings of corruptions

Page 19: Firebird Anti-Corruption Approach

Maintenance-3

• Validate database with gfix• Don’t forget to shutdown database• Analysis (including firebird.log)

• Metadata validation• Check important system tables

• Firebird.log maintenance• When log becomes very big, copy it to backup log files

•And some more things….

Page 20: Firebird Anti-Corruption Approach

And this is not enough!

•Business wants to have warranty - even if hardware fails data should be recovered!

Page 21: Firebird Anti-Corruption Approach

A big job• Implement scripts• Check them in the test environment• Explore errors messages and codes of Firebird

• We spent 6 years getting the necessary information…

Page 22: Firebird Anti-Corruption Approach

THAT’S WHY WE CREATED FBDATAGUARD

Page 23: Firebird Anti-Corruption Approach

FBDataGuard does all above things…• Watches database files, volumes, deltas, performs and checks backups in the right way

• Verifies metadata, data and indices• Watches for errors, limits and wrong versions• Sends alerts and recommendations

Page 24: Firebird Anti-Corruption Approach

Example with TEMP

FBDataGuard found the temp files size = N

Free space at TEMP- locations

= M

Not enough space

M – N<X

Not enough space –

administrator will have alert and

recommendation to increase

TEMP

Page 25: Firebird Anti-Corruption Approach

Example alert

Too big temporary files

Total size of all temporary files 3 Gb is more than recommended: 500 Mb

Firebird creates temporary files for some SQL queries (PLAN SORT). Too big size of temporary files can indicate performance problems. This is not a strictly defined number, so this threshold depends on particular database and application.

Page 26: Firebird Anti-Corruption Approach

Index problem example

FBDataGuard found non-

activated index after restore

non-activated indices usually

indicates corruption (missed

Foreign Keys)

Administrator will get alert and

recommendation to check indices

Possible perfomance

problem prevented!

Page 27: Firebird Anti-Corruption Approach

Example of backup problem resolution

FBDataGuard found free space at backups’ disk =

N

FBDataGuard found the

backup size = M Not

enough spaceM>=N

Backup cancelled, database status is

set to Critical, administrator got

alert

Corruption of backup was prevented!

Page 28: Firebird Anti-Corruption Approach

Example of backup problem alert

• Job backup@[ server-0000 / db-0000 ] malfunction• Unexpected job backup@[ server-0000 / db-0000 ] error:

There is not enough space on the disk

Page 29: Firebird Anti-Corruption Approach

Example of good backup notification

Page 30: Firebird Anti-Corruption Approach

Hardware and UNDELETE failures

•HDD corruption•Flash-drive corruption•UNDELETE problem

Page 31: Firebird Anti-Corruption Approach

And even more – protects from hardware failures

BLOBs

Tables data

Metadata repository FBDataGuard Extractor extracts

data from corrupted

database and inserts to the

new

New DB

Page 32: Firebird Anti-Corruption Approach

Firebird DataGuard

• Watch for 26 important database and server parameters

• Alerts for potential and real problem by email• Proper automation of database maintenance• Windows, Linux, MacOS, Firebird 1.5-2.1 (not 2.5 yet)

• Special licensing for ISV (Independent Software Vendors) Firebird developers

Page 33: Firebird Anti-Corruption Approach

Get FBDataGuard 1 year

• Free 1 year license for all attendees• Send request to [email protected]

Page 34: Firebird Anti-Corruption Approach

Thank you• Questions? • [email protected]