firebird anti-corruption approach
DESCRIPTION
TRANSCRIPT
ANTI-CORRUPTIONHow to prevent Firebird database corruption
Alexey Kovyazin,IBSurgeon,
IBSurgeon – 8 years!
• Products• IBFirstAID/FBFirstAID, etc• FBScanner• FBDataGuard
• Clients• Carl Zeiss Meditec, USA• Vneshtorgbank, Russia• Wells Fargo Bank, USA• Watermark Software, UK• Bas-X, Australia• Victoria University, New
Zealand• Kingsway Management, UK
• Team• Dmitry Kuzmenko• Alexey Kovyazin• Sergey Nikitin• Oleg Mateveev & team
• Consultants• Dmitry Yemanov, Vlad
Khorsun, Alex Peshkoff
• Partners• IBPhoenix
Alexey Kovyazin
Borland Russia CodeGear Embarcadero Microsoft IBSurgeon
Hosting & cloud
partners
In 2007 we sold 3mln of Delphi to all Russian
schools
2006 2008 2009 2010
Yes, it’s me! http://ru.linkedin.com/in/kovyazin
Agenda• Why bother?• Why corruption happens?
• Reasons• Symptoms
• What things are to monitor to recognize problem?• Problems with server • Problems with environment• Problems with database
• Maintenance improvements to prevent corruptions.• Backups
• Why we created FBDataGuad?
Why bother?• Firebird databases become bigger and bigger every year
• Information inside Firebird can costs $XXXXXX• Outage (corruption, backup/restore breaks) can costs $XXXXX too
• Real-world examples• Bas-X, Australia – Firebird 2.x, 250Gb, no BLOBs, 250 users• Watermark Software, UK – Firebird 2.x, up to 400Gb, with BLOBs • Profitmed, Russia, medical distribution, Firebird 1.5, 65Gb, 250 users
• 1 Terabyte Firebird 2.1 database• http://www.ib-aid.com/articles/item104 • 3.8Billions of records in the biggest table
Corruption reasons & symptons
•Reasons• Misadministration• Hardware failures• Bugs
• The actual reason often remains undiscovered
• Symptoms• Repeatable• Trackable• Complimentary
We can prevent corruption if see its symptoms.
DatabaseFirebird
Server
Backups
Copy of backups
Sample Firebird environment
What to monitor at Firebird instance level
General parameters
Server version
Logs
How much RAM?
Temp files?
Is Server online?
# Mb
Records to analyze 6 levels
Is it recommended?
Bugs, issues
Size of logs
Firebird instance key parameters
Need to watch for 7 key parameters which can indicate possible or actual problems
1. Server availability
2. Consumed RAM
3. Temp files size
4. Temp files quantity
5. Records in logs
6. Logs’ size
7. Server version related issues
What to monitor-1
• General database checks• Database availability -> Outages, firewalls, stability• Log records related with database in firebird.log -> early
symptoms• Check metadata – validate all metadata -> early showings
• Transactions• Transaction markers monitoring (garbage problems)• Limit (2 billions between backup/restore)
• Users• Min/max/avg users –> peaks problems, design of application
What to monitor-2
• Database files• Single volume and multi-volume -> Volumes in bin• Paths – where to stored (not at the same drive with temp files and
backups!)• Sizes and growth limits -> Warnings about growth
• Delta-files (nbackup)• Life-time and sizes -> Huge/aged delta problems
• Backup files• Existence, sizes and growth limits -> Backup could kill database
What to monitor-3
• Number of formats per table• No more than 255 -> corruption• Less formats in production -> performance problem
• Non-activated and deactivated indices• Deactivated – explicitly deactivated (why deactivated?)• Non-activated – indicates problems during restore
What to monitor-4
• Periodical statistics (gstat) -> deep look into database• Firebird server version
• Examples - problems with nbackup• Latest patches are recommended
• Firebird fbclient.dll version• If fbclient.dll <> fbserver - > Problems (disconnects, 10054, errors)
• Firebird installation size• Default database place is %Firebird%\Bin
• Firebird logs size and paths• Big logs quickly exhaust space -> corruption
Maintenance-1
• Backups• Revolver (days, week, month
copies) backups• Backup depth• Checking restore (need to check
results)• Growth prognosis (if not enough
space, backup should be canceled)• Control backup time (too long
backup indicates problems)
Today
Yesterday
Weekly
5..7
Big database requires individual maintenance plan • Maintenance plan depends on size of database and work
mode (8x5, 24x7)• Backups scheme is not simple• Perform test restores separately• To be checked
• Errors – in firebird.log and run error checking quries on live database
• Metadata – check integrity • Metadata limits• Data & BLOBs – walk through data, check segmentation• Indices – check indices health• Transactions – any gaps, garbage growth, other problems
Everyday minimal (!) maintenance plan for big database
Database is online?
Backup done Ok?
Checking restore done
Ok?
Errors in firebird.log?
All Indices Ok?
All Indices statistics done?
Enough space for database?
Enough space for backups?
Transaction markers Ok?
Example of backup plan for big Firebird database
Firebird database
Nbackup copy Gbak -b
Checking restore
And each step should be confirmed and reported.
Maintenance serverMain server
Maintenance-2
• Indices• Recalculate indices statistics -> Performance
• Selected or excluded
• Check index status – active/in-active/non-activated -> Problems, corruptions
• Check physical index health• Early showings of corruptions
Maintenance-3
• Validate database with gfix• Don’t forget to shutdown database• Analysis (including firebird.log)
• Metadata validation• Check important system tables
• Firebird.log maintenance• When log becomes very big, copy it to backup log files
•And some more things….
And this is not enough!
•Business wants to have warranty - even if hardware fails data should be recovered!
A big job• Implement scripts• Check them in the test environment• Explore errors messages and codes of Firebird
• We spent 6 years getting the necessary information…
THAT’S WHY WE CREATED FBDATAGUARD
FBDataGuard does all above things…• Watches database files, volumes, deltas, performs and checks backups in the right way
• Verifies metadata, data and indices• Watches for errors, limits and wrong versions• Sends alerts and recommendations
Example with TEMP
FBDataGuard found the temp files size = N
Free space at TEMP- locations
= M
Not enough space
M – N<X
Not enough space –
administrator will have alert and
recommendation to increase
TEMP
Example alert
Too big temporary files
Total size of all temporary files 3 Gb is more than recommended: 500 Mb
Firebird creates temporary files for some SQL queries (PLAN SORT). Too big size of temporary files can indicate performance problems. This is not a strictly defined number, so this threshold depends on particular database and application.
Index problem example
FBDataGuard found non-
activated index after restore
non-activated indices usually
indicates corruption (missed
Foreign Keys)
Administrator will get alert and
recommendation to check indices
Possible perfomance
problem prevented!
Example of backup problem resolution
FBDataGuard found free space at backups’ disk =
N
FBDataGuard found the
backup size = M Not
enough spaceM>=N
Backup cancelled, database status is
set to Critical, administrator got
alert
Corruption of backup was prevented!
Example of backup problem alert
• Job backup@[ server-0000 / db-0000 ] malfunction• Unexpected job backup@[ server-0000 / db-0000 ] error:
There is not enough space on the disk
Example of good backup notification
Hardware and UNDELETE failures
•HDD corruption•Flash-drive corruption•UNDELETE problem
And even more – protects from hardware failures
BLOBs
Tables data
Metadata repository FBDataGuard Extractor extracts
data from corrupted
database and inserts to the
new
New DB
Firebird DataGuard
• Watch for 26 important database and server parameters
• Alerts for potential and real problem by email• Proper automation of database maintenance• Windows, Linux, MacOS, Firebird 1.5-2.1 (not 2.5 yet)
• Special licensing for ISV (Independent Software Vendors) Firebird developers
Get FBDataGuard 1 year
• Free 1 year license for all attendees• Send request to [email protected]
Thank you• Questions? • [email protected]