oracle berkeley db - transactional data storage (tds) tutorial
DESCRIPTION
TRANSCRIPT
![Page 1: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/1.jpg)
11
A Use-Case Based Tutorial
Oracle Berkeley DB Transactional Data Store
![Page 2: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/2.jpg)
2
Part III: TransactionalData Store
• Overview• What is TDS?• When is TDS appropriate?
• Case Studies• Web services
![Page 3: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/3.jpg)
3
What is TDS?
• Recoverable data management: recovery from application or system failure.
• Ability to group operations into transactions: all operations in transaction either succeed or fail.
• Concurrency control: multiple readers and writers operate on the database simultaneously.
![Page 4: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/4.jpg)
4
Transactions Provide ACID
• Atomic: multiple operations appear as a single operation• all or none of the operations applied
• Consistent: no access sees a partially completed transaction
• Isolated: system behaves as if there is no concurrency
• Durable: modifications persist after failureApplications can relax ACID for performance
![Page 5: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/5.jpg)
5
Transactions
• All operations can be transactionally protected• Database create, remove, rename• Key/Data pair get, put, update
• Why should you care?• Never lose data after system or application failure (durability)• Group multiple updates into a single operation (atomicity)• Roll back changes (abort transaction if something odd
happens)• Hide changes from other users until completed (consistency)• Changes made in other transactions while your transaction is
underway aren’t visible unless you want to see them (isolation)
![Page 6: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/6.jpg)
6
Transaction Terminology
• Thread of control• Process or true thread
• Free-threaded• Object protected for simultaneous access by multiple threads
• Deadlock• Two or more threads of control request mutually exclusive locks• Blocked threads cannot proceed until a thread releases its locks• They are stuck forever.
• Transaction• One or more operations grouped into a single unit of work
![Page 7: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/7.jpg)
7
Transaction Terminology
• Transaction abort• The unit of work is backed out/rolled back/undone
• Transaction commit• The unit of work is permanently done
• System or application failure• Unexpected application exit, for whatever reason
• Recovery• Making databases consistent after failure so they can be
used again• Must fix data and metadata
![Page 8: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/8.jpg)
8
Transactional APIs (1)
• New flags to DB_ENV->open• DB_INIT_TXN, DB_INIT_LOCK, DB_INIT_LOG,
DB_RECOVER, DB_RECOVER_FATAL
• New configuration options• DB_ENV->set_tx_max: max number of concurrent transactions• DB_ENV->set_lk_detect: set deadlock resolution policy• DB_ENV->set_tx_timestamp: recover to a timestamp• DB_ENV->set_timeout: transaction timeout
![Page 9: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/9.jpg)
9
Transactional APIs (2)
• Transaction calls• DB_ENV->txn_begin• DB_TXN->commit• DB_TXN->abort• DB_TXN->prepare
• New error returns• DB_RUNRECOVERY• DB_DEADLOCK
• Utility functions• DB_ENV->txn_checkpoint• DB_ENV->lk_detect• DB_ENV->log_archive
![Page 10: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/10.jpg)
10
What goes in a transaction?
• All updates must be transaction-protected.• DB->put, DBC->c_put• DBC->c_del, DBC->c_del
• All databases that will be accessed inside transactions must be opened/created in a transaction.
• File system operations• DB_ENV->dbremove• DB_ENV->dbrename
![Page 11: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/11.jpg)
11
What About Reads?
• Read operations can go in transactions• May not always have to
• A read that is a part of read-modify-write should be transaction protected.
• Reads that must be consistent should be transaction-protected.• Other reads might use weaker semantics
• DB_READ_COMMITTED: never see uncommitted data• DB_READ_UNCOMMITTED: may see uncommitted data
![Page 12: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/12.jpg)
12
Anatomy of an Application
• Create/Open/Recover environment• Open database handles• Spawn utility threads• Spawn worker threads
• Begin transaction• Do database operations (insert/delete/update/etc.)• Commit or abort transaction• Do it again if appropriate (main loop)
• Close database handles• Close environment
![Page 13: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/13.jpg)
13
Create/Open/Recover Environment
if ((ret = db_env_create(&DB_ENV, 0)) != 0) …error_handling…
/* Configure the environment. */flags = DB_CREATE |DB_INIT_LOG | DB_INIT_TXN | DB_INIT_LOCK | DB_INIT_MPOOL | DB_RECOVER;if ((ret = DB_ENV->open(DB_ENV, HOME, flags, 0)) != 0) …error_handling…
![Page 14: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/14.jpg)
14
Create/Open Database
• Must specify DB_ENV to database open• That DB_ENV must have been opened with DB_INIT_TXN
• Database open must be transactional• Specify a DB_TXN in the open• Specify DB_AUTO_COMMIT in the open• Open environment with DB_AUTO_COMMIT
![Page 15: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/15.jpg)
15
Transaction Operations
• Begin transaction with DB_ENV->txn_begin()• DB_TXN->commit() commits the transaction• Releases all locks• All log records are written to disk (by default)
• DB_TXN->abort() aborts the transaction• Releases all locks• Modifications are rolled back
• Must close all cursors before commit or abort
![Page 16: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/16.jpg)
16
Transaction Differences
• Without transactions:ret = dbp->put(dbp, NULL, &key, &data, 0);
… error handling …• With transactions
ret = DB_ENV->txn_begin(DB_ENV, NULL, &txn, 0);… error handling …
ret = dbp->put(dbp, txn, &key, &data, 0);… error handling …
ret = txn->commit(txn, 0);… error handling …
• Note: if the DB handle was opened transactionally, then we will automatically wrap every modification operation in a transaction (DB_AUTO_COMMIT behavior).
![Page 17: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/17.jpg)
17
Transactional Cursors
• Specify transaction handle on cursor creation• Cursor operations are performed inside that
transaction• Cursors must be closed before commit or abort
DBC *dbc = NULL;DB_TXN *txn = NULL;
ret = dbenv->txn_begin(dbenv, NULL, &txn, 0);if (ret != 0) … error handling …
ret = dbp->cursor(dbp, txn, &dbc, 0);if (ret != 0) … error handling …
![Page 18: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/18.jpg)
18
Transactional Iteration
ret = dbp->cursor(dbp, txn, &dbc, 0); … error handling …
while ((ret = dbc->c_get(dbc, &key, &data, DB_NEXT)) == 0) … process record: write new data, delete old data …
if (ret == DB_NOTFOUND) ret = 0;if ((temp_ret = dbc->close(dbc)) != 0 && ret == 0) ret = temp_ret;if (ret == 0) ret = txn->commit(txn, 0);else (void) txn->abort(txn);if (ret != 0) … error handling …
![Page 19: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/19.jpg)
19
Nested Transactions
• Split large transactions into smaller ones• Children can be individually aborted.
• Create nested transaction• Pass parent DB_TXN handle to DB_ENV->txn_begin()
• Parent transaction with active children can only• Create more children• Commit or abort
• If parent commits (aborts), all children commit (abort)• Child transaction’s locks
• Won’t conflict with the parent’s locks• Will conflict with other children’s (siblings) locks
![Page 20: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/20.jpg)
20
Deadlocks
• What are deadlocks?• Deadlock resolution• Deadlock detection • Dealing with deadlocks• Deadlock avoidance
![Page 21: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/21.jpg)
21
Defining Deadlocks
• Consider two threads• Thread 1: Write A, Write B• Thread 2: Write B, Write A
• Let’s say that the sequence of operations is• T1: Write lock A• T2: Write lock B• T1: Request lock on B (blocks)• T2: Request lock on A (blocks
• Neither thread can make forward progress
![Page 22: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/22.jpg)
22
Conditions for Deadlock
• Exclusive access• Block when access is unavailable• May request additional resources while
holding resources• The graph of “who is waiting for whom” has a
cycle in it.
![Page 23: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/23.jpg)
23
Deadlock Resolution
• Two techniques:• Assume a sufficiently long wait is a deadlock and self-abort• Detect and selectively abort
• Berkeley DB supports both• Timeouts
• Use DB_ENV->set_timeout to specify a maximum length of time that a lock or transaction should block.
• Return DB_LOCK_NOTGRANTED if the timeout expires before a lock is granted
![Page 24: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/24.jpg)
24
Deadlock Detection
• Two step process:• Traverse waits-for graph looking for cycles• If cycle, select victim to abort
• Looking for cycles• Synchronously: checked on each blocking lock
DB_ENV->set_lk_detect()• + Immediate detection and notification• - Higher CPU cost
• Asynchronously: run detector threadDB_ENV->lock_detect() or db_deadlock utility program• - Detected only when thread of control runs• + Lower CPU cost
![Page 25: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/25.jpg)
25
Picking who to abort
• Ideally want to limit wasted effort.• Deadlock resolution policy is configurable
• Default is DB_LOCK_RANDOM
• Other optionsDB_LOCK_MINLOCKSDB_LOCK_MINWRITE (# of write locks, not write ops)DB_LOCK_YOUNGESTDB_LOCK_OLDEST
• The victim• Gets a DB_LOCK_DEADLOCK error return• Must immediately (close all cursors and) abort
![Page 26: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/26.jpg)
26
Dealing with Deadlocks
• Transactional applications must code for deadlocks.• Typical response to deadlock is to retry• Repeated retries may indicate a serious problem
![Page 27: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/27.jpg)
27
Deadlock Example
for (fail = 0; fail < MAXIMUM_RETRY; fail++) { ret = DB_ENV->txn_begin(DB_ENV, NULL, &txn, 0); … error_handling …
ret = db->put(db, txn, &key, &data, 0);
if (ret == 0) {… commit the transaction …return 0; /* Success! */
} else { /* DB_LOCK_DEADLOCK or something else */ … abort the transaction … }}
DB_ENV->err(DB_ENV, ret, “Maximum retry limit exceeded”);return (ret); /* Retry limit reached; give up. */
![Page 28: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/28.jpg)
28
Deadlock Avoidance
• Good practices• Keep transactions short• Read/write databases in the same order in all transactions• Limit the number of concurrent writers• Use DB_RMW for read-modify-write operations• DB_READ_UNCOMMITTED for readers• DB_READ_COMMITTED for cursors• DB_REVSPLITOFF for cyclical Btrees (grow/shrink/grow/…)
• Debugging• db_stat –Co• db_stat -Cl• “Deadlock Debugging” section of the Reference Guide
![Page 29: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/29.jpg)
29
Subsystems and Utilities
• Checkpoint• Logging• Backups• Recovery
![Page 30: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/30.jpg)
30
Checkpoints
• Recall that Berkeley DB maintains a cache of database pages.
• Checkpoints write dirty pages from the cache into the databases• Transaction commit only writes the log
• Checkpoints:• Permit log file reclamation• Reduce recovery time• Block other operations minimally
• Checkpoint is I/O intensive
• Typically, checkpoint in a separate thread• Command line db_checkpoint• Use DB_ENV->txn_checkpoint()
![Page 31: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/31.jpg)
31
Controlling Checkpoints
• Log file size dictates unit of log reclamation• Use DB_ENV->set_lg_max
• Checkpoint method and utility controlled by• Force checkpoint• More than kbytes of log have been written• More than min minutes have passed
• Reduce checkpoint impact by keeping mpool clean• Use DB_ENV->memp_trickle
![Page 32: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/32.jpg)
32
Logging
• Database modifications are described in log records.• Log records describe physical changes to pages.• Log records are indexed by log sequence numbers (LSNs)• The log is comprised of one or more log files.• Logs are named log.NNNNNNNNNN• Logs can be maintained on-disk or in-memory
• On-disk logs stored in DB_HOME directory• Or in location configured by DB_ENV->set_lg_dir()
• Log records flushed by DB_TXN->commit()
For best performance and recoverability properties,place logs and databases on separate disks
![Page 33: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/33.jpg)
33
Logging Configuration
• DB_ENV->set_lg_max()• Set the size of the individual log files • On disk logs: default is 10MB• In-memory logs: default is 256KB
• DB_ENV->set_lg_bsize()• Set the size of the in-memory log buffer• On-disk logs: default is 32KB• In-memory logs: default is 1MB
• DB_ENV->set_lg_regionmax()• Set the size of the logging subsystem’s shared region
(default is 60KB)• Region holds filenames
![Page 34: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/34.jpg)
34
Backups
• Full backup: Copy database and log files• Standard
• Pause database operations and perform the backup• Creates a snapshot at a known point in time
• Hot• Backup while database(s) are still active• Creates a snapshot at fuzzy point in time
• Incremental backup• Copy log files to be replayed/recovered against a full backup• Hot failover
![Page 35: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/35.jpg)
35
Standard Backup
1. Commit or abort all on-going transactions
2. Pause all database writes
3. Force a checkpoint
4. Copy all database files to backup locationFind active database files with DB_ENV->log_archive()or db_archive program with DB_ARCH_DATA flag
5. Copy the last log file to backup locationDB_ENV->log_archive() or db_archive with DB_ARCH_LOG identifies all of the log files
![Page 36: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/36.jpg)
36
Hot Backup
1. Do not stop database operations2. Copy all database files to backup location
1. Use DB_ENV->log_archive() with DB_ARCH_DATA or db_archive –s
2. Database files may be modified during the backup, so the copy utility must read each database page atomically.
3. Copy all log files to backup location
The order of operations must be preserved. Copy databases first and then log files.
![Page 37: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/37.jpg)
37
Log File Removal
• Remove unused log files to regain disk space• db_archive –l or • DB_ENV->log_archive() to identify unused log
files
• To allow catastrophic recovery• Move to backup media; do not simply delete them
• Never remove all of the transaction logs• Do not remove active transaction logs
![Page 38: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/38.jpg)
38
Recovery
• Write-ahead logging • Log records always written to disk before database updates
• Recovery• Database changes validated against the log
• Redo committed operations not in the database• Undo aborted operations that are in the
database• Removes and re-initializes the environment files • Must be single-threaded• Other threads of control must wait for recovery
![Page 39: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/39.jpg)
39
When to Run Recovery
• DB_RUNRECOVERY error • Subsequent API calls return DB_RUNRECOVERY• Restart application and run recovery
• Always perform recovery at application startup• Prevent spreading corruption
![Page 40: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/40.jpg)
40
Kinds of Recovery
• Normal recovery assumes no loss of media• Reviews log records since the last checkpoint• More frequent checkpoints mean faster recovery• DB_RECOVER flag when opening the environment or
db_recover program
• Catastrophic recovery does the same, but:• Reviews all available log files• Catastrophic recovery can take awhile• DB_RECOVER_FATAL flag or db_recover –c
• Recovery to a timestamp is available
![Page 41: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/41.jpg)
41
Recovery Procedures
• Simplest case• Re-create the databases, no need for recovery
• Normal recovery• Up-to-date database and log files are available• Not used when creating hot backups
• Catastrophic recovery• Database or log files destroyed or corrupted• Normal recovery fails for any reason• Used when creating hot backups
![Page 42: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/42.jpg)
42
Catastrophic Recovery
• Often performed in a directory other than where the database files previously lived
• Copy the most recent snapshot of the database and log files to the recovery directory
• Copy any newer log files into the recovery directory• Log files must be recovered in sequential order• Run db_recover -c
• Or call DB_ENV->open() with DB_RECOVER_FATAL
![Page 43: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/43.jpg)
43
Maintaining a Hot Standby
1. db_archive -s to identify all the active database files 2. Copy identified database files to the backup/failover directory3. Archive all existing log files from the backup directory 4. Use db_archive (no option) in the active environment to identify the
inactive logs and move them to the backup directory5. Use db_archive -l in the active environment to identify all active
log files and copy these to the hot failover directory6. Run db_recover -c against the hot failover directory to
catastrophically recover the environment
• Steps 2-5 can be repeated as often as desired• If step 1 is repeated, steps 2-5 must follow to ensure
consistency
![Page 44: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/44.jpg)
44
Programming Practices
• Disk guarantees• Application structures• Locking• Special errors
![Page 45: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/45.jpg)
45
Disk I/O Integrity Guarantees
• Disk writes are performed a database page at a time• Berkeley DB assumes pages are atomically written• If the OS requires multiple writes per page
• A partial page can be written, possibly corrupting the database
• To guard against this:• Set database page size equal to file system page size OR• Configure the environment to perform checksums*• Use DB->set_flags() with the DB_CHKSUM flag• If corruption is detected run catastrophic recovery
* Checksums can be computationally expensive
![Page 46: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/46.jpg)
46
Disk I/O Integrity Guarantees
• Berkeley DB relies on the integrity of the underlying software and hardware interfaces• Berkeley DB assumes POSIX compliance, especially with regard to
disk I/O• Avoid hardware that performs partial writes or “lies” about writes
(acknowledges them when the data hits the disk cache and does not guarantee that the cache can be written under all circumstances).
• Best practice: store database files and log files on physically separate disks
![Page 47: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/47.jpg)
47
Transactional Applications
• Must handle recovery gracefully• Always run recovery, normally on startup, before other
threads/processes join the environment• Use DB_REGISTER to indicate if recovery is
necessary.• All DB_ENV handles must be opened with DB_REGISTER for
this to work.• If DB_REGISTER and DB_RECOVER are both set, recovery
will only be run if necessary.
• Use DB_ENV->failchk() after any thread/process failure to determine if it is safe to continue.
![Page 48: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/48.jpg)
48
Single-process Applications
• A single process, multi-threaded application• First thread opens the database environment
• Run recovery at this time• First thread opens databases• First thread spawns subsequent threads• Subsequent threads share DB_ENV and DB handles• Last thread to exit closes the DB_ENV and DB handles
• If any thread exits abnormally• Some thread calls DB_ENV->failchk()• Return value dictates action: continue or run recovery
![Page 49: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/49.jpg)
49
Multi-process Applications #1
• Order of process start up must be controlled• Recovery must happen before anything else• Recovery must be single-threaded
• Processes themselves may be threaded• Processes maintain their own DB_ENV and DB handles• If thread of control exits abnormally
• All threads should exit database environment• Recovery must be run
• Automate with DB_REGISTER flag in DB_ENV->open()
![Page 50: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/50.jpg)
50
Multi-process Applications #2
Often easiest to have a single monitoring process• Monitor is responsible for running recovery• Monitor starts other processes after recovery• Monitor is on a timer or waits for other
processes to die• At which time it uses failchk() mechanism• If failchk() returns DB_RUNRECOVERY:
• Monitor runs recovery• Other processes die when the environment is recreated• Monitor kills stubborn or slow processes• Monitor re-starts the system
![Page 51: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/51.jpg)
51
Locking in Transactions
• Initialize locking subsystem• Use the flag DB_INIT_LOCK to DB_ENV->open
• Pages or records are read- or write-locked• Conflicting lock requests wait until object available• Locking within transactions
• Locks are released at the end of the transaction• Earlier if specified (DB_READ_COMMITTED, DB_READ_UNCOMMITTED).
• Locking without transactions• Locks are released at the end of the operation
![Page 52: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/52.jpg)
52
Additional Locking Options
• Use lock timeouts instead of or in addition to regular deadlock detection• DB_LOCK_NOTGRANTED
• (optional) is returned when a lock times out• DB_LOCK_DEADLOCK
• returned when a lock times out• or the transaction is selected for deadlock resolution
• Accuracy of timeout depends on how often deadlock detection is performed
![Page 53: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/53.jpg)
53
Special Errors
• Catastrophic error: should never happen• Subsequent calls to the API will return the
same error• DB->set_paniccall() orDB_ENV->set_paniccall()• Callback function when catastrophic error occurs
• Exit and restart application• Run recovery if you are using an environment
![Page 54: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial](https://reader033.vdocuments.us/reader033/viewer/2022061202/547bfcd85806b5ef3f8b46f0/html5/thumbnails/54.jpg)
54
End of Part II