

BST 3101: DATABASE SYSTEMS II

Topic 3: Database Administration

Database administration refers to the whole set of activities performed by a database administrator to ensure that a database is always available as needed. It also covers related activities and roles such as database security, database monitoring and troubleshooting, and planning for future growth. Database administration can also be described as the function of managing and maintaining database management system (DBMS) software. It is an important function in any organisation that depends on one or more databases.

Organisations that use DBMS software often hire specialised IT personnel known as Database Administrators (DBAs), who require deep technical skills in running and operating databases. They also need knowledge and training in the platform (database engine and operating system) on which the database runs. The main role of DBAs is to ensure maximum uptime for the database so that it is available when needed. Other key roles are as follows:

(i) Designing, implementing, and maintaining the database system.
(ii) Database tuning - tweaking parameters such as server memory allocation and file fragmentation to optimise performance.
(iii) Maintaining database documentation and standards.
(iv) Enforcing security and maintenance of the database(s).
(v) Backup and recovery - ensuring the database has adequate backup and recovery procedures in place to recover from any accidental or deliberate loss of data.
(vi) Producing reports from queries.
(vii) Facilitating training of employees on use of the database.

There are three broad categories of database administrators: systems, development, and application DBAs.

Systems DBAs, also known as physical, operations, or production support DBAs, focus on the physical aspects of database administration, such as DBMS installation, configuration, patching, upgrades, backups, restores, refreshes, performance optimisation, maintenance, and disaster recovery.

Development DBAs focus on the logical and development aspects of database administration, such as data model design and maintenance, DDL (data definition language) generation, SQL writing and tuning, and coding stored procedures.

Application DBAs are usually found in organisations that have purchased third-party application software such as ERP (enterprise resource planning) and CRM (customer relationship management) systems. Application DBAs cut across the DBMS and the application software and are responsible for ensuring that the application is fully optimised for the database and vice versa.

Transaction Management

A transaction is one or more SQL statements that make up a unit of work performed against the database. Either all the statements in a transaction are committed as a unit, or all the statements are rolled back as a unit. For example, when a user transfers money from one bank account to another, the request involves a transaction: updating the values stored in the database for both accounts. For a transaction to be completed and its database changes made permanent, it must be executed in its entirety.
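The transfer example can be sketched with Python's built-in sqlite3 module. The table, account ids, and amounts below are purely illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst as a single unit of work."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        # Simulate a business-rule failure mid-transaction
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.commit()          # both updates become permanent together
    except Exception:
        conn.rollback()        # neither update survives

transfer(conn, 1, 2, 30)       # succeeds: balances become 70 and 80
transfer(conn, 1, 2, 500)      # fails mid-way: rolled back, balances unchanged
print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# [(70,), (80,)]
```

Note that the failed transfer leaves no trace: the debit that had already executed is undone by the rollback, which is exactly the all-or-nothing behaviour described above.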


Managing Commits in Transactions

Committing (and rolling back) transactions is slow because of the disk I/O and, potentially, the number of network round trips required. The database must write to disk every modification made by a transaction. This is usually a sequential write to a journal file (or log); nevertheless, it involves expensive disk I/O.

In most standards-based APIs, the default transaction commit mode is auto-commit. In auto-commit mode, a commit is performed for every SQL statement that requires a request to the database, such as Insert, Update, Delete, and Select statements. When auto-commit mode is used, the application does not control when database work is committed.

Some database systems do not support auto-commit mode. For these databases, the database driver, by default, sends a commit request to the database after every successful operation (SQL statement). This request equates to a network round trip between the driver and the database. The round trip to the database occurs even though the application did not request the commit and even if the operation made no changes to the database. For example, the driver makes a network round trip even when a Select statement is executed.
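A minimal sketch of the two commit modes, using Python's sqlite3 module: passing isolation_level=None puts the connection in auto-commit mode, while the default mode opens an implicit transaction that stays pending until the application commits.

```python
import sqlite3

# Auto-commit: every statement is committed as soon as it executes.
auto = sqlite3.connect(":memory:", isolation_level=None)
auto.execute("CREATE TABLE t (x INTEGER)")
auto.execute("INSERT INTO t VALUES (1)")   # committed immediately
assert auto.in_transaction is False        # no transaction left open

# Manual commit: DML statements accumulate until the application commits.
manual = sqlite3.connect(":memory:")       # default: implicit transactions
manual.execute("CREATE TABLE t (x INTEGER)")
manual.execute("INSERT INTO t VALUES (1)") # opens a transaction
assert manual.in_transaction is True       # work is pending...
manual.commit()                            # ...until the application commits
assert manual.in_transaction is False
```

In the manual-commit connection the application decides when the (expensive) commit happens; in the auto-commit connection, every data-modifying statement pays that cost.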

Consider an application whose unit of work is a series of Select statements, executed one after another to return a column of balances. In most cases, for every Select statement executed against the database, a lock is placed on rows to prevent other users from updating that data. By holding locks on rows longer than necessary, active transactions can prevent other users from updating data, which ultimately reduces throughput and causes concurrency issues. In this case, one may want to commit the Select statements in intervals (after every five Select statements, for example) so that locks are released in a timely manner.

Leaving transactions active consumes database memory. Committing a transaction flushes the contents of the log and releases database memory. If an application uses transactions that update large amounts of data (1,000 rows, for example) without committing modifications, the application can consume a substantial amount of database memory. In this case, you may want to commit after every statement that updates a large amount of data.
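Interval commits can be sketched as follows with sqlite3; the batch size, table, and row count are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value REAL)")

# Commit in intervals (every 5 statements here) so that locks and log
# space are released in a timely manner, instead of only after all rows.
BATCH = 5
for i in range(23):                     # 23 illustrative rows
    conn.execute("INSERT INTO readings VALUES (?, ?)", (i, i * 0.5))
    if (i + 1) % BATCH == 0:
        conn.commit()                   # release locks/log every BATCH rows
conn.commit()                           # commit the final partial batch

print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])  # 23
```

The trade-off is the one described above: committing more often costs extra disk I/O and round trips, but holds locks and database memory for shorter periods.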

Isolation Levels

A transaction isolation level represents a particular locking strategy used in the database system to improve data integrity. Most database systems support several isolation levels, and the standards-based APIs provide ways for the user to set them.

Local Transactions versus Distributed Transactions

A local transaction is a transaction that accesses and updates data on only one database. Local transactions are significantly faster than distributed transactions because they do not require communication between multiple databases, which means less logging and fewer network round trips are required.

A distributed transaction is a transaction that accesses and updates data on multiple networked databases or systems and must be coordinated among those databases or systems. These databases may be of several types located on a single server, such as Oracle, Microsoft SQL Server, and Sybase; or they may include several instances of a single type of database residing on numerous servers.


Distributed transactions are substantially slower than local transactions because of the logging and network round trips needed to communicate between all the components involved in the distributed transaction.

The following occurs when the application requests a commit of a distributed transaction:

1. The driver issues a commit request.
2. The transaction coordinator sends a pre-commit request to all databases involved in the transaction:
(i) Each database executes the transaction up to the point where it is asked to commit, and each writes recovery information to its logs.
(ii) Each database replies with a status message to the transaction coordinator indicating whether the transaction up to this point succeeded or failed.
3. The transaction coordinator waits until it has received a status message from each database. If every status message indicates success, the following occurs:
(i) The transaction coordinator sends a commit message to all the databases.
(ii) Each database completes the commit operation and releases all the locks and resources held during the transaction.
(iii) Each database replies with a status to the transaction coordinator indicating whether the operation succeeded or failed.
4. The transaction coordinator completes the transaction when all acknowledgments have been received and replies with a status to the driver indicating whether the commit succeeded or failed.
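This two-phase flow can be simulated with a toy coordinator and in-memory participants. All class and method names below are illustrative, not a real transaction-manager API:

```python
class Participant:
    """Stands in for one database in a distributed transaction."""
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.state = "active"

    def prepare(self):
        # Phase 1: execute up to the commit point, write recovery info,
        # and reply with a success/failure status.
        self.state = "prepared" if self.will_succeed else "failed"
        return self.will_succeed

    def commit(self):
        self.state = "committed"       # Phase 2: make changes permanent

    def rollback(self):
        self.state = "rolled_back"     # undo using the recovery information

def coordinator_commit(participants):
    votes = [p.prepare() for p in participants]   # pre-commit every database
    if all(votes):                                # all status messages: success
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:                        # any failure: roll back all
        p.rollback()
    return "rolled_back"

dbs = [Participant("orders"), Participant("inventory")]
print(coordinator_commit(dbs))   # committed
```

If any participant reports failure in phase 1, no participant ever commits, which preserves the all-or-nothing property across databases.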

Database Transactions: the ACID Model

The ACID (Atomicity, Consistency, Isolation, Durability) model refers to a set of properties that guarantee that database transactions are processed reliably. It is one of the oldest and most important concepts of database theory. The ACID model sets forward four goals that every database management system must strive to achieve: atomicity, consistency, isolation, and durability. No database that fails to meet any of these four goals can be considered reliable.

Atomicity

Atomicity states that database modifications must follow an "all or nothing" rule: the database guarantees that either all of the tasks of a transaction are performed or none of them are. A transaction is atomic in the sense that if one part of the transaction fails, the entire transaction fails.

It is critical that the database management system maintain the atomic nature of transactions in spite of any DBMS, operating system or hardware failure.

Consistency

The consistency property ensures that the database is in a consistent state before the transaction starts and after the transaction is over (whether or not it was successful). For example, a storefront presents an inconsistent view of what is truly available for purchase if inventory is allowed to fall below zero, making it impossible to provide more than an intent to complete a transaction at checkout time.

Consistency states that only valid data will be written to the database. If, for some reason, a transaction is executed that violates the database's consistency rules, the entire transaction will be rolled back and the database will be restored to a state consistent with those rules. On the other hand, if a transaction successfully executes, it will take the database from one state that is consistent with the rules to another state that is also consistent with the rules.

Isolation

Isolation refers to the requirement that other operations cannot access or see the data in an intermediate state during a transaction. This constraint is required to maintain performance as well as consistency between transactions in a database. Thus, each transaction is unaware of other transactions executing concurrently in the system.

Isolation requires that multiple transactions occurring at the same time not impact each other's execution. For example, if Joe issues a transaction against a database at the same time that Mary issues a different transaction, both transactions should operate on the database in an isolated manner. The database should either perform Joe's entire transaction before executing Mary's, or vice versa. This prevents Joe's transaction from reading intermediate data produced as a side effect of part of Mary's transaction that will not eventually be committed to the database.

Durability

Durability refers to the guarantee that once the user has been notified of success, the transaction will persist and not be undone. This means it will survive system failure, and that the database system has checked the integrity constraints and will not need to abort the transaction. Many databases implement durability by writing all transactions into a transaction log that can be played back to recreate the system state right before a failure. A transaction can only be deemed committed after it is safely in the log.

Durability ensures that any transaction committed to the database will not be lost. Durability is ensured through the use of database backups and transaction logs that facilitate the restoration of committed transactions in spite of any subsequent software or hardware failures.
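Log-based durability can be sketched as follows, assuming a simple append-only log of key/value changes. The log structure and all names here are illustrative:

```python
log = []                       # stands in for an append-only log file on disk

def commit(txn_changes):
    log.append(list(txn_changes))   # write the changes to the log first...
    return "ok"                     # ...and only then report success

def replay(log):
    """Rebuild the database state by reapplying every committed transaction."""
    state = {}
    for txn in log:
        for key, value in txn:
            state[key] = value
    return state

commit([("acct:1", 70), ("acct:2", 80)])
commit([("acct:1", 60), ("acct:2", 90)])

# Simulated crash: in-memory state is lost, but the log survives,
# so the committed state can be recreated by replaying it.
recovered = replay(log)
print(recovered)   # {'acct:1': 60, 'acct:2': 90}
```

Because success is reported only after the log append, any transaction the user was told succeeded is guaranteed to be recoverable by replay.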

Durability does not imply a permanent state of the database. Another transaction may overwrite any changes made by the current transaction without hindering durability.

Concurrency Control

Concurrency control is the process of managing simultaneous execution of transactions in a shared database to ensure the serialisability of transactions. It ensures that database transactions produce correct results for concurrent operations, without violating data integrity, and as quickly as possible.

Concurrent access to data is desirable when:
(i) The amount of data is sufficiently great that, at any given time, only a fraction of it can be in primary memory; the rest must be swapped in from secondary memory as needed.
(ii) Even if the entire database can be held in primary memory, there may be multiple processes accessing it.

Most high-performance transactional systems need to run transactions concurrently to meet their performance requirements. Thus, without concurrency control, such systems can neither provide correct results nor keep their databases consistent.


The main categories of concurrency control mechanisms are:

Optimistic - The optimistic approach assumes that conflicts between transactions are rare. It delays checking whether a transaction meets the isolation and other integrity rules until the transaction's end, without blocking any of its read or write operations, and then aborts the transaction if committing it would violate the desired rules. The method does not require locking: transactions execute without restrictions, and the check for conflicts is done just before commit.

Pessimistic - This method assumes that conflicts will happen. It blocks an operation of a transaction if it may cause a violation of the rules, until the possibility of violation disappears. Blocking operations typically reduces performance. Pessimistic concurrency control techniques detect conflicts as soon as they occur and resolve them using blocking.

Semi-optimistic - This method blocks operations in some situations, if they may cause violation of some rules, and does not block in other situations, delaying the rules checking (if needed) to the transaction's end, as in the optimistic approach.

When an object is locked by another transaction, the requesting transaction must wait.
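The optimistic approach can be sketched as validation at commit time using per-item version numbers. The store class and versioning scheme below are illustrative, not a real DBMS interface:

```python
class Store:
    """Toy versioned store for optimistic concurrency control."""
    def __init__(self):
        self.data = {}      # key -> value
        self.version = {}   # key -> version number

    def read(self, key):
        # Reads are never blocked; the caller remembers the version seen.
        return self.data.get(key), self.version.get(key, 0)

    def try_commit(self, reads, writes):
        # Validation phase: abort if any item we read has since changed.
        for key, seen_version in reads.items():
            if self.version.get(key, 0) != seen_version:
                return False                      # conflict detected: abort
        for key, value in writes.items():         # write phase
            self.data[key] = value
            self.version[key] = self.version.get(key, 0) + 1
        return True

store = Store()
store.try_commit({}, {"x": 1})            # initialise x

# Two transactions read x, then race to commit an update.
_, v1 = store.read("x")
_, v2 = store.read("x")
print(store.try_commit({"x": v1}, {"x": 2}))   # True: first committer wins
print(store.try_commit({"x": v2}, {"x": 3}))   # False: stale read, aborted
```

No operation blocks; the cost of a conflict is an aborted (and typically retried) transaction, which is cheap exactly when conflicts are rare.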

The disadvantages of locking include:
(i) Lock management overhead.
(ii) Deadlock detection/resolution.
(iii) Concurrency is significantly lowered when congested nodes are locked.
(iv) To allow a transaction to abort itself when mistakes occur, locks cannot be released until the end of the transaction, which significantly lowers concurrency.
(v) If conflicts are rare, checking for locks at every moment merely slows down processing.

Methods of Concurrency Control

Locking (e.g., two-phase locking) - Controlling access to data by locks assigned to the data. A transaction's access to a data item (database object) locked by another transaction may be blocked (depending on the lock type and the access operation type) until the lock is released.
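Two-phase locking can be sketched in Python with a growing phase that acquires every needed lock and a shrinking phase that releases them only at the end of the transaction. The lock manager below is illustrative, not a real DBMS component:

```python
import threading

class LockManager:
    """Hands out one lock per data item; acquiring blocks until released."""
    def __init__(self):
        self.locks = {}                 # data item -> threading.Lock
        self.guard = threading.Lock()   # protects the lock table itself

    def lock(self, item):
        with self.guard:
            lk = self.locks.setdefault(item, threading.Lock())
        lk.acquire()                    # blocks if another transaction holds it

    def unlock(self, item):
        self.locks[item].release()

lm = LockManager()
balances = {"a": 70, "b": 80}

def transfer(src, dst, amount):
    # Growing phase: acquire all locks (in sorted order, to avoid deadlock).
    for item in sorted((src, dst)):
        lm.lock(item)
    balances[src] -= amount
    balances[dst] += amount
    # Shrinking phase: release locks only once all the work is done.
    for item in (src, dst):
        lm.unlock(item)

threads = [threading.Thread(target=transfer, args=("a", "b", 10))
           for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()
print(balances)   # {'a': 20, 'b': 130}
```

Acquiring locks in a fixed global order is one simple way to sidestep the deadlock problem that disadvantage (ii) above refers to.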

Serialisation graph checking - Checking for cycles in the schedule's graph and breaking them by aborts.

Timestamp ordering - Assigning timestamps to transactions, and controlling or checking access to data by timestamp order.

Commitment ordering (or Commit ordering) - Controlling or checking transactions' chronological order of commit events to be compatible with their respective precedence order.
