transaction processing concepts prof. sin-min lee department of mathematics and computer science

54
Transaction Processing Concepts Prof. Sin-Min Lee Department of Mathematics and Computer Science

Upload: madison-kennedy

Post on 28-Dec-2015

216 views

Category:

Documents


0 download

TRANSCRIPT

Transaction Processing Concepts

Prof. Sin-Min Lee

Department of Mathematics and Computer Science

Introduction to Transaction Processing

• Single-User Vs. Multi-User Systems

• Read and Write Operations of a Transaction

• Problems in Concurrency Operations

• Types of Transaction Failures

• Serializability of Schedules

• A DBMS is single-user if at most one user at a time can use the system

• A DBMS is multi-user if many user can use the system concurrently

• In a multi DBMS, the stored data items are the primary resources that may be accessed concurrently by user programs, which are constantly retrieving information from and modifying the database. The execution of a program that accesses or change the contents of the database is called a transaction.

Single-User Vs. Multi-User System

Objectives• Function and importance of transactions.• Properties of transactions.• Concurrency Control

– Meaning of serializability.– How locking can ensure serializability.– Deadlock and how it can be resolved.– How timestamping can ensure

serializability.– Optimistic concurrency control.– Granularity of locking.

• Recovery Control– Some causes of database failure.– Purpose of transaction log file.– Purpose of checkpointing.– How to recover following database

failure.

• Alternative models for long duration transactions.

Transaction SupportTransaction

Action, or series of actions, carried out by user or application, which accesses or changes contents of database.

• Logical unit of work on the database. • Application program is series of transactions with

non-database processing in between. • Transforms database from one consistent state to

another, although consistency may be violated during transaction. (If the system crashes, each transaction's changes are reflected in the persistent database either entirely or not at all.)

Transaction Support

– A transaction is a sequence of one or more SQL operations treated as a unit ( another working definition).

– SQL standard (and Oracle):

– Transaction begins automatically when first SQL command is issued.

– Transaction ends (and new one begins) when "COMMIT" command is issued or session ends.

– Alternative "AUTO COMMIT" mode turns each statement into a transaction.

Example Transaction

Transaction Example• Book passenger Smith on Flight QF9 on

22/10/99 in seat A22• 1.INSERT INTO PASSENGER VALUES (‘Simth’,’98198333’,’VISA’,’4940123456781234’); 2. INSERT INTO SEAT VALUES (‘QF9’,’22-10-99’,’A22’,’Smith’,’Vegetarian’); 3. INSERT FLIGHT SET seats_left = seats_left -1 WHERE flight_code = ‘QF9’ AND flight_data = ’22-10-

99’; step 1 by itself is inconsistent step 1&2 without step 3 would make seats_left inconsistent Step 1,2 &3 make sense together.

Transaction Support• Can have one of two outcomes:

– Success - transaction commits and database reaches a new consistent state.

– Failure - transaction aborts, and database must be restored to consistent state before it started.

– Such a transaction is rolled back or undone.

• Committed transaction cannot be aborted.• Aborted transaction that is rolled back can

be restarted later.

State Transition Diagram for Transaction

Properties of Transactions •Four basic (ACID) properties of a transaction are:

Atomicity , Consistency, Isolation and DurabilityAtomicity Each transaction's operations are executed all-or-nothing, never left "half done." •E.g., If system crashes before transaction commits, no effects of transaction remain in database - transaction can start over when system comes back up. •E.g., If error or exception occurs during a transaction, partial effects of the transaction are undone. •Question: How is this guarantee achieved? •Implemented by COMMIT and ROLLBACK

Properties of Transactions

Consistency Must transform database from one consistent state to another.

Implemented by database integrity constraints.

Properties of Transactions Isolation Partial effects of incomplete transactions should not be visible to other transactions.Implemented by locking•Two transactions can be executed in perfect isolation if they run serially•However, running transactions strictly serially results in low throughput and poor response time•To improve efficiency, transactions can be interleaved. This is called concurrency.•The operations of one transaction can be freely interleaved with another so long as they do not conflict( operate on the same data)•Transaction can run concurrently

Properties of Transactions

Durability Effects of a committed transaction are permanent and must not be lost because of later failure.

Implemented by transaction logging and recovery

DBMS Transaction Subsystem

DBMS Transaction Subsystem

Concurrency Control Process of managing simultaneous operations on the database without having them interfere with one another.( Centralized system with concurrent access by several users)

• Prevents interference when two or more users are accessing database simultaneously and at least one is updating data.

• Although two transactions may be correct in themselves, interleaving of operations may produce an incorrect result.

Concurrency Example • Database consisting of two items, X and Y

• Only criterion for correctness: X=Y

• Two concurrent transactions

T1: X X+1 T2: X 2*X

Y Y+1 Y 2*Y

Initially, X=10 and Y=10

Need for Concurrency Control• Three examples of potential problems

caused by concurrency( transaction conflict scenarios) – Lost update problem (Dirty Write)– Uncommitted dependency problem

( Dirty Read)– Inconsistent analysis problem ( Fuzzy

Read).

Lost Update Problem• Successfully completed update is

overridden by another user.

• T1 withdrawing £10 from an account with balx, initially £100.

• T2 depositing £100 into same account.

• Serially, final balance would be £190.

Lost Update Problem

• T1 overwrites data written by T2• Loss of T2's update avoided by preventing T1 from reading balx

until after update.

Uncommitted Dependency Problem

• Occurs when one transaction can see intermediate results of another transaction before it has committed.

• T4 updates balx to £200 but it aborts, so balx should be back at original value of £100.

• T3 has read new value of balx (£200) and uses value as basis of £10 reduction, giving a new balance of £190, instead of £90.

Uncommitted Dependency Problem

• T3 reads sata changed but uncommitted by T4• Problem avoided by preventing T3 from reading balx

until after T4 commits or aborts.

Inconsistent Analysis Problem• Occurs when transaction reads several values

but second transaction updates some of them during execution of first.

• Sometimes referred to as dirty read or unrepeatable read.

• T6 is totaling balances of account x (£100), account y (£50), and account z (£25).

• Meantime, T5 has transferred £10 from balx to balz, so T6 now has wrong result (£10 too high).

Inconsistent Analysis Problem

• T5 modifies data over which T6 is aggregating• Problem avoided by preventing T6 from reading

balx and balz until after T5 completed updates.

Serializability• Objective of a concurrency control protocol is to

schedule transactions in such a way as to avoid any interference.

• Could run transactions serially, but this limits degree of concurrency or parallelism in system.

• Serializability identifies those executions of transactions guaranteed to ensure consistency.

SerializabilitySchedule– Sequence of reads/writes by set of

concurrent transactions. – A sequence of operations from one or more

transactions– In concurrent transactions, the operations are

interleaved

ScheduleOperations

– read(Q,q)

read the value of the database item Q and store in the local variable q.

– write(Q,q)

write the value of the database item Q and store in the local variable q.

– other operations such as arithmetic

– commit

– rollback

Example: A “Good” Schedule

• One possible schedule, initially X = 10, Y=10

•Resulting Database: X=21,Y=21, X=Y

Example: A “Bad” Schedule

• Another possible schedule

•Resulting Database X=22,Y=21, X=Y

The Problem• In the previous schedule, the result was

“incorrect.”• What do we mean by correctness?• Definition of Correctness: The concurrent execution of a set of

transactions is said to be correct if it is “equivalent” to some serial execution of the those transactions.

• Several notions of equivalence, e.g., result equivalent• Serial means one after another• For transactions T1 and T2 there are only two serial

schedules.• T1, T2• T2, T1

Serializability– Serial Schedule

Schedule where operations of each transaction are executed consecutively without any interleaved operations from other transactions.

• No guarantee that results of all serial executions of a given set of transactions will be identical.

Nonserial Schedule

• Objective of serializability is to find nonserial schedules that allow transactions to execute concurrently without interfering with one another.

• In other words, want to find nonserial schedules that are equivalent to some serial schedule. Such a schedule is called serializable.

Serializable

• Definition: A schedule is said to be serializable if the result of executing that schedule is the same as the result of executing some serial schedule. – A schedule, S, is serializable if S produces the same results as either – T1, T2 – T2, T1

• A simplifying assumption: Operations in a schedule are

– read X – write X – commit – abort

Example• Consider a simple transaction that processes an order• PRODUCT( prodno,qoh) ORDER(ordno,prodno,qty)1. Read qoh SELECT qoh INTO oldqoh FROM PRODUCT WHERE

prodno =1;2. Calculate new qoh newqoh :=oldqoh – 10;3. Update qoh UPDATE STOCK SET qoh = newqoh WHERE prodno =1;4. INSERT order INSERT INTO ORDERS VALUES (1234,1,10);

Example• Consider two transactions TA and TB

running concurrently.

• Various schedules are possible for interleaving the operations.

1 2 3 4 1 2 3 4 Serial (1)

1 2 3 1 4 2 3 4 Seriablizable(1)

1 2 3 1 2 4 3 4 Seriablizable(1)

1 2 3 4 1 2 3 4 Serial (2)

A Nonserial Example

• Example, with an initial database satisfying X = Y

• S2 is not (conflict or result) equivalent to a serial execution of T3, T4 or T4, T3.

Summary of Basic Concepts

• Each transaction preserves database consistency.

• The serial execution of a set of transactions preserves database consistency.

• In a concurrent execution, steps of a set of transactions may be interleaved.

• A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule.

Determining Serializability

• Given a schedule S, how do we determine that it is serializable?

• Use a slightly restricted definition: conflict serializability.

• Two transactions Ti and Tj conflict if and only if there exists some item X, accessed by both Ti and Tj, and at least one of these transactions wrote X.

• Intuitively, a conflict between two transactions forces an execution order between them.

Serializability• In serializability, ordering of read/writes is

important:(a) If two transactions only read a data item, they

do not conflict and order is not important.(b) If two transactions either read or write

completely separate data items, they do not conflict and order is not important.

(c) If one transaction writes a data item and another reads or writes same data item, order of execution is important.

Possible Conflicts• Transaction T1 T 2

Read Read No conflict

Read Write Write Read Conflicting

operations

Write Write

Conflict Equivalent Schedules• Let I and J be consecutive instructions by two

different transactions within a schedule S. – If I and J do not conflict, we can swap their order to

produce a new schedule S'.

– The instructions appear in the same order in S and S', except for I and J, whose order does not matter.

• Definition: A schedule is conflict serializable if it is conflict equivalent to a serial schedule. – S and S' are termed conflict equivalent schedules.

Example of Conflict Serializability

Another Example

• S5 is not serial (obvious!) and is not conflict serializable since every pair of operations conflicts, hence none can be swapped.

• T2 and T3 have write instructions without read instructions, termed blind writes.

Precedence Graph• A directed graph depicting conflicts in a

schedule

• A assumption: No blind writes

• Given transactions T1,T2, …, Tn

Precedence Graph• Create:

– node for each transaction;– a directed edge Ti Tj, if Tj reads the value of an item

written by TI;– a directed edge Ti Tj, if Tj writes a value into an

item after it has been read by Ti. – Add and edge labelled X from Ti to Tj if Ti conflict on

X ( and is before ) Tj

• If precedence graph contains cycle schedule is not conflict serializable.

Precedence Graph• Example for schedule S1

• If precedence graph contains cycle schedule

is not conflict serializable.

Testing Serializability

• A schedule is ( conflict) serializable if its precedence (conflict) graph is acyclic; that is,

• If precedence graph contains cycle schedule is not conflict serializable.

• Need to guarantee serializability an inefficient strategy

• Generate a schedule• Build conflict graph• Test for cycle

Impose protocol to generate only serializable schedules

Example - Non-conflict serializable schedule

• T9 is transferring £100 from one account with balance balx to another account with balance baly.

• T10 is increasing balance of these two accounts by 10%.

• Precedence graph has a cycle and so is not serializable.

Example - Non-conflict serializable schedule

Another Example

Relationship Among Schedules

Concurrency Control Techniques• Two basic concurrency control techniques:

– Locking– Timestamping

• Both are conservative approaches: delay transactions in case they conflict with other transactions.

• Optimistic methods assume conflict is rare and only check for conflicts at commit.