1 transactions transaction: –a unit of work –a sequence of operations which are bundled together...

1

Transactions• Transaction:

– A unit of work– A sequence of operations which are bundled

together

• Either all of them should be executed or none i.e. not partial execution

• Important in terms of concurrency and crash recovery

2

FIGURE 21.2Two sample transactions. (a) Transaction T1.

(b) Transaction T2.

3

Bad Failure Eg: Fund Transfer• Transfer $50 from checking to saving account

– Initially, A and B each have 200$.

1. read(checking)

2. checking := checking – 50

3. write(checking)

4. read(saving)

5. saving := saving + 50

6. write(saving)

• If transaction fails after step 3, before step 6 ?

• Money will be “lost” : inconsistent state

4

Bad Failure Eg: Fund Transfer• So if system crashes after step 3, what should

happen?

• DBMS must ensure that either the transaction is re-started and steps 4 through 6 are completed or …

• The effects of steps 1 through 3 are undone– And user informed that the transaction failed

• This is an example of crash recovery

5

Concurrency• Parallel processing: processes are concurrently

executed in multiple CPUs. Advantage?

• Increased throughput: can do different things simultaneously, so can do more things.

• Interleaved processing: concurrent execution of processes interleaved in single CPU. Why?

• When one process is doing I/O, the other can be accessing the database: better throughput.

• Prevents one process from holding up other processes for a long period of time

6

Concurrency• Concurrency: We will refer to both parallel

processing on multiple CPUS and interleaved processing on a single CPU as concurrency.– Most issues apply in both cases

• When would we want to allow two transactions to execute concurrently ?

• If two transactions are not “interfering” with each other then we want to allow concurrent execution.– When do we think of interfering with each other ?

• Dealing with the same data

7

Concurrency• Eg:

– Jack has a bank account– Jill has a separate bank account– Jack depositing 50$ into his account– Jill depositing 50$ into her account– Should we allow concurrent execution here ?

• Yes, can do concurrently– Not interfering with each other

• Suppose a joint bank account ?

8

Bad Concurrency Example• Eg: Jack and Jill each depositing 50$ into a

joint bank account, initial balance $200

Jack: T1 Jill: T2read(balance B):200

B := B + 50: 250

read(B): 200

B := B + 50: 250

write(B):250

write(B):250

• Final balance ends up with wrong value

9

Concurrency and Recovery• Two main issues to deal with:

– Concurrent execution– Failures of various kinds, such as hardware

failures and system crashes

• They do effect each other, but we will initially look at them mostly separately.

• Will look initially at the basics– What we want, what we try to avoid, what kind of

information we need to store etc.

• Later we will look at algorithms

10

FIGURE 21.2Two sample transactions. (a) Transaction T1.

(b) Transaction T2.

11

Transactions• Transaction: includes one or more database

access operations

• Read : no change made

• Write :insert, update, delete: a change is made

• Internal actions: don’t have an effect on other transactions

• Commit, Abort– Needed for recovery, not for concurrency– More on next slide

12

COMMIT/ABORT• commit_transaction: This signals a successful

end of the transaction– Any changes made by the transaction can be safely

committed to the database and will not be undone.– These changes are final

• rollback (or abort): This signals that the transaction has ended unsuccessfully– Any changes to the database made by the

transaction must be undone.

13

Transactions• Transaction could be

– stand-alone : SQL submitted interactively, – embedded within a program.

• application program may contain several transactions. Bank Eg:– at end of month

• Get a list of checking accounts have < 1000$• For these, transfer money from savings

• SQL support – will study later.

14

Some Op. Systems Concepts• Where is the database going to be stored ?

• On disk – much more storage than RAM.

• Where does data have to be in order to manipulate it ?

• RAM: can’t access anything on disk

• Have to move data between RAM and disk

• How much data moved at a time ?

• Block : Basic unit of data transfer from the disk to the computer main memory.

15

Read item• read_item(X): Reads a database item named X

into a program variable– We assume that program variable is also named X.

• read_item(X) includes the following steps:1. Find the address of the disk block that contains item X.

2. Copy that disk block into a buffer in main memory if that block is not already in some main memory buffer

3. Copy item X from the buffer to the program variable named X.

16

Write item• write_item(X): Writes the value of program

variable X into the database item named X.• write_item(X) includes the following steps:1. Find address of the disk block that contains item X.2. Copy that disk block into a buffer in main memory

if that disk block is not already in some RAM buffer

3. Copy item X from the program variable named X into its correct location in the buffer.

4. Store the updated block from the buffer back to disk either immediately or at some later point in time

Why not write directly into disk? Disk I/O is expensive.

17

COMMIT• As its final action, a transaction T has to

either COMMIT or ABORT

• COMMIT :

1. T tells DBMS that it is committing i.e. it has completed successfully.

2. DBMS comes back and issues a commit : • It is telling T that all the changes are going to be

made permanent, even if the system crashes• This is now the DBMS’s responsibility

18

ABORT• T tells DBMS that it is terminating itself due to

some error (eg: no data found).

• Alternatively, DBMS might decide to abort T– Could be due to some internal reason– Could be due to a system crash

• DBMS has to ensure that database remains in a consistent state– Either aborted transaction started again

• If aborted by DBMS– Or it is killed and its effects are undone. Aka

rollback

19

Schedule Example• T1: r(x), w(x), commit.• T2 : r(y), r(x) commit.

T1 T2

r1(x)

r2(y)

w1(x)

c1

r2(y)

c2

• S = r1(x), r2(y), w1(x), r2(x), c1, c2.

20

Schedules• Schedule : a schedule S is sequences of

instructions that specify the chronological order in which instructions of concurrent transactions are executed

• S must preserve the order in which the instructions appear in each individual transaction.

• Operations from other transactions Tj can be interleaved with the operations of Ti in S. – Potential schedule (i.e. what may happen)

• May want to limit potential schedules– Actual schedule

21

Serial Schedules• Serial schedule: A schedule S is serial if, for

every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule. – Eg: All ops of T1 followed by all ops of T2

• Otherwise, the schedule is called an interleaved schedule– Eg: Some T1 ops, followed by some T2 ops, followed

by some T1 ops

22

[SKS] Eg of serial, interleaved schedules

23

Schedule Example S1

• T1: r(x), w(x), commit.• T2 : r(y), commit.

T1 T2

r1(x)

w1(x)

c1

r2(y)

c2

• S1 = r1(x), w1(x), c1, r2(y), c2. Is S1 serial?• S1 is serial : T1 followed by T2

24

Schedule Example S2

• T1: r(x), w(x), commit.• T2 : r(y), commit.

T1 T2

r1(x)

r2(y)

w1(x)

c1

c2

• S2 = r1(x), r2(y), w1(x), c1, c2. Is S2 serial?• S2 is not serial – is interleaved.

25

ACID Properties: Atomicity• ACID Properties: These are desirable

properties of transactions:

• Jim Gray of Microsoft won the 1998 Turing award for his work on transactions in databases– ACID properties, locking, fault tolerance

• Atomicity: Either all operations of the transaction should occur or none should occur.– The system has to enforce this by making sure that

only a part of the transaction is not done.

26

Atomicity Example of Fund Transfer

The following is one transaction1. read(A)2. A := A – 503. write(A)4. read(B)5. B := B + 506. write(B)

• Atomicity requirement – if the transaction fails after step 3 and before step 6,

money will be “lost” leading to an inconsistent database state

– the system should ensure that updates of a partially executed transaction are not reflected in the database

27

ACID Properties: Consistency• What do we mean by consistency ?• Consistency requirements could include

• Explicitly specified constraints such as primary keys and foreign keys

• Implicit integrity constraints– e.g. sum of A and B should remain same

• When do we worry about consistency? • Programmer has to ensure that a transaction which

completes by itself (no interleaving, no crashes) leaves the database in a consistent state – Eg: if T transferring money from checking to saving adds one

more dollar to saving– DBMS can’t catch this kind of inconsistency– Erroneous transaction logic can lead to inconsistency

28

ACID Properties: Consistency• Here we are only interested in consistency when

there could be a problem – via interleaving of different transactions or– T not completing due to system crash

• If things were consistent without concurrency or recovery, we want to ensure– same consistent state with concurrency + recovery.

• Before T starts, assume database is consistent• After T completes, database must be consistent• During execution of T, OK if the database

becomes temporarily inconsistent.

29

Consistency Example1. read(A)2. A := A – 503. write(A)4. read(B)5. B := B + 506. write(B)

• Consistency requirement in above example?

• The sum of A and B is unchanged by the execution of the transaction. – We are adding 50 to A and subtracting 50 from B– Should be true even if system crashes in the middle

and some part of the transaction has to be redone. What could be the problem ?

30

ACID Properties: Isolation• Isolation: Want to allow multiple transactions

to execute concurrently for better performance– Don’t want transactions to interfere with each other

if they are working on the same data

• User should be able to understand the effect of each transaction by itself– Even if the DBMS does interleaving, the user should

be sheltered from any negative effects of the interleaving.

– To the user it should look like serial execution

31

Isolation Example• T2 is printing A+B, T1 is subtracting 50

from A and adding 50 to B. T1

read(A)

A := A – 50

write(A)

read(B)

B := B + 50

write(B)

T2

read(A)

read(B)

print(A+B)

32

Isolation Example• Will T1 T2 be same as T2 T1 ?

– Eg: A = 80, B = 60

• Will all interleaved schedules give the same result as the serial schedules i.e. same result as T1 T2 or T2 T1 ?

• No – what would be an interleaved schedule which

would give a different (undesirable) result ?

33

Isolation Example• Consider interleaved schedule: T1 T2

1. read(A)2. A := A – 503. write(A)

read(A), read(B), print(A+B)4. read(B)5. B := B + 506. write(B)

– What will be the result with A = 80, B = 60

34

ACID Properties: Isolation• Isolation can be ensured trivially by running

transactions serially– that is, one after the other. Problem ?

• Inefficiency. Should allow concurrent execution but it should be serializable:– To T1 it should like it finished completely before T2 or

it didn’t start till all of T2 had finished – Even if execution is interleaved – Because T1 and T2 don’t interfere with each other– will study later.

• Isolation (non-interference) can be achieved by locks, will study later in detail.

35

ACID Properties: Durability, Eg• Durability: When T finishes, system issues a

commit (tells user that T completed) – then DBMS has to guarantee that changes made by T

will take place i.e. be recorded in the database– even if system crashes.

• Eg: customer comes to withdraw from ATM

1.Customer at ATM, requests 300$ withdrawal

2.ATM sends request to central computer

3.Central computer issues a commit to ATM

4.ATM gives money to customer

36

ACID Properties: Durability, Eg• If system (central computer) crashes before the

debit is made, what will happen ?

• Why shouldn’t the debit be made before the 300$ is given?

• Buffering : for efficiency reason, local copies of file blocks are kept in volatile RAM, and disk I/Os are done periodically– Disk I/Os are expensive !

37

Transaction Failure• We distinguish between catasrophic

failure and non-catastrophic failure. What kind of failure is catastrophic ?

• When the disk gets wiped out ! Why is this important in a database context ?

• The database is stored on disk– For non-catastrophic failure can recover by

using the disk

– Can’t do for catastrophic failure

38

Causes of Non-Catastrophic Failure• A computer failure (system crash): A hardware

or software error occurs during transaction execution– If the hardware crashes, the contents of the

computer’s internal memory may be lost.

• A transaction or system error : Some operation in the transaction may cause it to fail. Eg ?

• Eg: integer overflow or division by zero.

• User may interrupt the transaction during its execution.

39

Causes of Non-Catastrophic Failure• Local errors or exception conditions detected

by the transaction: – Eg: transaction involves fund withdrawal :

• account has insufficient account balance

• Concurrency control enforcement: The concurrency control method may decide to abort the transaction, to be restarted later– Eg: because it violates serializability

• Will study later: basically means transactions interfering with each other

– Eg: several transactions are in a state of deadlock

40

Causes of Catastrophic Failure• Disk failure: Some disk blocks may lose their

data because of a read or write malfunction or because of a disk read/write head crash.– This may happen during a read or a write operation

of the transaction.

• Physical problems and catastrophes: Eg?

• Power failure, fire, overwriting disks or tapes by mistake.

41

UNDO/REDORecovery techniques use the following operators:• undo: Similar to rollback except that it applies

to a single operation rather than to a whole transaction.

• redo: This specifies that certain transaction operations must be redone to ensure that all the operations of a committed transaction have been applied successfully to the database– If a system crash takes place, we may need to redo

some of the operations of a committed transaction.

42

Transactions and Recovery• Transactions that have written their commit

entry in the log must also have recorded all their write operations in the log. Why?

• Suppose a transaction has been issued a commit, but its operations have not been carried out, and the system crashes. What to do?

• We need to redo the ops. What information do we need to redo the ops?

• We need to know what all the write operations are; this information has to be in the system log.

43

Transactions and Recovery• Can this system log be in RAM or does it have

to be on disk?• The log file must be kept on disk because the

contents of RAM may be lost in a system crash– Master copy on disk, working copy in RAM

• Force writing a log: before a transaction reaches its commit point, any portion of the log that has not been written to the disk yet must now be written to the disk.– This process is called force-writing the log file and

has to be done before a commit is issued to a transaction.

– If force writing log, why not force write database?

44

System Log• The System Log : keeps track of all

transaction operations that affect the values of database items.

• This information needed to permit recovery from transaction failures.

• The log is kept on disk so it is not affected by any type of failure except for disk or catastrophic failure. – In addition, the log is periodically backed up to

archival storage (tape) to guard against such catastrophic failures.

45

FIGURE 20.4State transition diagram illustrating the states

for transaction execution.

46

Transactions: System Concepts• Recovery manager keeps track of following

operations:• begin_transaction: marks beginning of transaction

execution.• read or write: These specify read or write operations

on the database items that are executed as part of a transaction.

• end_transaction: marks the end limit of transaction execution. Check– whether changes introduced by the transaction can be

permanently applied to the database or – whether the transaction has to be aborted because it violates

concurrency control or for some other reason.

47

System Log RecordsTypes of system log record:1. [start_transaction,T]: Records that transaction T has

started execution.2. [write_item,T,X,old_value,new_value]: Records that

T has changed the value of database item X from old_value to new_value. Why keep old_value ? Why keep new_value ?

3. [read_item,T,X]: Records that T has read the value of database item X. Why keep this ?

4. [commit,T]: Records that T has completed successfully : affirms that its effect should be committed (recorded permanently) to the database.

5. [abort,T]: Records that T has been aborted.

48

Recovery using log records:• If the system crashes, we can recover to a consistent

database state by examining the log – by using one of the techniques we will study later

• Log contains a record of every write operation that changes the value of some database item– it is possible to undo the effect of these write operations of a

transaction T . How ?• Trace backward through the log and reset all items

changed by a write operation of T to their old_values.• Can also redo the effect of the write operations of a

transaction T. How ? • Trace forward through the log and set all items changed

by a write operation of T (that did not get done permanently) to their new_values.

49

Commit Point• Commit Point of a Transaction: Transaction T

reaches its commit point when all its operations that access the database have been executed successfully and the effect of all the transaction operations on the database has been recorded in the log.– Beyond the commit point, the transaction is said to

be committed, and its effect is assumed to be permanently recorded in the database.

– The transaction then writes entry [commit,T] into the log.

50

Recoverability Example• T1 T2

r1(x): x = 6x x + 3

w1(x): x=9

r2(x)x x + 2

w2(x): x= 11

c2

a1

• Problem ? • T2 committed, can’t undo, but T2 read a value

written by T1 and T2 has to be rolled back

51

Recoverability: Recoverable Schedules• Recoverable schedule: One where no

committed transaction needs to be rolled back.– Intuition: once a transaction is committed, it is

final, and cannot be rolled back

• How can previous Eg be avoided ?– How can we ensure that a schedule is recoverable?

• A schedule S is recoverable if the following holds: if T2 reads from T1, then T2 cannot commit till T1 has committed.

– Let’s look at why this is important

52

Recoverability: Recoverable Schedules– Suppose T2 reads from T1

– Then T2 commits

– Then T1 aborts

– What would we want to see happen to T2 ?

• T2 should be aborted. Why ?

• Because T2 read a value from T1 and T1 aborted

– So T2 read a “non-existent” value

• Is it possible to abort T2 now ?

• No, T2 has already been issued a commit

– DBMS has guaranteed that changes made by T2 are going to be made permanent

– Can’t abort T2 now

53

Cascading Rollbacks [SKS Eg]

• If T10 fails ?

• Then T11 and then T12 must also be rolled back• Cascading rollback : a single transaction

failure leads to a series of transaction rollbacks

• In above eg the schedule is recoverable. Why ?

• None of the transactions have committed

54

Recoverability: Cascadeless Schedules• Cascadeless schedules : no cascading rollbacks.

Advantage ?• Cascading rollbacks are inefficient. Why ?• Work already done is lost.• What do we need to guarantee cascadeless ?• No reading uncommitted writes

– If T1 writes x, T2 can’t read this x till T1 has committed

– Intuition ?

• If T2 allowed to read x before T1 committed, what happens if T1 aborts after T2 has read x?

• T2 has to be aborted – cascading rollback

55

Recoverability: Cascadeless Schedules• If no cascading rollbacks then another advantage

is that don’t need to keep read operations in system log. Why?– Why do we keep read operations ?

• We only keep T1 read operations in case T1 has read from T2 and T2 aborts– we will need to rollback T1 also.

• Can’t happen if no cascading rollbacks.• Every cascadeless schedule is also recoverable

– Not other way around. Eg ?

• Eg: w2(x), r1(x), a2

56

Recoverability: Strict Schedules• Strict Schedules: if T1 writes x, then T2 can’t

read x or write x till T1 committs.• Strict is stronger property than cascadeless

– Difference between cascadeless and strict ?

• In a strict schedule: if T1 has written x, then T2 can’t read or write over x till T1 has committed

• In cascadeless: T2 can’t read, but can write• Eg of a schedule : cascadeless but not strict ?• Eg: w2(x), w1(x)• Accounts for blind writes also.

57

Recoverability: Strict Schedules• With strict schedules, easier to recover. Why?

• Just restore before image : what the value was before it was written over

• Eg: x = 3, w1(x) : 4 (remembers old value = 3), a1 ; restore x = 3.

• Won’t work if not strict:

• Eg: x = 3, w1(x) : 4 (remembers old value = 3), w2(x) : 5 (remembers old value = 4), a1 ; restore x = 3: WRONG ! How to do ?

• Need to remember T2 wrote 5 in the log.

58

Recoverability: Strict Schedules• In strict schedule, to roll back, just have to

restore old value.

• For non strict schedules, have to include new_value since could have blind writes– Do undo’s followed by redo’s

• Strict protocols require simpler write entries – Only need to keep old_value

59

Recoverability• Four levels of classification vis-à-vis recovery

– Not-recoverable– Recoverable but not cascadeless– Cascadeless but not strict– Strict

• Recoverable/Non recoverable : w1(x),r2(x) : depends on which commits first T1or T2

• Cascadeless/strict depends on read/writes of uncommitted writes

60

Recoverability• Recoverable: T1 does not commit before T2 if

T1 reads from a write done by T2 and T2 not yet committed.

• Cascadeless: T1 does not read from write done by T2 till T2 committed.

• Strict: T1 does not read from or write over write done by T2 till T2 committed.

61

Recoverability• Eg: w2(x), r1(x), c1, a2 ?

• Not recoverable

• Eg: w2(x), r1(x), a2 ?

• Recoverable but not cascadeless

• Eg: w2(x), w1(x), a2 ?

• Cascadeless but not strict

• Eg: w2(x), c2, w1(x), ?

• Strict

1 transactions transaction: –a unit of work –a sequence of operations which are bundled together...

Documents

b transaction t2

transaction t1

transaction failedthis

writesavingif transaction

t2readbalance b

interleaved processing

joint bank account

bank accountjill