1 transactions transaction: –a unit of work –a sequence of operations which are bundled together...
TRANSCRIPT
1
Transactions• Transaction:
– A unit of work– A sequence of operations which are bundled
together
• Either all of them should be executed or none i.e. not partial execution
• Important in terms of concurrency and crash recovery
2
FIGURE 21.2Two sample transactions. (a) Transaction T1.
(b) Transaction T2.
3
Bad Failure Eg: Fund Transfer• Transfer $50 from checking to saving account
– Initially, A and B each have 200$.
1. read(checking)
2. checking := checking – 50
3. write(checking)
4. read(saving)
5. saving := saving + 50
6. write(saving)
• If transaction fails after step 3, before step 6 ?
• Money will be “lost” : inconsistent state
4
Bad Failure Eg: Fund Transfer• So if system crashes after step 3, what should
happen?
• DBMS must ensure that either the transaction is re-started and steps 4 through 6 are completed or …
• The effects of steps 1 through 3 are undone– And user informed that the transaction failed
• This is an example of crash recovery
5
Concurrency• Parallel processing: processes are concurrently
executed in multiple CPUs. Advantage?
• Increased throughput: can do different things simultaneously, so can do more things.
• Interleaved processing: concurrent execution of processes interleaved in single CPU. Why?
• When one process is doing I/O, the other can be accessing the database: better throughput.
• Prevents one process from holding up other processes for a long period of time
6
Concurrency• Concurrency: We will refer to both parallel
processing on multiple CPUS and interleaved processing on a single CPU as concurrency.– Most issues apply in both cases
• When would we want to allow two transactions to execute concurrently ?
• If two transactions are not “interfering” with each other then we want to allow concurrent execution.– When do we think of interfering with each other ?
• Dealing with the same data
7
Concurrency• Eg:
– Jack has a bank account– Jill has a separate bank account– Jack depositing 50$ into his account– Jill depositing 50$ into her account– Should we allow concurrent execution here ?
• Yes, can do concurrently– Not interfering with each other
• Suppose a joint bank account ?
8
Bad Concurrency Example• Eg: Jack and Jill each depositing 50$ into a
joint bank account, initial balance $200
Jack: T1 Jill: T2read(balance B):200
B := B + 50: 250
read(B): 200
B := B + 50: 250
write(B):250
write(B):250
• Final balance ends up with wrong value
9
Concurrency and Recovery• Two main issues to deal with:
– Concurrent execution– Failures of various kinds, such as hardware
failures and system crashes
• They do effect each other, but we will initially look at them mostly separately.
• Will look initially at the basics– What we want, what we try to avoid, what kind of
information we need to store etc.
• Later we will look at algorithms
10
FIGURE 21.2Two sample transactions. (a) Transaction T1.
(b) Transaction T2.
11
Transactions• Transaction: includes one or more database
access operations
• Read : no change made
• Write :insert, update, delete: a change is made
• Internal actions: don’t have an effect on other transactions
• Commit, Abort– Needed for recovery, not for concurrency– More on next slide
12
COMMIT/ABORT• commit_transaction: This signals a successful
end of the transaction– Any changes made by the transaction can be safely
committed to the database and will not be undone.– These changes are final
• rollback (or abort): This signals that the transaction has ended unsuccessfully– Any changes to the database made by the
transaction must be undone.
13
Transactions• Transaction could be
– stand-alone : SQL submitted interactively, – embedded within a program.
• application program may contain several transactions. Bank Eg:– at end of month
• Get a list of checking accounts have < 1000$• For these, transfer money from savings
• SQL support – will study later.
14
Some Op. Systems Concepts• Where is the database going to be stored ?
• On disk – much more storage than RAM.
• Where does data have to be in order to manipulate it ?
• RAM: can’t access anything on disk
• Have to move data between RAM and disk
• How much data moved at a time ?
• Block : Basic unit of data transfer from the disk to the computer main memory.
15
Read item• read_item(X): Reads a database item named X
into a program variable– We assume that program variable is also named X.
• read_item(X) includes the following steps:1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory if that block is not already in some main memory buffer
3. Copy item X from the buffer to the program variable named X.
16
Write item• write_item(X): Writes the value of program
variable X into the database item named X.• write_item(X) includes the following steps:1. Find address of the disk block that contains item X.2. Copy that disk block into a buffer in main memory
if that disk block is not already in some RAM buffer
3. Copy item X from the program variable named X into its correct location in the buffer.
4. Store the updated block from the buffer back to disk either immediately or at some later point in time
Why not write directly into disk? Disk I/O is expensive.
17
COMMIT• As its final action, a transaction T has to
either COMMIT or ABORT
• COMMIT :
1. T tells DBMS that it is committing i.e. it has completed successfully.
2. DBMS comes back and issues a commit : • It is telling T that all the changes are going to be
made permanent, even if the system crashes• This is now the DBMS’s responsibility
18
ABORT• T tells DBMS that it is terminating itself due to
some error (eg: no data found).
• Alternatively, DBMS might decide to abort T– Could be due to some internal reason– Could be due to a system crash
• DBMS has to ensure that database remains in a consistent state– Either aborted transaction started again
• If aborted by DBMS– Or it is killed and its effects are undone. Aka
rollback
19
Schedule Example• T1: r(x), w(x), commit.• T2 : r(y), r(x) commit.
T1 T2
r1(x)
r2(y)
w1(x)
c1
r2(y)
c2
• S = r1(x), r2(y), w1(x), r2(x), c1, c2.
20
Schedules• Schedule : a schedule S is sequences of
instructions that specify the chronological order in which instructions of concurrent transactions are executed
• S must preserve the order in which the instructions appear in each individual transaction.
• Operations from other transactions Tj can be interleaved with the operations of Ti in S. – Potential schedule (i.e. what may happen)
• May want to limit potential schedules– Actual schedule
21
Serial Schedules• Serial schedule: A schedule S is serial if, for
every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule. – Eg: All ops of T1 followed by all ops of T2
• Otherwise, the schedule is called an interleaved schedule– Eg: Some T1 ops, followed by some T2 ops, followed
by some T1 ops
22
[SKS] Eg of serial, interleaved schedules
23
Schedule Example S1
• T1: r(x), w(x), commit.• T2 : r(y), commit.
T1 T2
r1(x)
w1(x)
c1
r2(y)
c2
• S1 = r1(x), w1(x), c1, r2(y), c2. Is S1 serial?• S1 is serial : T1 followed by T2
24
Schedule Example S2
• T1: r(x), w(x), commit.• T2 : r(y), commit.
T1 T2
r1(x)
r2(y)
w1(x)
c1
c2
• S2 = r1(x), r2(y), w1(x), c1, c2. Is S2 serial?• S2 is not serial – is interleaved.
25
ACID Properties: Atomicity• ACID Properties: These are desirable
properties of transactions:
• Jim Gray of Microsoft won the 1998 Turing award for his work on transactions in databases– ACID properties, locking, fault tolerance
• Atomicity: Either all operations of the transaction should occur or none should occur.– The system has to enforce this by making sure that
only a part of the transaction is not done.
26
Atomicity Example of Fund Transfer
The following is one transaction1. read(A)2. A := A – 503. write(A)4. read(B)5. B := B + 506. write(B)
• Atomicity requirement – if the transaction fails after step 3 and before step 6,
money will be “lost” leading to an inconsistent database state
– the system should ensure that updates of a partially executed transaction are not reflected in the database
27
ACID Properties: Consistency• What do we mean by consistency ?• Consistency requirements could include
• Explicitly specified constraints such as primary keys and foreign keys
• Implicit integrity constraints– e.g. sum of A and B should remain same
• When do we worry about consistency? • Programmer has to ensure that a transaction which
completes by itself (no interleaving, no crashes) leaves the database in a consistent state – Eg: if T transferring money from checking to saving adds one
more dollar to saving– DBMS can’t catch this kind of inconsistency– Erroneous transaction logic can lead to inconsistency
28
ACID Properties: Consistency• Here we are only interested in consistency when
there could be a problem – via interleaving of different transactions or– T not completing due to system crash
• If things were consistent without concurrency or recovery, we want to ensure– same consistent state with concurrency + recovery.
• Before T starts, assume database is consistent• After T completes, database must be consistent• During execution of T, OK if the database
becomes temporarily inconsistent.
29
Consistency Example1. read(A)2. A := A – 503. write(A)4. read(B)5. B := B + 506. write(B)
• Consistency requirement in above example?
• The sum of A and B is unchanged by the execution of the transaction. – We are adding 50 to A and subtracting 50 from B– Should be true even if system crashes in the middle
and some part of the transaction has to be redone. What could be the problem ?
30
ACID Properties: Isolation• Isolation: Want to allow multiple transactions
to execute concurrently for better performance– Don’t want transactions to interfere with each other
if they are working on the same data
• User should be able to understand the effect of each transaction by itself– Even if the DBMS does interleaving, the user should
be sheltered from any negative effects of the interleaving.
– To the user it should look like serial execution
31
Isolation Example• T2 is printing A+B, T1 is subtracting 50
from A and adding 50 to B. T1
read(A)
A := A – 50
write(A)
read(B)
B := B + 50
write(B)
T2
read(A)
read(B)
print(A+B)
32
Isolation Example• Will T1 T2 be same as T2 T1 ?
– Eg: A = 80, B = 60
• Will all interleaved schedules give the same result as the serial schedules i.e. same result as T1 T2 or T2 T1 ?
• No – what would be an interleaved schedule which
would give a different (undesirable) result ?
33
Isolation Example• Consider interleaved schedule: T1 T2
1. read(A)2. A := A – 503. write(A)
read(A), read(B), print(A+B)4. read(B)5. B := B + 506. write(B)
– What will be the result with A = 80, B = 60
34
ACID Properties: Isolation• Isolation can be ensured trivially by running
transactions serially– that is, one after the other. Problem ?
• Inefficiency. Should allow concurrent execution but it should be serializable:– To T1 it should like it finished completely before T2 or
it didn’t start till all of T2 had finished – Even if execution is interleaved – Because T1 and T2 don’t interfere with each other– will study later.
• Isolation (non-interference) can be achieved by locks, will study later in detail.
35
ACID Properties: Durability, Eg• Durability: When T finishes, system issues a
commit (tells user that T completed) – then DBMS has to guarantee that changes made by T
will take place i.e. be recorded in the database– even if system crashes.
• Eg: customer comes to withdraw from ATM
1.Customer at ATM, requests 300$ withdrawal
2.ATM sends request to central computer
3.Central computer issues a commit to ATM
4.ATM gives money to customer
36
ACID Properties: Durability, Eg• If system (central computer) crashes before the
debit is made, what will happen ?
• Why shouldn’t the debit be made before the 300$ is given?
• Buffering : for efficiency reason, local copies of file blocks are kept in volatile RAM, and disk I/Os are done periodically– Disk I/Os are expensive !
37
Transaction Failure• We distinguish between catasrophic
failure and non-catastrophic failure. What kind of failure is catastrophic ?
• When the disk gets wiped out ! Why is this important in a database context ?
• The database is stored on disk– For non-catastrophic failure can recover by
using the disk
– Can’t do for catastrophic failure
38
Causes of Non-Catastrophic Failure• A computer failure (system crash): A hardware
or software error occurs during transaction execution– If the hardware crashes, the contents of the
computer’s internal memory may be lost.
• A transaction or system error : Some operation in the transaction may cause it to fail. Eg ?
• Eg: integer overflow or division by zero.
• User may interrupt the transaction during its execution.
39
Causes of Non-Catastrophic Failure• Local errors or exception conditions detected
by the transaction: – Eg: transaction involves fund withdrawal :
• account has insufficient account balance
• Concurrency control enforcement: The concurrency control method may decide to abort the transaction, to be restarted later– Eg: because it violates serializability
• Will study later: basically means transactions interfering with each other
– Eg: several transactions are in a state of deadlock
40
Causes of Catastrophic Failure• Disk failure: Some disk blocks may lose their
data because of a read or write malfunction or because of a disk read/write head crash.– This may happen during a read or a write operation
of the transaction.
• Physical problems and catastrophes: Eg?
• Power failure, fire, overwriting disks or tapes by mistake.
41
UNDO/REDORecovery techniques use the following operators:• undo: Similar to rollback except that it applies
to a single operation rather than to a whole transaction.
• redo: This specifies that certain transaction operations must be redone to ensure that all the operations of a committed transaction have been applied successfully to the database– If a system crash takes place, we may need to redo
some of the operations of a committed transaction.
42
Transactions and Recovery• Transactions that have written their commit
entry in the log must also have recorded all their write operations in the log. Why?
• Suppose a transaction has been issued a commit, but its operations have not been carried out, and the system crashes. What to do?
• We need to redo the ops. What information do we need to redo the ops?
• We need to know what all the write operations are; this information has to be in the system log.
43
Transactions and Recovery• Can this system log be in RAM or does it have
to be on disk?• The log file must be kept on disk because the
contents of RAM may be lost in a system crash– Master copy on disk, working copy in RAM
• Force writing a log: before a transaction reaches its commit point, any portion of the log that has not been written to the disk yet must now be written to the disk.– This process is called force-writing the log file and
has to be done before a commit is issued to a transaction.
– If force writing log, why not force write database?
44
System Log• The System Log : keeps track of all
transaction operations that affect the values of database items.
• This information needed to permit recovery from transaction failures.
• The log is kept on disk so it is not affected by any type of failure except for disk or catastrophic failure. – In addition, the log is periodically backed up to
archival storage (tape) to guard against such catastrophic failures.
45
FIGURE 20.4State transition diagram illustrating the states
for transaction execution.
46
Transactions: System Concepts• Recovery manager keeps track of following
operations:• begin_transaction: marks beginning of transaction
execution.• read or write: These specify read or write operations
on the database items that are executed as part of a transaction.
• end_transaction: marks the end limit of transaction execution. Check– whether changes introduced by the transaction can be
permanently applied to the database or – whether the transaction has to be aborted because it violates
concurrency control or for some other reason.
47
System Log RecordsTypes of system log record:1. [start_transaction,T]: Records that transaction T has
started execution.2. [write_item,T,X,old_value,new_value]: Records that
T has changed the value of database item X from old_value to new_value. Why keep old_value ? Why keep new_value ?
3. [read_item,T,X]: Records that T has read the value of database item X. Why keep this ?
4. [commit,T]: Records that T has completed successfully : affirms that its effect should be committed (recorded permanently) to the database.
5. [abort,T]: Records that T has been aborted.
48
Recovery using log records:• If the system crashes, we can recover to a consistent
database state by examining the log – by using one of the techniques we will study later
• Log contains a record of every write operation that changes the value of some database item– it is possible to undo the effect of these write operations of a
transaction T . How ?• Trace backward through the log and reset all items
changed by a write operation of T to their old_values.• Can also redo the effect of the write operations of a
transaction T. How ? • Trace forward through the log and set all items changed
by a write operation of T (that did not get done permanently) to their new_values.
49
Commit Point• Commit Point of a Transaction: Transaction T
reaches its commit point when all its operations that access the database have been executed successfully and the effect of all the transaction operations on the database has been recorded in the log.– Beyond the commit point, the transaction is said to
be committed, and its effect is assumed to be permanently recorded in the database.
– The transaction then writes entry [commit,T] into the log.
50
Recoverability Example• T1 T2
r1(x): x = 6x x + 3
w1(x): x=9
r2(x)x x + 2
w2(x): x= 11
c2
a1
• Problem ? • T2 committed, can’t undo, but T2 read a value
written by T1 and T2 has to be rolled back
51
Recoverability: Recoverable Schedules• Recoverable schedule: One where no
committed transaction needs to be rolled back.– Intuition: once a transaction is committed, it is
final, and cannot be rolled back
• How can previous Eg be avoided ?– How can we ensure that a schedule is recoverable?
• A schedule S is recoverable if the following holds: if T2 reads from T1, then T2 cannot commit till T1 has committed.
– Let’s look at why this is important
52
Recoverability: Recoverable Schedules– Suppose T2 reads from T1
– Then T2 commits
– Then T1 aborts
– What would we want to see happen to T2 ?
• T2 should be aborted. Why ?
• Because T2 read a value from T1 and T1 aborted
– So T2 read a “non-existent” value
• Is it possible to abort T2 now ?
• No, T2 has already been issued a commit
– DBMS has guaranteed that changes made by T2 are going to be made permanent
– Can’t abort T2 now
53
Cascading Rollbacks [SKS Eg]
• If T10 fails ?
• Then T11 and then T12 must also be rolled back• Cascading rollback : a single transaction
failure leads to a series of transaction rollbacks
• In above eg the schedule is recoverable. Why ?
• None of the transactions have committed
54
Recoverability: Cascadeless Schedules• Cascadeless schedules : no cascading rollbacks.
Advantage ?• Cascading rollbacks are inefficient. Why ?• Work already done is lost.• What do we need to guarantee cascadeless ?• No reading uncommitted writes
– If T1 writes x, T2 can’t read this x till T1 has committed
– Intuition ?
• If T2 allowed to read x before T1 committed, what happens if T1 aborts after T2 has read x?
• T2 has to be aborted – cascading rollback
55
Recoverability: Cascadeless Schedules• If no cascading rollbacks then another advantage
is that don’t need to keep read operations in system log. Why?– Why do we keep read operations ?
• We only keep T1 read operations in case T1 has read from T2 and T2 aborts– we will need to rollback T1 also.
• Can’t happen if no cascading rollbacks.• Every cascadeless schedule is also recoverable
– Not other way around. Eg ?
• Eg: w2(x), r1(x), a2
56
Recoverability: Strict Schedules• Strict Schedules: if T1 writes x, then T2 can’t
read x or write x till T1 committs.• Strict is stronger property than cascadeless
– Difference between cascadeless and strict ?
• In a strict schedule: if T1 has written x, then T2 can’t read or write over x till T1 has committed
• In cascadeless: T2 can’t read, but can write• Eg of a schedule : cascadeless but not strict ?• Eg: w2(x), w1(x)• Accounts for blind writes also.
57
Recoverability: Strict Schedules• With strict schedules, easier to recover. Why?
• Just restore before image : what the value was before it was written over
• Eg: x = 3, w1(x) : 4 (remembers old value = 3), a1 ; restore x = 3.
• Won’t work if not strict:
• Eg: x = 3, w1(x) : 4 (remembers old value = 3), w2(x) : 5 (remembers old value = 4), a1 ; restore x = 3: WRONG ! How to do ?
• Need to remember T2 wrote 5 in the log.
58
Recoverability: Strict Schedules• In strict schedule, to roll back, just have to
restore old value.
• For non strict schedules, have to include new_value since could have blind writes– Do undo’s followed by redo’s
• Strict protocols require simpler write entries – Only need to keep old_value
59
Recoverability• Four levels of classification vis-à-vis recovery
– Not-recoverable– Recoverable but not cascadeless– Cascadeless but not strict– Strict
• Recoverable/Non recoverable : w1(x),r2(x) : depends on which commits first T1or T2
• Cascadeless/strict depends on read/writes of uncommitted writes
60
Recoverability• Recoverable: T1 does not commit before T2 if
T1 reads from a write done by T2 and T2 not yet committed.
• Cascadeless: T1 does not read from write done by T2 till T2 committed.
• Strict: T1 does not read from or write over write done by T2 till T2 committed.
61
Recoverability• Eg: w2(x), r1(x), c1, a2 ?
• Not recoverable
• Eg: w2(x), r1(x), a2 ?
• Recoverable but not cascadeless
• Eg: w2(x), w1(x), a2 ?
• Cascadeless but not strict
• Eg: w2(x), c2, w1(x), ?
• Strict