Advanced Database Topics
Copyright © Ellis Cohen 2002-2005
Concurrency Control
Features & Mechanisms
These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
For more information on how you may use them, please see http://www.openlineconsult.com/db
Copyright © Ellis Cohen, 2002-2005 2
Topics
Concurrency Control Features & Mechanisms
Timestamp-Based Concurrency Control
Client Caches
Optimistic Concurrency Control with a Timestamped Cache
Transaction-Based Validation
Optimistic Concurrency Control with a Simple/Snapshot Cache
Snapshot-Based Concurrency Control
Consistency Failure & The Constraint Violation Problem
Copyright © Ellis Cohen, 2002-2005 3
Concurrency Control Features & Mechanisms
Copyright © Ellis Cohen, 2002-2005 4
Characterizing Concurrency Features
Optimistic vs Pessimistic
Locking vs Non-Locking
Versioned vs Non-Versioned
Cache vs Non-Cache
Copyright © Ellis Cohen, 2002-2005 5
Optimistic & Pessimistic Concurrency
Optimistic Concurrency Mechanisms
Allow a transaction to proceed to completion. When it attempts to commit, abort it if it does not serialize with transactions which have already completed.
Pessimistic Concurrency Mechanisms
Each operation in a transaction is only allowed to proceed if it serializes with overlapping transactions. Otherwise it must wait (or possibly cause its own or another transaction to abort).
Lock-based concurrency is pessimistic.
Copyright © Ellis Cohen, 2002-2005 6
Lock-Based vs Non-Locking
Lock-Based Approaches
– 2PL with Deadlock Detection (in particular) avoids unnecessary aborts, and the need for an abort is determined as early as possible
– Locking has significant constant overhead even when there are few conflicts
– Good approach when there is significant conflict (but beware of the convoy phenomenon)
Non-Locking Approaches
– Avoid the overhead of locking
– Susceptible to a delayed or increased rate of aborts
– Good approach when there is not too much conflict
– Useful approach for distributed systems
Copyright © Ellis Cohen, 2002-2005 7
Non-Locking Approaches
Validation-Based
Ensures serializability by pretending the entire transaction occurs at the time it commits.
Optimistic: Allow the entire transaction to execute, but at commit time, validate it: check that the data it used is not out-of-date.
Timestamp-Based
Ensures serializability by pretending the entire transaction occurs at the time it starts (or at an arbitrary time!). Assign the transaction a timestamp at that time.
Pessimistic: Check every data access for a timestamp conflict.
(Partially or completely optimistic if some or all timestamp conflict checks are deferred to commit time)
Copyright © Ellis Cohen, 2002-2005 8
Issues for Non-Locking Protocols
Recoverability
– A recoverability failure can occur when a transaction T2 reads data written by T1, and T2 commits before T1 completes.
– Major dangers:
• Durability Failure
• Non-recoverable actions taken at commit time, e.g. reports to the user, launching a nuclear missile
Avoiding Cascading Aborts
– A cascading abort occurs when a transaction T2 reads data written by T1, and then T1 aborts
Phantoms
– Without index or predicate locks, how are conflicts involving phantoms detected?
Copyright © Ellis Cohen, 2002-2005 9
Database Versioning
Single Version
A single version of the database state is used by all transactions.
Multi-Version
The database state retains multiple versions of each piece of data. Each version is timestamped with the time that version of the data item was written to the database state.
How might multi-versioning be implemented using a log?
Copyright © Ellis Cohen, 2002-2005 10
Implementing Multi-Versioning
Every modification log entry in a physiological log contains a rowid and a before and/or after image. Adding a timestamp to every such modification entry gives us multi-versioning.
To find the value of a tuple at time T:
• Using an UNDO log
– Find that place in the log
– Look forwards until you find an entry for the tuple, and use its before image
– If you don't find such an entry in the log, use the tuple's current state
• Using a REDO log
– Find that place in the log
– Look backwards until you find an entry for the tuple within a committed transaction, and use its after image
– If you don't find such an entry in the log, use the value in the last complete DB backup (presuming the log begins after the last backup)
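The UNDO-log approach can be sketched in a few lines of Python. The log format (timestamp, rowid, before-image) and the function name are invented for illustration; a real log is physiological and page-oriented.

```python
# Sketch of finding a tuple's value at time t using an UNDO log.
# Entries are (timestamp, rowid, before_image), in time order.

def value_at(log, current_state, rowid, t):
    """Return the value rowid had at time t.
    Scanning forwards from time t, the first modification entry for
    the tuple gives its before-image (the value it still had at t).
    If there is no such entry, the tuple hasn't changed since t:
    use its current state."""
    for ts, rid, before in log:
        if ts >= t and rid == rowid:
            return before
    return current_state[rowid]

# x changed from 30 at t5 and from 0 at t9; its current value is 70
log = [(5, "x", 30), (9, "x", 0)]
current = {"x": 70}
```

Asking for x at t3 yields 30 (the before-image of the t5 entry), at t7 yields 0, and at t12 falls through to the current state, 70.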
Copyright © Ellis Cohen, 2002-2005 11
Concurrency Control Mechanisms
Lock-Based
Approach: Lock-Based (Pessimistic)
DB Versioning: Single Version
Timestamp-Based Concurrency Control
Approach: Timestamp-Based (generally Pessimistic)
DB Versioning: Single or Multi-Version
Optimistic Concurrency Control
Approach: Validation-Based (Optimistic)
DB Versioning: Single Version
Read-Consistent (not entirely serializable)
Approach: Validation-Based (Optimistic) or Lock-Based (Pessimistic, X locks ONLY!)
DB Versioning: Multi-Version
Copyright © Ellis Cohen, 2002-2005 12
Timestamp-Based Concurrency Control
Copyright © Ellis Cohen, 2002-2005 13
Timestamp-Based Concurrency
Approach: Timestamp-Based
Use timestamps to serialize transactions. Pretends the entire transaction occurs at the time it starts. Assign the transaction a timestamp at that time.
Pessimistic: Check every data access for a timestamp conflict (assuming no cache).
Versioning:
Single Version
A single version of the database state is used by all transactions.
Multi-Version
The database state retains multiple versions of each piece of data.
Copyright © Ellis Cohen, 2002-2005 14
Timestamp-Based Concurrency (single version)
When a transaction T starts, assign a (monotonically increasing) timestamp Tt to it.
Associate each data item D with
– a read timestamp Dr, which holds the timestamp of the latest reader
– a write timestamp Dw, which holds the timestamp of the latest writer
Abort some transaction (wound/die) if
– one transaction tries to perform an operation (e.g. tries to read some data), and
– a conflicting operation (e.g. writing the same data) was already done by a transaction that started later
Aborted transactions are restarted with a new timestamp.
Copyright © Ellis Cohen, 2002-2005 15
Reading Too Late
Initially, x = 30 with read timestamp R = t0 and write timestamp W = t0.
Transaction B (timestamp t2): update XT set x = 0
→ x = 0, R = t0, W = t2
Transaction A (timestamp t1): select x from XT
This read needs to happen at t1, so it needs to see x as it was at t1, but that's not available!
A is too old: abort it (or B) and restart it with a later timestamp.
Copyright © Ellis Cohen, 2002-2005 16
Writing Too Late
Initially, x = 30 with R = t0 and W = t0.
Transaction B (timestamp t2): select x from XT
→ x = 30, R = t2, W = t0
Transaction A (timestamp t1): update XT set x = 0
This write needs to happen at t1, but a later transaction read x's value before it was changed!
A is too old: abort it and restart it with a later timestamp.
Copyright © Ellis Cohen, 2002-2005 17
Thomas Write Rule
Initially, x = 30 with R = t0 and W = t0.
Transaction B (timestamp t2): update XT set x = 70
→ x = 70, R = t0, W = t2
Transaction A (timestamp t1): update XT set x = 0
This write needs to happen at t1, but a later transaction already changed x's value; executing it would mistakenly overwrite that later value. So just ignore this operation. Don't do it!
(But what if B rolls back?)
Copyright © Ellis Cohen, 2002-2005 18
Timestamp-Based Concurrency Rules
When T tries to read D
– if Tt < Dw, abort T (can't read something written by a later transaction)
– else Dr ← max(Tt, Dr)
When T tries to write D
– if Tt < Dr, abort T (can't write something read by a later transaction)
– else if Tt < Dw, ignore the write [Thomas Write Rule] (already overwritten; note: for blind writes only; most updates are a read + write)
– else Dw ← Tt [shows it was written by T]
No locking: How do you
– Ensure Recoverability?
– Avoid Cascading Aborts?
– Prevent Phantom Read Problems?
How does it compare to 2PL? Note that even reads require writes (of timestamps).
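These rules fit in a few lines of Python. This is a minimal sketch: the `Item`, `read`, and `write` names are invented for illustration, and it models a single data item rather than a real DBMS.

```python
# Minimal sketch of single-version timestamp-based concurrency rules.

class Abort(Exception):
    pass

class Item:
    def __init__(self, value):
        self.value = value
        self.r = 0          # Dr: timestamp of the latest reader
        self.w = 0          # Dw: timestamp of the latest writer

def read(tt, item):
    if tt < item.w:                      # written by a later transaction
        raise Abort("reading too late")
    item.r = max(tt, item.r)
    return item.value

def write(tt, item, value):
    if tt < item.r:                      # read by a later transaction
        raise Abort("writing too late")
    if tt < item.w:                      # Thomas Write Rule: ignore
        return
    item.value = value
    item.w = tt

# Reading too late: t2 writes x, then the older t1 tries to read it
x = Item(30)
write(2, x, 0)
try:
    read(1, x)
    aborted = False
except Abort:
    aborted = True

# Thomas Write Rule: a late blind write is silently ignored
y = Item(30)
write(2, y, 70)
write(1, y, 99)          # ignored: y keeps the value written at t2
```

The first scenario mirrors the "Reading Too Late" slide (the t1 reader aborts); the second mirrors the Thomas Write Rule slide (y stays 70).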
Copyright © Ellis Cohen, 2002-2005 19
Recoverability & Cascading Abort
Initially, x = 30 with R = t0 and W = t0.
Transaction A (timestamp t1): update XT set x = 0
→ x = 0, R = t0, W = t1
Transaction B (timestamp t2): update XT set x = x + 100
→ x = 100, R = t2, W = t2
B (at t2) reads data written by A (at t1): if B commits before A completes, we may lose recoverability. So B's COMMIT can't be allowed; B must wait until A completes.
If A instead does a ROLLBACK, B will need to abort as well.
Copyright © Ellis Cohen, 2002-2005 20
Timestamp-Based Multiversion Concurrency Control
Like single-version timestamp-based, but avoids aborting transactions that read earlier versions of data; essential for long transactions.
Maintain versions for each data item:
– When a transaction first writes a data item, a new version of the data item is created, labeled with that transaction's timestamp
– Subsequent writes by the same transaction rewrite that version
– A transaction simply finds the appropriate version to read. It can't read out of order.
– Read-only transactions are never aborted.
– Read-only transactions can timewarp: use an arbitrary timestamp in the past to obtain information about the system at that time
Copyright © Ellis Cohen, 2002-2005 21
Reading Correct Version
Initially, x = 30 with R = t0 and W = t0.
Transaction B (timestamp t2): update XT set x = 0
→ x = 0, R = t0, W = t2 (the undo log retains the old version)
Transaction A (timestamp t1): select x from XT
This read needs to happen at t1, so it finds the correct version (the t0 version, x = 30) to read!
Copyright © Ellis Cohen, 2002-2005 22
Writing Conflicting Version
Initially, x = 30 with Vr = t0 and Vw = t0.
Transaction B (timestamp t2): select x from XT
→ reads the t0 version; its read timestamp becomes t2 (x = 30, Vr = t2, Vw = t0)
Transaction C (timestamp t3): update XT set x = 0
→ creates a t3 version (x = 0, Vw = t3)
Transaction A (timestamp t1): update XT set x = 70
We could create a t1 version of x. However, B (at t2) already read the t0 version of x.
So we must either abort A or B, and restart it with a later timestamp.
Copyright © Ellis Cohen, 2002-2005 23
Timestamp-Based Multiversion Concurrency Rules
Let V be the version of D with the largest write timestamp <= Tt.
When T tries to read D
– read V
– set Vr ← max(Tt, Vr)
When T tries to write D
– if Tt < Vr, abort T (can't write something that should have been read, but wasn't)
– else if Tt = Vw, overwrite V
– else create a new version V' with V'w = Tt
What about recoverability, avoiding cascading aborts & phantoms?
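Here is one way the multiversion rules might be sketched in Python, assuming versions are kept sorted by write timestamp. The `MVItem` class and its version layout are invented for this example.

```python
# Illustrative sketch of multiversion timestamp-based concurrency.
import bisect

class Abort(Exception):
    pass

class MVItem:
    """Versions are [Vw, value, Vr] lists, sorted by write timestamp."""
    def __init__(self, value):
        self.versions = [[0, value, 0]]   # initial version written at t0

    def _find(self, tt):
        # The version of D with the largest write timestamp <= Tt
        keys = [v[0] for v in self.versions]
        return self.versions[bisect.bisect_right(keys, tt) - 1]

    def read(self, tt):
        v = self._find(tt)
        v[2] = max(tt, v[2])              # Vr <- max(Tt, Vr)
        return v[1]

    def write(self, tt, value):
        v = self._find(tt)
        if tt < v[2]:                     # a later txn already read this version
            raise Abort("write would invalidate an earlier read")
        if tt == v[0]:                    # same transaction: rewrite its version
            v[1] = value
        else:                             # create a new version V' with V'w = Tt
            bisect.insort(self.versions, [tt, value, tt])

# A writer at t3 creates a new version; a reader at t2 still sees the
# t0 version; a writer at t1 must then abort, because the t0 version
# was read at t2 (mirroring the "Writing Conflicting Version" slide).
x = MVItem(30)
x.write(3, 0)
old = x.read(2)            # reads the t0 version (30); sets its Vr to 2
try:
    x.write(1, 70)
    conflict = False
except Abort:
    conflict = True
```

A read at t5 would find the t3 version (value 0), while a reader at t2 keeps seeing 30: readers are never aborted.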
Copyright © Ellis Cohen, 2002-2005 24
Client Caches
Copyright © Ellis Cohen, 2002-2005 25
Cache vs Non-Cache Mechanisms
Non-Cache-Based Mechanisms
All data reads/updates are made directly from/to the server database state.
Cache-Based Mechanisms
A separate cache is maintained by/for each client.
All data updates are made to the cache, and only written to the server database state on commit (often checking first whether this is allowed).
Data needed for a query may be able to be read directly from the cache.
What are the advantages of cache-based mechanisms?
Copyright © Ellis Cohen, 2002-2005 26
Advantages of Caching
Efficient reads: If needed data is in the cache, the cost of a DB query may be avoided.
Deferred updates: Updates are made to the cache, and deferred to commit time, where they can be done all at once (or not at all if the transaction aborts)
Simplifies Recovery Management: Because changes are not made to the DB state until commit time, they don't need to be undone.
Avoid Pessimistic Overhead: While caches are still useful with pessimistic approaches (locking & timestamp-based concurrency), caches support optimistic approaches where serializability checking is deferred and done once at commit time.
Copyright © Ellis Cohen, 2002-2005 27
Client Cache Placement
Server-Managed Client Cache
The DB server maintains a cache on behalf of each client. This allows its implementation to be integrated with the server page cache & log. Most useful for RDBs.
Client-Side Cache
Each client cache is maintained at the client. This can significantly improve performance when the client runs on a different machine than the database server. Most useful for OODBs & Distributed Systems.
Copyright © Ellis Cohen, 2002-2005 28
Client Caches
• A separate cache is maintained by/for each client
• If a transaction needs to read a data item D (for an RDB, data items correspond to tuples)
– If D is not in the cache, get it from the server DB state, and put it in the cache
– If D is already in the cache, get it from the cache[though if it is old or stale, and not yet used in the current transaction, the latest value may be obtained from the server DB state]
• All data updates are made to the cache. [This requires integrating items returned from server queries with items already in the cache]
• When the transaction commits, all data updated by the transaction is written back from the client's cache to the server database state.
Note: Don't confuse the server page cache (used for performance and recovery) with
client caches (used primarily for isolating transactions)
Copyright © Ellis Cohen, 2002-2005 29
Client Cache Types for Concurrency
Simple CacheThe cache simply maintains the data read/written by the transaction
Timestamped CacheEach piece of data obtained from the database server is associated with the server's timestamp for that data (the time that data was last updated)
Snapshot CacheThe client's cache holds a "virtual" snapshot of the current database state made when the transaction starts
Copyright © Ellis Cohen, 2002-2005 30
Cache Data Lifetimes
Transaction Lifetime
– The data in a client's cache is cleared at the end of each transaction, i.e. every transaction starts with an empty cache
Session Lifetime (Timestamped Caches Only)
– The data remains in the cache at the end of a transaction. At the start of a transaction, the cache may contain data used in the client's previous transactions.
• Does require a (relatively simple) local undo mechanism if a transaction aborts
Copyright © Ellis Cohen, 2002-2005 31
Tuple Caching
For Relational Databases, the most natural items to cache are tuples, along with the server's ROWID for each.
When a client executes
SELECT * FROM Emps WHERE deptno = 10
the tuples are returned from the server along with the server's ROWID for each one.
The tuples (along with their ROWIDs) are retained in the cache.
Copyright © Ellis Cohen, 2002-2005 32
Partial Queries
Suppose the cache is empty, and the very first client operation in a transaction is
SELECT empno, sal FROM Emps WHERE job = 'CLERK'
How should the client cache manager process this query?
Consider what happens if the same request is made later in the transaction.
Copyright © Ellis Cohen, 2002-2005 33
Partial Query Alternatives
1. Send the request to the server. Cache nothing.
Reasonable.
2. Send the request to the server. Cache the results for reuse.
Possible, but this would significantly complicate concurrency and cache management: it would require field-level granularity, not just tuple-level granularity.
3. Request & cache whole tuples from the server:
SELECT * FROM Emps WHERE job = 'CLERK'
Then execute the original query locally against the cached tuples.
Makes most use of the cache; the cost of downloading whole tuples is worth it if they will be used in future queries.
Copyright © Ellis Cohen, 2002-2005 34
Tuple Overlap Problem
Suppose a client executes
SELECT * FROM Emps WHERE deptno = 10
Later in the same transaction, the client executes
SELECT * FROM Emps WHERE job = 'CLERK'
If there are clerks in department 10, the tuples will overlap.
In general, if a query includes a tuple which is already in the cache, what should be done?
Copyright © Ellis Cohen, 2002-2005 35
Tuple Query & Replacement
The query below is shipped to the server:
SELECT * FROM Emps WHERE job = 'CLERK'
It returns a set of tuples, each with its rowid.
• If the rowid is not in the cache
– add the tuple (and its rowid) to the cache, and include it in the query result set
• If the rowid is in the cache, and its tuple has been used in the current transaction
– throw away the tuple returned from the server; add the tuple in the cache to the query result set
• If the rowid is in the cache, but its tuple has not yet been used in the current transaction
– replace the tuple in the cache; put the newly returned tuple in the query result set
Are there ways to make this more efficient?
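The replacement rules can be sketched as a small merge function. The dictionary-based cache layout (`{rowid: tuple}`) is invented for illustration.

```python
# Sketch of the tuple query & replacement rules.

def merge_results(server_rows, cache, used):
    """server_rows: {rowid: tuple} returned by the server;
    cache: {rowid: tuple} client tuple cache;
    used: set of rowids already read/written in this transaction.
    Updates the cache in place and returns the query result set."""
    result = {}
    for rowid, row in server_rows.items():
        if rowid in cache and rowid in used:
            # keep the transaction's own (possibly modified) tuple
            result[rowid] = cache[rowid]
        else:
            # new tuple, or unused cached tuple: take the server's copy
            cache[rowid] = row
            result[rowid] = row
    return result

cache = {"r1": ("JONES", 100)}      # r1 was updated by this transaction
used = {"r1"}
server_rows = {"r1": ("JONES", 999), "r2": ("SMITH", 50)}
result = merge_results(server_rows, cache, used)
```

In the example, r1's stale server copy is discarded in favor of the transaction's own version, while r2 is added to both the cache and the result set.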
Copyright © Ellis Cohen, 2002-2005 36
Tuple Filtering
Server Filtering
– The server can itself keep track of the rowids of tuples it has sent the client during a transaction
– For tuples (that match the query) that it has already sent the client in this transaction, it just sends the rowid (it sends the rowids + contents for the other tuples)
Client Filtering (for Session Lifetime Caches)
– The client cache manager may be able to identify (via a local query) tuples that were in its cache prior to the current transaction which match the query.
– It can send the rowids and timestamps of those tuples to the server, and the server will only send those tuples back if they have been updated more recently
Copyright © Ellis Cohen, 2002-2005 37
Predicate Caching
The client cache manager keeps track of the fact that the results of
SELECT * FROM Emps WHERE deptno = 10
are already in the cache, specifically saved as Cache-Emps-deptno-10. So it sends the server only the query
SELECT * FROM Emps WHERE job = 'CLERK' AND (deptno != 10 OR deptno IS NULL)
It also locally executes the query
SELECT * FROM Cache-Emps-deptno-10 WHERE job = 'CLERK'
and builds the result set for the original query by unioning these two result sets.
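A toy model of this split makes the union explicit. The table contents and predicate functions below are invented for illustration; a real cache manager would rewrite SQL rather than apply Python predicates.

```python
# Toy model of predicate caching: rows matching deptno = 10 are already
# cached, so only the complement is fetched from the "server" table.

cache = [{"empno": 1, "deptno": 10, "job": "CLERK"},
         {"empno": 2, "deptno": 10, "job": "ANALYST"}]
server_table = cache + [{"empno": 3, "deptno": 20, "job": "CLERK"}]

def cached_pred(row):               # the cached predicate: deptno = 10
    return row["deptno"] == 10

def query(row):                     # the new query: job = 'CLERK'
    return row["job"] == "CLERK"

# The server answers only for rows NOT covered by the cached predicate
server_part = [r for r in server_table if query(r) and not cached_pred(r)]
# The rest of the answer comes from the local cache
local_part = [r for r in cache if query(r)]
result = local_part + server_part   # union of the two result sets
```

The result contains the clerk in department 10 (from the cache) and the clerk in department 20 (from the server), with no tuple fetched twice.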
Copyright © Ellis Cohen, 2002-2005 38
Querying Updated Data
Suppose a client executes
UPDATE Emps SET sal = sal + 100 WHERE deptno = 10
What are the implications for query processing?
Later in the same transaction, the client executes
SELECT * FROM Emps WHERE job = 'CLERK'
If there are clerks in department 10, the salary retrieved for those clerks had better be the new salary.
How is the update implemented?
Copyright © Ellis Cohen, 2002-2005 39
Updates & Queries
The client cache manager first processes the query
SELECT * FROM Emps WHERE deptno = 10
and places any new tuples retrieved (i.e. ones not already used in the current transaction) in the cache.
Then, the UPDATE command is applied to the local tuples in the cache.
The problem of querying updated data should make it clear why the client cache manager MUST make sure that result sets do NOT include server copies of tuples already used (and specifically modified) by the current transaction.
Copyright © Ellis Cohen, 2002-2005 40
Processing Complex Queries
Processing more complex queries, even simple aggregate queries, can be challenging when a cache is used. Consider
SELECT deptno, sum(sal) FROM Emps WHERE job != 'CLERK' GROUP BY deptno
if some employees' salaries have already been updated in the transaction.
The simplest approach is for the client cache manager to first process
SELECT * FROM Emps WHERE job != 'CLERK'
and place any new tuples retrieved (i.e. ones not already used in the current transaction) in the cache.
Then, the more complex query is applied to the local tuples in the cache. Predicate caching allows more complex and clever solutions.
Copyright © Ellis Cohen, 2002-2005 41
Concurrency Control Mechanisms & Caching
Lock-Based
Approach: Lock-Based (Pessimistic); DB Versioning: Single Version; Caching: None
(Simple client-side caches can increase performance for distributed clients; all locking is still done at the server)
Timestamp-Based
Approach: Timestamp-Based (Pessimistic); DB Versioning: Single or Multi-Version; Caching: None
(A simple cache can optionally be used; the approach then becomes completely or partly Optimistic)
Optimistic Concurrency Control
Approach: Validation-Based (Optimistic); DB Versioning: Single Version; Caching: Simple or Timestamped
(Can use a Snapshot Cache, but it has the same effect as a Simple Cache)
Read-Consistent (not entirely serializable)
Approach: Validation-Based (Optimistic) or Lock-Based (Pessimistic, X locks ONLY!); DB Versioning: Multi-Version; Caching: Snapshot
Copyright © Ellis Cohen, 2002-2005 42
Caching with Timestamp-Based Concurrency
A simple client cache can be used with timestamp-based concurrency:
– The initial read of a data item in a transaction must contact the DB server to set that data item's read timestamp (and get the current version if the cached value is stale)
– All updates are made to the cache
– At commit time, an attempt is made to write all updated data from the cache to the server database state, with the client's timestamp (of course)
Characteristics
– Some transactions that would have completed without a cache will abort with a cache (due to a read by a later transaction between write & commit)
– Commits no longer need to wait for completion of transactions whose writes they read (since no writes are done until commit)
– Fully optimistic if the DB is multi-versioned; else partially pessimistic, since reads can fail.
Copyright © Ellis Cohen, 2002-2005 43
Limitations of Timestamp-Based Concurrency
• Transactions that try to write a data item may need to abort because of conflicts with
– other incomplete concurrent transactions (that might abort), or
– even other transactions that have already aborted,
that have already read that data item
• When a cache is used, and a data item is in the cache, the initial read of the data item in a transaction must still contact the DB server
– to get the latest version, if necessary, and
– to set that data item's read timestamp (adding significant overhead)
Optimistic Concurrency Control overcomes these limitations.
Copyright © Ellis Cohen, 2002-2005 44
Optimistic Concurrency Control with a Timestamped Cache
Copyright © Ellis Cohen, 2002-2005 45
Optimistic Concurrency Control
•Assumes (optimistically) that a transaction will not have conflicts with other transactions. Does not set locks; avoids their overhead.
•Cache-Based: Reads all data from and writes all data to its client cache.
•Validation-Based: When the transaction commits, writes all changes back to the DB server, but only after validating that the data it used during the transaction is still up-to-date. This effectively allows the system to act as if the entire transaction happened at the time of the commit.
WHY?
Copyright © Ellis Cohen, 2002-2005 46
Optimistic Concurrency Control is Cache-Based
• A separate cache is maintained by/for each client
• If a transaction needs to read a data item D
– If D is not in the cache, get it from the server database state, and put it in the cache. Note: it is impossible to read another transaction's uncommitted modifications!
– If D is already in the cache, get it from the cache(though if it is old or stale, and not yet used in this transaction, the latest value may optionally be obtained from the server database state)
• All WRITEs and UPDATEs are made to the cache
• When the transaction commits, all data written or updated by the transaction is written back from the client's cache to the server database state
• Often, the cache is cleared at the end of each transaction
Copyright © Ellis Cohen, 2002-2005 47
Optimistic Concurrency Control is Validation-Based
– When a transaction requests a commit, the transaction must first be validated
– Optimistic Concurrency uses R/W Validation:
• (R) Suppose T reads some data D
• (W) Suppose another transaction commits and writes a later version of that same data, D'
• Then T commits. OOPS! If we are pretending that the entire transaction happened at commit time, then T should have read D', not D
– If R/W Validation finds any such R/W conflicts, then the validating transaction aborts instead of committing!
Does this ensure Recoverability, avoid Cascading Aborts, and prevent Phantoms?
Copyright © Ellis Cohen, 2002-2005 48
Client Cache Types (Again)
Simple CacheThe cache simply maintains the data read/written by the transaction
Timestamped CacheEach piece of data obtained from the database server is associated with the server's timestamp for that data (the time that data was last updated)
Snapshot CacheThe client's cache holds a "virtual" snapshot of the current database state made when the transaction starts
Copyright © Ellis Cohen, 2002-2005 49
Optimistic Concurrency Control with a Timestamped Cache
Uses a single-version database state, but associated with each piece of data is the time it was last committed (its server timestamp).
When a transaction reads data from the server:
Get the data from the database state. Also, get (and remember) the data's server timestamp.
R/W Validation
– For each piece of data read by the validating transaction, compare its cached timestamp to the data's server timestamp
– Fail validation (and abort the transaction) if any data's server timestamp is after the cache's timestamp for the data
– (More complexity is added if validation + commit is not atomic)
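The validation check itself is tiny. This is a sketch with invented names: `read_set` maps each item to the time the transaction read it, and `server_ts` maps each item to its last commit time.

```python
# Sketch of R/W validation with a timestamped cache.

def validate(read_set, server_ts):
    """read_set: {item: time this transaction read it};
    server_ts: {item: time of the item's last committed update}.
    Validation fails if any item was committed after we read it."""
    return all(server_ts[item] <= ts for item, ts in read_set.items())

# B read x at t2, after A's commit at t1: validation succeeds
read_after_commit = validate({"x": 2}, {"x": 1})
# B read x at t0, then A committed a new x at t3: validation fails
stale_read = validate({"x": 0}, {"x": 3})
```

The two calls correspond to the "Read After Commit" and "Validation Causes Abort" slides that follow.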
Copyright © Ellis Cohen, 2002-2005 50
Unrelated Changes
Transaction B (starts at t0): select y from YT
Transaction A (starts at t1): update XT set x = x + 100; COMMIT at t2
At t3, B tries to COMMIT: validation succeeds! y has not changed.
A committed while B was running, but it did not change anything B read.
Copyright © Ellis Cohen, 2002-2005 51
Read After Commit
Initially, x = 30.
Transaction A (starts at t0): update XT set x = x + 100 (x = 130 in A's cache); COMMIT at t1 (server x becomes 130)
Transaction B: select x from XT at t2 (reads 130, after A committed)
At t3, B tries to COMMIT: validation succeeds! x has not changed since B read it.
x is changed by A, which overlaps with B, but B read x after A committed. No transaction that updated x commits between the time B reads it and the time B commits.
Copyright © Ellis Cohen, 2002-2005 52
Validation Causes Abort
Initially, x = 30.
Transaction B: select x from XT at t0 (reads 30)
Transaction A (starts at t1): update XT set x = x + 100 at t2 (x = 130 in A's cache); COMMIT at t3 (server x becomes 130)
At t4, B tries to COMMIT, but validation fails!
B read a value that doesn't reflect A's write, even though A commits before B.
B's Validation: B read the t0 version of x, but x was last persistently updated (by A) at time t3 => OOPS!
Copyright © Ellis Cohen, 2002-2005 53
Why Should Read-Only Transactions Fail?
Initially, x = 30 and y = 30.
Transaction B (starts at t0): select x from XT (reads 30)
Transaction A (starts at t1): update YT set y = y + 100; update XT set x = x + 100; COMMIT at t2 (x and y are now both 130)
Transaction B at t3: select y from YT (reads 130)
B's COMMIT? NO! If B's commit succeeds, then the parent application might tell the user inconsistent values for x & y (30 and 130). This violates View Serializability!
Copyright © Ellis Cohen, 2002-2005 54
Lost Update Problem
Suppose transactions A & B simultaneously deposit $1000 into a checking account, whose initial balance is $5000
1) select balance into curbal from checking where acctid = 30792;
2) update checking set balance = curbal + 1000 where acctid = 30792;
3) COMMIT
Optimistic Concurrency Prevents Lost Updates!
Copyright © Ellis Cohen, 2002-2005 55
Preventing Lost Updates
Initially, balance = 2300.
Transaction A (starts at t0): select balance into curbal (curbal = 2300); balance ← curbal + 1000 (balance = 3300 in A's cache)
Transaction B (at t1): select balance into curbal (curbal = 2300); balance ← curbal + 1000 (balance = 3300 in B's cache)
At t4, A COMMITs: the server's balance becomes 3300.
At t5, B tries to COMMIT: validation fails!
Why does B's commit fail?
Copyright © Ellis Cohen, 2002-2005 56
Lost Updates Fail R/W Validation!
At t1, B reads balance (and timestamps it at t1). When B tries to commit at t5, validation discovers that balance now has a later timestamp (t4).
In other words, this failure is due entirely to failing R/W validation (B reads balance, A then commits, changing balance before B commits). The fact that B is also trying to write balance is IRRELEVANT!
In locking, this could be prevented entirely by R/W conflicts in the grant matrix.
When do lock-based systems need to disallow W/W?
Copyright © Ellis Cohen, 2002-2005 57
The Inconsistent Update Problem
Transaction A
1) UPDATE XT SET x = 10
2) UPDATE YT SET y = 10
Transaction B
1) UPDATE XT SET x = 80
2) UPDATE YT SET y = 80
Serial Schedule A1 A2 B1 B2 → x:80 y:80
Serial Schedule B1 B2 A1 A2 → x:10 y:10
NON-SERIALIZABLE Schedule A1 B1 B2 A2 → x:80 y:10
The non-serializable schedule is caused by a real W/W conflict.
How does Optimistic Concurrency Control prevent it?
Copyright © Ellis Cohen, 2002-2005 58
Preventing Inconsistent Updates
Initially, x = 30 and y = 30.
Transaction A: x ← 10, then y ← 10 (in A's cache)
Transaction B: x ← 80, then y ← 80 (in B's cache)
A COMMITs (the DB becomes x = 10, y = 10); then B COMMITs (the DB becomes x = 80, y = 80).
Notice: Neither A nor B read anything, so both validate successfully. Inconsistent updates are prevented so long as multiple COMMITs (i.e. Validates + Writes) do not overlap.
Copyright © Ellis Cohen, 2002-2005 59
Phantoms
Transaction A: SELECT * FROM T WHERE name = 'JONES' AND val = 111
Transaction B: SELECT * FROM T WHERE name = 'JONES' AND val = 111
Transaction A: UPDATE T SET name = 'JONES' WHERE name = 'SMITH' AND val = 111
Transaction A: COMMIT
Transaction B: COMMIT?
If Tuple Caching is used, what's the result of the second query?
How can validation failures be ensured when (potential) phantom read violations occur?
Copyright © Ellis Cohen, 2002-2005 60
Validation & Cache Placement
Server-Managed Client Cache
When the client tells the server to COMMIT, the server validates based upon both the server data and the data in the server-managed client cache.
Client-Side Cache
When the client tells the server to COMMIT, it must pass along the information needed for validation & commit:
• the timestamps of all data read in the transaction
• the contents of all data written
Actually, the timestamps of the data read during the transaction can be kept at the server.
Copyright © Ellis Cohen, 2002-2005 61
Cache Data Lifetime
Some optimistic concurrency algorithms automatically clear the client cache at the end of every transaction.
1) Explain how the decision to clear/not clear the cache affects the correctness and performance of optimistic concurrency control.
2) If the cache is not cleared, when should the cache manager consider obtaining a newer version of a data item from the server database state? Why?
Copyright © Ellis Cohen, 2002-2005 62
Transaction-Based Validation
Copyright © Ellis Cohen, 2002-2005 63
Transaction-Based Validation
If all transactions start out with an empty client cache
– then a transaction can only have R/W validation conflicts with data written by overlapping transactions which have already completed
Rather than having the DB state retain the last timestamp of each piece of data written
– each committed transaction keeps track of the data items it committed
R/W Validation:
– Compare the read set of the validating transaction with the write set of each completed overlapping transaction
– If a completed transaction's write set has a piece of data in common with the validating transaction's read set, see if the commit time of the completed transaction is later than the time the validating transaction read the data. If so, abort.
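This check can be sketched directly; the data structures (a per-transaction read-time map and a list of committed write sets) are invented for illustration.

```python
# Sketch of transaction-based (backward) R/W validation.

def backward_validate(read_times, committed):
    """read_times: {item: time the validating transaction read it};
    committed: [(commit_time, write_set)] for completed overlapping
    transactions. Abort if any item we read was committed later."""
    for commit_time, write_set in committed:
        for item in write_set & set(read_times):
            if read_times[item] < commit_time:
                return False        # we read a value that was overwritten
    return True

# B read x at t1; overlapping A committed {x} at t2 => B must abort
b_aborts = not backward_validate({"x": 1}, [(2, {"x"})])
# A read at t3 (after the commit) would validate successfully
late_read_ok = backward_validate({"x": 3}, [(2, {"x"})])
```

The first case mirrors the "Using Transaction-Based Validation" slide: t1 < t2, so B aborts.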
Copyright © Ellis Cohen, 2002-2005 64
Using Transaction-Based Validation
Initially, x = 30.
Transaction B: select x from XT at t1 (reads 30)
Transaction A: update XT set x = x + 100 (x = 130 in A's cache); COMMIT at t2 (server x becomes 130)
At t3, B tries to commit, and ABORTs.
B read a value that doesn't reflect A's write, even though A commits before B.
B's Validation: Overlapping committed transactions: { A }. A's commit time: t2; A's writes: { x }. B's read time of each: { x:t1 }. t1 < t2 => B aborts.
Copyright © Ellis Cohen, 2002-2005 65
Forward Validation
Initially, x = 30.
Transaction B: select x from XT (reads 30)
Transaction A: update XT set x = x + 100 (x = 130 in A's cache); COMMIT
A's Validation (forward, at its commit): Transactions in process: { B }. A's write set: { x }; B's read set: { x }. They overlap => A wounds B, or A dies.
B is ABORTed: because of the R/W conflict with A, B would eventually fail its own validation and abort anyway, so just kill B as part of A's commit.
How does this change the Lost Update Problem?
Copyright © Ellis Cohen, 2002-2005 66
Concurrent Commits & Inconsistent Updates
Initially, x = 30 and y = 30.
Transaction A: x ← 10, y ← 10 (in A's cache)
Transaction B: x ← 80, y ← 80 (in B's cache)
If A's and B's COMMITs (Validate/Write) run concurrently, their writes can interleave, leaving, e.g., x = 80 and y = 10.
Inconsistent updates are prevented so long as we do not allow COMMITs to overlap. But if we allow commits to run concurrently without using any kind of locking, how can we prevent inconsistent updates?
Copyright © Ellis Cohen, 2002-2005 67
Concurrent Commit and W/W Validation
Initially, x = 30 and y = 30.
Transaction A: x ← 10, y ← 10 (in A's cache); A COMMITs.
Transaction B: x ← 80, y ← 80 (in B's cache); B starts its COMMIT, and validation FAILS!
Because B starts its commit after A started to commit, B does R/W validation against A. And because B starts its commit before A completes its COMMIT, B's write set and A's write set must NOT overlap!
Inconsistent updates are prevented by W/W validation against overlapping commits which have already started.
Optimistic Concurrency Control with a Simple/Snapshot Cache
Optimistic Concurrency Control with a Simple Cache

Like using a timestamped cache, but don't remember the time each piece
of data was gotten; just assume all data was read when the transaction
started. Simpler concurrency control, but leads to unnecessary aborts.

R/W Validation:
– Compare the read set of the validating transaction with the write
  set of each completed overlapping transaction.
– If a completed transaction's write set has a piece of data in common
  with the validating transaction's read set, abort (assume that the
  validating transaction read the data BEFORE the other transaction
  committed).
– Still do forward validation aborts at commit, and W/W validation if
  commits overlap.
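Without read times, the R/W check above collapses to a plain set-intersection test. A hypothetical sketch (the function name is illustrative):

```python
def validate_simple_cache(read_set, overlapping_write_sets):
    """R/W validation with a simple (untimestamped) cache: with no
    read times recorded, any overlap at all forces an abort."""
    return all(not (read_set & ws) for ws in overlapping_write_sets)

# Any item read that an overlapping committed transaction wrote
# forces an abort -- even if the read actually happened AFTER that
# commit, which is exactly the unnecessary abort shown next.
print(validate_simple_cache({"x"}, [{"x"}]))   # False (abort)
print(validate_simple_cache({"y"}, [{"x"}]))   # True
```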
Simple Cache Validation Causes Unnecessary Abort

Transaction A:
  select x from XT            (reads 30)
  update XT set x = x + 100
  COMMIT at t1                (x becomes 130)

Transaction B:
  select x from XT at t2      (reads 130 -- AFTER A committed)
  attempts to COMMIT at t3 => ABORT

B's Validation:
  Overlapping committed transactions: { A }
  A's write set: { x }; B's read set: { x }
  Overlap => B aborts

Even though x was read AFTER A committed, the cache doesn't retain
that information, so validation assumes it was read BEFORE. This
would succeed using a timestamped cache!
Abort on Post-Commit Read

Suppose
– some transaction T reads a piece of data D that is not in its
  simple cache, and
– D was written by an overlapping committed transaction.

Then when T tries to validate, T will abort (validation will assume
that T read D BEFORE the other transaction committed it).

So T can just be aborted immediately!
Abort on Post-Commit Read Example

Transaction A:
  select x from XT            (reads 30)
  update XT set x = x + 100
  COMMIT at t1                (x becomes 130)

Transaction B:
  select x from XT at t2 => ABORTs immediately

Overlapping committed transactions: { A }; A's write set: { x }.
B is trying to read x, which is in the write set of an overlapping
committed transaction, so abort it! (This would succeed using a
timestamped cache.)
Optimistic Concurrency Control with a Snapshot Cache

Snapshot Cache: the transaction's cache holds a "virtual" snapshot of
the current database state, made when the transaction started.

R/W Validation:
– Compare the read set of the validating transaction with the write
  set of each completed overlapping transaction.
– If a completed transaction's write set has a piece of data in common
  with the validating transaction's read set, abort (because the
  validating transaction's snapshot was made BEFORE the other
  transaction completed).
– Also do forward validation aborts at commit, and W/W validation if
  commits overlap.
– Validation ensures that the current value of each piece of data read
  is the same as its initial value, so the effect is identical to use
  of the simple cache.
Using a Snapshot Cache

Transaction A:
  update XT set x = x + 100
  COMMIT at t1                (x: 30 -> 130)

Transaction B:
  select x from XT at t2      (reads 30 -- the snapshot value, instead
                               of the current value 130)
  attempts to COMMIT => ABORT

The result is identical to use of a simple cache. In both cases, a
transaction aborts if it tries to read a value written by an
overlapping committed transaction.
Snapshot-Based Concurrency Control
Snapshot-Based Concurrency Controls

A snapshot cache holds a "virtual" snapshot of the current database
state, made when the transaction started. It can be used for two
different kinds of concurrency control mechanisms:

• Optimistic
  – Validation-Based (with R/W Validation) -- discussed above
• Read-Consistent
  – Validation-Based (with W/W Validation)
  – Lock-Based (with eXclusive locks ONLY!)
Snapshot-Based Features

• Transactions see a "consistent" view of the database (i.e. as it
  was when the transaction started)
• Avoids Dirty Reads, Non-Repeatable Reads & Phantom Reads

What if we use snapshots without validating on commit?
Snapshots without Validation

Transaction A:
  update XT set x = x + 100   (x: 30 -> 130)
  COMMIT

Transaction B:
  update XT set x = x + 100   (computes 130 from its snapshot value 30)
  COMMIT                      (x is still 130 -- A's update is lost)

If Read-Write transactions use snapshots but don't validate, they are
prone to Lost Updates.
Snapshots & Lost Updates

• All Snapshot-Based concurrency mechanisms avoid dirty reads,
  non-repeatable reads & phantom reads (i.e. avoid most R/W conflicts)
• Read-Consistent concurrency mechanisms also avoid lost & inconsistent
  updates by ensuring there are no W/W conflicts
Avoiding Lost Updates

Detect W/W Conflicts
– Approach: Validation-Based (W/W)
– Look at overlapping transactions that already completed.
– Fail validation & abort if any of them wrote the same data that
  this transaction wrote.
– Neither reads nor writes wait.

Prevent W/W Conflicts
– Approach: Lock-Based (X locks only)
– Use X locks to prevent concurrent writes of the same data.
– Writes only wait for other writes.
– Blocked writers ABORT if the lock holder commits.
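The detection approach is again just a set-intersection test, this time on write sets. A hypothetical sketch (name and shapes illustrative):

```python
def validate_ww(write_set, completed_write_sets):
    """W/W validation: fail if any overlapping completed transaction
    wrote the same data this transaction wrote."""
    return all(not (write_set & ws) for ws in completed_write_sets)

# Both A and B wrote x: whichever validates second fails and aborts,
# so the lost update never reaches the database
print(validate_ww({"x"}, [{"x"}]))   # False (abort)
print(validate_ww({"y"}, [{"x"}]))   # True
```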
Oracle's Lock-Based Read-Consistent Concurrency Control

Each client uses a snapshot cache
– holds a "virtual" snapshot of the current database state, made when
  the transaction started

Transactions obtain X locks before writing data
– S locks are not obtained by default, so
– reads never wait & never block writes

If a transaction must wait on a lock
– if the transaction holding the lock aborts, continue the waiting
  transaction
– else if the transaction holding the lock commits, abort the waiting
  transaction
X Locks Can't Just Wait!

Transaction A:
  update XT set x = x + 100   (x: 30 -> 130)
  COMMIT

Transaction B:
  update XT set x = x + 100   (blocks until A commits, but its
                               snapshot still sees 30 instead of 130!)
  COMMIT                      (x is still 130 -- A's update is lost)

Adding X locks does not prevent Lost Updates if locks just block.
Lost Updates are only prevented if lock conflicts allow only one of
the conflicting transactions to commit.
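The scenario above can be played out as a toy model. Everything here (function name, return values) is an illustration of the argument, not of any real lock manager:

```python
def run_blocked_writer(abort_blocked_writer):
    """Toy model: x starts at 30; A and B each compute x + 100 from
    their own snapshot; B blocks on A's X lock until A commits."""
    x = 30
    a_snap, b_snap = x, x        # both snapshots see 30
    x = a_snap + 100             # A commits: x = 130
    if abort_blocked_writer:
        return x, "B aborts"     # B must retry against the new value
    x = b_snap + 100             # B resumes with its stale snapshot
    return x, "B commits"        # x = 130 again: A's update is lost

print(run_blocked_writer(False))   # (130, 'B commits') -- lost update
print(run_blocked_writer(True))    # (130, 'B aborts')
```

Either way B's addition from the stale snapshot would produce 130, which is why the only safe outcome for the blocked writer is an abort.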
Snapshots & Serializability

• All Snapshot-Based concurrency mechanisms avoid dirty reads,
  non-repeatable reads & phantom reads
• Read-Consistent concurrency mechanisms avoid lost & inconsistent
  updates

Does that mean that Read-Consistent concurrency mechanisms are
serializable?
Extended Isolation Levels

Read Uncommitted
  Lost Updates are prevented, but Dirty Reads are still possible
Read Committed
  Dirty Reads are also prevented, but Non-Repeatable Reads are still
  possible
Repeatable Read
  Non-Repeatable Reads are also prevented; Phantoms are still possible
Snapshot
  Phantoms are also prevented; Constraint Violations are still
  possible. This is the isolation level of Read-Consistent
  transactions. Oracle calls it serializable. It's NOT!
Serializable
  All problems are prevented; complete isolation
Read-Only Snapshot Concurrency

Read-Only Transactions in systems which are Snapshot-Based …
• can act as if they commit when the transaction starts (when the
  snapshot is taken)
• can "timewarp"
  – choose an earlier time to start (if the multi-versioned state
    goes back to that point)
  – allows the system to run queries based on the state at an earlier
    time; Oracle calls these "flashback queries"
Consistency Failure & The Constraint Violation Problem
Constraint Violation Problem

Suppose that
– C is a constraint (or assertion) about the database, and
– every transaction T, when run in isolation, has the property that
  if C is true before T, then C is true after T.

If a DB uses a serializable concurrency control, and C is true
initially, then (looking just at the state of the committed data)
C will always remain true.

This may be violated using Snapshot Isolation (e.g. using
Read-Consistent Concurrency). It violates the Consistency aspect of
the ACID properties.
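A toy model of why snapshot isolation admits this failure (the write-skew pattern of the next slides): the two transactions write disjoint data, so W/W validation and X locks raise no conflict, yet together they violate the constraint. The function and flag names are invented for this sketch:

```python
def allocate_concurrently(serializable):
    """Constraint: id 1 is owned by at most one of the two sets.
    T1 gives id 1 to dilip; T2 concurrently gives id 1 to chu."""
    dilip, chu = set(), set()
    if serializable:
        # effectively serial: T2 re-checks after T1's insert
        if 1 not in dilip and 1 not in chu:
            dilip.add(1)
        if 1 not in dilip and 1 not in chu:
            chu.add(1)
    else:
        # snapshot isolation: both check the same (empty) snapshot;
        # the inserts touch different rows, so no W/W conflict arises
        t1_free = 1 not in dilip and 1 not in chu
        t2_free = 1 not in dilip and 1 not in chu
        if t1_free:
            dilip.add(1)
        if t2_free:
            chu.add(1)
    return 1 in dilip and 1 in chu   # True => constraint violated

print(allocate_concurrently(serializable=False))   # True (violated)
print(allocate_concurrently(serializable=True))    # False
```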
Constraint Violation Example

The code below allocates resource :id to :who (either dilip or chu).
dilipOwns holds the resources owned by dilip; chuOwns holds the
resources owned by chu. The constraint is that a resource can only be
owned by one of them. With 2PL, this constraint is enforced, even
with concurrent execution of the code. With Read-Consistent
Concurrency Control, that's not true.

Suppose T1 runs the code below with :who = 'dilip' & concurrently T2
runs it with :who = 'chu' and the same value for :id.

BEGIN
  SELECT count(*) INTO dilipknt FROM dilipOwns WHERE (id = :id);
  SELECT count(*) INTO chuknt FROM chuOwns WHERE (id = :id);
  -- With 2PL, both would acquire S locks on the id indices of both
  -- tables.
  IF (dilipknt + chuknt = 0) THEN
    -- With 2PL, if both T1 and T2 get to this point at the same time,
    -- they will each try to acquire an X lock on the id index of one
    -- of the tables. They will deadlock and one will abort.
    IF (:who = 'dilip') THEN
      INSERT INTO dilipOwns VALUES( :id );
    ELSIF (:who = 'chu') THEN
      INSERT INTO chuOwns VALUES( :id );
    END IF;
  END IF;
END;

With Read-Consistent Concurrency Control, both commit & the
constraint is violated.
Explicit Locking Must Be Used

Constraint: an id may appear in dilipOwns or chuOwns, but not both.
  dilipOwns( id )
  chuOwns( id )

Suppose T1 runs the code below with :who = 'dilip' & concurrently T2
runs it with :who = 'chu' and the same value for :id. To ensure
serializability with Read-Consistent Concurrency Control, explicit
locking must be used:

BEGIN
  LOCK TABLE dilipOwns, chuOwns IN EXCLUSIVE MODE;
  SELECT count(*) INTO dilipknt FROM dilipOwns WHERE (id = :id);
  SELECT count(*) INTO chuknt FROM chuOwns WHERE (id = :id);
  IF (dilipknt + chuknt = 0) THEN
    IF (:who = 'dilip') THEN
      INSERT INTO dilipOwns VALUES( :id );
    ELSIF (:who = 'chu') THEN
      INSERT INTO chuOwns VALUES( :id );
    END IF;
  END IF;
END;
A Real Constraint Violation Problem

Consider
  rooms( roomnum, location, … )
  reservations( roomnum, stime, ftime, reserver, purpose )

Constraint: there are no two reservations for the same room with
overlapping times. The code below reserves :roomnum from :stime to
:ftime, assuming there is no conflicting reservation:

SELECT count(*) INTO overlaps FROM reservations
  WHERE (roomnum = :roomnum)
  AND (stime BETWEEN :stime AND :ftime
       OR ftime BETWEEN :stime AND :ftime);
IF (overlaps = 0) THEN
  INSERT INTO reservations VALUES
    ( :roomnum, :stime, :ftime, :reserver, :purpose );
END IF;
COMMIT;

Suppose two reservations are made concurrently for the same room, but
for overlapping times. No problem with 2PL. With Read-Consistent CC,
both reservations could be inserted. Check this is true. How would
you code around it?
Use FOR UPDATE

Consider
  rooms( roomnum, location, … )
  reservations( roomnum, stime, ftime, reserver, purpose )

Using FOR UPDATE forces acquisition of X locks:

SELECT * FROM rooms
  WHERE (roomnum = :roomnum) FOR UPDATE;
SELECT count(*) INTO overlaps FROM reservations
  WHERE (roomnum = :roomnum)
  AND (stime BETWEEN :stime AND :ftime
       OR ftime BETWEEN :stime AND :ftime);
IF (overlaps = 0) THEN
  INSERT INTO reservations VALUES
    ( :roomnum, :stime, :ftime, :reserver, :purpose );
END IF;
COMMIT;
Constraint View Violations

Using Snapshot Isolation can also produce Constraint View Violations:
– This is a read anomaly that can occur in a Read-Only transaction
  using Snapshot Isolation.
– It can report an inconsistency (a constraint violation) that
  actually does not (and moreover cannot) occur with a serial
  schedule.
Read-Committed Isolation Using Snapshot Concurrency Control

A weakened form of Read-Consistency; the default Oracle concurrency
mechanism.

A transaction takes a new virtual snapshot at the beginning of every
query
– contains any data written by its own transaction
– plus all data committed by other transactions

If this transaction waits for a write lock
– then if the transaction holding the lock aborts, continue the
  transaction
– else if the transaction holding the lock commits, re-execute the
  query [don't abort the transaction!]

Allows Non-Repeatable Reads & Phantoms. Weaker than SQL standard
Read Committed.
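The lock-wait rules above differ from the Read-Consistent rules only in what happens when the lock holder commits. A sketch of the contrast, with invented names and string values:

```python
def handle_lock_wait(holder_outcome, mode):
    """What happens to a statement blocked on a write lock once the
    lock holder finishes, under the two snapshot-based modes."""
    if holder_outcome == "abort":
        return "continue"              # both modes: waiter proceeds
    if mode == "read-committed":
        return "re-execute query"      # fresh snapshot; txn survives
    return "abort transaction"         # read-consistent ("serializable")

print(handle_lock_wait("commit", "read-committed"))    # re-execute query
print(handle_lock_wait("commit", "read-consistent"))   # abort transaction
print(handle_lock_wait("abort", "read-committed"))     # continue
```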
Read Committed Problem

Suppose transactions A & B simultaneously withdraw $1000 from a
checking account:

1) select balance into curbal from checking
   where acctid = 30792;
2) if curbal < 1000 then raise error
3) update checking set balance = balance - 1000
   where acctid = 30792;
4) emit $1000 from ATM;

The schedule A1 A2 B1 B2 B3 B4 A3 A4 doesn't work using Read
Committed Isolation. How could the code be changed to work even with
Read Committed Isolation?
Read Committed and FOR UPDATE

Suppose transactions A & B simultaneously withdraw $1000 from a
checking account:

1) select balance into curbal from checking
   where acctid = 30792 FOR UPDATE;
2) if curbal < 1000 then raise error
3) update checking set balance = balance - 1000
   where acctid = 30792;
4) emit $1000 from ATM;