Distributed Databases
COMP3017 Advanced Databases
Dr Nicholas Gibbins – [email protected], 2012–2013
Overview
– Fragmentation: horizontal (primary and derived), vertical, hybrid
– Query processing: localisation, optimisation (semijoins)
– Concurrency control: centralised 2PL, distributed 2PL, deadlock
– Reliability: two-phase commit (2PC)
– The CAP Theorem
What is a distributed database?
A collection of sites connected by a communications network.
Each site is a database system in its own right, but the sites have agreed to work together.
A user at any site can access data anywhere as if the data were all at the user's own site.
DDBMS Principles
Local autonomy
The sites in a distributed database system should be autonomous, or independent of each other.
Each site should provide its own security, locking, logging, integrity, and recovery. Local operations use and affect only local resources, and do not depend on other sites.
No reliance on a central site
A distributed database system should not rely on a central site, which could become a single point of failure or a bottleneck.
Each site provides its own security, locking, logging, integrity, and recovery, and handles its own data dictionary. No single site should be involved in every distributed transaction.
Continuous operation
A distributed database system should never require downtime.
It should provide online backup and recovery, and full and incremental archiving. Backup and recovery should be fast enough to be performed online without a noticeable detrimental effect on overall system performance.
Location independence
Applications should not know, or even be aware of, where data are physically stored; they should behave as if all data were stored locally.
Location independence allows applications and data to be migrated easily from one site to another without modification.
Fragmentation independence
Relations can be divided into fragments and stored at different sites.
Applications should not be aware that some data may be stored in a fragment at a site different from the site where the table itself is stored.
Replication independence
Relations and fragments can be stored as multiple distinct copies on different sites.
Applications should not be aware that replicas of the data are maintained and synchronised automatically.
Distributed query processing
Queries are broken down into component transactions to be executed at the distributed sites.
Distributed transaction management
A distributed database system should support atomic transactions.
This is critical to database integrity; a distributed database system must be able to handle concurrency, deadlock, and recovery.
Hardware independence
A distributed database system should be able to operate on, and access data spread across, a wide variety of hardware platforms.
A truly distributed DBMS should not rely on a particular hardware feature, nor be limited to a certain hardware architecture.
Operating system independence
A distributed database system should be able to run on different operating systems.
Network independence
A distributed database system should run regardless of the communication protocols and network topology used to interconnect sites.
DBMS independence
An ideal distributed database system should support interoperability between the DBMSs running on different nodes, even if those DBMSs differ.
All sites in a distributed database system should use common standard interfaces in order to interoperate with each other.
Distributed Databases vs. Parallel Databases
Distributed databases:
– Local autonomy
– No central site
– Continuous operation
– Location independence
– Fragmentation independence
– Replication independence
– Distributed query processing
– Distributed transactions
– Hardware independence
– Operating system independence
– Network independence
– DBMS independence
Parallel databases, by comparison:
– Local autonomy
– No central site
– Continuous operation
– Location independence
– Fragmentation independence
– Replication independence
– Distributed query processing
– Distributed transactions
– Hardware independence
– Operating system independence
– Network independence
– DBMS independence
Fragmentation
Why Fragment?
Fragmentation allows:
– localisation of the accesses of relations by applications
– parallel execution (increases concurrency and throughput)
Fragmentation Approaches
Horizontal fragmentation: each fragment contains a subset of the tuples of the global relation.
Vertical fragmentation: each fragment contains a subset of the attributes of the global relation.
(Diagram: a global relation split row-wise by horizontal fragmentation and column-wise by vertical fragmentation.)
Decomposition
Relation R is decomposed into fragments FR = {R1, R2, ..., Rn}.
Decomposition (horizontal or vertical) can be expressed in terms of relational algebra expressions.
Completeness
FR is complete if each data item di in R is found in some Rj.
Reconstruction
R can be reconstructed if it is possible to define a relational operator ▽ such that R = ▽Ri, for all Ri ∈ FR.
Note that ▽ will differ for different types of fragmentation.
Disjointness
FR is disjoint if every data item di in each Rj is not in any Rk, where k ≠ j.
Note that this is only strictly true for horizontal decomposition.
For vertical decomposition, primary key attributes are typically repeated in all fragments to allow reconstruction; disjointness is defined on the non-primary-key attributes.
Horizontal Fragmentation
Each fragment contains a subset of the tuples of the global relation.
Two versions:
– Primary horizontal fragmentation: performed using a predicate defined on the relation being partitioned
– Derived horizontal fragmentation: performed using a predicate defined on another relation
Primary Horizontal Fragmentation
Decomposition: FR = { Ri : Ri = σfi(R) }, where fi is the fragmentation predicate for Ri
Reconstruction: R = ∪ Ri, for all Ri ∈ FR
Disjointness: FR is disjoint if the simple predicates used in fi are mutually exclusive
Completeness for primary horizontal fragmentation is beyond the scope of this lecture.
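The scheme above can be sketched in a few lines of Python. This is an illustration, not lecture material: relations are lists of dicts and fragmentation predicates are Python functions; all names are hypothetical.

```python
# Sketch of primary horizontal fragmentation: relations as lists of
# dicts, predicates fi as Python functions (illustrative assumption).

def fragment_horizontal(relation, predicates):
    """One fragment per fragmentation predicate fi: Ri = sigma_fi(R)."""
    return [[t for t in relation if f(t)] for f in predicates]

def reconstruct(fragments):
    """Reconstruction is the union of all fragments."""
    result = []
    for frag in fragments:
        for t in frag:
            if t not in result:
                result.append(t)
    return result

emp = [
    {"id": 1, "site": "London"},
    {"id": 2, "site": "Paris"},
]
# Mutually exclusive simple predicates give disjoint fragments
preds = [lambda t: t["site"] == "London", lambda t: t["site"] == "Paris"]
frags = fragment_horizontal(emp, preds)

assert reconstruct(frags) == emp                  # reconstruction
assert all(t not in frags[1] for t in frags[0])   # disjointness
```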
Derived Horizontal Fragmentation
Decomposition: FR = { Ri : Ri = R ▷ Si }, where FS = { Si : Si = σfi(S) } and fi is the fragmentation predicate for the primary horizontal fragmentation of S
Reconstruction: R = ∪ Ri, for all Ri ∈ FR
Completeness and disjointness for derived horizontal fragmentation are beyond the scope of this lecture.
Vertical Fragmentation
Decomposition: FR = { Ri : Ri = πai(R) }, where ai is a subset of the attributes of R
Completeness: FR is complete if each attribute of R appears in some ai
Reconstruction: R = ⨝K Ri, for all Ri ∈ FR, where K is the set of primary key attributes of R
Disjointness: FR is disjoint if each non-primary-key attribute of R appears in at most one ai
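A minimal sketch of the same idea, under the representation assumed earlier (tuples as dicts): each fragment keeps its attribute subset plus the key K, and reconstruction is a natural join on K. All names are illustrative.

```python
# Sketch of vertical fragmentation with key K replicated into every
# fragment so the relation can be rebuilt by joining on K.

def fragment_vertical(relation, attr_sets, key):
    """Each fragment keeps its attribute subset ai plus the key K."""
    return [[{a: t[a] for a in set(attrs) | set(key)} for t in relation]
            for attrs in attr_sets]

def join_on_key(f1, f2, key):
    """Reconstruction: natural join of two fragments on K."""
    out = []
    for t1 in f1:
        for t2 in f2:
            if all(t1[k] == t2[k] for k in key):
                out.append({**t1, **t2})
    return out

emp = [{"id": 1, "name": "Ann", "salary": 50}]
f1, f2 = fragment_vertical(emp, [["name"], ["salary"]], key=["id"])

assert join_on_key(f1, f2, ["id"]) == emp   # R = R1 join_K R2
```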
Hybrid Fragmentation
Horizontal and vertical fragmentation may be combined:
– vertical fragmentation of horizontal fragments
– horizontal fragmentation of vertical fragments
Query Processing
Localisation
Fragmentation is expressed as relational algebra expressions, so global relations can be reconstructed using these expressions – a localisation program.
Naively, a distributed query plan is generated by substituting localisation programs for relations; reduction techniques are then used to optimise the query.
Reduction for Horizontal Fragmentation
Given a relation R fragmented as FR = {R1, R2, ..., Rn}, the localisation program is R = R1 ∪ R2 ∪ ... ∪ Rn.
Reduce by identifying fragments of the localised query that give empty relations.
Two cases to consider:
– reduction with selection
– reduction with join
Horizontal Selection Reduction
Given a horizontal fragmentation of R such that Rj = σpj(R):
σp(Rj) = ∅ if ∀x∈R, ¬(p(x) ∧ pj(x))
where pj is the fragmentation predicate for Rj.
(Diagram: the query σp(R) is localised to σp(R1 ∪ R2 ∪ ... ∪ Rn), then reduced to a selection over only those fragments whose predicates are compatible with p.)
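The reduction test can be sketched concretely if we assume (purely for illustration) that fragments are defined by ranges on a single attribute, so that "p and pj cannot both hold" becomes a simple interval-overlap check; real optimisers use a more general satisfiability test.

```python
# Sketch of selection reduction: sigma_p(Rj) is empty whenever the
# query predicate p and the fragmentation predicate pj are
# contradictory. Here both are modelled as inclusive ranges on one
# attribute (an assumption made to keep the check syntactic).

def reduce_selection(fragment_ranges, query_range):
    """Return indices of fragments whose range overlaps the query's."""
    qlo, qhi = query_range
    return [i for i, (lo, hi) in enumerate(fragment_ranges)
            if lo <= qhi and qlo <= hi]

# R fragmented on salary: R1 = [0, 30k), R2 = [30k, 60k), R3 = [60k, ...)
ranges = [(0, 29_999), (30_000, 59_999), (60_000, 10**9)]

# Query sigma_{40k <= salary <= 50k}(R): only R2 can contribute
assert reduce_selection(ranges, (40_000, 50_000)) == [1]
```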
Horizontal Join Reduction
Recall that joins distribute over unions: (R1 ∪ R2) ⨝ S ≣ (R1 ⨝ S) ∪ (R2 ⨝ S)
Given fragments Ri and Rj defined with predicates pi and pj:
Ri ⨝ Rj = ∅ if ∀x∈Ri, ∀y∈Rj, ¬(pi(x) ∧ pj(y))
(Diagram: the query R ⨝ S is localised to (R1 ∪ R2 ∪ ... ∪ Rn) ⨝ S, then reduced to a union of joins over only those fragments that can contribute to the result.)
Reduction for Vertical Fragmentation
Given a relation R fragmented as FR = {R1, R2, ..., Rn}, the localisation program is R = R1 ⨝ R2 ⨝ ... ⨝ Rn.
Reduce by identifying useless intermediate relations.
One case to consider:
– reduction with projection
Vertical Projection Reduction
Given a relation R with attributes A = {a1, a2, ..., an}, vertically fragmented as Ri = πAi(R) where Ai ⊆ A:
πD,K(Ri) is useless if D ⊈ Ai
(Diagram: the query π(R) is localised to a projection over the join R1 ⨝ R2 ⨝ ... ⨝ Rn, then reduced to project over only the fragments that hold the projected attributes.)
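The "useless fragment" test above is a simple set check; a sketch, with fragments represented just by their attribute sets (illustrative names throughout):

```python
# Sketch of projection reduction: pi_{D,K}(Ri) is useless if the
# projected attributes D are not contained in the fragment's
# attribute set Ai (the slide's condition D not-subset-of Ai).

def useful_fragments(fragment_attrs, D):
    """Indices of fragments Ri with D a subset of Ai."""
    return [i for i, Ai in enumerate(fragment_attrs)
            if set(D) <= set(Ai)]

# R split (key attributes aside) as R1(name, dept) and R2(salary, bonus)
frags = [["name", "dept"], ["salary", "bonus"]]

# pi_{salary,K}(R): only R2 is useful, R1 can be dropped from the plan
assert useful_fragments(frags, ["salary"]) == [1]
```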
The Distributed Join Problem
We have two relations, R and S, each stored at a different site. Where do we perform the join R ⨝ S?
(Diagram: R at site 1, S at site 2; the join R ⨝ S could be evaluated at either site.)
The Distributed Join Problem
We can move one relation to the other site and perform the join there:
– the CPU cost of performing the join is the same regardless of site
– the communications cost depends on the size of the relation being moved
The Distributed Join Problem
CostCOM = size(R) = cardinality(R) * length(R)
If size(R) < size(S), move R to site 2; otherwise move S to site 1.
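The decision rule is a one-line comparison; a sketch, with the length unit (bytes per tuple) an assumption for illustration:

```python
# Sketch of the shipping decision: size(R) = cardinality(R) * length(R),
# with length taken here as bytes per tuple (assumed unit).

def size(cardinality, tuple_length):
    return cardinality * tuple_length

def choose_relation_to_move(size_r, size_s):
    """Ship the smaller relation to the other relation's site."""
    return "move R to site 2" if size_r < size_s else "move S to site 1"

size_r = size(1_000, 100)    # 1,000 tuples of 100 bytes
size_s = size(50_000, 40)    # 50,000 tuples of 40 bytes

assert choose_relation_to_move(size_r, size_s) == "move R to site 2"
```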
Semijoin Reduction
We can further reduce the communications cost by moving only the part of a relation that will be used in the join – using a semijoin.
Semijoins
Recall that R ▷p S ≣ πR(R ⨝p S), where p is a predicate defined over R and S, and πR projects out only the attributes of R.
size(R ▷p S) < size(R ⨝p S)
R ⨝p S ≣ (R ▷p S) ⨝p S ≣ R ⨝p (R ◁p S) ≣ (R ▷p S) ⨝p (R ◁p S)
Semijoin Reduction
R ▷p S ≣ πR(R ⨝p S) ≣ πR(R ⨝p πp(S))
where πp(S) projects out from S only the attributes used in predicate p.
Semijoin Reduction, step 1
Site 2 sends πp(S) to site 1.
Semijoin Reduction, step 2
Site 1 calculates R ▷p S ≣ πR(R ⨝p πp(S)).
Semijoin Reduction, step 3
Site 1 sends R ▷p S to site 2.
Semijoin Reduction, step 4
Site 2 calculates R ⨝p S ≣ (R ▷p S) ⨝p S.
Semijoin Reduction
CostCOM = size(πp(S)) + size(R ▷p S)
This approach is better if size(πp(S)) + size(R ▷p S) < size(R).
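The four steps above can be traced end to end in a small sketch. Assumptions for illustration only: relations are lists of dicts and p is an equi-join on a shared attribute; the messages between sites are just the intermediate lists.

```python
# Sketch of semijoin reduction for an equi-join on attribute set
# `attrs` (illustrative representation: relations as lists of dicts).

def project(rel, attrs):
    """pi_attrs(rel), with duplicates removed."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append({a: t[a] for a in attrs})
    return out

def semijoin(r, s_proj, attrs):
    """R |> S: tuples of R that have a join partner in pi_p(S)."""
    keys = {tuple(t[a] for a in attrs) for t in s_proj}
    return [t for t in r if tuple(t[a] for a in attrs) in keys]

def join(r, s, attrs):
    return [{**t1, **t2} for t1 in r for t2 in s
            if all(t1[a] == t2[a] for a in attrs)]

R = [{"k": 1, "x": "a"}, {"k": 2, "x": "b"}, {"k": 3, "x": "c"}]
S = [{"k": 1, "y": "p"}, {"k": 1, "y": "q"}]

pi_p_S = project(S, ["k"])            # step 1: site 2 ships pi_p(S)
r_semi = semijoin(R, pi_p_S, ["k"])   # step 2: site 1 computes R |> S
# step 3: site 1 ships r_semi; step 4: site 2 joins it with S
result = join(r_semi, S, ["k"])

assert result == join(R, S, ["k"])    # same answer as the full join
assert len(r_semi) < len(R)           # but fewer tuples were shipped
```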
Concurrency Control
Distributed Transactions
Transaction processing may be spread across several sites in the distributed database:
– the site from which the transaction originated is known as the coordinator
– the sites on which the transaction is executed are known as the participants
(Diagram: a transaction arrives at the coordinator C, which involves several participants P.)
Distribution and ACID
Non-distributed databases aim to maintain isolation.
– Isolation: a transaction should not make updates externally visible until committed
Distributed databases commonly use two-phase locking (2PL) to preserve isolation; 2PL ensures serialisability, the highest isolation level.
Two-Phase Locking
Two phases:
– growing phase: obtain locks, access data items
– shrinking phase: release locks
Guarantees serialisable transactions.
(Diagram: the number of locks held grows from BEGIN to the lock point, then shrinks until END.)
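The two-phase rule for a single transaction can be sketched as a small state machine: once any lock is released (the lock point has passed), acquiring a new lock is a protocol violation. Class and method names are illustrative, not from the lecture.

```python
# Sketch of the 2PL rule: no lock may be acquired after the first
# unlock (i.e. after the lock point).

class TwoPhaseTransaction:
    def __init__(self):
        self.locks = set()
        self.shrinking = False   # becomes True at the lock point

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after lock point")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True    # lock point: shrinking phase begins
        self.locks.discard(item)

t = TwoPhaseTransaction()
t.lock("A")
t.lock("B")          # growing phase
t.unlock("A")        # lock point reached; shrinking phase
try:
    t.lock("C")      # illegal under 2PL
    violated = False
except RuntimeError:
    violated = True

assert violated
```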
Distribution and Two-Phase Locking
In a non-distributed database, locking is controlled by a lock manager.
There are two main approaches to implementing two-phase locking in a distributed database:
– Centralised 2PL (C2PL): responsibility for lock management lies with a single site
– Distributed 2PL (D2PL): each site has its own lock manager
Centralised Two-Phase Locking (C2PL)
The coordinating site runs the transaction manager (TM); participant sites run data processors (DPs); the lock manager (LM) runs on the central site.
1. TM requests locks from the LM
2. If granted, TM submits operations to the DPs
3. When the DPs finish, TM sends a message to the LM to release the locks
(Sequence diagram: TM sends a lock request to LM; LM grants the lock; TM sends the operation to a DP; the DP signals end of operation; TM tells LM to release the locks.)
Centralised Two-Phase Locking (C2PL)
– The LM is a single point of failure, so C2PL is less reliable
– The LM is a bottleneck, which affects transaction throughput
Distributed Two-Phase Locking (D2PL)
The coordinating site C runs the TM; each participant runs both an LM and a DP.
1. TM sends operations and lock requests to each LM
2. If the lock can be granted, the LM forwards the operation to its local DP
3. The DP sends “end of operation” to the TM
4. TM sends a message to the LM to release the locks
(Sequence diagram: TM sends operation plus lock request to the LM; LM forwards the operation to its DP; DP signals end of operation to the TM; TM tells the LM to release the locks.)
Distributed Two-Phase Locking (D2PL)
Variant: DPs may instead send “end of operation” to their own LM, which releases the locks and informs the TM.
(Sequence diagram: TM sends operation plus lock request to the LM; LM forwards the operation to its DP; DP signals end of operation to the LM; LM releases the locks and signals end of operation to the TM.)
Deadlock
Deadlock exists when two or more transactions are waiting for each other to release a lock on an item.
Three conditions must be satisfied for deadlock to occur:
– concurrency: two transactions claim exclusive control of one resource
– hold: one transaction continues to hold exclusively controlled resources until its need is satisfied
– wait: transactions wait in queues for additional resources while holding resources already allocated
Wait-For Graph
A representation of the interactions between transactions: a directed graph containing
– a vertex for each transaction that is currently executing
– an edge from T1 to T2 if T1 is waiting to lock an item that is currently locked by T2
Deadlock exists iff the WFG contains a cycle.
(Diagram: an example WFG over three transactions T1, T2, T3 containing a cycle.)
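Since deadlock exists iff the WFG has a cycle, detection reduces to graph cycle detection. A minimal sketch, with the WFG as adjacency lists and a depth-first search looking for a back edge (representation and names are illustrative):

```python
# Sketch of WFG deadlock detection: DFS with three colours; a grey
# (in-progress) vertex reached again means a back edge, i.e. a cycle.

def has_cycle(wfg):
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in wfg}

    def visit(t):
        colour[t] = GREY
        for u in wfg.get(t, []):
            if colour.get(u, WHITE) == GREY:
                return True          # back edge: cycle found
            if colour.get(u, WHITE) == WHITE and visit(u):
                return True
        colour[t] = BLACK
        return False

    return any(colour[t] == WHITE and visit(t) for t in wfg)

# T1 waits for T2, T2 for T3, T3 for T1: deadlock
assert has_cycle({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]})
# A chain of waits with no cycle: no deadlock
assert not has_cycle({"T1": ["T2"], "T2": ["T3"], "T3": []})
```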
Distributed Deadlock
Two types of wait-for graph:
– Local WFG (one per site, considering only the transactions on that site)
– Global WFG (the union of all LWFGs)
Deadlock may occur:
– on a single site (within its LWFG)
– between sites (within the GWFG)
Distributed Deadlock Example
Consider the wait-for relationship T1→T2→T3→T4→T1, with T1 and T2 on site 1, and T3 and T4 on site 2.
Neither site's LWFG contains a cycle on its own, but the GWFG does.
Managing Distributed Deadlock
Three main approaches:
1. Prevention: pre-declaration
2. Avoidance: resource ordering, transaction prioritisation
3. Detection and resolution
Prevention
Guarantees that deadlocks cannot occur in the first place:
1. A transaction pre-declares all the data items that it will access
2. The TM checks that locking those data items will not cause deadlock
3. The transaction proceeds (to lock) only if all the data items are available (unlocked)
Con: it is difficult to know in advance which data items a transaction will access.
Avoidance
Two main sub-approaches:
1. Resource ordering: concurrency is controlled such that deadlocks cannot happen
2. Transaction prioritisation: potential deadlocks are detected and avoided
Resource Ordering
All resources (data items) are ordered, and transactions always access resources in this order.
Example:
– data item A comes before item B
– all transactions must get a lock on A before trying for a lock on B
– no transaction will ever be left holding a lock on B while waiting for a lock on A
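The example above can be sketched directly: each transaction sorts the items it needs by the global order before locking, so two transactions can never hold each other's next item. Names are illustrative.

```python
# Sketch of resource ordering: a transaction's lock requests are
# issued in a single global order, ruling out cyclic waits.

def lock_in_order(items, global_order):
    """Return the acquisition sequence a transaction must follow."""
    return sorted(items, key=global_order.index)

order = ["A", "B", "C"]

# Two transactions want A and B, requested in opposite orders...
t1 = lock_in_order(["B", "A"], order)
t2 = lock_in_order(["A", "B"], order)

# ...but both end up locking A before B, so neither can hold B
# while waiting for A.
assert t1 == t2 == ["A", "B"]
```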
Transaction Prioritisation
Each transaction has a timestamp corresponding to the time at which it started, ts(T); transactions can be prioritised using these timestamps.
When a lock request is denied, priorities are used to choose a transaction to abort: the WAIT-DIE and WOUND-WAIT rules.
WAIT-DIE and WOUND-WAIT
Ti requests a lock on a data item that is already locked by Tj.
The WAIT-DIE rule: if ts(Ti) < ts(Tj) then Ti waits, else Ti dies (aborts and restarts with the same timestamp).
The WOUND-WAIT rule: if ts(Ti) < ts(Tj) then Tj is wounded (aborts and restarts with the same timestamp), else Ti waits.
Note: WOUND-WAIT pre-empts active transactions.
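Both rules are single comparisons on start timestamps (smaller timestamp = older = higher priority); a sketch following the slide, with function names invented for illustration:

```python
# The WAIT-DIE and WOUND-WAIT rules as decision functions.
# Ti requests an item already locked by Tj; smaller ts means older.

def wait_die(ts_i, ts_j):
    return "Ti waits" if ts_i < ts_j else "Ti dies"

def wound_wait(ts_i, ts_j):
    return "Tj is wounded" if ts_i < ts_j else "Ti waits"

# An old requester (ts=1) meets a young lock holder (ts=5):
assert wait_die(1, 5) == "Ti waits"
assert wound_wait(1, 5) == "Tj is wounded"   # pre-emption

# A young requester (ts=5) meets an old lock holder (ts=1):
assert wait_die(5, 1) == "Ti dies"
assert wound_wait(5, 1) == "Ti waits"
```

In both rules the older transaction always wins, so a transaction that keeps its original timestamp on restart cannot starve.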
Detection and Resolution
1. Study the GWFG for cycles (detection)
2. Break cycles by aborting transactions (resolution)
Selecting a minimum-total-cost set of transactions to abort is NP-complete.
Three main approaches to deadlock detection: centralised, hierarchical, and distributed.
Centralised Deadlock Detection
One site is designated as the deadlock detector (DD) for the system. Each site sends its LWFG (or the changes to its LWFG) to the DD at intervals; the DD constructs the GWFG and looks for cycles.
Hierarchical Deadlock Detection
Each site has a DD, which looks for cycles in that site's LWFG. Each site sends its LWFG to the DD at the next level of the hierarchy, which merges the LWFGs sent to it and looks for cycles; these DDs send the merged WFGs to the next level, and so on.
(Diagram: a tree of deadlock detectors over sites 1–4.)
Distributed Deadlock Detection
Responsibility for detecting deadlocks is delegated to the sites. LWFGs are modified to show the relationships between local transactions and remote transactions.
(Diagram: the earlier example, with each site's LWFG extended with external edges to the transactions on the other site.)
Distributed Deadlock Detection
If the LWFG contains a cycle not involving external edges, there is a local deadlock, which can be resolved locally.
If the LWFG contains a cycle involving external edges, there is a potential deadlock: communicate it to the other sites, which must then agree on a victim transaction to abort.
Reliability
Distribution and ACID
Non-distributed databases aim to maintain the atomicity and durability of transactions:
– Atomicity: a transaction is either performed completely or not at all
– Durability: once a transaction has been committed, changes should not be lost because of failure
As with parallel databases, distributed databases use the two-phase commit protocol (2PC) to preserve atomicity.
Two-Phase Commit (2PC)
Distinguish between:
– the global transaction
– the local transactions into which the global transaction is decomposed
Phase 1: Voting
– The coordinator sends a “prepare T” message to all participants
– Participants respond with either “vote-commit T” or “vote-abort T”
– The coordinator waits for all participants to respond within a timeout period
Phase 2: Decision
– If all participants return “vote-commit T”, send “commit T” to all participants and wait for acknowledgements within the timeout period
– If any participant returns “vote-abort T”, send “abort T” to all participants and wait for acknowledgements within the timeout period
– When all acknowledgements have been received, the transaction is completed
– If a site does not acknowledge, resend the global decision until it is acknowledged
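The coordinator's decision rule can be sketched as a pure function over the collected votes (message names follow the slides; the voting transport is abstracted away as a list, and treating a missing vote as an abort is an assumption consistent with the timeout behaviour described later):

```python
# Sketch of the 2PC global decision: commit only if every participant
# voted to commit; any abort vote (or missing vote) forces an abort.

def global_decision(votes):
    """votes: one 'vote-commit T' or 'vote-abort T' per participant
    that responded; an empty list models all votes timing out."""
    if votes and all(v == "vote-commit T" for v in votes):
        return "commit T"
    return "abort T"

assert global_decision(["vote-commit T"] * 3) == "commit T"
assert global_decision(["vote-commit T", "vote-abort T"]) == "abort T"
assert global_decision([]) == "abort T"   # no votes received in time
```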
Normal Operation
(Sequence diagram: C sends “prepare T” to each P; each P replies “vote-commit T”; once “vote-commit T” has been received from all participants, C sends “commit T”; each P replies “ack”.)
Logging
(Sequence diagram as before, annotated with log records: C writes <begin-commit T> before sending “prepare T”; P writes <ready T> before sending “vote-commit T”; C writes <commit T> before sending “commit T”; P writes <commit T> before sending “ack”; C writes <end T> once all acks are received.)
Aborted Transaction
(Sequence diagram: C sends “prepare T”, logging <begin-commit T>; this participant replies “vote-commit T”, logging <ready T>; “vote-abort T” is received from at least one other participant, so C sends “abort T”, logging <abort T>; P logs <abort T> and replies “ack”; C logs <end T>.)
Aborted Transaction
(Sequence diagram: C sends “prepare T”, logging <begin-commit T>; this participant replies “vote-abort T”, logging <abort T>; since “vote-abort T” was received from at least one participant, C sends “abort T” to the remaining participants, logging <abort T>, and logs <end T> when the acks are received.)
State Transitions
(Commit case: the coordinator moves INITIAL → WAIT on sending “prepare T”, then WAIT → COMMIT on receiving “vote-commit T” from all participants and sending “commit T”. Each participant moves INITIAL → READY on sending “vote-commit T”, then READY → COMMIT on receiving “commit T”.)
State Transitions
(Abort case: the coordinator moves INITIAL → WAIT on sending “prepare T”, then WAIT → ABORT on receiving “vote-abort T” from at least one participant and sending “abort T”. A participant that voted to commit moves INITIAL → READY, then READY → ABORT on receiving “abort T”.)
State Transitions
(Abort case where the participant itself votes to abort: the coordinator moves INITIAL → WAIT → ABORT; the participant that sent “vote-abort T” moves directly from INITIAL to ABORT.)
Coordinator State Diagram
(INITIAL → WAIT on sending “prepare T”; WAIT → ABORT on receiving “vote-abort T” and sending “abort T”; WAIT → COMMIT on receiving “vote-commit T” from all participants and sending “commit T”.)
Participant State Diagram
(INITIAL → READY on receiving “prepare T” and sending “vote-commit T”; INITIAL → ABORT on receiving “prepare T” and sending “vote-abort T”; READY → COMMIT on receiving “commit T” and sending “ack”; READY → ABORT on receiving “abort T” and sending “ack”.)
Dealing with failures
If the coordinator or a participant fails during the commit, two things happen:
– the other sites will time out while waiting for the next message from the failed site, and invoke a termination protocol
– when the failed site restarts, it tries to work out the state of the commit by invoking a recovery protocol
The behaviour of the sites under these protocols depends on the state they were in when the site failed.
88
Termination Protocol: Coordinator
Timeout in WAIT
– Coordinator is waiting for participants to vote on whether they're going to commit or abort
– A missing vote means that the coordinator cannot commit the global transaction
– Coordinator may abort the global transaction

Timeout in COMMIT/ABORT
– Coordinator is waiting for participants to acknowledge successful commit or abort
– Coordinator resends the global decision to participants who have not acknowledged
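The coordinator's timeout handling depends only on its current 2PC state, so it can be sketched as a single dispatch function (a hedged illustration; the action strings are mine, the behaviour follows the slides):

```python
def coordinator_on_timeout(state):
    """Action the coordinator takes when it times out in a given 2PC state."""
    if state == "WAIT":
        # A missing vote means the global transaction cannot commit,
        # so the coordinator may unilaterally decide to abort.
        return "send abort T to all participants"
    if state in ("COMMIT", "ABORT"):
        # The decision is already made and logged; just push it again
        # to any participant that has not yet acknowledged.
        return "resend global decision to unacknowledged participants"
    return "no action"

print(coordinator_on_timeout("WAIT"))
print(coordinator_on_timeout("COMMIT"))
```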
89
Termination Protocol: Participant
Timeout in INITIAL
– Participant is waiting for a “prepare T”
– May unilaterally abort the transaction after a timeout
– If “prepare T” arrives after the unilateral abort, either:
  – resend the “vote-abort T” message, or
  – ignore it (the coordinator then times out in WAIT)

Timeout in READY
– Participant is waiting for the instruction to commit or abort – blocked without further information
– Participant can contact other participants to find one that knows the decision – the cooperative termination protocol
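The cooperative termination protocol can be sketched as a poll of the other participants: a blocked READY participant adopts the decision of any peer that has already reached COMMIT or ABORT. This is an illustrative sketch (the peer-state mapping is an assumption, not an API from the slides):

```python
def cooperative_termination(peer_states):
    """Resolve a participant blocked in READY by asking its peers.

    peer_states: mapping of peer name -> that peer's 2PC state.
    Returns the adopted global decision, or None if every peer is
    also undecided (the participant remains blocked).
    """
    for peer, state in peer_states.items():
        if state in ("COMMIT", "ABORT"):
            # This peer saw the global decision; adopt it.
            return state.lower() + " T"
    return None  # all peers READY/INITIAL: still blocked

print(cooperative_termination({"P1": "READY", "P2": "COMMIT"}))  # commit T
print(cooperative_termination({"P1": "READY", "P2": "READY"}))   # None
```

Note that if every peer is also in READY, no one can decide, which is exactly the blocking behaviour of 2PC described above.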
90
Recovery Protocol: Coordinator
Failure in INITIAL
– Commit not yet begun; restart the commit procedure

Failure in WAIT
– Coordinator has sent “prepare T”, but has not yet received all vote-commit/vote-abort messages from participants
– Recovery restarts the commit procedure by resending “prepare T”

Failure in COMMIT/ABORT
– If the coordinator has received all “ack” messages, complete successfully
– Otherwise, run the termination protocol (resend the decision to unacknowledged participants)
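Since the coordinator's state is reconstructed from its log, recovery can be sketched as a dispatch on the last log record written before the crash. This is a hedged sketch: the `<commit T>` record and the exact action strings are standard-2PC assumptions of mine, not taken verbatim from the slides.

```python
def coordinator_recover(last_log_record, all_acks_received):
    """Coordinator restart: decide the recovery action from the log."""
    if last_log_record is None:
        # Failed in INITIAL: commit had not begun.
        return "restart commit procedure"
    if last_log_record == "<begin-commit T>":
        # Failed in WAIT: votes may be missing.
        return "resend prepare T"
    if last_log_record in ("<commit T>", "<abort T>"):
        # Failed in COMMIT/ABORT: decision already logged.
        if all_acks_received:
            return "write <end T>"
        return "run termination protocol"
    return "nothing to do"

print(coordinator_recover("<begin-commit T>", False))  # resend prepare T
```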
91
Recovery Protocol: Participant
Failure in INITIAL
– Participant has not yet voted
– Coordinator cannot have reached a decision
– Participant should unilaterally abort by sending “vote-abort T”

Failure in READY
– Participant has voted, but doesn't know what the global decision was
– Use the cooperative termination protocol

Failure in COMMIT/ABORT
– Resend the “ack” message
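The participant's recovery cases mirror its state diagram, so a short dispatch function captures them (an illustrative sketch; the action strings are mine):

```python
def participant_recover(state):
    """Participant restart: recovery action for each pre-crash 2PC state."""
    if state == "INITIAL":
        # Not yet voted, so the coordinator cannot have decided commit:
        # safe to abort unilaterally.
        return "send vote-abort T"
    if state == "READY":
        # Voted, but the global decision is unknown: ask the peers.
        return "run cooperative termination protocol"
    if state in ("COMMIT", "ABORT"):
        # Decision already applied locally; the coordinator may still
        # be waiting for the acknowledgement.
        return "resend ack"
    raise ValueError(f"unknown state: {state}")

print(participant_recover("READY"))  # run cooperative termination protocol
```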
92
Centralised 2PC
Communication only between the coordinator and the participants
– No inter-participant communication

Voting phase: C sends “prepare T” to each of P1–P5; each participant replies “vote-commit T” or “vote-abort T” to C.
Decision phase: C sends “commit T” or “abort T” to each participant; each participant replies “ack”.
93
Linear 2PC
• First phase passes from the coordinator through the participants in turn
• Second phase passes back from the participants to the coordinator
• Participants may unilaterally abort

Voting phase: C sends “prepare T” to P1; each participant forwards its vote (“vote-commit T” or “vote-abort T”) along the chain C → P1 → P2 → P3 → P4 → P5.
Decision phase: the global decision (“commit T” or “abort T”) is passed back along the chain from P5 to C.
94
Centralised versus Linear 2PC
• Linear 2PC involves fewer messages
• Centralised 2PC provides opportunities for parallelism
• Linear 2PC has worse response time, since messages pass sequentially along the chain rather than in parallel
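To make the comparison concrete, here is a rough tally of messages for n participants, excluding log writes. The formulas are my own count based on the two diagrams above, not figures from the slides:

```python
def centralised_2pc_messages(n):
    # prepare (n) + votes (n) + decision (n) + acks (n)
    return 4 * n

def linear_2pc_messages(n):
    # votes forwarded down the chain (n) + decision passed back up (n)
    return 2 * n

n = 5
print(centralised_2pc_messages(n))  # 20
print(linear_2pc_messages(n))       # 10
```

Linear 2PC halves the message count, but those 2n messages are strictly sequential (2n message delays end to end), whereas centralised 2PC finishes in four message delays regardless of n: fewer messages, worse latency.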
The CAP Theorem
96
The CAP Theorem
In any distributed system, there is a trade-off between:
• Consistency – each server always returns the correct response to each request
• Availability – each request eventually receives a response
• Partition tolerance – communication may be unreliable (messages delayed, messages lost, servers partitioned into groups that cannot communicate with each other), but the system as a whole should continue to function
97
The CAP Theorem
CAP is an example of the trade-off between safety and liveness in an unreliable system
– Safety: nothing bad ever happens
– Liveness: eventually something good happens

We can only guarantee two of the three properties C, A and P
– Typically we sacrifice either availability (liveness) or consistency (safety)
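The choice can be made concrete with a toy replicated register. This is entirely illustrative (the `Replica` class and its modes are mine): a "CP" replica refuses writes it cannot replicate during a partition, staying consistent at the cost of availability; an "AP" replica accepts them, staying available at the risk of diverging from its peers.

```python
class Replica:
    """Toy single-value replica that reacts to a network partition."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.value = 0
        self.partitioned = False  # True: cannot reach the other replicas

    def write(self, value):
        if self.partitioned and self.mode == "CP":
            # Sacrifice availability: reject rather than risk inconsistency.
            return "unavailable"
        # AP (or no partition): accept the write; during a partition
        # other replicas may now hold a different value.
        self.value = value
        return "ok"

cp = Replica("CP")
cp.partitioned = True
print(cp.write(42))  # unavailable

ap = Replica("AP")
ap.partitioned = True
print(ap.write(42))  # ok
```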