distributed -...
TRANSCRIPT
DistributedTransactions
Brian [email protected]
Info
rmat
ionst
eknolo
gi
Transactions
SAS
Travel reservationBegin_transaction
if(reserve(SAS.CPH2Paris)==full) Abortif(reserve(Paris.Hotel)==full) AbortIf(reserve(KLM.Paris2Ams)==full) Abort
End_transaction
Hotel
KLM
Travel reservation
reserve(SAS.CPH2Paris)reserve(Paris.Hotel)reserve(KLM.Paris2Ams)
Execute as all-or-noting despite• Constraints• Concurrency• Failures
Info
rmat
ionst
eknolo
gi
TransactionsTransaction: A sequence of object operations that must be executed with ACID properties
Atomicity: the transaction is either performed completely or not at allConsistency: leads from one consistent state to anotherIsolation: is executed in isolation from other transactionsDurability: once completed it is durable
Used in Databases and Distributed Systems
Info
rmat
ionst
eknolo
gi
Transaction Concepts
1. ACID PropertiesAtomicityConsistencyIsolationDurability
2. Transaction Commands: Commit vs. Abort
3. Identify Roles of Distributed Components
4. Flat vs. Nested Transactions
Info
rmat
ionst
eknolo
gi
AtomicityTransactions are either performed completely or no modification is done.
I.e perform successfully every operation in cluster or none is performed
Start of a transaction is a continuation point to which it can roll back.
End of transaction is next continuation point.Changes to the state are atomic:A jump from the initial state to the result state without any observable intermediate state.- Changes include:
Database changes.Messages to outside world.Actions on transducers.(testable / untestable)
Info
rmat
ionst
eknolo
gi
ConsistencyShared resources should always be consistent.Inconsistent states occur during transactions:
hidden for concurrent transactionsto be resolved before end of transaction.
Application defines consistency and is responsible for ensuring it is maintained.
e.g for scenario, consistency means “no money is lost”: at the end is true, but in between operations may be not
Transactions can be aborted if they cannot resolve inconsistencies.
Info
rmat
ionst
eknolo
gi
IsolationEach transaction accesses resources as if there were no other concurrent transactions.Even though transactions execute concurrently, it appears to the outside observer as if they execute in some serial order.Modifications of the transaction are not visible to other resources before it finishes.Modifications of other transactions are not visible during the transaction at all.Implemented through:
two-phase locking oroptimistic concurrency control.
Info
rmat
ionst
eknolo
gi
Durability
A completed transaction is always persistent (though values may be changed by later transactions).Once a transaction completes successfully (commits), its changes to the state survive failures (what is the failure model ? ).- The only way to get rid of what a committed transaction has done is to execute a compensating transaction (which is, sometimes, impossible).
Modified resources must be held on persistent storage before transaction can complete.
May not just be disk but can include properly battery-backed RAM or the like of EEPROMs.
Info
rmat
ionst
eknolo
gi
2.2 Transaction Commands
Begin: Start a new transaction.
Commit: End a transaction.Store changes made during transaction.Make changes accessible to other transactions.
Abort:End a transaction.Undo all changes made during the transaction.
Info
rmat
ionst
eknolo
gi
Transaction Primitives
Write data to a file, a table, or otherwiseWRITE
Read data from a file, a table, or otherwiseREAD
End the transaction and restore the old valuesABORT_TRANSACTION
Terminate the transaction and try to commitEND_TRANSACTION
Make the start of a transactionBEGIN_TRANSACTION
DescriptionPrimitive
Info
rmat
ionst
eknolo
gi
Transaction Roles
SAS
Travel reservationBegin_transaction
if(reserve(SAS.CPH2Paris)==full) Abortif(reserve(Paris.Hotel)==full) AbortIf(reserve(KLM.Paris2Ams)==full) Abort
End_transaction
Hotel
KLM
Transactional Client•issues the transaction
TransactionalCoordinator•Controls execution ofentire transaction•begin / commit / abort
Transactional Server•executes its operations from transactions•commits / aborts these
Info
rmat
ionst
eknolo
gi
Coordinator
Coordinator plays key role in managing transaction.Coordinator is the component that handles begin / commit / abort transaction calls.Coordinator allocates system-wide unique transaction identifier.Different transactions may have different coordinators.
Info
rmat
ionst
eknolo
gi
Transactional ServerEvery component with a resource accessed or modified under transaction control.Transactional server has to know coordinator.Transactional server registers its participation in a transaction with the coordinator.Transactional server has to implement a transaction protocol (two-phase commit).
Info
rmat
ionst
eknolo
gi
Transactional ClientOnly sees transactions through the transaction coordinator.Invokes services from the coordinator to begin, commit and abort transactions.Implementation of transactions are transparent for the client.Cannot tell difference between server and transactional server.
Info
rmat
ionst
eknolo
gi
Execution of a transaction
Successful
openTransactionoperationoperation
operation
closeTransaction
Aborted by client
openTransactionoperationoperation
operation
abortTransaction
Aborted by server
openTransactionoperationoperation
server abortstransaction
operation ERRORreported to client
Coordinator interface:openTransaction() -> trans;closeTransaction(trans) -> (commit,abort);abortTransaction(trans);
Info
rmat
ionst
eknolo
gi
Reasons for abortA participant may decide to abort a transaction for any reason, especially
Logical Constraints (cannot withdraw ifnegative balance)Conflicts in concurrent executionDeadlockChrashed or non-responsive participand
Info
rmat
ionst
eknolo
gi
Bank Exampledeposit(amount)
deposit amount in the accountwithdraw(amount)
withdraw amount from the accountgetBalance() -> amount
return the balance of the accountsetBalance(amount)
set the balance of the account to amount
create(name) -> accountcreate a new account with a given name
lookUp(name) -> accountreturn a reference to the account with the given name
branchTotal() -> amountreturn the total of all the balances at the branch
Operations of the Branch interface
Operations of the Account interface
Info
rmat
ionst
eknolo
gi
A banking transaction
Transaction T:a.withdraw(100);b.deposit(100);c.withdraw(200);b.deposit(200);
Info
rmat
ionst
eknolo
gi
Concurrent transactions
Transaction T: balance = b.getBalance();b.setBalance(balance*1.1);a.withdraw(balance/10)
Transaction U:
balance = b.getBalance();b.setBalance(balance*1.1);c.withdraw(balance/10)
balance = b.getBalance(); $200balance = b.getBalance(); $200
b.setBalance(balance*1.1); $220b.setBalance(balance*1.1); $220a.withdraw(balance/10) $80
c.withdraw(balance/10) $280
the lost update problem
Info
rmat
ionst
eknolo
gi
Concurrent transactions
Transaction V: a.withdraw(100)b.deposit(100)
Transaction W:
aBranch.branchTotal()
a.withdraw(100); $100total = a.getBalance() $100total = total+b.getBalance() $300total = total+c.getBalance()
b.deposit(100) $300
the inconsistent retrievals problem
Info
rmat
ionst
eknolo
gi
Concurrent transactions – serial equivalence
If “correct” transactions are done one at a time in some order, the combined effect will also be correct.A serially equivalent interleaving: An interleaving of the operations of transactions in which the combined effect is the same as if the transactions had been performed one at a time in some order.Serial equivalence prevents lost updates and inconsistent retrievals.
Info
rmat
ionst
eknolo
gi
A serially equivalent interleaving
Transaction T: balance = b.getBalance()b.setBalance(balance*1.1)a.withdraw(balance/10)
Transaction U: balance = b.getBalance()b.setBalance(balance*1.1)c.withdraw(balance/10)
balance = b.getBalance() $200b.setBalance(balance*1.1) $220
balance = b.getBalance() $220b.setBalance(balance*1.1) $242
a.withdraw(balance/10) $80c.withdraw(balance/10) $278
Info
rmat
ionst
eknolo
gi
A serially equivalent interleaving
Transaction V: a.withdraw(100);b.deposit(100)
Transaction W:
aBranch.branchTotal()
a.withdraw(100); $100
b.deposit(100) $300
total = a.getBalance() $100
total = total+b.getBalance() $400
total = total+c.getBalance()...
Info
rmat
ionst
eknolo
gi
Serializability: ExerciseBEGIN_TRANSACTION
x = 0;x = x + 3;
END_TRANSACTION
(c)
BEGIN_TRANSACTIONx = 0;x = x + 2;
END_TRANSACTION
(b)
BEGIN_TRANSACTIONx = 0;x = x + 1;
END_TRANSACTION
(a)
?x = 0; x = 0; x = x + 1; x = 0; x = x + 2; x = x + 3;Schedule 3
?x = 0; x = 0; x = x + 1; x = x + 2; x = 0; x = x + 3;Schedule 2
?x = 0; x = x + 1; x = 0; x = x + 2; x = 0; x = x + 3Schedule 1
Info
rmat
ionst
eknolo
gi
Concurrency ControlGeneral organization of managers for handling distributed transactions.
Info
rmat
ionst
eknolo
gi
Concurrency ControlOptimistic
Transaction does what it wants and validates changesprior to commit
Check if objects have been changed by committed transactions since they were openedConflicts are rare, maximum concurrency, no deadlocksRe-execute ⇒ re-conflict.
Timestamp basedEach transaction Ti is given timestamp ts(Ti)If Ti wants to do an operation that conflicts with Tj
Abort Ti if ts(Ti) < ts(Tj)
Lock based (most common)Read/write locks acquired on objects2 phase locking protocol ⇒ serializabilitystrict 2 phase locking protocol ⇒ no cascading aborts
Info
rmat
ionst
eknolo
gi
Two-Phase Locking
Two-phase locking.
Info
rmat
ionst
eknolo
gi
Strict Two-Phase Locking
Strict two-phase locking.
Info
rmat
ionst
eknolo
gi
Deadlocks
B
A
Waits for
Held by
Held by
UT
Waits for
•Deadlock =def Eternal cyclic wait•Some schedules may deadlock•wait-for graphs
T:Lock(a);Lock(b);
U:Lock(b);Lock(a);
•Circumventing Deadlocks•Prevention•Avoidance•Detection+Recovery
T:Lock(a);
Lock(b);
U:
Lock(b);
Lock(a);
Info
rmat
ionst
eknolo
gi
Atomic CommitmentProtocols (ACP)
A correct ACP guaranteesAll the participants that reach a decision, reach the same decision.Decisions are not reversible.A commit decision can only be reached if all participants votes to commit.If there are no failures and all the participants voted to commit, the decision will be commit.At any point, if all failures are repaired, and no new failures are introduced, then all participants eventually reach a decision.
Info
rmat
ionst
eknolo
gi
2 Phase CommitUnanumously vote for outcome!Voting Phase:
Coordinator asks participants for their voteson wheter to abort or commit
Decision Phase:If all votes to commit:
coordinator sends commit to all
If one votes to abort.coordinator send abort to all
Info
rmat
ionst
eknolo
gi
2PCC P1 P2 PnEnd_transaction
CCanCommit?
P1 P2 Pn
End_transaction
CanCommit?
Info
rmat
ionst
eknolo
gi
2PCC P1 P2 Pn
CVote: Commit
P1 P2 Pn
Vote: abort
Vote: C/A
Info
rmat
ionst
eknolo
gi
2-Phase CommitC P1 P2 Pn
CDecision: Abort
P1 P2 Pn
Decision: abort
Decision: A/C
Info
rmat
ionst
eknolo
gi
Performance of the two-phase commit protocol
if there are no failures, the 2PC involving N participants requires
N canCommit? messages and replies, followed by NdoCommit messages.
the cost in messages is proportional to 3N, and the cost in time is three rounds of messages. The haveCommitted messages are not counted
there may be arbitrarily many server and communication failures2PC is is guaranteed to complete eventually, but it is not possible to specify a time limit within which it will be completed
delays to participants in uncertain statesome 3PCs designed to alleviate such delays
• they require more messages and more rounds for the normal case
Info
rmat
ionst
eknolo
gi
Performance of 2PCAssuming N participating servers:
3 message rounds(N-1) Voting requests from coordinator to servers.(N-1) Votes(N-1) Completion requests from coordinator to servers.Thus, Linear: 3(N-1) messages
2PC is is guaranteed to complete eventually, but it is not possible to specify a time bound
delays to participants in uncertain statesome 3PCs designed to alleviate such delaysthey require more messages and more rounds for the normal case
Info
rmat
ionst
eknolo
gi
Recovery of 2PCRole Status Action of recovery managerCoordinator prepared No decision had been reached before the server failed. It sends
abortTransaction to all the servers in the participant list and adds thetransaction status abortedin its recovery file. Same action for stateaborted. If there is no participant list, the participants will eventuallytimeout and abort the transaction.
Coordinator committed A decision to commit had been reached before the server failed. Itsends a doCommit to all the participants in its participant list (in caseit had not done so before) and resumes the two-phase protocol at step 4(Fig 13.5).
Participant committed The participant sends a haveCommitted message to the coordinator (incase this was not done before it failed). This will allow the coordinatorto discard information about this transaction at the next checkpoint.
Participant uncertain The participant failed before it knew the outcome of the transaction. Itcannot determine the status of the transaction until the coordinatorinforms it of the decision. It will send a getDecision to the coordinatorto determine the status of the transaction. When it receives the reply itwill commit or abort accordingly.
Participant prepared The participant has not yet voted and can abort the transaction.Coordinator No action is required.done
Info
rmat
ionst
eknolo
gi
Transaction RecoveryRecovery concerns data durability and failure atomicity.A server keeps data in volatile memory and records committed data in a recovery file.Recovery manager
Save data items in permanent storageRestore the server’s data items after a crashReorganize the recovery file for better performance
Logging or Shadow versions
Info
rmat
ionst
eknolo
gi
Recovery managerThe task of the Recovery Manager (RM) is:
to save objects in permanent storage (in a recovery file) for committed transactions;to restore the server’s objects after a crash;to reorganize the recovery file to improve the performance of recovery;to reclaim storage space (in the recovery file).
media failuresi.e. disk failures affecting the recovery fileneed another copy of the recovery file on an independent disk. e.g. implemented as stable storage or using mirrored disks
we deal with recovery of 2PC separately (at the end)we study logging (13.6.1) but not shadow versions (13.6.2)
•
Info
rmat
ionst
eknolo
gi
Recovery FilesThe recovery files contain sufficient information to ensure the transaction is committed by all the servers.Object Values:
<O1, 4>, <O2,”Hello World”>, …
Intentions list:Transaction ID + a list of data item names and values altered by a transaction.
T42: <O12, 17>, <O23,37>, …
When a server prepares to commit, it must first have saved the intentions list in its recovery file
Transaction StatusTransaction identifier + prepared, committed, or aborted
Info
rmat
ionst
eknolo
gi
LoggingA log contains history of all the transactions performed by a server.The recovery file contains a recent snapshot of the values of all the data items in the server followed by a history of transactions.When a server is prepared to commit, the recovery manager appends all the data items in its intentions list to the recovery file.
Info
rmat
ionst
eknolo
gi
Log for Banking Service
Forward recoveryReplay committed transactions from last checkpoint
Backward recoveryRead log backwards and restore untill all objectsrestoredEg:
P7: U aborted ⇒ skip P4: T committed ⇒ restore <A, 96> <B,204>P0: C not restored ⇒ restore <C,300>
Info
rmat
ionst
eknolo
gi
Recovery by LoggingRecovery of data items
Recovery manager is responsible for restoring the server’s data items.The most recent information is at the end of the log.A recovery manager gets corresponding intentions list from the recovery file.
Reorganizing the recovery fileCheckpointing: the process of writing the current committed values (checkpoint) to a new recovery file.Can be done periodically or right after recovery.
Info
rmat
ionst
eknolo
gi
Shadow VersionsShadow versions technique uses a map to locate versions of the server’s data items in a file called a version store.The versions written by each transaction are shadows of the previous committed versions.When prepared to commit, any changed data are appended to the version store. When committing, a new map is made. When complete, new map replaces the old map.
Info
rmat
ionst
eknolo
gi
Shadow Versions
Info
rmat
ionst
eknolo
gi
2PC – Stable Storage
Participant:If YES: records vote and object updates in permanent storage, and then votesIF NO: aborts (discards or undo’esupdates), and votes
Coordinator Records vote, sends it to all participants
C P1 P2 Pn
CDecision: Abort
P1 P2 Pn
Decision: abort
Decision: A/CWrite to stable storage
Info
rmat
ionst
eknolo
gi
2PC - FailuresC P1 P2 Pn
1. CanCommit message lost• Timeout and abort
2. Vote message lost• Coordinator times out and aborts
3. Decision message lost• Yes voting participant uncertain!• Ask coordinator or other participants
about verdict4. Crash Participant: prepared
• Coordinator times out and aborts5. Crash Participant: After Abort
• Abort6. Crash Participant: After Yes
• Participant uncertain!• Ask coordinator or other participants
about verdict when rebooted7. Crash Coordinator: Before or after decision
• Yes voting participants uncertain• Elect new coordinator (CAREFULL!)
1
2
3
4
65
7
8
Info
rmat
ionst
eknolo
gi
2.2 Log and 2PC
Trans:T Coord’r:T Trans:T Trans: U Part’pant:U Trans:U Trans: U
prepared part’pant
list: . . .
committed prepared Coord’r: . . uncertain committed
intentions
list
intentions
list
Info
rmat
ionst
eknolo
gi
Summary of transaction recovery
Transaction-based applications have strong requirements for the long life and integrity of the information stored.Transactions are made durable by performing checkpoints and logging in a recovery file, which is used for recovery when a server is replaced after a crash. Users of a transaction service would experience some delay during recovery.It is assumed that the servers of distributed transactions exhibit crash failures and run in an asynchronous system,
but they can reach consensus about the outcome of transactions because crashed servers are replaced with new processes that can acquire all the relevant information from permanent storage or from other servers
•
Info
rmat
ionst
eknolo
gi
Failure model for the commit protocols
Recall the failure model for transactions in Chapter 12this applies to the two-phase commit protocol
Commit protocols are designed to work inasynchronous system (e.g. messages may take a very long time)servers may crash messages may be lost. assume corrupt and duplicated messages are removed. no byzantine faults – servers either crash or they obey their requests
2PC is an example of a protocol for reaching a consensus. Chapter 11 says consensus cannot be reached in an asynchronous system if processes sometimes fail.however, 2PC does reach consensus under those conditions. because crash failures of processes are masked by replacing a crashed process with a new process whose state is set from information saved in permanent storage and information held by other processes. •
Info
rmat
ionst
eknolo
gi
Nested transactions
Client
X
Y
Z
X
Y
M
NT1
T2
T11
Client
P
TT12
T21
T22
(a) Flat transaction (b) Nested transactions
TT
Figure 13.1A flat client transaction completes each of its requests before going on to the next one. Therefore, each transaction accesses servers’ objects sequentially
In a nested transaction, the top-level transaction can open subtransactions, and each subtransaction can open further subtransactions down to any depth of nesting
In the nested case, subtransactions at the same level can run concurrently, so T1 and T2 are concurrent, and as they invoke objects in different servers, they can run in parallel.
Info
rmat
ionst
eknolo
gi
Committing Nested Transactions
Cannot use same mechanism to commit nested transactions as:
subtransactions can abort independent of parent.subtransactions must have made decision to commit or abort before parent transaction.
Top level transaction needs to be able to communicate its decision down to all subtransactions so they may react accordingly.Hierarchical versions of 2PC
Info
rmat
ionst
eknolo
gi
Provisional CommitSubtransactions vote either:
aborted orprovisionally committed.
Abort is handled as normal.Provisional commit means that coordinator and transactional servers are willing to commit subtransaction but have not yet done so.
Info
rmat
ionst
eknolo
gi
Nested transactions
T : top-level transactionT1 = openSubTransaction T2 = openSubTransaction
openSubTransaction openSubTransactionopenSubTransaction
openSubTransaction
T1 : T2 :
T11 : T12 :
T211 :
T21 :
prov.commit
prov. commit
abort
prov. commitprov. commit
prov. commit
commit
Info
rmat
ionst
eknolo
gi
CORBA Transaction Service
ApplicationObjects
CORBAfacilities
CORBAservices
Object Request Broker
Transaction
Info
rmat
ionst
eknolo
gi
IDL Interfaces
Object Transaction Service defined through three IDL interfaces:
Current
Coordinator
Resource
Info
rmat
ionst
eknolo
gi
Currentinterface Current {
void begin() raises (...);void commit (in boolean report_heuristics)
raises (NoTransaction, HeuristicMixed,HeuristicHazard);
void rollback() raises(NoTransaction);Status get_status();string get_transaction_name();Coordinator get_control();Coordinator suspend();void resume(in Coordinator which)
raises(InvalidControl);};
Info
rmat
ionst
eknolo
gi
Coordinatorinterface Coordinator {Status get_status();Status get_parent_status();Status get_top_level_status();boolean is_same_transaction(in Coordinator tr);boolean is_related_transaction(in Coordinator tr);RecoveryCoordinator register_resource(
in Resource r) raises(Inactive);void register_subtran_aware(
in SubtransactionAwareResource r)raises(Inactive, NotSubtransaction);
...};
Info
rmat
ionst
eknolo
gi
Resourceinterface Resource {
Vote prepare();void rollback() raises(...);void commit() raises(...);void commit_one_phase raises(...);void forget();
};interface SubtransactionAwareResource:Resource {
void commit_subtransaction(in Coordinator p);void rollback_subtransaction();
};
END