In The Cloud, SS-2014, Uni-Konstanz (31-Aug-2014)

Multiversion Concurrency Control, Theory and Algorithm (PHILIP A. BERNSTEIN and NATHAN GOODMAN)

Waleed Abrar, Uni-Konstanz

ABSTRACT

With the advent of multitasking and multiprocessing, the number of requests and of parallel users per database is increasing, so there is a need for parallel processing in databases. The authors of the paper introduce a technique that aims to enhance database processing performance dramatically while keeping the database consistent and minimizing transaction failures. Their main concept is to create a new version of an object whenever a write request arrives, so that the database has a large repository of versions to select from; this enhances the concurrency-control capability of the DBMS. They then apply that theory to well-known algorithms such as multiversion timestamp ordering and mixed methods.

Keywords

MVCC, concurrency control, serializability, view serializability, timestamp ordering, two-phase locking.

1. INTRODUCTION

There are two main strategies for working with transactions: optimistic and pessimistic (a third, called "hybrid", exists, but the authors of the paper show little interest in it). The optimistic strategy assumes that conflicts are rare during transaction processing, so transactions are validated only at the end. The pessimistic strategy assumes that conflicts are frequent, so it locks each object until the lock is released or the transaction commits. The authors introduce different strategies to verify the correctness of the schedules produced by a multiversion scheduler, namely view serializability, conflict serializability, and the serialization graph, together with an intuitive way to visualize the serializability of transactions.

The correctness criterion for any transaction schedule (a schedule is essentially a sequence of the reads and writes of a set of transactions) is that, even if it executes in parallel with other transactions, the final result must be the same as if the transactions had executed serially. A schedule is therefore correct if it is serializable (GOODMAN, 1983). Finally, the authors give mathematical proofs for their claims about the validity of schedules and introduce new algorithms based on existing multiversion concurrency-control algorithms such as locking, timestamp ordering, and a combination of both.

2. INFLUENCE

The authors do not bind themselves to a practical implementation of their claims on any DBMS, which makes the paper somewhat one-dimensional; but because of this one-dimensionality their results are not tied to any architecture, and they stick to formal verification (mathematical proofs). That is why people are still working on their proposed models, verifying them and making them flexible enough to incorporate other algorithms. The paper has 15 citations in the year 2014 alone, which shows that it is by no means dead and that a lot of work is still built upon this research. Their idea of mapping transactions onto graphs and then deducing the relationships between reads and writes at the page level is quite unique. The introduction of the R-F (reads-from) and R-T relations and the idea of "serialization graph testing" are also very helpful, since they let you easily check whether a particular schedule has conflicts and is actually serializable: if the graph contains a loop, the schedule is not serializable (GOODMAN, 1983). Thanks to their mathematical proofs we can also say that "every conflict-serializable schedule is view serializable as well."

A schedule is view serializable if there exists a serial execution of the same transactions whose effect is the same as that of the given schedule. To elaborate, let me show an example (Santoso, 2010).
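The paper's central mechanism, creating a new version of an object on every write and serving each read from a suitable older version, can be sketched in a few lines. This is a minimal illustration of the multiversion timestamp-ordering read rule; the class and method names are my own, not the paper's.

```python
# Minimal sketch of a multiversion object under timestamp ordering (MVTO):
# every write appends a new version tagged with the writer's timestamp, and
# a read returns the version with the largest write timestamp that does not
# exceed the reader's timestamp. Names are illustrative, not from the paper.

class MultiversionObject:
    def __init__(self, initial_value):
        # list of (write_timestamp, value) pairs, kept sorted by timestamp
        self.versions = [(0, initial_value)]

    def write(self, ts, value):
        # a write never overwrites: it adds one more version
        self.versions.append((ts, value))
        self.versions.sort(key=lambda v: v[0])

    def read(self, ts):
        # newest version written at or before the reader's timestamp
        candidates = [v for v in self.versions if v[0] <= ts]
        return max(candidates, key=lambda v: v[0])[1]

x = MultiversionObject("v0")
x.write(10, "v10")
x.write(20, "v20")
print(x.read(15))  # -> v10: a reader at ts=15 still sees the older version
print(x.read(25))  # -> v20
```

Because a write never destroys the value an earlier reader needs, reads do not have to wait for writers; the price is the storage consumed by old versions.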




Fig.1: A transaction schedule (T1, T2, T3)

Fig.2: Conflict Serializability Testing

Here I am not going to explain how the conflict graph is created, but rather to illustrate the authors' concept: if the graph contains a loop, the schedule is not serializable. A conflicting pair of operations on the same object is always of the form W (write)-R (read), R-W, or W-W. Fig. 1 is simply a schedule I chose as an example, and Fig. 2 is its conflict graph; as you can see, there is a loop between T1 and T2, so the schedule is not conflict serializable (note that this alone does not prove it is not view serializable, since conflict serializability is the stricter of the two criteria).

Some problems worth pointing out while reading the paper: the choice of strategy and protocol for concurrency control is relative and depends on the situation; no single protocol is beneficial in all situations. For example, in an update-heavy scenario it is better to assume that conflicts are frequent, so an optimistic approach is a bad fit, whereas in a read-mostly scenario locking is bad because of its overhead. The paper does not shed much light on this. Secondly, multiversion concurrency control does make a database more flexible, since many versions exist and, in case of a conflict, other versions can be used; but how many versions to keep is not well elaborated, and the costs of this flexibility, such as a more complex architecture and the additional space for version storage, are not explained either. The flexibility introduced by MVCC can be seen in the figures below (Vossen, 2002).

Fig.3: Normal locking: only R-R is compatible

Fig.4: Here R-W and W-R are also compatible
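The loop test described above is easy to mechanize: draw an edge Ti to Tj for every conflicting pair of operations (W-R, R-W, or W-W on the same object, with Ti's operation first), then search the graph for a cycle. The sketch below uses a made-up schedule, not the one from Fig. 1.

```python
# Serialization-graph test sketch: build a conflict graph from a schedule
# (a list of (transaction, op, object) triples in execution order) and report
# whether it contains a cycle; a cycle means the schedule is not conflict
# serializable. The sample schedule is illustrative, not the one in Fig. 1.

def conflict_graph(schedule):
    edges = set()
    for i, (ti, op1, obj1) in enumerate(schedule):
        for tj, op2, obj2 in schedule[i + 1:]:
            if ti != tj and obj1 == obj2 and (op1 == "W" or op2 == "W"):
                edges.add((ti, tj))  # a W-R, R-W, or W-W conflict
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(n) for n in graph if n not in visited)

# T1 and T2 conflict in both directions (on x and on y), so the graph has a
# loop and the schedule is not conflict serializable.
schedule = [("T1", "R", "x"), ("T2", "W", "x"), ("T2", "R", "y"), ("T1", "W", "y")]
print(has_cycle(conflict_graph(schedule)))  # True
```

The same test run on a schedule whose conflicts all point one way, for example T1's operations strictly before T2's, yields an acyclic graph and hence a serializable schedule.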

3. RATING

In my honest opinion the paper is well structured: before moving into any critical section, the authors prepare the reader's mind for the content before actually going into depth. You first get the algorithm itself, then the modifications the authors have made, and then the mathematical proof. As you move to the later sections, however, it gets ambiguous: points that could be shown with a simple commutative or associative property are instead handled with deep set theory, and some of the notation is not very good, such as TS 1 and TS 2 followed by TS 2.1.1. All in all, the start is very good, but the paper becomes tedious as you go further; the mathematical proofs for the model are good, but the notation and the way they are presented are really bad.

The paper was actually not very hard for me, because I took a course on transactional information systems with Prof. Dr. Marc H. Scholl and therefore have some background. Still, the paper disappointed me: as mentioned above, the authors explain the algorithms but do not explain in detail the contexts in which one strategy should be used and where the others are more effective. Having also taken a course in data warehousing, I think that nowadays attention has moved from OLTP (transaction processing) to OLAP (analytical processing), so a single transaction matters less than whole batches, and the "why" has become more important than the "what". Even so, it is good to learn how the lowest level of a DBMS (the pages) actually works.

4. LESSONS LEARNT

The lessons learnt can be summarized as follows:

- There is no perfect protocol that one can implement in every situation; the choice depends on the operational environment.

- The more versions we keep, the more parallelism a given protocol can allow.

- A trade-off has to be found between the memory consumed and the flexibility gained in the DB.

- Long and short transactions are handled well, and starvation can be controlled, because read-write and write-read locks are compatible.
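The last point corresponds to Fig. 3 and Fig. 4: under plain locking only R-R is compatible, while a multiversion scheme also accepts R-W and W-R, because a reader can be served from an older version while a writer creates a new one. A tiny illustrative check (the encoding of the matrices is mine, not the paper's):

```python
# Sketch of the two lock-compatibility matrices behind Fig. 3 and Fig. 4:
# a pair (held, requested) is listed if the requested mode can be granted
# while the other lock is held. Under plain locking only R-R is compatible;
# under multiversioning R-W and W-R become compatible as well, since readers
# are served from older versions. Encoding is illustrative only.

SINGLE_VERSION = {("R", "R")}                        # Fig. 3
MULTIVERSION = {("R", "R"), ("R", "W"), ("W", "R")}  # Fig. 4

def compatible(matrix, held, requested):
    return (held, requested) in matrix

print(compatible(SINGLE_VERSION, "R", "W"))  # False: the writer must wait
print(compatible(MULTIVERSION, "R", "W"))    # True: the writer makes a new version
```

W-W remains incompatible in both matrices: two concurrent writers on the same object must still be ordered.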


5. REFERENCES

Some interesting papers related to the research of PHILIP A. BERNSTEIN and NATHAN GOODMAN:

1) "High-Performance Concurrency Control Mechanisms for Main-Memory Databases" (http://arxiv.org/pdf/1201.0228v1.pdf). This paper further enhances concurrency-control methods and targets fully memory-resident databases.

2) "Serializable Isolation for Snapshot Databases". This is Michael Cahill's PhD thesis; it further elaborates the correctness criteria of transactions and extends snapshot isolation with additional locks to ensure serializability.

3) "Concurrency Control: Methods, Performance, and Analysis". This paper implements the theory presented by (GOODMAN, 1983), performs an analysis, adds another dimension to concurrency control, and proposes methods to improve locking performance.

Works Cited

GOODMAN, P. A. (1983). Multiversion Concurrency Control: Theory and Algorithms. Harvard University. ACM.

RH, I. (1999, 2 10). How DigiCash Blew Everything. Next! Magazine. http://cryptome.org/jya/digicrash.htm

Santoso, T. (2010, 09 18). How to check for view serializable and conflict serializable.

Vossen, G. W. (2002). Transactional Information Systems. Morgan Kaufmann.