
Post on 21-Dec-2015

Designing Concurrent Search Structure Algorithms

Dennis Shasha

What is a Search Structure?

• Data structure (typically a B tree, hash structure, R-tree, etc.) that supports a dictionary.

• Operations are insert key-value pair, delete key-value pair, and search for key-value pair.

How to make a search structure algorithm concurrent

• Naïve approach: use two-phase locking (but then at the very least the root is read-locked, so lock conflicts are frequent).

• Semi-naïve algorithm: use hierarchical tree locking: lock root; afterwards lock node n only if you hold lock on parent of n. (Still tends to hold locks high in tree.)
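A minimal sketch of the semi-naïve approach, assuming a binary search tree with one lock per node. The `Node` class and `search` function are illustrative names, not from the talk, and the sketch releases the parent once the child is locked (often called lock coupling), which is one way to satisfy the rule "lock node n only if you hold the lock on its parent".

```python
import threading

class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.lock = threading.Lock()  # one lock per node (assumption)

def search(root, key):
    """Descend from the root; a child is locked only while its parent is held."""
    node = root
    node.lock.acquire()               # every operation starts by locking the root
    while True:
        if key == node.key:
            node.lock.release()
            return True
        child = node.left if key < node.key else node.right
        if child is None:
            node.lock.release()
            return False
        child.lock.acquire()          # parent is still locked at this point
        node.lock.release()           # only now let go of the parent
        node = child

# Small tree: 50 at the root, 10 and 70 below it, 35 to the right of 10.
root = Node(50, left=Node(10, right=Node(35)), right=Node(70))
print(search(root, 35), search(root, 60))
```

Every operation still has to pass through the root lock, which is why contention stays concentrated high in the tree.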

How can we do better: fundamental insight

• In a search structure algorithm, all that we really care about is that we implement the dictionary operations correctly.

• Operations on structure need not even be serializable provided they maintain certain constraints.

Train Your Intuition: parable of the library

• Imagine a library with books.

• It’s a little old-fashioned, so there are still card catalogues that identify the shelf where a book is held.

• Bob wants to get a book B.

• Alice is working on reorganizing the library by moving books from shelf to shelf and then changing the card catalogue.

Parable of the library: interleaving of ops

• Bob 1. look up book B in catalogue.

• Bob 2. read “go to shelf S”

• Bob 3. Start walking but see friend.

• Alice 1: move several books from S to S’, leaving a note.

• Alice 2: change catalogue so B maps to S’

• Bob 4: go to S, follow note to S’
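The interleaving above can be replayed as a small single-threaded simulation. This is only a sketch: the dictionaries `catalogue`, `shelves` and `notes`, the shelf name `"S2"` (standing in for S’) and the helper functions are all assumptions made for illustration.

```python
catalogue = {"B": "S"}                  # book -> shelf named on its card
shelves = {"S": {"B", "C"}, "S2": set()}
notes = {}                              # shelf -> shelf its note points to

def bob_lookup(book):
    """Bob 1 and 2: read the shelf name from the catalogue."""
    return catalogue[book]

def bob_fetch(book, shelf):
    """Bob 4: go to the shelf and follow notes until the book turns up."""
    visited = [shelf]
    while book not in shelves[shelf] and shelf in notes:
        shelf = notes[shelf]
        visited.append(shelf)
    return book in shelves[shelf], visited

def alice_move(books, src, dst):
    """Alice 1: move the books and leave a note on the old shelf."""
    shelves[dst] |= books
    shelves[src] -= books
    notes[src] = dst

def alice_update_catalogue(books, dst):
    """Alice 2: point the catalogue cards at the new shelf."""
    for book in books:
        catalogue[book] = dst

shelf = bob_lookup("B")                 # Bob 1, 2: told to go to S
                                        # Bob 3: delayed by a friend
alice_move({"B"}, "S", "S2")            # Alice 1
alice_update_catalogue({"B"}, "S2")     # Alice 2
found, path = bob_fetch("B", shelf)     # Bob 4
print(found, path)
```

Bob visits two shelves, which no serial execution would produce, yet he still ends up with the book.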

Parable of the library: observations

• Not conflict-preserving serializable: Bob → Alice (Bob reads the catalog, then Alice changes it) and Alice → Bob (Alice modifies S before Bob reads it).

• Indeed in no serial execution would Bob go to two shelves.

• Yet execution is completely ok!

Parable of the library: what’s going on?

• All we care about is that:

1. the structure is ok after Alice finishes;

2. Bob gets his book if it’s there.

• We want to find a general theory for this.

• Ref: the Vossen and Weikum book, and "Concurrent Search Structure Algorithms," D. Shasha and N. Goodman, ACM Transactions on Database Systems, vol. 13, no. 1, pp. 53-90, March 1988.

Good Structure for any Dictionary Data Structure

• Dictionary holds a set of key-value pairs. Values don’t matter for our theory so consider just the set of keys that could be present, denoted keyspace. Example: all natural numbers.

• From the root (in general, any root), must be able to navigate to a node n such that n either has a key being sought or no node has that key.

Example: binary search tree

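[Figure: binary search tree with root 50, children 10 and 70, and 35 as the right child of 10.]

• Node 50 (root): Inset = Keyspace

• Node 70: Inset = {x | x > 50}

• Node 10: Inset = {x | x < 50}

• Node 35: Inset = {x | x < 50 and x > 10}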

Inset, Outset, Keyset

• Inset(n) is the subset of Keyspace consisting of the keys that are either in n or could be reachable (according to the rules of the structure) from n.

• Edgeset(n,n’) is the subset of Keyspace directed to descendant n’ of n. Union of all edgesets with source n is outset(n)

• Keyset(n) = Inset(n) – Outset(n). The set of keys that are in node n or nowhere.

Notes

• Inset(n) = union over all edges (m,n) of (inset(m) ∩ edgeset(m,n)).

• Note that Edgeset(n,n’) need not always be a subset of Inset(n). You’ll see why this is good later.

Example: binary search tree (Keyspace is all integers)

[Figure: binary search tree with root 50, children 10 and 70, and 35 as the right child of 10.]

• Node 50 (root): Inset = Keyspace; Keyset = {50}; Outset = {x | x != 50}

• Node 70: Inset = {x | x > 50} = edgeset(node 50, node 70); Keyset = Inset

• Node 10: Inset = {x | x < 50}; edgeset(node 10, node 35) = {x | x > 10}; Keyset = Inset – {x | x > 10} = {x | x <= 10}

• Node 35: Inset = {x | x < 50 and x > 10}; Keyset = Inset
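The insets and keysets of this example can be computed mechanically from the edgesets. The sketch below assumes a finite stand-in keyspace (the integers 0 to 99) and a tree-shaped structure in which each node has one parent. The names `edges`, `compute_insets` and `compute_keysets` are illustrative.

```python
KEYSPACE = set(range(100))   # finite stand-in for "all integers" (assumption)
ROOT = 50

# node -> {child: edgeset}, following the example tree
edges = {
    50: {10: {x for x in KEYSPACE if x < 50},
         70: {x for x in KEYSPACE if x > 50}},
    10: {35: {x for x in KEYSPACE if x > 10}},  # not a subset of inset(10)
    70: {},
    35: {},
}

def compute_insets(edges, root):
    """inset(n) = inset(parent) intersected with edgeset(parent, n)."""
    inset = {root: set(KEYSPACE)}
    stack = [root]
    while stack:                       # top-down, so parents come first
        m = stack.pop()
        for n, edgeset in edges[m].items():
            inset[n] = inset[m] & edgeset
            stack.append(n)
    return inset

def compute_keysets(edges, root):
    """keyset(n) = inset(n) minus outset(n)."""
    inset = compute_insets(edges, root)
    keyset = {}
    for n, out in edges.items():
        outset = set().union(*out.values())
        keyset[n] = inset[n] - outset
    return keyset

keyset = compute_keysets(edges, ROOT)
print(sorted(keyset[50]), min(keyset[35]), max(keyset[35]))
```

Within the finite keyspace, node 10 keeps the keys up to 10 and node 35 keeps those strictly between 10 and 50, matching the labels in the figure.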

Structure Goodness Conditions

• The keysets of the nodes partition the keyspace. So U {Keyset(n) | n is a node} = Keyspace, and if n != n’ then keyset(n) is disjoint from keyset(n’).

• Edgesets leaving node n are disjoint.

• Let Existkeys(n) be the keys actually present at node n. Existkeys(n) is a subset of keyset(n).
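These three conditions are easy to check on a small, finite structure. The following is a sketch, not part of the talk: the function `goodness_violations` and the toy structure with nodes `root`, `a` and `b` are made up for illustration.

```python
from itertools import combinations

def goodness_violations(keyspace, keysets, out_edges, existkeys):
    """Return a list of human-readable violations of the goodness conditions."""
    problems = []
    # 1. Keysets partition the keyspace: they cover it and are pairwise disjoint.
    if set().union(*keysets.values()) != keyspace:
        problems.append("keysets do not cover the keyspace")
    for a, b in combinations(keysets, 2):
        if keysets[a] & keysets[b]:
            problems.append(f"keysets of {a} and {b} overlap")
    # 2. Edgesets leaving the same node are disjoint.
    for n, targets in out_edges.items():
        for a, b in combinations(targets, 2):
            if targets[a] & targets[b]:
                problems.append(f"edgesets leaving {n} overlap")
    # 3. Keys actually stored at a node lie inside its keyset.
    for n, keys in existkeys.items():
        if not keys <= keysets.get(n, set()):
            problems.append(f"existkeys of {n} is not a subset of its keyset")
    return problems

keyspace = set(range(10))
keysets = {"root": set(), "a": set(range(5)), "b": set(range(5, 10))}
out_edges = {"root": {"a": set(range(5)), "b": set(range(5, 10))}}

good = goodness_violations(keyspace, keysets, out_edges, {"a": {1, 3}, "b": {7}})
bad = goodness_violations(keyspace, keysets, out_edges, {"a": {1, 3}, "b": {2}})
print(good, bad)
```

In the second call key 2 is stored at `b` although it belongs to the keyset of `a`, so exactly one violation is reported.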

Structure Goodness Conditions (applies to each root)

• In the library, suppose that initially, inset(shelf S) = {books | author names begin with “S”}. Afterwards, outset(S) = {books | author names begin with “Sh” or later}.

• At the end, keyset(S) = books whose author names start with Sa through Sg. Inset(S’) = books whose author names start with Sh through Sz.

Example: library at beginning

[Figure: catalog (Cat) with edges to shelves S and A.]

• Cat: Inset = Keyspace; Outset = Keyspace; Keyset = {}

• S: Inset = {x | x begins with “S”} = edgeset(Cat, S); Keyset = Inset

• A: Inset = {x | x begins with “A”} = edgeset(Cat, A); …

Example: library after reshelving

[Figure: catalog (Cat) with edges to shelves S and A, plus a note from S to S’.]

• Cat: Inset = Keyspace; Outset = Keyspace; Keyset = {}

• S: Inset = {x | x begins with “S”} = edgeset(Cat, S); Outset = {x | x begins with “Sh” or greater}

• S’: Inset = {x | x begins with “Sh” .. “Sz”}; Keyset = Inset

• A: Inset = {x | x begins with “A”}

Example: library after reshelving and catalog change

[Figure: catalog (Cat) with edges to shelves S, S’ and A, plus the note from S to S’.]

• Cat: Inset = Keyspace; Outset = Keyspace; Keyset = {}

• S: Inset = {x | x begins with “S” through “Sg”} = edgeset(Cat, S); Outset = {x | x begins with “Sh” or greater}

• S’: Inset = {x | x begins with “Sh” .. “Sz”} = edgeset(Cat, S’); Keyset = Inset

• A: Inset = {x | x begins with “A”}

Observe

• Without the note from S to S’, there would be keys on S’ yet S’ would have a null inset and hence a null keyset.

• This violates the Existkeys part of the structural condition.

• Note also that we can’t eliminate the note from S to S’ even after the catalog is updated. Why?
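The first observation can be checked on a toy version of the library just after reshelving, before the catalog is changed. This sketch assumes a five-key keyspace of author prefixes and uses `"S2"` for S’. The helper names are illustrative.

```python
KEYSPACE = {"Ab", "Sa", "Sg", "Sh", "Sz"}   # toy author prefixes (assumption)
S_ALL = {"Sa", "Sg", "Sh", "Sz"}
NODES = ["Cat", "A", "S", "S2"]
existkeys = {"S": {"Sa", "Sg"}, "S2": {"Sh", "Sz"}}  # where the books really are

def insets(edges):
    """Propagate insets from the catalog until nothing changes."""
    inset = {n: set() for n in NODES}
    inset["Cat"] = set(KEYSPACE)
    changed = True
    while changed:
        changed = False
        for (m, n), edgeset in edges.items():
            flow = inset[m] & edgeset
            if not flow <= inset[n]:
                inset[n] |= flow
                changed = True
    return inset

def keysets(edges):
    """keyset(n) = inset(n) minus the union of edgesets leaving n."""
    inset = insets(edges)
    result = {}
    for n in NODES:
        outset = set().union(*(e for (m, _), e in edges.items() if m == n))
        result[n] = inset[n] - outset
    return result

def existkeys_violations(edges):
    ks = keysets(edges)
    return [n for n, keys in existkeys.items() if not keys <= ks[n]]

with_note = {("Cat", "A"): {"Ab"}, ("Cat", "S"): S_ALL, ("S", "S2"): {"Sh", "Sz"}}
without_note = {("Cat", "A"): {"Ab"}, ("Cat", "S"): S_ALL}

print(existkeys_violations(with_note), existkeys_violations(without_note))
```

With the note, S’ inherits the keys from Sh onward through S. Without it, the inset of S’ is empty even though books sit on that shelf.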

Execution Goodness

• For a search for an item B beginning at node m, the following invariant holds:

• After any operation of any process, if the search for item B is at node x, then B is in keyset(x) or there is a path from x to node y such that B is in keyset(y) and every edge E along that path has B in its edgeset.

Execution Goodness Proof Sketch

• Provided the search reaches the node having B in its keyset, the search will find B there or will find it nowhere.

• The invariant ensures that the search will not end its search anywhere else.

Execution Goodness Proof

• Why is Bob fine even though the concurrent execution of Bob and Alice is not equivalent to any serial execution?

• Because even when Bob is at shelf S, the book Bob is looking for is in edgeset(S,S’) and B is in keyset(S’).
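The invariant can be written as a small reachability check. This is a sketch under stated assumptions: keysets and edgesets are given as finite Python sets, `"S2"` stands for S’, and `invariant_holds` is a hypothetical helper, not something from the talk.

```python
def invariant_holds(x, key, keyset, edges, seen=None):
    """True if key is in keyset(x), or some path from x reaches a node whose
    keyset holds key with key in the edgeset of every edge along the way."""
    seen = set() if seen is None else seen
    if key in keyset[x]:
        return True
    seen.add(x)
    for (m, n), edgeset in edges.items():
        if m == x and n not in seen and key in edgeset:
            if invariant_holds(n, key, keyset, edges, seen):
                return True
    return False

# Library after reshelving: books from "Sh" onward now live on S2.
keyset = {"S": {"Sa", "Sg"}, "S2": {"Sh", "Sz"}}
note = {("S", "S2"): {"Sh", "Sz"}}

at_s_with_note = invariant_holds("S", "Sh", keyset, note)
at_s_without_note = invariant_holds("S", "Sh", keyset, {})
print(at_s_with_note, at_s_without_note)
```

With the note in place, Bob standing at S still satisfies the invariant for a book filed under "Sh". Remove the note and the check fails, which is exactly the case in which Bob would wrongly conclude the book is missing.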

Practical Applications

• Most sophisticated database management systems use some version of the library parable in their B-trees, hash structures, etc.

• Reason: locks need not be held as long and can be held lower in the tree.

• B trees for example have links at the leaf level. So a split looks like this:

B tree simplified (two vals per node)

[Figure: B tree with root 50 and two leaves, (1, 7) and (70).]

• Root 50: Inset = {x | 0 <= x <= 90}; Keyset = {}; Outset = Inset

• Leaf 70: Inset = {x | x > 50 and x <= 90} = edgeset(node 50, node 70); Keyset = Inset

• Leaf 1, 7: Inset = {x | x < 50}; Keyset = Inset

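B tree insert(32): split left leaf at 15. Only the (1, 7) node needs to be locked

[Figure: B tree with root 50 and leaves (1, 7), (32) and (70); the new leaf (32) is reached by a link from leaf (1, 7).]

• Root 50: Inset = {x | 0 <= x <= 90}; Keyset = {}; Outset = Inset

• Leaf 70: Inset = {x | x > 50 and x <= 90} = edgeset(node 50, node 70); Keyset = Inset

• Leaf 1, 7: Inset = {x | x < 50}; Keyset = Inset – {x | x > 15} = {x | x <= 15}; Edgeset (link to leaf 32) = {x | x > 15}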

Readjust parent (so lock it briefly)

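[Figure: B tree with root (15, 50) and leaves (1, 7), (32) and (70); the link from leaf (1, 7) to leaf (32) remains.]

• Root 15, 50: Inset = {x | 0 <= x <= 90}; Keyset = {}; Outset = Inset

• Leaf 70: Inset = {x | x > 50 and x <= 90} = edgeset(node 50, node 70); Keyset = Inset

• Leaf 1, 7: Inset = {x | x < 50}; Keyset = Inset – {x | x > 15} = {x | x <= 15}; Edgeset (link to leaf 32) = {x | x > 15}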
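The split can be sketched at the leaf level as follows. This is a simplified illustration in the spirit of the example, not the published B-link algorithm: the `Leaf` class, its `high` bound and the function names are assumptions, and the demo runs single-threaded.

```python
import bisect
import threading

class Leaf:
    def __init__(self, keys, high, right=None):
        self.keys = sorted(keys)
        self.high = high          # largest key this leaf is responsible for
        self.right = right        # link to the right sibling (the "note")
        self.lock = threading.Lock()

def find_leaf(leaf, key):
    """Follow right links while the key lies beyond this leaf's bound."""
    while key > leaf.high and leaf.right is not None:
        leaf = leaf.right
    return leaf

def search(start, key):
    return key in find_leaf(start, key).keys

def split(leaf, split_key):
    """Split under the leaf's own lock only; the parent is untouched."""
    with leaf.lock:
        upper = [k for k in leaf.keys if k > split_key]
        sibling = Leaf(upper, leaf.high, leaf.right)
        leaf.right = sibling      # 1. install the link
        leaf.high = split_key     # 2. shrink the keyset
        leaf.keys = [k for k in leaf.keys if k <= split_key]  # 3. drop moved keys
    return sibling

def insert(start, key):
    leaf = find_leaf(start, key)
    while True:
        leaf.lock.acquire()
        if key > leaf.high and leaf.right is not None:
            leaf.lock.release()   # leaf was split meanwhile: move right
            leaf = leaf.right
            continue
        bisect.insort(leaf.keys, key)
        leaf.lock.release()
        return leaf

left = Leaf([1, 7], high=49, right=Leaf([70], high=90))
sibling = split(left, 15)         # the parent still routes keys below 50 to `left`
insert(left, 32)
print(search(left, 32), left.keys, sibling.keys)
```

A search for 32 that arrives at the old leaf through the stale parent pointer follows the link, just as Bob follows the note. The parent can be readjusted later under a brief lock.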

Can Generalize Using Model

• Above algorithm is due to Lehman and Yao and is called the B-link algorithm. Long journal article to present and prove.

• Now can generalize to any structure. Ensure structure works and invariant holds on execution.

• Also possible to invent a new algorithm making direct use of the model.

High Concurrency Without Links: Give-up algorithm

• Explicitly record the description of inset of each node in the node.

• Search(B) descends. If B is ever not in the inset of the current node, then give up and start over.

• Happens rarely enough that performance is as good as B-link for searches. Less work for deletions.

• Proof is immediate.
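A minimal sketch of the give-up search, assuming each node records its inset as an inclusive range `[low, high]` and interior nodes route by upper bound. The `Node` class, the `routes` list and the `on_give_up` hook (used here only to simulate a concurrent writer finishing its work) are hypothetical.

```python
class Node:
    def __init__(self, low, high, keys=(), routes=()):
        self.low, self.high = low, high   # recorded inset: low <= key <= high
        self.keys = set(keys)             # keys stored here (leaves only)
        self.routes = list(routes)        # (upper_bound, child), sorted; empty for a leaf

def search(root, key, on_give_up=None, max_attempts=10):
    """Return (found, attempts). Restart from the root whenever the key is
    not in the recorded inset of the node just reached."""
    for attempt in range(1, max_attempts + 1):
        node = root
        while node is not None and node.low <= key <= node.high:
            if not node.routes:
                return key in node.keys, attempt
            node = next((c for bound, c in node.routes if key <= bound), None)
        if on_give_up is not None:
            on_give_up()                  # give up, then start over
    raise RuntimeError("search gave up too many times")

left = Node(0, 15, keys={1, 7})           # already split at 15
new = Node(16, 49, keys={32})
right = Node(50, 90, keys={70})
root = Node(0, 90, routes=[(49, left), (90, right)])  # parent is still stale

def finish_split():
    # Simulates the writer readjusting the parent between attempts.
    root.routes = [(15, left), (49, new), (90, right)]

result = search(root, 32, on_give_up=finish_split)
print(result)
```

The first attempt is routed to the left leaf, whose recorded inset no longer contains 32, so the search gives up. The second attempt, after the parent has been readjusted, reaches the new leaf.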

Conclusion

• Simple framework for all search structures. Handful of concepts: keyspace, inset, edgeset, outset, keyset.

• Can be a guide to coding.

Exercise

• When can Alice remove the note directing those seeking certain books to go from S to S’?

• Try to design a merge algorithm for a B-tree in the give-up setting. Lock as little and as low as possible.