scalable and lock-free concurrent dictionaries

Scalable and Lock-Free Concurrent Dictionaries, Håkan Sundell and Philippas Tsigas


Post on 12-Feb-2016


Page 1: Scalable and Lock-Free Concurrent Dictionaries

Scalable and Lock-Free Concurrent Dictionaries

Håkan Sundell, Philippas Tsigas

Page 2: Scalable and Lock-Free Concurrent Dictionaries

Outline

Synchronization Methods

Dictionaries

Concurrent Dictionaries

Previous results

New Lock-Free Algorithm

Experiments

Conclusions

Page 3: Scalable and Lock-Free Concurrent Dictionaries

Synchronization

Shared data structures need synchronization

Synchronization using locks: mutually exclusive access to the whole or parts of the data structure

[Figure: processes P1, P2 and P3 accessing the shared data structure]

Page 4: Scalable and Lock-Free Concurrent Dictionaries

Blocking Synchronization

Drawbacks
• Blocking
• Priority inversion
• Risk of deadlock

Locks: semaphores, spinning, disabling interrupts, etc.
• Reduced efficiency because of reduced parallelism

Page 5: Scalable and Lock-Free Concurrent Dictionaries

Non-blocking Synchronization

Lock-Free Synchronization

Optimistic approach (i.e. assumes no interference):
1. The operation is prepared to later take effect (unless interfered with) using hardware atomic primitives
2. Possible interference is detected via the atomic primitives, and causes a retry
• Can cause starvation

Wait-Free Synchronization

Always finishes in a finite number of its own steps.
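The two-step optimistic pattern above can be sketched as a retry loop on a shared counter. This is only an illustration: Python exposes no hardware CAS, so the hypothetical `AtomicWord` class below uses a private lock to stand in for the atomicity of the instruction, and `lock_free_increment` and `run_counters` are names invented for this sketch.

```python
import threading


class AtomicWord:
    """Models one shared memory word with a CAS operation.

    Assumption: a private lock stands in for the atomicity that a
    hardware compare-and-swap instruction would provide.
    """

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def cas(self, old, new):
        with self._guard:
            if self._value == old:
                self._value = new
                return True
            return False


def lock_free_increment(word):
    """Add 1 to the word and return how many retries were needed."""
    retries = 0
    while True:
        old = word.load()            # 1. prepare the update from a snapshot
        if word.cas(old, old + 1):   # 2. take effect only if nobody interfered
            return retries
        retries += 1                 # interference detected: retry


def run_counters(n_threads=4, n_increments=1000):
    """Let several threads increment one shared word; return the final value."""
    word = AtomicWord(0)

    def worker():
        for _ in range(n_increments):
            lock_free_increment(word)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return word.load()
```

A failed CAS always means that some other thread's CAS succeeded, so the system as a whole makes progress, but a single unlucky thread may retry indefinitely, which is the starvation risk noted above.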

Page 6: Scalable and Lock-Free Concurrent Dictionaries

Dictionaries (Sets)

Fundamental data structure

Works on a set of <key,value> pairs

Three basic operations:
• Insert(k,v): Adds a new item
• v=FindKey(k): Finds the item <k,v>
• v=DeleteKey(k): Finds and removes the item <k,v>
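As a point of reference, the three operations can be written down as a purely sequential specification. The class name, the `None` result for a missing key and the update-on-duplicate behaviour of `insert` are assumptions made for this sketch, not details taken from the slides.

```python
class SequentialDictionary:
    """Single-threaded reference semantics for the three basic operations."""

    def __init__(self):
        self._items = {}

    def insert(self, key, value):
        """Insert(k,v): store <k,v>; return True if the key was not present."""
        is_new = key not in self._items
        self._items[key] = value     # assumption: an existing key is updated
        return is_new

    def find_key(self, key):
        """v=FindKey(k): return the value stored under k, or None."""
        return self._items.get(key)

    def delete_key(self, key):
        """v=DeleteKey(k): remove <k,v> and return v, or None if absent."""
        return self._items.pop(key, None)
```

A concurrent implementation is then judged against this kind of sequential behaviour: every concurrent run should look like some valid order of these calls.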

Page 7: Scalable and Lock-Free Concurrent Dictionaries

Previous Non-blocking Dictionaries

M. Michael: “High Performance Dynamic Lock-Free Hash Tables and List-Based Sets”, SPAA 2002

Based on a singly linked list
• Linear time complexity!

Fast lock-free memory management
• Causes retries of concurrent search operations!

Building block of hash tables
• Assumes each branch is of length <<10. However, hash tables might not be uniformly distributed.

Page 8: Scalable and Lock-Free Concurrent Dictionaries

Randomized Algorithm: Skip Lists

William Pugh: “Skip Lists: A Probabilistic Alternative to Balanced Trees”, 1990

Layers of ordered lists with different densities, achieving a tree-like behavior

Time complexity: O(log₂ N) expected (probabilistic!)

[Figure: skip list with nodes 1 to 7 between Head and Tail; level densities labelled 50%, 25%, …]
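A minimal sequential skip list may make the layered structure concrete. This sketch is single-threaded and is not the lock-free algorithm of the talk; the promotion probability of 0.5 follows the 50%/25% densities, `MAX_LEVEL = 10` mirrors the maximum level used in the experiments, and all class and function names are illustrative.

```python
import random

MAX_LEVEL = 10  # assumption: same fixed maximum level as in the experiments


class Node:
    def __init__(self, key, value, height):
        self.key = key
        self.value = value
        self.next = [None] * height   # one successor pointer per level


def random_height(rng, max_level=MAX_LEVEL):
    """Each extra level is reached with probability 1/2."""
    height = 1
    while height < max_level and rng.random() < 0.5:
        height += 1
    return height


class SkipList:
    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.head = Node(float("-inf"), None, MAX_LEVEL)

    def _predecessors(self, key):
        """Walk from the top level down; record the last node < key per level."""
        preds = [None] * MAX_LEVEL
        node = self.head
        for lvl in range(MAX_LEVEL - 1, -1, -1):
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
            preds[lvl] = node
        return preds

    def insert(self, key, value):
        preds = self._predecessors(key)
        found = preds[0].next[0]
        if found is not None and found.key == key:
            found.value = value       # key already present: update
            return
        node = Node(key, value, random_height(self.rng))
        for lvl in range(len(node.next)):
            node.next[lvl] = preds[lvl].next[lvl]
            preds[lvl].next[lvl] = node

    def find_key(self, key):
        node = self._predecessors(key)[0].next[0]
        if node is not None and node.key == key:
            return node.value
        return None

    def keys(self):
        """Keys in order, read from the lowest (densest) level."""
        out, node = [], self.head.next[0]
        while node is not None:
            out.append(node.key)
            node = node.next[0]
        return out
```

Searching starts in the sparsest list and drops one level whenever the next key would overshoot, which is where the tree-like expected cost comes from.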

Page 9: Scalable and Lock-Free Concurrent Dictionaries

New Lock-Free Concurrent Skip List

Define node state to depend on the insertion status at the lowest level as well as a deletion flag

Insert from the lowest level going upwards

Set deletion flag. Delete from the highest level going downwards

[Figure: skip list with nodes 1 to 7, each carrying a deletion flag D; a node p with levels 1 to 3, shown without and with its deletion flag D set]

Page 10: Scalable and Lock-Free Concurrent Dictionaries

Overlapping operations on shared data

Example: Insert operation
- which of 2 or 3 gets inserted?

Solution: Compare-And-Swap atomic primitive:

CAS(p: pointer to word, old: word, new: word): boolean
  atomic do
    if *p = old then
      *p := new;
      return true;
    else
      return false;

[Figure: nodes 1, 2, 3 and 4, with concurrent operations Insert 2 and Insert 3]
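The race between the two inserts can be replayed step by step. The sketch below assumes the list initially holds 1 and 4, models CAS with a small lock-protected `AtomicRef` class (Python has no hardware CAS) and ignores deletions; `insert`, `keys` and `interleaved_inserts` are illustrative names.

```python
import threading


class AtomicRef:
    """One shared pointer word; a private lock stands in for hardware CAS."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def cas(self, old, new):
        with self._guard:
            if self._value is old:    # pointer comparison
                self._value = new
                return True
            return False


class Node:
    def __init__(self, key, successor=None):
        self.key = key
        self.next = AtomicRef(successor)


def insert(head, key):
    """Insert into a sorted singly linked list, retrying on interference."""
    while True:
        pred, succ = head, head.next.load()
        while succ is not None and succ.key < key:
            pred, succ = succ, succ.next.load()
        new = Node(key, succ)          # prepare: new node points at succ
        if pred.next.cas(succ, new):   # publish unless pred.next has changed
            return
        # CAS failed: another operation got there first, so search again


def keys(head):
    out, node = [], head.next.load()
    while node is not None:
        out.append(node.key)
        node = node.next.load()
    return out


def interleaved_inserts():
    """Replay the race: both inserts read 1 -> 4 before either one writes."""
    n4 = Node(4)
    n1 = Node(1, n4)
    head = Node(float("-inf"), n1)
    n2, n3 = Node(2, n4), Node(3, n4)   # both prepared from the same snapshot
    first = n1.next.cas(n4, n2)         # Insert 2 wins
    second = n1.next.cas(n4, n3)        # Insert 3 sees the change and fails
    insert(head, 3)                     # the retry finds the new position
    return first, second, keys(head)
```

With plain writes instead of CAS, the second write would silently overwrite the first and node 2 would be lost; with CAS, the loser notices and retries.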

Page 11: Scalable and Lock-Free Concurrent Dictionaries

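Concurrent Insert vs. Delete operations

Problem:
- both nodes are deleted!

Solution (Harris et al): Use bit 0 of pointer to mark deletion status

[Figure: nodes 1, 2, 3 and 4 with concurrent Delete and Insert operations, steps a) and b); the same list with a deletion mark (*), steps a), b) and c)]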
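The effect of the deletion mark can be shown on a three-node list. In this sketch the next field is modelled as a `(successor, marked)` pair that one CAS replaces as a whole; a real implementation would pack the mark into bit 0 of an aligned pointer, as the small integer helpers at the top illustrate. The scenario (delete 2 while inserting 3 after it) and all names are assumptions chosen for illustration.

```python
import threading


def mark(ptr):
    """Set bit 0 of an aligned address (integer model of a pointer)."""
    return ptr | 1


def is_marked(ptr):
    return ptr & 1 == 1


def unmark(ptr):
    return ptr & ~1


class AtomicRef:
    """One shared word; a private lock stands in for hardware CAS."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def cas(self, old, new):
        with self._guard:
            if self._value == old:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, key, successor=None):
        self.key = key
        self.next = AtomicRef((successor, False))   # (pointer, deletion mark)


def build():
    n4 = Node(4)
    n2 = Node(2, n4)
    n1 = Node(1, n2)
    return n1, n2, n4


def keys(node):
    out = []
    while node is not None:
        out.append(node.key)
        node = node.next.load()[0]
    return out


def race_without_mark():
    """Delete 2 and Insert 3 (after 2) overlap; neither CAS sees the other."""
    n1, n2, n4 = build()
    n3 = Node(3, n4)
    deleted = n1.next.cas((n2, False), (n4, False))    # unlink 2 directly
    inserted = n2.next.cas((n4, False), (n3, False))   # succeeds on a dead node
    return deleted, inserted, keys(n1)


def race_with_mark():
    """Delete first marks 2's next pointer, so the insert's CAS must fail."""
    n1, n2, n4 = build()
    n3 = Node(3, n4)
    marked = n2.next.cas((n4, False), (n4, True))      # logical deletion
    inserted = n2.next.cas((n4, False), (n3, False))   # fails: pointer is marked
    unlinked = n1.next.cas((n2, False), (n4, False))   # physical deletion
    return marked, inserted, unlinked, keys(n1)
```

Without the mark both operations report success although node 3 is unreachable; with the mark the insert fails visibly and can retry from a live predecessor.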

Page 12: Scalable and Lock-Free Concurrent Dictionaries

New Lock-Free Dictionary - Techniques Summary

Based on skip lists
• Treated as layers of ordered lists

Uses CAS atomic primitive

Lock-free memory management
• IBM freelists
• Reference counting (Valois + Michael & Scott)

Helping scheme

Back-off strategy

All together proved to be linearizable

Page 13: Scalable and Lock-Free Concurrent Dictionaries

Experiments

Experiments with 1-30 threads, performed on systems with 2 and 64 CPUs respectively.

Each thread performs 20000 operations, of which the first 50-10000 operations in total are Inserts; the remaining operations are randomly and equally distributed over Insert, FindKey and DeleteKey.

Fixed skip list maximum level of 10.

Compared with the implementation by Michael, using the same scenarios.

Averaged execution time of 50 experiments.

Page 14: Scalable and Lock-Free Concurrent Dictionaries

SGI Origin 2000, 64 CPUs

Page 15: Scalable and Lock-Free Concurrent Dictionaries

Linux Pentium II, 2 CPUs

Page 16: Scalable and Lock-Free Concurrent Dictionaries

Conclusions

Our lock-free implementation also includes the value-oriented operations FindValue and DeleteValue.

Our lock-free algorithm is suitable for both pre-emptive systems and systems with full concurrency.
• Will be available as part of the NOBLE software library, http://www.noble-library.org
• See the Technical Report for full details, http://www.cs.chalmers.se/~phs

Page 17: Scalable and Lock-Free Concurrent Dictionaries

Questions?

Contact Information:

Address: Håkan Sundell and Philippas Tsigas, Computing Science, Chalmers University of Technology

Email: <phs , tsigas> @ cs.chalmers.se

Web: http://www.cs.chalmers.se/~phs/warp

Page 18: Scalable and Lock-Free Concurrent Dictionaries

Dynamic Memory Management

Problem: System memory allocation functionality is blocking!

Solution (lock-free), IBM freelists: Pre-allocate a number of nodes, link them into a dynamic stack structure, and allocate/reclaim using CAS

[Figure: Head pointing to a stack of free nodes Mem 1, Mem 2, …, Mem n; Allocate takes a node from the stack, Reclaim returns the used node Used 1]
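A free list of this kind can be sketched as a stack whose head is changed only by CAS. The sketch deliberately ignores the ABA problem, so it is only safe to read as a single-threaded illustration of the allocate/reclaim steps; `AtomicRef` simulates CAS with a lock, and `MemNode` and `FreeList` are illustrative names.

```python
import threading


class AtomicRef:
    """One shared pointer word; a private lock stands in for hardware CAS."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def cas(self, old, new):
        with self._guard:
            if self._value is old:
                self._value = new
                return True
            return False


class MemNode:
    def __init__(self, ident):
        self.ident = ident
        self.next_free = None    # link used only while the node is free


class FreeList:
    """Pre-allocated nodes kept on a stack; not protected against ABA."""

    def __init__(self, size):
        self.head = AtomicRef(None)
        for ident in range(size, 0, -1):
            self.reclaim(MemNode(ident))   # Mem 1 ends up on top

    def allocate(self):
        """Pop the top node, or return None when the list is empty."""
        while True:
            top = self.head.load()
            if top is None:
                return None
            if self.head.cas(top, top.next_free):
                return top

    def reclaim(self, node):
        """Push a node that is no longer in use back onto the stack."""
        while True:
            top = self.head.load()
            node.next_free = top
            if self.head.cas(top, node):
                return
```

Because nodes are reused, the head can return to an earlier pointer value while the stack underneath has changed, which is exactly the situation that reference counting is meant to rule out.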

Page 19: Scalable and Lock-Free Concurrent Dictionaries

The ABA problem

Problem: Because of concurrency (pre-emption in particular), the same pointer value does not always mean the same node, yet the CAS still succeeds!

[Figure: list contents in Step 1 and Step 2]

Page 20: Scalable and Lock-Free Concurrent Dictionaries

The ABA problem

Solution (Valois et al): Add reference counting to each node, in order to prevent nodes that are of interest to some thread from being reclaimed until all threads have left the node

[Figure: New Step 2, with reference counts on the nodes: the CAS fails!]
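The problem and the effect of reference counting can be replayed deterministically on a free-list stack A → B → C. This is a strongly simplified sketch: the two "threads" are interleaved by hand inside one function, the reference count is a plain integer field, and `AtomicRef`, `Node` and `aba_demo` are names invented here rather than the scheme of Valois or of the talk.

```python
import threading


class AtomicRef:
    """One shared pointer word; a private lock stands in for hardware CAS."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def cas(self, old, new):
        with self._guard:
            if self._value is old:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, name, successor=None):
        self.name = name
        self.next = successor
        self.refs = 0            # threads currently holding this node


def aba_demo(use_refcount):
    """Return (did the stale CAS succeed?, name of the resulting head)."""
    c = Node("C")
    b = Node("B", c)
    a = Node("A", b)
    head = AtomicRef(a)

    # Thread 1 starts a pop: it reads the head and its successor,
    # then is pre-empted before its CAS.
    t1_top = head.load()
    if use_refcount:
        t1_top.refs += 1         # announce interest in node A
    t1_next = t1_top.next

    # Thread 2 runs in the meantime: pops A, pops B (B is now in use) ...
    head.cas(a, b)
    head.cas(b, c)
    # ... and wants to reclaim A. With reference counting this is
    # refused while thread 1 still holds A.
    if a.refs == 0:
        a.next = c
        head.cas(c, a)           # head has the old pointer value again

    # Thread 1 resumes with its stale snapshot (A, B).
    succeeded = head.cas(t1_top, t1_next)
    return succeeded, head.load().name
```

Without the count the stale CAS succeeds and makes the in-use node B the head of the free list; with the count, A cannot reappear, the CAS fails, and thread 1 would release its reference and retry.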

Page 21: Scalable and Lock-Free Concurrent Dictionaries

Helping Scheme

Threads need to traverse safely

Need to remove marked-to-be-deleted nodes while traversing: Help!

Finds the previous node, finishes the deletion and continues traversing from the previous node

[Figure: list 1, 2, 4 with node 2 marked (*) as deleted, before and after helping]
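One common way to realise helping on a single ordered list is shown below: a traversal that meets a marked node tries to unlink it before moving on. This is a simplified, Harris-style sketch and not the algorithm of the talk, which locates the previous node from information stored in the node; here the traversal simply keeps its own predecessor and restarts from the head if the helping CAS fails. Next fields are modelled as `(successor, marked)` pairs and CAS is simulated with a lock.

```python
import threading


class AtomicRef:
    """One shared word; a private lock stands in for hardware CAS."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def cas(self, old, new):
        with self._guard:
            if self._value == old:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, key, successor=None):
        self.key = key
        self.next = AtomicRef((successor, False))   # (pointer, deletion mark)


def find(head, key):
    """Return (pred, curr) with pred.key < key <= curr.key.

    Marked nodes met on the way are unlinked on behalf of the deleter.
    """
    while True:
        pred = head
        curr = pred.next.load()[0]
        restart = False
        while curr is not None:
            succ, curr_marked = curr.next.load()
            if curr_marked:
                # curr is logically deleted: help to finish the deletion
                if not pred.next.cas((curr, False), (succ, False)):
                    restart = True       # pred changed or is deleted itself
                    break
                curr = succ              # continue from the same predecessor
                continue
            if curr.key >= key:
                break
            pred, curr = curr, succ
        if not restart:
            return pred, curr


def keys(head):
    out, node = [], head.next.load()[0]
    while node is not None:
        out.append(node.key)
        node = node.next.load()[0]
    return out
```

The helper never waits for the deleting thread, so a pre-empted deleter cannot block traversals; the price is extra CAS traffic on the nodes around a deletion.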

Page 22: Scalable and Lock-Free Concurrent Dictionaries

Back-Off Strategy

For pre-emptive systems, helping is necessary for efficiency and lock-freeness

For truly concurrent systems, overlapping CAS operations (caused by helping and others) on the same node can cause heavy contention

Solution: For every failed CAS attempt, back off (i.e. sleep) for a certain duration, which increases exponentially
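A retry loop with exponential back-off might look as follows. The initial delay, the doubling factor and the upper bound are assumptions picked for illustration (the slides only state that the duration grows exponentially), and `try_cas` is a hypothetical callable that performs one CAS attempt and reports whether it succeeded.

```python
import time


def cas_with_backoff(try_cas, initial=1e-6, maximum=1e-3, sleep=time.sleep):
    """Repeat try_cas() until it succeeds, sleeping longer after each failure.

    Returns the number of attempts that were made.
    """
    delay = initial
    attempts = 0
    while True:
        attempts += 1
        if try_cas():
            return attempts
        sleep(delay)                      # back off before retrying
        delay = min(delay * 2, maximum)   # exponential growth, capped
```

The `sleep` parameter exists only so that the waiting can be observed or replaced in a test; capping the delay is a common safeguard against very long sleeps and is not taken from the slides.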

Page 23: Scalable and Lock-Free Concurrent Dictionaries

Non-blocking Synchronization

Lock-Free Synchronization
• Avoids problems with locks
• Simple algorithms
• Fast when having low contention

Wait-Free Synchronization

Always finishes in a finite number of its own steps.
• Complex algorithms
• Memory consuming
• Less efficient on average than lock-free

Page 24: Scalable and Lock-Free Concurrent Dictionaries

Full SGI

Page 25: Scalable and Lock-Free Concurrent Dictionaries

Full Linux

Page 26: Scalable and Lock-Free Concurrent Dictionaries

The algorithm in more detail

Insert:
1. Create node with random height
2. Search position (remember drops)
3. Insert or update on level 1
4. Insert on level 2 to top (unless already deleted)
5. If already deleted then HelpDelete(1)

All of this while keeping track of references, helping deleted nodes, etc.

Page 27: Scalable and Lock-Free Concurrent Dictionaries

The algorithm in more detail

DeleteKey:
1. Search position (remember drops)
2. Mark node at level 1 as deleted, otherwise fail
3. Mark next pointers on level 1 to top
4. Delete on level top to 1 while detecting helping, indicate success
5. Free node

All of this while keeping track of references, helping deleted nodes, etc.
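The order of the DeleteKey steps can be walked through on a small hand-built skip list. This is a single-threaded sketch of steps 1 to 4 only: it has no helping, no retries on a failed unlink and no reference counting, the deletion status is modelled as a separate flag, and step 5 is left to the garbage collector. All names and the `(successor, marked)` pointer model are assumptions made for illustration.

```python
import threading


class AtomicRef:
    """One shared word; a private lock stands in for hardware CAS."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def store(self, value):
        self._value = value

    def cas(self, old, new):
        with self._guard:
            if self._value == old:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, key, value, height):
        self.key, self.value = key, value
        self.deleted = AtomicRef(False)
        self.next = [AtomicRef((None, False)) for _ in range(height)]


def build(heights, max_level=3):
    """Build a skip list from a {key: height} mapping; return its head."""
    head = Node(float("-inf"), None, max_level)
    last = [head] * max_level
    for key in sorted(heights):
        node = Node(key, "v%d" % key, heights[key])
        for lvl in range(heights[key]):
            last[lvl].next[lvl].store((node, False))
            last[lvl] = node
    return head


def level_keys(head, lvl):
    out, node = [], head.next[lvl].load()[0]
    while node is not None:
        out.append(node.key)
        node = node.next[lvl].load()[0]
    return out


def delete_key(head, key):
    # 1. Search position, remembering the predecessor at every drop
    preds, pred = [], head
    for lvl in range(len(head.next) - 1, -1, -1):
        succ = pred.next[lvl].load()[0]
        while succ is not None and succ.key < key:
            pred, succ = succ, succ.next[lvl].load()[0]
        preds.insert(0, pred)            # preds[lvl] is the drop at level lvl
    node = preds[0].next[0].load()[0]
    if node is None or node.key != key:
        return None
    # 2. Mark the node as deleted, otherwise fail
    if not node.deleted.cas(False, True):
        return None
    # 3. Mark the next pointers from the lowest level to the top
    for lvl in range(len(node.next)):
        while True:
            succ, marked = node.next[lvl].load()
            if marked or node.next[lvl].cas((succ, False), (succ, True)):
                break
    # 4. Unlink from the top level down to the lowest level
    for lvl in range(len(node.next) - 1, -1, -1):
        succ = node.next[lvl].load()[0]
        preds[lvl].next[lvl].cas((node, False), (succ, False))
    return node.value
```

Unlinking from the top downwards keeps the node reachable on the lowest level until the very end, so the lowest level alone decides whether a key is present.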

Page 28: Scalable and Lock-Free Concurrent Dictionaries

The algorithm in more detail

HelpDelete(level):
1. Mark next pointer at level to top
2. Find previous node (info in node)
3. Delete on level unless already helped, indicate success
4. Return previous node

All of this while keeping track of references, helping deleted nodes, etc.

Page 29: Scalable and Lock-Free Concurrent Dictionaries

Correctness

Linearizability (Herlihy 1991)

In order for an implementation to be linearizable, for every concurrent execution, there should exist an equivalent sequential execution that respects the partial order of the operations in the concurrent execution
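For very small histories the definition can be checked by brute force: try every sequential order that respects the real-time partial order and see whether it reproduces the observed results. The history format `(start, end, op, key, value, result)` and the sequential semantics used in `apply_op` are assumptions made for this sketch, and the search is exponential, so it is meant only as an illustration of the definition.

```python
from itertools import permutations


def apply_op(state, op, key, value):
    """Sequential dictionary semantics (an assumed specification)."""
    if op == "insert":
        state[key] = value
        return True
    if op == "find":
        return state.get(key)
    if op == "delete":
        return state.pop(key, None)
    raise ValueError("unknown operation: %r" % (op,))


def is_linearizable(history):
    """history: list of (start, end, op, key, value, result) tuples."""
    n = len(history)
    for order in permutations(range(n)):
        position = {idx: pos for pos, idx in enumerate(order)}
        # An operation that ended before another one started must come first.
        respects_partial_order = all(
            position[a] < position[b]
            for a in range(n)
            for b in range(n)
            if history[a][1] < history[b][0]
        )
        if not respects_partial_order:
            continue
        state = {}
        if all(
            apply_op(state, history[i][2], history[i][3], history[i][4]) == history[i][5]
            for i in order
        ):
            return True      # found an equivalent sequential execution
    return False
```

For example, a FindKey that overlaps an Insert of the same key may legitimately return either the new value or nothing, whereas a FindKey that starts after the Insert has finished must see the value.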

Page 30: Scalable and Lock-Free Concurrent Dictionaries

Correctness

Define precise sequential semantics

Define abstract state and its interpretation
• Show that state is atomically updated

Define linearizability points
• Show that operations take effect atomically at these points with respect to sequential semantics

Creates a total order using the linearizability points that respects the partial order
• The algorithm is linearizable

Page 31: Scalable and Lock-Free Concurrent Dictionaries

Correctness

Lock-freeness

At least one operation should always make progress

There are no cyclic loop dependencies, and all potentially unbounded loops are “gate-kept” by CAS operations
• The CAS operation guarantees that at least one CAS will always succeed
• The algorithm is lock-free