
Lock-Free Resizeable Concurrent Tries

Aleksandar Prokopec, Phil Bagwell, Martin Odersky

LAMP, École Polytechnique Fédérale de Lausanne, Switzerland

Motivation

xs.foreach { x => doSomething(x) }

ys = xs.map { x => x * (-1) }

ys = new ConcurrentMap
xs.foreach { x => ys.insert(x * (-1)) }

Hash Array Mapped Tries (HAMT)

0 = 000000₂

16 = 010000₂

4 = 000100₂

12 = 001100₂

[Figure: a hash trie built from the keys 0, 16, 4 and 12, followed by a larger two-level example holding the keys 0 1 3 4 8 9 12 16 20 25 33 37]

Too much space!

Linear search at every level - slow!

Solution – bitmap index! Relying on BITPOP instruction.

[Figure: a node storing a bitmap next to a compact array of its occupied entries]

BITPOP(((1 << ((hc >> lev) & 0x1F)) - 1) & BMP)
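The index computation above can be sketched in Scala, assuming BITPOP maps to `Integer.bitCount` and that each level consumes 5 bits of the hashcode. The object and method names (`BitmapIndex`, `flag`, `pos`, `contains`) are made up for this illustration, and the sketch uses the unsigned shift `>>>` where the slide writes `>>`.

```scala
object BitmapIndex {
  // Single bit marking the 5-bit chunk of hashcode `hc` at bit offset `lev`.
  def flag(hc: Int, lev: Int): Int = 1 << ((hc >>> lev) & 0x1f)

  // Position in the compact array: number of occupied entries below our bit.
  def pos(hc: Int, lev: Int, bmp: Int): Int =
    Integer.bitCount((flag(hc, lev) - 1) & bmp)

  // True if the bitmap records an entry for this chunk of the hashcode.
  def contains(hc: Int, lev: Int, bmp: Int): Boolean =
    (bmp & flag(hc, lev)) != 0
}
```

For a node whose bitmap has bits 4, 12 and 16 set, the compact array has three entries, and `pos` yields 0, 1 and 2 for those hashcodes at level 0.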

For 32-way tries – 32-bit bitmap.

[Figure: key 8 is removed, which leaves key 9 alone in its node, so 9 is moved up into the parent node]

Remove compresses the trie.

• advantages:
  • low space consumption and shrinking
  • no contiguous memory region required
  • fast – logarithmic complexity, but with a low constant factor
  • used as efficient immutable maps
  • no global resize phase – real time applications, potentially more scalable concurrent operations?

Concurrent Trie (Ctrie)

• goals:
  • thread-safe concurrent trie
  • maintain the advantages of HAMT
  • rely solely on CAS instructions
  • ensure lock-freedom and linearizability

• lookup – probably same as for HAMT

CAS instruction

CAS(address, expected_value, new_value)

Atomically replaces the value at the address with the new_value if it is equal to the expected_value.

Returns true if successful, false otherwise.

May fail spuriously.
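As a small illustration of these semantics, the sketch below uses `java.util.concurrent.atomic.AtomicReference` as a stand-in for the memory address. That mapping is an assumption of this example, not something the slides prescribe. Note that `compareAndSet` is the strong variant and does not fail spuriously; the JDK's `weakCompareAndSet*` methods are the ones documented as allowed to.

```scala
import java.util.concurrent.atomic.AtomicReference

object CasDemo {
  // Returns (first CAS result, second CAS result, final value).
  def run(): (Boolean, Boolean, String) = {
    val cell = new AtomicReference[String]("old")
    val expected = cell.get
    // Succeeds: the cell still holds the reference we read.
    val first = cell.compareAndSet(expected, "new")
    // Fails: the cell now holds "new", not the expected reference.
    val second = cell.compareAndSet(expected, "other")
    (first, second, cell.get)
  }
}
```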

Lock-freedom

If multiple threads execute an operation, at least one of them will complete the operation within a finite number of steps.

def counter()
  do {
    a = READ(addr)
    b = a + 1
  } while (!CAS(addr, a, b))
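The retry loop can be written as runnable Scala, here as a minimal sketch that assumes `AtomicInteger` plays the role of `addr`. The names `CasCounter`, `increment` and `run` are illustrative only. A failed CAS means some other thread's CAS succeeded in the meantime, which is why the loop as a whole always makes progress.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.annotation.tailrec

object CasCounter {
  // Read, compute, CAS; retry if another thread changed the value first.
  @tailrec def increment(addr: AtomicInteger): Int = {
    val a = addr.get
    val b = a + 1
    if (addr.compareAndSet(a, b)) b else increment(addr)
  }

  // Runs `threads` threads, each incrementing `perThread` times.
  def run(threads: Int, perThread: Int): Int = {
    val addr = new AtomicInteger(0)
    val ts = (1 to threads).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to perThread).foreach(_ => increment(addr))
      })
    }
    ts.foreach(_.start())
    ts.foreach(_.join())
    addr.get
  }
}
```

With 4 threads and 10000 increments each, no update is lost and the final value is 40000.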

Insertion

17 = 010001₂

[Figure: inserting 17 next to 16 in a second-level node]
1) allocate 16 17
2) CAS

18 = 010010₂

[Figure: inserting 18 into the same node]
1) allocate 16 17 18
2) CAS

Unless…

28 = 011100₂

[Figure: thread T1 inserts 18 while thread T2 concurrently inserts 28 one level below]
T1-1) allocate 16 17 18
T2-1) allocate 20 25 28
T2-2) CAS
T1-2) CAS

Lost insert!
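The interleaving can be replayed deterministically on a single thread. The sketch below is a toy model, not a hash trie: `Node` is a hypothetical class holding a set of keys plus CAS-able child slots, and the three-level shape (root, a node with 16 17, a node with 20 25 beneath it) is an assumption read off the step labels.

```scala
import java.util.concurrent.atomic.AtomicReferenceArray

object LostInsertDemo {
  // Naive node: immutable keys, children reachable through CAS-able slots.
  final class Node(val keys: Set[Int], slots: Int) {
    val children = new AtomicReferenceArray[Node](slots)
  }

  def contains(n: Node, k: Int): Boolean =
    n.keys.contains(k) || (0 until n.children.length()).exists { i =>
      val c = n.children.get(i)
      c != null && contains(c, k)
    }

  // Returns (T1 CAS ok, T2 CAS ok, 18 reachable, 28 reachable).
  def run(): (Boolean, Boolean, Boolean, Boolean) = {
    val leaf = new Node(Set(20, 25), 0)
    val mid = new Node(Set(16, 17), 1)
    mid.children.set(0, leaf)
    val root = new Node(Set(4, 9, 12, 33, 37), 1)
    root.children.set(0, mid)

    // T1-1) allocate a copy of `mid` that also holds 18.
    val midCopy = new Node(mid.keys + 18, 1)
    midCopy.children.set(0, mid.children.get(0))
    // T2-1) allocate a copy of `leaf` that also holds 28.
    val leafCopy = new Node(leaf.keys + 28, 0)
    // T2-2) CAS into the OLD `mid`, which T1 is about to unlink.
    val t2ok = mid.children.compareAndSet(0, leaf, leafCopy)
    // T1-2) CAS the root slot; `midCopy` still points at the old leaf.
    val t1ok = root.children.compareAndSet(0, mid, midCopy)

    (t1ok, t2ok, contains(root, 18), contains(root, 28))
  }
}
```

Both CAS operations report success, yet only 18 is reachable from the root afterwards: T2 updated a node that was no longer part of the trie.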

Insertion – 2nd attempt

Solution: I-nodes

18 = 010010₂

28 = 011100₂

[Figure: the same concurrent inserts of 18 and 28, now with an I-node placed above each inner node]
T1-1) allocate 16 17 18
T2-1) allocate 20 25 28
T2-2) CAS
T1-2) CAS

Idea: once added to the Ctrie, I-nodes remain present.
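A minimal insert-and-lookup sketch of the I-node idea follows. It is a simplification under stated assumptions: keys are `Int`s used directly as their own hashcode, each level consumes 5 bits starting from the low bits, and there is no remove. All names (`INodeSketch`, `Trie`, `SNode`, `CNode`, `INode`, `pair`) are chosen for this illustration. Inner nodes are immutable; the only mutable field is the reference inside an I-node, and copies of a parent keep pointing at the same I-node object.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

object INodeSketch {
  sealed trait Branch
  final case class SNode(key: Int, value: String) extends Branch // key/value leaf
  final case class CNode(bmp: Int, array: Vector[Branch])        // immutable inner node
  final class INode(init: CNode) extends Branch {                // mutable indirection
    val main = new AtomicReference[CNode](init)
  }

  final class Trie {
    private val root = new INode(CNode(0, Vector.empty))

    // Retry from the root whenever the CAS loses a race.
    @tailrec def insert(k: Int, v: String): Unit =
      if (!iinsert(root, k, v, 0)) insert(k, v)

    def lookup(k: Int): Option[String] = ilookup(root, k, 0)

    @tailrec private def iinsert(in: INode, k: Int, v: String, lev: Int): Boolean = {
      val cn = in.main.get
      val flag = 1 << ((k >>> lev) & 0x1f)
      val pos = Integer.bitCount(cn.bmp & (flag - 1))
      if ((cn.bmp & flag) == 0) // free slot: CAS in a copy with the new leaf
        in.main.compareAndSet(cn,
          CNode(cn.bmp | flag, cn.array.patch(pos, Vector(SNode(k, v)), 0)))
      else cn.array(pos) match {
        case sub: INode => iinsert(sub, k, v, lev + 5)
        case old: SNode =>
          val fresh = SNode(k, v)
          // Same key: replace the leaf. Different key: push both one level down.
          val repl: Branch =
            if (old.key == k) fresh else new INode(pair(old, fresh, lev + 5))
          in.main.compareAndSet(cn, cn.copy(array = cn.array.updated(pos, repl)))
      }
    }

    // Builds the subtree holding two leaves with distinct keys.
    private def pair(a: SNode, b: SNode, lev: Int): CNode = {
      val ia = (a.key >>> lev) & 0x1f
      val ib = (b.key >>> lev) & 0x1f
      if (ia == ib) CNode(1 << ia, Vector(new INode(pair(a, b, lev + 5))))
      else CNode((1 << ia) | (1 << ib), if (ia < ib) Vector(a, b) else Vector(b, a))
    }

    @tailrec private def ilookup(in: INode, k: Int, lev: Int): Option[String] = {
      val cn = in.main.get
      val flag = 1 << ((k >>> lev) & 0x1f)
      if ((cn.bmp & flag) == 0) None
      else cn.array(Integer.bitCount(cn.bmp & (flag - 1))) match {
        case sub: INode => ilookup(sub, k, lev + 5)
        case sn: SNode  => if (sn.key == k) Some(sn.value) else None
      }
    }
  }

  // Inserts disjoint key ranges from several threads, then checks every key.
  def demo(threads: Int, perThread: Int): Boolean = {
    val trie = new Trie
    val ts = (0 until threads).map { t =>
      new Thread(new Runnable {
        def run(): Unit = (0 until perThread).foreach { i =>
          val k = t * perThread + i
          trie.insert(k, s"v$k")
        }
      })
    }
    ts.foreach(_.start())
    ts.foreach(_.join())
    (0 until threads * perThread).forall(k => trie.lookup(k) == Some(s"v$k"))
  }
}
```

Because an insert below an I-node changes only that I-node's reference, a concurrent copy of the parent still sees the update, which is the property the naive scheme lacked.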

Remove

Idea: same logic as insert.

[Figure: removing 17 from the node holding 16 17 18]
1) allocate 16 18
2) CAS

[Figure: further keys are removed one by one until only 0 1 18 remain]

Ctrie is not compact => could be faster

Remove – 2nd attempt

[Figure: after a remove leaves 18 alone in its node, the parent node is compressed]
3) allocate 18 20 25 28
4) CAS

Not correct.

T1 – compress
T2 – insert 17

[Figure: T1 compresses the parent while T2 inserts 17 into the node being compressed away]
T1-3) allocate 18 20 25 28
T2-1) allocate 17 18
T1-4) CAS
T2-2) CAS

Remove – 3rd attempt

Idea: disallow insertions as you do compression

[Figure: T1 first replaces the single-key node 18 with a T-node, so T2's insert of 17 cannot succeed there]
T1-3) allocate T-node 18
T2-1) allocate 17 18
T1-4) CAS
T2-2) CAS failed - repeat
T1-5) allocate 18 20 25 28
T2-1) do the same as T1, then repeat
T1-6) CAS

Is this still lock-free?
Yes - roughly, whoever sees the T-node will help remove it, and there is a finite number of T-nodes (full proof in the paper).

Is this linearizable?
Yes – roughly, the CAS instruction which makes the new value reachable is the linearization point (see paper for full list).
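The T-node steps can be sketched as two pure functions plus one helping CAS. This is a loose illustration of the idea on the slides, not the full remove algorithm: bitmaps and values are omitted, and the names `toContracted`, `toCompressed` and `clean` as well as the node classes are assumptions of this sketch. A T-node is final, so no insert can CAS new content in below it; any thread that meets one helps by compressing the parent and then retries its own operation.

```scala
import java.util.concurrent.atomic.AtomicReference

object TombSketch {
  sealed trait Branch
  sealed trait Main
  final case class SNode(key: Int) extends Branch
  final case class TNode(sn: SNode) extends Main           // tombstone, never modified
  final case class CNode(array: Vector[Branch]) extends Main
  final class INode(init: Main) extends Branch {
    val main = new AtomicReference[Main](init)
  }

  // Below the root, a node left with a single key is entombed.
  def toContracted(cn: CNode, lev: Int): Main = cn.array match {
    case Vector(sn: SNode) if lev > 0 => TNode(sn)
    case _                            => cn
  }

  // Pull entombed keys up into a fresh copy of the parent node.
  def toCompressed(cn: CNode, lev: Int): Main = {
    val arr: Vector[Branch] = cn.array.map {
      case in: INode =>
        in.main.get match {
          case TNode(sn) => sn
          case _         => in
        }
      case sn: SNode => sn
    }
    toContracted(CNode(arr), lev)
  }

  // Helping step: try once to replace the parent by its compressed copy.
  def clean(parent: INode, lev: Int): Unit = parent.main.get match {
    case cn: CNode =>
      parent.main.compareAndSet(cn, toCompressed(cn, lev))
      ()
    case _ => ()
  }

  // Replays the slide: a T-node for 18 next to 20 25 28 gets compressed.
  def demo(): Main = {
    val tomb = new INode(TNode(SNode(18)))
    val parent = new INode(CNode(Vector(tomb, SNode(20), SNode(25), SNode(28))))
    clean(parent, 5)
    parent.main.get
  }
}
```

If the CAS in `clean` fails, another thread has already changed the parent, so the caller simply restarts its operation from the root.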

Evaluation – quad core i7

Evaluation – UltraSPARC T2

Evaluation – 4x 8-core i7

Summary

• pseudocode and implementation for a concurrent hash trie

• properties proven:
  • correctness
  • linearizability
  • lock-freedom
  • compactness

• performance evaluation – scalable insertion and remove

Future work

• concurrent memory pool to avoid GC
• lock-free size, iterator and clear operations running in O(1)

Thank you!
