lock-free resizeable concurrent tries

Post on 05-Feb-2016

Lock-Free Resizeable Concurrent Tries

Aleksandar Prokopec, Phil Bagwell, Martin Odersky

LAMP, École Polytechnique Fédérale de Lausanne

Switzerland

Motivation

xs.foreach { x => doSomething(x) }

ys = xs.map { x => x * (-1) }

Motivation

ys = new ConcurrentMap
xs.foreach { x => ys.insert(x * (-1)) }

Hash Array Mapped Tries (HAMT)

[Diagram sequence: keys are inserted into an empty trie one at a time; the bits of each key select its position.]

0 = 000000₂

16 = 010000₂

4 = 000100₂

12 = 001100₂

[Diagram sequence: keys 33, 48, 37 and 3 are inserted next. Key 37 ends up in a sub-node together with 33, and key 3 in a sub-node together with 0.]
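The placement of keys in the diagrams is consistent with reading the 6-bit keys two bits at a time, most significant bits first. The sketch below works through that reading. The 2-bits-per-level layout and the names `TriePath`, `slot` and `path` are assumptions made for illustration and are not taken from the slides.

```scala
object TriePath {
  // Assumption: 6-bit keys, 2 bits consumed per level, high bits first.
  val KeyBits = 6
  val BitsPerLevel = 2

  // Slot index chosen for `key` at trie level `level` (0 = root).
  def slot(key: Int, level: Int): Int = {
    val shift = KeyBits - BitsPerLevel * (level + 1)
    (key >> shift) & ((1 << BitsPerLevel) - 1)
  }

  // Slot indices visited from the root downwards.
  def path(key: Int): List[Int] =
    (0 until KeyBits / BitsPerLevel).map(level => slot(key, level)).toList
}
```

Under this reading, 0, 4 and 12 share root slot 0 while 16 lands in root slot 1. Keys 33 and 37 share root slot 2 and separate one level down. Keys 0 and 3 separate only at the third level.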

Hash Array Mapped Tries (HAMT)

[Diagram: a larger trie holding the keys 0, 1, 3, 4, 8, 9, 12, 16, 20, 25, 33, 37, 48 and 57, with every node drawn as a full-width array.]

Too much space!

Linear search at every level - slow!

Solution: bitmap index! Relying on BITPOP instruction.

[Diagram: the node holding 48 and 57 is stored as the bitmap 1 0 1 0 plus an array containing only the occupied entries.]

BITPOP(((1 << ((hc >> lev) & 0x1F)) - 1) & BMP)
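The expression above can be tried out on the JVM, where `Integer.bitCount` plays the role of BITPOP. This is a minimal sketch for a 32-way node. The names `BitmapIndex`, `flag`, `position` and `contains` are hypothetical helpers, not part of the presented implementation.

```scala
object BitmapIndex {
  // Bit in the bitmap that corresponds to hash code `hc` at bit offset `lev`.
  def flag(hc: Int, lev: Int): Int = 1 << ((hc >>> lev) & 0x1F)

  // Index into the compact array: number of occupied slots below our slot.
  def position(hc: Int, lev: Int, bmp: Int): Int =
    Integer.bitCount((flag(hc, lev) - 1) & bmp)

  // True if the slot for `hc` is occupied in this node.
  def contains(hc: Int, lev: Int, bmp: Int): Boolean =
    (flag(hc, lev) & bmp) != 0

  // Example bitmap with slots 0, 4 and 16 occupied.
  val exampleBitmap: Int = (1 << 0) | (1 << 4) | (1 << 16)
}
```

With `exampleBitmap`, a hash code of 16 at offset 0 maps to array position 2, because two lower slots are occupied. A hash code of 12 also yields position 2, but `contains` reports that its slot is empty.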

Hash Array Mapped Tries (HAMT)

For 32-way tries: 32-bit bitmap.

[Diagram sequence: key 8 is removed; key 9, left alone in its sub-node, moves up into the parent node.]

Remove compresses the trie.

Hash Array Mapped Tries (HAMT)

• advantages:
  • low space consumption and shrinking
  • no contiguous memory region required
  • fast – logarithmic complexity, but with a low constant factor
  • used as efficient immutable maps
  • no global resize phase – real time applications, potentially more scalable concurrent operations?

Concurrent Trie (Ctrie)

• goals:
  • thread-safe concurrent trie
  • maintain the advantages of HAMT
  • rely solely on CAS instructions
  • ensure lock-freedom and linearizability

• lookup – probably same as for HAMT

CAS instruction

CAS(address, expected_value, new_value)

Atomically replaces the value at the address with the new_value if it is equal to the expected_value.

Returns true if successful, false otherwise.

May fail spuriously.
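As a small illustration of these semantics, the sketch below uses `java.util.concurrent.atomic.AtomicInteger`, whose `compareAndSet` is one JVM-level form of CAS. The object name `CasDemo` is made up for this example. Note that `compareAndSet` on the JVM is documented not to fail spuriously; only the weak variants may.

```scala
import java.util.concurrent.atomic.AtomicInteger

object CasDemo {
  // Returns (result of first CAS, result of second CAS, final value).
  def demo(): (Boolean, Boolean, Int) = {
    val cell = new AtomicInteger(5)
    // Expected value matches the current value: the cell becomes 6.
    val first = cell.compareAndSet(5, 6)
    // Expected value is stale (the cell holds 6): nothing changes.
    val second = cell.compareAndSet(5, 7)
    (first, second, cell.get())
  }
}
```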

Lock-freedom

If multiple threads execute an operation, at least one of them will complete the operation within a finite number of steps.

def counter()
  do {
    a = READ(addr)
    b = a + 1
  } while (!CAS(addr, a, b))
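The counter pseudocode translates fairly directly to the JVM. The following is a sketch under the assumption that `addr` is an `AtomicInteger`. The class name `LockFreeCounter` and the helper `run` are illustrative only. A failed CAS means another thread's CAS succeeded, which is why at least one thread always makes progress.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.annotation.tailrec

final class LockFreeCounter {
  private val addr = new AtomicInteger(0)

  // Read, compute, CAS; retry if another thread changed the value meanwhile.
  @tailrec
  def increment(): Int = {
    val a = addr.get()
    val b = a + 1
    if (addr.compareAndSet(a, b)) b else increment()
  }

  def value: Int = addr.get()
}

object LockFreeCounter {
  // Runs `threads` threads that each increment a shared counter `perThread` times.
  def run(threads: Int, perThread: Int): Int = {
    val counter = new LockFreeCounter
    val workers = (1 to threads).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to perThread).foreach(_ => counter.increment())
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    counter.value
  }
}
```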

Insertion

[Diagram sequence: inserting 17 = 010001₂ into the trie. Key 17 collides with key 16.]

1) allocate a new node: 16 17

2) CAS

[Diagram sequence: inserting 18 = 010010₂, which belongs in the node holding 16 17.]

1) allocate an updated copy: 16 17 18

2) CAS

Unless…

[Diagram sequence: threads T1 and T2 insert concurrently. T1 inserts 18 = 010010₂ and T2 inserts 28 = 011100₂.]

T1-1) allocate an updated copy: 16 17 18

T2-1) allocate an updated copy: 20 25 28

T2-2) CAS

T1-2) CAS

Lost insert!

Insertion – 2nd attempt

Solution: I-nodes

[Diagram sequence: I-nodes are placed between the levels of the trie. T1 inserts 18 = 010010₂ and T2 inserts 28 = 011100₂.]

T1-1) allocate: 16 17 18

T2-1) allocate: 20 25 28

T1-2) CAS

T2-2) CAS

[Diagram: both inserts are present; the nodes now hold 16 17 18 and 20 25 28.]

Idea: once added to the Ctrie, I-nodes remain present.
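The allocate-then-CAS pattern on an I-node can be sketched as follows. This is a deliberately simplified, single-level illustration: a real Ctrie nests I-nodes and array nodes, whereas here one hypothetical `INode` simply points at an immutable `Set[Int]`. All names are assumptions made for this example.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// Assumption: one I-node holding a reference to an immutable node.
final class INode(initial: Set[Int]) {
  private val main = new AtomicReference[Set[Int]](initial)

  @tailrec
  def insert(key: Int): Unit = {
    val old = main.get()
    val updated = old + key                      // 1) allocate an updated copy
    if (main.compareAndSet(old, updated)) ()     // 2) CAS on the I-node
    else insert(key)                             // another thread won: retry
  }

  def keys: Set[Int] = main.get()
}

object INodeDemo {
  // Two threads insert disjoint keys into the same I-node.
  def run(): Set[Int] = {
    val node = new INode(Set.empty[Int])
    def worker(start: Int): Thread = new Thread(new Runnable {
      def run(): Unit = (start until 1000 by 2).foreach(node.insert)
    })
    val t1 = worker(0)
    val t2 = worker(1)
    t1.start(); t2.start()
    t1.join(); t2.join()
    node.keys
  }
}
```

Because the I-node itself stays in place and every update goes through a CAS on it, a thread whose copy is stale fails its CAS and retries rather than overwriting the other thread's insert.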

Remove

Idea: same logic as insert.

[Diagram sequence: key 17 is removed from the node holding 16 17 18.]

1) allocate an updated copy: 16 18

2) CAS

[Diagram sequence: further keys are removed one at a time until only 1 and 18 remain.]

Ctrie is not compact => could be faster

Remove – 2nd attempt

[Diagram sequence: a removal has left a node holding the single key 18; the remover then compresses the trie.]

3) allocate: 18 20 25 28

4) CAS

Not correct.

[Diagram sequence: T1 compresses while T2 inserts 17.]

T1-3) allocate: 18 20 25 28

T2-1) allocate: 17 18

T1-4) CAS

T2-2) CAS

Remove – 3rd attempt

Idea: disallow insertions as you do compression

[Diagram sequence: T1 compresses while T2 inserts 17. The node holding the single key 18 is replaced by a T-node.]

T1-3) allocate: T-node 18

T2-1) allocate: 17 18

T1-4) CAS

T2-2) CAS failed - repeat

T1-5) allocate: 18 20 25 28

T2-1) do the same as T1, then repeat

T1-6) CAS

Is this still lock-free?

Yes - roughly, whoever sees the T-node will help remove it, and there is a finite number of T-nodes (full proof in the paper).

Is this linearizable?

Yes - roughly, the CAS instruction which makes the new value reachable is the linearization point (see paper for full list).

Evaluation – quad core i7

Evaluation – UltraSPARC T2

Evaluation – 4x 8-core i7

Summary

• pseudocode and implementation for a concurrent hash trie

• properties proven:
  • correctness
  • linearizability
  • lock-freedom
  • compactness

• performance evaluation – scalable insertion and remove

Future work

• concurrent memory pool to avoid GC

• lock-free size, iterator and clear operations running in O(1)

Thank you!
