Post on 23-Feb-2016


Page 1: Locality-Conscious  Lock-Free Linked Lists

Locality-Conscious Lock-Free Linked Lists

Anastasia Braginsky & Erez Petrank


Page 2: Locality-Conscious  Lock-Free Linked Lists

Lock-Free Locality-Conscious Linked Lists

A list of constant-size "containers", with minimum and maximum bounds on the number of elements in each container.

Traverse the list quickly to the relevant container.

Lock-free, locality-conscious, fast access, scalable.

[Figure: example keys stored across the containers: 3 7 9 12 18 25 26 31 40 52 63 77 89 92]

Page 3: Locality-Conscious  Lock-Free Linked Lists

Non-blocking Algorithms

A non-blocking algorithm ensures progress in a finite number of steps. A non-blocking algorithm is:

◦ wait-free if every thread is guaranteed to make progress in a bounded number of steps

◦ lock-free if the system as a whole is guaranteed to make progress in a bounded number of steps

◦ obstruction-free if a single thread executing in isolation for a bounded number of steps will make progress.
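To make the lock-free guarantee concrete, here is a minimal Python sketch of a compare-and-swap (CAS) retry loop. Python has no hardware CAS, so the `AtomicWord` class is a hypothetical stand-in that simulates one with an internal lock; `lock_free_increment` is an illustrative name and does not come from the slides. A failed CAS means some other thread succeeded, which is exactly the system-wide progress that lock-freedom asks for.

```python
import threading


class AtomicWord:
    """Simulated machine word; the lock only stands in for hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def cas(self, expected, new):
        """Store `new` only if the word still holds `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def lock_free_increment(word):
    """Retry until our CAS wins; every failed attempt implies another thread progressed."""
    while True:
        old = word.load()
        if word.cas(old, old + 1):
            return old + 1


def worker(word, times):
    for _ in range(times):
        lock_free_increment(word)


counter = AtomicWord(0)
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No increment is lost, because each one takes effect only through a successful CAS.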

Page 4: Locality-Conscious  Lock-Free Linked Lists

Existing Lock-Free List Designs

J. D. VALOIS, Lock-free linked lists using compare-and-swap, in Proc. PODC, 1995.

T. L. HARRIS, A pragmatic implementation of non-blocking linked-lists, in DISC 2001.

M. M. MICHAEL, Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects, IEEE Transactions on Parallel and Distributed Systems, 2004.

M. FOMITCHEV and E. RUPPERT, Lock-free linked lists and skip lists, in Proc. PODC, 2004.

Page 5: Locality-Conscious  Lock-Free Linked Lists

Outline

◦ Introduction
◦ A list of memory chunks
◦ Design of in-chunk list
◦ Merges & Splits via freezing
◦ Empirical results
◦ Summary

Page 6: Locality-Conscious  Lock-Free Linked Lists

The List Structure

A list consists of

◦ A list of memory chunks
◦ A list in each chunk (chunk implementation)

When a chunk gets too sparse or too dense, the update operations on the chunk are stopped, and the chunk is either split or merged with its preceding chunk.
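The two-level structure above can be sketched as follows. This is a single-threaded illustration only (no CAS, no freezing), and the `Chunk` class, `find_chunk` and `search` are hypothetical names; the keys are taken from the example figure in these slides. The point is that a search skips whole chunks by looking only at the first key of the next chunk, and then scans a single chunk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    name: str
    keys: List[int]                       # sorted keys held inside this chunk
    next_chunk: Optional["Chunk"] = None  # NextChunk pointer


def find_chunk(head: Chunk, key: int) -> Chunk:
    """Skip chunks while the next chunk's smallest key is still <= key."""
    chunk = head
    while (chunk.next_chunk is not None
           and chunk.next_chunk.keys
           and chunk.next_chunk.keys[0] <= key):
        chunk = chunk.next_chunk
    return chunk


def search(head: Chunk, key: int) -> bool:
    """Locate the relevant chunk, then look inside that chunk only."""
    return key in find_chunk(head, key).keys


chunk_b = Chunk("B", [25, 67, 89])
chunk_a = Chunk("A", [3, 14], next_chunk=chunk_b)
```

Because every chunk fits in one contiguous block of memory, the in-chunk scan touches far fewer cache lines and pages than following one pointer per element.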

Page 7: Locality-Conscious  Lock-Free Linked Lists

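An Example of a List of Fixed-Sized Memory Chunks

[Figure: HEAD → Chunk A → Chunk B → NULL. Each chunk has a NextChunk pointer and an EntriesHead pointing to its in-chunk list. Chunk A holds (Key: 3, Data: G) and (Key: 14, Data: K). Chunk B holds (Key: 25, Data: A), (Key: 67, Data: D) and (Key: 89, Data: M).]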

Page 8: Locality-Conscious  Lock-Free Linked Lists

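When No More Space for Insertion

[Figure: HEAD → Chunk A → Chunk B → NULL. Chunk A is full, holding (Key: 3, Data: G), (Key: 6, Data: B), (Key: 9, Data: C) and (Key: 14, Data: K). An insertion of (Key: 12, Data: H) finds no free entry, so Chunk A is marked "Freeze". Chunk B holds (Key: 25, Data: A), (Key: 67, Data: D) and (Key: 89, Data: M).]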

Page 9: Locality-Conscious  Lock-Free Linked Lists

Split

[Figure: the frozen Chunk A, with entries 3, 6, 9, 14 and the pending insertion of 12, is split into two new chunks. Chunk C holds (Key: 3, Data: G), (Key: 6, Data: B) and (Key: 9, Data: C). Chunk D holds (Key: 12, Data: H) and (Key: 14, Data: K). Chunk B, with keys 25, 67 and 89, is unchanged and points to NULL.]


Page 11: Locality-Conscious  Lock-Free Linked Lists

When a Chunk Gets Sparse

[Figure: HEAD followed by Chunk C with (Key: 3, Data: G), (Key: 6, Data: B), (Key: 9, Data: C); Chunk D with only (Key: 14, Data: K); and Chunk B with (Key: 25, Data: A), (Key: 67, Data: D), (Key: 89, Data: M), ending in NULL. Chunk D has become sparse. The two chunks taking part in the merge are labelled "Freeze master" and "Freeze slave".]

Page 12: Locality-Conscious  Lock-Free Linked Lists

Merge

[Figure: the frozen Chunks C and D, labelled "Freeze master" and "Freeze slave", are merged into a new Chunk E holding (Key: 3, Data: G), (Key: 6, Data: B), (Key: 9, Data: C) and (Key: 14, Data: K). Chunk B, with keys 25, 67 and 89, is unchanged and points to NULL.]


Page 16: Locality-Conscious  Lock-Free Linked Lists

The Structure of an Entry

◦ An entry is 2 machine words.
◦ Freeze bit: marks chunk entries as frozen.
◦ A ┴ (bottom) value is not allowed as a key value; it means that the entry is not allocated.

[Figure: layout of the two words]
KeyData word: Data | Key | Freeze bit (field widths shown: 32 bit, 31 bit)
NextEntry word: Next entry pointer (62 bit) | Delete bit | Freeze bit
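The two-word layout above can be sketched with plain integer bit operations. The exact bit positions are assumptions made for illustration: the transcript does not make clear which of key and data is 32 bits wide, so this sketch assumes a 32-bit key, 31-bit data and the freeze bit as the least significant bit, and it encodes the ┴ key as 0. All function names are hypothetical.

```python
KEY_BITS, DATA_BITS = 32, 31   # assumed widths; the slide only lists "32 bit" and "31 bit"
POINTER_BITS = 62              # width of the next-entry pointer, from the slide
BOTTOM = 0                     # assumed encoding of the ┴ key (entry not allocated)
FREEZE_BIT = 0b01              # assumed position: lowest bit of both words
DELETE_BIT = 0b10              # assumed position: second bit of the NextEntry word


def pack_key_data(key, data, frozen=False):
    """Build the KeyData word: key | data | freeze bit."""
    assert 0 <= key < (1 << KEY_BITS) and 0 <= data < (1 << DATA_BITS)
    return (key << (DATA_BITS + 1)) | (data << 1) | int(frozen)


def unpack_key_data(word):
    key = word >> (DATA_BITS + 1)
    data = (word >> 1) & ((1 << DATA_BITS) - 1)
    return key, data, bool(word & FREEZE_BIT)


def pack_next(pointer, deleted=False, frozen=False):
    """Build the NextEntry word: pointer | delete bit | freeze bit."""
    assert 0 <= pointer < (1 << POINTER_BITS)
    return (pointer << 2) | (DELETE_BIT if deleted else 0) | int(frozen)


def unpack_next(word):
    return word >> 2, bool(word & DELETE_BIT), bool(word & FREEZE_BIT)
```

Keeping each word within 64 bits is what lets a single CAS change a key, its data and the freeze bit together.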

Page 17: Locality-Conscious  Lock-Free Linked Lists

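The Structure of a Chunk

A chunk consists of:

◦ Head: a dummy entry
◦ An array of entries of size MAX
◦ Counter (4 in the example)
◦ NextChunk pointer
◦ new pointer
◦ MergeBuddy pointer
◦ Freeze State (2 bits)

[Figure: example entries array. Allocated entries: (Key: 7, Data: 89), (Key: 14, Data: 9), (Key: 22, Data: 13), (Key: 23, Data: 53), plus (Key: 11, Data: 13) and (Key: 24, Data: 78), which both carry "Deleted bit: 1". The remaining entries have Key: ┴ and are not allocated.]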

Page 18: Locality-Conscious  Lock-Free Linked Lists

Initiating a Freeze

When a process p realizes that

◦ a chunk is full, or
◦ a chunk is sparse, or
◦ a chunk is in the process of being frozen,

then p starts a freeze, or p helps another process that has already started a freeze.

Page 19: Locality-Conscious  Lock-Free Linked Lists

The Freeze Process

Starts by going over all the entries in the array and setting their freeze bit.

Then finish

◦ insertions of all currently allocated entries that are not yet in the list
◦ deletions of entries already marked as deleted but still in the list
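The first step of the freeze, setting the freeze bit on every entry, might look like the sketch below. It assumes the freeze bit is the lowest bit of both words of an entry and simulates CAS with a hypothetical `AtomicWord` class, since Python has no native CAS. The bit is set with a CAS loop so that a concurrent update to the same word is never overwritten; the names `set_freeze_bit` and `mark_chunk_frozen` are illustrative.

```python
import threading

FREEZE_BIT = 1  # assumed position: lowest bit of each word


class AtomicWord:
    """Simulated machine word; the lock only stands in for hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def cas(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def set_freeze_bit(word):
    """Set the freeze bit while preserving whatever else the word holds."""
    while True:
        old = word.load()
        if old & FREEZE_BIT or word.cas(old, old | FREEZE_BIT):
            return


def mark_chunk_frozen(entries):
    """entries: list of (key_data_word, next_entry_word) pairs."""
    for key_data, next_entry in entries:
        set_freeze_bit(key_data)
        set_freeze_bit(next_entry)


entries = [(AtomicWord(0b1010), AtomicWord(0b100)), (AtomicWord(0), AtomicWord(0))]
mark_chunk_frozen(entries)
```

Marking is idempotent, so any number of helping threads can run it on the same chunk.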

Page 20: Locality-Conscious  Lock-Free Linked Lists

Chunk List is Different from Known Lock-Free Linked Lists

Non-private insertion: entry is visible when allocated, even before linking to the list.

Allow help with insertion.

Boundary conditions causing merges and splits.


Page 21: Locality-Conscious  Lock-Free Linked Lists

Entry Allocation

1. An entry is allocated at the beginning of the insertion process.
2. Find a zeroed entry, with ┴ key value.
3. Allocate by swapping (CAS) the KeyData word to the desired value.
   ◦ Upon a failure of the CAS command, goto 2.
   ◦ A frozen entry cannot be allocated.
4. If no entry is found, a freeze starts.

Next, use the allocated entry for list insertion…

[Figure: entries array before allocation; k = key, d = data, f = freeze bit]
k:3 d:9 f:1
k:4 d:2 f:1
k:8 d:5 f:0
k:┴ d:0 f:1
k:┴ d:0 f:0

Page 22: Locality-Conscious  Lock-Free Linked Lists

Entry Allocation (continued)

[Figure: the same entries array after key 6 with data 2 has been allocated in the last entry, the only one that was zeroed and not frozen]
k:3 d:9 f:1
k:4 d:2 f:1
k:8 d:5 f:0
k:┴ d:0 f:1
k:6 d:2 f:0
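The allocation steps can be sketched as follows, using the entries shown in the figure. This is an illustration under assumptions: the ┴ key is encoded as 0, the KeyData word is packed as key, data, freeze bit (low bit), and CAS is simulated by the hypothetical `AtomicWord` class. Because the CAS expects a fully zeroed word, an unallocated entry whose freeze bit is already set can never be claimed.

```python
import threading


class AtomicWord:
    """Simulated machine word; the lock only stands in for hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def cas(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def pack(key, data, frozen=False):
    """Assumed layout: key in the high bits, 31 data bits, freeze bit lowest."""
    return (key << 32) | (data << 1) | int(frozen)


def allocate_entry(words, key, data):
    """Return the index of the claimed entry, or None if a freeze must start."""
    desired = pack(key, data)
    while True:
        # Step 2: find a zeroed entry (┴ key, data 0, not frozen).
        idx = next((i for i, w in enumerate(words) if w.load() == 0), None)
        if idx is None:
            return None                      # step 4: no entry found
        if words[idx].cas(0, desired):       # step 3: claim it atomically
            return idx
        # CAS failed: another thread took or froze the entry, go back to step 2.


# Entries from the figure: three allocated, one frozen ┴ entry, one free entry.
words = [AtomicWord(pack(3, 9, True)), AtomicWord(pack(4, 2, True)),
         AtomicWord(pack(8, 5)), AtomicWord(pack(0, 0, True)), AtomicWord(pack(0, 0))]
first = allocate_entry(words, 6, 2)
second = allocate_entry(words, 7, 1)
```

Here `first` is the index of the last entry and `second` is `None`, which is the point at which the caller would start a freeze.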

Page 23: Locality-Conscious  Lock-Free Linked Lists

Insertion Algorithm

1. Record the entry's next pointer value in savedNext.
2. Find a location for adding the new entry.
   ◦ If the key already exists (in a different entry), free the allocated entry by clearing it, and return.
3. CAS the entry's next pointer from savedNext to the next entry in the list.
4. CAS the previous entry's next pointer to the newly allocated entry.
   ◦ If any CAS fails, goto 1 (restarting from the beginning of a chunk).
5. Increase the counter and return.

[Figure: entries array; "previous" and "next" mark the neighbours of the new entry in the list]
k:3 d:9 f:1
k:4 d:2 f:1
k:8 d:5 f:0
k:┴ d:0 f:1
k:6 d:2 f:0
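The five steps above can be sketched as follows. This is a simplified illustration, not the authors' code: it omits the delete and freeze bits and the helping mechanism, stores next pointers as array indices, uses `None` for the ┴ key, and simulates CAS with a hypothetical `AtomicWord` class. The names `Entry`, `find` and `insert` are assumptions.

```python
import threading

NIL = -1  # assumed marker for "no next entry"


class AtomicWord:
    """Simulated machine word; the lock only stands in for hardware atomicity."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def cas(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


class Entry:
    def __init__(self, key=None, data=None):
        self.key, self.data = key, data   # None plays the role of the ┴ key
        self.next = AtomicWord(NIL)


def find(entries, key):
    """Walk the in-chunk list from the dummy head; return (previous, next) indices."""
    prev, cur = 0, entries[0].next.load()
    while cur != NIL and entries[cur].key < key:
        prev, cur = cur, entries[cur].next.load()
    return prev, cur


def insert(entries, counter, idx):
    """Link the already allocated entries[idx] into the sorted in-chunk list."""
    entry = entries[idx]
    while True:
        saved_next = entry.next.load()                  # step 1
        prev, cur = find(entries, entry.key)            # step 2
        if cur != NIL and entries[cur].key == entry.key:
            entry.key = entry.data = None               # duplicate key: clear and return
            return False
        if not entry.next.cas(saved_next, cur):         # step 3
            continue                                    # restart within this chunk
        if not entries[prev].next.cas(cur, idx):        # step 4
            continue
        while True:                                     # step 5: counter goes up last
            old = counter.load()
            if counter.cas(old, old + 1):
                return True
```

A failed CAS restarts the search from the head of the same chunk, so a retry never walks the whole list again.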


Page 26: Locality-Conscious  Lock-Free Linked Lists

Deletion

Standard implementation, except for taking care not to drop below the minimum number of entries.

The counter always holds a lower bound on the actual number of entries:

◦ increased after the actual insert
◦ decreased before the actual delete

Decrementing the counter below the minimum allowed number initiates a freeze.

A frozen entry cannot be marked as deleted.
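The counter rule can be sketched as a guarded decrement. The value of `MIN_ENTRIES`, the function names and the simulated `AtomicWord` CAS are all assumptions for illustration. Since the counter is decreased before the delete and increased only after an insert, it can never exceed the true number of entries, so refusing to go below the minimum is always safe.

```python
import threading

MIN_ENTRIES = 2  # hypothetical minimum; the slides do not give a value


class AtomicWord:
    """Simulated machine word; the lock only stands in for hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def cas(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def reserve_delete(counter):
    """Decrement before deleting; False means the caller must start a freeze."""
    while True:
        old = counter.load()
        if old - 1 < MIN_ENTRIES:
            return False            # the delete would make the chunk too sparse
        if counter.cas(old, old - 1):
            return True             # safe to mark the entry as deleted now


def confirm_insert(counter):
    """Increment only after the entry is actually linked into the list."""
    while True:
        old = counter.load()
        if counter.cas(old, old + 1):
            return old + 1


counter = AtomicWord(3)
first_ok = reserve_delete(counter)   # 3 -> 2 is allowed
second_ok = reserve_delete(counter)  # 2 -> 1 would violate the minimum
```

After these two calls the counter still reads 2, and the second caller would go on to freeze the chunk and trigger a merge.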


Page 28: Locality-Conscious  Lock-Free Linked Lists

Freezing

Phase I: Marking entries with frozen bits
◦ Non-frozen entries can still change concurrently

Phase II: List stabilization
◦ Everything is frozen; now finish all incomplete operations.

Phase III: Decision
◦ Split, merge, or copy.

Phase IV: Recovery
◦ Implementation of the above decision
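Phase III can be pictured as a pure function of the stabilized chunk's contents. The slides name the three outcomes but not the exact thresholds, so the comparison rule below is an assumption: merge when the chunk is under its minimum, split when it is at its maximum, and otherwise copy the chunk as it is. `Recovery` and `decide` are hypothetical names.

```python
from enum import Enum


class Recovery(Enum):
    SPLIT = "split"
    MERGE = "merge"
    COPY = "copy"


def decide(live_entries, min_entries, max_entries):
    """Choose the recovery action for a frozen, stabilized chunk (assumed rule)."""
    if live_entries < min_entries:
        return Recovery.MERGE    # too sparse: combine with a neighbouring chunk
    if live_entries >= max_entries:
        return Recovery.SPLIT    # full: divide the entries between two new chunks
    return Recovery.COPY         # neither bound is violated: copy into one new chunk
```

Because the chunk no longer changes after Phase II, every helping thread that evaluates this function sees the same contents and reaches the same decision.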

Page 29: Locality-Conscious  Lock-Free Linked Lists

Phase IV - Recovery

◦ Allocate new chunk or chunks locally
◦ Copy the frozen data to the new chunk
◦ Execute the operation that initially caused the freeze
◦ Attach the new chunk to the frozen one
◦ Replace frozen chunk(s) with new chunk(s) in the entire List's data structure
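The copy steps of recovery operate on thread-local data, so they need no synchronization until the new chunks are attached. The sketch below reproduces the split and merge examples from the earlier figures; the function names and the choice to give the first new chunk the larger half are assumptions, and the attach and replace steps (which need CAS) are not shown.

```python
def recover_split(frozen_entries, pending_insert=None):
    """Copy frozen (key, data) pairs, apply the pending insert, split in two."""
    items = dict(frozen_entries)
    if pending_insert is not None:
        key, data = pending_insert
        items.setdefault(key, data)      # the operation that caused the freeze
    ordered = sorted(items.items())
    middle = (len(ordered) + 1) // 2     # assumed: first chunk takes the larger half
    return ordered[:middle], ordered[middle:]


def recover_merge(entries_a, entries_b):
    """Copy the entries of two frozen neighbouring chunks into one sorted chunk."""
    return sorted(dict(entries_a + entries_b).items())


# Split example: full Chunk A plus the insertion of (12, H).
chunk_c, chunk_d = recover_split([(3, "G"), (6, "B"), (9, "C"), (14, "K")], (12, "H"))

# Merge example: sparse Chunk D with its neighbour Chunk C.
chunk_e = recover_merge([(3, "G"), (9, "C"), (6, "B")], [(14, "K")])
```

Several helpers may each build their own copies; only one succeeds in attaching its result to the frozen chunk, and the others discard theirs.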

Page 30: Locality-Conscious  Lock-Free Linked Lists

Remarks

◦ Search can run on a frozen chunk (and is not delayed).
  ◦ Wait-free except for the use of the hazard pointer mechanism
◦ A chunk can never be unfrozen


Page 32: Locality-Conscious  Lock-Free Linked Lists

The Test Environment

◦ Platform: SUN FIRE with UltraSPARC T1, an 8-core processor with each core running 4 hyper-threads.
◦ OS: Solaris 10
◦ Chunk size set to the virtual page size, 8KB.
  ◦ All accesses inside a chunk are on the same page

Page 33: Locality-Conscious  Lock-Free Linked Lists

Workload

Each test had two stages:

◦ Stage I: Insertions (only) of N random keys, in order to obtain a substantial list. N: 10^3, 10^4, 10^5, 10^6
◦ Stage II: Insertions, deletions and searches in parallel. N operations overall, of which 15% are insertions, 15% deletions, and 70% searches.

Results are reported for runs of 32 concurrent threads.
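A workload with this shape could be generated as in the sketch below. It is not the authors' test harness: the key range, the seed, the function name `make_workload` and the choice to fix the operation counts exactly (rather than sample each operation at random) are all assumptions, and the sketch only builds the operation lists without running them on 32 threads.

```python
import random
from collections import Counter


def make_workload(n, key_range=1 << 20, seed=0):
    """Return (stage1, stage2) lists of (operation, key) pairs."""
    rng = random.Random(seed)
    # Stage I: insertions only, to build a substantial list.
    stage1 = [("insert", rng.randrange(1, key_range)) for _ in range(n)]
    # Stage II: 15% insertions, 15% deletions, 70% searches, in shuffled order.
    n_insert = n_delete = (15 * n) // 100
    kinds = (["insert"] * n_insert + ["delete"] * n_delete
             + ["search"] * (n - n_insert - n_delete))
    rng.shuffle(kinds)
    stage2 = [(kind, rng.randrange(1, key_range)) for kind in kinds]
    return stage1, stage2


stage1, stage2 = make_workload(1000)
mix = Counter(kind for kind, _ in stage2)
```

For N = 1000 the mix is exactly 150 insertions, 150 deletions and 700 searches; in a multi-threaded run the Stage II list would be divided among the worker threads.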

Page 34: Locality-Conscious  Lock-Free Linked Lists

Reference for Comparison

◦ Michael's lock-free linked list, implemented in C according to the pseudo-code from
  ◦ MICHAEL, M. M., Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects, IEEE Transactions on Parallel and Distributed Systems, 2004.
  ◦ Uses hazard pointers.
◦ A Java implementation of the lock-free linked list provided in the book "The Art of Multiprocessor Programming"
  ◦ Garbage collection is assumed.

Page 35: Locality-Conscious  Lock-Free Linked Lists

Comparison with Michael's List: Total Time

Total time in seconds (plotted on a logarithmic scale) as a function of N.

Stage I total time
N            Original List (s)   Chunk List (s)
1,000        0.01                0.16
10,000       0.56                1.16
100,000      27.00               4.90
1,000,000    368.08              24.68

Already at N = 20,000 the two lists perform the same; for the largest list the chunk list is more than 10 times faster.

Stage II total time
N            Original List (s)   Chunk List (s)
1,000        0.01                0.004
10,000       1.15                0.071
100,000      33.91               2.050
1,000,000    237.93              20.269

Consistently better performance; for substantial lists, more than 10 times faster.

Page 36: Locality-Conscious  Lock-Free Linked Lists

Comparison with Michael's List: Single Operation Average

[Figure: single-operation average time, Stage I and Stage II. Annotations: better performance as lists become more substantial; again consistently better performance.]

Page 37: Locality-Conscious  Lock-Free Linked Lists

Comparison with Lock-Free List in Java Total Times


Page 38: Locality-Conscious  Lock-Free Linked Lists

Comparison with Lock-Free List in Java Single Operation Average



Page 40: Locality-Conscious  Lock-Free Linked Lists

Conclusion

◦ New lock-free algorithm for a chunked linked list
◦ Fast due to:
  ◦ Skipping over chunks
  ◦ Restarting from the beginning of a chunk (rather than of the whole list)
  ◦ Locality-consciousness
◦ May be useful for other structures that can use the chunks
◦ Good empirical results for substantial lists

Page 41: Locality-Conscious  Lock-Free Linked Lists

Questions?


Page 42: Locality-Conscious  Lock-Free Linked Lists

Thank you!