
Post on 13-Dec-2015



A Methodology for Creating Fast Wait-Free Data Structures

Alex Kogan and Erez Petrank
Computer Science, Technion, Israel

2

Concurrency & (non-blocking) synchronization

Concurrent data structures require fast and scalable synchronization.

Non-blocking synchronization: no thread is blocked waiting for another thread to complete; no locks or critical sections.

3

Lock-free (LF) algorithms

Among all threads trying to apply operations on the data structure, one will always succeed.

Opportunistic approach: read some part of the data structure, attempt to apply an operation, and retry when the attempt fails.

Many scalable and efficient algorithms exist.

Global progress is guaranteed, but all but one thread may starve.
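The read/attempt/retry pattern can be made concrete with a minimal lock-free structure. The sketch below uses a Treiber-style stack rather than the MS-queue, purely as a compact illustration of the pattern; it is not code from the talk.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal Treiber-style lock-free stack: illustrates the
// read / attempt-CAS / retry-on-failure pattern described above.
class LockFreeStack<T> {
    private static class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    public void push(T value) {
        Node<T> node = new Node<>(value);
        while (true) {
            Node<T> oldTop = top.get();            // read part of the structure
            node.next = oldTop;
            if (top.compareAndSet(oldTop, node))   // attempt to apply the op
                return;                            // some thread always succeeds...
            // ...but this particular thread may retry forever (starvation)
        }
    }

    public T pop() {
        while (true) {
            Node<T> oldTop = top.get();
            if (oldTop == null) return null;       // empty stack
            if (top.compareAndSet(oldTop, oldTop.next))
                return oldTop.value;
        }
    }
}
```

The `compareAndSet` failure branch is exactly the lock-free property: a failed CAS means another thread made progress, so the system as a whole advances even though this thread must retry.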

4

Wait-free (WF) algorithms

A thread completes its operation in a bounded number of steps, regardless of what other threads are doing.

A particularly important property in several domains, e.g., real-time systems and operating systems.

Commonly regarded as too inefficient and complicated to design.

5

The overhead of wait-freedom

Much of the overhead is because of helping:
- the key mechanism employed by most WF algorithms
- controls the way threads help each other with their operations

Can we eliminate the overhead? The goal: the average-case efficiency of lock-freedom with the worst-case bound of wait-freedom.

6

Why is helping slow?

- A thread helps others immediately when it starts its operation.
- All threads help others in exactly the same order: contention and redundant work.
- Each operation has to be applied exactly once: usually results in a higher number of expensive atomic operations.

                     Lock-free MS-queue   Wait-free KP-queue
                     (PODC 1996)          (PPoPP 2011)
  # CASs in enqueue         2                    3
  # CASs in dequeue         1                    4

7

Reducing the overhead of helping

Main observation: "bad" cases happen, but are very rare. Typically a thread could complete without any help, if only it had a chance to do that.

Main ideas:
- Ask for help only when you really need it, i.e., after trying several times to apply the operation.
- Help others only after giving them a chance to proceed on their own: delayed helping.

8

Fast-path-slow-path methodology

- Fast path: start an operation by running its (customized) lock-free implementation.
- Slow path: upon several failures, switch to a (customized) wait-free implementation; notify others that you need help and keep trying.
- Delayed helping: once in a while, threads on the fast path check whether their help is needed and provide help.

9

Fast-path-slow-path generic scheme

1. Start. Do I need to help? If yes, help someone.
2. Apply my operation using the fast path (at most N times).
3. On success, return.
4. Otherwise, apply my operation using the slow path (until success), then return.

Different threads may run on the two paths concurrently!
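The steps above can be sketched as a Java skeleton. This is a hedged illustration only: the method names (fastPathAttempt, slowPathApply, helpIfNeeded) and the null-on-failure convention are assumptions for the sketch, not the paper's actual interface.

```java
// Hedged sketch of the fast-path-slow-path generic scheme.
// All method and constant names are illustrative placeholders.
abstract class FastPathSlowPath<Op, Result> {
    static final int MAX_FAILURES = 10;      // "N" in the scheme; tunable

    // One fast-path attempt via the customized lock-free algorithm;
    // returns null on failure (e.g., a lost CAS race).
    abstract Result fastPathAttempt(Op op);

    // The customized wait-free algorithm: announces the operation
    // and keeps trying (with help from others) until it succeeds.
    abstract Result slowPathApply(Op op);

    // Periodically help a thread stuck on the slow path (delayed helping).
    abstract void helpIfNeeded();

    Result apply(Op op) {
        helpIfNeeded();                       // step 1: help if my turn
        for (int trials = 0; trials < MAX_FAILURES; trials++) {
            Result r = fastPathAttempt(op);   // step 2: fast, lock-free path
            if (r != null) return r;          // step 3: succeeded without help
        }
        return slowPathApply(op);             // step 4: slow, wait-free path
    }
}
```

Because apply() falls through to slowPathApply only after MAX_FAILURES lost attempts, the common uncontended case never pays the cost of the wait-free machinery.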

10

Fast-path-slow-path: queue example

Fast path (MS-queue)

Slow path (KP-queue)

11

Fast-path-slow-path: queue example. Internal structures: state

The state array is indexed by thread ID. Each record has four fields:
- phase: counts the number of operations the thread has performed on the slow path
- pending: is there a pending operation on the slow path?
- enqueue: what is the pending operation (enqueue or dequeue)?
- node: the node involved in the operation

Example contents (thread IDs 0, 1, 2):

  thread 0: phase=9, pending=true,  enqueue=false, node=null
  thread 1: phase=4, pending=true,  enqueue=true,  node=null
  thread 2: phase=9, pending=false, enqueue=false, node=null

15

Fast-path-slow-path: queue example. Internal structures: helpRecords

The helpRecords array is indexed by thread ID. Each record has three fields:
- curTid: ID of the next thread that I will try to help
- lastPhase: phase number of that thread at the time the record was created
- nextCheck: decremented with each of my operations; when this counter reaches 0, check whether my help is needed

Example contents (thread IDs 0, 1, 2):

  thread 0: curTid=1, lastPhase=4, nextCheck=3
  thread 1: curTid=0, lastPhase=5, nextCheck=8
  thread 2: curTid=0, lastPhase=9, nextCheck=0

HELPING_DELAY controls the frequency of helping checks.
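The nextCheck countdown can be sketched as follows. This is a minimal, self-contained illustration: the class and method names are hypothetical, and real helping code would actually apply the stalled thread's operation rather than just count help events.

```java
// Hedged sketch of the delayed-helping check driven by a help record.
// Field names follow the slides; everything else is illustrative.
class HelpRecord {
    int curTid;      // ID of the next thread I will try to help
    long lastPhase;  // that thread's phase when this record was created
    int nextCheck;   // countdown until my next helping check
}

class DelayedHelper {
    static final int HELPING_DELAY = 3;  // frequency of helping checks
    final HelpRecord rec = new HelpRecord();
    final int numThreads;
    int helpCount = 0;                   // stand-in for actually helping

    DelayedHelper(int numThreads) {
        this.numThreads = numThreads;
        rec.nextCheck = HELPING_DELAY;
    }

    // Called once per operation on the fast path.
    void helpIfNeeded(long[] phases, boolean[] pending) {
        if (--rec.nextCheck > 0) return;         // not time to check yet
        int tid = rec.curTid;
        // Help only if the recorded thread is still stuck in the same
        // phase it was in when the record was created: it had its chance.
        if (pending[tid] && phases[tid] == rec.lastPhase) {
            helpCount++;                         // real code: apply its op
        }
        rec.curTid = (tid + 1) % numThreads;     // round-robin to next thread
        rec.lastPhase = phases[rec.curTid];
        rec.nextCheck = HELPING_DELAY;
    }
}
```

The lastPhase comparison is what makes the helping "delayed": a thread that advanced its phase since the record was created clearly made progress on its own and gets no (redundant) help.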

19

Fast-path-slow-path: queue example. Fast path:

1. help_if_needed()
2. int trials = 0
   while (trials++ < MAX_FAILURES) {
     apply_op_with_customized_LF_alg()   // finish if succeeded
   }
3. switch to the slow path

LF algorithm customization is required to synchronize operations run on the two paths. MAX_FAILURES controls the number of trials on the fast path.

20

Fast-path-slow-path: queue example. Slow path:

1. my phase++
2. announce my operation (in state)
3. apply_op_with_customized_WF_alg()   // until finished

WF algorithm customization is required to synchronize operations run on the two paths.
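Steps 1-2 of the slow path (bump my phase, announce in state) can be sketched with the state record from earlier. Names and types are illustrative assumptions; a real implementation pairs this with the wait-free apply loop and relies on publishing an immutable record so helpers see a consistent snapshot.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hedged sketch of announcing an operation in the state array.
class Announcer {
    static final class OpDesc {
        final long phase;       // monotonically increasing per thread
        final boolean pending;  // is an operation awaiting completion?
        final boolean enqueue;  // true = enqueue, false = dequeue
        final Object node;      // node involved in the operation (may be null)
        OpDesc(long phase, boolean pending, boolean enqueue, Object node) {
            this.phase = phase; this.pending = pending;
            this.enqueue = enqueue; this.node = node;
        }
    }

    final AtomicReferenceArray<OpDesc> state;

    Announcer(int numThreads) {
        state = new AtomicReferenceArray<>(numThreads);
        for (int i = 0; i < numThreads; i++)
            state.set(i, new OpDesc(0, false, false, null));
    }

    // Steps 1-2 of the slow path: bump my phase and announce the op.
    // Only thread tid writes its own slot, so a plain set suffices here.
    long announce(int tid, boolean enqueue, Object node) {
        long newPhase = state.get(tid).phase + 1;          // my phase++
        state.set(tid, new OpDesc(newPhase, true, enqueue, node));
        return newPhase;
    }
}
```

Once the record is visible, any fast-path thread whose nextCheck counter expires can find the pending operation and complete it on the announcer's behalf.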

Performance evaluation

32-core Ubuntu server with OpenJDK 1.6: 8 quad-core 2.3 GHz AMD 8356 processors.

The queue is initially empty. Each thread iteratively performs (100k times) the enqueue-dequeue benchmark: enqueue and then dequeue.

Measure completion time as a function of the number of threads.

22

Performance evaluation

[Plot: completion time (sec, 0-140) vs. number of threads (1-61), comparing MS-queue and KP-queue.]

23

Performance evaluation

[Plot: completion time (sec) vs. number of threads, comparing MS-queue, KP-queue, and fast WF (0, 0); the pair denotes (MAX_FAILURES, HELPING_DELAY).]

24

Performance evaluation

[Plot: completion time (sec) vs. number of threads, comparing MS-queue, KP-queue, fast WF (0,0), fast WF (3,3), fast WF (10,10), and fast WF (20,20).]

25

The impact of configuration parameters

[Plot: completion time (sec) vs. number of threads, comparing fast WF (0,0) and fast WF (10,10); the pair denotes (MAX_FAILURES, HELPING_DELAY).]

26

The use of the slow path

[Plots: percentage of operations on the slow path (0-100%) vs. number of threads (1-57), shown separately for enqueue and dequeue, for several (MAX_FAILURES, HELPING_DELAY) settings.]

27

Tuning performance parameters

Why not just always use large values for both parameters (MAX_FAILURES, HELPING_DELAY)? That would (almost) always eliminate the slow path.

Lemma: the number of steps required for a thread to complete an operation on the queue in the worst case is O(MAX_FAILURES + HELPING_DELAY * n^2).

Hence there is a tradeoff between average-case performance and the worst-case completion time bound.

28

Summary

A novel methodology for creating fast wait-free data structures:
- key ideas: two execution paths + delayed helping
- good performance when the fast path is extensively utilized
- concurrent operations can proceed on both paths in parallel

Can be used in other scenarios, e.g., running real-time and non-real-time threads side by side.

29

Thank you! Questions?
