A Methodology for Creating Fast Wait-Free Data Structures
Alex Kogan and Erez Petrank
Computer Science, Technion, Israel
2
Concurrency & (non-blocking) synchronization
- Concurrent data structures require fast and scalable synchronization
- Non-blocking synchronization: no thread is blocked waiting for another thread to complete; no locks / critical sections
3
Lock-free (LF) algorithms
- Among all threads trying to apply operations on the data structure, one will succeed
- Opportunistic approach: read some part of the data structure, make an attempt to apply an operation, and retry on failure
- Many scalable and efficient algorithms exist
- Global progress is guaranteed, but all threads except one may starve
4
Wait-free (WF) algorithms
- A thread completes its operation in a bounded number of steps, regardless of what other threads are doing
- A particularly important property in several domains, e.g., real-time systems and operating systems
- Commonly regarded as too inefficient and complicated to design
5
The overhead of wait-freedom
- Much of the overhead is due to helping: the key mechanism employed by most WF algorithms, controlling the way threads help each other with their operations
- Can we eliminate the overhead? The goal: the average-case efficiency of lock-freedom with the worst-case bound of wait-freedom
6
Why is helping slow?
- A thread helps others immediately when it starts its operation
- All threads help others in exactly the same order: contention and redundant work
- Each operation has to be applied exactly once, which usually results in a higher number of expensive atomic operations

                    Lock-free MS-queue   Wait-free KP-queue
                    (PODC 1996)          (PPoPP 2011)
# CASs in enqueue   2                    3
# CASs in dequeue   1                    4
7
Reducing the overhead of helping
- Main observation: "bad" cases happen, but are very rare; typically a thread could complete without any help, if only it had a chance to do so
- Main ideas:
  - Ask for help only when you really need it, i.e., after trying several times to apply the operation
  - Help others only after giving them a chance to proceed on their own: delayed helping
8
Fast-path-slow-path methodology
- Fast path: start an operation by running its (customized) lock-free implementation
- Slow path: upon several failures, switch to a (customized) wait-free implementation; notify others that you need help and keep trying
- Delayed helping: once in a while, threads on the fast path check whether their help is needed and provide it
9
Fast-path-slow-path generic scheme
1. Start: do I need to help? If yes, help someone first.
2. Apply my operation using the fast path (at most N times).
3. On success, return.
4. Otherwise, apply my operation using the slow path (until success), then return.

Different threads may run on the two paths concurrently!
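The scheme above can be sketched on a structure much simpler than a queue. Below is a hedged Java sketch of a fast-path-slow-path shared counter: MAX_FAILURES and HELPING_DELAY are the parameter names from these slides, while the Request record, the claimed/applied flags, and the use of a counter (whose slow path leans on the hardware fetch-and-add) are illustrative assumptions, not the paper's queue algorithm:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

class FpSpCounter {
    static final int MAX_FAILURES = 3;   // fast-path retry budget
    static final int HELPING_DELAY = 10; // ops between helping checks

    // A slow-path announcement: an increment that must be applied exactly once.
    static final class Request {
        final long delta;
        final AtomicBoolean claimed = new AtomicBoolean(false); // CAS winner applies it
        volatile boolean applied = false;                       // set after applying
        Request(long delta) { this.delta = delta; }
    }

    private final AtomicLong counter = new AtomicLong();
    private final AtomicReferenceArray<Request> announce; // one slot per thread
    private final int[] opsSinceHelp;                     // touched only by its owner

    FpSpCounter(int nThreads) {
        announce = new AtomicReferenceArray<>(nThreads);
        opsSinceHelp = new int[nThreads];
    }

    // Apply a pending request exactly once; owner or helper may win the CAS.
    private void tryComplete(Request r) {
        if (r != null && !r.applied && r.claimed.compareAndSet(false, true)) {
            counter.getAndAdd(r.delta);
            r.applied = true;
        }
    }

    // Delayed helping: scan the announcements only every HELPING_DELAY ops.
    private void helpIfNeeded(int tid) {
        if (++opsSinceHelp[tid] < HELPING_DELAY) return;
        opsSinceHelp[tid] = 0;
        for (int i = 0; i < announce.length(); i++) tryComplete(announce.get(i));
    }

    void add(int tid, long delta) {
        helpIfNeeded(tid);
        // Fast path: a plain lock-free CAS loop with a bounded retry budget.
        for (int trials = 0; trials < MAX_FAILURES; trials++) {
            long v = counter.get();
            if (counter.compareAndSet(v, v + delta)) return;
        }
        // Slow path: announce the operation, then complete it; either this
        // thread or some helper wins the 'claimed' CAS and applies it once.
        Request r = new Request(delta);
        announce.set(tid, r);
        tryComplete(r);
        while (!r.applied) Thread.onSpinWait(); // a helper is finishing it
        announce.set(tid, null);
    }

    long get() { return counter.get(); }
}
```

An uncontended add never leaves the fast path; the slow path is entered only after MAX_FAILURES failed CASes, mirroring the generic scheme above.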
10
Fast-path-slow-path: queue example
- Fast path: MS-queue
- Slow path: KP-queue
11
Fast-path-slow-path: queue example (internal structures)

state (one record per thread ID):

Thread ID   0      1      2
phase       9      4      9
pending     true   true   false
enqueue     false  true   false
node        null   null   null

- phase: counts the # of operations this thread has run on the slow path
- pending: is there a pending operation on the slow path?
- enqueue: what is the pending operation (enqueue or dequeue)?
15
Fast-path-slow-path: queue example (internal structures, cont.)

helpRecords (one record per thread ID):

Thread ID   0   1   2
curTid      1   0   0
lastPhase   4   5   9
nextCheck   3   8   0

- curTid: ID of the next thread that I will try to help
- lastPhase: phase # of that thread at the time the record was created
- nextCheck: decremented with every operation; when it reaches 0, check whether my help is needed (HELPING_DELAY controls the frequency of helping checks)
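The two per-thread tables above can be written down as plain Java classes; the field names (phase, pending, enqueue, node, curTid, lastPhase, nextCheck) come from these slides, while the concrete types are assumptions:

```java
// Per-thread operation descriptor announced on the slow path
// (one entry per thread ID, as in the 'state' table above).
class OpDesc {
    final long phase;      // counts # operations this thread ran on the slow path
    final boolean pending; // is there a pending slow-path operation?
    final boolean enqueue; // which op is pending: enqueue (true) or dequeue (false)
    final Object node;     // node payload for a pending enqueue (assumed type)

    OpDesc(long phase, boolean pending, boolean enqueue, Object node) {
        this.phase = phase; this.pending = pending;
        this.enqueue = enqueue; this.node = node;
    }
}

// Per-thread helping record implementing delayed helping
// (one entry per thread ID, as in the 'helpRecords' table above).
class HelpRecord {
    final int curTid;     // ID of the next thread that I will try to help
    final long lastPhase; // that thread's phase when this record was created
    long nextCheck;       // decremented every op; help is checked when it hits 0

    HelpRecord(int curTid, long lastPhase, long nextCheck) {
        this.curTid = curTid; this.lastPhase = lastPhase; this.nextCheck = nextCheck;
    }
}
```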
19
Fast-path-slow-path: queue example (fast path)

1. help_if_needed()
2. int trials = 0
   while (trials++ < MAX_FAILURES) {
       apply_op_with_customized_LF_alg()   // finish if succeeded
   }
3. switch to the slow path

- LF algorithm customization is required to synchronize operations running on the two paths
- MAX_FAILURES controls the number of trials on the fast path
20
Fast-path-slow-path: queue example (slow path)

1. my phase++
2. announce my operation (in state)
3. apply_op_with_customized_WF_alg() (until finished)

- WF algorithm customization is required to synchronize operations running on the two paths
Performance evaluation
- 32-core Ubuntu server with OpenJDK 1.6 (8 quad-core 2.3 GHz AMD 8356 processors)
- The queue is initially empty
- Enqueue-Dequeue benchmark: each thread iteratively (100k times) performs an enqueue and then a dequeue
- Measure completion time as a function of the number of threads
22
Performance evaluation
[Chart: completion time (sec) vs. number of threads (1-61) for MS-queue and KP-queue]
23
Performance evaluation
[Chart: completion time (sec) vs. number of threads for MS-queue, KP-queue, and fast WF (0, 0); the pair denotes (MAX_FAILURES, HELPING_DELAY)]
24
Performance evaluation
[Chart: completion time (sec) vs. number of threads for MS-queue, KP-queue, and fast WF with (MAX_FAILURES, HELPING_DELAY) = (0,0), (3,3), (10,10), (20,20)]
25
The impact of configuration parameters
[Chart: completion time (sec) vs. number of threads for fast WF (0,0) and fast WF (10,10); the pair denotes (MAX_FAILURES, HELPING_DELAY)]
26
The use of the slow path
[Charts: % of operations on the slow path vs. number of threads (1-57), shown separately for enqueue and dequeue; legend denotes (MAX_FAILURES, HELPING_DELAY)]
27
Tuning the performance parameters
- Why not just always use large values for both parameters (MAX_FAILURES, HELPING_DELAY)? They would (almost) always eliminate the slow path
- Lemma: the number of steps required for a thread to complete an operation on the queue in the worst case is O(MAX_FAILURES + HELPING_DELAY * n^2)
- → Tradeoff between average-case performance and the worst-case completion time bound
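To make the tradeoff concrete, the lemma's expression (ignoring the hidden constants of the O-bound) can be evaluated for illustrative parameter values; the numbers below are assumptions for illustration, not measurements from the talk:

```java
class WorstCaseBound {
    // Shape of the lemma's bound: O(MAX_FAILURES + HELPING_DELAY * n^2).
    // Returns the raw expression, ignoring constant factors.
    static long bound(long maxFailures, long helpingDelay, long n) {
        return maxFailures + helpingDelay * n * n;
    }
}
```

For example, with n = 32 threads, moving from (1, 1) to (10, 10) improves the average case (as the plots show) but grows the worst-case expression roughly tenfold, from 1 + 1 * 32^2 = 1,025 to 10 + 10 * 32^2 = 10,250.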
28
Summary
- A novel methodology for creating fast wait-free data structures
  - Key ideas: two execution paths + delayed helping
  - Good performance when the fast path is extensively utilized
  - Concurrent operations can proceed on both paths in parallel
- Can be used in other scenarios, e.g., running real-time and non-real-time threads side-by-side
29
Thank you! Questions?