Post on 11-Dec-2015

CAFÉ: Scalable Task Pool with Adjustable Fairness and Contention

Dmitry Basin, Rui Fan, Idit Keidar, Ofer Kiselov, Dmitri Perelman

Technion, Israel Institute of Technology

Task Pools

Exist in most server applications:

Web servers, e.g., as a building block of the SEDA architecture

Handling asynchronous requests

Ubiquitous programming pattern for parallel programs

Scalability is essential!

[Figure: producers and consumers accessing a task pool in shared memory]

Typical Implementation: FIFO Queue

Has inherent scalability problem

Do we really need FIFO? In many cases, no!

We would like to:

Relax the requirement

Control the degree of relaxation

[Figure: contention points of a FIFO queue in shared memory]

CAFÉ: Contention and Fairness Explorer

Ordered list of scalable bounded non-FIFO pools

TreeContainer size controls the contention-fairness trade-off

[Figure: a list of TreeContainers in the Java VM; TreeContainers that are no longer needed are garbage collected]

[Figure: tree-height axis. Height 0 is pure FIFO, with more fairness and more contention; larger heights give less fairness and less contention]

TreeContainer (TC) Specification

Bounded container

A put operation can fail if no free space is found

A get operation returns a task, or null if the TC is empty

Randomized algorithms for put and get operations
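The contract on this slide can be read as a small interface. The following is a minimal sketch and not the authors' code: the names `BoundedPool` and `RandomProbePool` are hypothetical, and a flat array probed at random stands in for the tree. It only illustrates that put can fail when no free space is found and that get returns null when nothing is found. Unlike a TreeContainer, this stand-in reuses slots after a get.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical interface mirroring the TreeContainer contract on the slide.
interface BoundedPool<T> {
    boolean put(T task); // false if no free space was found
    T get();             // a task, or null if the pool appears empty
}

// Illustrative stand-in only: a flat array probed from a random start, not the tree.
final class RandomProbePool<T> implements BoundedPool<T> {
    private final AtomicReferenceArray<T> slots;

    RandomProbePool(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        slots = new AtomicReferenceArray<>(capacity);
    }

    @Override
    public boolean put(T task) {
        if (task == null) throw new NullPointerException("null marks a free slot");
        int n = slots.length();
        int start = ThreadLocalRandom.current().nextInt(n);
        for (int i = 0; i < n; i++) {              // visit every slot once
            int idx = (start + i) % n;
            if (slots.compareAndSet(idx, null, task)) return true;
        }
        return false;                               // every slot was taken when visited
    }

    @Override
    public T get() {
        int n = slots.length();
        int start = ThreadLocalRandom.current().nextInt(n);
        for (int i = 0; i < n; i++) {
            int idx = (start + i) % n;
            T t = slots.get(idx);
            if (t != null && slots.compareAndSet(idx, t, null)) return t;
        }
        return null;                                // no task seen in one full scan
    }
}
```

A caller treats a `false` from `put` as "this container is full, try elsewhere" and a `null` from `get` as "nothing here right now", which is how the enclosing list of containers is described later in the deck.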

TreeContainer Data Structure

Complete binary tree

[Figure legend: free node; used node without a task; occupied node containing a task. Arrow annotations: right sub-tree doesn't contain tasks; left sub-tree has tasks]

Get/Put Operations

Step 1: find the target node by navigating the tree

Step 2: perform put/get on that node; need to handle races

Step 3: update routes; updating the meta-data on the path to the node is tricky!

Get() Operation

Get(): start from the root

Step 1: navigate to a task by a random walk on the arrows graph

Step 2: extract the task

Step 3: update routes up to the root; updates close to the root are rare

[Figure: get() on a TreeContainer, annotated with a CAS]

Put() Operation

put(): every level of the tree is implemented by an array of nodes

[Figure: TreeContainer with levels 0 to 3; put() of a task probes random nodes on the levels: three are occupied, one is free]

Put() Operation

Finished Step 1: found a free node

put(): go to the highest free predecessor

Step 2: occupy the free node (CAS operation)

[Figure: TreeContainer with levels 0 to 3; the random node probed is free]
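Steps 1 and 2 of put() can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm from the paper: the class `TreePutSketch` and its methods are hypothetical, one random node is probed per level from the root downwards, "highest free predecessor" is interpreted as climbing while the parent is free, and Step 3 (route updates) and the "used node without a task" state are omitted entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch: each tree level is an array of nodes; null means "free".
final class TreePutSketch {
    private final List<AtomicReferenceArray<Object>> levels = new ArrayList<>();

    TreePutSketch(int height) {
        for (int d = 0; d <= height; d++) {
            levels.add(new AtomicReferenceArray<Object>(1 << d)); // level d has 2^d nodes
        }
    }

    boolean put(Object task) {
        Objects.requireNonNull(task, "null marks a free node");
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int d = 0; d < levels.size(); d++) {
            int i = rnd.nextInt(1 << d);                    // Step 1: probe a random node
            if (levels.get(d).get(i) != null) continue;     // occupied: try the next level
            int cd = d, ci = i;
            // Assumed reading of the slide: climb while the parent node is free.
            while (cd > 0 && levels.get(cd - 1).get(ci >> 1) == null) {
                cd--;
                ci >>= 1;
            }
            // Step 2: occupy the chosen node with a single CAS.
            if (levels.get(cd).compareAndSet(ci, null, task)) return true;
            // CAS lost a race with another put: keep probing deeper levels.
        }
        return false;                                        // no free node was found
    }

    // Inspection helper for the example; not part of any real API.
    Object at(int level, int index) {
        return levels.get(level).get(index);
    }
}
```

With this reading, the first put into an empty tree lands on the root, and later puts spread over random nodes of the deeper levels, so concurrent producers rarely contend on the same node.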

Put() Operation

put(): update routes; close to the root, updates are rare

put() returns true

[Figure: TreeContainer with the updated routes]

Races

Concurrency issues are not trivial :)

Challenge: guarantee linearizability, and avoid updating all the metadata up to the root upon each operation

See the paper

TreeContainer properties

Put/Get operations are linearizable and wait-free

Under the worst-case thread scheduling:

Good step complexity of puts: when N nodes are occupied, O(log^2 N) with high probability (whp); does not depend on TC size

Good tree density (arbitrarily close to 2^h whp)

CAFÉ Data Structures

[Figure: a list of TCs, with the pointers GT and PT into the list]

When TC.Put(task) returns false: allocate and connect a new TC; TC.Put(task) on the new TC returns true

CAFÉ: get() from Empty TC

TC.Get on the empty TC returns null; TC.Get on the next TC returns a task

The empty TC is garbage collected

[Figure: list of TCs with PT and GT]
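The pointer movement on the last few slides can be sketched sequentially. This is a single-threaded illustration only, under stated assumptions: `CafeListSketch` and its inner `Container` are hypothetical names, the container is a plain bounded queue standing in for a TreeContainer, and PT and GT are read as the containers that puts and gets currently go to. It deliberately ignores concurrency, so it is exposed to exactly the kind of races the deck discusses next.

```java
import java.util.ArrayDeque;

// Sequential sketch of a list of bounded containers with PT and GT pointers.
final class CafeListSketch<T> {
    // Hypothetical stand-in for a TreeContainer: space is never reused after a get.
    private static final class Container<T> {
        final ArrayDeque<T> items = new ArrayDeque<>();
        final int capacity;
        int used;               // number of puts ever accepted
        Container<T> next;

        Container(int capacity) { this.capacity = capacity; }

        boolean put(T t) {
            if (used == capacity) return false;   // no free space
            used++;
            items.add(t);
            return true;
        }

        T get() { return items.poll(); }          // null if empty
    }

    private final int capacity;
    private Container<T> pt;    // container that puts currently go to
    private Container<T> gt;    // container that gets currently go to
    private int allocated = 1;

    CafeListSketch(int capacity) {
        this.capacity = capacity;
        pt = gt = new Container<>(capacity);
    }

    void put(T task) {
        while (!pt.put(task)) {                   // TC.Put(task) returned false
            if (pt.next == null) {                // allocate and connect a new container
                pt.next = new Container<>(capacity);
                allocated++;
            }
            pt = pt.next;
        }
    }

    T get() {
        while (true) {
            T t = gt.get();
            if (t != null) return t;
            if (gt == pt) return null;            // nothing further down the list
            gt = gt.next;                         // old container becomes garbage
        }
    }

    int containersAllocated() { return allocated; }
}
```

With capacity 2, three puts allocate a second container, and the gets drain the first container before GT advances to the second, which is the order-among-containers behaviour the fairness property refers to.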

CAFÉ: Races

[Figure: list of TCs with PT and GT, and a suspended producer thread]

The task is lost for consumers

CAFÉ: Handling Races – Try 1

Check if GT bypassed TC

Move GT back

[Figure: list of TCs with PT and GT]

CAFÉ: Races (2)

Consumer thread: TC.Get returns null; going to move GT forward

Producer thread: TC.Put(task) returns true; consumers can access the task, so the producer can terminate

Once the consumer moves GT forward, the task is lost

[Figure: list of TCs with PT and GT]

CAFÉ: Handling Races – Try 2

Read prev; if empty, read curr

[Figure: list of TCs with PT and GT; GT refers to a prev and a curr TC]

Lock-free. To make it wait-free we do additional tricks.

CAFÉ: Properties

Safety: Put()/Get() operations are linearizable

Wait-freedom: Get() operations are deterministically wait-free; Put() operations are wait-free with probability 1

Fairness: preserves order among trees

Evaluation Setup

Compared pools:

LBQ: Java 6 FIFO blocking queue

CLQ: Java 6 FIFO non-blocking queue (M&S)

EDQ: non-FIFO Elimination-Diffraction Tree Queue

Evaluation server: 8 AMD Opteron quad-cores, 32 cores in total
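A throughput comparison of this kind can be sketched with a small producer/consumer harness. The code below is an assumption-laden illustration, not the benchmark used in the talk: `PoolBenchSketch` and its parameters are hypothetical, LBQ and CLQ are presumed to correspond to `java.util.concurrent.LinkedBlockingQueue` and `ConcurrentLinkedQueue`, and only the non-blocking `offer`/`poll` methods of the standard `Queue` interface are used.

```java
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical harness: N producers and M consumers hammer one shared pool.
final class PoolBenchSketch {
    // Returns the number of tasks consumed and prints a rough throughput figure.
    static long run(Queue<Integer> pool, int producers, int consumers, int perProducer) {
        final long total = (long) producers * perProducer;
        final AtomicLong consumed = new AtomicLong();
        Thread[] threads = new Thread[producers + consumers];

        for (int p = 0; p < producers; p++) {
            threads[p] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    while (!pool.offer(i)) Thread.yield();   // retry if a bounded pool is full
                }
            });
        }
        for (int c = 0; c < consumers; c++) {
            threads[producers + c] = new Thread(() -> {
                while (consumed.get() < total) {             // stop once every task was taken
                    if (pool.poll() != null) consumed.incrementAndGet();
                    else Thread.yield();
                }
            });
        }

        long start = System.nanoTime();
        for (Thread t : threads) t.start();
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        }
        double elapsedMs = (System.nanoTime() - start) / 1e6;
        System.out.printf("%s: %.0f tasks/ms%n",
                pool.getClass().getSimpleName(), total / elapsedMs);
        return consumed.get();
    }
}
```

For example, `PoolBenchSketch.run(new java.util.concurrent.ConcurrentLinkedQueue<Integer>(), 4, 4, 100_000)` runs four producers against four consumers. Numbers from such a short run are only indicative, since they include thread start-up and JIT warm-up.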

CAFÉ evaluation: Throughput

[Figure: throughput as a function of the number of threads, for CAFÉ-13 (CAFÉ with tree height 13), LBQ, CLQ and EDQ]

Factor of 30 over lock-free implementations

CAFÉ evaluation: Throughput

[Figure: CAFÉ throughput as a function of TreeContainer height, compared with LBQ, CLQ and EDQ]

CAFÉ evaluation: CAS failures

[Figure: CAS failures per operation as a function of TreeContainer height]

Summary

CAFÉ:

Efficient

Wait-free

With adjustable fairness and contention

Thank you
