Lock Free Queue introduction

Lock Free Queue

Multi-thread

Multi-thread

Pessimistic lock

Multi-thread

Optimistic Lock

Scalability

NoSQL DB (e.g. MongoDB): sharding

Multi-core: lock-free queue

CAS

Compare and Swap/Set (cmpxchg on x86)

CAS compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value. This is done as a single atomic operation. The atomicity guarantees that the new value is calculated based on up-to-date information; if the value had been updated by another thread in the meantime, the write would fail.

int compare_and_swap (int* reg, int oldval, int newval)
{
    /* Semantics only: the hardware performs these steps as one atomic operation. */
    int old_reg_val = *reg;
    if (old_reg_val == oldval)
        *reg = newval;
    return old_reg_val;
}

CAS in C/C++

GCC:
    bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...)
    type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)

Windows:
    InterlockedCompareExchange ( __inout LONG volatile *Target,
                                 __in LONG Exchange,
                                 __in LONG Comperand);

C++11:
    template< class T >
    bool atomic_compare_exchange_weak( std::atomic<T>* obj,
                                       T* expected, T desired );

    template< class T >
    bool atomic_compare_exchange_weak( volatile std::atomic<T>* obj,
                                       T* expected, T desired );
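
The C++11 form is usually called in a retry loop. Below is a minimal sketch of that pattern, using the member function `compare_exchange_weak`; the helper name `atomic_max` is hypothetical and chosen only for illustration.

```cpp
#include <atomic>

// Hypothetical helper: atomically raise `target` to at least `candidate`.
// Returns true if this call changed the stored value.
bool atomic_max(std::atomic<int>& target, int candidate) {
    int observed = target.load();
    while (observed < candidate) {
        // On failure, compare_exchange_weak overwrites `observed` with the
        // current value, so the loop condition is re-checked on fresh data.
        if (target.compare_exchange_weak(observed, candidate)) {
            return true;
        }
    }
    return false;  // the stored value was already >= candidate
}
```

The weak form may fail spuriously, which is why it sits inside a loop; the strong form is the usual choice for a single attempt.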

Lock-free queue

List implementation

EnQueue(x)
{
    q = new record();
    q->value = x;
    q->next = NULL;
    do {
        p = tail;
    } while( CAS(p->next, NULL, q) != TRUE);
    CAS(tail, p, q); // why do we NOT care about the return value?
}
// Only one thread (say T1) succeeds in the CAS of the while loop; all the
// other threads fail and retry. Until T1 updates the tail pointer, tail
// still equals p, so T1's final CAS cannot fail. After T1 updates the tail
// pointer, one of the other threads can get the new tail pointer.

Lock-free queue

Enhancement

If thread T1 hangs before it updates the tail pointer, the other threads loop forever.

EnQueue(x)
{
    q = new record();
    q->value = x;
    q->next = NULL;
    p = tail;
    oldp = p;
    do {
        while (p->next != NULL)
            p = p->next;    // walk forward to the real last node
    } while( CAS(p->next, NULL, q) != TRUE);
    CAS(tail, oldp, q);
}

Lock-free queue

Dequeue

DeQueue()
{
    do {
        p = head; // head is a dummy node
        if (p->next == NULL) {
            return ERR_EMPTY_QUEUE;
        }
    } while( CAS(head, p, p->next) != TRUE );
    return p->next->value;
}
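
The enqueue and dequeue pseudocode above can be sketched in C++ with `std::atomic`. This is an illustration under stated assumptions, not a production queue: the class name `LockFreeQueue` is hypothetical, `T` must be default-constructible and copyable, and dequeued nodes are deliberately not freed until the destructor runs, which sidesteps memory reclamation and pointer reuse entirely.

```cpp
#include <atomic>
#include <utility>

template <typename T>
class LockFreeQueue {
    struct Node {
        T value{};
        std::atomic<Node*> next{nullptr};
    };
    Node* const first_;        // original dummy node, kept only for cleanup
    std::atomic<Node*> head_;  // always points at the current dummy node
    std::atomic<Node*> tail_;  // hint: the last node or one before it

public:
    LockFreeQueue() : first_(new Node), head_(first_), tail_(first_) {}
    LockFreeQueue(const LockFreeQueue&) = delete;
    LockFreeQueue& operator=(const LockFreeQueue&) = delete;

    // Assumption: no other thread uses the queue during destruction.
    ~LockFreeQueue() {
        for (Node* n = first_; n != nullptr;) {
            Node* next = n->next.load();
            delete n;
            n = next;
        }
    }

    void enqueue(T x) {
        Node* q = new Node;
        q->value = std::move(x);
        Node* const oldp = tail_.load();
        Node* p = oldp;
        Node* expected = nullptr;
        // Link q after the real last node; step forward when tail_ lags.
        while (!p->next.compare_exchange_weak(expected, q)) {
            if (expected != nullptr) p = expected;  // another node was appended
            expected = nullptr;
        }
        // Result ignored: on failure another thread already moved tail_ forward.
        Node* hint = oldp;
        tail_.compare_exchange_strong(hint, q);
    }

    // Returns false when the queue is empty, otherwise copies the value to out.
    bool dequeue(T& out) {
        Node* p = head_.load();
        for (;;) {
            Node* next = p->next.load();
            if (next == nullptr) return false;
            if (head_.compare_exchange_weak(p, next)) {
                out = next->value;  // next becomes the new dummy node
                return true;
            }
            // On failure p now holds the current head; retry.
        }
    }
};
```

Keeping every node alive until destruction trades memory for simplicity; a real implementation needs a reclamation scheme before it can free nodes early.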

CAS ABA issue

It is possible that, between the time the old value is read and the time CAS is attempted, some other processors or threads change the memory location two or more times such that it acquires a bit pattern which matches the old value. The problem arises if this new bit pattern, which looks exactly like the old value, has a different meaning.

CAS only compares the pointer address. What if this address is reused?
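
A minimal sketch of the effect, simulated on a single thread so the outcome is deterministic. The function name is hypothetical, and the two "threads" are just consecutive statements standing in for an unlucky interleaving.

```cpp
#include <atomic>

// Returns true if a CAS based on a stale read still succeeds after the
// location went A -> B -> A in the meantime.
bool stale_cas_succeeds_after_aba() {
    std::atomic<int> cell{'A'};
    int seen = cell.load();  // "thread 1" reads A and is then delayed
    cell.store('B');         // "thread 2" changes A -> B ...
    cell.store('A');         // ... and back to A
    // Thread 1 resumes: the bit pattern matches, so CAS cannot tell
    // that the location was modified twice.
    return cell.compare_exchange_strong(seen, 'C');
}
```

With plain integers this is harmless; with pointers, the second A may be a recycled address that now refers to a different node.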

ABA solution

Double-length CAS

On a 32-bit system, use a 64-bit CAS. The first half holds the pointer and the second half holds a counter. The compare part of the operation compares the previously read value of the pointer *and* the counter to the current pointer and counter. If they match, the swap occurs and the new value is written, but the new value has an incremented counter.
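
The idea can be sketched by packing a 32-bit value and a 32-bit counter into one 64-bit atomic word. The class `VersionedCell` and its member names are hypothetical; a 32-bit index stands in for the pointer so the sketch stays portable.

```cpp
#include <atomic>
#include <cstdint>

// Packs a 32-bit value and a 32-bit modification counter in one 64-bit word.
class VersionedCell {
    std::atomic<std::uint64_t> word_;

    static std::uint64_t pack(std::uint32_t value, std::uint32_t count) {
        return (static_cast<std::uint64_t>(count) << 32) | value;
    }

public:
    struct Snapshot {
        std::uint32_t value;
        std::uint32_t count;
    };

    explicit VersionedCell(std::uint32_t v) : word_(pack(v, 0)) {}

    Snapshot read() const {
        std::uint64_t w = word_.load();
        return {static_cast<std::uint32_t>(w),
                static_cast<std::uint32_t>(w >> 32)};
    }

    // Succeeds only if both value and counter still match `seen`.
    // Every successful swap stores an incremented counter.
    bool compare_and_swap(Snapshot seen, std::uint32_t desired) {
        std::uint64_t expected = pack(seen.value, seen.count);
        return word_.compare_exchange_strong(expected,
                                             pack(desired, seen.count + 1));
    }
};
```

A reader that took its snapshot before an A -> B -> A sequence now fails its CAS, because the counter has moved on even though the value looks the same.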

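Double-length CAS

SafeRead(q)
{
    loop:
    p = q->next;
    if (p == NULL) {
        return p;
    }
    Fetch&Add(p->refcnt, 1);    // take a reference on p
    if (p == q->next) {
        return p;               // q->next still points to p
    } else {
        Release(p);             // q->next changed: drop the reference and retry
    }
    goto loop;
}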

Lock-free queue in Disruptor

Ring-buffer implementation

sequence mod array length = array index

Only a tail pointer

It is faster: array-based, cache-friendly, pre-loaded, pre-allocated, no need to clean up
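
The index rule can be sketched as follows. The class `Ring` is hypothetical and is not the Disruptor API; this sketch additionally assumes a power-of-two length, which lets the modulo be computed with a bit mask.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size ring: slots are allocated once and then reused.
template <typename T, std::size_t N>
class Ring {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
    std::array<T, N> slots_{};

public:
    // sequence mod array length = array index; with a power-of-two
    // length the modulo reduces to a mask.
    static std::size_t index_of(std::uint64_t sequence) {
        return static_cast<std::size_t>(sequence & (N - 1));
    }

    // The sequence keeps growing; the slot it maps to wraps around.
    T& at(std::uint64_t sequence) { return slots_[index_of(sequence)]; }
};
```

For example, with `N = 8`, sequences 3 and 11 map to the same slot, which is why a slot must be consumed before the producer laps it.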

Lock-free queue

Add data to Disruptor

http://ifeve.com/disruptor-writing-ringbuffer/






Cache Line

Cache

A cache line is typically 64 bytes.

A Java long is 8 bytes, so 8 long variables fit in one cache line.

False sharing

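Two variables, one the head and the other the tail, can end up in the same cache line. Each write by one thread then invalidates that line for the other thread, even though the two threads never touch the same variable.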

False sharing

struct foo {
    int x;
    int y;
};

static struct foo f;

/* The two following functions are running concurrently: */

int sum_a(void) {
    int s = 0;
    int i;
    for (i = 0; i < 1000000; ++i)
        s += f.x;
    return s;
}

void inc_b(void) {
    int i;
    for (i = 0; i < 1000000; ++i)
        ++f.y;
}

Eliminate False sharing

Disruptor: cache line padding

public long p1, p2, p3, p4, p5, p6, p7; // cache line padding
private volatile long cursor = 0;
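
In C++ the same effect is usually obtained with alignment rather than manual padding fields. A minimal sketch, assuming a 64-byte cache line; the names `kCacheLine`, `PaddedCursor`, and `Cursors` are hypothetical.

```cpp
#include <atomic>
#include <cstddef>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size in bytes

// alignas forces each cursor onto its own cache line; the compiler
// pads the struct up to a multiple of its alignment.
struct PaddedCursor {
    alignas(kCacheLine) std::atomic<long> value{0};
};

// head and tail no longer share a line, so a writer of one does not
// invalidate the line holding the other.
struct Cursors {
    PaddedCursor head;
    PaddedCursor tail;
};

static_assert(sizeof(PaddedCursor) == kCacheLine,
              "each cursor occupies exactly one assumed cache line");
```

The cost is memory: each cursor now takes a full line instead of 8 bytes, which is acceptable for a handful of hot variables.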

http://www.drdobbs.com/parallel/eliminate-false-sharing/217500206

http://ifeve.com/false-sharing/

http://ifeve.com/volatile/

Memory Barrier

A memory barrier is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that certain operations are guaranteed to be performed before the barrier, and others after.

Memory Barrier

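The compiler and the CPU may reorder instructions to improve performance, as long as the output stays the same.

A memory barrier forces the caches of the different CPUs to be updated.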
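
A minimal sketch of barriers in C++, using `std::atomic_thread_fence` to publish plain data from one thread to another. The function name `publish_and_read` is hypothetical, and the example assumes the program is built with thread support.

```cpp
#include <atomic>
#include <thread>

// Returns the payload as observed by the reader thread.
int publish_and_read() {
    int payload = 0;  // plain, non-atomic data
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread writer([&] {
        payload = 42;
        // Release fence: the payload write cannot be reordered after
        // the flag store that follows.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    std::thread reader([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Acquire fence: the payload read cannot be reordered before
        // the flag load that preceded it.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;
    });

    writer.join();
    reader.join();
    return seen;
}
```

Without the two fences the relaxed flag would give no ordering guarantee, and reading `payload` in the reader would be a data race.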

volatile

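For a volatile field, the Java memory model inserts a write-barrier instruction after each write and a read-barrier instruction before each read.

Once you have completed a write, any thread that accesses the field gets the latest value. Before you write, everything that happened earlier is guaranteed to have happened, and any updated data values are visible as well, because the memory barrier flushes the earlier writes to the cache.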

http://hedengcheng.com/?p=725



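Summary

Term: Description

Shared variable: A variable that can be shared among multiple threads. Shared variables include all instance variables, static variables, and array elements. They are all stored in heap memory. volatile applies only to shared variables.

Memory barriers: A set of processor instructions used to enforce ordering constraints on memory operations.

Cache line: The smallest unit of storage that can be allocated in the cache. When the processor fills a cache line it loads the whole line, which takes multiple main-memory read cycles.

Atomic operations: An operation, or a series of operations, that cannot be interrupted.

Cache line fill: When the processor recognizes that an operand read from memory is cacheable, it reads the whole cache line into the appropriate cache (L1, L2, L3, or all of them).

Cache hit: If the memory location of a cache line fill is still the address the processor accesses next, the processor reads the operand from the cache instead of from memory.

Write hit: When the processor writes an operand back to a cached region of memory, it first checks whether the cached memory address is in a cache line. If a valid cache line exists, the processor writes the operand to the cache instead of to memory. This operation is called a write hit.

Write miss (write misses the cache): A valid cache line is written to a memory region that does not exist.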

http://ifeve.com/disruptor/ - Disruptor

Thanks


Concurrency vs parallelism
