Lock Free Queue introduction
Lock Free Queue
Multi-thread
Pessimistic lock
Multi-thread
Optimistic Lock
Scalability
MongoDB on NoSQL DB – sharding
Multi-core – lock-free queue
CAS
Compare and Swap/Set (cmpxchg on x86)
It compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value. This is done as a single atomic operation. The atomicity guarantees that the new value is calculated based on up-to-date information; if the value had been updated by another thread in the meantime, the write would fail.
/* Semantics only: the hardware performs the whole body as one atomic step. */
int compare_and_swap (int* reg, int oldval, int newval)
{
    int old_reg_val = *reg;
    if (old_reg_val == oldval)
        *reg = newval;
    return old_reg_val;
}
CAS in C/C++
GCC
    bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...)
    type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)
Windows
    LONG InterlockedCompareExchange ( __inout LONG volatile *Target,
                                      __in LONG Exchange,
                                      __in LONG Comperand);
C++11
    template< class T >
    bool atomic_compare_exchange_weak( std::atomic<T>* obj,
                                       T* expected, T desired );
    template< class T >
    bool atomic_compare_exchange_weak( volatile std::atomic<T>* obj,
                                       T* expected, T desired );
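The following is a minimal sketch of how the C++11 interface maps onto the compare_and_swap pseudocode, and of the retry loop that CAS is normally used in. The helper names `cas_increment` and `run_counter_demo` are made up for illustration and are not part of any library.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Same contract as the compare_and_swap pseudocode: returns the value that
// was in `reg` before the call. The swap happened only if it equals oldval.
int compare_and_swap(std::atomic<int>& reg, int oldval, int newval) {
    int expected = oldval;
    // On failure, compare_exchange_strong stores the current value of `reg`
    // into `expected`; on success `expected` still holds oldval.
    reg.compare_exchange_strong(expected, newval);
    return expected;
}

// Lock-free increment built from a CAS retry loop (illustrative helper).
void cas_increment(std::atomic<int>& counter) {
    int seen = counter.load();
    // compare_exchange_weak may fail spuriously, so it belongs in a loop.
    // Each failure refreshes `seen` with the latest value before the retry.
    while (!counter.compare_exchange_weak(seen, seen + 1)) {
    }
}

// Hypothetical driver: `threads` workers each increment `per_thread` times.
int run_counter_demo(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) cas_increment(counter);
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```

No increment is lost because a thread only writes `seen + 1` when the counter still holds `seen`; otherwise it retries with the fresh value.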
Lock-free queue List implementation
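EnQueue(x)
{
    q = new record();
    q->value = x;
    q->next = NULL;
    do {
        p = tail;   // re-read the tail pointer on every attempt
    } while( CAS(p->next, NULL, q) != TRUE );  // link q after the last node
    CAS(tail, p, q);  // why do we NOT care about the return value?
}
// When the CAS in the while loop succeeds in thread T1, it fails in all the
// other threads, because p->next is no longer NULL. Only T1 can then move
// tail from p to q, so its second CAS cannot fail. After T1 updates the tail
// pointer, one of the other threads reads the new tail pointer and succeeds.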
Lock-free queue
Enhancement
If thread T1 hangs before it updates the tail pointer, all the other threads loop forever. To avoid this, each thread walks forward from tail to the real last node itself:
EnQueue(x) {
    q = new record();
    q->value = x;
    q->next = NULL;
    p = tail; oldp = p;
    do {
        while (p->next != NULL)
            p = p->next;
    } while( CAS(p->next, NULL, q) != TRUE );
    CAS(tail, oldp, q);
}
Lock-free queue
Dequeue
DeQueue() {
    do {
        p = head;  // head is a dummy node
        if (p->next == NULL) {
            return ERR_EMPTY_QUEUE;
        }
    } while( CAS(head, p, p->next) != TRUE );
    return p->next->value;
}
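The enqueue and dequeue pseudocode can be sketched in C++11 roughly as follows. This is an illustration under a strong simplifying assumption: nodes are freed only in the destructor, never while the queue is in use, so no thread can touch freed memory and no address is reused. A production queue needs a real reclamation scheme. The class name `LockFreeQueue` and the driver `run_queue_demo` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class LockFreeQueue {
    struct Node {
        int value = 0;
        std::atomic<Node*> next{nullptr};
    };
    Node* const first_;        // very first dummy node, kept for cleanup
    std::atomic<Node*> head_;  // current dummy node
    std::atomic<Node*> tail_;  // last node, or a node close to it

public:
    LockFreeQueue() : first_(new Node), head_(first_), tail_(first_) {}
    LockFreeQueue(const LockFreeQueue&) = delete;
    LockFreeQueue& operator=(const LockFreeQueue&) = delete;

    ~LockFreeQueue() {
        // next pointers are never rewritten, so this chain holds every node.
        for (Node* n = first_; n != nullptr;) {
            Node* next = n->next.load();
            delete n;
            n = next;
        }
    }

    void enqueue(int x) {
        Node* q = new Node;
        q->value = x;
        Node* p = tail_.load();
        Node* oldp = p;
        for (;;) {
            // tail may lag behind, so walk forward to the real last node.
            for (Node* n = p->next.load(); n != nullptr; n = p->next.load()) p = n;
            Node* expected = nullptr;
            if (p->next.compare_exchange_weak(expected, q)) break;  // linked
        }
        // Best effort: if this fails, another thread already moved tail.
        tail_.compare_exchange_strong(oldp, q);
    }

    bool try_dequeue(int& out) {
        Node* p = head_.load();
        for (;;) {
            Node* next = p->next.load();
            if (next == nullptr) return false;  // only the dummy is left
            // On failure, p is refreshed with the current head.
            if (head_.compare_exchange_weak(p, next)) {
                out = next->value;  // next becomes the new dummy node
                return true;
            }
        }
    }
};

// Hypothetical driver: producers enqueue 1..per_thread concurrently, then
// the caller drains the queue and returns the sum of everything dequeued.
long run_queue_demo(int producers, int per_thread) {
    assert(producers > 0 && per_thread >= 0);
    LockFreeQueue queue;
    std::vector<std::thread> workers;
    for (int t = 0; t < producers; ++t) {
        workers.emplace_back([&queue, per_thread] {
            for (int i = 1; i <= per_thread; ++i) queue.enqueue(i);
        });
    }
    for (auto& w : workers) w.join();
    long sum = 0;
    for (int v = 0; queue.try_dequeue(v);) sum += v;
    return sum;
}
```

Deferring all frees to the destructor trades unbounded memory growth for simplicity; it is what keeps this sketch safe without extra machinery.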
CAS ABA issue
It is possible that, between the time the old value is read and the time the CAS is attempted, some other processors or threads change the memory location two or more times, so that it ends up with a bit pattern that matches the old value. The problem arises if this new bit pattern, which looks exactly like the old value, has a different meaning.
CAS only compares the pointer address. What if that address has been reused?
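To make the sequence concrete, here is a small single-threaded replay of an A -> B -> A history. The two "threads" are simulated by statement order, and the function name `stale_cas_succeeds` is invented for this sketch.

```cpp
#include <atomic>
#include <cassert>

// Returns true if a stale CAS succeeds after the value went A -> B -> A.
bool stale_cas_succeeds() {
    const int A = 1, B = 2, C = 3;
    std::atomic<int> cell{A};

    int seen = cell.load();  // "thread 1" reads A and is then delayed
    assert(seen == A);

    cell.store(B);           // "thread 2" changes A -> B ...
    cell.store(A);           // ... and back to A: same bits, new history

    // Thread 1 resumes. CAS compares bit patterns only, so it cannot tell
    // that the cell was modified twice in between.
    return cell.compare_exchange_strong(seen, C);
}
```

With plain integers this is harmless. With pointers, the second A may be a recycled address that now refers to a different node, which is where the damage comes from.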
ABA solution
Double-length CAS
On a 32-bit system, use a 64-bit CAS. The second half holds a counter. The compare part of the operation compares the previously read pointer *and* counter with the current pointer and counter. If they match, the swap occurs and the new value is written, with the counter incremented.
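A rough sketch of the pointer-plus-counter idea follows. As an assumption for illustration, the "pointer" is a 32-bit slot index rather than a real address, so that pointer and counter fit in one 64-bit atomic word; `pack`, `tagged_cas`, and the other names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Low 32 bits: the "pointer" (here a slot index). High 32 bits: the counter.
inline std::uint64_t pack(std::uint32_t ptr, std::uint32_t count) {
    return (static_cast<std::uint64_t>(count) << 32) | ptr;
}
inline std::uint32_t ptr_of(std::uint64_t w) { return static_cast<std::uint32_t>(w); }
inline std::uint32_t count_of(std::uint64_t w) { return static_cast<std::uint32_t>(w >> 32); }

// Swap in new_ptr only if BOTH halves still match `seen`; every successful
// swap also increments the counter.
inline bool tagged_cas(std::atomic<std::uint64_t>& word, std::uint64_t seen,
                       std::uint32_t new_ptr) {
    std::uint64_t desired = pack(new_ptr, count_of(seen) + 1);
    return word.compare_exchange_strong(seen, desired);
}

// Replays an A -> B -> A history; returns true if the stale CAS is rejected.
bool stale_tagged_cas_is_rejected() {
    std::atomic<std::uint64_t> word{pack(7, 0)};
    std::uint64_t seen = word.load();             // reader sees slot 7, count 0
    bool ok1 = tagged_cas(word, word.load(), 9);  // 7 -> 9, count becomes 1
    bool ok2 = tagged_cas(word, word.load(), 7);  // 9 -> 7, count becomes 2
    assert(ok1 && ok2 && ptr_of(word.load()) == 7);
    (void)ok1; (void)ok2;
    return !tagged_cas(word, seen, 5);            // count mismatch: 0 vs 2
}
```

The pointer half is back to its old value, but the counter half is not, so the stale compare fails.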
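Double-length CAS
SafeRead(q)
{
loop:
    p = q->next;
    if (p == NULL) {
        return p;
    }
    Fetch&Add(p->refcnt, 1);   // take a reference on p
    if (p == q->next) {        // q->next is unchanged: the reference is valid
        return p;
    } else {
        Release(p);            // q->next changed: drop the reference, retry
    }
    goto loop;
}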
Lock-free queue in Disruptor
Ring-buffer implementation
sequence mod array length = array index
Only a tail pointer
It is faster: the array is cache-friendly, and its entries are pre-allocated and pre-loaded, so there is nothing to clean up.
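The sequence-to-index mapping can be sketched with a small ring buffer. This is not the Disruptor's API: it is a simplified single-producer, single-consumer ring in C++ that, unlike the Disruptor, also keeps the consumer's sequence inside the ring. `SpscRing` and its method names are assumptions for illustration.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

template <typename T, std::size_t N>
class SpscRing {
    static_assert(N > 0, "ring must have at least one slot");
    std::array<T, N> slots_{};                 // pre-allocated, reused forever
    std::atomic<std::uint64_t> write_seq_{0};  // next sequence to publish
    std::atomic<std::uint64_t> read_seq_{0};   // next sequence to consume

    // sequence mod array length = array index
    // (with a power-of-two N this reduces to a bit mask)
    static std::size_t index_of(std::uint64_t seq) { return seq % N; }

public:
    // Call from the single producer thread only.
    bool try_publish(const T& v) {
        std::uint64_t w = write_seq_.load(std::memory_order_relaxed);
        std::uint64_t r = read_seq_.load(std::memory_order_acquire);
        assert(w - r <= N);
        if (w - r == N) return false;          // full: would overwrite unread data
        slots_[index_of(w)] = v;
        write_seq_.store(w + 1, std::memory_order_release);  // make slot visible
        return true;
    }

    // Call from the single consumer thread only.
    bool try_consume(T& out) {
        std::uint64_t r = read_seq_.load(std::memory_order_relaxed);
        if (r == write_seq_.load(std::memory_order_acquire)) return false;  // empty
        out = slots_[index_of(r)];
        read_seq_.store(r + 1, std::memory_order_release);   // free the slot
        return true;
    }
};
```

Sequences only ever grow; slots are overwritten in place, so nothing is allocated or freed after construction.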
Lock-free queue
Add data to Disruptor
http://ifeve.com/disruptor-writing-ringbuffer/
Cache Line
Cache
A cache line is typically 64 bytes.
A Java long is 8 bytes, so 8 long variables fit in one cache line.
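The arithmetic can be checked with a few lines of C++. The 64-byte line size is an assumption (typical, not universal), and `cache_line_of`, `same_cache_line`, and `LongBlock` are names made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLineBytes = 64;  // assumed line size

// Eight 8-byte values fit in one 64-byte line.
static_assert(kCacheLineBytes / sizeof(std::int64_t) == 8, "8 longs per line");

// Index of the cache line that holds address p.
inline std::uintptr_t cache_line_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLineBytes;
}

inline bool same_cache_line(const void* a, const void* b) {
    return cache_line_of(a) == cache_line_of(b);
}

// A line-aligned array of 8-byte integers spanning exactly two lines.
struct alignas(kCacheLineBytes) LongBlock {
    std::int64_t values[16];
};

// Usage: elements 0 and 7 share a line; element 8 starts the next one.
inline void demo_cache_lines() {
    LongBlock b{};
    assert(same_cache_line(&b.values[0], &b.values[7]));
    assert(!same_cache_line(&b.values[0], &b.values[8]));
}
```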
False sharing
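Two variables, for example head and tail, sit in the same cache line. A write to one of them by one core invalidates the whole line for a core using the other, even though the threads share no data.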
False sharing
struct foo {
    int x;
    int y;
};
static struct foo f;
/* The two following functions are running concurrently: */
int sum_a(void) {
    int s = 0;
    int i;
    for (i = 0; i < 1000000; ++i)
        s += f.x;
    return s;
}
void inc_b(void) {
    int i;
    for (i = 0; i < 1000000; ++i)
        ++f.y;
}
Eliminate False sharing
Disruptor – cache line padding
public long p1, p2, p3, p4, p5, p6, p7; // cache line padding
private volatile long cursor = 0;
http://www.drdobbs.com/parallel/eliminate-false-sharing/217500206
http://ifeve.com/false-sharing/
http://ifeve.com/volatile/
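In C++ the same padding effect is usually obtained with alignment rather than dummy fields. The sketch below assumes a 64-byte cache line; the struct names and the `counter_gap` helper are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kCacheLineBytes = 64;  // assumed line size

// Unpadded: head and tail are adjacent and will usually share a line.
struct SharedLine {
    std::atomic<long> head{0};
    std::atomic<long> tail{0};
};

// Padded: alignas pushes each counter onto its own cache line.
struct SeparateLines {
    alignas(kCacheLineBytes) std::atomic<long> head{0};
    alignas(kCacheLineBytes) std::atomic<long> tail{0};
};

static_assert(sizeof(SeparateLines) == 2 * kCacheLineBytes, "one line per counter");
static_assert(alignof(SeparateLines) == kCacheLineBytes, "line aligned");

// Distance in bytes between the two counters of a layout.
template <typename Layout>
std::size_t counter_gap() {
    Layout l;
    const char* h = reinterpret_cast<const char*>(&l.head);
    const char* t = reinterpret_cast<const char*>(&l.tail);
    assert(t > h);  // tail is declared after head
    return static_cast<std::size_t>(t - h);
}
```

The cost is memory: the padded struct occupies two full lines for two counters, which is normally a good trade for variables written by different threads.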
Memory Barrier
A memory barrier is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued before the barrier are guaranteed to be performed before operations issued after it.
Memory Barrier
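Compilers and CPUs may reorder instructions to improve performance, as long as the output stays the same.
A memory barrier also forces the caches of the different CPUs to be updated.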
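As an illustration in C++11, where barriers are spelled `std::atomic_thread_fence`, the sketch below passes a plain integer from one thread to another. The function name `run_fence_demo` and the variable names are invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns the payload the consumer observes; the fence pair guarantees 42.
int run_fence_demo() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag used to signal the consumer
    int observed = -1;

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Acquire fence: the read of payload below cannot move above it.
        std::atomic_thread_fence(std::memory_order_acquire);
        observed = payload;
    });

    payload = 42;
    // Release fence: the write to payload above cannot move below it.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);

    consumer.join();
    assert(observed == 42);
    return observed;
}
```

Without the two fences, the relaxed flag operations alone would not order the payload accesses, and the read of `payload` would be a data race.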
volatile
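For a volatile field, the Java memory model inserts a write barrier instruction after each write and a read barrier instruction before each read.
Once you finish a write, any thread that accesses the field gets the latest value. Before your write, everything that happened earlier is guaranteed to have happened, and any updated data values are visible as well, because the memory barrier flushes all earlier writes to the cache.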
http://hedengcheng.com/?p=725
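Summary
Term | Description
Shared variable | A variable that can be shared among multiple threads. Shared variables include all instance variables, static variables, and array elements. They are all stored in heap memory. volatile applies only to shared variables.
Memory barriers | A set of processor instructions used to enforce ordering constraints on memory operations.
Cache line | The smallest unit of storage that can be allocated in a cache. When the processor fills a cache line it loads the whole line, which takes several main-memory read cycles.
Atomic operations | An operation, or a series of operations, that cannot be interrupted.
Cache line fill | When the processor recognizes that an operand being read from memory is cacheable, it reads the entire cache line into the appropriate cache (L1, L2, L3, or all of them).
Cache hit | If the memory location from a cache line fill is still the address the processor accesses next, the processor reads the operand from the cache instead of from memory.
Write hit | When the processor writes an operand back to a cacheable area of memory, it first checks whether that memory address is in a cache line. If a valid cache line exists, the processor writes the operand to the cache instead of to memory. This operation is called a write hit.
Write miss (write misses the cache) | A write to an area of memory for which no valid cache line is present.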
http://ifeve.com/disruptor/ - Disruptor
Thanks