The C++11 Memory Model

The C++11 Memory Model. CDP 2012. Based on “C++ Concurrency In Action” by Anthony Williams and The C++11 Memory Model and GCC Wiki. Created by Eran Gilad.

Page 1: The C++11  Memory Model

The C++11 Memory Model

CDP 2012

Based on “C++ Concurrency In Action” by Anthony Williams and The C++11 Memory Model and GCC Wiki

Created by Eran Gilad

Page 2: The C++11  Memory Model

Reminder: what is a memory model?

The guarantees provided by the runtime environment to a multithreaded program regarding memory operations.

Each level of the environment (CPU, virtual machine, language) might have a different memory model.

The correctness of parallel algorithms depends on the memory model.

Page 3: The C++11  Memory Model

Why C++ needs a memory model

Isn’t the CPU memory model enough?

Threads are now part of the language. Their behavior must be fully defined.

Until now, different code was required for every compiler, OS and CPU combination.

Having a standard guarantees portability and simplifies the programmer’s work.

Page 4: The C++11  Memory Model

C++ in the past and now

Given the following data race:

    Thread 1: x = 10;
    Thread 2: cout << x;

Pre-2011 C++: Huh? What’s a “Thread”? Namely, the behavior is unspecified.

C++11: Undefined behavior. But now there’s a way to get it right.
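One way to "get it right" for the race above is to make x a std::atomic<int>. The sketch below is only an illustration under that assumption; the function name race_free_read is made up here and does not come from the slides.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0};  // the shared variable from the slide, now atomic

// Runs the slide's two threads and returns the value thread 2 observed.
// The name race_free_read is hypothetical.
int race_free_read() {
    int seen = -1;
    std::thread t1([] { x = 10; });          // atomic store (seq_cst by default)
    std::thread t2([&seen] { seen = x; });   // atomic load
    t1.join();
    t2.join();
    // Either outcome is legal; what is ruled out is undefined behavior.
    assert(seen == 0 || seen == 10);
    return seen;
}
```

Which of the two values is printed still depends on scheduling, but the program is now well defined.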

Page 5: The C++11  Memory Model

Why Undefined Behavior matters

    if (x >= 0 && x < 3) {
        switch (x) {
            case 0: do0(); break;
            case 1: do1(); break;
            case 2: do2(); break;
        }
    }

Can the compiler convert the switch to a quick jump table, based on x’s value?

Between the range check and the switch, on some other thread:

    x = 8;

The switch jumps to an unknown location. Your monitor catches fire.

Optimizations that are safe on sequential programs might not be safe on parallel ones. So, races are forbidden!
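A possible race-free version of the switch, assuming x is shared and therefore declared as std::atomic<int>: the value is loaded once into a local, so the range check and the switch are guaranteed to see the same value. The handlers do0/do1/do2 are stand-ins that return an int so the result can be inspected; dispatch is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> x{0};

// Stand-ins for the slide's do0/do1/do2 (return values are arbitrary).
int do0() { return 100; }
int do1() { return 101; }
int do2() { return 102; }

// Returns the handler's result, or -1 if the value was out of range.
int dispatch() {
    const int v = x.load();   // one atomic load; v cannot change afterwards
    if (v >= 0 && v < 3) {
        switch (v) {          // a jump table indexed by v is safe here
            case 0: return do0();
            case 1: return do1();
            case 2: return do2();
            default: assert(false);  // unreachable: v was range-checked above
        }
    }
    return -1;
}
```

Another thread may store 8 into x at any time; the worst that happens is that dispatch() returns -1 on its next call.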

Page 6: The C++11  Memory Model

Dangerous optimizations

Optimizations are crucial for increasing performance, but might be dangerous:

Compiler optimizations:
◦ Reordering to hide load latencies
◦ Using registers to avoid repeating loads and stores

CPU optimizations:
◦ Loose cache coherence protocols
◦ Out-of-order execution

A thread must appear to execute serially to other threads. Optimizations must not be visible to other threads.
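The register optimization listed above is easiest to see on a stop flag. The following is a rough sketch, not taken from the slides: with a plain bool the compiler could load the flag once and spin forever, while std::atomic<bool> forces a real load on every iteration. The names done, wait_until_done and run_flag_demo are invented for the illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> done{false};

// Spins until another thread sets `done`; returns the number of polls.
// With a plain `bool done`, the compiler could legally keep the flag in a
// register and never observe the update (and the access would be a race).
long wait_until_done() {
    long polls = 0;
    while (!done.load()) {   // every iteration performs an atomic load
        ++polls;
    }
    return polls;
}

// Hypothetical driver: starts a waiter, then raises the flag.
bool run_flag_demo() {
    done = false;
    long polls = -1;
    std::thread waiter([&polls] { polls = wait_until_done(); });
    done.store(true);
    waiter.join();
    assert(polls >= 0);      // the waiter terminated and reported a count
    return polls >= 0;
}
```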

Page 7: The C++11  Memory Model

Memory layout definitions

Every variable of a scalar (simple) type occupies one memory location.

Bit fields are different: adjacent bit fields share a single memory location.

One memory location must not be affected by writes to an adjacent memory location!

    struct s {
        char c[4];   // can’t read and write all 4 bytes when changing just one
        int i: 3, j: 4;   // same memory location
        struct in { double d; } id;   // one memory location, even if hardware has no 64-bit atomic ops
    };
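As a small illustration of the rule above (a sketch with a made-up function name, adjacent_writes): two threads may write neighbouring array elements without any synchronization, because each char is its own memory location. Doing the same with the bit fields i and j would be a data race.

```cpp
#include <cassert>
#include <thread>

struct S {
    char c[4];          // four separate memory locations
    int i : 3, j : 4;   // adjacent bit fields: one shared memory location
};

// Two threads write adjacent array elements with no lock and no atomics.
// This is race-free because c[0] and c[1] are distinct memory locations.
bool adjacent_writes() {
    S s{};
    std::thread t1([&s] { s.c[0] = 'a'; });
    std::thread t2([&s] { s.c[1] = 'b'; });
    t1.join();
    t2.join();
    assert(s.c[0] == 'a' && s.c[1] == 'b');  // neither write disturbed the other
    return s.c[0] == 'a' && s.c[1] == 'b';
}
```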

Page 8: The C++11  Memory Model

Order definitions

Sequenced-before: the order imposed by the appearance of (most) statements in the code, for a single thread.

Synchronizes-with: the order imposed by an atomic read of a value that has been atomically written by some other thread.

Inter-thread happens-before: a combination of the above.

An event A happens-before an event B if:
◦ A inter-thread happens-before B, or
◦ A is sequenced-before B.

Page 9: The C++11  Memory Model

Order definitions (2)

Thread 1:
    (1.1) read x, 1
        sequenced-before
    (1.2) write y, 2

Thread 2:
    (2.1) read y, 2
        sequenced-before
    (2.2) write z, 3

(1.2) synchronizes-with (2.1).

(1.1) inter-thread happens-before (2.2).

The full happens-before order: 1.1 < 1.2 < 2.1 < 2.2
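The trace above can be turned into code roughly as follows. This is a sketch under two assumptions: y is a std::atomic<int>, and step (2.2) writes to x rather than z, a deliberate change that makes the ordering matter. Because (1.1) happens-before (2.2), the plain read and the plain write of x do not race. The name ordered_trace is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int x = 1;               // plain, non-atomic
std::atomic<int> y{0};

// Returns the value thread 1 read from x at step (1.1).
int ordered_trace() {
    x = 1;
    y = 0;
    int r1 = 0;
    std::thread t1([&r1] {
        r1 = x;          // (1.1) sequenced-before (1.2)
        y.store(2);      // (1.2)
    });
    std::thread t2([] {
        while (y.load() != 2) {}  // (2.1) reads the value written by (1.2)
        x = 3;           // (2.2) ordered after (1.1), so no data race on x
    });
    t1.join();
    t2.join();
    assert(r1 == 1);     // (1.1) can never observe the write made at (2.2)
    return r1;
}
```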

Page 10: The C++11  Memory Model

Threading API in a nutshell

Various classes: thread, mutex, condition_variable (for wait/notify) etc.

The API is deliberately not very rich, to allow portability (native_handle() allows lower-level ops).

RAII is used extensively, since C++ exceptions have no finally clause.

    #include <mutex>
    #include <thread>

    std::mutex m;

    void f() {
        std::lock_guard<std::mutex> lock(m);
        // do stuff
        // auto unlock
    }

    int main() {
        m.lock();
        std::thread t(f);
        // do stuff
        m.unlock();
        t.join();
    }
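The slide mentions condition_variable for wait/notify without showing it. A minimal sketch of that pattern follows; the names ready, payload and wait_notify_demo, and the value 42, are invented for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool ready = false;   // protected by m
int payload = 0;      // protected by m

// Hypothetical producer/consumer pair; returns what the consumer received.
int wait_notify_demo() {
    int received = 0;
    std::thread consumer([&received] {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return ready; });  // releases m while blocked
        received = payload;
    });                                       // lock released by RAII in the thread
    {
        std::lock_guard<std::mutex> lock(m);  // RAII: unlocked at end of scope
        payload = 42;
        ready = true;
    }
    cv.notify_one();
    consumer.join();
    assert(received == 42);
    return received;
}
```

The predicate overload of wait() re-checks ready after every wake-up, so spurious wake-ups and a notification sent before the consumer starts waiting are both handled.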

Page 11: The C++11  Memory Model

std::atomic

The class to use when writing lock-free code!

A template wrapper around various types, providing access that is:
◦ Atomic: reads and writes are done as a whole.
◦ Ordered with respect to other accesses to the variable (or others).

The language offers specializations for basic types (bool, numbers, pointers). The programmer can also wrap a custom type, provided it is trivially copyable.

Depending on the target platform, operations can be lock-free, or protected by a mutex.
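A short sketch of both kinds of wrapper, a built-in specialization and a user-defined type. Point, position, counter and move_to are illustrative names only. Whether each object is lock-free is platform dependent, which is why the code asks is_lock_free() instead of assuming an answer; on some toolchains a large or oddly aligned type may also need the platform's atomic support library at link time.

```cpp
#include <atomic>
#include <cassert>

// A trivially copyable user type, as std::atomic<T> requires.
struct Point { int x; int y; };

std::atomic<int> counter{0};
std::atomic<Point> position{Point{0, 0}};

// Stores and reloads a whole Point atomically; returns x + y of the copy.
int move_to(int x, int y) {
    position.store(Point{x, y});   // written as a whole
    Point p = position.load();     // read as a whole
    assert(p.x == x && p.y == y);  // holds here because no other thread writes
    return p.x + p.y;
}

// Whether the operations are lock-free depends on the target platform.
bool int_is_lock_free()   { return counter.is_lock_free(); }
bool point_is_lock_free() { return position.is_lock_free(); }
```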

Page 12: The C++11  Memory Model

std::atomic (2)

The class offers member functions and operators for atomic loads and stores.

Atomic compare-exchange is also available.

Some specializations offer arithmetic operations such as ++, which are in fact executed atomically.

All operations guarantee sequential consistency by default. However, other options are possible as well.

Using atomic, the C++ memory model supports various orderings of memory operations.
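The operations listed above, in one small sketch: store/load, an atomic ++, and compare_exchange_strong used to let exactly one thread claim a slot. The names hits, owner and atomic_ops_demo are assumptions made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> hits{0};
std::atomic<int> owner{0};   // 0 means "unclaimed"

// Each of n threads increments hits 1000 times and tries once to claim owner.
// Returns the number of threads whose compare-exchange succeeded.
int atomic_ops_demo(int n) {
    hits.store(0);
    owner.store(0);
    std::atomic<int> winners{0};
    std::vector<std::thread> pool;
    for (int id = 1; id <= n; ++id) {
        pool.emplace_back([id, &winners] {
            for (int k = 0; k < 1000; ++k) ++hits;   // atomic read-modify-write
            int expected = 0;
            // Succeeds only for the first thread that finds owner == 0.
            if (owner.compare_exchange_strong(expected, id)) ++winners;
        });
    }
    for (auto& t : pool) t.join();
    assert(hits.load() == n * 1000);   // no increment was lost
    return winners.load();
}
```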

Page 13: The C++11  Memory Model

Using atomics – sequential consistency

SC is the strongest order. It is the most natural to program with, and is therefore the default (you can also specify memory_order_seq_cst explicitly).

All processors see the exact same order of atomic writes.

Compiler optimizations are limited: no action can be reordered past an atomic action.

Atomic writes flush non-atomic writes that are sequenced-before them.
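The "same order for everyone" guarantee can be illustrated with the classic store-buffering test, sketched below with hypothetical names (a, b, store_buffering). Each thread writes one variable and then reads the other. Under sequential consistency one of the two stores comes first in the single global order, so at least one thread must read 1.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> a{0}, b{0};

// Returns r1 + r2, i.e. how many of the two threads saw the other's store.
int store_buffering() {
    a = 0;
    b = 0;
    int r1 = -1, r2 = -1;
    std::thread t1([&r1] { a.store(1); r1 = b.load(); });  // seq_cst by default
    std::thread t2([&r2] { b.store(1); r2 = a.load(); });
    t1.join();
    t2.join();
    assert(!(r1 == 0 && r2 == 0));   // forbidden under sequential consistency
    return r1 + r2;
}
```

With weaker orders on the stores and loads, the outcome r1 == 0 and r2 == 0 would be allowed.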

Page 14: The C++11  Memory Model

Using atomics – sequential consistency (2)

    atomic<int> x{0};
    int y = 0;

    void thread1() {
        y = 1;
        x.store(2);
    }

    void thread2() {
        if (x.load() == 2)
            assert(y == 1);
    }

The assert is guaranteed not to fail.

Page 15: The C++11  Memory Model

Using atomics – relaxed consistency

Happens-before now exists only between actions on the same variable.

Similar to cache coherence, but can be used on systems with incoherent caches.

Imposes fewer limitations on the compiler: atomic actions can now be reordered if they access different memory locations.

Less synchronization is required on the hardware side as well.
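What relaxed operations still guarantee is ordering on a single variable. A rough sketch (names v and never_goes_backwards are invented): a writer stores increasing values, and a reader that loads the same variable twice can never see the value go backwards, even though every access is relaxed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> v{0};

// A writer stores 1..1000 in order; a reader loads v twice per iteration.
// Returns true if the reader never saw the value go backwards.
bool never_goes_backwards() {
    v.store(0, std::memory_order_relaxed);
    bool ok = true;
    std::thread writer([] {
        for (int i = 1; i <= 1000; ++i) v.store(i, std::memory_order_relaxed);
    });
    std::thread reader([&ok] {
        for (int k = 0; k < 1000; ++k) {
            int first  = v.load(std::memory_order_relaxed);
            int second = v.load(std::memory_order_relaxed);
            if (second < first) ok = false;   // ruled out for a single variable
        }
    });
    writer.join();
    reader.join();
    assert(ok);
    return ok;
}
```

Nothing comparable holds across two different variables.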

Page 16: The C++11  Memory Model

Using atomics – relaxed consistency (2)

    #define rlx memory_order_relaxed /* save slide space… */
    atomic<int> x, y;

    void thread1() {
        y.store(20, rlx);
        x.store(10, rlx);
    }

    void thread2() {
        if (x.load(rlx) == 10) {
            assert(y.load(rlx) == 20);
            y.store(10, rlx);
        }
    }

    void thread3() {
        if (y.load(rlx) == 10)
            assert(x.load(rlx) == 10);
    }

Both asserts might fail: there is no order between operations on x and y.

Page 17: The C++11  Memory Model

Using atomics – relaxed consistency (3)

Let’s look at the trace of a run in which both asserts failed (each read 0 instead of 10 or 20):

T1: W y, 20; W x, 10;

T2: R x, 10; R y, 0; W y, 10;

T3: R y, 10; R x, 0;

Page 18: The C++11  Memory Model

Using atomics – acquire release

A synchronizes-with relation exists only between the releasing thread and the acquiring thread.

Other threads might see updates in a different order.

All writes before a release are visible after an acquire that reads the released value, even if they were relaxed or non-atomic.

Similar to the release consistency memory model.
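The visibility rule above is what makes the usual "publish a message" idiom work. A minimal sketch, with invented names (data, published, message_passing) and an arbitrary value 42: the payload is a plain int, and only the flag is atomic.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int data = 0;                        // plain, non-atomic payload
std::atomic<bool> published{false};

// Returns the payload value observed by the consumer.
int message_passing() {
    data = 0;
    published.store(false);
    int seen = -1;
    std::thread producer([] {
        data = 42;                                          // ordinary write
        published.store(true, std::memory_order_release);   // publishes it
    });
    std::thread consumer([&seen] {
        while (!published.load(std::memory_order_acquire)) {}  // pairs with the release
        seen = data;                                        // must read 42
        assert(seen == 42);
    });
    producer.join();
    consumer.join();
    return seen;
}
```

With memory_order_relaxed on both the store and the load, the read of data would be a data race.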

Page 19: The C++11  Memory Model

Using atomics – acquire release (2)

    #define rls memory_order_release /* save slide space… */
    #define acq memory_order_acquire /* save slide space… */
    atomic<int> x, y;

    void thread1() {
        y.store(20, rls);
    }

    void thread2() {
        x.store(10, rls);
    }

    void thread3() {
        assert(y.load(acq) == 20 && x.load(acq) == 0);
    }

    void thread4() {
        assert(y.load(acq) == 0 && x.load(acq) == 10);
    }

Both asserts might succeed: there is no order between the writes to x and y.

Page 20: The C++11  Memory Model

volatile in multi-threaded C++

Declaring a variable as volatile prevents the compiler from placing it in a register.

Exists for cases in which devices write to memory behind the compiler’s back.

Wrongly believed to provide consistency.

Useless in C++11 multi-threaded code: it has no real consistency semantics, and might still cause races (and thus undefined behavior).

Use std::atomic instead.

In Java, volatile does guarantee atomicity and consistency. But this is C++…
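To make "use std::atomic instead" concrete, here is a sketch of a shared counter. The names ticks and count_ticks are made up. Declared as volatile int, the increments from two threads would race and updates could be lost; declared as std::atomic<int>, every increment is an indivisible read-modify-write.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> ticks{0};   // a `volatile int` here would be a data race

// Two threads each add 10000 to the shared counter; returns the final value.
int count_ticks() {
    ticks = 0;
    auto work = [] { for (int k = 0; k < 10000; ++k) ++ticks; };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    assert(ticks == 20000);  // no update was lost
    return ticks;
}
```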

Page 21: The C++11  Memory Model

Summary and recommendations

The safest order is SC. It is also the default.

Orders can be mixed, but the result is extremely hard to understand… avoid that.

Some orders were not covered in this tutorial (mostly variants of those that were).

Memory order is a run-time argument. Use constants to let the compiler optimize better.

Lock-free code is hard to write. Unless speed is crucial, better use mutexes.

C++11 is new. Not all features are supported by common compilers.
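The point about memory order being a run-time argument can be sketched as follows. The names value, load_with and load_acquire are hypothetical. Both functions are valid; the second passes a constant, which typically lets the compiler pick the right instruction at compile time instead of handling every possible order.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> value{0};

// The order is an ordinary function argument, chosen at run time.
int load_with(std::memory_order order) {
    // A load must not be given a release-type order.
    assert(order != std::memory_order_release &&
           order != std::memory_order_acq_rel);
    return value.load(order);
}

// The order is a compile-time constant: the preferred form.
int load_acquire() {
    return value.load(std::memory_order_acquire);
}
```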