Threads Cannot be Implemented As a Library

Andrew Hobbs

As a library...what does that mean?

The language specification doesn't say anything about threads

The specification defines what compilers should do

So the compiler doesn't know about threads either

How does this affect programming?

The compiler transforms your code to hopefully make it as fast as possible

It has some restrictions, depending on the language specification

But if the compiler doesn't know about concurrency...

It can make optimizations that are valid in sequential programs, but can cause bugs in multiprocessor environments

An example

Assuming x and y are both set to 0, suppose we have 2 threads:

Thread 1:

x = 1; r1 = y;

Thread 2:

y = 1; r2 = x;

What are the possible values of r1 and r2 at the end of both threads executing?

An example

But what if our compiler changes our code to the following?

Thread 1:

r1 = y; x = 1;

Thread 2:

r2 = x; y = 1;

What are the possible values of r1 and r2 at the end of both threads executing?

The results could now turn out differently (r1 = r2 = 0 becomes possible), but from the compiler's view everything is fine, because it doesn't know that each thread can interact with others.
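The two versions of the example can be compared by brute force. The following is a minimal sketch, not part of the original slides: it assumes each statement executes as one indivisible step and simply tries all six interleavings of the two threads. The names `enum op`, `step`, and `enumerate_outcomes` are made up for illustration.

```c
#include <stdbool.h>

/* The four statements that appear in the example. */
enum op { STORE_X, STORE_Y, LOAD_Y_TO_R1, LOAD_X_TO_R2 };

struct state { int x, y, r1, r2; };

/* Execute one statement as a single indivisible step (an assumption). */
static void step(struct state *s, enum op o) {
    switch (o) {
    case STORE_X:      s->x = 1;      break;
    case STORE_Y:      s->y = 1;      break;
    case LOAD_Y_TO_R1: s->r1 = s->y;  break;
    case LOAD_X_TO_R2: s->r2 = s->x;  break;
    }
}

/* Try every interleaving of two 2-statement threads, keeping each
 * thread's own program order, and record which (r1, r2) pairs occur. */
void enumerate_outcomes(const enum op t1[2], const enum op t2[2],
                        bool seen[2][2]) {
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++)
            seen[a][b] = false;

    /* A 4-bit mask with exactly two bits set picks the slots of thread 1. */
    for (unsigned mask = 0; mask < 16; mask++) {
        int bits = 0;
        for (int b = 0; b < 4; b++)
            bits += (mask >> b) & 1u;
        if (bits != 2)
            continue;

        struct state s = {0, 0, 0, 0};
        int i1 = 0, i2 = 0;
        for (int slot = 0; slot < 4; slot++) {
            if ((mask >> slot) & 1u)
                step(&s, t1[i1++]);
            else
                step(&s, t2[i2++]);
        }
        seen[s.r1][s.r2] = true;
    }
}
```

Called with the original statement order, no interleaving produces r1 = r2 = 0. Called with the reordered statements, r1 = r2 = 0 shows up and r1 = r2 = 1 disappears.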

Why did this happen?

The compiler didn't know about concurrency, so it performed optimizations assuming sequential execution

Some of these don't work with concurrency! In fact, the hardware itself can also reorder operations in an attempt to speed up execution, for example by performing loads before unrelated stores

The Pthreads approach

No threads shall read or modify memory that another thread is modifying (such an activity is called a race condition)

To restrict access, the programmer uses synchronization routines:

pthread_mutex_lock(), pthread_mutex_unlock(), …
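As a rough illustration of the rule, here is a small sketch in which two threads update one shared counter and every access is bracketed by the mutex calls. The counter, the iteration count `INCREMENTS`, and the function names are assumptions chosen for the example, not something from the slides.

```c
#include <pthread.h>
#include <stddef.h>

#define INCREMENTS 100000   /* arbitrary count, for illustration only */

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker touches the shared counter only while holding the lock. */
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        ++counter;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two workers to completion; returns the final count, or -1 if a
 * thread could not be created. */
long run_two_workers(void) {
    pthread_t t1, t2;
    counter = 0;
    if (pthread_create(&t1, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

With the lock in place, `run_two_workers()` should return exactly `2 * INCREMENTS`; without it, increments could be lost.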

The Pthreads approach

If the programmer uses the synchronization methods correctly to prevent race conditions, then they should have no issues

But this isn't quite true...

Concurrent modification

Suppose we had the following two threads:

Thread 1:

if (x == 1) ++y;

Thread 2:

if (y == 1) ++x;

Is there a data race in this program?

Concurrent modification

What if our compiler modified our code a little?

Thread 1:

++y; if (x != 1) --y;

Thread 2:

++x; if (y != 1) --x;

Is there a data race in this program?
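To see why the transformed version matters, one particular interleaving can be replayed by hand. The sketch below is a single-threaded replay, assuming x and y both start at 0 and that each increment and each conditional undo runs as one step; the helper names are hypothetical.

```c
struct shared { int x, y; };

/* Thread 1 after the transformation: increment first, undo if needed. */
static void t1_increment(struct shared *s) { ++s->y; }
static void t1_undo(struct shared *s)      { if (s->x != 1) --s->y; }

/* Thread 2 after the transformation. */
static void t2_increment(struct shared *s) { ++s->x; }
static void t2_undo(struct shared *s)      { if (s->y != 1) --s->x; }

/* Replay one interleaving: both increments run before either undo. */
struct shared replay_bad_interleaving(void) {
    struct shared s = {0, 0};
    t1_increment(&s);   /* y becomes 1 */
    t2_increment(&s);   /* x becomes 1 */
    t1_undo(&s);        /* x == 1, so y is not restored */
    t2_undo(&s);        /* y == 1, so x is not restored */
    return s;
}
```

In the original program neither condition can ever be true, so x and y both stay 0. In this replay of the transformed program they both end up as 1.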

Adjacent data

Suppose we had the following structure definition:

struct { int a:17; int b:15; } x;

There are probably no machines that have a 17-bit wide store, so if someone were to attempt to execute x.a = 42; it would probably be done like this:

{
    tmp = x;          // Read both fields into 32-bit variable
    tmp &= ~0x1ffff;  // Mask off old a
    tmp |= 42;
    x = tmp;          // Overwrite all of x
}
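The danger of that read-modify-write sequence shows up when another thread updates b in the middle of it. The following sketch replays such an interleaving in a single thread. It assumes a occupies the low 17 bits of a 32-bit word and b the upper 15 (real bit-field layout is implementation-defined), and the value 7 written to b is arbitrary.

```c
#include <stdint.h>

#define A_MASK  0x1ffffu   /* low 17 bits: field a (assumed layout) */
#define B_SHIFT 17         /* upper 15 bits: field b (assumed layout) */

/* Replays: thread 1 starts x.a = 42, thread 2 sets x.b = 7 in between. */
uint32_t replay_lost_update(void) {
    uint32_t x = 0;                       /* a = 0, b = 0 */

    uint32_t tmp = x;                     /* thread 1: read both fields */

    x = (x & A_MASK) | (7u << B_SHIFT);   /* thread 2: x.b = 7 */

    tmp &= ~A_MASK;                       /* thread 1: mask off old a */
    tmp |= 42;
    x = tmp;                              /* thread 1: writes back stale b */

    return x;
}
```

After the replay, a holds 42 but b is back to 0: thread 2's update has vanished even though the two threads never touched the same field.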

Adjacent data

Suppose we had the following structure definition:

struct { char a; char b; char c; char d; char e; char f; char g; char h; } x;

where a is the only field that needs to be protected by a lock. If that was the case, some programmer might write the following code:

x.b = 'b'; x.c = 'c'; x.d = 'd'; x.e = 'e'; x.f = 'f'; x.g = 'g'; x.h = 'h';

But a compiler might realize that it could just write all of the data at once as a 64-bit quantity (not exact syntax):

x = 'hgfedcb\0' | x.a;

That single store also rewrites a, without holding the lock that protects it.

Register Promotion

Suppose we had a global shared variable x, protected by a lock...but only conditionally, perhaps only if we had actually created other threads:

for (...) {
    ...
    if (mt) pthread_mutex_lock(...);
    x = ... x ...
    if (mt) pthread_mutex_unlock(...);
}

If the conditionals are rarely taken, the compiler might decide to promote x to a register to increase the performance:

r = x;
for (...) {
    ...
    if (mt) { x = r; pthread_mutex_lock(...); r = x; }
    r = ... r ...
    if (mt) { x = r; pthread_mutex_unlock(...); r = x; }
}
x = r;
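The problem with the promoted version is the `x = r;` that runs before the lock is acquired. Here is a single-threaded replay of one loop iteration as a sketch. It assumes the elided update `x = ... x ...` is `x = x + 1`, and that another thread, correctly holding the lock, stores 5 into x just before our iteration; both are illustrative choices, and the lock calls appear only as comments.

```c
/* One iteration of the original loop: x is only touched under the lock. */
int replay_original(void) {
    int x = 0;
    x = 5;          /* other thread, holding the lock, updates x */
    /* lock */
    x = x + 1;      /* stand-in for "x = ... x ..." (assumption) */
    /* unlock */
    return x;
}

/* One iteration of the register-promoted loop. */
int replay_promoted(void) {
    int x = 0;
    int r = x;      /* r = x; before the loop */
    x = 5;          /* other thread, holding the lock, updates x */
    x = r;          /* written back BEFORE taking the lock: clobbers 5 */
    /* lock */
    r = x;
    r = r + 1;      /* stand-in for "r = ... r ..." (assumption) */
    x = r;
    /* unlock */
    r = x;
    x = r;          /* x = r; after the loop */
    return x;
}
```

Under these assumptions the original iteration ends with x equal to 6, while the promoted one ends with 1, because the other thread's store was overwritten outside the lock.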

What does this mean?

Pthreads says that as long as we prevent race conditions with the synchronization functions, we will be fine

But since our compiler doesn't know about threads, it might make optimizations that introduce races, even though the source code looks perfectly fine to us

We can't use locks at a high level if the presence of race conditions depends on the compiler and the hardware

Performance

So why are we running multiple threads?

To (hopefully) get better performance out of our program

But locking is expensive! Atomic updates can be far slower than normal ones, costing a hundred or more cycles on some processors

Is synchronization always needed?

Consider the following Sieve of Eratosthenes implementation:

for (my_prime = start; my_prime < 10000; ++my_prime)
    if (!get(my_prime)) {
        for (multiple = my_prime; multiple < 100000000; multiple += my_prime)
            if (!get(multiple)) set(multiple);
    }

What happens if we run this on multiple threads, with all of them accessing one shared data block?
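A scaled-down version of this experiment can be sketched as follows. The limits are shrunk to 10 and 100 so it runs instantly, the sieve is a byte array, and `get`/`set` are written with C11 relaxed atomics. That last choice is an assumption made so the sketch has defined behavior in C11; relaxed loads and stores impose no ordering and involve no locking. All names other than `get`, `set`, `my_prime`, and `multiple` are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define SQRT_LIMIT 10    /* stands in for 10000 */
#define LIMIT      100   /* stands in for 100000000 */

static atomic_uchar marked[LIMIT];   /* one byte per number, shared */

static int get(int i) {
    return atomic_load_explicit(&marked[i], memory_order_relaxed);
}

static void set(int i) {
    atomic_store_explicit(&marked[i], 1, memory_order_relaxed);
}

/* The loop from the slide, with no locking at all. */
static void *sieve_worker(void *arg) {
    int start = *(int *)arg;
    for (int my_prime = start; my_prime < SQRT_LIMIT; ++my_prime)
        if (!get(my_prime)) {
            for (int multiple = my_prime; multiple < LIMIT; multiple += my_prime)
                if (!get(multiple)) set(multiple);
        }
    return NULL;
}

/* Runs two unsynchronized workers over the shared array and counts the
 * unmarked numbers in [SQRT_LIMIT, LIMIT). Returns -1 on thread failure. */
int count_unmarked_after_two_threads(void) {
    int start = 2;
    pthread_t t1, t2;

    for (int i = 0; i < LIMIT; i++)
        atomic_store(&marked[i], 0);

    if (pthread_create(&t1, NULL, sieve_worker, &start) != 0)
        return -1;
    if (pthread_create(&t2, NULL, sieve_worker, &start) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    int count = 0;
    for (int i = SQRT_LIMIT; i < LIMIT; i++)
        if (!get(i)) ++count;
    return count;
}
```

Whatever the interleaving, the worst a thread can do is miss another thread's mark and repeat some work; an entry is only ever set, never cleared. The numbers left unmarked between 10 and 100 are therefore the 21 primes in that range.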

The conclusions?

Sometimes you can gain large performance benefits without directly using atomic operations

But if we use a library that disallows this (like Pthreads), we are throwing away this ability

But if we are allowed to, then we need the compiler and hardware to somehow know about it and help us

The conclusions?

So how do we get the compiler and hardware to help us?

We need to have the programming language itself define a memory model so that the programmer knows whether there are races

Only if we have that can we reason about our programs
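C11 is one language that has since added such a memory model, together with `<stdatomic.h>`. As a sketch of what that buys the programmer, here is the first example again with x and y declared as atomics. The default atomic operations are sequentially consistent, so the compiler and hardware may not reorder them in a way that produces r1 = r2 = 0. The function names are made up for the illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;   /* shared, accessed only through atomic ops */
static int r1, r2;        /* each written by one thread, read after join */

static void *thread1(void *arg) {
    (void)arg;
    atomic_store(&x, 1);      /* x = 1;  */
    r1 = atomic_load(&y);     /* r1 = y; */
    return NULL;
}

static void *thread2(void *arg) {
    (void)arg;
    atomic_store(&y, 1);      /* y = 1;  */
    r2 = atomic_load(&x);     /* r2 = x; */
    return NULL;
}

/* Runs both threads once. Returns 1 if r1 == 0 and r2 == 0 was observed,
 * 0 otherwise, and -1 if a thread could not be created. */
int both_zero_observed(void) {
    pthread_t t1, t2;
    atomic_store(&x, 0);
    atomic_store(&y, 0);
    r1 = r2 = -1;

    if (pthread_create(&t1, NULL, thread1, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, thread2, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return r1 == 0 && r2 == 0;
}
```

Because the language now defines what these accesses mean across threads, the programmer can rule out the both-zero outcome by reading the source alone, which is the kind of reasoning the slides ask for.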
