
Page 1:

PThreads (POSIX Threads)

All material not from online sources/textbook copyright © Travis Desell, 2012

A good tutorial/overview can be found here as well: https://computing.llnl.gov/tutorials/pthreads/

Page 2:

Overview

1. Forking and Joining Threads

2. Busy Waiting

3. Mutexes

4. Semaphores

5. Condition Variables

6. Read-Write Locks

7. Conclusions

Page 3:

Forking and Joining Threads

Page 4:

A process can have multiple threads, each of which accesses the same memory as the process. Many cores support hyper-threading, which lets them run multiple hardware threads at the same time, and threads can run across multiple cores and processors on the same motherboard. Now for a quick recap.

Threads are Shared Memory

Page 5:

Threads are Shared Memory

[Diagram: several cores, each with its own ALU, control unit, and registers, connected through an interconnect to a shared main memory (addresses and their contents).]

Page 6:

Threads are more lightweight than processes: they are contained within the same process and share its memory and resources, which allows them to be swapped in and out faster than processes. Threads still need their own program counter and call stack, however. The time it takes to swap a thread or process is called the context-switching time.

Threads are Lightweight

Page 7:

The initial process often acts as the master thread. It will fork off child threads, which later join back to the master process when they have completed their subtask.

[Diagram: a process forking Thread 1 and Thread 2, which later join back to it.]

Page 8:

As the process and its threads share the same memory, it is important to make sure that they don’t try to modify the same memory at the same time, as it could lead to inconsistencies in the program execution.

[Diagram: the same process with Thread 1 and Thread 2, all sharing the process's memory.]
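To make the danger concrete, here is a minimal sketch (not from the original slides; the counter and loop count are made up) in which two threads increment the same global without any synchronization. The final value is usually wrong because the read-modify-write is not atomic:

#include <pthread.h>
#include <cstdio>

long counter = 0;                    // shared by every thread in the process

void* increment(void*) {
    for (int i = 0; i < 1000000; i++) {
        counter++;                   // read-modify-write: not atomic, updates can be lost
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    // Typically prints something less than 2000000.
    printf("counter = %ld\n", counter);
    return 0;
}

The mutexes, semaphores, and condition variables later in these slides exist precisely to serialize this kind of access.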

Page 9:

Threads can also fork off other threads, which can later join back to them (or to another thread).

[Diagram: Thread 1 forking Thread 3, which later joins back to Thread 1 (or another thread).]

Page 10:

#include <pthread.h>
#include <stdio.h>    /* printf */
#include <stdlib.h>   /* atol  */
…

// A global variable accessible to all threads.
int thread_count;

void* Hello(void* rank);  /* The function for each thread to run */

int main(int argc, char** argv) {
    pthread_t* thread_handles;

    thread_count = atol(argv[1]);

    thread_handles = new pthread_t[thread_count];

    long thread;
    for (thread = 0; thread < thread_count; thread++) {
        pthread_create(&thread_handles[thread], NULL, Hello, (void*)thread);
    }

    printf("Hello from the main thread!\n");

    for (thread = 0; thread < thread_count; thread++) {
        pthread_join(thread_handles[thread], NULL);
    }

    delete [] thread_handles;
    return 0;
}

void* Hello(void* rank) {
    long my_rank = (long)rank;
    printf("Hello from thread %ld of %d\n", my_rank, thread_count);
    return NULL;
}

You can start using pthreads simply by including the pthread.h header (it is available on almost every Unix-like system) and linking against the pthread library, e.g. compiling with g++ pthread_hello.cxx -lpthread (or with the -pthread flag).

pthread_hello.cxx

Page 11:

[pthread_hello.cxx listing repeated from Page 10]

Threads are referred to through variables of the pthread_t type, and you can make arrays of them just like you would with any other type.

pthread_hello.cxx

Page 12:

[pthread_hello.cxx listing repeated from Page 10]

pthread_create is the function you use to create threads. Calling this will actually create the thread and start it running.

pthread_hello.cxx

Page 13:

pthread_create takes a set of arguments:

int pthread_create(
    pthread_t*             thread_p,                  /* out */
    const pthread_attr_t*  attr_p,                    /* in  */
    void*                  (*start_routine)(void*),   /* in  */
    void*                  arg_p                      /* in  */
);

The first argument, thread_p, is an out parameter: the call initializes it with a handle to the newly created thread. attr_p we can ignore for the time being; it lets us set thread attributes.

pthread_create

Page 14:

[pthread_create signature repeated from Page 13]

The third argument is the function that the thread will run (that function can call other functions and so on). It needs to return a void pointer and take a void pointer as its argument. The last argument is a void pointer containing the arguments that will be passed to that function when the thread starts.

pthread_create
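The rank cast used in pthread_hello.cxx only works for a single integer-sized value. When a thread needs more than one argument, the usual pattern is to pass a pointer to a struct; a minimal sketch (the thread_args struct and worker function are illustrative, not part of the slides):

#include <pthread.h>
#include <cstdio>

struct thread_args {
    long   rank;
    double start, end;   // e.g., the sub-range this thread should process
};

void* worker(void* arg) {
    thread_args* args = (thread_args*)arg;   // cast the void* back to the real type
    printf("thread %ld works on [%f, %f)\n", args->rank, args->start, args->end);
    return NULL;
}

int main() {
    const int thread_count = 4;
    pthread_t   handles[thread_count];
    thread_args args[thread_count];          // one struct per thread; must outlive the threads
    for (long t = 0; t < thread_count; t++) {
        args[t].rank  = t;
        args[t].start = t * 0.25;
        args[t].end   = (t + 1) * 0.25;
        pthread_create(&handles[t], NULL, worker, &args[t]);
    }
    for (long t = 0; t < thread_count; t++) pthread_join(handles[t], NULL);
    return 0;
}

Note that the argument structs live in an array owned by main, so they remain valid while the threads run.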

Page 15:

[pthread_hello.cxx listing repeated from Page 10]

So in this case, each thread will start in its own call to the Hello function, each with a different rank argument.

pthread_hello.cxx

Page 16:

[pthread_hello.cxx listing repeated from Page 10]

The pthread_create function returns immediately; it doesn't wait for the function passed to it, or the thread it creates, to finish. This allows all the threads to be created and run in parallel while the main thread does other things.

pthread_hello.cxx

Page 17:

[pthread_hello.cxx listing repeated from Page 10]

After you've started threads, you'll often want to wait for them to complete what they're working on before you proceed to do something else. This can be accomplished with the pthread_join function.

pthread_hello.cxx

Page 18:

pthread_join takes a set of arguments:

int pthread_join(
    pthread_t  thread_p,   /* in  */
    void**     ret_val_p   /* out */
);

pthread_join waits for the thread identified by thread_p to complete (exit its function). ret_val_p, if non-NULL, receives the return value from that function.

pthread_join
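As a sketch of the second argument in action (the compute_square function is made up for illustration), a thread can hand back a heap-allocated result through its void* return value, which pthread_join stores into *ret_val_p:

#include <pthread.h>
#include <cstdio>

void* compute_square(void* arg) {
    long x = (long)arg;
    long* result = new long(x * x);   // heap-allocate so the value outlives the thread
    return result;                    // this pointer is what pthread_join reports
}

int main() {
    pthread_t handle;
    pthread_create(&handle, NULL, compute_square, (void*)7);

    void* ret = NULL;
    pthread_join(handle, &ret);               // ret now holds the thread's return value
    printf("result = %ld\n", *(long*)ret);    // prints 49
    delete (long*)ret;
    return 0;
}

Passing NULL instead of &ret, as pthread_hello.cxx does, simply discards the return value.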

Page 19:

[pthread_hello.cxx listing repeated from Page 10]

Finally, the memory allocated for the thread handles needs to be released.

pthread_hello.cxx

Page 20:

Busy Waiting (don’t do this)

Page 21:

Often when dealing with threads, they will all need to access the same piece of memory, but only one thread can be allowed to access it at a given time, to prevent inconsistencies.

Busy Waiting

unordered_map<string, int> my_map;

void my_thread_function(void* arguments) {
    …
    // only one thread can put things in the
    // unordered map at a time
    if (my_map["key"] > 0) {
        my_map["key"]++;
    } else {
        my_map.insert(make_pair("key", 1));
    }
    …
}

Page 22:

unordered_map<string, int> my_map;

int flag = 0;

void my_thread_function(void* arguments) {
    …
    // only one thread can put things in the
    // unordered map at a time
    my_rank = get_rank(arguments);

    while (flag != my_rank);   // busy wait until it is this thread's turn

    if (my_map["key"] > 0) {
        my_map["key"]++;
    } else {
        my_map.insert(make_pair("key", 1));
    }

    flag++;
    …
}

A simple solution might be to put a busy-waiting while loop in front of this critical section of the code, and increment the flag after it's done. This way (assuming the compiler doesn't optimize the loop away) all the threads will go through the critical section in order.

Busy Waiting

Page 23:

[busy-waiting listing repeated from Page 22]

However, this approach has a lot of problems. Depending on the compiler and its optimizations, the while loop might get removed, or some of the critical section might get reordered to before the while loop.

Busy Waiting

Page 24:

[busy-waiting listing repeated from Page 22]

Second, this approach forces threads to go through the critical section in rank order. If higher-ranked threads get to this section before the others, they will have to wait for them.

Busy Waiting

Page 25:

[busy-waiting listing repeated from Page 22]

Lastly, all the waiting threads are spinning busily, constantly re-checking that while loop. This is not a good use of system resources (and on mobile devices it would put an unnecessary drain on the battery).

Busy Waiting

Page 26:

Mutexes

Page 27:

Mutexes provide a much easier way to control access to the critical sections of your program. Mutex is an abbreviation of mutual exclusion, which is what mutexes provide: the thread holding the mutex excludes all other threads from obtaining it and entering the critical section until it releases the mutex.

Mutexes

Page 28:

Mutexes are used with 5 different functions:

int pthread_mutex_init(pthread_mutex_t* mutex_p,
                       const pthread_mutexattr_t* attr_p);

int pthread_mutex_lock(pthread_mutex_t* mutex_p);
int pthread_mutex_unlock(pthread_mutex_t* mutex_p);
int pthread_mutex_trylock(pthread_mutex_t* mutex_p);
int pthread_mutex_destroy(pthread_mutex_t* mutex_p);

Mutexes

Page 29:

pthread_mutex_init initializes the mutex. Just like with pthread_create, we can pass NULL for the attributes if we aren't using them. This returns 0 on success, and an error value otherwise.

pthread_mutex_init

Page 30:

pthread_mutex_lock will try to take the mutex. It will block until it gains control of the mutex. This returns 0 on success, and an error value otherwise.

pthread_mutex_lock

Page 31:

pthread_mutex_unlock will release the mutex, allowing one (and only one) of the other threads waiting for the given mutex to acquire the lock. The rest will continue blocking. This returns 0 on success, and an error value otherwise.

pthread_mutex_unlock

Page 32:

pthread_mutex_trylock is a non-blocking version of lock. For example, you might want to see if you can get the lock, and if you can't, do something else (and then check back on the lock later). This way you can compute something else while waiting for a lock. It returns 0 when it gets the lock, and a nonzero error value (EBUSY) otherwise.

pthread_mutex_trylock
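A sketch of that pattern (the "other work" here is just a local counter standing in for any useful computation that doesn't need the lock): spin on trylock and do something productive after each failed attempt, instead of blocking in pthread_mutex_lock:

#include <pthread.h>
#include <cstdio>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
long shared_counter = 0;

void* worker(void*) {
    long other_work = 0;
    // pthread_mutex_trylock returns nonzero (EBUSY) while another thread holds the mutex.
    while (pthread_mutex_trylock(&mutex) != 0) {
        other_work++;                 // stand-in for "compute something else while waiting"
    }
    shared_counter++;                 // critical section
    pthread_mutex_unlock(&mutex);
    printf("did %ld units of other work while waiting\n", other_work);
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}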

Page 33:

unordered_map<string, int> my_map;

pthread_mutex_t mymutex = PTHREAD_MUTEX_INITIALIZER;

void my_thread_function(void* arguments) {
    …
    // only one thread can put things in the
    // unordered map at a time
    pthread_mutex_lock(&mymutex);

    if (my_map["key"] > 0) {
        my_map["key"]++;
    } else {
        my_map.insert(make_pair("key", 1));
    }

    pthread_mutex_unlock(&mymutex);
    …
}

We can take the previous example code and rewrite it using mutexes. It actually ends up a bit simpler.

Mutexes

Page 34:

[mutex listing repeated from Page 33]

Mutexes can also be initialized statically (as here, with PTHREAD_MUTEX_INITIALIZER) instead of calling pthread_mutex_init.

Mutexes
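For comparison, the same mutex could be set up dynamically; a minimal sketch of the non-static route (assuming default attributes, as the slides do elsewhere):

#include <pthread.h>

pthread_mutex_t mymutex;

int setup() {
    // NULL attributes, just like passing NULL to pthread_create.
    return pthread_mutex_init(&mymutex, NULL);   // 0 on success
}

void teardown() {
    pthread_mutex_destroy(&mymutex);             // release any resources held by the mutex
}

The static initializer is convenient for globals; pthread_mutex_init is needed when mutexes are created at run time (e.g., one per dynamically allocated object).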

Page 35:

[mutex listing repeated from Page 33]

Instead of having a busy wait loop, we simply have a lock call.

Mutexes

Page 36:

[mutex listing repeated from Page 33]

When we leave the critical section, we unlock the mutex.

Mutexes

Page 37:

[mutex listing repeated from Page 33]

Note that mutexes do not guarantee any order in which the threads will pass through the critical section. This makes them more efficient than busy waiting, but if order is important a mutex by itself is not enough (order generally is not important, however).

Mutexes

Page 38:

Semaphores

Page 39:

What if you want to guarantee some order for threads passing through a critical section? Semaphores allow this (and a lot more). Semaphores are commonly used in implementing mailboxes for concurrent and distributed message passing. They were named (by Edsger Dijkstra) after the mechanical railroad signaling device. In essence, a semaphore is a specialized unsigned int, taking values 0, 1, 2, 3, etc. It is in a locked state when its value is 0, and unlocked otherwise.

Semaphores

Page 40:

int sem_init(sem_t* semaphore_p,    /* out */
             int shared,            /* in  */
             unsigned initial_val); /* in  */

int sem_destroy(sem_t* semaphore_p);
int sem_post(sem_t* semaphore_p);
int sem_wait(sem_t* semaphore_p);

The use of semaphores is similar to mutexes. Instead of lock and unlock there are wait and post. Remember, semaphores take values 0..N. sem_post increments the value of the semaphore. sem_wait decrements it by 1, unless it is 0; if it is 0, it blocks until the value becomes positive, then decrements it and unblocks.

Semaphores
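A small sketch of wait/post in the "mailbox" style mentioned above (a single producer and a single consumer; the fixed-size array and item count are made up for illustration, and unnamed semaphores like this are deprecated on macOS):

#include <pthread.h>
#include <semaphore.h>
#include <cstdio>

sem_t items;                 // counts how many items are ready; starts at 0
int queue_buf[16];
int head = 0, tail = 0;

void* producer(void*) {
    for (int i = 0; i < 5; i++) {
        queue_buf[tail++] = i;   // produce an item
        sem_post(&items);        // value goes up: one more item is available
    }
    return NULL;
}

void* consumer(void*) {
    for (int i = 0; i < 5; i++) {
        sem_wait(&items);        // blocks while the value is 0, then decrements it
        printf("consumed %d\n", queue_buf[head++]);
    }
    return NULL;
}

int main() {
    sem_init(&items, 0, 0);      // not shared between processes, initial value 0
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&items);
    return 0;
}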

Page 41:

Implementing a Barrier

A barrier makes all threads reach the same point before any of them can proceed. It is possible to implement a barrier with semaphores that doesn't require busy waiting (an implementation using only a mutex would).

int counter = 0;
sem_t count_sem;    /* initialize to 1 */
sem_t barrier_sem;  /* initialize to 0 */
…
void* thread_task(…) {
    …
    /* start barrier */
    sem_wait(&count_sem);
    if (counter == thread_count - 1) {
        counter = 0;
        sem_post(&count_sem);
        for (j = 0; j < thread_count; j++) {
            sem_post(&barrier_sem);
        }
    } else {
        counter++;
        sem_post(&count_sem);
        sem_wait(&barrier_sem);
    }
    /* end barrier */
}

Page 42:

Implementing a Barrier

With count_sem initialized to 1, one thread at a time will be able to pass this sem_wait call; all the others will wait.

[semaphore barrier listing repeated from Page 41]

Page 43:

[semaphore barrier listing repeated from Page 41]

Implementing a Barrier

counter won't be equal to thread_count - 1 until the other thread_count - 1 threads have made it through the sem_wait call and incremented it.

Page 44:

[semaphore barrier listing repeated from Page 41]

Implementing a Barrier

The counter is incremented by every thread that gets through the first sem_wait. The count semaphore ensures only one thread accesses the counter at a time. After incrementing the counter, each thread waits on the barrier semaphore.

Page 45:

int counter = 0;
sem_t count_sem;    /* initialize to 1 */
sem_t barrier_sem;  /* initialize to 0 */
…
void* thread_task(…) {
    …
    /* start barrier */
    sem_wait(&count_sem);
    if (counter == thread_count - 1) {
        counter = 0;
        sem_post(&count_sem);
        for (j = 0; j < thread_count - 1; j++) {
            sem_post(&barrier_sem);
        }
    } else {
        counter++;
        sem_post(&count_sem);
        sem_wait(&barrier_sem);
    }
    /* end barrier */
}

Implementing a Barrier

When the last thread makes it through the sem_wait on the count semaphore, it resets the counter, posts to the count semaphore (so it can be reused for the next barrier), and then posts enough times to the barrier semaphore to allow all the other threads to pass through it.

Page 46:

[semaphore barrier listing repeated from Page 45]

Race conditions in this Barrier

What if we try to reuse this barrier? A potential problem: while the last thread through the barrier is still making its posts, a fast thread could reach the beginning of the next barrier and snag one of those posts to the barrier semaphore, preventing one of the threads still in the first barrier from ever being released (it would wait indefinitely).

Page 47:

Condition Variables

Page 48:

Condition Variables

There is an even better way to implement a barrier. Pthreads provides condition variables, which allow threads to wait until a certain event (or condition) happens. A condition variable is always associated with a mutex. Condition variables are used similarly to mutexes, but there is a third option: with a condition variable it is possible to wait, to signal a single waiting thread to unblock, or to broadcast, making all waiting threads unblock.

Page 49:

int pthread_cond_init(pthread_cond_t* cond_p,
                      const pthread_condattr_t* cond_attr_p);

int pthread_cond_destroy(pthread_cond_t* cond_p);

int pthread_cond_signal(pthread_cond_t* cond_var_p);
int pthread_cond_broadcast(pthread_cond_t* cond_var_p);
int pthread_cond_wait(pthread_cond_t* cond_var_p,
                      pthread_mutex_t* mutex_p);

Just like mutexes, condition variables need to be initialized and destroyed. pthread_cond_wait releases the mutex and causes the calling thread to wait for a signal on the condition variable. pthread_cond_signal will cause one (and only one) waiting thread to unblock from pthread_cond_wait. pthread_cond_broadcast will cause all waiting threads to unblock from pthread_cond_wait.

Condition Variables
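Outside of barriers, the most common condition-variable pattern is to guard some shared state with the mutex and wait in a loop that re-checks the condition. A minimal sketch (the ready flag and function names are illustrative):

#include <pthread.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  data_ready = PTHREAD_COND_INITIALIZER;
int ready = 0;                 // the condition the waiting thread cares about

void* waiter(void*) {
    pthread_mutex_lock(&mutex);
    while (!ready) {                             // re-check the condition after every wakeup
        pthread_cond_wait(&data_ready, &mutex);  // releases the mutex while waiting,
    }                                            // re-acquires it before returning
    /* ... use the data protected by the mutex ... */
    pthread_mutex_unlock(&mutex);
    return NULL;
}

void* notifier(void*) {
    pthread_mutex_lock(&mutex);
    ready = 1;
    pthread_cond_signal(&data_ready);   // wake one waiter; broadcast would wake them all
    pthread_mutex_unlock(&mutex);
    return NULL;
}

int main() {
    pthread_t w, n;
    pthread_create(&w, NULL, waiter, NULL);
    pthread_create(&n, NULL, notifier, NULL);
    pthread_join(w, NULL);
    pthread_join(n, NULL);
    return 0;
}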

Page 50:

int pthread_cond_wait(pthread_cond_t* cond_var_p,
                      pthread_mutex_t* mutex_p);

// pthread_cond_wait is essentially:

pthread_mutex_unlock(mutex_p);
wait_on_signal(cond_var_p);
pthread_mutex_lock(mutex_p);

In detail, pthread_cond_wait unlocks the mutex it is given and causes the executing thread to block until it is unblocked by another thread's pthread_cond_signal or pthread_cond_broadcast (after which it re-acquires the mutex).

Condition Variables

Page 51:

/* shared variables */
int counter = 0;
pthread_mutex_t mutex;
pthread_cond_t cond_var;
…
void* thread_task(…) {
    …
    /* start barrier */
    pthread_mutex_lock(&mutex);
    counter++;
    if (counter == thread_count) {
        counter = 0;
        pthread_cond_broadcast(&cond_var);
    } else {
        while (pthread_cond_wait(&cond_var, &mutex) != 0);
    }
    pthread_mutex_unlock(&mutex);
    /* end barrier */
}

A Safer Barrier

Using condition variables we can make a reusable barrier without race conditions.

Page 52:

[condition-variable barrier listing repeated from Page 51]

A Safer Barrier

This will let one thread through at a time.

Page 53:

[condition-variable barrier listing repeated from Page 51]

A Safer Barrier

The first thread_count - 1 threads will enter the while loop and call pthread_cond_wait, which unlocks the mutex so another thread can enter, and so on. The wait needs to be in a while loop in case the pthread_cond_wait call exits on an error (or potentially from some other signal).

Page 54:

[condition-variable barrier listing repeated from Page 51]

A Safer Barrier

When the last thread gets through the mutex, it can broadcast to all other threads, causing them to exit the pthread_cond_wait statement, and then exit the barrier.

Page 55:

[condition-variable barrier listing repeated from Page 51]

A Safer Barrier

Finally, the mutex is unlocked so that all the threads can get out of pthread_cond_wait (remember, the last step of pthread_cond_wait is to re-lock the mutex).

Page 56:

Read-Write Locks

Page 57:

Read-Write Locks

Say we want a data structure which is thread safe, i.e., multiple threads can access it simultaneously without data becoming corrupted and without segfaults. One of the simplest well-performing strategies for this is to use read-write locks. The general idea is that multiple threads can read from the data structure simultaneously without any problem, because they aren't changing anything. On the other hand, only one thread can write to the data structure at a time, because writing is when problems occur.

Page 58:

int pthread_rwlock_init(pthread_rwlock_t* rwlock_p,
                        const pthread_rwlockattr_t* attr_p);

int pthread_rwlock_destroy(pthread_rwlock_t* rwlock_p);

int pthread_rwlock_rdlock(pthread_rwlock_t* rwlock_p);
int pthread_rwlock_wrlock(pthread_rwlock_t* rwlock_p);
int pthread_rwlock_unlock(pthread_rwlock_t* rwlock_p);

Similar to the other locks, rwlocks need to be initialized and destroyed. pthread_rwlock_rdlock locks the rwlock for reading; multiple threads can hold the read lock at the same time. pthread_rwlock_wrlock locks the rwlock for writing; only one thread can hold the write lock (and no threads can hold it for reading while the write lock is held). pthread_rwlock_unlock unlocks the rwlock.

Read-Write Locks
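A sketch of typical read-write lock usage (the lookup table and function names are invented for illustration): any number of readers proceed concurrently, while a writer gets exclusive access:

#include <pthread.h>
#include <map>
#include <string>

pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
std::map<std::string, int> table;

int lookup(const std::string& key) {
    pthread_rwlock_rdlock(&table_lock);      // many threads may hold the read lock at once
    std::map<std::string, int>::const_iterator it = table.find(key);
    int value = (it != table.end()) ? it->second : -1;
    pthread_rwlock_unlock(&table_lock);
    return value;
}

void update(const std::string& key, int value) {
    pthread_rwlock_wrlock(&table_lock);      // exclusive: waits until no readers or writers remain
    table[key] = value;
    pthread_rwlock_unlock(&table_lock);
}

int main() {
    update("answer", 42);
    return lookup("answer") == 42 ? 0 : 1;
}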

Page 59:

Concurrent/Lock Free Data Structures

For more info: http://en.wikipedia.org/wiki/Non-blocking_algorithm

Page 60:

Concurrent/Lock-Free Data Structures

When developing your data structures for concurrent use, you want to make sure that concurrent access does not cause any data inconsistencies, race conditions, deadlocks, etc.

Page 61:

Non-Blocking Algorithms

In current terminology, non-blocking algorithms ensure that when multiple threads compete for a resource, no thread is postponed indefinitely by mutual exclusion (i.e., by a mutex lock). Non-blocking algorithms can be lock-free, where the system as a whole is guaranteed to make progress if it runs long enough (although individual threads may starve or block indefinitely). They can also be wait-free, which is stronger: every thread is guaranteed to make progress (i.e., no thread starves).

Page 62:

Lock-Free Data Structures

You should be aware that in recent years implementations of different data structures (such as queues) have been made lock-free, which can provide some big performance benefits. See:

1. Michael, Maged; Scott, Michael (1996). "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms". Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing (PODC 1996). Philadelphia, Pennsylvania, USA: ACM Press. pp. 267–275. ISBN 0-89791-800-2.

2. Kogan, Alex; Petrank, Erez (2012). "A Methodology for Creating Fast Wait-Free Data Structures". Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2012). New Orleans, LA, USA: ACM Press. pp. 141–150. ISBN 978-1-4503-1160-1.

Page 63:

Lock-Free Data Structures

Many of these lock-free data structures are built using a hardware-provided Compare-And-Swap (CAS) operation [1]. Compare-and-swap compares the contents of a memory location to a given value and, if they are the same, modifies the value at that memory location to a new given value. It returns true if the swap occurred, and false otherwise:

int cas(void* pointer, int compare_to, int new_value);

For this to be useful without memory issues, the whole operation must be done atomically (no other thread can access or modify the memory in between).

[1] http://en.wikipedia.org/wiki/Compare-and-swap
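As a sketch of the idea (using the GCC/Clang __sync_bool_compare_and_swap builtin rather than any particular library; the counter is just an illustration), a lock-free increment retries until its CAS succeeds:

#include <cstdio>

long counter = 0;

void lock_free_increment() {
    while (true) {
        long old_value = counter;             // take a snapshot
        long new_value = old_value + 1;
        // Atomically: if counter still equals old_value, set it to new_value.
        // If another thread changed counter in the meantime, the CAS fails and we retry.
        if (__sync_bool_compare_and_swap(&counter, old_value, new_value)) {
            return;
        }
    }
}

int main() {
    lock_free_increment();
    printf("counter = %ld\n", counter);       // prints 1
    return 0;
}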

Page 64:

A two-lock queue

struct node_t  {value: data type, next: pointer to node_t}
struct queue_t {Head: pointer to node_t, Tail: pointer to node_t,
                H_lock: lock type, T_lock: lock type}

initialize(Q: pointer to queue_t)
    node = new_node()              # allocate a dummy node
    node->next = NULL
    Q->Head = Q->Tail = node       # both Head and Tail point to the dummy node
    Q->H_lock = Q->T_lock = FREE   # locks are initially unlocked (free)

Michael & Scott, Simple, fast, and practical non-blocking and blocking concurrent queue algorithms: http://dl.acm.org/citation.cfm?doid=248052.248106

Page 65:

A two-lock queue

enqueue(Q: pointer to queue_t, value: data type)
    node = new_node()
    node->value = value
    node->next = NULL
    lock(&Q->T_lock)        # acquire T_lock in order to access Tail
    Q->Tail->next = node    # link node at the end of the linked list
    Q->Tail = node          # swing Tail to the node
    unlock(&Q->T_lock)      # release T_lock

Michael & Scott, Simple, fast, and practical non-blocking and blocking concurrent queue algorithms: http://dl.acm.org/citation.cfm?doid=248052.248106

Page 66:

A two-lock queue

dequeue(Q: pointer to queue_t, pvalue: pointer to data type): boolean
    lock(&Q->H_lock)           # acquire H_lock in order to access Head
    node = Q->Head             # read Head
    new_head = node->next      # read next pointer
    if new_head == NULL        # is the queue empty?
        unlock(&Q->H_lock)     # if so, release the lock and return false
        return FALSE
    endif
    *pvalue = new_head->value  # queue was not empty: read value before releasing the lock
    Q->Head = new_head         # swing Head to the next node
    unlock(&Q->H_lock)         # release H_lock
    free(node)                 # free node
    return TRUE                # queue was not empty, dequeue succeeded

Michael & Scott, Simple, fast, and practical non-blocking and blocking concurrent queue algorithms: http://dl.acm.org/citation.cfm?doid=248052.248106

Page 67:

A lock free queue

struct pointer_t {ptr: pointer to node_t, count: unsigned int}
struct node_t    {value: data type, next: pointer_t}
struct queue_t   {Head: pointer_t, Tail: pointer_t}

initialize(Q: pointer to queue_t)
    node = new_node()           # allocate a free node
    node->next.ptr = NULL       # make it the only node in the linked list
    Q->Head = Q->Tail = node    # both Head and Tail point to it

Michael & Scott, Simple, fast, and practical non-blocking and blocking concurrent queue algorithms: http://dl.acm.org/citation.cfm?doid=248052.248106

Page 68:

A lock free queue

enqueue(Q: pointer to queue_t, value: data type)
E1:  node = new_node()              # create a new node
E2:  node->value = value
E3:  node->next.ptr = NULL
E4:  loop                           # keep trying until enqueue is finished
E5:      tail = Q->Tail             # read Tail.ptr and Tail.count together
E6:      next = tail.ptr->next      # read next ptr and count fields together
E7:      if tail == Q->Tail         # are tail and next consistent?
E8:          if next.ptr == NULL    # was Tail pointing to the last node?
                 # try to link node at the end of the linked list
E9:              if CAS(&tail.ptr->next, next, <node, next.count+1>)
E10:                 break          # enqueue is done, exit loop
E11:             endif
E12:         else                   # Tail was not pointing to the last node
                 # try to swing Tail to the next node
E13:             CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)
E14:         endif
E15:     endif
E16: endloop
     # enqueue is done, try to swing Tail to the inserted node
E17: CAS(&Q->Tail, tail, <node, tail.count+1>)

Michael & Scott, Simple, fast, and practical non-blocking and blocking concurrent queue algorithms: http://dl.acm.org/citation.cfm?doid=248052.248106

Page 69:

A lock free queue

dequeue(Q: pointer to queue_t, pvalue: pointer to data type): boolean
D1:  loop                             # keep trying until dequeue is finished
D2:      head = Q->Head               # read Head
D3:      tail = Q->Tail               # read Tail
D4:      next = head.ptr->next        # read Head.ptr->next
D5:      if head == Q->Head           # are head, tail and next consistent?
D6:          if head.ptr == tail.ptr  # is the queue empty or Tail falling behind?
D7:              if next.ptr == NULL  # is the queue empty?
D8:                  return FALSE     # queue is empty, could not dequeue
D9:              endif
                 # Tail is falling behind, try to advance it
D10:             CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)
D11:         else                     # no need to deal with Tail
                 # read value before CAS, otherwise another dequeue
                 # might free the next node
D12:             *pvalue = next.ptr->value
                 # try to swing Head to the next node
D13:             if CAS(&Q->Head, head, <next.ptr, head.count+1>)
D14:                 break            # dequeue is done, exit loop
D15:             endif
D16:         endif
D17:     endif
D18: endloop
D19: free(head.ptr)                   # it is now safe to free the old dummy node
D20: return TRUE                      # queue was not empty, dequeue succeeded

Michael & Scott, Simple, fast, and practical non-blocking and blocking concurrent queue algorithms: http://dl.acm.org/citation.cfm?doid=248052.248106

Page 70:

Conclusions

Page 71:

Conclusions

Just because a program has the right output does not mean it is correct! (See the barrier implementation(s).) Minimizing the use of locks makes your programs faster! If you use locks poorly, you end up with effectively serial programs. Researchers are actively working on better and faster concurrent data structures, trying to eliminate as many locks as possible. Keep yourself up to date!