Multiprocessors and Multithreading – classroom slides
Example use of threads - 1
Figure: (a) Sequential process: compute, I/O, then compute again once the I/O result is available. (b) Multithreaded process: a compute thread issues an I/O request to a separate I/O thread and keeps computing; when the I/O completes, the compute thread picks up the result at the point where it is needed.
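A minimal pthreads sketch of case (b), assuming the "I/O" is a small write and read-back through a temporary file from tmpfile(); io_thread, compute, overlap_compute_and_io and struct io_request are hypothetical names. The caller plays the compute thread: it issues the I/O request by creating the I/O thread, keeps computing, and joins only when it needs the result.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Filled in by the I/O thread; read by the compute thread only after the join. */
struct io_request {
    char data[64];
    size_t len;
    int ok;
};

/* I/O thread: a real (if tiny) file round trip stands in for slow device I/O. */
void *io_thread(void *arg)
{
    struct io_request *req = arg;
    FILE *f = tmpfile();

    req->ok = 0;
    if (f == NULL)
        return NULL;
    fputs("frame 7", f);
    rewind(f);
    if (fgets(req->data, sizeof req->data, f) != NULL) {
        req->len = strlen(req->data);
        req->ok = 1;
    }
    fclose(f);
    return NULL;
}

/* Work that does not depend on the I/O result. */
long compute(long n)
{
    long sum = 0;
    for (long i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* Issue the I/O request, keep computing, then wait when the result is needed. */
long overlap_compute_and_io(long n, struct io_request *req)
{
    pthread_t io_tid;

    if (pthread_create(&io_tid, NULL, io_thread, req) != 0)  /* I/O request */
        return -1;
    long partial = compute(n);        /* compute while the I/O is in flight */
    pthread_join(io_tid, NULL);       /* I/O result needed: wait for completion */
    return partial;
}
```

Even on a uniprocessor this overlap pays off whenever the I/O thread blocks in the kernel, because the compute thread can use the CPU in the meantime.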
Example use of threads - 2
Figure: Digitizer, Tracker, and Alarm threads.
Programming Support for Threads
• creation
  – pthread_create(&tid, attr, top-level procedure, args)
• termination
  – return from top-level procedure
  – explicit kill
• rendezvous
  – creator can wait for children
  – pthread_join(child_tid, &retval)
• synchronization
  – mutex
  – condition variables
Figure: (a) Before thread creation: the main thread calls thread_create(foo, args). (b) After thread creation: both the main thread and a new foo thread exist.
Sample program – thread create/join

    #include <pthread.h>

    /* Top-level procedure of the child thread: pthreads requires void *(*)(void *). */
    void *foo(void *arg)
    {
        int n = *(int *)arg;    /* the slide's int n, passed by address */
        /* ..... */
        return NULL;
    }

    int main(void)
    {
        int f = 0;
        pthread_t child_tid;
        /* ..... */
        pthread_create(&child_tid, NULL, foo, &f);   /* child starts running foo(&f) */
        /* ..... */
        pthread_join(child_tid, NULL);               /* wait for the child to finish */
        return 0;
    }
Programming with Threads
• synchronization
  – for coordination of the threads
• communication
  – for inter-thread sharing of data
  – threads can be in different processors
  – how to achieve sharing in SMP?
    • software: the OS keeps all threads in the same address space
    • hardware: hardware shared memory and coherent caches
Figure: a producer and a consumer communicating through a shared buffer.
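A minimal sketch of the software half of the answer, assuming POSIX threads: because both threads live in one address space, the writer's ordinary store to the global shared_value is visible to the creating thread once pthread_join returns (the join is also the synchronization point). shared_value, writer and communicate_via_shared_memory are hypothetical names; on an SMP it is the hardware shared memory and coherent caches that carry the same store between processors.

```c
#include <pthread.h>

/* Globals (and the heap) live in the single address space all threads share. */
int shared_value = 0;

/* Writer thread: a plain store, no copying between address spaces. */
void *writer(void *arg)
{
    shared_value = *(int *)arg;
    return NULL;
}

/* Returns what the creating thread reads after the writer has finished. */
int communicate_via_shared_memory(int v)
{
    pthread_t tid;

    pthread_create(&tid, NULL, writer, &v);
    pthread_join(tid, NULL);   /* join orders the writer's store before our read */
    return shared_value;
}
```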
Need for Synchronization

    digitizer()
    {
        image_type dig_image;
        int tail = 0;

        loop {
            if (bufavail > 0) {
                grab(dig_image);
                frame_buf[tail mod MAX] = dig_image;
                tail = tail + 1;
                bufavail = bufavail - 1;
            }
        }
    }

    tracker()
    {
        image_type track_image;
        int head = 0;

        loop {
            if (bufavail < MAX) {
                track_image = frame_buf[head mod MAX];
                head = head + 1;
                bufavail = bufavail + 1;
                analyze(track_image);
            }
        }
    }

Problem?
Figure: shared data structure. The digitizer executes bufavail = bufavail - 1 and the tracker executes bufavail = bufavail + 1 on the shared variable bufavail. frame_buf is a circular buffer with slots 0 to 99; head marks the first valid filled frame and tail marks the first empty spot.
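The problem with the two updates to bufavail can be replayed deterministically. Assuming each update compiles to a separate load, add, and store (a typical but not guaranteed code sequence), this sketch writes out one bad interleaving as plain sequential C, with r_dig and r_trk standing in for the two threads' registers; the function name is hypothetical.

```c
/* One bad interleaving of
 *   digitizer: bufavail = bufavail - 1;
 *   tracker:   bufavail = bufavail + 1;
 * r_dig and r_trk model each thread's private register. */
int lost_update_interleaving(int bufavail)
{
    int r_dig = bufavail;   /* digitizer loads bufavail            */
    int r_trk = bufavail;   /* tracker loads the same old value    */
    r_dig = r_dig - 1;      /* digitizer decrements its copy       */
    r_trk = r_trk + 1;      /* tracker increments its copy         */
    bufavail = r_dig;       /* digitizer stores                    */
    bufavail = r_trk;       /* tracker stores, overwriting it      */
    return bufavail;        /* should equal the input, but is +1   */
}
```

Starting from 50, one decrement and one increment should leave 50; this interleaving returns 51 because the digitizer's store is lost. Swapping the last two stores would lose the increment instead and return 49.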
Synchronization Primitives
• lock and unlock
  – mutual exclusion among threads
  – busy-waiting vs. blocking
  – pthread_mutex_trylock: no blocking
  – pthread_mutex_lock: blocking
  – pthread_mutex_unlock
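A minimal sketch of the lock calls listed above, using the standard pthread_mutex_* API; counter_lock, counter, add_n, locked_count and trylock_while_held are hypothetical names. locked_count shows mutual exclusion (no increments are lost), and trylock_while_held shows that pthread_mutex_trylock returns EBUSY immediately instead of waiting when the mutex is already held.

```c
#include <errno.h>
#include <pthread.h>

pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
long counter = 0;

/* Each thread adds 1 to the shared counter n times inside the critical section. */
void *add_n(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);     /* blocks while another thread holds it */
        counter = counter + 1;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs nthreads threads (at most 8) that each add n; returns the final count. */
long locked_count(int nthreads, long n)
{
    pthread_t tids[8];

    if (nthreads > 8)
        nthreads = 8;
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, add_n, &n);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}

/* pthread_mutex_trylock never blocks: it reports EBUSY if the mutex is held. */
int trylock_while_held(void)
{
    pthread_mutex_lock(&counter_lock);
    int rc = pthread_mutex_trylock(&counter_lock);   /* EBUSY for a default mutex */
    pthread_mutex_unlock(&counter_lock);
    return rc;
}
```

Calling trylock in a loop until it succeeds would turn the blocking lock into a busy-waiting one.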
Fix number 1 – with locks

    digitizer()
    {
        image_type dig_image;
        int tail = 0;

        loop {
            thread_mutex_lock(buflock);
            if (bufavail > 0) {
                grab(dig_image);
                frame_buf[tail mod MAX] = dig_image;
                tail = tail + 1;
                bufavail = bufavail - 1;
            }
            thread_mutex_unlock(buflock);
        }
    }

    tracker()
    {
        image_type track_image;
        int head = 0;

        loop {
            thread_mutex_lock(buflock);
            if (bufavail < MAX) {
                track_image = frame_buf[head mod MAX];
                head = head + 1;
                bufavail = bufavail + 1;
                analyze(track_image);
            }
            thread_mutex_unlock(buflock);
        }
    }

Problem?
Fix number 2

    digitizer()
    {
        image_type dig_image;
        int tail = 0;

        loop {
            grab(dig_image);
            thread_mutex_lock(buflock);
            while (bufavail == 0) do nothing;
            thread_mutex_unlock(buflock);
            frame_buf[tail mod MAX] = dig_image;
            tail = tail + 1;
            thread_mutex_lock(buflock);
            bufavail = bufavail - 1;
            thread_mutex_unlock(buflock);
        }
    }

    tracker()
    {
        image_type track_image;
        int head = 0;

        loop {
            thread_mutex_lock(buflock);
            while (bufavail == MAX) do nothing;
            thread_mutex_unlock(buflock);
            track_image = frame_buf[head mod MAX];
            head = head + 1;
            thread_mutex_lock(buflock);
            bufavail = bufavail + 1;
            thread_mutex_unlock(buflock);
            analyze(track_image);
        }
    }

Problem?
Fix number 3

    digitizer()
    {
        image_type dig_image;
        int tail = 0;

        loop {
            grab(dig_image);
            while (bufavail == 0) do nothing;
            frame_buf[tail mod MAX] = dig_image;
            tail = tail + 1;
            thread_mutex_lock(buflock);
            bufavail = bufavail - 1;
            thread_mutex_unlock(buflock);
        }
    }

    tracker()
    {
        image_type track_image;
        int head = 0;

        loop {
            while (bufavail == MAX) do nothing;
            track_image = frame_buf[head mod MAX];
            head = head + 1;
            thread_mutex_lock(buflock);
            bufavail = bufavail + 1;
            thread_mutex_unlock(buflock);
            analyze(track_image);
        }
    }

Problem?
• condition variables
  – pthread_cond_wait: block for a signal
  – pthread_cond_signal: signal one waiting thread
  – pthread_cond_broadcast: signal all waiting threads
Wait and signal with cond vars
Figure: (a) Wait before signal: T1 calls cond_wait(c, m) and is blocked; T2 later calls cond_signal(c), and T1 is resumed. (b) Wait after signal: T2 calls cond_signal(c) before T1 calls cond_wait(c, m), so T1 is blocked forever.
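Case (b) is why a condition variable is normally paired with a predicate that the signaler sets before signaling. In this sketch (hypothetical names; m and c mirror the figure), T2 signals before T1 ever waits. Because T1 checks ready under the mutex before calling pthread_cond_wait, it does not block, whereas a bare pthread_cond_wait would wait forever.

```c
#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;  /* named after the figure's m */
pthread_cond_t c = PTHREAD_COND_INITIALIZER;    /* named after the figure's c */
int ready = 0;                                  /* the condition the signal is about */

/* T1: waits only while the predicate is false, so an earlier signal is not lost. */
void *t1_waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    while (!ready)
        pthread_cond_wait(&c, &m);
    pthread_mutex_unlock(&m);
    return NULL;
}

/* T2: makes the predicate true, then signals. */
void t2_signaler(void)
{
    pthread_mutex_lock(&m);
    ready = 1;
    pthread_cond_signal(&c);   /* has no effect if nobody is waiting yet */
    pthread_mutex_unlock(&m);
}

/* Case (b): signal first, then start the waiter; T1 still finishes. */
int signal_before_wait(void)
{
    pthread_t t1;

    ready = 0;
    t2_signaler();
    pthread_create(&t1, NULL, t1_waiter, NULL);
    pthread_join(t1, NULL);    /* returns because T1 saw ready == 1 */
    return ready;
}
```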
Fix number 4 – cond var

    digitizer()
    {
        image_type dig_image;
        int tail = 0;

        loop {
            grab(dig_image);
            thread_mutex_lock(buflock);
            if (bufavail == 0)
                thread_cond_wait(buf_not_full, buflock);
            thread_mutex_unlock(buflock);
            frame_buf[tail mod MAX] = dig_image;
            tail = tail + 1;
            thread_mutex_lock(buflock);
            bufavail = bufavail - 1;
            thread_cond_signal(buf_not_empty);
            thread_mutex_unlock(buflock);
        }
    }

    tracker()
    {
        image_type track_image;
        int head = 0;

        loop {
            thread_mutex_lock(buflock);
            if (bufavail == MAX)
                thread_cond_wait(buf_not_empty, buflock);
            thread_mutex_unlock(buflock);
            track_image = frame_buf[head mod MAX];
            head = head + 1;
            thread_mutex_lock(buflock);
            bufavail = bufavail + 1;
            thread_cond_signal(buf_not_full);
            thread_mutex_unlock(buflock);
            analyze(track_image);
        }
    }
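This solution is correct so long as there is exactly one producer and one consumer, and only if thread_cond_wait never returns without a signal. POSIX allows pthread_cond_wait to wake up spuriously, so with real pthreads the if tests should become while loops.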
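A sketch of a version that keeps working with several producers and consumers, under assumptions that differ from the slide: ints stand in for image_type, MAX is an arbitrary 8, and head and tail become shared variables updated inside buflock (a per-thread tail, as in the slide, cannot work once there are two producers). The key change is retesting each condition in a while loop, so a thread that wakes up to find another thread already took the slot or the frame simply waits again. put, get, producer, consumer and run_two_by_two are hypothetical names.

```c
#include <pthread.h>

#define MAX 8                     /* hypothetical buffer size */
#define PER_THREAD 1000           /* items each producer puts and each consumer gets */

int frame_buf[MAX];               /* ints stand in for image_type */
int head = 0, tail = 0, bufavail = MAX;
pthread_mutex_t buflock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t buf_not_full = PTHREAD_COND_INITIALIZER;
pthread_cond_t buf_not_empty = PTHREAD_COND_INITIALIZER;

void put(int item)
{
    pthread_mutex_lock(&buflock);
    while (bufavail == 0)                          /* retest after every wakeup */
        pthread_cond_wait(&buf_not_full, &buflock);
    frame_buf[tail] = item;                        /* slot update stays inside the lock */
    tail = (tail + 1) % MAX;
    bufavail = bufavail - 1;
    pthread_cond_signal(&buf_not_empty);
    pthread_mutex_unlock(&buflock);
}

int get(void)
{
    pthread_mutex_lock(&buflock);
    while (bufavail == MAX)
        pthread_cond_wait(&buf_not_empty, &buflock);
    int item = frame_buf[head];
    head = (head + 1) % MAX;
    bufavail = bufavail + 1;
    pthread_cond_signal(&buf_not_full);
    pthread_mutex_unlock(&buflock);
    return item;
}

void *producer(void *arg)
{
    (void)arg;
    for (int i = 1; i <= PER_THREAD; i++)
        put(i);
    return NULL;
}

void *consumer(void *arg)
{
    long *sum = arg;                               /* each consumer owns its own sum */
    for (int i = 0; i < PER_THREAD; i++)
        *sum += get();
    return NULL;
}

/* Two producers and two consumers; returns the total of all items consumed. */
long run_two_by_two(void)
{
    pthread_t p[2], c[2];
    long sums[2] = {0, 0};

    for (int i = 0; i < 2; i++) {
        pthread_create(&p[i], NULL, producer, NULL);
        pthread_create(&c[i], NULL, consumer, &sums[i]);
    }
    for (int i = 0; i < 2; i++) {
        pthread_join(p[i], NULL);
        pthread_join(c[i], NULL);
    }
    return sums[0] + sums[1];
}
```

Each producer puts 1 to 1000, so the consumers should see a total of 2 × 500500 = 1001000 whatever the interleaving. Copying into frame_buf inside the lock gives up some of the slide's overlap in exchange for simplicity, which is a reasonable trade for small items.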
Gotchas in programming with cond vars

    acquire_shared_resource()
    {
        thread_mutex_lock(cs_mutex);                  /* T3 is here */
        if (res_state == BUSY)
            thread_cond_wait(res_not_busy, cs_mutex); /* T2 is here */
        res_state = BUSY;
        thread_mutex_unlock(cs_mutex);
    }

    release_shared_resource()
    {
        thread_mutex_lock(cs_mutex);
        res_state = NOT_BUSY;                         /* T1 is here */
        thread_cond_signal(res_not_busy);
        thread_mutex_unlock(cs_mutex);
    }
State of waiting queues
(a) Waiting queues before T1 signals
  cs_mutex: T3
  res_not_busy: T2
(b) Waiting queues after T1 signals
  cs_mutex: T3, T2
  res_not_busy: empty
T3 is ahead of T2 for cs_mutex, so T3 gets the resource first. When T2 then returns from thread_cond_wait, the if does not recheck res_state, so T2 also sets res_state = BUSY and both threads believe they own the resource.
Defensive programming – retest predicate

    acquire_shared_resource()
    {
        thread_mutex_lock(cs_mutex);                  /* T3 is here */
        while (res_state == BUSY)
            thread_cond_wait(res_not_busy, cs_mutex); /* T2 is here */
        res_state = BUSY;
        thread_mutex_unlock(cs_mutex);
    }

    release_shared_resource()
    {
        thread_mutex_lock(cs_mutex);
        res_state = NOT_BUSY;                         /* T1 is here */
        thread_cond_signal(res_not_busy);
        thread_mutex_unlock(cs_mutex);
    }
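A runnable pthreads rendering of the defensive version, for illustration: three threads (T1 to T3) repeatedly acquire and release the resource, and holders and max_holders are hypothetical instrumentation, updated under cs_mutex, that record how many threads believed they owned the resource at once. With the while loop the maximum should always be 1; the if version can occasionally reach 2, though whether a given run exposes that is timing dependent.

```c
#include <pthread.h>

enum { NOT_BUSY, BUSY };

pthread_mutex_t cs_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t res_not_busy = PTHREAD_COND_INITIALIZER;
int res_state = NOT_BUSY;

int holders = 0, max_holders = 0;   /* instrumentation, not part of the protocol */

void acquire_shared_resource(void)
{
    pthread_mutex_lock(&cs_mutex);
    while (res_state == BUSY)                       /* retest the predicate */
        pthread_cond_wait(&res_not_busy, &cs_mutex);
    res_state = BUSY;
    pthread_mutex_unlock(&cs_mutex);
}

void release_shared_resource(void)
{
    pthread_mutex_lock(&cs_mutex);
    res_state = NOT_BUSY;
    pthread_cond_signal(&res_not_busy);
    pthread_mutex_unlock(&cs_mutex);
}

/* One user of the resource: count itself in, then out, while holding it. */
void *user(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        acquire_shared_resource();

        pthread_mutex_lock(&cs_mutex);
        holders++;
        if (holders > max_holders)
            max_holders = holders;
        pthread_mutex_unlock(&cs_mutex);

        pthread_mutex_lock(&cs_mutex);
        holders--;
        pthread_mutex_unlock(&cs_mutex);

        release_shared_resource();
    }
    return NULL;
}

/* Runs three users and reports the most simultaneous holders observed. */
int max_simultaneous_holders(void)
{
    pthread_t t[3];

    for (int i = 0; i < 3; i++)
        pthread_create(&t[i], NULL, user, NULL);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return max_holders;
}
```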
Threads as software structuring abstraction
Figure: (a) Dispatcher model: a dispatcher takes requests from a mailbox and hands them to worker threads. (b) Team model: peer threads that take their work from mailboxes. (c) Pipelined model: threads arranged as a chain of stages.
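A minimal sketch of (a), the dispatcher model, under simplifying assumptions: the mailbox is an array of requests that is already full, the dispatcher creates one worker thread per request instead of handing work to a fixed pool, and squaring a number stands in for real work. struct request, worker and dispatch_all are hypothetical names.

```c
#include <pthread.h>

#define NREQ 4   /* hypothetical limit on requests per dispatch */

/* A request and the slot for its reply; the worker owns it while it runs. */
struct request {
    int arg;
    int reply;
};

/* Worker: serves exactly one request. */
void *worker(void *p)
{
    struct request *r = p;
    r->reply = r->arg * r->arg;
    return NULL;
}

/* Dispatcher: hands each request in the mailbox to a new worker, then waits. */
int dispatch_all(struct request mailbox[], int n)
{
    pthread_t workers[NREQ];
    int created = 0;

    if (n > NREQ)
        return -1;
    for (; created < n; created++)
        if (pthread_create(&workers[created], NULL, worker, &mailbox[created]) != 0)
            break;
    for (int i = 0; i < created; i++)   /* always join whatever was started */
        pthread_join(workers[i], NULL);
    return created == n ? 0 : -1;
}
```

A pool of long-lived workers fed through a bounded buffer, as in the producer/consumer slides, avoids paying the cost of thread creation on every request.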
Threads and OS
Traditional OS
• DOS
  – memory layout
  – protection between user and kernel?
Figure: DOS memory layout: program data (user) and DOS code and data (kernel).
• Unix
  – memory layout
  – protection between user and kernel?
  – PCB?
Figure: Unix memory layout: processes P1 and P2, each with its own process code and data in user space; kernel code and data, including a PCB for each process, in kernel space.
• programs in these traditional OS are single threaded
  – one PC per program (process), one stack, one set of CPU registers
  – if a process blocks (say disk I/O, network communication, etc.) then no progress for the program as a whole
MT Operating Systems
How widespread is support for threads in OS?
• Digital Unix, Sun Solaris, Win95, Win NT, Win XP
Process Vs. Thread?
• in a single threaded program, the state of the executing program is contained in a process
• in a MT program, the state of the executing program is contained in several ‘concurrent’ threads
Process Vs. Thread
– computational state (PC, regs, …) for each thread
– how different from process state?
Figure: processes P1 and P2 in user space, each with its own code and data and its own threads (T1, T2, T3); kernel code and data, with a PCB per process, in kernel space.
Figure: memory layout of (a) an ST (single-threaded) program: code, global, heap, stack; (b) an MT (multithreaded) program: code, global, heap, and one stack per thread (stack1 to stack4).
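The MT layout can be checked directly: code, globals and heap are shared, but each thread has its own stack. This sketch assumes POSIX threads and uintptr_t from <stdint.h>; global_var, struct where, record_addresses and same_global_different_stack are hypothetical names. The main thread and a child thread each record the address of a local variable and of the same global.

```c
#include <pthread.h>
#include <stdint.h>

int global_var;                      /* one copy, shared by every thread */

struct where {
    uintptr_t local_addr;
    uintptr_t global_addr;
};

/* Records where this thread's local variable and the global live. */
void *record_addresses(void *arg)
{
    struct where *w = arg;
    int local_var = 0;               /* lives on the calling thread's own stack */

    w->local_addr = (uintptr_t)&local_var;
    w->global_addr = (uintptr_t)&global_var;
    return NULL;
}

/* 1 if both threads see the same global but different stack addresses. */
int same_global_different_stack(void)
{
    struct where main_view, child_view;
    pthread_t tid;

    record_addresses(&main_view);    /* runs on the main thread's stack */
    pthread_create(&tid, NULL, record_addresses, &child_view);
    pthread_join(tid, NULL);
    return main_view.global_addr == child_view.global_addr
        && main_view.local_addr != child_view.local_addr;
}
```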
• threads
  – share the address space of the process
  – cooperate to get the job done
• threads concurrent?
  – they may be, if the box is a true multiprocessor
  – they share the same CPU on a uniprocessor
• threaded code different from non-threaded?
  – protection for data shared among threads
  – synchronization among threads
Threads Implementation
• user level threads
  – OS independent
  – scheduler is part of the runtime system
  – thread switch is cheap (save PC, SP, regs)
  – scheduling customizable, i.e., more app control
  – blocking call by thread blocks process
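To make "thread switch is cheap (save PC, SP, regs)" concrete, here is a minimal sketch of two user-level threads and a round-robin scheduler that live entirely in user space. It assumes Linux with glibc and uses getcontext, makecontext and swapcontext from <ucontext.h>, which were removed from POSIX.1-2008 but are still provided by glibc; thread_yield, body and run_user_threads are hypothetical names. Each swapcontext saves the current registers, including PC and SP, and loads another thread's; the kernel never sees a second thread.

```c
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

ucontext_t sched_ctx, ctx[2];        /* scheduler + two user-level threads */
char stacks[2][STACK_SIZE];          /* each user-level thread gets its own stack */
int current;                         /* index of the running thread */
int done[2];
char trace[32];                      /* order in which the threads ran */
int trace_len;

/* Voluntary yield: save this thread's context and resume the scheduler. */
void thread_yield(void)
{
    swapcontext(&ctx[current], &sched_ctx);
}

/* Body of each thread: record itself three times, yielding in between. */
void body(void)
{
    for (int i = 0; i < 3; i++) {
        trace[trace_len++] = (char)('A' + current);
        thread_yield();
    }
    done[current] = 1;               /* returning resumes uc_link, the scheduler */
}

/* Round-robin scheduler, part of the runtime rather than the kernel. */
const char *run_user_threads(void)
{
    trace_len = 0;
    for (int i = 0; i < 2; i++) {
        done[i] = 0;
        getcontext(&ctx[i]);
        ctx[i].uc_stack.ss_sp = stacks[i];
        ctx[i].uc_stack.ss_size = sizeof stacks[i];
        ctx[i].uc_link = &sched_ctx;
        makecontext(&ctx[i], body, 0);
    }
    while (!done[0] || !done[1])
        for (current = 0; current < 2; current++)
            if (!done[current])
                swapcontext(&sched_ctx, &ctx[current]);
    trace[trace_len] = '\0';
    return trace;
}
```

The threads alternate, so the trace reads ABABAB. If body made a blocking system call, the whole process, and with it both user-level threads, would stop.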
Figure: user level threads. In user space, each process (P1, P2, P3) links a threads library that keeps its own thread ready_q (T1, T2, T3) and implements mutex and cond_var; the kernel keeps only a process ready_q (P1, P2, P3).
Figure: the currently executing thread of P1 makes a blocking call to the OS; the kernel makes an upcall to P1's threads library (threads T1, T2, T3).
• solution to blocking problem in user level threads
  – non-blocking version of all system calls
  – polling wrapper in scheduler for such calls
• switching among user level threads
  – yield voluntarily
  – how to make preemptive?
    • timer interrupt from kernel to switch
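A minimal sketch of the non-blocking fix, assuming POSIX pipe, fcntl and read: with O_NONBLOCK set, read returns -1 with errno EAGAIN (or EWOULDBLOCK) instead of putting the whole process to sleep, so a user-level threads library can wrap it and run another thread while the data is not ready. ult_read, demo_yield and demo_polling_read are hypothetical names, and demo_yield only stands in for switching to another user-level thread that produces the input.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Polling wrapper: retry a non-blocking read, yielding whenever it would block.
 * fd must already be in O_NONBLOCK mode. */
ssize_t ult_read(int fd, void *buf, size_t len, void (*yield)(void))
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
            return n;            /* got data, EOF, or a real error */
        yield();                 /* would block: let another user-level thread run */
    }
}

int demo_fds[2];
int demo_yields = 0;

/* Stand-in for "another user-level thread": when scheduled, it writes the input. */
void demo_yield(void)
{
    demo_yields++;
    ssize_t w = write(demo_fds[1], "x", 1);
    (void)w;                     /* a one-byte write to a fresh pipe does not fail */
}

/* Reads one byte from an initially empty, non-blocking pipe via ult_read. */
int demo_polling_read(char *out)
{
    if (pipe(demo_fds) != 0)
        return -1;
    fcntl(demo_fds[0], F_SETFL, fcntl(demo_fds[0], F_GETFL) | O_NONBLOCK);
    demo_yields = 0;
    ssize_t n = ult_read(demo_fds[0], out, 1, demo_yield);
    close(demo_fds[0]);
    close(demo_fds[1]);
    return (int)n;
}
```

The first read finds the pipe empty and returns EAGAIN, the wrapper yields once, and the retry returns the byte.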
• kernel level
  – expensive thread switch
  – makes sense for blocking calls by threads
  – kernel becomes complicated: process vs. threads scheduling
  – thread packages become non-portable
• problems common to user and kernel level threads
  – libraries
  – solution is to have thread-safe wrappers to such library calls
Figure: kernel level threads. The kernel contains a process level scheduler (process ready_q: P1, P2, P3) and a thread level scheduler for the threads (T1, T2, T3) of each process.
Solaris threads
Figure: user threads (T1, T2, T3) of processes P1, P2, P3 are mapped onto lightweight processes (lwp), which the kernel schedules.
Thread safe libraries

    /* original version */
    void *malloc(size_t size)
    {
        ......
        ......
        return (memory_pointer);
    }

    /* thread safe version */
    mutex_lock_type cs_mutex;

    void *malloc(size_t size)
    {
        thread_mutex_lock(cs_mutex);
        ......
        ......
        thread_mutex_unlock(cs_mutex);
        return (memory_pointer);
    }
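The same wrapper pattern applied to another routine: next_id below is a hypothetical library function with hidden static state, and safe_next_id wraps it in one global mutex, just as the malloc example does. Four threads each draw 10000 ids through the wrapper; if the wrapper works, every id from 1 to 40000 is handed out exactly once. All names here are hypothetical.

```c
#include <pthread.h>

#define IDS_PER_THREAD 10000
#define NTHREADS 4

/* Non-thread-safe "library" routine: read-modify-write on static state. */
int next_id(void)
{
    static int last = 0;
    int id = last + 1;
    last = id;
    return id;
}

/* Thread-safe wrapper: one lock around the unsafe call. */
pthread_mutex_t id_mutex = PTHREAD_MUTEX_INITIALIZER;

int safe_next_id(void)
{
    pthread_mutex_lock(&id_mutex);
    int id = next_id();
    pthread_mutex_unlock(&id_mutex);
    return id;
}

char seen[NTHREADS * IDS_PER_THREAD + 1];   /* seen[id] is set once id is handed out */

void *grab_ids(void *arg)
{
    (void)arg;
    for (int i = 0; i < IDS_PER_THREAD; i++)
        seen[safe_next_id()] = 1;           /* distinct ids touch distinct bytes */
    return NULL;
}

/* Returns how many distinct ids the threads received in total. */
int distinct_ids(void)
{
    pthread_t t[NTHREADS];
    int count = 0;

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, grab_ids, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    for (int id = 1; id <= NTHREADS * IDS_PER_THREAD; id++)
        count += seen[id];
    return count;
}
```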
Synchronization support
• Lock
  – Test and set instruction
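A minimal spin lock built on test-and-set, assuming C11 <stdatomic.h>: atomic_flag_test_and_set atomically sets the flag and returns its previous value, which is the test-and-set contract (compilers typically emit an atomic exchange instruction for it). tas_acquire, tas_release, tas_worker and tas_count are hypothetical names; unlike pthread_mutex_lock, this lock busy-waits.

```c
#include <pthread.h>
#include <stdatomic.h>

atomic_flag tas_lock = ATOMIC_FLAG_INIT;
long tas_counter = 0;

/* Spin until test-and-set returns false, i.e. this thread was the one to set it. */
void tas_acquire(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;   /* busy-wait */
}

void tas_release(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

void *tas_worker(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        tas_acquire(&tas_lock);
        tas_counter++;               /* critical section */
        tas_release(&tas_lock);
    }
    return NULL;
}

/* Runs nthreads threads (at most 8) that each increment n times. */
long tas_count(int nthreads, long n)
{
    pthread_t t[8];

    if (nthreads > 8)
        nthreads = 8;
    tas_counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, tas_worker, &n);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return tas_counter;
}
```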
SMP
Figure: CPUs and input/output connected by a shared bus to shared memory.
SMP with per-processor caches
Figure: each CPU has its own cache; the caches connect over a shared bus to shared memory.
Cache consistency problem
Figure: threads T1, T2, T3 run on processors P1, P2, P3, and each processor's cache holds a copy of location X from shared memory.
Two possible solutions
Figure: (b) write-invalidate protocol: P1 changes its copy X -> X' and sends an invalidate on the shared bus; the copies in P2 and P3 become invalid (X -> inv). (c) write-update protocol: P1 changes its copy X -> X' and sends an update on the shared bus; the copies in P2 and P3 also become X'.
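A toy model of the two protocols in the figure, assuming a single cached location X, three processors, and only two line states (valid or invalid; real protocols such as MESI track more). Values 1 and 2 stand in for X and X', memory is not modeled, and struct line and bus_write are hypothetical names.

```c
#define NPROC 3

enum state { INVALID, VALID };

/* One processor's cached copy of location X. */
struct line {
    enum state st;
    int value;
};

/* Processor `writer` writes new_value to X and broadcasts on the shared bus.
 * invalidate != 0: write-invalidate, other valid copies are marked invalid.
 * invalidate == 0: write-update, other valid copies receive the new value. */
void bus_write(struct line cache[NPROC], int writer, int new_value, int invalidate)
{
    cache[writer].st = VALID;
    cache[writer].value = new_value;
    for (int p = 0; p < NPROC; p++) {
        if (p == writer || cache[p].st == INVALID)
            continue;
        if (invalidate)
            cache[p].st = INVALID;        /* "invalidate ->" on the bus */
        else
            cache[p].value = new_value;   /* "update ->" on the bus */
    }
}
```

In real protocols the trade-off is that, after an invalidate, further writes by P1 need no bus traffic while P2 and P3 take a miss on their next read; write-update puts every write on the bus but keeps the other copies readable without a miss.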
Given the following details about an SMP (symmetric multiprocessor):
• cache coherence protocol: write-invalidate
• cache to memory policy: write-back
• initially, the caches are empty, memory location A contains 10, and memory location B contains 5
Consider the following timeline of memory accesses from processors P1, P2, and P3. What are the contents of the caches and memory?
Time (in increasing order), with each access issued by one of P1, P2, P3:
T1: Load A
T2: Load A
T3: Load A
T4: Store #40, A
T5: Store #30, B
What is multithreading?
• technique allowing a program to do multiple tasks
• is it a new technique?
  – has existed since the 1970s (Concurrent Pascal, Ada tasks, etc.)
• why now?
  – emergence of SMPs in particular
  – “time has come for this technology”
• threads in a uniprocessor?
  – allow concurrency between I/O and user processing even in a uniprocessor box
Multiprocessor: First Principles
• processors, memories, interconnection network
• Classification: SISD, SIMD, MIMD, MISD
• message passing MPs: e.g. IBM SP2
• shared address space MPs
  – cache coherent (CC)
    • SMP: a bus-based CC MIMD machine; several vendors: Sun, Compaq, Intel, ...
    • CC-NUMA: SGI Origin 2000
  – non-cache coherent (NCC)
    • Cray T3D/T3E
• What is an SMP?
  – multiple CPUs in a single box sharing all the resources such as memory and I/O
• Is an SMP more cost effective than two uniprocessor boxes?
  – yes (a dual-processor SMP costs roughly 20% more than a uniprocessor)
  – a modest speedup for a program on a dual-processor SMP over a uniprocessor makes it worthwhile