cs510 – advanced operating systems 1 the synergy between non-blocking synchronization and...
TRANSCRIPT
CS510 – Advanced Operating Systems 1
The Synergy Between Non-blocking Synchronization and Operating
System Structure
By Michael Greenwald and David Cheriton
Presented by Jonathan Walpole
CS510 – Advanced Operating Systems
2
The Cache Kernel
Yet Another Attempt at a µ-kernel Minimal privileged mode code: Caching model
of kernel functionality OS functionality in user-mode class libraries
allowing application-customizablility of OS Memory and signal-based communication Goal: scalable, robust, flexible operating system
design
CS510 – Advanced Operating Systems
3
Synergy of NBS and kernel structure
Claim: NBS and Good Design go together in the Cache Kernel design and implementation
o NBS allows better OS structuringo Good OS structure simplifies NBS
CS510 – Advanced Operating Systems
4
Non-blocking Synchronization
Basic ideao Associate version number with data structureo Read version number at start of critical sectiono Atomically check and increment version number at end
of critical section at the same time as applying the update(s), using a Double Compare and Swap (DCAS) instruction.
o Try again on failure
Similar to Synthesis approach
CS510 – Advanced Operating Systems
5
Example: Deletion from Linked List
do {retry:
backoffIfNeeded();version = list->version; /* Read version # */for (p = list->head;(p->next != elt); p = p->next) {
if (p == NULL) { /* Not found */if (version != list->version){ goto retry; }/* List changed */return NULL; /* Really not found
*/}
}} while( !DCAS( &(list->version), &(p->next),
version, elt,version+1, elt->next
) )
CS510 – Advanced Operating Systems
6
Double Compare and Swap
int DCAS( int *addr1, int *addr2, int old1, int old2, int new1, int new2){ <begin atomic>
if ((*addr1 == old1) && (*addr2 == old2)) { *addr1 = new1;
*addr2 = new2;
return(TRUE);
} else { return(FALSE);
} <end atomic> }
CS510 – Advanced Operating Systems
7
Problem …
What happens if someone else deletes an element while we are traversing the list?
o What if we end up traversing into the free pool?o What if the memory is reused?o What if we end up in a different data structure?o What if the memory is reused for a different type?
How can we avoid these problems?
CS510 – Advanced Operating Systems
8
Solutions to reader hijacking
If memory is reclaimed immediately upon updates, readers must protect themselves from hijacking
Possible solution 1?o Readers use a load-linked / store-conditional (LL/SC)
sequence to detect changed pointers at the end of their critical sections … or at the end of each use of a pointer
Solution 2:o Readers verify that version numbers associated with
data structures have not changed (2 memory barriers) at the end of each use of a pointer
CS510 – Advanced Operating Systems
9
Type-stable memory management (TSM)
Descriptor that is allocated as type T is guaranteed to remain of type T for at least tstable
o I.e. cannot switch quickly from T1 to T2 by reallocationo Restrict or prevent memory reuse???o Generalization of existing technique:
• e.g. process-descriptors are statically allocated at system init, and are type-stable for lifetime of system
Advantageo T * ptr remains a pointer to an instance of T even if ptr
is changed asynchronously by another process Problem
o Unacceptable restriction on memory reuse for production systems
CS510 – Advanced Operating Systems
10
TSM aids Non-Blocking Synchronization
Type stability ensures safety of pointer dereferences.
o Without TSM, delete example is too big to fit on slide• And very expensive to execute
o Need to check for changes on each loop iterationo TSM makes NBS code simpler and faster
CS510 – Advanced Operating Systems
11
Other benefits of NBS in Cache Kernel
Signals are only form of IPC in Cache Kernel NBS simplifies synchronization in signal handlers
o Makes it easier to write efficient signal-safe code
CS510 – Advanced Operating Systems
12
Contention Minimizing Data Structures (CMDS)
Localization-based structuring used in Cache kernel:
o Replication: Per-processor structures (ie. run queue)o Extra level of read-mostly hierarchy (ie. hash tables
with per bucket synchronization)o Cache block alignment of descriptors
Well-known techniques used in other systems (Tornado, Synthesis, …)
CS510 – Advanced Operating Systems
13
Benefits of CMDS
Minimizes logical and physical contentiono Minimizes (memory) consistency overheado Minimizes false sharing
Reduces lock conflicts/convoys in lock-based system
Reduces synchronization contention with NBSo fewer retries from conflict at point of DCAS
CMDS is good for locks, cache consistency and NBS!
o NBS needs CMDS
CS510 – Advanced Operating Systems
14
Minimizing the Window of Inconsistency
Another general strategy to improve performance
1) Delay writes, and group together at end2) Find intermediate consistent states3) Keep log to allow undo/backout
Small W of I reduces probability of leaving inconsistent data structures after failure.
CS510 – Advanced Operating Systems
15
Minimizing the Window of Inconsistency
Preemption-safe: Easy to back-out of a critical section
Reduces lock hold time, and thus, contention (in lock-based systems)
Less probability of failure leaving inconsistency
Good for system design whether you use blocking or non-blocking synchronization!
CS510 – Advanced Operating Systems
16
Does NBS minimize the window of inconsistency?
What is the relationship between window of inconsistency and probability of conflict (contention)?
CS510 – Advanced Operating Systems
17
Priority Inversion Issues
NBS allows synchronization to be subordinate to scheduling
o It avoids priority inversion problem of lockso Also, page fault,I/O, and other blocking
Highest priority process always makes progress
CS510 – Advanced Operating Systems
18
Why NBS is good for OS structure
Fail-stop safe: OS class libraries tolerant of user threads being terminated.
o Most Cache Kernel OS functionality implemented in user-mode class libraries.
Portable: same code on uniprocessor, multiprocessor, and signal handlers.
Deadlock free (almost)
NBS supports class library implementation of OS functions
CS510 – Advanced Operating Systems
19
Implementing non-blocking Synchronization
Basic approacho Read version numbero Rely on TSM for type safety of pointerso Increment version number and check with every
modification (abort if changed).
Straightforward transformation from lockingo … so long as you have DCAS and can tolerate TSMo Replace acquire with read of version numbero Replace release with DCAS …
CS510 – Advanced Operating Systems
20
Implementing non-blocking Synchronization (cont.)
Variants of the basic approach:o N reads, 1 write: No backouto 2 reads, 2 writes: No version number
Optimization:o Cache based advisory locking avoids contention (&
“useless parallelism”)o Uses Cload instruction
In the Cache kernel, every case of synchronization falls into the special cases
CS510 – Advanced Operating Systems
21
Complexity and Correctness
DCAS reduces size of algorithms (lines of code) by 75% compared to CAS, for linked lists, queues and stacks
o Special case used of DCAS reduce complexity further
Relatively straightforward transformation from locking
o Similar code size
CS510 – Advanced Operating Systems
22
Performance Issues
Simulation-based study With non-preemptive scheduling:
o DCAS-based NBS almost as fast as spin-lockso CAS-based NBS slower
With preemptive scheduling:o DCAS and CAS-based NBS better than spin-locks
CS510 – Advanced Operating Systems
23
Conclusions
“Good OS structure” can support Non-blocking synchronization
o Type-Stable Mem. Mgmt (TSM)o Data Structures that Minimize Contention (CMDS)o Minimizing the window of inconsistency
Non-blocking synchronization can support convenient OS structure
o Avoids deadlock; allows signals as sole IPCo Fault tolerant; enables functionality to be moved from kernel.o Performance; isolates scheduling from synch.
Strong synergy between non-blocking synch & good OS structure.
CS510 – Advanced Operating Systems
24
Advantages of Non-blocking Synchronization
Non-blocking:o Signal handlers don’t deadlock
Portability:o same code on uni and multiprocessors and signal
handlers Performance:
o Minimizes interference between synchronization and process scheduling
Recovery:o Insulation from process failures
So why isn’t it universally deployed?
CS510 – Advanced Operating Systems
25
Obstacles to deployment
Complexity:o confusing to design and write efficient algorithms
Correctness:o hard to convince that there are no subtle bugs
Performance:o increased contention, high overhead
Unfamiliarity and insufficient system support