ajayk/tmslide.pdf · 2013. 4. 25.
Transactional Memory
Yaohua Li and Siming Chen
Yaohua Li and Siming Chen Transactional Memory 1 / 41
Background
Processor hits physical limit on transistor density
Cannot simply put more transistors to get more performance
Multi-processor and multi-core systems are now normal
Full utilization via threaded execution
Synchronization is needed to access shared memory
Otherwise program state may become corrupted due to interleaving accesses
Mutual exclusion locks have been widely used for synchronization
Mutual Exclusion Locks
Popular for their simplicity and their support on general-purpose computers
Only the thread owning the lock can access the shared object
Difficult to use and error-prone once the number of locks becomes large
Deadlock/Livelock: if two or more threads need to obtain multiple locks, they may block each other indefinitely or enter a state from which they never exit
Priority Inversion: if a low-priority task obtains a lock, it can block a high-priority task that also wants this lock
Convoy: if a thread owning a lock is de-scheduled, other threads are prevented from making progress
Effort for perfect locking can be huge
Race conditions can arise if the system runs without certain locks, leading to errors and vulnerabilities
Transactional Memory
What is it?
A mechanism that simplifies parallel programming by abstracting concurrent access to shared memory.
With TM,
Multiple threads can simultaneously try to access shared memory in an atomic way.
Atomic transaction: either all of a thread's accesses succeed or none do
atomic {
    hist[array[i][j]]++;
}
Code of an atomic region that computes the histogram of an array
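For contrast, the same histogram update can be written with a mutual exclusion lock. This is only a sketch: standard C++ has no `atomic { ... }` block, and the names `Histogram`, `Image`, `hist_lock` and `add_to_histogram` are assumptions introduced here, not part of the slides.

```cpp
#include <array>
#include <mutex>
#include <vector>

// Hypothetical names; the slide only shows the atomic block itself.
using Histogram = std::array<int, 256>;
using Image = std::vector<std::vector<unsigned char>>;

std::mutex hist_lock;  // one coarse lock standing in for `atomic { ... }`

void add_to_histogram(Histogram& hist, const Image& array) {
    for (const auto& row : array) {
        for (unsigned char value : row) {
            // Pessimistic version: every update serializes on hist_lock,
            // even when two threads touch different bins.
            std::lock_guard<std::mutex> guard(hist_lock);
            ++hist[value];
        }
    }
}
```

With TM, two threads incrementing different bins would not conflict and could both commit; with the single lock they always wait for each other.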
Blocking
For highly scalable network servers, non-blocking I/O APIs are desired
Server is notified when a channel is ready for more input or output operations
Fire an API call, do something else, and check its status later
Demand-driven: input and output processing only occur when input data or output space are available
Non-blocking synchronization algorithms
Do not require mutual exclusion locks to achieve safe concurrency
Try to keep threads (CPUs) from busy waiting for locks
Transactional memory is one member
Transaction
A transaction is composed of one or more revocable, temporary steps, which are committed as a whole with a single atomic command; no step has a visible effect until the commit phase completes.
Transaction
ACID Properties
Atomicity: the transaction will complete in its entirety or not at all
Consistency: all modifications preserve the underlying structure of the object being modified in a consistent way
Isolation: operations appear to have executed in isolation, in their entirety. No partial modification is perceived.
Durability: once committed, the changes persist; once aborted, no residue of the partial transaction remains
With regard to transactional memory, we are more interested in atomicity and isolation.
Comparison
Conflict: violation of a temporal order when performing memory accesses
Transactional Memory
Optimistic about conflicts
Abandons the work of one of the conflicting transactions
Mutual Exclusion Lock
Pessimistic about conflicts
Prevents conflicts from happening
Since actual conflicts are rare in many programs, the optimistic approach of transactional memory often makes more sense.
Benefits of Transactional Memory
TM provides
Higher-level abstraction
Better trade-off between scaling and implementation effort
Inherently deadlock-free, since the programmer takes no locks
Failure atomicity – all or none
Synchronization Levels
Hierarchy of synchronization levels within non-blocking algorithms
Obstruction-freedom: guarantees that a thread operating on a shared data structure will complete if it is eventually executed in isolation. Deadlock cannot occur, but there is no guard against livelock.
Lock-freedom: rules out livelock, i.e. guarantees that at least one thread always makes progress. All lock-free algorithms are obstruction-free. It does not guarantee that individual threads will not starve.
Wait-freedom: guarantees that any thread can complete any operation in a finite number of steps. All wait-free algorithms are also lock-free.
Hardware Support for TM
No intrinsic support for transactional algorithms in most machines
Instructions are provided for basic atomic operations
CompareAndSwap is one such instruction commonly found in modern processors
CompareAndSwap
CompareAndSwap, CAS
Set a memory location to a new value if it currently contains a specified expected value
CompareAndSwap(a: WordAddress, old: Word, new: Word): Bool
    if *a = old
        then *a ← new
             return True
        else return False
Executed as a single machine instruction, which makes it atomic
Building block for creating arbitrarily complicated non-blocking algorithms
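In C++ the same primitive is available through `std::atomic<T>::compare_exchange_strong` and `compare_exchange_weak`. The sketch below wraps it in the slide's signature and builds a lock-free increment on top; the helper names `compare_and_swap` and `fetch_and_increment` are assumptions chosen for illustration.

```cpp
#include <atomic>

// Hypothetical wrapper mirroring the slide's CompareAndSwap signature.
// compare_exchange_strong overwrites its `expected` argument on failure,
// so the by-value parameter keeps the caller's copy unchanged.
bool compare_and_swap(std::atomic<int>& a, int old_value, int new_value) {
    return a.compare_exchange_strong(old_value, new_value);
}

// Lock-free increment built from CAS: retry until no other thread
// changed the word between our read and our write.
int fetch_and_increment(std::atomic<int>& a) {
    int seen = a.load();
    while (!a.compare_exchange_weak(seen, seen + 1)) {
        // On failure `seen` now holds the current value; just retry.
    }
    return seen;  // value observed immediately before our increment
}
```

The retry loop is the typical shape of a CAS-based non-blocking algorithm: read, compute, attempt to publish, and start over if another thread got there first.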
More Instructions
LoadLinked
Load a value from memory and “lock” it
LoadLinked(r: Register, a: WordAddress)
    r ← *a
    Linked(a) ← True
StoreConditional
Store succeeds only if the memory location read in the LoadLinked step has not been modified by some other memory operation
StoreConditional(r: Register, a: WordAddress)
    if Linked(a) = True
        then *a ← r
             r ← 1
        else r ← 0
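LoadLinked/StoreConditional is a hardware feature with no direct C++ equivalent, so the following is only a toy, single-threaded model of the semantics above. The type `LinkedWord` and its member names are assumptions; clearing the link after a successful store is also an assumption of this sketch.

```cpp
#include <cstdint>

// Toy model of one memory word together with its link flag.
// It illustrates the semantics only; it is not thread-safe.
struct LinkedWord {
    std::uint32_t value = 0;
    bool linked = false;  // corresponds to Linked(a) on the slide

    std::uint32_t load_linked() {
        linked = true;
        return value;
    }

    // Any ordinary store by "some other memory operation" breaks the link.
    void plain_store(std::uint32_t v) {
        value = v;
        linked = false;
    }

    // Succeeds only if the link is still intact; returns true on success.
    bool store_conditional(std::uint32_t v) {
        if (!linked) return false;
        value = v;
        linked = false;  // assumption: the link is consumed by the store
        return true;
    }
};
```

A caller would loop: `load_linked`, compute the new value, then `store_conditional`, retrying whenever the store reports failure.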
Example CompareAndSwap
Obstruction-free double-ended queue implementation.
Push and Pop operations from both sides
We look only at the operations on the right side
Implemented with CompareAndSwap
RightPush Procedure
type element = val: valtype; ctr: int
RightPush(A, value)
    while True
        do k ← Oracle(A, right)
           prev ← A[k−1]    // author cited algorithm wrong here
           cur ← A[k]
           if prev.val ≠ Endr ∧ cur.val = Endr
              then if k = Max + 1
                      then return Full
                   if CAS(&A[k−1], prev, 〈prev.val, prev.ctr + 1〉)
                      then if CAS(&A[k], cur, 〈value, cur.ctr + 1〉)
                              then return OK
The Oracle procedure guesses where in the queue the left or right end of the queue is, and returns that index.
RightPop Procedure
type element = val: valtype; ctr: int
RightPop(A)
    while True
        do k ← Oracle(A, right)
           cur ← A[k−1]
           next ← A[k]
           if cur.val ≠ Endr ∧ next.val = Endr
              then if cur.val = Endl ∧ A[k−1] = cur
                      then return Empty
                   if CAS(&A[k], next, 〈Endr, next.ctr + 1〉)
                      then if CAS(&A[k−1], cur, 〈Endr, cur.ctr + 1〉)
                              then return cur.val
The Oracle procedure guesses where in the queue the left or right end of the queue is, and returns that index.
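The two right-side procedures can be sketched in C++ as follows. Several things here are assumptions made for the sketch: each element 〈val, ctr〉 is packed into one 64-bit word so a single CAS covers both fields, `LN` and `RN` are reserved values standing for Endl and Endr, the oracle is a plain linear scan, and left-side operations are omitted.

```cpp
#include <atomic>
#include <cstdint>

constexpr std::uint32_t LN = 0xFFFFFFFEu;  // assumed encoding of Endl
constexpr std::uint32_t RN = 0xFFFFFFFFu;  // assumed encoding of Endr
constexpr int MAX = 4;                     // usable slots are A[1..MAX]

// Element <val, ctr> packed into one word: ctr in the high half, val in the low.
inline std::uint64_t pack(std::uint32_t val, std::uint32_t ctr) {
    return (static_cast<std::uint64_t>(ctr) << 32) | val;
}
inline std::uint32_t val_of(std::uint64_t e) { return static_cast<std::uint32_t>(e); }
inline std::uint32_t ctr_of(std::uint64_t e) { return static_cast<std::uint32_t>(e >> 32); }

struct Deque {
    enum Status { OK, FULL, EMPTY };
    std::atomic<std::uint64_t> A[MAX + 2];

    Deque() {
        A[0].store(pack(LN, 0));  // left sentinel
        for (int k = 1; k <= MAX + 1; ++k) A[k].store(pack(RN, 0));
    }

    // Oracle(A, right): index of the first slot holding Endr (a hint only).
    int oracle_right() const {
        for (int k = 1; k <= MAX; ++k)
            if (val_of(A[k].load()) == RN) return k;
        return MAX + 1;
    }

    Status right_push(std::uint32_t value) {
        for (;;) {
            int k = oracle_right();
            std::uint64_t prev = A[k - 1].load();
            std::uint64_t cur = A[k].load();
            if (val_of(prev) != RN && val_of(cur) == RN) {
                if (k == MAX + 1) return FULL;
                // First CAS only bumps the neighbour's counter, second installs the value.
                if (A[k - 1].compare_exchange_strong(prev, pack(val_of(prev), ctr_of(prev) + 1)) &&
                    A[k].compare_exchange_strong(cur, pack(value, ctr_of(cur) + 1)))
                    return OK;
            }
        }
    }

    Status right_pop(std::uint32_t& out) {
        for (;;) {
            int k = oracle_right();
            std::uint64_t cur = A[k - 1].load();
            std::uint64_t next = A[k].load();
            if (val_of(cur) != RN && val_of(next) == RN) {
                if (val_of(cur) == LN && A[k - 1].load() == cur) return EMPTY;
                if (A[k].compare_exchange_strong(next, pack(RN, ctr_of(next) + 1)) &&
                    A[k - 1].compare_exchange_strong(cur, pack(RN, ctr_of(cur) + 1))) {
                    out = val_of(cur);  // cur is unchanged when the CAS succeeds
                    return OK;
                }
            }
        }
    }
};
```

Stored values must differ from `LN` and `RN`. A failed CAS simply sends the thread around the loop again, which is why the algorithm is obstruction-free rather than lock-free: two threads can keep invalidating each other's attempts.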
Implementations
Hardware Transactional Memory (HTM)
Software Transactional Memory (STM)
Hybrid Transactional Memory (HyTM)
Hardware-assisted STM (HaSTM)
HyTM: Supports HTM but falls back on STM transactions when hardware resources are exceeded.
HaSTM: Combines STM with new architectural support to accelerate parts of the STM’s implementation.
Hardware transactional memory
Minimalist approaches
ISA additions: Complement the instruction set architecture (ISA) with a small set of new instructions.
Cache/Buffer modifications: Modify the cache consistency protocols
Alternatives
Require minimal or no ISA support or cache modifications.
API design: ISA additions
STR/ETR: Start/end transaction
TLD/TST: Transactional read/write
ABR/VLD: Abort/validation of a transaction
ABR: Select a victim transaction for aborting under program control
VLD: Can validate a transaction and catch a conflict early
Data versioning and conflicts: cache/buffer modifications
Work at the word or cache line level
Transactional loads and stores go to a separate transactional cache or to conventional data caches with transactional support
Keep transactions’ speculative state in the data cache or in a hardware buffer area
Rely on extending cache coherence protocols, such as MESI (modified, exclusive, shared, invalid), to detect conflicts and enforce atomicity
One design of cache modifications by Herlihy and Moss
Keeps the transaction's read set and write set in the data cache
Hardware extension: two extra bits needed per cache line
Two bits: Indicate whether the line is to be discarded on commit (for lines holding unmodified data) or on abort (for speculatively modified lines).
Protocol extension: whether the version is to be kept or dropped.
Conflict: A load has read invalid data and the associated transaction must abort
On abort, the write set of the aborting transaction (associated with the tentative store instructions) is dropped
On commit, the saved versions of the original values from before the store instructions are dropped, and the transaction's speculative stores are committed to memory
One variant of the design
The system keeps the original state in main memory
Keeps the speculative state in the data cache
One bit added per cache line for this design
The bit is set when a transaction accesses the line
Upon commit and abort, the bit is cleared
On abort, the modified lines with this bit set are also invalidated
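The one-bit variant can be modelled in software to make the commit and abort rules concrete. This is a toy simulation under stated assumptions: one cache line per memory word, a single transaction at a time, no coherence traffic, and speculative lines written through to memory at commit. `CacheLine`, `ToyCache` and their members are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct CacheLine {
    int data = 0;
    bool valid = false;
    bool modified = false;       // line holds a speculative store
    bool transactional = false;  // the one extra bit per cache line
};

// Toy model: original state in `memory`, speculative state in `lines`.
struct ToyCache {
    std::vector<int> memory;
    std::vector<CacheLine> lines;

    explicit ToyCache(std::vector<int> initial)
        : memory(std::move(initial)), lines(memory.size()) {}

    int tx_load(std::size_t i) {
        CacheLine& line = lines[i];
        if (!line.valid) {  // miss: refill from main memory
            line.data = memory[i];
            line.valid = true;
        }
        line.transactional = true;  // bit is set when a transaction accesses the line
        return line.data;
    }

    void tx_store(std::size_t i, int value) {
        CacheLine& line = lines[i];
        line.data = value;
        line.valid = true;
        line.modified = true;
        line.transactional = true;
    }

    void commit() {
        for (std::size_t i = 0; i < lines.size(); ++i) {
            CacheLine& line = lines[i];
            // Simplification of this sketch: publish by writing through.
            if (line.transactional && line.modified) memory[i] = line.data;
            line.modified = false;
            line.transactional = false;  // bit cleared on commit
        }
    }

    void abort() {
        for (CacheLine& line : lines) {
            // Modified lines with the bit set are invalidated, so the next
            // access refills the original value from main memory.
            if (line.transactional && line.modified) line.valid = false;
            line.modified = false;
            line.transactional = false;  // bit cleared on abort
        }
    }
};
```

Because the original values never leave main memory, abort needs no undo log: discarding the speculative lines is enough.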
Software transactional memory
First use: By Shavit and Touitou in 1995
Basic case: Non-nesting transactions that make updates to shared memory within a single multithreaded process
Main problems an STM must tackle
Must provide separate per-thread views of the heap
Must provide a mechanism for detecting and resolving conflicts between transactions
STM API design: Managing transactional state
Mechanism: Allows a transaction to see its own writes as it continues to run and allows memory updates to be discarded if the transaction ultimately aborts
Data organization in memory: One approach separates transactional data and ordinary data, introducing a distinct memory format for transactional objects. An alternative approach allows data to retain its ordinary structure in memory, and the STM uses separate structures to maintain its own metadata.
Object-based STM (OSTM): Uses the object header to track which transactions are currently accessing the object. The programming API provides operations to open an object header, returning a reference to a copy of the object's body for the transaction to use
Bartok STM: Makes no low-level distinction between ordinary and transactional data
OSTM API design
o_body *OpenForReading(tx *tx, o_header *o);
o_body *OpenForWriting(tx *tx, o_header *o);
OpenForWriting: Returns a private shadow copy of the object body
OpenForReading: The transaction must not update object bodies opened only for reading
Pointer-based data structures: Must always add an extra level of indirection by storing references to object headers rather than direct references to object bodies
OSTM API design
Read/Write set Maintained by the OSTM runtime system
Abort: Discards any shadow copies that have been created for the transaction
Commit: 1) Atomically checks that no conflicting transaction has updated objects in the read set or the write set
2) Updates the object headers for objects in the write set, thus publishing the private shadow copies as the objects' new contents.
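The open/commit/abort life cycle can be sketched as below. This is a single-threaded illustration, not the OSTM implementation: the real system performs validation and publication as one atomic multiword step, whereas this sketch does them in plain sequential code. `Body`, `Header`, `Tx` and the lower-case function names are assumptions.

```cpp
#include <memory>
#include <vector>

struct Body { int value; };
struct Header { std::shared_ptr<Body> body; };  // header points at the current body

struct Tx {
    struct ReadEntry { Header* header; std::shared_ptr<Body> seen; };
    struct WriteEntry { Header* header; std::shared_ptr<Body> seen; std::shared_ptr<Body> shadow; };
    std::vector<ReadEntry> reads;    // read set
    std::vector<WriteEntry> writes;  // write set
};

// Record which body version was seen; the caller must not modify it.
const Body* open_for_reading(Tx& tx, Header& h) {
    tx.reads.push_back({&h, h.body});
    return h.body.get();
}

// Hand out a private shadow copy that only this transaction can see.
// Simplification: opening the same object for writing twice is not handled.
Body* open_for_writing(Tx& tx, Header& h) {
    auto shadow = std::make_shared<Body>(*h.body);
    tx.writes.push_back({&h, h.body, shadow});
    return shadow.get();
}

// Step 1: validate that nothing in the read or write set was replaced.
// Step 2: publish shadow copies by swinging the header pointers.
bool commit(Tx& tx) {
    for (const auto& r : tx.reads)
        if (r.header->body != r.seen) return false;
    for (const auto& w : tx.writes)
        if (w.header->body != w.seen) return false;
    for (auto& w : tx.writes) w.header->body = w.shadow;
    return true;
}

// Abort: dropping the sets discards every shadow copy.
void abort_tx(Tx& tx) {
    tx.reads.clear();
    tx.writes.clear();
}
```

The extra level of indirection through `Header` is what lets commit replace an object's contents by changing one pointer.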
Bartok STM API design
Metadata: Held by the STM in separate structures for concurrency control
TMW: Transactional Metadata Word; the STM maps a word’s address to the TMW that manages that data
TMW API: Includes functions to open the TMWs for the data that the transaction will read from or write to
void OpenForReading(tx *tx, tmw *t);
void OpenForWriting(tx *tx, tmw *t);
Read/Write API
word STMRead(tx *tx, word *a);
void STMWrite(tx *tx, word *a, word d);
Bartok STM API design: Compiler’s Translation
atomic {
    hist[index]++;
}
After compiler’s translation
atomic {
    ...
    OpenForReading(tr, TMW_FOR(index));
    OpenForWriting(tr, TMW_FOR(hist));
    int *addr = &hist[STMRead(tr, &index)];
    STMWrite(tr, addr, STMRead(tr, addr) + 1);
    ...
}
The atomic block itself must be translated into calls to library functions to start and commit transactions and to re-execute the transaction if the commit fails.
Bartok STM API implementation
Broadly, there are two ways of managing tentative updates.
Buffered updates: Keeps a private shadow copy of all the memory words it updates. Calls to STMRead must consult the shadow copies so that they will see earlier writes by the same transaction. Hashing can accelerate this look-up (mapping an address to a slot in the current transaction's shadow table to avoid searching it).
In-place updates: STMWrite directly updates the heap so that calls to STMRead will see earlier updates without needing to search a table. In this case, STMWrite must maintain an undo log of all values that it overwrites, so that the transaction can roll back its changes if it aborts.
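The two strategies differ mainly in where the tentative value lives and which of commit or abort does the work. The sketch below shows both side by side; it deliberately leaves out conflict detection, and the types `BufferedTx` and `InPlaceTx` are assumptions rather than the Bartok STM's actual structures.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

using word = long;

// Buffered updates: writes go to a hashed shadow table, the heap is untouched.
struct BufferedTx {
    std::unordered_map<word*, word> shadow;

    word read(word* a) const {
        auto it = shadow.find(a);  // must see our own earlier writes
        return it != shadow.end() ? it->second : *a;
    }
    void write(word* a, word d) { shadow[a] = d; }
    void commit() {
        for (const auto& entry : shadow) *entry.first = entry.second;
        shadow.clear();
    }
    void abort() { shadow.clear(); }  // nothing to undo
};

// In-place updates: writes hit the heap, old values go to an undo log.
struct InPlaceTx {
    std::vector<std::pair<word*, word>> undo_log;

    word read(word* a) const { return *a; }  // no table lookup needed
    void write(word* a, word d) {
        undo_log.emplace_back(a, *a);  // remember the value being overwritten
        *a = d;
    }
    void commit() { undo_log.clear(); }  // updates are already in place
    void abort() {
        // Restore in reverse order so the oldest logged value wins.
        for (auto it = undo_log.rbegin(); it != undo_log.rend(); ++it)
            *it->first = it->second;
        undo_log.clear();
    }
};
```

Buffered updates make abort cheap and commit expensive; in-place updates do the opposite, which suits workloads where most transactions commit.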
Detecting and resolving conflicts in STM
Blocking: Object locking, which acquires locks as a transaction executes and holds them until it commits (locks may be taken in shared-read mode).
Disadvantage: Scales poorly on multiprocessor hardware because it introduces contention in the memory hierarchy.
Nonblocking: OSTM performs an atomic multiword update across the header contents of the objects in the read and write sets. It checks that there has been no update to objects in the read set and updates the object headers in the transaction's write set to publish the transaction's shadow copies.
Advantage: This design avoids read locks; the transaction's read set is validated purely by memory read operations.
Disadvantage: Complicated in terms of both software engineering and the number of atomic compare-and-swap operations used at runtime.
Hybrid approach in detecting and resolving conflicts
Hybrid approach: Combines optimistic and pessimistic schemes by using versioned mutual-exclusion locks, which support normal mutual-exclusion semantics and provide access to a version number counting the number of times the lock has been acquired and released.
The design uses mutual exclusion for pessimistic concurrency control to grant write access to the data the lock protects. The version number allows optimistic concurrency control for read access. That is, a transaction records the version number before it first reads from an object and then, at commit time, checks that the version number is unchanged, meaning that no one has updated the object concurrently with the transaction.
Procedure of Hybrid approach
write {
    acquire lock to write the object;
    update the lock version number;
    write access;
    release write lock;
}
read {
    record the object version number as v1;
    read access;
    record the object version number as v2;
    if (v1 == v2) {
        commit;
    }
    else
        abort;
}
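A versioned lock of this kind can be sketched with a single atomic counter. The encoding is an assumption of this sketch, not something the slides specify: the counter is bumped once on acquire and once on release, so an odd value means a writer currently holds the lock. `VersionedCell`, `write` and `try_read` are hypothetical names.

```cpp
#include <atomic>
#include <cstdint>

struct VersionedCell {
    std::atomic<std::uint64_t> version{0};  // odd while a writer holds the lock
    std::atomic<int> data{0};

    // Pessimistic path: take the lock, update, release.
    void write(int value) {
        std::uint64_t cur = version.load();
        for (;;) {
            if (cur & 1) {  // another writer holds the lock: spin
                cur = version.load();
                continue;
            }
            if (version.compare_exchange_weak(cur, cur + 1)) break;  // acquired
        }
        data.store(value);
        version.store(cur + 2);  // release; the version advanced by one write
    }

    // Optimistic path: no lock is taken; returns false if the read must abort.
    bool try_read(int& out) const {
        std::uint64_t v1 = version.load();
        if (v1 & 1) return false;  // a write is in progress
        int value = data.load();
        std::uint64_t v2 = version.load();
        if (v1 != v2) return false;  // the object changed underneath us
        out = value;
        return true;
    }
};
```

Compared with the procedure above, `try_read` adds one check: a reader that sees the lock held aborts immediately, since otherwise it could observe the same version before and after while a write is still in flight.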
Challenges
Open/closed nested transactions
I/O
TM mixed with other programming models, such as OpenMP or MPI
Open and closed nesting
Closed nesting: Either all or none of the transactions in a nested region commit
Open nesting: When an inner transaction commits, its effects become visible to all threads in the system
Flattening model: Includes all nested transactions in the outermost transaction
HTM systems must use counters to implement flattening (increment for STR, decrement for ETR, commit when zero)
On conflict, must roll back to the outermost transaction
Alternative: Allow each nested transaction to have its own read and write sets
Open-nested transactions increase the programmer's burden.
Compensating actions: Complex, and the programmer must have an expert grasp of the code's semantics.
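The counter used by the flattening model is simple enough to show directly. The sketch below tracks only the nesting depth; `FlatTx` and its members are hypothetical names, and the `commits` and `rollbacks` fields exist purely so the behaviour can be observed.

```cpp
#include <cassert>

struct FlatTx {
    int depth = 0;      // current nesting level
    int commits = 0;    // how many real commits happened
    int rollbacks = 0;  // how many times everything was rolled back

    // STR: entering any atomic block just increments the counter.
    void begin() { ++depth; }

    // ETR: only the outermost block actually commits.
    // Returns true when the real commit happened.
    bool end() {
        assert(depth > 0);
        if (--depth == 0) {
            ++commits;
            return true;
        }
        return false;
    }

    // A conflict anywhere rolls back to the outermost transaction.
    void conflict() {
        depth = 0;
        ++rollbacks;
    }
};
```

This is why flattening limits concurrency: a conflict in a small inner block discards all work done by the enclosing transactions as well.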
Open and closed nesting challenges
The flattening model adds hardware complexity while limiting concurrency and performance
Proposed open-nested transactions increase complexity for programmers (commit/abort handlers)
Original code without nesting
atomic {
    someprocess(shared_data);
    work = getwork(workqueue);
    process(work, shared_data);
}

node *getwork(workqueue_t &workqueue)
{
    node *work;
    work = workqueue.head;
    if (work != NULL)
        workqueue.head = work->next;
    return work;
}
Code with closed-nested transactions
atomic {
    someprocess(shared_data);
    atomic {
        work = getwork(workqueue);
    }
    process(work, shared_data);
}
Code with open-nested transactions
atomic {
    someprocess(shared_data);
    atomic_open {
        work = getwork(workqueue);
        commit {
            free(work);
        }
        abort {
            insertwork(workqueue, work);
        }
    }
    process(work, shared_data);
}
Commit executes when the outermost atomic block commits: free(work)
Abort executes when a surrounding transaction aborts: insertwork(workqueue, work)
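The handler mechanism can be approximated in plain C++ by letting the outer transaction carry two lists of callbacks. This is a sketch under assumptions: `OuterTx` and `get_work_open` are hypothetical names, the queue is a `std::vector<int>` rather than a linked list, and appending to `retired` stands in for `free(work)`.

```cpp
#include <functional>
#include <vector>

struct OuterTx {
    std::vector<std::function<void()>> on_commit;
    std::vector<std::function<void()>> on_abort;

    void commit() {
        for (auto& handler : on_commit) handler();
        clear();
    }
    void abort() {
        // Run compensating actions newest-first.
        for (auto it = on_abort.rbegin(); it != on_abort.rend(); ++it) (*it)();
        clear();
    }
    void clear() {
        on_commit.clear();
        on_abort.clear();
    }
};

// Open-nested step: the dequeue takes effect immediately, and handlers
// registered on the outer transaction finish or compensate for it later.
// Precondition: queue is not empty.
int get_work_open(OuterTx& tx, std::vector<int>& queue, std::vector<int>& retired) {
    int work = queue.back();
    queue.pop_back();  // visible to everyone right away
    tx.on_commit.push_back([&retired, work] { retired.push_back(work); });
    tx.on_abort.push_back([&queue, work] { queue.push_back(work); });
    return work;
}
```

The burden on the programmer is visible here: whoever writes `get_work_open` has to know that re-inserting the item is a correct compensation for the dequeue.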
I/O challenges
Example: Suppose that inside a transaction, a system call attempts to output a character to the terminal
Solution 1: Execute the system call immediately
What if this transaction is aborted later?
Solution 2: Defer the I/O operation until commit
What about real-time systems?
Solution 3: Try to forbid I/O within transactions
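Solution 2 can be sketched by buffering output inside the transaction and flushing it only on commit. `IoTx` and `demo` are hypothetical names, and a `std::ostringstream` stands in for the terminal so the effect can be inspected.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct IoTx {
    std::vector<std::string> pending;  // output deferred until commit

    // Inside the transaction nothing reaches the device yet.
    void print(std::string text) { pending.push_back(std::move(text)); }

    // Commit: perform the deferred output in program order.
    void commit(std::ostream& out) {
        for (const auto& text : pending) out << text;
        pending.clear();
    }

    // Abort: the buffered output simply disappears.
    void abort() { pending.clear(); }
};

// Usage: the aborted attempt leaves no trace; the retry prints once.
std::string demo() {
    std::ostringstream terminal;
    IoTx first;
    first.print("x");
    first.abort();
    IoTx retry;
    retry.print("y");
    retry.commit(terminal);
    return terminal.str();
}
```

Deferral works for output but not for input that the transaction needs before it can continue, and it delays output by the length of the transaction, which is the real-time concern raised above.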
Final approach to I/O challenges
A final approach to solving the I/O problem is to categorize inputs and outputs according to their abortive properties.
Undoable: If its effects can be rolled back
Challenge: Programmers must be aware of types of I/O operations that don’t make sense to transparently perform as part of a single atomic transaction, for instance, prompting the user for input and then receiving and acting on the input.
Thank You