Page 1: Hybrid Transactional Memory

Hybrid Transactional Memory

Reza Sherafat
Prof. Cristiana Amza

University of Toronto

Dec 4, 2006

Page 2: Hybrid Transactional Memory

Quick Background Review

• A transaction is a sequence of operations that is performed atomically "as a whole".
• Life cycle of a transaction:
  – Initialization: start a transaction by storing the current state;
  – Execution: open objects for read/write;
    • Data modifications are hidden from others;
    • Watch for conflicts;
  – Termination: end the transaction:
    • Successful completion (Commit): let other threads know about the changes that were made, and the modifications take effect; or
    • Unsuccessful completion (Abort): discard modifications.
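
The life cycle above can be sketched as a toy, single-threaded model. This is an illustration only, not the implementation discussed in the talk: the class name `Transaction`, the dictionary-backed `store`, and the use of a private write buffer (in place of an explicit saved state) are all assumptions, and conflict detection is left out.

```python
class Transaction:
    """Toy single-threaded model: initialize, execute, then commit or abort."""

    def __init__(self, store):
        # Initialization: remember the shared store; nothing is modified yet.
        self.store = store
        self.writes = {}          # speculative modifications, hidden from others
        self.status = "ACTIVE"

    def read(self, key):
        # A transaction sees its own speculative writes first.
        return self.writes.get(key, self.store[key])

    def write(self, key, value):
        # Execution: buffer the change instead of touching shared state.
        self.writes[key] = value

    def commit(self):
        # Successful termination: the modifications take effect for everyone.
        self.store.update(self.writes)
        self.status = "COMMITTED"

    def abort(self):
        # Unsuccessful termination: discard the modifications.
        self.writes.clear()
        self.status = "ABORTED"


store = {"balance": 100}

t1 = Transaction(store)
t1.write("balance", t1.read("balance") - 30)
hidden = store["balance"]          # the shared value is unchanged before commit
t1.commit()
after_commit = store["balance"]

t2 = Transaction(store)
t2.write("balance", 0)
t2.abort()
after_abort = store["balance"]     # the aborted write never reached the store
```

Buffering writes keeps the shared state untouched until commit, so an abort only has to drop the buffer.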

Page 3: Hybrid Transactional Memory

Outline

• Motivations

• Hybrid Transactional Memory

• Implementation

• Evaluations

• Conclusions

Page 4: Hybrid Transactional Memory

Motivations

• In parallel programs we must protect concurrent access to shared data.
• Locks are widely used, but several problems are associated with them:
  – Performance (speedup)
    • Overhead of locking (wait time, acquire, release)
    • Granularity (hard to balance wait time against overhead)
    • Over-serialization
  – Programming
    • Hard for programmers to write and debug
    • Deadlocks are hard to avoid
  – Other problems
    • Priority inversion
    • Problems when a process holding the lock crashes

Page 5: Hybrid Transactional Memory

Transactional Memory (TM)

• Main idea: non-blocking execution
  – Execute each concurrent transaction speculatively;
  – Apply changes when the transaction has completed successfully.
• Non-conflicting access to shared objects within transactions is allowed:
  – Once a conflict is detected, the transaction rolls back and its state is restored (abort).
• TM support is provided through an API:
  – Start a transaction
  – Abort/commit a transaction
  – Wrap objects in TM objects
• Properties of transactions:
  – Atomic: a transaction is like a single unit (all-or-nothing)
  – Serializable: concurrent transactions are performed in some serial order
  – Obstruction-freedom: guarantees progress of one process in the absence of contention
  – No deadlock

Page 6: Hybrid Transactional Memory

Conflicting Access to Shared Data

• Conflicts in accessing shared data may result in data inconsistencies.
• A conflict happens when an object that has been accessed (read or written) by other transactions is updated before those transactions commit.
  – Multiple readers are allowed
  – Only one writer is allowed at a time
• The system ensures that transactions that access data don't conflict. If no conflicts occur, the transactions are serializable.
• Conflict resolution: once a conflict is detected, we can get a serializable execution by aborting all but one of the conflicting transactions.
• Speculative modifications of aborted transactions are discarded. The old values from before the transaction started become valid.
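
A minimal sketch of the "multiple readers, one writer" rule and of conflict resolution, using explicit read and write sets. The names `Txn`, `conflicts`, and `resolve` are hypothetical, and the "earlier transaction wins" policy is an assumption made for the example; the slides do not say which transaction survives.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)
    status: str = "ACTIVE"


def conflicts(a, b):
    """Two transactions conflict if one writes an object the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & (a.reads | a.writes))


def resolve(txns):
    """Abort transactions until no two surviving ones conflict.

    Assumed policy: a transaction earlier in the list wins over a later one.
    """
    survivors = []
    for t in txns:
        if any(conflicts(t, s) for s in survivors):
            t.status = "ABORTED"
        else:
            survivors.append(t)
    return survivors


r1 = Txn("r1", reads={"x"})
r2 = Txn("r2", reads={"x"})
w = Txn("w", writes={"x"})

survivors = resolve([r1, r2, w])   # the two readers coexist; the writer is aborted
```

The surviving transactions touch each object either read-only or with a single writer, so they can be placed in some serial order.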

Page 7: Hybrid Transactional Memory

Hybrid TM

Each approach should implement TM semantics: start transaction, open object, detect conflicts, abort, commit.

• Hardware-based approaches:
  – Bounded number of locations
  – Maintain versions in cache
  → Low overhead
• Software-based approaches:
  – Unbounded number of locations can be accessed within a transaction
  – Slow due to the overhead of maintaining multiple copies
  – Potentially orders of magnitude slower
• Hybrid: combines the benefits of both approaches
  – High performance (unless the transaction exceeds HW limits)
  – Support for unlimited transactional objects
  – Handles simultaneous data access from HW/SW modes

Page 8: Hybrid Transactional Memory

Implementations

• Two modes for executing transactions: HW vs. SW.
• In general, HW mode is preferred (it is faster), unless we run out of resources.
• Naïve approach: the system has a universal mode of operation.
• A better approach: transactions have two modes to choose from.
  – Each transaction separately chooses the mode of operation when it starts.
  – Better performance and utilization of system resources
• Other policies may also be applied to choose the mode: if the transaction fails a number of times (e.g., 3), then start it in SW mode.
• Pure HW/SW implementations must be tailored such that they can coexist.
  – Objects may be accessed simultaneously by transactions in HW and SW modes.
  – Interoperability is a must.
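
The per-transaction mode choice can be sketched as a retry loop: try HW mode, and start the transaction in SW mode after a fixed number of failures. Everything named here (`run_hybrid`, `Aborted`, `HardwareLimitExceeded`, and the two callables) is hypothetical; only the threshold of 3 comes from the slide's example.

```python
class HardwareLimitExceeded(Exception):
    """Hypothetical signal: the transaction ran out of HW resources."""


class Aborted(Exception):
    """Hypothetical signal: the transaction was aborted by a conflict."""


MAX_HW_ATTEMPTS = 3  # the slide's example threshold


def run_hybrid(body_hw, body_sw, max_hw_attempts=MAX_HW_ATTEMPTS):
    """Run one transaction, preferring HW mode. Returns (mode_used, result).

    body_hw and body_sw are caller-supplied callables that execute the same
    transaction in each mode and raise an exception when it does not commit.
    """
    for _ in range(max_hw_attempts):
        try:
            return "HW", body_hw()
        except HardwareLimitExceeded:
            break          # retrying in HW cannot help; go to SW mode
        except Aborted:
            continue       # conflict: try HW mode again
    while True:            # SW mode has no resource bound; retry until commit
        try:
            return "SW", body_sw()
        except Aborted:
            continue


attempts = {"hw": 0}


def flaky_hw():
    # Stand-in for a transaction that keeps conflicting in HW mode.
    attempts["hw"] += 1
    raise Aborted()


mode, value = run_hybrid(flaky_hw, lambda: 42)
fast_mode, fast_value = run_hybrid(lambda: 1, lambda: 2)
```

The unbounded SW retry loop is a simplification; it says nothing about how progress is ensured under contention.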

Page 9: Hybrid Transactional Memory

Hardware TM

A HW-TM scheme that can be used for the hybrid implementation. It relies on the standard cache coherence protocol and some additional components.

• The cache coherence protocol handles data consistency across multiple processors:
  – Only one processor has permission to write to a cache line;
  – No processor can read a line that another processor has permission to write to.
• Additional components on each processor store speculative data and check for conflicts:
  – ISA extensions
    • Instructions for: transactional begin, commit, abort, load/store, etc.
  – Additional components on the processor chip (in parallel with the L1 cache)
    • Transactional buffer: old,
    • Transactional state table: state of the contexts (threads) running on the processor
• All memory accesses within a transaction are done transactionally.

Page 10: Hybrid Transactional Memory

HW-TM

• The Old field keeps speculative values
• Transactional semantics:
  – Start transaction: the transactional state for that context is set to SELECT, ALL.
  – Abort: the exception flag is set, the corresponding read/write bits are cleared, and speculatively written data is invalidated.
  – Commit: update the transactional state.
  – Detect conflicts: read/write bit vector
• If the exception flag is set, any attempt by the transaction to commit or load/store results in a trap that is handled by the exception handler.

Question: How is abort implemented across multiple processors? CCP (the cache coherence protocol)!
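
A toy software model of these semantics, assuming one set of read bits and one set of write bits per context, keyed by cache line. `HwContext`, `Trap`, and `snoop` are hypothetical names; real hardware keeps this state in the transactional buffer and state table, and the slides do not specify the layout.

```python
class Trap(Exception):
    """Raised when an aborted context tries to load, store, or commit."""


class HwContext:
    """Toy model of one hardware context's transactional state."""

    def __init__(self):
        self.read_bits = set()     # cache lines read transactionally
        self.write_bits = set()    # cache lines written transactionally
        self.speculative = {}      # line -> speculatively written value
        self.exception = False     # exception flag

    def _check(self):
        if self.exception:
            raise Trap("transaction was aborted")

    def load(self, memory, line):
        self._check()
        self.read_bits.add(line)
        return self.speculative.get(line, memory[line])

    def store(self, line, value):
        self._check()
        self.write_bits.add(line)
        self.speculative[line] = value

    def snoop(self, line, is_write):
        """Coherence request for `line` arriving from another processor."""
        if line in self.write_bits or (is_write and line in self.read_bits):
            self.abort()

    def abort(self):
        # Set the flag, clear the read/write bits, drop speculative data.
        self.exception = True
        self.read_bits.clear()
        self.write_bits.clear()
        self.speculative.clear()

    def commit(self, memory):
        self._check()
        memory.update(self.speculative)
        self.read_bits.clear()
        self.write_bits.clear()
        self.speculative.clear()


memory = {0: 10, 1: 20}
ctx = HwContext()
ctx.store(0, ctx.load(memory, 0) + 1)
ctx.snoop(1, is_write=True)        # unrelated line: no conflict
still_running = not ctx.exception
ctx.snoop(0, is_write=False)       # remote read of a line we wrote: abort
try:
    ctx.commit(memory)
    trapped = False
except Trap:
    trapped = True
```

In this model the abort is triggered by an ordinary coherence request, which is how one processor's access can abort a transaction running on another.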

Page 11: Hybrid Transactional Memory

Quick Review of DSTM

[Figure: an Object Pointer refers to a locator with State Pointer, Old, and New fields, which point to a State and to Object Contents copies; a second locator is shown for the state before an object is accessed within a transaction. Labels: "X", "Valid Copy", "Modify".]
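
A minimal sketch of the DSTM-style locator, based on the general published DSTM design rather than on this slide's details: the valid copy is the new object if the last writer committed, and the old object otherwise. The names `State`, `Locator`, `TMObject`, and `open_write` are illustrative, and aborting an active owner on conflict is an assumed contention policy.

```python
import copy
from dataclasses import dataclass


@dataclass
class State:
    status: str = "ACTIVE"     # ACTIVE, COMMITTED, or ABORTED


@dataclass
class Locator:
    state: State               # transaction that last opened the object for write
    old: object                # version valid before that transaction
    new: object                # that transaction's private copy


class TMObject:
    def __init__(self, value):
        # Start with a locator whose writer has already committed.
        self.locator = Locator(State("COMMITTED"), None, value)

    def current(self):
        """Return the valid copy according to the last writer's state."""
        loc = self.locator
        return loc.new if loc.state.status == "COMMITTED" else loc.old

    def open_write(self, state):
        """Install a new locator and return a private copy to modify."""
        loc = self.locator
        if loc.state is state:
            return loc.new                     # already opened by this transaction
        if loc.state.status == "ACTIVE":
            loc.state.status = "ABORTED"       # assumed policy: abort the owner
        valid = self.current()
        new_loc = Locator(state, valid, copy.deepcopy(valid))
        self.locator = new_loc                 # a compare-and-swap in a real system
        return new_loc.new


obj = TMObject([1, 2])

t1 = State()
private = obj.open_write(t1)
private.append(3)
before = obj.current()         # t1 is still active, so the old copy is valid
t1.status = "COMMITTED"
after = obj.current()          # one status write makes the new copy valid

t2 = State()
obj.open_write(t2).append(4)
t2.status = "ABORTED"
after_abort = obj.current()    # the aborted copy is ignored
```

Committing or aborting is a single write to the transaction's state, which atomically switches the valid copy of every object it opened.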

Page 12: Hybrid Transactional Memory

Software TM

• Uses a locator similar to DSTM:
  – Redirection and object copying.
• The locator also keeps track of the readers.
  – As opposed to local hash tables that store the last data value in each read transaction.
  – This helps early abort, and avoids validation when committing
• A locator consists of:
  – Valid field
  – Write state (one)
  – Read state (multiple)
  – Old/new objects
  – Object size

[Figure: A locator object in Hybrid-TM]

Page 13: Hybrid Transactional Memory

Putting Things Together

• Transactions in HW may conflict with those in SW, and vice versa.
  – Opening an object in HW:
    • [read the TMObject pointer transactionally]
    • Abort all conflicting HW/SW transactions
  – Opening an object in SW:
    • Create a state object, and load it transactionally
    • Abort conflicting HW/SW transactions
  – Hardware aborts Hardware
    • A load/store (transactional by default) causes an abort
  – Software aborts Hardware
    • When SW opens a TMObject, it assigns it to a new locator. Since the object is transactionally read by the HW, the HW transaction is aborted.
  – Hardware aborts Software
    • When HW opens a TMObject, it writes ABORTED to the state of the transaction that holds this object
  – Software aborts Software
    • Write ABORTED to the state reached through the reader/writer pointers.
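
The "write ABORTED through the reader/writer pointers" step can be sketched on a locator that tracks one writer and many readers. The field names mirror the locator fields listed on the Software TM slide, but their types, and the helpers `abort_conflicting` and `sw_open`, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TxState:
    status: str = "ACTIVE"     # ACTIVE, COMMITTED, or ABORTED


@dataclass
class HybridLocator:
    valid: bool = True
    write_state: Optional[TxState] = None             # at most one writer
    read_states: list = field(default_factory=list)   # any number of readers
    old: object = None
    new: object = None
    size: int = 0


def abort_conflicting(loc, me, for_write):
    """Write ABORTED into conflicting states reachable from the locator.

    A reader conflicts only with the writer; a writer conflicts with everyone.
    Returns the number of transactions aborted.
    """
    victims = [loc.write_state] if loc.write_state is not None else []
    if for_write:
        victims = victims + loc.read_states
    aborted = 0
    for st in victims:
        if st is not me and st.status == "ACTIVE":
            st.status = "ABORTED"
            aborted += 1
    return aborted


def sw_open(loc, me, for_write):
    """Open the object in SW mode, aborting conflicting transactions first."""
    aborted = abort_conflicting(loc, me, for_write)
    if for_write:
        loc.write_state = me
        loc.read_states = []
    else:
        loc.read_states.append(me)
    return aborted


loc = HybridLocator(new=[0], size=1)
r1, r2, w = TxState(), TxState(), TxState()

sw_open(loc, r1, for_write=False)
sw_open(loc, r2, for_write=False)
readers_coexist = r1.status == "ACTIVE" and r2.status == "ACTIVE"

killed = sw_open(loc, w, for_write=True)   # the writer aborts both readers
```

Because readers are registered in the locator, a writer can abort them as soon as it opens the object, instead of leaving them to fail validation at commit time.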

Page 14: Hybrid Transactional Memory

Software aborts Hardware

[Figure: the Object Pointer and locator diagram (State Pointer, Old, New, State, Object Contents) with three threads. Thread 1: HW mode. Thread 2: HW mode. Thread 3: SW mode. In the software mode: copy and modify. In the hardware mode: modify in place. The conflict is detected by the threads in the hardware mode.]

Page 15: Hybrid Transactional Memory

Evaluations

• Three microbenchmarks
  – VR: small critical section (overhead of starting/committing transactions)
  – HT: simultaneous lookup operations (per-object overhead of transactions)
  – GU: coarse-grained locking vs. transactional memory
• For each case, two scenarios: low and high contention
• Compare four synchronization implementations
  – Lock
  – Pure Hardware Transactional Memory
  – Pure Software Transactional Memory
  – Hybrid Transactional Memory

Page 16: Hybrid Transactional Memory

Evaluations (Hybrid Execution)

• In all cases of hybrid execution, the ratio of SW mode to HW mode is very small.
• This is because the transactional buffer is large relative to the size of the transactional objects. (Is this realistic?)
• Since HW mode is used in most transactions, these results do not give a good view of the overhead of the slow SW mode.

Page 17: Hybrid Transactional Memory

Evaluations (VR)

• When the number of processors grows, contention does not grow significantly
  – This is because the transactions are too small (conflicts rarely happen)

Page 18: Hybrid Transactional Memory

Evaluations (HT)

• It is true that several lookup operations can be performed simultaneously; however, those operations will all be rolled back together once a conflict with a writer occurs
  – This seems to be significant for slightly longer transactions
  – The lock performance is better.
• The paper claims similar behavior would be achieved by reader-writer locks;
  – I expect those would perform much better, since concurrent operations, once underway, will not be undone

Page 19: Hybrid Transactional Memory

Evaluations (GU)

• Why does the execution time decrease in the lock implementation from GU-low to GU-high?
• It is usually the opposite!
  – Do locks have back-offs?

Page 20: Hybrid Transactional Memory

Conclusions

• Transactional memory outperforms lock-based synchronization in most cases
• The Hybrid Transactional Memory approach gives a good balance between the scalability of SW and the performance of HW
  – Requires only modest hardware support (transactional buffer, state table)
  – Within system limits: good performance for most transactions
  – Exceeding system limits: falls back to software mode when a transaction cannot complete within the hardware bounds
• More needs to be done to ensure progress.

Page 21: Hybrid Transactional Memory

Questions?!

Page 22: Hybrid Transactional Memory

• Nested transactions?
• Additional limits for the HW:
  – Contexts
• Hybrid has limitations:
  – Uses transactional buffer
• I am not sure how the non-blocking mechanism is implemented across multiple processors.