Hardware Transactional Memory
Shimin Chen (LBA Reading Group)
Outline
• Transaction Concept
• A simple HTM
• Common Case Transaction Behaviors
• HTM Research Directions
• Description of Papers
• Summary
Transaction
• A finite sequence of instructions
• Atomicity: all or nothing
• Serializability (isolation): the steps of one transaction never appear to be interleaved with the steps of another
• A and B cannot be concurrent if
  ReadSet(A) ∩ WriteSet(B) ≠ ∅, or
  WriteSet(A) ∩ ReadSet(B) ≠ ∅, or
  WriteSet(A) ∩ WriteSet(B) ≠ ∅
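The concurrency condition can be checked directly with set intersections. The sketch below is only an illustration: the function name and the example address sets are made up, and a real HTM tracks cache lines rather than Python sets.

```python
def conflicts(read_a, write_a, read_b, write_b):
    """Return True if transactions A and B cannot run concurrently."""
    return bool(
        (read_a & write_b)       # A read something that B wrote
        or (write_a & read_b)    # A wrote something that B read
        or (write_a & write_b)   # both wrote the same location
    )

# Hypothetical address sets, for illustration only.
print(conflicts({0x10, 0x20}, {0x30}, {0x40}, {0x20}))  # True: A reads 0x20, B writes it
print(conflicts({0x10}, {0x30}, {0x10}, {0x50}))        # False: only reads are shared
```

Two transactions that only share reads do not conflict, which is why the read-set/read-set intersection does not appear in the condition.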
A simple HTM
New hardware mechanisms to:
• Checkpoint register state
  • Checkpoint the register renaming table
• Buffer transactional writes in the private cache
• Record the transactional read set and write set
  • R bit and W bit per cache line
  • Or a dedicated state buffer on the side
• Detect conflicts
  • Leverage the cache coherence protocol
• Resolve conflicts
  • e.g., requester wins
Simple HTM Operations
• TxBegin
  • Checkpoint register state
• Load/Store
  • Set state bits in cache; abort upon cache eviction
• Incoming coherence message
  • Check conflicts with state bits; abort if conflicted
• TxCommit
  • Flash-clear state bits
• Abort
  • Flash-invalidate write sets and read sets
  • Restore register checkpoint
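The operations above can be mimicked with a small software model. This is a toy sketch under several assumptions, not a description of any real design: the class and method names are invented, the "cache" is a fixed count of tracked lines, the register file is a dictionary, and the "requester wins" policy is modeled by always aborting the local transaction on a conflicting message.

```python
class SimpleHTM:
    """Toy single-core view of the simple HTM; all names are illustrative."""

    def __init__(self, memory, registers, cache_lines=4):
        self.memory = memory            # shared memory: address -> value
        self.registers = registers      # register state (a plain dict here)
        self.cache_lines = cache_lines  # capacity for transactional lines
        self.checkpoint = {}
        self._clear()

    def _clear(self):
        self.read_set = set()   # lines with the R bit set
        self.write_buf = {}     # lines with the W bit set, plus buffered data
        self.active = False

    def tx_begin(self):
        self.checkpoint = dict(self.registers)  # checkpoint register state
        self.active = True

    def _fits(self, addr):
        # Tracking one more line than the cache holds would evict a line: abort.
        tracked = self.read_set | set(self.write_buf) | {addr}
        if len(tracked) > self.cache_lines:
            self.abort()
            return False
        return True

    def load(self, addr):
        if not self.active:
            return self.memory.get(addr, 0)
        if not self._fits(addr):
            return None
        self.read_set.add(addr)
        return self.write_buf.get(addr, self.memory.get(addr, 0))

    def store(self, addr, value):
        if not self.active:
            self.memory[addr] = value
        elif self._fits(addr):
            self.write_buf[addr] = value

    def on_coherence_message(self, addr, is_write):
        # A remote write conflicts with R or W; a remote read conflicts with W.
        conflict = addr in self.write_buf or (is_write and addr in self.read_set)
        if self.active and conflict:
            self.abort()  # requester wins

    def tx_commit(self):
        if not self.active:
            return False
        self.memory.update(self.write_buf)  # buffered writes become visible
        self._clear()                       # flash-clear state bits
        return True

    def abort(self):
        self.registers.clear()
        self.registers.update(self.checkpoint)  # restore register checkpoint
        self._clear()                           # discard buffered writes


mem, regs = {1: 10}, {"r1": 0}
htm = SimpleHTM(mem, regs)
htm.tx_begin()
htm.store(1, htm.load(1) + 1)
htm.on_coherence_message(1, is_write=False)  # remote read hits our write set
print(htm.tx_commit(), mem[1])  # False 10
```

Because writes are buffered until commit, an abort only has to drop the buffer and restore the checkpoint; memory never sees the speculative value.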
Common Case Transaction Behaviors
“The Common Case Transactional Memory Behavior of Multithreaded Programs”. Stanford Team (Kozyrakis, Olukotun, and their students: Chung, Chafi, Minh, McDonald, Carlstrom). HPCA 2006.
• Studied 35 applications
  • Java, C + Pthreads, C + OpenMP, Parallel Processing Macros
• Assume the high-level parallelism structure remains the same: convert lock/unlock into begin/end, etc.
• Trace-based analysis
Non-blocking synchronization
ReadSet and WriteSet Size
• For 95% of transactions: RS < 4 KB, WS < 1 KB
• Weighted by time: 52 KB RS and 30 KB WS are needed to cover 80% of the time
• (Assuming 32 B cache lines)
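To relate these sizes to cache state, the byte counts can be converted into numbers of 32 B cache lines. The short calculation below assumes 1 KB = 1024 bytes; the helper name is made up for illustration.

```python
LINE_BYTES = 32  # cache line size assumed on the slide
KB = 1024        # assumption: 1 KB = 1024 bytes

def lines_needed(kilobytes):
    """Number of cache lines needed to hold the given number of kilobytes."""
    return kilobytes * KB // LINE_BYTES

print(lines_needed(4), lines_needed(1))    # 128 32   (bounds for 95% of transactions)
print(lines_needed(52), lines_needed(30))  # 1664 960 (covering 80% of the time)
```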
Nesting
• Nesting distance could be high
  • Partial rollback may be needed
• Two levels of nesting are common
Speculative Parallelization
HTM Research Directions
• Dealing with overflows
  • Virtualizing HTM
• Mixing HTM with STM
  • Two code paths
  • Use hardware mechanisms to speed up STM
Terminology
• Conflict detection
  • Eager: at coherence message
  • Lazy: at commit time
• Version management
  • Eager: save old version, update in place
  • Lazy: buffer updates
• Conflict resolution
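The two version-management styles differ in where the speculative value lives. The following sketch contrasts them in software; the class names and the dictionary standing in for memory are assumptions made for illustration.

```python
_ABSENT = object()  # marks an address that had no value before the write

class EagerVersioning:
    """Update in place and keep the old values in an undo log."""
    def __init__(self, memory):
        self.memory, self.undo_log = memory, []

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory.get(addr, _ABSENT)))  # save old version
        self.memory[addr] = value                                     # update in place

    def commit(self):
        self.undo_log.clear()  # cheap: the new values are already in place

    def abort(self):
        while self.undo_log:   # costly: walk the log backwards
            addr, old = self.undo_log.pop()
            if old is _ABSENT:
                del self.memory[addr]
            else:
                self.memory[addr] = old

class LazyVersioning:
    """Buffer updates and apply them only at commit."""
    def __init__(self, memory):
        self.memory, self.write_buffer = memory, {}

    def write(self, addr, value):
        self.write_buffer[addr] = value  # memory is untouched

    def read(self, addr):
        return self.write_buffer.get(addr, self.memory.get(addr))

    def commit(self):
        self.memory.update(self.write_buffer)  # costly: copy the buffer out
        self.write_buffer.clear()

    def abort(self):
        self.write_buffer.clear()  # cheap: just drop the buffer
```

Eager versioning makes commits cheap and aborts expensive, while lazy versioning does the opposite.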
Description of Papers
• “Transactional Memory: Architectural Support for Lock-Free Data Structures.” Herlihy (DEC) & Moss (UMass). ISCA 1993.
• “Multiple Reservations and the Oklahoma Update.” Stone, Stone, Heidelberger, Turek (IBM). IEEE Parallel & Distributed Technology. 1993.
• “Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution.” Rajwar & Goodman. (Wisconsin). ISCA 2001.
• “Transactional Lock-Free Execution of Lock-Based Programs.” Rajwar & Goodman. (Wisconsin). ASPLOS 2002.
• “Transactional Memory Coherence and Consistency.” Stanford team. ISCA 2004.
• “Unbounded Transactional Memory.” Ananian, Asanovic, Kuszmaul, Leiserson, Lie (MIT). HPCA 2005.
• “Virtualizing Transactional Memory.” Rajwar, Herlihy, Lai. (Intel & Brown). ISCA 2005.
• “LogTM: Log-based Transactional Memory.” Moore, Bobba, Moravan, Hill, Wood. (Wisconsin team). HPCA 2006.
• “Hybrid Transactional Memory.” Kumar, Chu, Hughes, Kundu, Nguyen. PPoPP 2006.
• “Architectural Semantics for Practical Transactional Memory.” Stanford team. ISCA 2006.
• “Bulk Disambiguation of Speculative Threads in Multiprocessors.” Ceze, Tuck, Cascaval, Torrellas. (UIUC). ISCA 2006.
• “Supporting Nested Transactional Memory in LogTM.” Wisconsin team. ASPLOS 2006.
• “Unbounded Page-Based Transactional Memory.” Chuang, Narayanasamy, Venkatesh, Sampson, Biesbrouck, Pokam, Colavin, Calder. (UCSD, ST Microelectronics, Microsoft). ASPLOS 2006.
• “Tradeoffs in Transactional Memory Virtualization.” Stanford team. ASPLOS 2006.
• “Hybrid Transactional Memory.” Damron, Fedorova, Lev, Luchangco, Moir, Nussbaum. (Sun). ASPLOS 2006.
• “Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory.” Blundell, Devietti, Lewis, Martin. (UPenn, VMware). ISCA 2007.
• “An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees.” Stanford team. ISCA 2007.
• “An Integrated Hardware-Software Approach to Flexible Transactional Memory.” Shriraman, Spear, Hossain, Marathe, Dwarkadas, Scott. (U Rochester). ISCA 2007.
• “Performance Pathologies in Hardware Transactional Memory.” Wisconsin team. ISCA 2007.
Non-overflowed HTM
“Transactional Memory: Architectural Support for Lock-Free Data Structures.” Herlihy (DEC) & Moss (UMass). ISCA 1993.
• First HTM paper
• Similar to the simple HTM
• Transactional cache alongside the L1 data cache
• Abort and roll-back: not fully automatic
  • HW discards transactional updates
  • SW jumps back and retries the transaction (with exponential backoff)
• Conflict detection: eager (coherence)
• Conflict resolution: requester aborts
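The software side of this scheme (jump back and retry with exponential backoff) might look roughly like the loop below. It is a sketch only: `TxAborted`, `run_with_retry`, and all parameter names and default values are hypothetical, and the transaction body is an ordinary Python callable standing in for code that would use the hardware's transactional instructions.

```python
import random
import time

class TxAborted(Exception):
    """Raised by the (hypothetical) transaction body when the hardware aborts it."""

def run_with_retry(tx_body, max_attempts=8, base_delay=1e-6,
                   sleep=time.sleep, rng=random.random):
    """Run tx_body until it commits, backing off exponentially after each abort."""
    for attempt in range(max_attempts):
        try:
            return tx_body()
        except TxAborted:
            # Wait a random time of at most base_delay * 2**attempt, then retry.
            sleep(rng() * base_delay * (2 ** attempt))
    raise TxAborted(f"gave up after {max_attempts} attempts")
```

Randomizing the delay keeps two transactions that aborted each other from retrying in lockstep.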
“Multiple Reservations and the Oklahoma Update.” Stone, Stone, Heidelberger, Turek (IBM). IEEE Parallel & Distributed Technology. 1993.
• Single reservation: LL-SC
• Multiple reservations: all or nothing, transactions with read-modify-writes
• Oklahoma update (the musical “Oklahoma!” has a song titled “All er Nothin'”)
• Similar to the simple HTM
• Batch updates and detection at commit time
“Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution.” Rajwar & Goodman. (Wisconsin). ISCA 2001. (SLE)
• Idea: speculatively execute a lock-unlock critical section while eliding the lock
  • Using a simple HTM
  • Fall back to locking upon conflicts and overflows
• Novelty: recognizing lock and unlock
  • Lock: LL-SC with predictors
  • Unlock: a store that restores the value changed by the LL-SC
“Transactional Lock-Free Execution of Lock-Based Programs.” Rajwar & Goodman. (Wisconsin). ASPLOS 2002. (TLR)
• SLE + conflict resolution
• Timestamp: <# of committed TLR transactions on the local CPU, CPU ID>
• Stall or abort the younger transaction upon conflicts
• Non-trivial addition to the cache coherence protocol for avoiding deadlocks
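The timestamp ordering can be illustrated with tuple comparison. This sketch assumes that a smaller timestamp means an older transaction and that the CPU ID only breaks ties; the function names are invented for illustration and the coherence protocol changes are not modeled.

```python
def timestamp(committed_count, cpu_id):
    """Build a <committed count, CPU ID> timestamp; the CPU ID breaks ties."""
    return (committed_count, cpu_id)

def younger(ts_a, ts_b):
    """Return the timestamp of the younger transaction, which stalls or aborts."""
    return max(ts_a, ts_b)  # tuples compare field by field, left to right

a = timestamp(committed_count=5, cpu_id=1)
b = timestamp(committed_count=3, cpu_id=7)
print(younger(a, b))  # (5, 1): CPU 1 has the later timestamp, so it yields
```

Because every timestamp is unique, two conflicting transactions always agree on which one yields.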
“Transactional Memory Coherence and Consistency.” Stanford team. ISCA 2004. (TCC)
• Conflict detection: lazy
• Novelty: proposes using transactional memory to replace cache coherence
  • Illusion of shared memory
  • Batch communication, like message passing
“Bulk Disambiguation of Speculative Threads in Multiprocessors.” Ceze, Tuck, Cascaval, Torrellas. (UIUC). ISCA 2006.
• Conflict detection: lazy
• Use Bloom filter signatures to do batch detection
  • 2000-bit Bloom filter; on average 70 read lines and 20 write lines per transaction
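Batch detection with signatures can be sketched with a generic Bloom filter. Note the assumptions: this is a textbook Bloom filter, not the signature encoding from the paper; the class name, the two SHA-256-derived hash functions, and the example line addresses are all made up for illustration.

```python
import hashlib

class Signature:
    """Generic Bloom-filter summary of a set of cache-line addresses."""

    def __init__(self, bits=2000, hashes=2):
        self.bits, self.hashes, self.word = bits, hashes, 0

    def _positions(self, line_addr):
        # Derive `hashes` bit positions from the line address.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{line_addr}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def insert(self, line_addr):
        for p in self._positions(line_addr):
            self.word |= 1 << p

    def may_contain(self, line_addr):
        return all((self.word >> p) & 1 for p in self._positions(line_addr))

    def may_intersect(self, other):
        # A zero AND proves the sets are disjoint; a non-zero AND may be a false positive.
        return (self.word & other.word) != 0

committer_writes, local_reads = Signature(), Signature()
for line in (0x100, 0x140):
    committer_writes.insert(line)
for line in (0x140, 0x200):
    local_reads.insert(line)
print(local_reads.may_intersect(committer_writes))  # True: both contain line 0x140
```

The check is conservative: it never misses a real conflict, but it can report a conflict between disjoint sets, which costs an unnecessary abort.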
Virtualizing HTM
How?
• Generally: save transaction states in virtual memory
  • Read set, write set
  • Or readers, writers per block in memory
• Conflict detection needs to check this structure
• Question: how to make it efficient?
“Unbounded Transactional Memory.” Ananian, Asanovic, Kuszmaul, Leiserson, Lie (MIT). HPCA 2005.
• First paper on overflowed transactions
• UTM (“Unbounded TM”):
  • Idealized (very complicated)
• LTM (“Large TM”):
  • Lazy versioning
  • Limitations: shorter than a time slice, no migration, smaller than physical memory
“Virtualizing Transactional Memory.” Rajwar, Herlihy, Lai. (Intel & Brown). ISCA 2005. (VTM)
• A fairly complete description
• Novelty:
  • XSW: transaction status word
  • Load/store entries point to the XSW; the transaction state can be changed with a single atomic update
  • Filter for conflict detection
• Lazy versioning (buffer updates)
• Eager conflict detection
“LogTM: Log-based Transactional Memory.” Moore, Bobba, Moravan, Hill, Wood. (Wisconsin team). HPCA 2006.
• Overflow handling
• Eager versioning: per-thread undo log
  • Update in place, save old values in the log
  • Favors commits
• Eager conflict detection
  • Cache has a single overflow bit
  • Use the directory to remember the transactional access to a line even if the line is evicted from the cache
“Architectural Semantics for Practical Transactional Memory.” Stanford team. ISCA 2006.
• Provide support to call software callbacks
  • Commit, abort, violation
• Nested transactions
  • Flattening: a violation rolls back to the beginning of the top-most transaction
  • Closed nesting: allow partial roll-backs
  • Open nesting: allow partial commits
“Supporting Nested Transactional Memory in LogTM.” Wisconsin team. ASPLOS 2006.
• The undo log is organized as transaction log frames (just like stack frames)
• LIFO
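A stack of log frames gives partial rollback for closed nesting almost for free. The sketch below is a software analogy under stated assumptions: the class and method names are invented, memory is a dictionary whose addresses all exist beforehand, and a committing inner transaction simply merges its frame into its parent's.

```python
class NestedUndoLog:
    """Undo log split into one frame per active nesting level (LIFO)."""

    def __init__(self, memory):
        self.memory = memory
        self.frames = []  # innermost transaction's frame is last

    def begin(self):
        self.frames.append([])  # push a new log frame

    def write(self, addr, value):
        self.frames[-1].append((addr, self.memory[addr]))  # save old value
        self.memory[addr] = value                          # update in place

    def commit(self):
        frame = self.frames.pop()
        if self.frames:
            # Closed nesting: the parent may still abort, so it inherits the entries.
            self.frames[-1].extend(frame)

    def abort(self):
        # Partial rollback: undo only the innermost frame, newest entry first.
        for addr, old in reversed(self.frames.pop()):
            self.memory[addr] = old

mem = {"x": 1, "y": 2}
log = NestedUndoLog(mem)
log.begin()           # outer transaction
log.write("x", 10)
log.begin()           # inner transaction
log.write("y", 20)
log.abort()           # rolls back only the inner transaction
print(mem)            # {'x': 10, 'y': 2}
```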
“Unbounded Page-Based Transactional Memory.” Chuang, Narayanasamy, Venkatesh, Sampson, Biesbrouck, Pokam, Colavin, Calder. (UCSD, ST Microelectronics, Microsoft). ASPLOS 2006.
• Shadow page + home page
• Conflict detection: special cache for overflow info before traversing the memory structure
“Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory.” Blundell, Devietti, Lewis, Martin. (UPenn, VMware). ISCA 2007.
• Making the fast case common: permission-only cache
  • Caches RW bits for overflowed cache lines
• Making the uncommon case simple: allow only a single overflowed transaction
  • OneTM-serialized: stall all other transactions
  • OneTM-concurrent: allow other non-overflowed transactions
• Each block in memory requires RW bits + a transaction ID
“Performance Pathologies in Hardware Transactional Memory.” Wisconsin team. ISCA 2007.
• Seven pathological scenarios in which different HTMs may perform poorly
  • Livelock cases, starvation, convoys, futile stalling for a transaction that eventually aborts
• Enhancements:
  • Conflict resolution: back-offs, priorities
  • Predicting writes in a transaction, so that ownership can be obtained at reads
Combining HTM and STM
“Hybrid Transactional Memory.” Kumar, Chu, Hughes, Kundu, Nguyen. PPoPP 2006.
• Enhance the Dynamic STM (Herlihy et al.: wrap objects with indirection/replication)
• Two modes: HTM mode and STM mode
• Tries HTM first
• A trick for conflict detection between HTM and STM:
  • STM also starts a hardware transaction
  • But only accesses a single state word transactionally
  • Performs all other actions non-transactionally
“Tradeoffs in Transactional Memory Virtualization.” Stanford team. ASPLOS 2006. (XTM)
• Two modes: all in hardware, all in software
• If an HTM transaction overflows, abort it and run it in software mode
• Software mode:
  • Per-transaction page table
  • Copy-on-first-access: check at commit that the read data has not changed
  • Copy-on-write: buffer transactional writes
“Hybrid Transactional Memory.” Damron, Fedorova, Lev, Luchangco, Moir, Nussbaum. (Sun). ASPLOS 2006.
• Compiler generates two code paths, chosen at runtime:
  • STM
  • HTM
• Word-based
• Metadata access per memory operation is required even for HTM (to detect conflicts with STM)
“An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees.” Stanford team. ISCA 2007.
• SigTM: enhance an STM system with hardware signatures
“An Integrated Hardware-Software Approach to Flexible Transactional Memory.” Shriraman, Spear, Hossain, Marathe, Dwarkadas, Scott. (U Rochester). ISCA 2007. (RTM)
• Two hardware mechanisms to improve the performance of an STM (RSTM):
  • Alert-on-update: allow software callbacks for invalidation and eviction of selected cache lines
  • Programmable data isolation: control the cache to hold transactional blocks
Summary
• Simple HTM is nice
• Major complexity comes in because of space and time limitations
  • Logs, shadow pages, filters, caches, etc.
• Combine HTM and STM