Transactional-Memory Real Time Systems
Leeor Peled, Advanced Topics 049011
Technion, December 2014
Lock-freedom
• Shared data that does not require mutual exclusion
– Avoids common lock problems such as deadlock, livelock, priority inversion and convoying, and improves fault tolerance and async-signal safety
– Allows interruption/preemption without blocking the objects being operated upon
• Lock-free algorithms vs. lock-free data structures
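Treiber's stack is a standard example of a lock-free data structure (swapped in here as an illustration; the slides do not present it). The sketch shows the usual retry pattern: read the top, build the new state, publish it with a compare-and-set (CAS), and retry if another thread got there first. Python exposes no CAS instruction, so the hypothetical `AtomicReference` helper models one with a private lock; only that single primitive is lock-based.

```python
import threading
from dataclasses import dataclass
from typing import Any, Optional


class AtomicReference:
    """Stand-in for a hardware CAS word.

    Python has no user-level CAS instruction, so compare_and_set is made atomic
    with a private lock. That lock models the single atomic instruction only;
    the stack algorithm never holds a lock across its own steps.
    """

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._guard:
            if self._value is expected:   # identity comparison, like a pointer CAS
                self._value = new
                return True
            return False


@dataclass(frozen=True)
class Node:
    value: Any
    next: Optional["Node"]


class TreiberStack:
    """Lock-free LIFO stack: read the top, build the new state, CAS, retry on failure."""

    def __init__(self):
        self.top = AtomicReference(None)

    def push(self, value):
        while True:
            old = self.top.get()
            if self.top.compare_and_set(old, Node(value, old)):
                return

    def pop(self):
        while True:
            old = self.top.get()
            if old is None:
                return None               # empty stack
            if self.top.compare_and_set(old, old.next):
                return old.value
```

In C or C++ the same loop also needs protection against the ABA problem (for example tagged pointers or safe memory reclamation); in this Python model a node cannot be recycled while another thread still holds a reference to it, so identity comparison is enough.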
Synchronization Paradigms
• Classification:
– Blocking
  • Blocking
  • Starvation-Free
– Obstruction-Free
– Lock-Free
– Wait-Free
  • Wait-Free
  • Wait-Free Bounded
  • Wait-Free Population Oblivious
Figure: nested classes Lock-Free, Wait-Free, Wait-Free Bounded, Wait-Free Population Oblivious (innermost)
Synchronization for lawyers
• Starvation-Free: As long as one thread is in the critical section, some other thread that wants to enter the critical section will eventually succeed (even if the thread in the critical section has halted).
• Obstruction-Free: A function is Obstruction-Free if, from any point after which it executes in isolation, it finishes in a finite number of steps.
• Lock-Free: A method is Lock-Free if it guarantees that infinitely often some thread calling this method finishes in a finite number of steps.
• Wait-Free: A method is Wait-Free if it guarantees that every call finishes its execution in a finite number of steps.
• Wait-Free Bounded: A method is Wait-Free Bounded if it guarantees that every call finishes its execution in a finite and bounded number of steps. This bound may depend on the number of threads.
• Wait-Free Population Oblivious: A Wait-Free method whose performance does not depend on the number of active threads.
Real Time approved
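The gap between Lock-Free and Wait-Free can be made concrete with a toy model (an illustration, not taken from the slides). Assumptions: each attempt of a CAS-loop increment takes two atomic steps (read, then CAS), and an adversarial scheduler replays a fixed schedule. With the schedule used below, thread 0 keeps finishing, so the system as a whole makes progress (Lock-Free), while thread 1 retries forever and never finishes (not Wait-Free).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CasLoopThread:
    """A thread calling increment(): loop { v = read(); if CAS(v, v + 1): return }."""
    seen: Optional[int] = None   # value returned by the pending read, None before reading
    completed: int = 0           # increment() calls that finished
    failed_cas: int = 0          # attempts that had to be retried


def step(shared: List[int], th: CasLoopThread) -> None:
    """Run one atomic step (a read or a CAS) of thread th."""
    if th.seen is None:
        th.seen = shared[0]                  # step 1 of an attempt: read
    else:
        if shared[0] == th.seen:             # step 2: CAS succeeds
            shared[0] = th.seen + 1
            th.completed += 1
        else:                                # CAS fails: another thread got there first
            th.failed_cas += 1
        th.seen = None                       # the next step starts a fresh attempt


def run(schedule: List[int], n_threads: int = 2) -> Tuple[int, List[CasLoopThread]]:
    """Replay a schedule: a list of thread ids, one atomic step each."""
    shared = [0]
    threads = [CasLoopThread() for _ in range(n_threads)]
    for tid in schedule:
        step(shared, threads[tid])
    return shared[0], threads


# Adversarial schedule: thread 1 always reads just before thread 0 commits.
value, ths = run([1, 0, 0, 1] * 100)
```

In this model a single-instruction fetch-and-add would finish every call in one step under any schedule, which is the Wait-Free Bounded case.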
Synchronization Paradigms (2)
• Are lock-free algorithms completely useless in an RT context?
– Bounded number of retries in priority-based systems (Anderson, '97)
  • A hard-RT scheduler based on lock-free objects often incurs less overhead than a wait-free implementation
– Non-blocking synchronization for RT systems (Hohmuth & Härtig, '01)
  • Implemented Linux kernel benchmarks with LF/WF algorithms, demonstrating RT capabilities
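The bounded-retry argument can be sketched with a simplified response-time recurrence. The assumptions are ours, not the exact test from the '97 paper: a uniprocessor, fixed priorities, and each release of a higher-priority task can force at most one extra iteration (cost s) of the analyzed task's lock-free retry loop. That adds s per preemption to the usual interference term: R_i = C_i + sum over higher-priority j of ceil(R_i / T_j) * (C_j + s).

```python
import math


def response_time(cost, higher, retry_cost, deadline):
    """Smallest R with R = cost + sum(ceil(R / T_j) * (C_j + retry_cost)), or None if R > deadline.

    higher: (C_j, T_j) for every higher-priority task.
    Simplifying assumption: each higher-priority release forces at most one extra
    iteration (cost retry_cost) of the analyzed task's lock-free retry loop.
    """
    r = cost
    while True:
        nxt = cost + sum(math.ceil(r / t) * (c + retry_cost) for c, t in higher)
        if nxt > deadline:
            return None        # not schedulable under this bound
        if nxt == r:
            return r           # fixed point reached
        r = nxt
```

With retry_cost = 0 this reduces to classic fixed-priority response-time analysis, so the extra term isolates the price of retries.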
Alternative: Transactional Memory
• Originally proposed by Herlihy & Moss, '93 (earlier idea by Knight, '86)
• HW concept based on a cache-coherency extension
– Speculative work: writes are marked in the cache and can't become externally visible until commit
  • Upon commit, allow snoops/WB
  • Upon abort, invalidate the speculative lines and roll back
  • Reads are also marked to monitor conflicts
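The commit/abort rules above can be modeled in a few lines. This is a sketch under stated assumptions: one word per cache line, unlimited cache capacity, no register checkpoint/rollback, and a requester-wins policy on conflicts (real designs differ). `HTMCore` and `snoop` are hypothetical names.

```python
class HTMCore:
    """Toy model of one core running a best-effort, cache-based HTM."""

    def __init__(self, memory):
        self.memory = memory     # shared dict: line address -> committed value
        self.read_set = set()    # lines marked as read inside the transaction
        self.spec = {}           # lines marked as written: speculative, not yet visible
        self.active = False
        self.aborted = False

    def begin(self):
        self.read_set.clear()
        self.spec.clear()
        self.active, self.aborted = True, False

    def load(self, addr):
        if self.active:
            self.read_set.add(addr)
            if addr in self.spec:
                return self.spec[addr]     # see our own speculative write
        return self.memory[addr]

    def store(self, addr, value):
        if self.active:
            self.spec[addr] = value        # kept in the cache, no write-back yet
        else:
            self.memory[addr] = value

    def snoop(self, addr, remote_write):
        """Another core accesses addr. A remote access to a line we wrote, or a
        remote write to a line we read, is a conflict; here the requester wins."""
        if self.active and (addr in self.spec or (remote_write and addr in self.read_set)):
            self.abort()

    def abort(self):
        self.spec.clear()                  # invalidate speculative lines
        self.read_set.clear()
        self.active, self.aborted = False, True

    def commit(self):
        if not self.active:
            return False                   # already aborted
        self.memory.update(self.spec)      # lines become visible; snoops/WB allowed again
        self.read_set.clear()
        self.spec.clear()
        self.active = False
        return True
```

A remote read of a line this core has only read is not a conflict, which is why reads are tracked separately from writes.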
Example – deadlock prevention
• Consider implementations of move(A, B, elem)
– moves a single element from data structure A to data structure B
• Drawbacks? Think of a linked list
Non TM:
  Lock A
  Lock B
  A.remove(elem)
  B.insert(elem)
  Unlock B
  Unlock A
TM:
  atomic {
    A.remove(elem)
    B.insert(elem)
  }
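One drawback of the non-TM version: if one thread runs move(A, B, ...) while another runs move(B, A, ...), each can hold its first lock and wait forever for the second. A common non-TM remedy is a global lock order, sketched here in Python; `LockedList` and the rank counter are hypothetical. The TM version gets the same atomicity without any ordering discipline.

```python
import itertools
import threading

_rank = itertools.count()   # global order in which locks must be acquired


class LockedList:
    """Hypothetical stand-in for data structure A or B: a list plus its own lock."""

    def __init__(self, items=()):
        self.items = list(items)
        self.lock = threading.Lock()
        self.rank = next(_rank)


def move(src, dst, elem):
    """Move elem from src to dst while holding both locks.

    Taking "Lock A, then Lock B" in argument order can deadlock when another thread
    runs move(B, A, ...) at the same time; always locking in rank order avoids that.
    """
    if src is dst:
        return
    first, second = sorted((src, dst), key=lambda s: s.rank)
    with first.lock, second.lock:
        src.items.remove(elem)
        dst.items.append(elem)
```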
Overflow…
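store 0,[a]
TX_begin
  store 1,[a]
  store 1,[b]
  store 1,[c]
  store 1,[d]
  store 1,[e]
TX_end
Figure: one set of a 4-way L1 cache (Way 0 to Way 3). Before TX_begin it holds [a]=0 in state M; the speculative writes [a], [b], [c], [d] = 1 (marked w) occupy all four ways, leaving no way for [e]=1.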
What happens if a write hits a speculative/non-speculative line? Other resources are also limited.
• Assume [a]..[e] all map to the same L1 set
– Limited capacity
– Worse: non-determinism
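The capacity limit can be illustrated with a small model of a set-associative L1 in which every transactional store must keep its line resident until commit. Assumed geometry: 64-byte lines, 64 sets, 4 ways; non-speculative lines are assumed always evictable, which is optimistic. Five stores whose addresses differ by NUM_SETS * LINE_SIZE map to one set, so the fifth store has no way left, as in the [a]..[e] example.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 64    # e.g. 16 KiB / (64 B * 4 ways) = 64 sets (assumed geometry)
WAYS = 4


def set_index(addr):
    return (addr // LINE_SIZE) % NUM_SETS


def first_overflow(write_addrs):
    """Index of the first transactional store that finds every way of its set
    already holding speculative data (a best-effort HTM must then abort), or None."""
    spec = {}                                    # set index -> speculative lines in it
    for i, addr in enumerate(write_addrs):
        line = addr // LINE_SIZE
        lines = spec.setdefault(set_index(addr), set())
        if line in lines:
            continue                             # hits a line already written speculatively
        if len(lines) == WAYS:
            return i
        lines.add(line)
    return None
```

The non-determinism comes from what the model leaves out: other lines competing for the same set (other data the program touches, interrupt handlers) shift where, or whether, the overflow happens from run to run.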
Software Transactional Memory
• Proposed by Shavit and Touitou ('95)
– Manage data structures through a SW intermediate layer
– Log all reads/writes to track conflicts
• Enhanced in TL2
– Relies on a versioned clock for commits
• Standalone approach, or a temporary solution until HW catches up?
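The versioned-clock idea in TL2 can be sketched as follows. This is a simplified, single-threaded model (class names are hypothetical): interleavings are driven by hand, and the real algorithm's atomic clock increment and lock words are modeled by plain fields. Each transaction samples the global clock at start (rv); a read fails if the location is locked or newer than rv; commit locks the write set, increments the clock to get wv, re-validates the read set, writes back with version wv, and unlocks.

```python
class Abort(Exception):
    """Raised when the transaction must be retried."""


class GlobalClock:
    def __init__(self):
        self.now = 0


class TVar:
    """A transactional location with a versioned write-lock (TL2-style)."""

    def __init__(self, value):
        self.value = value
        self.version = 0      # clock value of the last commit that wrote this location
        self.locked = False   # held only while a committer writes it back


class Transaction:
    def __init__(self, clock):
        self.clock = clock
        self.rv = clock.now   # read version: global clock sampled at start
        self.read_set = set()
        self.write_set = {}   # TVar -> new value, buffered until commit

    def read(self, tv):
        if tv in self.write_set:
            return self.write_set[tv]
        if tv.locked or tv.version > self.rv:
            raise Abort("location changed since this transaction started")
        self.read_set.add(tv)
        return tv.value

    def write(self, tv, value):
        self.write_set[tv] = value

    def commit(self):
        if not self.write_set:
            return                            # read-only: every read was already validated
        locked = []
        try:
            for tv in self.write_set:         # 1. lock the write set
                if tv.locked:
                    raise Abort("write-set location is being committed by someone else")
                tv.locked = True
                locked.append(tv)
            self.clock.now += 1               # 2. increment-and-fetch the global clock
            wv = self.clock.now
            for tv in self.read_set:          # 3. re-validate the read set
                if tv.version > self.rv or (tv.locked and tv not in self.write_set):
                    raise Abort("read-set location changed")
            for tv, value in self.write_set.items():  # 4. write back, stamp new version
                tv.value = value
                tv.version = wv
        finally:
            for tv in locked:                 # 5. release the locks
                tv.locked = False
```

For example, if t1 reads x, then t2 writes x and commits, t1's commit raises Abort because x.version now exceeds t1.rv, and t1 must re-execute.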
TM flavors
• TM (Herlihy, Moss, '93): original design, best effort
• SLE (Rajwar, Goodman, '01): simplified interface; elides locks, no TM ISA required
• LTM (Ananian, '03): physical-memory spilling by HW
• UTM (Ananian, '03): virtual memory and context-switch support; very heavy (virtualizes each line)
• VTM (Rajwar, Herlihy, '05): another unbounded flavor, virtualizes transactions like virtual memory
• HyTM (Moir, Sun Labs, '05): attempts HTM, falls back on STM; special consideration for synchronizing between instances of both types
• DSTM (Koomar): similar to HyTM (although both are trying hard to deny it)
• TL2 (Dice, Shavit, '06): another hybrid, very popular as a baseline for others
• PhTM (Lev, '07): another hybrid, no simultaneous HW/SW transactions
• USTM (Baugh, '08): another hybrid; user-fault-on STM with unbounded HTM based on HW memory protection
• TLE (Dice, '08): TM version of SLE
• TTM, LogTM, etc. (Moore)
Bottom line: most of the above are still best-effort HTMs; no success (forward progress) is guaranteed, and some level of SW support is required.
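Because these HTMs are best effort, software must supply a fallback path. A common pattern, sketched here, is a bounded number of hardware attempts followed by a lock; `try_htm` is a hypothetical hook, not a real API.

```python
import threading


def run_atomic(body, try_htm, fallback_lock, max_attempts=3):
    """Execute body atomically: a few best-effort HTM attempts, then a lock.

    try_htm(body) is a hypothetical hook: it runs body inside a hardware
    transaction and returns True on commit, False on abort. Real lock-elision
    schemes also read fallback_lock inside the transaction, so that a thread
    taking the lock aborts concurrent transactions; that part is not modeled.
    """
    for _ in range(max_attempts):
        if try_htm(body):
            return "htm"
    with fallback_lock:       # the SW path that guarantees forward progress
        body()
    return "fallback"
```

The attempt limit trades throughput (more hardware tries) against a bounded worst case, which matters for the real-time setting discussed next.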
HTM: Industry Trends
• Sun Microsystems: Rock CPU
– Featured Hybrid-TM and lots of other goodies such as speculative lookahead, OOO retirement, and a built-in desk warmer (250W!). Allows a mix of Tx and non-Tx code inside Tx boundaries, but retains TSO.
– R.I.P. as of May 2010
• Azul: Vega 2/3, "Java Compute Appliance (JCA)"
– Released 2007/8. RISC, in-order, CMP (48/54 cores per die)
– JVM oriented, >100k threads
– Simple HTM, no register rollback (relies on SW), no STM fallback
• AMD: Advanced Synchronization Facility (ASF)
– Spec released in 2009. ISA includes speculate/commit, locked-mov
– Very resource constrained (4 atomic lines), flat nesting; also allows a mix of Tx and non-Tx code inside Tx boundaries, but may break x86 memory consistency
• Intel:
– TM compiler with HW support (HASTM, based on RSM)
– TSX on Haswell! Oops, sorry: only as of HSW-EX due to errata
Sun: http://labs.oracle.com/scalable/pubs/ASPLOS2006.pdf
Azul: http://sss.cs.purdue.edu/projects/tm/tmw2010//talks/Click-2010_TMW.pdf
AMD: http://www.amd64.org/fileadmin/user_upload/pub/transact-2010-asfooo.pdf http://www-ali.cs.umass.edu/~moss/transact-2010/public-papers/08.pdf http://llvm.org/pubs/2010-04-EUROSYS-DresdenTM.pdf
RTTM (Schoeberl '10): premise
• "RTTM brings the benefits of transactional memories into the real-time systems world".
• Paper contributions:
– Design of a time-predictable hardware transactional memory
– Analysis of the worst-case number of retries in a periodic thread model
– Suggestions for analysis to reduce the number of possible conflicting transactions
– First evaluation of RTTM on a simulation within a Java-based CMP
• Optimized for WCET, not average performance
• Implemented on the Java Optimized Processor (JOP)
Java Optimized Processor (Schoeberl '07)
• Unlike the JVM, JOP is "a RISC stack architecture"
WCET-friendly CPU
• Time-predictable computer architecture (Schoeberl '08)
– A collection of simplifications to CPU design that reduce the bounds on WCET at a small penalty to ACET/BCET
– Provides some reasoning (but no concrete proof)
WCET-friendly CPU - 2
• Time Division Multiple Access (TDMA) memory access scheduling (Pitter and Schoeberl, '09; Rosen, '07)
• Memory arbitration gives each core its own time slot
– Transactions may only start during the access window
– The remaining gap allows completion (its length depends on the memory access time)
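The worst-case memory latency under TDMA can be checked by brute force. Assumed model (not from the slide): slot k belongs to core k mod n; an access of A cycles may start no later than slot - A cycles into the owner's slot, so the rest of the slot is the gap that lets it finish. Under these assumptions the worst case is a request that just misses its window, giving (n - 1) * slot + 2 * A - 1 cycles.

```python
def tdma_finish(t, core, n_cores, slot, access):
    """Cycle at which an access requested at cycle t by `core` completes.

    Assumed model: slot k covers cycles [k*slot, (k+1)*slot) and belongs to core
    k % n_cores; an access of `access` cycles may start no later than
    slot - access cycles into the owner's slot (the access window), and the
    remaining gap lets it complete before the next core's slot.
    """
    assert 0 < access <= slot
    period = n_cores * slot
    k = t // period                           # TDMA round containing t
    while True:
        start = k * period + core * slot      # this core's slot in round k
        if t <= start + slot - access:        # window still open (or not yet reached)
            return max(t, start) + access
        k += 1


def tdma_wcet(n_cores, slot, access):
    """Worst-case latency, by brute force over every request cycle in one round."""
    return max(tdma_finish(t, 0, n_cores, slot, access) - t
               for t in range(n_cores * slot))
```

The bound depends only on n, the slot length and the access time, never on what the other cores do, which is what makes TDMA attractive for WCET analysis despite its poor average latency.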
Memory access WCET
• Fixed priority (WCET for the high-priority request)
• Fair priority
• TDMA
OS scheduling
• "Real Time Specification for Java"
– RT threads are assigned a deadline
– The scheduler is preemptive and priority based
  • Threads of the same priority are scheduled FIFO
– The scheduler guarantees that all threads meet their deadlines
  • Relies on estimated bounds on blocking
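A small discrete-time simulation illustrates the scheduling rules above: preemptive fixed priority, FIFO among equal priorities, and a check of each job's finish time against its deadline. It is an illustrative sketch (the `Task` fields and unit time steps are assumptions), and a simulation only checks one release pattern; it does not replace the blocking-aware analysis the scheduler's guarantee relies on.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int   # larger = more urgent (assumed convention)
    period: int     # a job is released every `period` time units, starting at 0
    cost: int       # execution time per job
    deadline: int   # relative deadline


def simulate(tasks, horizon):
    """Unit-step simulation of preemptive fixed-priority scheduling with FIFO order
    among equal priorities. Returns (name, release, finish) for each completed job;
    jobs still running at the horizon are not reported."""
    ready, finished, seq = [], [], 0
    for t in range(horizon):
        for task in tasks:
            if t % task.period == 0:
                # heap key: priority first, then release time (FIFO), then a unique tie-breaker
                heapq.heappush(ready, [-task.priority, t, seq, task.cost, task])
                seq += 1
        if ready:
            job = ready[0]              # preemption: always run the current best job
            job[3] -= 1                 # remaining time; never compared, since seq is unique
            if job[3] == 0:
                heapq.heappop(ready)
                finished.append((job[4].name, job[1], t + 1))
    return finished


def deadline_misses(tasks, horizon):
    deadline = {task.name: task.deadline for task in tasks}
    return [(n, r, f) for n, r, f in simulate(tasks, horizon) if f > r + deadline[n]]
```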
RTTM - proposal
• Transaction buffering: fully associative
• Read-set caching (tags only)
• Word granularity (no false conflicts)
• Commit in bursts
– All other cores listen (conflict checks)
– Protected by a global lock ("commit token")
• (What is the overhead for short transactions?)
• No aborts on overflow! Grab the commit token on the fly
• On a true abort, mark the transaction as a zombie
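The commit protocol above can be sketched as a toy model. Assumptions: zero-time commits, a single shared bus, word addresses as dictionary keys, `RTTMBus`/`Core` as hypothetical names, and, on overflow, the remaining writes going straight to memory once the token is held (a simplification). It shows three rules: commits broadcast the write buffer under a global token, other cores become zombies on a hit in their read set at word granularity, and a buffer overflow grabs the token instead of aborting.

```python
class Core:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity   # fully associative write-buffer entries (assumed size)
        self.read_set = set()      # word addresses read: tags only, no data
        self.write_buf = {}        # word address -> buffered value
        self.zombie = False        # a committed write hit our read set
        self.irrevocable = False   # holds the commit token early because of overflow


class RTTMBus:
    """Toy model of RTTM-style commit: one global commit token, burst broadcast of
    the write buffer, word-granularity conflict checks in every other core."""

    def __init__(self, memory):
        self.memory = memory       # word address -> committed value
        self.cores = []
        self.token = None          # core holding the commit token, if any

    def add_core(self, name, capacity=4):
        core = Core(name, capacity)
        self.cores.append(core)
        return core

    def read(self, core, addr):
        core.read_set.add(addr)
        return core.write_buf.get(addr, self.memory.get(addr, 0))

    def write(self, core, addr, value):
        if core.irrevocable:
            self._publish(core, addr, value)          # already committing: write through
        elif addr in core.write_buf or len(core.write_buf) < core.capacity:
            core.write_buf[addr] = value
        elif core.zombie:
            pass                                      # doomed anyway; discarded at commit
        elif self.token is None:
            self.token = core                         # overflow: take the token, don't abort
            core.irrevocable = True
            self._flush(core)
            self._publish(core, addr, value)
        else:
            raise RuntimeError("overflow while another core commits: must wait for the token")

    def commit(self, core):
        """True: committed. False: zombie, must re-execute. None: token busy, retry later."""
        if core.zombie:
            self._reset(core)
            return False
        if self.token not in (None, core):
            return None
        self.token = core
        self._flush(core)
        self.token = None
        self._reset(core)
        return True

    def _publish(self, writer, addr, value):
        self.memory[addr] = value
        for other in self.cores:                      # all other cores snoop the burst
            if other is not writer and addr in other.read_set:
                other.zombie = True                   # true conflict on this exact word

    def _flush(self, core):
        for addr, value in core.write_buf.items():
            self._publish(core, addr, value)
        core.write_buf.clear()

    def _reset(self, core):
        core.read_set.clear()
        core.write_buf.clear()
        core.zombie = core.irrevocable = False
```

Because the token serializes all commits, a long transaction that overflows early blocks every other commit until it finishes, which is one way to read the slide's question about overhead.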
RTTM Analysis
• Assume n threads, each executing a single atomic region once
– Thread period
– Execution time (cost)
– Atomic region time
– r: max retries
• WCET assumes:
– Always conflict (all atomic regions use the same variable)
– Worst phase: all threads attempt to enter the atomic region simultaneously
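The worst-phase scenario can be simulated directly. Assumptions (ours): commits take zero time, every attempt reads the shared variable when it starts, ties are broken by thread index, and a zombie discovers its abort only when it reaches the end of its atomic region. With equal region lengths a, the k-th thread to commit retries k - 1 times, so the last one retries n - 1 times and all threads have committed after n * a.

```python
import heapq


def worst_phase(region_len):
    """region_len[i]: atomic-region time of thread i. All regions touch the same
    variable (always conflict) and all threads start at t = 0 (worst phase).
    Returns (per-thread retries, time until every thread has committed)."""
    n = len(region_len)
    retries = [0] * n
    zombie = [False] * n
    events = [(region_len[i], i) for i in range(n)]   # (end of current attempt, thread)
    heapq.heapify(events)
    t = 0
    while events:
        t, i = heapq.heappop(events)
        if zombie[i]:                      # aborted by an earlier commit: restart now
            zombie[i] = False
            retries[i] += 1
            heapq.heappush(events, (t + region_len[i], i))
        else:                              # commit; every attempt still running conflicts
            for _, j in events:
                zombie[j] = True
    return retries, t
```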
RTTM Analysis (2)
• Single transaction per thread per period
– Time of transaction resolution (all threads)
• Period per transaction (same thread)
• Max number of retries
(The formulas on this slide did not survive extraction.)
Preliminary analysis
• Possible directions
– Context-sensitive points-to analysis
– Static detection of race conditions
– Simulation-based analysis of buffer overflows
• RTTM's analysis was based on the WALA analyzer (open source from IBM, '06)
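The "always conflict" assumption can be relaxed by checking which atomic regions actually touch common words. In this sketch the read/write sets are hypothetical (the kind of result a points-to analysis might produce for producer/mover/consumer tasks): two regions may conflict if one writes a word the other reads or writes. A smaller per-region count tightens a retry bound that assumed every region conflicts with every other.

```python
from itertools import combinations


def may_conflict(a, b):
    """Regions conflict if one writes a word that the other reads or writes."""
    return bool(a["w"] & (b["r"] | b["w"]) or b["w"] & (a["r"] | a["w"]))


def conflict_counts(regions):
    """Number of other atomic regions each region may conflict with."""
    counts = {name: 0 for name in regions}
    for (n1, s1), (n2, s2) in combinations(regions.items(), 2):
        if may_conflict(s1, s2):
            counts[n1] += 1
            counts[n2] += 1
    return counts


# Hypothetical word-level read/write sets.
regions = {
    "producer": {"r": {"q1.tail"}, "w": {"q1.tail", "q1.buf"}},
    "mover": {"r": {"q1.head", "q1.buf", "q2.tail"}, "w": {"q1.head", "q2.tail", "q2.buf"}},
    "consumer": {"r": {"q2.head", "q2.buf"}, "w": {"q2.head"}},
}
```

Here the producer and consumer never share a word, so each of them can be delayed by only one other region instead of two.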
Experiment methodology
• Implemented over JOP, simulated on a JVM
• 3 tasks
– Producer enqueues into a buffer
– Consumer removes elements from its buffer
– Mover atomically moves elements between the two buffers
• Buffer types
– Standard Java Vector
– Bounded queue
Results
STM example (Fahmy, '09)
• EDF scheduling
• Response time analysis
– Predicted vs. simulated with random alignments (> 1)
– Utilization: task time vs. period (< 1)
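The utilization condition on the slide matches the classic EDF test: for independent, preemptive periodic tasks with deadlines equal to periods on one processor, EDF meets all deadlines if and only if the sum of C_i / T_i is at most 1 (Liu and Layland). The sketch uses exact fractions; with STM, C_i would presumably have to include each task's worst-case retry overhead (an assumption about how the two analyses combine).

```python
from fractions import Fraction


def edf_schedulable(tasks):
    """EDF utilization test for independent, preemptive, implicit-deadline periodic
    tasks on one processor. tasks: list of (C_i, T_i) with integer times.
    Returns (schedulable, total utilization)."""
    utilization = sum(Fraction(c, t) for c, t in tasks)
    return utilization <= 1, utilization
```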
Bibliography
• M. Herlihy and J. E. B. Moss. Transactional memory: architectural support for lock-free data structures. ISCA '93.
• J. H. Anderson, S. Ramamurthy, K. Jeffay. Real-time computing with lock-free shared objects. ACM TOCS, May '97.
• M. Hohmuth, H. Härtig. Pragmatic nonblocking synchronization for real-time systems. USENIX '01.
• M. Schoeberl, F. Brandner, J. Vitek. RTTM: Real-Time Transactional Memory. SAC '10.
• M. Schoeberl. A Java processor architecture for embedded real-time systems. Journal of Systems Architecture, vol. 54, Jan 2008, pp. 265-286.
• M. Schoeberl. Time-predictable computer architecture. EURASIP J. Embedded Syst., 2009, Article 2 (January 2009).
• C. Pitter and M. Schoeberl. A real-time Java chip-multiprocessor. Trans. on Embedded Computing Sys., accepted for publication 2009.
• Manson ('05). Preemptible atomic regions (uni-processor).
Memory ordering rules
Type: Alpha, ARMv7, PA-RISC, POWER, SPARC RMO, SPARC PSO, SPARC TSO, x86, x86 oostore, AMD64, IA-64, zSeries
Loads reordered after loads: Y Y Y Y Y Y Y
Loads reordered after stores: Y Y Y Y Y Y Y
Stores reordered after stores: Y Y Y Y Y Y Y Y
Stores reordered after loads: Y Y Y Y Y Y Y Y Y Y Y Y
Atomic reordered with loads: Y Y Y Y Y
Atomic reordered with stores: Y Y Y Y Y Y
Dependent loads reordered: Y
Incoherent instruction cache pipeline: Y Y Y Y Y Y Y Y Y Y
Source: http://en.wikipedia.org/wiki/Memory_ordering
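The row "Stores reordered after loads" is marked for all twelve listed architectures, so it is the one reordering visible even on x86/TSO. The store-buffering litmus test shows it: enumerating every interleaving with per-thread FIFO store buffers (a simplified TSO model, which is our assumption) yields the outcome r0 = r1 = 0, which sequential consistency forbids.

```python
def sb_outcomes(tso):
    """All final (r0, r1) of the store-buffering litmus test
        T0: x = 1; r0 = y        T1: y = 1; r1 = x
    tso=False: sequential consistency (stores go straight to memory).
    tso=True: each thread has a FIFO store buffer drained at arbitrary times,
    and a load first checks its own buffer (store-to-load forwarding)."""
    programs = [[("st", "x"), ("ld", "y")], [("st", "y"), ("ld", "x")]]
    results = set()

    def explore(pcs, mem, bufs, regs):
        moved = False
        for t in (0, 1):
            if pcs[t] < len(programs[t]):            # thread t executes its next instruction
                op, addr = programs[t][pcs[t]]
                nxt_pcs = list(pcs)
                nxt_pcs[t] += 1
                if op == "st" and tso:
                    nxt_bufs = [list(b) for b in bufs]
                    nxt_bufs[t].append((addr, 1))
                    explore(nxt_pcs, mem, nxt_bufs, regs)
                elif op == "st":
                    explore(nxt_pcs, {**mem, addr: 1}, bufs, regs)
                else:
                    pending = [v for a, v in bufs[t] if a == addr]
                    nxt_regs = list(regs)
                    nxt_regs[t] = pending[-1] if pending else mem[addr]
                    explore(nxt_pcs, mem, bufs, nxt_regs)
                moved = True
            if bufs[t]:                              # the oldest buffered store reaches memory
                (addr, value), rest = bufs[t][0], bufs[t][1:]
                nxt_bufs = [list(b) for b in bufs]
                nxt_bufs[t] = rest
                explore(pcs, {**mem, addr: value}, nxt_bufs, regs)
                moved = True
        if not moved:                                # both threads done, buffers drained
            results.add(tuple(regs))

    explore([0, 0], {"x": 0, "y": 0}, [[], []], [None, None])
    return results
```

On real x86 hardware the (0, 0) outcome is prevented by placing a full fence (or a locked instruction) between each thread's store and load.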