transactional locking
![Page 1: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/1.jpg)
Transactional Locking
Nir Shavit, Tel Aviv University
Joint work with Dave Dice and Ori Shalev
![Page 2: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/2.jpg)
Concurrent Programming
[Diagram: objects in shared memory]
How do we make the programmer’s life simple without slowing computation down to a halt?!
![Page 3: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/3.jpg)
A FIFO Queue
[Diagram: queue with elements a, b, c, d and Head and Tail pointers]
Dequeue() => a
Enqueue(d)
![Page 4: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/4.jpg)
A Concurrent FIFO Queue
synchronized{}
Object lock
[Diagram: queue with elements a, b, c, d and Head and Tail pointers]
P: Dequeue() => a Q: Enqueue(d)
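The single object lock on this slide can be sketched as follows. This is a minimal illustration, not code from the talk: the class name `CoarseQueue` and its methods are hypothetical, and a Python `threading.Lock` stands in for `synchronized{}`.

```python
import threading
from collections import deque


class CoarseQueue:
    """FIFO queue guarded by one object-wide lock (hypothetical name)."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()    # the single "object lock"

    def enqueue(self, value):
        with self._lock:                 # plays the role of synchronized{}
            self._items.append(value)

    def dequeue(self):
        with self._lock:
            # Return None on an empty queue instead of blocking.
            return self._items.popleft() if self._items else None


q = CoarseQueue()
for item in "abc":
    q.enqueue(item)

# P dequeues while Q enqueues; the lock serializes the two calls.
results = []
p = threading.Thread(target=lambda: results.append(q.dequeue()))
t = threading.Thread(target=q.enqueue, args=("d",))
p.start()
t.start()
p.join()
t.join()
```

Every operation takes the same lock, so the code stays simple but P and Q never run in parallel, even though they touch opposite ends of the queue.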
![Page 5: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/5.jpg)
Fine Grain Locks
[Diagram: queue with elements a, b, c, d and Head and Tail pointers]
P: Dequeue() => a Q: Enqueue(d)
Better Performance, More Complex Code
Worry about deadlock, livelock…
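One way to picture fine-grained locking for this queue is a two-lock design with a sentinel node, in the style of the well-known Michael and Scott two-lock queue. The slide does not say which scheme it has in mind, so treat this as an assumed example; `TwoLockQueue` and `_Node` are hypothetical names.

```python
import threading


class _Node:
    def __init__(self, value=None):
        self.value = value
        self.next = None


class TwoLockQueue:
    """Linked-list FIFO with separate head and tail locks."""

    def __init__(self):
        dummy = _Node()                  # sentinel keeps head and tail apart
        self._head = dummy
        self._tail = dummy
        self._head_lock = threading.Lock()
        self._tail_lock = threading.Lock()

    def enqueue(self, value):
        node = _Node(value)
        with self._tail_lock:            # enqueuers contend only on the tail
            self._tail.next = node
            self._tail = node

    def dequeue(self):
        with self._head_lock:            # dequeuers contend only on the head
            first = self._head.next
            if first is None:
                return None              # empty queue
            self._head = first           # first becomes the new sentinel
            value = first.value
            first.value = None           # drop the reference held by the sentinel
            return value


q = TwoLockQueue()
for item in "abc":
    q.enqueue(item)
first = q.dequeue()
q.enqueue("d")
rest = [q.dequeue() for _ in range(4)]
```

P and Q can now proceed in parallel, at the price of reasoning about the sentinel and about which lock protects which field. That extra reasoning is the "more complex code" the slide refers to.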
![Page 6: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/6.jpg)
Lock-Free (JSR-166)
[Diagram: queue with elements a, b, c, d and Head and Tail pointers]
P: Dequeue() => a Q: Enqueue(d)
Even Better Performance, Even More Complex Code
Worry about deadlock, livelock, subtle bugs, hard to modify…
![Page 7: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/7.jpg)
Transactional Memory [Herlihy-Moss]
[Diagram: queue with elements a, b, c, d and Head and Tail pointers]
P: Dequeue() => a Q: Enqueue(d)
Don’t worry about deadlock, livelock, subtle bugs, etc…
Great Performance, Simple Code
![Page 8: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/8.jpg)
Transactional Memory [Herlihy-Moss]
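[Diagram: queue with elements a, b, c, d and Head and Tail pointers]
P: Dequeue() => a Q: Enqueue(d)
Don’t worry about deadlock, livelock, subtle bugs, etc…
[Diagram: second queue view with elements a, b and Head and Tail pointers]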
Great Performance, Simple Code
![Page 9: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/9.jpg)
TM: How Does It Work
synchronized{<sequence of instructions>}
atomic
Execute all synchronized instructions as an atomic transaction…
Simplicity of Global Lock with Granularity of Fine-Grained Implementation
![Page 10: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/10.jpg)
Hardware TM [Herlihy-Moss]
• Limitations: atomic{<~10-20-30?…but not ~1000 instructions>}
• Machines will differ in their support
• When we build 1000 instruction transactions, it will not be for free…
![Page 11: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/11.jpg)
Software Transactional Memory
• Implement transactions in software
• All the flexibility of hardware…today
• Ability to extend hardware when it is available (Hybrid TM)
• But there are problems:
– Performance?
– Ease of programming (software engineering)?
– Mechanical code transformation?
![Page 12: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/12.jpg)
The Brief History of STM
1993 STM (Shavit, Touitou)
1997 Trans Support TM (Moir)
2003 DSTM (Herlihy et al)
2003 WSTM (Fraser, Harris)
2003 OSTM (Fraser, Harris)
2004 ASTM (Marathe et al)
2004 T-Monitor (Jagannathan…)
2004 HybridTM (Moir)
2004 Meta Trans (Herlihy, Shavit)
2005 Lock-OSTM (Ennals)
2005 McTM (Saha et al)
2005 TL (Dice, Shavit)
2006 AtomJava (Hindman…)
[Timeline legend: Lock-free, Obstruction-free, Lock-based]
![Page 13: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/13.jpg)
As Good As Fine Grained
Postulate (i.e. take it or leave it):
If we could implement fine-grained locking with the same simplicity as coarse-grained locking, we would never think of building a transactional memory.
Implication:
Let’s try to provide TMs that get as close as possible to hand-crafted fine-grained locking.
![Page 14: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/14.jpg)
Premise of Lock-based STMs
1. Memory Lifecycle: work with GC or any malloc/free
2. Transactification: allow mechanical transformation of sequential code
3. Performance: match fine grained
4. Safety: work on coherent state
Unfortunately: Hybrid, Ennals, Saha, AtomJava deliver only 2 and 3 (in some cases)…
![Page 15: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/15.jpg)
Transactional Locking
• TL2 delivers all four properties
• How?
– Unlike all prior algorithms: use commit-time locking instead of encounter-order locking
– Introduce a Version Clock mechanism for validation
![Page 16: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/16.jpg)
TL Design Choices
[Diagram: application memory is mapped to an array of versioned write-locks (V#)]
PS = Lock per Stripe (separate array of locks)
PO = Lock per Object (embedded in object)
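A small sketch of the PS layout may help. The table size, the word size, the hash in `stripe_of`, and the packing of a lock bit below the version number are all assumptions made for illustration; the slide only says that memory maps to an array of versioned write-locks.

```python
NUM_STRIPES = 1024          # assumed size of the separate lock array
WORD_SIZE = 8               # assumed bytes per memory word


def stripe_of(address):
    """PS mode: map a memory address to a slot in the lock array."""
    return (address // WORD_SIZE) % NUM_STRIPES


def pack(version, locked):
    """Combine a version number and a lock bit into one lock word."""
    return (version << 1) | (1 if locked else 0)


def unpack(word):
    """Split a lock word into (version, locked)."""
    return word >> 1, bool(word & 1)


lock_array = [pack(0, False)] * NUM_STRIPES      # every stripe starts unlocked at v# 0

addr_a = 0x1000
addr_b = 0x1000 + WORD_SIZE * NUM_STRIPES
same_stripe = stripe_of(addr_a) == stripe_of(addr_b)   # distinct addresses may share a lock

slot = stripe_of(addr_a)
lock_array[slot] = pack(0, True)                 # a writer acquires the stripe
version, locked = unpack(lock_array[slot])
lock_array[slot] = pack(version + 1, False)      # release with v#+1
```

In PO mode the same kind of lock word would sit inside each object instead of in a shared array, which removes false sharing between unrelated addresses but requires changing the object layout.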
![Page 17: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/17.jpg)
Encounter Order Locking (Undo Log)
1. To Read: load lock + location
2. Check unlocked, add to Read-Set
3. To Write: lock location, store value
4. Add old value to undo-set
5. Validate read-set v#’s unchanged
6. Release each lock with v#+1
[Diagram: memory words (Mem) beside their versioned locks (Locks); X and Y are locked while written in place, then released with V#+1]
Quick read of values freshly written by the reading transaction
[Ennals,Hybrid,Saha,Harris,…]
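The six steps can be sketched as a single-threaded simulation in which interleavings are driven by hand. The names `EncounterTx`, `VLock`, and `Abort` are hypothetical, and lock acquisition is a plain assignment where a real STM would use CAS.

```python
class Abort(Exception):
    """Raised when a transaction must give up and retry."""


class VLock:
    """Versioned write-lock: version number plus current owner (None = free)."""

    def __init__(self):
        self.version = 0
        self.owner = None


class EncounterTx:
    def __init__(self, mem, locks):
        self.mem, self.locks = mem, locks
        self.read_set = {}    # address -> version observed at first read
        self.undo_log = []    # (address, old value) in write order

    def read(self, addr):
        lock = self.locks[addr]
        if lock.owner is self:
            return self.mem[addr]          # quick read of our own fresh write
        if lock.owner is not None:
            self._abort("read hit a locked location")
        self.read_set.setdefault(addr, lock.version)
        return self.mem[addr]

    def write(self, addr, value):
        lock = self.locks[addr]
        if lock.owner is None:
            lock.owner = self              # a real STM would acquire with CAS
            self.undo_log.append((addr, self.mem[addr]))
        elif lock.owner is not self:
            self._abort("write hit a locked location")
        self.mem[addr] = value             # update memory in place

    def commit(self):
        for addr, seen in self.read_set.items():
            lock = self.locks[addr]
            if lock.version != seen or lock.owner not in (None, self):
                self._abort("read-set validation failed")
        for addr, _ in self.undo_log:      # release each lock with v#+1
            self.locks[addr].version += 1
            self.locks[addr].owner = None

    def _abort(self, reason):
        for addr, old in reversed(self.undo_log):   # roll back in-place writes
            self.mem[addr] = old
            self.locks[addr].owner = None
        self.undo_log.clear()
        raise Abort(reason)


mem = {"X": 1, "Y": 2}
locks = {addr: VLock() for addr in mem}

t1 = EncounterTx(mem, locks)
t1.write("X", 10)
in_place = mem["X"]            # memory already holds the uncommitted value

t2 = EncounterTx(mem, locks)
try:
    t2.read("X")
    t2_aborted = False
except Abort:
    t2_aborted = True          # X stays locked until t1 finishes

t1.commit()
```

Note that `t1` holds the lock on X from its first write until commit, and that uncommitted data is visible in memory during that window.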
![Page 18: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/18.jpg)
Commit Time Locking (Write Buff)
1. To Read: load lock + location
2. Location in write-set? (Bloom Filter)
3. Check unlocked, add to Read-Set
4. To Write: add value to write set
5. Acquire Locks
6. Validate read/write v#’s unchanged
7. Release each lock with v#+1
[Diagram: memory words (Mem) beside their versioned locks (Locks); X and Y are locked only at commit, then released with V#+1]
Hold locks for very short duration
[TL,TL2]
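The same kind of hand-driven, single-threaded simulation can illustrate the seven commit-time steps. `CommitTx`, `VLock`, and `Abort` are hypothetical names; a dictionary lookup replaces the Bloom filter, locks are taken in sorted address order as this sketch's own choice, and only read-set versions are validated, which simplifies step 6.

```python
class Abort(Exception):
    """Raised when a transaction must give up and retry."""


class VLock:
    """Versioned write-lock: version number plus current owner (None = free)."""

    def __init__(self):
        self.version = 0
        self.owner = None


class CommitTx:
    def __init__(self, mem, locks):
        self.mem, self.locks = mem, locks
        self.read_set = {}     # address -> version observed at first read
        self.write_set = {}    # address -> buffered new value

    def read(self, addr):
        if addr in self.write_set:         # the slide uses a Bloom filter for this test
            return self.write_set[addr]
        lock = self.locks[addr]
        if lock.owner is not None:
            raise Abort("read hit a locked location")
        self.read_set.setdefault(addr, lock.version)
        return self.mem[addr]

    def write(self, addr, value):
        self.write_set[addr] = value       # memory stays untouched until commit

    def commit(self):
        acquired = []
        try:
            for addr in sorted(self.write_set):      # acquire locks
                lock = self.locks[addr]
                if lock.owner is not None:
                    raise Abort("could not acquire lock")
                lock.owner = self
                acquired.append(addr)
            for addr, seen in self.read_set.items():  # validate versions
                lock = self.locks[addr]
                if lock.version != seen or lock.owner not in (None, self):
                    raise Abort("validation failed")
        except Abort:
            for addr in acquired:
                self.locks[addr].owner = None        # nothing to undo in memory
            raise
        for addr in acquired:                        # write back, release with v#+1
            self.mem[addr] = self.write_set[addr]
            self.locks[addr].version += 1
            self.locks[addr].owner = None


mem = {"X": 1, "Y": 2}
locks = {addr: VLock() for addr in mem}

t1 = CommitTx(mem, locks)
t1.write("X", 10)
buffered = (mem["X"], t1.read("X"))    # memory is unchanged; t1 sees its own write

t2 = CommitTx(mem, locks)
seen_by_t2 = t2.read("X")              # X is not locked, so t2 reads the old value
t1.commit()

t2.write("Y", 5)
try:
    t2.commit()
    t2_committed = True
except Abort:
    t2_committed = False               # X changed after t2 read it
```

Locks are held only inside `commit`, and an aborting transaction never has to restore memory because nothing was written before validation succeeded.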
![Page 19: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/19.jpg)
Why COM and not ENC?
1. Under low load they perform pretty much the same.
2. COM withstands high loads (small structures or high write %). ENC does not withstand high loads.
3. COM works seamlessly with Malloc/Free. ENC does not work with Malloc/Free.
![Page 20: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/20.jpg)
COM vs. ENC High Load
[Chart: Red-Black Tree, 20% Delete 20% Update 60% Lookup; curves: ENC, Hand, MCS, COM]
![Page 21: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/21.jpg)
COM vs. ENC Low Load
[Chart: Red-Black Tree, 5% Delete 5% Update 90% Lookup; curves: COM, ENC, Hand, MCS]
![Page 22: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/22.jpg)
COM: Works with Malloc/Free
[Diagram: A and B map into the PS lock array (V#); the VALIDATE step fails (X) if the state is inconsistent]
To free B from transactional space:
1. Wait till its lock is free.
2. Free(B)
B is never written inconsistently because any write is preceded by a validation while holding lock
![Page 23: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/23.jpg)
ENC: Fails with Malloc/Free
[Diagram: A and B map into the PS lock array (V#); the VALIDATE step is marked X]
Cannot free B from transactional space because undo-log means locations are written after every lock acquisition and before validation.
Possible solution: validate after every lock acquisition (yuck)
![Page 24: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/24.jpg)
Problem: Application Safety
1. All current lock based STMs work on inconsistent states.
2. They must introduce validation into user code at fixed intervals or loops, use traps, OS support,…
3. And still there are cases, however rare, where an error could occur in user code…
![Page 25: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/25.jpg)
Solution: TL2’s “Version Clock”
• Have one shared global version clock
• Incremented by (small subset of) writing transactions
• Read by all transactions
• Used to validate that state worked on is always consistent
Later: how we learned not to worry about contention and love the clock
![Page 26: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/26.jpg)
Version Clock: Read-Only COM Trans
1. RV ← VClock
2. On Read: read lock, read mem, read lock: check unlocked, unchanged, and v# <= RV
3. Commit.
Reads form a snapshot of memory. No read set!
[Diagram: memory words (Mem) beside their versioned locks (Locks); VClock = 100; the locations read carry versions 87, 34, 99 and 50]
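A read-only transaction under the version clock can be sketched as below. This is a single-threaded simulation with hypothetical names (`ReadOnlyTx`, `VLock`, `VClock`, `Abort`); the version numbers 87, 34, 99, 50 and the clock value 100 are taken from the slide's diagram, while the memory contents are made up.

```python
class Abort(Exception):
    """Raised when a transaction must give up and retry."""


class VLock:
    def __init__(self, version):
        self.version = version
        self.locked = False


class VClock:
    def __init__(self, value):
        self.value = value


class ReadOnlyTx:
    def __init__(self, mem, locks, clock):
        self.mem, self.locks = mem, locks
        self.rv = clock.value                      # step 1: RV <- VClock

    def read(self, addr):
        lock = self.locks[addr]
        before = (lock.version, lock.locked)       # read lock
        value = self.mem[addr]                     # read mem
        after = (lock.version, lock.locked)        # read lock again
        if after[1] or before != after or after[0] > self.rv:
            raise Abort(f"{addr} is locked, changed, or newer than RV")
        return value                               # no read set is kept


clock = VClock(100)
mem = {"a": "w", "b": "x", "c": "y", "d": "z"}
locks = {"a": VLock(87), "b": VLock(34), "c": VLock(99), "d": VLock(50)}

tx = ReadOnlyTx(mem, locks, clock)
snapshot = [tx.read(addr) for addr in "abcd"]      # every version is <= RV

# Simulate a writer that commits to "b" after tx started.
clock.value = 101
mem["b"] = "x2"
locks["b"].version = 101

try:
    tx.read("b")
    stale_read_rejected = False
except Abort:
    stale_read_rejected = True                     # 101 > RV, so tx must retry

fresh = ReadOnlyTx(mem, locks, clock).read("b")    # a new transaction sees the update
```

Because every accepted read has a version no newer than RV, the values returned belong to one consistent snapshot, and commit has nothing left to validate.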
![Page 27: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/27.jpg)
Version Clock: Writing COM Trans
1. RV ← VClock
2. On Read/Write: check unlocked and v# <= RV then add to Read/Write-Set
3. Acquire Locks
4. WV = F&I(VClock)
5. Validate each v# <= RV
6. Release locks with v# ← WV
Reads+Inc+Writes=Linearizable
[Diagram: RV = 100; VClock advances 100 → 120 → 121; at commit, X and Y are locked, written, and released with version 121]
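The six steps of a writing transaction can be sketched the same way. All names here (`WriteTx`, `VersionClock`, `VLock`, `Abort`) are hypothetical, a Python lock stands in for an atomic increment, and the clock method returns the incremented value because the diagram shows the new value (121) being written as WV. The numbers mirror the diagram; the memory contents are made up.

```python
import threading


class Abort(Exception):
    """Raised when a transaction must give up and retry."""


class VLock:
    def __init__(self, version=0):
        self.version = version
        self.locked = False


class VersionClock:
    def __init__(self, start=0):
        self._value = start
        self._guard = threading.Lock()     # stands in for a hardware atomic increment

    def read(self):
        return self._value

    def increment_and_fetch(self):
        with self._guard:
            self._value += 1
            return self._value


class WriteTx:
    def __init__(self, mem, locks, clock):
        self.mem, self.locks, self.clock = mem, locks, clock
        self.rv = clock.read()             # step 1: RV <- VClock
        self.read_set = set()
        self.write_set = {}                # address -> buffered value

    def _check(self, addr):                # step 2: unlocked and v# <= RV
        lock = self.locks[addr]
        if lock.locked or lock.version > self.rv:
            raise Abort(f"{addr} is locked or newer than RV")

    def read(self, addr):
        if addr in self.write_set:
            return self.write_set[addr]
        self._check(addr)
        self.read_set.add(addr)
        return self.mem[addr]

    def write(self, addr, value):
        self._check(addr)
        self.write_set[addr] = value

    def commit(self):
        acquired = []
        try:
            for addr in sorted(self.write_set):          # step 3: acquire locks
                if self.locks[addr].locked:
                    raise Abort("could not acquire lock")
                self.locks[addr].locked = True
                acquired.append(addr)
            wv = self.clock.increment_and_fetch()        # step 4: WV from the clock
            for addr in self.read_set | set(acquired):   # step 5: validate v# <= RV
                lock = self.locks[addr]
                foreign = lock.locked and addr not in self.write_set
                if lock.version > self.rv or foreign:
                    raise Abort("validation failed")
        except Abort:
            for addr in acquired:
                self.locks[addr].locked = False
            raise
        for addr in acquired:                            # step 6: release with v# <- WV
            self.mem[addr] = self.write_set[addr]
            self.locks[addr].version = wv
            self.locks[addr].locked = False


clock = VersionClock(100)
mem = {"X": 1, "Y": 2, "Z": 3}
locks = {"X": VLock(34), "Y": VLock(99), "Z": VLock(50)}

tx = WriteTx(mem, locks, clock)            # RV = 100
tx.read("Z")
tx.write("X", 10)
tx.write("Y", 20)
for _ in range(20):                        # unrelated writers push the clock to 120
    clock.increment_and_fetch()
tx.commit()                                # WV = 121
after_commit = dict(mem)
versions = {addr: locks[addr].version for addr in mem}

tx2 = WriteTx(mem, locks, clock)           # RV = 121
tx2.read("Z")
tx3 = WriteTx(mem, locks, clock)
tx3.write("Z", 30)
tx3.commit()                               # Z now carries version 122
tx2.write("X", 0)
try:
    tx2.commit()
    tx2_aborted = False
except Abort:
    tx2_aborted = True                     # Z is newer than tx2's RV
```

The clock moving from 100 to 120 does not hurt `tx`: only the versions of the locations it actually touched are compared against RV.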
![Page 28: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/28.jpg)
Version Clock Implementation
• On sys-on-chip like Sun T200™ Niagara: virtually no contention, just CAS and be happy
• On others: add TID to VClock, if VClock has changed since last write can use new value +TID. Reduces contention by a factor of N.
• Future: Coherent Hardware VClock that guarantees unique tick per access.
![Page 29: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/29.jpg)
Performance Benchmarks
• Mechanically Transformed Sequential Red-Black Tree using TL2
• Compare to STMs and hand-crafted fine-grained Red-Black implementation
• On a 16–way Sun Fire™ running Solaris™ 10
![Page 30: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/30.jpg)
Uncontended Large Red-Black Tree
5% Delete 5% Update 90% Lookup
[Chart curves: Hand-crafted, TL/PS, TL2/PS, TL/PO, TL2/PO, Ennals, Fraser-Harris Lock-free]
![Page 31: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/31.jpg)
Uncontended Small RB-Tree
5% Delete 5% Update 90% Lookup
[Chart curves labeled: TL/PO, TL2/PO]
![Page 32: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/32.jpg)
Contended Small RB-Tree
30% Delete 30% Update 40% Lookup
[Chart curves labeled: Ennals, TL/PO, TL2/PO]
![Page 33: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/33.jpg)
Speedup: Normalized Throughput
Large RB-Tree 5% Delete 5% Update 90% Lookup
[Chart curves: Hand-Crafted, TL/PO]
![Page 34: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/34.jpg)
Overhead Overhead Overhead
• STM scalability is as good as, if not better than, hand-crafted, but overheads are much higher
• Overhead is the dominant performance factor – bodes well for HTM
• Read set and validation cost (not locking cost) dominates performance
![Page 35: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/35.jpg)
On Sun T200™ (Niagara): maybe a long way to go…
RB-tree 5% Delete 5% Update 90% Lookup
[Chart curves: Hand-crafted, STMs]
![Page 36: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/36.jpg)
Conclusions
• COM time locking, implemented efficiently, has clear advantages over ENC order locking:
– No meltdown under contention
– Working seamlessly with malloc/free
• VCounter can guarantee safety so we
– don’t need to embed repeated validation in user code
![Page 37: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/37.jpg)
What Next?
• Further improve performance
• Make TL1 and TL2 library available
• Mechanical code transformation tool…
• Cut read-set and validation overhead, maybe with hardware support?
• Add hardware VClock to Sys-on-chip.
![Page 38: Transactional Locking](https://reader035.vdocuments.us/reader035/viewer/2022081506/5681498e550346895db6d350/html5/thumbnails/38.jpg)
Thank You