Spin Locks and Contention
Companion slides for Chapter 7 of The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Art of Multiprocessor Programming 2
Focus so far: Correctness and Progress
• Models
  – Accurate (we never lied to you)
  – But idealized (so we forgot to mention a few things)
• Protocols
  – Elegant
  – Important
  – But naïve
New Focus: Performance
• Models
  – More complicated (not the same as complex!)
  – Still focus on principles (not soon obsolete)
• Protocols
  – Elegant (in their fashion)
  – Important (why else would we pay attention)
  – And realistic (your mileage may vary)
Kinds of Architectures
• SISD (Uniprocessor)
  – Single instruction stream
  – Single data stream
• SIMD (Vector)
  – Single instruction
  – Multiple data
• MIMD (Multiprocessors): our space
  – Multiple instruction
  – Multiple data
MIMD Architectures
[Figure: two MIMD organizations, one labeled “Shared Bus” (with memory on the bus) and one labeled “Distributed”]
• Memory Contention
• Communication Contention
• Communication Latency
Today: Revisit Mutual Exclusion
• Think of performance, not just correctness and progress
• Begin to understand how performance depends on our software properly utilizing the multiprocessor machine’s hardware
• And get to know a collection of locking algorithms…
What should you do if you can’t get a lock?
• Keep trying (our focus)
  – “spin” or “busy-wait”
  – Good if delays are short
• Give up the processor
  – Good if delays are long
  – Always good on uniprocessor
Basic Spin-Lock
[Figure: threads pass through a spin lock into the critical section (CS); the lock is reset upon exit]
• …lock introduces sequential bottleneck
• …lock suffers from contention
• Notice: these are distinct phenomena
  – Seq Bottleneck → no parallelism
  – Contention → ???
Review: Test-and-Set
• Boolean value
• Test-and-set (TAS)
  – Swap true with current value
  – Return value tells if prior value was true or false
• Can reset just by writing false
• TAS aka “getAndSet”
Review: Test-and-Set
  // Package java.util.concurrent.atomic
  public class AtomicBoolean {
    boolean value;
    // Swap old and new values
    public synchronized boolean getAndSet(boolean newValue) {
      boolean prior = value;
      value = newValue;
      return prior;
    }
  }
Review: Test-and-Set
  AtomicBoolean lock = new AtomicBoolean(false);
  …
  boolean prior = lock.getAndSet(true);
• Swapping in true is called “test-and-set” or TAS
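To make the return value concrete, here is a minimal sketch using the real `java.util.concurrent.atomic.AtomicBoolean`. The class `GetAndSetDemo` and its helper `testAndSet` are illustrative names, not part of the slides.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative demo class (the name is ours, not from the slides).
class GetAndSetDemo {
    // Performs one test-and-set on the flag and reports the prior value.
    static boolean testAndSet(AtomicBoolean flag) {
        return flag.getAndSet(true);
    }

    public static void main(String[] args) {
        AtomicBoolean lock = new AtomicBoolean(false);
        boolean first = testAndSet(lock);   // prior value was false
        boolean second = testAndSet(lock);  // prior value is now true
        System.out.println(first + " " + second); // prints "false true"
        lock.set(false);                    // reset just by writing false
        System.out.println(testAndSet(lock)); // prints "false"
    }
}
```

A caller that sees `false` knows it was the one that changed the value; a caller that sees `true` knows someone else got there first.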
Test-and-Set Locks
• Locking
  – Lock is free: value is false
  – Lock is taken: value is true
• Acquire lock by calling TAS
  – If result is false, you win
  – If result is true, you lose
• Release lock by writing false
Test-and-set Lock
  class TASlock {
    // Lock state is AtomicBoolean
    AtomicBoolean state = new AtomicBoolean(false);
    void lock() {
      // Keep trying until lock acquired
      while (state.getAndSet(true)) {}
    }
    void unlock() {
      // Release lock by resetting state to false
      state.set(false);
    }
  }
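The lock above can be exercised with a small self-contained program. This is a sketch under stated assumptions: the algorithm is the slide's, but the wrapper class `TASLockDemo`, the method `run`, and the thread and iteration counts are our own choices for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Same algorithm as the slide's TASlock; the demo harness is an illustrative addition.
class TASLockDemo {
    static class TASLock {
        private final AtomicBoolean state = new AtomicBoolean(false);

        void lock() {
            while (state.getAndSet(true)) {} // keep trying until the prior value was false
        }

        void unlock() {
            state.set(false);
        }
    }

    static int counter; // plain field on purpose: it is only touched while holding the lock

    // Runs `threads` threads, each doing `perThread` locked increments; returns the final count.
    static int run(int threads, int perThread) {
        counter = 0;
        TASLock lock = new TASLock();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // no increment is lost: prints 40000
    }
}
```

Because every increment happens between `lock()` and `unlock()`, the unsynchronized `counter++` is safe here; without the lock, updates could be lost.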
Space Complexity
• TAS spin-lock has small “footprint”
• n-thread spin-lock uses O(1) space
• As opposed to O(n) Peterson/Bakery
• How did we overcome the Ω(n) lower bound?
• We used an RMW (read-modify-write) operation…
Performance
• Experiment
  – n threads
  – Increment shared counter 1 million times
• How long should it take?
• How long does it take?
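One way to run this experiment is sketched below. The harness is hypothetical: `CounterExperiment`, `timeRun`, and the choice to pass the lock as a pair of `Runnable`s are our assumptions, and the timings it prints depend entirely on the machine and JVM.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical harness for the counter experiment; names and structure are ours.
class CounterExperiment {
    static long counter; // shared counter, only touched while holding the lock

    // Splits `total` increments over `n` threads; acquire/release wrap each increment.
    // Returns elapsed wall-clock time in nanoseconds.
    static long timeRun(int n, int total, Runnable acquire, Runnable release) {
        counter = 0;
        Thread[] workers = new Thread[n];
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            // Spread the remainder so the shares add up to exactly `total`.
            int share = total / n + (i < total % n ? 1 : 0);
            workers[i] = new Thread(() -> {
                for (int j = 0; j < share; j++) {
                    acquire.run();
                    try { counter++; } finally { release.run(); }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        AtomicBoolean state = new AtomicBoolean(false); // TAS lock state
        for (int n = 1; n <= 8; n *= 2) {
            long ns = timeRun(n, 1_000_000,
                    () -> { while (state.getAndSet(true)) {} }, // TAS acquire
                    () -> state.set(false));                    // release
            System.out.println(n + " threads: " + ns / 1_000_000 + " ms, counter = " + counter);
        }
    }
}
```

The total amount of work is fixed, so the interesting number is how the elapsed time changes as n grows.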
Graph
[Figure: plot of time (y-axis) against threads (x-axis), showing the ideal curve]
• no speedup because of sequential bottleneck
Mystery #1
[Figure: plot of time (y-axis) against threads (x-axis), comparing the TAS lock with the ideal curve]
What is going on?
Test-and-Test-and-Set Locks
• Lurking stage
  – Wait until lock “looks” free
  – Spin while read returns true (lock taken)
• Pouncing stage
  – As soon as lock “looks” available
  – Read returns false (lock free)
  – Call TAS to acquire lock
  – If TAS loses, back to lurking
Test-and-test-and-set Lock
  class TTASlock {
    AtomicBoolean state = new AtomicBoolean(false);
    void lock() {
      while (true) {
        // Wait until lock looks free
        while (state.get()) {}
        // Then try to acquire it
        if (!state.getAndSet(true))
          return;
      }
    }
    // unlock() is not shown on the slide; release works as in TASlock
    void unlock() { state.set(false); }
  }
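A runnable version of the lurk-then-pounce loop is sketched below, with a check that at most one thread is ever inside the critical section. The harness (`TTASLockDemo`, `maxThreadsInside`) and the `unlock()` method are our additions for illustration; only the `lock()` loop comes from the slide.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative harness around the slide's TTAS algorithm; class and method names are ours.
class TTASLockDemo {
    static class TTASLock {
        private final AtomicBoolean state = new AtomicBoolean(false);

        void lock() {
            while (true) {
                while (state.get()) {}          // lurking: spin on reads while the lock looks taken
                if (!state.getAndSet(true)) {   // pouncing: TAS once the lock looks free
                    return;                     // prior value was false, so we won
                }
                // TAS lost the race: go back to lurking
            }
        }

        void unlock() {
            state.set(false);
        }
    }

    // Returns the largest number of threads ever observed inside the critical section.
    static int maxThreadsInside(int threads, int perThread) {
        TTASLock lock = new TTASLock();
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        max.accumulateAndGet(inside.incrementAndGet(), Math::max);
                        inside.decrementAndGet();
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println(maxThreadsInside(4, 10_000)); // mutual exclusion holds: prints 1
    }
}
```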
Mystery #2
[Figure: plot of time (y-axis) against threads (x-axis), comparing the TAS lock, the TTAS lock, and the ideal curve]
Mystery
• Both
  – TAS and TTAS
  – Do the same thing (in our model)
• Except that
  – TTAS performs much better than TAS
  – Neither approaches ideal
Opinion
• Our memory abstraction is broken
• TAS & TTAS methods
  – Are provably the same (in our model)
  – Except they aren’t (in field tests)
• Need a more detailed model …
Bus-Based Architectures
[Figure: three processor caches and a memory connected by a shared bus]
• Random access memory (10s of cycles)
• Shared Bus
  – Broadcast medium
  – One broadcaster at a time
  – Processors and memory all “snoop”
• Per-Processor Caches
  – Small
  – Fast: 1 or 2 cycles
  – Address & state information
Jargon Watch
• Cache hit
  – “I found what I wanted in my cache”
  – Good Thing™
• Cache miss
  – “I had to shlep all the way to memory for that data”
  – Bad Thing™
Cave Canem
• This model is still a simplification
  – But not in any essential way
  – Illustrates basic principles
• Will discuss complexities later
Processor Issues Load Request
[Figure: bus diagram with memory and three caches; one processor puts a load request on the bus: “Gimme data”]
Memory Responds
[Figure: memory answers on the bus, “Got your data right here”, and the data reaches the requesting cache]
Processor Issues Load Request
[Figure: another processor requests the same data over the bus, “Gimme data”; a cache that already holds it answers “I got data”]
Other Processor Responds
[Figure: the cache holding the data supplies it over the bus; both caches now hold copies]
Modify Cached Data
[Figure: one processor modifies the data in its own cache while memory and another cache still hold copies]
What’s up with the other copies?
Cache Coherence
• We have lots of copies of data
  – Original copy in memory
  – Cached copies at processors
• Some processor modifies its own copy
  – What do we do with the others?
  – How to avoid confusion?
Write-Back Caches
• Accumulate changes in cache
• Write back when needed
  – Need the cache for something else
  – Another processor wants it
• On first modification
  – Invalidate other entries
  – Requires non-trivial protocol …
Write-Back Caches
• Cache entry has three states
  – Invalid: contains raw seething bits
  – Valid: I can read but I can’t write
  – Dirty: Data has been modified
• Intercept other load requests
• Write back to memory before using cache
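The three states can be played out in a toy model of a single memory word cached by several processors. This is a simplified sketch, not a real coherence protocol: the class `WriteBackModel` is hypothetical, every miss or invalidation counts as exactly one bus transaction, and we assume a dirty owner writes back to memory when it intercepts another processor's request.

```java
import java.util.Arrays;

// Toy model of one memory word cached by several processors; all names are ours.
class WriteBackModel {
    enum State { INVALID, VALID, DIRTY }

    final State[] state;  // per-cache state of the single line
    final int[] cached;   // per-cache copy of the value
    int memory;           // the copy in main memory
    int busTransactions;  // how often the shared bus was used

    WriteBackModel(int caches, int initial) {
        state = new State[caches];
        cached = new int[caches];
        Arrays.fill(state, State.INVALID);
        memory = initial;
    }

    int read(int cpu) {
        if (state[cpu] != State.INVALID) {
            return cached[cpu];        // cache hit: no bus use
        }
        busTransactions++;             // cache miss: load request on the bus
        flushDirtyOwner();             // a dirty owner intercepts and writes back first
        cached[cpu] = memory;
        state[cpu] = State.VALID;
        return cached[cpu];
    }

    void write(int cpu, int value) {
        if (state[cpu] != State.DIRTY) {
            busTransactions++;         // first modification: invalidate message on the bus
            flushDirtyOwner();
            Arrays.fill(state, State.INVALID); // other caches lose read permission
            state[cpu] = State.DIRTY;          // this cache acquires write permission
        }
        cached[cpu] = value;           // accumulate the change in the cache; memory is untouched
    }

    private void flushDirtyOwner() {
        for (int i = 0; i < state.length; i++) {
            if (state[i] == State.DIRTY) {
                memory = cached[i];    // write back to memory
                state[i] = State.VALID;
            }
        }
    }

    public static void main(String[] args) {
        WriteBackModel m = new WriteBackModel(3, 7);
        m.read(0);        // miss
        m.read(1);        // miss
        m.read(0);        // hit
        m.write(0, 9);    // invalidate: cache 1 loses its copy
        m.write(0, 10);   // hit on a dirty line, no bus use
        System.out.println(m.read(1) + " " + m.memory + " " + m.busTransactions); // prints "10 10 4"
    }
}
```

After the last read, both caches are valid and memory is up to date, so reading is cheap again until the next write.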
Invalidate
[Figure: the processor that modified its cached data broadcasts an invalidate message on the bus (“Mine, all mine!”); the other cache holding a copy reacts (“Uh, oh”) and drops it]
• Other caches lose read permission
• This cache acquires write permission
• Memory provides data only if not present in any cache, so no need to change it now (expensive)
Another Processor Asks for Data
[Figure: a processor without a valid copy requests the data over the bus]
Owner Responds
[Figure: the cache that owns the modified data answers “Here it is!” and supplies it over the bus]
End of the Day …
[Figure: both caches now hold copies of the data]
• Reading OK, no writing
Mutual Exclusion
• What do we want to optimize? – Bus bandwidth used by spinning threads – Release/Acquire latency – Acquire latency for idle lock

Simple TASLock
• TAS invalidates cache lines
• Spinners
  – Miss in cache
  – Go to bus
• Thread wants to release lock
  – Delayed behind spinners
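
A minimal sketch of the lock these bullets describe, assuming `java.util.concurrent.atomic.AtomicBoolean` as the test-and-set primitive. The wrapper class `TASLockDemo` and the `run` harness are hypothetical names added only to make the example runnable.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TASLockDemo {
    static class TASLock {
        private final AtomicBoolean state = new AtomicBoolean(false);

        void lock() {
            // Every attempt is an atomic write, even while the lock is held,
            // so each spin invalidates the other caches' copies of the line.
            while (state.getAndSet(true)) { }
        }

        void unlock() {
            state.set(false);
        }
    }

    // Hypothetical harness: nThreads threads each increment a shared counter
    // perThread times under the lock. Returns the final count.
    static int run(int nThreads, int perThread) {
        final TASLock lock = new TASLock();
        final int[] counter = {0};
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("final count: " + run(4, 10_000));
    }
}
```

If mutual exclusion holds, the final count equals `nThreads * perThread`. The lock is correct; the cost the slide points at is the bus traffic generated by the spinning `getAndSet` calls.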

Test-and-test-and-set
• Wait until lock “looks” free
  – Spin on local cache
  – No bus use while lock busy
• Problem: when lock is released
  – Invalidation storm …
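
The two stages above can be sketched as follows, again assuming `AtomicBoolean` as the underlying primitive. `TTASLockDemo` and its `run` harness are hypothetical names used only for this illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TTASLockDemo {
    static class TTASLock {
        private final AtomicBoolean state = new AtomicBoolean(false);

        void lock() {
            while (true) {
                // Stage 1: read-only spin. While the lock is busy these reads
                // hit the local cache and generate no bus traffic.
                while (state.get()) { }
                // Stage 2: the lock looks free, so try the expensive atomic write.
                if (!state.getAndSet(true)) {
                    return;
                }
            }
        }

        void unlock() {
            // This write invalidates every spinner's cached copy at once.
            state.set(false);
        }
    }

    // Hypothetical harness: returns the shared count after all threads finish.
    static int run(int nThreads, int perThread) {
        final TTASLock lock = new TTASLock();
        final int[] counter = {0};
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("final count: " + run(4, 10_000));
    }
}
```

The only difference from a plain test-and-set lock is the inner read-only loop, which keeps waiting threads off the bus until a release invalidates their copies.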

Local Spinning while Lock is Busy
[Diagram: caches and memory on a shared bus; each cache and memory hold "busy"]

On Release
[Diagram: one cache and memory show "free"; the other caches show "invalid"]

On Release
[Diagram: one cache and memory show "free"; the other caches show "invalid" and miss]
Everyone misses, rereads

On Release
[Diagram: one cache and memory show "free"; the other caches show "invalid" and call TAS(…)]
Everyone tries TAS

Problems
• Everyone misses
  – Reads satisfied sequentially
• Everyone does TAS
  – Invalidates others’ caches
• Eventually quiesces after lock acquired
  – How long does this take?

Mystery Explained
[Graph: time versus number of threads for TAS lock, TTAS lock, and Ideal]
TTAS lock: better than TAS but still not as good as ideal

Solution: Introduce Delay
[Timeline: spin on the lock, then delays of d, r1d, r2d]
• If the lock looks free
• But I fail to get it
• There must be lots of contention
• Better to back off than to collide again

Dynamic Example: Exponential Backoff
[Timeline: spin on the lock, then delays of d, 2d, 4d]
If I fail to get lock
– Wait random duration before retry
– Each subsequent failure doubles expected wait
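
The doubling rule can be isolated from the lock itself. The sketch below is an illustration only: the class `BackoffSketch`, its method names, and the unit-less integer delays are assumptions, not part of the slides. It draws each delay uniformly from `[0, limit)` and then doubles `limit` up to a cap, so the expected wait doubles with each failure until the cap is reached.

```java
import java.util.concurrent.ThreadLocalRandom;

class BackoffSketch {
    private final int maxDelay;
    private int limit; // current upper bound (exclusive) on the random delay

    BackoffSketch(int minDelay, int maxDelay) {
        if (minDelay < 1 || maxDelay < minDelay) {
            throw new IllegalArgumentException("need 1 <= minDelay <= maxDelay");
        }
        this.limit = minDelay;
        this.maxDelay = maxDelay;
    }

    int currentLimit() {
        return limit;
    }

    // Called after a failed acquisition: picks a random delay below the current
    // limit, then doubles the limit, capped at maxDelay (written to avoid overflow).
    int nextDelay() {
        int delay = ThreadLocalRandom.current().nextInt(limit);
        limit = (limit > maxDelay / 2) ? maxDelay : 2 * limit;
        return delay;
    }

    public static void main(String[] args) {
        BackoffSketch backoff = new BackoffSketch(1, 8);
        for (int failure = 1; failure <= 5; failure++) {
            int bound = backoff.currentLimit();
            System.out.println("failure " + failure + ": delay " + backoff.nextDelay()
                    + " drawn from [0, " + bound + ")");
        }
    }
}
```

With a minimum of 1 and a maximum of 8, the successive bounds are 1, 2, 4, 8, 8. The randomization keeps threads that failed together from retrying together.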

Exponential Backoff Lock
  // sleep() and random() are helpers assumed to be defined elsewhere
  public class Backoff implements Lock {
    private final AtomicBoolean state = new AtomicBoolean(false);
    public void lock() {
      int delay = MIN_DELAY;            // fix minimum delay
      while (true) {
        while (state.get()) {}          // wait until lock looks free
        if (!state.getAndSet(true))     // if we win, return
          return;
        sleep(random() % delay);        // back off for random duration
        if (delay < MAX_DELAY)          // double max delay, within reason
          delay = 2 * delay;
      }
    }
  }

Spin-Waiting Overhead
[Graph: time versus number of threads for TTAS lock and Backoff lock]

Backoff: Other Issues
• Good
  – Easy to implement
  – Beats TTAS lock
• Bad
  – Must choose parameters carefully
  – Not portable across platforms
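
One way to explore the parameter sensitivity noted above is to make the delays constructor arguments and time the same workload under different settings. The following is a sketch under stated assumptions: `TunableBackoffDemo`, `BackoffLock`, and `timeRun` are hypothetical names, delays are expressed in nanoseconds, and `LockSupport.parkNanos` stands in for the slides' `sleep`. Timings depend heavily on the machine, so no particular numbers should be expected.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class TunableBackoffDemo {
    static class BackoffLock {
        private final AtomicBoolean state = new AtomicBoolean(false);
        private final int minDelayNanos;
        private final int maxDelayNanos;

        BackoffLock(int minDelayNanos, int maxDelayNanos) {
            if (minDelayNanos < 1 || maxDelayNanos < minDelayNanos) {
                throw new IllegalArgumentException("need 1 <= min <= max");
            }
            this.minDelayNanos = minDelayNanos;
            this.maxDelayNanos = maxDelayNanos;
        }

        void lock() {
            int limit = minDelayNanos;
            while (true) {
                while (state.get()) { }              // wait until lock looks free
                if (!state.getAndSet(true)) return;  // if we win, return
                // Lost the race: back off for a random duration below the limit.
                LockSupport.parkNanos(ThreadLocalRandom.current().nextInt(limit));
                // Double the limit, capped at the maximum.
                limit = (limit > maxDelayNanos / 2) ? maxDelayNanos : 2 * limit;
            }
        }

        void unlock() {
            state.set(false);
        }
    }

    // Hypothetical harness: returns {final count, elapsed nanoseconds}.
    static long[] timeRun(int minNanos, int maxNanos, int nThreads, int perThread) {
        final BackoffLock lock = new BackoffLock(minNanos, maxNanos);
        final long[] counter = {0};
        Thread[] threads = new Thread[nThreads];
        long start = System.nanoTime();
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] {counter[0], System.nanoTime() - start};
    }

    public static void main(String[] args) {
        int[][] settings = {{100, 1_000}, {1_000, 100_000}, {10_000, 1_000_000}};
        for (int[] s : settings) {
            long[] r = timeRun(s[0], s[1], 4, 5_000);
            System.out.println("min=" + s[0] + "ns max=" + s[1] + "ns count=" + r[0]
                    + " elapsed=" + (r[1] / 1_000_000) + "ms");
        }
    }
}
```

Running the same settings on machines with different core counts or cache hierarchies is a simple way to see why tuned values may not carry over from one platform to another.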