Multiprocessor Cache Consistency
(or, what does volatile mean?)
Andrew Whitaker
CSE451
What Does This Program Print?

    public class VisiblityExample extends Thread {
        private static int x = 1;
        private static int y = 1;
        private static boolean ready = false;

        public static void main(String[] args) {
            Thread t = new VisiblityExample();
            t.start();

            x = 2;
            y = 2;
            ready = true;
        }

        public void run() {
            while (!ready)
                Thread.yield(); // give up the processor
            System.out.println("x= " + x + " y= " + y);
        }
    }
Answer
- It’s a race condition. Many different outputs are possible:
  - x=2, y=2
  - x=1, y=2
  - x=2, y=1
  - x=1, y=1
- Or, the program may print nothing!
  - The ready loop runs forever
What’s Going on Here?
Processor caches ($) can get out-of-sync
[Figure: four CPUs, each with its own cache ($), all attached to a shared Memory]
A Mental Model

    // Not real code; for illustration purposes only
    public class Example extends Thread {
        private static final int NUM_PROCESSORS = 4;

        private static int[] x = new int[NUM_PROCESSORS];
        private static int[] y = new int[NUM_PROCESSORS];
        private static boolean[] ready = new boolean[NUM_PROCESSORS];
        // …
    }

- Every thread/processor has its own copy of every variable
- Yikes!
Two Issues
- Cache coherence: do caches eventually converge on the same state?
  - All modern caches are coherent
- Cache consistency: when are operations by one processor visible on other processors?
  - Sometimes called “publication”
  - How much re-ordering is possible across processors?
Subjective View of Cache Consistency Strategies
[Figure: chart relating "Amount of reordering" to "Fast and scalable", with the "Strict" and "Relaxed" models marked]
Factors Pushing Towards Relaxed Consistency Models
- Hardware perspective: consistency operations are expensive
  - Writing processor must invalidate the copies cached by all other processors
  - Reading processor must re-validate its cached state
- Compiler perspective: optimizations frequently re-arrange memory operations to hide latency
  - These are guaranteed to be transparent, but only on a single processor
Caches 101
- Caches store blocks of main memory
  - Blocks are fairly small (perhaps 64 bytes)
- Each cache block exists in one of three states
  - Invalid, shared, exclusive
- Memory operations cause the cache block to change states
- CPUs must communicate to implement cache block state changes
Cache Block State During a Coherence Operation
[Figure: state diagram over Invalid, Shared (read-only), and Exclusive (read-write), with transitions labeled "Writing processor" and "Reading processors"]
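The state changes can be sketched as a toy model of a single cache block shared by several CPUs. This is only an illustration under simplifying assumptions: the class `CacheBlockModel`, its methods, and the exact transition rules (a write invalidates every other copy, a read downgrades an exclusive holder to shared) are hypothetical and are not taken from the slides or from any particular hardware protocol.

```java
import java.util.Arrays;

// Toy model of one cache block as seen by several CPUs (illustrative only).
class CacheBlockModel {
    enum State { INVALID, SHARED, EXCLUSIVE }

    private final State[] states;

    CacheBlockModel(int numCpus) {
        states = new State[numCpus];
        Arrays.fill(states, State.INVALID);
    }

    // A read leaves the reader with a readable copy. If another CPU held the
    // block exclusively, that copy is downgraded to shared (read-only).
    void read(int cpu) {
        if (states[cpu] != State.INVALID) {
            return; // already readable locally, no communication needed
        }
        for (int i = 0; i < states.length; i++) {
            if (states[i] == State.EXCLUSIVE) {
                states[i] = State.SHARED;
            }
        }
        states[cpu] = State.SHARED;
    }

    // A write gives the writer the only valid copy; every other copy is invalidated.
    void write(int cpu) {
        for (int i = 0; i < states.length; i++) {
            states[i] = (i == cpu) ? State.EXCLUSIVE : State.INVALID;
        }
    }

    State stateOf(int cpu) {
        return states[cpu];
    }

    public static void main(String[] args) {
        CacheBlockModel block = new CacheBlockModel(4);
        block.read(0);
        block.read(1);
        block.write(2);
        block.read(3);
        System.out.println(Arrays.toString(block.states));
        // prints [INVALID, INVALID, SHARED, SHARED]
    }
}
```

Every transition that touches another CPU's entry stands for a message between processors, which is why these operations are costly.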
Some Terminology
- Publication: A CPU announces its updates to some or all of cache memory
- Fetch: A CPU loads the latest values for previously published updates
Hardware Support: Memory Fences (Barriers)
- No memory operation can be moved across a fence
  - No operation after the fence appears before the fence
  - No operation before the fence appears after the fence
- Several variants:
  - Write fences (for publication)
  - Read fences (for fetch)
  - Read/write (total) fences
Sequential Consistency
- All writes are immediately published
- All reads fetch the latest value
- All processors agree on the order of memory accesses
- Every operation is a fence
- Behaves like shuffling cards
Sequential Consistency Example
    Processor 1      Processor 2
    A. x = 2;        C. x = 4;
    B. y = 3;        D. y = 5;

- A always appears before B
- C always appears before D

A subset of legal orderings:

    A. x = 2;  B. y = 3;  C. x = 4;  D. y = 5;
    C. x = 4;  D. y = 5;  A. x = 2;  B. y = 3;
    C. x = 4;  A. x = 2;  D. y = 5;  B. y = 3;
    A. x = 2;  C. x = 4;  D. y = 5;  B. y = 3;
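The "shuffling cards" picture can be made concrete by enumerating every interleaving of the two instruction streams that keeps each processor's own order. The following is a minimal sketch; the class `Interleavings` and its methods are illustrative names, not part of the slides. For two operations per processor it yields six orderings, and the four listed on the slide are among them.

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates every interleaving of two instruction streams that preserves
// each stream's own program order.
class Interleavings {
    static List<String> interleave(String[] p1, String[] p2) {
        List<String> out = new ArrayList<>();
        build(p1, 0, p2, 0, "", out);
        return out;
    }

    // At each step, take the next operation from either processor.
    private static void build(String[] p1, int i, String[] p2, int j,
                              String prefix, List<String> out) {
        if (i == p1.length && j == p2.length) {
            out.add(prefix.trim());
            return;
        }
        if (i < p1.length) {
            build(p1, i + 1, p2, j, prefix + " " + p1[i], out);
        }
        if (j < p2.length) {
            build(p1, i, p2, j + 1, prefix + " " + p2[j], out);
        }
    }

    public static void main(String[] args) {
        String[] processor1 = {"A", "B"}; // A. x = 2;  B. y = 3;
        String[] processor2 = {"C", "D"}; // C. x = 4;  D. y = 5;
        for (String order : interleave(processor1, processor2)) {
            System.out.println(order);
        }
    }
}
```

An ordering such as B A C D never appears, because it would place B before A.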
The Cost of Sequential Consistency
- Every write requires a complete cache invalidation
  - Writing processor acquires exclusive access
  - Writing processor sends an invalidation message
  - Writing processor receives acknowledgements from all processors
- Expensive!
Relaxed Consistency Models
- Updates are published lazily
  - Therefore, updates may appear out-of-order
- Challenge: Exposing a programming model that a human can understand
Release Consistency
- Observation: concurrent programs usually use proper synchronization
  - “All shared, mutable state must be properly synchronized”
- It suffices to sync-up memory during synchronized operations
- Big performance win: the number of cache coherency operations scales with synchronization, not the number of loads and stores
Simple Example

    synchronized (this) {   // Fetch current values
        x++;
        y++;
    }                       // Publish new values

- Within the critical section, updates can be re-ordered
- Without publication, updates may never be visible
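To see the fetch-on-entry, publish-on-exit pattern in a complete program, here is a minimal sketch in which two threads update a pair of counters under the same lock. The class `SyncCounter` and its methods `increment`, `snapshot`, and `runDemo` are hypothetical names introduced for illustration.

```java
// Two counters that are only ever touched while holding the object's lock.
class SyncCounter {
    private int x = 0;
    private int y = 0;

    void increment() {
        synchronized (this) { // lock acquire: fetch current values
            x++;
            y++;
        }                     // lock release: publish new values
    }

    // Reads also take the lock so that they fetch the published values.
    synchronized int[] snapshot() {
        return new int[] { x, y };
    }

    // Runs two threads that each call increment() perThread times.
    static int[] runDemo(int perThread) {
        SyncCounter counter = new SyncCounter();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.snapshot();
    }

    public static void main(String[] args) {
        int[] result = runDemo(100_000);
        System.out.println("x=" + result[0] + " y=" + result[1]);
        // prints x=200000 y=200000
    }
}
```

Memory is synchronized once per call to `increment`, not once per load or store inside it, which is the performance argument behind release consistency.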
Java Volatile Variables
- Java synchronized does double-duty
  - It provides mutual exclusion, atomicity
  - It ensures safe publication of updates
- Sometimes, we don’t want to pay the cost of mutual exclusion
- Volatile variables provide safe publication without mutual exclusion

    volatile int x = 7;
More on Volatile
- Updates to volatile fields are propagated immediately
  - “Don’t cache me!”
  - Effectively, this activates sequential consistency
- Volatile serves as a fence to the compiler and hardware
  - Memory operations are not re-ordered around a volatile
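Applied to the opening program, one possible fix is to declare only the `ready` flag volatile. The sketch below is a variant of that program rather than the author's own solution: the class name `VolatileVisibilityExample` and the helper `runOnce` are illustrative, and the reader thread stores its output in an array so that it can be inspected after the thread finishes.

```java
// Variant of the opening example with a volatile flag (names are illustrative).
class VolatileVisibilityExample {
    private static int x = 1;
    private static int y = 1;
    private static volatile boolean ready = false; // the one change that matters

    // Runs the experiment once and returns the line built by the reader thread.
    static String runOnce() {
        x = 1;
        y = 1;
        ready = false;
        String[] seen = new String[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                Thread.yield(); // give up the processor
            }
            // Having read ready == true, the reader also sees the earlier writes.
            seen[0] = "x= " + x + " y= " + y;
        });
        reader.start();

        x = 2;
        y = 2;
        ready = true; // volatile write: publishes x and y along with ready

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints "x= 2 y= 2"
    }
}
```

The fields `x` and `y` stay non-volatile. They are written before the volatile write to `ready` and read after the volatile read of it, so the reader cannot observe the stale value 1, and the loop cannot spin forever on a stale `ready`.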
Rule #1, Revised
- All shared, mutable state must be properly synchronized
  - With a synchronized statement, an Atomic variable, or with volatile
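Of the three options, the Atomic variable is the one the earlier slides do not show. A minimal sketch using `java.util.concurrent.atomic.AtomicInteger` follows; the class `AtomicCounterDemo` and the method `countWithTwoThreads` are hypothetical names for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Shared counter kept in an AtomicInteger instead of a lock-protected int.
class AtomicCounterDemo {
    // Two threads each increment the shared counter perThread times.
    static int countWithTwoThreads(int perThread) {
        AtomicInteger hits = new AtomicInteger();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                hits.incrementAndGet(); // atomic read-modify-write, safely published
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithTwoThreads(100_000)); // prints 200000
    }
}
```

A plain `volatile int` would not be enough here: volatile makes each read and write visible, but `hits++` on it is still a separate read and write, so increments could be lost.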
Example: Lazy Initialization

    import java.util.LinkedList;
    import java.util.List;

    class Example {
        static List list = null;

        // Unsynchronized check-then-initialize
        public static List getList() {
            if (list == null) {
                list = new LinkedList();
            }
            return list;
        }
    }

- Need synchronization to ensure publication
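One way to supply the missing synchronization is to make the accessor a synchronized method. This is a sketch of one option, not necessarily the fix the author had in mind; the class name `LazyList` and the `String` element type are assumptions made so that the example compiles without raw-type warnings.

```java
import java.util.LinkedList;
import java.util.List;

// Lazy initialization guarded by the class lock.
class LazyList {
    private static List<String> list = null;

    // synchronized: at most one thread runs the initialization, and every
    // caller fetches the published reference when it acquires the lock.
    static synchronized List<String> getList() {
        if (list == null) {
            list = new LinkedList<>();
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(getList() == getList()); // prints true
    }
}
```

Every call now pays for the lock, even after the list exists. That is the cost of mutual exclusion mentioned earlier, traded here for simplicity.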