Posted on 02-Jan-2016
Multiprocessor Cache Consistency
(or, what does volatile mean?)
Andrew Whitaker
CSE451
What Does This Program Print?
public class VisibilityExample extends Thread {
    private static int x = 1;
    private static int y = 1;
    private static boolean ready = false;

    public static void main(String[] args) {
        Thread t = new VisibilityExample();
        t.start();

        x = 2;
        y = 2;
        ready = true;
    }

    public void run() {
        while (!ready)
            Thread.yield(); // give up the processor
        System.out.println("x= " + x + " y= " + y);
    }
}
Answer
It’s a race condition. Many different outputs are possible:
x=2, y=2
x=1, y=2
x=2, y=1
x=1, y=1
Or, the program may print nothing! The ready loop may spin forever, because the reader thread is never guaranteed to see ready = true.
What’s Going on Here?
Processor caches ($) can get out of sync
[Figure: four CPUs, each with its own cache ($), connected to a shared main memory]
// Not real code; for illustration purposes only
public class Example extends Thread {
    private static final int NUM_PROCESSORS = 4;
    // One copy of every variable per processor
    private static int[] x = new int[NUM_PROCESSORS];
    private static int[] y = new int[NUM_PROCESSORS];
    private static boolean[] ready = new boolean[NUM_PROCESSORS];
    // …
}
A Mental Model
Every thread/processor has its own copy of every variable. Yikes!
Two Issues
Cache coherence: do caches eventually converge on the same state?
  All modern caches are coherent
Cache consistency: when are operations by one processor visible on other processors? (Sometimes called “publication”)
  How much re-ordering is possible across processors?
Subjective View of Cache Consistency Strategies
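[Figure: consistency strategies arranged by amount of reordering, from Strict (least) to Relaxed (most); the relaxed end is labeled “fast and scalable”]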
Factors Pushing Towards Relaxed Consistency Models
Hardware perspective: consistency operations are expensive
  The writing processor must invalidate the copies held in all other processors’ caches
  A reading processor must re-validate its cached state
Compiler perspective: optimizations frequently re-arrange memory operations to hide latency
  These are guaranteed to be transparent, but only on a single processor
Caches 101
Caches store blocks of main memory
  Blocks are fairly small (perhaps 64 bytes)
Each cache block exists in one of three states: Invalid, Shared, Exclusive
Memory operations cause the cache block to change states
CPUs must communicate to implement cache block state changes
Cache Block State During a Coherence Operation
[Figure: cache block states Invalid, Shared (read-only), and Exclusive (read-write), shown for the writing processor and the reading processors]
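The state changes above can be sketched as a small state machine for one CPU’s copy of a block. This is a simplified sketch, assuming an MSI-style protocol with just the slide’s three states; the class CacheBlockStates and the event names (LOCAL_READ, REMOTE_WRITE, and so on) are hypothetical, and real protocols such as MESI add more states.

```java
public class CacheBlockStates {
    // The three states from the slide (hypothetical enum for illustration).
    enum State { INVALID, SHARED, EXCLUSIVE }

    // Hypothetical events: operations by this CPU (LOCAL_*) or by another CPU (REMOTE_*).
    enum Event { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE }

    // Next state of this CPU's copy of a block after an event.
    static State next(State s, Event e) {
        switch (e) {
            case LOCAL_READ:
                // A miss fetches a read-only copy; valid copies are unchanged.
                return s == State.INVALID ? State.SHARED : s;
            case LOCAL_WRITE:
                // Writing requires exclusive access; other copies get invalidated.
                return State.EXCLUSIVE;
            case REMOTE_READ:
                // Another CPU reads: an exclusive copy is downgraded to shared.
                return s == State.EXCLUSIVE ? State.SHARED : s;
            case REMOTE_WRITE:
                // Another CPU writes: this copy is no longer valid.
                return State.INVALID;
            default:
                throw new IllegalArgumentException("unknown event: " + e);
        }
    }

    public static void main(String[] args) {
        State s = State.INVALID;
        Event[] trace = {Event.LOCAL_READ, Event.LOCAL_WRITE, Event.REMOTE_READ, Event.REMOTE_WRITE};
        for (Event e : trace) {
            s = next(s, e);
            System.out.println(e + " -> " + s);
        }
    }
}
```

A single write shows up as LOCAL_WRITE on the writing CPU and as REMOTE_WRITE on every other CPU holding the block; that fan-out is the communication the slides describe.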
Some Terminology
Publication: A CPU announces its updates to some or all of cache memory
Fetch: A CPU loads the latest values for previously published updates
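Publication and fetch can be made concrete with a single-threaded simulation of the “every processor has its own copy of every variable” mental model. This is a hypothetical sketch: the class PublishFetchModel, the Cpu type, and its methods are invented for illustration and do not model real hardware timing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PublishFetchModel {
    // Hypothetical simulated CPU holding private copies of the variables it touches.
    static class Cpu {
        private final Map<String, Integer> memory;                  // shared "main memory"
        private final Map<String, Integer> cache = new HashMap<>(); // this CPU's copies

        Cpu(Map<String, Integer> memory) { this.memory = memory; }

        // Writes update only the local copy.
        void write(String var, int value) { cache.put(var, value); }

        // Reads use the local copy; a miss loads the value from memory.
        int read(String var) { return cache.computeIfAbsent(var, v -> memory.getOrDefault(v, 0)); }

        // Publication: announce all local updates to memory.
        void publish() { memory.putAll(cache); }

        // Fetch: drop local copies so later reads load the latest published values.
        // (Simplification: unpublished local writes are discarded.)
        void fetch() { cache.clear(); }
    }

    static List<Integer> demo() {
        Map<String, Integer> memory = new HashMap<>();
        memory.put("x", 1);
        Cpu cpu0 = new Cpu(memory);
        Cpu cpu1 = new Cpu(memory);
        List<Integer> seen = new ArrayList<>();
        seen.add(cpu1.read("x")); // cpu1 caches x = 1
        cpu0.write("x", 2);       // only cpu0's copy changes
        seen.add(cpu1.read("x")); // still 1: nothing published yet
        cpu0.publish();
        seen.add(cpu1.read("x")); // still 1: published, but cpu1 has not fetched
        cpu1.fetch();
        seen.add(cpu1.read("x")); // now 2
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [1, 1, 1, 2]
    }
}
```

The reader sees the new value only when the writer has published and the reader has fetched; either step alone is not enough.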
Hardware Support: Memory Fences (Barriers)
No memory operation can be moved across a fence
  No operation after the fence appears before the fence
  No operation before the fence appears after the fence
Several variants:
  Write fences (for publication)
  Read fences (for fetch)
  Read/write (total) fences
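In Java 9 and later, the three variants roughly correspond to static methods on java.lang.invoke.VarHandle: storeStoreFence() (write fence), loadLoadFence() (read fence), and fullFence() (total fence). The sketch below is a minimal message-passing example, assuming a flag that is read and written with opaque VarHandle accesses so the spin loop cannot be hoisted; the class FenceExample and its method are hypothetical. Fences are a low-level tool, and ordinary Java code usually relies on volatile or synchronized instead.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class FenceExample {
    private static int data = 0;          // plain field: the payload being published
    private static boolean ready = false; // flag; concurrent accesses go through READY
    private static final VarHandle READY;

    static {
        try {
            READY = MethodHandles.lookup()
                    .findStaticVarHandle(FenceExample.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int publishAndRead() {
        data = 0;      // reset; Thread.start() below makes these writes visible to the reader
        ready = false;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            // Opaque reads are re-executed on every iteration, so the spin terminates.
            while (!(boolean) READY.getOpaque()) {
                Thread.onSpinWait();
            }
            VarHandle.loadLoadFence(); // read fence (fetch): the load of data stays after the flag load
            seen[0] = data;
        });
        reader.start();

        data = 42;
        VarHandle.storeStoreFence();   // write fence (publication): the store to data stays before the flag store
        READY.setOpaque(true);

        try {
            reader.join();             // join() also makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("reader saw data = " + publishAndRead());
    }
}
```

The write fence sits on the publishing side and the read fence on the fetching side; without either one, the reader could observe ready = true and still load the stale data = 0.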
Sequential Consistency
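All writes are immediately published
All reads fetch the latest value
All processors agree on the order of memory accesses
Every operation is a fence
Behaves like shuffling cards: each processor’s operations keep their program order, but the processors’ streams are interleaved arbitrarily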
Sequential Consistency Example
Processor 1: A. x = 2; B. y = 3;
Processor 2: C. x = 4; D. y = 5;
A always appears before B; C always appears before D.
A subset of legal orderings:
A. x = 2; B. y = 3; C. x = 4; D. y = 5;
C. x = 4; D. y = 5; A. x = 2; B. y = 3;
C. x = 4; A. x = 2; D. y = 5; B. y = 3;
A. x = 2; C. x = 4; D. y = 5; B. y = 3;
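The full set of legal orderings can be enumerated mechanically: each one is a merge of the two processors’ programs that keeps A before B and C before D. A short recursive sketch (the class Interleavings and its method are hypothetical names) lists all of them; for two programs of two operations each there are C(4,2) = 6.

```java
import java.util.ArrayList;
import java.util.List;

public class Interleavings {
    // Every merge of p1 and p2 that preserves each processor's program order.
    static List<String> interleave(String p1, String p2) {
        List<String> out = new ArrayList<>();
        if (p1.isEmpty()) { out.add(p2); return out; }
        if (p2.isEmpty()) { out.add(p1); return out; }
        // Either processor 1's next operation goes first...
        for (String rest : interleave(p1.substring(1), p2)) out.add(p1.charAt(0) + rest);
        // ...or processor 2's next operation goes first.
        for (String rest : interleave(p1, p2.substring(1))) out.add(p2.charAt(0) + rest);
        return out;
    }

    public static void main(String[] args) {
        List<String> orders = interleave("AB", "CD");
        // 6 legal orderings: [ABCD, ACBD, ACDB, CABD, CADB, CDAB]
        System.out.println(orders.size() + " legal orderings: " + orders);
    }
}
```

The four orderings listed above (ABCD, CDAB, CADB, ACDB) all appear; the remaining two are ACBD and CABD. An ordering such as BACD never appears, because B may not precede A.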
The Cost of Sequential Consistency
Every write requires a complete invalidation of the other cached copies:
  The writing processor acquires exclusive access
  The writing processor sends an invalidation message
  The writing processor receives acknowledgements from all processors
Expensive!
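A rough count shows why. Assume N processors that all hold the block and point-to-point messages (a broadcast bus or a directory changes the constants, not the trend). Each write then costs one invalidation per other processor plus one acknowledgement back:

```latex
\text{messages per write} = \underbrace{(N-1)}_{\text{invalidations}} + \underbrace{(N-1)}_{\text{acknowledgements}} = 2(N-1),
\qquad N = 4 \;\Rightarrow\; 2(4-1) = 6
```

For W writes the total is 2W(N-1), so under this assumption the cost grows with both the number of processors and the number of stores.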
Relaxed Consistency Models
Updates are published lazily; therefore, updates may appear out of order
Challenge: Exposing a programming model that a human can understand
Release Consistency
Observation: concurrent programs usually use proper synchronization (“All shared, mutable state must be properly synchronized”)
It suffices to sync up memory during synchronized operations
Big performance win: the number of cache coherency operations scales with synchronization, not the number of loads and stores
Simple Example
synchronized (this) {   // on entry: fetch current values
    x++;
    y++;
}                       // on exit: publish new values
Within the critical section, updates can be re-ordered
Without publication, updates may never be visible
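A runnable version of the synchronized block, assuming two threads that each run it a fixed number of times; the class SyncCounter and its methods are hypothetical. Every access to x and y happens while holding the same lock, so the final counts are deterministic even though the two increments may be reordered inside the critical section.

```java
public class SyncCounter {
    private int x = 0;
    private int y = 0;

    void incrementBoth() {
        synchronized (this) { // acquire: fetch current values of x and y
            x++;              // these two updates may be reordered here, but no other
            y++;              // thread reads x or y without holding this same lock
        }                     // release: publish the new values
    }

    synchronized int[] snapshot() {
        return new int[] {x, y};
    }

    static int[] run(int perThread) {
        SyncCounter c = new SyncCounter();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                c.incrementBoth();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c.snapshot();
    }

    public static void main(String[] args) {
        int[] r = run(100_000);
        System.out.println("x=" + r[0] + " y=" + r[1]); // x=200000 y=200000
    }
}
```

Coherence work happens only at lock acquire and release, not on each x++ or y++, which is the performance argument for release consistency.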
Java Volatile Variables
Java synchronized does double duty:
  It provides mutual exclusion (atomicity)
  It ensures safe publication of updates
Sometimes, we don’t want to pay the cost of mutual exclusion
Volatile variables provide safe publication without mutual exclusion
volatile int x = 7;
More on Volatile
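Updates to volatile fields are published immediately: a read of a volatile field sees the latest write to it “Don’t cache me!”
  Effectively, accesses to volatile variables are sequentially consistent
Volatile serves as a fence to the compiler and hardware
  Memory operations are not re-ordered around a volatile access in ways that break publication: earlier writes cannot move after a volatile write, and later reads cannot move before a volatile read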
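Applied to the program from the opening slide (“What Does This Program Print?”), a sketch: declaring only ready as volatile is enough. The class name VolatileVisibilityExample is hypothetical, and the join() in main is an addition so the program’s output is complete when main returns. The volatile write to ready comes after x = 2 and y = 2, so a reader whose volatile read sees ready = true also sees both updates and prints x= 2 y= 2.

```java
public class VolatileVisibilityExample extends Thread {
    private static int x = 1;
    private static int y = 1;
    // Key change: ready is volatile. Its write publishes the earlier writes to
    // x and y, and the loop below must re-read it on every iteration.
    private static volatile boolean ready = false;

    @Override
    public void run() {
        while (!ready)
            Thread.yield(); // give up the processor
        System.out.println("x= " + x + " y= " + y);
    }

    public static void main(String[] args) {
        Thread t = new VolatileVisibilityExample();
        t.start();

        x = 2;
        y = 2;
        ready = true; // volatile write: x and y are published before ready becomes true

        try {
            t.join(); // added: wait for the reader's output
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

x and y stay plain fields; they are published by being written before the volatile write and read after the volatile read.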
Rule #1, Revised
All shared, mutable state must be properly synchronized: with a synchronized statement, an Atomic variable, or a volatile variable
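A sketch of the Atomic option, assuming a shared hit counter updated by several threads; the class AtomicCounter and its method are hypothetical. AtomicInteger (java.util.concurrent.atomic) makes the read-modify-write atomic and publishes the new value. A plain volatile int would publish updates but still lose increments, because hits++ on a volatile is a separate read and write.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {
    // Shared, mutable state protected by an Atomic variable instead of a lock.
    private static final AtomicInteger hits = new AtomicInteger();

    static int countWithThreads(int threads, int perThread) {
        hits.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    hits.incrementAndGet(); // atomic read-modify-write, visible to other threads
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return hits.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000)); // 40000
    }
}
```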
Example: Lazy Initialization
import java.util.LinkedList;
import java.util.List;

class Example {
    static List list = null;

    public static List getList() {
        if (list == null) {
            list = new LinkedList();
        }
        return list;
    }
}
Not thread-safe: this needs synchronization to ensure publication
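Two common ways to add that synchronization, sketched under the assumption that the list holds strings; the class LazyInit and the names list1, list2, getList1, and getList2 are hypothetical. The synchronized getter is the simplest; double-checked locking avoids taking the lock on every call but is correct only because the field is volatile.

```java
import java.util.LinkedList;
import java.util.List;

public class LazyInit {
    // Option 1: synchronized getter. The lock gives atomicity (only one list is
    // ever created) and safe publication of the new list.
    private static List<String> list1 = null;

    public static synchronized List<String> getList1() {
        if (list1 == null) {
            list1 = new LinkedList<>();
        }
        return list1;
    }

    // Option 2: double-checked locking. The volatile write publishes the fully
    // constructed list; without volatile, this pattern is broken.
    private static volatile List<String> list2 = null;

    public static List<String> getList2() {
        List<String> result = list2; // one volatile read on the fast path
        if (result == null) {
            synchronized (LazyInit.class) {
                result = list2;
                if (result == null) {
                    list2 = result = new LinkedList<>();
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(getList1() == getList1()); // true: same instance every call
        System.out.println(getList2() == getList2()); // true
    }
}
```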