Post on 20-Dec-2015
![Page 1: Shared Memory Consistency Models : A broad survey Ganesh Gopalakrishnan* School of Computing, University of Utah, Salt Lake City, UT * Past work supported](https://reader037.vdocuments.us/reader037/viewer/2022110322/56649d4e5503460f94a2e244/html5/thumbnails/1.jpg)
Shared Memory Consistency Models: A broad survey
Ganesh Gopalakrishnan*
School of Computing, University of Utah, Salt Lake City, UT
* Past work supported in part by SRC Contract 1031.001, NSF Award 0219805 and an equipment grant from Intel Corporation
Shared Memory: Hardware Realities
[Figure: plot of memory performance and CPU performance]
Shared Memory: Software Realities
• Must define the formal semantics of shared-memory concurrent programming while allowing for all reasonable optimizations
• Defining the Shared Thread semantics for Java (Original Java book’s Chapter 17 has essentially been ripped out…)
• Defining the Shared Memory Model for new languages such as Unified Parallel C (UPC) for Scientific Programming
• At a deeper level: Must have a formal basis for Automatic Minimal Fence Insertion, so that programs appear to execute with sequential consistency
Topics:
• Motivations for strong and weak memory models
- How it affects consistency protocol design
- How it affects programming
• Classical memory models
- Their “power”
• Fence insertion during compilation
- Run on weak architectures but appear to run SC
• Overview of some weak architectures
• Itanium in a nutshell
• SAT-based programs that check executions against memory model specs
- Demo of MP Execution Checker (MPEC) tool for Itanium
Topics:
• Theoretical aspects of memory model specification
- Specify using Traces or Specify using Transducers
• Why Trace-based Specification can allow one to talk about unrealizable machines
- Hence “undecidability of sequential consistency” is not a solved problem
• Why trace-based verification methods need to exert some care
- Otherwise can prove “conniving machines” to be SC !!
• A brief taxonomy of recent results in this area
- Mainly Alur et al., Qadeer, Bingham et al., and Sezgin
Sequential Consistency : The Most Basic Memory Consistency Model
• Requirements
1. There exists a common total order
2. It respects program order
3. Each read sees the “latest” write
Example
Initially, x = y = 0. Finally, can r1 = r2 = 0?
Thread 1      Thread 2
x = 1;        y = 2;
r1 = y;       r2 = x;
Under Sequential Consistency: No
Under many weak models: Yes
How to Think About Sequential Consistency
[Figure: processors P1, P2, …, Pn sharing a single Memory]
Initially, x = y = 0. Finally, can r1 = r2 = 0?
Thread 1      Thread 2
x = 1;        y = 2;
r1 = y;       r2 = x;
No! Not under SC! But possible under many weak memory models!
An example of such a weak memory model is SPARC TSO
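The "No" above can be checked by brute force. The following is a minimal Python sketch, not part of the original talk: it treats SC as a single shared memory and enumerates every interleaving of the two threads that keeps each thread's program order. The helper names (`interleavings`, `run`, `sc_outcomes`) are made up for illustration.

```python
from itertools import combinations

# Each thread is a list of operations: ("st", var, value) or ("ld", var, register).
THREAD1 = [("st", "x", 1), ("ld", "y", "r1")]
THREAD2 = [("st", "y", 2), ("ld", "x", "r2")]

def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that keeps each thread's program order."""
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):
        it1, it2 = iter(t1), iter(t2)
        yield [next(it1) if i in slots else next(it2) for i in range(n)]

def run(schedule):
    """Execute one interleaving against a single shared memory (the SC machine)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, var, arg in schedule:
        if op == "st":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r1"], regs["r2"]

def sc_outcomes(t1, t2):
    """Set of (r1, r2) results reachable under sequential consistency."""
    return {run(s) for s in interleavings(t1, t2)}

outcomes = sc_outcomes(THREAD1, THREAD2)
print(sorted(outcomes))          # (0, 0) does not appear
```

Only six interleavings exist for this program, so exhaustive enumeration is cheap; the same approach blows up quickly for larger programs.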
Coherence == Per-location Sequential Consistency
[Figure: processors P1, P2, …, Pn sharing a 1-address Memory]
Notice that the same execution is Coherent!
Initially, x = y = 0. Finally, can r1 = r2 = 0?
Thread 1      Thread 2
x = 1;        y = 2;
r1 = y;       r2 = x;
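To see why the r1 = r2 = 0 execution is coherent yet not SC, one can look for a legal serial order per location and then for all locations together. This is a small brute-force sketch under the assumption that every location starts at 0; the function names are illustrative only.

```python
from itertools import permutations

# The execution with r1 = r2 = 0, as (thread, kind, variable, value),
# listed in program order within each thread.
EXECUTION = [
    (1, "st", "x", 1), (1, "ld", "y", 0),
    (2, "st", "y", 2), (2, "ld", "x", 0),
]
INITIAL = 0  # assumed initial value of every location

def respects_program_order(order, ops):
    """True if operations of the same thread appear in `order` as they do in `ops`."""
    pos = {op: i for i, op in enumerate(order)}
    return all(pos[ops[a]] < pos[ops[b]]
               for a in range(len(ops)) for b in range(a + 1, len(ops))
               if ops[a][0] == ops[b][0])

def reads_latest(order):
    """True if every load returns the most recent store to its variable."""
    mem = {}
    for _, kind, var, val in order:
        if kind == "st":
            mem[var] = val
        elif mem.get(var, INITIAL) != val:
            return False
    return True

def serializable(ops):
    """Is there one total order explaining all of `ops`? (SC when given everything.)"""
    return any(respects_program_order(o, ops) and reads_latest(o)
               for o in permutations(ops))

def coherent(ops):
    """Per-location SC: each variable's operations are serializable on their own."""
    variables = {op[2] for op in ops}
    return all(serializable([op for op in ops if op[2] == v]) for v in variables)

print(coherent(EXECUTION), serializable(EXECUTION))
```

Each single-location projection has only two operations, so a serial order is easy to find; it is the combination across x and y that has no common total order.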
Memory Consistency Models
Defines the legal orderings of memory operations that can be perceived at the user level
• Processors intermittently throw colors onto memory cells and also intermittently look at their colors
[Figure: processors P1, P2, …, Pi, …, Pn and Memory Cells 1, 2, …, n]
• Many have been developed:
– Sequential Consistency (SC)
– Coherence (per-location SC)
– Pipelined RAM (PRAM)
– Causal Consistency
– Processor Consistency (PC)
– Release Consistency
– Location Consistency
– The Intel Itanium Memory Model
– Java Memory Model (JMM)
– and more!
Memory Consistency Model Specifications:
A VERY complex specification for a real architecture (e.g. Itanium, PowerPC, …)
Also of growing concern in Software (e.g. Java Memory Model, Unified Parallel C model, …)
Motivation for (weak) Memory Consistency models:
A Hardware Perspective :
• Cannot afford to do industrious (i.e., eager) updates across large multiprocessor (MP) systems
• Delayed and re-orderable updates allow considerable latitude in memory consistency protocol design, and hence fewer bugs in protocols !!
[Figure: a hierarchy of chip-level, intra-cluster, and inter-cluster protocols connecting directories (dir) and memories (mem)]
Price Paid for Delayed Updates : Bugs!
Algorithms such as Peterson’s Mutual Exclusion cease to work!
Thread 1                        Thread 2
------------                    -----------
Flags[1] = BUSY;                Flags[2] = BUSY;
Turn = 2;                       Turn = 1;
while (Flags[2] == BUSY &&      while (Flags[1] == BUSY &&
       Turn != 1) ;                    Turn != 2) ;
Critical section                Critical section
Flags[1] = FREE;                Flags[2] = FREE;
In each thread, the reads in the while loop CAN READ OLD VALUE!!
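The failure can be reproduced with a toy model. The sketch below is an assumption-laden illustration, not the talk's own tool: it models "delayed updates" as a FIFO store buffer per thread (roughly TSO-like), numbers the threads 0 and 1 instead of 1 and 2, and explores all reachable states of the entry protocol. All names are hypothetical.

```python
from collections import deque

FREE, BUSY = 0, 1
TURN = 2          # memory layout: index 0 and 1 hold the flags, index 2 holds Turn
CRITICAL = 4      # program-counter value meaning "inside the critical section"

def put(pair, i, value):
    """Return a copy of a 2-tuple with position i replaced."""
    return (value, pair[1]) if i == 0 else (pair[0], value)

def store(mem, var, val):
    new = list(mem)
    new[var] = val
    return tuple(new)

def successors(state, buffered):
    pcs, bufs, mem = state
    for i in (0, 1):
        other, pc, buf = 1 - i, pcs[i], bufs[i]
        if buf:  # the oldest buffered store of thread i drains to memory
            var, val = buf[0]
            yield pcs, put(bufs, i, buf[1:]), store(mem, var, val)
        if pc in (0, 1):  # Flags[i] = BUSY, then Turn = other
            var, val = (i, BUSY) if pc == 0 else (TURN, other)
            if buffered:
                yield put(pcs, i, pc + 1), put(bufs, i, buf + ((var, val),)), mem
            else:
                yield put(pcs, i, pc + 1), bufs, store(mem, var, val)
        elif pc == 2:  # read Flags[other] from memory (possibly an old value)
            yield put(pcs, i, 3 if mem[other] == BUSY else CRITICAL), bufs, mem
        elif pc == 3:  # read Turn, seeing the thread's own buffered store if any
            turn = dict(buf).get(TURN, mem[TURN])
            yield put(pcs, i, CRITICAL if turn == i else 2), bufs, mem

def both_in_critical_section(buffered):
    """Breadth-first search for a state where both threads are in the critical section."""
    start = ((0, 0), ((), ()), (FREE, FREE, 0))
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if state[0] == (CRITICAL, CRITICAL):
            return True
        for nxt in successors(state, buffered):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(both_in_critical_section(buffered=True))   # delayed updates
print(both_in_critical_section(buffered=False))  # stores hit memory immediately
```

With buffering, each thread can read the other's flag as FREE while the BUSY store still sits in a buffer, so both enter. Without buffering the search finds no such state.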
Scope of Tutorial:
• Survey of ‘Classical’ Work
• Survey of Current Activities (that this speaker is aware of)
• Verification Challenges
• Theoretical Questions
• Justification for topic selection:
- Complement talks on Shared Memory Consistency Protocols
- Intuitions more important than the detailzzz….
- Knowing who’s who in this area helps
- Excuse for me to stick my neck out and learn something new
Organization:
1. Overview (mainly of classical works)
2. Practical aspects of weak consistency models (more depth)
3. What’s not apparent at first glance (still more depth)
4. Conclusions and references
Part 1: Overview of Classical Work
Memory Serves to Plumb Data…
Uniprocessor:
Write (address = 2, data = 33); …….. Read (address = 2, returns data = 33);
Multiprocessor:
P1                P2
----              ----
Write (2, 33);    || Read (2, 33);
Multiprocessor:
P1                P2
----              ----
Write (2, 33);    Write (2, 77);
Read (2, 77);     Read (2, 33);
P1                P2                P3                P4
----              ----              ----              ----
Write (2, 33);    Write (2, 77);    Read (2, 33);     Read (2, 77);
                                    Read (2, 77);     Read (2, 33);
…but respecting Coherence!
…but Coherence is not sufficient:
From Shasha and Snir, Figure 1, P. 282 (ACM TOPLAS (10)2: 1988)
Processor 1              Processor 2
-------------            --------------
Test_and_set1(LOCK);     Test_and_set2(LOCK);
Read1(X);                Read2(X);
Write1(X);               Write2(X);
Reset1(LOCK);            Reset2(LOCK);
The following memory access sequence respects Coherence but breaks the critical section :
Test_and_set1(LOCK); Read1(X); Reset1(LOCK);
Test_and_set2(LOCK); Read2(X); Write1(X); Write2(X); Reset2(LOCK);
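A quick way to see what goes wrong in this sequence: each location, viewed alone, still sees every processor's accesses in program order, but the sequence as a whole does not. The Python sketch below checks only this ordering aspect (it does not model returned values or lock state); the helper names are illustrative.

```python
PROGRAM = {
    1: ["Test_and_set1(LOCK)", "Read1(X)", "Write1(X)", "Reset1(LOCK)"],
    2: ["Test_and_set2(LOCK)", "Read2(X)", "Write2(X)", "Reset2(LOCK)"],
}

# The memory access sequence from the slide.
SEQUENCE = [
    "Test_and_set1(LOCK)", "Read1(X)", "Reset1(LOCK)",
    "Test_and_set2(LOCK)", "Read2(X)", "Write1(X)", "Write2(X)", "Reset2(LOCK)",
]

def location(op):
    """Extract the accessed location, e.g. 'Read1(X)' -> 'X'."""
    return op[op.index("(") + 1:-1]

def respects_program_order(sequence, program):
    """True if each processor's operations appear in the sequence in program order."""
    for ops in program.values():
        seen = [op for op in sequence if op in ops]
        expected = [op for op in ops if op in sequence]
        if seen != expected:
            return False
    return True

def per_location_ok(sequence, program):
    """Apply the program-order check to each location's projection separately."""
    locations = {location(op) for op in sequence}
    return all(
        respects_program_order([op for op in sequence if location(op) == loc], program)
        for loc in locations
    )

print(per_location_ok(SEQUENCE, PROGRAM))          # per-location view looks fine
print(respects_program_order(SEQUENCE, PROGRAM))   # whole-sequence view does not
```

The offending pair is Write1(X) and Reset1(LOCK): they touch different locations, so a per-location criterion never compares them.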
• Consistent view ACROSS ADDRESS SPACE is needed
• Most intuitive such view: Sequential Consistency!
Basic understanding of SC:
• Execute AS IF instructions in each thread were executed sequentially and atomically
- respecting the program order in each thread
- no constraints across sequential programs
Requires effort to achieve above effect AS WELL AS high performance :
[Figure: CPU 1 … CPU n connected to a Memory and Bus Controller]
CPU 1:  Write (2, 55); MISSES    Read (4, 11); HITS
CPU n:  Write (4, 66); MISSES    Read (2, 22); HITS
Which Read waits?
Aggressive SC Implementations:
From Adve, Pai, and Ranganathan (Proc IEEE, (87)3, March 1999, p.448)
“If the accessed location does not change its value until the Read could have been non-speculatively issued, then the speculation is successful. Otherwise, roll-back speculation until incorrect load.” (Similar schemes used in HP PA-8000, Intel Pentium Pro, MIPS R10K)
One way to implement this: * If bus-snoop for Write(4,..) arrives before that for Write(2,..), the Read(4, 11) is invalidated – and it reissues…
Snoops are Write(4,66); Write(2,55);
Unexpected Interactions: SC and Write Update Protocols (from Grahn, Stenstrom, Dubois)
• An important aspect of Sequential Consistency is Write Atomicity
• Write-Invalidate protocols can easily guarantee Write Atomicity
• However, Write-Update protocols are often recommended (Read-latency)
• Ensuring Write-Atomicity in Write-Update Protocols is tricky
• WEAK MEMORY MODELS TO THE RESCUE ! Don’t care about Write Atomicity except at Acquire / Release points
A Deeper Look at Coherence :
Checking Coherence of Executions is NP-complete (NPC) :
Cantin’s proof: Reduction from SAT
Example: Consider (u1 \/ u2) /\ (~u1 \/ u2)
Create the following concurrent processes:
h1        h2         h_u1       h_~u1      h_u2       h_~u2      h3
---       ---        -----      -------    -----      -------    ---
W(d_u1)   W(d_~u1)   R(d_u1)    R(d_~u1)   R(d_u2)    R(d_~u2)   R(d_c1)
W(d_u2)   W(d_~u2)   R(d_~u1)   R(d_u1)    R(d_~u2)   R(d_u2)    R(d_c2)
W(d_c1) W(d_c2) W(d_c1) W(d_u1) W(d_c2) W(d_u2) W(d_~u1)
W(d_~u2) W(d_F)
[Figure labels: Literal Gadget; Clause Gadget; Existence of a Coherent Schedule is tested]
A Deeper Look at Coherence :
Memory models that relax coherence – and how “useful” they are:
• PRAM (pipelined RAM – Lipton and Sandberg) is of academic interest
One memory per processor
Program order is obeyed, but No Write-Atomicity
[Figure: processors P1, P2, …, Pn, each with its own memory]
A Deeper Look at Coherence :
Memory models that relax coherence – and how “useful” they are:
• PRAM – of academic interest
• Location consistency
- Proposed by Gao and Sarkar
- They tout its advantages in terms of scalability
- They describe an LC protocol “machine”
- Analysis by Wallace et al. (PDPTA 2002: 1542-1550) :
* Shown that this LC machine is stronger than the LC definition
* Question whether LC programs indeed appear to execute with sequentially consistent outcomes assuming that they are “properly labeled”
* I have not seen many publications on LC of late…
Classical Weak Memory Models:
• Processor Consistency is widely known
• Good discussions in Ahamad et al.,
“The Power of Processor Consistency”
• First understand PRAM :
- For each processor p, there is a legal serialization S_p of H_p+w (p’s operations plus all writes by other processors) such that if o1 and o2 are in H_p+w and o1 –po-> o2, then o1 –S_p-> o2
- For PC_g, we add the following condition:
for any two processors p and q, and for any location x,
S_p | (w,x) = S_q | (w,x)
“Processor Consistency according to Goodman (PC_g)”
is not the same as
“PC_d – processor consistency according to the DASH project”
Execution that’s PRAM and Coherent … but not PC_g:
P : w(x,0) w(y,0)
Q: r(y,0) w(x,1)
R: r(x,1) r(x,0)
* Coherent! Just look at each color separately
* Not PC_g :
Construct a history per processor with all of the processor’s actions and all of others’ writes in that history
PC_g requires the write-histories to agree per variable; but in our example,
History of Q = …w(x,0)… w(x,1)… while
History of R = …w(x,1)… w(x,0)…
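The argument can be replayed mechanically. The sketch below enumerates, for each processor, the legal serializations of its own operations plus everyone's writes, then asks whether some choice of serializations agrees on the write order of every variable. It assumes, as the slide implicitly does, that a read must return the value of an earlier write in the history (no separate initial values). Function names are illustrative.

```python
from itertools import combinations, permutations, product

# Operations are (processor, kind, variable, value), in program order per processor.
HISTORY = {
    "P": [("P", "w", "x", 0), ("P", "w", "y", 0)],
    "Q": [("Q", "r", "y", 0), ("Q", "w", "x", 1)],
    "R": [("R", "r", "x", 1), ("R", "r", "x", 0)],
}
VARIABLES = ("x", "y")

def ops_for(p):
    """H_p+w: all of p's operations plus every other processor's writes."""
    return [op for q, ops in HISTORY.items() for op in ops if q == p or op[1] == "w"]

def legal(order):
    """Every read returns the most recent preceding write to its variable."""
    mem = {}
    for _, kind, var, val in order:
        if kind == "w":
            mem[var] = val
        elif mem.get(var) != val:   # assumption: no initial value to fall back on
            return False
    return True

def program_ordered(order):
    """Operations of the same processor keep their program order."""
    pos = {op: i for i, op in enumerate(order)}
    return all(pos[a] < pos[b]
               for ops in HISTORY.values()
               for a, b in combinations(ops, 2)
               if a in pos and b in pos)

def serializations(p):
    return [o for o in permutations(ops_for(p)) if program_ordered(o) and legal(o)]

def write_order(order, var):
    return [op for op in order if op[1] == "w" and op[2] == var]

def is_pram():
    return all(serializations(p) for p in HISTORY)

def is_pc_g():
    for combo in product(*(serializations(p) for p in HISTORY)):
        if all(write_order(s, v) == write_order(combo[0], v)
               for s in combo for v in VARIABLES):
            return True
    return False

print(is_pram(), is_pc_g())
```

Q is forced to order w(x,0) before w(x,1), while R is forced to the opposite order, so no agreeing choice exists.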
The “power” of Processor Consistency:
• Can handle “Peterson” (Ahamad)
• Can’t handle “Bakery” (Ahamad)
• What else? (Kawash and Higham, “Bounds for mutual
exclusion with only Processor Consistency”):
- Peterson is correct for PC-G (a multi-writer protocol)
- Bakery is incorrect for PC-G (a single-writer protocol)
- Kawash and Higham prove that for mutual exclusion under
PC-G, one multi-writer and n single-writers are necessary
Observations:
• Weak shared memory consistency models allow consistency
protocols to be efficient
• Unfortunately programmers find weak models non-intuitive
• How can we have the best of both worlds:
- weak models to be supported by the hardware
- strong models to be presented by the software
This can be achieved through compilers that insert the minimal number of fence instructions to give the appearance of SC
Basics of Fence Insertion:
• Widely cited work is by Shasha and Snir
• Recent work by Lee, Midkiff, and Padua extends the above
• Let us go through some examples (initially all mem. locations are 0)
P1              P2
----            ----
write(x,1);     read(y, yd);
write(y,1);     read(x, xd);
Under SC,
If yd = 1, then xd = 1
Basics of Fence Insertion:
P1              P2
----            ----
write(x,1);     read(y, yd);
write(y,1);     read(x, xd);
BUT if we allow instructions to re-order, then the guarantee
If yd = 1, then xd = 1
is lost !!
• But often we CAN re-order without noticing an SC violation
• When can we re-order ??
Basics of Fence Insertion:
• Widely cited work is by Shasha and Snir (our exs. from their paper)
• Recent work by Lee, Midkiff, and Padua extends the above
• Let us go through some examples (initially all mem. locations are 0)
P1              P2
----            ----
write(x,1);     read(y, yd);
write(y,1);     read(x, xd);
(a is the program-order edge between the two writes of P1; b is the program-order edge between the two reads of P2)
Which program order edges in P = {a,b} must be respected in order to guarantee SC-compliant executions ?
• Preserving a alone : Insufficient, as it can return xd=0, yd=1
• Preserving b alone : Insufficient, as it can return xd=0, yd=1
• BOTH a and b need to be preserved – how to compute this in general?
• Terminology : {a,b} in this example forms the Delay Set, D
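The three bullets about edges a and b can be confirmed by enumeration. In this sketch, dropping an edge is modeled (as a simplifying assumption) by letting the two operations of that thread take effect in either order; the function name `outcomes` and the `keep_a`/`keep_b` flags are illustrative.

```python
from itertools import permutations

P1 = [("w", "x", 1), ("w", "y", 1)]         # edge a: write(x,1) before write(y,1)
P2 = [("r", "y", "yd"), ("r", "x", "xd")]   # edge b: read(y) before read(x)

def outcomes(keep_a, keep_b):
    """All (yd, xd) results when only the selected program-order edges are enforced."""
    results = set()
    for order in permutations(P1 + P2):
        pos = {op: i for i, op in enumerate(order)}
        if keep_a and pos[P1[0]] > pos[P1[1]]:
            continue
        if keep_b and pos[P2[0]] > pos[P2[1]]:
            continue
        mem, regs = {"x": 0, "y": 0}, {}
        for kind, var, arg in order:
            if kind == "w":
                mem[var] = arg
            else:
                regs[arg] = mem[var]
        results.add((regs["yd"], regs["xd"]))
    return results

BAD = (1, 0)   # yd = 1 but xd = 0, which SC forbids

print(BAD in outcomes(keep_a=True, keep_b=True))    # both edges kept
print(BAD in outcomes(keep_a=True, keep_b=False))   # only a kept
print(BAD in outcomes(keep_a=False, keep_b=True))   # only b kept
```

Only with both edges enforced does the forbidden result disappear, which is why both belong to the delay set.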
Analysis is based on Critical Cycles
• Locate all critical cycles in the concurrent program
• Equate Delay Set D to all the program-order edges in all
critical cycles
• Locating Critical Cycles :
- Locate all Conflict Edges C
. Locate pairs of concurrent accesses to the same variable, at least one of which is a write; these give the undirected Conflict Edges C
. A critical cycle is a cycle in P U C that has the following properties :
* Contains at most two operations from the same thread, and these are consecutive in the cycle
* Contains 0, 2, or 3 accesses to each shared variable, and these are consecutive in the cycle
(further properties omitted…)
Finding Critical Cycles : Example 1
P1              P2
----            ----
write(x,1);     read(y, yd);
write(y,1);     read(x, xd);
[Figure: the Program Order Edges P (within each thread) and the Conflict Edges C (between the threads) form the Critical Cycle]
Delay Set D = all the P edges in Critical Cycle = P in our case
Finding Critical Cycles : Example 2
P1              P2
----            ----
read(x, xd);    write(x,1);
read(y, yd);    write(y,1);
[Figure: the Conflict Edges and the Critical Cycle over program-order edges a, b, c; annotation: basically a “while” loop]
Delay Set D = {b, c} whereas P = {a, b, c}
Finding Critical Cycles : Example 3
a1 : read A
b1 : read B
c1 : read C
d1 : read D
a2 : write B
b2 : write C
c2: write D
d2 : write A
D = { (a1,b1), (a1,c1), (a1,d1), (a2,d2), (b2,d2), (c2,d2) }
suffices to ensure SC !
I.e., a1 is an acquire-read and d2 is a release-write !!
Basic Approach to Fence Insertion:
• Goal : Discover the minimal set of fences to be inserted into
a concurrent shared memory program
• Suppose D is the delay-set discovered by the previous analysis
• Suppose the underlying (weak) architecture supports orderings
D_o
• Let D_m be the fences to be inserted to get the effect of D
• D_m = ( ( D U D_o )+ )tr - D_o
where “tr” is the transitive reduction
[Figure: a graph over the operations a, b, c, d]
• Required Delay Set D = { (a,b), (b,c), (a,d) }
• D_o = { (c,d) }
• ( (D U D_o )+ )tr = {(a,b), (b,c), (c,d)}
• ( (D U D_o)+ )tr – D_o = {(a,b), (b,c)} - fences needed only here
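The formula D_m = ( ( D U D_o )+ )tr - D_o is easy to execute on this example. The following Python sketch uses a naive closure and a reduction that is only valid for acyclic relations (an assumption that holds here); the function names are illustrative.

```python
def transitive_closure(edges):
    """Smallest transitive relation containing `edges` (naive fixed point)."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def transitive_reduction(closure):
    """For an acyclic, transitively closed relation: drop edges implied by a two-step path."""
    nodes = {n for edge in closure for n in edge}
    return {(a, b) for (a, b) in closure
            if not any((a, m) in closure and (m, b) in closure for m in nodes)}

def fences_needed(delay_set, hardware_orderings):
    """D_m = ((D U D_o)+)tr - D_o."""
    reduced = transitive_reduction(transitive_closure(delay_set | hardware_orderings))
    return reduced - hardware_orderings

D = {("a", "b"), ("b", "c"), ("a", "d")}
D_o = {("c", "d")}

print(sorted(fences_needed(D, D_o)))   # [('a', 'b'), ('b', 'c')]
```

The edge (a,d) needs no fence of its own: it is implied by a→b→c→d once the two fences and the hardware ordering (c,d) are in place.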
Basic Approach to Fence Insertion:
[Figure: fences placed on (a,b) and (b,c); (c,d) is a hardware-provided ordering]
So, in a nutshell, the two fences together with the hardware-provided ordering implement the desired delay-set
Deriving Fences from Correctness Proofs
Lamport’s paper “How to make a Correct Multiprocess Program Execute Correctly on a Multiprocessor,” IEEE Trans Computer 46(7) – 1997
provides a really good insight on deriving required weak orderings thru proofs
• Notations :
A → B : Every event in A precedes every event in B
A -- > B : Some event in A precedes some event in B
[Two formulas of the form “… implies …” appeared here as images]
Deriving Places to insert a Synch Instruction:
Repeat forever
  noncritical section;
  L : x_i := true;
  For j := 1 until i-1
    do if x_j then x_i := false; while x_j do od; goto L fi od
  For j := i+1 until N do while x_j do od od;
  critical section;
  x_i := false
End Repeat
[Three points in the algorithm are marked: Synch]
There is a proof in Lamport’s paper that with just these Synch instructions, mutual exclusion is guaranteed.
Part 2: A Detailed Look at a Practical Weak Memory Model : Itanium (I do mention three others briefly…)
Well, let’s look at the big picture first:
Sparc TSO, PSO, RMO
• Reads and Writes follow the
TSO, PSO, or RMO semantics
• Additional Fence instructions
and others (e.g. semaphores)
• I’m not up to speed on these…
Alpha
• Reads (only coherence)
• Writes (only coherence)
• Load-Locked
• Store-Conditional
• Membar
Well, let’s look at the big picture:
Power-4
• Reads and Writes (don’t know much)
• Sync (Synchronize)
• Lwsync (Lightweight Sync – new in Power4)
• E I E I O (Enforce In-Order Execution of I/O)
• Lwarx (Load word and reserve)
• Ldarx (Load doubleword and reserve)
• Stwcx (Store word conditional)
• Stdcx (Store Doubleword Conditional)
• Isync (Instruction synchronize)
Perhaps Old-McDonald knows more…
IA-32, IA-64, AMD, … ?
• Generally thought to be “Processor Consistency”
• Does it really help to formally specify (or even reveal the details) ?
• Intel thought so ……
The Itanium memory model is described next…
The Intel Itanium® Processor memory model
• Has these kinds of instructions :
“weak load” or “ordinary load” -- ld
“strong load” or “acquire-load” -- ld.acq
“weak store” or “ordinary store” -- st
“strong store” or “release store” -- st.rel
“memory fence” (NOT barrier!) -- mf
A few semaphore-types
• Allows sub-word writes, I/O spaces… (We don’t model these)
Itanium® memory model thru examples
“Ordinary store”: st [x] = 2
Can freely slide in a sequential program…
Only rule is coherence
The same applies to an “ordinary load”: ld reg1 = [x]
Itanium® memory model thru examples
“Release store”: st.rel [x] = 2
Things before it in sequential program order can’t happen after it
Things after it in sequential program order may happen before it !!
Itanium® memory model thru examples
“Acquire load”: ld.acq r3 = [y]
Things before it in sequential program order may happen after it
Things after it in sequential program order can’t happen before it !!
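The one-way behaviour of st.rel and ld.acq on the last three slides can be summarized as a small predicate. This is a deliberately simplified sketch: it ignores same-address (coherence) and data-dependence constraints, and the treatment of mf as ordering everything around it is stated from general knowledge of the architecture rather than from these slides. The function name `may_reorder` is hypothetical.

```python
def may_reorder(first, second):
    """May `second` take effect before `first`, given that `first` precedes it
    in program order?

    Simplified: ignores same-address (coherence) and data-dependence rules.
    Instruction kinds are the mnemonics "ld", "st", "ld.acq", "st.rel", "mf".
    """
    if "mf" in (first, second):
        return False      # assumption: a memory fence orders everything around it
    if first == "ld.acq":
        return False      # things after an acquire load can't happen before it
    if second == "st.rel":
        return False      # things before a release store can't happen after it
    return True           # ordinary accesses can freely slide

print(may_reorder("st.rel", "ld"))     # True: later things may pass a release
print(may_reorder("ld", "ld.acq"))     # True: earlier things may sink below an acquire
print(may_reorder("ld.acq", "st"))     # False
print(may_reorder("st", "st.rel"))     # False
```

Note that a st.rel followed by a ld.acq is still reorderable under this predicate, since neither one-way rule applies to that pair.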
st.rel [y] = 1
ld reg1 = [x] <0> ld reg2 = [y] <0>
st.rel [x] = 2
ld.acq r3 = [y] <1> ld.acq r4 = [x] <2>
[Figure annotations: Data dep.; ld.acq rule]
Itanium specification DOES NOT try to explain outcomes in terms of “shuffles” of the original instructions!
But with these rules alone, we can’t explain the following legal outcome in Itanium®
Itanium® rules explain execution outcomes in terms of “progenies” of stores and loads
st [y] = 1 generates: Local copy for P0; “remote” copy for P0; “remote” copy for P1
A store generates (n+1) progenies
Other instructions (e.g. ld.acq r3 = [y]) generate only one
This has turned out to be an unspoken convention in this area for other memory models also…
P1: St a,1; Ld r1,a <1>; St b,r1 <1>;
P2: Ld.acq r2,b <1>; Ld r3,a <0>;
We wrote such a “breeding assembler”
{id=0; proc=0; pc=0; op= St; var=0; data=1; wrID=0; wrType=Local; wrProc=0; reg=-1; useReg=false};
{id=1; proc=0; pc=0; op= St; var=0; data=1; wrID=0; wrType=Remote; wrProc=0; reg=-1; useReg=false};
{id=2; proc=0; pc=0; op= St; var=0; data=1; wrID=0; wrType=Remote; wrProc=1; reg=-1; useReg=false};
{id=3; proc=0; pc=1; op= Ld; var=0; data=1; wrID=-1; wrType=DontCare; wrProc=-1; reg=0; useReg=true};
{id=4; proc=0; pc=2; op= St; var=1; data=1; wrID=4; wrType=Local; wrProc=0; reg=0; useReg=true};
{id=5; proc=0; pc=2; op= St; var=1; data=1; wrID=4; wrType=Remote; wrProc=0; reg=0; useReg=true};
{id=6; proc=0; pc=2; op= St; var=1; data=1; wrID=4; wrType=Remote; wrProc=1; reg=0; useReg=true};
{id=7; proc=1; pc=0; op= LdAcq; var=1; data=1; wrID=-1; wrType=DontCare; wrProc=-1; reg=1; useReg=true};
{id=8; proc=1; pc=1; op= Ld; var=0; data=0; wrID=-1; wrType=DontCare; wrProc=-1; reg=2; useReg=true}
Tuple 1
Tuple 9
...
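The tuple listing above can be reproduced with a small generator. This is a hedged sketch, not the authors' assembler: the names `Progeny` and `breed` are hypothetical, the data and register fields are omitted, and the convention that `wrID` equals the id of the store's local copy is inferred from the listing.

```python
from dataclasses import dataclass

@dataclass
class Progeny:
    id: int
    proc: int
    pc: int
    op: str
    var: int
    wrID: int
    wrType: str
    wrProc: int

def breed(program, nprocs):
    """program: list of (proc, pc, op, var). A store yields n+1 progenies
    (one local copy plus one remote copy per processor); any other
    instruction yields exactly one."""
    out = []
    for proc, pc, op, var in program:
        if op in ("St", "StRel"):
            wr_id = len(out)  # the local copy's id names the whole write
            out.append(Progeny(len(out), proc, pc, op, var, wr_id, "Local", proc))
            for q in range(nprocs):
                out.append(Progeny(len(out), proc, pc, op, var, wr_id, "Remote", q))
        else:
            out.append(Progeny(len(out), proc, pc, op, var, -1, "DontCare", -1))
    return out

# P1: St a,1; Ld r1,a; St b,r1     P2: Ld.acq r2,b; Ld r3,a
program = [(0, 0, "St", 0), (0, 1, "Ld", 0), (0, 2, "St", 1),
           (1, 0, "LdAcq", 1), (1, 1, "Ld", 0)]
tuples = breed(program, nprocs=2)
for t in tuples:
    print(t)
```

With two processors, each store contributes three tuples and each load one, which gives the nine tuples shown on the slide.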
Itanium® rules specify how to line up the tuples to explain the load outcomes !!
P0: st [y] = 1 ; ld.acq r3 = [y] <1> ; ld reg1 = [x] <0>
P1: st [x] = 2 ; ld.acq r4 = [x] <2> ; ld reg2 = [y] <0>
Now, arrange the split copies…
st [y] = 1 “l”
st [y] = 1 “rp0”
st [y] = 1 “rp1”
st [x] = 2 “l”
st [x] = 2 “rp0”
st [x] = 2 “rp1”
ld.acq r3 = [y] <1>
ld.acq r4 = [x] <2>
ld reg1 = [x] <0>
ld reg2 = [y] <0>
Explanation…
Legend: Dependencies, Anti-dependencies
Gist of our method: Illustration on SC and on Itanium
SC(exec) = Exists order.
  ( requireStrictTotalOrder exec order
  /\ requireProgramOrder exec order
  /\ requireReadValue exec order )
The tuples to be ordered
Find an arrangement under SC constraints
legalItanium(exec) = Exists order.
  ( requireStrictTotalOrder exec order
  /\ requireWriteOperationOrder exec order
  /\ requireItProgramOrder exec order
  /\ requireMemoryDataDependence exec order
  /\ requireDataFlowDependence exec order
  /\ requireCoherence exec order
  /\ requireAtomicWBRelease exec order
  /\ requireSequentialUC exec order
  /\ requireNoUCBypass exec order
  /\ requireReadValue exec order )
The tuples to be ordered
Find arrangement as per above constraints
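For SC, "find an arrangement" can be made concrete by brute force on tiny executions. The sketch below is an illustration only, assuming a hypothetical event format `(proc, pc, kind, var, value)` and memory initialized to 0. Enumerating permutations stands in for `requireStrictTotalOrder`, and the two checks mirror `requireProgramOrder` and `requireReadValue`.

```python
from itertools import permutations

def sc_order(events, init=0):
    """Return a total order witnessing SC, or None if no arrangement exists."""
    for order in permutations(events):
        pos = {e: n for n, e in enumerate(order)}
        # requireProgramOrder: same processor, earlier pc comes first
        program_order_ok = all(
            pos[a] < pos[b]
            for a in events for b in events
            if a[0] == b[0] and a[1] < b[1])
        if not program_order_ok:
            continue
        # requireReadValue: every read returns the most recent write
        memory, reads_ok = {}, True
        for proc, pc, kind, var, value in order:
            if kind == "W":
                memory[var] = value
            elif memory.get(var, init) != value:
                reads_ok = False
                break
        if reads_ok:
            return list(order)
    return None

# Store-buffering litmus test: both loads observe the initial value 0.
store_buffering = [(0, 0, "W", "x", 1), (0, 1, "R", "y", 0),
                   (1, 0, "W", "y", 1), (1, 1, "R", "x", 0)]
print(sc_order(store_buffering))
```

The factorial search is only usable on litmus-sized inputs, which is one motivation for the SAT encoding described later in the slides.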
Our Itanium Formal Model (extracted from Intel Documents – written as a HOL Theory)
legal_itanium exec = (* a given execution *)
  ?order. requireStrictTotalOrder exec order
       /\ requireWriteOperationOrder exec order
       /\ requireProgramOrder exec order
       /\ requireMemoryDataDependence exec order
       /\ requireDataFlowDependence exec order
       /\ requireCoherence exec order
       /\ requireReadValue exec order
       /\ requireAtomicWBRelease exec order
       /\ requireSequentialUC exec order
       /\ requireNoUCBypass exec order
See Charme’03, IPDPS’04, CAV’04
Various contributions by Yue Yang, Gopalakrishnan, Lindstrom, Slind, Sivaraj, Yu Yang
requireStrictTotalOrder exec order
requireWriteOperationOrder exec order
Local Write before Local Global Write
Local Write before Remote Global Writes
requireProgramOrder exec order
Program Order is defined solely through
Acquires, Releases, and Fences
requireMemoryDataDependence exec order
Order two accesses (Read or Write) under these conditions :
IF program-ordered AND the same variable AND
Write is local and RAW (and Read of course is local)
OR Write is local and WAR
OR Both writes are local and WAW
OR Both writes are remote and WAW and Fall in same processor
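The case analysis above can be written as a single predicate. A minimal sketch, assuming a hypothetical event record `Ev` and reading "fall in same processor" as "same `wrProc`"; it is an interpretation of the slide, not the HOL definition itself.

```python
from collections import namedtuple

# kind is 'R' or 'W'; wrType is 'Local', 'Remote', or 'DontCare' (for reads)
Ev = namedtuple("Ev", "proc pc kind var wrType wrProc")

def is_local(e):
    return e.wrType == "Local"

def is_remote(e):
    return e.wrType == "Remote"

def must_order(a, b):
    """True if `a` must be ordered before `b` by memory data dependence."""
    program_ordered = a.proc == b.proc and a.pc < b.pc
    if not (program_ordered and a.var == b.var):
        return False
    if a.kind == "W" and b.kind == "R":   # RAW: the write must be local
        return is_local(a)
    if a.kind == "R" and b.kind == "W":   # WAR: the write must be local
        return is_local(b)
    if a.kind == "W" and b.kind == "W":   # WAW: both local, or both remote
        return (is_local(a) and is_local(b)) or (
            is_remote(a) and is_remote(b) and a.wrProc == b.wrProc)
    return False                          # two reads are left unordered

st_local = Ev(0, 0, "W", "x", "Local", 0)
st_remote1 = Ev(0, 0, "W", "x", "Remote", 1)
ld = Ev(0, 1, "R", "x", "DontCare", -1)
print(must_order(st_local, ld), must_order(st_remote1, ld))
```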
requireDataFlowDependence exec order
Data Dependence Thru the Register-Space
requireCoherence exec order
Just Plain-Old Coherence
but for TWO WRITES falling in the WB or UC space
and for EITHER Two Local Writes OR two Remote Writes in the same processor
requireReadValue exec order
Reads return Most Recent Writes
requireAtomicWBRelease exec order
All Remote Events Stemming from the Same Release-Write Instruction appear to be an Atomic Set
requireSequentialUC exec order
In the UC Space, Program-Ordered
UC Read and Write Events, both of which are Local
are ordered as per program order
(the two operations in question could be RR, RW, WR, or WW)
requireNoUCBypass exec order
UC-space Operations Do Not Exhibit
Read Bypassing as in TSO
A MEMORY MODEL RULE IN HOL
requireCoherence exec order =
  !i j. i IN exec /\ j IN exec ==>
    isWr i /\ isWr j /\ (i.var = j.var) /\ order i j /\
    ((attr_of i.var = WB) \/ (attr_of i.var = UC)) /\
    ((i.wrType=Local) /\ (j.wrType=Local) /\ (i.proc=j.proc) \/
     (i.wrType=Remote) /\ (j.wrType=Remote) /\ (i.wrProc=j.wrProc)) ==>
    !p q. p IN exec /\ q IN exec ==>
      isWr p /\ isWr q /\ (p.wrID = i.wrID) /\ (q.wrID = j.wrID) /\
      (p.wrType = Remote) /\ (q.wrType = Remote) /\ (p.wrProc = q.wrProc) ==>
      order p q
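Read operationally, the HOL rule says: once two writes to the same variable are ordered at one observer, their remote copies must be ordered the same way at every processor. Below is a hedged Python transcription that checks a given arrangement; the record `Ev` and the helpers `write` and `order_from` are assumptions made for this sketch.

```python
from collections import namedtuple

Ev = namedtuple("Ev", "id proc op var wrID wrType wrProc")

def is_wr(e):
    return e.op in ("St", "StRel")

def require_coherence(execution, order, attr_of):
    for i in execution:
        for j in execution:
            same_observer = (
                (i.wrType == "Local" and j.wrType == "Local" and i.proc == j.proc)
                or (i.wrType == "Remote" and j.wrType == "Remote"
                    and i.wrProc == j.wrProc))
            if not (is_wr(i) and is_wr(j) and i.var == j.var and order(i, j)
                    and attr_of(i.var) in ("WB", "UC") and same_observer):
                continue
            # Every pair of remote copies of the two writes seen by the
            # same processor must agree with the order of i and j.
            for p in execution:
                for q in execution:
                    if (is_wr(p) and is_wr(q)
                            and p.wrID == i.wrID and q.wrID == j.wrID
                            and p.wrType == "Remote" and q.wrType == "Remote"
                            and p.wrProc == q.wrProc and not order(p, q)):
                        return False
    return True

def order_from(seq):
    pos = {e.id: n for n, e in enumerate(seq)}
    return lambda a, b: pos[a.id] < pos[b.id]

def write(first_id, proc, var, nprocs=2):
    local = Ev(first_id, proc, "St", var, first_id, "Local", proc)
    remotes = [Ev(first_id + 1 + q, proc, "St", var, first_id, "Remote", q)
               for q in range(nprocs)]
    return [local] + remotes

A, B = write(0, 0, "x"), write(10, 1, "x")
good = A + B                                   # both processors see A then B
bad = [A[0], B[0], A[1], B[1], B[2], A[2]]     # P0 sees A,B but P1 sees B,A
wb = lambda var: "WB"
print(require_coherence(good, order_from(good), wb),
      require_coherence(bad, order_from(bad), wb))
```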
How do we know that the actual silicon matches the shared memory model ?
• Pray
• Run tests and manually check results
• ? What else ?
! X . X in exec ? Y . Y in exec …. ? ! /\ … \/ ….
?
One use we have put our Spec to: Post-Si Verification of MP Systems…
st8 [12ca20] = 7f869af546f2f14c
ld8 r25 = [45180] <87b5e547172644a8>
ld2 r26 = [2c2a2c] <44a8>
ld2 r27 = [45aa2a] <c58e>
…
FORMALLY VERIFY “interesting” EXECUTIONS
st8 [45180] = 87b5e547172644a8
ld8 r25 = [45180] <87b5e547172644a8>
st2 [2c2a2c] = 44a8
st2 [45aa2a] = c58e
…
P1’s exec
P2’s exec
…
TWO APPROACHES: - explicitly QB - implicitly QB
“BOOLIFY”
CONVERT TO
EXECUTION CHECKER PROGRAM
SPEC OF MEMORY MODEL IN HOL
Given Execution
QBF
PROGRAM
Given Execution
SAT PROBLEM
(Prototyped this; but definitely need to re-code this…)
The alternative is to produce a manual proof:
P
st [x] = 1
mf
ld r1 = [y] <0>
R
ld.acq r2 = [y] <1>
ld r3 = [x] <0>
Q
st.rel [y] = 1
Atomicity of st.rel
Load of initial value is before store of every other value
Even this simple “Litmus Test” has a 1-page detailed proof
The MPEC Tool Flow:
Itanium Ordering rules in HOL
Mechanical Program Derivation (to be automated)
Checker Program
Satisfiability Problem with Clauses carrying annotations
Sat Solver
Sat / Unsat
Explanation in the form of one possible interleaving
Unsat Core Extraction using Zcore
P
st [x] = 1
mf
ld r1 = [y] <0>
R
ld.acq r2 = [y] <1>
ld r3 = [x] <0>
Q
st.rel [y] = 1
• Find Offending Clauses
• Trace their annotations
• Determine “ordering cycle”
MP execution to be verified
RECENT WORK
Largest example tried to date (courtesy S. Zeisset, Intel)
Proc 1
st8 [12ca20] = 7f869af546f2f14c
ld r25 = [45180] <87b5e547172644a8>
… 58 more instructions…
st2 [7c2a00] = 4bca
Proc 2
ld4 r24 = [733a74] <415e304>
st4.rel [175984] = 96ab4e1f
… 67 more instructions…
ld8 r87 = [56460] <b5c113d7ce4783b1>
• Initially the tool gave a trivial violation
• Diagnosed to be forgotten memory initialization
• Added method to incorporate memory initialization in our tool
• Our tool found the exact same cycle as pointed out by author of test
Cycle found thru our tool:
st.rel (line 18, P1) → ld (line 22, P2) → mf → ld (line 30, P2) → st (line 11, P1)
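Reporting a cycle like the one above amounts to finding a cycle in the graph of "must come before" edges recovered from the clause annotations. The following is a generic depth-first-search sketch; the function name `find_cycle` and the three-node edge list are invented for illustration and are not taken from the case study.

```python
def find_cycle(edges):
    """edges: iterable of (src, dst, rule). Returns a list of nodes that
    starts and ends at the same node, or None if the graph is acyclic."""
    graph = {}
    for src, dst, _rule in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack = []

    def dfs(node):
        color[node] = GREY
        stack.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:            # back edge closes a cycle
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = dfs(node)
            if found:
                return found
    return None

# Hypothetical annotated edges: (earlier op, later op, rule that forces it)
edges = [("A", "B", "ProgramOrder"), ("B", "C", "ReadValue"),
         ("C", "A", "ProgramOrder"), ("C", "D", "TotalOrder")]
cycle = find_cycle(edges)
print(cycle)
```

Keeping the rule name on each edge lets an explanation printer say which ordering rule forces each step of the cycle.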
Statistics Pertaining to Case Study
• 140 total instructions
• All runs were on a 1.733 GHz Athlon with 1 GB of memory, running Red Hat Linux 9
• 1 minute to generate the SAT instance
• 9M clauses ( O(n^3) in terms of instructions )
• 117,823 variables ( not a problem )
• ~1 minute to run the SAT solver (unsat here) – 0.2 sec to do “real work”
• Zcore runs fast – gave 23 clauses in one iteration
Overview of MPEC:
• Example of how a HOL rule was turned into a SAT generator
• How the SAT part was done
Throwing an efficient “transitivity blanket” over a problem to cover it with whatever transitivity it begs for !!
• What more to expect
• Related work
Gist of constraints :
• Some arrangements are statically known :
• Others are conditional : Implies and
• Some must form an atomic set : Everybody else strictly before or strictly after.
• Many are unordered :
• Find a strict total order satisfying all the above !
Gist of constraint ENCODING :
• Use Boolean precedence matrix
• Capture “i before j” by m_ij
[Figure: N × N Boolean precedence matrix, rows and columns numbered 1 to N; a 1 in row i, column j]
Statically known : Unit clauses
Conditional (“implies”) : Boolean formula
Atomic set : See how SAT-generator is derived
Strict total order : Spew out irreflexivity and totality axioms, then throw a “transitivity blanket” on top of all tuples
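The strict-total-order part of the encoding can be sketched directly over the precedence variables m_ij. This is an illustrative generator, not the MPEC code: the clause representation (lists of `(sign, i, j)` literals) and the brute-force model counter are assumptions chosen to keep the example self-contained.

```python
from itertools import product

def strict_total_order_clauses(n):
    """CNF over m_ij ("i before j"); a literal is (sign, i, j)."""
    clauses = []
    for i in range(n):
        clauses.append([(False, i, i)])                     # irreflexivity
    for i in range(n):
        for j in range(i + 1, n):
            clauses.append([(True, i, j), (True, j, i)])    # totality
    for i, j, k in product(range(n), repeat=3):             # transitivity blanket
        if i != j and j != k:
            # m_ij /\ m_jk ==> m_ik, written as a clause
            clauses.append([(False, i, j), (False, j, k), (True, i, k)])
    return clauses

def count_models(n):
    """Brute-force count of satisfying assignments (tiny n only)."""
    clauses = strict_total_order_clauses(n)
    cells = [(i, j) for i in range(n) for j in range(n)]
    total = 0
    for bits in product([False, True], repeat=len(cells)):
        m = dict(zip(cells, bits))
        if all(any(m[(i, j)] == sign for sign, i, j in clause)
               for clause in clauses):
            total += 1
    return total

print(len(strict_total_order_clauses(3)), count_models(3))
```

Allowing k = i in the transitivity loop makes antisymmetry follow from irreflexivity, so the satisfying assignments are exactly the strict total orders on the n tuples.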
Other Approaches Tried:
* Small Domain method (n log n encoding)
- Generates fantastically hard SAT problems!
- Chokes many SAT solvers – Zchaff-II can handle it well
* Incremental SAT (see CAV’04)
* QBF version : initial prototype needs lots of work
- can serve to provide good QBF benchmarks…..
Approaches to “transitivity blanket”
Naïve : For all tuples i, j, and k, generate
m_ij /\ m_jk ==> m_ik
Too many clauses (1B for a 1000-tuple program)
Better: Obtain transitive-closure of known orderings and then prune irrelevant parts of the blanket
E.g., if ~m_ij is known, don’t generate
m_ij /\ … ==> … as well as … /\ m_ij ==> …
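The pruning idea can be sketched as follows, under the assumption that the statically known orderings are given as `(a, b)` pairs meaning "a before b". Once the closure shows that j precedes i, m_ij is known false and every transitivity clause with m_ij in its antecedent is skipped. The function names are hypothetical.

```python
from itertools import permutations

def transitive_closure(known_before, n):
    """Warshall-style closure of the known "a before b" pairs."""
    reach = [[False] * n for _ in range(n)]
    for a, b in known_before:
        reach[a][b] = True
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = True
    return reach

def pruned_blanket(n, known_before):
    """Triples (i, j, k) standing for the clause m_ij /\\ m_jk ==> m_ik."""
    reach = transitive_closure(known_before, n)
    clauses = []
    for i, j, k in permutations(range(n), 3):
        # m_ij is known false when j is already known to precede i
        # (likewise m_jk), so the antecedent can never hold.
        if reach[j][i] or reach[k][j]:
            continue
        clauses.append((i, j, k))
    return clauses

print(len(pruned_blanket(4, [])))                        # no pruning possible
print(len(pruned_blanket(4, [(0, 1), (1, 2), (2, 3)])))  # fully known chain
```

For a 1000-tuple program the unpruned loop visits on the order of 1000^3 triples, which is the "1B" figure quoted above.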
Obtaining SAT-generator from HOL
Initial Spec:
atomicWBRelease(exec,order) =
  forall (i in exec).(j in exec).(k in exec).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB) /\
    (i.wrID = k.wrID) /\ order(i,j) /\ order(j,k)
    ==> (j.wrID = i.wrID)
Applying Contrapositive:
atomicWBRelease(exec,order) =
  forall (i in exec).(j in exec).(k in exec).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB) /\
    (i.wrID = k.wrID) /\ ~(j.wrID = i.wrID)
    ==> ~(order(i,j) /\ order(j,k))
After Reducing quantifier Scopes:
atomicWBRelease(exec,order) =
  forall (i in exec).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB) ==>
    forall (k in exec). (i.wrID = k.wrID) ==>
    forall (j in exec). ~(j.wrID = i.wrID) ==>
    ~(order(i,j) /\ order(j,k))
…Obtaining SAT-generator from HOL
Transformed Spec:
atomicWBRelease(exec,order) =
  forall (i in exec).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB) ==>
    forall (k in exec). (i.wrID = k.wrID) ==>
    forall (j in exec). ~(j.wrID = i.wrID) ==>
    ~(order(i,j) /\ order(j,k))
Functional Program that generates the constraints (will be automated):
atomicWBRelease(exec) = forall(i,exec,wb(i))
wb(i) = if ~((attr_of i.var=WB) & (i.op=StRel) & (i.wrType=Remote)) then true else forall(k,exec,wb1(i,k))
wb1(i,k) = if ~(i.wrID=k.wrID) then true else forall(j,exec,wb2(i,k,j))
wb2(i,k,j) = if (j.wrID=i.wrID) then true else ~(order(i,j) & order(j,k))
forall(i,S, e(i)) = for all i in S : e(i) (* foldr( map (fn i -> e(i)) (S) (&), true) *)
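The functional program maps onto nested loops that emit one two-literal clause per surviving (i, k, j) triple. A hedged Python sketch follows; the record `Ev`, the clause representation, and the sample execution are assumptions for illustration, and all variables are taken to be in WB space.

```python
from collections import namedtuple

Ev = namedtuple("Ev", "id op var wrID wrType")

def atomic_wb_release_clauses(execution, attr_of):
    """Each clause ((i, j), (j, k)) stands for ~m_ij \\/ ~m_jk, i.e.
    ~(order(i,j) /\\ order(j,k)) over the precedence variables."""
    clauses = []
    for i in execution:                                   # wb(i)
        if not (attr_of(i.var) == "WB" and i.op == "StRel"
                and i.wrType == "Remote"):
            continue
        for k in execution:                               # wb1(i, k)
            if i.wrID != k.wrID:
                continue
            for j in execution:                           # wb2(i, k, j)
                if j.wrID == i.wrID:
                    continue
                clauses.append(((i.id, j.id), (j.id, k.id)))
    return clauses

# One release store (local copy 7, remote copies 8..10) and one load (11).
execution = [Ev(7, "StRel", "y", 7, "Local")]
execution += [Ev(n, "StRel", "y", 7, "Remote") for n in (8, 9, 10)]
execution.append(Ev(11, "LdAcq", "y", -1, "DontCare"))
clauses = atomic_wb_release_clauses(execution, lambda var: "WB")
print(len(clauses))
```

Each emitted clause forbids the foreign event j from sitting between two events of the same release write, which is the atomic-set requirement in clausal form.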
Clause annotations for the unsat core for example
op1 = 1; op2 = -1; op3 = -1; op4 = -1; rule = Reflexive
op1 = 4; op2 = 5; op3 = 6; op4 = -1; rule = TransitiveOrder
op1 = 4; op2 = 5; op3 = -1; op4 = -1; rule = ProgramOrder
op1 = 4; op2 = 6; op3 = 8; op4 = -1; rule = TransitiveOrder
op1 = 4; op2 = 11; op3 = 12; op4 = -1; rule = TransitiveOrder
op1 = 5; op2 = 6; op3 = -1; op4 = -1; rule = ProgramOrder
op1 = 6; op2 = 8; op3 = -1; op4 = -1; rule = TotalOrder
op1 = 10; op2 = 11; op3 = -1; op4 = -1; rule = TotalOrder
op1 = 11; op2 = 4; op3 = 8; op4 = -1; rule = TransitiveOrder
op1 = 11; op2 = 4; op3 = -1; op4 = -1; rule = TotalOrder
op1 = 11; op2 = 12; op3 = -1; op4 = -1; rule = ProgramOrder
op1 = -1; op2 = -1; op3 = -1; op4 = -1; rule = NoRule
op1 = 6; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 6; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 6; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 6; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 6; op2 = 8; op3 = -1; op4 = -1; rule = ReadValue
op1 = 6; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = -1; op2 = -1; op3 = -1; op4 = -1; rule = NoRule
op1 = 11; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 11; op2 = 10; op3 = -1; op4 = -1; rule = ReadValue
op1 = 11; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 11; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 11; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 11; op2 = 10; op3 = -1; op4 = -1; rule = ReadValue
op1 = -1; op2 = -1; op3 = -1; op4 = -1; rule = NoRule
op1 = 12; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 12; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 12; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 12; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 12; op2 = 4; op3 = -1; op4 = -1; rule = ReadValue
op1 = 12; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = -1; op2 = -1; op3 = -1; op4 = -1; rule = NoRule
op1 = 10; op2 = 12; op3 = -1; op4 = -1; rule = AtomicWBRelease
op1 = 10; op2 = 11; op3 = -1; op4 = -1; rule = AtomicWBRelease
op1 = 10; op2 = 11; op3 = 10; op4 = -1; rule = AtomicWBRelease
op1 = 10; op2 = 11; op3 = 9; op4 = -1; rule = AtomicWBRelease
op1 = 10; op2 = 11; op3 = 8; op4 = -1; rule = AtomicWBRelease
op1 = 10; op2 = 11; op3 = 8; op4 = -1; rule = AtomicWBRelease
op1 = 10; op2 = 11; op3 = 8; op4 = -1; rule = AtomicWBRelease
op1 = 10; op2 = 11; op3 = 8; op4 = -1; rule = AtomicWBRelease
Building an Error-trail for UNSAT (infeasible executions) :
P: st [x] = 1 (ops 1 to 4) ; mf (op 5) ; ld r1 = [y] <0> (op 6)
Q: st.rel [y] = 1 (ops 7 to 10)
R: ld.acq r2 = [y] <1> (op 11) ; ld r3 = [x] <0> (op 12)
Each circle in the figure denotes an op; the numbers are op numbers. Store has both local and remote exec
op1 = 4; op2 = 5; op3 = -1; op4 = -1; rule = ProgramOrder
Building an Error-trail…
op1 = 5; op2 = 6; op3 = -1; op4 = -1; rule = ProgramOrder
Building an Error-trail …
op1 = 6; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 6; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 6; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 6; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 6; op2 = 8; op3 = -1; op4 = -1; rule = ReadValue
op1 = 6; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
Building an Error-trail …
op1 = 10; op2 = 12; op3 = -1; op4 = -1; rule = AtomicWBRelease
op1 = 10; op2 = 11; op3 = -1; op4 = -1; rule = AtomicWBRelease
op1 = 10; op2 = 11; op3 = 10; op4 = -1; rule = AtomicWBRelease
op1 = 10; op2 = 11; op3 = 9; op4 = -1; rule = AtomicWBRelease
op1 = 10; op2 = 11; op3 = 8; op4 = -1; rule = AtomicWBRelease
op1 = 10; op2 = 11; op3 = 8; op4 = -1; rule = AtomicWBRelease
op1 = 10; op2 = 11; op3 = 8; op4 = -1; rule = AtomicWBRelease
op1 = 10; op2 = 11; op3 = 8; op4 = -1; rule = AtomicWBRelease
Building an Error-trail …
op1 = 11; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 11; op2 = 10; op3 = -1; op4 = -1; rule = ReadValue
op1 = 11; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 11; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 11; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 11; op2 = 10; op3 = -1; op4 = -1; rule = ReadValue
Building an Error-trail …
op1 = 11; op2 = 12; op3 = -1; op4 = -1; rule = ProgramOrder
Building an Error-trail …
op1 = 12; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 12; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 12; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 12; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
op1 = 12; op2 = 4; op3 = -1; op4 = -1; rule = ReadValue
op1 = 12; op2 = -1; op3 = -1; op4 = -1; rule = ReadValue
Building an Error-trail …
MPEC (MP Execution Checker) Tool Demo
HOL Rules For Itanium In a HOL Theory File
Ganesh sitting down and coding
An MPECcable Ocaml Program
Gentuple Assembler, SAT Converter
Zchaff-II or other
SAT Result
SAT (Gives Interleaving)
UNSAT
Zcore CORE Extractor
“Explain” Error Explainer And DOT file Generator, GhostView
Printout of Cycle Revealing Error
Other Tools Developed in UV Group
• Yue (Jason) Yang’s Dissertation webpage
• Itanium Litmus-test Checker in Constraint Prolog
• NemosFinder – Easily Parameterizable Litmus-Checker Suite in Constraint Prolog
• UMM Tool – Easily Parameterizable Murphi Operational Model for writing Operational Specs of Memory Models
• DefectFinder – Demo Prototype of Memory-model Aware Race Analyzer
• Now at MSR (www.cs.utah.edu/~yyang/)
Part 3: What’s not apparent at first glance
Topics:
* Formal verification approaches to memory consistency compliance
* How to model the interface of the shared memory?
- Execution based
- IO mappings based
* What is wrong if an Execution based approach is chosen ?
- Finite-state realizability
* A transducer-based model of shared memory
- Highlights of results
* Whither undecidability ?
Formal Verification Approaches:
• Several paper-and-pencil proofs
• Arons (PVS-based)
• McMillan (CTL model-checking based)
• Nalumasu et al. (Test Automata based)
• Qadeer (1. Finding a serializer. 2. Automated for simple write order)
• Bingham et al. (Window observer based)
Spec of Shared Memory Consistency Model
Imp of Shared Memory Consistency Model (a protocol)
Agreement
Other Formal Approaches:
• Park, Dill, Nowatzyk
• Pong and Dubois (several papers)
• Collier’s work
• Ghughal’s adaptation of above for weak memory models
• Chatterjee (CAV’02)
• Yu, Tuttle, Lamport
• Shen, Arvind
• Ahamad, Neiger
• (Check webpage of MPV’00 www.cs.utah.edu/mpv )
• Steinke and Nutt
• Gibbons, Gharachorloo
• Adve, Pugh
• … (a survey will take too long)
Modeling the Interface of Shared Memory:
• Trace Based
- Most existing works
- Spec and Imp interface events: Read(proc, addr, data), Write(proc, addr, data), …
• IO Mappings Based
- The original Lazy-caching paper (casual use)
- Kawash and Higham (defines Specs this way; Implementations not addressed)
- Sezgin et al. (defines Specs and Imps + Correspondence)
- Spec and Imp inputs: Read_i(proc, addr), Write_i(proc, addr, data), …
- Spec and Imp outputs: Read_o(proc, addr, data), Write_o(proc, addr, data), …
What’s “wrong” with trace-based approaches?
• Permits making statements about uninteresting or unrealizable machines
• Muddies the exact import of the famous “undecidability result” (Alur et al.)
Example 1: Finiteness cannot be adequately described thru regular sets of executions alone…
Consider the set of executions w(1,a,2) r(1,a,1)* r(2,a,2)* w(2,a,1) -- defines the TEMPORAL order of events
All these are considered SC because we can build a LOGICAL order w(1,a,2) r(2,a,2)* w(2,a,1) r(1,a,1)*
But how can the above TEMPORAL order be generated by a FSM ?
P1: w(a,2) ; r(a,1) ; r(a,1) ; … ; r(a,1) ;
P2: r(a,2) ; r(a,2) ; … ; r(a,2) ; w(a,1) ;
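The claim that the logical order witnesses SC for this temporal order can be checked mechanically for any fixed number of repetitions. A small sketch, assuming events of the form `(kind, proc, addr, value)`, memory initialized to 0, and that each processor's program order is the order of its events in the temporal trace; the helper names are hypothetical.

```python
def is_sc_witness(temporal, logical, init=0):
    """True if `logical` rearranges `temporal`, keeps every processor's
    program order, and makes each read return the most recent write."""
    if sorted(temporal) != sorted(logical):
        return False
    for p in {e[1] for e in temporal}:
        if [e for e in temporal if e[1] == p] != [e for e in logical if e[1] == p]:
            return False
    memory = {}
    for kind, proc, addr, value in logical:
        if kind == "w":
            memory[addr] = value
        elif memory.get(addr, init) != value:
            return False
    return True

def example(k):
    """The two orders of Example 1 with each * unrolled k times."""
    temporal = ([("w", 1, "a", 2)] + [("r", 1, "a", 1)] * k
                + [("r", 2, "a", 2)] * k + [("w", 2, "a", 1)])
    logical = ([("w", 1, "a", 2)] + [("r", 2, "a", 2)] * k
               + [("w", 2, "a", 1)] + [("r", 1, "a", 1)] * k)
    return temporal, logical

temporal, logical = example(5)
print(is_sc_witness(temporal, logical), is_sc_witness(temporal, temporal))
```

The temporal order itself fails the read-value check because P1 reads 1 before any write of 1 has occurred, which is why a reordering is needed to explain it.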
Example 1: … continued (take specific unravelling of *)
Temporal Order: w(1,a,2) r(1,a,1)^(2N) r(2,a,2)^(2N) w(2,a,1)
Logical Order: w(1,a,2) r(2,a,2)^(2N) w(2,a,1) r(1,a,1)^(2N)
An FSM Implementation Of Seq Consistency With N Internal States, at three successive points:
- Program fed so far: w(1,a,2) ; Output generated so far: w(1,a,2) ;
- Program fed so far: w(1,a,2) ; { r(1,a)^K, r(2,a)^L } ; Output generated so far: w(1,a,2) ;
- Program fed so far: w(1,a,2) ; { r(1,a)^K, r(2,a)^L } ; NO w(2,a,1) Output generated so far: w(1,a,2) ; r(1,a,1) ;
FAIL ! O/P w/o Input !!
Example 1: … continued (take specific unravelling of *)
Temporal Order: w(1,a,2) r(1,a,1)^{2N} r(2,a,2)^{2N} w(2,a,1)
Logical Order: w(1,a,2) r(2,a,2)^{2N} w(2,a,1) r(1,a,1)^{2N}
< diagram: an FSM implementation of Seq Consistency with N internal states, shown at three successive points >
1. Program fed so far: w_i(1,a,2) ;
   Output generated so far: w_o(1,a,2) ;
2. Program fed so far: w_i(1,a,2) ; { r_i(1,a)^K, r_i(2,a)^L } ;
   Output generated so far: w_o(1,a,2) ;
3. Program fed so far: w_i(1,a,2) ; { r_i(1,a)^K, r_i(2,a)^L } ; w_i(2,a,1)
   Output generated so far: w_o(1,a,2) ;
FAIL ! Too many inputs w/o output
Example 1: … continued (take specific unravelling of *)
Temporal Order: w(1,a,2) r(1,a,1)^{2N} r(2,a,2)^{2N} w(2,a,1)
Logical Order: w(1,a,2) r(2,a,2)^{2N} w(2,a,1) r(1,a,1)^{2N}
< diagram: the FSM implementation of Seq Consistency with N internal states; transitions labeled by w_i(1,a,2) ; and { r_i(1,a)^K, r_i(2,a)^L } ; >
w_o(1,a,2) ; w_i(1,a,2) ; { r_i(1,a)^K, r_i(2,a)^L } ; w_i(2,a,1)
FAIL ! Too many inputs w/o output
We can “pump” this loop, thus making it possible to generate the SAME execution for arbitrarily long programs !!
Restrictions in contemporary work that enable SC verification:
• Bingham, Condon, Hu :
- Require Prefix Closure (“no outputs w/o input”) e.g. the trace of length 1 : r(1,a,1)
- Rule out Prophetic Inheritance, i.e. Temporal Orders of the form … w(1,a,2) r(1,a,1)^{2N} r(2,a,2)^{2N} w(2,a,1)
Restrictions in contemporary work that enable SC verification:
• Qadeer :
- Requires Simple Write Ordering
The order of the writes to the same address in the temporal order and the logical order must be the same
- (But they provide an automated model-checking based verification method for this class of SC protocols…)
Temporal Order: w(1,a,1); w(2,a,2); r(3,a,2); r(4,a,1)
Required Logical Order: w(2,a,2); r(3,a,2); w(1,a,1); r(4,a,1)
< diagram of Lazy Caching here >
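The Simple Write Ordering requirement can be sketched as a check in Python. This is a minimal sketch, assuming events are (kind, processor, address, value) tuples; the encoding and function names are assumptions of this sketch and not Qadeer's formulation. It compares the per-address sequence of writes in the two orders, using the temporal and logical orders listed above.

```python
from collections import defaultdict

def writes_by_address(order):
    """Per address, the sequence of (processor, value) writes in this order."""
    seq = defaultdict(list)
    for kind, proc, addr, val in order:
        if kind == "w":
            seq[addr].append((proc, val))
    return dict(seq)

def has_simple_write_order(temporal, logical):
    """True if writes to each address appear in the same order in both."""
    return writes_by_address(temporal) == writes_by_address(logical)

temporal = [("w", 1, "a", 1), ("w", 2, "a", 2), ("r", 3, "a", 2), ("r", 4, "a", 1)]
logical = [("w", 2, "a", 2), ("r", 3, "a", 2), ("w", 1, "a", 1), ("r", 4, "a", 1)]

print(has_simple_write_order(temporal, logical))  # False: the two writes to a are swapped
```

The pair above fails the check because the required logical order reverses the two writes to address a, so a protocol that produces this temporal order falls outside the simple-write-order class.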
Taxonomy of formal “SC modeling” approaches:
• Alur et al. :
- Not Necessarily Prefix Closed (NNPC) regular traces model the SC language
- Checking containment of the (regular) language of the Implementation is undecidable
• Bingham, Condon, and Hu :
- DSC trace set (Decisive Sequential Consistency)
• Sezgin’s work :
- Models memory systems using regular transducers
- Defines EXACTLY what finite-state realizable SC systems are
- SC verification is language containment
- Provides a semi-decision procedure for SC verification in this setting
Example 2 (Sezgin) : The dangers of trace-based modeling
Imagine a memory system implementation that does this:
• Accept reads and writes
• If the first |P| * |A| instructions are writes, and further these contain exactly one write by each processor to each address
THEN go into malevolent mode (disconnect the shared memory) ELSE go into benevolent mode (behave like serial memory)
< diagram: processors P1, P2, …, Pn connected to private memories M1, M2, …, Mn (malevolent mode connections) and to a Single Serial Memory Unit M (benevolent mode connections) >
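The two-mode memory system described above can be sketched in Python. This is a minimal sketch, not Sezgin's construction: the class name TwoModeMemory, the initial value 0, and the choice to update both the shared and the private copies while the mode is still undecided are all assumptions made here so that either mode can take over cleanly.

```python
class TwoModeMemory:
    """Hypothetical model of the malevolent/benevolent memory system."""

    def __init__(self, procs, addrs, initial=0):
        self.pairs = {(p, a) for p in procs for a in addrs}
        self.seen = []        # (proc, addr) of the leading writes
        self.mode = None      # None until the mode is decided
        self.shared = {a: initial for a in addrs}
        self.private = {p: {a: initial for a in addrs} for p in procs}

    def _decide(self, is_read):
        if self.mode is not None:
            return
        if is_read:
            # A read among the first |P|*|A| instructions forces benevolent mode.
            self.mode = "benevolent"
        elif len(self.seen) == len(self.pairs):
            # |P|*|A| leading writes: malevolent only if each pair occurs once.
            exact = set(self.seen) == self.pairs
            self.mode = "malevolent" if exact else "benevolent"

    def write(self, proc, addr, val):
        if self.mode is None:
            self.seen.append((proc, addr))
        if self.mode in (None, "benevolent"):
            self.shared[addr] = val
        if self.mode in (None, "malevolent"):
            self.private[proc][addr] = val
        self._decide(is_read=False)

    def read(self, proc, addr):
        self._decide(is_read=True)
        if self.mode == "malevolent":
            return self.private[proc][addr]
        return self.shared[addr]

def run(mem, program):
    """Feed a program of ("w", p, a, v) / ("r", p, a) and return the trace."""
    trace = []
    for kind, proc, addr, *val in program:
        if kind == "w":
            mem.write(proc, addr, val[0])
            trace.append(("w", proc, addr, val[0]))
        else:
            trace.append(("r", proc, addr, mem.read(proc, addr)))
    return trace

mal = TwoModeMemory([1, 2, 3], ["a"])
mal_trace = run(mal, [("w", 1, "a", 1), ("w", 3, "a", 2), ("w", 2, "a", 0),
                      ("r", 1, "a"), ("r", 2, "a"), ("r", 3, "a")])
print(mal.mode, [ev[3] for ev in mal_trace if ev[0] == "r"])  # malevolent [1, 0, 2]

ben = TwoModeMemory([1, 2, 3], ["a"])
ben_trace = run(ben, [("w", 1, "a", 2), ("r", 3, "a"), ("w", 2, "a", 1), ("r", 1, "a")])
print(ben.mode, [ev[3] for ev in ben_trace if ev[0] == "r"])  # benevolent [2, 1]
```

In malevolent mode each processor only ever sees its own writes, yet every trace the model emits still admits a serial logical order obtained by listing one processor's events after another.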
Example 2 (Sezgin) …
Example : P = {1,2,3} and A = {a} and D = {0,1,2}
w(1,a,2); r(3,a,2); w(2,a,1); r(1,a,1); …
Benevolent Mode from now on, since the second instruction is a read…
w(1,a,1); w(3,a,2); w(2,a,0); r(1,a,1); r(2,a,0); r(3,a,2); w(1,a,2); w(2,a,1); r(1,a,2); r(2,a,1); r(3,a,2); …
Malevolent Mode from now on, as we have |P| * |A| writes
< diagram: processors P1, P2, …, Pn with memories M1, M2, …, Mn and the Single Serial Memory Unit M >
LOGICAL ORDER:
w(1,a,1); r(1,a,1); w(1,a,2); r(1,a,2); w(2,a,0); r(2,a,0); w(2,a,1); r(2,a,1); w(3,a,2); r(3,a,2); r(3,a,2); …
Whoa? Any Logical Order will do?!
TEMPORAL ORDER:
w(1,a,1); w(3,a,2); w(2,a,0); r(1,a,1); r(2,a,0); r(3,a,2); w(1,a,2); w(2,a,1); r(1,a,2); r(2,a,1); r(3,a,2); …
LOGICAL ORDER:
w(1,a,1); r(1,a,1); w(1,a,2); r(1,a,2); w(2,a,0); r(2,a,0); w(2,a,1); r(2,a,1); w(3,a,2); r(3,a,2); r(3,a,2); …
• A Logical Order had better not be fiction… it should be a possible schedule in a “could have happened” sense
• Viewed from that angle, the above logical order is nonsense because it allows certain actions to be postponed unboundedly
• Sezgin’s formal definition of Implementations builds in boundedness
• BCH address an instance of this in their “past-time SC” idea
• Sezgin’s SC machines give logical order out as Commit Order …
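The unbounded postponement mentioned above can be made concrete with a small Python measurement. This is a minimal sketch: the metric (how many positions later an event appears in the logical order than in the temporal order) and the trace generator are assumptions of this sketch, not Sezgin's or BCH's formal definitions. The traces follow the malevolent-mode pattern of Example 2, with the logical order listing all of P1's events, then P2's, then P3's.

```python
from collections import defaultdict

def positions(order):
    """Map (proc, k) to the index of the k-th event of processor proc."""
    count = defaultdict(int)
    pos = {}
    for i, (kind, proc, addr, val) in enumerate(order):
        pos[(proc, count[proc])] = i
        count[proc] += 1
    return pos

def max_postponement(temporal, logical):
    """Largest (logical index - temporal index) over all matched events."""
    t, l = positions(temporal), positions(logical)
    return max(l[k] - t[k] for k in t)

def malevolent_trace(rounds):
    """Three leading writes, then `rounds` rounds of private reads."""
    writes = [("w", 1, "a", 1), ("w", 3, "a", 2), ("w", 2, "a", 0)]
    reads = [("r", 1, "a", 1), ("r", 2, "a", 0), ("r", 3, "a", 2)]
    temporal = writes + reads * rounds
    # Logical order: serialize processor by processor, as in Example 2.
    logical = [ev for p in (1, 2, 3) for ev in temporal if ev[1] == p]
    return temporal, logical

for rounds in (1, 10, 100):
    temporal, logical = malevolent_trace(rounds)
    print(rounds, max_postponement(temporal, logical))
# prints: 1 3, then 10 21, then 100 201
```

The delay of P3's write grows linearly with the length of the trace, so no fixed bound covers all executions; this is the sense in which such a logical order could not have happened on a finite-state machine.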
Status of SC “undecidability”:
• Alur et al. :
- UNDECIDABLE under NNPC
- NNPC is unrealistic
• Qadeer :
- Decidable under simple write order
- Simple Write Order rules out some protocols
• Bingham, Condon, and Hu :
- Decidable under simple write order; also in DSC_k
- These don’t capture exactly those that are FS realizable
• Sezgin’s work :
- Decidability open
- Captures exactly the class of FS realizable protocols in a detailed manner (“Input” or programs explicitly modeled)
Concluding Remarks:
• Importance of topic unlikely to diminish
• Platform compliance is a big deal
• High-performance OS kernel writers need to know
• Think of proving a distributed Garbage Collector running on a Weak Memory Model (would be a great PhD topic)
• I’ve omitted too many important names I can’t even remember
• Partial list: Adve, Gharachorloo, Pugh, Arvind, Collier, …
Acknowledgements (sorry for omissions):
• Past students / postdoc : Nalumasu, Ghughal, Mokkedem, Hosabettu, Jones, Sivaraj, Yang, Yang, Kuramkote
• Faculty colleagues : Lindstrom, Slind, Carter
• Funding agencies : NSF, SRC
• Industrial Liaisons : Corella, Chou, German, Vaid, Neiger, Zeisset, Park
• Other favorable influences : Mathews, Tuttle, Yu, Joshi, Dill, Pong, Nowatzyk, Lamport, Hu, Condon, Higham, Kawash, Jackson
• Who am I forgetting?