examples of shared memory systems

52
SRC Task 1031.001 Verification of Shared Memory Consistency (Models and) Protocols Start Date : September 2002 Third Year Annual Review, Boulder CO, March 29, 2005 Ganesh Gopalakrishnan (PI) Konrad Slind (Co-PI )

Upload: kagami

Post on 15-Jan-2016

30 views

Category:

Documents


0 download

DESCRIPTION

SRC Task 1031.001 Verification of Shared Memory Consistency (Models and) Protocols Start Date : September 2002 Third Year Annual Review, Boulder CO, March 29, 2005 Ganesh Gopalakrishnan (PI) Konrad Slind (Co-PI ). Examples of shared memory systems. ASCI-White:. (Photo courtesy LLNL / IBM). - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Examples of shared memory systems

SRC Task 1031.001

Verification of Shared Memory Consistency (Models and) Protocols

Start Date : September 2002

Third Year Annual Review, Boulder CO, March 29, 2005

Ganesh Gopalakrishnan (PI)Konrad Slind (Co-PI )

Page 2: Examples of shared memory systems

2

Examples of shared memory systems

(Photo courtesy LLNL / IBM)

http://www.theinquirer.net/?article=12145(Article by Nebojsa Novakovic: Thursday 16 October 2003)NOVA HAS been to the Microprocessor Forum and captured this picture of POWER5 chief scientist Balaram Sinharoy holding this eight way POWER5 MCM with a staggering 144MB of cache. Sheesh Kebab!

8 x 2 cpus x 2-way SMT = “32 shared memory cpus” on the palm

Released in 2000-- Peak Performance : 12.3 teraflops. -- Processors used : IBM RS6000 SP Power3's - 375 MHz. -- There are 8,192 of these processors -- The total amount of RAM is 6Tb. -- Two hundred cabinets - area of two basket ball courts.

ASCI-White:

Power-5:

Page 3: Examples of shared memory systems

3

Motivation for (weak) Shared Memory Consistency models and Protocols

• Cannot afford to do eager updates across large SMP systems

• Delayed updates allow considerable latitude in memory consistency protocol design Weak memory models tend to reduce overall protocol complexity Future challenges:

* Multiple protocols that cooperate* Multicore chips

dir dir

Chip-level protocols

Inter-cluster protocols

Intra-cluster protocols

mem mem

Memory

CPU

time

performance

Page 4: Examples of shared memory systems

4

Objectives of our Research

• Verification of shared memory consistency protocols

• Emphasis is on correctness in two broad areas:– Shared Memory Consistency Models

• Also known as “Memory Models”– Shared Memory Consistency Protocols

• Subsumes Cache Coherence protocols

• Demonstrate results on realistic memory models as well as consistency protocols

Page 5: Examples of shared memory systems

5

Emphasis of our Research

• Protocols modeled at the asynchronous “rule” level

• Assume that these are highly optimized hand-crafted protocols for which synthesis is not (yet) an option

• We are yet to work on protocol engine (hardware) verification

Page 6: Examples of shared memory systems

6

Organization of Presentation

• 15-minute overview

• Then 15-minutes of a few specific details

• Finally an Appendix

• Presented in Chronological Order, emphasizing – Year of task performance (Y1, Y2, or Y3) in slide heading– Student involvement, graduation, and employment– How much supported by SRC– Highlights (papers, ideas, and tool development)

• Slides with“Y3” in the title include new work since last review

Page 7: Examples of shared memory systems

7

15-minute overview

Page 8: Examples of shared memory systems

8

Overview of Results to Date

6

5

4

3

2

1

#

None ; Dissertation available

Ali Sezgin (PhD’04; now Asst Prof, Atilim Univ, Turkey). Was NSF funded (not SRC)

Theoretical understanding Decidability + How to specify memory models

75%: Y1, Y2 25%: Y3

UCLID code used for 3 protocols available

Sudhindra Pandav (MS’05 ; just joined Intel Oregon)

Parameterized Cache Coherence Protocol Verification

10%: Y290%: Y3

MP Execution Checker (MPEC) available

Yu Yang (first-year PhD student) H. Sivaraj, PI

Shared Memory Consistency Execution Checker for Itanium

100%: Y3

Partial Order enabled Murphi (POeM) available

Ritwik Bhattacharya (PhD expected Sep’05)

Symbolic Partial Order Reduction

50%: Y1, Y2, 50%: Y3

Checkers in Murphi as well as Constraint Prolog available

Yue Yang (PhD’04; now with Microsoft, Redmond)

Specification of Memory Models

90%: Y1, Y210%: Y3

Parallel Murphi and also Distributed Random-walk Murphi available

Hemanthkumar Sivaraj (MS’04 ; now with Intel Corporation, Bangalore, India)

Distributed Model Checking

100%: Y1,Y2

SoftwareStudents or other researchers

Research TopicWhen Performed + % of effort

Page 9: Examples of shared memory systems

9

How our students have benefited from SRC Funding (* = students funded under SRC; others under contemporaneous NSF grant with complementary goals )

• Will join Intel Oregon for a 6-month internship on HW Verification (FPU)

PhD expected in Spring 2007Robert Palmer

• Worked on the MPEC tool • Working for PhD on verifying Multi- threaded Software

PhD expected Spring 2008 (?)

Yu Yang

Working with Steven German (IBM) and SeungJoon Park (Intel)

PhD expected in Fall 2005Ritwik Bhattacharya *(Steven German)

Joined Atilim University, Turkey (November ’04)

PhD finished in 2004Ali Sezgin

Joined Microsoft (July ’04)PhD finished in 2004Yue Yang *

• Joined Intel Oregon (March ’05)• Did one summer internship at Intel Santa Clara (Summer ’04)

MS finished in 2005Sudhindra Pandav *(Ching Tsun Chou)

Joined Intel Bangalore (July ’04)MS finished in 2004Hemanthkumar Sivaraj *(Ching Tsun Chou andKushagra Vaid)

Industrial Contacts / contributions to our overall efforts

Degree finished or pursuedStudent (Mentor)

Page 10: Examples of shared memory systems

10

1: Distributed Model Checking (Y1, Y2)

• See previous annual review slides for details

• Work in progress now:– Distributed Random-walk does not record error

trails • Emphasis: high state-generation rate + bug location

– Error-trails will be generated using Bounded Model-checking

• Symbolic techniques to reconstruct error trails, capitalizing on knowledge of error state

• Student involved: Xiaofang Chen (PhD student just shifting from systems research to FV)

Page 11: Examples of shared memory systems

11

2: Specification of Memory Models (Y1,Y2,Y3)

• Operational Approach (see appendix)– Abstract Machine Models

• Axiomatic Approach (see appendix)– Constraints that Characterize Legal Executions

• Theoretical understanding (see Sezgin’s dissertation)– Definition of memory models using transducers;

clarification of widely publicized undecidability results

Page 12: Examples of shared memory systems

12

2.3: A fully SAT-based approach – Y3

• Yu Yang ( != Yue Yang), Hemanthkumar Sivaraj, the PI– A parser for value-annotated Itanium MP Assembly Programs – A tool called MPEC to check value-annotated Itanium

assembly programs against the Itanium memory model– Generates SAT instances from given execution trace and the

memory model rules• If SAT, produce witness (interleaving)• If UNSAT, extract evidence (UNSAT Core + draw a cycle for

user annotated with which memory model rules were violated)

• Highlights– The MPEC tool (described next slide) was released– Papers: CAV’04

Page 13: Examples of shared memory systems

13

2.3: How MPEC Works:

Itanium Ordering rules in HOL

CheckerProgram Derivation(by hand, presently)

Checker Program

Satisfiability Problem with Clauses carrying annotations

Sat Solver

SatUnsat

Explanationin the form ofone possibleinterleaving

Unsat CoreExtraction using Zcore

P

st [x] = 1

mf

ld r1 = [y] <0>

R

ld.acq r2 = [y] <1>

ld r3 = [x] <0>

Q

st.rel [y] = 1

• Find Offending Clauses• Trace their annotations• Determine “ordering

cycle”

MP execution to be verified

Page 14: Examples of shared memory systems

14

2.3: …contd…

• What exactly has been achieved?– How to specify industrial memory models in higher order logic

– How well does SAT-based Checking of MP Executions work/scale

– How to extract Annotated Execution Cycles from UNSAT Cores

– A Preliminary Assessment of QBF (yet to continue)

– Code released to check Itanium Executions • 1700 lines of Ocaml• 5600 lines of C++

Page 15: Examples of shared memory systems

15

2.3: …contd…

• Potential uses / impact– MPEC (MP Execution Checker) can help understand the

Itanium memory model through “litmus tests”

– Scaling to larger execution lengths will require either• Debugging approach (not full verification) -- like the Sun TSO-Tool• Considerable SAT engineering needed (not justified now, unless a

member company shows definite interest)

– An MS student (Oystein Thorsen) at Michigan Technological University is porting MPEC to work for the “Unified Parallel C” memory model

• UPC is used to write shared memory code for Scientific Programming

– Collaboration with Lisa Higham (University of Calgary) initiated

Page 16: Examples of shared memory systems

16

3: Partial Order enabled Murphi (Bhattacharya’s PhD) – much of the work during Y3

i : 0…2 ; a : 0..3 ; b : array[0..2] of bool

Rule “1” :True a := (a+1) % 4

Rule “2” :True a := (a+2) % 4

Rule “3” : i >= 0 b[i+1] := True

Rule “4” : i <= 0 b[i+2] := False

Same state

Rule 1

Rule 2Rule 1

Rule 2

Any state S

Same state

Rule 3

Rule 4Rule 3

Rule 4

Any state S

(Final value of a is the same)

(Different array locationsget updated each time)

Tools such as SPIN will considerRule 1 and 2 non-commutingSame with Rule 3 and 4

PO Reduction through SAT-basedSymbolic Analysis

Page 17: Examples of shared memory systems

17

3: POeM (..contd)

• What exactly has been achieved ?– New tool POeM (Partial Order Enabled Murphi) developed

and is available• 2900 lines of Common Lisp and 1400 lines of Perl

– Technical Report detailing algorithm– One paper under submission (DAC’04); others in preparation

• A useful by-product of POeM:

– A tool Mu2SMV to translate Murphi to SMV (…Uclid is similar to SMV, so would be able to re-target)

– It can help create SMV / Uclid models– It also helps conduct more objective “explicit-state vs.

implicit-state” model-checking studies

Page 18: Examples of shared memory systems

18

3: …contd…

• Potential uses / impact due to POeM– Experimental evidence strongly suggests this tool can reduce

the number of states visited during model-checking • Memory is still a problem in explicit-state enumeration

• Further tool engineering will make it very usable– Reduce time to do symbolic analysis

• Use better SAT Methods– Path to UCLID has been implemented

• Use First-order Decision Procedures

– Reduce runtime overhead for model-checking• Engineer code of Murphi to reduce ample-set computation

overhead

– Combine Symmetry and Partial Order Reduction• Theoretically sound for safety ; will be an “industrial scale”

implementation if we succeed

Page 19: Examples of shared memory systems

19

3: Specific Contributions

• New Ample-set Computation Algorithm for Murphi – Basically reported during Y2 ; novelty during Y3

are:• The POeM Tool has been released

– being tried at Intel by Park and IBM by German

• It employs a novel “SAT Carry-over” idea• A new “C1 condition” will soon be incorporated

• In the “details” section, we will focus on

– SAT Carry-over (discussed as topic 3.1), and

– The new C1 condition (discussed as topic 3.2)

Page 20: Examples of shared memory systems

20

6: Parameterized Verification of Coherence Protocols – work done entirely during Y3

• Sudhindra Pandav’s MS work– Counterexample Guided Invariant Discovery method– Tailored for Cache Coherence Protocols

• Takes advantage of the “guard action” style of specification, and the syntax of the property being verified

• Filtering Heuristics to eliminate irrelevant predicates are based on the nature of directory-based protocols

– All experiments done in the UCLID framework

• Can be performed in any system that decides over a similar fragment of logic and generates concrete counterexamples

Page 21: Examples of shared memory systems

21

6: ..contd…

• What exactly has been achieved?– Methodology for Parameterized Cache Protocol Verification based on Counter-

example Guided Invariant Discovery

– Worked “straight out of the box” on a new protocol called the German Ring protocol

– Also applied to the German protocol and the FLASH protocol (modeling both mutual exclusion and data consistency)

– 9 invariants for the mutual exclusion property of German ; 2 more for data consistency (29 for Lahiri)

– 7 invariants for Mutex of FLASH ; 15 more for data consistency (more frugal invariants than in Park’s work using PVS)

– 2 days to model and verify the new German Ring protocol

Page 22: Examples of shared memory systems

22

15 minutes of details

Page 23: Examples of shared memory systems

23

Basics of POeM

• Accept a guarded command system of transitions– G1 A1

– G2 A2

– …

– Gn An

• Build independence matrix by checking enabledness and commutativity via SAT-based analysis

• At run-time, pick a transition and build its dependence closure

• Check C0, C1, C2, C3 conditions of the CGP book and generate reduced state-space– C1 is approximated

Enablednesspluscommutativity

Page 24: Examples of shared memory systems

3.1: Carry-Over of Independence Relations

• When does it work?– Systems parameterized over scalarset variables– Rulesets based on system parameters– Variables parameterized on system parameters

• Proof, details - [BGG05]

POeM

independent

Ruleset 1 Ruleset 2

POeM

independent

Ruleset 1 Ruleset 2

Page 25: Examples of shared memory systems

3.2: Improved Ample Set Heuristic

• Ideal : No path via the green triangle will ever “wake-up” the disabled red transitions

• Existing naïve C1– Make sure that the red triangle is empty- Causes too many C1 condition Violations (16k in German)

• Improved heuristic– Ensure that the red ones can be woken

up ONLY by the firing of the blue triangle of transitions

– Can pre-compute it -- like “independence”

– Experimental results are being awaited !!

Transition t

t’s dependency closure

Disabled dependents

Enabled independents

Page 26: Examples of shared memory systems

26

The POeM Tool and Follow-on Efforts

• The POeM tool will be engineered for higher efficiency

• We will be studying what other property might “carry over” similar to independence

Page 27: Examples of shared memory systems

27

6: Counterexample Guided Invariant Discovery – details of the work

Pun-proven

Pproven

Pun-proven={}“done” Y

N

Pick a propertyP from Pun-proven

Automated Decision

Procedure D

System model M

counterexample

CounterexampleAnalysis

Procedure

AuxiliaryInvariant

AddP

Page 28: Examples of shared memory systems

28

• Exploits structure of transition system specification– Rules of the form “guard action” or “g a” are assumed

• Exploits nature of property to be proved– Properties of the form “Antecedent Consequent”

• How the method works: – Start with most general state– Symbolically simulate a transition– Analyze failures exploiting structure of specifications– Construct auxiliary invariant– If strengthening involves too large a formula, use filtering

heuristics to keep it small• Filtering heuristics exploit nature of directory protocols

6: Highlights of the work

Page 29: Examples of shared memory systems

Structure of a counterexample

• Formally, a counterexample C can be expressed as a tuple <s, , t >, where

– s is the initial state interpretation– t is the next state interpretation– is the transition rule of form: g => a

• Depending on the structure of the counterexample, we construct an auxiliary invariant from

– those predicates in the property P, which have been violated

– predicates from the guard of the rule involved, and ITE (“if-then-else”) conditions in the action

Page 30: Examples of shared memory systems

Counterexample cases

• Counterexample C = <s, , t >

• Property P: X.A(X) => C(X)

• Since, the initial state satisfies the property and the next state violates it, we can classify counterexamples into three different classes.

– (s |= A, s |= C) (t |= A, t !|= C)

– (s !|= A, s !|= C) (t |= A, t !|= C)

– (s !|= A, s |= C) (t |= A, t !|= C)

Page 31: Examples of shared memory systems

Case I: (s |= A, s |= C) (t |= A, t !|= C)

s tg

AC

A~C

A => ~g

SC(A,s)

SC(g,s

)

• Candidate invariant generated: SC(A, s) => ~SC(g,s)

where SC is the satisfying core under interpretation s. SC can be easily computed from the structure of formula SC includes “relevant predicates”

Transition assigned variable of consequent; suppress this when the antecedent is initially true.

Page 32: Examples of shared memory systems

Case II: (s !|= A, s !|= C) (t |= A, t !|= C)

s tg

~A~C

A~C

a

retains

g => C

VC(C, s)

SC(g, s)

•Candidate invariant generated:

~VC(C,s) => ~SC(g, s)

where VC is the violating core under interpretation s. VC can be easily computed from the structure of formula

Pandav’s thesis explains more general versions of these ideas

Transition assigned variable of antecedent; suppress this when the consequent is initially false.

Page 33: Examples of shared memory systems

cid = i

Cache(j) = ex

Cache(i) = in

Example from the German protocol

• Property P: cache(i) = ex cache(j) = in ;

s t

g: ch2(cid) = grant_ex

a: cache(cid) :=ex

~(cache(j) = in) => ~(ch2(i) = grant_ex)

Ch2(i) = grant_ex Cache(i)=exCache(j)=ex

Aux. Invariant Generated

Page 34: Examples of shared memory systems

Other Heuristics: Consistency Requirement

• Verifying predicate of form (p = r) in the consequent.

• Counterexample of form:

s tg

Action: p := qq != r p != r

Candidate invariant of form:g => (q = r)

This simple heuristic was very useful in data consistency verification of GERMAN and FLASH.

Page 35: Examples of shared memory systems

Filtering Heuristics

• Guards of transition rules are complex, containing many predicates.– More the number of predicates in an auxiliary

invariant, more the number of counterexamples

• Need heuristics to filter out irrelevant predicates– Used in data consistency verification of FLASH– Heuristics are protocol dependent– Observed to be successful in handling three

directory based protocols • German, German Ring, FLASH

– Mutual Exclusion and Data Consistency verified

Page 36: Examples of shared memory systems

Filtering Heuristics (contd.)

• Based on rule-type that triggered counter-example• Directory protocols employ P-rules and N-rules

– P-Rule : Initiated by requesting node– N-Rule: Initiated by messages from network

(remote node)• Further classified into Remote Requests and

Grants• Our method describes heuristics based on

– message-type involved in counter-examples, and

– a ranking of variables

Page 37: Examples of shared memory systems

37

Overview of Results

Page 38: Examples of shared memory systems

German Ring: mutex verification

• Property to be verified:~((i!=j) & cache(i)=excl & cache(j)=excl))

• Result:– 3 invariants.– Uclid time: 1.08s– User time: a day

Page 39: Examples of shared memory systems

Comparison: GERMAN

• Mutual Exclusion (GERMAN-I)– Lahiri’s earlier manual proof had 25 auxiliary

invariants and took 47.93s of uclid time.– Ours: 9 auxiliary invariants with 6.02s

• Mutual exclusion (GERMAN-II)– Lahiri’s indexed predicate discovery method to

construct inductive invariant.• Automated - had 24 predicates, 143s uclid time

– Our method: manual, 13 predicates, 2.16s

Page 40: Examples of shared memory systems

Comparison: FLASH

Mutual Exclusion:• Automated predicate discovery method

based on computing weakest precondition (Lahiri) to generate inductive invariant

• Initial set of predicates: – From the property failed to converge to a

fixpoint– From our auxiliary invariants converged to

inductive invariant in 3 iterations.

Page 41: Examples of shared memory systems

Modeling tricks: divide rules

• UCLID cannot model nested ITE’s.• Transition rules contain nested ITE’s• Trick: Split the rules

R: if (c1) then b1 else if (c2) then b2 ;

• Split into two rules– R1: if (c1) then b2 ;

– R2: if (~c1 & c2) then b2 ;

Page 42: Examples of shared memory systems

Modeling tricks: quantified conditions in guards

• UCLID doesn’t allow quantifiers (, )in the description.• Conditions like Qx. a(x) present in the guards, where Q ∈{, } . Example: emptiness test of sh_list in

GERMAN.• Consider a predicate of form: x. a(x)• To model this, we introduce an auxiliary boolean variable

b. Replace x. a(x) by b.• Then introduce an axiom of form: x.(a(x) => b) Justification:

b x. a(x) => (x. a(x) ) => b => x.(a(x) => b)

Page 43: Examples of shared memory systems

43

Status of Invariant Discovery work

• More experience is necessary with our invariant discovery approach – Assess how easy in practice on a variety of

protocols

• Make the process intuitive to a designer– Support front-ends such as tabular descriptions or

scenarios

– Reflect error-trails back onto these descriptions

– Allow designers to understand verification state through controlled symbolic simulations

Page 44: Examples of shared memory systems

44

1) We are assembling a suite of examples including many public examples such as Grbic’s “multi-ring” protocol

2) We will finish the work on the random-walk checker, coupling it with an error-trace generator.

3) We’d like to work on using “scenarios” in cache protocol design

4) We will begin our work on hierarchical cache protocols

5) We’d like to return to the post-Si verification problem of “runtime verification under limited observability.”

6) Checking Memory Orderings

7) Liveness (perhaps as bounded safety, focussing on architectural elements responsible for liveness violation…?)

8) ABSTRACTION MECHANISMS !!

Concluding Remarks and Future Work

Page 45: Examples of shared memory systems

45

Appendix

Page 46: Examples of shared memory systems

46

2.1: The Operational Approach to memory model specification (mostly finished Y1, Y2; one dissertation finished Y3)• Developed by Yue Yang (PhD under the PI and

Lindstrom)– Defended PhD June’04 and working for Microsoft since July’04– Developed the Uniform Memory Model based on abstract

machines– Coded in Murphi (obtain executable memory model specs)– The UMM parameterized executable models can cover a whole

range of memory models• Coherence, PRAM, and Manson / Pugh Java shared memory

have all been specified in UMM

• Papers– Concurrency and Computation : Practice and Experience

(galley proofs of journal paper done)– Joint ACM Java Grande, 2002

Page 47: Examples of shared memory systems

47

2.1: …contd…

• What exactly has been achieved?– Understanding of how to structure operational

memory model specifications to cover wide range of memory models uniformly

– Murphi code of ~2800 lines demonstrating ideas

• Potential uses– Starting-point for developers of new memory

models– Our Itanium Operational model of ICCD’99 could

be specified in the UMM style if there is interest…

Page 48: Examples of shared memory systems

48

2.2: …contd…

• What exactly has been achieved?– Understanding of how to structure axiomatic

memory model specifications to cover wide range of memory models uniformly

– Constraint-Prolog code of ~6000 lines demonstrating ideas

• Potential uses– Understand memory models ; experiment with

them in the Constraint Logic-programming Context– Can develop a memory-model sensitive race

analyzer (ICFEM’04)

Page 49: Examples of shared memory systems

49

2.2: The Axiomatic Approach (mostly finished Y1, Y2; SAT approach + dissertation finished Y3)

• Constraint Prolog programs for• Classical Memory Models (Processor Consistency, etc.)• Itanium (without semaphores, IO space ops, or partial

writes)• Can add these if member companies interested

– Feasibility of SAT-based solution demonstrated (coding by PI)

• Highlights– The MPEC tool (described next slide) was a direct

an outcome– Papers: Charme’03 ; ICFEM’04 ; IPDPS’04 ; CSJP’04

Page 50: Examples of shared memory systems

50

3: A Symbolic PO Reduction Method for Rule-based Specifications of Protocols (Y2, Y3)

• Ritwik Bhattacharya’s PhD work– The industry employs “rule-based” specifications

for cache coherence protocols• Murphi (used by most industries)• TLA+ (used at least within Intel)

– Rule-based specifications capture the parallel “condition / action” style specification of cache coherency engines in Protocol Tables

• Specifies the concurrency in cache protocol engines naturally

• Specifies the low-latency “bursts” of computation:– Check local state and incoming messages– Produce outgoing messages and update local state

Page 51: Examples of shared memory systems

51

3: State Space Reduction Techniques in Murphi • Symmetry Reduction

– Canonicalize based on Scalar Sets

• Hash Compaction– Store only Hash Signatures of states

• Partial-Order Reduction is not available in Murphi– In a nutshell :

• If N concurrent actions can all be done in any order, then don’t force the model-checker to examine all N!

interleavings

– Difficult to syntactically determine when two rules commute– Existing PO Reduction algorithms are based on

• Syntactic checking of commutation between two transitions• Exploiting Sequential Process Structure

These don’t work with Murphi

Page 52: Examples of shared memory systems

52

My favorite example explaining the “C1” condition

Ch ! 3 Ch?x true

bugProcess P Process Q

• Moving P then Q finds the bug (right ample-set)

• Moving Q then P will miss the bug! Reason: there is a transition outside of ample set (namely Ch!3) whose move enables Ch?x that is dependent on the “true” move (part of if/fi). So in the full state-space there is a dependent move that occurs before ample-set move.

Ch empty in this state

Some transition