Practical Model-Checking Method For Verifying Correctness of MPI Programs

Argonne National Laboratory; School of Computing and SCI Institute, University of Utah

Salman Pervez, Ganesh Gopalakrishnan, Robert M. Kirby, Robert Palmer
School of Computing, University of Utah

Rajeev Thakur, William Gropp
Mathematics and Computer Science Division, Argonne National Laboratory

Thesis of the Talk

• Concurrent algorithms are notoriously hard to design and verify.

• Formal methods, and in particular finite-state model checking, provide a means of reasoning about concurrent algorithms.

• Principal advantages of the model checking approach:
  – Provides a formal framework for reasoning
  – Allows coverage: examination of all possible process interleavings

• Principal challenges of the model checking approach:
  – Requires a modeling step
  – Can lead to “state explosion”

Thesis: In-situ model checking with dynamic partial-order reduction provides the advantages of the model checking approach while ameliorating its challenges.
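To make the “state explosion” challenge concrete: if n processes each execute k atomic steps, there are (nk)!/(k!)^n distinct interleavings. The C sketch below is only an illustration (count_interleavings and binomial are hypothetical helpers, not part of the tool); it evaluates that multinomial as a product of binomial coefficients so intermediate values stay small.

```c
#include <assert.h>
#include <stdint.h>

/* Binomial coefficient C(n, r). After iteration i the running value equals
 * C(n - r + i, i), so every division is exact. */
static uint64_t binomial(unsigned n, unsigned r)
{
    uint64_t result = 1;
    assert(r <= n);
    for (unsigned i = 1; i <= r; i++)
        result = result * (n - r + i) / i;
    return result;
}

/* Distinct interleavings of n processes that each execute k atomic steps:
 * the multinomial (n*k)! / (k!)^n, evaluated as
 * C(k,k) * C(2k,k) * ... * C(nk,k). Overflows only for large n and k. */
uint64_t count_interleavings(unsigned n, unsigned k)
{
    uint64_t total = 1;
    for (unsigned i = 1; i <= n; i++)
        total *= binomial(i * k, k);
    return total;
}
```

For example, count_interleavings(2, 3) is 20, while four processes with only three steps each, count_interleavings(4, 3), already admit 369600 schedules.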


Why MPI is Complex: Collision of Features

– Send

– Receive

– Send / Receive

– Send / Receive / Replace

– Broadcast

– Barrier

– Reduce

– Rendezvous mode

– Blocking mode

– Non-blocking mode

– Reliance on system buffering

– User-attached buffering

– Restarts/Cancels of MPI Operations

– Non-wildcard receives

– Wildcard receives

– Tag matching

– Communication spaces

An MPI program is an interesting (and legal) combination of elements from these spaces.
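Wildcard receives, tag matching and communication spaces meet in MPI's matching rule: a posted receive matches a message only within the same communicator, and only if its source and tag either equal the message's or are wildcards. A minimal C sketch, assuming a hypothetical struct envelope and stand-in constants ANY_SOURCE and ANY_TAG in place of MPI_ANY_SOURCE and MPI_ANY_TAG (whose real values are implementation-defined):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG (hypothetical values). */
#define ANY_SOURCE (-1)
#define ANY_TAG    (-1)

/* Envelope of a posted receive or of a message in flight. */
struct envelope {
    int source;  /* sending rank; ANY_SOURCE allowed on the receive side */
    int tag;     /* message tag; ANY_TAG allowed on the receive side */
    int context; /* communication space, i.e. the communicator */
};

/* True if the posted receive `recv` can match the message `msg`. */
bool receive_matches(struct envelope recv, struct envelope msg)
{
    if (recv.context != msg.context)
        return false;  /* different communication spaces never match */
    if (recv.source != ANY_SOURCE && recv.source != msg.source)
        return false;
    if (recv.tag != ANY_TAG && recv.tag != msg.tag)
        return false;
    return true;
}
```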


Conventional Debugging of MPI

• Inspection
  – Difficult to carry out on MPI programs (low-level notation)

• Simulation based
  – Run the given program with manually selected inputs
  – Can give poor coverage in practice

• Simulation with runtime heuristics to find bugs
  – Marmot: timeout-based deadlock detection, random executions
  – Intel Trace Collector: similar checks, plus data checking
  – TotalView: better trace viewing, but still no “model checking”(?)
  – We don’t know whether any of these offer formal coverage metrics


What is Model Checking?

The Navier-Stokes equations are a mathematical model of fluid-flow physics.

“V&V”: Validation and Verification, i.e. “Validate Models, Verify Codes”

“Formal models”, which translate and abstract algorithms and implementations, can be generated either automatically or by a modeler.


Related Work on Formal Verification (FV) of MPI Programs

• Main related work is that by Siegel and Avrunin

• Provide synchronous channel theorems for blocking and non-blocking MPI constructs
  – Deadlocks are caught iff they are caught using synchronous channels

• Provide a state-machine model for MPI calls
  – Have built a tool called MPI-SPIN that uses C extensions to Promela to encode the MPI state machine

• Provide a symbolic execution approach to check computational results of MPI programs

• Define a static POR algorithm that ameliorates the second challenge (state explosion)
  – Schedules processes in a canonical order
  – Schedules sends when receives are posted (synchronous channel effect)
  – Handles wildcard receives through over-approximation


Traditional Execution Checking Versus Model Checking

(Figure panels: “Execution Checking” and “Model Checking”)

In current practice, concrete executions on a few diverse platforms are often used to verify algorithms and codes.

Consequence: many feasible executions might not be manifested.

Model checking forces all executions of a judiciously down-scaled model to be examined.

Current focus of our research: minimize modeling effort and error.
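A small example shows why examining every execution matters. The sketch below assumes a hypothetical two-process program in which each process increments a shared counter non-atomically (a read, then a write). A depth-first search over all interleavings finds 6 executions, 4 of which lose an update; a handful of concrete runs could easily observe only the 2 correct ones. This is plain C illustrating exhaustive exploration, not the authors' checker.

```c
#include <assert.h>

/* Hypothetical program: each of NPROC processes increments a shared
 * counter non-atomically, as a read step followed by a write step. */
enum { NPROC = 2, NSTEPS = 2 };

struct state {
    int shared;      /* the shared counter */
    int reg[NPROC];  /* each process's private copy of the counter */
    int pc[NPROC];   /* next step per process: 0 = read, 1 = write */
};

/* Execute the next step of process p. */
static void step(struct state *s, int p)
{
    if (s->pc[p] == 0)
        s->reg[p] = s->shared;      /* read */
    else
        s->shared = s->reg[p] + 1;  /* write */
    s->pc[p]++;
}

/* Depth-first search over every interleaving from state s. Counts complete
 * executions in *total and those that lose an update (final counter
 * != NPROC) in *bad. States are passed by value, so returning from a
 * recursive call is the backtracking step. */
static void explore(struct state s, int *total, int *bad)
{
    int progressed = 0;
    for (int p = 0; p < NPROC; p++) {
        if (s.pc[p] < NSTEPS) {
            struct state next = s;
            step(&next, p);
            explore(next, total, bad);
            progressed = 1;
        }
    }
    if (!progressed) {  /* every process has finished */
        (*total)++;
        if (s.shared != NPROC)
            (*bad)++;
    }
}

int count_executions(void)
{
    struct state init = {0};
    int total = 0, bad = 0;
    explore(init, &total, &bad);
    return total;
}

int count_lost_updates(void)
{
    struct state init = {0};
    int total = 0, bad = 0;
    explore(init, &total, &bad);
    return bad;
}
```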


Solution: Runtime (i.e. “In Situ”) Model Checking

• Pioneered by Patrice Godefroid (at Bell Labs)
• Developed in the context of his VeriSoft project; he called it runtime model checking.
• Flanagan and Godefroid introduced the dynamic partial-order reduction algorithm in 2005.

“In Situ” Model Checking

Fundamental challenges of model checking:
• Model creation (and validation)
• Managing state explosion

Ameliorate the first challenge by running instrumented versions of the code.

Ameliorate the second challenge by pruning the state space based upon independence of operations.
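The independence used for pruning can be illustrated as commutativity: two operations are independent in a state if executing them in either order reaches the same state, so only one order needs exploring. The C sketch below uses a hypothetical two-cell window and operations loosely modeled on MPI_Put (overwrite) and MPI_Accumulate with MPI_SUM (add). It checks commutation in one given state only; a real checker needs independence to hold in every state where both operations are enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-sided operations on a two-cell window. */
enum op_kind { PUT, ACC_SUM };

struct op {
    enum op_kind kind;
    int cell;   /* target index in the window (0 or 1) */
    int value;
};

struct window { int cell[2]; };

/* Apply one operation to a copy of the window. */
static struct window apply(struct window w, struct op o)
{
    if (o.kind == PUT)
        w.cell[o.cell] = o.value;   /* overwrite, like a put */
    else
        w.cell[o.cell] += o.value;  /* add, like a sum accumulate */
    return w;
}

/* True if a and b commute when started from state w. Neither operation can
 * enable or disable the other here, so commutation is the only condition. */
bool independent_at(struct window w, struct op a, struct op b)
{
    struct window ab = apply(apply(w, a), b);
    struct window ba = apply(apply(w, b), a);
    return ab.cell[0] == ba.cell[0] && ab.cell[1] == ba.cell[1];
}
```

Two sum-accumulates to the same cell commute, whereas two puts of different values to the same cell do not, so only the latter pair forces the checker to explore both orders.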


Our Contribution: In Situ Model Checker For MPI

(Figure: Process 0, Process 1, Process 2 and Process 3 each talk to a central Scheduler via socket communication.)

Consider wildcard receives and their interleaving.
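Wildcard receives are what make the scheduler's job interesting: a receive naming a specific source has at most one possible sender, but an MPI_ANY_SOURCE receive may be satisfied by any of several pending sends, and each choice is a separate branch to explore. The C sketch below only enumerates those branches; struct pending_send, match_candidates and the stand-in constant ANY_SOURCE are hypothetical, and the real scheduler's data structures are not shown in the talk.

```c
#include <assert.h>

#define ANY_SOURCE (-1)  /* stand-in for MPI_ANY_SOURCE */

/* A send that has been issued but not yet matched, listed in the order
 * the sends were issued (hypothetical type). */
struct pending_send {
    int source;  /* rank of the sender */
    int tag;
};

/* Stores in out[] the distinct ranks whose pending sends could satisfy a
 * receive for (source, tag) and returns how many there are; out must have
 * room for nsends entries. Because MPI messages from one sender do not
 * overtake each other, each rank is recorded only once. */
int match_candidates(const struct pending_send *sends, int nsends,
                     int source, int tag, int *out)
{
    int count = 0;
    for (int i = 0; i < nsends; i++) {
        if (sends[i].tag != tag)
            continue;
        if (source != ANY_SOURCE && sends[i].source != source)
            continue;
        int seen = 0;
        for (int j = 0; j < count; j++)
            if (out[j] == sends[i].source)
                seen = 1;
        if (!seen)
            out[count++] = sends[i].source;
    }
    return count;
}
```

A return value greater than one marks a scheduling point where the checker must try each candidate sender in turn.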


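Code to handle MPI_Win_unlock (in general, every MPI_SomeFunc wrapper is structured this way):

int MPI_Win_unlock(int rank, MPI_Win win) {
    int flag;
    /* Report the call to the scheduler over the socket. */
    sendToSocket(pID, Win_unlock, rank, win);
    /* Wait for the scheduler's go-ahead, poking the progress engine meanwhile. */
    while (recvFromSocket(pID) != go_ahead)
        MPI_Iprobe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &flag, MPI_STATUS_IGNORE);
    /* Perform the real operation through the MPI profiling interface. */
    return PMPI_Win_unlock(rank, win);
}

The MPI_Iprobe call is an innocuous progress-engine “poker”, introduced for handling one-sided MPI.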


Current MPI Constructs Examined

• MPI constructs examined:
  – MPI_Init
  – MPI_Send
  – MPI_Ssend
  – MPI_Recv
  – MPI_Barrier
  – MPI_Finalize
  – MPI_Win_lock
  – MPI_Win_unlock
  – MPI_Put
  – MPI_Get
  – MPI_Accumulate

Supporting these required creating code that communicates with the scheduler.

It also required understanding how the progress engine works in MPICH (with adjustments to the scheduler to employ this information judiciously).

MPI One-Sided Example

Processes P0 and P1 both run the same program:

0: MPI_Init
1: MPI_Win_lock
2: MPI_Accumulate
3: MPI_Win_unlock
4: MPI_Barrier
5: MPI_Finalize

Stepping through the scheduler (current position given as P0 / P1):

• Current position NULL / NULL; scheduler options P0:0 and P1:0; scheduler choice P1:0
• Current position NULL / P1:0; options P0:0 and P1:1; choice P1:1
• Current position NULL / P1:1; options P0:0 and P1:2; choice P1:2
• Current position NULL / P1:2; options P0:0 and P1:3; choice P1:3
• Current position NULL / P1:3; options P0:0 and P1:4; choice P1:4
• Current position NULL / P1:4; options P0:0 only (P1 is waiting in MPI_Barrier); choice P0:0
• Current position P0:0 / P1:4; options P0:1; choices P0:1 through P0:4
• Current position P0:4 / P1:4; options P0:5 and P1:5; choice: ?

Does it matter which choice it makes? Are these independent?
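The walkthrough can be reproduced with a toy simulation. The C sketch below is hypothetical: it assumes the scheduler always grants the enabled operation of the highest-numbered process (a policy chosen only because it yields the same choices as the slides) and that a process that has executed MPI_Barrier (op 4) stays blocked until every process has done so. run_schedule returns the granted operations in order, ending with one arbitrary resolution of the final P0:5 versus P1:5 choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Both processes run ops 0..5; op 4 is MPI_Barrier. */
enum { NPROCS = 2, NOPS = 6, BARRIER_OP = 4 };

/* Returns the granted operations, e.g. "P1:0 P1:1 ...". Policy (an
 * assumption): grant the highest-numbered enabled process. */
const char *run_schedule(void)
{
    static char trace[128];  /* large enough for this 12-step trace */
    int pc[NPROCS] = {0};    /* index of each process's next op */
    size_t used = 0;

    trace[0] = '\0';
    for (;;) {
        /* How many processes have already executed the barrier? */
        int arrived = 0;
        for (int p = 0; p < NPROCS; p++)
            if (pc[p] > BARRIER_OP)
                arrived++;

        int choice = -1;
        for (int p = 0; p < NPROCS; p++) {
            int finished = pc[p] == NOPS;
            int blocked = pc[p] == BARRIER_OP + 1 && arrived < NPROCS;
            if (!finished && !blocked)
                choice = p;  /* later (higher) ranks overwrite earlier ones */
        }
        if (choice < 0)
            break;  /* nothing enabled: all done (or deadlocked) */

        used += (size_t)snprintf(trace + used, sizeof trace - used, "%sP%d:%d",
                                 used ? " " : "", choice, pc[choice]);
        pc[choice]++;
    }
    return trace;
}
```

With the opposite tie-break at the last step, P0:5 would be granted before P1:5; whether the checker must try both orders is exactly the independence question raised above.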


Partial-Order Reduction

• With 3 processes, the size of the interleaved state space is p^3 = 27

• Partial-order reduction explores representative sequences from each equivalence class

• Delays the execution of independent transitions

• In this example, it is possible to “get away” with 7 states (one interleaving)
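The 27 and 7 figures can be reproduced under an assumption the slide does not spell out: three processes, each with two mutually independent transitions, so each process has p = 3 local states (p^3 = 27 global states) and a single interleaving passes through 7 states. The C sketch below counts both; all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed setting: NPROC processes, each executing NSTEP transitions, all
 * mutually independent. A global state is the vector of program counters
 * packed into one integer in base BASE; NSTATES = BASE^NPROC for NPROC = 3. */
enum { NPROC = 3, NSTEP = 2, BASE = NSTEP + 1, NSTATES = BASE * BASE * BASE };

/* Program counter of process p in packed state code. */
static int pc_of(int code, int p)
{
    for (int i = 0; i < p; i++)
        code /= BASE;
    return code % BASE;
}

/* Full exploration: visit every state reachable by any interleaving. */
static void dfs(int code, bool *visited, int *count)
{
    if (visited[code])
        return;
    visited[code] = true;
    (*count)++;
    int weight = 1;  /* BASE^p: adding it advances process p by one step */
    for (int p = 0; p < NPROC; p++, weight *= BASE)
        if (pc_of(code, p) < NSTEP)
            dfs(code + weight, visited, count);
}

int full_state_count(void)
{
    bool visited[NSTATES] = {false};
    int count = 0;
    dfs(0, visited, &count);
    return count;
}

/* Reduced exploration: since all transitions are independent, one
 * representative interleaving suffices. Always run the lowest-numbered
 * process that still has work, counting the states along that path. */
int reduced_state_count(void)
{
    int code = 0, count = 1;  /* count the initial state */
    for (;;) {
        int p, weight = 1;
        for (p = 0; p < NPROC; p++, weight *= BASE)
            if (pc_of(code, p) < NSTEP)
                break;
        if (p == NPROC)
            return count;  /* every process has finished */
        code += weight;
        count++;
    }
}
```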


Dynamic Partial-Order Reduction

(Figure: a chain of states connected by Transition 1, Transition 2 and Transition 3; each state is annotated with Full = { … }, Enabled = { … } and Backtrack = { … }.)

Run the “instrumented” program to populate the full set of transitions and the enabled set of transitions at each state.

Given the enabled set E at each state, we want to find a backtrack set B ⊆ E, ideally much smaller than E, such that B captures representatives of all equivalent executions (under the notion of independence).
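One way to derive backtrack sets from a single executed trace is sketched below in C, in the spirit of Flanagan and Godefroid's DPOR. It is deliberately simplified and is not the authors' implementation: dependence is reduced to "different processes touching the same object", and the published algorithm's happens-before and enabledness checks are omitted. struct transition, compute_backtrack and needs_backtrack are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_TRACE = 32, MAX_PROCS = 8 };

/* One executed transition: the process that ran it and the shared object
 * (say, a window) it touched. Hypothetical encoding. */
struct transition {
    int proc;
    int object;
};

/* Simplified dependence: different processes touching the same object. */
static bool dependent(struct transition a, struct transition b)
{
    return a.proc != b.proc && a.object == b.object;
}

/* For each transition i, find the most recent earlier transition j that it
 * depends on and add i's process to the backtrack set of the state just
 * before j, so the checker later tries running that process there first.
 * backtrack[j][p] is true when process p must be explored from state j. */
void compute_backtrack(const struct transition *trace, int n,
                       bool backtrack[][MAX_PROCS])
{
    assert(n <= MAX_TRACE);
    for (int j = 0; j < n; j++)
        for (int p = 0; p < MAX_PROCS; p++)
            backtrack[j][p] = false;
    for (int i = 0; i < n; i++) {
        assert(trace[i].proc >= 0 && trace[i].proc < MAX_PROCS);
        for (int j = i - 1; j >= 0; j--) {
            if (dependent(trace[j], trace[i])) {
                backtrack[j][trace[i].proc] = true;
                break;  /* only the most recent dependent transition */
            }
        }
    }
}

/* Convenience query: must process proc be tried from the state before
 * transition number `state` of this trace? */
bool needs_backtrack(const struct transition *trace, int n, int state, int proc)
{
    bool backtrack[MAX_TRACE][MAX_PROCS];
    assert(state >= 0 && state < n);
    compute_backtrack(trace, n, backtrack);
    return backtrack[state][proc];
}
```

If every transition in the trace touches a different object, all backtrack sets stay empty and the single executed interleaving is the only one explored.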


Defining Dependence

MPI function      Dependent with
MPI_Init          None
MPI_Send          MPI_Send, MPI_Ssend, MPI_Recv
MPI_Ssend         MPI_Send, MPI_Ssend, MPI_Recv
MPI_Recv          MPI_Send, MPI_Ssend
MPI_Barrier       None
MPI_Win_lock      None
MPI_Win_unlock    MPI_Win_unlock
MPI_Win_free      None
MPI_Finalize      None
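The dependence table above can be encoded directly as a lookup matrix. The C sketch below is illustrative only: the enum names are hypothetical stand-ins for the MPI calls, and the check works purely at the level of call types, whereas a real checker would presumably also compare arguments (communicator, peer rank, window) before treating two transitions as dependent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for the MPI calls in the dependence table. */
enum mpi_call {
    INIT, SEND, SSEND, RECV, BARRIER, WIN_LOCK, WIN_UNLOCK, WIN_FREE,
    FINALIZE, NUM_CALLS
};

/* dep[a][b] is true when the table lists b as dependent with a; every
 * entry not mentioned defaults to false ("None"). */
static const bool dep[NUM_CALLS][NUM_CALLS] = {
    [SEND]       = { [SEND] = true, [SSEND] = true, [RECV] = true },
    [SSEND]      = { [SEND] = true, [SSEND] = true, [RECV] = true },
    [RECV]       = { [SEND] = true, [SSEND] = true },
    [WIN_UNLOCK] = { [WIN_UNLOCK] = true },
};

/* Table-level dependence between two MPI call types. */
bool calls_dependent(enum mpi_call a, enum mpi_call b)
{
    return dep[a][b];
}
```

Note that the table is symmetric (MPI_Send depends on MPI_Recv and vice versa), as a dependence relation must be, and that MPI_Finalize depends on nothing.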


Example Benefits: One-Sided Byte-Range Protocol

Program                       Number of procs   Interleavings without DPOR   Interleavings with DPOR
Byte-range (reduced depth)    2                 2289                         119
Byte-range (full depth)       2                 -                            1522
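For the reduced-depth run, the size of the reduction follows directly from the table:

```latex
\frac{2289}{119} \approx 19.2, \qquad \frac{119}{2289} \approx 5.2\%
```

That is, with DPOR the checker examined roughly one interleaving in nineteen.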


Summary

• Formal methods, and in particular finite-state model checking, provide a means of reasoning about concurrent algorithms.

• Principal challenges of the model checking approach:
  – Requires a modeling step
  – Can lead to “state explosion”

Both of these can be ameliorated by in-situ model checking.

Future work:
• Expand the number of MPI primitives (and the corresponding dependence table)
• Exploit code slicing to remove ancillary operations

Funding acknowledgements:

• NSF (CSR–SMA: Toward Reliable and Efficient Message Passing Software Through Formal Analysis)
• Microsoft (Formal Analysis and Code Generation Support for MPI)
• Office of Science – Department of Energy