poster chep2012 reduced_original1
Analysing DIRAC's Behavior using Model Checking with Process Algebra
Motivation
Why Formal Methods?
Language & Toolset
From DIRAC to mCRL2
State-space generation
Analysis & Issues
Daniela Remenska - Jeff Templon - Tim Willemse - Henri Bal - Kees Verstoep - Wan Fokkink
Philippe Charpentier - Ricardo Graciani - Elisa Lanciotti - Krzysztof Daniel Ciba - Stefan Roiser
Some drawbacks...
Abstraction of the "real" behavior is needed. This means one must build a sound model.
Expertise in formal methods and the system domain is necessary.
The state-space of the model can explode.
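As a rough illustration of why the state-space can explode (this arithmetic is not from the poster): if a system consists of n independent components with k local states each, the composed model can have up to k^n global states. The function name and the numbers below are assumptions chosen only for the example.

```python
def max_global_states(local_states: int, components: int) -> int:
    """Upper bound on global states for independent parallel components."""
    return local_states ** components


if __name__ == "__main__":
    # Assumed example: every component has 4 local states.
    for n in (2, 5, 10, 20):
        print(n, "components ->", max_global_states(4, n), "global states")
```

Real models usually reach fewer states than this bound, because synchronisation between components rules out many combinations, but the growth remains exponential in the number of parallel components.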
Actions: atomic building blocks; can carry data parameters
Processes: composed of actions, using algebra operators
Built-in data types: integers, booleans, lists, sets, bags
Abstract data types
Agents and storage become processes.
Control-flow is abstracted using mCRL2 non-deterministic choice and if-then-else constructs.
States of entities are described using custom abstract data types.
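The translation idea above can be sketched in plain Python rather than mCRL2. This is a minimal sketch under stated assumptions: the state names and the `agent_step` function are hypothetical and do not reproduce the real DIRAC job state machine or the authors' model. An enumeration stands in for a custom abstract data type, and returning a set of possible next states stands in for non-deterministic choice.

```python
from enum import Enum, auto
from typing import Set


class JobState(Enum):
    # Illustrative states only; not the real DIRAC job state machine.
    WAITING = auto()
    RUNNING = auto()
    DONE = auto()
    FAILED = auto()


def agent_step(state: JobState) -> Set[JobState]:
    """Return every state a hypothetical agent may move the job to.

    Returning a set models non-deterministic choice: the model does not
    say which branch is taken, only which branches are possible.
    """
    if state is JobState.WAITING:
        # Guarded transition, comparable to an if-then-else in the model.
        return {JobState.RUNNING}
    elif state is JobState.RUNNING:
        # The real outcome depends on implementation details that the
        # model abstracts away, so both results are allowed.
        return {JobState.DONE, JobState.FAILED}
    else:
        # Terminal states: the agent performs no further action.
        return set()
```

Abstracting the outcome of a running job into a two-way choice is the kind of simplification that keeps the model small while still covering every behavior the implementation could show.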
Future Work
Automate (to some degree) the translation from code to model.
Verification
Figure 3: State-space visualisation with LTSView
DIRAC background
▪ production activities and user analysis for LHCb
▪ distributed services and light-weight agents
▪ "blackboard" or "shared-memory" paradigm
▪ jobs often get into incorrect (or inconsistent) states
▪ staging requests become stuck
▪ difficult to trace the root of such unexpected behavior: many scenarios and components
▪ manual intervention necessary
Based on process algebra laws: no ambiguity
Model checking tools give full control over the execution of parallel processes. This way one gains more insight into the system behavior.
Stronger than testing
There are formal or systematic approaches to tackle this!
Automatically explore the entire state-space and check if some "interesting" properties hold.
DIRAC (Python): ~150,000 lines of code
Abstracting the implementation depends on the focus of the analysis.
Check for race conditions: agents update the state of shared entities.
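To make the race-condition check concrete, here is a toy sketch that enumerates every interleaving of two agents acting on one shared job record. The agents, step names and status strings are hypothetical and only illustrate the technique; they are not taken from DIRAC or from the authors' model. One agent reads the status and later writes "Running" based on what it read, while the other writes "Killed".

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


# Hypothetical atomic steps of two agents sharing one job record.
RUNNER = ["runner_read", "runner_write"]
KILLER = ["killer_write"]


def run(schedule):
    """Execute one schedule and return the history of job statuses."""
    status = "Waiting"
    seen_by_runner = None
    history = [status]
    for step in schedule:
        if step == "runner_read":
            seen_by_runner = status
        elif step == "runner_write":
            # The decision uses the (possibly stale) value read earlier.
            if seen_by_runner == "Waiting":
                status = "Running"
        elif step == "killer_write":
            status = "Killed"
        history.append(status)
    return history


def zombie_schedules():
    """Return the schedules in which the job runs after being killed."""
    bad = []
    for schedule in interleavings(RUNNER, KILLER):
        history = run(schedule)
        if "Killed" in history:
            after_kill = history[history.index("Killed"):]
            if "Running" in after_kill:
                bad.append(schedule)
    return bad


if __name__ == "__main__":
    for schedule in zombie_schedules():
        print("violating schedule:", schedule)
```

Of the three possible schedules in this toy, only the one where the kill lands between the read and the write violates the property. A model checker performs this kind of exhaustive enumeration automatically and at a much larger scale.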
Systems: Storage and Workload Mgmt
Entities: Jobs, Cache-Replicas, Tasks
Figure 1: DIRAC subsystems
Figure 2: Job state machine
Problems can be discovered while building and debugging the model:
Properties (Safety / Progress / Deadlock): the model checker automatically probes them.
Property violated: a counter-example trace is provided.
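The idea of probing a property and returning a counter-example trace can be sketched as a breadth-first search over an explicit transition system. This is a plain-Python illustration only, not how the mCRL2 tools work internally. The transition table `TOY` and its state names are assumptions made up for the example, with a deliberately stuck "Staging" state.

```python
from collections import deque


def find_violation(initial, successors, is_bad):
    """Breadth-first search from `initial`.

    Returns the shortest list of states leading to a state for which
    `is_bad` holds, or None if no reachable state is bad.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_bad(state):
            # Walk the parent links back to the start to build the trace.
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


# Hypothetical toy transition system (not the real DIRAC state machine).
TOY = {
    "Received": ["Waiting"],
    "Waiting": ["Running", "Staging"],
    "Staging": [],          # deliberately stuck: no outgoing transition
    "Running": ["Done"],
    "Done": [],             # intended terminal state
}


def is_deadlock(state):
    """A state with no successors that is not the intended terminal state."""
    return not TOY[state] and state != "Done"


if __name__ == "__main__":
    print(find_violation("Received", lambda s: TOY[s], is_deadlock))
```

Because the search is breadth-first, the trace returned is a shortest path to the violation, which tends to make the counter-example easier to read.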
Figure 5: State-transition visualisation with DiaGraphica
Conclusions
Formal methods are a more rigorous addition to testing, as a way to improve software quality.
A sound model needs to be written manually. This requires experience and can be error-prone.
Similar techniques can be re-applied to similar systems once the learning curve has been climbed.
Distributed systems are difficult to reason about: many components, all running in parallel.
Figure 6: Violation of progress and safety requirements
Figure 7: "Zombie" job starts running after being killed
Figure 4a: XSim simulator trace of a job workflow
Figure 4b: DIRAC logging info of a job workflow