sireum/topi ldp: a lightweight semi-decision procedure for...

Page 1: Sireum/Topi LDP: A Lightweight Semi-Decision Procedure for ...people.cis.ksu.edu/~belt/reports/SanToS-TR2009-03-16.pdf · 3/16/2009  · 7]). Unfortunately, most solvers are designed

Sireum/Topi LDP: A Lightweight Semi-Decision Procedure for Optimizing Symbolic Execution-based Analyses∗

Jason Belt
Kansas State University
[email protected]

Robby
Kansas State University
[email protected]

Xianghua Deng
Penn State Univ. – Harrisburg
[email protected]

(Last updated: April 6, 2009)

ABSTRACT

Automated theorem proving techniques such as Satisfiability Modulo Theory (SMT) solvers have seen significant advances in the past several years. These advancements, coupled with vast hardware improvements, have had a drastic impact on, for example, program verification techniques and tools. The general availability of robust general-purpose solvers has removed a significant engineering overhead from designing and developing program verifiers. However, most solver implementations are designed to be used as a black box, and because they aim to be general-purpose solvers, they often miss optimization opportunities that can be exploited by leveraging domain-specific knowledge.

This paper presents our effort to leverage domain-specific knowledge for optimizing symbolic execution (SymExe)-based analysis; we present optimization techniques incorporated as a lightweight semi-decision procedure (LDP) that provides up to an order of magnitude faster analysis time when analyzing realistic programs and well-known algorithms. LDP sits between a SymExe-based analysis tool and an existing SMT solver; it aims to reduce the number of solver calls by intercepting them and attempting to solve constraints using its lightweight deductive engine.

1 Introduction

The significant advances of automated decision procedures, or Satisfiability Modulo Theories (SMT) [23] solvers, have made a broad and deep impact, often touted as disruptive, on the development of, for example, software verification techniques. The field has matured to the point where robust tools are ready to be used and integrated, and a competition to determine the fastest of them on a set of benchmarks is held annually. The availability of general-purpose solvers has removed a significant engineering overhead from formal methods research.

However, it is often the case that algorithms targeting specific domains are more efficient because they can leverage some inherent properties of the domains (e.g., [15, 2, 7]). Unfortunately, most solvers are designed as a black box with a fixed Application Programming Interface (API), thus limiting the interaction between solvers and their clients. This makes it difficult to incorporate domain-specific optimizations inside the solvers' engines. On the other hand, developing a domain-specific SMT solver is a non-trivial engineering task with significant resource investment.

∗ This work was supported in part by the US National Science Foundation (NSF) award 0709169 and CAREER award 0644288, and by the US Air Force Office of Scientific Research (AFOSR).

Thus, we are interested in investigating whether there is a middle ground: a lightweight technique that sits between an SMT solver and its client to optimize the performance of the whole system. The goal of the lightweight technique is to reduce the accumulated time cost of using the SMT solver by making fast decisions on certain classes of queries. Below are our design decisions and guidelines when developing such a technique:

• Sound: Given a formula, a decision procedure (satisfiability checker) determines whether the formula is satisfiable (SAT), unsatisfiable (UNSAT), or UNKNOWN (if the algorithm cannot answer one way or the other). A sound decision procedure does not produce wrong SAT or UNSAT answers, but it may give UNKNOWN answers for formulae that are actually SAT or UNSAT. However, a sound decision procedure that gives the UNKNOWN answer too often is not particularly useful. Thus, other criteria are needed.
• Efficient: The lightweight technique aims to reduce the overall analysis cost without duplicating much of the functionality of existing decision procedures. Thus, it has to be efficient. However, efficient is a loose term. To be precise, we use linear time/space complexity with respect to some program/analysis characteristics as a guideline. This helps to ensure that the technique does not introduce undesirable overheads.
• Effective: Due to the focus on using linear algorithms, such an approach is necessarily incomplete (i.e., imprecise) for most interesting classes of problems. Precisely characterizing this imprecision is difficult. Thus, we employ empirical methods for measuring effectiveness.

In our research context, the solver's client is the Sireum/Kiasan static analysis framework [11] for Java [25] that can check complex Java Modeling Language (JML) contracts [17]. Kiasan is a symbolic execution (SymExe)-based analysis that requires an SMT solver for checking constraint satisfiability (arithmetic, subtyping, etc.). In this work, we focus on constraints on scalars; Kiasan uses its lazier# initialization algorithm for handling heap objects/arrays [11], while subtyping constraints can be handled using an uninterpreted function with axioms for establishing its partial order properties.

Contributions: The contributions of this paper are:

• A novel and effective lightweight decision procedure (LDP) for constraints on scalars that complements existing decision procedures. LDP builds upon mature approaches, but is customized for SymExe.
• Investigations of foundational properties of LDP: its soundness and complexity.
• An efficient implementation of the approach, Sireum/Topi LDP, that is relatively easy to maintain by employing model-based engineering and generative programming.
• Rigorous empirical evaluation of LDP via randomized double testing with an existing solver, and an extensive set of benchmarks illustrating the effectiveness of LDP. In our experiments, we observed that LDP provides up to 20 times faster analysis time (e.g., from 2.5 hours to 7.7 minutes). We found that LDP is very effective for a particularly large class of programs, as well as significantly benefiting assertion checking of their deep semantic properties.

1 int min(int a, int b) {
2   int z = a;
3   if (b < a)
4     z = b;
5   if (z > a || z > b)
6     assert false;
7   return z;
8 }

〈α, β, 0, T〉
  --2--> 〈α, β, α, T〉
    --3,T--> 〈α, β, α, β < α〉
      --4--> 〈α, β, β, β < α〉
        --5,T--> 〈α, β, β, F〉
        --5,F;7--> 〈α, β, β, β < α〉
    --3,F--> 〈α, β, α, β ≥ α〉
      --5,T--> 〈α, β, α, F〉
      --5,F;7--> 〈α, β, α, β ≥ α〉

Figure 1: A Symbolic Execution Example

Organization: The rest of the paper is organized as follows. Section 2 presents background that sets the research context motivating the proposed approach. Section 3 presents LDP by first describing some key observations that it takes advantage of, followed by a presentation of the LDP algorithms and a discussion of their implementation. Section 4 describes foundational and empirical evaluations of LDP, with a discussion of a class of programs that benefit from using LDP. Section 5 discusses related work, and Section 6 concludes.

2 Background

Symbolic execution (SymExe) was introduced as a technique for program testing and debugging [16]. The strength of SymExe is that it can analyze open programs with unknown values, which are represented as symbolic values. During the execution, SymExe discovers relationships among values as constraints that hold for the execution path. When SymExe encounters a branching point, it explores all feasible branches with respect to its accumulated constraints. Thus, SymExe analysis forms a symbolic computation tree. Figure 1 illustrates the symbolic computation tree of an example method min (each tree node is a symbolic state 〈a, b, z, PC〉 that associates a symbolic or concrete value with each of a, b, and z, and a predicate PC that constrains the symbols).

When symbolically executing min with no initial information about its arguments, SymExe assigns symbols for unknown values in the initial state of min; suppose it uses a symbol α for a, a symbol β for b, 0 for z, and the accumulated constraint PC set to true (no constraints have been imposed). When executing line 3, SymExe does not have sufficient information to decide which branch to take because PC ∧ (β < α) and PC ∧ ¬(β < α) are both satisfiable under the current symbolic state; thus, both branches are explored. As each branch is traversed, the predicate is augmented with a constraint corresponding to the logical condition that would have caused the particular branch to be followed. Thus, the predicate PC is often referred to as the path condition, because it represents the conditions on variables that would be necessary for execution to flow down a particular path. If the PC becomes false, the path is infeasible, hence it can be safely abandoned. The min example shows that the true branch of line 5 is always infeasible. An assignment of concrete values to the symbols satisfying PC can be used to form a test case that drives execution along the corresponding path.

Kiasan adopts the above SymExe algorithm for Java bytecode; that is, Kiasan is essentially a symbolic virtual machine for Java. A formal presentation of Kiasan's algorithms is given in [11], and they are relatively sound and complete with respect to concrete bytecode executions.

Termination of SymExe: SymExe, in essence, explores a program's execution paths without merging states. Thus, in the presence of diverging loops/recursions, the depth of the program's symbolic computation tree is infinite. To ensure that SymExe terminates in such situations, a common workaround is to impose some sort of bounding, such as bounds on loop iterations, length of call chains, and depth.

Kiasan can use the above forms of bounding. However, when analyzing heap objects, Kiasan uses bounds on the length of reference chains for objects and on the number of unique indices with which arrays can be accessed [9]. With these bounding strategies, we were able to rigorously demonstrate the aliasing case-optimality of Kiasan's fastest algorithm for several complex data structures; this was done by developing case-optimal metrics for those data structures and measuring the number of test cases Kiasan produces [11]. These metrics can serve as indicators to determine whether Kiasan is producing the expected number of test cases, which is useful for experimentation (e.g., an optimization algorithm should not change the number of test cases Kiasan generates).

Interaction with Decision Procedures: SymExe uses decision procedures for two purposes: (1) to determine path feasibility, and (2) to retrieve models for generating concrete test cases. At each branching point, SymExe needs to decide path feasibility in order to prune spurious paths. More specifically, it relies on decision procedures, for example SMT solvers, to decide whether the conjunction of the branching condition and the PC is satisfiable. For example, at line 3 of Figure 1, SymExe explores both branches because they are both feasible. SymExe can also use constraint/SMT solvers to generate concrete test cases. At the end of each feasible path, the path condition contains all the constraints that force the execution to follow that path. Thus, a concrete test case that covers the path can be generated from any concrete assignment of symbols that satisfies the end path condition. For example, the path 2–3,T–4–5,F–7 in Figure 1 has an end path condition β < α. Clearly, the assignment β = 0 and α = 1 satisfies the path condition. Thus, a test case a = 1, b = 0 can be used as a representative test for that path.
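The step from an end path condition to a concrete test case can be illustrated with a minimal sketch. It is not how Kiasan or an SMT solver produces models: the class PathWitness and its brute-force search over a small range are assumptions made purely for illustration, standing in for the solver's model generation.

```java
// Hypothetical illustration: a brute-force stand-in for a solver's model generation.
final class PathWitness {
    // A path condition over the two symbols alpha (for a) and beta (for b).
    interface PathCondition {
        boolean holds(int alpha, int beta);
    }

    // The min method of Figure 1; the failing assertion is modeled as an exception.
    static int min(int a, int b) {
        int z = a;
        if (b < a)
            z = b;
        if (z > a || z > b)
            throw new AssertionError("line 6 reached");
        return z;
    }

    // Returns {alpha, beta} satisfying pc with both values in [-bound, bound],
    // or null when no such pair exists within that range.
    static int[] findWitness(PathCondition pc, int bound) {
        for (int alpha = -bound; alpha <= bound; alpha++)
            for (int beta = -bound; beta <= bound; beta++)
                if (pc.holds(alpha, beta))
                    return new int[] { alpha, beta };
        return null;
    }
}
```

Calling findWitness with the end path condition β < α yields a pair that drives min down the path 2–3,T–4–5,F–7, while the condition of the infeasible 5,T branch has no witness in any range.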

Kiasan can use both CVC3 [4] and Yices [14] as decision procedures, although Kiasan can only use Yices to generate counter-examples or test cases.

3 Approaches

In this section, we first describe the properties of Kiasan's SymExe that we exploited when developing LDP; we focus our discussion on constraints on scalar values because this is the target of LDP's optimizations.

As mentioned earlier, Kiasan is a SymExe-based analysis framework that employs a stateless depth-first search (DFS without a seen set) with delta backtracking to explore the symbolic computation tree structures of programs. Kiasan maintains the SymExe path condition of a program as a stack of constraints. As the DFS progresses, Kiasan pushes/pops constraints that it discovers. For efficiency, it abandons paths whose conjunction of constraints becomes unsatisfiable. Kiasan never merges/summarizes states because it is hard to define a precise, general, and automatic heap abstraction for any property that a user may assert.

Kiasan for Java analyzes programs at the bytecode level, where all Java short-circuit boolean binary operators are compiled as simple ifs and gotos. Consequently, each constraint in the stack has one of the following simple forms (α, β, γ are symbolic values, c stands for a constant, ⋈ ∈ {<, ≤, =, ≠, ≥, >}, and ⊕ ∈ {+, −, ∗, /, %}):
• Symbol/Constant Test Constraints: α ⋈ c, c ⋈ α
• Symbol/Symbol Test Constraints: α ⋈ β
• Symbol/Binary Arithmetic Expression (BAE) Constraints: β = c ⊕ α, β = α ⊕ c, γ = α ⊕ β

In general, Kiasan introduces new symbolic values as the result of operations involving symbolic values (at most one new symbolic value per bytecode instruction); obvious constraints such as α = α are always evaluated directly. Kiasan also directly computes operations on concrete values. Moreover, casts between types are explicit at the bytecode level. Therefore, the arithmetic forms above only constrain variables/constants in the same numerical type domain. Only 32/64-bit integers (i.e., int, long) and floating point numbers (i.e., float, double) are used at the bytecode level.¹

A symbolic value represents a single constant value. That is, its value never changes throughout the execution. As SymExe progresses along an execution path, a symbolic value can only be constrained more (i.e., the set of values that it may represent can only get smaller; an empty set of values means the path condition is unsatisfiable).

3.1 Algorithms

LDP has been designed to leverage the above observations in order to make fast decisions when possible. LDP can be viewed as a constraint caching device with the ability to infer certain implicit constraints based upon the facts it is maintaining; its inference engine employs a linear interval analysis as well as other techniques.

Constraint canonicalization: Kiasan timestamps symbolic values. Thus, there is a total order ⊑ on symbolic values that can be used to canonicalize constraints. For example, LDP rewrites v′ + v as v + v′ if v ⊑ v′. Furthermore, LDP rewrites strict inequalities < and > on symbolic/constant integers as non-strict inequalities ≤ and ≥, respectively. For instance, LDP rewrites α < 1 as α ≤ 0 on a symbolic integer α. More precisely, the canonical constraint forms are as follows:
• α ⋈ c, where ⋈ ∈ {<, ≤, =, ≥, >, ≠}
• α ⋈ β, where ⋈ ∈ {<, ≤, >, ≥, =, ≠} and α ⊑ β
• β = α ⊕ c, where ⊕ ∈ {+, −, ∗, /, %} and α ⊑ β
• β = c ⊕ α, where ⊕ ∈ {−, /, %} and α ⊑ β
• γ = α ⊕ β, where ⊕ ∈ {+, ∗} and α ⊑ β ⊑ γ
• γ = α ⊕ β, where ⊕ ∈ {−, /, %}, α ⊑ γ, and β ⊑ γ

¹ LDP treats Java scalar values as arbitrary-precision integers and reals because we are not focusing on errors due to overflow/underflow. If such a need arises, a decision procedure on bit-vectors can be used instead.

Constraint management protocol: The management of constraints in LDP is done through two categories of operations: queries and commands [24]. Queries can be used to ask LDP whether a certain constraint C is consistent with respect to the constraints in its current context Ĉ (initially, the context has no constraint). The possible answers for this type of query from LDP are:
• VALID, only if Ĉ ⟹ C (i.e., Ĉ ∧ C is satisfiable and Ĉ ∧ ¬C is unsatisfiable);
• UNSAT, only if Ĉ ∧ C is unsatisfiable and Ĉ ∧ ¬C is satisfiable;
• BSAT, only if both Ĉ ∧ C and Ĉ ∧ ¬C are satisfiable;
• UNKNOWN, otherwise.
Queries can also be used to ask LDP about certain facts that it knows, such as whether a symbolic value is equal to a constant.

Commands are used to update LDP's context when adding constraints. Before adding a constraint, LDP requires its client to query whether the addition will still yield a satisfiable set of constraints. Update commands can be used only if that is the case. This simplifies the design and implementation of the LDP algorithms without reducing efficiency, as described later in the next section.
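The query-then-command protocol can be sketched from the client's side as follows. All type and method names here (Answer, LightweightProcedure, FullSolver, BranchClient, tryAssume) are hypothetical and are not the Sireum/Topi API; the sketch only assumes that the client falls back to a complete solver on UNKNOWN and issues an update command only after feasibility has been established.

```java
import java.util.ArrayList;
import java.util.List;

// The four possible query answers listed above.
enum Answer { VALID, UNSAT, BSAT, UNKNOWN }

// Hypothetical view of the lightweight procedure: one query and one update command.
interface LightweightProcedure<C> {
    Answer query(C constraint);
    void add(C constraint); // legal only after the constraint was found consistent
}

// Hypothetical view of the complete decision procedure used as a fallback.
interface FullSolver<C> {
    boolean isSatisfiable(List<C> context, C constraint);
}

final class BranchClient<C> {
    private final LightweightProcedure<C> ldp;
    private final FullSolver<C> solver;
    private final List<C> pathCondition = new ArrayList<>();
    int solverCalls = 0; // counts how often the fallback solver was needed

    BranchClient(LightweightProcedure<C> ldp, FullSolver<C> solver) {
        this.ldp = ldp;
        this.solver = solver;
    }

    // Returns true and records the constraint if the branch it guards is feasible.
    boolean tryAssume(C constraint) {
        boolean feasible;
        switch (ldp.query(constraint)) {
            case VALID:
            case BSAT:
                feasible = true;   // decided without a solver call
                break;
            case UNSAT:
                feasible = false;  // decided without a solver call
                break;
            default:               // UNKNOWN: ask the complete solver
                solverCalls++;
                feasible = solver.isSatisfiable(pathCondition, constraint);
        }
        if (feasible) {
            ldp.add(constraint);   // update command issued only after the check
            pathCondition.add(constraint);
        }
        return feasible;
    }
}
```

Because the update command is reached only on feasible branches, the lightweight procedure never has to detect inconsistency while updating, which is the simplification the protocol aims for.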

3.1.1 Managing Symbol/Constant Test Constraints

For α ⋈ c constraints where ⋈ ∈ {<, ≤, >, ≥}, LDP maintains an interval (over-approximation) for each α. For α = c, LDP remembers that α is actually c. Whenever LDP learns that c ≤ α ≤ c, it normalizes this fact as α = c. For α ≠ c, LDP records the set of constants that α is not equal to, and it updates α's interval appropriately. For example, the constraints α ≠ c and α ≤ c are normalized to α < c (α ≤ c − 1 if α is a symbolic integer).

Figure 2(a) and (b) present LDP query and update directives for α < c as decision tables. LDP has separate tables for different forms of constraints; the reader is referred to [5] for the complete query/update directive tables.

Constraint query: The α < c? table represents the decisions LDP can make on the query to determine whether α is less than the constant c, based on the interval LDP is maintaining for α. The interval information is stored in a set of maps, which are represented by the first column labeled C. The table is interpreted as follows: starting from the first row, if the C= map contains a mapping for α, then look up its value d (i.e., d = C(α)) and compare it to c. If c ≤ d (i.e., c is at or to the left of d), then the constraint is unsatisfiable, or UNSAT. If c > d (i.e., c is to the right of d; Figure 2(a)#3), then the constraint is valid under the current path condition.

If LDP does not contain a mapping for a particular row, then it moves to the next row in the table, which in this case is C≥. If C≥ contains a mapping for α, then there exists a non-strict lower bound d for α (i.e., d = C≥(α)). If c ≤ d, then clearly α < c cannot hold, so LDP returns UNSAT. In the case where c > d, LDP checks whether an upper bound has been set for α. If so, LDP returns the answer retrieved after consulting C≤ (or the C< table in the next row if C≤ does not contain a mapping for α). Otherwise, LDP returns UNKNOWN.
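The table walk described above can be sketched as a chain of map lookups. This is a minimal illustration under stated assumptions, not the Sireum/Topi implementation: the class LessThanConstQuery, the five HashMap fields, and the use of strings for symbols are all hypothetical, and the row numbers in the comments refer to Figure 2(a).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "alpha < c ?" lookup over the five bound maps.
final class LessThanConstQuery {
    enum Answer { VALID, UNSAT, UNKNOWN }

    // eq: alpha = d, ge: alpha >= d, gt: alpha > d, le: alpha <= d, lt: alpha < d
    final Map<String, Long> eq = new HashMap<>(), ge = new HashMap<>(),
            gt = new HashMap<>(), le = new HashMap<>(), lt = new HashMap<>();

    Answer query(String alpha, long c) {
        Long d = eq.get(alpha);
        if (d != null)
            return c > d ? Answer.VALID : Answer.UNSAT;    // rows 1-3
        d = ge.get(alpha);
        if (d != null && c <= d)
            return Answer.UNSAT;                           // rows 4-5
        d = gt.get(alpha);
        if (d != null && c <= d)
            return Answer.UNSAT;                           // rows 7-8
        // Rows 6 and 9 (c above the lower bound) fall through to the upper bounds.
        d = le.get(alpha);
        if (d != null)
            return c > d ? Answer.VALID : Answer.UNKNOWN;  // rows 10-12
        d = lt.get(alpha);
        if (d != null)
            return c >= d ? Answer.VALID : Answer.UNKNOWN; // rows 13-15
        return Answer.UNKNOWN;                             // no bound recorded
    }
}
```

Each query performs a constant number of hash lookups, which is in line with the efficiency guideline stated in the introduction. The sketch returns UNKNOWN where LDP might later strengthen the answer to BSAT.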

For example, suppose Ĉ is α < 3 and a query is asked for α < 0?. If α < 0 is asserted, then a tighter upper bound on α would be found. If, on the other hand, ¬(α < 0) is asserted, then a new lower bound on α would be defined. Clearly, both cases are satisfiable. Now consider the same test against Ĉ = ⋀{α < 3, α > β, β ≥ 1}. In this case, α's lower bound is constrained by β; thus, α < 0 cannot hold. In these situations, LDP returns UNKNOWN; thus, LDP's client should use a more complete decision procedure DP.

(a) α < c ?
C     Test        Result     #
C=    c < C(α)    UNSAT      1
      c = C(α)    UNSAT      2
      c > C(α)    VALID      3
C≥    c < C(α)    UNSAT      4
      c = C(α)    UNSAT      5
      c > C(α)    C≤?        6
C>    c < C(α)    UNSAT      7
      c = C(α)    UNSAT      8
      c > C(α)    C≤?        9
C≤    c < C(α)    UNKNOWN    10
      c = C(α)    UNKNOWN    11
      c > C(α)    VALID      12
C<    c < C(α)    UNKNOWN    13
      c = C(α)    VALID      14
      c > C(α)    VALID      15

(b) α < c !
C     Update Rules                                                         #
C≥    T) α < c → C<[α ↦ c]                                                 1
      F) α ≥ c → C≥[α ↦ c]                                                 2
C>    T) α < c → C<[α ↦ c]                                                 3
      F) α ≥ c → C>[α ↦ ⊥], C≥[α ↦ c]                                      4
C≤    T) α < c → C≤[α ↦ ⊥], C<[α ↦ c]                                      5
      F) α ≥ c → if c = C≤(α) then C≤[α ↦ ⊥], C=[α ↦ c] else C≥[α ↦ c]     6
C<    T) α < c → C<[α ↦ c]                                                 7
      F) α ≥ c → C≥[α ↦ c]                                                 8

(c) α ≠ c ?
C     Test        Result     #
C≠    c ∈ C(α)    VALID      1
      c ∉ C(α)    C=?        2
C=    c < C(α)    VALID      3
      c = C(α)    UNSAT      4
      c > C(α)    VALID      5
C≥    c < C(α)    VALID      6
      c = C(α)    UNKNOWN    7
      c > C(α)    C≤?        8
C>    c < C(α)    VALID      9
      c = C(α)    VALID      10
      c > C(α)    C≤?        11
C≤    c < C(α)    UNKNOWN    12
      c = C(α)    UNKNOWN    13
      c > C(α)    VALID      14
C<    c < C(α)    UNKNOWN    15
      c = C(α)    VALID      16
      c > C(α)    VALID      17

(d) α ≠ c !
C     Update Rules                                                                     #
C≠    T) α ≠ c → C≠[α ↦ C(α) ∪ {c}]                                                    1
      F) α = c → C≥,>,≤,<[α ↦ ⊥], C=[α ↦ c]                                            2
C≥    T) α ≠ c → if c = C≥(α) then C≥[α ↦ ⊥], C>[α ↦ c] else C≠[α ↦ C(α) ∪ {c}]        3
      F) α = c → C≥,>,≤,<[α ↦ ⊥], C=[α ↦ c]                                            4
C>    T) α ≠ c → C≠[α ↦ C(α) ∪ {c}]                                                    5
      F) α = c → C>,≤,<[α ↦ ⊥], C=[α ↦ c]                                              6
C≤    T) α ≠ c → if c = C≤(α) then C≤[α ↦ ⊥], C<[α ↦ c] else C≠[α ↦ C(α) ∪ {c}]        7
      F) α = c → C≤,≥,>[α ↦ ⊥], C=[α ↦ c]                                              8
C<    T) α ≠ c → C≠[α ↦ C(α) ∪ {c}]                                                    9
      F) α = c → C<,>[α ↦ ⊥], C=[α ↦ c]                                                10

(e) α < β ?
V     Test        Result     #
V=    β ∈ V(α)    UNSAT      1
V≠    β ∈ V(α)    V<?        2
V<    β ∈ V(α)    VALID      3
V≤    β ∈ V(α)    UNKNOWN    4
V≥    β ∈ V(α)    UNSAT      5
V>    β ∈ V(α)    UNSAT      6

(f) α < β !
V     Update Rules                                                #
V≠    T) α < β → V≠[α ↦ V(α) \ {β}], V<[α ↦ V(α) ∪ {β}]           1
      F) α ≥ β → V≠[α ↦ V(α) \ {β}], V>[α ↦ V(α) ∪ {β}]           2
V≤    T) α < β → V≤[α ↦ V(α) \ {β}], V<[α ↦ V(α) ∪ {β}]           3
      F) α ≥ β → V≤[α ↦ V(α) \ {β}], V=[α ↦ V(α) ∪ {β}]           4

[The Value Example column of tables (a) and (c), a number-line illustration per row, is not reproduced here.]

Figure 2: LDP query and update directives for the constraints α < c, α ≠ c, and α < β

Constraint update: Continuing with C = α < c, if LDP returns UNKNOWN and DP indicates that PC ∧ C is satisfiable, LDP then performs the update actions listed in Figure 2(b). In this table, the C column indicates the mapping that was used to make the decision², and the next column indicates the update actions LDP should take. The rows beginning with 'T)' give the actions to be performed when C is asserted. The 'F)' rows cover the case where DP concluded that PC ∧ ¬C is satisfiable.

For example, suppose the mapping used to make LDP's query decision was C≤, and DP returned SAT for PC ∧ C. In this case, LDP drops the C≤ mapping for α (i.e., C≤[α ↦ ⊥]) and adds/overrides the C< mapping for α (i.e., C<[α ↦ c]). Now, suppose that DP indicates that PC ∧ ¬C is satisfiable. In this case, LDP's action depends on the value mapped by C≤(α), which we again denote as d. If c = d, then we have c ≤ α ≤ d, which implies α = c; thus, the tighter bound C=[α ↦ c] is recorded. Otherwise, the lower bound for α is set by mapping C≥[α ↦ c].
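The update directives for α < c can be sketched in the same style. The class LessThanConstUpdate, the Rel enum, and the convention of passing the consulted map as a parameter (playing the role of the "cookie crumb" of footnote 2) are assumptions made for illustration; the rule numbers in the comments refer to Figure 2(b).

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "alpha < c !" update rules.
final class LessThanConstUpdate {
    enum Rel { EQ, GE, GT, LE, LT }

    // ctx.get(rel).get(alpha) == d records the fact "alpha rel d".
    final Map<Rel, Map<String, Long>> ctx = new EnumMap<>(Rel.class);

    LessThanConstUpdate() {
        for (Rel r : Rel.values())
            ctx.put(r, new HashMap<>());
    }

    // 'used' is the map the preceding query consulted (null if alpha had no mapping);
    // 'taken' selects the T) row (alpha < c) or the F) row (alpha >= c).
    void update(String alpha, long c, Rel used, boolean taken) {
        if (taken) {
            if (used == Rel.LE)
                ctx.get(Rel.LE).remove(alpha);        // rule 5: drop the weaker upper bound
            ctx.get(Rel.LT).put(alpha, c);            // rules 1, 3, 5, 7
        } else {
            if (used == Rel.GT)
                ctx.get(Rel.GT).remove(alpha);        // rule 4: drop the strict lower bound
            Long d = ctx.get(Rel.LE).get(alpha);
            if (used == Rel.LE && d != null && d == c) {
                ctx.get(Rel.LE).remove(alpha);        // rule 6: c <= alpha <= c
                ctx.get(Rel.EQ).put(alpha, c);
            } else {
                ctx.get(Rel.GE).put(alpha, c);        // rules 2, 4, 6 (else), 8
            }
        }
    }
}
```

As in the table, the sketch assumes the client has already established that the asserted branch is satisfiable, so no consistency check is repeated here.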

3.1.2 Managing Symbol/Symbol Test Constraints

For α ⋈ β constraints, LDP maintains a set of mappings that indicate how symbolic values are inter-related. In contrast to the symbol/constant constraints, a symbolic value can be related to more than one symbolic value for a given comparison operator. For example, for Ĉ = α < β ∧ α < γ, LDP maintains the mapping α < {β, γ}.

² Constraint queries can return “cookie crumbs” that are passed to constraint updates to implement this efficiently.

Constraint query: Figure 2(e) presents LDP query directive rules for the constraint C = α < β. Intuitively, LDP consults this table in the same way as the table discussed previously. LDP first checks whether β is in the set of variables that α is equal to. If it is, then C cannot hold. Otherwise, LDP moves to the next row and checks whether β exists in the set of variables that α is not equal to. Since α ≠ β does not imply α < β, LDP continues its search by checking the lower and upper bound sets being maintained for α.

If β is not contained in any of the mappings for α but β is related to some other variables, then LDP safely returns UNKNOWN (see Section 3.1.4 for a discussion of how LDP handles the case where only the first condition holds). To see why, consider the case where Ĉ = ⋀{γ > β, γ ≤ 2, α > 3}; thus, there is an implicit constraint β < 2. Therefore, α ≮ β.

Constraint update: Figure 2(f) details the update rules LDP uses for α < β. In the case where no mapping between α and β exists, LDP simply adds the constraint α < β or α ≥ β (depending on which branch is followed). Otherwise, LDP uses the instructions listed in the relevant row of the table to consistently update its context.

For example, LDP decides UNKNOWN if the query α < β is asked under PC = α ≤ β (Figure 2(e)#4). If 'T)' is followed, LDP uses Figure 2(f)#3 to update its context by removing β from α's symbolic V≤ set and adding it to α's V< set. Otherwise, Figure 2(f)#4 allows LDP to infer that α = β, as this is implied by α ≤ β ≤ α.
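The symbol/symbol directives of Figure 2(e) and (f) can be sketched with one set-valued map per relation. The class SymbolRelations and its method names are hypothetical, symbols are plain strings, and only the mappings stored for α are consulted; a fuller treatment would also consult β's mappings and the representatives discussed in Section 3.1.4.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "alpha < beta" query and update directives.
final class SymbolRelations {
    enum Rel { EQ, NE, LT, LE, GE, GT }
    enum Answer { VALID, UNSAT, UNKNOWN }

    private final Map<Rel, Map<String, Set<String>>> v = new EnumMap<>(Rel.class);

    SymbolRelations() {
        for (Rel r : Rel.values())
            v.put(r, new HashMap<>());
    }

    // The set of symbols that alpha is related to under r (created on demand).
    Set<String> related(Rel r, String alpha) {
        return v.get(r).computeIfAbsent(alpha, k -> new HashSet<>());
    }

    // Figure 2(e): alpha < beta ?
    Answer queryLess(String alpha, String beta) {
        if (related(Rel.EQ, alpha).contains(beta)) return Answer.UNSAT;   // row 1
        if (related(Rel.LT, alpha).contains(beta)) return Answer.VALID;   // rows 2-3
        if (related(Rel.LE, alpha).contains(beta)) return Answer.UNKNOWN; // row 4
        if (related(Rel.GE, alpha).contains(beta)
                || related(Rel.GT, alpha).contains(beta))
            return Answer.UNSAT;                                          // rows 5-6
        return Answer.UNKNOWN;
    }

    // Figure 2(f): record alpha < beta (taken) or alpha >= beta (not taken).
    void updateLess(String alpha, String beta, boolean taken) {
        boolean wasLe = related(Rel.LE, alpha).remove(beta);
        boolean wasNe = related(Rel.NE, alpha).remove(beta);
        if (taken)
            related(Rel.LT, alpha).add(beta);   // rules 1 and 3
        else if (wasLe)
            related(Rel.EQ, alpha).add(beta);   // rule 4: alpha <= beta <= alpha
        else if (wasNe)
            related(Rel.GT, alpha).add(beta);   // rule 2
        else
            related(Rel.GE, alpha).add(beta);   // no prior mapping between the two
    }
}
```

With α ≤ β recorded, queryLess answers UNKNOWN; following the F) branch then moves β into α's equality set, after which the same query answers UNSAT.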


3.1.3 Managing Symbol/BAE Constraints

LDP leverages the functional consistency of binary arithmetic expressions over symbolic values (which represent constants) to reduce the number of symbolic values introduced. That is, LDP caches expressions over symbolic values and reuses symbolic values, and it gives its client access to the cache. For example, suppose Kiasan interprets the following code under the state x = α and y = β (and α ⊑ β):

x + y < 0 && y + x > 0.

The first arithmetic expression is assigned a new symbolic value γ (assuming α + β has never been interpreted before), and γ = α + β is added to the path condition along with γ < 0. When Kiasan executes the second expression, it can retrieve γ using LDP and then ask whether γ > 0, which is clearly inconsistent; thus, Kiasan always gives false as the result of the expression.
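The expression cache can be sketched as a map keyed by the canonical form of the expression. The class ExpressionCache and the encoding of symbolic values as integer timestamps are assumptions for illustration only; operands of the commutative operators + and ∗ are ordered by timestamp, mirroring the canonicalization rule v′ + v → v + v′.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: reuse one symbol per canonical binary arithmetic expression.
final class ExpressionCache {
    private final Map<String, Integer> cache = new HashMap<>();
    private int nextSymbol; // timestamps double as symbol identities

    ExpressionCache(int firstFreshSymbol) {
        nextSymbol = firstFreshSymbol;
    }

    // Returns the symbol standing for (left op right); a fresh symbol is
    // introduced only when the canonical expression has not been seen before.
    int symbolFor(int left, char op, int right) {
        if ((op == '+' || op == '*') && right < left) {
            int tmp = left; // canonical operand order for commutative operators
            left = right;
            right = tmp;
        }
        String key = left + " " + op + " " + right;
        return cache.computeIfAbsent(key, k -> nextSymbol++);
    }
}
```

With α and β encoded as 0 and 1, both x + y and y + x resolve to the same cached symbol, so the second comparison is posed against γ rather than against a fresh, unrelated symbol.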

3.1.4 Other Optimizations

Union-Find (UF): LDP uses a UF algorithm to maintain equivalence classes of symbolic values. We specifically designed an algorithm that leverages ⊑. The intuition is to always use the symbolic value with the earliest timestamp as the representative; it serves as a more stable representative with respect to Kiasan’s DFS backtracking (i.e., the earlier the timestamp is, the later the corresponding symbolic value is un-introduced). In addition, we leverage the fact that LDP uses find more often than union; that is, find must be a really fast operation, while union can be a bit more expensive. LDP’s UF algorithm is as follows (↦ is lifted to also work on sets, and S is ∅ initially):
• find(S, e) = e, if S(e) is undefined; otherwise, S(e);
• union(S, e, e′) is defined as:
  – if find(S, e) ⊑ find(S, e′), then S[e′ ↦ r, E ↦ r], where r = find(S, e) and E = {e′′ | (e′′, e′) ∈ S};
  – otherwise, union(S, e′, e).

It is clear that the UF algorithm above is linear in space with respect to the number of S’s elements. Given that S is implemented as a hash table, find takes constant time on average and linear time in the worst case; union, however, is always linear.
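A minimal Java sketch of this scheme is shown below. It assumes that symbolic values are integers and that a smaller integer means an earlier timestamp; the class name TimestampUnionFind is hypothetical. Unlike the definition above, the sketch looks up the representatives of both arguments before merging, so it also accepts non-representative elements.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a union-find whose representative is always the earliest element. */
final class TimestampUnionFind {
    /** S: maps each non-representative element directly to its representative. */
    private final Map<Integer, Integer> rep = new HashMap<>();

    /** find(S, e) = e if S(e) is undefined; otherwise S(e). A single hash lookup. */
    int find(int e) {
        return rep.getOrDefault(e, e);
    }

    /** Merges the classes of a and b; the earlier (smaller) representative wins. */
    void union(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra == rb) {
            return; // already in the same class
        }
        int winner = Math.min(ra, rb);
        int loser = Math.max(ra, rb);
        // Linear scan: every element that pointed at the loser now points at the winner.
        for (Map.Entry<Integer, Integer> entry : rep.entrySet()) {
            if (entry.getValue() == loser) {
                entry.setValue(winner);
            }
        }
        rep.put(loser, winner);
    }
}
```

Because every element points directly at its representative, find never walks a chain; the cost is moved into union, which matches the stated preference for fast find operations.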

Representative propagation and constraint merging: When LDP learns α = β, LDP uses the UF algorithm above to maintain the representative of the equivalence class containing α and β (i.e., α = β). Assuming that α is the least symbol with respect to ⊑, LDP will substitute α for β in any subsequent query or update expression it is given (i.e., any symbols used in Figure 2 actually refer to the symbols’ representatives). In order to maintain a consistent context when representatives are introduced, LDP merges the concrete and symbolic constraints of the two variables and assigns the bounds to the representative. For example, suppose PC = ⋀{β > 3, α ≤ β, α ≥ β} and the test constraint is α > 0. LDP is able to infer that α ≤ β ≤ α, thus α = β. LDP will set α’s representative to be β’s representative and merge the bounds; hence, α = β and α > 3. Therefore, LDP will be able to answer the query α > 0 with VALID, since α > 3 implies α > 0.
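The merging of concrete bounds can be pictured as an interval intersection. The Java sketch below is an assumption-laden illustration (closed integer intervals, hypothetical class name Bounds), not the directive tables LDP actually uses.

```java
/** Hypothetical closed integer interval [lo, hi] attached to a symbolic value. */
final class Bounds {
    final long lo;
    final long hi;

    Bounds(long lo, long hi) {
        this.lo = lo;
        this.hi = hi;
    }

    /** Bounds of the merged equivalence class: the intersection of both intervals. */
    Bounds intersect(Bounds other) {
        return new Bounds(Math.max(lo, other.lo), Math.min(hi, other.hi));
    }

    /** An empty intersection means the merged constraints are inconsistent. */
    boolean isEmpty() {
        return lo > hi;
    }

    /** True if every value in the (non-empty) interval is greater than c. */
    boolean impliesGreaterThan(long c) {
        return !isEmpty() && lo > c;
    }
}
```

For the example above over the integers, β > 3 becomes the interval [4, ∞) and α is unbounded; after the merge the representative carries [4, ∞), so the query α > 0 is implied.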

Returning BSAT: Before LDP returns UNKNOWN, it checks to see if it can conclude the stronger decision BSAT. For α ⋈ c constraints, LDP returns BSAT provided that α is not related to another symbol. For the case where LDP is not able to decide BSAT, consider the query α ≥ 0 against PC = ⋀{α < β, β < 0}. Clearly, the original constraint is not satisfiable, since α is constrained by β to be strictly less than 0. If the constraint α < β were removed from the context, then clearly both α ≥ 0 and ¬(α ≥ 0) would be satisfiable.

For α ⋈ β constraints, LDP returns BSAT provided that α and β are not related to other symbols and at least one of them is not associated with a concrete bound. For example, if PC = ⋀{α < β, β < γ}, then the constraint α > γ is unsatisfiable, which implies that its negation is valid. If the same query was made using an empty context, or one in which α and γ are not related to other symbols, then clearly both the formula and its negation are satisfiable.

3.1.5 Assessment

As can be observed, LDP’s inference engine is limited. For example, LDP cannot conclude α ≤ γ from α ≤ β and β ≤ γ (i.e., transitive closure on inequalities in general). We consciously decided to use linear complexity as our guideline when deciding whether to incorporate/customize an algorithm in LDP. We use the term “guideline” because it might be the case that even though an algorithm is linear, it may not prove to be effective in reducing the overall analysis cost. Conversely, non-linear algorithms may prove to be effective. Finding the delicate balance between what to incorporate in LDP and what to defer to the underlying solver (i.e., SMT solvers such as Yices would be able to solve the above problem) is an interesting research question that we plan to pursue further. Regardless, we submit that, while simple, LDP is effective; the experimental results in Section 4 demonstrate that LDP is beneficial for a large class of programs.

3.1.6 Integration with Kiasan

Decision procedures: Kiasan organizes LDP and an underlying decision procedure DP in a waterfall model. That is, it consults LDP first; only if LDP returns UNKNOWN does Kiasan consult DP. More specifically, for a context Ĉ, a test constraint C, and a path condition PC:
• If LDP returns VALID, Kiasan continues executing along the path, and it remembers not to explore the other path where ¬C, thus avoiding backtracking and a consistency check. Next, C is added to PC, but the fact is not pushed to DP. This reduces the constraints maintained by DP and saves time by not sending the constraint over to DP.
• If LDP returns UNSAT, Kiasan directly backtracks and explores the path where ¬C.
• If LDP returns BSAT, Kiasan will explore both branches. Depending on which branch is chosen, Kiasan adds C or ¬C to PC, LDP’s Ĉ, and DP’s context. That is, Kiasan avoids calling DP’s consistency check, thus reducing time.
• Otherwise, LDP returns UNKNOWN. Kiasan will consult DP to check the satisfiability of C. If it is satisfiable, Kiasan continues executing along the path; otherwise, it backtracks.

Non-test constraints are directly added to PC and LDP/DP.
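The waterfall organization can be sketched in Java as follows. The types LdpResult, Ldp, Solver, and Waterfall are hypothetical stand-ins that we introduce for illustration; constraints are plain strings, and the sketch only decides whether the branch where C holds is feasible, counting how often the underlying solver is consulted.

```java
/** The four answers a lightweight procedure may give for a test constraint. */
enum LdpResult { VALID, UNSAT, BSAT, UNKNOWN }

/** Hypothetical minimal view of the lightweight procedure. */
interface Ldp {
    LdpResult query(String constraint);
}

/** Hypothetical minimal view of the underlying decision procedure DP. */
interface Solver {
    boolean isSatisfiable(String constraint);
}

/** Consults LDP first and falls back to DP only on UNKNOWN. */
final class Waterfall {
    private final Ldp ldp;
    private final Solver dp;
    /** Number of times the underlying solver was actually called. */
    int solverCalls = 0;

    Waterfall(Ldp ldp, Solver dp) {
        this.ldp = ldp;
        this.dp = dp;
    }

    /** Returns true if execution may continue along the branch where the constraint holds. */
    boolean isFeasible(String constraint) {
        switch (ldp.query(constraint)) {
            case VALID: // the other branch need not be explored
            case BSAT:  // both branches are feasible; no DP consistency check needed
                return true;
            case UNSAT:
                return false; // backtrack immediately
            default:
                solverCalls++;
                return dp.isSatisfiable(constraint);
        }
    }
}
```

The counter makes the intended saving visible: only queries that LDP answers with UNKNOWN reach the solver.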

Constant propagation: When Kiasan interprets operation/test instructions involving a symbolic value α, it first asks LDP whether it has inferred that α is equal to a constant. If so, Kiasan uses the constant instead. Thus, under certain situations, Kiasan does concrete execution instead of the more expensive SymExe. In such situations, Kiasan does not call LDP’s constraint queries/updates, use DP, or push more constraints to its PC. This effectively reduces the number of decision procedure calls.

For example, consider the following code snippet:

if (x == 3 && (y == 5 || y == 7)) z = y / x;

Kiasan is able to compute the non-linear arithmetic expression y / x concretely, because LDP knows that the symbolic values for x and y are actually constants. Unless non-linear arithmetic such as the above is weakened to uninterpreted functions, Yices will reject the constraints generated for the true branch. This is because Yices does not perform constant propagation [13], which is understandably more difficult to do in a general purpose solver.

Expression canonicalization and caching: When Kiasan builds a symbolic expression, it asks LDP to build the canonical form and to determine whether there has been a symbolic value associated with it. If so, Kiasan uses the existing symbolic value for the result of the expression. This reduces the number of variables a decision procedure needs to remember or maintain.

Backtracking: When latched onto Kiasan, LDP piggy-backs on Kiasan’s backtracking facility. That is, it does not use its own backtracking algorithm. Instead, LDP stores instructions to reverse its states’ deltas and attaches the instructions to Kiasan’s backtracking data structure. Thus, LDP’s constraint management is synchronized with Kiasan’s SymExe DFS algorithm. This incurs somewhat less space/time overhead than using a separate backtracking engine.
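The idea of storing reversal instructions rather than copying state can be illustrated with a small undo trail. The Java sketch below is hypothetical (the class UndoableContext and its methods are our own names), and its trail merely stands in for Kiasan’s backtracking data structure, whose real interface is not described here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Sketch: a context that records how to reverse each update instead of copying state. */
final class UndoableContext {
    private final Map<String, Long> lowerBound = new HashMap<>();
    /** Undo instructions, newest first. */
    private final Deque<Runnable> trail = new ArrayDeque<>();

    /** Current trail position; a decision point remembers this value. */
    int mark() {
        return trail.size();
    }

    /** Updates a bound and records the instruction that reverses the change. */
    void setLowerBound(String symbol, long value) {
        final Long old = lowerBound.put(symbol, value);
        Runnable undo;
        if (old == null) {
            undo = () -> lowerBound.remove(symbol);
        } else {
            undo = () -> lowerBound.put(symbol, old);
        }
        trail.push(undo);
    }

    /** Replays undo instructions until the trail is back at the given mark. */
    void backtrackTo(int mark) {
        while (trail.size() > mark) {
            trail.pop().run();
        }
    }

    /** Returns the recorded lower bound, or null if none is known. */
    Long lowerBoundOf(String symbol) {
        return lowerBound.get(symbol);
    }
}
```

A DFS engine would call mark() at each decision point and backtrackTo(mark) when it returns to that point, so the context is restored in step with the search.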

3.2 Model-based Engineering

LDP’s table-based approach consists of 24 directive tables. As one can imagine, implementing all LDP directives is non-trivial, laborious, and error-prone. Thus, we employ generative programming by translating a model of LDP directive tables to Java source code that implements them. In addition, all the (LaTeX) tables in Figure 2 (including the graphs in the Example column) are also auto-generated from the same model. One benefit of this approach is “bug consistency”; that is, if the translation scheme is wrong, it is very apparent and easy to detect because it affects most code in the generated implementations. This approach also makes it easier to maintain and revise LDP as we experiment with it. Furthermore, optimization can be done by customizing the translator for different numerical type domains. For example, for integers, we can leverage the additional fact that strict inequalities can be normalized to non-strict inequalities. Therefore, there are two categories of LDP implementations, targeted for integers and reals, which are generated from the same models.

4 Evaluation

4.1 Foundational

Soundness: One key goal of LDP is not to give wrong results. That is, when LDP returns exact results like VALID, UNSAT, or BSAT, the results must be correct. Suppose the path condition is PC, LDP’s context is Ĉ (implied by the mappings such as V and C), and the decision procedure’s context is D̂.

Theorem 1. (a) If the query result of a constraint C from LDP is VALID, then PC ⟹ C. If the result is UNSAT, then PC ∧ C = false. If the result is BSAT, then both PC ∧ C and PC ∧ ¬C are satisfiable; and (b) PC = Ĉ = D̂.

Proof sketch: This can be proved by strong co-induction on the number of constraints (conjuncts) in the path condition. The inductive step can be shown by a case analysis of the return value of LDP’s query on both Part (a) and Part (b). The reader is referred to [5] for more details.

Complexity: We will examine the space complexity and the time complexity of the query/update/canonicalization/caching operations in terms of the number of constraints in the path condition, n. Since each table entry is added only after a constraint is added to the path condition, the number of entries in the tables is less than or equal to cn for some constant c. Therefore, LDP has a space complexity of O(n). The worst-case time complexities of the operations are:
• The query operation has O(n) complexity since there is a fixed number of table lookups, and the set membership test is linear under the customized UF algorithm (for representative lookup).
• The canonicalization operation has O(1) time complexity since it only permutes at most three elements each time.
• The caching table lookup operation has O(n) time complexity.
• The update operation has O(n) time complexity since the most complex operation that the update does is to merge two table entries, for example, when a constraint α = β is introduced. In this case, the update will be the union operation of the UF algorithm. Furthermore, the union is linear in |S|, which is the number of variables. Since each constraint introduces at most one new variable, the number of variables is less than or equal to n. We conclude that the update operation has O(n) time complexity.

4.2 Empirical

4.2.1 Validating Soundness

Since LDP can only handle limited forms of constraints due to our design decisions, we cannot leverage the SMT-LIB benchmarks [3] to test LDP. Thus, when testing the soundness of LDP, we appeal to the concept of redundancy. That is, since LDP should be sound, its answers must agree with a sound decision procedure DP in a consistent manner. More precisely, consistency is defined as follows:
• If LDP returns VALID, then DP returns SAT for the formula and UNSAT for the negation of the formula;
• If LDP returns BSAT, then DP returns SAT for both the formula and the negation of the formula;
• If LDP returns UNSAT, then DP returns UNSAT.

We chose Yices [14] as DP due to its maturity/performance.
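The consistency relation above can be written down as a small predicate. The Java sketch below uses hypothetical names (LdpAnswer, Consistency, agrees) and takes the reference solver’s two answers as booleans; it is meant as an illustration of the definition, not as the test harness itself.

```java
/** The four answers LDP may give for a formula F under a path condition PC. */
enum LdpAnswer { VALID, UNSAT, BSAT, UNKNOWN }

final class Consistency {
    private Consistency() { }

    /**
     * Checks LDP's answer against a reference decision procedure.
     *
     * @param ldp      LDP's answer for F
     * @param dpSat    true if the reference solver reports SAT for PC and F
     * @param dpNegSat true if the reference solver reports SAT for PC and not F
     */
    static boolean agrees(LdpAnswer ldp, boolean dpSat, boolean dpNegSat) {
        switch (ldp) {
            case VALID:
                return dpSat && !dpNegSat;
            case BSAT:
                return dpSat && dpNegSat;
            case UNSAT:
                return !dpSat;
            default:
                return true; // UNKNOWN makes no claim, so it cannot disagree
        }
    }
}
```

A randomized harness would raise an error whenever agrees returns false for a generated constraint.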

In order to empirically validate the soundness of LDP, a test harness was created to automate the testing process. The inputs are the number of variables to be used and the maximum size of the path condition (i.e., the maximum number of conjoined satisfying constraints). Tests are generated by non-deterministically generating a constraint in the form of those listed in Section 3.1. The generated constraint is then passed to Yices and LDP to check the validity of the constraint against the current path condition. If the answers from Yices and LDP are not consistent, then an error is raised. If the answers are UNSAT, then the constraint is dropped. Otherwise, if Yices indicates the constraint is consistent with the current path condition and LDP returned BSAT/UNKNOWN, then the constraint is added to LDP. Additional constraints are generated until the maximum size of the path condition is reached.

In order to test various combinations of |PC| and the number of variables used, the harness was started by selecting |PC| to be 4 and then gradually increasing this value after a set of tests completed. For each |PC|, the harness began by selecting from a pool of 3 unique variables. Once the |PC| was fulfilled, the harness generated a fresh variable and added it to the pool. Then the PC of both Yices and LDP was cleared and a new path was generated. The harness reran the tests 50 times with different seed values for each combination of |PC| and variable pool size. For these tests, Kiasan is not needed since we are only concerned with checking the consistency of LDP (i.e., the constraints do not need to be derived from Java programs).

Early in the development of LDP, randomized double testing uncovered some inconsistencies and typos in LDP’s model. Since then, we have conducted this randomized double testing for weeks and all the tests finished without an error.

4.2.2 Validating Effectiveness

Kiasan’s optimized engine as the experiment basis: The Kiasan SymExe engine has incorporated several optimizations; we now describe some Kiasan optimizations that are relevant when evaluating LDP’s effectiveness.

Reducing DP calls and backtracking points: Topi does not push/pop constraints to the underlying decision procedure until it is really needed (i.e., at decision points when a consistency check should be done). This way, the number of times Kiasan calls DP is reduced. In addition, DP backtracking may be expensive; thus, we optimized Topi such that it only creates a DP backtracking point just before a decision point. Thus, when Kiasan backtracks, DP is backtracked directly to the last decision point.

A native binding for SMT solvers: Most high-performing SMT solvers are written in C or C++, while Kiasan is implemented in Java. Thus, there is an engineering problem of how to communicate constraints and answers between the two. A Java Native Interface (JNI) binding has been used previously, and it has proved beneficial [1]. Instead of developing JNI bindings for each native solver (e.g., CVC3, Yices, etc.) or sending constraints as expression strings (which a DP will then have to parse), we define a bytecode instruction set for communicating constraints. Topi converts Kiasan’s (buffered) constraints to the bytecode form before sending it over to the native side. At the native side, the bytecode is processed to rebuild the constraint tree, which is then pushed to the solver’s context. Bytecode is generally cheaper to parse than strings, and Topi’s bytecode is solver-independent.

Constant propagation: Kiasan already incorporates some form of constant propagation. Whenever it executes equality/disequality comparisons between a symbolic value α and a constant c, it remembers that α is actually c (depending on the branch it follows; the true branch for equality, and the false branch for disequality).

Experimental setup: When evaluating LDP, we use Kiasan’s optimized version as described above. To conduct the experiment, we use a machine equipped with 2.8 GHz processors running OS X 10.5.6 and Java 1.6 64-bit with the standard JVM’s default heap size.

     1 //@ ensures isSorted(x);
     2 void selectionSort(@NonNull int[] x) {
     3   for (int i = 0; i < x.length - 1; i++)
     4     for (int j = i + 1; j < x.length; j++)
     5       if (x[i] > x[j])
     6         { int t = x[i]; x[i] = x[j]; x[j] = t; }
     7 }
     8
     9 @Pure boolean isSorted(@NonNull int[] a) {
    10   for (int i = 0; i < a.length - 1; i++)
    11     if (a[i] > a[i + 1]) return false;
    12   return true;
    13 }

Figure 3: Selection Sort method

Experiment Data: Table 1 presents Kiasan/LDP experiment data. The columns c and m indicate the Java program class and method names analyzed (source code available at [25]). The k column indicates the maximum length of the reference chain of objects considered and the maximum number of unique array indices each array can be accessed with [9]. The larger the value of k is, the more test cases are going to be generated by Kiasan, but the analysis may be more expensive (Kiasan reaches 100% feasible instruction and branch coverage with k = 4 for most examples listed in Table 1 [10]³). The column Y indicates Kiasan’s running time with Yices, and the column C its running time with CVC3; the signs + and - indicate whether LDP is used or not, respectively. For k < 6 on the left and right tables, and k < 7 on the middle table, the running time is an average of 10 runs. This was done because the analysis time is very fast. The rest is an average of 5 runs, except for methods running over 1 hour, where we did only 1 run. The R column indicates the percentage of time reduced by using LDP (e.g., ((Y-) - (Y+))/(Y-)). For each method and a bound k, Kiasan with/without LDP generated the same number of test cases, and they pass the case optimality metrics presented in [11] (for methods where such metrics are available). No other forms of bounding are used for the experiment.
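As a worked illustration of the R column, the Java sketch below parses the time format used in Table 1 and computes the reduction percentage. The class and method names (Timing, parseSeconds, reductionPercent) are hypothetical; because the reported times are rounded averages, recomputed percentages may differ from the printed ones by about a point.

```java
/** Sketch: reading Table 1 timings and computing the reduction column R. */
final class Timing {
    private Timing() { }

    /** Parses "ss.ms", "mm:ss.ms", or "h:mm:ss.ms" into seconds. */
    static double parseSeconds(String time) {
        double seconds = 0;
        for (String part : time.split(":")) {
            seconds = seconds * 60 + Double.parseDouble(part);
        }
        return seconds;
    }

    /** Percentage of time saved: ((without LDP) - (with LDP)) / (without LDP) * 100. */
    static double reductionPercent(String withoutLdp, String withLdp) {
        double before = parseSeconds(withoutLdp);
        double after = parseSeconds(withLdp);
        return (before - after) / before * 100.0;
    }
}
```

For example, reductionPercent("5.03", "0.85"), the Yices timings for selection sort at k = 6, yields roughly 83.1, in line with the 83% reported in Table 1.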

Assessment: As can be observed, LDP gives the most significant time cost reduction on the array-oriented examples in the middle tables of Table 1. The tree data structures such as AATree, AvlTree, BinarySearchTree, and TreeMap (a red-black tree implementation) in Table 1 also benefit from LDP’s caching strategy because the keys for the data structures are scalars. Kiasan’s benchmark also includes object keys [9]. In this case, LDP will not give an improvement in analysis cost since these examples use object comparison abstracted by contracts (thus, treated as an uninterpreted function) instead of scalar comparison. One step further, programs involving no operations and tests on scalars will not benefit from LDP. However, in such cases, LDP queries/updates are not called; thus, there is no overhead. LDP does not give a significant improvement if there are only a few scalar constraints, as illustrated in the data for k = 3, where LDP actually introduced some time overhead. However, as one can observe, the actual time overhead is less than 1 second; it is actually hard to accurately compare split-second running times. The next section gives a detailed walkthrough of a representative example to illustrate why LDP can give drastic time cost savings.

³ Sireum/Kiasan uses a finer-grained k-bound; thus, its k value is generally 2 more than the earlier Kiasan’s k-bound to generate the same number of test cases.

Table 1 (left): Yices

c                 m           k  Y-          Y+          R
AATree            contains    3  0.93        0.94        0%
                              4  1.03        1.02        1%
                              5  1.54        1.45        5%
                              6  18.20       14.39       20%
                  findMax     3  0.94        0.96        -1%
                              4  0.98        0.96        2%
                              5  1.12        1.08        3%
                              6  8.54        8.10        5%
                  findMin     3  0.91        0.90        1%
                              4  0.92        0.93        -1%
                              5  1.08        1.04        4%
                              6  8.40        7.97        5%
                  insert      3  0.74        0.76        -1%
                              4  0.83        0.82        1%
                              5  1.32        1.20        8%
                              6  19.24       14.92       22%
                  remove      3  0.72        0.73        0%
                              4  0.81        0.78        3%
                              5  1.28        1.18        8%
                              6  18.78       14.65       21%
AvlTree           find        3  0.71        0.69        2%
                              4  0.79        0.77        2%
                              5  2.03        1.79        11%
                              6  1:51.58     1:25.24     23%
                  findMax     3  0.70        0.69        2%
                              4  0.71        0.70        0%
                              5  1.06        1.02        3%
                              6  23.29       21.31       8%
                  findMin     3  0.68        0.68        0%
                              4  0.72        0.72        0%
                              5  1.05        1.00        4%
                              6  23.24       21.37       8%
                  insert      3  0.70        0.69        1%
                              4  0.82        0.79        3%
                              5  2.05        1.84        10%
                              6  1:42.84     1:18.44     23%
BinarySearchTree  find        3  0.66        0.64        2%
                              4  0.71        0.66        6%
                              5  1.49        1.27        14%
                              6  1:32.11     1:00.27     34%
                  findMax     3  0.63        0.65        -3%
                              4  0.65        0.64        1%
                              5  0.81        0.75        6%
                              6  10.01       8.12        18%
                  findMin     3  0.62        0.62        0%
                              4  0.64        0.64        0%
                              5  0.82        0.78        4%
                              6  10.10       8.20        18%
                  insert      3  0.62        0.64        -3%
                              4  0.69        0.68        1%
                              5  1.57        1.29        17%
                              6  1:36.43     1:03.08     34%
                  remove      3  0.63        0.62        2%
                              4  0.68        0.67        2%
                              5  1.42        1.23        13%
                              6  1:27.96     0:58.46     33%
StackLi           pop         3  0.58        0.58        0%
                              4  0.60        0.58        3%
                              5  0.59        0.58        2%
                              6  0.62        0.60        2%
                  push        3  0.39        0.40        -2%
                              4  0.41        0.40        2%
                              5  0.41        0.40        1%
                              6  0.42        0.42        0%
TreeMap           get         3  1.10        1.11        0%
                              4  1.19        1.18        0%
                              5  1.93        1.82        5%
                              6  3:37.90     2:50.39     21%
                  lastKey     3  0.84        0.86        -2%
                              4  0.89        0.90        -1%
                              5  1.46        1.33        9%
                              6  3:31.63     2:44.12     22%
                  put         3  1.18        1.21        -1%
                              4  1.48        1.44        2%
                              5  4.33        3.59        16%
                              6  9:30.27     7:40.51     19%
                  remove      3  1.19        1.17        1%
                              4  1.43        1.42        0%
                              5  4.22        3.52        16%
                              6  9:22.14     9:00.18     3%

Table 1 (middle): Yices

c                 m           k  Y-          Y+          R
ArrayPart.        partition   3  0.31        0.29        4%
                              4  0.34        0.33        4%
                              5  0.40        0.35        11%
                              6  0.54        0.42        22%
                              7  0.78        0.51        34%
DisjSets          Find        3  0.77        0.45        41%
                              4  1.70        0.65        61%
                              5  17.89       2.25        87%
                              6  5:17.27     0:24.65     92%
                              7  1:56:48.26  0:07:08.20  93%
                  union       3  0.45        0.36        19%
                              4  1.50        0.58        61%
                              5  14.98       3.01        79%
                              6  3:48.78     38.02       83%
                              7  1:13:23.98  0:10:43.47  85%
DisjSetsFast      Find        3  0.42        0.31        26%
                              4  1.74        0.46        73%
                              5  22.44       2.10        90%
                              6  6:53.51     0:26.21     93%
                              7  2:33:15.16  0:07:40.08  94%
                  union       3  0.57        0.33        42%
                              4  3.20        0.72        77%
                              5  38.42       5.12        86%
                              6  10:04.02    1:07.68     88%
                              7  3:09:54.60  0:18:55.07  90%
Sort              insertion.  3  0.24        0.24        -1%
                              4  0.28        0.26        8%
                              5  0.46        0.31        33%
                              6  1.63        0.56        65%
                              7  10.86       2.07        80%
                  selection.  3  0.25        0.23        7%
                              4  0.32        0.25        23%
                              5  0.79        0.33        57%
                              6  5.03        0.85        83%
                              7  51.34       5.85        88%
StackAr           pop         3  0.39        0.39        -1%
                              4  0.41        0.40        0%
                              5  0.40        0.40        1%
                              6  0.39        0.41        -5%
                              7  0.38        0.40        -5%

Table 1 (middle): CVC3

c                 m           k  C-          C+          R
ArrayPart.        partition   3  0.36        0.32        11%
                              4  0.45        0.37        18%
                              5  0.69        0.46        34%
                              6  1.26        0.68        45%
                              7  2.55        1.15        54%
DisjSets          Find        3  0.72        0.46        35%
                              4  3.86        1.06        72%
                              5  47.76       6.48        86%
                              6  14:01.29    1:18.44     90%
                  union       3  0.87        0.53        38%
                              4  5.24        1.96        62%
                              5  1:00.23     0:19.21     68%
                              6  15:30.83    4:32.83     70%
DisjSetsFast      Find        3  1.40        0.53        62%
                              4  10.40       1.81        82%
                              5  2:14.34     0:15.33     88%
                              6  38:33.00    3:23.74     91%
                  union       3  3.31        1.10        66%
                              4  31.25       7.89        74%
                              5  6:46.34     1:35.12     76%
                              6  2:26:4.76   0:25:20.14  82%
Sort              insertion.  3  0.34        0.32        6%
                              4  0.53        0.36        32%
                              5  1.64        0.65        60%
                              6  9.76        2.33        76%
                              7  1:13.86     0:14.51     80%
                  selection.  3  0.31        0.26        15%
                              4  0.60        0.35        41%
                              5  2.73        0.88        67%
                              6  22.80       5.53        75%
                              7  4:04.45     0:52.70     78%
StackAr           pop         3  0.40        0.41        -3%
                              4  0.40        0.40        1%
                              5  0.41        0.41        -1%
                              6  0.42        0.40        2%
                              7  0.43        0.41        3%

Table 1 (right): CVC3

c                 m           k  C-          C+          R
AATree            contains    3  0.96        0.94        1%
                              4  1.12        1.08        3%
                              5  2.75        2.22        19%
                              6  1:13.04     0:54.80     24%
                  findMax     3  0.98        0.97        0%
                              4  1.02        1.01        1%
                              5  1.80        1.67        7%
                              6  45.91       41.62       9%
                  findMin     3  0.93        0.92        0%
                              4  0.96        0.97        0%
                              5  1.78        1.61        9%
                              6  45.75       41.60       9%
                  insert      3  0.78        0.76        2%
                              4  0.95        0.90        4%
                              5  2.55        1.98        22%
                              6  1:14.34     0:55.79     24%
                  remove      3  0.76        0.75        1%
                              4  0.91        0.86        4%
                              5  3.43        2.16        37%
                              6  3:23.61     1:18.75     61%
AvlTree           find        3  0.73        0.70        4%
                              4  0.87        0.85        2%
                              5  4.44        3.32        25%
                              6  5:58.92     4:08.68     30%
                  findMax     3  0.71        0.72        0%
                              4  0.78        0.75        3%
                              5  2.17        1.92        11%
                              6  1:58.74     1:46.48     10%
                  findMin     3  0.71        0.70        2%
                              4  0.78        0.77        2%
                              5  2.16        1.91        11%
                              6  1:58.10     1:46.38     9%
                  insert      3  0.72        0.70        3%
                              4  0.92        0.86        6%
                              5  4.58        3.40        25%
                              6  5:54.52     4:05.94     30%
BinarySearchTree  find        3  0.67        0.66        1%
                              4  0.75        0.74        1%
                              5  3.56        2.40        32%
                              6  6:07.90     3:28.99     43%
                  findMax     3  0.65        0.64        0%
                              4  0.69        0.68        0%
                              5  1.31        1.08        17%
                              6  42.90       30.52       28%
                  findMin     3  0.65        0.62        4%
                              4  0.67        0.68        0%
                              5  1.34        1.08        19%
                              6  43.07       30.60       28%
                  insert      3  0.66        0.64        3%
                              4  0.78        0.74        5%
                              5  3.74        2.46        34%
                              6  6:18.68     3:32.32     43%
                  remove      3  0.65        0.63        2%
                              4  0.76        0.72        5%
                              5  3.44        2.46        28%
                              6  6:05.84     3:34.47     41%
StackLi           pop         3  0.60        0.60        -1%
                              4  0.58        0.61        -6%
                              5  0.61        0.60        2%
                              6  0.64        0.63        1%
                  push        3  0.41        0.40        2%
                              4  0.39        0.41        -5%
                              5  0.41        0.42        -1%
                              6  0.46        0.45        2%
TreeMap           get         3  1.14        1.13        0%
                              4  1.54        1.41        8%
                              5  16.11       7.81        51%
                              6  3:33:8.17   0:26:45.50  87%
                  lastKey     3  0.86        0.88        -2%
                              4  1.11        0.98        11%
                              5  11.20       2.88        74%
                              6  3:19:11.29  0:12:23.91  93%
                  put         3  1.27        1.22        4%
                              4  2.29        1.64        28%
                              5  50.20       9.38        81%
                              6  7:22:30.13  0:31:12.94  92%
                  remove      3  1.24        1.20        2%
                              4  2.26        1.64        27%
                              5  52.71       10.21       80%
                              6  7:40:46.98  0:35:36.93  92%

Table 1: Experiment data. Time format h:mm:ss.ms, where h = hours, mm = minutes, ss = seconds, and ms = 2-digit (rounded) milliseconds; all timing data include code and JML contract processing times.

Walkthrough on selectionSort: As can be observed from the experimental results, one common class of problems for which LDP provides significant benefit is array accesses. Selection sort, as seen in Figure 3, is an example of this class. The outer loop keeps track of the sorted and unsorted portions of the array. The inner loop finds the least element of the remaining unsorted items and swaps it with the item at the head of the unsorted portion, thus effectively adding it to the end of the sorted portion. The contract for selectionSort is provided as a JML specification which states a precondition that the array x must be non-null and the postcondition that x must be sorted according to the isSorted method.

To illustrate LDP’s effectiveness, suppose that the symbolic value α has been assigned to x.length. As an invariant of an array length, Kiasan adds the initial constraint α ≥ 0. As can be observed, the loop variables i and j will only hold concrete values (i.e., constant values initially with constant increments). When executing the loop condition, x.length - 1 is evaluated to β with the constraint β = α − 1 pushed.

Tightening bounds and constant propagation: During the first iteration of the outer loop of selectionSort, Kiasan queries LDP if 0 < β. Since β is related to α, LDP returns UNKNOWN. Thus, Kiasan asks a decision procedure DP, which indicates that both the original constraint and its negation are satisfiable. Suppose the false branch is explored first, which causes the constraint β ≤ 0 to be pushed. Thus, PC = ⋀{α ≥ 0, β = α − 1, β ≤ 0}. After selectionSort ends, Kiasan checks the isSorted postcondition. The expression a.length - 1 in the loop condition at line 10 is evaluated to β due to LDP’s β = α − 1 cache. Hence, the loop condition amounts to querying 0 < β; LDP returns UNSAT, and the check passes.

Now consider the case where Kiasan enters the outer loop of selectionSort. Since the inner loop condition operates on x.length (i.e., α), LDP is able to infer that α is always equal to a constant whenever the condition fails. For example, consider the first iteration of the outer loop in which i = 0, and thus, j = 1. Kiasan proceeds by evaluating the inner loop condition. As α is involved in a relation, LDP returns UNKNOWN for the query 1 < α; thus, the query is passed to DP, which then determines that both the original constraint and its negation are satisfiable. Suppose Kiasan chooses to follow the true branch; α > 1 is pushed, which LDP rewrites as α ≥ 2.

At the second iteration of the inner loop, Kiasan increments j and queries LDP if 2 < α. Again, LDP answers UNKNOWN. Suppose Kiasan follows the false branch (2 ≥ α); then LDP will be able to infer α = 2 from 2 ≤ α ≤ 2. When the postcondition isSorted is checked, Kiasan will be able to substitute x.length with the concrete value 2, and thus will be able to concretely execute the loop since both the index and array length have concrete values.
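The bound tightening in this walkthrough can be replayed with a small Java sketch. It assumes an integer domain, in which strict inequalities are normalized to non-strict ones, and the class IntBounds with its methods is a hypothetical illustration rather than LDP’s generated code.

```java
import java.util.OptionalLong;

/** Hypothetical integer bounds for one symbolic value, tightened as constraints arrive. */
final class IntBounds {
    private long lo = Long.MIN_VALUE;
    private long hi = Long.MAX_VALUE;

    /** Records value >= c. */
    void atLeast(long c) {
        lo = Math.max(lo, c);
    }

    /** Records value <= c. */
    void atMost(long c) {
        hi = Math.min(hi, c);
    }

    /** Over the integers, value > c is normalized to value >= c + 1. */
    void greaterThan(long c) {
        atLeast(Math.addExact(c, 1));
    }

    /** Over the integers, value < c is normalized to value <= c - 1. */
    void lessThan(long c) {
        atMost(Math.subtractExact(c, 1));
    }

    /** The inferred constant, present only once both bounds have met. */
    OptionalLong constant() {
        return lo == hi ? OptionalLong.of(lo) : OptionalLong.empty();
    }
}
```

Replaying the path above for α: atLeast(0) for the array length invariant, greaterThan(1) for the true branch of 1 < α, and atMost(2) for the false branch of 2 < α leave the bounds at [2, 2], so constant() yields 2.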

In general, array accesses potentially benefit from LDP. This is because once Java arrays are created, their lengths are fixed. Once LDP is able to infer a constant for an array length by tightening its bounds, array accesses using concrete indices are interpreted concretely (i.e., LDP and DP are not called to make decisions) for the rest of the path execution. Moreover, array index bound checks can also be done concretely.

Expression caching: The inner loop of selectionSort introduces relational constraints on the elements contained in x, either x[i] > x[j] or x[i] ≤ x[j] for some i and j at line 5 of Figure 3. LDP caches results that are decided either by LDP or by the underlying DP. This concept of caching becomes useful when the postcondition isSorted is called. For a given exploration path, LDP will maintain the necessary information to answer VALID or UNSAT for all of the tests performed at line 11 in Figure 3.

In general, (scalar) assertion/contract checking potentially benefits from LDP because it is often the case that assertion and contract conditions are essentially “redundant” checks with respect to the program behaviors, especially if the program is correct. Thus, LDP’s caching strategy may be beneficial in such situations.

5 Related Work

The constraints managed by LDP’s Symbol/Constant directive tables are a subset of interval analysis. This is a well-studied area with classical work such as [18]. An interesting parallel is the work by Cousot and Cousot [8], where they use interval analysis in abstract interpretation; SymExe, like many (sound) static analysis techniques, is an abstract interpretation technique. However, complex operations usually employed in abstract interpretation using interval analysis, such as widening, are not used in LDP, because SymExe does not merge states at program join points. Due to SymExe’s characteristics, LDP only needs to tighten bounds on symbolic values. LDP only uses an intersection operation on a pair of intervals (i.e., when merging constraints on two symbols). This simplifies the design and development of LDP while making it easier for LDP’s interval analysis to be modeled as directive tables.

Term rewriting and canonicalization of expressions [12] is a well-used technique employed in many different areas. LDP utilizes these concepts to uniquely identify some semantically equivalent expressions via expression caching. Expression caching has been used previously in a symbolic execution-based analysis by [22], where the authors use Herbrand equivalence [20] between symbolic expressions, which states that expressions are equivalent only when they are exactly equal (i.e., the equivalence relation is independent of the interpretation of operations), to check result equivalence of a single-threaded and a parallel program developed to perform the same scientific computations. LDP’s arithmetic expression caching facility allows Kiasan to partially benefit from this concept; however, LDP does not attempt to completely detect Herbrand equivalences. It only implements a simple caching strategy that can be done linearly, because LDP does not aim to provide a complete answer (such decisions are deferred to the underlying decision procedure). [21, 6] use simple constraint subsumption (e.g., if a constraint C is in the PC conjuncts, then avoid branches with condition ¬C). LDP’s inference engine subsumes these approaches.

Constant propagation is a well-known technique, for example, in dataflow frameworks [19]. However, SymExe can benefit more from constant propagation and LDP’s stronger inference engine due to its path-sensitive information, which does not lose precision due to merging of dataflow facts.

6 Conclusion and Future Work

In this paper, we have presented a lightweight decision procedure (LDP) effective for reducing the analysis time cost of SymExe-based techniques. Our experimental results demonstrate that LDP can provide up to an order of magnitude faster analysis time. The approach employs several well-studied and widely known techniques such as interval analysis, term rewriting, constraint caching, and constant propagation, but adapted to SymExe’s characteristics and our focus on algorithms with linear space and time complexity with respect to the size of the SymExe path condition. Any SymExe program analysis working on three-address code can benefit from LDP’s approach because it generates similar forms of constraints. Furthermore, the model-based engineering approach in LDP development enables adaptation to other frameworks by developing custom translators.

Future work includes conducting larger case studies to determine opportunities for further optimizations, and incorporating other, possibly non-linear algorithms and evaluating their effectiveness. Another area of interest is to mechanically prove the soundness of LDP directive tables using a proof assistant.

7 Acknowledgments

The authors would like to thank John Hatcliff, Matthew Dwyer, David Schmidt, Xinming Ou, and Jooyong Lee for insightful discussion on LDP, and Bruno Dutertre and John Rushby for insightful discussion on Yices’ internal algorithms and SMT solvers in general.

8 References

[1] S. Anand, C. S. Pasareanu, and W. Visser. JPF-SE: A symbolic execution extension to Java PathFinder. In International Conference on Tools and Algorithms for Construction and Analysis of Systems (TACAS 2007), pages 134–138, Braga, Portugal, March 2007.

[2] L. Baresi, C. Ghezzi, and L. Mottola. On accurate automatic verification of publish-subscribe architectures. In ICSE ’07: Proceedings of the 29th International Conference on Software Engineering, pages 199–208, Washington, DC, USA, 2007. IEEE Computer Society.

[3] C. Barrett, S. Ranise, A. Stump, and C. Tinelli. The Satisfiability Modulo Theories Library (SMT-LIB). www.SMT-LIB.org, 2008.

[4] C. Barrett and C. Tinelli. CVC3. In W. Damm and H. Hermanns, editors, Proceedings of the 19th International Conference on Computer Aided Verification (CAV ’07), volume 4590 of Lecture Notes in Computer Science, pages 298–302. Springer-Verlag, July 2007. Berlin, Germany.

[5] J. Belt, Robby, and X. Deng. Sireum/Topi LDP: A lightweight semi-decision procedure for optimizing symbolic execution-based analyses. Technical Report SAnToS TR2009-03-16, Kansas State University, Mar 2009. Available from http://people.cis.ksu.edu/~belt/reports/SanToS-TR2009-03-16.pdf.

[6] C. Cadar, D. Dunbar, and D. R. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, pages 209–224, 2008.

[7] M. B. Cohen, M. B. Dwyer, and J. Shi. Exploiting constraint solving history to construct interaction test suites. In TAICPART-MUTATION ’07: Proceedings of the Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION, pages 121–132, Washington, DC, USA, 2007. IEEE Computer Society.

[8] P. Cousot and R. Cousot. Static determination of dynamic properties of programs. In Proceedings of the Second International Symposium on Programming, pages 106–130. Dunod, Paris, France, 1976.

[9] X. Deng, J. Lee, and Robby. Bogor/Kiasan: A k-bounded symbolic execution for checking strong heap properties of open systems. In 21st IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 157–166, 2006.

[10] X. Deng, Robby, and J. Hatcliff. Kiasan/KUnit: Automatic test case generation and analysis feedback for open object-oriented systems. In Testing: Academic and Industrial Conference – Practice and Research Techniques, 2007.

[11] X. Deng, Robby, and J. Hatcliff. Towards a case-optimal symbolic execution algorithm for analyzing strong properties of object-oriented programs. In Proceedings of the 5th IEEE International Conference on Software Engineering and Formal Methods, pages 273–282, 2007.

[12] N. Dershowitz and J.-P. Jouannaud. Rewrite systems. In Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics (B), pages 243–320, 1990.

[13] B. Dutertre. Personal communication, 2008.

[14] B. Dutertre and L. de Moura. The Yices SMT solver. Tool paper at http://yices.csl.sri.com/tool-paper.pdf, August 2006.

[15] M. B. Dwyer, J. Hatcliff, Robby, and V. P. Ranganath. Exploiting object escape and locking information in partial-order reductions for concurrent object-oriented programs. Formal Methods in System Design, 25(2-3):199–240, 2004.

[16] J. C. King. Symbolic execution and program testing. Communications of the ACM, 19(7):385–394, 1976.

[17] G. T. Leavens, A. L. Baker, and C. Ruby. JML: a Java modeling language. In Formal Underpinnings of Java Workshop (at OOPSLA’98), 1998.

[18] R. Moore. Interval Analysis. Prentice Hall, 1966.

[19] F. Nielson, H. R. Nielson, and C. Hankin. Principles of Program Analysis. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1999.

[20] O. Ruthing, J. Knoop, and B. Steffen. Detecting equalities of variables: Combining efficiency with precision. In SAS ’99: Proceedings of the 6th International Symposium on Static Analysis, pages 232–247, London, UK, 1999. Springer-Verlag.

[21] K. Sen, D. Marinov, and G. Agha. CUTE: a concolic unit testing engine for C. In ESEC/FSE-13: Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering, pages 263–272, New York, NY, USA, 2005. ACM.

[22] S. F. Siegel, A. Mironova, G. S. Avrunin, and L. A. Clarke. Using model checking with symbolic execution to verify parallel numerical programs. In ISSTA ’06: Proceedings of the 2006 international symposium on Software testing and analysis, pages 157–168, New York, NY, USA, 2006. ACM.

[23] C. Tinelli. A DPLL-based calculus for ground satisfiability modulo theories. In JELIA ’02: Proceedings of the European Conference on Logics in Artificial Intelligence, pages 308–319, London, UK, 2002. Springer-Verlag.

[24] K. Walden and J.-M. Nerson. Seamless object-oriented software architecture: analysis and design of reliable systems. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1995.


[25] Sireum website. http://www.sireum.org, 2009.

APPENDIX

A Extended Proof

In this section, we first describe a property of LDP’s symbol-to-constant tables, C. We then prove two lemmas about the query and update operations of LDP. Finally, we prove that LDP is sound, that is, when LDP returns a concrete result such as VALID, UNSAT, or BSAT, the result is correct.

Property 1. A symbol α cannot have mappings in C≤ and C< at the same time, that is, C≤(α) = ⊥ or C<(α) = ⊥. Similarly, it cannot have mappings in C≥ and C> at the same time, that is, C≥(α) = ⊥ or C>(α) = ⊥.

This property is apparent from the query and update processes. One immediate consequence of this property of the C tables is that all entries skipped by jumps in the LDP directive tables for queries have no mapping for symbol α. Therefore, if LDP returns UNKNOWN in the C tables, LDP has examined all the symbol-to-constant constraints about the queried symbol.
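Property 1 can be read as an invariant over the four bound tables. The following is a minimal sketch, not the authors' implementation: the class name `ConstTables`, its field names, and the choice to model ⊥ as an absent dictionary key are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConstTables:
    """Hypothetical stand-in for the tables C≤, C<, C≥, and C>.

    A missing key plays the role of ⊥ (no mapping for that symbol).
    """
    le: Dict[str, int] = field(default_factory=dict)  # C≤
    lt: Dict[str, int] = field(default_factory=dict)  # C<
    ge: Dict[str, int] = field(default_factory=dict)  # C≥
    gt: Dict[str, int] = field(default_factory=dict)  # C>

    def satisfies_property_1(self, sym: str) -> bool:
        # At most one upper-bound table and at most one lower-bound
        # table may hold an entry for the symbol.
        one_upper = not (sym in self.le and sym in self.lt)
        one_lower = not (sym in self.ge and sym in self.gt)
        return one_upper and one_lower
```

Under this reading, a symbol may have one upper bound and one lower bound simultaneously, but never two bounds on the same side.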

In the lemmas about the query and update processes and in the final soundness theorem, we use the following notation. Let the path condition be PC, LDP’s context be Ĉ (implied by mappings such as C, V, and the variable-to-binary-expression map), and the decision procedure’s context be D̂. The current constraint is C. On receiving the constraint C, LDP canonicalizes it into C′. Clearly, C = C′. Without loss of generality, we use C for C′.

Lemma 1. Suppose constraint C is a test constraint. Then Kiasan will perform a query operation on LDP. (a) If LDP returns VALID, then Ĉ ⟹ C. (b) If LDP returns UNSAT, then Ĉ ⟹ ¬C. (c) If LDP returns BSAT, then Ĉ ∧ C and Ĉ ∧ ¬C are both satisfiable.

Proof. We proceed by a case analysis on the form of the constraint C. Since all the cases are similar, we only show the case of C = α<c. We further divide the proof into sub-cases according to the result that LDP’s query process returns:

• VALID: In the LDP directive table, we can see that LDP reaches this conclusion at lines 3, 14, and 15. Suppose LDP returns VALID at line 3. We know c>d and α=d is in Ĉ. Clearly Ĉ ⟹ α<c. The reasoning for lines 14 and 15 is similar.

• UNSAT: Then LDP returns at lines 1, 2, 4, 5, 7, or 8 in the directive table. Suppose LDP returns UNSAT at line 1. We know α=d is contained in Ĉ and c<d. Thus Ĉ ⟹ ¬(α<c). The reasoning for the rest of the lines is similar.

• BSAT: LDP returns BSAT at lines 10, 11, and 13 when α is not related to another variable, or at lines 6 and 9 when the rest of the lookup cannot find an α entry. Without loss of generality, let us just consider the case when LDP returns BSAT at line 6. By Property 1, we know C>(α) = ⊥. So LDP has exhaustively checked all constraints that are relevant to α and found that the intersections of the relevant constraints in Ĉ with C and with ¬C are not empty. Therefore, Ĉ ∧ C and Ĉ ∧ ¬C are both satisfiable.
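The case analysis for C = α<c can be made concrete with a small lookup routine. This is a sketch under stated assumptions rather than the authors' code: the function name `query_lt`, the string keys for the tables, and the use of plain dictionaries are hypothetical. It follows the α<c query rows of Table 2 literally and returns UNKNOWN wherever the table does, leaving the refinement of UNKNOWN into BSAT (as discussed in the proof) to the caller.

```python
from typing import Dict

VALID, UNSAT, UNKNOWN = "VALID", "UNSAT", "UNKNOWN"
Tables = Dict[str, Dict[str, int]]


def empty_tables() -> Tables:
    # One dictionary per symbol-to-constant table; a missing key models ⊥.
    return {op: {} for op in ("=", ">=", ">", "<=", "<")}


def query_lt(tables: Tables, sym: str, c: int) -> str:
    """Answer the query sym < c from the constant tables alone."""
    eq = tables["="].get(sym)
    if eq is not None:                    # rows 1-3: sym = d is known
        return VALID if c > eq else UNSAT
    ge = tables[">="].get(sym)
    if ge is not None and c <= ge:        # rows 4-5: sym >= d and c <= d
        return UNSAT
    gt = tables[">"].get(sym)
    if gt is not None and c <= gt:        # rows 7-8: sym > d and c <= d
        return UNSAT
    # Rows 6 and 9 jump to the upper-bound tables.
    le = tables["<="].get(sym)
    if le is not None:                    # rows 10-12: sym <= d
        return VALID if c > le else UNKNOWN
    lt = tables["<"].get(sym)
    if lt is not None:                    # rows 13-15: sym < d
        return VALID if c >= lt else UNKNOWN
    return UNKNOWN                        # no relevant entry for sym
```

Each lookup is a constant number of dictionary accesses, which is in keeping with the paper's focus on lightweight checks.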

Lemma 2. Suppose Ĉ′ is the resulting context of LDP after the update operation. If the true branch is chosen and Ĉ ∧ C is not false, then Ĉ′ = Ĉ ∧ C. If the false branch is chosen and Ĉ ∧ ¬C is not false, then Ĉ′ = Ĉ ∧ ¬C.

Proof. If the constraint C is a non-test constraint, then C is directly added into the LDP tables. Thus we get Ĉ′ = Ĉ ∧ C. Suppose C is a test constraint. Then LDP will perform a query first. If the result of the query is BSAT or UNKNOWN, then LDP will update according to the update tables. Similar to Lemma 1, we proceed by a case analysis on the form of the constraint C. We show the case of C = α<c; the rest of the cases are similar. The proof is further divided into sub-cases according to LDP’s update rules, that is, according to where the query concludes.

• C=: The claim holds vacuously since the table will only return VALID or UNSAT.

• C≥: For the true branch, the update rule adds c to α’s C< mapping. So we have Ĉ′ = Ĉ ∧ α<c. For the false branch, the update rule adds c to α’s C≥ mapping, so we have Ĉ′ = Ĉ ∧ α≥c.

• C>: For the true branch, the update rule adds c to α’s C< mapping, so Ĉ′ = Ĉ ∧ α<c. Suppose the query ends at the false branch. From the query rule we know c>d where α>d ∈ Ĉ. Suppose Ĉ = Ĉ1 ∧ α>d. Then from the update rule, we have Ĉ′ = Ĉ1 ∧ α≥c. Since c>d, we have Ĉ′ = Ĉ1 ∧ α≥c = Ĉ1 ∧ α>d ∧ α≥c = Ĉ ∧ α≥c.

• C≤: Suppose the query ends at the true branch. From the query rule we know c≤d where α≤d ∈ Ĉ. Suppose Ĉ = Ĉ1 ∧ α≤d. Then from the update rule, we have Ĉ′ = Ĉ1 ∧ α<c. Since c≤d, we have Ĉ′ = Ĉ1 ∧ α<c = Ĉ1 ∧ α≤d ∧ α<c = Ĉ ∧ α<c.
Suppose the query ends at the false branch. Then we need to update Ĉ with the constraint ¬C = α≥c. By the query rule, there are two cases: c=d or c<d, where α≤d is in Ĉ.
(I) c=d. Then Ĉ ∧ α≥c = Ĉ ∧ c≤α≤c = Ĉ ∧ α=c since α≤d is in Ĉ and c=d. The update rules drop α’s C≤ mapping and any lower bound, and add c to α’s C= mapping. Referring to the updated context as Ĉ1, we have Ĉ′ = Ĉ1 ∧ α=c. We are only left to show that Ĉ1 ∧ α=c = Ĉ ∧ α=c. This is true since Ĉ′ is not false and the differences between Ĉ1 and Ĉ are only constraints relating α and constants.
(II) c<d. From the query rule, either there is no lower bound for α, that is, C≥ and C> have no entry for α, or c>d′, where d′ is an existing (strict or non-strict) lower bound on α. In both cases, α≥c defines a tighter lower bound on α. So the update rule drops the existing lower bound and adds c to α’s C≥ mapping. Referring to the updated context as Ĉ1, we have Ĉ′ = Ĉ1 ∧ α≥c = Ĉ ∧ α≥c.

• C<: Suppose the query ends at the true branch. The update rule adds c to α’s C< mapping, so Ĉ′ = Ĉ ∧ α<c. Suppose the query ends at the false branch. This case is similar to case (II) of the false branch of C≤.
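The update step for C = α<c can be sketched in the same style. This is an illustrative simplification, not the authors' implementation: `update_lt` and the dictionary layout are hypothetical, and the per-table rules of the α<c update table are collapsed into one routine. It assumes the preceding query did not return VALID or UNSAT, so the new bound is never looser than the one it replaces.

```python
from typing import Dict

Tables = Dict[str, Dict[str, int]]


def empty_tables() -> Tables:
    # A missing key models ⊥ (no mapping for the symbol).
    return {op: {} for op in ("=", ">=", ">", "<=", "<")}


def update_lt(tables: Tables, sym: str, c: int, branch_true: bool) -> None:
    """Record sym < c (true branch) or sym >= c (false branch)."""
    if branch_true:
        # True-branch rules: replace any non-strict upper bound by sym < c,
        # which also keeps Property 1 intact.
        tables["<="].pop(sym, None)
        tables["<"][sym] = c
        return
    if tables["<="].get(sym) == c:
        # sym <= c together with sym >= c collapses to sym = c.
        for op in (">=", ">", "<="):
            tables[op].pop(sym, None)
        tables["="][sym] = c
    else:
        # Otherwise sym >= c becomes the (tighter) lower bound.
        tables[">"].pop(sym, None)
        tables[">="][sym] = c
```

The equality collapse in the false branch mirrors case (I) of the C≤ sub-case in the proof, and the final branch mirrors case (II).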

To prove the soundness theorem, we assume the underlying decision procedure is sound and complete.


Theorem 2. (a) If the query result of a constraint C from LDP is VALID, then PC ⟹ C. If the result is UNSAT, then PC ∧ C = false. If the result is BSAT, then both PC ∧ C and PC ∧ ¬C are satisfiable; and (b) PC = Ĉ = D̂.

Proof. We prove by strong induction on the number of constraints in PC.

Basis: the number is 0. Then PC = Ĉ = D̂ = true. Clearly Part (b) holds. Furthermore, Part (a) holds vacuously.

Inductive step: assume the induction hypothesis (IH) that Part (a) and Part (b) hold whenever the number of constraints in PC is less than k. Assume the next constraint is C. If C is a non-test constraint, Part (a) holds vacuously. Since C is added to PC, Ĉ, and D̂, Part (b) holds by IH. Suppose C is a test constraint. We need to show that after the query of C, Part (a) holds, and that after the query of C and the update operation, Part (b) holds, that is, PC′ = Ĉ′ = D̂′, where PC′, Ĉ′, and D̂′ are the new path condition, LDP’s context, and the decision procedure’s context, respectively.

Part (a). (I) Suppose the result of the query is VALID. Thus, Ĉ ⟹ C by Lemma 1. Since PC = Ĉ by IH and Ĉ ⟹ C, we have PC ⟹ C. (II) Now, suppose the result is UNSAT. Thus, Ĉ ⟹ ¬C by Lemma 1. Since we also know PC = Ĉ by IH, we have PC ⟹ ¬C. Hence, ¬(PC ∧ C). (III) Lastly, suppose the result is BSAT. Then Ĉ ∧ C and Ĉ ∧ ¬C are satisfiable by Lemma 1. Since Ĉ = PC by IH, both PC ∧ C and PC ∧ ¬C are satisfiable. Hence, we have shown Part (a).

Part (b). We also divide the proof into cases according to the result of the query. (I) Suppose the result of the query is VALID. Thus, we get Ĉ ⟹ C, and there is no update, that is, Ĉ = Ĉ′ and D̂ = D̂′. By Part (a), we know PC ⟹ C. Thus, we have PC′ = PC ∧ C = PC. Consequently, PC′ = Ĉ′ = D̂′. (II) Suppose the result is UNSAT. From Part (a), we know ¬(PC ∧ C). Thus, the system will backtrack, which effectively brings PC′ to a state of fewer than k constraints. Therefore, Part (b) holds by IH. (III) Now, suppose the result is BSAT and the system takes the true branch. We have that Ĉ ∧ C is satisfiable by Lemma 1. Thus, by Lemma 2, Ĉ′ = Ĉ ∧ C. The system updates its path condition, PC′ = PC ∧ C, and pushes C into the decision procedure, D̂′ = D̂ ∧ C. Combining with IH, we have PC′ = Ĉ′ = D̂′. The case of the system taking the false branch is symmetric. (IV) Lastly, suppose the result is UNKNOWN. LDP will query the decision procedure. If the decision procedure returns satisfiable for C, then D̂ ∧ C is satisfiable by the assumption that the decision procedure is sound and complete. By IH, Ĉ ∧ C is satisfiable. Therefore, we can apply Lemma 2 and get Ĉ′ = Ĉ ∧ C. We also know that PC′ = PC ∧ C and D̂′ = D̂ ∧ C. Hence, PC′ = Ĉ′ = D̂′. The proof for the case where the decision procedure returns satisfiable for ¬C is similar.
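The four query outcomes handled in the proof suggest a simple control flow for a client that sits between the analysis and the solver. The following is a minimal sketch assuming hypothetical callables `ldp_query` (returning one of VALID, UNSAT, BSAT, or UNKNOWN) and `solver_is_sat` (a satisfiability check under the current context); neither name comes from the paper, and constraints are represented as opaque strings for illustration only.

```python
from typing import Callable


def decide(constraint: str,
           negation: str,
           ldp_query: Callable[[str], str],
           solver_is_sat: Callable[[str], bool]) -> str:
    """Try the lightweight procedure first; call the solver only on UNKNOWN."""
    answer = ldp_query(constraint)
    if answer != "UNKNOWN":
        return answer                      # solver call avoided
    sat_pos = solver_is_sat(constraint)    # is context ∧ C satisfiable?
    sat_neg = solver_is_sat(negation)      # is context ∧ ¬C satisfiable?
    if sat_pos and sat_neg:
        return "BSAT"
    if sat_pos:
        return "VALID"
    if sat_neg:
        return "UNSAT"
    # Both unsatisfiable means the context itself was already infeasible.
    raise ValueError("context is unsatisfiable")
```

In this arrangement the solver is consulted only when the lightweight procedure cannot answer, so any definite answer from `ldp_query` saves two solver calls.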

B LDP Directive Tables

α<c: query rules

?C    Test        Result     #
C=    c < C(α)    UNSAT      1
      c = C(α)    UNSAT      2
      c > C(α)    VALID      3
C≥    c < C(α)    UNSAT      4
      c = C(α)    UNSAT      5
      c > C(α)    C≤?        6
C>    c < C(α)    UNSAT      7
      c = C(α)    UNSAT      8
      c > C(α)    C≤?        9
C≤    c < C(α)    UNKNOWN    10
      c = C(α)    UNKNOWN    11
      c > C(α)    VALID      12
C<    c < C(α)    UNKNOWN    13
      c = C(α)    VALID      14
      c > C(α)    VALID      15

[Value Example column: number-line diagrams, not recoverable from the extracted text]

α<c: update rules

!C    Update Rule                                            #
C≥    F) α ≥ c → C≥[α ↦ c]                                   1
      T) α < c → C<[α ↦ c]                                   2
C>    F) α ≥ c → C>[α ↦ ⊥], C≥[α ↦ c]                        3
      T) α < c → C<[α ↦ c]                                   4
C≤    F) α ≥ c → if c = C≤(α)
                 then C≥,>,≤[α ↦ ⊥], C=[α ↦ c]
                 else C>[α ↦ ⊥], C≥[α ↦ c]                   5
      T) α < c → C≤[α ↦ ⊥], C<[α ↦ c]                        6
C<    F) α ≥ c → C>[α ↦ ⊥], C≥[α ↦ c]                        7
      T) α < c → C<[α ↦ c]                                   8

Table 2: Query and Update Rules for α<c


α≤c: query rules

?C    Test        Result     #
C=    c < C(α)    UNSAT      1
      c = C(α)    VALID      2
      c > C(α)    VALID      3
C≥    c < C(α)    UNSAT      4
      c = C(α)    C>?        5
      c > C(α)    C>?        6
C>    c < C(α)    UNSAT      7
      c = C(α)    C≤?        8
      c > C(α)    C≤?        9
C≤    c < C(α)    UNKNOWN    10
      c = C(α)    VALID      11
      c > C(α)    VALID      12
C<    c < C(α)    UNKNOWN    13
      c = C(α)    VALID      14
      c > C(α)    VALID      15

[Value Example column: number-line diagrams, not recoverable from the extracted text]

α≤c: update rules

!C    Update Rule                                            #
C≥    T) α ≤ c → if c = C≥(α)
                 then C≥[α ↦ ⊥], C=[α ↦ c]
                 else C≤[α ↦ c]                              1
      F) α ≥ c → C≥[α ↦ c]                                   2
      F) α > c → C≥[α ↦ ⊥], C>[α ↦ c]                        3
C>    T) α ≤ c → C≤[α ↦ c]                                   4
      F) α > c → C>[α ↦ c]                                   5
C≤    T) α ≤ c → C≤[α ↦ c]                                   6
      F) α ≥ c → if c = C≤(α)
                 then C≤[α ↦ ⊥], C=[α ↦ c]
                 else C≥[α ↦ c]                              7
      F) α > c → C>[α ↦ c]                                   8
C<    T) α ≤ c → C<[α ↦ ⊥], C≤[α ↦ c]                        9
      F) α > c → C>[α ↦ c]                                   10

Table 3: Query and Update Rules for α≤c

α>c: query rules

?C    Test        Result     #
C=    c < C(α)    VALID      1
      c = C(α)    UNSAT      2
      c > C(α)    UNSAT      3
C≤    c < C(α)    C≥?        4
      c = C(α)    UNSAT      5
      c > C(α)    UNSAT      6
C<    c < C(α)    C≥?        7
      c = C(α)    UNSAT      8
      c > C(α)    UNSAT      9
C≥    c < C(α)    VALID      10
      c = C(α)    UNKNOWN    11
      c > C(α)    UNKNOWN    12
C>    c < C(α)    VALID      13
      c = C(α)    VALID      14
      c > C(α)    UNKNOWN    15

[Value Example column: number-line diagrams, not recoverable from the extracted text]

α>c: update rules

!C    Update Rule                                            #
C≤    F) α ≤ c → C≤[α ↦ c]                                   1
      T) α > c → C>[α ↦ c]                                   2
C<    F) α ≤ c → C<[α ↦ ⊥], C≤[α ↦ c]                        3
      T) α > c → C>[α ↦ c]                                   4
C≥    F) α ≤ c → if c = C≥(α)
                 then C≥[α ↦ ⊥], C=[α ↦ c]
                 else C≤[α ↦ c]                              5
      T) α > c → C≥[α ↦ ⊥], C>[α ↦ c]                        6
C>    F) α ≤ c → C≤[α ↦ c]                                   7
      T) α > c → C>[α ↦ c]                                   8

Table 4: Query and Update Rules for α>c


α≥c: query rules

?C    Test        Result     #
C=    c < C(α)    VALID      1
      c = C(α)    VALID      2
      c > C(α)    UNSAT      3
C≤    c < C(α)    C≥?        4
      c = C(α)    C≥?        5
      c > C(α)    UNSAT      6
C<    c < C(α)    C≥?        7
      c = C(α)    UNSAT      8
      c > C(α)    UNSAT      9
C≥    c < C(α)    VALID      10
      c = C(α)    VALID      11
      c > C(α)    UNKNOWN    12
C>    c < C(α)    VALID      13
      c = C(α)    VALID      14
      c > C(α)    UNKNOWN    15

[Value Example column: number-line diagrams, not recoverable from the extracted text]

α≥c: update rules

!C    Update Rule                                            #
C≤    F) α ≤ c → C≤[α ↦ c]                                   1
      T) α ≥ c → if c = C≤(α)
                 then C≤[α ↦ ⊥], C=[α ↦ c]
                 else C≥[α ↦ c]                              2
      F) α < c → C≤[α ↦ ⊥], C<[α ↦ c]                        3
C<    T) α ≥ c → C≥[α ↦ c]                                   4
      F) α < c → C<[α ↦ c]                                   5
C≥    F) α ≤ c → if c = C≥(α)
                 then C≥[α ↦ ⊥], C=[α ↦ c]
                 else C≤[α ↦ c]                              6
      T) α ≥ c → C≥[α ↦ c]                                   7
      F) α < c → C<[α ↦ c]                                   8
C>    T) α ≥ c → C>[α ↦ ⊥], C≥[α ↦ c]                        9
      F) α < c → C<[α ↦ c]                                   10

Table 5: Query and Update Rules for α≥c

α=c: query rules

?C    Test        Result     #
C=    c < C(α)    UNSAT      1
      c = C(α)    VALID      2
      c > C(α)    UNSAT      3
C≠    c ∈ C(α)    UNSAT      4
      c ∉ C(α)    C≤?        5
C≤    c < C(α)    C≥?        6
      c = C(α)    C≥?        7
      c > C(α)    UNSAT      8
C<    c < C(α)    C≥?        9
      c = C(α)    UNSAT      10
      c > C(α)    UNSAT      11
C≥    c < C(α)    UNSAT      12
      c = C(α)    UNKNOWN    13
      c > C(α)    UNKNOWN    14
C>    c < C(α)    UNSAT      15
      c = C(α)    UNSAT      16
      c > C(α)    UNKNOWN    17

[Value Example column: number-line diagrams, not recoverable from the extracted text]

α=c: update rules

!C    Update Rule                                            #
C≠    T) α = c → C=[α ↦ c]                                   1
      F) α ≠ c → C≠[α ↦ C(α) ∪ {c}]                          2
C≤    T) α = c → C≤,≥,>[α ↦ ⊥], C=[α ↦ c]                    3
      F) α ≠ c → if c = C≤(α)
                 then C≤[α ↦ ⊥], C<[α ↦ c]
                 else C≠[α ↦ C(α) ∪ {c}]                     4
C<    T) α = c → C<[α ↦ ⊥], C=[α ↦ c]                        5
      F) α ≠ c → C≠[α ↦ C(α) ∪ {c}]                          6
C≥    T) α = c → C≥,≤,<[α ↦ ⊥], C=[α ↦ c]                    7
      F) α ≠ c → if c = C≥(α)
                 then C≥[α ↦ ⊥], C>[α ↦ c]
                 else C≠[α ↦ C(α) ∪ {c}]                     8
C>    T) α = c → C>,≤,<[α ↦ ⊥], C=[α ↦ c]                    9
      F) α ≠ c → C≠[α ↦ C(α) ∪ {c}]                          10

Table 6: Query and Update Rules for α=c


α≠c: query rules

?C    Test        Result     #
C≠    c ∈ C(α)    VALID      1
      c ∉ C(α)    C=?        2
C=    c < C(α)    VALID      3
      c = C(α)    UNSAT      4
      c > C(α)    VALID      5
C≥    c < C(α)    VALID      6
      c = C(α)    UNKNOWN    7
      c > C(α)    C≤?        8
C>    c < C(α)    VALID      9
      c = C(α)    VALID      10
      c > C(α)    C≤?        11
C≤    c < C(α)    UNKNOWN    12
      c = C(α)    UNKNOWN    13
      c > C(α)    VALID      14
C<    c < C(α)    UNKNOWN    15
      c = C(α)    VALID      16
      c > C(α)    VALID      17

[Value Example column: number-line diagrams, not recoverable from the extracted text]

α≠c: update rules

!C    Update Rule                                            #
C≠    F) α = c → C≥,>,≤,<[α ↦ ⊥], C=[α ↦ c]                  1
      T) α ≠ c → C≠[α ↦ C(α) ∪ {c}]                          2
C≥    F) α = c → C≥,>,≤,<[α ↦ ⊥], C=[α ↦ c]                  3
      T) α ≠ c → if c = C≥(α)
                 then C≥[α ↦ ⊥], C>[α ↦ c]
                 else C≠[α ↦ C(α) ∪ {c}]                     4
C>    F) α = c → C>,≤,<[α ↦ ⊥], C=[α ↦ c]                    5
      T) α ≠ c → C≠[α ↦ C(α) ∪ {c}]                          6
C≤    F) α = c → C≤,≥,>[α ↦ ⊥], C=[α ↦ c]                    7
      T) α ≠ c → if c = C≤(α)
                 then C≤[α ↦ ⊥], C<[α ↦ c]
                 else C≠[α ↦ C(α) ∪ {c}]                     8
C<    F) α = c → C<,≥,>[α ↦ ⊥], C=[α ↦ c]                    9
      T) α ≠ c → C≠[α ↦ C(α) ∪ {c}]                          10

Table 7: Query and Update Rules for α≠c

α<β: query rules

?V    Test        Result     #
V=    β ∈ V(α)    UNSAT      1
V≠    β ∈ V(α)    V<?        2
V<    β ∈ V(α)    VALID      3
V≤    β ∈ V(α)    UNKNOWN    4
V≥    β ∈ V(α)    UNSAT      5
V>    β ∈ V(α)    UNSAT      6

α<β: update rules

!V    Update Rule                                                  #
V≠    F) α ≥ β → V≠[α ↦ V(α) \ {β}], V>[α ↦ V(α) ∪ {β}]            1
      T) α < β → V≠[α ↦ V(α) \ {β}], V<[α ↦ V(α) ∪ {β}]            2
V≤    F) α ≥ β → V≤[α ↦ V(α) \ {β}], V=[α ↦ V(α) ∪ {β}]            3
      T) α < β → V≤[α ↦ V(α) \ {β}], V<[α ↦ V(α) ∪ {β}]            4

Table 8: Query and Update Rules for α<β
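The variable-to-variable tables differ from the constant tables in that each entry is a set of related symbols. The α<β query rows can be sketched as follows; this is an illustration under assumptions, not the authors' code. The name `query_var_lt` and the dictionary-of-sets layout are hypothetical, and a missing key stands for the empty set.

```python
from typing import Dict, Set

VALID, UNSAT, UNKNOWN = "VALID", "UNSAT", "UNKNOWN"
VarTables = Dict[str, Dict[str, Set[str]]]


def empty_var_tables() -> VarTables:
    # One table per relation; each maps a symbol to a set of related symbols.
    return {op: {} for op in ("=", "!=", "<", "<=", ">=", ">")}


def query_var_lt(v: VarTables, a: str, b: str) -> str:
    """Answer the query a < b from the variable-to-variable tables."""
    def related(op: str) -> bool:
        return b in v[op].get(a, set())

    if related("="):                  # row 1: a = b contradicts a < b
        return UNSAT
    # Row 2 (a != b) decides nothing by itself and falls through to V<.
    if related("<"):                  # row 3
        return VALID
    if related("<="):                 # row 4: a <= b leaves a < b open
        return UNKNOWN
    if related(">=") or related(">"):  # rows 5-6
        return UNSAT
    return UNKNOWN                    # no recorded relation between a and b
```

As with the constant tables, the lookup only inspects directly recorded relations; it does not chase transitive chains such as a < x and x < b.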

α≤β: query rules

?V    Test        Result     #
V=    β ∈ V(α)    VALID      1
V≠    β ∈ V(α)    V≤?        2
V≤    β ∈ V(α)    VALID      3
V<    β ∈ V(α)    VALID      4
V≥    β ∈ V(α)    UNKNOWN    5
V>    β ∈ V(α)    UNSAT      6

α≤β: update rules

!V    Update Rule                                                  #
V≠    T) α ≤ β → V≠[α ↦ V(α) \ {β}], V<[α ↦ V(α) ∪ {β}]            1
      F) α > β → V≠[α ↦ V(α) \ {β}], V>[α ↦ V(α) ∪ {β}]            2
V≥    T) α ≤ β → V≥[α ↦ V(α) \ {β}], V=[α ↦ V(α) ∪ {β}]            3
      F) α > β → V≥[α ↦ V(α) \ {β}], V>[α ↦ V(α) ∪ {β}]            4

Table 9: Query and Update Rules for α≤β

α>β: query rules

?V    Test        Result     #
V=    β ∈ V(α)    UNSAT      1
V≠    β ∈ V(α)    V≥?        2
V≥    β ∈ V(α)    UNKNOWN    3
V>    β ∈ V(α)    VALID      4
V<    β ∈ V(α)    UNSAT      5
V≤    β ∈ V(α)    UNSAT      6

α>β: update rules

!V    Update Rule                                                  #
V≠    F) α ≤ β → V≠[α ↦ V(α) \ {β}], V<[α ↦ V(α) ∪ {β}]            1
      T) α > β → V≠[α ↦ V(α) \ {β}], V>[α ↦ V(α) ∪ {β}]            2
V≥    F) α ≤ β → V≥[α ↦ V(α) \ {β}], V=[α ↦ V(α) ∪ {β}]            3
      T) α > β → V≥[α ↦ V(α) \ {β}], V>[α ↦ V(α) ∪ {β}]            4

Table 10: Query and Update Rules for α>β

α≥β: query rules

?V    Test        Result     #
V=    β ∈ V(α)    VALID      1
V≠    β ∈ V(α)    V≥?        2
V≥    β ∈ V(α)    VALID      3
V>    β ∈ V(α)    VALID      4
V≤    β ∈ V(α)    UNKNOWN    5
V<    β ∈ V(α)    UNSAT      6

α≥β: update rules

!V    Update Rule                                                  #
V≠    T) α ≥ β → V≠[α ↦ V(α) \ {β}], V>[α ↦ V(α) ∪ {β}]            1
      F) α < β → V≠[α ↦ V(α) \ {β}], V<[α ↦ V(α) ∪ {β}]            2
V≤    T) α ≥ β → V≤[α ↦ V(α) \ {β}], V=[α ↦ V(α) ∪ {β}]            3
      F) α < β → V≤[α ↦ V(α) \ {β}], V<[α ↦ V(α) ∪ {β}]            4

Table 11: Query and Update Rules for α≥β

α=β: query rules

?V    Test        Result     #
V=    β ∈ V(α)    VALID      1
V≠    β ∈ V(α)    UNSAT      2
V<    β ∈ V(α)    UNSAT      3
V>    β ∈ V(α)    UNSAT      4
V≤    β ∈ V(α)    UNKNOWN    5
V≥    β ∈ V(α)    UNKNOWN    6

α=β: update rules

!V    Update Rule                                                  #
V≤    T) α = β → V≤[α ↦ V(α) \ {β}], V=[α ↦ V(α) ∪ {β}]            1
      F) α ≠ β → V≤[α ↦ V(α) \ {β}], V<[α ↦ V(α) ∪ {β}]            2
V≥    T) α = β → V≥[α ↦ V(α) \ {β}], V=[α ↦ V(α) ∪ {β}]            3
      F) α ≠ β → V≥[α ↦ V(α) \ {β}], V>[α ↦ V(α) ∪ {β}]            4

Table 12: Query and Update Rules for α=β

α≠β: query rules

?V    Test        Result     #
V≠    β ∈ V(α)    VALID      1
V=    β ∈ V(α)    UNSAT      2
V<    β ∈ V(α)    VALID      3
V>    β ∈ V(α)    VALID      4
V≤    β ∈ V(α)    UNKNOWN    5
V≥    β ∈ V(α)    UNKNOWN    6

α≠β: update rules

!V    Update Rule                                                  #
V≤    F) α = β → V≤[α ↦ V(α) \ {β}], V=[α ↦ V(α) ∪ {β}]            1
      T) α ≠ β → V≤[α ↦ V(α) \ {β}], V<[α ↦ V(α) ∪ {β}]            2
V≥    F) α = β → V≥[α ↦ V(α) \ {β}], V=[α ↦ V(α) ∪ {β}]            3
      T) α ≠ β → V≥[α ↦ V(α) \ {β}], V>[α ↦ V(α) ∪ {β}]            4

Table 13: Query and Update Rules for α≠β