
Noname manuscript No. (will be inserted by the editor)

A Benders squared (B2) framework for infinite-horizon stochastic linear programs

Giacomo Nannicini · Emiliano Traversi · Roberto Wolfler Calvo

Received: date / Accepted: date

Abstract We propose a nested decomposition scheme for infinite-horizon stochastic linear programs. Our approach can be seen as a provably convergent extension of stochastic dual dynamic programming to the infinite-horizon setting: we explore a sequence of finite-horizon problems of increasing length until we can prove convergence with a given confidence level. The methodology alternates between a forward pass to explore sample paths and determine trial solutions, and a backward pass to generate a polyhedral approximation of the optimal value function by computing subgradients from the dual of the scenario subproblems. A computational study on a large set of randomly generated instances for two classes of problems shows that the proposed algorithm is able to effectively solve instances of moderate size to high precision, provided that the instance structure allows the construction of stationary policies with satisfactory objective function value.

Keywords Benders Decomposition · Stochastic Programming · Cutting Planes

1 Introduction

In the context of optimal planning under uncertainty, there are many situations in which the decision maker is interested in solving a steady-state planning problem. Such a scenario arises whenever there are repeated decisions that have to be

G. Nannicini
IBM T.J. Watson, Yorktown Heights, NY
E-mail: [email protected]

E. Traversi
Laboratoire d'Informatique de Paris Nord, Université de Paris 13; and Sorbonne Paris Cité, CNRS (UMR 7538), 93430 Villetaneuse, France
E-mail: [email protected]

R. Wolfler Calvo
Laboratoire d'Informatique de Paris Nord, Université de Paris 13; and Sorbonne Paris Cité, CNRS (UMR 7538), 93430 Villetaneuse, France
E-mail: [email protected]


taken over an infinite (i.e. unbounded) time horizon, for example if production of a given set of items has to be planned every day under uncertain demands, and this process is repeated indefinitely. In this type of model there is a discrete time index t that ranges from zero (present time) to infinity, and a decision problem must be solved for every integer value of t (called a stage). The decision problem at each stage depends on the state of the system, which is influenced by both the realization of uncertain events, and previous decisions (often called actions). For example, the state of the system could be the inventory of the items at hand, and the actions could be the set of items ordered from a supplier at that point in time. Each action incurs a cost, and the objective is to find a rule that maps states into actions (called a policy) in order to minimize the total expected cost over the time horizon t = 0, . . . , ∞. To guarantee finiteness of the cost, a geometric discount factor for the cost is typically introduced, to model the fact that the present-time value of decisions that will be taken in the far future is small, and goes to zero as the distance in time grows. A problem of this form is naturally described as a dynamic program (DP, to be distinguished from "dynamic programming" by context), introduced in the seminal work of Bellman [1957]. Well-known methodologies to solve infinite-horizon DPs are value iteration and policy iteration, which converge to an optimal solution under some assumptions, see Bertsekas [1995]. These methodologies however encounter considerable difficulties when the state space (the set of possible states) is infinite, because in that case the basic iteration step of these algorithms is no longer able to loop over all the states Powell [2011, Ch. 3].
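As a concrete illustration of the value-iteration scheme mentioned above (and of why it requires looping over all states), the following sketch runs the Bellman update on a tiny finite MDP with a discount factor of 0.9; the three-state costs and transition matrices are invented for illustration and are not from the paper:

```python
import numpy as np

# Toy discounted MDP: 3 inventory states, 2 actions, discount delta = 0.9.
delta = 0.9
cost = np.array([[1.0, 4.0],          # cost[s, a]: one-stage cost in state s
                 [2.0, 5.0],          # under action a (made-up numbers)
                 [3.0, 6.0]])
P = [np.array([[0.7, 0.2, 0.1],       # P[a][s, s']: transition probability
               [0.3, 0.5, 0.2],       # from s to s' under action a
               [0.1, 0.3, 0.6]]),
     np.array([[0.1, 0.4, 0.5],
               [0.2, 0.4, 0.4],
               [0.3, 0.3, 0.4]])]

V = np.zeros(3)                       # start from the identically-zero function
for _ in range(500):
    # Bellman update: Q[s, a] = cost[s, a] + delta * E[V(next state)]
    Q = np.stack([cost[:, a] + delta * P[a] @ V for a in (0, 1)], axis=1)
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:   # contraction => geometric convergence
        break
    V = V_new
policy = Q.argmin(axis=1)             # greedy policy at the (near) fixed point
```

Each sweep touches every state once, which is exactly the step that becomes unavailable when the state space is an infinite polyhedral set as in the problems considered here.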

A class of DPs exhibiting infinite state and action spaces is that of multistage stochastic linear programs, see e.g. Birge [1985], Birge and Louveaux [2011], Slyke and Wets [1969]. This type of problem finds many applications within the broad context of optimal planning under uncertainty. Examples of such applications are financial planning Mulvey and Vladimirou [1991, 1992], Zenios [1993] and capacity expansion Birge and Louveaux [2011, S. 1.3]. Despite their infinite state and action spaces, multistage stochastic LPs can be solved using various methodologies under the assumption that the time horizon is finite: a widely used approach is Stochastic Dual Dynamic Programming (SDDP), first proposed in Pereira and Pinto [1991] and with many follow-up works, e.g. Donohue and Birge [2006]. SDDP is essentially a Benders decomposition scheme that assumes that the sample space Ω is discrete and finite, and the realizations of uncertainty are time-wise (interstage) independent. In case the sample space is not discrete and finite, sampling can be used to reduce to the finite case, as discussed in Shapiro [2011], where error bounds are provided.

This paper presents a Benders decomposition scheme for infinite-horizon stochastic LPs that is provably convergent to an optimal policy. Our methodology can be seen as an extension of SDDP to the infinite-horizon setting, and it works under similar assumptions: the sample space Ω is discrete and finite, and the realizations of uncertainty are not only interstage independent but also identically distributed. Notice that some degree of (linear) stagewise dependence can be introduced by modifying the problem formulation, extending the state variables just as in SDDP, see e.g. Shapiro [2011]. Requiring identically distributed data at each stage is natural for an infinite-horizon problem when one is interested in a steady-state solution, because such a solution may not be reached if the data process changes over time. The solution methodology follows a typical Benders scheme for


multistage problems, but it dynamically determines the length of the time horizon to explore, and exploits the i.i.d. assumption to strengthen the cut generation problems. An alternative approach to find an approximate solution to the problem would be to determine a large enough truncation of the time horizon, based on the required precision, and apply SDDP to the truncated problem: our computational experiments show that such an approach is hardly viable in practice, and the methodology that we propose is several orders of magnitude faster than this naive approach on our set of test problems.

We test the proposed methodology on two classes of randomly generated (continuous) problems: a classical multi-product production planning problem, and the rebalancing problem. The rebalancing problem arises e.g. in the context of bike sharing systems, where we are looking for a policy to optimally rebalance the availability of bikes at a given set of stations, subject to uncertain demand. Notice that these problems are often stated with integrality restrictions on the decision variables, whereas in this paper we aim to solve their continuous relaxation. However, the solution of multi-stage and infinite-horizon problems with discrete variables is notoriously difficult, and often practically impossible, due to nonconvexities. For this reason, a commonly employed heuristic strategy consists in determining a polyhedral description of the optimal policy for the continuous relaxation of the problem, adding integrality restrictions on the decision variables only at a later time to determine a feasible (and most likely suboptimal) policy. If the original problem involves discrete variables, the approach presented in this paper can be used as a heuristic in this manner. Numerical experiments show that our algorithm is considerably more effective than applying SDDP to a truncated problem, and is able to solve medium-sized instances (tens of variables, constraints, and scenarios) to high accuracy, i.e. with small gaps between lower and upper bounds. This suggests that the methodology could scale to larger instances (hundreds of variables and constraints) provided the user is willing to accept lower accuracy. Since our methodology makes extensive use of Benders cuts, our computational evaluation sheds some light on which implementation details are more effective in practice, in particular showing that a multicut implementation Birge and Louveaux [1988] converges much faster than a single cut implementation, which is predominant in the literature, e.g. Pereira and Pinto [1991], Kim and Mehrotra [2015], Zou et al. [2016].

The rest of this paper is organized as follows. Section 2 formally describes the problem studied in this paper and states our assumptions. Section 3 paves the way for a decomposition algorithm. Section 4 presents a solution methodology to obtain dual bounds, discusses the computation of primal bounds, and the convergence properties of our scheme. Section 5 provides a numerical evaluation of the proposed methodology, and Section 6 concludes the paper. The rest of this section overviews some of the existing literature that is most relevant for our paper.

Literature review. The area of multistage stochastic LP has received considerable attention in the mathematical optimization literature, due to its many real-world applications. Grinold [1977] studies how an infinite-horizon LP can be approximated by a finite-horizon problem, but the paper considers a deterministic setting rather than a stochastic setting. In Romeijn et al. [1992], an extension of LP duality theory to infinite-dimensional problems is studied. However, the literature


discussing practical solution strategies typically focuses on finite problems, i.e., finite horizon, finite number of variables and constraints.

The most commonly used solution methodologies exploit LP duality to devise some form of decomposition scheme, see Birge [1985], Pereira and Pinto [1991], Slyke and Wets [1969]. Typically, the data of the problem is assumed to be interstage independent, which facilitates the decomposition scheme: this is also the assumption in the present paper. However, some work addressing interstage dependency can be found in the literature: for example, Infanger and Morton [1996] studies how to adapt Benders cuts to the case of a model with interstage dependence, while Guigues [2014] proposes an SDDP scheme that can take into account some types of interstage dependency. The popularity of SDDP prompted some computational studies on how to improve its performance: for example, de Matos et al. [2015] discusses the tuning of SDDP to solve a specific instance, and Birge and Louveaux [1988] investigates the issue of multicut versus single cut on five small test problems. There have also been studies about sequencing protocols to choose the order in which subproblems of the decomposition scheme are solved, see Birge [1985], Gassmann [1990], Donohue and Birge [2006]. To deal with problems with a scenario tree that is too large to handle, a methodology to select an ε-optimal subtree of the scenario tree is given in Heitsch and Römisch [2009]. While nested decomposition approaches are usually formulated for continuous problems, they can be extended to integer problems under some conditions, see the recent work Zou et al. [2016].

The discussion so far focused on decomposition schemes for multistage stochastic LP, under the assumption that uncertainty is described in the form of a scenario tree. A nested decomposition approach based on sampling, rather than scenario trees, is described in Donohue and Birge [2006].

2 Problem formulation and assumptions

Given a sample space Ω, we consider a stochastic linear mathematical optimization problem of the following form:

$$\begin{aligned}
\min\;& c^\top x_0 + h^\top y_0 + \sum_{t=1}^{\infty} \delta^t \int_{\omega \in \Omega} (c^\top x_t + h^\top y_t)\, \Pr(d\omega) && \text{(IHLP-S)}\\
& A x_0 + G y_0 \ge b_0\\
& A x_t - T y_{t-1} + G y_t \ge b_t(\omega) && t = 1,\dots,\infty,\ \omega \in \Omega\\
& D x_t \ge d_t(\omega) && t = 0,\dots,\infty,\ \omega \in \Omega\\
& W y_t \ge w_t(\omega) && t = 0,\dots,\infty,\ \omega \in \Omega
\end{aligned}$$

where $A \in \mathbb{R}^{m \times n_1}$, $G \in \mathbb{R}^{m \times n_2}$, $T \in \mathbb{R}^{m \times n_2}$, $D \in \mathbb{R}^{p \times n_1}$, $W \in \mathbb{R}^{q \times n_2}$, $0 \le \delta < 1$, $b_t(\omega) \in \mathbb{R}^m$, $d_t(\omega) \in \mathbb{R}^p$, $w_t(\omega) \in \mathbb{R}^q$, and the matrices $D, W$ subsume lower and upper bounds on the decision variables. Notice that the vector of right-hand side values at time $t$, collectively denoted $R_t(\omega) = (b_t(\omega), d_t(\omega), w_t(\omega))$, is a random variable. In this formulation, uncertainty affects the right-hand side vectors of the mathematical program only. A generalization allows the matrices $A, G, T, D, W$ to be stochastic, but our choice simplifies the exposition and is sufficient for the


applications that motivated this paper. The matrices G and T allow us to easily cast optimal planning under uncertainty problems into our framework.

We make the following assumptions, discussed below:

(A1) There exists $U < \infty$ such that imposing the constraints $\|x_t\|_\infty \le U^t$, $\|y_t\|_\infty \le U^t$ does not change the solution to (IHLP-S).
(A2) $\Omega = \{\omega_1, \dots, \omega_\kappa\}$ with $\kappa < \infty$.
(A3) $R_t$, $t = 0, \dots, \infty$, are i.i.d.
(A4) The problem has relatively complete recourse, i.e., at every stage $t = 1, \dots, \infty$, for every feasible $y_{t-1}$, there exists a feasible choice of $x_t, y_t$.
(A5) There is no feasible direction with unbounded cost for $t = 0, \dots, \infty$ and $\omega \in \Omega$.

Assumption (A1) implies that $x_t$ and $y_t$ are bounded for any finite $t$. Assumption (A2) enforces finiteness of the sample space, possibly by taking a finite number of samples from the original space, see e.g. Shapiro [2011]. It is a standard assumption for approaches based on Benders decomposition Birge and Louveaux [2011]. Assumption (A3) is fundamental for the computational efficiency of the methodology discussed in this paper, which is based on SDDP. As mentioned in the introduction, this assumption ensures that the data process has a steady state, and some degree of (linear) stagewise dependence can be introduced extending the state variables Shapiro [2011]. Assumption (A4) simplifies our exposition and avoids dealing with hard-to-detect infeasibilities of (IHLP-S). Assumption (A5) ensures that the costs $\delta^t (c^\top x_t + h^\top y_t)$ are uniformly bounded, see e.g. Romeijn et al. [1992].

The constraints of (IHLP-S) describe the evolution of the system through time. In particular, the variables $y_t$ represent the state of the system from the dynamic programming point of view Bertsekas [1995], because for any given $\tau$, problem (IHLP-S) is completely determined for time $t = \tau, \dots, \infty$ given the realization $R_\tau$ and $y_{\tau-1}$. Since this is a stochastic optimization problem, its solution is a policy that maps states to actions: hence, in (IHLP-S) the symbols $x_t, y_t$ should be interpreted as functions rather than vectors of values of the decision variables.

It is well known that under our assumptions the problem admits an optimal stationary policy, see e.g. Bertsekas [1995]. More specifically, regardless of the initial conditions $R_0$, there exists an optimal value function $V^*(y)$ that gives the optimal cost-to-go, i.e. the minimum infinite-horizon discounted cost incurred for being in state $y$ at stage 0. The optimal policy for realization $\omega_k$ is then simply given by the solution of the Bellman equations Bellman [1957], which in this specific case read:

$$\begin{aligned}
\min\;& c^\top x + h^\top y + \delta V^*(y)\\
& A x - T \bar{y} + G y \ge b(\omega_k) \qquad (1)\\
& D x \ge d(\omega_k)\\
& W y \ge w(\omega_k)
\end{aligned}$$

where $\bar{y}$ denotes the current state. Conversely, any stationary policy that attains the above minimum for every state $\bar{y}$ is optimal.

Let $K := \{1, \dots, \kappa\}$. Due to nonanticipativity, the number of possible states and therefore of possible decisions that can be taken at time $t$ is equal to the number of sample paths up to time $t$, which is $|K|^t$ under our assumptions. For $k \in K$, denote $p_k = \Pr(\omega_k)$, $b_k = b(\omega_k)$, $d_k = d(\omega_k)$, $w_k = w(\omega_k)$, since the time index is not necessary for $b, d, w$ due to independence. For $t = 0, \dots, \infty$, let $S_t := K^t = \underbrace{K \times K \times \cdots \times K}_{t \text{ times}}$; hence, $S_t$ represents the possible sample paths up to stage $t$. For $s \in S_\tau$, we denote by $p_s$ the probability of occurrence of all sample paths that coincide with $s$ up to stage $\tau$; by (A3), $p_s = \prod_{t=1,\dots,\tau} p_{s_t}$. Given $s \in S_\tau$, we denote by $\bar{s} := (s_1, \dots, s_{\tau-1})$ the $\tau$-tuple $s$ truncated to the $(\tau-1)$-th element, and by $\hat{s} := s_\tau$ the last element of the tuple. We can reformulate (IHLP-S) as follows:

$$\begin{aligned}
\min\;& c^\top x_0 + h^\top y_0 + \sum_{t=1}^{\infty} \delta^t \sum_{s \in S_t} p_s \left(c^\top x_t(s) + h^\top y_t(s)\right) && \text{(IHLP)}\\
& A x_0 + G y_0 \ge b_0\\
& A x_t(s) - T y_{t-1}(\bar{s}) + G y_t(s) \ge b_{\hat{s}} && t = 1,\dots,\infty,\ s \in S_t\\
& D x_t(s) \ge d_{\hat{s}} && t = 0,\dots,\infty,\ s \in S_t\\
& W y_t(s) \ge w_{\hat{s}} && t = 0,\dots,\infty,\ s \in S_t
\end{aligned}$$

In the formulation (IHLP), we allow a different decision $x_t(s)$ and state $y_t(s)$ for every sample path. Thus, if (IHLP) could be solved, its solution would be the same as (IHLP-S), but in (IHLP) the symbols $x_t(s)$ and $y_t(s)$ denote vectors of decision variables, in the usual mathematical programming fashion. The formulation (IHLP) is therefore an infinite-horizon discounted linear program. This problem is called "doubly-infinite" because it has an infinite number of decision variables and constraints Romeijn et al. [1992], but just as in Romeijn et al. [1992], each variable appears in a finite number of constraints, and each constraint contains a finite number of variables, see also Schochetman and Smith [1992].
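The combinatorics of the sample-path sets $S_t$ can be checked directly. This small sketch (the scenario probabilities are made up for illustration) enumerates $K^t$, confirms the $|K|^t$ growth mentioned above, and verifies that under the i.i.d. assumption (A3) the path probabilities $p_s = \prod_t p_{s_t}$ sum to one:

```python
from itertools import product

p = {1: 0.5, 2: 0.3, 3: 0.2}        # hypothetical scenario probabilities p_k
K = list(p)                          # K = {1, ..., kappa} with kappa = 3
t = 4
S_t = list(product(K, repeat=t))     # all sample paths up to stage t: K^t

def path_prob(s):
    """p_s = product of the per-stage probabilities, by interstage independence."""
    out = 1.0
    for k in s:
        out *= p[k]
    return out

n_paths = len(S_t)                   # |K|^t = 3^4 = 81 paths
total = sum(path_prob(s) for s in S_t)
```

The exponential count is precisely why the nested decomposition never materializes $S_t$ explicitly but samples paths instead.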

3 Benders decomposition: preliminaries

Problem (IHLP) cannot be solved directly by means of linear programming tools because of its infinite size. However, under assumptions (A1)-(A5) the optimal cost of (IHLP) is bounded because future costs decrease following a geometric series with a discount factor $\delta < 1$. Hence, we are interested in a methodology to determine the optimal cost and the optimal policy.

In this paper we propose a solution methodology based on Benders decomposition. Benders decomposition has been successfully applied to a finite-horizon version of (IHLP), for example in SDDP Pereira and Pinto [1991] and in the L-shaped method Slyke and Wets [1969]. We refer to Birge and Louveaux [2011] for a more detailed discussion of decomposition methods in stochastic programming, including nested decomposition methods for multi-stage problems. Note that a possible way to solve (IHLP) within a certain optimality tolerance $\epsilon > 0$ is to determine a large enough truncation $\tau$ of the time horizon so that the objective function contribution of $\sum_{t=\tau}^{\infty} \delta^t \sum_{s \in S_t} p_s (c^\top x_t(s) + h^\top y_t(s)) \le \epsilon$. Then, solving the truncated problem, for example via SDDP, yields a solution with error at most $\epsilon$. The desired $\tau$ exists due to the boundedness assumption, but it may not be easy to determine its value. Moreover, in practice it is likely that $\tau$ will be very large if $\epsilon$ is small, therefore it would be very inefficient or even practically


impossible to compute the solution to the truncated problem using a methodology for finite-horizon problems, and a tailored decomposition scheme is desirable. The algorithm that we propose is based on SDDP, and can be seen as its extension to the infinite-horizon case.
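To make the size of the required truncation concrete: if the undiscounted expected one-stage cost is bounded by a constant C (an assumption made only for this sketch; the paper does not prescribe such a bound), the tail cost from stage τ on is at most δ^τ C / (1 − δ), and the smallest adequate τ follows by taking logarithms:

```python
import math

def truncation_horizon(delta, stage_cost_bound, eps):
    """Smallest tau with delta**tau * C / (1 - delta) <= eps, i.e. the
    discounted tail cost of stages tau, tau+1, ... is provably below eps.
    Assumes every stage's expected cost is bounded by stage_cost_bound."""
    assert 0 < delta < 1 and eps > 0
    ratio = eps * (1 - delta) / stage_cost_bound
    if ratio >= 1:
        return 0
    return math.ceil(math.log(ratio) / math.log(delta))

tau = truncation_horizon(0.9, 1.0, 1e-3)   # already 88 stages for delta = 0.9
```

Even moderate discount factors force horizons of hundreds of stages as ε shrinks, which is the inefficiency the dynamic horizon expansion of the proposed method avoids.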

The Benders decomposition approach relies on linear programming duality: the objective function contribution of the parts of the linear program eliminated from the master is estimated using the property that optimal dual vectors are subgradients of the optimal primal cost expressed as a function of the right-hand side vector, see e.g. Bertsimas and Tsitsiklis [1997, Ch. 5]. We remark that, because our problem is doubly-infinite, strong duality may not hold in general: see e.g. Duffin and Karlovitz [1965], which points out the existence of a duality gap in case some interior point conditions fail to hold. However, an advantage of the approach presented in this paper is that the algorithm works with a sequence of finite-dimensional linear programs only, and the convergence proof relies on well-known properties of dynamic programs and a classical cutting plane approach Kelley [1960]. For this reason, we do not have to rely on sufficient conditions for strong duality Romeijn et al. [1992].
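The subgradient property underlying the cuts can be verified numerically on a toy two-variable LP (the data is invented; the sketch assumes SciPy ≥ 1.7, whose HiGHS backend exposes dual values through `ineqlin.marginals`):

```python
from scipy.optimize import linprog

def lp_value_and_subgradient(d):
    """Optimal value V(d) and a subgradient of V w.r.t. the demand d for
    min 2*x1 + 3*x2  s.t.  x1 + x2 >= d,  x1 <= 6,  x >= 0  (toy data)."""
    res = linprog(c=[2.0, 3.0],
                  A_ub=[[-1.0, -1.0],   # the >= demand row, negated to <=
                        [1.0, 0.0]],    # capacity x1 <= 6
                  b_ub=[-d, 6.0],
                  method="highs")
    # ineqlin.marginals[0] = dV/d(b_ub[0]); flip sign since b_ub[0] = -d
    return res.fun, -res.ineqlin.marginals[0]

v, g = lp_value_and_subgradient(10.0)
v_pert, _ = lp_value_and_subgradient(11.0)
# Benders-style underestimate: V(d') >= V(d) + g * (d' - d) for every d'
```

The inequality in the last comment is exactly the cut the master problem accumulates: a supporting hyperplane of the convex optimal-value function, obtained from the slave's dual solution.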

It will be convenient to introduce auxiliary variables to represent upper bounds on the one-stage costs. From now on, $z_t(s)$ denotes an upper bound on the cost from time $t$ to $\infty$ starting from sample path $s \in S_t$. We obtain the following formulation, which is easily seen to be equivalent to (IHLP):

$$\begin{aligned}
\min\;& c^\top x_0 + h^\top y_0 + \delta \sum_{k \in K} p_k z_1(k) && \text{(IHLP-F)}\\
& A x_0 + G y_0 \ge b_0\\
& A x_t(s) - T y_{t-1}(\bar{s}) + G y_t(s) \ge b_{\hat{s}} && t = 1,\dots,\infty,\ s \in S_t\\
& D x_t(s) \ge d_{\hat{s}} && t = 0,\dots,\infty,\ s \in S_t\\
& W y_t(s) \ge w_{\hat{s}} && t = 0,\dots,\infty,\ s \in S_t\\
& z_t(s) - c^\top x_t(s) - h^\top y_t(s) - \delta \sum_{k \in K} p_k z_{t+1}(s \times k) \ge 0 && t = 1,\dots,\infty,\ s \in S_t
\end{aligned}$$

We can apply Benders decomposition on this problem, solving a master problem with variables $x_0, y_0, z_1$, keeping the constraints for $t = 0$ only and estimating the contribution of the relaxed constraints through a set of Benders cuts $C$ obtained from the dual variables of the slave problem. Note that the slave problem has an infinite number of variables and constraints and therefore it may not be representable by a finite number of Benders cuts, see Sect. 4. However, as noted in Sect. 3, there exists a truncation of the time horizon that yields an $\epsilon$-optimal solution. Hence, for now we simply assume that we aim at representing the contribution of the (infinite-dimensional) slave problem with a finite number of Benders cuts, and we will show in Sect. 4.2 that our methodology indeed converges to an $\epsilon$-optimal solution. The proposed decomposition scheme yields the following master problem:

$$\begin{aligned}
\min\;& c^\top x_0 + h^\top y_0 + \delta \sum_{k \in K} p_k z_1(k) && \text{(MP-L0)}\\
& A x_0 + G y_0 \ge b_0\\
& D x_0 \ge d_0\\
& W y_0 \ge w_0\\
& \gamma_y^\top y_0 + \sum_{k \in K} \gamma_k z_1(k) \ge \gamma_0 && \forall \gamma \in C.
\end{aligned}$$

Notice that the Benders cuts do not involve the variables $x_0$ because these do not appear in the relaxed constraints. This yields $\kappa$ feasibility subproblems of the following form, where $j \in K$ is fixed and $y_0, z_1(j)$ are given:

$$\begin{aligned}
\min\;& 0 && \text{(SP-L1-}j\text{)}\\
& A x_1 + G y_1 \ge b_j + T y_0\\
& D x_1 \ge d_j\\
& W y_1 \ge w_j\\
& -c^\top x_1 - h^\top y_1 - \delta \sum_{k \in K} p_k z_2(j \times k) \ge -z_1(j)\\
& A x_t(s) - T y_{t-1}(\bar{s}) + G y_t(s) \ge b_{\hat{s}} && t = 2,\dots,\infty,\ s \in S_t\\
& D x_t(s) \ge d_{\hat{s}} && t = 2,\dots,\infty,\ s \in S_t\\
& W y_t(s) \ge w_{\hat{s}} && t = 2,\dots,\infty,\ s \in S_t\\
& z_t(s) - c^\top x_t(s) - h^\top y_t(s) - \delta \sum_{k \in K} p_k z_{t+1}(s \times k) \ge 0 && t = 2,\dots,\infty,\ s \in S_t
\end{aligned}$$

Since $z_1(j)$ is given, the slave problems in our Benders decomposition approach are feasibility problems and we generate feasibility cuts. However, because of assumption (A4), the only possible infeasibility in (SP-L1-$j$) is a violation of the upper bound on the objective function imputed from the master $z_1(j)$; then it is easy to verify that these cuts provide lower bounds to $z_1(j)$ in the master, and are in fact equivalent to Benders optimality cuts. Problem (SP-L1-$j$) is again a doubly-infinite linear program, and we want to solve it with the same decomposition scheme: we consider a master problem with the $x_1(j), y_1(j), z_2(j)$ variables only, keep the constraints for $t = 1$ and delegate the rest to the slave problems, one for each scenario. The general form of the level $\tau$ slave for scenario $j \in K$, where the value for $y_{\tau-1}, z_\tau(j)$ is fixed and given by the master at level $\tau - 1$, is:

$$\begin{aligned}
\min\;& 0 && \text{(SP-L}\tau\text{-}j\text{)}\\
& A x_\tau + G y_\tau \ge b_j + T y_{\tau-1}\\
& D x_\tau \ge d_j\\
& W y_\tau \ge w_j\\
& -c^\top x_\tau - h^\top y_\tau - \delta \sum_{k \in K} p_k z_{\tau+1}(j \times k) \ge -z_\tau(j)\\
& A x_t(s) - T y_{t-1}(\bar{s}) + G y_t(s) \ge b_{\hat{s}} && t = \tau+1,\dots,\infty,\ s \in S_t\\
& D x_t(s) \ge d_{\hat{s}} && t = \tau+1,\dots,\infty,\ s \in S_t\\
& W y_t(s) \ge w_{\hat{s}} && t = \tau+1,\dots,\infty,\ s \in S_t\\
& z_t(s) - c^\top x_t(s) - h^\top y_t(s) - \delta \sum_{k \in K} p_k z_{t+1}(s \times k) \ge 0 && t = \tau+1,\dots,\infty,\ s \in S_t
\end{aligned}$$

Iterating this idea and solving (SP-L$\tau$-$j$) by Benders decomposition, the master problem of the level $\tau$ slave is the following problem, where we indicate labels for the dual variables corresponding to each set of constraints between square brackets on the right:

$$\begin{aligned}
\min\;& 0 && \text{(MP-L}\tau\text{-}j\text{)}\\
& A x_\tau + G y_\tau \ge b_j + T y_{\tau-1} && [\pi_y]\\
& D x_\tau \ge d_j && [\pi_d]\\
& W y_\tau \ge w_j && [\pi_w]\\
& -c^\top x_\tau - h^\top y_\tau - \delta \sum_{k \in K} p_k z_{\tau+1}(j \times k) \ge -z_\tau(j) && [\pi_z]\\
& \gamma_y^\top y_\tau + \sum_{k \in K} \gamma_k z_{\tau+1}(j \times k) \ge \gamma_0 \quad \forall \gamma \in C && [\pi_\gamma]
\end{aligned}$$

The decomposition algorithm that we propose always generates Benders cuts from problems of the form (MP-L$\tau$-$j$). The following proposition establishes validity of the Benders cuts generated from any given (MP-L$\bar{\tau}$-$\bar{j}$) with fixed $\bar{\tau}, \bar{j}$ for every other problem (MP-L$\tau$-$j$). Since (MP-L$\tau$-$j$) is a relaxation of (SP-L$\tau$-$j$), the Benders cuts are valid for (SP-L$\tau$-$j$).

Proposition 1 Let $\pi$ be an unbounded ray of the dual of (MP-L$\tau$-$j$) with $\tau = \bar{\tau}$, $j = \bar{j}$, $C = \bar{C}$. Then the inequality
$$\pi_y^\top T y_t - \pi_z z_{t+1}(\bar{j}) \le -\pi_y^\top b_{\bar{j}} - \pi_d^\top d_{\bar{j}} - \pi_w^\top w_{\bar{j}} - \sum_{\gamma \in \bar{C}} \pi_\gamma \gamma_0$$
is valid for every problem (MP-L$\tau$-$j$) for every $\tau$, $j$ and $C \supseteq \bar{C}$, including (MP-L0).

Proof The dual of (MP-L$\tau$-$j$) is:

$$\begin{aligned}
\max\;& \pi_y^\top (b_j + T y_{\tau-1}) + \pi_d^\top d_j + \pi_w^\top w_j - \pi_z z_\tau(j) + \sum_{\gamma \in C} \pi_\gamma \gamma_0 && \text{(D-MP-L}\tau\text{-}j\text{)}\\
& \pi_y^\top A + \pi_d^\top D - \pi_z c^\top \le 0\\
& \pi_y^\top G + \pi_w^\top W - \pi_z h^\top + \sum_{\gamma \in C} \pi_\gamma \gamma_y^\top \le 0\\
& -\pi_z \delta p_k + \sum_{\gamma \in C} \pi_\gamma \gamma_k \le 0 && k \in K\\
& \pi_y, \pi_d, \pi_w, \pi_z, \pi_\gamma \ge 0
\end{aligned}$$

Notice that the feasible region of this problem depends neither on $\tau$ nor on $j \in K$, and is therefore the same for all problems (MP-L$\tau$-$j$) provided $C$ does not change. Furthermore, it is easy to verify that solutions to (D-MP-L$\tau$-$j$) are feasible for the dual of (MP-L$\tau$-$j$) with a larger set of cuts $C' \supseteq C$, as the dual variables for the new cuts can simply be fixed to zero.

Feasibility of all problems (MP-L$\tau$-$j$) requires that the dual is bounded, hence for every unbounded ray of (D-MP-L$\tau$-$j$), we must impose that the ray makes a negative angle with the objective function, i.e. $\pi_y^\top (b_j + T y_{\tau-1}) + \pi_d^\top d_j + \pi_w^\top w_j - \pi_z z_\tau(j) + \sum_{\gamma \in C} \pi_\gamma \gamma_0 \le 0$. Remembering that the ray $\pi$ comes from (MP-L$\tau$-$j$) with $\tau = \bar{\tau}$, $j = \bar{j}$, $C = \bar{C}$ and that the only decision variables coming from the previous-level problem in the expression above are $y_{\tau-1}$ and $z_\tau(\bar{j})$, we obtain the desired inequality. Notice that since (MP-L$\tau$-$j$) for $\tau = 1$ is a slave problem for the Benders decomposition in which (MP-L0) is the master, the inequality is valid for (MP-L0) as well. $\square$


Algorithm 1 Infinite-horizon B2 algorithm.

1: C ← ∅, τ ← 0, LB ← −∞, UB ← +∞, S* ← ∅
2: while stopping criterion not satisfied do
3:    Increment τ
4:    Choose S̄_τ ⊆ S_τ
5:    for s ∈ S̄_τ do
6:       for t = 0, . . . , τ do
7:          Let y_{−1}(s) = 0.
8:          Solve P(b_{s_t} + T y_{t−1}(s), d_{s_t}, w_{s_t}, C). Let (x_t(s), y_t(s), z_{t+1}(s)) be the solution.
9:       for t = τ, . . . , 1 do
10:         Generate Benders cuts from RB(b_{s_t} + T y_{t−1}(s), d_{s_t}, w_{s_t}, z_{t+1}(s), C).
11:         Add cuts to C.
12:   for s ∈ S* do
13:      Sample k ∈ K and extend s ← s × k
14:      Solve P(b_k + T y_{τ−1}(s), d_k, w_k, C). Let (x_τ(s), y_τ(s), z_{τ+1}(s)) be the solution.
15:      Generate Benders cuts from RB(b_k + T y_{τ−1}(s), d_k, w_k, z_{τ+1}(s), C).
16:      Add cuts to C.
17:   S* ← S* ∪ S̄_τ
18:   Solve P(b_0, d_0, w_0, C). Let LB be the value of its optimal solution.
19:   UB ← ComputeUB(τ, S*, δ, C, α)
20: return (x_0, y_0), LB, UB

Algorithm 2 ComputeUB(τ, S̄_τ, δ, C, α)

1: U ← ∅
2: for s ∈ S̄_τ do
3:    for t = 0, . . . , τ do
4:       Let y_{−1} = 0.
5:       Solve P(b_{s_t} + T y_{t−1}, d_{s_t}, w_{s_t}, C). Let (x_t, y_t, z_{t+1}) be the solution.
6:    U ← U ∪ { Σ_{t=0}^{τ} δ^t (c^⊤ x_t + h^⊤ y_t) + δ^{τ+1} Σ_{k∈K} p_k cost(y_τ, ȳ, ω_k) }
7: return μ(U) + (q_α / √|S̄_τ|) σ(U) + (δ^{τ+2} / (1 − δ)) Σ_{k∈K} p_k cost(ȳ, ȳ, ω_k)

From a computational perspective, the important consequence of Prop. 1 is that a Benders cut generated at any level of the decomposition scheme, i.e. for any $\tau$, can be added to all levels of the decomposition. This is in contrast to the classical SDDP approach, where cuts are generated in a backward fashion starting from the last time period, and added only to the master problems of the immediately preceding time period. The possibility of sharing the same pool of Benders cuts among the different levels of the decomposition is due to the fact that we are working on an infinite horizon, in which case the shape of the value function is the same, regardless of the specific time period. This is not the case in the classical SDDP approach, where the time horizon is finite. On the other hand, we remark that if the matrices $A, D, G, W$ depend on $j$ rather than being fixed, then a cut generated according to Prop. 1 is valid only for problems (MP-L$\tau$-$j$) with the corresponding $\bar{j}$.

4 Description of the algorithm and convergence

Our decomposition algorithm, called B2, relies on a nested Benders decomposition scheme. The algorithm works in a similar way to SDDP, alternating between a forward pass that determines a sequence of trial solutions, and a backward pass that improves the value function approximation at the trial solutions. A pseudocode description of algorithm B2 is given in Algorithm 1. The auxiliary problems solved by the algorithm are defined below: $P(b, d, w, C)$ is the problem solved in the forward pass, $RB(b, d, w, z, C)$ the problem solved in the backward pass. The label "P" stands for "Primal", "RB" for "Relaxed Benders". Note that $RB(b, d, w, z, C)$ is a feasibility problem, consistently with (MP-L$\tau$-$j$).

$$\begin{aligned}
\min\;& c^\top x + h^\top y + \delta \sum_{k \in K} p_k z_k && P(b, d, w, C)\\
& A x + G y \ge b\\
& D x \ge d\\
& W y \ge w\\
& \gamma_y^\top y + \sum_{k \in K} \gamma_k z_k \ge \gamma_0 && \forall \gamma \in C
\end{aligned}$$

$$\begin{aligned}
\min\;& 0 && RB(b, d, w, z, C)\\
& A x + G y \ge b\\
& D x \ge d\\
& W y \ge w\\
& -c^\top x - h^\top y - \delta \sum_{k \in K} p_k z_k \ge -z\\
& \gamma_y^\top y + \sum_{k \in K} \gamma_k z_k \ge \gamma_0 && \forall \gamma \in C
\end{aligned}$$

The main idea of the algorithm is to consider a truncated time horizon and apply the nested Benders decomposition scheme to the resulting problem, dynamically expanding the time horizon until convergence is reached. During the course of the algorithm we take advantage of assumption (A3) and Prop. 1 to immediately add all Benders cuts to all levels of the decomposition. Upper bounds are obtained using a heuristic, see Sect. 4.1, combined with the classical SDDP approach to compute upper bounds, which would not work on its own because we are considering an infinite-horizon setting.

The B2 algorithm bears strong similarities with value iteration (see Bertsekas [1995], Powell [2011]), because it iteratively constructs an approximation of the value function: it starts from the empty (i.e., identically zero) approximation, and in each iteration it improves the approximation relying on the one obtained in the previous step. However, a straightforward value iteration scheme would not work in our setting because the state space is infinite, hence we cannot loop over all the states to apply the value function iteration. The convergence rate associated with a value iteration scheme, e.g. Powell [2011, Thm. 3.4.2], can in principle be adapted, but its usefulness would be very limited in our context because its application requires a bound on the absolute change of the value function approximation over the entire state space, which is a polyhedral set in this paper rather than a finite set as in the classical DP setting.

4.1 Computation of upper bounds

Any policy for problem (IHLP) corresponds to an upper bound, but the infinite-horizon nature of the problem introduces difficulties in checking feasibility of a policy, and in computing its value. In practice, to overcome these difficulties we are interested in policies that guarantee feasibility by construction, and for which the corresponding value can be computed in a finite number of steps, e.g. via a closed-form formula. The approach that we propose to compute upper bounds is described in Algorithm 2, where the function cost(·, ·, ·) and the meaning of the vector ȳ will be explained below. The computation is split into two parts: the first part concerns estimating the cost of a feasible policy up to stage τ, where τ is the length of the time horizon considered at that specific iteration by the B2 algorithm, and the computation is done in the usual SDDP fashion, yielding an upper bound with a given confidence level; the second part computes the cost of a feasible policy from stage τ to ∞, which we will now discuss in more detail.

12 Giacomo Nannicini et al.

To obtain an estimate of the cost from a given stage τ to ∞, we limit our attention to stationary policies, i.e. policies that always leave the system in the same state regardless of the time index. Note however that the existence of such policies is not guaranteed. Therefore, our implementation by default applies a routine that tries to compute a stationary policy in a heuristic way; if this routine fails, which is possible depending on the problem at hand, it is necessary to provide a problem-specific algorithm to compute upper bounds. The advantage of a stationary policy is obvious: suppose that given a realization ω = ωk, we can compute the cost cost(y, y′, ωk) of a policy that starts from state y with right-hand side vector Rt(ωk), and leaves the system in state y′. Then we have:

\[
\mathbb{E}\left[\sum_{t=\tau}^{\infty} \delta^t \, \mathrm{cost}(\bar y, \bar y, \omega)\right] = \frac{\delta^{\tau}}{1-\delta} \left( \sum_{k \in K} p_k \, \mathrm{cost}(\bar y, \bar y, \omega_k) \right),
\]

because the cost depends only on ω and not on t, as the system is always left in the same state. Notice that the right-hand side of the equation above is easily computable once cost(ȳ, ȳ, ωk) is available.
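The closed-form tail cost above is cheap to evaluate. The following sketch (with hypothetical names; the paper's cost(·, ·, ·) oracle is replaced by a precomputed list of per-scenario costs) illustrates the computation:

```python
def stationary_tail_cost(delta, tau, probs, scenario_costs):
    """Discounted cost from stage tau to infinity of a stationary policy.

    probs[k] is the probability of scenario k; scenario_costs[k] plays the
    role of cost(y_bar, y_bar, omega_k), the one-stage cost of remaining in
    state y_bar. Uses sum_{t=tau}^inf delta^t = delta^tau / (1 - delta),
    valid for 0 < delta < 1.
    """
    expected_stage_cost = sum(p * c for p, c in zip(probs, scenario_costs))
    return delta ** tau / (1.0 - delta) * expected_stage_cost

# Two equiprobable scenarios with one-stage costs 4 and 8, delta = 0.5, tau = 2:
# the expected stage cost is 6 and the tail cost is 0.25 / 0.5 * 6 = 3.0.
tail = stationary_tail_cost(0.5, 2, [0.5, 0.5], [4.0, 8.0])
```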

Our default strategy to compute the cost of a stationary policy is the following. We first try to determine a value ȳ of the state variables y that allows feasibility to be recovered just by changing the x variables, for all realizations ωk. In other words, we try to determine a value ȳ such that imposing yt = ȳ does not make (IHLP) infeasible. We remark that assumption (A4) does not guarantee that such a value exists, because we are now imposing yt−1 = yt, which is not part of the original problem. If the desired value ȳ exists, we can compute cost(y, ȳ, ωk) for any value y simply by solving (1) with y = ȳ and V∗(y) ≡ 0. Hence, the main difficulty is determining ȳ. We attempt to do so by solving a linear program. Let bU be the componentwise maximum of b1, . . . , bκ, dU the componentwise maximum of d1, . . . , dκ, and wU the componentwise maximum of w1, . . . , wκ. We solve the problem:

\[
\begin{aligned}
\min\;& c^\top x + h^\top y \\
& A x + (G - T) y \ge b^U \\
& D x \ge d^U \\
& W y \ge w^U,
\end{aligned}
\tag{2}
\]

which can be interpreted as considering a worst-case scenario in which all resources are maximally constrained and yt−1 = yt (hence the constraint Ax + (G − T)y ≥ bU); we try to determine the least-cost state that satisfies the constraints. Note that by definition of bU, dU, wU, if (2) has a solution ȳ, then we are guaranteed that all scenarios admit a stationary policy with state ȳ. If (2) does not have a solution, our heuristic fails, and we must resort to a problem-specific approach to construct upper bounds. We remark that, based on the problem structure, it is sometimes possible to check whether (2) is guaranteed to have a solution: e.g., if D ≥ 0, dU = 0, wU = 0, and A ≥ 0 with max_{j=1,...,n1} Aij > 0 for all i = 1, . . . , m, then ȳ = 0 is feasible for (2). This is the type of structure encountered in two of the applications tested in Sect. 5.
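A minimal sketch of this heuristic on a toy instance with one x variable and one y variable (all data below is made up for illustration, and `scipy` stands in for the Cplex solver used in the paper):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy data for (2): A x + (G - T) y >= bU, D x >= dU, W y >= wU.
A, G, T = np.array([[1.0]]), np.array([[1.0]]), np.array([[0.5]])
D, W = np.array([[1.0]]), np.array([[1.0]])
bU, dU, wU = np.array([2.0]), np.array([0.0]), np.array([1.0])
c, h = np.array([1.0]), np.array([1.0])

# Stack variables as z = (x, y); linprog uses <= constraints, so negate.
A_ub = np.block([
    [-A, -(G - T)],
    [-D, np.zeros((1, 1))],
    [np.zeros((1, 1)), -W],
])
b_ub = np.concatenate([-bU, -dU, -wU])
res = linprog(np.concatenate([c, h]), A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None)] * 2)
if res.status == 0:
    x_bar, y_bar = res.x[0], res.x[1]   # candidate stationary state y_bar
else:
    pass  # the heuristic fails; fall back to a problem-specific bound
```

For these numbers the constraints read x + 0.5y ≥ 2, x ≥ 0, y ≥ 1, so the least-cost stationary state is y = 1 with x = 1.5.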

4.2 Convergence

We now discuss convergence of the B2 algorithm as given in Algorithm 1. The basic idea of the proof is to apply Benders decomposition to the problem obtained with a sufficiently large truncation of the time horizon, and show that as the number of iterations approaches infinity, with probability 1 our algorithm yields value functions that are at least as good as those of the truncated problem.

We call a policy ǫ-optimal if, from the initial state R0, it determines a sequence of actions that yields a total cost no more than ǫ away from the optimal total cost. Let T be such that
\[
\sum_{t=T}^{\infty} \delta^t \sum_{s \in S_t} p_s \left( c^\top x_t(s) + h^\top y_t(s) \right) \le \epsilon,
\]
i.e. T is large enough that the cost contribution of subsequent time periods is smaller than ǫ. Let $P_T$ be the problem obtained by truncating (IHLP) at time horizon T. For stage t = 0, . . . , T and k ∈ K, let $V_t^k(y_{t-1})$ be the corresponding value function, i.e. $V_t^k(y_{t-1})$ gives the optimal cost-to-go for $P_T$ starting at stage t with state $y_{t-1}$ when scenario k ∈ K realizes at stage t. Since $P_T$ is a multi-stage linear program, it can be solved using nested Benders decomposition, computing all value functions $V_t^k$, t = 0, . . . , T, k ∈ K in the process. Note that to compute each value function we only need Benders cuts obtained from extreme points of the dual of the slave problems, hence all $V_t^k$ are piecewise linear convex functions with a finite (although possibly huge) number of pieces. Furthermore, an ǫ-optimal solution to (IHLP) can be computed as the solution of the following problem:

\[
\begin{aligned}
\min\;& c^\top x_0 + h^\top y_0 + \delta \sum_{k \in K} p_k V_1^k(y_0) \\
& A x_0 + G y_0 \ge b_0 \\
& D x_0 \ge d_0 \\
& W y_0 \ge w_0.
\end{aligned}
\]

Since the $V_1^k$ are convex piecewise linear, this problem can be cast as an LP. We will show that Algorithm 1 constructs an approximation of this problem. Let ℓ be a lower bound on the single-period costs; in other words, the constraint

$c^\top x_t + h^\top y_t \ge \ell$ is redundant. It will be convenient to define $V_t^k(y_{t-1}) = \ell$ for all $t > T$. Let $V_C^k$ be the value function induced by the Benders cuts collected in $C$, i.e. $V_C^k(y) := \min\{z_k : \gamma_y^\top y + \sum_{k \in K} \gamma_k z_k \ge \gamma_0 \ \forall \gamma \in C\}$. We can assume w.l.o.g. that the cut $z_k \ge \ell$ is included in $C$ for all $k \in K$. Define the sets $Q_t^k := \{(y, z) : z \ge V_t^k(y)\}$, i.e. the epigraph of $V_t^k$, and $Q_t^k(\epsilon) := \{(y, z) : z \ge V_t^k(y) - \epsilon\}$, i.e. an ǫ-relaxation of the epigraph of $V_t^k$.

Lemma 1 As the number of iterations of Algorithm 1 goes to infinity, for any given ǫ > 0, for all t = 1, . . . ,∞, k ∈ K, and for all $(x_{t-1}, y_{t-1}, z_t)$ visited by the forward pass, $(y_{t-1}, V_C^k(y_{t-1})) \in Q_t^k(\epsilon)$ with probability 1.


Proof The proof is by backward induction on t = T, . . . , 1. It is sufficient to prove the claim for a given k ∈ K: the proof is identical for all k. Notice that as the number of iterations goes to infinity, the length of the time horizon visited by Algorithm 1 grows infinitely large.

The base step is trivial: whenever $t > T$, $V_C^k(y_{t-1}) \ge \ell = V_t^k(y_{t-1})$. We now prove the induction step at time $t \le T$. Suppose the inductive claim is not true, i.e. with probability greater than zero there exists a sample path $s \in S_t$ such that $(x_{t-1}, y_{t-1}, z_t)$ is visited by the forward pass following $s$, and $(y_{t-1}, z_t) \notin Q_t^k(\epsilon)$, which implies $z_t < V_t^k(y_{t-1}) - \epsilon$. Recall that by definition $z_t = V_C^k(y_{t-1})$. As the number of iterations goes to infinity, a sample path that coincides with $s$ up to stage $t$ will be sampled infinitely often. The solution of the forward pass problem $P(b_k + T y_{t-1}(s), d_k, w_k, C)$ has value equal to:

\[
\begin{aligned}
v := \min\;& c^\top x_t + h^\top y_t + \delta \sum_{k \in K} p_k V_C^k(y_t) \\
& A x_t + G y_t \ge b_k + T y_{t-1} \\
& D x_t \ge d_k \\
& W y_t \ge w_k,
\end{aligned}
\]

and because $V_C^k(y_t) \ge V_{t+1}^k(y_t) - \epsilon/2$ by the induction hypothesis (we can choose any value of ǫ for convergence, and here ǫ/2 is convenient), the objective function value is greater than or equal to:

\[
\begin{aligned}
V_t^k(y_{t-1}) - \epsilon/2 := \min\;& c^\top x_t + h^\top y_t + \delta \sum_{k \in K} p_k V_{t+1}^k(y_t) - \epsilon/2 \\
& A x_t + G y_t \ge b_k + T y_{t-1} \\
& D x_t \ge d_k \\
& W y_t \ge w_k.
\end{aligned}
\]

But then $z_t < V_t^k(y_{t-1}) - \epsilon < V_t^k(y_{t-1}) - \epsilon/2 \le v$, hence a violated Benders cut will be generated during the backward pass. The new cut is either supporting for the set $Q_t^k$ (at the point $(y_{t-1}, V_t^k(y_{t-1}))$), or cuts inside the set. By a straightforward generalization of Kelley [1960, Sect. 2], the cutting process must converge to a point ǫ/2-close to the set $Q_t^k$ after a finite number of cutting planes. Indeed, notice that the assumptions of Kelley [1960, Sect. 2] are satisfied, as $y$ belongs to a compact set, $Q_t^k(\epsilon)$ is convex, and the gradients of the cutting planes are bounded (a cutting plane with unbounded gradient would not be valid for $Q_0^k$, a contradiction). Thus, as the number of iterations goes to infinity, the probability that $(y_{t-1}, V_t^k(y_{t-1})) \notin Q_t^k(\epsilon)$ goes to 0, which concludes the proof.

Theorem 1 For any ǫ > 0, the policy computed by Algorithm 1 converges to an ǫ-optimal policy with probability 1. Furthermore, the algorithm returns a gap that, with a confidence level of α, is an upper bound on the gap between the cost of the computed policy and the cost of the optimal policy.

Proof By Lemma 1, the policy computed by Algorithm 1 converges to an ǫ-optimal policy with probability 1, and the lower bound LB converges to an ǫ-optimal cost.


At the same time, as the number of iterations goes to infinity, the sample mean of the costs of the (uniformly drawn) sample paths converges to an unbiased estimate of the true cost, and, using the central limit theorem, the α quantile of the costs of the sample paths is larger than the true cost with confidence α. Since the cost of a heuristic policy from time τ to ∞ provides a deterministic upper bound on the cost from τ to ∞, the upper bound UB computed by Algorithm 2 is a valid upper bound with confidence α. When the algorithm stops, the difference between the cost of the policy implicitly defined by the value function approximation and the cost of the optimal policy is therefore smaller than UB − LB with confidence α.

5 Computational experiments

In this section we report and discuss the results of a computational evaluation of the algorithm proposed in this paper. To test the validity of our algorithm we use a test-bed based on two problems considered over an infinite time horizon: the Production Planning problem with Backlog (PPB), and the Continuous Rebalancing problem (CR). For each problem, we provide a mathematical programming formulation, we describe how we constructed the set of instances used in our computational evaluation, and we discuss the empirical behavior of our algorithm, also in comparison to other algorithms when applicable. All the experiments reported in this section are executed on a homogeneous cluster equipped with Xeon E7-4850 processors (2.00GHz, 64 GB RAM). Note that since the experiments are run in parallel and job scheduling is unpredictable, the timing statistics reported here are likely to be affected by some noise. However, the ranking of the algorithms indicated by aggregate statistics is so clear that the noise introduced by running parallel computations is irrelevant. In particular, standard deviations can be large and we do not report them for conciseness, but our conclusions are supported by additional analysis, as will be explained in the text. Before providing a full description of the test problems, we briefly comment on important implementation choices.

5.1 Implementation details

We implemented the B2 algorithm in Python 2.7, using Cplex 12.6.1 as an LP solver via its Python interface. As a stopping criterion for the algorithm, we adopted a check on the absolute and relative gap between lower and upper bounds, where the relative gap is computed as $(UB - LB)/(UB + 10^{-10})$. Notice that the upper bounds are estimated via sampling and are only valid at a given confidence level, set to 95% in our experiments. These stopping criteria are employed with the same tolerances by all algorithms.
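The stopping check can be sketched as follows (a simplified version with tolerance names of our own; the confidence-level machinery for the sampled upper bound is omitted):

```python
def should_stop(lb, ub, abs_tol=1.0, rel_tol=0.01):
    """Stop when either the absolute or the relative gap is within tolerance.

    The relative gap uses the safeguarded formula (UB - LB) / (UB + 1e-10),
    which avoids a division by zero when UB = 0.
    """
    abs_gap = ub - lb
    rel_gap = abs_gap / (ub + 1e-10)
    return abs_gap <= abs_tol or rel_gap <= rel_tol

# lb = 99.5, ub = 100.0 stops on the absolute gap;
# lb = 990.0, ub = 1000.0 stops on the relative gap.
```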

Two parameters have a major impact on the computational performance of the B2 algorithm:

– The number ρ of rounds before purging cuts. We opted for a dynamic management of the set C of Benders cuts, as dynamic management typically improves performance in LP-based cutting plane algorithms. At each iteration, we check which Benders cuts are active. A cut is considered inactive at the current forward pass solution if the corresponding dual variable is smaller than 10−5 in absolute value. If a cut is inactive for more than ρ LP solves, it is removed from C. We remark that in principle this could affect convergence of the algorithm from a theoretical point of view, as the value function approximations may deteriorate after cut removal.

– The number φ of sample paths before increasing τ. An important aspect of the B2 algorithm is the choice of the size of Sτ, see Alg. 1, line 4: we denote by φ = |Sτ| the number of sample paths generated before increasing the length τ of the time horizon.
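A minimal sketch of the cut-purging rule, with hypothetical data structures (the actual implementation queries dual values from Cplex):

```python
class Cut:
    """A Benders cut together with a counter of consecutive inactive rounds."""
    def __init__(self, coeffs, rhs):
        self.coeffs, self.rhs = coeffs, rhs
        self.inactive_rounds = 0

def purge_cuts(cuts, duals, rho, tol=1e-5):
    """Drop cuts whose dual variable stayed below tol for more than rho solves."""
    kept = []
    for cut, dual in zip(cuts, duals):
        if abs(dual) < tol:
            cut.inactive_rounds += 1
        else:
            cut.inactive_rounds = 0   # active again: reset the counter
        if cut.inactive_rounds <= rho:
            kept.append(cut)
    return kept

cuts = [Cut([1.0], 0.0), Cut([2.0], 1.0)]
# First cut inactive for three consecutive LP solves, second cut always active:
# with rho = 2, the first cut is purged at the third solve.
for _ in range(3):
    duals = [0.0] + [0.3] * (len(cuts) - 1)
    cuts = purge_cuts(cuts, duals, rho=2)
```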

5.2 Production Planning with Backlog

Production Planning (PP) is a classical problem in the stochastic LP literature, see e.g. Romeijn et al. [1992], Ghate and Smith [2009] for a discussion of infinite-horizon problems. PP requires deciding the production quantity of a set of items in each stage of the planning horizon, so as to satisfy demand at minimum cost. In this work, we consider a version of PP in which backlogs are allowed. PP with Backlogs (PPB) allows postponing part of the demand to a future stage, for a price. Another common PP model allows carrying over inventory to subsequent stages, rather than backlogging. From a mathematical point of view, the two models are similar (the only difference is that some decision variables have their sign flipped), and in our experience they also perform in a similar way from a computational point of view. Since we did not observe any significant computational difference, we limit the discussion to PPB.

The problem can be modeled as follows. We consider the problem of deciding the production effort xj,t dedicated to a certain group of items j at stage t in order to satisfy an uncertain demand. There are M item categories indexed by i = 1, . . . ,M with demands bi,t, and N different production plans indexed by j = 1, . . . , N with the property that dedicating one time unit xj,t to production plan j produces aij units of item i. Any unfulfilled demand can be backlogged and satisfied in future stages, but this carries a cost. The variable yi,t represents the amount of unsatisfied demand of category i at stage t. We have two costs in the objective function: cj represents the cost of one unit of production effort for plan j, while hi represents the cost of backlogging one unit of category i. This yields the following model:

\[
\begin{aligned}
\min\;& \sum_{j=1}^{N} c_j x_{j,0} + \sum_{i=1}^{M} h_i y_{i,0} + \sum_{t=1}^{\infty} \delta^t \sum_{k \in K} \Pr(\omega_k) \left( \sum_{j=1}^{N} c_j x_{j,t} + \sum_{i=1}^{M} h_i y_{i,t} \right) && \\
& \sum_{j=1}^{N} a_{ij} x_{j,0} + y_{i,0} \ge b_{i,0} && i = 1, \dots, M \\
& \sum_{j=1}^{N} a_{ij} x_{j,t} - y_{i,t-1} + y_{i,t} \ge b_{i,t}(\omega_k) && t = 1, \dots, \infty,\ i = 1, \dots, M,\ k \in K \\
& x_{j,t} \ge 0 && t = 0, \dots, \infty,\ j = 1, \dots, N \\
& y_{i,t} \ge 0 && t = 0, \dots, \infty,\ i = 1, \dots, M
\end{aligned}
\]


We generate instances with the following values: M ∈ {10, 20}, N ∈ {M/2, M, 2M}, and κ ∈ {10, 20, 30, 40, 50}. Let U(α, β) be an integer drawn uniformly at random between α and β. For each combination of the parameters M, N, κ, we generate 10 random instances with the following coefficients: $c_j = U(1, 100)$, $h_i = U(75, 125)/(75N)$, $a_{ij} = \max(0, U(c_j - 10, c_j + 10))$ and $b_i = U\big(0.75 \sum_{j=1}^{N} a_{ij}/2,\ 1.25 \sum_{j=1}^{N} a_{ij}/2\big)$, where as usual i = 1, . . . , M and j = 1, . . . , N. This yields a total of 300 PPB instances. This choice of parameters is dictated by the desire to obtain challenging instances, i.e., instances in which determining the trade-off between production and backlogging is nontrivial. Our starting point was the discussion in Martello and Toth [1990] about the generation of difficult knapsack problems.

5.3 Continuous Rebalancing Problem

The Continuous Rebalancing Problem (CRP) concerns the allocation of items from a finite pool to a set of locations, when facing an uncertain demand from customers that move items between two locations, see for example Laporte et al. [2015]. To give a concrete example, we will call the items "bikes". The locations can then be thought of as bike sharing stations. The system operator can rebalance the number of bikes once a day (e.g. late at night), incurring some cost. Let N be the number of locations, identified by their labels 1, . . . , N, with pairwise distances cij, i, j = 1, . . . , N. Let $d^0_i$ be the number of bikes at location i at the beginning of the time horizon, and bij,t(ωk) the number of bikes that users wish to rent to go from i to j in period t and scenario k (this number is also referred to as a demand). Let $D = \sum_{i=1}^{N} d^0_i$ be the total number of bikes. For each location i = 1, . . . , N, we define a fictitious location i′ that represents location i at the end of a stage. At the beginning of each time period, yi,t−1 represents the number of bikes at location i. Variables xij,t represent the number of bikes moved from i to j by the system operator at the beginning of time period t, which comes at a cost. Variables fij′,t represent the number of bikes moved by the users during period t (fii′,t represents bikes that did not move, or moved but ended up in their starting location). Finally, variables x̂ij′,t represent lost (unfulfilled) demand between i and j′. We implicitly assume that lost demand has a higher unit cost than the cost for moving a bike incurred by the system operator, otherwise the system operator has no incentive to intervene.

\[
\begin{aligned}
\min\;& \sum_{\substack{i,j \in N \\ j \ne i}} (c_{ij} x_{ij,0} + h_{ij} \hat x_{ij,0}) + \sum_{t=1}^{\infty} \delta^t \sum_{k=1}^{\kappa} \Pr(\omega_k) \sum_{\substack{i,j \in N \\ j \ne i}} (c_{ij} x_{ij,t} + h_{ij} \hat x_{ij,t}) && \\
& \sum_{j \in N} x_{ji,0} - \sum_{j \in N} x_{ij,0} - \sum_{j' \in N'} f_{ij',0} \ge -d^0_i && i \in N \quad (3) \\
& -y_{i,0} + \sum_{j \in N} f_{ji',0} \ge 0 && i' \in N' \quad (4) \\
& \sum_{j \in N} x_{ji,t} - \sum_{j \in N} x_{ij,t} + y_{i,t-1} - \sum_{j' \in N'} f_{ij',t} \ge 0 && t = 1, \dots, \infty,\ i \in N \quad (5) \\
& -y_{i,t} + \sum_{j \in N} f_{ji',t} \ge 0 && t = 1, \dots, \infty,\ i' \in N' \quad (6) \\
& f_{ij',t} + \hat x_{ij',t} \ge b_{ij,t}(\omega_k) && t = 1, \dots, \infty,\ k \in K,\ i \in N,\ j' \in N' \quad (7) \\
& \sum_{i \in N,\, j' \in N'} f_{ij',t} = D && t = 0, \dots, \infty \quad (8) \\
& \sum_{i \in N} y_{i,t} \ge D && t = 0, \dots, \infty \quad (9) \\
& x_{ij,t},\ \hat x_{ij,t},\ f_{ij',t},\ y_{i,t} \ge 0 && t = 0, \dots, \infty,\ i, j \in N \quad (10)
\end{aligned}
\]

The objective function minimizes the total cost, which consists of two parts: a cost that the system operator incurs to move bikes in order to rebalance the locations, and a cost for unfulfilled demand. Constraints (3) and (5) represent the natural inventory constraints: users can only take bikes that are present in a station at the beginning of the time period. Constraints (4) and (6) impose that the number of bikes at a given location at the beginning of a time period is equal to the number of bikes brought to that location by users, plus the bikes that were already present at the beginning of the time period and did not leave the station. Constraints (7) state that the total demand is split into fulfilled and unfulfilled demand. Finally, (8) and (9) define the total number of bikes that are present in the system.

For the generation of random instances of the CRP problem, besides the size parameter N and the number of scenarios κ, we define a saturation parameter ψ that represents the fraction of bikes in the system that are in demand at each stage. If the saturation parameter is small, the system naturally balances itself and the optimal policy is to do nothing. This leads to uninteresting instances from a computational point of view, as the starting policy (with the identically zero function as initial value function estimate) is optimal.

We generated instances with N = 10, κ ∈ {10, 20, 30, 40, 50} and ψ ∈ {1.0, 0.99, 0.98, 0.97}; these values of ψ provided the most challenging problems. For each combination of these three parameters, we randomly generate 10 different instances in the following way. First, we sample a base scenario $\bar b_{ij} = U(1, 100)$, and set $D = \sum_{i,j = 1, \dots, N,\, i \ne j} \bar b_{ij}$ and $d^0_i = D/N$. Then, we generate the right-hand side values for the scenarios by setting $b_{ij} = U(0.75 \psi \bar b_{ij}, 1.25 \psi \bar b_{ij})$, i, j = 1, . . . , N, i ≠ j. We consider only feasible scenarios, i.e., scenarios where the number of bikes is lower than the total demand D. For all instances, we set the cost coefficients to cij = 1 and hij = 100, i, j = 1, . . . , N. The final CR testbed consists of 200 instances.

5.3.1 Experiments on PPB instances

We begin our numerical study with an empirical evaluation of several versions of the B2 algorithm, as well as the classical SDDP approach. In this section, unless otherwise stated, the absolute gap is set to 1, the relative gap to 1%, ρ to 2 and φ to 10.

In our first experiment we consider two versions of B2: a multicut implementation and a single cut implementation. This terminology refers to the way cuts from the scenario subproblems are handled: in a multicut implementation, each


 M   N   k        B2   B2 aggr        PP
10   5  10       1.2       7.6    2176.5
        20       5.3      44.6    9376.9
        30       9.2    3843.9   14189.3
        40      11.1     255.4   18392.2
        50      19.7    4021.4   25445.6
Avg.             9.3    1634.6   13916.1

Absolute gap = 1

 M   N   k        B2   B2 aggr        PP
10   5  10       1.0       5.5     263.8
        20       2.1      22.8     924.6
        30       3.6      45.6    1863.3
        40       5.2      95.4    2361.1
        50       8.8    3739.1    4758.7
Avg.             4.1     781.7    2034.3

Absolute gap = 100

Table 1: Average CPU times for three algorithms on the Production Planning with Backlog problem. Each row indicates the average over 10 instances.

violated scenario produces a cut that is added to the master problem directly; in a single cut implementation, cuts from all scenarios are aggregated based on the corresponding probability, and a single aggregated cut is added to the master problem. As the aggregated cut is implied by the individual cuts, the multicut implementation produces tighter bounds, but this comes at the cost of introducing extra constraints. The seminal paper Pereira and Pinto [1991] proposes a single cut implementation. The difference between single and multicut was investigated in Birge and Louveaux [1988] with an empirical study of limited size (only five problem instances), suggesting that multicut speeds up convergence. However, single cut implementations are still dominant in the literature.
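The single cut variant aggregates per-scenario cuts by their probabilities; a sketch, with a hypothetical cut representation as (coefficient vector, right-hand side) pairs:

```python
def aggregate_cuts(scenario_cuts, probs):
    """Combine per-scenario Benders cuts into one probability-weighted cut.

    Each per-scenario cut (gamma_y, gamma_0) reads gamma_y^T y + z_k >= gamma_0;
    the weighted combination is a valid single cut on the expected recourse
    value sum_k p_k z_k, but is weaker than keeping all individual cuts.
    """
    n = len(scenario_cuts[0][0])
    agg_coeffs = [0.0] * n
    agg_rhs = 0.0
    for (coeffs, rhs), p in zip(scenario_cuts, probs):
        for i in range(n):
            agg_coeffs[i] += p * coeffs[i]
        agg_rhs += p * rhs
    return agg_coeffs, agg_rhs

# Two equiprobable scenarios: cuts y >= 4 and 3y >= 8 aggregate to 2y >= 6.
coeffs, rhs = aggregate_cuts([([1.0], 4.0), ([3.0], 8.0)], [0.5, 0.5])
```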

Furthermore, in this first experiment we test B2 against SDDP over a finite time horizon as described in Pereira and Pinto [1991]. To do so, given the allowed absolute gap, we heuristically determine the length of the time horizon τ′ such that the total cost in subsequent time periods does not exceed 90% of the allowed absolute gap. This calculation is based on an estimation of the expected single-stage costs. Once τ′ is determined, we apply the SDDP implementation of Pereira and Pinto [1991], employing the usual stopping criteria on absolute and relative gap with a small modification to account for the objective function contribution after τ′. Results for this experiment are reported in Table 1 for two different values of the absolute gap. We compare the multicut implementation of B2 (column "B2"), the single cut implementation of B2 (column "B2 aggr"), and the Pereira–Pinto finite horizon SDDP (column "PP").
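One way to derive such a τ′, assuming the estimated expected single-stage cost upper-bounds every stage (variable names are our own), is the geometric tail bound: the discounted cost after stage τ′ is at most δ^{τ′} c̄/(1 − δ).

```python
import math

def truncation_horizon(delta, stage_cost_estimate, abs_gap, fraction=0.9):
    """Smallest tau with delta^tau * c / (1 - delta) <= fraction * abs_gap.

    Solves the geometric tail bound sum_{t=tau}^inf delta^t * c <= fraction * gap
    for tau; requires 0 < delta < 1 and a positive stage cost estimate.
    """
    target = fraction * abs_gap * (1.0 - delta) / stage_cost_estimate
    if target >= 1.0:
        return 0  # the whole infinite-horizon tail already fits in the gap
    return math.ceil(math.log(target) / math.log(delta))

# delta = 0.9, estimated single-stage cost 10, allowed absolute gap 1:
tau_prime = truncation_horizon(0.9, 10.0, 1.0)
```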

Table 1 clearly indicates that the multicut implementation of B2 is faster than the other algorithms. Note that we only report results for small instances, because the running times for PP get prohibitively high on larger problems. While increasing the absolute gap helps especially PP, larger values (e.g., 1000) did not yield a significant reduction of the running times, and a very large absolute gap makes some instances trivial to solve. We verified that B2 is faster than B2 aggr for all instance sizes that we could try. As an additional verification, we performed pairwise comparisons of the algorithms for each group of instances. For equal instance size, B2 is faster than B2 aggr, which is in turn faster than PP, in all cases. We consider this to be compelling evidence of the ranking of the algorithms. Based on this discussion, from now on we focus our attention on B2 only, with a small absolute gap set to 1.

In a second round of experiments on PPB instances, we investigate the impactof the algorithmic parameters ρ and φ, see Section 5.1, on a large variety of instance


                                            (ρ, φ)
 M   N   k     (2,2)    (2,5)   (2,10)    (5,2)    (5,5)   (5,10)   (10,2)   (10,5)  (10,10)

10   5  10       1.4      1.0      1.0      1.4      0.8      1.1      1.6      1.1      1.2
        20       6.9      4.9      5.3      5.2      4.6      5.9      6.1      5.9      5.3
        30    3673.7      7.5      6.2     18.7      8.1      7.2     18.5     10.4      9.2
        40       7.6      6.0      6.6      8.6      7.2      8.6     10.7      9.1     11.1
        50      12.6     13.2     11.5     12.8     12.5     15.2     16.8     16.9     19.7

    10  10       9.8     66.7     76.3      9.6     17.4     42.5      8.9     16.8     35.6
        20    3792.0   3927.4   4574.8     80.9    293.6    846.2     82.9    197.2    541.7
        30    4246.7   1636.8   5712.1    315.6    549.4   2310.1    275.3    563.4   1738.3
        40    3861.4   1681.4   3976.5    160.0    904.5   2177.6    191.0    681.0   1822.9
        50   10407.2   9599.3  13009.1   2479.8   7693.7   9956.6   1289.8   4087.7   9763.9

    20  10     203.4    462.0   1596.3    123.8    353.8    995.7    115.2    306.7    792.0
        20    4389.6   5423.6  14021.8    574.8   2673.5  10796.4    514.4   1838.8   8069.1
        30    4103.2   8671.1  20153.1   1298.2   5835.2  18457.9   1006.5   4661.4  16338.6
        40   15050.6  24742.3  37995.4   4365.7  22362.7  36847.2   2771.4  20072.6  35744.7
        50   16032.6  30212.9  35363.4  13158.7  29841.8  36326.7  12690.2  27620.9  34409.9

20  10  10       4.5     13.0     22.8      4.2      9.4     21.0      4.3     10.1     19.7
        20      52.2     42.2     78.6     33.8     40.8     74.0     38.0     44.5     77.8
        30     316.8   1001.1   1997.6    169.1    560.9   1612.2    150.8    514.7   1376.0
        40     113.4    132.6    278.0     98.2    116.1    161.6    124.3    129.5    220.0
        50     135.5    913.9   3363.5    143.3    672.6   1923.0    178.4    593.6   1759.8

    20  10      17.7     22.0     38.1     15.2     20.7     36.0     16.9     20.9     33.8
        20     357.5    742.3   2277.3    238.2    450.2   1630.2    190.3    424.1   1315.7
        30     572.5   2325.8   7421.4    407.3   1363.6   6065.5    445.8   1197.3   5687.7
        40    6540.4  10881.2  17995.5   1820.5  10200.6  16330.7   1463.5   8499.7  15533.3
        50    8349.3  15674.5  20665.2   6145.2  15511.0  19846.6   5195.7  15418.0  20264.7

    40  10     558.1   1146.8   5627.8    373.3    841.6   3293.8    321.6    726.3   2348.6
        20   15096.6  25070.4  35448.0   4209.2  21087.5  34286.6   2695.4  19003.2  33595.2
        30   29509.0  38800.8  39364.3  23482.8  38752.6  39989.0  20793.4  34792.3  40359.3
        40   29918.5  36610.3  41725.3  28272.9  36621.1  42492.6  27592.6  35925.5  44750.7
        50   38656.2  39688.9  43527.5  36103.6  41466.8  44181.1  36020.7  39861.1  44588.7

Avg.          6533.2   8650.7  11878.0   4137.7   7942.5  11024.6   3807.7   7241.7  10707.8

Table 2: Average CPU times for B2 with different values of ρ and φ on the Production Planning with Backlog problem. Each row indicates the average over 10 instances.

sizes. In Table 2 we report average CPU times over 10 instances when using values ρ ∈ {2, 5, 10} and φ ∈ {2, 5, 10}.

All results obtained with ρ = 2 exhibit high standard deviation, indicating unstable behavior. The erratic behavior induced by ρ = 2 is evident also from the average times, with large fluctuations across the table. This is due to purging cuts too aggressively, so that useful cuts are frequently deleted and have to be generated again in subsequent iterations. We therefore only analyze results with ρ = 5 and ρ = 10 in the following. Among the other parameterizations, φ = 2 yields the fastest convergence. Indeed, φ = 2 is faster than φ = 5 and φ = 10 on every instance, not only on average. This indicates that increasing the length of the time horizon aggressively is more effective than investing a significant amount of time in exploring many sample paths for a fixed time horizon. As expected, increasing k, M, and N makes the problem more difficult to solve. Nevertheless, we are able to solve all instances to within the specified gap.


                                                      gap (ρ, φ)
                    45%               46%               47%               48%               49%               50%
|N|    ψ   k   (10,2)    (5,2)   (10,2)    (5,2)   (10,2)    (5,2)   (10,2)    (5,2)   (10,2)    (5,2)   (10,2)    (5,2)

10  0.97  10   2370.7   9155.8   2174.8   5995.0   1683.2   2757.5   1651.5   2316.9   1597.9   1745.6   1375.8   1641.5
          20   2770.3   3242.8   2492.5   3145.8   2439.3   2779.3   2351.8   2689.0   2277.1   2599.0   2233.7   2488.8
          30   5604.5   6960.7   5079.7   6736.3   4857.0   6441.6   4714.1   5719.0   4108.3   5401.5   4038.7   5320.7
          40   9290.5  12701.8   8817.6  12059.7   8564.0  11078.1   8258.1  11058.8   7783.3  10651.0   7533.3  10285.8
          50  22829.6  25433.1  21782.0  25088.1  21602.9  24756.4  20109.5  24384.9  19640.6  24343.2  18977.3  23569.1

    0.98  10   1463.1   1642.0   1368.0   1455.0   1245.7   1416.7   1232.3   1351.9   1138.2   1170.0   1116.6   1142.6
          20   1813.5   2027.2   1837.8   2044.5   1764.6   1951.8   1746.8   1899.0   1610.9   1785.6   1585.1   1747.4
          30   3174.0   3447.5   2897.2   3278.1   2916.6   3200.1   2922.7   3016.2   2811.2   3026.0   2639.9   2840.6
          40   5686.4   6765.8   5314.3   6658.0   5322.3   6480.1   5133.8   6054.9   4765.0   6006.0   4659.5   5799.0
          50  18520.2  22706.6  17757.9  22292.8  17413.2  21652.9  16904.8  21391.9  16028.8  21146.0  15561.2  20696.4

    0.99  10    654.0    762.7    629.3    656.1    661.0    712.1    585.0    693.6    619.0    692.5    585.5    585.3
          20   1232.7   1321.8   1108.7   1190.6   1070.7   1219.8   1112.3   1180.3   1078.5   1075.6    997.2   1106.8
          30   2226.8   2242.3   2179.8   2255.8   2016.6   2170.0   2036.9   2210.6   1960.2   2076.5   1802.3   2057.9
          40   3553.4   3982.9   3312.1   3929.9   3368.0   3852.9   3274.6   3513.7   3193.6   3436.9   3002.3   3470.4
          50  10414.8  14139.6   9597.3  13813.5   9450.4  13526.3   9242.5  12799.0   8682.7  12624.0   8473.5  12212.4

    1.00  10    309.5    344.1    291.0    332.4    308.6    319.7    262.2    307.0    274.6    304.7    261.0    253.0
          20    662.1    702.8    625.9    621.1    623.8    639.8    599.6    612.9    598.9    613.8    550.6    553.3
          30   1331.3   1404.2   1300.7   1305.4   1144.5   1287.0   1138.3   1205.5   1176.0   1127.9   1126.9   1150.8
          40   1944.0   1928.4   1906.7   1917.9   1802.1   1924.7   1752.2   1850.7   1691.4   1745.5   1503.0   1664.6
          50   4379.7   5088.5   3884.7   4905.3   3977.3   4698.4   3829.1   4368.1   3733.0   4292.4   3518.1   4100.8

Avg.           5011.6   6300.0   4717.9   5984.1   4611.6   5643.3   4442.9   5431.2   4238.5   5293.2   4077.1   5134.4

Table 3: Average CPU times for B2 on the Continuous Rebalancing problem. Each row indicates the average over 10 instances.


5.3.2 Comments on CRP instances

In Table 3, we report results for ρ ∈ {10, 5} and φ = 2 on the CR instances. Solving CR instances of reasonable size to optimality is considerably hard, much harder than solving PPB instances. This is mainly due to the difficulty of finding a good stationary policy. Even if the technique explained in Section 4.1 can be applied to CR instances, it typically provides bad primal bounds, and closing the gap often becomes an insurmountable obstacle. For this reason, in Table 3 we restrict ourselves to instances with N = 10 and a relative gap (column "gap" in the table) between 45% and 50%. An interesting observation is that problems with smaller saturation are more difficult to solve: this is because for instances with low saturation there is a large number of unused bikes in each time step, and in extreme cases the optimal policy has a value close to zero (i.e., it is optimal to let the system balance itself). This increases the difficulty because when the lower bound is close to zero, the upper bound also must take a very low value to close the gap, but our primal heuristic makes conservative assumptions on the future demand, and is therefore not able to close the gap.

To investigate in more detail whether or not the difficulty in closing the gap is mainly due to the primal bound, we add an additional stopping criterion based only on the improvement of the dual bound. More precisely, let LBi be the dual bound obtained at iteration i: we stop the algorithm at iteration i if $(LB_i - LB_{i-1})/LB_i \le 10^{-3}$. In Table 4, we compare the results obtained when adding the new stopping criterion (columns "dual impr") with those obtained with the standard criteria based on absolute and relative gap only (columns "default"). Here we only report results with relative gap 45%, ρ = 10 and φ = 2. For each setting, we provide the average CPU time and the final dual bound obtained. We can see that the computing times decrease with the new stopping criterion, while the dual bound obtained does not change significantly. This suggests that a considerable amount of time is spent in trying to improve the primal bound, while the dual bound is already close to its final value.
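The dual-improvement test is a one-liner; a sketch (the function name is ours, and a positive current bound is assumed):

```python
def dual_bound_stalled(lb_prev, lb_curr, tol=1e-3):
    """True if the relative improvement of the dual bound fell below tol."""
    return (lb_curr - lb_prev) / lb_curr <= tol

# 4429.0 -> 4429.5 is a relative improvement of about 1e-4: stop.
stalled = dual_bound_stalled(4429.0, 4429.5)
```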

6 Conclusions

This paper presents a provably convergent nested decomposition algorithm for infinite-horizon stochastic LPs. The convergence proof is based on a combination of Benders decomposition with Kelley's cutting plane algorithm. The advantage of the approach proposed in this paper is that it allows us to use the well-known Benders scheme on an infinite-horizon problem, while retaining strong theoretical guarantees.

Our computational experiments lead to the following conclusions. First, the multicut implementation in the nested decomposition scheme converges much faster than a single-cut implementation. The effect is particularly evident in our setting because our algorithm relies heavily on Benders decomposition, in the sense that all subproblems in the decomposition scheme are themselves solved by Benders decomposition. This compounds the effect of implementation details, and shows that keeping a cut for each violated scenario is worth the computational burden, thanks to tighter bounds. Second, the construction of primal bounds is a delicate step in the infinite-horizon setting: it can lead to a deterioration of performance if


                          time                  dual bound
 N     ψ     k      default  dual impr     default  dual impr

10   0.97   10       2370.7       85.3      4779.9     4429.0
            20       2770.3       86.3      7879.1     7849.8
            30       5604.5      168.8      8004.4     7937.4
            40       9290.5      303.4      7636.2     7614.5
            50      22829.6      741.3      6226.0     6295.5

     0.98   10       1463.1       32.4      6800.5     6648.6
            20       1813.5       47.8     14133.4    14017.9
            30       3174.0      200.7     12943.8    12981.3
            40       5686.4      228.0     12127.5    11834.6
            50      18520.2      347.7      9044.1     9087.4

     0.99   10        654.0       28.7     15628.4    15426.6
            20       1232.7       49.9     26051.2    26286.0
            30       2226.8      162.2     22198.2    21986.6
            40       3553.4      103.7     21060.6    21238.6
            50      10414.8      167.5     17184.5    17521.9

     1.00   10        309.5       20.6     45044.2    45336.9
            20        662.1       28.1     54885.1    55352.5
            30       1331.3       53.2     47497.9    48025.8
            40       1944.0       93.8     48101.6    48646.6
            50       4379.7      138.3     42908.9    43227.0

Avg.                 5011.6      154.4     21506.8    21587.2

Table 4: Average CPU times for B2 on the Continuous Rebalancing problem, using different stopping criteria.

general-purpose heuristics are not effective and problem-specific primal bounds are not available. However, in our experiments the dual bound always converges relatively quickly, so that even if we cannot prove tight gap guarantees, the algorithm in practice determines near-optimal policies.
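The advantage of the multicut scheme over cut aggregation can be illustrated with a toy one-dimensional example (purely illustrative code with made-up data, not the paper's implementation): aggregating scenario cuts into a single expected cut can cancel scenario-specific slopes and lose information, so the multicut bound dominates pointwise.

```python
def multicut_bound(x, probs, cuts_per_scenario):
    """Multicut model: one polyhedral approximation per scenario;
    the bound is the expectation of the per-scenario cut maxima.
    Cuts are affine minorants theta_s >= a*x + b."""
    return sum(p * max(a * x + b for a, b in cuts)
               for p, cuts in zip(probs, cuts_per_scenario))

def singlecut_bound(x, probs, cuts_per_scenario):
    """Single-cut model: at each round the scenario cuts are
    aggregated into one expected cut before being kept."""
    aggregated = []
    for round_cuts in zip(*cuts_per_scenario):  # one cut per scenario per round
        a = sum(p * ai for p, (ai, _) in zip(probs, round_cuts))
        b = sum(p * bi for p, (_, bi) in zip(probs, round_cuts))
        aggregated.append((a, b))
    return max(a * x + b for a, b in aggregated)

probs = [0.5, 0.5]
cuts = [[(1.0, 0.0), (-1.0, 2.0)],   # scenario 1: cuts from two rounds
        [(-1.0, 0.0), (1.0, -2.0)]]  # scenario 2: opposite slopes
# At x = 0 the multicut bound is 1.0 while the single-cut bound is 0.0:
# aggregation cancels the opposing slopes and weakens the model.
```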

References

Richard Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.

Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA, 1995.

D. Bertsimas and J. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, Belmont, 1997.

John R. Birge. Decomposition and partitioning methods for multistage stochastic linear programs. Operations Research, 33(5):989–1007, 1985.

John R. Birge and François V. Louveaux. A multicut algorithm for two-stage stochastic linear programs. European Journal of Operational Research, 34(3):384–392, 1988.

John R. Birge and François Louveaux. Introduction to Stochastic Programming. Springer Science & Business Media, 2011.

Vitor L. de Matos, Andy B. Philpott, and Erlon C. Finardi. Improving the performance of stochastic dual dynamic programming. Journal of Computational and Applied Mathematics, 290:196–208, 2015.


Christopher Donohue and John Birge. The abridged nested decomposition method for multistage stochastic linear programs with relatively complete recourse. Algorithmic Operations Research, 1(1), 2006.

R. J. Duffin and L. A. Karlovitz. An infinite linear program with a duality gap. Management Science, 12(1):122–134, 1965.

Horand I. Gassmann. MSLiP: A computer code for the multistage stochastic linear programming problem. Mathematical Programming, 47(1):407–423, 1990.

A. Ghate and R.L. Smith. Optimal backlogging over an infinite horizon under time-varying convex production and inventory costs. Manufacturing & Service Operations Management, 11(2):362–368, 2009.

Richard C. Grinold. Finite horizon approximations of infinite horizon linear programs. Mathematical Programming, 12(1):1–17, 1977.

Vincent Guigues. SDDP for some interstage dependent risk-averse problems and application to hydro-thermal planning. Computational Optimization and Applications, 57(1):167–203, 2014.

Holger Heitsch and Werner Römisch. Scenario tree modeling for multistage stochastic programs. Mathematical Programming, 118(2):371–406, 2009.

Gerd Infanger and David P. Morton. Cut sharing for multistage stochastic linear programs with interstage dependency. Mathematical Programming, 75(2):241–256, 1996.

J. E. Kelley. The cutting-plane method for solving convex programs. Journal of the Society for Industrial and Applied Mathematics, 8(4):703–712, 1960.

Kibaek Kim and Sanjay Mehrotra. A two-stage stochastic integer programming approach to integrated staffing and scheduling with application to nurse management. Operations Research, 63(6):1431–1451, 2015.

Gilbert Laporte, Frédéric Meunier, and Roberto Wolfler Calvo. Shared mobility systems. 4OR, 13(4):341–360, 2015.

Silvano Martello and Paolo Toth. Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons, Inc., New York, NY, USA, 1990.

John M. Mulvey and Hercules Vladimirou. Solving multistage stochastic networks: an application of scenario aggregation. Networks, 21(6):619–643, 1991.

John M. Mulvey and Hercules Vladimirou. Stochastic network programming for financial planning problems. Management Science, 38(11):1642–1664, 1992.

M.V.F. Pereira and L.M.V.G. Pinto. Multi-stage stochastic optimization applied to energy planning. Mathematical Programming, 52(1-3):359–375, 1991.

Warren B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd Edition. Wiley, 2011.

Edwin H. Romeijn, Robert L. Smith, and James C. Bean. Duality in infinite dimensional linear programming. Mathematical Programming, 53(1-3):79–97, 1992.

Irwin E. Schochetman and Robert L. Smith. Finite dimensional approximation in infinite dimensional mathematical programming. Mathematical Programming, 54(1-3):307–333, 1992.

Alexander Shapiro. Analysis of stochastic dual dynamic programming method. European Journal of Operational Research, 209(1):63–72, 2011.

R. M. Van Slyke and Roger Wets. L-shaped linear programs with applications to optimal control and stochastic programming. SIAM Journal on Applied Mathematics, 17(4):638–663, 1969.


Stavros A. Zenios. Financial Optimization. Cambridge University Press, Cambridge, UK, 1993.

Jikai Zou, Shabbir Ahmed, and Xu Andy Sun. Nested decomposition of multistage stochastic integer programs with binary state variables. Technical report, Georgia Institute of Technology, 2016. URL http://www.optimization-online.org/DB_HTML/2016/05/5436.html.