stochastic uncapacitated hub location

European Journal of Operational Research 212 (2011) 518–528

Contents lists available at ScienceDirect

European Journal of Operational Research

journal homepage: www.elsevier .com/locate /e jor

Stochastics and Statistics

Stochastic uncapacitated hub location

Ivan Contreras ⇑, Jean-François Cordeau, Gilbert LaporteHEC Montréal and Interuniversity Research Centre on Enterprise Networks, Logistics and Transportation (CIRRELT), Montreal, Canada H3T 2A7

a r t i c l e i n f o a b s t r a c t

Article history:Received 15 September 2010Accepted 13 February 2011Available online 19 February 2011

Keywords:Hub locationStochastic programmingMonte-Carlo samplingBenders decomposition

0377-2217/$ - see front matter � 2011 Elsevier B.V. Adoi:10.1016/j.ejor.2011.02.018

⇑ Corresponding author.E-mail addresses: [email protected] (I. Contr

cirrelt.ca (J.-F. Cordeau), [email protected] (G.

We study stochastic uncapacitated hub location problems in which uncertainty is associated to demandsand transportation costs. We show that the stochastic problems with uncertain demands or dependenttransportation costs are equivalent to their associated deterministic expected value problem (EVP), inwhich random variables are replaced by their expectations. In the case of uncertain independent trans-portation costs, the corresponding stochastic problem is not equivalent to its EVP and specific solutionmethods need to be developed. We describe a Monte-Carlo simulation-based algorithm that integratesa sample average approximation scheme with a Benders decomposition algorithm to solve problems hav-ing stochastic independent transportation costs. Numerical results on a set of instances with up to 50nodes are reported.

� 2011 Elsevier B.V. All rights reserved.

1. Introduction

Hub location problems (HLPs) arise in transportation, telecom-munication and computer networks, where hub-and-spoke archi-tectures are frequently used to efficiently route commoditiesbetween many origin and destination (O/D) pairs. The performanceof these networks relies on the use of consolidation, switching, ortransshipment points, called hub facilities, where flows from sev-eral origins are consolidated and rerouted to their destinations,sometimes via another hub. In HLPs the locations of the hubs aswell as the paths for sending the commodities have to be deter-mined. Broadly speaking, HLPs consist in locating hubs on a net-work so as to minimize the total flow cost.

Due to their multiple applications, these problems are receivingincreased attention. Solution methods have been developed forseveral variants of HLPs analogous to well-known discrete facilitylocation problems, such as uncapacitated hub location, p-hub loca-tion, p-hub center, and hub covering. For each of these classes ofproblems, there exist several variants arising from various assump-tions, such as hub capacities or a specific topological structure forthe hub-and-spoke network. There are two basic assumptionsunderlying most HLPs. The first is that commodities have to be rou-ted via a set of hubs, and thus paths between O/D pairs include atleast one hub facility. The second assumption is that hubs are fullyinterconnected with more effective, higher volume pathways thatenable a discount factor s (0 < s < 1) to be applied to all transpor-tation costs associated to the commodities routed between a pair

ll rights reserved.

eras), jean-francois.cordeau@Laporte).

of hubs. The reader is refered to Alumur and Kara (2008) and toCampbell et al. (2002) for recent surveys on HLPs.

The location of hub facilities corresponds to long-term strategicdecisions which are typically made within an uncertain environ-ment. That is, costs, demands, distances, and other parametersmay change after location decisions have been made. Nevertheless,standard HLP models treat data as known and deterministic. Thiscan result in highly sub-optimal solutions given the inherent uncer-tainty surrounding future conditions. There exist basically twostreams of research dealing with optimization under uncertainty:stochastic optimization and robust optimization. In stochasticoptimization, it is assumed that the values of the uncertain param-eters are governed by known probability distributions. In robustoptimization, it is assumed that parameters are uncertain but noinformation about their probability distributions is know exceptfor the specification of intervals containing the uncertain values.

In classical facility location, stochastic models have been widelyinvestigated over the last four decades. Louveaux (1986, 1993) pre-sents classical reviews on modeling approaches for stochastic facil-ity location in which the location of the facilities is considered as afirst-stage decision and the distribution pattern is a second-stagedecision. Some of these models (see Louveaux and Peeters, 1992;Laporte et al., 1994) consider capacities on the facilities, and facil-ity size is considered as a first-stage decision. Ravi and Sinha(2006) propose a stochastic problem in which facilities may beopen in either the first or second stage, while incurring differentinstallation costs in each stage. The survey by Snyder (2006) coversboth stochastic and robust location models for stochastic locationproblems.

To the best of the authors’ knowledge, there exist only threepublished articles related to stochastic hub location problems.Marianov and Serra (2003) focus on stochasticity at the hub nodes

http://dx.doi.org/10.1016/j.ejor.2011.02.018

mailto:[email protected]

mailto:jean-francois.cordeau@ cirrelt.ca

mailto:jean-francois.cordeau@ cirrelt.ca

mailto:[email protected]

http://dx.doi.org/10.1016/j.ejor.2011.02.018

http://www.sciencedirect.com/science/journal/03772217

http://www.elsevier.com/locate/ejor

I. Contreras et al. / European Journal of Operational Research 212 (2011) 518–528 519

by representing hub airports as M/D/c queues and limiting throughchance constraints the number of airplanes that can queue at anairport. The authors present a linear mixed integer programming(MIP) formulation and propose a heuristic procedure to obtainfeasible solutions for instances with up to 30 nodes. Sim et al.(2009) introduce the stochastic p-hub center problem and employa chance-constrained formulation to model the minimum service-level requirement. Their model takes into account the variability intravel times when designing the hub network so that the maxi-mum travel time through the network is minimized. The authorspresent a linear MIP formulation for the problem, under theassumption that travel times on the arcs are independent normalrandom variables. They also propose several heuristics to obtainfeasible solutions for instances with up to 25 nodes. Yang (2009)presents a stochastic model for air freight hub location and flightroute planning under seasonal demand variations. The authormodels the problem as a two-stage stochastic program with re-course, in which the location of hub facilities is considered as afirst-stage decision and the planning of flight routes as a second-stage decision. Stochasticity on demands as well as on discountfactors on the arcs of the network is considered. Moreover, themodel allows direct connections between non-hub nodes. A MIPformulation for the problem is presented under the assumptionthat demand is governed by a discrete probability distributioninvolving only three possible scenarios. A case study from a10-node air freight market in Taiwan and China is also described.

One of the problems that have received most attention in deter-ministic hub location is the Uncapacitated Hub Location Problemwith Multiple Assignments (UHLPMA). In this problem, the numberof required hubs to locate is not known in advance, but a fixed set-up cost for each hub facility is considered. The capacity of the hubsand of the links of the hub network is unbounded. It is assumedthat flows originating at the same node but having different desti-nation points can be routed through different sets of hub nodes, i.e.a multiple assignment pattern applies. The objective is to minimizethe sum of the hub fixed costs and demand routing costs. The bestknown formulations for the UHLPMA, in terms of LP relaxationbounds, are those of Hamacher et al. (2004) and of Marín et al.(2006), whereas the best exact algorithms are those of Cánovaset al. (2007), Camargo et al. (2008) and Contreras et al. (2010a).In particular, the Benders decomposition algorithm of Contreraset al. (2010a) can efficiently solve large-scale UHLPMA instanceswith up to 500 nodes.

In the UHLPMA, demand between O/D pairs as well as transpor-tation costs are treated as known and deterministic. However, inreal applications future demand is not known in advance and onlya forecast may be available. Transportation costs between nodepairs are usually defined to be proportional to the distance be-tween nodes. However, transportation costs are also intimately re-lated to the price of resources (fuel, electricity, raw materials) usedto provide the actual transportation of demand, which may behighly uncertain. Other sources of uncertainty in transportationcosts may be due to: (i) uncertainty in travel distances, (ii) trafficand congestion, (iii) tariff changes by outsourcing companies, and(iv) link failures. In this paper we study how the UHLPMA can bemodeled as a two-stage integer stochastic program with recoursein the presence of uncertainty on demands and transportationcosts. In particular, we introduce three different stochastic versionsof the UHLPMA. The first is the Uncapacitated Hub Location Problemwith Stochastic Demands (UHL-SD) in which demands between O/Dpairs are considered to be stochastic. The second is the Uncapaci-tated Hub Location Problem with Stochastic Dependent TransportationCosts (UHL-SDC) where uncertainty is given by a single parameterinfluencing transportation costs. It is assumed that this parameterequally affects the transportation costs for all links of the network.The third is the Uncapacitated Hub Location Problem with Stochastic

Independent Transportation Costs (UHL-SIC) in which the transpor-tation costs are also stochastic. However, this problem considersthe more general case in which the uncertainty of transportationcosts is independent for each link of the network. We show thatboth UHL-SD and UHL-SDC are equivalent to their associatedExpected Value Problem (EVP) in which uncertain transportationcosts are replaced with their expected value (see Birge andLouveaux, 1997). However, this equivalence does not hold for theUHL-SIC.

We use a Monte-Carlo simulation-based method, known as theSample Average Approximation (SAA) scheme (Kleywegt et al.,2001), to solve UHL-SIC problems with continuous distance distri-butions, and therefore, an infinite number of scenarios. This meth-od can also be applied to UHL-SIC problems with a finite but verylarge number of scenarios. We integrate a Benders decompositionscheme to solve the corresponding SAA problems.

The remainder of this paper is organized as follows. Section 2formally introduces two-stage stochastic models for the consid-ered problems. Section 3 describes our solution method for theUHL-SIC. Computational results are presented in Section 4, fol-lowed by conclusions in Section 5.

2. Stochastic uncapacitated hub location problems

Before presenting the stochastic uncapacitated hub locationmodels under study, we describe their deterministic counterpart,the UHLPMA. Let G = (Q,A) be a complete digraph, where Q is theset of nodes and A is the set of arcs. Let also H # Q represent theset of potential hub locations, and K be the set of commoditieswhose origin and destination points belong to Q. For each com-modity k 2 K, define Wk as the amount of commodity k to be rou-ted from the origin o(k) 2 Q to the destination d(k) 2 Q. For eachnode i 2 H, fi is the fixed set-up cost for locating a hub at node i.The transportation cost between nodes i and j is defined as cij = cdij,where dij is the distance between nodes i and j, which is assumedto satisfy the triangle inequality, and c is the resource cost per unitdistance. All costs relate to the same planning horizon.

Given that hub nodes are fully interconnected and distances sat-isfy the triangle inequality, every path between an origin and a des-tination node will contain at least one and at most two hubs. Forthis reason, paths between two nodes are of the form (o(k), i, j,d(k)),where (i, j) 2 H � H is the ordered pair of hubs to which o(k) andd(k) are allocated, respectively. Therefore, the unit transportationcost of routing commodity k along path (o(k), i, j,d(k)) is given byFijk = vco(k)i + scij + dcjd(k), where v, s, and d represent the collection,transfer and distribution costs along the path. To reflect economiesof scale between hub nodes, we assume that s < v and s < d. TheUHLPMA consists in locating a set of hubs and in determining therouting of commodities through the hub nodes, with the objectiveof minimizing the total set-up and transportation cost.

We define binary location variables zi, i 2 H, equal to 1 if andonly if a hub is located at node i. We also introduce binary routingvariables xijk, k 2 K and (i, j) 2 H � H, equal to 1 if and only if com-modity k transits via a first hub node i and a second hub node j. Fol-lowing (Hamacher et al., 2004), the UHLPMA can be stated asfollows:

minimizeXi2H

fizi þXi2H

Xj2H

Xk2K

WkFijkxijk ð1Þ

subject toXi2H

Xj2H

xijk ¼ 1 k 2 K ð2ÞXj2H

xijk þX

j2Hnfigxjik 6 zi i 2 H; k 2 K ð3Þ

xijk P 0 i; j 2 H; k 2 K ð4Þz 2 BjHj: ð5Þ

520 I. Contreras et al. / European Journal of Operational Research 212 (2011) 518–528

The first term of the objective function represents the total set-upcost of the hub facilities and the second term is the total transpor-tation cost. Constraints (2) guarantee that there is a single path con-necting the origin and destination nodes of every commodity.Constraints (3) prohibit commodities from being routed via anon-hub node. Finally, constraints (4) and (5) are the standardnon-negativity and integrality constraints. Given that there are nocapacity constraints on the hub nodes, there is no need to explicitlystate the integrality on the xijk variables because there always existsan optimal solution of (1)–(5) in which all xijk variables are integer.

We now present three stochastic variants of UHLPMA thatincorporate uncertainty on demands Wk and transportation costsFijk. They can be formulated as two-stage stochastic programs withrecourse, where the first-stage decisions correspond to the locationof the hub facilities and the second-stage decisions correspond tothe optimal routing of the commodities. As before, we use the zi

variables for the location of hubs and the xijk variables for the rout-ing of commodities. However, given that the optimal routing ofcommodities depends on the particular realization of the randomevent n, we need to consider routing variables for each possiblerealization of n, i.e., xijk(n), k 2 K and (i, j) 2 H � H, are binary routingvariables equal to one if and only if commodity k transits via a firsthub node i and a second hub node j for realization n.

2.1. Case A: stochastic demands

We consider the UHL–SD in which demand is uncertain. Foreach commodity k 2 K, let Wk(n) be a random variable representingthe future demand of commodity k. It is assumed that the Wk(n) areindependent random variables, governed by known probabilitydistributions, and that the distribution function of

Pk2K WkðnÞ can

be computed. The UHL–SD can be stated as:

minimizeXi2H

fizi þ En

Xk2K

Xi2H

Xj2H

ðWkðnÞFijkÞxijkðnÞ" #

subject toXi2H

Xj2H

xijkðnÞ ¼ 1 k 2 K; n 2 N ð6ÞXj2H

xijkðnÞ þX

j2HnfigxjikðnÞ 6 zi i 2 H; k 2 K; n 2 N

ð7ÞxijkðnÞP 0 i; j 2 H; k 2 K; n 2 N; ð8Þz 2 BjHj; ð9Þ

where En denotes the mathematical expectation with respect to nand N is the support of n. The first term of the objective functionrepresents the total set-up cost of the hub facilities and the secondterm is the total expected transportation cost with respect to n.Constraints (6)–(9) have the same meaning as in the case of UHLP-MA, but they are defined for every possible realization of n.

We now show that the UHL–SD is in fact a deterministic prob-lem in which the demands Wk(n) can be replaced by theirexpectation.

Proposition 1. The stochastic program UHL–SD is equivalent to theexpected value problem:

minimizeXi2H

fizi þXi2H

Xj2H

Xk2K

ðEn½WkðnÞ�FijkÞxijk ð10Þ

subject to ð2Þ—ð5Þ:

Proof. Observe that, given a first-stage vector z, the second-stageterm of the objective function can be separated into jKj indepen-dent subproblems, one for each commodity k 2 K. Furthermore,and more importantly, the optimal solution of each of these

subproblems does not depend on the particular realization of therandom variable n. That is, the optimal route for sending each com-modity is the same regardless of the actual value of the demandWk(n). Let x(z) be the optimal solution vector associated to afirst-stage solution z. Then

xijkðnÞ ¼ xijkðzÞ 8k 2 K; 8i; j 2 H; 8n 2 N: ð11Þ

By (11), the second-stage expectation of the objective function canbe expressed as

En

Xi2H

Xj2H

Xk2K

ðWkðnÞFijkÞxijkðzÞ" #

¼Xi2H

Xj2H

Xk2K

ðEn½WkðnÞ�FijkÞxijkðzÞ;

where the equality follows from the fact that summations andexpectation can be interchanged. The UHL-SD is therefore equiva-lent to the deterministic problem (10), (2)–(5). h

2.2. Case B: dependent stochastic transportation costs

We now focus on the UHL–SDC in which uncertainty relates tothe resource cost c. Consequently, the transportation costs Fijk arestochastic but dependent on a single uncertain parameter. In thisproblem the random variable c is described by a known probabilitydistribution. The unit transportation cost between every node pair(i, j) is c1

ijðnÞ ¼ cðnÞdij. Therefore, the unit transportation cost ofrouting commodity k along the path (o(k), i, j,d(k)) is given byF1

ijkðnÞ ¼ vc1oðkÞiðnÞ þ sc1

ijðnÞ þ dc1jdðkÞðnÞ

� �. The UHL–SDC can be sta-

ted as:

minimizeXi2H

fizi þ En

Xk2K

Xi2H

Xj2H

WkF1ijkðnÞ

� �xijkðnÞ

" #subject to ð6Þ—ð9Þ:

We now show that the UHL–SDC can also be represented by adeterministic problem in which the transportation scale cost c(n)is replaced by its expectation.

Proposition 2. The stochastic program UHL–SDC is equivalent to theexpected value problem:

minimizeXi2H

fizi þXi2H

Xj2H

Xk2K

WkEn F1ijkðnÞ

h i� �xijk ð12Þ


Proof. The proof uses the same idea as that of Proposition 1. Givena first-stage vector z, the second-stage term of the objective func-tion can be separated into jKj independent subproblems. Given that

F1ijkðnÞ ¼ vcðnÞdoðkÞi þ scðnÞdij þ dcðnÞdjdðkÞ

� �¼ cðnÞðvdoðkÞi þ sdij þ ddjdðkÞÞ;

the optimal route for sending each commodity is the same regard-less of the actual value of c(n). Let x(z) be the optimal solution vec-tor associated to a first-stage solution z. Then (11) follows and thesecond-stage expectation of the objective function can be expressedas

En

Xi2H

Xj2H

Xk2K

WkðnÞF1ijk

� �xijkðzÞ

" #¼Xi2H

Xj2H

Xk2K

WkEn F1ijkðnÞ

h i� �xijk:

The UHL–SDC is thus equivalent to the deterministic problem (12),(2)–(5). h


2.3. Case C: independent stochastic transportation costs

We now consider the UHL-SIC in which transportation costs Fijk

are stochastic. It is assumed that the random variables dij(n), i, j 2 Q,are independent and described by known probability distributions,and that for any S # Q � Q, the distribution function of

Pði;jÞ2SdijðnÞ

can be computed. In this case, transportation costs between nodesmay no longer satisfy the triangle inequality (TI). In most applica-tions, the actual costs between nodes will satisfy the TI becauseeven if the costs on the arcs of the original (or physical) networkare random and may violate it, the costs on the arcs of an auxiliary(or virtual) network are computed through the solution of shortestpaths on the original network and are thus guaranteed to satisfythe TI. This, however, assumes that the decision-maker has thecapability of recomputing the shortest paths after the values ofthe random variables become known. In practice this is not alwaysfeasible. In transportation networks, for example, it may beimpractical to divert a vehicle when traffic conditions deteriorateor to break an agreement with a transportation service providerafter a rate change. In such situations, the applicable costs maytemporarily violate the TI.

The main implication of the TI assumption, in classical deter-ministic hub location models, is that it ensures that each commod-ity will be routed through at most two hub nodes. Consequently,we assume here that for each commodity k the path from the ori-gin to the destination will have at least one and at most two hubnodes. This common assumption (see, e.g., Marın et al., 2006;Contreras et al., 2010b) is also consistent with practice. In groundtransportation, for example, it is convenient to restrict each routeto use at most two hub nodes so as to reduce handling and conges-tion in the hub facilities. In air transportation, it is also unusual fora passenger to connect at more than two hub nodes.

The unit transportation cost between nodes i and j isc2

ijðnÞ ¼ cdijðnÞ. Therefore, the unit cost of routing commodity kalong the path (o(k), i, j,d(k)) is given by F2

ijkðnÞ ¼ vc2oðkÞiðnÞþ

�sc2

ijðnÞ þ dc2jdðkÞðnÞÞ. The UHL-SIC can be stated as

minimizeXi2H

fizi þ En

Xk2K

Xi2H

Xj2H

WkF2ijkðnÞ

� �xijkðnÞ

" #subject to ð6Þ—ð9Þ:

Observe that, in the case of the UHL-SIC, we cannot replaceF2

ijkðnÞ variables by their expected value to obtain an equivalentdeterministic problem because, contrary to the previous cases,the optimal solution of the second-stage depends on the particularrealization of the random vector n.

We can extend some properties of optimal solutions of theUHLPMA to perform preprocessing on the UHL-SIC (see Contreraset al., 2010a). These properties lead to a more compact formulationwith fewer variables, but with the same number of constraints. Inparticular, we define a hub edge as a set e 2 E, where E is the set ofsubsets of H containing one or two hubs. We denote e as {e1,e2} ifjej = 2 and as {e1} if jej = 1, i.e., e is a loop. It is known that approx-imately half of the xijk variables can be eliminated by using undi-rected transportation costs for every hub edge (Hamacher et al.,2004). For each e 2 E and k 2 K, we define the transportation costF2

ekðnÞ ¼min F2ijkðnÞ; F

2jikðnÞ

n oif e = {i, j}, and F2

ekðnÞ ¼ F2iikðnÞ if e = {i}.

Moreover, it can be shown that in any optimal UHLPMA solution,no commodity k will be routed through a hub edge e containingtwo different hubs whenever it is cheaper to route it through onlyone of them (Marın et al., 2006). Therefore, we can define the setEk(n) of candidate hub edges for each commodity k 2 K and forevery possible realization of n as

EkðnÞ ¼feje 2 E; jej ¼ 1g

SAkðnÞ; if oðkÞ – dðkÞ;

feje 2 E; jej ¼ 1g; otherwise;

�

where

AkðnÞ ¼ e : e 2 E; jej ¼ 2 and F2ekðnÞ < min F2

fe1gkðnÞ; F2fe2gkðnÞ

n o� �n o:

Using these sets of edges associated with each commodity, we canrestate the routing variables xek(n), k 2 K and e 2 Ek(n), to take thevalue 1 if and only if commodity k is routed using hub edge e forrealization n. We can thus obtain a valid formulation for the UHL–SIC as follows:

minimizeXi2H

fizi þ En

Xk2K

Xe2EkðnÞ

WkF2ekðnÞ

� �xekðnÞ

" #ð13Þ

subject toXe2Ek

xekðnÞ ¼ 1 k 2 K; n 2 N; ð14ÞXe2EkðnÞ:i2e

xekðnÞ 6 zi i 2 H; k 2 K; n 2 N ð15Þ

xekðnÞP 0 k 2 K; n 2 N; e 2 EkðnÞ; ð16Þz 2 BjHj: ð17Þ

3. Algorithm for the UHL-SIC

We now present an algorithm for the UHL-SIC. The methodol-ogy incorporates a sampling technique, known as the SAA scheme(see Shapiro and Homem-De-Mello, 1998; Mak et al., 1999; Kle-ywegt et al., 2001), coupled with a Benders decomposition method(see Benders, 1962). The idea of the SAA scheme is to generate arandom sample and approximate the expected value function bythe corresponding sample average function. The associated deter-ministic sample average optimization problem is then solved toobtain a solution of UHL-SIC, and the procedure is repeated. TheSAA scheme not only generates high quality solutions when solv-ing the sample average problems, but is also able to produce a sta-tistical estimation of their optimality gap. The SAA scheme haspreviously been applied to obtain high quality solutions to stochas-tic supply chain design problems with a very large number of sce-narios (see,e.g., Santoso et al., 2005; Schütz et al., 2009).

3.1. Sample average approximation

The main difficulty in solving the stochastic problem (13)–(17)lies in the evaluation of the expected value of the objective func-tion. In the SAA scheme, a random sample N = {ni, . . . ,njNj} of real-izations of the random vector n is generated, and the second-stage expectation

En

Xk2K

Xe2EkðnÞ

WkF2ekðnÞ

� �xekðnÞ

" #

is approximated by the sample average function

1jNj

Xn2N

Xk2K

Xe2Ekn

bF eknxekn;

where bF ekn ¼WkF2ekn, and Ekn denote the set of hub edges associated

with commodity k 2 K and sample n 2 N. Therefore, the originalproblem (13)–(17) is approximated by the SAA problem

minimizeXi2H

fizi þ1jNj

Xn2N

Xk2K

Xe2Ekn

bF eknxekn ð18Þ

subject toXe2Ekn

xekn ¼ 1 n 2 N; k 2 K; ð19ÞXe2Ekn :i2e

xekn 6 zi n 2 N; i 2 H; k 2 K; ð20Þ

xekn P 0 n 2 N; k 2 K; e 2 Ekn; ð21Þz 2 BjHj: ð22Þ


Let vN and zN be the optimal solution value and an optimal solu-tion, respectively, of the SAA problem (18)–(22). Observe that for aparticular realization ni, . . . ,njNj of the random sample, the problem(18)–(22) is deterministic and can be solved by integer program-ming techniques.

It can be shown that under mild regularity conditions, vN and zN

converge with probability one to their true counterparts, as thesample size jNj increases, and furthermore zN converges to an opti-mal solution of the true problem with probability approaching oneexponentially fast (see Kleywegt et al., 2001). It is possible to esti-mate the sample size jNj needed to generate an e-optimal solutionto the original problem, assuming that the SAA problem is solvedwithin an absolute optimality gap d P 0, with probability at leastequal to 1 � a whenever

jNjP 3r2max

ðe� dÞ2log

jBjHjja

� �; ð23Þ

where e > d and a 2 (0,1). Here r2max is a maximal variance of certain

function differences (see Kleywegt et al., 2001 for details). It isknown that the sample size estimate (23) is too conservative forpractical applications. However, one can choose a sample size jNjas a trade-off between the quality of the associated optimal solutionof the SAA problem (18)–(22) and the computational burdenneeded to solve it. Therefore, instead of solving one large-scaleSAA problem, the SAA algorithm involves the generation of a setof independent samples and the solution of the correspondingsmaller SAA problems (18)–(22). Using the optimal solution valuesfrom these SAA problems, we compute statistical lower and upperbounds for the optimal solution value of the original problem(13)–(17). Finally, statistical confidence intervals for the optimalitygap are computed, by using these statistical bounds, to evaluate thequality of the approximate solutions. We now describe this proce-dure. Later, during the computational experiments we will showhow to select a sample size capable of producing tight and accuratestatistical bounds.

1. Generate a set M = {N1, . . . ,NjMj} of independent samples, each ofsize jNj, i.e., ni

j; . . . ; njNjj for j 2M. For each sample Nj solve the cor-responding SAA problem

minimizeXi2H

fizi þ1jNj

Xn2Nj

Xk2K

Xe2Ekn

bF eknxekn


Let vNj and zNj ; j 2 M, be the corresponding optimal objective va-lue and an optimal solution, respectively.

2. Compute the average of all optimal solution values from theSAA problems and their variance:

lNM ¼

1jMj

Xj2M

vNj ;

r2lN

M¼ 1ðjMj � 1ÞjMj

Xj2M

vNj � lNM

� �2:

It is known that the average lNM provides a statistical lower

bound for the optimal value of the original problem (13)–(17)(Norkin et al., 1998; Mak et al., 1999) and r2

lNM

is an estimateof the variance of this estimator.

3. Choose a feasible solution z 2 BjHj of the original problem, forinstance, one of the previously obtained solutions zNj . Using thissolution, it is possible to estimate the optimal solution value v⁄

of the original problem (13)–(17) as follows:

vN0 ðzÞ ¼Xi2H

fizi þ1jN0j

Xn2N0

Xk2K

Xe2Ekn

bF eknxekn;

where ni; . . . ; nN0 is a sample of size N0

generated independentlyof the samples employed in the SAA problems. Given that thefirst-stage variables are fixed, one can take jN0jmuch larger thanthe sample size jNj used in the SAA problems. Note that vðzÞ is anestimate on the upper bound on the optimal solution value v⁄ ofthe original problem. The variance of this estimate can be com-puted as

r2N0 ðzÞ ¼

1ðjN0j � 1ÞjN0j

�Xn2N0

Xi2H

fizi þXk2K

Xe2Ekn

bF eknxekn � vðzÞ !2

:

4. Compute an estimate of the absolute optimality gap of solutionz and its variance by using the lower and upper bound esti-mates on the optimal solution value of the original problem(13)–(17) obtained in Steps 2 and 3.

gapN;M;N0 ðzÞ ¼ vN0 ðzÞ � lNM;

r2gap ¼ r2

lNMþ r2

N0 ðzÞ:

Using these estimators, we can construct a confidence intervalfor the optimality gap.

3.2. Benders decomposition for the SAA problem

The most computationally expensive part of the SAA algorithmis the solution of jMj two-stage stochastic integer problems (18)–(22) involving jNj scenarios in Step 1. Even though problem (18)–(22) has far fewer scenarios than the original problem (13)–(17)it is still a very large MIP problem that cannot be efficiently solvedby means of a general-purpose solver, even for small size instances.Observe that for any feasible solution of zi variables with at leastone open hub, problem (18)–(22) decomposes into jKj � jNj inde-pendent semi-assignment subproblems. Therefore, for given valuesof zi variables that represent the location of hub nodes, the result-ing subproblems are relatively easy to solve and involve only therouting variables xekn. This observation indicates that a suitablemethod for the solution of problem (18)–(22) would be one thatwould iteratively adjust the values of the zi variables until optimal-ity is reached or a good solution is found. This is our motivation forusing Benders decomposition.

Benders decomposition is a well-known partitioning algorithmthat separates an original MIP problem into two simpler ones, aninteger master problem and a linear subproblem, and reformulatesit into a problem with fewer variables but an exponential numberof constraints. Because most of these constraints are not active foroptimal solutions, an iterative method can be used to generate onlya small subset of necessary constraints. In this section, we will de-scribe how an efficient algorithm can be derived from the Bendersreformulation to solve the SAA problem to optimality.

3.2.1. Benders reformulationFor any fixed vector z 2 BjHj, the primal subproblem (PS) in the

space of the xekn variables is

vðzÞ ¼minimize1jNj

Xn2N

Xk2K

Xe2Ekn

bF eknxekn

subject to ð19Þ; ð22ÞXe2Ekn :i2e

xekn 6 zi n 2 N; i 2 H; k 2 K: ð24Þ

Let akn and uikn be the dual variables associated with constraints(19) and (24), respectively. The dual subproblem (DS), which is thedual of PS, can be stated as follows:


maximizeXk2K

Xn2N

akn �Xi2H

Xk2K

Xn2N

ziuikn

subject to akn � ue1kn � ue2kn 6bF ekn n 2 N; k 2 K; e 2 Ekn; jej ¼ 2;

akn � ue1kn 6bF ekn n 2 N; k 2 K; e 2 Ekn; jej ¼ 1;

uikn P 0 n 2 N; k 2 K; i 2 H:

We recall that e1 and e2 denote the elements of hub edge e2 Ekn. Let D denote the set of feasible solutions of DS and let PD de-note the set of extreme points of D. Observe that D is not modifiedwhen changing z and, because Fekn P 0 for each e 2 Ek and k 2 K,the null vector 0 is always a solution to DS. Hence, because ofstrong duality, either the primal subproblem is feasible andbounded, or it is infeasible. We are thus interested in z vectors thatgive rise to primal subproblems of the former case. It is known thatif there exists at least one open hub facility, the primal and dualsubproblems are always feasible and bounded (see Contreraset al., 2010a). Therefore, it follows that the dual objective functionvalue is equal to

maxða;uÞ2PD

Xk2K

Xn2N

akn �Xi2H

Xk2K

Xn2N

ziuikn:

Introducing an extra variable g for the overall transportation cost,we can formulate the Benders master problem (MP) as follows:

minimizeXi2H

fizi þ g

subject to g PXk2K

Xn2N

akn �Xi2H

Xk2K

Xn2N

uiknzi 8ða;uÞ 2 PD; ð25ÞXi2H

zi P 1; ð26Þ

z 2 BjHj: ð27Þ

Observe that Benders feasibility cuts associated with the extremerays of D are not necessary in the Benders reformulation becausethe feasibility of PS is ensured by constraints (26). We have thustransformed problem (18)–(22) into an equivalent MIP problemwith jHj binary variables and one continuous variable. Nevertheless,the above Benders reformulation contains an exponential numberof constraints and must be tackled by an adequate cutting plane ap-proach. Thus, we iteratively solve relaxed master problems contain-ing a small subset of constraints (25) associated with the extremepoints of PD, and we keep adding these as needed by solving dualsubproblems until an optimal solution to the original problem isobtained.

It is known that the number of cuts required to obtain an opti-mal solution can be reduced given that the subproblem is decom-posable into jKj � jNj independent subproblems (see, e.g. Birge andLouveaux, 1988). We could in principle generate optimality cutsassociated to extreme points of each dual polyhedron of thejKj � jNj subproblems. However, this approach may not be so effec-tive given the large number of cuts to be added at each iteration. Infact, preliminary computational experiments have shown that thebest Benders reformulation, in terms of required CPU times, is theone obtained by aggregating the information obtained to generatea set of optimality cuts associated with subsets of commodities.This reformulation has also been the most effective for solvingthe UHLPMA (Contreras et al., 2010a). In particular, for each nodei 2 H, let Ki � K be the subset of commodities whose origin nodeis i. We can separate the subproblem into jHj independent sub-problems, one for each node. Hence, we consider the dual polyhe-dra of these jHj subproblems and generate cuts from them. Let PD

be the set of extreme points of the dual polyhedron PDi associatedwith subproblem i. We thus obtain the following Bendersreformulation:

minimizeXi2H

fizi þXi2H

gi

subject to ð26Þ; ð27Þgi P

Xk2Ki

Xn2N

atkn �

Xi2H

Xk2Ki

Xn2N

utiknzi 8i 2 H; ða;uÞ 2 PDi :

Using this reformulation, only jHj potential optimality cuts willbe generated when solving the subproblem, instead of jKj � jNj cutsneeded for the full separability into jKj � jNj dual subproblems.

3.2.2. Benders decomposition algorithmLet ub denote an upper bound on the optimal solution value and

let t represent the current iteration number. Let PtD denote the re-

stricted set of extreme points of D at iteration t, MP PtD

� �the relaxed

master problem obtained by replacing PD by PtD in MP, and

vðMP PtD

� �Þ its optimal solution value. Also, let zt be an optimal

solution vector of MP PtD

� �, DS (zt) the dual subproblem for zt, and

v(DS(zt)) its optimal solution value. A pseudo-code of the Bendersdecomposition algorithm is provided in Algorithm 1.

Algorithm 1: Benders decomposition

ub 1, t 0Pt

D £

terminate falsewhile (terminate = false) do

Solve MPðPtDÞ to obtain zt

if ðvðMPðPtDÞÞ ¼ ubÞ then

terminate trueelse

Solve DS (zt) to obtain (at,ut) 2 PD

Ptþ1D Pt

D [ ðat ;utÞf gif vðDSðztÞÞ þ

Pi2Hfizt

i < ub� �

thenub vðDSðztÞÞ þ

Pi2Hfizt

i

end ifend ift t + 1

end while

Whenever the problem defined by (18)–(22) is feasible, Algo-rithm 1 will yield an optimal solution. We use the algorithm de-scribed in Contreras et al. (2010a) for efficiently solving DS (zt)and obtain good optimality cuts at each iteration of the algorithm.

4. Computational experiments

In this section we present the results of computational experi-ments performed to assess the behaviour of the proposed solutionmethod. We first focus on some implementation details concerningthe practical convergence of the SAA scheme, we then compare theeffects of uncertainty under different continuous probability distri-butions and different levels of uncertainty on the optimal solutionsof the UHL-SIC, and finally we test the robustness and limitationsof our method on instances involving up to 50 nodes. All algo-rithms were coded in C and run on a Dell Studio PC with an IntelCore 2 Quad processor Q8200 running at 2.33 GHz and 8 GB ofRAM under a Linux environment. The master problems of all ver-sions of the algorithm to solve the SAA problems were solved usingthe callable library of CPLEX 10.1.

We have generated a set of benchmark instances using the pro-cedure described in Contreras et al. (2010a). These instances con-sider different levels of magnitude for the amount of floworiginating at a given node to obtain three different sets of nodes:low-level (LL) nodes, medium-level (ML) nodes, and high-level


(HL) nodes. The total outgoing flow of LL, ML and HL nodes lies inthe interval [1,10], [10,100], and [100,1000], respectively. Usingthese nodes, different classes of instances can be generated. Wegenerate instances of the class Set I (see for details Contreraset al., 2010a), in which the number of HL, ML, and LL nodes is2%, 38% and 60% of the total number of nodes, respectively. Wegenerate instances with jHj = 10, 20, 30, 40, and 50 and assumegamma and normal distributions for the uncertain transportationcosts. We set the mean transportation costs cij between node iand j, for all i, j 2 H, equal to the Euclidean distance and the stan-dard deviation equal to rij = m � cij, where m is the coefficient of var-iation. The non-negativity of the transportation costs is naturallypreserved in the gamma distribution, whereas the normal distribu-tion has to be truncated to avoid negative values.

4.1. Practical convergence of SAA algorithm

The aim of the first part of the computational experiments is toanalyze the practical convergence of the SAA scheme. Recall thatduring the SAA procedure, we need to generate jMj independentsamples of size jNj. In order to select the proper sizes jNj and jMj,some trade-offs need to be made. On the one hand, the objectivefunction of the SAA problem tends to be a more accurate estimateof the objective function as we increase jNj. This implies that anoptimal solution of the SAA problem tends to be a better solutionand the corresponding estimate on the optimality gap tends tobe tighter as jNj increase. On the other hand, the computationalcomplexity for the solution of the SAA problem increases at leastlinearly in jNj and frequently exponentially. Therefore, it may bemore efficient to choose a smaller sample of size jNj and to increasejMj. We are interested in finding jNj and jMj such that the estimatedoptimality gap and the variance of the gap estimator are suffi-ciently small while keeping the required CPU time at minimum.

We test the practical convergence of the SAA algorithm by usingdifferent combinations of sample sizes jNj 2 {20,50,100,500,1000}and jMj 2 {10,20,40,60,80,100}. To perform this analysis, we selecttwo 10-node instances of Set I having both s = 0.2 and m = 0.5, butwith different probability distributions for the uncertain transpor-tation costs. For each of the optimal solutions obtained in the SAAproblems, we use a sample size of jN0j = 100,000 to obtain goodestimations of the optimal solution value of the true problem(13)–(17). Fig. 1A and B plot the estimated optimality gap for dif-ferent sample sizes jNj and jMj when considering the normal andgamma distributions for the uncertain transportation costs,respectively.

For the normal distribution (Fig. 1A), we observe that by choos-ing a small sample size jNj 6 100, the optimality gap remainsabove 0.5% even if we increase the sample size jMj. We can also

-0.5

0

0.5

1

1.5

2

2.5

3

|M|=10 |M|=20 |M|=40 |M|=60 |M|=80 |M|=100

Estim

ated

% g

ap

Number of SAA problems (M)

|N|=20

|N|=50

|N|=100

|N|=500

|N|=1000

(A) Normal distribution

Fig. 1. Optimality gap for a 10-node instan

appreciate that a higher number of SAA problems need to be solvedin order to obtain a more accurate estimation of the optimality gapcorresponding to a particular value of the sample size jNj. In con-trast, when using a larger sample size such as jNj = 500 andjNj = 1000, only a small number of SAA problems are required toobtain more accurate optimality gaps below 0.1%. For the gammadistribution (Fig. 1B), it is necessary to increase the number ofSAA problems to solve in order to obtain accurate optimality gapswhen using small samples jNj 6 100. Observe that if jNj = 20, theestimated optimality gap is �0.1% if jMj = 10, whereas a more accu-rate estimation of 0.8% is obtained when increasing the number ofSAA problems to jMj = 100. Similar results are obtained for the caseof jNj = 50. However, when using a larger sample jNjP 500, only asmall number of SAA problems are needed to obtain an accurateoptimality gap below 0.1%.

Fig. 2A and B plot the estimated standard deviation for the opti-mality gap for different sample sizes of jNj and jMj, with normaland gamma distributions, respectively. As expected, we observethat as the sample sizes jNj and jMj increase, the correspondingstandard deviation for the optimality gap is considerably reducedfor both the normal and gamma distributions. However, when con-sidering large sample sizes jNjP 500 the standard deviation is suf-ficiently small, even if a small number of SAA problems are solved.In contrast, with sample sizes jNj 6 50 a larger number of SAAproblems are needed to reduce the variability.

Fig. 3A and B plot the total CPU time required for the SAA algo-rithm for different sample sizes jNj and jMj with the normal andgamma distributions, respectively. From these figures, we observethat the computational complexity for solving the SAA problemsseems to increase only linearly in jNj in both cases. For instance,the required CPU time in the case of the normal distribution,jNj = 500 and jMj = 80 is 80 seconds whereas the CPU time in-creases to 140 seconds when jNj = 1000. The results of the previousexperiments indicate that it seems better to increase the samplesize jNj rather than the number of SAA problems when applyingthe SAA algorithm to the UHL-SIC. Furthermore, our results alsoindicate that when using a large sample size jNj, only a small num-ber of SAA problems are required to obtain tight optimality gapshaving a small deviation. For these reasons, during the rest of thecomputational experiments we use sample sizes jNj = 1000 andjMj = 20.

4.2. Effects of uncertainty on optimal solutions

The second part of the experiments is devoted to studying theeffects of introducing uncertainty in the transportation costs ofthe UHLPMA. We use the coefficient of variation as a controlparameter to introduce different levels of uncertainty into the

-0.2-0.1

00.10.20.30.40.50.60.70.80.9

|M|=10 |M|=20 |M|=40 |M|=60 |M|=80 |M|=100

Estim

ated

% g

ap


|N|=20

|N|=50

|N|=100

|N|=500

|N|=1000

(B) Gamma distribution

ce with different values of jNj and jMj.

0

50

100

150

200

250

|M|=10 |M|=20 |M|=40 |M|=60 |M|=80 |M|=100

Stan

dard

dev

iatio

n fo

r % g

ap


|N|=20

|N|=50

|N|=100

|N|=500

|N|=1000

0

20

40

60

80

100

120

140

160

|M|=10 |M|=20 |M|=40 |M|=60 |M|=80 |M|=100

Stan

dard

dev

iatio

n fo

r % g

ap


|N|=20

|N|=50

|N|=100

|N|=500

|N|=1000

(A) Normaldistribution (B) Gamma distribution

Fig. 2. Standard deviation for the optimality gap for a 10-node instance with different values of jNj and jMj.

0

20

40

60

80

100

120

140

160

180

|N|=20 |N|=50 |N|=100 |N|=500 |N|=1000

SAA

tim

e (s

econ

ds)

Scenarios in each SAA problem (N)

|M|=10

|M|=20

|M|=40

|M|=60

|M|=80

|M|=100

0

20

40

60

80

100

120

140

160

180

|N|=20 |N|=50 |N|=100 |N|=500 |N|=1000

SAA

tim

e (s

econ

ds)

Scenarios in each SAA problem (N)

|M|=10

|M|=20

|M|=40

|M|=60

|M|=80

|M|=100

(A) Normal distribution (B) Gamma distribution

Fig. 3. Total CPU time for a 10-node instance with different values of jNj and jMj.

Table 1Comparison for a 10-node instance of Set I using a normal distribution.

m Objective value Lower bound Optimality % gap CI for SAA % gap at 95% Time (s) Best solution

EVP SAA SAA EVP SAA

0.00 25483.75 – – – – – 0.14 1, 5, 80.25 25333.91 25224.59 25214.35 0.47 0.04 (�0.13,0.21) 28.68 1, 50.50 24976.67 24255.89 24244.58 2.93 0.05 (�0.21,0.30) 34.29 5, 7, 90.75 25182.79 23859.95 23860.72 5.25 0.00 (�0.27,0.26) 38.56 5, 7, 91.00 25973.57 24344.77 24276.20 6.53 0.28 (0.00,0.56) 56.41 5, 7, 92.00 31434.09 28188.67 28161.16 10.41 0.10 (�0.21,0.41) 76.38 4, 5, 7, 9


model. In particular, we consider m 2 {0.25,0.5,0.75,1.0,2.0} to rep-resent a wide range of situations for the amount of uncertainty inthe transportation costs. We compare the stochastic solutions ofUHL-SIC obtained with the SAA algorithm to those of the optimalsolution of EVP.

We first study the effects of uncertainty by using the two10-node instances of Set I used in the previous experiments. Thecomputational results for the normal and gamma distributionsare summarized in Tables 1 and 2, respectively. The first columnprovides the coefficient of variation m whereas the next two col-umns give the estimated objective value relative to the EVP andthe best solution provided by the SAA algorithm, respectively.The Lower bound SAA column provides the statistical lower boundobtained with the SAA algorithm, i.e. the average lN

M . The next twocolumns under the heading Estimated % gap provide the percentoptimality gap relative to the EVP and the best solution providedby the SAA algorithm, respectively. We recall that the gaps are

computed with respect to the lower bound obtained by the SAAalgorithm. The next column gives the 95% confidence interval forthe optimality gap of the best solution obtained by the SAA algo-rithm. We assume that gapN;M;N0 ðzÞ is normally distributed withvariance r2

gap. The next column under the heading Time (sec) givesthe total time in seconds for required for the SAA algorithm. Thelast column provides the best found solution for the SAA algorithm.The first row with m = 0.00 corresponds to the optimal EVP solution.

The results presented in Table 1 show that, for the consideredinstance, the optimal EVP solution does not remain optimal whenuncertainty is introduced in the transportation costs. The SAA algo-rithm yields in all cases a better solution having a smaller optimal-ity gap than that of the EVP. Moreover, this difference becomesmore important when the variability in the uncertain transporta-tion costs increases. For a low variability level m = 0.25, the optimal(or best known) set of hubs is reduced to only two hubs by elimi-nating hub node 8. However, for medium variability levels m = 0.50,

Table 2Comparison for a 10-node instance of Set I using a gamma distribution.


EVP SAA SAA EVP SAA

0.00 25483.75 – – – – – 0.14 1, 5, 80.25 24543.10 24152.47 24130.73 1.68 0.09 (�0.12,0.30) 34.94 1, 50.50 23479.64 22851.23 22852.34 2.67 0.00 (�0.23,0.22) 39.76 1, 5, 70.75 22573.51 21595.95 21598.96 4.32 �0.01 (�0.29,0.26) 42.01 1, 5, 71.00 21795.88 20584.52 20564.26 5.65 0.10 (�0.20,0.39) 47.05 1, 5, 72.00 19525.83 17846.37 17827.08 8.70 0.11 (�0.23,0.45) 60.35 1, 5, 7

Table 3Comparison for a 10-node instance of AP using a normal distribution.


EVP SAA SAA EVP SAA

0.00 189568.81 – – – – – 0.12 1, 4, 70.25 189503.05 189503.05 189498.61 0.00 0.00 (�0.24,0.24) 27.82 1, 4, 70.50 189427.06 189427.06 189423.11 0.00 0.00 (�0.24,0.24) 28.01 1, 4, 70.75 189332.76 189332.76 189342.71 �0.01 �0.01 (�0.25,0.23) 29.11 1, 4, 71.00 189222.63 189222.63 189216.51 0.00 0.00 (�0.24,0.24) 28.65 1, 4, 72.00 188847.70 188847.70 188844.77 0.00 0.00 (�0.24,0.24) 28.33 1, 4, 7

Table 4Comparison for a 10-node instance of AP using a gamma distribution.


EVP SAA SAA EVP SAA

0.00 189568.81 – – – – – 0.12 1, 4, 70.25 189497.02 189497.02 189491.78 0.00 0.00 (�0.24,0.24) 27.62 1, 4, 70.50 189128.30 189128.30 189118.00 0.01 0.01 (�0.24,0.25) 27.70 1, 4, 70.75 188511.60 188511.60 188493.37 0.01 0.01 (�0.25,0.24) 29.74 1, 4, 71.00 187621.25 187621.25 187584.65 0.02 0.02 (�0.22,0.26) 32.51 1, 4, 72.00 183165.52 183085.70 182979.98 0.10 0.06 (�0.12,0.24) 46.57 4, 7


0.75 and 1.0, the set changes to be the hub nodes 5, 7 and 9. In thecase of a high variability level m = 2.0, the cardinality of the set in-creases by one. Note that node 5 is the only hub node that appearsin all considered cases of uncertainty.

Similar observations can be drawn from Table 2 for the gammadistribution. When m = 0.25, the best set of hub nodes is the sameas for the normal distribution. However, for higher levels of vari-ability m P 0.5, the set of hubs becomes {1,5,7} . Note that nodes1 and 5 are always selected as hub nodes in all considered casesof uncertainty.

We have also used a 10-node instance from the well-known AP(Australian Post) set of instances to study the effects of uncer-tainty. This data set is the most commonly used in the hub locationliterature (mscmga.ms.ic.ac.uk/jeb/orlib/phubinfo.html). Transporta-tion costs are proportional to the Euclidean distances eij between200 cities in Australia and the values of Wk represent postal flowsbetween pairs of cities. From this set of instances, we select the 10-node instances with set-up costs of the type loose (L), inter-hubdiscount cost s = 0.2, and compute the expected transportationcost as dij = TC � eij for each pair (i, j) 2 H � H, where TC = 3 is a scal-ing parameter for the transportation costs (see for details Contreraset al., 2010a). The computational results for the normal and gammadistributions are summarized in Tables 3 and 4, respectively.

The results presented in Table 3 show that the optimal EVPsolution remains the best (provably optimal) solution when uncer-tainty is introduced in the transportation costs, even for the largestconsidered level of variability m = 2.0. Similar results can be drawnfrom Table 4 for the gamma distribution. Only for the highest level

of variability m = 2.0, is the SAA algorithm capable of finding aslightly better solution than the optimal EVP solution. Additionalcomputational experiment have shown that similar results are ob-tained for the majority of AP data instances with up to 50 nodes.This situation may be partially explained by the fact that the AP in-stances are known to have a very peculiar flow structure. In partic-ular, it is known that the amount of flow originating at each node ishighly variable in every instance of this set: all instances have avery small number of nodes for which the outgoing flow is muchlarger than for the other nodes (see Contreras et al., 2010a). Thissituation seems to bias the optimal location of the hubs since veryfew nodes have a large impact on the overall cost of the networkand thus greatly influence the hub location decisions. Therefore,even if we introduce a high variability in the uncertain transporta-tion costs, the optimal EVP solution is still the best one.

4.3. Solving medium-size instances

In the last part of the computational experiments, we analyzeand evaluate the performance of the SAA algorithm on medium-size instances with up to 50 nodes. The computational results forthe normal and gamma distributions are summarized in Tables 5and 6, respectively. The first three columns provide the numberof nodes, the discount factor and the coefficient of variation. Theother columns have the same meaning as in the previous tables.

The results of Table 5 confirm the efficiency of the SAA algo-rithm for the UHL-SIC. The estimated optimality gaps of the bestsolutions provided by the SAA algorithm are always below 0.1%,

Table 5Computational results for 20 instances of Set I using a normal distribution.

Instance Objective value LB % Gap CI for SAA Time Best solution

jHj s m EVP SAA SAA EVP SAA % gap SAA EVP SAA

10 0.2 0.5 24976.67 24255.89 24244.58 2.93 0.05 (�0,21,0,30) 34.29 1, 5, 8 5, 7, 910 0.7 0.5 28220.7 28220.70 28198.95 0.08 0.08 (�0,12,0,28) 19.49 1, 5 1, 510 0.2 1.0 25973.57 24344.77 24276.20 6.53 0.28 (0,00,0,56) 56.41 1, 5, 8 5, 7, 910 0.7 1.0 29650.95 29355.77 29325.46 1.10 0.10 (�0,15,0, 36) 37.22 1, 5 1, 5, 920 0.2 0.5 41060.47 40757.68 40751.59 0.75 0.01 (�0,21,0, 24) 791.07 2, 20 2, 4, 2020 0.7 0.5 48853.55 48695.73 48693.27 0.33 0.01 (�0,15,0, 16) 374.20 2, 20 2, 1120 0.2 1.0 44758.35 41915.99 41877.31 6.44 0.09 (�0,14,0, 33) 1740.00 2, 20 2, 11, 2020 0.7 1.0 53093.10 50712.01 50752.43 4.41 �0.08 (�0,38,0, 18) 1499.68 2, 20 2, 11, 19, 2030 0.2 0.5 149425.70 148219.39 148219.39 0.81 0.00 (�0,20,0,20) 5332.18 7, 24, 28 6, 7, 2430 0.7 0.5 178453.27 175772.29 175765.89 1.51 0.00 (�0,16,0, 17) 2421.57 7, 18 6, 7, 1730 0.2 1.0 151485.62 148820.66 148849.47 1.74 �0.02 (�0,26,0, 22) 18329.18 7, 24, 28 7, 11, 17, 2430 0.7 1.0 184972.61 173460.65 173356.94 6.28 0.06 (�0,14,0, 26) 13759.99 7, 18 1, 7, 1740 0.2 0.5 112094.75 112094.75 112129.65 �0.03 �0.03 (�0,28,0, 18) 11582.06 9, 17, 20 9, 17, 2040 0.7 0.5 129931.89 125250.34 125185.79 3.65 0.05 (�0,09,0,19) 6169.20 9, 21 9, 2840 0.2 1.0 121937.39 120169.73 120208.65 1.42 �0.03 (�0.34,0.27) 52745.89 9, 17, 20 9, 20, 2840 0.7 1.0 146287.70 135070.60 135129.94 7.63 �0.04 (�0,26,0, 17) 30930.29 9, 21 9, 28, 3450 0.2 0.5 65622.75 59880.41 59925.68 8.68 �0.08 (�0,39,0, 23) 23458.24 7 43, 5050 0.7 0.5 65635.07 62501.18 62562.61 4.68 �0.10 (�0,33,0, 09) 14563.79 7 43, 5050 0.2 1.0 90642.87 72518.57 72556.89 19.95 �0.05 (�0,47,0, 28) 86400.00 7 43, 5050 0.7 1.0 95781.32 76082.17 76141.13 20.51 �0.08 (�0,55,0, 22) 86400.00 7 43, 50

Table 6Computational results for 20 instances of Set I using a gamma distribution.

Instance Objective value LB % Gap CI for SAA Time Best solution

jHj s m EVP SAA SAA EVP SAA % gap SAA EVP SAA

10 0.2 0.5 23479.64 22851.23 22852.34 2.67 0.00 (�0.23,0.22) 40.35 1, 5, 8 1, 5, 710 0.7 0.5 26434.96 26434.96 26459.96 �0.09 �0.09 (�0.36,0.11) 26.28 1, 5 1, 510 0.2 1.0 21795.88 20584.52 20564.26 5.65 0.10 (�0.20,0.39) 46.17 1, 5, 8 1, 5, 710 0.7 1.0 24197.02 23963.12 23988.48 0.86 �0.11 (�0.39,0.18) 41.88 1, 5 1, 5, 920 0.2 0.5 37999.81 37026.13 37011.09 2.60 0.04 (�0.25,0.33) 1317.00 2, 20 2, 11, 17, 2020 0.7 0.5 45747.11 44204.07 44171.33 3.44 0.07 (�0.14,0.29) 629.02 2, 20 2, 11, 1920 0.2 1.0 34784.15 32720.74 32704.24 5.98 0.05 (�0.24,0.34) 1976.93 2, 20 2, 19, 2020 0.7 1.0 42144.67 38172.47 38210.97 9.33 �0.10 (�0.37,0.17) 1532.53 2, 20 2, 11, 1930 0.2 0.5 138954.48 137273.53 137372.54 1.14 �0.07 (�0.29,0.14) 8755.38 7, 24, 28 6, 7, 2430 0.7 0.5 168325.23 160425.78 160469.86 4.67 �0.03 (�0.22,0.17) 5644.40 7, 18 7, 17, 1830 0.2 1.0 125171.77 122668.58 122737.18 1.94 �0.06 (�0.29,0.14) 20120.32 7, 24, 28 7, 17, 24, 2830 0.7 1.0 154033.43 136881.10 136962.42 11.08 �0.06 (�0.32,0.20) 14811.53 7, 18 7, 28, 3440 0.2 0.5 97487.40 96272.78 96239.99 1.28 0.03 (�0.24,0.31) 26837.41 9, 17, 20 7, 9, 2840 0.7 0.5 129931.89 125250.34 125185.79 3.65 0.05 (�0.09,0.19) 6212.16 9, 21 9, 2840 0.2 1.0 86201.89 82950.37 82876.56 3.86 0.09 (�0.26,0.24) 38808.13 9, 17, 20 7, 9, 2840 0.7 1.0 105397.52 87696.63 87772.61 16.72 �0.09 (�0.54,0.46) 42583.25 9, 21 7, 28, 3450 0.2 0.5 58777.56 43532.81 43550.68 25.91 �0.04 (�0.37,0.28) 26863.32 7 43, 5050 0.7 0.5 58797.57 45465.50 45480.25 22.25 �0.03 (�0.34,0.28) 34296.60 7 43, 5050 0.2 1.0 58737.43 37982.81 38012.08 35.28 �0.08 (�0.54,0.23) 25204.74 7 43, 5050 0.7 1.0 58784.56 39241.18 39269.24 33.20 �0.07 (�0.48,0.27) 29628.68 7 43, 50


except for one instance with 0.28%, and the upper bounds of theconfidence intervals never exceed 0.56%. Moreover, the SAA algo-rithm is able to improve the solution of the expected value prob-lem in 19 out of the 20 tested instances. Nevertheless, the CPUtime required for the SAA algorithm grows very fast when instancesize and variability increase.

Similar results can be drawn from Table 6 for the gamma distri-bution. The estimated optimality gap provided by the SAA algo-rithm is always below 0.1% and the upper bound for theconfidence interval never exceeds 0.46%.

5. Conclusions

We have introduced stochastic uncapacitated hub locationproblems in which demands or transportation costs are uncertain.We have proved that stochastic problems with uncertain demandsor dependent transportation costs are equivalent to a deterministicexpected value problem. For the uncertain independent transpor-tation costs, we have presented a solution method that integratesa sampling strategy, the SAA scheme, coupled with a Benders

decomposition algorithm to obtain solutions to problems with avery large number of scenarios. Computational results confirmthe efficiency and robustness of the proposed solution method.Benchmark instances involving up to 50 nodes were solved withinan estimated optimality gap inferior to 0.3%. Furthermore, we haveshown that the stochastic solutions obtained with the SAA algo-rithm are superior to those of the expected value problem, andthe relative difference between the solution costs of the two prob-lems tends to increase with the variance of the uncertain transpor-tation costs. An interesting avenue of research for future workconsists in addressing the problem with stochastic dependentdemands.

Acknowledgments

This work was partly funded by the Canadian Natural Sciencesand Engineering Research Council under Grants 227837-09 and39682-10. This support is gratefully acknowledged. Thanks are alsodue to the anonymous referees.


References

Alumur, S., Kara, B.Y., 2008. Network hub location problems: The state of the art.European Journal of Operational Research 190, 1–21.

Benders, J.F., 1962. Partitioning procedures for solving mixed variablesprogramming problems. Numersiche Mathematik 4, 238–252.

Birge, J.R., Louveaux, F.V., 1997. Introduction to Stochastic Programming. Springer-Verlag, New York.

Birge, J.R., Louveaux, F.V., 1988. A multicut algorithm for two-stage stochastic linearprograms. European Journal of Operational Research 34, 384–392.

Camargo, R.S., Miranda Jr., G., Luna, H.P., 2008. Benders decomposition for theuncapacitated multiple allocation hub location problem. Computers andOperations Research 35, 1047–1064.

Campbell, J.F., Ernst, A.T., Krishnamoorthy, M., 2002. Hub location problems. In:Drezner, Z., Hamacher, H.W. (Eds.), Facility Location: Applications and Theory.Springer, Heidelberg, pp. 373–408.

Cánovas, L., Garcia, S., Marın, A., 2007. Solving the uncapacitated multiple allocationhub location problem by means of a dual-ascent technique. European Journal ofOperational Research 179, 990–1007.

Contreras, I., Cordeau, J.-F., Laporte, G. 2010a. Benders Decomposition for Large-Scale Uncapacitated Hub Location. Techincal Report CIRRELT-2010-26, HECMontréal.

Contreras, I., Fernádez, E., Marın, A., 2010b. The tree of hubs location problem.European Journal of Operational Research 202, 390–400.

Hamacher, H.W., Labbé, M., Nickel, S., Sonneborn, T., 2004. Adapting polyhedralproperties from facility to hub location problems. Discrete AppliedMathematics 145, 104–116.

Kleywegt, A.J., Shapiro, A., Homem-De-Mello, T., 2001. The sample averageapproximation method for stochastic discrete optimization. SIAM Journal ofOptimization 12, 479–502.

Laporte, G., Louveaux, F.V., Van hamme, L., 1994. Exact solution to alocation problem with stochastic demands. Transportation Science 28,95–103.

Louveaux, F.V., 1986. Discrete stochastic location models. Annals of OperationsResearch 6, 23–34.

Louveaux, F.V., 1993. Stochastic location analysis. Location Science 1, 127–154.Louveaux, F.V., Peeters, D., 1992. A dual-based procedure for stochastic facility

location. Operations Research 40, 564–573.Mak, W.K., Morton, D.P., Wood, R.K., 1999. Monte-Carlo bounding techniques for

determining solution quality in stochastic programs. Operations ResearchLetters 24, 47–56.

Marianov, V., Serra, D., 2003. Location models for airline hubs behaving as M/D/cqueues. Computers and Operations Research 30, 983–1003.

Marın, A., Canovas, L., Landete, M., 2006. New formulations for the uncapacitatedmultiple allocation hub location problem. European Journal of OperationalResearch 172, 274–292.

Norkin, V.I., Pflug, G.C., Ruszczynski, A., 1998. A branch and bound method forstochastic global optimization. Mathematical Programming 83, 425–450.

Ravi, R., Sinha, A., 2006. Hedging uncertainty: Approximation algorithms forstochastic optimization problems. Mathematical Programming Series A 108,97–114.

Santoso, T., Ahmed, S., Goetschalckx, M., Shapiro, A., 2005. A stochasticprogramming approach for supply chain network design under uncertainty.European Journal of Operational Research 167, 96–115.

Schütz, P., Tomasgard, A., Ahmed, S., 2009. Supply chain design under uncertaintyusing sample average approximation and dual decomposition. European Journalof Operational Research 199, 409–419.

Shapiro, A., Homem-De-Mello, T., 1998. A simulation-based approach to stochasticprogramming with recourse. Mathematical Programming 81, 301–325.

Sim, T., Lowe, T.J., Thomas, B.W., 2009. The stochastic p-hub center problemwith service-level constraints. Computers and Operations Research 36, 3166–3177.

Snyder, L.V., 2006. Facility location under uncertainty: A review. IIE Transactions 38,537–554.

Yang, T.-H., 2009. Stochastic air freight hub location and flight routes planning.Applied Mathematical Modeling 33 (12), 4424–4430.

stochastic uncapacitated hub location

Documents