The Stochastic Inventory Routing Problem with Direct Deliveries
Anton J. Kleywegt ∗
Vijay S. Nori
Martin W. P. Savelsbergh
School of Industrial and Systems Engineering
Georgia Institute of Technology
Atlanta, GA 30332-0205
November 20, 2000
Abstract
Vendor managed inventory replenishment is a business practice in which vendors monitor their cus-
tomers’ inventories, and decide when and how much inventory should be replenished. The inventory
routing problem addresses the coordination of inventory management and transportation. The ability to
solve the inventory routing problem contributes to the realization of the potential savings in inventory
and transportation costs brought about by vendor managed inventory replenishment. The inventory
routing problem is hard, especially if a large number of customers is involved. We formulate the inven-
tory routing problem as a Markov decision process, and we propose approximation methods to find good
solutions with reasonable computational effort. Computational results are presented for the inventory
routing problem with direct deliveries.
∗Supported by the National Science Foundation under grant DMI-9875400.
The inventory routing problem (IRP) is one of the core problems that has to be solved when implementing
the emerging business practice called vendor managed inventory replenishment (VMI). VMI refers to the
situation where the replenishment of inventory at a number of locations is controlled by a central decision
maker (vendor). The central decision maker can be the supplier, and the inventory can be kept at independent
customers, or the central decision maker can be a manager responsible for inventory replenishment at a
number of warehouses or retail outlets of the same company. Often the central decision maker manages a
fleet of vehicles that make the deliveries. In this paper the central decision maker is called the supplier, and
the inventory locations are referred to as the customers.
VMI differs from conventional inventory management in the following way. In conventional inventory
management, the customers monitor their own inventory levels, and when a customer thinks that it is time
to reorder, an order for a quantity of the product is placed at the supplier. The supplier receives these orders
from the customers, prepares the product for delivery, and makes deliveries using the fleet of vehicles.
Conventional inventory management has several disadvantages. It is typical for orders not to arrive
uniformly over time. For example, one of the suppliers we worked with used to be flooded with orders on
Mondays. The conjecture was that many customers tend to check their inventory levels on Mondays, and
then place orders. The result of this nonuniform order arrival pattern is that the supplier’s resources, such as
the production and storage facilities, as well as transportation resources, cannot be utilized well over time.
For example, the supplier’s resources would be stretched to the limit on Mondays and Tuesdays, after a large
number of orders have arrived, and would be relatively idle during the rest of the week. Another related
phenomenon causes a disadvantage for the customers. Some customers place apparently urgent orders when
other customers place orders that are really urgent. Since the supplier does not know the inventory levels at
the customers, the information needed to compare the real urgency of different orders is not available. Also,
the supplier is only responsible for delivering product on order to the customer, and not for maintaining a
desirable inventory level at the customer, and hence, even if the supplier were provided with the inventory
level data, there would not be a strong incentive for the supplier to find the optimal trade-off between the
inventory needs of the different customers. Consequently really urgent orders may be delayed because of a
lack of information and incentive, and a high demand on the supplier’s resources.
In VMI, the supplier monitors the inventory at the customers. This is made possible with modern
equipment that can both measure the inventory at the customers and communicate with the supplier’s
computer. The rapidly decreasing cost of this technology has probably made a significant contribution
to the increasing popularity and success of VMI. The supplier is responsible for maintaining a desirable
inventory level at each customer, and decides which customers should be replenished at which times, and
with how much product.
To make these decisions, the supplier has the benefit of access to a lot of relevant information, such as
the current (and past) inventory levels at all the customers, the customers’ demand behavior, the customers’
locations relative to the supplier and relative to each other and the resulting transportation costs, and the
capacity and availability of vehicles and drivers for delivery.
It is thus not surprising that VMI has several advantages for the supplier over conventional inventory
management. First, VMI may lead to reduced production and inventory costs. By implementing VMI, the
supplier can usually obtain a more uniform utilization of resources. This reduces the amounts of resources
required and increases the productivity of the resources. It also reduces the amount of inventory the supplier
has to keep to achieve a desirable level of customer service. Second, VMI may reduce transportation costs
beyond the reduction achieved by a more uniform utilization of transportation capacity. By proactive
planning based on the additional available information instead of reactive response to customers’ orders as
they arrive, it may be possible to increase the frequency of low-cost full truckload shipments and decrease the
frequency of high-cost less-than-truckload shipments. Furthermore, it may be possible to use more efficient
routes by coordination of the replenishment at different customers close to each other. Third, VMI may
increase service levels, measured in terms of reliability of product availability, which is also an important
benefit for the customers. As discussed, under conventional inventory management the supplier does not
have the information to prioritize urgent orders from different customers. With VMI, the supplier does have
the information to determine which nonurgent deliveries can be postponed to accommodate urgent deliveries.
Similarly, the supplier does have the information to know which customers may receive smaller-than-usual
replenishments to enable larger-than-usual replenishments at other customers in dire need. Also, the supplier
has an incentive to find a good trade-off between the inventory needs of the different customers. Thus two
advantages of VMI for customers are more reliable product availability, and the fact that customers have
to devote fewer resources to monitoring their inventory levels and placing orders than under conventional
inventory management.
There are several requirements to obtain the potential benefits of VMI. Two important requirements are
(1) the availability of relevant, accurate, and timely data for the decision maker, and (2) the ability of the
central decision maker to use the increased amount of information to make good decisions. There have been
several successful, but also failed, implementations of VMI. Many of the failures are due to one or both of
the above requirements not being met.
Using the large amount of data obtained with VMI to make good decisions is a very complex task, as the
resulting decision problems turn out to be extremely hard. In this paper we study a core decision problem
that often has to be addressed when implementing VMI, namely the inventory routing problem, and we
propose methods for obtaining good decisions.
The inventory routing problem (IRP) addresses the coordination of inventory replenishment and trans-
portation. Specifically, we study the problem of determining optimal policies for the distribution of a single
product from a single supplier to multiple customers. For this purpose, the supplier controls a fleet of vehicles.
The demands at the customers are assumed to have probability distributions that are known to the supplier.
The objective is to maximize the expected discounted value, incorporating sales revenues, production costs,
transportation costs, inventory holding costs, and shortage penalties, over an infinite horizon.
Our work on this problem is motivated by our collaboration with a producer and distributor of air
products. The company operates several plants and produces a variety of products, such as liquid nitrogen
and oxygen. The company’s bulk customers have their own storage tanks at their sites, which are replenished
by tanker trucks under the company’s control. Most of the bulk customers participate in the company’s VMI
program. The inventory levels at the bulk customers are measured by remote telemetry units. Such a device
measures the quantity of the product in the storage tank, and is connected through a modem and the
telephone network to the company’s computer. A telemetry unit can be set to periodically measure the
inventory level and send the information to the company’s computer, and the computer can also query the
telemetry unit at any time, so that the decision maker can obtain inventory information whenever needed.
For the most part each customer and each vehicle is allocated to a specific plant, so that the overall problem
decomposes according to individual plants. Also, to improve safety and reduce contamination, each vehicle
and each storage tank at a customer is dedicated to a particular type of product. Hence the problem also
decomposes according to type of product. It seems that the most questionable assumptions are that vehicles
and drivers are available at the beginning of each day, mostly because of the unpredictability of driver
availability, and that the probability distributions of the customers’ demands are known to the supplier and
do not change over time. In practice, these probability distributions have to be estimated from data, and
the probability distributions change over time. Fortunately, in this particular case, a large amount of data is
available, and the demand characteristics of consumers do not seem to change rapidly over time. (However,
there are significant differences between demand on weekdays and weekends.)
A definition of the IRP is given in Section 1. In Section 2 research related to the IRP is reviewed.
Section 3 discusses the major computational tasks involved in solving the IRP. Section 4 presents a special
case of the IRP, namely the IRP with Direct Deliveries. In Sections 5 and 6 an approximation method for
this problem is developed. Computational results are presented in Section 7, in which the solution values of
the proposed method are compared with the optimal values for small problems, as well as with the values
of a heuristic proposed in the literature for small and medium sized problems. Further research in this area
is briefly discussed in Section 8.
1 Problem Definition
A more general description of the IRP is given in Section 1.1, after which a Markov decision process formu-
lation is given in Section 1.2.
1.1 Problem Description
A product is distributed from a supplier’s plant to N customers, using a fleet of M homogeneous vehicles,
each with known capacity CV . Each customer n has a known storage capacity Cn. The process is modeled
in discrete time t = 0, 1, . . . , and the discrete time periods are called days. Customers’ demands on different
days are independent random vectors with a joint probability distribution F that does not change with time.
The probability distribution F is known to the supplier. The supplier can measure the inventory level Xnt of each customer n at any time t. The supplier makes decisions regarding which customers’ inventories to
replenish, how much to deliver at each customer, how to combine customers into vehicle routes, and which
vehicle routes to assign to each of the M vehicles. The set of feasible decisions is determined by constraints on
the travel times and work hours of vehicles and drivers, delivery time windows at the customers, the storage
capacities and current inventory levels of customers, and other constraints dictated by the application. It
may be feasible for a vehicle to perform more than one route per day. For ease of presentation we assume
that the duration of a vehicle route is less than the length of a day, so that all vehicles and drivers are
available at the beginning of each day, when the tasks for that day are assigned.
The cost of each decision is known to the supplier. This includes the travel costs cij on the arcs (i, j) of
the distribution network, which may also depend on the amount of product transported along the arc. The
cost of a decision may include the costs incurred at customers’ sites, for example due to product losses during
delivery. If quantity dn is delivered at customer n, the supplier earns a revenue of rn(dn). Because demand is
uncertain, there is often a positive probability that a customer runs out of stock, and thus shortages cannot
always be prevented. Shortages are discouraged with a penalty pn(sn) if the unsatisfied demand at customer
n is sn. Unsatisfied demand is treated as lost demand, and is not backlogged. If the inventory at customer n
is xn at the beginning of the day, and quantity dn is delivered at customer n, then an inventory holding cost
of hn(xn + dn) is incurred. The inventory holding cost can also be modeled as a function of some average
amount of inventory at each customer during the time period. The objective is to choose a distribution
policy that maximizes the expected discounted value (revenues minus costs) over an infinite time horizon.
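The problem data just described can be gathered in a small container before the formal MDP treatment; the class and field names below are illustrative assumptions for this sketch, not notation from the paper.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Customer:
    storage_capacity: float  # C_n
    inventory: float         # current inventory level at the customer

@dataclass
class IRPInstance:
    customers: List[Customer]   # the N customers
    num_vehicles: int           # M homogeneous vehicles
    vehicle_capacity: float     # C_V
    discount_factor: float      # alpha in [0, 1)

# A tiny two-customer instance with illustrative numbers.
inst = IRPInstance(
    customers=[Customer(storage_capacity=10.0, inventory=4.0),
               Customer(storage_capacity=10.0, inventory=7.0)],
    num_vehicles=1,
    vehicle_capacity=5.0,
    discount_factor=0.98,
)
```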
1.2 Problem Formulation
We formulate the IRP as a discrete time Markov decision process with the following components:
1. The state x is the current inventory at each customer. Thus the state space X is [0, C1] × [0, C2] × · · · × [0, CN ]. Let Xnt ∈ [0, Cn] denote the inventory level at customer n at time t. Let Xt =
(X1t, . . . , XNt) ∈ X denote the state at time t.
2. The action space A(x) for each state x is the set of all decisions that satisfy the work load constraints,
such that the vehicles’ capacities are not exceeded, and the customers’ storage capacities are not
exceeded after deliveries. Let At ∈ A(Xt) denote the decision chosen at time t. For any decision
a and arc (i, j), let kij(a) denote the number of times that arc (i, j) is traversed by a vehicle while
executing decision a. Also, for any customer n, let dn(a) denote the quantity of product that is delivered
to customer n while executing decision a. The constraint that customers’ storage capacities not be
exceeded after deliveries can be expressed as Xnt + dn(At) ≤ Cn for all n and t, if it is assumed that
no product is used between the time that the inventory level Xnt is measured and the time that the
delivery of dn(At) takes place. If product is used during this time period, it may be possible to deliver
more. The exact way in which the constraint is applied does not affect the rest of the development.
We applied the constraint as stated above.
3. Let Unt denote the demand of customer n at time t. Then the amount of product used by customer
n at time t is given by min{Xnt + dn(At), Unt}. Thus the shortage at customer n at time t is
given by Snt = max{Unt − (Xnt + dn(At)), 0}, and the next inventory level at customer n at time
t + 1 is given by Xn,t+1 = max{Xnt + dn(At) − Unt, 0}. The known joint probability distribution
F of customer demands gives a known Markov transition function Q, according to which transitions
occur. For any state x ∈ X , any decision a ∈ A(x), and any (measurable) subset B ⊆ X , let
U(x, a, B) ≡ {U ∈ R^N_+ : max{xn + dn(a) − Un, 0} ∈ B, n = 1, . . . , N}.

Then Q[B | x, a] ≡ F [U(x, a, B)]. In other words, for any state x ∈ X , and any decision a ∈ A(x),

P [Xt+1 ∈ B | Xt = x, At = a] = Q[B | x, a] ≡ F [U(x, a, B)]
For discrete demand distributions, let fn(un) denote the probability that the demand of customer n is
un.
4. Let g(x, a) denote the expected single stage net reward if the process is in state x at time t, and decision
a ∈ A(x) is implemented. Then, in terms of the notation introduced above,
g(x, a) ≡ Σn rn(dn(a)) − Σ(i,j) cij kij(a) − Σn hn(xn + dn(a)) − Σn EFn[ pn( max{Un − (xn + dn(a)), 0} ) ]

where EFn denotes expected value with respect to the marginal probability distribution Fn of Un.
5. The objective is to maximize the expected total discounted value over an infinite horizon. Let α ∈ [0, 1)
denote the discount factor. Let V ∗(x) denote the optimal expected value given that the initial state is
x, i.e.,
V ∗(x) ≡ sup_{{At}∞t=0} E[ Σ_{t=0}^∞ α^t g(Xt, At) | X0 = x ]     (1)
The decisions At are restricted such that At ∈ A(Xt) for each t, and At has to depend only on the
history (X0, A0, X1, . . . , Xt) of the process up to time t, i.e., when the decision maker chooses an action
at time t, the decision maker does not know what is going to happen in the future.
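For concreteness, the dynamics of component 3 and the expected single-stage reward of component 4 can be sketched in code for discrete demands. The function arguments and the linear revenue, holding, and penalty functions in the usage example are illustrative assumptions, not the paper's.

```python
def transition(x_n, d_n, u_n):
    """Component 3 for one customer: shortage S_nt = max{U_nt - (x_n + d_n), 0}
    and next inventory X_{n,t+1} = max{x_n + d_n - U_nt, 0}."""
    level = x_n + d_n                       # inventory right after delivery
    return max(u_n - level, 0), max(level - u_n, 0)

def expected_reward(x, d, travel_cost, r, h, p, pmf):
    """Component 4: expected single-stage net reward g(x, a) for discrete
    demands. r, h, p are lists of per-customer revenue, holding, and penalty
    functions; pmf[n] maps each demand value of customer n to its probability;
    travel_cost stands in for the routing term sum_(i,j) c_ij k_ij(a)."""
    total = -travel_cost
    for n, (x_n, d_n) in enumerate(zip(x, d)):
        level = x_n + d_n
        exp_pen = sum(prob * p[n](max(u - level, 0)) for u, prob in pmf[n].items())
        total += r[n](d_n) - h[n](level) - exp_pen
    return total
```

For example, one customer holding 2 units, a delivery of 3, unit travel cost, revenue 2d, holding 0.2(x + d), penalty 10s, and demand 4 or 6 with equal probability yields an expected net reward of −1.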
A stationary deterministic policy π prescribes a decision a ∈ A(x) based on the information contained in
the current state x of the process only. For any stationary deterministic policy π, and any state x ∈ X , the
expected value V π(x) is given by
V π(x) ≡ Eπ[ Σ_{t=0}^∞ α^t g(Xt, π(Xt)) | X0 = x ]
       = g(x, π(x)) + α ∫_X V π(y) Q[dy | x, π(x)]
From the results in Bertsekas and Shreve (1978) it follows that under conditions that are not very restrictive
(e.g., g bounded and α < 1), to determine the optimal expected value in (1), it is sufficient to restrict
attention to the class Π of stationary deterministic policies. It follows that for any state x ∈ X ,
V ∗(x) = sup_{π∈Π} V π(x)
       = sup_{a∈A(x)} { g(x, a) + α ∫_X V ∗(y) Q[dy | x, a] }     (2)
A policy π∗ is called optimal if V π∗ = V ∗.
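Solving (2) by successive approximation is straightforward when the state space is tiny. The sketch below runs value iteration for a toy single-customer, direct-delivery version of the problem; the capacities, the uniform demand on {1, . . . , 10}, and the linear reward terms are illustrative assumptions, not the paper's experimental setup.

```python
def value_iteration(C=10, CV=5, alpha=0.9, iters=200):
    """Successive approximations V_{k+1}(x) = max_d { g(x, d) + alpha E[V_k(x')] }
    for one customer with inventory x in {0, ..., C}, delivery d <= min(CV, C - x),
    and demand uniform on {1, ..., 10} (all parameters illustrative)."""
    demands = range(1, 11)
    pu = 1.0 / 10.0
    def reward(x, d):
        level = x + d
        travel = 1.0 if d > 0 else 0.0               # one round trip if we deliver
        pen = sum(pu * 2.0 * max(u - level, 0) for u in demands)
        return 1.0 * d - travel - 0.1 * level - pen  # revenue - travel - holding - penalty
    V = [0.0] * (C + 1)
    for _ in range(iters):
        V = [max(reward(x, d)
                 + alpha * sum(pu * V[max(x + d - u, 0)] for u in demands)
                 for d in range(0, min(CV, C - x) + 1))
             for x in range(C + 1)]
    return V
```

Because the dynamic-programming operator is an α-contraction, the iterates converge geometrically to the unique fixed point of (2).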
2 Review of Related Research
The long-term dynamic and stochastic control problem presented above is extremely difficult to solve. As
a result, all of the proposed approaches found in the literature have simplified the problem in one way or
another. Table 1 is an attempt to categorize the variants of the inventory routing problem that have been
studied by different researchers and the contributions that they have made. A survey of some of this work can
be found in Federgruen and Simchi-Levi (1995). Thomas and Griffin (1996) review related work addressing
the coordination of various operations in the supply chain, such as production, inventory, and distribution.
The column headings in the table represent some key problem characteristics, which we briefly describe
here. Customer demands, which in most applications are not known to the decision maker before the usage
takes place (or, in conventional inventory management, before the orders are received), have been modeled
as being either deterministic or stochastic. Fleet size, i.e., the number of available vehicles, which is limited
in practice, is sometimes assumed to be unlimited to facilitate the analysis of a proposed policy. Another
key issue is the length of the planning horizon. In applications, the objective is to maximize profit over
a long period of time, and some researchers explicitly model this objective. Other researchers consider a
short horizon problem where they do not take into account what happens after the short horizon over which
they optimize the objective. Some researchers develop a reduced horizon approach in which a short horizon
problem is formulated where the costs are heuristically modified to capture what happens after the short
horizon. Another issue is the number of customers visited on a vehicle trip. In many situations vehicles can
visit multiple customers on a single route. Several researchers have also studied variants in which a single
customer is visited on each route, which is called the direct delivery case. Finally, a distinguishing feature of
research contributions is whether policies or solution methods are presented that specify when to deliver to
each customer, how much to deliver to each customer, and how to deliver to customers, or whether bounds
on the profits (or costs) are presented.
3 Solving the Markov Decision Process
To determine the optimal value function V ∗, and an optimal policy π∗, if such a policy exists, the optimality
equation (2) has to be solved. This requires the following major computational tasks to be performed.
1. Estimation of the optimal value function V ∗. Because V ∗ appears in the left hand side and right hand
side of (2), most algorithms for computing V ∗ involve the computation of successive approximations
to V ∗(x) for every x ∈ X . Clearly, this is practical only if the number of states is small. For the IRP
as formulated in Section 1.2, X may be uncountable. One can discretize X by discretizing the demand
distributions. Conditions under which the solutions obtained with the discretization of X converge to
the solution of (2) have been studied by Bertsekas (1975), Chow and Tsitsiklis (1991), and Kushner
and Dupuis (1992). Even if the demand distributions are discretized, the number of states grows
exponentially in the number of customers. For example, if Z denotes the number of inventory levels
Table 1: Characteristics of inventory routing problems considered by various researchers.
Reference  Demands  Vehicles  Horizon  Delivery  Contribution
Bell et al. (1983)  Deterministic  Limited  Long  Multiple  Policy
Federgruen and Zipkin (1984)  Stochastic  Limited  Short  Multiple  Policy
Golden, Assad and Dahl (1984)  Stochastic  Limited  Short  Multiple  Policy
Blumenfeld et al. (1985, 1991)  Deterministic  Unlimited  Long  Direct  Policy
Burns et al. (1985)  Deterministic  Unlimited  Long  Direct, Multiple  Policy
Dror, Ball and Golden (1985)  Deterministic  Limited  Short  Multiple  Policy
Dror and Ball (1987)  Stochastic  Limited  Reduced  Multiple  Policy
Cohen and Lee (1988)  Stochastic  Unlimited  Long  Direct  Policy
Benjamin (1989)  Deterministic  Unlimited  Long  Direct  Policy
Chien, Balakrishnan and Wong (1989)  Deterministic  Limited  Reduced  Multiple  Policy
Anily and Federgruen (1990)  Deterministic  Unlimited  Long  Multiple  Bound, Policy
Gallego and Simchi-Levi (1990)  Deterministic  Unlimited  Long  Direct  Bound
Trudeau and Dror (1992)  Stochastic  Limited  Reduced  Multiple  Policy
Anily and Federgruen (1993)  Stochastic  Unlimited  Long  Multiple  Bound, Policy
Chien (1993)  Stochastic  Unlimited  Long  Direct  Policy
Minkoff (1993)  Stochastic  Unlimited  Long  Multiple  Policy
Pyke and Cohen (1993, 1993)  Stochastic  Unlimited  Long  Direct  Policy
Chandra and Fisher (1994)  Deterministic  Unlimited  Long  Multiple  Policy
Bassok and Ernst (1995)  Stochastic  Unlimited  Short  Multiple  Policy
Dror and Trudeau (1996)  Stochastic  Unlimited  Long  Direct  Policy
Bard et al. (1997)  Stochastic  Limited  Reduced  Multiple  Policy
Barnes-Schuster and Bassok (1997)  Stochastic  Unlimited  Long  Direct  Bound, Policy
Jaillet et al. (1997)  Stochastic  Limited  Reduced  Multiple  Policy
Campbell et al. (1998)  Deterministic  Limited  Short  Multiple  Policy
Chan, Federgruen and Simchi-Levi (1998)  Deterministic  Unlimited  Long  Multiple  Bound, Policy
Christiansen and Nygreen (1998a, 1998b, 1999)  Deterministic  Limited  Long  Multiple  Policy
Berman and Larson (1999)  Stochastic  Unlimited  Short  Multiple  Policy
Fumero and Vercellis (1999)  Deterministic  Limited  Long  Multiple  Policy
Reiman, Rubio and Wein (1999)  Stochastic  Limited  Long  Direct, Multiple  Policy
Cetinkaya and Lee (2000)  Stochastic  Unlimited  Long  Multiple  Policy
Kleywegt, Nori and Savelsbergh (2000)  Stochastic  Limited  Long  Direct, Multiple  Policy
at each customer, then the number of states |X| = Z^N. Thus, even with discrete inventory levels, the
state space X is far too large to compute V ∗(x) for every x ∈ X in reasonable time if there are more
than about four customers.
2. Estimation of the expected value (integral) in (2). For many applications, this is a high dimensional
integral, which requires a lot of computational effort to compute accurately. In the case of the IRP,
the number of dimensions is equal to the number of customers, which can be as many as several
hundred. Conventional numerical integration methods are not practical for the computation of such
high dimensional integrals.
3. The maximization problem on the right hand side of (2) has to be solved to determine an optimal
decision for each state. This maximization problem may be easy or hard, depending on the application.
In the case of the IRP, the optimization problem on the right hand side of (2) is very hard, because
the vehicle routing problem, which is NP-hard, is a special case.
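The exponential state-space growth behind the first task is easy to quantify: with Z inventory levels per customer and N customers, |X| = Z^N. A two-line check (Z = 11 is an illustrative discretization, e.g. levels 0 through 10):

```python
Z = 11                                    # inventory levels per customer
sizes = {N: Z ** N for N in (2, 4, 6, 8)} # |X| = Z**N joint inventory states
# a modest discretization already exceeds 200 million joint states at N = 8
```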
There are several conventional algorithms for solving Markov decision processes; see for example Bert-
sekas (1995) and Puterman (1994). These algorithms are practical only if the computational tasks discussed
above are easy to perform. As mentioned, these requirements are not satisfied by practical inventory routing
problems, as the state space X is usually extremely large, the expected value is hard to compute, and the
optimization problem on the right hand side of (2) is hard to solve.
Our approach is to develop efficient dynamic programming based approximation methods to perform these
computations. The first motivation for using approximation methods is the computational complexity of the
IRP outlined above. A motivation for using specifically dynamic programming based approximation methods
is as follows. Suppose V ∗ is approximated by V̂ such that ‖V ∗ − V̂ ‖∞ ≤ ε, that is, |V ∗(x) − V̂ (x)| ≤ ε for
all x ∈ X . Choose policy π ∈ Π such that

g(x, π(x)) + α ∫_X V̂ (y) Q[dy | x, π(x)] ≥ sup_{a∈A(x)} { g(x, a) + α ∫_X V̂ (y) Q[dy | x, a] } − δ

for all x ∈ X , that is, the objective value of decision π(x) is within δ of the optimal objective value using
approximating function V̂ on the right hand side of the optimality equation (2). Then

V π(x) ≥ V ∗(x) − (2αε + δ)/(1 − α)
for all x ∈ X , that is, the value function V π of policy π is close to the optimal value function V ∗.
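The guarantee above is easy to evaluate numerically; the sample values of ε, δ, and α below are illustrative.

```python
def policy_value_gap(eps, delta, alpha):
    """Worst-case suboptimality (2*alpha*eps + delta) / (1 - alpha) of the
    greedy policy pi induced by an approximation within eps of V*."""
    return (2.0 * alpha * eps + delta) / (1.0 - alpha)
```

For example, ε = 1, δ = 0.1, and α = 0.9 give a gap of 19, showing how strongly the discount factor amplifies approximation error.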
The application of our proposed method to the IRP with Direct Deliveries is discussed in the next section.
4 The IRP with Direct Deliveries
In the remainder of the paper we consider the special case of the IRP in which only one customer is visited
on each vehicle route. This special case of the IRP is called the IRP with Direct Deliveries (IRPDD). The
reasons why the IRPDD is of interest are discussed next.
If the storage capacities and demands of the customers are sufficiently large relative to the vehicle capacity,
and the inventory holding cost is low relative to the transportation cost, then it is often optimal to deliver
full vehicle loads or nearly full vehicle loads to customers. Gallego and Simchi-Levi (1990) analyzed a
single-depot/multi-customer distribution system with constant (deterministic) demand rates, in which no
shortages or backlogs were allowed. Customer storage capacities were not constrained. Transportation cost
proportional to the total distance traveled, a linear inventory holding cost, and ordering costs were taken
into account. They assumed availability of an unlimited number of vehicles with limited capacity. They
studied conditions under which direct delivery is an efficient policy. A lower bound on the long-run average
cost over all policies was derived, by adding a lower bound on the average inventory holding and ordering
costs, using a traditional economic order quantity model, and a lower bound on the long-run transportation
costs, obtained from the model of Haimovich and Rinnooy Kan (1985). An upper bound was derived on
the average cost of a particular direct delivery policy as a function of the economic order quantities (EOQ)
of the customers. It was concluded that the effectiveness (the ratio of the infimum of long-run average cost
over all policies to the long-run average cost of the direct delivery policy) is large (e.g., at least 94%) when
the EOQ of all customers is large relative to the vehicle capacity (e.g., at least 71%).
Barnes-Schuster and Bassok (1997) studied a single-depot/multi-customer distribution system with ran-
dom demands over an infinite horizon. Customer storage capacities were constrained. Linear inventory
holding costs and transportation costs between the depot and the retailers were incorporated. The fleet
size was assumed to be unlimited, but vehicle capacities were limited. The objective was to study the cost
effectiveness of using a particular direct delivery policy. The policy delivers as many full truck loads at a
customer as the remaining capacity at the customer can accommodate. A lower bound was obtained on
the expected long-run average cost per period as a sum of the expected inventory holding cost, using an
infinite horizon newsvendor problem, and the expected transportation cost, extending the bound developed
by Haimovich and Rinnooy Kan (1985) for one retailer and a single period. The policy of direct delivery
with full truck loads was simulated and compared with the lower bound. The results indicate that the policy
performs well in situations in which truck sizes are close to the means of the customer demand distributions.
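The full-truckload rule just described reduces to simple integer arithmetic per customer; the function below is a sketch of that rule, not code from either paper.

```python
def full_truckloads(x_n, C_n, CV):
    """Number of full truck loads sent under the direct delivery policy above:
    as many as the remaining storage capacity C_n - x_n can accommodate."""
    return int((C_n - x_n) // CV)
```

With C_n = 100, x_n = 35, and CV = 20 (illustrative numbers), three full loads fit.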
The formulation of the IRPDD is the same as the formulation of the IRP in Section 1.2 except for the
following.
1. The action space A(x) for each state x is the set of all decisions consisting of routes that visit only one
customer on a route, and that satisfy the work load, time window, and capacity constraints as before.
Each decision a consists of individual customer itineraries an, n = 1, . . . , N . Itinerary an denotes the
number of visits to customer n by each vehicle and the amount of product delivered at customer n by
each vehicle. Let tn denote the amount of time required per vehicle route from the supplier to customer
n and back.
2. The transportation costs can now be associated with the individual customers, instead of with the
routes on the network. For example, if cn denotes the transportation cost for traveling from the
supplier to customer n and back, and vn(an) denotes the number of times that customer n is visited
by a vehicle while executing itinerary an, then
g(x, a) ≡ Σ_{n=1}^N { rn(dn(an)) − cn vn(an) − hn(xn + dn(an)) − EFn[ pn( max{Un − (xn + dn(an)), 0} ) ] }     (3)
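Because (3) separates across customers, each summand can be evaluated on its own; the sketch below assumes a discrete demand pmf, and the argument names are illustrative.

```python
def customer_reward(x_n, d_n, visits, r_n, c_n, h_n, p_n, pmf_n):
    """One summand of (3): revenue r_n(d_n) minus round-trip cost c_n * v_n(a_n),
    holding cost h_n(x_n + d_n), and the expected shortage penalty
    E[p_n(max{U_n - (x_n + d_n), 0})] under the demand pmf."""
    level = x_n + d_n
    exp_pen = sum(prob * p_n(max(u - level, 0)) for u, prob in pmf_n.items())
    return r_n(d_n) - c_n * visits - h_n(level) - exp_pen
```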
Although the hard routing and delivery quantity decisions of the IRP become much easier if only one
customer is visited on each vehicle route, the IRPDD is still a hard problem to solve if there are more than
about four customers and a limited number of vehicles, due to the number of states growing exponentially in
the number of customers. To illustrate the effect of this rapid growth, a number of instances of the IRPDD
were solved to optimality using the modified policy iteration algorithm. All instances had Cn = 10 for all
customers n, fn(u) = 1/10 for all customers n and u = 1, . . . , 10, CV = 5, and α = 0.98. Table 2 shows the
rapid growth in computation times on a 166MHz Pentium PC as the number of customers increases.
Table 2: Computation time to find the optimal solution for some instances of the IRPDD.

Customers   Vehicles   Time (s)
    2           1            3
    3           2          900
    4           3        86400
Because direct deliveries are important in practice, as well as to study approximation methods for the
first two computational tasks discussed in Section 3 without being hampered by hard routing problems, we
investigated the IRPDD first before moving on to the more general IRP.
5 Approximating the Value Function
5.1 A Decomposition Approximation
The first major task is the construction of an approximation V to the optimal value function V ∗. Our
approximation is based on a decomposition of the IRPDD into individual customer subproblems, motivated
as follows. From (3) it follows that
g(x, a) = Σ_{n=1}^N gn(xn, an)

where

gn(xn, an) ≡ rn(dn(an)) − cn vn(an) − hn(xn + dn(an)) − EFn[ pn( max{Un − (xn + dn(an)), 0} ) ]     (4)
The only consideration that prevents the exact decomposition of the IRPDD into individual customer subproblems is the limited number of vehicles that have to be assigned to customers each time period. The
11
challenge is to incorporate this dependence between customers in a computationally tractible way.
Consider any policy π ∈ Π. In general, the chosen decision under policy π depends on the state x, and
thus the inventory levels at all the customers. Let πn(x) denote the itinerary associated with customer n
chosen under policy π when the state is x. Assume that the demand distribution, and thus the state space X, are discrete. Let νπ(x) denote the stationary probability of state x under policy π, assuming the existence
of unique stationary probabilities under policy π. Then, given the current inventory level xn and delivery
quantity en at customer n, the probability qn(yn,mn|xn, en) that under policy π, at the beginning of the
next day the inventory level at customer n is yn, and customer n is visited mn times by a vehicle, is given
by
qn(yn, mn | xn, en) = [ Σ_{s ∈ X : sn = xn, dn(πn(s)) = en} νπ(s) Σ_{z ∈ X : zn = yn, vn(πn(z)) = mn} Q[z | s, π(s)] ] / [ Σ_{s ∈ X : sn = xn, dn(πn(s)) = en} νπ(s) ]    (5)
if the denominator is positive, and qn(yn,mn|xn, en) = 0 if the denominator is 0. The choice of policy π and
the estimation of qn(yn,mn|xn, en) are discussed later. With these probabilities qn(yn,mn|xn, en) we define
the following MDP for each customer n.
1. State (xn,mn) denotes that the inventory level at customer n is xn and customer n can be visited up
to mn times by a vehicle. Let (Xnt,Mnt) denote the state at time t, and let Xn denote the state space
of the MDP associated with customer n.
2. The set An(xn, mn) of admissible actions an when the state is (xn, mn) is the dispatching of up to mn vehicle trips to customer n, and the delivery of amounts of product constrained by the vehicle capacity CV and the customer storage capacity Cn. Let Ant denote the decision at time t.
3. The transition probabilities are as follows.
P [(Xn,t+1,Mn,t+1) = (yn, kn) | (Xnt,Mnt) = (xn,mn), Ant = an] = qn(yn, kn|xn, dn(an))
4. The expected net reward per stage, given state (xn,mn) and action an, is gn(xn, an), as in (4).
5. The objective is to maximize the expected total discounted value over an infinite horizon. Let
V ∗n (xn,mn) denote the optimal expected value given that the initial state is (xn,mn), i.e.,
V*n(xn, mn) ≡ sup_{ {Ant}_{t=0}^∞ } E[ Σ_{t=0}^∞ α^t gn(Xnt, Ant) | (Xn0, Mn0) = (xn, mn) ]
The actions Ant are again constrained to be feasible and nonanticipatory.
The optimal values V ∗n (xn,mn) of the individual customer MDPs are easily computed, because the state
spaces of the individual customer MDPs are much smaller than the state space of the IRPDD.
The next issue to be addressed is, given a state x = (x1, . . . , xN ) ∈ X of the IRPDD, how to combine the
optimal values V ∗n (xn,mn) of the individual customer MDPs to find a good approximation V (x) to V ∗(x).
To do that, appropriate values of mn have to be chosen for each n; that is, the fleet capacity has to be assigned
to the individual customers. The approximate value V (x) is calculated by assigning the available work time
of the M vehicles to the N customers to maximize the total value given by the resulting individual customer
MDPs. That is, the approximate value V (x) is given by the optimal value of the following nonlinear knapsack
problem.
V(x) ≡ max_{w = (w1, ..., wN) ∈ Z^N_+}  Σ_{n=1}^{N} V*n(xn, wn)

s.t.  Σ_{n=1}^{N} tn wn ≤ MT    (6)
Recall that tn denotes the amount of time required per vehicle route from the supplier to customer n and
back, M denotes the number of vehicles in the fleet, and T denotes the maximum amount of work time per
vehicle per time period. The nonlinear knapsack problem is easily solved using dynamic programming.
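The dynamic program runs over customers and the remaining work time. A minimal sketch, assuming integer round-trip times tn and tabulated single-customer values V*n(xn, w):

```python
def solve_vehicle_assignment(values, times, budget):
    """Solve the nonlinear knapsack (6) by dynamic programming.

    values[n][w] = V*_n(x_n, w), the optimal value of customer n's MDP when up
    to w vehicle trips may be dispatched to it; times[n] = t_n is the (integer)
    round-trip time to customer n; budget = M * T is the total vehicle work
    time. Returns (best total value, assignment w)."""
    best = {0: (0.0, [])}   # best[b] = (value, assignment) using time budget b
    for n, vals in enumerate(values):
        new_best = {}
        for b, (val, w) in best.items():
            for wn in range(len(vals)):
                nb = b + times[n] * wn
                if nb > budget:
                    break
                cand = (val + vals[wn], w + [wn])
                if nb not in new_best or cand[0] > new_best[nb][0]:
                    new_best[nb] = cand
        best = new_best
    return max(best.values(), key=lambda vw: vw[0])

# Hypothetical instance: two customers, t = (2, 3), M*T = 5.
value, w = solve_vehicle_assignment([[0.0, 5.0, 6.0], [0.0, 4.0, 7.0]], [2, 3], 5)
```

Here the time budget of 5 is exhausted by one trip to each customer, which dominates concentrating both trips on either customer alone.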
Although the resulting vehicle assignment may constitute a good decision, the knapsack problem (6) is
primarily solved to obtain the approximate values V (y), and the decision π(x) is given by a maximizer in
the optimality equation, using V to approximate the values of future states, as follows.
π(x) ∈ arg max_{a ∈ A(x)} { g(x, a) + α Σ_{y ∈ X} Q[y | x, a] V(y) }    (7)
This method can also be interpreted as a multistage lookahead method, whereby the knapsack problem
is solved to determine the tentative decision at the second stage, and the optimal value functions of the
individual customer MDPs give the objective function for the knapsack problem to take into account the
expected net reward from the second stage onwards.
5.2 An Algorithm
The development in Section 5.1 assumed that the conditional probabilities qn(yn,mn|xn, en) are known.
Computing the probabilities qn(yn,mn|xn, en) exactly using (5) is almost as hard as solving the IRPDD,
because the stationary probabilities νπ(x) have to be computed for all x ∈ X . Since qn(yn,mn|xn, en) is a
five-dimensional parameter (with dimensions corresponding to n, yn, mn, xn, and en), if there are more than
about five customers, then the number of probabilities qn(yn,mn|xn, en) is usually less than the number
|X | of states. Thus one may attempt to estimate the probabilities qn(yn,mn|xn, en) without computing the
stationary probabilities νπ(x). One straightforward method to do this is to simulate the IRPDD process
under policy π. Let qnt(yn,mn|xn, en) denote the estimate of qn(yn,mn|xn, en) after t transitions of the
simulation. One method for updating the estimates qnt(yn,mn|xn, en) is as follows. Let Nnt(xn, en) denote
the number of times that customer n has been in state xn and quantity en has been delivered at customer
n by transition t of the simulation. Then
qn,t+1(yn, mn | xn, en) =

  [ (Nn0(yn, mn | xn, en) + Nnt(xn, en)) qnt(yn, mn | xn, en) + 1 ] / [ Nn0(yn, mn | xn, en) + Nnt(xn, en) + 1 ]
      if Xnt = xn, dn(πn(Xt)) = en, Xn,t+1 = yn, and vn(πn(Xt+1)) = mn;

  [ (Nn0(yn, mn | xn, en) + Nnt(xn, en)) qnt(yn, mn | xn, en) ] / [ Nn0(yn, mn | xn, en) + Nnt(xn, en) + 1 ]
      if Xnt = xn, dn(πn(Xt)) = en, and (Xn,t+1 ≠ yn or vn(πn(Xt+1)) ≠ mn);

  qnt(yn, mn | xn, en)
      if Xnt ≠ xn or dn(πn(Xt)) ≠ en    (8)
where Nn0(yn,mn|xn, en) represents a weight, equivalent to Nn0(yn,mn|xn, en) observations, assigned to
the initial estimate qn0(yn,mn|xn, en). It follows from results for Markov chains (Meyn and Tweedie
1993) that if the Markov chain under policy π has a unique stationary probability distribution νπ , then
qnt(yn,mn|xn, en)→ qn(yn,mn|xn, en) as t → ∞ with probability 1 for all inventory levels xn and delivery
quantities en that occur infinitely often, i.e., for all inventory levels xn and delivery quantities en such that
Nnt(xn, en) → ∞ as t → ∞. Convergence with probability 1 can also be established for other update
methods.
However, for most applications the number of probabilities qn(yn,mn|xn, en) is far too large to estimate
accurately in reasonable time using simulation. To resolve this dilemma, we use the following approach.
The conditional probability pn(mn|yn) that customer n is visited by mn vehicles under policy π, given
that the inventory level at customer n is yn, is given by
pn(mn | yn) = [ Σ_{x ∈ X : xn = yn, vn(πn(x)) = mn} νπ(x) ] / [ Σ_{x ∈ X : xn = yn} νπ(x) ]
if the denominator is positive, and pn(mn|yn) = 0 if the denominator is 0. The number of probabilities pn(mn|yn) is much smaller than the number of probabilities qn(yn, mn | xn, en). The probabilities pn(mn|yn)
can be estimated by simulating the IRPDD process under policy π. Let pnt(mn|yn) denote the estimate of
pn(mn|yn) after t transitions of the simulation. The estimates pnt(mn|yn) can be updated similarly to the
estimates qnt(yn,mn|xn, en) in (8), as follows. Let Nnt(yn) denote the number of times that customer n has
been in state yn by transition t of the simulation. Then
pn,t+1(mn | yn) =

  [ (Nn0(mn | yn) + Nnt(yn)) pnt(mn | yn) + 1 ] / [ Nn0(mn | yn) + Nnt(yn) + 1 ]
      if Xnt = yn and vn(πn(Xt)) = mn;

  [ (Nn0(mn | yn) + Nnt(yn)) pnt(mn | yn) ] / [ Nn0(mn | yn) + Nnt(yn) + 1 ]
      if Xnt = yn and vn(πn(Xt)) ≠ mn;

  pnt(mn | yn)
      if Xnt ≠ yn    (9)
Similar convergence results as for qnt(yn,mn|xn, en) in (8) hold for pnt(mn|yn) in (9).
Then an estimate qnt(yn,mn|xn, en) for qn(yn,mn|xn, en) is obtained as follows.
qnt(yn, mn | xn, en) = { fn(xn + en − yn) pnt(mn | yn)                if yn > 0
                       { [ Σ_{un = xn + en}^∞ fn(un) ] pnt(mn | yn)   if yn = 0    (10)
In general, the estimates in (10) are not the same as those given in (8). However, the estimates in (10) are
much easier to compute than those in (8), and in numerical tests the estimates were very close to each other.
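The updates (9) and (10) amount to maintaining running relative frequencies with a prior weight. A sketch for a single customer, under the simplifying assumption of a common prior weight Nn0 for every visit count m:

```python
class VisitProbabilityEstimator:
    """Running-average estimate of p_n(m_n | y_n) as in (9) for one customer,
    assuming a common prior weight N_n0 for every visit count m."""

    def __init__(self, levels, max_visits, prior_weight=1.0):
        self.m_range = range(max_visits + 1)
        # uniform initial estimate p_n0, with prior weight N_n0 attached to it
        self.p = {(m, y): 1.0 / (max_visits + 1) for y in levels for m in self.m_range}
        self.weight = {y: prior_weight for y in levels}   # N_n0 + N_nt(y_n)

    def observe(self, y, m_observed):
        """Fold in one simulated transition: inventory level y, m_observed visits."""
        w = self.weight[y]
        for m in self.m_range:
            hit = 1.0 if m == m_observed else 0.0
            self.p[(m, y)] = (w * self.p[(m, y)] + hit) / (w + 1.0)
        self.weight[y] = w + 1.0

def q_estimate(est, f, y, m, x, e):
    """Estimate (10) of q_n(y_n, m_n | x_n, e_n) from the visit probabilities
    and the demand pmf f (f[u] = P(U_n = u))."""
    if y > 0:
        return f.get(x + e - y, 0.0) * est.p[(m, y)]
    return sum(prob for u, prob in f.items() if u >= x + e) * est.p[(m, y)]
```

Each observed transition shifts probability mass toward the observed visit count while the estimates for each level y remain a valid distribution over m.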
Now the building blocks are in place to state the first approximation procedure for the IRPDD, given in
Algorithm 1.
Algorithm 1 Approximation Algorithm for IRPDD.
1. Start with an initial policy π0. Set i ← 0.
2. Repeat steps 3 through 6 for a chosen number of iterations, or until a convergence test is satisfied.
3. Simulate the IRPDD under policy πi to estimate the probabilities pn(mn|yn).
4. With the updated estimates of the probabilities pn(mn|yn), formulate and solve the updated individual customer MDPs.
5. Policy πi+1 is defined by (7), where V is given by (6) with the updated individual customer values V*n(xn, mn).
6. Increment i ← i + 1.
5.3 Parametric Value Function Approximations
One may attempt to improve the approximation described in Section 5.1 by introducing parameters β
into the value function approximation V (x, β). One type of parametric value function approximation with
computational advantages is a function
V (x, β) = β1φ1(x) + · · ·+ βKφK(x) (11)
that is linear in the parameters β, where the φks are chosen basis functions. Van Roy et al. (1997) used
a similar approach to develop an approximation method for a retailer inventory management problem that
was introduced by Nahmias and Smith (1994). Parametric value function approximations are discussed in
detail in Bertsekas and Tsitsiklis (1996).
When using this approach, the parameters β have to be chosen as well. We discuss two approaches for
obtaining parameters β. The first approach is as follows. Consider any policy π ∈ Π with unique stationary
probabilities νπ(x). An appealing idea is to choose the parameters β in such a way that V approximates V π
“as well as possible”. One way to do this is to choose β to solve the following optimization problem.
min_β Σ_{x ∈ X} νπ(x) [ V π(x) − V(x, β) ]²    (12)
This problem looks like a weighted least squares regression problem, except that νπ and V π are unknown.
Tsitsiklis and Van Roy (1997) showed that if V (x, β) is linear in the parameters β, and other conditions
(given later) hold, then the following stochastic approximation method can be used to compute the optimal
solution β∗ of (12). Suppose the IRPDD process under policy π is simulated. Let βt denote the estimate of
the parameters after transition t of the simulation. Then the parameter estimates βt are updated as follows.
βt+1 = βt + γtdtzt
where γt is the step size at iteration t,
dt = g(Xt, π(Xt)) + αV (Xt+1, βt)− V (Xt, βt)
is the so-called temporal difference, or
dt = g(Xt, π(Xt)) + α Σ_{y ∈ X} V(y, βt) Q[y | Xt, π(Xt)] − V(Xt, βt)
is the expected temporal difference,
zt = αλzt−1 +∇β V (Xt, βt)
is the so-called eligibility vector, λ ∈ [0, 1] is a memory parameter, and ∇β V (Xt, βt) is the gradient of V
with respect to β evaluated at (Xt, βt). If V is linear in β, as in (11), then ∇β V (Xt, βt) has components
∂V (Xt, βt)/∂βk = φk(Xt) for k = 1, . . . ,K. If (1) |X | < ∞, (2) the Markov chain under policy π is
aperiodic with one recurrent class, (3) V (x, β) is linear in the parameters β, (4) the basis functions φk
restricted to the set of recurrent states are linearly independent, (5) the step sizes γt satisfy Σ_{t=0}^∞ γt = ∞ and Σ_{t=0}^∞ γt² < ∞, and (6) λ = 1, then the parameters βt converge to the optimal solution β* of (12) as
t → ∞ with probability 1. A disadvantage of stochastic approximation methods is that the convergence of
the parameters βt is notoriously slow.
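One step of the temporal-difference update described above can be sketched as follows; the feature map phi and the toy states and reward are placeholders, and gamma is a fixed step size rather than a decreasing sequence γt:

```python
import numpy as np

def td_step(beta, z, x, x_next, reward, phi, alpha, lam, gamma):
    """One temporal-difference update of a linear approximation
    V(x, beta) = beta . phi(x): compute the temporal difference d_t, update
    the eligibility vector z_t, and move beta along gamma * d_t * z_t.
    alpha is the discount factor and lam is the memory parameter lambda."""
    d = reward + alpha * beta @ phi(x_next) - beta @ phi(x)   # temporal difference d_t
    z = alpha * lam * z + phi(x)                              # eligibility vector z_t
    return beta + gamma * d * z, z

# One step on a toy two-state example with features phi(x) = (1, x).
phi = lambda x: np.array([1.0, float(x)])
beta, z = td_step(np.zeros(2), np.zeros(2), 1, 2, 1.0, phi, 0.9, 1.0, 0.1)
```

In a simulation of the IRPDD process under π, this update would be applied once per observed transition (Xt, Xt+1).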
Another approach for obtaining parameters β is as follows. The value function V π of a policy π ∈ Π
satisfies
V π(x) = g(x, π(x)) + α Σ_{y ∈ X} V π(y) Q[y | x, π(x)]    (13)
Again assume that π has unique stationary probabilities νπ(x). Then it seems appealing to choose the
parameters β to minimize the weighted discrepancy between the left hand side and right hand side of (13).
Thus, the parameters are chosen to be an optimal solution β∗ of
min_β Σ_{x ∈ X} νπ(x) [ V(x, β) − ( g(x, π(x)) + α Σ_{y ∈ X} V(y, β) Q[y | x, π(x)] ) ]²    (14)
This approach is called the Bellman error method.
If V (x, β) is linear in the parameters β, then the corresponding parameter estimates βt can be computed
as follows. Let φ(x) ≡ (φ1(x), ..., φK(x))^T, and let ψ(x) ≡ φ(x) − α Σ_{y ∈ X} φ(y) Q[y | x, π(x)]. Then the
optimization problem (14) can be written
min_β Σ_{x ∈ X} νπ(x) [ ψ(x)^T β − g(x, π(x)) ]²    (15)
This problem also looks like a weighted least squares regression problem, except that νπ is unknown. Let ψ denote the |X| × K matrix with rows given by ψ(x)^T, let Y denote the |X| × 1 matrix with elements given by g(x, π(x)), and let ∆π denote the |X| × |X| diagonal matrix with diagonal elements given by νπ(x). Then any solution β of ψ^T ∆π ψ β = ψ^T ∆π Y is an optimal solution of (15). If the columns of ∆π ψ are linearly independent (which should be the case if the basis functions φk are well chosen), then ψ^T ∆π ψ is positive definite, and the optimal solution β* of (15) is unique.
To overcome the obstacle that νπ, and thus also ∆π, are unknown, one can simulate the IRPDD process
under policy π, and use the following result for Markov chains (Meyn and Tweedie 1993). If the Markov
chain has a single positive recurrent class with stationary probability distribution ν, then for any function
f : X → ℝ such that ∫_X |f(x)| dν(x) < ∞, it holds that, with probability 1, Σ_{τ=1}^t f(Xτ)/t → ∫_X f(x) dν(x)
as t → ∞. To apply this result to (15), define K(K + 1)/2 functions fij(x) ≡ ψi(x)ψj(x) and K functions gi(x) ≡ ψi(x)g(x, π(x)), where ψi(x) denotes component i of ψ(x). Then ψ^T ∆π ψ is the matrix with element (i, j) equal to ∫_X fij(x) dνπ(x), and ψ^T ∆π Y is the vector with element i equal to ∫_X gi(x) dνπ(x). The sample averages Σ_{τ=1}^t fij(Xτ)/t and Σ_{τ=1}^t gi(Xτ)/t are easily computed, as follows. Let F0 ≡ 0 be a K × K matrix, and let Ft+1 ≡ Ft + ψ(Xt)ψ(Xt)^T. Then element (i, j) of Ft is equal to Σ_{τ=1}^t fij(Xτ). Also, let G0 ≡ 0 be a K × 1 matrix, and let Gt+1 ≡ Gt + ψ(Xt)g(Xt, π(Xt)). Then element i of Gt is equal to Σ_{τ=1}^t gi(Xτ). It follows from the above result for Markov chains that Ft/t → ψ^T ∆π ψ and that Gt/t → ψ^T ∆π Y as t → ∞. Let βt be any solution of the system of linear equations Ft βt = Gt. Then the distance between βt and the set of optimal solutions of (14) converges to zero as t → ∞. Furthermore, if the columns of ∆π ψ are linearly independent, then for sufficiently large t, Ft is positive definite, and the unique solution βt of Ft βt = Gt converges to the unique optimal solution β* of (14).
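The Bellman error fit via the accumulators Ft and Gt can be sketched as follows, assuming the expected next-stage features Σy φ(y) Q[y | x, π(x)] are computable for each visited state:

```python
import numpy as np

def bellman_error_fit(trajectory, phi, g, expected_phi_next, alpha):
    """Least-squares Bellman error fit (15) from a simulated trajectory.

    phi(x): feature vector; g(x): single-stage reward g(x, pi(x));
    expected_phi_next(x): sum_y phi(y) Q[y | x, pi(x)], assumed computable.
    Accumulates F_t = sum_tau psi(X_tau) psi(X_tau)^T and
    G_t = sum_tau psi(X_tau) g(X_tau), then solves F_t beta = G_t."""
    K = len(phi(trajectory[0]))
    F, G = np.zeros((K, K)), np.zeros(K)
    for x in trajectory:
        psi = phi(x) - alpha * expected_phi_next(x)
        F += np.outer(psi, psi)
        G += psi * g(x)
    return np.linalg.solve(F, G)

# Deterministic two-state cycle 0 <-> 1 with unit rewards: V^pi(x) = 1/(1 - alpha),
# so the fit should recover beta = (1/(1 - alpha), 0) for features (1, x).
phi = lambda x: np.array([1.0, float(x)])
beta = bellman_error_fit([0, 1, 0, 1], phi, lambda x: 1.0,
                         lambda x: phi(1 - x), alpha=0.9)
```

Because the fit reduces to solving one linear system after a single pass over the trajectory, it avoids the slow step-size-driven convergence of the stochastic approximation method.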
The optimal solutions of (12) and (14) are not the same in general. One can also formulate multistage
Bellman error objective functions, the optimal solutions of which can be shown to be close to the optimal
solutions of (12). However, from our computational experience for the IRPDD, the optimal solutions of (14)
are close to the optimal solutions of (12), and the optimal solutions of (14) combined with the proposed
basis functions provide good policies, as illustrated in Section 7.2. The Bellman error method has the
advantage that the parameter estimates βt converge much faster to β∗ than with stochastic approximation.
The objective of (14) may not seem quite as appealing as the objective of (12). Rewriting the objective function of (14) as Σ_{x ∈ X} νπ(x) [ ( V(x, β) − α Σ_{y ∈ X} V(y, β) Q[y | x, π(x)] ) − g(x, π(x)) ]², it follows that this objective chooses β in such a way that V(x, β) − α Σ_{y ∈ X} V(y, β) Q[y | x, π(x)] is close to the expected single-stage net reward g(x, π(x)). In contrast, (12) chooses β such that V(x, β) is close to V π(x), which
seems more appealing, especially in the light of the approximation results in Section 3.
Van Roy et al. (1997) proposed an approximation with basis functions φk chosen as first and second degree
polynomials of “features” of x, for their inventory management problem. The resulting policies performed
better than an order-up-to heuristic. We tested such an approximation for the IRPDD with the φks chosen
as first and second degree polynomials of x. The performance of the resulting policies was quite poor.
One can combine the decomposition approximation and the parametric approximation to obtain an
approximate value V(x, β) for any given state x ∈ X, where

V(x, β) ≡ β0 + Σ_{n=1}^{N} βn V*n(xn, w*n(x))    (16)

where w*(x) = (w*1(x), ..., w*N(x)) is an optimal solution of the nonlinear knapsack problem (6). It is shown
in Section 7 that the policies π based on using the approximation V (x, β) in (16) in the right hand side of (7),
gave excellent numerical results. A procedure that can be used to compute V (x, β) and the resulting policies
π is given in Algorithm 2.
Algorithm 2 Procedure for computing V(x, β) and π.
1. Start with an initial policy π0. Set i ← 0.
2. Simulate the IRPDD under policy π0 to estimate the probabilities pn(mn|yn).
3. Formulate and solve the individual customer MDPs.
4. Policy π1 is defined by (7), where V is given by (6).
5. Repeat steps 6 through 9 for a chosen number of iterations, or until a convergence test is satisfied.
6. Increment i ← i + 1.
7. Simulate the IRPDD under policy πi to update the estimates of the probabilities pn(mn|yn) and the parameters β.
8. With the updated estimates of the probabilities pn(mn|yn), formulate and solve the updated individual customer MDPs.
9. Policy πi+1 is given by (7), where V is given by (16) with the updated parameters β and individual customer values V*n(xn, mn).
6 Estimation of the Expected Value and Optimal Action
The second major computational task discussed in Section 3 is the estimation of the expected value on
the right hand side of (2) or (7). In the case of the IRPDD, the expected value is a multidimensional
integral with the number of dimensions equal to the number of customers. Conventional deterministic
numerical integration methods can be used to estimate the expected value. A popular approach is to use the
Newton-Cotes formulas; specific examples of these include Euler’s rule, the trapezoid rule, and Simpson’s
rules. Computing the expected value of a multidimensional discrete distribution corresponds to Euler’s rule.
Randomized (Monte Carlo) methods can also be used to estimate the expected value.
The computational efficiency of these methods is a relevant issue. Many deterministic numerical inte-
gration methods construct a grid on the space to be integrated over, and compute the integrand values at
the grid points. Let Z denote the number of grid points per dimension, and let d (= N) denote the number of dimensions. Then the total number of integrand values computed is given by n = Z^d. The error of many of these methods is O(Z^{−c}) = O(n^{−c/d}), where c is a constant that depends on the specific method (Stroud 1971). For example, for the trapezoid rule c = 2, and for Simpson's 1/3 rule c = 4 (Mustard,
Lyness and Blatt 1963). One measure for the comparison of the accuracy of different methods is the mean
square error (MSE) as a function of the number of integrand evaluations n. For the deterministic methods
discussed above, MSE = error² = O(n^{−2c/d}). For randomized methods using simple random sampling, MSE = Variance = O(n^{−1}). It follows that simple random sampling tends to give better performance than the
deterministic methods (at least for large values of n), if d > 2c. For example, simple random sampling is
more efficient than the trapezoid rule if d > 4, and simple random sampling is more efficient than Simpson’s
1/3 rule if d > 8. Also, for large values of d (d > 20, say), the number of grid points n = Z^d becomes too
large to evaluate the integrand at, even with Z = 2, so that conventional deterministic methods are not
practical at all. Thus randomized methods are preferred for estimating the expected value on the right hand
side of (7) for instances of the IRP with a large number of customers.
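The contrast between a tensor-product grid and simple random sampling can be illustrated numerically; the uniform demand model below is a hypothetical stand-in for drawing next states y from Q[· | x, a]:

```python
import random

def grid_points(Z, d):
    """Number of integrand evaluations for a tensor-product grid: n = Z^d."""
    return Z ** d

def monte_carlo_expectation(h, sample_state, n, seed=0):
    """Estimate E[h(Y)] by simple random sampling, whose MSE = Var/n = O(n^-1)
    regardless of the dimension d. sample_state(rng) draws one random next
    state (a hypothetical stand-in for drawing y from Q[. | x, a])."""
    rng = random.Random(seed)
    return sum(h(sample_state(rng)) for _ in range(n)) / n

# Even Z = 2 grid points per dimension is hopeless for d = 50 customers ...
n_grid = grid_points(2, 50)   # 2^50, about 1.1e15 integrand evaluations
# ... while a Monte Carlo estimate uses whatever sample size we can afford.
est = monte_carlo_expectation(
    h=sum,                                                   # e.g. total inventory
    sample_state=lambda rng: [rng.randint(1, 10) for _ in range(50)],
    n=2000,
)   # true expected value is 50 * 5.5 = 275
```

The sample size n, not the dimension d, controls the accuracy of the Monte Carlo estimate, which is why randomized methods scale to instances with many customers.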
The use of randomized methods raises a number of related questions.
1. What sample size n should be used?
2. Since the objective function on the right hand side of (7) is estimated with a random estimator with
error, how should the action be chosen?
3. What performance guarantees can be given for the action chosen after the objective value has been
randomly estimated?
These questions have been widely studied in the statistics and stochastic optimization areas.
For the IRPDD, we followed the approach proposed by Nelson and Matejcik (1995). Suppose the current
state is x. Let the actions in A(x) be numbered a = 1, . . . , k, where k = |A(x)|. Let Yaj denote random
observation j of the right hand side of (7) under action a ∈ A(x). That is,
Yaj = g(x, a) + αV (Xj) (17)
where state Xj is randomly generated from distribution Q[ · | x, a]. Let Yj ≡ (Y1j , . . . , Ykj). It is assumed
that Y1, Y2, . . . are i.i.d. normally distributed with unknown mean µ and unknown covariance matrix Σ. Thus,
given the current state x, µa is the value of action a on the right hand side of (7), µa = g(x, a) + α Σ_{y ∈ X} Q[y | x, a] V(y). To get observations Yj that are approximately normally distributed, we used batch means as
observations and relied on the central limit theorem. It is also assumed that Σ has the sphericity property.
Sphericity implies that Var[Yaj − Ybj] is the same for all actions a, b ∈ A(x), a ≠ b. Nelson and Matejcik (1995) presented evidence that their method (given in Algorithm 3) is robust with respect to deviations from sphericity as long as the covariances σab between Yaj and Ybj are nonnegative, which one would expect to hold
when using common random numbers for computing Y1j , . . . , Ykj .
If the assumptions stated above are satisfied, then whenever µb ≥ µa + δ for all a ∈ A(x)\{b}, it holds that
P[ Ȳb· > Ȳa·, ∀ a ∈ A(x)\{b} ] ≥ 1 − α
Algorithm 3 Procedure Nelson-Matejcik
1. Choose confidence coefficient α (not the discount factor), tolerance δ, and initial sample size n0. Let g = T^(α)_{k−1, (k−1)(n0−1), 1/2}, an equicoordinate critical point of the equicorrelated multivariate central t-distribution.
2. Generate an i.i.d. sample Y1, Y2, ..., Yn0.
3. Compute Ya· = Σ_{j=1}^{n0} Yaj/n0, Y·j = Σ_{a=1}^{k} Yaj/k, and Y·· = Σ_{a=1}^{k} Σ_{j=1}^{n0} Yaj/(k n0). Compute the sample variance S² of Yaj − Ybj (assuming sphericity), given by

   S² = 2 Σ_{a=1}^{k} Σ_{j=1}^{n0} (Yaj − Ya· − Y·j + Y··)² / ((k − 1)(n0 − 1))

4. Update the required sample size to n1 = max{n0, ⌈(gS/δ)²⌉}.
5. Generate n1 − n0 additional i.i.d. observations Yn0+1, ..., Yn1.
6. Compute the overall sample means Ȳa· = Σ_{j=1}^{n1} Yaj/n1 for each a ∈ A(x).
7. Select the action a with the largest value of Ȳa·.
In other words, the probability is at least 1 − α that an action a is selected with value µa that is within
tolerance δ of the best value µb.
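Algorithm 3 can be sketched as follows; the critical point g is taken as an input, since in practice it comes from tables of the multivariate t-distribution, and sample_batch is a placeholder for generating one (batch-mean) observation vector with common random numbers:

```python
import math
import random

def nelson_matejcik(sample_batch, k, n0, delta, g_crit, seed=0):
    """Sketch of Algorithm 3. sample_batch(rng) returns one observation vector
    (Y_1j, ..., Y_kj); g_crit is the critical point g from step 1, taken as an
    input because it comes from statistical tables.
    Returns (index of the selected action, final sample size n1)."""
    rng = random.Random(seed)
    Y = [sample_batch(rng) for _ in range(n0)]                        # step 2
    ybar_a = [sum(Y[j][a] for j in range(n0)) / n0 for a in range(k)] # step 3
    ybar_j = [sum(Y[j]) / k for j in range(n0)]
    ybar = sum(ybar_a) / k
    S2 = 2 * sum((Y[j][a] - ybar_a[a] - ybar_j[j] + ybar) ** 2
                 for a in range(k) for j in range(n0)) / ((k - 1) * (n0 - 1))
    n1 = max(n0, math.ceil((g_crit * math.sqrt(S2) / delta) ** 2))    # step 4
    Y += [sample_batch(rng) for _ in range(n1 - n0)]                  # step 5
    means = [sum(Y[j][a] for j in range(n1)) / n1 for a in range(k)]  # step 6
    return max(range(k), key=lambda a: means[a]), n1                  # step 7
```

With a sampler that adds a shared shock to every action's observation, as common random numbers do, the positively correlated Yaj keep S² small, so the first-stage sample often suffices.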
Note that if variance reduction techniques are used to reduce E[S²] = Var[Yaj − Ybj], then the required sample size n1 = max{n0, ⌈(gS/δ)²⌉} will on average be smaller for fixed values of α and δ, or conversely,
the confidence level 1−α can be increased and/or the tolerance δ can be decreased with a fixed sample size
n1. The following variance reduction techniques reduced the required sample size for the IRPDD with fixed
values of α and δ:
1. common random numbers,
2. stratified sampling,
3. latin hypercubes,
4. orthogonal arrays.
An example of the reduction in required sample size is given in Section 7.
The next computational issue that has to be addressed is the fact that there are a large number of
actions in A(x) to choose from if there are many vehicles and customers. If each vehicle can visit at most
one customer per day, then the number of actions is k = |A(x)| = (M+N choose M). Comparing all (M+N choose M) actions requires too much computational effort for large values of M and N, and thus there is a need for a more
computationally efficient method. The greedy method given in Algorithm 4 produced optimal actions for all
the states of all the instances tested, as discussed in Section 7.
For ease of presentation, Procedure Greedy is stated here for the case where each vehicle can visit at most
one customer per day. The extension of Procedure Greedy to the case where vehicles can visit more than
one customer per day is straightforward.
Algorithm 4 Procedure Greedy
1. Let x^1 denote the current state (inventory level at each customer). Set m ← 1.
2. Repeat steps 3 through 4 for each vehicle m = 1, ..., M.
3. Dispatch vehicle m to maximize the right hand side of the optimality equation (7). That is, choose the customer nm to send vehicle m to, and the quantity dm to deliver at customer nm, as follows.

   (nm, dm) ∈ arg max_{(n,d) ∈ A(x^m)} { g(x^m, (n, d)) + α Σ_{y ∈ X} Q[y | x^m, (n, d)] V(y) }    (18)

   where A(x^m) ≡ {(n, d) : n ∈ {0, 1, ..., N}, d ∈ {0, 1, ..., Cn − x^m_n}}. If nm = 0, it indicates that vehicle m is not dispatched to any customer (and C0 = 0). Algorithm 3 can be used to select the decision (nm, dm).
4. Update:

   x^{m+1}_n = x^m_n + dm  if n = nm;   x^{m+1}_n = x^m_n  if n ≠ nm    (19)

   Set m ← m + 1.
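Procedure Greedy can be sketched as follows, with score standing in for the estimated right hand side of (18) (e.g. as produced by Algorithm 3) and a vehicle capacity cv bounding each delivery:

```python
def greedy_dispatch(x, num_vehicles, capacities, cv, score):
    """Procedure Greedy (Algorithm 4): choose vehicle decisions one at a time.
    x: current inventory levels; capacities[n] = Cn; cv = vehicle capacity CV;
    score(x, n, d) plays the role of the estimated right hand side of (18) for
    sending one vehicle to customer n with delivery d (n = None means the
    vehicle is not dispatched). Returns the list of (customer, delivery) pairs."""
    x = list(x)
    decisions = []
    for _ in range(num_vehicles):
        # admissible decisions for this vehicle, including "stay at the depot"
        candidates = [(None, 0)]
        for n in range(len(x)):
            candidates += [(n, d) for d in range(1, min(cv, capacities[n] - x[n]) + 1)]
        n_best, d_best = max(candidates, key=lambda nd: score(x, nd[0], nd[1]))
        decisions.append((n_best, d_best))
        if n_best is not None:
            x[n_best] += d_best          # state update (19)
    return decisions
```

Since each vehicle's decision conditions on the inventories already updated by (19), later vehicles automatically avoid over-serving a customer that an earlier vehicle has filled.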
7 Computational Results
To test the viability of our proposed dynamic programming approximation method for the IRPDD and to
fine-tune and improve its efficiency, we have conducted a variety of computational experiments.
7.1 Algorithm Efficiency
In the previous sections, we have proposed several algorithms to approximate the optimal value function
V ∗, the expected value, and an optimal action, in the right hand side of the optimality equation (2). In this
section, we test the efficiency of these algorithms and the quality of the solutions produced.
One type of approximation for the optimal value function V ∗ involves a parametric value function
V(x, β) ≡ β0 + Σ_{n=1}^{N} βn V*n(xn, w*n(x)), where V*n is the optimal value function of the single customer MDP for customer n and w*(x) = (w*1(x), ..., w*N(x)) is an optimal solution of the nonlinear knapsack
problem (6). We have outlined two approaches for obtaining parameters β: the stochastic approximation
method and the Bellman error method.
The convergence rates of the stochastic approximation parameter estimates are affected significantly by
the rule for choosing the step sizes γt. We experimented with two different step size rules for stochastic
approximation. Rule 1 is γt = c1/(c2 + t), where c1 and c2 are chosen (typically large) constants. This is a
slight modification of γt = 1/t, the step size rule frequently given in the literature. Rule 2 is a variant of the
step size rule analyzed by Ruszczynski and Syski (1986), and is given by
γt = min {γ, γt−1 exp[min{η,−αut}]}
where γ0 > 0 and α, η, γ are chosen positive (typically small) constants. The quantity ut ≡ 〈ξt,∆βt〉, where
〈·, ·〉 denotes an inner product, ∆βt ≡ βt − βt−1, and ξt is a stochastic subgradient estimate of the convex
function that is to be minimized. For (12), ξt is given by the negative of the product of the temporal
difference and the eligibility vector, ξt = −dtzt. The convergence of the parameter estimates using each of
the two step size rules is shown in Figure 1. The figure shows that the parameter estimates converge much
faster with step size rule 2 than with step size rule 1. Observe too that in both cases the parameter estimates
first move away before moving towards their optimal values.
Figure 1: Convergence behavior of the parameter estimates βt1 for stochastic approximation using two different step size rules for instance opt12. For Rule 1, c1 = 10^9 and c2 = 10^14. For Rule 2, α = 10^−2, η = 10^−2 and γ = 10^−5.
An alternative to the stochastic approximation method is the Bellman error method. We have exper-
imented with both methods and found that the parameter estimates βt converged much faster with the
Bellman error method than with the stochastic approximation method, even when using step size rule 2
discussed above. An example of this behavior is presented in Figure 2. Note that the parameter estimates
were initialized close to their optimal values for the stochastic approximation method, and in spite of that the
parameter estimates converged much quicker with the Bellman error method, for which parameter estimates
are not initialized.
Also, to improve numerical behavior, the data in all the instances used in this computational study were
scaled appropriately so that the components of the gradient ∇β V (x, β) had the same order of magnitude.
Specifically, after the optimal value function V ∗n has been computed for each individual customer MDP,
the average values θn ≡ Σ_{(xn,mn) ∈ Xn} V*n(xn, mn)/|Xn| are determined. The value function is rewritten V(x, β) = β0 + Σ_{n=1}^{N} (βn θn)(V*n(xn, w*n(x))/θn), so that with θ0 ≡ 1, rescaled parameters β′n ≡ βn θn, and basis functions φ0(x) ≡ 1 and φn(x) ≡ V*n(xn, w*n(x))/θn for n = 1, ..., N, the approximation becomes V(x, β′) = Σ_{n=0}^{N} β′n φn(x). Thus the components of the gradient ∇β′ V(x, β′) have the same order of magnitude, because ∂V(x, β′)/∂β′k = φk(x) ≈ 1 ≈ φl(x) = ∂V(x, β′)/∂β′l. The scaling causes the changes β′t+1 − β′t = γt dt zt in the parameter estimates from one iteration to the next to be of the same order of magnitude for the different parameters.

Figure 2: Convergence behavior of the parameter estimates βt1 for the Bellman error method and for the stochastic approximation method for instance opt12. For the stochastic approximation method, Rule 2 was used with α = 10^−2, η = 10^−2 and γ = 10^−5.
Another important aspect of our approach is the use of random sampling to estimate the expected value
in the optimality equation (2). Variance reduction techniques played a significant role in improving the
efficiency of these estimates. Variance reduction techniques can be used to either improve, for a given
sample size, the accuracy of the random estimators Ȳa· of the value µa of action a on the right hand side of (7) and of the estimators (Ȳa· − Ȳb·) of µa − µb, or to decrease the sample size needed to obtain the
specified level of accuracy. After experimentation with common random numbers, stratified sampling, latin
hypercubes, and orthogonal arrays, we have chosen to use a combination of common random numbers and
orthogonal arrays, as it gave the best performance. The combination of common random numbers and
orthogonal arrays gave almost a ten-fold reduction in the sample size required for the specified accuracy
compared with just using simple random sampling with common random numbers. Even when one takes
into account that a combination of common random numbers and orthogonal arrays requires approximately
1.4 times as much computation time as simple random sampling with common random numbers for the same
sample size, it still provides a significant reduction in computational effort. The performance improvement is
illustrated in Figure 3, which shows the sample sizes required for the specified accuracy of choosing an action
with objective value within δ = 0.05 (approximately 0.1%) of the optimal objective value with probability at
least 1−α = 0.99, for each of 1000 transitions of a simulation of the IRPDD process, for both simple random
sampling with common random numbers and for random sampling with a combination of common random
numbers and the Bose-Bush orthogonal array design (Bose 1938 and Bose and Bush 1952) with level 9 and frequency 3.
Figure 3: Sample sizes required for the specified accuracy with δ = 0.05 and α = 0.01, for each of 1000 transitions of a simulation of instance cst2, for simple random sampling and orthogonal array sampling. The number of observations required by simple random sampling is indicated by the thin line, and the number of observations required by orthogonal array sampling is indicated by the thick line.
7.2 Solution Quality
In this section, we discuss a number of experiments to test the quality of the policies produced by the
dynamic programming approximation method.
First, we compare the value functions of the approximation policies with the optimal value functions for
small instances of the IRPDD, for which the optimal value function can be computed in reasonable time. The
ten instances used (given in Appendix B) have two, three, four or five customers, and demand distributions
which are either bimodal, or randomly generated, or uniform over all demand levels.
A concise presentation of the quality of a policy π is difficult because it involves a comparison of its value function V^π with the optimal value function V^* over all states x. We have chosen to present the quality of the various value functions in several ways. For any value function V : X → ℝ, let V_avg ≡ ∑_{x∈X} V(x)/|X| denote the average value of the value function over all states. Because V_avg does not reveal the values at good or bad states, we also present the minimum and maximum values of the value functions over all states, that is, V_min ≡ min_{x∈X} V(x) and V_max ≡ max_{x∈X} V(x).
In Table 3, we compare V^π_avg, V^π_min, and V^π_max for several policies π with V^*_avg, V^*_min, and V^*_max. Policies π′_i result from Algorithm 2, where both the maximization and the expected value on the right hand side of (7) are computed using enumeration, and where the sequence of policies π′_0, π′_1, π′_2 result from successive
iterations of Algorithm 2. Policies π_i also result from Algorithm 2, but the maximization and the expected value on the right hand side of (7) are computed using a combination of Algorithm 4 and Algorithm 3, and where the sequence of policies π_0, π_1, π_2 also result from successive iterations of Algorithm 2. The initial policies π′_0 and π_0 are myopic policies that use value function approximation V = 0 (or equivalently discount factor α = 0) on the right hand side of (7). The differences between π′_0 and π_0 are that π′_0 is based on computing the expected shortage penalty in g(x, a) using enumeration and on comparing the values g(x, a) of all actions a ∈ A(x) and then choosing the best action, whereas π_0 is based on a combination of Algorithm 3 and Algorithm 4 for computing the expected shortage penalty and choosing an action.
Gauss-Seidel policy evaluation algorithm was used to compute the value function of each policy.
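The greedy improvement step described above amounts to a one-step lookahead of the form of (7). A minimal sketch, with a hypothetical transition model and reward function standing in for the IRPDD quantities:

```python
def greedy_action(x, actions, g, transitions, V, alpha):
    # Choose a in A(x) maximizing g(x, a) + alpha * E[V(next state)],
    # where transitions(x, a) yields (next_state, probability) pairs.
    # With V = 0 (equivalently alpha = 0) this is the myopic policy pi_0.
    best_a, best_val = None, float("-inf")
    for a in actions:
        val = g(x, a) + alpha * sum(p * V(y) for y, p in transitions(x, a))
        if val > best_val:
            best_a, best_val = a, val
    return best_a

# Toy usage (all model components are hypothetical stand-ins):
actions = [0, 1]
g = lambda x, a: -abs(x - a)                    # stand-in one-period reward
transitions = lambda x, a: [((x + a) % 2, 1.0)]  # stand-in transition model
myopic = lambda x: greedy_action(x, actions, g, transitions, lambda y: 0.0, 0.0)
```

In the paper the maximization and the expected value are computed either by enumeration or by the combination of Algorithm 3 and Algorithm 4; the exhaustive loop above corresponds to the enumeration variant.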
The results show that the values of the policies produced by the dynamic programming approximation method are very close to the optimal values. Furthermore, they show that the policies obtained after successive iterations of Algorithm 2 are slightly better than the preceding policies.
In Table 4, we present the same results in a different way. Instead of presenting summary information
of the value functions, we present summary information of the value functions of the dynamic programming
approximation policies relative to the optimal value function. However, there are several problems with interpreting ratios such as V^π(x)/V^*(x) or [V^*(x) − V^π(x)]/V^*(x). For the IRP, V^*(x) can be positive or negative, which could make the ratios above not very meaningful. A ratio such as [V^*(x) − V^π(x)]/|V^*(x)| is also not without problems, since the denominator can be arbitrarily close to 0. Also, all the ratios above
can be made to appear arbitrarily good by adding a sufficiently large constant revenue in each time period,
independent of the state or decision. In an attempt to overcome some of these shortcomings, we shift the
values to fix the minimum value of the shifted optimal value function at 1. Specifically, let m ≡ min_{x∈X} V^*(x), and for any stationary policy π, let ρ^π(x) ≡ [V^π(x) − m + 1]/[V^*(x) − m + 1]. Then, let ρ^π_avg ≡ ∑_{x∈X} ρ^π(x)/|X|, ρ^π_min ≡ min_{x∈X} ρ^π(x), and ρ^π_max ≡ max_{x∈X} ρ^π(x) denote the average, minimum, and maximum, over all states, of the performance ratio ρ^π(x) for policy π. Table 4 also shows that the values of the policies
produced by the dynamic programming approximation method are very close to the optimal values.
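The summary statistics above are straightforward to compute when value functions are stored as mappings from states to values. The sketch below uses toy dictionaries, not the paper's instances:

```python
def summary(V):
    # V_min, V_avg, V_max for a value function stored as {state: value}.
    vals = list(V.values())
    return min(vals), sum(vals) / len(vals), max(vals)

def ratio_stats(V_pi, V_star):
    # Shift so that the shifted optimal value function has minimum 1,
    # then form rho(x) = [V_pi(x) - m + 1] / [V_star(x) - m + 1].
    m = min(V_star.values())
    rho = {x: (V_pi[x] - m + 1) / (V_star[x] - m + 1) for x in V_star}
    vals = list(rho.values())
    return min(vals), sum(vals) / len(vals), max(vals)
```

The shift by m − 1 guards against both the sign problem and the near-zero denominator problem discussed above, at the cost of depending on the additive normalization.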
Another way to present the value function of a policy π is to graph the value functions V^π(x) and V^*(x) for a subset of the states x. Figure 4 shows the value function V^{π_1}(x) of the approximation policy π_1 that is obtained after 10^4 iterations of the Bellman error method for parameter estimation, as well as the optimal value function V^*(x), for instance opt11 with 3 customers, with the inventory level at customer 1 fixed at x_1 = 10, as a function of the inventory level x_2 at customer 2, for three levels of inventory at customer 3 (x_3 = 0, x_3 = 5, and x_3 = 10). The figure shows the quality of the resulting policy, as its value function is close to the optimal value function.
Since computing the optimal value functions for large instances of the IRPDD is too time consuming, we
compare the quality of the approximation policies with the quality of two other policies for large instances.
The objective is to evaluate the quality of the approximation policies for larger instances, and to evaluate the
improvements obtained by using parameterized value function approximations. The first of these policies
is based on the method proposed by Chien, Balakrishnan and Wong (1989) (denoted by CBW). They
formulated an integer programming based single-day model, in which problem parameters are adjusted from
one day to the next. We slightly modified the CBW method to take the revenues and costs of our model
Table 3: Comparison of the optimal values with the values of the approximation policies. For each policy, the three columns give V_min, V_avg, and V_max over all states.

Instance  N |        π*         |       π′_0        |       π′_1        |       π′_2        |        π_0        |        π_1        |        π_2
opt1      2 | 28.08 28.67 28.95 | 28.08 28.67 28.95 | 28.08 28.67 28.95 | 28.08 28.67 28.95 | 28.08 28.67 28.95 | 28.08 28.67 28.95 | 28.08 28.67 28.95
opt2      2 | 48.11 49.11 49.56 | 48.00 49.10 49.56 | 48.11 49.11 49.56 | 48.11 49.11 49.56 | 48.00 49.10 49.56 | 48.11 49.11 49.56 | 48.11 49.11 49.56
opt3      3 | 37.26 37.88 38.38 | 37.16 37.87 38.25 | 37.18 37.88 38.29 | 37.22 37.88 38.34 | 37.14 37.87 38.21 | 37.17 37.88 38.26 | 37.19 37.88 38.32
opt4      3 | 38.20 38.83 39.58 | 38.20 38.83 39.58 | 38.20 38.83 39.58 | 38.20 38.83 39.58 | 38.20 38.83 39.58 | 38.20 38.83 39.58 | 38.20 38.83 39.58
opt5      3 | 87.52 89.49 90.95 | 87.39 89.49 90.80 | 87.41 89.49 90.84 | 87.46 89.49 90.89 | 87.36 89.49 90.78 | 87.39 89.49 90.82 | 87.45 89.49 90.87
opt6      4 | 18.28 18.63 18.99 | 18.20 18.61 18.90 | 18.24 18.63 18.92 | 18.27 18.63 18.96 | 18.17 18.61 18.87 | 18.18 18.63 18.89 | 18.23 18.63 18.93
opt7      4 | 54.92 55.78 56.78 | 54.82 55.76 56.64 | 54.88 55.78 56.69 | 54.90 55.78 56.74 | 54.81 55.76 56.62 | 54.83 55.78 56.69 | 54.87 55.78 56.73
opt8      4 | 41.82 42.71 43.52 | 41.76 42.71 43.49 | 41.76 42.71 43.51 | 41.79 42.71 43.52 | 41.73 42.71 43.46 | 41.74 42.71 43.47 | 41.77 42.71 43.50
opt9      5 | 25.70 26.16 26.43 | 25.65 26.15 26.38 | 25.66 26.16 26.40 | 25.66 26.16 26.43 | 25.62 26.15 26.32 | 25.64 26.15 26.35 | 25.65 26.16 26.43
opt10     5 | 37.82 38.56 39.41 | 37.82 38.56 39.34 | 37.82 38.56 39.41 | 37.82 38.56 39.41 | 37.82 38.56 39.33 | 37.82 38.56 39.41 | 37.82 38.56 39.41
Table 4: Comparison of the values of the approximation policies relative to the optimal values. For each policy, the three columns give ρ^π_min, ρ^π_avg, and ρ^π_max over all states.

Instance  N |       π′_0        |       π′_1        |       π′_2        |        π_0        |        π_1        |        π_2
opt1      2 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000
opt2      2 | 0.997 0.999 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 0.995 0.998 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000
opt3      3 | 0.991 0.994 0.996 | 0.994 0.997 0.999 | 0.999 0.999 1.000 | 0.991 0.994 0.996 | 0.993 0.996 0.999 | 0.999 0.999 1.000
opt4      3 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000
opt5      3 | 0.983 0.990 0.995 | 0.995 0.997 0.999 | 0.999 0.999 1.000 | 0.983 0.990 0.995 | 0.995 0.997 0.999 | 0.999 0.999 1.000
opt6      4 | 0.986 0.992 0.996 | 0.996 0.997 0.999 | 0.999 0.999 1.000 | 0.986 0.992 0.996 | 0.996 0.997 0.999 | 0.999 0.999 1.000
opt7      4 | 0.981 0.990 0.996 | 0.991 0.993 0.998 | 0.992 0.994 0.999 | 0.980 0.988 0.996 | 0.991 0.993 0.998 | 0.992 0.994 0.999
opt8      4 | 0.992 0.994 0.998 | 0.998 0.999 1.000 | 0.998 0.999 1.000 | 0.992 0.994 0.998 | 0.998 0.999 1.000 | 0.998 0.999 1.000
opt9      5 | 0.985 0.992 0.996 | 0.995 0.997 0.998 | 0.999 0.999 1.000 | 0.980 0.985 0.990 | 0.981 0.987 0.992 | 0.985 0.990 0.994
opt10     5 | 0.994 0.996 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 0.994 0.996 1.000 | 1.000 1.000 1.000 | 1.000 1.000 1.000
[Figure 4 appears here: value function (12.3 to 12.8) plotted against the inventory x_2 at customer 2 (0 to 10), for x_3 = 0, x_3 = 5, and x_3 = 10, with x_1 = 10.]
Figure 4: Comparison of the values V^*(x) and V^{π_1}(x) at different states x for instance opt11. The optimal values V^*(x) are shown by the solid lines, and the values V^{π_1}(x) of the KNS policy π_1 are shown by the dashed lines.
into account. An integer program is formulated that maximizes the daily profit, which consists of revenue
per unit delivered, transportation costs, inventory holding costs, and shortage costs. The integer program
determines an assignment of vehicles to customers for each day. As the process evolves, data are collected
and are used to modify the rewards and costs for the next day. Unsatisfied demand at a customer in one
day causes an increased reward per unit delivered for that customer the next day. The integer program
that forms the basis of the CBW approach is given in Appendix A. The second policy used for comparison is the myopic policy π_0 described at the beginning of Section 7.2, which is based on a combination of Algorithm 3 and Algorithm 4 to compute the expected shortage penalty and to choose an action.
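The feedback mechanism of the CBW approach can be sketched as follows. This is a hypothetical illustration of the idea only: the update rule, the multiplicative factor, and the function name are assumptions, not the exact adjustment used by Chien, Balakrishnan and Wong (1989).

```python
def adjust_rewards(rewards, shortages, bump=1.1):
    # rewards:   {customer: per-unit delivery reward used in the daily model}
    # shortages: {customer: unsatisfied demand observed the previous day}
    # A customer with unsatisfied demand gets an increased per-unit reward,
    # making deliveries to it more attractive in the next day's integer program.
    return {c: r * bump if shortages.get(c, 0) > 0 else r
            for c, r in rewards.items()}
```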
We also compare two variants of our policy. The first variant is the policy introduced in Section 5.1 and
specifically given in (7), which uses the decomposition approximation given in (6). The second variant is
the policy introduced in Section 5.3, which uses a combination of the decomposition approximation and a
parametric approximation given in (16). The first variant can be considered a special case of the second variant with parameters β_0 = 0 and β_n = 1 for all other n (we denote the first variant by KNS (before simulation)). The second variant was obtained after two policy improvement iterations. During each policy evaluation phase, parameters were estimated by simulating the IRPDD process for 10^4 steps using the Bellman error method, followed by 10^8 steps of stochastic approximation (we denote the second variant by KNS (after simulation)). In both these variants, a combination of random sampling with common random numbers and
the Bose-Bush orthogonal array design with level 9 and frequency 3, and a combination of Algorithm 3 and
Algorithm 4 were used to choose the decision in each time period.
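The Bellman error estimation step mentioned above can be sketched for a generic linear parameterization. Everything here is a simplified stand-in: the feature map phi, the constant step size, and the plain semi-gradient update are illustrative assumptions, not the exact form of (16) or of the stochastic approximation scheme used in the paper.

```python
def fit_beta(trajectory, phi, beta, alpha, step=0.01):
    # trajectory: list of (state, reward, next_state) transitions obtained
    # by simulating the process under the current policy.
    # Approximation: V_tilde(x) = sum_n beta[n] * phi(x)[n].
    for x, r, x_next in trajectory:
        v = sum(b * f for b, f in zip(beta, phi(x)))
        v_next = sum(b * f for b, f in zip(beta, phi(x_next)))
        delta = r + alpha * v_next - v                  # Bellman residual
        # stochastic (semi-)gradient step that reduces delta**2
        beta = [b + step * delta * f for b, f in zip(beta, phi(x))]
    return beta
```

For a single constant feature and a constant one-period reward r, the iterates converge to the discounted fixed point r/(1 − α), which is a quick sanity check on the update.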
The Gauss-Seidel policy evaluation algorithm used to compute the value functions of policies for smaller
instances cannot be used for larger instances. The main reason for this is that the number of states becomes
too large, and hence the available computer memory is not sufficient to store the values of all the states,
and the computation time becomes excessive. For larger instances, the policies were evaluated by randomly
choosing five initial states, and then simulating the IRPDD process under the different policies starting from
the chosen initial states. Each replication produced a sample path over a relatively long but finite time
horizon of 800 time periods. The length of the time horizon was chosen to bound the discounted truncation
error to less than 0.01 (less than 0.1%). Six sample paths were generated for each combination of policy and initial state, for each problem instance. The sample mean µ and the standard deviation σ of the sample mean over the six sample paths, as well as the interval (µ − 2σ, µ + 2σ), were computed.
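This evaluation procedure can be sketched as follows, with a hypothetical policy and transition model standing in for the IRPDD simulator. Truncating at a finite horizon T leaves a discounted tail bounded by α^T · r_max/(1 − α), which is what motivates the choice of 800 periods above.

```python
import math
import random
import statistics

def discounted_return(policy, x0, step, alpha=0.98, horizon=800, seed=0):
    # step(x, a, rng) -> (reward, next_state); one sample path of length
    # `horizon`, long enough that the neglected discounted tail is small.
    x, total, rng = x0, 0.0, random.Random(seed)
    for t in range(horizon):
        r, x = step(x, policy(x), rng)
        total += (alpha ** t) * r
    return total

def summarize(returns):
    # Sample mean and estimated standard deviation of the sample mean,
    # reported as mu, sigma, and the interval (mu - 2*sigma, mu + 2*sigma).
    mu = statistics.mean(returns)
    sigma = statistics.stdev(returns) / math.sqrt(len(returns))
    return mu, sigma, mu - 2 * sigma, mu + 2 * sigma
```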
We conducted three experiments to evaluate the quality of the four policies on larger instances. In each of
these experiments, we varied a single instance characteristic and observed the impact on the performance of
the policies. The three instance characteristics are (1) the number of customers, (2) the number of vehicles,
and (3) the coefficient of variation of customer demand.
To study the impact of the number of customers on the performance of the policies, the instances were
generated so that larger instances have more customers with the same characteristics as the smaller instances.
Hence, customer characteristics as well as the ratio of delivery capacity to total expected demand were kept
the same for all instances. Table 5 shows the performance of the policies on instances with different numbers
of customers.
The results clearly demonstrate that the KNS policies consistently outperform the other policies. Fur-
thermore, the difference in quality appears to increase with the number of customers. Apparently, when the
number of customers becomes larger, the KNS policies are better at coordinating deliveries than the other
policies. Also, observe that while KNS (before simulation) gives good results, the results from KNS (after
simulation) are better.
Next, we studied the impact of the number of vehicles, and thus the delivery capacity available, on the
performance of the policies. The numbers of vehicles were chosen in such a way that we could study the
effectiveness of the policies when the available delivery capacity is smaller than the total expected demand,
as well as when there is surplus delivery capacity. The results are given in Table 6.
Intuitively, it is clear that when the delivery capacity is very restrictive, i.e., the number of vehicles is
small, then it is more important to use the available capacity wisely. The results show the superiority of the
KNS policies in handling these situations. The differences in quality are much larger for tightly constrained
instances than for loosely constrained instances.
Finally, we studied the impact of the customer demand coefficient of variation on the performance of
the policies. The customer demand distributions for the six instances were selected so that the demand
distribution was the same for all customers in an instance, and the expected customer demand for each of
the instances was 4.5. We varied the distributions so that the customer demands had different variances,
namely 0.25, 4.65, 8.85, 12.85, 15.25 and 17.05. All other characteristics were exactly the same for the
instances. The results are given in Table 7.
The results show that when the coefficients of variation of customer demands are large and it becomes
Table 5: Performance of policies on instances with different numbers of customers. Each instance has five rows, one per randomly chosen initial state.

            |       CBW (1989)        |         Myopic          | KNS (before simulation) | KNS (after simulation)
Instance  N |   µ    σ   µ−2σ   µ+2σ  |   µ    σ   µ−2σ   µ+2σ  |   µ    σ   µ−2σ   µ+2σ  |   µ    σ   µ−2σ   µ+2σ
cst1     10 | -6.92 0.11 -7.13 -6.70 | -6.17 0.04 -6.24 -6.09 | -4.31 0.07 -4.45 -4.18 | -3.53 0.16 -3.84 -3.21
            | -7.16 0.12 -7.39 -6.92 | -6.36 0.05 -6.47 -6.26 | -4.48 0.08 -4.65 -4.32 | -3.49 0.18 -3.85 -3.12
            | -6.97 0.11 -7.20 -6.74 | -6.24 0.05 -6.34 -6.13 | -4.54 0.08 -4.69 -4.38 | -3.42 0.12 -3.67 -3.18
            | -6.98 0.11 -7.20 -6.76 | -6.11 0.05 -6.22 -6.00 | -4.24 0.08 -4.41 -4.07 | -3.50 0.09 -3.68 -3.32
            | -7.19 0.09 -7.37 -7.01 | -6.37 0.05 -6.46 -6.27 | -4.55 0.06 -4.66 -4.43 | -3.70 0.17 -4.04 -3.36
cst2     20 | -13.35 0.10 -13.55 -13.15 | -12.30 0.10 -12.50 -12.11 | -9.10 0.24 -9.57 -8.63 | -7.38 0.16 -7.69 -7.06
            | -12.99 0.09 -13.17 -12.81 | -12.41 0.14 -12.69 -12.13 | -9.09 0.16 -9.41 -8.77 | -7.37 0.16 -7.70 -7.05
            | -13.46 0.16 -13.79 -13.14 | -13.00 0.10 -13.20 -12.79 | -9.64 0.07 -9.78 -9.49 | -7.49 0.05 -7.58 -7.39
            | -13.20 0.06 -13.33 -13.07 | -12.55 0.08 -12.70 -12.39 | -9.16 0.12 -9.40 -8.93 | -7.65 0.11 -7.86 -7.43
            | -13.23 0.12 -13.48 -12.98 | -12.87 0.16 -13.19 -12.56 | -9.40 0.16 -9.72 -9.08 | -7.33 0.14 -7.62 -7.04
cst3     30 | -19.45 0.09 -19.63 -19.26 | -18.91 0.15 -19.21 -18.61 | -13.91 0.15 -14.20 -13.62 | -11.20 0.14 -11.48 -10.92
            | -19.43 0.06 -19.56 -19.31 | -18.64 0.13 -18.89 -18.38 | -13.99 0.16 -14.30 -13.67 | -10.99 0.31 -11.62 -10.36
            | -19.73 0.23 -20.20 -19.26 | -19.25 0.29 -19.83 -18.67 | -13.80 0.26 -14.31 -13.29 | -11.31 0.17 -11.65 -10.96
            | -19.32 0.08 -19.47 -19.17 | -18.47 0.12 -18.70 -18.23 | -13.63 0.20 -14.02 -13.23 | -11.28 0.30 -11.88 -10.67
            | -19.63 0.05 -19.73 -19.52 | -19.37 0.20 -19.77 -18.97 | -14.37 0.14 -14.65 -14.09 | -11.34 0.21 -11.76 -10.92
cst4     40 | -25.30 0.08 -25.47 -25.14 | -24.18 0.23 -24.65 -23.71 | -17.34 0.20 -17.74 -16.94 | -12.77 0.26 -13.30 -12.24
            | -25.54 0.12 -25.79 -25.29 | -24.52 0.26 -25.04 -24.00 | -16.75 0.25 -17.25 -16.26 | -12.43 0.11 -12.64 -12.22
            | -25.60 0.07 -25.74 -25.46 | -24.69 0.22 -25.12 -24.25 | -17.09 0.30 -17.68 -16.50 | -13.07 0.15 -13.36 -12.77
            | -25.34 0.06 -25.47 -25.22 | -24.73 0.16 -25.04 -24.41 | -17.42 0.20 -17.82 -17.02 | -12.95 0.18 -13.31 -12.58
            | -25.35 0.07 -25.49 -25.21 | -24.31 0.21 -24.73 -23.90 | -16.91 0.23 -17.36 -16.45 | -12.77 0.20 -13.17 -12.38
cst5     50 | -31.55 0.22 -31.99 -31.12 | -31.38 0.33 -32.05 -30.71 | -23.12 0.42 -23.95 -22.28 | -18.59 0.33 -19.26 -17.92
            | -31.70 0.11 -31.92 -31.48 | -30.64 0.24 -31.11 -30.16 | -23.65 0.28 -24.22 -23.08 | -18.83 0.41 -19.65 -18.00
            | -31.65 0.13 -31.91 -31.38 | -30.77 0.28 -31.33 -30.20 | -23.23 0.34 -23.92 -22.54 | -18.22 0.26 -18.75 -17.69
            | -31.60 0.21 -32.02 -31.18 | -31.20 0.35 -31.91 -30.49 | -23.19 0.40 -23.99 -22.39 | -18.31 0.27 -18.85 -17.77
            | -31.78 0.12 -32.01 -31.55 | -30.78 0.21 -31.20 -30.37 | -23.84 0.47 -24.77 -22.91 | -19.11 0.33 -19.77 -18.45
cst6     60 | -37.21 0.19 -37.58 -36.83 | -35.92 0.29 -36.50 -35.35 | -26.79 0.26 -27.31 -26.28 | -21.98 0.33 -22.63 -21.32
            | -37.08 0.26 -37.59 -36.56 | -35.72 0.26 -36.25 -35.20 | -26.59 0.22 -27.04 -26.14 | -21.34 0.33 -22.01 -20.67
            | -37.84 0.19 -38.21 -37.47 | -37.40 0.58 -38.55 -36.25 | -26.91 0.18 -27.26 -26.55 | -21.69 0.33 -22.35 -21.02
            | -37.47 0.32 -38.10 -36.84 | -36.03 0.38 -36.79 -35.26 | -27.05 0.19 -27.42 -26.68 | -21.49 0.49 -22.47 -20.50
            | -37.24 0.20 -37.63 -36.84 | -35.87 0.29 -36.45 -35.29 | -27.29 0.26 -27.82 -26.76 | -21.24 0.45 -22.15 -20.33
Table 6: Performance of policies on instances with different numbers of vehicles. Each instance has five rows, one per randomly chosen initial state.

            |       CBW (1989)        |         Myopic          | KNS (before simulation) | KNS (after simulation)
Instance  M |   µ    σ   µ−2σ   µ+2σ  |   µ    σ   µ−2σ   µ+2σ  |   µ    σ   µ−2σ   µ+2σ  |   µ    σ   µ−2σ   µ+2σ
veh1      6 | -91.69 0.27 -92.22 -91.16 | -89.20 0.26 -89.73 -88.67 | -74.17 0.21 -74.58 -73.76 | -68.83 0.28 -69.38 -68.28
            | -92.08 0.23 -92.54 -91.62 | -90.15 0.22 -90.58 -89.71 | -74.15 0.38 -74.92 -73.39 | -68.97 0.31 -69.59 -68.35
            | -91.43 0.29 -92.01 -90.85 | -90.49 0.20 -90.90 -90.08 | -74.32 0.33 -74.99 -73.65 | -68.72 0.31 -69.33 -68.10
            | -90.89 0.23 -91.36 -90.43 | -89.58 0.12 -89.83 -89.34 | -74.53 0.35 -75.24 -73.83 | -68.54 0.46 -69.47 -67.62
            | -91.16 0.32 -91.79 -90.52 | -90.36 0.14 -90.64 -90.08 | -74.32 0.40 -75.12 -73.52 | -68.52 0.35 -69.23 -67.82
veh2      8 | -57.70 0.22 -58.13 -57.27 | -56.00 0.16 -56.32 -55.69 | -44.38 0.22 -44.82 -43.94 | -40.70 0.20 -41.11 -40.29
            | -58.32 0.21 -58.74 -57.89 | -56.17 0.12 -56.41 -55.92 | -44.82 0.24 -45.31 -44.34 | -40.61 0.45 -41.52 -39.71
            | -57.96 0.29 -58.55 -57.37 | -56.18 0.08 -56.35 -56.01 | -44.23 0.30 -44.83 -43.63 | -40.64 0.34 -41.32 -39.96
            | -57.96 0.21 -58.39 -57.53 | -56.28 0.17 -56.61 -55.94 | -43.91 0.30 -44.50 -43.31 | -40.36 0.40 -41.15 -39.56
            | -57.50 0.20 -57.90 -57.10 | -56.13 0.14 -56.41 -55.86 | -44.36 0.20 -44.75 -43.96 | -41.43 0.24 -41.91 -40.94
veh3     10 | -43.78 0.34 -44.45 -43.11 | -42.29 0.09 -42.47 -42.10 | -32.79 0.25 -33.30 -32.29 | -28.94 0.35 -29.64 -28.24
            | -43.50 0.21 -43.92 -43.08 | -42.53 0.07 -42.68 -42.38 | -32.94 0.36 -33.66 -32.23 | -28.23 0.47 -29.16 -27.29
            | -44.22 0.23 -44.67 -43.77 | -42.42 0.17 -42.77 -42.08 | -32.60 0.25 -33.10 -32.10 | -28.45 0.33 -29.10 -27.80
            | -43.80 0.38 -44.56 -43.04 | -42.51 0.10 -42.70 -42.32 | -32.34 0.29 -32.92 -31.76 | -28.77 0.28 -29.33 -28.21
            | -43.88 0.28 -44.43 -43.32 | -41.63 0.07 -41.76 -41.50 | -32.44 0.27 -32.98 -31.91 | -28.31 0.29 -28.88 -27.73
veh4     12 | -23.99 0.34 -24.67 -23.32 | -22.69 0.12 -22.92 -22.45 | -12.40 0.44 -13.27 -11.52 | -7.77 0.23 -8.22 -7.32
            | -24.03 0.22 -24.46 -23.59 | -23.37 0.09 -23.55 -23.20 | -12.48 0.26 -12.99 -11.96 | -7.87 0.41 -8.68 -7.06
            | -23.84 0.31 -24.45 -23.23 | -22.62 0.11 -22.85 -22.40 | -11.94 0.27 -12.49 -11.40 | -7.55 0.38 -8.30 -6.80
            | -23.95 0.25 -24.44 -23.46 | -23.21 0.15 -23.51 -22.92 | -12.74 0.16 -13.05 -12.42 | -7.71 0.42 -8.55 -6.86
            | -23.66 0.17 -24.00 -23.32 | -22.62 0.08 -22.78 -22.46 | -11.98 0.38 -12.75 -11.22 | -6.92 0.28 -7.47 -6.37
veh5     14 | -3.98 0.15 -4.27 -3.68 | -3.33 0.08 -3.49 -3.17 | -1.80 0.10 -1.99 -1.61 | 0.65 0.10 0.45 0.85
            | -3.51 0.34 -4.19 -2.83 | -3.41 0.11 -3.63 -3.18 | -1.90 0.22 -2.35 -1.46 | 0.83 0.08 0.68 0.98
            | -3.41 0.22 -3.85 -2.98 | -3.44 0.09 -3.61 -3.27 | -1.54 0.25 -2.05 -1.03 | 0.63 0.11 0.41 0.84
            | -3.71 0.28 -4.27 -3.16 | -3.69 0.07 -3.84 -3.54 | -2.39 0.20 -2.80 -1.98 | 0.40 0.07 0.26 0.54
            | -4.03 0.22 -4.47 -3.60 | -3.18 0.13 -3.44 -2.93 | -2.11 0.15 -2.40 -1.81 | 0.60 0.10 0.40 0.79
veh6     16 | -0.95 0.34 -1.63 -0.28 | -0.81 0.05 -0.92 -0.71 | 2.42 0.27 1.88 2.96 | 3.44 0.18 3.08 3.79
            | -1.13 0.15 -1.44 -0.83 | -1.00 0.04 -1.09 -0.91 | 2.40 0.19 2.02 2.78 | 3.48 0.07 3.34 3.61
            | -1.28 0.17 -1.61 -0.95 | -0.72 0.04 -0.80 -0.65 | 2.50 0.35 1.80 3.20 | 3.45 0.09 3.27 3.63
            | -1.30 0.22 -1.74 -0.86 | -1.16 0.07 -1.31 -1.02 | 2.65 0.24 2.18 3.13 | 3.56 0.13 3.29 3.83
            | -1.26 0.24 -1.74 -0.78 | -0.69 0.07 -0.83 -0.56 | 2.27 0.31 1.64 2.89 | 3.46 0.11 3.24 3.67
Table 7: Performance of policies on instances with different coefficients of variation. Each instance has five rows, one per randomly chosen initial state.

             |       CBW (1989)        |         Myopic          | KNS (before simulation) | KNS (after simulation)
Instance  CV |   µ    σ   µ−2σ   µ+2σ  |   µ    σ   µ−2σ   µ+2σ  |   µ    σ   µ−2σ   µ+2σ  |   µ    σ   µ−2σ   µ+2σ
var1    0.11 | -11.71 0.06 -11.84 -11.59 | -11.03 0.05 -11.13 -10.92 | -9.00 0.03 -9.06 -8.95 | -8.24 0.09 -8.41 -8.07
             | -11.68 0.05 -11.77 -11.58 | -10.97 0.05 -11.08 -10.87 | -8.96 0.03 -9.02 -8.89 | -8.12 0.10 -8.32 -7.91
             | -11.76 0.06 -11.88 -11.63 | -10.83 0.07 -10.97 -10.70 | -8.81 0.01 -8.84 -8.78 | -8.24 0.09 -8.41 -8.06
             | -11.56 0.08 -11.71 -11.40 | -10.78 0.05 -10.88 -10.67 | -8.70 0.03 -8.77 -8.64 | -8.26 0.06 -8.39 -8.13
             | -11.43 0.04 -11.51 -11.35 | -10.63 0.07 -10.77 -10.48 | -8.61 0.03 -8.67 -8.55 | -8.45 0.05 -8.56 -8.35
var2    0.48 | -18.78 0.14 -19.06 -18.50 | -17.88 0.11 -18.09 -17.67 | -15.99 0.24 -16.47 -15.50 | -15.13 0.02 -15.17 -15.09
             | -18.59 0.17 -18.92 -18.26 | -17.77 0.14 -18.06 -17.49 | -15.90 0.27 -16.45 -15.35 | -15.08 0.04 -15.16 -15.00
             | -18.74 0.20 -19.13 -18.35 | -17.88 0.16 -18.20 -17.57 | -16.03 0.29 -16.60 -15.45 | -15.10 0.06 -15.21 -14.98
             | -18.70 0.12 -18.95 -18.45 | -17.75 0.13 -18.01 -17.50 | -15.88 0.36 -16.61 -15.16 | -15.11 0.05 -15.22 -15.00
             | -18.37 0.22 -18.81 -17.93 | -17.71 0.13 -17.96 -17.45 | -15.85 0.24 -16.33 -15.37 | -15.13 0.03 -15.19 -15.07
var3    0.66 | -23.11 0.27 -23.65 -22.57 | -22.91 0.27 -23.45 -22.37 | -21.23 0.14 -21.51 -20.96 | -20.55 0.05 -20.65 -20.44
             | -23.23 0.19 -23.60 -22.85 | -23.16 0.17 -23.50 -22.81 | -21.50 0.26 -22.02 -20.97 | -20.41 0.07 -20.55 -20.27
             | -23.25 0.26 -23.76 -22.73 | -23.16 0.17 -23.50 -22.82 | -21.45 0.21 -21.86 -21.03 | -20.37 0.06 -20.49 -20.25
             | -22.94 0.29 -23.51 -22.36 | -22.80 0.17 -23.15 -22.45 | -21.12 0.27 -21.65 -20.59 | -20.35 0.05 -20.44 -20.26
             | -22.95 0.23 -23.42 -22.48 | -22.78 0.17 -23.11 -22.44 | -21.11 0.26 -21.62 -20.59 | -20.51 0.06 -20.63 -20.39
var4    0.80 | -22.57 0.41 -23.38 -21.75 | -22.47 0.35 -23.18 -21.77 | -21.17 0.18 -21.53 -20.82 | -20.58 0.06 -20.69 -20.46
             | -22.94 0.45 -23.83 -22.05 | -22.63 0.34 -23.31 -21.94 | -21.32 0.31 -21.95 -20.69 | -20.72 0.05 -20.82 -20.62
             | -23.03 0.43 -23.89 -22.17 | -22.77 0.37 -23.51 -22.03 | -21.41 0.12 -21.64 -21.17 | -20.70 0.03 -20.77 -20.63
             | -22.32 0.38 -23.08 -21.57 | -22.19 0.33 -22.85 -21.53 | -20.81 0.37 -21.56 -20.06 | -20.59 0.06 -20.70 -20.47
             | -22.44 0.41 -23.25 -21.63 | -22.17 0.35 -22.88 -21.47 | -20.85 0.17 -21.19 -20.52 | -20.48 0.01 -20.51 -20.46
var5    0.87 | -22.75 0.33 -23.41 -22.08 | -22.63 0.28 -23.20 -22.07 | -21.65 0.26 -22.16 -21.13 | -21.28 0.05 -21.38 -21.18
             | -22.61 0.25 -23.11 -22.10 | -22.51 0.14 -22.80 -22.23 | -21.53 0.28 -22.09 -20.98 | -21.28 0.05 -21.39 -21.17
             | -22.57 0.31 -23.20 -21.95 | -22.55 0.20 -22.96 -22.15 | -21.57 0.21 -21.98 -21.16 | -21.38 0.04 -21.47 -21.30
             | -22.72 0.30 -23.33 -22.12 | -22.24 0.16 -22.56 -21.93 | -21.22 0.18 -21.57 -20.87 | -21.26 0.04 -21.35 -21.17
             | -22.39 0.39 -23.16 -21.62 | -22.13 0.19 -22.51 -21.76 | -21.15 0.19 -21.53 -20.76 | -21.41 0.05 -21.52 -21.30
var6    0.92 | -21.50 0.27 -22.03 -20.97 | -21.13 0.22 -21.57 -20.68 | -20.40 0.31 -21.03 -19.77 | -20.03 0.04 -20.10 -19.95
             | -21.94 0.25 -22.44 -21.44 | -21.35 0.19 -21.72 -20.98 | -20.60 0.24 -21.08 -20.13 | -20.01 0.06 -20.13 -19.89
             | -21.89 0.33 -22.56 -21.23 | -21.53 0.23 -22.00 -21.06 | -20.75 0.31 -21.38 -20.12 | -19.97 0.04 -20.05 -19.89
             | -21.24 0.35 -21.95 -20.53 | -20.92 0.24 -21.40 -20.45 | -20.20 0.32 -20.85 -19.56 | -19.97 0.04 -20.06 -19.89
             | -21.27 0.32 -21.92 -20.62 | -20.97 0.23 -21.43 -20.51 | -20.27 0.34 -20.96 -19.58 | -19.94 0.03 -20.01 -19.87
less clear what the future is going to bring, the differences in quality between the KNS policies and the other policies tend to be smaller, although the KNS policies still do better on every instance. This is as expected: carefully taking the available information about the future into account, as the dynamic programming approximation methods do, provides more benefit when that information is more accurate.
Overall, the computational experiments conducted demonstrate the viability of using dynamic program-
ming approximation methods for the IRPDD.
8 Further Work
An important extension of our work involves routing vehicles to more than one customer on a delivery route.
This version of the IRP is much harder than the IRPDD, since the optimization problem on the right hand
side of (2) is much harder for the IRP than for the IRPDD. In the case of the IRP, this optimization problem
involves solving both a vehicle routing problem, which is NP-hard, as well as determining the optimal
quantities to be delivered to each customer on a delivery route, which involves solving an optimization
problem with a nonunimodal objective function, as shown in Campbell et al. (1998). An approach for the
stochastic IRP in which more than one customer can be visited on a delivery route was proposed in Kleywegt,
Nori and Savelsbergh (2000).
Other issues that have to be addressed before IRPs can be solved in practice include the estimation of the problem parameters from data. These include the revenues and costs, as well as the demand distributions. Estimating these parameters from noisy data leads to hard statistical and optimization problems. It is surprising how little work has been done in this area, since it is clear that the estimation of problem parameters from data is an essential activity for the formulation and solution of practical optimization problems such as the IRP.
References
Anily, S. and Federgruen, A. 1990. One Warehouse Multiple Retailer Systems with Vehicle Routing Costs. Management Science, 36, 92–114.
Anily, S. and Federgruen, A. 1993. Two-Echelon Distribution Systems with Vehicle Routing Costs and Central Inventories. Operations Research, 41, 37–47.
Bard, J., Huang, L., Dror, M. and Jaillet, P. 1997. A Branch and Cut Algorithm for the VRP with Satellite Facilities. Unpublished manuscript.
Barnes-Schuster, D. and Bassok, Y. 1997. Direct Shipping and the Dynamic Single-depot/Multi-retailer Inventory System. European Journal of Operational Research, 101, 509–518.
Bassok, Y. and Ernst, R. 1995. Dynamic Allocations for Multi-Product Distribution. Transportation Science, 29, 256–266.
Bell, W., Dalberto, L., Fisher, M., Greenfield, A., Jaikumar, R., Kedia, P., Mack, R. and Prutzman, P. 1983. Improving the Distribution of Industrial Gases with an On-Line Computerized Routing and Scheduling Optimizer. Interfaces, 13, 4–23.
Benjamin, J. 1989. An Analysis of Inventory and Transportation Costs in a Constrained Network. Transportation Science, 23, 177–183.
Berman, O. and Larson, R. C. 1999. Deliveries in an Inventory/Routing Problem Using Stochastic Dynamic Programming. Technical report, Massachusetts Institute of Technology, Cambridge, MA.
Bertsekas, D. P. 1975. Convergence of Discretization Procedures in Dynamic Programming. IEEE Transactions on Automatic Control, 20, 415–419.
Bertsekas, D. P. 1995. Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA.
Bertsekas, D. P. and Shreve, S. E. 1978. Stochastic Optimal Control: The Discrete Time Case. Academic Press, New York, NY.
Bertsekas, D. P. and Tsitsiklis, J. N. 1996. Neuro-Dynamic Programming. Athena Scientific, New York, NY.
Blumenfeld, D. E., Burns, L. D. and Daganzo, C. F. 1991. Synchronizing Production and Transportation Schedules. Transportation Research, 25B, 23–37.
Blumenfeld, D. E., Burns, L. D., Diltz, J. D. and Daganzo, C. F. 1985. Analyzing Trade-offs between Transportation, Inventory and Production Costs on Freight Networks. Transportation Research, 19B, 361–380.
Bose, R. C. 1938. On the Application of the Theory of Galois Fields to the Problem of Construction of Hyper-Graeco-Latin Squares. Sankhya, 3, 323–338.
Bose, R. C. and Bush, K. A. 1952. Orthogonal Arrays of Strength Two and Three. The Annals of Mathematical Statistics, 23, 508–524.
Burns, L. D., Hall, R. W., Blumenfeld, D. E. and Daganzo, C. F. 1985. Distribution Strategies that Minimize Transportation and Inventory Costs. Operations Research, 33, 469–490.
Campbell, A., Clarke, L., Kleywegt, A. J. and Savelsbergh, M. W. P. 1998. The Inventory Routing Problem. In Fleet Management and Logistics. T. G. Crainic and G. Laporte (editors). Kluwer Academic Publishers, Dordrecht, Netherlands, chapter 4.
Cetinkaya, S. and Lee, C. Y. 2000. Stock Replenishment and Shipment Scheduling for Vendor Managed Inventory Systems. Management Science, 46, 217–232.
Chandra, P. and Fisher, M. 1994. Coordination of Production and Distribution Planning. European Journal of Operational Research, 72, 503–517.
Chan, L. M. A., Federgruen, A. and Simchi-Levi, D. 1998. Probabilistic Analysis and Practical Algorithms for Inventory-Routing Models. Operations Research, 46, 96–106.
Chien, T. W. 1993. Determining Profit-Maximizing Production/Shipping Policies in a One-to-One Direct Shipping, Stochastic Demand Environment. European Journal of Operational Research, 64, 83–102.
Chien, T. W., Balakrishnan, A. and Wong, R. T. 1989. An Integrated Inventory Allocation and Vehicle Routing Problem. Transportation Science, 23, 67–76.
Chow, C. S. and Tsitsiklis, J. N. 1991. An Optimal One-Way Multigrid Algorithm for Discrete-Time Stochastic Control. IEEE Transactions on Automatic Control, AC-36, 898–914.
Christiansen, M. 1999. Decomposition of a Combined Inventory and Time Constrained Ship Routing Problem. Transportation Science, 33, 3–16.
Christiansen, M. and Nygreen, B. 1998a. A Method for Solving Ship Routing Problems with Inventory Constraints. Annals of Operations Research, 81, 357–378.
Christiansen, M. and Nygreen, B. 1998b. Modelling Path Flows for a Combined Ship Routing and Inventory Management Problem. Annals of Operations Research, 82, 391–412.
Cohen, M. A. and Lee, H. L. 1988. Strategic Analysis of Integrated Production-Distribution Systems: Models and Methods. Operations Research, 36, 216–228.
Dror, M. and Ball, M. 1987. Inventory/Routing: Reduction from an Annual to a Short Period Problem. Naval Research Logistics Quarterly, 34, 891–905.
Dror, M., Ball, M. and Golden, B. 1985. A Computational Comparison of Algorithms for the Inventory Routing Problem. Annals of Operations Research, 4, 3–23.
Dror, M. and Trudeau, P. 1996. Cash Flow Optimization in Delivery Scheduling. European Journal of Operational Research, 88, 504–515.
Federgruen, A. and Simchi-Levi, D. 1995. Analysis of Vehicle Routing and Inventory-Routing Problems. In Network Routing. M. O. Ball, T. L. Magnanti, C. L. Monma and G. L. Nemhauser (editors). Vol. 8 of Handbooks in Operations Research and Management Science, North-Holland, Amsterdam, Netherlands, chapter 4, 297–373.
Federgruen, A. and Zipkin, P. 1984. A Combined Vehicle Routing and Inventory Allocation Problem. Operations Research, 32, 1019–1037.
Fumero, F. and Vercellis, C. 1999. Synchronized Development of Production, Inventory, and Distribution Schedules. Transportation Science, 33, 330–340.
Gallego, G. and Simchi-Levi, D. 1990. On the Effectiveness of Direct Shipping Strategy for the One-Warehouse Multi-Retailer R-Systems. Management Science, 36, 240–243.
Golden, B., Assad, A. and Dahl, R. 1984. Analysis of a Large Scale Vehicle Routing Problem with an Inventory Component. Large Scale Systems, 7, 181–190.
Haimovich, M. and Rinnooy Kan, A. H. G. 1985. Bounds and Heuristics for Capacitated Routing Problems. Mathematics of Operations Research, 10, 527–542.
Jaillet, P., Huang, L., Bard, J. and Dror, M. 1997. A Rolling Horizon Framework for the Inventory Routing Problem. Technical report, Department of Management Science and Information Systems, University of Texas, Austin, TX.
Kleywegt, A. J., Nori, V. S. and Savelsbergh, M. W. P. 2000. The Stochastic Inventory Routing Problem. Technical report, The Logistics Institute, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0205.
Kushner, H. J. and Dupuis, P. 1992. Numerical Methods for Stochastic Control Problems in Continuous Time. Springer-Verlag, New York, NY.
Meyn, S. P. and Tweedie, R. L. 1993. Markov Chains and Stochastic Stability. Springer-Verlag, London, Great Britain.
Minkoff, A. S. 1993. A Markov Decision Model and Decomposition Heuristic for Dynamic Vehicle Dispatching. Operations Research, 41, 77–90.
Mustard, D., Lyness, J. N. and Blatt, J. M. 1963. Numerical Quadrature in n Dimensions. The Computer Journal, 6, 75–87.
Nahmias, S. and Smith, S. A. 1994. Optimizing Inventory Levels in a Two-echelon Retailer System with Partial Lost Sales. Management Science, 40, 582–596.
Nelson, B. L. and Matejcik, F. J. 1995. Using Common Random Numbers for Indifference-zone Selection and Multiple Comparisons in Simulation. Management Science, 41, 1935–1945.
Puterman, M. L. 1994. Markov Decision Processes. John Wiley & Sons, Inc., New York, NY.
Pyke, D. F. and Cohen, M. A. 1993. Performance Characteristics of Stochastic Integrated Production-Distribution Systems. European Journal of Operational Research, 68, 23–48.
Reiman, M. I., Rubio, R. and Wein, L. M. 1999. Heavy Traffic Analysis of the Dynamic Stochastic Inventory-Routing Problem. Transportation Science, 33, 361–380.
Ruszczynski, A. and Syski, W. 1986. A Method of Aggregate Stochastic Subgradients with On-Line Stepsize Rules for Convex Stochastic Programming Problems. Mathematical Programming Study, 28, 113–131.
Stroud, A. H. 1971. Approximate Calculation of Multiple Integrals. Prentice Hall, Englewood Cliffs, NJ.
Thomas, D. J. and Griffin, P. M. 1996. Coordinated Supply Chain Management. European Journal of Operational Research, 94, 1–15.
Trudeau, P. and Dror, M. 1992. Stochastic Inventory Routing: Route Design with Stockouts and Route Failures. Transportation Science, 26, 171–184.
Tsitsiklis, J. N. and Van Roy, B. 1997. An Analysis of Temporal-Difference Learning with Function Approximation. IEEE Transactions on Automatic Control, 42, 674–690.
Van Roy, B., Bertsekas, D. P., Lee, Y. and Tsitsiklis, J. N. 1997. A Neuro-dynamic Programming Approach to Retailer Inventory Management. Technical report, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA.
Appendices
A CBW Formulation
In this appendix, we present a slightly modified version of the method proposed by Chien, Balakrishnan,
and Wong (1989) (CBW), adapted for the IRPDD. At the start of each day, an integer program is solved
to determine the vehicle assignments for that day. The parameters and variables of the integer program are
given below.
n Number of customers
ri Revenue earned per unit delivered to customer i
ci Round-trip travel cost between depot and customer i
hi Holding cost per unit stored in inventory at customer i
pi Penalty per unit short at customer i
Ci Storage capacity of customer i
Xi Initial inventory at customer i
Di Estimate of the demand of customer i
m Number of vehicles
CV Vehicle capacity (for all vehicles)
dij Quantity delivered at customer i by vehicle j
yij 1 if vehicle j is assigned to customer i, 0 otherwise
δi Upper bound on the final inventory at customer i
ηi Upper bound on the shortage at customer i
The integer program is given below.
\[
\max \;\; \sum_{i,j} r_i d_{ij} \;-\; \sum_{i,j} c_i y_{ij} \;-\; \frac{1}{2} \sum_i h_i \Bigl( X_i + \sum_j d_{ij} + \delta_i \Bigr) \;-\; \alpha \sum_i p_i \eta_i \tag{20}
\]
subject to
\[
\begin{aligned}
& \sum_j d_{ij} \le C_i - X_i && \forall i && (21) \\
& d_{ij} - C_V \, y_{ij} \le 0 && \forall i, j && (22) \\
& \sum_i y_{ij} \le 1 && \forall j && (23) \\
& X_i + \sum_j d_{ij} - D_i \le \delta_i && \forall i && (24) \\
& -\Bigl( X_i + \sum_j d_{ij} - D_i \Bigr) \le \eta_i && \forall i && (25) \\
& \delta_i, \ \eta_i, \ d_{ij} \ge 0 && \forall i, j \\
& y_{ij} \in \{0, 1\} && \forall i, j
\end{aligned}
\]
Constraints (21) ensure that the total amount of product delivered to a customer does not exceed the
customer's remaining storage capacity. Constraints (22) ensure that the amount of product delivered to a
customer by a single vehicle is no more than the vehicle capacity. Constraints (23) ensure that a vehicle
is assigned to at most one customer. Constraints (24) and (25) determine, for each customer, the final
inventory or shortage at the end of the day. The inventory at the end of the day is computed as
$\max\{0, X_i + \sum_j d_{ij} - D_i\}$, where $D_i$ is taken to be the maximum demand, as suggested by CBW.
Likewise, the shortage is computed as $\max\{0, D_i - X_i - \sum_j d_{ij}\}$. Note that by this choice of
$D_i$, the holding costs are underestimated and the shortage costs are overestimated. This may result in a
conservative, low-risk policy.
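Since $\delta_i$ and $\eta_i$ are penalized in the objective, at an optimal solution they equal the final inventory and the shortage, respectively, so constraints (24) and (25) linearize the two max expressions. A minimal sketch in Python (not from the paper; the numbers below are hypothetical illustration values):

```python
def end_of_day(X_i, deliveries, D_i):
    """Final inventory and shortage for one customer after one day.

    X_i: initial inventory, deliveries: quantities d_ij delivered by
    each vehicle, D_i: demand estimate (the maximum demand in CBW).
    """
    net = X_i + sum(deliveries) - D_i
    inventory = max(0, net)   # value of delta_i at an optimal solution
    shortage = max(0, -net)   # value of eta_i at an optimal solution
    return inventory, shortage

# Example: 3 units on hand, 4 delivered, demand estimate 9.
inv, short = end_of_day(3, [4], 9)
print(inv, short)  # -> 0 2: no inventory left, a shortage of 2 units
```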
The objective function consists of four parts: the revenue earned, the transportation cost, the inventory
holding cost, and the shortage cost. As proposed by CBW, the revenue earned per unit is $r_i + p_i$ or
$r_i$, depending on whether or not there was a shortage in the previous period. Their model has been
modified slightly by incorporating a linear inventory holding cost, given by half the sum of the inventory
after delivery and the inventory at the end of the day, multiplied by the per-unit holding cost. We have
also assumed that shortages occur at the end of the day and are discounted at rate $\alpha$ to the beginning
of the day. Finally, it is assumed that the depot has an unlimited supply of the product.
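To make the model concrete, it can be solved by brute force on a tiny hypothetical instance (the data below are illustrative and are not one of the paper's test instances). For a fixed vehicle assignment the problem separates by customer, and the per-customer profit is piecewise linear, so enumerating assignments and integer delivery quantities is sufficient for integral data:

```python
from itertools import product

# Hypothetical instance: n = 2 customers, m = 1 vehicle.
n, m = 2, 1
CV = 5                  # vehicle capacity
alpha = 0.98            # one-day discount factor for shortage costs
r = [8, 10]             # revenue per unit delivered
c = [3, 4]              # round-trip travel cost
h = [1, 1]              # holding cost per unit
p = [5, 6]              # shortage penalty per unit
C = [10, 10]            # storage capacity
X = [2, 1]              # initial inventory
D = [6, 5]              # demand estimate (maximum demand)

best_value, best_plan = float("-inf"), None
# Each vehicle is assigned to at most one customer (constraints (23)).
for assign in product([None] + list(range(n)), repeat=m):
    cost = sum(c[i] for i in assign if i is not None)
    # Deliverable quantity per customer under constraints (21)-(22).
    cap = [min(C[i] - X[i], CV * sum(1 for a in assign if a == i))
           for i in range(n)]
    value, q = -cost, [0] * n
    for i in range(n):
        def profit(qi):
            delta = max(0, X[i] + qi - D[i])   # final inventory
            eta = max(0, D[i] - X[i] - qi)     # shortage
            return (r[i] * qi - 0.5 * h[i] * (X[i] + qi + delta)
                    - alpha * p[i] * eta)
        q[i] = max(range(cap[i] + 1), key=profit)
        value += profit(q[i])
    if value > best_value:
        best_value, best_plan = value, (assign, tuple(q))

print(best_plan, round(best_value, 2))  # -> ((1,), (0, 5)) 21.9
```

With these numbers the single vehicle serves customer 2, delivering 5 units: its higher revenue and shortage penalty outweigh the slightly larger travel cost.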
B Instances Used in Computational Results
Table 8: Instance opt1.
i   Ci   fi:  0    1    2    3    4    5    6    7    8    9    10     ci   ri   pi   hi
1   10        0   0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1    50   80   30    5
2   10        0   0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1    50   80   30    5
n = 2, m = 2, CV = 5, α = 0.98
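In these tables, fi appears to give the probability of each daily demand level (the column headings 0, 1, 2, … are the demand values), and ci, ri, pi and hi are the travel cost, revenue, shortage penalty and holding cost of Appendix A. As an illustration (a sketch, not part of the paper's experiments), the opt1 distribution can be used as follows:

```python
import random

# Demand distribution of instance opt1: both customers have
# fi(0) = 0 and fi(k) = 0.1 for k = 1, ..., 10.
levels = list(range(11))
probs = [0.0] + [0.1] * 10

# Expected daily demand per customer: sum of k * fi(k) = 5.5.
mean_demand = sum(k * f for k, f in zip(levels, probs))
print(mean_demand)

# Drawing one day's demand for a simulation run:
rng = random.Random(0)
demand = rng.choices(levels, weights=probs, k=1)[0]
```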
Table 9: Instance opt2.
i   Ci   fi:  0    1    2    3    4    5    6    7    8    9    10     ci   ri   pi   hi
1   10        0   0.0  0.0  0.5  0.0  0.0  0.5  0.0  0.0  0.0  0.0    10  120    0    4
2   10        0   0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1    50  140    3    0
n = 2, m = 1, CV = 8, α = 0.98
Table 10: Instance opt3.
i   Ci   fi:  0    1    2    3    4    5      ci   ri   pi   hi
1    5        0   0.2  0.1  0.1  0.5  0.1     50  100   30    2
2    5        0   0.2  0.2  0.2  0.2  0.2     20  150   60    4
3    5        0   0.0  0.4  0.0  0.0  0.6     20   60   40    1
n = 3, m = 2, CV = 5, α = 0.98
Table 11: Instance opt4.
i   Ci   fi:  0    1    2    3    4    5    6    7    8      ci   ri   pi   hi
1    8        0   0.0  0.0  0.0  0.5  0.0  0.5  0.0  0.0    120   80   30    5
2    8        0   0.0  0.0  0.5  0.0  0.0  0.0  0.5  0.0    120   80   40    5
3    8        0   0.0  0.5  0.0  0.0  0.0  0.0  0.0  0.5    120   80   20    5
n = 3, m = 3, CV = 8, α = 0.98
Table 12: Instance opt5.
i   Ci   fi:  0     1     2     3     4     5     6     7     8     9     10     ci   ri   pi   hi
1   10        0   0.11  0.07  0.25  0.10  0.05  0.05  0.10  0.04  0.06  0.17   100  180   50    5
2   10        0   0.12  0.05  0.05  0.08  0.21  0.05  0.11  0.06  0.06  0.21    70  120   50   15
3   10        0   0.09  0.08  0.24  0.06  0.04  0.01  0.24  0.01  0.15  0.08    80  150   50   25
n = 3, m = 2, CV = 10, α = 0.98
Table 13: Instance opt6.
i   Ci   fi:  0     1     2     3     4       ci   ri   pi   hi
1    4        0   0.00  0.50  0.00  0.50     100   80   50    5
2    4        0   0.25  0.50  0.00  0.25     120   80   50   10
3    4        0   0.00  0.50  0.25  0.25     120   80   50    5
4    4        0   0.50  0.00  0.00  0.50     100   80   60    4
n = 4, m = 3, CV = 4, α = 0.98
Table 14: Instance opt7.
i   Ci   fi:  0     1     2     3     4     5       ci   ri   pi   hi
1    5        0   0.01  0.33  0.32  0.03  0.31     36  134   17    7
2    5        0   0.14  0.30  0.26  0.11  0.19     81   88   39    6
3    5        0   0.04  0.44  0.22  0.28  0.02     73  101   30    5
4    5        0   0.18  0.13  0.22  0.30  0.17     53  153   36   11
n = 4, m = 3, CV = 5, α = 0.98
Table 15: Instance opt8.
i   Ci   fi:  0     1     2     3     4     5     6       ci   ri   pi   hi
1    6      0.00  0.01  0.29  0.28  0.02  0.27  0.13     63   92   21   10
2    6      0.00  0.25  0.22  0.10  0.16  0.02  0.25     56  102   32   10
3    6      0.00  0.18  0.24  0.02  0.19  0.14  0.23     30  101   30   10
4    6      0.00  0.23  0.13  0.11  0.11  0.19  0.23     73   80   30   10
n = 4, m = 4, CV = 4, α = 0.98
Table 16: Instance opt9.
i   Ci   fi:  0     1     2     3       ci   ri   pi   hi
1    3        0   0.02  0.49  0.49     64   73   30   17
2    3        0   0.07  0.63  0.30     61  130   13    6
3    3        0   0.45  0.39  0.16     56   92   35   11
4    3        0   0.37  0.05  0.58     73  102   39   16
5    3        0   0.41  0.55  0.04     66  101   31   12
n = 5, m = 3, CV = 3, α = 0.98
Table 17: Instance opt10.
i   Ci   fi:  0     1     2     3     4       ci   ri   pi   hi
1    4        0   0.01  0.48  0.47  0.04     46   88   30   11
2    4        0   0.30  0.14  0.30  0.26     91  101   36   14
3    4        0   0.18  0.30  0.04  0.48     83  153   35    8
4    4        0   0.29  0.38  0.03  0.30     63   55   33   17
5    4        0   0.16  0.27  0.36  0.21    106   97   30   15
n = 5, m = 5, CV = 4, α = 0.98
Table 18: Instance opt11.
i   Ci   fi:  0     1     2     3     4     5     6     7     8     9     10     ci   ri   pi   hi
1   10      0.00  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10   120   80   50    0
2   10      0.00  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10   120   80   50    0
3   10      0.00  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10   120   80   50    0
n = 3, m = 2, CV = 5, α = 0.98
Table 19: Instance opt12.
i   Ci   fi:  0     1     2     3     4     5      ci   ri   pi   hi
1    5      0.00  0.20  0.20  0.20  0.20  0.20    20   80   50   15
2    5      0.00  0.20  0.20  0.20  0.20  0.20    20   80   50   15
3    5      0.00  0.20  0.20  0.20  0.20  0.20    20   80   50   15
n = 3, m = 2, CV = 5, α = 0.98
Table 20: Instances cst1-cst6. The values of (n, m) are (10, 5), (20, 10), (30, 15), (40, 20), (50, 25) and (60, 30).
The customer data repeat with period 5: customer i has the parameters of row ((i - 1) mod 5) + 1 below, and Ci = 10 for all customers.
row   fi:  0    1    2    3    4    5    6    7    8    9    10     ci   ri   pi   hi
 1         0   0.5  0.5  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0    70   10   45    2
 2         0   0.0  0.0  0.5  0.5  0.0  0.0  0.0  0.0  0.0  0.0   140   20   45    2
 3         0   0.0  0.0  0.0  0.0  0.5  0.5  0.0  0.0  0.0  0.0   120   15   20    1
 4         0   0.0  0.0  0.0  0.0  0.0  0.0  0.5  0.5  0.0  0.0   160   25   30    2
 5         0   0.0  0.0  0.0  0.0  0.0  0.5  0.0  0.0  0.0  0.5    70   10   25    3
CV = 10, α = 0.98
Table 21: Instances veh1-veh6. The values of m are 6, 8, 10, 12, 14 and 16.
The 25 customers form five blocks of five identical customers, with Ci = 10 for all customers.
   i    fi:  0    1    2    3    4    5    6    7    8    9    10     ci   ri   pi   hi
 1-5         0   0.5  0.5  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0    70   10   45    2
 6-10        0   0.0  0.0  0.5  0.5  0.0  0.0  0.0  0.0  0.0  0.0   140   20   45    2
11-15        0   0.0  0.0  0.0  0.0  0.5  0.5  0.0  0.0  0.0  0.0   120   15   20    1
16-20        0   0.0  0.0  0.0  0.0  0.0  0.0  0.5  0.5  0.0  0.0   160   25   30    2
21-25        0   0.0  0.0  0.0  0.0  0.0  0.5  0.0  0.0  0.0  0.5    70   10   25    3
n = 25, CV = 10, α = 0.98
Table 22: Instance var1.
All 25 customers have Ci = 10 and the same demand distribution:
fi:  0    1    2    3    4    5    6    7    8    9    10
     0   0.0  0.0  0.0  0.5  0.5  0.0  0.0  0.0  0.0  0.0
The cost parameters repeat with period 5:
        i              ci   ri   pi   hi
 1, 6, 11, 16, 21      70   10   45    2
 2, 7, 12, 17, 22     140   20   45    2
 3, 8, 13, 18, 23     120   15   20    1
 4, 9, 14, 19, 24     160   25   30    2
 5, 10, 15, 20, 25     70   10   25    3
n = 25, m = 10, CV = 10, α = 0.98
Table 23: Instance var2.
All 25 customers have Ci = 10 and the same demand distribution:
fi:  0    1    2    3    4    5    6    7    8    9    10
     0   0.0  0.3  0.2  0.0  0.0  0.2  0.3  0.0  0.0  0.0
The cost parameters repeat with period 5:
        i              ci   ri   pi   hi
 1, 6, 11, 16, 21      70   10   45    2
 2, 7, 12, 17, 22     140   20   45    2
 3, 8, 13, 18, 23     120   15   20    1
 4, 9, 14, 19, 24     160   25   30    2
 5, 10, 15, 20, 25     70   10   25    3
n = 25, m = 10, CV = 10, α = 0.98
Table 24: Instance var3.
All 25 customers have Ci = 10 and the same demand distribution:
fi:  0    1    2    3    4    5    6    7    8    9    10
     0   0.3  0.1  0.1  0.0  0.0  0.0  0.3  0.2  0.0  0.0
The cost parameters repeat with period 5:
        i              ci   ri   pi   hi
 1, 6, 11, 16, 21      70   10   45    2
 2, 7, 12, 17, 22     140   20   45    2
 3, 8, 13, 18, 23     120   15   20    1
 4, 9, 14, 19, 24     160   25   30    2
 5, 10, 15, 20, 25     70   10   25    3
n = 25, m = 10, CV = 10, α = 0.98
Table 25: Instance var4.
All 25 customers have Ci = 10 and the same demand distribution:
fi:  0    1    2    3    4    5    6    7    8     9     10
     0   0.4  0.0  0.2  0.0  0.0  0.0  0.0  0.2  0.10  0.10
The cost parameters repeat with period 5:
        i              ci   ri   pi   hi
 1, 6, 11, 16, 21      70   10   45    2
 2, 7, 12, 17, 22     140   20   45    2
 3, 8, 13, 18, 23     120   15   20    1
 4, 9, 14, 19, 24     160   25   30    2
 5, 10, 15, 20, 25     70   10   25    3
n = 25, m = 10, CV = 10, α = 0.98
Table 26: Instance var5.
All 25 customers have Ci = 10 and the same demand distribution:
fi:  0    1    2    3    4    5    6    7    8    9    10
     0   0.5  0.0  0.0  0.0  0.2  0.0  0.0  0.0  0.0  0.3
The cost parameters repeat with period 5:
        i              ci   ri   pi   hi
 1, 6, 11, 16, 21      70   10   45    2
 2, 7, 12, 17, 22     140   20   45    2
 3, 8, 13, 18, 23     120   15   20    1
 4, 9, 14, 19, 24     160   25   30    2
 5, 10, 15, 20, 25     70   10   25    3
n = 25, m = 10, CV = 10, α = 0.98
Table 27: Instance var6.
All 25 customers have Ci = 10 and the same demand distribution:
fi:  0    1    2    3    4    5    6    7    8    9    10
     0   0.5  0.1  0.0  0.0  0.0  0.0  0.0  0.1  0.0  0.3
The cost parameters repeat with period 5:
        i              ci   ri   pi   hi
 1, 6, 11, 16, 21      70   10   45    2
 2, 7, 12, 17, 22     140   20   45    2
 3, 8, 13, 18, 23     120   15   20    1
 4, 9, 14, 19, 24     160   25   30    2
 5, 10, 15, 20, 25     70   10   25    3
n = 25, m = 10, CV = 10, α = 0.98