

Page 1: Merchant Energy Trading in a Network - University of North ...public.kenan-flagler.unc.edu/2017msom/SIGs/iFORM SIG/Nadarajah and... · the optimal policy value estimated within Monte

Merchant Energy Trading in a Network

Selvaprabu Nadarajah
College of Business Administration, University of Illinois at Chicago, 601 South Morgan Street, Chicago, IL, 60607, USA
[email protected]

Nicola Secomandi
Tepper School of Business, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213-3890, USA
[email protected]

Abstract

We formulate the merchant trading of energy in a network of storage and transport assets as a Markov decision process with uncertain energy prices, generalizing known models. Because of the intractability of our model, we develop heuristics and both lower and dual (upper) bounds on the optimal policy value estimated within Monte Carlo simulation. We achieve tractability using linear optimization, extending near optimal approximate dynamic programming techniques for the case of a single storage asset, versions of two of which are commercially available. We propose (i) a generalization of a deterministic reoptimization heuristic, (ii) an iterative version of the least squares Monte Carlo approach, and (iii) a perfect information dual bound. We apply our methods to a set of realistic natural gas instances. The combination of our reoptimization heuristic and dual bound emerges as a practical approach to nearly optimally solve our model. Our iterative least squares Monte Carlo heuristic is also close to optimal. Compared to our other heuristic, it exhibits slightly larger optimality gaps and requires some tuning, but is faster to execute in some cases. Our methods could enhance single storage asset software and have potential relevance beyond our specific application.

1. Introduction

Energy is a commodity traded in spot and forward wholesale markets (Kaminski 2012, Roncoroni et al. 2015). Merchants have access to energy storage and transport infrastructure through ownership of physical or contractual assets. These assets allow merchants to trade energy across current and future dates and geographical locations to take advantage of positive price differentials in these markets.

The merchant trading of energy in a network of storage and transport assets has received limited attention in the extant literature. We model this problem as a finite horizon Markov decision process (MDP) formulated in a real option setting (Smith and McCardle 1999, Eydeland and Wolyniec 2003, Geman 2005, Smith 2005, Burger et al. 2007, Secomandi and Seppi 2014, Swindle 2014). In every stage, the states of this MDP include the inventory levels of the energy storage assets and the forward curves – a vector of futures prices (Clewlow and Strickland 2000, Chapter 4) – for a set of wholesale energy markets connected by the transport assets. We model the stochastic evolution of the forward curves using a term-structure model (Clewlow and Strickland 2000, Chapter 8), making a price taking assumption.

The curses of dimensionality (Powell 2011, §1.2), in particular the high dimensional state space and the difficulty of evaluating expectations, make computing an optimal policy for our MDP intractable. We overcome this intractability using linear optimization (Bertsimas and Tsitsiklis 1997) within approximate dynamic programming (ADP; Bertsekas and Tsitsiklis 1996, Chang et al. 2007, Powell 2011, Bertsekas 2012). We extend near optimal single storage asset ADP methods (Boogert and De Jong 2008, Secomandi 2010b, 2015, Lai et al. 2010, Wu et al. 2012, Nadarajah et al. 2017): Both heuristics based on reoptimization or least squares Monte Carlo (LSM), variants of which are part of commercial software (KYOS 2014, Lacima 2014, MathWorks 2014, FEA 2015, EnergyQuants 2015), and an upper bound.

We adapt to our network setting a reoptimization heuristic (RH; Gray and Khandelwal 2004). Our RH method makes decisions by solving a linear program that represents the deterministic version of our MDP formulated using the information available at a given stage and state.

The LSM approach includes regress-now/later (LSMN/L) variants (Carriere 1996, Longstaff and Schwartz 2001, Tsitsiklis and Van Roy 2001, and Glasserman and Yu 2004), which rely on estimating continuation/value function approximations (C/VFAs). We face two complications: How to (i) make high dimensional decisions and (ii) sample the inventory level vectors of the storage assets that support these CFAs. We develop an iterative LSMN (ILSMN) version based on CFAs that at each stage are specified for a manageable number of such inventory vectors and are distinguished by basis functions that depend on the forward curves. At a given stage and state ILSMN makes decisions by solving a linear program that combines these inventory-specific CFAs into a CFA that applies to any reachable vector of inventory levels. ILSMN estimates inventory-specific CFAs based on sample paths of forward curves obtained by Monte Carlo simulation. It iteratively generates inventory level vectors by solving the same linear programs used to make decisions but specified with respect to the CFAs available at the previous iteration.

We show that both RH and ILSMN are ADP methods based on CFAs that share the piecewise linear concavity of the exact continuation functions in the inventory levels of the storage assets. Hence, our heuristics yield policies that belong to a family that includes an optimal policy.

We evaluate the policies associated with RH and ILSMN using Monte Carlo simulation, thus estimating lower bounds on the value of an optimal policy. We also estimate within Monte Carlo simulation a dual (upper) bound on this value (Brown et al. 2010 and references therein). We set the dual penalties based on VFAs that are linear in the inventory levels of the storage assets, which we estimate by developing an LSML method that relies on a fixed set of inventory levels and uses linear optimization to make decisions. We can thus formulate the dual models as linear programs.

We apply our methods to realistic natural gas instances with up to three storage assets, partly developed in conjunction with an energy trading company. Our dual bound is almost tight and both RH and ILSMN yield close to optimal policies. Relative to RH, ILSMN exhibits marginally larger optimality gaps and needs some tuning, but is faster to execute on both our single storage asset instances and one set of our two storage asset instances. These results indicate that achieving near optimality only requires reactively capturing uncertainty in the optimization, as RH does; this approach frees RH from stochastic model assumptions, does not involve selecting basis functions and tuning their associated parameters, and has computational advantages with enough storage assets. RH and our dual bound are thus a practical approach to solve our model almost optimally. Our analysis also suggests that optimal storage and transport decisions strongly compete for the capacity of the transport assets. Single storage asset software packages could be modified to include our methods.

Besides natural gas and other energy sources, such as coal, electricity, oil, and petroleum products, our research has relevance for the merchant trading of other commodities, such as agricultural products, metals, and non-energy natural resources, e.g., water and timber. More generally, our techniques may be applicable to other resource allocation problems (Powell 2011, Chapter 13).

We review the extant literature in §2. We formulate our MDP in §3. We analyze it in §4. We present RH and ILSMN in §5 and §6, respectively, and examine them in §7. We introduce our dual bound and LSML version in §8. We discuss our numerical study in §9. We conclude in §10. Online Appendix A includes proofs. Online Appendix B contains supporting material.

2. Literature Review

Our MDP extends the energy merchant operations literature (Secomandi and Seppi 2014), in which energy conversion assets are modeled as real options (Dixit and Pindyck 1994, Trigeorgis 1996): It generalizes energy trading models with one storage asset and no transport assets (Scott et al. 2000, Maragos 2002, Sinha et al. 2004, Boogert and De Jong 2008, Lai et al. 2010, Secomandi 2010b, Thompson 2012, Wu et al. 2012, Mazieres and Boogert 2013, Bauerle and Riess 2014, Jiang and Powell 2015, Nadarajah et al. 2015, Zhou et al. 2015, Nadarajah et al. 2017) or with one or more transport assets and no storage assets (Deng et al. 2001, Secomandi 2010a, Secomandi and Wang 2012) to a setting with multiple energy storage and transport assets.

Bannister and Kaye (1991), Lohndorf and Minner (2010), Devalkar et al. (2011), Kim and Powell (2011), Lai et al. (2011), Grillo et al. (2012), Arvesen et al. (2013), Denault et al. (2013), Nascimento and Powell (2013), Zhou et al. (2013), Jiang et al. (2014), Salas and Powell (2014), Moazeni et al. (2015), and Powell and Miesel (2016) jointly optimize energy/commodity production/procurement and storage assets (Arvesen et al. 2013 model linepack storage in a natural gas pipeline). In contrast to ours, their models do not feature a network of energy storage and transport assets. Midthun (2007) proposes a model of natural gas production, transportation, and storage. In this model, storage occurs at dedicated facilities or via linepack by varying the natural gas pressure within pipelines, but transport between markets is not allowed. Rømo et al. (2009) develop and apply a network model of natural gas production and transportation, which, different from our model, does not include storage. Merener et al. (2016) optimize the production, shipping, and storage of agricultural commodities in a network with deterministic prices, whereas we model energy price uncertainty (see Markland 1975, Markland and Newett 1976, Devalkar et al. 2011, Boyabatli et al. 2011, Kazaz and Webster 2011, and Boyabatli 2015 for other applications related to agricultural commodities).

The piecewise linear concavity in the inventory levels of the exact value and continuation functions of our model generalizes a known property for a single storage asset (Secomandi 2010b, Nascimento and Powell 2013, van de Ven et al. 2013, Secomandi et al. 2015). Moreover, it is analogous to a result of Salas and Powell (2014) for multiple storage assets.

The policy associated with RH in the case of only one storage asset without transport assets is known as the rolling intrinsic policy (Gray and Khandelwal 2004). Our network extension of this policy is new, but the StoragePLUS software (FEA 2015) includes a version of the rolling intrinsic policy for a single storage asset with multiple transport assets. Secomandi (2010b, 2015), Lai et al. (2010), and Wu et al. (2012) have documented the near optimality of the rolling intrinsic policy. The persistence of this feature of our RH method that we observe on our network instances is novel. Our ADP interpretation of RH resembles that of a related heuristic in Secomandi (2015).

The LSM approach is commonly applied to MDPs with operational (e.g., inventory) states that are enumerated (Glasserman 2004, Chapter 8 and references therein, Cortazar et al. 2008, Nadarajah et al. 2017) or discretized based on a grid (Boogert and De Jong 2008, Arvesen et al. 2013), either optimally (Nadarajah and Secomandi 2015, Nadarajah et al. 2017) or using specialized procedures (Carmona and Ludkovski 2010, Denault et al. 2013, Bauerle and Riess 2014). The iterative approach that underlies ILSMN is thus novel for the LSM literature, but resembles approximate policy iteration methods (see, e.g., Powell 2011, §10.5). Further, LSM applications, such as those in the works just cited, typically consider MDPs with low dimensional action spaces that enable action optimization by enumeration. Because our MDP features a large action space, ILSMN and our LSML version instead solve linear programs to make decisions. In particular, although our inventory-specific CFAs and our VFAs extend the C/VFAs of Nadarajah et al. (2017), which are themselves related to the CFAs of Boogert and De Jong (2008), the specific embedding of these functions in the ILSMN optimization model appears novel for the LSM literature. However, the use of linear programming is common in the ADP literature (Powell 2011, §13.3), for instance in the context of energy storage (e.g., Nascimento and Powell 2013, Salas and Powell 2014).

Pereira and Pinto (1991), Scott et al. (2000), Lohndorf et al. (2013), Asamov and Powell (2015), and references therein also compute CFAs for energy storage models. They use stochastic dual dynamic programming methods that require either stagewise independence for the evolution of the stochastic part of the state or low dimensional models thereof. In contrast, our approach is not restricted to these cases.

Nadarajah et al. (2015) include a comparison of LSMN and the rolling intrinsic policy for the single storage asset case. In the context of wind energy production and storage, Salas and Powell (2014) compare a CFA-based ADP heuristic and a model predictive control heuristic (Camacho and Bordons 2007), which is a technique analogous to RH. Our numerical study broadens this line of investigation by considering ILSMN and RH in a novel energy storage setting.

Secomandi (2015) proposes estimating dual bounds on the value of a single storage asset by formulating the dual optimizations as linear programs. We generalize this approach to our network case. This author sets the dual penalties using the exact value function of a relaxed version of the single-storage MDP that is available in essentially closed form and is linear in inventory. Instead, for this purpose we employ VFAs that are linear in inventory, which we estimate numerically using our LSML version. Our research confirms the near optimal performance of “linear” penalties for the estimation of dual bounds (see also Brown and Smith 2014).

3. Model

In this section we formulate our MDP. A merchant owns given energy transport and storage assets. For example, in §9 these assets represent contracts on the capacity of natural gas pipelines and storage facilities. We model the locations corresponding to the storage assets and the wholesale markets connected by the transport assets as nodes on a network. We include these locations and markets in sets $\mathcal{L}$ and $\mathcal{M}$, respectively, with corresponding cardinalities denoted as $L$ and $M$. Storage assets and markets may be colocated. The node set of the network is $\mathcal{L} \cup \mathcal{M}$.

The storage and transport assets allow the merchant to trade energy across different markets and dates. We define a trade as a unique path (sequence of nodes) in the network. Our model is formulated based on the set of trades $\mathcal{J}$. We assume it is feasible to enumerate all the trades supported by the merchant storage and transport assets. Otherwise, we could equivalently formulate our model using the node-to-node flows as modeling objects.
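A minimal sketch of the enumeration assumption, in Python: a depth-first search that lists the simple directed paths of a small network. The network, its node names, and the rule that every simple path with at least two nodes counts as a trade are illustrative assumptions; the model itself only requires trades to be unique node sequences supported by the merchant's assets.

```python
from typing import Dict, List, Tuple


def enumerate_trades(adjacency: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """List every simple directed path with at least two nodes.

    Each path is treated as a candidate trade (hypothetical rule): its first
    node is where energy is received and its last node is where it is delivered.
    """
    trades: List[Tuple[str, ...]] = []

    def extend(path: List[str]) -> None:
        for nxt in adjacency.get(path[-1], []):
            if nxt in path:  # skip cycles so that every path stays simple
                continue
            path.append(nxt)
            trades.append(tuple(path))
            extend(path)
            path.pop()

    for start in adjacency:
        extend([start])
    return trades


# Hypothetical network: markets A and B and a storage asset S.
network = {"A": ["B", "S"], "S": ["B"], "B": []}
for trade in enumerate_trades(network):
    print(" -> ".join(trade))  # A -> B, A -> S, A -> S -> B, S -> B
```

Because the number of simple paths can grow exponentially with the size of the network, explicit enumeration is realistic only for small networks; for larger ones the node-to-node flow formulation is the natural fallback.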

Trades can be performed at each of $I$ times. The $i$-th trading time is $T_i$ with $i$ an element of set $\mathcal{I} := \{0, 1, \ldots, I-1\}$. The amount of energy transacted at time $T_i$ under trade $j \in \mathcal{J}$ is $x_{i,j}$. The operational execution of this transaction occurs in between dates $T_i$ and $T_{i+1}$. The time-$T_i$ vector of trade amounts is $x_i := (x_{i,j}, j \in \mathcal{J})$.

We model capacity constraints by imposing limits on the maximal amount of energy that can be received or delivered by the transport assets at a market or that can be added to or removed from a storage asset during a single time period, that is, the time period elapsed between two successive trading times. The receipt and delivery capacities, respectively, of the transport assets at market $m$ are $C^R_m$ and $C^D_m$, where $R$ is for receipt and $D$ is for delivery. The inventory increase and decrease capacities, respectively, of storage asset $l$ are $C^+_l$ and $C^-_l$; here the $+$ and $-$ signs denote inventory addition and removal, respectively.

The subsets of the set of trades $\mathcal{J}$ with paths that include node $m$ as a receipt point and a delivery point, respectively, are $\mathcal{J}^R(m)$ and $\mathcal{J}^D(m)$. We denote by $y_{i,l}$ the stage $i$ (energy) inventory level of storage asset $l$. The vector of such inventory levels is $y_i := (y_{i,l}, l \in \mathcal{L})$. The maximal allowed inventory level of the $l$-th storage asset is $\overline{y}_l$. The set of feasible inventory level vectors is $\mathcal{Y} := \times_{l \in \mathcal{L}} [0, \overline{y}_l]$. The sets of trades that respectively add and remove energy to and from storage asset $l$ are $\mathcal{J}^+(l)$ and $\mathcal{J}^-(l)$. Given the vector of inventory levels $y_i \in \mathcal{Y}$, the vector of trade amounts $x_i$ is feasible if it satisfies the following constraints:

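$$
\begin{aligned}
\sum_{j \in \mathcal{J}^R(m)} x_{i,j} &\le C^R_m, && \forall m \in \mathcal{M}, && (1)\\
\sum_{j \in \mathcal{J}^D(m)} x_{i,j} &\le C^D_m, && \forall m \in \mathcal{M}, && (2)\\
\sum_{j \in \mathcal{J}^+(l)} x_{i,j} &\le C^+_l, && \forall l \in \mathcal{L}, && (3)\\
\sum_{j \in \mathcal{J}^-(l)} x_{i,j} &\le C^-_l, && \forall l \in \mathcal{L}, && (4)\\
\sum_{j \in \mathcal{J}^+(l)} x_{i,j} &\le \overline{y}_l - y_{i,l}, && \forall l \in \mathcal{L}, && (5)\\
\sum_{j \in \mathcal{J}^-(l)} x_{i,j} &\le y_{i,l}, && \forall l \in \mathcal{L}, && (6)\\
x_{i,j} &\ge 0, && \forall j \in \mathcal{J}. && (7)
\end{aligned}
$$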

Constraints (1) and (2) restrict the received and delivered energy at each market to not exceed the receipt and delivery capacities, respectively. Constraints (3) and (4) limit the energy addition and removal, respectively, at each storage asset to at most its inventory increase and decrease capacities. Constraints (5) and (6) ensure that these energy amounts do not exceed the space and inventory, respectively, available at each storage asset. Constraints (7) enforce nonnegativity of the trade amounts. We denote as $\mathcal{X}(y_i)$ the set of feasible trade amounts for $y_i \in \mathcal{Y}$.
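To make the constraint set concrete, the Python sketch below tests whether a candidate trade vector $x_i$ lies in $\mathcal{X}(y_i)$ by checking (1)-(7) directly. The `Trade` record, the two-market, one-storage instance, and all capacity values are hypothetical; each trade simply records which of the sets $\mathcal{J}^R(m)$, $\mathcal{J}^D(m)$, $\mathcal{J}^+(l)$, and $\mathcal{J}^-(l)$ it belongs to.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Trade:
    """Which index sets of constraints (1)-(6) a trade j belongs to."""
    receipt_markets: FrozenSet[str] = frozenset()   # markets m with j in J^R(m)
    delivery_markets: FrozenSet[str] = frozenset()  # markets m with j in J^D(m)
    adds_to: FrozenSet[str] = frozenset()           # storage assets l with j in J^+(l)
    removes_from: FrozenSet[str] = frozenset()      # storage assets l with j in J^-(l)


def violated_constraints(x: Dict[str, float], trades: Dict[str, Trade],
                         y: Dict[str, float], y_max: Dict[str, float],
                         cap_r: Dict[str, float], cap_d: Dict[str, float],
                         cap_plus: Dict[str, float], cap_minus: Dict[str, float],
                         tol: float = 1e-9) -> List[str]:
    """Labels of the violated constraints among (1)-(7); empty means x is in X(y)."""

    def total(selected) -> float:
        # Sum of trade amounts over the trades picked out by `selected`.
        return sum(x[j] for j, t in trades.items() if selected(t))

    violations = [f"(7) {j}" for j, amount in x.items() if amount < -tol]
    for m, cap in cap_r.items():
        if total(lambda t: m in t.receipt_markets) > cap + tol:
            violations.append(f"(1) {m}")
    for m, cap in cap_d.items():
        if total(lambda t: m in t.delivery_markets) > cap + tol:
            violations.append(f"(2) {m}")
    for l in y_max:
        added = total(lambda t: l in t.adds_to)
        removed = total(lambda t: l in t.removes_from)
        if added > cap_plus[l] + tol:
            violations.append(f"(3) {l}")
        if removed > cap_minus[l] + tol:
            violations.append(f"(4) {l}")
        if added > y_max[l] - y[l] + tol:
            violations.append(f"(5) {l}")
        if removed > y[l] + tol:
            violations.append(f"(6) {l}")
    return violations


# Hypothetical instance: markets A and B, storage asset S.
trades = {
    "A->B": Trade(receipt_markets=frozenset({"A"}), delivery_markets=frozenset({"B"})),
    "A->S": Trade(receipt_markets=frozenset({"A"}), adds_to=frozenset({"S"})),
    "S->B": Trade(delivery_markets=frozenset({"B"}), removes_from=frozenset({"S"})),
}
data = dict(trades=trades, y={"S": 3.0}, y_max={"S": 20.0}, cap_r={"A": 10.0},
            cap_d={"B": 10.0}, cap_plus={"S": 5.0}, cap_minus={"S": 5.0})
print(violated_constraints({"A->B": 4.0, "A->S": 5.0, "S->B": 3.0}, **data))  # []
print(violated_constraints({"A->B": 4.0, "A->S": 5.0, "S->B": 4.0}, **data))  # ['(6) S']
```

Constraints (5) and (6) bound additions and removals separately against the current inventory $y_{i,l}$ rather than bounding the net change, which is why the second call is infeasible even though its net inventory change is positive.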

A futures market is available at each market $m$. The price at time $T_i$ of a futures with maturity on date $T_{i'}$, with $i$ and $i' \in \mathcal{I}$ and $i' \ge i$, and delivery at market $m$ is $F_{i,i',m} \in \mathbb{R}_+$. The forward curve for this market at this time is $F_{i,m} := (F_{i,i',m}, i' \in \mathcal{I}, i' \ge i)$. The time-$T_i$ spot price of market $m$ is $s_{i,m} \equiv F_{i,i,m}$. The array of forward curves and the vector of spot prices across all markets at time $T_i$ are $F_i := (F_{i,m}, m \in \mathcal{M})$ and $s_i := (s_{i,m}, m \in \mathcal{M})$, respectively. We define $F_I := 0$.

We use the set $\mathcal{I}$ as the stage set of our MDP. A state in stage $i$ is the pair $(y_i, F_i)$. Executing the vector of trade amounts $x_i$ changes the inventory level of each storage asset $l$ from $y_{i,l}$ at time $T_i$ to $y_{i,l} + \sum_{j \in \mathcal{J}^+(l)} x_{i,j} - \sum_{j \in \mathcal{J}^-(l)} x_{i,j}$ at time $T_{i+1}$. Following Lai et al. (2010) and Secomandi and Wang (2012), we monetize the execution of this trade amount vector at the stage $i$ spot price vector using the reward function

$$
r(x_i, s_i) := \sum_{j \in \mathcal{J}} \left(\alpha^1_j(s_i) + \alpha^2_j\right) x_{i,j}; \tag{8}
$$

here (i) the term $\alpha^1_j(s_i)$ includes the per-unit cost incurred or revenue earned from respectively buying or selling energy on the spot markets and the monetization at the prices traded in these markets of any energy losses that may arise from transportation/storage inefficiencies for each trade $j$ and (ii) the term $\alpha^2_j$ captures the marginal transportation/storage costs for each such trade (Online Appendix B.2 specifies these terms for the natural gas application considered in §9). A known stochastic process governs the evolution of the array of forward curves $F_i$ from each stage $i$ to the next one. We assume this process to be both Markovian and unaffected by the execution of the trade amounts (that is, the merchant is a small player and, hence, a price taker). We present such a model in §9.1.
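To illustrate the spot prices $s_{i,m} \equiv F_{i,i,m}$, the reward (8), and the inventory update, the following Python sketch evaluates them for a hypothetical two-market, one-storage instance. The coefficient functions $\alpha^1_j(\cdot)$ and constants $\alpha^2_j$ are invented for illustration (a 2% fuel loss on the transport trade, and costs entered with a negative sign); the specification used for the natural gas application is the one in Online Appendix B.2. The helper `inventory_change` additionally assumes that each trade adds to or removes from at most one storage asset.

```python
from typing import Callable, Dict, List, Mapping

Prices = Mapping[str, float]


def spot_prices(forward_curves: Mapping[str, List[float]]) -> Dict[str, float]:
    """s_{i,m} = F_{i,i,m}: the nearest-maturity entry of each market's forward curve."""
    return {m: curve[0] for m, curve in forward_curves.items()}


def reward(x: Mapping[str, float], alpha1: Mapping[str, Callable[[Prices], float]],
           alpha2: Mapping[str, float], s: Prices) -> float:
    """Reward (8): sum over trades j of (alpha1_j(s) + alpha2_j) * x_j."""
    return sum((alpha1[j](s) + alpha2[j]) * amount for j, amount in x.items())


def inventory_change(x: Mapping[str, float], adds_to: Mapping[str, str],
                     removes_from: Mapping[str, str]) -> Dict[str, float]:
    """Delta y_{i,l}(x_i): amounts of trades adding to l minus those removing from l."""
    delta: Dict[str, float] = {}
    for j, l in adds_to.items():
        delta[l] = delta.get(l, 0.0) + x.get(j, 0.0)
    for j, l in removes_from.items():
        delta[l] = delta.get(l, 0.0) - x.get(j, 0.0)
    return delta


# Hypothetical time-T_i forward curves (maturities T_i, T_{i+1}, T_{i+2}).
F_i = {"A": [3.0, 3.1, 3.3], "B": [3.4, 3.5, 3.6]}
s_i = spot_prices(F_i)  # {'A': 3.0, 'B': 3.4}
alpha1 = {
    "A->B": lambda s: 0.98 * s["B"] - s["A"],  # buy at A, sell 98% at B (2% fuel loss)
    "A->S": lambda s: -s["A"],                 # buy at A and inject into S
    "S->B": lambda s: s["B"],                  # withdraw from S and sell at B
}
alpha2 = {"A->B": -0.05, "A->S": -0.02, "S->B": -0.02}  # marginal costs, negative sign
x_i = {"A->B": 4.0, "A->S": 5.0, "S->B": 3.0}
print(round(reward(x_i, alpha1, alpha2, s_i), 6))           # -3.832
print(inventory_change(x_i, {"A->S": "S"}, {"S->B": "S"}))  # {'S': 2.0}
```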

A policy $\pi$ is the collection of decision rules $\{X^\pi_i, i \in \mathcal{I}\}$, where $X^\pi_i : (y_i, F_i) \to \mathcal{X}(y_i)$ for each $(i, y_i, F_i) \in \mathcal{I} \times \mathcal{Y} \times \mathbb{R}_+^{M \cdot (I-i)}$. The set of all feasible policies is $\Pi$. We use risk neutral valuation of cash flows (see, e.g., Secomandi and Seppi 2014, Ch. 3). We denote by $\mathbb{E}$ expectation under the corresponding risk-neutral probability measure for the evolution of the array of forward curves. This measure is unique when the commodity market is complete (see, e.g., Secomandi and Seppi 2014, Ch. 3). This assumption is common in the energy real option and merchant operations literature (see, e.g., Smith and McCardle 1999, Smith 2005, Secomandi and Seppi 2014, and references therein). We also make it here. For simplicity, the constant $\delta$ is the risk-free discount factor from each time $T_i$ back to time $T_{i-1}$, $i \in \mathcal{I} \setminus \{0\}$. The initial state at time $T_0$ is $(y_0, F_0)$. Our optimization model is

$$
\max_{\pi \in \Pi} \sum_{i \in \mathcal{I}} \delta^i\, \mathbb{E}\left[ r\big(X^\pi_i(y^\pi_i, F_i), s_i\big) \,\middle|\, y_0, F_0 \right], \tag{9}
$$

where $y^\pi_i$ is the random inventory level reached in stage $i$ when using policy $\pi$.

4. Structural Analysis

In this section we analyze our MDP. In §4.1 we discuss the interplay between the optimization of the storage and transport decisions, highlighting the need to jointly optimize them. In §4.2 we establish the structure of the MDP value and continuation functions and explain how they would lead, if known, to the efficient computation of optimal decisions.

4.1 Interplay between Storage and Transport Decisions

In our MDP the storage and transport decisions compete for the receipt and delivery capacities of the network nodes. Intuitively, there is thus potential substitution between these optimized choices. Let $\Pi^S$ and $\Pi^T$ be the subsets of the set of feasible policies $\Pi$ that allow only storage-based and transport-alone trades, respectively (a storage-based trade can also include a transport trade). Proposition 4.1 relates the optimal objective function value of model (9), $V_0(y_0, F_0)$, to the analogous values with the restrictions $\pi \in \Pi^S$ and $\pi \in \Pi^T$, $V^S_0(y_0, F_0)$ and $V^T_0(y_0, F_0)$, respectively.

Proposition 4.1. It holds that $\max\{V^S_0(y_0, F_0), V^T_0(y_0, F_0)\} \le V_0(y_0, F_0) \le V^S_0(y_0, F_0) + V^T_0(y_0, F_0)$.

The second inequality in Proposition 4.1 is consistent with the definition of substitutes in Topkis (1998, §2.6.1). There is substitution between optimized storage and transport decisions only when this inequality is strict; otherwise, combining the optimal storage-based and transport-alone policies gives an optimal policy. When the first inequality in this proposition holds as an equality there is maximal substitution between the storage and transport decisions taken by the jointly optimized policy: These choices exclude each other and this policy corresponds to the best of the optimal storage-based and transport-alone policies. No, partial, and maximal substitution can all arise. However, our numerical analysis conducted in §9 suggests that partial substitution occurs in our natural gas application. In this case the optimization in model (9) cannot be simplified by using either the best of or both the two separately optimized storage-based and transport-alone policies. Further, in general it is impossible to combine these two policies to obtain an optimal policy, e.g., by executing first the decisions prescribed by the former policy and then the ones dictated by the latter policy that remain feasible, or vice versa.


4.2 Value and Continuation Functions and Computing Optimal Decisions

In theory an optimal policy for our MDP could be found by solving a stochastic dynamic program (SDP), which we now formulate. We define the inventory change from time $T_i$ to time $T_{i+1}$ for storage asset $l$ corresponding to the vector of feasible trades $x_i$ as $\Delta y_{i,l}(x_i) := \sum_{j \in \mathcal{J}^+(l)} x_{i,j} - \sum_{j \in \mathcal{J}^-(l)} x_{i,j}$ and the vector of such changes as $\Delta y_i(x_i) := (\Delta y_{i,l}(x_i), l \in \mathcal{L})$. The stated SDP, for each $(i, y_i, F_i) \in \mathcal{I} \times \mathcal{Y} \times \mathbb{R}_+^{M \cdot (I-i)}$, is

$$
\begin{aligned}
V_i(y_i, F_i) &= \max_{x_i \in \mathcal{X}(y_i)} \; r(x_i, s_i) + W_i\big(y_i + \Delta y_i(x_i), F_i\big), && (10)\\
W_i(\cdot, F_i) &:= \delta\, \mathbb{E}\left[V_{i+1}(\cdot, F_{i+1}) \mid F_i\right], && (11)
\end{aligned}
$$

with $V_i(\cdot, \cdot)$ and $W_i(\cdot, \cdot)$ the value and continuation functions in stage $i$ and boundary conditions $V_I(y_I, F_I) := 0$ for $y_I \in \mathcal{Y}$. Proposition 4.2 characterizes the behavior of these functions in the inventory levels of the storage assets.

Proposition 4.2. For each given $(i, F_i) \in \mathcal{I} \times \mathbb{R}_+^{M \cdot (I-i)}$, the functions $V_i(\cdot, F_i)$ and $W_i(\cdot, F_i)$ are piecewise linear concave on set $\mathcal{Y}$.

If the continuation function were known, this result would imply that finding an optimal solution to the maximization on the right hand side of (10) involves formulating and solving a finite dimensional linear program (see, e.g., Bertsimas and Tsitsiklis 1997, §1.3). However, computing this function is intractable because of (i) the high-dimensional state space of SDP (10)-(11) and (ii) the inability to compute the expectation in (11). We thus develop both heuristics for model (9) and bounds on its optimal policy value.
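Proposition 4.2 is what turns (10) into a linear program once $W_i$ is known: a piecewise linear concave function equals the minimum of finitely many affine pieces, so maximizing it amounts to maximizing an auxiliary variable $w$ bounded above by each piece (an epigraph reformulation). The Python sketch below illustrates this with `scipy.optimize.linprog` for a hypothetical single storage asset that buys and sells at one market; the affine pieces, prices, costs, and capacities are invented for illustration, and $W$ is assumed to already include the discount factor $\delta$, as in (11).

```python
import numpy as np
from scipy.optimize import linprog


def greedy_decision(y, y_max, c_plus, c_minus, spot, cost_in, cost_out, pieces):
    """Maximize r(x) + W(y + x_in - x_out) for one storage asset at one market.

    W(y') = min_k (a_k + b_k * y') is a hypothetical piecewise linear concave
    continuation function, given as a list of (a_k, b_k) pairs.
    Returns (x_in, x_out, optimal value).
    """
    # Decision vector [x_in, x_out, w]; linprog minimizes, so negate the objective.
    c = np.array([spot + cost_in, -(spot - cost_out), -1.0])
    # Epigraph rows: w - b_k * (x_in - x_out) <= a_k + b_k * y for every piece k.
    A_ub = np.array([[-b, b, 1.0] for _, b in pieces])
    b_ub = np.array([a + b * y for a, b in pieces])
    bounds = [(0.0, min(c_plus, y_max - y)),   # constraints (3) and (5)
              (0.0, min(c_minus, y)),          # constraints (4) and (6)
              (None, None)]                    # w is a free variable
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    x_in, x_out, _ = res.x
    return x_in, x_out, -res.fun


# Hypothetical W: slopes 5, 3.5, and 1 with breakpoints at inventory 8 and 12.
pieces = [(0.0, 5.0), (12.0, 3.5), (42.0, 1.0)]
x_in, x_out, value = greedy_decision(y=5.0, y_max=20.0, c_plus=6.0, c_minus=6.0,
                                     spot=3.0, cost_in=0.1, cost_out=0.1, pieces=pieces)
# Expected optimum (up to solver tolerance): x_in = 6, x_out = 0, value = 31.9
print(x_in, x_out, value)
```

Injecting pays off here because the marginal value of inventory (5, then 3.5) exceeds the all-in purchase cost of 3.1 over the reachable range, and withdrawing does not because selling nets only 2.9 per unit.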

5. RH

In this section we describe the RH method. This heuristic makes decisions at each stage and state by optimizing a tractable linear program in lieu of the intractable linear programming equivalent reformulation of the maximization in (10) (see the discussion toward the end of §4.2). At stage $i$ and state $(y_i, F_i)$, RH solves the deterministic version of model (9) formulated from this stage forward, starting with inventory vector $y_i$ and with each random spot price vector $s_{i'}$, with $i' \ge i$, replaced by the vector of time-$T_i$ forward prices $F_{i,i'} := (F_{i,i',m}, m \in \mathcal{M})$. This model is the linear program

$$
\begin{aligned}
\max_{\{(x_{i'},\, y'_{i'}),\ i' \in \mathcal{I},\ i' \ge i\}} \quad & \sum_{i' \in \mathcal{I},\ i' \ge i} \delta^{i'-i}\, r(x_{i'}, F_{i,i'}) && (12)\\
\text{s.t.} \quad & y'_i = y_i, && (13)\\
& y'_{i'} = y'_{i'-1} + \Delta y_{i'-1}(x_{i'-1}), \quad \forall i' \in \mathcal{I},\ i' > i, && (14)\\
& x_{i'} \in \mathcal{X}(y'_{i'}), \quad \forall i' \in \mathcal{I},\ i' \ge i. && (15)
\end{aligned}
$$

The decision variables of this linear program are the pairs of trade amount and inventory level vectors $(x_{i'}, y'_{i'})$ for each stage $i'$ from $i$ through $I-1$. The objective function (12) is the sum of the discounted rewards collected from stages $i$ through $I-1$, with the stage $i'$ cash flows monetized using the vector of futures prices $F_{i,i'}$. Constraint (13) sets the stage $i$ inventory level decision variable vector to the inventory vector $y_i$. Constraints (14) model the dynamics of the vector of inventory levels. Constraints (15) ensure feasibility of the vectors of trade amounts.
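For the special case of one storage asset trading at a single market (no transport assets), in which RH reduces to the rolling intrinsic policy, model (12)-(15) can be sketched in Python with `scipy.optimize.linprog`. The single-market restriction, the variable layout, and the cost parameters `cost_in` and `cost_out` are simplifying assumptions made for illustration; the bounds $0 \le y'_{i'} \le \overline{y}$ are redundant given (5)-(6) and are kept only for clarity.

```python
import numpy as np
from scipy.optimize import linprog


def intrinsic_lp(y0, forward, y_max, c_plus, c_minus, cost_in=0.0, cost_out=0.0, delta=1.0):
    """Solve (12)-(15) for one storage asset that trades at one market.

    forward[k] plays the role of F_{i,i+k}, the time-T_i futures price that replaces
    the stage-(i+k) spot price. Returns (optimal value, stage-i x_in, stage-i x_out).
    """
    n = len(forward)
    nv = 3 * n  # per stage k: x_in at 3k, x_out at 3k+1, y' at 3k+2
    c = np.zeros(nv)
    for k, f in enumerate(forward):
        c[3 * k] = delta ** k * (f + cost_in)        # buying is a cash outflow
        c[3 * k + 1] = -delta ** k * (f - cost_out)  # selling is a cash inflow
    A_eq = np.zeros((n, nv))
    b_eq = np.zeros(n)
    A_eq[0, 2] = 1.0                                 # constraint (13): y'_i = y_i
    b_eq[0] = y0
    for k in range(1, n):                            # constraints (14): dynamics
        A_eq[k, 3 * k + 2] = 1.0
        A_eq[k, 3 * (k - 1) + 2] = -1.0
        A_eq[k, 3 * (k - 1)] = -1.0
        A_eq[k, 3 * (k - 1) + 1] = 1.0
    A_ub = np.zeros((2 * n, nv))
    b_ub = np.zeros(2 * n)
    for k in range(n):                               # constraints (5)-(6) within (15)
        A_ub[2 * k, [3 * k, 3 * k + 2]] = 1.0        # x_in + y' <= y_max
        b_ub[2 * k] = y_max
        A_ub[2 * k + 1, 3 * k + 1] = 1.0             # x_out - y' <= 0
        A_ub[2 * k + 1, 3 * k + 2] = -1.0
    bounds = [(0.0, c_plus), (0.0, c_minus), (0.0, y_max)] * n  # (3), (4), (7)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun, res.x[0], res.x[1]


# Hypothetical data: empty storage, three stages, forward prices 2, 3, and 5.
value, x_in, x_out = intrinsic_lp(y0=0.0, forward=[2.0, 3.0, 5.0], y_max=10.0,
                                  c_plus=5.0, c_minus=5.0)
# Expected: value 15 (buy 5 units at 2 and sell them at 5); stage-i x_in = 5, x_out = 0
print(value, x_in, x_out)
```

Only the stage-$i$ components of the solution are executed; the rest of the plan is discarded and recomputed at the next stage.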

At each stage $i$ and state $(y_i, F_i)$, the RH policy executes the trade amounts that correspond to the values of the decision variables for stage $i$ in an optimal solution to model (12)-(15), that is, the optimal trading choices pertaining to stage $i$ alone. The value of this policy can be estimated by executing it within a Monte Carlo simulation of the array of forward curves and resulting vector of inventory levels starting from the initial state in stage 0 through the last stage $I-1$. Specifically, we generate a set of sample paths of arrays of forward curves for times $T_1$ through $T_{I-1}$. For each such sample path, we execute the trade amounts obtained by solving (12)-(15) at stage 0 and state $(y_0, F_0)$. The next state results from performing these decisions and observing the time-$T_1$ array of forward curves for this sample path. The linear program (12)-(15) is re-solved at this stage and state. This reoptimization and decision-making process repeats until the last stage is reached. We discount to stage 0 and cumulate the cash flows obtained at each stage and visited state. Finally, we average these discounted cash flow sums across all the sample paths to obtain an unbiased estimate of the value of the RH policy, which is a lower bound on the value of an optimal policy. In practice, implementing the RH policy does not require assuming a model of the evolution of the array of forward curves, because observed futures prices can be used instead.
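The simulation procedure above can be sketched end to end in Python for the single-market, single-storage special case (the rolling intrinsic policy). The one-factor lognormal price update, which multiplies every futures price by the same martingale factor at each trading time, the zero trading costs, and all numbers are hypothetical stand-ins for the term-structure model of §9.1; the linear program is repeated in compact form so that the block is self-contained.

```python
import numpy as np
from scipy.optimize import linprog


def intrinsic_lp(y0, forward, y_max, c_plus, c_minus, delta=1.0):
    """Deterministic LP (12)-(15), one storage asset at one market, zero trading costs.

    Returns (value, stage-i x_in, stage-i x_out).
    """
    n = len(forward)
    c = np.zeros(3 * n)                    # per stage k: [x_in, x_out, y'] at 3k..3k+2
    A_eq, b_eq = np.zeros((n, 3 * n)), np.zeros(n)
    A_ub, b_ub = np.zeros((2 * n, 3 * n)), np.zeros(2 * n)
    A_eq[0, 2], b_eq[0] = 1.0, y0          # (13)
    for k, f in enumerate(forward):
        c[3 * k], c[3 * k + 1] = delta ** k * f, -delta ** k * f
        if k > 0:                          # (14): y'_k = y'_{k-1} + x_in_{k-1} - x_out_{k-1}
            A_eq[k, [3 * k + 2, 3 * k - 1, 3 * k - 3, 3 * k - 2]] = [1.0, -1.0, -1.0, 1.0]
        A_ub[2 * k, [3 * k, 3 * k + 2]] = 1.0                  # (5): x_in + y' <= y_max
        b_ub[2 * k] = y_max
        A_ub[2 * k + 1, [3 * k + 1, 3 * k + 2]] = [1.0, -1.0]  # (6): x_out <= y'
    bounds = [(0.0, c_plus), (0.0, c_minus), (0.0, y_max)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun, res.x[0], res.x[1]


def rolling_intrinsic_value(y0, F0, y_max, c_plus, c_minus, sigma, n_paths,
                            delta=1.0, seed=0):
    """Monte Carlo estimate (mean, standard error) of the RH policy value.

    Hypothetical price model: at each trading time every remaining futures price
    is multiplied by exp(sigma * Z - sigma**2 / 2), so prices are martingales.
    """
    rng = np.random.default_rng(seed)
    totals = np.empty(n_paths)
    for p in range(n_paths):
        curve, y, total = np.asarray(F0, dtype=float), y0, 0.0
        for i in range(len(F0)):
            _, x_in, x_out = intrinsic_lp(y, curve, y_max, c_plus, c_minus, delta)
            total += delta ** i * curve[0] * (x_out - x_in)  # spot s_i = F_{i,i}
            y = min(max(y + x_in - x_out, 0.0), y_max)       # guard against round-off
            z = rng.standard_normal()
            curve = curve[1:] * np.exp(sigma * z - 0.5 * sigma ** 2)  # time-T_{i+1} curve
        totals[p] = total
    return totals.mean(), totals.std(ddof=1) / np.sqrt(n_paths)


mean, se = rolling_intrinsic_value(0.0, [2.0, 3.0, 5.0], 10.0, 5.0, 5.0,
                                   sigma=0.2, n_paths=200)
print(f"RH lower bound estimate: {mean:.3f} +/- {1.96 * se:.3f}")
```

With `sigma=0.0` the forward curve never moves, so the estimate collapses to the intrinsic value of the stage-0 linear program; positive volatility is what gives reoptimization something to react to.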

6. ILSMN

In this section we discuss ILSMN. The ILSMN policy makes decisions in each stage and state by solving a tractable linear program that resembles the intractable maximization in (10). The tractability of the former model stems from the use of a low-dimensional CFA in lieu of the exact continuation function in the latter model. At each stage, this CFA is a convex combination of inventory-specific CFAs defined on a set of reference inventory vectors. ILSMN estimates these CFAs based on Monte Carlo simulation of the array of forward curves and least squares regression, as is typical for LSM methods, as well as linear programming, which is atypical for LSM techniques. ILSMN employs this optimization technique to obtain the sets of reference inventory vectors that support each such CFA in each stage. This process is iterative and relies on the reference inventory vectors and inventory-specific CFAs obtained in the prior iteration and the linear program used to make decisions, until the final ones are obtained. The estimation of the value of the ILSMN policy in the initial stage and state is analogous to the estimation of the value of the RH policy in this stage and state. We define the inventory-specific CFAs in §6.1, introduce the CFA and the linear program used both to make decisions and to generate reference inventory vectors in §6.2, present the generation of reference inventory vectors and estimation of the inventory-specific CFAs in §6.3, and discuss some algorithmic aspects in §6.4.

6.1 Inventory-Specific CFAs

At each stage the inventory-specific CFAs are linear combinations of given basis functions of the arrays of forward curves, e.g., polynomials of futures prices, with weights that depend on the stage, the basis function, and the reference inventory vector. Specifically, fix a stage $i \neq I-1$ and assume that the reference inventory vectors for stage $i+1$ are given. Their number is $Q_{i+1}$, and we define $\mathcal{Q}_{i+1} := \{1, 2, \ldots, Q_{i+1}\}$. We denote as $y^q_{i+1}$ the $q$-th reference inventory vector, with $q \in \mathcal{Q}_{i+1}$, and define the set of such vectors as $\hat{Y}_{i+1} := \{y^q_{i+1}, q \in \mathcal{Q}_{i+1}\}$. The $b$-th basis function is $\phi_{i,b} : \mathbb{R}^{M(I-i)} \to \mathbb{R}$. There are $B_i$ such functions and we define $\mathcal{B}_i := \{1, 2, \ldots, B_i\}$. The weight associated with the $b$-th basis function and the $q$-th reference inventory vector in set $\hat{Y}_{i+1}$ is $\gamma_{i,b,q}$. The corresponding inventory-specific CFA is $\sum_{b \in \mathcal{B}_i} \phi_{i,b}(F_i)\, \gamma_{i,b,q}$.
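For concreteness, the inventory-specific CFAs of one stage can be evaluated as below. This sketch assumes, as in the numerical study of §9.2, that the basis functions are the constant 1 followed by the futures prices in the array of forward curves; the name `basis` and the layout `gamma[q][b]` for $\gamma_{i,b,q}$ are hypothetical.

```python
def basis(F):
    """Hypothetical basis: the constant 1 followed by each futures price in F."""
    return [1.0] + [float(f) for f in F]


def inventory_specific_cfas(F, gamma):
    """Return sum_b phi_{i,b}(F_i) * gamma_{i,b,q} for every reference vector q.

    gamma[q][b] stores gamma_{i,b,q}; F is the (flattened) array of forward curves F_i.
    """
    phi = basis(F)
    values = []
    for gamma_q in gamma:
        if len(gamma_q) != len(phi):
            raise ValueError("one weight per basis function is required")
        values.append(sum(p * g for p, g in zip(phi, gamma_q)))
    return values
```

For example, with futures prices `[2.0, 3.0]` and weights `[[1, 0, 0], [0, 1, 1]]`, the two reference vectors receive CFA values 1 and 5.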

6.2 CFA and Making Decisions

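The ILSMN policy makes decisions by solving a linear program that relies on a CFA expressed as a convex combination of the inventory-specific CFAs. Specifically, at stage $i$ and state $(y_i, F_i)$ the decision variables of this model are the trading decision vector $x_i$ and, for each reference inventory vector $y^q_{i+1}$, the weight $\theta_{i+1,q}$ given in this combination to the corresponding inventory-specific CFA. This linear program is

$$\max_{(x_i, \theta_{i+1})} \; r(x_i, s_i) + \sum_{q \in \mathcal{Q}_{i+1}} \left[ \sum_{b \in \mathcal{B}_i} \phi_{i,b}(F_i)\, \gamma_{i,b,q} \right] \theta_{i+1,q} \tag{16}$$

$$\text{s.t.} \quad \sum_{q \in \mathcal{Q}_{i+1}} y^q_{i+1,l}\, \theta_{i+1,q} = y_{i,l} + \Delta y_{i,l}(x_i), \quad \forall l \in \mathcal{L}, \tag{17}$$

$$\sum_{q \in \mathcal{Q}_{i+1}} \theta_{i+1,q} = 1, \tag{18}$$

$$\theta_{i+1,q} \ge 0, \quad \forall q \in \mathcal{Q}_{i+1}, \tag{19}$$

$$x_i \in \mathcal{X}(y_i). \tag{20}$$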

The objective function (16) is analogous to that of the maximization in (10), but with the continuation function replaced by the stated CFA. Constraints (17) ensure that the convex combination of the $l$-th elements $y^q_{i+1,l}$ of the reference inventory vectors in set $\hat{Y}_{i+1}$, taken using the weights $\theta_{i+1,q}$, equals the $l$-th element of the stage $i+1$ inventory vector obtained by applying the trading decision vector $x_i$ to the inventory vector $y_i$. Constraints (18)-(19) specify the convexity of both this linear combination and the one in the objective function. Constraints (20) enforce feasibility of the trading decisions.
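With a single storage asset ($L = 1$), the maximization over the weights $\theta_{i+1,q}$ in (16)-(19) can be carried out without a solver: (17)-(18) are only two equality constraints, so some optimal solution has at most two positive weights, and it suffices to interpolate between pairs of reference levels that bracket the next-stage inventory. The sketch below uses this observation and, as a simplifying assumption, replaces the optimization over $x_i$ by enumeration of integer trades with no trading costs; `points` (pairs of a reference level and its CFA value), `price`, `cap`, and `rate` are hypothetical inputs.

```python
import math


def envelope(points, y):
    """Optimal value of max sum_q c_q theta_q s.t. sum_q y_q theta_q = y, sum_q theta_q = 1, theta >= 0.

    points is a list of (y_q, c_q) pairs for a single storage asset. With only two
    equality constraints, some optimal solution has at most two positive weights, so
    it suffices to interpolate over pairs of reference levels that bracket y.
    Returns -inf if y lies outside the range of the reference levels (infeasible).
    """
    best = -math.inf
    for ya, ca in points:
        for yb, cb in points:
            if ya <= y <= yb:
                if yb == ya:
                    value = ca  # y coincides with a reference level
                else:
                    w = (y - ya) / (yb - ya)
                    value = (1 - w) * ca + w * cb
                best = max(best, value)
    return best


def greedy_trade(price, y, cap, rate, points):
    """Greedy decision for (16)-(20) with one asset, integer trades, and no trading costs.

    A trade x > 0 buys and injects x units at the spot price; x < 0 withdraws and sells.
    """
    best_x, best_v = None, -math.inf
    for x in range(-rate, rate + 1):
        if 0 <= y + x <= cap:
            v = -price * x + envelope(points, y + x)
            if v > best_v:
                best_x, best_v = x, v
    return best_x, best_v
```

For instance, with reference points $(0, 0)$, $(1, -5)$, and $(2, 10)$, the value at inventory level 1 is 5 rather than $-5$: the weights select the chord between the outer points, in line with the concave-envelope property discussed in §7.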

6.3 Generating the Reference Inventory Vectors and Estimating the Inventory-Specific CFAs

ILSMN generates the reference inventory vectors and estimates its inventory-specific CFAs iteratively: it alternates between (i) the estimation of inventory-specific CFAs using an LSMN approach based on the sets of reference inventory vectors obtained in the previous iteration and (ii) the updating of these sets given the inventory-specific CFAs determined in the prior iteration. A common input to both LSMN and ILSMN is the set $\{F^h_i, (i,h) \in \mathcal{I} \times \mathcal{H}\}$, with $\mathcal{H} := \{1, 2, \ldots, H\}$, of $H$ paths of arrays of forward curves sampled for stages 1 through $I-1$ using Monte Carlo simulation starting from the known stage 0 array of forward curves $F_0$. We present LSMN and ILSMN in this order.

Algorithm 1: LSMN
Inputs: (i) Set of basis functions $\{\phi_{i,b}, (i,b) \in \mathcal{I} \times \mathcal{B}_i\}$. (ii) Set of sample paths of arrays of forward curves $\{F^h_i, (i,h) \in \mathcal{I} \times \mathcal{H}\}$. (iii) Set of reference inventory vector sets $\{\hat{Y}_i, i \in \mathcal{I}\setminus\{0\}\}$ and corresponding set of index sets $\{\mathcal{Q}_i, i \in \mathcal{I}\setminus\{0\}\}$.
Initialization: Define $\gamma_{I-1}$ as a vector of zeros.
for $i = I-2$ down to $0$ do
  for $q \in \mathcal{Q}_{i+1}$ do
    (i) for $h \in \mathcal{H}$ do
          Solve a version of the linear program (16)-(20) formulated at stage $i+1$ and state $(y^q_{i+1}, F^h_{i+1})$ using the inventory-specific CFA weight vector $\gamma_{i+1}$, and use its optimal objective function value discounted by $\delta$ as the inventory-specific continuation function estimate $w_{i+1}(y^q_{i+1}, F^h_{i+1})$.
    (ii) Perform a least squares regression on the set of inventory-specific continuation function estimates $\{w_{i+1}(y^q_{i+1}, F^h_{i+1}), h \in \mathcal{H}\}$ to determine the weights $(\gamma_{i,b,q}, b \in \mathcal{B}_i)$ of the inventory-specific CFA weight vector $\gamma_i$.
Output: Set of inventory-specific CFA weight vectors $\{\gamma_i, i \in \mathcal{I}\}$.

Algorithm 1 summarizes LSMN. Its inputs are the set of basis functions, the set of sample paths of arrays of forward curves, and the set of (assumed known) reference inventory vector sets together with the corresponding set of index sets. For expositional convenience, we define the vector of stage $i$ inventory-specific CFA weights $\gamma_i := (\gamma_{i,b,q}, (b,q) \in \mathcal{B}_i \times \mathcal{Q}_{i+1})$. After setting to 0 the elements of vector $\gamma_{I-1}$, LSMN performs the following steps for each stage $i$ from stage $I-2$ back to stage 0 and each reference inventory vector index $q$ in set $\mathcal{Q}_{i+1}$:

• In step (i), for each sample path index $h \in \mathcal{H}$, it solves a version of the linear program (16)-(20) formulated in stage $i+1$ and state $(y^q_{i+1}, F^h_{i+1})$ based on the known stage $i+1$ inventory-specific CFA weight vector $\gamma_{i+1}$ and uses its optimal objective function value discounted by $\delta$ as the inventory-specific continuation function estimate $w_{i+1}(y^q_{i+1}, F^h_{i+1})$.

• In step (ii) it executes a least squares regression of the latter estimates on the stage $i$ basis functions to determine the weights $(\gamma_{i,b,q}, b \in \mathcal{B}_i)$ of the stage $i$ inventory-specific CFA weight vector $\gamma_i$.

LSMN gives as output the set of inventory-specific CFA weight vectors $\{\gamma_i, i \in \mathcal{I}\}$.
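Step (ii) of Algorithm 1 is an ordinary least squares fit for each reference inventory vector. The sketch below, written without external libraries, solves the normal equations by Gaussian elimination with partial pivoting, which is adequate only for small, well-conditioned bases; a QR- or SVD-based solver would be a safer choice in practice. The function names and the data layout (`basis_values[h][b]` for $\phi_{i,b}(F^h_i)$ and `w_estimates[q][h]` for $w_{i+1}(y^q_{i+1}, F^h_{i+1})$) are hypothetical, and step (i), which requires solving linear programs, is not shown.

```python
def least_squares(features, targets):
    """Ordinary least squares: minimize sum_h (targets[h] - features[h] . gamma)^2."""
    n_b = len(features[0])
    # Normal equations A gamma = rhs with A = X^T X and rhs = X^T t.
    A = [[sum(f[j] * f[k] for f in features) for k in range(n_b)] for j in range(n_b)]
    rhs = [sum(f[j] * t for f, t in zip(features, targets)) for j in range(n_b)]
    for col in range(n_b):  # forward elimination with partial pivoting
        piv = max(range(col, n_b), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n_b):
            m = A[r][col] / A[col][col]
            for k in range(col, n_b):
                A[r][k] -= m * A[col][k]
            rhs[r] -= m * rhs[col]
    gamma = [0.0] * n_b
    for r in range(n_b - 1, -1, -1):  # back substitution
        gamma[r] = (rhs[r] - sum(A[r][k] * gamma[k] for k in range(r + 1, n_b))) / A[r][r]
    return gamma


def lsmn_regression_step(basis_values, w_estimates):
    """Return gamma[q][b] = gamma_{i,b,q}: one regression per reference vector q.

    basis_values[h][b] = phi_{i,b}(F^h_i); w_estimates[q][h] = w_{i+1}(y^q_{i+1}, F^h_{i+1}).
    """
    return [least_squares(basis_values, w_q) for w_q in w_estimates]
```

Note that the regressors are the stage $i$ basis functions, while the targets are stage $i+1$ estimates; this is what turns the fitted weights into a continuation function approximation for stage $i$.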

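Algorithm 2: ILSMN
Inputs: (i) Number of iterations $N$. (ii) Number $\bar{H}$ of sample paths of arrays of forward curves used for inventory vector generation. (iii) Set of basis functions $\{\phi_{i,b}, (i,b) \in \mathcal{I} \times \mathcal{B}_i\}$. (iv) Set of sample paths of arrays of forward curves $\{F^h_i, (i,h) \in \mathcal{I} \times \mathcal{H}\}$. (v) Set of sets of reference inventory vectors $\{\hat{Y}^0_i, i \in \mathcal{I}\setminus\{0\}\}$ and corresponding set of index sets $\{\mathcal{Q}^0_i, i \in \mathcal{I}\setminus\{0\}\}$.
Initialization: Obtain the set of inventory-specific CFA weight vectors $\{\gamma^0_i, i \in \mathcal{I}\}$ by using Algorithm 1 with $\{\phi_{i,b}, (i,b) \in \mathcal{I} \times \mathcal{B}_i\}$, $\{F^h_i, (i,h) \in \mathcal{I} \times \mathcal{H}\}$, $\{\hat{Y}^0_i, i \in \mathcal{I}\setminus\{0\}\}$, and $\{\mathcal{Q}^0_i, i \in \mathcal{I}\setminus\{0\}\}$ as inputs.
for $n = 1$ to $N$ do
  (i) Define $\hat{Y}^n_i := \hat{Y}^{n-1}_i$ and $\mathcal{Q}^n_i := \mathcal{Q}^{n-1}_i$ for each stage $i \in \mathcal{I}\setminus\{0\}$.
  (ii) Sample uniformly at random a subset $\bar{\mathcal{H}}$ of $\bar{H}$ distinct indices from set $\mathcal{H}$ and define $y^h_0 := y_0$ for each $h \in \bar{\mathcal{H}}$.
  (iii) for $i = 1$ to $I-1$ do
          for $h \in \bar{\mathcal{H}}$ do
            (a) Let $x^h_{i-1}$ be an optimal trading vector of a version of the linear program (16)-(20) formulated at stage $i-1$ and state $(y^h_{i-1}, F^h_{i-1})$ using $\gamma^{n-1}_{i-1}$, $\hat{Y}^n_i$, and $\mathcal{Q}^n_i$.
            (b) Define $y^h_i := y^h_{i-1} + \Delta y_{i-1}(x^h_{i-1})$; if $y^h_i \notin \hat{Y}^n_i$, include it in set $\hat{Y}^n_i$ and update set $\mathcal{Q}^n_i$ accordingly.
  (iv) Compute the set of inventory-specific CFA weight vectors $\{\gamma^n_i, i \in \mathcal{I}\}$ by calling Algorithm 1 with $\{\phi_{i,b}, (i,b) \in \mathcal{I} \times \mathcal{B}_i\}$, $\{F^h_i, (i,h) \in \mathcal{I} \times \mathcal{H}\}$, $\{\hat{Y}^n_i, i \in \mathcal{I}\setminus\{0\}\}$, and $\{\mathcal{Q}^n_i, i \in \mathcal{I}\setminus\{0\}\}$ as inputs.
Output: Set of inventory-specific CFA weight vectors $\{\gamma^N_i, i \in \mathcal{I}\}$.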

Algorithm 2 outlines ILSMN. Its inputs are the number of iterations $N$; the number $\bar{H}$ of sampled arrays of forward curves that we use for generating reference inventory vectors at each iteration; the same sets of basis functions and sample paths of arrays of forward curves used by LSMN; and the initial set of sets of reference inventory vectors $\{\hat{Y}^0_i, i \in \mathcal{I}\setminus\{0\}\}$ with its corresponding set of index sets $\{\mathcal{Q}^0_i, i \in \mathcal{I}\setminus\{0\}\}$. To construct each set $\hat{Y}^0_i$ we use the minimal and maximal feasible inventory levels of each storage asset $l$ that can be reached in stage $i$ when starting from the initial inventory vector $y_0$ in stage 0, which we respectively define as $\underline{y}_{i,l} := \max\{0, y_{0,l} - iC^-_l\}$ and $\bar{y}_{i,l} := \min\{\bar{y}_l, y_{0,l} + iC^+_l, (I-i)C^-_l\}$. We include in each such set the two vectors corresponding to these inventory levels as well as the $2L$ inventory vectors with the $l$-th element equal to $\underline{y}_{i,l}$ (respectively, $\bar{y}_{i,l}$) and each other element $l'$ equal to $\bar{y}_{i,l'}$ (respectively, $\underline{y}_{i,l'}$). Each index set $\mathcal{Q}^0_i$ thus includes $2(L+1)$ elements. ILSMN uses Algorithm 1 to obtain the set of inventory-specific CFA weight vectors $\{\gamma^0_i, i \in \mathcal{I}\}$. For each iteration $n$ this method executes the following steps:

• In step (i) it sets the sets of reference inventory vectors for the current iteration equal to the analogous sets obtained at the end of the previous iteration.

• In step (ii) it samples uniformly at random a subset of $\bar{H}$ distinct elements from the set $\mathcal{H}$ of sample path indices and collects them in set $\bar{\mathcal{H}}$, to reduce the computational burden of the next two steps, and defines the stage 0 inventory vector for each such chosen sample path index $h$ as $y^h_0 := y_0$.

• In step (iii) it obtains a trajectory of feasible inventory vectors from stages 1 through $I-1$ starting from $y^h_0$ for each sample path index $h \in \bar{\mathcal{H}}$ and updates the current iteration sets of reference inventory vectors accordingly. Specifically,

– in step (iii)(a) it lets the vector of trade amounts $x^h_{i-1}$ be an optimal solution to the linear program (16)-(20) formulated at stage $i-1$ and state $(y^h_{i-1}, F^h_{i-1})$ using the stage $i-1$ inventory-specific CFA weight vector obtained in the previous iteration, $\gamma^{n-1}_{i-1}$, as well as the stage $i$ set of reference vectors and its index set, $\hat{Y}^n_i$ and $\mathcal{Q}^n_i$;

– in step (iii)(b) it determines the stage $i$ inventory vector $y^h_i$ by applying the vector of trade amounts $x^h_{i-1}$ to the inventory vector $y^h_{i-1}$; if $y^h_i$ is not an element of the set of reference inventory vectors $\hat{Y}^n_i$, it adds it to this set and updates the corresponding index set $\mathcal{Q}^n_i$ by adding to it an index for $y^h_i$ equal to the largest element of $\mathcal{Q}^n_i$ increased by one.

• In step (iv) it obtains the set of inventory-specific CFA weight vectors $\{\gamma^n_i, i \in \mathcal{I}\}$ as the output of Algorithm 1 called with the set of basis functions $\{\phi_{i,b}, (i,b) \in \mathcal{I} \times \mathcal{B}_i\}$, the set of sample paths of arrays of forward curves $\{F^h_i, (i,h) \in \mathcal{I} \times \mathcal{H}\}$, and the updated sets of reference inventory vectors and their index sets $\{\hat{Y}^n_i, i \in \mathcal{I}\setminus\{0\}\}$ and $\{\mathcal{Q}^n_i, i \in \mathcal{I}\setminus\{0\}\}$ as inputs.

ILSMN returns the set of inventory-specific CFA weight vectors $\{\gamma^N_i, i \in \mathcal{I}\}$.
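The construction of the initial reference sets $\hat{Y}^0_i$ described before these steps can be sketched as follows. It transcribes the bounds $\underline{y}_{i,l}$ and $\bar{y}_{i,l}$ and the $2(L+1)$ vectors; the function and parameter names (`y_max` for the capacity $\bar{y}_l$, `c_inj` for $C^+_l$, `c_wdr` for $C^-_l$) are hypothetical, and, as in the definition, the bounds treat each storage asset in isolation.

```python
def inventory_bounds(y0, y_max, c_inj, c_wdr, i, I):
    """Per-asset reachable bounds at stage i (hypothetical argument names).

    lower_l = max{0, y0_l - i * C^-_l};  upper_l = min{ybar_l, y0_l + i * C^+_l, (I - i) * C^-_l}.
    """
    lower = [max(0.0, y0[l] - i * c_wdr[l]) for l in range(len(y0))]
    upper = [min(y_max[l], y0[l] + i * c_inj[l], (I - i) * c_wdr[l]) for l in range(len(y0))]
    return lower, upper


def initial_reference_vectors(lower, upper):
    """The 2(L + 1) initial reference inventory vectors of one stage."""
    L = len(lower)
    vectors = [tuple(lower), tuple(upper)]
    for l in range(L):
        # l-th element at its lower bound, every other element at its upper bound ...
        vectors.append(tuple(lower[k] if k == l else upper[k] for k in range(L)))
        # ... and the mirror image.
        vectors.append(tuple(upper[k] if k == l else lower[k] for k in range(L)))
    return vectors  # list positions 1, ..., 2(L + 1) play the role of the index set Q^0_i
```

For $L \le 2$ some of these $2(L+1)$ vectors coincide; with $L = 2$, for example, the six vectors are the four corners of the reachable rectangle, two of them listed twice.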

6.4 Algorithmic Considerations

For simplicity of exposition, ILSMN uses a fixed number of iterations. We can replace this termination criterion with one that halts this method when the estimated value of the policy that corresponds to the set of inventory-specific CFA weight vectors obtained at a given iteration exceeds the estimated value of the analogous policy obtained at the previous iteration by less than a predefined amount. We employ this alternative stopping rule in our numerical investigation performed in §9.

We can speed up the execution of each iteration of ILSMN by modifying its step (iii)(b) so

that a new inventory vector is added to the current set of reference inventory vectors provided that

its distance from this set, according to some metric, such as the Euclidean one, exceeds a given

threshold. We use this approach in our numerical analysis presented in §9.
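Step (iii)(b) of Algorithm 2, together with the distance-based speed-up just described, can be sketched as a single helper. This is an illustration under assumptions, not the authors' code: the Euclidean metric is the example named in the text, while the function name, the list-based storage of the reference set, and the threshold parameter `tol` are hypothetical.

```python
import math


def add_reference_vector(ref_vectors, indices, y, tol=0.0):
    """Step (iii)(b) of Algorithm 2 with the distance-threshold variant of Section 6.4.

    ref_vectors: current reference inventory vectors (list of tuples), updated in place.
    indices: the matching index set (list of ints), updated in place.
    y is added only if its Euclidean distance to every current reference vector exceeds
    tol; tol = 0 reproduces the exact "not already in the set" test. Returns True if added.
    """
    nearest = min((math.dist(y, v) for v in ref_vectors), default=math.inf)
    if nearest > tol:
        ref_vectors.append(tuple(y))
        indices.append(max(indices, default=0) + 1)  # largest index increased by one
        return True
    return False
```

A larger `tol` keeps the reference sets, and hence the linear programs (16)-(20), smaller at the cost of a coarser CFA.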

The definition of the initial sets of reference inventory vectors directly affects the next-stage inventory vectors that the ILSMN policy considers when making decisions at a given stage and state once ILSMN has been executed. Constraints (17)-(19) restrict these inventory vectors to the convex hull of the next-stage set of reference inventory vectors, and this hull coincides with the convex hull of the initial such set, because every vector that ILSMN adds to a reference set is itself obtained from (16)-(20) and hence lies in this hull. Our choice of these initial sets is tractable because the cardinality of each of them, $2(L+1)$, scales linearly in the number of storage assets $L$. However, the stated convex hull for a given stage coincides with the set of inventory vectors that can be reached at this stage only for three or fewer storage assets, as in our numerical study discussed in §9. With four or more storage assets the ILSMN policy thus considers a strict subset of these inventory vectors when making decisions; that is, it ignores some feasible trade amounts. Addressing this issue by including in each initial set of reference inventory vectors all the vertices of the set of inventory vectors that are reachable in each stage is intractable when the number of storage assets is large, because the number of such vertices grows exponentially in this number. Tractability can be maintained in this case by (i) using modified inventory-specific CFAs that are separable across small subsets of the set of storage assets that form a cover of this set and (ii) applying our proposed approach for defining the initial set of reference inventory vectors to each such subset.
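The claim that the initial reference sets span the reachable inventory vectors only for three or fewer storage assets can be checked numerically, under the assumption that the reachable set in a stage is the box $[\underline{y}_{i,1}, \bar{y}_{i,1}] \times \cdots \times [\underline{y}_{i,L}, \bar{y}_{i,L}]$ with $\underline{y}_{i,l} < \bar{y}_{i,l}$. The convex hull of a subset of a box equals the box exactly when the subset contains all $2^L$ vertices, so the sketch below (a hypothetical helper working on the unit box) compares the $2(L+1)$ initial vectors with those vertices.

```python
from itertools import product


def covers_all_vertices(L):
    """True if the 2(L + 1) initial reference vectors include every vertex of the box.

    Works on the unit box [0, 1]^L, a stand-in for the (assumed box-shaped) reachable set
    [lower_1, upper_1] x ... x [lower_L, upper_L] with lower_l < upper_l.
    """
    lower, upper = [0] * L, [1] * L
    refs = {tuple(lower), tuple(upper)}
    for l in range(L):
        refs.add(tuple(lower[k] if k == l else upper[k] for k in range(L)))
        refs.add(tuple(upper[k] if k == l else lower[k] for k in range(L)))
    vertices = set(product((0, 1), repeat=L))
    return vertices <= refs  # hull equals the box iff every vertex is a reference vector
```

The check succeeds for $L = 1, 2, 3$ and fails from $L = 4$ on, as it must: $2(L+1) < 2^L$ for $L \ge 4$ (for example, 10 reference vectors versus 16 vertices when $L = 4$).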

7. Discussion of Heuristics

In this section we relate the RH, ILSMN, and optimal policies and contrast RH and ILSMN.

As is typical in the ADP literature (Powell 2011, Bertsekas 2012), one can obtain a feasible vector of trade amounts for the maximization in (10) by (i) replacing the unknown continuation function $W_i$ in this model with a tractable CFA $\widehat{W}_i$ and (ii) solving the resulting optimization. An optimal solution to this model is greedy with respect to the CFA $\widehat{W}_i$. Proposition 7.1 states that the RH and ILSMN policies make greedy decisions with respect to particular CFAs that share the piecewise linear concavity of the continuation function $W_i$ presented in Proposition 4.2. We denote these CFAs as $\widehat{W}^{\mathrm{RH}}_i$ for RH and $\widehat{W}^{\mathrm{LSM}}_i$ for ILSMN. We define $\widehat{W}^{\mathrm{RH}}_i(y_{i+1}, F_i)$ as the optimal objective function value of the linear program (12)-(15) formulated at stage $i+1$ with $y_{i+1}$ in lieu of $y_i$. The optimal objective function value of the part of the linear program (16)-(19) that remains after omitting $x_i$ and $r(x_i, s_i)$ and replacing $y_{i,l} + \Delta y_{i,l}(x_i)$, $\gamma_{i,b,q}$, and $\mathcal{Q}_{i+1}$ with $y_{i+1,l}$, $\gamma^N_{i,b,q}$, and $\mathcal{Q}^N_{i+1}$, respectively, defines $\widehat{W}^{\mathrm{LSM}}_i(y_{i+1}, F_i)$.

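Proposition 7.1. The RH and ILSMN policies make greedy decisions with respect to the CFAs $\widehat{W}^{\mathrm{RH}}_i$ and $\widehat{W}^{\mathrm{LSM}}_i$. Moreover, $\widehat{W}^{\mathrm{RH}}_i(\cdot, F_i)$ and $\widehat{W}^{\mathrm{LSM}}_i(\cdot, F_i)$ are piecewise linear concave functions, and $\widehat{W}^{\mathrm{LSM}}_i(\cdot, F_i)$ is the concave envelope of the set of points $\{(y^q_{i+1}, \sum_{b \in \mathcal{B}_i} \phi_{i,b}(F_i)\, \gamma^N_{i,b,q}), q \in \mathcal{Q}^N_{i+1}\}$.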

This result implies that the RH and ILSMN policies belong to a family of policies that includes an

optimal policy for our MDP.
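The concave-envelope statement in Proposition 7.1 can be read directly off the linear program that defines $\widehat{W}^{\mathrm{LSM}}_i$. Writing $c_q$ (shorthand introduced only here) for the inventory-specific CFA value of the $q$-th reference vector, a sketch of the argument starts from

```latex
c_q := \sum_{b \in \mathcal{B}_i} \phi_{i,b}(F_i)\,\gamma^N_{i,b,q}, \qquad
\widehat{W}^{\mathrm{LSM}}_i(y_{i+1}, F_i)
  = \max_{\theta_{i+1} \ge 0} \Big\{ \sum_{q \in \mathcal{Q}^N_{i+1}} c_q\,\theta_{i+1,q} \;:\;
      \sum_{q \in \mathcal{Q}^N_{i+1}} y^q_{i+1}\,\theta_{i+1,q} = y_{i+1},\;
      \sum_{q \in \mathcal{Q}^N_{i+1}} \theta_{i+1,q} = 1 \Big\}.
```

Choosing $\theta_{i+1}$ equal to the $q$-th unit vector shows that this value is at least $c_q$ at $y^q_{i+1}$, and the optimal value of a maximization linear program is concave in its right-hand side, so $\widehat{W}^{\mathrm{LSM}}_i(\cdot, F_i)$ is a concave majorant of the points. Conversely, any concave $g$ with $g(y^q_{i+1}) \ge c_q$ for all $q$ satisfies $g(y_{i+1}) \ge \sum_q \theta_{i+1,q}\, g(y^q_{i+1}) \ge \sum_q \theta_{i+1,q}\, c_q$ for every feasible $\theta_{i+1}$, so it lies above $\widehat{W}^{\mathrm{LSM}}_i(\cdot, F_i)$ on the convex hull of the reference vectors. The linear program value is therefore the smallest concave majorant, that is, the concave envelope.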

RH and ILSMN differ in how they use the model that describes the evolution of the array of forward curves to obtain their respective CFAs: RH ignores this model, whereas ILSMN relies on it. Thus, whereas the RH policy accounts for price uncertainty only via the updated state when making decisions, that is, reactively, the ILSMN policy also uses a CFA that is specified by considering this source of randomness. This reactive nature of the RH policy may seem to put it at a disadvantage with respect to the ILSMN policy. However, it may be advantageous because it (i) insulates the RH policy from any errors made when choosing and calibrating a price model in practice (Secomandi et al. 2015), to which the ILSMN policy is instead exposed, and (ii) avoids the selection of basis functions and the potentially time-consuming process of estimating the resulting CFA associated with ILSMN. In other words, the RH policy is free of typically erroneous stochastic modeling assumptions, is simpler to use, does not involve tunable parameters, and may have a computational edge compared to the ILSMN policy.

8. Dual Bound

In this section we discuss how we estimate a dual bound on the optimal policy value of our MDP that can be used to assess the suboptimality of heuristics for this model, such as RH and ILSMN. This approach combines Monte Carlo simulation of the arrays of forward curves with optimization that uses hindsight knowledge of these arrays but imposes dual penalties on the availability of such information (see Brown et al. 2010 and references therein). In §8.1 we introduce the VFAs that we then employ to obtain dual penalties following a standard approach (Brown et al. 2010). As discussed by Nadarajah et al. (2017), using VFAs rather than CFAs for this purpose has a substantial computational advantage under a particular assumption, which we state in §8.2 and which holds in our numerical study reported in §9. In §8.2 we develop LSML to estimate these VFAs. In §8.3 we discuss the estimation of our dual bound.

8.1 VFAs

We use VFAs that are linear in the inventory vector because they lead to linear dual optimization models. Similar to the CFAs that underlie both LSMN and ILSMN, our VFAs expressed for given inventory vectors are convex combinations of inventory-specific VFAs. Different from these CFAs, besides being linear in the inventory vector, these VFAs are separable across storage assets. They thus rely on inventory-specific VFAs that depend on inventory levels rather than inventory vectors. Each such VFA is a linear combination of basis functions of the array of forward curves. In particular, for each storage asset $l$ we consider two reference inventory levels in each stage $i \in \mathcal{I}\setminus\{0\}$: the smallest and largest inventory levels that can be reached in this stage, $\underline{y}_{i,l}$ and $\bar{y}_{i,l}$, respectively.

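We employ the same basis functions used to specify our CFA but denote the weights of each basis function $\phi_{i,b}$ for these two inventory levels as $\underline{\beta}_{i,b,l}$ and $\bar{\beta}_{i,b,l}$. Their corresponding inventory-specific VFAs are $\sum_{b \in \mathcal{B}_i} \phi_{i,b}(F_i)\, \underline{\beta}_{i,b,l}$ and $\sum_{b \in \mathcal{B}_i} \phi_{i,b}(F_i)\, \bar{\beta}_{i,b,l}$. Taking a convex combination of these two inventory-specific VFAs subject to the constraint that the analogous convex combination of $\underline{y}_{i,l}$ and $\bar{y}_{i,l}$ equals the given inventory level $y_{i,l}$ yields

$$\sum_{b \in \mathcal{B}_i} \phi_{i,b}(F_i) \left[ \frac{\underline{\beta}_{i,b,l}\, \bar{y}_{i,l} - \bar{\beta}_{i,b,l}\, \underline{y}_{i,l}}{\bar{y}_{i,l} - \underline{y}_{i,l}} + \left( \frac{\bar{\beta}_{i,b,l} - \underline{\beta}_{i,b,l}}{\bar{y}_{i,l} - \underline{y}_{i,l}} \right) y_{i,l} \right], \tag{21}$$

which is a linear interpolation of the two given inventory-specific VFAs for the inventory level $y_{i,l}$.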

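The assumed separability of our inventory-specific VFAs implies that the VFA for stage $i$ and state $(y_i, F_i)$ is the sum of these expressions across the storage assets. We define $\underline{\beta}_i$ and $\bar{\beta}_i$ as the vectors $(\underline{\beta}_{i,b,l}, (b,l) \in \mathcal{B}_i \times \mathcal{L})$ and $(\bar{\beta}_{i,b,l}, (b,l) \in \mathcal{B}_i \times \mathcal{L})$, respectively, and $\lambda_{i,b}(y_i, \underline{\beta}_i, \bar{\beta}_i)$ as the sum over the set of storage assets $\mathcal{L}$ of the terms inside the square brackets in (21). We thus express this VFA, which is a linear and separable function of the inventory vector $y_i$, as

$$\sum_{b \in \mathcal{B}_i} \phi_{i,b}(F_i)\, \lambda_{i,b}(y_i, \underline{\beta}_i, \bar{\beta}_i). \tag{22}$$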
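The VFA (21)-(22) is straightforward to evaluate once its weights are known. The sketch below computes $\lambda_{i,b}$ and the VFA for given basis function values; the function names and the argument layout (`beta_lo[b][l]` for $\underline{\beta}_{i,b,l}$, `beta_hi[b][l]` for $\bar{\beta}_{i,b,l}$) are hypothetical, and it assumes $\bar{y}_{i,l} > \underline{y}_{i,l}$ for every asset so that the denominators in (21) are nonzero.

```python
def lam(y, lower, upper, beta_lo, beta_hi):
    """lambda_{i,b}: sum over assets l of the bracketed term in (21), for one basis function b.

    beta_lo[l] and beta_hi[l] are the weights of basis function b at the levels lower[l]
    and upper[l]; assumes upper[l] > lower[l] so that the denominators are nonzero.
    """
    total = 0.0
    for l in range(len(y)):
        span = upper[l] - lower[l]
        intercept = (beta_lo[l] * upper[l] - beta_hi[l] * lower[l]) / span
        slope = (beta_hi[l] - beta_lo[l]) / span
        total += intercept + slope * y[l]
    return total


def vfa(phi, y, lower, upper, beta_lo, beta_hi):
    """VFA (22): sum_b phi_{i,b}(F_i) * lambda_{i,b}(y_i, beta_lo_i, beta_hi_i).

    phi[b] holds phi_{i,b}(F_i); beta_lo[b][l] and beta_hi[b][l] hold the weights.
    """
    return sum(phi[b] * lam(y, lower, upper, beta_lo[b], beta_hi[b]) for b in range(len(phi)))
```

At $y_{i,l} = \underline{y}_{i,l}$ and $y_{i,l} = \bar{y}_{i,l}$ the bracketed term in (21) reduces to $\underline{\beta}_{i,b,l}$ and $\bar{\beta}_{i,b,l}$, respectively, which gives a quick check of an implementation.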


8.2 Estimating the VFAs: LSML

We develop LSML to estimate the sets of inventory-specific VFA weight vectors $\{\underline{\beta}_i, i \in \mathcal{I}\setminus\{0\}\}$ and $\{\bar{\beta}_i, i \in \mathcal{I}\setminus\{0\}\}$. This method relies on the same set of sample paths of arrays of forward curves, indexed by set $\mathcal{H}$, that both LSMN and ILSMN employ. In addition, at each stage other than the first one, LSML obtains value function estimates for a set of inventory vectors that we refer to as evaluation inventory vectors.

Algorithm 3: LSML
Inputs: (i) Set of basis functions $\{\phi_{i,b}, (i,b) \in \mathcal{I} \times \mathcal{B}_i\}$. (ii) Set of sample paths of arrays of forward curves $\{F^h_i, (i,h) \in \mathcal{I} \times \mathcal{H}\}$.
Initialization: (i) Define $\underline{\beta}_I$ and $\bar{\beta}_I$ as vectors of zeros. (ii) Construct the set of sets of evaluation inventory vectors $\{\check{Y}_i, i \in \mathcal{I}\setminus\{0\}\}$.
for $i = I-1$ down to $1$ do
  (i) for $(y_i, h) \in \check{Y}_i \times \mathcal{H}$ do
        Solve the linear program (23) to obtain the inventory-specific value function estimate $v_i(y_i, F^h_i)$.
  (ii) Perform a least squares regression on the set of inventory-specific value function estimates $\{v_i(y_i, F^h_i), (y_i, h) \in \check{Y}_i \times \mathcal{H}\}$ to determine the stage $i$ inventory-specific VFA weight vectors $\underline{\beta}_i$ and $\bar{\beta}_i$.
Output: Sets of inventory-specific VFA weight vectors $\{\underline{\beta}_i, i \in \mathcal{I}\setminus\{0\}\}$ and $\{\bar{\beta}_i, i \in \mathcal{I}\setminus\{0\}\}$.

Algorithm 3 gives a synopsis of LSML. Its inputs are the set of basis functions and the set of sample paths of arrays of forward curves. LSML defines the inventory-specific VFA weight vectors $\underline{\beta}_I$ and $\bar{\beta}_I$ as vectors of zeros and constructs the set of sets of evaluation inventory vectors $\{\check{Y}_i, i \in \mathcal{I}\setminus\{0\}\}$. Each set $\check{Y}_i$ has $L+1$ vectors whose elements are consistent with the inventory levels used to define the inventory-specific VFAs in stage $i$: the $L$ vectors with their $l$-th element equal to the largest inventory level $\bar{y}_{i,l}$ that the $l$-th storage asset can attain at stage $i$ and each other element $l'$ equal to the smallest inventory level $\underline{y}_{i,l'}$ that storage asset $l'$ can attain at stage $i$, as well as the inventory vector with all its elements equal to their corresponding smallest inventory levels. LSML then executes the following steps for each stage $i$ from $I-1$ back to 1:

• In step (i) it obtains the inventory-specific value function estimate $v_i(y_i, F^h_i)$ for each evaluation inventory vector $y_i \in \check{Y}_i$ and sample path index $h \in \mathcal{H}$ based on the known stage $i+1$ inventory-specific VFA weight vectors $\underline{\beta}_{i+1}$ and $\bar{\beta}_{i+1}$. This estimate is the optimal objective function value of the linear program

$$\max_{x_i \in \mathcal{X}(y_i)} \; r(x_i, s^h_i) + \delta \sum_{b \in \mathcal{B}_{i+1}} \mathbb{E}\left[\phi_{i+1,b}(F_{i+1}) \mid F^h_i\right] \lambda_{i+1,b}\left(y_i + \Delta y_i(x_i), \underline{\beta}_{i+1}, \bar{\beta}_{i+1}\right). \tag{23}$$

This model corresponds to the maximization in (10) but with the continuation function $W_i$ replaced by the CFA obtained by taking the expectation of the VFA (22) expressed for stage $i+1$ and state $(y_i + \Delta y_i(x_i), F_{i+1})$ given the array of forward curves $F^h_i$ and discounting it by $\delta$. We make Assumption 8.1 to ensure that this expectation is easy to evaluate.

Assumption 8.1. The expectation $\mathbb{E}[\phi_{i+1,b}(F_{i+1}) \mid F_i]$ can be evaluated exactly for each stage $i$, basis function $\phi_{i+1,b}$, and array of forward curves $F_i$.

Common futures price evolution models, such as term structure models (see, e.g., Secomandi and Seppi 2014, Chapter 4), together with basis functions that are polynomials of futures prices and call/put options on these prices (Nadarajah et al. 2017), satisfy this assumption. It holds in our numerical study discussed in §9.

• In step (ii) it performs a least squares regression on these value function estimates to determine the stage $i$ inventory-specific VFA weight vectors $\underline{\beta}_i$ and $\bar{\beta}_i$.
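The evaluation inventory vectors used in step (i) are simple to construct. A minimal sketch follows; it takes the per-asset bounds $\underline{y}_{i,l}$ and $\bar{y}_{i,l}$ as given (for example, from their definition in §6.3), and the helper name is hypothetical.

```python
def evaluation_vectors(lower, upper):
    """The L + 1 LSML evaluation inventory vectors of one stage (hypothetical helper).

    lower[l] and upper[l] are the smallest and largest inventory levels that asset l
    can reach at the stage.
    """
    L = len(lower)
    # L vectors: asset l at its largest level, every other asset at its smallest level.
    vectors = [tuple(upper[k] if k == l else lower[k] for k in range(L)) for l in range(L)]
    vectors.append(tuple(lower))  # all assets at their smallest levels
    return vectors
```

For two assets with bounds `[0, 1]` and `[4, 3]`, the three evaluation vectors are `(4, 1)`, `(0, 3)`, and `(0, 1)`.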

The output of LSML is the sets of inventory-specific VFA weight vectors $\{\underline{\beta}_i, i \in \mathcal{I}\setminus\{0\}\}$ and $\{\bar{\beta}_i, i \in \mathcal{I}\setminus\{0\}\}$.

8.3 Estimating the Dual Bound

We estimate our dual bound based on the set of $G$ Monte Carlo sample paths of arrays of forward curves for stages 0 through $I-1$, each beginning from the known array of forward curves $F_0$, $\{F^g_i, (i,g) \in \mathcal{I} \times \mathcal{G}\}$, with $\mathcal{G} := \{1, \ldots, G\}$. The dual penalty corresponding to reaching stage $i+1$ from stage $i$ with inventory vector $y_{i+1}$ for the $g$-th sample path in this set is $z_i(y_{i+1}, F^g_i, F^g_{i+1})$. This quantity penalizes knowledge in stage $i$ of the stage $i+1$ array of forward curves $F^g_{i+1}$. We apply the “good” penalty approach (Brown et al. 2010, §2.3) based on the VFA (22) and obtain

$$z_i(y_{i+1}, F^g_i, F^g_{i+1}) = \delta \sum_{b \in \mathcal{B}_{i+1}} \left\{ \phi_{i+1,b}(F^g_{i+1}) - \mathbb{E}\left[\phi_{i+1,b}(F_{i+1}) \mid F^g_i\right] \right\} \lambda_{i+1,b}(y_{i+1}, \underline{\beta}_{i+1}, \bar{\beta}_{i+1}), \tag{24}$$

which is linear in the inventory vector $y_{i+1}$ because of the linearity of $\lambda_{i+1,b}(\cdot, \underline{\beta}_{i+1}, \bar{\beta}_{i+1})$. Assumption 8.1 allows exact evaluation of this penalty. For each sample path of arrays of forward curves indexed by $g \in \mathcal{G}$ we solve the dual linear program

$$\max_{\{(x_i, y'_i), i \in \mathcal{I}\}} \sum_{i \in \mathcal{I}} \delta^i \left[ r(x_i, s^g_i) - z_i\left(y'_i + \Delta y_i(x_i), F^g_i, F^g_{i+1}\right) \right]$$

$$\text{s.t.} \quad y'_0 = y_0,$$

$$y'_i = y'_{i-1} + \Delta y_{i-1}(x_{i-1}), \quad \forall i \in \mathcal{I}\setminus\{0\},$$

$$x_i \in \mathcal{X}(y'_i), \quad \forall i \in \mathcal{I},$$

which is the linear program (12)-(15) formulated at stage 0 and state $(y_0, F_0)$ by using the sample path of arrays of forward curves $\{F^g_i, i \in \mathcal{I}\}$ and subtracting the dual penalty (24) from each term in its objective function. The average of the optimal objective function values of the $G$ dual linear programs so obtained is an unbiased dual bound estimate, that is, an unbiased estimate of an upper bound on the optimal policy value of our MDP.
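The penalty (24) and the final averaging step can be sketched as follows. The sketch assumes that the conditional expectations $\mathbb{E}[\phi_{i+1,b}(F_{i+1}) \mid F^g_i]$ and the values $\lambda_{i+1,b}(y_{i+1}, \underline{\beta}_{i+1}, \bar{\beta}_{i+1})$ are already available; for instance, under the price model (25) of §9, the conditional expectation of a basis function equal to a futures price $F_{i+1,i',m}$ is the current price $F_{i,i',m}$. The dual linear programs themselves are not solved here, the function names are hypothetical, and the standard error returned alongside the mean is a common Monte Carlo companion statistic rather than part of the procedure described above.

```python
import math
import statistics


def dual_penalty(delta, phi_next, phi_expected, lam_values):
    """Penalty (24) for one stage and sample path.

    phi_next[b] = phi_{i+1,b}(F^g_{i+1}); phi_expected[b] = E[phi_{i+1,b}(F_{i+1}) | F^g_i];
    lam_values[b] = lambda_{i+1,b}(y_{i+1}, beta_lo_{i+1}, beta_hi_{i+1}).
    """
    if not len(phi_next) == len(phi_expected) == len(lam_values):
        raise ValueError("inputs must have one entry per basis function")
    return delta * sum((p - e) * lv for p, e, lv in zip(phi_next, phi_expected, lam_values))


def dual_bound_estimate(dual_lp_values):
    """Average of the G dual LP optimal values, with its Monte Carlo standard error."""
    mean = statistics.fmean(dual_lp_values)
    stderr = statistics.stdev(dual_lp_values) / math.sqrt(len(dual_lp_values))
    return mean, stderr
```

A constant basis function never contributes to the penalty, because its realized and expected values coincide; only the "surprise" in the price-dependent basis functions is charged.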

9. Numerical Study

In this section we conduct a numerical investigation of our methods in a natural gas setting. In §9.1

we describe our application. In §9.2 we discuss our results.

9.1 Natural Gas Application

Natural gas served more than one quarter of the 2012 energy consumption in the United States (EIA

2013). The availability and importance of this energy source have been growing with the ongoing

shale boom (Smith 2013). It has been projected that natural gas consumption in North America

will increase by 18% between 2008 and 2030 and be accompanied by a need for 130-210 billion

US dollars worth of natural gas infrastructure, of which eighty percent is for building new natural

gas pipeline systems (INGAA 2009). In our application, natural gas transport and storage assets

are contracts that give merchant companies access to portions of the capacity of interconnected

pipelines and storage facilities. Specifically, merchants own the natural gas that pipeline companies

transport or store on their account. This contractual system describes the status quo of the natural

gas industry in the United States. In particular, we focus on firm contracts that give merchants

guaranteed access to natural gas storage and transport capacity (Sturm 1997).

We consider a realistic network of natural gas transport and storage assets that we created in

conjunction with an energy trading company. This network is made up of eight markets and three

storage facilities corresponding to parts of the Texas Eastern Transmission Company (TETCO),

Transcontinental Gas Pipeline Company (TRANSCO), and Algonquin Gas Transmission (AGT)

systems. Figure 1 illustrates it. Three markets correspond to TRANSCO zones 3, 4, and 6, which

we label TR3, TR4, and TR6; four markets to TETCO zones 1 through 3 and East Louisiana, which

we label TE1, TE2, TE3, and TEELA; and one market to AGT. The Washington and Eminence

storage facilities (WS and ES) are connected to zones TR3 and TR4, respectively. The Bobcat

storage facility (BS) is located at an interconnect station (IS) between TR3 and TEELA.

[Figure 1 diagram: TRANSCO markets TR3, TR4, and TR6; TETCO markets TE1, TE2, TE3, and TEELA; market AGT; storage facilities BS, WS, and ES; interconnect station IS.]
Figure 1: Full network of storage and transport assets used in our numerical study.

An edge that links two nodes in this network indicates the possibility of transporting natural gas between

these nodes (in both directions). A sequence of nodes connected by such edges defines the path of a

possible energy trade. There are 64 transport trades and 48 storage trades in the network displayed

in Figure 1. The merchant storage and transport assets are firm contracts for parts of the capacity

of the pipeline sections and storage facilities in this network. The values for the capacities of these

assets are in Online Appendix B.1.

The cash flow of a trade includes the cost of purchasing or revenue from selling energy at the

prevailing spot price of the market where energy is transacted and two types of variable costs:

marginal costs and in-kind losses. The merchant incurs a marginal cost on each unit of energy

transported, injected, or withdrawn. In-kind losses model the use of energy to fuel the transport

of energy between nodes or the injection or withdrawal of energy into or out of storage, as well as

inefficiencies when transporting energy or modifying the energy inventory. For example, compressor

stations create pressure differentials between natural gas pipeline segments, enabling the transport

of natural gas. Natural gas storage injections and withdrawals are also based on pressure differentials

obtained by the use of pumps. Merchants pay the pipeline company in kind for the fuel used by

the compressors and pumps. The term $\alpha^1_j(s_i)$ in the reward function (8) comprises in-kind losses, costs from purchases, and revenues from sales, whereas the term $\alpha^2_j$ in this function captures the

marginal costs. Online Appendix B.2 provides detailed expressions for these terms.

We use a time horizon equal to thirteen months each subdivided into four weekly periods so

that the number of stages I is fifty-two. We model the risk-neutral dynamics of the array of

forward curves using a multi-market version of a multifactor term structure model that is common

in both the merchant energy trading literature and practice (Cortazar and Schwartz 1994, Clewlow


and Strickland 2000, Blanco et al. 2002, Secomandi and Seppi 2014, Chapter 4, Secomandi et al.

2015, and references therein). We denote by $K$ the number of stochastic factors in this model. We associate the standard normal random variable $Z_k$ with factor $k$. These random variables are mutually independent. Given stages $i'$ and $i$ with $i' > i$, we denote by $\sigma_{k,i,i',m}$ the loading coefficient within time interval $[T_i, T_{i+1})$ on the $k$-th factor for the price of the futures contract with maturity at time $T_{i'}$ for market $m$. For each stage $i \in \mathcal{I}\setminus\{I-1\}$, later stage $i' \in \{i+1, \ldots, I-1\}$, and market $m \in \mathcal{M}$, our price model expressed in a form suitable for Monte Carlo simulation is

$$F_{i+1,i',m} = F_{i,i',m} \exp\left[ -\frac{1}{2}(T_{i+1} - T_i) \sum_{k=1}^{K} \sigma^2_{k,i,i',m} + \sqrt{T_{i+1} - T_i} \sum_{k=1}^{K} \sigma_{k,i,i',m} Z_k \right]. \tag{25}$$

This model captures the seasonality in futures price levels via the initial (time $T_0$) array of forward curves, and the seasonality in price changes through the dependence of the loading coefficients on the trading time $T_i$. Futures price changes are correlated because they are functions of common

factors. We calibrate this model using New York Mercantile Exchange (NYMEX) closing prices

for natural gas futures and basis swaps observed from June 2011 to August 2012. The physical

delivery location for NYMEX natural gas futures contracts is Henry Hub, Louisiana. The basis

swaps are contracts on price differences between Henry Hub and other locations. Online Appendix

B.3 provides the details of our calibration.
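Because basis swaps are contracts on price differences between Henry Hub and other locations, a forward curve for a non-Henry Hub market can be assembled from the two sets of quotes. The sketch below assumes that basis prices are quoted as (location price minus Henry Hub price) and that both curves cover the same maturities; the function name and the numbers in the usage line are hypothetical, and the actual calibration steps are those of Online Appendix B.3.

```python
import numpy as np


def location_forward_curve(henry_hub_futures, basis_swaps):
    """Forward prices at a non-Henry Hub location, one entry per maturity.

    Assumes basis swap prices are quoted as (location price - Henry Hub price).
    """
    hh = np.asarray(henry_hub_futures, dtype=float)
    basis = np.asarray(basis_swaps, dtype=float)
    if hh.shape != basis.shape:
        raise ValueError("futures and basis curves must cover the same maturities")
    return hh + basis


# Hypothetical two-month example: a premium in the first month, a discount in the second.
print(location_forward_curve([3.00, 3.20], [0.15, -0.05]))
```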

Our numerical study deals with four networks that include the eight markets shown in Figure

1 and the following storage assets, respectively: (i) BS; (ii) BS and WS; (iii) BS and ES; and (iv)

BS, WS, and ES. (BS is required in each network to connect markets in TRANSCO to the ones in

TETCO and AGT.) For each such network, we have twelve instances that correspond to choosing

a valuation date that coincides with the first trading date of each month between June 2011 and

May 2012. We set the discount factor for each of these instances based on the following one-year

United States Treasury rates associated with their respective valuation dates: 0.18%, 0.20%, 0.22%,

0.10%, 0.12%, 0.13%, 0.12%, 0.12%, 0.13%, 0.18%, 0.18%, and 0.19%. We use prices of NYMEX

futures and basis swaps observed on the valuation dates of our instances to specify their respective

initial arrays of forward curves. Because these contracts have monthly maturities, we employ the

interpolation approach discussed by Guthrie (2009, §12) to derive the initial weekly forward curves.

There are forty-eight instances in total.
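The instance set can be enumerated with a short sketch. Two ingredients here are stand-ins labeled as assumptions: the monthly-to-weekly step simply repeats each monthly contract price over its four weekly stages (a piecewise-constant rule, not the interpolation of Guthrie 2009, §12), and the weekly discount factor assumes continuous compounding of the one-year Treasury rate over one fifty-second of a year. The function and constant names are hypothetical; the rates and network names come from the text above, listed in valuation-date order from June 2011 to May 2012.

```python
import math
from typing import List, Sequence, Tuple

NETWORKS = ["BS", "BS-WS", "BS-ES", "BS-ES-WS"]
# One-year Treasury rates (%) for the valuation dates June 2011, ..., May 2012.
TREASURY_RATES_PCT = [0.18, 0.20, 0.22, 0.10, 0.12, 0.13,
                      0.12, 0.12, 0.13, 0.18, 0.18, 0.19]
WEEKS_PER_MONTH = 4  # each of the thirteen months is split into four weekly stages


def weekly_from_monthly(monthly_prices: Sequence[float]) -> List[float]:
    """Piecewise-constant stand-in: every week inherits its month's contract price.

    This is NOT the interpolation approach of Guthrie (2009, Section 12).
    """
    return [p for p in monthly_prices for _ in range(WEEKS_PER_MONTH)]


def weekly_discount_factor(annual_rate_pct: float, periods_per_year: int = 52) -> float:
    """Assumes continuous compounding of the one-year rate over one weekly stage."""
    return math.exp(-annual_rate_pct / 100.0 / periods_per_year)


def build_instances() -> List[Tuple[str, int, float]]:
    """(network, valuation-month index 0..11, weekly discount factor) per instance."""
    return [(net, k, weekly_discount_factor(r))
            for net in NETWORKS
            for k, r in enumerate(TREASURY_RATES_PCT)]


print(len(build_instances()))  # four networks times twelve valuation dates
```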

9.2 Results

For each stage $i \in \mathcal{I} \setminus \{I-1\}$, our basis functions for both ILSMN and LSML are the constant 1 and the futures prices in the set $\{F_{i,i'} : i' \in \mathcal{I}, i' > i\}$. This specification is common in the LSM literature


(Longstaff and Schwartz 2001). Nadarajah et al. (2017) use it to value natural gas storage in a single market. It satisfies Assumption 8.1 for model (25) because $\mathbb{E}[F_{i',i'',m} \mid F_{i,i'',m}] = F_{i,i'',m}$ for each $i, i', i'' \in \mathcal{I}$ with $i < i' < i''$ and $m \in \mathcal{M}$. For both ILSMN and LSML, we set the number $H$ of Monte Carlo sample paths of the array of forward curves to 2,000. At each iteration of ILSMN, we use 200 of these paths to generate reference inventory vectors. Employing all 2,000 paths does lead to better policies but increases the computational effort of ILSMN by up to a factor of six. We add a new reference inventory vector to the current set of these vectors only if its Euclidean distance from this set exceeds 1% of the smallest maximal allowed inventory among the storage assets ($\min_{l \in \mathcal{L}} \bar{y}_l$; see §6.4). This choice reduces the computational effort of ILSMN by a factor of 2.5 without appreciably affecting the quality of its policy on average. We terminate ILSMN when the estimated value of the policy obtained by stopping the algorithm at a given iteration exceeds the analogous estimate for the previous iteration by less than 0.1% (see §6.4). We set the number $G$ of Monte Carlo sample paths of the array of forward curves used to evaluate our policies and dual bound to 10,000.
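The regression step implied by this basis specification can be sketched with ordinary least squares. This is an illustration under stated assumptions, not the ILSMN or LSML code: the regression targets (one column per reference inventory vector $q$) are produced in those methods by the procedures of §6, which this excerpt does not show, so the usage below fits synthetic targets. The function names are hypothetical; the fitted coefficients play the role of $\gamma_{i,q,b}$, and the fitted values are the quantities $\sum_b \phi_{i,b}(F_i)\gamma_{i,q,b}$ that appear in Online Appendix A.

```python
import numpy as np


def basis_matrix(futures: np.ndarray) -> np.ndarray:
    """Rows are sample paths; columns are the constant 1 followed by the
    futures prices F_{i,i'} for i' > i (flattened over markets)."""
    futures = np.atleast_2d(np.asarray(futures, dtype=float))
    return np.hstack([np.ones((futures.shape[0], 1)), futures])


def fit_lsm_coefficients(futures: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Least squares coefficients of shape (B, Q): one column per reference
    inventory vector q, one row per basis function b."""
    phi = basis_matrix(futures)
    gamma, *_ = np.linalg.lstsq(phi, targets, rcond=None)
    return gamma


def continuation_estimates(futures_row: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """sum_b phi_{i,b}(F_i) gamma_{i,q,b} for every q at one array of forward curves."""
    return (basis_matrix(futures_row) @ gamma)[0]


# Synthetic check: H = 2,000 paths, 6 futures prices, 3 reference inventory vectors.
rng = np.random.default_rng(1)
F = rng.uniform(2.0, 5.0, size=(2000, 6))
true_gamma = rng.normal(size=(7, 3))
targets = basis_matrix(F) @ true_gamma  # noiseless, so the fit is exact
gamma = fit_lsm_coefficients(F, targets)
print(continuation_estimates(F[0], gamma))
```

Regressing all reference inventory vectors at once (a matrix of targets) reuses one factorization of the basis matrix, which is a natural design when the number of reference vectors per stage is large.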

[Figure 2 plot not reproduced. Panel (a): vertical axis "Percentage Ratio of Estimated Values of ILSMN Policy and Dual Bound" (ticks 94 to 100), horizontal axis "Number of Iterations" (1 to 10), series "January" and "July". Panel (b): vertical axis "Average Number of Reference Inventory Vectors per Stage" (ticks 0 to 150), horizontal axis "Number of Iterations" (2 to 10).]

Figure 2: Percentage ratio of the estimated values of the ILSMN policy and dual bound (panel (a)) and average number of reference inventory vectors per stage (panel (b)) as functions of the number of iterations for the instances with January and July valuation months and the BS-ES-WS network.

ILSMN stops after three to five iterations. Essentially all the improvement in the value of its policy occurs at the second iteration. On average, this method generates between fifty and one hundred twenty reference inventory vectors per stage, and the number of these vectors grows sublinearly with the number of iterations. Panels (a) and (b) of Figure 2


illustrate these features on the instances that pertain to the January and July valuation months

and the BS-ES-WS network.
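The two control rules of ILSMN described in §9.2, the distance filter on new reference inventory vectors and the 0.1% stopping test, can be sketched as follows. The function names are hypothetical and the sketch does not reproduce the algorithm of §6.4; it reads the stopping rule as a relative-improvement test and screens candidates one at a time against the growing set. With the maximal inventories of Online Appendix B.1 (1, 0.40, and 3.30 MMBtu for BS, ES, and WS), the distance threshold is 0.004 MMBtu for networks that contain ES and 0.01 MMBtu for the BS and BS-WS networks.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def distance_to_set(v: Sequence[float], vectors: List[Vector]) -> float:
    """Euclidean distance from v to the closest vector in `vectors` (inf if empty)."""
    if not vectors:
        return math.inf
    return min(math.dist(v, w) for w in vectors)


def add_reference_vectors(current: List[Vector],
                          candidates: List[Sequence[float]],
                          max_inventories: Sequence[float],
                          fraction: float = 0.01) -> List[Vector]:
    """Append each candidate whose distance from the growing set exceeds
    fraction * min_l ybar_l; candidates are screened one at a time."""
    threshold = fraction * min(max_inventories)
    kept = [tuple(v) for v in current]
    for v in candidates:
        if distance_to_set(v, kept) > threshold:
            kept.append(tuple(v))
    return kept


def should_stop(previous_value: float, current_value: float,
                tol: float = 0.001) -> bool:
    """True when the estimated policy value improves by less than tol (0.1%)."""
    return current_value - previous_value < tol * abs(previous_value)


# BS-ES-WS network: threshold = 0.01 * 0.40 = 0.004 MMBtu.
ref = add_reference_vectors([], [(0.5, 0.2, 1.0), (0.501, 0.2, 1.0), (0.6, 0.2, 1.0)],
                            (1.0, 0.40, 3.30))
print(len(ref))  # the second candidate is within 0.004 of the first and is skipped
```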

Table 1: Statistics of the estimated optimality gaps of the RH and ILSMN policies relative to our dual bound estimate, expressed as percentages of it.

| Statistic | BS: RH | BS: ILSMN | BS-WS: RH | BS-WS: ILSMN | BS-ES: RH | BS-ES: ILSMN | BS-ES-WS: RH | BS-ES-WS: ILSMN |
|---|---|---|---|---|---|---|---|---|
| Minimum | 0.13 | 0.44 | 0.45 | 1.68 | 0.16 | 0.80 | 0.45 | 1.80 |
| Average | 0.31 | 1.00 | 0.88 | 2.12 | 0.37 | 1.33 | 0.91 | 2.45 |
| Maximum | 0.62 | 1.81 | 1.73 | 2.85 | 0.75 | 1.93 | 1.78 | 3.11 |

Table 1 reports the minimum, average, and maximum of the estimated percentage optimality gaps of the RH and ILSMN policies, measured with respect to our estimated dual bound, across our instances. Online Appendix B.3 includes these gaps for each instance. For the BS, BS-WS, BS-ES, and BS-ES-WS instances, the ranges of the estimated optimality gaps of the RH policy are 0.13-0.62%, 0.45-1.73%, 0.16-0.75%, and 0.45-1.78%, and their averages are 0.31%, 0.88%, 0.37%, and 0.91%; the corresponding ranges for the ILSMN policy are 0.44-1.81%, 1.68-2.85%, 0.80-1.93%, and 1.80-3.11%, and its averages are 1.00%, 2.12%, 1.33%, and 2.45%. The standard errors of all the estimated optimality gaps are at most 0.40% of their respective dual bound estimates, whose own standard errors do not exceed 0.35% of these estimates. Each of the three reported statistics of the estimated optimality gaps of both the RH and ILSMN policies worsens slightly as the number of storage assets increases. These results indicate that both the RH and ILSMN policies are near optimal, but the former marginally outperforms the latter, roughly by 1 percentage point on average and by at most 3 percentage points. Moreover, our dual bound is close to optimal.
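The gap statistic in Table 1 can be computed from per-path simulation output as sketched below. This rests on assumptions not stated in the text: that the policy values and the perfect information dual bound are evaluated on the same $G$ sample paths, so that per-path differences can be averaged, and that the standard error treats the dual bound estimate in the denominator as fixed. The function name is hypothetical.

```python
import numpy as np


def optimality_gap_pct(dual_samples, policy_samples):
    """Estimated optimality gap and its standard error, both as percentages
    of the estimated dual bound.

    Assumes both arrays hold per-path values on the same G sample paths.
    """
    upper = np.asarray(dual_samples, dtype=float)
    lower = np.asarray(policy_samples, dtype=float)
    if upper.shape != lower.shape:
        raise ValueError("dual bound and policy must be evaluated on the same paths")
    diff = upper - lower
    ub = upper.mean()
    gap = 100.0 * diff.mean() / ub
    se = 100.0 * diff.std(ddof=1) / np.sqrt(diff.size) / ub
    return gap, se


# Hypothetical per-path values on four paths: every gap equals 0.1, i.e., 1% of 10.
print(optimality_gap_pct([10.0, 10.0, 10.0, 10.0], [9.9, 9.9, 9.9, 9.9]))
```

Pairing the two estimators on common sample paths typically makes the standard error of the difference much smaller than the sum of the two individual standard errors.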

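Table 2: Average CPU minutes required to execute our methods on our instances.

| Network | RH: Bound Estimation | ILSMN: Algorithm | ILSMN: Bound Estimation | ILSMN: Total | Dual Bound: LSML | Dual Bound: Estimation | Dual Bound: Total |
|---|---|---|---|---|---|---|---|
| BS | 20 | 6 | 2 | 8 | 4 | 1 | 5 |
| BS-WS | 32 | 32 | 3 | 35 | 13 | 2 | 15 |
| BS-ES | 27 | 14 | 2 | 16 | 11 | 2 | 13 |
| BS-ES-WS | 39 | 54 | 3 | 57 | 25 | 2 | 27 |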

Our computational platform is a 64-bit Dell OptiPlex XE2 Mini Tower with 64 GB of memory, an 8-core Intel Xeon E5-2609 v2 processor, the Ubuntu 14.04.2 LTS operating system, the g++ 4.8 compiler, the LAPACK 3.X library (restricted to a single processor) for least squares regressions, and the


Gurobi 5.0 linear programming solver (Gurobi Optimization 2012). Table 2 reports the computational effort of applying our methods to our instances under this setup. Estimating the value of the RH policy takes on average 20, 32, 27, and 39 CPU minutes on the BS, BS-WS, BS-ES, and BS-ES-WS instances, respectively. On these instances, running the ILSMN algorithm and estimating the value of the resulting policy require on average 6, 32, 14, and 54 and 2, 3, 2, and 3 CPU minutes, respectively, for totals of 8, 35, 16, and 57 CPU minutes. ILSMN is thus faster than RH on the BS and BS-ES instances and slower than RH on the remaining instances, in particular the BS-ES-WS ones. On the BS, BS-WS, BS-ES, and BS-ES-WS instances, obtaining a dual bound estimate takes on average 5, 15, 13, and 27 CPU minutes, of which 4, 13, 11, and 25 are for running LSML and 1, 2, 2, and 2 are for estimating this bound.
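The totals and the RH versus ILSMN comparison in this paragraph follow from simple arithmetic on Table 2, which the short sketch below reproduces. The dictionary layout and function name are illustrative choices; the numbers are those of Table 2.

```python
# Average CPU minutes from Table 2, per network:
# (RH bound estimation, ILSMN algorithm, ILSMN bound estimation,
#  LSML, dual bound estimation)
TABLE2 = {
    "BS": (20, 6, 2, 4, 1),
    "BS-WS": (32, 32, 3, 13, 2),
    "BS-ES": (27, 14, 2, 11, 2),
    "BS-ES-WS": (39, 54, 3, 25, 2),
}


def summarize(table):
    """Totals per method and whether ILSMN (algorithm + evaluation) beats RH."""
    out = {}
    for net, (rh, ils_alg, ils_est, lsml, dual_est) in table.items():
        ilsmn_total = ils_alg + ils_est
        out[net] = {
            "ilsmn_total": ilsmn_total,
            "dual_total": lsml + dual_est,
            "ilsmn_faster_than_rh": ilsmn_total < rh,
        }
    return out


for net, row in summarize(TABLE2).items():
    print(net, row)
```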

We also assess the degree of substitution between the storage-based and transport-alone activities on our instances (see the discussion that follows Proposition 4.1 in §4.1). We approximate the optimal policy value, $V_0(y_0, F_0)$, with our estimate of the value of the RH policy, and we use the estimated value of this policy, restricted to perform only storage-based trades, as a proxy for the value of the optimal storage-based policy, $V^S_0(y_0, F_0)$. We obtain an unbiased estimate of the value of the optimal transport-alone policy, $V^T_0(y_0, F_0)$, by solving the linear program (12)-(15) restricted to transport-alone trades along each of our sample paths of the array of forward curves and accumulating the resulting discounted cash flows. Our proxies for the averages across all our instances of the quantities $\max\{V^S_0(y_0, F_0), V^T_0(y_0, F_0)\}$ and $V^S_0(y_0, F_0) + V^T_0(y_0, F_0)$, expressed as percentages of our stand-in for $V_0(y_0, F_0)$, are 89% and 165%, respectively. These figures suggest that considerable substitution between the storage-based and transport-alone activities occurs on our instances.
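As a consistency check, Proposition 4.1 sandwiches the optimal policy value between the larger and the sum of the two restricted values. Normalizing by the stand-in for $V_0(y_0, F_0)$, the averages reported above read as follows (this checks the averages only, not each instance):

$$\max\{V^S_0(y_0,F_0),\, V^T_0(y_0,F_0)\} \;\le\; V_0(y_0,F_0) \;\le\; V^S_0(y_0,F_0) + V^T_0(y_0,F_0) \quad\Longrightarrow\quad 89\% \;\le\; 100\% \;\le\; 165\%.$$

If the two activities did not interact, the upper bound would hold with equality; the 65-percentage-point excess of the sum over the stand-in for $V_0(y_0, F_0)$ is one way to read the claim of considerable substitution.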

10. Conclusions

We study the trading of energy by merchant companies that operate networks of storage and

transport assets. We formulate this problem as an MDP in which the energy prices are the source of

uncertainty. This MDP is intractable. We thus develop heuristics and estimate both lower and dual

bounds on the optimal operating policy value within Monte Carlo simulation. Our methodological

developments rely on linear optimization. They extend near-optimal ADP methods for a single storage asset, variants of two of which are commercially available.

Specifically, we propose (i) RH, a deterministic reoptimization technique; (ii) ILSMN, an iterative

LSM method; and (iii) a perfect information dual bound. We perform a numerical study based

on realistic natural gas instances. Our dual bound is nearly tight, and both our heuristics lead to policies that are close to optimal. Compared with RH, ILSMN features slightly larger optimality


gaps and requires some tuning, yet it runs faster on our instances with one storage asset and on the BS-ES instances, one of our two sets of instances with two storage assets. Overall, RH combined with our dual bound is a practical approach to solving our model near optimally. Our proposed methods could augment single storage

asset software and are potentially relevant beyond the application considered in this paper.

Acknowledgments

This research is based upon work supported by the National Science Foundation under grant CMMI

No. 1129163. The second author is a Faculty Affiliate of the Scott Institute for Energy Innovation

at Carnegie Mellon University. We thank the review team for their comments, which substantially

improved the quality of this research.

References

Arvesen, Ø., V. Medbø, S. E. Fleten, A. Tomasgard, S. Westgaard. 2013. Linepack storage valuation under price uncertainty. Energy 52(1) 155–164.

Asamov, T., W. B. Powell. 2015. Regularized decomposition of high-dimensional multistage stochastic programs with Markov uncertainty. Working Paper, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA.

Bannister, C. H., R. J. Kaye. 1991. A rapid method for optimization of linear systems with storage. Operations Research 39(2) 220–232.

Bauerle, N., V. Riess. 2014. Gas storage valuation with regime switching. Working Paper, Department of Mathematics, Karlsruhe Institute of Technology, Karlsruhe, Germany.

Bertsekas, D. P. 2012. Dynamic Programming and Optimal Control, Volume II. 4th ed. Athena Scientific, Belmont, MA, USA.

Bertsekas, D. P., J. N. Tsitsiklis. 1996. Neuro-Dynamic Programming. Athena Scientific, Belmont, MA, USA.

Bertsimas, D., J. N. Tsitsiklis. 1997. Introduction to Linear Optimization. Athena Scientific, Belmont, MA, USA.

Blanco, C., D. Soronow, P. Stefiszyn. 2002. Multi-factor models for forward curve analysis: An introduction to principal component analysis. Commodities Now June 76–78.

Boogert, A., C. De Jong. 2008. Gas storage valuation using a Monte Carlo method. The Journal of Derivatives 15(3) 81–98.

Boyabatli, O. 2015. Supply management in multi-product firms with fixed proportions technology. Management Science 61(12) 3013–3031.

Boyabatli, O., P. R. Kleindorfer, S. R. Koontz. 2011. Integrating long-term and short-term contracting in beef supply chains. Management Science 57(10) 1771–1787.

Brown, D. B., J. E. Smith. 2014. Information relaxations, duality, and convex stochastic dynamic programs. Operations Research 62(6) 1394–1415.

Brown, D. B., J. E. Smith, P. Sun. 2010. Information relaxations and duality in stochastic dynamic programs. Operations Research 58(4) 1–17.

Burger, B., B. Graeber, G. Schindlmayr. 2007. Managing Energy Risk: An Integrated View on Power and Other Energy Markets. John Wiley & Sons, Ltd., Chichester, UK.

Camacho, E. F., C. Bordons. 2007. Model Predictive Control. 2nd ed. Springer-Verlag, London, UK.

Carmona, R., M. Ludkovski. 2010. Valuation of energy storage: An optimal switching approach. Quantitative Finance 10(4) 359–374.


Carriere, J. F. 1996. Valuation of the early-exercise price for options using simulations and nonparametric regression. Insurance: Mathematics and Economics 19(1) 19–30.

Chang, S. H., M. C. Fu, J. Hu, S. I. Marcus. 2007. Simulation-based Algorithms for Markov Decision Processes. Springer, London, UK.

Clewlow, L., C. Strickland. 2000. Energy Derivatives: Pricing and Risk Management. Lacima Publications, London, UK.

Cortazar, G., M. Gravet, J. Urzua. 2008. The valuation of multidimensional American real options using the LSM simulation method. Computers & Operations Research 35(1) 113–129.

Cortazar, G., E. S. Schwartz. 1994. The valuation of commodity contingent claims. The Journal of Derivatives 1(4) 27–39.

Denault, M., J. G. Simonato, L. Stentoft. 2013. A simulation-and-regression approach for stochastic dynamic programs with endogenous state variables. Computers & Operations Research 40(11) 2760–2769.

Deng, S., B. Johnson, A. Sogomonian. 2001. Exotic electricity options and the valuation of electricity generation and transmission assets. Decision Support Systems 30(3) 383–392.

Devalkar, S. K., R. Anupindi, A. Sinha. 2011. Integrated optimization of procurement, processing, and trade of commodities. Operations Research 59(6) 1369–1381.

Dixit, A. K., R. S. Pindyck. 1994. Investment Under Uncertainty. Princeton University Press, Princeton, New Jersey, USA.

EIA. 2013. Annual Energy Outlook 2013. Tech. rep., Energy Information Administration (EIA).

EnergyQuants. 2015. StoragePlanner. http://www.energyquant.nl/software/. Accessed on January 7, 2016.

Eydeland, A., K. Wolyniec. 2003. Energy and Power Risk Management. John Wiley & Sons, Inc., Hoboken, NJ, USA.

FEA. 2015. @ENERGY SUITE. https://www.msci.com/documents/1296102/1636401/FEA_Factsheet.pdf/2331641e-81bd-498a-847b-ff600c9dec03. Accessed on January 7, 2016.

Geman, H. 2005. Commodities and Commodity Derivatives: Modelling and Pricing for Agriculturals, Metals, and Energy. John Wiley & Sons, Ltd., Chichester, UK.

Glasserman, P. 2004. Monte Carlo Methods in Financial Engineering. Springer, New York, NY, USA.

Glasserman, P., B. Yu. 2004. Simulation for American options: Regression now or regression later? H. Niederreiter, ed., Monte Carlo and Quasi-Monte Carlo Methods 2002. Springer-Verlag, Berlin, Germany, 213–226.

Gray, J., P. Khandelwal. 2004. Realistic gas storage models II: Trading strategies. Commodities Now September 1–5.

Grillo, S., M. Marinelli, S. Massucco, F. Silvestro. 2012. Optimal management strategy of a battery-based storage system to improve renewable energy integration in distribution networks. IEEE Transactions on Smart Grid 3(2) 950–958.

Gurobi Optimization, Inc. 2012. Gurobi optimizer reference manual version 5.0. Houston, Texas: Gurobi Optimization.

Guthrie, G. 2009. Real Options in Theory and Practice. Oxford University Press, NY, USA.

INGAA. 2009. Natural gas pipeline and storage infrastructure projections through 2030. Tech. rep., Interstate Natural Gas Association of America (INGAA).

Jiang, D. R., W. B. Powell. 2015. Optimal hour-ahead bidding in the real-time electricity market with battery storage using approximate dynamic programming. INFORMS Journal on Computing 27(3) 525–543.

Jiang, D. R., T. V. Pham, W. B. Powell, D. F. Salas, W. R. Scott. 2014. A comparison of approximate dynamic programming techniques on benchmark energy storage problems: Does anything work? Proceedings of the 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning. Orlando, FL, USA, 1–8.

Kaminski, V. 2012. Energy Markets. Risk Publications, London, UK.


Kazaz, B., S. Webster. 2011. The impact of yield-dependent trading costs on pricing and production planning under supply uncertainty. Manufacturing & Service Operations Management 13(3) 404–417.

Kim, J. H., W. B. Powell. 2011. Optimal energy commitments with storage and intermittent supply. Operations Research 59(6) 1347–1360.

KYOS. 2014. KYSTORE Gas storage valuation – Optimal decisions in forward and spot markets. http://www.kyos.com/files/uploads/modeling/KyStore_brochure.pdf. Accessed on January 7, 2016.

Lacima. 2014. Lacima analytics – Valuation & optimisation suite. http://www.lacimagroup.com/page18759/ValuationampOptimisationSuite.aspx. Accessed on January 7, 2016.

Lai, G., F. Margot, N. Secomandi. 2010. An approximate dynamic programming approach to benchmark practice-based heuristics for natural gas storage valuation. Operations Research 58(3) 564–582.

Lai, G., M. X. Wang, S. Kekre, A. Scheller-Wolf, N. Secomandi. 2011. Valuation of storage at a liquefied natural gas terminal. Operations Research 59(3) 602–616.

Lohndorf, N., S. Minner. 2010. Optimal day-ahead trading and storage of renewable energies: An approximate dynamic programming approach. Energy Systems 1(1) 61–77.

Lohndorf, N., D. Wozabal, S. Minner. 2013. Optimizing trading decisions for hydro storage systems using approximate dual dynamic programming. Operations Research 61(4) 810–823.

Longstaff, F. A., E. S. Schwartz. 2001. Valuing American options by simulation: A simple least-squares approach. Review of Financial Studies 14(1) 113–147.

Maragos, S. 2002. Valuation of the operational flexibility of natural gas storage reservoirs. E. Ronn, ed., Real Options and Energy Management Using Options Methodology to Enhance Capital Budgeting Decisions. Risk Publications, London, UK, 431–456.

Markland, R. E. 1975. Analyzing multi-commodity distribution networks having milling-in-transit features. Management Science 21(12) 1405–1416.

Markland, R. E., R. J. Newett. 1976. Production-distribution planning in a large scale commodity processing network. Decision Sciences 7(4) 579–594.

MathWorks. 2014. Natural Gas Storage Valuation. http://www.mathworks.com/matlabcentral/fileexchange/47667-natural-gas-storage-valuation. Accessed on January 7, 2016.

Mazieres, D., A. Boogert. 2013. A radial basis function approach to gas storage valuation. The Journal of Energy Markets 6(2) 19–50.

Merener, N., R. Moyano, N. E. Stier-Moses, P. Watfi. 2016. Optimal trading and shipping of agricultural commodities. Journal of the Operational Research Society 67(1) 114–126.

Midthun, K. T. 2007. Optimization models for liberalized natural gas markets. Ph.D. thesis, Norwegian University of Science and Technology, Trondheim, Norway.

Moazeni, S., B. Powell, A. H. Hajimiragha. 2015. Mean-conditional value-at-risk optimal energy storage operation in the presence of transaction costs. IEEE Transactions on Power Systems 30(3) 1222–1232.

Nadarajah, S., F. Margot, N. Secomandi. 2015. Relaxations of approximate linear programs for the real option management of commodity storage. Management Science 61(12) 3054–3076.

Nadarajah, S., F. Margot, N. Secomandi. 2017. Comparison of least squares Monte Carlo methods with applications to energy real options. European Journal of Operational Research 256(1) 196–204.

Nadarajah, S., N. Secomandi. 2015. Relationship between least squares Monte Carlo and approximate linear programming. Working paper, University of Illinois at Chicago.

Nascimento, J., W. Powell. 2013. An optimal approximate dynamic programming algorithm for concave, scalar storage problems with vector-valued controls. IEEE Transactions on Automatic Control 58(12) 2995–3010.

Pereira, M. V. F., L. M. V. G. Pinto. 1991. Multi-stage stochastic optimization applied to energy planning. Mathematical Programming 52(1) 359–375.

Powell, W. B. 2011. Approximate Dynamic Programming: Solving the Curses of Dimensionality. 2nd ed. John Wiley & Sons, Hoboken, NJ, USA.


Powell, W. B., S. Meisel. 2016. Tutorial on stochastic optimization in energy–Part II: An energy storage illustration. IEEE Transactions on Power Systems 31 1468–1475.

Rømo, F., A. Tomasgard, L. Hellemo, M. Fodstad, B. H. Eidesen, B. Pedersen. 2009. Optimizing the Norwegian natural gas production and transport. Interfaces 39(1) 46–56.

Roncoroni, A., G. Fusai, M. Cummins, eds. 2015. Handbook of Multi-Commodity Markets and Products: Structuring, Trading and Risk Management. John Wiley & Sons, Ltd., Chichester, West Sussex, UK.

Salas, D., W. B. Powell. 2014. Benchmarking a scalable approximation dynamic programming algorithm for stochastic control of multidimensional energy storage problems. Working paper, Princeton Univ.

Scott, T., H. Brown, N. Perry. 2000. Storing true value? Energy and Power Risk Management March 22–25.

Secomandi, N. 2010a. On the pricing of natural gas pipeline capacity. Manufacturing & Service Operations Management 12(3) 393–408.

Secomandi, N. 2010b. Optimal commodity trading with a capacitated storage asset. Management Science 56(3) 449–467.

Secomandi, N. 2015. Merchant commodity storage practice revisited. Operations Research 63(5) 1131–1143.

Secomandi, N., G. Lai, F. Margot, A. Scheller-Wolf, D. Seppi. 2015. Merchant commodity storage and term structure model error. Manufacturing & Service Operations Management 17(3) 302–320.

Secomandi, N., D. Seppi. 2014. Real Options and Merchant Operations of Energy and Other Commodities. Foundations and Trends in Technology, Information and Operations Management 6(3-4) 161–331.

Secomandi, N., M. X. Wang. 2012. A computational approach to the real option management of network contracts for natural gas pipeline transport capacity. Manufacturing & Service Operations Management 14(3) 441–454.

Sinha, K., J. Ji, D. Hansen, S. Murphy, G. Maese, G. Zhu. 2004. Storage strategies. Energy Risk February 62–65.

Smith, J. E., K. F. McCardle. 1999. Options in the real world: Lessons learned in evaluating oil and gas investments. Operations Research 47(1) 1–15.

Smith, J. E. 2005. Alternative approaches for solving real-options problems. Decision Analysis 2(2) 89–102.

Smith, R. 2013. Can gas undo nuclear power? Wall Street Journal January 30.

Sturm, J. F. 1997. Trading Natural Gas: A Nontechnical Guide. PennWell Publishing Company, Tulsa, OK, USA.

Swindle, G. 2014. Valuation and Risk Management in Energy Markets. Cambridge University Press, New York, NY, USA.

Thompson, M. 2012. Natural gas storage valuation, optimization, market and credit risk management. Working paper, Queens Univ.

Topkis, D. M. 1998. Supermodularity and Complementarity. Princeton University Press, Princeton, New Jersey.

Trigeorgis, L. 1996. Real Options: Managerial Flexibility and Strategy in Resource Allocation. The MIT Press, Cambridge, MA, USA.

Tsitsiklis, J. N., B. Van Roy. 2001. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks 12(4) 694–703.

van de Ven, P. M., N. Hegde, L. Massoulie, T. Salonidis. 2013. Optimal control of end-user energy storage. IEEE Transactions on Smart Grid 4(2) 789–797.

Wu, O. Q., D. D. Wang, Z. Qin. 2012. Seasonal energy storage operations with limited flexibility: The price-adjusted rolling intrinsic policy. Manufacturing & Service Operations Management 14(3) 455–471.

Zhou, Y., A. Scheller-Wolf, N. Secomandi, S. Smith. 2013. Managing wind-based electricity generation in the presence of storage and transmission capacity. Working paper, Carnegie Mellon Univ.

Zhou, Y., A. Scheller-Wolf, N. Secomandi, S. Smith. 2015. Electricity trading and negative prices: Storage vs. Disposal. Management Science 62(3) 880–898.


Online Appendix

A Proofs

This section includes the proofs of the results stated in §4 and §7.

Proof of Proposition 4.1. Let $\pi^*$ be an optimal policy to (9), and let $\pi^{S,*}$ and $\pi^{T,*}$ be the storage and transport policies, respectively, that make up $\pi^*$. The upper bounding inequality $V_0(y_0, F_0) \le V^T_0(y_0, F_0) + V^S_0(y_0, F_0)$ follows because $\pi^{S,*}$ and $\pi^{T,*}$ are feasible, but not necessarily optimal, policies to the two versions of model (9) specified with the restrictions $\pi \in \Pi^S$ and $\pi \in \Pi^T$, respectively. Let $\pi^S$ and $\pi^T$ be optimal policies to the versions of (9) specified with the restrictions $\pi \in \Pi^S$ and $\pi \in \Pi^T$, respectively. Since $\pi^S$ and $\pi^T$ are both feasible, but not necessarily optimal, policies to (9), we obtain the lower bounding inequality $V_0(y_0, F_0) \ge \max\{V^T_0(y_0, F_0), V^S_0(y_0, F_0)\}$.

Proof of Proposition 4.2. To prove the claimed characterization, we require the finiteness of the value and continuation functions of SDP (10). Not trading yields zero value. Hence, $V_i(y_i, F_i) \ge 0 > -\infty$, which implies that $W_i(y_{i+1}, F_i) > -\infty$. Further, the value from selling as much as possible at each market at every stage provides an upper bound, that is,

$$V_i(y_i, F_i) \le \sum_{m=1}^{M} C^D_m \left[ s_{i,m} + \sum_{i'=i+1}^{I-1} \mathbb{E}[s_{i',m} \mid F_i] \right] = \sum_{m=1}^{M} \sum_{i'=i}^{I-1} C^D_m F_{i,i',m},$$

which follows from $s_{i,m} \equiv F_{i,i,m}$ and the martingale property of futures prices (Shreve 2004, page 244). Using this inequality and analogous arguments, we have

$$\begin{aligned} W_i(y_{i+1}, F_i) &\equiv \delta\, \mathbb{E}\left[V_{i+1}(y_{i+1}, F_{i+1}) \mid F_i\right] \\ &\le \delta\, \mathbb{E}\left[\sum_{m=1}^{M} \sum_{i'=i+1}^{I-1} C^D_m F_{i+1,i',m} \,\middle|\, F_i\right] \\ &= \delta \sum_{m=1}^{M} \sum_{i'=i+1}^{I-1} C^D_m\, \mathbb{E}\left[F_{i+1,i',m} \mid F_i\right] \\ &= \delta \sum_{m=1}^{M} \sum_{i'=i+1}^{I-1} C^D_m F_{i,i',m} < \infty. \end{aligned}$$

Thus, the value and continuation functions of SDP (10) are finite.

We now proceed by induction to prove the claimed result. At stage $I-1$, for a given $F_{I-1}$, we have

$$V_{I-1}(y_{I-1}, F_{I-1}) = \max_{x_{I-1} \in \mathcal{X}(y_{I-1})} r(x_{I-1}, s_{I-1}).$$

The optimization on the right-hand side of this equality is a linear program with $y_{I-1}$ appearing in the right-hand sides of inequalities (5) and (6), which define the polyhedral feasible set $\mathcal{X}(y_{I-1})$. It thus follows from standard linear programming results (Bertsimas and Tsitsiklis 1997, Ch. 5) that $V_{I-1}(y_{I-1}, F_{I-1})$ is piecewise linear concave in $y_{I-1}$. The continuation function at stage $I-1$ is zero by definition and is therefore piecewise linear concave.

Make the induction hypothesis that the value and continuation functions are piecewise linear concave in their first arguments also for stages $i+1, i+2, \ldots, I-2$. We proceed to prove the claim at stage $i$. From the finiteness of the continuation function in every stage and the induction hypothesis, it is easy to verify that the continuation function is piecewise linear concave in its first argument at stage $i$. This fact and the linearity of the reward function imply the piecewise linear


concavity of $r(x_i, s_i) + W_i(y_i + \Delta y_i(x_i), F_i)$ in $x_i$, which belongs to a convex (polyhedral) feasible set $\mathcal{X}(y_i)$ with $y_i$ in the right-hand sides of two of its defining inequalities. Because maximizing a piecewise linear concave function over a polyhedron can be written as a linear program (via the standard epigraph reformulation), $V_i(y_i, F_i)$ is the optimal objective value of a linear program and is therefore piecewise linear concave in $y_i$. The claimed piecewise linear concavity of the value and continuation functions at all stages for a given array of forward curves follows from the principle of mathematical induction.

Proof of Proposition 7.1. (a) Fix an arbitrary stage $i$ and state $(y_i, F_i)$. We define an extended optimal solution of

$$\max_{x_i \in \mathcal{X}(y_i)} r(x_i, s_i) + \widehat{W}^{RH}_i(y_i + \Delta y_i(x_i), F_i) \tag{1}$$

as its optimal decision vector $x^*_i$, $y_i$, and the optimal solution to the model representing $\widehat{W}^{RH}_i(y_i + \Delta y_i(x^*_i), F_i)$. This extended solution is feasible to (12)-(15) because the constraints in (1) and this model together encode the constraints (13)-(15). Moreover, the optimal objective function value of (1) is the same as the value of the objective function (12) evaluated at this extended optimal solution. Conversely, given an optimal solution $(x^*_{i'}, y^*_{i'})$ for all $i' \in \{i, \ldots, I-1\}$ to (12)-(15), it follows from its constraints that (i) $y^*_{i+1} = y_i + \Delta y_i(x^*_i)$, and (ii) $(x^*_{i'}, y^*_{i'})$ for all $i' \in \{i+1, \ldots, I-1\}$ defines a feasible solution to the model that defines $\widehat{W}^{RH}_i(y_i + \Delta y_i(x^*_i), F_i)$. Thus, the extended optimal solution set of (1) and the optimal solution set of (12)-(15) coincide, and their objective function values are the same.

Finally, since $\widehat{W}^{RH}_i(\cdot, F_i)$ is defined as the optimal objective value of a linear program with $y_{i+1}$ in the right-hand side of one of its constraints, its piecewise linear concavity follows from standard linear programming theory (Bertsimas and Tsitsiklis 1997, Ch. 5).

(b) Since $y_{i+1}$ appears in the right-hand side of the constraints $\Theta_{i+1}(y_{i+1})$ and the objective is maximization, it follows from standard linear programming theory that $\widehat{W}^{LSM}_i(\cdot, F_i)$ is piecewise linear concave in $y_{i+1}$ (Bertsimas and Tsitsiklis 1997, Ch. 5). To see that it is also a concave envelope, we consider the dual of the linear program that defines $\widehat{W}^{LSM}_i(\cdot, F_i)$:

$$\min_{a_0, a_1} \; a_0 + \sum_{l \in \mathcal{L}} a_{1,l}\, y_{i+1,l} \tag{2}$$

$$\text{s.t.} \quad a_0 + \sum_{l \in \mathcal{L}} a_{1,l}\, y^q_{i+1,l} \ge \sum_{b=1}^{B_i} \phi_{i,b}(F_i)\, \gamma_{i,q,b}, \quad q = 1, \ldots, |\check{\mathcal{Y}}_{i+1}|. \tag{3}$$

The variable $a_0$ and the vector of variables $a_1 := (a_{1,l}, l \in \mathcal{L})$ of this linear program can be interpreted as the intercept and slope of a hyperplane. The objective function (2) minimizes the evaluation of this hyperplane at $y_{i+1}$. The constraints (3) ensure that the evaluation of the hyperplane defined by $a_0$ and $a_1$ at any inventory sample $y^q_{i+1}$ is an upper bound on $\sum_{b=1}^{B_i} \phi_{i,b}(F_i)\gamma_{i,q,b}$. At optimality, at least $L+1$ constraints hold as equalities, and the inventory samples corresponding to these tight constraints contain $L+1$ affinely independent vectors. Thus, the hyperplane defined by $a_0$ and $a_1$ is a facet of the concave envelope. From strong duality, it follows that $\widehat{W}^{LSM}_i(\cdot, F_i)$ equals the value of this concave envelope.
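For a single storage asset, the envelope characterization in part (b) can be checked numerically. The sketch assumes that the linear program whose dual is (2)-(3) is the convex-combination problem $\max \sum_q w_q v_q$ subject to $\sum_q w_q = 1$, $\sum_q w_q y^q = y$, and $w \ge 0$, where $v_q$ stands for $\sum_b \phi_{i,b}(F_i)\gamma_{i,q,b}$; this primal is our reconstruction from (2)-(3), since the constraints $\Theta_{i+1}$ are defined in §7, outside this excerpt. With one asset, a basic optimal solution has at most two positive weights, so enumerating pairs of samples solves the problem exactly. The function name `concave_envelope_1d` is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def concave_envelope_1d(samples: Sequence[float], values: Sequence[float],
                        y: float) -> float:
    """Value at y of the concave envelope of the points (samples[q], values[q]).

    Solves max sum_q w_q v_q s.t. sum_q w_q = 1, sum_q w_q y_q = y, w >= 0
    by enumerating single samples and pairs of samples that bracket y.
    """
    pts = list(zip(samples, values))
    best = -float("inf")
    for yq, vq in pts:  # a single sample located exactly at y
        if yq == y:
            best = max(best, vq)
    for (ya, va), (yb, vb) in combinations(pts, 2):
        if ya == yb:
            continue
        if ya > yb:  # order the pair so that ya < yb
            (ya, va), (yb, vb) = (yb, vb), (ya, va)
        if ya <= y <= yb:
            t = (y - ya) / (yb - ya)
            best = max(best, (1 - t) * va + t * vb)
    if best == -float("inf"):
        raise ValueError("y lies outside the range of the inventory samples")
    return best


# The sample at inventory 1 lies below the chord joining inventories 0 and 2,
# so the envelope there equals the chord value 0.0 rather than -5.0.
print(concave_envelope_1d([0.0, 1.0, 2.0], [0.0, -5.0, 0.0], 1.0))
```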

B Additional Details on Natural Gas Application

This section contains more detailed information on the natural gas network instances and the price model calibration that we use for our computational experiments in §9.


B.1 Maximal Inventory Levels and Capacity Values

The storage asset maximal inventory levels ($\bar{y}_l$) for BS, ES, and WS are equal to 1 MMBtu, 0.40 MMBtu, and 3.30 MMBtu, respectively. The injection capacities ($C^+_l$) for BS, ES, and WS are 0.1125 MMBtu/week, 0.025 MMBtu/week, and 0.1375 MMBtu/week, respectively. The withdrawal capacities ($C^-_l$) for BS, ES, and WS are equal to 0.1875 MMBtu/week, 0.10 MMBtu/week, and 0.2875 MMBtu/week, respectively. The receipt and delivery capacities at all the TRANSCO markets (nodes) other than the market TR3 are 0.0375 MMBtu/week and 0.0625 MMBtu/week, respectively; the receipt ($C^R_m$) and delivery ($C^D_m$) capacities at both markets TR3 and TEELA are 0.1125 MMBtu/week and 0.1875 MMBtu/week, respectively; and the receipt and delivery capacities at the AGT market and all the TETCO markets other than the market TEELA are 0.0225 MMBtu/week and 0.0375 MMBtu/week, respectively.

B.2 Reward Function

Below we define the reward function at a given time, omitting the time index for notational simplicity. The marginal cost for transporting energy between node $m$ and node $m'$ is denoted by $c_{m,m'}$. The storage injection and withdrawal marginal costs are denoted by $c^+$ and $c^-$, respectively. The in-kind loss $(1-\eta_{m,m'})/\eta_{m,m'}$, where the transport fuel factor $\eta_{m,m'} \in (0,1]$, occurs when transporting 1 unit of energy from node $m$ to node $m'$. In other words, $1/\eta_{m,m'} \equiv 1 + (1-\eta_{m,m'})/\eta_{m,m'}$ units need to be received at node $m$ in order to deliver 1 unit at node $m'$. We assume that this energy is purchased at the market corresponding to node $m$. The in-kind losses incurred to inject and withdraw, respectively, 1 unit of energy into and out of storage are $(1-\eta^+)/\eta^+$ and $(1-\eta^-)/\eta^-$, where the injection fuel factor $\eta^+ \in (0,1]$ and the withdrawal fuel factor $\eta^- \in (0,1]$ have interpretations analogous to the transport fuel factor. We assume that the energy used for injection or withdrawal is monetized at the spot price of the market closest to storage.
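As a worked instance of these conversions, take the TR3-to-TR4 fuel factor $\eta_{\mathrm{TR3},\mathrm{TR4}} = 0.9823$, which is the Table 1 entry in row TR3 and column TR4 if rows are read as origin nodes $m$. Also take the withdrawal fuel factor $\eta^- = 0.985$ used for each storage asset in this application. Figures are rounded to four decimals:

$$\frac{1}{\eta_{\mathrm{TR3},\mathrm{TR4}}} = \frac{1}{0.9823} = 1 + \frac{1-0.9823}{0.9823} \approx 1 + 0.0180 = 1.0180,$$

$$\frac{1-\eta^-}{\eta^-} = \frac{1-0.985}{0.985} = \frac{0.015}{0.985} \approx 0.0152.$$

That is, about 1.0180 units must be received at TR3 to deliver 1 unit at TR4. Withdrawing 1 unit from storage incurs an in-kind loss of about 0.0152 units, which is monetized at the spot price of the market closest to storage.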

Denote by $p_j$ the path of trade $j$, and by $p_j(u')$ the $u'$-th node in this path. The number of nodes in path $p_j$ is denoted by $|p_j|$. The reward $r(x, s)$ from executing the vector of trade amounts $x$ given the vector of spot prices $s$ is defined as

$$r(x, s) := \sum_{j \in \mathcal{J}} \sum_{u'=1}^{|p_j|} \left(\alpha^1_{j,u'}(s) + \alpha^2_{j,u'}\right) x_j, \tag{4}$$

where

$$\alpha^1_{j,u'}(s) := \begin{cases} -s_{p_j(2)}\,\dfrac{1-\eta^-}{\eta^-}, & \text{if } j \in \mathcal{J}^- \text{ and } u' = 1,\\[1ex] -\dfrac{s_{p_j(1)}}{\eta_{p_j(1),p_j(2)}}, & \text{if } j \in \mathcal{J} \setminus \mathcal{J}^- \text{ and } u' = 1,\\[1ex] -s_{p_j(u')}\,\dfrac{1-\eta_{p_j(u'),p_j(u'+1)}}{\eta_{p_j(u'),p_j(u'+1)}}, & \text{if } 1 < u' < |p_j|,\\[1ex] -s_{p_j(|p_j|-1)}\,\dfrac{1-\eta^+}{\eta^+}, & \text{if } j \in \mathcal{J}^+ \text{ and } u' = |p_j|,\\[1ex] s_{p_j(|p_j|)}, & \text{if } j \in \mathcal{J} \setminus \mathcal{J}^+ \text{ and } u' = |p_j|, \end{cases}$$

and

$$\alpha^2_{j,u'} := \begin{cases} -c^-, & \text{if } j \in \mathcal{J}^- \text{ and } u' = 1,\\ -c_{p_j(1),p_j(2)}, & \text{if } j \in \mathcal{J} \setminus \mathcal{J}^- \text{ and } u' = 1,\\ -c_{p_j(u'),p_j(u'+1)}, & \text{if } 1 < u' < |p_j|,\\ -c^+, & \text{if } j \in \mathcal{J}^+ \text{ and } u' = |p_j|; \end{cases}$$

here, the term $\alpha^1_{j,u'}(s)$ includes the per-unit cost incurred or revenue earned from buying or selling energy, respectively, and the corresponding monetized in-kind losses when executing trade $j$. The term $\alpha^2_{j,u'}$ represents the marginal cost incurred when executing trade $j$. The first case in the definition of $\alpha^1_{j,u'}(s)$ accounts for the withdrawal fuel cost in a withdrawal path. The second accounts for the purchase plus transport fuel cost between the first two nodes in transport and injection paths. The third accounts for the monetized transport fuel cost for transport between nodes, not including the first and last node, in any path. The fourth accounts for the injection fuel cost in an injection path, and the fifth for the revenue from selling energy in withdrawal and transport paths.
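To make the case analysis concrete, here is a minimal Python sketch of reward (4). It assumes each trade is given by its node path plus two flags marking membership in $\mathcal{J}^-$ (withdrawal paths) and $\mathcal{J}^+$ (injection paths). The class and function names (`Trade`, `CostParams`, `alpha1`, `alpha2`, `reward`) are hypothetical. The storage defaults are the per-asset values reported for this application.

Three assumptions go beyond the text. First, $\alpha^2_{j,|p_j|}$ is taken to be zero for paths that do not end in storage, a case the definition above does not list. Second, the ("TR3", "BS") entries in the usage example reuse the BS-TR3 values of Tables 1 and 3, since those tables list one entry per node pair. Third, the spot prices are made up.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    path: list[str]            # p_j: node names in the order the energy moves
    withdrawal: bool = False   # True if j is in J^- (withdrawal path, starts at storage)
    injection: bool = False    # True if j is in J^+ (injection path, ends at storage)


@dataclass
class CostParams:
    eta: dict                  # transport fuel factors eta_{m,m'}, keyed by (m, m')
    c: dict                    # transport marginal costs c_{m,m'} ($/MMBtu), keyed by (m, m')
    eta_in: float = 1.0        # injection fuel factor eta^+
    eta_out: float = 0.985     # withdrawal fuel factor eta^-
    c_in: float = 0.02         # injection marginal cost c^+ ($/MMBtu)
    c_out: float = 0.01        # withdrawal marginal cost c^- ($/MMBtu)


def alpha1(t: Trade, s: dict, prm: CostParams, u: int) -> float:
    """alpha^1_{j,u'}(s) for node position u' = u (1-based); p_j(u') is t.path[u - 1]."""
    p, n = t.path, len(t.path)
    if u == 1 and t.withdrawal:      # case 1: withdrawal fuel, priced at p_j(2)
        return -s[p[1]] * (1 - prm.eta_out) / prm.eta_out
    if u == 1:                       # case 2: purchase at p_j(1) plus first-leg transport fuel
        return -s[p[0]] / prm.eta[(p[0], p[1])]
    if u < n:                        # case 3: transport fuel bought at an interior node
        e = prm.eta[(p[u - 1], p[u])]
        return -s[p[u - 1]] * (1 - e) / e
    if t.injection:                  # case 4: injection fuel, priced at p_j(|p_j| - 1)
        return -s[p[n - 2]] * (1 - prm.eta_in) / prm.eta_in
    return s[p[n - 1]]               # case 5: sale at the last node


def alpha2(t: Trade, prm: CostParams, u: int) -> float:
    """alpha^2_{j,u'}: per-unit marginal cost at node position u' = u (1-based)."""
    p, n = t.path, len(t.path)
    if u == 1 and t.withdrawal:
        return -prm.c_out
    if u == 1:
        return -prm.c[(p[0], p[1])]
    if u < n:
        return -prm.c[(p[u - 1], p[u])]
    if t.injection:
        return -prm.c_in
    return 0.0  # last node of a path not ending in storage: assumed zero (case not listed)


def reward(trades: list, x: list, s: dict, prm: CostParams) -> float:
    """r(x, s) = sum_j sum_{u'} (alpha^1_{j,u'}(s) + alpha^2_{j,u'}) x_j, as in (4)."""
    total = 0.0
    for t, xj in zip(trades, x):
        per_unit = sum(alpha1(t, s, prm, u) + alpha2(t, prm, u)
                       for u in range(1, len(t.path) + 1))
        total += per_unit * xj
    return total


# Hypothetical spot prices ($/MMBtu). ("TR3", "BS") reuses the BS-TR3 entries of
# Tables 1 and 3 (an assumption: the tables list one entry per node pair).
prm = CostParams(eta={("TR3", "TR4"): 0.9823, ("TR3", "BS"): 1.0},
                 c={("TR3", "TR4"): 0.02253, ("TR3", "BS"): 0.05})
s = {"TR3": 3.0, "TR4": 3.2}
trades = [Trade(["TR3", "TR4"]),                   # transport TR3 -> TR4
          Trade(["BS", "TR3"], withdrawal=True),   # withdraw from BS, sell at TR3
          Trade(["TR3", "BS"], injection=True)]    # buy at TR3, inject into BS
for t in trades:
    print(t.path, round(reward([t], [1.0], s, prm), 4))
```

Because (4) is linear in $x$, the sketch computes one per-unit coefficient per trade and scales it by $x_j$. A storage model would pair this reward with the inventory and capacity constraints, which are omitted here.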

For each storage asset $l$, the injection and withdrawal fuel factors ($\eta^+_l$ and $\eta^-_l$) equal 1 and 0.985, respectively, and the injection and withdrawal commodity charges ($c^+_l$ and $c^-_l$) equal \$0.02/MMBtu and \$0.01/MMBtu, respectively (in the natural gas industry, an in-kind loss is known as a fuel loss and a marginal cost as a commodity charge). The parameters of the transport assets are their fuel factors ($\eta_{m,m'}$), given in Tables 1 and 2, and their commodity charges ($c_{m,m'}$), given in Table 3.

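Table 1: Transport fuel factors ($\eta_{m,m'}$) for the months April to November.

| | BS | ES | WS | TR3 | TR4 | TR6 | TEELA | TE1 | TE2 | TE3 | AGT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BS | - | - | - | 1 | - | - | 1 | - | - | - | - |
| ES | | - | - | - | 1 | - | - | - | - | - | - |
| WS | | | - | 1 | - | - | - | - | - | - | - |
| TR3 | | | | - | 0.9823 | 0.9638 | - | - | - | - | - |
| TR4 | | | | | - | 0.9672 | - | - | - | - | - |
| TR6 | | | | | | - | - | - | - | - | - |
| TEELA | | | | | | | - | 0.9557 | 0.9406 | 0.9305 | - |
| TE1 | | | | | | | | - | 0.9632 | 0.9531 | - |
| TE2 | | | | | | | | | - | 0.9602 | - |
| TE3 | | | | | | | | | | - | 0.9907 |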

Table 2: Transport fuel factors ($\eta_{m,m'}$) for the months December to March.

| | BS | ES | WS | TR3 | TR4 | TR6 | TEELA | TE1 | TE2 | TE3 | AGT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BS | - | - | - | 1 | - | - | 1 | - | - | - | - |
| ES | | - | - | - | 1 | - | - | - | - | - | - |
| WS | | | - | 1 | - | - | - | - | - | - | - |
| TR3 | | | | - | 0.9823 | 0.9638 | - | - | - | - | - |
| TR4 | | | | | - | 0.9672 | - | - | - | - | - |
| TR6 | | | | | | - | - | - | - | - | - |
| TEELA | | | | | | | - | 0.9523 | 0.9316 | 0.9179 | - |
| TE1 | | | | | | | | - | 0.956 | 0.9423 | - |
| TE2 | | | | | | | | | - | 0.952 | - |
| TE3 | | | | | | | | | | - | 0.99 |

Table 3: Commodity charges ($c_{m,m'}$, \$/MMBtu).

| | BS | ES | WS | TR3 | TR4 | TR6 | TEELA | TE1 | TE2 | TE3 | AGT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BS | - | - | - | 0.05 | - | - | 0.0103 | - | - | - | - |
| ES | | - | - | - | - | - | - | - | - | - | - |
| WS | | | - | - | - | - | - | - | - | - | - |
| TR3 | | | | - | 0.02253 | 0.04454 | - | - | - | - | - |
| TR4 | | | | | - | 0.04027 | - | - | - | - | - |
| TR6 | | | | | | - | - | - | - | - | - |
| TEELA | | | | | | | - | 0.0353 | 0.0762 | 0.1044 | - |
| TE1 | | | | | | | | - | 0.0659 | 0.0941 | - |
| TE2 | | | | | | | | | - | 0.0743 | - |
| TE3 | | | | | | | | | | - | 0.013 |

B.3 Calibration of Forward Curve Model

Our data set includes 1 year and 3 months of natural gas closing futures prices for Henry Hub, Louisiana, and basis swaps from June 2011 to August 2012 for each of the 8 markets in Figure 1. From these data we created monthly forward curves of futures prices for these 8 markets. To calibrate price model (25), we needed to use this monthly data set to determine the number of factors $K$ and obtain loading coefficients on a weekly time scale. We explain below our approach to obtain each of these quantities.

We first estimated monthly sample variance-covariance matrices of the daily log futures price returns across maturities and markets. We then performed a principal component analysis of these matrices and estimated the monthly loading coefficients accordingly (see Blanco et al. 2002 and Secomandi et al. 2015 for details). We chose the number of factors $K$ equal to 6 because this is the smallest value that explains more than 99% of the total observed variance in each of our monthly data sets. The estimated loading coefficients are available at http://selvan.people.uic.edu.
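The factor-selection rule above can be sketched as follows. The sketch assumes numpy and, for each month, a (days × contracts) array of futures prices whose columns stack maturities and markets. The function names are hypothetical. Scaling eigenvectors by the square roots of their eigenvalues to obtain loading coefficients is a common PCA convention assumed here, not necessarily the exact procedure of Blanco et al. (2002) or Secomandi et al. (2015). Because the explained-variance share grows with $K$, the smallest $K$ that clears 99% in every month is the largest of the per-month minima.

```python
import numpy as np


def log_return_cov(prices: np.ndarray) -> np.ndarray:
    """Sample covariance of daily log returns; prices has shape (days, contracts)."""
    returns = np.diff(np.log(prices), axis=0)        # daily log price returns
    return np.cov(returns, rowvar=False)             # contracts x contracts


def min_factors(cov: np.ndarray, threshold: float = 0.99) -> int:
    """Smallest K whose top-K principal components explain more than `threshold` of the variance."""
    eigvals = np.linalg.eigvalsh(cov)[::-1]          # eigenvalues, largest first
    explained = np.cumsum(eigvals) / eigvals.sum()   # cumulative explained-variance share
    return int(np.argmax(explained > threshold)) + 1


def choose_num_factors(covs, threshold: float = 0.99) -> int:
    """Smallest K that clears the threshold for every monthly covariance matrix."""
    return max(min_factors(cov, threshold) for cov in covs)


def loadings(cov: np.ndarray, k: int) -> np.ndarray:
    """Top-k eigenvectors scaled by the square roots of their eigenvalues, shape (contracts, k)."""
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


# Toy check with hypothetical diagonal covariance matrices for two "months".
print(choose_num_factors([np.diag([5.0, 3.0, 1.0, 0.01]), np.diag([10.0, 0.05])]))  # 3
```

In the toy check, the first matrix needs three components (9/9.01 ≈ 99.9%) and the second needs one (10/10.05 ≈ 99.5%). The common choice is therefore 3.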

Obtaining weekly loading coefficients from monthly loading coefficients at each market can be done in several approximate ways; we choose one approach. The first issue we face is the lack of a month $i$ loading coefficient for trading within this month, since the first maturity is for month $i+1$. We address this issue by setting the weekly loading coefficients for maturities at weeks within month $i$ equal to the month $i$ loading coefficient with maturity in month $i+1$. The second issue is that this approach does not work for the last month, where we do not have a loading coefficient with a prompt-month maturity. In this case, we set the loading coefficients for maturities in weeks within the last month equal to the monthly loading coefficient at the penultimate trading month with maturity in the last month. Finally, for the remaining cases, we set the weekly loading coefficient equal to the monthly loading coefficient whose trading month and maturity month contain the trading week and maturity week, respectively.
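The three mapping rules above amount to a small lookup, sketched below in Python. The sketch makes three assumptions: months are indexed 0, ..., `last`; monthly loading coefficients are stored in a dictionary keyed by (trading month, maturity month), with the maturity month strictly after the trading month; and a list `week_month` gives the month that contains each week. All names and the scalar loadings are hypothetical placeholders. In practice each entry would hold the $K$ factor loadings for one market.

```python
def source_months(i: int, k: int, last: int) -> tuple[int, int]:
    """(trading month, maturity month) of the monthly loading used for a weekly loading
    whose trading week lies in month i and whose maturity week lies in month k."""
    if k < i:
        raise ValueError("maturity week precedes trading week")
    if k == i and i == last:
        return (last - 1, last)   # last month: penultimate trading month, last-month maturity
    if k == i:
        return (i, i + 1)         # within-month maturity: use the month i+1 maturity
    return (i, k)                 # otherwise: the months containing the two weeks


def weekly_loading(monthly: dict, week_month: list[int], t: int, tau: int, last: int):
    """Weekly loading for trading week t and maturity week tau (tau >= t)."""
    return monthly[source_months(week_month[t], week_month[tau], last)]


# Hypothetical example: three months of four weeks each, scalar loadings.
week_month = [0] * 4 + [1] * 4 + [2] * 4
monthly = {(0, 1): 0.30, (0, 2): 0.25, (1, 2): 0.28}
print(weekly_loading(monthly, week_month, 0, 2, last=2))   # 0.3
```

Weeks 0 and 2 both fall in month 0, so the first rule applies and the lookup returns the month 0 loading with maturity in month 1.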

B.4 Estimated Optimality Gaps

Table 4 displays the estimated percentage optimality gaps of the RH and ILSMN policies measured with respect to our estimated dual bound for each instance.


Table 4: Estimated optimality gaps of the RH and ILSMN policies with respect to our dual bound estimate, reported as percentages of this estimate.

| Valuation month | BS RH | BS ILSMN | BS-WS RH | BS-WS ILSMN | BS-ES RH | BS-ES ILSMN | BS-ES-WS RH | BS-ES-WS ILSMN |
|---|---|---|---|---|---|---|---|---|
| January | 0.13 | 0.56 | 0.94 | 1.68 | 0.16 | 0.80 | 0.92 | 1.80 |
| February | 0.20 | 0.44 | 0.57 | 1.71 | 0.24 | 0.91 | 0.59 | 2.55 |
| March | 0.26 | 1.81 | 0.61 | 1.88 | 0.31 | 1.93 | 0.65 | 2.00 |
| April | 0.16 | 1.30 | 0.45 | 2.85 | 0.18 | 1.70 | 0.45 | 3.11 |
| May | 0.23 | 0.67 | 0.57 | 1.81 | 0.29 | 1.22 | 0.59 | 2.04 |
| June | 0.55 | 1.18 | 1.28 | 2.34 | 0.68 | 1.42 | 1.39 | 2.71 |
| July | 0.62 | 1.05 | 1.73 | 2.60 | 0.75 | 1.34 | 1.78 | 2.96 |
| August | 0.57 | 0.90 | 1.59 | 2.34 | 0.68 | 1.25 | 1.63 | 2.79 |
| September | 0.40 | 0.69 | 1.32 | 2.02 | 0.49 | 1.27 | 1.31 | 2.42 |
| October | 0.24 | 0.59 | 1.06 | 1.90 | 0.33 | 1.28 | 1.25 | 2.42 |
| November | 0.47 | 0.82 | 1.15 | 1.97 | 0.55 | 1.16 | 1.22 | 2.27 |
| December | 0.27 | 0.55 | 0.96 | 1.70 | 0.31 | 0.99 | 0.98 | 1.89 |
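Reading the Table 4 caption as "gap = (dual bound estimate − policy value estimate) / dual bound estimate × 100" (an interpretation, since the formula is not written out here), the arithmetic is a one-liner. The sketch below also checks the BS-network columns of Table 4. The function name and the policy and bound values in the first call are hypothetical; the two lists are copied from Table 4.

```python
def optimality_gap_pct(policy_value: float, dual_bound: float) -> float:
    """Gap of a policy's estimated value relative to a dual (upper) bound estimate, in percent."""
    if dual_bound <= 0:
        raise ValueError("dual bound must be positive for a relative gap")
    return 100.0 * (dual_bound - policy_value) / dual_bound


print(optimality_gap_pct(policy_value=99.5, dual_bound=100.0))  # 0.5

# BS-network columns of Table 4, January to December.
bs_rh = [0.13, 0.20, 0.26, 0.16, 0.23, 0.55, 0.62, 0.57, 0.40, 0.24, 0.47, 0.27]
bs_ilsmn = [0.56, 0.44, 1.81, 1.30, 0.67, 1.18, 1.05, 0.90, 0.69, 0.59, 0.82, 0.55]
print(all(r < g for r, g in zip(bs_rh, bs_ilsmn)))  # True
print(max(bs_rh))  # 0.62
```

For the BS network, the RH gap is below the ILSMN gap in every valuation month, and the largest RH gap (0.62%) occurs in July.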

References

Bertsimas, D., J. N. Tsitsiklis. 1997. Introduction to Linear Optimization. Athena Scientific, Belmont, MA, USA.

Blanco, C., D. Soronow, P. Stefiszyn. 2002. Multi-factor models for forward curve analysis: An introduction to principal component analysis. Commodities Now June 76–78.

Secomandi, N., G. Lai, F. Margot, A. Scheller-Wolf, D. Seppi. 2015. Merchant commodity storage and term structure model error. Manufacturing & Service Operations Management 17(3) 302–320.

Shreve, S. 2004. Stochastic Calculus for Finance II: Continuous-Time Models. Springer, New York, NY, USA.
