
Second-Order Cone Programming for P-Spline Simulation Metamodeling

Yu Xia ∗ Farid Alizadeh†

April 26, 2015

Abstract

This paper approximates simulation models by B-splines with a penalty on high-order finite differences of the coefficients of adjacent B-splines. The penalty prevents overfitting. The simulation output is assumed to be nonnegative. The nonnegative spline simulation metamodel is cast as a second-order cone programming model, which can be solved efficiently by modern optimization techniques. The method is implemented in MATLAB/GNU Octave.

1 Introduction

People use computer simulation to study complex systems that prohibit analytical evaluation, in order to gain a basic understanding of the system, to find robust decisions or policies, or to compare different decisions or policies [20]. Simulation is applied in various areas [4, 23, 27], and it is considered one of the three most important operations research techniques [22]. Let y represent the response and x represent the input of a system. A simulation model can then be written as

y = f(x).

In situations where the systems are so complex that even their valid simulation models cannot be evaluated in reasonable time, metamodels, or models of models [19], are constructed to approximate the simulation models. Advantages of the simulation metamodel include “model simplification, enhanced exploration and interpretation of the model, generalization to other models of the same type, sensitivity analysis, optimization, answering inverse questions, and providing the researcher with a better understanding of the behaviour of the system under study and the interrelationships among the variables” [15].

∗ Lakehead University, Canada. Research supported in part by NSERC.

† Management Science and Information Systems, Rutgers University, NJ, USA.

Parametric polynomial response surface approximation is the most popular technique for building metamodels [5]. Response surface methodology, introduced by Box and Wilson [6], aims to determine and quantify the unknown or overly complex relationship between the response variables and the experimental factors assumed to influence the response: a mathematical model is constructed to fit the data collected from a series of experiments, and the optimal settings of the experimental factors are determined [7, 17, 25]. Usually the mathematical model is a first- or second-order polynomial, called a response surface.

By the Weierstrass approximation theorem, every continuous function on a closed bounded interval can be uniformly approximated as closely as desired by a polynomial. Polynomials are easy to compute and have continuous derivatives of all orders. On the other hand, polynomials are inflexible: their values on the complex plane are determined by an arbitrarily small set [35, Theorem 3.23]; they oscillate increasingly as their order grows, while a high order is required for suitable accuracy in polynomial approximation; the Runge phenomenon [31] is the classic example of divergent polynomial interpolation. A polynomial that fits the data nicely near one data point may display repulsive features at parts of the curve not close to that particular data point.

Approximation by splines (smooth piecewise polynomials) overcomes the inflexibility of polynomial approximation. In practice, B-splines, which were first thoroughly studied in [33], are widely used in approximation because of their good properties [10]. In particular, compared with representations by splines in the truncated power basis, defined as

$$\{(x - t_i)_+^j / j! : j = 1, \dots, k\}$$

for node $t_i$, B-spline representations are relatively well-conditioned and involve fewer basis functions computationally. Let $t \equiv (t_i)$ be a nondecreasing sequence. The $i$th (normalized) B-spline basis function of order 1 for the knot sequence $t$ is defined as follows:

$$B_{i1t}(x) \equiv \chi_i(x) \equiv \begin{cases} 1, & t_i \le x < t_{i+1}, \\ 0, & \text{otherwise.} \end{cases}$$

When they can be inferred from the context, the knots $t$ and the variable $x$ are omitted in the notation for B-spline representations. Denote

$$\omega_{ik}(x) \equiv \begin{cases} \dfrac{x - t_i}{t_{i+k-1} - t_i}, & t_i \ne t_{i+k-1}, \\ 0, & \text{otherwise.} \end{cases}$$

For $k > 1$, the $i$th B-spline basis function of order $k$ for the knot sequence $t$ can be obtained recursively by

$$B_{ik} \equiv \omega_{ik} B_{i,k-1} + (1 - \omega_{i+1,k}) B_{i+1,k-1}. \tag{1}$$
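The recursion (1) can be sketched directly in code. The following is a minimal illustration, not the authors' MATLAB/GNU Octave implementation; the function names `omega` and `bspline` are hypothetical, and the knot list is indexed from 0 rather than from 1 as in the text.

```python
def omega(i, k, x, t):
    """omega_{ik}(x) = (x - t_i) / (t_{i+k-1} - t_i); 0 if the knots coincide."""
    denom = t[i + k - 1] - t[i]
    return (x - t[i]) / denom if denom != 0 else 0.0


def bspline(i, k, x, t):
    """Order-k B-spline basis function B_{ik}(x) for knot list t (0-based i).

    Order 1 is the indicator of [t_i, t_{i+1}); higher orders follow
    recursion (1). The call needs knots t[i], ..., t[i + k].
    """
    if k == 1:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = omega(i, k, x, t) * bspline(i, k - 1, x, t)
    right = (1.0 - omega(i + 1, k, x, t)) * bspline(i + 1, k - 1, x, t)
    return left + right


# Example (assumed data): equally spaced knots 0, 1, ..., 7 and order four.
knots = [float(j) for j in range(8)]
values = [bspline(i, 4, 3.5, knots) for i in range(4)]
```

On the interval [3, 4) the four cubic basis functions above are the only active ones, so their values should sum to one. This plain recursion is for exposition only; the de Boor algorithm [9] is the usual choice for efficient evaluation.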

The spline basis function $B_{ikt}$ depends only on the knots $t_i, \dots, t_{i+k}$. It is positive on the interval $(t_i, t_{i+k})$ and is zero elsewhere. The Curry-Schoenberg theorem [8] describes how to construct a B-spline basis for the linear space of piecewise polynomial functions satisfying predefined continuity conditions based on the multiplicity of the knots. This property simplifies the approximation of functions with a required degree of smoothness, compared with truncated power spline approximation, where additional constraints on smoothness need to be included in the model. The de Boor algorithm [9] is a well-conditioned yet efficient way of evaluating B-splines.

P-Spline approximation. To fit the metamodel to data collected from experiments, i.e., to find the parameters of the B-spline approximation, we consider P-spline regression [14]. The objective function of a P-spline regression combines B-splines with a penalty on high-order finite differences of the coefficients of adjacent B-splines. Similar to the smoothing term in the loss function for smoothing spline regression [30, 34], the penalty in the P-spline regression loss function prevents overfitting, i.e., the penalty reduces the variation of the fitted curve caused by data error. Compared with smoothing splines, P-splines are relatively inexpensive to compute and avoid the complexity of choosing the optimal number and positions of knots: too few knots cause underfitting, while too many knots result in overfitting. An algorithm for determining the number of knots for P-spline regression is given in [32].

Let $(\alpha_i)$ represent the B-spline coefficients. The second-order differences of the adjacent B-spline coefficients for the knot sequence $t$ are

$$(\alpha_j - \alpha_{j-1}) - (\alpha_{j-1} - \alpha_{j-2}) = \alpha_j - 2\alpha_{j-1} + \alpha_{j-2}.$$

Denote the parameter controlling the smoothness of the fit by $\lambda$. The least squares objective function (loss function) of the regression of $m$ data points $(x_i, y_i)$ using $n$ B-spline basis functions of order four with a penalty on second-order differences of the B-spline coefficients, i.e., the P-spline regression loss function for fitting the metamodel studied in this paper, is

$$\min_{\alpha} \; \sum_{i=1}^{m} \Big( y_i - \sum_{j=1}^{n} \alpha_j B_{j4}(x_i) \Big)^2 + \lambda \sum_{j=3}^{n} (\alpha_j - 2\alpha_{j-1} + \alpha_{j-2})^2. \tag{2}$$

Nonnegative model fitting. In many applications, the response of the system is known or required to be nonnegative or above some threshold, for instance when the output of the system describes duration, production, prices, demand, sales, wages, amount of precipitation, probability mass, etc.; see [3, 28]. Because of noise or trends in the data, the fitted curve quite often fails to be nonnegative even though it should be. For instance, let $(x_i, y_i)$ be the monthly precipitation in some region, where $x_i$ is the month and $y_i$ is the rainfall amount. The rainfall may decrease over some period until, in some months, there is little or no rain, and then it may increase again. Because of the decreasing and increasing trends before and after these months, the fitted curve may take negative values of $y$ at these points [29]. Even if the data points are nonnegative, without imposing the nonnegativity constraints, the resulting models may take negative values in some areas [38]. In [3] a numerical example of cubic spline approximation of the arrival rate for an e-mail data set shows that the maximum likelihood spline takes negative values over a significant time period even though all data points are positive, and the estimation problem may even be unbounded and thus ill-posed.

To obtain a satisfactory, and in some cases meaningful, model, the nonnegativity constraint on the output needs to be imposed on the regression. Nonnegative cubic spline approximation in the truncated power basis is considered in [3]. Constrained smoothing spline approximation is studied in [37], whose authors acknowledge the computational difficulty of their approach. Since the B-spline basis functions are nonnegative, imposing positivity on the B-spline coefficients [16] or integrating B-splines with positive coefficients (I-splines [28]) preserves positivity in regression. But this approach excludes some classes of positive splines and thus reduces the accuracy of the regression. It is proved in [11] that errors in the approximation of nonnegative functions by B-splines of order $k$ (degree $< k$) with nonnegative coefficients are bigger in magnitude than errors in approximation by nonnegative splines of the same order, and the difference between the magnitudes of the errors increases with the order of the splines. The two approximation schemes give errors of the same magnitude only if $k \le 2$, i.e., for approximation by piecewise constant or piecewise linear functions. Because of the approximation and computational advantages of P-splines, this paper focuses on nonnegative P-spline approximation.

To simplify notation, in this paper we concatenate vectors row-wise by ‘,’ and column-wise by ‘;’; for instance, the adjoining of vectors $x$, $y$, and $z$ can be represented as

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = (x^\top, y^\top, z^\top)^\top = (x; y; z).$$

2 Nonnegative Cubic Polynomials

In the context below, matrices are represented by capital letters: $A \equiv [a_{ij}]$, where $a_{ij}$ denotes the element of $A$ in its $i$th row and $j$th column. Let $A \succeq 0$ mean that the symmetric matrix $A$ is positive semidefinite. For two matrices $A$ and $B$ of the same size, let $A \bullet B$ denote their Frobenius inner product, i.e., the sum of the entries of their Hadamard product:

$$A \bullet B = \sum_{ij} a_{ij} b_{ij}.$$

By the Markov-Lukacs theorem [26], a cubic polynomial $p(x) \equiv \beta_3 x^3 + \beta_2 x^2 + \beta_1 x + \beta_0$ is nonnegative on the interval $[t_i, t_{i+1}]$ if and only if there exist $c_1, c_2, d_1, d_2 \in \mathbb{R}$ such that $p(x)$ can be represented as

$$p(x) = (x - t_i)(c_1 x + c_2)^2 + (t_{i+1} - x)(d_1 x + d_2)^2.$$

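Denote

$$H \equiv \begin{bmatrix} x^2 & x \\ x & 1 \end{bmatrix}.$$

Then by [26, Theorem 1], the above representation is equivalent to the existence of $C \equiv [c_{ij}] \succeq 0$ and $D \equiv [d_{ij}] \succeq 0$ such that

$$(c_1 x + c_2)^2 = C \bullet H, \qquad (d_1 x + d_2)^2 = D \bullet H.$$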

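Because $C \succeq 0$ is equivalent to $c_{11} \ge 0$, $c_{22} \ge 0$, $c_{12}^2 \le c_{11} c_{22}$, the polynomial $p(x)$ is nonnegative on $[t_i, t_{i+1}]$ if and only if there exist $c_{11}, c_{12}, c_{22}, d_{11}, d_{12}, d_{22} \in \mathbb{R}$ such that

$$\begin{aligned}
\beta_3 &= c_{11} - d_{11}, \\
\beta_2 &= -t_i c_{11} + 2c_{12} + t_{i+1} d_{11} - 2d_{12}, \\
\beta_1 &= -2t_i c_{12} + c_{22} + 2t_{i+1} d_{12} - d_{22}, \\
\beta_0 &= -t_i c_{22} + t_{i+1} d_{22}, \\
& c_{11}, c_{22}, d_{11}, d_{22} \ge 0, \\
& c_{12}^2 \le c_{11} c_{22}, \quad d_{12}^2 \le d_{11} d_{22}.
\end{aligned} \tag{3}$$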
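To make the map in (3) concrete, here is a minimal sketch that builds the cubic coefficients from a feasible choice of the $c$ and $d$ variables and then checks nonnegativity on a grid. The function names and the numerical values are illustrative assumptions, not taken from the paper.

```python
def cubic_from_cd(c11, c12, c22, d11, d12, d22, t_lo, t_hi):
    """Return (beta3, beta2, beta1, beta0) from the equalities in (3).

    t_lo and t_hi play the roles of t_i and t_{i+1}.
    """
    beta3 = c11 - d11
    beta2 = -t_lo * c11 + 2 * c12 + t_hi * d11 - 2 * d12
    beta1 = -2 * t_lo * c12 + c22 + 2 * t_hi * d12 - d22
    beta0 = -t_lo * c22 + t_hi * d22
    return beta3, beta2, beta1, beta0


def is_psd_2x2(a11, a12, a22):
    """Positive semidefiniteness test for the symmetric matrix [[a11, a12], [a12, a22]]."""
    return a11 >= 0 and a22 >= 0 and a12 * a12 <= a11 * a22


def eval_cubic(beta, x):
    """Evaluate beta3*x^3 + beta2*x^2 + beta1*x + beta0 by Horner's rule."""
    b3, b2, b1, b0 = beta
    return ((b3 * x + b2) * x + b1) * x + b0


# Assumed example on [1, 2]: C corresponds to (x - 1.5)^2, D is a generic PSD matrix.
c = (1.0, -1.5, 2.25)
d = (0.5, 0.2, 1.0)
assert is_psd_2x2(*c) and is_psd_2x2(*d)
beta = cubic_from_cd(*c, *d, 1.0, 2.0)
grid = [1.0 + j / 100 for j in range(101)]
min_value = min(eval_cubic(beta, x) for x in grid)
```

Any choice of `c` and `d` that passes `is_psd_2x2` should give `min_value >= 0` up to rounding; the grid check only illustrates the "if" direction of the equivalence and is not a proof.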

3 Nonnegative Representations by B-Splines of Order Four

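Based on the definition of B-spline basis functions (1), the $i$th B-spline basis function of order three for the knot sequence $t$ is

$$B_{i3} = \omega_{i3}\omega_{i2}\chi_i + \big[\omega_{i3}(1 - \omega_{i+1,2}) + (1 - \omega_{i+1,3})\omega_{i+1,2}\big]\chi_{i+1} + (1 - \omega_{i+1,3})(1 - \omega_{i+2,2})\chi_{i+2}.$$

The $i$th B-spline basis function of order four for the knot sequence $t$ is

$$\begin{aligned}
B_{i4} ={}& \omega_{i4} B_{i,3} + (1 - \omega_{i+1,4}) B_{i+1,3} \\
={}& \omega_{i4}\omega_{i3}\omega_{i2}\chi_i \\
&+ \big[\omega_{i4}\big(\omega_{i3}(1 - \omega_{i+1,2}) + (1 - \omega_{i+1,3})\omega_{i+1,2}\big) + (1 - \omega_{i+1,4})\omega_{i+1,3}\omega_{i+1,2}\big]\chi_{i+1} \\
&+ \big[\omega_{i4}(1 - \omega_{i+1,3})(1 - \omega_{i+2,2}) + (1 - \omega_{i+1,4})\big(\omega_{i+1,3}(1 - \omega_{i+2,2}) + (1 - \omega_{i+2,3})\omega_{i+2,2}\big)\big]\chi_{i+2} \\
&+ (1 - \omega_{i+1,4})(1 - \omega_{i+2,3})(1 - \omega_{i+3,2})\chi_{i+3}.
\end{aligned}$$

Hence, on the interval $[t_i, t_{i+1})$, the B-spline $\sum_i \alpha_i B_{i4t}(x)$ is

$$\begin{aligned}
\Big\{ & \alpha_i \omega_{i4}\omega_{i3}\omega_{i2} + \alpha_{i-1}\big[\omega_{i-1,4}\big(\omega_{i-1,3}(1 - \omega_{i2}) + (1 - \omega_{i3})\omega_{i2}\big) + (1 - \omega_{i,4})\omega_{i3}\omega_{i2}\big] \\
&+ \alpha_{i-2}\big[\omega_{i-2,4}(1 - \omega_{i-1,3})(1 - \omega_{i,2}) + (1 - \omega_{i-1,4})\big(\omega_{i-1,3}(1 - \omega_{i,2}) + (1 - \omega_{i,3})\omega_{i,2}\big)\big] \\
&+ \alpha_{i-3}(1 - \omega_{i-2,4})(1 - \omega_{i-1,3})(1 - \omega_{i2}) \Big\}\chi_i(x).
\end{aligned}$$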

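Given a finite knot sequence $t \equiv (t_1, \dots, t_n)$, define

$$t_{1-k} = \cdots = t_0 = t_1, \qquad t_n = t_{n+1} = \cdots = t_{n+k}.$$

For $u = 0, 1, 2, 3$ and $v = i-3, i-2, i-1, i$, let $a^{(i)}_{u,v}$ denote the coefficient of $x^u$ associated with $\alpha_v$ of the polynomial on the interval $[t_i, t_{i+1})$, so that

$$\sum_i \alpha_i B_{i4} = \sum_i \sum_{j=i-3}^{i} \alpha_j \Big( \sum_{l=0}^{3} a^{(i)}_{lj} x^l \Big) \chi_i,$$

where for $j \le 0$ we define $\alpha_j = 0$ and $a^{(i)}_{u,j} = 0$. In other words,

$$\begin{aligned}
\sum_{l=0}^{3} a^{(i)}_{li} x^l &= \omega_{i4}\omega_{i3}\omega_{i2}, \\
\sum_{l=0}^{3} a^{(i)}_{l,i-1} x^l &= \omega_{i-1,4}\big(\omega_{i-1,3}(1 - \omega_{i2}) + (1 - \omega_{i3})\omega_{i2}\big) + (1 - \omega_{i,4})\omega_{i3}\omega_{i2}, \\
\sum_{l=0}^{3} a^{(i)}_{l,i-2} x^l &= \omega_{i-2,4}(1 - \omega_{i-1,3})(1 - \omega_{i,2}) + (1 - \omega_{i-1,4})\big(\omega_{i-1,3}(1 - \omega_{i,2}) + (1 - \omega_{i,3})\omega_{i,2}\big), \\
\sum_{l=0}^{3} a^{(i)}_{l,i-3} x^l &= (1 - \omega_{i-2,4})(1 - \omega_{i-1,3})(1 - \omega_{i2}).
\end{aligned}$$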

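Let $1/(t_j - t_l) \equiv 0$ for $t_j = t_l$. Then $a^{(i)}_{u,v}$ can be represented in terms of $t$ as below.

For $v = i-3$:

$$b_{i-3} \equiv \frac{1}{(t_{i+1} - t_{i-2})(t_{i+1} - t_{i-1})(t_{i+1} - t_i)},$$

$$a^{(i)}_{3,i-3} = -b_{i-3}, \quad a^{(i)}_{2,i-3} = 3t_{i+1} b_{i-3}, \quad a^{(i)}_{1,i-3} = -t_{i+1} a^{(i)}_{2,i-3}, \quad a^{(i)}_{0,i-3} = t_{i+1}^3 b_{i-3}.$$

For $v = i-2$:

$$b_{i-2} \equiv \frac{1}{(t_{i+2} - t_{i-1})(t_{i+1} - t_{i-1})(t_{i+1} - t_i)}, \qquad b_{i-1} \equiv \frac{1}{(t_{i+2} - t_{i-1})(t_{i+2} - t_i)(t_{i+1} - t_i)},$$

$$\begin{aligned}
a^{(i)}_{3,i-2} &= b_{i-3} + b_{i-2} + b_{i-1}, \\
a^{(i)}_{2,i-2} &= -(t_{i-2} + 2t_{i+1}) b_{i-3} - (t_{i-1} + t_{i+1} + t_{i+2}) b_{i-2} - (t_i + 2t_{i+2}) b_{i-1}, \\
a^{(i)}_{1,i-2} &= (2t_{i-2}t_{i+1} + t_{i+1}^2) b_{i-3} + (t_{i-1}t_{i+1} + t_{i-1}t_{i+2} + t_{i+1}t_{i+2}) b_{i-2} + (2t_i t_{i+2} + t_{i+2}^2) b_{i-1}, \\
a^{(i)}_{0,i-2} &= -t_{i-2}t_{i+1}^2 b_{i-3} - t_{i-1}t_{i+1}t_{i+2} b_{i-2} - t_i t_{i+2}^2 b_{i-1}.
\end{aligned}$$

For $v = i-1$:

$$b_{i-4} \equiv \frac{1}{(t_{i+3} - t_i)(t_{i+2} - t_i)(t_{i+1} - t_i)},$$

$$\begin{aligned}
a^{(i)}_{3,i-1} &= -b_{i-4} - b_{i-2} - b_{i-1}, \\
a^{(i)}_{2,i-1} &= (t_{i+3} + 2t_i) b_{i-4} + (t_{i+1} + 2t_{i-1}) b_{i-2} + (t_{i+2} + t_i + t_{i-1}) b_{i-1}, \\
a^{(i)}_{1,i-1} &= -(t_i^2 + 2t_{i+3}t_i) b_{i-4} - (2t_{i+1}t_{i-1} + t_{i-1}^2) b_{i-2} - \big[t_{i+2}(t_{i-1} + t_i) + t_{i-1}t_i\big] b_{i-1}, \\
a^{(i)}_{0,i-1} &= t_{i+3}t_i^2 b_{i-4} + t_{i+1}t_{i-1}^2 b_{i-2} + t_{i+2}t_i t_{i-1} b_{i-1}.
\end{aligned}$$

For $v = i$:

$$a^{(i)}_{3,i} = b_{i-4}, \quad a^{(i)}_{2,i} = -3t_i b_{i-4}, \quad a^{(i)}_{1,i} = -t_i a^{(i)}_{2,i}, \quad a^{(i)}_{0,i} = -t_i^3 b_{i-4}.$$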

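Denote $\Delta_i \equiv t_{i+1} - t_i$. For an equally spaced knot sequence, i.e., $\Delta = \Delta_i = \Delta_{i+1} = \dots$, the above expressions for $a^{(i)}_{u,v}$ can be simplified:

$$\begin{aligned}
a^{(i)}_{3,i-3} &= -\frac{1}{6\Delta^3}, & a^{(i)}_{2,i-3} &= \frac{t_i + \Delta}{2\Delta^3}, & a^{(i)}_{1,i-3} &= -\frac{(t_i + \Delta)^2}{2\Delta^3}, & a^{(i)}_{0,i-3} &= \frac{(t_i + \Delta)^3}{6\Delta^3}, \\
a^{(i)}_{3,i-2} &= \frac{1}{2\Delta^3}, & a^{(i)}_{2,i-2} &= -\frac{3t_i + 2\Delta}{2\Delta^3}, & a^{(i)}_{1,i-2} &= \frac{3t_i^2 + 4t_i\Delta}{2\Delta^3}, & a^{(i)}_{0,i-2} &= \frac{-3t_i^3 - 6t_i^2\Delta + 4\Delta^3}{6\Delta^3}, \\
a^{(i)}_{3,i-1} &= -\frac{1}{2\Delta^3}, & a^{(i)}_{2,i-1} &= \frac{3t_i + \Delta}{2\Delta^3}, & a^{(i)}_{1,i-1} &= \frac{-3t_i^2 - 2t_i\Delta + \Delta^2}{2\Delta^3}, & a^{(i)}_{0,i-1} &= \frac{3t_i^3 + 3t_i^2\Delta - 3t_i\Delta^2 + \Delta^3}{6\Delta^3}, \\
a^{(i)}_{3,i} &= \frac{1}{6\Delta^3}, & a^{(i)}_{2,i} &= -\frac{t_i}{2\Delta^3}, & a^{(i)}_{1,i} &= \frac{t_i^2}{2\Delta^3}, & a^{(i)}_{0,i} &= -\frac{t_i^3}{6\Delta^3}.
\end{aligned}$$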
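A quick way to sanity-check the equally spaced coefficients is to confirm that the four local polynomials sum to one on the interval (the partition-of-unity property of B-splines). The sketch below uses exact rational arithmetic; the function names are illustrative and the test values of $t_i$ and $\Delta$ are arbitrary assumptions.

```python
from fractions import Fraction


def uniform_coeffs(t_i, delta):
    """Coefficients a^{(i)}_{u,v} for equally spaced knots.

    Returns four lists [a0, a1, a2, a3], ordered v = i-3, i-2, i-1, i,
    transcribed from the simplified formulas in the text.
    """
    t, d = Fraction(t_i), Fraction(delta)
    d3 = d ** 3
    return [
        [(t + d) ** 3 / (6 * d3), -(t + d) ** 2 / (2 * d3),
         (t + d) / (2 * d3), -1 / (6 * d3)],
        [(-3 * t ** 3 - 6 * t ** 2 * d + 4 * d3) / (6 * d3),
         (3 * t ** 2 + 4 * t * d) / (2 * d3),
         -(3 * t + 2 * d) / (2 * d3), 1 / (2 * d3)],
        [(3 * t ** 3 + 3 * t ** 2 * d - 3 * t * d ** 2 + d3) / (6 * d3),
         (-3 * t ** 2 - 2 * t * d + d ** 2) / (2 * d3),
         (3 * t + d) / (2 * d3), -1 / (2 * d3)],
        [-t ** 3 / (6 * d3), t ** 2 / (2 * d3), -t / (2 * d3), 1 / (6 * d3)],
    ]


def poly_eval(coeffs, x):
    """Evaluate sum_u coeffs[u] * x**u."""
    return sum(c * x ** u for u, c in enumerate(coeffs))


# Partition of unity: summed over v, the coefficient of x^0 is 1 and the
# coefficients of x, x^2, x^3 are 0.
A = uniform_coeffs(2, Fraction(1, 2))
column_sums = [sum(col) for col in zip(*A)]
```

With exact fractions the column sums come out as exactly `[1, 0, 0, 0]`, and the polynomials take the familiar uniform cubic B-spline values 1/6, 2/3, 1/6, 0 at the left knot.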

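By (3), the B-spline $\sum_i \alpha_i B_{i4}$ is nonnegative on the interval $[t_i, t_{i+1})$ if and only if there exist $c^{(i)}_{11}, c^{(i)}_{12}, c^{(i)}_{22}, d^{(i)}_{11}, d^{(i)}_{12}, d^{(i)}_{22}$ such that

$$\begin{aligned}
\sum_{j=i-3}^{i} a^{(i)}_{3j}\alpha_j &= c^{(i)}_{11} - d^{(i)}_{11}, \\
\sum_{j=i-3}^{i} a^{(i)}_{2j}\alpha_j &= -t_i c^{(i)}_{11} + 2c^{(i)}_{12} + t_{i+1} d^{(i)}_{11} - 2d^{(i)}_{12}, \\
\sum_{j=i-3}^{i} a^{(i)}_{1j}\alpha_j &= -2t_i c^{(i)}_{12} + c^{(i)}_{22} + 2t_{i+1} d^{(i)}_{12} - d^{(i)}_{22}, \\
\sum_{j=i-3}^{i} a^{(i)}_{0j}\alpha_j &= -t_i c^{(i)}_{22} + t_{i+1} d^{(i)}_{22}, \\
& c^{(i)}_{11}, c^{(i)}_{22}, d^{(i)}_{11}, d^{(i)}_{22} \ge 0, \\
& \big(c^{(i)}_{12}\big)^2 \le c^{(i)}_{11} c^{(i)}_{22}, \quad \big(d^{(i)}_{12}\big)^2 \le d^{(i)}_{11} d^{(i)}_{22}.
\end{aligned} \tag{4}$$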

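The model. Adding constraints (4) to the P-spline regression loss function (2), we obtain the formulation for fitting the metamodel:

$$\begin{aligned}
\min_{\alpha, c, d} \quad & \sum_{i=1}^{m} \Big( y_i - \sum_{j=1}^{n} \alpha_j B_{j4}(x_i) \Big)^2 + \lambda \sum_{j=3}^{n} (\alpha_j - 2\alpha_{j-1} + \alpha_{j-2})^2 \\
\text{s.t.} \quad & \sum_{j=i-3}^{i} a^{(i)}_{3j}\alpha_j = c^{(i)}_{11} - d^{(i)}_{11}, \\
& \sum_{j=i-3}^{i} a^{(i)}_{2j}\alpha_j = -t_i c^{(i)}_{11} + 2c^{(i)}_{12} + t_{i+1} d^{(i)}_{11} - 2d^{(i)}_{12}, \\
& \sum_{j=i-3}^{i} a^{(i)}_{1j}\alpha_j = -2t_i c^{(i)}_{12} + c^{(i)}_{22} + 2t_{i+1} d^{(i)}_{12} - d^{(i)}_{22}, \\
& \sum_{j=i-3}^{i} a^{(i)}_{0j}\alpha_j = -t_i c^{(i)}_{22} + t_{i+1} d^{(i)}_{22}, \\
& c^{(i)}_{11}, c^{(i)}_{22}, d^{(i)}_{11}, d^{(i)}_{22} \ge 0, \\
& \big(c^{(i)}_{12}\big)^2 \le c^{(i)}_{11} c^{(i)}_{22}, \quad \big(d^{(i)}_{12}\big)^2 \le d^{(i)}_{11} d^{(i)}_{22} \qquad (i = 1, \dots, n).
\end{aligned} \tag{5}$$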

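Variable reduction. Let $\alpha$ denote the column vector containing all $\alpha_i$: $\alpha \equiv (\alpha_1, \alpha_2, \dots, \alpha_n)^\top$. Denote

$$c_i \equiv \big(c^{(i)}_{11}, c^{(i)}_{22}, c^{(i)}_{12}\big)^\top, \qquad d_i \equiv \big(d^{(i)}_{11}, d^{(i)}_{22}, d^{(i)}_{12}\big)^\top.$$

Let $c$ and $d$ denote the column vectors containing all $c^{(i)}_{lj}$'s and $d^{(i)}_{lj}$'s:

$$c \equiv (c_1; c_2; \dots; c_n), \qquad d \equiv (d_1; d_2; \dots; d_n).$$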

Constraints (4) contain $4n$ homogeneous equations and $7n$ variables: $\alpha$, $c$, and $d$. By the Curry-Schoenberg theorem [8, 10], the sequence $B_{1,k,t}, \dots, B_{n,k,t}$ is a basis for the linear space of piecewise polynomials of order $k$ with break sequence $t$ that satisfy the continuity conditions specified by the multiplicities of the elements of $t$. Since the submatrix corresponding to $\alpha$ in the coefficient matrix of the constraints (4) is the linear transformation from the B-spline basis to the truncated power function basis, the matrix corresponding to $\alpha$ has full column rank.

Lemma 3.1. The coefficient matrix of the equalities in constraints (4) has rank at least $4n - \kappa + \varrho$, where $\kappa$ is the total number of multiple knots, counted with multiplicities, and $\varrho$ is the number of distinct multiple knots, counted without multiplicities.

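Proof. The submatrix of the coefficient matrix for the equalities in constraints (4) with columns corresponding to $c$ and $d$ is block diagonal, where the $i$th block, with columns ordered as $c^{(i)}_{11}, c^{(i)}_{12}, c^{(i)}_{22}, d^{(i)}_{11}, d^{(i)}_{12}, d^{(i)}_{22}$, is

$$G_i \equiv \begin{bmatrix}
-1 & & & 1 & & \\
t_i & -2 & & -t_{i+1} & 2 & \\
& 2t_i & -1 & & -2t_{i+1} & 1 \\
& & t_i & & & -t_{i+1}
\end{bmatrix}.$$

The block has rank 4 if $t_i \ne t_{i+1}$, and it has rank 3 if $t_i = t_{i+1}$. Since the coefficient matrix of the equalities in constraints (4) has $4n$ rows, the statement of the lemma follows.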
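The rank claim for the block $G_i$ can be checked numerically. The sketch below is an illustration under an assumption: the columns are ordered as $c_{11}, c_{12}, c_{22}, d_{11}, d_{12}, d_{22}$, obtained by moving the right-hand sides of the equalities in (4) to the left. The helper names `block_G` and `rank` are hypothetical.

```python
from fractions import Fraction


def block_G(t_lo, t_hi):
    """Block G_i for one interval; t_lo = t_i, t_hi = t_{i+1}.

    Assumed column order: c11, c12, c22, d11, d12, d22.
    """
    return [
        [-1, 0, 0, 1, 0, 0],
        [t_lo, -2, 0, -t_hi, 2, 0],
        [0, 2 * t_lo, -1, 0, -2 * t_hi, 1],
        [0, 0, t_lo, 0, 0, -t_hi],
    ]


def rank(matrix):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(r + 1, len(rows)):
            factor = rows[k][col] / rows[r][col]
            rows[k] = [a - factor * b for a, b in zip(rows[k], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


rank_distinct = rank(block_G(1, 2))   # distinct knots
rank_repeated = rank(block_G(2, 2))   # t_i = t_{i+1}
```

When the two knots coincide, the $d$ columns are the negatives of the $c$ columns, which is why the rank drops from 4 to 3.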

Corollary 3.2. If all the knots $(t_i)_{i=1}^{n+1}$ are distinct, then the coefficient matrix of the constraints (4) has full row rank.

Therefore, given a distinct knot sequence $t$, we can use Gaussian elimination to represent $\alpha$ by $c$, $d$, and $t$ in the constraints (4). Since each $\alpha_i$ relates to only $t_i, t_{i+1}, t_{i+2}, t_{i+3}$ in constraints (4), we can represent each $\alpha_i$ by at most the variables $c^{(i)}_{11}, c^{(i)}_{12}, c^{(i)}_{22}, d^{(i)}_{11}, d^{(i)}_{12}, d^{(i)}_{22}$.

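For equally spaced knot sequences, below are representations of $\alpha_{i-3}, \alpha_{i-2}, \alpha_{i-1}, \alpha_i$. For $4 \le i \le n$, omitting the subscript of $t_i$ for simplicity, we have

$$\begin{aligned}
\frac{\alpha_i}{\Delta^3} ={}& \Big(\frac{2t^2}{\Delta^2} + \frac{22t}{3\Delta} + 6\Big) c^{(i)}_{11} + \Big(\frac{8t}{3\Delta^2} + \frac{22}{3\Delta}\Big) c^{(i)}_{12} + \Big(\frac{2t}{3\Delta^3} + \frac{8}{3\Delta^2}\Big) c^{(i)}_{22} \\
&- \Big(\frac{t^2}{\Delta^2} + \frac{10t}{3\Delta} + \frac{7}{3}\Big) d^{(i)}_{11} + \Big(\frac{4t^2}{3\Delta^2} + \frac{2t}{3\Delta^2} - \frac{2}{\Delta}\Big) d^{(i)}_{12} - \Big(\frac{2t}{3\Delta^3} + \frac{5}{3\Delta^2}\Big) d^{(i)}_{22}, \\
\frac{\alpha_{i-1}}{\Delta^3} ={}& \Big(\frac{t^2}{\Delta^2} + \frac{4t}{3\Delta}\Big) c^{(i)}_{11} + \Big(\frac{2t}{3\Delta^2} + \frac{4}{3\Delta}\Big) c^{(i)}_{12} + \Big(\frac{2t}{3\Delta^3} + \frac{5}{3\Delta^2}\Big) c^{(i)}_{22} \\
&+ \Big(\frac{2t}{3\Delta} + \frac{2}{3}\Big) d^{(i)}_{11} + \Big(\frac{4t^2}{3\Delta^3} + \frac{8t}{3\Delta^2} + \frac{2}{\Delta}\Big) d^{(i)}_{12} - \Big(\frac{2t}{3\Delta^3} + \frac{2}{3\Delta^2}\Big) d^{(i)}_{22}, \\
\frac{\alpha_{i-2}}{\Delta^3} ={}& -\frac{2t}{3\Delta} c^{(i)}_{11} - \Big(\frac{4t}{3\Delta^2} + \frac{2}{3\Delta}\Big) c^{(i)}_{12} + \Big(\frac{2t}{3\Delta^3} + \frac{2}{3\Delta^2}\Big) c^{(i)}_{22} \\
&+ \Big(\frac{t^2}{\Delta^2} + \frac{2t}{3\Delta} - \frac{1}{3}\Big) d^{(i)}_{11} + \Big(\frac{4t^2}{3\Delta^3} + \frac{14t}{3\Delta^3} + \frac{2}{\Delta}\Big) d^{(i)}_{12} + \Big(-\frac{2t}{3\Delta^3} + \frac{1}{\Delta^2}\Big) d^{(i)}_{22}, \\
\frac{\alpha_{i-3}}{\Delta^3} ={}& \Big(-\frac{t^2}{\Delta^2} + \frac{4t}{3\Delta}\Big) c^{(i)}_{11} + \Big(-\frac{10t}{3\Delta^2} + \frac{4}{3\Delta}\Big) c^{(i)}_{12} + \Big(\frac{2t}{3\Delta^3} - \frac{1}{3\Delta^2}\Big) c^{(i)}_{22} \\
&+ \Big(\frac{2t^2}{\Delta^2} - \frac{10t}{3\Delta} + \frac{2}{3}\Big) d^{(i)}_{11} + \Big(\frac{4t^2}{3\Delta^3} + \frac{20t}{3\Delta^2} - \frac{2}{\Delta}\Big) d^{(i)}_{12} + \Big(-\frac{2t}{3\Delta^3} + \frac{4}{3\Delta^2}\Big) d^{(i)}_{22}.
\end{aligned} \tag{6}$$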

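For $i = 1$:

$$\begin{aligned}
\alpha_1 &= 6\Delta^3 \big[c^{(1)}_{11} - d^{(1)}_{11}\big], \\
3t_1 \big[c^{(1)}_{11} - d^{(1)}_{11}\big] &= t_1 c^{(1)}_{11} - 2c^{(1)}_{12} - t_2 d^{(1)}_{11} + 2d^{(1)}_{12}, \\
3t_1^2 \big[c^{(1)}_{11} - d^{(1)}_{11}\big] &= -2t_1 c^{(1)}_{12} + c^{(1)}_{22} + 2(t_1 + \Delta) d^{(1)}_{12} - d^{(1)}_{22}, \\
t_1^3 \big[c^{(1)}_{11} - d^{(1)}_{11}\big] &= t_1 c^{(1)}_{22} - (t_1 + \Delta) d^{(1)}_{22}.
\end{aligned}$$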

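For $i = 2$:

$$\begin{aligned}
\alpha_1 ={}& 4t_2\Delta^2 c^{(2)}_{11} + t\Delta^2 c^{(2)}_{12} + \big(2\Delta^3 - 4t_2\Delta^2\big) d^{(2)}_{11} - 4\Delta^2 d^{(2)}_{12}, \\
\alpha_2 ={}& 6\Delta^2 (2t_2 + \Delta) c^{(2)}_{11} + 12\Delta^2 c^{(2)}_{12} - 12\Delta^2 t_2 d^{(2)}_{11} - 12\Delta^2 d^{(2)}_{12}, \\
\alpha_2 ={}& \frac{6\Delta^2}{\Delta - 2t_2} \Big[ \big(\Delta^2 - 2t_2\Delta - 3t_2^2\big) c^{(2)}_{11} - 2t_2 c^{(2)}_{12} + c^{(2)}_{22} + \big(3t_2^2 + 2t_2\Delta - \Delta^2\big) d^{(2)}_{11} \\
&\qquad\qquad + 2(t_2 + \Delta) d^{(2)}_{12} - d^{(2)}_{22} \Big], \\
\alpha_2 ={}& \frac{6\Delta^3}{t_2^2 - t_2\Delta^2 + \Delta^3/3} \Big[ \big(t_2^3 + t_2^2 - t_2\Delta^2 + \Delta^3/3\big) c^{(2)}_{11} - t_2 c^{(2)}_{22} \\
&\qquad\qquad - \big(t_2^3 + t_2^2 - t_2\Delta^2 + \Delta^3/2\big) d^{(2)}_{11} + (t_2 + \Delta) d^{(2)}_{22} \Big].
\end{aligned}$$

For $i = 3$:

$$\begin{aligned}
\alpha_1 ={}& \Delta\big(t_3^2 - 2t_3\Delta\big) c^{(3)}_{11} + \Delta(2t_3 - 2\Delta) c^{(3)}_{12} + \Delta c^{(3)}_{22} - \Delta\big(\Delta^2 - 4t_3\Delta + t_3^2\big) d^{(3)}_{11} \\
&+ \Delta(4\Delta - 2t_3) d^{(3)}_{12} - \Delta d^{(3)}_{22}, \\
\alpha_1 ={}& \frac{\Delta^2}{\Delta^2 - t_3\Delta + t_3^2\Delta - t_3^2} \Big[ \Big(t_3^3 - 2t_3^3\Delta + 2t_3^2\Delta - \frac{2t_3\Delta^2}{3}\Big) c^{(3)}_{11} - \Big(\frac{2}{3}\Delta^2 - 2t_3\Delta + 2t_3^2\Delta\Big) c^{(3)}_{12} - t_3 c^{(3)}_{22} \\
&\qquad - \Big(t_3^3 + t_3^2\Delta^2 - \frac{5}{3}t_3\Delta^2 - 2t_3^3\Delta + 2t_3^2\Delta + \frac{\Delta^3}{3}\Big) d^{(3)}_{11} + \Big(\frac{2}{3}\Delta^2 + 2t_3^2\Delta - 2t_3\Delta\Big) d^{(3)}_{12} + (t_3 + \Delta) d^{(3)}_{22} \Big], \\
\alpha_2 ={}& 2t_3^2\Delta c^{(3)}_{11} + 4t_3\Delta c^{(3)}_{12} + 2\Delta c^{(3)}_{22} + \big(-2t_3^2\Delta + 4t_3\Delta^2\big) d^{(3)}_{11} + \big(4\Delta^2 - 4t_3\Delta\big) d^{(3)}_{12} - 2\Delta d^{(3)}_{22}, \\
\alpha_3 ={}& \big(3t_3^2\Delta + 6t_3\Delta^2\big) c^{(3)}_{11} + \big(6t_3\Delta + 6\Delta^2\big) c^{(3)}_{12} + 3\Delta c^{(3)}_{22} + \big(3\Delta^3 - 3t_3^2\Delta\big) d^{(3)}_{11} - 6t_3\Delta d^{(3)}_{12} \\
&+ 6\Delta^3 c^{(3)}_{11} - 6\Delta^3 d^{(3)}_{11} - 3\Delta d^{(3)}_{22}.
\end{aligned}$$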

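Then we can replace $\alpha_i$ in the objective of (5) by the following relations:

$$\begin{aligned}
\alpha_1 ={}& 6\Delta^3 \big[c^{(1)}_{11} - d^{(1)}_{11}\big], \\
\alpha_2 ={}& \frac{6\Delta^3}{t_2^2 - t_2\Delta^2 + \Delta^3/3} \Big[ \big(t_2^3 + t_2^2 - t_2\Delta^2 + \Delta^3/3\big) c^{(2)}_{11} - t_2 c^{(2)}_{22} \\
&\qquad\qquad - \big(t_2^3 + t_2^2 - t_2\Delta^2 + \Delta^3/2\big) d^{(2)}_{11} + (t_2 + \Delta) d^{(2)}_{22} \Big], \\
\alpha_3 ={}& \big(3t_3^2\Delta + 6t_3\Delta^2\big) c^{(3)}_{11} + \big(6t_3\Delta + 6\Delta^2\big) c^{(3)}_{12} + 3\Delta c^{(3)}_{22} + \big(3\Delta^3 - 3t_3^2\Delta\big) d^{(3)}_{11} - 6t_3\Delta d^{(3)}_{12} \\
&+ 6\Delta^3 c^{(3)}_{11} - 6\Delta^3 d^{(3)}_{11} - 3\Delta d^{(3)}_{22},
\end{aligned}$$

and, for $i \ge 4$,

$$\begin{aligned}
\alpha_i ={}& \Big(2t^2\Delta + \frac{22t\Delta^2}{3} + 6\Delta^3\Big) c^{(i)}_{11} + \Big(\frac{8t\Delta}{3} + \frac{22\Delta^2}{3}\Big) c^{(i)}_{12} + \Big(\frac{2t}{3} + \frac{8\Delta}{3}\Big) c^{(i)}_{22} \\
&- \Big(t^2\Delta + \frac{10t\Delta^2}{3} + \frac{7}{3}\Delta^3\Big) d^{(i)}_{11} + \Big(\frac{4t^2\Delta}{3} + \frac{2t\Delta}{3} - 2\Delta^2\Big) d^{(i)}_{12} - \Big(\frac{2t}{3} + \frac{5\Delta}{3}\Big) d^{(i)}_{22}.
\end{aligned} \tag{7}$$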

4 Second-Order Cone Programming

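Index vectors in $\mathbb{R}^n$ from 0. A second-order cone (quadratic cone, Lorentz cone, or ice-cream cone) in $\mathbb{R}^n$ is the set

$$Q_n \equiv \big\{ x = (x_0; \bar{x}) \in \mathbb{R} \times \mathbb{R}^{n-1} : x_0 \ge \|\bar{x}\| \big\}.$$

The rotated quadratic cone is obtained by rotating the second-order cone by 45 degrees in the $x_0$-$x_1$ plane:

$$\hat{Q}_n \equiv \big\{ x = (x_0; x_1; \bar{x}) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R}^{n-2} : 2x_0 x_1 \ge \|\bar{x}\|^2,\ x_0 \ge 0,\ x_1 \ge 0 \big\}.$$

The nonnegative half-line is a one-dimensional second-order cone, so the nonnegative orthant is a product of such cones. Because a second-order cone induces a partial ordering, the membership of an $n$-dimensional vector $x \in Q_n$ can be written as $x \succeq_{Q_n} 0$. The subscript $n$ is sometimes omitted when it is clear from the context.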

Second-order cone programming is an extension of linear programming. In second-order cone programming, one minimizes a linear objective function under linear equality constraints and second-order cone constraints, where variables are required to be in second-order cones. Let $x_i$ ($i = 1, \dots, r$) be vectors not necessarily of the same dimensions. Let $c_i$ ($i = 1, \dots, r$) and $b$ be vectors and $A_i$ ($i = 1, \dots, r$) be matrices. The standard form second-order cone programming problem is

$$\begin{aligned}
\min_{x} \quad & \sum_{i=1}^{r} c_i^\top x_i \\
\text{subject to} \quad & \sum_{i=1}^{r} A_i x_i = b, \\
& x_i \succeq_Q 0 \quad (i = 1, \dots, r).
\end{aligned}$$

Second-order cone programming has many applications. A solution to a second-order cone programming problem can be obtained approximately by interior point methods in time polynomial in the size of the problem data. See [2] for a survey of applications and algorithms of second-order cone programming. In addition, the iteration complexity of an interior point method for second-order cone programming does not depend on the dimension of the second-order cones.

The metamodel fitting problem (5) can be cast as a second-order cone programming problem. Below are two constructions.

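Model I

$$\begin{aligned}
\min_{\alpha, c, d, z} \quad & z \\
\text{subject to} \quad & \sum_{j=i-3}^{i} a^{(i)}_{3j}\alpha_j = c^{(i)}_{11} - d^{(i)}_{11}, \\
& \sum_{j=i-3}^{i} a^{(i)}_{2j}\alpha_j = -t_i c^{(i)}_{11} + 2c^{(i)}_{12} + t_{i+1} d^{(i)}_{11} - 2d^{(i)}_{12}, \\
& \sum_{j=i-3}^{i} a^{(i)}_{1j}\alpha_j = -2t_i c^{(i)}_{12} + c^{(i)}_{22} + 2t_{i+1} d^{(i)}_{12} - d^{(i)}_{22}, \\
& \sum_{j=i-3}^{i} a^{(i)}_{0j}\alpha_j = -t_i c^{(i)}_{22} + t_{i+1} d^{(i)}_{22}, \\
& \big(c^{(i)}_{11}, c^{(i)}_{22}, \sqrt{2}\, c^{(i)}_{12}\big)^\top \in \hat{Q}_3, \quad \big(d^{(i)}_{11}, d^{(i)}_{22}, \sqrt{2}\, d^{(i)}_{12}\big)^\top \in \hat{Q}_3 \qquad (i = 1, \dots, n), \\
& \Big(z,\ y_1 - \sum_{j=1}^{n} \alpha_j B_{j4}(x_1),\ \dots,\ y_m - \sum_{j=1}^{n} \alpha_j B_{j4}(x_m), \\
& \quad \sqrt{\lambda}(\alpha_3 - 2\alpha_2 + \alpha_1),\ \dots,\ \sqrt{\lambda}(\alpha_n - 2\alpha_{n-1} + \alpha_{n-2})\Big)^\top \in Q_{m+n-1}.
\end{aligned}$$

Here $\hat{Q}_3$ denotes the three-dimensional rotated quadratic cone, so that $(c_{11}, c_{22}, \sqrt{2}\, c_{12})^\top \in \hat{Q}_3$ is equivalent to $c_{11}, c_{22} \ge 0$ and $c_{12}^2 \le c_{11} c_{22}$. The vector in the last constraint has $1 + m + (n-2) = m + n - 1$ entries.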

Model II. The condition that the squared $L_2$ norm of a vector $x \in \mathbb{R}^n$ is no more than the value of $y \in \mathbb{R}$ can be represented as a second-order cone constraint:

$$\|x\|_2^2 \le y \iff (y + 1;\ y - 1;\ 2x) \in Q_{n+2}.$$
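The equivalence follows from $(y+1)^2 - (y-1)^2 = 4y$, and it can be illustrated with a few lines of code. This is only a numerical sketch with hypothetical helper names (`in_soc`, `lifted`), not part of the authors' implementation.

```python
import math


def in_soc(v):
    """Membership test for the second-order cone: v[0] >= ||v[1:]||_2."""
    return v[0] >= math.sqrt(sum(component * component for component in v[1:]))


def lifted(x, y):
    """Build the vector (y + 1; y - 1; 2x) used in the Model II reformulation."""
    return [y + 1, y - 1] + [2 * xi for xi in x]


x = [3.0, 4.0]                         # ||x||^2 = 25
inside = in_soc(lifted(x, 26.0))       # 25 <= 26, so the lifted vector is in the cone
outside = in_soc(lifted(x, 24.0))      # 25 > 24, so it is not
```

In Model II this device is applied once per residual and once per second-order difference, so every cone involved has dimension three.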

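Therefore, problem (5) can also be formulated as the following second-order cone programming model:

$$\begin{aligned}
\min_{\alpha, c, d, u, v} \quad & \sum_{i=1}^{m} u_i + \lambda \sum_{j=3}^{n} v_j \\
\text{subject to} \quad & \sum_{j=i-3}^{i} a^{(i)}_{3j}\alpha_j = c^{(i)}_{11} - d^{(i)}_{11}, \\
& \sum_{j=i-3}^{i} a^{(i)}_{2j}\alpha_j = -t_i c^{(i)}_{11} + 2c^{(i)}_{12} + t_{i+1} d^{(i)}_{11} - 2d^{(i)}_{12}, \\
& \sum_{j=i-3}^{i} a^{(i)}_{1j}\alpha_j = -2t_i c^{(i)}_{12} + c^{(i)}_{22} + 2t_{i+1} d^{(i)}_{12} - d^{(i)}_{22}, \\
& \sum_{j=i-3}^{i} a^{(i)}_{0j}\alpha_j = -t_i c^{(i)}_{22} + t_{i+1} d^{(i)}_{22}, \\
& \big(c^{(i)}_{11}, c^{(i)}_{22}, \sqrt{2}\, c^{(i)}_{12}\big)^\top \in \hat{Q}_3, \quad \big(d^{(i)}_{11}, d^{(i)}_{22}, \sqrt{2}\, d^{(i)}_{12}\big)^\top \in \hat{Q}_3 \qquad (i = 1, \dots, n), \\
& \Big(u_p + 1,\ u_p - 1,\ 2\Big(y_p - \sum_{j=1}^{n} \alpha_j B_{j4}(x_p)\Big)\Big)^\top \in Q_3 \qquad (p = 1, \dots, m), \\
& \big(v_q + 1,\ v_q - 1,\ 2(\alpha_q - 2\alpha_{q-1} + \alpha_{q-2})\big)^\top \in Q_3 \qquad (q = 3, \dots, n).
\end{aligned}$$

Here $\hat{Q}_3$ denotes the three-dimensional rotated quadratic cone and $Q_3$ the three-dimensional second-order cone.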

5 Numerical Examples

We have implemented the nonnegative P-spline regression by second-order cone programming in MATLAB/GNU Octave [13].

5.1 Parameter Selection

We use equally spaced knots. For each fixed knot sequence, the smoothness parameter $\lambda$ is chosen to minimize the GCV (generalized cross-validation) statistic [32]. Let $\alpha(\lambda)$ be the optimal coefficients under parameter $\lambda$. The average squared residual of the regression using $\lambda$ is

$$\mathrm{ASR}(\lambda) = m^{-1} \sum_{i=1}^{m} \Big( y_i - \sum_{j=1}^{n} \alpha(\lambda)_j B_{j4}(x_i) \Big)^2.$$

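Denote $D \in \mathbb{R}^{n \times (n-2)}$ as

$$D = \begin{pmatrix}
1 & & & & \\
-2 & 1 & & & \\
1 & -2 & 1 & & \\
& \ddots & \ddots & \ddots & \\
& & 1 & -2 & 1 \\
& & & 1 & -2 \\
& & & & 1
\end{pmatrix}.$$

Then the penalty term in the regression loss function can be represented as $\alpha^\top D D^\top \alpha$. Let $X \in \mathbb{R}^{m \times n}$ denote the design matrix whose $i$th row is

$$X_i = \big[ B_{1,4,t}(x_i) \;\; B_{2,4,t}(x_i) \;\; \dots \;\; B_{n,4,t}(x_i) \big].$$

The smoother matrix $S(\lambda)$ is defined as

$$S(\lambda) = X (X^\top X + \lambda D D^\top)^{-1} X^\top.$$

Then the generalized cross-validation statistic is

$$\mathrm{GCV}(\lambda) = \frac{\mathrm{ASR}(\lambda)}{\big[1 - m^{-1}\operatorname{tr}\{S(\lambda)\}\big]^2}.$$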

The value $\operatorname{tr}\{S(\lambda)\}$ measures the effective degrees of freedom of the fit.

The number of knots is determined by the Akaike information criterion (AIC) [1], i.e., we run the algorithm with different numbers of knots and choose the number $n$ for the model that has the minimum AIC:

$$\mathrm{AIC} = m \Big[ \ln \Big( \sum_{i=1}^{m} \Big( y_i - \sum_{j=1}^{n} \alpha_j B_{j4}(x_i) \Big)^2 \Big/ m \Big) + 1 \Big] + 2n.$$
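The selection statistics above are simple to compute once the residuals and $\operatorname{tr}\{S(\lambda)\}$ are available. The sketch below is illustrative only: the function names are hypothetical, the residuals and the trace are supplied by the caller rather than computed from a fit, and the AIC is read as $m[\ln(\mathrm{RSS}/m) + 1] + 2n$, which is an assumption about how the brackets in the displayed formula group.

```python
import math


def second_difference_matrix(n):
    """n x (n - 2) matrix D; column q holds (1, -2, 1) in rows q, q+1, q+2."""
    D = [[0] * (n - 2) for _ in range(n)]
    for q in range(n - 2):
        D[q][q], D[q + 1][q], D[q + 2][q] = 1, -2, 1
    return D


def penalty(alpha):
    """alpha^T D D^T alpha, i.e. the sum of squared second-order differences."""
    n = len(alpha)
    D = second_difference_matrix(n)
    diffs = [sum(D[r][q] * alpha[r] for r in range(n)) for q in range(n - 2)]
    return sum(diff * diff for diff in diffs)


def gcv(asr, trace_s, m):
    """GCV = ASR / (1 - tr(S) / m)^2 for m data points."""
    return asr / (1 - trace_s / m) ** 2


def aic(residuals, n):
    """AIC under the assumed reading m * (ln(RSS / m) + 1) + 2n."""
    m = len(residuals)
    rss = sum(r * r for r in residuals)
    return m * (math.log(rss / m) + 1) + 2 * n


example_penalty = penalty([1, 2, 4, 7, 11])   # second differences are all 1
```

In a grid search one would evaluate `gcv` for each candidate $\lambda$ at a fixed number of knots and then compare `aic` across the numbers of knots.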

5.2 Numerical Examples

The second-order cone programming model is solved via SDPT3-4.0 [36] through the YALMIP interface [24]. YALMIP is a modeling language that converts the problems into standard second-order cone programs. SDPT3 and SeDuMi are state-of-the-art software for second-order cone programming. The versions of SDPT3 and SeDuMi reformatted for GNU Octave by Michael Grant are available in repositories on GitHub.

Below are some numerical examples with MATLAB. Data points are depicted by blue “*”; the B-spline fitted function is the green curve. We tested the method with the number of internal knots ranging from 4 to 19. For each knot sequence, $\lambda$ is chosen to minimize GCV. The values of $\lambda$ tested are $10^{-4}, 10^{-3}, \dots, 10^{3}, 10^{4}$. The number of knots is determined by minimizing AIC.

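Density estimation. Figure 1 shows the output of our method for density estimation.

For the Poisson probability density function with mean 20,

$$f(k) = \frac{20^k e^{-20}}{k!},$$

based on GCV and AIC, the best model has 5 interior knots and $\lambda = 10^{-4}$.

For the Gamma probability density function with $\alpha = 2$, $\beta = 2$,

$$f(x; \alpha, \beta) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)} x^{\alpha-1} e^{-x/\beta}, \qquad \Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx,$$

the best model has 19 interior knots and $\lambda = 10^{-2}$.

For the Weibull probability density function with $\alpha = 1$, $\beta = 1.5$,

$$f(x; \alpha, \beta) = \begin{cases} \dfrac{\alpha}{\beta^{\alpha}} x^{\alpha-1} e^{-(x/\beta)^{\alpha}}, & x \ge 0, \\ 0, & x < 0, \end{cases}$$

the best model has 19 interior knots and $\lambda = 10^{-3}$.

For the Pareto probability density function with $\alpha = 1$, $b = 1$,

$$f(x; \alpha, b) = \begin{cases} \dfrac{\alpha b^{\alpha}}{x^{\alpha+1}}, & x \ge b, \\ 0, & x < b, \end{cases}$$

the best model has 17 interior knots and $\lambda = 10^{-4}$.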
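For readers who want to generate comparable test data, the four densities can be written as plain functions. This is a sketch under assumptions: the function names are hypothetical, the default parameters mirror the values quoted above, and the paper does not state how its data points were sampled from these densities.

```python
import math


def poisson_pmf(k, mean=20.0):
    """Poisson probability mean^k * exp(-mean) / k! for a nonnegative integer k."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def gamma_pdf(x, alpha=2.0, beta=2.0):
    """Gamma density with shape alpha and scale beta, for x >= 0."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))


def weibull_pdf(x, alpha=1.0, beta=1.5):
    """Weibull density with shape alpha and scale beta; zero for x < 0."""
    if x < 0:
        return 0.0
    return alpha / beta ** alpha * x ** (alpha - 1) * math.exp(-(x / beta) ** alpha)


def pareto_pdf(x, alpha=1.0, b=1.0):
    """Pareto density alpha * b^alpha / x^(alpha + 1) for x >= b; zero otherwise."""
    if x < b:
        return 0.0
    return alpha * b ** alpha / x ** (alpha + 1)


# Example: tabulate the Poisson probabilities on k = 0, ..., 40.
poisson_points = [(k, poisson_pmf(k)) for k in range(41)]
```

All four functions return nonnegative values, which is the property the fitted metamodel is required to preserve.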

Figure 1: Probability Density Estimation. Four panels, each plotting f(x) against x: Poisson Probability Density Function (mean = 20); Gamma Probability Density Function (α = 2, β = 2); Weibull Probability Density Function (α = 1, β = 1.5); Pareto Probability Density Function (α = 1, b = 1).

Duration analysis. Duration analysis [21] studies the length of the spells of an event. Let $f(t)$ be the density function of the probability distribution of the duration. The survivor function $S(t) = \Pr(x \ge t)$ is the probability that the random variable $x$ equals or exceeds $t$. The hazard function $\lambda(t) = f(t)/S(t)$ is the rate at which spells are completed at $t$.
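As a small illustration of these definitions, the sketch below computes a discrete empirical analogue from a list of completed spell lengths: the survivor value is the fraction of spells lasting at least $t$, and the hazard is the fraction of those spells that end exactly at $t$. This is an assumed simplification without censoring; it is not the estimator used in [18], and the function name and sample data are hypothetical.

```python
def survivor_and_hazard(durations):
    """Return a list of (t, S(t), hazard(t)) for each distinct duration t.

    S(t) = (number of spells with duration >= t) / (number of spells);
    hazard(t) = (number ending at t) / (number with duration >= t),
    which equals f(t) / S(t) with f(t) the empirical frequency of t.
    """
    total = len(durations)
    table = []
    for t in sorted(set(durations)):
        at_risk = sum(1 for d in durations if d >= t)
        ended = sum(1 for d in durations if d == t)
        table.append((t, at_risk / total, ended / at_risk))
    return table


# Hypothetical spell lengths in days.
table = survivor_and_hazard([1, 2, 2, 5])
```

Both columns are nonnegative by construction, which is why a nonnegativity-constrained fit is a natural choice for smoothing them.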


The data in Figure 2 are strike spells and survivor and hazard estimates from [18]. The durations are strike lengths in days between 1968 and 1976 involving at least 1,000 workers in the U.S. manufacturing industries with major issue.

For the survivor function estimation, the best model has 19 interior knots and $\lambda = 10^{-3}$; for the hazard function estimation, the best model has 6 interior knots and $\lambda = 10^{-2}$.

Figure 2: Duration Data. Two panels plotted against Duration in Days: Survivor Function (Strike Data), with vertical axis Survivor; Hazard Function (Strike Data), with vertical axis Hazard.

Cost and production. The data in Figure 3 are the monthly production costs and output for a hosiery mill over a 4-year period from [12]. The production is in thousands of dozens of pairs, and the cost is in thousands of dollars. We downloaded the data from Larry Winner’s web site: http://www.stat.ufl.edu/~winner/data/millcost.dat.

For the production data, the best model has 18 interior knots and $\lambda = 10^{-2}$. For the cost data, the best model also has 18 interior knots and $\lambda = 10^{-2}$.


Figure 3: Mill Production/Costs. Two panels plotted against Month: Hosiery Mill Cost, with vertical axis Cost; Hosiery Mill Production, with vertical axis Production.

References

[1] Hirotugu Akaike. Information theory and an extension of the maximum likelihood principle. In Emanuel Parzen, Kunio Tanabe, and Genshiro Kitagawa, editors, Selected Papers of Hirotugu Akaike, Springer Series in Statistics, pages 199–213. Springer New York, 1998.

[2] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program., 95(1, Ser. B):3–51, 2003. ISMP 2000, Part 3 (Atlanta, GA).

[3] Farid Alizadeh, Jonathan Eckstein, Nilay Noyan, and Gabor Rudolf. Arrival rate approximation by nonnegative cubic splines. Oper. Res., 56(1):140–156, 2008.

[4] Jerry Banks, John S. Carson II, Barry L. Nelson, and David M. Nicol. Discrete-Event System Simulation. Prentice Hall, U.S.A., 2010.

[5] Russell R. Barton. Simulation metamodels. In Proceedings of the 30th Conference on Winter Simulation, WSC ’98, pages 167–176, Los Alamitos, CA, USA, 1998. IEEE Computer Society Press.

[6] G. E. P. Box and K. B. Wilson. On the experimental attainment of optimum conditions. Journal of the Royal Statistical Society. Series B (Methodological), 13(1):1–45, 1951.

[7] George E. P. Box and Norman R. Draper. Empirical model-building and response surfaces. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. John Wiley & Sons, Inc., New York, 1987.

[8] H. B. Curry and I. J. Schoenberg. On Pólya frequency functions. IV. The fundamental spline functions and their limits. J. Analyse Math., 17:71–107, 1966.

[9] Carl de Boor. On calculating with B-splines. J. Approximation Theory, 6:50–62, 1972. Collection of articles dedicated to J. L. Walsh on his 75th birthday, V (Proc. Internat. Conf. Approximation Theory, Related Topics and their Applications, Univ. Maryland, College Park, Md., 1970).

[10] Carl de Boor. A practical guide to splines, volume 27 of Applied Mathematical Sciences. Springer-Verlag, New York, revised edition, 2001.

[11] Carl de Boor and James W. Daniel. Splines with nonnegative B-spline coefficients. Math. Comp., 28:565–568, 1974.

[12] Joel Dean. Statistical Cost Functions of a Hosiery Mill, volume XI, no. 4 of Studies in Business Administration. The School of Business, The University of Chicago, Chicago, IL, U.S.A., 1941.

[13] John W. Eaton, David Bateman, Søren Hauberg, and Rik Wehbring. GNU Octave version 3.8.1 manual: a high-level interactive language for numerical computations. CreateSpace Independent Publishing Platform, 2014. ISBN 1441413006.

[14] Paul H. C. Eilers and Brian D. Marx. Flexible smoothing with B-splines and penalties. Statist. Sci., 11(2):89–121, 1996. With comments and a rejoinder by the authors.

[15] Linda Weiser Friedman and Israel Pressman. The metamodel in simulation analysis: Can it be trusted? The Journal of the Operational Research Society, 39(10):939–948, 1988.

[16] Xuming He and Peide Shi. Monotone B-spline smoothing. J. Amer. Statist. Assoc., 93(442):643–650, 1998.

[17] Andre I. Khuri and John A. Cornell. Response surfaces, volume 152 of Statistics: Textbooks and Monographs. Marcel Dekker, Inc., New York, second edition, 1996. Designs and analyses.

[18] Nicholas Kiefer. Economic duration data and hazard functions. Journal of Economic Literature, 26(2):646–679, 1988.

[19] Jack P. C. Kleijnen. A comment on Blanning’s metamodel for sensitivity analysis: The regression metamodel in simulation. Interfaces, 5(3):21–23, 1975.

[20] Jack P. C. Kleijnen, Susan M. Sanchez, Thomas W. Lucas, and Thomas M. Cioppa. State-of-the-art review: A user’s guide to the brave new world of designing simulation experiments. INFORMS Journal on Computing, 17(3):263–289, 2005.

[21] Tony Lancaster. The econometric analysis of transition data, volume 17 of Econometric Society Monographs. Cambridge University Press, Cambridge, 1990.

[22] Michael S. Lane, Ali H. Mansour, and John L. Harpell. Operations research techniques: A longitudinal update 1973–1988. Interfaces, 23(2):63–68, 1993.

[23] Averill M. Law and David M. Kelton. Simulation modeling and analysis. McGraw-Hill, New York, 2015.

[24] Johan Löfberg. YALMIP: A toolbox for modeling and optimization in MATLAB. In Proceedings of the CACSD Conference, Taipei, Taiwan, 2004.

[25] Raymond H. Myers, Douglas C. Montgomery, and Christine M. Anderson-Cook. Response surface methodology. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., Hoboken, NJ, third edition, 2009. Process and product optimization using designed experiments.

[26] Yurii Nesterov. Squared functional systems and optimization problems. In High performance optimization, volume 33 of Appl. Optim., pages 405–440. Kluwer Acad. Publ., Dordrecht, 2000.

[27] J. Tinsley Oden, Ted Belytschko, Jacob Fish, T. J. Hughes, Chris Johnson, David Keyes, Alan Laub, Linda Petzold, David Srolovitz, and S. Yip. Revolutionizing engineering science through simulation. National Science Foundation Blue Ribbon Panel Report, 65, 2006.

[28] J. O. Ramsay. Monotone regression splines in action. Statistical Science, 3(4):425–441, 1988.

[29] J. O. Ramsay and B. W. Silverman. Functional data analysis. Springer Series in Statistics. Springer, New York, second edition, 2005.

[30] Christian H. Reinsch. Smoothing by spline functions. I, II. Numer. Math., 10:177–183; ibid. 16 (1970/71), 451–454, 1967.

[31] Carl Runge. Über empirische Funktionen und die Interpolation zwischen äquidistanten Ordinaten. Zeitschrift für Mathematik und Physik, 46:224–243, 1901.

[32] David Ruppert. Selecting the number of knots for penalized splines. J. Comput. Graph. Statist., 11(4):735–757, 2002.

[33] I. J. Schoenberg. Contributions to the problem of approximation of equidistant data by analytic functions. Part B. On the problem of osculatory interpolation. A second class of analytic approximation formulae. Quart. Appl. Math., 4:112–141, 1946.

[34] I. J. Schoenberg. Spline functions and the problem of graduation. Proc. Nat. Acad. Sci. U.S.A., 52:947–950, 1964.

[35] Larry L. Schumaker. Spline functions: basic theory. Cambridge Mathematical Library. Cambridge University Press, Cambridge, third edition, 2007.

[36] Kim-Chuan Toh, Michael J. Todd, and Reha H. Tütüncü. On the implementation and usage of SDPT3—a Matlab software package for semidefinite-quadratic-linear programming, version 4.0. In Handbook on semidefinite, conic and polynomial optimization, volume 166 of Internat. Ser. Oper. Res. Management Sci., pages 715–754. Springer, New York, 2012.

[37] Miguel Villalobos and Grace Wahba. Inequality-constrained multivariate smoothing splines with application to the estimation of posterior probabilities. J. Amer. Statist. Assoc., 82(397):239–248, 1987.

[38] Yu Xia and Paul D. McNicholas. A gradient method for the monotone fused least absolute shrinkage and selection operator. Optimization Methods and Software, 29(3):463–483, 2014.
