
Noname manuscript No. (will be inserted by the editor)

Derivative-free Efficient Global Optimization on High-dimensional Simplex

Priyam Das

Received: date / Accepted: date

Abstract  In this paper, we develop a novel derivative-free deterministic greedy algorithm for global optimization of any objective function of parameters belonging to a unit simplex. The main principle of the proposed algorithm is to make jumps of varying step sizes within the simplex parameter space and to search for the best direction of movement in a greedy manner. Unlike most other existing methods of constrained optimization, here the objective function is evaluated at independent directions within an iteration, so incorporating parallel computing makes the algorithm even faster. The requirement of parallelization grows only in the order of the dimension of the parameter space, which makes it convenient for solving high-dimensional optimization problems on the simplex parameter space using parallel computing. A comparative study of the performance of this algorithm and other existing algorithms is presented for some moderate- and high-dimensional optimization problems, along with some benchmark test functions transformed onto the simplex. Around a 20-300 fold improvement in computation time over the genetic algorithm has been achieved using the proposed algorithm, together with more accurate solutions.

Keywords  Simplex · coordinate descent · gradient descent · convex optimization · non-convex global optimization · Genetic algorithm

Priyam Das
North Carolina State University, USA
E-mail: [email protected]

arXiv:1604.08636v1 [math.OC] 28 Apr 2016


1 Introduction

A k-simplex is defined as a k-dimensional polytope which is the convex hull of its k + 1 affinely independent vertices. Suppose v_1, . . . , v_m ∈ R^{m-1} are m affinely independent vertices of a convex hull H in R^{m-1}. Then all points in H can be described by the set

    S_H = { p_1 v_1 + · · · + p_m v_m | p_i ≥ 0, 1 ≤ i ≤ m, ∑_{i=1}^m p_i = 1 }.

Clearly, S_H is an (m-1)-dimensional simplex. Consider the case where an objective function has to be optimized over the parameter space given by H. For any point v ∈ H, we get a unique m-tuple p = (p_1, · · · , p_m) such that p_i ≥ 0 for i = 1, . . . , m and ∑_{i=1}^m p_i = 1. Thus, we can write our objective function as a function of p = (p_1, · · · , p_m), and our problem can be formulated as

    minimize : f(p_1, · · · , p_m)
    subject to : p_i ≥ 0, 1 ≤ i ≤ m, ∑_{i=1}^m p_i = 1.    (1)

In the fields of computational mathematics, statistics and operations research, optimization problems on the simplex parameter space are quite common. For example, they arise in modeling with splines (specifically B-splines), estimation in the multinomial setup, estimation of Markov chain transition matrices and estimation of the mixture proportions of a mixture distribution. However, specially designed algorithms for non-linear optimization of parameters lying on a simplex are scarce.

There exist many algorithms for optimizing linear functions on linearly constrained spaces as well as on the simplex. For non-linear objective functions, convex optimization algorithms can be used when the objective function is also convex. The 'interior-point (IP)' algorithm (see [5], [6], [7], [8]) and the 'sequential quadratic programming (SQP)' algorithm (see [9], [8], [10]) are widely used for non-linear convex optimization problems. The simplex parameter space being convex, IP and SQP can be used to optimize objective functions whose parameters lie in a simplex. However, the main problem with these convex optimization algorithms is that they get stuck at a local minimum when the objective function is non-convex with multiple minima. One possible way to avoid this problem is to start the iterations from several starting points. For low-dimensional non-convex optimization problems, the strategy of starting from multiple initial points might be affordable, but with increasing dimension of the parameter space this strategy becomes computationally very expensive, since the required number of starting points grows exponentially with the dimension. So, it is of interest to find an efficient algorithm for non-convex optimization on simplex-constrained spaces.

In the last few decades, many non-convex global optimization strategies have been proposed for constrained spaces. The 'genetic algorithm (GA)' (see [12], [13], [14]) and 'simulated annealing (SA)' (see [15], [16]) remain quite popular among them. GA works fine for lower-dimensional problems, but it does not scale well with complexity, because in higher-dimensional optimization problems there is often an exponential increase in the size of the search space (see [17], page 21). Besides, one major drawback of these two methods is that they can be quite expensive even when applied to simple convex functions.

With the increasing access to high-performance modern computers and clusters ([21]), some of the existing parallelizable optimization algorithms (e.g., Monte Carlo methods) have a great advantage for certain types of problems. The motivation behind using parallelization in these methods is mainly either to start from different starting points or to use different random number generator seeds simultaneously. As mentioned earlier, these methods perform well for lower-dimensional parameter spaces, but since the parameter space grows exponentially with the number of dimensions, the way these methods use parallelization is not of much help: parallelization can only increase the sampling rate linearly. Instead, if we can design an algorithm where the requirement of parallelization increases only linearly with the dimension of the parameter space, it would be more convenient and useful for high-dimensional parameter spaces on the simplex.

The main principle of our algorithm is to make jumps of varying step sizes along the coordinates of the parameter within the parameter space and to search for the best possible direction of movement in a greedy manner during each iteration step. In most of the existing methods (e.g., the interior-point and SQP algorithms), the best possible direction of movement during an update is found by derivative-based methods. Though derivative-based methods work fast and well enough for smooth objective functions, a closed form of the derivative might not exist for every objective function, and in that case numerical evaluation of the derivative might affect the computational efficiency. Secondly, relying on derivative-based methods, there is always a chance of converging to a local extremum. The strategy of choosing a parameter randomly and moving with a (possibly adaptive) step size depending on its effect on the objective function works well for low-dimensional problems. But with an increasing number of parameters it becomes important to move, at every iteration, in the direction which yields the minimum value of the objective function. In a high-dimensional optimization problem, choosing the parameter axes randomly for updates (e.g., [18]) decreases the chance of moving in the best possible direction during iterations. This motivates us to find a novel way of selecting the best possible direction of movement during each iteration step instead of randomly choosing a parameter axis or direction.

To avoid derivatives, we move along the coordinates of the parameter with varying step sizes and choose the best possible direction of movement. Within an iteration of our algorithm, once we fix the step size (say s), for any coordinate of the parameter there are two available options for movement: either increase or decrease that coordinate by the step size s. For an m-dimensional parameter space, the step size by which a coordinate of the parameter is increased is deducted equally from the rest of the coordinates to maintain the constraint ∑_{i=1}^m p_i = 1. For example, if we increase the first coordinate by step size s, we subtract s/(m - 1) from each of the remaining coordinates i = 2, . . . , m. Hence, for a given step size, there are 2m possible directions of movement, since we move only one coordinate at a time by the step size s in the positive or negative direction and make the corresponding constraint adjustment as described above (how boundary issues are handled and how the step sizes are determined and varied is discussed in Section 2). Thus, during each iteration we consider 2m possible movements in the neighborhood of the current parameter value. After checking the objective function value at those 2m possible directions of movement, the direction with the minimum objective function value is chosen and compared with the value of the objective function at the current site (before making any movement). The new updated value of the parameter is set equal to whichever of those two sites has the lower objective function value. Thus the best possible direction of movement in the neighborhood of the current value of the parameter is ensured at each iteration. Another advantage of our algorithm is that the objective function can be evaluated at the 2m possible directions in parallel; thus the requirement of parallel computing only increases in the order of the number of parameters, unlike the way Monte Carlo methods use parallel computing as mentioned above. We call this algorithm 'Greedy Coordinate Descent of Varying Step-size on Simplex' (GCDVSS).
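As an illustration of how the 2m candidate moves are formed, the following minimal Python sketch (written for this exposition, not the author's reference MATLAB implementation; boundary and sparsity handling are simplified relative to Section 2) constructs the candidate points for a given step size s:

```python
import numpy as np

def candidate_moves(p, s):
    """Return the 2m candidate points obtained by increasing or decreasing one
    coordinate of p by step size s and spreading the compensation equally over
    the remaining coordinates, so that the coordinates still sum to one."""
    m = len(p)
    candidates = []
    for i in range(m):
        for sign in (+1, -1):
            q = np.array(p, dtype=float)
            q[i] += sign * s
            # compensate on the other coordinates to keep sum(q) == 1
            q[np.arange(m) != i] -= sign * s / (m - 1)
            candidates.append(q)
    return candidates

# Example: all moves of step size 0.1 from the barycenter of the 2-simplex,
# keeping only the moves that stay inside the simplex (non-negative coordinates)
moves = candidate_moves([1/3, 1/3, 1/3], 0.1)
feasible = [q for q in moves if np.all(q >= 0)]
```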

2 Algorithm

Suppose we have an objective function Y = f(p), where p = (p_1, · · · , p_m) is a vector of length m such that ∑_{i=1}^m p_i = 1 and p_i ≥ 0 for i = 1, · · · , m. Our objective is

    minimize : f(p_1, · · · , p_m)
    subject to : p_i ≥ 0, 1 ≤ i ≤ m, ∑_{i=1}^m p_i = 1.    (2)

Define

    S = { p = (p_1, . . . , p_m) ∈ R^m | ∑_{i=1}^m p_i = 1, p_i ≥ 0, 1 ≤ i ≤ m }.

Our problem can then be written as

    minimize : f(p)
    subject to : p ∈ S.    (3)

Our algorithm consists of several runs. Each run is an iterative procedure that stops based on a convergence criterion (see below), and at the end of each run a solution is returned. For the first run only, the starting point has to be provided by the user; each of the following runs starts from the solution returned by the previous run. Each run tries to minimize the objective function value in a greedy manner (see below for details), so the solution improves after each run. Once two consecutive runs yield the same solution, the algorithm stops and returns the final solution.

In our algorithm, every run follows the same iterative procedure, except that the values of the tuning parameters may change between runs. In each run there are four tuning parameters: the initial global step size s_initial, the step decay rate ρ, the step size threshold φ and the sparsity threshold λ. Except for the step decay rate ρ, the values of the tuning parameters are fixed at the beginning and kept unchanged until the algorithm converges. The step decay rate is taken to be ρ = ρ_1 for the first run and ρ = ρ_2 for the following runs. Overall there are thus 5 tuning parameters: s_initial, ρ_1, ρ_2, φ and λ. Apart from these, max_iter denotes the maximum number of allowed iterations inside a run and max_runs denotes the maximum number of allowed runs.

Inside each run there is a parameter called the global step size and 2m local parameters called local step sizes, denoted by {s_i^+}_{i=1}^m and {s_i^-}_{i=1}^m. In the first iteration of each run, the global step size is initialized to s^(1) = s_initial. Its value is kept unchanged throughout an iteration, but at the end of each iteration it is either kept the same or decreased by the division factor ρ (the step decay rate), based on a convergence criterion (see step (7) of STAGE 1 below). Hence, in the (j+1)-th iteration the global step size s^(j+1) is equal to either s^(j) or s^(j)/ρ, according to the aforementioned criterion. At the beginning of any iteration, the local step sizes are set equal to the current global step size; for example, at the beginning of the j-th iteration we set s_i^+ = s_i^- = s^(j) for i = 1, . . . , m. Suppose the current value of p in the j-th iteration is p^(j) = (p_1^(j), . . . , p_m^(j)) ∈ S. During the j-th iteration, we look for 2m points within the domain S which can be reached by moving from the current solution p^(j), where the movements depend on the local step sizes. The local step sizes are updated whenever the movements corresponding to them yield points in R^m outside S (see steps (3) and (4) of STAGE 1). Note that once the value of a coordinate goes below the sparsity threshold λ, that coordinate is considered 'insignificant'. Suppose the l-th component of p^(j) is 'insignificant'. Then, for the evaluations of q_i^+ and q_i^- for i ≠ l, p_l^(j) is kept unchanged (see steps (3) and (4) of STAGE 1). If the points q_i^+ and q_i^- for i = 1, . . . , m are in S, the objective function is evaluated at those points and the values are saved as {f_i^+}_{i=1}^m and {f_i^-}_{i=1}^m.

Once {f_i^+}_{i=1}^m and {f_i^-}_{i=1}^m are evaluated for the j-th iteration, we find the smallest of these 2m values. If it is smaller than f(p^(j)), the point corresponding to that smallest value of the objective function is accepted. Then the 'insignificant' positions are replaced by 0, and the sum of the 'insignificant' positions (called the 'garbage') is divided by the number of remaining positions and added to each of those remaining positions. This new point is taken as the updated value, and p^(j+1) is set equal to it (see steps (5) and (6) of STAGE 1). Once the squared Euclidean distance between the parameter values of two consecutive iterations becomes less than tol_fun, the global step size is decreased by the division factor ρ, the step decay rate (see step (7) of STAGE 1). A run ends when the global step size becomes less than or equal to the step size threshold φ (see step (8) of STAGE 1). Once the same solution is returned by two consecutive runs, our algorithm stops and returns the final solution.

The default value of s_initial is taken to be 1. ρ_1 and ρ_2 denote the step decay rates; taking a smaller step decay rate results in a better solution at the cost of higher computation time. Based on experiments, we note that we get satisfactory results with the default values ρ_1 = 2 and ρ_2 = 1.05. φ denotes the minimum allowed size of the global step size for movement within the domain (see the algorithm for how the movement depends on the global step size). Making the value of φ smaller results in a more accurate solution at the cost of higher computation time; its default value is taken to be 10^-3. λ controls the sparsity: at the end of each iteration, the positions of the current estimated parameter with values less than λ are set equal to 0 (see step (6) of STAGE 1), and λ also controls the movement of the parameters within the domain (see steps (3) and (4) of STAGE 1). If the solution is expected to be sparse, λ should be set larger; if not, it should be set smaller. We note that, in general, the default value λ = 10^-3 works fine. We set max_iter = 50000, max_runs = 1000 and tol_fun = 10^-15. Before going through STAGE 1 for the first time, we set R = 1, ρ = ρ_1 and the initial guess of the solution p^(1) = (p_1^(1), . . . , p_m^(1)) ∈ S.

STAGE : 1

1. Set j = 1. Set s^(1) = s_initial. Go to step (2).
2. If j > max_iter, set p̂ = p^(j-1) and go to step (9). Else, set s_i^+ = s_i^- = s^(j) and f_i^+ = f_i^- = Y^(j) = f(p^(j)) for all i = 1, · · · , m. Set i = 1 and go to step (3).
3. If i > m, set i = 1 and go to step (4). Else, find K_i^+ = n(S_i^+) where S_i^+ = { l | p_l^(j) > λ, l ≠ i }. If K_i^+ ≥ 1, go to step (3.1); else set i = i + 1 and go to step (3).
   (3.1) If s_i^+ ≤ φ, set i = i + 1 and go to step (3). Else (if s_i^+ > φ), evaluate the vector q_i^+ = (q_{i1}^+, · · · , q_{im}^+) such that

         q_{il}^+ = p_l^(j) + s_i^+              if l = i,
                  = p_l^(j) - s_i^+ / K_i^+      if l ∈ S_i^+,
                  = p_l^(j)                      if l ∈ (S_i^+ ∪ {i})^C.

         Go to step (3.2).
   (3.2) Check whether q_i^+ ∈ S or not. If q_i^+ ∈ S, go to step (3.3). Else, set s_i^+ = s_i^+ / ρ and go to step (3.1).
   (3.3) Evaluate f_i^+ = f(q_i^+). Set i = i + 1 and go to step (3).
4. If i > m, go to step (5). Else, find K_i^- = n(S_i^-) where S_i^- = { l | p_l^(j) > λ, l ≠ i }. If K_i^- ≥ 1, go to step (4.1); else set i = i + 1 and go to step (4).
   (4.1) If s_i^- ≤ φ, set i = i + 1 and go to step (4). Else (if s_i^- > φ), evaluate the vector q_i^- = (q_{i1}^-, · · · , q_{im}^-) such that

         q_{il}^- = p_l^(j) - s_i^-              if l = i,
                  = p_l^(j) + s_i^- / K_i^-      if l ∈ S_i^-,
                  = p_l^(j)                      if l ∈ (S_i^- ∪ {i})^C.

         Go to step (4.2).
   (4.2) Check whether q_i^- ∈ S or not. If q_i^- ∈ S, go to step (4.3). Else, set s_i^- = s_i^- / ρ and go to step (4.1).
   (4.3) Evaluate f_i^- = f(q_i^-). Set i = i + 1 and go to step (4).
5. Set k_1 = arg min_{1≤l≤m} f_l^+ and k_2 = arg min_{1≤l≤m} f_l^-. If min(f_{k_1}^+, f_{k_2}^-) < Y^(j), go to step (5.1). Else, set p^(j+1) = p^(j) and Y^(j+1) = Y^(j), set j = j + 1 and go to step (7).
   (5.1) If f_{k_1}^+ < f_{k_2}^-, set p_temp = q_{k_1}^+; else (if f_{k_1}^+ ≥ f_{k_2}^-), set p_temp = q_{k_2}^-. Go to step (6).
6. Find K_updated = n(S_updated) where S_updated = { l | p_temp(l) > λ, l = 1, · · · , m }. Go to step (6.1).
   (6.1) If K_updated = m, set p^(j+1) = p_temp, set j = j + 1 and go to step (7). Else, go to step (6.2).
   (6.2) Set garbage = ∑_{l ∈ S_updated^C} p_temp(l) and

         p^(j+1)(l) = p_temp(l) + garbage / K_updated   if l ∈ S_updated,
                    = 0                                 if l ∈ S_updated^C.

         Set j = j + 1. Go to step (7).
7. If ∑_{i=1}^m (p^(j)(i) - p^(j-1)(i))^2 < tol_fun, set s^(j) = s^(j-1)/ρ and go to step (8). Else, set s^(j) = s^(j-1) and go to step (2).
8. If s^(j) ≤ φ, set p̂ = p^(j) and go to step (9). Else, go to step (2).
9. STOP execution. Set z(R) = p̂ and go to STAGE 2.

STAGE : 2

1. If R = 1, or if R < max_runs and z(R) ≠ z(R-1), go to step (2). Else, z(R) is the final solution. STOP and EXIT.
2. Set R = R + 1 and ρ = ρ_2, keeping the other tuning parameters (φ, λ and s_initial) fixed. Repeat the procedure described in STAGE 1 starting from p^(1) = z(R-1).

At the end of each run, under a certain set of assumptions and regularity conditions, our algorithm returns a local minimum as the solution (see Section 4). After the first run, the following runs try to improve the solution returned by the previous run by making jumps of various step sizes in the parameter space and checking the objective function values at those sites. Thus, the algorithm tries to find a better solution even after reaching a local minimum.
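For concreteness, the condensed Python sketch below mirrors the structure of STAGE 1 and STAGE 2; it is a simplified re-implementation written for this exposition (not the author's MATLAB code), the sequential loop over the 2m candidates stands in for the parallel evaluation discussed in Section 1, and some bookkeeping is abbreviated.

```python
import numpy as np

def gcdvss(f, p0, s_initial=1.0, rho1=2.0, rho2=1.05, phi=1e-3, lam=1e-3,
           max_iter=50000, max_runs=1000, tol_fun=1e-15):
    """Simplified sketch of GCDVSS: greedy coordinate moves of varying step
    size on the unit simplex (one call to one_run = STAGE 1, restarts = STAGE 2)."""

    def one_run(p, rho):                      # STAGE 1
        m, s = len(p), s_initial
        for _ in range(max_iter):
            y = f(p)
            best_f, best_q = y, None
            for i in range(m):                # 2m candidate moves
                for sign in (+1.0, -1.0):
                    others = [l for l in range(m) if l != i and p[l] > lam]
                    if not others:
                        continue
                    s_loc = s
                    while s_loc > phi:        # shrink the local step until feasible
                        q = p.copy()
                        q[i] += sign * s_loc
                        q[others] -= sign * s_loc / len(others)
                        if np.all(q >= 0) and np.all(q <= 1):
                            fq = f(q)
                            if fq < best_f:
                                best_f, best_q = fq, q
                            break
                        s_loc /= rho
            if best_q is not None:            # accept the best of the 2m moves
                small = best_q < lam          # 'insignificant' coordinates
                if small.any():
                    garbage = best_q[small].sum()
                    best_q[small] = 0.0
                    best_q[~small] += garbage / (~small).sum()
                p_new = best_q
            else:
                p_new = p
            if np.sum((p_new - p) ** 2) < tol_fun:
                s /= rho                      # decay the global step size
                if s <= phi:
                    return p_new              # run converged
            p = p_new
        return p

    z = one_run(np.asarray(p0, dtype=float), rho1)    # first run
    for _ in range(max_runs - 1):                     # STAGE 2: restarts
        z_new = one_run(z.copy(), rho2)
        if np.allclose(z_new, z):                     # two identical runs: stop
            return z_new
        z = z_new
    return z
```

In the actual algorithm the 2m candidate evaluations inside an iteration are independent and can be farmed out to parallel workers, which is what keeps the parallelization requirement linear in m.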

3 Order of Algorithm

In this section, we find the order of our algorithm, both in terms of the number of basic operations required and in terms of the number of objective function evaluations required, as functions of the dimension of the parameter space. Since we are only interested in the order, finding an upper bound on the number of operations in the worst-case scenario is sufficient.

Suppose we want to minimize f(p) where p ∈ S. At the beginning of each iteration, 4 arrays of length m, i.e., s^+, s^-, f^+, f^-, are initialized (see step (2) of STAGE 1 in Section 2). During each iteration, starting from the current value of the parameter, 2m possible movements are searched for in such a way that each of them belongs to the domain S. The search procedure for the first m of these movements is described in step (3) of STAGE 1 of Section 2.

Consider the search procedure for any one of these m movements. In the aforementioned step, note that it requires not more than m operations to find S_i^+, and finding K_i^+ takes at most m operations. As we are considering the worst-case scenario in terms of maximizing the number of required operations, assume K_i^+ ≥ 1. In steps (3.1) and (3.2) of STAGE 1, suppose the value of s_i^+ is updated at most k times. So we have s_initial/ρ^{k-1} ≤ φ but s_initial/ρ^{k-2} > φ. Hence

    k = 1 + [ log(s_initial/φ) / log(ρ) ],

where [x] returns the largest integer less than or equal to x. Corresponding to each update step of s_i^+, it is first checked whether s_i^+ ≤ φ or not, which involves a single operation. Then deriving q_i^+ involves not more than 2m steps, because the most complicated scenario occurs when updating the positions of q_i^+ which belong to S_i^+; in that case it takes two operations per site, one to compute s_i^+/K_i^+ and one more to subtract that quantity from p_l^(j) for l ∈ S_i^+. After that, checking whether q_i^+ ∈ S or not requires m operations. For the worst-case scenario, we also add one more step for the update s_i^+ = s_i^+/ρ. Hence the search procedure for any one movement (i.e., for any i ∈ {1, . . . , m}) in step (3) of STAGE 1 requires m + m + k × (1 + 2m + m + 1) = m × (2 + 3k) + 2k operations, and for all m movements (mentioned in step (3) of STAGE 1 in Section 2) it requires not more than m^2 × (2 + 3k) + 2mk operations. In a similar way, it can be shown that for step (4) the maximum number of required operations is also not more than m^2 × (2 + 3k) + 2mk.

In step (5) of STAGE 1 in Section 2, finding k_1 or k_2 takes (m - 1) operations each. The required number of operations for this step is maximized if min(f_{k_1}^+, f_{k_2}^-) < Y^(j); under this scenario, two more operations (i.e., comparisons) are required to find p_temp. So this step requires not more than 2 × (m - 1) + 2 = 2m operations.

In step (6) of STAGE 1 in Section 2, it takes at most m operations to find S_updated. If K_updated is not equal to m, the number of operations spent in this step is larger than in the case K_updated = m, so for the worst-case count assume K_updated < m. To find the value of garbage, the maximum number of required operations is not more than m. Finally, it can be noted that in step (6.2), updating the value of the parameter of interest from p^(j) to p^(j+1) requires not more than 2m operations. So the maximum number of operations required for step (6) of STAGE 1 is not more than m + m + 2m = 4m.

In step (7) of STAGE 1 in Section 2, finding (p^(j)(i) - p^(j-1)(i))^2 for each i ∈ {1, . . . , m} needs one operation for the difference and one for the square, and finding the sum of the squares needs (m - 1) more operations. Comparing its value with tol_fun takes one more operation. In the worst-case scenario, two more operations are needed until the end of the iteration, i.e., the update of s^(j) at step (7) and its comparison with φ at step (8). Hence, after step (6), the required number of operations is at most (3m - 1) + 1 + 2 = 3m + 2.

Hence, for each iteration in the worst-case scenario, the number of required basic operations is not more than m^2(2 + 3k) + 2mk + 2m + 3m + (3m + 2) = m^2(2 + 3k) + m(2k + 8) + 2. So the number of basic operations required per iteration of our algorithm is O(m^2), where m is the number of parameters to estimate.

Note that the number of times the function is evaluated in each iteration is 2m + 1: once at step (2), m times at step (3.3) and m times at step (4.3) of STAGE 1 in Section 2. Thus, the number of function evaluations at each iteration step is O(m).
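As a worked illustration of these counts (added here, using the default tuning parameters of Section 2: s_initial = 1, φ = 10^-3 and ρ = ρ_1 = 2), the maximum number of local step-size updates per move and the per-iteration evaluation count for m = 100 are

```latex
k = 1 + \left\lfloor \frac{\log(s_{\mathrm{initial}}/\phi)}{\log \rho} \right\rfloor
  = 1 + \left\lfloor \frac{\log 10^{3}}{\log 2} \right\rfloor
  = 1 + \lfloor 9.97 \rfloor = 10,
\qquad
2m + 1 = 201 \ \text{function evaluations per iteration for } m = 100.
```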

4 Theoretical Properties

In this section we show that if the objective function is continuous, differentiable and convex, then our algorithm returns the global minimum of the objective function as the solution. Consider the following theorem.

Theorem 1  Suppose S = { (x_1, · · · , x_n) ∈ R^n : ∑_{i=1}^n x_i = 1, x_i ≥ 0, i = 1, · · · , n } and f is convex, continuous and differentiable on S. Consider a sequence δ_k = s/ρ^k for k ∈ N with s > 0, ρ > 1. Suppose u is a point in S such that all its coordinates are positive. Define

    u_k^(i+) = ( u_1 - δ_k/(n-1), . . . , u_{i-1} - δ_k/(n-1), u_i + δ_k, u_{i+1} - δ_k/(n-1), . . . , u_n - δ_k/(n-1) ),
    u_k^(i-) = ( u_1 + δ_k/(n-1), . . . , u_{i-1} + δ_k/(n-1), u_i - δ_k, u_{i+1} + δ_k/(n-1), . . . , u_n + δ_k/(n-1) ),

for i = 1, · · · , n. If for all k ∈ N, f(u) ≤ f(u_k^(i+)) and f(u) ≤ f(u_k^(i-)) (whenever u_k^(i+), u_k^(i-) ∈ S) for all i = 1, · · · , n, then the global minimum of f occurs at u.

Proof (Proof of Theorem 1) Fix some i ∈ {1, . . . , n}. Define

    r_1 = min{ (n-1)u_1, . . . , (n-1)u_{i-1}, (1 - u_i), (n-1)u_{i+1}, . . . , (n-1)u_n },
    r_2 = min{ (n-1)(1-u_1), . . . , (n-1)(1-u_{i-1}), u_i, (n-1)(1-u_{i+1}), . . . , (n-1)(1-u_n) }.

Set r = min{r_1, r_2}. Since δ_k is a strictly decreasing sequence going to zero, there exists an N ∈ N such that δ_k < r for all k ≥ N. Hence u_N^(i+), u_N^(i-) ∈ S.

Once we fix the first (n-1) coordinates of any element of S, the n-th coordinate can be derived by subtracting the sum of the first (n-1) coordinates from 1. Define

    S* = { (x_1, · · · , x_{n-1}) ∈ R^{n-1} : ∑_{i=1}^{n-1} x_i < 1, x_i ≥ 0, i = 1, · · · , n-1 }.

Define u* = (u_1, . . . , u_{n-1}) and

    u*_k^(i+) = ( u_1 - δ_k/(n-1), . . . , u_{i-1} - δ_k/(n-1), u_i + δ_k, u_{i+1} - δ_k/(n-1), . . . , u_{n-1} - δ_k/(n-1) ),
    u*_k^(i-) = ( u_1 + δ_k/(n-1), . . . , u_{i-1} + δ_k/(n-1), u_i - δ_k, u_{i+1} + δ_k/(n-1), . . . , u_{n-1} + δ_k/(n-1) ),

for i = 1, . . . , n-1. Note that u*, u*_k^(i+), u*_k^(i-) consist of the first (n-1) coordinates of u, u_k^(i+), u_k^(i-) respectively. Define f* : S* → R such that

    f*(x_1, . . . , x_{n-1}) = f( x_1, . . . , x_{n-1}, 1 - ∑_{i=1}^{n-1} x_i ).

Hence we have f*(u*) = f(u), f*(u*_k^(i+)) = f(u_k^(i+)) and f*(u*_k^(i-)) = f(u_k^(i-)). Since f is continuous and differentiable on S, f* is continuous and differentiable on S*. We now show that convexity of f implies convexity of f* on S*. Consider x*_1, x*_2 ∈ S*, and suppose x_1, x_2 ∈ S are such that their first (n-1) coordinates are the same as x*_1 and x*_2 respectively. Take any γ ∈ (0, 1). Then

    γ f*(x*_1) + (1-γ) f*(x*_2) = γ f(x_1) + (1-γ) f(x_2)
                                ≥ f( γ x_1 + (1-γ) x_2 )
                                = f*( γ x*_1 + (1-γ) x*_2 ).

Hence f* is also convex. Define h_i : U_i → S* such that

    h_i(z) = ( u_1 - z/(n-1), . . . , u_{i-1} - z/(n-1), u_i + z, u_{i+1} - z/(n-1), . . . , u_{n-1} - z/(n-1) )

for i = 1, . . . , n-1, where U_i = [-δ_N, δ_N] (since each coordinate of u is positive, u* ∈ S*; note that the way N is chosen ensures h_i(U_i) ⊂ S*). Define g_i : U_i → R for i = 1, . . . , n-1 such that g_i = f* ∘ h_i. Hence we have

    g_i(z) = f*( u_1 - z/(n-1), . . . , u_{i-1} - z/(n-1), u_i + z, u_{i+1} - z/(n-1), . . . , u_{n-1} - z/(n-1) )

for i = 1, . . . , n-1.

It is noted that h_i is continuous on U_i = [-δ_N, δ_N] and differentiable on (-δ_N, δ_N) for i = 1, . . . , n-1, and that f* is continuous and differentiable on S*. The composition of two continuous functions is continuous and the composition of two differentiable functions is differentiable. Hence g_i is continuous on U_i = [-δ_N, δ_N] and differentiable on (-δ_N, δ_N).

Take any i ∈ {1, . . . , n-1}. Note that g_i(δ_N) = f*(u*_N^(i+)), g_i(-δ_N) = f*(u*_N^(i-)) and g_i(0) = f*(u*). So g_i(0) ≤ g_i(-δ_N) and g_i(0) ≤ g_i(δ_N). Without loss of generality, assume f*(u*_N^(i-)) ≤ f*(u*_N^(i+)), which implies g_i(0) ≤ g_i(-δ_N) ≤ g_i(δ_N).

Since g_i(0) ≤ g_i(-δ_N) ≤ g_i(δ_N), by continuity of g_i there exists w ∈ [0, δ_N] such that g_i(w) = g_i(-δ_N) ≥ g_i(0). Now, g_i being continuous on [-δ_N, δ_N] and differentiable on (-δ_N, δ_N) implies that g_i is continuous on [-δ_N, w] and differentiable on (-δ_N, w). Using the mean value theorem, there exists a point v ∈ (-δ_N, w) such that g′_i(v) = 0. We claim that g′_i(v) = 0 holds for v = 0.

Suppose g′_i(0) ≠ 0, and assume g′_i(v*) = 0 for some v* ≠ 0 with v* ∈ (-δ_N, w). Without loss of generality, assume v* > 0. Since h_i is affine and f* is convex, g_i is also convex. Now g′_i(v*) = 0 implies that v* is a local minimum of g_i, while g′_i(0) ≠ 0 implies that 0 is not a critical point and hence not a local minimum. Therefore g_i(0) > g_i(v*). Take N_1 ∈ N such that 0 < δ_{N_1} < v*. Then there exists a λ ∈ (0, 1) such that δ_{N_1} = λ·0 + (1-λ)·v*. Now

    g_i(δ_{N_1}) = g_i( λ·0 + (1-λ)·v* )
                 ≤ λ g_i(0) + (1-λ) g_i(v*)
                 = g_i(0) + (1-λ)( g_i(v*) - g_i(0) )
                 = g_i(0) - (1-λ)( g_i(0) - g_i(v*) )
                 < g_i(0).

But we know that g_i(0) ≤ g_i(δ_k) for all k ∈ N, which implies g_i(0) ≤ g_i(δ_{N_1}). This is a contradiction. Hence we have g′_i(0) = 0. Now

    g′_i(0) = [ ∂/∂ε g_i(ε) ]_{ε=0} = [ ∂/∂ε f*(h_i(ε)) ]_{ε=0}
            = [ ∂/∂h_i(ε) f*(h_i(ε)) ]_{ε=0} · [ ∂/∂ε h_i(ε) ]_{ε=0}.

Now h_i(0) = u*. Hence

    [ ∂/∂h_i(ε) f*(h_i(ε)) ]_{ε=0} = ∇f*(u*) = [ ∂/∂x_1 f*(u*), . . . , ∂/∂x_{n-1} f*(u*) ] = [ ∇_1, . . . , ∇_{n-1} ],

where ∇_i = ∂/∂x_i f*(u*) for i = 1, . . . , n-1, and

    ∂/∂ε h_i(ε) = [ a_{i1}, . . . , a_{i(n-1)} ]^T,

where a_{ii} = 1 and a_{ij} = -1/(n-1) for j ∈ {1, . . . , n-1} \ {i}. Hence

    [ ∂/∂ε g_i(ε) ]_{ε=0} = [ ∇_1, . . . , ∇_{n-1} ] [ a_{i1}, . . . , a_{i(n-1)} ]^T = ∑_{j=1}^{n-1} a_{ij} ∇_j = 0.

Since this equation holds for all i = 1, · · · , n-1, we have Ax = 0 where

    A_{(n-1)×(n-1)} = [   1         -1/(n-1)   · · ·   -1/(n-1)
                        -1/(n-1)      1        · · ·   -1/(n-1)
                           ...        ...       . . .     ...
                        -1/(n-1)   -1/(n-1)    · · ·      1     ],

    x_{(n-1)×1} = [ ∇_1, . . . , ∇_{n-1} ]^T.

Since A is of full rank for n ∈ N \ {1}, Ax = 0 implies x = 0. Hence ∂/∂x_i f*(u*) = 0 for all i = 1, . . . , n-1, so u* is a critical point of f*. Since f* is convex, a local minimum occurs at u*, and for a convex function the global minimum occurs at any local minimum. Hence the global minimum of f* occurs at u*, which clearly implies that the global minimum of f occurs at u.

Suppose the solution given by GCDVSS is a point u ∈ S such that all of its coordinates are greater than zero. Our algorithm stops and yields the final solution when two consecutive runs give the same solution. This implies that in the last run, for all movements of step size δ_k = s_initial/ρ^k (until δ_k gets smaller than the step size threshold), the objective function value is checked at u_k^(i+) and u_k^(i-), and f(u) ≤ f(u_k^(i+)) and f(u) ≤ f(u_k^(i-)) hold for all i = 1, . . . , n. So, taking the step size threshold small enough, this algorithm reaches the global minimum under the assumed regularity conditions on the objective function. Note that ideally the value of λ should be taken to be zero; in practice, however, it is noted that setting a small non-zero value of λ (especially in high-dimensional problems with the possibility of sparsity) increases the efficiency and accuracy of the solution provided by this algorithm.
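As an illustrative numerical check (not part of the paper's argument), the snippet below verifies the condition of Theorem 1 for the convex function f(x) = ∑ x_i^2, whose global minimum on the simplex is the barycenter u = (1/n, . . . , 1/n):

```python
import numpy as np

def perturbation(u, i, delta, sign=+1):
    """Construct u_k^(i+) (sign=+1) or u_k^(i-) (sign=-1) as in Theorem 1."""
    n = len(u)
    v = np.array(u, dtype=float)
    v[i] += sign * delta
    v[np.arange(n) != i] -= sign * delta / (n - 1)
    return v

f = lambda x: np.sum(x ** 2)          # convex, smooth on the simplex
n = 4
u = np.full(n, 1.0 / n)               # barycenter: global minimizer of f on S
s, rho = 1.0, 2.0
for k in range(1, 20):
    delta = s / rho ** k              # the sequence delta_k of Theorem 1
    for i in range(n):
        for sign in (+1, -1):
            v = perturbation(u, i, delta, sign)
            if np.all(v >= 0):        # only compare points that stay in S
                assert f(u) <= f(v) + 1e-12
print("f(u) <= f(u_k^(i+/-)) held for all tested k and i")
```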

5 Generalization to some other cases

In this section, the proposed algorithm is extended to two more general cases, namely a simplex inequality constraint and a single linear constraint with positive coefficients.

5.1 Simplex Inequality

Consider the case where the optimization problem is given by

    minimize : f(p_1, · · · , p_m)
    subject to : p_i ≥ 0, 1 ≤ i ≤ m, ∑_{i=1}^m p_i ≤ 1.    (4)

Under this scenario, a slack variable p_{m+1} is introduced such that p_{m+1} ≥ 0 and ∑_{i=1}^{m+1} p_i = 1. Define f_1(p_1, . . . , p_{m+1}) = f(p_1, . . . , p_m). The modified optimization problem, which is equivalent to Equation (4), is then given by

    minimize : f_1(p_1, · · · , p_{m+1})
    subject to : p_i ≥ 0, 1 ≤ i ≤ m+1, ∑_{i=1}^{m+1} p_i = 1,    (5)

which can be easily solved using the proposed algorithm.
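A minimal Python sketch of this slack-variable reduction (the wrapper simply ignores the slack coordinate before calling the original objective; gcdvss refers to the illustrative sketch given at the end of Section 2, not to the paper's MATLAB implementation):

```python
import numpy as np

def solve_on_simplex_inequality(f, m, p0=None):
    """Minimize f(p_1,...,p_m) subject to p_i >= 0 and sum(p) <= 1 by adding a
    slack coordinate p_{m+1} and optimizing f1 on the (m+1)-dimensional simplex."""
    f1 = lambda q: f(q[:m])                 # f1(p_1,...,p_{m+1}) = f(p_1,...,p_m)
    if p0 is None:
        p0 = np.full(m + 1, 1.0 / (m + 1))  # feasible starting point
    q_hat = gcdvss(f1, p0)                  # sketch from Section 2
    return q_hat[:m]                        # drop the slack variable
```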

5.2 Linear Constraint with Positive Coefficients

Now consider the case where the optimization problem is as follows:

    minimize : f(x_1, · · · , x_m)
    subject to : x_i ≥ 0, 1 ≤ i ≤ m, ∑_{i=1}^m a_i x_i = K,    (6)

where {a_i}_{i=1}^m and K are given positive constants. To solve this problem, consider the change of variable y_i = a_i x_i / K for i = 1, . . . , m. Each y_i is non-negative since K > 0, x_i ≥ 0 and a_i > 0 for i = 1, . . . , m, and ∑_{i=1}^m a_i x_i = K is equivalent to ∑_{i=1}^m y_i = 1. Consider the mapping g : R^m → R^m,

    g(y_1, . . . , y_m) = ( K y_1 / a_1, . . . , K y_m / a_m ).

Define h : R^m → R such that h = f ∘ g. So

    h(y_1, . . . , y_m) = f( g(y_1, . . . , y_m) ) = f( K y_1 / a_1, . . . , K y_m / a_m ) = f(x_1, . . . , x_m).

Hence, the optimization problem in Equation (6) is equivalent to

    minimize : h(y_1, · · · , y_m)
    subject to : y_i ≥ 0, 1 ≤ i ≤ m, ∑_{i=1}^m y_i = 1,    (7)

which can be solved using the proposed algorithm.
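A corresponding Python sketch of this change of variables (again using the illustrative gcdvss sketch from Section 2 as the optimizer):

```python
import numpy as np

def solve_with_linear_constraint(f, a, K, y0=None):
    """Minimize f(x) subject to x_i >= 0 and sum(a_i x_i) = K (a_i, K > 0) via
    the substitution y_i = a_i x_i / K, which maps the constraint onto the
    unit simplex in y."""
    a = np.asarray(a, dtype=float)
    h = lambda y: f(K * y / a)              # h(y) = f(g(y)) with g(y) = K y / a
    if y0 is None:
        y0 = np.full(len(a), 1.0 / len(a))  # barycenter of the simplex
    y_hat = gcdvss(h, y0)                   # sketch from Section 2
    return K * y_hat / a                    # map the solution back to x
```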


6 Application to non-convex global optimization on Simplex

In this section, we compare the performance of the proposed method (GCDVSS) with three standard constrained optimization methods, the 'interior-point' (IP) algorithm, 'sequential quadratic programming' (SQP) and the 'genetic algorithm' (GA), for optimization of non-convex problems on the simplex parameter space. All of the above-mentioned well-known algorithms are available in Matlab R2014a (The Mathworks) via the Optimization Toolbox functions fmincon (for the IP and SQP algorithms) and ga (for GA). IP and SQP search for a local minimum and are, in general, less time consuming; GA, on the other hand, tries to find the global minimum and is more time consuming. In the following studies, we consider convergence to the true solution to be successful if the absolute difference between the optimum objective function value returned by the algorithm and the true optimum value of the objective function is less than 10^-2. For the GCDVSS algorithm, the values of all the tuning parameters are taken to be the same as mentioned in Section 2. For the IP and SQP algorithms, the upper bounds on the maximum number of iterations and function evaluations are each set to infinity. For GA, we use the default options of the 'ga' function in Matlab R2014a. The GCDVSS algorithm is implemented in Matlab R2014a. We perform the simulations on a machine with 64-bit Windows 8.1, Intel i7 3.60GHz processors and 32GB RAM.

6.1 Maximum of two Gaussian densities

Consider the problem

    maximize : max( 8 φ(p; µ_1, Σ_1), 5 φ(p; µ_2, Σ_2) )
    subject to : p_1, p_2 ≥ 0, p_1 + p_2 = 1,    (8)

where p = (p_1, p_2) is our parameter of interest and φ(x; µ, Σ) denotes the normal density at x with mean µ and covariance matrix Σ. Here µ_1 = (0.25, 0.75)^T, µ_2 = (0.8, 0.2)^T and Σ_1 = Σ_2 = diag(0.1, 0.1). In Figure 1(b), we plot the function on the restricted parameter space. Note that it has two local maxima, at (p_1, p_2) = (0.8, 0.2) and (0.25, 0.75), and of these two points the global maximum occurs at (0.25, 0.75). For a comparative study, we take the starting point (p_1, p_2) = (0.8, 0.2) (which is a local maximum, not the global maximum) for GCDVSS, SQP, IP and GA. Starting from this point, the global maximum (0.25, 0.75) is reached by the GCDVSS and GA algorithms only, and the average (over 100 repetitions) times required for convergence are 0.07 and 1.52 seconds respectively. When using the IP and SQP algorithms, the starting point, being a local maximum, is returned as the final solution and hence the global maximum is not reached. We also perform a comparison between these four methods starting from 100 randomly generated points satisfying the simplex constraint. In Table 1 it is noted that, out of 100 times, GCDVSS and GA reach the global maximum every time and GCDVSS is (on average) 21 times faster than GA. SQP and IP reach the global maximum 67 and 71 times respectively out of 100.

Fig. 1  In Example (6.1): (a) value of f(p) on the unrestricted X-Y plane; (b) f(p) on the restricted 1-simplex defined by x + y = 1, x, y ≥ 0.
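A small Python sketch of this objective, written here for illustration (the Gaussian density is coded directly rather than taken from the paper's MATLAB setup, and the objective is negated so that it can be passed to a minimizer such as the gcdvss sketch of Section 2):

```python
import numpy as np

def gaussian_density(x, mu, var=0.1):
    """Bivariate normal density with mean mu and covariance var * I."""
    d = len(mu)
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return np.exp(-0.5 * diff @ diff / var) / ((2 * np.pi * var) ** (d / 2))

mu1, mu2 = np.array([0.25, 0.75]), np.array([0.8, 0.2])

def objective(p):
    """Negative of the objective in Equation (8): minimizing it maximizes
    max(8*phi(p; mu1), 5*phi(p; mu2)) on the simplex p1 + p2 = 1."""
    return -max(8 * gaussian_density(p, mu1), 5 * gaussian_density(p, mu2))

# both modes lie on the simplex p1 + p2 = 1; the global maximum is at mu1
print(objective([0.25, 0.75]), objective([0.8, 0.2]))
```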

6.2 Modified Easom function on simplex

Consider the following problem:

    maximize : cos(6πp_1) cos(6πp_2) cos(6πp_3) exp( -∑_{i=1}^3 (3πp_i - π)^2 )
    subject to : p_1, p_2, p_3 ≥ 0, p_1 + p_2 + p_3 = 1.    (9)

This function has multiple local maxima (see Figure 2), with the global maximum at p = (1/3, 1/3, 1/3), where the functional value is 1. Starting from 100 randomly generated points within the simplex domain, the results of the comparative study of all the above-mentioned algorithms are shown in Table 1. It is noted that GCDVSS is around 85 times faster than GA.
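For reference, a direct Python transcription of the objective in Equation (9), negated so that it can be minimized:

```python
import numpy as np

def neg_modified_easom(p):
    """Negative of the modified Easom objective in Equation (9); its minimum on
    the simplex {p1 + p2 + p3 = 1, p >= 0} is -1, attained at p = (1/3, 1/3, 1/3)."""
    p = np.asarray(p, dtype=float)
    value = np.prod(np.cos(6 * np.pi * p)) * np.exp(-np.sum((3 * np.pi * p - np.pi) ** 2))
    return -value

print(neg_modified_easom([1/3, 1/3, 1/3]))   # -1.0
```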

6.3 Non-linear non-convex optimization on 2-simplex

Here we consider a non-linear, non-convex function optimized over a linearly constrained space in R^2:

    maximize : sin(7πx/4) + sin(7πy/4) - 2(x - y)^2
    subject to : 3x + 2y ≤ 6, x, y ≥ 0.    (10)

In Figure 3(a), we plot a heat-map of the values of this function over the parameter space. It can easily be noted that there exist 4 local maxima, of which the global maximum occurs at (x, y) = (2/7, 2/7) = (0.2857, 0.2857), the objective function value being 2 at this point. Note that this can be considered as an optimization problem on a simplex, because any point in the feasible region lies in the convex hull generated by (0, 0), (2, 0) and (0, 3) in R^2. In Table 1, we note that GCDVSS outperforms the other algorithms in terms of the number of successful convergences, with all methods starting from 100 randomly generated starting points.

Fig. 2  Heat-map of the functional values of the modified Easom function on the 3-simplex in Example (6.2).

Fig. 3  (a) Heat-map of f(p) on the restricted X-Y plane in Example (6.3). (b) Heat-map of f(p) for n = 3 in Example (6.4).
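One way to cast Equation (10) on the unit simplex, combining the reductions of Sections 5.1 and 5.2, is sketched below in Python (the scaling and the slack coordinate are this sketch's choices, not quoted from the paper; the objective is negated for minimization):

```python
import numpy as np

def neg_objective_on_simplex(q):
    """q = (q1, q2, q3) lies on the unit 2-simplex. Setting x = 2*q1 and
    y = 3*q2 recovers the feasible region 3x + 2y <= 6, x, y >= 0 of
    Equation (10); q3 plays the role of the slack coordinate of Section 5.1."""
    x, y = 2.0 * q[0], 3.0 * q[1]
    return -(np.sin(7 * np.pi * x / 4) + np.sin(7 * np.pi * y / 4) - 2 * (x - y) ** 2)

# the global maximizer (x, y) = (2/7, 2/7) corresponds to q = (1/7, 2/21, 16/21)
q_star = np.array([1/7, 2/21, 1 - 1/7 - 2/21])
print(-neg_objective_on_simplex(q_star))   # approximately 2.0
```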

                 Example 6.1                Example 6.2                Example 6.3
Algorithms       Success(%)  Avg. Time(sec) Success(%)  Avg. Time(sec) Success(%)  Avg. Time(sec)
GCDVSS           100         0.071          100         0.016          100         0.019
SQP              67          0.014          31          0.014          44          0.025
IP               71          0.023          47          0.030          47          0.014
GA               100         1.494          100         1.354          100         1.270

Table 1  Comparison of required time and number of successful convergences for solving the problems in Examples (6.1), (6.2) and (6.3) using GCDVSS, SQP, IP and GA, starting from 100 randomly generated points.

                 n = 5               n = 10              n = 25              n = 50              n = 100
Algorithms       No. of    Avg.      No. of    Avg.      No. of    Avg.      No. of    Avg.      No. of    Avg.
                 success   time      success   time      success   time      success   time      success   time
GCDVSS           100       0.040     100       0.079     100       0.226     100       0.394     100       0.909
SQP              31        0.008     22        0.010     16        0.017     15        0.028     7         0.063
IP               30        0.022     22        0.026     15        0.052     22        0.094     26        0.183
GA               4         2.910     0         55.762    0         51.091    0         50.345    0         53.232

Table 2  Comparison of required time and number of successful convergences for solving the problem in Example (6.4) using GCDVSS, SQP, IP and GA for n = 5, 10, 25, 50, 100, starting from 100 randomly generated points in each case.

6.4 Optimization of a function with multiple local extrema at boundary points, for various dimensions

Consider the problem

    maximize : ∑_{i=1}^n i p_i^4
    subject to : p_i ≥ 0, i = 1, · · · , n, ∑_{i=1}^n p_i = 1,    (11)

where n is any positive integer. This function has local maxima at the boundary points of the simplex, but the global maximum occurs at p̂ = (p_1, · · · , p_n) with p_1 = . . . = p_{n-1} = 0 and p_n = 1, where the objective function value equals n. With increasing n, it gets harder to find the global maximum. In Figure 3(b), we plot the heat-map of f(p) for n = 3; it can be seen that this function has three local maxima, at P_1 = (1, 0, 0), P_2 = (0, 1, 0) and P_3 = (0, 0, 1), of which P_3 = (0, 0, 1) is the global maximum. For each n = 5, 10, 25, 50, 100, we perform a comparative study of the performances of all the above-mentioned algorithms, starting from 100 randomly generated points within the corresponding domain. In Table 2, we note that, unlike GA, GCDVSS works well for high-dimensional cases as well; in higher dimensions, GCDVSS outperforms GA significantly. It is also noted that the computation time required by the GCDVSS algorithm increases almost linearly with the dimension of the problem.
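A short Python transcription of the objective in Equation (11), negated for minimization and included for completeness:

```python
import numpy as np

def neg_weighted_quartic(p):
    """Negative of sum_{i=1}^n i * p_i^4; on the simplex its minimum is -n,
    attained at the vertex p = (0, ..., 0, 1)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(np.arange(1, len(p) + 1) * p ** 4)

n = 5
print(neg_weighted_quartic(np.eye(n)[-1]))   # -5.0 at the vertex e_n
```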


6.5 Transformed Ackley's Function on Simplex

Consider an unconstrained function f that needs to be minimized on a d-dimensional hypercube D^d, where D = [l, u] for some constants l, u ∈ R. Consider the bijection g : D → [0, 1/d] such that g(x_i) = y_i = (x_i - l)/(d(u - l)) for i = 1, . . . , d. Replacing the original parameters of the problem with the transformed parameters, we get

    f(x_1, . . . , x_d) = f( g^{-1}(y_1), . . . , g^{-1}(y_d) ).

Now define h : [0, 1/d]^d → R such that

    h(y) = h(y_1, . . . , y_d) = f( g^{-1}(y_1), . . . , g^{-1}(y_d) ).

Consider the set S = { (z_1, . . . , z_d) | z_i ≥ 0, ∑_{i=1}^d z_i ≤ 1 }. Clearly [0, 1/d]^d ⊂ S. Define h′ : S → R as the function h considered on the extended domain S. We have y_i ∈ [0, 1/d] for i = 1, . . . , d and 0 ≤ ∑_{i=1}^d y_i ≤ 1. Define y_{d+1} = 1 - ∑_{i=1}^d y_i. Clearly 0 ≤ y_{d+1} ≤ 1 and ∑_{i=1}^{d+1} y_i = 1. Hence we can conclude that ȳ = [y, y_{d+1}] ∈ ∆_d, where y = (y_1, . . . , y_d) and

    ∆_d = { (y_1, . . . , y_{d+1}) ∈ R^{d+1} | y_i ≥ 0, i = 1, . . . , d+1, ∑_{i=1}^{d+1} y_i = 1 }.

Now define h̄ : ∆_d → R such that h̄(ȳ) = h̄(y_1, . . . , y_{d+1}) = h′(y_1, . . . , y_d) for ȳ ∈ ∆_d. It can be seen that ȳ ∈ ∆_d implies (y_1, . . . , y_d) ∈ S. Suppose the global minimum of the function f occurs at (m_1, . . . , m_d) in D^d. Then the function h̄ has its global minimum at ȳ = ( g(m_1), . . . , g(m_d), 1 - ∑_{i=1}^d g(m_i) ) in ∆_d.

The d-dimensional Ackley's function is given by

    f(x_1, . . . , x_d) = -20 exp( -0.2 √( 0.5 ∑_{i=1}^d x_i^2 ) ) - exp( 0.5 ∑_{i=1}^d cos(2πx_i) ) + e + 20.

The domain of x = (x_1, . . . , x_d) is generally taken to be [-5, 5]^d. The global minimum of f is 0 (even if considered on R^d), which occurs at x* = (x*_1, . . . , x*_d) = (0, . . . , 0). After carrying out the above-mentioned transformation with l = -5 and u = 5, we get the transformed Ackley's function on the d-dimensional unit simplex ∆_d, given by

    h̄(ȳ) = h̄(y_1, . . . , y_{d+1}) = -20 exp( -0.2 √( 0.5 ∑_{i=1}^d (g^{-1}(y_i))^2 ) ) - exp( 0.5 ∑_{i=1}^d cos(2π g^{-1}(y_i)) ) + e + 20.

The global minimum of the transformed Ackley's function on the simplex occurs at ȳ*_{1×(d+1)} = (1/(2d), . . . , 1/(2d), 1/2), which is found by applying the transformation to x* as described above. For the comparative study, we consider the GCDVSS algorithm with three sets of tuning parameters: the default values (as mentioned in Section 2) and two additional settings, GCDVSS (pl1) and GCDVSS (pl2), where 'pl' stands for precision level. In GCDVSS (pl1) we take λ = φ = 10^-5 and in GCDVSS (pl2) we take λ = φ = 10^-7, keeping the values of the other parameters at their defaults. For each algorithm, the transformed Ackley's function has been optimized for d = 5, 10, 25, 50 and 100; in each case the objective function has been minimized starting from 100 randomly chosen points. In Table 3, the average computation time and the minimum value achieved by each algorithm are given. It is noted that for this function the GCDVSS algorithm outperforms all the other algorithms significantly. It is also observed that taking smaller values of λ and φ improves the accuracy of the solution at the cost of higher computation time. For d = 5, GCDVSS (pl1) yields a better solution than GA with a 235-fold improvement in average computation time.
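The box-to-simplex wrapping described above is easy to implement; the Python sketch below shows it for a generic box-constrained objective (a sphere function is used as a stand-in objective so that the transformed minimizer (1/(2d), . . . , 1/(2d), 1/2) is easy to verify; this is not the paper's benchmark code):

```python
import numpy as np

def to_simplex_objective(f, l, u, d):
    """Wrap a function f defined on the box [l, u]^d into a function h_bar on
    the d-dimensional unit simplex, using g(x) = (x - l) / (d * (u - l)) as in
    Section 6.5.  h_bar only uses the first d coordinates of y_bar."""
    def g_inv(y):
        return y * d * (u - l) + l          # inverse map [0, 1/d] -> [l, u]
    def h_bar(y_bar):
        return f(g_inv(np.asarray(y_bar[:d], dtype=float)))
    return h_bar

sphere = lambda x: np.sum(x ** 2)            # stand-in box-constrained objective
d = 5
h_bar = to_simplex_objective(sphere, l=-5.0, u=5.0, d=d)

# the minimizer x* = 0 of the sphere maps to (1/(2d), ..., 1/(2d), 1/2)
y_star = np.append(np.full(d, 1.0 / (2 * d)), 0.5)
print(h_bar(y_star))                         # 0.0
```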

6.6 Transformed Griewank's Function on Simplex

The d-dimensional Griewank's function is given by

    f(x_1, . . . , x_d) = (1/4000) ∑_{i=1}^d x_i^2 - ∏_{i=1}^d cos( x_i / √i ) + 1.

The domain of x = (x_1, . . . , x_d) is generally taken to be [-500, 500]^d. The global minimum of f is 0 (even if considered on R^d), which occurs at x* = (x*_1, . . . , x*_d) = (0, . . . , 0). Similarly to the previous problem, after performing the above-mentioned transformation with l = -500 and u = 500, we get the transformed Griewank's function on the d-dimensional unit simplex ∆_d, given by

    h̄(ȳ) = h̄(y_1, . . . , y_{d+1}) = (1/4000) ∑_{i=1}^d (g^{-1}(y_i))^2 - ∏_{i=1}^d cos( g^{-1}(y_i) / √i ) + 1.

As for the previous function, the transformed global minimum occurs at ȳ*_{1×(d+1)} = (1/(2d), . . . , 1/(2d), 1/2). A simulation study has been performed for this function under the same setup as Example (6.5). In this case, although SQP and IP perform better than GCDVSS (default, pl1 & pl2) and GA for the smaller-dimensional cases, GCDVSS (pl1 & pl2) performs the same as or better than the other algorithms in the high-dimensional problems. In this case also, the solution improves on taking smaller values of λ and φ. Note that for d = 5, there is more than a 338-fold improvement in average computation time on using GCDVSS (pl1) over GA, along with better accuracy.


6.7 Transformed Rastrigin's Function on Simplex

The d-dimensional Rastrigin's function is given by

    f(x_1, . . . , x_d) = 10d + ∑_{i=1}^d [ x_i^2 - 10 cos(2πx_i) ].

The domain of x = (x_1, . . . , x_d) is generally taken to be [-5, 5]^d. After transformation in the above-mentioned way, the transformed Rastrigin's function on ∆_d is given by

    h̄(ȳ) = h̄(y_1, . . . , y_{d+1}) = 10d + ∑_{i=1}^d [ (g^{-1}(y_i))^2 - 10 cos(2π g^{-1}(y_i)) ].

The global minimum of h̄ occurs at ȳ*_{1×(d+1)} = (1/(2d), . . . , 1/(2d), 1/2), which follows from the fact that the global minimum of the original form of Rastrigin's function occurs at x* = (x*_1, . . . , x*_d) = (0, . . . , 0). A comparative study of the performances of the algorithms has been carried out for this function under the same setup as Example (6.5). In Table 3 it is noted that GCDVSS outperforms the other algorithms significantly. It is noticeable that for d = 5, GCDVSS (pl2) gives a more accurate solution than GA with a 253-fold improvement in average computation time.

7 Discussion

This paper has presented a novel, efficient, derivative-free algorithm for global optimization of any objective function whose parameters lie on a simplex. Being derivative-free, this algorithm is useful when a closed form of the derivative does not exist or is expensive to evaluate. Unlike other global optimization techniques (e.g., the genetic algorithm), the number of function evaluations required by this algorithm increases only in the order of the number of parameters, which makes it faster for high-dimensional problems. Because of the way this algorithm evaluates the objective function at different sites of the sample space, parallelization can be incorporated easily, making the algorithm even faster for expensive high-dimensional objective functions; the requirement of parallelization increases only linearly with the number of parameters.

Another notable feature of this algorithm is that, unlike other global optimizers (e.g., the genetic algorithm), it works fast enough for simpler, convex optimization problems as well. It also guarantees the global solution when the function is convex, continuous and differentiable on the simplex domain.

The accuracy of the solution can be controlled by changing the values of the tuning parameters. In Examples (6.5), (6.6) and (6.7) it is noted that smaller values of the step size threshold (φ) and the sparsity threshold (λ) improve the solution accuracy at the cost of higher computation time. In case of prior knowledge of sparsity (especially on a high-dimensional simplex), increasing the sparsity threshold results in a better solution with relatively lower computation time. Under the default values of the tuning parameters (described in Section 2), it is shown that this algorithm outperforms SQP, IP and GA for optimizing various non-convex functions on the simplex. For higher-dimensional problems, setting lower values of λ and φ is recommended for improving the performance of the proposed algorithm.

Ackley's Function (transformed)
  Algorithm        d=5                 d=10                d=25                d=50                d=100
                   Min. val.  Avg. t   Min. val.  Avg. t   Min. val.  Avg. t   Min. val.  Avg. t   Min. val.  Avg. t
  GCDVSS           3.24e-02   0.099    1.16e-01   0.199    3.74e-01   0.546    9.24e-01   1.189    1.34e+00   3.685
  GCDVSS (pl1)     2.88e-04   0.163    4.91e-04   0.314    2.59e-03   0.876    5.15e-03   2.234    1.08e-02   6.277
  GCDVSS (pl2)     1.31e-06   0.186    6.45e-06   0.394    2.47e-05   1.194    4.97e-05   3.063    1.04e-04   9.307
  SQP              1.65e+00   0.032    2.01e+00   0.060    4.71e+00   0.192    1.73e+00   0.541    1.27e+00   2.861
  IP               2.32e+00   0.078    2.32e+00   0.139    8.58e+00   0.361    1.32e+01   0.865    1.49e+01   2.863
  GA               8.48e-04   38.395   3.60e+00   40.653   1.12e+01   40.097   1.44e+01   39.761   1.60e+01   46.626

Griewank's Function (transformed)
  GCDVSS           8.39e-02   0.069    6.44e-01   0.103    1.25e+00   0.322    2.81e+00   0.795    1.30e+01   2.152
  GCDVSS (pl1)     7.65e-03   0.111    8.87e-03   0.213    6.24e-03   0.599    2.60e-02   1.518    9.12e-02   5.609
  GCDVSS (pl2)     7.40e-03   0.136    7.40e-03   0.248    4.87e-07   0.808    2.64e-06   2.186    1.24e-05   6.345
  SQP              3.70e-01   0.041    8.84e-09   0.084    6.70e-08   0.267    1.24e-05   0.784    3.51e-04   3.138
  IP               2.04e-01   0.102    2.33e-09   0.174    2.47e-08   0.294    1.86e-07   0.586    1.23e-06   1.802
  GA               1.94e-02   37.578   1.23e+01   37.765   7.52e+02   38.598   3.52e+03   40.148   1.06e+04   46.527

Rastrigin's Function (transformed)
  GCDVSS           1.25e+00   0.068    2.68e+00   0.125    9.21e+00   0.300    1.60e+01   0.859    4.77e+01   2.349
  GCDVSS (pl1)     3.07e-05   0.118    3.98e+00   0.211    2.61e-03   0.640    9.97e+00   1.453    6.15e+00   4.213
  GCDVSS (pl2)     1.51e-09   0.163    3.98e+00   0.333    2.55e-07   0.929    9.95e-01   2.084    5.97e+00   6.014
  SQP              2.98e+00   0.030    2.19e+01   0.062    1.79e+02   0.220    4.05e+02   0.718    6.85e+02   3.578
  IP               9.95e+00   0.076    6.17e+01   0.138    2.01e+02   0.947    4.25e+02   15.106   8.04e+02   3.342
  GA               6.8e-06    41.271   3.24e+01   40.772   4.68e+02   40.730   1.88e+03   42.553   5.32e+03   45.797

Table 3  Comparison of the minimum value achieved and the average computation time (in seconds) for solving the transformed d-dimensional Ackley's, Griewank's and Rastrigin's functions on the simplex using GCDVSS (default, pl1 & pl2), SQP, IP and GA for d = 5, 10, 25, 50, 100, starting from 100 randomly generated points in each case.

Acknowledgements  I would like to thank Dr. Rudrodip Majumdar, Debraj Das and Suman Chakraborty for helping me edit earlier drafts of this paper and for their valuable suggestions for improvement. I would also like to acknowledge my adviser, Dr. Subhashis Ghoshal, for his valuable suggestions and for the suggested statistical problems which led me to think of this algorithm.

References

1. D. W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, Journal of the Society for Industrial and Applied Mathematics, Vol. 11, 431–441 (1963)
2. R. H. Byrd, M. E. Hribar, J. Nocedal, An interior point algorithm for large scale nonlinear programming, SIAM Journal on Optimization, Vol. 9(4), 877–900 (1999)
3. R. H. Byrd, J. C. Gilbert, J. Nocedal, A trust region method based on interior point techniques for nonlinear programming, Mathematical Programming, Vol. 89(1), 149–185 (2000)
4. T. F. Coleman, Y. Li, An interior, trust region approach for nonlinear minimization subject to bounds, SIAM Journal on Optimization, Vol. 6, 418–445 (1996)
5. F. A. Potra, S. J. Wright, Interior-point methods, Journal of Computational and Applied Mathematics, Vol. 4, 281–302 (2000)
6. N. Karmarkar, A new polynomial-time algorithm for linear programming, Combinatorica, Vol. 4, 373–395 (1984)
7. S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, 2006
8. M. H. Wright, The interior-point revolution in optimization: History, recent developments, and lasting consequences, Bulletin of the American Mathematical Society, Vol. 42, 39–56 (2005)
9. J. Nocedal, S. J. Wright, Numerical Optimization, 2nd Edition, Operations Research Series, Springer, 2006
10. P. T. Boggs, J. W. Tolle, Sequential quadratic programming, Acta Numerica, 152 (1996)
11. J. Goodner, G. A. Tsianos, Y. Li, G. Loeb, Biosearch: A physiologically plausible learning model for the sensorimotor system, Proceedings of the Society for Neuroscience Annual Meeting, 275.22/LL11 (2012)
12. A. S. Fraser, Simulation of genetic systems by automatic digital computers I. Introduction, Australian Journal of Biological Sciences, Vol. 10, 484–491 (1957)
13. A. D. Bethke, Genetic algorithms as function optimizers (1980)
14. D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Operations Research Series, Addison-Wesley Publishing Company (1989)
15. S. Kirkpatrick, C. D. Gelatt Jr., M. P. Vecchi, Optimization by simulated annealing, Science, Vol. 220(4598), 671–680 (1983)
16. V. Granville, M. Krivanek, J. P. Rasson, Simulated annealing: A proof of convergence, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, 652–656 (1994)
17. L. Geris, Computational Modeling in Tissue Engineering, Springer, 2012
18. C. C. Kerr, T. G. Smolinski, S. Dura-Bernal, D. P. Wilson, Optimization by Bayesian adaptive locally linear stochastic descent, http://thekerrlab.com/ballsd/ballsd.pdf
19. J. A. Nelder, R. Mead, A simplex method for function minimization, Computer Journal, Vol. 7, 308–313 (1965)
20. T. Steihaug, S. Suleiman, Global convergence and the Powell singular function, Journal of Global Optimization, Vol. 56(3), 845–853 (2013)
21. M. Hilbert, P. Lopez, The World's Technological Capacity to Store, Communicate, and Compute Information, Science, Vol. 332, 60–65 (2011)
