
Reducing the Number of Function Evaluations in Mesh Adaptive Direct Search Algorithms

Charles Audet∗ Andrea Ianni† Sébastien Le Digabel‡

Christophe Tribes§

October 12, 2012

Abstract: The Mesh Adaptive Direct Search (MADS) class of algorithms is designed for nonsmooth optimization, where the objective function and constraints are typically computed by launching a time-consuming computer simulation. Each iteration of a MADS algorithm attempts to improve the current best-known solution by launching the simulation at a finite number of trial points. Common implementations of MADS generate 2n trial points at each iteration, where n is the number of variables in the optimization problem. The objective of the present work is to reduce that number. We present an algorithmic framework that reduces the number of simulations to exactly n+1, without impacting the theoretical guarantees from the convergence analysis. Numerical experiments are conducted for several different contexts; the results suggest that these strategies allow the new algorithms to reach a better solution with fewer function evaluations.

Key Words: Mesh Adaptive Direct Search (MADS) algorithms, derivative-free optimization, positive spanning sets, nonsmooth optimization.

1 Introduction

Many optimization problems may be formulated as

    min_{x ∈ Ω} f(x),    (1)

∗GERAD and Département de mathématiques et génie industriel, École Polytechnique de Montréal, C.P. 6079, Succ. Centre-ville, Montréal, Québec, Canada H3C 3A7, www.gerad.ca/Charles.Audet.

†Department of Computer, Control, and Management Engineering Antonio Ruberti (La Sapienza, Università di Roma), Via Ariosto 25, 00185 Rome, Italy, https://sites.google.com/site/operationalresearchforall/

‡GERAD and Département de mathématiques et génie industriel, École Polytechnique de Montréal, C.P. 6079, Succ. Centre-ville, Montréal, Québec, Canada H3C 3A7, www.gerad.ca/Sebastien.Le.Digabel.

§École Polytechnique de Montréal, C.P. 6079, Succ. Centre-ville, Montréal, Québec, Canada H3C 3A7.


where f is a single-valued objective function, and Ω is the set of feasible solutions in R^n. The Mesh Adaptive Direct Search (MADS) class of algorithms [6] is designed for situations where f and the inequality constraints used to define the set Ω are not known analytically but are instead the result of a computer simulation. MADS belongs to the family of direct-search methods, which work directly with the function values returned by the simulation without information about the properties of the problem. There are no assumptions about the continuity or differentiability of the functions. The recent book [14] discusses the general context of derivative-free optimization.

In the 1990s, Torczon proposed the Generalized Pattern Search (GPS) class of algorithms [33] for derivative-free unconstrained optimization. This class includes algorithms such as Coordinate Search (CS), evolutionary operation [11], the original pattern search algorithm [25], and the multidirectional search algorithm [19]. These are iterative methods where each iteration attempts to improve the current best solution, called the incumbent, by launching the simulation at a finite number of trial points. The term pattern search [25] refers to the pattern made by the directions used from the incumbent to construct the set of trial points. Lewis and Torczon [28] propose the use of positive bases [18] to construct the patterns. Positive bases are not bases but minimal sets of directions whose nonnegative linear combinations span R^n. The nonsmooth convergence analysis [5] of GPS shows that the method produces a limit point that satisfies some necessary optimality conditions, and that these conditions are closely tied to the finitely many positive basis directions used to construct the patterns. The Mesh Adaptive Direct Search (MADS) algorithm in [6] has a flexible mechanism allowing stronger convergence results. In particular, they show for unconstrained optimization that if f is Lipschitz near an accumulation point, then the Clarke generalized directional derivatives [12] are nonnegative for every direction in R^n.

The constraints defining Ω are treated in [6] by the extreme barrier, which simply involves applying the MADS algorithm to the unconstrained minimization of f_Ω : R^n → R ∪ {+∞}, which takes the value f_Ω(x) := f(x) when x belongs to Ω and f_Ω(x) := +∞ otherwise. With this approach, infeasible trial points are immediately rejected from consideration. More recently, the progressive barrier was proposed [7] to treat the constraints. It uses a nonnegative function h : R^n → R that aggregates the constraint violations [20] and equals zero only at feasible points. The progressive barrier places a maximal threshold on h that is progressively reduced, and trial points whose constraint violation value exceeds the threshold are rejected from consideration.

The main element that distinguishes the CS, GPS, and MADS algorithms is the way in which the space of variables is explored around the incumbent solution. In CS, trial points are generated using the 2n positive and negative coordinate directions. In [28] and [6], positive bases are used to generate n+1 or 2n trial points for GPS and MADS. In some situations, numerical experiments show that it is better to reduce the number of evaluations at every iteration from 2n to n+1 [3, 22, 32]. The numerical results of Section 6 confirm this observation on other test sets.

The objective of the present paper is to improve the efficiency of MADS algorithms by reducing the maximal number of trial points at each iteration without impacting the quality of the solution. We devise various strategies, embedded in a generic algorithmic framework, that order the trial points in such a way that the promising points are evaluated first, and the unpromising points are discarded and replaced by a single point. A crucial element is that the proposed methods retain the hierarchical nonsmooth convergence analysis. A different approach is proposed in [1] in a context where the signs of some directional derivatives of the objective exist and are known: the set of directions is reduced to a single promising direction.

This paper is organized as follows. Section 2 gives a general overview of the MADS class of algorithms, with an emphasis on the rules that govern how the trial points are generated. Section 3 describes a first framework for the reduction of the number of trial points at a given iteration and proposes a concrete implementation to reduce the 2n ORTHOMADS [2] directions to exactly n+1. A second and more elaborate framework is then presented in Section 4, applicable to any MADS instance. This second framework uses models of the optimization problem to reduce the number of directions. Section 5 shows that the proposed frameworks constitute valid MADS instantiations, and it gives a simple algorithmic rule ensuring that the strongest convergence results of MADS hold. Finally, Section 6 illustrates the performance of the various strategies on a set of academic problems from the derivative-free optimization literature and on an engineering blackbox simulator.

2 The MADS class of algorithms

The content of this section is mainly extracted from [6], where the MADS class of algorithms is introduced.

2.1 A brief summary of MADS

MADS is a generic class of algorithms and to date two practical implementations exist. LTMADS was defined in the original MADS article [6]. It is based on random lower triangular matrices, hence the name LT. A more recent implementation, ORTHOMADS, was introduced in [2] and possesses many advantages over LTMADS: it is deterministic and uses sets of directions with a better spread, and its convergence theory is not based on a probabilistic argument as in LTMADS. In addition, numerical tests suggest that ORTHOMADS is superior to LTMADS on most problems [9].

At each iteration of these methods, we generate and compare a finite number of trial points. Each of these trial points lies on a conceptual mesh, constructed from a finite set of nD directions D ⊂ R^n scaled by a mesh size parameter ∆^m_k ∈ R_+. The subscript k denotes the iteration number. The superscript m is a label referring to the mesh and is used to distinguish it from ∆^p_k, the poll size parameter to be introduced later. For convenience, the set D is also viewed as a real n × nD matrix. The mesh is defined as follows, and it is central to the practical applications and the theoretical analysis of MADS.

Definition 2.1 At iteration k, the current mesh is defined to be the following union:

    M_k = ⋃_{x ∈ V_k} { x + ∆^m_k D z : z ∈ N^{nD} },

where V_k is the set of points where the objective function has been evaluated by the start of iteration k, and ∆^m_k > 0 is the mesh size parameter that dictates the coarseness of the mesh.

In the above definition the mesh is defined to be the union of sets over the cache V_k. Defining the mesh in this way ensures that all previously visited points trivially belong to the mesh, and that new trial points can be selected around any of them using the directions in D. To verify that a trial point x + ∆^m_k D z belongs to the mesh, it suffices to check that x belongs to the cache V_k and that z is an integer vector.

In addition to being a finite subset of R^n, the set D must satisfy two requirements:

• D must be a positive spanning set, i.e., nonnegative linear combinations of its elements must span R^n;

• Each direction d_j ∈ D, for j ∈ {1, 2, . . . , nD}, must be the product G z_j of some fixed nonsingular generating matrix G ∈ R^{n×n} by an integer vector z_j ∈ Z^n.

In the LTMADS and ORTHOMADS instantiations of MADS, this set is simply defined as D = [I −I], the 2n positive and negative standard coordinate directions.

In the situation where the union of all trial points over all iterations belongs to a bounded subset of R^n, MADS produces a limit point x that satisfies some optimality conditions that depend on the degree of smoothness of the objective and constraints near x. These optimality conditions are also tied to the directions used to generate new trial points.

Each iteration of a MADS algorithm is divided into two steps. Both of them generate a list of tentative points that lie on the mesh M_k at which the functions defining problem (1) are evaluated. The first step, the search, requires only that finitely many mesh points are evaluated. This allows users to exploit knowledge of the problem in order to propose new candidates. The second step, the poll, performs a local exploration near the current incumbent solution. An iteration is called successful when either the search or the poll generates a trial point that is better than the previous best point. Otherwise, no better solution is found and the iteration is said to be unsuccessful.

In addition to the set of mesh directions D, there are a few other parameters that are fixed throughout the algorithm. A rational number τ > 1 and two integers w− ≤ 0 and w+ ≥ 0 define how the mesh size parameter is updated. When an iteration succeeds in generating a new incumbent solution, the mesh size parameter is allowed to increase as follows:

    ∆^m_{k+1} = τ^{w_k} ∆^m_k

where w_k ∈ {0, 1, . . . , w+}; otherwise w_k ∈ {w−, w−+1, . . . , −1}. In ORTHOMADS the general rule is to multiply the mesh size parameter by four on successful iterations and to divide it by four otherwise. A detailed analysis of the rules imposed on τ and D can be found in [4].
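For illustration, a minimal Python sketch of this update rule, assuming the ORTHOMADS convention of multiplying or dividing by four; the function name and the particular choices of w_k are ours, not part of NOMAD:

    def update_mesh_size(delta_m, success, tau=4.0, w_plus=1, w_minus=-1):
        """Illustrative mesh size update: delta^m_{k+1} = tau^{w_k} * delta^m_k.

        On a successful iteration w_k is taken in {0, ..., w_plus} (here w_plus);
        otherwise it is taken in {w_minus, ..., -1} (here w_minus), which reproduces
        the ORTHOMADS rule of multiplying or dividing the mesh size by 4.
        """
        w_k = w_plus if success else w_minus
        return delta_m * tau ** w_k

    # Example: a failed iteration divides the mesh size parameter by 4.
    print(update_mesh_size(1.0, success=False))  # 0.25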

2.2 The polling directions

The MADS class of algorithms introduces the poll size parameter ∆^p_k to indicate the distance from the trial points generated by the poll step to the current incumbent solution x_k. In GPS there is a single parameter called ∆_k that represents both the poll size parameter ∆^p_k and the mesh size parameter ∆^m_k used in the definition of the mesh M_k: ∆_k = ∆^p_k = ∆^m_k.

Decoupling the mesh and poll size parameters allows MADS to explore the space of variables using a richer set of directions. In fact, the GPS poll directions are confined to a fixed finite subset of D, but in MADS these normalized directions can be asymptotically dense in the unit sphere as the number of iterations k goes to infinity. The strategy for updating ∆^p_k must be such that ∆^m_k ≤ ∆^p_k for all k, and moreover, it must satisfy

    lim_{k ∈ K} ∆^m_k = 0 if and only if lim_{k ∈ K} ∆^p_k = 0, for any infinite subset of indices K.

While the set D and the parameter ∆^m_k define the mesh M_k, the poll size parameter ∆^p_k defines the region in which the tentative poll points will lie. The set of trial points considered during the poll step is called the poll set, and it is constructed using the current incumbent solution x_k and the parameters ∆^m_k and ∆^p_k to obtain a positive spanning set of directions denoted by D_k.

Definition 2.2 At iteration k, the MADS poll set is:

    P_k = { x_k + ∆^m_k d : d ∈ D_k } ⊂ M_k,

where D_k is a positive spanning set of n_{D_k} directions such that 0 ∉ D_k and for each d ∈ D_k:

• d can be written as a nonnegative integer combination of the directions in D: d = D u for some vector u ∈ N^{nD} that may depend on the iteration number k;

• The distance from the frame center x_k to a frame point x_k + ∆^m_k d ∈ P_k is bounded by a constant times the poll size parameter: ∆^m_k ‖d‖ ≤ ∆^p_k max{‖d′‖ : d′ ∈ D};

• The limits (as defined in Coope and Price [15]) of the normalized sets D_k are positive spanning sets.

The third condition of Definition 2.2 plays an important role in the framework presented in the next section. The condition requires that the limits of D_k are positive spanning sets. This requirement precludes positive bases such as

    D_k = { (1, 1/k)^T, (1, −1/k)^T, (−1, 0)^T } → { (1, 0)^T, (−1, 0)^T }

that collapse to a set that is not a positive basis or even a basis as k goes to infinity. In our framework, we will have to ensure that after we manipulate the set of directions D_k, the resulting set still satisfies the conditions of Definition 2.2.

We can ensure that the third condition is satisfied by verifying that the limit of the cosine measure [26] exceeds a threshold κ_min > 0 for every k:

    κ(D_k) = min_{v ∈ R^n} max_{d ∈ D_k} (v^T d) / (‖v‖ ‖d‖) ≥ κ_min > 0.
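As an illustration of the cosine measure, the following Python sketch (not from the paper) approximates κ(D_k) by sampling unit vectors; it recovers a value near 1/√2 for the maximal positive basis [I −I] in R^2 and a value near zero for the collapsing set above. The function name and the Monte Carlo approach are assumptions made for this example.

    import numpy as np

    def cosine_measure(D, samples=200000, seed=0):
        """Monte Carlo approximation of kappa(D) = min_v max_d v^T d / (||v|| ||d||).

        D is an n x m array whose columns are the directions.
        """
        rng = np.random.default_rng(seed)
        n = D.shape[0]
        Dn = D / np.linalg.norm(D, axis=0)            # normalize each column
        V = rng.normal(size=(samples, n))
        V /= np.linalg.norm(V, axis=1, keepdims=True)  # random unit vectors v
        return (V @ Dn).max(axis=1).min()              # min over v of max over d

    I = np.eye(2)
    print(cosine_measure(np.hstack([I, -I])))          # close to 1/sqrt(2) ~ 0.707
    k = 100.0
    Dk = np.array([[1.0, 1.0, -1.0], [1.0 / k, -1.0 / k, 0.0]])
    print(cosine_measure(Dk))                          # close to 0 as k grows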


3 A basic framework to reduce the size of the poll set

The first part of this section presents a first simple generic framework for the reduction of the size of the poll set to n+1 points. We then give a practical implementation based on the ORTHOMADS polling strategy.

The following notation is used throughout the paper: the original MADS elements prior to the application of the reduction strategy are tagged by the superscript o. After we manipulate these elements and form the transformed poll set, the final elements are free of superscripts and are exactly as in Definition 2.2. Intermediate sets, containing a reduced set of directions or points, are tagged by the superscript r.

3.1 High-level presentation of the basic framework

In both the GPS and MADS classes of algorithms, we use the concept of positive spanning sets iteratively: each poll step starts from the current best point x_k, the incumbent solution, and attempts to identify a better point by exploring near x_k using a positive spanning set of directions. We now propose a way to reduce, in some situations, the size of the poll set.

Let D^o_k denote the finite set of directions in R^n generated at the start of a poll step. These directions, together with the mesh ∆^m_k and poll ∆^p_k size parameters, are used to construct the tentative poll set

    P^o_k = { x_k + ∆^m_k d : d ∈ D^o_k }

from Definition 2.2. The way in which the directions are generated and the way in which the mesh size parameter evolves depends on the specific class of algorithm considered. In the next subsection for example, D^o_k consists of the 2n directions composed of the positive and negative elements of the basis produced by ORTHOMADS. In Section 4, D^o_k is more general.

The trial points of P^o_k can be evaluated sequentially or in parallel. Either way, this polling procedure can either be conducted until all points of P^o_k are processed or terminated as soon as a trial point t ∈ P^o_k is shown to be better than x_k. In the latter situation, iteration k is terminated, and iteration k+1 is initiated with the new incumbent solution x_{k+1} = t. The strategy of interrupting the poll as soon as a better trial point is identified is known as the opportunistic strategy. When that strategy is used, the poll points are first sorted according to some criteria so that the most promising ones are considered first [27]. When no point of P^o_k is better than x_k, x_{k+1} is simply set to x_k and the point x_k is called a minimal poll center.

Notice that the opportunistic strategy has no effect on the algorithm at minimal poll centers, since the entire poll set must be evaluated. We propose a generic strategy to reduce the size of the poll set, thereby reducing the computational cost of detecting a minimal poll center. Figure 1 gives a simple algorithm for this.

The first step takes as input the original positive spanning set D^o_k generated by a valid MADS instance and extracts from it a basis D^r_k. Such a basis necessarily exists and may be found easily by inspecting the column rank of the submatrices.

Then, an additional direction d_k is added to the reduced set of directions D^r_k so that D_k = D^r_k ∪ {d_k} forms a minimal positive basis and x_k + ∆^m_k d_k belongs to the mesh. This may be done by simply setting d_k to be the negative sum of the directions in D^r_k.


Basic framework: Poll set reduction at iteration k

    Let P^o_k = { x_k + ∆^m_k d : d ∈ D^o_k } be the original poll set.
    Extract a basis D^r_k from the columns of D^o_k.
    Compute a new direction d_k so that D_k = D^r_k ∪ {d_k} forms a positive spanning set.
    Construct P_k = { x_k + ∆^m_k d : d ∈ D_k } (the reduced poll set).

Figure 1: First framework to reduce the poll set from P^o_k to P_k.

Finally, the resulting poll set P_k with n+1 points is processed by the poll step, and the simulation is launched opportunistically on its members.

3.2 ORTHOMADS with n+1 directions

This section describes a simple instance of the framework described above. It reduces the number of ORTHOMADS poll directions [2] from 2n to exactly n+1.

ORTHOMADS generates exactly 2n trial poll points in P^o_k that need to be processed in order to declare x_k a minimal poll center. They are generated along the maximal positive basis directions D^o_k = [H_k −H_k], where H_k ∈ Z^{n×n} is an orthogonal basis with integer coefficients. The simplest way to construct D^r_k is to set it equal to H_k. However, this strategy does not take into account the history prior to iteration k. We propose exploiting the knowledge of the previous directions that led to a successful iteration. Suppose that we are at iteration k > 0 with incumbent solution x_k, and that the previous distinct incumbent solution was x_ℓ with ℓ < k; ℓ is the index of the last successful iteration. Consider the nonzero direction w_k ∈ R^n, called the target direction, obtained by taking the difference between x_k and x_ℓ. In other words, w_k := x_k − x_ℓ is the last direction that generated a successful iteration. The rationale is that the success of direction w_k makes it a promising direction for the next iteration.

Given the nonzero vector w_k ∈ R^n, the basis D^r_k is constructed as follows. For every column d of H_k, the direction d is added to D^r_k when d and w_k are in the same half-space, and −d is added to D^r_k otherwise. This is easily done by adding d when d^T w_k ≥ 0 and −d when d^T w_k < 0. This construction ensures that D^r_k is an orthogonal basis with integer coefficients, since it contains exactly one element of each pair {d, −d} where d is a column of the orthogonal basis with integer coefficients H_k.

The construction of the minimal positive basis D_k is done by adding the negative sum of the directions of the basis D^r_k,

    d_k = − ∑_{d ∈ D^r_k} d,    (2)

and setting D_k = D^r_k ∪ {d_k}.

Figure 2 illustrates the framework on an example in R^2. The plot on the left shows the four points of the original poll set P^o_k, together with the target direction w_k. The poll directions are pruned so that D^r_k contains only the two directions in the same half-space as w_k. The plot on the right shows the three points of the reduced poll set P_k.

[Figure 2: Illustration of the ORTHOMADS n+1 polling strategy with the target direction w_k. The left plot shows the poll set P^o_k built from the 2n directions D^o_k; the right plot shows the reduced poll set P_k built from the n+1 directions D_k = D^r_k ∪ {d_k}, including the point x_k + ∆^m_k d_k. In this two-dimensional example, the strategy reduces the number of poll points from four to three, including two in the half-space defined by w_k.]
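A minimal Python sketch of the (suc,neg) construction described above, assuming the 2n ORTHOMADS directions are given as the columns of an integer orthogonal basis H_k together with their negatives; the helper name is ours and this is not NOMAD code:

    import numpy as np

    def reduce_poll_directions(H_k, w_k):
        """Prune [H_k -H_k] to n directions in the half-space of w_k,
        then complete with the negative sum as in Eq. (2)."""
        # Keep d when d^T w_k >= 0, otherwise keep -d: one direction per pair {d, -d}.
        cols = [d if d @ w_k >= 0 else -d for d in H_k.T]
        D_r = np.array(cols).T                 # n x n orthogonal basis with integer entries
        d_extra = -D_r.sum(axis=1)             # negative sum completes a minimal positive basis
        return D_r, d_extra

    # Two-dimensional example mirroring Figure 2.
    H_k = np.array([[2, 1], [-1, 2]])          # orthogonal columns with integer entries
    w_k = np.array([1.0, 0.5])                 # last successful direction
    D_r, d_extra = reduce_poll_directions(H_k, w_k)
    print(D_r)      # two columns in the same half-space as w_k
    print(d_extra)  # the (n+1)-th poll direction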

To conclude this section we introduce notation to describe this basic framework within the more general framework of the next section. The basic framework used for pruning D^o_k into D^r_k is from now on identified by MADS(suc,neg). The keyword suc refers to the successful direction, and neg indicates that the completion to a minimal positive basis is done by taking the negative sum. Three additional combinations of strategies are described in the next section.

4 A general framework to reduce the size of the poll set

The strategy described in the previous section has the advantage of being relatively simple to implement. It suffices to remember the last successful direction w_k and to complete a minimal positive basis by taking the negative sum of the directions. We now generalize this framework using information from quadratic models of the functions defining the problem, developing three other combinations of strategies. This leads to a total of four different implementations denoted by MADS(r,c), where r ∈ {suc, mod} refers to the reduction of the poll set P^o_k into P^r_k and c ∈ {neg, opt} refers to the completion into a positive spanning set.

4.1 High-level presentation of the general framework

To generalize the basic framework, we need to describe a few steps more precisely. Let D^o_k denote the initial set of valid MADS directions, and let κ_min > 0 be a valid lower bound on the cosine measure κ(−B ∪ B) for every basis B extracted from the columns of D^o_k. For example, for ORTHOMADS or Coordinate Search, κ_min takes the value 1/√n.


The framework first identifies a reduced poll set P^r_k = { x_k + ∆^m_k d : d ∈ D^r_k }, where D^r_k is a basis extracted from the columns of D^o_k, and then constructs an additional poll point x_k + ∆^m_k d_k ∈ M_k such that D^r_k ∪ {d_k} forms a minimal positive basis. A difference with the previous framework is that we must launch the simulation at the poll points in P^r_k before constructing d_k, because the information gathered from these evaluations will be used in the construction. Figure 3 gives the algorithm for the modified poll step.

Advanced framework: Poll at iteration k

    Let P^o_k = { x_k + ∆^m_k d : d ∈ D^o_k } be the original poll set.
    Extract a basis D^r_k from the columns of D^o_k.
    Evaluate opportunistically the points of P^r_k = { x_k + ∆^m_k d : d ∈ D^r_k }:
        success: interrupt the iteration;
        failure: construct an additional direction d_k and evaluate x_k + ∆^m_k d_k.

Figure 3: Description of the modified poll of the advanced framework.

4.2 Strategies to construct the reduced poll set P^r_k

In Section 3.2, the reduced poll set is constructed by setting D^r_k to the n directions generated by ORTHOMADS in the same half-space as the last direction of success w_k. When D^o_k is not generated by ORTHOMADS, D^r_k is constructed by sorting the directions of D^o_k by increasing values of the angle made with w_k and then iteratively adding the linearly independent directions to D^r_k until a basis is formed.

When a model of the optimization problem is available, we use a second strategy to construct the reduced poll set. The model might be a surrogate, i.e., a simulation that shares some similarities with the true optimization problem but is cheaper to evaluate [10]. Alternatively, it may be composed of quadratic approximations of the objective and constraints, as presented in [13] or in [17] in the unconstrained case. Regardless of the type of model, the second strategy consists of ordering the directions of D^o_k according to the model values at the tentative poll points in P^o_k. We then sort the feasible points by their objective function values. To handle the infeasible points, we use the constraint aggregation function [21] in conjunction with the progressive barrier [7]. Using these tools, we order the directions of D^o_k as proposed in [13]. Finally, we iteratively add to D^r_k the linearly independent directions of D^o_k until a basis is formed. The models used in the numerical tests of Section 6 are quadratic models.
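A possible sketch of this pruning step in Python, assuming numpy: the directions of D^o_k are sorted by a score (the angle with w_k for suc, or the model value of the corresponding poll point for mod) and linearly independent columns are collected greedily until a basis is obtained. The helper name and the scoring convention are illustrative assumptions.

    import numpy as np

    def extract_basis(D_o, scores, tol=1e-10):
        """Greedily pick n linearly independent columns of D_o in order of
        increasing score (smaller score = more promising direction)."""
        n = D_o.shape[0]
        basis = []
        for j in np.argsort(scores):
            candidate = basis + [D_o[:, j]]
            if np.linalg.matrix_rank(np.column_stack(candidate), tol=tol) == len(candidate):
                basis.append(D_o[:, j])
            if len(basis) == n:
                break
        return np.column_stack(basis)

    # 'suc' ordering: the score is the angle between each direction and w_k.
    D_o = np.array([[1.0, 0.0, -1.0, 0.0],
                    [0.0, 1.0, 0.0, -1.0]])    # 2n coordinate directions
    w_k = np.array([1.0, 1.0])
    cosines = (w_k @ D_o) / (np.linalg.norm(w_k) * np.linalg.norm(D_o, axis=0))
    D_r = extract_basis(D_o, scores=np.arccos(np.clip(cosines, -1.0, 1.0)))
    print(D_r)  # the two directions closest to w_k, here (1,0) and (0,1)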

In the numerical experiments, these two strategies will be tagged with the labels suc and mod, which stand for ordering by the angle made with the last successful direction or by the model values, respectively. Notice that both strategies can be applied in both the unconstrained case and the constrained case.

4.3 Completion to a positive basis

Having constructed the reduced poll set, we evaluate the blackbox functions defining problem (1) at the trial points in P^r_k = { x_k + ∆^m_k d : d ∈ D^r_k }. The process is opportunistic, meaning that it terminates either when a new incumbent solution is found or when it cannot find a better solution than x_k. In the latter case, we construct an additional direction d_k. To ease the presentation, let d^1, d^2, . . . , d^n denote the n directions forming the basis D^r_k. The additional direction d_k must be chosen so that

    d_k ∈ int(cone{ −d^1, −d^2, . . . , −d^n })    (3)

and the new poll candidate must belong to the mesh: x_k + ∆^m_k d_k ∈ M_k.

When constructing the positive basis, we must consider an important algorithmic aspect. Even if D^r_k ∪ {d_k} forms a positive spanning set for all values of k, the limit in the sense of [15] might collapse to a nonpositive spanning set, as illustrated by the example at the end of Section 2.2. To address this potential problem, we introduce a minimal threshold 0 < ε < 1, a scalar fixed throughout the algorithm, and we require the added direction d_k to satisfy

    d_k = − ∑_{i=1}^{n} α_i d^i    (4)

where ε < α_i ≤ 1 for i = 1, 2, . . . , n. Notice that under these conditions, the requirements of Eq. (3) are satisfied. Notice also that Eq. (4) is consistent with Eq. (2), where d_k is simply the negative sum of the directions.

The solution of a model or surrogate of the optimization problem restricted to the region

    C_ε = { x_k − ∆^m_k ∑_{i=1}^{n} α_i d^i : α_i ∈ [ε, 1], i = 1, 2, . . . , n }

is needed to generate a trial point y_k ∈ C_ε. We describe in the next subsection a way to perform this suboptimization using quadratic models.

Finally, it is unlikely that the resulting candidate y_k belongs to the mesh. The last step consists of rounding y_k to some point x_k + ∆^m_k d_k on the mesh M_k. In the LTMADS or ORTHOMADS framework, where the mesh is constructed from the positive and negative coordinate directions, it suffices to set the jth coordinate of d_k to

    (d_k)_j = ⌈v_j⌉ if v_j ≤ −∑_{i=1}^{n} d^i_j,
    (d_k)_j = ⌊v_j⌋ if v_j > −∑_{i=1}^{n} d^i_j,

where ⌈·⌉ and ⌊·⌋ are the ceiling and floor operators respectively and v_k ∈ R^n satisfies the equality y_k = x_k + ∆^m_k v_k. This approach rounds the trial point toward x_k − ∆^m_k ∑_{i=1}^{n} d^i and ensures that D^r_k ∪ {d_k} forms a positive spanning set (these assertions are formally demonstrated in Section 5).
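A sketch of this rounding step for the coordinate-direction mesh, assuming numpy; the variable names follow the text (v = (y_k − x_k)/∆^m_k and the basis columns d^i), and the function itself is an illustrative assumption rather than the implementation used in NOMAD:

    import numpy as np

    def round_to_mesh(y_k, x_k, delta_m, D_r):
        """Round the candidate y_k toward x_k - delta_m * sum_i d^i so that the
        resulting integer direction d_k keeps D_r plus {d_k} a positive spanning set."""
        v = (y_k - x_k) / delta_m                  # y_k = x_k + delta_m * v
        threshold = -D_r.sum(axis=1)               # -sum_i d^i, coordinate by coordinate
        d_k = np.where(v <= threshold, np.ceil(v), np.floor(v))
        return d_k, x_k + delta_m * d_k            # integer direction and mesh point

    x_k = np.zeros(2)
    D_r = np.array([[2.0, 1.0], [-1.0, 2.0]])
    y_k = np.array([-1.3, -0.6])                   # candidate from the model suboptimization
    d_k, trial = round_to_mesh(y_k, x_k, delta_m=0.5, D_r=D_r)
    print(d_k, trial)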


This strategy of completion to a positive basis via a suboptimization is called the opt strategy.

4.4 Completion using quadratic models

This section gives the technical details of the construction of the candidate y_k ∈ C_ε generated by considering quadratic models of the objective and constraints.

We build one quadratic model for the objective function and one for each constraint. More precisely, the feasible region Ω defined in (1) is described as the following set of inequality constraints:

    Ω = { x ∈ R^n : c_j(x) ≤ 0 for all j ∈ J }

with J = {1, 2, . . . , nJ} and c_j : R^n → R ∪ {∞}, j ∈ J. The infinity value is used for trial points where at least one function failed to evaluate in practice, due to some hidden constraint embedded in the simulator associated with this function.

First, we collect data points where the function values are available and finite. These points form the data set Y ⊂ R^n and are taken within a neighborhood of the poll center:

    Y = { y ∈ V_k : ‖y − x_k‖_∞ ≤ ρ∆^p_k, f(y) < ∞ and c_j(y) < ∞ for all j ∈ J }

where the parameter ρ ≥ 2 is called the radius factor and is typically set to 2 as in [13]. The constraint ρ ≥ 2 ensures that the recently evaluated poll points of P^r_k belong to Y. Furthermore, this choice of ρ also ensures that the previously visited trial points in the region C_ε are contained in Y. Note that cache points from previous iterations may also be found in Y, which enriches the models.
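A small sketch of this selection, assuming each cache entry stores a point together with its (possibly infinite) objective and constraint values; the data layout and function name are assumptions made for illustration:

    import numpy as np

    def build_data_set(cache, x_k, delta_p, rho=2.0):
        """Keep cached points within an infinity-norm ball of radius rho * delta_p
        around the poll center whose objective and constraint values are all finite."""
        Y = []
        for point, f_val, c_vals in cache:     # c_vals: list of constraint values
            close = np.linalg.norm(point - x_k, ord=np.inf) <= rho * delta_p
            finite = np.isfinite(f_val) and all(np.isfinite(c) for c in c_vals)
            if close and finite:
                Y.append(point)
        return Y

    cache = [(np.array([0.1, 0.0]), 1.0, [0.0]),
             (np.array([5.0, 5.0]), 2.0, [0.0]),     # too far from x_k
             (np.array([0.0, 0.2]), np.inf, [0.0])]  # failed evaluation
    print(build_data_set(cache, x_k=np.zeros(2), delta_p=0.5))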

Consider the nonsingular linear transformation T : R^n → R^n that maps the region C_ε to the unit hypercube [0,1]^n, as illustrated in Fig. 4. The motivation for this transformation is to replace linear constraints by simple bounds to construct the model optimization problem (5). For y ∈ R^n and λ ∈ R^n the expressions for T(y) and its inverse T^{-1}(λ) are:

    T(y) = (D^r_k)^{-1} (x_k − y − ε∆^m_k D^r_k 1) / (∆^m_k (1 − ε))   and   T^{-1}(λ) = x_k + ∆^m_k D^r_k ((ε − 1)λ − ε1).

Indeed, it can readily be verified that T(x_k − ∆^m_k D^r_k 1) = 1, T^{-1}(1) = x_k − ∆^m_k D^r_k 1, and that T(x_k − ε∆^m_k D^r_k 1) = 0, T^{-1}(0) = x_k − ε∆^m_k D^r_k 1.

The shaded area on the left of Fig. 4 represents the set C_ε, and the open circle represents the candidate y_k.
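A sketch of T and T^{-1} under the formulas above, assuming numpy; it also checks the two identities stated in the text on an arbitrary basis D^r_k:

    import numpy as np

    def T(y, x_k, delta_m, D_r, eps):
        """Map y in C_eps to lambda in [0,1]^n."""
        rhs = x_k - y - eps * delta_m * D_r @ np.ones(len(x_k))
        return np.linalg.solve(D_r, rhs) / (delta_m * (1.0 - eps))

    def T_inv(lam, x_k, delta_m, D_r, eps):
        """Map lambda in [0,1]^n back to the original space."""
        ones = np.ones(len(x_k))
        return x_k + delta_m * D_r @ ((eps - 1.0) * lam - eps * ones)

    x_k = np.array([1.0, -2.0])
    D_r = np.array([[2.0, 1.0], [-1.0, 2.0]])
    delta_m, eps, ones = 0.5, 0.01, np.ones(2)

    # T(x_k - delta_m * D_r @ 1) = 1 and T(x_k - eps * delta_m * D_r @ 1) = 0.
    print(T(x_k - delta_m * D_r @ ones, x_k, delta_m, D_r, eps))        # ~ [1, 1]
    print(T(x_k - eps * delta_m * D_r @ ones, x_k, delta_m, D_r, eps))  # ~ [0, 0]
    print(T_inv(ones, x_k, delta_m, D_r, eps))                          # x_k - delta_m * D_r @ 1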

The next step consists of building the nJ + 1 models by considering the scaled points of [0,1]^n. Depending on the size nY of the data set Y, there are two possible strategies. First, if nY < (n+1)(n+2)/2, which is more likely to happen, we consider minimum Frobenius norm models; otherwise we use least-squares regression. See [13] for the computational details.

Let m_f be the model constructed from f and m_{c_j} the model associated with c_j for all j ∈ J. We expect these models to be good representations of the original functions in the zone of interest:

    m_f(T(x)) ≈ f(x)   and   m_{c_j}(T(x)) ≈ c_j(x)   for all x ∈ C_ε.

[Figure 4: The T transformation applied to C_ε gives the unit hypercube: the left part shows C_ε around the poll center x_k, the poll points x_k + ∆^m_k d^1 and x_k + ∆^m_k d^2, the candidate y_k, and the rounded point x_k + ∆^m_k d_k; the right part shows the image T(C_ε) = [0,1]^n.]

In Fig. 4, the shaded region represents C_ε. The solid outline on the left represents the region in which ‖y − x_k‖ ≤ ρ∆^p_k with a value of ρ = 2. The eight points of Y represented by bullets are used to construct the models. The right part of the figure represents the hypercube on which the following quadratic model problem is minimized:

    min_{λ ∈ [0,1]^n} m_f(λ) subject to m_{c_j}(λ) ≤ 0 for all j ∈ J.    (5)

Any method, heuristic or otherwise, can be applied to solve Problem (5) since the convergence of the framework does not rely on the quality of this optimization. However, in practice better solutions should improve the overall quality of the method. Currently, and similarly to [13], we use the MADS algorithm for the sake of simplicity. Future work will include the replacement of MADS by a dedicated bound-constrained quadratic solver.

The point obtained when solving Problem (5) is denoted λ_k ∈ [0,1]^n and can be feasible or infeasible with respect to the model constraints. Regardless of feasibility, the solution is transformed into the original space via the inverse transformation: set y_k = T^{-1}(λ_k) ∈ C_ε.
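Since any method may be used for Problem (5), the following sketch substitutes a generic penalized bound-constrained solve with scipy for the MADS subsolve used in the paper; the penalty approach, the toy models, and the function names are assumptions made for illustration:

    import numpy as np
    from scipy.optimize import minimize

    def solve_model_problem(m_f, m_c_list, n, penalty=1e3):
        """Minimize the objective model on [0,1]^n with the constraint models
        handled by a simple quadratic penalty; any solver could replace this."""
        def penalized(lam):
            violation = sum(max(0.0, m_c(lam)) ** 2 for m_c in m_c_list)
            return m_f(lam) + penalty * violation
        res = minimize(penalized, x0=0.5 * np.ones(n), method="L-BFGS-B",
                       bounds=[(0.0, 1.0)] * n)
        return res.x                              # lambda_k in [0,1]^n

    # Toy quadratic models in the scaled space; afterwards y_k = T^{-1}(lambda_k).
    m_f = lambda lam: (lam[0] - 0.8) ** 2 + (lam[1] - 0.3) ** 2
    m_c = lambda lam: lam[0] + lam[1] - 1.0       # model constraint m_c <= 0
    lam_k = solve_model_problem(m_f, [m_c], n=2)
    print(lam_k)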

5 Convergence analysis of the general framework

We now show that the general framework is a valid MADS instance. The analysis does not depend on the order in which the poll points are evaluated, and it therefore holds for the basic framework of Section 3. Next, we give a detailed example in which the set of normalized refining directions does not grow asymptotically dense in the unit sphere. To circumvent this undesirable behavior, we add a rule to decide whether or not the polling reduction should be applied.

5.1 A valid MADS instance

To show that the general framework produces a valid MADS instance, we must prove that the conditions of Definition 2.2 are satisfied. To achieve this, we must redefine the set D of directions used to construct the polling directions to take into account the fact that the direction d_k produced by Eq. (4) lies in the cone generated by the negative of d^1, d^2, . . . , d^n.

Let D^o be the original set used to construct the set D^o_k at every iteration, and consider the direction with the largest norm: d_max ∈ argmax{‖d′‖ : d′ ∈ D^o}. Now, replace D^o by

    D = −D^o ∪ D^o ∪ {n d_max}

as the new finite set of directions. The addition of −D^o ensures that the added direction d_k from Eq. (4) belongs to the cone of negative directions. It also ensures that the strategy that rounds y_k generates a mesh point successfully, because C_ε contains at least one mesh point, namely x_k − ∆^m_k ∑_{i=1}^{n} d^i. The addition of n d_max does not introduce any point into the mesh. It simply ensures that d_k belongs to the poll frame and increases the maximal norm used in the second condition of Definition 2.2, thereby allowing poll points to lie further from the poll center. Figure 2 illustrates the fact that the norm of the added direction d_k may exceed that of the directions in the original set D^o_k.

The following proof is independent of the construction of the reduced poll set P^r_k and of the method used for the completion to a positive spanning set, provided D^r_k is a basis extracted from D^o_k and d_k satisfies Eq. (4).

Lemma 5.1 If D^r_k = {d^1, d^2, . . . , d^n} is a basis of R^n extracted from the columns of D^o_k and d_k = −∑_{i=1}^{n} α_i d^i with 0 < ε ≤ α_i ≤ 1 for i ∈ {1, 2, . . . , n}, then D_k = D^r_k ∪ {d_k} is a minimal positive basis and

    κ(D_k) ≥ (ε/n) κ(−D^r_k ∪ D^r_k).

Proof. Let D^r_k and D_k satisfy the conditions in the statement. To show that D_k is a positive spanning set, we let v be a nonzero vector in R^n such that v^T d^i ≤ 0 for i = 1, 2, . . . , n. Then, since D^r_k is a basis and v ≠ 0, v^T d^i < 0 for at least one index i. Therefore, v^T d_k = −∑_{i=1}^{n} α_i v^T d^i ≥ −ε ∑_{i=1}^{n} v^T d^i > 0 and consequently v is in the same half-space as d_k. Since D_k contains exactly n+1 elements, it follows that it is a minimal positive basis [18].

The cosine measure is invariant with respect to the length of the vectors, since the vectors are normalized, so let us introduce λ = d_k/‖d_k‖ and δ^i = d^i/‖d^i‖ for i = 1, 2, . . . , n. The cosine measure can be obtained by solving the following optimization problem:

    κ(D_k) = min_{t ∈ R, v ∈ R^n} t
             s.t. t ≥ v^T δ^i, i = 1, 2, . . . , n,
                  t ≥ v^T λ,
                  v^T v = 1.


There exists an optimal solution (t, v) such that n of the inequality constraints are satisfied at equality and t = κ(D_k). Two cases must be considered.

Case 1. If v^T δ^i = κ(D_k) for every index i = 1, 2, . . . , n, then κ(D_k) = κ(−D^r_k ∪ D^r_k) ≥ (ε/n) κ(−D^r_k ∪ D^r_k).

Case 2. Otherwise, renaming the indices if necessary, v^T δ^i = κ(D_k) for every index i = 1, 2, . . . , n−1 and v^T λ = κ(D_k). By the definition of λ and Eq. (4), we have

    κ(D_k) = v^T λ = −∑_{i=1}^{n} α_i v^T δ^i = −( κ(D_k) ∑_{i=1}^{n−1} α_i + α_n v^T δ^n ) ≥ −κ(D_k)(n−1) + α_n v^T(−δ^n).

Observe that v^T(−δ^n) ≥ 0, since otherwise all directions of the positive basis D^r_k ∪ {d_k} would lie in the same half-space, which is impossible. Reordering the terms, dividing by n, and using the fact that α_n ≥ ε yields κ(D_k) ≥ (ε/n) v^T(−δ^n).

Next, consider the cosine measure of the maximal positive basis −D^r_k ∪ D^r_k:

    κ(−D^r_k ∪ D^r_k) ≤ max_{i=1,2,...,n} |v^T δ^i| = max{ κ(D_k), v^T(−δ^n) }.

It follows that either κ(D_k) ≥ κ(−D^r_k ∪ D^r_k) or v^T(−δ^n) ≥ κ(−D^r_k ∪ D^r_k). In both cases the inequality κ(D_k) ≥ (ε/n) κ(−D^r_k ∪ D^r_k) ≥ (ε/n) κ_min holds.

The following theorem ensures that this strategy yields a valid MADS instance by showing that the conditions of Definition 2.2 are satisfied.

Theorem 5.2 Let D^o_k be the original spanning set. The poll set formed by the directions in D_k = D^r_k ∪ {d_k}, where D^r_k is a basis formed of n columns of D^o_k and x_k + ∆^m_k d_k belongs to M_k ∩ C_ε, yields a valid MADS instance.

Proof. Since D^o_k is a valid set of polling directions and x_k + ∆^m_k d_k belongs to the mesh, x_k + ∆^m_k d ∈ M_k for every d ∈ D_k and the first condition of Definition 2.2 is satisfied.

Since D^r_k ⊂ D^o_k and D^o ⊂ D, the second condition is trivially satisfied for every direction d ∈ D^r_k. For the additional direction,

    ∆^m_k ‖d_k‖ ≤ ∆^m_k ∑_{i=1}^{n} α_i ‖d^i‖ ≤ n ∆^p_k ‖d_max‖ = ∆^p_k max{‖d′‖ : d′ ∈ D}

since α_i ≤ 1, ∆^m_k ≤ ∆^p_k, and n d_max is the element of D with the largest norm. This shows that the second condition of the definition is satisfied.


The previous lemma showed that D_k is a minimal positive basis and that its cosine measure is bounded below by the strictly positive value (ε/n) κ_min. This ensures that the third condition of the definition is satisfied.

All the conditions of Definition 2.2 are satisfied, and therefore this strategy defines a valid instance of MADS.

5.2 An example that does not cover all directions

Satisfying the requirements of Definition 2.2 is not sufficient to ensure that the set of normalized polling directions grows asymptotically dense in the unit sphere. Indeed, the Coordinate Search and GPS are both instantiations of MADS, but the set of polling directions is limited to a fixed finite number.

We now give an example for which reducing the size of the poll set at every iteration does not produce a dense set of directions. In the next subsection, we propose a slight modification of the method that guarantees density in the unit sphere. The following example illustrates this issue.

Example 5.3 Consider the unconstrained minimization of the continuous piecewise linear function f : R^2 → R defined as

    f(a, b) = max{ a, min{ −a + b, −a − b } }

whose graph is plotted on the left of Fig. 5 and whose level sets are represented on the right of the figure.

Figure 5: Graphical representation and level sets of a continuous piecewise linear function.

Now, suppose that the initial point is x_0 = (−1, 0)^T with f(x_0) = 1, and that the first trial point proposed by a MADS instance is the origin. The iteration is successful and terminates at x_1 = (0, 0)^T with f(x_1) = 0. The direction of success is w_1 = (1, 0)^T.


Now consider any iteration k ≥ 1 with x_k = (0, 0)^T and let D^o_k = [H_k −H_k] be a maximal positive basis obtained using an orthogonal basis H_k. Construct the basis D^r_k by taking directions in D^o_k that are in the same half-space as the target direction w_k. D^r_k contains exactly two directions in the half-space V = { v = (v_1, v_2) ∈ R^2 : v_1 ≥ 0 }, and any trial point generated in that subspace, x_k + ∆^m_k v = ∆^m_k v, will have a nonnegative objective function value equal to ∆^m_k v_1.

Therefore, the additional direction d_k will need to be computed. But since D^r_k forms an orthogonal basis, the direction d_k will necessarily belong to the cone W = { v ∈ R^2 : v_1 < 0, |v_1| ≥ |v_2| }. However, any trial point generated in that cone also possesses a nonnegative objective function value because f(v) = −v_1 − |v_2| = |v_1| − |v_2| ≥ 0 for every v ∈ W. It follows that iteration k ≥ 1 is unsuccessful and x_k = (0, 0)^T for every k ≥ 1.

In this example, even if the sets of normalized directions of D^o_k generated by a valid MADS instance grow asymptotically dense in the unit sphere, the normalized sets of polling directions D_k = D^r_k ∪ {d_k} are never generated in the full-dimensional cone { v ∈ R^2 : v_1 < 0, |v_2| > |v_1| }.

5.3 Asymptotically dense normalized polling directions

The previous example shows that the poll reduction strategy cannot be systematically applied at every iteration. A similar difficulty was encountered in the development of the ORTHOMADS 2n algorithm (see the management of the index t_k in Section 3.4 of [2]). The situation was handled by making different algorithmic decisions based on whether or not the current poll size parameter is the smallest so far, i.e., if ∆^p_k ≤ ∆^p_j for every integer j ≤ k. The same treatment is applied to the present context.

To formalize the presentation, we give the definition of a refining direction for a MADS algorithm.

algorithm.

Definition 5.4 (from [6]) A subsequence of the MADS iterates consisting of minimal framecenters (i.e., unsuccessful iterations) xkk∈K for some subset of indices K is said to be arefining subsequence if ∆p

kk∈K converges to zero.Let x be the limit of a convergent refining subsequence. If the limit limk∈L

dk‖dk‖ exists for

some subset L⊆K with poll direction dk ∈Dk, and if xk +∆mk dk ∈Ω for infinitely many k ∈ L,

then this limit is said to be a refining direction for x.

Theorem 5.5 Suppose that the strategy for generating the set of original polling directions {D^o_k}_{k=0}^{∞} is rich enough that the set of normalized refining directions grows asymptotically dense in the unit sphere. At iteration k, define the poll set of directions:

    D_k = D^o_k                if ∆^p_k ≤ ∆^p_j for every integer j ≤ k,
    D_k = D^r_k ∪ {d_k}        otherwise.

Then the set of refining directions with D_k grows asymptotically dense in the unit sphere.

Proof. Consider the subset of indices of unsuccessful iterations

    U = {k_1, k_2, . . .} = { k : iteration k is unsuccessful and ∆^p_k ≤ ∆^p_j for all j = 0, 1, . . . , k }.

This subset is infinite because liminf_k ∆^m_k = 0 for any valid MADS instance. The mesh size is reduced only at unsuccessful iterations, and therefore there exists a refining subsequence with indices in U. However, the construction of the poll set is such that at all iterations in U, the set of poll directions is D_k = D^o_k. Therefore, the normalized directions are constructed with elements of D^o_k, which grow dense by assumption.

The previous result ensures that the proposed method inherits the convergence results of MADS. More precisely, let x be a feasible limit of a refining subsequence generated by an ORTHOMADS instantiation that reduces to n+1 polling directions, as prescribed by the previous theorem. Then, the analyses of [6, 7] ensure that the Clarke directional derivatives are nonnegative for every direction in the Clarke tangent cone, provided f is Lipschitz near x and the hypertangent cone at x is not empty. For directionally Lipschitz functions, the Rockafellar generalized directional derivatives are nonnegative along the refining directions [34].
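A bookkeeping sketch of the switching rule of Theorem 5.5, with illustrative names: the full set D^o_k is polled whenever the current poll size parameter is the smallest encountered so far, and the reduced set D^r_k ∪ {d_k} otherwise.

    def use_full_poll_set(delta_p_history, delta_p_k):
        """Return True when the 2n set D^o_k must be used (Theorem 5.5),
        i.e., when delta^p_k is the smallest poll size parameter so far."""
        return all(delta_p_k <= dp for dp in delta_p_history)

    history = [1.0, 0.25, 0.25, 0.0625]
    print(use_full_poll_set(history, 0.0625))  # True: poll with the full D^o_k
    print(use_full_poll_set(history, 0.25))    # False: reduced poll D^r_k + {d_k}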

6 Numerical results

The numerical tests are conducted using the NOMAD [27] software publicly available at http://www.gerad.ca/nomad. The tests compare the performance of the new frameworks with the default version. All tests are conducted with the ORTHOMADS strategy: the original set of directions D^o_k at iteration k is the orthogonal maximal positive basis introduced in [2]. In our implementation, the default value of the parameter ε from Section 4.3 is 1%.

The default version of the algorithm is denoted MADS 2n, and the basic framework presented in Section 3.2 is denoted MADS(suc,neg), where suc stands for a successful direction as the target direction, and neg for the negative sum of the directions used for the positive basis completion.

The combination of strategies where P^r_k is obtained not by considering the target direction but by ordering the model values is denoted MADS(mod,neg). The keyword mod stands for model. The remaining two combinations are where the completion to a positive basis is done by optimizing a model as discussed in Sections 4.3 and 4.4. They are denoted MADS(suc,opt) and MADS(mod,opt), where opt stands for the optimization of a model. The different labels for reducing the size of the poll set are listed in Table 1.

Construction of the pruned poll set
    suc    w_k is the last direction leading to a successful iteration
    mod    P^r_k is composed of the poll points with the best model values
Positive basis completion
    neg    d_k is the negative sum of the basis directions
    opt    x_k + ∆^m_k d_k is obtained by optimizing the model over C_ε

Table 1: Label descriptions for reducing the poll set.

In the numerical tests, the models are always quadratic, and to make the comparisons more reliable, we have deactivated the model searches described in [13].


6.1 Test problems from the derivative-free optimization literature

We test two series of problems from the derivative-free optimization literature. The algorithms are compared using data profiles as described in [31]. Data profiles are used to display the fraction of problems solved for a given tolerance depending on the progression of the algorithm. Here, the relative tolerance for matching the best known solution is fixed at 10^-3, and the progression is represented by the equivalent number of simplex gradient evaluations, i.e., the number of groups of (n+1) calls to the simulation.

The first set contains 159 analytical problems as described in [31]. The number of variables ranges from 2 to 12, and the problems have no constraints except for bounds on the variables in some cases. The noisy problems with nondeterministic characteristics from [31] are not considered to ensure the repeatability of the tests. In Fig. 6, the data profiles for the four strategies become more distinct as the number of function evaluations grows larger than 100 × (n+1) evaluations. A first observation is that the MADS 2n strategy is outperformed by all four strategies that reduce the size of the poll set. A second observation is that the best two strategies are those using the optimization of the quadratic model to complete the positive basis, MADS(·,opt). Finally, the data profiles reveal that the MADS(suc,·) and MADS(mod,·) strategies have similar performance.

[Figure 6: Data profiles with a relative tolerance of 10^-3 for 159 problems from [31]. Horizontal axis: groups of n+1 evaluations; vertical axis: fraction of problems solved (%). Curves: MADS 2n, MADS(suc,neg), MADS(suc,opt), MADS(mod,neg), and MADS(mod,opt).]


The second set contains 32 problems studied in [2, 13] with at most 12 variables. Of the 32 problems, 7 are constrained and 15 are nonsmooth (see the description in Table 2). Figure 7 shows the data profiles obtained for the different strategies. Again, the two best strategies are those that optimize a quadratic model to complete the positive basis, MADS(·,opt). In the presence of constraints, MADS(mod,·) is more efficient than MADS(suc,·). A possible explanation for this behavior is that in contrast to the quadratic models, the direction of success does not systematically account for the constraints, and the set P^r_k is more likely to contain points outside the feasible region.

[Figure 7: Data profiles with a relative tolerance of 10^-3 for 32 test problems from the literature. Horizontal axis: groups of n+1 evaluations; vertical axis: fraction of problems solved (%). Curves: MADS 2n, MADS(suc,neg), MADS(suc,opt), MADS(mod,neg), and MADS(mod,opt).]


 #   Name        Source  n   nJ  Bnds  Smth  f*
 1   ARWHEAD     [23]    10  0   no    yes   0.0
 2   BDQRTIC     [23]    10  0   no    yes   18.2812
 3   BIGGS6      [23]    6   0   no    yes   6.97074·10^-5
 4   BRANIN      [24]    2   0   yes   yes   0.397887
 5   BROWNAL     [23]    10  0   no    yes   0.0
 6   CRESCENT10  [7]     10  2   no    yes   -9.0
 7   DIFF2       [13]    2   0   yes   no    2·10^-4
 8   DISK10      [7]     10  1   no    yes   -17.3205
 9   ELATTAR     [29]    6   0   no    no    0.561139
10   EVD61       [29]    6   0   no    no    3.51212·10^-2
11   FILTER      [29]    9   0   no    no    8.40648·10^-3
12   G2          [8]     10  2   yes   no    -0.740466
13   GRIEWANK    [24]    10  0   yes   yes   0.0
14   HS78        [29]    5   0   no    no    -2.49111
15   HS114       [29]    9   6   yes   no    -1429.34
16   MAD6        [29]    5   7   no    no    0.101831
17   OSBORNE2    [29]    11  0   no    no    9.43876·10^-2
18   PBC1        [29]    5   0   no    no    8.90604·10^-2
19   PENALTY1    [23]    10  0   no    yes   7.08765·10^-5
20   PENALTY2    [23]    10  0   no    yes   2.95665·10^-4
21   PENTAGON    [29]    6   15  no    no    -1.85962
22   POLAK2      [29]    10  0   no    no    54.5982
23   POWELLSG    [23]    12  0   no    yes   0.0
24   RASTRIGIN   [24]    2   0   yes   yes   0.0
25   SHOR        [29]    5   0   no    no    22.6023
26   SNAKE       [7]     2   2   no    yes   0.0
27   SROSENBR    [23]    10  0   no    yes   0.0
28   TRIDIA      [23]    10  0   no    yes   0.0
29   VARDIM      [23]    10  0   no    yes   0.0
30   WONG1       [29]    7   0   no    no    680.707
31   WONG2       [29]    10  0   no    no    24.9458
32   WOODS       [23]    12  0   no    yes   0.0

Table 2: Description of the set of 32 analytical problems. Those for which nJ > 0 have constraints other than bounds. The column Bnds indicates whether a problem has bound constraints, the column Smth indicates whether the problem is smooth, and the column f* gives the best known solution.


6.2 A pump-and-treat groundwater remediation problem

This section describes an application introduced in [30]: a pump-and-treat groundwater remediation problem from the Lockwood Solvent Groundwater Plume Site located in Montana. In [30], several algorithms are compared, with the empirical conclusion that direct-search methods are among the most promising for this problem.

The basic version of the problem considered here is to determine extraction rates for six wells whose locations are fixed. These rates (in feet/day) are continuous and box-constrained in [0; 20,000], and our starting point fixes them to 10,000. The function to minimize represents the operating costs subject to two simulation-based constraints that capture the flux of two contaminant plumes. These two constraints depend on the outputs from the Bluebird simulator [16]. There are no hidden constraints, and a typical evaluation takes approximately two seconds. From now on we refer to this problem as the LOCKWOOD problem.

Figure 8 shows the progress of the best feasible objective function value versus the number of calls to the simulation for a budget of 1000 function evaluations. For clarity, the curves representing MADS(mod,neg) and MADS(mod,opt) are not plotted because they are practically identical to MADS(suc,neg) and MADS(suc,opt), respectively.

[Figure 8: Objective function value (×10^4) versus the number of calls to the simulation on the LOCKWOOD problem. Curves: MADS 2n, MADS(suc,neg), and MADS(suc,opt).]


The MADS 2n algorithm gets stuck at local solutions, and the runs in which the number of polling directions is reduced to n+1 all have similar behavior and converge rapidly to a much better solution.

7 Discussion

The MADS algorithm is composed of two main steps: the global search and the local poll. We have focused on reducing the number of poll points. In previous instantiations of MADS, the poll set was constructed by considering the 2n directions of a maximal positive basis. We have proposed four combinations of strategies to reduce that number to n+1, which is the minimal number required for the theory to hold. The reduction is applied at every iteration where the mesh is not the finest so far.

The next release of the NOMAD software will allow the reduction of the size of the poll set as we have described. To make the software package easily usable by a wide community, we try to limit the number of user-defined parameters. Numerical experiments in which the value of ε was varied led to minor changes in the solutions. Therefore, we chose to fix ε to 1%.

Guided by our numerical results, we have set the default strategy for generating the polling directions in NOMAD to MADS(suc,neg) when the user does not use the option to build quadratic models and when no surrogates are used; to MADS(mod,opt) when quadratic models are used; and to MADS(mod,neg) when a surrogate optimization problem is supplied. These options are enabled in NOMAD by setting the DIRECTION TYPE parameter to ORTHO N+1 and may be overruled by setting it to, e.g., ORTHO N+1 SUC OPT.

Acknowledgements

The authors wish to thank Shawn Matott, Genetha Gray, and Stefan Wild for making the LOCKWOOD problem available.

References

[1] M.A. Abramson, C. Audet, and J.E. Dennis, Jr. Generalized pattern searches with derivative information. Mathematical Programming, Series B, 100:3–25, 2004.

[2] M.A. Abramson, C. Audet, J.E. Dennis, Jr., and S. Le Digabel. OrthoMADS: A deterministic MADS instance with orthogonal directions. SIAM Journal on Optimization, 20(2):948–966, 2009.

[3] P. Alberto, F. Nogueira, H. Rocha, and L.N. Vicente. Pattern search methods for user-provided points: Application to molecular geometry problems. SIAM Journal on Optimization, 14(4):1216–1236, 2004.

[4] C. Audet. Convergence results for pattern search algorithms are tight. Optimization and Engineering, 5(2):101–122, 2004.


[5] C. Audet and J.E. Dennis, Jr. Analysis of generalized pattern searches. SIAM Journal on Optimization, 13(3):889–903, 2003.

[6] C. Audet and J.E. Dennis, Jr. Mesh adaptive direct search algorithms for constrained optimization. SIAM Journal on Optimization, 17(1):188–217, 2006.

[7] C. Audet and J.E. Dennis, Jr. A progressive barrier for derivative-free nonlinear programming. SIAM Journal on Optimization, 20(4):445–472, 2009.

[8] C. Audet, J.E. Dennis, Jr., and S. Le Digabel. Parallel space decomposition of the mesh adaptive direct search algorithm. SIAM Journal on Optimization, 19(3):1150–1170, 2008.

[9] C. Audet, J.E. Dennis, Jr., and S. Le Digabel. Globalization strategies for mesh adaptive direct search. Computational Optimization and Applications, 46(2):193–215, 2010.

[10] A.J. Booker, J.E. Dennis, Jr., P.D. Frank, D.B. Serafini, V. Torczon, and M.W. Trosset. A rigorous framework for optimization of expensive functions by surrogates. Structural and Multidisciplinary Optimization, 17(1):1–13, 1999.

[11] G.E.P. Box. Evolutionary operation: A method for increasing industrial productivity. Applied Statistics, 6(2):81–101, 1957.

[12] F.H. Clarke. Optimization and Nonsmooth Analysis. John Wiley & Sons, New York, 1983. Reissued in 1990 by SIAM Publications, Philadelphia, as Vol. 5 in the series Classics in Applied Mathematics.

[13] A.R. Conn and S. Le Digabel. Use of quadratic models with mesh-adaptive direct search for constrained black box optimization. Optimization Methods and Software, 28(1):139–158, 2013.

[14] A.R. Conn, K. Scheinberg, and L.N. Vicente. Introduction to Derivative-Free Optimization. MOS/SIAM Series on Optimization. SIAM, Philadelphia, 2009.

[15] I.D. Coope and C.J. Price. Frame-based methods for unconstrained optimization. Journal of Optimization Theory and Applications, 107(2):261–274, 2000.

[16] J. Craig. Bluebird developer manual, 2002. Available at http://www.groundwater.buffalo.edu/software/VBB/VBBMain_old.htm.

[17] A.L. Custodio, H. Rocha, and L.N. Vicente. Incorporating minimum Frobenius norm models in direct search. Computational Optimization and Applications, 46(2):265–278, 2010.

[18] C. Davis. Theory of positive linear dependence. American Journal of Mathematics, 76:733–746, 1954.

[19] J.E. Dennis, Jr. and V. Torczon. Direct search methods on parallel machines. SIAM Journal on Optimization, 1(4):448–474, 1991.

[20] R. Fletcher and S. Leyffer. Nonlinear programming without a penalty function. Mathematical Programming, Series A, 91:239–269, 2002.

[21] R. Fletcher, S. Leyffer, and Ph.L. Toint. On the global convergence of an SLP–filter algorithm. Technical Report NA/183, Dundee University, Department of Mathematics, 1998.


[22] K.R. Fowler, J.P. Reese, C.E. Kees, J.E. Dennis, Jr., C.T. Kelley, C.T. Miller, C. Audet, A.J. Booker, G. Couture, R.W. Darwin, M.W. Farthing, D.E. Finkel, J.M. Gablonsky, G. Gray, and T.G. Kolda. Comparison of derivative-free optimization methods for groundwater supply and hydraulic capture community problems. Advances in Water Resources, 31(5):743–757, 2008.

[23] N.I.M. Gould, D. Orban, and Ph.L. Toint. CUTEr (and SifDec): A constrained and unconstrained testing environment, revisited. ACM Transactions on Mathematical Software, 29(4):373–394, 2003.

[24] A. Hedar. Global optimization test problems. http://www-optima.amp.i.kyoto-u.ac.jp/member/student/hedar/Hedar_files/TestGO.htm.

[25] R. Hooke and T.A. Jeeves. Direct search solution of numerical and statistical problems. Journal of the Association for Computing Machinery, 8(2):212–229, 1961.

[26] T.G. Kolda, R.M. Lewis, and V. Torczon. Optimization by direct search: New perspectives on some classical and modern methods. SIAM Review, 45(3):385–482, 2003.

[27] S. Le Digabel. Algorithm 909: NOMAD: Nonlinear optimization with the MADS algorithm. ACM Transactions on Mathematical Software, 37(4):44:1–44:15, 2011.

[28] R.M. Lewis and V. Torczon. Rank ordering and positive bases in pattern search algorithms. Technical Report 96–71, Institute for Computer Applications in Science and Engineering, Mail Stop 132C, NASA Langley Research Center, Hampton, Virginia 23681–2199, 1996.

[29] L. Luksan and J. Vlcek. Test problems for nonsmooth unconstrained and linearly constrained optimization. Technical Report V-798, ICS AS CR, 2000.

[30] L.S. Matott, K. Leung, and J. Sim. Application of MATLAB and Python optimizers to two case studies involving groundwater flow and contaminant transport modeling. Computers & Geosciences, 37(11):1894–1899, 2011.

[31] J.J. More and S.M. Wild. Benchmarking derivative-free optimization algorithms. SIAM Journal on Optimization, 20(1):172–191, 2009.

[32] D. Orban. Templating and automatic code generation for performance with Python. Technical Report G-2011-30, Les cahiers du GERAD, 2011.

[33] V. Torczon. On the convergence of pattern search algorithms. SIAM Journal on Optimization, 7(1):1–25, 1997.

[34] L.N. Vicente and A.L. Custodio. Analysis of direct searches for discontinuous functions. Mathematical Programming, 2010.
