
Directional Derivatives for Extremal-Value Functions with Applications to the Completely Convex Case

William Hogan

United States Air Force Academy, Colorado

(Received September 2, 1971)

Several techniques in mathematical programming involve the constrained optimization of an extremal-value function. Such functions are defined as the extremal value of a related parameterized optimization problem. This paper reviews and extends the characterization of directional derivatives for three major types of extremal-value functions. The characterization for the completely convex case is then used to construct a robust and convergent feasible-direction algorithm. Such an algorithm has applications to the optimization of large-scale nonlinear decomposable systems.

MANY APPLICATIONS of mathematical programming, optimal control, game theory, etc., involve, at some stage, optimizing a function that is itself defined as the value of the optimal solution for another optimization problem. Such a function can be defined, without loss of generality, as

$v(y) \equiv \sup_{x \in X(y)} f(x, y)$, \qquad (1)

where $f: R^n \times R^p \to [-\infty, +\infty]$ and $X(\cdot)$ is a point-to-set map from $R^p$ to $R^n$. The classical minmax problem can be viewed as the minimization of $v$ over some set $Y$, where $X(\cdot)$ is taken to be the constant map $X(y) = X$. On the other hand, large-scale nonlinear decomposable systems produce examples that require maximizing $v$ over some set $Y$.[14-17, 25] For example, a problem characterized by the existence of complicating variables could be formulated as

$\max_{x, y} f(x, y)$, subject to $g(x, y) \le 0$, $x \in X$, $y \in Y$,

where it is assumed that the problem is relatively easy to solve in $x$ if $y$ is held fixed. It is natural to focus attention on the more difficult optimization in $y$ by projecting the problem onto $y$-space. This produces an optimization problem with an extremal-value function: $\max_{y \in Y} v(y)$, where

$v(y) = \sup_{x \in X} f(x, y)$ subject to $g(x, y) \le 0$.

This type of application is considered further in a later section of this paper.

A very general outer-approximation method for optimizing $v$ has been developed by GEOFFRION,[15] and the convergence of the method has been studied.[15, 22] Computational experience with this approach has been quite encouraging.[15] A natural alternative to this outer-approximation approach is an interior-ascent type of algorithm, the subject of this paper.

The well-known methods of feasible directions[41] typically require the use of the gradient of the objective function. Since $v$ is not generally a differentiable function, direct application of these methods is not possible. However, in many feasible-direction algorithms the gradient is only used to compute directional derivatives. If the extremal-value function is directionally differentiable, it would be natural to conjecture that the directional derivative could be used in a straightforward application of an algorithm developed for the differentiable problem. Fortunately, many extremal-value functions are directionally differentiable. For example, the directional derivative always exists when $v$ is concave. The indicated approach therefore appears promising for the solution of a large class of optimization problems.

This approach for optimizing an extremal-value function creates the necessity for useful characterizations of the directional derivatives of such functions. Such characterizations have been the subject of a number of studies [see references 2, 3, 5, 6, 10, 14, 17, 19, 28, 30, 31, 33, 35-38]. These results are reviewed and extended in Section 1.

The straightforward substitution of the characterization of the directional derivative into a feasible-direction algorithm designed for a differentiable problem does not produce a generally convergent algorithm. The difficulties associated with the development of such algorithms, and some of the solutions that have been proposed [see references 1, 3, 4, 7-10, 23, 26, 27], are discussed in Section 2. Finally, Section 3 presents a robust convergent feasible-direction algorithm for optimizing a particular class of extremal-value functions.

Throughout this paper, certain results from the theory of point-to-set maps are employed, with particular emphasis on the application of these concepts in the algorithmic convergence theory initiated by ZANGWILL.[40] The results needed for this paper are collected in an appendix.

Notation throughout is standard. All spaces are subsets of Euclidean spaces. Vectors are represented by lower-case letters (e.g., $x, y, u, \ldots$). The distinction between row and column vectors should be clear from the context. The inner product of the row vector $x$ and the column vector $y$ is $xy$. If $f$ is a real-valued function of $x$ and $y$, then the gradient of $f$ with respect to $y$ evaluated at $(\bar{x}, \bar{y})$ is $\nabla_y f(\bar{x}, \bar{y})$. Similarly, for $g: X \times Y \to R^m$, $\nabla_y g(\bar{x}, \bar{y})$ represents the matrix whose rows are the gradients, with respect to $y$, of the $m$ components of $g$ evaluated at $(\bar{x}, \bar{y})$. Hence, $u \cdot \nabla_y g(\bar{x}, \bar{y})$ is a row vector with the dimensions of the variable $y$. Scalars are generally represented by lower-case Greek letters ($\alpha, \lambda, \theta$, etc.). Sets and point-to-set maps are represented by upper-case letters ($X$, $M$, $H$, etc.). A neighborhood of $\bar{y}$ is denoted by $N_{\bar{y}}$.

1. DIRECTIONAL DERIVATIVES FOR EXTREMAL-VALUE FUNCTIONS

CONSIDER AN ARBITRARY function $\phi: R^n \to [-\infty, +\infty]$ and a point $(x, d) \in R^{2n}$. If $|\phi(x)| < +\infty$ and the limit

$\lim_{\lambda \to 0^+} [\phi(x + \lambda d) - \phi(x)]/\lambda$ \qquad (2)

exists, then $\phi$ is said to be directionally differentiable at $x$ in the direction $d$, and the directional derivative $\hat{\phi}(x|d)$ is equal to the limit in (2). The possibility that $\hat{\phi}(x|d) = \pm\infty$ is not excluded.

For differentiable functions, the directional derivative is the inner product of the gradient and the direction, $\hat{\phi}(x|d) = \nabla\phi(x) \cdot d$. This linearity in $d$ is exploited in the design of feasible-direction algorithms for optimizing differentiable functions. There are many functions, however, for which $\hat{\phi}(x|d)$ exists but is not necessarily linear in $d$. As noted earlier, all concave functions have this property wherever they are finite. Under certain conditions, extremal-value functions such as (1) can be shown to be directionally differentiable.

Since (2) is not very useful for applications, additional structure must be imposed on (1) to develop a tractable representation of $\hat{v}(\bar{y}|d)$. There does not seem to be a universal characterization of the directional derivative of the $v$ function. However, three cases that represent the major steps taken in this direction are considered. The first case makes quite general assumptions about the function $f$ but works with a trivial map $X(\cdot)$. The second case imposes more structure on $f$ in order to deal with a more general map $X(\cdot)$. The third case deals with the completely convex problem, for which the most useful characterization is obtained. This last representation is subsequently applied in the development of a convergent algorithm. This section concludes with a special result about concave functions that explains an apparent anomaly between the first two cases and the third.

1.1 The Constant Map Without Convexity

The first case to be considered, and the best known, is interesting in that it requires no convexity assumptions; it restricts the $X(\cdot)$ map to be constant: $X(y) = X$. Hence, the supremal-value function is $v(y) = \sup_{x \in X} f(x, y)$. The following theorem is taken from DANSKIN.[2]

THEOREM 1. Suppose that $\nabla_y f$ exists, $f$ and $\nabla_y f$ are continuous in $x$ and $y$, and $X$ is compact. Then $\hat{v}(\bar{y}|d)$ exists for all $d \in R^p$ and

$\hat{v}(\bar{y}|d) = \max_{x \in M(\bar{y})} \nabla_y f(x, \bar{y}) \cdot d$, where

$M(y) = \{x \in X \mid v(y) \le f(x, y)\}$.
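As a concrete illustration of Theorem 1 (a sketch added here, not part of the original analysis), Danskin's formula can be checked numerically when $X$ is a small finite set, so that the inner maximum and the set $M(y)$ can be enumerated directly. The function f, the tolerance tol, and all names below are illustrative assumptions.

```python
import numpy as np

def v_and_M(f, X, y, tol=1e-9):
    """Inner problem over a finite X: v(y) = max f(x, y) and the solution set M(y)."""
    vals = np.array([f(x, y) for x in X])
    return vals.max(), [x for x, val in zip(X, vals) if val >= vals.max() - tol]

def danskin_derivative(f, grad_y_f, X, y, d):
    """Theorem 1: vhat(y | d) = max over x in M(y) of grad_y f(x, y) . d."""
    _, M = v_and_M(f, X, y)
    return max(float(np.dot(grad_y_f(x, y), d)) for x in M)

# f(x, y) = -(y - x)^2 with X = {-1, +1}: at y = 0 both points are optimal and
# v has a kink, so the directional derivative is +2 in both directions.
f = lambda x, y: -(y - x) ** 2
grad_y_f = lambda x, y: np.array([-2.0 * (y - x)])

print(danskin_derivative(f, grad_y_f, [-1.0, 1.0], 0.0, np.array([1.0])))   # 2.0
print(danskin_derivative(f, grad_y_f, [-1.0, 1.0], 0.0, np.array([-1.0])))  # 2.0
```

The second call illustrates the nonlinearity of $\hat{v}(\bar{y}|d)$ in $d$: the derivative is $+2$ in both directions because of the kink.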

DEMYANOV[3] developed a similar version of Theorem 1 with a different method of proof. SOTSKOV,[35] LEVITIN,[27] and DEMYANOV AND RUBINOV[10] have generalized this result to include other functions and spaces; see also references 1, 30, and 31. Under some additional assumptions, Demyanov[5, 6] has investigated a similar result when $f$ is in turn an infimal-value function of the same form as $v$. A special application of Theorem 1 to the Lagrangian function of a nonlinear programming problem can be found in GRINOLD.[19] This last subject is also studied in FALK.[11] LASDON[25] applied Theorem 1 in the solution of a dual nonlinear programming problem.

Almost all work in the design of convergent feasible-direction algorithms for optimizing extremal-value functions has centered on the application of Theorem 1 or its variants [see references 3, 4, 7-10, 23, 27]. These methods are not repeated here, but Section 2 discusses the difficulties to be surmounted in the design of these algorithms.

The generality of Theorem 1 and its extensions seems to depend upon the assumption that the set $X$ does not change with the parameter $y$. However, for many applications with extremal-value functions, the feasible set for the inner problem will indeed depend upon the parameter $y$ [see references 14, 15, 17, 34, 35]. While this complicates the problem significantly, it is seen below that useful results can still be obtained if certain convexity assumptions are made.

1.2 Marginal Convexity

Consider now the case in which the map X(y) is described via a collection of inequalities,

$X(y) = \{x \in X \mid g(x, y) \le 0\}$.

In this case, the supremal-value function becomes

$v(y) = \sup_{x \in X} f(x, y)$ subject to $g(x, y) \le 0$, \qquad (3)

where $f: R^n \times R^p \to [-\infty, +\infty]$ and $g: R^n \times R^p \to [-\infty, +\infty]^m$. Useful characterizations of $\hat{v}(\bar{y}|d)$ can be obtained if 'marginal convexity' is assumed: $X$ is a convex set and $-f$, $g$ are convex in $x$ for each fixed $y$. WILLIAMS[38] studied this problem for the special case in which the inner problem is a linear program for each fixed $y$. SOTSKOV[37] studied (3) in complete normed spaces and obtained results that apply when $g$ is jointly convex and the inner problem has a unique optimum for each fixed $y$. Without assuming uniqueness, and relaxing the linearity requirement, Theorem 2 below presents a characterization of $\hat{v}(\bar{y}|d)$ that generalizes those found in references 37 and 38. The development of Theorem 2 requires the theory of point-to-set maps and nonlinear duality theory, for which the reader is referred to the appendix and to Geoffrion,[18] respectively.

The characterization of $\hat{v}(\bar{y}|d)$ is preceded by two lemmas. As before, the $M$ map identifies the set of optimal solutions,

$M(y) = \{x \in X \mid g(x, y) \le 0$ and $v(y) \le f(x, y)\}$,

and the set of optimal dual variables for the inner problem is denoted by $U(y)$,

$U(y) \equiv \{u \ge 0 \mid L(y, u) = \inf_{\bar{u} \ge 0} L(y, \bar{u})\}$, where

$L(y, u) \equiv \sup_{x \in X} \{f(x, y) - u \cdot g(x, y)\}$.

LEMMA 1. Suppose $g$ is convex on $X$ for fixed $y$, $f$ and $g$ are continuous on $X \times N_{\bar{y}}$, $X$ is closed and convex, and there is a point $\bar{x} \in X$ such that $g(\bar{x}, \bar{y}) < 0$. Then $M$ is closed at $\bar{y}$.

Proof. Let $\Omega(y) = \{x \in X \mid g(x, y) \le 0\}$. The continuity of $\Omega$ at $\bar{y}$ is provided by Theorems A.5 and A.6. Hence, Theorem A.2 demonstrates that $M$ is closed at $\bar{y}$.

Additional assumptions provide a similar result for the $U$ map.

LEMMA 2. Suppose $v(\bar{y})$ is finite, $-f$ and $g$ are convex on $X$ for fixed $y$ and continuous on $X \times N_{\bar{y}}$, $X$ is a closed convex set, $M(\bar{y})$ is nonempty and bounded, and there is a point $\bar{x} \in X$ such that $g(\bar{x}, \bar{y}) < 0$. Then $U(y)$ and $M(y)$ are nonempty and uniformly bounded on a neighborhood of $\bar{y}$, and $U$ is closed at $\bar{y}$.

Proof. The fact that $v(\bar{y})$ is finite, in conjunction with the condition that $g(\bar{x}, \bar{y}) < 0$, is necessary [Proposition 1; reference 22] and sufficient [Corollary 29.1.5; reference 32] for $U(\bar{y})$ to be nonempty and bounded.

Moreover, the condition $g(\bar{x}, \bar{y}) < 0$ implies that $\Omega(y) = \{x \in X \mid g(x, y) \le 0\}$ is open at $\bar{y}$ [Theorem A.6]. Since $\Omega$ is closed on $N_{\bar{y}}$, Theorem A.4 can be applied to conclude that $M(y)$ is nonempty and uniformly bounded near $\bar{y}$. Therefore, it must be that $v$ is continuous at $\bar{y}$ and finite near $\bar{y}$. Since $g$ is continuous on $X \times N_{\bar{y}}$, similar arguments can be repeated to show that $U(y)$ is nonempty and bounded near $\bar{y}$.

Since $U$ is nonempty near $\bar{y}$, it follows that $v(y) = \inf_{\bar{u} \ge 0} L(y, \bar{u}) = L(y, u)$, for $u \in U(y)$ [Theorems 1 and 3; reference 18]. Let $K$ be a convex compact set such that $M(\bar{y}) \subset \mathrm{int}(K)$. Let $x \in M(\bar{y})$. Since $L(\bar{y}, u) = f(x, \bar{y}) - u \cdot g(x, \bar{y}) = v(\bar{y}) \le \sup_{x \in X \cap K} \{f(x, \bar{y}) - u \cdot g(x, \bar{y})\}$, for every $u \in U(\bar{y})$, it follows that $U(\bar{y})$ is contained in the set of optimal dual solutions for the inner problem when the optimization is restricted to $X \cap K$. If the reverse containment were not true, then it would follow that $L(\bar{y}, \bar{u}) > v(\bar{y}) = \sup_{x \in X \cap K} \{f(x, \bar{y}) - \bar{u} \cdot g(x, \bar{y})\}$, where $\bar{u}$ is an optimal dual solution for the inner problem restricted to $X \cap K$. Therefore, there would be a point $x^1 \in X$ such that $f(x^1, \bar{y}) - \bar{u} \cdot g(x^1, \bar{y}) > f(x, \bar{y}) - \bar{u} \cdot g(x, \bar{y})$. Since the point $\lambda x + (1 - \lambda)x^1$ must lie in $X \cap K$ for some $\lambda \in (0, 1)$, this yields a contradiction to the fact that $x \in M(\bar{y})$. Therefore, restricting the inner problem to $X \cap K$ does not change the set of dual solutions. Hence, $X$ can be assumed to be compact without loss of generality in the definitions of $M$ and $U$.

It follows immediately that $L(y, u)$ is continuous in $(y, u)$ [Theorem A.1]. Since $L$ is convex in $u$ for fixed $y$, Theorem A.4 can be applied to conclude that $U(y)$ is uniformly bounded near $\bar{y}$. Finally, the continuity of $L$ and Theorem A.2 show that $U$ is closed at $\bar{y}$.

With this preparation, the characterization of $\hat{v}(\bar{y}|d)$ can be stated. (See Note 1.)

THEOREM 2. Suppose that $X$ is a closed convex set, $-f$ and $g$ are convex on $X$ for fixed $y$ and continuously differentiable on $X \times N_{\bar{y}}$ (where $N_{\bar{y}}$ is a neighborhood of $\bar{y}$), $M(\bar{y})$ is nonempty and bounded, $v(\bar{y})$ is finite, and there is a point $\bar{x} \in X$ such that $g(\bar{x}, \bar{y}) < 0$. Then $\hat{v}(\bar{y}|d)$ exists and is finite for all $d \in R^p$, and

$\hat{v}(\bar{y}|d) = \max_{x \in M(\bar{y})} \min_{u \in U(\bar{y})} \{\nabla_y f(x, \bar{y}) \cdot d - u \cdot \nabla_y g(x, \bar{y}) \cdot d\}$.

Proof. By Lemmas 1 and 2 we have that $M(y)$ and $U(y)$ are nonempty and uniformly bounded in a neighborhood of $\bar{y}$, and the two maps are closed at $\bar{y}$.

Let $d \in R^p$ and $y(\theta) = \bar{y} + \theta d$, for $\theta > 0$. Select $\bar{x} \in M(\bar{y})$ and $\bar{u} \in U(\bar{y})$. Then, by the standard saddle-point representation, for all $x \in X$ and $u \ge 0$ it follows that

$f(x, \bar{y}) - \bar{u} \cdot g(x, \bar{y}) \le f(\bar{x}, \bar{y}) - \bar{u} \cdot g(\bar{x}, \bar{y}) \le f(\bar{x}, \bar{y}) - u \cdot g(\bar{x}, \bar{y})$. \qquad (4)

Similarly, if $x(\theta) \in M[y(\theta)]$ and $u(\theta) \in U[y(\theta)]$, then (4) holds with $x(\theta)$ in place of $\bar{x}$, $y(\theta)$ in place of $\bar{y}$, and $u(\theta)$ in place of $\bar{u}$. Since $v(\bar{y}) = f(\bar{x}, \bar{y}) - \bar{u} \cdot g(\bar{x}, \bar{y})$, it follows that $l(\theta) \le v[y(\theta)] - v(\bar{y}) \le r(\theta)$, where

$l(\theta) \equiv f[\bar{x}, y(\theta)] - f(\bar{x}, \bar{y}) - u(\theta) \cdot \{g[\bar{x}, y(\theta)] - g(\bar{x}, \bar{y})\}$

and

$r(\theta) \equiv f[x(\theta), y(\theta)] - f[x(\theta), \bar{y}] - \bar{u} \cdot \{g[x(\theta), y(\theta)] - g[x(\theta), \bar{y}]\}$.

Hence,

$\liminf_{\theta \to 0^+} [l(\theta)/\theta] \le \liminf_{\theta \to 0^+} \{v[y(\theta)] - v(\bar{y})\}/\theta$, \qquad (5)

and

$\limsup_{\theta \to 0^+} \{v[y(\theta)] - v(\bar{y})\}/\theta \le \limsup_{\theta \to 0^+} [r(\theta)/\theta]$. \qquad (6)


Let $\{\theta_k^1\}$ and $\{\theta_k^2\}$ be sequences tending to $0^+$ and such that

$\lim_{k \to \infty} \{v[y(\theta_k^1)] - v(\bar{y})\}/\theta_k^1 = \liminf_{\theta \to 0^+} \{v[y(\theta)] - v(\bar{y})\}/\theta$, and

$\lim_{k \to \infty} \{v[y(\theta_k^2)] - v(\bar{y})\}/\theta_k^2 = \limsup_{\theta \to 0^+} \{v[y(\theta)] - v(\bar{y})\}/\theta$.

It follows from the uniform boundedness of $U(y)$ and $M(y)$ that $\{u(\theta_k^1)\}$ and $\{x(\theta_k^2)\}$ have accumulation points $u^1$ and $x^2$, respectively. Further, since $U$ and $M$ are closed at $\bar{y}$, it must be that $u^1 \in U(\bar{y})$ and $x^2 \in M(\bar{y})$.

According to the mean-value theorem, there is a point $\tilde{y}$ on the line segment between $\bar{y}$ and $y(\theta)$ such that

$f[\bar{x}, y(\theta)] - f(\bar{x}, \bar{y}) = \theta \nabla_y f(\bar{x}, \tilde{y}) \cdot d$.

Similarly, each component of $g$ is subject to the same simplification. Remembering that $\tilde{y}$ is actually different for $f$ and each component of $g$, we obtain

$l(\theta) = \theta \nabla_y f(\bar{x}, \tilde{y}) \cdot d - \theta u(\theta) \cdot [\nabla_y g(\bar{x}, \tilde{y}) \cdot d]$.

Hence, taking subsequences if necessary, the continuity of the gradient yields

$\lim_{k \to \infty} [l(\theta_k^1)/\theta_k^1] = \nabla_y f(\bar{x}, \bar{y}) \cdot d - u^1 \cdot \nabla_y g(\bar{x}, \bar{y}) \cdot d$. \qquad (7)

By substituting $x(\theta)$ for $\bar{x}$, the same argument can be repeated to obtain

$r(\theta) = \theta \nabla_y f[x(\theta), \tilde{y}] \cdot d - \theta \bar{u} \cdot \{\nabla_y g[x(\theta), \tilde{y}] \cdot d\}$

and

$\lim_{k \to \infty} [r(\theta_k^2)/\theta_k^2] = \nabla_y f(x^2, \bar{y}) \cdot d - \bar{u} \cdot \nabla_y g(x^2, \bar{y}) \cdot d$. \qquad (8)

Let the quantities in (7) and (8) be denoted $a(u^1, \bar{x})$ and $a(\bar{u}, x^2)$, respectively. Of course, (5)-(8) are independent of the choice of $\bar{x}$ and $\bar{u}$; therefore, it follows that

$a(u^1, \bar{x}) \le \liminf_{\theta \to 0^+} \{v[y(\theta)] - v(\bar{y})\}/\theta$

$\le \limsup_{\theta \to 0^+} \{v[y(\theta)] - v(\bar{y})\}/\theta \le a(\bar{u}, x^2)$. \qquad (9)

This demonstrates that $\hat{v}(\bar{y}|d)$ exists and is finite. The observation that $\hat{v}(\bar{y}|d) = a(\bar{u}, \bar{x})$ is not very useful, since it depends upon the selection of the correct elements from $M(\bar{y})$ and $U(\bar{y})$. However, (5)-(9) also yield the equivalent statement

$\max_{x \in M(\bar{y})} a(u^1, x) \le \hat{v}(\bar{y}|d) \le \min_{u \in U(\bar{y})} a(u, x^2)$. \qquad (10)

Using the fact that $x^2 \in M(\bar{y})$ and $u^1 \in U(\bar{y})$, it is also true that

$\min_{u \in U(\bar{y})} a(u, x^2) \le \max_{x \in M(\bar{y})} \min_{u \in U(\bar{y})} a(u, x) \le \max_{x \in M(\bar{y})} a(u^1, x)$. \qquad (11)

The continuity of $a$ and the compactness of $M(\bar{y})$ and $U(\bar{y})$ are sufficient to ensure that all operations in (11) are well defined. Therefore, (10) and (11) yield

$\hat{v}(\bar{y}|d) = \max_{x \in M(\bar{y})} \min_{u \in U(\bar{y})} a(u, x)$.

Note that (10) and (11) can also be used to prove that the max and min operators commute, i.e.,

$\hat{v}(\bar{y}|d) = \min_{u \in U(\bar{y})} \max_{x \in M(\bar{y})} \{\nabla_y f(x, \bar{y}) \cdot d - u \cdot \nabla_y g(x, \bar{y}) \cdot d\}$.


The representation in Theorem 2 is more useful than (2), but the computation of $\hat{v}(\bar{y}|d)$ may not be a trivial matter. The following corollary may be of some assistance in this direction.

COROLLARY 2.1. If, in addition to the assumptions of Theorem 2, we have $M(\bar{y}) \subset \mathrm{int}(X)$ (e.g., if $X = R^n$), then

$\hat{v}(\bar{y}|d) = \max_{x \in M(\bar{y}),\, w \in R^n,\, z \in R^1} \nabla_y f(x, \bar{y}) \cdot d + \nabla_x f(x, \bar{y}) \cdot w$

subject to $z\, g(x, \bar{y}) + \nabla_x g(x, \bar{y}) \cdot w \le -\nabla_y g(x, \bar{y}) \cdot d$.

Proof. If $M(\bar{y}) \subset \mathrm{int}(X)$, then $u \in U(\bar{y})$ if and only if, for all $x \in M(\bar{y})$, it is true that

$u \ge 0$, \qquad (12a)

$u \cdot g(x, \bar{y}) = 0$, \qquad (12b)

$\nabla_x f(x, \bar{y}) - u \cdot \nabla_x g(x, \bar{y}) = 0$. \qquad (12c)

Therefore, for any $x \in M(\bar{y})$, the minimization over $U(\bar{y})$ is equivalent to

$\min_u \nabla_y f(x, \bar{y}) \cdot d - u \cdot \nabla_y g(x, \bar{y}) \cdot d$, \qquad (13)

subject to (12a), (12b), and (12c). This is a linear program with a finite optimal solution whose optimal value, by the duality theorem of linear programming, is equal to

$\nabla_y f(x, \bar{y}) \cdot d + \max_{w, z} \nabla_x f(x, \bar{y}) \cdot w$,

subject to $z\, g(x, \bar{y}) + \nabla_x g(x, \bar{y}) \cdot w \le -\nabla_y g(x, \bar{y}) \cdot d$.

One obvious simplification becomes evident when $M(\bar{y})$ is single-valued, as when $f$ is strictly concave in $x$. When $M(\bar{y})$ is single-valued, evaluating $\hat{v}(\bar{y}|d)$ is reduced to the solution of an explicit linear programming problem that has one constraint for each binding constraint in the inner problem.
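As an illustrative check of Corollary 2.1 in the single-valued case (added here, not from the paper), consider the inner problem $\sup_x -(x - y)^2$ subject to $g(x, y) = x - 1 \le 0$ at $\bar{y} = 2$, where $M(\bar{y}) = \{1\}$ and the true derivative is $\hat{v}(\bar{y}|d) = -2d$. The corollary's linear program can be solved directly; scipy.optimize.linprog is an assumed tool here.

```python
import numpy as np
from scipy.optimize import linprog

# Inner problem: v(y) = sup_x -(x - y)^2  s.t.  g(x, y) = x - 1 <= 0.
# At ybar = 2, M(ybar) = {1}, v(y) = -(1 - y)^2 nearby, so vhat(ybar|d) = -2d.
xbar, ybar, d = 1.0, 2.0, 1.0
gyf = 2.0 * (xbar - ybar)           # grad_y f = 2(x - y)  -> -2
gxf = -2.0 * (xbar - ybar)          # grad_x f = -2(x - y) -> +2
g, gxg, gyg = xbar - 1.0, 1.0, 0.0  # g and its partial gradients at (xbar, ybar)

# Corollary 2.1 LP in (w, z): max gxf*w  s.t.  z*g + gxg*w <= -gyg*d.
res = linprog(c=np.array([-gxf, 0.0]),            # linprog minimizes, so negate
              A_ub=np.array([[gxg, g]]),          # here g = 0, so z is idle
              b_ub=np.array([-gyg * d]),
              bounds=[(None, None), (None, None)])
print(gyf * d + gxf * res.x[0])                   # -2.0, matching -2*d
```

With multiple maximizers, the outer maximization over $M(\bar{y})$ would still be required here, in contrast to the completely convex case of Section 1.3.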

While this development is stated completely without equality constraints, marginally affine equality constraints can be included in the obvious manner. The additional requirements are essentially the same as those discussed in Section 3.4 of reference 22, to which the interested reader is referred.

1.3 The Completely Convex Case

If complete convexity is present, i.e., if (3) is such that $-f$ and $g$ are jointly convex in $x$ and $y$, and $X$ is a convex set, then most other assumptions can be deleted, including the requirement that $f(x, y)$ be differentiable. Because of the simplicity of the proof and the inductive nature of the result, we consider this problem in its full generality. Once again, the corollaries provide the most useful statements of the results, and these representations are applied in the later portions of this paper. In particular, the characterization of $\hat{v}(\bar{y}|d)$ involves the solution of a linear program that has no dependence on $M(\bar{y})$, even though $M(\bar{y})$ may contain more than one element. The apparent anomaly between this and previous results is explained in Theorem 4.

Throughout the remainder of the paper, it is necessary to define certain point-to-set maps whose definitions vary slightly in each application. The role played by the map is essentially the same in each case, and the transition requires nothing more than the restatement of equalities in terms of inequalities or the distinction between two sets of inequalities (linear and nonlinear). To avoid additional cumbersome notation, the symbols for these maps are not changed; with this word of caution, it is assumed that no difficulty will be caused by this technical ambiguity.

The completely convex case was studied by Geoffrion,[17] who generalized his earlier results for a special case,[14] which was also studied by SILVERMAN.[33] Corollary 3.2(c) of this section is essentially equivalent to the result in reference 17, but the method of development is different.

To restate the problem:

$v(y) = \sup_{x \in X} f(x, y)$ subject to $g(x, y) \le 0$,

where $f: R^n \times R^p \to [-\infty, +\infty]$ and $g: R^n \times R^p \to [-\infty, +\infty]^m$. The point-to-set map $H$ identifies the set of feasible directions for the inner problem,

$H_{(\bar{x},\bar{y})}(d) \equiv \{w \mid g(\bar{x} + \lambda w, \bar{y} + \lambda d) \le 0$ and $\bar{x} + \lambda w \in X$ for all $0 < \lambda \le \bar{\lambda}(\bar{x}, \bar{y}, d, w)$, for some $\bar{\lambda}(\bar{x}, \bar{y}, d, w) > 0\}$.

This set defines the region over which optimization takes place in the characterization of $\hat{v}(\bar{y}|d)$.

THEOREM 3. Suppose that $X$ is a nonempty convex set, $-f$ and $g$ are convex on $X \times N_{\bar{y}}$, $v(\bar{y})$ is finite, and $\bar{x} \in M(\bar{y})$. Then

$\hat{v}(\bar{y}|d) = \sup_{w \in H_{(\bar{x},\bar{y})}(d)} \hat{f}(\bar{x}, \bar{y} \mid w, d)$. \qquad (14)

Proof. The concavity of $v$ is easily established, from which the existence of $\hat{v}(\bar{y}|d)$, for all $d \in R^p$, is guaranteed [Theorem 23.1; reference 32].

First, let us deal with the cases for which $|v(\bar{y} + \lambda d)| < +\infty$ fails for $\lambda$ small. If $v(\bar{y} + \lambda d) = -\infty$ for $\lambda \in (0, \lambda_0)$, for any $\lambda_0 > 0$, then $\hat{v}(\bar{y}|d) = -\infty$. Either of two cases must hold: (a) $f(\bar{x} + \lambda w, \bar{y} + \lambda d) = -\infty$, for $\lambda \in (0, \lambda_0)$ and $w \in H_{(\bar{x},\bar{y})}(d)$, or (b) $H_{(\bar{x},\bar{y})}(d) = \emptyset$. Since $f(\bar{x}, \bar{y}) < \infty$, $\hat{f}$ exists and (a) would imply that $\hat{f}(\bar{x}, \bar{y} \mid w, d) = -\infty$, for all $w \in H_{(\bar{x},\bar{y})}(d)$. Hence, (14) holds because of (a) or the convention that the supremum over the empty set is $-\infty$.

If $v(\bar{y} + \lambda d) < +\infty$ fails for $\lambda \in (0, \lambda_0)$, for any $\lambda_0 > 0$, then the concavity of $v$ implies that $v(\bar{y} + \lambda d) = +\infty$ for $\lambda \in (0, \lambda_0)$, for some $\lambda_0 > 0$. Choose one such $\lambda$. There must exist a sequence $\{x^k\} \subset X$ such that $\lim_{k \to \infty} f(x^k, \bar{y} + \lambda d) = +\infty$ and $g(x^k, \bar{y} + \lambda d) \le 0$, for all $k$. Therefore, by the convexity of $g$ and $X$, it follows that $w^k \equiv (x^k - \bar{x})/\lambda \in H_{(\bar{x},\bar{y})}(d)$, for all $k$. Since $f$ is concave, we have

$f(\bar{x} + \lambda w^k, \bar{y} + \lambda d) \le f(\bar{x}, \bar{y}) + \lambda \hat{f}(\bar{x}, \bar{y} \mid w^k, d)$.

But $\lambda > 0$ and $\lim_{k \to \infty} f(\bar{x} + \lambda w^k, \bar{y} + \lambda d) = +\infty$, implying that $\lim_{k \to \infty} \hat{f}(\bar{x}, \bar{y} \mid w^k, d) = +\infty$. Clearly $\hat{v}(\bar{y}|d) = +\infty$, so the equality has been demonstrated.

The finite case is proved through the development of two bounds for $\hat{v}(\bar{y}|d)$. Since $|v(\bar{y} + \lambda d)| < \infty$ for $\lambda \in (0, \lambda_0)$, for some $\lambda_0 > 0$, there is some $x \in X$ such that $g(x, \bar{y} + \lambda d) \le 0$. Therefore, $H_{(\bar{x},\bar{y})}(d) \ne \emptyset$. For any $w \in H_{(\bar{x},\bar{y})}(d)$,

$\{f(\bar{x} + \lambda w, \bar{y} + \lambda d) - f(\bar{x}, \bar{y})\}/\lambda \le \{v(\bar{y} + \lambda d) - v(\bar{y})\}/\lambda$

for $\lambda \in (0, \bar{\lambda}(\bar{x}, \bar{y}, d, w)]$. Therefore,

$\hat{f}(\bar{x}, \bar{y} \mid w, d) \le \hat{v}(\bar{y}|d)$, for all $w \in H_{(\bar{x},\bar{y})}(d)$. \qquad (15)


To obtain the reverse inequality, choose some $\delta > 0$. For any $\lambda \in (0, \lambda_0)$ there is a point $x(\lambda, \delta) \in X$ such that

$v(\bar{y} + \lambda d) \le f[x(\lambda, \delta), \bar{y} + \lambda d] + \lambda\delta$,

and $g[x(\lambda, \delta), \bar{y} + \lambda d] \le 0$. Therefore, by the convexity of $g$ and $X$,

$w(\lambda, \delta) \equiv [x(\lambda, \delta) - \bar{x}]/\lambda \in H_{(\bar{x},\bar{y})}(d)$, for $\lambda \in (0, \lambda_0)$. As before,

$f[\bar{x} + \lambda w(\lambda, \delta), \bar{y} + \lambda d] \le f(\bar{x}, \bar{y}) + \lambda \hat{f}[\bar{x}, \bar{y} \mid w(\lambda, \delta), d]$.

Now $\lambda > 0$, so it follows that

$f[\bar{x} + \lambda w(\lambda, \delta), \bar{y} + \lambda d] \le f(\bar{x}, \bar{y}) + \lambda \{\sup_{w \in H_{(\bar{x},\bar{y})}(d)} \hat{f}(\bar{x}, \bar{y} \mid w, d)\}$.

Therefore,

$\hat{v}(\bar{y}|d) = \lim_{\lambda \to 0^+} [v(\bar{y} + \lambda d) - v(\bar{y})]/\lambda$

$\le \lim_{\lambda \to 0^+} \{f[\bar{x} + \lambda w(\lambda, \delta), \bar{y} + \lambda d] + \lambda\delta - f(\bar{x}, \bar{y})\}/\lambda$

$\le \lim_{\lambda \to 0^+} \{\lambda[\sup_{w \in H_{(\bar{x},\bar{y})}(d)} \hat{f}(\bar{x}, \bar{y} \mid w, d)] + \lambda\delta\}/\lambda$ \qquad (16)

$\le \sup_{w \in H_{(\bar{x},\bar{y})}(d)} \hat{f}(\bar{x}, \bar{y} \mid w, d) + \delta$.

Since $\delta > 0$ was arbitrary, (15) and (16) demonstrate the equality in (14).

We note two facts about Theorem 3 that will not be pursued further. First, the proof only uses the quasiconvexity of $g$. Second, the generality of the assumptions admits a simple induction argument applicable to the situation where $f$ itself is a supremal-value function.

The application of Theorem 3 requires some representation of $\hat{f}$ and $H$, and these can be easily developed. The set of feasible directions is closely related to the local linearization of the binding constraints. When $g$ is differentiable, this linearization is described by the point-to-set map $L$, defined as

$L_{(\bar{x},\bar{y})}(d) \equiv \{w \mid \nabla_x g_k(\bar{x}, \bar{y}) \cdot w \le -\nabla_y g_k(\bar{x}, \bar{y}) \cdot d$, for all $k$ such that $g_k(\bar{x}, \bar{y}) = 0\}$.

It is well known that $H_{(\bar{x},\bar{y})}(d)$ is not equal to $L_{(\bar{x},\bar{y})}(d)$ unless some further qualification is made.[40] Given such a constraint qualification, the next two corollaries demonstrate that $\hat{v}(\bar{y}|d)$ can be computed by solving an explicit linear program.

COROLLARY 3.1. If, in addition to the assumptions of Theorem 3, $f$ and $g$ are differentiable on $X \times N_{\bar{y}}$, and $H_{(\bar{x},\bar{y})}(d) = L_{(\bar{x},\bar{y})}(d)$, then

$\hat{v}(\bar{y}|d) = \nabla_y f(\bar{x}, \bar{y}) \cdot d + \sup_w \nabla_x f(\bar{x}, \bar{y}) \cdot w$ \qquad (17)

subject to $\nabla_x g_k(\bar{x}, \bar{y}) \cdot w \le -\nabla_y g_k(\bar{x}, \bar{y}) \cdot d$, for all $k$ such that $g_k(\bar{x}, \bar{y}) = 0$.

Proof. Since $f$ is differentiable, we have

$\hat{f}(\bar{x}, \bar{y} \mid w, d) = \nabla_x f(\bar{x}, \bar{y}) \cdot w + \nabla_y f(\bar{x}, \bar{y}) \cdot d$.

Therefore,

$\hat{v}(\bar{y}|d) = \nabla_y f(\bar{x}, \bar{y}) \cdot d + \sup_{w \in H_{(\bar{x},\bar{y})}(d)} \nabla_x f(\bar{x}, \bar{y}) \cdot w$.

The constraint qualification completes the proof.

Many known conditions imply that the constraint qualification holds. Before introducing one of these, we specialize our notation. Although it was not necessary in Theorem 3, in subsequent applications it will be helpful to make a distinction between linear and nonlinear constraints. Hence, assume that there are also affine inequality constraints $h(x, y) \le 0$. The modifications of previous statements are straightforward. For example, (17) becomes

$\hat{v}(\bar{y}|d) = \nabla_y f(\bar{x}, \bar{y}) \cdot d + \sup_w \nabla_x f(\bar{x}, \bar{y}) \cdot w$ \qquad (17a)

subject to $\nabla_x g_k(\bar{x}, \bar{y}) \cdot w \le -\nabla_y g_k(\bar{x}, \bar{y}) \cdot d$ and $\nabla_x h_j(\bar{x}, \bar{y}) \cdot w \le -\nabla_y h_j(\bar{x}, \bar{y}) \cdot d$, for each $k$ ($j$) such that $g_k(\bar{x}, \bar{y}) = 0$ [$h_j(\bar{x}, \bar{y}) = 0$]. The set $V_0$ will be needed, where

$V_0 = \{y \mid g(x, y) < 0$ and $h(x, y) \le 0$, for some $x \in X\}$.

COROLLARY 3.2. If, in addition to the assumptions of Theorem 3, $f$ and $g$ are differentiable, $h$ is affine, and $\bar{x} \in \mathrm{int}(X)$ (e.g., $X = R^n$), then:

(a) If $\bar{y} + \lambda d \in V_0$ for some $\lambda > 0$, then (17a) is valid.

(b) If $\bar{y} \in \mathrm{ri}(V_0)$, then (17a) is valid for all $d$ such that $\bar{y} + d \in \mathrm{aff}(V_0)$ [see Note 2].

(c) If $\bar{y} \in \mathrm{int}(V_0)$, then (17a) is valid for all $d \in R^p$.

Proof. First (a) is established. In view of Corollary 3.1, it suffices to show that $H_{(\bar{x},\bar{y})}(d) = L_{(\bar{x},\bar{y})}(d)$.

Let $\bar{y} + \lambda_0 d \in V_0$ with $\lambda_0 > 0$, and let $x_0 \in X$ satisfy $g(x_0, \bar{y} + \lambda_0 d) < 0$ and $h(x_0, \bar{y} + \lambda_0 d) \le 0$. Define $w_0$ as $(x_0 - \bar{x})/\lambda_0$. Then it is clear that $w_0 \in H_{(\bar{x},\bar{y})}(d)$. Since $g$ is convex and differentiable, it follows that $\nabla_x g_k(\bar{x}, \bar{y}) \cdot w_0 + \nabla_y g_k(\bar{x}, \bar{y}) \cdot d < 0$, and $\nabla_x h_j(\bar{x}, \bar{y}) \cdot w_0 + \nabla_y h_j(\bar{x}, \bar{y}) \cdot d \le 0$, for all $k$ ($j$) such that $g_k(\bar{x}, \bar{y}) = 0$ [$h_j(\bar{x}, \bar{y}) = 0$].

Choose an element $z \in L_{(\bar{x},\bar{y})}(d)$. Set $w(\theta) = \theta w_0 + (1 - \theta)z$. By construction, it follows that $\nabla_x g_k(\bar{x}, \bar{y}) \cdot w(\theta) + \nabla_y g_k(\bar{x}, \bar{y}) \cdot d < 0$, and $\nabla_x h_j(\bar{x}, \bar{y}) \cdot w(\theta) + \nabla_y h_j(\bar{x}, \bar{y}) \cdot d \le 0$, for all $k$ ($j$) such that $g_k(\bar{x}, \bar{y}) = 0$ [$h_j(\bar{x}, \bar{y}) = 0$], and for $0 < \theta \le 1$. Hence, by Taylor's theorem applied to $g_k$ and by the affineness of $h$, it follows that $g[\bar{x} + \lambda w(\theta), \bar{y} + \lambda d] \le 0$ and $h[\bar{x} + \lambda w(\theta), \bar{y} + \lambda d] \le 0$ for $0 < \lambda \le \bar{\lambda}(\theta)$, for some $\bar{\lambda}(\theta) > 0$, and for $0 < \theta \le 1$. In other words, $w(\theta) \in H_{(\bar{x},\bar{y})}(d)$, for $0 < \theta \le 1$. The construction of $w(\theta)$ shows that $\lim_{\theta \to 0^+} w(\theta) = z$, implying that $z \in H_{(\bar{x},\bar{y})}(d)$. Since $z$ is an arbitrary element of $L_{(\bar{x},\bar{y})}(d)$, it must be that $L_{(\bar{x},\bar{y})}(d) \subset H_{(\bar{x},\bar{y})}(d)$. The reverse inclusion is obvious, and (a) is established.

Conclusion (b) is established by observing that any such direction $d$ must satisfy the assumptions of (a). Finally, if $\bar{y} \in \mathrm{int}(V_0)$, then $\mathrm{aff}(V_0) = R^p$, and (c) follows from (b).

If $\bar{x}$ is not in the interior of $X$, then (17a) requires the additional constraint on $w$ that $\bar{x} + \lambda w \in X$, for some $\lambda > 0$. The interiority assumption eliminates this awkward condition. In many cases, $X$ can be described by a system of differentiable inequalities and subsumed under the $g$ and $h$ constraints. In such cases, it is consistent with the objective of obtaining a useful characterization of $\hat{v}(\bar{y}|d)$ to let $X = R^n$.

The chief advantage of (17) or (17a) over the characterizations of Sections 1.1 and 1.2 is that (17) and (17a) are valid for any $\bar{x} \in M(\bar{y})$, and no optimization over the set $M(\bar{y})$ is required. Simple examples can be constructed to show that the optimization over $M(\bar{y})$ is truly required in Sections 1.1 and 1.2. The fact that this is avoided in the completely convex case reveals the powerful regularity properties associated with this convexity assumption. The full explanation of this apparent anomaly is contained in the following theorem, which may have other areas of application.

THEOREM 4. Suppose that $f$ is concave and differentiable in $(x, y)$, and that $X$ is convex. Define $v(y) \equiv \sup_{x \in X} f(x, y)$ and $M(y) \equiv \{x \in X \mid v(y) \le f(x, y)\}$. Then

$\nabla_y f(x^1, y) = \nabla_y f(x^2, y)$, for all $x^1, x^2 \in M(y)$.

Note that the conclusion is concerned with the gradient of f with respect to y, although the optimization has been performed with respect to x.
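Before turning to the proof, a quick numerical illustration (constructed for this note, not taken from the paper) may help: for the jointly concave $f(x, y) = 2y - (x_1 + x_2 - y)^2$ with $X = R^2$, the solution set $M(y)$ is the entire line $x_1 + x_2 = y$, yet $\nabla_y f$ takes the same value at every maximizer, exactly as the theorem asserts.

```python
# f(x, y) = 2y - (x1 + x2 - y)^2 is jointly concave in (x, y) with X = R^2.
# For fixed y, every x with x1 + x2 = y is optimal, so M(y) is a whole line,
# yet grad_y f(x, y) = 2 + 2*(x1 + x2 - y) equals 2 at every point of M(y).
grad_y_f = lambda x, y: 2.0 + 2.0 * (x[0] + x[1] - y)

for x in ([0.0, 1.0], [1.0, 0.0], [-3.0, 4.0]):   # three points of M(1.0)
    print(grad_y_f(x, 1.0))                        # 2.0 each time
```

Here $v(y) = 2y$, so $\nabla v(y) = 2 = \nabla_y f(x, y)$ for every $x \in M(y)$, consistent with the remark following the proof.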

Proof. The tedious proof consists of a straightforward application of Taylor's theorem and the properties of concave functions. Recall that differentiable concave functions are continuously differentiable [Corollary 25.5.1; reference 32].

Let $x^1, x^2 \in M(y)$ with $x^1 \ne x^2$. Of course, if $M(y)$ does not contain distinct points, the conclusion is trivial. By the concavity of $f$ and the convexity of $X$, it follows that

$f[x^1 + \lambda(x^2 - x^1), y] = v(y)$, for $0 \le \lambda \le 1$. \qquad (18)

Hence,

$\nabla_x f(x^1, y) \cdot (x^2 - x^1) = 0 = \nabla_x f(x^2, y) \cdot (x^2 - x^1)$. \qquad (19)

Using these facts, the proof proceeds by contradiction. Assume that $\nabla_y f(x^1, y) \ne \nabla_y f(x^2, y)$. Then there is a vector $z$ such that $\|z\| = 1$ and

$\nabla_y f(x^2, y) \cdot z < \nabla_y f(x^1, y) \cdot z$. \qquad (20)

Now, apply a first-order Taylor expansion ($T$) of $f(x, y)$ around the points $(x^1, y)$ and $(x^2, y)$, and in the directions $\lambda[(x^2 - x^1), z]$ and $-\lambda[(x^2 - x^1), z]$, respectively, where $\lambda > 0$. Hence,

$T\{f[x^1 + \lambda(x^2 - x^1), y + \lambda z]\} = f(x^1, y) + \lambda \nabla_x f(x^1, y) \cdot (x^2 - x^1) + \lambda \nabla_y f(x^1, y) \cdot z$,

and

$T\{f[x^2 - \lambda(x^2 - x^1), y - \lambda z]\} = f(x^2, y) - \lambda \nabla_x f(x^2, y) \cdot (x^2 - x^1) - \lambda \nabla_y f(x^2, y) \cdot z$.

Therefore, invoking (19), we have

$T\{f[x^1 + \lambda(x^2 - x^1), y + \lambda z]\} = f(x^1, y) + \lambda \nabla_y f(x^1, y) \cdot z$, \qquad (21)

and

$T\{f[x^2 - \lambda(x^2 - x^1), y - \lambda z]\} = f(x^2, y) - \lambda \nabla_y f(x^2, y) \cdot z$. \qquad (22)

If the remainder term for the first expansion is denoted by $R^1(\lambda)$, and the second by $R^2(\lambda)$, then by combining (21) and (22), it follows that

$\alpha(\lambda) \equiv f[x^1 + \lambda(x^2 - x^1), y + \lambda z] - f(x^1, y) + f[x^2 - \lambda(x^2 - x^1), y - \lambda z] - f(x^2, y)$ \qquad (23)

$= \lambda \nabla_y f(x^1, y) \cdot z + R^1(\lambda) - \lambda \nabla_y f(x^2, y) \cdot z + R^2(\lambda)$

$= \lambda[\nabla_y f(x^1, y) \cdot z - \nabla_y f(x^2, y) \cdot z] + R^1(\lambda) + R^2(\lambda)$. \qquad (24)

Since $\lambda^{-1}[R^1(\lambda) + R^2(\lambda)] \to 0$ as $\lambda \to 0^+$, it follows directly from (20), (23), and (24) that $\alpha(\lambda) > 0$, for $\lambda$ small enough. Therefore,

$\tfrac{1}{2}[f(x^1, y) + f(x^2, y)] < \tfrac{1}{2}\{f[x^1 + \lambda(x^2 - x^1), y + \lambda z] + f[x^2 - \lambda(x^2 - x^1), y - \lambda z]\}$. \qquad (25)

Now, using the concavity of $f$ and the fact that

$\tfrac{1}{2}[x^1 + \lambda(x^2 - x^1), y + \lambda z] + \tfrac{1}{2}[x^2 - \lambda(x^2 - x^1), y - \lambda z] = [x^1 + \tfrac{1}{2}(x^2 - x^1), y]$,


it follows from (25) that

$f[x^1 + \tfrac{1}{2}(x^2 - x^1), y] > \tfrac{1}{2}[f(x^1, y) + f(x^2, y)]$.

But this contradicts (18), so it must be that $\nabla_y f(x^1, y) = \nabla_y f(x^2, y)$.

Theorem 4 explains the apparent divergence between Corollaries 2.1 and 3.1. If, in Theorem 2, $f$ and $-g$ are jointly concave, then the vector $\nabla_y f(x, \bar{y}) - u \cdot \nabla_y g(x, \bar{y})$ is constant over $M(\bar{y})$, for all $u \in U(\bar{y})$. Therefore, the optimization over $M(\bar{y})$ is redundant and can be deleted from Theorem 2 and Corollary 2.1.

A similar application of Theorem 4 is found for the problem studied in Section 1.1. If, in Theorem 1, $X$ is convex and $f$ is jointly concave and differentiable, then the extremal-value function $v$ is differentiable, and $\nabla v(\bar{y}) = \nabla_y f(x, \bar{y})$, for any $x \in M(\bar{y})$.

The availability of a number of different characterizations for the directional derivatives of extremal-value functions points to many applications involving the design of algorithms for optimizing $v$ over some set. Some of the difficulties associated with the development of such convergent algorithms are discussed in the next section.

2. OBSTACLES TO CONVERGENCE WITHOUT DIFFERENTIABILITY

THE APPLICATION of the directional derivative to the development of a convergent algorithm for optimizing an extremal-value function is not a trivial matter. To motivate the discussion in the next section, it is useful to indicate why the naive extension of a differentiable feasible-direction algorithm may not converge when applied to an extremal-value problem.

Recall the case considered in Theorem 1, where $X$ is compact and

$v(y) = \max_{x \in X} f(x, y)$. \qquad (26)

The design of algorithms for approximating a point $y^0$ that minimizes $v(y)$ over some set $Y$ (the standard minimax problem) has received a great deal of attention [see references 3, 4, 7-9, 23], in particular in a very general context by Levitin.[27] For concreteness, assume that the natural extension of the FRANK-WOLFE algorithm[12] is applied to seek a minimizing $y$. This would be:

Step 1. Choose $y^1 \in Y$. Let $k = 1$.

Step 2. Determine $z^k \in Y$ that minimizes $\hat{v}(y^k \mid z - y^k)$ over $z \in Y$. Let $d^k = z^k - y^k$.

Step 3. Let $t_k$ minimize $v(y^k + t d^k)$ over $[0, 1]$. Set $y^{k+1} = y^k + t_k d^k$. Go to Step 2 with $k$ replaced by $k + 1$.

If $v$ is a differentiable convex function, and $Y$ is compact and convex, then this algorithm can be shown to be convergent in the sense that the accumulation points of $\{y^k\}$ are minimizers of $v$ over $Y$. However, the method may not converge when $\hat{v}(y \mid \cdot)$ is not linear. Consider the equivalent representation of $v(y)$: $\min y_0$ subject to $f(x, y) \le y_0$, for all $x \in X$. The minimization of $v(y)$ is then equivalent to solving

$\min_{y_0,\, y \in Y} y_0$ subject to $f(x, y) \le y_0$, for all $x \in X$. \qquad (27)

This is a standard mathematical programming problem, except that it may have an infinite number of constraints. Under the assumptions of Theorem 1, the direction-finding problem in Step 2 is equivalent to

$\min_{z \in Y} \max_{x \in M(y^k)} \nabla_y f(x, y^k) \cdot (z - y^k)$,

which can be rewritten as

$\min_{z_0,\, z \in Y} z_0$ subject to $\nabla_y f(x, y^k) \cdot (z - y^k) \le z_0$, for $x \in M(y^k)$. \qquad (28)

If $y^k \in \mathrm{int}(Y)$, then (28) is Zangwill's version[40] of ZOUTENDIJK'S direction-finding problem[41] in a feasible-direction algorithm applied to (27), but with a different normalization on $z - y^k$ and no $\epsilon$-perturbation techniques. The analogue of the $\epsilon$-perturbation techniques for (28) would include the $\epsilon$-optimal solutions for the inner problem, not just the optimal solutions as identified by $M(y^k)$. Even when there is a finite number of constraints in (27), it is well known that Zoutendijk's method may not converge without some form of $\epsilon$-perturbation or anti-jamming technique.[40, 41] Given this analogy, it is not surprising that the natural extension of the Frank-Wolfe method may not converge. In addition, the difficulties that need to be overcome are identified. It is, therefore, not unexpected that the convergent algorithms that have been designed for this problem employ approximations for $\hat{v}(\bar{y}|d)$ that, when viewed in terms of the analogy drawn for (27), can be interpreted as $\epsilon$-perturbation or anti-jamming procedures.
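To make (28) concrete, here is a small sketch (an added illustration, not from the paper) that assembles the direction-finding linear program for a finite $M(y^k)$ over a box $Y$; scipy.optimize.linprog is an assumed tool. It is applied to the same two-maximizer example as in the earlier Danskin sketch, where $y^k = 0$ is the minimax solution.

```python
import numpy as np
from scipy.optimize import linprog

def direction_lp(grads, yk, Y_bounds):
    """Direction-finding problem (28): minimize z0 over (z0, z), z in a box Y,
       subject to grad_y f(x, y^k) . (z - y^k) <= z0 for each x in M(y^k).
       `grads` stacks one gradient row per element of M(y^k)."""
    G = np.atleast_2d(np.asarray(grads, dtype=float))
    c = np.r_[1.0, np.zeros(G.shape[1])]       # variables: (z0, z)
    A_ub = np.c_[-np.ones(G.shape[0]), G]      # G z - z0 <= G y^k
    b_ub = G @ np.asarray(yk)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] + list(Y_bounds))
    return res.x[0], res.x[1:] - np.asarray(yk)

# Two maximizers with gradients -2 and +2 at y^k = 0 (the kinked example),
# Y = [-1, 1]: the optimal value is 0, so no improving direction exists.
z0, d = direction_lp([[-2.0], [2.0]], [0.0], [(-1.0, 1.0)])
print(z0, d)   # 0.0 [0.]
```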

For example, as an approximation to (28), Demyanov[3] uses

$\min_{z_0,\, z \in Y} z_0$ subject to $\nabla_y f(x, y^k) \cdot (z - y^k) \le z_0$, for $x \in M^*(y^k, \epsilon_k)$,

where $M^*(y^k, \epsilon_k) \equiv \{x \in X \mid v(y^k) \le f(x, y^k) + \epsilon_k\}$.

From another point of view, some modification must be made to ensure certain continuity properties for the direction-finding problem. Wolfe[39] observes that some type of continuity of $\hat{v}$ is needed. Indeed, if $\hat{v}(\cdot \mid \cdot)$ is continuous in both its arguments, then convergence can be easily demonstrated. As the following result indicates, such a condition is unlikely, at least for extremal-value functions.

PROPOSITION 1.[20] Suppose $f$ is a proper concave function. Then $(x, d) \in [\mathrm{int}(\mathrm{dom}\, f) \times R^n]$ is a point of continuity of $\hat{f}(\cdot \mid \cdot)$ if and only if $\hat{f}(x \mid d) = -\hat{f}(x \mid -d)$.

For supremal-value functions it turns out that most points of interest do not satisfy the requirements of Proposition 1. Near such points, the direction-finding problem of Step 2 will be ill behaved in the sense that it will not have the closed-map property that Zangwill[40] has shown to be of such great importance in the establishment of convergence.

These two arguments offer some insight into the difficulties attending the development of a convergent algorithm for extremal-value functions. Although the discussion has focused on (26), the arguments apply, mutatis mutandis, to other supremal-value functions, even in the completely convex case. An example of a completely convex problem for which the above natural extension of the Frank-Wolfe method need not converge was presented in reference 20.

We now turn to the development of a convergent algorithm for the completely convex case.

3. A CONVERGENT ALGORITHM

CONSIDER A PROBLEM of the form

$\max_{x, y} f(x, y)$, subject to $g(x, y) \le 0$, $h(x, y) = 0$, $y \in Y$, \qquad (29)

where $f: R^n \times R^p \to R^1$, $g: R^n \times R^p \to R^m$, and $h: R^n \times R^p \to R^l$. If (29) is relatively easy to solve when $y$ is fixed, as when $f$, $g$, and $h$ have a multiechelon structure that decomposes on $y$, then it is natural to project on the complicating variables $y$ to obtain the equivalent problem

$\max_{y \in Y} v(y)$, \qquad (30)

where

$v(y) = \sup_x f(x, y)$, subject to $g(x, y) \le 0$, $h(x, y) = 0$.

The purpose of this section is to capitalize on the results of Section 1 and present a convergent feasible-direction algorithm for obtaining a solution to (30) when -f and g are differentiable convex functions, h is affine, and Y is a convex set. The algorithm to be presented is a version of the Frank-Wolfe method, but many other known methods could be adapted in a similar manner.

Some modifications must be made to provide continuity properties for the direction-finding problem. As discussed in Section 2, the most popular type of modification has been to approximate the directional derivative in a manner that can be interpreted as an antijamming procedure. This approach is also applied here. Recalling the representation of $\hat{v}(\bar{y}|d)$ found in (17a), the function $\hat{v}_{r,x}(\bar{y}|d)$ is defined as

$\hat{v}_{r,x}(y|d) \equiv \nabla_y f(x, y) \cdot d + \sup_w \nabla_x f(x, y) \cdot w$, \qquad (31)

subject to

$r\, g(x, y) + \nabla_x g(x, y) \cdot w \le -\nabla_y g(x, y) \cdot d$, \quad $\nabla_x h(x, y) \cdot w = -\nabla_y h(x, y) \cdot d$, \quad $\|w\|_\infty \le r\sigma$.

The scalar $r$ can be viewed as a relaxation parameter, and it is always restricted so that $r \ge 1$. The constant $\sigma$ is chosen to be a large positive number. This bound on the norm of $w$ is probably not necessary, but it is a sufficient condition that prevents certain pathological behavior in the direction-finding problem. The $l_\infty$ norm is chosen merely to preserve the linearity of the problem in (31). Since the equality constraints in (29) can be represented as two inequality constraints, it is easy to see that (17a) and (31) are equivalent when $x \in M(y)$ and $r = +\infty$.
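As an illustration of how the approximation (31) might be assembled in practice, the following sketch casts it as an explicit linear program in $w$. The use of scipy.optimize.linprog, the argument names, and the default values of r and sigma are assumptions of this sketch, not prescriptions from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def v_rx_hat(grad_y_f, grad_x_f, g_val, Gx, Gy, Hx, Hy, d, r=2.0, sigma=1e3):
    """Evaluate the approximation (31),
         v_{r,x}(y|d) = grad_y f . d + max_w grad_x f . w,
       subject to  r*g(x,y) + Gx w <= -Gy d,  Hx w = -Hy d,  ||w||_inf <= r*sigma,
       where Gx, Gy (Hx, Hy) are the Jacobians of g (h) with respect to x and y."""
    c = -np.asarray(grad_x_f, dtype=float)        # linprog minimizes, so negate
    b_ub = -np.asarray(Gy) @ np.asarray(d) - r * np.asarray(g_val)
    A_eq = np.asarray(Hx) if Hx is not None else None
    b_eq = -np.asarray(Hy) @ np.asarray(d) if Hy is not None else None
    bounds = [(-r * sigma, r * sigma)] * len(c)   # the l_inf bound on w
    res = linprog(c, A_ub=np.asarray(Gx), b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    assert res.success
    return np.asarray(grad_y_f) @ np.asarray(d) - res.fun

# Smoke test: one inactive g constraint (g = -0.5), no h; the answer is 3.0.
print(v_rx_hat([1.0], [2.0], [-0.5], [[1.0]], [[0.0]], None, None, [1.0]))
```

Note how an inactive constraint ($g < 0$) is relaxed by the term $r\,g$, which is the antijamming mechanism discussed above.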

The function $\hat{v}_{r,x}(y|d)$ is employed as an approximation of $\hat{v}(y|d)$. As the following lemma demonstrates, this approximation can be made arbitrarily close.

In the conventions established for (29), the appropriate forms of the set $V_0$ and map $M$ are:

$V_0 = \{y \mid g(x, y) < 0$ and $h(x, y) = 0$, for some $x \in R^n\}$, and

$M(y) = \{x \mid g(x, y) \le 0,\ h(x, y) = 0$, and $v(y) \le f(x, y)\}$.

LEMMA 3. Suppose $-f$ and $g$ are differentiable convex functions, $h$ is affine, $\bar{y} \in Y$, $Y$ is convex, $Y \subset \mathrm{ri}(V_0)$, $\bar{x} \in M(\bar{y})$, $M(\bar{y})$ is bounded, $r \ge 1$, and $\sigma$ is greater than the diameter of $M(\bar{y})$. Then the following conditions hold:

(a) $\hat{v}_{r,\bar{x}}(\bar{y} \mid y - \bar{y}) \le \hat{v}(\bar{y} \mid y - \bar{y})$, for all $y \in \mathrm{aff}(V_0)$.

(b) $\hat{v}_{r,\bar{x}}(\bar{y} \mid d)$ is nondecreasing in $r$, and for each $y \in \mathrm{aff}(V_0)$ there is a number $r(y) < \infty$ such that $\hat{v}_{r,\bar{x}}(\bar{y} \mid y - \bar{y}) = \hat{v}(\bar{y} \mid y - \bar{y})$, for $r \ge r(y)$.

(c) The following three statements are equivalent:

(c.1) $\sup_{y \in Y} \hat{v}_{r,\bar{x}}(\bar{y} \mid y - \bar{y}) = 0$, for $r \ge 1$.
(c.2) $\sup_{y \in Y} \hat{v}(\bar{y} \mid y - \bar{y}) = 0$.
(c.3) The point $\bar{y}$ is an optimal solution for (30).

Proof. Condition (a) is easily established by applying Corollary 3.2(b) and observing that the feasible $w$ in (31) are a subset of those in (17a). Since $\bar{x} \in M(\bar{y})$, it must be that $g(\bar{x}, \bar{y}) \le 0$. Therefore, as $r$ increases, the feasible set in (31) cannot decrease, thus establishing the first part of (b).

For the remainder of the proof it is necessary to demonstrate that $M(y) \ne \emptyset$ in a neighborhood of $\bar{y}$ relative to $\mathrm{aff}(V_0)$. Theorems A.5 and A.7 show that the map $\Omega$ is continuous on a neighborhood of $\bar{y}$ relative to $\mathrm{aff}(V_0)$, where $\Omega(y) = \{x \mid g(x, y) \le 0$ and $h(x, y) = 0\}$. Hence, by Theorem A.4, it follows that $M(y)$ is nonempty and uniformly bounded in a neighborhood of $\bar{y}$ relative to $\mathrm{aff}(V_0)$. This implies that $v[\bar{y} + \lambda(y - \bar{y})]$ is finite for $y \in \mathrm{aff}(V_0)$ and for $\lambda$ small enough. Since $v$ is concave, $\hat{v}(\bar{y} \mid y - \bar{y})$ is finite for any $y \in \mathrm{aff}(V_0)$ [Theorem 23.4 of reference 32]. Hence, the linear program in (17a) must have an optimal solution $w_0$. But for $r$ sufficiently large, $w_0$ is also feasible in (31). In conjunction with (a), this establishes the remainder of (b).

The fact that (c.2) and (c.3) are equivalent is well known (under much weaker assumptions); hence, it suffices to show that (c.2) $\Rightarrow$ (c.1) $\Rightarrow$ (c.3). Now $\bar{y} \in Y$ and $\hat{v}_{r,\bar{x}}(\bar{y} \mid \bar{y} - \bar{y}) \ge 0$. Therefore, condition (a) establishes that

$0 \le \hat{v}_{r,\bar{x}}(\bar{y} \mid \bar{y} - \bar{y}) \le \sup_{y \in Y} \hat{v}_{r,\bar{x}}(\bar{y} \mid y - \bar{y}) \le \sup_{y \in Y} \hat{v}(\bar{y} \mid y - \bar{y}) = 0$,

to show that (c.2) $\Rightarrow$ (c.1).

Finally, assume that (c.1) holds but (c.3) does not. Then there must exist a point $y^* \in Y$ in any neighborhood of $\bar{y}$ such that $M(y^*)$ is not empty and $x^* \in M(y^*)$ implies $f(x^*, y^*) = v(y^*) > v(\bar{y}) = f(\bar{x}, \bar{y})$. Therefore, from the convexity of $-f$, $g$, and $h$ it can be shown with the usual gradient inequalities that

$\nabla_x f(\bar{x}, \bar{y}) \cdot (x^* - \bar{x}) + \nabla_y f(\bar{x}, \bar{y}) \cdot (y^* - \bar{y}) > 0$,

$g(\bar{x}, \bar{y}) + \nabla_x g(\bar{x}, \bar{y}) \cdot (x^* - \bar{x}) + \nabla_y g(\bar{x}, \bar{y}) \cdot (y^* - \bar{y}) \le 0$,

and

$\nabla_x h(\bar{x}, \bar{y}) \cdot (x^* - \bar{x}) + \nabla_y h(\bar{x}, \bar{y}) \cdot (y^* - \bar{y}) = 0$.

The continuity of $\Omega$ (above) and Theorem A.2 provide that $M$ is closed at $\bar{y}$. Since $M(y)$ is uniformly bounded near $\bar{y}$, $y^*$ can be chosen sufficiently close to $\bar{y}$ to ensure that $\|x^* - \bar{x}\|_\infty \le \sigma$. Therefore, it must be that $\hat{v}_{1,\bar{x}}(\bar{y} \mid y^* - \bar{y}) > 0$. Given (b), this contradicts (c.1). Hence, (c.3) holds and the proof is complete.

Any step in an algorithm that calls for $\hat{v}(y|d)$ can be executed with $\hat{v}_{r,x}(y|d)$ and, if $r$ is large enough, the information obtained will be essentially the same. In fact, if $r$ is large enough, $\hat{v}(y|d)$ and $\hat{v}_{r,x}(y|d)$ are exactly the same number, except in the vicinity of points where jamming may occur. The pathological nature of nonconvergence when $\hat{v}(y|d)$ is used is clearly revealed by the fact that convergence can be demonstrated for any value of $r$ ($1 \le r < \infty$) when $\hat{v}_{r,x}(y|d)$ is used.

Although the inner problem is relatively easy to solve, it may be difficult to obtain an exact solution. To demonstrate the robustness of the algorithm, it is therefore appropriate to work with the set of $\epsilon$-optimal solutions for the inner problem. These solutions are identified by the map $M^*$:

$M^*(y, \epsilon) \equiv \{x \mid g(x, y) \le 0,\ h(x, y) = 0$, and $v(y) \le f(x, y) + \epsilon\}$.

It is assumed that an element of $M^*(y, \epsilon)$ can be obtained in a finite number of steps whenever $y \in Y$ and $\epsilon > 0$.

With these preliminaries the modified version of the Frank-Wolfe algorithm can be stated.


MODIFIED FRANK-WOLFE ALGORITHM (FW)

Step 1. Choose $y^1 \in Y$, $\epsilon_1 > 0$, $r \ge 1$, and let $k = 1$.

Step 2. Choose $x^k \in M^*(y^k, \epsilon_k)$ and determine an element $z^k \in Y$ such that $z^k$ maximizes $\hat{v}_{r,x^k}(y^k \mid z - y^k)$ over $z \in Y$. Let $d^k = z^k - y^k$.

Step 3. Determine an element $t_k \in [0, 1]$ such that $t_k$ maximizes $v(y^k + t d^k)$ over $[0, 1]$. Let $y^{k+1} = y^k + t_k d^k$, $\epsilon_{k+1} = 0.5\,\epsilon_k$, and add 1 to $k$. Go to Step 2.
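The following minimal sketch (illustrative only, not the author's implementation) runs Steps 1-3 on a toy completely convex instance of (29): maximize $f(x, y) = -(x - 1)^2 - (y - 0.5)^2$ subject to $g(x, y) = x - y \le 0$ with $Y = [0, 2]$. The inner problem and the one-dimensional version of the LP (31) are solved in closed form, and grid search stands in for the direction-finding and line-search subproblems.

```python
import numpy as np

# Toy completely convex instance of (29) (illustrative, not from the paper):
#   v(y) = max_x  -(x - 1)^2 - (y - 0.5)^2   s.t.  g(x, y) = x - y <= 0,
# with Y = [0, 2].  The optimum of (30) is y* = 0.75, v(y*) = -0.125.
def inner_solve(y):
    """Inner problem in closed form: x* = min(y, 1); here eps_k = 0."""
    x = min(y, 1.0)
    return x, -(x - 1.0) ** 2 - (y - 0.5) ** 2

def v(y):
    return inner_solve(y)[1]

def v_approx(x, y, d, r=2.0, sigma=10.0):
    """One-dimensional version of the relaxed LP (31), solved in closed form:
       max_w gyf*d + gxf*w  s.t.  r*g + gxg*w <= -gyg*d,  |w| <= r*sigma."""
    gyf, gxf = -2.0 * (y - 0.5), -2.0 * (x - 1.0)
    w_hi = min(d - r * (x - y), r * sigma)   # g = x - y, grad_x g = 1, grad_y g = -1
    w = w_hi if gxf >= 0 else -r * sigma     # maximize gxf * w over the interval
    return gyf * d + gxf * w

Z_GRID = np.linspace(0.0, 2.0, 401)          # Y = [0, 2]
T_GRID = np.linspace(0.0, 1.0, 401)
y = 0.0                                       # Step 1
for k in range(30):
    x, _ = inner_solve(y)                                      # Step 2
    z = max(Z_GRID, key=lambda zz: v_approx(x, y, zz - y))
    d = z - y
    t = max(T_GRID, key=lambda tt: v(y + tt * d))              # Step 3
    y = y + t * d

print(y, v(y))   # converges to 0.75, -0.125
```

At the optimum, $\hat{v}_{r,x}(\bar{y} \mid y - \bar{y}) \le 0$ for every $y \in Y$, so the iterates stop moving, consistent with Lemma 3(c).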

Our attention is focused on the accumulation points of $\{y^k\}$.

THEOREM 5. Suppose that $-f$ and $g$ are differentiable convex functions, $h$ is affine, $Y$ is a nonempty compact convex set, $Y \subset \mathrm{ri}(V_0)$, $r \ge 1$, the set of feasible solutions to (29) is compact, and $\sigma$ is greater than the diameter of this set. Then every accumulation point of the sequence $\{y^k\}$ is an optimal solution to (30).

Proof. As in the proof of Lemma 3, Theorems A.5 and A.7 yield the continuity of the map $\Omega$ that identifies the set of feasible solutions for the inner problem. Using the compactness of the feasible solutions for (29), Theorem A.1 provides the continuity of $v$ relative to $Y$. The algorithm (FW) is well defined, and the compactness of $Y$ implies that $\{y^k\}$ has at least one accumulation point $\bar{y}$. Since $\{v(y^k)\}$ is nondecreasing, it must be that $v(y^k) \to v(\bar{y})$. Further, by construction,

$v(y^{k+1}) = \max_{t \in [0,1]} v(y^k + t d^k)$, for all $k$.

By subsequencing if necessary, we can assume that $y^k \to \bar{y}$ and $d^k \to \bar{d}$. Applying Theorem A.1 again yields $v(\bar{y}) \ge \max_{t \in [0,1]} v(\bar{y} + t\bar{d})$. Therefore, $\hat{v}(\bar{y} \mid \bar{d}) \le 0$.

Because of Lemma 3(a), we have $\hat{v}_{r,x}(\bar{y} \mid \bar{d}) \le 0$, for any $x \in M^*(\bar{y}, 0)$, because the feasible set in (29) is compact. From Lemma 3(c), the proof will be complete if it can be shown that, for some $\bar{x} \in M^*(\bar{y}, 0)$, $\hat{v}_{r,\bar{x}}(\bar{y} \mid \bar{d}) \ge \hat{v}_{r,\bar{x}}(\bar{y} \mid y - \bar{y})$, for all $y \in Y$, because $\bar{y} + \bar{d} \in Y$. To this end, consider the point-to-set map $D$, defined as

$D(x, y) \equiv \{d \mid y + d \in Y,\ \hat{v}_{r,x}(y \mid d) \ge \hat{v}_{r,x}(y \mid z - y)$, for all $z \in Y\}$.

It suffices to show that $\bar{d} \in D(\bar{x}, \bar{y})$ for some $\bar{x} \in M^*(\bar{y}, 0)$. Define the function $c$ as

$c(x, y, d, w) \equiv \nabla_y f(x, y) \cdot d + \nabla_x f(x, y) \cdot w$,

and the point-to-set map $P$ as

$P(x, y) \equiv \{(d, w) \mid y + d \in Y$ and $(d, w)$ satisfy the constraints in (31)$\}$.

Now, $d \in D(x, y)$ if and only if there is a $w$ such that $(d, w)$ is an optimal solution of the problem

$\max_{(d,w) \in P(x,y)} c(x, y, d, w)$. \qquad (32)

Since $f$ and $g$ are continuously differentiable [Corollary 25.5.1; reference 32], $c$ is continuous in all its arguments, and $P$ is closed everywhere [Theorem A.5]. Assume for the moment that $P$ is also open on $M^*(\bar{y}, 0) \times \bar{y}$ and consider the sequence $\{(x^k, y^k, d^k, w^k)\}$ generated by the algorithm. It is bounded. (Note that this is the only time the assumption that $\|w\|_\infty \le r\sigma$ is required.) Taking subsequences if necessary, it must be that $(x^k, y^k, d^k, w^k) \to (\bar{x}, \bar{y}, \bar{d}, \bar{w})$. By construction, $(d^k, w^k)$ is contained within the set of optimal solutions to (32) at the point $(x^k, y^k)$. Furthermore, $\epsilon_k \to 0$, and $x^k \in M^*(y^k, \epsilon_k)$. Therefore, it follows that $\bar{x} \in M^*(\bar{y}, 0)$ [Theorem A.3].


Since $P$ is continuous on $M^*(\bar{y}, 0) \times \bar{y}$, Theorem A.2 yields the fact that $(\bar{d}, \bar{w})$ solves (32) at the point $(\bar{x}, \bar{y})$. But this implies that $\bar{d} \in D(\bar{x}, \bar{y})$ with $\bar{x} \in M^*(\bar{y}, 0)$. Hence, the proof will be complete once it is shown that $P$ is open on $M^*(\bar{y}, 0) \times \bar{y}$.

Actually, using the fact that $Y \subset \mathrm{ri}(V_0)$, we shall establish that $P$ is open on the set of feasible solutions to (29). To this end, let $\{(x^j, y^j)\}$ be an arbitrary sequence of feasible points in (29) such that $(x^j, y^j) \to (\bar{x}, \bar{y})$, and let $(\bar{d}, \bar{w}) \in P(\bar{x}, \bar{y})$. It remains to establish that there is a sequence $(d^j, w^j) \in P(x^j, y^j)$ such that $(d^j, w^j) \to (\bar{d}, \bar{w})$.

Since $Y$ is nonempty, $\mathrm{ri}(Y)$ is nonempty [Theorem 6.2; reference 32]. Choose a point $\hat{y} \in \mathrm{ri}(Y)$. Let $\hat{x}$ be a point such that $g(\hat{x}, \hat{y}) < 0$ and $h(\hat{x}, \hat{y}) = 0$. Define $w(\theta)$ and $d(\theta)$ as $w(\theta) \equiv \theta(\hat{x} - \bar{x}) + (1 - \theta)\bar{w}$, and $d(\theta) \equiv \theta(\hat{y} - \bar{y}) + (1 - \theta)\bar{d}$. Both $(\bar{x}, \bar{y})$ and $(\hat{x}, \hat{y})$ are feasible in (29); therefore,

$\nabla_x h(\bar{x}, \bar{y}) \cdot (\hat{x} - \bar{x}) + \nabla_y h(\bar{x}, \bar{y}) \cdot (\hat{y} - \bar{y}) = 0$.

Further, the selection $(\bar{d}, \bar{w}) \in P(\bar{x}, \bar{y})$ provides $\nabla_x h(\bar{x}, \bar{y}) \cdot \bar{w} + \nabla_y h(\bar{x}, \bar{y}) \cdot \bar{d} = 0$. Since $\nabla h(x, y)$ is constant over $R^n \times R^p$, it follows that $\nabla h(x, y) \cdot [w(\theta), d(\theta)] = 0$, for $0 \le \theta \le 1$, and for any $(x, y)$.

Similarly,

$g(\bar{x}, \bar{y}) + \nabla_x g(\bar{x}, \bar{y}) \cdot (\hat{x} - \bar{x}) + \nabla_y g(\bar{x}, \bar{y}) \cdot (\hat{y} - \bar{y}) < 0$,

and

$r\, g(\bar{x}, \bar{y}) + \nabla_x g(\bar{x}, \bar{y}) \cdot \bar{w} + \nabla_y g(\bar{x}, \bar{y}) \cdot \bar{d} \le 0$.

Therefore,

$r\, g(\bar{x}, \bar{y}) + \nabla_x g(\bar{x}, \bar{y}) \cdot w(\theta) + \nabla_y g(\bar{x}, \bar{y}) \cdot d(\theta) < 0$,

for any $0 < \theta \le 1$. Hence, it follows from the continuity of $g$ and $\nabla g$ that there is a $j(\theta)$ such that

$r\, g(x^j, y^j) + \nabla_x g(x^j, y^j) \cdot w(\theta) + \nabla_y g(x^j, y^j) \cdot d(\theta) < 0$, for any $j \ge j(\theta)$, and for $0 < \theta \le 1$.

Further, since $\hat{y}$ is in $\mathrm{ri}(Y)$, there is a $j^*(\theta)$ such that $y^j + d(\theta) \in Y$, for $j \ge j^*(\theta)$, and for $0 < \theta \le 1$. This follows from the fact that $y^j + d(\theta) = (1 - \theta)(\bar{y} + \bar{d}) + \theta\hat{y} + y^j - \bar{y}$. Therefore, $y^j + d(\theta) \in \mathrm{aff}(Y)$, $y^j + d(\theta) \to \bar{y} + d(\theta)$, and $\bar{y} + d(\theta) \in \mathrm{ri}(Y)$, for $0 < \theta \le 1$.

Collecting these facts, it is clear that, for any $0 < \theta \le 1$ and $j \ge \max\{j(\theta), j^*(\theta)\}$, we have $[d(\theta), w(\theta)] \in P(x^j, y^j)$. Therefore, a sequence $\{\theta_j\}$ can be chosen such that $\theta_j \to 0$ and $[d(\theta_j), w(\theta_j)] \in P(x^j, y^j)$. Letting $(d^j, w^j) = [d(\theta_j), w(\theta_j)]$, we see that $(d^j, w^j) \in P(x^j, y^j)$ and $(d^j, w^j) \to (\bar{d}, \bar{w})$. It is thereby demonstrated that $P$ is open on the set of feasible solutions to (29), and the proof is complete.

Certainly the nature of the sequence $\{\epsilon_k\}$ is arbitrary. The choice of $\epsilon_{k+1} = 0.5\,\epsilon_k$ was made for convenience, but any sequence such that $\epsilon_k \to 0$, including the zero sequence, will suffice. The $\epsilon$-optimal solutions are employed to demonstrate the robustness of the algorithm, but they play no role as an antijamming precaution. The approximation to the directional derivative fulfills this requirement. This is in contrast to the procedures discussed in Section 2, where the $\epsilon$-optimal solutions must be included to ensure convergence.

Complete optimization in Step 3 is not necessary. As long as some positive proportion of the potential gain is achieved, the convergence result will still apply. If $\beta$ is a positive continuous function on $Y$, then $t_k$ need only be chosen so that

$v(y^k + t_k d^k) \ge v(y^k) + \beta(y^k)\{\max_{t \in [0,1]} v(y^k + t d^k) - v(y^k)\}$.

This change does not affect the conclusion that $\hat{v}(\bar{y} \mid \bar{d}) \le 0$, and the remainder of the proof is the same.

The parameter $r$ is also arbitrary ($r \ge 1$) and, in the technical sense, Theorem 5 proves the convergence of a class of algorithms parameterized by $r$. Different values of $r$ yield different improving directions and different sequences $\{y^k\}$. It is interesting to note that, for $g$ linear and $r = 1$, the direction-finding problem in Step 2 is identical to the direction-finding problem in the standard Frank-Wolfe method applied to (29) at the point $(x^k, y^k)$. However, the sequence $\{(x^k, y^k)\}$ is not the same, because in FW $x^{k+1}$ optimizes the inner problem unrestricted by the direction vector, a constraint imposed in the standard method. This observation about the direction-finding problem may be an interesting source of conjectures about other proposed algorithms. Is it always true that a convergent method applied to (29) yields an improving $y$ direction for (30) that preserves convergence?

One of the attractive features of (17a) is that it is a linear program with fewer constraints than the original problem. The characterization in (31) also involves the solution of a linear program, and the savings in the number of constraints will continue to be realized, at least for large values of $r$. Given an efficient method of linear programming, and a large value of $r$, relatively few of the constraints for which $g_j(x^k, y^k) < 0$ will enter into the solution of the direction-finding problem. With restart mechanisms, the computational efficiencies inherent in the structure of the direction-finding problem could be exploited.

4. FURTHER RESEARCH

THE ALGORITHM for the completely convex case is only one of many that could be devised. Zoutendijk's[41] local approximation methods, with a normalization on the direction vector, are logical candidates for adaptation. In addition, the extension of this type of result to the marginally convex case of Section 1.2 would significantly widen the scope of problems that can be studied.

It would also be useful to compare these feasible-direction algorithms with the outer-approximation methods developed for the same problem.[15] This raises the important question of the rate of convergence of the algorithms and the computational efficiency of each step. The efficiencies inherent in the direction-finding problem were discussed above. Clearly the development of efficient methods for solving the step-size problem would have a great impact on the success of a feasible-directions algorithm. Some extension of the parametric methods of Geoffrion[13] would be an important contribution in this regard. This has been done for a special case of (29) by SILVERMAN,[34] but the details for the general problem have not been presented. Alternative approaches that exploit the structure of $v$ further could also be developed. When $v$ is concave, as in Section 3, a number of heuristics are available for accelerating direct-search techniques in the one-dimensional optimization. These are currently being investigated.


APPENDIX

THE FOLLOWING results are taken from the theory of point-to-set maps as surveyed in HOGAN.[21]

A point-to-set map $\Omega$ from a set $Y$ into a set $X$ is a map that associates a subset of $X$ with each point of $Y$. Although more general spaces are possible, it is convenient to assume that $X$ and $Y$ are closed subsets of Euclidean spaces.

$\Omega$ is open at a point $\bar y$ in $Y$ if $\{y^k\} \subset Y$, $y^k \to \bar y$, and $\bar x \in \Omega(\bar y)$ imply the existence of an integer $m$ and a sequence $\{x^k\} \subset X$ such that $x^k \in \Omega(y^k)$ for $k \ge m$ and $x^k \to \bar x$.

$\Omega$ is closed at a point $\bar y$ in $Y$ if $\{y^k\} \subset Y$, $y^k \to \bar y$, $x^k \in \Omega(y^k)$, and $x^k \to \bar x$ imply $\bar x \in \Omega(\bar y)$. $\Omega$ is continuous at a point $\bar y$ in $Y$ if it is both open and closed at $\bar y$. Unless otherwise indicated, all statements about maps or functions are interpreted relative to the appropriate set $X$ or $Y$.
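A standard illustration, not from the paper, may help fix these definitions: the map below is closed at every point of $Y = R$ (with $X = R$) but fails to be open at $\bar y = 0$.

```latex
% Illustrative example (exposition only, not the paper's):
% closed everywhere, but not open at y = 0.
\Omega(y) =
  \begin{cases}
    \{0\},   & y < 0,\\
    [0, 1],  & y \ge 0.
  \end{cases}
```

Taking $y^k = -1/k \to 0$ and $\bar x = 1 \in \Omega(0)$, every choice $x^k \in \Omega(y^k) = \{0\}$ converges to $0$ rather than to $\bar x$, so openness fails at $0$; closedness at each point is immediate.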

The map $\Omega$ is uniformly bounded near $\bar y$ if there is a neighborhood $N(\bar y)$ such that $\bigcup_{y \in N(\bar y)} \Omega(y)$ is bounded.

Let $v(y) = \sup_{x \in \Omega(y)} f(x, y)$, where $f: X \times Y \to [-\infty, +\infty]$, and let

$$M(y) = \{x \in \Omega(y) \mid v(y) \le f(x, y)\}.$$

The function $v$ is a supremal-value function; the point-to-set map $M$ describes its solution set.
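A toy numerical illustration (entirely hypothetical: the choices $\Omega(y) = [0, y]$ for $y \ge 0$ and $f(x, y) = x - x^2$ are assumptions for exposition, not the paper's problem) shows $v$ and $M$ behaving as the theorems below predict.

```python
# Toy supremal-value function (illustrative assumption, not the paper's):
# Omega(y) = [0, y] and f(x, y) = x - x**2, so v(y) = y - y**2 for
# y <= 1/2 and v(y) = 1/4 thereafter, with M(y) = {min(y, 1/2)}.
import numpy as np

def f(x, y):
    return x - x ** 2               # concave in x; y enters only via Omega(y)

def v_and_M(y, grid=2001):
    """Grid approximation of v(y) = max over Omega(y) of f, and its argmax."""
    xs = np.linspace(0.0, y, grid)  # Omega(y) = [0, y]
    vals = f(xs, y)
    i = int(np.argmax(vals))
    return vals[i], xs[i]

for y in (0.25, 0.5, 1.0):
    vy, my = v_and_M(y)
    print(f"y = {y}: v(y) ~ {vy:.4f}, M(y) ~ {{{my:.4f}}}")
```

Here $\Omega$ is continuous and uniformly bounded near each $y > 0$ and $f$ is continuous, so the continuity of $v$ and the closedness of $M$ are exactly what Theorems A.1 and A.2 below guarantee.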

THEOREM A.1. If $\Omega$ is continuous at $\bar y$, uniformly bounded near $\bar y$, and $f$ is continuous on $\Omega(\bar y) \times \bar y$, then $v$ is continuous at $\bar y$.

Proof. Theorem 7 in reference 21.

THEOREM A.2. If $\Omega$ is continuous at $\bar y$ and $f$ is continuous on $\Omega(\bar y) \times \bar y$, then $M$ is closed at $\bar y$.

Proof. Theorem 8 of reference 21.

A slightly more general version of Theorem A.2 can be applied to the map

$$M^*(y, \epsilon) = \{x \in \Omega(y) \mid v(y) \le f(x, y) + \epsilon\}.$$

THEOREM A.3. If $\Omega$ is continuous at $\bar y$ and $f$ is continuous on $\Omega(\bar y) \times \bar y$, then $M^*$ is closed on $Y \times R$.

Proof. Let $x^k \in M^*(y^k, \epsilon_k)$, where $x^k \to \bar x$ and $(y^k, \epsilon_k) \to (\bar y, \bar\epsilon)$. Because of the continuity of $\Omega$, we have $\bar x \in \Omega(\bar y)$. However, by [Theorem 6, reference 21], $v$ is lower semicontinuous at $\bar y$. Therefore,

$$f(\bar x, \bar y) = \lim\nolimits_k f(x^k, y^k) \ge \lim\nolimits_k [v(y^k) - \epsilon_k] \ge v(\bar y) - \bar\epsilon,$$

so $\bar x \in M^*(\bar y, \bar\epsilon)$.

THEOREM A.4. Suppose that $f$ is quasiconcave in $x$ for each fixed $y$ and continuous on $X \times N(\bar y)$, $\Omega$ is closed in a neighborhood of $\bar y$ and open at $\bar y$, $\Omega(y)$ is convex for each $y$ in a neighborhood of $\bar y$, and $M(\bar y)$ is nonempty and bounded. Then $M(y)$ is nonempty and uniformly bounded near $\bar y$.

Proof. If the conclusion is false, then there must be a sequence $y^k \to \bar y$ such that $M(y^k)$ is empty, $\bigcup_{j \ge k} M(y^j)$ is unbounded, or both, for large values of $k$. Let $\bar x \in M(\bar y)$. Since $\Omega$ is open at $\bar y$, there is a sequence $x_1^k \in \Omega(y^k)$ such that $x_1^k \to \bar x$. Let $\epsilon_k > 0$ be such that $\epsilon_k \to 0$.

If $M(y^k)$ is empty, then the fact that $\Omega(y^k)$ is a closed set implies that there is a sequence $\{z^i\} \subset \Omega(y^k)$ such that $\|z^i\| \to \infty$ and either (a) $v(y^k) \le f(z^i, y^k) + \epsilon_k$ or (b) $f(x_1^k, y^k) \le f(z^i, y^k)$. Condition (a) holds when $v(y^k)$ is finite, whereas (b) is true if $v(y^k) = +\infty$. The existence of the point $x_1^k$ eliminates the possibility that $v(y^k) = -\infty$.

If $\bigcup_{j \ge k} M(y^j)$ is unbounded for all $k$, then we can choose subsequences $\{y^j\}$ and $\{x^j\}$ such that $x^j \in M(y^j)$ and $\|x^j\| \to \infty$.

Therefore, if the conclusion is false, there is a sequence $\{x^k\}$ such that $x^k \in \Omega(y^k)$, $\|x^k\| \to \infty$, and $f(x_1^k, y^k) - \epsilon_k \le f(x^k, y^k)$. Construct a new sequence $x_2^k = \lambda_k x_1^k + (1 - \lambda_k) x^k$, where $\lambda_k \in [0, 1]$. Since $\Omega(y^k)$ is convex, we have $x_2^k \in \Omega(y^k)$. Moreover, the quasiconcavity of $f$ implies that $f(x_1^k, y^k) - \epsilon_k \le f(x_2^k, y^k)$. Select $\{\lambda_k\}$ so that $x_2^k \to \hat x$. The map $\Omega$ is closed at $\bar y$, so $\hat x \in \Omega(\bar y)$. The continuity of $f$ shows that $\hat x \in M(\bar y)$. But $\{\lambda_k\}$ could be chosen so that $\|\hat x\|$ takes on any positive value greater than $\|\bar x\|$, thus contradicting the boundedness of $M(\bar y)$. Therefore, $M(y)$ is nonempty and uniformly bounded near $\bar y$.


For the remainder, consider $\Omega$ to be a point-to-set map defined by inequalities: $\Omega(y) = \{x \in X \mid g(x, y) \le 0\}$, where $g: X \times Y \to [-\infty, +\infty]^m$.

THEOREM A.5. If each component of $g$ is lower semicontinuous on $X \times \bar y$, then $\Omega$ is closed at $\bar y$.

THEOREM A.6. Suppose that $X$ is convex, each component of $g$ is continuous on $\Omega(\bar y) \times \bar y$ and convex in $x$ for each fixed $y \in Y$, and there is a point $\hat x \in X$ such that $g(\hat x, \bar y) < 0$. Then $\Omega$ is open at $\bar y$.

Proof. Theorem 12 in reference 21.

Now, let the set $V$ be defined as

$$V = \{y \in Y \mid g(x, y) \le 0 \text{ for some } x \in X\}.$$

THEOREM A.7. Suppose that $g$ is convex on $X \times Y$, $\bar y \in \mathrm{ri}(V)$, and $X$ is convex and bounded. Then $\Omega$ is open at $\bar y$ [relative to $\mathrm{aff}(V)$].

Proof. Apply Theorem 14 in reference 21, where B is some open ball intersected with the affine hull of V and D is the relative boundary of this set.

ACKNOWLEDGMENTS

THE AUTHOR is indebted to A. C. WILLIAMS for his contributions to the development of Section 1.2 and to A. M. GEOFFRION, who motivated this research and carefully guided its execution. The work was supported by the National Science Foundation.

NOTES

1. The author is indebted to A. C. WILLIAMS, who suggested Theorem 2 and collaborated in the development of the proof.

2. The affine hull of $V_0$ [$\mathrm{aff}(V_0)$] is the smallest affine set that contains $V_0$. The relative interior of $V_0$ [$\mathrm{ri}(V_0)$] is the interior of $V_0$ in the relative topology on $\mathrm{aff}(V_0)$. If there are no affine constraints, then $\mathrm{aff}(V_0) = R^n$ and $\mathrm{ri}(V_0) = \mathrm{int}(V_0)$. There is no great conceptual loss if these substitutions are made. See Rockafellar.[32]
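A standard example, stated here purely for illustration (it does not appear in the paper), makes the distinction concrete: a segment in the plane has empty interior but nonempty relative interior.

```latex
% Illustrative example for aff and ri (not from the paper):
V_0 = \{(x, 0) \mid 0 \le x \le 1\} \subset R^2, \qquad
\operatorname{aff}(V_0) = \{(x, 0) \mid x \in R\}, \qquad
\operatorname{ri}(V_0) = \{(x, 0) \mid 0 < x < 1\}, \qquad
\operatorname{int}(V_0) = \emptyset.
```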

REFERENCES

1. B. BIRZAK AND B. N. PSHENICHNYI, "On Some Problems of the Minimization of Unsmooth Functions," Cybernetics 2, 6, 53-57 (1966).

2. J. M. DANSKIN, The Theory of Max Min, Springer-Verlag, Inc., New York, 1967.

3. V. F. DEMYANOV, "On the Solution of Several Minimax Problems I and II," Cybernetics 2, 6, 58-66 (1966) and 3, 3, 62-66 (1967).

4. ——, "Algorithms for Some Minimax Problems," J. of Computer and System Sciences 2, 342-380 (1968).

5. ——, "Directional Differentiation of the Max Min Function," Soviet Math Doklady 9, 2, 481-483 (1968).

6. ——, "Differentiation of the Max Min Function I and II," Z. Vychisl. Mat. i Mat. Fiz. (in Russian) 8, 1186-1195 (1968) and 9, 55-67 (1969).

7. ——, "On a Minimax Problem," Soviet Math Doklady 10, 4, 828-832 (1969).


8. , "Seeking a Minimax on a Bounded Set," Soviet Math Doklady 11, 2, 517-521 (1970).

9. ,"Finding Saddle Points on Polyhedra," Soviet Math Doklady 11, 3, 558-561 (1970).

10. AND A. M. RUBINOV, "Minimization of Functionals in Normal Spaces," SIAM J. on Control 6, 1, 73-88 (1968).

11. J. E. FALK, "Lagrange Multipliers and Nonlinear Programming," J. of Math. Anal. and Apple. 19, 141-159 (1967).

12. M. FRANK AND P. WOLFE, "An Algorithm for Quadratic Programming," Naval Res. Log. Quart. 3, 95-110 (1956).

13. A. M. GEOFFRION, "Strictly Concave Parametric Programming I and II," Management Sci. 13, 3, 244-253 (1966) and 5, 359-370 (1967).

14. ,"Primal Resource Directive Approaches for Optimizing Nonlinear Decomposable Systems," Opns. Res. 18, 375-403 (1970).

15. ,"Generalized Benders Decomposition," WP159, WMSI, April 1970 (revised January 1971). To appear in JOTA.

16. , "Elements of Large-Scale Mathematical Programming" Management Sci. 16, 11, 652-691 (1970).

17. , "Vector Maximal Decomposition Programming," WP164, WMSI, September 1970. To appear in Math. Programming.

18. ,"Duality in Nonlinear Programming: A Simplified Applications Oriented De- velopment," SIAM Rev. 13, 1-37 (1971).

19. R. C. GRINOLD, "Lagrangian Subgradients," Management Sci. 17, 3, 185-188 (1970). 20. W. W. HOGAN, "Convergence Results for Some Extensions of the Frank-Wolfe Method,"

WP169, WMSI, January 1971. 21. a"Point-to-Set Maps in Mathematical Programming," WP170, WMSI, February

1971; revised June 1971. To appear in the Siam Review. 22. , "Applications of a General Convergence Theory for Outer Approximation Al-

gorithms," WP174, WMSI, June 1971. 23. R. KLESSIG AND E. POLAK, "A Method of Feasible Directions Using Function Approxi-

mations with Applications to MinMax Problems," Memorandum No. ERL-M287, Electronics Research Laboratories, University of California, Berkeley, October 1970.

24. G. P. KLYMOV, "Minimization d'une Fonctionelle Convexe Continue Donn6e sur une Multiplicity Convexe Compacte de l'Espace Vectoriel Topologique," Academie Royale De Belgique, Classe De Sciences Bulletin, Series 5, 54, 1, January-June 1968, 417-432 (1968).

25. L. S. LASDON, Optimization Theory for Large Systems, The Macmillan Co., New York, 1970.

26. E. S. LEVITIN, "Necessary and Sufficient Conditions for a Given Sequence to be Mini- mizing," Soviet Math Doklady 9, 6, 1535-1538 (1968).

27. , "A General Method of Minimization for Nonsmooth Extremal Problems," Z. Vychisi. Mat. i. Mat. Fiz. (in Russian) 9, 2, 783-806 (1969).

28. V. L. LEVIN, "On the Subdifferential of a Composite Functional," Soviet Math Doklady 11, 5, 1194-1195 (1970).

29. B. T. POLYAK AND E. S. LEVITIN, "Constrained Minimization Methods," USSR Compu- tational Mathematics and Mathematical Physics 6, 5, 1-50 (1966).

30. B. N. PSHENICHNYI, "Dual Method in Extremum Problems I and II," Cybernetics 1, 3, 89-95 (1965) and 1, 4, 64-69 (1965).

31. ,"Convex Programming in a Normalized Space," Cybernetics 1, 5, 46-54 (1965). 32. R. T. ROCKAFELLAR, Convex Analysis, Princeton University Press, 1970. 33. G. J. SILVERMAN, "Primal Decomposition of Mathematical Programs by Resource

Allocation: I-Basic Theory and a Direction Finding Procedure," Opns. Res. 20, 58-74 (1972).


34. , "Primal Decomposition of Mathematical Programming by Resource Allocation: II-Computational Algorithm with an Application to the Modular Design Problem," Opns. Res. 20, 75-93 (1972).

35. A. I. SOTSKOV, "Differentiability of a Functional of a Programming Problem in an In- finite Dimensional Space," Kibernetika (in Russian) 5, 3, 81-88 (1969).

36. , "Minimax Criterion and Some Applications in Optimal Control Problems," Kibernetika (in Russian) 5, 4, 110-117 (1969).

37. , "Necessary Conditions for a Minimum for a Type of Nonsmooth Problem," Soviet Math Doklady 10, 6, 1410-1413 (1969).

38. A. C. WILLIAMS, "Marginal Values in Linear Programming," Journal of Soc. Indust. Apple. Math. 11, 1, 82-94 (1963).

39. P. WOLFE, "Convergence Conditions for Ascent Methods," SIAM Rev. 11, 2, 226-235 (1969).

40. W. I. ZANGWILL, Nonlinear Programming, Prentice-Hall, Englewood Cliffs, N. J., 1969. 41. G. ZOUTENDIJK, Methods of Feasible Directions, American Elsevier Publishing Co., Inc.,

Reading, Mass., 1964.
