

PRACTICAL MATHEMATICAL OPTIMIZATION

An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms

JAN A. SNYMAN University of Pretoria, Pretoria, South Africa

Springer -


Library of Congress Control Number: 2005934835

Printed on acid-free paper.

AMS Subject Classifications: 65K05, 90C30, 90-00

© 2005 Springer Science+Business Media, Inc.

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.


To Alta

my wife and friend


Contents

PREFACE

TABLE OF NOTATION

1 INTRODUCTION
1.1 What is mathematical optimization?
1.2 Objective and constraint functions
1.3 Basic optimization concepts
1.3.1 Simplest class of problems: Unconstrained one-dimensional minimization
1.3.2 Contour representation of a function of two variables (n = 2)
1.3.3 Contour representation of constraint functions
1.3.4 Contour representations of constrained optimization problems
1.3.5 Simple example illustrating the formulation and solution of an optimization problem
1.3.6 Maximization
1.3.7 The special case of Linear Programming
1.3.8 Scaling of design variables
1.4 Further mathematical prerequisites
1.4.1 Convexity
1.4.2 Gradient vector of f(x)
1.4.3 Hessian matrix of f(x)
1.4.4 The quadratic function in R^n
1.4.5 The directional derivative of f(x) in the direction u
1.5 Unconstrained minimization
1.5.1 Global and local minima; saddle points
1.5.2 Local characterization of the behaviour of a multi-variable function
1.5.3 Necessary and sufficient conditions for a strong local minimum at x*
1.5.4 General indirect method for computing x*
1.6 Exercises

2 LINE SEARCH DESCENT METHODS FOR UNCONSTRAINED MINIMIZATION
2.1 General line search descent algorithm for unconstrained minimization
2.1.1 General structure of a line search descent method
2.2 One-dimensional line search
2.2.1 Golden section method
2.2.2 Powell's quadratic interpolation algorithm
2.2.3 Exercises
2.3 First order line search descent methods
2.3.1 The method of steepest descent
2.3.2 Conjugate gradient methods
2.4 Second order line search descent methods
2.4.1 Modified Newton's method
2.4.2 Quasi-Newton methods
2.5 Zero order methods and computer optimization subroutines
2.6 Test functions

3 STANDARD METHODS FOR CONSTRAINED OPTIMIZATION
3.1 Penalty function methods for constrained minimization
3.1.1 The penalty function formulation
3.1.2 Illustrative examples
3.1.3 Sequential unconstrained minimization technique (SUMT)
3.1.4 Simple example
3.2 Classical methods for constrained optimization problems
3.2.1 Equality constrained problems and the Lagrangian function
3.2.2 Classical approach to optimization with inequality constraints: the KKT conditions
3.3 Saddle point theory and duality
3.3.1 Saddle point theorem
3.3.2 Duality
3.3.3 Duality theorem
3.4 Quadratic programming
3.4.1 Active set of constraints
3.4.2 The method of Theil and Van de Panne
3.5 Modern methods for constrained optimization
3.5.1 The gradient projection method
3.5.2 Multiplier methods
3.5.3 Sequential quadratic programming (SQP)

4 NEW GRADIENT-BASED TRAJECTORY AND APPROXIMATION METHODS
4.1 Introduction
4.1.1 Why new algorithms?
4.1.2 Research at the University of Pretoria
4.2 The dynamic trajectory optimization method
4.2.1 Basic dynamic model
4.2.2 Basic algorithm for unconstrained problems (LFOP)
4.2.3 Modification for constrained problems (LFOPC)
4.3 The spherical quadratic steepest descent method
4.3.1 Introduction
4.3.2 Classical steepest descent method revisited
4.3.3 The SQSD algorithm
4.3.4 Convergence of the SQSD method
4.3.5 Numerical results and conclusion
4.3.6 Test functions used for SQSD
4.4 The Dynamic-Q optimization algorithm
4.4.1 Introduction
4.4.2 The Dynamic-Q method
4.4.3 Numerical results and conclusion
4.5 A gradient-only line search method for conjugate gradient methods
4.5.1 Introduction
4.5.2 Formulation of optimization problem
4.5.3 Gradient-only line search
4.5.4 Conjugate gradient search directions and SUMT
4.5.5 Numerical results
4.5.6 Conclusion
4.6 Global optimization using dynamic search trajectories
4.6.1 Introduction
4.6.2 The Snyman-Fatti trajectory method
4.6.3 The modified bouncing ball trajectory method
4.6.4 Global stopping criterion
4.6.5 Numerical results

5 EXAMPLE PROBLEMS
5.1 Introductory examples
5.2 Line search descent methods
5.3 Standard methods for constrained optimization
5.3.1 Penalty function problems
5.3.2 The Lagrangian method applied to equality constrained problems
5.3.3 Solution of inequality constrained problems via auxiliary variables
5.3.4 Solution of inequality constrained problems via the Karush-Kuhn-Tucker conditions
5.3.5 Solution of constrained problems via the dual problem formulation
5.3.6 Quadratic programming problems
5.3.7 Application of the gradient projection method
5.3.8 Application of the augmented Lagrangian method
5.3.9 Application of the sequential quadratic programming method

6 SOME THEOREMS
6.1 Characterization of functions and minima
6.2 Equality constrained problem
6.3 Karush-Kuhn-Tucker theory
6.4 Saddle point conditions
6.5 Conjugate gradient methods
6.6 DFP method

A THE SIMPLEX METHOD FOR LINEAR PROGRAMMING PROBLEMS
A.1 Introduction
A.2 Pivoting for increase in objective function
A.3 Example
A.4 The auxiliary problem for problem with infeasible origin
A.5 Example of auxiliary problem solution
A.6 Degeneracy
A.7 The revised simplex method
A.8 An iteration of the RSM


Preface

It is intended that this book be used in senior- to graduate-level semester courses in optimization, as offered in mathematics, engineering, computer science and operations research departments. Hopefully this book will also be useful to practising professionals in the workplace.

The contents of the book represent the fundamental optimization material collected and used by the author, over a period of more than twenty years, in teaching Practical Mathematical Optimization to undergraduate as well as graduate engineering and science students at the University of Pretoria. The principal motivation for writing this work has not been the teaching of mathematics per se, but to equip students with the necessary fundamental optimization theory and algorithms, so as to enable them to solve practical problems in their own particular principal fields of interest, be it physics, chemistry, engineering design or business economics. The particular approach adopted here follows from the author's own personal experiences in doing research in solid-state physics and in mechanical engineering design, where he was constantly confronted by problems that can most easily and directly be solved via the judicious use of mathematical optimization techniques. This book is, however, not a collection of case studies restricted to the above-mentioned specialized research areas, but is intended to convey the basic optimization principles and algorithms to a general audience in such a way that, hopefully, the application to their own practical areas of interest will be relatively simple and straightforward.

Many excellent and more comprehensive texts on practical mathematical optimization have of course been written in the past, and I am much indebted to many of these authors for the direct and indirect influence


their work has had in the writing of this monograph. In the text I have tried as far as possible to give due recognition to their contributions. Here, however, I wish to single out the excellent and possibly underrated book of D. A. Wismer and R. Chattergy (1978), which served to introduce the topic of nonlinear optimization to me many years ago, and which has more than casually influenced this work.

With so many excellent texts on the topic of mathematical optimization available, the question can justifiably be posed: Why another book and what is different here? Here, I believe, for the first time in a relatively brief and introductory work, due attention is paid to certain inhibiting difficulties that can occur when fundamental and classical gradient-based algorithms are applied to real-world problems. Often students, after having mastered the basic theory and algorithms, are disappointed to find that due to real-world complications (such as the presence of noise and discontinuities in the functions, the expense of function evaluations and an excessively large number of variables), the basic algorithms they have been taught are of little value. They then discard, for example, gradient-based algorithms and resort to alternative non-fundamental methods. Here, in Chapter 4 on new gradient-based methods developed by the author and his co-workers, the above-mentioned inhibiting real-world difficulties are discussed, and it is shown how these optimization difficulties may be overcome without totally discarding the fundamental gradient-based approach.

The reader may also find the organisation of the material in this book somewhat novel. The first three chapters present the basic theory, and classical unconstrained and constrained algorithms, in a straightforward manner with almost no formal statement of theorems and presentation of proofs. Theorems are of course of importance, not only for the more mathematically inclined students, but also for practical people interested in constructing and developing new algorithms. Therefore some of the more important fundamental theorems and proofs are presented separately in Chapter 6. Where relevant, these theorems are referred to in the first three chapters. Also, in order to prevent cluttering, the presentation of the basic material in Chapters 1 to 3 is interspersed with very few worked out examples. Instead, a generous number of worked out example problems are presented separately in Chapter 5, in more or less the same order as the presentation of the corresponding theory


given in Chapters 1 to 3. The separate presentation of the example problems may also be convenient for students who have to prepare for the inevitable tests and examinations. The instructor may also use these examples as models to easily formulate similar problems as additional exercises for the students, and for test purposes.

Although the emphasis of this work is intentionally almost exclusively on gradient-based methods for non-linear problems, the book would not be complete if only casual reference were made to the simplex method for solving Linear Programming (LP) problems (where of course use is also made of gradient information in the manipulation of the gradient vector c of the objective function, and the gradient vectors of the constraint functions contained in the matrix A). It was therefore decided to include, as Appendix A, a short introduction to the simplex method for LP problems. This appendix introduces the simplex method along the lines given by Chvátal (1983) in his excellent treatment of the subject.

The author gratefully acknowledges the input and constructive comments of the following colleagues to different parts of this work: Nielen Stander, Albert Groenwold, Ken Craig and Danie de Kock. A special word of thanks goes to Alex Hay. Not only did he significantly contribute to the contents of Chapter 4, but he also helped with the production of most of the figures, and in the final editing of the manuscript. Thanks also to Craig Long who assisted with final corrections and to Alna van der Merwe who typed the first LaTeX draft.

Jan Snyman

Pretoria


Table of notation

R^n : n-dimensional Euclidean (real) space
^T : (superscript only) transpose of a vector or matrix
x : column vector of variables, a point in R^n, x = [x_1, x_2, ..., x_n]^T
∈ : element in the set
f(x) : objective function
x* : local optimizer
f(x*) : optimum function value
g_j(x) : jth inequality constraint function
g(x) : vector of inequality constraint functions
h_j(x) : jth equality constraint function
h(x) : vector of equality constraint functions
C^1 : set of continuous differentiable functions
C^2 : set of continuous and twice continuously differentiable functions
min_x f(x) : minimize f(x) w.r.t. x
x^0, x^1, ... : vectors corresponding to points 0, 1, ...
{x | ...} : set of elements x such that ...
∂f/∂x_i : first partial derivative w.r.t. x_i
∇ : first derivative operator
∇f(x) = [∂f/∂x_1(x), ∂f/∂x_2(x), ..., ∂f/∂x_n(x)]^T : gradient vector (here g not to be confused with the inequality constraint function vector)
∂²/∂x_i∂x_j : second derivative operator
H(x) : Hessian matrix (second derivative matrix)
df(x)/dλ along u : directional derivative at x in the direction u
⊂ : subset of
| · | : absolute value
|| · || : Euclidean norm of a vector
≈ : approximately equal
F(λ) : line search function
first order divided difference
second order divided difference
a^T b : scalar product of the vectors a and b
I : identity matrix
θ_j : jth auxiliary variable
L : Lagrangian function
λ_j : jth Lagrange multiplier
λ : vector of Lagrange multipliers
∃ : exists
⇒ : implies
{ } : set
set of constraints violated at x
∅ : empty set
augmented Lagrange function
⟨a⟩ : maximum of a and zero
∂h/∂x : n × r Jacobian matrix = [∇h_1, ∇h_2, ..., ∇h_r]
∂g/∂x : n × m Jacobian matrix = [∇g_1, ∇g_2, ..., ∇g_m]
slack variable
vector of slack variables
det A : determinant of the matrix A of interest in Ax = b
det A_j : determinant of A with its jth column replaced by b
lim_{i→∞} : limit as i tends to infinity


Chapter 1

INTRODUCTION

1.1 What is mathematical optimization?

Formally, Mathematical Optimization is the process of

(i) the formulation and

(ii) the solution of a constrained optimization problem of the general mathematical form:

minimize f(x), x = [x_1, x_2, ..., x_n]^T ∈ R^n
w.r.t. x

subject to the constraints:

g_j(x) ≤ 0, j = 1, 2, ..., m,
h_j(x) = 0, j = 1, 2, ..., r,          (1.1)

where f(x), g_j(x) and h_j(x) are scalar functions of the real column vector x.

The continuous components x_i of x = [x_1, x_2, ..., x_n]^T are called the (design) variables, f(x) is the objective function, g_j(x) denotes the respective inequality constraint functions and h_j(x) the equality constraint functions.
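The general form (1.1) translates directly into code. The sketch below is not from the book: the specific objective and constraint functions are invented for illustration, as is the helper name is_feasible. It casts a small two-variable problem with one inequality and one equality constraint in the form (1.1) and checks feasibility of candidate points:

```python
# Hypothetical instance of the general problem (1.1):
# minimize f(x) subject to g_j(x) <= 0, j = 1..m, and h_j(x) = 0, j = 1..r.

def f(x):
    """Objective function (illustrative choice)."""
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

def g(x):
    """Inequality constraint functions, required to satisfy g_j(x) <= 0."""
    return [x[0] + x[1] - 4.0]

def h(x):
    """Equality constraint functions, required to satisfy h_j(x) = 0."""
    return [x[0] - x[1] + 1.0]

def is_feasible(x, tol=1e-9):
    """x is feasible when every inequality and every equality constraint holds."""
    return all(gj <= tol for gj in g(x)) and all(abs(hj) <= tol for hj in h(x))

print(is_feasible([1.0, 2.0]))  # True: g = [-1.0], h = [0.0]
print(is_feasible([3.0, 2.0]))  # False: the inequality constraint is violated
```

Any optimization algorithm discussed later only ever needs these three callables, which is why the form (1.1) is a convenient common interface.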


The optimum vector x that solves problem (1.1) is denoted by x* with corresponding optimum function value f(x*). If no constraints are specified, the problem is called an unconstrained minimization problem.

Mathematical Optimization is often also called Nonlinear Programming, Mathematical Programming or Numerical Optimization. In more general terms Mathematical Optimization may be described as the science of determining the best solutions to mathematically defined problems, which may be models of physical reality or of manufacturing and management systems. In the first case solutions are sought that often correspond to minimum energy configurations of general structures, from molecules to suspension bridges, and are therefore of interest to Science and Engineering. In the second case commercial and financial considerations of economic importance to Society and Industry come into play, and it is required to make decisions that will ensure, for example, maximum profit or minimum cost.

The history of Mathematical Optimization, where functions of many variables are considered, is relatively short, spanning roughly only 55 years. At the end of the 1940s the very important simplex method for solving the special class of linear programming problems was developed. Since then numerous methods for solving the general optimization problem (1.1) have been developed, tested, and successfully applied to many important problems of scientific and economic interest. There is no doubt that the advent of the computer was essential for the development of these optimization methods. However, in spite of the proliferation of optimization methods, there is no universal method for solving all optimization problems. According to Nocedal and Wright (1999): ". . . there are numerous algorithms, each of which is tailored to a particular type of optimization problem. It is often the user's responsibility to choose an algorithm that is appropriate for the specific application. This choice is an important one; it may determine whether the problem is solved rapidly or slowly and, indeed, whether the solution is found at all." In a similar vein Vanderplaats (1998) states that "The author of each algorithm usually has numerical examples which demonstrate the efficiency and accuracy of the method, and the unsuspecting practitioner will often invest a great deal of time and effort in programming an algorithm, only to find that it will not in fact solve the particular problem being attempted. This often leads to disenchantment with these techniques


that can be avoided if the user is knowledgeable in the basic concepts of numerical optimization." With these representative and authoritative opinions in mind, and also taking into account the present author's personal experiences in developing algorithms and applying them to design problems in mechanics, this text has been written to provide a brief but unified introduction to optimization concepts and methods. In addition, an overview of a set of novel algorithms, developed by the author and his students at the University of Pretoria over the past twenty years, is also given.

The emphasis of this book is almost exclusively on gradient-based methods. This is for two reasons. (i) The author believes that the introduction to the topic of mathematical optimization is best done via the classical gradient-based approach and (ii), contrary to the current popular trend of using non-gradient methods, such as genetic algorithms (GA's), simulated annealing, particle swarm optimization and other evolutionary methods, the author is of the opinion that these search methods are, in many cases, computationally too expensive to be viable. The argument that the presence of numerical noise and multiple minima disqualify the use of gradient-based methods, and that the only way out in such cases is the use of the above-mentioned non-gradient search techniques, is not necessarily true. It is the experience of the author that, through the judicious use of gradient-based methods, problems with numerical noise and multiple minima may be solved, and at a fraction of the computational cost of search techniques such as genetic algorithms. In this context Chapter 4, dealing with the new gradient-based methods developed by the author, is especially important. The presentation of the material is not overly rigorous, but hopefully correct, and should provide the necessary information to allow scientists and engineers to select appropriate optimization algorithms and to apply them successfully to their respective fields of interest.

Many excellent and more comprehensive texts on practical optimization can be found in the literature. In particular the author wishes to acknowledge the works of Wismer and Chattergy (1978), Chvátal (1983), Fletcher (1987), Bazaraa et al. (1993), Arora (1989), Haftka and Gürdal (1992), Rao (1996), Vanderplaats (1998), Nocedal and Wright (1999) and Papalambros and Wilde (2000).


1.2 Objective and constraint functions

The values of the functions f(x), g_j(x) and h_j(x) at any point x = [x_1, x_2, ..., x_n]^T may in practice be obtained in different ways:

(i) from analytically known formulae, e.g. f(x) = x_1^2 + 2x_2^2 + sin x_3;

(ii) as the outcome of some complicated computational process, e.g. g_1(x) = σ(x) − σ_max, where σ(x) is the stress, computed by means of a finite element analysis, at some point in a structure, the design of which is specified by x; or

(iii) from measurements taken of a physical process, e.g. h_1(x) = T(x) − T_0, where T(x) is the temperature measured at some specified point in a reactor, and x is the vector of operational settings.

The first two ways of function evaluation are by far the most common. The optimization principles that apply in these cases, where computed function values are used, may be carried over directly to also be applicable to the case where the function values are obtained through physical measurements.

Much progress has been made with respect to methods for solving different classes of the general problem (1.1). Sometimes the solution may be obtained analytically, i.e. a closed-form solution in terms of a formula is obtained.

In general, especially for n > 2, solutions are usually obtained numerically by means of suitable algorithms (computational recipes).

Expertise in the formulation of appropriate optimization problems of the form (1.1), through which an optimum decision can be made, is gained from experience. This exercise also forms part of what is generally known as the mathematical modelling process. In brief, attempting to solve real-world problems via mathematical modelling requires the cyclic performance of the four steps depicted in Figure 1.1. The main steps are: 1) the observation and study of the real-world situation associated with a practical problem, 2) the abstraction of the problem by the construction of a mathematical model, that is described in terms


Figure 1.1: The mathematical modelling process

of preliminary fixed model parameters p, and variables x, the latter to be determined such that the model performs in an acceptable manner, 3) the solution of the resulting purely mathematical problem, that requires an analytical or numerical parameter-dependent solution x*(p), and 4) the evaluation of the solution x*(p) and its practical implications. After step 4) it may be necessary to adjust the parameters and refine the model, which will result in a new mathematical problem to be solved and evaluated. It may be required to perform the modelling cycle a number of times before an acceptable solution is obtained. More often than not, the mathematical problem to be solved in 3) is a mathematical optimization problem, requiring a numerical solution. The formulation of an appropriate and consistent optimization problem (or model) is probably the most important, but unfortunately also the most neglected, part of Practical Mathematical Optimization.

This book gives a very brief introduction to the formulation of optimization problems, and deals with different optimization algorithms in greater depth. Since no algorithm is generally applicable to all classes of problems, the emphasis is on providing sufficient information to allow for the selection of appropriate algorithms or methods for different specific problems.


Figure 1.2: Function of single variable with optimum at x*

1.3 Basic optimization concepts

1.3.1 Simplest class of problems: Unconstrained one-dimensional minimization

Consider the minimization of a smooth, i.e. continuous and twice continuously differentiable (C²), function of a single real variable, i.e. the problem:

minimize w.r.t. x:  f(x),  x ∈ ℝ,  f ∈ C².   (1.2)

With reference to Figure 1.2, for a strong local minimum it is required to determine an x* such that f(x*) < f(x) for all x in a neighbourhood of x*.

Clearly x* occurs where the slope is zero, i.e. where

f'(x*) = 0,

which corresponds to the first order necessary condition. In addition, non-negative curvature is necessary at x*, i.e. the second order condition

f''(x*) ≥ 0

must hold at x* for a strong local minimum.

A simple special case is where f(x) has the simple quadratic form:

f(x) = ax² + bx + c.   (1.3)


Since the minimum occurs where f'(x) = 0, it follows that the closed-form solution is given by

x* = −b/(2a),  provided f''(x*) = 2a > 0.   (1.4)

If f(x) has a more general form, then a closed-form solution is in general not possible. In this case, the solution may be obtained numerically via the Newton-Raphson algorithm:

Given an approximation x⁰, iteratively compute:

xⁱ⁺¹ = xⁱ − f'(xⁱ)/f''(xⁱ),  i = 0, 1, 2, ...

Hopefully lim_{i→∞} xⁱ = x*, i.e. the iterations converge, in which case a sufficiently accurate numerical solution is obtained after a finite number of iterations.
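The scheme above can be sketched in a few lines of Python (an illustrative implementation, not from the text; the quartic test function and starting point are invented for the example):

```python
def newton_raphson_1d(f1, f2, x0, tol=1e-10, max_iter=50):
    """Minimize a smooth f by solving f'(x) = 0 via the scheme
    x_{i+1} = x_i - f'(x_i)/f''(x_i); f1 = f', f2 = f''."""
    x = x0
    for _ in range(max_iter):
        step = f1(x) / f2(x)   # Newton step for the root of f'
        x -= step
        if abs(step) < tol:    # converged to a stationary point
            break
    return x

# Hypothetical example: f(x) = x^4 - 2x^2 + x, so f'(x) = 4x^3 - 4x + 1
# and f''(x) = 12x^2 - 4; a strong local minimum lies near x = -1.1.
x_min = newton_raphson_1d(lambda x: 4*x**3 - 4*x + 1,
                          lambda x: 12*x**2 - 4,
                          x0=-1.5)
```

Checking f''(x_min) > 0 afterwards distinguishes the minimum from the other stationary points of f.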

1.3.2 Contour representation of a function of two variables (n = 2)

Consider a function f(x) of two variables, x = [x₁, x₂]ᵀ. The locus of all points satisfying f(x) = c = constant forms a contour in the x₁–x₂ plane. For each value of c there is a corresponding different contour.

Figure 1.3 depicts the contour representation for the example f(x) = x₁² + 2x₂².

In three dimensions (n = 3), the contours are surfaces of constant func- tion value. In more than three dimensions (n > 3) the contours are, of course, impossible to visualize. Nevertheless, the contour representation in two-dimensional space will be used throughout the discussion of opti- mization techniques to help visualize the various optimization concepts.

Other examples of 2-dimensional objective function contours are shown in Figures 1.4 to 1.6.


Figure 1.3: Contour representation of the function f(x) = x₁² + 2x₂²

Figure 1.4: General quadratic function


Figure 1.5:

Figure 1.6: Potential energy function of a spring-force system (Vanderplaats, 1998)


Figure 1.7: Contours within feasible and infeasible regions

1.3.3 Contour representation of constraint functions

1.3.3.1 Inequality constraint function g(x)

The contours of a typical inequality constraint function g(x), in g(x) ≤ 0, are shown in Figure 1.7. The contour g(x) = 0 divides the plane into a feasible region and an infeasible region.

More generally, the boundary is a surface in three dimensions and a so-called "hyper-surface" if n > 3, which of course cannot be visualised.

1.3.3.2 Equality constraint function h(x)

Here, as shown in Figure 1.8, only the line h(x) = 0 is a feasible contour.


Figure 1.8: Feasible contour of equality constraint

Figure 1.9: Contour representation of inequality constrained problem

1.3.4 Contour representations of constrained optimization problems

1.3.4.1 Representation of inequality constrained problem

Figure 1.9 graphically depicts the inequality constrained problem:

min f(x)

such that g(x) ≤ 0.


Figure 1.10: Contour representation of equality constrained problem

Figure 1.11: Wire divided into two pieces with x₁ = x and x₂ = 1 − x

1.3.4.2 Representation of equality constrained problem

Figure 1.10 graphically depicts the equality constrained problem:

min f (x)

such that h(x) = 0.

1.3.5 Simple example illustrating the formulation and so- lution of an optimization problem

Problem: A length of wire 1 meter long is to be divided into two pieces, one bent into a circular shape and the other into a square, as shown in Figure 1.11. What must the individual lengths be so that the total area is a minimum?

Formulation 1


Set the length of the first piece = x; the total area is then given by f(x) = πr² + b². Since r = x/(2π) and b = (1 − x)/4, it follows that

f(x) = x²/(4π) + (1 − x)²/16.

The problem therefore reduces to an unconstrained minimization prob- lem:

minimize f(x) = 0.1421x² − 0.125x + 0.0625.

Solution of Formulation 1

The function f(x) is quadratic, therefore an analytical solution is given by the formula x* = −b/(2a) (a > 0):

x* = 0.125/(2 × 0.1421) = 0.4398 m

and 1 − x* = 0.5602 m with f(x*) = 0.0350 m².
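The closed-form solution above is easily checked numerically (an illustrative sketch; the constant a = 1/(4π) + 1/16 ≈ 0.1421 is the quadratic coefficient derived above):

```python
import math

def area(x):
    """Total area: circle of circumference x plus square of perimeter 1 - x."""
    r = x / (2 * math.pi)
    b = (1 - x) / 4
    return math.pi * r**2 + b**2

a = 1 / (4 * math.pi) + 1 / 16   # quadratic coefficient, approx. 0.1421
x_star = 0.125 / (2 * a)         # x* = -b/(2a), approx. 0.4398 m
```

Evaluating area(x_star) reproduces f(x*) ≈ 0.0350 m², and nearby points give larger areas, confirming the minimum.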

Formulation 2

Divide the wire into respective lengths x₁ and x₂ (x₁ + x₂ = 1). The area is now given by

f(x) = πr² + b² = x₁²/(4π) + (x₂/4)² = 0.0796x₁² + 0.0625x₂².

Here the problem reduces to an equality constrained problem:

minimize f(x) = 0.0796x₁² + 0.0625x₂²

such that h(x) = x₁ + x₂ − 1 = 0.

Solution of Formulation 2

This constrained formulation is more difficult to solve. The closed-form analytical solution is not obvious, and special constrained optimization techniques, such as the method of Lagrange multipliers to be discussed later, must be applied to solve the constrained problem analytically. The graphical solution is sketched in Figure 1.12.


Figure 1.12: Graphical solution of Formulation 2

1.3.6 Maximization

The maximization problem max_x f(x) can be cast in the standard form (1.1) by observing that max_x f(x) = −min_x(−f(x)), as shown in Figure 1.13. Therefore, in applying a minimization algorithm, set F(x) = −f(x).

Also, if the inequality constraints are given in the non-standard form gⱼ(x) ≥ 0, then set ḡⱼ(x) = −gⱼ(x). In standard form the problem then becomes:

minimize F(x) such that ḡⱼ(x) ≤ 0.

Once the minimizer x* is obtained, the maximum value of the original maximization problem is given by - F(x*).
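As a minimal sketch (the objective below is an invented example), the transformation amounts to nothing more than a sign change before and after the minimization:

```python
def f(x):
    """Hypothetical objective to be maximized: f(x) = -(x - 3)^2 + 5."""
    return -(x - 3)**2 + 5

def F(x):
    return -f(x)   # standard-form objective: minimize F = -f

# A crude grid search on [0, 10] stands in for any minimization algorithm.
x_star = min((k * 0.001 for k in range(10001)), key=F)
max_value = -F(x_star)   # recover the maximum of the original problem
```

Here x_star ≈ 3 and max_value ≈ 5, the maximizer and maximum of the original problem.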

1.3.7 The special case of Linear Programming

Figure 1.13: Maximization problem transformed to minimization problem

A very important special class of the general optimization problem arises when both the objective function and all the constraints are linear functions of x. This is called a Linear Programming problem and is usually stated in the following form:

min_x f(x) = cᵀx

such that

Ax ≤ b;  x ≥ 0

where c is a real n-vector, b is a real m-vector, and A is an m × n real matrix. A linear programming problem in two variables is graphically depicted in Figure 1.14.

Special methods have been developed for solving linear programming problems. Of these the most famous are the simplex method proposed by Dantzig in 1947 (Dantzig, 1963) and the interior-point method (Karmarkar, 1984). A short introduction to the simplex method, according to Chvátal (1983), is given in Appendix A.

1.3.8 Scaling of design variables

In formulating mathematical optimization problems great care must be taken to ensure that the scales of the variables are more or less of the same order. If not, the formulated problem may be relatively insensitive to variations in one or more of the variables, and any optimization algorithm will struggle to converge to the true solution because of the extreme distortion of the objective function contours that results from the poor scaling. In particular it may lead to difficulties when selecting step lengths and


Figure 1.14: Graphical representation of a two-dimensional linear programming problem

calculating numerical gradients. Scaling difficulties often occur where the variables are of different dimension and expressed in different units. Hence it is good practice, if the variable ranges are very large, to scale the variables so that all of them are dimensionless and vary approximately between 0 and 1. For scaling the variables, it is necessary to establish an approximate range for each of the variables. For this, take some estimates (based on judgement and experience) for the lower and upper limits. The values of the bounds are not critical. Another related matter is the scaling or normalization of constraint functions. This becomes necessary whenever the values of the constraint functions differ by large magnitudes.
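A minimal sketch of such scaling (the bounds below are invented estimates for two variables of very different magnitude):

```python
lower = [0.1, 1000.0]      # estimated lower limits (assumed values)
upper = [0.5, 50000.0]     # estimated upper limits (assumed values)

def scale(x):
    """Map physical variables to dimensionless values in roughly [0, 1]."""
    return [(xi - lo) / (hi - lo) for xi, lo, hi in zip(x, lower, upper)]

def unscale(z):
    """Map dimensionless variables back to physical values."""
    return [lo + zi * (hi - lo) for zi, lo, hi in zip(z, lower, upper)]

z = scale([0.3, 25500.0])   # both components map to about 0.5
x_back = unscale(z)         # the round trip recovers the originals
```

The optimizer then works entirely in the scaled variables z, and unscale is applied only when reporting the solution.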

1.4 Further mathematical prerequisites

1.4.1 Convexity

A line through the points x¹ and x² in ℝⁿ is the set

L = {x | x = x¹ + λ(x² − x¹), for all λ ∈ ℝ}.


Figure 1.15: Representation of a point on the straight line through x¹ and x²

convex non-convex

Figure 1.16: Examples of a convex and a non-convex set

Equivalently, for any point x on the line there exists a λ such that x may be specified by x = x(λ) = λx² + (1 − λ)x¹, as shown in Figure 1.15.

1.4.1.1 Convex sets

A set X is convex if for all x¹, x² ∈ X it follows that

x = λx² + (1 − λ)x¹ ∈ X for all 0 ≤ λ ≤ 1.

If this condition does not hold the set is non-convex (see Figure 1.16).


1.4.1.2 Convex functions

Given two points x¹ and x² in ℝⁿ, then any point x on the straight line connecting them (see Figure 1.15) is given by

x = x(λ) = x¹ + λ(x² − x¹),  0 ≤ λ ≤ 1.   (1.8)

A function f(x) is a convex function over a convex set X if for all x¹, x² in X and for all λ ∈ [0, 1]:

f(λx² + (1 − λ)x¹) ≤ λf(x²) + (1 − λ)f(x¹).   (1.9)

The function is strictly convex if < applies. Concave functions are similarly defined.
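Definition (1.9) can be checked numerically along a sampled line, as in this sketch (the test function f(x) = x₁² + 2x₂², the convex example used earlier in the chapter; the sample points are invented):

```python
def f(x):
    return x[0]**2 + 2 * x[1]**2   # convex on R^2

def convexity_holds(f, x1, x2, samples=101):
    """Check f(lam*x2 + (1-lam)*x1) <= lam*f(x2) + (1-lam)*f(x1) on a grid."""
    for k in range(samples):
        lam = k / (samples - 1)
        x = [lam * a + (1 - lam) * b for a, b in zip(x2, x1)]
        if f(x) > lam * f(x2) + (1 - lam) * f(x1) + 1e-12:
            return False
    return True

ok = convexity_holds(f, [0.0, 1.0], [2.0, -1.0])                      # True
bad = convexity_holds(lambda x: -x[0]**2, [-1.0, 0.0], [1.0, 0.0])    # False
```

A sampled check of this kind can of course only refute convexity, never prove it; it is useful as a quick sanity test of a formulation.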

Consider again the line connecting x¹ and x². Along this line, the function f(x) is a function of the single variable λ:

F(λ) = f(x(λ)) = f(x¹ + λ(x² − x¹)).   (1.10)

This is equivalent to F(λ) = f(λx² + (1 − λ)x¹), with F(0) = f(x¹) and F(1) = f(x²). Therefore (1.9) may be written as

F(λ) ≤ λF(1) + (1 − λ)F(0) = F_int

where F_int is the linearly interpolated value of F at λ, as shown in Figure 1.17.

Graphically, f(x) is convex over the convex set X if F(λ) has the convex form shown in Figure 1.17 for any two points x¹ and x² in X.

1.4.2 Gradient vector of f(x)

For a function f(x) ∈ C² there exists, at any point x, a vector of first order partial derivatives, or gradient vector:

∇f(x) = [∂f(x)/∂x₁, ∂f(x)/∂x₂, ..., ∂f(x)/∂xₙ]ᵀ.


Figure 1.17: Convex form of F(λ)

Figure 1.18: Directions of the gradient vector

It can easily be shown that if the function f(x) is smooth, then at the point x the gradient vector ∇f(x) (also often denoted by g(x)) is always perpendicular to the contours (or surfaces of constant function value) and is in the direction of maximum increase of f(x), as depicted in Figure 1.18.


1.4.3 Hessian matrix of f(x)

If f(x) is twice continuously differentiable then at the point x there exists a matrix of second order partial derivatives, or Hessian matrix:

H(x) = [∂²f(x)/∂xᵢ∂xⱼ],  i, j = 1, 2, ..., n.

Clearly H(x) is an n × n symmetrical matrix.

1.4.3.1 Test for convexity of f (x)

If f(x) ∈ C² is defined over a convex set X, then it can be shown (see Theorem 6.1.3 in Chapter 6) that if H(x) is positive-definite for all x ∈ X, then f(x) is strictly convex over X.

To test for convexity, i.e. to determine whether H(x) is positive-definite or not, apply Sylvester's Theorem or any other suitable numerical method (Fletcher, 1987).
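Sylvester's Theorem states that a symmetric matrix is positive-definite if and only if all its leading principal minors are positive. A small sketch of this test (determinants computed by Gaussian elimination with partial pivoting; purely illustrative, not from the text):

```python
def leading_minors(H):
    """Determinants of the k x k upper-left submatrices, k = 1..n."""
    n = len(H)
    minors = []
    for k in range(1, n + 1):
        A = [row[:k] for row in H[:k]]
        det = 1.0
        for i in range(k):                        # elimination with pivoting
            p = max(range(i, k), key=lambda r: abs(A[r][i]))
            if A[p][i] == 0.0:
                det = 0.0
                break
            if p != i:
                A[i], A[p] = A[p], A[i]
                det = -det                        # row swap flips the sign
            det *= A[i][i]
            for r in range(i + 1, k):
                m = A[r][i] / A[i][i]
                for c in range(i, k):
                    A[r][c] -= m * A[i][c]
        minors.append(det)
    return minors

def is_positive_definite(H):
    """Sylvester's criterion: all leading principal minors positive."""
    return all(d > 0 for d in leading_minors(H))
```

For example, [[2, −1], [−1, 2]] has minors 2 and 3 and passes the test, while [[1, 2], [2, 1]] has second minor −3 and fails.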

1.4.4 The quadratic function in ℝⁿ

The quadratic function in n variables may be written as

f(x) = ½xᵀAx + bᵀx + c   (1.13)

where c ∈ ℝ, b is a real n-vector and A is an n × n real matrix that can be chosen in a non-unique manner. It is usually chosen symmetric, in which case it follows that

∇f(x) = Ax + b and H(x) = A.


The function f (x) is called positive-definite if A is positive-definite since, by the test in 1.4.3.1, a function f (x ) is convex if H(x) is positive- definite.

1.4.5 The directional derivative of f (x) in the direction u

It is usually assumed that ‖u‖ = 1. Consider the differential:

df = ∇ᵀf(x)dx.

A point x on the line through x' in the direction u is given by x = x(λ) = x' + λu, and for a small change dλ in λ, dx = u dλ. Along this line F(λ) = f(x' + λu), and the differential at any point x on the given line in the direction u is therefore given by dF = df = ∇ᵀf(x)u dλ. It follows that the directional derivative at x in the direction u is

dF(λ)/dλ = ∇ᵀf(x)u.

1.5 Unconstrained minimization

In considering the unconstrained problem: min_x f(x), x ∈ X ⊆ ℝⁿ, the

(i) what are the conditions for a minimum to exist,

(iii) are there any relative minima?

Figure 1.19 (after Farkas and Jarmai, 1997) depicts different types of minima that may arise for functions of a single variable, and for functions of two variables in the presence of equality constraints. Intuitively, with reference to Figure 1.19, one feels that a general function may have a single unique global minimum, or it may have more than one local minimum. The function may indeed have no local minimum at all, and in two dimensions the possibility of saddle points also comes to mind.


Figure 1.19: Types of minima


Figure 1.20: Graphical representation of the definition of a local minimum

Thus, in order to answer the above questions regarding the nature of any given function more analytically, it is necessary to give more precise meanings to the above mentioned notions.

1.5.1 Global and local minima; saddle points

1.5.1.1 Global minimum

x* is a global minimum over the set X if f(x) ≥ f(x*) for all x ∈ X ⊆ ℝⁿ.

1.5.1.2 Strong local minimum

x* is a strong local minimum if there exists an E > 0 such that

f(x) > f(x*) for all x ∈ {x | 0 < ‖x − x*‖ < ε}

where ‖·‖ denotes the Euclidean norm. This definition is sketched in Figure 1.20.


1.5.1.3 Test for unique local global minimum

It can be shown (see Theorems 6.1.4 and 6.1.5 in Chapter 6) that if f(x) is strictly convex over X, then a strong local minimum is also the global minimum.

The global minimizer can be difficult to find since the knowledge of f(x) is usually only local. Most minimization methods seek only a local minimum. An approximation to the global minimum is obtained in practice by the multi-start application of a local minimizer from randomly selected different starting points in X. The lowest value obtained after a sufficient number of trials is then taken as a good approximation to the global solution (see Snyman and Fatti, 1987; Groenwold and Snyman, 2002). If, however, it is known that the function is strictly convex over X, then only one trial is sufficient, since only one local minimum, the global minimum, exists.
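The multi-start strategy can be sketched as follows (the local minimizer here is a crude shrinking-step coordinate search, a stand-in for a proper gradient-based method; the test function and bounds are invented):

```python
import random

def local_minimize(f, x0, step=0.1, iters=500):
    """Crude local search: move coordinate-wise while it improves f."""
    x = list(x0)
    for _ in range(iters):
        for j in range(len(x)):
            for d in (step, -step):
                trial = x[:j] + [x[j] + d] + x[j+1:]
                if f(trial) < f(x):
                    x = trial
        step *= 0.99   # slowly shrink the step
    return x

def multistart(f, bounds, n_starts=20, seed=1):
    """Run the local minimizer from random points; keep the lowest result."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        x = local_minimize(f, x0)
        if best is None or f(x) < f(best):
            best = x
    return best

# f(x) = (x^2 - 1)^2 has two global minima, at x = 1 and x = -1; each
# start converges to one of them and the best trial is returned.
best = multistart(lambda x: (x[0]**2 - 1)**2, [(-2.0, 2.0)])
```

The number of starts needed in practice depends on how many basins of attraction f has relative to the sampling region.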

1.5.1.4 Saddle points

f(x) has a saddle point at x̄ = (x⁰, y⁰) if there exists an ε > 0 such that for all x with ‖x − x⁰‖ < ε and all y with ‖y − y⁰‖ < ε:

f(x, y⁰) ≤ f(x⁰, y⁰) ≤ f(x⁰, y).

A contour representation of a saddle point in two dimensions is given in Figure 1.21.

1.5.2 Local characterization of the behaviour of a multi-variable function

It is assumed here that f(x) is a smooth function, i.e. that it is a twice continuously differentiable function (f(x) ∈ C²). Consider again the line x = x(λ) = x' + λu through the point x' in the direction u.

Along this line a single variable function F(λ) may be defined:

F(λ) = f(x(λ)) = f(x' + λu),

with derivative, via the differential above,

dF(λ)/dλ = ∇ᵀf(x(λ))u.   (1.16)


Figure 1.21: Contour representation of saddle point

It follows from (1.16) that

d²F(λ)/dλ² = uᵀH(x(λ))u

which is also a single variable function of λ along the line x = x(λ) = x' + λu.

Summarising: the first and second order derivatives of F(λ) with respect to λ at any point x = x(λ) on any line (any u) through x' are given by

dF(λ)/dλ = ∇ᵀf(x(λ))u   (1.17)

d²F(λ)/dλ² = uᵀH(x(λ))u   (1.18)


where x(λ) = x' + λu and F(λ) = f(x(λ)) = f(x' + λu).

These results may be used to obtain Taylor's expansion for a multi-variable function. Consider again the single variable function F(λ) defined on the line through x' in the direction u by F(λ) = f(x' + λu). It is known that the Taylor expansion of F(λ) about 0 is given by

F(λ) = F(0) + F'(0)λ + ½F''(0)λ² + ...   (1.19)

With F(0) = f(x'), and substituting expressions (1.17) and (1.18) for respectively F'(λ) and F''(λ) at λ = 0 into (1.19), gives

F(λ) = f(x' + λu) = f(x') + ∇ᵀf(x')λu + ½λ²uᵀH(x')u + ...

Setting δ = λu in the above gives the expansion:

f(x' + δ) = f(x') + ∇ᵀf(x')δ + ½δᵀH(x')δ + ...   (1.20)

Since the above applies for any line (any u) through x', it represents the general Taylor expansion for a multi-variable function about x'. If f(x) is twice continuously differentiable in the neighbourhood of x', it can be shown that the truncated second order Taylor expansion for a multi-variable function is given by

f(x' + δ) = f(x') + ∇ᵀf(x')δ + ½δᵀH(x' + θδ)δ   (1.21)

for some θ ∈ [0, 1]. This expression is important in the analysis of the behaviour of a multi-variable function at any given point x'.

1.5.3 Necessary and sufficient conditions for a strong local minimum at x*

In particular, consider x' = x*, a strong local minimizer. Then for any line (any u) through x* the behaviour of F(λ) in a neighbourhood of x* is as shown in Figure 1.22, with a minimum at λ = 0.

Clearly, a necessary first order condition that must apply at x* (corresponding to λ = 0) is that

dF(0)/dλ = ∇ᵀf(x*)u = 0, for all u ≠ 0.   (1.22)


Figure 1.22: Behaviour of F(λ) near λ = 0

It can easily be shown that this condition also implies that necessarily ∇f(x*) = 0.

A necessary second order condition that must apply at x* is that

d²F(0)/dλ² = uᵀH(x*)u > 0, for all u ≠ 0.   (1.23)

Conditions (1.22) and (1.23) taken together are also sufficient conditions (i.e. they imply) for x* to be a strong local minimum if f(x) is continuously differentiable in the vicinity of x*. This can easily be shown by substituting these conditions into the Taylor expansion (1.21).

Thus in summary, the necessary and sufficient conditions for x* to be a strong local minimum are:

∇f(x*) = 0;  H(x*) positive-definite.   (1.24)

In the argument above it has implicitly been assumed that x* is an unconstrained minimum interior to X. If x* lies on the boundary of X (see Figure 1.23), then

dF(0)/dλ = ∇ᵀf(x*)u ≥ 0

for all allowable directions u, i.e. for directions such that x* + λu ∈ X for arbitrarily small λ > 0.

Conditions (1.24) for an unconstrained strong local minimum play a very important role in the construction of practical algorithms for un- constrained optimization.


Figure 1.23: Behaviour of F(λ) for all allowable directions u

1.5.3.1 Application to the quadratic function

Consider the quadratic function:

f(x) = ½xᵀAx + bᵀx + c.

In this case the first order necessary condition for a minimum implies that

∇f(x) = Ax + b = 0.

Therefore a candidate solution point is

x* = −A⁻¹b.   (1.26)

If the second order necessary condition also applies, i.e. if A is positive-definite, then x* is a unique minimizer.

1.5.4 General indirect method for computing x*

The general indirect method for determining x* is to solve the system of equations ∇f(x) = 0 (corresponding to the first order necessary condition in (1.24)) by some numerical method, to yield all stationary points. An obvious method for doing this is Newton's method. Since in general the system will be non-linear, multiple stationary points are possible. These stationary points must then be further analysed in order to determine whether or not they are local minima.


1.5.4.1 Solution by Newton's method

Assume x* is a local minimum and xⁱ an approximate solution, with associated unknown error δ such that x* = xⁱ + δ. Then, by applying Taylor's theorem and the first order necessary condition for a minimum at x*, it follows that

0 = ∇f(x*) = ∇f(xⁱ + δ) = ∇f(xⁱ) + H(xⁱ)δ + O(‖δ‖²).

If xⁱ is a good approximation then δ ≈ Δ, the solution of the linear system H(xⁱ)Δ + ∇f(xⁱ) = 0, obtained by ignoring the second order term in δ above. A better approximation is therefore expected to be xⁱ⁺¹ = xⁱ + Δ, which leads to the Newton iterative scheme: Given an initial approximation x⁰, compute

xⁱ⁺¹ = xⁱ − H⁻¹(xⁱ)∇f(xⁱ)   (1.27)

for i = 0, 1, 2, ... Hopefully lim_{i→∞} xⁱ = x*.
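For n = 2 the scheme can be sketched directly, solving H Δ = −∇f by Cramer's rule at each iteration (an illustrative implementation, not from the text; the quadratic example with assumed A and b mirrors the next subsection, where a single step suffices):

```python
def newton(grad, hess, x0, iters=10):
    """Newton scheme x_{i+1} = x_i - H^{-1}(x_i) grad f(x_i) for n = 2."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        (a, b), (c, d) = hess(x)
        det = a * d - b * c
        dx = [(-d * g[0] + b * g[1]) / det,   # solve H dx = -g (Cramer's rule)
              (c * g[0] - a * g[1]) / det]
        x = [x[0] + dx[0], x[1] + dx[1]]
    return x

# Quadratic example f(x) = 0.5 x^T A x + b^T x with A = [[3, 1], [1, 2]]
# and b = [-1, -2]; the minimizer is x* = -A^{-1} b = (0, 1).
grad = lambda x: [3*x[0] + x[1] - 1, x[0] + 2*x[1] - 2]   # A x + b
hess = lambda x: [[3.0, 1.0], [1.0, 2.0]]                 # constant H = A
x_star = newton(grad, hess, [10.0, -7.0], iters=1)        # one step suffices
```

For non-quadratic functions (e.g. the Rosenbrock function of Exercise 1.6.4) the same loop applies, but more iterations are needed and convergence is no longer guaranteed, as discussed in 1.5.4.3.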

1.5.4.2 Example of Newton's method applied to a quadratic problem

Consider the unconstrained problem:

minimize f(x) = ½xᵀAx + bᵀx + c.

In this case the first iteration in (1.27) yields

x¹ = x⁰ − A⁻¹(Ax⁰ + b) = x⁰ − x⁰ − A⁻¹b = −A⁻¹b,

i.e. x¹ = x* = −A⁻¹b in a single step (see (1.26)). This is to be expected since in this case no approximation is involved and thus Δ = δ.

1.5.4.3 Difficulties with Newton's method

Unfortunately, in spite of the attractive features of the Newton method, such as being quadratically convergent near the solution, the basic Newton method as described above does not always perform satisfactorily. The main difficulties are:


Figure 1.24: Graphical representation of Newton's iterative scheme for a single variable

(i) the method is not always convergent, even if x⁰ is close to x*, and

(ii) the method requires the computation of the Hessian matrix at each iteration.

The first of these difficulties may be illustrated by considering Newton's method applied to the one-dimensional problem: solve f'(x) = 0. In this case the iterative scheme is

xⁱ⁺¹ = xⁱ − f'(xⁱ)/f''(xⁱ) = φ(xⁱ)

and the solution corresponds to the fixed point x* where x* = φ(x*). Unfortunately in some cases, unless x⁰ is chosen to be exactly equal to x*, convergence will not necessarily occur. In fact, convergence is dependent on the nature of the fixed point function φ(x) in the vicinity of x*, as shown for two different φ functions in Figure 1.24. With reference to the graphs, Newton's method is: yⁱ = φ(xⁱ), xⁱ⁺¹ = yⁱ for i = 0, 1, 2, .... Clearly in the one case, where |φ'(x)| < 1, convergence occurs, but in the other case, where |φ'(x)| > 1, the scheme diverges.

In more dimensions the situation may be even more complicated. In addition, for a large number of variables, difficulty (ii) mentioned above becomes serious in that the computation of the Hessian matrix represents a major task. If the Hessian is not available in analytical form, use can be made of automatic differentiation techniques to compute it,


or it can be estimated by means of finite differences. It should also be noted that in computing the Newton step in (1.27) an n × n linear system must be solved. This represents further computational effort. Therefore in practice the simple basic Newton method is not recommended. To avoid the convergence difficulty use is made of a modified Newton method, in which a more direct search procedure is employed in the direction of the Newton step, so as to ensure descent to the minimum x*. The difficulty associated with the computation of the Hessian is addressed in practice through the systematic update, from iteration to iteration, of an approximation to the Hessian matrix. These improvements to the basic Newton method are dealt with in greater detail in the next chapter.

1.6 Exercises

1.6.1 Sketch the graphical solution to the following problem:

min_x f(x) = (x₁ − 2)² + (x₂ − 2)²

such that x₁ + 2x₂ = 4; x₁ ≥ 0; x₂ ≥ 0.

In particular indicate the feasible region F = {(x₁, x₂) | x₁ + 2x₂ = 4; x₁ ≥ 0; x₂ ≥ 0} and the solution point x*.

1.6.2 Show that x2 is a convex function.

1.6.3 Show that the sum of convex functions is also convex.

1.6.4 Determine the gradient vector and Hessian of the Rosenbrock function:

f(x) = 100(x₂ − x₁²)² + (1 − x₁)².

1.6.5 Write the quadratic function f(x) = x₁² + 2x₁x₂ + 3x₂² in the standard matrix-vector notation. Is f(x) positive-definite?

1.6.6 Write each of the following objective functions in standard form:

f(x) = ½xᵀAx + bᵀx + c,


(ii) f(x) = 5x₁² + 12x₁x₂ − 16x₁x₃ + 10x₂² − 26x₂x₃ + 17x₃² − 2x₁ − 4x₂ − 6x₃.

(iii) f(x) = x₁² − 4x₁x₂ + 6x₁x₃ + 5x₂² − 10x₂x₃ + 8x₃².

1.6.7 Determine the definiteness of the following quadratic form:

f(x) = x₁² − 4x₁x₂ + 6x₁x₃ + 5x₂² − 10x₂x₃ + 8x₃².


Chapter 2

LINE SEARCH DESCENT METHODS FOR UNCONSTRAINED MINIMIZATION

2.1 General line search descent algorithm for unconstrained minimization

Over the last 40 years many powerful direct search algorithms have been developed for the unconstrained minimization of general functions. These algorithms require an initial estimate of the optimum point, denoted by x⁰. With this estimate as starting point, the algorithm generates a sequence of estimates x⁰, x¹, x², ..., by successively searching directly from each point in a direction of descent to determine the next point. The process is terminated if either no further progress is made, or if a point xᵏ is reached (for smooth functions) at which the first necessary condition in (1.24), i.e. ∇f(x) = 0, is sufficiently accurately satisfied, in which case x* ≅ xᵏ. It is usually, although not always, required that the function value at the new iterate xⁱ⁺¹ be lower than that at xⁱ.


An important sub-class of direct search methods, specifically suitable for smooth functions, is the class of so-called line search descent methods. Basic to these methods is the selection of a descent direction uⁱ⁺¹ at each iterate xⁱ that ensures descent at xⁱ in the direction uⁱ⁺¹, i.e. it is required that the directional derivative in the direction uⁱ⁺¹ be negative:

dF(0)/dλ = ∇ᵀf(xⁱ)uⁱ⁺¹ < 0.   (2.1)

The general structure of such descent methods is given below.

2.1.1 General structure of a line search descent method

1. Given starting point x⁰ and positive tolerances ε₁, ε₂ and ε₃, set i = 1.

2. Select a descent direction uⁱ (see descent condition (2.1)).

3. Perform a one-dimensional line search in direction uⁱ, i.e.

min_λ F(λ) = min_λ f(xⁱ⁻¹ + λuⁱ)

to give minimizer λᵢ.

4. Set xⁱ = xⁱ⁻¹ + λᵢuⁱ.

5. Test for convergence: if ‖xⁱ − xⁱ⁻¹‖ < ε₁, or ‖∇f(xⁱ)‖ < ε₂, or |f(xⁱ) − f(xⁱ⁻¹)| < ε₃, then stop and x* ≅ xⁱ; else go to Step 6.

6. Set i = i + 1 and go to Step 2.

In testing for termination in Step 5, a combination of the stated termination criteria may be used, i.e. instead of or, and may be specified. The structure of the above descent algorithm is depicted in Figure 2.1.

Different descent methods within the above sub-class differ according to the way in which the descent directions uⁱ are chosen. Another important consideration is the method by means of which the one-dimensional line search is performed.
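The general structure above can be sketched with the classical steepest descent choice uⁱ = −∇f(xⁱ⁻¹) and a crude sampled line search standing in for the one-dimensional methods of Section 2.2 (an illustrative sketch; the quadratic test function is invented):

```python
def line_search(F, lo=0.0, hi=1.0, samples=200):
    """Approximate argmin of F on [lo, hi] by sampling (stand-in for a
    proper one-dimensional method such as golden section)."""
    return min((lo + k * (hi - lo) / samples for k in range(samples + 1)),
               key=F)

def steepest_descent(f, grad, x0, eps=1e-8, max_iter=500):
    """Line search descent with u = -grad f; stop when ||grad f|| < eps."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < eps:   # convergence test
            break
        u = [-gi for gi in g]                        # descent direction
        lam = line_search(lambda t: f([xi + t * ui for xi, ui in zip(x, u)]))
        x = [xi + lam * ui for xi, ui in zip(x, u)]  # x^i = x^{i-1} + lam u^i
    return x

# Hypothetical test problem: minimum of (x1 - 1)^2 + 2(x2 + 1)^2 at (1, -1).
x_min = steepest_descent(lambda x: (x[0] - 1)**2 + 2 * (x[1] + 1)**2,
                         lambda x: [2 * (x[0] - 1), 4 * (x[1] + 1)],
                         [0.0, 0.0])
```

Swapping in a different rule for u (e.g. conjugate or Newton directions) changes only the line marked "descent direction", which is exactly the sense in which the methods of this chapter differ.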


Figure 2.1: Sequence of line search descent directions and steps

2.2 One-dimensional line search

Clearly, in implementing descent algorithms of the above type, the one-dimensional minimization problem

min_λ F(λ)

is an important sub-problem. Here the minimizer is denoted by λ*, i.e.

F(λ*) = min_λ F(λ).

Many one-dimensional minimization techniques have been proposed and developed over the years. These methods differ according to whether they are to be applied to smooth functions or to poorly conditioned functions. For smooth functions, interpolation methods, such as the quadratic interpolation method of Powell (1964) and the cubic interpolation algorithm of Davidon (1959), are the most efficient and accurate. For poorly conditioned functions, bracketing methods are preferred, such as the Fibonacci search method (Kiefer, 1957), which is optimal with respect to the number of function evaluations required for a prescribed accuracy, and the golden section method (Walsh, 1975), which is near optimal but much simpler and easier to implement. Here Powell's quadratic interpolation method and the golden section method are presented as representative of these two different approaches to one-dimensional minimization.



Figure 2.2: Unimodal function F(λ) over the interval [a, b]

2.2.1 Golden section method

It is assumed that F(λ) is unimodal over the interval [a, b], i.e. that it has a minimum λ* within the interval and that F(λ) is strictly descending for λ < λ* and strictly ascending for λ > λ*, as shown in Figure 2.2.

Note that if F(λ) is unimodal over [a, b] with λ* in [a, b], then to determine a sub-unimodal interval, at least two evaluations of F(λ) in [a, b] must be made, as indicated in Figure 2.2.

If F(λ_2) > F(λ_1) ⇒ the new unimodal interval is [a, λ_2]; set b = λ_2 and select a new λ_2. Otherwise the new unimodal interval is [λ_1, b]; set a = λ_1 and select a new λ_1.

Thus, the unimodal interval may successively be reduced by inspecting the values of F(λ_1) and F(λ_2) at the interior points λ_1 and λ_2.

The question arises: How can λ_1 and λ_2 be chosen in the most economic manner, i.e. such that the least number of function evaluations is required for a prescribed accuracy (i.e. for a specified uncertainty interval)? The most economic method is the Fibonacci search method. It is, however, a complicated method. A near optimal and more straightforward method is the golden section method, which is a limiting form of the Fibonacci search method. Use is made of the golden ratio r when selecting the values for λ_1 and λ_2 within the unimodal interval. The value of r corresponds to the positive root of the quadratic equation r² + r − 1 = 0, thus r = (√5 − 1)/2 = 0.618034….


Figure 2.3: Selection of interior points λ_1 and λ_2 for the golden section search

The details of the selection procedure are as follows. Given an initial unimodal interval [a, b] of length L_0, choose interior points λ_1 and λ_2 as shown in Figure 2.3.

Then, if F(λ_1) > F(λ_2) ⇒ the new [a, b] = [λ_1, b] with new interval length L_1 = rL_0, and

if F(λ_2) > F(λ_1) ⇒ the new [a, b] = [a, λ_2], also with L_1 = rL_0.

The detailed formal algorithm is stated below.

2.2.1.1 Basic golden section algorithm

Given interval [a, b] and prescribed accuracy ε; set i = 0 and L_0 = b − a, and perform the following steps:

1. Set λ_1 = a + r²L_0; λ_2 = a + rL_0.

2. Compute F(λ_1) and F(λ_2); set i = i + 1.

3. If F(λ_1) > F(λ_2) then

set a = λ_1; λ_1 = λ_2; L_i = b − a; and λ_2 = a + rL_i,

else

set b = λ_2; λ_2 = λ_1; L_i = b − a; and λ_1 = a + r²L_i.

4. If L_i < ε then

set λ* = (a + b)/2, compute F(λ*) and STOP,

else go to Step 2.
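The basic golden section algorithm above can be sketched in code as follows (a minimal illustration; the function and variable names are assumptions, not from the text):

```python
import math

r = (math.sqrt(5) - 1) / 2      # golden ratio: positive root of r^2 + r - 1 = 0

def golden_section(F, a, b, eps=1e-4):
    """Reduce the unimodal interval [a, b] until its length is below eps."""
    L = b - a
    lam1, lam2 = a + r**2 * L, a + r * L
    F1, F2 = F(lam1), F(lam2)
    while (b - a) > eps:
        if F1 > F2:                       # minimum lies in [lam1, b]
            a, lam1, F1 = lam1, lam2, F2
            lam2 = a + r * (b - a)
            F2 = F(lam2)
        else:                             # minimum lies in [a, lam2]
            b, lam2, F2 = lam2, lam1, F1
            lam1 = a + r**2 * (b - a)
            F1 = F(lam1)
    return 0.5 * (a + b)                  # midpoint of the final interval

# Exercise (i) of Section 2.2.3: minimize F(lam) = lam^2 + 2 e^{-lam} on [0, 2]
lam_star = golden_section(lambda lam: lam**2 + 2*math.exp(-lam), 0.0, 2.0, 0.01)
```

Note that after each interval reduction the surviving interior point already sits at the correct golden-ratio position of the new interval, so only one new function evaluation is needed per iteration.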


Figure 2.4: Approximate minimum λ_m via quadratic interpolation

2.2.2 Powell's quadratic interpolation algorithm

In Powell's method successive quadratic interpolation curves are fitted to function data, giving a sequence of approximations to the minimum point λ*.

With reference to Figure 2.4, the basic idea is the following. Given three data points {(λ_i, F(λ_i)), i = 0, 1, 2}, the interpolating quadratic polynomial p_2(λ) through these points is given by

p_2(λ) = F(λ_0) + F[λ_0, λ_1](λ − λ_0) + F[λ_0, λ_1, λ_2](λ − λ_0)(λ − λ_1),

where F[ , ] and F[ , , ] respectively denote the first order and second order divided differences.

The turning point of p_2(λ) occurs where the slope is zero, i.e. where

dp_2/dλ = F[λ_0, λ_1] + F[λ_0, λ_1, λ_2](2λ − λ_0 − λ_1) = 0,

which gives the turning point λ_m as

λ_m = (λ_0 + λ_1)/2 − F[λ_0, λ_1]/(2F[λ_0, λ_1, λ_2]),     (2.4)

with the further condition that for a minimum the second derivative must be non-negative, i.e. F[λ_0, λ_1, λ_2] > 0.

The detailed formal algorithm is as follows.


2.2.2.1 Powell's interpolation algorithm

Given starting point λ_0, stepsize h, tolerance ε and maximum stepsize H; perform the following steps:

1. Compute F(λ_0) and F(λ_0 + h).

2. If F(λ_0) < F(λ_0 + h) evaluate F(λ_0 − h),

else evaluate F(λ_0 + 2h). (The three initial values of λ so chosen constitute the initial set (λ_0, λ_1, λ_2) with corresponding function values F(λ_i), i = 0, 1, 2.)

3. Compute the turning point λ_m by formula (2.4) and test for minimum or maximum.

4. If λ_m is a minimum point and |λ_m − λ_n| > H, where λ_n is the nearest point to λ_m, then discard the point furthest from λ_m and take a step of size H from the point with lowest value in the direction of descent, and go to Step 3;

if λ_m is a maximum point, then discard the point nearest to λ_m and take a step of size H from the point with lowest value in the direction of descent, and go to Step 3;

else continue.

5. If |λ_m − λ_n| < ε then F(λ*) ≅ min[F(λ_m), F(λ_n)] and STOP,

else continue.

6. Discard the point with highest F value and replace it by λ_m; go to Step 3.

Note: It is always safer to compute the next turning point by interpolation rather than by extrapolation. Therefore in Step 6: if the maximum value of F corresponds to a point which lies alone on one side of λ_m, then rather discard the point with the highest value on the other side of λ_m.
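A single quadratic-interpolation step via the turning-point formula (2.4) can be sketched as follows (an illustrative sketch; the function names are assumptions). For a quadratic F the very first interpolation is already exact:

```python
def turning_point(l0, l1, l2, F):
    """Turning point lam_m of the quadratic through three data points,
    computed via first and second order divided differences (formula 2.4)."""
    d1 = (F(l1) - F(l0)) / (l1 - l0)                      # F[l0, l1]
    d2 = ((F(l2) - F(l1)) / (l2 - l1) - d1) / (l2 - l0)   # F[l0, l1, l2]
    assert d2 > 0, "second divided difference must be positive for a minimum"
    return 0.5 * (l0 + l1) - d1 / (2.0 * d2)

# Quadratic test function with minimum at lam = 1.3
F = lambda lam: (lam - 1.3)**2 + 0.5
lam_m = turning_point(0.0, 1.0, 2.0, F)   # -> 1.3 exactly
```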

2.2.3 Exercises

Apply the golden section method and Powell's method to the problems below. Compare their respective performances with regard to the number of function evaluations required to attain the prescribed accuracies.

(i) minimize F(λ) = λ² + 2e^{−λ} over [0, 2] with ε = 0.01.

(ii) maximize F(λ) = λ cos λ over [0, π/2] with ε = 0.001.

(iii) minimize F(λ) = 4(λ − 7)/(λ² + λ − 2) over [−1.9, 0.9] by performing only 10 function evaluations.

(iv) minimize F(λ) = λ⁴ − 20λ³ + 0.1λ over [0, 20] to a prescribed accuracy ε.

2.3 First order line search descent methods

Line search descent methods (see Section 2.1.1) that use the gradient vector ∇f(x) to determine the search direction for each iteration are called first order methods, because they employ first order partial derivatives of f(x) to compute the search direction at the current iterate. The simplest and most famous of these methods is the method of steepest descent, first proposed by Cauchy in 1847.

2.3.1 The method of steepest descent

In this method the direction of steepest descent is used as the search direction in the line search descent algorithm given in Section 2.1.1. The expression for the direction of steepest descent is derived below.

2.3.1.1 The direction of steepest descent

At x′ we seek the unit vector u such that, for F(λ) = f(x′ + λu), the directional derivative

dF(λ)/dλ |_{λ=0} = ∇^T f(x′)u

assumes a minimum value with respect to all possible choices for the unit vector u at x′ (see Figure 2.5).


Figure 2.5: Search direction u relative to the gradient vector at x′

By Schwarz's inequality:

∇^T f(x′)u ≥ −‖∇f(x′)‖ ‖u‖ = −‖∇f(x′)‖ = least value.

Clearly, for the particular choice u = −∇f(x′)/‖∇f(x′)‖, the directional derivative at x′ is given by

∇^T f(x′)u = −‖∇f(x′)‖.

Thus this particular choice for the unit vector corresponds to the direction of steepest descent.

The search direction

u = −∇f(x)/‖∇f(x)‖

is called the normalized steepest descent direction at x.

2.3.1.2 Steepest descent algorithm

Given x^0, do for iteration i = 1, 2, … until convergence:

1. set u^i = −∇f(x^{i-1})/‖∇f(x^{i-1})‖,

2. set x^i = x^{i-1} + λ_i u^i, where λ_i is such that

F(λ_i) = f(x^{i-1} + λ_i u^i) = min_λ f(x^{i-1} + λu^i) (line search).
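The steepest descent algorithm above can be sketched as follows (a minimal sketch; the golden-section line search, the tolerances and the quadratic test function are illustrative assumptions, not from the text):

```python
import math

def line_search(F, lam_max=5.0, tol=1e-8):
    """Golden section search for the minimizer of F on [0, lam_max]."""
    r = (math.sqrt(5) - 1) / 2
    a, b = 0.0, lam_max
    while b - a > tol:
        l1, l2 = b - r*(b - a), a + r*(b - a)
        if F(l1) > F(l2):
            a = l1
        else:
            b = l2
    return 0.5*(a + b)

def steepest_descent(f, grad, x, tol=1e-5, max_iter=200):
    for _ in range(max_iter):
        g = grad(x)
        norm = math.sqrt(sum(gi*gi for gi in g))
        if norm < tol:                                 # convergence criterion
            break
        u = [-gi/norm for gi in g]                     # normalized steepest descent direction
        lam = line_search(lambda t: f([xi + t*ui for xi, ui in zip(x, u)]))
        x = [xi + lam*ui for xi, ui in zip(x, u)]
    return x

# Quadratic test function with minimum at [1, 3]
f = lambda x: (x[0] + 2*x[1] - 7)**2 + (2*x[0] + x[1] - 5)**2
grad = lambda x: [2*(x[0] + 2*x[1] - 7) + 4*(2*x[0] + x[1] - 5),
                  4*(x[0] + 2*x[1] - 7) + 2*(2*x[0] + x[1] - 5)]
x_star = steepest_descent(f, grad, [0.0, 0.0])
```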


Figure 2.6: Orthogonality of successive steepest descent search directions

2.3.1.3 Characteristic property

Successive steepest descent search directions can be shown to be orthogonal. Consider the line search through x^{i-1} in the direction u^i to give x^i. The condition for a minimum at λ_i, i.e. for optimal descent, is

dF(λ)/dλ = ∇^T f(x^i)u^i = 0,

and with u^{i+1} = −∇f(x^i)/‖∇f(x^i)‖ it follows that u^{i+1 T}u^i = 0, as shown in Figure 2.6.

2.3.1.4 Convergence criteria

In practice the algorithm is terminated if some convergence criterion is satisfied. Usually termination is enforced at iteration i if one, or a combination, of the following criteria is met:

‖x^i − x^{i-1}‖ < ε_1; |f(x^i) − f(x^{i-1})| < ε_2; ‖∇f(x^i)‖ < ε_3,

where ε_1, ε_2 and ε_3 are prescribed small positive tolerances.


Figure 2.7: Sensitivity of the finite difference approximation to δ_j

2.3.1.5 Gradients by finite differences

Often the components of the gradient vector are not analytically available, in which case they may be approximated by forward finite differences:

∂f(x)/∂x_j ≅ [f(x + δ_j) − f(x)]/δ_j,     (2.6)

where δ_j = [0, 0, …, δ_j, 0, …, 0]^T, δ_j > 0 in the j-th position.

Often δ_j ≡ δ for all j = 1, 2, …, n, with δ chosen suitably small. If, however, "numerical noise" is present in the computation of f(x), special care should be taken in selecting δ_j. This may require doing some numerical experiments such as, for example, determining the sensitivity of approximation (2.6) to δ_j for each j. Typically the sensitivity graph obtained is as depicted in Figure 2.7, and for the implementation of the optimization algorithm a value for δ_j should be chosen which corresponds to a point on the plateau shown in Figure 2.7. Better approximations, at of course greater computational expense, may be obtained through the use of central finite differences.
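The forward finite difference approximation (2.6) can be sketched as follows (an illustrative sketch; the value of delta and the test function are assumptions):

```python
def grad_forward(f, x, delta=1e-6):
    """Approximate the gradient of f at x by forward finite differences (2.6)."""
    f0 = f(x)
    g = []
    for j in range(len(x)):
        xp = list(x)
        xp[j] += delta                 # perturb the j-th component only
        g.append((f(xp) - f0) / delta)
    return g

f = lambda x: x[0]**2 + 3*x[0]*x[1]
g = grad_forward(f, [1.0, 2.0])        # exact gradient is [8, 3]
```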

2.3.2 Conjugate gradient methods

In spite of its local optimal descent property, the method of steepest descent often performs poorly, following a zigzagging path of ever decreasing steps. This results in slow convergence, which becomes extreme when the problem is poorly scaled, i.e. when the contours are extremely elongated. This poor performance is mainly due to the fact that the method enforces successive orthogonal search directions (see Section 2.3.1.3), as shown in Figure 2.8. Although, from a theoretical point of view, the method can be proved to be convergent, in practice it may not effectively converge within a finite number of steps. Depending on the starting point, this poor convergence also occurs when applying the method to positive-definite quadratic functions.

Figure 2.8: Orthogonal zigzagging behaviour of the steepest descent method

There is, however, a class of first order line search descent methods, known as conjugate gradient methods, for which it can be proved that, whatever the scaling, a method from this class will converge exactly in a finite number of iterations when applied to a positive-definite quadratic function, i.e. to a function of the form

f(x) = ½x^T Ax + b^T x + c,     (2.7)

where c ∈ ℝ, b is a real n-vector and A is a positive-definite n × n real symmetric matrix. Methods that have this property of quadratic termination are highly rated, because they are expected to also perform well on other non-quadratic functions in the neighbourhood of a local minimum. This is so because, by the Taylor expansion (1.21), many general differentiable functions approximate the form (2.7) near a local minimum.


2.3.2.1 Mutually conjugate directions

Two vectors u, v ≠ 0 are defined to be orthogonal if the scalar product u^T v = (u, v) = 0. The concept of mutual conjugacy may be defined in a similar manner. Two vectors u, v ≠ 0 are defined to be mutually conjugate with respect to the matrix A in (2.7) if u^T Av = (u, Av) = 0. Note that A is a positive-definite symmetric matrix.

It can also be shown (see Theorem 6.5.1 in Chapter 6) that if the vectors u^i, i = 1, 2, …, n are mutually conjugate, then they form a basis in ℝⁿ, i.e. any x ∈ ℝⁿ may be expressed as

x = Σ_{i=1}^{n} λ_i u^i,     (2.8)

where

λ_i = (u^i, Ax)/(u^i, Au^i).     (2.9)

2.3.2.2 Convergence theorem for mutually conjugate direc- tions

Suppose u^i, i = 1, 2, …, n are mutually conjugate with respect to the positive-definite matrix A; then the optimal line search descent method of Section 2.1.1, using the u^i as search directions, converges to the unique minimum x* of f(x) = ½x^T Ax + b^T x + c in at most n steps.

Proof:

If x^0 is the starting point, then after i iterations:

x^i = x^{i-1} + λ_i u^i = x^{i-2} + λ_{i-1}u^{i-1} + λ_i u^i = ⋯ = x^0 + Σ_{j=1}^{i} λ_j u^j.     (2.10)

The condition for optimal descent at iteration i is

u^{iT}∇f(x^i) = u^{iT}(Ax^i + b) = 0,


because u^i, i = 1, 2, …, n are mutually conjugate, u^{iT}Ax^i = u^{iT}Ax^0 + λ_i u^{iT}Au^i, and thus

λ_i = −u^{iT}(Ax^0 + b)/(u^{iT}Au^i).     (2.11)

Substituting (2.11) into (2.10) above gives

x^n = x^0 − Σ_{j=1}^{n} [u^{jT}(Ax^0 + b)/(u^{jT}Au^j)] u^j.

Now by utilizing (2.8) and (2.9), applied to the vector x^0 − x* = x^0 + A^{-1}b, it follows that

x^n = x^0 − (x^0 − x*) = x*.

The implication of the above theorem for the case n = 2, where mutually conjugate line search directions u^1 and u^2 are used, is depicted in Figure 2.9.

2.3.2.3 Determination of mutually conjugate directions

How can mutually conjugate search directions be found? One way is to determine all the eigenvectors u^i, i = 1, 2, …, n of A. For A positive-definite, all the eigenvectors are mutually orthogonal, and since Au^i = μ_i u^i, where μ_i is the associated eigenvalue, it follows directly for all i ≠ j that (u^i, Au^j) = (u^i, μ_j u^j) = μ_j(u^i, u^j) = 0, i.e. the eigenvectors are mutually conjugate with respect to A. It is, however, not very practical to determine mutually conjugate directions by finding all the eigenvectors of A, since the latter task in itself represents a computational problem of magnitude equal to that of solving the original unconstrained optimization problem via any other numerical algorithm. An easier way of obtaining mutually conjugate directions is by means of the Fletcher-Reeves formulae (Fletcher and Reeves, 1964).
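The eigenvector argument above can be checked numerically on a small example (an illustrative sketch with an assumed 2 × 2 matrix):

```python
# Positive-definite symmetric A with eigenvalues 2 and 4
A = [[3.0, 1.0],
     [1.0, 3.0]]

u = [1.0, -1.0]                         # eigenvector for eigenvalue 2
v = [1.0,  1.0]                         # eigenvector for eigenvalue 4

def matvec(M, x):
    return [sum(M[i][j]*x[j] for j in range(len(x))) for i in range(len(M))]

def dot(x, y):
    return sum(xi*yi for xi, yi in zip(x, y))

orthogonal = dot(u, v)                  # (u, v)  -> 0: mutually orthogonal
conjugate  = dot(u, matvec(A, v))       # (u, Av) -> 0: mutually conjugate w.r.t. A
```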


Figure 2.9: Quadratic termination of the conjugate gradient method in two steps for the case n = 2

2.3.2.4 The Fletcher-Reeves directions

The Fletcher-Reeves directions u^i, i = 1, 2, …, n, listed below, can be shown (see Theorem 6.5.3 in Chapter 6) to be mutually conjugate with respect to the matrix A in the expression for the quadratic function in (2.7) (for which ∇f(x) = Ax + b). The explicit directions are:

u^1 = −∇f(x^0),

and for i = 1, 2, …, n − 1:

u^{i+1} = −∇f(x^i) + β_i u^i,     (2.12)

where x^i = x^{i-1} + λ_i u^i, λ_i corresponds to the optimal descent step in iteration i, and

β_i = ‖∇f(x^i)‖²/‖∇f(x^{i-1})‖².     (2.13)

The Polak-Ribiere directions are obtained if, instead of using (2.13), β_i is computed using

β_i = (∇f(x^i) − ∇f(x^{i-1}))^T ∇f(x^i)/‖∇f(x^{i-1})‖².     (2.14)

If f(x) is quadratic, it can be shown (Fletcher, 1987) that (2.14) is equivalent to (2.13).


2.3.2.5 Formal Fletcher-Reeves conjugate gradient algorithm for general functions

Given x^0, perform the following steps:

1. Compute ∇f(x^0) and set u^1 = −∇f(x^0).

2. For i = 1, 2, …, n do:

2.1 set x^i = x^{i-1} + λ_i u^i, where λ_i is such that

f(x^{i-1} + λ_i u^i) = min_λ f(x^{i-1} + λu^i) (line search),

2.2 compute ∇f(x^i),

2.3 if the convergence criteria are satisfied, then STOP and x* ≅ x^i, else go to Step 2.4,

2.4 if 1 ≤ i ≤ n − 1, set u^{i+1} = −∇f(x^i) + β_i u^i, with β_i given by (2.13).

3. Set x^0 = x^n and go to Step 2 (restart).

If β_i is computed by (2.14) instead of (2.13), the method is known as the Polak-Ribiere method.
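For a quadratic function the algorithm above can be sketched compactly, since the optimal line search step has the closed form λ_i = −g^T u/(u^T Au) (a minimal sketch; the closed-form step replaces the general line search, and the driver example is the one of Section 2.3.2.6):

```python
def cg_fletcher_reeves(A, b, x):
    """Minimize f(x) = 1/2 x^T A x + b^T x by n Fletcher-Reeves steps."""
    n = len(b)
    grad = lambda z: [sum(A[i][j]*z[j] for j in range(n)) + b[i] for i in range(n)]
    g = grad(x)
    u = [-gi for gi in g]                               # u^1 = -grad f(x^0)
    for _ in range(n):
        if max(abs(gi) for gi in g) < 1e-12:
            break                                       # already converged
        Au = [sum(A[i][j]*u[j] for j in range(n)) for i in range(n)]
        lam = -sum(gi*ui for gi, ui in zip(g, u)) / sum(ui*aui for ui, aui in zip(u, Au))
        x = [xi + lam*ui for xi, ui in zip(x, u)]       # exact line search step
        g_new = grad(x)
        beta = sum(gi*gi for gi in g_new) / sum(gi*gi for gi in g)   # formula (2.13)
        u = [-gi + beta*ui for gi, ui in zip(g_new, u)]
        g = g_new
    return x

# Illustrative example of Section 2.3.2.6: f = 1/2 x1^2 + x1 x2 + x2^2,
# i.e. A = [[1, 1], [1, 2]] and b = 0, starting from x^0 = [10, -5]
x = cg_fletcher_reeves([[1.0, 1.0], [1.0, 2.0]], [0.0, 0.0], [10.0, -5.0])
```

Quadratic termination is visible here: the minimum x* = [0, 0] is reached in exactly n = 2 steps.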

2.3.2.6 Simple illustrative example

Apply the Fletcher-Reeves method to minimize

f(x) = ½x_1² + x_1x_2 + x_2² with x^0 = [10, −5]^T.

Solution:

Iteration 1:

∇f(x) = [x_1 + x_2, x_1 + 2x_2]^T and therefore u^1 = −∇f(x^0) = [−5, 0]^T.


x^1 = x^0 + λu^1 = [10 − 5λ, −5]^T and

F(λ) = f(x^0 + λu^1) = ½(10 − 5λ)² + (10 − 5λ)(−5) + 25.

For optimal descent

dF(λ)/dλ = −5(10 − 5λ) + 25 = 0 (line search).

This gives λ_1 = 1, x^1 = [5, −5]^T and ∇f(x^1) = [0, −5]^T, so that β_1 = ‖∇f(x^1)‖²/‖∇f(x^0)‖² = 25/25 = 1 and u^2 = −∇f(x^1) + β_1u^1 = [−5, 5]^T.

Iteration 2:

x^2 = x^1 + λu^2 = [5, −5]^T + λ[−5, 5]^T = [5(1 − λ), −5(1 − λ)]^T and

F(λ) = f(x^1 + λu^2) = ½[25(1 − λ)² − 50(1 − λ)² + 50(1 − λ)²].

Again for optimal descent

dF(λ)/dλ = −25(1 − λ) = 0 (line search).

This gives λ_2 = 1, x^2 = [0, 0]^T and ∇f(x^2) = [0, 0]^T. Therefore STOP.

The two iteration steps are shown in Figure 2.10.

2.4 Second order line search descent methods

These methods are based on Newton's method (see Section 1.5.4.1) for solving ∇f(x) = 0 iteratively: given x^0, then

x^i = x^{i-1} − H^{-1}(x^{i-1})∇f(x^{i-1}), i = 1, 2, …     (2.15)

As stated in Chapter 1, the main characteristics of this method are:

Figure 2.10: Convergence of the Fletcher-Reeves method for the illustrative example

1. In the neighbourhood of the solution it may converge very fast; in fact it has the very desirable property of being quadratically convergent if it converges. Unfortunately convergence is not guaranteed and the method may sometimes diverge, even from close to the solution.

2. The implementation of the method requires that H(x) be evaluated at each step.

3. To obtain the Newton step Δ = x^i − x^{i-1}, it is also necessary to solve an n × n linear system H(x)Δ = −∇f(x) at each iteration. This is computationally very expensive for large n, since of the order of n³ multiplications are required to solve the system numerically.

2.4.1 Modified Newton's method

To avoid the problem of convergence (point 1 above), the computed Newton step Δ is rather used as a search direction in the general line search descent algorithm given in Section 2.1.1. Thus at iteration i: select u^i = Δ = −H^{-1}(x^{i-1})∇f(x^{i-1}), and minimize in that direction to obtain a λ_i such that

f(x^{i-1} + λ_i u^i) = min_λ f(x^{i-1} + λu^i)

and then set x^i = x^{i-1} + λ_i u^i.
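For n = 2 the modified Newton method can be sketched as below (an illustrative sketch: the 2 × 2 solver, the backtracking step-halving in place of an exact line search, and the Rosenbrock test function are all assumptions, not the book's code):

```python
def solve2(H, rhs):
    """Solve the 2x2 system H u = rhs by Cramer's rule."""
    det = H[0][0]*H[1][1] - H[0][1]*H[1][0]
    return [(rhs[0]*H[1][1] - rhs[1]*H[0][1]) / det,
            (H[0][0]*rhs[1] - H[1][0]*rhs[0]) / det]

def modified_newton(f, grad, hess, x, tol=1e-10, max_iter=100):
    for _ in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < tol:
            break
        u = solve2(hess(x), [-gi for gi in g])     # Newton step as search direction
        lam = 1.0                                  # crude step-halving "line search"
        while f([xi + lam*ui for xi, ui in zip(x, u)]) > f(x) and lam > 1e-12:
            lam *= 0.5
        x = [xi + lam*ui for xi, ui in zip(x, u)]
    return x

f = lambda x: 100*(x[1] - x[0]**2)**2 + (1 - x[0])**2      # Rosenbrock's valley
grad = lambda x: [-400*x[0]*(x[1] - x[0]**2) - 2*(1 - x[0]),
                  200*(x[1] - x[0]**2)]
hess = lambda x: [[-400*(x[1] - 3*x[0]**2) + 2, -400*x[0]],
                  [-400*x[0], 200.0]]
x_star = modified_newton(f, grad, hess, [-1.2, 1.0])
```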


2.4.2 Quasi-Newton methods

To avoid the above mentioned computational problems (points 2 and 3), methods have been developed in which approximations to H^{-1} are applied at each iteration. Starting with an approximation G_0 to H^{-1} for the first iteration, the approximation is updated after each line search. An example of such a method is the Davidon-Fletcher-Powell (DFP) method.

2.4.2.1 DFP quasi-Newton method

The structure of this (rank-1 update) method (Fletcher, 1987) is as follows.

1. Choose x^0 and set G_0 = I.

2. Do for iteration i = 1,2, . . . , n:

2.1 set x^i = x^{i-1} + λ_i u^i, where u^i = −G_{i-1}∇f(x^{i-1}) and λ_i is such that f(x^{i-1} + λ_i u^i) = min_λ f(x^{i-1} + λu^i), λ_i ≥ 0 (line search),

2.2 if the stopping criteria are satisfied then STOP, x* ≅ x^i,

2.3 set v^i = λ_i u^i and

set y^i = ∇f(x^i) − ∇f(x^{i-1}),

2.4 set G_i = G_{i-1} + A_i + B_i (rank-1 update),     (2.16)

where A_i = v^i v^{iT}/(v^{iT}y^i) and B_i = −G_{i-1}y^i(G_{i-1}y^i)^T/(y^{iT}G_{i-1}y^i).

3. Set x^0 = x^n; G_0 = G_n (or G_0 = I), and go to Step 2 (restart).
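The DFP structure above can be sketched for a quadratic function as follows (a minimal sketch; the exact line search is replaced by the closed-form optimal step for a quadratic, and the example matrix and vectors are assumptions):

```python
def dfp(A, b, x):
    """Minimize f = 1/2 x^T A x + b^T x by the DFP structure above."""
    n = len(b)
    mat = lambda M, z: [sum(M[i][j]*z[j] for j in range(n)) for i in range(n)]
    dot = lambda p, q: sum(pi*qi for pi, qi in zip(p, q))
    grad = lambda z: [gi + bi for gi, bi in zip(mat(A, z), b)]
    G = [[float(i == j) for j in range(n)] for i in range(n)]   # G_0 = I
    g = grad(x)
    for _ in range(n):
        if max(abs(gi) for gi in g) < 1e-12:
            break
        u = [-gi for gi in mat(G, g)]                # u^i = -G_{i-1} grad f
        lam = -dot(g, u) / dot(u, mat(A, u))         # exact step for a quadratic
        v = [lam*ui for ui in u]
        x = [xi + vi for xi, vi in zip(x, v)]
        g_new = grad(x)
        y = [gn - go for gn, go in zip(g_new, g)]
        Gy, vy, yGy = mat(G, y), dot(v, y), dot(y, mat(G, y))
        for i in range(n):
            for j in range(n):
                G[i][j] += v[i]*v[j]/vy - Gy[i]*Gy[j]/yGy    # update (2.16)
        g = g_new
    return x, G

# Quadratic with A = [[2, 1], [1, 2]] and b = [-4, -1]: the minimum is at
# A^{-1}[4, 1] = [7/3, -2/3]; after n = 2 steps G should equal A^{-1}
x, G = dfp([[2.0, 1.0], [1.0, 2.0]], [-4.0, -1.0], [0.0, 0.0])
```

This illustrates properties 4 and 5 above: quadratic termination in n steps and G_n = A^{-1}.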

2.4.2.2 Characteris t ics of DFP method

1. The method does not require the evaluation of H or the explicit solution of a linear system.


2. If G_{i-1} is positive-definite then so is G_i (see Theorem 6.6.1).

3. If G_i is positive-definite then descent is ensured at x^i, because

dF(λ)/dλ |_{λ=0} = ∇^T f(x^i)u^{i+1} = −∇^T f(x^i)G_i∇f(x^i) < 0, for all ∇f(x^i) ≠ 0.

4. The directions u^i, i = 1, 2, …, n are mutually conjugate for a quadratic function with A positive-definite (see Theorem 6.6.2). The method therefore possesses the desirable property of quadratic termination (see Section 2.3.2).

5. For quadratic functions: G_n = A^{-1} (see again Theorem 6.6.2).

2.4.2.3 The BFGS method

The state-of-the-art quasi-Newton method is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, developed during the early 1970s (see Fletcher, 1987). This method uses a more complicated rank-2 update formula for H^{-1}. For this method the update formula to be used in Step 2.4 of the algorithm given in Section 2.4.2.1 becomes

G_i = G_{i-1} + [1 + y^{iT}G_{i-1}y^i/(v^{iT}y^i)] v^i v^{iT}/(v^{iT}y^i) − [v^i y^{iT}G_{i-1} + G_{i-1}y^i v^{iT}]/(v^{iT}y^i).     (2.17)
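The BFGS update of the approximate inverse Hessian can be sketched as follows (an illustrative sketch consistent with the rank-2 update stated above; the vectors used in the check are arbitrary assumptions). Any quasi-Newton inverse update must satisfy the secant condition G_i y^i = v^i, which the sketch verifies numerically:

```python
def bfgs_update(G, v, y):
    """Apply the BFGS rank-2 update to the inverse-Hessian approximation G."""
    n = len(v)
    mat = lambda M, z: [sum(M[i][j]*z[j] for j in range(n)) for i in range(n)]
    dot = lambda p, q: sum(pi*qi for pi, qi in zip(p, q))
    Gy = mat(G, y)
    vy = dot(v, y)
    c = (1.0 + dot(y, Gy)/vy) / vy
    return [[G[i][j] + c*v[i]*v[j] - (v[i]*Gy[j] + Gy[i]*v[j])/vy
             for j in range(n)] for i in range(n)]

G = [[1.0, 0.0], [0.0, 1.0]]
v, y = [1.0, 0.5], [2.0, 1.5]
Gn = bfgs_update(G, v, y)
# Secant condition: the updated Gn maps y back onto the step v
Gy = [sum(Gn[i][j]*y[j] for j in range(2)) for i in range(2)]
```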

2.5 Zero order methods and computer optimization subroutines

This chapter would not be complete without mentioning something about the large number of so-called zero order methods that have been developed. These methods are called such because they use neither first order nor second order derivative information, but only function values, i.e. only zero order derivative information.


Zero order methods are among the earliest methods, and many of them are based on rough and ready ideas without much theoretical background. Although these ad hoc methods are, as one may expect, much slower and computationally much more expensive than the higher order methods, they are usually reliable and easy to program. One of the most successful of these methods is the simplex method of Nelder and Mead (1965). This method should not be confused with the simplex method for linear programming. Another very powerful and popular method that only uses function values is the multi-variable method of Powell (1964). This method generates mutually conjugate directions by performing sequences of line searches in which only function evaluations are used. For this method Theorem 2.3.2.2 applies, and the method therefore possesses the property of quadratic termination.

Amongst the more recently proposed and modern zero order methods, the method of simulated annealing and the so-called genetic algorithms (GAs) are the most prominent (see, for example, Haftka and Gürdal, 1992).

Computer programs are commercially available for all the unconstrained optimization methods presented in this chapter. Most of the algorithms may, for example, be found in the Matlab Optimization Toolbox and in the IMSL and NAG mathematical subroutine libraries.

2.6 Test functions

The efficiency of an algorithm is studied using standard functions with standard starting points x^0. The total number of function evaluations required to find x* is usually taken as a measure of the efficiency of the algorithm. Some classical test functions (Rao, 1996) are listed below.

1. Rosenbrock's parabolic valley:

f(x) = 100(x_2 − x_1²)² + (1 − x_1)²; x^0 = [−1.2, 1]^T; x* = [1, 1]^T.

2. Quadratic function:

f(x) = (x_1 + 2x_2 − 7)² + (2x_1 + x_2 − 5)²; x* = [1, 3]^T.


3. Powell's quartic function:

f(x) = (x_1 + 10x_2)² + 5(x_3 − x_4)² + (x_2 − 2x_3)⁴ + 10(x_1 − x_4)⁴;

x^0 = [3, −1, 0, 1]^T; x* = [0, 0, 0, 0]^T.

4. Fletcher and Powell's helical valley:

f(x) = 100[(x_3 − 10θ(x_1, x_2))² + (√(x_1² + x_2²) − 1)²] + x_3²,

where 2πθ(x_1, x_2) = arctan(x_2/x_1) if x_1 > 0, and π + arctan(x_2/x_1) if x_1 < 0;

x^0 = [−1, 0, 0]^T; x* = [1, 0, 0]^T.

5. A non-linear function of three variables:

6. Freudenstein and Roth function:

7. Powell's badly scaled function:

f(x) = (10⁴x_1x_2 − 1)² + (exp(−x_1) + exp(−x_2) − 1.0001)²;

x^0 = [0, 1]^T; x* = [1.098… × 10^{-5}, 9.106…]^T.

8. Brown's badly scaled function:

f(x) = (x_1 − 10⁶)² + (x_2 − 2 × 10^{-6})² + (x_1x_2 − 2)²;

x^0 = [1, 1]^T; x* = [10⁶, 2 × 10^{-6}]^T.


x^0 = [−3, −1, −3, −1]^T; x* = [1, 1, 1, 1]^T.

Beale's function:

f(x) = (1.5 − x_1(1 − x_2))² + (2.25 − x_1(1 − x_2²))² + (2.625 − x_1(1 − x_2³))²;

x^0 = [1, 1]^T; x* = [3, 0.5]^T.

Wood's function:

f(x) = 100(x_2 − x_1²)² + (1 − x_1)² + 90(x_4 − x_3²)² + (1 − x_3)² + 10(x_2 + x_4 − 2)² + 0.1(x_2 − x_4)².
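Two of the test functions above can be checked against their listed minimizers as follows (a quick illustrative sketch):

```python
rosenbrock = lambda x: 100*(x[1] - x[0]**2)**2 + (1 - x[0])**2
beale = lambda x: ((1.5 - x[0]*(1 - x[1]))**2
                   + (2.25 - x[0]*(1 - x[1]**2))**2
                   + (2.625 - x[0]*(1 - x[1]**3))**2)

# At the listed minimizers both functions attain their minimum value of 0
f_ros = rosenbrock([1.0, 1.0])     # x* = [1, 1]
f_beale = beale([3.0, 0.5])        # x* = [3, 0.5]
```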


Chapter 3

STANDARD METHODS FOR CONSTRAINED OPTIMIZATION

3.1 Penalty function methods for constrained minimization

Consider the general constrained optimization problem:

minimize_x f(x)

such that g_j(x) ≤ 0, j = 1, 2, …, m,
and h_j(x) = 0, j = 1, 2, …, r.     (3.1)

The simplest and most straightforward approach to handling constrained problems of the above form is to apply a suitable unconstrained optimization algorithm to a penalty function formulation of constrained problem (3.1).


3.1.1 The penalty function formulation

The penalty function formulation of the general constrained problem (3.1) is

minimize_x P(x, ρ, β),

where

P(x, ρ, β) = f(x) + Σ_{j=1}^{r} ρ_j h_j²(x) + Σ_{j=1}^{m} β_j g_j²(x),     (3.2)

and where the components of the penalty parameter vectors ρ and β are given by ρ_j ≫ 0, and β_j = 0 if g_j(x) ≤ 0, β_j ≫ 0 if g_j(x) > 0. The parameters ρ_j and β_j are called penalty parameters, and P(x, ρ, β) the penalty function. The solution to this unconstrained minimization problem is denoted by x*(ρ, β), where ρ and β denote the respective vectors of penalty parameters.

Often ρ_j ≡ ρ = constant for all j, and also β_j ≡ ρ for all j such that g_j(x) > 0. Thus P in (3.2) is denoted by P(x, ρ) and the corresponding minimum by x*(ρ). It can be shown that, under normal continuity conditions, lim_{ρ→∞} x*(ρ) = x*. Typically the overall penalty parameter ρ is set at ρ = 10⁴ if the constraint functions are normalized in some sense.

3.1.2 Illustrative examples

Consider the following two one-dimensional constrained optimization problems:

(a) min f(x) such that h(x) = x − a = 0,

then P(x) = f(x) + ρ(x − a)²;

and


Figure 3.1: Behaviour of the penalty function for one-dimensional (a) equality constrained minimization, where P(x) = f(x) + ρ(x − a)², and (b) inequality constrained minimization, where P(x) = f(x) + β(x − b)²

(b) min f(x) such that g(x) = x − b ≤ 0,

then P(x) = f(x) + β(x − b)².

The penalty function solutions to these two problems are as depicted in Figures 3.1 (a) and (b). The penalty function method falls in the class of exterior methods, because it converges to the solution from outside the feasible region.

3.1.3 Sequential unconstrained minimization technique (SUMT)

Unfortunately the penalty function method becomes unstable and inefficient for very large ρ if high accuracy is required. This is because rounding errors result in the computation of unreliable descent directions. If second order unconstrained minimization methods are used for the minimization of P(x, ρ), the associated Hessian matrices become ill-conditioned, and again the method is inclined to break down. A remedy is to apply the penalty function method to a sequence of sub-problems, starting with moderate penalty parameter values and successively increasing these values for the later sub-problems. The details of this approach (SUMT) are as follows.

SUMT algorithm:

1. Choose tolerances ε_1 and ε_2, a starting point x^0 and an initial overall penalty parameter value ρ_0, and set k = 0.

2. Minimize P(x, ρ_k) by any unconstrained optimization algorithm to give x*(ρ_k).

3. If (for k > 0) the convergence criteria are satisfied, STOP,

i.e. stop if ‖x*(ρ_k) − x*(ρ_{k-1})‖ < ε_1

and/or |P(x*(ρ_{k-1})) − P(x*(ρ_k))| < ε_2,

else

set ρ_{k+1} = cρ_k, c > 1 and x^0 = x*(ρ_k),

set k = k + 1 and go to Step 2.

Typically one chooses ρ_0 = 1 and c = 10.
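The SUMT loop can be sketched as follows on the simple example of the next section (an illustrative sketch: the crude pattern-search inner minimizer is a stand-in assumption, not one of the book's algorithms):

```python
def P(x, rho):
    """Penalty function for: min (x1+1)^3/3 + x2  s.t.  1 - x1 <= 0, -x2 <= 0."""
    f = (x[0] + 1)**3 / 3.0 + x[1]
    g1 = max(0.0, 1.0 - x[0])          # violated part of 1 - x1 <= 0
    g2 = max(0.0, -x[1])               # violated part of -x2 <= 0
    return f + rho*g1**2 + rho*g2**2

def minimize(F, x, step=1.0, tol=1e-9):
    """Crude pattern search: try +/- step on each coordinate, shrink step."""
    while step > tol:
        improved = False
        for j in range(len(x)):
            for s in (step, -step):
                y = list(x)
                y[j] += s
                if F(y) < F(x):
                    x, improved = y, True
        if not improved:
            step *= 0.5
    return x

x, rho = [0.0, 0.0], 1.0
for _ in range(8):                      # SUMT: rho = 1, 10, 100, ...
    x = minimize(lambda z: P(z, rho), x)
    rho *= 10.0
# x approaches the constrained minimum x* = [1, 0] as rho grows
```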

3.1.4 Simple example

Consider the constrained problem:

minimize f(x) = ⅓(x_1 + 1)³ + x_2

such that 1 − x_1 ≤ 0 and −x_2 ≤ 0.

The problem is depicted in Figure 3.2.

Define the penalty function

P(x, ρ) = ⅓(x_1 + 1)³ + x_2 + ρ(1 − x_1)² + ρx_2².

(Of course each ρ term only comes into play if the corresponding constraint is violated.) The penalty function solution may now be obtained analytically as follows.


Figure 3.2: Contour representation of objective function and feasible region for simple example

The first order necessary conditions for an unconstrained minimum of P are

∂P/∂x_1 = (x_1 + 1)² − 2ρ(1 − x_1) = 0,     (3.3)

∂P/∂x_2 = 1 + 2ρx_2 = 0.     (3.4)

From (3.4): x_2*(ρ) = −1/(2ρ),

and from (3.3): x_1² + 2(1 + ρ)x_1 + 1 − 2ρ = 0.

Solving the quadratic equation and taking the positive root gives

x_1*(ρ) = −(1 + ρ) + √(ρ² + 4ρ).

Clearly lim_{ρ→∞} x_2*(ρ) = 0 and lim_{ρ→∞} x_1*(ρ) = 1 (where use has been made of the expansion (1 + ε)^{1/2} = 1 + ½ε + ⋯), giving x* = [1, 0]^T, as one expects from Figure 3.2. Here the solution to the penalty function formulated problem has been obtained analytically via the first order necessary conditions. In general, of course, the solution is obtained numerically by applying a suitable unconstrained minimization algorithm.
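The analytic behaviour of x*(ρ) above can be checked numerically (a small sketch evaluating the closed-form roots derived from (3.3) and (3.4)):

```python
import math

def x_star(rho):
    # positive root of x^2 + 2(1 + rho)x + (1 - 2 rho) = 0, from (3.3)
    x1 = -(1 + rho) + math.sqrt((1 + rho)**2 - (1 - 2*rho))
    x2 = -1.0 / (2*rho)          # from (3.4): 1 + 2 rho x2 = 0
    return x1, x2

x1, x2 = x_star(1e6)             # approaches x* = [1, 0] as rho grows
```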

3.2 Classical methods for constrained optimization problems

3.2.1 Equality constrained problems and the Lagrangian function

Consider the equality constrained problem:

minimize f(x) such that h_j(x) = 0,  j = 1, 2, ..., r < n.   (3.5)

In 1760 Lagrange transformed this constrained problem to an unconstrained problem via the introduction of so-called Lagrange multipliers λ_j, j = 1, 2, ..., r in the formulation of the Lagrangian function:

L(x, λ) = f(x) + Σ_{j=1}^{r} λ_j h_j(x) = f(x) + λᵀh(x).   (3.6)

The necessary conditions for a constrained minimum of the above equality constrained problem may be stated in terms of the Lagrangian function and the Lagrange multipliers.

3.2.1.1 Necessary conditions for an equality constrained minimum

Let the functions f and h_j ∈ C¹. Then, on the assumption that the n × r Jacobian matrix [∂h(x*)/∂x] is of rank r, the necessary condition for x* to be a constrained internal local minimum of the equality constrained problem (3.5) is that x* corresponds to a stationary point (x*, λ*) of the Lagrangian function, i.e.


that a vector λ* exists such that

∇_x L(x*, λ*) = ∇f(x*) + Σ_{j=1}^{r} λ_j* ∇h_j(x*) = 0

∇_λ L(x*, λ*) = h(x*) = 0.   (3.7)

For a formal proof of the above, see Theorem 6.2.1.

3.2.1.2 The Lagrangian method

Note that the necessary conditions (3.7) represent n + r equations in the n + r unknowns x₁*, x₂*, ..., x_n*, λ₁*, ..., λ_r*. The solutions to these, in general non-linear, equations therefore give candidate solutions x* to problem (3.5). This indirect approach to solving the constrained problem is illustrated in solving the following simple example problem.

3.2.1.3 Example

minimize f(x) = (x₁ - 2)² + (x₂ - 2)² such that h(x) = x₁ + x₂ - 6 = 0.

First formulate the Lagrangian:

L(x, λ) = (x₁ - 2)² + (x₂ - 2)² + λ(x₁ + x₂ - 6).

By the Theorem in Section 3.2.1.1 the necessary conditions for a constrained minimum are

∂L/∂x₁ = 2(x₁ - 2) + λ = 0

∂L/∂x₂ = 2(x₂ - 2) + λ = 0

∂L/∂λ = x₁ + x₂ - 6 = 0.

Solving these equations gives a candidate point: x₁* = 3, x₂* = 3, λ* = -2 with f(x*) = 2. This solution is depicted in Figure 3.3.
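Since the necessary conditions of this example are linear, the candidate point can also be checked numerically by solving one 3 × 3 system, as the following NumPy sketch shows:

```python
# Stationarity conditions of the example, written as a linear system:
#   2(x1 - 2) + lam = 0
#   2(x2 - 2) + lam = 0
#   x1 + x2  - 6    = 0
import numpy as np

M = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([4.0, 4.0, 6.0])
x1, x2, lam = np.linalg.solve(M, rhs)
print(x1, x2, lam)   # x1 = x2 = 3, lam = -2, so f(x*) = 2
```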


Figure 3.3: Graphical solution to example problem 3.2.1.3

3.2.1.4 Sufficient conditions

In general the necessary conditions (3.7) are not sufficient to imply a constrained local minimum at x*. A more general treatment of the sufficiency conditions is, however, very complicated and will not be discussed here. It can, however, be shown that if f(x) and the h_j(x) are all convex functions, with λ_j* > 0 for all j, then conditions (3.7) indeed constitute sufficient conditions. In this case the local constrained minimum is unique and represents the global minimum.

3.2.1.5 Saddle point of the Lagrangian function

Assume

(i) f(x) has a constrained minimum at x* (with associated λ*), and

(ii) that if λ is chosen in the neighbourhood of λ*, then L(x, λ) has a local minimum with respect to x in the neighbourhood of x*. (This can be expected if the Hessian matrix of L with respect to x at (x*, λ*) is positive definite.)


Figure 3.4: Saddle point of the Lagrangian function L = x² + λ(x - 1)

It can be shown (see Theorem 6.2.2) that if (i) and (ii) apply, then L(x, λ) has a saddle point at (x*, λ*). Indeed it is a degenerate saddle point since

L(x, λ*) ≥ L(x*, λ*) = L(x*, λ).

Consider the example:

minimize f(x) = x² such that h(x) = x - 1 = 0.

With L = x² + λ(x - 1) it follows directly that x* = 1 and λ* = -2.

Since along the straight-line asymptotes through (x*, λ*), ΔL = 0 for changes Δx and Δλ, it follows that

Δx(Δx + Δλ) = 0.

The asymptotes therefore are the lines through (x*, λ*) with Δx = 0 and Δλ/Δx = -1 respectively, as shown in Figure 3.4, i.e. the lines x = 1 and λ = -x - 1.

In general, if it can be shown that a candidate point (x*, λ*) is a saddle point of L, then the Hessian of the Lagrangian with respect to x, H_L, at the saddle point is positive definite for a constrained minimum.


3.2.1.6 Special case: quadratic function with linear equality constraints

From a theoretical point of view, an important application of the Lagrangian method is to the minimization of the positive-definite quadratic function:

f(x) = ½xᵀAx + bᵀx + c   (3.8)

subject to the linear constraints

Cx = d.   (3.9)

Here A is an n × n positive-definite matrix and C an r × n constraint matrix, r < n; b is an n-vector and d an r-vector.

In this case the Lagrangian is

L(x, λ) = ½xᵀAx + bᵀx + c + λᵀ(Cx - d),

and the necessary conditions (3.7) for a constrained minimum at x* reduce to the existence of a vector λ* such that

∇_x L(x*, λ*) = Ax* + b + Cᵀλ* = 0

∇_λ L(x*, λ*) = Cx* - d = 0.

The solution to this linear system is given by

[x*; λ*] = [A  Cᵀ; C  0]⁻¹ [-b; d].
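A minimal numerical sketch of this block system follows. The data are illustrative, chosen so that the result reproduces the example of 3.2.1.3 (A = 2I and b = [-4, -4]ᵀ give f(x) = (x₁ - 2)² + (x₂ - 2)² up to a constant):

```python
# Solve the KKT block system [A C^T; C 0][x*; lam*] = [-b; d].
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 2.0]])   # positive-definite
b = np.array([-4.0, -4.0])
C = np.array([[1.0, 1.0]])               # one linear equality constraint
d = np.array([6.0])                      # x1 + x2 = 6

n, r = A.shape[0], C.shape[0]
K = np.block([[A, C.T], [C, np.zeros((r, r))]])
sol = np.linalg.solve(K, np.concatenate([-b, d]))
x_star, lam_star = sol[:n], sol[n:]
print(x_star, lam_star)   # x* = [3, 3], lam* = [-2]
```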

3.2.1.7 Inequality constraints as equality constraints

Consider the more general problem:

minimize f(x)

such that g_j(x) ≤ 0,  j = 1, 2, ..., m   (3.10)

h_j(x) = 0,  j = 1, 2, ..., r.


The inequality constraints may be transformed to equality constraints by the introduction of so-called auxiliary variables θ_j, j = 1, 2, ..., m:

g_j(x) + θ_j² = 0,  j = 1, 2, ..., m.   (3.11)

Since g_j(x) = -θ_j² ≤ 0 for all j, the inequality constraints are automatically satisfied.

The Lagrangian method for equality constrained problems may now be applied, where

L(x, θ, λ, μ) = f(x) + Σ_{j=1}^{m} λ_j (g_j(x) + θ_j²) + Σ_{j=1}^{r} μ_j h_j(x),

and λ_j and μ_j denote the respective Lagrange multipliers.

From (3.7) the associated necessary conditions for a minimum at x are

∂L/∂x_i = ∂f/∂x_i + Σ_{j=1}^{m} λ_j ∂g_j/∂x_i + Σ_{j=1}^{r} μ_j ∂h_j/∂x_i = 0,  i = 1, 2, ..., n

∂L/∂θ_j = 2λ_jθ_j = 0,  j = 1, 2, ..., m

∂L/∂λ_j = g_j(x) + θ_j² = 0,  j = 1, 2, ..., m

∂L/∂μ_j = h_j(x) = 0,  j = 1, 2, ..., r.   (3.12)

The above system (3.12) represents n + 2m + r simultaneous non-linear equations in the n + 2m + r unknowns x, θ, λ and μ. Obtaining the solutions to system (3.12) yields candidate solutions x* to the general optimization problem (3.10). The application of this approach is demonstrated in the following example.

3.2.1.8 Example

Minimize f(x) = 2x₁² - 3x₂² - 2x₁

such that x₁² + x₂² ≤ 1, by making use of auxiliary variables.


Introduce the auxiliary variable θ such that

x₁² + x₂² - 1 + θ² = 0;

then

L(x, θ, λ) = 2x₁² - 3x₂² - 2x₁ + λ(x₁² + x₂² - 1 + θ²).

The necessary conditions at the minimum are

∂L/∂x₁ = 4x₁ - 2 + 2λx₁ = 0   (3.13)

∂L/∂x₂ = -6x₂ + 2λx₂ = 0   (3.14)

∂L/∂θ = 2λθ = 0   (3.15)

∂L/∂λ = x₁² + x₂² - 1 + θ² = 0.   (3.16)

As a first choice in (3.15), select λ = 0, which gives x₁ = 1/2, x₂ = 0, and θ² = 3/4.

Since θ² > 0 the problem is unconstrained at this point. Further, since the Hessian

H = [4  0; 0  -6]

is non-definite, the candidate point x⁰ = [1/2, 0]ᵀ corresponds to a saddle point where f(x⁰) = -0.5.

Select as a second choice in (3.15)

θ = 0, which gives x₁² + x₂² - 1 = 0,   (3.17)

i.e. the constraint is active. From (3.14) it follows that for x₂ ≠ 0, λ = 3, and substituting into (3.13) gives x₁ = 1/5, and from (3.17): x₂ = ±√24/5 = ±0.980, which gives the two possibilities x* = (1/5, √24/5) and x* = (1/5, -√24/5), both with f(x*) = -3.2.
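The two candidate points can be checked numerically, as in the sketch below (the active-constraint root is recomputed from λ = 3):

```python
import math

def f(x1, x2):
    # objective of example 3.2.1.8
    return 2 * x1**2 - 3 * x2**2 - 2 * x1

# First choice (lambda = 0): unconstrained stationary point, a saddle of f
print(f(0.5, 0.0))            # -0.5

# Second choice (theta = 0): constraint active, lambda = 3 gives x1 = 1/5
x1 = 1.0 / 5.0
x2 = math.sqrt(1.0 - x1**2)   # = sqrt(24)/5, approx. 0.980
print(f(x1, x2), f(x1, -x2))  # both approx. -3.2
```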

3.2.1.9 Direction of asymptotes at a saddle point x0

If, in the example above, the direction of an asymptote at the saddle point x⁰ is denoted by the unit vector u = [u₁, u₂]ᵀ, then for a displacement


Figure 3.5: Directions of asymptotes at a saddle point x⁰

Δx = u along the asymptote the change in the function value is Δf = 0. It follows from the Taylor expansion that

Δf = ∇ᵀf(x⁰)u + ½uᵀHu = 0

for the step Δx = u at the saddle point x⁰. Since ∇f(x⁰) = 0 and H = [4  0; 0  -6], it follows that 2u₁² - 3u₂² = 0 and also, since ||u|| = 1:

u₁² + u₂² = 1.

Solving for u₁ and u₂ in the above gives

u₁ = ±√(3/5),  u₂ = ±√(2/5),

which, taken in combinations, correspond to the directions of the four asymptotes at the saddle point x⁰, as shown in Figure 3.5.


3.2.2 Classical approach to optimization with inequality constraints: the KKT conditions

Consider the primal problem (PP):

minimize f(x)

such that g_j(x) ≤ 0,  j = 1, 2, ..., m.   (3.18)

Define again the Lagrangian:

L(x, λ) = f(x) + Σ_{j=1}^{m} λ_j g_j(x).   (3.19)

Karush (1939) and Kuhn and Tucker (1951) independently derived the necessary conditions that must be satisfied at the solution x* of the primal problem (3.18). These conditions are generally known as the KKT conditions and are expressed in terms of the Lagrangian L(x, λ).

3.2.2.1 The KKT necessary conditions for an inequality constrained minimum

Let the functions f and g_j ∈ C¹, and assume the existence of Lagrange multipliers λ*. Then at the point x*, corresponding to the solution of the primal problem (3.18), the following conditions must be satisfied:

∂f/∂x_i(x*) + Σ_{j=1}^{m} λ_j* ∂g_j/∂x_i(x*) = 0,  i = 1, 2, ..., n

g_j(x*) ≤ 0,  j = 1, 2, ..., m

λ_j* g_j(x*) = 0,  j = 1, 2, ..., m   (3.20)

λ_j* ≥ 0,  j = 1, 2, ..., m.

For a formal proof of the above see Theorem 6.3.1. It can be shown that the KKT conditions also constitute sufficient conditions (i.e. conditions that imply) that x* is a constrained minimum, if f(x) and the g_j(x) are all convex functions.

Let x* be a solution to problem (3.18), and suppose that the KKT conditions (3.20) are satisfied. If now g_k(x*) = 0 for some k ∈ {1, 2, ..., m},


then the corresponding inequality constraint k is said to be active and binding at x* if the corresponding Lagrange multiplier λ_k* ≥ 0. It is strongly active if λ_k* > 0, and weakly active if λ_k* = 0. However, if for some candidate KKT point x̄, g_k(x̄) = 0 for some k, and all the KKT conditions are satisfied except that the corresponding Lagrange multiplier λ_k < 0, then the inequality constraint k is said to be inactive, and must be deleted from the set of active constraints at x̄.

3.2.2.2 Constraint qualification

It can be shown that the existence of λ* is guaranteed if the so-called constraint qualification is satisfied at x*, i.e. if a vector h ∈ ℝⁿ exists such that for each active constraint j at x*

∇ᵀg_j(x*) h < 0,   (3.21)

then λ* exists.

The constraint qualification (3.21) is always satisfied

(i) if all the constraints are convex and at least one x exists strictly inside the feasible region, or

(ii) if the rank of the Jacobian of all active and binding constraints at x* is maximal, or

(iii) if all the constraints are linear.

3.2.2.3 Illustrative example

Minimize f(x) = (x₁ - 2)² + x₂² such that x₁ ≥ 0, x₂ ≥ 0 and (1 - x₁)³ ≥ x₂.

Clearly the minimum exists at x₁* = 1, x₂* = 0, where

g₁(x*) = -x₁* < 0,  g₂(x*) = -x₂* = 0  and  g₃(x*) = x₂* - (1 - x₁*)³ = 0,

and therefore g₂ and g₃ are active at x*,


Figure 3.6: Failure of constraint qualification

giving ∇g₂ = -∇g₃ at x*, as shown in Figure 3.6.

Thus no h exists that satisfies the constraint qualification (3.21). Therefore the application of KKT theory to this problem breaks down, since no λ* exists.

Also note that the Jacobian of the active constraints at x* = [1, 0]ᵀ,

∂g/∂x (x*) = [0  -1; 0  1],

is of rank 1 < 2, and therefore not maximal, which also indicates that λ* does not exist and that the problem cannot be analysed via the KKT conditions.

3.2.2.4 Simple example of application of KKT conditions

Minimize f(x) = (x₁ - 1)² + (x₂ - 2)²

such that x₁ ≥ 0, x₂ ≥ 0, x₁ + x₂ ≤ 2 and x₂ - x₁ = 1.


In standard form the constraints are:

-x₁ ≤ 0,  -x₂ ≤ 0,  x₁ + x₂ - 2 ≤ 0,  and  x₂ - x₁ - 1 = 0,

and the Lagrangian, now including the equality constraint with Lagrange multiplier μ as well, is

L(x, λ, μ) = (x₁ - 1)² + (x₂ - 2)² - λ₁x₁ - λ₂x₂ + λ₃(x₁ + x₂ - 2) + μ(x₂ - x₁ - 1).

The KKT conditions are

∂L/∂x₁ = 2(x₁ - 1) - λ₁ + λ₃ - μ = 0

∂L/∂x₂ = 2(x₂ - 2) - λ₂ + λ₃ + μ = 0

-x₁ ≤ 0,  -x₂ ≤ 0,  x₁ + x₂ - 2 ≤ 0,  x₂ - x₁ - 1 = 0

λ₁x₁ = 0,  λ₂x₂ = 0,  λ₃(x₁ + x₂ - 2) = 0

λ₁, λ₂, λ₃ ≥ 0.

In general the approach to the solution is combinatorial: try different possibilities and test for contradictions. One possible choice is λ₃ ≠ 0, which implies x₁ + x₂ - 2 = 0. This, together with the equality constraint, gives

x₁* = 1/2,  x₂* = 3/2.

This candidate solution satisfies all the KKT conditions and is indeed the optimum solution. Why? The graphical solution is depicted in Figure 3.7.
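The "why" can be answered by verifying the multipliers numerically. The sketch below assumes the objective as reconstructed above, f(x) = (x₁ - 1)² + (x₂ - 2)², with the two constraints active at x* = (1/2, 3/2):

```python
import numpy as np

x = np.array([0.5, 1.5])
grad_f = np.array([2 * (x[0] - 1.0), 2 * (x[1] - 2.0)])   # = [-1, -1]
grad_g3 = np.array([1.0, 1.0])      # gradient of x1 + x2 - 2 (active)
grad_h = np.array([-1.0, 1.0])      # gradient of x2 - x1 - 1 (equality)

# Solve grad_f + lam3*grad_g3 + mu*grad_h = 0 for (lam3, mu)
lam3, mu = np.linalg.solve(np.column_stack([grad_g3, grad_h]), -grad_f)
print(lam3, mu)    # lam3 = 1 >= 0 and mu = 0: the KKT conditions hold
```

Since λ₃ ≥ 0 (the sign of μ for the equality constraint is unrestricted), stationarity and complementarity all hold at the candidate point.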

3.3 Saddle point theory and duality

3.3.1 Saddle point theorem

If the point (x*, λ*), with λ* ≥ 0, is a saddle point of the Lagrangian associated with the primal problem (3.18), then x* is a solution to the primal problem. For a proof of this statement see Theorem 6.4.2 in Chapter 6.


Figure 3.7: Graphical solution to example problem in 3.2.2.4

3.3.2 Duality

Define the dual function:

h(λ) = min_x L(x, λ).   (3.22)

Note that the minimizer x*(λ) does not necessarily satisfy g(x) ≤ 0, and indeed the minimum may not even exist for all λ.

Defining the set D = {λ | h(λ) exists and λ ≥ 0}   (3.23)

allows for the formulation of the dual problem (DP):

maximize_{λ∈D} h(λ),   (3.24)

which is equivalent to

max_{λ≥0} min_x L(x, λ).   (3.25)

3.3.3 Duality theorem

The point (x*, λ*), with λ* ≥ 0, is a saddle point of the Lagrangian function of the primal problem (PP), defined by (3.18), if and only if


Figure 3.8: Schematic representation of saddle point solution to P P

(i) x* is a solution to the primal problem (PP),

(ii) λ* is a solution to the dual problem (DP), and

(iii) f(x*) = h(λ*).

A schematic representation of this theorem is given in Figure 3.8, and a formal proof is given in Theorem 6.4.4 in Chapter 6.

3.3.3.1 Practical significance of the duality theorem

The implication of the Duality Theorem is that the PP may be solved by carrying out the following steps:

1. If possible, solve the DP separately to give λ* ≥ 0, i.e. solve an essentially unconstrained problem.

2. With λ* known, solve the unconstrained problem min_x L(x, λ*) to give x* = x*(λ*).

3. Test whether (x*, λ*) satisfies the KKT conditions.

3.3.3.2 Example of the application of duality

Consider the problem:


minimize f(x) = x₁² + 2x₂² such that x₁ + x₂ ≥ 1.

Here the Lagrangian is

L(x, λ) = x₁² + 2x₂² + λ(1 - x₁ - x₂).

For a given λ, the necessary conditions for min_x L(x, λ) at x*(λ) are

∂L/∂x₁ = 2x₁ - λ = 0  ⇒  x₁*(λ) = λ/2

∂L/∂x₂ = 4x₂ - λ = 0  ⇒  x₂*(λ) = λ/4.

Note that x*(λ) is a minimum, since the Hessian of the Lagrangian with respect to x, H_L = [2  0; 0  4], is positive-definite.

Substituting the minimizing values (in terms of λ) into L gives the dual function

h(λ) = -(3/8)λ² + λ.

Now solve the DP: max_λ h(λ).

The necessary condition for a maximum is dh/dλ = 0, i.e.

-(3/4)λ + 1 = 0  ⇒  λ* = 4/3 > 0

(a maximum since d²h/dλ² = -3/4 < 0).

Substituting λ* in x*(λ) above gives

x₁* = 2/3;  x₂* = 1/3  ⇒  f(x*) = 2/3 = h(λ*),

and since (x*, λ*), with λ* = 4/3 > 0, clearly satisfies the KKT conditions, it indeed represents the optimum solution.
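The absence of a duality gap in this example is easy to confirm numerically, as in the short sketch below:

```python
# Dual function h(lam) = -(3/8) lam^2 + lam; its maximizer recovers x*.
lam_star = 4.0 / 3.0                 # from dh/dlam = -(3/4) lam + 1 = 0
x1 = lam_star / 2.0                  # minimizers of L(x, lam) in terms of lam
x2 = lam_star / 4.0
f_star = x1**2 + 2 * x2**2
h_star = -(3.0 / 8.0) * lam_star**2 + lam_star
print(x1, x2)                        # 2/3 and 1/3
print(f_star, h_star)                # both equal 2/3: f(x*) = h(lam*)
```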

The dual approach is an important method in some structural optimization problems (Fleury, 1979). It is also employed in the development of the augmented Lagrange multiplier method to be discussed later.


Figure 3.9: The solution of the QP problem may be an interior point or lie on the boundary of the feasible region

3.4 Quadratic programming

The problem of minimizing a positive-definite quadratic function subject to linear constraints, dealt with in Section 3.2.1.6, is a special case of a quadratic programming (QP) problem. Consider now a more general case of a QP problem with inequality constraints:

minimize f(x) = ½xᵀAx + bᵀx + c

subject to the linear inequality constraints

Cx ≤ d,

where C is an m × n matrix and d an m-vector.

The solution point may be an interior point or may lie on the boundary of the feasible region, as shown for the two-dimensional case in Figure 3.9.

If the solution point is an interior point, then no constraints are active and x* = x⁰ = -A⁻¹b, as shown in the figure.

The QP problem is often an important subproblem to be solved when applying modern methods to more general problems (see the discussion of the SQP method later).


3.4.1 Active set of constraints

It is clear (see Section 3.2.1.6) that if the set of constraints active at x* is known, then the problem is greatly simplified. Suppose the active set at x* is known, i.e. c_j x* = d_j for some j ∈ {1, 2, ..., m}, where c_j here denotes the 1 × n matrix corresponding to the jth row of C. Represent this active set in matrix form by C′x = d′. The solution x* is then obtained by minimizing f(x) over the set {x | C′x = d′}. Applying the appropriate Lagrange theory (Section 3.2.1.6), the solution is obtained by solving the linear system:

[A  C′ᵀ; C′  0] [x*; λ*] = [-b; d′].   (3.26)

In solving the QP, the major task therefore lies in the identification of the active set of constraints.

3.4.2 The method of Theil and Van de Panne

The method of Theil and Van de Panne (1961) is a straightforward method for identifying the active set. A description of the method, after Wismer and Chattergy (1978), for the problem graphically depicted in Figure 3.10 is now given.

Let V[x] denote the set of constraints violated at x. Select as initial candidate solution the unconstrained minimum x⁰ = -A⁻¹b. Clearly, for the example sketched in Figure 3.10, V[x⁰] = {1, 2, 3}. Therefore x⁰ is not the solution. Now consider as active constraint (set S₁) each constraint in V[x⁰] separately, and let x[S₁] denote the corresponding minimizer:

S₁ = {1} ⇒ x[S₁] = a ⇒ V[a] = {2, 3} ≠ ∅ (not empty)

S₁ = {2} ⇒ x[S₁] = b ⇒ V[b] = {3} ≠ ∅

S₁ = {3} ⇒ x[S₁] = c ⇒ V[c] = {1, 2} ≠ ∅.

Since all the solutions with a single active constraint violate one or more constraints, the next step is to consider different combinations S₂ of two


Figure 3.10: Graphical illustration of the method of Theil and Van de Panne

simultaneously active constraints from V[x⁰]:

Since both V[u] and V[v] are empty, u and v are both candidate solutions. Apply the KKT conditions to both u and v separately to determine which point is the optimum. Assume it can be shown, from the solution of (3.26), that for u: λ₁ < 0 (which indeed is apparently so from the geometry of the problem sketched in Figure 3.10); then it follows that u is non-optimal. On the other hand, assume it can be shown that for v: λ₂ > 0 and λ₃ > 0 (which is evidently the case in the figure); then v is optimal, i.e. x* = v.

3.4.2.1 Explicit example

Solve by means of the method of Theil and Van de Panne the following QP problem:

minimize f(x) = ½x₁² - x₁x₂ + x₂² - 2x₁ + x₂


Figure 3.11: Graphical solution to explicit example 3.4.2.1

such that

In matrix form, f(x) = ½xᵀAx + bᵀx with

A = [1  -1; -1  2]  and  b = [-2, 1]ᵀ.

The unconstrained solution is x⁰ = -A⁻¹b = [3, 1]ᵀ. Clearly V[x⁰] = {1, 2} and therefore x⁰ is not the solution. Continuing, the method yields:

S₁ = {1} ⇒ x[S₁] = a ⇒ V[a] = {2} ≠ ∅

S₁ = {2} ⇒ x[S₁] = b ⇒ V[b] = {1} ≠ ∅

S₂ = {1, 2} ⇒ x[S₂] = c ⇒ V[c] = ∅, where c = [7/3, 2/3]ᵀ.

Applying the KKT conditions to c establishes the optimality of c, since λ₁ = λ₂ = 1/3 > 0.
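The active-set search just described can be sketched in code. The data below are illustrative (not the example's, whose constraints appear in Figure 3.11); the routine tries ever larger active sets drawn from the violated set V[x⁰], exactly as in the steps above:

```python
# Minimal sketch of the Theil-Van de Panne active-set search for the QP
#   min 0.5 x^T A x + b^T x   s.t.   C x <= d   (hypothetical data below).
import numpy as np
from itertools import combinations

def solve_eq_qp(A, b, C_act, d_act):
    # KKT system of an equality-constrained QP (cf. Section 3.2.1.6)
    n, r = A.shape[0], C_act.shape[0]
    K = np.block([[A, C_act.T], [C_act, np.zeros((r, r))]])
    sol = np.linalg.solve(K, np.concatenate([-b, d_act]))
    return sol[:n], sol[n:]

def theil_van_de_panne(A, b, C, d, tol=1e-9):
    x0 = np.linalg.solve(A, -b)             # unconstrained minimum
    V = np.where(C @ x0 > d + tol)[0]       # violated constraints at x0
    if len(V) == 0:
        return x0
    for k in range(1, len(V) + 1):          # active sets of increasing size
        for S in combinations(V, k):
            S = list(S)
            x, lam = solve_eq_qp(A, b, C[S], d[S])
            if np.all(C @ x <= d + tol) and np.all(lam >= -tol):
                return x                    # feasible and KKT-satisfying
    raise RuntimeError("no optimal active set found")

A = np.array([[2.0, 0.0], [0.0, 2.0]])
b = np.array([-4.0, -4.0])                  # f = (x1-2)^2 + (x2-2)^2 + const
C = np.array([[1.0, 1.0]])
d = np.array([2.0])                         # x1 + x2 <= 2
print(theil_van_de_panne(A, b, C, d))       # [1. 1.]
```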


3.5 Modern methods for constrained optimization

The most established gradient-based methods for constrained optimiza- tion are

(i) gradient projection methods (Rosen, 1960, 1961),

(ii) augmented Lagrangian multiplier methods (see Haftka and Gürdal, 1992), and

(iii) successive or sequential quadratic programming (SQP) methods (see Bazaraa et al., 1993).

All these methods are largely based on the theory already presented here. SQP methods are currently considered to represent the state-of-the-art gradient-based approach to the solution of constrained optimization problems.

3.5.1 The gradient projection method

3.5.1.1 Equality constrained problems

The gradient projection method is due to Rosen (1960, 1961). Consider the linear equality constrained problem:

minimize f (x)

such that Ax - b = 0,   (3.27)

where A is an r × n matrix, r < n, and b an r-vector.

The gradient projection method for solving (3.27) is based on the following argument.

Assume that x′ is feasible, i.e. Ax′ - b = 0, as in Figure 3.12. A direction s at x′ will not violate the constraints if A(x′ + s) - b = 0 which, since Ax′ - b = 0, reduces to

As = 0.   (3.28)


Figure 3.12: Schematic representation of the subspace {x | Ax - b = 0}

Also, since ||s|| = 1, it follows that

1 - sᵀs = 0.   (3.29)

It is now required that s be chosen such that it corresponds to the direction of steepest descent at x′, subject to satisfying the constraints specified by (3.28) and (3.29). This requirement is equivalent to determining s such that the directional derivative at x′,

dF(0)/dα = ∇ᵀf(x′)s,   (3.30)

is minimized with respect to s, where F(α) = f(x′ + αs).

Applying the classical Lagrange theory for minimizing a function subject to equality constraints requires the formulation of the following Lagrangian function:

L(s, λ, λ₀) = ∇ᵀf(x′)s + λᵀAs + λ₀(1 - sᵀs),   (3.31)

where the variables s = [s₁, s₂, ..., s_n]ᵀ correspond to the direction cosines of s. The Lagrangian necessary conditions for the constrained minimum are

∇_s L = ∇f(x′) + Aᵀλ - 2λ₀s = 0   (3.32)

∇_λ L = As = 0   (3.33)

∂L/∂λ₀ = 1 - sᵀs = 0.   (3.34)

Equation (3.32) yields

s = (∇f(x′) + Aᵀλ) / (2λ₀).   (3.35)


Figure 3.13: Schematic representation of projected direction of steepest descent s

Substituting (3.35) into (3.34) gives

1 - (1/(4λ₀²)) ||∇f(x′) + Aᵀλ||² = 0,

and thus

λ₀ = ±½ ||∇f(x′) + Aᵀλ||.   (3.36)

Substituting (3.36) into (3.35) gives s = ±(∇f(x′) + Aᵀλ) / ||∇f(x′) + Aᵀλ||.

For maximum descent choose the negative sign, as shown in Figure 3.13. This also ensures that the Hessian of the Lagrangian with respect to s is positive-definite (sufficiency condition), which ensures that the directional derivative in (3.30) indeed assumes a minimum value with respect to s. Thus the constrained (projected) direction of steepest descent is chosen as

s = -(∇f(x′) + Aᵀλ) / ||∇f(x′) + Aᵀλ||.   (3.37)

It remains to solve for λ. Equations (3.33) and (3.37) imply A(∇f(x′) + Aᵀλ) = 0. Thus, if s ≠ 0, then AAᵀλ = -A∇f(x′), with solution

λ = -(AAᵀ)⁻¹A∇f(x′).   (3.38)

The direction s, called the gradient projection direction, is therefore finally given by

s = -(I - Aᵀ(AAᵀ)⁻¹A)∇f(x′) / ||(I - Aᵀ(AAᵀ)⁻¹A)∇f(x′)||.   (3.39)


A projection matrix is defined by

P = I - Aᵀ(AAᵀ)⁻¹A.   (3.40)


The un-normalized gradient projection search vector u that is used in practice is then simply

u = -P∇f(x′).   (3.41)

3.5.1.2 Extension to non-linear constraints

Consider the more general problem:

minimize f (x)

such that h_i(x) = 0,  i = 1, 2, ..., r,   (3.42)

or in vector form h(x) = 0, where the constraints may be non-linear.

Linearize the constraint functions h_i(x) at the feasible point x′ by the truncated Taylor expansions:

h_i(x) = h_i(x′ + (x - x′)) ≅ h_i(x′) + ∇ᵀh_i(x′)(x - x′),

which allows for the following approximations to the constraints:

∇ᵀh_i(x′)(x - x′) = 0,  i = 1, 2, ..., r   (3.43)

in the neighbourhood of x′, since h_i(x′) = 0. This set of linearized constraints may be written in matrix form as

Ax - b = 0,   (3.44)

where

A = [∂h(x′)/∂x]ᵀ  and  b = [∂h(x′)/∂x]ᵀ x′.

The linearized problem at x′ therefore becomes

minimize f(x) such that Ax - b = 0   (3.45)

where A = [∂h(x′)/∂x]ᵀ and b = [∂h(x′)/∂x]ᵀ x′.


Figure 3.14: Schematic representation of correction steps when applying the gradient projection method to nonlinear constrained problems

Since the problem is now linearized, the computation of the gradient projection direction at x′ is identical to that before, i.e. u = -P(x′)∇f(x′), but the projection matrix P(x′) = (I - Aᵀ(AAᵀ)⁻¹A) is now dependent on x′, since A = [∂h(x′)/∂x]ᵀ varies with x′.

For an initial feasible point x⁰ (= x′), a new point in the gradient projection direction of descent u¹ = -P(x⁰)∇f(x⁰) is x̄¹ = x⁰ + α₁u¹ for step size α₁ > 0, as shown in Figure 3.14.

In general h(x̄¹) ≠ 0, and a correction step must be calculated: x̄¹ → x¹. How is this correction step computed? Clearly (see Figure 3.14) the step should be such that its projection at x¹ is zero, i.e. P(x¹)(x¹ - x̄¹) = 0, and also h(x¹) = 0. These two conditions imply that

x¹ - x̄¹ = Aᵀ(AAᵀ)⁻¹(b - Ax̄¹),

with A evaluated at x¹, which gives

x¹ ≅ x̄¹ - Aᵀ(AAᵀ)⁻¹h(x̄¹)

as the correction step, where use was made of the expression h(x) ≅ Ax - b for both h(x̄¹) and h(x¹), and of the fact that h(x¹) = 0.

Since the correction step is based on an approximation, it may have to be applied repeatedly until h(x¹) is sufficiently small. Having found a satisfactory x¹, the procedure is repeated successively for k = 2, 3, ... to give x², x³, ..., until P(x^k)∇f(x^k) ≅ 0.
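The repeated correction iteration can be sketched for a hypothetical single constraint h(x) = x₁² + x₂² - 1 = 0, applying the step x̄ → x̄ - Aᵀ(AAᵀ)⁻¹h(x̄) with the Jacobian A re-evaluated each time:

```python
# Repeated correction steps pulling an infeasible point back onto the
# nonlinear constraint surface h(x) = x1^2 + x2^2 - 1 = 0 (illustrative).
import numpy as np

h = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0])
jac = lambda x: np.array([[2 * x[0], 2 * x[1]]])   # A = Jacobian of h

x = np.array([0.9, 0.6])            # infeasible point after a projected step
for _ in range(20):
    A = jac(x)
    x = x - A.T @ np.linalg.solve(A @ A.T, h(x))   # correction step
    if abs(h(x)[0]) < 1e-12:
        break
print(x, h(x))                      # back on the constraint surface
```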


3.5.1.3 Example problem

Minimize f(x) = x₁² + x₂² + x₃² such that h(x) = x₁ + x₂ + x₃ = 1,

with initial feasible point x⁰ = [1, 0, 0]ᵀ.

First evaluate the projection matrix P = I - Aᵀ(AAᵀ)⁻¹A.

Here A = [1 1 1], AAᵀ = 3 and (AAᵀ)⁻¹ = 1/3, thus giving

P = I - ⅓[1 1 1; 1 1 1; 1 1 1] = ⅓[2 -1 -1; -1 2 -1; -1 -1 2].

The gradient vector is ∇f(x) = [2x₁, 2x₂, 2x₃]ᵀ, giving at x⁰: ∇f(x⁰) = [2, 0, 0]ᵀ. The search direction at x⁰ is therefore given by

u¹ = -P∇f(x⁰) = [-4/3, 2/3, 2/3]ᵀ,

or, more conveniently for this example, choose the search direction simply as u = [-2, 1, 1]ᵀ. For a suitable value of λ the next point is given by

x¹ = x⁰ + λu = [1 - 2λ, λ, λ]ᵀ.

Substituting the above in f(x¹) = f(x⁰ + λu) = F(λ) = (1 - 2λ)² + λ² + λ², it follows that for optimal descent in the direction u:

dF/dλ = -4(1 - 2λ) + 4λ = 0  ⇒  λ = 1/3,


which gives x¹ = [1/3, 1/3, 1/3]ᵀ, ∇f(x¹) = [2/3, 2/3, 2/3]ᵀ

and

P∇f(x¹) = 0.

Since the projection of the gradient vector at x¹ is zero, it is the optimum point.
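The arithmetic of this example is easily reproduced in NumPy:

```python
# Projection computations for f = x1^2 + x2^2 + x3^2, h = x1 + x2 + x3 - 1.
import numpy as np

A = np.array([[1.0, 1.0, 1.0]])                  # Jacobian of h (constant)
P = np.eye(3) - A.T @ np.linalg.inv(A @ A.T) @ A # projection matrix (3.40)
grad = lambda x: 2 * x                           # gradient of f

x0 = np.array([1.0, 0.0, 0.0])
u = -P @ grad(x0)
print(u)                         # [-4/3, 2/3, 2/3], parallel to [-2, 1, 1]

x1 = np.full(3, 1.0 / 3.0)       # point reached by the exact line search
print(P @ grad(x1))              # zero vector: x1 is the optimum
```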

3.5.1.4 Extension to linear inequality constraints

Consider the case of linear inequality constraints:

Ax ≤ b,   (3.46)

where A is an m × n matrix and b an m-vector, i.e. the individual constraints are of the form

a_j x ≤ b_j,  j = 1, 2, ..., m,   (3.47)

where a_j denotes the 1 × n matrix corresponding to the jth row of A. Suppose at the current point x^k, r (≤ m) constraints are enforced, i.e. are active. Then a set of equality constraints, corresponding to the active constraints in (3.47), applies at x^k, i.e. A_r x^k - b_r = 0, where A_r and b_r correspond to the set of r active constraints in (3.47).

Now apply the gradient projection method as depicted in Figure 3.15, where the recursion is as follows:

u^{k+1} = -P(x^k)∇f(x^k) and x^{k+1} = x^k + α_{k+1}u^{k+1},

where f(x^k + α_{k+1}u^{k+1}) = min_α f(x^k + αu^{k+1}).   (3.48)

Two possibilities may arise:

(i) No additional constraint is encountered along u^{k+1} before x^{k+1} = x^k + α_{k+1}u^{k+1}. Test whether P∇f(x^{k+1}) = 0. If so, then x* = x^{k+1}; otherwise set u^{k+2} = -P(x^{k+1})∇f(x^{k+1}) and continue.


Figure 3.15: Representation of the gradient projection method for linear inequality constraints

(ii) If an additional constraint is encountered at a point x̄^{k+1} before x^{k+1} = x^k + α_{k+1}u^{k+1} is reached (see Figure 3.15), then set x^{k+1} = x̄^{k+1} and add the new constraint to the active set, with associated matrix A_{r+1}. Compute the new P and set u^{k+2} = -P(x^{k+1})∇f(x^{k+1}). Continue this process until, for some active set at x^p, P∇f(x^p) = 0.

How do we know, if P∇f(x^k) = 0 occurs, that all the identified constraints are active? The answer is given by the following argument.

If P∇f(x^k) = 0, it implies that (I - Aᵀ(AAᵀ)⁻¹A)∇f(x^k) = 0 which, using expression (3.38), is equivalent to ∇f(x^k) + Aᵀλ = 0, i.e.

∇f(x^k) + Σ_{i∈I_a} λ_i ∇g_i(x^k) = 0,

where I_a denotes the set of all active constraints. This expression is nothing else but the KKT conditions for the optimum at x^k, provided that λ_i ≥ 0 for all i ∈ I_a. Now, if P∇f(x^k) = 0 occurs and λ_i < 0 for some i, remove the corresponding constraint from the active set. In practice, remove the constraint with the most negative multiplier, compute the new projection matrix P, and continue.


3.5.2 Multiplier methods

These methods combine the classical Lagrangian method with the penalty function approach. In the Lagrangian approach, the minimum point of the constrained problem coincides with a stationary point (x*, λ*) of the Lagrangian function which, in general, is difficult to determine analytically. On the other hand, in the penalty function approach the constrained minimum approximately coincides with the minimum of the penalty function. If, however, high accuracy is required, the problem becomes ill-conditioned.

In the multiplier methods (see Bertsekas, 1976) both the above ap- proaches are combined to give an unconstrained problem which is not ill-conditioned.

As an introduction to the multiplier method consider the equality con- strained problem:

minimize f(x)

such that h_j(x) = 0, j = 1, 2, ..., r. (3.49)

The augmented Lagrange function 𝓛 is introduced as

𝓛(x, λ, ρ) = f(x) + Σ_{j=1}^{r} λ_j h_j(x) + ρ Σ_{j=1}^{r} h_j²(x). (3.50)

If all the multipliers λ_j are chosen equal to zero, 𝓛 becomes the usual external penalty function. On the other hand, if all the stationary values λ_j* are available, then it can be shown (Fletcher, 1987) that for any positive value of ρ, the minimization of 𝓛(x, λ*, ρ) with respect to x gives the solution x* to problem (3.49). This result is not surprising, since it can be shown that the classical Lagrangian function L(x, λ) has a saddle point at (x*, λ*).

The multiplier methods are based on the use of successive approximations to the Lagrange multipliers. If λ^k is a good approximation to λ*, then it is possible to approach the optimum through the unconstrained minimization of 𝓛(x, λ^k, ρ) without using large values of ρ. The value of ρ must only be sufficiently large to ensure that 𝓛 has a local minimum point with respect to x, rather than simply a stationary point, at the optimum.


90 CHAPTER 3.

How is the approximation λ^k to the Lagrange multiplier vector obtained? To answer this question, compare the stationary conditions with respect to x for 𝓛 (the augmented Lagrangian) with those for L (the classical Lagrangian) at x*.

For 𝓛:

∂𝓛/∂x_i = ∂f/∂x_i + Σ_{j=1}^{r} (λ_j + 2ρh_j) ∂h_j/∂x_i = 0, i = 1, 2, ..., n (3.51)

and for L:

∂L/∂x_i = ∂f/∂x_i + Σ_{j=1}^{r} λ_j* ∂h_j/∂x_i = 0, i = 1, 2, ..., n. (3.52)

The comparison clearly indicates that, as the minimum point of 𝓛 tends to x*,

λ_j + 2ρh_j → λ_j*, j = 1, 2, ..., r. (3.53)

This observation prompted Hestenes (1969) to suggest the following scheme for approximating λ*. For a given approximation λ^k, k = 1, 2, ..., minimize 𝓛(x, λ^k, ρ) by some standard unconstrained minimization technique to give a minimum x*^k. The components of the new estimate λ^{k+1} of λ*, as suggested by (3.53), are then given by

λ_j^{k+1} = λ_j^k + 2ρ h_j(x*^k), j = 1, 2, ..., r. (3.54)

The value of the penalty parameter ρ (= ρ_k) may also be adjusted from iteration to iteration.

3.5.2.1 Example

Minimize f(x) = x1² + 10x2² such that h(x) = x1 + x2 − 4 = 0.

Here 𝓛 = x1² + 10x2² + λ(x1 + x2 − 4) + ρ(x1 + x2 − 4)².

The first order necessary conditions for a constrained minimum with respect to x for any given λ are

∂𝓛/∂x1 = 2x1 + λ + 2ρ(x1 + x2 − 4) = 0
∂𝓛/∂x2 = 20x2 + λ + 2ρ(x1 + x2 − 4) = 0


from which it follows that

x1 = 10x2 = (−5λ + 40ρ)/(10 + 11ρ).

Taking λ¹ = 0 and ρ_1 = 1 gives x*¹ = [1.905, 0.1905]^T and h(x*¹) = −1.905.

Using the approximation scheme (3.54) for λ* gives λ² = 0 + 2(1)(−1.905) = −3.81.

Now repeat the minimization of 𝓛 with λ² = −3.81 and ρ_2 = 10. This gives x*² = [3.492, 0.3492]^T and h(x*²) = −0.1587, resulting in the new approximation λ³ = −3.81 + 2(10)(−0.1587) = −6.984.

Using λ³ in the next iteration with ρ_3 = 10 gives x*³ = [3.624, 0.3624]^T and h(x*³) = −0.0136, which shows good convergence to the exact solution [3.6363, 0.3636]^T without the need for increasing ρ.
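The iteration above can be sketched in code. The closed-form inner minimizer below is specific to this example (it solves the stationary conditions directly); in general 𝓛(x, λ^k, ρ) would be minimized by a standard unconstrained method.

```python
# Multiplier (augmented Lagrangian) iteration (3.54) for the example:
#   minimize f(x) = x1^2 + 10*x2^2   subject to   h(x) = x1 + x2 - 4 = 0.
# For fixed (lam, rho) the stationary conditions give the closed-form
# minimizer x1 = (-5*lam + 40*rho)/(10 + 11*rho), with x2 = x1/10.

def minimize_aug_lagrangian(lam, rho):
    x1 = (-5.0 * lam + 40.0 * rho) / (10.0 + 11.0 * rho)
    return x1, x1 / 10.0

lam, rho = 0.0, 1.0                    # lambda^1 = 0, rho_1 = 1, as in the text
for k in range(20):
    x1, x2 = minimize_aug_lagrangian(lam, rho)
    h = x1 + x2 - 4.0                  # constraint violation
    lam = lam + 2.0 * rho * h          # multiplier update (3.54)
    rho = 10.0                         # rho_k = 10 from the second iteration on

print(x1, x2, lam)   # tends to x* = [40/11, 4/11] with lambda* = -80/11
```

Note how the multiplier estimate, not a growing penalty parameter, drives the constraint violation to zero.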

3.5.2.2 The multiplier method extended to inequality constraints

The multiplier method can be extended to apply to inequality constraints (Fletcher, 1975).

Consider the problem:

minimize f(x)

such that g_j(x) ≤ 0, j = 1, 2, ..., m. (3.55)

Here the augmented Lagrangian function is

𝓛(x, λ, ρ) = f(x) + ρ Σ_{j=1}^{m} ⟨g_j(x) + λ_j/(2ρ)⟩² (3.56)

where ⟨a⟩ = max(a, 0).

In this case the stationary conditions at x* for the augmented and classical Lagrangians are respectively

∂𝓛/∂x_i = ∂f/∂x_i + Σ_{j=1}^{m} ⟨λ_j + 2ρg_j⟩ ∂g_j/∂x_i = 0, i = 1, 2, ..., n (3.57)

and

∂L/∂x_i = ∂f/∂x_i + Σ_{j=1}^{m} λ_j* ∂g_j/∂x_i = 0, i = 1, 2, ..., n. (3.58)

The latter classical KKT conditions require in addition that λ_j* g_j(x*) = 0, j = 1, 2, ..., m.

A comparison of conditions (3.57) and (3.58) leads to the following iterative approximation scheme for λ*:

λ_j^{k+1} = ⟨λ_j^k + 2ρ_k g_j(x*^k)⟩, j = 1, 2, ..., m (3.59)

where x*^k minimizes 𝓛(x, λ^k, ρ_k).

3.5.2.3 Example problem with inequality constraint

Minimize f(x) = x1² + 10x2² such that g(x) = 4 − x1 − x2 ≤ 0.

Here 𝓛(x, λ, ρ) = x1² + 10x2² + ρ⟨4 − x1 − x2 + λ/(2ρ)⟩². Now perform the unconstrained minimization of 𝓛 with respect to x for given λ and ρ. This is usually done by some standard method such as Fletcher-Reeves or BFGS, but here, because of the simplicity of the illustrative example, the necessary conditions for a minimum are used, namely:

∂𝓛/∂x1 = 2x1 − ⟨λ + 2ρ(4 − x1 − x2)⟩ = 0 (3.60)
∂𝓛/∂x2 = 20x2 − ⟨λ + 2ρ(4 − x1 − x2)⟩ = 0. (3.61)

Clearly

x1 = 10x2 (3.62)

provided that ⟨·⟩ is nonzero. Utilizing (3.60), (3.61) and (3.62), successive iterations can be performed as shown below.

Iteration 1: Use λ¹ = 0 and ρ_1 = 1; then (3.60) and (3.62) imply

x1¹ = 1.9047 and x2¹ = 0.19047.


The new λ² is now given via (3.59) as λ² = ⟨λ¹ + 2ρ_1(4 − x1¹ − x2¹)⟩ = 3.80952.

Iteration 2: Now with λ² = 3.80952 choose ρ_2 = 10, which gives

x1² = 3.2234, x2² = 0.3223 and λ³ = ⟨λ² + 2ρ_2(4 − x1² − x2²)⟩ = 12.8937.

Iteration 3: With the above λ³ and staying with the value of 10 for ρ, the iterations proceed as follows:

[Table of iterates x1^k, x2^k and λ^{k+1} omitted.]

Thus the iterative procedure converges satisfactorily to the true solution x* = [3.6363, 0.3636]^T.
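A code sketch of scheme (3.59) on this example follows, again using the closed-form inner solution implied by (3.60)-(3.62); intermediate iterates need not reproduce the tabulated values exactly, but the limit is the same.

```python
# Multiplier iteration (3.59) for the inequality example:
#   minimize f(x) = x1^2 + 10*x2^2   subject to   g(x) = 4 - x1 - x2 <= 0.
# With <a> = max(a, 0), conditions (3.60)-(3.62) give, while the bracket
# term is positive, x1 = (lam + 8*rho)/(2 + 2.2*rho) and x2 = x1/10.

def bracket(a):                        # <a> = max(a, 0)
    return max(a, 0.0)

lam, rho = 0.0, 1.0                    # lambda^1 = 0, rho_1 = 1
for k in range(25):
    x1 = (lam + 8.0 * rho) / (2.0 + 2.2 * rho)
    if lam + 2.0 * rho * (4.0 - 1.1 * x1) <= 0.0:
        x1 = 0.0                       # bracket vanishes: unconstrained minimum
    x2 = x1 / 10.0
    lam = bracket(lam + 2.0 * rho * (4.0 - x1 - x2))   # update (3.59)
    rho = 10.0                         # rho_k = 10 thereafter

print(x1, x2, lam)   # -> x* ~ [3.6364, 0.3636] with lambda* = 80/11
```

The clipping operator ⟨·⟩ keeps the multiplier estimate non-negative, exactly as the KKT conditions for an inequality constraint require.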

3.5.3 Sequential quadratic programming (SQP)

The SQP method is based on the application of Newton's method to determine x* and λ* from the KKT conditions of the constrained optimization problem. It can be shown (see Bazaraa et al., 1993) that the determination of the Newton step is equivalent to the solution of a quadratic programming (QP) problem.

Consider the general problem:

minimize f(x)

such that g_j(x) ≤ 0, j = 1, 2, ..., m (3.63)
h_j(x) = 0, j = 1, 2, ..., r.

It can be shown (Bazaraa et al., 1993) that, given estimates (x^k, λ^k, μ^k), k = 0, 1, ..., of the solution and the respective Lagrange multiplier values, with λ^k ≥ 0, the Newton step s of iteration k+1, such that x^{k+1} = x^k + s, is given by the solution of the following k-th QP problem:


QP-k (x^k, λ^k, μ^k): Minimize with respect to s:

F(s) = f(x^k) + ∇^T f(x^k)s + ½ s^T H_L(x^k) s (3.64)

such that

g_j(x^k) + ∇^T g_j(x^k)s ≤ 0, j = 1, 2, ..., m

and

h_j(x^k) + ∇^T h_j(x^k)s = 0, j = 1, 2, ..., r

and where g = [g1, g2, ..., gm]^T, h = [h1, h2, ..., hr]^T and the Hessian of the classical Lagrangian with respect to x is

H_L(x) = ∇²f(x) + Σ_{j=1}^{m} λ_j ∇²g_j(x) + Σ_{j=1}^{r} μ_j ∇²h_j(x). (3.65)

Note that the solution of QP-k yields not only s, but also the Lagrange multipliers λ^{k+1} and μ^{k+1} via the solution of equation (3.26) in Section 3.4. Thus with x^{k+1} = x^k + s we may construct the next QP problem, QP-(k+1).

The solution of successive QP problems is continued until s = 0. It can be shown that if this occurs, then the KKT conditions of the original problem (3.63) are satisfied.

In practice, since convergence from a point far from the solution is not guaranteed, the full step s is usually not taken. To improve convergence, s is rather used as a search direction in performing a line search minimization of a so-called merit or descent function. A popular choice for the merit function is the absolute value penalty function (Bazaraa et al., 1993):

F_M(x) = f(x) + R [ Σ_{j=1}^{m} ⟨g_j(x)⟩ + Σ_{j=1}^{r} |h_j(x)| ] (3.66)

where R is a sufficiently large penalty parameter and ⟨a⟩ = max(a, 0).

Note that it is not advisable here to use curve fitting techniques for the line search, since the merit function is non-differentiable. More stable methods, such as the golden section method, are therefore preferred.
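A golden section line search handles such kinks without derivative information. The sketch below is generic; the quadratic-plus-absolute-value test function is purely illustrative, not the merit function of a specific problem.

```python
# Golden section search: a derivative-free bracketing method suitable for
# minimizing a unimodal but non-differentiable merit function along s.

def golden_section(phi, a, b, tol=1e-9):
    """Minimize a unimodal function phi on the interval [a, b]."""
    r = (5.0 ** 0.5 - 1.0) / 2.0        # golden ratio conjugate, ~0.618
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:                     # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:                           # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)

# A kinked, non-differentiable function of the step length, with minimum
# at the kink t = 0.8, where polynomial curve fitting would struggle:
phi = lambda t: (t - 1.0) ** 2 + 3.0 * abs(t - 0.8)
t_star = golden_section(phi, 0.0, 2.0)
print(t_star)   # -> 0.8 (to within tol)
```

Because r² = 1 − r, one of the two interior points is reused at every iteration, so each step costs only one new function evaluation.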


The great advantage of the SQP approach over the classical Newton method is that it allows for a systematic and natural way of selecting the active set of constraints; in addition, through the use of the merit function, the convergence process may be controlled.

3.5.3.1 Example

Solve the problem below by means of the basic SQP method, using x⁰ = [0, 1]^T and λ⁰ = 0.

Minimize f(x) = 2x1² + 2x2² − 2x1x2 − 4x1 − 6x2

such that 2x1² − x2 ≤ 0, x1 + 5x2 − 5 ≤ 0, −x1 ≤ 0 and −x2 ≤ 0.

Since ∇f(x) = [4x1 − 2x2 − 4, 4x2 − 2x1 − 6]^T it follows that ∇f(x⁰) = [−6, −2]^T.

Thus with λ⁰ = 0,

H_L = [ 4  −2
       −2   4 ]

and it follows from (3.64) that the starting quadratic programming problem is:

QP-0: Minimize with respect to s:

F(s) = −4 − 6s1 − 2s2 + ½(4s1² + 4s2² − 4s1s2)

such that

−1 − s2 ≤ 0, s1 + 5s2 ≤ 0,

−s1 ≤ 0, and −1 − s2 ≤ 0

where s = [s1, s2]^T.

The solution to this QP can be obtained via the method described in Section 3.4, which firstly shows that only the second constraint is active, and then obtains the solution by solving the corresponding equation (3.26), giving s = [1.1290, −0.2258]^T and λ¹ = [0, 1.0322, 0, 0]^T, and therefore x¹ = x⁰ + s = [1.1290, 0.7742]^T, which completes the first iteration.
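This can be checked numerically. With only the active constraint s1 + 5s2 = 0, the KKT conditions of QP-0 reduce to the linear system H_L s + ∇f(x⁰) + μa = 0, aᵀs = 0 with a = [1, 5]ᵀ, solved below as a sketch:

```python
# Verify the QP-0 solution by solving its equality-constrained KKT system.
import numpy as np

H = np.array([[4.0, -2.0],
              [-2.0, 4.0]])            # Hessian H_L at x0 (lambda^0 = 0)
c = np.array([-6.0, -2.0])             # grad f(x0) for x0 = [0, 1]
a = np.array([1.0, 5.0])               # gradient of the active constraint

# KKT system: [H a; a^T 0] [s; mu] = [-c; 0]
K = np.block([[H, a.reshape(2, 1)],
              [a.reshape(1, 2), np.zeros((1, 1))]])
rhs = np.concatenate([-c, [0.0]])
s1, s2, mu = np.linalg.solve(K, rhs)
print(s1, s2, mu)   # -> approximately 1.1290, -0.2258, 1.0323
```

The positive multiplier confirms that the active-set guess was consistent with the KKT conditions of the QP.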


The next quadratic program QP-1 can now be constructed. It is left to the reader to perform the further iterations. The method, because it is basically a Newton method, converges rapidly to the optimum x* = [0.6589, 0.8682]^T.


Chapter 4

NEW GRADIENT-BASED TRAJECTORY AND APPROXIMATION METHODS

4.1 Introduction

4.1.1 Why new algorithms?

In spite of the mathematical sophistication of classical gradient-based algorithms, certain inhibiting difficulties remain when these algorithms are applied to real-world problems. This is particularly true in the field of engineering, where unique difficulties occur that have prevented the general application of gradient-based mathematical optimization techniques to design problems.

Optimization difficulties that arise are:

(i) the functions are often very expensive to evaluate, requiring, for example, the time-consuming finite element analysis of a structure, the simulation of the dynamics of a multi-body system, or a


98 CHAPTER 4.

computational fluid dynamics (CFD) simulation,

(ii) the existence of noise, numerical or experimental, in the functions,

(iii) the presence of discontinuities in the functions,

(iv) multiple local minima, requiring global optimization techniques,

(v) the existence of regions in the design space where the functions are not defined, and

(vi) the occurrence of an extremely large number of design variables, disqualifying, for example, the SQP method.

4.1.2 Research at the University of Pretoria

All the above difficulties have been addressed in research done at the University of Pretoria over the past twenty years. This research has led to, amongst others, the development of the new optimization algorithms and methods listed in the subsections below.

4.1.2.1 Unconstrained optimization

(i) The leap-frog dynamic trajectory method: LFOP (Snyman, 1982, 1983),

(ii) a conjugate-gradient method with Euler-trapezium steps in which a novel gradient-only line search method is used: ETOP (Snyman, 1985), and

(iii) a steepest-descent method applied to successive spherical quadratic approximations: SQSD (Snyman and Hay, 2001).

4.1.2.2 Direct constrained optimization

(i) The leap-frog method for constrained optimization: LFOPC (Snyman, 2000), and


NEW GRADIENT-BASED METHODS 99

(ii) the conjugate-gradient method with Euler-trapezium steps and gradient-only line searches, applied to penalty function formulations of constrained problems: ETOPC (Snyman, 2003).

4.1.2.3 Approximation methods

(i) A feasible descent cone method applied to successive spherical quadratic sub-problems: FDC-SAM (Stander and Snyman, 1993; Snyman and Stander, 1994, 1996; De Klerk and Snyman, 1994), and

(ii) the leap-frog method (LFOPC) applied to successive spherical quadratic sub-problems: Dynamic-Q (Snyman et al., 1994; Snyman and Hay, 2002).

4.1.2.4 Methods for global unconstrained optimization

(i) A multi-start global minimization algorithm with dynamic search trajectories: SF-GLOB (Snyman and Fatti, 1987), and

(ii) a modified bouncing ball trajectory method for global optimization: MBB (Groenwold and Snyman, 2002).

All of the above methods developed at the University of Pretoria are gradient-based, and have the common and unique property, for gradient-based methods, that no explicit objective function line searches are required.

In this chapter the LFOP/C unconstrained and constrained algorithms are discussed in detail. This is followed by the presentation of the SQSD method, which serves as an introduction to the Dynamic-Q approximation method. Next the ETOP/C algorithms are introduced, with special reference to their ability to deal with the presence of severe noise in the objective function, through the use of a gradient-only line search technique. Finally the SF-GLOB and MBB stochastic global optimization algorithms, which use dynamic search trajectories, are presented and discussed.


4.2 The dynamic trajectory optimization method

The dynamic trajectory method for unconstrained minimization (Snyman, 1982, 1983) is also known as the "leap-frog" method. It has more recently (Snyman, 2000) been modified to handle constraints via a penalty function formulation of the constrained problem. The outstanding characteristics of the basic method are:

(i) it uses only function gradient information ∇f,

(ii) no explicit line searches are performed,

(iii) it is extremely robust, handling steep valleys, and discontinuities and noise in the objective function and its gradient vector, with relative ease,

(iv) the algorithm seeks relatively low local minima and can therefore be used as the basic component in a methodology for global optimization, and

(v) when applied to smooth and near-quadratic functions it is not as efficient as classical methods.

4.2.1 Basic dynamic model

Assume a particle of unit mass in an n-dimensional conservative force field with potential energy at x given by f(x). Then at x the force (acceleration a) on the particle is given by

ẍ = a = −∇f(x) (4.1)

from which it follows that, for the motion of the particle over the time interval [0, t],

½‖ẋ(t)‖² − ½‖ẋ(0)‖² = f(x(0)) − f(x(t)) (4.2)

or T(t) − T(0) = f(x(0)) − f(x(t)) (4.3)


where T(t) represents the kinetic energy of the particle at time t. Thus it follows that

f(x(t)) + T(t) = constant (4.4)

i.e. energy is conserved along the trajectory. Note that along the particle trajectory the change in the function f satisfies Δf = −ΔT, and therefore, as long as T increases, f decreases. This is the underlying principle on which the dynamic leap-frog optimization algorithm is based.

4.2.2 Basic algorithm for unconstrained problems (LFOP)

The basic elements of the LFOP method are as listed in Algorithm 4.1. A detailed flow chart of the basic LFOP algorithm for unconstrained problems is given in Figure 4.1.

4.2.3 Modification for constrained problems (LFOPC)

The code LFOPC (Snyman, 2000) applies the unconstrained optimization algorithm LFOP to a penalty function formulation of the constrained problem (see Section 3.1) in three phases. For engineering problems (with moderate convergence tolerance) the choice ρ₀ = 10 and ρ₁ = 100 is recommended. For extreme accuracy, use ρ₀ = 100 and ρ₁ = 10⁴.

4.2.3.1 Example

Minimize f(x) = x1² + 2x2² such that g(x) = −x1 − x2 + 1 ≤ 0, with starting point x⁰ = [3, 1]^T, by means of the LFOPC algorithm. Use ρ₀ = 1.0 and ρ₁ = 10.0. The computed solution is depicted in Figure 4.2.


Algorithm 4.1 LFOP algorithm

1. Given f(x) and starting point x(0) = x⁰, compute the dynamic trajectory of the particle by solving the initial value problem:

ẍ(t) = −∇f(x(t)), with ẋ(0) = 0 and x(0) = x⁰. (4.5)

2. Monitor the velocity v(t) = ẋ(t); clearly, as long as T = ½‖v(t)‖² increases, f(x(t)) decreases as desired.

3. When ‖v(t)‖ decreases, i.e. when the particle moves uphill, apply some interfering strategy to gradually extract energy from the particle so as to increase the likelihood of its descent, but not so much that descent occurs immediately.

4. In practice the numerical integration of the initial value problem (4.5) is done by the "leap-frog" method: for k = 0, 1, 2, ... and given time step Δt, compute

x^{k+1} = x^k + v^k Δt
v^{k+1} = v^k + a^{k+1} Δt

where a^k = −∇f(x^k) and v⁰ = ½a⁰Δt, to ensure an initial step if a⁰ ≠ 0.

5. Typically the interfering strategy is as follows:

if ‖v^{k+1}‖ ≥ ‖v^k‖, continue the trajectory;

else set v^k := ¼(v^{k+1} + v^k), x^{k+1} := ½(x^{k+1} + x^k), compute the new v^{k+1} and continue.

6. Further heuristics (Snyman, 1982, 1983) are introduced to determine a suitable initial time step Δt, to allow for the magnification and reduction of Δt, and to control the magnitude of the step Δx = x^{k+1} − x^k by setting a step size limit δ along the computed trajectory. (The recommended magnitude of δ is of the order of √n times the maximum variable range.)
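Steps 4 and 5 above can be sketched as follows; the fixed time step and the simple quadratic test function are assumptions for illustration, and the Δt and step-limit heuristics of step 6 are omitted.

```python
# Minimal leap-frog (LFOP-style) trajectory sketch on f(x) = x1^2 + 10*x2^2.
import numpy as np

def grad_f(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

def lfop_sketch(x0, dt=0.05, max_iter=50000, tol=1e-6):
    x = np.array(x0, dtype=float)
    v = 0.5 * (-grad_f(x)) * dt                  # v0 = (1/2) a0 dt
    for _ in range(max_iter):
        x_new = x + v * dt                       # leap-frog position update
        v_new = v + (-grad_f(x_new)) * dt        # leap-frog velocity update
        if np.linalg.norm(v_new) >= np.linalg.norm(v):
            x, v = x_new, v_new                  # kinetic energy grows: continue
        else:                                    # moving uphill: interfere,
            x = 0.5 * (x_new + x)                # average the positions and
            v = 0.25 * (v_new + v)               # extract energy from the particle
            v = v + (-grad_f(x)) * dt            # compute the new velocity
        if np.linalg.norm(grad_f(x)) < tol:
            break
    return x

x_min = lfop_sketch([3.0, 1.0])
print(x_min)   # near the minimizer [0, 0]
```

No function values or line searches are used at all: only gradients drive the trajectory, and the interference rule supplies the damping that turns an energy-conserving oscillation into a minimizer.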



Figure 4.1: Flowchart of the LFOP unconstrained minimization algorithm


Figure 4.2: The (a) complete LFOPC trajectory for example problem 4.2.3.1, with x⁰ = [3, 1]^T, and magnified views of the final part of the trajectory shown in (b) and (c), giving x* ≈ [0.659, 0.341]^T


Algorithm 4.2 LFOPC algorithm

Phase 0: Given some x⁰, then with overall penalty parameter ρ = ρ₀, apply LFOP to the penalty function P(x, ρ₀) to give x*(ρ₀).
Phase 1: With x⁰ := x*(ρ₀) and ρ := ρ₁, where ρ₁ ≫ ρ₀, apply LFOP to P(x, ρ₁) to give x*(ρ₁), and identify the set of active constraints I_a, such that g_{i_a}(x*(ρ₁)) > 0 for i_a ∈ I_a.
Phase 2: With x⁰ := x*(ρ₁) use LFOP to minimize

P_a(x, ρ₁) = Σ_{i_a ∈ I_a} ρ₁ g_{i_a}²(x)

to give x*.
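The three phases can be sketched on example 4.2.3.1, with the inner LFOP minimizer replaced here by plain gradient descent purely for brevity; step sizes and iteration counts are illustrative assumptions.

```python
# LFOPC phase sketch for: minimize f(x) = x1^2 + 2*x2^2
# subject to g(x) = -x1 - x2 + 1 <= 0, using the penalty function
# P(x, rho) = f(x) + rho * max(0, g(x))^2.
import numpy as np

def descend(x, grad, steps=20000, eta=0.01):   # stand-in for LFOP
    for _ in range(steps):
        x = x - eta * grad(x)
    return x

def grad_P(x, rho):
    gval = max(0.0, -x[0] - x[1] + 1.0)
    return np.array([2.0 * x[0], 4.0 * x[1]]) + 2.0 * rho * gval * np.array([-1.0, -1.0])

x = np.array([3.0, 1.0])
x = descend(x, lambda y: grad_P(y, 1.0))       # phase 0: rho_0 = 1
x = descend(x, lambda y: grad_P(y, 10.0))      # phase 1: rho_1 = 10

def grad_phase2(y):                            # phase 2: minimize rho_1 * g^2
    gval = -y[0] - y[1] + 1.0                  # g identified as active
    return 2.0 * 10.0 * gval * np.array([-1.0, -1.0])

x = descend(x, grad_phase2)
print(x)   # close to [0.656, 0.344]; the book reports x* ~ [0.659, 0.341]
```

Phase 2 repairs the residual constraint violation left by the finite penalty parameter, which is why a moderate ρ₁ suffices.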

4.3 The spherical quadratic steepest descent method

4.3.1 Introduction

In this section an extremely simple gradient-only algorithm (Snyman and Hay, 2001) is proposed that, in terms of storage requirements (only 3 n-vectors need be stored) and computational efficiency, may be considered as an alternative to the conjugate gradient methods. The method effectively applies the steepest descent (SD) method to successive simple spherical quadratic approximations of the objective function, in such a way that no explicit line searches are performed in solving the minimization problem. It is shown that the method is convergent when applied to general positive-definite quadratic functions. The method is tested by its application to some standard and other test problems. On the evidence presented, the new method, called the SQSD algorithm, appears to be reliable and stable, and very competitive compared to the well established conjugate gradient methods. In particular, it does very well when applied to extremely ill-conditioned problems.


4.3.2 Classical steepest descent method revisited

Consider the following unconstrained optimization problem:

min f(x), x ∈ Rⁿ (4.6)

where f is a scalar objective function defined on Rⁿ, the n-dimensional real Euclidean space, and x is a vector of n real components x1, x2, ..., xn. It is assumed that f is differentiable, so that the gradient vector ∇f(x) exists everywhere in Rⁿ. The solution is denoted by x*.

The steepest descent (SD) algorithm for solving problem (4.6) may then be stated as follows:

Algorithm 4.3 SD algorithm

Initialization: Specify convergence tolerances ε_g and ε_x, and select starting point x⁰. Set k := 1 and go to main procedure.
Main procedure:

1. If ‖∇f(x^{k−1})‖ < ε_g, then set x* ≅ x^c = x^{k−1} and stop; otherwise set u^k := −∇f(x^{k−1}).

2. Let λ_k be such that f(x^{k−1} + λ_k u^k) = min_λ f(x^{k−1} + λu^k) subject to λ > 0 {line search step}.

3. Set x^k := x^{k−1} + λ_k u^k; if ‖x^k − x^{k−1}‖ < ε_x, then x* ≅ x^c = x^k and stop; otherwise set k := k + 1 and go to Step 1.

It can be shown that if the steepest descent method is applied to a general positive-definite quadratic function of the form f(x) = ½x^TAx + b^Tx + c, then the sequence {f(x^k)} → f(x*). Depending, however, on the starting point x⁰ and the condition number of A associated with the quadratic form, the rate of convergence may become extremely slow.

It is proposed here that, for general functions f(x), better overall performance of the steepest descent method may be obtained by applying it successively to a sequence of very simple quadratic approximations of f(x). The proposed modification, named here the spherical quadratic steepest descent (SQSD) method, remains a first order method, since only gradient information is used and no attempt is made to construct


the Hessian of the function. The storage requirements therefore remain minimal, making it ideally suitable for problems with a large number of variables. Another significant characteristic is that the method requires no explicit line searches.

4.3.3 The SQSD algorithm

In the SQSD approach, given an initial approximate solution x⁰, a sequence of spherically quadratic optimization subproblems P[k], k = 0, 1, 2, ... is solved, generating a sequence of approximate solutions x^{k+1}. More specifically, at each point x^k the constructed approximate subproblem is P[k]:

min_x f̃_k(x) (4.7)

where the approximate objective function f̃_k(x) is given by

f̃_k(x) = f(x^k) + ∇^T f(x^k)(x − x^k) + ½(x − x^k)^T C_k (x − x^k) (4.8)

and C_k = diag(c_k, c_k, ..., c_k) = c_k I. The solution of this subproblem will be denoted by x*^k, and for the construction of the next subproblem P[k+1], x^{k+1} := x*^k.

For the first subproblem the curvature c₀ is set to c₀ := ‖∇f(x⁰)‖/ρ, where ρ > 0 is some arbitrarily specified step limit. Thereafter, for k ≥ 1, c_k is chosen such that f̃_k(x) interpolates f(x) at both x^k and x^{k−1}. The latter condition implies that for k = 1, 2, ...

c_k = 2[f(x^{k−1}) − f(x^k) − ∇^T f(x^k)(x^{k−1} − x^k)] / ‖x^{k−1} − x^k‖². (4.9)

Clearly the identical curvature entries along the diagonal of the Hessian mean that the level surfaces of the quadratic approximation f̃_k(x) are indeed concentric hyper-spheres. The approximate subproblems P[k] are therefore aptly referred to as spherical quadratic approximations.

It is now proposed that for a large class of problems the sequence x⁰, x¹, ... will tend to the solution of the original problem (4.6), i.e.

lim_{k→∞} x^k = x*.


For subproblems P[k] that are convex, i.e. c_k > 0, the solution occurs where ∇f̃_k(x) = 0, that is where

∇f(x^k) + c_k(x − x^k) = 0. (4.10)

The solution of the subproblem, x*^k, is therefore given by

x*^k = x^k − ∇f(x^k)/c_k. (4.11)

Clearly the solution of the spherical quadratic subproblem lies along a line through x^k in the direction of steepest descent. The SQSD method may formally be stated in the form given in Algorithm 4.4.

Step size control is introduced in Algorithm 4.4 through the specification of a step limit ρ and the test for ‖x^k − x^{k−1}‖ > ρ in Step 2 of the main procedure. Note that the choice of c₀ ensures that for P[0] the solution x¹ lies at a distance ρ from x⁰ in the direction of steepest descent. Also, the test in Step 3 that c_k < 0, and setting c_k to a small positive value where this condition is true, ensures that the approximate objective function is always positive-definite.

4.3.4 Convergence of' the SQSD method

An analysis of the convergence rate of the SQSD method, when applied to a general positive-definite quadratic function, affords insight into the convergence behavior of the method when applied to more general functions. This is so because, for a large class of continuously differentiable functions, the behavior close to local minima is quadratic. For quadratic functions the following theorem may be proved.

4.3.4.1 Theorem

The SQSD algorithm (without step size control) is convergent when applied to the general quadratic function of the form f(x) = ½x^TAx + b^Tx, where A is an n × n positive-definite matrix and b ∈ Rⁿ.

Proof. Begin by considering the bivariate quadratic function f(x) = x1² + γx2², γ > 1, with x⁰ = [α, β]^T. Assume c₀ > 0 given, and


Algorithm 4.4 SQSD algorithm

Initialization: Specify convergence tolerances ε_g and ε_x, step limit ρ > 0, and select starting point x⁰. Set c₀ := ‖∇f(x⁰)‖/ρ. Set k := 1 and go to main procedure.
Main procedure:

1. If ‖∇f(x^{k−1})‖ < ε_g, then x* ≅ x^c = x^{k−1} and stop; otherwise set

x^k := x^{k−1} − ∇f(x^{k−1})/c_{k−1}.

2. If ‖x^k − x^{k−1}‖ > ρ, then set

x^k := x^{k−1} − ρ∇f(x^{k−1})/‖∇f(x^{k−1})‖;

if ‖x^k − x^{k−1}‖ < ε_x, then x* ≅ x^c = x^k and stop.

3. Set

c_k := 2[f(x^{k−1}) − f(x^k) − ∇^T f(x^k)(x^{k−1} − x^k)] / ‖x^{k−1} − x^k‖²;

if c_k < 0, set c_k to a small positive value to keep the subproblem positive-definite.

4. Set k := k + 1 and go to Step 1 for the next iteration.
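Algorithm 4.4 can be sketched as follows; the small positive safeguard value for c_k and the choice of test function are assumptions for illustration.

```python
# SQSD sketch, applied to the ill-conditioned quadratic f(x) = x1^2 + 10*x2^2.
import numpy as np

def sqsd(f, grad, x0, rho=1.0, eps_g=1e-8, eps_x=1e-12, max_iter=100000):
    x_prev = np.array(x0, dtype=float)
    c = np.linalg.norm(grad(x_prev)) / rho        # c0 = ||grad f(x0)|| / rho
    for _ in range(max_iter):
        g_prev = grad(x_prev)
        if np.linalg.norm(g_prev) < eps_g:        # step 1: gradient tolerance
            break
        x = x_prev - g_prev / c                   # minimizer of spherical model
        if np.linalg.norm(x - x_prev) > rho:      # step 2: enforce step limit
            x = x_prev - rho * g_prev / np.linalg.norm(g_prev)
        step2 = float(np.dot(x - x_prev, x - x_prev))
        if step2 < eps_x ** 2:
            break
        # step 3: curvature from two-point interpolation (equation (4.9))
        c = 2.0 * (f(x_prev) - f(x) - grad(x) @ (x_prev - x)) / step2
        if c <= 0.0:
            c = 1e-10                             # assumed positive safeguard
        x_prev = x
    return x_prev

f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
x_min = sqsd(f, grad, [1.0, 1.0], rho=0.5)
print(x_min)   # near the minimizer [0, 0]
```

Only the current point, the previous point and one gradient need be stored, which is the 3 n-vector storage requirement claimed above.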


for convenience in what follows set c₀ = 1/δ, δ > 0. Also employ the notation f_k = f(x^k).

Application of the first step of the SQSD algorithm yields

x¹ = x⁰ − ∇f₀/c₀ = [α(1 − 2δ), β(1 − 2γδ)]^T (4.13)

and it follows that

f₁ = α²(1 − 2δ)² + γβ²(1 − 2γδ)² (4.14)

and

∇f₁ = [2α(1 − 2δ), 2γβ(1 − 2γδ)]^T. (4.15)

For the next iteration the curvature is given by

c₁ = 2[f₀ − f₁ − ∇^T f₁(x⁰ − x¹)] / ‖x⁰ − x¹‖². (4.16)

Utilizing the information contained in (4.13)-(4.15), the various entries in expression (4.16) are known, and after substitution c₁ simplifies to

c₁ = 2(α² + γ³β²)/(α² + γ²β²). (4.17)

In the next iteration, Step 1 gives

x² = x¹ − ∇f₁/c₁ (4.18)

and after the necessary substitutions for x¹, ∇f₁ and c₁, given by (4.13), (4.15) and (4.17) respectively, (4.18) reduces to

x² = [ρ₁α(1 − 2δ), ω₁β(1 − 2γδ)]^T (4.19)

where

ρ₁ = γ²β²(γ − 1)/(α² + γ³β²) (4.20)

and

ω₁ = 1 − 2γ/c₁ = (c₁ − 2γ)/c₁. (4.21)


Clearly if γ = 1, then ρ₁ = 0 and ω₁ = 0. Thus by (4.19) x² = 0 and convergence to the solution is achieved within the second iteration.

Now for γ > 1, and for any choice of α and β, it follows from (4.20) that

0 ≤ ρ₁ < 1 (4.22)

which implies from (4.19) that for the first component of x²:

|x1²| = ρ₁|x1¹| ≤ |x1¹| (4.23)

or, introducing the notation α_j for the first component of x^j (with α₀ = α),

|α₂| = |ρ₁α₁| ≤ |α₁|. (4.24)

{Note: because c₀ = 1/δ > 0 is chosen arbitrarily, it cannot be said that |α₁| < |α₀|. However, α₁ is finite.}

The above argument, culminating in result (4.24), is for the two iterations x⁰ → x¹ → x². Repeating the argument for the sequence of overlapping pairs of iterations x¹ → x² → x³; x² → x³ → x⁴; ..., it follows similarly that |α₃| = |ρ₂α₂| ≤ |α₂|; |α₄| = |ρ₃α₃| ≤ |α₃|; ..., since 0 ≤ ρ₂ ≤ 1; 0 ≤ ρ₃ ≤ 1; ..., where the ρ's are given by (corresponding to equation (4.20) for ρ₁):

ρ_j = γ²β_j²(γ − 1)/(α_j² + γ³β_j²). (4.25)

Thus in general

0 ≤ ρ_j ≤ 1 (4.26)

and

|α_{j+1}| = |ρ_jα_j| ≤ |α_j|. (4.27)

For large positive integer m it follows that

|α_m| = (ρ_{m−1}ρ_{m−2} ··· ρ₁)|α₁| (4.28)

and clearly, for γ > 1, because of (4.26),

lim_{m→∞} |α_m| = 0. (4.29)


Now for the second component of x² in (4.19), the expression for ω₁, given by (4.21), may be simplified to

ω₁ = α²(1 − γ)/(α² + γ³β²). (4.30)

Also, for the second component,

|x2²| = |ω₁||x2¹| (4.31)

or, introducing the notation β_j for the second component of x^j,

β₂ = ω₁β₁. (4.32)

The above argument is for x⁰ → x¹ → x² and again, repeating it for the sequence of overlapping pairs of iterations, it follows more generally for j = 1, 2, ..., that

β_{j+1} = ω_jβ_j (4.33)

where ω_j is given by

ω_j = α_j²(1 − γ)/(α_j² + γ³β_j²). (4.34)

Since by (4.29) |α_m| → 0, it follows that if |β_m| → 0 as m → ∞, the theorem is proved for the bivariate case. Assume that |β_m| does not tend to zero; then there exists a finite positive number ε such that

|β_j| ≥ ε for all j. (4.35)

This allows the following argument:

|ω_j| = α_j²(γ − 1)/(α_j² + γ³β_j²) ≤ α_j²(γ − 1)/(α_j² + γ³ε²). (4.36)

Clearly, since by (4.29) |α_m| → 0 as m → ∞, (4.36) implies that also |ω_m| → 0. This result, taken together with (4.33), means that |β_m| → 0, which contradicts the assumption above. With this result the theorem is proved for the bivariate case.


Although the algebra becomes more complicated, the above argument can clearly be extended to prove convergence for the multivariate case, where

f(x) = Σ_{i=1}^{n} γ_i x_i², 1 = γ₁ < γ₂ < γ₃ < ··· < γ_n. (4.37)

Finally, since the general quadratic function

f(x) = ½x^TAx + b^Tx, A positive-definite (4.38)

may be transformed to the form (4.37), convergence of the SQSD method is also ensured in the general case. □

It is important to note that, although the above analysis indicates that ‖x^k − x*‖ is monotonically decreasing with k, it does not necessarily follow that monotonic descent in the corresponding objective function values f(x^k) is also guaranteed. Indeed, numerical experimentation with quadratic functions shows that, although the SQSD algorithm monotonically approaches the minimum, relatively large increases in f(x^k) may occur along the trajectory to x*, especially if the function is highly elliptical (poorly scaled).

4.3.5 Numerical results and conclusion

The SQSD method is now demonstrated by its application to some test problems. For comparison purposes the results are also given for the standard SD method and both the Fletcher-Reeves (FR) and Polak-Ribiere (PR) conjugate gradient methods. The latter two methods are implemented using the CG+ FORTRAN conjugate gradient program of Gilbert and Nocedal (1992). The CG+ implementation uses the line search routine of More and Thuente (1994). The function and gradient values are evaluated together in a single subroutine. The SD method is applied using CG+ with the search direction modified to the steepest descent direction. The FORTRAN programs were run on a 266 MHz Pentium 2 computer using double precision computations.

The standard (Rao, 1996; Snyman, 1985; Himmelblau, 1972; Manevich, 1999) and other test problems used are listed in Section 4.3.6, and the results are given in Tables 4.1 and 4.2. The convergence tolerances applied throughout are ε_g = 10^-5 and ε_x = 10^-8, except for the extended homogeneous quadratic function with n = 50000 (Problem 12) and the extremely ill-conditioned Manevich functions (Problems 14). For these problems the extreme tolerances ε_g ≈ 0 and ε_x = 10^-12 are prescribed in an effort to ensure very high accuracy in the approximation x^c to x*. For each method the number of function-cum-gradient-vector evaluations (N^fg) is given. For the SQSD method the number of iterations is the same as N^fg. For the other methods the number of iterations (N^it) required for convergence, which corresponds to the number of line searches executed, is also listed separately. In addition the relative error (E^r) in optimum function value, defined by

E^r = |f(x^c) − f(x*)| / (1 + |f(x*)|),

where x^c is the approximation to x* at convergence, is also listed. For the Manevich problems with n ≥ 40, for which the other (SD, FR and PR) algorithms fail to converge after the indicated number of steps, the infinity norm of the error in the solution vector, defined by ||x* − x^c||_∞, is also tabulated. These entries, given instead of the relative error in function value (E^r), are set in italics.

Inspection of the results shows that the SQSD algorithm is consistently competitive with the other three methods and performs notably well for large problems. Of all the methods, the SQSD method appears to be the most reliable in solving each of the posed problems. As expected, because line searches are eliminated and consecutive search directions are no longer forced to be orthogonal, the new method completely overshadows the standard SD method. What is much more gratifying, however, is the performance of the SQSD method relative to the well-established and well-researched conjugate gradient algorithms. Overall the new method appears to be very competitive with respect to computational efficiency and, on the evidence presented, remarkably stable.

In the implementation of the SQSD method for highly non-quadratic and non-convex functions, some care must however be taken to ensure that the chosen step limit parameter ρ is not too large. Too large a value may result in excessive oscillation occurring before convergence. Therefore a relatively small value, ρ = 0.3, was used for the Rosenbrock problem with n = 2 (Problem 4). For the extended Rosenbrock functions of


[The body of Table 4.1 appears here in the original; its per-problem numerical entries (N^fg for the SQSD method, and N^fg and N^it for the SD method) are not recoverable from the scanned source.]

* Convergence to a local minimum with f(x^c) = 48.9.

Table 4.1: Performance of the SQSD and SD optimization algorithms when applied to the test problems listed in Section 4.3.6


[The body of Table 4.2 appears here in the original; its per-problem numerical entries (N^fg, N^it and E^r for the FR and PR methods) are not recoverable from the scanned source.]

* Convergence to a local minimum with f(x^c) = 48.9; † Solution to machine accuracy.

Table 4.2: Performance of the FR and PR algorithms when applied to the test problems listed in Section 4.3.6


larger dimensionality (Problems 13), correspondingly larger step limit values were used with success.

For quadratic functions, as is evident from the convergence analysis of Section 4.3.4, no step limit is required for convergence. This is borne out in practice by the results for the extended homogeneous quadratic functions (Problems 12), where the very large value ρ = 10^4 was used throughout, with the even more extreme value of ρ = 10^10 for n = 50000. The specification of a step limit in the quadratic case also appears to have little effect on the convergence rate, as can be seen from the results for the ill-conditioned Manevich functions (Problems 14), which are given for both ρ = 1 and ρ = 10. Here convergence is obtained to at least 11 significant figures of accuracy (||x* − x^c||_∞ < 10^-11) for each of the variables, despite the occurrence of extreme condition numbers, such as 10^60 for the Manevich problem with n = 200.

The successful application of the new method to the ill-conditioned Manevich problems, and the analysis of the convergence behavior for quadratic functions, indicate that the SQSD algorithm represents a powerful approach to solving quadratic problems with large numbers of variables. In particular, the SQSD method can be seen as an unconditionally convergent, stable and economic alternative iterative method for solving large systems of linear equations, ill-conditioned or not, through the minimization of the sum of the squares of the residuals of the equations.
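As an illustration of this reformulation (a sketch only — the experiments above use SQSD itself), a linear system Ax = b can be solved by minimizing the sum of squared residuals. Here a plain fixed-step gradient iteration stands in for the minimizer, and the matrix A, built as a diagonally dominated random matrix, is a made-up example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
A = rng.normal(size=(n, n)) + n * np.eye(n)   # invertible example system
b = rng.normal(size=n)

# Recast A x = b as the unconstrained problem  min_x F(x) = ||A x - b||^2,
# whose gradient is 2 A^T (A x - b).
def F(x):
    r = A @ x - b
    return float(r @ r)

def dF(x):
    return 2.0 * A.T @ (A @ x - b)

# Any unconstrained gradient method can now be applied; a fixed step of
# 1/L, with L the Lipschitz constant of dF, converges on this example.
L = 2.0 * np.linalg.norm(A.T @ A, 2)
x = np.zeros(n)
for _ in range(5000):
    x -= dF(x) / L
```

The attraction noted in the text is that a method like SQSD needs only the gradient routine dF and O(n) storage, regardless of the conditioning of A.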

4.3.6 Test functions used for SQSD

Minimize f(x):

2. f(x) = x_1^4 − 2x_1²x_2 + x_2² + x_1² − 2x_1 + 1, x^0 = [3, 3]^T, x* = [1, 1]^T, f(x*) = 0.0.

3. f(x) = x_1^4 − 8x_1³ + 25x_1² + 4x_2² − 4x_1x_2 − 32x_1 + 16, x^0 = [3, 3]^T, x* = [2, 1]^T, f(x*) = 0.0.

4. f(x) = 100(x_2 − x_1²)² + (1 − x_1)², x^0 = [−1.2, 1]^T, x* = [1, 1]^T, f(x*) = 0.0 (Rosenbrock's parabolic valley, Rao, 1996).


5. f (x) = x;' + xf - XI -- x$ - x; + 2 2 + xg - 23 + 21x223, (Zlobec's function, Snyman, 1985):

(a) x^0 = [1, −1, 1]^T and

(b) x^0 = [0, 0, 0]^T, x* = [0.57085597, −0.93955591, 0.76817555]^T, f(x*) = −1.91157218907.

6. f(x) = (x_1 + 10x_2)² + 5(x_3 − x_4)² + (x_2 − 2x_3)⁴ + 10(x_1 − x_4)⁴, x^0 = [3, −1, 0, 1]^T, x* = [0, 0, 0, 0]^T, f(x*) = 0.0 (Powell's quartic function, Rao, 1996).

7. f(x) = −{1/[1 + (x_1 − x_2)²] + sin(½πx_2x_3) + exp[−((x_1 + x_3)/x_2 − 2)²]}, x^0 = [0, 1, 2]^T, x* = [1, 1, 1]^T, f(x*) = −3.0 (Rao, 1996).

8. f(x) = {−13 + x_1 + [(5 − x_2)x_2 − 2]x_2}² + {−29 + x_1 + [(x_2 + 1)x_2 − 14]x_2}², x^0 = [0.5, −2]^T, x* = [5, 4]^T, f(x*) = 0.0 (Freudenstein and Roth function, Rao, 1996).

9. f(x) = 100(x_2 − x_1³)² + (1 − x_1)², x^0 = [−1.2, 1]^T, x* = [1, 1]^T, f(x*) = 0.0 (cubic valley, Himmelblau, 1972).

10. f(x) = [1.5 − x_1(1 − x_2)]² + [2.25 − x_1(1 − x_2²)]² + [2.625 − x_1(1 − x_2³)]², x^0 = [1, 1]^T, x* = [3, 0.5]^T, f(x*) = 0.0 (Beale's function, Rao, 1996).

11. f(x) = [10(x_2 − x_1²)]² + (1 − x_1)² + 90(x_4 − x_3²)² + (1 − x_3)² + 10(x_2 + x_4 − 2)² + 0.1(x_2 − x_4)², x^0 = [−3, −1, −3, −1]^T, x* = [1, 1, 1, 1]^T, f(x*) = 0.0 (Wood's function, Rao, 1996).

12. f(x) = Σ_{i=1}^{n} i·x_i², x^0 = [3, 3, ..., 3]^T, x* = [0, 0, ..., 0]^T, f(x*) = 0.0 (extended homogeneous quadratic functions).

13. f(x) = Σ_{i=1}^{n−1} [100(x_{i+1} − x_i²)² + (1 − x_i)²], x^0 = [−1.2, 1, −1.2, 1, ...]^T, x* = [1, 1, ..., 1]^T, f(x*) = 0.0 (extended Rosenbrock functions, Rao, 1996).

14. f(x) = Σ_{i=1}^{n} (1 − x_i)²/2^{i−1}, x^0 = [0, 0, ..., 0]^T, x* = [1, 1, ..., 1]^T, f(x*) = 0.0 (extended Manevich functions, Manevich, 1999).


4.4 The Dynamic-Q optimization algorithm

4.4.1 Introduction

An efficient constrained optimization method is presented in this section. The method, called the Dynamic-Q method (Snyman and Hay, 2002), consists of applying the dynamic trajectory LFOPC optimization algorithm (see Section 4.2) to successive quadratic approximations of the actual optimization problem. This method may be considered as an extension of the unconstrained SQSD method, presented in Section 4.3, to one capable of handling general constrained optimization problems.

Due to its efficiency with respect to the number of function evaluations required for convergence, the Dynamic-Q method is primarily intended for optimization problems where function evaluations are expensive. Such problems occur frequently in engineering applications where time-consuming numerical simulations may be used for function evaluations. Amongst others, these numerical analyses may take the form of a computational fluid dynamics (CFD) simulation, a structural analysis by means of the finite element method (FEM), or a dynamic simulation of a multibody system. Because these simulations are usually expensive to perform, and because the relevant functions may not be known analytically, standard classical optimization methods are normally not suited to these types of problems. Also, as will be shown, the storage requirements of the Dynamic-Q method are minimal. No Hessian information is required. The method is therefore particularly suitable for problems where the number of variables n is large.

4.4.2 The Dynamic-Q method

Consider the general nonlinear optimization problem:

min_x f(x); x = [x_1, x_2, ..., x_n]^T ∈ R^n

subject to

g_j(x) ≤ 0; j = 1, 2, ..., p

h_k(x) = 0; k = 1, 2, ..., q

where f(x), g_j(x) and h_k(x) are scalar functions of x.


In the Dynamic-Q approach, successive subproblems P[i], i = 0, 1, 2, ..., are generated, at successive approximations x^i to the solution x*, by constructing spherically quadratic approximations f̃(x), g̃_j(x) and h̃_k(x) to f(x), g_j(x) and h_k(x). These approximation functions, evaluated at a point x^i, are given by

f̃(x) = f(x^i) + ∇^T f(x^i)(x − x^i) + ½(x − x^i)^T A(x − x^i)
g̃_j(x) = g_j(x^i) + ∇^T g_j(x^i)(x − x^i) + ½(x − x^i)^T B_j(x − x^i), j = 1, ..., p    (4.41)
h̃_k(x) = h_k(x^i) + ∇^T h_k(x^i)(x − x^i) + ½(x − x^i)^T C_k(x − x^i), k = 1, ..., q

with the Hessian matrices A, B_j and C_k taking on the simple forms

A = diag(a, a, ..., a) = aI
B_j = b_j I    (4.42)
C_k = c_k I.

Clearly the identical entries along the diagonal of the Hessian matrices indicate that the approximate subproblems P[i] are indeed spherically quadratic.

For the first subproblem (i = 0) a linear approximation is formed by setting the curvatures a, b_j and c_k to zero. Thereafter a, b_j and c_k are chosen so that the approximating functions (4.41) interpolate their corresponding actual functions at both x^i and x^{i−1}. These conditions imply that for i = 1, 2, 3, ...

a = 2[f(x^{i−1}) − f(x^i) − ∇^T f(x^i)(x^{i−1} − x^i)] / ||x^{i−1} − x^i||²    (4.43)

b_j = 2[g_j(x^{i−1}) − g_j(x^i) − ∇^T g_j(x^i)(x^{i−1} − x^i)] / ||x^{i−1} − x^i||², j = 1, ..., p

c_k = 2[h_k(x^{i−1}) − h_k(x^i) − ∇^T h_k(x^i)(x^{i−1} − x^i)] / ||x^{i−1} − x^i||², k = 1, ..., q.

If the gradient vectors ∇f, ∇g_j and ∇h_k are not known analytically, they may be approximated from functional data by means of first-order forward finite differences.
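The interpolation conditions (4.43) all share one scalar template, which can be sketched as below; the helper name and the test values are illustrative only:

```python
import numpy as np

def spherical_curvature(phi_prev, phi_cur, grad_cur, x_prev, x_cur):
    """Curvature c of the spherical quadratic about x_cur that matches the
    value and gradient of a function phi at x_cur and interpolates its value
    at x_prev, following the pattern of (4.43)."""
    d = x_prev - x_cur
    return 2.0 * (phi_prev - phi_cur - grad_cur @ d) / (d @ d)

# For phi(x) = x'x (Hessian 2I) the fitted curvature is exact:
x_prev, x_cur = np.array([3.0, 2.0]), np.array([1.0, 0.0])
c = spherical_curvature(13.0, 1.0, np.array([2.0, 0.0]), x_prev, x_cur)
```

The same helper serves for the objective (giving a) and for each constraint (giving b_j and c_k), which is why only function and gradient data, and no second derivatives, are needed.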


The particular choice of spherically quadratic approximations in the Dynamic-Q algorithm has implications for the computational and storage requirements of the method. Since the second derivatives of the objective function and constraints are approximated using function and gradient data, the O(n²) calculations and storage locations, which would usually be required for these second derivatives, are not needed. The computational and storage resources required by the Dynamic-Q method are thus reduced to O(n). At most 4 + p + q + r + s n-vectors need be stored (where p, q, r and s are respectively the number of inequality and equality constraints and the number of lower and upper limits of the variables). These savings become significant when the number of variables becomes large. For this reason it is expected that the Dynamic-Q method is well suited, for example, to engineering problems such as structural optimization problems where a large number of variables are present.

In many optimization problems, additional simple side constraints of the form k̂_i ≤ x_i ≤ ǩ_i occur, where the constants k̂_i and ǩ_i respectively represent lower and upper bounds for variable x_i. Since these constraints are of a simple form (having zero curvature), they need not be approximated in the Dynamic-Q method and are instead explicitly treated as special linear inequality constraints. Constraints corresponding to lower and upper limits are respectively of the form

ĝ_l(x) = k̂_{v_l} − x_{v_l} ≤ 0, l = 1, 2, ..., r
ḡ_m(x) = x_{w_m} − ǩ_{w_m} ≤ 0, m = 1, 2, ..., s    (4.44)

where v_l is an element of (v_1, v_2, ..., v_r), the set of r subscripts corresponding to the set of variables for which respective lower bounds k̂_{v_l} are prescribed, and w_m an element of (w_1, w_2, ..., w_s), the set of s subscripts corresponding to the set of variables for which respective upper bounds ǩ_{w_m} are prescribed. The subscripts v_l and w_m are used since there will, in general, not be n lower and upper limits, i.e. usually r ≠ n and s ≠ n.

In order to obtain convergence to the solution in a controlled and stable manner, move limits are placed on the variables. For each approximate subproblem P[i] this move limit takes the form of the additional single inequality constraint

g_ρ(x) = ||x − x^{i−1}||² − ρ² ≤ 0    (4.45)


where ρ is an appropriately chosen step limit and x^{i−1} is the solution to the previous subproblem.

The approximate subproblem, constructed at x^i from the approximations (4.41), together with the simple side constraints (4.44) and the move limit (4.45), thus becomes P[i]:

min_x f̃(x)

subject to

g̃_j(x) ≤ 0, j = 1, 2, ..., p
h̃_k(x) = 0, k = 1, 2, ..., q    (4.46)
ĝ_l(x) ≤ 0, l = 1, 2, ..., r
ḡ_m(x) ≤ 0, m = 1, 2, ..., s
g_ρ(x) = ||x − x^{i−1}||² − ρ² ≤ 0

with solution x*^i. The Dynamic-Q algorithm is given by Algorithm 4.5. In the Dynamic-Q method the subproblems generated are solved using the dynamic trajectory, or "leap-frog" (LFOPC), method of Snyman (1982, 1983) for unconstrained optimization, applied to penalty function formulations (Snyman et al., 1994; Snyman, 2000) of the constrained problem. A brief description of the LFOPC algorithm is given in Section 4.2.

Algorithm 4.5 Dynamic-Q algorithm

Initialization: Select starting point x^0 and move limit ρ. Set i := 0. Main procedure:

1. Evaluate f(x^i), g_j(x^i) and h_k(x^i), as well as ∇f(x^i), ∇g_j(x^i) and ∇h_k(x^i). If the termination criteria are satisfied, set x* = x^i and stop.

2. Construct a local approximation P[i] to the optimization problem at x^i using expressions (4.41) to (4.43).

3. Solve the approximated subproblem P[i] (given by (4.46)) using the constrained optimizer LFOPC with starting point x^i (see Section 4.2) to give x*^i.

4. Set i := i + 1, x^i := x*^{(i−1)} and return to Step 1.


The LFOPC algorithm possesses a number of outstanding characteristics which make it highly suitable for implementation in the Dynamic-Q methodology. The algorithm requires only gradient information, and no explicit line searches or function evaluations are performed. These properties, together with the influence of the fundamental physical principles underlying the method, ensure that the algorithm is extremely robust. This has been proven over many years of testing (Snyman, 2000). A further desirable characteristic related to its robustness, and the main reason for its application in solving the subproblems in the Dynamic-Q algorithm, is that if there is no feasible solution to the problem, the LFOPC algorithm will still find the best possible compromised solution without breaking down. The Dynamic-Q algorithm thus usually converges to a solution from an infeasible remote point without the need to use line searches between subproblems, as is the case with SQP. The LFOPC algorithm used by Dynamic-Q is identical to that presented in Snyman (2000), except for a minor change to LFOP which is advisable should the subproblems become effectively unconstrained.

4.4.3 Numerical results and conclusion

The Dynamic-Q method requires very few parameter settings by the user. Other than the convergence criteria and the specification of a maximum number of iterations, the only parameter required is the step limit ρ. The algorithm is not very sensitive to the choice of this parameter; however, ρ should be chosen of the same order of magnitude as the diameter of the region of interest. For the problems listed in Table 4.3 a step limit of ρ = 1 was used, except for problems 72 and 106, for which larger step limits were used (ρ = 100 in the case of problem 106).

Given specified positive tolerances ε_x, ε_f and ε_c, then at step i termination of the algorithm occurs if the normalized step size

||x^i − x^{i−1}|| / (1 + ||x^i||) < ε_x    (4.47)

or if the normalized change in function value

|f(x^i) − f_best| / (1 + |f_best|) < ε_f    (4.48)


where f_best is the lowest previous feasible function value and the current x^i is feasible. The point x^i is considered feasible if the absolute value of the violation of each constraint is less than ε_c. This particular function termination criterion is used since the Dynamic-Q algorithm may at times exhibit oscillatory behavior near the solution.
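Assuming the normalized forms of the two tests sketched above (the exact scaling used in the original code is not reproduced here), the combined termination check might read:

```python
import numpy as np

def terminated(x_cur, x_prev, f_cur, f_best, feasible,
               eps_x=1e-8, eps_f=1e-8):
    """Combined termination test: a small normalized step, or a small
    normalized change in function value at a feasible point. The 1 + |.|
    normalization keeps both tests meaningful when iterates or function
    values are near zero."""
    step_small = (np.linalg.norm(x_cur - x_prev)
                  / (1.0 + np.linalg.norm(x_cur)) < eps_x)
    f_small = feasible and (abs(f_cur - f_best)
                            / (1.0 + abs(f_best)) < eps_f)
    return step_small or f_small
```

Testing against f_best rather than the previous function value is what makes the criterion robust to the oscillatory behavior mentioned above.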

In Table 4.3, for the same starting points, the performance of the Dynamic-Q method on some standard test problems is compared to results obtained for Powell's SQP method as reported by Hock and Schittkowski (1981). The problem numbers given correspond to the problem numbers in Hock and Schittkowski's book. For each problem the actual function value f_act is given, as well as, for each method, the calculated function value f* at convergence and the relative function error

E^r = |f_act − f*| / (1 + |f_act|)    (4.49)

and the number of function-gradient evaluations (N^fg) required for convergence. In some cases it was not possible to calculate the relative function error, due to rounding of the solutions reported by Hock and Schittkowski. In these cases the calculated solutions were correct to at least eight significant figures. For the Dynamic-Q algorithm, convergence tolerances of ε_f on the function value, ε_x on the step size and ε_c for constraint feasibility were used. These were chosen to allow for comparison with the reported SQP results.

The result for the 12-corner polytope problem of Svanberg (1999) is also given. For this problem the results given in the SQP columns are for Svanberg's Method of Moving Asymptotes (MMA). The recorded number of function evaluations for this method is approximate, since the results given correspond to 50 outer iterations of the MMA, each requiring about 3 function evaluations.

A robust and efficient method for nonlinear optimization, with minimal storage requirements compared to those of the SQP method, has thus been proposed and tested. The particular methodology proposed is made possible by the special properties of the LFOPC optimization algorithm (Snyman, 2000), which is used to solve the quadratic subproblems. Comparison of the results for Dynamic-Q with those for the SQP method shows that equally accurate results are obtained with a comparable number of function evaluations.


[The body of Table 4.3 appears here in the original; it lists, for each problem number from Hock and Schittkowski (1981), the value f_act together with f*, E^r and N^fg for both the SQP and Dynamic-Q methods. The individual numerical entries are not recoverable from the scanned source; the table's footnotes mark convergence to a local minimum, failure, and termination on the maximum number of steps.]

Table 4.3: Performance of the Dynamic-Q and SQP optimization algo- rithms


4.5 A gradient-only line search method for conjugate gradient methods

4.5.1 Introduction

Many engineering design optimization problems involve numerical computer analyses via, for example, FEM codes, CFD simulations or the computer modeling of the dynamics of multi-body mechanical systems. The computed objective function is therefore often the result of a complex sequence of calculations involving other computed or measured quantities. This may result in the presence of numerical noise in the objective function, so that it exhibits non-smooth trends as design parameters are varied. It is well known that this presence of numerical noise in the design optimization problem inhibits the use of classical and traditional gradient-based optimization methods that employ line searches, such as, for example, the conjugate gradient methods. The numerical noise may prevent or slow down convergence during optimization. It may also promote convergence to spurious local optima. The computational expense of the analyses, coupled with the convergence difficulties created by the numerical noise, is in many cases a significant obstacle to performing multidisciplinary design optimization.

In addition to the problems anticipated when applying the conjugate gradient methods to noisy functions, it is also known that standard implementations of conjugate gradient methods, in which conventional line search techniques are used, are less robust than one would expect from their theoretical quadratic termination property. Therefore the conjugate gradient method would, under normal circumstances, not be preferred to quasi-Newton methods (Fletcher, 1987). In particular, severe numerical difficulties arise when standard line searches are used in solving constrained problems through the minimization of associated penalty functions. However, there is one particular advantage of conjugate gradient methods, namely the particularly simple form that requires no matrix operations in determining the successive search directions. Thus conjugate gradient methods may be the only methods which are applicable to large problems with thousands of variables (Fletcher, 1987), and they are therefore well worth further investigation.


In this section a new implementation (ETOPC) of the conjugate gradient method (both the Fletcher-Reeves and Polak-Ribière versions; see Fletcher, 1987) is presented for solving constrained problems. The essential novelty in this implementation is the use of a gradient-only line search technique originally proposed by the author (Snyman, 1985) and used in the ETOP algorithm for unconstrained minimization. It will be shown that this implementation of the conjugate gradient method not only easily overcomes the accuracy problem when applied to the minimization of penalty functions, but also economically handles the problem of severe numerical noise superimposed on an otherwise smooth underlying objective function.

4.5.2 Formulation of optimization problem

Consider again the general constrained optimization problem:

min_x f(x), x = [x_1, x_2, x_3, ..., x_n]^T ∈ R^n    (4.50)

subject to the inequality and equality constraints:

g_j(x) ≤ 0, j = 1, 2, ..., m
h_j(x) = 0, j = 1, 2, ..., r    (4.51)

where the objective function f(x) and the constraint functions g_j(x) and h_j(x) are scalar functions of the real column vector x. The optimum solution is denoted by x*, with corresponding optimum function value f(x*).

The most straightforward way of handling the constraints is via the unconstrained minimization of the penalty function:

P(x) = f(x) + Σ_j ρ_j h_j²(x) + Σ_j β_j g_j²(x)    (4.52)

where ρ_j ≫ 0, β_j = 0 if g_j(x) ≤ 0, and β_j = ρ_j ≫ 0 if g_j(x) > 0.

Usually β_j = ρ_j = ρ ≫ 0 for all j, with the corresponding penalty function being denoted by P(x, ρ).
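A direct transcription of this penalty formulation, with the single overall penalty parameter ρ of the common case just mentioned, might look as follows; the constructor-style interface is an illustrative choice, not the book's code:

```python
def penalty_function(f, inequality, equality, rho):
    """Build P(x, rho): equality constraints are always penalized
    quadratically, inequality constraints only where violated."""
    def P(x):
        value = f(x)
        value += rho * sum(max(g(x), 0.0) ** 2 for g in inequality)
        value += rho * sum(h(x) ** 2 for h in equality)
        return value
    return P

# Example: minimize x^2 subject to x >= 1, i.e. g(x) = 1 - x <= 0.
P = penalty_function(lambda x: x * x, [lambda x: 1.0 - x], [], rho=100.0)
```

P is then handed to an unconstrained minimizer; note that the squared max term gives P a continuous first derivative at the constraint boundary, but discontinuous curvature there, which is one source of the line search difficulties discussed below.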


Central to the application of the conjugate gradient method to penalty function formulated problems presented here is the use of an unconventional line search method for unconstrained minimization, proposed by the author, in which no function values are explicitly required (Snyman, 1985). Originally this gradient-only line search method was applied in the conjugate gradient method to solving a few very simple unconstrained problems. For somewhat obscure reasons, given in the original paper (Snyman, 1985) and briefly hinted at in this section, the combined method (novel line search plus conjugate gradient method) was called the ETOP (Euler-Trapezium Optimizer) algorithm. For this historical reason, and to avoid confusion, this acronym will be retained here to denote the combined method for unconstrained minimization. In subsequent unreported numerical experiments, the author was successful in solving a number of more challenging practical constrained optimization problems via penalty function formulations of the constrained problem, with ETOP being used in the unconstrained minimization of the sequence of penalty functions. ETOP, applied in this way to constrained problems, was referred to as the ETOPC algorithm. Accordingly this acronym will also be used here.

4.5.3 Gradient-only line search

The line search method used here, and originally proposed by the author (Snyman, 1985), uses no explicit function values. Instead the line search is done implicitly, using only two gradient vector evaluations at two points along the search direction and assuming that the function is near-quadratic along this line. The essentials of the gradient-only line search, for the case where the function f(x) is unconstrained, are as follows. Given the current design point x^k at iteration k and the next search direction v^{k+1}, compute

x^{k+1} = x^k + v^{k+1}τ    (4.53)

where τ is some suitably chosen positive parameter. The step taken in (4.53) may be seen as an "Euler step". With this step given by

Δx^k = x^{k+1} − x^k = v^{k+1}τ    (4.54)

the line search in the direction v^{k+1} is equivalent to finding x*^{k+1} defined by

f(x*^{k+1}) = min_λ f(x^k + λΔx^k).    (4.55)


Figure 4.3: Successive steps in line search procedure

These steps are depicted in Figure 4.3.

It was indicated in Snyman (1985) that for the step x^{k+1} = x^k + v^{k+1}τ the change in function value Δf_k can, in the unconstrained case, be approximated without explicitly evaluating the function f(x). Here a more formal argument is presented via the following lemma.

4.5.3.1 Lemma 1

For a general quadratic function the change in function value, for the step Δx^k = x^{k+1} − x^k = v^{k+1}τ, is given by:

Δf_k = f(x^{k+1}) − f(x^k) = −½⟨Δx^k, a^k + a^{k+1}⟩    (4.56)

where a^k = −∇f(x^k) and ⟨·, ·⟩ denotes the scalar product.

Proof:

In general, by Taylor's theorem:

f(x^{k+1}) − f(x^k) = ⟨Δx^k, ∇f(x^k)⟩ + ½⟨Δx^k, H(x^a)Δx^k⟩


Figure 4.4: Approximation of the minimizer x*^{k+1} in the direction v^{k+1}

and

f(x^{k+1}) − f(x^k) = ⟨Δx^k, ∇f(x^{k+1})⟩ − ½⟨Δx^k, H(x^b)Δx^k⟩

where x^a = x^k + θ_0Δx^k, x^b = x^k + θ_1Δx^k, with both θ_0 and θ_1 in the interval [0, 1], and where H(x) denotes the Hessian matrix of the general function f(x). Adding the above two expressions gives:

2[f(x^{k+1}) − f(x^k)] = ⟨Δx^k, ∇f(x^k) + ∇f(x^{k+1})⟩ + ½⟨Δx^k, [H(x^a) − H(x^b)]Δx^k⟩.

If f(x) is quadratic then H(x) is constant and it follows that

Δf_k = ½⟨Δx^k, ∇f(x^k) + ∇f(x^{k+1})⟩ = −½⟨Δx^k, a^k + a^{k+1}⟩

where a^k = −∇f(x^k), which completes the proof. □
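The trapezium identity of Lemma 1 is exact for quadratics, which is easy to confirm numerically; the random quadratic below is purely an illustrative example:

```python
import numpy as np

# Numerical check of the trapezium identity of Lemma 1 on a random
# positive-definite quadratic f(x) = 0.5 x'Hx + b'x.
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
H = M @ M.T + 4.0 * np.eye(4)            # positive-definite Hessian
b = rng.normal(size=4)
f = lambda x: 0.5 * x @ H @ x + b @ x
a = lambda x: -(H @ x + b)               # a(x) = -grad f(x)

xk = rng.normal(size=4)
dx = rng.normal(size=4)                  # an arbitrary step Delta x^k
lhs = f(xk + dx) - f(xk)                 # true change in function value
rhs = -0.5 * dx @ (a(xk) + a(xk + dx))   # gradient-only estimate (4.56)
```

Up to rounding, lhs and rhs agree exactly, with no function evaluation needed on the right-hand side.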

By using expression (4.56) the position of the minimizer x*^{k+1} (see Figure 4.4) in the direction v^{k+1} can also be approximated without any explicit function evaluation. This conclusion follows formally from the second lemma given below. Note that in (4.56) the second quantity in


the scalar product corresponds to an average vector given by the "trapezium rule". This observation, together with the remark following equation (4.53), gave rise to the name "Euler-Trapezium Optimizer (ETOP)" when applying this line search technique in the conjugate gradient method.

4.5.3.2 Lemma 2

For f(x) a positive-definite quadratic function, the point x*^{k+1} defined by f(x*^{k+1}) = min_λ f(x^k + λΔx^k) is given by

x*^{k+1} = x^k + ½θΔx^k    (4.57)

where

θ = p/(⟨v^{k+1}, (a^k + a^{k+1})⟩τ/2 + p) and p = −⟨Δx^k, a^k⟩.    (4.58)

Proof:

First determine θ such that

f(x^k + θΔx^k) = f(x^k).

By Taylor's expansion:

f(x^{k+1}) − f(x^k) = p + ½⟨Δx^k, HΔx^k⟩, i.e., ½⟨Δx^k, HΔx^k⟩ = Δf_k − p

where p = ⟨Δx^k, ∇f(x^k)⟩ = −⟨Δx^k, a^k⟩,

which gives for the step θΔx^k:

f(x^k + θΔx^k) − f(x^k) = θp + ½θ²⟨Δx^k, HΔx^k⟩ = θp + θ²(Δf_k − p).

For both function values to be the same, θ must therefore satisfy:

0 = θ[p + θ(Δf_k − p)]

which has the non-trivial solution:

θ = −p/(Δf_k − p).

Using the expression for Δf_k given by Lemma 1, it follows that:

θ = p/(⟨v^{k+1}, (a^k + a^{k+1})⟩τ/2 + p)


and by the symmetry of quadratic functions that

x*^{k+1} = x^k + ½θΔx^k.

Expressions (4.57) and (4.58) may of course also be used in the general non-quadratic case, to determine an approximation to the minimizer x*^{k+1} in the direction v^{k+1} when performing successive line searches using the sequence of descent directions v^{k+1}, k = 1, 2, .... Thus in practice, for the next, (k+1)-th, iteration, set x^{k+1} := x*^{k+1}, and with the next selected search direction v^{k+2} proceed as above, using expressions (4.57) and (4.58) to find x*^{k+2}, and then set x^{k+2} := x*^{k+2}. Continue iterations in this way, with only two gradient vector evaluations done per line search, until convergence is obtained.

In summary, explicit function evaluations are unnecessary in the above line search procedure, since the two computed gradients along the search direction allow for the computation of an approximation (4.56) to the change in the objective function, which in turn allows for the estimation of the position of the minimum along the search line via expressions (4.57) and (4.58), based on the assumption that the function is near-quadratic in the region of the search.

4.5.3.3 Heuristics

Of course, in general the objective function may not be quadratic and positive-definite. Additional heuristics are therefore required to ensure descent, and to see to it that the step size (corresponding to the parameter τ) between successive gradient evaluations is neither too small nor too large. The details of these heuristics are as set out below.

(i) In the case of a successful step having been taken, with Δf_k computed via (4.56) negative, i.e. descent, and θ computed via (4.58) positive, i.e. the function is locally strictly convex, as shown in Figure 4.4, τ is increased by a factor of 1.5 for the next search direction.


(ii) It may turn out that, although Δf_k computed via (4.56) is negative, θ computed via (4.58) is also negative. The latter implies that the function along the search direction is locally concave. In this case set θ := −θ in computing x*^(k+1) by (4.57), so as to ensure a step in the descent direction, and also increase τ by the factor 1.5 before computing the step for the next search direction using (4.53).

(iii) It may happen that Δf_k computed by (4.56) is negative and exactly equal to p, i.e. Δf_k − p = 0. This implies zero curvature with θ = ∞, and the function is therefore locally linear. In this case enforce the value θ = 1. This results in the setting, by (4.57), of x*^(k+1) equal to a point halfway between x^k and x^(k+1). In this case τ is again increased by the factor 1.5.

(iv) If both Δf_k and θ are positive, which is the situation depicted in Figure 4.4, then τ is halved before the next step.

(v) In the only outstanding and unlikely case, should it occur, where Δf_k is positive and θ negative, τ is unchanged.

(vi) For usual unconstrained minimization the initial step size parameter selection is τ = 0.5.

The new gradient-only line search method may of course be applied to any line search descent method for the unconstrained minimization of a general multi-variable function. Here its application is restricted to the conjugate gradient method.
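For a general function, one complete gradient-only line search step, including the heuristics above, can be sketched as follows. This is a minimal illustration only: all names are our own, the trial step along the search direction is assumed to be simply Δx^k = τv^(k+1), and the bookkeeping of the original algorithm is not reproduced.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def etop_line_search_step(grad, x, v, tau):
    """One gradient-only line search step: a sketch of expressions
    (4.56)-(4.58) together with heuristics (i)-(v).  Only two gradient
    evaluations are used; no explicit function values are needed."""
    dx = [tau * vi for vi in v]                    # trial step (assumed tau*v)
    g0 = grad(x)                                   # gradient at x^k
    g1 = grad([xi + di for xi, di in zip(x, dx)])  # gradient at x^k + dx
    p = dot(dx, g0)                                # first-order Taylor term
    df = 0.5 * (dot(dx, g0) + dot(dx, g1))         # trapezium estimate of df_k (4.56)
    if abs(df - p) < 1e-12:          # heuristic (iii): locally linear, enforce theta = 1
        theta, tau_new = 1.0, 1.5 * tau
    else:
        theta = -p / (df - p)        # non-trivial root from the proof of Lemma 2
        if df < 0 and theta > 0:     # (i): descent and locally strictly convex
            tau_new = 1.5 * tau
        elif df < 0 and theta < 0:   # (ii): locally concave; force a descent step
            theta, tau_new = -theta, 1.5 * tau
        elif df > 0 and theta > 0:   # (iv)
            tau_new = 0.5 * tau
        else:                        # (v)
            tau_new = tau
    # midpoint rule (4.57): minimizer estimated halfway along theta*dx
    x_star = [xi + 0.5 * theta * di for xi, di in zip(x, dx)]
    return x_star, tau_new
```

On a quadratic such as f(x) = ½‖x‖² the step is exact: the estimated minimizer along the line coincides with the true one, as Lemma 2 asserts.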

4.5.4 Conjugate gradient search directions and SUMT

The search vectors used here correspond to the conjugate gradient directions (Bazaraa et al., 1993). In particular, for k = 0, 1, ..., the search vectors v^(k+1) are taken as the usual conjugate gradient directions s^(k+1), generated by

s^(k+1) = −∇f(x^k) + β_(k+1) s^k    (4.59)

with β_1 = 0 and, for k > 0,

β_(k+1) = ||∇f(x^k)||²/||∇f(x^(k−1))||²    (4.60)


for the Fletcher-Reeves implementation, and for the Polak-Ribiere version:

β_(k+1) = ⟨∇f(x^k), ∇f(x^k) − ∇f(x^(k−1))⟩/||∇f(x^(k−1))||².    (4.61)

As recommended by Fletcher (1987), the conjugate gradient algorithm is restarted in the direction of steepest descent when k > n.
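The two update formulas and the restart rule can be sketched as follows (function and argument names are illustrative, not from the original text):

```python
def cg_direction(g_new, g_old, s_old, k, n, variant="fletcher-reeves"):
    """Conjugate gradient search direction with restart.
    g_new, g_old : gradients at x^k and x^(k-1)
    s_old        : previous direction s^k (None on the first iteration)
    k, n         : iteration count and problem dimension
    """
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    if s_old is None or k > n:        # restart in the steepest descent direction
        return [-g for g in g_new]
    if variant == "fletcher-reeves":  # (4.60)
        beta = dot(g_new, g_new) / dot(g_old, g_old)
    else:                             # Polak-Ribiere variant
        y = [a - b for a, b in zip(g_new, g_old)]
        beta = dot(g_new, y) / dot(g_old, g_old)
    # standard conjugate gradient recursion for the new direction
    return [-g + beta * s for g, s in zip(g_new, s_old)]
```

Note that for a quadratic function the two variants coincide; they differ, and the restart matters, only in the general non-quadratic case.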

For the constrained problem the unconstrained minimization is of course applied to successive penalty function formulations P(x) of the form shown in (4.52), using the well-known Sequential Unconstrained Minimization Technique (SUMT) (Fiacco and McCormick, 1968). In SUMT, for j = 1, 2, ..., until convergence, successive unconstrained minimizations are performed on successive penalty functions P(x) = P(x, μ^(j)), in which the overall penalty parameter μ^(j) is successively increased: μ^(j+1) := 10μ^(j). The corresponding initial step size parameter is set at τ = 0.5/(jμ^(j)) for each sub-problem j. This application of ETOP to the constrained problem, via the unconstrained minimization of successive penalty functions, is referred to as the ETOPC algorithm. In practice, if analytical expressions for the components of the gradient of the objective function are not available, they may be calculated with sufficient accuracy by finite differences. However, when the presence of severe noise is suspected, the application of the gradient-only search method with conjugate gradient search directions requires that central finite difference approximations to the gradients be used, in order to effectively smooth out the noise. In this case relatively large perturbations δx_i in x_i must be used, which in practice may typically be of the order of 0.1 times the range of interest!
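The SUMT outer loop just described can be sketched as follows. This is a hypothetical skeleton: the inner unconstrained minimizer, the penalty construction and all names are stand-ins, and the denominator guard in the termination test is our own addition.

```python
def sumt(minimize, penalty, f_obj, x0, mu0=1.0, eps_f=1e-8, max_outer=15):
    """Sketch of the SUMT outer loop.
    minimize(P, x, tau) : any unconstrained minimizer started at x
    penalty(mu)         : builds the penalty function P(x) = P(x, mu)
    f_obj               : underlying objective, used in the termination test
    """
    x, mu, f_prev = x0, mu0, None
    for j in range(1, max_outer + 1):
        P = penalty(mu)
        tau = 0.5 / (j * mu)          # initial step-size parameter per sub-problem
        x = minimize(P, x, tau)
        f = f_obj(x)
        # terminate when the relative change in the objective between
        # successive SUMT solutions is small
        if f_prev is not None and abs(f - f_prev) <= eps_f * (abs(f_prev) + 1.0):
            break
        f_prev, mu = f, 10.0 * mu     # mu^(j+1) := 10 mu^(j)
    return x
```

As a usage example, minimizing x² subject to x ≥ 1 with the quadratic penalty μ·max(0, 1 − x)² drives the sub-problem solutions x_j = μ/(1 + μ) toward the constrained minimizer x* = 1 as μ grows.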

In the application of ETOPC a limit Δ_m is in practice set on the maximum allowable magnitude of the step Δ* = x*^(k+1) − x^k. If ||Δ*|| is greater than Δ_m, then set

x^(k+1) := x^k + Δ_m Δ*/||Δ*||    (4.62)

and restart the conjugate gradient procedure, with x^0 := x^(k+1), in the direction of steepest descent. If the maximum allowable step is taken n times in succession, then Δ_m is doubled.


4.5.5 Numerical results

The proposed new implementation of the conjugate gradient method (both the Fletcher-Reeves and Polak-Ribiere versions) is tested here using 40 different problems arbitrarily selected from the famous set of test problems of Hock and Schittkowski (1981). The problem numbers (Pr. #) in the tables correspond to the numbering used in Hock and Schittkowski. The final test problem, (12-poly), is the 12-polytope problem of Svanberg (1995, 1999). The number of variables (n) of the test problems ranges from 2 to 21, and the number of constraints (m plus r) per problem from 1 to 59. The termination criteria for the ETOPC algorithm are as follows:

(i) Convergence tolerances for successive approximate sub-problems within SUMT: ε_g for convergence on the norm of the gradient vector, i.e. terminate if ||∇P(x*^(k+1))|| < ε_g, and ε_x for convergence on the average change of the design vector, i.e. terminate if ½||x*^(k+1) − x*^(k−1)|| < ε_x.

(ii) Termination of the SUMT procedure occurs if the absolute value of the relative difference between the objective function values at the solution points of successive SUMT problems is less than ε_f.

4.5.5.1 Results for smooth functions with no noise

For the initial tests no noise is introduced. For high accuracy requirements (relative error in the optimum objective function value to be less than 10^−8), it is found that the proposed new conjugate gradient implementation performs as robustly as, and more economically than, the traditional penalty function implementation, FMIN, of Kraft and Lootsma reported in Hock and Schittkowski (1981). The detailed results are as tabulated in Table 4.4. Unless otherwise indicated the algorithm settings are Δ_m = 1.0, μ^(0) = 1.0 and iout = 15, with small convergence tolerances ε_g, ε_x and ε_f, where iout denotes the maximum number of SUMT iterations allowed. The number of gradient vector evaluations required by ETOPC for the different problems is denoted by n_ge (note that the number of explicit function evaluations is zero), and the relative error in function value at


convergence to the point x^c is denoted by r_f, which is computed from

r_f = |f(x*) − f(x^c)|/(1 + |f(x*)|).    (4.63)

For the FMIN algorithm only the number of explicit objective function evaluations n_fe is listed, together with the relative error r_f at convergence. The latter method requires, in addition to the number of function evaluations listed, a comparable number of gradient vector evaluations, which is not given here (see Hock and Schittkowski, 1981).

4.5.5.2 Results for severe noise introduced in the objective function

Following the successful implementation for the test problems with no noise, all the tests were rerun, but with severe relative random noise introduced in the objective function f(x), and all gradient components computed by central finite differences. The influence of noise is investigated for two cases, namely for a variation of the superimposed uniformly distributed random noise as large as (i) 5% and (ii) 10% of (1 + |f(x*)|), where x* is the optimum of the underlying smooth problem. The detailed results are shown in Table 4.5. The results are listed only for the Fletcher-Reeves version; the results for the Polak-Ribiere implementation are almost identical. Unless otherwise indicated the algorithm settings are δx_i = 1.0, Δ_m = 1.0, μ^(0) = 1.0 and iout = 6, with small tolerances ε_g and ε_f, where iout denotes the maximum number of SUMT iterations allowed. For termination of a sub-problem on step size, ε_x was set to ε_x := 0.005√n for the initial sub-problem; thereafter it is successively halved for each subsequent sub-problem.

The results obtained are surprisingly good with, in most cases, fast convergence to the neighbourhood of the known optimum of the underlying smooth problem. In 90% of the cases regional convergence was obtained with relative errors r_x < 0.025 for 5% noise and r_x < 0.05 for 10% noise, where

r_x = ||x* − x^c||/(||x*|| + 1)    (4.64)

and x^c denotes the point of convergence. Also in 90% of the test problems the respective relative errors in final objective function values were r_f < 0.025 for 5% noise and r_f < 0.05 for 10% noise, where r_f is as defined in (4.63).


[Table 4.4 data not reproduced here. For each test problem the table lists, for ETOPC (Fletcher-Reeves and Polak-Ribiere versions), the number of gradient vector evaluations n_ge and the relative error r_f, and for FMIN the number of function evaluations n_fe and r_f; most r_f values are of order 10^−8 or smaller, and FMIN fails on three of the problems. Footnotes to the table: [1] constraint qualification not satisfied; [2] termination on maximum number of steps; [3] convergence to local minimum; [6] μ^(0) = 10²; [7] gradients by central finite differences; a further footnote sets μ^(0) = 1.0 and Δ_m = 1.0.]

Table 4.4: The respective performances of the new conjugate gradient implementation ETOPC and FMIN for test problems with no noise introduced


[Table 4.5 data not reproduced here. For each test problem the table lists the performance of ETOPC (Fletcher-Reeves version) under 5% and 10% superimposed noise.]

Table 4.5: Performance of ETOPC for test problems with severe noise introduced


4.5.6 Conclusion

The ETOPC algorithm performs exceptionally well, for a first order method, in solving constrained problems where the functions are smooth. For these problems the gradient-only penalty function implementation of the conjugate gradient method performs as well as, if not better than, the best conventional implementations reported in the literature in producing highly accurate solutions.

In the cases where severe noise is introduced in the objective function, relatively fast convergence to the neighborhood of x*, the solution of the underlying smooth problem, is obtained. Of interest is the fact that, with the reduced accuracy requirement associated with the presence of noise, the number of function evaluations required to obtain sufficiently accurate solutions in the case of noise is on average much less than that necessary for the high accuracy solutions for smooth functions. As already stated, ETOPC yields in 90% of the cases regional convergence with relative errors r_x < 0.025 for 5% noise and r_x < 0.05 for 10% noise. Also in 90% of the test problems the respective relative errors in the final objective function values are r_f < 0.025 for 5% noise and r_f < 0.05 for 10% noise. In the other 10% of the cases the relative errors are also acceptably small. These accuracies are more than sufficient for multidisciplinary design optimization problems where similar noise may be encountered.

4.6 Global optimization using dynamic search trajectories

4.6.1 Introduction

The problem of globally optimizing a real valued function is inherently intractable (unless hard restrictions are imposed on the objective function) in that no practically useful characterization of the global optimum is available. Indeed, the problem of determining an accurate estimate of the global optimum is mathematically ill-posed, in the sense that very similar objective functions may have global optima very distant from each other (Schoen, 1991). Nevertheless, the need in practice to find


a relatively low local minimum has resulted in considerable research over the last decade to develop algorithms that attempt to find such a low minimum, e.g. see Torn and Zilinskas (1989).

The general global optimization problem may be formulated as follows: given a real valued objective function f(x) defined on the set x ∈ D in R^n, find the point x* and the corresponding function value f* such that

f* = f(x*) = minimum{f(x) | x ∈ D}    (4.65)

if such a point x* exists. If the objective function and/or the feasible domain D are non-convex, then there may be many local minima which are not global.

If D corresponds to all of R^n, the optimization problem is unconstrained. Alternatively, simple bounds may be imposed, with D now corresponding to the hyper box (or domain or region of interest) defined by

D = {x : ℓ ≤ x ≤ u}    (4.66)

where ℓ and u are n-vectors defining the respective lower and upper bounds on x.

From a mathematical point of view, Problem (4.65) is essentially unsolvable, due to a lack of mathematical conditions characterizing the global optimum, as opposed to the local optimum of a smooth continuous function, which is characterized by the behavior of the problem functions (Hessians and gradients) at the minimum (Arora et al., 1995) (viz. the Karush-Kuhn-Tucker conditions). Therefore, the global optimum f* can only be obtained by an exhaustive search, except if the objective function satisfies certain subsidiary conditions (Griewank, 1981), which mostly are of limited practical use (Snyman and Fatti, 1987). Typically, the conditions are that f should satisfy a Lipschitz condition with known constant L and that the search area is bounded, e.g. for all x, x̄ ∈ X:

|f(x) − f(x̄)| ≤ L||x − x̄||.    (4.67)

So-called space-covering deterministic techniques have been developed (Dixon et al., 1975) under these special conditions. These techniques are expensive and, due to the need to know L, of limited practical use.

Global optimization algorithms are divided into two major classes (Dixon et al., 1975): deterministic and stochastic (from the Greek word stokhastikos,


i.e. 'governed by the laws of probability'). Deterministic methods can be used to determine the global optimum through exhaustive search. These methods are typically extremely expensive. With the introduction of a stochastic element into deterministic algorithms, the deterministic guar- antee that the global optimum can be found is relaxed into a confidence measure. Stochastic methods can be used to assess the probability of having obtained the global minimum. Stochastic ideas are mostly used for the development of stopping criteria, or to approximate the regions of attraction as used by some methods (Arora et al., 1995).

The stochastic algorithms presented herein, namely the Snyman-Fatti algorithm and the modified bouncing ball algorithm (Groenwold and Snyman, 2002), both depend on dynamic search trajectories to minimize the objective function. The respective trajectories, namely the motion of a particle of unit mass in an n-dimensional conservative force field, and the trajectory of a projectile in a conservative gravitational field, are modified to increase the likelihood of convergence to a low local minimum.

4.6.2 The Snyman-Fatti trajectory method

The essentials of the original SF algorithm (Snyman and Fatti, 1987), using dynamic search trajectories for unconstrained global minimization, will now be discussed. The algorithm is based on the local algorithms presented by Snyman (1982, 1983). For more details concerning the motivation of the method, its detailed construction, convergence theorems, computational aspects and some of the more obscure heuristics employed, the reader is referred to the original paper.

4.6.2.1 Dynamic trajectories

In the SF algorithm successive sample points x^j, j = 1, 2, ..., are selected at random from the box D defined by (4.66). For each sample point x^j, a sequence of trajectories T^i, i = 1, 2, ..., is computed by numerically


solving the successive initial value problems:

ẍ(t) = −∇f(x(t)),  x(0) = x₀^i,  ẋ(0) = ẋ₀^i.    (4.68)

This trajectory represents the motion of a particle of unit mass in an n-dimensional conservative force field, where the function to be minimized represents the potential energy.

Trajectory T^i is terminated when x(t) reaches a point where f(x(t)) is arbitrarily close to the value f(x₀^i) while moving "uphill", or more precisely, if x(t) satisfies the conditions

f(x(t)) > f(x₀^i) − ε  and  ẋ(t)ᵀ∇f(x(t)) > 0    (4.69)

where ε is an arbitrarily small prescribed positive value.

An argument is presented in Snyman and Fatti (1987) to show that when the level set {x | f(x) ≤ f(x₀^i)} is bounded and ∇f(x₀^i) ≠ 0, then conditions (4.69) above will be satisfied at some finite point in time.

Each computed step along trajectory T^i is monitored, so that at termination the point x_m^i at which the minimum value was achieved is recorded, together with the associated velocity ẋ_m^i and function value f_m^i. The values of x_m^i and ẋ_m^i are used to determine the initial values for the next trajectory T^(i+1). From a comparison of the minimum values, the best point x_b^i for the current j over all trajectories to date is also recorded. In more detail, the minimization procedure for a given sample point x^j, in computing the sequence x_b^i, i = 1, 2, ..., is as follows.

In the original paper (Snyman and Fatti, 1987) an argument is presented to indicate that, under normal conditions on the continuity of f and its derivatives, x_b^i will converge to a local minimum. Procedure MP1, for a given j, is accordingly terminated at step 3 of Algorithm 4.6 if ||∇f(x_b^i)|| ≤ ε for some small prescribed positive value ε, and x_b^i is taken as the local minimizer x̃^j, i.e. set x̃^j := x_b^i with corresponding function value f̃_j := f(x̃^j).

Reflecting on the overall approach outlined above, involving the computation of energy-conserving trajectories and the minimization procedure,


Algorithm 4.6 Minimization Procedure MP1

1. For a given sample point x^j, set x₀^1 := x^j and compute T^1 subject to ẋ₀^1 := 0; record x_m^1, ẋ_m^1 and f_m^1; set x_b^1 := x_m^1 and i := 2.

2. Compute trajectory T^i with x₀^i := ½(x₀^(i−1) + x_m^(i−1)) and ẋ₀^i := ½ẋ_m^(i−1); record x_m^i, ẋ_m^i and f_m^i.

3. If f_m^i < f(x_b^(i−1)) then x_b^i := x_m^i; else x_b^i := x_b^(i−1).

4. Set i := i + 1 and go to 2.

it should be evident that, in the presence of many local minima, the probability of convergence to a relatively low local minimum is increased. This is to be expected because, with a small value of ε (see conditions (4.69)), it is likely that the particle will move through a trough associated with a relatively high local minimum and move over a ridge to record a lower function value at a point beyond. Since we assume that the level set associated with the starting point is bounded, termination of the search trajectory will occur as the particle eventually moves to a region of higher function values.
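The building block of MP1, a single dynamic trajectory terminated by conditions (4.69), can be sketched numerically. The leapfrog integrator below is our own illustrative choice, not the integration scheme of the original algorithm, and all names are stand-ins:

```python
def sf_trajectory(grad, f, x0, v0, dt=0.01, eps=1e-4, max_steps=100000):
    """Integrate the particle motion x'' = -grad f(x) of the SF method and
    terminate by conditions (4.69).  Returns (x_m, v_m, f_m): the point of
    lowest f encountered on the trajectory, the velocity there, and f there."""
    x, v = list(x0), list(v0)
    f_start = f(x)
    x_m, v_m, f_m = list(x), list(v), f_start
    g = grad(x)
    for _ in range(max_steps):
        # leapfrog step in the conservative force field a = -grad f
        v_half = [vi - 0.5 * dt * gi for vi, gi in zip(v, g)]
        x = [xi + dt * vi for xi, vi in zip(x, v_half)]
        g = grad(x)
        v = [vi - 0.5 * dt * gi for vi, gi in zip(v_half, g)]
        fx = f(x)
        if fx < f_m:                       # record the best point so far
            x_m, v_m, f_m = list(x), list(v), fx
        uphill = sum(vi * gi for vi, gi in zip(v, g)) > 0
        if fx > f_start - eps and uphill:  # termination conditions (4.69)
            break
    return x_m, v_m, f_m
```

Released from rest on a quadratic bowl, the particle swings through the minimum, records a best point with f_m near zero, and terminates on the far side as it climbs back toward its starting energy level.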

4.6.3 The modified bouncing ball trajectory method

The essentials of the modified bouncing ball (MBB) algorithm using dynamic search trajectories for unconstrained global minimization are now presented. The algorithm is in an experimental stage, and details concerning the motivation of the method, its detailed construction, and computational aspects will be presented in the future.

4.6.3.1 Dynamic trajectories

In the MBB algorithm successive sample points x^j, j = 1, 2, ..., are selected at random from the box D defined by (4.66). For each sample point x^j, a sequence of trajectory steps Δx^i and associated projection points x^(i+1), i = 1, 2, ..., is computed from the successive analytical


relationships (with x^1 := x^j and prescribed V₀₁ > 0):

Δx^i = V₀ᵢ cos θᵢ tᵢ ∇f(x^i)/||∇f(x^i)||    (4.70)

where

θᵢ = tan⁻¹(||∇f(x^i)||) + π/2,    (4.71)

tᵢ = (1/g)[V₀ᵢ sin θᵢ + {(V₀ᵢ sin θᵢ)² + 2g h(x^i)}^(1/2)],    (4.72)

h(x^i) = f(x^i) + k,    (4.73)

with k a constant chosen such that h(x) > 0 for all x ∈ D, g a positive constant, and

x^(i+1) = x^i + Δx^i.    (4.74)

For the next step, select V₀ᵢ₊₁ < V₀ᵢ. Each step Δx^i represents the ground or horizontal displacement obtained by projecting a particle in a vertical gravitational field (constant g) at an elevation h(x^i) and speed V₀ᵢ at an inclination θᵢ. The angle θᵢ represents the angle that the outward normal n to the hypersurface represented by y = h(x) makes at x^i, in (n+1)-dimensional space, with the horizontal. The time of flight tᵢ is the time taken to reach the ground corresponding to y = 0.

More formally, the minimization trajectory for a given sample point x^j and some initial prescribed speed V₀₁ is obtained by computing the sequence x^i, i = 1, 2, ..., as follows.

Algorithm 4.7 Minimization Procedure MP2

1. For a given sample point x^j, set x^1 := x^j and compute trajectory step Δx^1 according to (4.70)-(4.73), subject to V₀₁ := V₀; record x^2 := x^1 + Δx^1, set i := 2 and V₀₂ := αV₀₁ (α < 1).

2. Compute Δx^i according to (4.70)-(4.73) to give x^(i+1) := x^i + Δx^i; record x^(i+1) and set V₀ᵢ₊₁ := αV₀ᵢ.

3. Set i := i + 1 and go to 2.
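A single MBB projection step might be sketched as below. The form of the displacement (4.70) used here, acting along the unit gradient direction with cos θᵢ ≤ 0 so that the step is a descent step, is our reading of the relationships above, and all names are illustrative; the sketch also assumes ∇f(x) ≠ 0.

```python
import math

def mbb_step(grad_f, h, g, x, v0):
    """One projection step of the bouncing-ball trajectory, following
    (4.70)-(4.74): project a particle at elevation h(x) with speed v0
    at inclination theta, and step by the resulting ground displacement."""
    gvec = grad_f(x)
    gnorm = math.sqrt(sum(gi * gi for gi in gvec))      # assumes gnorm > 0
    theta = math.atan(gnorm) + math.pi / 2.0            # (4.71)
    t = (v0 * math.sin(theta)
         + math.sqrt((v0 * math.sin(theta)) ** 2
                     + 2.0 * g * h(x))) / g             # time of flight (4.72)
    scale = v0 * math.cos(theta) * t / gnorm            # cos(theta) <= 0: descent
    dx = [scale * gi for gi in gvec]                    # (4.70)
    return [xi + di for xi, di in zip(x, dx)]           # (4.74)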

In the vicinity of a local minimum x̂, the sequence of projection points x^i, i = 1, 2, ..., constituting the search trajectory for starting point x^j

will converge, since Δx^i → 0 (see (4.70)). In the presence of many local


minima, the probability of convergence to a relatively low local minimum is increased, since the kinetic energy can only decrease for α < 1.

Procedure MP2, for a given j, is successfully terminated if ||∇f(x^i)|| ≤ ε for some small prescribed positive value ε, or when αV₀ᵢ < βV₀₁, and x^i is taken as the local minimizer x̃^j with corresponding function value

f̃_j := h(x̃^j) − k.

Clearly, the condition αV₀ᵢ < βV₀₁ will always occur for 0 < β < α and 0 < α < 1.

MP2 can be viewed as a variant of the steepest descent algorithm. However, as opposed to steepest descent, MP2 has (as has MP1) the ability for 'hill-climbing', as is inherent in the physical model on which MP2 is based (viz. the trajectories of a bouncing ball in a conservative gravitational field). Hence, the behavior of MP2 is quite different from that of steepest descent and furthermore, because of its physical basis, it tends to seek local minima with relatively low function values and is therefore suitable for implementation in global searches, while steepest descent is not.

For the MBB algorithm, convergence to a local minimum is not proven. Instead, the underlying physics of a bouncing ball is exploited. Unsuccessful trajectories are terminated and do not contribute to the probabilistic stopping criterion (although these points are included in the total number of sample points n̄). In the validation of the algorithm, the philosophy adopted here is that the practical demonstration of convergence of a proposed algorithm on a variety of demanding test problems may be as important and convincing as a rigorous mathematical convergence argument.

Indeed, although for the steepest descent method convergence can be proven, in practice it often fails to converge because effectively an infinite number of steps is required for convergence.

4.6.4 Global stopping criterion

The above methods require a termination rule for deciding when to end the sampling and to take the current overall minimum function value f̃,


i.e.

f̃ = minimum{f̃_j, over all j to date}    (4.75)

as the global minimum value f*.

Define the region of convergence of the dynamic methods for a local minimum x̂ as the set of all points x which, used as starting points for the above procedures, converge to x̂. One may reasonably expect, in the case where the regions of attraction (for the usual gradient-descent methods, see Dixon et al., 1976) of the local minima are more or less equal, that the region of convergence of the global minimum will be relatively increased.

Let R_k denote the region of convergence for the above minimization procedures MP1 and MP2 of local minimum x̂^k, and let α_k be the associated probability that a sample point be selected in R_k. The region of convergence and the associated probability for the global minimum x* are denoted by R* and α* respectively. The following basic assumption, which is probably true for many functions of practical interest, is now made.

BASIC ASSUMPTION:

α* ≥ α_k for all local minima x̂^k.    (4.76)

The following theorem may be proved.

4.6.4.1 Theorem (Snyman and Fatti, 1987)

Let r be the number of sample points falling within the region of convergence of the current overall minimum f̃ after n̄ points have been sampled. Then, under the above assumption and a statistically non-informative prior distribution, the probability that f̃ corresponds to f* may be obtained from

Pr[f̃ = f*] ≥ q(n̄, r) = 1 − (n̄ + 1)!(2n̄ − r)!/((2n̄ + 1)!(n̄ − r)!).    (4.77)

On the basis of this theorem the stopping rule becomes: STOP when Pr[f̃ = f*] ≥ q*, where q* is some prescribed desired confidence level, typically chosen as 0.99.
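The stopping rule of (4.77) is cheap to evaluate. A sketch, using log-gamma so that the factorials do not overflow for large n̄ (names are illustrative):

```python
from math import lgamma, exp

def confidence(n_bar, r):
    """q(n_bar, r) of equation (4.77).  lgamma(k + 1) = log(k!), so the
    factorial ratio is evaluated in log space and exponentiated once."""
    log_ratio = (lgamma(n_bar + 2) + lgamma(2 * n_bar - r + 1)
                 - lgamma(2 * n_bar + 2) - lgamma(n_bar - r + 1))
    return 1.0 - exp(log_ratio)

def should_stop(n_bar, r, q_star=0.99):
    """Stopping rule: STOP when Pr[f~ = f*] >= q*."""
    return confidence(n_bar, r) >= q_star
```

For example, if all n̄ = 6 sample points to date have converged to the current overall minimum (r = 6), the confidence already exceeds 0.99, whereas r = 3 out of n̄ = 6 does not yet satisfy the rule.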


Proof:

We present here an outline of the proof of (4.77), following closely the presentation in Snyman and Fatti (1987). (We have since learned that the proof can be shown to be a generalization of the procedure proposed by Zielinsky, 1981.) Given n̄ and α*, the probability that at least one point, n* ≥ 1, has converged to f* is

Pr[n* ≥ 1 | n̄] = 1 − (1 − α*)^n̄.    (4.78)

In the Bayesian approach, we characterize our uncertainty about the value of α* by specifying a prior probability distribution for it. This distribution is modified using the sample information (namely, n̄ and r) to form a posterior probability distribution. Let p*(α* | n̄, r) be the posterior probability distribution of α*. Then

Pr[n* ≥ 1 | n̄, r] = ∫₀¹ [1 − (1 − α*)^n̄] p*(α* | n̄, r) dα*.    (4.79)

Now, although the r sample points converge to the current overall minimum, we do not know whether this minimum corresponds to the global minimum f*. Utilizing (4.76), and noting that (1 − α)^n̄ is a decreasing function of α, the replacement of α* in the above integral by α yields

Pr[n* ≥ 1 | n̄, r] ≥ ∫₀¹ [1 − (1 − α)^n̄] p(α | n̄, r) dα.    (4.80)

Now, using Bayes' theorem we obtain

p(α | n̄, r) = p(r | α, n̄) p(α) / ∫₀¹ p(r | α, n̄) p(α) dα.    (4.81)

Since the n̄ points are sampled at random and each point has a probability α of converging to the current overall minimum, r has a binomial distribution with parameters α and n̄. Therefore

p(r | α, n̄) = (n̄ choose r) α^r (1 − α)^(n̄−r).    (4.82)

Substituting (4.82) and (4.81) into (4.80) gives:

Pr[n* ≥ 1 | n̄, r] ≥ 1 − ∫₀¹ (1 − α)^n̄ α^r (1 − α)^(n̄−r) p(α) dα / ∫₀¹ α^r (1 − α)^(n̄−r) p(α) dα.    (4.83)


A suitable flexible prior distribution p(α) for α is the beta distribution with parameters a and b. Hence,

p(α) = [1/β(a, b)] α^(a−1) (1 − α)^(b−1), 0 ≤ α ≤ 1.    (4.84)

Using this prior distribution gives:

Pr[n* ≥ 1 | n̄, r] ≥ 1 − β(r + a, 2n̄ − r + b)/β(r + a, n̄ − r + b).    (4.85)

Assuming a non-informative uniform prior (viz. a = b = 1), we obtain

Pr[n* ≥ 1 | n̄, r] ≥ 1 − (n̄ + 1)!(2n̄ − r)!/((2n̄ + 1)!(n̄ − r)!)

which is the required result.

4.6.5 Numerical results

No.  Name                     Ref.
1    Griewank G1              Torn and Zilinskas; Griewank
2    Griewank G2              Torn and Zilinskas; Griewank
3    Goldstein-Price          Torn and Zilinskas; Dixon and Szego
4    Six-hump Camelback       Torn and Zilinskas; Branin
5    Shubert, Levi No. 4      Lucidi and Piccioni
6    Branin                   Torn and Zilinskas; Branin and Hoo
7    Rastrigin                Torn and Zilinskas
8    Hartman 3                Torn and Zilinskas; Dixon and Szego
9    Hartman 6                Torn and Zilinskas; Dixon and Szego
10   Shekel 5                 Torn and Zilinskas; Dixon and Szego
11   Shekel 7 (S7, n = 4)     Torn and Zilinskas; Dixon and Szego
12   Shekel 10 (S10, n = 4)   Torn and Zilinskas; Dixon and Szego

Table 4.6: The test functions

The test functions used are tabulated in Table 4.6, and tabulated numerical results are presented in Tables 4.7 and 4.8. In the tables, the reported numbers of function values N_f are the averages of 10 independent (random) starts of each algorithm.


[Table 4.7 data not reproduced here. For each test function it lists the number of function evaluations N_f for the SF algorithm (this study), for the previously published SF results, and for the MBB algorithm, together with the best and worst ratios (r/n̄)_b and (r/n̄)_w observed; e.g. for G1, the SF algorithm of this study required N_f = 4199 with (r/n̄)_b = 6/40 and (r/n̄)_w = 6/75, against N_f = 1606 with r/n̄ = 6/20 previously.]

Table 4.7: Numerical results

Test function    BR   C6   GP    RA    SH    H3
TRUST            55   31   103   59    72    58
MBB              25   29   74    168   171   24

Table 4.8: Cost (N_f) using a priori stopping condition

Unless otherwise stated, the following settings were used in the SF algorithm (see Snyman and Fatti, 1987): γ = 2.0, α = 0.95, ω = 10⁻², δ = 0.0, q* = 0.99 and Δt = 1.0, with a small prescribed convergence tolerance ε. For the MBB algorithm, α = 0.99, q* = 0.99 and a small ε were used. For each problem, the initial velocity V₀₁ was chosen such that Δx¹ was equal to half the 'radius' of the domain D. A local search strategy was implemented with varying α in the vicinity of local minima.

In Table 4.7, (r/n̄)_b and (r/n̄)_w respectively indicate the best and worst r/n̄ ratios (see equation (4.77)) observed during 10 independent optimization runs of both algorithms. The SF results compare well with the previously published results of Snyman and Fatti, who reported values for a single run only. For the Shubert, Branin and Rastrigin functions, the MBB algorithm is superior to the SF algorithm. For the Shekel functions (S5, S7 and S10), the SF algorithm is superior. As a result of the stopping criterion (4.77), the SF and MBB algorithms found the


global optimum between 4 and 6 times for each problem.

The results for the trying Griewank functions (Table 4.7) are encouraging. G1 has some 500 local minima in the region of interest, and G2 several thousand. The values used for the parameters are as specified, with Δt = 5.0 for G1 and G2 in the SF algorithm. It appears that both the SF and MBB algorithms are highly effective for problems with a large number of local minima in D, and for problems with a large number of design variables.

In Table 4.8 the MBB algorithm is compared with the recently published deterministic TRUST algorithm (Barhen et al., 1997). Since the TRUST algorithm was terminated when the global approximation was within a specified tolerance of the (known) global optimum, a similar criterion was used for the MBB algorithm. The table reveals that the two algorithms compare well. Note however that the highest dimension of the test problems used in Barhen et al. (1997) is 3. It is unclear if the deterministic TRUST algorithm will perform well for problems of large dimension, or problems with a large number of local minima in D.

In conclusion, the numerical results indicate that both the Snyman-Fatti trajectory method and the modified bouncing ball trajectory method are effective in finding the global optimum efficiently. In particular, the results for the trying Griewank functions are encouraging. Both algorithms appear effective for problems with a large number of local minima in the domain, and for problems with a large number of design variables. A salient feature of the algorithms is the availability of an apparently effective global stopping criterion.


Chapter 5

EXAMPLE PROBLEMS

5.1 Introductory examples

Problem 5.1.1

Sketch the geometrical solution to the optimization problem:

minimize f(x) = 2x2 − x1

subject to g1(x) = x1² + 4x2² − 16 ≤ 0, g2(x) = (x1 − 3)² + (x2 − 3)² − 9 ≤ 0

and x1 ≥ 0 and x2 ≥ 0.

In particular sketch the contours of the objective function and the constraint curves. Indicate the feasible region and the position of the optimum x* and the active constraint(s).

The solution to this problem is indicated in Figure 5.1.

Problem 5.1.2

Consider the function f(x) = 100(x2 − x1²)² + (1 − x1)².

(i) Compute the gradient vector and the Hessian matrix.

(ii) Show that ∇f(x) = 0 if x* = [1, 1]T and that x* is indeed a strong local minimum.


Figure 5.1: Solution to Problem 5.1.1

(iii) Is f (x) a convex function? Justify your answer.

Solution

(ii) V f (x) = 0 implies that


From (5.2) x2 = x1², and substituting into (5.1) yields

giving x1 = 1 and x2 = 1 as the unique solution.

Therefore at x* = [1, 1]T:

H(x*) = [[802, −400], [−400, 200]]

and since the principal minors a1 = 802 > 0, and a2 = det H(x*) = 400 > 0, it follows by Sylvester's theorem that H(x) is positive-definite at x* (see the different equivalent definitions of positive definiteness that can be checked for, for example, in Fletcher (1987)). Thus x* = [1, 1]T is a strong local minimum, since the necessary and sufficient conditions in (1.24) are both satisfied at x*.

(iii) By inspection, if x1 = 0 and x2 = 1 then H = [[−398, 0], [0, 200]], and the determinant of the Hessian matrix is less than zero, so H is not positive-definite at this point. Since by Theorem 6.1.2, f(x) is convex over a set X if and only if H(x) is positive semi-definite for all x in X, it follows that f(x) is not convex in this case.
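Sylvester's check used in (ii) and (iii) above is easy to script. The sketch below (an illustration, not part of the text) evaluates the leading principal minors of the Rosenbrock Hessian at the two points considered:

```python
def rosenbrock_hessian(x1, x2):
    # Hessian of f = 100*(x2 - x1**2)**2 + (1 - x1)**2
    return [[1200*x1**2 - 400*x2 + 2, -400*x1],
            [-400*x1, 200]]

def leading_minors_2x2(H):
    # a1 and a2 = det H, the leading principal minors
    return H[0][0], H[0][0]*H[1][1] - H[0][1]*H[1][0]

print(leading_minors_2x2(rosenbrock_hessian(1, 1)))  # (802, 400): positive-definite
print(leading_minors_2x2(rosenbrock_hessian(0, 1)))  # a1 = -398: not positive-definite
```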

Problem 5.1.3

Determine whether or not the following function is convex:

Solution

The function f (x) is convex if and only if the Hessian matrix H(x) is positive semi-definite at every point x.

The principal minors are:

a1 = 8 > 0, a2 = 12 > 0 and a3 = |H| = 114 > 0.


Thus by Sylvester's theorem H(x) is positive-definite, and f(x) is therefore convex.

Problem 5.1.4

Determine all the stationary points of

Classify each point according to whether it corresponds to a maximum, a minimum or a saddle point.

Solution

The first order necessary condition for a stationary point is that

from which it follows that 5x2(x1 − 1) = 0.

Therefore either x2 = 0 or x1 = 1, which respectively give:

Therefore the stationary points are: (0; 0), (2; 0), (1; 1), (1; −1).

The nature of these stationary points may be determined by substituting their coordinates into the Hessian matrix H(x) and applying Sylvester's theorem.

The results are listed below.


Point     Minors            Nature of H          Type
(0; 0)    a1 < 0, a2 > 0    negative-definite    maximum
(2; 0)    a1 > 0, a2 > 0    positive-definite    minimum
(1; 1)    a1 = 0, a2 < 0    indefinite           saddle
(1; −1)   a1 = 0, a2 < 0    indefinite           saddle

Problem 5.1.5

Characterize the stationary points of f(x) = x1³ + x2³ + 2x1² + 4x2² + 6.

Solution

First determine the gradient vector ∇f(x) = [3x1² + 4x1, 3x2² + 8x2]T and consider ∇f(x) = 0.

The solutions are: (0, 0); (0, −8/3); (−4/3, 0); (−4/3, −8/3).

To determine the nature of the stationary points, substitute their coordinates into the Hessian matrix H(x) = diag(6x1 + 4, 6x2 + 8) and study the principal minors a1 and a2 for each point:


Point x        a1    a2   Nature of H(x)       Type      f(x)
(0, 0)          4    32   positive-definite    minimum    6
(0, −8/3)       4   −32   indefinite           saddle    15.48
(−4/3, 0)      −4   −32   indefinite           saddle     7.18
(−4/3, −8/3)   −4    32   negative-definite    maximum   16.66
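The classification in the table can be verified with a short numerical sketch (an illustration, not from the text), using the gradient and the diagonal Hessian H(x) = diag(6x1 + 4, 6x2 + 8):

```python
def f(x1, x2):
    return x1**3 + x2**3 + 2*x1**2 + 4*x2**2 + 6

def grad(x1, x2):
    return (3*x1**2 + 4*x1, 3*x2**2 + 8*x2)

for p in [(0, 0), (0, -8/3), (-4/3, 0), (-4/3, -8/3)]:
    a1 = 6*p[0] + 4                       # first principal minor of H
    a2 = a1*(6*p[1] + 8)                  # det H (H is diagonal)
    print(p, a1, a2, round(f(*p), 2))
```

Each printed line should reproduce a row of the table (the gradient vanishes at each point up to rounding).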

Problem 5.1.6

Minimize f(x) = x1 − x2 + 2x1² + 2x1x2 + x2² by means of the basic Newton method. Use as initial estimate for the minimizer x0 = [0, 0]T.

Solution

∇f(x) = [1 + 4x1 + 2x2, −1 + 2x1 + 2x2]T and H(x) = [[4, 2], [2, 2]].

First Newton iteration:

Therefore, since H is positive-definite for all x, the sufficient conditions (1.24) for a strong local minimum at x1 are satisfied, and the global minimum is x* = x1 = [−1, 1.5]T.
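As a quick numerical sketch (an illustration, not from the text), the single Newton iteration above can be reproduced directly; the gradient and the constant Hessian H = [[4, 2], [2, 2]] follow from the expressions in the solution:

```python
def grad(x):
    # gradient of f = x1 - x2 + 2*x1**2 + 2*x1*x2 + x2**2
    return [1 + 4*x[0] + 2*x[1], -1 + 2*x[0] + 2*x[1]]

def newton_step_2x2(x, g, H):
    # One Newton iteration: solve H*d = g by Cramer's rule, return x - d
    det = H[0][0]*H[1][1] - H[0][1]*H[1][0]
    d0 = (H[1][1]*g[0] - H[0][1]*g[1]) / det
    d1 = (H[0][0]*g[1] - H[1][0]*g[0]) / det
    return [x[0] - d0, x[1] - d1]

H = [[4, 2], [2, 2]]                      # constant Hessian of this quadratic
x1 = newton_step_2x2([0.0, 0.0], grad([0.0, 0.0]), H)
print(x1, grad(x1))                       # [-1.0, 1.5] with zero gradient
```

Because f is quadratic, a single Newton step lands exactly on the stationary point.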

Problem 5.1.7

Minimize f(x) = 3x1² − 2x1x2 + x2² + x1 by means of the basic Newton method using x0 = [1, 1]T.


First Newton iteration:

With ∇f(x1) = 0 and H positive-definite, the necessary and sufficient conditions (1.24) are satisfied at x1 and therefore the global minimum is given by

x* = [−1/4, −1/4]T and f(x*) = −0.125.

5.2 Line search descent methods

Problem 5.2.1

Minimize F(λ) = −λ cos λ over the interval [0, π/2] by means of the golden section method.

Solution

The golden ratio is r = 0.618 and F(0) = F(π/2) = 0.

Figure 5.2


The first two interior points are: rL0 = 0.9707 and r²L0 = 0.5999, with function values F(0.9707) = −0.5482 and F(0.5999) = −0.4951.

Next new point = 0.5999 + rL1 = 1.200 with F(1.200) = −0.4350.

L1 = 0.9707; points 0.5999, 0.9707, 1.200, π/2 with F = −0.4951, −0.5482, −0.4350.

Figure 5.3

Next new point = 0.5999 + r²L2 = 0.8292 with F(0.8292) = −0.5601.

L2 = 0.6000; points 0.5999, 0.8292, 0.9708, 1.200 with F = −0.4951, −0.5601, −0.5482, −0.4350.

Figure 5.4

Next new point = 0.5999 + r²L3 = 0.7416 with F(0.7416) = −0.5468.

L3 = 0.3708; points 0.6, 0.7416, 0.8292, 0.9708 with F = −0.4951, −0.5468, −0.5601, −0.5482.

Figure 5.5

Next new point = 0.7416 + rL4 = 0.8832 with F(0.8832) = −0.5606.

L4 = 0.2292; points 0.7416, 0.8292, 0.8832, 0.9708 with F = −0.5468, −0.5601, −0.5606, −0.5482.

Figure 5.6

The uncertainty interval is now [0.8292; 0.9708]. Stopping here gives λ* ≅ 0.9 (the midpoint of the interval) with F(λ*) ≅ −0.5594, which is taken as the approximate minimum after only 7 function evaluations (indeed the actual λ* = 0.8603 gives F(λ*) = −0.5611).
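The procedure above can be sketched in a few lines of code (an illustrative implementation with an assumed stopping tolerance, not the book's algorithm listing):

```python
import math

def golden_section(F, a, b, tol=1e-5):
    """Golden section search for the minimum of a unimodal F on [a, b]."""
    r = (math.sqrt(5) - 1) / 2            # golden ratio ~ 0.618
    x1, x2 = b - r*(b - a), a + r*(b - a)
    f1, f2 = F(x1), F(x2)
    while b - a > tol:
        if f1 < f2:                       # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r*(b - a)
            f1 = F(x1)
        else:                             # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r*(b - a)
            f2 = F(x2)
    return (a + b) / 2

lam = golden_section(lambda l: -l*math.cos(l), 0.0, math.pi/2)
print(round(lam, 4))                      # ~ 0.8603
```

Only one new function evaluation is needed per iteration, since one interior point is reused when the interval shrinks.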


Problem 5.2.2 Minimize the function F(λ) = (λ − 1)(λ + 1)², where λ is a single real variable, by means of Powell's quadratic interpolation method. Choose λ0 = 0 and use h = 0.1. Perform only two iterations.

Solution

Set up the difference table:

The turning point λm is given by λm = (F[ , , ](λ0 + λ1) − F[ , ]) / (2F[ , , ]).

First iteration:

λm = (1.3(0.1) + 0.89) / (2(1.3)) = 0.392

Second iteration:
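The iteration can be sketched as follows (an illustration, not the book's listing; the rule used for discarding old points, keeping the three points closest to the newest turning point, is an assumption). The first computed turning point reproduces the 0.392 obtained above, and the iterates approach the true minimizer λ = 1/3:

```python
def F(lam):
    return (lam - 1)*(lam + 1)**2

def turning_point(l0, l1, l2):
    """Minimum of the interpolating quadratic through three points,
    expressed with divided differences as in Powell's method."""
    d1 = (F(l1) - F(l0)) / (l1 - l0)
    d2 = ((F(l2) - F(l1)) / (l2 - l1) - d1) / (l2 - l0)
    return (d2*(l0 + l1) - d1) / (2*d2)

pts = [0.0, 0.1, 0.2]
for _ in range(3):
    lm = turning_point(*pts)
    # Replacement rule (an assumption): keep the three points closest
    # to the newest turning point
    pts = sorted(sorted(pts + [lm], key=lambda p: abs(p - lm))[:3])
print(round(lm, 4))                       # close to the true minimizer 1/3
```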

Problem 5.2.3 Apply two steps of the steepest descent method to the minimization of f(x) = x1 − x2 + 2x1² + 2x1x2 + x2². Use as starting point x0 = [0, 0]T.

Solution

∇f(x) = [1 + 4x1 + 2x2, −1 + 2x1 + 2x2]T

Step 1

x0 = [0, 0]T and the first steepest descent direction is u1 = −∇f(x0) = [−1, 1]T.


Here it is convenient not to normalize.

The minimizer along the direction u1 is given by x1 = x0 + λ1u1.

To find λ1, minimize with respect to λ the one-variable function:

F(λ) = f(x0 + λu1) = λ² − 2λ.

The necessary condition for a minimum is dF/dλ = 2λ − 2 = 0, giving λ1 = 1, and with d²F/dλ²(λ1) = 2 > 0, λ1 indeed corresponds to the minimum of F(λ), and x1 = [−1, 1]T.

Step 2

The next steepest descent direction is

u2 = −∇f(x1) = [1, 1]T

and x2 = x1 + λ2u2.

To minimize in the direction of u2 consider

F(λ) = f(x1 + λu2) = (−1 + λ) − (1 + λ) + 2(λ − 1)² + 2(λ − 1)(λ + 1) + (λ + 1)²

and apply the necessary condition dF/dλ = 10λ − 2 = 0. This gives λ2 = 1/5, and with d²F/dλ²(λ2) = 10 > 0 (a minimum) it follows that x2 = [−4/5, 6/5]T.
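The two steps can be checked with a short sketch (an illustration, not from the text); since f is quadratic with Hessian H = [[4, 2], [2, 2]], the exact line-search step along u = −g is λ = (gᵀg)/(gᵀHg):

```python
def grad(x):
    # gradient of f = x1 - x2 + 2*x1**2 + 2*x1*x2 + x2**2
    return [1 + 4*x[0] + 2*x[1], -1 + 2*x[0] + 2*x[1]]

H = [[4, 2], [2, 2]]                      # constant Hessian of f

def exact_step(g):
    # Exact minimizing step along u = -g for a quadratic: (g.g)/(g^T H g)
    Hg = [H[0][0]*g[0] + H[0][1]*g[1], H[1][0]*g[0] + H[1][1]*g[1]]
    return (g[0]*g[0] + g[1]*g[1]) / (g[0]*Hg[0] + g[1]*Hg[1])

x = [0.0, 0.0]
for _ in range(2):
    g = grad(x)
    lam = exact_step(g)
    x = [x[0] - lam*g[0], x[1] - lam*g[1]]
    print(x)                              # [-1.0, 1.0], then [-0.8, 1.2]
```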

Problem 5.2.4


Apply two steps of the steepest descent method to the minimization of f(x) = (2x1 − x2)² + (x2 + 1)² with x0 = [5/2, 2]T.

Solution

Step 1

∇f(x) = [8x1 − 4x2, −4x1 + 4x2 + 2]T, giving g0 = ∇f(x0) = [12, 0]T.

After normalizing, the first search direction is u1 = [−1, 0]T and x1 = x0 + λu1 = [5/2 − λ, 2]T. Now minimize with respect to λ:

F(λ) = (3 − 2λ)² + 9.

The necessary condition dF/dλ = 0 gives λ1 = 3/2, and with d²F/dλ²(λ1) > 0 ensuring a minimum, it follows that x1 = [1, 2]T.

Step 2

New normalized steepest descent direction: u2 = [0, −1]T, thus x2 = x1 + λu2 = [1, 2 − λ]T.

In the direction of u2: F(λ) = λ² + (3 − λ)². Setting dF/dλ = 2λ − 2(3 − λ) = 0 gives λ2 = 3/2, and since d²F/dλ²(λ2) > 0 a minimum is ensured, and therefore x2 = [1, 1/2]T.


Problem 5.2.5

Apply the Fletcher-Reeves method to the minimization of

f(x) = (2x1 − x2)² + (x2 + 1)² with x0 = [5/2, 2]T.

Solution

The first step is identical to that of the steepest descent method given for Problem 5.2.4 above, with u1 = −g0 = [−12, 0]T.

For the second step the Fletcher-Reeves search direction becomes

u2 = −g1 + (g1ᵀg1 / g0ᵀg0) u1.

Using the data from the first step in Problem 5.2.4, the second search direction becomes

u2 = [0, −6]T + (36/144)[−12, 0]T = [−3, −6]T

and x2 = x1 + λ2u2 = [1 − 3λ2, 2 − 6λ2]T.

In the direction u2:

F(λ) = (2(1 − 3λ) − 2 + 6λ)² + (2 − 6λ + 1)² = (3 − 6λ)²

and the necessary condition for a minimum is

dF/dλ = −12(3 − 6λ) = 0, giving λ2 = 1/2,

and thus with d²F/dλ² = 72 > 0:

x2 = [−1/2, −1]T with g2 = ∇f(x2) = [0, 0]T.


Since g2 = 0 and H = [[8, −4], [−4, 4]] is positive-definite, x2 = x*.

Problem 5.2.6

Minimize f(x) = x1 − x2 + 2x1² + 2x1x2 + x2² by using the Fletcher-Reeves method. Use as starting point x0 = [0, 0]T.

Solution

Step 1

∇f(x) = [1 + 4x1 + 2x2, −1 + 2x1 + 2x2]T, x0 = [0, 0]T,

g0 = [1, −1]T, u1 = −g0 = [−1, 1]T and x1 = x0 + λ1u1 = [−λ1, λ1]T. Therefore

F(λ) = f(x0 + λu1) = −λ − λ + 2λ² − 2λ² + λ² = λ² − 2λ

and dF/dλ = 2λ − 2 = 0 giving λ1 = 1, and with d²F/dλ² = 2 > 0 ensuring a minimum, it follows that x1 = [−1, 1]T.

Step 2

g1 = [−1, −1]T, β1 = g1ᵀg1 / g0ᵀg0 = 2/2 = 1, and thus u2 = −g1 + β1u1 = [0, 2]T, giving x2 = x1 + λ2u2 = [−1, 1 + 2λ2]T.


This results in λ2 = 1/4 and x2 = [−1, 1.5]T with g2 = 0, and since H is positive-definite the optimum solution is x* = x2 = [−1, 1.5]T.
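The two Fletcher-Reeves steps can be reproduced with a short sketch (an illustration, not from the text); exact line searches are available in closed form because f is quadratic with Hessian H = [[4, 2], [2, 2]]:

```python
def grad(x):
    # gradient of f = x1 - x2 + 2*x1**2 + 2*x1*x2 + x2**2
    return [1 + 4*x[0] + 2*x[1], -1 + 2*x[0] + 2*x[1]]

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1]

H = [[4, 2], [2, 2]]                      # constant Hessian of this quadratic

x = [0.0, 0.0]
g = grad(x)
u = [-g[0], -g[1]]                        # first direction: steepest descent
for _ in range(2):
    Hu = [H[0][0]*u[0] + H[0][1]*u[1], H[1][0]*u[0] + H[1][1]*u[1]]
    lam = -dot(g, u) / dot(u, Hu)         # exact line search for a quadratic
    x = [x[0] + lam*u[0], x[1] + lam*u[1]]
    g_new = grad(x)
    beta = dot(g_new, g_new) / dot(g, g)  # Fletcher-Reeves beta
    u = [-g_new[0] + beta*u[0], -g_new[1] + beta*u[1]]
    g = g_new
print(x, g)                               # [-1.0, 1.5] with zero gradient
```

As expected for a conjugate gradient method on a two-dimensional quadratic, the minimum is reached in exactly two steps.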

Problem 5.2.7

Minimize f(x) = x1² − x1x2 + 3x2² with x0 = [1, 2]T by means of the Fletcher-Reeves method.

Solution

Step 1

Since ∇f(x) = [2x1 − x2, 6x2 − x1]T, g0 = [0, 11]T and u1 = −g0 = [0, −11]T, with x1 = x0 + λu1 = [1, 2 − 11λ]T.

This results in

F(λ) = 1 − (2 − 11λ) + 3(2 − 11λ)²

with dF/dλ = 11 + 6(2 − 11λ)(−11) = 0, giving λ1 = 1/6.

Thus, with d²F/dλ²(λ1) > 0: x1 = [1, 2]T + (1/6)[0, −11]T = [1, 1/6]T.

Step 2

g1 = [11/6, 0]T and thus β1 = g1ᵀg1 / g0ᵀg0 = (121/36)/121 = 1/36, giving

u2 = −g1 + β1u1 = [−11/6, −11/36]T.


Thus F(λ) = (11/12)(1 − (11/6)λ)², and dF/dλ = −(121/36)(1 − (11/6)λ) = 0 gives λ2 = 6/11, and with d²F/dλ²(λ2) > 0 this gives x2 = [0, 0]T. With g2 = 0, and since H is positive-definite for all x, this is the optimal solution.

Problem 5.2.8

Obtain the first updated matrix G1 when applying the DFP method to the minimization of f(x) = 4x1² − 40x1 + x2² − 12x2 + 136 with starting point x0 = [8, 9]T.

Solution

Factorizing f(x) gives f(x) = 4(x1 − 5)² + (x2 − 6)² and ∇f(x) = [8x1 − 40, 2x2 − 12]T.

Step 1

Choose G0 = I; then for x0 = [8, 9]T, g0 = [24, 6]T,

u1 = −g0 = [−24, −6]T and x1 = x0 + λ1u1 = [8 − 24λ, 9 − 6λ]T. The function to be minimized with respect to λ is F(λ) = f(x0 + λu1).

The necessary condition for a minimum, dF/dλ = −8(24)(3 − 24λ) + 2(−6)(3 − 6λ) = 0, yields λ1 = 0.1308, and thus with d²F/dλ² > 0:

x1 = [8 − 0.1308(24), 9 − 0.1308(6)]T = [4.862, 8.215]T with g1 = ∇f(x1) = [−1.10, 4.43]T.


The DFP update now requires the following quantities: v1 = x1 − x0 = [−3.14, −0.785]T and y1 = g1 − g0 = [−25.10, −1.57]T,

giving v1ᵀy1 = [−3.14, −0.785][−25.10, −1.57]T = 80.05

and y1ᵀy1 = [(−25.10)² + (−1.57)²] = 632.47,

to be substituted in the update formula (2.16):

Problem 5.2.9

Determine the first updated matrix G1 when applying the DFP method to the minimization of f(x) = 3x1² − 2x1x2 + x2² + x1 with x0 = [1, 1]T.

Solution

Step 1

g0 = [5, 0]T and G0 = I, which results in u1 = −g0 = [−5, 0]T and x1 = x0 + λu1 = [1 − 5λ, 1]T.


For the function F(λ) = 3(1 − 5λ)² − 2(1 − 5λ) + 1 + (1 − 5λ) the necessary condition for a minimum is dF/dλ = 6(1 − 5λ)(−5) + 10 − 5 = 0, which gives λ1 = 1/6, and since d²F/dλ²(λ1) > 0:

x1 = [1/6, 1]T.

It follows that v1 = x1 − x0 = [−5/6, 0]T, g1 = [0, 5/3]T and y1 = g1 − g0 = [−5, 5/3]T,

so that v1ᵀy1 = 25/6 and y1ᵀG0y1 = y1ᵀy1 = 25 + 25/9.

In the above computations G0 has been taken as G0 = I.

Substituting the above results into the update formula (2.16) yields G1 as


follows:

Comparing G1 with H⁻¹ = [[1/4, 1/4], [1/4, 3/4]] shows that after only one iteration a reasonable approximation to the inverse has already been obtained.
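The DFP update (2.16) itself takes only a few lines to code. The sketch below (an illustration, reusing the data of Problem 5.2.8 with the step length recomputed exactly) also checks the characteristic quasi-Newton property that the updated matrix satisfies G1y1 = v1:

```python
def dfp_update(G, v, y):
    """DFP update of the inverse-Hessian approximation (formula (2.16)):
    G+ = G + v v^T/(v.y) - (G y)(G y)^T/(y^T G y)."""
    Gy = [G[0][0]*y[0] + G[0][1]*y[1], G[1][0]*y[0] + G[1][1]*y[1]]
    vy = v[0]*y[0] + v[1]*y[1]
    yGy = y[0]*Gy[0] + y[1]*Gy[1]
    return [[G[i][j] + v[i]*v[j]/vy - Gy[i]*Gy[j]/yGy for j in range(2)]
            for i in range(2)]

# Data of Problem 5.2.8 (f = 4*(x1-5)**2 + (x2-6)**2, x0 = [8, 9]):
lam = 612/4680                            # exact first step length (~0.1308)
v = [-lam*24, -lam*6]                     # v1 = x1 - x0
x1 = [8 + v[0], 9 + v[1]]
y = [8*x1[0] - 40 - 24, 2*x1[1] - 12 - 6] # y1 = g1 - g0
G1 = dfp_update([[1.0, 0.0], [0.0, 1.0]], v, y)
# The update satisfies the quasi-Newton condition G1*y1 = v1:
print([G1[0][0]*y[0] + G1[0][1]*y[1], G1[1][0]*y[0] + G1[1][1]*y[1]])
print(v)
```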

Problem 5.2.10

Apply the DFP method to the minimization of f(x) = x1 − x2 + 2x1² + 2x1x2 + x2² with starting point x0 = [0, 0]T.

Solution

Step 1

∇f(x) = [1 + 4x1 + 2x2, −1 + 2x1 + 2x2]T, G0 = I and g0 = [1, −1]T, which gives

u1 = −G0g0 = [−1, 1]T and thus x1 = x0 + λu1 = [−λ, λ]T.

Thus F(λ) = λ² − 2λ, and dF/dλ = 0 yields λ1 = 1, and with d²F/dλ²(λ1) > 0:

x1 = [−1, 1]T and g1 = [−1, −1]T.


Now, using v1 = [−1, 1]T and y1 = g1 − g0 = [−2, 0]T

in the update formula (2.16), gives

v1v1ᵀ / (v1ᵀy1) = (1/2)[[1, −1], [−1, 1]], and since G0y1 = y1:

(G0y1)(G0y1)ᵀ / (y1ᵀG0y1) = (1/4)[[4, 0], [0, 0]] = [[1, 0], [0, 0]].

Substitute the above in (2.16):

G1 = [[1/2, −1/2], [−1/2, 3/2]].

Step 2

New search direction u2 = −G1g1 = [0, 1]T, and therefore x2 = x1 + λu2 = [−1, 1 + λ]T.

Minimizing F(λ) = −1 − (1 + λ) + 2 − 2(1 + λ) + (1 + λ)² implies

dF/dλ = −1 − 2 + 2(1 + λ) = 0, which gives λ2 = 1/2.

Thus, since d²F/dλ²(λ2) > 0, the minimum is given by x2 = [−1, 3/2]T with g2 = [0, 0]T, and therefore x2 is optimal.


5.3 Standard methods for constrained optimization

5.3.1 Penalty function problems

Problem 5.3.1.1

Determine the shortest distance from the origin to the plane x1 + x2 + x3 = 1 by means of the penalty function method.

Solution

Notice that the problem is equivalent to the problem:

minimize f(x) = x1² + x2² + x3² such that x1 + x2 + x3 = 1.

The appropriate penalty function is P = x1² + x2² + x3² + ρ(x1 + x2 + x3 − 1)².

The necessary conditions at an unconstrained minimum of P are 2xi + 2ρ(x1 + x2 + x3 − 1) = 0, i = 1, 2, 3.

Clearly x1 = x2 = x3, and it follows that x1 = −ρ(3x1 − 1), i.e.

x1(ρ) = ρ/(1 + 3ρ) and lim_{ρ→∞} x1(ρ) = 1/3.
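The limit process can be sketched numerically (an illustration, not from the text), using the closed form x1(ρ) = ρ/(1 + 3ρ) derived above; the distance then tends to 1/√3:

```python
import math

def x_rho(rho):
    # Unconstrained minimizer of P = x1**2 + x2**2 + x3**2
    #   + rho*(x1 + x2 + x3 - 1)**2 has x1 = x2 = x3 = rho/(1 + 3*rho)
    return rho / (1 + 3*rho)

for rho in (1, 10, 1000):
    x = x_rho(rho)
    print(rho, x, math.sqrt(3)*x)         # distance tends to 1/sqrt(3)
```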

Problem 5.3.1.2

Apply the penalty function method to the problem:

minimize f(x) = (x1 − 1)² + (x2 − 2)²

such that

h(x) = x2 − x1 − 1 = 0, g(x) = x1 + x2 − 2 ≤ 0, −x1 ≤ 0, −x2 ≤ 0.


Solution

The appropriate penalty function is

where ρ ≫ 0 and ρj = ρ if the corresponding inequality constraint is violated, otherwise ρj = 0.

Clearly the unconstrained minimum [1, 2]T violates the constraint g(x) ≤ 0; therefore assume that x is in the first quadrant but outside the feasible region, i.e. ρ1 = ρ. The penalty function then becomes

P = (x1 − 1)² + (x2 − 2)² + ρ(x2 − x1 − 1)² + ρ(x1 + x2 − 2)².

The necessary conditions at the unconstrained minimum of P are:

The first condition is x1(2 + 4ρ) − 2 − 2ρ = 0, from which it follows that

x1(ρ) = (2ρ + 2)/(4ρ + 2) and lim_{ρ→∞} x1(ρ) = 1/2.

The second condition is x2(2 + 4ρ) − 4 − 6ρ = 0, which gives

x2(ρ) = (6ρ + 4)/(4ρ + 2) and lim_{ρ→∞} x2(ρ) = 3/2.

The optimum is therefore x* = [1/2, 3/2]T.

Problem 5.3.1.3

Apply the penalty function method to the problem:

minimize f(x) = x1² + 2x2² such that g(x) = 1 − x1 − x2 ≤ 0.

Solution

The penalty function is P = x1² + 2x2² + ρ(1 − x1 − x2)². Again the unconstrained minimum clearly violates the constraint, therefore assume


the constraint is violated in the penalty function, i.e. ρ1 = ρ. The necessary conditions at an unconstrained minimum of P are 2x1 − 2ρ(1 − x1 − x2) = 0 and 4x2 − 2ρ(1 − x1 − x2) = 0.

These conditions give x1 = 2x2, and solving further yields

x2(ρ) = 2ρ/(4 + 6ρ), and thus lim_{ρ→∞} x2(ρ) = x2* = 1/3, with x1* = 2/3.

Problem 5.3.1.4

Minimize f(x) = 2x1² + x2² such that g(x) = 5 − x1 + 3x2 ≤ 0 by means of the penalty function approach.

Solution

The unconstrained solution, x1 = x2 = 0, violates the constraint; therefore it is active and P becomes P = 2x1² + x2² + ρ(5 − x1 + 3x2)², with the necessary conditions for an unconstrained minimum:

4x1 − 2ρ(5 − x1 + 3x2) = 0 and 2x2 + 6ρ(5 − x1 + 3x2) = 0.

It follows that x2 = −6x1, and substituting into the first condition yields

x1(ρ) = 10ρ/(4 + 38ρ) and lim_{ρ→∞} x1(ρ) = x1* = 10/38 = 0.2632. This gives x2* = −1.5789 with f(x*) = 2.6316.
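The limit can again be illustrated numerically (a sketch, not from the text), using the closed form x1(ρ) = 10ρ/(4 + 38ρ) and x2 = −6x1 derived from the stationarity conditions:

```python
def x1_of_rho(rho):
    # From the stationarity conditions: x2 = -6*x1 and
    # x1(rho) = 10*rho/(4 + 38*rho), which tends to 10/38 = 5/19
    return 10*rho / (4 + 38*rho)

for rho in (1, 100, 1e6):
    x1 = x1_of_rho(rho)
    x2 = -6*x1
    print(rho, round(x1, 4), round(x2, 4), round(2*x1**2 + x2**2, 4))
# x1 -> 0.2632, x2 -> -1.5789 and f -> 2.6316 as rho grows
```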

5.3.2 The Lagrangian method applied to equality constrained problems

Problem 5.3.2.1


Determine the minima and maxima of f(x) = x1x2, such that x1² + x2² = 1, by means of the Lagrangian method.

Solution

Here L = x1x2 + λ(x1² + x2² − 1) and therefore the stationary conditions are

x2 + 2λx1 = 0, x1 + 2λx2 = 0 and x1² + x2² = 1.

From the equality it follows that 1 = x1² + x2² = 4x1²λ² + 4x2²λ² = 4λ², giving λ = ±1/2.

Choosing λ = 1/2 gives x2 = −x1 and 2x1² = 1, resulting in the possibilities

x1 = 1/√2, x2 = −1/√2 and x1 = −1/√2, x2 = 1/√2, with f* = −1/2 (the minima).

Alternatively, choosing λ = −1/2 gives x2 = x1, 2x1² = 1, and x1 = ±1/√2, with the possibilities

x1 = x2 = ±1/√2, for which f* = 1/2 (the maxima).

These possibilities are sketched in Figure 5.7.
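The four stationary points can be confirmed with a quick numerical check (an illustration, not part of the text) by parametrizing the constraint circle:

```python
import math

# On the constraint circle x1 = cos t, x2 = sin t, the objective becomes
# f = cos t * sin t = (1/2) sin 2t, with extreme values +-1/2
n = 100000
vals = [math.cos(2*math.pi*k/n)*math.sin(2*math.pi*k/n) for k in range(n)]
print(round(min(vals), 4), round(max(vals), 4))   # -0.5 and 0.5
```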

Problem 5.3.2.2

Determine the dimensions, radius r and height h, of the solid cylinder of minimum total surface area which can be cast from a solid metallic sphere of radius r0.

Solution


Figure 5.7

This problem is equivalent to: minimize f(r, h) = 2πrh + 2πr² such that πr²h = (4/3)πr0³. The Lagrangian is L(r, h, λ) = 2πrh + 2πr² + λ(πr²h − (4/3)πr0³), and the necessary conditions for stationary points are ∂L/∂r = 0 and ∂L/∂h = 0, together with the equality constraint.

The second condition gives λπr² = −2πr, i.e. λ = −2/r, and substituting into the first condition yields h = 2r. Substituting this value into the equality gives πr²(2r) − (4/3)πr0³ = 0, i.e. r = (2/3)^(1/3) r0 with h = 2r.


Problem 5.3.2.3

Minimize f(x) = −2x1 − x2 − 10 such that h(x) = x1 − 2x2² − 3 = 0. Show whether or not the candidate point, obtained via the Lagrangian method, is indeed a constrained minimum.

Solution

The Lagrangian is L(x, λ) = −2x1 − x2 − 10 + λ(x1 − 2x2² − 3) and the necessary stationary conditions are

−2 + λ = 0, −1 − 4λx2 = 0 and x1 − 2x2² − 3 = 0.

Solving gives the candidate point λ* = 2, x2* = −1/8, x1* = 3.03 with f(x*) = −15.94.

To prove whether this point, x* = [3.03, −1/8]T, indeed corresponds to a minimum requires the following argument. By Taylor, for any step Δx compatible with the constraint, i.e. h(x* + Δx) = 0, it follows that

f(x* + Δx) = f(x*) + ∇ᵀf(x*)Δx (5.3)

and h(x* + Δx) = h(x*) + ∇ᵀh(x*)Δx + ½Δxᵀ∇²h(x*)Δx = 0.

The latter equation gives 0 = [1, −4x2][Δx1, Δx2]T + ½[Δx1, Δx2][[0, 0], [0, −4]][Δx1, Δx2]T, i.e. Δx1 = 4x2Δx2 + 2Δx2². (5.4)

and (5.3) gives

Substituting (5.4) into (5.5) results in Δf = −2(4x2Δx2 + 2Δx2²) − Δx2, and setting x2 = x2* = −1/8 gives Δf(x*) = Δx2 − 4Δx2² − Δx2 = −4Δx2² < 0 for all Δx2. Thus x* is not a minimum, but in fact a constrained maximum.


Problem 5.3.2.4

Minimize f(x) = 3x1² + x2² + 2x1x2 + 6x1 + 2x2 such that h(x) = 2x1 − x2 − 4 = 0. Show that the candidate point you obtain is indeed a local constrained minimum.

Solution

The Lagrangian is given by

L(x, λ) = 3x1² + x2² + 2x1x2 + 6x1 + 2x2 + λ(2x1 − x2 − 4)

and the associated necessary stationary conditions are

6x1 + 2x2 + 6 + 2λ = 0, 2x2 + 2x1 + 2 − λ = 0 and 2x1 − x2 − 4 = 0.

Solving gives x1* = 7/11, x2* = −30/11, λ* = −24/11.

To prove that the above point indeed corresponds to a local minimum, the following further argument is required. Here

H = [[6, 2], [2, 2]]

is positive-definite. Now for any step Δx compatible with the constraint h(x) = 0 it follows that the changes in the constraint function and objective function are respectively given by

Δh = 2Δx1 − Δx2 = 0 (5.6)

and Δf = ∇ᵀf(x*)Δx + ½ΔxᵀH(x*)Δx. (5.7)

Also, since ∇f(x*) = (24/11)[2, −1]T, the first term in (5.7) may be written as

∇ᵀf(x*)Δx = (24/11)[2, −1][Δx1, Δx2]T = (24/11)(2Δx1 − Δx2) = 0 (from (5.6)).


Finally, substituting the latter expression into (5.7) gives, for any step Δx at x* compatible with the constraint, the change in function value as Δf = 0 + ½ΔxᵀH(x*)Δx > 0, since H is positive-definite. The point x* is therefore a strong local constrained minimum.

Problem 5.3.2.5

Maximize f = xyz such that (x/a)² + (y/b)² + (z/c)² = 1.

Solution

L = xyz + λ((x/a)² + (y/b)² + (z/c)² − 1)

with necessary conditions:

yz + 2λx/a² = 0, xz + 2λy/b² = 0, xy + 2λz/c² = 0.

Solving the above together with the given equality yields λ = −(3/2)xyz, x = a/√3, y = b/√3 and z = c/√3, and thus f* = abc/(3√3).

Problem 5.3.2.6

Minimize (x1 − 1)³ + (x2 − 1)² + 2x1x2 such that x1 + x2 = 2.

Solution

With h(x) = x1 + x2 − 2 = 0 the Lagrangian is given by

L(x, λ) = (x1 − 1)³ + (x2 − 1)² + 2x1x2 + λ(x1 + x2 − 2)

with necessary conditions:

3(x1 − 1)² + 2x2 + λ = 0, 2(x2 − 1) + 2x1 + λ = 0 and x1 + x2 = 2.

Solving gives λ* = −2 and the possible solutions x* = [1, 1]T and x* = [5/3, 1/3]T.


Analysis of the solutions:

For any Δx consistent with the constraint: Δx1 + Δx2 = 0,

and

Δf = ∇ᵀfΔx + ½ΔxᵀH(x̄)Δx where x̄ = x* + θΔx, 0 ≤ θ ≤ 1.

For both candidate points x* above, ∇ᵀf = [2, 2] and thus ∇ᵀfΔx = 2(Δx1 + Δx2) = 0. Considering only ΔxᵀH(x̄)Δx, it is clear that as Δx → 0, x̄ → x*, and Δf > 0 if H(x*) is positive-definite.

Here H(x) = [[6(x1 − 1), 2], [2, 2]], and thus at x* = [1, 1]T, H = [[0, 2], [2, 2]] is not positive-definite, while at x* = [5/3, 1/3]T, H = [[4, 2], [2, 2]] is positive-definite.

Therefore the point x* = [5/3, 1/3]T is a strong local constrained minimum.

Problem 5.3.2.7

Determine the dimensions of a cylindrical can of maximum volume subject to the condition that the total surface area be equal to 24π. Show that the answer indeed corresponds to a maximum. (x1 = radius, x2 = height.)

Solution

The problem is equivalent to:

minimize f(x1, x2) = −πx1²x2 such that 2πx1² + 2πx1x2 = A0 = 24π.

Thus with h(x) = 2πx1² + 2πx1x2 − 24π = 0 the appropriate Lagrangian is

L(x, λ) = −πx1²x2 + λ(2πx1² + 2πx1x2 − 24π)

with necessary conditions for a local minimum:

−2πx1x2 + λ(4πx1 + 2πx2) = 0, −πx1² + 2πλx1 = 0

and h(x) = 2πx1² + 2πx1x2 − 24π = 0.


Solving gives x1* = 2, x2* = 4 with λ* = 1 and f* = −16π.

Analysis of the solution:

At x*, for a change Δx compatible with the constraint, it is required to show that

Δf = ∇ᵀfΔx + ½ΔxᵀHΔx > 0 (5.8)

in the limit as Δx → 0 and h(x* + Δx) = 0.

For f(x) at x* the gradient vector and Hessian are given by

∇f(x*) = [−16π, −4π]T and H = [[−8π, −4π], [−4π, 0]].

For satisfaction of the constraint: Δh = ∇ᵀhΔx + ½Δxᵀ∇²hΔx = 0, i.e.

∇ᵀhΔx = −½Δxᵀ∇²hΔx (5.9)

where ∇²h = [[4π, 2π], [2π, 0]]. Now clearly ∇h = −∇f at the candidate point.

It therefore follows that ∇ᵀfΔx = −∇ᵀhΔx = ½Δxᵀ∇²hΔx.

Substituting in (5.8):

Δf = ½Δxᵀ(∇²h + H)Δx.

From (5.9), in the limit as Δx → 0, Δx2 = −4Δx1, and thus

Δf = −2π[Δx1² + Δx1Δx2] = −2π[Δx1² + Δx1(−4Δx1)] = 6πΔx1² > 0, as expected for a constrained local minimum.


Problem 5.3.2.8

Minimize f(x) = x1² + x2² + ... + xn² such that h(x) = x1 + x2 + ... + xn − 1 = 0.

Solution

The Lagrangian is

L = (x1² + x2² + ... + xn²) + λ(x1 + x2 + ... + xn − 1)

with necessary conditions:

2xi + λ = 0, i = 1, ..., n.

Thus Σᵢ 2xi + nλ = 0 ⇒ λ = −2/n and xi* = 1/n for each i.

Therefore the distance = √f(x*) = √(n(1/n)²) = 1/√n. Test for a minimum at x*:

Δf = ∇ᵀf(x*)Δx + ½ΔxᵀH(x̄)Δx and ∇f = [2x1, ..., 2xn]T,

and H(x̄) = 2I, which is positive-definite, so Δf > 0 if

∇ᵀf(x*)Δx ≥ 0.

For Δx such that Δh = ∇ᵀhΔx = 0, it follows that Δx1 + Δx2 + ... + Δxn = 0.

Thus ∇ᵀf(x*)Δx = 2[x1*, ..., xn*]Δx = (2/n)(Δx1 + Δx2 + ... + Δxn) = 0 and therefore x* is indeed a constrained minimum.

Problem 5.3.2.9

Minimize x1² + x2² + x3² such that x1 + 3x2 + 2x3 − 12 = 0.

Solution

L = x1² + x2² + x3² + λ(x1 + 3x2 + 2x3 − 12)


with solution: λ* = −12/7, x1* = 6/7, x2* = 18/7, x3* = 12/7, with f* = 10.286.

Test Δf for all Δx compatible with the constraint: Δf = ∇ᵀf(x*)Δx + ½ΔxᵀHΔx, with ∇f = [2x1, 2x2, 2x3]T and H = 2I positive-definite.

Therefore

Δf(x*) = 2x1*Δx1 + 2x2*Δx2 + 2x3*Δx3 + ½ΔxᵀHΔx = (12/7)(Δx1 + 3Δx2 + 2Δx3) + ½ΔxᵀHΔx = 0 + ½ΔxᵀHΔx > 0.

Thus x* is indeed a local minimum.
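The solution above is the projection of the origin onto the plane aᵀx = 12 with a = (1, 3, 2), which gives a one-line numerical check (an illustration, not from the text):

```python
# Projection of the origin onto the plane a.x = b: x* = (b/(a.a))*a
a, b = (1, 3, 2), 12
s = b / sum(ai*ai for ai in a)            # 12/14 = 6/7
x = tuple(s*ai for ai in a)               # (6/7, 18/7, 12/7)
print([round(v, 4) for v in x], round(sum(v*v for v in x), 3))  # f* = 10.286
```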

5.3.3 Solution of inequality constrained problems via auxiliary variables

Problem 5.3.3.1

Consider the problem:

Minimize f(x) = x1² + x2² − 3x1x2 such that x1² + x2² − 6 ≤ 0.

Solution

Introducing an auxiliary variable θ, the problem is transformed into an equality constrained problem with x1² + x2² − 6 + θ² = 0. Since θ² ≥ 0, the original inequality constraint is satisfied for all values of θ for which the new equality constraint is satisfied. Solve the new equality constrained problem, with additional variable θ, by the Lagrange method:

L(x, θ, λ) = x1² + x2² − 3x1x2 + λ(x1² + x2² − 6 + θ²)


with necessary conditions:

2x1 − 3x2 + 2λx1 = 0, 2x2 − 3x1 + 2λx2 = 0, 2λθ = 0 and x1² + x2² − 6 + θ² = 0.

The third equation implies three possibilities:

(i) λ = 0, θ ≠ 0 ⇒ x1* = x2* = 0, θ² = 6 and f(x) = 0.

(ii) θ = 0, λ ≠ 0; then from the first two conditions:

2λ = −2 + 3x2/x1 = −2 + 3x1/x2 ⇒ x1² = x2².

Thus x1 = ±x2, and from the last condition it follows that x1 = ±√3.

Choosing x1* = −x2* gives f(x*) = 15, and choosing x1* = x2* = ±√3 gives f(x*) = −3.

(iii) θ = 0 and λ = 0 leads to a contradiction.

The constrained minimum therefore corresponds to case (ii) above.

Problem 5.3.3.2

Minimize f(x) = 2x1² − 3x2² − 2x1 such that x1² + x2² ≤ 1.

Solution

Introduce an auxiliary variable θ such that x1² + x2² + θ² = 1. The Lagrangian is then

L(x, θ, λ) = 2x1² − 3x2² − 2x1 + λ(x1² + x2² + θ² − 1)

and the necessary conditions for a minimum:

4x1 − 2 + 2λx1 = 0, −6x2 + 2λx2 = 0, 2λθ = 0


and x1² + x2² + θ² = 1.

The possibilities are:

either (i) λ = 0, θ ≠ 0, or (ii) λ ≠ 0, θ = 0, or (iii) λ = 0 and θ = 0. Considering each possibility in turn gives the following:

(i) λ = 0 ⇒ x1* = 1/2, x2* = 0, θ² = 3/4 and f(x1*, x2*) = −1/2.

(ii) θ = 0 ⇒ x1² + x2² = 1; 4x1 − 2 + 2λx1 = 0; x2(−6 + 2λ) = 0, giving firstly, for x2* = 0, x1* = ±1 with λ* = −1; −3 and f*(1, 0) = 0; f*(−1, 0) = 4; or secondly, for λ* = 3, it follows that x1* = 1/5 and x2* = ±√(24/25), for which in both cases f* = −3.2.

(iii) leads to a contradiction.

Inspection of the alternatives above gives the global minima at x1* = 1/5 and x2* = ±√(24/25), with f* = −3.2, and the maximum at x1* = −1, x2* = 0, with f* = 4.

Problem 5.3.3.3

Maximise f(x) = -x1^2 + x1x2 - 2x2^2 + x1 + x2 such that 2x1 + x2 ≤ 1 and x1, x2 ≥ 0, by using auxiliary variables.

Solution

Here solve the equivalent minimization problem: min over x of f = x1^2 - x1x2 + 2x2^2 - x1 - x2

such that 1 - 2x1 - x2 ≥ 0.

Introducing an auxiliary variable θ it follows that

L(x, θ, λ) = x1^2 - x1x2 + 2x2^2 - x1 - x2 + λ(1 - 2x1 - x2 - θ^2)

with necessary conditions:

∂L/∂x1 = 2x1 - x2 - 1 - 2λ = 0
∂L/∂x2 = -x1 + 4x2 - 1 - λ = 0
∂L/∂θ = -2λθ = 0
1 - 2x1 - x2 - θ^2 = 0.


The possibilities are:

(i) λ = 0 and θ ≠ 0 ⇒ 2x1 - x2 - 1 = 0 and -x1 + 4x2 - 1 = 0, giving x1 = 5/7 and x2 = 3/7. Substituting in the equality 1 - 2x1 - x2 - θ^2 = 0 gives θ^2 = -6/7 < 0, which is not possible for θ real.

(ii) θ = 0 and λ ≠ 0 gives 1 - 2x1 - x2 = 0 and, solving together with the first two necessary conditions, x1* = 4/11, x2* = 3/11 and λ* = -3/11.

5.3.4 Solution of inequality constrained problems via the Karush-Kuhn-Tucker conditions

Problem 5.3.4.1

Minimize f(x) = 3x1 + x2

subject to g(x) = x1^2 + x2^2 - 5 ≤ 0.

Solution

For L(x, λ) = 3x1 + x2 + λ(x1^2 + x2^2 - 5), the KKT conditions are:

3 + 2λx1 = 0
1 + 2λx2 = 0
λ(x1^2 + x2^2 - 5) = 0
x1^2 + x2^2 - 5 ≤ 0, λ ≥ 0.

From λ(x1^2 + x2^2 - 5) = 0 it follows that either λ = 0 or x1^2 + x2^2 - 5 = 0.

If λ = 0 then the first two conditions cannot be satisfied, and therefore λ ≠ 0 and we have the equality x1^2 + x2^2 - 5 = 0. It now follows that

x1 = -3/(2λ) and x2 = -1/(2λ).

Substituting these values into the equality yields

9/(4λ^2) + 1/(4λ^2) = 5,

which implies that λ* = +1/√2 > 0. The optimal solution is thus x1* = -3/√2, x2* = -1/√2, with f(x*) = -10/√2 = -5√2.

Problem 5.3.4.2

Minimize x1^2 + x2^2 - 14x1 - 6x2 - 7 such that x1 + x2 ≤ 2 and x1 + 2x2 ≤ 3.

Solution

L(x, λ) = x1^2 + x2^2 - 14x1 - 6x2 - 7 + λ1(x1 + x2 - 2) + λ2(x1 + 2x2 - 3)

with KKT conditions:

∂L/∂x1 = 2x1 - 14 + λ1 + λ2 = 0

∂L/∂x2 = 2x2 - 6 + λ1 + 2λ2 = 0

λ1(x1 + x2 - 2) = 0

λ2(x1 + 2x2 - 3) = 0

λ ≥ 0.

The possibilities are:

(i) The choice λ1 ≠ 0 and λ2 ≠ 0 gives

x1 + x2 - 2 = 0, x1 + 2x2 - 3 = 0 ⇒ x2 = 1, x1 = 1, with f = -25.

Both constraints are satisfied, but λ2 = -8 < 0 with λ1 = 20.

(ii) λ1 ≠ 0 and λ2 = 0 gives x1 + x2 - 2 = 0 and

2x1 - 14 + λ1 = 0

2x2 - 6 + λ1 = 0

which yield

λ1 = 8 ≥ 0 and x1 = 3, x2 = -1, with f = -33,

and both constraints are satisfied.

(iii) λ1 = 0 and λ2 ≠ 0 gives x1 + 2x2 - 3 = 0, and it follows that λ2 = 4 ≥ 0 and x1 = 5, x2 = -1, with f = -45. However, the first constraint is violated since x1 + x2 = 4 > 2.

(iv) The final possibility λ1 = λ2 = 0 gives x1 = 7 and x2 = 3 with f = -65, but both the first and second constraints are violated.

The unique optimum solution is therefore given by possibility (ii).
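The same four-case bookkeeping can be automated. A sketch in plain Python (the helper kkt_ok, which encodes stationarity, feasibility and the sign conditions, is ours):

```python
def f(x1, x2):
    # objective of Problem 5.3.4.2
    return x1**2 + x2**2 - 14*x1 - 6*x2 - 7

def kkt_ok(x1, x2, l1, l2, tol=1e-9):
    # stationarity, primal feasibility and non-negative multipliers
    return (abs(2*x1 - 14 + l1 + l2) < tol and
            abs(2*x2 - 6 + l1 + 2*l2) < tol and
            x1 + x2 <= 2 + tol and
            x1 + 2*x2 <= 3 + tol and
            l1 >= -tol and l2 >= -tol)

# the four candidates worked out in cases (i)-(iv) above
cases = {
    "(i)":   (1, 1, 20, -8),   # both constraints active: lambda2 < 0
    "(ii)":  (3, -1, 8, 0),    # only the first constraint active
    "(iii)": (5, -1, 0, 4),    # only the second active: violates g1
    "(iv)":  (7, 3, 0, 0),     # unconstrained: violates both constraints
}
valid = [name for name, c in cases.items() if kkt_ok(*c)]
```

Only case (ii) survives, in agreement with the analysis above.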

Problem 5.3.4.3

Minimize f(x) = x1^2 + x2^2 - 4x1 - 6x2 + 13 such that x1 + x2 ≥ 7 and x1 - x2 ≤ 2.

Solution

We note that in this case the KKT conditions are also sufficient since the problem is convex. Here

L(x, λ) = x1^2 + x2^2 - 4x1 - 6x2 + 13 + λ1(7 - x1 - x2) + λ2(x1 - x2 - 2)

with KKT conditions:

2x1 - 4 - λ1 + λ2 = 0
2x2 - 6 - λ1 - λ2 = 0
λ1(7 - x1 - x2) = 0
λ2(x1 - x2 - 2) = 0
λ1, λ2 ≥ 0.

The possibilities are:

(i) λ1 = λ2 = 0 ⇒ x1 = 2, x2 = 3, with x1 + x2 = 2 + 3 < 7, and thus the first constraint is not satisfied.

(ii) λ1 ≠ 0, λ2 = 0 ⇒ 2x1 - 4 - λ1 = 0, 2x2 - 6 - λ1 = 0, which imply x1 - x2 + 1 = 0, and -x1 - x2 + 7 = 0.

These give λ2 = 0, x1 = 3, x2 = 4, λ1 = 2, with f = 2. This point satisfies all the KKT conditions and is therefore a local minimum.

(iii) λ1 = 0 and λ2 ≠ 0 ⇒ 2x1 - 4 + λ2 = 0, 2x2 - 6 - λ2 = 0, giving 2x1 + 2x2 - 10 = 0, and x1 - x2 - 2 = 0; solving yields

x1 = 7/2, λ1 = 0, x2 = 3/2 and λ2 = -3.

This point violates the condition λ2 ≥ 0, and is therefore not a local minimum.

(iv) λ1 ≠ 0 and λ2 ≠ 0 ⇒ 2x1 - 4 - λ1 + λ2 = 0, 2x2 - 6 - λ1 - λ2 = 0, -x1 - x2 + 7 = 0 and x1 - x2 - 2 = 0 ⇒ x2 = 5/2, x1 = 9/2, with λ1 = 2 and λ2 = -3 < 0, which violates the condition λ ≥ 0.

The unique optimum solution therefore corresponds to possibility (ii), i.e. x1* = 3, x2* = 4, λ1* = 2, λ2* = 0 with f* = 2.

Problem 5.3.4.4

Minimize x1^2 + 2(x2 + 1)^2 such that -x1 + x2 - 2 = 0 and -x1 - x2 - 1 ≤ 0.

Solution

Here the Lagrangian is given by

L(x, λ, μ) = x1^2 + 2(x2 + 1)^2 + λ(-x1 - x2 - 1) + μ(-x1 + x2 - 2)

with KKT conditions:

2x1 - λ - μ = 0
4(x2 + 1) - λ + μ = 0
λ(-x1 - x2 - 1) = 0
-x1 + x2 - 2 = 0
-x1 - x2 - 1 ≤ 0, λ ≥ 0.

Possibilities:

(i) λ ≠ 0 ⇒ -x1 - x2 - 1 = 0, and with -x1 + x2 - 2 = 0 ⇒ x1 = -3/2, x2 = 1/2. Substituting into the first two conditions gives

λ = 3/2 and μ = -9/2, and thus all the conditions are satisfied.

(ii) λ = 0 ⇒ 2x1 - μ = 0 and 4x2 + 4 + μ = 0, giving 2x1 + 4x2 + 4 = 0, and with -x1 + x2 - 2 = 0 it follows that x2 = 0 and x1 = -2. However, -x1 - x2 - 1 = 1 > 0, which does not satisfy the inequality.


Case (i) therefore represents the optimum solution, with f* = 27/4.

Problem 5.3.4.5

Determine the shortest distance from the origin to the set defined by g1(x) = 4 - x1 - x2 ≤ 0 and g2(x) = 5 - 2x1 - x2 ≤ 0.

Solution

Minimize the squared distance f(x) = x1^2 + x2^2 with

L(x, λ) = x1^2 + x2^2 + λ1(4 - x1 - x2) + λ2(5 - 2x1 - x2)

and KKT conditions:

2x1 - λ1 - 2λ2 = 0
2x2 - λ1 - λ2 = 0
λ1(4 - x1 - x2) = 0
λ2(5 - 2x1 - x2) = 0
λ1, λ2 ≥ 0.

The possibilities are:

(i) λ1 = λ2 = 0 ⇒ x1 = x2 = 0, and both constraints are violated.

(ii) λ1 = 0, λ2 ≠ 0 ⇒ 5 - 2x1 - x2 = 0, which gives x2 = 1, x1 = 2 and λ2 = 2 > 0, but this violates constraint g1.

(iii) λ1 ≠ 0, λ2 = 0 ⇒ x1 = x2 = 2 and λ1 = 4 > 0, with g2 satisfied.

(iv) λ1 ≠ 0, λ2 ≠ 0 ⇒ g1 = 0 and g2 = 0, which implies that x1 = 1, x2 = 3 and λ2 = -4 < 0, which violates the non-negativity condition on λ2.

The solution x* therefore corresponds to case (iii), with shortest distance √8 = 2√2.

Problem 5.3.4.6

Minimize x1^2 + x2^2 - 2x1 - 2x2 + 2 such that -2x1 - x2 + 4 ≤ 0 and -x1 - 2x2 + 4 ≤ 0.


Solution

Here

L(x, λ) = x1^2 + x2^2 - 2x1 - 2x2 + 2 + λ1(-2x1 - x2 + 4) + λ2(-x1 - 2x2 + 4)

with KKT conditions:

2x1 - 2 - 2λ1 - λ2 = 0
2x2 - 2 - λ1 - 2λ2 = 0
λ1g1 = 0, λ2g2 = 0
λ1, λ2 ≥ 0.

The complementarity conditions give the following possibilities:

(i) λ1 = λ2 = 0 ⇒ 2x1 - 2 = 0, 2x2 - 2 = 0, x1 = x2 = 1, not valid since both g1 and g2 > 0.

(ii) λ1 = 0 and g2 = 0 ⇒ 2x1 - 2 - λ2 = 0, 2x2 - 2 - 2λ2 = 0 and -x1 - 2x2 + 4 = 0, which yield x1 = 1.2, x2 = 1.4 and λ2 = 0.4, but not valid since g1 = 0.2 > 0.

(iii) g1 = 0, λ2 = 0 ⇒ 2x1 - 2 - 2λ1 = 0, 2x2 - 2 - λ1 = 0, which together with g1 = 0 give x1 = 1.4, x2 = 1.2 and λ1 = 0.4 > 0, but not valid since g2 = 0.2 > 0.

(iv) g1 = 0 and g2 = 0 ⇒ x1 = x2 = 4/3, with λ1 = λ2 = 2/9 > 0.

Since (iv) satisfies all the conditions it corresponds to the optimum solution, with f(x*) = 2/9.

Problem 5.3.4.7

Minimize (x1 - 9/4)^2 + (x2 - 2)^2 such that g1(x) = x1^2 - x2 ≤ 0, g2(x) = x1 + x2 - 6 ≤ 0 and x1, x2 ≥ 0.


Is the point x̄ = (3/2, 9/4) a local minimum?

Solution

L(x, λ) = (x1 - 9/4)^2 + (x2 - 2)^2 + λ1(x1^2 - x2) + λ2(x1 + x2 - 6) - λ3x1 - λ4x2

with KKT conditions:

2(x1 - 9/4) + 2λ1x1 + λ2 - λ3 = 0
2(x2 - 2) - λ1 + λ2 - λ4 = 0
λ1g1 = 0, λ2g2 = 0, λ3x1 = 0, λ4x2 = 0
λi ≥ 0, i = 1, ..., 4.

At x̄ = (3/2, 9/4), x1 and x2 ≠ 0 ⇒ λ3 = λ4 = 0, and g1 = (3/2)^2 - 9/4 = 0,

while g2 = 3/2 + 9/4 - 6 = -9/4 < 0, and thus λ2 = 0.

Also, from the first condition it follows, since 2(3/2 - 9/4) + 2λ1(3/2) = 0, that λ1 = 1/2 > 0. This value also satisfies the second condition, 2(9/4 - 2) - λ1 = 0.

Since x̄ satisfies all the KKT conditions, and the objective function and all the constraints are convex, it is indeed the global constrained optimum.

Problem 5.3.4.8

Minimize x1^2 + x2^2 - 8x1 - 10x2 such that 3x1 + 2x2 - 6 ≤ 0.

Solution

Here L(x, λ) = x1^2 + x2^2 - 8x1 - 10x2 + λ(3x1 + 2x2 - 6), with KKT conditions:

2x1 - 8 + 3λ = 0
2x2 - 10 + 2λ = 0
λ(3x1 + 2x2 - 6) = 0
3x1 + 2x2 - 6 ≤ 0, λ ≥ 0.

The possibilities are:


(i) λ = 0: then x1 = 4 and x2 = 5, giving 3x1 + 2x2 - 6 = 16 > 0, and thus this point is not valid.

(ii) λ ≠ 0: then 3x1 + 2x2 - 6 = 0, which together with the first two conditions gives λ = 32/13 > 0, x1 = 4/13 and x2 = 33/13, with all the conditions satisfied.

Thus the solution corresponds to case (ii).

Problem 5.3.4.9

Minimize f(x) = x1^2 - x2 such that x1 ≥ 1 and x1^2 + x2^2 ≤ 26.

Solution

Here L(x, λ) = x1^2 - x2 + λ1(1 - x1) + λ2(x1^2 + x2^2 - 26), with KKT conditions:

2x1 - λ1 + 2λ2x1 = 0
-1 + 2λ2x2 = 0
λ1(1 - x1) = 0
λ2(x1^2 + x2^2 - 26) = 0
λ1, λ2 ≥ 0.

Investigate the possibilities implied by the third and fourth conditions.

By inspection, the possibility 1 - x1 = 0 and x1^2 + x2^2 - 26 = 0 yields x1 = 1; x2 = 5. This gives λ2 = 1/10 > 0 and λ1 = 11/5 > 0, which satisfies the last condition. Thus all the conditions are now satisfied. Since f(x) and all the constraint functions are convex, the conditions are also sufficient, and x* = [1, 5]^T with f(x*) = -4 is a constrained minimum.
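A quick numerical check of this KKT point (plain Python; the variable names are ours):

```python
# Check the KKT conditions at the candidate x* = (1, 5) of Problem 5.3.4.9,
# with both constraints taken as active.
x1, x2 = 1.0, 5.0
l2 = 1.0 / (2.0 * x2)            # from -1 + 2*lambda2*x2 = 0  ->  lambda2 = 1/10
l1 = 2.0*x1 + 2.0*l2*x1          # from 2*x1 - lambda1 + 2*lambda2*x1 = 0
f_star = x1**2 - x2
stationarity = (2.0*x1 - l1 + 2.0*l2*x1, -1.0 + 2.0*l2*x2)
```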

5.3.5 Solution of constrained problems via the dual problem formulation

Problem 5.3.5.1


Minimize x1^2 + 2x2^2 such that x2 ≥ -x1 + 2.

Solution

The constraint in standard form is 2 - x1 - x2 ≤ 0, and the Lagrangian is therefore:

L(x, λ) = x1^2 + 2x2^2 + λ(2 - x1 - x2).

For a given value of λ the stationary conditions are

∂L/∂x1 = 2x1 - λ = 0

∂L/∂x2 = 4x2 - λ = 0

giving x1 = λ/2 and x2 = λ/4. Since the Hessian of L, H_L, is positive-definite, the solution corresponds to a minimum.

Substituting the solution into L gives the dual function:

h(λ) = λ^2/4 + λ^2/8 + λ(2 - λ/2 - λ/4) = -(3/8)λ^2 + 2λ.

Since d^2h/dλ^2 = -3/4 < 0, the maximum occurs where dh/dλ = -(3/4)λ + 2 = 0, i.e. λ* = 8/3, with

h(λ*) = -(3/8)(8/3)^2 + 2(8/3) = 8/3. Thus x* = (4/3, 2/3)^T with f(x*) = 8/3.

Problem 5.3.5.2

Minimize x1^2 + x2^2 such that 2x1 + x2 ≤ -4.

Solution

L(x, λ) = x1^2 + x2^2 + λ(2x1 + x2 + 4)

and the necessary conditions for a minimum with respect to x imply

∂L/∂x1 = 2x1 + 2λ = 0 and ∂L/∂x2 = 2x2 + λ = 0

giving x1 = -λ and x2 = -λ/2. Since H_L is positive-definite, the solution is indeed a minimum with respect to x. Substituting in L gives

h(λ) = -(5/4)λ^2 + 4λ.


Since d^2h/dλ^2 = -5/2 < 0, the maximum occurs where dh/dλ = -(5/2)λ + 4 = 0, i.e. where λ* = 8/5 > 0 and f(x*) = h(λ*) = 16/5.

The solution is thus x1* = -8/5; x2* = -4/5, which satisfies the KKT conditions.
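The dual construction can be sketched in a few lines, assuming (as derived above) the inner minimizer x1 = -λ, x2 = -λ/2:

```python
def h(lam):
    # dual function of Problem 5.3.5.2: minimize L(x, lambda) over x in
    # closed form, then evaluate L at that minimizer
    x1, x2 = -lam, -lam / 2.0
    return x1**2 + x2**2 + lam * (2.0*x1 + x2 + 4.0)

lam_star = 8.0 / 5.0                  # root of dh/dlambda = -(5/2)*lambda + 4
x_star = (-lam_star, -lam_star / 2.0)
f_star = x_star[0]**2 + x_star[1]**2
```

At λ* the constraint is active, so the duality gap is zero: h(λ*) = f(x*) = 16/5.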

Problem 5.3.5.3

Minimize 2x1^2 + x2^2 such that x1 + 2x2 ≥ 1.

Solution

L(x, λ) = 2x1^2 + x2^2 + λ(1 - x1 - 2x2)

with stationary conditions

∂L/∂x1 = 4x1 - λ = 0, ∂L/∂x2 = 2x2 - 2λ = 0,

giving x1 = λ/4 and x2 = λ. Since H_L is positive-definite, the solution is a minimum and

h(λ) = 2(λ/4)^2 + λ^2 + λ(1 - λ/4 - 2λ) = -(9/8)λ^2 + λ.

Since d^2h/dλ^2 = -9/4 < 0, the maximum occurs where dh/dλ = -(9/4)λ + 1 = 0, i.e. λ* = 4/9, and since λ* > 0, λ* ∈ D.

Thus

h(λ*) = -(9/8)(4/9)^2 + 4/9 = -2/9 + 4/9 = 2/9

and x1* = λ*/4 = 1/9, x2* = λ* = 4/9.

Test: f(x*) = 2(1/9)^2 + (4/9)^2 = 2/81 + 16/81 = 18/81 = 2/9 = h(λ*).

Problem 5.3.5.4

Minimize (x1 - 1)^2 + (x2 - 2)^2 such that x1 - x2 ≤ 1 and x1 + x2 ≤ 2.

Solution

L(x, λ) = (x1 - 1)^2 + (x2 - 2)^2 + λ1(x1 - x2 - 1) + λ2(x1 + x2 - 2)


and it follows that for a fixed choice of λ = [λ1, λ2]^T the stationary conditions are:

2(x1 - 1) + λ1 + λ2 = 0, 2(x2 - 2) - λ1 + λ2 = 0,

which give x1 = 1 - (1/2)(λ1 + λ2) and x2 = 2 + (1/2)(λ1 - λ2). H_L = diag(2, 2) is positive-definite, and therefore the solution is a minimum with respect to x. Substituting the solution into L gives

h(λ) = -(1/2)λ1^2 - (1/2)λ2^2 - 2λ1 + λ2.

The necessary conditions for a maximum are

∂h/∂λ1 = -λ1 - 2 = 0 and ∂h/∂λ2 = -λ2 + 1 = 0,

and since the Hessian of h with respect to λ is given by H_h = diag(-1, -1), which is negative-definite, a stationary point indeed corresponds to a maximum. The first condition, however, gives λ1 = -2 < 0, which is not allowed, so the maximum over λ ≥ 0 occurs at λ1* = 0 and λ2* = 1, with h(λ*) = 1/2 = f(x*), and thus x1* = 1/2 and x2* = 3/2.
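The role of the sign restriction λ ≥ 0 in this problem can be checked numerically; a sketch using the closed-form inner minimizer derived above:

```python
def h(l1, l2):
    # dual function of Problem 5.3.5.4: substitute the inner minimizer
    # x1 = 1 - (l1 + l2)/2, x2 = 2 + (l1 - l2)/2 back into L
    x1 = 1.0 - 0.5 * (l1 + l2)
    x2 = 2.0 + 0.5 * (l1 - l2)
    return ((x1 - 1.0)**2 + (x2 - 2.0)**2
            + l1 * (x1 - x2 - 1.0) + l2 * (x1 + x2 - 2.0))

# the unconstrained maximizer has l1 = -2 < 0, so l1 is clamped to zero
l1_star, l2_star = 0.0, 1.0
x1_star = 1.0 - 0.5 * (l1_star + l2_star)
x2_star = 2.0 + 0.5 * (l1_star - l2_star)
```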

5.3.6 Quadratic programming problems

Problem 5.3.6.1

Minimize f(x) = x1^2 + x2^2 + x3^2 such that h1(x) = x1 + x2 + x3 = 0 and h2(x) = x1 + 2x2 + 3x3 - 1 = 0.

Solution

Here, for the equality constrained problem, the solution is obtained via the Lagrangian method with

L(x, λ) = x1^2 + x2^2 + x3^2 + λ1(x1 + x2 + x3) + λ2(x1 + 2x2 + 3x3 - 1).


The necessary conditions for a minimum give:

2x1 + λ1 + λ2 = 0, 2x2 + λ1 + 2λ2 = 0, 2x3 + λ1 + 3λ2 = 0,

i.e. x1 = -(λ1 + λ2)/2, x2 = -(λ1 + 2λ2)/2, x3 = -(λ1 + 3λ2)/2.

Substituting into the equality constraints gives:

-(3λ1 + 6λ2)/2 = 0, i.e. λ1 + 2λ2 = 0,

and

-(6λ1 + 14λ2)/2 = 1, i.e. 3λ1 + 7λ2 = -1.

Solving for the λ's: λ2 = -1 and λ1 = 2.

The candidate solution is therefore: x1* = -1/2, x2* = 0, x3* = 1/2. For the further analysis consider

Δf = ∇^T f(x*)Δx + (1/2)Δx^T HΔx,

where ∇f = (2x1, 2x2, 2x3)^T, and thus ∇f(x*) = [-1, 0, 1]^T, and H = diag(2, 2, 2) is positive-definite.

For changes consistent with the constraints:

0 = Δh1 = ∇^T h1 Δx + (1/2)Δx^T ∇^2 h1 Δx, with ∇h1 = [1, 1, 1]^T,

0 = Δh2 = ∇^T h2 Δx + (1/2)Δx^T ∇^2 h2 Δx, with ∇h2 = [1, 2, 3]^T.

It follows that Δx1 + Δx2 + Δx3 = 0 and Δx1 + 2Δx2 + 3Δx3 = 0, giving -Δx1 + Δx3 = 0, and thus ∇^T f(x*)Δx = -Δx1 + Δx3 = 0 and Δf = (1/2)Δx^T HΔx > 0 for all Δx ≠ 0 consistent with the constraints.


The candidate point x* above is therefore a constrained minimum.
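Because the necessary conditions together with the two constraints form a single linear system, the candidate point can be recovered exactly; a sketch using rational arithmetic (the gauss_solve helper is ours):

```python
from fractions import Fraction as F

def gauss_solve(A, b):
    # Gauss-Jordan elimination on exact rationals; any nonzero pivot will do
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                k = M[r][c] / M[c][c]
                M[r] = [a - k * e for a, e in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# rows: dL/dx1, dL/dx2, dL/dx3, h1, h2 for Problem 5.3.6.1
A = [[2, 0, 0, 1, 1],
     [0, 2, 0, 1, 2],
     [0, 0, 2, 1, 3],
     [1, 1, 1, 0, 0],
     [1, 2, 3, 0, 0]]
A = [[F(v) for v in row] for row in A]
b = [F(0), F(0), F(0), F(0), F(1)]
x1, x2, x3, l1, l2 = gauss_solve(A, b)
```

The exact solution (-1/2, 0, 1/2) with λ1 = 2, λ2 = -1 matches the analysis above.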

Problem 5.3.6.2

Minimize f(x) = -2x1 - 6x2 + x1^2 - 2x1x2 + 2x2^2 such that x1 ≥ 0, x2 ≥ 0 and g1(x) = x1 + x2 ≤ 2 and g2(x) = -x1 + 2x2 ≤ 2.

Solution

In matrix form the problem is:

minimize f(x) = (1/2)x^T Ax + b^T x, where A = [2 -2; -2 4] and b = [-2; -6],

subject to the specified linear constraints.

First determine the unconstrained minimum from ∇f = Ax + b = 0, which gives x = [5, 4]^T.

Test for violation of constraints: x1 + x2 = 9 > 2 and -x1 + 2x2 = 3 > 2.

Thus two constraints are violated. Considering each separately active, assume firstly that x1 + x2 = 2. For this case L = f(x) + λg1(x) = (1/2)x^T Ax + b^T x + λ(x1 + x2 - 2). The necessary conditions are:

[A 1; 1^T 0][x; λ] = [-b; 2], i.e. solve

[2 -2 1; -2 4 1; 1 1 0][x1; x2; λ] = [2; 6; 2].

This, by Cramer's rule, gives x1 = 4/5, x2 = 6/5 and λ = 14/5 > 0, and therefore

x1, x2 > 0 and -x1 + 2x2 = -4/5 + 12/5 = 8/5 < 2.

Thus, with all the constraints satisfied and λ > 0, the KKT sufficient conditions apply, and x* = [4/5; 6/5]^T is a local constrained minimum. Indeed, since the problem is convex, it is in fact the global minimum, and no further investigation is required.
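The active-set step of this problem reduces to one bordered 3x3 solve; a sketch in plain Python (Cramer's rule, helper names ours):

```python
def det3(m):
    return (m[0][0] * (m[1][1]*m[2][2] - m[1][2]*m[2][1])
          - m[0][1] * (m[1][0]*m[2][2] - m[1][2]*m[2][0])
          + m[0][2] * (m[1][0]*m[2][1] - m[1][1]*m[2][0]))

def cramer3(A, b):
    d = det3(A)
    return [det3([[b[i] if k == j else A[i][k] for k in range(3)]
                  for i in range(3)]) / d for j in range(3)]

# The unconstrained minimum of Problem 5.3.6.2 is x = (5, 4): both constraints
# are violated, so take g1: x1 + x2 = 2 as active and solve the bordered
# system [A a1; a1^T 0][x; lam] = [-b; 2] with a1 = (1, 1).
K = [[2.0, -2.0, 1.0],
     [-2.0, 4.0, 1.0],
     [1.0, 1.0, 0.0]]
rhs = [2.0, 6.0, 2.0]
x1, x2, lam = cramer3(K, rhs)
```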


Problem 5.3.6.3

Minimize f(x) = x1^2 + 4x2^2 - 2x1 + 8x2 such that 5x1 + 2x2 ≤ 4 and x1, x2 ≥ 0.

Solution

Try various possibilities and test for satisfaction of the KKT conditions.

(i) For the unconstrained solution: solve ∇f(x) = 0, i.e. 2x1 - 2 = 0, 8x2 + 8 = 0, giving x1 = 1; x2 = -1, which violates the second non-negativity constraint.

(ii) Setting x2 = 0 results in x1 = 1, which violates the constraint 5x1 + 2x2 ≤ 4.

(iii) Similarly, setting x1 = 0 results in x2 = -1, which violates the second non-negativity constraint.

(iv) Setting 5x1 + 2x2 = 4 active gives

L = f(x) + λ(5x1 + 2x2 - 4)

with necessary conditions:

2x1 - 2 + 5λ = 0, 8x2 + 8 + 2λ = 0, 5x1 + 2x2 - 4 = 0.

Solving gives λ = -1/13, x1 = 31/26 ≈ 1.19 and x2 = -51/52 ≈ -0.98 < 0, which also violates a non-negativity constraint.

(v) Now setting x2 = 0 in addition to 5x1 + 2x2 - 4 = 0 results in an additional term -λ2x2 in L, giving

L = f(x) + λ1(5x1 + 2x2 - 4) - λ2x2.

Solving together with the equalities gives x1 = 4/5; x2 = 0, which satisfies all the constraints, and with λ1 = 2/25 > 0 and λ2 = 204/25 > 0.


(vi) Finally, setting x1 = 0 and 5x1 + 2x2 - 4 = 0 leads to the solution x1 = 0, x2 = 2, but with both associated λ's negative.

Thus the solution corresponds to case (v), i.e.:

x1* = 4/5, x2* = 0 with f* = -24/25.

Problem 5.3.6.4

Minimize f(x) = (1/2)x^T Ax + b^T x, with A = [1 -1; -1 2] and b = [-2; 1], such that

x1 + x2 ≤ 3, 2x1 - x2 ≤ 4 and x1, x2 ≥ 0.

Solution

First determine the unconstrained minimum from Ax + b = 0: x = [3, 1]^T.

x1, x2 > 0, but x1 + x2 = 3 + 1 = 4 > 3 (constraint violation), and 2x1 - x2 = 5 > 4 (constraint violation). Now consider the violated constraints separately active.

Firstly, for x1 + x2 - 3 = 0 the Lagrangian is

L(x, λ) = (1/2)x^T Ax + b^T x + λ(x1 + x2 - 3)

with necessary conditions:

x1 - x2 - 2 + λ = 0, -x1 + 2x2 + 1 + λ = 0, x1 + x2 - 3 = 0.

The solution, given by Cramer's rule, is x1 = 12/5, x2 = 3/5 and λ = 1/5.

This solution, however, violates the last inequality constraint, since 2x1 - x2 = 21/5 > 4.


Similarly, setting 2x1 - x2 - 4 = 0 gives the solution x1 = 12/5, x2 = 4/5, which violates the first constraint since x1 + x2 = 16/5 > 3.

Finally, try both constraints simultaneously active. This results in a system of four linear equations in four unknowns, the solution of which (do this yourself) is x1 = 7/3, x2 = 2/3 and λ1 = λ2 = 1/9 > 0, which satisfies the KKT conditions. Thus x* = [7/3, 2/3]^T with f(x*) = -43/18 ≈ -2.39.

5.3.7 Application of the gradient projection method

Problem 5.3.7.1

Apply the gradient projection method to the following problem:

minimize f(x) = x1^2 + x2^2 + x3^2 - 2x1 such that 2x1 + x2 + x3 = 7 and x1 + x2 + 2x3 = 6, given the initial starting point x0 = [2, 2, 1]^T. Perform the minimization for the first projected search direction only.

Solution

Here A = [2 1 1; 1 1 2] and P = I - A^T(AA^T)^{-1}A. It now follows that

P = (1/11)[1 -3 1; -3 9 -3; 1 -3 1] and ∇f(x0) = [2, 4, 2]^T.


This gives the projected steepest descent direction as -P∇f = (8/11)[1, -3, 1]^T, and therefore select the descent search direction as u1 = [1, -3, 1]^T.

Along the line through x0 in the direction ul, x is given by

x = x0 + λu1 = [2 + λ, 2 - 3λ, 1 + λ]^T and

f(x0 + λu1) = F(λ) = (2 + λ)^2 + (2 - 3λ)^2 + (1 + λ)^2 - 2(2 + λ).

For a minimum, dF/dλ = 2(2 + λ) - 6(2 - 3λ) + 2(1 + λ) - 2 = 22λ - 8 = 0, which gives λ1 = 4/11. Thus the next iterate is

x1 = (2 + 4/11, 2 - 12/11, 1 + 4/11)^T = (26/11, 10/11, 15/11)^T.
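The projection matrix and the first step can be reproduced with bare list-of-lists algebra; a sketch (the matmul helper is ours):

```python
def matmul(X, Y):
    # dense list-of-lists matrix product
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2.0, 1.0, 1.0],
     [1.0, 1.0, 2.0]]
At = [[A[j][i] for j in range(2)] for i in range(3)]
AAt = matmul(A, At)                              # [[6, 5], [5, 6]]
d = AAt[0][0]*AAt[1][1] - AAt[0][1]*AAt[1][0]    # det(AA^T) = 11
inv = [[AAt[1][1]/d, -AAt[0][1]/d],
       [-AAt[1][0]/d, AAt[0][0]/d]]
M = matmul(matmul(At, inv), A)
P = [[(1.0 if i == j else 0.0) - M[i][j] for j in range(3)] for i in range(3)]

grad0 = [2.0, 4.0, 2.0]                 # grad f at x0 = (2, 2, 1)
u = [-sum(P[i][k] * grad0[k] for k in range(3)) for i in range(3)]

lam = 4.0 / 11.0                        # exact line minimum along (1, -3, 1)
x0 = [2.0, 2.0, 1.0]
x_next = [x0[i] + lam * d_i for i, d_i in enumerate([1.0, -3.0, 1.0])]
```

Note that the new point remains exactly feasible with respect to both equality constraints, as projection guarantees.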

Problem 5.3.7.2

Apply the gradient projection method to the minimization of

f(x) = x1^2 + x2^2 - 2x1 - 4x2 such that h(x) = x1 + 4x2 - 5 = 0

with starting point x0 = [1, 1]^T.

Solution

The projection matrix is P = I - A^T(AA^T)^{-1}A.

Here A = [1 4], AA^T = 17, and therefore

P = I - (1/17)[1 4; 4 16] = (1/17)[16 -4; -4 1].

With ∇f(x0) = [2x1 - 2, 2x2 - 4]^T = [0, -2]^T, the search direction is

u1 = -P∇f(x0) = (1/17)[-8, 2]^T,

or more conveniently u1 = [-4, 1]^T.


Thus x = x0 + λu1 = [1 - 4λ, 1 + λ]^T for the line search. Along the search line

F(λ) = (1 - 4λ)^2 + (1 + λ)^2 - 2(1 - 4λ) - 4(1 + λ),

with the minimum occurring where dF/dλ = 34λ - 2 = 0, giving λ1 = 1/17 and x1 = [13/17, 18/17]^T.

Next, compute u2 = -P∇f(x1) = -(1/17)[16 -4; -4 1][-0.4706, -1.8824]^T = [0, 0]^T. Since the projected gradient equals 0, the point x1 is the optimum x*.

Problem 5.3.7.3

Apply the gradient projection method to the problem:

minimize f(x) = x1^2 + x2^2 + x3^2 such that x1 + x2 + x3 = 1

with starting point x0 = [1, 0, 0]^T.

Solution

P = I - A^T(AA^T)^{-1}A.

Here A = [1 1 1], AA^T = 3, (AA^T)^{-1} = 1/3, and thus

P = I - (1/3)[1 1 1; 1 1 1; 1 1 1] = (1/3)[2 -1 -1; -1 2 -1; -1 -1 2].

Since ∇f(x) = [2x1, 2x2, 2x3]^T it follows that ∇f(x0) = [2, 0, 0]^T.

The projected search direction is therefore given by

u1 = -P∇f(x0) = -(1/3)[2 -1 -1; -1 2 -1; -1 -1 2][2, 0, 0]^T = (2/3)[-2, 1, 1]^T,

or more conveniently u1 = [-2, 1, 1]^T.


The next point therefore lies along the line x = x0 + λ[-2, 1, 1]^T = [1 - 2λ, λ, λ]^T, with

F(λ) = (1 - 2λ)^2 + 2λ^2.

The minimum occurs where dF/dλ = -4(1 - 2λ) + 2λ + 2λ = 0, giving λ1 = 1/3.

Thus x1 = [1/3, 1/3, 1/3]^T, ∇f(x1) = [2/3, 2/3, 2/3]^T and u2 = -P∇f(x1) = 0.

Since the projected gradient is equal to 0, the optimum point is x* = xl.

5.3.8 Application of the augmented Lagrangian method

Problem 5.3.8.1

Minimize f(x) = 6x1^2 + 4x1x2 + 3x2^2 such that h(x) = x1 + x2 - 5 = 0, by means of the augmented Lagrangian multiplier method.

Solution

Here L(x, λ, ρ) = 6x1^2 + 4x1x2 + 3x2^2 + λ(x1 + x2 - 5) + ρ(x1 + x2 - 5)^2, with necessary conditions for a stationary point:

12x1 + 4x2 + λ + 2ρ(x1 + x2 - 5) = 0
4x1 + 6x2 + λ + 2ρ(x1 + x2 - 5) = 0.

With ρ = 1 the above become 14x1 + 6x2 = 10 - λ = 6x1 + 8x2, and it follows that

x2 = 4x1 and x1 = (10 - λ)/38.


The iterations now proceed as follows.

Iteration 1: λ1 = 0, giving x1 = 10/38 = 0.2632 and x2 = 4x1 = 1.0526, with

h = x1 + x2 - 5 = 0.2632 + 1.0526 - 5 = -3.6842

and for the next iteration

λ2 = λ1 + 2ρh = 0 + 2(1)(-3.6842) = -7.3684.

Iteration 2: x1 = (10 + 7.3684)/38 = 0.4571, and it follows that x2 = 1.8283, and thus h = 0.4571 + 1.8283 - 5 = -2.7146, with the next multiplier value given by

λ3 = λ2 + (2)(1)h = -7.3684 + 2(-2.7146) = -12.7978.

Iteration 3:

Now increase ρ to 10; then x1 = (100 - λ)/128 and x2 = 4x1.

Thus x1 = (100 + 12.7978)/128 = 0.8812 and x2 = 3.5249, with h = 0.8812 + 3.5249 - 5 = -0.5939 and

λ4 = -12.7978 + 2(10)(-0.5939) = -24.675.

Iteration 4:

x1 = (100 + 24.675)/128 = 0.9740, x2 = 3.8961, with λ5 = -27.27.

Iteration 5: x1 = 0.9943, x2 = 3.9772, with λ6 = -27.84, and rapid convergence is obtained to the solution x* = [1, 4]^T.
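The iterations above fit in a short loop, since the inner minimization has the closed-form solution x2 = 4x1 with x1 = (10ρ - λ)/(28 + 10ρ), which reduces to the formulas used above for ρ = 1 and ρ = 10. A sketch:

```python
def inner_min(lam, rho):
    # closed-form minimizer of the augmented Lagrangian of Problem 5.3.8.1
    x1 = (10.0 * rho - lam) / (28.0 + 10.0 * rho)
    return x1, 4.0 * x1

lam, rho = 0.0, 1.0
history = []
for k in range(5):
    if k == 2:
        rho = 10.0                    # penalty parameter increased here
    x1, x2 = inner_min(lam, rho)
    h = x1 + x2 - 5.0                 # constraint violation
    history.append((x1, x2, lam))
    lam = lam + 2.0 * rho * h         # multiplier update
```

The recorded iterates reproduce the hand computation, approaching x* = (1, 4).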

5.3.9 Application of the sequential quadratic programming method

Problem 5.3.9.1

Minimize f(x) = 2x1^2 + 2x2^2 - 2x1x2 - 4x1 - 6x2, such that h(x) = 2x1^2 - x2 = 0, by means of the SQP method and starting point x0 = [0, 1]^T.


Solution

Here L(x, λ) = f(x) + λh(x), and the necessary conditions for a stationary point are

∂L/∂x1 = 4x1 - 2x2 - 4 + 4λx1 = 0
∂L/∂x2 = 4x2 - 2x1 - 6 - λ = 0
h = 2x1^2 - x2 = 0.

This system of non-linear equations may be written in vector form as c(X) = 0, where X = [x1, x2, λ]^T. If an approximate solution (x^k, λ^k) to this system is available, a possible improvement may be obtained by solving the linearized system via Newton's method:

J(X^k)[s1; s2; Δλ] = -c(X^k),

where J = [∂c/∂X] is the Jacobian of the system c.

In detail this linear system becomes

[4 + 4λ, -2, 4x1; -2, 4, -1; 4x1, -1, 0][s1; s2; Δλ] = -c(X^k),

and for the approximation x0 = [0, 1]^T and λ0 = 0 the system may be written as

[4 -2 0; -2 4 -1; 0 -1 0][s1; s2; Δλ] = -[-6; -2; -1],

which has the solution s1 = 1, s2 = -1 and Δλ = -8. Thus after the first iteration x1^1 = 0 + 1 = 1, x2^1 = 1 - 1 = 0 and λ^1 = 0 - 8 = -8.

The above Newton step is equivalent to the solution of the following quadratic programming (QP) problem set up at x0 = [0, 1]^T with λ0 = 0 (see Section 3.5.3):

minimize F(s) = (1/2)s^T[4 -2; -2 4]s + [-6, -2]s - 4 such that s2 + 1 = 0.


Setting s2 = -1 gives F(s) = F(s1) = 2s1^2 - 4s1, with minimum at s1 = 1. Thus after the first iteration x1^1 = s1 = 1 and x2^1 = s2 + 1 = 0, giving x1 = [1, 0]^T, which corresponds to the Newton step solution.

Here, in order to simplify the computation, λ is not updated. Continue by setting up the next QP: with f(x1) = -2; ∇f(x1) = [0, -8]^T; h(x1) = 2; ∇h(x1) = [4, -1]^T, the QP becomes:

minimize F(s) = (1/2)s^T[4 -2; -2 4]s + [0, -8]s - 2

with constraint 2 + 4s1 - s2 = 0, i.e. s2 = 4s1 + 2.

Substituting s2 = 4s1 + 2 in F(s) results in the unconstrained minimization of the single-variable function F(s1) = 26s1^2 - 4s1 - 10.

Setting dF/ds1 = 52s1 - 4 = 0 gives s1 = 0.07692 and s2 = 2.30769. Thus

x2 = [1 + s1, 0 + s2]^T = [1.07692, 2.30769]^T. Continuing in this manner yields x3 = [1.06854, 2.2834]^T and x4 = [1.06914, 2.28575]^T, which represents rapid convergence to the exact optimum x* = [1.06904, 2.28569]^T, with f(x*) = -10.1428. Substituting x* into the necessary conditions for a stationary point of L gives the value λ* = 1.00468.
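The full Newton iteration on c(X) = 0 can be sketched directly (3x3 solves by Cramer's rule; helper names ours):

```python
def c(x1, x2, lam):
    # residuals of the stationarity conditions and the constraint
    return [4*x1 - 2*x2 - 4 + 4*lam*x1,
            4*x2 - 2*x1 - 6 - lam,
            2*x1**2 - x2]

def jac(x1, x2, lam):
    return [[4 + 4*lam, -2.0, 4*x1],
            [-2.0, 4.0, -1.0],
            [4*x1, -1.0, 0.0]]

def det3(m):
    return (m[0][0] * (m[1][1]*m[2][2] - m[1][2]*m[2][1])
          - m[0][1] * (m[1][0]*m[2][2] - m[1][2]*m[2][0])
          + m[0][2] * (m[1][0]*m[2][1] - m[1][1]*m[2][0]))

def solve3(A, b):
    d = det3(A)
    return [det3([[b[i] if k == j else A[i][k] for k in range(3)]
                  for i in range(3)]) / d for j in range(3)]

X = [0.0, 1.0, 0.0]          # (x1, x2, lambda) at the starting point
first = None
for k in range(10):
    r = c(*X)
    s = solve3(jac(*X), [-ri for ri in r])
    X = [xi + si for xi, si in zip(X, s)]
    if k == 0:
        first = list(X)      # should reproduce (1, 0, -8) from the text
```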


Chapter 6

SOME THEOREMS

6.1 Characterization of functions and minima

Theorem 6.1.1

If f(x) is a differentiable function over the convex set X ⊆ R^n, then f(x) is convex over X if and only if

f(x2) ≥ f(x1) + ∇^T f(x1)(x2 - x1)   (6.1)

for all x1, x2 ∈ X.

Proof

If f(x) is convex over X then, by the definition (1.9), for all x1, x2 ∈ X and for all λ ∈ [0, 1],

f(x1 + λ(x2 - x1)) = f(λx2 + (1 - λ)x1) ≤ λf(x2) + (1 - λ)f(x1),

so that

(f(x1 + λ(x2 - x1)) - f(x1))/λ ≤ f(x2) - f(x1).

Taking the limit as λ → 0 it follows that

lim(λ→0) (f(x1 + λ(x2 - x1)) - f(x1))/λ ≤ f(x2) - f(x1).   (6.4)


The directional derivative on the left-hand side may also, by (1.16), be written as

lim(λ→0) (f(x1 + λ(x2 - x1)) - f(x1))/λ = ∇^T f(x1)(x2 - x1).   (6.5)

Substituting (6.5) into (6.4) gives f(x2) ≥ f(x1) + ∇^T f(x1)(x2 - x1), i.e. (6.1) is true.

Conversely, if (6.1) holds, then for x = λx2 + (1 - λ)x1 ∈ X, λ ∈ [0, 1]:

f(x2) ≥ f(x) + ∇^T f(x)(x2 - x)   (6.6)

f(x1) ≥ f(x) + ∇^T f(x)(x1 - x).   (6.7)

Multiplying (6.6) by λ and (6.7) by (1 - λ) and adding gives

λf(x2) + (1 - λ)f(x1) ≥ f(x),

since λ(x2 - x) + (1 - λ)(x1 - x) = 0, and it follows that

f(x) = f(λx2 + (1 - λ)x1) ≤ λf(x2) + (1 - λ)f(x1),

i.e. f(x) is convex and the theorem is proved. □

(Clearly, if f(x) is to be strictly convex, the strict inequality > applies in (6.1).)

Theorem 6.1.2

If f(x) ∈ C^2 over an open convex set X ⊆ R^n, then f(x) is convex if and only if H(x) is positive semi-definite for all x ∈ X.

Proof

If H(x) is positive semi-definite for all x ∈ X, then for all x1, x2 in X it follows by the Taylor expansion (1.21) that

f(x2) = f(x1) + ∇^T f(x1)(x2 - x1) + (1/2)(x2 - x1)^T H(x̄)(x2 - x1)   (6.8)

where x̄ = x1 + θ(x2 - x1), θ ∈ [0, 1]. Since H(x) is positive semi-definite, it follows directly from (6.8) that f(x2) ≥ f(x1) + ∇^T f(x1)(x2 - x1), and therefore by Theorem 6.1.1 f(x) is convex.


Conversely, if f(x) is convex, then from (6.1), for all x1, x2 in X,

f(x2) ≥ f(x1) + ∇^T f(x1)(x2 - x1).   (6.9)

Also, the Taylor expansion (6.8) above applies, and comparison of (6.8) and (6.9) implies that

(x2 - x1)^T H(x̄)(x2 - x1) ≥ 0.   (6.10)

Clearly, since (6.10) must apply for all x1, x2 in X, and since x̄ is assumed to vary continuously with x1, x2, (6.10) must be true for any x̄ in X, i.e. H(x) is positive semi-definite for all x ∈ X, which concludes the proof. □
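Both characterizations can be sanity-checked on a concrete example; a small sketch using the convex quadratic f(x) = x1^2 + 2x2^2, whose Hessian diag(2, 4) is positive-definite (so f is convex by Theorem 6.1.2, and the gradient inequality (6.1) must hold):

```python
import itertools

def f(x):
    return x[0]**2 + 2.0 * x[1]**2

def grad(x):
    return [2.0 * x[0], 4.0 * x[1]]

pts = [(-2.0, 1.0), (0.5, -1.5), (3.0, 2.0), (0.0, 0.0)]
# inequality (6.1): f(x2) >= f(x1) + grad f(x1)^T (x2 - x1) for all pairs
ok = all(
    f(x2) >= f(x1) + sum(g * (b - a) for g, a, b in zip(grad(x1), x1, x2)) - 1e-12
    for x1, x2 in itertools.product(pts, pts)
)
```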

Theorem 6.1.3

If f(x) ∈ C^2 over an open convex set X ⊆ R^n, and the Hessian matrix H(x) is positive-definite for all x ∈ X, then f(x) is strictly convex over X.

Proof

For any x1, x2 in X it follows by the Taylor expansion (1.21) that

f(x2) = f(x1) + ∇^T f(x1)(x2 - x1) + (1/2)(x2 - x1)^T H(x̄)(x2 - x1),

where x̄ = x1 + θ(x2 - x1), θ ∈ [0, 1]. Since H(x) is positive-definite, it follows directly that f(x2) > f(x1) + ∇^T f(x1)(x2 - x1) for x2 ≠ x1, and therefore by Theorem 6.1.1 f(x) is strictly convex and the theorem is proved.

Theorem 6.1.4

If f(x) ∈ C^2 over the convex set X ⊆ R^n, and if f(x) is convex over X, then any interior local minimum of f(x) is a global minimum.

Proof

If f(x) is convex then by (6.1):

f(x2) ≥ f(x1) + ∇^T f(x1)(x2 - x1)


for all x1, x2 ∈ X. In particular, for any point x2 = x and for x1 = x*, an interior local minimum, it follows that

f(x) ≥ f(x*) + ∇^T f(x*)(x - x*).

Since the necessary condition ∇f(x*) = 0 applies at x*, the above reduces to f(x) ≥ f(x*), and therefore x* is a global minimum. □

Theorem 6.1.5

More generally, let f(x) be strictly convex on the convex set X, with f(x) not necessarily ∈ C^2; then a strict local minimum is the global minimum.

Proof

Let x0 be the strict local minimum in a δ-neighbourhood and x* the global minimum, x0, x* ∈ X, and assume that x0 ≠ x*. Then there exists an ε, 0 < ε < δ, with f(x) > f(x0) for all x such that ‖x - x0‖ < ε.

A convex combination of x0 and x* is given by

x̄(λ) = λx* + (1 - λ)x0, λ ∈ [0, 1].

Note that if λ → 0 then x̄ → x0.

As f(x) is strictly convex, it follows that

f(x̄) < λf(x*) + (1 - λ)f(x0) ≤ f(x0) for all λ ∈ (0, 1).

In particular this holds for λ arbitrarily small, and hence for λ such that ‖x̄(λ) - x0‖ < ε. But as x0 is the strict local minimum,

f(x̄) > f(x0) for all x̄ with ‖x̄ - x0‖ < ε.

This contradicts the fact that f(x̄) < f(x0), which followed from the convexity and the assumption that x0 ≠ x*. It follows therefore that x0 = x*, which completes the proof. □


6.2 Equality constrained problem

Theorem 6.2.1

In problem (3.5), let f and hj ∈ C^1 and assume that the Jacobian matrix [∂h/∂x] is of rank r. Then the necessary condition for a bounded internal local minimum x* of the equality constrained problem (3.5) is that x* coincide with the stationary point (x*, λ*) of the Lagrange function L, i.e. that there exists a λ* such that

∇x L(x*, λ*) = 0 and ∇λ L(x*, λ*) = 0.

Note: Here the elements of the Jacobian matrix are taken as [∂h/∂x]ij = ∂hj/∂xi.

Proof

Since f and hj ∈ C^1, it follows for a bounded internal local minimum at x = x* that

df = ∇^T f(x*)dx = 0,   (6.11)

since df ≥ 0 for both dx and -dx, for all perturbations dx which are consistent with the constraints, i.e. for all dx such that

dhj = ∇^T hj(x*)dx = 0, j = 1, 2, ..., r.   (6.12)

Consider the Lagrange function

L(x, λ) = f(x) + Σ(j=1..r) λj hj(x).

The differential of L is given by

dL = Σ(i=1..n) (∂f/∂xi + Σ(j=1..r) λj ∂hj/∂xi) dxi,

and it follows from (6.11) and (6.12) that at x*

dL = df = 0   (6.13)

for all dx such that h(x) = 0, i.e. h(x* + dx) = 0.

Choose the Lagrange multipliers λj, j = 1, ..., r such that at x*

∂f/∂xj + Σ(k=1..r) λk ∂hk/∂xj = 0, j = 1, 2, ..., r.   (6.14)

The solution of this system provides the vector λ*. Here the r variables xj, j = 1, 2, ..., r, may be any appropriate set of r variables from the set xi, i = 1, 2, ..., n. A unique solution for λ* exists as it is assumed that ∂h(x*)/∂x is of rank r. It follows that (6.13) can now be written as

Σ(i=r+1..n) (∂f/∂xi + Σ(j=1..r) λj* ∂hj/∂xi) dxi = 0.   (6.15)

Again consider the constraints hj(x) = 0, j = 1, 2, ..., r. If these equations are considered as a system of r equations in the unknowns x1, x2, ..., xr, these dependent unknowns can be solved for in terms of x(r+1), ..., xn. Hence the latter n - r variables are the independent variables. For any choice of these independent variables, the other dependent variables x1, ..., xr are determined by solving h(x) = [h1(x), h2(x), ..., hr(x)]^T = 0. In particular, x(r+1) to xn may be varied one by one at x*, and it follows from (6.15) that

∂f/∂xi + Σ(j=1..r) λj* ∂hj/∂xi = 0, i = r + 1, ..., n,

and, together with (6.14) and the constraints h(x) = 0, it follows that the necessary conditions for a bounded internal local minimum can be written as

∇x L(x*, λ*) = 0 and ∇λ L(x*, λ*) = 0.   (6.16)


Note that (6.16) provides n + r equations in the n + r unknowns x1*, x2*, ..., xn*, λ1*, ..., λr*. The solutions of these, in general non-linear, equations will be candidate solutions for problem (3.5).

As a general rule, if possible, it is advantageous to solve explicitly for any of the variables in the equality constraints in terms of the others, so as to reduce the number of variables and constraints in (3.5).

Note on the existence of λ*

Up to this point it has been assumed that λ* does indeed exist. In Theorem 6.2.1 it is assumed that the equations

∂f/∂xj + Σ(k=1..r) λk ∂hk/∂xj = 0, j = 1, 2, ..., r

apply for a certain appropriate set of r variables from the set xi, i = 1, 2, ..., n. Thus λ may be solved for to find λ*, via the linear system

Σ(k=1..r) λk ∂hk/∂xj = -∂f/∂xj, j = 1, 2, ..., r.

This system can be solved if there exists an r × r submatrix Hr of [∂h/∂x]*, evaluated at x*, such that Hr is non-singular. This is the same as requiring that [∂h/∂x]* be of rank r at the optimal point x*. This result is interesting and illuminating but, for obvious reasons, of little practical value. It does, however, emphasise the fact that it may not be assumed that multipliers will exist for every problem.

Theorem 6.2.2

If it is assumed that:

(i) f(x) has a bounded local minimum at x* (associated with this minimum there exists a λ* found by solving (6.16)); and

(ii) if λ is chosen arbitrarily in the neighbourhood of λ*, then L(x, λ) has a local minimum x^0 with respect to x in the neighbourhood of x*,


214 CHAPTER 6.

then for the classical equality constrained minimization problem (3.5), the Lagrange function L(x, λ) has a saddle point at (x*, λ*). Note that assumption (ii) implies that

∇_x L(x^0, λ) = 0,

and that a local minimum may indeed be expected if the Hessian matrix of L is positive-definite at (x*, λ*).

Proof

Consider the neighbourhood of (x*, λ*). As a consequence of assumption (ii), applied with λ = λ* (so that x^0 = x*), it follows that

L(x, λ*) ≥ L(x*, λ*) = f(x*) = L(x*, λ). (6.17)

The equality on the right holds as h(x*) = 0. The relationship (6.17) shows that L(x, λ) has a degenerate saddle point at (x*, λ*); for a regular saddle point it holds that

L(x, λ*) ≥ L(x*, λ*) ≥ L(x*, λ).

Theorem 6.2.3

If the above assumptions (i) and (ii) in Theorem 6.2.2 hold, then it follows for the bounded minimum x* that

f(x*) = max_λ (min_x L(x, λ)).

Proof

From (ii) it is assumed that for a given λ the minimum

min_x L(x, λ) = h(λ) (the dual function)

exists, and in particular for λ = λ*:

min_x L(x, λ*) = h(λ*) = f(x*) (from (ii), x^0 = x*). (6.18)

Let X denote the set of points x such that h(x) = 0. It follows that

min_{x ∈ X} L(x, λ) = min_{x ∈ X} f(x) = f(x*) = h(λ*)


and

h(λ) = min_x L(x, λ) ≤ min_{x ∈ X} L(x, λ) = min_{x ∈ X} f(x) = f(x*) = h(λ*). (6.19)

Hence h(λ*) is the maximum of h(λ). Combining (6.18) and (6.19) yields

f(x*) = h(λ*) = max_λ h(λ) = max_λ (min_x L(x, λ)),

which completes the proof.

6.3 Karush-Kuhn-Tucker theory

Theorem 6.3.1

In the problem (3.18) let f and g_i ∈ C^1. Then, given the existence of the Lagrange multipliers λ*, the following conditions have to be satisfied at the point x* that corresponds to the solution of the primal problem (3.18):

∂f(x*)/∂x_j + Σ_{i=1}^{m} λ*_i ∂g_i(x*)/∂x_j = 0, j = 1, 2, ..., n,
λ*_i g_i(x*) = 0, i = 1, 2, ..., m,
g_i(x*) ≤ 0, λ*_i ≥ 0, i = 1, 2, ..., m,

or in more compact notation:

∇_x L(x*, λ*) = 0, λ*^T g(x*) = 0, g(x*) ≤ 0, λ* ≥ 0.

These conditions are known as the Karush-Kuhn-Tucker (KKT) stationary conditions.

Proof


First convert the inequality constraints in (3.18) into equality constraints by introducing the slack variables s_i:

g_i(x) + s_i = 0, s_i ≥ 0, i = 1, 2, ..., m. (6.21)

Define the corresponding Lagrange function

L(x, λ, s) = f(x) + Σ_{i=1}^{m} λ_i (g_i(x) + s_i).

Assume that the solution to (3.18) with the constraints (6.21) is given by x*, s*.

Now distinguish between the two possibilities:

(i) Let s*_i > 0 for all i. In this case the problem is identical to the usual minimization problem with equality constraints, which is solved using Lagrange multipliers. Here there are m additional variables s_1, s_2, ..., s_m. Hence the necessary conditions for the minimum are

∇_x L(x*, λ*, s*) = 0 and ∂L/∂s_i = λ*_i = 0, i = 1, 2, ..., m.

As s*_i > 0, it also follows that g_i(x*) < 0, and together with the fact that λ*_i = 0 this yields

λ*_i g_i(x*) = 0.

Consequently all the conditions of the theorem hold for the case s*_i > 0 for all i.

(ii) Let s*_i = 0 for i = 1, 2, ..., p and s*_i > 0 for i = p + 1, ..., m.

In this case the solution may be considered to be the solution of an equivalent minimization problem with the following equality constraints:

g_i(x) = 0, i = 1, 2, ..., p; g_i(x) + s_i = 0, i = p + 1, ..., m.


Again apply the regular Lagrange theory; since s*_i > 0 for i = p + 1, ..., m, it follows as before that

λ*_i = 0, i = p + 1, ..., m.

As g_i(x*) = 0 for i = 1, 2, ..., p it follows that λ*_i g_i(x*) = 0 for these i. Obviously g_i(x*) < 0 for i = p + 1, ..., m, and since λ*_i = 0 for those indices it follows that

λ*_i g_i(x*) = 0, i = 1, 2, ..., m.

However, no information concerning λ*_i, i = 1, ..., p is available yet. This information is obtained from the following additional argument.

Consider feasible changes from x*, s* in all the variables x_1, ..., x_n, s_1, ..., s_m. Again consider m of these as dependent variables and the remaining n as independent variables. If p ≤ n then s_1, s_2, ..., s_p can always be included in the set of independent variables. (Find λ* by setting the partial derivatives of L at x*, s* with respect to the dependent variables equal to zero and solving for λ*.)

As ds_i ≥ 0 (since s*_i = 0) must apply for feasible changes in the independent variables s_1, s_2, ..., s_p, it follows in general, for changes which are consistent with the equality constraints, that

df ≥ 0 (see Remark 2 below)

for changes involving s_1, ..., s_p. Thus if these independent variables are varied one at a time then, since all the partial derivatives of L with respect to the dependent variables must be equal to zero,

df = (∂L/∂s_i) ds_i = λ_i ds_i ≥ 0.


As ds_i > 0 it follows that λ*_i ≥ 0, i = 1, 2, ..., p. Thus, since it has already been proved that λ*_i = 0, i = p + 1, ..., m, it follows that indeed λ*_i ≥ 0, i = 1, 2, ..., m.

This completes the proof of the theorem. □

Remark 1 Obviously, if an equality constraint h_k(x) = 0 is also prescribed explicitly, then s_k does not exist and nothing is known of the sign of λ_k, as ∂L/∂s_k does not exist.

Exercise Give a brief outline of how you would obtain the necessary conditions if the equality constraints h_j(x) = 0, j = 1, 2, ..., r are added explicitly and L is defined by L = f + λ^T g + μ^T h.

Remark 2 The constraints (6.21) imply that for a feasible change ds_i > 0 there will be a change dx from x* and hence

df = ∇^T f dx ≥ 0.

If the condition ds_i > 0 does not apply, then a negative change ds_i < 0, equal in magnitude to the positive change considered above, would result in a corresponding change −dx and hence df = ∇^T f(−dx) ≥ 0, i.e. df = ∇^T f dx ≤ 0. This is only possible if ∇^T f dx = 0, and consequently in this case df = 0.

Remark 3 It can be shown (not proved here) that the KKT stationary conditions are indeed necessary and sufficient conditions for a strong constrained global minimum at x* if f(x) and the g_i(x) are convex functions. This is not surprising, for in the case of convex functions a local unconstrained minimum is also the global minimum.

Remark 4 Also note that if p > n (see possibility (ii) of the proof), then it does not necessarily follow that λ*_i ≥ 0 for i = 1, 2, ..., p, i.e. the KKT conditions do not necessarily apply.
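The KKT conditions can be checked numerically at a candidate point. A small sketch with a hypothetical convex problem (minimize x_1^2 + x_2^2 subject to 1 − x_1 − x_2 ≤ 0, for which x* = (0.5, 0.5) and λ* = 1):

```python
import numpy as np

# Hypothetical problem: minimize f(x) = x1^2 + x2^2
# subject to g(x) = 1 - x1 - x2 <= 0.
# Candidate solution x* = (0.5, 0.5) with multiplier lambda* = 1.
x = np.array([0.5, 0.5])
lam = 1.0

grad_f = 2.0 * x                      # gradient of f at x*
grad_g = np.array([-1.0, -1.0])       # gradient of g (constant)
g = 1.0 - x[0] - x[1]                 # constraint value at x*

stationarity = grad_f + lam * grad_g  # should be the zero vector
complementarity = lam * g             # should be zero
feasible = g <= 0 and lam >= 0        # primal and dual feasibility

print(stationarity, complementarity, feasible)
```

Since f and g are convex here, Remark 3 implies these checks certify a global constrained minimum.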

6.4 Saddle point conditions

Two drawbacks of the Karush-Kuhn-Tucker stationary conditions are that in general they only yield necessary conditions and that they apply only


if f(x) and the g_i(x) are differentiable. These drawbacks can be removed by formulating the Karush-Kuhn-Tucker conditions in terms of the saddle point properties of L(x, λ). This is done in the next two theorems.

Theorem 6.4.1

A point (x*, λ*) with λ* ≥ 0 is a saddle point of the Lagrange function of the primal problem (3.18) if and only if the following conditions hold:

1. x* minimizes L(x, λ*) over all x;

2. g(x*) ≤ 0;

3. λ*_i g_i(x*) = 0, i = 1, 2, ..., m.

Proof

If (x*, λ*) is a saddle point, then L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*). First prove that if (x*, λ*) is a saddle point with λ* ≥ 0, then conditions 1. to 3. hold.

The right-hand side of the above inequality yields directly that x* minimizes L(x, λ*) over all x, and the first condition is satisfied.

By expanding the left-hand side:

f(x*) + Σ_{i=1}^{m} λ_i g_i(x*) ≤ f(x*) + Σ_{i=1}^{m} λ*_i g_i(x*),

and hence

Σ_{i=1}^{m} (λ_i − λ*_i) g_i(x*) ≤ 0 for all λ ≥ 0.

Assume that g_i(x*) > 0 for some i. Then a contradiction is obtained for arbitrarily large λ_i, and it follows that the second condition, g_i(x*) ≤ 0, holds.

In particular for λ = 0 it follows that

Σ_{i=1}^{m} λ*_i g_i(x*) ≥ 0.


But for λ* ≥ 0 and g_i(x*) ≤ 0 it follows that

Σ_{i=1}^{m} λ*_i g_i(x*) ≤ 0.

The only way both inequalities can be satisfied is if

Σ_{i=1}^{m} λ*_i g_i(x*) = 0,

and as each individual term is non-positive the third condition, λ*_i g_i(x*) = 0, follows.

Now proceed to prove the converse, i.e. that if the three conditions of the theorem hold, then L(x, λ) has a saddle point at (x*, λ*).

The first condition implies that L(x*, λ*) ≤ L(x, λ*), which is half of the definition of a saddle point. The rest is obtained from the expansion:

L(x*, λ) = f(x*) + Σ_{i=1}^{m} λ_i g_i(x*).

Now as g(x*) ≤ 0 and λ ≥ 0 it follows that L(x*, λ) ≤ f(x*) = L(x*, λ*), since from the third condition Σ_{i=1}^{m} λ*_i g_i(x*) = 0.

This completes the proof of the converse. □

Theorem 6.4.2

If the point (x*, λ*), with λ* ≥ 0, is a saddle point of the Lagrange function associated with the primal problem, then x* is the solution of the primal problem.

Proof

If (x*, λ*) is a saddle point, the previous theorem holds and the inequality constraints are satisfied at x*. All that is required additionally is to show that f(x*) ≤ f(x) for all x such that g(x) ≤ 0. From the definition of a saddle point it follows that

f(x*) + Σ_{i=1}^{m} λ*_i g_i(x*) ≤ f(x) + Σ_{i=1}^{m} λ*_i g_i(x)

for all x in the neighbourhood of x*. As a consequence of condition 3. of the previous theorem, the left-hand side is f(x*), and for any x such that g_i(x) ≤ 0 it holds that Σ_{i=1}^{m} λ*_i g_i(x) ≤ 0; hence it follows that

f(x*) ≤ f(x) for all x such that g(x) ≤ 0, with equality at x = x*.

This completes the proof.

The main advantage of these saddle point theorems is that sufficient conditions are provided for solving optimization problems which are neither convex nor differentiable. Any direct search method can be used to minimize L(x, λ*) over all x. Of course the problem remains that we do not have an a priori value for λ*. In practice it is possible to obtain estimates for λ* using iterative techniques, or by solving the so-called dual problem.
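The idea of finding λ* via the dual can be illustrated on a hypothetical one-dimensional example: for minimize f(x) = x^2 subject to g(x) = 1 − x ≤ 0, the unconstrained minimizer of L(x, λ) = x^2 + λ(1 − x) is x = λ/2, so the dual function is h(λ) = λ − λ^2/4, and its maximum over λ ≥ 0 equals f(x*) = 1:

```python
import numpy as np

# Hypothetical one-dimensional problem: minimize f(x) = x^2
# subject to g(x) = 1 - x <= 0 (i.e. x >= 1), so x* = 1, f(x*) = 1.
# For fixed lambda >= 0 the unconstrained minimizer of
# L(x, lambda) = x^2 + lambda*(1 - x) is x = lambda/2, giving the
# dual function h(lambda) = lambda - lambda**2 / 4.
lams = np.linspace(0.0, 4.0, 4001)
h = lams - lams**2 / 4.0

k = np.argmax(h)
lam_star, h_star = lams[k], h[k]

print(lam_star, h_star)   # approximately lambda* = 2, h(lambda*) = 1 = f(x*)
```

Every value of h(λ) is a lower bound on f(x*), in agreement with Theorem 6.4.3 below.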

Theorem 6.4.3

The dual function satisfies h(λ) ≤ f(x) for all x that satisfy the constraints g(x) ≤ 0, for all λ ∈ D. (Hence the dual function yields a lower bound for the function f(x) with g(x) ≤ 0.)

Proof

Let X = {x | g(x) ≤ 0}; then for λ ∈ D (so that λ ≥ 0):

h(λ) = min_x L(x, λ) ≤ L(x, λ) = f(x) + Σ_{i=1}^{m} λ_i g_i(x) ≤ f(x) for all x ∈ X,

since λ ≥ 0 and g(x) ≤ 0. The largest lower bound is attained at max_{λ ∈ D} h(λ). □

Theorem 6.4.4 (Duality Theorem)

The point (x*, λ*) with λ* ≥ 0 is a saddle point of the Lagrange function of the primal problem if and only if


1. x* is a solution of the primal problem;

2. λ* is a solution of the dual problem;

3. f(x*) = h(λ*).

Proof

First assume that L(x, λ) has a saddle point at (x*, λ*) with λ* ≥ 0. Then from Theorem 6.4.2 it follows that x* is a solution of the primal problem and 1. holds.

By definition: h(λ) = min_x L(x, λ).

From Theorem 6.4.1 it follows that x* minimizes L(x, λ*) over all x; thus

h(λ*) = f(x*) + Σ_{i=1}^{m} λ*_i g_i(x*),

and as λ*_i g_i(x*) = 0 it follows that

h(λ*) = f(x*),

which is condition 3.

Also, by Theorem 6.4.1, g(x*) ≤ 0, and it has already been shown in Theorem 6.4.3 that

h(λ) ≤ f(x) for all x ∈ {x | g(x) ≤ 0},

and thus in particular h(λ) ≤ f(x*) = h(λ*). Consequently h(λ*) = max_{λ ∈ D} h(λ) and condition 2. holds.

Conversely, prove that if conditions 1. to 3. hold, then L(x, λ) has a saddle point at (x*, λ*) with λ* ≥ 0, or equivalently that the conditions of Theorem 6.4.1 hold.

As x* is a solution of the primal problem, the necessary conditions

g(x*) ≤ 0

hold, which is condition 2. of Theorem 6.4.1.


Also, as λ* is a solution of the dual problem, λ* ≥ 0. It is now shown that x* minimizes L(x, λ*).

Make the contradictory assumption, i.e. that there exists a minimizer x̂ ≠ x* such that

L(x̂, λ*) < L(x*, λ*).

By definition:

h(λ*) = L(x̂, λ*) = f(x̂) + Σ_{i=1}^{m} λ*_i g_i(x̂) < f(x*) + Σ_{i=1}^{m} λ*_i g_i(x*),

but from condition 3.: h(λ*) = f(x*), and consequently

Σ_{i=1}^{m} λ*_i g_i(x*) > 0,

which contradicts the fact that λ* ≥ 0 and g(x*) ≤ 0; hence x̂ = x*, x* minimizes L(x, λ*), and condition 1. of Theorem 6.4.1 holds.

Also, as

h(λ*) = f(x*) + Σ_{i=1}^{m} λ*_i g_i(x*)

and h(λ*) = f(x*) by condition 3., it follows that

Σ_{i=1}^{m} λ*_i g_i(x*) = 0.

As each individual term is non-positive, the third condition of Theorem 6.4.1 holds: λ*_i g_i(x*) = 0.

As the three conditions of Theorem 6.4.1 are satisfied, it is concluded that (x*, λ*) is a saddle point of L(x, λ), and the converse is proved. □

6.5 Conjugate gradient methods

Let u and v be two non-zero vectors in R^n. They are mutually orthogonal if u^T v = (u, v) = 0. Let A be an n × n symmetric positive-definite matrix. Then u and v are mutually conjugate with respect to A if u and Av are mutually orthogonal, i.e.

u^T Av = (u, Av) = 0. (6.22)

Let A be a square matrix. A has an eigenvalue λ and an associated eigenvector x if, for x ≠ 0, Ax = λx. It can be shown that if A is positive-definite and symmetric and x and y are distinct eigenvectors, then they are mutually orthogonal, i.e. (y, x) = 0.

Since (y, Ax) = (y, λx) = λ(y, x) = 0, it follows that the eigenvectors of a positive-definite matrix A are mutually conjugate with respect to A. Hence, given any positive-definite matrix A, there exists at least one pair of mutually conjugate directions with respect to this matrix. It is now shown that a set of mutually conjugate vectors in R^n forms a basis and thus spans R^n.
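This property is easy to verify numerically: the orthonormal eigenvectors of a symmetric positive-definite matrix diagonalize it, so they are mutually conjugate with respect to it (the matrix below is hypothetical):

```python
import numpy as np

# Hypothetical symmetric positive-definite matrix.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

w, V = np.linalg.eigh(A)   # eigenvalues w, orthonormal eigenvectors as columns of V

# (v_i, A v_j) = w_j * (v_i, v_j) = 0 for i != j: distinct eigenvectors
# of a symmetric matrix are mutually conjugate with respect to A.
C = V.T @ A @ V            # should be diag(w)
off_diag = C - np.diag(np.diag(C))

print(np.max(np.abs(off_diag)))   # near machine precision
```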

Theorem 6.5.1

Let u^i, i = 1, 2, ..., n be a set of vectors in R^n which are mutually conjugate with respect to a given symmetric positive-definite matrix A. Then for each x ∈ R^n it holds that

x = Σ_{i=1}^{n} λ_i u^i, where λ_i = (u^i, Ax)/(u^i, Au^i).

Proof

Consider the linear combination Σ_{i=1}^{n} a_i u^i = 0. Then, since the vectors u^i are mutually conjugate with respect to A, it follows for each k that

(u^k, A(Σ_{i=1}^{n} a_i u^i)) = a_k (u^k, Au^k) = 0.

Since A is positive-definite and u^k ≠ 0 it follows that (u^k, Au^k) ≠ 0, and thus a_k = 0, k = 1, 2, ..., n. The set u^i, i = 1, 2, ..., n thus forms a linearly independent set of vectors in R^n which may be used as a basis.

Page 243: PRACTICAL MATHEMATICAL OPTIMIZATION · 3.2 Classical methods for constrained optimization problems ..... 62 3.2.1 Equality ... years, in teaching Practical Mathematical Optimization

SOME THEOREMS 225

Thus for any x in R^n there exists a unique set λ_i, i = 1, 2, ..., n such that

x = Σ_{i=1}^{n} λ_i u^i. (6.23)

Now since the u^i are mutually conjugate with respect to A it follows that (u^i, Ax) = λ_i (u^i, Au^i), giving

λ_i = (u^i, Ax)/(u^i, Au^i),

which completes the proof. □
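Theorem 6.5.1 can be checked numerically by expanding an arbitrary vector in a conjugate basis; here the eigenvectors of a hypothetical symmetric positive-definite matrix serve as the conjugate directions u^i:

```python
import numpy as np

# Verify the expansion of Theorem 6.5.1 for a hypothetical SPD matrix:
# x = sum_i lambda_i u^i with lambda_i = (u^i, A x) / (u^i, A u^i),
# using the (mutually conjugate) eigenvectors of A as the u^i.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
_, U = np.linalg.eigh(A)          # columns of U are mutually conjugate w.r.t. A
x = np.array([1.0, -2.0])

coeffs = [(u @ A @ x) / (u @ A @ u) for u in U.T]
x_rebuilt = sum(c * u for c, u in zip(coeffs, U.T))

print(x_rebuilt)   # recovers x = [1, -2]
```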

The following lemma is required in order to show that the Fletcher-Reeves directions, given by u^1 = −∇f(x^0) and formulae (2.12) and (2.13), are mutually conjugate. Here the notation g^k = ∇f(x^k) is also used.

Lemma 6.5.2

Let u^1, u^2, ..., u^n be mutually conjugate directions with respect to A along an optimal descent path applied to f(x) given by (2.7). Then

(u^k, g^i) = 0, k = 1, 2, ..., i; 1 ≤ i ≤ n.

Proof

For optimal decrease at step k it is required that

(u^k, ∇f(x^k)) = 0, k = 1, 2, ..., i.

Also

∇f(x^i) = Ax^i + b,

and thus

(u^k, g^i) = (u^k, g^k) + (u^k, A(x^i − x^k)) = 0 + Σ_{j=k+1}^{i} λ_j (u^k, Au^j) = 0,


which completes the proof. □

Theorem 6.5.3

The directions u^i, i = 1, 2, ..., n of the Fletcher-Reeves algorithm given in Section 2.3.2.4 are mutually conjugate with respect to A of f(x) given by (2.7).

Proof

The proof is by induction.

First, u1 and u2 are mutually conjugate:

Now assume that u^1, u^2, ..., u^i are mutually conjugate, i.e.

(u^k, Au^l) = 0 for all k ≠ l, with k, l ≤ i.

It is now required to prove that (u^k, Au^{i+1}) = 0 for k = 1, 2, ..., i.

First consider −(g^k, g^i) for k = 1, 2, ..., i − 1:

−(g^k, g^i) = (−g^k + β_k u^k, g^i) (from Lemma 6.5.2)
= (u^{k+1}, g^i) = 0 (also from Lemma 6.5.2).

Hence

(g^k, g^i) = 0, k = 1, 2, ..., i − 1. (6.25)

Now consider (u^k, Au^{i+1}) for k = 1, 2, ..., i − 1:

(u^k, Au^{i+1}) = −(u^k, Ag^i − β_i Au^i)
= −(u^k, Ag^i) (from the induction assumption)
= −(g^i, Au^k)
= −(1/λ_k)(g^i, g^k − g^{k−1}) = 0 from (6.25).


Hence

(u^k, Au^{i+1}) = 0, k = 1, 2, ..., i − 1. (6.26)

All that remains is to prove that (u^i, Au^{i+1}) = 0, which implies a β_i such that

(u^i, A(−g^i + β_i u^i)) = 0. (6.27)

Now

(g^i, Au^i) = (1/λ_i)(g^i, g^i − g^{i−1}) = (1/λ_i)‖g^i‖^2 (from (6.25))

and

(u^i, Au^i) = (1/λ_i)(u^i, A(x^i − x^{i−1}))
= (1/λ_i)(u^i, g^i − g^{i−1})
= −(1/λ_i)(u^i, g^{i−1}) (from Lemma 6.5.2)
= −(1/λ_i)(−g^{i−1} + β_{i−1}u^{i−1}, g^{i−1})
= (1/λ_i)‖g^{i−1}‖^2.

Thus from (6.27) it is required that β_i = ‖g^i‖^2/‖g^{i−1}‖^2, which agrees with the value prescribed by the Fletcher-Reeves algorithm. Consequently

(u^i, Au^{i+1}) = 0.

Combining this result with (6.26) completes the proof.
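A consequence of Theorem 6.5.3 is quadratic termination: with exact line searches the Fletcher-Reeves directions minimize a quadratic in at most n steps. A minimal sketch (the matrix A and vector b below are hypothetical; the line-search step and β follow the formulas in the text):

```python
import numpy as np

# Fletcher-Reeves sketch for the quadratic
# f(x) = 0.5 x^T A x + b^T x (a hypothetical instance of (2.7));
# with exact line searches it terminates in at most n steps.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([-1.0, 0.0, 2.0])

x = np.zeros(3)
g = A @ x + b                      # gradient g^0
u = -g                             # u^1 = -g^0 (steepest descent start)
for _ in range(3):                 # n = 3 iterations
    step = -(g @ u) / (u @ A @ u)  # exact line search for a quadratic
    x = x + step * u
    g_new = A @ x + b
    beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves beta
    u = -g_new + beta * u
    g = g_new

print(np.linalg.norm(g))   # ~0: gradient vanishes after n steps
```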

6.6 DFP method

The following two theorems are concerned with the DFP algorithm. The first theorem shows that it is a descent method.


Theorem 6.6.1

If G_i is positive-definite and G_{i+1} is calculated using (2.16), then G_{i+1} is also positive-definite.

Proof

If G_i is positive-definite and symmetric, then there exists a matrix F_i such that F_i^T F_i = G_i, and for any x ≠ 0 the update (2.16) gives

x^T G_{i+1} x = [(p, p)(q, q) − (p, q)^2]/(q, q) + (x, v^{i+1})^2/(y^{i+1}, v^{i+1}),

where p = F_i x and q = F_i y^{i+1}.

If x ≠ θy^{i+1} for any scalar θ, it follows from the Schwarz inequality that the first term is strictly positive.

For the second term it holds for the denominator that

(y^{i+1}, v^{i+1}) = (g^{i+1} − g^i, v^{i+1})
= −(g^i, v^{i+1}) (since (g^{i+1}, v^{i+1}) = 0 by the optimal line search)
= λ_{i+1}(g^i, G_i g^i) > 0 (λ_{i+1} > 0 and G_i positive-definite),

and hence if x ≠ θy^{i+1} the second term is non-negative and the right-hand side is strictly positive.

Else, if x = θy^{i+1}, the first term is zero and we only have to consider the second term:

(x, v^{i+1})^2/(y^{i+1}, v^{i+1}) = θ^2 (y^{i+1}, v^{i+1}) > 0.

This completes the proof. □

The theorem above is important as it guarantees descent.


For any search direction u^{k+1}, the rate of change of f at x^k along u^{k+1} is

df(x^k + λu^{k+1})/dλ evaluated at λ = 0, i.e. (u^{k+1}, ∇f(x^k)) = (u^{k+1}, g^k).

For the DFP method, u^{k+1} = −G_k g^k and hence at x^k (λ = 0):

df/dλ = −(g^k, G_k g^k) < 0.

Consequently, if G_k is positive-definite, descent is guaranteed at x^k for g^k ≠ 0.
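This descent property is easy to check numerically on hypothetical data:

```python
import numpy as np

# The directional derivative of f at x^k along u^{k+1} = -G_k g^k is
# -(g^k, G_k g^k), which is negative whenever G_k is positive-definite
# and g^k != 0.  A quick numerical check on hypothetical data:
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
G = M @ M.T + 4.0 * np.eye(4)    # a positive-definite G_k
g = rng.standard_normal(4)       # a nonzero gradient g^k

u = -G @ g                       # DFP search direction
slope = u @ g                    # directional derivative (u^{k+1}, g^k)

print(slope < 0.0)   # True: descent direction
```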

Theorem 6.6.2

If the DFP method is applied to a quadratic function of the form given by (2.7), then the following holds:

(i) (v^i, Av^j) = 0, 1 ≤ i < j ≤ k, k = 2, 3, ..., n;

(ii) G_k Av^i = v^i, 1 ≤ i ≤ k, k = 1, 2, ..., n,

where the vectors that occur in the DFP algorithm (see Section 2.4.2.1) are defined by

v^i = x^i − x^{i−1} = λ_i u^i and y^i = g^i − g^{i−1}, i = 1, 2, ..., n.

Proof

The proof is by induction. The most important part of the proof, the induction step, is presented. (Prove the initial step yourself. Property (ii) will hold for k = 1 if G_1 Av^1 = v^1. Show this by direct substitution. The first case (v^1, Av^2) = 0 in (i) corresponds to k = 2 and follows from the fact that (ii) holds for k = 1. That (ii) also holds for k = 2 follows from the second part of the induction proof given below.)

Assume that (i) and (ii) hold for k. Then it is required to show that they also hold for k + 1.


First part: proof that (i) is true for k + 1. For the quadratic function (2.7):

g^k − g^i = A(x^k − x^i) = Σ_{j=i+1}^{k} Av^j.

Now consider, for i ≤ k:

(g^k, v^i) = (g^i, v^i) + Σ_{j=i+1}^{k} (Av^j, v^i) = 0.

The first term on the right is zero from the optimal descent property, and the second term as a consequence of the induction assumption that (i) holds for k.

Consequently, with v^{k+1} = λ_{k+1}u^{k+1} = −λ_{k+1}G_k g^k, it follows for i ≤ k that

(v^i, Av^{k+1}) = −λ_{k+1}(Av^i, G_k g^k) = −λ_{k+1}(G_k Av^i, g^k) = −λ_{k+1}(v^i, g^k) = 0

as a consequence of the induction assumption (ii) and the result above.

Hence, with (v^i, Av^{k+1}) = 0, property (i) holds for k + 1.

Second part: proof that (ii) holds for k + 1. Furthermore, for i ≤ k:

(y^{k+1}, G_k Av^i) = (y^{k+1}, v^i) (from assumption (ii))
= (g^{k+1} − g^k, v^i)
= (A(x^{k+1} − x^k), v^i)
= (Av^{k+1}, v^i) = (v^i, Av^{k+1}) = 0 (from the first part).

Using the update formula (2.16), it follows that

G_{k+1}Av^i = G_k Av^i + v^{k+1}(v^{k+1})^T Av^i/((v^{k+1})^T y^{k+1}) − G_k y^{k+1}(y^{k+1})^T G_k Av^i/((y^{k+1})^T G_k y^{k+1})
= G_k Av^i (because (v^{k+1})^T Av^i = 0 and (y^{k+1})^T G_k Av^i = 0)
= v^i for i ≤ k (from assumption (ii)).


It is still required to show that G_{k+1}Av^{k+1} = v^{k+1}. This can be done, as for the initial step where it was shown that G_1 Av^1 = v^1, by direct substitution. □

Thus it has been shown that the v^k, k = 1, ..., n are mutually conjugate with respect to A, and therefore they are linearly independent and form a basis for R^n. Consequently the DFP method is quadratically terminating, with g^n = 0. Property (ii) also implies that G_n = A^{-1}.

A final interesting result is the following theorem.

Theorem 6.6.3

If the DFP method is applied to f(x) given by (2.7), then

A^{-1} = Σ_{i=1}^{n} v^i v^{iT}/(v^{iT} y^i).

Proof

In the previous proof it was shown that the vectors v^i, i = 1, 2, ..., n, are mutually conjugate with respect to A. They therefore form a basis in R^n. Also

v^i v^{iT}/(v^{iT} y^i) = v^i v^{iT}/(v^{iT}(g^i − g^{i−1})) = v^i v^{iT}/(v^{iT} Av^i).

Let

B = Σ_{i=1}^{n} v^i v^{iT}/(v^{iT} Av^i).

Then

BAx = Σ_{i=1}^{n} (v^i v^{iT}/(v^{iT} Av^i)) Ax = Σ_{i=1}^{n} (v^{iT} Ax/(v^{iT} Av^i)) v^i = x

as a consequence of Theorem 6.5.1.

This result holds for arbitrary x ∈ R^n and hence BA = I, where I is the identity matrix, and it follows that B = A^{-1}, which completes the proof. □
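Both Theorem 6.6.2 (G_n = A^{-1}) and Theorem 6.6.3 can be verified numerically with a minimal DFP sketch on a hypothetical quadratic (exact line searches, update formula as quoted in the text; the data are chosen only for illustration):

```python
import numpy as np

# DFP sketch on a quadratic f(x) = 0.5 x^T A x + b^T x (hypothetical data):
# after n steps G_n should equal A^{-1}, and the increments
# v v^T / (v^T y) should sum to A^{-1} as in Theorem 6.6.3.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

x = np.zeros(2)
G = np.eye(2)
S = np.zeros((2, 2))              # accumulates sum of v v^T / (v^T y)
g = A @ x + b
for _ in range(2):                # n = 2 steps
    u = -G @ g                    # DFP search direction
    lam = -(g @ u) / (u @ A @ u)  # exact line search
    v = lam * u                   # v^{k+1} = x^{k+1} - x^k
    x = x + v
    g_new = A @ x + b
    y = g_new - g                 # y^{k+1} = g^{k+1} - g^k
    G = (G + np.outer(v, v) / (v @ y)
           - np.outer(G @ y, G @ y) / (y @ G @ y))   # DFP update
    S += np.outer(v, v) / (v @ y)
    g = g_new

print(np.allclose(G, np.linalg.inv(A)), np.allclose(S, np.linalg.inv(A)))
```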


Appendix A

THE SIMPLEX METHOD FOR LINEAR PROGRAMMING PROBLEMS

A.1 Introduction

This introduction to the simplex method is along the lines given by Chvátal (1983).

Here consider the maximization problem:

maximize Z = c^T x such that Ax ≤ b, A an m × n matrix,

x_i ≥ 0, i = 1, 2, ..., n.

Note that Ax ≤ b is equivalent to Σ_{i=1}^{n} a_{ji}x_i ≤ b_j, j = 1, 2, ..., m.

Introduce slack variables x_{n+1}, x_{n+2}, ..., x_{n+m} ≥ 0 to transform the inequality constraints into equality constraints:

Σ_{i=1}^{n} a_{ji}x_i + x_{n+j} = b_j, j = 1, 2, ..., m, (A.2)

where x = [x_1, x_2, ..., x_{n+m}]^T, b = [b_1, b_2, ..., b_m]^T, and x_1, x_2, ..., x_n ≥ 0 are the original decision variables and x_{n+1}, x_{n+2}, ..., x_{n+m} ≥ 0 the slack variables.

Now assume that b_i ≥ 0 for all i. To start the process, an initial feasible solution is then given by

x_{n+j} = b_j, j = 1, 2, ..., m, with x_1 = x_2 = ... = x_n = 0.

In this case we have a feasible origin.

We now write system (A.2) in the so-called standard tableau format:

x_{n+j} = b_j − Σ_{i=1}^{n} a_{ji}x_i ≥ 0, j = 1, 2, ..., m
Z = Σ_{i=1}^{n} c_i x_i

The left side contains the basic variables, in general ≠ 0, and the right side the nonbasic variables, all = 0. The last line Z denotes the objective function (in terms of the nonbasic variables).


SIMPLEX METHOD FOR LP PROBLEMS 235

In a more general form the tableau can be written as

x_{Bi} = b̄_i − Σ_{j=1}^{n} ā_{ij}x_{Nj} ≥ 0, i = 1, 2, ..., m
Z = Z̄ + Σ_{j=1}^{n} c_{Nj}x_{Nj} (A.4)

The ≥ 0 at the right serves to remind us that x_{Bi} ≥ 0 is a necessary condition, even when the values of the x_{Nj} change from their zero values.

Here x_B denotes the vector of basic variables and x_N the vector of nonbasic variables; together they represent a basic feasible solution.

A.2 Pivoting for increase in objective function

Clearly if any c_{Np} > 0, then Z increases if we increase x_{Np}, with the other x_{Ni} = 0, i ≠ p. Assume further that c_{Np} > 0 and c_{Np} ≥ c_{Ni}, i = 1, 2, ..., n; then we decide to increase x_{Np}. But x_{Np} cannot be increased indefinitely because of the constraint x_B ≥ 0 in tableau (A.4). Every entry i in the tableau with ā_{ip} > 0 yields a constraint on x_{Np} of the form

x_{Np} ≤ b̄_i/ā_{ip} = d_i.

Assume now that i = k yields the strictest constraint; then let x_{Bk} = 0 and x_{Np} = d_k. Now x_{Bk}(= 0) is the outgoing variable (out of the base), and x_{Np} = d_k(≠ 0) the incoming variable. The k-th entry in tableau (A.4) changes to

x_{Np} = d_k − Σ_{i≠k} ā_{ki}x_{Ni} − ā_{kk}x_{Bk} (A.6)

with ā_{ki} = a_{ki}/a_{kp} and ā_{kk} = 1/a_{kp}.

Replace x_{Np} by (A.6) in each of the remaining m − 1 entries in tableau (A.4), as well as in the objective function Z. With (A.6) as the first entry this gives the new tableau in terms of the new basic variables

x_{B1}, x_{B2}, ..., x_{Np}, ..., x_{Bm} (left side) ≠ 0


and nonbasic variables:

and nonbasic variables

x_{N1}, x_{N2}, ..., x_{Bk}, ..., x_{Nn} (right side) = 0.

As x_{Np} has increased by d_k, the objective function has also increased by c_{Np}d_k. The objective function (last) entry is thus of the form

Z = c_{Np}d_k + Σ_{i=1}^{n} c_{Ni}x_{Ni},

where the x_{Ni} now denote the new nonbasic variables and the c_{Ni} the new associated coefficients. This completes the first pivoting iteration.

Repeat the procedure above until a tableau is obtained such that

c_{Ni} ≤ 0, i = 1, 2, ..., n.

The optimal value of the objective function is then Z = Z* (no further increase is possible).

A.3 Example

maximize Z = 5x_1 + 4x_2 + 3x_3 such that

2x_1 + 3x_2 + x_3 ≤ 5
4x_1 + x_2 + 2x_3 ≤ 11
3x_1 + 4x_2 + 2x_3 ≤ 8
x_1, x_2, x_3 ≥ 0.

This problem has a feasible origin. Introduce slack variables x_4, x_5 and x_6; the first tableau is then given by:

x_4 = 5 − 2x_1 − 3x_2 − x_3 ≥ 0        x_1 ≤ 5/2 (s)
x_5 = 11 − 4x_1 − x_2 − 2x_3 ≥ 0       x_1 ≤ 11/4
x_6 = 8 − 3x_1 − 4x_2 − 2x_3 ≥ 0       x_1 ≤ 8/3
Z = 5x_1 + 4x_2 + 3x_3

Here 5 > 0 and 5 > 4 > 3. Choose thus x_1 as incoming variable. To find the outgoing variable, calculate the constraints on x_1 for all the entries


(see right side). The strictest (s) constraint is given by the first entry, and thus the outgoing variable is x4. The first entry in the next tableau is

x_1 = 5/2 − (3/2)x_2 − (1/2)x_3 − (1/2)x_4.

Replace this expression for x_1 in all other entries to find the next tableau. After simplification we obtain the second tableau in standard format:

x_1 = 5/2 − (3/2)x_2 − (1/2)x_3 − (1/2)x_4 ≥ 0       x_3 ≤ 5
x_5 = 1 + 5x_2 + 2x_4 ≥ 0                            no bound
x_6 = 1/2 + (1/2)x_2 − (1/2)x_3 + (3/2)x_4 ≥ 0       x_3 ≤ 1 (s)
Z = 25/2 − (7/2)x_2 + (1/2)x_3 − (5/2)x_4

This completes the first iteration. For the next step it is clear that x_3 is the incoming variable, and consequently the outgoing variable is x_6. The first entry for the next tableau is thus x_3 = 1 + x_2 + 3x_4 − 2x_6 (3rd entry in the previous tableau).

Replace this expression for x_3 in all the remaining entries of tableau II. After simplification we obtain the third tableau:

x_3 = 1 + x_2 + 3x_4 − 2x_6 ≥ 0
x_1 = 2 − 2x_2 − 2x_4 + x_6 ≥ 0
x_5 = 1 + 5x_2 + 2x_4 ≥ 0
Z = 13 − 3x_2 − x_4 − x_6

In the last entry all the coefficients of the nonbasic variables are negative. Consequently it is not possible to obtain a further increase in Z by increasing one of the nonbasic variables. The optimal value of Z is thus Z* = 13, with

x_1* = 2; x_2* = 0; x_3* = 1.
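The pivoting rules of Section A.2 can be collected into a compact array-based sketch (a simplified illustration, not the book's tableau layout: largest-coefficient entering rule and the ratio test for the leaving row) and applied to an LP of the form above:

```python
import numpy as np

# Tableau simplex sketch for: maximize c^T x s.t. A x <= b, x >= 0.
# Applied below to: maximize 5x1 + 4x2 + 3x3
# s.t. 2x1+3x2+x3 <= 5, 4x1+x2+2x3 <= 11, 3x1+4x2+2x3 <= 8.
def simplex_max(c, A, b):
    m, n = A.shape
    # Rows = constraints with slacks appended, last row = -objective.
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n], T[:m, n:n + m], T[:m, -1] = A, np.eye(m), b
    T[-1, :n] = -np.asarray(c, dtype=float)
    basis = list(range(n, n + m))             # slacks start basic
    while True:
        p = int(np.argmin(T[-1, :-1]))        # entering column
        if T[-1, p] >= -1e-12:                # no negative cost: optimal
            break
        ratios = [T[i, -1] / T[i, p] if T[i, p] > 1e-12 else np.inf
                  for i in range(m)]
        k = int(np.argmin(ratios))            # leaving row (strictest bound)
        T[k] /= T[k, p]                       # pivot on (k, p)
        for i in range(m + 1):
            if i != k:
                T[i] -= T[i, p] * T[k]
        basis[k] = p
    x = np.zeros(n + m)
    for i, j in enumerate(basis):
        x[j] = T[i, -1]
    return x[:n], T[-1, -1]

x, z = simplex_max([5, 4, 3],
                   np.array([[2.0, 3, 1], [4, 1, 2], [3, 4, 2]]),
                   np.array([5.0, 11, 8]))
print(x, z)   # -> [2, 0, 1], 13
```

This sketch assumes a feasible origin (b ≥ 0) and no cycling, exactly as in Section A.3; the two-phase extension of Section A.4 would be needed otherwise.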


Assignment A.1

Solve by using the simplex method:

maximize Z = 3x_1 + 2x_2 + 4x_3 such that

A.4 The auxiliary problem for problem with in- feasible origin

In the previous example it was possible to find the solution using the simplex method only because b_i ≥ 0 for all i, so that the initial solution x_i = 0, i = 1, 2, ..., n with x_{n+j} = b_j, j = 1, 2, ..., m was feasible; that is, the origin is a feasible initial solution.

If the LP problem does not have a feasible origin, we first solve the so-called auxiliary problem:

Phase 1:

maximize W = −x_0

such that Σ_{i=1}^{n} a_{ji}x_i − x_0 ≤ b_j, j = 1, 2, ..., m (A.7)
x_i ≥ 0, i = 0, 1, 2, ..., n,

where x_0 is called the new artificial variable. By setting x_i = 0 for i = 1, 2, ..., n and choosing x_0 large enough, we can always find a feasible solution.

The original problem clearly has a feasible solution if and only if the auxiliary problem has a feasible solution with x_0 = 0 or, in other words, the original problem has a feasible solution if and only if the optimal value of the auxiliary problem is zero. The original problem is then solved using the simplex method, as described in the previous sections. This solution is called Phase 2.


A.5 Example of auxiliary problem solution

Consider the LP:

maximize Z = x_1 − x_2 + x_3 such that

2x_1 − x_2 + 2x_3 ≤ 4
2x_1 − 3x_2 + x_3 ≤ −5
−x_1 + x_2 − 2x_3 ≤ −1
x_1, x_2, x_3 ≥ 0.

Clearly this problem does not have a feasible origin.

We first perform Phase 1. Consider the auxiliary problem:

maximize W = −x_0 such that

2x_1 − x_2 + 2x_3 − x_0 ≤ 4
2x_1 − 3x_2 + x_3 − x_0 ≤ −5
−x_1 + x_2 − 2x_3 − x_0 ≤ −1
x_i ≥ 0, i = 0, 1, 2, 3.

Introduce the slack variables x_4, x_5 and x_6, which gives the tableau (not yet in standard form):

x_4 = 4 − 2x_1 + x_2 − 2x_3 + x_0 ≥ 0
x_5 = −5 − 2x_1 + 3x_2 − x_3 + x_0 ≥ 0
x_6 = −1 + x_1 − x_2 + 2x_3 + x_0 ≥ 0
W = −x_0

This is not in standard form, as x_0 on the right is not zero. With x_1 = x_2 = x_3 = 0 we have x_4, x_5, x_6 ≥ 0 if x_0 ≥ max{−4; 5; 1}.

Choose x_0 = 5, as prescribed by the second (strictest) entry. This gives x_5 = 0, x_4 = 9 and x_6 = 4. Thus x_0 becomes a basic variable (≠ 0) and x_5 a nonbasic variable. The first standard tableau can now be written as

x_0 = 5 + 2x_1 − 3x_2 + x_3 + x_5 ≥ 0         x_2 ≤ 5/3
x_4 = 9 − 2x_2 − x_3 + x_5 ≥ 0                x_2 ≤ 9/2
x_6 = 4 + 3x_1 − 4x_2 + 3x_3 + x_5 ≥ 0        x_2 ≤ 1 (s)
W = −5 − 2x_1 + 3x_2 − x_3 − x_5


Now apply the simplex method. From the last entry it is clear that W increases as x_2 increases. Thus x_2 is the incoming variable. With the strictest bound x_2 ≤ 1, as prescribed by the third entry, the outgoing variable is x_6. The second tableau is given by:

x_2 = 1 + 0.75x_1 + 0.75x_3 + 0.25x_5 − 0.25x_6 ≥ 0       no bound
x_0 = 2 − 0.25x_1 − 1.25x_3 + 0.25x_5 + 0.75x_6 ≥ 0       x_3 ≤ 1.6 (s)
x_4 = 7 − 1.5x_1 − 2.5x_3 + 0.5x_5 + 0.5x_6 ≥ 0           x_3 ≤ 2.8
W = −2 + 0.25x_1 + 1.25x_3 − 0.25x_5 − 0.75x_6

The new incoming variable is x_3 and the outgoing variable x_0. Perform the necessary pivoting and simplify. The next tableau is then given by:

x_3 = 1.6 − 0.2x_1 + 0.2x_5 + 0.6x_6 − 0.8x_0 ≥ 0
x_2 = 2.2 + 0.6x_1 + 0.4x_5 + 0.2x_6 − 0.6x_0 ≥ 0
x_4 = 3 − x_1 − x_6 + 2x_0 ≥ 0
W = −x_0

As the coefficient of x_0 in the last entry is negative, no further increase in W is possible. Also, as x_0 = 0, the solution

x_1 = 0, x_2 = 2.2, x_3 = 1.6, x_4 = 3, x_5 = x_6 = 0

corresponds to a feasible solution of the original problem. This means that the first phase has been completed.

The initial tableau for Phase 2 is simply the above tableau I11 without

Page 259: PRACTICAL MATHEMATICAL OPTIMIZATION · 3.2 Classical methods for constrained optimization problems ..... 62 3.2.1 Equality ... years, in teaching Practical Mathematical Optimization

SIMPLEX METHOD FOR LP PROBLEMS 24 1

the xo terms and with the objective function given by:

in terms of nonbasic variables.

Thus the initial tableau for the original problem is:

x3 = 1.6 - 0.2x1 + 0.2x5 + 0.6x6 >= 0   no bound
x2 = 2.2 + 0.6x1 + 0.4x5 + 0.2x6 >= 0   no bound

Perform the remaining iterations to find the final solution (the next incoming variable is x6 with outgoing variable x4).
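The two phases can also be checked numerically. The sketch below uses scipy.optimize.linprog; the constraint data is a reconstruction consistent with the values quoted above (slacks x4 = 9, x5 = 0, x6 = 4 at x0 = 5, and the Phase 2 values 1.6 and 2.2), so treat it as illustrative rather than a reproduction of the printed tableau.

```python
import numpy as np
from scipy.optimize import linprog

# Reconstructed data for the worked example above (illustrative):
A = np.array([[ 2.0, -1.0,  2.0],
              [ 2.0, -3.0,  1.0],
              [-1.0,  1.0, -2.0]])
b = np.array([4.0, -5.0, -1.0])   # negative entries: origin infeasible
c = np.array([1.0, -1.0, 1.0])    # Z = x1 - x2 + x3

# Phase 1: maximize W = -x0 subject to A x - x0 <= b,
# i.e. minimize x0 over (x, x0) >= 0.
A_aux = np.hstack([A, -np.ones((3, 1))])
c_aux = np.array([0.0, 0.0, 0.0, 1.0])
phase1 = linprog(c_aux, A_ub=A_aux, b_ub=b, bounds=[(0, None)] * 4)
assert phase1.status == 0 and phase1.fun < 1e-8  # optimal x0 = 0: feasible

# Phase 2: maximize Z (linprog minimizes, so pass -c).
phase2 = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 3)
assert phase2.status == 0
print(round(-phase2.fun, 4))  # optimal Z for this reconstructed data: 0.6
```

For this data the optimum is Z = 0.6 at (x1, x2, x3) = (0, 2.8, 3.4), which is consistent with continuing the tableau iterations above (x6 enters, x4 leaves).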

Assignment A.2

Solve the following problem using the two-phase simplex method:

maximize Z = 3x1 + x2 such that

A.6 Degeneracy

A further complication that may occur is degeneracy: it is possible that there is more than one candidate for the outgoing variable. Consider, for example, the following tableau:


With x3 the incoming variable there are three candidates, x4, x5 and x6, for the outgoing variable. Choose x4 arbitrarily as the outgoing variable. Then the tableau is:

x3 = 0.5 - 0.5x4 >= 0   no bound on x1
x5 = -2x1 + 4x2 + 3x4 >= 0   x1 <= 0 (s)

This tableau differs from the previous tableaus in one important way: two basic variables have the value zero. A basic feasible solution for which one or more of the basic variables are zero is called a degenerate solution. This may have bothersome consequences. For example, for the next iteration in our example, with x1 as the incoming variable and x5 the outgoing variable, there is no increase in the objective function. Such an iteration is called a degenerate iteration. Test the further application of the simplex method to the example for yourself. Usually the stalemate is resolved after a few degenerate iterations and the method proceeds to the optimal solution.
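How a degenerate solution arises can be seen directly in the ratio test. The numbers below are hypothetical, chosen only to force a tie; they are not taken from the tableau above.

```python
import numpy as np

# Ratio test with a tie: both basic variables reach zero at the same t.
xB = np.array([6.0, 3.0])   # current values of the basic variables
d = np.array([2.0, 1.0])    # direction column obtained from the pivot
ratios = xB / d             # candidate bounds on the incoming variable
t = ratios.min()            # t = 3.0, attained by BOTH rows
assert np.count_nonzero(np.isclose(ratios, t)) == 2

# Whichever variable is chosen to leave, the other remains basic at
# value zero, so the next basic feasible solution is degenerate:
print(xB - t * d)           # [0. 0.]
```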

In some very exotic cases it may happen that the stalemate is not resolved and the method gets stuck in an infinite loop without any progress towards a solution; so-called cycling then occurs. More information on this phenomenon can be obtained in the book by Chvátal (1983).

A.7 The revised simplex method

The revised simplex method (RSM) is equivalent to the ordinary simplex method in terms of tableaus, except that matrix algebra is used for the calculations and that the method is, in general, faster for large and sparse systems. For these reasons modern computer programs for LP problems always use the RSM.


We introduce the necessary terminology and then give the algorithm for an iteration of the RSM. From an analysis of the RSM it is clear that the algorithm corresponds in essence with the tableau simplex method. The only differences occur in the way in which the calculations are performed to obtain the incoming and outgoing variables, and the new basic feasible solution. In the RSM two linear systems are solved in each iteration. In practice special factorizations are applied to find these solutions in an economical way. Again see Chvátal (1983).

Consider the LP problem:

maximize Z = c^T x   (A.8)

such that Ax <= b, with A an m x n matrix and x >= 0.

After introducing the slack variables the constraints can be written as:

Ax = b,  x >= 0   (A.9)

where x includes the slack variables.
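This rewriting can be sketched in numpy: the slack columns form an identity block appended to A, and the slacks get zero objective coefficients. The helper name add_slacks is ours.

```python
import numpy as np

def add_slacks(A, b, c):
    """Rewrite max c^T x, A x <= b, x >= 0 as A_std x = b, x >= 0,
    as in (A.9), by appending one slack variable per constraint."""
    m, n = A.shape
    A_std = np.hstack([A, np.eye(m)])         # slack columns: identity block
    c_std = np.concatenate([c, np.zeros(m)])  # slacks do not enter Z
    return A_std, b, c_std

A_std, b, c_std = add_slacks(np.array([[1.0, 2.0], [3.0, 1.0]]),
                             np.array([4.0, 6.0]),
                             np.array([1.0, 1.0]))
# At x = 0 the slacks equal b, so b >= 0 gives a basic feasible origin.
assert np.allclose(A_std @ np.array([0.0, 0.0, 4.0, 6.0]), b)
```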

Assume that a basic feasible solution is available. Then, if x_B denotes the m basic variables and x_N the n nonbasic variables, (A.9) can be written as:

A_B x_B + A_N x_N = b   (A.10)

where A_B is an m x m matrix and A_N an m x n matrix.

The objective function Z can be written as

Z = c_B^T x_B + c_N^T x_N   (A.11)

where c_B and c_N are respectively the basic and nonbasic coefficient vectors.

It can be shown that A_B is always non-singular. It thus follows that

x_B = A_B^{-1} b - A_B^{-1} A_N x_N.   (A.12)

Expression (A.12) clearly corresponds, in matrix form, to the first m entries of the ordinary simplex tableau, while the objective function


entry is given by

Z = c_B^T A_B^{-1} b + (c_N^T - c_B^T A_B^{-1} A_N) x_N.   (A.13)

Denote the basis matrix A_B by B. The complete tableau is then given by:

We now give the RSM in terms of the matrix notation introduced above. A careful study of the algorithm will show that it corresponds exactly to the tableau method which we developed by way of introduction.

A.8 An iteration of the RSM

(Chvatal, 1983)

Step 1: Solve the following system:

y^T B = c_B^T. This gives y^T = c_B^T B^{-1}.

Step 2: Choose an incoming column. This is any column a of A_N such that y^T a is less than the corresponding component of c_N.

(See (A.13): Z = Z* + (c_N^T - y^T A_N) x_N.)

If no such column exists, the current solution is optimal.

Step 3: Solve the following system:

B d = a. This gives d = B^{-1} a.

(From (A.12) it follows that x_B = x_B* - t d >= 0, where t is the value of the incoming variable.)

Step 4: Find the largest value of t such that

x_B* - t d >= 0.


If no such t exists, then the problem is unbounded; otherwise at least one component of x_B* - t d will equal zero and the corresponding variable is the outgoing variable.

Step 5: Set the incoming variable (the new basic variable) equal to t, set the values of the remaining basic variables to x_B* - t d, and exchange the outgoing column in B with the incoming column a of A_N.
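Steps 1-5 can be collected into a single numpy sketch. The dense np.linalg.solve calls stand in for the special factorizations mentioned above, the basis bookkeeping is left to the caller, and the function name and tolerance are our own.

```python
import numpy as np

def rsm_iteration(B, AN, cB, cN, xB, tol=1e-12):
    """One revised simplex iteration for max c^T x, A x = b, x >= 0.
    B: m x m basis matrix, AN: nonbasic columns, xB: basic solution."""
    # Step 1: solve y^T B = c_B (pricing vector).
    y = np.linalg.solve(B.T, cB)
    # Step 2: incoming column: any a of AN with y^T a < c_N component.
    reduced = cN - AN.T @ y
    j = int(np.argmax(reduced))
    if reduced[j] <= tol:
        return None                        # current solution is optimal
    a = AN[:, j]
    # Step 3: solve B d = a.
    d = np.linalg.solve(B, a)
    # Step 4: ratio test - largest t with xB - t d >= 0.
    pos = d > tol
    if not pos.any():
        raise ValueError("problem is unbounded")
    ratios = np.where(pos, xB / np.where(pos, d, 1.0), np.inf)
    i = int(np.argmin(ratios))
    t = float(ratios[i])
    # Step 5: basis position i now holds the incoming variable at value t;
    # the caller swaps column i of B with column a of AN.
    xB_new = xB - t * d
    xB_new[i] = t
    return j, i, t, xB_new

# Tiny check: max 3x1 + 2x2 s.t. x1 + x2 <= 4, x1 <= 2, starting from
# the slack basis (B = I). The first iteration brings in x1 (j = 0),
# drops the second slack (i = 1), with t = 2.
B = np.eye(2)
AN = np.array([[1.0, 1.0], [1.0, 0.0]])
result = rsm_iteration(B, AN, np.zeros(2), np.array([3.0, 2.0]),
                       np.array([4.0, 2.0]))
```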


Bibliography

J.S. Arora. Introduction to Optimal Design. McGraw-Hill, New York, 1989.

J.S. Arora, O.A. El-wakeil, A.I. Chahande, and C.C. Hsieh. Global optimization methods for engineering applications: A review. Preprint, 1995.

J. Barhen, V. Protopopescu, and D. Reister. TRUST: A deterministic algorithm for global optimization. Science, 276:1094-1097, May 1997.

M.S. Bazaraa, H.D. Sherali, and C.M. Shetty. Nonlinear Programming, Theory and Algorithms. John Wiley, New York, 1993.

D.P. Bertsekas. Multiplier methods: A survey. Automatica, 12:133-145, 1976.

F.H. Branin. Widely used convergent methods for finding multiple solutions of simultaneous equations. IBM Journal of Research and Development, pages 504-522, 1972.

F.H. Branin and S.K. Hoo. A method for finding multiple extrema of a function of n variables. In F.A. Lootsma, editor, Numerical Methods of Nonlinear Optimization, pages 231-237, London, 1972. Academic Press.

V. Chvátal. Linear Programming. W.H. Freeman, New York, 1983.

G. Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, 1963.

W.C. Davidon. Variable metric method for minimization. R and D Report ANL-5990D (Rev.), Atomic Energy Commission, 1959.


E. de Klerk and J.A. Snyman. A feasible descent cone method for linearly constrained minimization problems. Computers and Mathematics with Applications, 28:33-44, 1994.

L.C.W. Dixon, J. Gomulka, and S.E. Hersom. Reflections on the global optimization problem. In L.C.W. Dixon, editor, Optimization in Action, pages 398-435, London, 1976. Academic Press.

L.C.W. Dixon, J. Gomulka, and G.P. Szego. Towards a global optimization technique. In L.C.W. Dixon and G.P. Szego, editors, Towards Global Optimization, pages 29-54, Amsterdam, 1975. North-Holland.

L.C.W. Dixon and G.P. Szego. The global optimization problem: An introduction. In L.C.W. Dixon and G.P. Szego, editors, Towards Global Optimization 2, pages 1-15, Amsterdam, 1978. North-Holland.

J. Farkas and K. Jarmai. Analysis and Optimum Design of Metal Struc- tures. A.A. Balkema, Rotterdam, 1997.

A.V. Fiacco and G.P. McCormick. Nonlinear Programming. John Wiley, New York, 1968.

R. Fletcher. An ideal penalty function for constrained optimization. Journal of the Institute of Mathematics and its Applications, 15:319-342, 1975.

R. Fletcher. Practical Methods of Optimization. John Wiley, Chichester, 1987.

R. Fletcher and C.M. Reeves. Function minimization by conjugate gradients. Computer Journal, 7:149-154, 1964.

C. Fleury. Structural weight optimization by dual methods of convex programming. International Journal for Numerical Methods in Engineering, 14:1761-1783, 1979.

J.C. Gilbert and J. Nocedal. Global convergence properties of conjugate gradient methods. SIAM Journal on Optimization, 2:21-42, 1992.

A.O. Griewank. Generalized descent for global optimization. Journal of Optimization Theory and Applications, 34:11-39, 1981.

A.A. Groenwold and J.A. Snyman. Global optimization using dynamic search trajectories. Journal of Global Optimization, 24:51-60, 2002.


R.T. Haftka and Z. Gürdal. Elements of Structural Optimization. Kluwer, Dordrecht, 1992.

M.R. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4:303-320, 1969.

D.M. Himmelblau. Applied Nonlinear Programming. McGraw-Hill, New York, 1972.

W. Hock and K. Schittkowski. Test Examples for Nonlinear Programming Codes. Lecture Notes in Economics and Mathematical Systems, No. 187. Springer-Verlag, Berlin, Heidelberg, New York, 1981.

N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373-395, 1984.

W. Karush. Minima of functions of several variables with inequalities as side conditions. Master's thesis, Department of Mathematics, University of Chicago, 1939.

J. Kiefer. Optimal sequential search and approximation methods under minimum regularity conditions. SIAM Journal of Applied Mathematics, 5:105-136, 1957.

H.W. Kuhn and A.W. Tucker. Nonlinear programming. In J. Neyman, editor, Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, 1951.

S. Lucidi and M. Piccioni. Random tunneling by means of acceptance-rejection sampling for global optimization. Journal of Optimization Theory and Applications, 62:255-277, 1989.

A.I. Manevich. Perfection of the conjugate directions method for unconstrained minimization problems. In Third World Congress of Structural and Multidisciplinary Optimization, volume 1, Buffalo, New York, 17-21 May 1999.

J.J. Moré and D.J. Thuente. Line search algorithms with guaranteed sufficient decrease. ACM Transactions on Mathematical Software, 20:286-307, 1994.

J.A. Nelder and R. Mead. A simplex method for function minimization. Computer Journal, 7:308-313, 1965.


J. Nocedal and S.J. Wright. Numerical Optimization. Springer-Verlag, New York, 1999.

P.Y. Papalambros and D.J. Wilde. Principles of Optimal Design: Modeling and Computation. Cambridge University Press, Cambridge, 2000.

M.J.D. Powell. An efficient method for finding the minimum of a function of several variables without calculating derivatives. Computer Journal, 7:155-162, 1964.

S.S. Rao. Engineering Optimization, Theory and Practice. John Wiley, New York, 1996.

J.B. Rosen. The gradient projection method for nonlinear programming, Part I: Linear constraints. SIAM Journal of Applied Mathematics, 8:181-217, 1960.

J.B. Rosen. The gradient projection method for nonlinear programming, Part II: Nonlinear constraints. SIAM Journal of Applied Mathematics, 9:514-553, 1961.

F. Schoen. Stochastic techniques for global optimization: A survey of recent advances. Journal of Global Optimization, 1:207-228, 1991.

J.A. Snyman. A gradient-only line search method for the conjugate gradient method applied to constrained optimization problems with severe noise in the objective function. Technical report, Department of Mechanical Engineering, University of Pretoria, Pretoria, South Africa, 2003. Also accepted for publication in the International Journal for Numerical Methods in Engineering, 2004.

J.A. Snyman. A new and dynamic method for unconstrained minimization. Applied Mathematical Modelling, 6:449-462, 1982.

J.A. Snyman. An improved version of the original leap-frog method for unconstrained minimization. Applied Mathematical Modelling, 7:216-218, 1983.

J.A. Snyman. Unconstrained minimization by combining the dynamic and conjugate gradient methods. Quaestiones Mathematicae, 8:33-42, 1985.


J.A. Snyman. The LFOPC leap-frog algorithm for constrained optimization. Computers and Mathematics with Applications, 40:1085-1096, 2000.

J.A. Snyman and L.P. Fatti. A multi-start global minimization algorithm with dynamic search trajectories. Journal of Optimization Theory and Applications, 54:121-141, 1987.

J.A. Snyman and A.M. Hay. The spherical quadratic steepest descent method for unconstrained minimization with no explicit line searches. Computers and Mathematics with Applications, 42:169-178, 2001.

J.A. Snyman and A.M. Hay. The Dynamic-Q optimisation method: An alternative to SQP? Computers and Mathematics with Applications, 44:1589-1598, 2002.

J.A. Snyman and N. Stander. A new successive approximation method for optimal structural design. AIAA, 32:1310-1315, 1994.

J.A. Snyman and N. Stander. Feasible descent cone methods for inequality constrained optimization problems. International Journal for Numerical Methods in Engineering, 39:1341-1356, 1996.

J.A. Snyman, N. Stander, and W.J. Roux. A dynamic penalty function method for the solution of structural optimization problems. Applied Mathematical Modelling, 18:435-460, 1994.

N. Stander and J.A. Snyman. A new first order interior feasible direction method for structural optimization. International Journal for Numerical Methods in Engineering, 36:4009-4026, 1993.

K. Svanberg. A globally convergent version of MMA without line search. In G.I.N. Rozvany and N. Olhoff, editors, Proceedings of the 1st World Conference on Structural and Multidisciplinary Optimization, pages 9-16, Goslar, Germany, 1995. Pergamon Press.

K. Svanberg. The MMA for modeling and solving optimization problems. In Proceedings of the 3rd World Conference on Structural and Multidisciplinary Optimization, pages 301-302, Buffalo, New York, 17-21 May 1999.

H. Theil and C. van de Panne. Quadratic programming as an extension of conventional quadratic maximization. Management Science, 7:1-20, 1961.


A. Törn and A. Žilinskas. Global Optimization. Lecture Notes in Computer Science, volume 350. Springer-Verlag, Berlin, 1989.

G.N. Vanderplaats. Numerical Optimization Techniques for Engineering Design. Vanderplaats R & D, Colorado Springs, 1998.

G.R. Walsh. Methods of Optimization. John Wiley, London, 1975.

D.A. Wismer and R. Chattergy. Introduction to Nonlinear Optimization: A Problem Solving Approach. North-Holland, New York, 1978.

R. Zielinski. A statistical estimate of the structure of multiextremal problems. Mathematical Programming, 21:348-356, 1981.


Index

active constraints, 71 active set of constraints, 78 approximation methods, 97

Dynamic-Q, 119 SQSD, 107

asymptotes at a saddle point, 68

augmented Lagrangian function definition, 91

augmented Lagrangian method example problem, 202

auxiliary variables example problems, 181 for inequality constraints, 66

central finite differences to smooth out noise, 134

characterization of functions theorems, 207

characterization of minima theorems, 209

classical steepest descent method revisited, 106

compromised solution no feasible solution, 123

conjugate gradient methods, 43 Fletcher-Reeves directions, 47 ETOPC implementation, 127 Polak-Ribiere directions, 47 theorems, 223

constrained optimization

augmented Lagrangian method, 89

classical Lagrangian function, 62

classical methods, 62 gradient projection method, 81 modern methods, 81 multiplier methods, 89 penalty function methods, 57 standard methods, 57

constraint functions definition, 4

constraint qualification, 71 contour representation

constraint functions, 10 function of two variables, 7 geometrical solution, 151 of constrained problem, 11

convergence of the SQSD method, 108 region of, 146 theorem for mutually conjugate directions, 45

convergence criteria, 42 convexity, 16

convex functions, 18 convex sets, 17 test for, 20

design variables, 1


DFP method characteristics of, 51 theorems, 227

difficulties in practical optimization, 97

directional derivative definition, 21

duality dual function, 74 dual problem, 74 example problems, 191 primal problem, 70 proof of duality theorem, 221 saddle point theorem, 73 theorem, 74

duality theorem practical implication, 7:'

dynamic trajectory method basic dynamic model, 100 LFOP for unconstrained problems, 101 LFOPC for constrained problems, 101 the leap-frog method, 100

Dynamic-Q optimization algorithm introduction, 119 numerical results, 123

equality constrained problems theorems, 211

equality constraint functions definition, 1

feasible contour, 10 feasible descent cone method, 99 feasible region, 10 finite differences

gradient determination, 43

geometrical solution

to optimization problem, 151 global convergence

basic assumption, 146 Bayes theorem, 147 Theorem, 146

global minimization procedure bouncing ball trajectories, 144 dynamic trajectories, 142

global minimum test for, 24

global optimization dynamic search trajectories, 139 general problem, 140 global stopping criterion, 145 modified bouncing ball trajectory, 143 numerical results, 148 Snyman-Fatti trajectory method, 141

golden section method basic algorithm, 37 example problem, 157 one-dimensional line search, 36

gradient projection method example problems, 199 extension to non-linear constraints, 84 introduction, 81

gradient vector definition, 18

gradient-based methods advantages of, 3

gradient-only line search for conjugate gradient methods, 126, 133 lemmas, 129 numerical results, 135


Hessian matrix approximation to, 31 definition, 20 determination of stationary points, 155 diagonal curvature elements, 107 ill-conditioned, 59

inequality constrained problems classical KKT approach, 70

inequality constraint functions definition, 1

inequality constraints as equality constraints, 66

Jacobian of active constraints, 71 rank condition, 62

Karush-Kuhn-Tucker conditions as sufficient conditions, 70 example problems, 184 for inequality constraints, 70 theorems, 215

Lagrangian method example problems, 172 for equality constrained problems, 63

leap-frog method characteristics, 100 dynamic trajectory algorithm, 100

line search golden section method, 36 gradient-only method, 128 Powell's quadratic interpolation algorithm, 38

line search descent methods conjugate gradient method, 43 first order methods, 40 general structure, 34 method of steepest descent, 40 second order methods, 49 unconstrained minimization, 33

linear programming auxiliary problem, 238 degeneracy, 241 graphical representation, 16 pivoting, 235 revised simplex method, 242 simplex method, 233 special class of problems, 14

local minimum necessary conditions, 26 sufficient conditions, 26

mathematical modelling four steps in, 4

mathematical optimization definition, 1 history, 2 nonlinear programming, 2 numerical optimization, 2

mathematical prerequisites, 16 maximization, 14

modern methods gradient projection method, 81 multiplier methods, 89 sequential quadratic programming, 93

multi-variable function local characterization, 24

multiplier method


extension to inequality constraints, 91

multiplier methods augmented Lagrangian methods, 89 mutually conjugate directions, 45

necessary and sufficient conditions for strong local minimum, 26

necessary conditions for equality constrained minimum, 62

Newton's method difficulties with, 29 for computing local minimum, 29 graphical representation of, 30 modified algorithm, 50 quasi-Newton methods, 51 relevance to quadratic programming, 93

noisy objective function numerical results, 136

objective function definition, 1, 4

one-dimensional line search, 35 optimization concepts

basics, 6 optimization problem

formulation of, 4

optimum solution indirect method for computing, 28

optimum vector for optimum function value, 2

penalty function

formulation, 58 illustrative example, 58

penalty function method constrained optimization, 57 example problems, 170 SUMT, 59

Powell's quadratic interpolation method example problem, 159 formal algorithm, 39 one-dimensional line search, 38

quadratic function general n-variable case, 20 with linear equality constraints, 66

quadratic programming definition, 77 example problems, 194 method of Theil and Van de Panne, 78

quadratic termination property of, 44

quasi-Newton methods BFGS algorithm, 52 DFP algorithm, 51

Rosenbrock function, 31

saddle point duality theory, 73 definition, 24 Lagrangian function, 64 stationary conditions, 218

scaling of design variables, 15 Schwartz's inequality, 41 sequential quadratic programming

example problem, 203 SQP, 93


sequential unconstrained minimization technique SUMT, 59

spherical quadratic steepest descent method numerical results, 113 SQSD algorithm, 107

stationary points analysis of, 28 determination of, 154

steepest descent method classical approach, 40 orthogonality of search directions, 42 spherical quadratic approximation approach, 105

stochastic algorithms dynamic search trajectories, 141

sufficient condition for constrained local minimum, 64

SUMT algorithm details of, 60 simple example, 60

Sylvester's theorem, 153

Taylor expansion, 26

test functions unconstrained classical problems, 53 used for SQSD algorithm, 117

trajectory methods, 97

unconstrained minimization line search descent methods, 33 local and global minima, 23 one-dimensional, 6 saddle points, 24

unimodal function definition, 36 representation of, 36

University of Pretoria list of algorithms, 98 optimization research, 98

zero-order methods genetic algorithms, 53 Nelder-Mead algorithm, 52 simulated annealing, 53
