149
CONSTRAINED NONLINEAR PROGRAMMING
We now turn to methods for general constrained nonlinear programming. These may
be broadly classified into two categories:
1. TRANSFORMATION METHODS: In this approach the constrained nonlinear
program is transformed into an unconstrained problem (or more commonly, a series
of unconstrained problems) and solved using one (or some variant) of the algorithms
we have already seen.
Transformation methods may further be classified as
(1) Exterior Penalty Function Methods
(2) Interior Penalty (or Barrier) Function Methods
(3) Augmented Lagrangian (or Multiplier) Methods
2. PRIMAL METHODS: These are methods that work directly on the original
problem.
We shall look at each of these in turn; we first start with transformation methods.
150
Exterior Penalty Function Methods
These methods generate a sequence of infeasible points whose limit is an optimal
solution to the original problem. The original constrained problem is transformed into
an unconstrained one (or a series of unconstrained problems).The constraints are
incorporated into the objective by means of a "penalty parameter" which penalizes
any constraint violation. Consider the problem
Min f(x)
st gj(x) ≤ 0, j=1,2,…,m, hj(x) = 0, j=1,2,…,p, x∈Rn
Define a penalty function p as follows:
p(x) = ∑j=1..m [Max{0, gj(x)}]^α + ∑j=1..p [hj(x)]^α
where α is some positive integer. It is easy to see that
If gj(x) ≤ 0 then Max{0, gj(x)} = 0
If gj(x) > 0 then Max{0, gj(x)} = gj(x) (violation)
Now define the auxiliary function as
P(x) = f(x) + µp(x), where µ >> 0.
Thus if a constraint is violated, p(x) > 0 and a "penalty" of µp(x) is incurred.
151
The (UNCONSTRAINED) problem now is:
Minimize P(x) = f(x) + µp(x)
st x∈Rn
EXAMPLE:
Minimize f(x)=x, st g(x)=2-x ≤ 0, x ∈R (Note that minimum is at x* =2)
Let p(x) = [Max{0, g(x)}]²,
i.e., p(x) = (2-x)² if x < 2, and p(x) = 0 if x ≥ 2.
If p(x) = 0, then the optimal solution to Minimize f(x) + µp(x) is at x* = -∞ and this is
infeasible; so P(x) = f(x) + µp(x) = x + µ(2-x)².
Minimize P(x) = f(x) + µp(x) has optimum solution x* = 2 - 1/(2µ).
NOTE: As µ → ∞, 2 - 1/(2µ) → 2 (= x*)
[Figure: left, the penalty term µp(x) for µ1 = 0.5 and µ2 = 1.5; right, the auxiliary function f(x) + µp(x) for the same two µ values.]
152
In general:
1. We convert the constrained problem to an unconstrained one by using the
penalty function.
2. The solution to the unconstrained problem can be made arbitrarily close to that
of the original one by choosing sufficiently large µ.
In practice, if µ is very large, too much emphasis is placed on feasibility, and
algorithms for unconstrained optimization will often stop prematurely because the step
sizes required for improvement become very small.
Usually, we solve a sequence of problems with successively increasing µ values; the
optimal point from one iteration becomes the starting point for the next problem.
PROCEDURE: Choose the following initially:
1. a tolerance ε,
2. an “increase” factor β,
3. a starting point x1, and
4. an initial µ1.
At Iteration i
1. Solve the problem Minimize f(x) + µip(x); st x∈X (usually Rn). Use xi as the
starting point and let the optimal solution be xi+1.
2. If µip(xi+1) ≤ ε then STOP; else let µi+1 = βµi and start iteration (i+1)
153
EXAMPLE: Minimize f(x) = x1² + 2x2²
st g(x) = 1 - x1 - x2 ≤ 0; x∈R2
Define the penalty function
p(x) = (1 - x1 - x2)² if g(x) > 0, and p(x) = 0 if g(x) ≤ 0.
The unconstrained problem is
Minimize x1² + 2x2² + µp(x)
If p(x) = 0, then the optimal solution is x* = (0,0): INFEASIBLE!
∴ p(x) = (1 - x1 - x2)², ⇒ P(x) = x1² + 2x2² + µ(1 - x1 - x2)², and the necessary
conditions for the optimal solution (∇P(x) = 0) yield the following:
∂P/∂x1 = 2x1 + 2µ(1 - x1 - x2)(-1) = 0,
∂P/∂x2 = 4x2 + 2µ(1 - x1 - x2)(-1) = 0
Thus x1* = 2µ/(2 + 3µ) and x2* = µ/(2 + 3µ)
Starting with µ =0.1, β=10 and x1=(0,0) and using a tolerance of 0.001 (say), we have
the following:
Iter. (i)   µi      xi+1               g(xi+1)    µip(xi+1)
1           0.1     (0.087, 0.043)     0.87       0.0757
2           1.0     (0.4, 0.2)         0.40       0.16
3           10      (0.625, 0.3125)    0.0625     0.039
4           100     (0.6622, 0.3311)   0.0067     0.00449
5           1000    (0.666, 0.333)     0.001      0.001
Thus the optimal solution is x*= (2/3, 1/3).
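The sequence in this table is easy to reproduce numerically. Below is a minimal sketch of the procedure for this example (an illustration, not part of the original notes; it assumes SciPy's BFGS routine as the unconstrained minimizer, but any of the methods seen earlier would do):

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0]**2 + 2*x[1]**2            # objective

def g(x):
    return 1 - x[0] - x[1]                 # constraint g(x) <= 0

def p(x):
    return max(0.0, g(x))**2               # exterior penalty term (alpha = 2)

mu, beta, eps = 0.1, 10.0, 1e-3            # initial mu, increase factor, tolerance
x = np.zeros(2)                            # starting point (0, 0)
while True:
    # minimize the auxiliary function P(x) = f(x) + mu*p(x), warm-started at x
    x = minimize(lambda y: f(y) + mu * p(y), x, method="BFGS").x
    if mu * p(x) <= eps:                   # stopping rule from the procedure above
        break
    mu *= beta                             # increase the penalty parameter
print(x)                                   # -> approximately (2/3, 1/3)
```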
154
INTERIOR PENALTY (BARRIER) FUNCTION METHODS
These methods also transform the original problem into an unconstrained one;
however the barrier functions prevent the current solution from ever leaving the
feasible region. These methods require that the interior of the feasible set be nonempty, which
is impossible if equality constraints are present. Therefore, they are used with
problems having only inequality constraints.
A barrier function B is one that is continuous and nonnegative over the interior
of {x | g(x)≤0}, i.e., over the set {x | g(x)<0}, and approaches ∞ as the boundary is
approached from the interior.
Let φ(y) ≥ 0 if y < 0, with lim y→0⁻ φ(y) = ∞.
Then B(x) = ∑j=1..m φ(gj(x))
Usually,
B(x) = -∑j 1/gj(x)   or   B(x) = -∑j log(-gj(x))
In both cases, note that lim gj(x)→0⁻ B(x) = ∞
The auxiliary function is now
f(x) + µB(x)
where µ is a SMALL positive number.
155
Q. Why should µ be SMALL?
A. Ideally we would like B(x)=0 if gj(x)<0 and B(x)=∞ if gj(x)=0, so that we never
leave the region{x | g(x) ≤ 0}. However, B(x) is now discontinuous. This causes
serious computational problems during the unconstrained optimization.
Consider the same problem as the one we looked at with (exterior) penalty functions:
Minimize f(x) = x, st -x+2 ≤ 0
Define the barrier function B(x) = -1/(-x+2)
So the auxiliary function is f(x) + µB(x) = x - µ/(-x+2), which has its optimum solution
at x* = 2 + √µ, and as µ→0, x* → 2.
[Figure: left, the barrier term µB(x) for µ1 = 0.5 and µ2 = 1.5; right, the auxiliary function f(x) + µB(x) for the same two µ values.]
Similar to exterior penalty functions, we don't just choose one small value for µ; rather
we start with some µ1 and generate a sequence of points.
156
PROCEDURE: Initially choose a tolerance ε, a “decrease” factor β, an interior
starting point x1, and an initial µ1.
At Iteration i
1. Solve the problem Minimize f(x) + µiB(x); st x∈X (usually Rn). Use xi as the
starting point and let the optimal solution be xi+1.
2. If µiB(xi+1) < ε then STOP; else let µi+1 = βµi and start iteration (i+1)
Consider the previous example once again…
EXAMPLE: Minimize f(x) = x1² + 2x2²
st g(x) = 1 - x1 - x2 ≤ 0; x∈R2
Define the barrier function
B(x) = -log(-g(x)) = -log(x1 + x2 - 1)
The unconstrained problem is
Minimize x1² + 2x2² + µB(x) = x1² + 2x2² - µ log(x1 + x2 - 1)
The necessary conditions for the optimal solution (∇P(x) = 0) yield the following:
∂P/∂x1 = 2x1 - µ/(x1 + x2 - 1) = 0,
∂P/∂x2 = 4x2 - µ/(x1 + x2 - 1) = 0
157
Solving, we get x1 = (1 ± √(1+3µ))/3 and x2 = (1 ± √(1+3µ))/6
Since the negative signs lead to infeasibility, we have
x1* = (1 + √(1+3µ))/3 and x2* = (1 + √(1+3µ))/6
Starting with µ =1, β=0.1 and x1=(0,0) and using a tolerance of 0.005 (say), we have
the following:
Iter. (i)   µi       xi+1                g(xi+1)    µiB(xi+1)
1           1        (1.0, 0.5)          -0.5       0.693
2           0.1      (0.714, 0.357)      -0.071     0.265
3           0.01     (0.672, 0.336)      -0.008     0.048
4           0.001    (0.6672, 0.3336)    -0.0008    0.0070
5           0.0001   (0.6666, 0.3333)    -0.0001    0.0009
Thus the optimal solution is x*= (2/3, 1/3).
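As before, the table can be reproduced with a short script. A minimal sketch of the barrier procedure (again assuming SciPy; Nelder-Mead is used here because the auxiliary function is +∞ outside the interior, and the starting point must be an interior point):

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0]**2 + 2*x[1]**2

def g(x):
    return 1 - x[0] - x[1]                 # feasible iff g(x) <= 0

def B(x):
    gx = g(x)
    return -np.log(-gx) if gx < 0 else np.inf   # log-barrier, infinite at the boundary

mu, beta, eps = 1.0, 0.1, 5e-3             # initial mu, decrease factor, tolerance
x = np.array([1.0, 1.0])                   # interior starting point: g(x) < 0
while True:
    x = minimize(lambda y: f(y) + mu * B(y), x, method="Nelder-Mead").x
    if mu * B(x) < eps:                    # stopping rule from the procedure above
        break
    mu *= beta                             # decrease the barrier parameter
print(x)                                   # -> approximately (2/3, 1/3)
```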
158
Penalty Function Methods and Lagrange Multipliers
Consider the penalty function approach to the problem below
Minimize f(x)
st gj(x) ≤ 0; j = 1,2,…,m
hj(x) = 0; j = m+1,m+2,…,l; x∈Rn.
Suppose we use the usual exterior penalty function described earlier, with α = 2.
The auxiliary function that we minimize is then given by
P(x) = f(x) + µp(x) = f(x) + µ{∑j[Max(0, gj(x))]² + ∑j[hj(x)]²}
= f(x) + {∑jµ[Max(0, gj(x))]² + ∑jµ[hj(x)]²}
The necessary condition for this to have a minimum is that
∇P(x) = ∇f(x) + µ∇p(x) = 0, i.e.,
∇f(x) + ∑j2µ[Max(0, gj(x))]∇gj(x) + ∑j2µ[hj(x)]∇hj(x) = 0 (1)
Suppose that the solution to (1) for a fixed µ (say µk>0) is given by xk.
Let us also designate
2µk[Max{0, gj(xk)}] = λj(µk); j=1,2,…,m (2)
2µk[hj(xk)] = λj(µk); j=m+1,…,l (3)
so that for µ=µk we may rewrite (1) as
159
∇f(x) + ∑jλj(µk) ∇gj(x) + ∑jλj(µk) ∇hj(x) = 0 (4)
Now consider the Lagrangian for the original problem:
L(x,λ) = f(x) + ∑jλjgj(x) + ∑jλjhj(x)
The usual K-K-T necessary conditions yield
∇f(x) + ∑j=1..m {λj∇gj(x)} + ∑j=m+1..l {λj∇hj(x)} = 0 (5)
plus the original constraints, complementary slackness for the inequality constraints
and λj≥0 for j=1,…,m.
Comparing (4) and (5) we can see that when we minimize the auxiliary function using
µ=µk, the λj(µk) values given by (2) and (3) estimate the Lagrange multipliers in (5).
In fact it may be shown that as the penalty function method proceeds and µk→∞ and
the xk→x*, the optimum solution, the values of λj(µk) →λj*, the optimum Lagrange
multiplier value for constraint j.
Consider our example on page 153 once again. For this problem the Lagrangian is
given by L(x,λ) = x1² + 2x2² + λ(1 - x1 - x2).
The K-K-T conditions yield
∂L/∂x1 = 2x1-λ = 0; ∂L/∂x2 = 4x2-λ = 0; λ(1-x1-x2)=0
Solving these results in x1* = 2/3; x2* = 1/3; λ* = 4/3 (λ = 0 yields an infeasible solution).
160
Recall that for fixed µk the optimum value of xk was [2µk/(2+3µk) µk/(2+3µk)]T. As
we saw, when µk→∞, these converge to the optimum solution of x* = (2/3,1/3).
Now, suppose we use (2) to define λ(µ) = 2µ[Max(0, gj(xk))]
= 2µ[1 - {2µ/(2+3µ)} - {µ/(2+3µ)}] (since gj(xk) > 0 if µ > 0)
= 2µ[1 - {3µ/(2+3µ)}] = 4µ/(2+3µ)
Then it is readily seen that Limµ→∞λ(µ) = Limµ→∞4µ/(2+3µ) = 4/3 = λ*
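A quick numerical check of this limit (a sketch; the closed form is the one just derived):

```python
for mu in [0.1, 1.0, 10.0, 100.0, 1e6]:
    print(mu, 4 * mu / (2 + 3 * mu))    # lambda(mu) -> lambda* = 4/3
```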
Similar statements can also be made for the barrier (interior penalty) function approach,
e.g., if we use the log-barrier function we have
P(x) = f(x) + µB(x) = f(x) - µ∑j log(-gj(x)),
so that
∇P(x) = ∇f(x) - ∑j{µ/gj(x)}∇gj(x) = 0 (6)
If (like before) for a fixed µk we denote the solution by xk and define -µk/gj(xk) =
λj(µk), then from (6) and (5) we see that λj(µk) approximates λj. Furthermore, as
µk→0 and the xk→x*, it can be shown that λj(µk) →λj*
For our example (page 156), -µk/g(xk) = µk / [1 - (1 + √(1+3µk))/3 - (1 + √(1+3µk))/6]
= -2µk/(1 - √(1+3µk)) → 4/3, as µk → 0.
161
Penalty and Barrier function methods have been referred to as “SEQUENTIAL
UNCONSTRAINED MINIMIZATION TECHNIQUES” (SUMT) and studied in
detail by Fiacco and McCormick. While they are attractive in the simplicity of the
principle on which they are based, they also possess several undesirable properties.
When the parameters µ are very large in value, the penalty functions tend to be
ill-behaved near the boundary of the constraint set where the optimum points usually
lie. Another problem is the choice of appropriate µi and β values. The rate at which the
µi change (i.e., the β values) can seriously affect the computational effort required to
find a solution. Also, as µ increases, the Hessian of the unconstrained function often
becomes ill-conditioned.
Some of these problems are addressed by the so-called "multiplier" methods.
Here there is no need for µ to go to infinity, and the unconstrained function is better
conditioned with no singularities. Furthermore, they also have faster rates of
convergence than SUMT.
162
Ill-Conditioning of the Hessian Matrix
Consider the Hessian matrix of the auxiliary function P(x) = x1² + 2x2² + µ(1 - x1 - x2)²:
H =
[ 2+2µ    2µ  ]
[  2µ    4+2µ ]
Suppose we want to find its eigenvalues by solving
|H - λI| = (2+2µ-λ)(4+2µ-λ) - 4µ²
= λ² - (6+4µ)λ + (8+12µ) = 0
This quadratic equation yields
λ = (3+2µ) ± √(4µ² + 1)
Taking the ratio of the largest and the smallest eigenvalue yields
[(3+2µ) + √(4µ² + 1)] / [(3+2µ) - √(4µ² + 1)]
It should be clear that as µ→∞, the limit of the preceding ratio also goes to ∞. This
indicates that as the iterations proceed and we start to increase the value of µ, the
Hessian of the unconstrained function that we are minimizing becomes increasingly
ill-conditioned. This is a common situation and is especially problematic if we are
using a method for the unconstrained optimization that requires the use of the
Hessian.
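This is easy to see numerically; a short sketch (NumPy only; the eigenvalue ratio printed below is the condition number of H):

```python
import numpy as np

for mu in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    H = np.array([[2 + 2*mu, 2*mu],
                  [2*mu, 4 + 2*mu]])      # Hessian of the auxiliary function
    lam = np.linalg.eigvalsh(H)           # eigenvalues, ascending order
    print(mu, lam[-1] / lam[0])           # condition number grows without bound
```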
163
MULTIPLIER METHODS
Consider the problem: Min f(x), st gj(x)≤0, j=1,2,...,m.
In multiplier methods the auxiliary function is given by
P(x) = f(x) + ½ ∑j=1..m µj [Max{0, gj(x) + θji}]²
where µj > 0 and θji ≥ 0 are parameters associated with the jth constraint (i ≡ iteration
no.). This is also often referred to as the AUGMENTED LAGRANGIAN function.
Note that if θji = 0 and µj = µ, this reduces to the usual exterior penalty function.
The basic idea behind multiplier methods is as follows:
Let xi be the minimum of the auxiliary function Pi(x) at some iteration i. From the
optimality conditions we have
∇Pi(xi) = ∇f(xi) + ∑j=1..m µj Max{0, gj(xi) + θji} ∇gj(xi)
Now, Max{0, gj(xi) + θji} = θji + Max{-θji, gj(xi)}. So ∇Pi(xi) = 0 implies
∇f(xi) + ∑j=1..m µjθji∇gj(xi) + ∑j=1..m µj Max{-θji, gj(xi)}∇gj(xi) = 0
Let us assume for a moment that the θji are chosen at each iteration i so that they satisfy
the following:
Max{-θji, gj(xi)} = 0 (1)
It is easily verified that (1) is equivalent to requiring that
θji ≥ 0, gj(xi) ≤ 0 and θji·gj(xi) = 0 (2)
164
Therefore as long as (2) is satisfied, we have for the point xi that minimizes Pi(x)
∇Pi(xi) = ∇f(xi) + ∑j µjθji∇gj(xi) = 0 (3)
If we let µjθji = λji and use the fact that µj > 0, then (2) reduces to
λji ≥ 0, gj(xi) ≤ 0 and λji·gj(xi) = 0, ∀j (A)
and (3) reduces to
∇f(xi) + ∑j λji∇gj(xi) = 0 (B)
It is readily seen that (A) and (B) are merely the KARUSH-KUHN-TUCKER
necessary conditions for xi to be a solution to the original problem below!
Min f(x), st gj(x)≤0, j=1,2,...,m.
We may therefore conclude that
xi = x* and λji = µjθji = λj*
Here x* is the solution to the problem, and λj* is the optimum Lagrange multiplier
associated with constraint j.
From the previous discussion it should be clear that at each iteration i, the θji should be
chosen in such a way that Max{-θji, gj(xi)} → 0, which results in xi → x*. Note that xi is
obtained here by minimizing the auxiliary function with respect to x.
165
ALGORITHM: A general algorithmic approach for the multiplier method may now
be stated as follows:
STEP 0: Set i=0, choose vectors x0, µ and θ0.
STEP 1: Set i=i+1.
STEP 2: Starting at xi-1, Minimize Pi(x) to find xi.
STEP 3: Check convergence criteria and go to Step 4 only if the criteria are not
satisfied.
STEP 4: Modify θi based on satisfying (1). Also modify µ if necessary and go to
Step 2.
(Usually θ0 is set to 0…)
One example of a formula for changing θi is the following:
θj(i+1) = θji + Max{gj(xi), -θji}
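A minimal sketch of this scheme on the example from page 153 (an illustration under stated assumptions: SciPy's BFGS for the inner minimization, a fixed µ = 10, and an arbitrary iteration cap):

```python
import numpy as np
from scipy.optimize import minimize

def f(x): return x[0]**2 + 2*x[1]**2
def g(x): return 1 - x[0] - x[1]               # constraint g(x) <= 0

mu, theta = 10.0, 0.0                           # penalty weight and theta^0 = 0
x = np.zeros(2)
for i in range(25):
    # minimize the augmented Lagrangian P(x) = f(x) + (mu/2)*Max{0, g(x)+theta}^2
    x = minimize(lambda y: f(y) + 0.5*mu*max(0.0, g(y) + theta)**2,
                 x, method="BFGS").x
    if abs(max(g(x), -theta)) < 1e-8:           # condition (1) (nearly) satisfied
        break
    theta += max(g(x), -theta)                  # the update formula above
print(x, mu * theta)                            # x -> (2/3, 1/3); mu*theta -> lambda* = 4/3
```

Note that µ stays fixed here: unlike SUMT, the multiplier parameter θ does the work, so µ need not go to infinity.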
166
PRIMAL METHODS
By a primal method we mean one that works directly on the original constrained
problem:
Min f(x)
st gj(x)≤0 j=1,2,...,m.
Almost all methods are based on the following general strategy:
Let xi be a design at iteration i. A new design xi+1 is found by the expression xi+1 = xi
+ αidi, where αi is a step size parameter and di is a search direction (vector). The
direction vector di is typically determined from the gradients at xi by an expression of
the form
di = d(∇f(xi), ∇gj(xi): j∈Jε)
where ∇f and ∇gj are gradient vectors of the objective and constraint functions and Jε
is an "active set" given by Jε = {j | gj(xi) = 0}, or more commonly = {j | gj(xi) + ε ≥ 0, ε > 0}.
Unlike with transformation methods, it is evident that here the gradient vectors of
individual constraint functions need to be evaluated. This is a characteristic of all
primal methods.
167
METHOD OF FEASIBLE DIRECTIONS
The first class of primal methods we look at are the feasible directions methods,
where each xi is within the feasible region. An advantage is that if the process is
terminated before reaching the true optimum solution (as is often necessary with
large-scale problems) the terminating point is feasible and hence acceptable. The
general strategy at iteration i is:
(1) Let xi be a feasible point.
(2) Choose a direction di so that
a) xi + αdi is feasible at least for some "sufficiently" small α > 0, and
b) f(xi + αdi) < f(xi).
(3) Do an unconstrained optimization
Minα f(xi + αdi) to obtain α* = αi and hence xi+1 = xi + αidi
DEFINITION: For the problem Minimize f(x), st x∈X
d (≠0) is called a feasible direction at x'∈X if ∃ δ>0 ∋ (x'+αd)∈X for α∈[0,δ)
[Figure: two sketches of a set X with a point x′: on the left, directions d at x′ that are feasible directions; on the right, directions at x′ that are not feasible directions.]
168
Further, if we also have f(x'+αd) < f(x') for all α∈[0,α'), α' <δ, then d is called an
improving feasible direction.
Under differentiability, recall that this implies
∇fT(x')⋅d < 0
Let Fx' = {d | ∇fT(x')⋅d < 0}, and
Dx' = {d | ∃ δ>0 ∋ (x'+αd)∈X for all α∈[0,δ)},
Thus Fx' is the set of improving directions at x', and
Dx' is the set of feasible directions at x'.
If x* is a local optimum then Fx* ∩ Dx* = φ as long as a constraint qualification is
met.
In general, for the problem
Minimize f(x), st gj(x)≤0, j=1,2,...,m.
let Jx' ={j | gj(x')=0} (Index set of active constraints)
We can show that the set of feasible directions at x' is
Dx' = {d | ∇gjT(x')⋅d < 0 for all j∈Jx'}
NOTE: If gj(x) is a linear function, then the strict inequality (<) used to define the set
Dx' above can be relaxed to an inequality (≤).
169
Geometric Interpretation
Minimize f(x), st gj(x) ≤0, j=1,2,…,m
[Figure (i): dT∇gj(x*) < 0, so d is a feasible direction. Figure (ii): dT∇gj(x*) = 0, so d
is tangent to the constraint boundary gj(x) = 0.]
In (ii), any positive step along d is infeasible (however, we will often consider this a
"feasible" direction…)
FARKAS’ LEMMA: Given A∈Rm×n, b∈Rm, x∈Rn, y∈Rm, the following statements
are equivalent to each other:
1. yTA ≤ 0 ⇒ yTb ≤ 0
2. ∃ z such that Az=b, z≥0
[Figure: vectors A1, A2, A3 and b drawn from the origin. Suppose Hj is the hyperplane through the origin that is orthogonal to Aj, and let Hj- be the closed half-space on the side of Hj that does not contain Aj. Then yTA ≤ 0 ⇔ y ∈ ∩jHj-, so yTA ≤ 0 ⇒ yTb ≤ 0 simply states that (∩jHj-) ⊆ Hb-.]
170
Application to NLP
Let b ≡ -∇f(x*) Aj ≡ ∇gj(x*)
y ≡ d (direction vector), zj ≡ λj for j∈ Jx* ≡ {j | gj(x*)=0}
Farkas’ Lemma then implies that
(1) dT∇gj(x*) ≤ 0 ∀ j∈ Jx* ⇒ -dT∇f(x*) ≤ 0, i.e., dT∇f(x*) ≥ 0
(2) ∃ λj≥0 such that ∑j∈Jx* λj∇gj(x*) = -∇f(x*)
are equivalent! Note that (1) indicates that
• directions satisfying dT∇gj(x*) ≤ 0 ∀ j∈ Jx* are feasible directions
• directions satisfying dT∇f(x*) ≥ 0 are ascent (non-improving) directions
On the other hand (2) indicates the K-K-T conditions. They are equivalent!
So at the optimum, there is no feasible direction that is improving…
[Figure: a feasible region with constraints g1(x) = 0, g2(x) = 0, g3(x) = 0 tight at x*; -∇f(x*) lies in the cone generated by the gradients of the tight constraints. K-K-T implies that the steepest descent direction is in the cone generated by the gradients of the tight constraints.]
171
Similarly for the general NLP problem
Minimize f(x)
st gj(x) ≤ 0, j=1,2,...,m
hk(x) = 0, k=1,2,...,p
the set of feasible directions is
Dx' = {d | ∇gjT(x')⋅d < 0 ∀ j∈Jx', and ∇hkT(x')⋅d = 0 ∀ k}
====================================================
In a feasible directions algorithm we have the following:
STEP 1 (direction finding)
To generate an improving feasible direction from x', we need to find d ∈ Fx'∩Dx' (so Fx'∩Dx' ≠ φ),
i.e., ∋
∇fT(x')⋅d < 0 and ∇gjT(x')⋅d < 0 for j∈Jx', ∇hkT(x')⋅d = 0 ∀ k
We therefore solve the subproblem
Minimize ∇fT(x')⋅d
st ∇gjT(x')⋅d < 0 for j∈Jx'
(≤0 for j ∋ gj linear)
∇hkT (x')⋅d = 0 for all k,
along with some “normalizing” constraints such as
-1 ≤ di ≤ 1 for i=1,2,...,n or ║d║2= dTd ≤ 1.
If the objective of this subproblem is negative, we have an improving feasible
direction di.
172
In practice, the actual subproblem we solve is slightly different since strict inequality
(<) constraints are difficult to handle.
Let z = Max [∇fT(x')⋅d, ∇gjT(x')⋅d for j∈Jx']. Then we solve:
Minimize z
st ∇fT(x')⋅d ≤ z
∇gjT(x')⋅d ≤ z for j∈Jx'
∇hkT (x')⋅d = 0 for k=1,2,...,p,
-1 ≤ di ≤ 1 for i=1,2,...,n
If (z*,d*) is the optimum solution for this then
a) z* < 0 ⇒ d =d* is an improving feasible direction
b) z* = 0 ⇒ x' is a Fritz John point (or if a CQ is met, a K-K-T point)
(Note that z* can never be greater than 0)
STEP 2: (line search for step size)
Assuming that z* from the first step is negative, we now solve
Min f(xi+αdi)
st gj(xi+αdi) ≤ 0, α≥0
This can be rewritten as
Min f(xi+αdi)
st 0≤α≤αmax
where αmax= supremum {α | gj(xi+αdi) ≤ 0, j=1,2,...,m}. Let the optimum solution to
this be at α*=αi.
Then we set xi+1 = xi + αidi and return to Step 1.
173
NOTE: (1) In practice the set Jx' is usually defined as Jx'={ j |gj(xi)+ε ≥0}, where the
tolerance ε is referred to as the "constraint thickness," which may be
reduced as the algorithm proceeds.
(2) This procedure is usually attributed to ZOUTENDIJK
EXAMPLE
Minimize 2x1² + x2² - 2x1x2 - 4x1 - 6x2
st x1 + x2 ≤ 8
-x1 + 2x2 ≤ 10
-x1 ≤ 0
-x2 ≤ 0 (A QP…)
For this, we have
∇g1(x) = (1, 1)T, ∇g2(x) = (-1, 2)T, ∇g3(x) = (-1, 0)T, ∇g4(x) = (0, -1)T,
and ∇f(x) = (4x1 - 2x2 - 4, 2x2 - 2x1 - 6)T.
Let us begin with x0 = (0, 0)T, with f(x0) = 0
ITERATION 1 J={3,4}. STEP 1 yields
Min z
st ∇fT(x0)⋅d ≤ z ⇒ d1(4x1-2x2-4) + d2(2x2-2x1-6) ≤ z
∇g3T(x0)⋅d ≤ z ⇒ -d1 ≤ z
∇g4T(x0)⋅d ≤ z ⇒ -d2 ≤ z
-1 ≤ d1, d2 ≤ 1
174
i.e., Min z
st -4d1 - 6d2 ≤ z, -d1 ≤ z, -d2 ≤ z, d1,d2 ∈ [-1,1]
yielding z* = -1 (< 0), with d* = d0 = (1, 1)T
STEP 2 (line search):
x0 + αd0 = (0, 0)T + α(1, 1)T = (α, α)T
Thus αmax = sup{α | 2α ≤ 8, α ≤ 10, -α ≤ 0, -α ≤ 0} = 4
We therefore solve: Minimize f(x0 + αd0), st 0 ≤ α ≤ 4
⇒ α* = α0 = 4 ⇒ x1 = x0 + α0d0 = (4, 4)T, with f(x1) = -24.
ITERATION 2 J={1}. STEP 1 yields
Min z
st d1(4x1-2x2-4) + d2(2x2-2x1-6) ≤ z
d1 + d2 ≤ z
-1 ≤ d1, d2 ≤ 1
i.e., Min z
st 4d1 - 6d2 ≤ z, d1 + d2 ≤ z, d1,d2 ∈ [-1,1]
yielding z* = -4 (< 0), with d* = d1 = (-1, 0)T.
STEP 2 (line search):
x1 + αd1 = (4, 4)T + α(-1, 0)T = (4 - α, 4)T
175
Thus αmax = sup{α | 8-α ≤ 8, α+4 ≤ 10, α-4 ≤ 0, -4 ≤ 0} = 4
We therefore solve: Minimize 2(4-α)² + 4² - 2(4-α)·4 - 4(4-α) - 6(4), st 0 ≤ α ≤ 4.
⇒ α* = α1 = 1 ⇒ x2 = x1 + α1d1 = (3, 4)T, with f(x2) = -26.
ITERATION 3 J = φ. STEP 1 yields
Min z
st d1(4x1-2x2-4) + d2(2x2-2x1-6) ≤ z
-1 ≤ d1, d2 ≤ 1
i.e., Min z
st -4d2 ≤ z, d1,d2 ∈ [-1,1]
yielding z* = -4 (< 0), with d* = d2 = (0, 1)T.
STEP 2 (line search):
x2 + αd2 = (3, 4)T + α(0, 1)T = (3, 4 + α)T
Thus αmax = sup{α | 7+α ≤ 8, 5+2α ≤ 10, -3 ≤ 0, -4-α ≤ 0} = 1
We therefore solve: Minimize f(x2 + αd2), st 0 ≤ α ≤ 1
i.e., Minimize 2(3)² + (4+α)² - 2(3)(4+α) - 4(3) - 6(4+α), st 0 ≤ α ≤ 1
⇒ α* = α2 = 1 ⇒ x3 = x2 + α2d2 = (3, 5)T, with f(x3) = -29.
176
ITERATION 4 J={1}. STEP 1 yields
Min z
st d1(4x1-2x2-4) + d2(2x2-2x1-6) ≤ z
d1 + d2 ≤ z
-1 ≤ d1, d2 ≤ 1
i.e., Min z
st -2d1 - 2d2 ≤ z, d1 + d2 ≤ z, d1,d2 ∈ [-1,1]
yielding z* = 0, with d* = (0, 0)T. STOP
Therefore the optimum solution is given by x* = (3, 5)T, with f(x*) = -29.
[Figure: the feasible region bounded by x1 + x2 = 8 and -x1 + 2x2 = 10 (with x1, x2 ≥ 0), and the path followed by the algorithm: x0 = (0,0) → x1 = (4,4) → x2 = (3,4) → x3 = (3,5).]
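The Step 1 subproblem at each iteration above is a small LP in the variables (d1, d2, z). As a sketch, here is Iteration 1's direction-finding LP solved with scipy.optimize.linprog; the rows encode ∇fT(x0)⋅d - z ≤ 0 and ∇gjT(x0)⋅d - z ≤ 0 for j∈J = {3,4}:

```python
from scipy.optimize import linprog

# At x0 = (0,0): grad f = (-4,-6), grad g3 = (-1,0), grad g4 = (0,-1).
c = [0, 0, 1]                                # variables (d1, d2, z): minimize z
A_ub = [[-4, -6, -1],                        # -4*d1 - 6*d2 <= z
        [-1,  0, -1],                        #        -d1   <= z
        [ 0, -1, -1]]                        #        -d2   <= z
b_ub = [0, 0, 0]
bounds = [(-1, 1), (-1, 1), (None, None)]    # normalization -1 <= di <= 1, z free
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x)                                 # d* = (1, 1), z* = -1, as in Iteration 1
```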
177
GRADIENT PROJECTION METHOD (ROSEN)
Recall that the steepest descent direction is -∇f(x). However, for constrained
problems, moving along -∇f(x) may destroy feasibility. Rosen's method works by
projecting -∇f(x) on to the hyperplane tangent to the set of active constraints. By
doing so it tries to improve the objective while simultaneously maintaining
feasibility. It uses d=-P∇f(x) as the search direction, where P is a projection matrix.
An (n×n) matrix P is a projection matrix if PT = P and PTP = P (i.e., P is idempotent)
Properties of P
1) P is positive semidefinite
2) P is a projection matrix if, and only if, I-P is also a projection matrix.
3) Let Q=I-P and p= Px1, q=Qx2 where x1, x2∈Rn. Then
(a) pTq= qTp =0 (p and q are orthogonal)
(b) Any x∈Rn can be uniquely expressed as x=p+q, where p=Px, q=(I-P)x
[Figure: a vector x decomposed as x = p + q, with p = Px and q = (I-P)x orthogonal; given x, we can find p and q.]
178
Consider the simpler case where all constraints are linear. Assume that there are 2
linear constraints intersecting along a line as shown below.
Obviously, moving along -∇f would take us outside the feasible region. Say we
project -∇f on to the feasible region using the matrix P.
THEOREM: As long as P is a projection matrix and P∇f ≠0, d=-P∇f will be an
improving direction.
PROOF: ∇fTd = ∇fT⋅(-P∇f) = -∇fTPTP∇f = -║P∇f║² < 0. (QED)
For -P∇f to also be feasible we must have ∇g1T(-P∇f) ≤ 0 and ∇g2T(-P∇f) ≤ 0 (by
definition…).
[Figure: linear constraints g1(x) = 0 and g2(x) = 0 with gradients ∇g1, ∇g2; -∇f is decomposed into P(-∇f), which lies along the constraint intersection, and (I-P)(-∇f), which is normal to it.]
179
Say we pick P such that ∇g1T(-P∇f) = ∇g2T(-P∇f) = 0, so that -P∇f is a feasible
direction. Thus if we denote M as the (2×n) matrix where Row 1 is ∇g1T and
Row 2 is ∇g2T, then we have M(-P∇f) = 0 (*)
To find the form of the matrix P, we make use of Property 3(b) to write the vector -∇f
as -∇f = P(-∇f) + [I-P](-∇f) = -P∇f - [I-P]∇f
Now, -∇g1 and -∇g2 are both orthogonal to -P∇f, and so is [I-P](-∇f). Hence we must
have
[I-P](-∇f) = -[I-P]∇f = λ1∇g1 + λ2∇g2
⇒ -[I-P]∇f = MTλ, where λ = (λ1, λ2)T
⇒ -M[I-P]∇f = MMTλ
⇒ -(MMT)-1M[I-P]∇f = (MMT)-1MMTλ = λ
⇒ λ = -{(MMT)-1M∇f - (MMT)-1MP∇f}
⇒ λ = -{(MMT)-1M∇f + (MMT)-1M(-P∇f)}
⇒ λ = -(MMT)-1M∇f (from (*) above, M(-P∇f) = 0…)
Hence we have
-∇f = -P∇f - [I-P]∇f = -P∇f + MTλ = -P∇f - MT(MMT)-1M∇f
i.e., P∇f = ∇f - MT(MMT)-1M∇f = [I - MT(MMT)-1M]∇f
i.e., P = [I - MT(MMT)-1M]
180
Question: What if P∇f(x) = 0?
Answer: Then 0 = P∇f = [I –MT(MMT)-1M] ∇f = ∇f + MTw, where
w = -(MMT)-1M∇f. If w≥0, then x satisfies the Karush-Kuhn-Tucker conditions and
we may stop; if not, a new projection matrix P̄ could be identified such that d = -P̄∇f
is an improving feasible direction. In order to identify this matrix P̄ we merely pick
any component of w that is negative, and drop the corresponding row from the matrix
M. Then we use the usual formula to obtain P̄.
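A small NumPy sketch of these formulas, using as illustration the single active constraint g1(x) = x1 + x2 - 8 ≤ 0 at the point (4,4) from the QP example earlier (this pairing is hypothetical, chosen only to make the numbers concrete):

```python
import numpy as np

def projection(M):
    """P = I - M^T (M M^T)^-1 M projects onto the null space of the rows of M."""
    n = M.shape[1]
    return np.eye(n) - M.T @ np.linalg.inv(M @ M.T) @ M

M = np.array([[1.0, 1.0]])                 # row: gradient of the active constraint
grad_f = np.array([4.0, -6.0])             # grad f at x = (4, 4)
P = projection(M)
d = -P @ grad_f                            # projected steepest descent direction
w = -np.linalg.inv(M @ M.T) @ M @ grad_f   # multiplier estimate (used when d = 0)
print(d, w)                                # d = (-5, 5): moves along x1 + x2 = 8
```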
====================
Now consider the general problem
Minimize f(x)
st gj(x) ≤ 0, j=1,2,...,m
hk(x) = 0, k=1,2,...,p
The gradient projection algorithm can be generalized to nonlinear constraints by
projecting the gradient on to the tangent hyperplane at x (rather than on to the surface
of the feasible region itself).
181
STEP 1: SEARCH DIRECTION
Let xi be a feasible point and let J be the "active set", i.e., J = {j | gj(xi)=0}, or more
practically, J = {j | gj(xi)+ε ≥ 0}.
Suppose each gj, j∈J, is approximated using the Taylor series expansion by:
gj(x) ≈ gj(xi) + (x - xi)T∇gj(xi).
Notice that this approximation is a linear function of x with slope ∇gj(xi).
• Let M ≡ matrix whose rows are ∇gjT(xi) for j∈J (active constraints) and ∇hkT(xi)
for k=1,2,...,p (equality constraints)
• Let P = [I - MT(MMT)-1M] (≡ projection matrix). If M does not exist (no active
constraints), let P = I.
• Let di = -P∇f(xi).
a) If di = 0 and M does not exist, STOP.
b) If di = 0 and M exists, find
w = (u, v)T = -(MMT)-1M∇f(xi)
where u corresponds to g (inequality constraints) and v to h (equality constraints).
If u ≥ 0 STOP, else delete the row of M corresponding to some uj < 0 and
recompute P.
c) If di ≠0 go to Step 2.
182
STEP 2. STEP SIZE (Line Search)
Solve Minimize f(xi + αdi), st 0 ≤ α ≤ αmax,
where αmax = Supremum {α | gj(xi + αdi) ≤ 0 ∀ j, hk(xi + αdi) = 0 ∀ k}.
Call the optimal solution α*and find xi+1 = xi+α*di. If all constraints are not linear
make a "correction move" to return x into the feasible region. The need for this is
shown below. Generally, it could be done by solving g(x)=0 starting at xi+1 using the
Newton Raphson approach, or by moving orthogonally from xi+1 etc.
[Figure: left, g1 and g2 are linear constraints, so a step along -P∇f stays feasible and NO CORRECTION is REQUIRED; right, g1 is a nonlinear constraint, so a step along -P∇f leaves the feasible region and xi+1 REQUIRES a CORRECTION move back to g1 = 0.]
183
LINEARIZATION METHODS
The general idea of linearization methods is to replace the solution of the NLP by the
solution of a series of linear programs which approximate the original NLP.
The simplest method is RECURSIVE LP:
Transform Min f(x), st gj(x) ≤ 0, j=1,2,...,m,
into the LP
Min {f(xi) + (x-xi)T∇f(xi)}
st gj(xi) + (x-xi)T∇gj(xi)≤ 0, for all j.
Thus we start with some initial xi where the objective and constraints (usually only
the tight ones) are linearized, and then solve the LP to obtain a new xi+1 and
continue...
Note that f(xi) is a constant and writing x - xi = d shows the problem is equivalent to
Min dT∇f(xi)
st dT∇gj(xi) ≤ -gj(xi), for all j,
and this is a direction-finding LP problem; xi+1 = xi + d implies that the step
size is 1.0.
EXAMPLE: Min f(x) = 4x1 - x2² - 12,
st g1(x) = x1² + x2² - 25 ≤ 0,
g2(x) = -x1² - x2² + 10x1 + 10x2 - 34 ≥ 0
If this is linearized at the point xi =(2,4), then since
184
∇f(xi) = (4, -2x2)T = (4, -8)T ; ∇g1(xi) = (2x1, 2x2)T = (4, 8)T ; ∇g2(xi) = (-2x1+10, -2x2+10)T = (6, 2)T
it yields the following linear program (VERIFY…)
Min f̂(x) = 4x1 - 8x2 + 4,
st ĝ1(x) = 4x1 + 8x2 - 45 ≤ 0,
ĝ2(x) = 6x1 + 2x2 - 14 ≥ 0
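Taking up the "VERIFY" invitation, a short NumPy sketch that checks each linearization gj(xi) + (x - xi)T∇gj(xi) against the LP above at an arbitrary test point:

```python
import numpy as np

xi = np.array([2.0, 4.0])
f  = lambda x: 4*x[0] - x[1]**2 - 12
grad_f  = np.array([4.0, -2*xi[1]])                 # (4, -8)
g1 = lambda x: x[0]**2 + x[1]**2 - 25
grad_g1 = np.array([2*xi[0], 2*xi[1]])              # (4, 8)
g2 = lambda x: -x[0]**2 - x[1]**2 + 10*x[0] + 10*x[1] - 34
grad_g2 = np.array([-2*xi[0] + 10, -2*xi[1] + 10])  # (6, 2)

x = np.array([1.0, 3.0])                            # arbitrary test point
print(f(xi)  + grad_f  @ (x - xi), 4*x[0] - 8*x[1] + 4)    # f-hat, two ways
print(g1(xi) + grad_g1 @ (x - xi), 4*x[0] + 8*x[1] - 45)   # g1-hat, two ways
print(g2(xi) + grad_g2 @ (x - xi), 6*x[0] + 2*x[1] - 14)   # g2-hat, two ways
```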
The method however has limited application since it does not converge if the local
minimum occurs at a point that is not a vertex of the feasible region; in such cases it
oscillates between adjacent vertices. To avoid this one may limit the step size (for
instance, the Frank-Wolfe method minimizes f between xi and xi+d to get xi+1).
[Figure: contours of f (f = 0, -10, -20) and of the linearized objective f̂ (f̂ = 0, -30), the nonlinear constraints g1 = 0 and g2 = 0 with their linearizations ĝ1 = 0 and ĝ2 = 0, and the LP optimum x̂*, which lies at a vertex of the linearized feasible region.]
185
RECURSIVE QUADRATIC PROGRAMMING (RQP)
RQP is similar in logic to RLP, except that it solves a sequence of quadratic
programming problems. RQP is a better approach because the second order terms
help capture curvature information.
Min dT∇L(xi) + ½ dTBd
st dT∇gj (xi) ≤ -gj(xi) ∀ j
where B is the Hessian (or an approximation of the Hessian) of the Lagrangian
function at the point xi. Notice that this is a QP in d with linear constraints. Once the
above QP is solved to get di, we obtain xi+1 by finding an optimum step size along di
from xi, and continue. The matrix B is never explicitly computed --- it is usually
updated by schemes such as the BFGS approach using information from ∇f(xi) and
∇f(xi+1) only. (Recall the quasi-Newton methods for unconstrained optimization.)
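For reference, a sketch of the standard BFGS update that could be used to maintain B (it assumes the curvature condition yTs > 0 holds; practical RQP codes typically use damped variants):

```python
import numpy as np

def bfgs_update(B, s, y):
    """s = x_{i+1} - x_i; y = change in the gradient of the Lagrangian.
    Returns the updated approximation to the Lagrangian Hessian."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
```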
186
KELLEY'S CUTTING PLANE METHOD
One of the better linearization methods is the cutting plane algorithm of J. E. Kelley,
developed for convex programming problems.
Suppose we have a set of nonlinear constraints gj(x)≤0 for j=1,2,...,m that determine
the convex feasible region G. Suppose further that we can find a polyhedral set H
that entirely contains the region determined by these constraints, i.e., G⊆H.
In general, a cutting plane algorithm would operate as follows:
1. Solve the problem with linear constraints.
2. If the optimal solution to this problem is feasible in the original (nonlinear)
constraints then STOP; it is also optimal for the original problem.
3. Otherwise add an extra linear constraint (i.e., an extra line/hyperplane) so that the
current optimum point (for the linear constraints) becomes infeasible for the new
problem with the added linear constraint (so we "cut out" part of the infeasibility).
Now return to Step 1.
[Figure: a convex region G (gj(x) ≤ 0, gj nonlinear) contained in a polyhedral set H defined by linear constraints h1 = 0, …, h4 = 0; since the hj(x) are linear, H is polyhedral.]
187
JUSTIFICATION
Step 1: It is in general easier to solve problems with linear constraints than ones with
nonlinear constraints.
Step 2: G is determined by gj(x)≤0, j=1,...,m, where each gj is convex, while
H is determined by hk≤0 ∀ k, where each hk is linear. G is wholly contained inside
the polyhedral set H.
Thus every point that satisfies gj(x)≤0 automatically satisfies hk(x)≤0, but not
vice-versa. Thus G places "extra" constraints on the design variables and therefore
the optimal solution with G can never be any better than the optimal solution with H.
Therefore, if for some region H, the optimal solution x* is also feasible in G, it
MUST be optimal for G also. (NOTE: In general, the optimal solution for H at some
iteration will not be feasible in G).
Step 3: Generating a “CUT”
In Step 2, let ḡ(xi) = Max j∈J {gj(xi)}, where J is the set of violated constraints,
J = {j | gj(xi) > 0}, i.e., ḡ(xi) is the value of the most violated constraint at xi. By the
Taylor series approximation around xi,
ḡ(x) = ḡ(xi) + (x - xi)T∇ḡ(xi) + ½[x - xi]TH(xi)[x - xi]
⇒ ḡ(x) ≥ ḡ(xi) + (x - xi)T∇ḡ(xi), for every x∈Rn (since ḡ is convex).
188
If ∇ḡ(xi) = 0 then ḡ(x) ≥ ḡ(xi).
But since ḡ is violated at xi, ḡ(xi) > 0. Therefore ḡ(x) > 0 ∀ x∈Rn,
⇒ ḡ is violated for all x,
⇒ the original problem is infeasible.
So, assuming then that ∇ḡ(xi) ≠ 0, consider the following linear constraint:
(x - xi)T∇ḡ(xi) ≤ -ḡ(xi)
i.e., h(x) = ḡ(xi) + (x - xi)T∇ḡ(xi) ≤ 0 (⊗)
(Note that xi, ∇ḡ(xi) and ḡ(xi) are all known…)
The current point xi is obviously infeasible in this constraint --- since h(xi) ≤ 0 implies
that ḡ(xi) + (xi - xi)T∇ḡ(xi) = ḡ(xi) ≤ 0, which contradicts the fact that ḡ is
violated at xi (i.e., ḡ(xi) > 0). Also, if x is feasible then ḡ(x) ≤ 0 and ḡ(xi) +
(x - xi)T∇ḡ(xi) ≤ ḡ(x) by the convexity of ḡ. So x remains feasible with this extra
linear constraint and no part of the original feasible region is cut off. Hence (⊗)
defines a suitable cutting plane. If we add this extra linear constraint, then the optimal
solution would change because the current optimal solution would no longer be
feasible for the new LP with the added constraint.
NOTE: If the objective for the original NLP was linear, then at each iteration we
merely solve a linear program!
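A minimal sketch of the whole loop for a linear objective cTx (hypothetical problem data supplied by the caller; assumes SciPy's linprog, convex differentiable gj with gradient functions, and linprog's default nonnegativity bounds on x):

```python
import numpy as np
from scipy.optimize import linprog

def cutting_plane(c, gs, grads, A, b, tol=1e-6, max_iter=100):
    """Min c.x st gj(x) <= 0 (convex), starting from a polyhedron Ax <= b
    (and x >= 0, linprog's default bounds) that contains G."""
    A, b = list(A), list(b)
    x = None
    for _ in range(max_iter):
        x = linprog(c, A_ub=A, b_ub=b).x        # Step 1: solve the current LP over H
        vals = [gj(x) for gj in gs]
        j = int(np.argmax(vals))                # index of the most violated constraint
        if vals[j] <= tol:                      # Step 2: feasible in G => optimal
            return x
        a = grads[j](x)                         # Step 3: add the cut (x-xi).a <= -g(xi),
        A.append(list(a))                       #         i.e. a.x <= a.xi - g(xi)
        b.append(float(a @ x - vals[j]))
    return x
```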
189
An Example (in R2) of how a cutting plane algorithm might proceed.
(Dashed lines indicate cutting planes.)
[Figure: four snapshots of a cutting plane algorithm in R2. H starts as the polyhedron given by nonnegativity + 3 linear constraints h1, h2, h3 containing G; cuts h4, h5, h6 are added one at a time (nonnegativity + 4, 5, then 6 linear constraints), moving the LP optimum x̂ toward the true optimum x* for G, until x̂ = x*.]