a perfect example for the bfgs...

A Perfect Example for the BFGS Method

Yu-Hong Dai

State Key Laboratory of Scientific and Engineering ComputingInst of Computational Mathematics and Scientific/Engineering Computing

Academy of Mathematics and Systems ScienceChinese Academy of Sciences, Beijing 100190, China

Email: dyh@lsec.cc.ac.cnhttp://lsec.cc.ac.cn/∼dyh

July 12, 2010

Yu-Hong Dai (CAS) A Perfect Example for BFGS December, 2009 1 / 70

Outline

1 The BFGS Quasi-Newton Method

2 Looking for Consistent Steps and GradientsPrefixed Forms of Steps and GradientsUnit StepsizesConsistency ConditionsChoosing Parameters

3 Constructing A Suitable Objective FunctionRecovering the iterations and function valuesSeeking A Suitable Form for the Objective FunctionConstructing The Functions on The LinesExtending the Functions to the Whole PlaneFinals form of the objective function

4 Conclusion and Discussion

Outline

The BFGS Quasi-Newton Method

Quasi-Newton Methods

Unconstrained Optimization

minx∈<n

Quasi-Newton Methods

xk+1 = xk + αk dk

dk = −Hk gk

How does Hk+1 get close to [∇2f (xk+1)]−1? Defining

sk = xk+1 − xk , yk = gk+1 − gk ,

the answer by Davidon (1959) is to ask Hk+1 to be updated from Hkand match the second derivative information along the previous step:

Hk+1yk = sk

Broyden’s Convex Family of Methods

The DFP Method

Hk+1 = Hk −HkykyT

yTk Hkyk

sTk yk

The BFGS Method

Hk+1 = Hk −skyT

k Hk + HkyksTk

sTk yk

yTk Hkyk

sTk yk

Broyden’s Convex Family of Methods (φ ∈ [0, 1])

Hk+1 = Hk −HkykyT

yTk Hyk

sTk yk

+ φvkvTk ,

where vk =√

yTk Hyk

sTk yk

− Hkyk

yTk Hyk

)Yu-Hong Dai (CAS) A Perfect Example for BFGS December, 2009 4 / 70

Theoretical and Practical Line Searches

Line Search Function

ψk (α) = f (xk + αdk ), where α ∈ [0, +∞)

Global Exact Line Search: ψk (α∗k ) = global minψk (α)

Curry Line Search: α∗k is the first local minimizer of ψk (α)

Wolfe Inexact Line Search: find αk such that{ψk (αk ) ≤ ψk (0) + σ1ψ

′k (0)αk

ψ′k (αk ) ≥ σ2ψ′k (0)

Armijo Inexact Line Search: choose αk to be the maximal valueof {λm : m ≥ 0}, where λ ∈ (0, 1), such that

ψk (αk ) ≤ ψk (0) + σψ′k (0)αk

Convergence of BFGS for Convex Functions

Powell (1976):

The BFGS methodwith Wolfe inexact line searches

is globally convergent foruniformly convex functions

Byrd, Nocedal and Yuan (1987):

The Broyden’s convex family of methods (except DFP)with Wolfe inexact line searches

is globally convergent for(not necessarily uniformly) convex functions

Two Open Convergence Questions

Nocedal (1992), Fletcher (1994), et al:

Open Question (I)Does the DFP method with Wolfe line searches converge for uniformlyconvex functions?

Open Question (II)Does the BFGS method with Wolfe line searches converge forgeneral nonconvex functions?

Nonconvergence of BFGS for Nonconvex Functions

Powell (1984): if the stepsize αk can be chosen as any local minimizerof the line search function ψk (α)

Dai (2002): if the stepsize αk is obtained by the Wolfe inexact linesearch

Mascarenhas (2004): if the stepsize αk is chosen to be the globalminimizer of the line search function ψk (α)

Motivation of This Work

Powell (2000) was able to show that the BFGS method convergesglobally for two-dimensional nonconvex functions if the line searchtakes the first local minimizer of ψk (α).

The Aim of This Work is to construct a perfect example for thenonconvergence of the BFGS method with the following properties:

the stepsize is always one; namely, αk ≡ 1;each line search function ψk (α) is convex and hence αk is theunique minimizer of ψk (α);the unit stepsize can be accepted by all the exact and inexact linesearches mentioned before.

Basic Ideas of Constructing Examples

Prefix the concrete forms of the steps {sk ; k ≥ 1} and gradients{gk ; k ≥ 1} with some parameters to be determined later.Consequently, once x1 is given, the whole sequence {xk} is fixed.To enable the BFGS method to generate the prefixed steps,investigate the consistency conditions about the steps {sk ; k ≥ 1}and gradients {gk ; k ≥ 1}.Choose the parameters by some way to satisfy the consistencyconditions and the other necessary conditions on the objectivefunction.Construct a function f whose gradients are the preassignedvalues, namely, ∇f (xk ) = gk for all k ≥ 1 and ensure the linesearch has the desired properties.

Looking for Consistent Steps and Gradients Prefixed Forms of Steps and Gradients

Outline

Denoting the rotation matrix

(cosα − sinαsinα cosα

), R2 =

(cosβ − sinβsinβ cosβ

)we prescribe the forms of the steps {sk} and the gradients {gk}:

s1 = (√

2, 0, γ, τ)T , sk+1 = Msk (k ≥ 1);

g1 = (l , h, 0, −√

2)T , gk+1 = Pgk (k ≥ 1),

(R1 00 tR2

), P =

(tR1 00 R2

In the above, the parameter t ∈ (0,1) answers for the decay of the lasttwo components of sk and the first two components of gk .

The angles α and β are chosen so that

α =2πm1

, β =2πm2

for some integers m1 and m2. Specifically, we choose

α =14π, β =

Therefore, after every eight iterations, the first two components of thesteps {sk} turn to the same while the last two components of {sk}shrink with the factor of t8. This means that {xk} asymptotically turnaround the vertices of a regular octagon that lies in the plane.So is {gk} if we exchange the first and the last two components.

Looking for Consistent Steps and Gradients Unit Stepsizes

Outline

Assumption on the Line SearchWe always assume that the line search satisfies

gTk+1sk = 0. (1)

This with the quasi-Newton equation

Hk+1yk = sk , (2)

and the definition of the search direction

Hk+1gk+1 = −α−1k+1sk+1 (3)

implies that

sTk+1yk = −αk+1gT

k+1Hk+1yk = −αk+1gTk+1sk = 0. (4)

The above relation (4) is called as the conjugacy condition in thecontext of nonlinear conjugate gradient methods.

Expression for Hkyk and Hk+1.

Multiplying the BFGS updating formula with gk+1 and using (1),

Hk+1gk+1 = Hkgk+1 −yT

k Hkgk+1

sTk yk

which with (3) gives

Hkgk+1 = −α−1k+1sk+1 +

yTk Hkgk+1

sTk yk

sk . (5)

Multiplying (5) by gTk and noticing gT

k Hkgk+1 = 0, we get that

0 = −α−1k+1gT

k sk+1 +yT

k Hkgk+1

sTk yk

gTk sk = −α−1

k+1gTk sk+1 − yT

k Hkgk+1.

Thus yTk Hkgk+1 = −α−1

k+1gTk sk+1 = −α−1

k+1gTk+1sk+1, and hence

Hkgk+1 = −α−1k+1sk+1 − α−1

k+1sk+1

sTk yk

sk . (6)

Expression for Hkyk and Hk+1 (ctd.) Thus by (6) andHkgk = −α−1

k sk , we obtain

Hkyk = −α−1k+1sk+1 +

[α−1

k − α−1k+1

gTk+1sk+1

sTk yk

]sk . (7)

Substituting this into the BFGS updating formula

Hk+1 = Hk −skyT

k Hk + HkyksTk

sTk yk

yTk Hkyk

sTk yk

yields

Hk+1 = Hk+α−1k+1

sksTk+1 + sk+1sT

sTk yk

[1− α−1

k + α−1k+1

gTk+1sk+1

sTk yk

Lemma 1. Assume that gTk+1sk = 0 for all k ≥ 1. Then for all k ≥ 1

and i ≥ 0, we have that

Hkgk+i + α−1k+isk+i ∈ Span{sk ,sk+1, . . . ,sk+i−1}. (9)

Proof. For convenience, we write (8)

Hk+1 = Hk + V (sk ,sk+1), (10)

where V (sk ,sk+1) means the rank-two matrix in the right hand of (8).Therefore for all i ≥ 1, we have that

Hk+i = Hk +i∑

V (sk+j−1,sk+j). (11)

The statement follows by multiplying the above relation with gk+i andusing Hk+igk+i = −α−1

k+isk+i and gTk+isk+i−1 = 0.

Lemma 2. To construct a desired example with Det(S1) 6= 0, we musthave that

αk = 1, for all k ≥ 4.

proof. Define

Gk =[yk−1 gk gk+1 gk+2

[sk−1 sk sk+1 sk+2

Then by Hkyk−1 = sk−1 and Lemma 1, we can get that

Det(Hk )Det(Gk ) = Det[Hkyk−1 Hkgk Hkgk+1 Hkgk+2

]= Det

[sk−1 − α−1

k sk − α−1k+1sk+1 − α−2

k+2sk+2]

= −α−1k α−1

k+1α−1k+2Det(Sk ).

(12)Replacing k with k + 1 in the above yields

Det(Hk+1)Det(Gk+1) = −α−1k+1α

−1k+2α

−1k+3Det(Sk+1). (13)

On the other hand, due to the special forms of {gk} and {sk}, we havethat Gk+1 = PGk and Sk+1 = MSk . Hence

Det(Gk+1) = Det(P) Det(Gk ) = t2 Det(Gk ),

Det(Sk+1) = Det(M) Det(Sk ) = t2 Det(Sk ),(14)

Due to the basic determinant relation of BFGS and the assumption,

Det(Hk+1) =sT

k H−1k sk

sTk yk

Det(Hk ) = αkDet(Hk ). (15)

If Det(S1) 6= 0, (12) with k = 1 implies that Det(G1) 6= 0. Then by (14),

Det(Gk ) 6= 0, Det(Sk ) 6= 0, for all k ≥ 1. (16)

Dividing (13) by (12) and using the above relations, we then obtain

αk+3 = 1.

So the statement holds due to the arbitrariness of k ≥ 1.Yu-Hong Dai (CAS) A Perfect Example for BFGS December, 2009 20 / 70

The deletion of the first finite iterations does not influence the wholeexample. Thus we will ask our counter-example to satisfy

Det(S1) 6= 0 (17)

andαk ≡ 1. (18)

Looking for Consistent Steps and Gradients Consistency Conditions

Outline

Question. What else conditions on {gk} and {sk} which ensure thesequence of {sk ; k ≥ 1} can be generated by the BFGS update?

Under the assumptions (1) (i.e. gTk+1sk = 0), (17) and (18), the

updating formula of Hk in (8) can be simplified as

Hk+1 = Hk +sksT

k+1 + sk+1sTk

sTk yk

k+1sk+1

gTk sk

sTk yk

. (19)

An early idea is to consider the linear system formed by the aboverelations and

H8(j+1) = diag(t−4E2, t4E2)H8j ,

where E2 is the 2-dimensional identity matrix. It seems difficult toanalyze with the above formula directly.

Another Observation. In the case of the dimension n = 4 and thelinear independence assumption of the steps, the matrix Hk+1 can beuniquely defined by the equations given by Hk+1yk , Hk+1gk+1,Hk+1gk+2 and Hk+1gk+3. As a matter of fact, we have that

Hk+1yk = sk (quasi-Newtion equations) (20)Hk+1gk+1 = −sk+1 (using (3) and αk ≡ 1) (21)

Hk+1gk+2 = −sk+2 +gT

k+2sk+2

gTk+1sk+1

sk+1 (by (6) and αk ≡ 1) (22)

Hk+1gk+3 = −sk+3 +

k+3sk+3

gTk+2sk+2

k+3sk+1

gTk+1sk+1

k+2sk+2

gTk+1sk+1

k+3sk+1

gTk+1sk+1

)sk+1 (23)

The last equality is obtained by multiplying (19) by gk+2, using (22)and finally replacing k with k + 1.

Relations (20)-(23) provide a system of 16 equations, while thesymmetric matrix Hk+1 only has 10 independent entries.

Question. How to ensure that this linear system has a symmetricsolution H?

Lemma 3. Assume that {u1,u2, . . . ,un} and {v1,v2, . . . ,vn} are twosets of n-dimensional linearly independent vectors. Then there exists asymmetric matrix H ∈ Rn×n satisfying

Hui = vi , i = 1,2, . . . ,n (24)

if and only ifuT

i vj = uTj vi , ∀ i , j = 1,2, . . . ,n. (25)

Proof. The “only if" part. If H = HT satisfies (24), we have for alli , j = 1,2, . . . ,n, uT

i vj = uTi Huj = uT

i HT uj = (Hui)T uj = vT

i uj .The “if" part. Assume that (25) holds. Defining the matrices

U = (u1 u2 . . . un) , V = (v1 v2 . . . vn) ,

direct calculations show that

UT HU = UT V =

1 v1 uT1 v2 · · · uT

1 vnuT

2 v1 uT2 v2 · · · uT

2 vn· · · · · · · · · · · ·

uTn v1 uT

n v2 · · · uTn vn

:= A. (26)

By (25), A is symmetric. So H = U−T AU−1 satisfies H = HT and (24).Yu-Hong Dai (CAS) A Perfect Example for BFGS December, 2009 27 / 70

Using Lemma 3, the following six conditions are sufficient for the linearsystem (20)-(23) to allow a symmetric matrix Hk+1.

gTk+1(Hk+1yk ) = (Hk+1gk+1)

gTk+2(Hk+1gk+1) = (Hk+1gk+2)

T gk+1

gTk+3(Hk+1gk+1) = (Hk+1gk+3)

T gk+1

gTk+3(Hk+1gk+2) = (Hk+1gk+3)

T gk+2

Considering the whole sequence of {Hk+1; k ≥ 1} and combining theline search condition gT

k+1sk = 0, we require the following fourconditions hold for all k ≥ 1.

gTk+1sk = 0 (27)

sTk+1yk = 0 (28)

gTk+2sk = −sT

k+2yk (29)

gTk+3sk = −sT

k+3yk +

k+3sk+3

gTk+2sk+2

k+3sk+1

gTk+1sk+1

k+2yk (30)

Looking for the expression for matrix Hk+1

Lemma 4. Assume that (25) holds and the matrix H satisfies (24). Iffurther, the matrix A in (26) is positive definite, then there must existnonsingular triangular matrices T1 and T2 such that

A = V T U = T−T1 T2.

Thus if denoting V = VT1, we have that

V T U = T2.

Since HU = V , we get that

H = VU−1 =(

VT−11

) (T−1

2 V T)

T−11 T−1

)V T .

Assuming that the diagonal matrix T−11 T−1

2 has diagonal entriest1, t2, . . . , tn, V = (v1, v2, . . . , vn), we obtain

H =n∑

ti vi vTi .

To express Hk+1, we make use of the steps {sk+i ; i = 0,1,2,3} tointroduce the following two vectors that are orthogonal to yk and gk+1:

zk = −sTk+2yk

sTk yk

sk −sT

k+2gk+1

sTk+1gk+1

sk+1 + sk+2,

wk = −sTk+3yk

sTk yk

sk −sT

k+3gk+1

sTk+1gk+1

sk+1 + sk+3.(31)

The vectors zk and wk are well defined because sTk yk = −sT

k gk > 0 ispositive due to the descent property of dk .

Direct calculations show that

zTk yk = zT

k gk+1 = 0, wTk yk = wT

k gk+1 = 0. (32)

Further, we define the vector that is orthogonal to gk+2:

vk = −wT

k gk+2

zTk gk+2

zk + wk . (33)

By the choice of vk , it is easy to see that

vTk yk = vT

k gk+1 = vTk gk+2 = 0. (34)

If zTk gk+2 < 0 and vT

k gk+3 < 0, the following matrix Hk+1 is positiveand satisfies all the relations (20), (21), (22) and (23):

Hk+1 = ak sksTk + bk sk+1sT

k+1 + ck zkzTk + dk vkvT

k , (35)

sTk yk

, bk = − 1sT

k+1gk+1, ck = − 1

zTk gk+2

, dk = − 1vT

k gk+3.

At the same time, the positive definiteness of Hk+1 requires

ak < 0, bk < 0, ck < 0, dk < 0.

Looking for Consistent Steps and Gradients Choosing Parameters

Outline

Now we turn to how to choose suitable parameters such that theconsistency conditions (27), (28), (29) and (30) hold. For this, wedenote

v = (∆1 ∆2 ∆3 ∆4)T ,

∆1 =√

2 l , ∆2 =√

2 h, ∆3 = −√

2 τ, ∆4 = −√

2 γ. (37)

The condition sT1 g2 = sT

1 Pg1 = 0, namely, (27) with k = 1, asks

[t cosα − t sinα cosβ − sinβ] v = 0. (38)

The condition sT2 y1 = sT

1 MT (P − I)g1 = sT1 (tI −MT )g1 = 0, namely,

(28) with k = 1, requires the vector v to satisfy

[t − cosα − sinα t(1− cosβ) − t sinβ] v = 0. (39)

The requirement (29) with k = 1, that is,

0 = sT1 g3 + sT

3 y1 = sT1 P2g1 + sT

1 (M2)T (P − I)g1

= sT1 [P2 + tMT − (M2)T ]g1,

yields the equation[t cosα+ (t2 − 1) cos 2α t sinα− (1 + t2) sin 2α

t2(cosβ − cos 2β) + cos 2β t2(sinβ − sin 2β)− sin 2β]

v = 0.(40)

By (38), (39), (40) and the choice of α and β, we see that {∆i} mustsatisfy

2 t −√

22 t −

2 −√

t −√

22 −

2 (1 +√

22 )t −

2 t√

2 t − (1 + t2) −√

22 t2 (

2 + 1)t2 + 1

∆1∆2∆3∆4

(41)By the Cramer’s rule, to meet (41), we may choose {∆i} as follows:

∆1 = ±[(2 +

√2)t3 + (1 +

√2)t + 1

∆2 = ±[(2 +

√2)t3 + (1 +

√2)t − 1

∆3 = ±[−√

2 t3 + 2t2 + (1−√

2)t + 1],

∆4 = ±[√

2 t3 − 2t2 + (1 +√

2)t − 1].

Since gT1 s1 = ∆1 + ∆3 = ±2(t + 1)(t2 + 1) and t ∈ (0,1), we choose

all the signs in (42) as − so that gT1 s1 < 0.

Noting that sT3 y1 = −gT

3 s1 and gT4 s4 = t gT

3 s3, we know that (30) withk = 1 is equivalent to

gT4 s1 + gT

2 s4 − gT1 s4 + gT

gT4 s2

gT2 s2

]= 0. (43)

Further, using gT2 s2 = t gT

1 s1, gT4 s2 = t gT

3 s1 and gT2 s4 = t gT

1 s3, (43)can be simplified as

gT1 s1

4 s1 − gT1 s4 + t(gT

3 s1 + gT1 s3)

]+ (gT

3 s1)2 = 0. (44)

Consequently, we obtain the following equation for the parameter t :

(t + 1)2(t2 + 1)2[p2(t) + 2tp(t)− 2q(t)

]= 0, (45)

p(t) = −(2 +√

2)t2 + (2 +√

2)t − 1,q(t) = (2 + 3

√2)t3 − (4 + 3

√2)t2 + (3 + 2

√2)t − 2

Further calculations provide(6 + 4

√2)−1 [

p2(t) + 2tp(t)− 2q(t)]

=[t2 + (1−

√2)t + (1−

2 )] [

t2 + (1− 3√

2)t + (−3 + 72

√2)].

By the above relation, it is not difficult to show that the equation (45)has a unique root in the interval (−1, 1), that is

t =3√

2− 1−√

31− 20√

(≈ 0.7973) , (47)

which satisfies

1− 3√

(−3 +

Therefore if we choose the above t and the vectors s1 and g1 such that(42) holds with minus sign, then the prefixed steps and gradientssatisfy the consistency conditions (27), (28), (29) and (30) for all k ≥ 1.

Furthermore, by (42) with minus signs and the definitions of ∆i ’s, wecan obtain

l = (−4− 13√

2)t + (−1 + 11√

h = (−4− 13√

2)t + (−1 + 12√

γ = (17− 8√

2)t + (−17 + 9√

τ = (−17 + 9√

2)t + (17− 9√

Therefore the vectors s1 and g1 are

(17− 8√

2)t + (−17 + 9√

(−17 + 9√

2)t + (17− 9√

(−4− 13

√2)t + (−1 + 11

(−4− 13√

2)t + (−1 + 12√

−√

Remark. Numerically, other choices for α and β are possible. Forexample,

α =27π, β =

which could lead to a counter-example of 7 cyclic points. In this case,however, it is difficult to get its analytical solution and we can onlyobtain a numerical solution of t ≈ 0.8642 in (0,1).

Constructing A Suitable Objective Function Recovering the iterations and function values

Outline

For i = 1, . . . ,8, the limits x∗i := limj→∞

x8j+i and g∗i := limj→∞

x8j+i exist.

Suppose that

x∗i =

aibi00

, g∗i =

00cidi

; i = 1, . . . ,8

witha1 = −

2, b1 = −1−

2, c1 = 0, d1 = −

We have that (ai+1bi+1

(ci+1di+1

We can see that {x∗i : i = 1, . . . ,8} are exactly the vertices of a regularoctagon that lies in the plane spanned by the first and secondcoordinates and has the origin as its center.

Due to the special choices of {sk} and {gk}, we have

x8j+i =

t8jpit8jqi

, g8j+i =

t8j lit8jhicidi

; i = 1, . . . ,8

with (pi+1qi+1

)= t R2

(li+1hi+1

)= t R1

To get the values of p1 and q1, we denote v to be the vector formed bythe last two components of v. We have that

x∗1 − x1 =∞∑

si =∞∑

(tR2)i s1 = (E − tR2)

−1(γτ

where E is the identity matrix. Consequently, we have that(p1q1

)= x1 =

((−15720 + 4019

√2)t + (32931− 17534

(4376− 1977√

2)t + (9563− 9433√

Assuming that the limit of f (xk ) is f ∗, we have that

f (x8j+i)− f ∗ =(g∗i)T (x8j+i − x∗i

)= t8j

)T (piqi

)= t8j

[R i−1

)]T [(tR2)

i−1(

)]= t8j+i−1

)T (p1q1

Consequently,

f (x1)− f ∗ = c1p1 + d1q1 = (3954−4376√

2)t+(18866−9563√

2)4633 ,

f (x8j+i)− f ∗ = t8j+i−1(

(λ′i(ρ)µ′i(ρ)

))= t8j+i−1 (f (x1)− f ∗) .

Now we see the value of gT8j+is8j+i . Denoting

s8j+i =

ηiξi

t8jγit8jτi

we have η1 =√

2, ξ1 = 0, γ1 = γ, τ1 = τ and(ηiξi

(ηi−1ξi−1

(γiτi

)= t R2

(γi−1τi−1

gT8j+is8j+i =

t8j lit8jhicidi

ηiξi

t8jγit8jτi

lihicidi

ηiξiγiτi

= t8j+i−1(l1η1 + h1ξ1 + c1γ1 + d1τ1) = t8j+i−1gT

where gT1 s1 =

√2(l1 − τ1) = (−44 + 13

√2)t + (40− 18

√2).

Therefore

f (x8j+i+1)− f (x8j+i)

α8j+igT8j+is8j+i

=(f (x8j+i+1)− f ∗)− (f (x8j+i)− f ∗)

gT8j+is8j+i

=(t8j+i − t8j+i−1)(f (x1)− f ∗)

t8j+i−1gT1 s1

=(t − 1)(f (x1)− f ∗)

gT1 s1

≈ 2.6483e − 02.

The above relation, together with gT8j+i+1s8j+i = 0, implies that the

stepsize α8j+i = 1 can be accepted by either the Wolfe line search orthe Armijo line search with suitable line search parameters.

Constructing A Suitable Objective Function Seeking A Suitable Form for the Objective Function

Outline

To be such that∇f (x8j+i) = g8j+i ,

we assume that the objective function f is of the form

f (x1, x2, x3, x4) = λ(x1, x2)x3 + µ(x1, x2)x4,

where λ and µ are 2-dimensional functions to be determined. Denoting

)and Ai =

(∂λ(Vi )

∂µ(Vi )∂x1

∂λ(Vi )∂x2

∂µ(Vi )∂x2

the functions λ and µ have to satisfy(λ(Vi)µ(Vi)

). (50)

Recalling the relations(pi+1qi+1

)= t R2

(li+1hi+1

)= t R1

we shall ask Ai and Ai+1 to meet the condition

Ai+1 = R1AiRT2 (51)

so that the relation (50) satisfies for all i provided that it holds withi = 1. This is because if (50) and (51) is true, we have

(pi+1qi+1

)= (R1AiRT

2 )(tR2)

)= tR1Ai

)= tR1

(li+1hi+1

Therefore we can focus our attention on the choice of the matrix

(ω1 ω2ω3 ω4

Now we shall consider the restrictions of the functions λ and µ on theedges of the octagon and define

λi(α) = λ(Vi + αsi), µi(α) = µ(Vi + αsi),

si = Vi+1 − Vi =

(ηiξi

)is the vector formed by the first two components of s8j+i . The relation(49) implies that(

λi+1(0)µi+1(0)

(λi(0)µi(0)

)for all i ≥ 1.

At the same time, we have that(λ′i+1(0)

µ′i+1(0)

i+1si+1 =(

R2ATi RT

)(R1si) = R2AT

i si = R2

(λ′i(0)µ′i(0)

)holds for all i ≥ 1.

Based on the above conditions, we are going to construct the functionsλi ’s and µi ’s such that(

λi+1(α)µi+1(α)

(λi(α)µi(α)

)for all i ≥ 1 and α ∈ <1. (52)

Constructing A Suitable Objective Function Constructing The Functions on The Lines

Outline

Consider the following extension of the line search functions

ψi(α) = f (xi + αsi), α ∈ <1.

We shall concentrate on the choices of λ1(·), µ1(·) and ψ1(·).Assuming that f ∗ = 0, it is obvious that

ψ1(0) = f (x1) = c1p1 + d1q1, (53)

ψ1(1) = ψ2(0) = t f (x1), (54)

ψ′1(0) = gT1 s1 =

√2 (l1 − τ1), (55)

ψ′1(1) = gT2 s1 = 0, (56)

Suppose that we have constructed a function ψ1 satisfying all the fourconditions (53)-(56) and some special requirements.

There are many ways to choose the line search functions. Thefollowing is one of them with good theoretical properties.

ψ1(α) = ζ1(α− 1)38 + ζ2(α− 1)2 + ζ3,

ζ1 = (173256−38127√

2)t+(−138064+48586√

2)166788 ≈ 1.5468E − 1,

ζ2 = (377472−359709√

2)t+(−712544+577958√

2)166788 ≈ 1.0405E − 3,

ζ3 = (−11344+6675√

2)t+(42494−26967√

2)4633 ≈ 6.1270E − 1.

We shall now ask what conditions have to be satisfied by λ1 and µ1.Since ψ1 is the restriction of f on the straight line

x1 + αs1 =

a1b1p1q1

η1ξ1γ1τ1

we have that

ψ1(α) = [p1 + αγ1]λ1(α) + [q1 + ατ1]µ1(α). (57)

Now, by (49),(λ1(0)µ1(0)

(λ(V1)µ(V1)

(λ1(1)µ1(1)

(λ(V2)µ(V2)

), (58)

which is obviously consistent with (53) and (54).

One has to be careful to choose the derivatives λ′1(0), µ′1(0), λ′1(1) andµ′1(1) to meet

). (59)

There are some relations between these values and A1

AT1 s1 =

(λ′1(0)µ′1(0)

), (60)

AT1 s8 =

(λ′8(1)µ′8(1)

(λ′1(1)µ′1(1)

). (61)

By the expression of ψ1, we know that

ψ′1(α) = γ1λ1(α) + τ1(α) + [p1 + αγ1]λ′1(α) + [q1 + ατ1]µ

′1(α).

Setting α = 0 and 1 respectively, we obtain the following moreconditions (

)T (λ′1(0)µ′1(0)

)= ψ′1(0)− γ1λ1(0)− τ1µ1(0), (62)

)T (λ′1(1)µ′1(1)

)= ψ′1(1)− γ1λ1(1)− τ1µ1(1). (63)

The relations (59)–(63) form a system of 8 equations with 8 variables

ω1, ω2, ω3, ω4, λ′1(0), µ′1(0), λ′1(1), µ′1(1).

Therefore after the function ψ1(α) has been chosen, we can derivethose derivatives uniquely in the above way. This, together with therelation (52), also imposes the relation (51) between Ai+1 and Ai .

Attention must be also given to the cross points of the straight lineconnecting V1 and V2 and the others. It is easy to deduce that(

λ1(1 +√

µ1(1 +√

(λ3(−

µ3(−√

(λ1(−

µ1(−√

). (64)

Setting α to be 1 +√

22 and −

2 , respectively, yields(p1 + (1 +

2 )γ1

q1 + (1 +√

22 )τ1

)T (λ1(1 +

µ1(1 +√

)= ψ1(1 +

2), (65)

(p1 −

q1 −√

22 τ1

)T (λ1(−

µ1(−√

)= ψ1(−

2). (66)

The above system of 4 linear equations decides the values of

λ1(1 +

2), µ1(1 +

2), λ1(−

2), µ1(−

In a similar way, we must have that(λ1(2 +

µ1(2 +√

(λ4(−1−

µ4(−1−√

(λ1(−1−

µ1(−1−√

). (67)

Setting α to be 2 +√

2 and −1−√

2, respectively, yields(p1 + (2 +

√2)γ1

q1 + (2 +√

)T (λ1(2 +

µ1(2 +√

)= ψ1(2 +

√2), (68)

(p1 − (1 +

√2)γ1

q1 − (1 +√

)T (λ1(−1−

µ1(−1−√

)= ψ1(−1−

√2). (69)

The above system of 4 linear equations decides the values of

λ1(2 +√

2), µ1(2 +√

2), λ1(−1−√

2), µ1(−1−√

Now we can construct a twice-continuous differentiable function λ1 byfor example Lagrangian interpolation such that the following value

λ1(−1−√

2), λ1(−√

), λ1(0), λ′1(0), λ1(1), λ′1(1), λ1(1+

2), λ1(2+

is in accordance with the required one.

The function µ1 is then obtained from (57)

µ1(α) =ψ1(α)− (p1 + αγ1)λ1(α)

q1 + ατ1.

Some more attention could be made so that µ1 is twice-continuouslydifferential function.

Constructing A Suitable Objective Function Extending the Functions to the Whole Plane

Outline

Define the eight straight lines

L1 = {(x1, x2) : x2 + (1 +√

22 ) = 0},

L2 = {(x1, x2) : x1 − x2 − (1 +√

2) = 0},L3 = {(x1, x2) : x1 − (1 +

2 ) = 0},L4 = {(x1, x2) : x1 + x2 − (1 +

√2) = 0},

L5 = {(x1, x2) : x2 − (1 +√

22 ) = 0},

L6 = {(x1, x2) : x1 − x2 + (1 +√

2) = 0},L7 = {(x1, x2) : x1 + (1 +

2 ) = 0},L8 = {(x1, x2) : x1 + x2 + (1 +

√2) = 0},

which are the extensions of the eight edges of the octagon.

To complete the construction of the objective function f , we have todeduce the twice-continuously differentiable extensions of thefunctions λ and µ, which are already defined on the straight lines{Li : i = 1, . . . ,8}.

An Illustrative ExampleGiven a 1-dimensional polynomial function

f (x), x ∈ R1,

with zero roots at x = 0 and x = 1. How to extend this function to R2

so that the extension function F (x , y) satisfies

(1) F (x ,0) = f (x), for x ∈ R1,

(2) F (x , y) = 0, if x = 0,(3) F (x , y) = 0, if x + y = 1,(4) F (x , y) = 0, if y = 1?

Answer. Since f (x) is polynomial, we can write

f (x) = x(x − 1)h(x).

The following extension function is one desired function:

F (x , y) = x(x + y − 1)(1− y) h(x).

There are 24 crossing points altogether of the eight lines, including thevertices {Vi : i = 1, . . . ,8} of the octagon. Assume that the othercrossing points are {Ci : i = 1, . . . ,8}. Then it is possible to constructa two-dimensional eight-order polynomial function φ such that thefunction values of φ at Ci ’s are accordance to those of λ, and thefunction and gradient values of φ at Vi ’s are accordance to thecorresponding values of λ.

Denote the restrictions of φ to each line Li to be φi . For each i , we aregoing to construct a function ei(x1, x2) such that

ei(x1, x2) =

{λi − φi , (x1, x2) ∈ Li0, (x1, x2) ∈ ∪j 6=iLj .

Thus the sum function e(x1, x2) =∑8

i=1 ei(x1, x2) is accordance withλi − φi on the line Li for each i . Therefore

λ(x1, x2) = φ(x1, x2) + e(x1, x2)

is the desired extension. The extension µ(x1, x2) can be similarlyobtained.

Constructing A Suitable Objective Function Finals form of the objective function

Outline

If we only ask the Wolfe conditions to be satisfied,

f (x1, x2, x3, x4) = λ(x1, x2)x3 + µ(x1, x2)x4,

µ(x1, x2) = λ(−x2, x1) and λ(−x1,−x2) = −λ(x1, x2).

More exactly,

λ(x1, x2) = (15− 212

√2)x5

1 + (6− 92

√2)(x4

1 x2 + 2x21 x3

+ (√

2− 1)(6x21 x2 + x3

2 ) + (−15 + 10√

+ (154− 15

√2)x1 −

√2x2 + ω1λ(x1, x2) + ω3λ(−x2, x1),

λ(x1, x2) =3− 2

1 − 1)(2x21 − (3 + 2

√2))x2,

If the stepsize is chosen to be the first local minimizer,

f (x1, x2, x3, x4) = λ(x1, x2)x3 + µ(x1, x2)x4,

µ(x1, x2) = λ(−x2, x1) and λ(−x1,−x2) = −λ(x1, x2).

More exactly,

λ(x1, x2) := λ(x1, x2) +(

x21 + x2

2 − (2 +√

2))2·∑

[wj λj(x1, x2) + uj λj(x2, x1)

λj(x1, x2) = x31 − 3x1x2

2 , x51 − 2x3

1 x22 − 3x1x4

4 , . . .

Conclusion and Discussion

A perfect example

1,14π,

34π, 8,

2− 1−√

31− 20√

A 3-dimensional example?Any helps in analyzing the convergence of DFP?Any helps in improving the practical BFGS algorithm?

Conclusion and Discussion

THANK YOU VERY MUCH !

a perfect example for the bfgs...

Documents

research -...

seismic impedance inversion using l -norm...

modeling of frequency-domain elastic-wave equation with an...

earth and planetary...

basin research (2001) 13, 141–167 carbonate sedimentation...

modeling the effects of secular variation of geomagnetic...

journal of asian earth...

an interactive program on digitizing historical...

origin of sulfur rich oils and h s in tertiary lacustrine...

paleozoic multiple subduction-accretion processes of...

earth and planetary...

provided for non-commercial research and educational...

earth and planetary science letters -...

refertilization of ancient lithospheric mantle beneath the...

magmatic underplating beneath the emeishan large igneous...

comparison of particle size characteristics of the...

contents lists available at sciencedirect journal of...

an efficient step-length formula for correlative least...

pleistocene chemical weathering history of asian arid and...

diffraction imaging by uniform asymptotic theory and...