A Perfect Example for the BFGS Method
Yu-Hong Dai
State Key Laboratory of Scientific and Engineering Computing
Institute of Computational Mathematics and Scientific/Engineering Computing
Academy of Mathematics and Systems Science
Chinese Academy of Sciences, Beijing 100190, China
Email: dyh@lsec.cc.ac.cn
http://lsec.cc.ac.cn/~dyh
July 12, 2010
Yu-Hong Dai (CAS) A Perfect Example for BFGS December, 2009 1 / 70
Outline
1 The BFGS Quasi-Newton Method
2 Looking for Consistent Steps and Gradients
    Prefixed Forms of Steps and Gradients
    Unit Stepsizes
    Consistency Conditions
    Choosing Parameters
3 Constructing A Suitable Objective Function
    Recovering the iterations and function values
    Seeking A Suitable Form for the Objective Function
    Constructing The Functions on The Lines
    Extending the Functions to the Whole Plane
    Final form of the objective function
4 Conclusion and Discussion
The BFGS Quasi-Newton Method
Quasi-Newton Methods
Unconstrained Optimization
$$\min_{x \in \mathbb{R}^n} f(x)$$
Quasi-Newton Methods
$$x_{k+1} = x_k + \alpha_k d_k, \qquad d_k = -H_k g_k$$
How does $H_{k+1}$ get close to $[\nabla^2 f(x_{k+1})]^{-1}$? Defining
$$s_k = x_{k+1} - x_k, \qquad y_k = g_{k+1} - g_k,$$
the answer by Davidon (1959) is to ask that $H_{k+1}$ be updated from $H_k$ and match the second-derivative information along the previous step:
$$H_{k+1} y_k = s_k$$
Broyden’s Convex Family of Methods
The DFP Method
$$H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{s_k^T y_k}$$
The BFGS Method
$$H_{k+1} = H_k - \frac{s_k y_k^T H_k + H_k y_k s_k^T}{s_k^T y_k} + \left(1 + \frac{y_k^T H_k y_k}{s_k^T y_k}\right)\frac{s_k s_k^T}{s_k^T y_k}$$
Broyden's Convex Family of Methods ($\phi \in [0, 1]$)
$$H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{s_k^T y_k} + \phi\, v_k v_k^T,$$
where
$$v_k = \sqrt{y_k^T H_k y_k}\left(\frac{s_k}{s_k^T y_k} - \frac{H_k y_k}{y_k^T H_k y_k}\right).$$
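The three update formulas above can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated test data (the function names `broyden_update` and `bfgs_update` and the test matrices are illustrative and not part of the slides). It confirms that each update satisfies the quasi-Newton equation $H_{k+1} y_k = s_k$ and that the Broyden family with $\phi = 1$ reproduces the BFGS formula.

```python
import numpy as np

def broyden_update(H, s, y, phi):
    """Broyden convex-family update of the inverse Hessian approximation H.
    phi = 0 gives DFP, phi = 1 gives BFGS. H is assumed symmetric."""
    Hy = H @ y
    sy = s @ y                      # curvature s^T y, assumed positive
    yHy = y @ Hy
    v = np.sqrt(yHy) * (s / sy - Hy / yHy)
    return H - np.outer(Hy, Hy) / yHy + np.outer(s, s) / sy + phi * np.outer(v, v)

def bfgs_update(H, s, y):
    """BFGS update in the explicit form written on the slide."""
    Hy = H @ y
    sy = s @ y
    return (H - (np.outer(s, Hy) + np.outer(Hy, s)) / sy
            + (1.0 + (y @ Hy) / sy) * np.outer(s, s) / sy)

# Illustrative data: a symmetric positive definite H and a pair (s, y) with s^T y > 0.
rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
H0 = B @ B.T + n * np.eye(n)
C = rng.standard_normal((n, n))
s = rng.standard_normal(n)
y = (C @ C.T + np.eye(n)) @ s       # y = (SPD matrix) s guarantees s^T y > 0

H_bfgs = bfgs_update(H0, s, y)
H_dfp = broyden_update(H0, s, y, 0.0)
H_phi1 = broyden_update(H0, s, y, 1.0)
```

Because $v_k^T y_k = 0$, every member of the family satisfies the same quasi-Newton equation; only the component along $v_k$ differs.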
Theoretical and Practical Line Searches
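Line Search Function
$$\psi_k(\alpha) = f(x_k + \alpha d_k), \quad \text{where } \alpha \in [0, +\infty)$$
Global Exact Line Search: $\psi_k(\alpha_k^*)$ is the global minimum of $\psi_k(\alpha)$
Curry Line Search: $\alpha_k^*$ is the first local minimizer of $\psi_k(\alpha)$
Wolfe Inexact Line Search: find $\alpha_k$ such that
$$\psi_k(\alpha_k) \le \psi_k(0) + \sigma_1 \psi_k'(0)\,\alpha_k, \qquad \psi_k'(\alpha_k) \ge \sigma_2 \psi_k'(0)$$
Armijo Inexact Line Search: choose $\alpha_k$ to be the maximal value of $\{\lambda^m : m \ge 0\}$, where $\lambda \in (0, 1)$, such that
$$\psi_k(\alpha_k) \le \psi_k(0) + \sigma \psi_k'(0)\,\alpha_k$$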
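The two inexact rules can be written as small predicates on the one-dimensional function $\psi_k$. The sketch below is illustrative only: the function names, the default parameter values ($\sigma_1 = 10^{-4}$, $\sigma_2 = 0.9$, $\lambda = 0.5$) and the quadratic test function are assumptions, since the slides do not fix them.

```python
def wolfe_ok(psi, dpsi, alpha, sigma1=1e-4, sigma2=0.9):
    """Check the two Wolfe conditions at a trial stepsize alpha.
    psi and dpsi are the line search function and its derivative."""
    sufficient_decrease = psi(alpha) <= psi(0.0) + sigma1 * dpsi(0.0) * alpha
    curvature = dpsi(alpha) >= sigma2 * dpsi(0.0)
    return sufficient_decrease and curvature

def armijo_step(psi, dpsi0, lam=0.5, sigma=1e-4, max_m=50):
    """Return the largest lam**m (m = 0, 1, ...) satisfying the Armijo condition."""
    for m in range(max_m + 1):
        alpha = lam ** m
        if psi(alpha) <= psi(0.0) + sigma * dpsi0 * alpha:
            return alpha
    raise RuntimeError("no acceptable stepsize found")

# Illustrative convex line search function whose minimizer is alpha = 1.
psi = lambda a: (a - 1.0) ** 2
dpsi = lambda a: 2.0 * (a - 1.0)

accepted_unit = wolfe_ok(psi, dpsi, 1.0)
armijo_alpha = armijo_step(psi, dpsi(0.0))
```

For this test function the unit stepsize is both the exact minimizer and acceptable to the Wolfe and Armijo rules, which is the situation the example in these slides is designed to produce at every iteration.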
Convergence of BFGS for Convex Functions
Powell (1976):
The BFGS method with Wolfe inexact line searches is globally convergent for uniformly convex functions.
Byrd, Nocedal and Yuan (1987):
Broyden's convex family of methods (except DFP) with Wolfe inexact line searches is globally convergent for (not necessarily uniformly) convex functions.
Two Open Convergence Questions
Nocedal (1992), Fletcher (1994), et al.:
Open Question (I): Does the DFP method with Wolfe line searches converge for uniformly convex functions?
Open Question (II): Does the BFGS method with Wolfe line searches converge for general nonconvex functions?
Nonconvergence of BFGS for Nonconvex Functions
The BFGS method may fail to converge for nonconvex functions in each of the following settings:
Powell (1984): if the stepsize $\alpha_k$ can be chosen as any local minimizer of the line search function $\psi_k(\alpha)$
Dai (2002): if the stepsize $\alpha_k$ is obtained by the Wolfe inexact line search
Mascarenhas (2004): if the stepsize $\alpha_k$ is chosen to be the global minimizer of the line search function $\psi_k(\alpha)$
Motivation of This Work
Powell (2000) was able to show that the BFGS method converges globally for two-dimensional nonconvex functions if the line search takes the first local minimizer of $\psi_k(\alpha)$.
The aim of this work is to construct a perfect example for the nonconvergence of the BFGS method with the following properties:
the stepsize is always one, namely $\alpha_k \equiv 1$;
each line search function $\psi_k(\alpha)$ is convex, and hence $\alpha_k$ is the unique minimizer of $\psi_k(\alpha)$;
the unit stepsize can be accepted by all the exact and inexact line searches mentioned before.
Basic Ideas of Constructing Examples
1. Prefix the concrete forms of the steps $\{s_k; k \ge 1\}$ and gradients $\{g_k; k \ge 1\}$ with some parameters to be determined later. Consequently, once $x_1$ is given, the whole sequence $\{x_k\}$ is fixed.
2. To enable the BFGS method to generate the prefixed steps, investigate the consistency conditions on the steps $\{s_k; k \ge 1\}$ and gradients $\{g_k; k \ge 1\}$.
3. Choose the parameters so as to satisfy the consistency conditions and the other necessary conditions on the objective function.
4. Construct a function $f$ whose gradients are the preassigned values, namely $\nabla f(x_k) = g_k$ for all $k \ge 1$, and ensure that the line search has the desired properties.
Looking for Consistent Steps and Gradients Prefixed Forms of Steps and Gradients
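Denoting the rotation matrices
$$R_1 = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}, \qquad R_2 = \begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix},$$
we prescribe the forms of the steps $\{s_k\}$ and the gradients $\{g_k\}$:
$$s_1 = (\sqrt{2},\, 0,\, \gamma,\, \tau)^T, \qquad s_{k+1} = M s_k \quad (k \ge 1);$$
$$g_1 = (l,\, h,\, 0,\, -\sqrt{2})^T, \qquad g_{k+1} = P g_k \quad (k \ge 1),$$
where
$$M = \begin{pmatrix} R_1 & 0 \\ 0 & tR_2 \end{pmatrix}, \qquad P = \begin{pmatrix} tR_1 & 0 \\ 0 & R_2 \end{pmatrix}.$$
In the above, the parameter $t \in (0,1)$ accounts for the decay of the last two components of $s_k$ and the first two components of $g_k$.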
The angles $\alpha$ and $\beta$ are chosen so that
$$\alpha = \frac{2\pi}{m_1}, \qquad \beta = \frac{2\pi}{m_2},$$
for some integers $m_1$ and $m_2$. Specifically, we choose
$$\alpha = \frac{1}{4}\pi, \qquad \beta = \frac{3}{4}\pi.$$
Therefore, after every eight iterations, the first two components of the steps $\{s_k\}$ return to the same values, while the last two components of $\{s_k\}$ shrink by a factor of $t^8$. This means that $\{x_k\}$ asymptotically cycles around the vertices of a regular octagon that lies in a plane. The same holds for $\{g_k\}$ if we exchange the roles of the first two and the last two components.
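The eight-step periodicity can be seen directly from the block structure of $M$ and $P$. The sketch below assumes NumPy and uses an illustrative value $t = 0.8$ (the actual value of $t$ is determined later in the slides); the helper name `rot` is hypothetical.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix by the angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

t = 0.8                                   # illustrative value in (0, 1)
R1, R2 = rot(np.pi / 4), rot(3 * np.pi / 4)
Z = np.zeros((2, 2))
M = np.block([[R1, Z], [Z, t * R2]])      # s_{k+1} = M s_k
P = np.block([[t * R1, Z], [Z, R2]])      # g_{k+1} = P g_k

M8 = np.linalg.matrix_power(M, 8)
P8 = np.linalg.matrix_power(P, 8)
PtM = P.T @ M                             # equals t * I for any angles
```

Since $R_1^8 = R_2^8 = I$ for these angles, $M^8 = \mathrm{diag}(1, 1, t^8, t^8)$ and $P^8 = \mathrm{diag}(t^8, t^8, 1, 1)$. The identity $P^T M = tI$ is what makes every inner product $g_{k+i}^T s_{k+j}$ scale by exactly $t$ when $k$ increases by one.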
Looking for Consistent Steps and Gradients Unit Stepsizes
Assumption on the Line Search
We always assume that the line search satisfies
$$g_{k+1}^T s_k = 0. \tag{1}$$
This, with the quasi-Newton equation
$$H_{k+1} y_k = s_k \tag{2}$$
and the definition of the search direction
$$H_{k+1} g_{k+1} = -\alpha_{k+1}^{-1} s_{k+1}, \tag{3}$$
implies that
$$s_{k+1}^T y_k = -\alpha_{k+1}\, g_{k+1}^T H_{k+1} y_k = -\alpha_{k+1}\, g_{k+1}^T s_k = 0. \tag{4}$$
The above relation (4) is called the conjugacy condition in the context of nonlinear conjugate gradient methods.
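Expression for $H_k y_k$ and $H_{k+1}$.
Multiplying the BFGS updating formula by $g_{k+1}$ and using (1),
$$H_{k+1} g_{k+1} = H_k g_{k+1} - \frac{y_k^T H_k g_{k+1}}{s_k^T y_k}\, s_k,$$
which with (3) gives
$$H_k g_{k+1} = -\alpha_{k+1}^{-1} s_{k+1} + \frac{y_k^T H_k g_{k+1}}{s_k^T y_k}\, s_k. \tag{5}$$
Multiplying (5) by $g_k^T$ and noticing $g_k^T H_k g_{k+1} = 0$, we get that
$$0 = -\alpha_{k+1}^{-1} g_k^T s_{k+1} + \frac{y_k^T H_k g_{k+1}}{s_k^T y_k}\, g_k^T s_k = -\alpha_{k+1}^{-1} g_k^T s_{k+1} - y_k^T H_k g_{k+1}.$$
Thus $y_k^T H_k g_{k+1} = -\alpha_{k+1}^{-1} g_k^T s_{k+1} = -\alpha_{k+1}^{-1} g_{k+1}^T s_{k+1}$, and hence
$$H_k g_{k+1} = -\alpha_{k+1}^{-1} s_{k+1} - \alpha_{k+1}^{-1} \frac{g_{k+1}^T s_{k+1}}{s_k^T y_k}\, s_k. \tag{6}$$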
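Expression for $H_k y_k$ and $H_{k+1}$ (ctd.)
Thus by (6) and $H_k g_k = -\alpha_k^{-1} s_k$, we obtain
$$H_k y_k = -\alpha_{k+1}^{-1} s_{k+1} + \left[\alpha_k^{-1} - \alpha_{k+1}^{-1} \frac{g_{k+1}^T s_{k+1}}{s_k^T y_k}\right] s_k. \tag{7}$$
Substituting this into the BFGS updating formula
$$H_{k+1} = H_k - \frac{s_k y_k^T H_k + H_k y_k s_k^T}{s_k^T y_k} + \left(1 + \frac{y_k^T H_k y_k}{s_k^T y_k}\right)\frac{s_k s_k^T}{s_k^T y_k}$$
yields
$$H_{k+1} = H_k + \alpha_{k+1}^{-1}\, \frac{s_k s_{k+1}^T + s_{k+1} s_k^T}{s_k^T y_k} + \left[1 - \alpha_k^{-1} + \alpha_{k+1}^{-1} \frac{g_{k+1}^T s_{k+1}}{s_k^T y_k}\right] \frac{s_k s_k^T}{s_k^T y_k}. \tag{8}$$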
Lemma 1. Assume that $g_{k+1}^T s_k = 0$ for all $k \ge 1$. Then for all $k \ge 1$ and $i \ge 0$, we have that
$$H_k g_{k+i} + \alpha_{k+i}^{-1} s_{k+i} \in \mathrm{Span}\{s_k, s_{k+1}, \ldots, s_{k+i-1}\}. \tag{9}$$
Proof. For convenience, we write (8) as
$$H_{k+1} = H_k + V(s_k, s_{k+1}), \tag{10}$$
where $V(s_k, s_{k+1})$ denotes the rank-two matrix on the right-hand side of (8). Therefore for all $i \ge 1$, we have that
$$H_{k+i} = H_k + \sum_{j=1}^{i} V(s_{k+j-1}, s_{k+j}). \tag{11}$$
The statement follows by multiplying the above relation by $g_{k+i}$ and using $H_{k+i} g_{k+i} = -\alpha_{k+i}^{-1} s_{k+i}$ and $g_{k+i}^T s_{k+i-1} = 0$.
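Lemma 2. To construct a desired example with $\mathrm{Det}(S_1) \ne 0$, we must have that
$$\alpha_k = 1 \quad \text{for all } k \ge 4.$$
Proof. Define
$$G_k = \begin{bmatrix} y_{k-1} & g_k & g_{k+1} & g_{k+2} \end{bmatrix}, \qquad S_k = \begin{bmatrix} s_{k-1} & s_k & s_{k+1} & s_{k+2} \end{bmatrix}.$$
Then by $H_k y_{k-1} = s_{k-1}$ and Lemma 1 (components lying in the span of earlier columns do not change the determinant), we get that
$$\begin{aligned}
\mathrm{Det}(H_k)\,\mathrm{Det}(G_k) &= \mathrm{Det}\begin{bmatrix} H_k y_{k-1} & H_k g_k & H_k g_{k+1} & H_k g_{k+2} \end{bmatrix} \\
&= \mathrm{Det}\begin{bmatrix} s_{k-1} & -\alpha_k^{-1} s_k & -\alpha_{k+1}^{-1} s_{k+1} & -\alpha_{k+2}^{-1} s_{k+2} \end{bmatrix} \\
&= -\alpha_k^{-1} \alpha_{k+1}^{-1} \alpha_{k+2}^{-1}\, \mathrm{Det}(S_k).
\end{aligned} \tag{12}$$
Replacing $k$ with $k+1$ in the above yields
$$\mathrm{Det}(H_{k+1})\,\mathrm{Det}(G_{k+1}) = -\alpha_{k+1}^{-1} \alpha_{k+2}^{-1} \alpha_{k+3}^{-1}\, \mathrm{Det}(S_{k+1}). \tag{13}$$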
On the other hand, due to the special forms of $\{g_k\}$ and $\{s_k\}$, we have that $G_{k+1} = P G_k$ and $S_{k+1} = M S_k$. Hence
$$\mathrm{Det}(G_{k+1}) = \mathrm{Det}(P)\,\mathrm{Det}(G_k) = t^2\, \mathrm{Det}(G_k), \qquad \mathrm{Det}(S_{k+1}) = \mathrm{Det}(M)\,\mathrm{Det}(S_k) = t^2\, \mathrm{Det}(S_k). \tag{14}$$
Due to the basic determinant relation of BFGS and the assumption,
$$\mathrm{Det}(H_{k+1}) = \frac{s_k^T H_k^{-1} s_k}{s_k^T y_k}\, \mathrm{Det}(H_k) = \alpha_k\, \mathrm{Det}(H_k). \tag{15}$$
If $\mathrm{Det}(S_1) \ne 0$, (12) with $k = 1$ implies that $\mathrm{Det}(G_1) \ne 0$. Then by (14),
$$\mathrm{Det}(G_k) \ne 0, \quad \mathrm{Det}(S_k) \ne 0, \quad \text{for all } k \ge 1. \tag{16}$$
Dividing (13) by (12) and using the above relations, we then obtain
$$\alpha_{k+3} = 1.$$
So the statement holds due to the arbitrariness of $k \ge 1$.
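The division in the last step of the proof can be written out explicitly. The following worked step is a restatement for the reader that uses only (12)-(15); it is not part of the original slides.

```latex
\frac{\mathrm{Det}(H_{k+1})\,\mathrm{Det}(G_{k+1})}{\mathrm{Det}(H_k)\,\mathrm{Det}(G_k)}
  = \frac{-\alpha_{k+1}^{-1}\alpha_{k+2}^{-1}\alpha_{k+3}^{-1}\,\mathrm{Det}(S_{k+1})}
         {-\alpha_{k}^{-1}\alpha_{k+1}^{-1}\alpha_{k+2}^{-1}\,\mathrm{Det}(S_k)}
\quad\Longrightarrow\quad
\alpha_k\, t^2 = \frac{\alpha_k}{\alpha_{k+3}}\, t^2
\quad\Longrightarrow\quad
\alpha_{k+3} = 1 .
```

The left-hand ratio equals $\alpha_k t^2$ by (15) and (14); the right-hand ratio equals $(\alpha_k / \alpha_{k+3})\, t^2$ by (14). The nonvanishing determinants in (16) justify the division.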
Deleting finitely many initial iterations does not affect the example as a whole. Thus we will ask our counter-example to satisfy
$$\mathrm{Det}(S_1) \ne 0 \tag{17}$$
and
$$\alpha_k \equiv 1. \tag{18}$$
Looking for Consistent Steps and Gradients Consistency Conditions
Question. What other conditions on $\{g_k\}$ and $\{s_k\}$ ensure that the sequence $\{s_k; k \ge 1\}$ can be generated by the BFGS update?
Under the assumptions (1) (i.e. $g_{k+1}^T s_k = 0$), (17) and (18), the updating formula of $H_k$ in (8) can be simplified as
$$H_{k+1} = H_k + \frac{s_k s_{k+1}^T + s_{k+1} s_k^T}{s_k^T y_k} - \frac{g_{k+1}^T s_{k+1}}{g_k^T s_k}\, \frac{s_k s_k^T}{s_k^T y_k}. \tag{19}$$
An early idea is to consider the linear system formed by the above relations and
$$H_{8(j+1)} = \mathrm{diag}(t^{-4} E_2,\; t^{4} E_2)\, H_{8j},$$
where $E_2$ is the 2-dimensional identity matrix. It seems difficult to analyze the above formula directly.
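Another Observation. When the dimension is $n = 4$ and the steps are linearly independent, the matrix $H_{k+1}$ is uniquely defined by the values of $H_{k+1} y_k$, $H_{k+1} g_{k+1}$, $H_{k+1} g_{k+2}$ and $H_{k+1} g_{k+3}$. As a matter of fact, we have that
$$H_{k+1} y_k = s_k \quad \text{(quasi-Newton equation)} \tag{20}$$
$$H_{k+1} g_{k+1} = -s_{k+1} \quad \text{(using (3) and } \alpha_k \equiv 1\text{)} \tag{21}$$
$$H_{k+1} g_{k+2} = -s_{k+2} + \frac{g_{k+2}^T s_{k+2}}{g_{k+1}^T s_{k+1}}\, s_{k+1} \quad \text{(by (6) and } \alpha_k \equiv 1\text{)} \tag{22}$$
$$H_{k+1} g_{k+3} = -s_{k+3} + \left(\frac{g_{k+3}^T s_{k+3}}{g_{k+2}^T s_{k+2}} + \frac{g_{k+3}^T s_{k+1}}{g_{k+1}^T s_{k+1}}\right) s_{k+2} - \left(\frac{g_{k+2}^T s_{k+2}}{g_{k+1}^T s_{k+1}}\right)\left(\frac{g_{k+3}^T s_{k+1}}{g_{k+1}^T s_{k+1}}\right) s_{k+1} \tag{23}$$
The last equality is obtained by multiplying (19) by $g_{k+2}$, using (22) and finally replacing $k$ with $k+1$.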
Relations (20)-(23) provide a system of 16 equations, while the symmetric matrix $H_{k+1}$ has only 10 independent entries.
Question. How can we ensure that this linear system has a symmetric solution $H$?
Lemma 3. Assume that $\{u_1, u_2, \ldots, u_n\}$ and $\{v_1, v_2, \ldots, v_n\}$ are two sets of $n$-dimensional linearly independent vectors. Then there exists a symmetric matrix $H \in \mathbb{R}^{n \times n}$ satisfying
$$H u_i = v_i, \quad i = 1, 2, \ldots, n \tag{24}$$
if and only if
$$u_i^T v_j = u_j^T v_i, \quad \forall\, i, j = 1, 2, \ldots, n. \tag{25}$$
Proof. The "only if" part. If $H = H^T$ satisfies (24), we have for all $i, j = 1, 2, \ldots, n$,
$$u_i^T v_j = u_i^T H u_j = u_i^T H^T u_j = (H u_i)^T u_j = v_i^T u_j.$$
The "if" part. Assume that (25) holds. Defining the matrices
$$U = (u_1\; u_2\; \ldots\; u_n), \qquad V = (v_1\; v_2\; \ldots\; v_n),$$
direct calculations show that
$$U^T H U = U^T V = \begin{pmatrix} u_1^T v_1 & u_1^T v_2 & \cdots & u_1^T v_n \\ u_2^T v_1 & u_2^T v_2 & \cdots & u_2^T v_n \\ \cdots & \cdots & \cdots & \cdots \\ u_n^T v_1 & u_n^T v_2 & \cdots & u_n^T v_n \end{pmatrix} := A. \tag{26}$$
By (25), $A$ is symmetric. So $H = U^{-T} A U^{-1}$ satisfies $H = H^T$ and (24).
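The construction in the proof of Lemma 3 translates directly into a few lines of linear algebra. The sketch below assumes NumPy; the function name `symmetric_interpolant`, the tolerance and the random test data are illustrative choices, not part of the slides.

```python
import numpy as np

def symmetric_interpolant(U, V, tol=1e-10):
    """Return the symmetric H with H @ U == V if condition (25) holds, else None.
    The columns of U and V are the vectors u_i and v_i of Lemma 3."""
    A = U.T @ V                              # matrix A of (26)
    if not np.allclose(A, A.T, atol=tol):    # condition (25): A must be symmetric
        return None
    U_inv = np.linalg.inv(U)
    return U_inv.T @ A @ U_inv               # H = U^{-T} A U^{-1}

rng = np.random.default_rng(1)
n = 4
U = rng.standard_normal((n, n))
S = rng.standard_normal((n, n))
S = S + S.T                                  # a symmetric matrix
V = S @ U                                    # data generated by a symmetric map
H = symmetric_interpolant(U, V)

V_bad = V.copy()
V_bad[0, 1] += 1.0                           # perturbation that breaks (25)
H_bad = symmetric_interpolant(U, V_bad)
```

For the consistent data the routine recovers the symmetric matrix `S`; for the perturbed data condition (25) fails and no symmetric solution exists.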
Using Lemma 3, the following six conditions are sufficient for the linear system (20)-(23) to admit a symmetric matrix $H_{k+1}$:
$$g_{k+1}^T (H_{k+1} y_k) = (H_{k+1} g_{k+1})^T y_k$$
$$g_{k+2}^T (H_{k+1} y_k) = (H_{k+1} g_{k+2})^T y_k$$
$$g_{k+3}^T (H_{k+1} y_k) = (H_{k+1} g_{k+3})^T y_k$$
$$g_{k+2}^T (H_{k+1} g_{k+1}) = (H_{k+1} g_{k+2})^T g_{k+1}$$
$$g_{k+3}^T (H_{k+1} g_{k+1}) = (H_{k+1} g_{k+3})^T g_{k+1}$$
$$g_{k+3}^T (H_{k+1} g_{k+2}) = (H_{k+1} g_{k+3})^T g_{k+2}$$
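Considering the whole sequence $\{H_{k+1}; k \ge 1\}$ and combining the line search condition $g_{k+1}^T s_k = 0$, we require the following four conditions to hold for all $k \ge 1$:
$$g_{k+1}^T s_k = 0 \tag{27}$$
$$s_{k+1}^T y_k = 0 \tag{28}$$
$$g_{k+2}^T s_k = -s_{k+2}^T y_k \tag{29}$$
$$g_{k+3}^T s_k = -s_{k+3}^T y_k + \left(\frac{g_{k+3}^T s_{k+3}}{g_{k+2}^T s_{k+2}} + \frac{g_{k+3}^T s_{k+1}}{g_{k+1}^T s_{k+1}}\right) s_{k+2}^T y_k \tag{30}$$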
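To see where (29) and (30) come from, one can substitute the right-hand sides of (20)-(23) into the symmetry conditions that involve $y_k$. The following worked step is a sketch added for the reader, assuming $\alpha_k \equiv 1$; the remaining symmetry conditions are not expanded here.

```latex
\begin{aligned}
g_{k+1}^T (H_{k+1} y_k) = (H_{k+1} g_{k+1})^T y_k
  &\iff g_{k+1}^T s_k = -\,s_{k+1}^T y_k , \\
g_{k+2}^T (H_{k+1} y_k) = (H_{k+1} g_{k+2})^T y_k
  &\iff g_{k+2}^T s_k = -\,s_{k+2}^T y_k
      + \frac{g_{k+2}^T s_{k+2}}{g_{k+1}^T s_{k+1}}\, s_{k+1}^T y_k , \\
g_{k+3}^T (H_{k+1} y_k) = (H_{k+1} g_{k+3})^T y_k
  &\iff g_{k+3}^T s_k = -\,s_{k+3}^T y_k
      + \left(\frac{g_{k+3}^T s_{k+3}}{g_{k+2}^T s_{k+2}}
            + \frac{g_{k+3}^T s_{k+1}}{g_{k+1}^T s_{k+1}}\right) s_{k+2}^T y_k
      - \frac{g_{k+2}^T s_{k+2}}{g_{k+1}^T s_{k+1}}\,
        \frac{g_{k+3}^T s_{k+1}}{g_{k+1}^T s_{k+1}}\, s_{k+1}^T y_k .
\end{aligned}
```

The first line holds as soon as both sides vanish, which is (27) and (28). With $s_{k+1}^T y_k = 0$ the trailing terms in the second and third lines drop out, leaving exactly (29) and (30).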
Looking for the expression for the matrix $H_{k+1}$
Lemma 4. Assume that (25) holds and the matrix $H$ satisfies (24). If, further, the matrix $A$ in (26) is positive definite, then there must exist nonsingular triangular matrices $T_1$ and $T_2$ such that
$$A = V^T U = T_1^{-T} T_2.$$
Thus, denoting $\bar{V} = V T_1$, we have that
$$\bar{V}^T U = T_2.$$
Since $HU = V$, we get that
$$H = V U^{-1} = \left(\bar{V} T_1^{-1}\right)\left(T_2^{-1} \bar{V}^T\right) = \bar{V} \left(T_1^{-1} T_2^{-1}\right) \bar{V}^T.$$
Assuming that the diagonal matrix $T_1^{-1} T_2^{-1}$ has diagonal entries $t_1, t_2, \ldots, t_n$ and $\bar{V} = (\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_n)$, we obtain
$$H = \sum_{i=1}^{n} t_i\, \bar{v}_i \bar{v}_i^T.$$
To express $H_{k+1}$, we make use of the steps $\{s_{k+i}; i = 0, 1, 2, 3\}$ to introduce the following two vectors that are orthogonal to $y_k$ and $g_{k+1}$:
$$\begin{aligned}
z_k &= -\frac{s_{k+2}^T y_k}{s_k^T y_k}\, s_k - \frac{s_{k+2}^T g_{k+1}}{s_{k+1}^T g_{k+1}}\, s_{k+1} + s_{k+2}, \\
w_k &= -\frac{s_{k+3}^T y_k}{s_k^T y_k}\, s_k - \frac{s_{k+3}^T g_{k+1}}{s_{k+1}^T g_{k+1}}\, s_{k+1} + s_{k+3}.
\end{aligned} \tag{31}$$
The vectors $z_k$ and $w_k$ are well defined because $s_k^T y_k = -s_k^T g_k > 0$, which follows from the descent property of $d_k$.
Direct calculations show that
$$z_k^T y_k = z_k^T g_{k+1} = 0, \qquad w_k^T y_k = w_k^T g_{k+1} = 0. \tag{32}$$
Further, we define a vector that is also orthogonal to $g_{k+2}$:
$$v_k = -\frac{w_k^T g_{k+2}}{z_k^T g_{k+2}}\, z_k + w_k. \tag{33}$$
By the choice of $v_k$, it is easy to see that
$$v_k^T y_k = v_k^T g_{k+1} = v_k^T g_{k+2} = 0. \tag{34}$$
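If $z_k^T g_{k+2} < 0$ and $v_k^T g_{k+3} < 0$, the following matrix $H_{k+1}$ is positive definite and satisfies all the relations (20), (21), (22) and (23):
$$H_{k+1} = a_k\, s_k s_k^T + b_k\, s_{k+1} s_{k+1}^T + c_k\, z_k z_k^T + d_k\, v_k v_k^T, \tag{35}$$
where
$$a_k = \frac{1}{s_k^T y_k}, \quad b_k = -\frac{1}{s_{k+1}^T g_{k+1}}, \quad c_k = -\frac{1}{z_k^T g_{k+2}}, \quad d_k = -\frac{1}{v_k^T g_{k+3}}. \tag{36}$$
At the same time, the positive definiteness of $H_{k+1}$ requires
$$a_k > 0, \quad b_k > 0, \quad c_k > 0, \quad d_k > 0,$$
which is why the two sign conditions above are needed: $a_k$ and $b_k$ are already positive by the descent property.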
Looking for Consistent Steps and Gradients Choosing Parameters
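Now we turn to how to choose suitable parameters such that the consistency conditions (27), (28), (29) and (30) hold. For this, we denote
$$v = (\Delta_1\; \Delta_2\; \Delta_3\; \Delta_4)^T,$$
where
$$\Delta_1 = \sqrt{2}\, l, \quad \Delta_2 = \sqrt{2}\, h, \quad \Delta_3 = -\sqrt{2}\, \tau, \quad \Delta_4 = -\sqrt{2}\, \gamma. \tag{37}$$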
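The condition $s_1^T g_2 = s_1^T P g_1 = 0$, namely (27) with $k = 1$, asks
$$\begin{pmatrix} t\cos\alpha & -t\sin\alpha & \cos\beta & -\sin\beta \end{pmatrix} v = 0. \tag{38}$$
The condition $s_2^T y_1 = s_1^T M^T (P - I) g_1 = s_1^T (tI - M^T) g_1 = 0$, namely (28) with $k = 1$, requires the vector $v$ to satisfy
$$\begin{pmatrix} t - \cos\alpha & -\sin\alpha & t(1 - \cos\beta) & -t\sin\beta \end{pmatrix} v = 0. \tag{39}$$
The requirement (29) with $k = 1$, that is,
$$0 = s_1^T g_3 + s_3^T y_1 = s_1^T P^2 g_1 + s_1^T (M^2)^T (P - I) g_1 = s_1^T \left[P^2 + tM^T - (M^2)^T\right] g_1,$$
yields the equation
$$\begin{pmatrix} t\cos\alpha + (t^2 - 1)\cos 2\alpha & t\sin\alpha - (1 + t^2)\sin 2\alpha & t^2(\cos\beta - \cos 2\beta) + \cos 2\beta & t^2(\sin\beta - \sin 2\beta) - \sin 2\beta \end{pmatrix} v = 0. \tag{40}$$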
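By (38), (39), (40) and the choice of $\alpha$ and $\beta$, we see that $\{\Delta_i\}$ must satisfy
$$\begin{pmatrix}
\frac{\sqrt{2}}{2} t & -\frac{\sqrt{2}}{2} t & -\frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\
t - \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} & \left(1 + \frac{\sqrt{2}}{2}\right) t & -\frac{\sqrt{2}}{2} t \\
\frac{\sqrt{2}}{2} t & \frac{\sqrt{2}}{2} t - (1 + t^2) & -\frac{\sqrt{2}}{2} t^2 & \left(\frac{\sqrt{2}}{2} + 1\right) t^2 + 1
\end{pmatrix}
\begin{pmatrix} \Delta_1 \\ \Delta_2 \\ \Delta_3 \\ \Delta_4 \end{pmatrix} = 0. \tag{41}$$
By Cramer's rule, to meet (41), we may choose $\{\Delta_i\}$ as follows:
$$\begin{aligned}
\Delta_1 &= \pm\left[(2 + \sqrt{2}) t^3 + (1 + \sqrt{2}) t + 1\right], \\
\Delta_2 &= \pm\left[(2 + \sqrt{2}) t^3 + (1 + \sqrt{2}) t - 1\right], \\
\Delta_3 &= \pm\left[-\sqrt{2}\, t^3 + 2t^2 + (1 - \sqrt{2}) t + 1\right], \\
\Delta_4 &= \pm\left[\sqrt{2}\, t^3 - 2t^2 + (1 + \sqrt{2}) t - 1\right].
\end{aligned} \tag{42}$$
Since $g_1^T s_1 = \Delta_1 + \Delta_3 = \pm 2(t + 1)(t^2 + 1)$ and $t \in (0,1)$, we choose all the signs in (42) as $-$ so that $g_1^T s_1 < 0$.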
Noting that $s_3^T y_1 = -g_3^T s_1$ and $g_4^T s_4 = t\, g_3^T s_3$, we know that (30) with $k = 1$ is equivalent to
$$g_4^T s_1 + g_2^T s_4 - g_1^T s_4 + g_3^T s_1 \left[t + \frac{g_4^T s_2}{g_2^T s_2}\right] = 0. \tag{43}$$
Further, using $g_2^T s_2 = t\, g_1^T s_1$, $g_4^T s_2 = t\, g_3^T s_1$ and $g_2^T s_4 = t\, g_1^T s_3$, (43) can be simplified as
$$g_1^T s_1 \left[g_4^T s_1 - g_1^T s_4 + t\left(g_3^T s_1 + g_1^T s_3\right)\right] + \left(g_3^T s_1\right)^2 = 0. \tag{44}$$
Consequently, we obtain the following equation for the parameter $t$:
$$(t + 1)^2 (t^2 + 1)^2 \left[p^2(t) + 2t\,p(t) - 2q(t)\right] = 0, \tag{45}$$
where
$$\begin{aligned}
p(t) &= -(2 + \sqrt{2}) t^2 + (2 + \sqrt{2}) t - 1, \\
q(t) &= (2 + 3\sqrt{2}) t^3 - (4 + 3\sqrt{2}) t^2 + (3 + 2\sqrt{2}) t - 2\sqrt{2}.
\end{aligned} \tag{46}$$
Further calculations provide
$$(6 + 4\sqrt{2})^{-1} \left[p^2(t) + 2t\,p(t) - 2q(t)\right] = \left[t^2 + (1 - \sqrt{2}) t + \left(1 - \tfrac{\sqrt{2}}{2}\right)\right] \left[t^2 + (1 - 3\sqrt{2}) t + \left(-3 + \tfrac{7}{2}\sqrt{2}\right)\right].$$
By the above relation, it is not difficult to show that the equation (45) has a unique root in the interval $(-1, 1)$, that is
$$t = \frac{3\sqrt{2} - 1 - \sqrt{31 - 20\sqrt{2}}}{2} \quad (\approx 0.7973), \tag{47}$$
which satisfies
$$t^2 + \left(1 - 3\sqrt{2}\right) t + \left(-3 + \tfrac{7}{2}\sqrt{2}\right) = 0.$$
Therefore, if we choose the above $t$ and the vectors $s_1$ and $g_1$ such that (42) holds with the minus sign, then the prefixed steps and gradients satisfy the consistency conditions (27), (28), (29) and (30) for all $k \ge 1$.
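The closed-form root (47) and the choice (42) can be checked numerically. The sketch below assumes NumPy; the helper names (`p`, `q`, `deltas`, `matrix_41`) are illustrative, and the formulas are copied from (41), (42), (46) and (47).

```python
import numpy as np

r2 = np.sqrt(2.0)
t = (3 * r2 - 1 - np.sqrt(31 - 20 * r2)) / 2        # candidate root (47)

def p(t):
    return -(2 + r2) * t**2 + (2 + r2) * t - 1

def q(t):
    return (2 + 3 * r2) * t**3 - (4 + 3 * r2) * t**2 + (3 + 2 * r2) * t - 2 * r2

res45 = p(t)**2 + 2 * t * p(t) - 2 * q(t)           # bracket in (45)
res_quad = t**2 + (1 - 3 * r2) * t + (-3 + 3.5 * r2)

def deltas(t):
    """Delta_1, ..., Delta_4 from (42), all taken with the minus sign."""
    return -np.array([
        (2 + r2) * t**3 + (1 + r2) * t + 1,
        (2 + r2) * t**3 + (1 + r2) * t - 1,
        -r2 * t**3 + 2 * t**2 + (1 - r2) * t + 1,
        r2 * t**3 - 2 * t**2 + (1 + r2) * t - 1,
    ])

def matrix_41(t):
    """Coefficient matrix of the linear system (41)."""
    c = r2 / 2
    return np.array([
        [c * t, -c * t, -c, -c],
        [t - c, -c, (1 + c) * t, -c * t],
        [c * t, c * t - (1 + t**2), -c * t**2, (c + 1) * t**2 + 1],
    ])

res41 = matrix_41(t) @ deltas(t)
res41_other = matrix_41(0.3) @ deltas(0.3)          # (42) solves (41) for any t
g1s1 = deltas(t)[0] + deltas(t)[2]                  # g_1^T s_1 = Delta_1 + Delta_3
```

Note that (42) solves (41) for every $t$; only condition (30), through (45), singles out the particular value (47).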
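Furthermore, by (42) with minus signs and the definitions of the $\Delta_i$'s, we can obtain
$$\begin{aligned}
l &= (-4 - 13\sqrt{2})\, t + (-1 + 11\sqrt{2}), \\
h &= (-4 - 13\sqrt{2})\, t + (-1 + 12\sqrt{2}), \\
\gamma &= (17 - 8\sqrt{2})\, t + (-17 + 9\sqrt{2}), \\
\tau &= (-17 + 9\sqrt{2})\, t + (17 - 9\sqrt{2}).
\end{aligned} \tag{48}$$
Therefore the vectors $s_1$ and $g_1$ are
$$s_1 = \begin{pmatrix} \sqrt{2} \\ 0 \\ (17 - 8\sqrt{2})\, t + (-17 + 9\sqrt{2}) \\ (-17 + 9\sqrt{2})\, t + (17 - 9\sqrt{2}) \end{pmatrix}, \qquad
g_1 = \begin{pmatrix} (-4 - 13\sqrt{2})\, t + (-1 + 11\sqrt{2}) \\ (-4 - 13\sqrt{2})\, t + (-1 + 12\sqrt{2}) \\ 0 \\ -\sqrt{2} \end{pmatrix}.$$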
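With $t$, $s_1$ and $g_1$ fixed, the whole construction so far can be tested end to end. The sketch below assumes NumPy; it generates the prescribed sequences, evaluates the residuals of (27)-(30), builds $H_2$ from (35)-(36) and then runs plain BFGS updates with unit stepsizes, checking that each direction $-H_k g_k$ reproduces the prescribed step $s_k$. The helper names and the horizon `N = 12` are illustrative choices, and the code is a numerical sanity check rather than the author's own implementation.

```python
import numpy as np

r2 = np.sqrt(2.0)
t = (3 * r2 - 1 - np.sqrt(31 - 20 * r2)) / 2         # root (47)

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

Z = np.zeros((2, 2))
R1, R2 = rot(np.pi / 4), rot(3 * np.pi / 4)
M = np.block([[R1, Z], [Z, t * R2]])                 # s_{k+1} = M s_k
P = np.block([[t * R1, Z], [Z, R2]])                 # g_{k+1} = P g_k

# Parameters from (48).
l = (-4 - 13 * r2) * t + (-1 + 11 * r2)
h = (-4 - 13 * r2) * t + (-1 + 12 * r2)
gam = (17 - 8 * r2) * t + (-17 + 9 * r2)
tau = (-17 + 9 * r2) * t + (17 - 9 * r2)

N = 12                                               # illustrative horizon
s = [np.array([r2, 0.0, gam, tau])]                  # s[i] stores s_{i+1}
g = [np.array([l, h, 0.0, -r2])]                     # g[i] stores g_{i+1}
for _ in range(N):
    s.append(M @ s[-1])
    g.append(P @ g[-1])
y = [g[i + 1] - g[i] for i in range(N)]              # y[i] stores y_{i+1}

def consistency_residuals(i):
    """Residuals of (27), (28), (29), (30) for k = i + 1."""
    ratio = ((g[i + 3] @ s[i + 3]) / (g[i + 2] @ s[i + 2])
             + (g[i + 3] @ s[i + 1]) / (g[i + 1] @ s[i + 1]))
    return np.array([
        g[i + 1] @ s[i],
        s[i + 1] @ y[i],
        g[i + 2] @ s[i] + s[i + 2] @ y[i],
        g[i + 3] @ s[i] + s[i + 3] @ y[i] - ratio * (s[i + 2] @ y[i]),
    ])

max_residual = max(np.abs(consistency_residuals(i)).max() for i in range(N - 3))

def bfgs_update(H, sk, yk):
    Hy = H @ yk
    sy = sk @ yk
    return (H - (np.outer(sk, Hy) + np.outer(Hy, sk)) / sy
            + (1.0 + (yk @ Hy) / sy) * np.outer(sk, sk) / sy)

def H_from_35(i):
    """Matrix (35)-(36) with k = i + 1, i.e. the slides' H_{k+1}."""
    z = (-(s[i + 2] @ y[i]) / (s[i] @ y[i]) * s[i]
         - (s[i + 2] @ g[i + 1]) / (s[i + 1] @ g[i + 1]) * s[i + 1] + s[i + 2])
    w = (-(s[i + 3] @ y[i]) / (s[i] @ y[i]) * s[i]
         - (s[i + 3] @ g[i + 1]) / (s[i + 1] @ g[i + 1]) * s[i + 1] + s[i + 3])
    v = -(w @ g[i + 2]) / (z @ g[i + 2]) * z + w
    return (np.outer(s[i], s[i]) / (s[i] @ y[i])
            - np.outer(s[i + 1], s[i + 1]) / (s[i + 1] @ g[i + 1])
            - np.outer(z, z) / (z @ g[i + 2])
            - np.outer(v, v) / (v @ g[i + 3]))

H = H_from_35(0)                                     # plays the role of H_2
max_step_error = 0.0
for i in range(1, 9):                                # iterates x_2, ..., x_9
    step = -H @ g[i]                                 # BFGS step with alpha = 1
    max_step_error = max(max_step_error, np.linalg.norm(step - s[i]))
    H = bfgs_update(H, s[i], y[i])
```

Only eight updates are run because the matrices $H_k$ become increasingly ill-conditioned along the cycle, so rounding errors grow with the iteration count.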
Remark. Numerically, other choices for $\alpha$ and $\beta$ are possible. For example,
$$\alpha = \frac{2}{7}\pi, \qquad \beta = \frac{4}{7}\pi,$$
which could lead to a counter-example with 7 cyclic points. In this case, however, it is difficult to get an analytical solution and we can only obtain a numerical solution $t \approx 0.8642$ in $(0,1)$.
Constructing A Suitable Objective Function Recovering the iterations and function values
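For $i = 1, \ldots, 8$, the limits $x_i^* := \lim_{j \to \infty} x_{8j+i}$ and $g_i^* := \lim_{j \to \infty} g_{8j+i}$ exist.
Suppose that
$$x_i^* = \begin{pmatrix} a_i \\ b_i \\ 0 \\ 0 \end{pmatrix}, \qquad g_i^* = \begin{pmatrix} 0 \\ 0 \\ c_i \\ d_i \end{pmatrix}; \qquad i = 1, \ldots, 8,$$
with
$$a_1 = -\frac{\sqrt{2}}{2}, \quad b_1 = -1 - \frac{\sqrt{2}}{2}, \quad c_1 = 0, \quad d_1 = -\sqrt{2}.$$
We have that
$$\begin{pmatrix} a_{i+1} \\ b_{i+1} \end{pmatrix} = R_1 \begin{pmatrix} a_i \\ b_i \end{pmatrix}, \qquad \begin{pmatrix} c_{i+1} \\ d_{i+1} \end{pmatrix} = R_2 \begin{pmatrix} c_i \\ d_i \end{pmatrix}.$$
We can see that $\{x_i^* : i = 1, \ldots, 8\}$ are exactly the vertices of a regular octagon that lies in the plane spanned by the first and second coordinates and has the origin as its center.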
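Due to the special choices of $\{s_k\}$ and $\{g_k\}$, we have
$$x_{8j+i} = \begin{pmatrix} a_i \\ b_i \\ t^{8j} p_i \\ t^{8j} q_i \end{pmatrix}, \qquad g_{8j+i} = \begin{pmatrix} t^{8j} l_i \\ t^{8j} h_i \\ c_i \\ d_i \end{pmatrix}; \qquad i = 1, \ldots, 8,$$
with
$$\begin{pmatrix} p_{i+1} \\ q_{i+1} \end{pmatrix} = t R_2 \begin{pmatrix} p_i \\ q_i \end{pmatrix}; \qquad \begin{pmatrix} l_1 \\ h_1 \end{pmatrix} = \begin{pmatrix} l \\ h \end{pmatrix}, \quad \begin{pmatrix} l_{i+1} \\ h_{i+1} \end{pmatrix} = t R_1 \begin{pmatrix} l_i \\ h_i \end{pmatrix}.$$
To get the values of $p_1$ and $q_1$, we denote by $\bar{v}$ the vector formed by the last two components of $v$. We have that
$$\bar{x}_1^* - \bar{x}_1 = \sum_{i=1}^{\infty} \bar{s}_i = \sum_{i=0}^{\infty} (tR_2)^i\, \bar{s}_1 = (E - tR_2)^{-1} \begin{pmatrix} \gamma \\ \tau \end{pmatrix},$$
where $E$ is the identity matrix. Consequently, since $\bar{x}_1^* = 0$, we have that
$$\begin{pmatrix} p_1 \\ q_1 \end{pmatrix} = \bar{x}_1 = \frac{1}{4633} \begin{pmatrix} (-15720 + 4019\sqrt{2})\, t + (32931 - 17534\sqrt{2}) \\ (4376 - 1977\sqrt{2})\, t + (9563 - 9433\sqrt{2}) \end{pmatrix}.$$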
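Assuming that the limit of $f(x_k)$ is $f^*$, we have that
$$\begin{aligned}
f(x_{8j+i}) - f^* &= (g_i^*)^T (x_{8j+i} - x_i^*) = t^{8j} \begin{pmatrix} c_i \\ d_i \end{pmatrix}^T \begin{pmatrix} p_i \\ q_i \end{pmatrix} \\
&= t^{8j} \left[R_2^{i-1} \begin{pmatrix} c_1 \\ d_1 \end{pmatrix}\right]^T \left[(tR_2)^{i-1} \begin{pmatrix} p_1 \\ q_1 \end{pmatrix}\right] = t^{8j+i-1} \begin{pmatrix} c_1 \\ d_1 \end{pmatrix}^T \begin{pmatrix} p_1 \\ q_1 \end{pmatrix}.
\end{aligned}$$
Consequently,
$$f(x_1) - f^* = c_1 p_1 + d_1 q_1 = \frac{(3954 - 4376\sqrt{2})\, t + (18866 - 9563\sqrt{2})}{4633},$$
$$f(x_{8j+i}) - f^* = t^{8j+i-1} \left(f(x_1) - f^*\right).$$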
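Now we compute the value of $g_{8j+i}^T s_{8j+i}$. Denoting
$$s_{8j+i} = \begin{pmatrix} \eta_i \\ \xi_i \\ t^{8j} \gamma_i \\ t^{8j} \tau_i \end{pmatrix},$$
we have $\eta_1 = \sqrt{2}$, $\xi_1 = 0$, $\gamma_1 = \gamma$, $\tau_1 = \tau$ and
$$\begin{pmatrix} \eta_i \\ \xi_i \end{pmatrix} = R_1 \begin{pmatrix} \eta_{i-1} \\ \xi_{i-1} \end{pmatrix}, \qquad \begin{pmatrix} \gamma_i \\ \tau_i \end{pmatrix} = t R_2 \begin{pmatrix} \gamma_{i-1} \\ \tau_{i-1} \end{pmatrix}.$$
Thus
$$\begin{aligned}
g_{8j+i}^T s_{8j+i} &= \begin{pmatrix} t^{8j} l_i \\ t^{8j} h_i \\ c_i \\ d_i \end{pmatrix}^T \begin{pmatrix} \eta_i \\ \xi_i \\ t^{8j} \gamma_i \\ t^{8j} \tau_i \end{pmatrix} = t^{8j} \begin{pmatrix} l_i \\ h_i \\ c_i \\ d_i \end{pmatrix}^T \begin{pmatrix} \eta_i \\ \xi_i \\ \gamma_i \\ \tau_i \end{pmatrix} \\
&= t^{8j+i-1} \left(l_1 \eta_1 + h_1 \xi_1 + c_1 \gamma_1 + d_1 \tau_1\right) = t^{8j+i-1}\, g_1^T s_1,
\end{aligned}$$
where $g_1^T s_1 = \sqrt{2}\,(l_1 - \tau_1) = (-44 + 13\sqrt{2})\, t + (40 - 18\sqrt{2})$.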
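Therefore
$$\begin{aligned}
\frac{f(x_{8j+i+1}) - f(x_{8j+i})}{\alpha_{8j+i}\, g_{8j+i}^T s_{8j+i}} &= \frac{\left(f(x_{8j+i+1}) - f^*\right) - \left(f(x_{8j+i}) - f^*\right)}{g_{8j+i}^T s_{8j+i}} \\
&= \frac{\left(t^{8j+i} - t^{8j+i-1}\right)\left(f(x_1) - f^*\right)}{t^{8j+i-1}\, g_1^T s_1} = \frac{(t - 1)\left(f(x_1) - f^*\right)}{g_1^T s_1} \approx 2.6483 \times 10^{-2}.
\end{aligned}$$
The above relation, together with $g_{8j+i+1}^T s_{8j+i} = 0$, implies that the stepsize $\alpha_{8j+i} = 1$ can be accepted by either the Wolfe line search or the Armijo line search with suitable line search parameters.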
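The constant $2.6483 \times 10^{-2}$ and its consequence for the line search parameters can be reproduced with a short calculation. The sketch below uses only the closed forms for $t$, $f(x_1) - f^*$ and $g_1^T s_1$ given in the slides; the function name `unit_step_accepted` and the sample parameter values are illustrative assumptions.

```python
import math

r2 = math.sqrt(2.0)
t = (3 * r2 - 1 - math.sqrt(31 - 20 * r2)) / 2                  # root (47)
f1_gap = ((3954 - 4376 * r2) * t + (18866 - 9563 * r2)) / 4633  # f(x_1) - f^*
g1s1 = (-44 + 13 * r2) * t + (40 - 18 * r2)                     # g_1^T s_1 < 0
ratio = (t - 1) * f1_gap / g1s1

def unit_step_accepted(sigma1, sigma2):
    """Wolfe test at alpha = 1, using psi(1) - psi(0) = ratio * psi'(0)
    and psi'(1) = 0. The common factor t^(k-1) cancels on both sides."""
    dpsi0 = g1s1
    decrease = ratio * dpsi0 <= sigma1 * dpsi0
    curvature = 0.0 >= sigma2 * dpsi0
    return decrease and curvature
```

Under this reading, the sufficient decrease condition holds at $\alpha = 1$ exactly when $\sigma_1$ does not exceed the ratio, for instance `unit_step_accepted(1e-4, 0.9)` is true while `unit_step_accepted(0.1, 0.9)` is false; the curvature condition holds for any $\sigma_2 > 0$ because $\psi'(1) = 0$.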
Constructing A Suitable Objective Function Seeking A Suitable Form for the Objective Function
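To ensure that
$$\nabla f(x_{8j+i}) = g_{8j+i},$$
we assume that the objective function $f$ is of the form
$$f(x_1, x_2, x_3, x_4) = \lambda(x_1, x_2)\, x_3 + \mu(x_1, x_2)\, x_4,$$
where $\lambda$ and $\mu$ are 2-dimensional functions to be determined. Denoting
$$V_i = \begin{pmatrix} a_i \\ b_i \end{pmatrix} \quad \text{and} \quad A_i = \begin{pmatrix} \dfrac{\partial \lambda(V_i)}{\partial x_1} & \dfrac{\partial \mu(V_i)}{\partial x_1} \\[2mm] \dfrac{\partial \lambda(V_i)}{\partial x_2} & \dfrac{\partial \mu(V_i)}{\partial x_2} \end{pmatrix},$$
the functions $\lambda$ and $\mu$ have to satisfy
$$\begin{pmatrix} \lambda(V_i) \\ \mu(V_i) \end{pmatrix} = \begin{pmatrix} c_i \\ d_i \end{pmatrix} \tag{49}$$
and
$$A_i \begin{pmatrix} p_i \\ q_i \end{pmatrix} = \begin{pmatrix} l_i \\ h_i \end{pmatrix}. \tag{50}$$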
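Recalling the relations
$$\begin{pmatrix} p_{i+1} \\ q_{i+1} \end{pmatrix} = t R_2 \begin{pmatrix} p_i \\ q_i \end{pmatrix}; \qquad \begin{pmatrix} l_{i+1} \\ h_{i+1} \end{pmatrix} = t R_1 \begin{pmatrix} l_i \\ h_i \end{pmatrix},$$
we shall ask $A_i$ and $A_{i+1}$ to meet the condition
$$A_{i+1} = R_1 A_i R_2^T \tag{51}$$
so that the relation (50) is satisfied for all $i$ provided that it holds for $i = 1$. This is because, if (50) and (51) are true, we have
$$A_{i+1} \begin{pmatrix} p_{i+1} \\ q_{i+1} \end{pmatrix} = (R_1 A_i R_2^T)(tR_2) \begin{pmatrix} p_i \\ q_i \end{pmatrix} = t R_1 A_i \begin{pmatrix} p_i \\ q_i \end{pmatrix} = t R_1 \begin{pmatrix} l_i \\ h_i \end{pmatrix} = \begin{pmatrix} l_{i+1} \\ h_{i+1} \end{pmatrix}.$$
Therefore we can focus our attention on the choice of the matrix
$$A_1 := \begin{pmatrix} \omega_1 & \omega_2 \\ \omega_3 & \omega_4 \end{pmatrix}.$$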
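Now we shall consider the restrictions of the functions $\lambda$ and $\mu$ to the edges of the octagon and define
$$\lambda_i(\alpha) = \lambda(V_i + \alpha \bar{s}_i), \qquad \mu_i(\alpha) = \mu(V_i + \alpha \bar{s}_i),$$
where
$$\bar{s}_i = V_{i+1} - V_i = \begin{pmatrix} \eta_i \\ \xi_i \end{pmatrix}$$
is the vector formed by the first two components of $s_{8j+i}$. The relation (49) implies that
$$\begin{pmatrix} \lambda_{i+1}(0) \\ \mu_{i+1}(0) \end{pmatrix} = R_2 \begin{pmatrix} \lambda_i(0) \\ \mu_i(0) \end{pmatrix} \quad \text{for all } i \ge 1.$$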
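At the same time, we have that
$$\begin{pmatrix} \lambda_{i+1}'(0) \\ \mu_{i+1}'(0) \end{pmatrix} = A_{i+1}^T \bar{s}_{i+1} = \left(R_2 A_i^T R_1^T\right)\left(R_1 \bar{s}_i\right) = R_2 A_i^T \bar{s}_i = R_2 \begin{pmatrix} \lambda_i'(0) \\ \mu_i'(0) \end{pmatrix}$$
holds for all $i \ge 1$.
Based on the above conditions, we are going to construct the functions $\lambda_i$ and $\mu_i$ such that
$$\begin{pmatrix} \lambda_{i+1}(\alpha) \\ \mu_{i+1}(\alpha) \end{pmatrix} = R_2 \begin{pmatrix} \lambda_i(\alpha) \\ \mu_i(\alpha) \end{pmatrix} \quad \text{for all } i \ge 1 \text{ and } \alpha \in \mathbb{R}^1. \tag{52}$$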
Constructing A Suitable Objective Function Constructing The Functions on The Lines
Consider the following extension of the line search functions:
$$\psi_i(\alpha) = f(x_i + \alpha s_i), \quad \alpha \in \mathbb{R}^1.$$
We shall concentrate on the choices of $\lambda_1(\cdot)$, $\mu_1(\cdot)$ and $\psi_1(\cdot)$. Assuming that $f^* = 0$, it is obvious that
$$\psi_1(0) = f(x_1) = c_1 p_1 + d_1 q_1, \tag{53}$$
$$\psi_1(1) = \psi_2(0) = t\, f(x_1), \tag{54}$$
$$\psi_1'(0) = g_1^T s_1 = \sqrt{2}\,(l_1 - \tau_1), \tag{55}$$
$$\psi_1'(1) = g_2^T s_1 = 0. \tag{56}$$
Suppose that we have constructed a function $\psi_1$ satisfying all four conditions (53)-(56) and some special requirements.
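There are many ways to choose the line search functions. The following is one of them with good theoretical properties:
$$\psi_1(\alpha) = \zeta_1 (\alpha - 1)^{38} + \zeta_2 (\alpha - 1)^2 + \zeta_3,$$
where
$$\begin{aligned}
\zeta_1 &= \frac{(173256 - 38127\sqrt{2})\, t + (-138064 + 48586\sqrt{2})}{166788} \approx 1.5468 \times 10^{-1}, \\
\zeta_2 &= \frac{(377472 - 359709\sqrt{2})\, t + (-712544 + 577958\sqrt{2})}{166788} \approx 1.0405 \times 10^{-3}, \\
\zeta_3 &= \frac{(-11344 + 6675\sqrt{2})\, t + (42494 - 26967\sqrt{2})}{4633} \approx 6.1270 \times 10^{-1}.
\end{aligned}$$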
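We shall now ask what conditions have to be satisfied by $\lambda_1$ and $\mu_1$. Since $\psi_1$ is the restriction of $f$ to the straight line
$$x_1 + \alpha s_1 = \begin{pmatrix} a_1 \\ b_1 \\ p_1 \\ q_1 \end{pmatrix} + \alpha \begin{pmatrix} \eta_1 \\ \xi_1 \\ \gamma_1 \\ \tau_1 \end{pmatrix},$$
we have that
$$\psi_1(\alpha) = [p_1 + \alpha\gamma_1]\, \lambda_1(\alpha) + [q_1 + \alpha\tau_1]\, \mu_1(\alpha). \tag{57}$$
Now, by (49),
$$\begin{pmatrix} \lambda_1(0) \\ \mu_1(0) \end{pmatrix} = \begin{pmatrix} \lambda(V_1) \\ \mu(V_1) \end{pmatrix}, \qquad \begin{pmatrix} \lambda_1(1) \\ \mu_1(1) \end{pmatrix} = \begin{pmatrix} \lambda(V_2) \\ \mu(V_2) \end{pmatrix}, \tag{58}$$
which is obviously consistent with (53) and (54).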
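One has to be careful to choose the derivatives $\lambda_1'(0)$, $\mu_1'(0)$, $\lambda_1'(1)$ and $\mu_1'(1)$ to meet

$$A_1 \begin{pmatrix} p_1 \\ q_1 \end{pmatrix} = \begin{pmatrix} l_1 \\ h_1 \end{pmatrix}. \tag{59}$$

There are some relations between these values and $A_1$:

$$A_1^T s_1 = \begin{pmatrix} \lambda_1'(0) \\ \mu_1'(0) \end{pmatrix}, \tag{60}$$

and

$$A_1^T s_8 = \begin{pmatrix} \lambda_8'(1) \\ \mu_8'(1) \end{pmatrix} = R_2^{7} \begin{pmatrix} \lambda_1'(1) \\ \mu_1'(1) \end{pmatrix} = R_2^{T} \begin{pmatrix} \lambda_1'(1) \\ \mu_1'(1) \end{pmatrix}. \tag{61}$$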
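By the expression (57) of $\psi_1$, we know that

$$\psi_1'(\alpha) = \gamma_1\lambda_1(\alpha) + \tau_1\mu_1(\alpha) + [p_1 + \alpha\gamma_1]\,\lambda_1'(\alpha) + [q_1 + \alpha\tau_1]\,\mu_1'(\alpha).$$

Setting $\alpha = 0$ and $\alpha = 1$ respectively, we obtain the following further conditions:

$$\begin{pmatrix} p_1 \\ q_1 \end{pmatrix}^T \begin{pmatrix} \lambda_1'(0) \\ \mu_1'(0) \end{pmatrix} = \psi_1'(0) - \gamma_1\lambda_1(0) - \tau_1\mu_1(0), \tag{62}$$

$$\begin{pmatrix} p_2 \\ q_2 \end{pmatrix}^T \begin{pmatrix} \lambda_1'(1) \\ \mu_1'(1) \end{pmatrix} = \psi_1'(1) - \gamma_1\lambda_1(1) - \tau_1\mu_1(1). \tag{63}$$

The relations (59)-(63) form a system of 8 equations in the 8 unknowns

$$\omega_1,\ \omega_2,\ \omega_3,\ \omega_4,\ \lambda_1'(0),\ \mu_1'(0),\ \lambda_1'(1),\ \mu_1'(1).$$

Therefore, after the function $\psi_1(\alpha)$ has been chosen, we can derive those derivatives uniquely in the above way. This, together with the relation (52), also imposes the relation (51) between $A_{i+1}$ and $A_i$.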
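Attention must also be given to the crossing points of the straight line connecting $V_1$ and $V_2$ with the other lines. It is easy to deduce that

$$\begin{pmatrix} \lambda_1(1 + \tfrac{\sqrt{2}}{2}) \\ \mu_1(1 + \tfrac{\sqrt{2}}{2}) \end{pmatrix} = \begin{pmatrix} \lambda_3(-\tfrac{\sqrt{2}}{2}) \\ \mu_3(-\tfrac{\sqrt{2}}{2}) \end{pmatrix} = R_2^{2} \begin{pmatrix} \lambda_1(-\tfrac{\sqrt{2}}{2}) \\ \mu_1(-\tfrac{\sqrt{2}}{2}) \end{pmatrix}. \tag{64}$$

Setting $\alpha$ to be $1 + \tfrac{\sqrt{2}}{2}$ and $-\tfrac{\sqrt{2}}{2}$, respectively, in (57) yields

$$\begin{pmatrix} p_1 + (1 + \tfrac{\sqrt{2}}{2})\gamma_1 \\ q_1 + (1 + \tfrac{\sqrt{2}}{2})\tau_1 \end{pmatrix}^T \begin{pmatrix} \lambda_1(1 + \tfrac{\sqrt{2}}{2}) \\ \mu_1(1 + \tfrac{\sqrt{2}}{2}) \end{pmatrix} = \psi_1(1 + \tfrac{\sqrt{2}}{2}), \tag{65}$$

$$\begin{pmatrix} p_1 - \tfrac{\sqrt{2}}{2}\gamma_1 \\ q_1 - \tfrac{\sqrt{2}}{2}\tau_1 \end{pmatrix}^T \begin{pmatrix} \lambda_1(-\tfrac{\sqrt{2}}{2}) \\ \mu_1(-\tfrac{\sqrt{2}}{2}) \end{pmatrix} = \psi_1(-\tfrac{\sqrt{2}}{2}). \tag{66}$$

The above system of 4 linear equations determines the values of

$$\lambda_1(1 + \tfrac{\sqrt{2}}{2}),\quad \mu_1(1 + \tfrac{\sqrt{2}}{2}),\quad \lambda_1(-\tfrac{\sqrt{2}}{2}),\quad \mu_1(-\tfrac{\sqrt{2}}{2}).$$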
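The structure of this 4-equation system can be sketched in code. Writing $u$ for the pair of values at $1 + \tfrac{\sqrt{2}}{2}$ and $v$ for the pair at $-\tfrac{\sqrt{2}}{2}$, relation (64) reads $u = Rv$, and (65)-(66) are two scalar equations, so substituting $u = Rv$ leaves a 2-by-2 system for $v$. The sketch assumes that $R_2^2$ is the plane rotation by $\pi/2$ (that is, that $R_2$ is the rotation by $\pi/4$); this convention, the function name and all numerical inputs are hypothetical.

```python
import math


def solve_crossing_values(coef_plus, psi_plus, coef_minus, psi_minus,
                          angle=math.pi / 2):
    """Solve u = R v, coef_plus . u = psi_plus, coef_minus . v = psi_minus.

    u = (lambda1, mu1) at the first crossing parameter, v = (lambda1, mu1)
    at the second. R is the rotation by `angle`; using pi/2 for R_2^2 is an
    assumption of this sketch, not a statement from the slides.
    """
    c, s = math.cos(angle), math.sin(angle)
    # coef_plus . (R v) = (R^T coef_plus) . v
    row1 = (c * coef_plus[0] + s * coef_plus[1],
            -s * coef_plus[0] + c * coef_plus[1])
    row2 = coef_minus
    det = row1[0] * row2[1] - row1[1] * row2[0]
    if abs(det) < 1e-14:
        raise ValueError("the 2x2 system for v is singular")
    # Cramer's rule for the reduced system.
    v = ((psi_plus * row2[1] - row1[1] * psi_minus) / det,
         (row1[0] * psi_minus - row2[0] * psi_plus) / det)
    u = (c * v[0] - s * v[1], s * v[0] + c * v[1])
    return u, v
```

The same routine would apply to the analogous system at the next pair of crossing parameters, with a different rotation angle.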
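In a similar way, we must have that

$$\begin{pmatrix} \lambda_1(2 + \sqrt{2}) \\ \mu_1(2 + \sqrt{2}) \end{pmatrix} = \begin{pmatrix} \lambda_4(-1 - \sqrt{2}) \\ \mu_4(-1 - \sqrt{2}) \end{pmatrix} = R_2^{3} \begin{pmatrix} \lambda_1(-1 - \sqrt{2}) \\ \mu_1(-1 - \sqrt{2}) \end{pmatrix}. \tag{67}$$

Setting $\alpha$ to be $2 + \sqrt{2}$ and $-1 - \sqrt{2}$, respectively, in (57) yields

$$\begin{pmatrix} p_1 + (2 + \sqrt{2})\gamma_1 \\ q_1 + (2 + \sqrt{2})\tau_1 \end{pmatrix}^T \begin{pmatrix} \lambda_1(2 + \sqrt{2}) \\ \mu_1(2 + \sqrt{2}) \end{pmatrix} = \psi_1(2 + \sqrt{2}), \tag{68}$$

$$\begin{pmatrix} p_1 - (1 + \sqrt{2})\gamma_1 \\ q_1 - (1 + \sqrt{2})\tau_1 \end{pmatrix}^T \begin{pmatrix} \lambda_1(-1 - \sqrt{2}) \\ \mu_1(-1 - \sqrt{2}) \end{pmatrix} = \psi_1(-1 - \sqrt{2}). \tag{69}$$

The above system of 4 linear equations determines the values of

$$\lambda_1(2 + \sqrt{2}),\quad \mu_1(2 + \sqrt{2}),\quad \lambda_1(-1 - \sqrt{2}),\quad \mu_1(-1 - \sqrt{2}).$$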
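Now we can construct a twice continuously differentiable function $\lambda_1$, for example by Lagrangian interpolation, such that the values

$$\lambda_1(-1 - \sqrt{2}),\ \lambda_1(-\tfrac{\sqrt{2}}{2}),\ \lambda_1(0),\ \lambda_1'(0),\ \lambda_1(1),\ \lambda_1'(1),\ \lambda_1(1 + \tfrac{\sqrt{2}}{2}),\ \lambda_1(2 + \sqrt{2})$$

agree with the required ones.

The function $\mu_1$ is then obtained from (57):

$$\mu_1(\alpha) = \frac{\psi_1(\alpha) - (p_1 + \alpha\gamma_1)\,\lambda_1(\alpha)}{q_1 + \alpha\tau_1}.$$

Some additional care is needed so that $\mu_1$ is also a twice continuously differentiable function, in particular where the denominator $q_1 + \alpha\tau_1$ vanishes.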
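One way to realize such an interpolation is sketched below. Because derivative values at $\alpha = 0$ and $\alpha = 1$ are prescribed together with six function values, the sketch fits a single degree-7 polynomial by solving the 8-by-8 linear system of interpolation conditions (a Hermite-type fit). This is only one possible choice and not necessarily the construction used in the talk; every numerical value below is a hypothetical placeholder, since the required values come from the earlier linear systems.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(value_conds, deriv_conds):
    """Coefficients c[k] of p(a) = sum_k c[k] a**k matching all conditions.

    value_conds: list of (alpha, p(alpha)); deriv_conds: list of (alpha, p'(alpha)).
    """
    n = len(value_conds) + len(deriv_conds)
    rows, rhs = [], []
    for a, v in value_conds:
        rows.append([a ** k for k in range(n)])
        rhs.append(v)
    for a, d in deriv_conds:
        rows.append([k * a ** (k - 1) if k > 0 else 0.0 for k in range(n)])
        rhs.append(d)
    return solve_linear(rows, rhs)


def poly_eval(c, a):
    return sum(ck * a ** k for k, ck in enumerate(c))


def poly_deriv(c, a):
    return sum(k * ck * a ** (k - 1) for k, ck in enumerate(c) if k > 0)


def mu1(alpha, psi1, lam1, p1, q1, gamma1, tau1):
    """Recover mu_1 from (57); undefined where q1 + alpha * tau1 == 0."""
    return (psi1(alpha) - (p1 + alpha * gamma1) * lam1(alpha)) / (q1 + alpha * tau1)


# Hypothetical placeholder data at the eight interpolation conditions.
S2 = math.sqrt(2.0)
VALUES = [(-1 - S2, 0.3), (-S2 / 2, -0.1), (0.0, 0.5),
          (1.0, 0.4), (1 + S2 / 2, 0.2), (2 + S2, -0.6)]
DERIVS = [(0.0, 0.25), (1.0, -0.15)]
COEFFS = fit_polynomial(VALUES, DERIVS)
```

A polynomial is infinitely differentiable, so the smoothness requirement on $\lambda_1$ is met automatically; the smoothness of $\mu_1$ still depends on the zero of the denominator in (57).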
Constructing A Suitable Objective Function: Extending the Functions to the Whole Plane
Define the eight straight lines

$$\begin{aligned}
L_1 &= \{(x_1, x_2) : x_2 + (1 + \tfrac{\sqrt{2}}{2}) = 0\}, \\
L_2 &= \{(x_1, x_2) : x_1 - x_2 - (1 + \sqrt{2}) = 0\}, \\
L_3 &= \{(x_1, x_2) : x_1 - (1 + \tfrac{\sqrt{2}}{2}) = 0\}, \\
L_4 &= \{(x_1, x_2) : x_1 + x_2 - (1 + \sqrt{2}) = 0\}, \\
L_5 &= \{(x_1, x_2) : x_2 - (1 + \tfrac{\sqrt{2}}{2}) = 0\}, \\
L_6 &= \{(x_1, x_2) : x_1 - x_2 + (1 + \sqrt{2}) = 0\}, \\
L_7 &= \{(x_1, x_2) : x_1 + (1 + \tfrac{\sqrt{2}}{2}) = 0\}, \\
L_8 &= \{(x_1, x_2) : x_1 + x_2 + (1 + \sqrt{2}) = 0\},
\end{aligned}$$

which are the extensions of the eight edges of the octagon.

To complete the construction of the objective function $f$, we have to deduce twice continuously differentiable extensions of the functions $\lambda$ and $\mu$, which are already defined on the straight lines $\{L_i : i = 1, \ldots, 8\}$.
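The geometry of the eight lines can be checked with a short computation. The sketch below intersects every pair of lines, counts the distinct crossing points (24 in total, of which 8 lie on the circle $x_1^2 + x_2^2 = 2 + \sqrt{2}$ and are the octagon vertices, leaving 16 others), and recovers the parameters $\alpha$ at which the other lines cross $L_1$. The parametrization of $L_1$ from $(-\tfrac{\sqrt{2}}{2}, -(1 + \tfrac{\sqrt{2}}{2}))$ at $\alpha = 0$ to $(\tfrac{\sqrt{2}}{2}, -(1 + \tfrac{\sqrt{2}}{2}))$ at $\alpha = 1$ is an assumed labelling of the two vertices on that edge, and the helper names are illustrative.

```python
import math
from itertools import combinations

S2 = math.sqrt(2.0)
A = 1 + S2 / 2  # distance from the origin to each of the eight lines

# Each line is (a, b, c), meaning a*x1 + b*x2 + c = 0, in the order L1..L8.
LINES = [
    (0, 1, A),            # L1
    (1, -1, -(1 + S2)),   # L2
    (1, 0, -A),           # L3
    (1, 1, -(1 + S2)),    # L4
    (0, 1, -A),           # L5
    (1, -1, 1 + S2),      # L6
    (1, 0, A),            # L7
    (1, 1, 1 + S2),       # L8
]


def intersect(l, m):
    """Intersection point of two lines, or None if they are parallel."""
    det = l[0] * m[1] - l[1] * m[0]
    if abs(det) < 1e-12:
        return None
    x = (l[1] * m[2] - m[1] * l[2]) / det
    y = (m[0] * l[2] - l[0] * m[2]) / det
    return (x, y)


def crossing_points():
    """Distinct pairwise intersections, rounded to merge duplicates."""
    pts = set()
    for l, m in combinations(LINES, 2):
        p = intersect(l, m)
        if p is not None:
            pts.add((round(p[0], 9), round(p[1], 9)))
    return pts


def is_vertex(p):
    """Octagon vertices lie on the circle of squared radius 2 + sqrt(2)."""
    return abs(p[0] ** 2 + p[1] ** 2 - (2 + S2)) < 1e-6


def alphas_on_l1():
    """Parameters alpha at which the other lines cross L1."""
    v1x, v2x = -S2 / 2, S2 / 2  # assumed end points of the edge on L1
    out = []
    for m in LINES[1:]:
        p = intersect(LINES[0], m)
        if p is not None:
            out.append((p[0] - v1x) / (v2x - v1x))
    return sorted(out)
```

The six parameters returned by `alphas_on_l1()` are $-1 - \sqrt{2}$, $-\tfrac{\sqrt{2}}{2}$, $0$, $1$, $1 + \tfrac{\sqrt{2}}{2}$ and $2 + \sqrt{2}$, which are exactly the points at which values of $\lambda_1$ and $\mu_1$ are prescribed on the line through $V_1$ and $V_2$.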
An Illustrative Example

Given a one-dimensional polynomial function

$$f(x), \qquad x \in \mathbb{R}^1,$$

with roots at $x = 0$ and $x = 1$, how can this function be extended to $\mathbb{R}^2$ so that the extension $F(x, y)$ satisfies

(1) $F(x, 0) = f(x)$ for $x \in \mathbb{R}^1$,
(2) $F(x, y) = 0$ if $x = 0$,
(3) $F(x, y) = 0$ if $x + y = 1$,
(4) $F(x, y) = 0$ if $y = 1$?

Answer. Since $f(x)$ is a polynomial with roots $0$ and $1$, we can write

$$f(x) = x(x - 1)\,h(x)$$

for some polynomial $h$. The following extension is one function with the desired properties:

$$F(x, y) = x\,(x + y - 1)\,(1 - y)\,h(x).$$
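The four requirements can be verified directly for any choice of the factor $h$. In the sketch below the particular polynomial `h` is a hypothetical example chosen only for illustration.

```python
def extend(h):
    """Return F(x, y) = x (x + y - 1) (1 - y) h(x) for a given factor h."""
    def F(x, y):
        return x * (x + y - 1) * (1 - y) * h(x)
    return F


def h(x):
    # Hypothetical polynomial factor; any polynomial would do.
    return 2 * x ** 2 - 3 * x + 5


def f(x):
    # One-dimensional polynomial with roots at x = 0 and x = 1.
    return x * (x - 1) * h(x)


F = extend(h)
```

At $y = 0$ the factor $(1 - y)$ equals one and $(x + y - 1)$ reduces to $(x - 1)$, which gives requirement (1); each of the remaining requirements corresponds to one of the three linear factors vanishing.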
There are 24 crossing points of the eight lines altogether, including the vertices $\{V_i : i = 1, \ldots, 8\}$ of the octagon. Denote the other crossing points by $\{C_i : i = 1, \ldots, 16\}$. Then it is possible to construct a two-dimensional eighth-order polynomial function $\varphi$ such that the function values of $\varphi$ at the $C_i$'s agree with those of $\lambda$, and the function and gradient values of $\varphi$ at the $V_i$'s agree with the corresponding values of $\lambda$.
Denote the restriction of $\varphi$ to each line $L_i$ by $\varphi_i$. For each $i$, we are going to construct a function $e_i(x_1, x_2)$ such that

$$e_i(x_1, x_2) = \begin{cases} \lambda_i - \varphi_i, & (x_1, x_2) \in L_i, \\ 0, & (x_1, x_2) \in \cup_{j \neq i} L_j. \end{cases}$$

Thus the sum function $e(x_1, x_2) = \sum_{i=1}^{8} e_i(x_1, x_2)$ agrees with $\lambda_i - \varphi_i$ on the line $L_i$ for each $i$. Therefore

$$\lambda(x_1, x_2) = \varphi(x_1, x_2) + e(x_1, x_2)$$

is the desired extension. The extension $\mu(x_1, x_2)$ can be obtained similarly.
Constructing A Suitable Objective Function: Final Form of the Objective Function
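If we only ask the Wolfe conditions to be satisfied, the objective function can be taken as

$$f(x_1, x_2, x_3, x_4) = \lambda(x_1, x_2)\,x_3 + \mu(x_1, x_2)\,x_4,$$

where

$$\mu(x_1, x_2) = \lambda(-x_2, x_1) \quad\text{and}\quad \lambda(-x_1, -x_2) = -\lambda(x_1, x_2).$$

More exactly,

$$\begin{aligned}
\lambda(x_1, x_2) ={}& (15 - \tfrac{21}{2}\sqrt{2})\,x_1^5 + (6 - \tfrac{9}{2}\sqrt{2})(x_1^4 x_2 + 2 x_1^2 x_2^3) \\
&+ (\sqrt{2} - 1)(6 x_1^2 x_2 + x_2^3) + (-15 + 10\sqrt{2})\,x_1^3 \\
&+ (\tfrac{15}{4} - \tfrac{15}{8}\sqrt{2})\,x_1 - \tfrac{9}{8}\sqrt{2}\,x_2 + \omega_1 \bar{\lambda}(x_1, x_2) + \omega_3 \bar{\lambda}(-x_2, x_1),
\end{aligned}$$

$$\bar{\lambda}(x_1, x_2) = \frac{3 - 2\sqrt{2}}{4}\,(2x_1^2 - 1)\,\bigl(2x_1^2 - (3 + 2\sqrt{2})\bigr)\,x_2.$$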
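If the stepsize is chosen to be the first local minimizer, the objective function again has the form

$$f(x_1, x_2, x_3, x_4) = \lambda(x_1, x_2)\,x_3 + \mu(x_1, x_2)\,x_4,$$

where

$$\mu(x_1, x_2) = \lambda(-x_2, x_1) \quad\text{and}\quad \lambda(-x_1, -x_2) = -\lambda(x_1, x_2).$$

More exactly,

$$\lambda(x_1, x_2) := \lambda(x_1, x_2) + \bigl(x_1^2 + x_2^2 - (2 + \sqrt{2})\bigr)^2 \cdot \sum_j \bigl[\, w_j \bar{\lambda}_j(x_1, x_2) + u_j \bar{\lambda}_j(x_2, x_1) \,\bigr],$$

$$\bar{\lambda}_j(x_1, x_2) = x_1^3 - 3x_1 x_2^2,\quad x_1^5 - 2x_1^3 x_2^2 - 3x_1 x_2^4,\quad \ldots$$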
Conclusion and Discussion
A perfect example

$$1,\quad \tfrac{1}{4}\pi,\quad \tfrac{3}{4}\pi,\quad 8,\quad \frac{3\sqrt{2} - 1 - \sqrt{31 - 20\sqrt{2}}}{2},\quad 38$$

A 3-dimensional example?
Any help in analyzing the convergence of DFP?
Any help in improving the practical BFGS algorithm?
THANK YOU VERY MUCH !