Curved Trajectories towards Local Minimum of a Function
Al Jimenez
Mathematics Department
California Polytechnic State University, San Luis Obispo, CA 93407
…Taylor Series and Rotations
Spring, 2008
Introduction and Notation
• The Problem: $\min_{x \in \mathbb{R}^n} f(x), \quad f: \mathbb{R}^n \to \mathbb{R}$
• Derivatives: $f'(x), f''(x), f'''(x), f^{(4)}(x)$, etc.
• A local min $x^*$ is a critical point: $f'(x^*) = 0$
• Necessary condition: $f''(x^*) \ge 0$
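As a small illustration of the two conditions, the sketch below checks them at the known minimizer $(1, 1)$ of the Rosenbrock function used later in these slides. The helper names (`rosenbrock_grad`, `rosenbrock_hess`, `is_psd_2x2`) are made up for this example, and the trace/determinant test only applies to symmetric 2x2 matrices.

```python
def rosenbrock_grad(x, y):
    """Gradient of f(x, y) = 100 (y - x^2)^2 + (1 - x)^2."""
    return [-400.0 * x * (y - x * x) - 2.0 * (1.0 - x),
            200.0 * (y - x * x)]


def rosenbrock_hess(x, y):
    """Hessian of the same function."""
    return [[1200.0 * x * x - 400.0 * y + 2.0, -400.0 * x],
            [-400.0 * x, 200.0]]


def is_psd_2x2(H):
    """A symmetric 2x2 matrix is positive semidefinite iff trace >= 0 and det >= 0."""
    trace = H[0][0] + H[1][1]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return trace >= 0.0 and det >= 0.0


g = rosenbrock_grad(1.0, 1.0)
H = rosenbrock_hess(1.0, 1.0)
critical = all(abs(gi) < 1e-12 for gi in g)   # f'(x*) = 0
necessary = is_psd_2x2(H)                     # f''(x*) >= 0
print(critical, necessary)
```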
Typical Iterative Methods
• Sequence $x_1, x_2, \ldots, x_k, x_{k+1}$ is generated from $x_0$
• Such that $f(x_{k+1}) = f(x_k + p_k v_k) < f(x_k)$
• With $v_k$ a vector with property $f'(x_k) v_k < 0$: a descent direction
• And $p_k > 0$ typically approximates solution of $\min_p f(x_k + p v_k)$, called the line search or the scalar search
• Proven to converge for smooth functions
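A minimal sketch of such an iteration, assuming the steepest descent direction and a backtracking (Armijo) rule as the approximate scalar search. Neither choice comes from the slides; the function `descent`, the constant `1e-4`, and the quadratic test problem are all illustrative.

```python
def descent(f, grad, x0, max_iter=5000, tol=1e-8):
    """Generic loop x_{k+1} = x_k + p_k v_k with f(x_{k+1}) < f(x_k)."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        v = [-gi for gi in g]                          # descent direction: f'(x_k) v_k < 0
        slope = sum(gi * vi for gi, vi in zip(g, v))
        fx, p = f(x), 1.0
        # Approximate scalar search: halve p until a sufficient decrease is reached
        # (the lower bound on p only guards against endless halving).
        while p > 1e-12 and f([xi + p * vi for xi, vi in zip(x, v)]) > fx + 1e-4 * p * slope:
            p *= 0.5
        x = [xi + p * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical smooth test problem: convex quadratic with minimum at (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]
x_min = descent(f, grad, [0.0, 0.0])
```

Swapping in another direction rule for `v` (for example a Newton direction) leaves the rest of the loop unchanged, as long as the descent property holds.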
Current Methods
• Selecting $v_k$ has huge effect on convergence rate:
– Steepest Descent: 1st order, $v_k = -f'(x_k)$
– Newton's direction: 2nd order, $v_k = -f''(x_k)^{-1} f'(x_k)$, but may not be a descent direction when far from a min
– Conjugate Directions uses $v_{k-1}, v_{k-2}, \ldots$
– Quasi-Newton/Variable metric also uses $v_{k-1}, v_{k-2}, \ldots$
– High order Tensor models fit prior iteration values
– Number of derivatives available affects method
• The scalar search
– Accuracy of scalar minimization
– Quadratic models: "Trust Region"
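The remark that Newton's direction may fail to be a descent direction can be checked on a tiny example. The sketch below uses a hypothetical saddle function $f(x, y) = x^2 - y^2$ (not from the slides), whose Hessian is indefinite everywhere, and compares the slope $f'(x_k) v_k$ for both directions. `solve2` and `dot` are helper names introduced here.

```python
def solve2(H, b):
    """Solve the 2x2 linear system H y = b by Cramer's rule."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(b[0] * H[1][1] - H[0][1] * b[1]) / det,
            (H[0][0] * b[1] - H[1][0] * b[0]) / det]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical saddle f(x, y) = x^2 - y^2 evaluated at the point (0.5, 1).
g = [2.0 * 0.5, -2.0 * 1.0]          # gradient f'(x_k)
H = [[2.0, 0.0], [0.0, -2.0]]        # Hessian f''(x_k), indefinite

v_sd = [-gi for gi in g]                       # steepest descent direction
v_newton = [-yi for yi in solve2(H, g)]        # Newton direction

slope_sd = dot(g, v_sd)              # negative: a descent direction
slope_newton = dot(g, v_newton)      # positive here: not a descent direction
print(slope_sd, slope_newton)
```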
Infinite Series of Solution
• Matrix vector products, but shown with exponents for connections with scalar Taylor series.
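$$x^* = h(z_k) - h'(z_k) z_k + \tfrac{1}{2} h''(z_k) z_k^2 - \tfrac{1}{6} h'''(z_k) z_k^3 + \ldots$$
with $x = h(z)$ defined by $f'(h(z)) = z^p$, so that $h(z_k) = x_k$ and $x^* = h(0)$. Differentiating this relation gives:
$$h'(z_k) = p\, f''(x_k)^{-1} z_k^{p-1}$$
$$h''(z_k) = f''(x_k)^{-1} \left[ p(p-1) z_k^{p-2} - f'''(x_k)\, h'(z_k)\, h'(z_k) \right]$$
$$h'''(z_k) = f''(x_k)^{-1} \left[ p(p-1)(p-2) z_k^{p-3} - 3 f'''(x_k)\, h'(z_k)\, h''(z_k) - f^{(4)}(x_k)\, h'(z_k)\, h'(z_k)\, h'(z_k) \right]$$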
Infinite Series of Solution…
• Define:
$$f''(x_k)\, d_2 = f'(x_k)$$
$$f''(x_k)\, d_3 = \tfrac{1}{2} f'''(x_k)\, d_2 d_2$$
$$f''(x_k)\, d_4 = f'''(x_k)\, d_2 d_3 - \tfrac{1}{6} f^{(4)}(x_k)\, d_2 d_2 d_2$$
• Then:
$$x^* = x_k - p\, d_2 + \tfrac{1}{2} p(p-1)\, d_2 - p^2 d_3 - \tfrac{1}{6} p(p-1)(p-2)\, d_2 + p^2 (p-1)\, d_3 - p^3 d_4 + \ldots$$
• For p = 1: $x^* = x_k - d_2 - d_3 - d_4 - \ldots$
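A scalar sanity check of the p = 1 series may help. The sketch assumes the sign convention $f''(x_k) d_2 = f'(x_k)$, under which the series reads $x^* = x_k - d_2 - d_3 - d_4 - \ldots$, and uses a hypothetical function with $f'(x) = e^x - 2$, whose critical point is $x^* = \ln 2$. Neither the test function nor the variable names come from the slides.

```python
import math

x_k = 1.0
g1 = math.exp(x_k) - 2.0            # f'(x_k)
g2 = g3 = g4 = math.exp(x_k)        # f'', f''', f'''' are all e^x for this f

# Series terms under the assumed sign convention f'' d2 = f'.
d2 = g1 / g2
d3 = 0.5 * g3 * d2 * d2 / g2
d4 = (g3 * d2 * d3 - g4 * d2 ** 3 / 6.0) / g2

x_star = math.log(2.0)
partial_sums = [x_k - d2,               # 2nd order (Newton step)
                x_k - d2 - d3,          # 3rd order
                x_k - d2 - d3 - d4]     # 4th order
errors = [abs(s - x_star) for s in partial_sums]
print(errors)
```

Each added term reduces the distance to $\ln 2$ in this example, which is the behavior the truncated series relies on when it converges.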
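Curved Trajectories Algorithm
• At kth iteration, estimate a scale factor (written $\alpha$ here), then calculate:
$$f''(x_k)\, d_2 = f'(x_k)$$
$$f''(x_k)\, d_3 = \frac{1}{\alpha^2} \left[ f'(x_k - \alpha d_2) + (\alpha - 1) f'(x_k) \right]$$
$$f''(x_k)\, d_4 = \frac{1}{\alpha^3} \left[ f'(x_k - \alpha d_2 - \alpha^2 d_3) + (\alpha - 1) f'(x_k) \right]$$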
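• Select order, modify $d_i$, and select $p_k$
2nd order: $x_{k+1} = x_k - p\, d_2$
3rd order: $x_{k+1} = x_k - \tfrac{3}{2} d_2\, p - \left( d_3 - \tfrac{1}{2} d_2 \right) p^2$
4th order: $x_{k+1} = x_k - \tfrac{11}{6} d_2\, p - \left( 2 d_3 - d_2 \right) p^2 - \left( \tfrac{1}{6} d_2 - d_3 + d_4 \right) p^3$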
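The following is a rough sketch of one iteration's bookkeeping, not the author's implementation. It assumes the sign convention $f''(x_k) d_2 = f'(x_k)$, a scale factor named `alpha` (a hypothetical name), and finite-difference style estimates of $d_3$ and $d_4$ built only from gradient evaluations and the Hessian. The Rosenbrock function serves as the test problem, and `solve2`, `curve_terms`, and `trajectory` are illustrative names.

```python
def solve2(H, b):
    """Solve the 2x2 linear system H y = b by Cramer's rule."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(b[0] * H[1][1] - H[0][1] * b[1]) / det,
            (H[0][0] * b[1] - H[1][0] * b[0]) / det]


def grad(x):
    """Gradient of the Rosenbrock function (test problem only)."""
    return [-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
            200.0 * (x[1] - x[0] ** 2)]


def hess(x):
    """Hessian of the Rosenbrock function."""
    return [[1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
            [-400.0 * x[0], 200.0]]


def curve_terms(x, alpha):
    """Estimate d2, d3, d4 from the gradient and the Hessian (assumed convention)."""
    g, H = grad(x), hess(x)
    d2 = solve2(H, g)
    x3 = [x[i] - alpha * d2[i] for i in range(2)]
    g3 = grad(x3)
    d3 = solve2(H, [(g3[i] + (alpha - 1.0) * g[i]) / alpha ** 2 for i in range(2)])
    x4 = [x[i] - alpha * d2[i] - alpha ** 2 * d3[i] for i in range(2)]
    g4 = grad(x4)
    d4 = solve2(H, [(g4[i] + (alpha - 1.0) * g[i]) / alpha ** 3 for i in range(2)])
    return d2, d3, d4


def trajectory(x, d2, d3, d4, p):
    """4th-order point x - 11/6 d2 p - (2 d3 - d2) p^2 - (d2/6 - d3 + d4) p^3."""
    return [x[i] - 11.0 / 6.0 * d2[i] * p
            - (2.0 * d3[i] - d2[i]) * p ** 2
            - (d2[i] / 6.0 - d3[i] + d4[i]) * p ** 3 for i in range(2)]


x0 = [-1.2, 1.0]
d2, d3, d4 = curve_terms(x0, alpha=0.01)   # alpha = 0.01 is an arbitrary choice
x_new = trajectory(x0, d2, d3, d4, p=1.0)
```

A scalar search would then vary `p` along `trajectory` instead of along a straight line.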
Challenges
• High order terms accurately approximated from the Gradient and the Hessian
• Scalar searches along polynomial curved trajectories
• Performance for large problems
– Exploit Sparse Hessian
– Store nonzeros only, no operations on zeros
• Far from solution:
– Hessian not positive definite (solved)
– Hessian modified and use CG step as last resort
Hessian < 0 Changes
CUTEr Performance Profiles
[Figure: CPU-time Profile (127 problems < 500 variables). Horizontal axis: Normalized CPU-time/problem (1 to 12). Vertical axis: Cumulative Distribution (30% to 100%). Curves: CTA, CTAn, CTAnn, CG Descent, Lancelot, Tenmin, L-BFGS, L-BFGS-B.]
CUTEr Performance Profiles
[Figure: CPU-time Profile (51 problems >= 500 variables). Horizontal axis: Normalized CPU-time/problem (1 to 13). Vertical axis: Cumulative Distribution (0% to 100%). Curves: CTA, CTAn, CG Descent, Lancelot, L-BFGS-B.]
Current Research Pursuits
• Handle multiple functions: Pareto optimal points
• Handle Constraint Functions
• Explore the family of infinite series for combination of composition functions.
Rosenbrock Banana Function
• Algorithm selects
$$f(x, y) = 100 (y - x^2)^2 + (1 - x)^2, \quad x_0 = [x \;\; y]^T = [-1.2 \;\; 1]^T, \quad x^* = [1 \;\; 1]^T$$
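$x = \begin{bmatrix} -1.2 \\ 1.0 \end{bmatrix}$, $f = 24.2000$, Gradient $= \begin{bmatrix} -215.600 \\ -88.00 \end{bmatrix}$, Hessian $= \begin{bmatrix} 1330.00 & 480.0 \\ 480.0 & 200 \end{bmatrix}$,
$d_2 = \begin{bmatrix} -0.02472 \\ -0.3807 \end{bmatrix}$, $d_3 = \begin{bmatrix} -0.02444 \\ 0.05805 \end{bmatrix}$, $d_4 = \begin{bmatrix} -0.02420 \\ 0.05687 \end{bmatrix}$,
4th order $x_{k+1} = \begin{bmatrix} -1.2 + 0.04532\, p + 0.02416\, p^2 + 0.003879\, p^3 \\ 1.0 + 0.6979\, p - 0.4968\, p^2 + 0.06462\, p^3 \end{bmatrix}$
$x_1 = [0.1156 \;\; 0.1479]^T$, $f = 2.59$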
[Figure: iterates $x_0$, $x_1$, $x_2$, $x_3$, with labels f = 24.2, f = 4, f = 0.5.]
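The numbers on this slide can be re-derived with a few lines. The sketch below evaluates the Rosenbrock function, its gradient, and its Hessian at $x_0 = (-1.2, 1)$, solves $f'' d_2 = f'$ (the sign convention that matches the printed $d_2$), and evaluates the 4th-order trajectory with the printed coefficients. The value p = 5 is an assumption: it roughly reproduces the reported $x_1$, but the step actually selected by the algorithm is not stated in the surviving text.

```python
def f(x, y):
    """Rosenbrock banana function."""
    return 100.0 * (y - x * x) ** 2 + (1.0 - x) ** 2


def grad(x, y):
    return [-400.0 * x * (y - x * x) - 2.0 * (1.0 - x), 200.0 * (y - x * x)]


def hess(x, y):
    return [[1200.0 * x * x - 400.0 * y + 2.0, -400.0 * x],
            [-400.0 * x, 200.0]]


x0, y0 = -1.2, 1.0
f0 = f(x0, y0)
g = grad(x0, y0)
H = hess(x0, y0)

# Solve H d2 = g by Cramer's rule (sign convention inferred from the printed d2).
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
d2 = [(g[0] * H[1][1] - H[0][1] * g[1]) / det,
      (H[0][0] * g[1] - H[1][0] * g[0]) / det]


def trajectory(p):
    """4th-order trajectory using the rounded coefficients printed on the slide."""
    return (-1.2 + 0.04532 * p + 0.02416 * p ** 2 + 0.003879 * p ** 3,
            1.0 + 0.6979 * p - 0.4968 * p ** 2 + 0.06462 * p ** 3)


x1, y1 = trajectory(5.0)     # p = 5 is an assumed step length
f1 = f(x1, y1)
print(f0, g, H, d2, (x1, y1), f1)
```

Because the printed coefficients are rounded to four digits, the recomputed $x_1$ and $f$ agree with the slide only to about two or three digits.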
3D View
$f(x, y) = 100 (y - x^2)^2 + (1 - x)^2$
Trajectories from starting point
Rotations
Rotations 3D
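Rotations
• At point $x_k$ we have $x_{k+1} = x_k + R(\theta)\, h(p)$, where $h(p)$ is the trajectory and $R(\theta)$ is a rotation matrix.
• $h(0) = 0$ and $R(0) = I$, and for 2 coordinates, counterclockwise:
$$R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
• At the kth step far from solution we want:
$$\min_{p,\,\theta} f(x_k + R(\theta)\, h(p))$$
But settle for $p_k$, $\theta_k$: $f(x_k + R(\theta_k)\, h(p_k)) < f(x_k)$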
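A minimal sketch of the rotated step $x_{k+1} = x_k + R(\theta) h(p)$ in two coordinates. The function names and the sample displacement `h_p` are hypothetical; `h_p` stands for an already evaluated trajectory displacement $h(p)$.

```python
import math


def rotate(theta, h):
    """Apply the counterclockwise 2-D rotation R(theta) to the vector h."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * h[0] - s * h[1], s * h[0] + c * h[1]]


def rotated_step(x_k, h_p, theta):
    """Return x_k + R(theta) h(p)."""
    r = rotate(theta, h_p)
    return [x_k[0] + r[0], x_k[1] + r[1]]


x_k = [-1.2, 1.0]
h_p = [0.1, 0.2]                                # hypothetical displacement h(p)
same = rotated_step(x_k, h_p, 0.0)              # R(0) = I: the unrotated step
quarter = rotate(math.pi / 2.0, [1.0, 0.0])     # first axis rotates onto the second
print(same, quarter)
```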
Rotations (continued)
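• Gives
$$\frac{\partial f}{\partial \theta} = f'\, R'(\theta)\, h(p) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} \end{bmatrix} \begin{bmatrix} -\sin\theta & -\cos\theta \\ \cos\theta & -\sin\theta \end{bmatrix} \begin{bmatrix} h_1(p) \\ h_2(p) \end{bmatrix} = 0$$
$$\theta^* = \tan^{-1} \left( \frac{\dfrac{\partial f}{\partial x_2} h_1(p) - \dfrac{\partial f}{\partial x_1} h_2(p)}{\dfrac{\partial f}{\partial x_1} h_1(p) + \dfrac{\partial f}{\partial x_2} h_2(p)} \right), \quad -\frac{\pi}{2} \le \theta^* \le \frac{\pi}{2}$$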
• Trajectory angle with the gradient for $R(0) = I$:
$$\theta_G = \cos^{-1} \left( \frac{-f'\, h(p)}{\| f' \| \, \| h(p) \|} \right), \quad 0 \le \theta_G \le \frac{\pi}{2}$$
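• Observations:
$$\left. \frac{\partial f}{\partial \theta} \right|_{\theta = 0} = \frac{\partial f}{\partial x_2} h_1(p) - \frac{\partial f}{\partial x_1} h_2(p)$$
$$\left. \frac{\partial f}{\partial \theta} \right|_{\theta = \pi/2} = -\left( \frac{\partial f}{\partial x_1} h_1(p) + \frac{\partial f}{\partial x_2} h_2(p) \right)$$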
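The two angles can be computed directly once a gradient vector and a displacement $h(p)$ are available. The sketch below treats the gradient as a fixed vector `g`, which is a simplifying assumption (the point at which the gradient is meant to be evaluated is not recoverable from the text), and takes $\theta_G$ as the angle to the negative gradient so that it lies in $[0, \pi/2]$ for a descent displacement. All names and sample values are hypothetical.

```python
import math


def theta_star(g, h):
    """Stationary angle of theta -> g . R(theta) h for a fixed gradient g."""
    num = g[1] * h[0] - g[0] * h[1]
    den = g[0] * h[0] + g[1] * h[1]      # assumed nonzero
    return math.atan(num / den)


def theta_g(g, h):
    """Angle between the displacement h and the negative gradient -g."""
    inner = g[0] * h[0] + g[1] * h[1]
    return math.acos(-inner / (math.hypot(*g) * math.hypot(*h)))


def slope(g, h, theta):
    """Derivative of g . R(theta) h with respect to theta, g held fixed."""
    c, s = math.cos(theta), math.sin(theta)
    return g[0] * (-s * h[0] - c * h[1]) + g[1] * (c * h[0] - s * h[1])


g = [-2.0, -1.0]     # hypothetical gradient values
h = [1.0, 1.5]       # hypothetical displacement h(p), a descent direction here
t_star = theta_star(g, h)
t_g = theta_g(g, h)
print(t_star, t_g, slope(g, h, t_star))
```

With a single fixed gradient the two angles coincide in magnitude, so any difference between them in practice has to come from evaluating the gradient at different points.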
$f(x, y) = 100 (y - x^2)^2 + (1 - x)^2$
Rotation Challenges/Results
• Select effective $\theta_k$ without too much work
– Using existing strategy to calculate $p_k$, then calculate a $\theta_k$ from $\theta^*$ and $\theta_G$. Then calculate a new $p_k$ again using rotated trajectory.
– Good results with $\theta_k = 0.4 \min(\theta^*, \theta_G)$
– $\theta_k > 40^\circ$ indicates elongated ellipse contours, and rotation seems unproductive in this case.
– Effective when CTA series is convergent and iteration is not close to the minimum point.
• Functions of more than 2 variables later
[Figure: $f(p, \theta)$ for $\theta = 0, -0.1, -0.2, -0.3$.]
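More than Two Coordinates
• Ignore coordinates with insignificant Newton correction magnitudes.
• Success achieved by adding the 3rd coordinate to the first two as follows:
– Calculate the rotation by pairing the 3rd coordinate with each of the top 2 coordinates.
– This results in a rotation matrix:
$$R = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_3 & -\sin\theta_3 \\ 0 & \sin\theta_3 & \cos\theta_3 \end{bmatrix} \begin{bmatrix} \cos\theta_2 & 0 & -\sin\theta_2 \\ 0 & 1 & 0 \\ \sin\theta_2 & 0 & \cos\theta_2 \end{bmatrix} \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 & 0 \\ \sin\theta_1 & \cos\theta_1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
– Where the angles $\theta_1$, $\theta_2$, $\theta_3$ are each calculated between two coordinates as explained before.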
• The 4th coordinate is added by pairing rotations with the first 3 coordinates, and so on.
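The composition of pairwise rotations can be sketched as a product of plane rotations. The code below builds the three-coordinate case in the factor order (2,3), (1,3), (1,2), using the counterclockwise sign convention in every plane, which is an assumption. `plane_rotation`, `matmul`, `rotation_3d`, and the sample angles are illustrative names and values.

```python
import math


def plane_rotation(n, i, j, theta):
    """n x n identity with a counterclockwise rotation in the (i, j) plane."""
    R = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    c, s = math.cos(theta), math.sin(theta)
    R[i][i], R[i][j] = c, -s
    R[j][i], R[j][j] = s, c
    return R


def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def rotation_3d(theta1, theta2, theta3):
    """R = R_23(theta3) R_13(theta2) R_12(theta1) with zero-based plane indices."""
    return matmul(plane_rotation(3, 1, 2, theta3),
                  matmul(plane_rotation(3, 0, 2, theta2),
                         plane_rotation(3, 0, 1, theta1)))


R = rotation_3d(0.1, -0.2, 0.3)                      # hypothetical angles
RtR = matmul([list(col) for col in zip(*R)], R)      # should be the identity
identity = rotation_3d(0.0, 0.0, 0.0)
```

Since `plane_rotation` takes the dimension `n` and a coordinate pair, the same helper could be reused when further coordinates are paired in, although the pairing order for that case is not specified here.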