
Page 1:

B1 Optimization. 4 Lectures, Trinity 2019 / Michaelmas 2020. 1 Examples Sheet. Prof V A Prisacariu.

• Lecture 1: Local and global optima, unconstrained univariate and multivariate optimization, stationary points, steepest descent.

• Lecture 2: Newton and Newton-like methods – Quasi-Newton, Gauss-Newton; the Nelder-Mead (amoeba) simplex algorithm.

• Lecture 3: Linear programming (constrained optimization); the simplex algorithm, interior point methods; integer programming.

• Lecture 4: Convexity, robust cost functions, methods for non-convex functions – grid search, multiple coverings, branch and bound, simulated annealing, evolutionary optimization.

• Course based on previous B1 Optimization, by Prof A Zisserman.

Page 2:

Textbooks

Practical Optimization

Philip E. Gill, Walter Murray, and Margaret H. Wright

Covers unconstrained and constrained optimization. Very clear and comprehensive.

Page 3:

Background reading and web resources

Numerical Recipes in C (or C++) : The Art of Scientific Computing

William H. Press, Brian P. Flannery, Saul A. Teukolsky, William T. Vetterling

CUP 1992/2002. Good chapter on optimization. Available online as PDF.

Page 4:

Background reading and web resources

Convex Optimization

Stephen Boyd and Lieven Vandenberghe

CUP 2004. Available online at http://www.stanford.edu/~boyd/cvxbook/

Page 5:

Further reading, web resources, and the slides are available on Canvas and at:

http://www.robots.ox.ac.uk/~victor -> Teaching

Page 6:

Lecture 1. Topics covered in this lecture:

• Problem formulation;

• Local and global optima;

• Unconstrained univariate optimization;

• Unconstrained multivariate optimization for quadratic functions:

• Stationary points;

• Steepest descent.

Page 7:

Introduction

Optimization is used to find the best or optimal solution to a problem.

Steps involved in formulating an optimization problem:

• Conversion of the problem into a mathematical model that abstracts all the essential elements;

• Choosing a suitable optimization method for the problem;

• Obtaining the optimum solution.

Page 8:

Example: airfoil/wing aerodynamic shape optimization

Optimization problem:
• constraints: lift, area, chord;
• minimize drag;
• vary shape.

Aircraft Design via Numerical Optimization: Are We There Yet? Martins et al., 2017.

Page 9:

Introduction: Problem specification

Suppose we have a cost function (or objective function):

$f(\mathbf{x}): \mathbb{R}^N \to \mathbb{R}$

Our aim is to find the value of the parameters $\mathbf{x}$ that minimizes the function:

$\mathbf{x}^* = \arg\min_{\mathbf{x}} f(\mathbf{x})$

subject to the following constraints:
- equality: $c_i(\mathbf{x}) = 0,\; i = 1, \dots, m_e$;
- inequality: $c_i(\mathbf{x}) \ge 0,\; i = m_e + 1, \dots, m$.

We will start by focussing on unconstrained problems.
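To make this specification concrete, here is a minimal sketch of how such a problem can be posed with SciPy's `scipy.optimize.minimize`; the objective and the two constraint functions are made-up placeholders, not from the course.

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder cost function f: R^2 -> R (illustrative only).
def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

# One equality constraint c_1(x) = 0 and one inequality constraint c_2(x) >= 0,
# following the sign convention used on this slide.
constraints = [
    {"type": "eq",   "fun": lambda x: x[0] + x[1] - 1.0},  # c_1(x) = 0
    {"type": "ineq", "fun": lambda x: x[0]},                # c_2(x) >= 0
]

x0 = np.zeros(2)                                   # starting point
result = minimize(f, x0, constraints=constraints)  # SLSQP is used when constraints are given
print(result.x)                                    # approximate minimizer x*
```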

Page 10:

Recall: One dimensional functions

A differentiable function has a stationary point when the derivative is zero:

$\frac{df}{dx} = 0$

The second derivative gives the type of stationary point:

- $f''(x) < 0$: maximum;
- $f''(x) = 0$: inconclusive (e.g. a point of inflection);
- $f''(x) > 0$: minimum.

Page 11:

Unconstrained optimization

- down-hill search (gradient descent) algorithms can find local minima;

- which of the minima is found depends on the starting point;

- such minima often occur in real applications.

[Figure: a 1D cost function $f(\mathbf{x})$ plotted against $\mathbf{x}$, showing a local minimum and the global minimum of $\min_{\mathbf{x}} f(\mathbf{x})$.]

Page 12:

Example: template matching in 2D images

Input: two point sets $M = \{\mathbf{M}_i\}$ and $D = \{\mathbf{D}_j\}$.

Task: determine the transformation $T$ that minimizes the error between $D$ and the transformed $M$.

Motivation: a robot picking up a "C"-shaped part.

[Figure: the model $M$, the data $D$, and the transformation $T$ mapping one onto the other.]

Page 13:

Cost function (matches known)

2D points $(x, y)^\top$; model points $\mathbf{M}_i$, data points $\mathbf{D}_i$.

$f(\theta, t_x, t_y) = \sum_i \left\| \mathbf{R}(\theta)\,\mathbf{M}_i + \mathbf{t} - \mathbf{D}_i \right\|^2$

Transformation parameters:
- rotation angle $\theta$;
- translation $\mathbf{t} = (t_x, t_y)^\top$.
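As an illustration (not code from the course), a minimal NumPy sketch of this cost for known correspondences, where `M` and `D` are (N, 2) arrays whose i-th rows correspond:

```python
import numpy as np

def rigid_cost_known(theta, tx, ty, M, D):
    """f(theta, tx, ty) = sum_i || R(theta) M_i + t - D_i ||^2."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    t = np.array([tx, ty])
    residuals = M @ R.T + t - D   # rows are R(theta) M_i + t - D_i
    return np.sum(residuals ** 2)
```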

Page 14:

Cost function (matches unknown)

2D points $(x, y)^\top$; model points $\mathbf{M}_j$, data points $\mathbf{D}_i$.

$f(\theta, t_x, t_y) = \sum_i \min_j \left\| \mathbf{R}(\theta)\,\mathbf{M}_j + \mathbf{t} - \mathbf{D}_i \right\|^2$

(for each data point, find the closest model point)

Transformation parameters:
- rotation angle $\theta$;
- translation $\mathbf{t} = (t_x, t_y)^\top$.
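A corresponding sketch for the unknown-correspondence cost, again illustrative: each data point is paired with its closest transformed model point, using `scipy.spatial.distance.cdist` for the pairwise distances.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rigid_cost_unknown(theta, tx, ty, M, D):
    """f(theta, tx, ty) = sum_i min_j || R(theta) M_j + t - D_i ||^2."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    TM = M @ R.T + np.array([tx, ty])      # transformed model points
    dists = cdist(D, TM)                   # (num_data, num_model) distance matrix
    return np.sum(dists.min(axis=1) ** 2)  # closest model point for each data point
```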

Page 15:

Cost function (matches unknown)

$f(\theta, t_x, t_y) = \sum_i \min_j \left\| \mathbf{R}(\theta)\,\mathbf{M}_j + \mathbf{t} - \mathbf{D}_i \right\|^2$

(for each data point, find the closest model point)

Model point: $\mathbf{M}_j = (x_j, y_j)^\top$

Transformation parameters:
- rotation angle $\theta$;
- translation $\mathbf{t} = (t_x, t_y)^\top$

[Figures: correct correspondences vs. closest-point correspondences between model points $\mathbf{M}_j$ and data points $\mathbf{D}_i$.]

As the distances become smaller we obtain the correct correspondences and the model is pulled onto the data.

Page 16:

Performance

Page 17:

Unconstrained univariate optimization

We will look at three basic methods to determine the minimum:
1. Gradient descent;
2. Polynomial interpolation;
3. Newton's method.
These introduce the ideas that will be applied in the multivariate case.

For the moment, assume we can start close to the global minimum of

$\min_x f(x)$

where $f(x)$ is a function of one variable.

Page 18:

A typical 1D function

As an example, consider the function:

$f(x) = 0.1 + 0.1x + \frac{x^2}{0.1 + x^2}$

(assume we do not know the actual function expression from now on)

Page 19:

1. Gradient descent

Given a starting location $x_0$, examine $\frac{df}{dx}$ and move in the downhill direction to generate a new estimate $x_1 = x_0 + \delta x$.

How do we determine the step size $\delta x$?

$\delta x = -\alpha \frac{df}{dx}$
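A minimal sketch of this scheme on the example function from the previous slide (the derivative is approximated numerically, and the starting point and step size are assumed values):

```python
def f(x):
    # the example 1D function from the earlier slide
    return 0.1 + 0.1 * x + x**2 / (0.1 + x**2)

def dfdx(x, h=1e-6):
    # central-difference approximation to df/dx
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.8        # assumed starting location x_0
alpha = 0.05   # assumed step size (see the next slide on choosing alpha)
for _ in range(100):
    x = x - alpha * dfdx(x)   # delta_x = -alpha * df/dx
print(x, f(x))                # x should end up near the minimum of f
```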

Page 20:

Setting $\alpha$

Slide credit: Bert De Brabandere

If the learning rate is too large, gradient descent may not converge (left). If it is too small, convergence will be slow (middle). A good learning rate leads to fast convergence (right).

Page 21:

2. Polynomial interpolation (trust region method)

Approximate $f(x)$ with a simpler function that models it well in a neighbourhood around the current estimate $x$. This neighbourhood is the trust region.


- Bracket the minimum.
- Fit a quadratic or cubic polynomial which interpolates $f(x)$ at some points in the interval.
- Jump to the (easily obtained) minimum of the polynomial.
- Throw away the worst point and repeat the process.
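A minimal sketch of the quadratic-interpolation idea (illustrative; the three initial bracketing points are assumed): fit a parabola through the current three points with `np.polyfit`, jump to its vertex, then discard the worst point.

```python
import numpy as np

def f(x):
    return 0.1 + 0.1 * x + x**2 / (0.1 + x**2)

xs = np.array([-0.6, 0.0, 0.6])   # assumed initial bracket of the minimum

for _ in range(5):
    c = np.polyfit(xs, f(xs), 2)       # fit quadratic c[0] x^2 + c[1] x + c[2]
    x_new = -c[1] / (2 * c[0])         # vertex (minimum) of the fitted parabola
    xs = np.append(xs, x_new)
    xs = xs[np.argsort(f(xs))][:3]     # throw away the worst point
print(xs[np.argmin(f(xs))])            # best estimate of the minimizer
```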

Page 22:

Quadratic interpolation using 3 points, 2 iterations

Other methods to interpolate a quadratic? E.g. two points and one gradient value.


Page 23:

3. Newton’s method

Fit a quadratic approx. to 𝑓(𝑥) using both gradient and curvature information at 𝑥.

- Expand $f(x)$ locally using a Taylor series:

$f(x + \delta x) = f(x) + \delta x\, f'(x) + \frac{\delta x^2}{2} f''(x) + \text{h.o.t.}$

- Find the $\delta x$ (the variable here) such that $x + \delta x$ is a stationary point of the quadratic approximation:

$\frac{d}{d(\delta x)} \left[ f(x) + \delta x\, f'(x) + \frac{\delta x^2}{2} f''(x) \right] = f'(x) + \delta x\, f''(x) = 0$

- Rearranging:

$\delta x = -\frac{f'(x)}{f''(x)}$

- Update for $x$:

$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}$
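A minimal sketch of the resulting iteration on the same example function (derivatives approximated by finite differences; the starting point is assumed to be close to the minimum):

```python
def f(x):
    return 0.1 + 0.1 * x + x**2 / (0.1 + x**2)

def d1(x, h=1e-5):   # f'(x), central differences
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(x, h=1e-5):   # f''(x), central differences
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

x = 0.1              # assumed starting point, close to the minimum
for _ in range(6):
    x = x - d1(x) / d2(x)   # x_{n+1} = x_n - f'(x_n) / f''(x_n)
print(x, f(x))
```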

Page 24:

[Figures: Newton iterations (left) and the corresponding quadratic approximations (right).]

- avoids the need to bracket the root;

- quadratic convergence (decimal accuracy doubles at every iteration).

Page 25:


- global convergence of Newton’s method is poor;

- often fails if the starting point is too far from the minimum;

- in practice, it must be used with a globalization strategy which reduces the step length until a function decrease is assured (a minimal sketch of such a guard follows below).

[Figures: Newton iterations (left) and the corresponding quadratic approximations (right), for a starting point far from the minimum.]
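One simple globalization strategy of the kind mentioned above is a backtracking guard that keeps shrinking the Newton step until the function value actually decreases. This is only an illustrative sketch, not the specific strategy used in the course:

```python
def damped_newton_step(f, x, d1, d2, beta=0.5, max_halvings=20):
    """One Newton step, shrunk until f decreases (simple backtracking guard)."""
    step = -d1(x) / d2(x)          # full Newton step
    for _ in range(max_halvings):
        if f(x + step) < f(x):     # accept only if the function value decreases
            return x + step
        step *= beta               # otherwise shrink the step and try again
    return x                       # no decrease found: stay put
```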

Page 26:

Stationary Points for Multidimensional functions

$f(\mathbf{x}): \mathbb{R}^N \to \mathbb{R}$ has a stationary point where the gradient vanishes:

$\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_N} \right)^\top = \mathbf{0}$

For example, in two dimensions:

$\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2} \right)^\top = \mathbf{0}$

Page 27:

Extension to N dimensions

- How big can N be? Problem sizes can vary from a handful of parameters to millions.

- In the following we will first examine the properties of stationary points in N dimensions and then move on to optimization algorithms to find the stationary point (minimum).

- We will consider examples for N = 2, so that cost function surfaces can be visualized.

Page 28:

Taylor expansion in 2D

A function can be approximated locally by its Taylor series expansion about a point $\mathbf{x}_0$:

$f(\mathbf{x}_0 + \mathbf{x}) \approx f(\mathbf{x}_0) + \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) \begin{pmatrix} x \\ y \end{pmatrix} + \frac{1}{2} (x, y) \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \text{h.o.t.}$

- This is a generalization of the 1D Taylor series:

$f(x + \delta x) = f(x) + \delta x\, f'(x) + \frac{\delta x^2}{2} f''(x) + \text{h.o.t.}$

- The expansion to second order is a quadratic function in $\mathbf{x}$:

$f(\mathbf{x}) = a + \mathbf{g}^\top \mathbf{x} + \frac{1}{2} \mathbf{x}^\top \mathbf{H}\, \mathbf{x}$

Page 29:

Taylor expansion in ND

A function may be approximated locally by its Taylor expansion about a point $\mathbf{x}_0$:

$f(\mathbf{x}_0 + \mathbf{x}) \approx f(\mathbf{x}_0) + \nabla f^\top \mathbf{x} + \frac{1}{2} \mathbf{x}^\top \mathbf{H}\, \mathbf{x} + \text{h.o.t.}$

where the gradient $\nabla f(\mathbf{x})$ of $f(\mathbf{x})$ is the vector

$\nabla f(\mathbf{x}) = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_N} \right)^\top$

and the Hessian $\mathbf{H}(\mathbf{x})$ of $f(\mathbf{x})$ is the symmetric matrix

$\mathbf{H}(\mathbf{x}) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_N} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_1 \partial x_N} & \cdots & \frac{\partial^2 f}{\partial x_N^2} \end{pmatrix}$

Page 30:

Properties of Quadratic functions

Taylor expansion:

$f(\mathbf{x}_0 + \mathbf{x}) = f(\mathbf{x}_0) + \mathbf{g}^\top \mathbf{x} + \frac{1}{2} \mathbf{x}^\top \mathbf{H}\, \mathbf{x}$

Expand about a stationary point $\mathbf{x}_0 = \mathbf{x}^*$ in direction $\mathbf{p}$:

$f(\mathbf{x}^* + \alpha \mathbf{p}) = f(\mathbf{x}^*) + \alpha\, \mathbf{g}^\top \mathbf{p} + \frac{1}{2} \alpha^2 \mathbf{p}^\top \mathbf{H}\, \mathbf{p} = f(\mathbf{x}^*) + \frac{1}{2} \alpha^2 \mathbf{p}^\top \mathbf{H}\, \mathbf{p}$

since at the stationary point $\mathbf{g} = \nabla f(\mathbf{x}^*) = \mathbf{0}$.

At a stationary point the behaviour is determined by $\mathbf{H}$.

Page 31:

Properties of Quadratic functions

$\mathbf{H}$ is a symmetric matrix, so it has orthogonal eigenvectors:

$\mathbf{H} \mathbf{u}_i = \lambda_i \mathbf{u}_i, \quad \text{choosing } \|\mathbf{u}_i\| = 1$

$f(\mathbf{x}^* + \alpha \mathbf{u}_i) = f(\mathbf{x}^*) + \frac{1}{2} \alpha^2 \mathbf{u}_i^\top \mathbf{H}\, \mathbf{u}_i = f(\mathbf{x}^*) + \frac{1}{2} \alpha^2 \lambda_i$

As $|\alpha|$ increases, $f(\mathbf{x}^* + \alpha \mathbf{u}_i)$ increases, decreases or is unchanged according to whether $\lambda_i$ is positive, negative or zero.

Page 32:

Examples of Quadratic functions

Case 1: both eigenvalues positive

$f(\mathbf{x}) = a + \mathbf{g}^\top \mathbf{x} + \frac{1}{2} \mathbf{x}^\top \mathbf{H}\, \mathbf{x}$

with $a = 0$, $\mathbf{g} = \begin{pmatrix} -50 \\ -50 \end{pmatrix}$, $\mathbf{H} = \begin{pmatrix} 6 & 4 \\ 4 & 6 \end{pmatrix}$

[Figure: contour plot; the stationary point is a minimum.]

Page 33:

Examples of Quadratic functions

Case 2: eigenvalues have different signs

$f(\mathbf{x}) = a + \mathbf{g}^\top \mathbf{x} + \frac{1}{2} \mathbf{x}^\top \mathbf{H}\, \mathbf{x}$

with $a = 0$, $\mathbf{g} = \begin{pmatrix} -30 \\ +20 \end{pmatrix}$, $\mathbf{H} = \begin{pmatrix} 6 & 0 \\ 0 & -4 \end{pmatrix}$

[Figure: saddle surface; a stationary point but not a minimum.]

Page 34:

Examples of Quadratic functions

Case 3: one eigenvalue zero

$f(\mathbf{x}) = a + \mathbf{g}^\top \mathbf{x} + \frac{1}{2} \mathbf{x}^\top \mathbf{H}\, \mathbf{x}$

with $a = 0$, $\mathbf{g} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$, $\mathbf{H} = \begin{pmatrix} 6 & 0 \\ 0 & 0 \end{pmatrix}$

[Figure: parabolic cylinder.]

Page 35:

- Hessian positive definite: convex function; minimum point.
- Hessian negative definite: concave function; maximum point.
- Hessian indefinite (mixed-sign eigenvalues): the surface has directions of negative curvature; saddle point.

More in Lecture 4
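These cases can be checked numerically. For the quadratic $f(\mathbf{x}) = a + \mathbf{g}^\top \mathbf{x} + \frac{1}{2}\mathbf{x}^\top \mathbf{H}\,\mathbf{x}$ the gradient is $\mathbf{g} + \mathbf{H}\mathbf{x}$, so the stationary point is $\mathbf{x}^* = -\mathbf{H}^{-1}\mathbf{g}$ when $\mathbf{H}$ is invertible. A minimal sketch using the Case 1 numbers from the earlier slide:

```python
import numpy as np

g = np.array([-50.0, -50.0])
H = np.array([[6.0, 4.0],
              [4.0, 6.0]])           # Case 1: both eigenvalues positive

x_star = np.linalg.solve(H, -g)      # stationary point: g + H x = 0
eigenvalues = np.linalg.eigvalsh(H)  # H is symmetric, so use the symmetric solver

if np.all(eigenvalues > 0):
    kind = "minimum (H positive definite)"
elif np.all(eigenvalues < 0):
    kind = "maximum (H negative definite)"
else:
    kind = "saddle or degenerate (mixed-sign or zero eigenvalues)"

print(x_star, eigenvalues, kind)     # x* = (5, 5), eigenvalues (2, 10), minimum
```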

Page 36:

Optimization in N dimensions – line search

- Reduce optimization in N dimensions to a series of (1D) line minimizations.

- Use methods developed in 1D (e.g. polynomial interpolation).


Page 37:

An Optimization Algorithm

Start at $\mathbf{x}_0$, then repeat:

1. Compute a search direction $\mathbf{p}_k$.

2. Compute a step length $\alpha_k$, such that $f(\mathbf{x}_k + \alpha_k \mathbf{p}_k) < f(\mathbf{x}_k)$.

3. Update: $\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{p}_k$.

4. Check for convergence (termination criteria), e.g. $\|\nabla f\| \approx 0$.

Reduce optimization in N dimensions to a series of (1D) line minimizations.
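The four steps can be written as a generic descent loop, parameterised by how the direction and the step length are computed. A minimal sketch (the `direction_fn` and `step_fn` arguments are placeholders, to be supplied by the particular method):

```python
import numpy as np

def descent(grad, x0, direction_fn, step_fn, tol=1e-8, max_iters=1000):
    """Generic descent loop: x_{k+1} = x_k + alpha_k * p_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        p = direction_fn(x)                  # 1. search direction p_k
        alpha = step_fn(x, p)                # 2. step length alpha_k
        x = x + alpha * p                    # 3. update
        if np.linalg.norm(grad(x)) < tol:    # 4. convergence: ||grad f|| ~ 0
            break
    return x
```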

Page 38:

Steepest descent

Basic principle is to minimize the N-dimensional function by a series of 1D line minimizations:

$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{p}_k$

The steepest descent method chooses $\mathbf{p}_k$ to be parallel to the negative gradient:

$\mathbf{p}_k = -\nabla f(\mathbf{x}_k)$

The step size $\alpha_k$ is chosen to minimize $f(\mathbf{x}_k + \alpha_k \mathbf{p}_k)$. For quadratic forms there is a closed-form solution:

$\alpha_k = -\frac{\nabla f(\mathbf{x}_k)^\top \mathbf{p}_k}{\mathbf{p}_k^\top \mathbf{H}\, \mathbf{p}_k} = \frac{\mathbf{p}_k^\top \mathbf{p}_k}{\mathbf{p}_k^\top \mathbf{H}\, \mathbf{p}_k}$

[Exercise: minimize $f(\mathbf{x}_k + \alpha_k \mathbf{p}_k)$ with respect to $\alpha_k$ to derive this.]
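A minimal sketch of steepest descent on the quadratic example used earlier ($a = 0$, $\mathbf{g} = (-50, -50)^\top$, $\mathbf{H}$ as in Case 1), with the exact line-search step above; the starting point matches the one on the next slide:

```python
import numpy as np

g = np.array([-50.0, -50.0])
H = np.array([[6.0, 4.0],
              [4.0, 6.0]])

def grad(x):
    return g + H @ x                   # gradient of the quadratic form

x = np.array([0.0, 14.0])              # starting point (as on the next slide)
for _ in range(50):
    p = -grad(x)                       # steepest-descent direction
    if np.linalg.norm(p) < 1e-10:
        break
    alpha = (p @ p) / (p @ H @ p)      # exact line-search step for a quadratic
    x = x + alpha * p
print(x)                               # approaches the minimum at (5, 5), zig-zagging
```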

Page 39:

Steepest descent example

$a = 0$, $\mathbf{g} = \begin{pmatrix} -50 \\ -50 \end{pmatrix}$, $\mathbf{H} = \begin{pmatrix} 6 & 4 \\ 4 & 6 \end{pmatrix}$

Steepest descent starting at $\mathbf{x}_0 = (0, 14)^\top$.

After each line minimization, the new gradient is always orthogonal to the previous step direction:
- true for any exact line minimization;
- can be proven by examining the derivation of $\alpha_k$.

Consequently, the iterations may zig-zag down the valley in a very inefficient manner.

Page 40:

Conjugate gradients

The method of conjugate gradients chooses successive descent directions $\mathbf{p}_k$ such that it is guaranteed to reach the minimum in a finite number of steps.

- Each $\mathbf{p}_k$ is chosen to be conjugate to all previous search directions with respect to the Hessian $\mathbf{H}$:

$\mathbf{p}_k^\top \mathbf{H}\, \mathbf{p}_j = 0, \quad j < k$

- The resulting search directions are mutually linearly independent.

- Remarkably, $\mathbf{p}_k$ can be chosen using only knowledge of $\mathbf{p}_{k-1}$, $\nabla f(\mathbf{x}_{k-1})$ and $\nabla f(\mathbf{x}_k)$ (see Numerical Recipes), e.g.

$\mathbf{p}_k = -\nabla f(\mathbf{x}_k) + \frac{\nabla f_k^\top \nabla f_k}{\nabla f_{k-1}^\top \nabla f_{k-1}}\, \mathbf{p}_{k-1}$
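A minimal sketch of this update (the Fletcher-Reeves form) on the same quadratic; since the problem is 2-dimensional, the minimum is reached in two steps:

```python
import numpy as np

g = np.array([-50.0, -50.0])
H = np.array([[6.0, 4.0],
              [4.0, 6.0]])

def grad(x):
    return g + H @ x

x = np.array([0.0, 14.0])
r = grad(x)
p = -r                                  # first direction: steepest descent
for _ in range(2):                      # at most N = 2 steps for a 2D quadratic
    alpha = -(r @ p) / (p @ H @ p)      # exact line search along p
    x = x + alpha * p
    r_new = grad(x)
    beta = (r_new @ r_new) / (r @ r)    # Fletcher-Reeves coefficient
    p = -r_new + beta * p               # next conjugate direction
    r = r_new
print(x)                                # (5, 5): the minimum, after exactly 2 steps
```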

Page 41:

Conjugate gradients

- An N-dimensional quadratic form can be minimized in at most N conjugate descent steps.

- In the figure: from 3 different starting points, the minimum is reached in exactly 2 steps.

Page 42:

What is next?

- Move from functions that are exactly quadratic to general functions that are represented locally by a quadratic.

- Newton's method (which uses second derivatives) and Newton-like methods for general functions.