
ECE 680

Modern Automatic Control

Gradient and Newton’s Methods: A Review

Stan Zak

October 21, 2019


Review of the Gradient Properties

The direction of maximum increase of a real-valued differentiable function at a point is orthogonal to the level set of the function at that point

The gradient acts in such a direction that, for a small displacement, f increases more in this direction than in any other direction

The direction of the gradient is clearly a good direction to move in the case of maximization

If we minimize, then we should move in the direction of the negative gradient


Steepest Ascent

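[Figure: the gradient $\nabla f(x_0)$ drawn at the point $x_0$, shown together with the level sets $f = c_0$ and $f = c_0 + \delta$.]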


Intro to the Gradient Descent Method

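Taylor’s expansion at $x^{(0)}$ with $\nabla f(x^{(0)}) \neq 0$,

$$
\begin{aligned}
f\left(x^{(0)} - \alpha \nabla f(x^{(0)})\right) &= f(x^{(0)}) - \alpha \nabla f(x^{(0)})^{\top} \nabla f(x^{(0)}) + o(\alpha) \\
&= f(x^{(0)}) - \alpha \left\| \nabla f(x^{(0)}) \right\|^{2} + o(\alpha)
\end{aligned}
$$

where $\alpha > 0$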


Decreasing the Value of f

For small $\alpha$,

$$
f\left(x^{(0)} - \alpha \nabla f(x^{(0)})\right) \approx f(x^{(0)}) - \alpha \left\| \nabla f(x^{(0)}) \right\|^{2}
$$

Therefore, for small $\alpha > 0$ and $\nabla f(x^{(0)}) \neq 0$,

$$
f\left(x^{(0)} - \alpha \nabla f(x^{(0)})\right) < f(x^{(0)})
$$

Thus the point

$$
x_{\text{new}} = x^{(0)} - \alpha \nabla f(x^{(0)})
$$

is an improvement over the point $x_{\text{old}} = x^{(0)}$ if we are searching for a minimizer of f
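The descent property can be checked numerically. The sketch below is only an illustration: the test function $f(x) = x_1^2 + 3x_2^2$, the starting point, and the helper name `decrease` are assumptions, not part of the slides. It compares the actual decrease $f(x^{(0)}) - f(x_{\text{new}})$ with the first-order prediction $\alpha \|\nabla f(x^{(0)})\|^2$ for shrinking $\alpha$.

```python
def f(x):
    # Hypothetical test function: f(x) = x1^2 + 3*x2^2
    return x[0] ** 2 + 3.0 * x[1] ** 2


def grad_f(x):
    # Gradient of the test function above
    return [2.0 * x[0], 6.0 * x[1]]


x0 = [1.0, 1.0]  # assumed starting point
g0 = grad_f(x0)
g_norm_sq = sum(gi * gi for gi in g0)  # squared norm of the gradient at x0


def decrease(alpha):
    """Return (actual decrease, first-order predicted decrease) for step alpha."""
    x_new = [xi - alpha * gi for xi, gi in zip(x0, g0)]
    return f(x0) - f(x_new), alpha * g_norm_sq


for alpha in (1e-1, 1e-2, 1e-3):
    actual, predicted = decrease(alpha)
    # The ratio actual / predicted approaches 1 as alpha shrinks
    print(alpha, actual, predicted, actual / predicted)
```

For each of these step sizes the actual decrease is positive, and the ratio of actual to predicted decrease moves toward 1 as $\alpha$ gets smaller, which is what the $o(\alpha)$ remainder in the Taylor expansion implies.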


The Method of Steepest Descent

Starting from $x^{(0)}$, conduct a 1D (one-dimensional) search in the direction

$$
-\nabla f(x^{(0)}) = -g(x^{(0)})
$$

Conduct the search until a minimizer is found. At this point, $x^{(1)}$, re-assess the gradient

$$
g^{(1)} = \nabla f(x^{(1)})
$$

If $g^{(1)} = \nabla f(x^{(1)}) \neq 0$, conduct a line search (1D search) along $-g^{(1)}$ for a minimizer, etc.


1D Line Search
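The slides do not name a particular 1D search method. One common choice is golden-section search, sketched below purely as an illustration. The test function, the starting point, the bracket $[0, 1]$, and the names `golden_section_search` and `phi` are all assumptions. The function `phi` is $f$ restricted to the ray $x^{(0)} - \alpha g^{(0)}$.

```python
import math


def golden_section_search(phi, a, b, tol=1e-8):
    """Approximate a minimizer of a unimodal function phi on [a, b].

    Simple variant: both interior points are re-evaluated in every
    iteration, which is correct but not evaluation-optimal.
    """
    ratio = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    while (b - a) > tol:
        if phi(c) < phi(d):
            b = d  # a minimizer lies in [a, d]
        else:
            a = c  # a minimizer lies in [c, b]
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
    return 0.5 * (a + b)


def f(x):
    # Hypothetical test function: f(x) = x1^2 + 3*x2^2
    return x[0] ** 2 + 3.0 * x[1] ** 2


x0 = [1.0, 1.0]  # assumed starting point
g0 = [2.0 * x0[0], 6.0 * x0[1]]  # gradient of f at x0


def phi(alpha):
    # f along the negative gradient direction
    return f([xi - alpha * gi for xi, gi in zip(x0, g0)])


alpha_star = golden_section_search(phi, 0.0, 1.0)
print(alpha_star)
```

For this particular quadratic, setting the derivative of `phi` to zero gives the step size $5/28$ by hand, so the numerical result can be compared against that value.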


A Steepest Descent Path

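[Figure: a steepest descent path through the points $x^{(0)}, x^{(1)}, x^{(2)}, x^{(3)}$ toward the minimizer $x^*$, drawn over the level sets $f = c_0$, $f = c_1$, $f = c_2$, $f = c_3$, where $c_0 > c_1 > c_2 > c_3$.]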


Steepest Descent Path in a Valley

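[Figure: a steepest descent path in a valley, showing the points $x^{(0)}$ and $x^{(1)}$, the minimizer $x^*$, and the level sets $f = c_0$ and $f = c_1$, where $c_0 > c_1$.]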


The Iterative Steepest Descent (SD) Algorithm

$$
x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)}
$$

where $\alpha_k$ is a positive scalar minimizing $f\left(x^{(k)} - \alpha g^{(k)}\right)$

An alternative way to express $\alpha_k$,

$$
\alpha_k = \arg\min_{\alpha \geq 0} f\left(x^{(k)} - \alpha g^{(k)}\right)
$$
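The SD iteration can be sketched in a few lines for the special case of a quadratic $f(x) = \frac{1}{2} x^{\top} Q x - x^{\top} b$ with $Q = Q^{\top} > 0$. For such a function the line search has the closed-form solution $\alpha_k = g^{(k)\top} g^{(k)} / (g^{(k)\top} Q g^{(k)})$. The matrix `Q`, the vector `b`, the starting point, and the function names below are assumptions chosen for illustration only.

```python
def mat_vec(Q, x):
    # Matrix-vector product for a matrix stored as a list of rows
    return [sum(qij * xj for qij, xj in zip(row, x)) for row in Q]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def steepest_descent_quadratic(Q, b, x0, tol=1e-10, max_iter=1000):
    """Steepest descent with exact line search for f(x) = 0.5 x'Qx - x'b."""
    x = list(x0)
    for k in range(max_iter):
        g = [qx - bi for qx, bi in zip(mat_vec(Q, x), b)]  # g = Qx - b
        if dot(g, g) ** 0.5 < tol:
            return x, k  # gradient is numerically zero: stop
        alpha = dot(g, g) / dot(g, mat_vec(Q, g))  # exact minimizer along -g
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x, max_iter


Q = [[4.0, 1.0], [1.0, 3.0]]  # assumed symmetric positive definite matrix
b = [1.0, 2.0]
x_sd, n_iter = steepest_descent_quadratic(Q, b, [10.0, -7.0])
print(x_sd, n_iter)
```

Solving $Qx = b$ by hand for this example gives $x^* = (1/11, 7/11)$, which the iterates approach. For a non-quadratic $f$ the closed-form step is not available and a numerical 1D search would be used instead.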


Newton’s Method for a Function of n Variables

Assumption: $f \in C^2$, which implies that the Hessian is symmetric, $F = F^{\top}$

Obtain a quadratic approximation of f using the second-order Taylor’s expansion,

$$
q(x) = f(x^{(k)}) + {g^{(k)}}^{\top}\left(x - x^{(k)}\right) + \frac{1}{2}\left(x - x^{(k)}\right)^{\top} F(x^{(k)}) \left(x - x^{(k)}\right)
$$

Apply the FONC (first-order necessary condition) to q(x),

$$
\nabla q(x) = 0
$$


Idea Behind Newton’s Method

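[Figure: graphs of $f$ and its quadratic approximation $q$ over the $(x_1, x_2)$ plane, showing the current point $x^{(k)}$, the predicted minimizer $x^{(k+1)}$, and $x^*$.]

Newton’s algorithm minimizes the quadratic approximation rather than the function itself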


FONC Applied to q(x)

$$
\nabla q(x) = g^{(k)} + F(x^{(k)})\left(x - x^{(k)}\right) = 0
$$

Rearrange,

$$
F(x^{(k)})\left(x - x^{(k)}\right) + g^{(k)} = 0
$$

Assume that $F(x^{(k)})^{-1}$ exists


Newton’s Iterative Algorithm

Solve for $x$ and label it $x^{(k+1)}$,

$$
F(x^{(k)})\left(x - x^{(k)}\right) + g^{(k)} = 0
$$

$$
x^{(k+1)} = x^{(k)} - F(x^{(k)})^{-1} g^{(k)}
$$

Newton’s method minimizes a quadratic in one step!
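The Newton iteration can be sketched for a function of two variables as follows. The test function $f(x) = x_1^4 + x_1^2 + x_1 x_2 + x_2^2$, its starting point, and all function names are assumptions made for illustration. Its Hessian is positive definite everywhere, so the inverse assumed above exists. The sketch solves the linear system $F(x^{(k)}) d = -g^{(k)}$ for the step $d$ instead of forming the inverse explicitly, which gives the same update.

```python
def grad(x):
    # Gradient of the hypothetical f(x) = x1^4 + x1^2 + x1*x2 + x2^2
    x1, x2 = x
    return [4.0 * x1 ** 3 + 2.0 * x1 + x2, x1 + 2.0 * x2]


def hessian(x):
    # Hessian of the same function; symmetric because f is C^2
    x1, _ = x
    return [[12.0 * x1 ** 2 + 2.0, 1.0], [1.0, 2.0]]


def solve_2x2(A, rhs):
    # Cramer's rule for a 2x2 linear system A d = rhs
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0.0:
        raise ValueError("Hessian is singular; Newton step undefined")
    return [(A[1][1] * rhs[0] - A[0][1] * rhs[1]) / det,
            (A[0][0] * rhs[1] - A[1][0] * rhs[0]) / det]


def newton(x0, tol=1e-12, max_iter=50):
    """Newton iteration x <- x - F(x)^{-1} g(x), stopped on a small gradient."""
    x = list(x0)
    for k in range(max_iter):
        g = grad(x)
        if (g[0] ** 2 + g[1] ** 2) ** 0.5 < tol:
            return x, k
        d = solve_2x2(hessian(x), [-g[0], -g[1]])  # Newton step
        x = [x[0] + d[0], x[1] + d[1]]
    return x, max_iter


x_min, iterations = newton([1.0, 1.0])
print(x_min, iterations)
```

The only stationary point of this test function is the origin, so the iterates should settle there. Because the function is not quadratic, more than one iteration is needed.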


Newton’s Algorithm Minimizes the Quadratic in One Step

The quadratic,

$$
f(x) = \frac{1}{2} x^{\top} Q x - x^{\top} b + c, \quad x \in \mathbb{R}^n, \quad Q = Q^{\top} > 0
$$

FONC, $\nabla f(x) = Qx - b = 0$, $x^* = Q^{-1} b$, $F(x) = Q > 0$

We prove that $x^{(1)} = x^*$

Let $x^{(0)}$ be an initial guess (condition)

$$
\begin{aligned}
x^{(1)} &= x^{(0)} - F(x^{(0)})^{-1} g^{(0)} \\
&= x^{(0)} - Q^{-1}\left(Q x^{(0)} - b\right) \\
&= Q^{-1} b \\
&= x^*
\end{aligned}
$$
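The one-step property can be confirmed numerically on a small example. The values of `Q`, `b`, and the starting point below are assumptions chosen for illustration; any symmetric positive definite `Q` and any starting point should behave the same way up to rounding.

```python
Q = [[4.0, 1.0], [1.0, 3.0]]  # assumed symmetric positive definite matrix
b = [1.0, 2.0]
x0 = [10.0, -7.0]  # arbitrary initial guess

# Gradient at x0: g = Q x0 - b
g0 = [Q[0][0] * x0[0] + Q[0][1] * x0[1] - b[0],
      Q[1][0] * x0[0] + Q[1][1] * x0[1] - b[1]]

# Newton step: solve Q d = -g0 by Cramer's rule (the Hessian of f is Q)
det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
d = [(Q[1][1] * (-g0[0]) - Q[0][1] * (-g0[1])) / det,
     (Q[0][0] * (-g0[1]) - Q[1][0] * (-g0[0])) / det]

x1 = [x0[0] + d[0], x0[1] + d[1]]

# Minimizer computed independently as x* = Q^{-1} b
x_star = [(Q[1][1] * b[0] - Q[0][1] * b[1]) / det,
          (Q[0][0] * b[1] - Q[1][0] * b[0]) / det]

print(x1, x_star)
```

Working the same numbers by hand gives $x^{(1)} = x^* = (1/11, 7/11)$, independent of the initial guess.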

