ECE 680
Modern Automatic Control
Gradient and Newton’s Methods: A Review
Stan Zak
October 21, 2019
ECE 680 Modern Automatic Control – p. 1/14
Review of the Gradient Properties
The direction of maximum increase of a real-valued differentiable function at a point is orthogonal to the level set of the function at that point.
The gradient points in the direction such that, for a small displacement, f increases more in this direction than in any other direction.
The direction of the gradient is therefore a good direction to move in when maximizing.
When minimizing, we should move in the direction of the negative gradient.
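A quick numerical check of the properties above, as a minimal sketch. The test function f(x) = x1^2 + 2 x2^2, the point x0 = (1, 1), the displacement length, and the number of sampled directions are illustrative assumptions, not taken from the slides.

```python
import math

def f(x):
    # Hypothetical test function (not from the slides): f(x) = x1^2 + 2*x2^2
    return x[0] ** 2 + 2.0 * x[1] ** 2

def grad_f(x):
    # Analytic gradient of the test function above
    return [2.0 * x[0], 4.0 * x[1]]

x0 = [1.0, 1.0]
g = grad_f(x0)
g_norm = math.hypot(g[0], g[1])
g_angle = math.atan2(g[1], g[0])      # angle of the gradient direction

step = 1e-4                           # small displacement length (assumed)
n_dirs = 3600                         # number of sampled unit directions (assumed)
best_angle, best_increase = None, -math.inf
for i in range(n_dirs):
    theta = 2.0 * math.pi * i / n_dirs
    d = [math.cos(theta), math.sin(theta)]
    increase = f([x0[0] + step * d[0], x0[1] + step * d[1]]) - f(x0)
    if increase > best_increase:
        best_angle, best_increase = theta, increase

# A unit tangent to the level set at x0 is orthogonal to the gradient
tangent = [-g[1] / g_norm, g[0] / g_norm]
tangent_change = f([x0[0] + step * tangent[0], x0[1] + step * tangent[1]]) - f(x0)

print("gradient angle             :", g_angle)
print("best sampled direction     :", best_angle)
print("increase along best        :", best_increase)
print("change along level tangent :", tangent_change)
```

The best sampled direction should agree with the gradient direction up to the sampling resolution, while a step along the level-set tangent changes f only to second order.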
Steepest Ascent
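[Figure: the gradient $\nabla f(x_0)$ drawn at the point $x_0$ on the level set $f = c_0$, together with a neighboring level set $f = c_0 + \delta$.]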
Intro to the Gradient Descent Method
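Taylor’s expansion at $x^{(0)}$, with $\nabla f(x^{(0)}) \neq 0$:
$$
\begin{aligned}
f\left(x^{(0)} - \alpha \nabla f(x^{(0)})\right) &= f(x^{(0)}) - \alpha \nabla f(x^{(0)})^\top \nabla f(x^{(0)}) + o(\alpha) \\
&= f(x^{(0)}) - \alpha \|\nabla f(x^{(0)})\|^2 + o(\alpha)
\end{aligned}
$$
where $\alpha > 0$.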
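The meaning of the o(α) remainder can be checked numerically. The sketch below assumes a hypothetical smooth test function and starting point (neither is from the slides) and prints remainder/α, which should shrink toward zero as α decreases.

```python
def f(x):
    # Hypothetical smooth test function (not from the slides)
    return x[0] ** 4 + x[0] * x[1] + (1.0 + x[1]) ** 2

def grad_f(x):
    # Analytic gradient of the test function above
    return [4.0 * x[0] ** 3 + x[1], x[0] + 2.0 * (1.0 + x[1])]

x0 = [1.0, 1.0]
g = grad_f(x0)
g_sq = g[0] ** 2 + g[1] ** 2          # squared norm of the gradient at x0

ratios = []
for alpha in [1e-1, 1e-2, 1e-3, 1e-4]:
    x_new = [x0[0] - alpha * g[0], x0[1] - alpha * g[1]]
    first_order = f(x0) - alpha * g_sq        # Taylor expansion without o(alpha)
    remainder = f(x_new) - first_order        # this is the o(alpha) term
    ratios.append(remainder / alpha)
    print("alpha =", alpha, " remainder/alpha =", remainder / alpha)
```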
Decreasing the Value of f
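For small $\alpha$, the $o(\alpha)$ term is negligible, so
$$
f\left(x^{(0)} - \alpha \nabla f(x^{(0)})\right) \approx f(x^{(0)}) - \alpha \|\nabla f(x^{(0)})\|^2
$$
Therefore, for small $\alpha > 0$ and $\nabla f(x^{(0)}) \neq 0$,
$$
f\left(x^{(0)} - \alpha \nabla f(x^{(0)})\right) < f(x^{(0)})
$$
Thus the point
$$
x_{\text{new}} = x^{(0)} - \alpha \nabla f(x^{(0)})
$$
is an improvement over the point $x_{\text{old}} = x^{(0)}$ if we are searching for a minimizer of $f$.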
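The qualifier "for small α" matters. As a minimal sketch, assuming a hypothetical quadratic test function and two hand-picked step sizes (none of them from the slides), a small step decreases f while a step that is too large increases it.

```python
def f(x):
    # Hypothetical test function (not from the slides): f(x) = x1^2 + 10*x2^2
    return x[0] ** 2 + 10.0 * x[1] ** 2

def grad_f(x):
    # Analytic gradient of the test function above
    return [2.0 * x[0], 20.0 * x[1]]

def gradient_step(x, alpha):
    # x_new = x - alpha * grad f(x)
    g = grad_f(x)
    return [x[0] - alpha * g[0], x[1] - alpha * g[1]]

x_old = [1.0, 1.0]
x_small = gradient_step(x_old, 0.01)   # small step: f decreases
x_large = gradient_step(x_old, 0.2)    # step too large: f increases

print("f(x_old)          =", f(x_old))
print("f after alpha=0.01 =", f(x_small))
print("f after alpha=0.2  =", f(x_large))
```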
The Method of Steepest Descent
Starting from $x^{(0)}$, conduct a 1D (one-dimensional) search in the direction
$$
-\nabla f(x^{(0)}) = -g(x^{(0)})
$$
Continue the search until a minimizer along this direction is found. At this point, $x^{(1)}$, re-evaluate the gradient,
$$
g^{(1)} = \nabla f(x^{(1)})
$$
If $g^{(1)} = \nabla f(x^{(1)}) \neq 0$, conduct a line search (1D search) along $-g^{(1)}$ for a minimizer, and so on.
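One way to carry out the 1D search is golden-section search. The slides do not say which 1D method is intended, so this choice, the bracket [0, 1], the tolerance, and the test function are all assumptions made for illustration.

```python
import math

def golden_section(phi, a, b, tol=1e-8):
    """Minimize a unimodal scalar function phi on the bracket [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0    # about 0.618
    while (b - a) > tol:
        c = b - inv_phi * (b - a)             # left interior point
        d = a + inv_phi * (b - a)             # right interior point
        if phi(c) < phi(d):
            b = d                             # minimizer lies in [a, d]
        else:
            a = c                             # minimizer lies in [c, b]
    return 0.5 * (a + b)

def f(x):
    # Hypothetical test function (not from the slides): f(x) = x1^2 + 10*x2^2
    return x[0] ** 2 + 10.0 * x[1] ** 2

def grad_f(x):
    return [2.0 * x[0], 20.0 * x[1]]

x0 = [1.0, 1.0]
g0 = grad_f(x0)

def phi(alpha):
    # f restricted to the ray x0 - alpha * g0
    return f([x0[0] - alpha * g0[0], x0[1] - alpha * g0[1]])

alpha0 = golden_section(phi, 0.0, 1.0)        # bracket [0, 1] is an assumption
x1 = [x0[0] - alpha0 * g0[0], x0[1] - alpha0 * g0[1]]
g1 = grad_f(x1)

print("alpha_0 =", alpha0)
print("x(1)    =", x1)
print("g(0) . g(1) =", g0[0] * g1[0] + g0[1] * g1[1])
```

With an exact line search the new gradient is orthogonal to the previous search direction, so the printed inner product should be close to zero.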
1D Line Search
A Steepest Descent Path
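[Figure: a steepest descent path through $x^{(0)}$, $x^{(1)}$, $x^{(2)}$, $x^{(3)}$ toward the minimizer $x^*$, drawn over the level sets $f = c_0$, $f = c_1$, $f = c_2$, $f = c_3$ with $c_0 > c_1 > c_2 > c_3$.]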
Steepest Descent Path in a Valley
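[Figure: a steepest descent path in a valley, from $x^{(0)}$ to $x^{(1)}$ toward $x^*$, drawn over the level sets $f = c_0$ and $f = c_1$ with $c_0 > c_1$.]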
The Iterative Steepest Descent (SD) Algorithm
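$$
x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)}
$$
where $\alpha_k$ is a positive scalar minimizing $f\left(x^{(k)} - \alpha g^{(k)}\right)$ over $\alpha$.
An alternative way to express $\alpha_k$:
$$
\alpha_k = \arg\min_{\alpha \geq 0} f\left(x^{(k)} - \alpha g^{(k)}\right)
$$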
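The iteration can be sketched for a quadratic f(x) = (1/2) x'Qx − x'b with Q symmetric positive definite. In that case the arg min has the closed form α_k = g'g / (g'Qg), a standard result that is not stated on these slides. The matrix Q, the vector b, the tolerance, and the helper names are assumptions for illustration, and plain Python lists are used to keep the sketch dependency-free.

```python
def mat_vec(Q, v):
    # Matrix-vector product for a matrix stored as a list of rows
    return [sum(q * vi for q, vi in zip(row, v)) for row in Q]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def steepest_descent_quadratic(Q, b, x, tol=1e-10, max_iter=1000):
    """Steepest descent for f(x) = 0.5 x'Qx - x'b, Q symmetric positive definite."""
    for k in range(max_iter):
        g = [qi - bi for qi, bi in zip(mat_vec(Q, x), b)]   # g(k) = Q x(k) - b
        if dot(g, g) ** 0.5 < tol:
            return x, k                                     # gradient is (nearly) zero
        alpha = dot(g, g) / dot(g, mat_vec(Q, g))           # exact arg min over alpha
        x = [xi - alpha * gi for xi, gi in zip(x, g)]       # x(k+1) = x(k) - alpha_k g(k)
    return x, max_iter

Q = [[2.0, 0.0], [0.0, 20.0]]   # hypothetical, ill-conditioned on purpose
b = [2.0, 20.0]                 # minimizer is Q^{-1} b = [1, 1]
x_min, iters = steepest_descent_quadratic(Q, b, [0.0, 0.0])

print("x_min      =", x_min)
print("iterations =", iters)
```

Because the level sets of this quadratic are elongated, the method needs many iterations even though each line search is exact.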
Newton’s Method for a Function of n Variables
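Assumption: $f \in C^2$, which implies that the Hessian $F$ is symmetric, $F = F^\top$.
Obtain a quadratic approximation of $f$ using the second-order Taylor expansion,
$$
q(x) = f(x^{(k)}) + g^{(k)\top}\left(x - x^{(k)}\right) + \frac{1}{2}\left(x - x^{(k)}\right)^\top F(x^{(k)})\left(x - x^{(k)}\right)
$$
Apply the first-order necessary condition (FONC) to $q(x)$,
$$
\nabla q(x) = 0
$$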
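The quality of the quadratic approximation can be checked numerically. The sketch below assumes a hypothetical test function with hand-coded gradient and Hessian (not from the slides) and compares f with q at two distances from x(k); the error should shrink roughly like the cube of the distance.

```python
def f(x):
    # Hypothetical smooth test function (not from the slides)
    return x[0] ** 4 + x[0] * x[1] + (1.0 + x[1]) ** 2

def grad_f(x):
    return [4.0 * x[0] ** 3 + x[1], x[0] + 2.0 * (1.0 + x[1])]

def hess_f(x):
    # Symmetric Hessian F(x) of the test function
    return [[12.0 * x[0] ** 2, 1.0], [1.0, 2.0]]

def q(x, xk):
    """Second-order Taylor approximation of f about xk, evaluated at x."""
    d = [x[0] - xk[0], x[1] - xk[1]]
    g = grad_f(xk)
    F = hess_f(xk)
    Fd = [F[0][0] * d[0] + F[0][1] * d[1], F[1][0] * d[0] + F[1][1] * d[1]]
    return f(xk) + g[0] * d[0] + g[1] * d[1] + 0.5 * (d[0] * Fd[0] + d[1] * Fd[1])

xk = [1.0, 1.0]
errors = {}
for h in [0.1, 0.01]:
    x = [xk[0] + h, xk[1] + h]
    errors[h] = abs(f(x) - q(x, xk))
    print("h =", h, " |f - q| =", errors[h])
```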
Idea Behind Newton’s Method
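[Figure: $f$ and its quadratic approximation $q$ plotted over $x_1$ and $x_2$, marking the current point $x^{(k)}$, the predicted minimizer $x^{(k+1)}$, and $x^*$.]
Newton’s algorithm minimizes the quadratic approximation rather than the function itself.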
FONC Applied to q(x)
$$
\nabla q(x) = g^{(k)} + F(x^{(k)})\left(x - x^{(k)}\right) = 0
$$
Rearrange,
$$
F(x^{(k)})\left(x - x^{(k)}\right) + g^{(k)} = 0
$$
Assume that $F(x^{(k)})^{-1}$ exists.
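The expression for $\nabla q(x)$ used in the FONC follows by differentiating $q$ term by term. The step is a standard identity; the shorthand $d = x - x^{(k)}$ and $F = F(x^{(k)})$ is introduced here for brevity and is not from the slides.

```latex
\nabla_x \left( g^{(k)\top} d \right) = g^{(k)},
\qquad
\nabla_x \left( \tfrac{1}{2}\, d^\top F d \right)
  = \tfrac{1}{2} \left( F + F^\top \right) d
  = F d
```

The last equality uses the symmetry $F = F^\top$. Adding the two terms gives $\nabla q(x) = g^{(k)} + F(x^{(k)})(x - x^{(k)})$.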
Newton’s Iterative Algorithm
Solve for $x$ and label it $x^{(k+1)}$,
$$
F(x^{(k)})\left(x - x^{(k)}\right) + g^{(k)} = 0
$$
$$
x^{(k+1)} = x^{(k)} - F(x^{(k)})^{-1} g^{(k)}
$$
Newton’s method minimizes a positive definite quadratic in one step!
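A minimal sketch of the iteration on a non-quadratic function of two variables. The test function, starting point, tolerance, and helper names are assumptions, not from the slides. The Newton step is obtained by solving F d = g directly (Cramer's rule for the 2x2 case) instead of forming the inverse.

```python
def grad_f(x):
    # Gradient of the hypothetical test function f(x) = x1^4 + x1*x2 + (1 + x2)^2
    return [4.0 * x[0] ** 3 + x[1], x[0] + 2.0 * (1.0 + x[1])]

def hess_f(x):
    # Hessian F(x) of the same test function
    return [[12.0 * x[0] ** 2, 1.0], [1.0, 2.0]]

def solve_2x2(A, r):
    """Solve A d = r for a 2x2 matrix A by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0.0:
        raise ValueError("Hessian is singular; the Newton step is undefined")
    return [(r[0] * A[1][1] - A[0][1] * r[1]) / det,
            (A[0][0] * r[1] - A[1][0] * r[0]) / det]

def newton(x, tol=1e-10, max_iter=50):
    """Iterate x(k+1) = x(k) - F(x(k))^{-1} g(k) until the gradient is small."""
    for k in range(max_iter):
        g = grad_f(x)
        if (g[0] ** 2 + g[1] ** 2) ** 0.5 < tol:
            return x, k
        d = solve_2x2(hess_f(x), g)            # d = F^{-1} g
        x = [x[0] - d[0], x[1] - d[1]]
    return x, max_iter

x_min, iters = newton([1.0, 1.0])
print("stationary point =", x_min)
print("iterations       =", iters)
```

The method only finds a point where the gradient vanishes; whether that point is a minimizer depends on the Hessian there, and convergence depends on the starting point.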
Newton’s Algorithm Minimizes the Quadratic in One Step
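The quadratic,
$$
f(x) = \frac{1}{2} x^\top Q x - x^\top b + c, \quad x \in \mathbb{R}^n, \quad Q = Q^\top > 0
$$
FONC: $\nabla f(x) = Qx - b = 0$, so $x^* = Q^{-1} b$, and $F(x) = Q > 0$.
We prove that $x^{(1)} = x^*$.
Let $x^{(0)}$ be an arbitrary initial guess (initial condition). Then
$$
\begin{aligned}
x^{(1)} &= x^{(0)} - F(x^{(0)})^{-1} g^{(0)} \\
&= x^{(0)} - Q^{-1}\left(Qx^{(0)} - b\right) \\
&= Q^{-1} b \\
&= x^*
\end{aligned}
$$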
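The one-step property can be confirmed numerically. The sketch below assumes a hypothetical 2x2 symmetric positive definite Q, a vector b, and a few arbitrary starting points (none from the slides), and takes a single Newton step from each.

```python
def solve_2x2(A, r):
    """Solve A d = r for a 2x2 matrix A by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(r[0] * A[1][1] - A[0][1] * r[1]) / det,
            (A[0][0] * r[1] - A[1][0] * r[0]) / det]

def newton_step_quadratic(Q, b, x0):
    # g(0) = Q x(0) - b and F(x(0)) = Q, so x(1) = x(0) - Q^{-1} (Q x(0) - b)
    g0 = [Q[0][0] * x0[0] + Q[0][1] * x0[1] - b[0],
          Q[1][0] * x0[0] + Q[1][1] * x0[1] - b[1]]
    d = solve_2x2(Q, g0)
    return [x0[0] - d[0], x0[1] - d[1]]

Q = [[4.0, 1.0], [1.0, 3.0]]    # hypothetical symmetric positive definite matrix
b = [1.0, 2.0]
x_star = solve_2x2(Q, b)        # x* = Q^{-1} b

starts = [[0.0, 0.0], [10.0, -7.0], [-3.5, 2.25]]
results = [newton_step_quadratic(Q, b, x0) for x0 in starts]

print("x* =", x_star)
for x0, x1 in zip(starts, results):
    print("x(0) =", x0, " -> x(1) =", x1)
```

Every starting point should land on x* up to rounding error, independent of x(0).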