Basic concepts in Linear Algebra and Optimization
Yinbin Ma
GEOPHYS 211
Outline
Basic Concepts in Linear Algebra
- vector space
- norm
- linear mapping, range, null space
- matrix multiplication

Iterative Methods for Linear Optimization
- normal equation
- steepest descent
- conjugate gradient

Unconstrained Nonlinear Optimization
- optimality condition
- methods based on a local quadratic model
- line search methods
Basic concepts - vector space
A vector space is any set V for which two operations are defined:
1) Vector addition: any vectors x_1 and x_2 in V can be added to form another vector x = x_1 + x_2, and x is also in V.
2) Scalar multiplication: any vector x in V can be multiplied ("scaled") by a real number c ∈ ℝ to produce a second vector cx, which is also in V.

In this class we only discuss the case V ⊂ ℝⁿ, meaning each vector x in the space is an n-dimensional column vector.
Basic concepts - norm
The "model space" and "data space" we mentioned in class are normed vector spaces. A norm is a function ‖·‖ : ℝⁿ → ℝ that maps a vector to a real number. A norm must satisfy the following:
1) ‖x‖ ≥ 0, and ‖x‖ = 0 iff x = 0
2) ‖x + y‖ ≤ ‖x‖ + ‖y‖
3) ‖ax‖ = |a| ‖x‖
where x and y are vectors in the vector space V and a ∈ ℝ.
Basic concepts - norm

We will see the following norms in this course:
1) L2 norm: for a vector x, the L2 norm is defined as
   ‖x‖_2 ≡ √( Σ_{i=1}^n x_i² )
2) L1 norm: for a vector x, the L1 norm is defined as
   ‖x‖_1 ≡ Σ_{i=1}^n |x_i|
3) L∞ norm: for a vector x, the L∞ norm is defined as
   ‖x‖_∞ ≡ max_{i=1,…,n} |x_i|

The norm of a matrix is induced as
   ‖A‖_α = sup_{x≠0} ‖Ax‖_α / ‖x‖_α
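The three vector norms above can be sketched in a few lines of plain Python (no external libraries; the example vector is chosen only for illustration):

```python
import math

def norm2(x):
    # L2 norm: square root of the sum of squared entries
    return math.sqrt(sum(xi * xi for xi in x))

def norm1(x):
    # L1 norm: sum of absolute values
    return sum(abs(xi) for xi in x)

def norm_inf(x):
    # L-infinity norm: largest absolute entry
    return max(abs(xi) for xi in x)

x = [3.0, -4.0, 0.0]
print(norm2(x))     # 5.0
print(norm1(x))     # 7.0
print(norm_inf(x))  # 4.0
```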
Basic concepts - linear mapping, range and null space

We say a map x → Ax is linear if for any x, y ∈ ℝⁿ and any a ∈ ℝ,
   A(x + y) = Ax + Ay
   A(ax) = aAx
It can be proved that each linear mapping from ℝⁿ to ℝᵐ can be expressed as multiplication by an m×n matrix.
The range of a linear operator A ∈ ℝ^{m×n} is the space spanned by the columns of A,
   range(A) = {y : y = Ax, x ∈ ℝⁿ}
The null space of a linear operator A ∈ ℝ^{m×n} is the space
   null(A) = {x : Ax = 0}
It is "obvious" that range(A) is perpendicular to null(Aᵀ). (exercise)
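A small numerical illustration of range(A) ⊥ null(Aᵀ): the matrix A and the vector z below are hand-picked assumptions (z satisfies Aᵀz = 0), and we check that z is orthogonal to Ax for a couple of vectors x.

```python
def matvec(A, x):
    # multiply a matrix (list of rows) by a vector
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[1, 0],
     [0, 1],
     [1, 1]]       # a 3x2 operator, chosen for illustration
z = [1, 1, -1]     # hand-picked so that A^T z = 0, i.e. z is in null(A^T)

At = [[A[i][j] for i in range(3)] for j in range(2)]  # A^T
print(matvec(At, z))   # [0, 0]: z really is in null(A^T)

# hence z is orthogonal to A x for any x, i.e. to all of range(A)
for x in ([1.0, 2.0], [-3.0, 0.5]):
    print(dot(z, matvec(A, x)))   # 0.0 each time
```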
Basic concepts - four ways of matrix multiplication

For the matrix-matrix product B = AC: if A is l×m and C is m×n, then B is l×n.
Matrix multiplication, method 1 (entry by entry):
   b_ij = Σ_{k=1}^m a_ik c_kj
Here b_ij, a_ik, and c_kj are entries of B, A, and C.
Basic concepts - four ways of matrix multiplication

For the matrix-matrix product B = AC: if A is l×m and C is m×n, then B is l×n.
Matrix multiplication, method 2 (column by column):
   B = [b_1 | b_2 | … | b_n]
Here b_i is the i-th column of matrix B. Then,
   B = [Ac_1 | Ac_2 | … | Ac_n],   b_i = Ac_i
Each column of B is in the range of A. Thus the range of B is a subset of the range of A.
Basic concepts - four ways of matrix multiplication

For the matrix-matrix product B = AC: if A is l×m and C is m×n, then B is l×n.
Matrix multiplication, method 3 (row by row):
   B = [b_1ᵀ; b_2ᵀ; …; b_lᵀ]   (rows stacked)
Here b_iᵀ is the i-th row of matrix B. Then,
   B = [a_1ᵀC; a_2ᵀC; …; a_lᵀC],   b_iᵀ = a_iᵀ C
where a_iᵀ is the i-th row of A. This form is not commonly used.
Basic concepts - four ways of matrix multiplication

For the matrix-matrix product B = AC: if A is l×m and C is m×n, then B is l×n.
Matrix multiplication, method 4 (sum of rank-one matrices):
   B = Σ_{i=1}^m a_i c_iᵀ
where a_i is the i-th column of matrix A and c_iᵀ is the i-th row of matrix C. Each term a_i c_iᵀ is a rank-one matrix.
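Method 4 can be checked numerically by accumulating the rank-one terms a_i c_iᵀ; the small matrices below are illustrative assumptions:

```python
def outer(a, c):
    # rank-one matrix a c^T from a column a and a row c
    return [[ai * cj for cj in c] for ai in a]

def add(B1, B2):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(B1, B2)]

A = [[1, 2],
     [3, 4],
     [5, 6]]                 # 3x2
C = [[1, 0, 1],
     [0, 1, 1]]              # 2x3

m = len(C)                   # inner dimension
cols_A = [[row[i] for row in A] for i in range(m)]   # columns of A
B = [[0.0] * len(C[0]) for _ in range(len(A))]
for i in range(m):
    # accumulate the rank-one term a_i c_i^T
    B = add(B, outer(cols_A[i], C[i]))
print(B)   # [[1.0, 2.0, 3.0], [3.0, 4.0, 7.0], [5.0, 6.0, 11.0]]
```

The result matches the entry-by-entry product AC, as it must.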
Outline
Basic Concepts in Linear Algebra
- vector space
- norm
- linear mapping, range, null space
- matrix multiplication

Iterative Methods for Linear Optimization
- normal equation
- steepest descent
- conjugate gradient

Unconstrained Nonlinear Optimization
- optimality condition
- search direction
- line search
Linear Optimization - normal equation

We solve a linear system with n unknowns and m > n equations. We want to find a vector m ∈ ℝⁿ that satisfies
   Fm = d
where d ∈ ℝᵐ and F ∈ ℝ^{m×n}.
Reformulate the problem: define the residual r = d − Fm, and find the m that minimizes
   ‖r‖_2 = ‖Fm − d‖_2
It can be proved that the residual norm is minimized when F*r = 0. This is equivalent to an n×n system,
   F*Fm = F*d
which is the normal equation. We can solve the normal equation using direct methods such as LU, QR, SVD, or Cholesky decomposition.
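A minimal sketch of solving the normal equation for two unknowns, using Cramer's rule on the 2×2 system F*Fm = F*d (the F and d below are made-up illustrative data):

```python
def solve_normal_equations(F, d):
    # form F^T F (2x2 here) and F^T d, then solve by Cramer's rule
    m, n = len(F), len(F[0])
    assert n == 2, "this sketch only handles two unknowns"
    FtF = [[sum(F[k][i] * F[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Ftd = [sum(F[k][i] * d[k] for k in range(m)) for i in range(n)]
    det = FtF[0][0] * FtF[1][1] - FtF[0][1] * FtF[1][0]
    m0 = (Ftd[0] * FtF[1][1] - FtF[0][1] * Ftd[1]) / det
    m1 = (FtF[0][0] * Ftd[1] - Ftd[0] * FtF[1][0]) / det
    return [m0, m1]

F = [[1, 0],
     [0, 1],
     [1, 1]]              # 3 equations, 2 unknowns
d = [0, 0, 3]
m_star = solve_normal_equations(F, d)
print(m_star)             # [1.0, 1.0]

# check the optimality condition F^T r = 0 at the solution
r = [d[k] - sum(F[k][j] * m_star[j] for j in range(2)) for k in range(3)]
print([sum(F[k][i] * r[k] for k in range(3)) for i in range(2)])  # [0.0, 0.0]
```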
Linear Optimization - steepest descent method

For the unconstrained linear optimization problem:
   min J(m) = ‖Fm − d‖_2²
To find the minimum of the objective function J(m) iteratively with the steepest descent method, at the current point m_k we update the model by moving along the negative direction of the gradient,
   m_{k+1} = m_k − α ∇J(m_k)
   ∇J(m_k) = F*(Fm_k − d)
The gradient can be evaluated exactly, and we have an analytical formula for the optimal α.
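A pure-Python sketch of steepest descent with the exact line-search α (the test problem F, d is a made-up example; the constant factor 2 in the gradient is absorbed into α, as on the slide):

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def steepest_descent(F, d, m0, iters):
    # minimize J(m) = ||F m - d||_2^2 by moving along -grad J
    n = len(F[0])
    Ft = [[F[i][j] for i in range(len(F))] for j in range(n)]  # F^T
    m = list(m0)
    for _ in range(iters):
        r = [fi - di for fi, di in zip(matvec(F, m), d)]   # r = F m - d
        g = matvec(Ft, r)                                  # gradient (up to a factor of 2)
        gg = sum(gi * gi for gi in g)
        if gg == 0.0:
            break                                          # gradient vanished: done
        Fg = matvec(F, g)
        alpha = gg / sum(v * v for v in Fg)                # exact line search for quadratics
        m = [mi - alpha * gi for mi, gi in zip(m, g)]
    return m

F = [[1, 0],
     [0, 1],
     [1, 1]]
d = [0, 0, 3]
m = steepest_descent(F, d, [0.0, 0.0], 50)
print(m)   # close to the least-squares solution [1.0, 1.0]
```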
Linear Optimization - conjugate gradient method

For the unconstrained linear optimization problem:
   min J(m) = ‖Fm − d‖_2²
Starting from m_0, we have a series of search directions δm_i, i = 1, 2, …, k, and update the model iteratively,
   m_i = m_{i−1} + α_{i−1} δm_{i−1},   i = 1, …, k.
For the next search direction δm_k in the space span{δm_0, …, δm_{k−1}, ∇J(m_k)},
   δm_k = Σ_{i=0}^{k−1} c_i δm_i + c_k ∇J(m_k)
The "magic" is that for a linear problem c_0 = c_1 = … = c_{k−2} = 0. We end up with the conjugate gradient method,
   δm_k = c_{k−1} δm_{k−1} + c_k ∇J(m_k)
   α_k = argmin_α J(m_k + α δm_k)
   m_{k+1} = m_k + α_k δm_k
We are searching within the space span{δm_0, …, δm_{k−1}, ∇J(m_k)} in the CG method, though it looks like we are doing a plane search.
Outline
Basic Concepts in Linear Algebra
- vector space
- norm
- linear mapping, range, null space
- matrix multiplication

Iterative Methods for Linear Optimization
- normal equation
- steepest descent
- conjugate gradient

Unconstrained Nonlinear Optimization
- optimality condition
- search direction
- line search
Unconstrained Nonlinear Optimization - optimality condition

For the unconstrained nonlinear optimization problem:
   minimize_m J(m)
where J(m) is a real-valued function. How should we determine whether m* is a local minimizer?
Theorem (first-order necessary condition for a local minimum):
   ∇J(m*) = 0
Theorem (second-order necessary condition for a local minimum):
   sᵀ ∇²J(m*) s ≥ 0,   ∀s ∈ ℝⁿ
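Both conditions can be checked numerically with finite differences on a toy one-dimensional objective (the function and its minimizer m* = 2 below are illustrative assumptions):

```python
def J(m):
    # a toy objective with minimizer m* = 2 (illustration only)
    return (m - 2.0) ** 2 + 1.0

h = 1e-5
m_star = 2.0
grad = (J(m_star + h) - J(m_star - h)) / (2 * h)              # central difference
hess = (J(m_star + h) - 2 * J(m_star) + J(m_star - h)) / h**2  # second difference
print(abs(grad) < 1e-8)   # True: first-order condition, gradient vanishes
print(hess >= 0.0)        # True: second-order condition, curvature nonnegative
```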
Unconstrained Nonlinear Optimization - search direction

For the unconstrained nonlinear optimization problem:
   minimize_m J(m)
Given a model point m_k, we want to find a search direction δm_k and a real number α_k such that J(m_k + α_k δm_k) < J(m_k).
How do we choose the search direction δm_k?
1) Gradient-based methods:
   J(m_k + α_k δm_k) − J(m_k) ≈ α_k ∇J(m_k)ᵀ δm_k + O(‖δm_k‖_2²)
Thus,
   δm_k = −∇J(m_k)
is a search direction. We can also use a technique similar to the CG method,
   δm_k = −c_1 ∇J(m_k) + c_2 δm_{k−1}
where c_1, c_2 ∈ ℝ.
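That δm_k = −∇J(m_k) decreases J for a small enough step can be checked numerically; the toy objective, starting point, and step size below are illustrative assumptions:

```python
def J(m):
    # a toy nonlinear objective (illustration only)
    return m**4 - 3 * m**2 + m

def grad_J(m):
    # analytic gradient of J
    return 4 * m**3 - 6 * m + 1

m_k = 1.0
dm = -grad_J(m_k)          # steepest-descent search direction
alpha = 0.01               # a small step length
print(J(m_k + alpha * dm) < J(m_k))   # True: the small step decreases J
```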
Unconstrained Nonlinear Optimization - search direction

For the unconstrained nonlinear optimization problem:
   minimize_m J(m)
Given a model point m_k, we want to find a search direction δm_k and a real number α_k such that J(m_k + α_k δm_k) < J(m_k).
How do we choose the search direction δm_k?
2) Methods based on a local quadratic model:
   J(m_k + α_k δm_k) − J(m_k) ≈ α_k ∇J(m_k)ᵀ δm_k + α_k² (1/2) δm_kᵀ ∇²J(m_k) δm_k
We solve the approximated problem,
   minimize ψ(p_k) ≡ ∇J(m_k)ᵀ p_k + (1/2) p_kᵀ ∇²J(m_k) p_k,   p_k = α_k δm_k
The approximated problem reduces to a linear system and can be solved exactly. Then, update the model,
   m_{k+1} = m_k + p_k
Unconstrained Nonlinear Optimization - line search

For the unconstrained nonlinear optimization problem:
   minimize_m J(m)
Given a model point m_k, we want to find a search direction δm_k and a real number α_k such that J(m_k + α_k δm_k) < J(m_k).
How do we choose α_k for a given search direction δm_k? Can we choose an arbitrary α_k such that J(m_k + α_k δm_k) < J(m_k)?
The answer is no. For example, take J(m) = m², m ∈ ℝ¹. We can find a sequence such that
   m_0 = 2,   δm_k = −m_k,   α_k = (2 + 3·2^{−(k+1)}) / (1 + 2^{−k})
Then,
   m_k = (−1)^k (1 + 2^{−k}),   J(m_k) = (1 + 2^{−k})² → 1
so J decreases at every step yet never reaches the minimum value 0.
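The counterexample can be verified directly by running the slide's sequence (pure Python; the iteration count is kept below 40 so that 2^{−k} remains resolvable in double precision):

```python
def J(m):
    return m * m

m = 2.0                      # m_0 = 2
prev = J(m)
for k in range(40):
    dm = -m                                       # search direction dm_k = -m_k
    alpha = (2 + 3 * 2.0**-(k + 1)) / (1 + 2.0**-k)
    m = m + alpha * dm                            # m_{k+1} = m_k + alpha_k dm_k
    assert J(m) < prev                            # J decreases at every step...
    prev = J(m)
print(m)       # ...yet m alternates in sign with |m| -> 1
print(J(m))    # and J(m_k) -> 1, never reaching the minimum value 0
```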
Unconstrained Nonlinear Optimization - line search

For the unconstrained nonlinear optimization problem:
   minimize_m J(m)
Given a model point m_k, we want to find a search direction δm_k and a real number α_k such that J(m_k + α_k δm_k) < J(m_k).
How do we choose α_k for a given search direction δm_k? A popular set of conditions that guarantee convergence are the Wolfe conditions:
   J(m_k + α_k δm_k) ≤ J(m_k) + c_1 α_k ∇J(m_k)ᵀ δm_k
   ∇J(m_k + α_k δm_k)ᵀ δm_k ≥ c_2 ∇J(m_k)ᵀ δm_k
where 0 < c_1 < c_2 < 1.
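A sketch of checking both Wolfe conditions on the toy objective J(m) = m²; the values of c_1 and c_2 below are common illustrative defaults, not prescribed by the slide:

```python
def J(m):
    return m * m

def grad_J(m):
    return 2 * m

def wolfe_ok(m, dm, alpha, c1=1e-4, c2=0.9):
    # check both Wolfe conditions for a step alpha along direction dm
    slope = grad_J(m) * dm                   # directional derivative (negative for descent)
    sufficient = J(m + alpha * dm) <= J(m) + c1 * alpha * slope   # sufficient decrease
    curvature = grad_J(m + alpha * dm) * dm >= c2 * slope          # curvature condition
    return sufficient and curvature

m, dm = 2.0, -grad_J(2.0)      # steepest-descent direction at m = 2
print(wolfe_ok(m, dm, 0.5))    # True: a well-scaled step satisfies both conditions
print(wolfe_ok(m, dm, 0.01))   # False: the step is too short and fails the curvature condition
```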
Reference

Numerical Linear Algebra, by Lloyd N. Trefethen and David Bau III.
Numerical Optimization, by Jorge Nocedal and Stephen J. Wright.
Lecture notes from Prof. Walter Murray, http://web.stanford.edu/class/cme304/