
Mathematical Methods for Business and Economics

Mario Maggi

Dipartimento di Economia Politica e Metodi Quantitativi

Università di Pavia

a.a. 2010/2011

M. Maggi (MIBE) Mathematical Methods for Business and Economics a.a. 2010/2011 1 / 79

Vectors

Real vector

$x \in \mathbb{R}^n$

components $x_i$, $i = 1, \dots, n$

$x = [x_1, x_2, \dots, x_n]$

Superscripts denote different vectors, e.g. $x^1$, $x^2$

$x^1 = [x^1_1, x^1_2, \dots, x^1_n]$

Real numbers (scalars): $\alpha \in \mathbb{R}$


Vectors

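Row/column vectors

$$x = [x_1, x_2, \dots, x_n], \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

Transposition: $x'$

$$x = [x_1, x_2, \dots, x_n], \qquad x' = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad (x')' = [x_1, x_2, \dots, x_n]$$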


Special vectors

Special vectors:

The null vector $[0] = [0 \;\; \cdots \;\; 0]$

The sum vector $\mathbf{1} = [1 \;\; \cdots \;\; 1]$

The basis vectors, each of which has null components except the one in the $i$-th position, which equals 1:

$e^1 = [1 \;\; 0 \;\; 0 \;\; \cdots \;\; 0]$

$e^2 = [0 \;\; 1 \;\; 0 \;\; \cdots \;\; 0]$

$e^n = [0 \;\; 0 \;\; \cdots \;\; 0 \;\; 1]$


Vector comparison

$x = y$ if $x_i = y_i$, $\forall i$ ($x \neq y$ otherwise);

$x > y$ (greater than) if $x_i > y_i$, $\forall i$;

$x \geqq y$ (greater than or equal to) if $x_i \geqq y_i$, $\forall i$;

$x \geq y$ (quasi-greater than) if $x \geqq y$ and $x \neq y$

The opposite relations ($<$, $\leqq$, $\leq$) and the negations ($\ngtr$, $\ngeqq$, $\ngeq$, $\nless$, $\nleqq$, $\nleq$) are introduced in a similar way

Remark We use the same convention for scalars too, so "$\geqq$" stands for greater than or equal to (likewise for "$\leqq$")

Comparison between the vector $x$ and $[0]$:

$x > [0]$: $x$ is positive

$x \geqq [0]$: $x$ is non-negative

$x \geq [0]$: $x$ is semi-positive

(the same for the negative cases).

Vector operations

Sum: Given two vectors $x, y \in \mathbb{R}^n$, both row or both column,

$$z = x + y = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}, \quad \text{or } z_i = x_i + y_i, \; i = 1, \dots, n$$

Product by a scalar: Given a vector $x \in \mathbb{R}^n$ and a scalar $\alpha \in \mathbb{R}$

$$z = \alpha x = [\alpha x_1, \alpha x_2, \dots, \alpha x_n], \quad \text{or } z_i = \alpha x_i, \; i = 1, \dots, n$$

Scalar product: Given two vectors $x \in \mathbb{R}^n$ (row) and $y \in \mathbb{R}^n$ (column)

$$xy = \sum_{i=1}^{n} x_i y_i$$

$xy$ is a scalar: $xy \in \mathbb{R}$

Vector operations

$\{x, y \in X\} \Rightarrow \{x + y \in X\}$: $X$ is closed with respect to the sum

$\{x, y \in X\} \Rightarrow \{x + y = y + x\}$: the sum is commutative

$\{x, y, z \in X\} \Rightarrow \{(x + y) + z = x + (y + z)\}$: the sum is associative

$\{x \in X\} \Rightarrow \{\exists\, [0] \in X : x + [0] = x\}$: the null vector (neutral element) exists

$\{x \in X\} \Rightarrow \{\exists\, (-x) \in X : x + (-x) = [0]\}$: the opposite vector exists


Vector operations

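$\{x \in X, \lambda \in \mathbb{R}\} \Rightarrow \{\lambda x \in X\}$: $X$ is closed with respect to the multiplication by a scalar

$\{x \in X, \; \lambda, \mu \in \mathbb{R}\} \Rightarrow \{(\lambda + \mu) x = \lambda x + \mu x\}$: distributive property

$\{x, y \in X, \lambda \in \mathbb{R}\} \Rightarrow \{\lambda (x + y) = \lambda x + \lambda y\}$: distributive property

$\{x \in X, \; \lambda, \mu \in \mathbb{R}\} \Rightarrow \{\mu (\lambda x) = (\lambda \mu) x\}$: associative property

$\{x \in X\} \Rightarrow \{1x = x\}$: the neutral element exists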


Norm and distance

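A norm is a function $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ which associates a real number to each vector of $\mathbb{R}^n$.

Norm properties: $\forall x, y \in \mathbb{R}^n$, $\forall \alpha \in \mathbb{R}$

1 $\|x\| \geqq 0$, with $\|x\| = 0$ if and only if $x = [0]$

2 $\|\alpha x\| = |\alpha| \, \|x\|$

3 $\|x + y\| \leqq \|x\| + \|y\|$ (triangle inequality).

There exist different kinds of norms.

The $p$-norms are widely used

$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{\frac{1}{p}}, \qquad 1 \leqq p < +\infty,$$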


Norm and distance

in particular

$\|x\|_1 = \sum_{i=1}^{n} |x_i|$

$\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$, the Euclidean norm, corresponding to the length of the segment from $[0]$ to $x$ in the Cartesian space $\mathbb{R}^n$;

$\|x\|_\infty = \max_{i \in \{1, \dots, n\}} \{|x_i|\}$

Remark Given $x \in \mathbb{R}^n$ (column),

$$x'x = \sum_{i=1}^{n} (x_i)^2 = \|x\|_2^2 .$$
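The norms above are easy to check numerically. The following is a minimal sketch in plain Python; the helper names `p_norm` and `inf_norm` are illustrative choices, not part of the course material.

```python
def p_norm(x, p):
    """p-norm of a vector x (list of numbers), for 1 <= p < infinity."""
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


def inf_norm(x):
    """Infinity norm: the largest absolute value among the components."""
    return max(abs(xi) for xi in x)


x = [3.0, -4.0]
norm1 = p_norm(x, 1)                  # |3| + |-4|
norm2 = p_norm(x, 2)                  # sqrt(3^2 + 4^2), the Euclidean norm
norm_inf = inf_norm(x)                # max(|3|, |-4|)
x_dot_x = sum(xi * xi for xi in x)    # x'x, equal to the squared Euclidean norm
```

For this vector the three norms are 7, 5 and 4, and `x_dot_x` equals `norm2 ** 2`, as stated in the remark.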


Norm and distance

Given two vectors $x, y \in \mathbb{R}^n$, the Euclidean norm of their difference is

$$\|x - y\| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

The function $d : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_+$ which associates to each pair $(x, y)$ of $\mathbb{R}^n$ vectors the value $\|x - y\|$ is called the Euclidean distance

The Euclidean distance between $x$ and $y$ corresponds to the length of the segment joining $x$ and $y$ in the Cartesian space $\mathbb{R}^n$


Linear space

Consider a set $X$ on whose elements the operations of sum and product by a scalar are defined as above. The set $X$ is called a linear space if

for any pair $(x, y)$ of elements of $X$, $x + y \in X$

for any $x \in X$ and $\alpha \in \mathbb{R}$, $\alpha x \in X$

If the set $Y$ is a linear space and $Y \subseteq X$, then $Y$ is called a linear subspace of $X$

If the Euclidean norm is defined in the linear space $X$, then $X$ is a Euclidean space

The $n$-dimensional Cartesian space $\mathbb{R}^n$ is a Euclidean space


Definition

Two vectors $x, y \in \mathbb{R}^n$ are orthogonal if their scalar product is null

From the geometric point of view, this means that the two segments from $[0]$ to $x$ and from $[0]$ to $y$ form a right angle in the $n$-dimensional Cartesian space

Definition

The vectors $\{x^i \in \mathbb{R}^n, \; i = 1, \dots, n\}$ are linearly independent if it is not possible to find $n$ scalars $\alpha_i$, $i = 1, \dots, n$, not all null, such that

$$\sum_{i=1}^{n} \alpha_i x^i = [0] .$$


Matrices

A matrix $A$ of order $(m \times n)$ is a set of $mn$ scalars endowed with a complete double order: given the set of indexes $(i, j) \in \{1, \dots, m\} \times \{1, \dots, n\}$, $i$ is the row index and $j$ is the column index

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \ddots & \vdots \\ \vdots & \ddots & \ddots & a_{(m-1)n} \\ a_{m1} & \cdots & a_{m(n-1)} & a_{mn} \end{bmatrix}.$$

The elements of a matrix $A$: $a_{ij}$, with $i = 1, \dots, m$ and $j = 1, \dots, n$

$A = [a_{ij}]$

$A_i$: $i$-th row of $A$

$A^j$: $j$-th column of $A$


Matrices

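Columnwise:

$$A = \left[ A^1 \,|\, A^2 \,|\, \cdots \,|\, A^n \right], \qquad A^j \in \mathbb{R}^m, \; j = 1, \dots, n$$

Rowwise:

$$A = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_m \end{bmatrix}, \qquad A_i \in \mathbb{R}^n, \; i = 1, \dots, m.$$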


Matrices

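Transposition

$$A = \left[ A^1 \,|\, A^2 \,|\, \cdots \,|\, A^n \right] \in \mathbb{R}^{m \times n}, \qquad A' = \begin{bmatrix} (A^1)' \\ (A^2)' \\ \vdots \\ (A^n)' \end{bmatrix} \in \mathbb{R}^{n \times m},$$

or

$$A = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_m \end{bmatrix} \in \mathbb{R}^{m \times n}, \qquad A' = \left[ (A_1)' \,|\, (A_2)' \,|\, \cdots \,|\, (A_m)' \right] \in \mathbb{R}^{n \times m}.$$

A square matrix $A$ is symmetric if

$$A = A', \qquad a_{ij} = a_{ji}, \; \forall i, j$$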


Matrix operations

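Product by a scalar

Given $A \in \mathbb{R}^{m \times n}$ and $\lambda \in \mathbb{R}$, then

$C = \lambda A \in \mathbb{R}^{m \times n}$, and $c_{ij} = \lambda a_{ij}$, $i = 1, \dots, m$, $j = 1, \dots, n$

Matrix product

Given $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{p \times q}$, the product $AB$ is defined if $n = p$; it belongs to $\mathbb{R}^{m \times q}$ and its elements are

$$(AB)_{ij} = A_i B^j .$$

The element in place $(i, j)$ is obtained as the scalar product between the row $A_i$ and the column $B^j$

In general, this product does not commute: $AB \neq BA$.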


Matrix operations

Element by element (Hadamard) product

Given $A, B \in \mathbb{R}^{m \times n}$

$C = A * B$, $C \in \mathbb{R}^{m \times n}$, $c_{ij} = a_{ij} b_{ij}$, $i = 1, \dots, m$, $j = 1, \dots, n$

Kronecker (tensor) product

Given $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{p \times q}$, the Kronecker product $C = A \otimes B$ yields a matrix $C \in \mathbb{R}^{mp \times nq}$ defined by blocks as follows

$$C = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix}.$$

In general, this product does not commute: $A \otimes B \neq B \otimes A$.
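To compare the row-by-column, Hadamard and Kronecker products, here is a small sketch in plain Python with matrices stored as lists of rows. The function names `matmul`, `hadamard` and `kron` are hypothetical helpers written for this illustration.

```python
def matmul(A, B):
    """Row-by-column product; requires cols(A) == rows(B)."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions do not match")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def hadamard(A, B):
    """Element-by-element product of two matrices of the same order."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def kron(A, B):
    """Kronecker product: block (i, j) of the result is A[i][j] * B."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
AB = matmul(A, B)      # [[2, 1], [4, 3]]
BA = matmul(B, A)      # [[3, 4], [1, 2]], different from AB
H = hadamard(A, B)     # [[0, 2], [3, 0]]
K = kron(A, B)         # a 4 x 4 matrix made of the blocks a_ij * B
```

With these two matrices both `AB != BA` and `kron(A, B) != kron(B, A)` hold, which illustrates that neither product commutes in general.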

Special matrices

The elements $a_{ij}$ with $i = j$ form the (principal) diagonal of the matrix $A$

Diagonal matrices: the elements outside the diagonal are null

Identity matrix:

$$I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{bmatrix}$$

Remark: $AI = A$, $IA = A$, $\forall A$ for which the matrix product is defined


Special matrices

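A square matrix is called upper (lower) triangular if the elements below (above) the diagonal are null

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \ddots & \vdots \\ \vdots & \ddots & \ddots & a_{(n-1)n} \\ 0 & \cdots & 0 & a_{nn} \end{bmatrix}, \quad \text{upper triangular}$$

$$A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ a_{n1} & \cdots & a_{n(n-1)} & a_{nn} \end{bmatrix}, \quad \text{lower triangular}$$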


Block partitioned matrices

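For example:

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}, \qquad \begin{bmatrix} \begin{bmatrix} 1 & 3 & 5 \\ -9 & 0 & 4 \\ 0 & 1 & -2 \end{bmatrix} & \begin{bmatrix} 4 & -2 \\ 2 & -1 \\ 0 & 1 \end{bmatrix} \\ \begin{bmatrix} -1 & 1 & -1 \end{bmatrix} & \begin{bmatrix} 0 & -1 \end{bmatrix} \end{bmatrix}$$

$$\begin{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} & \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \end{bmatrix}, \qquad \begin{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} & [0] & [0] \\ [0] & \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} & [0] \\ [0] & [0] & \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \end{bmatrix}$$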


Determinant

The determinant is a function $\det : \mathbb{R}^{n \times n} \to \mathbb{R}$ which associates a real number to each square matrix

Determinant calculation

Minors: given a square submatrix $\tilde{A}$ of $A$, the corresponding minor is $\det(\tilde{A})$

Order $k$ minor: $\tilde{A}$ contains the elements of some $k$ rows and some $k$ columns of $A$, with $k = 1, 2, \dots, n$

Order $k$ principal minor: $\tilde{A}$ contains the elements of some $k$ rows and the corresponding $k$ columns of $A$, with $k = 1, 2, \dots, n$

Leading (or North-West) minor of order $k$: $\tilde{A}$ contains the elements of the first $k$ rows and columns of $A$

Complement of $a_{ij}$: $\tilde{A}$ is obtained by deleting the row $A_i$ and the column $A^j$ from $A$; $\det(\tilde{A})$ is a minor of order $(n - 1)$

Cofactor of $a_{ij}$: the product $(-1)^{i+j} \det(\tilde{A})$, where $\tilde{A}$ is the complement of $a_{ij}$


Laplace rule

Given the square matrix $A \in \mathbb{R}^{n \times n}$, its determinant is obtained in either of two ways:

fix a row index $i$

$$\det(A) = \sum_{j=1}^{n} a_{ij} c_{ij};$$

fix a column index $j$

$$\det(A) = \sum_{i=1}^{n} a_{ij} c_{ij},$$

where $c_{ij}$ is the cofactor of the element $a_{ij}$

The determinant of a matrix of order $n$ is the sum of $n$ determinants of order $n - 1$ → recursion
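The recursion in the Laplace rule can be written down directly. The sketch below expands along the first row; the names `minor_matrix` and `det` are illustrative, and the approach is meant only for small matrices, since the number of operations grows like $n!$.

```python
def minor_matrix(A, i, j):
    """Submatrix obtained by deleting row i and column j (0-based indexes)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by Laplace expansion along the first row."""
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("matrix must be square")
    if n == 1:
        return A[0][0]
    # (-1) ** j is the cofactor sign (-1)^(i+j) for the first row
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j))
               for j in range(n))


A = [[1, 3, 5], [-9, 0, 4], [0, 1, -2]]
d = det(A)                                   # -103
t = det([[2, 1, 7], [0, 3, 5], [0, 0, 4]])   # triangular: 2 * 3 * 4 = 24
```

The second call illustrates that for a triangular matrix the expansion collapses to the product of the diagonal elements.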


Determinant

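Given two square matrices $A$ and $B$ of order $n$

$$\det(AB) = \det(A) \det(B) .$$

If $A$ is triangular, $\det(A) = a_{11} a_{22} \cdots a_{nn}$

Transposition: $\det(A) = \det(A')$

Block matrices:

$$\det\left( \begin{bmatrix} A & B \\ [0] & D \end{bmatrix} \right) = \det\left( \begin{bmatrix} A & [0] \\ B & D \end{bmatrix} \right) = \det(A) \det(D)$$

with $A$ and $D$ square matrices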


Rank

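The rank of the matrix $A \in \mathbb{R}^{m \times n}$ is equal to the number of its linearly independent rows, which coincides with the number of its linearly independent columns

$$\mathrm{rk}(A) \leqq \min\{m, n\}.$$

The rank of a matrix $A \in \mathbb{R}^{m \times n}$ equals the maximum order of its non-null minors

Theorem

Consider the matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times q}$, then

$$\mathrm{rk}(AB) \leqq \min\{\mathrm{rk}(A), \mathrm{rk}(B)\} .$$

If $B = A'$, then

$$\mathrm{rk}(AA') = \mathrm{rk}(A'A) = \mathrm{rk}(A) .$$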


Inverse

A matrix $A \in \mathbb{R}^{n \times n}$ is invertible (non-singular) if a matrix $A^{-1} \in \mathbb{R}^{n \times n}$ exists such that

$$AA^{-1} = I, \qquad A^{-1}A = I .$$

A matrix $A$ is invertible if and only if $\det(A) \neq 0$.

When it exists, the inverse is unique

Remark The inverse of a diagonal, upper triangular, lower triangular matrix is diagonal, upper triangular, lower triangular, respectively


Linear transformations

Definition

A function $f : \mathbb{R}^n \to \mathbb{R}^m$ endowed with the properties

$f(x + y) = f(x) + f(y)$, $\forall x, y \in \mathbb{R}^n$,

$f(\alpha x) = \alpha f(x)$, $\forall \alpha \in \mathbb{R}$, $\forall x \in \mathbb{R}^n$,

is called a linear transformation

A linear transformation $f : \mathbb{R}^n \to \mathbb{R}^m$ can be identified by an $m \times n$ matrix, the coefficient matrix

Given a column vector $x \in \mathbb{R}^n$, the product $Ax$ is an $\mathbb{R}^m$ column vector

$x \mapsto Ax$ satisfies

$$\begin{cases} A(x + y) = Ax + Ay, & \forall x, y \in \mathbb{R}^n, \\ A(\alpha x) = \alpha (Ax), & \forall \alpha \in \mathbb{R}, \forall x \in \mathbb{R}^n, \end{cases}$$

so it is a linear transformation $\mathbb{R}^n \ni x \mapsto Ax \in \mathbb{R}^m$.

Linear transformation

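Consider the linear transformation $y = Ax$, $A \in \mathbb{R}^{m \times n}$

$\mathrm{rk}(A) = n$: each $x \in \mathbb{R}^n$ produces a different $y \in \mathbb{R}^m$

$\mathrm{rk}(A) = m$: every $y \in \mathbb{R}^m$ can be obtained by transforming (at least one) vector $x \in \mathbb{R}^n$

$\mathrm{rk}(A) < n$: different $x \in \mathbb{R}^n$ can produce the same $y \in \mathbb{R}^m$

$\mathrm{rk}(A) < m$: there are some $y \in \mathbb{R}^m$ which cannot be obtained by transforming an $x \in \mathbb{R}^n$

Special case: $\mathrm{rk}(A) = n = m$, one-to-one transformation

$$A \in \mathbb{R}^{n \times n}, \qquad y = Ax, \qquad x = A^{-1}y$$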


Linear transformation

Consider a set of $n$ (column) vectors of $\mathbb{R}^m$, $\{A^1, A^2, \dots, A^n\}$. The set of all their linear combinations is a linear space of dimension $k = \mathrm{rk}(A)$, where $A = \left[ A^1 \,|\, A^2 \,|\, \cdots \,|\, A^n \right] \in \mathbb{R}^{m \times n}$:

$$\{y \in \mathbb{R}^m \mid y = Ax, \; x \in \mathbb{R}^n\}$$

is a linear space; it is called the span of the set $\{A^1, A^2, \dots, A^n\}$ or the linear space generated by it.


Linear systems

Consider the system of $m$ linear equations in the $n$ variables $x_1, x_2, \dots, x_n$,

$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \\ \qquad \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m. \end{cases}$$

Collecting the coefficients $a_{ij}$ into the matrix $A \in \mathbb{R}^{m \times n}$ and the right hand side terms into the vector $b \in \mathbb{R}^m$, the system can be written in the form

$$Ax = b$$

Theorem (Rouché–Capelli)

The linear system $Ax = b$ admits a solution if and only if $\mathrm{rk}(A) = \mathrm{rk}(A \,|\, b)$.


Linear systems

The system $Ax = b$ can be written in the form

$$\left[ A^1 \,|\, A^2 \,|\, \cdots \,|\, A^n \right] x = b$$

$$A^1 x_1 + A^2 x_2 + \cdots + A^n x_n = b$$

that is:

Is it possible to find $n$ real numbers $(x_1, x_2, \dots, x_n)$ such that the linear combination of the columns of $A$ with these coefficients is equal to $b$?

In other words, does the vector $b$ belong to the span of the columns of $A$?

The Rouché–Capelli theorem checks exactly this.
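The rank comparison in the Rouché–Capelli theorem can be carried out by Gaussian elimination. The sketch below uses exact rational arithmetic from the Python standard library to avoid rounding issues; the function names `rank` and `has_solution` are hypothetical helpers, not part of the course material.

```python
from fractions import Fraction


def rank(M):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # look for a row at or below position r with a non-zero entry in column c
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def has_solution(A, b):
    """Rouche-Capelli check: rk(A) == rk(A | b)."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(augmented)


A = [[1, 2], [2, 4]]                    # rk(A) = 1: second row is twice the first
solvable = has_solution(A, [3, 6])      # b is in the span of the columns of A
unsolvable = has_solution(A, [3, 7])    # b is not in the span
```

Here `solvable` is `True` and `unsolvable` is `False`: appending $b = [3, 7]'$ raises the rank of the augmented matrix from 1 to 2.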


Linear systems

Consider the system $Ax = b$, $A \in \mathbb{R}^{m \times n}$, with $\mathrm{rk}(A) = \mathrm{rk}(A \,|\, b)$ (i.e. a solution exists).

$\mathrm{rk}(A) < m$: $m - \mathrm{rk}(A)$ equations are redundant

$\mathrm{rk}(A) < n$: $n - \mathrm{rk}(A)$ variables can be moved to the right hand side

$\mathrm{rk}(A) = m < n$: there are $\infty^{n-m}$ solutions for every $b$

$\mathrm{rk}(A) = m = n$: Cramer system, one solution for every $b$


Eigenvalues and eigenvectors

Consider the square matrix $A \in \mathbb{R}^{n \times n}$.

Do a (complex) number $\lambda$ and a (complex) vector $x \in \mathbb{C}^n$, $x \neq [0]$, exist such that $Ax = \lambda x$?

That is, solve the problem

$$\begin{cases} Ax = \lambda x \\ x \neq [0] \end{cases}$$

In each pair $(\lambda, x)$ which solves this problem:

$\lambda$ is an eigenvalue of $A$

$x$ is an eigenvector of $A$ associated to the eigenvalue $\lambda$

The linear transformation acts on an eigenvector as a simple scalar, transforming it into a vector proportional to it



$$Ax = \lambda x \;\Leftrightarrow\; Ax = \lambda I x \;\Leftrightarrow\; Ax - \lambda I x = [0] \;\Leftrightarrow\; (A - \lambda I) x = [0], \qquad x \neq [0]$$

The last relation is a linear homogeneous system in $x$, therefore it admits a non-null solution if and only if its coefficient matrix $(A - \lambda I)$ is singular, that is, its determinant is null:

$$\det(A - \lambda I) = 0$$

The left hand side is a polynomial of degree $n$ in $\lambda$: the characteristic polynomial of $A$

The roots (real, complex, single, multiple) of the characteristic polynomial are the eigenvalues of $A$

The set of the eigenvalues of $A$ is the spectrum of $A$


The multiplicity of an eigenvalue as a root of the characteristic polynomial is called the algebraic multiplicity of the eigenvalue

If $x$ is an eigenvector of $A$ associated to the eigenvalue $\lambda$, then $(\alpha x)$, $\forall \alpha \neq 0$, is an eigenvector of $A$ associated to $\lambda$ as well:

$$\begin{cases} Ax = \lambda x \\ x \neq [0] \\ \alpha \neq 0 \end{cases} \;\Rightarrow\; \begin{cases} A(\alpha x) = \lambda (\alpha x) \\ (\alpha x) \neq [0] . \end{cases}$$

$A$ is singular if and only if one of its eigenvalues is null

If $A$ is diagonal or triangular, its diagonal elements are its eigenvalues

$\det(A)$ is equal to the product of all the eigenvalues (counted with their algebraic multiplicity)

$$\det(A) = \prod_{i=1}^{n} \lambda_i .$$
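As a small worked illustration of the characteristic polynomial and of the product formula, consider a $2 \times 2$ symmetric matrix chosen here only as an example (it does not come from the course material):

```latex
A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix},
\qquad
\det(A - \lambda I) = (2 - \lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda - 1)(\lambda - 3)

\lambda_1 = 1:\quad (A - I)x = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} x = [0]
\;\Rightarrow\; x^1 = \alpha \begin{bmatrix} 1 \\ -1 \end{bmatrix},\ \alpha \neq 0

\lambda_2 = 3:\quad (A - 3I)x = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} x = [0]
\;\Rightarrow\; x^2 = \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix},\ \alpha \neq 0

\det(A) = 2 \cdot 2 - 1 \cdot 1 = 3 = \lambda_1 \lambda_2
```

Both eigenvalues are non-null, so $A$ is non-singular, and the two eigenvectors are orthogonal, as expected for a symmetric matrix.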



Theorem

Eigenvectors associated to different eigenvalues are linearly independent

Theorem

The eigenvalues (and the eigenvectors) of a real symmetric matrix are real

Theorem

A symmetric matrix always has n linearly independent eigenvectors.

It is possible to choose them to be orthogonal with norm 1

Theorem

If $A$ is non-singular and has eigenvalue-eigenvector pairs $(\lambda_i, x^i)$, $i = 1, \dots, n$, then the pairs $\left( \frac{1}{\lambda_i}, x^i \right)$, $i = 1, \dots, n$, are the eigenvalue-eigenvector pairs of $A^{-1}$


Diagonalization

Theorem If (and only if) $A$ has $n$ linearly independent eigenvectors $x^1, x^2, \dots, x^n$, then the matrix

$$X = \left[ x^1 \;\; x^2 \;\; \cdots \;\; x^n \right]$$

is such that the product $X^{-1}AX$ is a diagonal matrix, with diagonal elements equal to the eigenvalues:

$$D = X^{-1}AX = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$

Theorem If $A$ is symmetric, then the columns of $X$ can be chosen such that $X^{-1} = X'$ (orthogonal matrix)

Spectral decomposition

Theorem

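A symmetric matrix $A$ can be decomposed as follows

$$A = \lambda_1 \left[ x^1 (x^1)' \right] + \lambda_2 \left[ x^2 (x^2)' \right] + \cdots + \lambda_n \left[ x^n (x^n)' \right],$$

where $\lambda_1, \lambda_2, \dots, \lambda_n$ are the eigenvalues of $A$ and $x^1, x^2, \dots, x^n$ are orthonormal (column) eigenvectors respectively associated to them

An equivalent form is

$$A = XDX'.$$

Each matrix $\left[ x^i (x^i)' \right]$ has rank 1

Moreover, if $A = A'$ is non-singular, then

$$A^{-1} = (\lambda_1)^{-1} \left[ x^1 (x^1)' \right] + (\lambda_2)^{-1} \left[ x^2 (x^2)' \right] + \cdots + (\lambda_n)^{-1} \left[ x^n (x^n)' \right].$$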


Quadratic forms

Let the function $q : \mathbb{R}^n \to \mathbb{R}$ be defined as follows

$$q(x) = x'Ax + cx + c_0,$$

where $A \in \mathbb{R}^{n \times n}$, $c \in \mathbb{R}^n$ (row) and $c_0 \in \mathbb{R}$

The function $q$ is a (complete) quadratic form

The function $q$ can be rewritten as follows

$$q(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j + \sum_{j=1}^{n} c_j x_j + c_0.$$

When $c_0$ and $c$ are null, the function

$$q(x) = x'Ax = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j$$

is called a homogeneous quadratic form or simply a quadratic form

Quadratic forms

Remark A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is homogeneous of degree $k$ if

$$f(\alpha x) = \alpha^k f(x), \qquad \forall \alpha \geqq 0.$$

A linear transformation is homogeneous of degree 1

A quadratic form is homogeneous of degree 2.

Let $q(x) = x'Ax$, $A \in \mathbb{R}^{n \times n}$, be a quadratic form.

The quadratic form $x'Bx$ with $B = \frac{1}{2}(A + A')$ is equivalent to $q$; in fact

$$x'Ax = x'Bx, \qquad \forall x \in \mathbb{R}^n$$

Moreover, there is a one-to-one relation between quadratic forms and symmetric matrices


Quadratic forms – Classification

A quadratic form is (see examples.sce)

positive definite if

$x \neq [0] \Rightarrow x'Ax > 0$;

negative definite if

$x \neq [0] \Rightarrow x'Ax < 0$;

semi-positive definite if

$x \neq [0] \Rightarrow x'Ax \geqq 0$, and $\exists x \neq [0] : x'Ax = 0$;

semi-negative definite if

$x \neq [0] \Rightarrow x'Ax \leqq 0$, and $\exists x \neq [0] : x'Ax = 0$;

indefinite if it can assume both positive and negative values

$$\exists x^1, x^2 \in \mathbb{R}^n : (x^1)'Ax^1 > 0, \quad (x^2)'Ax^2 < 0.$$


Quadratic forms – Classification (test 1)

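Let $A \neq [0]$ be a symmetric real matrix with eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$.

Then

$A$ is positive definite $\Leftrightarrow$ $\lambda_j > 0$, $\forall j$,

$A$ is negative definite $\Leftrightarrow$ $\lambda_j < 0$, $\forall j$,

$A$ is positive semi-definite $\Leftrightarrow$ $\lambda_j \geqq 0$, $\forall j$, and $\exists h \in \{1, \dots, n\} : \lambda_h = 0$,

$A$ is negative semi-definite $\Leftrightarrow$ $\lambda_j \leqq 0$, $\forall j$, and $\exists h \in \{1, \dots, n\} : \lambda_h = 0$,

$A$ is indefinite $\Leftrightarrow$ $\exists h, k \in \{1, \dots, n\} : \lambda_h > 0, \; \lambda_k < 0$.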


Quadratic forms – Classification (test 2)

Let $A \neq [0]$ be a symmetric real matrix, then $A$ is

positive definite $\Leftrightarrow$ all its $n$ leading minors are positive

negative definite $\Leftrightarrow$ its $n$ leading minors have signs $\{-, +, -, +, \cdots\}$, the first one being negative (the leading minor of order $k$ has the sign of $(-1)^k$)

positive semi-definite $\Leftrightarrow$ all its $(2^n - 1)$ principal minors are $\geqq 0$, and $\det(A) = 0$;

negative semi-definite $\Leftrightarrow$ its principal minors of order $k$ are $\geqq 0$ if $k$ is even, and $\leqq 0$ if $k$ is odd, and $\det(A) = 0$;

indefinite in all other cases
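The two leading-minor tests (positive and negative definiteness) can be sketched in a few lines of plain Python. The helper names below are hypothetical, the determinant uses the Laplace expansion and is therefore only suitable for small matrices, and the semi-definite cases, which require all principal minors, are deliberately left out.

```python
def det(A):
    """Determinant by Laplace expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def leading_minors(A):
    """Determinants of the top-left k x k submatrices, k = 1, ..., n."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]


def is_positive_definite(A):
    return all(m > 0 for m in leading_minors(A))


def is_negative_definite(A):
    # the leading minor of order k must have the sign of (-1)^k
    return all((-1) ** k * m > 0
               for k, m in enumerate(leading_minors(A), start=1))


A = [[2, -1], [-1, 2]]   # leading minors 2, 3
B = [[-2, 1], [1, -2]]   # leading minors -2, 3
C = [[1, 2], [2, 1]]     # leading minors 1, -3: fails both definiteness tests
```

Matrix `A` passes the positive definite test, `B` the negative definite one, and `C` neither; the inputs are assumed to be symmetric, which the sketch does not check.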


Functions of several variables

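Consider the function $f : \mathbb{R}^n \supseteq X \to \mathbb{R}$. If the limit for $t \to 0$ of the partial difference quotient

$$\frac{f(x + te^i) - f(x)}{t}$$

is finite, it is the partial derivative of $f(x)$ with respect to $x_i$:

$$\frac{\partial f(x)}{\partial x_i} = \lim_{t \to 0} \frac{f(x + te^i) - f(x)}{t}$$

The (usually row) vector function $\nabla f : \mathbb{R}^n \to \mathbb{R}^n$ which collects the partial derivatives of $f$ is the gradient of $f$:

$$\nabla f = \left[ \frac{\partial f(x)}{\partial x_1} \;\; \frac{\partial f(x)}{\partial x_2} \;\; \cdots \;\; \frac{\partial f(x)}{\partial x_n} \right]$$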



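The partial derivative of $\frac{\partial f(x)}{\partial x_i}$ with respect to $x_j$,

$$\frac{\partial}{\partial x_j} \left[ \frac{\partial f(x)}{\partial x_i} \right] = \lim_{t \to 0} \frac{\frac{\partial f(x + te^j)}{\partial x_i} - \frac{\partial f(x)}{\partial x_i}}{t},$$

when finite, is the second-order partial derivative of $f$ with respect to $x_i$ and $x_j$, denoted by

$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j}$$

The square $n \times n$ matrix which collects the second-order partial derivatives of $f$,

$$H_f(x) = \left[ \frac{\partial^2 f(x)}{\partial x_i \partial x_j} \right],$$

is the Hessian matrix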



A scalar function $f(x)$, defined on $X \subseteq \mathbb{R}^n$, belongs to the $C^0$ class ($f \in C^0$) if $f(x)$ is continuous $\forall x \in X$

It belongs to the $C^k$ class, with $k \geq 1$ integer ($f \in C^k$), if $f(x)$ is continuous $\forall x \in X$, together with all its partial derivatives of order $1, 2, \dots, k$.

Theorem (Schwarz)

If the scalar function $f(x)$, with $x \in X \subseteq \mathbb{R}^n$, belongs to $C^2$, then its Hessian matrix is symmetric:

$$H_f(x) = (H_f(x))' .$$



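Consider a vector function of several variables $f : \mathbb{R}^n \to \mathbb{R}^m$

$$x \in \mathbb{R}^n, \qquad f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_m(x) \end{bmatrix} \in \mathbb{R}^m$$

Partial derivatives are defined on every component

$$\frac{\partial f_j(x)}{\partial x_i}, \qquad i = 1, \dots, n, \; j = 1, \dots, m$$

The matrix which collects all the first-order partial derivatives is the Jacobian matrix

$$J_f(x) = \begin{bmatrix} \nabla f_1(x) \\ \vdots \\ \nabla f_m(x) \end{bmatrix} = \left[ \frac{\partial f_i(x)}{\partial x_j} \right]$$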



Definition

A function $f : \mathbb{R} \supseteq X \to \mathbb{R}$ is differentiable in $x^\circ \in \mathrm{int}(X)$ if a number $m$ exists (depending on $f$ and $x^\circ$ only) such that

$$f(x^\circ + h) - f(x^\circ) = mh + o(h)$$

where $o(h)$ is infinitesimal of higher order with respect to $h$ (that is, $\lim_{h \to 0} \frac{o(h)}{h} = 0$)

Definition

A function $f : \mathbb{R}^n \supseteq X \to \mathbb{R}$ is differentiable in $x^\circ \in \mathrm{int}(X)$ if there exists a vector $a \in \mathbb{R}^n$ (depending on $f$ and $x^\circ$ only) such that

$$f(x^\circ + h) - f(x^\circ) = ah + o(\|h\|)$$



Definition

A function $f : \mathbb{R}^n \supseteq X \to \mathbb{R}^m$ is differentiable in $x^\circ \in \mathrm{int}(X)$ if there exists a matrix $M \in \mathbb{R}^{m \times n}$ such that

$$f(x^\circ + h) - f(x^\circ) = Mh + o(\|h\|)$$

Definition

A function is called differentiable on the open set $X$ if it is differentiable at every point of $X$

Theorem

If $f : \mathbb{R}^n \supseteq X \to \mathbb{R}$ is differentiable in $x^\circ$, then at that point $f$ is continuous, it admits its $n$ partial derivatives and

$$f(x^\circ + h) - f(x^\circ) = ah + o(\|h\|), \qquad \text{with } a = \nabla f(x^\circ)$$


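Theorem (First order Taylor expansion with Peano remainder)

Let $X \subseteq \mathbb{R}^n$ be an open set, $f : X \to \mathbb{R}$, $f \in C^2$ in $X$ and $\lambda x^\circ + (1 - \lambda)(x^\circ + h) \in X$, $\forall \lambda \in [0, 1]$. Then

$$f(x^\circ + h) = f(x^\circ) + \nabla f(x^\circ) h + o(\|h\|), \qquad \text{for } h \to [0]$$

Theorem (Second order Taylor expansion with Peano remainder)

Let $X \subseteq \mathbb{R}^n$ be an open set, $f : X \to \mathbb{R}$, $f \in C^2$ in $X$ and $\lambda x^\circ + (1 - \lambda)(x^\circ + h) \in X$, $\forall \lambda \in [0, 1]$. Then

$$f(x^\circ + h) = f(x^\circ) + \nabla f(x^\circ) h + \frac{1}{2} h' H_f(x^\circ) h + o\left( \|h\|^2 \right), \qquad \text{for } h \to [0]$$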


Unconstrained optimization

Definition

A point $x^\circ \in X$ is a maximum (minimum) point for $f : \mathbb{R}^n \supseteq X \to \mathbb{R}$ if

$$f(x^\circ) \geqq (\leqq) \; f(x), \qquad \forall x \in X$$

It is called strong if the inequality is strict $\forall x \in X$, $x \neq x^\circ$

Definition

A point $x^\circ \in X$ is a local maximum (minimum) point for $f : \mathbb{R}^n \supseteq X \to \mathbb{R}$ if a neighborhood $I(x^\circ)$ exists such that

$$f(x^\circ) \geqq (\leqq) \; f(x), \qquad \forall x \in X \cap I(x^\circ)$$

It is called strong if the inequality is strict $\forall x \in X \cap I(x^\circ)$, $x \neq x^\circ$.

$$\min_x f(x), \qquad \text{with } f : \mathbb{R}^n \to \mathbb{R} \text{ differentiable}$$

For maximization problems, find the minimum of $g(x) = -f(x)$

Unconstrained optimization – Optimality conditions

Theorem (Fermat)

Let $f : \mathbb{R}^n \supseteq X \to \mathbb{R}$ be differentiable in $X$. If the vector $x^\circ \in \mathrm{int}(X)$ is a local maximum or minimum point for $f$, then

$$\nabla f(x^\circ) = [0]$$

The point $x^\circ$ is called stationary or critical for $f$.

The conditions (a system of $n$ equations in $n$ unknowns)

$$\nabla f(x^\circ) = [0]$$

are called first order (necessary) conditions (FOC)


(small digression) Convexity

Definition

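A set $X \subseteq \mathbb{R}^n$ is convex if $\forall x^1, x^2 \in X$ and $\forall \lambda \in [0, 1]$, $\left[ \lambda x^1 + (1 - \lambda) x^2 \right] \in X$

Definition

A function $f : X \to \mathbb{R}$, defined on a convex set $X$, is convex (concave) if $\forall x^1, x^2 \in X$ and $\forall \lambda \in [0, 1]$,

$$f\left( \lambda x^1 + (1 - \lambda) x^2 \right) \leqq \lambda f(x^1) + (1 - \lambda) f(x^2) \qquad (\geqq \text{ concave})$$

Definition

It is strictly convex (concave) if $\forall x^1, x^2 \in X$, $x^1 \neq x^2$, and $\forall \lambda \in (0, 1)$

$$f\left( \lambda x^1 + (1 - \lambda) x^2 \right) < \lambda f(x^1) + (1 - \lambda) f(x^2) \qquad (> \text{ concave})$$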


Unconstrained optimization – Optimality conditions

Theorem

Consider the convex (concave) function f : X → R, with X convex in Rn.

Then

1 Each local minimum (maximum) point is global as well

2 The set of the minimum (maximum) points is convex

3 If $f$ is differentiable on the open convex set $X$, then each stationary point of $f$ is a global minimum (maximum) point

Remark The FOC are necessary and sufficient for convex (concave) functions

Remark If $f$ is strictly convex (concave) and $x^\circ \in X$ is a local minimum (maximum) point, then it is the unique global strict minimum (maximum) point


Constrained optimization – Equality constraints

Consider the problem

$$\min_x f(x) \qquad \text{s.t.} \quad h(x) = \begin{bmatrix} h_1(x) \\ \vdots \\ h_m(x) \end{bmatrix} = [0]$$

objective function $f : \mathbb{R}^n \to \mathbb{R}$ ($n$ variables), $f \in C^2$

constraint function $h : \mathbb{R}^n \to \mathbb{R}^m$ ($m$ constraints)

The Lagrangian function is

$$L(x, \lambda) = f(x) - \lambda h(x) = f(x) - \sum_{i=1}^{m} \lambda_i h_i(x)$$

($x \in \mathbb{R}^n$ column, $h(x) \in \mathbb{R}^m$ column, $\lambda \in \mathbb{R}^m$ row)

Constrained optimization – First order conditions

Theorem (First order necessary conditions)

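Let $x^\circ$ be an interior local minimum or maximum point of $f$ constrained to $h(x) = [0]$, and let the Jacobian $J_h(x^\circ)$ have rank $m$. Then a row vector $\lambda^\circ \in \mathbb{R}^m$ exists such that $(x^\circ, \lambda^\circ)$ is stationary for $L(x, \lambda)$

$$\nabla L(x^\circ, \lambda^\circ) = \begin{bmatrix} \frac{\partial L}{\partial x_1} \\ \vdots \\ \frac{\partial L}{\partial x_n} \\ \frac{\partial L}{\partial \lambda_1} \\ \vdots \\ \frac{\partial L}{\partial \lambda_m} \end{bmatrix}_{x = x^\circ, \, \lambda = \lambda^\circ} = [0]$$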


Constrained optimization – Optimality conditions

Theorem

$$\min_x f(x) \qquad \text{s.t.} \quad h(x) = \begin{bmatrix} h_1(x) \\ \vdots \\ h_m(x) \end{bmatrix} = [0]$$

If the objective function $f$ is convex (concave for maximum problems) and the constraint function $h$ is linear (that is, each $h_i$ is linear, $\forall i = 1, \dots, m$), then the stationary points of the Lagrangian function solve the problem.


Constrained optimization – Examples

Solve

$$\min_x f(x) = (x_1)^2 + (x_2)^2 \qquad \text{s.t.} \quad h(x) = x_1 + x_2 = 10$$

$$L(x_1, x_2, \lambda) = (x_1)^2 + (x_2)^2 - \lambda (x_1 + x_2 - 10)$$

The gradient of the Lagrangian is

$$\nabla L(x_1, x_2, \lambda) = [2x_1 - \lambda, \;\; 2x_2 - \lambda, \;\; -x_1 - x_2 + 10]$$

The FOC are

$$\begin{cases} 2x_1 - \lambda = 0 \\ 2x_2 - \lambda = 0 \\ -x_1 - x_2 + 10 = 0 \end{cases}$$


The stationary point for the Lagrangian is $(x_1^*, x_2^*, \lambda^*) = (5, 5, 10)$

[Figure: level sets (blue) and the constraint (black)]
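The intermediate steps leading to this stationary point are not spelled out; one possible way to fill them in, using only the three first order conditions of the example, is sketched below.

```latex
2x_1 - \lambda = 0,\quad 2x_2 - \lambda = 0
\;\Rightarrow\; x_1 = x_2 = \frac{\lambda}{2}

x_1 + x_2 = 10
\;\Rightarrow\; \frac{\lambda}{2} + \frac{\lambda}{2} = 10
\;\Rightarrow\; \lambda^* = 10,\quad x_1^* = x_2^* = 5

f(x^*) = 5^2 + 5^2 = 50
```

Since the objective function is convex and the constraint is linear, this stationary point solves the minimization problem.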



Solve

$$\min_x f(x) = x_1 + x_2 \qquad \text{s.t.} \quad h(x) = (x_1)^2 + 2(x_2)^2 - 6 = 0$$

$$L(x, \lambda) = x_1 + x_2 - \lambda \left( (x_1)^2 + 2(x_2)^2 - 6 \right),$$

the first order conditions are

$$\begin{cases} L_{x_1} = 1 - 2\lambda x_1 = 0 \\ L_{x_2} = 1 - 4\lambda x_2 = 0 \\ L_{\lambda} = -\left( (x_1)^2 + 2(x_2)^2 - 6 \right) = 0 \end{cases}$$


from the first two equations, $x_1$ and $x_2$ cannot be null and

$$\lambda = \frac{1}{2x_1} = \frac{1}{4x_2}, \qquad \text{therefore } x_1 = 2x_2,$$

plug this into the third equation

$$-\left( (2x_2)^2 + 2(x_2)^2 - 6 \right) = 0$$

and obtain the two solutions

$$(x^*, \lambda^*) = \left( [2, 1], \frac{1}{4} \right), \qquad (x^{**}, \lambda^{**}) = \left( [-2, -1], -\frac{1}{4} \right)$$

The values of the objective function are

$$f(x^*) = f([2, 1]) = 3, \qquad f(x^{**}) = f([-2, -1]) = -3$$



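$$\min_x \; (5x_1 + 2x_2 - x_3) \qquad \text{s.t.} \quad \begin{cases} x_1 x_2 - 3 = 0 \\ x_1 x_3 - 1 = 0 \end{cases}$$

$$L(x, \lambda) = 5x_1 + 2x_2 - x_3 - \lambda_1 (x_1 x_2 - 3) - \lambda_2 (x_1 x_3 - 1)$$

and the system $\nabla L(x, \lambda) = [0]$:

$$\begin{cases} L_{x_1} = 5 - \lambda_1 x_2 - \lambda_2 x_3 = 0 \\ L_{x_2} = 2 - \lambda_1 x_1 = 0 \\ L_{x_3} = -1 - \lambda_2 x_1 = 0 \\ L_{\lambda_1} = 3 - x_1 x_2 = 0 \\ L_{\lambda_2} = 1 - x_1 x_3 = 0 \end{cases}$$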



Its solutions (the Lagrangian's stationary points) are

$$(x^*, \lambda^*) = \left( [1, 3, 1]', [2, -1] \right), \qquad (x^{**}, \lambda^{**}) = \left( [-1, -3, -1]', [-2, 1] \right)$$

The values of the objective function at the stationary points are

$$f(x^*) = f([1, 3, 1]') = 10, \qquad f(x^{**}) = f([-1, -3, -1]') = -10$$

Remark Despite the function's values, it is possible to show that $x^*$ is a constrained local minimum, whereas $x^{**}$ is a constrained local maximum



Find the maximum and minimum of

$$f(x) = (x_1)^2 - x_1 x_2 + (x_2)^2$$

subject to

$$h(x) = (x_1)^2 + (x_2)^2 - 1 = 0$$

$$L(x, \lambda) = (x_1)^2 - x_1 x_2 + (x_2)^2 - \lambda \left( (x_1)^2 + (x_2)^2 - 1 \right)$$

therefore, the FOC are

$$\begin{cases} L_{x_1} = 2x_1 - x_2 - 2\lambda x_1 = 0 \\ L_{x_2} = -x_1 + 2x_2 - 2\lambda x_2 = 0 \\ L_{\lambda} = 1 - (x_1)^2 - (x_2)^2 = 0. \end{cases}$$



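From the first two equations, we get

$$\begin{cases} 2(1 - \lambda) x_1 - x_2 = 0 \\ x_1 - 2(1 - \lambda) x_2 = 0, \end{cases}$$

The solution $x = [0]$ does not satisfy the constraint

Note that the system is linear and homogeneous in $x$; therefore, it has non-null solutions when the determinant of the coefficient matrix is null

$$\det\left( \begin{bmatrix} 2(1 - \lambda) & -1 \\ 1 & -2(1 - \lambda) \end{bmatrix} \right) = -\left( 4\lambda^2 - 8\lambda + 3 \right) = 0 \;\Rightarrow\; \lambda_1 = \frac{1}{2}, \; \lambda_2 = \frac{3}{2}$$

with $\lambda_1 = \frac{1}{2} \Rightarrow x_1 = x_2$ and, substituting into the constraint, we obtain $x_1 = x_2 = \pm\sqrt{\frac{1}{2}}$

and with $\lambda_2 = \frac{3}{2} \Rightarrow x_1 = -x_2$, we obtain $x_1 = \pm\sqrt{\frac{1}{2}}$, $x_2 = -x_1$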



the 4 stationary points:

$$a = \left[ \sqrt{\tfrac{1}{2}}, \sqrt{\tfrac{1}{2}} \right]', \quad b = -a, \quad c = \left[ \sqrt{\tfrac{1}{2}}, -\sqrt{\tfrac{1}{2}} \right]', \quad d = -c .$$

The objective function can be written as the quadratic form associated to the positive definite symmetric matrix (eigenvalues $\frac{1}{2}$, $\frac{3}{2}$)

$$\begin{bmatrix} 1 & -\frac{1}{2} \\ -\frac{1}{2} & 1 \end{bmatrix}$$

The values of the function at the four stationary points are

$$f(a) = f(b) = \frac{1}{2}, \qquad f(c) = f(d) = \frac{3}{2},$$

therefore $a$ and $b$ are minimum points and $c$ and $d$ are maximum points


The level sets are ellipses and the constraint defines a circle

[Figure: the constraint (black), the level sets, and the critical points]


Constrained optimization – Sensitivity analysis example

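Consider the Cobb-Douglas production function $Q(L, K) = 20 L^{\frac{1}{2}} K^{\frac{1}{2}}$, with a unit cost of labour $L$ and capital $K$ of 10 and 4 respectively; the available budget is 200

Maximize the production under the budget constraint

$$\max_{L, K} \; 20 L^{\frac{1}{2}} K^{\frac{1}{2}} \qquad \text{s.t.} \quad 10L + 4K = 200$$

$$\mathcal{L}(L, K, \lambda) = 20 L^{\frac{1}{2}} K^{\frac{1}{2}} - \lambda (10L + 4K - 200)$$

and the FOC

$$\begin{cases} 10 L^{-\frac{1}{2}} K^{\frac{1}{2}} = 10\lambda \\ 10 L^{\frac{1}{2}} K^{-\frac{1}{2}} = 4\lambda \\ 10L + 4K = 200 \end{cases}$$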


Constrained optimization – Sensitivity analysis example

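dividing the first equation by the second one, we get

$$\frac{10 L^{-\frac{1}{2}} K^{\frac{1}{2}}}{10 L^{\frac{1}{2}} K^{-\frac{1}{2}}} = \frac{10\lambda}{4\lambda} \;\Rightarrow\; \frac{K}{L} = \frac{5}{2} \;\Rightarrow\; K = \frac{5}{2} L$$

substituting this relation in the constraint, we obtain $L^* = 10$, $K^* = 25$ and, using the first equation to solve for $\lambda$, we get $\lambda^* = \frac{5}{\sqrt{10}}$

Consider the problem where the budget is $b$ and let $Q^*$ denote the optimal production as a function of $b$: the optimal value of the Lagrange multiplier is

$$\lambda^* = \frac{dQ^*}{db}$$

that is

$$dQ^* = \lambda^* \, db, \qquad \Delta Q^* \simeq \lambda^* \Delta b \quad \text{(locally)}$$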
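The interpretation of $\lambda^*$ as the marginal value of the budget can be checked numerically. The sketch below recomputes the optimal production for a generic budget using the relation $K/L = 5/2$ derived from the FOC; the function name `optimal_output` and the parameters `price_l`, `price_k` are hypothetical names introduced for this illustration.

```python
import math


def optimal_output(b, price_l=10.0, price_k=4.0):
    """Maximum output Q*(b) for budget b, using the FOC ratio K/L = price_l/price_k."""
    # Substituting K = (price_l / price_k) * L into the budget constraint
    # gives L = b / (2 * price_l) and K = b / (2 * price_k).
    labour = b / (2.0 * price_l)
    capital = b / (2.0 * price_k)
    return 20.0 * math.sqrt(labour * capital)


b = 200.0
lam = 5.0 / math.sqrt(10.0)     # lambda* obtained from the FOC at b = 200
step = 1e-4
# central finite difference approximating dQ*/db at b = 200
dQ_db = (optimal_output(b + step) - optimal_output(b - step)) / (2.0 * step)
# output gain from one extra unit of budget versus the prediction lambda* * 1
actual_gain = optimal_output(b + 1.0) - optimal_output(b)
predicted_gain = lam * 1.0
```

In this example $Q^*$ happens to be linear in $b$, so the approximation $\Delta Q^* \simeq \lambda^* \Delta b$ is exact up to rounding; in general it holds only locally.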


Quadratic programming


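$$\begin{array}{lll} \min_x & \frac{1}{2} x'Qx + p'x & \text{quadratic objective function} \\ \text{s.t.} & C^{(eq)} x = b^{(eq)} & m \text{ linear equality constraints} \\ & Cx \leqq b & s \text{ linear inequality constraints} \\ & l \leqq x \leqq u & \text{lower and upper bounds} \end{array}$$

where:

$x \in \mathbb{R}^n$, $Q \in \mathbb{R}^{n \times n}$, $p \in \mathbb{R}^n$, $C^{(eq)} \in \mathbb{R}^{m \times n}$, $C \in \mathbb{R}^{s \times n}$

$b^{(eq)} \in \mathbb{R}^m$, $b \in \mathbb{R}^s$, $l \in \mathbb{R}^n$, $u \in \mathbb{R}^n$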


Quadratic programming – Example (vector notation)

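$$\min_x \; \frac{1}{2} x'Qx \qquad \text{s.t.} \quad u'x = 1$$

where

$x, u \in \mathbb{R}^n$ (column), $Q \in \mathbb{R}^{n \times n}$, $Q = Q'$, positive definite

$$L(x, \lambda) = \frac{1}{2} x'Qx - \lambda (u'x - 1)$$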



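Remark The gradient of the linear function $f(x) = u'x$ is $\nabla f(x) = u'$

Remark The gradient of the homogeneous quadratic form $f(x) = x'Qx$, with $Q$ symmetric, is $\nabla f(x) = 2x'Q$

The gradient of the Lagrangian is

$$\nabla L(x, \lambda) = \left[ x'Q - \lambda u', \;\; -u'x + 1 \right]$$

FOC

$$\left[ x'Q - \lambda u', \;\; -u'x + 1 \right] = [0, \dots, 0]$$

$$\left[ (x'Q - \lambda u')', \;\; -u'x + 1 \right] = [[0], 0]$$

$$\begin{bmatrix} Qx - \lambda u \\ -u'x \end{bmatrix} = \begin{bmatrix} [0] \\ -1 \end{bmatrix}$$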



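Rewrite the FOC in this way

$$\begin{bmatrix} Qx - \lambda u \\ -u'x \end{bmatrix} = \begin{bmatrix} Q & -u \\ -u' & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} [0] \\ -1 \end{bmatrix}$$

Since $Q$ is positive definite (hence invertible) and $u \neq [0]$, the matrix $\begin{bmatrix} Q & -u \\ -u' & 0 \end{bmatrix}$ is invertible as well, therefore the stationary point is

$$\begin{bmatrix} x^* \\ \lambda^* \end{bmatrix} = \begin{bmatrix} Q & -u \\ -u' & 0 \end{bmatrix}^{-1} \begin{bmatrix} [0] \\ -1 \end{bmatrix}$$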
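The block system can also be solved by substitution, which gives an explicit expression for the stationary point. The derivation below is a sketch that uses only the two FOC blocks $Qx - \lambda u = [0]$ and $u'x = 1$, under the assumption stated in the example that $Q$ is symmetric and positive definite, together with $u \neq [0]$.

```latex
Qx = \lambda u \;\Rightarrow\; x = \lambda Q^{-1} u

u'x = 1 \;\Rightarrow\; \lambda \, u' Q^{-1} u = 1
\;\Rightarrow\; \lambda^* = \frac{1}{u' Q^{-1} u}

x^* = \frac{Q^{-1} u}{u' Q^{-1} u}
```

The denominator $u'Q^{-1}u$ is strictly positive because the inverse of a positive definite matrix is positive definite, so both $\lambda^*$ and $x^*$ are well defined.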


Modelling examples

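Collect the $n$ realizations $x_i$, $i = 1, \dots, n$, of the random variable $X$ in the vector $x \in \mathbb{R}^n$

Sample mean $E[X] = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{n} \mathbf{1}'x$, where $\mathbf{1}$ is the column sum vector (all components equal to 1)

Sample variance

$$\begin{aligned} \mathrm{Var}(X) &= \frac{1}{n} \sum_{i=1}^{n} (x_i - E[X])^2 = \frac{1}{n} (x - E[X]\mathbf{1})' (x - E[X]\mathbf{1}) \\ &= \frac{1}{n} \left( x'x - x'E[X]\mathbf{1} - E[X]\mathbf{1}'x + E[X]^2 \mathbf{1}'\mathbf{1} \right) \\ &= \frac{1}{n} \left( x'x - 2E[X] \, nE[X] + nE[X]^2 \right) = \frac{1}{n} x'x - E[X]^2 = \frac{1}{n} x'x - \left( \frac{1}{n} \mathbf{1}'x \right)^2 \end{aligned}$$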


Modelling examples

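Let $x, y \in \mathbb{R}^n$ collect the $n$ realizations of the random variables $X$ and $Y$

Sample covariance between $X$ and $Y$

$$\begin{aligned} \mathrm{Cov}(X, Y) &= E[(X - E[X])(Y - E[Y])] = \frac{1}{n} \sum_{i=1}^{n} (x_i - E[X])(y_i - E[Y]) \\ &= \frac{1}{n} (x - E[X]\mathbf{1})' (y - E[Y]\mathbf{1}) \\ &= \frac{1}{n} \left( x'y - E[Y]x'\mathbf{1} - E[X]\mathbf{1}'y + E[X]E[Y]\mathbf{1}'\mathbf{1} \right) \\ &= \frac{1}{n} \left( x'y - E[Y] \, nE[X] - E[X] \, nE[Y] + nE[X]E[Y] \right) = \frac{1}{n} x'y - E[X]E[Y] \\ &= \frac{1}{n} x'y - \frac{1}{n^2} (\mathbf{1}'x)(\mathbf{1}'y) \end{aligned}$$

Note that $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$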
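The shortcut formulas for the sample variance and covariance can be verified on a small data set. This is a minimal sketch in plain Python with made-up numbers; the function names are illustrative, and the $1/n$ convention of the formulas above is used throughout.

```python
def mean(x):
    return sum(x) / len(x)


def cov(x, y):
    """Sample covariance from the definition, with the 1/n convention."""
    n = len(x)
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n


def cov_shortcut(x, y):
    """Same quantity via (1/n) x'y - (1/n^2) (1'x)(1'y)."""
    n = len(x)
    return sum(xi * yi for xi, yi in zip(x, y)) / n - sum(x) * sum(y) / n ** 2


# hypothetical data, chosen only for illustration
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
var_x = cov(x, x)       # the variance is the covariance of x with itself
cov_xy = cov(x, y)
```

For these data both routes give a variance of 1.25 and a covariance of 2.75, and swapping the arguments leaves the covariance unchanged.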


Modelling examples

Linear regression

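$$y_i = \beta_0 + \beta_1 x^1_i + \beta_2 x^2_i + \cdots + \beta_n x^n_i + \varepsilon_i, \qquad i = 1, \dots, T$$

observations $y_i$, $x^j_i$, $i = 1, \dots, T$, $j = 1, \dots, n$

parameters $\beta_j$, $j = 0, 1, \dots, n$

errors $\varepsilon_i$, $i = 1, \dots, T$, with mean zero: $\frac{1}{T} \sum_{i=1}^{T} \varepsilon_i = 0$

Matrix notation

$$Y = \begin{bmatrix} y_1 \\ \vdots \\ y_T \end{bmatrix}, \quad X = \begin{bmatrix} x^1_1 & \cdots & x^n_1 \\ \vdots & \ddots & \vdots \\ x^1_T & \cdots & x^n_T \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_n \end{bmatrix}, \quad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_T \end{bmatrix}$$

$$Y = \beta_0 \mathbf{1} + X\beta + \varepsilon$$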


Modelling examples

A more common matrix notation

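$$Y = \begin{bmatrix} y_1 \\ \vdots \\ y_T \end{bmatrix}, \quad X = \begin{bmatrix} 1 & x^1_1 & \cdots & x^n_1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x^1_T & \cdots & x^n_T \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{bmatrix}, \quad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_T \end{bmatrix}$$

$$Y = X\beta + \varepsilon$$

Minimize the sum of the squares of the errors

$$\sum_{i=1}^{T} \varepsilon_i^2 = \varepsilon'\varepsilon = \|\varepsilon\|^2 = \varepsilon' I \varepsilon$$

Sample errors

$$\underbrace{\varepsilon}_{\text{errors}} = \underbrace{Y}_{\text{reality}} - \underbrace{X\beta}_{\text{model}}$$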


Modelling examples

The objective function to be minimized is

$$\varepsilon'\varepsilon = (Y - X\beta)'(Y - X\beta) = Y'Y - Y'X\beta - \beta'X'Y + \beta'X'X\beta = Y'Y - 2Y'X\beta + \beta'X'X\beta$$

a complete quadratic form in $\beta$

The matrix $X'X \in \mathbb{R}^{(n+1) \times (n+1)}$ is

1 symmetric: $(X'X)' = X'X$

2 positive definite or semi-definite:

$$q(z) = z'(X'X)z = (z'X')(Xz) = \|Xz\|^2 \geqq 0, \qquad \forall z \in \mathbb{R}^{n+1}$$

3 $\mathrm{rk}(X'X) = \mathrm{rk}(X)$


Modelling examples

The gradient of the objective function is

$$2\beta'(X'X) - 2Y'X$$

The FOC are

$$(X'X)\beta = X'Y$$

If $X'X$ is invertible ($\Leftrightarrow \mathrm{rk}(X) = n + 1$), then

$$\beta^* = (X'X)^{-1} X'Y$$

This $\beta^*$ is commonly known as the ordinary least squares (OLS) parameter estimate in linear regression models
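As an illustration of the normal equations, the sketch below builds $X'X$ and $X'Y$ for a tiny data set and solves $(X'X)\beta = X'Y$ by Gauss-Jordan elimination, using plain Python only. The data are hypothetical (generated exactly by $y = 1 + 2x$), and the helper names `transpose`, `matmul`, `solve` and `ols` are choices made for this example.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("matrix is singular: X does not have full column rank")
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]           # scale the pivot row
        for i in range(n):
            if i != c:                                # clear column c elsewhere
                M[i] = [vi - M[i][c] * vc for vi, vc in zip(M[i], M[c])]
    return [row[n] for row in M]


def ols(X, Y):
    """OLS estimate obtained from the normal equations (X'X) beta = X'Y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(a * b for a, b in zip(row, Y)) for row in Xt]
    return solve(XtX, XtY)


# hypothetical data lying exactly on y = 1 + 2 x (all errors are zero)
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # first column of ones
Y = [1.0, 3.0, 5.0, 7.0]
beta = ols(X, Y)
```

The estimate recovers the intercept 1 and the slope 2. Solving the normal equations directly mirrors the formula above; numerical libraries often prefer other factorizations for ill-conditioned data, which is outside the scope of this sketch.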

