csci 1951-g optimization methods in finance part 01

CSCI 1951-G – Optimization Methods in FinancePart 01:

Linear Programming

January 26, 2018

1 / 38

Liability/asset cash-flow matching problemRecall the formulation of the problem:

max w

c1 + p1 − e1 = 150

c2 + p2 + 1.003e1 − 1.01c1 − e2 = 100

c3 + p3 + 1.003e2 − 1.01c2 − e3 = −200c4 − 1.02p1 − 1.01c3 + 1.003e3 − e4 = 200

c5 − 1.02p2 − 1.01c4 + 1.003e4 − e5 = −50− 1.02p3 − 1.01c5 + 1.003e5 − w = −3000 ≤ ci ≤ 100, i ≤ i ≤ 5

pi > 0, 1 ≤ i ≤ 3

ei ≥ 0, 1 ≤ i ≤ 5

This is a Linear Programming (LP) problem:• the variables are continuous;• the constraints are linear functions of the variables;• the objective function is linear in the variable.

2 / 38

Linear Programming

Linear Programming is arguably the best known and mostfrequently solved class of optimization problems.

Definition (Linear Program)

A generic LP problem has the following form:min cTx

aTx = b for (a, b) ∈ EaTx ≥ b for (a, b) ∈ I

where• x ∈ Rn is the vector of decision variables;

• c ∈ Rn, the costs vector, defines the objective function;

• E and I are sets of pairs (a, b) where a ∈ Rn, and b ∈ R.Any assignment of values to x is called a solution. A feasible solutionsatisfies the constraints. The optimal solution is feasible andminimize the objective function.

3 / 38

Graphical solution to a linear optimization problem

Let’s solve:

max 500x1 + 450x2

x1 +5

6x2 ≤ 10

x1 + 2x2 ≤ 15

x1 ≤ 8

x1, x2 ≥ 0

4 / 38

Standard form and slack variables

More o�en, we will express LP in the standard form

Standard Form

min cTx

Ax = b

x ≥ 0where A is a n× n matrix, and b ∈ Rn.

The standard form is not restrictive:• we can rewrite inequality constraints as equalities by introducing

slack variables;

• maximization problems can be wri�en as minimization problemsby multiplying the objective function by −1.

• variables that are unrestricted in sign can be expressed as thedi�erence of two new non-negative variables.

5 / 38

Standard form and slack variables (cont.)How do we express the following problem in standard form?

Non-standard form

max x1 + x2

2x1 + x2 ≤ 12

x1 + 2x2 ≤ 9

x1 ≥ 0, x2 ≥ 0

Standard form

min − x1 − x22x1 + x2 + z1 = 12

x1 + 2x2 + z2 = 9

x1 ≥ 0, x2 ≥ 0, z1 ≥ 0, z2 ≥ 0

The zi are slack variables: they appear in the constraints, but not inthe objective function.

6 / 38

Lower bounds to the optimal objective value

We would like to compute an optimal solution to a LP. . .

But let’s start with an easier question:

�estion

How can we find a lower bound to the objective function value for afeasible solution?

Intuition for the answer

The constraints provide some bounds on the value of the objectivefunction.

7 / 38

Lower bounds (cont.)Consider

min − x1 − x22x1 + x2 ≤ 12

x1 + 2x2 ≤ 9

x1 ≥ 0, x2 ≥ 0

�estion

Can we use the first constraint to give a lower bound to −x1 − x2 forany feasible solution?

Answer (Yes!)

Any feasible solution must be such that−x1 − x2 ≥ −2x1 − x2 ≥ −12 .

We leverage the fact that the coe�icients in the constraint(multiplied by -1) are lower bounds to the coe�icients in theobjective function.

8 / 38

Lower bounds (cont.)

The second constraint gives us a be�er bound:

−x1 − x2 ≥ −x1 − 2x2 ≥ −9

�estion

Can we do even be�er?

Yes, with a linear combination of the constraints:

−x1 − x2 ≥ −1

3(2x1 + x2)−

1

3(x1 + 2x2) ≥ −

1

3(12 + 9) ≥ −7

No feasible solution can have an objective value smaller than -7.

9 / 38

Lower bounds (cont.)

General strategy

Find a linear combination of the constraints such that• the resulting coe�icient for each variable is no larger than the cor-

responding coe�icient in the objective function; and

• the resulting lower bound to the optimal objective value is maxi-mized.

In other words, we want to find the coe�icients of the linearcombinations of constraints that satisfy the new constraints on thecoe�icients of the variables, and maximize the lower bound.

This is a new optimization problem: the dual problem.The original problem is known as the primal problem.

10 / 38

Dual problem

�estion

What is the dual problem in our example?

Answer

max 12y1 + 9y2

2y1 + y2 ≤ −1y1 + 2y2 ≤ −1y1, y2 ≤ 0

In general:

Primal problem

min cTx

Ax ≤ bx ≥ 0

Dual problem

max bTy

ATy ≤ cy ≤ 0

Dual problem (min.form)

min − bTyATy ≤ cy ≤ 0

11 / 38

Weak duality

Theorem (Weak duality)

Let x be any feasible solution to the primal LP and y be any feasiblesolution to the dual LP. Then

cTx ≥ bTy .

Proof.

We havecTx− bTy = cTx− yTb = cTx− yTAx = (c−ATy)Tx > 0

where the last step follows from x ≥ 0 and (c−ATy) ≥ 0(why?).

Definition (Duality gap)

The quantity cTx− bTy is known as the duality gap.

12 / 38

Strong duality

Corollary

If:• x is feasible for the primal LP’;

• y is feasible for the dual LP; and

• cTx = bTy;then x must be optimal for the primal LP and y must be optimal forthe dual LP.

The above gives a su�icient condition for optimality of a primal-dualpair of feasible solutions. This condition is also necessary.

Theorem (Strong duality)

If the primal (resp. dual) problem has an optimal solution x (resp. y),then the dual (resp. primal) has an optimal solution y (resp. x) suchthat cTx = bTy.

13 / 38

Complementary slackness

How can we obtain an optimal solution to the dual problem from anoptimal solution to the primal (and viceversa)?

Theorem (Complementary slackness)

Let x be an optimal solution for the primal LP and y be an optimalsolution to the dual LP. Then

yT(Ax− b) = 0 and

(cT − yTA)x = 0

Proof.

Exercise.

14 / 38

Algorithms for solving LP

Due to the great relevance of LP in many fields, a number ofalgorithms have been developed to solve LP problems.Simplex Algorithm: Due to G. Dantzig in 1947, it was abreakthrough and it is considered “one of the top 10 algorithms ofthe twentieth century.” It is an exponential time algorithm, butextremely e�icient in practice also for large problems.

Ellipsoid Method: by Yudin and Nemirovski in 1976, it was thefirst polynomial time algorithm for LP. In practice, the performanceis so bad that the algorithm is only of theoretical interest, and evenso, only for historical purposes.

Interior-Point Method: by Karmarkar in 1984, it is the firstpolynomial time algorithm for LP that can solve some real-worldLPs faster than the simplex.

We now present and analyze the Simplex Algorithm, and willdiscuss the Interior-Point Method later, in the context of �adraticProgramming.

15 / 38

Roadmap

1 Look at the geometry of the LP feasible region

2 Prove the existence of an optimal solution that satisfy a specificgeometrical condition

3 Study what solutions that satisfy the condition look like

4 Discuss how to move between solutions that satisfy the condition

5 Use these ingredients to develop an algorithm

6 Analyze correctness and running time complexity

16 / 38

Convex polyhedra

Definition (Convex polyhedron)

A convex polyhedron P is the solution set of a system of m linearinequalities:

P = {x ∈ Rn : Ax ≥ b}A is m× n, b is m× 1.

Fact

The feasible region of an LP is a convex polyhedron.

Definition (Polyhedron in standard form)

P = {x ∈ Rn : Ax = b, x ≥ 0}

A is m× n, b is m× 1.

17 / 38

Extreme points and their optimality

Definition (Extreme point)

x ∈ P is an extreme point of P i� there exist no distincty, z ∈ P, λ ∈ (0, 1) s.t. x = λy + (1− λ)z.

Theorem (Optimality of extreme points)

Let P ⊆ Rn be a polyhedron and consider the problem minx∈P cTx

for a given c ∈ Rn. If P has at least one extreme point and there existsan optimal solution, then there exists an optimal solution that is anextreme point.

Proof.

Coming up.

18 / 38

Proof of optimality of extreme points

v: optimal objective valueQ: set of optimal solutions,

Q = {x ∈ Rn : Ax = b, x ≥ 0, cTx = v} ⊆ P

Fact

Q is a convex polyhedron.

Fact

Since Q ⊆ P and P has an extreme point, Q must have an extremepoint.

Let x∗ be an extreme point of Q.We now show that x∗ is an extreme point in P .

19 / 38

Proof of optimality of extreme points (cont.)

Let’s show that x∗ (extreme point of Q) is an extreme point of P .• Assume x∗ is not an extreme point of P , i.e.,∃ y, z ∈ P, y 6= x∗, z 6= x∗, λ ∈ (0, 1) s.t. λy + (1− λ)z = x∗.

• How can we write cTx∗? cTx∗ = λcTy + (1− λ)cTz.

• Let’s compare cTx∗ and cTy and cTx∗ and cTz. It must becTy ≥ cTx∗ or cTz ≥ cTx∗. Suppose cTy ≥ cTx∗.

• It must also be cTy = cTx∗. Why? Because x∗ is an optimalsolution: cTy = cTx∗ = v.

• But then what about cTz ? cTz = v.

• What does cTy = cTz = v imply? They must belong to Q.

• Does this contradict the hypothesis? We found y and z in Q s.t.x∗ = λy + (1− λ)z, so x∗ is not an extreme point of Q.

• We reach a contradiction, thus x∗ is an extreme point in P .QED

20 / 38

Extreme points and consequences

Theorem

Every convex polyhedron in standard form has an extreme point.

Corollary

Every LP with a non-empty feasible region P• either has optimal value −∞;

• or there exists an optimal solution at one of the extreme points of P .

21 / 38

Algorithmic idea and questions

Algorithmic idea

Find an initial extreme point x.While x is not optimal:

x← a di�erent extreme point.Output x

There may be many extreme points, so not the best algorithm?

We must answer the following questions:

• What is the algebraic form of extreme points?

• How do we move among extreme points?

• How do we avoid revisiting the same extreme point / ensure thatwe find all of them if necessary?

• How do we know that an extreme point is optimal?

• How do we find the first extreme point?

22 / 38

Basic solutions

Assume from now on that the m rows of A are linearly independent.Consider a set of m columns of A = [a1 · · · an] that are linearlyindependent. Let r1, . . . , rm be their indices. Let B = [ar1 · · · arm ].Dimensions of B are m×m.Permute the columns of A to obtain [B N ]. Consider the samepermutation for the rows of x to obtain [xTB xTN ]T.

Ax = b ⇔ [B N ]

[xBxN

]= b

⇔ BxB +NxN = b

⇔ xB = B−1b−B−1NxN

Why is B invertible? It’s a square matrix with full rank.What’s an easy solution to the last equation? xN = 0 andxB = B−1b.A solution obtained this way is known as a basic solution.

23 / 38

Basic solutions

To obtain a basic solution:

1 Choose m linearly independent columns of A to obtain B. Thecolumns in B are known as the basic columns.

2 Split the variables x into basic variables xB and non-basic variablesxN .

3 Set the non-basic variables xN to zero.

4 Set xB = B−1b.

Is a basic solution x always feasible? No. It must also be x ≥ 0!(i.e., xB ≥ 0).

If x is basic and feasible, is a Basic Feasible Solution (BFS).

24 / 38

Basic feasible solutions and extreme points

Theorem

Each BFS corresponds to one and only one extreme point of P .

Proof: (only that a BFS x must be an extreme point of P).

• Assume that x is not an extreme point of P . What does it mean?∃ distinct y, z ∈ P and λ ∈ (0, 1) s.t. x = λy + (1− λ)z.

• What about xB and yB , zB , and xN and yN , zN ?0 = xN = λyN + (1− λ)zN

It must be. . .yN = zN = 0, because y, z ∈ P , so y, z ≥ 0.

• What about xB and yB and zB?xB = B−1b but also must be ByB = b and BzB = bi

so it must be xB = yB = zB .

• What’s the contradiction? We did not find distinct y and z to ex-press x.

25 / 38

Corollary

If P is not empty,• either the optimal value is −∞• or there is a BFS . . . that is optimal.

BFSs are “algebraic” representations of the “geometric” extremepoints.

Algorithmic idea

Find an initial BFS x.While x is not optimal:

x← a di�erent BFS.Output x

26 / 38

Adjacent BFS and directions

Definition (Adjacent BFSs)

Two BFSs are adjacent i� their basic matrices di�er in only one basiccolumn.

We want to move between adjacent BFSs because they look“similar”. How do we move among adjacent bfs?

Definition (Feasible direction)

Let x ∈ P . A vector d ∈ Rn is a feasible direction at x if there existsθ > 0 such that x+ θd ∈ P .

Definition (Improving direction)

Consider the LP minx∈P cTx. A vector d ∈ Rn is a improving

direction if cTd < 0.

Starting from a BFS, we want to find a feasible improving directiontowards an adjacent BFS.

27 / 38

Basic (feasible) directions

Fact

Any direction that is feasible at a BFS x must have a stricly positivevalue in at least one of the components corresponding to non-basicvariables of x (why?).

We move in a basic direction d = [dB dN ] that has a strictlypositive value in exactly one of the components corresponding tonon-basic variables, e.g., xj .We fix dN so that

dj = 1

di = 0 for every non-basic index i 6= j

AnddB = −B−1aj .

The choice for dB comes from the requirement Ad = 0 for anyfeasible direction d (why?).

28 / 38

Reduced costs

How do we know if the chosen basic feasible direction is improving?

Definition (Reduced cost)

Let x be a basic solution with basis matrix B, and let cB be thevector of the costs of the basic variables. For each 1 ≤ i ≤ n, thereduced cost ci of variable xi is

ci = ci − cTBB−1ai .

Fact

The basic direction associated with xj is improving i� cj < 0.

Why? (Hint: cB = 0 (why?))

29 / 38

Optimality condition

Definition (Degenerate bfs)

Let x be a bfs. x is said to be degenerate i� |{j : xj > 0}| < m.

Theorem

Let x be a BFS and let c be the corresponding vector of reduced costs.• if c ≥ 0, then x is optimal.

• if x is optimal and nondegenerate, then c ≥ 0.

30 / 38

Step size

• Let d be a basic, feasible, improving direction from the currentBFS x, and let B be the basis matrix for x.

• We want to find move along d to find a BFS x′ adjacent to x thatcan be expressed as x′ = x+ θd for some value θ ∈ R.

How to compute the step size θ?We know that at x′, one of the basic variables in x will be 0.We want to find the maximum θ such that

xB + θdB ≥ 0 .

Let u = −dB = B−1aj . The value of a basic variable xri changes,along d, at a “rate” of xri/uri . Hence we want

θ = minri : uri>0

xriui

.

The basic variable xri that a�ains the minimum is the one thatleaves the BFS (it is a non-basic variable at x′).

31 / 38

Bland’s rule

• What non-basic variable should enter the BFS?

• If more than one basic variable could leave (i.e., more than onebasic variable a�ains the minimum that gives θ), which oneshould leave?

• Why does all this ma�er?Without an accurate choice of the entering/exiting variables, thealgorithm may cycle forever at the same degenerate BFS.

Bland’s anticycling rule (eliminates the risk of cycling)

• Among all non-basic variables that can enter the basis, choose theone with the minimum index.

• Among all basic variables that can exit the basis, choose the onewith the minimum index.

32 / 38

The Simplex algorithm

1 Start from a bfs x, with basic variables xr1 , . . . , xrm and basismatrix B.

2 Compute the reduced costs cj = cj − cTBB−1aj for 1 ≤ j ≤ n. Ifc ≥ 0, then x is optimal.

3 Use Bland’s rule to select the entering variable xj and obtainu = −dB = B−1aj by solving the system Bu = aj . If u ≤ 0,then the LP is unbounded.

4 Compute the step size θ = minri : uri>0xriuri

.

5 Determine the new solution and the leaving variable xri , usingBland’s rule.

6 Replace xri with xj in the list of basic variables.

7 Go to step 1.

33 / 38

Runtime

Fact

There are at most(nm

)basic (feasible) solutions.

Why?

There are artifical LPs where the simplex visits all BFSRuntime is not polynomial.

With distribution assumptions on the input, the simplex performsO(n4) iterations, each taking time O(nm).

For real-world instances, the average is ≤ 3(n−m) steps.

34 / 38

Finding the first BFS

The simplex algorithm needs a BFS to start. How do we find it?

How to find a feasible solution s.t. xi = 0?

Set the objective function to be just xi, and find the optimalsolution to this problem.

If the optimal value is 0 then we found a feasible solution withxi = 0, otherwise there is no such feasible solution.

If we want multiple variables set to 0, set the objective function tobe to the sum of the variables that should be 0.

35 / 38

Two-phases simplex algorithm

Two-phases simplex algorithm

1 introduce artificial variables (not the same as slack variables);

2 easily find initial BFS involving the artificial variables;

3 Set objective function to be the sum of the artificial variables;

4 Run the simplex algorithm on the modified LP;

5 If the optimal value is 0, we obtain a BFS for the original LP. Oth-erwise, the original LP has no feasible solution;

6 If we found an inital BFS for the original problem, run the simplexalgorithm on it.

This procedure is known as the Two-phases simplex algorithm, withsteps 1-5 being Phase 1 and step 6 being Phase 2.

36 / 38

Artificial variables and initial BFS

Introducing the aritificial variables

• Start with a LP in standard form.

• Add to the LHS of each constraint an artificial variable with signequal to the sign of the RHS.

Initial BFS (for the artificial LP)

A BFS with the artificial variables being the basic variables, withvalues equal to the RHS of their constraints (non-artificial variablesare (non-basic) and set to 0).

We can now solve the artificial LP and find a BFS for the original LP,if it exists.

37 / 38

Dual Simplex

Through strong duality, we can compute the dual optimal solutiondual optimal solution a�er having solved the primal optimalsolution with the simplex algorithm.

Observation

Solve the primal by solving the dual with the simplex, and thenobtain a primal optimal solution from the dual optimal solution.

The advantage is that if c ≥ 0, then y = 0 is dual feasible, so we canavoid Phase 1, and go directly to finding the optimal solution.

38 / 38

csci 1951-g optimization methods in finance part 01

Documents