
Math 5593 Linear Programming Lecture Notes

University of Colorado Denver, Fall 2011

Alexander Engau


Abstract

On October 14, 1975, the Royal Swedish Academy of Sciences decided to award the Nobel Prize in Economics in equal shares to Leonid Kantorovich (USSR) and Tjalling C. Koopmans (USA) “for their contributions to the theory of optimum allocation of resources,” which Kantorovich describes in his later autobiography as “the formulation of the basic economic problems, their mathematical form, a sketch of the solution method, and the first discussion of its economic sense.” He concludes that “in essence, it contained the main ideas of the theories and algorithms of linear programming.” This course will explore this award-winning and truly international discipline and provide students with all necessary theoretical and problem-solving skills to successfully model and optimize their own favorite LP application, including but not limited to managerial operations research (transportation, scheduling, planning), statistics (regression), finance (portfolio selection and option pricing), engineering and structural design, and even game theory (poker)! Following the historical developments in the field, the course will start out by discussing the simplex method and some of its variants, then delve into duality theory and convex analysis, take a break and do some networking, and eventually witness the “interior-point revolution” of optimization. What sounds fun on paper will be fun in the classroom and include rigorous proofs and the use of freely available, state-of-the-art mathematical programming and optimization software.

UCD Mathematical Sciences Catalogue Course Description

A linear program is an optimization problem that seeks to minimize or maximize a linear function subject to a system of linear inequalities and equations. This course begins with examples of linear programs and variations in their representations. Basic theoretical foundations covered include polyhedra, convexity, linear inequalities, and duality. Two classes of solution algorithms are given: simplex methods and interior point methods. The primary emphasis of this course is on mathematical foundations, and applications are used to illustrate the main results.

Important Note Regarding These Notes

These lecture notes were first put together in 2009 when I taught the course for the first time, and inevitably they will still include many mistakes. Therefore, please be very critical and always ask if in doubt. Especially if you find a mathematical mistake or believe explanations are missing, unclear, or wrong, I would be very thankful if you let me know personally or by email at [email protected]. If you find one of the many typographical or grammar mistakes, smile about my mistake and don’t worry – I plan to do quite a bit of rewriting and other revisions throughout this run and later editions of the course, and hope that most current typos will vanish “quasi-automatically” over time while I am introducing new ones. Thank you, and enjoy!


Tentative Schedule and Assignments

Week  Topics                                 Book     Notes    Homework

Part 0. Introduction (please read Chapter 1 in the AMPL Book in detail)
 1    Introduction                           1        0-1      Problem Set 1

Part 1a. The Simplex Method
 2    The Simplex Method                     2        2.1-2.2
 3    Degeneracy                             3        2.3-2.4  Problem Set 2

Part 1b. Basic Duality Theory
 4    Duality Theory I                       5        3.1-3.2
 5    Duality Theory II                      5        3.3-3.4  Problem Set 3

Part 1c. Simplex Revisited
 6    Simplex Method in Matrix Notation      6        4.1
 7    Sensitivity and Parametric Analysis    7        4.2      Problem Set 4
 8    Problems in General Form               9        4.3      Midterm Exam

Part 2a. LP Applications (please also scan book chapters 11-13 and 23-24)
 9    Convex Analysis                        10       5        Problem Set 5

Part 2b. Network-Type Problems
10    Network Flow Problems                  14       6.1-6.2
11    Applications                           15       6.3-6.4  Problem Set 6

Part 2c. An Engineering Problem
12    Structural Optimization                16       7        not on exam

Part 3. Interior-Point Methods
13    The Affine-Scaling Method              21       8-9
14    Primal-Dual Methods                    18 (22)  10       Problem Set 7

15    Student Presentations on LP Applications and Extensions

16    Finals Week                                              Final Exam

Book and Required Software (no purchases necessary): Robert J. Vanderbei, Linear Programming – Foundations and Extensions, International Series in Operations Research & Management Science, Volume 114, Third Edition, Springer, 2008, ISBN: 978-0-387-74387-5.

• Auraria Library Online Edition (free, all book chapters downloadable in pdf format): Connect to http://0-dx.doi.org.skyline.ucdenver.edu/10.1007/978-0-387-74388-2

• Accompanying Book Web Page: http://www.princeton.edu/∼rvdb/LPbook/

• AMPL Software: http://www.ampl.com/ (please download the free student edition)

• Learning AMPL: http://ampl.com/BOOK/ch1-2.pdf (chapter 1 of the AMPL book)

• LaTeX Editor and Compiler: Good (and free) choices for Windows are MiKTeX (http://miktex.org/) and TeXnicCenter (http://www.texniccenter.org/)


Table of Contents

Introduction

0 Mathematical Modeling and Optimization

1 LP Applications and Models
1.1 Production and Allocation Problems (Maximizing Profits)
1.2 Diet and Blending Problems (Minimizing Costs)
1.3 The General Linear Programming Problem
1.4 Problem Set 1

I The Simplex Method and Duality

2 The Simplex Method
2.1 An Example
2.2 Pivoting Rules
2.2.1 Basic and Nonbasic Variables
2.2.2 Largest Coefficients and Minimum Ratios
2.2.3 Cycling and Degeneracy
2.2.4 Bland’s Rule
2.3 Phase-I/Phase-II Initialization
2.4 The Simplex Algorithm
2.5 Problem Set 2

3 Duality Theory
3.1 An (Old) Example
3.2 Primal and Dual Problems
3.3 The Dual and Primal-Dual Simplex Method
3.3.1 Dual Pivoting Rules and the Dual Simplex Method
3.3.2 The Primal-Dual Simplex Method
3.4 The General Dual
3.5 Problem Set 3

4 Simplex Variants
4.1 The Simplex Method in Matrix Notation
4.2 Sensitivity and the Parametric Simplex Method
4.2.1 Ranging
4.2.2 The Homotopy Method
4.2.3 The (Parametric) Self-Dual Simplex Algorithm
4.3 The Primal Simplex Method with Ranges
4.4 Problem Set 4

II LP Applications

5 Convex Analysis
5.1 Problem Set 5

6 Network Flow Problems
6.1 Examples
6.1.1 The Transportation Problem (HW 2.1)
6.1.2 The Assignment Problem
6.1.3 The Transshipment Problem
6.2 Some Network (not so much graph) Theory
6.3 The (Primal) Network Simplex Method
6.4 Shortest Paths and Maximum Flows
6.4.1 The Shortest-Path Problem
6.4.2 The Maximum-Flow Problem
6.5 Problem Set 6

7 Structural Optimization

8 Student Projects
8.1 Game Theory
8.2 Statistical Regression
8.3 Financial Applications

III Interior-Point Methods

9 The Interior-Point Revolution

10 The Affine-Scaling Method
10.1 A Generic Linear Programming Algorithm
10.2 The Affine-Scaling Step Direction
10.3 Termination and Phase-II Algorithm
10.4 Initialization and Phase-I Algorithm
10.5 Affine-Scaling for LPs with Inequality Constraints

11 Primal-Dual Methods
11.1 The Primal-Dual Path-Following Method
11.1.1 The Central Path and Path-Following Algorithm
11.1.2 Newton Steps and KKT Systems
11.1.3 Path-Following Versus Affine-Scaling
11.2 Homogeneous Self-Dual Linear Programs
11.3 Problem Set 7

Selected References

A Solutions


List of Figures

1    The five-stage process of mathematical problem solving
1.1  Feasible region and objective level curves of the production model
2.1  The Rolling Mill Problem Linear Program
2.2  The Simplex Algorithm
4.1  The Simplex Algorithm in Matrix Notation
4.2  The (Parametric) Self-Dual Simplex Method
6.1  A transportation network
6.2  An assignment network
6.3  A transshipment network
6.4  A transshipment network with a highlighted initial feasible spanning tree
6.5  The new feasible spanning tree with updated node and arc labels
6.6  The optimal spanning tree with updated node and arc labels after redirecting 5 units of flow along the (undirected) cycle C = {(3, 4), (6, 4), (6, 2), (1, 2), (1, 3)} (minimum cost is 53)
6.7  The shortest-path network used for the examples in Section 6.4.1
6.8  The two shortest-path spanning trees of the examples in Section 6.4.1 with optimal node labels (the two alternatives are indicated by the dashed arcs)
6.9  Six iterations of Dijkstra’s algorithm for finding a shortest path
6.10 A network with positive arc capacities uij > 0
A.1  A polyhedral cone (left), an ice-cream cone (middle), and a (projected) psd cone (right)


Part

Introduction


Chapter 0

Mathematical Modeling and Optimization

Although the fundamentals of optimization can be traced far back to the works of Newton (1643-1727), Lagrange (1736-1813), and Gauss (1777-1855), the first independent developments in optimization started in the late 1930s, when first Kantorovich (1960) in 1939 and then Dantzig (1951) in 1947 presented what is today known as the simplex method for linear programming (LP). This gave birth to the new field of optimization usually referred to as mathematical programming, which has since grown into a mature area within the broad discipline of operations research. Because this field was originally motivated by real-life problems in production, manufacturing, and military operations, it seems valuable to precede our discussion of the theory and methods of linear programming with a few words on the general process of mathematical modeling and problem-solving.

Starting from a practical real-life optimization problem, Figure 1 illustrates a typical five-stage process of “solving” this problem using mathematical programming and optimization.

Figure 1: The five-stage process of mathematical problem solving

1. Mathematical Modeling: the formulation of a real-life problem as an abstract system of variables, objectives, and constraints that represent the general form of this problem.

2. Data Collection: the use of statistical methods to collect data and estimate unknown parameters that can be used to define the specific problem instance to be solved.

3. Programming: the development, design, and implementation of general algorithms for a class of general problems or models to solve specific problem instances of this model.


4. Optimization: the process of actually “solving” the problem, i.e., computing an optimal solution by running an algorithm that finds optimal values for all the decision variables.

5. Analysis: the interpretation and subsequent realization of the found solution in the context of the real-life problem, together with a (critical) analysis of its practical relevance.

Important Note: Your optimal solution may not solve your actual problem! There are plenty of sources for mistakes, including but not limited to modeling, statistical, and computational/numerical errors. If you are not convinced that your optimal solution is meaningful or actually solves your problem, always consider refining your model, verifying and recollecting your data, or implementing or switching to different numerical methods.

Disclaimer For This Course: Unless otherwise stated, all our models, data, and methods are assumed to be flawless. Hence, being able to ignore many of the above issues for the rest of this text, especially Step 5, we will mostly focus on linear programming and optimization (LP/LO) techniques that enable us to perform Steps 3 and 4, respectively. Before we do so, however, we shall look at at least two example applications and their models (you will see some more as part of your homework assignments), and already introduce some first terminology along the way.


Chapter 1

LP Applications and Models

The following two sections are taken from Chapters 1 and 2 in the AMPL book by Fourer et al. (2002), the first of which can be downloaded for free from the AMPL page http://www.ampl.com.

1.1 Production and Allocation Problems (Maximizing Profits)

Consider the following “real-life” problem (the same problem is described in more detail in Chapter 1 of Fourer et al. (2002); see also Exercise 1.1 on page 8 in Vanderbei (2008)): A steel company allocates time on a rolling mill to produce bands and coils so as to maximize its profit. To model and solve this problem, we start to apply the five-stage process from Chapter 0.

1. Mathematical Model: To formulate the model, we first introduce decision variables for the amounts of bands and coils to be produced.

xB number of tons of bands to be produced

xC number of tons of coils to be produced

Next, we need to decide what parameters we want to include into our model.

pB, pC profits in dollars from selling one ton of bands and coils, respectively

rB , rC production rates for each product in tons per hour

uB , uC upper production limits for each product in tons

t available time in rolling mill, in hours

Now we are able to formulate our objective as to maximize profit

maximize profit: pBxB + pCxC

subject to the time (or budget) constraint on the total available time on the rolling mill

(1/rB) xB + (1/rC) xC ≤ t.

Finally, we have production constraints on the produced quantities

0 ≤ xB ≤ uB and 0 ≤ xC ≤ uC .

Hence, the complete mathematical model can be written as

maximize    pB xB + pC xC
subject to  (1/rB) xB + (1/rC) xC ≤ t
            0 ≤ xB ≤ uB
            0 ≤ xC ≤ uC

Note that this is only a model but not yet an actual problem that could be solved for a numerical solution!


2. Data Collection: To bring the above model to life, we need to collect some relevant data and, for simplicity, simply use the information already provided in the books by Fourer et al. (2002) and Vanderbei (2008).

         Profit ($/ton)   Production rate (tons/hr)   Production limit (max tons)
Bands    25               200                         6,000
Coils    30               140                         4,000

Available time on mill: 40 hours

With this data, the specific problem instance becomes

maximize    25 xB + 30 xC
subject to  (1/200) xB + (1/140) xC ≤ 40
            0 ≤ xB ≤ 6000
            0 ≤ xC ≤ 4000

which could now be solved using mathematical programming and/or optimization.
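The course uses AMPL to solve such instances, but any LP solver will do for a quick cross-check. The following sketch (my own illustration, not part of the original notes) assumes SciPy is available and uses scipy.optimize.linprog, which minimizes by convention, so the profit coefficients are negated:

    # Cross-check of the rolling-mill instance with SciPy's LP solver (illustrative only).
    from scipy.optimize import linprog

    c = [-25.0, -30.0]                        # negate to maximize 25 xB + 30 xC
    A_ub = [[1.0 / 200.0, 1.0 / 140.0]]       # hours of mill time per ton of bands and coils
    b_ub = [40.0]                             # available hours on the mill
    bounds = [(0.0, 6000.0), (0.0, 4000.0)]   # production limits

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    print(res.x)      # expected (approximately): [6000. 1400.]
    print(-res.fun)   # expected (approximately): 192000.0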

3. Programming: Since we do not have any algorithm yet, we need to be a little creative to solve this problem. Let’s look at two possible approaches.

The Graphical Method: The idea of the graphical method is to study the geometry of the problem by plotting all constraints and selected objective level curves into a two-dimensional coordinate system whose horizontal and vertical axes correspond to the possible amounts of bands and coils to be produced. For instance, the time constraint

(1/200) xB + (1/140) xC ≤ 40

corresponds to the negative halfspace associated with the hyperplane (which is simply a line in two dimensions)

(1/200) xB + (1/140) xC = 40

or equivalently, in the more common functional notation with xB as the independent and xC as the dependent variable, to the linear function

xC = 5600 − 0.7xB .

Together with the four halfspaces given by the so-called box constraints 0 ≤ xB ≤ 6000 and 0 ≤ xC ≤ 4000, the left plot in Figure 1.1 shows the feasible region of this problem.

Figure 1.1: Feasible region and objective level curves of the production model

Page 12: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

1.2. DIET AND BLENDING PROBLEMS (MINIMIZING COSTS) 11

Plotting the objective function is slightly more difficult, because we do not have a specific “band” or “coil” intercept given. However, we can plot several level curves if we assume an arbitrarily fixed value for the total profit and then plot the corresponding hyperplane for that fixed profit level. The middle plot in Figure 1.1 shows the three level curves for a profit of $120,000, $192,000, and $210,000, and the combined plot on the right shows that $192,000 is indeed the optimal profit. By inspection, we easily see that the optimal production amounts are xB = 6,000 tons of bands and xC = 5,600 − 0.7 xB = 1,400 tons of coils. For a more detailed discussion of this solution method, please consult Fourer et al. (2002).
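If you want to reproduce a plot like Figure 1.1 yourself, the following rough matplotlib sketch (my own, with arbitrarily chosen plot ranges and labels, not the course’s original figure) draws the constraint boundaries and the three profit level curves:

    import numpy as np
    import matplotlib.pyplot as plt

    xB = np.linspace(0, 8000, 400)
    # time constraint boundary (1/200) xB + (1/140) xC = 40, solved for xC
    plt.plot(xB, 5600 - 0.7 * xB, label="time constraint")
    plt.axvline(6000, color="gray", label="xB <= 6000")
    plt.axhline(4000, color="gray", linestyle="--", label="xC <= 4000")
    # level curves 25 xB + 30 xC = constant for three profit levels
    for profit in (120000, 192000, 210000):
        plt.plot(xB, (profit - 25 * xB) / 30, ":", label=f"profit = {profit}")
    plt.xlim(0, 8000); plt.ylim(0, 6000)
    plt.xlabel("xB (tons of bands)"); plt.ylabel("xC (tons of coils)")
    plt.legend(); plt.show()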

A “Simple” Method: An alternative and quite clever method to solve this problem is to not only look at production rates and limits, but also at the two profit rates

pB · rB = 25 [$/ton] · 200 [tons/hour] = 5,000 [$/hour]   and   pC · rC = 30 · 140 = 4,200 [$/hour]

where we have also included the relevant units. Since it is now apparent that the production of bands is more efficient and, thus, more profitable than the production of coils, we would clearly produce as many tons of bands as possible, namely the upper limit of 6,000 tons, using 6000/200 = 30 hours on the mill. Furthermore, since we cannot produce any more bands after that, we would still use the remaining capacity of 10 hours for the production of 10 · 140 = 1400 tons of coils, giving us the same solution as before with an optimal profit of $192,000.

4. Optimization: While we were able to solve the above problem, we may already expect that if we want to solve problems with more than two or three decision variables, it is impractical or simply not possible to use the graphical method anymore. Furthermore, a similarly “simple” method may require a little bit (or a lot) more work if done by hand, for which we will learn the very similar Simplex Method in the next chapter. Until then, however, we can already use one of many existing software packages for linear programming. Since we will use AMPL for most of this course, please follow the introduction already given in class, read Chapter 1 from the AMPL book by Fourer et al. (2002), and see if you can solve the given new examples in extension of the preliminary production model already discussed above.

5. Analysis: Assuming that our model and data are flawless and do not need any further revision, we skip the analysis and have solved our first “real-life” problem in three different ways!

1.2 Diet and Blending Problems (Minimizing Costs)

Of course, we know that our above model was far from realistic. Although profit is commonly defined as the excess of revenue over costs, there may be many other production costs that need to be considered separately and to be kept as small as possible (i.e., minimized). Hence, let us now consider a second problem to minimize costs (yes, while ignoring profits, but you will see that there is no real profit in the next problem). This is originally another classical operations research application in production and manufacturing that deals with the most cost-efficient blending or mixing of a given set of raw materials into a homogeneous product; for a change, however, let us formulate the problem in a way that seems a little bit closer to our own world. Namely, let us consider choosing next week’s meal plan so as to meet minimum nutritional requirements and minimize the cost of our shopping list. In addition, in order to still enjoy eating, we also want to ensure a sufficient variety in the types of food that we consume.

To define the mathematical model, we first need to decide what types of food are available to us, and which nutrients we want to include into our model. So let’s say that we have n items of food such as bread, fruits, vegetables, meats, etc.

FOOD = {1, 2, . . . , n}


and m possible nutrients such as carbohydrates, proteins, fats, vitamins, calcium, sodium, magnesium, etc.

NUTR = {1, 2, . . . ,m}

Our decision variables xj will be the quantities of each food item j ∈ FOOD to buy and to include into our diet, and our parameters will include the cost cj of each item, their minimum and maximum quantities fmin,j and fmax,j to ensure overall variety, and the amounts aij of nutrient i in food item j. Finally, we need to specify minimum and maximum nutritional requirements nmin,i and nmax,i for each nutrient i ∈ NUTR.

Decision variables: xj food quantities to buy

Data parameters: cj cost of food items

fmin,j , fmax,j min/max food quantities for variety

nmin,i, nmax,i min/max nutritional requirements

aij amount of nutrient i in food j

With this setup, we can define our objective as to minimize cost

minimize cost:  ∑_{j∈FOOD} cj xj

while meeting nutritional requirements and ensuring the variety of our diet

nmin,i ≤ ∑_{j∈FOOD} aij xj ≤ nmax,i     for all i ∈ NUTR
fmin,j ≤ xj ≤ fmax,j                    for all j ∈ FOOD

Next, to find our personal optimal diet, we need to specify the actual food items we would like to consume and then collect their relevant nutrition facts before solving the resulting problem instance using AMPL. This will be part of your first homework assignment.
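The homework asks for an AMPL implementation (diet.mod/diet.dat), so purely as an illustration of the model structure, here is a sketch in SciPy with tiny invented placeholder data (two foods, one nutrient); every number below is a made-up assumption, not course data. The two-sided nutrient constraints nmin ≤ Ax ≤ nmax are split into Ax ≤ nmax and −Ax ≤ −nmin:

    # Illustrative diet-model sketch; all data values are invented placeholders.
    import numpy as np
    from scipy.optimize import linprog

    c = np.array([2.0, 3.5])          # cost c_j of each food item
    A = np.array([[30.0, 20.0]])      # a_ij: amount of nutrient i in food item j
    n_min, n_max = np.array([50.0]), np.array([200.0])
    f_min, f_max = np.array([0.0, 0.0]), np.array([5.0, 5.0])

    # n_min <= A x <= n_max  becomes  A x <= n_max  and  -A x <= -n_min
    A_ub = np.vstack([A, -A])
    b_ub = np.concatenate([n_max, -n_min])
    bounds = list(zip(f_min, f_max))

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    print(res.x, res.fun)             # cheapest quantities and total cost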

1.3 The General Linear Programming Problem

This section will be covered essentially identically to Section 1.2 in Vanderbei (2008) (reading assignment!) and introduce the general LP model, some terminology (variables, objectives, constraints, slacks), and the two examples of unbounded and infeasible problems that you will plot in Exercise 1.3 for your first homework assignment.

1.4 Problem Set 1

Exercise 1.1. (The Diet Problem) [3 points] Starting from the diet model developed in class, choose your own sets of food and nutrients, specify all required model parameters, and solve the resulting problem instance using AMPL to find your personal optimal diet. You may use the model file diet.mod from AMPL and only need to modify the file diet.dat according to your chosen sets and parameters. Please paste your data file into your typed solution (e.g., using the verbatim environment in LaTeX), and also submit it separately using the file upload tool on Blackboard.

Exercise 1.2. (Airline Modeling) [6 points] Solve Exercise 1.2 on page 9 in your text book and find the optimal solution using AMPL (you do not need to write different model and data files). In your model formulation, clearly define all variables, parameters, the objective, and all constraints, and again provide your AMPL input file and/or series of commands together with your solution.

Exercise 1.3. (Geometry of Infeasible and Unbounded LPs) [6 points] At the end of Section 1.2, the author discusses the following two LP problems that are infeasible (there are no feasible solutions) or unbounded (there are feasible solutions with arbitrarily large objective values):


maximize    5x1 + 4x2                      maximize     x1 − 4x2
subject to    x1 +  x2 ≤  2                subject to  −2x1 +  x2 ≤ −1
            −2x1 − 2x2 ≤ −9                             −x1 − 2x2 ≤ −2
              x1 ,  x2 ≥  0                              x1 ,  x2 ≥  0

Graph the geometry of these two LPs (including level curves for the feasible problem), and illustrate graphically why the first problem is infeasible and why the second problem is unbounded. If you already know how to plot or include graphics with LaTeX (e.g., if you decide to use Maple, Matlab, Mathematica, etc.), great; otherwise you may also turn in a handwritten solution for this problem.

Exercise 1.4. (Basic Linear Programming Theory) [5 points] In this exercise, you will explore and prove some fundamental properties of linear programs, their representations, and relationships to other optimization problems. For convenience, we will use the matrix-vector notation

A = ( a11  a12  · · ·  a1n )
    ( a21  a22  · · ·  a2n )
    (  ⋮     ⋮     ⋱     ⋮  )
    ( am1  am2  · · ·  amn )  ∈ Rm×n,      b = (b1, b2, . . . , bm)ᵀ ∈ Rm,      and      c = (c1, c2, . . . , cn)ᵀ ∈ Rn.

(a). Representation of LP in Standard and Canonical Form (1 point): Different from the terminology used in our book, many other authors refer to a linear program as being in standard form if it is formulated as a minimization problem with both equalities and nonnegativity constraints

minimize cTx subject to Ax = b and x ≥ 0

and to be in canonical form if all constraints are formulated as greater-or-equal inequalities

minimize cTx subject to Ax ≥ b.

Show that the maximization LP with less-or-equal inequalities and nonnegativity constraints

maximize cTx subject to Ax ≤ b and x ≥ 0

that is discussed in our text book can be converted into both standard and canonical form. [Hint: You are allowed to redefine the problem data A, b, and c and the decision variable x.]

(b). Characterization of LP as Convex and Conic Optimization Problem (4 points): Over the last 20 years, the development of interior-point methods has led to significant generalizations of linear programming to the important areas of convex and conic optimization. Although their detailed study belongs to a series of more advanced topics in optimization, it will be useful to know at least some of the conceptual relationships between these areas. As we explore some of these relationships in the following few exercises, we will also introduce a couple of useful definitions, many of which you should already know from basic linear algebra or real analysis.

(i) Convex Functions and Sets (1 point): First, we review the notion of convexity: A function f : S ⊆ Rn → R is said to be convex if for any two points x and y in the set S

f(λx+ (1− λ)y) ≤ λf(x) + (1− λ)f(y) for all 0 ≤ λ ≤ 1

(the line segment connecting any two points on the graph lies above or on the graph), and f is said to be concave if the same inequality holds with ≤ replaced by ≥ (the line segment connecting any two points on the graph lies below or on the graph – convince yourself that a function f is concave if and only if −f is convex). Similarly, a set S ⊆ Rn is said to be convex if for any two points x and y in S and 0 ≤ λ ≤ 1, the point λx + (1 − λ)y also belongs to S (the line segment connecting any two points in the set also belongs to the set), and it is said to be affine if the same condition holds for any λ ∈ R (the line passing through any two points in the set also belongs to the set). Show that a function f is convex if and only if its epigraph epi(f) = {(x, z) ∈ Rn+1 : f(x) ≤ z} is a convex set.


(ii) Convex Optimization (1 point): An optimization problem is said to be a convex program if it can be formulated to minimize a convex function over a convex set. Show that linear programming is a special case of convex optimization by showing that every LP is a convex program, and give an example of a convex program that is not an LP.

(iii) A Challenge (not for credit but for high recognition): Can you show that every convex program can equivalently be formulated as an optimization problem that minimizes a linear (!) function over a convex set? Hint: Use your result from part (a).

(iv) Polyhedral Sets and Cones (1 point): Next, we review some basic but important concepts from linear algebra: A hyperplane in Rn is a set {x ∈ Rn : aTx = b}, the vector a ∈ Rn is referred to as its normal vector, and the sets {x ∈ Rn : aTx ≥ b} and {x ∈ Rn : aTx ≤ b} are its associated positive and negative halfspaces, respectively (convince yourself that the normal vector is orthogonal to the hyperplane and points ‘into’ the positive halfspace). A set S ⊆ Rn is said to be polyhedral if it is the intersection of a finite number of halfspaces (convince yourself that every polyhedral set is convex). A set K ⊆ Rn is said to be a cone if for any point x ∈ K and any λ ≥ 0, the point λx also belongs to K. Give a geometrical description and example of (i) a (general) cone and (ii) a polyhedral cone (a cone that is polyhedral), and (iii) show that a cone K is convex if and only if K + K ⊆ K, where K + K = {k1 + k2 : k1, k2 ∈ K} is the Minkowski sum of K with itself (since you have already convinced yourself that every polyhedral set is convex, it should also be immediately clear that polyhedral cones are always convex).

(v) Conic Optimization (1 point): An optimization problem is said to be a conic program if it can be formulated to minimize a linear function over the intersection of an affine set with a convex cone (convince yourself that the intersection of convex sets is convex and thus, using (c), that conic programming is a special case of convex programming). Show that linear programming is a special case of conic optimization, and explicitly state the underlying affine set and convex cone for an LP in standard or (careful!) canonical form.

(vi) Another Challenge (just for fun): Can you think of an example of a conic program that is not an LP? Hint: Can you think of a convex cone that is not polyhedral?


Part I

The Simplex Method and Duality


Chapter 2

The Simplex Method

In this chapter, we will learn our first and, in fact, the first algorithm to solve linear programming problems: the famous Simplex Method by Dantzig (1951) and Kantorovich (1960). Although in the meantime challenged by alternative solution methods, including some of the interior-point methods that we will learn about a little bit later in the course, even 60 years after its first discovery the Simplex Method remains one of the most important algorithms in mathematical programming and optimization in general.

2.1 An Example

Let us start by considering the following linear program

maximize    5x1 + 6x2
subject to  7x1 + 10x2 ≤ 56
                    x1 ≤ 6
                    x2 ≤ 4
             x1 ,  x2 ≥ 0

and introduce the nonnegative slack variables w1, w2, w3 to write this problem using only equality constraints

maximize    5x1 + 6x2
subject to  7x1 + 10x2 + w1           = 56
             x1             + w2      =  6
                   x2            + w3 =  4
            x1 , x2 , w1 , w2 , w3 ≥ 0

Like most other optimization algorithms, the Simplex Method starts from an initial feasible point and then repeats a series of improvement steps until it finds an optimal solution. This process is quite similar to us looking for the closest supermarket: Starting from our current location, we first pick a direction and start walking, and then need to decide when to stop. Once we stop, we have either found a supermarket, in which case we are done, or we need to pick a new direction and continue our search, possibly walking along a new path or the same as before or even going back from where we came. In summary:

1. Start from an initial point.

2. Find a search direction along which we can (or hope to) improve.

3. Decide how far we want to move into this direction.

4. Arrive at your next point: if it is optimal, we are done, otherwise go back to 2.


Before we apply the above scheme to our problem, let us rewrite the linear program as a so-called dictionary

ζ = 5x1 + 6x2

w1 = 56 − 7x1 − 10x2

w2 = 6 − x1

w3 = 4 − x2

where the new variable ζ denotes the objective value and, similar to the three slacks, is characterized only in terms of the original decision variables x1 and x2. Hence, we can use the values of the variables on the right, which are also called independent variables, to compute or “look up” (hence “dictionary”) the values of each dependent variable on the left. In particular, to get an initial feasible point, all we need to do is to assign values to the independent variables on the right, which then also implies the values of the dependent variables. The basic (!) idea of the simplex method is to always set all independent variables to 0, so that x1 = x2 = 0 and w1 = 56, w2 = 6, and w3 = 4 becomes our first initial point for the above problem with a current objective value of ζ = 0.

Having found an initial feasible point, next we need to find a “direction” in which we can improve. From the ζ-row in the above dictionary, it is clear that in order to improve (i.e., increase) the current objective value, we need to increase either x1 or x2 (or both) because both have a positive sign and hence can give an increase in the current objective. In the simplex method, however, we always increase only one variable, so let’s choose the one with the larger coefficient, hoping that it will give us a larger improvement: here x2 because 6 > 5.

The dictionary tells us that by increasing x2, we will also change and, in our case, decrease the current values of w1 and w3 (because all coefficients of x1 and x2 are negative). Clearly, to stay feasible we need to make sure that both slacks will remain nonnegative, which then determines by how much we can increase x2:

w1 = 56− 10x2 ≥ 0 ⇔ x2 ≤ 5.6

w3 = 4− x2 ≥ 0 ⇔ x2 ≤ 4

The above analysis makes use of the fact that the other independent variable, here x1, will be kept at zero. Then we see that we can only increase x2 up to a value of 4, because otherwise w3 would become negative. In fact, remember that the initial problem had an upper bound of 4 for x2, and that w3 is simply the slack variable of that constraint. It all makes sense! In particular, after we set x2 = 4 and w3 = 0, we also need to find the new values for all the other variables and can easily use the dictionary to find that w1 = 16, w2 = 6 and x1 = 0 as before, and ζ = 24. Note that as before we have three variables that are positive, and two variables that are zero and thus could be increased to further improve our new objective. To check if this is possible, we simply update the current dictionary to write all nonzero variables as the dependent variables on the left, and express them in terms of the zero variables as the independent variables on the right of the equality sign. We can do this by simply interchanging the roles of w3 and x2 using that

w3 = 4− x2 ⇔ x2 = 4− w3

and then substituting this expression for x2 into each equation of the old dictionary

ζ = 5x1 + 6x2 = 5x1 + 6(4− w3) = 24 + 5x1 − 6w3

w1 = 56− 7x1 − 10x2 = 56− 7x1 − 10(4 − w3) = 16− 7x1 + 10w3

Because w2 = 6− x1 does not depend on x2, we simply keep it as it is and find our new dictionary

ζ = 24 + 5x1 − 6w3

w1 = 16 − 7x1 + 10w3

w2 = 6 − x1

x2 = 4 − w3

Note how the numerical values correspond to the new values that we had already found above. Also note that w3 now has a negative coefficient in the ζ-row and, thus, that another increase in w3 does not improve the new objective. The coefficient of x1 is still positive, however, so that an increase in x1 will also increase our current objective value of 24. Similar to before, we then need to ensure that all dependent variables remain nonnegative:

w1 = 16 − 7x1 ≥ 0 ⇔ x1 ≤ 16/7
w2 = 6 − x1 ≥ 0 ⇔ x1 ≤ 6

Choosing the smaller of the two, we see that x1 = 16/7 and w1 = 0 will give us the new solution, for

which we can directly update the dictionary using that

w1 = 16 − 7x1 + 10w3 ⇔ x1 = 16/7 − (1/7) w1 + (10/7) w3

Substituting this expression into the previous dictionary and replacing w1 by x1, the new dictionary becomes

ζ  = 248/7 − (5/7) w1 + (8/7) w3
x1 = 16/7 − (1/7) w1 + (10/7) w3
w2 = 26/7 + (1/7) w1 − (10/7) w3
x2 = 4 − w3

Now, while the coefficient of w1 has become negative, so that we cannot improve the objective by further increasing w1, the coefficient of w3 that was previously negative is now again positive! Although this may seem surprising, because we previously said that a further increase in w3 would not improve our objective in the previous step, it now does, and we will soon understand why this is. First, however, let us finish this example and decide by how much we can increase w3. Of course, by now you are already an expert in what to do!

w2 = 26/7 − (10/7) w3 ≥ 0 ⇔ w3 ≤ 13/5
x2 = 4 − w3 ≥ 0 ⇔ w3 ≤ 4

Choosing the smaller value and thus interchanging the roles of w2 and w3 using

w2 = 26/7 + (1/7) w1 − (10/7) w3 ⇔ w3 = 13/5 + (1/10) w1 − (7/10) w2

we find the new dictionary

ζ = 38.4 − 0.6w1 − 0.8w2

x1 = 6 − w2

w3 = 2.6 + 0.1w1 − 0.7w2

x2 = 1.4 − 0.1w1 + 0.7w2

Because all objective coefficients are now negative, no further increase of a nonbasic variable can improve the objective, which means we have found an optimal solution! From the dictionary, we readily see that the optimal values are x1 = 6, x2 = 1.4, w1 = w2 = 0, and w3 = 2.6, and the optimal objective value is 38.4.
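As a quick plausibility check outside the notes (again assuming SciPy, as in Chapter 1), the same LP can be handed to an off-the-shelf solver:

    from scipy.optimize import linprog

    # maximize 5 x1 + 6 x2  ->  minimize -5 x1 - 6 x2
    res = linprog(c=[-5, -6],
                  A_ub=[[7, 10], [1, 0], [0, 1]],
                  b_ub=[56, 6, 4],
                  bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)   # expected (approximately): [6.  1.4] 38.4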

Let’s take another look at the optimal solution x = (x1, x2) = (6, 1.4), which should look familiar. In fact, the rolling mill example in Section 1.1 had a very similar solution, 6,000 tons of bands and 1,400 tons of coils to gain a maximal profit of $192,000. It will probably take you just another second to realize that we have just solved the exact same problem as before after rescaling the objective coefficients by a factor of 0.2 (clearly, 38.4/0.2 = 192), the decision variables and production limits by a factor of 0.001, and multiplying the original time constraint by 1.4. So let’s analyze what we just did using the same plot that we already considered in Figure 1.1.

Note that we have two original variables x1 and x2 with nonnegativity constraints (the coordinate axes) and three slack variables w1, w2, and w3 associated with the time constraint (the diagonal) and the upper limits (the vertical and horizontal boundaries of the feasible region), respectively.


Figure 2.1: The Rolling Mill Problem Linear Program

Hence, if for any given point any of the variables is zero, then the point either lies on one of the coordinate axes x1 = 0 or x2 = 0, or on one of the constraints 7x1 + 10x2 = 56, x1 = 6, or x2 = 4. In particular, for any feasible point at most two variables can be zero at the same time, because at most two constraints intersect in a common point. Equivalently, we can conclude that at least three variables must be positive, and exactly three variables are positive at every corner point or so-called vertex of the feasible region.

Starting from the origin, we had found that we can increase the objective if we increase the value of x2, i.e., if we walk “upwards” along the vertical axis. Clearly, we are not allowed to leave the feasible region and must stop as soon as we hit the first constraint, which is the upper bound on x2. Note that the associated slack variable is w3, which now becomes zero – exactly what we had found before. Also note that the distance to the time constraint, measured by w1, has become smaller, while the distance to the upper bound on x1 has not changed. Yep, we knew that already!

In our second step, we decided to increase the value of x1, i.e., walk to the right until hitting the time constraint, or equivalently, reducing w1 to zero. But we were still not optimal, because then we could again increase w3, or equivalently, increase the distance (walk further away) from the horizontal upper bound. Remember how we previously wondered why we first reduced w3 to zero and then again increased its value – but now it makes sense! Because at the end we need to arrive at the optimal solution, which of course is the same one we had already found in Section 1.1.

Finally, you may notice that there could have been a faster way to solve this problem, namely if we had decided not to increase x2 but to start with x1, bringing us right below the optimal solution. To see what would have happened in that case, let’s start again with our first dictionary

ζ = 5x1 + 6x2

w1 = 56 − 7x1 − 10x2

w2 = 6 − x1

w3 = 4 − x2

The plot tells us that by starting from the origin and walking along the horizontal axis, the first constraint we run into is the vertical upper bound with the slack variable w2. Not at all surprising anymore, we accordingly find that

w1 = 56 − 7x1 ≥ 0 ⇔ x1 ≤ 8

w2 = 6− x1 ≥ 0 ⇔ x1 ≤ 6


so that we replace w2 = 6− x1 with x1 = 6− w2 resulting in the new dictionary

ζ = 30 − 5w2 + 6x2

w1 = 14 + 7w2 − 10x2

x1 = 6 − w2

w3 = 4 − x2

corresponding to the lower right point in the above plot. Since x2 has the only positive coefficient, we now increase x2, which has to bring us to the optimal solution. Clearly, the final dictionary will then also be identical to the optimal dictionary we had found before. Check it!

2.2 Pivoting Rules

In the above section, we have seen that the number of steps taken by the simplex method depends on which of the independent variables we choose to increase from zero, or move from the set of independent variables to the set of dependent variables. In fact, in this section we will see that a proper choice is crucial for the simplex method to find an optimal solution at all! Before we start, however, let us introduce some common terminology and useful notation.

2.2.1 Basic and Nonbasic Variables

Let us consider a general linear program with n decision variables and m constraints

maximize    ∑_{j=1}^n cj xj
subject to  ∑_{j=1}^n aij xj ≤ bi ,    i = 1, . . . , m
            xj ≥ 0 ,    j = 1, . . . , n

whose initial dictionary can be written as

ζ  = ∑_{j=1}^n cj xj
wi = bi − ∑_{j=1}^n aij xj ,    i = 1, . . . , m.        (2.1)

The simplex method repeatedly increases the objective value ζ by interchanging the role of one dependent and one independent variable until all coefficients in the objective function are negative or zero, upon which no further improvement is possible. Since the method does not really distinguish between original variables and slack variables, it will be convenient to give up this distinction as well and simply denote

(x1, x2, . . . , xn, w1, w2, . . . , wm) = (x1, x2, . . . , xn, xn+1, xn+2, . . . , xn+m).

Given any dictionary, the set of indices of the dependent variables is then said to form a basis, and the associated variables are also called the basic variables. Accordingly, the independent variables are called the nonbasic variables, and setting all nonbasic variables to zero gives a point that is also called a basic feasible solution. Denoting the index sets of the basic and nonbasic variables by B and N, respectively, any dictionary can then be written in the form

ζ  = ζ̄ + ∑_{j∈N} c̄j xj
xi = b̄i − ∑_{j∈N} āij xj ,    i ∈ B

where the bar-notation indicates that the current objective value ζ̄, right-hand sides b̄i, and objective and constraint coefficients c̄j and āij repeatedly change as we move from one point to another. Now interchanging a dependent (basic) variable with an independent (nonbasic) one, we call this interchange a pivot and also say that the basic variable becomes nonbasic or leaves the basis, and similarly, that the nonbasic variable becomes basic or enters the basis. In particular, note that the number of basic variables is always equal to the number of constraints! You will discover some other interesting relationships between basic feasible solutions and the general solutions of linear systems of equalities as part of one of your future homework assignments.

2.2.2 Largest Coefficients and Minimum Ratios

Pivoting rules provide guidance on how to choose the entering and leaving variables so as to guarantee improvement of the current basic feasible solution, and to eventually find an optimal solution. One of the first and still most popular pivoting rules is the largest-coefficient rule, which we intuitively already used ourselves when we solved the example in Section 2.1.

Largest-Coefficient Rule (LCR): Choose the entering variable k ∈ N such that ck is positive (clear!) and maximal among all positive objective coefficients:

k = arg max_{j∈N} { cj : cj > 0 }.

To decide which variable to remove from the basis, we now follow the same idea as before and investigate by how much we can increase the new basic variable xk while ensuring that all other variables remain nonnegative:

xi = bi − aik xk ≥ 0  ⇔  { xk ≤ bi/aik if aik > 0,   xk ≥ bi/aik if aik < 0 }     for all i ∈ B.

Clearly, if aik = 0, then the value of xi does not change at all, and it only becomes bigger if aik is negative. Hence, we can restrict our attention to those basic variables for which aik is positive, and particularly to that one for which the ratio bi/aik is minimal. We have just derived the minimum-ratio test.

Minimum-Ratio Test (MRT): Choose the leaving variable l ∈ B such that aik > 0 and bi/aik is minimal:

l = arg min_{i∈B} { bi/aik : aik > 0 }.

The following list collects several remarks on the above pivoting rules.

1. Note that neither rule specifies how to break ties. Hence, if there are two nonbasic variables with the same objective coefficient, or two basic variables with the same minimum ratio, let’s say for now that we pick one of the two arbitrarily.

2. It is easy to see that the minimum ratio directly implies the new basic variable’s value

   xk = min_{i∈B} { bi/aik : aik > 0 }.

   Furthermore, it is then also clear that the objective value will increase by an amount of ck · bi/aik for the minimizing index i.

3. If all cost coefficients cj are negative or zero, however, then it follows that we cannot further improve the current point and that we have found an optimal solution. Is this clear, or does it need a proof? Think about it!

4. On the other hand, if all coefficients aik are nonpositive, then we can increase xk by an arbitrary amount without reducing any of the xi to zero. Hence, we can also increase the objective function by an arbitrary amount, showing that the problem in this case is unbounded. This is discussed in slightly more detail in Section 2.4 in Vanderbei (2008). Read it!


5. We usually assume that we have a feasible point, so we implicitly assume that all bi are nonnegative (we will see later what to do otherwise). However, it is not clear what happens if some of the bi are zero, i.e., if we have a basic variable that is zero. It might not be a big deal, but sometimes it might be, and ugly things can happen, in general. Be ready for it in the next section!

6. Finally, for the case that all bi > 0, Vanderbei (2008) also uses an equivalent maximum-ratio test

   l = arg max_{i∈B} { aik/bi }

which (correctly) does not restrict the aik to be positive. Do you see why?
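Putting the largest-coefficient rule and the minimum-ratio test together, the following sketch (my own illustration, not from the notes; it assumes NumPy, keeps one column per variable instead of swapping columns, and is only meant for small phase-II examples with b ≥ 0 and without any anti-cycling safeguard) runs the simplex method on the dictionary data of (2.1):

    import numpy as np

    def simplex(c, A, b, max_iters=100):
        """Textbook phase-II simplex for: maximize c'x subject to Ax <= b, x >= 0,
        assuming b >= 0 so that the all-slack basis is feasible. Entering variable by
        the largest-coefficient rule, leaving variable by the minimum-ratio test."""
        m, n = A.shape
        T = np.hstack([A, np.eye(m)])             # constraint coefficients for x and slacks
        bbar = b.astype(float)                    # current right-hand sides b_i
        cbar = np.concatenate([c, np.zeros(m)])   # current objective coefficients c_j
        basis = list(range(n, n + m))             # the slacks w_1, ..., w_m start in the basis
        zeta = 0.0                                # current objective value
        for _ in range(max_iters):
            k = int(np.argmax(cbar))              # largest-coefficient rule
            if cbar[k] <= 0:
                break                             # optimal: no positive coefficient left
            ratios = [bbar[i] / T[i, k] if T[i, k] > 0 else np.inf for i in range(m)]
            r = int(np.argmin(ratios))            # minimum-ratio test
            if ratios[r] == np.inf:
                raise ValueError("problem is unbounded")
            # Pivot: scale row r, then eliminate column k from the other rows and the objective.
            piv = T[r, k]
            T[r, :] /= piv
            bbar[r] /= piv
            for i in range(m):
                if i != r:
                    bbar[i] -= T[i, k] * bbar[r]
                    T[i, :] -= T[i, k] * T[r, :]
            zeta += cbar[k] * bbar[r]
            cbar = cbar - cbar[k] * T[r, :]
            basis[r] = k
        x = np.zeros(n + m)
        for i, j in enumerate(basis):
            x[j] = bbar[i]
        return x[:n], zeta

    # The example from Section 2.1:
    c = np.array([5.0, 6.0])
    A = np.array([[7.0, 10.0], [1.0, 0.0], [0.0, 1.0]])
    b = np.array([56.0, 6.0, 4.0])
    print(simplex(c, A, b))    # expected (up to rounding): x = [6., 1.4], objective 38.4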

2.2.3 Cycling and Degeneracy

To see what can happen if parts of the right-hand side vanish, let us now look at the following famous example by Beale (1955).

maximize    (3/4)x1 − 20x2 + (1/2)x3 − 6x4
subject to  (1/4)x1 −  8x2 −      x3 + 9x4 ≤ 0
            (1/2)x1 − 12x2 − (1/2)x3 + 3x4 ≤ 0
                                   x3      ≤ 1
            x1 , x2 , x3 , x4 ≥ 0

To avoid some of the tedium of the dictionary notation, we now drop the variable names and present the same problem in a so-called (reduced) simplex tableau

          x1     x2     x3     x4        1 (RHS)
ζ   (=)  3/4   −20    1/2     −6    (+)  0
w1  (+)  1/4    −8     −1      9    (=)  0
w2  (+)  1/2   −12   −1/2      3    (=)  0
w3  (+)    0     0      1      0    (=)  1

Note that there are some subtle differences to the dictionary notation. First, both the coefficients of the objective function and the constraints are recorded as they are, which means that the summation and equality signs for the objective function ζ and the linear equality constraints occur at different positions (they are usually dropped but given here for clarity). Note, however, how this corresponds to the dictionary notation (2.1), which multiplies the coefficients aij that appear in the simplex tableau by an extra −1 (in front of the summation sign). As another difference to the dictionary notation, the right-hand side (RHS) values are now recorded on the far right, and the column is often labeled “1”, which makes sense if we think about the tableau entries as multipliers, or factors to be multiplied by x1, x2, x3, x4, and 1, respectively.

Now starting from the above first tableau and applying the largest-coefficient rule, we see that x1 will enter the basis, and the minimum-ratio test gives us a choice of either w1 or w2 as leaving variable. For now and all following tie-breakers, let us always choose the “first” variable to leave the basis, i.e., the variable with the smaller index. Proceeding as we did for the dictionary, we pivot between x1 and w1 by rewriting and substituting

w1 + (1/4)x1 − 8x2 − x3 + 9x4 = 0 ⇔ x1 = −4w1 + 32x2 + 4x3 − 36x4

leading to the new tableau (check the details)

        w1     x2     x3     x4      1
ζ       −3      4    7/2    −33     0
x1       4    −32     −4     36     0
w2      −2      4    3/2    −15     0
w3       0      0      1      0     1

and, after the next pivot between x2 and w2,

        w1     w2     x3     x4      1
ζ       −1     −1      2    −18     0
x1     −12      8      8    −84     0
x2    −1/2    1/4    3/8  −15/4     0
w3       0      0      1      0     1


where we have now dropped all unnecessary summation and equality signs. The series below shows what happens next, together with the associated pivots.

Pivoting x3 with x1:

        w1     w2     x1     x4      1
ζ        2     −3   −1/4      3     0
x3    −3/2      1    1/8  −21/2     0
x2    1/16   −1/8  −3/64   3/16     0
w3     3/2     −1   −1/8   21/2     1

Pivoting x4 with x2:

        w1     w2     x1     x2      1
ζ        1     −1    1/2    −16     0
x3       2     −6   −5/2     56     0
x4     1/3   −2/3   −1/4   16/3     0
w3      −2      6    5/2    −56     1

Pivoting w1 with x3:

        x3     w2     x1     x2      1
ζ     −1/2      2    7/4    −44     0
w1     1/2     −3   −5/4     28     0
x4    −1/6    1/3    1/6     −4     0
w3       1      0      0      0     1

Pivoting w2 with x4:

        x3     x4     x1     x2      1
ζ      1/2     −6    3/4    −20     0
w1      −1      9    1/4     −8     0
w2    −1/2      3    1/2    −12     0
w3       1      0      0      0     1

Quite a surprise! After six pivots, we find that the new tableau is identical to the first tableau, and after another critical look it turns out that all basic feasible solutions correspond to the same point x1 = x2 = x3 = x4 = w1 = w2 = 0 and w3 = 1. This “phenomenon” is called cycling and is due to degeneracy of the associated basic feasible solution.

Definition (Degeneracy). A basic feasible solution x = (xB, xN) is said to be degenerate if xi = 0 for some i ∈ B and N ≠ ∅ (a technical but important detail). Similarly, an LP is said to be degenerate if it has at least one degenerate basic feasible solution.

The difficulty with degenerate solutions is that we can represent the same solution using differentbases because we can replace the zero variable that is currently in the basis with any other currentlynonbasic variable, giving us “a new look” for the the same solution. In fact, each tableau in theformer example represents the same solution with only one nonzero variable w3 = 1, but they alllook quite different. Moreover, if we repeated the simplex method using the above pivoting rules,then we would continue to cycle between these different representation of the same point forever.Luckily, there are several rules that prevent this undesired behavior, one of which we discuss in thefollowing section.

Aside: Pivoting in the Simplex Tableau You will have experienced already that it can bequite a bit of work to update the dictionaries or tableau after every pivot. Luckily, we can simplifyour work later if we put in some extra effort now! Let us start from an initial dictionary

ζ = ζ +∑

j∈N cjxj

xi = bi −∑

j∈N aijxj , i ∈ B

and assume that we have decided to pivot between variables xk with k ∈ N and xl with l ∈ B. Theassociated simplex tableau before and after this pivot can be written as

xk xj (j 6= k) 1

ζ ck cj ζ

xl alk alj blxi (i 6= l) aik aij bi

xl xj 1

ζ c∗k c∗j ζ∗

xk a∗lk a∗lj b∗lxi a∗ik a∗ij b∗i

Page 25: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

24 CHAPTER 2. THE SIMPLEX METHOD

To derive the relationships between the old and new coefficients, let us analyze what happens if weinterchange the roles between xl and xk in the initial dictionary. As always, we first rewrite

xl = bl −∑

j∈N

aljxj = bl −∑

j∈N\{k}

aljxj − alkxk

⇔ xk =1

alk

bl −∑

j∈N\{k}

aljxj − xl

and then substitute this expression into the other constraints and objective function

xi = bi −∑

j∈N

aijxj = bi −∑

j∈N\{k}

aijxj −aik

alk

bl −∑

j∈N\{k}

aljxj − xl

=

(

bi −aik

alkbl

)

−∑

j∈N\{k}

(

aij −aik

alkalj

)

xj −

(

−aik

alk

)

xl

ζ = ζ +∑

j∈N

cjxj = ζ +∑

j∈N\{k}

cjxj +ckalk

bl −∑

j∈N\{k}

aljxj − xl

=

(

ζ +ckalk

bl

)

+∑

j∈N\{k}

(

cj −ckalk

alj

)

xj +

(

−ckalk

)

xl

Hence, we see that we can update the new simplex tableau with the new basis B∗ = B ∪ {k} \ {l}and N ∗ = N ∪ {l} \ {k} using the simple pivoting rules

a∗kl =1

alk

a∗kj =alj

alkfor j ∈ N ∗ \ {l}

a∗il = −aik

alkfor i ∈ B∗ \ {k}

a∗ij = aij −alj aik

alkotherwise

b∗k =blalk

b∗i = bi −blaik

alkfor i ∈ B∗ \ {k}

c∗l = −ckalk

c∗j = cj −alj ckalk

for j ∈ N ∗ \ {l}

ζ∗ = ζ +blckalk

2.2.4 Bland’s Rule

In contrast to the largest-coefficient rule, Bland’s Rule could also be called the “smallest index”rule because it specifies to always choose the variable with the smallest index among all “suitable”variables to enter or leave the basis (suitable here means that ck > 0 for the entering variable withk ∈ N , and that bl/alk is the minimum ratio for the leaving variable with l ∈ B). In particular,Bland’s Rule does not allow ties and always decides in favor of the variable with the smaller index.It is time for our first theorem!

Theorem. Under Bland’s rule, the Simplex Method never cycles.

Page 26: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

2.2. PIVOTING RULES 25

Proof (by contradiction). We will prove the above theorem in three steps.

1. Assume that there is a cycle and let xt be the leaving variable with largest index, and xs beits replacing variable in the same dictionary

ζ = ζ +∑

j∈N cjxj

xi = bi −∑

j∈N aijxj, i ∈ B

We collect several (obvious?) facts that we will need for some of our later arguments:

t > s, t ∈ B, s ∈ N , cs > 0, ats > 0, bt = 0 (becausebtcsats

= 0) (*)

2. Because we cycle, there must be another dictionary in which xt re-enters a new basis B∗ (andthere is no other candidate with j < t to enter this basis), so we have that c∗t > 0 (�) and

ζ = ζ +∑

j∈N ∗ c∗jxj

xi = b∗i −∑

j∈N ∗ a∗ijxj, i ∈ B∗

where the objective value ζ has not changed because it can never change in a cycle. Now letc∗j = 0 for all j ∈ B∗ (◦) and write ζ = ζ +

∑n+mj=1 c∗jxj, then (with some deep thoughts!)

c∗j

{

= 0 if x ∈ B∗

< 0 if x ∈ N ∗ because s < t but xt enters basis (rule!)⇒ c∗s ≤ 0 (4)

3. Now, if we were to increase xs = y > 0 in the dictionary in Step 1 (for a moment ignoringthe issue of staying feasible) and let xj = 0 for all j ∈ N \ {s}, then

ζ = ζ + csy and xi = bi − aisy for all i ∈ B.

Alternatively, from Step 2, we also have that

ζ = ζ + c∗sy +∑

i ∈B

c∗i (bi − aisy)

and equating the two former expressions for ζ (and some rewriting) yields that

(

cs − c∗s +

i∈B

c∗i ais

)

y =∑

u∈B

c∗i bi.

Now, since y > 0 was chosen arbitrary, the above equation can only be true for any y if allother terms on the left and right side are equal to zero, so

cs︸︷︷︸

>0 by (∗)

−c∗s︸︷︷︸

≥0 by (4)

+∑

i∈B

c∗i ais = 0

which implies that∑

i∈B c∗i ais < 0 and, thus, c∗r ars < 0 for at least one index r ∈ B (∇). We

are almost done! Because now, it follows that c∗r 6= 0 and r 6= t since c∗r ars < 0 but c∗t ats > 0by (�) and (*), and further, that r /∈ B∗ by (◦) so that xr must also enter and leave the basisduring the cycle and, thus, that r < t as well (because t is the largest such index). Hence,we must find that c∗r < 0 because xt has entered the basis B∗ (otherwise, if c∗r > 0, we wouldhave violated Bland’s rule) which is only possible if ars > 0 by (∇). But if ars > 0 when xs

entered the basis, then xr was also a suitable variable to leave the basis B when xt left thebasis although r < t – which also would be a contradiction to Bland’s rule! Hence, our initialassumption must have been wrong, and there can be no cycle which we wanted to show.

Page 27: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

26 CHAPTER 2. THE SIMPLEX METHOD

2.3 Phase-I/Phase-II Initialization

So far, we have always assumed that all right-hand sides were initially nonnegative which madeit very easy to find an initial feasible solution by setting all original variables to zero and allslack variables equal to these right-hand-sides. In the previous section, we have also seen thatunder certain pivoting rules, the Simplex Method may cycle if some of the right-hand sides vanishbecause of the phenomenon that we called degeneracy. We then discussed Bland’s rule to showthat we can avoid cycling by always choosing that variable to enter or leave the basis that has thesmallest index among all suitable candidates.

Nevertheless, we still have not resolved the difficulty that some of the right-hand sides may benegative, in which case we do not have an initial feasible solution to start the simplex method asoutlined above. To find an initial feasible point, we now introduce the Phase-I/Phase-II methodthat first finds an initial feasible solution, and then solves the original problem exactly as we didbefore. Let us look at the following example (Exercise 2.3 in Vanderbei (2008)) and its initialdictionary

max 2x1 − 6x2

s.t. −x1 − x2 − x3 ≤ −22x1 − x2 + x3 ≤ 1x1 , x2 , x3 ≥ 0

ζ = 2x1 − 6x2

x4 = −2 + x1 + x2 + x3

x5 = 1 − 2x1 + x2 − x3

Clearly, if we now set all nonbasic variables to zero, then x4 = −2 < 0 becomes negative which isinfeasible! However, this does not necessarily mean that the problem is infeasible (it may, but italso may not). To settle this issue and find a feasible solution, if it exists, we typically solve thefollowing auxiliary Phase-I problem

maximize − x0

subject to − x0 +

n∑

j=1

aijxj ≤ bi i = 1, . . . ,m

xj ≥ 0 j = 1, . . . , n

where x0 is an auxiliary variable that is bounded below by 0 and to be minimized (note thatmaximization of the negative variable −x0 is the same as minimization of the positive variable x0).The following two observations are straightforward.

1. The auxiliary problem is always feasible (e.g., choose x0 ≥ −min{bi, 0} and xj = 0 for allj 6= 0) and never unbounded (because the smallest possible objective value is 0).

2. The original LP is feasible if and only if the optimal objective value of the auxiliary problemis 0 (because only then we can drop the redundant variable x0).

To see how to solve the auxiliary problem, let us write this problem as simplex tableau with anadditional row to also record the original objective (we will need it later!).

x0 x1 x2 x3 1

ζ (0) (2) (−6) (0) (0)

−1 0 0 0 0

x4 -1 −1 −1 −1 −2x5 −1 2 −1 1 1

The pivoting rule for this infeasible tableau is very simple: let the auxiliary variable x0 be theentering variable, and choose the most negative basic slack variable as leaving variable, so in thiscase, pivot between x0 and x4. The new (and now feasible) tableau is given below on the left, butit is still not optimal because there are still positive objective coefficients in the auxiliary second

Page 28: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

2.4. THE SIMPLEX ALGORITHM 27

ζ-row. Then choosing to pivot between x2 and x0, the next (and optimal) tableau is given belowon the right.

x4 x1 x2 x3 1

ζ (0) (2) (−6) (0) (0)

−1 1 1 1 2

x0 −1 1 1 1 2x5 −1 3 0 2 3

x4 x1 x0 x3 1

ζ −6 8 (6) 6 −12

(0) (0) (−1) (0) (0)

x2 −1 1 (1) 1 2x5 −1 3 (0) 2 3

To see that the above tableau on the right is optimal for the auxiliary problem, note that allobjective coefficients in the second ζ-row are now nonpositive. Furthermore, note that both x0 andthe new objective value are 0, so that we can drop this variable together with the auxiliary objectivefunction and start from the remaining new dictionary as initial feasible point (x1, x2, x3, x4, x5) =(0, 2, 0, 0, 3) with objective value ζ = −12 for the regular Phase-II problem, as shown below onthe left. Finally, after another pivot between x3 and x5 (or another series of steps based on theparticular pivoting rule chosen), we find the optimal solution (x1, x2, x3, x4, x5) = (0, 0.5, 1.5, 0, 0)with objective value ζ = −3 as shown in the tableau on the right.

x4 x1 x3 1

ζ −6 8 6 −12

x2 −1 1 1 2

x5 −1 3 2 3

x4 x1 x5 1

ζ −3 −1 −3 −3

x2 −12 −1

2 −12

12

x3 −12

32

12

32

This Phase-I/Phase-II approach can be used whenever we have an initial problem for which some(or all) of the right-hand-side coefficients are negative. Clearly, if we don’t have any, then we canomit a Phase-I and immediately start with the regular Phase-II problem like we had done before.

2.4 The Simplex Algorithm

The flowchart in Figure 2.2 summarizes the Simplex Algorithm that we have derived in this section.

As a direct consequence of Bland’s Rule, that shows that this algorithm never cycles under thesmallest-index pivoting rule and consequently terminates in a finite number of steps (because thereis only a finite number of different bases that we can choose), our previous proof also proves thefollowing important result.

Theorem (Fundamental Theorem of Linear Programming). Any linear program of the form

max cTx s.t. Ax ≤ b, x ≥ 0

either has an optimal solution, is infeasible, or is unbounded. If a feasible or optimal solution exists,then a basic feasible or optimal solution exists, respectively.

2.5 Problem Set 2

Exercise 2.1. (Mathematical Modeling and Optimization using AMPL) [5 points] Goingback to our initial steel production problem, let us now assume that we have decided to producesteel coils at three different mill locations, in the following amounts (in tons):

GARY Gary, Indiana 1400CLEV Cleveland, Ohio 2600PITT Pittsburgh, Pennsylvania 2900

The total of 6,900 tons must be shipped to meet the following orders from seven automobile factories:

Page 29: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

28 CHAPTER 2. THE SIMPLEX METHOD

max cTx

s.t. Ax ≤ b

x ≥ 0

Phase I:

max − x0

s.t. − x0 +Ax ≤ b

x, x0 ≥ 0

b ≥ 0?

x0 = 0?

Phase II:

ζ = ζ +∑

j∈N

cjxj

xi = bi −∑

j∈N

aijxj, i ∈ B

l = min{

bk

aik: aik > 0

}

and pivot xk ↔ xl

cj ≤ 0 for all j? ajk ≤ 0 for all k?

INFEASIBLE OPTIMAL UNBOUNDED

no

no

yes

yes

yes

no

ck > 0

no

yes

Figure 2.2: The Simplex Algorithm

Page 30: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

2.5. PROBLEM SET 2 29

FRA Framingham, Massachusetts 900DET Detroit, Michigan 1200LAN Lansing, Michigan 600WIN Windsor, Ontario 400STL St. Louis, Missouri 1700FRE Fremont, California 1100LAF Lafayette, Indiana 1000

The shipping costs for one ton of coils from each mill to each of the plants are given as follows:

FRA DET LAN WIN STL FRE LAF

GARY 39 14 11 14 16 82 8CLEV 27 9 12 9 26 95 17PITT 24 14 17 13 28 99 20

What is the least expansive plan for shipping the coils from mills to plants? Write a general AMPLmodel for this type of problem, and then use your model to solve this specific problem instance.

Exercise 2.2. (Solving Linear Programs using the Simplex Method) [6 points] Solve thethree (very similar) Problems 2.5, 2.6, and 2.7 on pages 24/25 in your text book. Solve one of theproblems using dictionaries, another one using the simplex tableau, and the third using the simpleonline pivot tool from the book’s web page (you can choose for yourself what method to use foreach problem, and may also use the online tool to check your two other answers):

http://campuscgi.princeton.edu/∼rvdb/JAVA/pivot/simple.html (use labels: “Primal w/ x0”)

For each problem, show all (infeasible and feasible) intermediate iterates with their current objectivevalue in the appropriate form (dictionary, tableau, or numerical values if you are using the tool).

Exercise 2.3. (Pivoting Challenge) [5 points] Do the Pivoting Challenge in Problems 2.12and 2.13 on page 26 in your text book (solve ten problems each) using the following settings:

rows: 5, cols: 4, seed: 2011, no. of probs: 10, instructor’s email: [email protected]

Note that these challenges are different from the online pivot exam on the book’s web page:

Problem 2.12: http://campuscgi.princeton.edu/∼rvdb/JAVA/pivot/primal.htmlProblem 2.13: http://campuscgi.princeton.edu/∼rvdb/JAVA/pivot/primal x0.html

You may take each challenge multiple times to get familiar with the tool or to improve your previousscores, if you wish. Only the last score submitted before the due date of this assignment will count!

Exercise 2.4. (Some More Theory and Proofs) [4 points] For Assignment 1, we have re-viewed several concepts from basic linear algebra and analysis, including the concept of convexfunctions and sets, hyperplanes and normal vectors, and halfspaces and polyhedral (linear) cones.This week, we will continue our theoretical study of linear programs and derive a geometrical condi-tion for optimality of LPs in canonical form (note that we already know an analytical condition forLPs in standard from, namely that a feasible solution is optimal if the current objective coefficientscj for all nonbasic variables xj with j ∈ N are nonpositive). Recall that the canonical-form LPminimizes a linear function cTx only subject to greater-or-equal inequalities Ax ≥ b but withoutexplicit nonnegativity constraints. For simplicity, we will write the ith row of the matrix A as aT

i

(so that ai is a column vector!) and the ith component of b as bi (so that bi is a scalar!).

1. Ascent and Descent Directions (1 point): First, let us observe that each inequalityconstraint aT

i x ≥ bi restricts feasibility to a positive halfspace {x : aTi x ≥ bi} with normal

vector ai. Hence, the feasible region S = {x : Ax ≥ b} =⋃

i{x : aTi x ≥ bi} is the intersection

of a finite number of halfspaces and thus a polyhedral cone, i.e., a convex set of rays emanatingfrom the origin, or using different terminology, a set of directions. Now given a normal vector

Page 31: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

30 CHAPTER 2. THE SIMPLEX METHOD

ai and a point x, we call a nonzero vector d an ascent, descent, or orthogonal direction withrespect to ai if aT

i (x + λd) is an increasing, decreasing, or constant (neither increasing nordecreasing) function in λ for all λ > 0, respectively. Give an analytical condition on aT

i d anda geometrical condition on the angle between the vectors ai and d such that d is an ascent,descent, or orthogonal direction with respect to ai, respectively, and use your condition toshow that the normal vector ai is an ascent direction with respect to itself.

2. Active Constraints and Sets (1 point): Now let x′ ∈ S = {x : Ax ≥ b} be a feasiblepoint, then the ith constraint is said to be active or inactive at x′ if aT

i x′ = bi or aT

i x′ > bi,

respectively. The active set at x′ is defined as the set of indices of active constraints at x′ anddenoted by A(x′). The active constraint matrix at x′, denoted by Ax′ , is the matrix of thoserows aT

i of A that are active at x′, i ∈ A(x′). Show that if x′ and x are two feasible points,then Ax′(x− x′) ≥ 0. Is the converse true? Give a proof, or present a counterexample.

3. Feasible Directions (1 point): Similar to feasible points, we define the notion of feasibledirections. If x′ is feasible to aT

i x ≥ bi, then a nonzero direction d is called a feasible directionat x′ with respect to the constraint aT

i x ≥ bi if there exists a positive scalar σ > 0 such thataT

i (x′ + λd) ≥ bi for all 0 ≤ λ ≤ σ. Analogously, d is called a feasible direction at x withrespect to the system Ax ≥ b if there exists σ > 0 such that A(x+ λd) ≥ b for all 0 ≤ λ ≤ σ.Show that d is feasible at x′ if and only if Ax′d ≥ 0, i.e., aT

i d ≥ 0 for all i ∈ A(x′). Use thisresult to conclude that if all constraints in the system Ax ≥ b are inactive at a feasible pointx′, then every direction is a feasible direction at x′ with respect to that system.

4. Optimality Condition for LPs in Canonical Form (1 point): Finally, we have arrivedat the point where can prove the desired optimality condition. First, it is clear (otherwiseprove it) that a solution x∗ is optimal for an LP in canonical form if and only if x∗ is feasible,Ax∗ ≥ b, and cTx∗ ≤ cTx for any other feasible x ∈ S (because we minimize). Alternatively,we can characterize an optimal solution geometrically by the absence of any feasible descentdirection with respect to the normal vector c: Show that a feasible point x∗ is optimal if andonly if cTd ≥ 0 for all feasible directions d at x∗, or equivalently, for all d such that Ax∗d ≥ 0.

Page 32: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

Chapter 3

Duality Theory

Now that we are able to model, formulate, and solve linear programs, it seems that we have all weever need and may wonder what else there to learn about linear programing in this course. In thischapter, we will answer this question with one of the most beautiful aspects of LP theory: duality!

3.1 An (Old) Example

Let us recall our very first steel production problem from Section 1.1: A steel company wishes toallocate time on a rolling mill to produce bands and coils and to maximize their profit, given thefollowing underlying data.

Profit ($/tons) Production rates (tons/hr) Production limits (max tons)

Bands 25 200 6,000Coils 30 140 4,000

Available time on mill: 40 hours

Using three different approaches (the geometric method, the simplex method, and a computer) wehad seen before that the optimal production plan was given by 6,000 tons of bands and 1,400 bandsof coils, giving a total profit of $192,000. In extension of our previous discussion, however, let usnow consider the scenario in which another steel producer or some other competitor is interested inpurchasing all or parts of the above company, and has complete knowledge of our current profits,production rates, and limits. Hence, let us introduce the three new decision variables

yA cost/price per hour available on rolling mill (in dollars per hour)

yB cost/price of production capacity of bands (in dollars per ton)

yC cost/price of production capacity of coils (in dollars per ton)

so that the total costs or price of our production (with 40 hours rolling time and production limitsof 6,000 and 4,000 tons of bands and coils, respectively) are given by 40yA + 6, 000yB + 4, 000yC

which the competitor will want to minimize as to pay as little as possible. On the other hand,any realistic offer has to meet at least the current per-ton profits from producing bands and coilsyielding the new linear program

minimize 40yA + 6, 000yB + 4, 000yC

subject to1

200yA + yB ≥ 25

1

140yA + yC ≥ 30

yA, yB , yC ≥ 0

31

Page 33: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

32 CHAPTER 3. DUALITY THEORY

To solve this problem, let us first introduce two new slack variables zB and zC

minimize 40yA + 6, 000yB + 4, 000yC

subject to1

200yA + yB − zB = 25

1

140yA + yC − zC = 30

yA, yB , yC , zB , zC ≥ 0

and note that we can easily find an initial feasible solution by setting yB = 25, yC = 30 andyP = ZB = zC = 0. Hence, using dictionary notation we can write

yB = 25−1

200yA + zB

yC = 30−1

140yA + zC

ζ = 40yA + 6, 000yB + 4, 000yC

= 40yA + 6, 000(25 −1

200yA + zB) + 4, 000(30 −

1

140yA + zC)

= 270000 −130

7yA + 6000zB + 4000zC

Because we minimize, we now need to look for negative objective coefficient and hence decide toincrease yA while keeping zB = zC = 0

yB = 25−1

200yA ≥ 0 ⇔ yA ≤ 5000

yC = 30−1

140yA ≥ 0 ⇔ yA ≤ 4200

Thus pivoting between yA and yC , we first solve the associated pivoting equation for yC

yC = 30−1

140yA + zC ⇔ yA = 4200 − 140yC + 140zC

and then update the remaining dictionary yielding

yB = 25−1

200yA + zB = 25−

1

200(4200 − 140yC + 140zC ) + zB = 4 + 0.7yC + zB − 0.7yC

ζ = 270000 −130

7yA + 6000zB + 4000zC

= 270000 −130

7(4200 − 140yC + 140zC ) + 6000zB + 4000zC

= 192000 + 2600yC + 6000zB + 1400zC

Since all coefficients are positive, the objective cannot be further decreased and we have found anoptimal solution yA = 4200, yB = 4, and yC = zB = zC = 0. Clearly, using the simplex tableau forthe rewritten problem

maximize − 40yA − 6, 000yB − 4, 000yC

subject to −1

200yA − yB ≤ 25

−1

140yA − yC ≤ 30

yA, yB, yC ,≥ 0

yA yB yC 1

ζ −40 −6000 −4000 0

zB − 1200 −1 0 −25

zC − 1140 0 −1 −30

Page 34: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

3.1. AN (OLD) EXAMPLE 33

we can also start with the slack variables in the basis and follow the standard two-phase approachto first find the feasible solution

y0 yA yB yC 1

ζ 0 −40 −6000 −4000 0

−1 0 0 0 0

zB −1 − 1200 −1 0 −25

zC −1 − 1140 0 −1 −30

zC yA yB yC 1

ζ 0 −40 −6000 −4000 0

−1 1140 0 1 30

zB −1 1467 −1 1 5

y0 −1 1140 0 1 30

zC yA yB zB 1

ζ −4000 −2207 −10000 4000 −20000

−2 1200 1 −1 25

yC −1 1467 −1 1 5

y0 0 1200 1 −1 25

zC yA y0 zB 1

ζ −4000 1307 10000 −6000 −270000

−2 1200 1 −1 25

yC −1 1140 1 0 30

yB 0 1200 1 −1 25

yielding the same solution we had used initially, and then solving the Phase-II problem to find thesame optimal solution

zC yA zB 1

ζ −4000 1307 −6000 −270000

yC −1 1140 0 30

yB 0 1200 −1 25

zC yC zB 1

ζ −1400 −2600 −6000 −192000

yA −140 140 0 4200yB 0.7 −0.7 −1 4

If we take a closer look at the optimal solution, we find a couple of interesting relationships tothe original problem of maximizing profits. First, we note that the new optimal objective value of$192,000, that represents the minimal cost or purchase price, coincides with the maximum profitwe had already found as part of our initial problem. Furthermore, the old optimal values of 1,400tons of coils and 6,000 tons of bands show up as negative coefficients in the objective function ofthe new optimal tableau. You will also recognize that the third coefficient of 2600 denotes theoptimal slack, or unused capacity of coils in the optimal production plan. This is all quite curious,but there is even more. Let us now take a closer look at the new optimal solution.

1. First, assume we decided to either buy (or sell) an extra hour to be made available onour rolling mile. Because we currently produce the more cost-efficient bands at their upperlimits, we would need to adjust the production of coils and either increase (or decrease) theirproduced amount resulting in a profit gain (or loss) of

1 [hr] · 140 [tons per hr] · 30 [dollars per ton ] = 4200 dollars

Note that this is exactly the optimal cost (or price) y∗A per hour available on the rolling mill.Interesting.

2. Next, let us assume that we keep the current rolling time at 40 hours but purchase (or sell)an extra ton production capacity of bands. Clearly, the associated profit gain (or less) fromproducing bands would be 25$, which is quite different from its optimal cost or price of y∗B = 4dollars. But haven’t we forgotten something? If we were to produce an extra ton of bands, wewould also need more time on the rolling mill and, hence, would need to decrease our produc-tion of coils resulting in an simultaneous profit loss. Let’s look at the numbers. The productionof one extra ton of bands requires 1 [ton] · 1

200 [hr/tons] = 0.005 hours on the rolling mill re-sulting in a profit loss from coils of 0.005 [hours] ·140 [tons/hr] ·30 [dollars/ton] = 21 dollars.Et voila, the net profit is exactly the optimal capacity cost or price of bands.

Page 35: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

34 CHAPTER 3. DUALITY THEORY

3. Finally, if we were being offered additional capacity of coils, would be interested at all? Prob-ably not, because we still have unused capacity available which is reflected by the optimalsolution y∗C = 0. Similarly, if we were asked for parts of our unused (and thus for us “worth-less”) capacity, then any positive amount would provide us with a benefit and (ignoringmarket share, dominance, and competition considerations) our optimal price would also be(although maybe somewhat surprisingly) y∗C = 0 dollars.

As suggested by the above discussion, the optimal values y∗A, y∗B, and y∗C are also called (economic)shadow prices of production and capacities, respectively.

3.2 Primal and Dual Problems

In the former section, we have seen that there exist pairs of problems (one maximization and oneminimization problem) that use essentially the same data but different decision variables and whoseoptimal solutions are closely related to each other. As we will learn next, this is no coincidence.

Definition. Given a linear program

maximize cTx subject to Ax ≤ b, x ≥ 0 (P)

called the primal problem, the linear program

minimize bT y subject to AT y ≥ c, y ≥ 0 (D)

is called its dual problem, and the optimal values y∗ are also called shadow prices.

Remark. Is is easy to see that the dual of the dual is the primal. For example, write the dualproblem (D) equivalently as

− maximize (−b)T y subject to (−A)T y ≤ (−c), y ≥ 0 ((D′))

and take the dual of (D′) yielding

− minimize (−c)Tx subject to (−A)Tx ≥ (−b), x ≥ 0 ((D′′))

which is again equivalent to problem (P)

maximize cTx subject to ATx ≤ b, x ≥ 0 ((P))

Next, we collect some fundamental results on the primal-dual pair (P) and (D), including weakduality, strong duality, and complementary slackness.

Theorem (Weak Duality). If x and y are feasible for (P) and (D), respectively, then cTx ≤ bT y.

Proof. Le x and y be feasible, so x ≥ 0 (*) and Ax ≤ b (**), and y ≥ 0 (**) and AT y ≥ c (*).Then it follows that

cTx(∗)≤ (AT y)Tx = yTAx

(∗∗)≤ yT b = bT y

Remark. One of the major importances of the weak duality theorem is that it gives both lowerand upper bounds on the optimal objective values.

• Every feasible solution x for (P) gives an upper bound cTx on the optimal objective value of(P), and a lower bound on the optimal objective value of (D).

• Similarly, every feasible solution y for (D) gives a lower bound bT y on the optimal objectivevalue of (D), and an upper bound on the optimal objective value of (P).

Page 36: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

3.2. PRIMAL AND DUAL PROBLEMS 35

Corollary. If either (P) or (D) is unbounded, then the respective other problem is infeasible.

Remark. It is possible that both (P) and (D) are infeasible (one example is given in the book).

Theorem (Strong Duality). If either (P) or (D) has an optimal solution x∗ or y∗, respectively,then so does the other and cTx∗ = bT y∗.

Proof. Let x∗ be an optimal solution for (P) with optimal objective value ζ∗ = cTx∗. BecausecTx∗ ≤ bT y for all feasible y by weak duality, we only need to find one feasible y∗ such thatcTx∗ = bT q∗. The example in Section 3.1 may already give you a hint on where to look.In the optimal dictionary for (P) with basis B∗, we have

ζ = ζ∗ +∑

j∈N ∗c∗j xj

Let us again distinguish between original variables xj, j = 1, . . . , n with optimal objective coeffi-cients c∗j (which we set to 0 if j ∈ B∗) and slacks wi, i = 1, . . . , n with optimal objective coefficientsd∗i (which we also set to 0 if i ∈ B∗), so that

ζ = ζ∗ + (c∗)Tx+ (d∗)Tw

where c∗ ≤ 0 and d∗ ≤ 0 by definition and optimality of ζ∗.Claim: y∗ = −d∗ is feasible for (D), and cTx∗ = bT y∗.First, it is clear that y∗ ≥ 0. Then rewrite the above objective function by substituting ζ = cTx,d∗ = −y∗, and w = b−Ax to see that

cTx = cTx∗ + (c∗)Tx+ (−y∗)T (b−Ax)⇔ (c− c∗ −AT y∗)x = cTx ∗ −(y∗)T b

must hold for any x, and hence that

c− c∗︸︷︷︸

≤0

−ATy∗ = 0 ⇒ AT y∗ ≥ c

showing that y∗ is feasible for (D), and

cTx ∗ −(y∗)T b = 0 ⇒ cTx∗ = bT y∗.

Remark. The strong duality theory tells us that we do not need to actually solve the dual problem,but can construct the optimal dual solution from either the optimal dictionary or the optimalsimplex tableau.

• The optimal dual variables are the negated optimal cost coefficients of the slack variables,y∗ = −d∗.

• Similarly, the optimal dual slack variables are the negated optimal cost coefficients of theoriginal variables

z∗ = AT y∗ − c = −c∗.

Another very useful result, that we will use often and later revisit when we start to talk abouta second class of solution algorithms referred to as interior-point methods, is a so-called comple-mentarity result.

Theorem (Complementary Slackness). Two feasible solutions (x,w) and (y, z) are optimal for (P)and (D), respectively, if and only if

xjzj = 0 for all j = 1, . . . , n

wiyi = 0 for all i = 1, . . . ,m

Page 37: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

36 CHAPTER 3. DUALITY THEORY

Proof. First, from proving weak duality recall that

cTx(∗)≤ (AT y)Tx = yTAx

(∗∗)≤ yT b = bT y

with equality if and only if both (*) and (**) hold with equality. Denote the rows and columns ofthe m× n-matrix A using the convention that

A = [−− aj −−]j=1,...,n and AT = [−− ai −−]i=1,...,m

then it is easy to see (*) and (**) hold with equality if and only if

xj = 0 or aTj y = cj, and yi = 0 or bi = aT

i x

or equivalently, ifxj(a

Tj y − cj) = xjzj = 0 and yi(bi − a

Ti x) = yiwi = 0

for all j = 1, . . . , n and i = 1, . . . ,m. �

In the next section, we now revisit the simplex method and learn how to use the above resultsto facilitate the solution of arbitrary linear programs, especially those for which an initial feasiblesolution is not readily available. In that spirit, we will develop an alternative dual simplex algorithmand two variants of a two-phase primal-dual simplex method.

3.3 The Dual and Primal-Dual Simplex Method

Consider the example LP (same as in Section 5.6 in the book)

maximize − x1 − x2

subject to − 2x1 − x2 ≤ 4

− 2x1 + 4x2 ≤ −8

− x1 + 3x2 ≤ −7

x1, x2 ≥ 0

and write down the corresponding dual program

minimize 4y1 − 8y2 − 7y3

subject to − 2y1 − 2y2 − y3 ≥ −1

− y1 + 4y2 + 3y3 ≥ −1

y1, y2, y3 ≥ 0

Note that the solution x1 = x2 = x3 = 0 is not feasible for (P) so that we would need to use aPhase-I/Phase-II approach to solve the primal problem, whereas the basic solution y1 = y2 = 0is readily feasible for the dual problem. Hence, instead of solving the primal problem it seemsreasonable to solve the dual problem using a Phase-II only. To see how to do this using the simplextableau, let us write down a slightly extended (rather documented) primal-dual simplex tableauwith slack variables w1, w2, and w3 for the primal and z1 and z2 for the dual problem.

x1 x2 1

ζ −1 −1 0 1

w1 −2 −1 4 −y1

w2 -2 4 −8 −y2

w3 −1 3 −7 −y3

−z1 −z2 ξ

Page 38: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

3.3. THE DUAL AND PRIMAL-DUAL SIMPLEX METHOD 37

Note that we have only added the negated dual variables together with the dual objective ξ onthe bottom, and negated dual slacks with a new constant term on the right side of the otherwiseunchanged tableau. Hence, we can simply imagine flipping the tableau and using it analogously asbefore the solution of the dual problem. Because of the negated variables, and because we minimizeinstead of maximize, however, we also need to slightly adjust the previous pivoting rules that werederived for the primal problem to work accordingly for the dual problem.

3.3.1 Dual Pivoting Rules and the Dual Simplex Method

Because we imagine pivoting on elements of the dual problem, whose basic variable corresponds tothe nonbasic variable of the primal problem, and similarly, whose nonbasic variables correspond tothe basic variable of the primal problem, different from the previous rules we now choose the enteringvariable first (corresponding to a leaving variable for the dual) and then apply a minimum-ratiotest to decide which variable will enter the primal basis (or leave the basis for the dual problem).

Largest-Coefficient Rule (for dual problem): Choose the leaving variable xl (l ∈ B) withthe most negative (!) right-hand-side/constant term/dual objective coefficient

l = arg maxi∈B{|bi| : bi < 0}

If the current solution is feasible (i.e., the primal objective coefficients, or equivalently, the negateddual basic variables are all negative) and all dual objective coefficients (i.e., all right-hand sidesentries) are nonnegative, then the current solution is both primal and dual feasible and thus optimal.

Minimum-Ratio Test (for dual problem): Choose the entering (!) variable xk (k ∈ N ) withthe smallest positive ratio

cj

ajl

k = arg minj∈N{|cjajl| : ajl < 0}

If the current solution is dual feasible (cj ≤ 0 for all j ∈ N ), bl < 0 for some l ∈ B, and all ajl ≥ 0,then the dual problem is unbounded, then the primal problem is infeasible.

Maximum-Ratio Test (for dual problem): If the current tableau is dual feasible, then theminimum-ratio test can be replaced by an equivalent maximum-ratio test that drops the negativityrestriction on ajl < 0

k = arg maxj∈N{|ajl

cj|}

Do you see how these two ratio tests will choose the same pivot element?

To see how to apply these rules in practice, let us now go back to the LP that we were trying tosolve. First, different from the primal pivoting rules, now we need to start by choosing the leavingvariable and, based on the above largest-coefficient rule for the dual problem, choose w2 as theleaving variable (or equivalently, y2 as the entering variable for the dual problem). Applying theminimum-ratio test, we then see that w2 is replaced by x1, or equivalently, that y2 replaces z1 inthe new tableau

w2 x2 1

ζ −12 −3 −4 1

w1 −1 −5 12 −y1

x1 −12 −2 4 −z1

w3 -12 1 −3 −y3

−y2 −z2 ξ

w3 x2 1

ζ −1 −4 −7 1

w1 −2 −7 18 −y1

x1 −1 −3 7 −z1w2 −2 −2 6 −y3

−y3 −z2 ξ

Page 39: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

38 CHAPTER 3. DUALITY THEORY

After one additional pivot as shown above, we obtain the primal-dual feasible and thus optimalsolution

x∗1 = 7, x∗2 = 0, w∗1 = 18, w∗

2 = 6, w∗3 = 0

z∗1 = 0, z∗2 = 4, y∗1 = 0, y∗2 = 0, y∗3 = 1

with an optimal objective value of ζ∗ = ξ∗ = 7. Note how every nonzero basic variable in eitherthe primal or dual problem correspond to a zero nonbasic variable in the respective other problemand vice versa, also being a straightforward consequence of the complementary slackness theoremin Section 3.2.

3.3.2 The Primal-Dual Simplex Method

Let us now look at another problem (see also Example 2.3 in Vanderbei (2008)):

maximize 2x1 − 6x2

subject to − x1 − x2 − x3 ≤ −2

2x1 − x2 + x3 ≤ 1

x1, x2, x3 ≥ 0

Note there both b � 0 and c � 0 and, hence, that both the initial primal and dual dictionary ortableau are initially infeasible. Hence, we can neither apply the (primal) simplex method fromChapter 2 that starts from a (primal) feasible point and works towards (primal) optimality, orequivalently, dual feasibility, nor the dual simplex method from before that analogously starts froma dual feasible point and works towards dual optimality, or equivalently, primal feasibility. Hence,similar to our discussion in Section 2.3 we need to solve this problem using a two-phase approach:

1. Introduce an auxiliary objective function and work towards either primal or dual feasibilityusing the dual or primal simplex method, respectively.

2. While preserving either primal or dual feasibility, work towards optimality, or equivalently,feasibility of the respective other problem.

Hence, any two-phase method can be categorized based on if we first establish primal or dual feasi-bility using the dual or primal simplex method, respectively, giving rise to the two different althoughvery similar dual-phase-I/primal-phase-II and primal-phase-I/dual-phase-II simplex methods. Veryeconomically, both algorithms can be applied using the same advanced simplex tableau

x1 x2 x3 1

ζ 2 −6 0 0 1

−1 −1 −1

w1 −1 -1 −1 1 −2 −y1

w2 2 −1 1 1 1 −y2

−z1 −z2 −z3 ξ

In the above tableau, we have introduced two auxiliary objective functions with unit coefficientsfor both the primal and dual problem, which can be used as dummy feasible solutions for eithera primal or dual phase-I method. Clearly, if we do the arithmetic by hand, then we may want todrop the respective other objective row or column to reduce our work, otherwise - like in the onlinepivot tool which takes care of the arithmetic of us - we can simply ignore any of these two auxiliaryobjectives that are not needed. For illustration, let us now solve the above problem using both ofthese two-phase methods.

Page 40: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

3.4. THE GENERAL DUAL 39

Dual-Phase-I/Primal-Phase-II Method In this method, we first apply the dual simplexmethod to establish a primal feasible solution for which all right-hand-side coefficients are non-negative, similar to the Phase-I problem in Section 2.3. Since we initially ignore optimality duringthe phase-I, however, we replace the original objective by an auxiliary (primal optimal/dual feasi-ble) vector, typically the negative unit vector. Then pivoting between w1 as only candidate leavingvariable and x2 as suitable entering variable using the dual pivoting rules in the above tableau anddropping the auxiliary RHS vector, we find the new tableau on the left which is now feasible forthe original primal problem.

x1 w1 x3 1

ζ 8 −6 6 −12 1

0 −1 0

x2 1 −1 1 2 −z2w2 3 −1 2 3 −y2

−z1 −y1 −z3 ξ

x1 w1 w2 1

ζ −1 −3 −3 −3 1

x2 −12 −1

2 −12

12 −z2

x332 −1

212

32 −z3

−z1 −y1 −y2 ξ

Since the tableau on the left is feasible, we can then again remove the auxiliary objective andcontinue with regular primal simplex pivots to find an optimal solution or detect unboundedness. Inparticular, ignoring both the largest-coefficient and Bland’s smallest-index rule (but after cheatingto find the smallest number of necessary pivots and save some typing), we only need one morewell-guessed pivot between w2 and x3 to find the optimal tableau as given above on the right. Notethat this solution is both primal and dual feasible and hence optimal by the strong duality.

Primal-Phase-I/Dual-Phase-II Method Different from the above, now we start with thetableau below on the left which replaces the primal RHS vector, or equivalently, dual objective byan auxiliary primal feasible RHS column, typically the positive unit vector. Now pivoting betweenx1 as only candidate entering variable and w2 as only candidate leaving variable using the (primal)pivoting rules, we find the new tableau below on the right that is then dual feasible, or equivalently,seems primal optimal (if it would be primal feasible).

x1 x2 x3 1

ζ 2 −6 0 0 1

w1 −1 −1 −1 1 −2 −y1

w2 2 −1 1 1 1 −y2

−z1 −z2 −z3 ξ

w2 x2 x3 1

ζ −1 −5 −1 1 1

w112 −3

2 -12

32 −3

2 −y1

x112 −1

212

12

12 −z1

−y2 −z2 −z3 ξ

Two more iterations of the regular dual simplex method, and we will be done.

w2 x2 w1 1

ζ −2 −2 −2 −2 1

x3 −1 3 −2 3 −z3x1 1 -2 1 −1 −z1−y2 −z2 −y1 ξ

w2 x1 w1 1

ζ −3 −1 −3 −3 1

x312

32 −1

232 −z3

x2 −12 −1

2 −12

12 −z2

−y2 −z1 −y1 ξ

3.4 The General Dual

As we have already seen at several places, not every LP is initially formulated as maximizationproblem with less-or-equal inequalities and nonnegativity constraints, and although every LP canbe written in such form, in principle, it will often be convenient to have a set of rules to formulatethe dual problem of an arbitrary primal LP. The following set of rules can be derived relativelyeasily (also see Section 5.8 in Vanderbei (2008)).

Page 41: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

40 CHAPTER 3. DUALITY THEORY

Primal Problem (Maximization) Dual Problem (Minimization)

LEQ inequality aTi x ≤ bi nonnegative variable y ≥ 0

(nonnegative slack) (wi = bi − aTi x ≥ 0)

equality constraint aTi x = bi free variable y ≷ 0

(zero slack) (wi = 0)

nonnegative variable x ≥ 0 GEQ inequality aTj y ≥ cj

(nonnegative slack) (zj = aTj y − cj ≥ 0)

free variable x ≷ 0 equality constraint cTj y = cj(zero slack) (zj = 0)

As an example, let us consider the primal (maximization) LP (see Exercise 5.1 in Vanderbei (2008))

maximize x1 − 2x2

subject to x1 + 2x2 − x3 + x4 ≥ 04x1 + 3x2 + 4x3 − 2x4 ≤ 3−x1 − x2 + 2x3 + x4 = 1

x2 , x3 ≥ 0

Each of the three primal constraints (or hidden slacks) will be associated with one new variabley1, y2, and y3, and each of the four original variables x1, x2, x3, and x4 will now be associatedwith a new constraint (or hidden slack) according to the above table. Hence, we may first multiplythe first greater-or-equal inequality by -1 to obtain a less-or-equal inequality (otherwise we couldalso change the sign on y1, see the two equivalent problems below) and then use the above table towrite down the dual minimization problem with two nonnegative variables y1 and y2 (because thefirst two constraints are inequalities), one unrestricted variable y3 (because the third constraint isan equality), two equality constraints (because both x1 and x4 are unrestricted), and two greater-or-equal inequalities (because both x2 and x3 are nonnegative)

minimize 3y2 + y3

subject to −y1 + 4y2 − y3 = 0−2y1 + 3y2 − y3 ≥ −2y1 + 4y2 + 2y3 ≥ 0−y1 − 2y2 + y3 = 0y1 ≥ 0 , y2 ≥ 0

minimize 3y2 + y3

subject to y1 + 4y2 − y3 = 02y1 + 3y2 − y3 ≥ −2−y1 + 4y2 + 2y3 ≥ 0y1 − 2y2 + y3 = 0y1 ≤ 0 , y2 ≥ 0

It will be very useful to develop good “dualing” skills, and save you a lot of time compared toalways rewriting the problem into one specific form. By the way, do you know when and whichfamous mathematician died at the age of 20 in a duel (a real one, because there weren’t any LPduals at that time yet)?

3.5 Problem Set 3

Exercise 3.1. (Mathematical Modeling and Optimization using AMPL) [6 points] Inour previous steel production and transportation models, we have learned how to find an optimalproduction plan for multiple commodities (bands, coils, and plates) at a single mill subject to givenbudget and capacity constraints, and how to ship given quantities of a single commodity (onlycoils) in the least expensive way from several mills to meet given demands of customers at thevarious factories. This week, you will combine the two problems to develop and solve an integratedmulticommodity model of production and transportation (for an isolated multicommodity trans-portation model and some of the relevant data, compare the multi.mod and multi.dat files in theAMPL models folder). Here is some more data, you know what to do (if you are not sure, make thebest modeling decisions you can think of – after all, we know that no model is ever fully correct).

Page 42: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

3.5. PROBLEM SET 3 41

Mill Location time prod. rate (tons/hr) prod. cost ($/ton)(hours) bands coils plates bands coils plates

GARY Gary, Indiana 20 200 140 160 180 170 180CLEV Cleveland, Ohio 15 190 130 160 190 170 185PITT Pittsburgh, Pennsylvania 20 230 160 170 190 180 185

Plant Location demands (tons)bands coils plates

FRA Framingham, Massachusetts 300 500 100DET Detroit, Michigan 300 750 100LAN Lansing, Michigan 100 400 0WIN Windsor, Ontario 75 250 50STL St. Louis, Missouri 650 950 200FRE Fremont, California 225 850 100LAF Lafayette, Indiana 250 500 250

The following table gives the bands/coils/plates shipping costs (the same as in the multi.dat file).

FRA DET LAN WIN STL FRE LAF

GARY 30/39/41 10/14/15 8/11/12 10/14/16 11/16/17 71/82/86 6/8/8CLEV 22/27/29 7/9/9 10/12/13 7/9/9 21/26/28 82/95/99 13/17/18PITT 19/24/26 11/14/14 12/17/17 10/13/13 25/28/31 83/99/104 15/20/20

Exercise 3.2. (Solving LPs using the Primal-Dual and Dual-Primal Two-Phase SimplexMethods) [4 points] Do Exercises 5.8 and 5.9 on page 81 in your text book. For each problem,you may use either the primal-dual or the dual-primal simplex method, whichever you find easier.

Exercise 3.3. (Pivoting Challenge) [6 points] Do Exercises 5.10 and 5.11 on the same page(page 81) using the same settings as on Assignment 2 (4 rows and columns, seed 0909, five problems).

Exercise 5.10: http://campuscgi.princeton.edu/∼rvdb/JAVA/pivot/dp2phase.htmlExercise 5.11: http://campuscgi.princeton.edu/∼rvdb/JAVA/pivot/pd2phase.html

Exercise 3.4. (Diet Problem) [4 points] Do Exercise 5.16 on pages 85/86 in your text book.

Page 43: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

Chapter 4

Simplex Variants

We have already seen that there is not a unique simplex algorithm, but a class of simplex methodsall of which follow the similar scheme of starting from an arbitrary (feasible or infeasible) point,doing pivots and taking steps that work simultaneously or sequentially towards primal and dualfeasibility, and optimality. Using the duality concepts we learned in the previous chapter, we cannow continue this undertaking on a slightly higher level, and at the same time point to some of thepractical implementation issues if we were to write our own Simplex solver.

4.1 The Simplex Method in Matrix Notation

After suffering through some extensive dictionary and tableau updates, you are probably wonderingif there is no other way to facilitate the organization of the Simplex method. You may also haveobserved that many new dictionary coefficients and tableau entries seem to be computed andupdated multiple times but not really used for the actual pivoting decisions. In fact, in this sectionwe will see that all we really need to keep track of is the current basis B, i.e., the set of dependentvariable xi with i ∈ B that are currently serving as basic variables. At the same time, we will developanother cool relationship between primal and dual problems (if you are not sure how “cool” thatreally could be, at least you will find it useful in later chapters).

We begin with our usual LP problem, but to simplify notation later (actually very soon) weshall make things slightly less pretty now and add several “snakes” (tildes):

(P) maximize cT x

subject to Ax ≤ b

x ≥ 0.

And immediately (who would have guessed that soon is that soon), we define a new “undecorated”problem by setting

A =[

A I]∈ Rm×(n+m), c =

[c0

]

∈ Rn+m, x =

[xw

]

∈ Rn+m

as our constraint matrix, objective vector, and decision variable and obtain the equivalent (standardfrom) LP formulation with equality constraints:

(P) maximize cTx

subject to Ax = b

x ≥ 0

Now let B be a basis and write A =[B N

]where B ∈ Rm×m is the square submatrix of A

that consists of m linear independent columns aj with j ∈ B (the linear independence assumptionguarantees that B is invertible, which we will justify and use in just a moment), and N ∈ Rm×n is

42

Page 44: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

4.1. THE SIMPLEX METHOD IN MATRIX NOTATION 43

the m×n submatrix of A with columns aj and j ∈ N . Similarly, we also write c = [cB, cN ]T ∈ Rm+n

and x = [xB, xN ]T ∈ Rm+n so that

cTx =n+m∑

j=1

cjxj =∑

j∈B

cjxj +∑

j∈N

cjxj = cTBxB + cTNxN

(Ax)i =

n+m∑

j=1

aijxj =∑

j∈B

aijxj +∑

j∈N

aijxj = (BxB)i + (NxN )i

for all i = 1, . . . ,m. Next, if we were to (and we will) write the associated dictionary, we need toexpress the basic variable xB in terms of the independent nonbasic vector xN . This means that wemust solve the linear system of equations Ax = BxB+NxN = b for xB , which is possible if and onlyif the matrix B is invertible (note that we implicitly ensured this condition by choosing suitablepivots in each step of our Simplex method). In this case, we can compute xB = B−1b−B−1NxNso that the resulting dictionary is given by

ζ = cTx = cTBxB + cTNxN = cTB(B−1b−B−1NxN ) + cTNxN

= cTBB−1b+ (cTN − c

TBB

−1N)xN

xB = B−1b−B−1NxN

As a side note, these expressions finally reveal the “mystery” of the bar-notation we have used sofar; specifically, we now understand that

ζ = cTB−1b, [cj ] = cN − (B−1N)T cB, [bi] = B−1b, [aij] = B−1N.

We continue as always (but faster): set the nonbasic vector xN to zero, and get the basic variablesand current objective value for any (!) given basis B as

(x∗B, x∗N ) = (B−1b, 0) and ζ∗ = cTBx

∗B = cTBB

−1b.

From Chapter 3, we already know that the negated basic primal values x∗B correspond to theobjective coefficients for the nonbasic dual variables z∗N , and that the negated objective coefficientsfor the nonbasic primal variables x∗N correspond to the basic dual values z∗B (we could have used yinstead of z but follow the notation in your book - remember that the complementary dual slackvariable of x is z, whereas the complementary primal variable of y is the slack w). In any case, wecan write down the corresponding dual dictionary

−ξ = −cTBB−1b− (B−1b)T zB

zN = (B−1N)T cB − cN + (B−1N)T zB

or equivalently, with z∗B = 0 and z∗N = (B−1N)T cB − cN , the symmetric primal-dual pair

ζ = ζ∗ − (z∗N )TxN

xB = x∗B −B−1NxN ,

−ξ = −ζ∗ − (x∗B)T zB

zN = z∗N + (B−1N)T zB.

It is not difficult to see that these two dictionaries are exact analogons of each other, based on thefollowing relationship between the matrices −B−1N and (B−1N)T .

Theorem (Negative Transpose Property). Let (P) and (P) with A =[

A I]∈ Rm×(n+m), b ∈ Rm,

c =[c 0

]T∈ Rn, and x =

[x w

]T∈ Rn be as before, and define A =

[

−I AT]∈ Rn×(n+m),

b =[0 b

]T∈ Rn+m, and y =

[z y

]T∈ Rn+m so that

(D) minimize bT y

subject to AT y = c

y ≥ 0

Page 45: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

44 CHAPTER 4. SIMPLEX VARIANTS

If A =[N B

]and A =

[

B N], then

B−1N = −(B−1N)T .

Proof. Note that AAT =[A I

][−I

AT

]

= −A+ A = 0 and hence

AAT =[N B

][BT

NT

]

= NBT +BNT = 0

Now multiplying this equation with B−1 from the left and with B−1 from the right, we get

B−1N = B−1NBT (BT )−1 = −B−1BNT (BT )−1 = −(B−1N)T �

As another consequence of the above theorem, we also observe that the matrix coefficients inboth the primal and the dual dictionary are always the same for every basic (not only the optimal)primal and dual solution, when defined using the same basis B. Without repeating the discussionfrom earlier sections, Figure 4.1 shows a full version of the corresponding primal and/or dual simplexalgorithms in matrix notation.

To see that the notation is much worse than the actual work (this is encouraging but notnecessarily true if you do the work by hand!), let us illustrate this algorithm by the example

max 5x1 + 4x2 + 3x3

s.t. 2x1 + 3x2 + x3 + x4 = 5

4x1 + x2 + 2x3 + x5 = 11

3x1 + 4x2 + 2x3 + x6 = 8

x1, x2, x3, x4, x5, x6 ≥ 0.

As before, we choose the slack variables as our initial basic variables, so that xB = (x4, x5, x6)T =

(5, 11, 8)T and B = {4, 5, 6}, and xN = (x1, x2, x3)T = (0, 0, 0)T and N = {1, 2, 3}. Next, we

decompose A and c accordingly into

B =

1 0 00 1 00 0 1

, N =

2 3 14 1 23 4 2

, cB =

000

, and cN =

543

and then compute

x∗B = B−1b = b =

5118

≥ 0 and z∗N = (B−1N)T cB − cN = −cN =

−5−4−3

� 0.

This shows that our initial solution is primal feasible, but not dual feasible and thus not yet optimal(meaning the notational mess starts here). We continue to pick the entering variable by choosingk = arg min{z∗1 = −5, z∗2 = −4, z∗3 = −3} = 1 (note that this is the index of the minimum element!),so that x1 will enter the basis B. Next, we compute

∆xB = B−1Ne1 =

2 3 14 1 23 4 2

100

=

243

and then apply the minimum-ratio test to select the corresponding leaving variable index:

l = arg min

{x∗4

∆x4=

5

2,x∗5

∆x5=

11

4,x∗6

∆x6=

8

3

}

= 4.

Page 46: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

4.1. THE SIMPLEX METHOD IN MATRIX NOTATION 45

(B,N ) x∗B = B−1b, z∗N = (B−1N)T cB − cN

x∗B ≥ 0z∗N ≥ 0

x∗B ≥ 0z∗N � 0

x∗B � 0z∗N ≥ 0

x∗B � 0z∗N � 0

OPTIMAL PRIMAL FEASIBLE DUAL FEASIBLE F!&%?#

k ∈ arg minj∈N {z∗j < 0} l ∈ arg mini∈B{x

∗i < 0}

Increase xN = [0 · · · t · · · 0]T = tekin xB = x∗B − (B−1N)xN

Increase zB = [0 · · · s · · · 0]T = selin zN = z∗N + (B−1N)T zB

∆xB = B−1Nek

l ∈ arg mini∈B{x∗

i

∆xi: ∆xi > 0}

∆zN = −(B−1N)T el

∆zN = −(B−1N)T el

k ∈ arg minj∈N {z∗j

∆zj: ∆zj > 0}

∆xB = B−1Nek

x∗k ← t =x∗

l

∆xl, x∗B ← x∗B − t∆xB, z

∗l ← s =

z∗k

∆zk, z∗N ← z∗N − s∆zN

B ← B \ {l} ∪ {k}, N ← N \ {k} ∪ {l}

Figure 4.1: The Simplex Algorithm in Matrix Notation

This means that x4 will leave the basis and be replaced x1. To update all other variables, it nowsuffices to compute

∆zN = −(B−1N)T e4 = −

2 4 33 1 41 2 2

100

=

−2−3−1

and set

x∗1 ← t =x∗4

∆x4=

5

2= 2.5 x∗B ←

5118

− 2.5

243

=

01

0.5

z∗4 ← s =z∗1

∆z1=−5

−2= 2.5 z∗N ←

−5−4−3

− 2.5

−2−3−1

=

03.5−0.5

� 0.

Note that x∗B = (x∗1, x∗5, x

∗6) = (2.5, 1, 0.5)T ≥ 0 has remained positive, so that we are still primal

feasible with a new basis B ← {1, 5, 6} (at least some indication that we did this correctly).

Page 47: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

46 CHAPTER 4. SIMPLEX VARIANTS

However, also note that z∗N = (z∗2 , z∗3 , z

∗4) = (3.5,−0.5, 2.5)T � 0 still has a negative component for

N ← {2, 3, 4}, so that we are not yet dual feasible and thus still not optimal (so we need to do thisagain???). Yes, we continue and without choice let k = 3 and x3 be the next entering variable (hereremember that k is the index of the nonbasic variable and not its position in the set N ; in fact, theindices in N can be ordered arbitrarily as long as we keep track of the correct associations). Asbefore, we first update

∆xB = B−1Ne3 =

2 0 04 1 03 0 1

−1

3 1 11 2 04 2 0

010

=

12 0 0−2 1 0−3

2 0 1

122

=

12012

so that t = min{2.50.5 ,

10 ,

0.50.5} = 1 indicating that x6 will leave the basis (l = 6) – check the details if

you arenot sure. Speeding through the minimum-ratio test, we compute

∆zN = −(B−1N)T e6 = −

32 −4 −1

212 0 1

212 −2 −3

2

001

=

12−1

232

and s =z∗3

∆z3= −0.5

−0.5 = 1. Finally, all that is left to do is to update our new solutions

x∗3 = 1 x∗B =

x∗1x∗5x∗6

=

2.51

0.5

− 1

0.50

0.5

=

210

z∗6 = 1 z∗N =

z∗2z∗3z∗4

=

3.5−0.52.5

− 1

0.5−0.51.5

=

301

all of which are now greater or equal than zero (hooray!) and thus together with x∗2 = x∗4 = z∗1 =z∗5 = 0 optimal with an objective value of ζ∗ = 13.

4.2 Sensitivity and the Parametric Simplex Method

In Chapter 3, we had introduced the dual variables as shadow prices associated with each primalconstraint and discussed their relationships to changes in their respective right-hand sides. In thissection, we continue the investigation of problems for which the data may change or depends onsome unknown parameter.

Example. A U.S. steel mill produces two similar product variants for the sale in the U.S. and inCanada. The production and profit data for these two variants is given as follows.

profit ($/tons) production rates (tons/hr) demand (max tons)

Variant B (U.S.) 25 200 6,000Variant C (CAN) 30 140 4,000

Available time on mill: 40 hours

Let us assume that the two profits are given in U.S. dollars and Canadian dollars, respectively, andlet µ be the USD/CAD exchange rate so that 1 CAD = µ USD (µ = 0.935 as of September 29,2009). Then we can formulate our (by now well-studied) optimization problem as

maximize 25xB + 30µxC

subject to1

200xB +

1

140xC + w1 = 40

0 ≤ xB + w2 = 6000

0 ≤ xC + w3 = 4000

Solving this problem by hand or using AMPL, the optimal solution is (x∗B , x∗C) = (6000, 1400) with

slacks (w∗1, w

∗2, w

∗3) = (0, 0, 2600) and an optimal profit of $188,410 (USD).

Page 48: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

4.2. SENSITIVITY AND THE PARAMETRIC SIMPLEX METHOD 47

In the above example, we see that the optimal solution has not changed, and only the profit hasbecome slightly less because of the decrease in Canadian profits over U.S. profits. In particular,with a current exchange rate µ < 1, our Canadian profit rates of 30µ$ tons · 140 tons/hour =4200$/hour (USD) are still less than our U.S. profit rates of 25 · 200 = 5000$/hour (USD) sothat we keep production of the U.S. variant at its maximal level of 6000. However, if µ wereto increase, then Canadian profit rates were eventually meet or exceed U.S. rates, namely whenµ ≥ 5000/4200 = 1.19 (here is some trivia: the highest historical value of µ is 1.09 on November7, 2007). Hence, if µ > 1.19, then our optimal solution would change to xC = 4000 and xB =200 ·(40−4000/140) = 2285.71. This analysis of the unknown or variable parameter µ is commonlyreferred to as ranging.

4.2.1 Ranging

The question that we want to answer now is the following: “By how much can we change a problem(i.e., b or c) before changing the optimal solution i.e., the optimal basis)?” To study this question,let (x∗B , x

∗N ) = (B−1b, 0) and (z∗B , z

∗N ) = (0, (B−1N)T cB − cN ) be the optimal primal and dual

solutions. If we now change c by t∆c, then we see that (x∗B, x∗N ) always stays primal feasible

(because x∗B does not depend on c) and that z∗N changes by t∆zN = t[(B−1N)T ∆cB−∆cN ], whichremains feasible if and only if z∗N + t∆zN remains nonnegative, or equivalently, if

t

≥ −z∗j

∆zjfor all j ∈ N : ∆zj > 0

≤ −z∗j

∆zjfor all j ∈ N : ∆zj < 0

Hence, since by strong duality any primal-dual feasible solution is optimal, we conclude that thecurrent solution stays optimal for all t in the interval

maxj∈N

{

−z∗j

∆zj: ∆zj > 0

}

≤ t ≤ minj∈N

{

−z∗j

∆zj: ∆zj < 0

}

.

Similarly, if we change b by t∆b, then z∗N is still dual feasible (because z∗N does not depend on b)and x∗B changes by t∆b = tB−1∆b which remains feasible and, thus, optimal for all t in the interval

maxi∈B

{

−x∗i

∆xi: ∆xi > 0

}

≤ t ≤ mini∈B

{

−x∗i

∆xi: ∆xi < 0

}

.

Example. The optimal basic variables for our initial example problem are xB , xC , and w3 so that

B =

1200

1140 0

1 0 00 1 1

, N =

1 00 10 0

, cB =

25300

, and cN =

[00

]

and

x∗B = B−1b =

0 1 0140 −0.7 0−140 0.7 1

4060004000

=

600014002600

z∗N =(B−1N

)TcB − cN =

0 1 0140 −0.7 0−140 0.7 1

1 00 10 0

T

25300

=

[4200

4

]

.

1. First, let ∆c =[0 1 0

]T(we change the profit coefficient of the second basic variable xC)

so

∆z∗N = (B−1N)T ∆cB −∆cN =

[0 140 −1401 −0.7 0.7

]

010

=

[140−0.7

]

Page 49: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

48 CHAPTER 4. SIMPLEX VARIANTS

and we directly find the ranging interval as

−4200

140= −30 ≤ t ≤ −

4

−0.7= 5.71.

Note that 30µ = 30 + 5.71 = 35.71 implies that µ = 35.7130 = 1.19 as we found before. Similar,

note that for a Canadian profit change of -$30, our Canadian sales would not give us any profitanymore so that we would stop production of our Canadian variant yielding w1 = 10 free hourson the rolling mill and the new optimal solution (x∗B , x

∗C , w

∗1, w

∗2 , w

∗3) = (6000, 0, 10, 0, 400).

2. Next, we are interested to see what changes in U.S. demand would cause a change in theoptimal solution. For that purpose, let ∆b =

[0 1 0

](note that the first entry correspond

to the time availability on the rolling mill, and the second and third entries to the U.S. andCanadian demand, respectively) so that

∆x∗B = B−1∆b =

0 1 0140 −0.7 0−140 0.7 1

010

=

1−0.70.7

and hence

max

{

−6000

1,−

2600

0.7

}

= −3714.29 ≤ t ≤ −1400

−0.7= 2000

Here note if the U.S. demand increases to (or beyond) 6000 + 2000 = 8000 tons, thenwe would increase our U.S. production accordingly up to a maximum of 40 hours * 200tones/hour = 8000 tons while simultaneously decreasing Canadian production until reachingzero. Clearly, after that any further increase is prevented by our time constraint on therolling mill. Although the (numerical) solution will steadily adjust itself, the optimal basiswill remain the same until reaching the solution (x∗B , x

∗C , w

∗1 , w

∗2, w

∗3) = (8000, 0, 0, 0, 4000) at

which Canadian production drops to zero. Note that this solution is degenerate and thatthere are three possible optimal bases, based on if xC , w1 or w2 joins xB and w3 in the basis.Similarly, to understand what happens if the U.S. demand falls to 6000 − 3714.29 = 2285.71tons, recall from before that this production level corresponds to the case where Canadianproduction can reach its maximum level of 4000 tons, yielding the new optimal (and againdegenerate) solution (x∗B , x

∗C , w

∗1, w

∗2 , w

∗3) = (2285.71, 4000, 0, 0, 0).

4.2.2 The Homotopy Method

We now learn how to use the above ranging mechanism to solve linear program that are initiallyprimal and dual infeasible. The name homotopy derives from the Greek words homos (engl. same)and topos (engl. place) and when used in math describes the “continuous deformation” of one objectto another (group, set, function, problem). You may be familiar with the very similar concept ofa homomorphism, that derives from the Greek words homos and morphe (engl. shape). We willdemonstrate this method on the same example we solved before in Section 3.3.2 using both theprimal-dual and dual-primal two-phase simplex methods.

maximize 2x1 − 6x2

subject to − x1 − x2 − x3 ≤ −2

2x1 − x2 + x3 ≤ 1

x1, x2, x3 ≥ 0

First, let us re-write the above problem as the parametrized dictionary

ζ = −(−2 + µ)x1 − (6 + µ)x2

x4 = −2 + µ+ x1 + x2 + x3

x5 = 1 + µ− 2x1 + x2 − x3

Page 50: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

4.2. SENSITIVITY AND THE PARAMETRIC SIMPLEX METHOD 49

which is primal-dual feasible and, thus, optimal for all µ ≥ 2. The homotopy method now repeatedlydecreases the range of permissible µ until µ = 0 is permissible, and makes either a primal or dualpivot as soon as further reduction by µ is prevented by the requirement to remain feasible. Hence,for the above dictionary we see that once µ falls below 2, both the objective coefficient of x1

becomes positive (suggesting a primal pivot on x1) and the current value for x4 becomes negative(suggesting a dual pivot on x4). Choosing the former (you will start with the later as part of yournext homework assignment) and increasing x1 to enter the basis, we see that only x5 decreases andthus will leave the basis yielding

x1 =1

2(1 + µ) +

1

2x2 −

1

2x3 −

1

2x5

x4 = −2 + µ+

(1

2(1 + µ) +

1

2x2 −

1

2x3 −

1

2x5

)

+ x2 + x3

= −3

2+

3

2µ+

3

2x2 +

1

2x3 −

1

2x5

ζ = −(−2 + µ)

(1

2(1 + µ) +

1

2x2 −

1

2x3 −

1

2x5

)

− (6 + µ)x2

= 1−1

2µ−

1

2µ2 − (5 +

3

2µ)x2 − (1−

1

2µ)x3 − (1−

1

2µ)x4

Clearly, this new dictionary is optimal as long as all objective coefficients are nonpositive and allcurrent basic variables nonnegative, so

1

2(1 + µ) ≥ 0 ⇔ µ ≥ −1 −

3

2+

3

2µ ≥ 0 ⇔ µ ≥ 1

5 +3

2µ ≥ 0 ⇔ µ ≥ −

10

31−

1

2µ ≥ 0 ⇔ µ ≤ 2

and 1 ≤ µ ≤ 2 where a further reduction is prevented by the current value of x4, now suggesting thedual pivot on x4 to leave the basis. Applying the dual minimum-ratio test with a value of µ = 1,we find that min{6.5

1 ,0.50.5} = 1 and hence choose x3 as new entering variable.

x3 = 3− µ− 3x2 + 2x4 + x5

x1 =1

2(1 + µ) +

1

2x2 −

1

2(3− µ− 3x2 + 2x4 + x5)−

1

2x5

= −1 + µ+ 2x2 − x4 − x5

ζ = 1−1

2µ−

1

2µ2 − (5 +

3

2µ)x2 − (1−

1

2µ)(3− µ− 3x2 + 2x4 + x5)− (1−

1

2µ)x4

= −2 + 2µ− µ2 − (2 + 3µ)x2 − (2− µ)x4 − (2− µ)x5

Similar to before, we find that this new dictionary is primal-dual feasible and, thus, optimal as longas

−1 + µ ≥ 0 ⇔ µ ≥ 1 3− µ ≥ 0 ⇔ µ ≤ 3

2 + 3µ ≥ 0 ⇔ µ ≥ −2

32− µ ≥ 0 ⇔ µ ≤ 2

or 1 ≤ µ ≤ 2 (note that we have not made any measurable improvement, but as we now already,things like that may happen). Not giving up, we continue with another dual pivot between x1 and

Page 51: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

50 CHAPTER 4. SIMPLEX VARIANTS

x2 (why?) to find

x2 =1

2−

1

2µ+

1

2x1 +

1

2x4 +

1

2x5

x3 = 3− µ− 3

(1

2−

1

2µ+

1

2x1 +

1

2x4 +

1

2x5

)

+ 2x4 + x5

=3

2−

3

2µ−

3

2x1 +

1

2x4 −

1

2x5

ζ = −2 + 2µ− µ2 − (2 + 3µ)

(1

2−

1

2µ+

1

2x1 +

1

2x4 +

1

2x5

)

− (2− µ)x4 − (2− µ)x5

= −3 +9

2µ+

1

2µ2 − (1 +

3

2µ)x1 − (3 +

1

2µ)x4 − (3 +

1

2µ)x5

which now is optimal for

1

2−

1

2µ ≥ 0 ⇔ µ ≤ 1

3

2−

3

2µ ≥ 0 ⇔ µ ≤ 1

1 +3

2µ ≥ 0 ⇔ µ ≥ −

2

33 +

1

2µ ≥ 0 ⇔ µ ≥ −6

and, in particular, for µ = 0. Hence, we have found the optimal solution (x∗1, x∗2, x

∗3) = (0, 0.5, 1.5)

with the optimal objective value ζ∗ = −3.

There are several nice properties of the homotopy method.

1. Note that we can simplify the method a little bit by adding the parameter µ only to thoseterms corresponding to initially negative primal and dual variables (x4 = z1 = −2 in theabove problem). You can make this change when you re-solve the above problem startingwith a dual pivot on x4.

2. Instead of adding the same parameter µ to the coefficient of each negative variable, we canalso add ρµ where ρ is a positive random number that is different for each occurrence of µ.Then, with probability 1 (a.s. / almost surely), this method will never encounter a degeneratepivot. For more information, please read the additional discussion offered in your text book!

4.2.3 The (Parametric) Self-Dual Simplex Algorithm

The following algorithm summarizes the homotopy method and is also known as the (parametric)self-dual simplex method.

4.3 The Primal Simplex Method with Ranges

Finally, we discuss one variant of the simplex method that makes explicit use of box constraints, orranges on either constraints or variables. Let’s look at the following example, which correspondsto our initial (rewritten) production problem from Section 2.1 with minimum production levels of2000 tons of both bands and coils, respectively.

maximize 5x1 + 6x2

subject to 7x1 + 10x2 ≤ 56

2 ≤ x1 ≤ 6

2 ≤ x2 ≤ 4

Note that our previously optimal solution (x1, x2) = (6, 1.4) violates the lower bound on x2 and,thus, is not feasible anymore for this modified problem. To solve this problem, we start by setting

Page 52: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

4.3. THE PRIMAL SIMPLEX METHOD WITH RANGES 51

ζ = ζ∗ − z∗TN xNxB = x∗B −B

−1NxNwhere

x∗B = B−1bz∗N = (B−1N)T cB − cNζ∗ = cTBx

∗B = cTBB

−1b

ζ = ζ∗ − (z∗N + µzN )TxNxB = (x∗B + µxB)−B−1NxN

where xB > 0 and zN > 0 (random)

µ∗ = min{µ : z∗N + µzN ≥ 0 and x∗B + µxB ≥ 0} OPTIMAL

Primal Pivot:∆xB = B−1Nek

l ∈ arg mini∈B

{x∗

i +µ∗xi

∆xi: ∆xi > 0

}

∆zN = −(B−1N)T el

Dual Pivot:∆zN = −(B−1N)T el

k ∈ arg minj∈N

{z∗j +µ∗zj

∆zj: ∆zj > 0

}

∆xB = B−1Nek

x∗k ← t =x∗

l

∆xl, xk ← t = xl

∆xl, z∗l ← s =

z∗k

∆zk, zl ← s = zk

∆zk

x∗B ← x∗B − t∆xB, xB ← xB − t∆xB, z∗N ← z∗N − s∆zN , zN ← zN − s∆zN

if µ∗ ≤ 0

if z∗k + µ∗zk = 0 if x∗l + µ∗xl = 0

Figure 4.2: The (Parametric) Self-Dual Simplex Method

each variable to its lower bound of 2 and then write the following dictionary with ranges

l 2 2u 6 4

ζ = 5x1 + 6x2 = 22

0 56 w = 7x1 + 10x2 = 34

Note that w does not act as a slack variable anymore, but now equals the actual constraint withits own lower and upper bound of 0 and 56, respectively. Since its current value of 22 falls belowthese two bounds, the current solution is feasible, and the current values of the nonbasic variablesx1 and x2 are indicated by putting a box around the current bound that is active.

The following rationale to update this new dictionary is very similar to our previous consider-ations. Namely, because both x1 and x2 can be further increased (as they are currently at theirlower bounds) and because we maximize, we choose an entering variable so that its associatedobjective coefficient is positive, then yielding an increase in our objective function. In this case, wecan choose either one, so let us start with x1. Since we need to ensure that the resulting value ofw stays between its two bounds, we can compute that

w = 7x1 + 10x2 = 7x1 + 20 ≥ 0 ⇔ x1 ≥ −20

7= −2.857

w = 7x1 + 10x2 = 7x1 + 20 ≤ 56 ⇔ x1 ≤36

7= 5.143

so that w leaves the basis (you may think this was obvious because there was only one constraint,

Page 53: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

52 CHAPTER 4. SIMPLEX VARIANTS

but you just wait!) and the new dictionary becomes

l 0 2

u 56 4

ζ = 57w − 8

7x2 = 2647 = 37.714

2 6 x1 = 17w − 10

7 x2 = 367 = 5.143

Clearly, this new solution is feasible as 5.143 lies in the feasible interval for x1. In fact, this solutionis optimal, as the nonbasic variable w whose objective coefficient is positive is already at its upperbound, whereas the objective coefficient of x2, which is currently at its lower bound, is negativeso that any further increase in x2 would only decrease our objective. The interpretation of thissolution in the context of our steel production problem should be clear, as the profit rates havenot changed so that we like to produce as many tons of bands as possible while only meeting theminimum demand of coils.

Now let us look at the case where we would have decided to initially increase x2, possible“mislead” by its larger objective coefficient compared to that of x1. Again ensuring that w doesnot violate its lower and upper bounds, in this case we would have found that

w = 7x1 + 10x2 = 14 + 10x2 ≥ 0 ⇔ x2 ≥ −14

10= −1.4

w = 7x1 + 10x2 = 14 + 10x2 ≤ 56 ⇔ x2 ≤42

10= 4.2

which again indicates that w left the basis if we increased x2 to 4.2. However, note that this upperlimit on x2 exceeds its own upper bound of 4, and hence we can only increase x2 to a maximumof 4 leaving w at a positive value strictly between 0 and 56 and, thus, as a basic variable. Theupdated dictionary then looks as follows

l 2 2

u 6 4

ζ = 5x1 + 6x2 = 34

0 56 w = 7x1 + 10x2 = 54

where x2 is now at its upper bound while both objective coefficients are unchanged and still positive,in particular. However, now we can only increase x1 and following the same steps as before, wefind that

w = 7x1 + 10x2 = 7x1 + 40 ≥ 0 ⇔ x1 ≥ −40

7= −5.714

w = 7x1 + 10x2 = 7x1 + 40 ≤ 56 ⇔ x1 ≤16

7= 2.286

which is still below its upper bound. Hence, we can update the dictionary using a regular pivot toobtain

l 0 2

u 56 4

ζ = 57w − 8

7x2 = 2487 = 35.429

2 6 x1 = 17w − 10

7 x2 = 167 = 2.286

Because w is now at its upper bound, we cannot further increase the objective value using w in spiteof its positive objective coefficient. However, note that the objective coefficient of x2 has becomenegative, which means that we can increase the objective if we can decrease the current value ofx2. Sure enough, with x2 at its upper this is possible, and to guarantee that x1 stays within itsbounds we compute

x1 =1

7w −

10

7x2 = 8−

10

7x2 ≥ 2 ⇔ x2 ≤

42

10= 4.2

x1 =1

7w −

10

7x2 = 8−

10

7x2 ≤ 6 ⇔ x2 ≥

14

10= 1.4

Page 54: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

4.4. PROBLEM SET 4 53

Hence, in order to increase the objective as far as possible, we wish to decrease to x2 to 1.4 butagain remember that x2 also has to meet its own lower bound of 2, so that we do not pivot butmerely adjust the value from of x2 finally resulting in the same optimal dictionary we had alreadyfound above.

Remark. The book by Vanderbei (2008) also describes a Phase-I method using the Simplex Methodwith Ranges in Section 9.2. Although we will not cover this section in class, you may (and should)read it to satisfy your mathematical curiosity and for general pleasure!

Important Note: Everything up to and including Section 4.2 will be part of our in-class, closed-notes and closed-book midterm exam (i.e., everything in the notes but the simplex method withranges)!

4.4 Problem Set 4

Exercise 4.1. (Mathematical Modeling and Optimization using AMPL) [6 points] Basedon recent market developments that have led to increasing prices in steel bands and coils, our steelcompany decides to stop producing plates and returns to the production of bands and coils only.Because it anticipates that a major repair of its rolling mill three weeks from now will reduce theavailable time from the regular 40 hours to only 32 hours during that week, the manager decides tostart building an inventory to benefit from increasing prices and to reduce the potential profit losscaused by the mill repair. Production rates (in tons per hour) and costs (in dollars per ton), initialinventory (in tons) and its carrying cost (in dollars per ton of inventory), and revenues (in dollarsper ton sold) and demands (limit on tons that can be sold) for the next four weeks are given below.

production inventory revenue demandrate cost initial cost 1 2 3 4 1 2 3 4

bands 200 10 10 2.5 25 26 27 27 6000 6000 4000 6500coils 140 11 0 3 30 35 37 39 4000 2500 3500 4200

Formulate an AMPL model to determine the optimal amounts of bands and coils to be produced,sold, and taken or put into inventory over a given time period of T weeks in order to maximize thesteel company’s overall profit. Your model should include that the total number of rolling hoursused by all products may not exceed the total number of hours available in each week, and thatthe sum of tons produced and taken from inventory must equal the sum of tons sold and put intoinventory for each product. To index your parameters and variables over time, you may eitherdeclare a set WEEKS in your model, define set WEEKS := 1 2 3 4 in your data file and then writew in WEEKS, or alternatively declare param T, define param T := 4, and then write t in 1..T.Using the above data, write a data file for this specific instance (with the mill repair in week three)and solve it. In your solution, clearly show your optimal decision variables and the maximal profit.

Exercise 4.2. (Sensitivity and Shadow Prices in AMPL) [6 points] Work Exercise 1-3 onpage 22 of Chapter 1 in the AMPL book (still posted on blackboard - read it again if necessary).

Exercise 4.3. (The Parametric Self-Dual Simplex Method) [4 points] Solve the followingLP using the parametric self-dual simplex method in matrix notation. Set xB and zN so to add µonly where necessary and to start with a first dual pivot on the slack variable of the first constraint.

maximize 2x1 − 6x2

subject to −x1 − x2 − x3 ≤ −22x1 − x2 + x3 ≤ 1x1 , x2 , x3 ≥ 0

Exercise 4.4. (The Simplex Method with Ranges) [4 points] Do Exercise 9.2 on page 160of Chapter 9 in your text book. Apply the regular simplex method with ranges starting with x1

Page 55: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

54 CHAPTER 4. SIMPLEX VARIANTS

and x4 as the initial basic variables (to get the corresponding feasible dictionary, solve the secondequation for x4 and substitute into the first equation). Clearly show your work and indicate anyswitch between a nonbasic variable’s lower and upper bound as well as any regular (primal) pivot.

Page 56: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

Part II

LP Applications

55

Page 57: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

Chapter 5

Convex Analysis

This section on convex analysis is very similar to Chapter 10 in Vanderbei (2008), so that we merelypresent some of the main results that were discussed in class. The standard reference on this topicis Rockafellar (1970), which you will probably run into again sooner or later in your mathematicalcareer.

Definition. 1. Given a finite set of points z1, z2, . . . , zn ∈ Rm, a point z ∈ Rm is called a (strict)convex combination of these points if z =

∑nj=1 tjzj where tj ≥ 0 (tj > 0) for each j and

∑nj=1 tj = 1.

2. A set S ⊆ Rn is called convex if for any two points x and y in S and 0 ≤ t ≤ 1, the pointtx+ (1− t)y also belongs to S.

Theorem. A set C is convex if and only if it contains all convex combinations of points in C.

Proof. First, if a set contains all convex combinations of its points, then especially all convexcombinations of pairs so that it is convex, in particular. We prove the opposite direction byinduction on the number of points, so let C be convex and thus contain all convex combinationsof pairs (n = 2). Then assume that C contains all convex combinations of at most n points, so∑n

j=1 tjzj ∈ C for any tj ≥ 0 with∑n

j=1 tj = 1 and zj ∈ C, and let∑n+1

j=1 tj = 1 with tj ≥ 0 and

z1, z2, . . . , zn+1 ∈ C. To show that z =∑n+1

j=1 tj zj ∈ C, pick any tk < 1 so that∑

j 6=k tj = 1tk > 0and write

z =

n+1∑

j=1

tj zj = tkzk +∑

j 6=k

tj zj = tkzk + (1− tk)

j 6=k tj zj∑

j 6=k tj= tkzk + (1− tk)

j 6=k

(

tj∑

j 6=k tj

)

zj

which belongs to C as the second term corresponds to a point in C as convex combination of npoints, so that z beongs to C as convex combination of pair. By induction, the proof is complete. �

Definition. Given any set S ⊂ Rm, the intersection of all convex sets containining S is called theconvex hull of S and denoted by conv S:

conv S =⋂

{C ⊇ S : C is convex}

The above definition implies that the convex hull of a set S is the smallest convex set thatcontains S. With this intution, we can easily prove the following result (this result was not provenin class, where we decided that it is “trivial”).

Theorem. A set C = conv S if and only if C contains all convex combinations of points in S:

C =

z =

n∑

j=1

tjzj : tj ≥ 0,n∑

j=1

tj = 1, zj ∈ S

.

56

Page 58: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

57

Although the proof is not difficult, it is quite instructive - so here it is.

Proof. The proof goes in two parts and shows that for C as defined above, C ⊆ conv S andconv S ⊆ C. For the former inclusion, we recall the definition of convS as the intersection (orsmallest) of all convex sets that contain S, so that it sufficies to show that C ⊆ C for any convexset C that contains S. In particular, because C is convex and C ⊇ S, it must contain all convexcombinations of points in S (which is exactly the set C), so that C ⊆ C and hence C ⊆ conv S. Forthe second inclusion, we need to show that conv S ⊆ C, i.e., we need to show that C is convex andcontains S. For the former, let x and y in C, so x =

∑n1

j=1 tjzj and y =∑n2

j=n1+1 tjzj with suitabletj ≥ 0 and zj ∈ S, and show that tx+ (1− t)y ∈ S for any 0 ≤ t ≤ 1, which follows by writing

tx+ (1− t)y = t

n1∑

j=1

tjzj + (1− t)n1∑

j=1

tjzj =

n2∑

j=1

tjzj

where j = ttj for j = 1, . . . , n1 and tj = (1 − t)tj for j = n1 + 1, . . . , n2. Clearly, all tj ≥ 0 and∑n2

j=1 tj = t∑n1

j=1 tj + (1 − t)∑n2

j=n1+1 tj = t+ (1− t) = 1 showing that C is convex. Finally, it isindeed trivial that C contains every point in S if we let all but one tk = 0, which completes theproof. �

A famous result in convex analysis states that the convex hull of a set S ⊆ Rm is already fullycharacterized if it contains all convex combinations of at most m+1 points of S. Recall the pictureswe drew in class (as time permits, I will include some of those later). Although initially provenin 1907 from first principles, linear programming (which was not around at that time) offers anelegant (and much shorter) way to see why this result is true.

Theorem (Caratheodory (1907)). A set C = conv S if and only if C contains all convex combina-tions of at most m+ 1 points in S ⊆ Rm:

C =

z =

m+1∑

j=1

tjzj : tj ≥ 0,n∑

j=1

tj = 1, zj ∈ S

.

Proof. We will show that the two (different) sets C defined in the last two theorems are in factidentical. In particular, we show that for any point z =

∑nj=1 tjzj ∈ C that is represented by a

convex combination of more than m+ 1 points zj , we can select a subset of at most m+ 1 pointsthat result in a new convex combination of the same point z. In fact, we will see that this result is astraightforward application of the fundamental theorem of linear programming: Define the matrixA =

[z1 z2 . . . zm+1

], then the vector t = (t1, t2, . . . , tm+1)

T is feasible (and optimal) for theLP

max 0T t s.t. At = z, eT t = 1, t ≥ 0

where e = (1, 1, . . . , 1)T ∈ Rm+1 is the vector of all ones. By the fundamental theorem, theremust exist at least one basic feasible solution with as many basic variables as the LP has equalityconstraints: m+ 1! Hence, we can find a basis B of cardinality m+ 1 and a basic optimal solutiont∗ such that At∗ =

∑m+1j=1 t∗jzj =

j∈B t∗jzj = z which concludes the proof. �

The above proof is possibly one of coolest applications of linear programing to prove a very fun-damental result in general mathematics. But there are more, this time highlighting an applicationof LP duality to prove the following “Theorem of the Alternative” also known as Farkas’ Lemma.

Theorem (Farkas (1902)). One and only one of the following systems has a solution:

(I) Ax ≤ b (II) AT y = 0, y ≥ 0, bT y < 0

Page 59: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

58 CHAPTER 5. CONVEX ANALYSIS

Proof. Consider the following primal-dual pair of linear programs:

(P) max 0Tx s.t. Ax ≤ b (D) min bT y s.t. AT y ≤= 0, y ≥ 0

It is immediate that the dual problem is feasible with y = 0, and that the primal problems has onlyoptimal solutions if it is feasible. Hence, there are only two possible cases:

1. Let the primal problem be feasible, or equivalently, let there be a solution x to system (I).Because x is optimal for (P) with an optimal objective value of 0, the optimal dual objectiveis also 0 so that bT y ≥ 0 for all feasible y for D, showing that there is no solution to system(II).

2. If the primal problem is infeasible, or equivalently, if there is no solution to system (I), thenthe dual problem must be unbounded as it is feasible, and there must exist y ≥ 0 withAT y = 0 and bT y < 0 which provides a solution to system (II).

Farkas’ Lemma on Wikipedia: “Farkas’ lemma is a result in mathematics statingthat a vector is either in a given cone or that there exists a (hyper)plane separatingthe vector from the cone, but not both. It was originally proved by Farkas (1902).It is used amongst other things in the proof of the Karush-Kuhn-Tucker theorem innonlinear programming. Farkas’ lemma is an example of a theorem of the alternative;a theorem stating that of two systems, one or the other has a solution, but not both ornone.”

The above theorem is traditionally called a lemma because of its major importance in the proofof several other results. To show only one example for how the above theorem can be used, we willprove another fundamental result in convex analysis widely known as the separating hyperplanetheorem, for which we recall the following definitions from Assignment 1: A hyperplane in Rn isa set {x ∈ Rn : aTx = b}, the vector a ∈ Rn is referred to as its normal vector, and the sets{x ∈ Rn : aTx ≥ b} and {x ∈ Rn : aTx ≤ b} are its associated positive and negative halfspaces,respectively (convince yourself that the normal vector is orthogonal to the hyperplane and points‘into’ the positive halfspace). A set S ⊆ Rn is said to be polyhedral if it is the intersection of afinite number of halfspaces (convince yourself that every polyhedral set is convex). Now on to thetheorem.

Theorem (Separating Hyperplane Theorem (for polyhedra)). Let P1 and P2 be two disjointnonempty polyhedra (P1∩P2 = ∅). Then there exist two disjoint halfspaces H1 and H2 (H1∩H2 = ∅)such that P1 ⊆ H1 and P2 ⊆ H2.

Proof. Let P1 = {x : A1x ≤ b1} and P2 = {x : A2x ≤ b2}, then the combined system

Ax =

[A1

A2

]

x ≤

[b1b2

]

has no solution because the two polyhedra are disjoint. By Farkas’ Lemma, then there existy = (y1, y2)

T ≥ 0 such that bT y = bT1 y1 + bT2 y2 < 0 (*) and AT y = AT1 y1 + AT

2 y2 = 0 (**).From (*), it follows that at least one of the two terms must be negative, and without loss ofgenerality we can assume that bT1 y1 < 0 which implies that AT

1 y1 6= 0 again by Farkas Lemma’(but now in reverse because P1 is nonempty). Almost done, we can now define the (separated)negative and positive halfspaces H1 = {x : (AT

1 y1)Tx ≤ bT1 y1} and H2 = {x : (AT

1 y1)Tx ≥ −bT2 y2}

with nonzero normal vector AT1 y1, and it only remains to show that H1 and H2 are disjoint and

contain their respective polyhedra which we achieve as follows. For the former, it is sufficientto observe that bT1 y1 < −bT2 y2 by (*). For the later, we first let x ∈ P1, then A1x ≤ b1 andmultiplication with the nonegative vector y1 yields yT

1 A1x = (AT1 y1)

Tx ≤ yT1 b1 = bT1 y1 which

shows that P1 ⊆ H1. Second, we recall from (**) that −AT2 y2 = AT

1 y1 and then let x2 ∈ P2,so A2x ≤ b2 or −Ax2 ≥ −b2 which we now multiply with the nonnegative vector y2 to obtain−yT

2 A2x = −(AT2 y2)

Tx = (AT1 y1)

Tx ≥ −yT2 b2 = −bT2 y2 showing that P2 ⊆ H2. We are done. �

Page 60: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

5.1. PROBLEM SET 5 59

The separating hyperplane theorem remains true for any nonempty disjoint convex (not nec-essarily polyhedral) sets C1 ∩ C2 = ∅ (the proof then uses quadratic programming from convexoptimization rather than linear programming hidden within Farkas’ Lemma). Furthermore, a sim-ilar result holds true if the two sets are not disjoint but only touch each other at boundary points.Still good for your general toolbox, the corresponding theorem is listed without proof.

Theorem (Supporting Hyerplane Theorem). Let C be a convex set and let y be a boundary pointof C. Then there exists a hyperplace passing through y and fully containing C in one of its closedhalfspaces.

5.1 Problem Set 5

Exercise 5.1. (Mathematical Modeling and Optimization) [5 points] Do Problem 15.3 onpage 264 in your text book. You may but don’t have to use AMPL for your model and its solution.

Exercise 5.2. (Geometric Interpretation of Basic Solutions using Convexity) [10 points]In this exercise we will review the definition of basic feasible solutions and develop a geometric un-derstanding of their relationship to the extreme points of polyhedra using concepts from convexity.

(a) Basic Definitions (2 points): A set of vectors x1, x2, . . . , xm in Rn is said to be linearly inde-pendent if

∑mi=1 λixi = 0 if and only if λi = 0 for all i = 1, 2, . . . ,m. Given a linear system of

equalities (LSE) Ax = b with A ∈ Rm×n and b ∈ Rm, the matrix A is said to have full (row)rank, denoted rankA = m, if the m rows of A are linearly independent (convince yourselfthat rankA = m necessarily implies that m ≤ n). Let B ∈ Rm×m be a nonsingular submatrixmade up of columns of A; if all n−m components of x not associated with the columns of Bare set equal to zero, then the solution to the resulting set of equations is said to be a basicsolution of Ax = b, and the components of x associated with the columns of B are called itsbasic variables. [Remark: Because the columns of B are m linearly independent vectors thatform a basis for Rm, we also refer to B as basis. A basic solution then gives an expression ofthe vector b as a linear combination of these basic vectors.] Show that the LSE Ax = b willalways have a solution and, in fact, at least one basic solution if A has full row rank (yes, thisis as easy as it seems). What can “go wrong” if m > n, or if m ≤ n and rankA < m?

(b) The Fundamental Theorem of Linear Programming (1 point) An alternative version of the

fundamental theorem states that given any LP of the form max cTx s.t. Ax = b, x ≥ 0 withrankA = m, the existence of a feasible or optimal feasible solution implies the existence of abasic feasible or optimal basic feasible solution, respectively. Why can we drop the full rankassumption on the matrix A for an LP of the form max cTx s.t. Ax ≤ b, x ≥ 0?

(c) Extreme Points and their Equivalence to Basic Solutions (2 points) A point x in a convexset C is said to be an extreme point of C if x = λy + (1 − λ)z for 0 < λ < 1 and y and z inC if and only if y = z (an extreme point is a point that cannot be written as a strict convexcombination, or equivalently, that does not lie strictly within a line segment connecting twoother points in the set). Assuming rankA = m, show that a point x is an extreme point ofthe convex polytope P = {x : Ax = b, x ≥ 0} if and only if x is a basic feasible solution of P .

(d) Some Simple Proofs (4 points) Use the equivalence between extreme points and basic solu-tions to prove the following geometric properties of the convex polytope P defined above: (i)if P is nonempty, then it has at least one extreme point; (ii) if there is an optimal solution,then there is an optimal extreme point; (iii) the polytope P has a finite number of extremepoints (how many at most?); (iv) if P is bounded, then there exist a finite number of points(how many?) so that each point in P can be written as convex combination of these points.

(e) A Challenge (1 point) Suppose that x is a feasible (not necessarily basic) solution to an LP

of the form max cTx s.t. Ax = b, x ≥ 0 with rankA = m. Show that there is another feasiblesolution y with the same objective value cT y = cTx and at most m+ 1 positive components.

Page 61: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

60 CHAPTER 5. CONVEX ANALYSIS

Exercise 5.3. (General Concepts and Theory) [4 points] Use your new geometric intuitionof basic solutions to re-do Problems 2 (also see 9), 3 (also see 10), 5 and 6 from the Midterm Prep.

2. True or False: Any LP that has a feasible solution has a basic feasible solution. Briefly justifyyour answer, or give a counterexample.

(9.) Give an example of an LP that has a feasible solution but no basic feasible solution. Howdoes this reconcile with the fundamental theorem of linear programming?

3. True or False: Any LP that is feasible and bounded has a basic optimal solution. Brieflyjustify your answer, or give a counterexample.

(10.) Give an example of an LP that has an optimal solution but no basic optimal solution. Howdoes this reconcile with the fundamental theorem of linear programming?

5. True or False: Any LP that has a basic optimal solution has either a unique optimal basicsolution, or infinitely many basic optimal solutions. Briefly justify your answer, or give acounterexample.

6. True or False: Any LP that has a unique optimal solution has either a unique optimal basis,or infinitely many optimal bases. Briefly justify your answer, or give a counterexample.

Exercise 5.4. (Proof of Caratheodory’s Theorem) [1 point] Use your new understanding ofthe fundamental theorem to answer the questions in Exercise 10.5 on page 171 in your text book.

Page 62: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

Chapter 6

Network Flow Problems

An important and rich class of linear programs stems from network flow problems with a hostof possible applications areas varying from transportation, scheduling and production over TSPs(traveling-salesman problems) to comunnication networks and electricy circuits, to mentioned onlya few.

6.1 Examples

6.1.1 The Transportation Problem (HW 2.1)

Let S be a set of sources (or origins) with supplies bi ≥ 0, i ∈ S (also ri in book), D be a set ofdestinations with demands bj ≤ 0, j ∈ D (also sj = −bj in book), and cij be the transportationcost of sending one unit from i ∈ S to j ∈ D which is to be minimized.

4 1 1 3

2 2

8 2 3 7

4 5

7 3 5 2

Sources (supply nodes) Destination (demand nodes)

cij

Figure 6.1: A transportation network

For the LP formulation of this problem, let the decision variables xij be the amount shippedfrom i to j.

min∑

i∈S

j∈D

cijxij

s.t.∑

j∈D

xij = ri, i ∈ S

i∈S

xij = sj , j ∈ D

xij ≥ 0, i ∈ S, j ∈ D

61

Page 63: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

62 CHAPTER 6. NETWORK FLOW PROBLEMS

Note how this formulation is very similar to the AMPL model you developed in Exercise 2.1 ofyour second assignment. Without much discussion at this point, we only highlight two importantobservations.

1. The above model makes the (often unrealistic) assumption that there is a transportationcost cij between every pair i ∈ S and j ∈ D. To use this formulation even if if there is noconnection between i and j, we can (theoretically) set cij = ∞, or cij = M � 1 in practice,where M is a very large number that ensure that nothing is shipped from i and j in theoptimal solution. If all connections between i ∈ S and j ∈ D exist, then the problem isalso called a Hitchcock Transportation Problem named after Frank Lauren Hitchcock (MITprofessor, 1875-1957) who first formulated this problem in 1941.

2. From a practical (and mathematical) standpoint, it makes sense that total demand andsupplies must balance

i∈S

ri =∑

i∈S

j∈D

xij =∑

j∈D

i∈S

xij =∑

j∈D

sj

(otherwise we may need to replace some of the equality constraints by inequalities). However,this also means that the constraints are linear dependent, because a single supply or demandcan always be computed as long as all other supplies and demands are known. This showsthat at least one of the constraints is redundant and could be dropped (see Midterm Prep 25and HW 5.2(a)).

6.1.2 The Assignment Problem

In many applications, we may want to assign “objects” from one group to those of another, e.g.,workers to jobs in human resources, jobs to machines in production and scheduling, supplies todemands in transportation, boys to girls in HW 5.1, or students to projects in math courses: LetS be a set of students, P be a set of projects, and pij be the preference of student i ∈ S to workon project j ∈ P .

Casey Games

Chris

Denise Stats

Jingwei

Maureen Money

Sam

Alex Truss

?

!

Figure 6.2: An assignment network

Setting

xij =

{

1 if student i is assigned to project j

0 otherwise

Page 64: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

6.1. EXAMPLES 63

the problem of meeting preferences and maximizing total student satisfaction (in our case) can beformulated as to

max∑

i∈S

j∈P

pijxij

s.t.∑

j∈P

xij = 1, i ∈ S

i∈S

xij = 2, j ∈ P

xij ∈ {0, 1}, i ∈ S, j ∈ D

which ensures that every student is assigned to exactly one project, and that every project is beingworked on by exactly two students. If you look into our or another book, you will find that the“standard” assignment problem is a one-to-one assignment (can you formulate “our” one-to-twoproblem as standard assignment problem?). Also, it is important to note that this problem isnot a linear program because xij are constrained to be binary (formally, this problem is called abinary integer linear program because it is linear in its binary integer variables). Usually, integerprograms are much harder to solve than linear programs, so that we often relax the integralitycondition on xij , say by 0 ≤ xij ≤ 1, and then solve this problem as a linear program (crossingour fingers that the optimal solution is not fractional): in fact, we see that the LP-relaxation ofthe assigment problem is a (very special) Hitchcock Transportation Problem! Later, we will seethat any assignment problem with integer data can be relaxed in this way and solved as a linearprogram, and still produce an optimal solution that is integer.

6.1.3 The Transshipment Problem

Let N be a set of m nodes with supplies or demands bi ≥ 0 (sources) or bi ≤ 0 (sinks), andA ⊆ N × N = {(i, j) : i 6= j ∈ N} be a set of n directed (loop-free) arcs with shipping costs cij0for each (i, j) ∈ A.

8 1 5 3

-3 3 4 -6

-7 2 6 5

6

1

2

4

2

3

5

1

3

Figure 6.3: A transshipment network

The transshipment problem is to to find a flow xij through the above network that meets allsupplies or demands and minimizes total shipment cost:

min∑

(i,j)∈A

cijxij

s.t.∑

i:(i,k)∈A

xik −∑

j:(k,j)∈A

xkj = −bk, k ∈ N

xij ≥ 0, (i, j) ∈ A

In the above formulation, the flow balance or convservation constraints ensure that the differencebetween the inflow into node k,

i:(i,k)∈A xik, and the outflow of this node,∑

j:(k,j)∈A xkj, equalsthe negated supply −bk ≤ 0 if more flow is leaving than entering or arriving at this node, andvice versa, equals the negated demand −bk ≥ 0 if more flow is arriving than leaving. Furthermore,

Page 65: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

64 CHAPTER 6. NETWORK FLOW PROBLEMS

in some applications we may have to restrict the amount of flow xij than can (or must) be sentalong certain arcs, in which case we replace the nonnegativity constraints on xij by box constraintslij ≤ xij ≤ uij with lower and upper bounds lij and uij, respectively. In the above case whereall lij = 0 and uij = ∞, the network (and problem) are also said to be uncapacitated, otherwisecapacitated. Finally, it is easy to see that this formulation generalizes the transportation problem(thus the Hitchcock TP and the assignment problem as special cases) and similarly includes manyother problems collectively referred to as minimum-cost network flow problems. See the books byAhuja et al. (1993) and Bazaraa et al. (2005) for many more practical network flow examples.

6.2 Some Network (not so much graph) Theory

Let us begin our more in-depth study of network flow problems by writing the minimum-costnetwork flow problem in matrix notation

min cTx s.t. Ax = −b, x ≥ 0

where c = [cij ] ∈ Rn, b = [bk] ∈ Rm, and A = [akij] ∈ Rm×n with Aij = ej − ei, e.g. for the aboveexample

(1,2) (1,3) (2,3) (3,4) (4,5) (5,1) (5,6) (6,2) (6,4)

A =

123456

−1 −1 0 0 0 1 0 0 01 0 −1 0 0 0 0 1 00 1 1 −1 0 0 0 0 00 0 0 1 −1 0 0 0 10 0 0 0 1 −1 −1 0 00 0 0 0 0 0 1 −1 −1

Because the rows of this matrix correspond to the m nodes and each column corresponds to an arc,this matrix is called the node-arc incidence matrix of the above network (or digraph) G = (N ,A),and an arc (i, j) ∈ A is also said to be incident at the nodes i and j which themselves are then saidto be adjacent. In particular, note that every column has exacty two nonzero entries: a negative onefor that node at which the arc emanates (the so-called tail of the arc), and a positive 1 where thearc ends (its so-called head). Furthermore, it is easy to see that the rows of this matrix add up tozero and thus are linear dependent, as we already observed in our earlier discussion. Consequently,we can drop any single row and preserve the rank of this matrix, which turns out to be m− 1 sothat, after dropping any one row, the remaining rows are linear independent.

Theorem. Let the network G = (N ,A) be connected and let A ∈ Rm×n be its node-arc incidencematrix. Then rankA = m− 1.

Proof. Because the m rows of A are linear dependent, it is clear that rankA can be at most m− 1,so that it sufficies to show that after dropping any one row, the remaining m − 1 rows are linearindependent. We will proceed in a series of iterative steps as follows:

Step 1: Drop any one row of A, leaving a (m− 1)×n matrix A1 with at least one column a1 wthexactly one nonzero entry.

Step 2: Reorder the columns and rows of the matrix A1 to a new matrix of the form

±1 | ∗− − −0 | A2

where A2 is a (m− 2)× (n− 1) matrix with at least one column a2 with exactly one nonzeroentry. To see why that is, assume by contradiction that all columns of A2 have either zero orexactly two nonzero entries and reorder A1 to

±1 | ∗ | 0− − − − −0 | − | A2

Page 66: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

6.2. SOME NETWORK (NOT SO MUCH GRAPH) THEORY 65

where all columns of A2 have exactly two nonzero entries so that A2 is a full node-arc incidencematrix of a disconnected subgraph in contradiction to the connectedness of G.

Step k: Reorder the columns and rows of the matrix Ak−1 to a new matrix of the form

±1 | ∗− − −0 | Ak

where Ak is a (m − k) × (n − k + 1) matrix with at least one column ak with exactly onenonzero entry.

Step m: Putting things together, we have reordered the columns and rows of the matrix A1 to anew matrix of the form

a1 a2 . . . am−1

±1 ∗ . . . ∗ | ∗

0 ±1. . .

... |...

.... . .

. . . ∗ |...

0 . . . 0 ±1 | ∗

=[B | ∗

]

showing that the rows of A1 are linear independent and, thus, that rankA = m− 1. �

By construction of corresponding B matrices after initially dropping any one of the six rows(done in class), we can make the following observations.

1. Each matrix B obtained as in the above proof is upper triangular and thus regular, in par-ticular. Because all nonzero entries are either plus or minus one, the inverse of B can becomputed by backsubsitution using additions and subtractions only, so that if B is integer(which it is!), then B−1 is also integer (in fact, the matrix A is totally unimodular whichmeans that the determinant of every square submatrix of A is either 0, 1, or -1). We willfollow on this in a little while.

2. The matrices B obtained in the above proof are exactly the bases of the corresponding LP(after dropping one of the redundant constraints). We have not shown this formally, but youmay want to use your intution to convince yourself (or prove!) that this must be true.

3. Each matrix B obtained in the above proof corresponds to a spanning tree T in G (in graphtheory, a spanning tree is a connected set of arcs that covers - or spans - every node andcontains no cycle).

From the above observations, we can draw two major conclusions: the integrality theorem fornetwork flow problems, and the fact that a network simplex method can be formulated in terms ofspanning trees rather than in terms of basic matrices.

Theorem (Integrality Theorem). For any network flow problem with integer data b, every basicfeasible solution and, in particular, every basic optimal solution assigns integer flow to every arc.

The proof follows immediately from the first two observations. As a consequence, instead ofsolving the “hard” integer network flow problem, we can solve the “easier” LP relaxation using thesimplex method which ensures to find a basic optimal solution (if it exists) that is integer and thusalso feasible for the original problem (and meaningful in practice!). In the next section, we will seehow to formulate the network simplex method in terms of spanning trees.

Page 67: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

66 CHAPTER 6. NETWORK FLOW PROBLEMS

6.3 The (Primal) Network Simplex Method

As with most other algorithms, we need three major ingredients: a way to find an initial feasibleflow (otherwise we may use a dual method), a check for optimality (usually in the form of easilyverifiable optimality conditions), and a mechanism to improve the current flow if it is not optimal(such as a pivoting rule like in the regular simplex method). For the first, we can use the third ofthe above observations to construct one or several spanning trees, and then compute the currentflows using a simple recursive scheme (as done in class). Eventually, this will result in a feasibleflow.

For the second, to check optimality of a given feasible flow, we already know that the key ofthis task lies in the dual problem, so let us consider the primal-dual pair

(P) min cTx s.t. Ax = −b, x ≥ 0 (D) max−bT y s.t. AT y ≤ c, y free

Because the columns of the node-arc incidence matrix A have exactly two nonzero entries, thetranspose matrix AT has exactly two nonzero entries in each row which results in the constraints−yi + yj ≤ cij or, equivalently, with slack variables zij ≥ 0

zij = cij + yi − yj ≥ 0 for all (i, j) ∈ A.

Here, the dual variables yi that are associated with the nodes i ∈ N are also called node potentials,and the dual slack variables zij are also called reduced costs. We are ready to state our firstoptimality condition for network flows.

Theorem (Optimality Conditions for Uncapacited Minimum-Cost Network Flow Problems). Afeasible (primal) flow xij, (i, j) ∈ A is optimal for the uncapacitated minimum-cost network flowproblem if and only if there exist a set of node potentials yi, i ∈ N , such that the all reduced costszij are nonnegative and zero if xij > 0

zij = cij + yi − yj ≥ 0 AND zij = cij + yi − yj = 0 if xij > 0.

The proof is an immediate consequence of strong LP duality and complementary slackness. Forthe third, to conceive a pivoting rule for network flows, we follow the exact same philosophy like inthe regular simplex method: If any of the current reduced costs is negative, zij < 0 which indicatesthat the current flow value can be further decreased, we add arc (i, j) to the current spanning treeT (i.e., (i, j) enters the basis) which necessarily creates a cycle. Then we begin to increase the flowon (i, j) from 0 while updating the flow on all other arcs in the cycle, until one of them reduces to0 and thus can be removed from T (i.e., that arc leaves the basis). An example will show you thatthis is not even half as complicted as it may sound, but first let us formulate the full algorithmthat we have developed so far.

Step 1: Find a suitable spanning tree T with an initial basic feasible flow xij ≥ 0, (i, j) ∈ T .

Step 2: Compute a set of node potentials yi for all i ∈ N .

1. Arbitrarily fix any one yi (e.g., set yi = 0).

2. Solve for the remaining ones using yj = yi + cij , (i, j) ∈ T .

Step 3: Compute the reduced costs zij = cij + yi − yj for all (i, j) /∈ T .

Step 4: If all zij ≥ 0, then STOP: the current flow is optimal.

Step 5: Otherwise, if zij < 0 for some (i, j) /∈ T , add (i, j) to T which creates a cycle. Increasethe flow on (i, j) until some other flow in the cycle reduces to 0; remove that arc from T andgo to Step 2.

Page 68: University of Colorado Denver, Fall 2011 Alexander Engauaengau/math5593/LectureNotes.pdf · Math 5593 Linear Programming Lecture Notes University of Colorado Denver, Fall 2011 Alexander

6.3. THE (PRIMAL) NETWORK SIMPLEX METHOD 67

Step 2 in the above algorithm ensures that zij = cij + yi − yj = 0 for all (i, j) ∈ T and,in particular, for all xij > 0. Hence, like the regular (primal) simplex method this algorithmmaintains primal feasibility and complementarity slackness and iteratively works its way towardsdual feasibility, which by strong duality is equivalent to optimality.

Example. As example, let us solve the transshipment problem from Section 6.1. The same networkwith an initial feasible spanning tree T = {(1, 2), (2, 3), (5, 6), (6, 4), (6, 2)} is given in Figure 6.4,where each node label (bi/yi) gives its supply/demand and the computed node potential after settingy1 = 0, and each arc label (cij/xij) or (cij/zij) corresponds to cost and current flow or reducedcosts along this arc, based on if this arc belongs to the current spanning tree or not, respectively.Clearly, xij = 0 for all thin arcs (i, j) that do not belong the current spanning tree, and zij = 0for all highlighted thick arcs (i, j) ∈ T in this tree. To further emphasize the logical dependencebetween flows and the supplies and demands, we highlight these two labels in bold font, while nodepotentials and (reduced) costs (also in logical dependence) are kept in normal font (please let meknow if you can think of any better way, that does not require to print the notes in color).

(8/0) 1 5 (3/0)

(-3/8) 3 4 (-6/8)

(-7/6) 2 6 (5/5)

6/8

1/ -7

2/3

4/4

2/10

3/3

5/3

1/2

3/6

Figure 6.4: A transshipment network with an highlighted initial feassible spanning tree, node labels(bi/yi), and arcs labels (cij/xij) for (i, j) ∈ T and (cij/zij) for (i, j) /∈ T (current cost is 89)

For the spanning tree in Figure 6.4, we note that the reduced cost z13 = −7 on arc (1, 3) isnegative so that the overall cost can be reduced if we resent some of the flow along this arc. Whenadding this arc to the current spanning tree, it is easy to find the resulting (undirected) cycleC = {(1, 3), (2, 3), (1, 2)} by inspection, and increasing the flow along arc (1, 3) we observe that weneed to simultaneously increase the flow on all arcs in the cycle that point into the same directionas (1, 3) and decrease the flow along all arcs within this cycle that point into the opposite direction,while the flow along all arcs that do not participate in the cycle remain unchanged. In particular,because the smallest amount of flow on any arc within the cycle into the opposite direction of (1, 3)is the flow x23 = 3 on arc (2, 3), we can increase x13 to at most 3 before x23 is reduced to zero,and hence chosen to leave the current spanning tree. Updating all other flows in the cycle, andrecomputing the new node potentials and reduced costs accordingly, the new spanning tree is givenin Figure 6.5.

Figure 6.5: The new feasible spanning tree with updated node and arc labels after redirecting 3 units of flow along the (undirected) cycle C = {(1, 3), (2, 3), (1, 2)} (new cost is 68)

Again we can find a negative reduced cost z34 = −1, so that we repeat steps analogous to the previous iteration: first, we add the arc (3, 4) to the current spanning tree which produces the


new (undirected) cycle C = {(3, 4), (6, 4), (6, 2), (1, 2), (1, 3)} with smallest flow x12 = 5 among all participating arcs pointing into the opposite direction of (3, 4). Hence, we redirect the maximum amount of 5 units of flow along C which reduces the flow along (1, 2) to zero, so that this arc is removed in the new spanning tree, and all other flows and node and arc labels are updated as shown in Figure 6.6, in which all reduced costs are now nonnegative showing that the new spanning tree is optimal.

Figure 6.6: The optimal spanning tree with updated node and arc labels after redirecting 5 units of flow along the (undirected) cycle C = {(3, 4), (6, 4), (6, 2), (1, 2), (1, 3)} (minimum cost is 53)

From the above example, we see that Step 5 in the Network Simplex Method results in the following easy-to-remember (primal) pivoting rule.

Step 5: Add that arc with the most negative reduced cost to the spanning tree, which creates a cycle (break ties arbitrarily), and drop that arc within the cycle that has the smallest current flow among all arcs in the cycle that point into the opposite direction of the arc that is added.

Note that this results in a new spanning tree for which the new flow and all new labels can be computed very easily using additions and subtractions only. Also be aware of the close conceptual similarity to the regular simplex method, in which we select the entering variable (here: arc) as that nonbasic variable (here: arc not in the tree) with the smallest current cost coefficient (or reduced cost) and remove that basic variable (here: arc in the tree) that decreases if we increase the entering variable (arc in opposite direction) and has the minimum ratio telling us how much we can increase the entering variable while staying feasible (here: smallest flow value with the exact same interpretation). It is then not surprising that there also exists a dual network simplex method with an analogous dual pivoting rule for the case that all reduced costs are nonnegative but some arc has a negative flow (read Section 14.4 on pages 237-240 in your text book for precise details): Drop that arc with the most negative current flow, which disconnects the spanning tree (break ties arbitrarily), and add that arc to a new spanning tree that re-connects the two disconnected subtrees and has the smallest current reduced cost among all such arcs that point into the opposite direction of the arc that is dropped. Sounds complicated, but it is quite easy if you work a handful of example problems (e.g., on the bonus assignment).
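To see the bookkeeping of Steps 2-5 in one place, here is a minimal Python sketch of a single primal network simplex pivot. The data structures (a set of tree arcs, dictionaries of flows and costs keyed by arcs) and the helper functions are illustrative assumptions of this sketch, not part of the text or of any particular library, and no attention is paid to efficiency.

    def node_potentials(nodes, tree, cost, root):
        # Step 2: fix y[root] = 0 and solve y[j] = y[i] + cost[i, j] along the tree arcs
        y = {root: 0.0}
        while len(y) < len(nodes):
            for (i, j) in tree:
                if i in y and j not in y:
                    y[j] = y[i] + cost[i, j]
                elif j in y and i not in y:
                    y[i] = y[j] - cost[i, j]
        return y

    def tree_path(tree, start, end):
        # unique path of nodes from start to end in the (undirected) spanning tree
        adj = {}
        for (i, j) in tree:
            adj.setdefault(i, []).append(j)
            adj.setdefault(j, []).append(i)
        stack, prev = [start], {start: None}
        while stack:
            u = stack.pop()
            for v in adj.get(u, []):
                if v not in prev:
                    prev[v] = u
                    stack.append(v)
        path, u = [], end
        while u is not None:
            path.append(u)
            u = prev[u]
        return path[::-1]

    def network_simplex_pivot(arcs, tree, flow, cost, root):
        # one pivot (Steps 2-5); returns updated tree and flow plus an optimality flag
        nodes = {i for a in arcs for i in a}
        y = node_potentials(nodes, tree, cost, root)
        z = {a: cost[a] + y[a[0]] - y[a[1]] for a in arcs if a not in tree}  # Step 3
        entering = min(z, key=z.get)
        if z[entering] >= 0:                                                 # Step 4
            return tree, flow, True
        i, j = entering
        forward, backward = [], []       # orient the cycle arcs relative to (i, j)
        path = tree_path(tree, j, i)
        for u, v in zip(path, path[1:]):
            if (u, v) in tree:
                forward.append((u, v))   # points in the same direction as (i, j)
            else:
                backward.append((v, u))  # points in the opposite direction
        theta = min(flow[a] for a in backward)   # assumes some backward arc exists
        leaving = min(backward, key=lambda a: flow[a])
        flow[entering] = theta
        for a in forward:
            flow[a] += theta
        for a in backward:
            flow[a] -= theta
        del flow[leaving]
        return (tree - {leaving}) | {entering}, flow, False

Calling this function repeatedly until the optimality flag is returned reproduces, on the data of the example above, exactly the sequence of trees in Figures 6.4-6.6.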

6.4 Shortest Paths and Maximum Flows

We conclude our discussion of network flows with two other important problems that often arise in practice, the shortest-path and the max-flow problem.

6.4.1 The Shortest-Path Problem

Let G = (N, A) be a network with nonnegative arc lengths cij ≥ 0 for all (i, j) ∈ A and find the shortest path from any node i ∈ N to some designated target r ∈ N (somewhat inconsistently called the "root" node in our book). As an example, we will consider the same network as in Figure 6.3 without supplies or demands and with arc lengths identical to the given transportation costs


(note that it is quite reasonable to assume that the travel distance between any two nodes i and j is closely related to their transportation cost). The following three methods can be used to solve this problem.

Method 1 (Network Flow Algorithm): Convert the problem into a minimum-cost network flow problem by adding a supply bi = 1 to each node i ≠ r, and a demand br = −∑_{i≠r} 1 = 1 − m to the root node; the optimal flow / spanning tree will be the shortest-path tree with minimal lengths v*i = y*r − y*i.

Although we may feel relatively comfortable with this first method as we know how to solve minimum-cost network flow problems using linear programming and the network simplex method, in particular, it turns out that more efficient algorithms can be developed that make better use of the special structure inherent in the shortest-path problem. Although the analysis and implementation of these algorithms require more sophisticated methods and data structures than matrices and vectors, we briefly describe two of the arguably most popular methods without further explanation or discussion (but with an example).

Method 2 (Label-Correcting Algorithm)

Step 0: Let v_i^(0) = 0 if i = r, and v_i^(0) = ∞ if i ≠ r.

Step k: For each i ≠ r, let v_i^(k) = min{ cij + v_j^(k−1) : (i, j) ∈ A }.

Termination: Stop if v_i^(k) = v_i^(k−1) for all i.

This algorithm is called label-correcting because the node labels vi may change (are "corrected") as long as the algorithm runs. Nevertheless, it can be shown that in iteration k, the node labels v_i^(k) correctly give the length of the shortest path from i to r that uses at most k arcs, which then also implies that the algorithm terminates with the overall shortest paths after at most m iterations. The full analysis of this method, which is also called a method of successive approximations, belongs to the area of dynamic programming and is postponed to the network flows class in the spring.

Example. Let us consider our standard network as shown in Figure 6.7, where we have now dropped all demands and supplies, consider the arc labels to be distances rather than costs (although an interpretation as costs would make a lot of sense, too), and have added one extra node to make the algorithm a little bit more interesting (and to illustrate one of its important properties).

Figure 6.7: The shortest-path network used for the examples in Section 6.4.1

The label-correcting algorithm can be organized nicely in tabular form as shown below, where the main entries indicate the current node labels v_i^(k), and the entries in parentheses indicate the node j to which a walker choosing the shortest path would be heading next; these successors can be updated together with the labels in an obvious manner.


node/step   0     1       2        3        4        [5]        . . .
1           ∞     ∞       5 (3)    5 (3)    5 (3)    [5 (3)]    . . .
2           ∞     ∞       6 (3)    6 (3)    6 (3)    [6 (3)]    . . .
3           ∞     4 (4)   4 (4)    4 (4)    4 (4)    [4 (4)]    . . .
4           0     0       0        0        0        [0]        . . .
5           ∞     ∞       8 (6)    8 (1,6)  8 (1,6)  [8 (1,6)]  . . .
6           ∞     3 (4)   3 (4)    3 (4)    3 (4)    [3 (4)]    . . .
7           ∞     ∞       10 (6)   9 (5)    9 (5)    [9 (5)]    . . .

It is clear (convince yourself!) that once all labels have finite values and repeat, they will not change anymore, so that we could have terminated the above algorithm in fewer than m iterations. In general, however, the labels at all (!) nodes (but the root) can still change until the very last iteration, which motivates the algorithm's classification as label-correcting. Now you will also understand why we added the extra node 7, because the initial network was "too simple" and would not have corrected any label once it is set. In larger networks (such as the one in Exercise 2 on the bonus assignment), this will happen much more often! Finally, also note from this example that the shortest path does not need to be unique, and that we actually have a choice between adding arc (5, 1) or (5, 6) to the shortest-path spanning tree, with these two alternatives shown in Figure 6.8.

Figure 6.8: The two shortest-path spanning trees of the examples in Section 6.4.1 with optimal node labels (the two alternatives are indicated by the dashed arcs)

Method 3 (Label-Setting Algorithm) Different from the above method and first proposed by Dijkstra (1959), this algorithm maintains a set F of finished nodes with correct node labels v*i, i ∈ F, and in each iteration sets one new label vj to its correct value, which is not changed anymore as the algorithm proceeds. Starting with F = ∅ and node labels vi as in Method 2, in each iteration the algorithm selects a still unfinished node with smallest label j ∈ arg min{vi : i /∈ F}, fixes this label by adding this node to the set of finished nodes, F ← F ∪ {j}, and updates every still unfinished node i /∈ F with (i, j) ∈ A by setting vi ← min{vi, cij + vj}. Clearly, this algorithm terminates once F = N after exactly m iterations.
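For comparison with the label-correcting sketch above, the following is a minimal Python sketch of the label-setting method; again the input format is an assumption, and no priority queue is used since the emphasis here is on the idea rather than on efficiency.

    import math

    def label_setting(nodes, arcs, length, root):
        # Dijkstra's method: one label is fixed per iteration (m iterations in total)
        v = {i: 0.0 if i == root else math.inf for i in nodes}
        finished = set()
        while len(finished) < len(nodes):
            # select a still unfinished node with smallest label and finish it
            j = min((i for i in nodes if i not in finished), key=lambda i: v[i])
            finished.add(j)
            # update every still unfinished node i with (i, j) in A
            for (i, k) in arcs:
                if k == j and i not in finished:
                    v[i] = min(v[i], length[i, j] + v[j])
        return v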

Example. Figure 6.9 shows all intermediate networks with both newly fixed and updated node labels after each iteration of Dijkstra's algorithm. The boxed node labels correspond to the smallest label among all unfinished nodes and indicate which node will be finished (added to F) in the following iteration. Clearly, after seven iterations the final shortest-path spanning tree will coincide with one of the two optimal trees found using the label-correcting algorithm and already shown in Figure 6.8.

6.4.2 The Maximum-Flow Problem

Let G = (N, A) be a network with positive arc capacities (no costs) uij > 0 for all (i, j) ∈ A and find the maximum flow that can be sent from some designated source node s ∈ N to some sink node t ∈ N. As an example, we again consider the same network as in Figure 6.3 without supplies or


Figure 6.9: Six iterations of Dijkstra's algorithm for finding a shortest path (the finished sets grow as F = {4}, {4, 6}, {4, 6, 3}, {4, 6, 3, 1}, {4, 6, 3, 1, 2}, {4, 6, 3, 1, 2, 5})

demands and with upper capacities identical to the given transportation costs (without any logical reasoning other than to keep things simple).

If we let s = 1 and t = 2 be the source and sink, respectively, then it is relatively easy to see that the maximum-flow value is 7, obtained by sending 6 units of flow along arc (1, 2) and 1 unit of flow along the path 1−3−4−5−6−2. In particular, this flow must be maximal because it coincides with the maximum flow that can leave the source node 1 along arcs (1, 2) and (1, 3), and similarly, the maximum flow that can arrive at the sink node 2 from arcs (1, 2) and (6, 2) (note, however, that this is a pure coincidence: if we increase both u13 and u62 from 1 to 2 but decrease at least one of u34, u45, or u56 to 1, then the maximum-flow value is still 7 although the maximum amount of flow that could principally leave the source or enter the sink is now 8). Nevertheless, we may say that 7 is the maximum amount that can flow across the cut between nodes {1} and {2, 3, 4, 5, 6}, or {1, 3, 4, 5, 6} and {2} (note that 7 is also the maximum amount that can flow across the cuts {1, 3} and {2, 4, 5, 6} if we decrease u34 to 1, {1, 3, 4} and {2, 5, 6} if we decrease u45 to 1, and {1, 3, 4, 5} and {2, 6} if we decrease u56 to 1). No coincidence anymore, we introduce the following formal definition.

Definition. An s−t cut C ⊂ N is a set of nodes such that s ∈ C and t /∈ C. The capacity of C is defined by κ(C) = ∑_{i∈C, j∉C} uij.

Example. The following table lists several (but not all) cuts together with their capacities in the original network of Figure 6.10.


Figure 6.10: A network with positive arc capacities uij > 0

cut                      capacity
C = {1}                  κ(C) = u12 + u13 = 6 + 1 = 7
C = {1, 3}               κ(C) = u12 + u34 = 6 + 4 = 10
C = {1, 3, 4}            κ(C) = u12 + u45 = 6 + 2 = 8
C = {1, 3, 5}            κ(C) = u12 + u34 + u56 = 6 + 4 + 5 = 15
C = {1, 3, 4, 5}         κ(C) = u12 + u56 = 6 + 5 = 11
C = {1, 3, 4, 5, 6}      κ(C) = u12 + u62 = 6 + 1 = 7
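These cut capacities are easy to verify by a direct computation, as in the short Python sketch below. The capacity dictionary only contains the arcs that actually appear in the table (the remaining arcs of Figure 6.10 would be added analogously), so it is an illustrative assumption rather than the complete network.

    def cut_capacity(cut, capacity):
        # kappa(C): sum of u_ij over all arcs (i, j) with i in C and j not in C
        return sum(u for (i, j), u in capacity.items() if i in cut and j not in cut)

    # arcs and capacities appearing in the table above (a subset of Figure 6.10)
    u = {(1, 2): 6, (1, 3): 1, (3, 4): 4, (4, 5): 2, (5, 6): 5, (6, 2): 1}
    print(cut_capacity({1}, u))                  # 7  = u12 + u13
    print(cut_capacity({1, 3, 4}, u))            # 8  = u12 + u45
    print(cut_capacity({1, 3, 4, 5, 6}, u))      # 7  = u12 + u62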

This example shows that there is no immediate relationship between the cardinality of a cut and its respective cut capacity. However, it suggests that the majority (and possibly all) of the cut capacities are at least as large as the max-flow value of 7 in this particular example. As it turns out, this is true in general.

Theorem (Max-Flow Min-Cut Theorem). The maximum flow value between any two nodes s and t equals the minimum cut capacity κ(C) over all s−t cuts C.

Proof. The proof comprises two major steps (and one embedded lemma) similar in spirit to the proof of weak and strong duality in Section 3.2.

Step 1: Convert max-flow to a capacitated network flow problem. Let bi = 0 for all i ∈ N, cij = 0 for all (i, j) ∈ A, and introduce a new (backward) arc from t to s with cts < 0 and uts = ∞. Then it is clear that a minimum-cost flow will attempt to ship as much as possible from node t along the auxiliary arc (t, s) back to s which, however, needs to be returned through the network and thus coincides with the (regular) maximum flow between nodes s and t. To conclude the first step, now let C be any s−t cut in the original network [include figure] so that

xts = ∑_{i∈C, j∉C} xij − ∑_{i∉C, j∈C} xij

by flow conservation, which implies that

xts ≤ ∑_{i∈C, j∉C} xij ≤ ∑_{i∈C, j∉C} uij = κ(C).

Note that this inequality establishes weak duality between the max-flow and the min-cut problem. To establish equality at optimality, now let x*ij, (i, j) ∈ A, be an optimal flow and construct a cut C* such that x*ts = ∑_{i∈C*, j∉C*} x*ij − ∑_{i∉C*, j∈C*} x*ij = κ(C*). To do so, we need the following lemma.

Lemma (Optimality Conditions for Capacitated Network Flow Problems). Let xij, (i, j) ∈ A, be an optimal flow for the capacitated network flow problem, and let y be a set of optimal node potentials. Then

zij = cij + yi − yj  ≥ 0  if xij = 0,
zij = cij + yi − yj  ≤ 0  if xij = uij,
zij = cij + yi − yj  = 0  if 0 < xij < uij.


Step 2 and Proof of Lemma: Convert the capacitated network flow problem to an uncapacitated problem. Rewrite the capacitated constraints 0 ≤ xij ≤ uij by introducing slack variables tij as xij + tij = uij with xij ≥ 0 and tij ≥ 0, and then note that each xij appears in exactly three constraints: the above constraint and the two flow conservation constraints at nodes i and j

. . . − xij . . .       = −bi
. . . + xij . . .       = −bj
      xij + tij         = uij

[network representation: an arc (i, j) carrying flow xij with label (cij, uij) between nodes i and j with supplies bi and bj]

Then subtracting the third from the second constraint in order to restore that each term xij and tij occurs with opposite signs in exactly two places, we obtain the equivalent constraints and their network representation

. . . − xij . . .       = −bi
. . . − tij . . .       = −bj − uij
      xij + tij         = uij

[network representation: a new node k with supply −uij, an arc from i to k carrying xij with label (cij, ∞), and an arc from j to k carrying tij with label (0, ∞); node j now carries supply bj + uij]

The proof of the lemma now follows in three steps:

1. If xij = 0, then tij = uij and by complementary slackness cij + yi − yk ≥ 0 and yj − yk = 0, so cij + yi − yj ≥ 0.

2. If xij = uij, then tij = 0 and cij + yi − yk = 0 and yj − yk ≥ 0, so cij + yi − yj ≤ 0.

3. If 0 < xij < uij, then 0 < tij < uij and cij + yi − yk = yj − yk = 0, so cij + yi − yj = 0. □

To continue with the proof of the theorem, now define C* = {k : y*k ≤ y*s} and show that C* is an s−t cut with κ(C*) = x*ts. For the former, we observe that x*ts < uts = ∞ which then implies that cts + y*t − y*s ≥ 0, or equivalently, that y*t ≥ y*s − cts > y*s because cts < 0, showing that s ∈ C* and t /∈ C* as desired. For the latter, we first note from the above lemma that

(*) cij + yi − yj > 0 ⇒ xij = 0   and   (**) cij + yi − yj < 0 ⇒ xij = uij

and then consider any arc (i, j) with i ∈ C* and j /∈ C*. By definition of C* we know that y*i ≤ y*s < y*j and cij = 0, yielding cij + y*i − y*j < 0 and thus x*ij = uij by (**). Similarly, if i /∈ C* and j ∈ C*, then y*j ≤ y*s < y*i and cij + y*i − y*j > 0, which implies that x*ij = 0 by (*). Putting things together, we finally see that

x*ts = ∑_{i∈C*, j∉C*} x*ij − ∑_{i∉C*, j∈C*} x*ij = ∑_{i∈C*, j∉C*} uij = κ(C*) □

The above proof also indicates how we could solve a maximum-flow problem: simply convert it into a capacitated or uncapacitated transshipment problem and (for the current lack of a better algorithm) solve it using the network simplex method! While this is sufficient for a class on linear programming, more efficient max-flow algorithms that do not rely on LP but utilize the special network structure exist and are typically part of a separate course on network flows.
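The conversion used in Step 1 of the proof is also simple to write down explicitly. The short sketch below sets up the data of the equivalent minimum-cost (transshipment) problem; the finite stand-in "big" for the infinite capacity of the auxiliary arc and the cost value −1 are assumptions of this sketch.

    def maxflow_as_mincost(nodes, arcs, capacity, s, t, big=10**9):
        # max-flow from s to t as a min-cost flow: zero supplies and costs on the
        # original network plus an auxiliary backward arc (t, s) with negative cost
        # and "infinite" capacity
        b = {i: 0 for i in nodes}                  # b_i = 0 for all i
        cost = {a: 0 for a in arcs}                # c_ij = 0 for all original arcs
        upper = dict(capacity)
        cost[(t, s)] = -1                          # any c_ts < 0 works
        upper[(t, s)] = big                        # u_ts stands in for infinity
        return b, cost, upper

A minimum-cost flow for this data ships as much as possible along (t, s) and therefore recovers the maximum s−t flow, exactly as argued in the proof.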

6.5 Problem Set 6

Exercise 6.1. (Network Flow Modeling) [5 points] Represent the LP in Problem 15.2 on page 264 in your book as a network flow problem, and use a network flow algorithm to solve it.

Exercise 6.2. (Shortest Path Trees and Reliability Issues in a Telecommunication System) [5 points] The network shown below represents a telecommunications system in which the nodes represent stations and the arcs are labeled with transmission times between these stations.


(a) Use a shortest path algorithm to find a shortest path tree T rooted at node 1 of this network. (Interpret the shortest path tree as giving the shortest paths from node 1 to all other nodes.)

(b) Is there a unique shortest path tree (rooted at node 1) for this problem? Justify your answer.

(c) Because of equipment failures at node 5, the transmission times on all arcs entering or leaving node 5 are increased by α. Determine the largest value of α so that the tree T found in part (a) remains a shortest path tree. Carefully justify your answer.

Exercise 6.3. (A Minimum-Cost Network Flow Problem) [5 points] Solve Problem 14.6 on pages 246/247 in your book starting from an initial spanning tree that is (primal) feasible.

Exercise 6.4. (And Yet Another Pivoting Challenge) [5 points] Independently read Section 14.4 on pages 237-240 in your book, and then do the Network Simplex Pivoting Challenge in Problem 14.8 on page 247 using the network simplex online pivot tool with the following settings:

Nodes: 7, seed: 0909, no. of probs: 5, instructor's email: [email protected]
URL: http://campuscgi.princeton.edu/~rvdb/JAVA/network/challenge/netsimp.html

You may again take the challenge multiple times to get familiar with the tool or to improve your previous scores, if you wish. Only the last score submitted before the due date of this assignment will count!


Chapter 7

Structural Optimization


Chapter 8

Student Projects

8.1 Game Theory


8.2 Statistical Regression


8.3 Financial Applications


Part III

Interior-Point Methods


Chapter 9

The Interior-Point Revolution

Recommended Readings (if full access to the articles by Eugene Lawler and Margaret Wright is restricted from your domain, PDF versions of these two articles are also posted on Blackboard.)

Disclaimer: I have not checked all of the below sources for correctness. If you happen to find any wrong references or miscitations, or any other articles of interest, please let me know. Thank you!

• The New York Times (frontline), November 7, 1979: A Soviet Discovery Rocks World of Mathematics

• The New York Times, November 27, 1979: Soviet Mathematician is Obscure No More


• Jonathan Weiner, Science Writers Rock World of Mathematics (Tales of the Traveling Salesman Problem), National Association of Science Writers (NASW) Newsletter 28 (May 1980), no. 2, 1–5.


• Eugene L. Lawler, The great mathematical sputnik of 1979, The Mathematical Intelligencer 2 (December 1980), no. 4, 191–198.


• The New York Times, November 19, 1984

• Time Magazine, December 3, 1984


• The Wall Street Journal, July 18, 1986

• Business Week, September 21, 1987


• The New York Times, May 14, 1988

• The Wall Street Journal, August 15, 1988


• Margaret H. Wright, The interior-point revolution in optimization (History, recent developments, and lasting consequences), Bulletin of the American Mathematical Society (New Series) 42 (2005), no. 1, 39–56.

• The New York Times (online), May 23, 2005: Leonid Khachiyan, 52; Helped to Advance Computer Math

Leonid Khachiyan, a Russian-born mathematician who helped to advance the field of linear programming, which is used by computer scientists to schedule complex rosters of airline flights and to solve problems in finance and industry, died on April 29 at his home in South Brunswick, N.J. He was 52. [. . . ] Computer scientists and mathematicians say his work helped revolutionize his field. [. . . ] In 1979, while a researcher at the academy, Dr. Khachiyan published a paper in a Russian mathematics journal that helped to demonstrate that certain problems in linear programming could be solved practically and in a reasonable amount of computing time. Computer scientists had previously relied on a method using the simplex algorithm to review and order vast stores of information. In his paper "A Polynomial Algorithm in Linear Programming," Dr. Khachiyan proposed using an ellipsoid algorithm in approaching theoretical problems believed to be too demanding for the simplex method, and "turned the field on its head," said Dr. Michael D. Grigoriadis, a professor of computer science at Rutgers. [. . . ] Dr. Khachiyan's algorithm received widespread acclaim for its ingenuity. It was subsequently refined and improved by other mathematicians and computer scientists, and has applications in finance, engineering and industry, where it is employed to calculate transportation routes and cost-effective ways of allocating resources. In later work, Dr. Khachiyan studied cyclic games, which have applications in artificial intelligence, matrix games and polytopes, which are regions of space defined by hyperplanes. [. . . ] Jeremy Pearce


Chapter 10

The Affine-Scaling Method

You will agree that there are not too many other mathematical disciplines besides Linear Programming that can claim to first (in 1975) win a Nobel Prize in Economics shared between a Dutch-born American and a Soviet mathematician, and a few years later (in 1979) witness how "a Soviet discovery rocks world of mathematics," causing a screaming press to declare a potential threat to the western world with "American defense experts wringing their hands worrying about its applications to secret codes, weather forecasting, and Kremlin-only-knows what else." Good stuff!

Although the fundamentals of interior-point methods (IPMs) had been established several years before Leonid Khachiyan (in 1979) and then Narendra Karmarkar (in 1984) proposed their algorithms for solving linear programs, as outlined in the excellent survey article by Wright (2005), it is true that Khachiyan's ellipsoid algorithm was the first algorithm for LP actually proven to be theoretically efficient. Nevertheless, it turned out that although the simplex method is highly inefficient for some pathological examples (including the so-called Klee-Minty examples by Klee and Minty (1972) that we didn't cover in class; see Section 4.4 in Vanderbei (2008)), in comparison to this new method it was clearly superior on average and, in particular, on essentially any practical problem that would occur in a real-life context. This picture changed with Karmarkar's algorithm, which is efficient in both theory and practice and nowadays widely considered the first interior-point method for linear programming (strictly speaking, the ellipsoid method by Khachiyan is not an interior-point method). Since it is quite technical to describe and based on several ideas from nonlinear programming, in this class we restrict ourselves to studying only a simplified version of Karmarkar's original algorithm called the affine-scaling method.

10.1 A Generic Linear Programming Algorithm

For most of this chapter, let us consider a maximization LP with equality and nonnegativity constraints

max cTx s.t. Ax = b, x ≥ 0

At the beginning of Section 2.1, we had outlined the following generic procedure for solving a linear program, or similarly for any other type of mathematical optimization problem.

Phase I: Find an initial feasible point x0 so that Ax0 = b and x0 ≥ 0.

Phase II: Find an optimal feasible point x∗ so that cTx∗ ≥ cTx for all x that are feasible, by iteratively repeating the following four steps.

Step 1: Check optimality of the current point x; if x is optimal, stop with x∗ = x.

Step 2: If x is not optimal, then find a feasible direction ∆x of improvement so that cT∆x > 0 (ascent direction) and A∆x = 0 (so that A(x + ∆x) = b).

Step 3: Find a “feasible” step length θ > 0 so that x+ θ∆x ≥ 0.

Step 4: Take a step and update x← x+ θ∆x; then go back to Step 1.


We will now discuss each step of the above procedure (in reverse order) and show that the affine-scaling method is based on very natural choices for both step lengths and step directions. Clearly, there is not much to say about Step 4, which is an elementary college algebra operation, so let us start with Step 3, which will be quite similar to the simplex method. Initially solving the vector inequality x + θ∆x ≥ 0 for θ and using that x ≥ 0, we find that

θ ≥ −xj/∆xj  for ∆xj > 0    and    θ ≤ −xj/∆xj  for ∆xj < 0
⇒  θ ≤ min{ −xj/∆xj : ∆xj < 0 } = ( max_j { ∆xj/(−xj) } )−1

where the last equality is true whenever ∆x ≱ 0 (i.e., whenever some ∆xj < 0); otherwise it is easy to see that the problem is unbounded. From the above, it now seems reasonable to choose θ as large as possible to gain the maximum improvement in direction of ∆x while staying feasible, and surely enough the simplex method does exactly that, which also ensures that at least one value of the current basic variables drops to zero so that the new iterate again lies at an extreme point of the boundary of the feasible region. The ingenuity of IPMs is to recognize that this strategy is not always the best, and that it may be beneficial to choose smaller step sizes and temporarily remain in the interior of the feasible set. Hence, these methods typically choose an additional parameter 0 < r < 1 and often set

θ = r · min_j { 1, xj/|∆xj| } ≤ r · min{ xj/(−∆xj) : ∆xj < 0 }

which reduces the largest possible step size in three ways: first by the additional parameter r, second by limiting θ to never be larger than 1, and third by taking the minimum over all indices j including those for which ∆xj ≥ 0 (interpreting division by zero to be infinite). Not all of these safeguards are necessary in practice, in general, but they substantially simplify the convergence proofs of these methods. Still an ambitious endeavor, however, we postpone the theoretical analysis of IPMs for the advanced linear programming course in the spring.
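As a small illustration, this step size rule is essentially a one-liner; the following Python/NumPy sketch is only meant to make the rule concrete, and the default value of r is an assumption.

    import numpy as np

    def step_length(x, dx, r=0.9):
        # theta = r * min{ 1, x_j / |dx_j| }, reading the ratio as +infinity when dx_j = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            ratios = np.where(dx != 0, x / np.abs(dx), np.inf)
        return r * min(1.0, ratios.min())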

10.2 The Affine-Scaling Step Direction

In comparison to the simplex method, or more generally "active-set" methods that maintain iterates at the intersection of subsets of active constraints and hence significantly restrict the choice of feasible ascent directions, one of the nice properties of IPMs, whose iterates lie in the interior of the feasible region, is that their choice of feasible directions is much more flexible (like a basketball player striving to score who better stays in the interior of the field where he can move around his opponents, rather than getting stuck in a corner where his possible moves are much more limited). Clearly, the best possible ascent direction for an LP with the objective to maximize cTx is its normal vector c itself, which points orthogonally into the positive halfspace associated with each level curve ζ = cTx. This important observation also follows from the elementary result in analysis that the steepest ascent direction of any differentiable function f(x) is given by its gradient ∇f(x), and because ∇cTx = c we also call ∆x = c the steepest ascent direction.

Although IPMs permit a larger choice of feasible directions ∆x than active-set methods, this choice is still not unlimited because it remains necessary to guarantee that the new iterate x + θ∆x satisfies the equality constraints A(x + θ∆x) = Ax + θA∆x = b so that A∆x = 0, or equivalently, so that ∆x belongs to the null space N(A) = {d ∈ Rn : Ad = 0} of the linear mapping described by the constraint matrix A ∈ Rm×n. Hence, if Ac ≠ 0 so that the steepest ascent direction is not feasible, we simply define the new direction as the projection of c onto the null space of A.

Proposition. Let the matrix A ∈ Rm×n with full row rank (so that AAT is invertible) be given and define P = I − AT(AAT)−1A ∈ Rn×n. Then P is the matrix that maps any vector c ∈ Rn to its orthogonal projection onto the null space of A.

Proof. We need to show that Pc ∈ N(A) and dT(c − Pc) = 0 for any vector d ∈ N(A), where the former follows immediately from

APc = A(I −AT (AAT )−1A)c = Ac−AAT (AAT )−1Ac = Ac−Ac = 0


Only slightly more difficult, for the latter let d ∈ Rn be any vector in the null space of A, so Ad = 0, and then observe that

dT(c − Pc) = dTc − dT(I − AT(AAT)−1A)c = dTc − dTc + dTAT(AAT)−1Ac = 0,

where the last term vanishes because dTAT = (Ad)T = 0. □

The new direction ∆x = Pc is called the projected gradient and, by definition, corresponds to the direction of steepest ascent among all feasible directions at the current iterate x. Note, however, that this direction does not depend on x and therefore always points into the same direction independent of the current iterate. In particular, as the algorithm proceeds, certain components of these new iterates (namely some of those variables that are nonbasic) will get closer and closer to zero and eventually lead to infinitesimally small step sizes although possibly still quite far from an optimal solution (this is especially true if the components of the initial point are of very different size). Using a quite simple idea to account for these differences, the main concept underlying affine scaling is to re-scale the original problem in each iteration so as to achieve that the current point x is mapped to the vector of all ones e = (1, 1, . . . , 1)T ∈ Rn with all variables of equal value. This is achieved using the fundamental observation that if x is the current iterate for the original LP, then ξ = e is feasible for the scaled LP

max cTXξ s.t. AXξ = b, ξ ≥ 0

where the matrix X = Diag(x) ∈ Rn×n is defined as the diagonal matrix with the elements of x on its diagonal, Xjj = xj for all j and Xij = 0 if i ≠ j (this notation is quite common in the discussion of IPMs, and you will get used to it very quickly). The only (but now trivial) remaining thing to do is then to define the projected gradient in terms of this new problem

∆x = X∆ξ = X( I − XAT(AX2AT)−1AX ) Xc = ( D − DAT(ADAT)−1AD ) c

where we have set D = X2 and used that XT = X because X is diagonal. The new direction ∆x is also called the projected gradient direction with (primal) scaling, or the affine-scaling direction.
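In matrix-vector terms this direction is straightforward to compute; a minimal NumPy sketch is given below. It assumes that A has full row rank and solves a linear system instead of forming the inverse explicitly, which is not prescribed by the text but is the usual numerical practice.

    import numpy as np

    def affine_scaling_direction(A, c, x):
        # dx = (D - D A^T (A D A^T)^{-1} A D) c with D = X^2 = diag(x)^2
        D = np.diag(x**2)
        AD = A @ D
        w = np.linalg.solve(AD @ A.T, AD @ c)      # w = (A D A^T)^{-1} A D c
        return D @ c - AD.T @ w                    # note D A^T = (A D)^T since D is diagonal

By construction, A applied to this direction vanishes (up to rounding), so a step along it keeps the equality constraints satisfied.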

10.3 Termination and Phase-II Algorithm

Based on the above affine-scaling direction that ensures improvement of the objective function and feasibility, the adopted step-size rule that ensures (strict) interiority with respect to the nonnegativity constraints x ≥ 0, and assuming that we start from a strictly positive initial point x0 > 0, it is clear that each subsequent iterate will also be feasible and remain strictly positive. However, we know that many linear programs have (possibly unique) optimal solutions for which at least n − m variables are zero, and hence it may seem (and is correct) that IPMs will never reach an optimal solution. In fact, IPMs only find approximate solutions and usually need to be terminated before reaching optimality, as soon as the current solution is sufficiently close to an optimal solution (if one exists). Some simple stopping criteria for general IPMs including the affine-scaling method are to terminate the algorithm if the current improvement in the objective and/or the current change in x is smaller than some (sufficiently small) threshold ε > 0 (often ε = 10−8)

max{ cT∆x, ‖∆x‖ } < ε   for some vector norm ‖·‖

with the conclusion that the current solution is sufficiently close to optimal; if the largest component ‖x‖∞ = maxi{xi} > M exceeds some big number M ≫ 1, with the conclusion that the original problem is unbounded; or if a prespecified maximum number of iterations has been reached without either of the two former conclusions (especially common in practice if computational resources or available waiting time are scarce). Putting things together, the Phase-II affine-scaling algorithm can be formulated as follows.

Step 0: Start from an initial feasible and strictly positive point x = x0 > 0 and set the step length parameter 0 < r < 1, an optimality threshold ε > 0, an unboundedness threshold M ≫ 1, and a maximum number of iterations K. Let k = 1 and go to Step 2.


Step 1: Stop if max{cT∆x, ‖∆x‖} < ε (the current x is (approximately) optimal), if ‖x‖∞ > M (the LP is unbounded), or if k = K (the maximum number of iterations is reached). Otherwise increment k ← k + 1.

Step 2: Let D = X2 and compute the affine-scaling direction ∆x = (D −DAT (ADAT )−1AD)c.

Step 3: Compute the step size θ = r · min_j { 1, xj/|∆xj| } (or an alternative step size rule).

Step 4: Update the current iterate to x← x+ θ∆x and go to Step 1.

Clearly, if actually implementing this algorithm, Step 1 should be put after Step 4. The above structure, however, is consistent with the algorithm as initially outlined in Section 10.1.
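Putting Steps 0-4 together, a compact Python sketch of the Phase-II affine-scaling algorithm might look as follows; the default parameter values are assumptions of this sketch, and the termination test is performed after the direction has been computed, as suggested above.

    import numpy as np

    def affine_scaling(A, b, c, x0, r=0.5, eps=1e-8, M=1e8, K=1000):
        # Phase II for max c^T x s.t. Ax = b, x >= 0; x0 must satisfy Ax0 = b, x0 > 0
        x = np.asarray(x0, dtype=float).copy()
        for k in range(K):
            D = np.diag(x**2)                                   # Step 2: D = X^2
            AD = A @ D
            dx = D @ c - AD.T @ np.linalg.solve(AD @ A.T, AD @ c)
            if max(c @ dx, np.linalg.norm(dx)) < eps:           # Step 1: (approx.) optimal
                return x, "optimal"
            if np.linalg.norm(x, np.inf) > M:                   # Step 1: unbounded
                return x, "unbounded"
            with np.errstate(divide="ignore", invalid="ignore"):
                ratios = np.where(dx != 0, x / np.abs(dx), np.inf)
            theta = r * min(1.0, ratios.min())                  # Step 3
            x = x + theta * dx                                  # Step 4
        return x, "iteration limit reached"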

Theorem (Vanderbei, 2008). The above algorithm converges to an optimal solution (if one exists) for all r ≤ 2/3, and for all r < 1 if both the primal and the dual problem are non-degenerate.

According to Vanderbei (2008), there is currently only one example known for which an optimal solution exists and the above algorithm converges to a non-optimal solution for all r > 0.995, but correctly identifies this optimal solution for all r ≤ 0.995. This essentially means that if r is chosen too large, and the algorithm therefore takes relatively large steps in each iteration, then it is possible in principle that the algorithm approaches the boundary of the feasible region too fast and ultimately gets stuck at a non-optimal point.

10.4 Initialization and Phase-I Algorithm

To complete the above algorithm of the affine-scaling method, we need to address the question of how to find an initial point that is feasible and strictly positive. Similar to the Phase-I method of the simplex method, this task can be accomplished by solving an auxiliary problem. Precisely, let x0 > 0 be any strictly positive point (which is easy to pick), define its current residual ρ = b − Ax0, and then consider the problem

max −x₀ s.t. Ax + x₀ρ = b, x ≥ 0, x₀ ≥ 0

(note that the scalar x₀ is an auxiliary variable and different from the initial point x0). It is easy to see that the point (x, x₀) = (x0, 1) > 0 is feasible for the above problem and strictly positive, so that we can apply the affine-scaling method to find an optimal solution (x∗, x₀∗) if it exists. In particular, the original problem will have a feasible solution only if x₀∗ = 0, and be infeasible otherwise (there are some additional technical details because x₀∗ will never be exactly equal to 0, but we will omit such details for this course). Nevertheless, it is instructive (not so much now but for later) to look at the affine-scaling direction resulting from solving this auxiliary Phase-I problem for which

Ā = [A  ρ] ∈ R^(m×(n+1)),   c̄ = [0; −1] ∈ R^(n+1),   and   x̄ = [x; x₀] ∈ R^(n+1)   (where [·; ·] stacks blocks vertically)

so that X̄ = [X 0; 0 x₀] ∈ R^((n+1)×(n+1)). Then setting D̄ = X̄² = [X² 0; 0 x₀²] = [D 0; 0 x₀²] and using a little bit of algebra, we get

∆x̄ = [∆x; ∆x₀] = ( D̄ − D̄Āᵀ(ĀD̄Āᵀ)−1ĀD̄ ) c̄
    = ( [X² 0; 0 x₀²] − [X² 0; 0 x₀²][Aᵀ; ρᵀ] ( [A  ρ][D 0; 0 x₀²][Aᵀ; ρᵀ] )−1 [A  ρ][D 0; 0 x₀²] ) [0; −1]
    = [0; −x₀²] + [X²Aᵀ; x₀²ρᵀ] (ADAᵀ + x₀²ρρᵀ)−1 x₀²ρ

and, in particular, ∆x = DAᵀ(ADAᵀ + x₀²ρρᵀ)−1ρ, where we could drop the scalar x₀² without affecting the direction of ∆x (only its length, but this can be accounted for later by a proper choice


of step length). In fact, we can make this direction completely independent of x₀ by using one variant of the so-called Sherman-Morrison-Woodbury formula (Sherman and Morrison, 1950) or Matrix Inversion Lemma (Bartlett, 1951).

Lemma (Matrix Inversion Lemma / Sherman-Morrison-Woodbury Formulas). Let E ∈ Rm×m, D ∈ Rk×k, U ∈ Rm×k, and V ∈ Rk×m, and assume that D, E, E + UDV, and D−1 + V E−1U are invertible matrices. Then the following matrix identities hold true.

1. (E + UDV)−1 = E−1 − E−1U(D−1 + V E−1U)−1V E−1

2. (E + UDV)−1E = I − (E + UDV)−1UDV

In particular, if u and v are two vectors in Rm and if 1 + vTE−1u ≠ 0, then

3. (E + uvT)−1 = E−1 − E−1u(1 + vTE−1u)−1vTE−1

4. (E + uvT)−1u = αE−1u for some scalar α

Proof. We will only prove the second identity and leave the remaining three for your own amusement as part of your homework assignment. The main idea (in each case) is to reduce one side (here the left) of the equality to the identity matrix by appropriate matrix multiplications from the left (here by E + UDV) and/or the right (here by E−1), and then simplify the other side (here the right-hand side) until it reduces to the identity matrix itself. Following this recipe, we obtain

(E + UDV)[ I − (E + UDV)−1UDV ]E−1 = (E + UDV)E−1 − UDV E−1 = I + UDV E−1 − UDV E−1 = I □

In the above lemma, the first equality is also known as the matrix inversion lemma, (Sherman-Morrison-)Woodbury formula, or Woodbury matrix identity (pick your favorite but remember the others). In terms of linear algebra, and very useful in numerical and statistical analysis (e.g., Kalman filters and recursive least-squares methods), it states that the inverse of a rank-k correction of some matrix E can be computed by doing a rank-k correction to the inverse E−1 of the original matrix. In particular, if the inverse of E is already known, then this computation is much cheaper than finding the new inverse from scratch, especially if the matrix D is of smaller dimension than E (which is often the case in practice, e.g., in case of the Kalman filter where the matrix D has the dimension of the vector of new observations, which can be as small as 1 if only one new observation is processed at a time). In the special case where D is an identity matrix I, the matrix I + V E−1U on the right-hand side of the Woodbury formula is also called the capacitance matrix. In the (even more) special case where D is the 1 × 1 unit matrix (a fancy way to say that D = 1), this identity reduces to the Sherman-Morrison formula, which is the third of the above equalities. In this case, we can manipulate individual columns or rows of E by using unit vectors for u or v: if u is a unit vector, then only one row of E is modified, and similarly, if v is a unit vector, then only one column of E; in particular, if both u and v are unit vectors, then only a single element of E is changed, and in each case the inverse E−1 can be updated correspondingly in a relatively cheap manner.
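Since these identities are easy to get wrong by one transpose, a quick numerical sanity check is often worthwhile. The snippet below verifies the Sherman-Morrison formula (the third identity) on random data; this is of course only a plausibility check on arbitrary test matrices, not a proof.

    import numpy as np

    rng = np.random.default_rng(0)
    E = rng.standard_normal((5, 5)) + 5 * np.eye(5)   # a well-conditioned test matrix
    u = rng.standard_normal(5)
    v = rng.standard_normal(5)
    Einv = np.linalg.inv(E)
    lhs = np.linalg.inv(E + np.outer(u, v))
    rhs = Einv - np.outer(Einv @ u, v @ Einv) / (1.0 + v @ Einv @ u)
    print(np.allclose(lhs, rhs))                      # True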

Now on to three pieces of good news: First, these identities (if they are not already) will hopefully become very powerful "tricks" in your mathematical tool bag. Second, and even better, you will get your own shot at proving them in Exercise 6.2 of your final homework assignment. Third, and probably least, we can make use of the fourth identity with E = ADAᵀ, u = x₀²ρ, and v = ρ to now obtain that there exists a scalar α so that (ADAᵀ + x₀²ρρᵀ)−1ρ = α(ADAᵀ)−1ρ, where α can be dropped as well because it only affects the direction's length, which we are somewhat careless about. This allows us to define the Phase-I affine-scaling direction towards feasibility independently of x₀ and quite innocently-looking as

∆x = DAᵀ(ADAᵀ)−1ρ.

As time permits, we will rediscover this direction later in our discussion of primal-dual methods.
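For completeness, the Phase-I direction also fits in a few lines of NumPy; as before, full row rank of A is assumed and a linear system is solved instead of forming the inverse.

    import numpy as np

    def phase1_direction(A, b, x):
        # dx = D A^T (A D A^T)^{-1} rho with D = X^2 and rho = b - Ax (feasibility residual)
        D = np.diag(x**2)
        rho = b - A @ x
        return D @ A.T @ np.linalg.solve(A @ D @ A.T, rho)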


10.5 Affine-Scaling for LPs with Inequality Constraints

This section will not be covered in class.

To conclude this chapter on the affine-scaling method, let us only briefly consider the maximization LP with inequalities and nonnegativity constraints

max cTx s.t. Ax ≤ b, x ≥ 0

and write down the affine-scaling direction associated with this problem formulation. To make use of what we already know, let us introduce a set of slacks and define the new matrices and vectors

Ā = [A  I] ∈ R^(m×(n+m)),   c̄ = [c; 0] ∈ R^(n+m),   x̄ = [x; w],   and   D̄ = [X² 0; 0 W²]

Writing down the affine-scaling direction with D = X² and E = W² for this problem, we easily obtain that

∆x̄ = [∆x; ∆w] = ( D̄ − D̄Āᵀ(ĀD̄Āᵀ)−1ĀD̄ ) c̄
    = ( [D 0; 0 E] − [D 0; 0 E][Aᵀ; I] ( [A  I][D 0; 0 E][Aᵀ; I] )−1 [A  I][D 0; 0 E] ) [c; 0]
    = [Dc; 0] − [DAᵀ; E] (ADAᵀ + E)−1 ADc

and since we are primarily interested in the (scaled) projected gradient direction associated with x, we see that

∆x = Dc−DAT (ADAT + E)−1ADc

Again, we want to remember this direction for later. Similarly, for the feasibility direction we find

∆x̄ = [∆x; ∆w] = [D 0; 0 E][Aᵀ; I] ( [A  I][D 0; 0 E][Aᵀ; I] )−1 ( b − [A  I][x; w] )
    = [DAᵀ; E] (ADAᵀ + E)−1 (b − Ax − w)

so that the feasibility direction for the original variables is given by

∆x = DAT (ADAT + E)−1(b−Ax− w)

where b − Ax − w is again the current feasibility residual of x with respect to the constraint Ax + w = b. Fun times!


Chapter 11

Primal-Dual Methods

Note that our discussion of the affine-scaling method in the previous chapter has worked exclusively on the primal problem and ignored any type of duality relationships which, however, have been exploited heavily in many of the (better) variants of the simplex method. It is not surprising, then, that there is also a large class of primal-dual interior-point methods, two of which are presented and briefly discussed in this chapter. For a more extensive treatment, a standard and very readable book is the nicely written monograph by Wright (1997).

11.1 The Primal-Dual Path-Following Method

Slightly deviating from our book's discussion that considers the primal-dual pair in the book's (!) standard form

(P′) max cTx s.t. Ax ≤ b and x ≥ 0        (D′) min bTy s.t. ATy ≥ c and y ≥ 0

whose symmetry is appealing from a theoretical and computational point of view (both of which, however, we largely ignore for this introductory treatment of the subject), we shall consider LPs in what is more commonly referred to as the (!) primal-dual standard form

(P) max cTx s.t. Ax = b and x ≥ 0 (D) min bT y s.t. AT y − z = c and z ≥ 0

where x ∈ Rn is the primal variable that must be nonnegative, y ∈ Rm is the free dual variable, and z ∈ Rn is the nonnegative dual slack. In particular, in comparison to the first formulation we only need to introduce one new vector of slack variables, which allows us to use a little bit less notation (if you read along in your book, you will understand what I mean).

From the complementary slackness theorem, which we derived as an immediate consequence of weak and strong duality, we know that a primal-dual solution (x, y, z) is optimal for the above primal-dual pair (P)-(D) if and only if it is both primal and dual feasible and satisfies the complementarity conditions xjzj = 0 for all j = 1, . . . , n. Using the same matrix notation X = Diag(x) ∈ Rn×n and Z = Diag(z) ∈ Rn×n, these conditions can be written more compactly as the matrix equality XZ = 0 ∈ Rn×n or, more commonly, as the vector equality XZe = 0 ∈ Rn where e = (1, 1, . . . , 1)T ∈ Rn is the vector of all ones and 0 = 0e = (0, 0, . . . , 0)T ∈ Rn is the n-dimensional zero vector. It is not difficult to see that these complementarity conditions are only satisfied if for every pair (xj, zj) at least one of the variables vanishes, which is clearly not in the spirit of an IPM. Hence, to allow all variables to remain strictly positive, the IPM that we discuss in this chapter uses the following very simple trick: replace xjzj = 0 by xjzj = µ for some µ > 0, or equivalently, perturb the complementarity conditions by some small (or not so small) positive value of µ.

Theorem. Consider an LP in primal-dual standard form and assume that there exists a feasible point (x, y, z) for which (x, z) > 0 is strictly positive. Then, for every positive µ > 0, there exists a


unique solution (x(µ), y(µ), z(µ)) to the system of nonlinear equations

Ax = b

AT y − z = c

XZe = µe

For µ tending to zero, the sequence of solutions converges to an optimal point (if one exists):

(x(µ), y(µ), z(µ)) → (x∗, y∗, z∗)   as µ → 0.

Being nonlinear in nature, your guess would be correct that a complete proof of this result requires some machinery from nonlinear analysis and nonlinear programming that we do not have available at this time, so that we sadly omit it in spite of its fundamental importance :-( (depending on how sad this makes you, however, you may want to consider a course in nonlinear or (better!) advanced linear programming which happens to be offered in the spring - hint, hint). Nevertheless, and to help your intuition and re-discover the above conditions using a different line of argumentation based on the so-called barrier method, you will also find a strategically placed exercise on your last homework assignment :-)

11.1.1 The Central Path and Path-Following Algorithm

The basic idea of the algorithm that we describe next is motivated directly by the theorem above: solve the nonlinear system for a series of values µ decreasing to zero, and identify its limit point as the optimal solution for the original LP. Before we make this procedure precise, let us agree on some common terminology. For obvious reasons, we refer to the three sets of equalities in the theorem as primal feasibility, dual feasibility, and µ-complementarity, respectively. Motivated by its interpretation in the barrier problem, the parameter µ is also called the barrier parameter and is related to the remaining duality gap

bTy − cTx = xTATy − xTc = xTz = eTXZe = µeTe = nµ   ⇔   µ = xTz/n = (bTy − cTx)/n

Finally, and probably most importantly, the set of all solutions C = {(x(µ), y(µ), z(µ)) : µ > 0} is called the central path (central because it leads through the interior of the feasible region to an optimal solution), and the limit value (x, y, z) = lim_{µ→∞} (x(µ), y(µ), z(µ)) is called the analytic center (the "origin" of the central path). The basic idea of a path-following method then is exactly that: to follow this path by computing iterates that move (usually approximately) along this path towards an optimal solution.

Step 0: Start from an initial (feasible or infeasible) point (x, y, z) with strictly positive (x, z) > 0, let 0 < δ < 1 be a parameter to successively reduce the barrier parameter µ, and choose a step length parameter 0 < r < 1, an optimality threshold ε > 0, primal and dual feasibility thresholds εb > 0 and εc > 0 (often ε = εb = εc), an unboundedness threshold M ≫ 1, and a maximum number of iterations K. Let k = 1 and begin with Step 1.

Step 1: Compute the current duality gap as well as remaining primal and dual residuals

γ = xT z, ρ = b−Ax, and σ = c−AT y + z

respectively. Stop with an (approximate) optimal feasible solution if all optimality and feasibility thresholds are met

max{ γ/ε, ‖ρ‖/εb, ‖σ‖/εc } < 1.

Stop with the conclusion that the primal or dual problem is unbounded (and the dual or primal problem infeasible) if

‖x‖∞ > M or ‖z‖∞ > M

respectively. Also stop if k = K so that the maximum number of iterations is reached. Otherwise increment k ← k + 1 and continue with Step 2.


Step 2: Set the new barrier parameter to µ = δγ/n.

Step 3: Compute a new step direction (∆x,∆y,∆z) (see Section 11.1.2).

Step 4: Compute the step size θ = r · min_j { 1, xj/|∆xj|, zj/|∆zj| } (or an alternative step size rule).

Step 5: Update the current iterate to (x, y, z)← (x, y, z) + θ(∆x,∆y,∆z) and go to Step 1.

The stopping criteria in Step 1 can be modified in various ways, and the correctness of any conclusion clearly depends on the careful choice of proper parameters for this algorithm (some recommendations are given in the book). You are invited to explore some of these dependencies and try your own parameter choices in Exercise 6.3 of your current homework assignment. The implementation should be straightforward, and the computational experimentation a lot of fun, especially once we learn how to make the computer compute the still needed step directions.
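For readers who want to experiment right away, here is a compact Python sketch of the whole path-following loop. It computes the Newton step from the full KKT system derived in Section 11.1.2 below, uses a single tolerance eps for the duality gap and both residuals, and all default parameter values are assumptions of this sketch rather than recommendations.

    import numpy as np

    def path_following(A, b, c, x, y, z, delta=0.1, r=0.9, eps=1e-8, K=100):
        # primal-dual path-following for (P)-(D); x and z must start strictly positive
        m, n = A.shape
        for k in range(K):
            gamma = x @ z                              # duality gap
            rho = b - A @ x                            # primal residual
            sigma = c - A.T @ y + z                    # dual residual
            if max(gamma, np.linalg.norm(rho), np.linalg.norm(sigma)) < eps:
                return x, y, z                         # (approximately) optimal
            mu = delta * gamma / n                     # Step 2
            # Step 3: Newton step on the KKT system (see Section 11.1.2)
            J = np.block([[A, np.zeros((m, m)), np.zeros((m, n))],
                          [np.zeros((n, n)), A.T, -np.eye(n)],
                          [np.diag(z), np.zeros((n, m)), np.diag(x)]])
            rhs = np.concatenate([rho, sigma, mu * np.ones(n) - x * z])
            d = np.linalg.solve(J, rhs)
            dx, dy, dz = d[:n], d[n:n+m], d[n+m:]
            # Step 4: step size that keeps x and z strictly positive
            ratios = [1.0] + [xi / abs(di) for xi, di in zip(x, dx) if di != 0] \
                           + [zi / abs(di) for zi, di in zip(z, dz) if di != 0]
            theta = r * min(ratios)
            x, y, z = x + theta * dx, y + theta * dy, z + theta * dz   # Step 5
        return x, y, z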

11.1.2 Newton Steps and KKT Systems

Starting from a current iterate (x, y, z), in each iteration of the above algorithm we are interested in taking a step (∆x, ∆y, ∆z) that brings us to a new point (x + ∆x, y + ∆y, z + ∆z) on the central path so that

A(x+ ∆x) = b ⇔ A∆x = b−Ax = ρ

AT(y + ∆y) − (z + ∆z) = c ⇔ AT∆y − ∆z = c − ATy + z = σ

(X + ∆X)(Z + ∆Z)e = µe ⇔ X∆z + Z∆x+ ∆X∆z = µe−Xz

Note that the values of the variables x, y, and z are known, and that we are interested in finding only the unknown step directions ∆x, ∆y, and ∆z. Furthermore, note that the only nonlinearity occurs (not surprisingly) in the µ-complementarity condition, but it can be removed if we simply drop the nonlinear term ∆X∆z. In fact, such "linearization" is one of the most common approaches to solving a nonlinear system and can be shown (and is not too cumbersome to show) to correspond to one iteration of Newton's method for finding the root of the nonlinear vector function

F(x, y, z) = [ Ax − b;  ATy − z − c;  XZe − µe ]   with Jacobian   JF = F′(x, y, z) = [ A 0 0;  0 AT −I;  Z 0 X ]

Then it is easy to see that the above “linearized” system is equivalent to the Newton equation JF · (∆x, ∆y, ∆z)^T = −F(x, y, z), which in matrix notation becomes

[ A    0     0  ] [ ∆x ]   [ b − Ax        ]   [ ρ        ]
[ 0    A^T  −I  ] [ ∆y ] = [ c − A^T y + z ] = [ σ        ]
[ Z    0     X  ] [ ∆z ]   [ µe − XZe      ]   [ µe − Xz  ]

This system is also called the Karush-Kuhn-Tucker or KKT system, a name that you will see again or have seen already in connection with the optimality conditions for general nonlinear programming. It can then be shown (and is shown in the book if you are interested) that the Jacobian of the KKT system is invertible if rank A = m, so that the step direction is unique, in particular. Clearly, to compute this direction it is usually more efficient to solve the KKT system rather than to compute the inverse of JF, because we can simplify things by making use of the very special matrix structure (Chapters 19 and 20 give a lot of details on how to do this efficiently). First, let us recall that the matrices X and Z are diagonal so that we can express ∆z solely in terms of ∆x by writing

∆z = X^{-1}(µe − Xz − Z∆x) = µX^{-1}e − z − X^{-1}Z∆x

which we can then substitute into the second equation with σ = c−AT y + z and simplify to

A^T∆y − (µX^{-1}e − z − X^{-1}Z∆x) = σ  ⇔  X^{-1}Z∆x + A^T∆y = c − A^T y + µX^{-1}e


The resulting system to solve for ∆x and ∆y is of smaller size with dimension (n + m) and is usually called the reduced KKT system

[ A         0   ] [ ∆x ]   [ ρ                  ]
[ X^{-1}Z   A^T ] [ ∆y ] = [ σ + µX^{-1}e − z   ]

To reduce this system even further, we can now use the second equation to express ∆x solely in terms of ∆y by writing

∆x = XZ^{-1}(c − A^T y + µX^{-1}e − A^T∆y)

and letting D = XZ^{-1} and substituting the above expression into the first equation (having fun yet?) yields the even smaller system

AD(c − A^T y + µX^{-1}e − A^T∆y) = ρ  ⇔  −ADA^T∆y = ρ − AD(c − A^T y + µX^{-1}e)

of only m equations, which are also called the normal equations (in primal form). Some early implementations of path-following algorithms used these normal equations to compute the step direction ∆y, which then makes it possible to find the step directions ∆x and ∆z using a simple series of back substitutions

∆y → ∆x = D(c − A^T y + µX^{-1}e − A^T∆y) → ∆z = µX^{-1}e − z − X^{-1}Z∆x

This seems intuitive as this last system is the smallest in size and, hence, supposedly the most efficient to solve. It turns out, however, that the matrix ADA^T is usually dense and without structure even if the original matrix A was sparse and of special structure, so that modern implementations often work with the original KKT system and use sophisticated numerical analysis techniques that preserve or work well with sparsity. Short of time, we will again postpone such details for another course, and only send our cheers and thanks to the folks in computational math!
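As a companion to the loop sketched after Section 11.1.1, the step directions could be computed from the primal normal equations and the two back substitutions above roughly as follows. This is only a sketch: the function name kkt_step and the use of dense matrices are our own simplifications, not the book's implementation.

function [dx, dy, dz] = kkt_step(A, b, c, x, y, z, mu)
% Solve the primal normal equations -ADA^T dy = rho - AD(c - A^T y + mu X^{-1}e)
% with D = X Z^{-1}, then back-substitute for dx and dz.
  rho = b - A*x;                               % primal residual
  d   = x./z;                                  % diagonal of D = X Z^{-1}
  v   = c - A'*y + mu./x;                      % c - A^T y + mu X^{-1} e
  dy  = -((A*diag(d)*A') \ (rho - A*(d.*v))); % normal equations
  dx  = d.*(v - A'*dy);                        % dx = D(c - A^T y + mu X^{-1}e - A^T dy)
  dz  = mu./x - z - (z./x).*dx;                % dz = mu X^{-1}e - z - X^{-1}Z dx
end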

11.1.3 Path-Following Versus Affine-Scaling

To conclude this section on primal-dual path-following methods, let us briefly take another look at the step directions for both the path-following and the affine-scaling method. For the latter, we will remember that we had to distinguish between a direction ∆x_feas towards feasibility in Phase I and a direction ∆x_opt towards optimality in Phase II, respectively given by

∆x_feas = DA^T(ADA^T)^{-1}ρ   and   ∆x_opt = (D − DA^T(ADA^T)^{-1}AD)c

where we had defined the primal-scaling matrix D = X^2. This does not look too different from the matrices arising in the step directions for the path-following method, other than that we had defined D = XZ^{-1}, which (as you may have guessed already) is also called primal-dual scaling. To compute the step direction ∆x for the path-following method, it is sufficient (although again somewhat tedious) to solve the primal normal equations for

∆y = −(ADA^T)^{-1}(ρ − AD(c − A^T y + µX^{-1}e))

and then substitute this expression into the equation defining the step direction ∆x and simplify

∆x = D(c − A^T y + µX^{-1}e − A^T∆y)
   = D( c − A^T y + µX^{-1}e + A^T(ADA^T)^{-1}(ρ − AD(c − A^T y + µX^{-1}e)) )
   = DA^T(ADA^T)^{-1}ρ + (D − DA^T(ADA^T)^{-1}AD)c + µ(D − DA^T(ADA^T)^{-1}AD)X^{-1}e
   =       ∆x_feas     +          ∆x_opt            +             ∆x_ctr

where the first two contributions give the feasibility and optimality step directions as in the affine-scaling method but with the primal-dual scaling matrix D = XZ^{-1}, and the third contribution is a centering direction that compromises progress towards optimality in favor of centrality or “loyalty” to the central path. To validate this characterization, let us begin by observing that the step directions ∆x_opt and ∆x_ctr are almost identical and both satisfy

A∆x = A(D − DA^T(ADA^T)^{-1}AD)u = (AD − AD)u = 0

regardless of whether u = c or u = X^{-1}e. On the other hand, for the feasibility direction we have that

A∆x_feas = ADA^T(ADA^T)^{-1}ρ = ρ

showing that ∆x_feas is indeed the only contribution towards feasibility. Next, to see that the second contribution ∆x_opt is a step towards optimality, let ρ = 0 (pretending the current iterate is primal feasible) and µ = 0 (ignoring interiority and shooting for optimality directly), so that the only incentive of a path-following algorithm is to move towards the optimal limit point of the central path, with the only remaining contribution coming from ∆x_opt. Similarly, to see that ∆x_ctr is a direction towards centrality, let ρ = 0 as before and c = 0 (now pretending we are not interested in optimization), so that the only incentive of a path-following algorithm is to move towards the point (x(µ), y(µ), z(µ)) on the central path, with the only remaining contribution coming from ∆x_ctr.
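For a quick numerical illustration of this decomposition (a sketch on random data, not part of the text), one can check in Matlab that only ∆x_feas changes the primal residual while ∆x_opt and ∆x_ctr stay in the null space of A:

% Random data only for illustration; D = X Z^{-1} as above.
rng(2);  m = 3;  n = 6;
A = randn(m, n);  x = rand(n, 1);  z = rand(n, 1);  y = randn(m, 1);
b = A*rand(n, 1);  c = randn(n, 1);  mu = 0.1;
rho = b - A*x;  D = diag(x./z);
P = D - D*A'*((A*D*A')\(A*D));                 % the matrix D - DA^T(ADA^T)^{-1}AD
dx_feas = D*A'*((A*D*A')\rho);
dx_opt  = P*c;
dx_ctr  = mu*P*(1./x);
disp([norm(A*dx_feas - rho), norm(A*dx_opt), norm(A*dx_ctr)])   % all close to zero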

We finish this section with another brief remark. Motivated by the above step direction decomposition, which allows us to explicitly distinguish between steps towards feasibility, optimality, and centrality, the class of so-called predictor-corrector methods uses the reduction parameter δ in the algorithm of Section 11.1.1 as a binary on/off switch to alternate between so-called predictor or affine-scaling steps if δ = 0, which ignore centrality and merely predict the steepest ascent direction towards optimality, and so-called corrector or centering steps if δ = 1, which ignore further progress along the central path and merely restore interiority and proximity to the central path. In practice, these primal-dual predictor-corrector path-following methods are among the most efficient interior-point methods.
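Sticking with the same loop sketch as before, one simple (purely illustrative) way to realize this alternation is to toggle δ from one iteration to the next; practical codes use more refined rules.

% Inside the main loop of the earlier sketch, before recomputing mu:
if mod(k, 2) == 1
    delta = 0;    % predictor / affine-scaling step: ignore centrality
else
    delta = 1;    % corrector / centering step: return towards the central path
end
mu = delta*gamma/n;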

11.2 Homogeneous Self-Dual Linear Programs

Definition. A linear program max c^T x s.t. Ax ≤ b, x ≥ 0 is said to be

1. homogeneous if both the objective vector and the right-hand side vanish, c = 0 and b = 0, and

2. self-dual if the constraint matrix is skew symmetric, A = −A^T (hence in particular m = n), and b = −c.

Clearly, for the above definition to be reasonable, the primal and dual problem of a self-dual LP should be identical, which is verified easily. In view of primal-dual methods, such problems are very appealing because of the massive symmetry inherent in these problems, which can be heavily exploited both for the investigation of theoretical questions such as convergence and complexity of algorithms, and for the efficient implementation of matrix decomposition and factoring strategies with high computational benefits. Nevertheless, it seems that the above definitions are very restrictive, and that almost no real problem would be either homogeneous or self-dual. The only purpose of this section is to make you aware that this quick conclusion is not completely correct and that, in fact, every linear program can be formulated as an equivalent homogeneous self-dual LP.

Let us consider the following primal-dual pair with inequality and nonnegativity constraints in both the primal and dual problem

(P)  max c^T x  s.t.  Ax ≤ b and x ≥ 0        (D)  min b^T y  s.t.  A^T y ≥ c and y ≥ 0


and combine both problems into the following homogeneous self-dual LP formulation

maximize    0
subject to        − A^T y + cφ ≤ 0,
             Ax          − bφ ≤ 0,
            −c^T x + b^T y    ≤ 0,
             x, y, φ ≥ 0.
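For a small illustration (our own sketch, with made-up data), the constraint matrix of this formulation can be assembled and checked for skew symmetry in Matlab:

A = [1 2; 3 1];  b = [4; 3];  c = [2; 1];    % illustrative data for (P)-(D)
[m, n] = size(A);
M = [ zeros(n,n)  -A'          c ;           % rows of  -A^T y + c*phi <= 0
      A            zeros(m,m) -b ;           % rows of   A x  - b*phi <= 0
     -c'           b'          0 ];          % row of   -c^T x + b^T y <= 0
disp(norm(M + M', 1))                        % prints 0, so M is skew symmetric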

It is easy to see that this new problem is homogeneous and self-dual with the same number of n + m + 1 constraints and variables and a skew-symmetric constraint matrix. Furthermore, it is trivial that this problem is always feasible by setting all variables equal to zero, and that every solution that is feasible is also optimal, so that we are only interested in nontrivial solutions. In fact, we will later show that if the trivial solution is the only feasible solution, then at least one of the original problems (P) or (D) must be infeasible. Without going into details, the homogeneous self-dual method is a predictor-corrector path-following interior-point method that solves the equivalent problem with slack variables

maximize    0
subject to        − A^T y + cφ + z = 0,
             Ax          − bφ     + w = 0,
            −c^T x + b^T y            + ψ = 0,
             x, y, φ, z, w, ψ ≥ 0.

for a strictly complementary solution (x, y, φ, z, w, ψ) for which each complementary primal-dual variable pair has exactly one variable equal to zero but not both: x_j + z_j > 0 for all j = 1, . . . , n, y_i + w_i > 0 for all i = 1, . . . , m, and φ + ψ > 0. In this case, the next (and for this course final) theorem completely characterizes optimality and infeasibility for both the original primal and dual problem.

Theorem. Let (x, y, φ, z, w, ψ) be a strictly complementary (thus optimal) solution for the slacked homogeneous self-dual problem of an original primal-dual pair (P)-(D).

1. If φ > 0, then x∗ = x/φ is optimal for the primal problem (P) and y∗ = y/φ is optimal for the dual problem (D).

2. If φ = 0, then either bT y < 0 or cT x > 0.

(a) If bT y < 0, then the primal problem (P) is infeasible.

(b) If cT x > 0, then the dual problem (D) is infeasible.

Proof. For the first part, let φ > 0, x∗ = x/φ and y∗ = y/φ. Dividing the first and second constraints by φ > 0 and using that z/φ ≥ 0 and w/φ ≥ 0, we directly obtain feasibility of x∗ and y∗ because

−A^T y∗ + c ≤ 0 ⇔ A^T y∗ ≥ c   and   Ax∗ − b ≤ 0 ⇔ Ax∗ ≤ b

Furthermore, from ψ ≥ 0 and the third constraint combined with weak duality, it then follows that

−c^T x∗ + b^T y∗ ≤ 0  and  c^T x∗ ≤ b^T y∗   ⇔   c^T x∗ = b^T y∗

showing that x∗ and y∗ are an optimal primal-dual pair for the original problems (P) and (D). For the second part, let φ = 0, so that A^T y ≥ 0 and Ax ≤ 0 because z ≥ 0 and w ≥ 0, respectively, and ψ > 0, so that −c^T x + b^T y < 0 by strict complementarity. In particular, this implies that at least one of the two terms has to be strictly negative, or equivalently, that either c^T x > 0 or b^T y < 0. Without loss of generality, let us assume that b^T y < 0 and show that this implies that the primal is infeasible (the other direction then goes analogously and is given in the book). So then, by contradiction, assume that the primal is feasible, so that there exists a nonnegative vector x̃ ≥ 0 that satisfies Ax̃ ≤ b. From A^T y ≥ 0 and the nonnegativity of x̃ and y, it then follows that

0 ≤ x̃^T A^T y = (Ax̃)^T y ≤ b^T y < 0,

which is clearly a contradiction, implying that the primal problem must be infeasible. □
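In Matlab terms, the theorem translates into a short recovery step once a strictly complementary solution of the slacked problem is available. This is only a sketch: the tolerance and the variable names (x, y, phi as components of the computed solution) are our own.

tol = 1e-9;
if phi > tol
    xstar = x/phi;  ystar = y/phi;           % optimal solutions for (P) and (D)
elseif b'*y < -tol
    disp('primal problem (P) is infeasible');
elseif c'*x > tol
    disp('dual problem (D) is infeasible');
end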


Although you are still lacking some background for a full appreciation of the homogeneous self-dual method, the discussion of the primal-dual path-following algorithm earlier in this chapter should have given you enough of an idea to read and understand the comparison between the (parametric) self-dual simplex method and the (homogeneous) self-dual interior-point method in Section 22.4, starting on page 375 in your text book. What a great final reading assignment to end this class! Thank you for a fun term, and congratulations on making it through!

11.3 Problem Set 7

Exercise 11.1. (Mathematical Modeling and Optimization: A Cutting-Stock Problem) [6 points] The output of a paper mill consists of standard rolls 110 inches (110”) wide, which are cut into smaller rolls to meet orders. This week there are orders for rolls of the following widths:

width 20” 45” 50” 55” 75”

orders 48 35 24 10 8

The owner of the mill wants to know what cutting patterns to apply so as to fill the orders using the smallest number of 110” rolls. A cutting pattern consists of a certain number of rolls of each width, such as two of 45” and one of 20”, or one of 50” and one of 55” (and 5” of waste).

(a) Suppose, to start with, that we consider only the following six patterns:

Width    1    2    3    4    5    6
 20”     3    1    0    2    1    3
 45”     0    2    0    0    0    1
 50”     1    0    1    0    0    0
 55”     0    0    1    1    0    0
 75”     0    0    0    0    1    0

How many rolls should be cut according to each pattern to minimize the number of 110” rolls used? Formulate and solve this problem as a linear program, assuming that the number of smaller rolls produced need only be greater than or equal to the number ordered.

(b) Re-solve the above problem with the additional restriction that the number of rolls produced in each size must be between 10% under and 40% over the number ordered.

(c) Find another pattern that, when added to those above, improves the optimal solution.

(d) All of the solutions above use fractional numbers of rolls. Can you find solutions that also satisfy the constraints, but that cut an integer number of rolls in each pattern? How much does your integer solution cause the objective function value to go up in each case? (Hint: To find integer solutions using AMPL, you can add the keyword integer to a variable declaration (var name ... integer;) and switch to an appropriate integer solver such as CPLEX or LPSOLVE using option solver solver_name;)

Exercise 11.2. (The Matrix Inversion Lemma / Sherman-Morrison-Woodbury Formulas) [4 points] Let E ∈ R^{m×m}, D ∈ R^{k×k}, U ∈ R^{m×k}, and V ∈ R^{k×m}, and assume that E, D, E + UDV, and D^{-1} + V E^{-1}U are invertible matrices. In addition, for (b) and (c), let u and v be two vectors in R^m and assume that 1 + v^T E^{-1}u ≠ 0. Verify the following three matrix identities.

(a) (E + UDV)^{-1} = E^{-1} − E^{-1}U(D^{-1} + V E^{-1}U)^{-1}V E^{-1}

(b) (E + uv^T)^{-1} = E^{-1} − E^{-1}u(1 + v^T E^{-1}u)^{-1}v^T E^{-1}

(c) (E + uv^T)^{-1}u = αE^{-1}u for some scalar α
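Before attempting the algebra, a quick numerical sanity check of identity (b) in Matlab can be reassuring (this is of course not a proof; the data are random and the sizes are our own choice):

rng(1);
E = randn(5) + 5*eye(5);                     % a well-conditioned test matrix
u = randn(5, 1);  v = randn(5, 1);
lhs = inv(E + u*v');
rhs = inv(E) - inv(E)*u*(1 + v'*inv(E)*u)^(-1)*v'*inv(E);
disp(norm(lhs - rhs, 1))                     % of the order of machine precision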


Exercise 11.3. (Interior-Point Algorithms in Matlab) [6 points] Implement an affine-scaling and a primal-dual path-following algorithm in Matlab and use your two programs to do Exercise 18.1 on page 314 in your text book. In extension of the original problem statement, however, you may also experiment with different parameters for your two algorithms and should try to solve all four problems in Exercises 2.3, 2.4, 2.5, and 2.10 to optimality and, in addition, correctly detect infeasibility and unboundedness for the two problems in Exercises 2.6 and 2.7. Briefly comment on the performance of your two methods in comparison with (i) each other, and (ii) the simplex method. For full credit, please submit your Matlab files through the file upload tool on Blackboard.

Exercise 11.4. (The Logarithmic Barrier Problem) [4 points] Let f(x) : R^n → R and g(x) : R^n → R^m be two differentiable functions with gradients ∇f(x) ∈ R^n and ∇g_i(x) ∈ R^n for i = 1, . . . , m. A famous result in nonlinear programming states that, under certain conditions, a point x∗ solves max{f(x) : g(x) = 0} if and only if x∗ is feasible and the gradient of the objective f at x∗ can be written as a linear combination of the gradient vectors of the constraints g_i at x∗,

∇f(x∗) = Σ_{i=1}^{m} y_i ∇g_i(x∗)   for some (unrestricted) y ∈ R^m

(usually called the Karush-Kuhn-Tucker or KKT conditions). Show that the µ-complementarity condition XZe = µe for the primal-dual path-following method is equivalent to the above condition for the so-called logarithmic barrier problem max_{x>0} { c^T x + µ Σ_{j=1}^{n} log x_j : Ax = b } (here log is the natural logarithm), and motivate this problem formulation from an interior-point point of view.

Exercise 11.5. (The Central Path) [4 points] Do Exercise 17.1 on page 299 in your text book (hint on page 300). Also compute the central path limits (optimal solution and analytic center) as µ tends to zero and infinity. You will get credit either for this or the previous problem, but not both.


Selected References

Ahuja, R. K., Magnanti, T. L., and Orlin, J. B. (1993). Network flows. Prentice Hall Inc., Englewood Cliffs, NJ. Theory, algorithms, and applications.

Bartlett, M. S. (1951). An inverse matrix adjustment arising in discriminant analysis. Ann. Math. Statistics, 22:107–111.

Bazaraa, M. S., Jarvis, J. J., and Sherali, H. D. (2005). Linear programming and network flows. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, third edition.

Beale, E. M. L. (1955). Cycling in the dual simplex algorithm. Naval Res. Logist. Quart., 2:269–275 (1956).

Carathéodory, C. (1907). Über den Variabilitätsbereich der Koeffizienten von Potenzreihen, die gegebene Werte nicht annehmen. Math. Ann., 64(1):95–115.

Dantzig, G. B. (1951). Maximization of a linear function of variables subject to linear inequalities. In Activity Analysis of Production and Allocation, Cowles Commission Monograph No. 13, pages 339–347. John Wiley & Sons Inc., New York, N. Y.

Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271.

Farkas, J. G. (1902). Über die Theorie der einfachen Ungleichungen. Journal für die Reine und Angewandte Mathematik, 124:1–27.

Fourer, R., Gay, D. M., and Kernighan, B. W. (2002). AMPL: A Modeling Language for Mathematical Programming. Duxbury Press / Brooks/Cole Publishing Company, second edition.

Kantorovich, L. V. (1959/1960). Mathematical methods of organizing and planning production. Management Sci., 6:366–422.

Klee, V. and Minty, G. J. (1972). How good is the simplex algorithm? In Inequalities, III (Proc. Third Sympos., Univ. California, Los Angeles, Calif., 1969; dedicated to the memory of Theodore S. Motzkin), pages 159–175. Academic Press, New York.

Rockafellar, R. T. (1970). Convex analysis. Princeton Mathematical Series, No. 28. Princeton University Press, Princeton, N.J.

Sherman, J. and Morrison, W. J. (1950). Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. Ann. Math. Statistics, 21:124–127.

Vanderbei, R. J. (2008). Linear programming. Foundations and extensions. International Series in Operations Research & Management Science, 114. Springer, New York, third edition.

Wright, M. H. (2005). The interior-point revolution in optimization: history, recent developments, and lasting consequences. Bull. Amer. Math. Soc. (N.S.), 42(1):39–56 (electronic).

Wright, S. J. (1997). Primal-dual interior-point methods. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.


Appendix A

Solutions

Solution 1.1 (The Diet Problem) Answers were meant to (and did) vary:

• Anzhelika spends $117.05/month on ground pork, milk, nectarines, and rice.

• Cathy developed two options and may either live on burritos, spinach, and zucchini for $28.78/week, or on apples, burritos, and tortellini for $58.01/week.

• Evan enjoys bananas, beef, carrots, corn, milk, and rice – all for $3.46/day.

• Ganesh spends $45.126/week on chicken, milk, vegetables, fruits, cookies, juice and rice.

• Jenny’s meal plan contains CHR, CRL, QNA, SHK, and SND and costs $71.65/week.

• Jeremy ensures a well-balanced diet with basic amounts of bread, dairy, fruits, meats, and veggies, but otherwise feeds his family exclusively with pasta.

• Kapil will find his diet later.

• Lauren lives healthy on eggs, ice cream, and peanut butter for only $14.45/month.

• Linn mixes his mac & cheese with some beef and chicken and pays $100.56/?.

• Parvaneh’s diet consists of rice, turkey, and vegetables at the bargain price of $2.65/day.

• YongLi must shop strictly organic: $266.49/month for bananas, broccoli, and eggs.

Solution 1.2 (Airline Modeling) To define a general model for the given problem, let us define the three sets FARE, ORIG, and DEST and introduce the following decision variables and data parameters:

xijk number of tickets to sell in fare class i ∈ FARE from j ∈ ORIG to k ∈ DEST,

uijk upper bound of customers in fare class i from origin j to destination k,

cijk ticket price for flying in fare class i from origin j to destination k,

s number of seats in aircraft.

Note that we can directly set uijj = cijj = 0 for all j ∈ ORIG ∩ DEST, i.e., we do not offer (and thus do not charge for) flights for which origin and destination are the same (actually, cijj could also be set arbitrarily). The mathematical model is then given as follows:

maximize revenue:          Σ_{i ∈ FARE} Σ_{j ∈ ORIG} Σ_{k ∈ DEST} cijk xijk

forbidding overbooking:    Σ_{i ∈ FARE} Σ_{j ∈ ORIG} xijk ≤ s   for all k ∈ DEST

                           Σ_{i ∈ FARE} Σ_{k ∈ DEST} xijk ≤ s   for all j ∈ ORIG

subject to bounds:         0 ≤ xijk ≤ uijk   for all i ∈ FARE, j ∈ ORIG, k ∈ DEST.

The no-overbooking constraints simply say that at most s customers can arrive at any given destination, and that at most s customers can leave from any given origin, respectively. Clearly, these constraints may also be bounded by zero from below, but this is already implied by the lower bounds on each individual variable. To translate the above model into AMPL language, we write

set FARE;

set ORIG;

set DEST;

param price {FARE, ORIG, DEST} >= 0;

param bound {FARE, ORIG, DEST} >= 0;

param seats >= 0;

var tickets {i in FARE, j in ORIG, k in DEST} <= bound[i,j,k], >= 0;

maximize revenue: sum {i in FARE, j in ORIG, k in DEST}

price[i,j,k] * tickets[i,j,k];

subject to no_overbooking_dest {k in DEST} :

sum {i in FARE, j in ORIG} tickets[i,j,k] <= seats;

no_overbooking_orig {j in ORIG} :

sum {i in FARE, k in DEST} tickets[i,j,k] <= seats;

The specification of the input data in 3 × 2 × 2 dimensions becomes a little tricky because of the triple indices and can be done as follows (you will need to remember this for later):

set FARE := Y B M;

set ORIG := Ithaca Newark;

set DEST := Newark Boston;

param price :=

[Y,*,*]: Newark Boston :=

Ithaca 300 360

Newark 0 160

[B,*,*]: Newark Boston :=

Ithaca 220 280

Newark 0 130

[M,*,*]: Newark Boston :=

Ithaca 100 140

Newark 0 80;

param bound :=


[Y,*,*]: Newark Boston :=

Ithaca 4 3

Newark 0 8

[B,*,*]: Newark Boston :=

Ithaca 8 10

Newark 0 13

[M,*,*]: Newark Boston :=

Ithaca 22 18

Newark 0 20;

param seats := 30;

Now solving this instance using AMPL, we get the following solution output:

Presolve eliminates 0 constraints and 3 variables.

Adjusted problem:

9 variables, all linear

4 constraints, all linear; 18 nonzeros

1 linear objective; 9 nonzeros.

MINOS 5.5: optimal solution found.

8 iterations, objective 9790

tickets :=

B Ithaca Boston 10

B Ithaca Newark 8

B Newark Boston 9

B Newark Newark 0

M Ithaca Boston 0

M Ithaca Newark 5

M Newark Boston 0

M Newark Newark 0

Y Ithaca Boston 3

Y Ithaca Newark 4

Y Newark Boston 8

Y Newark Newark 0

;

Note how AMPL recognizes that the three variables corresponding to the number of tickets from Ithaca to Ithaca, Newark to Newark, and Boston to Boston are redundant and thus eliminated as part of the algorithm’s presolve. Can you think of a more elegant way to avoid this redundancy?

Solution 1.3 (Geometry of Infeasible and Unbounded LPs) The two following plots depict the geometry of the two above problems.


[Two plots omitted: on the left, the halfspaces x1 + x2 ≤ 2 and x1 + x2 ≥ 4.5 in the (x1, x2)-plane; on the right, the constraints 2x1 − x2 ≥ 1 and x1 + 2x2 ≥ 2 together with the objective level curves ζ = −8, ζ = −4, and ζ = 0.]

Since the two halfspaces in the left plot do not overlap, the feasible region is empty and the problem infeasible. In the right picture, we see (i) that the feasible region is unbounded in the northeast direction, and (ii) that the objective values of the level curves increase as we move their intercepts down along the vertical axis. Because the level curves are linear functions with a positive slope, however, each curve eventually enters the feasible region, so that there are feasible solutions with arbitrarily large objective values, which means that the problem is unbounded. Can you think of an example of an LP that has an optimal solution although the feasible region is unbounded?

Solutions 1.4 (Basic Linear Programming Theory)

1. Representation of LP in Standard and Canonical Form: For the conversion to standard form, we first change the original problem from maximization to minimization by multiplying the objective by −1. We also introduce a (nonnegative) slack vector w ∈ R^m (with zero coefficients in the objective) to write the less-or-equal inequalities as equality constraints:

c̃ = −[ c ; 0 ] ∈ R^{n+m},   Ã = [ A  I ] ∈ R^{m×(n+m)},   and   x̃ = [ x ; w ] ∈ R^{n+m}

where I ∈ R^{m×m} is the m-dimensional identity matrix. Now the original problem can equivalently be written in standard form as follows:

− minimize c̃^T x̃ subject to Ãx̃ = b and x̃ ≥ 0.

For the conversion into canonical form, we also change from maximization to minimization, and we write the less-or-equal inequalities as greater-or-equal inequalities by multiplying each constraint by −1 and combining them with the original nonnegativity constraints:

c̃ = −c ∈ R^n,   Ã = [ −A ; I ] ∈ R^{(m+n)×n},   and   b̃ = [ −b ; 0 ] ∈ R^{m+n}.

Then the original problem can be written equivalently in the following canonical form:

− minimize c̃^T x subject to Ãx ≥ b̃.

It is also possible to show that every LP in standard or canonical form can equivalently be written in the form that is used in our book. Do you see how? [Two hints: First, any unrestricted variable x ≷ 0 can be written as the difference of two non-negative variables: x = x+ − x− where x+ ≥ 0 and x− ≥ 0. Second, an equality constraint Ax = b can be split up into the two inequalities Ax ≤ b and Ax ≥ b, or equivalently, Ax ≤ b and −Ax ≤ −b.]

2. Characterization of LP as Convex and Conic Optimization Problem

(a) Convex Functions and Sets: If: Let epi(f) be convex, x1 and x2 be any two elements in S, and 0 ≤ λ ≤ 1. Because f(x1) ≤ f(x1) and f(x2) ≤ f(x2), we see that (x1, f(x1)) and (x2, f(x2)) are elements of epi(f), and by convexity of epi(f) it follows that λ(x1, f(x1)) + (1 − λ)(x2, f(x2)) ∈ epi(f) also. Hence, by definition of epi(f) we can conclude that f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2), showing that f is convex.

Only if: Let f be convex, (x1, z1) and (x2, z2) be any two elements of epi(f), and 0 ≤ λ ≤ 1. Then we have to show that λ(x1, z1) + (1 − λ)(x2, z2) ∈ epi(f) also, or equivalently, that f(λx1 + (1 − λ)x2) ≤ λz1 + (1 − λ)z2. The proof now follows in two parts. First, for the term on the left, we know by convexity of f that f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2). Second, we have that f(x1) ≤ z1 and f(x2) ≤ z2 because (x1, z1) and (x2, z2) are in epi(f), and combining the two it follows that

f(λx1 + (1− λ)x2) ≤ λf(x1) + (1− λ)f(x2) ≤ λz1 + (1 − λ)z2

because λ ≥ 0 and 1 − λ ≥ 0, which was to be shown. The proof is complete. □

(b) Convex Optimization: We already know that every LP can be written in either standard or canonical (or one of several other) forms. For simplicity, let us then assume that the LP is given in its canonical form min c^T x s.t. Ax ≥ b, so that it is sufficient to show that the objective function f(x) = c^T x is convex and that S = {x : Ax ≥ b} is a convex set. For the former, we have that

f(λx + (1 − λ)y) = c^T(λx + (1 − λ)y) = λc^T x + (1 − λ)c^T y = λf(x) + (1 − λ)f(y)

for any λ, showing that linear functions are affine and, in particular, convex.

To show that the feasible set S = {x : Ax ≥ b} is convex, let x and y be any two elements in S, so Ax ≥ b and Ay ≥ b, and let 0 ≤ λ ≤ 1, so A(λx) = λAx ≥ λb and A((1 − λ)y) = (1 − λ)Ay ≥ (1 − λ)b. Then it follows that

A(λx+ (1− λ)y) = A(λx) +A((1− λ)y) ≥ λb+ (1− λ)b = b

also, and hence we have λx + (1 − λ)y ∈ S, which shows that S is convex. □

Finally, it is clear that the set x ≥ 0 is a convex set and that the function f(x) = x^2 is convex but not linear. Hence, a convex program that is not an LP is the (admittedly trivial) optimization problem

minimize x^2 subject to x ≥ 0.

(c) A Challenge: By definition, every convex program can be formulated to minimize a convex function over a convex set

minimize f(x) subject to x ∈ S

where f(x) is a convex function and S is a convex set. This problem can also be written as to

minimize z subject to f(x) ≤ z and x ∈ S

because f(x) = z at any optimal solution (otherwise, we would have f(x) < z because x is feasible and could then reduce z to improve our objective). Clearly, now z is a linear function, and the set {(x, z) : f(x) ≤ z} is the epigraph of f and convex because f is convex (from part (a)). So we only need to make sure that the set {(x, z) : f(x) ≤ z and x ∈ S} = epi(f) ∩ (S × R) is also convex, which follows because the intersection of convex sets is still a convex set (check if you are not sure).

(d) Polyhedral Sets and Cones (1 point): For part (i), note that the definition of a cone says that for every point x ∈ K, any nonnegative scaling λx with λ ≥ 0 is also in the cone. Hence, with every point x a cone also contains the complete ray emanating from the origin and passing through that point, and we could simply characterize a (general) cone as a collection of rays from the origin.


Trivial yet important examples include the halfline R_+ = {x ∈ R : x ≥ 0} and the positive quadrant R²_+ = {x ∈ R² : x ≥ 0}. In particular, both of these cones are polyhedral because R_+ is itself a halfspace, and R²_+ is the intersection of the two positive halfspaces H1 = {x ∈ R² : e_1^T x ≥ 0} and H2 = {x ∈ R² : e_2^T x ≥ 0} with unit normal vectors e_1 = (1, 0)^T and e_2 = (0, 1)^T, respectively:

R²_+ = H1 ∩ H2 = {x ∈ R² : x1 ≥ 0} ∩ {x ∈ R² : x2 ≥ 0}.

Finally, for the statement in (iii) we need to show two directions.

If: Let K + K ⊆ K and show that K is convex, i.e., λx + (1 − λ)y ∈ K whenever x ∈ K, y ∈ K, and 0 ≤ λ ≤ 1. First, because x and y are in K and both λ and 1 − λ are nonnegative, it follows that λx and (1 − λ)y are also in K because K is a cone. Second, because K + K ⊆ K, it then follows that λx + (1 − λ)y ∈ K + K ⊆ K, which shows that K is convex.

Only If: Let K be a convex cone and show that K + K ⊆ K, i.e., x + y ∈ K whenever x ∈ K and y ∈ K. First, because K is a cone and x ∈ K and y ∈ K, we have that 0.5x ∈ K and 0.5y ∈ K. By convexity of K, it then follows that 0.5x + (1 − 0.5)y = 0.5x + 0.5y ∈ K, and again 2(0.5x + 0.5y) = x + y ∈ K because K is a cone. □

(e) Conic Optimization: Since the objective function of any LP is linear, we only need to show that the feasible set for any LP in standard or canonical form can be written as the intersection of an affine set with a convex cone. The feasible set for the standard form is given by S = {x ∈ R^n : Ax = b, x ≥ 0} = {x ∈ R^n : Ax = b} ∩ {x ∈ R^n : x ≥ 0}, where {x ∈ R^n : Ax = b} is affine because for any x and y with Ax = b and Ay = b, we have

A(λx+ (1− λ)y) = λAx+ (1− λ)Ay = λb+ (1− λ)b = b

for any (not only nonnegative) scalar λ. In part (d), we have already discussed that the second set is a convex cone, which completes the case of the standard form.

For the canonical form, the feasible set is given by {x ∈ R^n : Ax ≥ b} which, however, is neither affine nor a convex cone, in general. In this case, we must introduce an additional slack vector w ∈ R^m to “lift” the original feasible set from R^n into the higher-dimensional space R^{m+n}, yielding {(x, w) ∈ R^n × R^m : Ax − w = b} ∩ {(x, w) ∈ R^n × R^m : w ≥ 0, x free}, where the first set is the affine set and the second set is a convex cone.

Note that the above problem is not in standard form because the variable x ∈ R^n is still unrestricted. However, by writing x as the difference of two non-negative variables x = x+ − x− where x+ ≥ 0 and x− ≥ 0, we can alternatively write a canonical LP problem in (conic) standard form in R^{2n+m}. You may begin to realize that in order to develop a good (intuitive) understanding of LP and its geometry in terms of affine sets and convex cones, a solid understanding of convex analysis is often very helpful.

(f) Another Challenge (for the fun only): For simplicity, let us choose the complete underlying space as our affine set, and any linear function as our objective. Because any polyhedral cone can be written as the intersection of a finite number of halfspaces, which are given by linear inequalities, a conic program that is not an LP must use a nonpolyhedral convex cone. Two important examples are the following.

• The second-order (or ice-cream) cone is defined by

K_2 = {(x, z) ∈ R^{n+1} : z ≥ ‖x‖_2}

(here ‖x‖_2 = √(Σ_{i=1}^n x_i²) is the L2-norm). It is convex and nonpolyhedral, and the associated conic programs are called second-order cone programs (SOCPs).


• The set of symmetric, positive semidefinite (psd) matrices

KS = {X ∈ Rn×n : X is a symmetric, positive semidefinite matrix}

is also a convex nonpolyhedral cone, and the associated conic programs are called semidefinite programs (SDPs). The geometry of this cone is completely understood!

Figure A.1: A polyhedral cone (left), an ice-cream cone (middle), and a (projected) psd cone (right)

Recently, there has been much interest in these extensions of traditional linear programming. Congratulations on your successful start in exploring these new “hot” areas!

Solution 2.1 (Transportation Model) [transp.mod]: The following two files are possible AMPL model and data files (you will also find them in your AMPL models folder).

set ORIG; # origins

set DEST; # destinations

param supply {ORIG} >= 0; # amounts available at origins

param demand {DEST} >= 0; # amounts required at destinations

check: sum {i in ORIG} supply[i] = sum {j in DEST} demand[j];

param cost {ORIG,DEST} >= 0; # shipment costs per unit

var Trans {ORIG,DEST} >= 0; # units to be shipped

minimize Total_Cost:

sum {i in ORIG, j in DEST} cost[i,j] * Trans[i,j];

subject to Supply {i in ORIG}:

sum {j in DEST} Trans[i,j] = supply[i];

subject to Demand {j in DEST}:

sum {i in ORIG} Trans[i,j] = demand[j];

Data (Transportation Model) [transp.dat]

data;

param: ORIG: supply := # defines set "ORIG" and param "supply"

GARY 1400

CLEV 2600

PITT 2900 ;

param: DEST: demand := # defines "DEST" and "demand"

FRA 900

DET 1200


LAN 600

WIN 400

STL 1700

FRE 1100

LAF 1000 ;

param cost:

FRA DET LAN WIN STL FRE LAF :=

GARY 39 14 11 14 16 82 8

CLEV 27 9 12 9 26 95 17

PITT 24 14 17 13 28 99 20 ;

To solve this problem, we can then use AMPL to find the optimal objective value of $196,200, which corresponds to the minimum transportation costs when choosing the following shipping plan.

21 variables, all linear

10 constraints, all linear; 42 nonzeros

1 linear objective; 21 nonzeros.

MINOS 5.5: optimal solution found.

13 iterations, objective 196200

Trans [*,*](tr)

: CLEV GARY PITT :=

DET 1200 0 0

FRA 0 0 900

FRE 0 1100 0

LAF 400 300 300

LAN 600 0 0

STL 0 0 1700

WIN 400 0 0 ;

Note that there is another shipping plan that Maureen has found using an alternative but also correct AMPL model, yielding the same minimum transportation cost as the one found above.

Trans [*,*](tr)

: CLEV GARY PITT :=

DET 1200 0 0

FRA 0 0 900

FRE 0 1100 0

LAF 400 0 600

LAN 600 0 0

STL 0 300 1400

WIN 400 0 0 ;

In fact, using these two solutions, you should be able to come up with an infinite number of optimal shipping plans, all of which have the same minimum cost of $196,200. Do you see how?

Solution 2.2 (Solving Linear Programs using the Simplex Method)

Solution 2.5 The initial dictionary and tableau for the auxiliary Phase-I problem are given by


maximize ξ = −x0

subject to w1 = −3 + x0 + x1 + x2

w2 = −1 + x0 + x1 − x2

w3 = 4 + x0 − x1 − 2x2

      x0    x1    x2     1
ζ   ( 0     1     3      0 )
ξ    −1     0     0      0
w1   −1    −1    −1     −3
w2   −1    −1     1     −1
w3   −1     1     2      4

We then choose the auxiliary variable x0 as entering variable and pivot on the row with the most negative right-hand side, w1 = −3 + x0 + x1 + x2, so that x0 = 3 + w1 − x1 − x2. Following the standard procedures for both methods, we obtain the following updated dictionary and tableau.

maximize ξ = −3 − w1 + x1 + x2

subject to x0 = 3 + w1 − x1 − x2

w2 = 2 + w1 − 2x2

w3 = 7 + w1 − 2x1 − 3x2

      w1    x1    x2     1
ζ   ( 0     1     3      0 )
ξ    −1     1     1     −3
x0   −1     1     1      3
w2   −1     0     2      2
w3   −1     2     3      7

Using Bland’s Rule and choosing the smaller-indexed variable x1 over x2 as entering variable, we next pivot between x1 and x0, which recovers the auxiliary objective function in its initial form together with a feasible solution for the original problem

maximize ξ = −x0

subject to x1 = 3 + w1 − x0 − x2

w2 = 2 + w1 − 2x2

w3 = 1 − w1 + 2x0 − x2

      w1   (x0)   x2     1
ζ     1    (−1)    2     3
ξ   ( 0     −1     0     0 )
x1   −1    ( 1)    1     3
w2   −1    ( 0)    2     2
w3    1    (−2)    1     1

Note that this last tableau is optimal for Phase-I with x0 = 0, so that we can again drop x0 and replace the auxiliary objective ξ by the original ζ = x1 + 3x2 = (3 + w1 − x2) + 3x2 = 3 + w1 + 2x2. Then using Alex Rule (i.e., look ahead and detect which pivot provides the optimal solution in the next step, if possible), we may pivot between x2 and w3 to get an optimal dictionary and tableau

maximize ζ = 5 − w1 − 2w3

subject to x1 = 2 + 2w1 + w3

w2 = 0 + 3w1 + 2w3

x2 = 1 − w1 − w3

      w1    w3     1
ζ    −1    −2      5
x1   −2    −1      2
w2   −3    −2      0
x2    1     1      1

Hence, an optimal solution is (x1∗, x2∗, w1∗, w2∗, w3∗) = (2, 1, 0, 0, 0) with an objective value of ζ∗ = 5.

Solution 2.6 Again, we start with the initial dictionary and tableau for the Phase-I problem

maximize ξ = −x0

subject to w1 = −3 + x0 + x1 + x2

w2 = −1 + x0 + x1 − x2

w3 = 2 + x0 − x1 − 2x2

      x0    x1    x2     1
ζ   ( 0     1     3      0 )
ξ    −1     0     0      0
w1   −1    −1    −1     −3
w2   −1    −1     1     −1
w3   −1     1     2      2


Note that because only the positive right-hand side has changed compared to the previous problem in Exercise 2.5, the first pivot is largely identical with the first step that we had done before

maximize ξ = −3 − w1 + x1 + x2

subject to x0 = 3 + w1 − x1 − x2

w2 = 2 + w1 − 2x2

w3 = 5 + w1 − 2x1 − 3x2

      w1    x1    x2     1
ζ   ( 0     1     3      0 )
ξ    −1     1     1     −3
x0   −1     1     1      3
w2   −1     0     2      2
w3   −1     2     3      5

For the second step, we may again apply Bland’s Rule to find x1 as entering variable, and then the Minimum-Ratio test indicates that the leaving variable is now w3 rather than x0 as in Exercise 2.5 (because of the smaller right-hand side of the third constraint and, thus, the smaller ratio associated with w3)

max ξ = −0.5 − 0.5w1 − 0.5w3 − 0.5x2

s.t. x0 = 0.5 + 0.5w1 + 0.5w3 + 0.5x2

w2 = 2 + w1 − 2x2

x1 = 2.5 + 0.5w1 − 0.5w3 − 1.5x2

      w1     w3      x2      1
ζ   ( 0.5   −0.5    1.5     2.5 )
ξ    −0.5   −0.5   −0.5    −0.5
x0   −0.5   −0.5   −0.5     0.5
w2   −1      0      2       2
x1   −0.5    0.5    1.5     2.5

Note that the solution (x0, x1, x2, w1, w2, w3) = (0.5, 2.5, 0, 0, 2, 0) is optimal for Phase-I but infeasible for the original problem because x0 = 0.5 is still greater than 0 – the problem is infeasible.

Solution 2.7 Once more starting from the initial dictionary and tableau for the Phase-I problem

maximize ξ = −x0

subject to w1 = −3 + x0 + x1 + x2

w2 = −1 + x0 + x1 − x2

w3 = 2 + x0 + x1 − 2x2

      x0    x1    x2     1
ζ   ( 0     1     3      0 )
ξ    −1     0     0      0
w1   −1    −1    −1     −3
w2   −1    −1     1     −1
w3   −1    −1     2      2

Again very similar to the previous problem, now only the coefficient of x1 in the third constraint has changed sign, so that the first tableau can be easily adjusted from the one we had before

maximize ξ = −3 − w1 + x1 + x2

subject to x0 = 3 + w1 − x1 − x2

w2 = 2 + w1 − 2x2

w3 = 5 + w1 − 3x2

      w1    x1    x2     1
ζ   ( 0     1     3      0 )
ξ    −1     1     1     −3
x0   −1     1     1      3
w2   −1     0     2      2
w3   −1     0     3      5

Here observe that both w2 and w3 no longer depend on x1 after the first pivot, which makes the update very convenient if we again decide to pivot between x1 and x0 as in Exercise 2.5. In fact, note that we only need to update the pivot constraint and the objective, now yielding

maximize ξ = −x0

subject to x1 = 3 + w1 − x0 − x2

w2 = 2 + w1 − 2x2

w3 = 5 + w1 − 3x2

      w1   (x0)   x2     1
ζ     1    −1      2     3
ξ   ( 0    −1      0     0 )
x1   −1    ( 1)    1     3
w2   −1    ( 0)    2     2
w3   −1    ( 0)    3     5


Note that the above dictionary and tableau are optimal for Phase-I with x0 = 0, so that its solution is also feasible for the original problem. Hence, we drop x0 and reintroduce the original objective as ζ = x1 + 3x2 = (3 + w1 − x2) + 3x2 = 3 + w1 + 2x2 (check that this is the same as in Exercise 2.5). In particular, the objective coefficient of w1 is still positive so that we can further increase ζ by increasing w1, although now we can do so without bound because any increase in w1 will only increase the values of all current basic variables x1, w2, and w3 – the original problem is unbounded.

Solution 2.4 (Some More Theory and Proofs)

1. Ascent and Descent Directions: From the above definitions, it can be derived easily that a nonzero vector d is an ascent, descent, or orthogonal direction with respect to a_i if and only if a_i^T d ≥ 0, a_i^T d ≤ 0, or a_i^T d = 0, respectively. Furthermore, from geometry or linear algebra we know that the dot product of two vectors (here a_i and d) is positive, negative, or zero if and only if the two vectors form an acute, obtuse, or right angle, respectively. In particular, because a_i^T a_i = ‖a_i‖² > 0 for any nonzero (normal) vector, which clearly forms an acute (zero) angle with itself, both conditions readily imply that any nonzero vector is an ascent direction with respect to itself.

2. Active Constraints and Sets: Let x′ ∈ S = {x : Ax ≥ b} be feasible, so Ax′ ≥ b (*) and, by definition of A_{x′} as the active constraint matrix, A_{x′}x′ = b_{x′} (**), where b_{x′} is the vector consisting of those elements of b that correspond to the rows of A_{x′}. Now given any feasible point x ∈ S, we obtain that

A_{x′}(x − x′) = A_{x′}x − A_{x′}x′ ≥ b_{x′} − b_{x′} = 0   (using that A_{x′}x ≥ b_{x′} because x ∈ S as in (*), and A_{x′}x′ = b_{x′} by (**)).  □

For the converse, first note that the above proof uses that Ax ≥ b implies A_{x′}x ≥ b_{x′}, i.e., that a valid system of linear inequalities remains valid upon dropping some (or all) inequalities. Clearly, since the converse does not hold in general (because we could add an invalid inequality to a previously valid system), we can construct a counterexample in the following way. Consider the system

Ax = [ 1  0 ; 0  1 ] [ x1 ; x2 ] ≥ [ 0 ; 0 ] = b

with feasible point x′ = (1, 0)^T and A_{x′} = (0, 1). Then observe that x = (−1, 1)^T satisfies A_{x′}(x − x′) = (0, 1)(−2, 1)^T = 1 ≥ 0 although x = (−1, 1)^T is not feasible. An even simpler example can be constructed using S = {x ∈ R : x ≥ 0}, x′ > 0 and x < 0. Do you see why?

3. Feasible Directions: If: Let a_i^T d ≥ 0 for all i ∈ A(x′) (*) and x′ ∈ S be feasible, so a_i^T x′ ≥ b_i for all i with equality if and only if i ∈ A(x′) (**). To show that d is a feasible direction, we then need to find σ > 0 such that x′ + λd is a feasible point, or equivalently, that A(x′ + λd) ≥ b for all 0 ≤ λ ≤ σ. With the usual convention that inf ∅ = ∞ (the largest lower bound on an empty set is infinite; if you are uncomfortable with infimums, think minimum!), now let

σ = inf_{i ∉ A(x′)} { (b_i − a_i^T x′) / (a_i^T d) : a_i^T d < 0 } > 0

where strict positivity follows because x′ is feasible and because we only minimize over those indices i ∉ A(x′) corresponding to inactive constraints, so that a_i^T x′ > b_i. Now let 0 ≤ λ ≤ σ, then λ ≤ σ ≤ (b_i − a_i^T x′)/(a_i^T d) and it follows that λ a_i^T d ≥ b_i − a_i^T x′ for all i ∉ A(x′) with a_i^T d < 0 (***) (note the sign change because a_i^T d < 0). Hence, we obtain that

a_i^T(x′ + λd) = a_i^T x′ + λ a_i^T d
   ≥ a_i^T x′ ≥ b_i                           if i ∈ A(x′), from (*) and (**)
   ≥ a_i^T x′ + b_i − a_i^T x′ = b_i          if i ∉ A(x′) and a_i^T d < 0, from (***)
   > b_i + λ a_i^T d ≥ b_i                    if i ∉ A(x′) and a_i^T d ≥ 0


Only If: For the reverse, let d be a feasible direction at x′, i.e., there exists σ > 0 such that A(x′ + λd) ≥ b for all 0 ≤ λ ≤ σ and specifically a_i^T(x′ + σd) = a_i^T x′ + σ a_i^T d ≥ b_i = a_i^T x′ for all i ∈ A(x′). Hence, we see that a_i^T d ≥ 0 for all i ∈ A(x′), and the proof is complete. □

Finally, if all constraints are inactive, then the active constraint matrix is empty and there is no restriction for a direction d to be feasible, meaning that every direction is a feasible direction. Alternatively, we may define σ as we did in the first part of the proof (note the infimum would now be taken over all indices), and then show that A(x′ + λd) ≥ b for all 0 ≤ λ ≤ σ, yielding the same conclusion as before.

4. Optimality Condition for LPs in Canonical Form: If: Let c^T d ≥ 0 for all feasible directions d at x∗, and let x ∈ S = {x : Ax ≥ b} be any other feasible point. By convexity of S, we know that d = x − x∗ is a feasible direction at x∗ with σ = 1 (because x∗ + λd = x∗ + λ(x − x∗) = λx + (1 − λ)x∗ with 0 ≤ λ ≤ σ = 1). Hence, it follows that c^T d = c^T(x − x∗) ≥ 0, or equivalently, c^T x∗ ≤ c^T x, which implies that x∗ is optimal (because the feasible point x ∈ S was chosen arbitrarily).

Only If: Let x∗ be optimal, so c^T x∗ ≤ c^T x for any feasible x ∈ S, and let d be any feasible direction at x∗. Because d is feasible and x∗ optimal, there exists σ > 0 such that x∗ + λd is feasible for all 0 ≤ λ ≤ σ and has a greater-or-equal objective value than x∗. In particular, then c^T x∗ ≤ c^T(x∗ + σd), or equivalently, c^T d ≥ 0 because σ > 0. The proof is complete. □

Solution 3.1 (Production/Transportation Model) [steelP.mod]

set ORIG; # origins (steel mills)

set DEST; # destinations (factories)

set PROD; # products

param rate {ORIG,PROD} > 0; # tons per hour at origins

param avail {ORIG} >= 0; # hours available at origins

param demand {DEST,PROD} >= 0; # tons required at destinations

param make_cost {ORIG,PROD} >= 0; # manufacturing cost/ton

param trans_cost {ORIG,DEST,PROD} >= 0; # shipping cost/ton

var Make {ORIG,PROD} >= 0; # tons produced at origins

var Trans {ORIG,DEST,PROD} >= 0; # tons shipped

minimize Total_Cost:

sum {i in ORIG, p in PROD} make_cost[i,p] * Make[i,p] +

sum {i in ORIG, j in DEST, p in PROD}

trans_cost[i,j,p] * Trans[i,j,p];

subject to Time {i in ORIG}:

sum {p in PROD} (1/rate[i,p]) * Make[i,p] <= avail[i];

subject to Supply {i in ORIG, p in PROD}:

sum {j in DEST} Trans[i,j,p] = Make[i,p];

subject to Demand {j in DEST, p in PROD}:

sum {i in ORIG} Trans[i,j,p] = demand[j,p];

Data (Production/Transportation Model) [steelP.dat]

data;


set ORIG := GARY CLEV PITT ;

set DEST := FRA DET LAN WIN STL FRE LAF ;

set PROD := bands coils plate ;

param avail := GARY 20 CLEV 15 PITT 20 ;

param demand (tr):

FRA DET LAN WIN STL FRE LAF :=

bands 300 300 100 75 650 225 250

coils 500 750 400 250 950 850 500

plate 100 100 0 50 200 100 250 ;

param rate (tr): GARY CLEV PITT :=

bands 200 190 230

coils 140 130 160

plate 160 160 170 ;

param make_cost (tr):

GARY CLEV PITT :=

bands 180 190 190

coils 170 170 180

plate 180 185 185 ;

param trans_cost :=

[*,*,bands]: FRA DET LAN WIN STL FRE LAF :=

GARY 30 10 8 10 11 71 6

CLEV 22 7 10 7 21 82 13

PITT 19 11 12 10 25 83 15

[*,*,coils]: FRA DET LAN WIN STL FRE LAF :=

GARY 39 14 11 14 16 82 8

CLEV 27 9 12 9 26 95 17

PITT 24 14 17 13 28 99 20

[*,*,plate]: FRA DET LAN WIN STL FRE LAF :=

GARY 41 15 12 16 17 86 8

CLEV 29 9 13 9 28 99 18

PITT 26 14 17 13 31 104 20 ;

AMPL Output [model steelP.mod; data steelP.dat; solve; display Make, Trans;]

Presolve eliminates 1 constraint and 3 variables.

Adjusted problem:

69 variables, all linear

32 constraints, all linear; 138 nonzeros

1 linear objective; 69 nonzeros.

MINOS 5.5: MINOS 5.5: optimal solution found.

27 iterations, objective 1392175

Make :=

CLEV bands 0

CLEV coils 1950


CLEV plate 0

GARY bands 1125

GARY coils 1750

GARY plate 300

PITT bands 775

PITT coils 500

PITT plate 500

;

Trans [CLEV,*,*]

: bands coils plate :=

DET 0 750 0

FRA 0 0 0

FRE 0 0 0

LAF 0 500 0

LAN 0 400 0

STL 0 50 0

WIN 0 250 0

[GARY,*,*]

: bands coils plate :=

DET 0 0 0

FRA 0 0 0

FRE 225 850 100

LAF 250 0 0

LAN 0 0 0

STL 650 900 200

WIN 0 0 0

[PITT,*,*]

: bands coils plate :=

DET 300 0 100

FRA 300 500 100

FRE 0 0 0

LAF 0 0 250

LAN 100 0 0

STL 0 0 0

WIN 75 0 50

;

Solution 3.2 (Solving LPs using the Primal-Dual and Dual-Primal Two-Phase Simplex Methods)

Solution 5.8 (Exercise 2.4): Since this problem has the immediate dual feasible solution (z1, z2, z3) = −(c1, c2, c3) = (1, 3, 1), we can skip a primal Phase-I and solve this problem directly using the regular dual simplex method (as a special case of the primal-dual simplex method). In particular, after one (uniquely determined) dual pivot between x2 and x4, we find the optimal primal and dual solutions (x1∗, x2∗, x3∗, x4∗, x5∗) = (0, 1, 0, 0, 5) and (z1∗, z2∗, z3∗, z4∗, z5∗) = (2.2, 0, 1.6, 0.6, 0) with the optimal objective value ζ∗ = −3.


      x1     x2     x3      1
ζ     −1     −3     −1      0
x4     2     −5      1     −5
x5     2     −1      2      4

      x1     x4     x3      1
ζ    −2.2   −0.6   −1.6    −3
x2   −0.4   −0.2   −0.2     1
x5    1.6   −0.2    1.8     5

Solution 5.9 (Exercise 2.6): Since the initial dictionary / tableau for this problem is both primal and dual infeasible, we need to choose one of the two two-phase methods to solve this problem. We note that both the initial primal and dual iterate have two infeasibilities (because the first two primal slacks x3 = −3 and x4 = −1 and the dual variables z1 = −1 and z2 = −3 are negative), so that it is not clear whether the primal-dual or the dual-primal simplex method would give us an easier game to play. Thus doing both, we start with the primal-dual simplex method and a first (uniquely determined) primal pivot between x2 and x5.

      x1     x2             1
ζ      1      3             0
x3    −1     −1      1     −3
x4    −1      1      1     −1
x5     1      2      1      2

      x1     x5             1
ζ    −0.5   −1.5            3
x3   −0.5    0.5    1.5    −2
x4   −1.5   −0.5    0.5    −2
x2    0.5    0.5    0.5     1

Since the new tableau is dual feasible, we can drop the auxiliary dual objective and continue with the regular dual simplex method. Using Bland’s rule, we make a dual pivot between x1 and x3

      x1     x5      1
ζ    −0.5   −1.5     3
x3   −0.5    0.5    −2
x4   −1.5   −0.5    −2
x2    0.5    0.5     1

      x3     x5      1
ζ     −1     −2      1
x1    −2     −1      4
x4    −3     −2      4
x2     1      1     −1

resulting in an unbounded dual feasible tableau, which shows that the original primal problem must be infeasible. The same conclusion follows (actually a little bit quicker) when using the dual-primal simplex method and Bland’s rule, which directly forces the pivot between x1 and x3

      x1     x2      1
ζ      1      3      0
      −1     −1
x3    −1     −1     −3
x4    −1      1     −1
x5     1      2      2

      x3     x2      1
ζ      1      2      3
      −1      0
x1    −1      1      3
x4    −1      2      2
x5     1      1     −1

Solution 3.4 (Dual Formulation and Stories): Note that the given diet problem is formulated as a minimization LP that has the exact same form as our regular dual problem. Hence, the associated dual problem looks like an initial primal problem

maximize     Σ_{i=1}^{m} b_i y_i

subject to   Σ_{i=1}^{m} a_{ij} y_i ≤ c_j,   j = 1, 2, . . . , n

             y_i ≥ 0,   i = 1, 2, . . . , m

To give a proper interpretation of this problem, let us first clarify the meaning of the new dual variables as shadow prices for the original primal constraints. To do that, we may look at the units

[y] = [c]/[a] = ($/unit food)/(nutrient content/unit food) = $/nutrient content


which suggests that the y_i correspond to the shadow prices for each of the m nutrients i = 1, . . . , m. In particular, the objective is then still to maximize a dollar amount rather than a nutrient amount, because

[b] · [y] = (nutrient amount)·($/nutrient content) = $

Hence, the dual problem tries to price each nutrient (possibly by some food provider) so as to maximize the overall selling price (profit) while not exceeding current market prices for the food items that include these nutrients. Alternatively, you may remember that the dual variables contain some information on how the objective will change if we change the original constraints’ right-hand sides (by some sufficiently small amount, otherwise we may have to change the optimal basis): if nutrient requirement b_i increases by 1 unit, then our diet cost increases by y_i dollars. Hence, as before, we would interpret y_i as the shadow price for nutrient b_i, and the story could continue as before. For your enjoyment and comparison to some other ideas, here are your own stories in loosely decreasing order of “truthfulness” to the actual situation / increasing order of creativity. One point each was deducted if a story maximized nutrients or nutritional requirements and interpreted the constraint as staying within some budget (note that c is not a budget, but the food prices or costs).

• There are some producers who produce the n kinds of food which the MIT student needs. The n kinds of food are made from m different nutrients. So there is a wholesaler who supplies the producers with the m nutrients needed to make the n kinds of food. The producer of food j (j = 1, 2, . . . , n) informs the supplier (wholesaler) that food j contains nutrients i (i = 1, 2, . . . , m) and that he intends to purchase the amount of nutrient i in food j (a_{ij}) to meet the MIT student’s minimum nutritional requirements b_i. The supplier now has an optimization problem as follows: How can I set the price of each nutrient i (y_i) so that the producers will buy from me, and so that I will maximize my income Σ_{i=1}^m y_i b_i? The supplier also realizes that producer j will buy only if the total cost of the nutrients for food j is below the price c_j; otherwise he runs the risk of making a loss if the MIT student opts to buy food j. This restriction imposes the constraint Σ_{i=1}^m a_{ij} y_i ≤ c_j on the prices. [2 points]

• A vitamin manufacturer wants to maximize the cost [price!] b^T y of his multi-vitamin. But he wants to guarantee that the cost of his multi-vitamin is competitive with the cost of food that provides the same amount of each nutrient, A^T y ≤ c. [2 points]

• Let’s assume that King Supers also delivers pizza. Then this would be the LP that the manager of King Supers would like to solve in order to maximize profit, subject to the constraints that his customer still gets his minimum daily nutritional intake on his budgeted small stipend. [1 point]

• A scenario in which the dual would be a natural problem would be for a person (perhaps an athlete, or someone else on a restricted diet) who wants to maximize their nutrient intake while not going over the cost of what it would normally have been to buy food to attain those nutrient levels. Perhaps they are buying supplements (because they can avoid calories and fats) and want to get their nutritional intake but don’t want their cost to exceed what the cost would have been had they bought the food instead. [1 point]

• The dual would be to maximize the minimum daily requirements, subject to the cost of the food being less than or equal to his budget. His mother or doctor would naturally want him to solve the dual. [Great comment, but not completely accurate]

• The MIT student was very excited about his diet plan and called his mother to tell her the good news. Unfortunately, his mother was not quite as impressed. While she appreciated the mathematical thinking behind the problem, she did not like that he was focusing on spending the least amount of money while getting the minimum number of nutrients. Being his mother (and a math lover), she suggested he alter his linear problem to focus on maximizing his nutrients while staying within his budget, and then taught him all about the dual format! So, his new problem does just that – maximize the number of nutrients he gets while keeping the total cost of all his food within budget. [Best story, but not completely accurate]


Solution 4.1 (Multiperiod Production Model) [steelT.mod]

set PROD; # products

param T > 0; # number of weeks

param rate {PROD} > 0; # tons per hour produced

param inv0 {PROD} >= 0; # initial inventory

param avail {1..T} >= 0; # hours available in week

param market {PROD,1..T} >= 0; # limit on tons sold in week

param prodcost {PROD} >= 0; # cost per ton produced

param invcost {PROD} >= 0; # carrying cost/ton of inventory

param revenue {PROD,1..T} >= 0; # revenue per ton sold

var Make {PROD,1..T} >= 0; # tons produced

var Inv {PROD,0..T} >= 0; # tons inventoried

var Sell {p in PROD, t in 1..T} >= 0, <= market[p,t]; # tons sold

maximize Total_Profit:

sum {p in PROD, t in 1..T} (revenue[p,t]*Sell[p,t] -

prodcost[p]*Make[p,t] - invcost[p]*Inv[p,t]);

# Total revenue less costs in all weeks

subject to Time {t in 1..T}:

sum {p in PROD} (1/rate[p]) * Make[p,t] <= avail[t];

# Total of hours used by all products

# may not exceed hours available, in each week

subject to Init_Inv {p in PROD}: Inv[p,0] = inv0[p];

# Initial inventory must equal given value

subject to Balance {p in PROD, t in 1..T}:

Make[p,t] + Inv[p,t-1] = Sell[p,t] + Inv[p,t];

# Tons produced and taken from inventory

# must equal tons sold and put into inventory

Data (Multiperiod Production Model) [steelT.dat]

data;

param T := 4;

set PROD := bands coils;

param avail := 1 40 2 40 3 32 4 40 ;

param rate := bands 200 coils 140 ;

param inv0 := bands 10 coils 0 ;

param prodcost := bands 10 coils 11 ;

param invcost := bands 2.5 coils 3 ;


param revenue: 1 2 3 4 :=

bands 25 26 27 27

coils 30 35 37 39 ;

param market: 1 2 3 4 :=

bands 6000 6000 4000 6500

coils 4000 2500 3500 4200 ;

AMPL Output [model steelT.mod; data steelT.dat; solve; display Make, Inv, Sell;]

Presolve eliminates 2 constraints and 2 variables.

Adjusted problem:

24 variables, all linear

12 constraints, all linear; 38 nonzeros

1 linear objective; 24 nonzeros.

MINOS 5.5: optimal solution found.

15 iterations, objective 515033

display Make, Inv, Sell;

: Make Inv Sell :=

bands 0 . 10 .

bands 1 5990 0 6000

bands 2 6000 0 6000

bands 3 1400 0 1400

bands 4 2000 0 2000

coils 0 . 0 .

coils 1 1407 1100 307

coils 2 1400 0 2500

coils 3 3500 0 3500

coils 4 4200 0 4200

;

Solution 4.2 (Sensitivity and Shadow Prices in AMPL)

1. display Time, Make.rc;

Time = 4640

Make.rc [*] :=

bands 1.8

coils -3.14286

plate 0

;

As discussed in Section 1.6, AMPL interprets a constraint’s name alone as referring to the associated dual values or shadow prices, which measure how much the objective value would improve if the constraint were relaxed by a small amount, and a variable’s name appended by .lb, .ub, or .rc as the dual values or shadow prices of its lower bound, upper bound, and “reduced cost,” which have the same meaning with respect to the bounds as the shadow prices have with respect to the constraints. Hence, the above display tells us that, up to some point, additional rolling time would produce another $4640 of extra profit per hour, higher market demand in bands another $1.8 per ton, lower (!) commitment of coils an extra $3.14286 per ton, and (small) changes to either demand or commitment of plates would not affect the current profit; changes in the opposite directions would decrease the profit correspondingly.
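In formula form, and only as long as the change is small enough that the optimal basis does not change, a right-hand-side change Δb improves the optimal objective by approximately the shadow price times the change,
\[
\Delta\zeta^* \approx y^*\,\Delta b, \qquad\text{e.g.}\qquad 4640\ \$/\text{hour} \times 1\ \text{hour} = \$4640
\]
for one additional hour of rolling time; the experiments in part 3 below show the same first-order behavior for the reheat stage, where each additional hour is worth $1800 until the shadow price eventually drops.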

2. Computing the profit rates (in dollars per hour) for both reheat and rolling stage, we get


  profit rate ($/hour)    bands    coils    plate
  reheat                   5000     6000     5800
  roll                     5000     4200     4640

indicating that although bands remain the most resource-efficient product during the rolling stage, the production of coils and plates is more efficient in the reheat stage. In particular, the efficiency gain of plates over coils during rolling is more than twice its efficiency loss in reheat, resulting in a higher production of plates compensated by a lower production of bands.
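These hourly rates are simply profit per ton times tons processed per hour. Assuming the steel data used with this model earlier in the course (profits of about $25, $30, and $29 per ton of bands, coils, and plate, a common reheat rate of 200 tons per hour, and rolling rates of 200, 140, and 160 tons per hour — these values are not repeated in this appendix, so check them against your own data file), one entry of the table is, for example,
\[
\text{coils (reheat):}\quad 30\ \$/\text{ton} \times 200\ \text{tons/hour} = 6000\ \$/\text{hour},
\qquad
\text{coils (roll):}\quad 30 \times 140 = 4200\ \$/\text{hour}.
\]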

3. let avail["reheat"] := 36; solve; let avail["reheat"] := 37; solve;

let avail["reheat"] := 38; solve; let avail["reheat"] := 38.1; solve;

3 variables, all linear

2 constraints, all linear; 6 nonzeros

1 linear objective; 3 nonzeros.

MINOS 5.5: optimal solution found.

1 iterations, objective 191871.4286 # $1800 more than the initial profit

3 variables, all linear

2 constraints, all linear; 6 nonzeros

1 linear objective; 3 nonzeros.

MINOS 5.5: optimal solution found.

1 iterations, objective 193671.4286 # another additional $1800 in profit

3 variables, all linear

2 constraints, all linear; 6 nonzeros

1 linear objective; 3 nonzeros.

MINOS 5.5: optimal solution found.

1 iterations, objective 194828.5714 # only $1157.1428 increase in profit

3 variables, all linear

2 constraints, all linear; 6 nonzeros

1 linear objective; 3 nonzeros.

MINOS 5.5: optimal solution found.

0 iterations, objective 194828.5714 # no additional increase in profit

4. Also shown in the above plots, the following table lists the shadow prices of Time["reheat"] and the total profit for different integer values of avail["reheat"] (the exact “break” points of the piecewise linear shadow-price and total-profit curves can be computed using ranging).

avail["reheat"] 12 13 14 15 16 17 18 19 20

Time["reheat"] 6000 6000 6000 6000 6000 6000 6000 6000 6000

Total Profit 66250 72250 78250 84250 90250 96250 102250 108250 114250

21 22 23 24 25 26 27 28 29 30

6000 6000 6000 6000 6000 6000 6000 5800 5800 5800

120250 126250 132250 138250 144250 150250 156250 162250 168200 174000

31 32 33 34 35 36 37 38 39 40

4400 2667 2667 2667 1800 1800 1800 0 0 0

178600 182458 185125 187792 190071 191871 193671 194829 194829 194829


When the available reheat time drops to (or below) 11 hours, we receive the AMPL output

let avail["reheat"] := 11; solve;

presolve: constraint Time[’reheat’] cannot hold:

body <= 11 cannot be >= 11.25; difference = -0.25

which tells us that our problem has become infeasible, because production of our committed minimum quantities of 1000 tons of bands, 500 tons of coils, and 750 tons of plates, to be reheated at 200 tons per hour each, already requires 2250/200 = 11.25 hours of reheat time.

Solution 4.3 (The Parametric Self-Dual Simplex Method): Introducing new slack variables x4 and x5 for the two inequality constraints, we define
\[
A = \begin{bmatrix} -1 & -1 & -1 & 1 & 0\\ 2 & -1 & 1 & 0 & 1 \end{bmatrix},\qquad
b = \begin{bmatrix} -2\\ 1 \end{bmatrix},\qquad
c = \begin{bmatrix} 2 & -6 & 0 & 0 & 0 \end{bmatrix}^T
\]
and choose an initial decomposition of the columns of A into B = {4, 5} and N = {1, 2, 3} so that
\[
B = B^{-1} = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix},\quad
N = \begin{bmatrix} -1 & -1 & -1\\ 2 & -1 & 1 \end{bmatrix},\quad
c_B = \begin{bmatrix} 0 & 0 \end{bmatrix}^T,\quad\text{and}\quad
c_N = \begin{bmatrix} 2 & -6 & 0 \end{bmatrix}^T.
\]
Then setting x1 = x2 = x3 = z4 = z5 = 0, we compute the current basic primal and dual variables
\[
x_B^* = \begin{bmatrix} x_4\\ x_5 \end{bmatrix} = B^{-1}b = \begin{bmatrix} -2\\ 1 \end{bmatrix}
\qquad\text{and}\qquad
z_N^* = \begin{bmatrix} z_1 & z_2 & z_3 \end{bmatrix}^T = (B^{-1}N)^T c_B - c_N = \begin{bmatrix} -2 & 6 & 0 \end{bmatrix}^T
\]
which are infeasible in x4 and z1. Hence, we let x̄B = ρ1(1, 0)T and z̄N = ρ2(1, 0, 0)T and consider
\[
x_B^* + \mu\bar{x}_B = \begin{bmatrix} -2+\rho_1\mu\\ 1 \end{bmatrix}
\qquad\text{and}\qquad
z_N^* + \mu\bar{z}_N = \begin{bmatrix} -2+\rho_2\mu & 6 & 0 \end{bmatrix}^T
\]
where ρ1 > 0 and ρ2 > 0 are any two positive numbers that can be chosen (i) at random so as to avoid cycling in the case of degeneracy, and (ii) to satisfy ρ1 < ρ2, which then forces an initial dual pivot on x4 because µ* = min{µ : −2+ρ1µ ≥ 0, −2+ρ2µ ≥ 0} = max{2/ρ1, 2/ρ2} = 2/ρ1 if ρ2 > ρ1 > 0. Randomly setting (ρ1, ρ2) = (0.2, 0.6) (but essentially any other choice will do as well), it follows that the above solution is primal-dual feasible and thus optimal for all µ ≥ µ* = 2/ρ1 = 2/0.2 = 10.

Since further decrease of µ is prevented by x4, we then compute its corresponding dual pivot as
\[
\Delta z_N = \begin{bmatrix} \Delta z_1 & \Delta z_2 & \Delta z_3 \end{bmatrix}^T
= -(B^{-1}N)^T e_4
= -\begin{bmatrix} -1 & -1 & -1\\ 2 & -1 & 1 \end{bmatrix}^T \begin{bmatrix} 1\\ 0 \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}^T
\]
\[
k = \arg\min_{j\in N}\left\{\frac{z_j^* + \mu^*\bar{z}_j}{\Delta z_j} : \Delta z_j > 0\right\}
= \arg\min_{j\in\{1,2,3\}}\{4, 6, 0\} = 3
\]
\[
\Delta x_B = \begin{bmatrix} \Delta x_4\\ \Delta x_5 \end{bmatrix}
= B^{-1}N e_3
= \begin{bmatrix} -1 & -1 & -1\\ 2 & -1 & 1 \end{bmatrix}\begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T
= \begin{bmatrix} -1\\ 1 \end{bmatrix}
\]
so that x4 leaves and x3 enters the basis, yielding the following update of the primal and dual variables:
\[
x_3^* \leftarrow t = \frac{x_4^*}{\Delta x_4} = \frac{-2}{-1} = 2,
\qquad
\bar{x}_3 \leftarrow \bar{t} = \frac{\bar{x}_4}{\Delta x_4} = \frac{0.2}{-1} = -0.2,
\]
\[
z_4^* \leftarrow s = \frac{z_3^*}{\Delta z_3} = \frac{0}{1} = 0,
\qquad
\bar{z}_4 \leftarrow \bar{s} = \frac{\bar{z}_3}{\Delta z_3} = \frac{0}{1} = 0,
\]
\[
x_B^* = \begin{bmatrix} x_4^*\\ x_5^* \end{bmatrix} \leftarrow x_B^* - t\,\Delta x_B
= \begin{bmatrix} -2\\ 1 \end{bmatrix} - 2\begin{bmatrix} -1\\ 1 \end{bmatrix}
= \begin{bmatrix} 0\\ -1 \end{bmatrix},
\qquad
\bar{x}_B = \begin{bmatrix} \bar{x}_4\\ \bar{x}_5 \end{bmatrix} \leftarrow \bar{x}_B - \bar{t}\,\Delta x_B
= \begin{bmatrix} 0.2\\ 0 \end{bmatrix} - (-0.2)\begin{bmatrix} -1\\ 1 \end{bmatrix}
= \begin{bmatrix} 0\\ 0.2 \end{bmatrix},
\]
\[
z_N^* = \begin{bmatrix} z_1^* & z_2^* & z_3^* \end{bmatrix}^T \leftarrow z_N^* - s\,\Delta z_N = \begin{bmatrix} -2 & 6 & 0 \end{bmatrix}^T,
\qquad
\bar{z}_N = \begin{bmatrix} \bar{z}_1 & \bar{z}_2 & \bar{z}_3 \end{bmatrix}^T \leftarrow \bar{z}_N - \bar{s}\,\Delta z_N = \begin{bmatrix} 0.6 & 0 & 0 \end{bmatrix}^T
\]


and new basic primal variables (x3*, x5*) = (2, −1) and dual variables (z1*, z2*, z4*) = (−2, 6, 0). Hence, for the next iteration we now let B = {3, 5} and N = {1, 2, 4} and analogously to before compute
\[
B = B^{-1} = \begin{bmatrix} -1 & 0\\ 1 & 1 \end{bmatrix},\quad
N = \begin{bmatrix} -1 & -1 & 1\\ 2 & -1 & 0 \end{bmatrix},\quad
c_B = \begin{bmatrix} 0 & 0 \end{bmatrix}^T,\quad
c_N = \begin{bmatrix} 2 & -6 & 0 \end{bmatrix}^T,
\]
\[
x_B^* = \begin{bmatrix} x_3\\ x_5 \end{bmatrix} = B^{-1}b = \begin{bmatrix} 2\\ -1 \end{bmatrix},
\qquad\text{and}\qquad
z_N^* = \begin{bmatrix} z_1 & z_2 & z_4 \end{bmatrix}^T = (B^{-1}N)^T c_B - c_N = \begin{bmatrix} -2 & 6 & 0 \end{bmatrix}^T
\]
which is infeasible in x5 and z1. Using x̄B = (x̄3, x̄5) = (−0.2, 0.2) and z̄N = (z̄1, z̄2, z̄4) = (0.6, 0, 0),
\[
x_B^* + \mu\bar{x}_B = \begin{bmatrix} 2-0.2\mu\\ -1+0.2\mu \end{bmatrix}
\qquad\text{and}\qquad
z_N^* + \mu\bar{z}_N = \begin{bmatrix} -2+0.6\mu & 6 & 0 \end{bmatrix}^T
\]
is optimal for 10 ≥ µ ≥ µ* = max{1/0.2, 2/0.6} = 5. Computing the dual pivot on x5 with µ* = 5,
\[
\Delta z_N = \begin{bmatrix} \Delta z_1 & \Delta z_2 & \Delta z_4 \end{bmatrix}^T
= -(B^{-1}N)^T e_5
= -\begin{bmatrix} 1 & 1 & -1\\ 1 & -2 & 1 \end{bmatrix}^T \begin{bmatrix} 0\\ 1 \end{bmatrix}
= \begin{bmatrix} -1 & 2 & -1 \end{bmatrix}^T
\]
\[
k = \arg\min_{j\in N}\left\{\frac{z_j^* + \mu^*\bar{z}_j}{\Delta z_j} : \Delta z_j > 0\right\}
= \arg\min_{j\in\{2\}}\left\{\frac{6}{2}\right\} = 2
\]
\[
\Delta x_B = \begin{bmatrix} \Delta x_3\\ \Delta x_5 \end{bmatrix}
= B^{-1}N e_2
= \begin{bmatrix} 1 & 1 & -1\\ 1 & -2 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 & 0 \end{bmatrix}^T
= \begin{bmatrix} 1\\ -2 \end{bmatrix}
\]
we see that x5 leaves and x2 enters the basis, yielding the following new primal and dual variables:
\[
x_2^* \leftarrow t = \frac{x_5^*}{\Delta x_5} = \frac{-1}{-2} = 0.5,
\qquad
\bar{x}_2 \leftarrow \bar{t} = \frac{\bar{x}_5}{\Delta x_5} = \frac{0.2}{-2} = -0.1,
\]
\[
z_5^* \leftarrow s = \frac{z_2^*}{\Delta z_2} = \frac{6}{2} = 3,
\qquad
\bar{z}_5 \leftarrow \bar{s} = \frac{\bar{z}_2}{\Delta z_2} = \frac{0}{2} = 0,
\]
\[
x_B^* = \begin{bmatrix} x_3^*\\ x_5^* \end{bmatrix} \leftarrow x_B^* - t\,\Delta x_B
= \begin{bmatrix} 2\\ -1 \end{bmatrix} - 0.5\begin{bmatrix} 1\\ -2 \end{bmatrix}
= \begin{bmatrix} 1.5\\ 0 \end{bmatrix},
\qquad
\bar{x}_B = \begin{bmatrix} \bar{x}_3\\ \bar{x}_5 \end{bmatrix} \leftarrow \bar{x}_B - \bar{t}\,\Delta x_B
= \begin{bmatrix} -0.2\\ 0.2 \end{bmatrix} - (-0.1)\begin{bmatrix} 1\\ -2 \end{bmatrix}
= \begin{bmatrix} -0.1\\ 0 \end{bmatrix},
\]
\[
z_N^* = \begin{bmatrix} z_1^* & z_2^* & z_4^* \end{bmatrix}^T \leftarrow z_N^* - s\,\Delta z_N
= \begin{bmatrix} -2 & 6 & 0 \end{bmatrix}^T - 3\begin{bmatrix} -1 & 2 & -1 \end{bmatrix}^T
= \begin{bmatrix} 1 & 0 & 3 \end{bmatrix}^T,
\qquad
\bar{z}_N = \begin{bmatrix} \bar{z}_1 & \bar{z}_2 & \bar{z}_4 \end{bmatrix}^T \leftarrow \bar{z}_N - \bar{s}\,\Delta z_N
= \begin{bmatrix} 0.6 & 0 & 0 \end{bmatrix}^T.
\]
Since all primal and dual variables are now nonnegative, the current solution (x1*, x2*, x3*, x4*, x5*) = (0, 0.5, 1.5, 0, 0) and (z1*, z2*, z3*, z4*, z5*) = (1, 0, 0, 3, 3) is optimal, now with basis B = {2, 3} and N = {1, 4, 5} after the final pivot.

Solution 4.4 (The Simplex Method with Ranges): After solving for x1 and x4 using substitution or a computer, the initial dictionary is

   l                 0     0     0     0     0     0
   u                 6    10     2    10     4     3
          ζ  = -6 + 4x2 - 2x3 + 4x5       - 2x7 + 3x8   = -6
   0   8  x1 =  4 -  x2 -  x3 +  x5 + x6 -  x7 +  x8    =  4
   0  15  x4 =  3 +  x2 - 3x3 + 4x5 + x6 - 2x7 + 5x8    =  3

where the l and u rows on top list the lower and upper bounds of the nonbasic variables x2, x3, x5, x6, x7, x8 (in this order), the l and u columns on the left list the bounds of the basic variables, and the right-most column records the current values. Adopting the largest-coefficient rule to find a first primal pivot, we find a tie between x2 and x5 but may observe that x5 can be increased to its upper bound 2 without changing the current basis:

   l                 0     0     0     0     0     0
   u                 6    10     2    10     4     3
          ζ  = -6 + 4x2 - 2x3 + 4x5       - 2x7 + 3x8   =  2
   0   8  x1 =  4 -  x2 -  x3 +  x5 + x6 -  x7 +  x8    =  6
   0  15  x4 =  3 +  x2 - 3x3 + 4x5 + x6 - 2x7 + 5x8    = 11


Then choosing x2, which is currently at its lower bound, as the next pivot variable, we can increase its value to
\[
x_2 \le u_2 \;\wedge\; \min\left\{\frac{x_i^* - l_i}{-a_{i2}} : a_{i2} < 0\right\} \;\wedge\; \min\left\{\frac{u_i - x_i^*}{a_{i2}} : a_{i2} > 0\right\}
= \min\left\{6,\; \frac{6-0}{-(-1)},\; \frac{15-11}{1}\right\} = 4
\]
in which case x4 attains its upper bound 15 and therefore leaves the basis in the updated dictionary

   l                  0     0     0     0     0     0
   u                 10    15     2    10     4     3
          ζ  = -18 + 10x3 + 4x4 - 12x5 - 4x6 + 6x7 - 17x8   = 18
   0   8  x1 =   7 -  4x3 -  x4 +  5x5 + 2x6 - 3x7 +  6x8   =  2
   0   6  x2 =  -3 +  3x3 +  x4 -  4x5 -  x6 + 2x7 -  5x8   =  4

Again using the largest-coefficient rule, we repeat the above computation for a primal pivot on x3:
\[
x_3 \le u_3 \;\wedge\; \min\left\{\frac{x_i^* - l_i}{-a_{i3}} : a_{i3} < 0\right\} \;\wedge\; \min\left\{\frac{u_i - x_i^*}{a_{i3}} : a_{i3} > 0\right\}
= \min\left\{10,\; \frac{2-0}{-(-4)},\; \frac{6-4}{3}\right\} = 0.5
\]
in which case x1 is reduced to zero and therefore replaced by x3 in both the basis and the new dictionary

   l                     0       0       0       0       0      0
   u                     8      15       2      10       4      3
          ζ  = -0.5  -  2.5x1 +  1.5x4 +  0.5x5 +     x6 -  1.5x7 -    2x8   = 23
   0   6  x2 =  2.25 - 0.75x1 + 0.25x4 - 0.25x5 + 0.5x6  - 0.25x7 - 0.5x8    = 5.5
   0  10  x3 =  1.75 - 0.25x1 - 0.25x4 + 1.25x5 + 0.5x6  - 0.75x7 + 1.5x8    = 0.5

The only remaining pivot candidate is now x6 because x4 and x5 are already at their upper bounds, while all other variables that remain at their lower bounds have a negative objective coefficient, so
\[
x_6 \le u_6 \;\wedge\; \min\left\{\frac{x_i^* - l_i}{-a_{i6}} : a_{i6} < 0\right\} \;\wedge\; \min\left\{\frac{u_i - x_i^*}{a_{i6}} : a_{i6} > 0\right\}
= \min\left\{10,\; \frac{6-5.5}{0.5},\; \frac{10-0.5}{0.5}\right\} = 1
\]
and x6 is set to 1 and enters the basis while x2 attains its upper bound 6 and thus leaves the basis:

   l                    0      0       0       0       0      0
   u                    8      6      15       2       4      3
          ζ  =   -5 -    x1 +  2x2 +     x4 +     x5 -     x7 -   x8   = 24
   0  10  x3 = -0.5 + 0.5x1 +   x2 - 0.5x4  + 1.5x5  - 0.5x7  + 2x8    =  1
   0  10  x6 = -4.5 + 1.5x1 +  2x2 - 0.5x4  + 0.5x5  + 0.5x7  +  x8    =  1

Because all variables that are currently at their lower bounds have a negative objective coefficient and, analogously, all variables that have a positive objective coefficient are already at their upper bounds, we have found an optimal solution (x1*, x2*, x3*, x4*, x5*, x6*, x7*, x8*) = (0, 6, 1, 15, 2, 1, 0, 0) with an optimal objective value ζ* = 24. There is a pretty good chance that this is the “largest” non-trivial LP (8 original variables, 2 original equality constraints, and 8 inequality constraints, or if written with slacks, 16 variables and 10 equality constraints) that you will ever solve by hand.
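If you would like to double-check this by computer, all data can be read off the initial dictionary above (the original problem statement is not repeated here, so the following is only a sketch): the two basic rows give the equality constraints, the l/u entries give the bounds, and the objective row gives the cost vector plus the constant −6. In Matlab:

% Data read off the initial dictionary of Exercise 4.4
lb  = zeros(8,1);
ub  = [8 6 10 15 2 10 4 3]';                 % upper bounds of x1,...,x8
Aeq = [1  1  1  0 -1 -1  1 -1;               % x1 = 4 - x2 - x3 + x5 + x6 - x7 + x8
       0 -1  3  1 -4 -1  2 -5];              % x4 = 3 + x2 - 3x3 + 4x5 + x6 - 2x7 + 5x8
beq = [4; 3];
c   = [0 4 -2 0 4 0 -2 3]';                  % zeta = -6 + c'*x
[x, fval] = linprog(-c, [], [], Aeq, beq, lb, ub);   % linprog minimizes
x', -fval - 6      % should reproduce x = (0, 6, 1, 15, 2, 1, 0, 0) and zeta = 24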

Solution 5.1 (Mathematical Modeling and Optimization): If we think about this problem as a transportation problem where we transport “affection” from the two (more or less gentle) gentlemen Bob (B) and David (D) to the two proper young women Alice (A) and Carol (C), each with a demand of one and determined to maximize the total number of clams they receive, we arrive at the following LP formulation and corresponding network:

max − 3xBA + 7xBC + 2xDA + 9xDC

s.t. xBA + xDA = xBC + xDC = 1

xBA, xBC , xDA, xDC ∈ {0, 1}

[Network diagram: arcs Bob→Alice, Bob→Carol, David→Alice, and David→Carol with costs −3, 7, 2, and 9; Alice and Carol each have a demand of 1.]

Not difficult to find, Alice and Carol best accept affection (and 11 clams) from David (sorry Bob).


Solution 5.2 (Geometric Interpretation of Basic Solutions using Convexity)

(a) Basic Definitions: If rank A = m, then we can find m linearly independent columns of A with index set B which form an invertible basis matrix B and hence give a basic feasible solution xB = B⁻¹b with xi = 0 for all i ∉ B. If m > n, then the LSE is overdetermined with more constraints than variables, so that the rows of A must be linearly dependent, possibly leading to an inconsistent system with no feasible solutions. Similarly, if m ≤ n and rank A < m, then we have at least as many variables as constraints, but the rows of A are again linearly dependent with the same conclusion as before. In particular, because rank A < m in both cases, it is not possible to find m linearly independent columns of A to form an invertible basis matrix, so that there are no basic feasible solutions even if there are feasible solutions.

(b) The Fundamental Theorem of Linear Programming: Given an LP of the form
\[
\max\; c^T x \quad \text{s.t.} \quad Ax \le b,\; x \ge 0,
\]
this problem can equivalently be written as max c̄ᵀx̄ s.t. Āx̄ = b, x̄ ≥ 0, where x̄ = (x, w)T ∈ R^{m+n} includes m additional slack variables that do not contribute to the objective vector c̄ = (c, 0)T ∈ R^{m+n} but add m additional (linearly independent) columns (unit vectors) to A, resulting in the new matrix Ā = [A I] ∈ R^{m×(m+n)} with full (row) rank, rank Ā = m, in particular. Here note that, different from the addition of new constraints, the addition of new variables does not change the underlying geometry of the problem but merely lifts the same problem geometry into a higher-dimensional space in which a basic feasible solution is always known to exist.

(c) Extreme Points and their Equivalence to Basic Solutions: “If”: Let
\[
x = (x_B, x_N)^T = (B^{-1}b, 0)
\]
be a basic feasible solution of P, so Ax = BxB = b, and let y and z be any two other points in P such that x = λy + (1−λ)z for some 0 ≤ λ ≤ 1. Since y, z, λ and 1−λ are all nonnegative, it then follows immediately that yN = zN = 0 and thus, because y and z are in P, that Ay = ByB = b and Az = BzB = b, or yB = zB = B⁻¹b = xB, which implies x = y = z and shows that x is an extreme point of P.
“Only If”: Let x be an extreme point of P and, by contradiction, assume that x is not basic feasible, so that x has at least m+1 nonzero (strictly positive) components. Without loss of generality, assume that xi > 0 for i = 1, . . . , k > m, and xi = 0 for i = k+1, . . . , n, so that Ax = \(\sum_{i=1}^k x_i a_i = b\), where ai denotes the ith column of A. Because rank A = m < k, it then follows that these column vectors are linearly dependent and thus that there exists a nonzero vector y = (y1, y2, . . . , yk, 0, . . . , 0)T such that Ay = \(\sum_{i=1}^k y_i a_i = 0\). Now select a (sufficiently small) scalar ε > 0 such that x ± εy ≥ 0, which is possible because xi > 0 for i = 1, 2, . . . , k, and write x = 0.5(x + εy) + 0.5(x − εy) where A(x ± εy) = Ax ± εAy = b, in contradiction to the extremality of x and showing that x must have been a basic feasible solution. □

(d) Some Simple Proofs: For (i), P nonempty implies that there exists a feasible solution, so that there exists a basic feasible solution by the Fundamental Theorem, which is equivalent to an extreme point of P. For (ii), if there is an optimal solution, then there is a basic optimal solution by the Fundamental Theorem, which is equivalent to an optimal extreme point. For (iii), we recall that any polytope P has only a finite number of bases and therefore only a finite number of extreme points. Because every basis is fully characterized by exactly m linearly independent columns of the matrix A ∈ R^{m×n}, there are at most
\[
\binom{n}{m} = \frac{n!}{m!\,(n-m)!}
\]
possible choices and, hence, at most \(\binom{n}{m}\) bases and extreme points of P. Finally, for (iv), we know that every polytope is convex, so that we can represent any point in P as a convex combination of its finite number of extreme points. Furthermore, from Caratheodory’s Theorem we know that we can represent any point in P as a convex combination of at most n+1 points in P. □


(e) A Challenge: Similar to the proof of Caratheodory’s Theorem presented in class, now consider
\[
\max\; 0^T y \quad \text{s.t.} \quad \bar{A}y = \bar{b},\; y \ge 0
\qquad\text{where}\qquad
\bar{A} = \begin{bmatrix} A\\ c^T \end{bmatrix}
\quad\text{and}\quad
\bar{b} = \begin{bmatrix} b\\ c^T x \end{bmatrix}
\]
which is feasible because x is feasible and hence has a basic feasible solution with at most rank Ā ≤ rank A + 1 = m+1 nonzero components by the Fundamental Theorem.

Solution 5.3 (General Concepts and Theory)

2. True or False: Any LP that has a feasible solution has a basic feasible solution. Briefly justify your answer, or give a counterexample.

False: This statement is false in general, and only true if the equality (!) constraint matrix A ∈ R^{m×n} has rank m and thus permits the choice of m linearly independent column vectors of A as basis vectors. Two counterexamples in the spirit of Exercise 5.2(a) are as follows:

(i) max 0 s.t. x = 1, 2x = 2, x ≥ 0        (ii) max 0 s.t. x1 + x2 = 1, 2x1 + 2x2 = 2, x1, x2 ≥ 0

For (i), it is easy to see that x = 1 is the only feasible solution but not basic according to the definition in 5.2, which would require us to select a 2 × 2 submatrix B of A ∈ R^{2×1} so that B⁻¹b = 1, clearly a mismatch of dimensions (also note that, by definition, we would need to set n − m = 1 − 2 = −1 components of x equal to zero)! Similarly, for (ii), we now have an infinite number of feasible solutions x = (x1, x2)T = (t, 1−t)T for 0 ≤ t ≤ 1, none of which can be represented as x = B⁻¹b, however, because the only 2 × 2 submatrix B of A ∈ R^{2×2} is the same as A but has linearly dependent rows and columns and, thus, is not invertible.

9. Give an example of an LP that has a feasible solution but no basic feasible solution. How does this reconcile with the fundamental theorem of linear programming?

Answer: Similar to the answer of Exercise 5.2(b), the fundamental theorem only holds under a full (row) rank assumption on the equality constraint matrix A which, however, is not satisfied in the two above examples. Nevertheless, this assumption is always guaranteed (and thus dropped in our statement of the fundamental theorem) if the original constraints are formulated as inequalities: after introducing a full set of slacks (which can serve as initial basic variables), or equivalently, after augmenting the original constraint matrix by the m × m identity matrix, this identity can be chosen as an initial (and clearly invertible) m × m submatrix.

3. True or False: Any LP that is feasible and bounded has a basic optimal solution. Briefly justify your answer, or give a counterexample.

False: Consider the same examples as above, for which every feasible solution is also optimal.

10. Give an example of an LP that has an optimal solution but no basic optimal solution. How does this reconcile with the fundamental theorem of linear programming?

Answer: Consider the same examples as above, and repeat our previous explanations for 9.

5. True or False: Any LP that has a basic optimal solution has either a unique optimal basic solution, or infinitely many basic optimal solutions. Briefly justify your answer, or give a counterexample.

False: Consider the following LP with 2 optimal basic solutions x* = (1, 0) and x** = (0, 1):
\[
\max\; x_1 + x_2 \quad \text{s.t.} \quad x_1 + x_2 = 1 \;\text{ and }\; x_1 + x_2 \le 1,\quad x_1, x_2 \ge 0
\]
After introducing a single slack variable w for the second constraint, there are \(\binom{3}{2} = 3\) possible 2 × 2 submatrices of the equality constraint matrix
\[
A = \begin{bmatrix} 1 & 1 & 0\\ 1 & 1 & 1 \end{bmatrix}
\]
corresponding to the index sets {1, 2}, {1, 3}, and {2, 3}. Because columns 1 and 2 are linearly dependent, however, there are only two possible bases B1 = {1, 3} and B2 = {2, 3} with the same basis matrix
\[
B_1 = B_2 = \begin{bmatrix} 1 & 0\\ 1 & 1 \end{bmatrix}
\]
so that x_{B1} = (x1*, x3*)T = x_{B2} = (x2**, x3**)T = B⁻¹b = (1, 0) are the only two optimal basic solutions. In particular, similar to part (iii) of Exercise 5.2(d), which states that every polytope has a finite number of extreme points, we also know that there are only \(\binom{n}{m} = \frac{n!}{m!\,(n-m)!}\) possible m × m submatrices of A ∈ R^{m×n} and, thus, at most \(\binom{n}{m}\) (optimal) basic solutions.

6. True or False: Any LP that has a unique optimal solution has either a unique optimal basis, or infinitely many optimal bases. Briefly justify your answer, or give a counterexample.

False: Essentially following already from our previous discussion, consider another (but as always trivial) LP with a unique feasible (thus optimal) point x = (x1, x2, . . . , xn)T = 0 and exactly n optimal bases (also giving examples for midterm prep questions 16, 18, 19, and 20):
\[
\max\; 0 \quad \text{s.t.} \quad x_1 + x_2 + \dots + x_n = 0,\quad x_1, x_2, \dots, x_n \ge 0
\]
Because A = [1 1 . . . 1] ∈ R^{1×n}, there are exactly n possibilities to choose a 1 × 1 submatrix B = 1 of A as basis, or equivalently, exactly n possible choices for the single basic variable xB = B⁻¹b = 0. Convince yourself that if there is any feasible solution that can be represented by more than one basis, then this solution has to be degenerate with at least one zero basic variable in every basis. Can you derive a general formula for the number of different bases for the same basic solution in terms of n, m, and the number of its zero variables?

Solution 5.4 (Proof of Caratheodory’s Theorem): The proof of Caratheodory’s theorem applies the stated theorem to an LP of the form
\[
\max\; c^T x \quad \text{s.t.} \quad \bar{A}x = \bar{b},\; x \ge 0
\qquad\text{where}\qquad
\bar{A} = \begin{bmatrix} A\\ e^T \end{bmatrix}
\quad\text{and}\quad
\bar{b} = \begin{bmatrix} z\\ 1 \end{bmatrix}
\]
where the columns of A ∈ R^{m×n} correspond to n (essentially arbitrary) vectors in R^m and e ∈ R^n is the vector of all ones, and uses the existence of a feasible solution to conclude that there is a basic feasible solution with at most m+1 nonzero variables. Clearly, this conclusion ignores the full rank assumption on the matrix Ā. However, if rank Ā = m+1, then the proof is correct and we obtain a basic feasible solution with at most m+1 nonzero variables; otherwise, if rank Ā = k ≤ m, then rank A ≤ k as well and there are only k linearly independent column vectors of A. Although this means that there is no basic feasible solution anymore, we can now express and consequently drop all remaining n − k columns or variables as combinations of the former k, yielding a solution with at most k nonzero variables or, in the context of the theorem, a representation of the right-hand side vector z as a convex combination of at most k ≤ m (rather than m+1) other points.

Solution 6.1 (Network Flow Modeling): To convert the given LP into network flow form with a node-arc incidence matrix, equalities, and nonnegativity constraints, we can first multiply the last two constraints by −1 and then introduce the four nonnegative slacks w1, w2, w3, and w4, resulting in the equivalent problem

  maximize    7x1 − 3x2 + 9x3 + 2x4
  subject to   x1 + x2 + w1            =  1
               x3 + x4 + w2            =  1
              −x1 − x3 + w3            = −1
              −x2 − x4 + w4            = −1
               x1, x2, x3, x4, w1, w2, w3, w4 ≥ 0


Next, we need to introduce only one more constraint that includes only the negated slack variables, which can be obtained easily by adding all constraints and multiplying this new constraint by −1:

  maximize    7x1 − 3x2 + 9x3 + 2x4
  subject to   x1 + x2 + w1            =  1
               x3 + x4 + w2            =  1
              −x1 − x3 + w3            = −1
              −x2 − x4 + w4            = −1
              −w1 − w2 − w3 − w4       =  0
               x1, x2, x3, x4, w1, w2, w3, w4 ≥ 0

Clearly, this last constraint is redundant, but as we already know, node-arc incidence matrices always have at least one redundancy. The network associated with the above problem, together with the (easy-to-find) optimal solution (x1*, x2*, x3*, x4*, w1*, w2*, w3*, w4*) = (1, 0, 0, 1, 0, 0, 0, 0), is given below (remember that the above right-hand sides correspond to the negated supplies and/or demands −b).

[Network diagram for the above node-arc formulation: nodes 1–5, where node 5 carries the added redundant zero-supply constraint, with arcs for x1, x2, x3, x4 (costs 7, −3, 9, 2) and for the slacks w1, . . . , w4.]

There clearly is a relationship between the above problem and Problem 15.3 (the desert-island clam-and-love exchange) that you modeled and solved in Exercise 5.1 on Assignment 5, but I have no clue as to what exactly it should be. In particular, since the fifth node really plays no role at all for the above problem (a pure supply node with only outgoing arcs but zero supply), I assume the book’s exercise has a typo and the “correct” formulation should have been different. Can you propose how?

Solution 6.2 (Shortest Path Trees and Reliability Issues in a Telecommunication System): The first possibility is to set up the problem as a linear program and solve it by hand or using AMPL. Clearly, this would have been the longest (and least preferred) option. Alternatively (and better), you could have used either the label-correcting or Dijkstra’s label-setting algorithm, which can be “executed” as shown below, both resulting in the same (and unique) shortest path tree.

  Node \ Step       0         1          2          3          4          5       . . .
    1             0 (-)     0 (-)      0 (-)      0 (-)      0 (-)      0 (-)     . . .
    2               ∞      2.1 (1)    2.1 (1)    2.1 (1)    2.1 (1)    2.1 (1)    . . .
    3               ∞         ∞       9.8 (2)    9.8 (2)    9.8 (2)    9.8 (2)    . . .
    4               ∞      4.8 (1)    4.8 (1)    4.8 (1)    4.8 (1)    4.8 (1)    . . .
    5               ∞      6.3 (1)    5.2 (2)    5.2 (2)    5.2 (2)    5.2 (2)    . . .
    6               ∞         ∞      11.6 (5)   10.5 (5)   10.5 (5)   10.5 (5)    . . .
    7               ∞         ∞      10.3 (4)   10.3 (4)   10.3 (4)   10.3 (4)    . . .
    8               ∞         ∞       7.7 (5)    6.6 (5)    6.6 (5)    6.6 (5)    . . .
    9               ∞         ∞          ∞      14.8 (6)   13.7 (6)   13.7 (6)    . . .

[Figures: step-by-step illustration of the label-setting algorithm on the nine-node network, ending with the (unique) shortest path tree with final labels 0, 2.1, 9.8, 4.8, 5.2, 10.5, 10.3, 6.6, 13.7 at nodes 1–9 and predecessors as in the table above.]

Now let us consider the scenario in which we observe an equipment failure at node 5, resulting in an increase of the transmission times on all arcs entering or leaving node 5 by α. Clearly, this failure only affects the shortest paths that contain node 5, namely the shortest paths from node 1 to nodes 5, 6, 8, and 9. It is also clear that the length of the shortest path to node 5 has to increase by α, and there is nothing we can do about it. For the paths to nodes 6, 8, and 9, whose current lengths 10.5, 6.6, and 13.7 will increase by 2α because we have to both enter and leave node 5, there exist, however, several alternative paths from node 1 avoiding node 5, namely (listing only the shortest ones) 1-2-3-6 of length 15.1, 1-4-8 of length 10.2, and 1-4-8-9 of length 18. Hence, the largest value of α so that the above tree remains a (no longer unique) shortest path tree is determined by

α = min {15.1 − 10.5, 10.2− 6.6, 18− 13.7} /2 = min{4.6, 3.6, 4.3}/2 = 1.8.
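If you prefer to let the computer do the label setting, the algorithm is only a few lines of Matlab. The sketch below works on any arc-length matrix W with W(i,j) = Inf if there is no arc from i to j; the small 4-node matrix shown is only a hypothetical illustration (the full arc list of this exercise is not reproduced here), so replace it with your own data.

% Minimal label-setting (Dijkstra) sketch: shortest paths from node s
W = [Inf 2.1 Inf 4.8;      % hypothetical 4-node example, W(i,j) = length of arc i->j
     Inf Inf 7.7 Inf;
     Inf Inf Inf 1.0;
     Inf Inf 3.0 Inf];
n = size(W,1); s = 1;
d = Inf(n,1); d(s) = 0;    % distance labels
pred = zeros(n,1);         % predecessors (the shortest path tree)
done = false(n,1);         % permanently labeled nodes
for k = 1:n
    dtmp = d; dtmp(done) = Inf;
    [dmin, i] = min(dtmp);            % next node to label permanently
    if isinf(dmin), break; end        % remaining nodes are unreachable
    done(i) = true;
    for j = find(isfinite(W(i,:)))    % scan the outgoing arcs of node i
        if d(i) + W(i,j) < d(j)
            d(j) = d(i) + W(i,j); pred(j) = i;
        end
    end
end
d', pred'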


Solution 6.3 (A Minimum-Cost Network Flow Problem): The following series of networks shows the execution of the primal network simplex method, starting from the highlighted initial feasible spanning tree and using the same notation as was introduced for the example in the lecture notes, namely labels (bi/yi) at the nodes and labels cij/xij or cij/zij on the arcs currently contained or not contained in the spanning tree, respectively.

[Figures: three copies of the network on nodes a–h showing the initial feasible spanning tree and the two subsequent network simplex iterations, with node labels (bi/yi) and arc labels cij/xij or cij/zij as described above.]

In the last network, the current flow is feasible and all reduced costs are nonnegative, so that this flow (or spanning tree) is optimal with a minimum cost of 30 (compared to 35 for the initial feasible flow and 33 for the intermediate second flow). However, like in the regular simplex method, we can find that this flow (or spanning tree) is not unique – why? Can you find the second optimal flow?

Solution 11.1 (Mathematical Modeling and Optimization: A Cutting-Stock Problem)

(a) Solution (Cutting-Stock Model) [cut.mod]

1 param roll_width > 0; # width of raw rolls

2

3 set WIDTHS; # set of widths to be cut

4 param orders {WIDTHS} > 0; # number of each width to be cut

5

6 param nPAT integer >= 0; # number of patterns

7 set PATTERNS := 1..nPAT; # set of patterns

8

9 param nbr {WIDTHS,PATTERNS} integer >= 0;

10


11 check {j in PATTERNS}:

12 sum {i in WIDTHS} i * nbr[i,j] <= roll_width;

13

14 # defn of patterns: nbr[i,j] = number

15 # of rolls of width i in pattern j

16

17 var Cut {PATTERNS} integer >= 0; # rolls cut using each pattern

18

19 minimize Number: # minimize total raw rolls cut

20 sum {j in PATTERNS} Cut[j];

21

22 subject to Fill {i in WIDTHS}:

23 sum {j in PATTERNS} nbr[i,j] * Cut[j] >= orders[i];

24

25 # for each width, total

26 # rolls cut meets total orders

Data (Cutting-Stock Model) [cut.dat]

1 data;

2

3 param roll_width := 110 ;

4

5 param: WIDTHS: orders :=

6 20 48

7 45 35

8 50 24

9 55 10

10 75 8 ;

11

12 param nPAT := 9 ;

13

14 param nbr: 1 2 3 4 5 6 7 8 9 :=

15 20 3 1 0 2 1 3 0 5 0

16 45 0 2 0 0 0 1 0 0 0

17 50 1 0 1 0 0 0 2 0 0

18 55 0 0 1 1 0 0 0 0 2

19 75 0 0 0 0 1 0 0 0 0;

AMPL Output [model cut.mod; data cut.dat; solve; display Cut;]

1 Presolve eliminates 1 constraint.

2 Adjusted problem:

3 6 variables, all integer

4 4 constraints, all linear; 11 nonzeros

5 1 linear objective; 6 nonzeros.

6 MINOS 5.5: ignoring integrality of 6 variables

7 MINOS 5.5: optimal solution found.

8 1 iterations, objective 49.5

9 Cut [*] :=

10 1 7.5

11 2 17.5

12 3 16.5


13 4 0

14 5 8

15 6 0

16 ;

For this solution, note that the MINOS solver is not an integer solver and therefore ignores the declared integrality constraint for the six Cut variables on line 17 in the model file, as reported in the AMPL output on line 6. We will ensure integrality in part (d) of this exercise.

(b) Replacing the Fill constraint on line 23 in the model file with the new constraint

0.9*orders[i] <= sum {j in PATTERNS} nbr[i,j] * Cut[j] <= 1.4*orders[i];

and resolving with the same data as before gives the following AMPL output and new solution

MINOS 5.5: optimal solution found.

2 iterations, objective 44.55

Cut [*] :=

1 7.6

2 15.75

3 14

4 0

5 7.2

6 0 ;

(c) If we set the parameter nPAT to 7 and include the additional pattern that cuts two 50” rolls (and produces 10” waste) in the nbr data on lines 14-19 in the above data file

param nPAT := 7 ;

param nbr: 1 2 3 4 5 6 7 :=

20 3 1 0 2 1 3 0

45 0 2 0 0 0 1 0

50 1 0 1 0 0 0 2

55 0 0 1 1 0 0 0

75 0 0 0 0 1 0 0 ;

then both optimal solutions will be improved as shown in the following two AMPL outputs.

You may have found for yourself that adding the seemingly better pattern that cuts two 55” rolls (and produces no waste) does not improve any solution. Can you (intuitively) explain why?

(d) The following AMPL outputs give the optimal integer solutions found by CPLEX and LPSOLVE for the two models in part (a) (on the left) and (b) (on the right) with the additional cutting pattern introduced in part (c). Note that the solutions found by the two solvers use the same number of rolls but different cutting patterns and, thus, are not unique.

Note that LPSOLVE uses a much larger number of simplex iterations than CPLEX, which makes use of more powerful MIP (mixed integer programming) simplex steps. Also note that LPSOLVE uses an additional integer programming technique called branch-and-bound which is not necessary using the algorithm implemented by CPLEX (at least for this problem). These (and many other) topics concerned with integer programming and more general discrete or combinatorial optimization are discussed in the course MATH 7594 Integer Programming.
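Since the CPLEX and LPSOLVE listings themselves did not survive into this version of the notes, here is a hedged Matlab sketch (using intlinprog from the Optimization Toolbox rather than either of those solvers) that solves the integer version of the part (a) model with the nine patterns from cut.dat; it only illustrates the model, not the specific outputs discussed above.

% Integer cutting-stock model of part (a); data taken from cut.dat
nbr = [3 1 0 2 1 3 0 5 0;      % width 20
       0 2 0 0 0 1 0 0 0;      % width 45
       1 0 1 0 0 0 2 0 0;      % width 50
       0 0 1 1 0 0 0 0 2;      % width 55
       0 0 0 0 1 0 0 0 0];     % width 75
orders = [48 35 24 10 8]';
nPAT = size(nbr,2);
f = ones(nPAT,1);                              % minimize the number of raw rolls cut
% cover the orders: nbr*Cut >= orders, Cut integer and >= 0
Cut = intlinprog(f, 1:nPAT, -nbr, -orders, [], [], zeros(nPAT,1), []);
Cut', sum(Cut)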


Solution 11.2 (The Matrix Inversion Lemma / Sherman-Morrison-Woodbury Formulas): For (a), we can multiply both sides by (E + UDV) from either the right or the left and then simplify the right-hand side until it reduces to the identity. Choosing the former, we get
\[
\left( E^{-1} - E^{-1}U\left(D^{-1} + VE^{-1}U\right)^{-1}VE^{-1} \right)(E + UDV)
\]
\[
= I + E^{-1}UDV - E^{-1}U\left(D^{-1} + VE^{-1}U\right)^{-1}V - E^{-1}U\left(D^{-1} + VE^{-1}U\right)^{-1}VE^{-1}UDV
\]
(keep the identity and factor E⁻¹U from the left and DV from the right of the other terms)
\[
= I + E^{-1}U\left[\, I - \left(D^{-1} + VE^{-1}U\right)^{-1}D^{-1} - \left(D^{-1} + VE^{-1}U\right)^{-1}VE^{-1}U \,\right] DV
\]
(now keep the inner identity and factor (D⁻¹ + VE⁻¹U)⁻¹ from the left of the other terms)
\[
= I + E^{-1}U\left[\, I - \left(D^{-1} + VE^{-1}U\right)^{-1}\left(D^{-1} + VE^{-1}U\right) \,\right] DV
\]
(observe that the inner-most expression reduces to zero, leaving only the identity to the left)
\[
= I + E^{-1}U\,[I - I]\,DV = I + E^{-1}U\,[0]\,DV = I \qquad \square
\]
Then (b) follows directly from (a) by setting D = 1 ∈ R^{1×1}, U = u ∈ R^{m×1}, and V = vT ∈ R^{1×m}. Finally, for (c), it is sufficient to multiply the identity in (b) by u from the right and then rewrite
\[
(E + uv^T)^{-1}u
= E^{-1}u - E^{-1}u \underbrace{\left(1 + v^TE^{-1}u\right)^{-1}}_{\text{a scalar number}} \underbrace{v^TE^{-1}u}_{\text{another scalar}}
= \left( 1 - \frac{v^TE^{-1}u}{1 + v^TE^{-1}u} \right) E^{-1}u
= \underbrace{\left(1 + v^TE^{-1}u\right)^{-1}}_{\alpha} E^{-1}u = \alpha E^{-1}u \qquad \square
\]
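A quick numerical spot check of the identity in (a) — just a sketch with random data; any choice of invertible E and D will do — can be run in Matlab:

% Numerical check of the Sherman-Morrison-Woodbury formula from part (a)
m = 5; k = 2;
E = eye(m) + rand(m);  D = eye(k) + rand(k);   % (almost surely) invertible
U = rand(m,k);  V = rand(k,m);
lhs = inv(E + U*D*V);
rhs = inv(E) - inv(E)*U*inv(inv(D) + V*inv(E)*U)*V*inv(E);
norm(lhs - rhs)      % should be on the order of machine precision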

Solution 11.3 (Interior-Point Algorithms in Matlab)

Program 1 (An Affine-Scaling Algorithm in Matlab) [affscale.m]

1 function x = affscale(c,A,b,varargin)

2 %----------------------------------------------------

3 % function x = affscale(c,A,b,options)

4 %----------------------------------------------------

5 % Solves linear programs of the form

6 % max c^Tx

7 % s.t. Ax = b

8 % x >= 0

9 % using the affine-scaling algorithm

10 %-----------------------------------------------------

11 % Inputs: c, A, b LP problem data as given above

12 % options an optional list of parameters

13 % .maxiter (maximum number of iterations)

14 % .r (step length reduction factor)

15 % .epsilon (tolerance value: convergence)

16 % .delta (tolerance value: feasibility)

17 % .M (tolerance value: boundedness)

18 % .x0 (strictly pos. starting point)

19 % Output: x (approx) opt. primal solution

20 %-----------------------------------------------------

21 % Last updated: AE 2009-11-16 (for UCD Math 5593 LP)

22 %-----------------------------------------------------

23
24 %----------------------------------------

25 % Phase 0: Check data and get parameters

26 %----------------------------------------

27 [m n] = size(A);

28 if length(b) ~= m error(’Dimensions of A and b are not compatible.’); end


29 if length(c) ~= n error(’Dimensions of A and c are not compatible.’); end

30 if nargin == 4 param = varargin{1}; else param = []; end

31 if isfield(param,’maxiter’) maxiter = param.maxiter;

32 else maxiter = 10^5; end

33 if isfield(param,'r') r = param.r;

34 else r = 2/3; end

35 if isfield(param,’epsilon’) epsilon = param.epsilon;

36 else epsilon = 10^-8; end

37 if isfield(param,'delta') delta = param.delta;

38 else delta = 10^-2; end

39 if isfield(param,’M’) M = param.M;

40 else M = 10^6; end

41 if isfield(param,’x0’) && length(param.x0) == n && all(param.x0 > 0) x = param.x0;

42 else x = ones(n,1); end

43 %-------------------------------------

44 % Phase 1: Find initial feasible point

45 %-------------------------------------

46 rho = b - A*x;

47 if any(abs(rho) > delta);

48 fprintf(’Phase 1: ’)

49 A1 = [A rho];

50 c1 = [zeros(n,1); -1];

51 options.x0 = [x; 1];

52 x1 = affscale(c1,A1,b,options);

53 x0 = x1(n+1); x = x1(1:n);

54 if x0 < delta

55 fprintf(’Phase 2: ’)

56 else

57 fprintf(’Original LP is infeasible.\n’)

58 return;

59 end

60 end

61 %-----------------------------------

62 % Phase 2: Find an optimal solution

63 %-----------------------------------

64 for k = 1:maxiter

65 D = diag(x.^2); AD = A*D;

66 dx = (D - AD’*(AD*A’)^(-1)*AD)*c;

67 theta = r*min([x./abs(dx);1]);

68 x = x + theta*dx;

69 if max([c’*dx,norm(dx)]) < epsilon/theta

70 fprintf(’Optimal solution found (%d iterations).\n’,k)

71 fprintf(’Optimal objective value: %f\n’,c’*x)

72 return;

73 elseif max(x) > M

74 fprintf(’Problem is unbounded (%d iterations).\n’,k)

75 return;

76 end

77 end

78 fprintf(’Maximum number of iterations reached (%d iterations).\n’,k)

79 fprintf(’Current objective value: %f\n’,c’*x)

Program 2 (A Primal-Dual Path-Following Algorithm in Matlab) [primdual.m]

1 function [x,y,z] = primdual(c,A,b,varargin)

2 %----------------------------------------------------

3 % function [x,y,z] = primdual(c,A,b,options)

4 %----------------------------------------------------

5 % Solves primal-dual linear programs of the form

6 % (P) max c^Tx (D) min b^Ty


7 %       s.t. Ax = b         s.t. A^Ty - z = c

8 % x >= 0 z >= 0

9 % using the primal-dual path-following algorithm

10 %-----------------------------------------------------

11 % Inputs: c, A, b LP problem data as given above

12 % options an optional list of parameters

13 % .maxiter (maximum number of iterations)

14 % .r (step length reduction factor)

15 % .delta (barrier parameter red factor)

16 % .epsilon (tolerance value: duality gap)

17 % .epsprim (tolerance value: primal feas)

18 % .epsdual (tolerance value: dual feasib)

19 % .M (tolerance value: boundedness)

20 % .x0 (strictly pos. starting point)

21 % .y0 (unrestricted/zero by default)

22 % .z0 (strictly pos. starting point)

23 % Output: x, y, z (approx) optimal prim-dual sol

24 %-----------------------------------------------------

25 % Last updated: AE 2009-11-16 (for UCD Math 5593 LP)

26 %-----------------------------------------------------

27
28 %------------------------------------------------

29 % Initialization: Check data and get parameters

30 %------------------------------------------------

31 [m n] = size(A);

32 if length(b) ~= m error(’Dimensions of A and b are not compatible.’); end

33 if length(c) ~= n error(’Dimensions of A and c are not compatible.’); end

34 if nargin == 4 param = varargin{1}; else param = []; end

35 if isfield(param,’maxiter’) maxiter = param.maxiter;

36 else maxiter = 10^2; end

37 if isfield(param,'r') r = param.r;

38 else r = 2/3; end

39 if isfield(param,’epsilon’) epsilon = param.epsilon;

40 else epsilon = 10^-8; end

41 if isfield(param,’epsprim’) epsprim = param.epsprim;

42 else epsprim = 10^-8; end

43 if isfield(param,’epsdual’) epsdual = param.epsdual;

44 else epsdual = 10^-8; end

45 if isfield(param,'delta') delta = param.delta;

46 else delta = 10^-1; end

47 if isfield(param,’M’) M = param.M;

48 else M = 10^6; end

49 if isfield(param,’x0’) && length(param.x0) == n && all(param.x0 > 0) x = param.x0;

50 else x = ones(n,1); end

51 if isfield(param,’y0’) && length(param.y0) == m y = param.y0;

52 else y = zeros(m,1); end

53 if isfield(param,’z0’) && length(param.z0) == n && all(param.z0 > 0) z = param.z0;

54 else z = ones(n,1); end

55 %-------------------------------------------------

56 % Optimization: Find an optimal feasible solution

57 %-------------------------------------------------

58 for k = 1:maxiter

59 gamma = x’*z;

60 rho = b - A*x;

61 sigma = c - A’*y + z;

62 if max([gamma/epsilon,norm(rho)/epsprim,norm(sigma)/epsdual]) < 1

63 fprintf(’Optimal solution found (%d iterations).\n’,k)

64 fprintf(’Optimal objective value: %f\n’,c’*x)

65 return;

66 elseif max(x) > M


67 fprintf(’Primal problem is unbounded (%d iterations).\n’,k)

68 return;

69 elseif max(z) > M

70 fprintf(’Dual is unbounded and primal is infeasible (%d iterations).\n’,k)

71 return;

72 end

73 mu = delta*gamma/n;

74 dxyz = [A sparse(m,m) sparse(m,n);

75 sparse(n,n) A’ -eye(n,n);

76 diag(z) sparse(n,m) diag(x)] \ ...

77 [rho; sigma; mu*ones(n,1)-diag(x)*z];

78 dx = dxyz(1:n);

79 dy = dxyz(n+1:n+m);

80 dz = dxyz(n+m+1:2*n+m);

81 theta = r*min([x./abs(dx);z./abs(dz);1]);

82 x = x + theta*dx;

83 y = y + theta*dy;

84 z = z + theta*dz;

85 end

86 fprintf(’Maximum number of iterations reached (%d iterations).\n’,k)

87 fprintf(’Current objective value: %f\n’,c’*x)

Test Base (Matlab Input Data for Problem Exercises 2.1-2.11) [testbase.m]

1 display(’---------------------Problem Exercise 2.1---------------------’)

2

3 A = [2 1 1 3 1 0; 1 3 1 2 0 1]; b = [5 3]’; c = [6 8 5 9 0 0]’;

4

5 display(’1) Affine-Scaling Algorithm’); affscale(c,A,b)

6 display(’2) Primal-Dual Algorithm’); primdual(c,A,b)

7 %display(’3) Primal-Dual Simplex’); simplex(c,A,b)

8

9 display(’---------------------Problem Exercise 2.2---------------------’)

10

11 A = [[2 1; 2 3; 4 2; 1 5] eye(4)]; b = [4 3 5 1]’; c = [2 1 0 0 0 0]’;

12

13 display(’1) Affine-Scaling Algorithm’); affscale(c,A,b)

14 display(’2) Primal-Dual Algorithm’); primdual(c,A,b)

15 %display(’3) Primal-Dual Simplex’); simplex(c,A,b)

16

17 display(’---------------------Problem Exercise 2.3---------------------’)

18

19 A = [-1 -1 -1 1 0; 2 -1 1 0 1]; b = [-2 1]’; c = [2 -6 0 0 0]’;

20

21 display(’1) Affine-Scaling Algorithm’); affscale(c,A,b)

22 display(’2) Primal-Dual Algorithm’); primdual(c,A,b)

23 %display(’3) Primal-Dual Simplex’); simplex(c,A,b)

24

25 display(’---------------------Problem Exercise 2.4---------------------’)

26

27 A = [2 -5 1 1 0; 2 -1 2 0 1]; b = [-5 4]’; c = [-1 -3 -1 0 0]’;

28

29 display(’1) Affine-Scaling Algorithm’); affscale(c,A,b)

30 display(’2) Primal-Dual Algorithm’); primdual(c,A,b)

31 %display(’3) Primal-Dual Simplex’); simplex(c,A,b)

32


33 display(’---------------------Problem Exercise 2.5---------------------’)

34

35 A = [-1 -1 1 0 0; -1 1 0 1 0; 1 2 0 0 1]; b = [-3 -1 4]’; c = [1 3 0 0 0]’;

36

37 display(’1) Affine-Scaling Algorithm’); affscale(c,A,b)

38 display(’2) Primal-Dual Algorithm’); primdual(c,A,b)

39 %display(’3) Primal-Dual Simplex’); simplex(c,A,b)

40

41 display(’---------------------Problem Exercise 2.6---------------------’)

42

43 A = [-1 -1 1 0 0; -1 1 0 1 0; 1 2 0 0 1]; b = [-3 -1 2]’; c = [1 3 0 0 0]’;

44

45 display(’1) Affine-Scaling Algorithm’); affscale(c,A,b)

46 display(’2) Primal-Dual Algorithm’); primdual(c,A,b)

47 %display(’3) Primal-Dual Simplex’); simplex(c,A,b)

48

49 display(’---------------------Problem Exercise 2.7---------------------’)

50

51 A = [-1 -1 1 0 0; -1 1 0 1 0; -1 2 0 0 1]; b = [-3 -1 2]’; c = [1 3 0 0 0]’;

52

53 display(’1) Affine-Scaling Algorithm’); affscale(c,A,b)

54 display(’2) Primal-Dual Algorithm’); primdual(c,A,b)

55 %display(’3) Primal-Dual Simplex’); simplex(c,A,b)

56

57 display(’---------------------Problem Exercise 2.8---------------------’)

58

59 A = [[1 -2; 1 -1; 2 -1; 1 0; 2 1; 1 1; 1 2; 0 1] eye(8)];

60 b = [1 2 6 5 16 12 21 10]’; c = [3 2 0 0 0 0 0 0 0 0]’;

61

62 display(’1) Affine-Scaling Algorithm’); affscale(c,A,b)

63 display(’2) Primal-Dual Algorithm’); primdual(c,A,b)

64 %display(’3) Primal-Dual Simplex’); simplex(c,A,b)

65

66 display(’---------------------Problem Exercise 2.9---------------------’)

67

68 A = [[0 2 3; 1 1 2; 1 2 3] eye(3)]; b = [5 4 7]’; c = [2 3 4 0 0 0]’;

69

70 display(’1) Affine-Scaling Algorithm’); affscale(c,A,b)

71 display(’2) Primal-Dual Algorithm’); primdual(c,A,b)

72 %display(’3) Primal-Dual Simplex’); simplex(c,A,b)

73

74 display(’---------------------Problem Exercise 2.10---------------------’)

75

76 A = [1 1 1 1]; b = 1; c = [6 8 5 9]’;

77

78 display(’1) Affine-Scaling Algorithm’); affscale(c,A,b)

79 display(’2) Primal-Dual Algorithm’); primdual(c,A,b)

80 %display(’3) Primal-Dual Simplex’); simplex(c,A,b)

81

82 display(’---------------------Problem Exercise 2.11---------------------’)

83

84 A = [1 1 1 0 0 0 -1 0; -1 0 0 1 1 0 0 0; 0 -1 0 -1 0 1 0 0; ...

85 0 0 1 0 1 1 0 1]; b = [1 0 0 1]’; c = -[1 8 9 2 7 3 0 0]’; %minimize!


86

87 display(’1) Affine-Scaling Algorithm’); affscale(c,A,b)

88 display(’2) Primal-Dual Algorithm’); primdual(c,A,b)

89 %display(’3) Primal-Dual Simplex’); simplex(c,A,b)

Give-Away: You will implement the still missing simplex routine for your final exam.

Matlab Output (compare especially the number of iterations and solution accuracy)

1 ---------------------Problem Exercise 2.1---------------------

2

3 1) Affine-Scaling Algorithm

4

5 Phase 1: Optimal solution found (21073 iterations).

6 Optimal objective value: -0.000071

7 Phase 2: Optimal solution found (24486 iterations).

8 Optimal objective value: 17.001392

9 ans =

10 1.9998 0.0000 1.0004 0.0000 0.0001 0.0000

11

12 2) Primal-Dual Algorithm

13

14 Optimal solution found (27 iterations).

15 Optimal objective value: 17.000000

16 ans =

17 2.0000 0.0000 1.0000 0.0000 0.0000 0.0000

18

19 ---------------------Problem Exercise 2.2---------------------

20

21 1) Affine-Scaling Algorithm

22

23 Phase 1: Optimal solution found (53362 iterations).

24 Optimal objective value: -0.000028

25 Phase 2: Optimal solution found (27250 iterations).

26 Optimal objective value: 2.000227

27 ans =

28 1.0001 0.0000 1.9998 0.9998 0.9996 0.0000

29

30 2) Primal-Dual Algorithm

31

32 Optimal solution found (28 iterations).

33 Optimal objective value: 2.000000

34 ans =

35 1.0000 0.0000 2.0000 1.0000 1.0000 0.0000

36

37 ---------------------Problem Exercise 2.3---------------------

38

39 1) Affine-Scaling Algorithm

40

41 Phase 1: Optimal solution found (14366 iterations).

42 Optimal objective value: -0.000104

43 Phase 2: Optimal solution found (21205 iterations).


44 Optimal objective value: -2.999586

45 ans =

46 0.0001 0.5000 1.5000 0.0000 0.0000

47

48 2) Primal-Dual Algorithm

49

50 Optimal solution found (25 iterations).

51 Optimal objective value: -3.000000

52 ans =

53 0.0000 0.5000 1.5000 0.0000 0.0000

54

55 ---------------------Problem Exercise 2.4---------------------

56

57 1) Affine-Scaling Algorithm

58

59 Phase 1: Optimal solution found (14334 iterations).

60 Optimal objective value: -0.000105

61 Phase 2: Optimal solution found (21204 iterations).

62 Optimal objective value: -2.999961

63 ans =

64 0.0000 1.0000 0.0000 0.0001 4.9998

65

66 2) Primal-Dual Algorithm

67

68 Optimal solution found (26 iterations).

69 Optimal objective value: -3.000000

70 ans =

71 0.0000 1.0000 0.0000 0.0000 5.0000

72

73 ---------------------Problem Exercise 2.5---------------------

74

75 1) Affine-Scaling Algorithm

76

77 Phase 1: Optimal solution found (22653 iterations).

78 Optimal objective value: -0.000066

79 Phase 2: Optimal solution found (30631 iterations).

80 Optimal objective value: 4.999930

81 ans =

82 2.0001 1.0000 0.0001 0.0002 0.0000

83

84 2) Primal-Dual Algorithm

85

86 Optimal solution found (25 iterations).

87 Optimal objective value: 5.000000

88 ans =

89 2.0000 1.0000 0.0000 0.0000 0.0000

90

91 ---------------------Problem Exercise 2.6---------------------

92

93 1) Affine-Scaling Algorithm

94

95 Phase 1: Optimal solution found (39575 iterations).

96 Optimal objective value: -0.250114


97 Original LP is infeasible.

98 ans =

99 2.4998 0.0002 0.0002 1.9998 0.0002

100

101 2) Primal-Dual Algorithm

102

103 Dual is unbounded and primal is infeasible (32 iterations).

104 ans =

105 2.5000 0.0000 0.0000 2.0000 0.0000

106

107 ---------------------Problem Exercise 2.7---------------------

108

109 1) Affine-Scaling Algorithm

110

111 Phase 1: Optimal solution found (20623 iterations).

112 Optimal objective value: -0.000073

113 Phase 2: Problem is unbounded (30 iterations).

114 ans =

115 1.0e+006 *

116 0.7731 0.3865 1.1596 0.3865 0.0000

117

118 2) Primal-Dual Algorithm

119

120 Primal problem is unbounded (30 iterations).

121 ans =

122 1.0e+006 *

123 0.7019 0.3510 1.0529 0.3510 0.0000

124

125 ---------------------Problem Exercise 2.8---------------------

126

127 1) Affine-Scaling Algorithm

128

129 Phase 1: Optimal solution found (40746 iterations).

130 Optimal objective value: -0.000037

131 Phase 2: Optimal solution found (23735 iterations).

132 Optimal objective value: 27.999100

133 ans =

134 3.9999 7.9997 12.9995 5.9998 5.9998 1.0000 0.0001 0.0001 1.0001 2.0000

135

136 2) Primal-Dual Algorithm

137

138 Optimal solution found (32 iterations).

139 Optimal objective value: 28.000000

140 ans =

141 4.0000 8.0000 13.0000 6.0000 6.0000 1.0000 0.0000 0.0000 1.0000 2.0000

142

143 ---------------------Problem Exercise 2.9---------------------

144

145 1) Affine-Scaling Algorithm

146

147 Phase 1: Optimal solution found (15342 iterations).

148 Optimal objective value: -0.000098

149 Phase 2: Optimal solution found (22826 iterations).


150 Optimal objective value: 10.500047

151 ans =

152 1.5001 2.4999 0.0000 0.0001 0.0000 0.5000

153

154 2) Primal-Dual Algorithm

155

156 Optimal solution found (25 iterations).

157 Optimal objective value: 10.500000

158 ans =

159 1.5000 2.5000 0.0000 0.0000 0.0000 0.5000

160

161 ---------------------Problem Exercise 2.10---------------------

162

163 1) Affine-Scaling Algorithm

164

165 Phase 1: Optimal solution found (16449 iterations).

166 Optimal objective value: -0.000091

167 Phase 2: Optimal solution found (21205 iterations).

168 Optimal objective value: 9.002251

169 ans =

170 0.0000 0.0001 0.0000 1.0002

171

172 2) Primal-Dual Algorithm

173

174 Optimal solution found (25 iterations).

175 Optimal objective value: 9.000000

176 ans =

177 0.0000 0.0000 0.0000 1.0000

178

179 ---------------------Problem Exercise 2.11---------------------

180

181 1) Affine-Scaling Algorithm

182

183 Phase 1: Optimal solution found (33512 iterations).

184 Optimal objective value: -0.000134

185 Phase 2: Optimal solution found (16863 iterations).

186 Optimal objective value: -6.001420

187 ans =

188 1.0001 0.0000 0.0000 1.0002 0.0000 1.0001 0.0000 0.0003

189

190 2) Primal-Dual Algorithm

191

192 Optimal solution found (27 iterations).

193 Optimal objective value: -6.000000

194 ans =

195 1.0000 0.0000 0.0000 1.0000 0.0000 1.0000 0.0000 0.0000
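One pattern worth noting in the output above is that the affine-scaling runs typically require tens of thousands of iterations per phase, while the primal-dual algorithm finishes every exercise in between 25 and 32 iterations. In either case, the reported optimal values can be cross-checked with an off-the-shelf solver. The following is a minimal sketch, assuming MATLAB with the Optimization Toolbox (or a compatible linprog) and, for illustration only, a problem of the form max c'*x subject to A*x = b, x >= 0; the data below are placeholders and must be replaced by the actual exercise data, which are not repeated here.

% Cross-check a reported optimal value with linprog (placeholder data only).
A = [1 1 1 0; 2 1 0 1];        % placeholder equality-constraint matrix
b = [4; 6];                    % placeholder right-hand side
c = [1; 2; 0; 0];              % placeholder objective coefficients
n = numel(c);
% linprog minimizes, so negate c to maximize; constraints A*x = b with x >= 0
[x, fval] = linprog(-c, [], [], A, b, zeros(n, 1), []);
fprintf('Optimal objective value: %f\n', -fval);
disp(x')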

Solution 11.4 (The Logarithmic Barrier Problem): Letting $f(x) = c^Tx + \mu\sum_{j=1}^n \log x_j$ and $g(x) = Ax - b$, we can first find the gradient $\nabla f(x)$ by taking the partial derivatives of $f(x) = f(x_1, x_2, \dots, x_n)$ with respect to each variable $x_j$:
\[
\frac{\partial f(x)}{\partial x_j}
= \frac{\partial}{\partial x_j}\left(\sum_{j=1}^n c_jx_j + \mu\sum_{j=1}^n \log x_j\right)
= c_j + \frac{\mu}{x_j},
\]


which can be written in matrix notation as $\nabla f(x) = c + \mu X^{-1}e$, where $X^{-1}$ is the diagonal matrix with $X^{-1}_{jj} = 1/x_j$. Similarly, for each constraint $g_i(x) = g_i(x_1, x_2, \dots, x_n)$ we compute its gradient
\[
\frac{\partial g_i(x)}{\partial x_j}
= \frac{\partial}{\partial x_j}\left(\sum_{j=1}^n a_{ij}x_j - b_i\right)
= a_{ij}
\]
and thus $\nabla g_i(x) = a_i^T$, the $i$th row of $A$. Substituting these two expressions into the KKT stationarity condition $\nabla f(x) = \sum_{i=1}^m y_i\,\nabla g_i(x)$, we obtain
\[
c + \mu X^{-1}e = A^Ty,
\]
which becomes equivalent to $\mu$-complementarity if we substitute the dual slack variable $z = A^Ty - c$:
\[
z = A^Ty - c = \mu X^{-1}e
\quad\Longleftrightarrow\quad
Xz = \mu e. \qquad\square
\]
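To see how $\mu$-complementarity pins down a point on the central path, here is a small worked illustration (this instance is my own and not taken from the text or the exercises): for $A = \begin{pmatrix} 1 & 1 \end{pmatrix}$, $b = 1$, and $c = (1, 2)^T$, the barrier problem is $\max\, x_1 + 2x_2 + \mu(\log x_1 + \log x_2)$ subject to $x_1 + x_2 = 1$, and the conditions above read
\[
\begin{aligned}
c + \mu X^{-1}e = A^Ty &:\quad 1 + \mu/x_1 = y, \qquad 2 + \mu/x_2 = y,\\
z = A^Ty - c = \mu X^{-1}e &:\quad z_1 = y - 1 = \mu/x_1, \qquad z_2 = y - 2 = \mu/x_2,
\end{aligned}
\]
so that $x_1z_1 = x_2z_2 = \mu$. Substituting $x_1 = \mu/(y - 1)$ and $x_2 = \mu/(y - 2)$ into $x_1 + x_2 = 1$ gives
\[
y^2 - (3 + 2\mu)y + (2 + 3\mu) = 0
\quad\Longrightarrow\quad
y(\mu) = \tfrac{1}{2}\left(3 + 2\mu + \sqrt{1 + 4\mu^2}\right),
\]
where we take the root with $y > 2$ so that $z_2 > 0$. As $\mu \to 0$, we find $y \to 2$, $x_2 = \mu/(y - 2) \to 1$, and $x_1 \to 0$, which is indeed the optimal solution of $\max\, x_1 + 2x_2$ over this feasible set.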

You will find a motivation of the logarithmic barrier problem in Chapter 17 (“The Central Path”) of your textbook, which we will study in more detail in Math 7593 Advanced Linear Programming.

Solution 11.5 (The Central Path): First, we formulate the associated dual problem, introduce slack variables, and rewrite
\[
\begin{array}{rlcrl}
\text{(P)}\quad \max & -x_1 + x_2 & \qquad & \text{(D)}\quad \min & y_1 - y_2\\
\text{s.t.} & x_1 - w_1 = 1 & & \text{s.t.} & y_1 + z_1 = 1\\
 & x_2 + w_2 = 1 & & & y_2 - z_2 = 1\\
 & x_1, x_2, w_1, w_2 \ge 0 & & & y_1, y_2, z_1, z_2 \ge 0
\end{array}
\]
(carefully check the details!) and then write down the KKT with the $\mu$-complementarity conditions
\[
x_1 - w_1 = x_2 + w_2 = y_1 + z_1 = y_2 - z_2 = 1,
\qquad
x_1z_1 = x_2z_2 = w_1y_1 = w_2y_2 = \mu,
\]
which we can solve explicitly for the primal and dual central paths $(x_1(\mu), x_2(\mu))$ and $(y_1(\mu), y_2(\mu))$:

\[
\begin{aligned}
x_1 &= 1 + w_1 = 1 + \mu/y_1 = 1 + \mu/(1 - z_1) = 1 + \mu/(1 - \mu/x_1) = 1 + \mu x_1/(x_1 - \mu)\\
&\Longleftrightarrow\; x_1(x_1 - \mu) = (x_1 - \mu) + \mu x_1
\;\Longleftrightarrow\; x_1^2 - (1 + 2\mu)x_1 + \mu = 0
\;\overset{x_1 > 1}{\Longrightarrow}\; x_1 = \left(1 + 2\mu + \sqrt{1 + 4\mu^2}\right)/2,\\
x_2 &= 1 - w_2 = 1 - \mu/y_2 = 1 - \mu/(1 + z_2) = 1 - \mu/(1 + \mu/x_2) = 1 - \mu x_2/(x_2 + \mu)\\
&\Longleftrightarrow\; x_2(x_2 + \mu) = (x_2 + \mu) - \mu x_2
\;\Longleftrightarrow\; x_2^2 - (1 - 2\mu)x_2 - \mu = 0
\;\overset{x_2 > 0}{\Longrightarrow}\; x_2 = \left(1 - 2\mu + \sqrt{1 + 4\mu^2}\right)/2,\\
y_1 &= 1 - z_1 = 1 - \mu/x_1 = 1 - \mu/(1 + w_1) = 1 - \mu/(1 + \mu/y_1) = 1 - \mu y_1/(y_1 + \mu)\\
&\Longleftrightarrow\; y_1(y_1 + \mu) = (y_1 + \mu) - \mu y_1
\;\Longleftrightarrow\; y_1^2 - (1 - 2\mu)y_1 - \mu = 0
\;\overset{y_1 > 0}{\Longrightarrow}\; y_1 = \left(1 - 2\mu + \sqrt{1 + 4\mu^2}\right)/2,\\
y_2 &= 1 + z_2 = 1 + \mu/x_2 = 1 + \mu/(1 - w_2) = 1 + \mu/(1 - \mu/y_2) = 1 + \mu y_2/(y_2 - \mu)\\
&\Longleftrightarrow\; y_2(y_2 - \mu) = (y_2 - \mu) + \mu y_2
\;\Longleftrightarrow\; y_2^2 - (1 + 2\mu)y_2 + \mu = 0
\;\overset{y_2 > 1}{\Longrightarrow}\; y_2 = \left(1 + 2\mu + \sqrt{1 + 4\mu^2}\right)/2
\end{aligned}
\]
(for $x_1$ and $y_2$ both roots of the quadratic are positive, but only the larger root satisfies $x_1 = 1 + w_1 > 1$ and $y_2 = 1 + z_2 > 1$).

The above derivation follows a little faster (not so much easier) by using the author's hint to observe and exploit the symmetry between the primal and the dual, which allows us to directly set $y_1 = x_2$, $y_2 = x_1$, $z_1 = w_2$, and $z_2 = w_1$ and then compute the (primal) central path from the smaller system
\[
x_1 - w_1 = x_2 + w_2 = 1, \qquad x_1w_2 = x_2w_1 = \mu.
\]
In addition, we can also find the primal and dual slack central paths $(w_1(\mu), w_2(\mu)) = (z_2(\mu), z_1(\mu))$
\[
w_1 = z_2 = y_2 - 1 = \left(-1 + 2\mu + \sqrt{1 + 4\mu^2}\right)/2,
\qquad
w_2 = z_1 = 1 - y_1 = \left(1 + 2\mu - \sqrt{1 + 4\mu^2}\right)/2,
\]


and the optimal solution and analytic center by taking the limits for $\mu$ tending to zero and to infinity:
\[
\begin{aligned}
x_1^* = y_2^* &= \lim_{\mu\to 0}\left(1 + 2\mu + \sqrt{1 + 4\mu^2}\right)/2 = 1, &
x_1 = y_2 &= \lim_{\mu\to\infty}\left(1 + 2\mu + \sqrt{1 + 4\mu^2}\right)/2 = \infty,\\
x_2^* = y_1^* &= \lim_{\mu\to 0}\left(1 - 2\mu + \sqrt{1 + 4\mu^2}\right)/2 = 1, &
x_2 = y_1 &= \lim_{\mu\to\infty}\left(1 - 2\mu + \sqrt{1 + 4\mu^2}\right)/2 = 1/2,\\
w_1^* = z_2^* &= \lim_{\mu\to 0}\left(-1 + 2\mu + \sqrt{1 + 4\mu^2}\right)/2 = 0, &
w_1 = z_2 &= \lim_{\mu\to\infty}\left(-1 + 2\mu + \sqrt{1 + 4\mu^2}\right)/2 = \infty,\\
w_2^* = z_1^* &= \lim_{\mu\to 0}\left(1 + 2\mu - \sqrt{1 + 4\mu^2}\right)/2 = 0, &
w_2 = z_1 &= \lim_{\mu\to\infty}\left(1 + 2\mu - \sqrt{1 + 4\mu^2}\right)/2 = 1/2.
\end{aligned}
\]
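As a quick numerical sanity check (a minimal MATLAB/Octave sketch of my own, not part of the original solution), we can evaluate these closed-form central paths for a few values of $\mu$ and confirm both the $\mu$-complementarity products and the limits above:

% Evaluate the closed-form central paths of Solution 11.5 and check complementarity.
mu = [10 1 0.1 0.01];                 % a few barrier parameters
r  = sqrt(1 + 4*mu.^2);               % common square-root term
x1 = (1 + 2*mu + r)/2;  x2 = (1 - 2*mu + r)/2;
y1 = x2;  y2 = x1;                    % dual path by primal-dual symmetry
w1 = x1 - 1;  w2 = 1 - x2;            % primal slacks
z1 = w2;  z2 = w1;                    % dual slacks (again by symmetry)
disp([x1.*z1; x2.*z2; w1.*y1; w2.*y2])   % each row should reproduce mu
disp([x1; x2])                           % approaches the optimal (1, 1) as mu -> 0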

The above equations allow us to plot the primal, dual, and slack central paths as parametrized two-dimensional curves with positive parameter $\mu > 0$. Alternatively, we can continue our college-algebra workout to remove the explicit dependence on the parameter $\mu$ by writing, say, $x_2 = x_2(x_1)$ as a function of $x_1$, and similarly $y_2 = y_2(y_1) = x_1(x_2)$, $w_2 = w_2(w_1) = z_1(z_2)$, and $z_2 = z_2(z_1) = w_1(w_2)$. This is where the primal-dual symmetry comes in really handy:
\[
\begin{aligned}
& x_1 - w_1 = x_2 + w_2 = 1, \quad x_1w_2 = x_2w_1 = \mu > 0\\
&\quad\Longrightarrow\; x_1x_2 - \mu = x_2 \;\text{ and }\; x_1x_2 + \mu = x_1
\;\Longrightarrow\; 2x_1x_2 - x_1 - x_2 = 0\\
&\quad\Longrightarrow\; x_1 = \frac{x_2}{2x_2 - 1} \;\text{ and }\; x_2 = \frac{x_1}{2x_1 - 1}
\quad\text{for } x_1 \ge 1,\; 1/2 < x_2 \le 1,\\
&\text{or}\quad \mu - w_1w_2 = w_2 \;\text{ and }\; \mu + w_1w_2 = w_1
\;\Longrightarrow\; 2w_1w_2 - w_1 + w_2 = 0\\
&\quad\Longrightarrow\; w_1 = \frac{w_2}{1 - 2w_2} \;\text{ and }\; w_2 = \frac{w_1}{1 + 2w_1}
\quad\text{for } w_1 \ge 0,\; 0 \le w_2 < 1/2,
\end{aligned}
\]
where the bounds follow from feasibility and nonnegativity, and analogously (or again by symmetry)
\[
\begin{aligned}
y_1 &= \frac{y_2}{2y_2 - 1} \;\text{ and }\; y_2 = \frac{y_1}{2y_1 - 1}
&&\text{for } y_2 \ge 1 \text{ and } 1/2 < y_1 \le 1,\\
z_1 &= \frac{z_2}{1 + 2z_2} \;\text{ and }\; z_2 = \frac{z_1}{1 - 2z_1}
&&\text{for } z_2 \ge 0 \text{ and } 0 \le z_1 < 1/2.
\end{aligned}
\]

Finally, the plots below show the primal and dual central paths for the original and the slack variables.

[Figure: three plots of the central paths as curves parametrized by $\mu > 0$: the primal path in the $(x_1, x_2)$-plane (with $x_1 \ge 1$ and $x_2 \le 1$), the dual path in the $(y_2, y_1)$-plane (with $y_2 \ge 1$ and $y_1 \le 1$), and the slack path in the $(w_1/z_2, w_2/z_1)$-plane; all panels use the axis ranges $[0, 4] \times [0, 2]$.]
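To reproduce plots like these, here is a minimal MATLAB/Octave sketch of my own (not the code used to produce the original figure) that traces the paths from the $\mu$-free relations derived above; by the primal-dual symmetry, the dual panel shows the same curve as the primal one with relabeled axes.

% Plot the central paths from the mu-free relations x2 = x1/(2*x1 - 1) and
% w2 = w1/(1 + 2*w1); the dual and dual-slack paths coincide by symmetry.
x1 = linspace(1, 4, 200);   x2 = x1 ./ (2*x1 - 1);   % primal path, x1 >= 1
w1 = linspace(0, 4, 200);   w2 = w1 ./ (1 + 2*w1);   % slack path,  w1 >= 0
subplot(1,3,1); plot(x1, x2); axis([0 4 0 2]); xlabel('x_1'); ylabel('x_2');
subplot(1,3,2); plot(x1, x2); axis([0 4 0 2]); xlabel('y_2'); ylabel('y_1');
subplot(1,3,3); plot(w1, w2); axis([0 4 0 2]); xlabel('w_1 / z_2'); ylabel('w_2 / z_1');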