the impact of emerging technologies on computer science and operations research

THE 1M PAC T 0 F

E MER G IN GTE C H N 0 LOG IE S

ON COMPUTER SCIENCE

AND 0 PER AT 10 N S RES EAR C H

OPERATIONS RESEARCH/COMPUTER SCmNCE INTERFACES

SERIES

Modeling Hardware/Software Formulation Databases

Algorithmics Craphics Analysis Techniques \m~~firoqij~J AI/Neural Nets

Telecommunications

Ramesh Sharda, Series Editor Conoco/DuPont Chair of Management of Technology

Oklahoma State University Stillwater, Oklahoma U.S.A.

Other published titles in the series:

Greenberg, Harvey J. University of Colorado @ Denver A Computer-Assisted Analysis System for Mathematical Programming Models and Solutions: A User's Guide for ANALYZE©

Greenberg, Harvey 1. University of Colorado @ Denver

Modeling by Object-Driven Linear Elemental Relations: A User's Guide for MODLER©

Brown, Donald/Scherer, William T. University of Virginia

Intelligent Scheduling Systems

THE IMPACT OF

E M E R G IN G T E C H N O LOG lE S

O N C O M PUT E R S C lE N C E

ANO OPERATIONS RESEARCH

EDITED BY

Stephen G. N ash and Ariela Sofer

George Mason University

Fairfax, Virginia, USA

Associate Editors:

William R. Stewart

Edward A. Wasil

SPRINGER SCIENCE+BUSINESS MEDIA, LLC

ISBN 978-1-4613-5934-0 ISBN 978-1-4615-2223-2 (eBook)

DOI 10.1007/978-1-4615-2223-2

Library of Congress Cataloging-in-Publication Data

A C.I.P. Catalogue record for this book is available from the Library of Congress.

Copyright @ 1995 by Springer Science+Business Media New York Originally published by Kluwer Academic Publishers in 1995 Softcover reprint ofthe hardcover Ist edition 1995

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, mechanical, photo-copying, recording, or otherwise, without the prior written permis sion of the publisher, Springer Science+Business Media, LLC

Printed on acid-free paper.

CONTENTS

PREFACE xv

1 AN UPPER BOUND SUITABLE FOR PARALLEL VECTOR PROCESSING FOR THE OBJECTIVE FUNCTION IN A CLASS OF STOCHASTIC OPTIMIZATION PROBLEMS K.A. Ariyawan,a 1

1 Introduction 2 2 Design of the Upper Bound 6

3 Concluding Remarks 22 REFERENCES 24

2 ON EMBEDDED LANGUAGES, META-LEVEL REASONING, AND COMPUTER-AIDED MODELING Hemant K. Bharga11a and Ste11en O. Kimbrou.gh 27 1 Introduction 27 2 Meta-Level Reasoning 28 3 Reasoning: Inference and Decoding 31

4 Embedded Languages 33 5 Computer-Aided Modeling 37 6 Discussion and Examples 39 REFERENCES 42

3 MAPPING TASKS TO PROCESSORS TO MINIMIZE COMMUNICATION TIME IN A

v

VI THE IMPACT OF EMERGING TECHNOLOGIES

MULTIPROCESSOR SYSTEM Jaishankar Chakrapani and Jadranka Skorin-Kapo'U 45 1 Introduction 46 2 Tabu Search for the Mapping Problem 48 3 Robust Parallel Tabu Search Algorithm 53 4 Computational Results 56 5 Conclusions 59 REFERENCES 60

4 REFINEMENTS TO THE SO-CALLED SIMPLE APPROXIMATIONS FOR THE BULK-ARRIVAL QUEUES: MX/G/l Mohan L. Chaudhry 65 1 In trod uction 65 2 The Model 68

3 Queueing-Time Distributions 68 4 The Tails of the Queueing-Time Distributions 73 5 Special Cases 75 6 Numerical Results 77 7 Conclusions 85 APPENDIX A 85 APPENDIX B 86 REFERENCES 87

5 A NEARLY ASYNCHRONOUS PARALLEL LP-BASED ALGORITHM FOR THE CONVEX HULL PROBLEM IN MULTIDIMENSIONAL SPACE J.H. DuM., R. V. Helgason, and N. Venugopal 89 1 Introduction 90 2 Previous LP-based Approaches. 90 3 Theoretical Aspects of LP-based Approaches. 91 4 A General Approach 93 5 The New LP-based Approach 95 6 Parallel Formulation 97 7 Test Problem Generation 98

Contents Vll

8 Computational Results 99 9 Concluding Remarks 100 REFERENCES 101

6 A DYNAMICALLY GENERATED RAPID RESPONSE CAPACITY PLANNING MODEL FOR SEMICONDUCTOR FABRICATION FACILITIES Kenneth Fordyce, Gerald Sullivan 103 1 Introduction 104 2 A Brief Review Of Producing Micro-Electronic Chips 105 3 The Required Fact Bases 106 4 Steady State Capacity Analysis Model 112 5 Summary 117

APPENDIX A Decision Tiers 118

APPENDIX B Overview of ROSE 119 APPENDIX C First Small Model 120 APPENDIX D Code To Calculate Reachability Matrix 121 APPENDIX E Handling Variations Between Tools 122 APPENDIX F Transient Solver 123 REFERENCES 125

7 QUEUEING ANALYSIS IN TK SOLVER (QTK) Donald Groll and Carl M. Ha"l'1'i& 129 1 Introduction 129 2 TK and QTK 131 3 Selecting and Working with a QTK Model 135 4 Modifying Existing Models 151 REFERENCES 154

8 ON-LINE ALGORITHMS FOR A SINGLE MACHINE SCHEDULING PROBLEM Weizhen Mao, Rez K. Kincaid, and Adam Rifkin 157 1 Introduction 157 2 A single machine scheduling problem 159 3 Analysis of FCFS and SAJF 161

Vlll THE IMPACT OF EMERGING TECHNOLOGIES

4 A general lower bound 165 5 Computational results 166 6 Conclusions 170 REFERENCES 171

9 MODELING EXPERIENCE USING MULTIVARIATE STATISTICS Jerrold H. May and Luil G. Varga, 175 1 Introduction 175 2 Expectations 177 3 Constructing Expectations 178 4 An Example 183 5 Conclusions 191 REFERENCES 193

10 OPTIMAL SPARE PARTS ALLOCATION AND INDUSTRIAL APPLICATIONS Wolfgang Mergenthaler, Sigbert Felgenhauer, Peter Hardie, Mar/cu,

Gro1r., and J06et Lugger 195 1 Introduction 196 2 Model 197 3 SPARE-an implementation 205 4 Industrial Applications 210 REFERENCES 217

11 A C++ CLASS LIBRARY FOR MATHEMATICAL PROGRAMMING Soren S. Nie"en 221 1 Introduction 221 2 A Small Example Model 223 3 Structure and Use of the Class Library 225 4 Algebraic Notation and Sparse Arrays 233 5 Variable Aliasing 238 6 Extensions 240 7 Conclusion 241 REFERENCES 242

Contents ix

12 INTEGRATING OPERATIONS RESEARCH AND NEURAL NETWORKS FOR VEHICLE ROUTING Jean- Yves Potvin and Christian Robillard 245

1 In trod uc tion 245 2 A parallel insertion heuristic 246 3 The Initialization Phase 249

4 Computational Results 253 5 Concluding Remarks 256 REFERENCES 261

13 USING ARTIFICIAL INTELLIGENCE TO ENHANCE MODEL ANALYSIS Ramesh Sharda and David M. Steiger 263

1 Introduction 263 2 Current Analysis Tools 264 3 Insight System Description 265 4 INSIGHT-A Sample Session 269 5 A Sample Problem 270 6 Results 276 7 Research Directions 276 REFERENCES 277

14 SOLVING QUADRATIC ASSIGNMENT PROBLEMS USING THE REVERSE ELIMINATION METHOD Stefan Vop 281 1 Introduction 281 2 Reverse Elimination Method 283 3 Intensification and Diversification-A Clustering Approach 287 4 Computational Results 290 5 Conclusions 292 APPENDIX A Best Found Solutions 293 REFERENCES 294

x THE IMPACT OF EMERGING TECHNOLOGIES

15 NEURAL NETWORKS FOR HEURISTIC SELECTION: AN APPLICATION IN RESOURCE-CONSTRAINED PROJECT SCHEDULING Dan Zh.u. and Rema Padman 297

1 Introduction 298 2 Description of Problem and Data 299

3 Data Preprocessing and Representation 304 4 Experimental Design and Results 306

5 Conclusion 310

REFERENCES 310

K.A. Ariyawansa Department of Pure and

Applied Mathematics Washington State University Pullman, WA 99164-3113

Hemant K. Bhargava Naval Postgraduate School Code AS/BH Monterey, CA 93940

Jaishankar Chabapani Environmental Systems Research

Institute, Inc. 380 New York Street Redlands, CA 92373-8100

Mohan L. Chaudhry Department of Mathematics and

Computer Science Royal Military College of Canada Kingston, Ontario K7K 5LO Canada

J.B. Dul' Southern Methodist University Dallas, TX 75275

Sigbert Felgenhauer AEG Aktiengesellscha.ft Goldsteinstrafle 238 60528 Frankfurt Germany

xi

CONTRIBUTORS

Kenneth Fordyce International Business Machines, Inc. Mail Station 922, Kingston, NY 12401

Markus Groh Beratende Ingenieure Fra.nkiurt Kiefernweg 1 65439 Florsheim Germany

Donald Gross Depa.rtment of Opera.tions Research The George Washington University Washington, DC 20052

Peter Hardie Airbus Industrie-Airspares Weg beim Jiger 150 P.O.Box 630107, 22335 Hamburg Germany

Carl M. Barris Department of Operations Research

and Engineering George Mason University Fairfax, Virginia 22030

R. V. Belgason Southern Methodist University Dallas, TX 75275

Xl1

Steven O. Kimbrough University of PennsylVll.nia The Wharton School Suite 1300, Steinberg Hall-Dietrich

Hall Philadelphia, PA 19104-6366

Rex K. Kincaid Department of Mathematics College of William and Mary Williamsburg, VA 23187-8795

Josef Lugger Beratende Ingenieure Frankfurt Kiefern weg 1 65439 Florsheim Germany

Weizhen Mao Department of Computer Science College of William and Mary Williamsburg, VA 23187-8795

Jerrold H. May AIM Laboratory Joseph M. Katz Graduate

School of Business University of Pittsburgh Pittsburgh, PA 15260

Wolfgang Mergenthaler Beratende Ingenieure Frankfurt Kiefern weg 1 65439 Florsheim Germany

Soren S. Nielaen Management Science and

Information Systems University o{Texas Austin, TX 78712

CONTRIBUTORS

Rema Padman The Heinz School of Public Policy

and Management Carnegie Mellon University Pittsburgh, PA 15213

Jean-Yves Potvin Departement d'Informatique et

de Recherche Operationnelle Universite de Montreal C.P. 6128, Succ. Centre-Ville Montreal (Quebec) Canada H3C 3J7

Adam Rifkin Department of Computer Science California Institute of Technology Pasadena, CA 91125

Christian Robillard Departement d'Informatique et

de Recherche Operationnelle Universite de Montreal C.P. 6128, Succ. Centre-Ville Montreal (Quebec) Canada H3C 3J7

Ramesh Sharda College of Business Administration Oklahoma State University Stillwater, Oklahoma 74074

Jadranka Skorin-Kapov Harriman School for

Management and Policy State University of New York

at Stony Brook Stony Brook, NY 11794

David M. Steiger School of Business University of North Carolina, Greensboro Greensboro, North Carolina

Contributors

Gerald Sullivan International Business Machines, Inc. IBM Consulting Management

Technologies Group Burlington, VT 05401

Luis G. Vargas AIM Laboratory Joseph M. Katz Graduate

School of Business University of Pittsburgh Pittsburgh, PA 15260

N. Venugopal Southern Methodist University Dallas, TX 75275

Stefan VofJ Technische Hochschule Darmstadt, FB 1 / FG Operations Research, HochschulstraBe 1, D-64289 Darmstadt, Germany

Dan Zhu The Heinz School of Public Policy

and Management Carnegie Mellon University Pittsburgh, PA 15213

XlII

PREFACE

The emergence of high-performance computers and sophisticated software technology has led to significant advances in the development and application of operations research. In turn, the growing complexity of operations research models has posed an increasing challenge to computational methodology and computer technology. This volume focuses on recent advances in the fields of Computer Science and Operations Research, on the impact of technological innovation on these disciplines, and on the close interaction between them. The papers cover many relevant topics: computational probability; design and analysis of algorithms; graphics; heuristic search and learning; knowledge-based systems; large-scale optimization; logic modeling and computation; modeling languages; parallel computation; simulation; and telecommunications.

This volume developed out of a conference1 held in Williamsburg, Virginia, January 5-7, 1994. It was sponsored by the Computer Science Technical Section of the Operations Research Society of America. The conference was attended by over 120 people from across the United States, and from many other countries.

We would like to take this opportunity to thank the participants of the conference, the authors, the anonymous referees, and the publisher for helping produce this volume. We express our special thanks to Bill Stewart and Ed Wasil for serving as Area Editors.

Stephen G. Nash and Ariela Sofer

1 "Computer Science and Operationa Research: The Impact of Emerging Technology"

xv

1 AN UPPER BOUND SUITABLE FOR

PARALLEL VECTOR PROCESSING

FOR THE OBJECTIVE FUNCTION

IN A CLASS OF STOCHASTIC

OPTIMIZATION PROBLEMS

ABSTRACT

K.A. Ariyawansa

Department of Pure and Applied Mathematics Washington State University

Pullman, WA 99164-3113

We consider the two-stage stochastic programming problem with recourse, and with a discretely distributed random variable with a finite number of realizations. When the number of realizations is large, the solution of these problems is difficult because the computation of values and subgradients of the expected recourse function is difficult. In this paper, we describe an algorithm that designs an upper bound to the expected recourse function. The computation of the values and subgradients of this upper bound is much faster than the computation of those of the expected recourse function, and is well-suited for parallel vector processors.

1

2 CHAPTER 1

1 INTRODUCTION

The two-stage stochastic program with recourse, and with a discretely distributed random variable with a finite number of realizations, is the following:

Find ZO ERn, such that when z := z· z(z) := eT z + Q(z) is minimized, and Az = b, z ~ 0, where

K

Q(z) := E[Q(z, h, T)] = L: IQ(z, hi, Tie), 1e=1

Q(z,h, T):= inf {qTy: My = h - Tz,y > O}, pER'" -

A E Ron, xn" bERon" e ERn" q ERn., M E R m• xn. are deterministic and given, and h E R m., T E R m• xn, are random with (h, T) having the given probability distribution F := {((hie, Tie), I), Ie = 1,2, ... , K}.

(1)

Problem (1) arises in operations research problem areas including industrial management, scheduling, and transportation; in control theory; and in economics. The monographs [8,9] for example, contain details of specific applications.

In meaningful applications the number of realizations K of the probability distribution F is 'large' (say K R:: 10000). Any algorithm for the solution of (1) would at least require values z(z) ofthe objective function z for many values of the argument z. Since z(z):= eTz + Q(z), this implies that Q(z) needs to be computed for many z during the execution of an algorithm for (1). Note that the evaluation of Q(z) for a single z involves the solution of K linear programs (obtained by setting (h, T) := (hie, Tie), Ie = 1,2, ... , K in the definition of the function Q). Thus for large K even the evaluation of z for a single value of z is expensive. Therefore, algorithms for the solution of (1) take prohibitively large amounts of computation.

The function Q in (1) is referred to as the reco'Ur6e fu.nction, and consequently its expectation Q with respect to the probability distribution F is referred to as the ezpectetl reeOu.r6e fu.nction. In general, 12 is non6mooth, and hence information on its 'slopes' at z are contained in a set termed the 6u.bgradient denoted by 8Q(z) [11]. Actually, many algorithms for (1) require information on the 'slopes' of 12 in addition to its value at many z [12,2,1]. Thus, stochastic programs are computationally difficult because the evaluation of the values and the subgradients of the expected recourse function is computationally difficult.

Function Bounds for Stochastic Optimization 3

The recourse function Q possesses a lot of structure. In order to be specific, let us make the following assumptions on problem (1).

(A1) The set {z : Az = b, z ~ O} is nonempty and bounded.

(A2) The set {w: My = w,y ~ O} = 1Rm ,.

(A3) The set {v : MT v ::; q} is nonempty.

It can be verified that when (A1), (A2) and (A3) are satisfied, (1) has a finite minimum. Therefore, issues of unboundedness and infeasibility of (1) do not arise. An important consequence of (A3) is that we can always write down a problem equivalent to (1), with form and data exactly as those in (1), except that q is replaced by ql where ql ~ 0 [4, Appendix]. Therefore, without loss of generality, we shall assume that

(A4) q ~ 0

in the rest of the paper. In order to expose the structure present in Q let us now define 1/J : 1Rm• -+ lR by

1/J(w) min {qTy: My = w,y ~ O} YEIR~'

min{a: (:) E pos (~)}. (2)

In (2) and in the rest of the paper, given A E 1Rmxn we define the set pos (A) ~ 1Rm by pos (A) := {v : Au = v, u E 1Rn , u ~ O}. The second expression for 1/J in (2) would then easily follow. We can now write down the following expressions for the value Q(z) and the subgradient oQ(z) of Qat z in terms of 1/J.

K

Q(z) = LPlc1/J(h" - T"z) (3) 1c=1

K

oQ(z) = - Ll(T")Ta1/J(hk - T"z) (4) "=1

It can be shown that o1/J(hk - Tkz) is given by the maximizers of the dual of the linear program on the right-hand-side of (3):

o1/J(h/r. - T"z) = argmax {[h" - T"zf v: MT v::; q}, k = 1,2, ... , K. (5) lIEIR""

4 CHAPTER 1

Note that we can solve the K linear programs on the right-hand-side of (3) (defined through (2)) in parallel to obtain the minima .,p(hle - Tlez) and dual maximizers 8.,p(hle - Tle z ), Ie = l,2, ... ,K, needed in computing Q(z) and 8Q(z). We illustrate such a computation within the context of an algorithm for (1) schematically in Figure 1. In fact, in [1,2], performance results on a Sequent/Balance and on an Alliant FX/8 are presented for an algorithm for (1) when the computation of the values and subgradients of 12 is done in parallel. The results presented in [1,2] indicate that the execution time for the computation of the values and subgradients of 12 dominate the overall execution time. They also indicate that even with parallel processing, problems with large K take prohibitively large amounts of execution times.

As (3-4) indicate, the computation of values and subgradients of 12 within an algorithm for (1) essentially amounts to computing values and subgradients of .,p for a large number of values for its argument. It is therefore appropriate to consider approximating .,p so that values and subgradients of this approximant can be computed faster than those of.,p. In this paper, we present an upper bound .,p1) to .,p based on a collection 'D of nonsingular matrices in 1Rm • xm ••

We wish to make three general remarks about the upper bound .,p1). First, the computation of values and subgradients of .,p1) is in general easier than the computation of those of.,p. Moreover, these computations are well-suited for parallel vector processors even more so than those of "".

Second, the aim is to use 1/J'D in place of.,p in (3-4) to obtain the upper bound 121) on 12, and to solve the approximant to problem (1) that results when 12 in (1) is replaced by 121). In the case of practical models that yield (1) all what is needed is a good approximate solution. Indeed, since algorithms for (1) take large amounts of computation, models are currently solved approximately using heuristics [10].

Third, a substantial amount of work is necessary to design the collection 'D and the upper bound .,p1) so that .,p1) shares the important properties of.,p. We believe that the speed with which we can solve the resulting approximant to (1) would more than outweigh the effort needed to produce ""1), and also the inexactness of the solution obtained.

The upper bound we describe here is related to that described in [4]. Our purpose here is to provide an informal description of the algorithm that designs the upper bound, avoiding technical details that hinder the interpretation of the operations of the algorithm. The reports [3,4] are being revised to contain all the necessary technical details and computational results so that they would form more technical companions of the present paper.


OUTER

ALGORITHM

X

J2(-x) ? o~('X) ?

7J:;T , , , , I I , I I I I I , , I

(A~TA) I I I I I I I I I , I

I I I I

a~TK) K

~('X)=E ~A~aA_ ~:I

SOLVE m Wrr;H (j);: J,.~ T'X -+ 'f(/..~T!xJ) v-p{/,XX) I

, I I

SOLVE (2) W'~H W:= A~ T~ -+ J6 (-A.h~), U,v(t!Tt)

Figure 1 Exact Computation of Q(",) and 8Q(",) in the Context of an Algorithm for (1)

6 CHAPTER 1

In the following section we consider properties of 1/1 and discuss how we propose to design the collection 'D and the upper bound 1/11). In the concluding section we comment on some important properties of the upper bound 1/11) and the upper bound (21) it induces, and on some related work in [5,6,13].

2 DESIGN OF THE UPPER BOUND

We begin by listing some properties of the function 1/1 that result when (A2), (A3) and (A4) hold. Properties (i), (ii) and (iii) below are easy to establish (see for example [14]), and properties (iv) and (v) are established in [3].

(i) 0 ~ 1/I(w) < 00, and 1/1(0) = o.

(ii) 1/1 is convex.

(iii) 1/1 is positively homogeneous, i.e. 1/I(AW) = A1/I(W) for all A > o.

(iv) The epigraph epi 1/1 := {[a, wT]T : a ~ 1/1 ( w), w E m.m2} of 1/1 is a convex polyhedral cone with the characterization

(6)

where qT P = [OT, (ql)T] (with 0 E m."o, ql E It"', ql > 0, nO + n 1 = n2) is the permutation of qT so that its first nO components are zeroes and the next n1 components are positive, and MP = [MO, Ml] is the corresponding permutation of M.

(v) The level set lev .. 1/I:= {w: 1/I(w) ~ T,W E Itm2} of 1/1 at T (0 ~ T < 00) has the characterization

(7)

In (7) and in the rest of the paper, given A E Itmx" , co (A) := {v : Au = v, u E It", eT u = I} C m.m is the convex hull of the columns of A, and given u E m.", diag(x) is the diagonal matrix whose ii-th diagonal element is zo, i = 1,2, ... , n.

In Figure 2 we illustrate the epigraph and the level set of a typical 1/1 with m2 := 2, n2 := 6 and q > O. Note that these follow from the characterizations

Function Bounds for Stochastic Optimization

w.

Figure 2 Function ¥. Epigraph cpi ¥. and Level Set levI ¥ (ml := 2. nl := 6. q > 0 in this Example

1

8 CHAPTER 1

(6-7). Note also that the level sets levTtP have the same shape for all T > 0 and therefore we have chosen the arbitrary value T := 1 in Figure 2.

Suppose now that we are given a nonsingular matrix D E JR.m.xm.. It is possible to construct a function tPD : JR.m. -+ R which agrees with tP along the directions given by the columns of D and their negatives as follows. Define r, 6 E JR.m. by

m.

r:= L tP(D.j)ej, j=l

and then tPD : JR.m. -+ JR. by

m.

S := L tP( -D.j )ej j=l

(8)

tPD(W) := min{a: (:) E pos U}, U = (~ ~:). (9)

Then it can be shown that tPD(±)..D.i) = tP(±)..D.i) for all ).. ~ 0 and that 1jJ(w) ~ tPD(W) for all W E JR.m •• In Figure 3, we indicate the epigraphs and the level sets of such a tPD and the tP in Figure 2. In Figure 3 note that epitPD c epitP and lev1tPD c lev1tP, which illustrate the fact that tPD is an upper bound on tP.

An important consequence of the way we define tPD (through (8-9» is that its values and subgradients, unlike those of tP, could be computed easily. In order to indicate how this may be done, we need to define some notations. Given z E JR.", we define z+, z_ and z. all in JR." by (z+), := z, if z, ~ 0 and (z+), := 0 otherwise, (z_), := Zi if Zi < 0 and (Z-)i := 0 otherwise, and (Z .. )i := 1 if Zi = 0 and (z .. ), := 0 otherwise, for i = 1,2, ... , n. Given A E JR.mx .. we define A+, A_ and A .. by (A+).j := (A.j)+, (A_).j := (A.j)_ and (A .. ).j:= (A.i)" for j = 1,2,,,.,n respectively. For)" E JR., sgn(>.):= 1 if ).. ~ 0, and sgn()..) := -1 otherwise. For z E JR." we define sgn(z) E:IN" by (sgn(z)), := sgn(z,) for i = 1,2, ... , n.

We now give expressions for the value and the subgradient of tPD at W E JR.m •. Let t E JR.m. be the solution to the system Dt = W and let Y, Z E JR.m.xm. be given by

Y = diag(sgn(t)), Z = diag(t.).

Then if assumptions (A2) and (A4) are satisfied, the value tPD(W) and the subgradient 8tPD(W) of tPD at W are given respectively by

= rTt+ - sTL and (10)

{u: DT u = v, v = (Y+ - Z)r + Y_s + Zh, hE [-s, r]). (11)


epi-PD epip

• QJ

, , .. \ .. ,

, , . , . 1 \

lev, YD

Figure a Functions.p, .pD, Epigraphs epi.p, epi.pD, and level sets levI.p, levI.pD

9

10 CHAPTER 1

Note that the most expensive part ofthe computations in (10-11) is the solution of the two systems Dt = w and DT u = v for t and u respectively. Computing 1/ID(W) and 81/ID(W) therefore is considerably cheaper than computing 1/I(w) and 81/1(w) by solving the linear program on the right-hand-side of (2).

1/ID therefore is an attractive upper bound. Note however, that although it agrees with 1/1 along 27n2 directions given by the columns of D and their negatives, it may provide a poor approximation for 1/1 along other directions. See for example direction Wi in Figure 3 along which levl1/lD and levl1/l do not match. In order to improve the approximation, it is possible to define L functions 1/ID1,1/ID" ... ,1/ID£, relative to a collection V of nonsingular matrices in IRm • xm., V := {DI, D2, ... , DL} and then define 1/1» : IRm• --+ R by

(12)

In (12), 1/1 D' is defined by (8-9) with D := Di , i = 1,2, ... , L. In Figure 4 we illustrate the epigraphs and level sets pertinent to a collection V := {DI, D2}.

Of course using 1/1» in (12) improves the approximation. Note however, that 1/1» in (12) for an arbitrary collection V is not convez. The level set levl1/l» in Figure 4 for example is not convex. This means that when 1/1» is used in (3-4) in place of 1/1 the resulting upper bound Q%) on Q is not convex. Consequently, the resulting approximation to problem (1) becomes nonconvex. Nonconvex optimization problems are much more difficult to solve than convex ones, and this fact far outweighs the improved accuracy provided by (12).

We now describe an algorithm that would generate a special collection V so that 1/1%) in (12) defined with respect to that collection is convex. As stated in the introductory section, we avoid proofs in our description here. We shall however, try to justify our description by appealing to intuition. In §3, we shall indicate some additional results concerning this collection and the upper bound it defines.

We assume that as part of the input to our algorithm we have a set of nonzero vectors u· E IRm., k = 1,2, ... , J along whose directions we desire 1/1» to be exact. We propose to build the collection V by an updating scheme as follows. We begin with a trivial collection Vo := {Do}, where Do E IRm • xm•

is nonsingular. In §3 we suggest choices for u·, k = 1,2, ... , J and Do, but our algorithm can take any set of nonzero vectors u·, k = 1,2, ... , J and any nonsingular matrix Do as input.

Now let the values of r and 8 when D := Do in (8) be ro and 80 respectively. Define 1/IDQ : IRm • --+ lR. by setting D := Do, r := ro and 8 := 80 in (9), and let


\ \ \ \ \ \

Figure 4 Functions.p, .pDl, .pDl, Epigraphs epi.p, epi.pDl, CPi.pDl, and lcvcl.etllcvI .p, levI .pDl, leVI.pDl

11

12 CHAPTER 1

tP'Do := tPDo. Note that

tP'Do (±A(Do).;) = tP(±A(Do).;) VA > 0, j = 1,2, ... , m2,

and that

Now if we have tP(u1 ) = tP'Do(u1 ) for 1c = 1,2, ... , J then the initial collection Vo does provide an approximant that is exact along u1 , 1c = 1,2, ... , J and we terminate. Otherwise let

lei := min{1c : tP(uA:) < tP'Do(uA:), le = 1,2, ... , J}. (13)

We now wish to update the collection Vo to Vi so that 'I/J'DI is exact along wi :=

u'" while retaining exactness in all directions along which tP'Do is exact. Note that this can be achieved if we simply let Vi consist of Do and a nonsingular matrix obtained by replacing a column of Do by Wi. Indeed, because of the way we define tP'D" such a collection would make 'I/J'D, exact along _w1 as well. Such a ""'D, need not be convex. If however, we let V 1 consist of Do and a.ll nonsingular matrices obtained by replacing a single column of Do by w 1 , then 'I/J'D, would be convex. We illustrate this procedure in Figure 5 in terms oflevel sets at 1 oftP, ""Do, and ""'D,. Note that in Figure 5, leV1""'D, is convex, and this fact is true in general. In fact, the following is true from which the convexity of",,'D, follows. If we let p1 := ",,(w1), 0'1 := ",,(_w1), a1 := [rg',p1,sg',0'1)T, V1 := [Do, wi, -Do, _w1) and then define

co tP'Dl(W):= min{a: (:) E pOS (~)} (14)

then it follows that

(15)

Note that (15) is a powerful result since the computation of values and subgradients of co tP'D, needs the solution of the linear program on the right-hand-side of (14). (15) indicates that the values and subgradients of co 'I/J'DI may be evaluated considerably faster using (12), (10-11) and (8-9) for matrices in this special collection. Note that the matrices in the collection V1 are related as they are obtained by replacing columns of Do by w 1 one at a time. Therefore, we do not have to solve the two systems Dt = wand DT u = v referred to by (10-11) with D set to all the matrices in V 1 . In fact, one can solve these systems for D := Do, and obtain the solutions when D is set to other matrices in the collection without solving any additional systems.


I W

, -w

w,

NOTE: lev,,o ) THE UNiON

OF lev, ,oD., 1~y, Jb(W; (0.).21 AND lev, "[(D,l.pUJ'] , IS CONVEX'.

Figure 5 Level Sets levl "'. levl "'Do. levl "'["".{Dol.,]leVl "'[(Do).,,""] (w1

is Parallel to a Column of M in this Example)

13

14 CHAPTER 1

The above relations among the matrices in the collection suggests the following way of representing it. We can think of the collection 'D1 as consisting of two group6 of matrices. (We use the word group in a nonrigorous literal sense to indicate that members in the group are related.). The first group (which we number 0) consists of the single member Do, and the members in the second group (which we number 1) are nonsingular matrices obtained by replacing appropriate columns of Do by w1 one at a time. We say that members of group 1 are obtained by 6plitting Do. (In the algorithm we describe at the end of this section, members of group 1 are obtained by splitting a column permutation of Do.) Now the aim is to update 'Dl to'D2 so that tP'D. a.grees with tP along another direction in the set u·, k = 1,2, ... , J, and also that cotP'D.(w) = tP'D.(w) for all w E IRm ., the result analogous to (15). We shall see that the collection 'D2 would be obtained by adding a third grou p of matrices (numbered 2) to'Dl . Members of group 2 are obtained by splitting a column permutation of a member in group 0 or group 1. In general, (under a certain technical assumption we make later) each time we update the current collection 'DI to 'Dl+l the number of groups in the collection would increase by 1, and members of the new group are obtained by splitting a column permutation of a member of one of the existing groups.

With the structure described in the previous paragraph in mind we adopt the following notational conventions for describing 'DI and quantities pertinent to members of'Dl . If a quantity is the same for all the members in a group numbered g we describe that quantity by a symbol with a 6ingle superscript g. If a quantity is different for different members in a group g then we denote the quantity associated with member number m by a symbol with two superscripts-gm (in that order).

We now describe the collection 'Dl in precise terms. In (13) we compute tP'DoO using (12) and (10-11) so that

tP'Do(w1) = tPDo(Wl ) = (ro)Ttt - (60)Tt: where Dot1 = w1.

Let v1 be the number of nonzero components of t1. Let (Pl)T E IRm • xm• be any permutation matrix that permutes the components of t l so that the first vi components are nonzero and the remaining ~ - vl components are zero. A particular pl may be described as follows.

(16)


where Ie E INm• is defined by

index of j-th nonzero component of t l , for j = 1,2, ... , /11

index of (j - /11 )-th zero component of ti, for j = /11 + 1,/11 + 2, .. . ,m2.

The collection Vl is written as

V I := {Dgm : m = 1,2, .. . ,/lgjg = 0, I}

where

/10 ._ 1,

DOl Do and

DIm ._ DOlpl+(wl_DOlplem)e'!" m=I,2, ... ,/Il.

15

(17)

(18)

(19)

Note that DOl is the single member in group 0, and DIm, m = 1,2, ... , /1 1

are the /11 members in group 1 obtained by replacing the first /11 columns of DOl pI one at a time by WI. The permutation of DOl to DOl pI is performed for notational convenience: after the permutation the columns that have to he replaced by WI one at a time are just the first /11 columns.

We now define the following in preparation to write down an expression for ""1>,(w).

rOl := ro, sOl:= So

pl := ""(WI), 0-1 = ",,(_wl)

rIm (pl)TrOl+(pl_e'!,(pl)TrOl)em , m=I,2, ... ,/ll (20)

slm (pl)T sOl + (0- 1 _ e'!,(pl)T sOI)em , m = 1,2, ... , /11

Ugm ((r9m)T (s9mf ) Dgm _D9m m=I,2, ... ,/l9jg=0,1

""1>, (w) can then be expressed as

""1>, (w) = min{""Ds-(w) : m = 1,2, ... , j.l.9 j 9 = 0, I} (21) m,9

where

""Ds-(w)=min{a: (:) EposUgm}j m=1,2, ... ,/l9 jg=0,1. (22)

We note that ""1>, is exact along any direction along which ""1>0 was exact,

16 CHAPTER 1

and that

Note that by (18-19), (27-29), (21-22) and (10-11), we can compute ..p1J, (w) and 8..p1J, (w) for w E JR.m. if we just have Do, ro, So, which define 1)0 and ..p1Jo' and w1, 1/1 , pl, pl cr1 which in addition define 1)1 and ..p1J,. We therefore treat

(23)

as the data we store to define 1)1 and ..p1J,. -yl, 1/1 in (23) are redundant in describing 1)1' They are included so that the form of data representing 1)1 is consistent with the form of data representing 1), for I > 1.

If k1 = J or if we have ..p(uk) = ..p1J, (uk) for k = k1 + 1, k1 + 2, ... , J then the collection 1)1 provides an approximant that is exact along uk, k = 1,2, ... ,J, and we terminate. Otherwise let k2 := min{k : ..p(uk) < ..p1J, (uk), k = k 1 + 1, k1 + 2, ... , J}, and let w2 := uk'. We now wish to update the collection 1)1

to 1)2 so that ..p1J. is exact along w2 while being exact along all directions along which ..p1J, is exact. The specific way we update 'V1 to'V2 is as follows. Let

..p1J, (w2) = min{..pD'''' (w2), m = 1,2, ... ,1/9, 9 = 0, I} = ..pD"I'''' (w2) (24) "',9

where -y2, J102 are values of indices g, m respectively that yield the minimum . ..pD._(W2) in (33-35) for m = 1,2, ... ,1/9, 9 = 0, 1 is of course computed using (21-22) and (10-11), so that

where D1'''' t 2 = w 2• Let 1/2 be the number of nonzero components of t2 , and define the permutation matrix p2 by (16-17) using t2, 1/2 and p2 in place oft!, 1/1 and pl respectively. We construct 'V2 by adding a third group of matrices to the collection 1)1' This new group of matrices consists of all nonsingular matrices obtained by replacing a single column of D1',,' p2 by w 2. Specifically, the quantities pertinent to members of group 2 of'V2 are

D2m ._ D1'''' p2 + (w2 _ D1'''' p 2em )e'!" m = 1,2, ... ,1/2

l ~ ..p( w 2), (1"2 ~ ..p( _w2) chosen appropriately

r2m (p2)Tr1'''' + (p2_ e'!.(p2)Tr1'''')e ... , m= 1,2, ... ,v2 (25)

8 2m (p2)T 81',,' + ((1"2 _ e'!.(p2)T 81 'I")e ... , m = 1,2, ... , v2


Note that given VI defined by the data (23), the quantities in (25) which are needed to describe the new group to be added to VI to obtain V 2 are completely known if we have data 7 2, p.2, w2, ,,2, p2, p2 and (1'2. Therefore, the data defining V2 is

We can write V 2 := {Dgm : m = 1,2, ... , vg ; 9 = 0,1, 2}.

Here Dgm and quantities needed to compute WD .... (W) and aWD .... (W) for m = 1,2, ... , "g; 9 = 0,1 are of course defined in (18-19) and (27-29). Consider obtaining D 2m and quantities needed to compute WD .... (W) and aWD .... (W) for m = 1,2, ... , v 2 (i.e. quantities pertinent to group 2). These may be related to quantities for group 0, using data (26) that we have stored to describe Va as follows. First, note that if 7 2 = 0, then members in group 2 are obtained by splitting the matrix DoP2. If on the other hand 7 2 = 1, then they are obtained by splitting matrix D l ,.· of group 1 permuted by p2, which in turn is a result of splitting DOPI. Using indices 72, p.2, 71, p.l that we have stored we can write down a path of group-member indices that relate quantities pertinent to a member in group 2 to appropriate quantities pertinent to Do (the single member of group 0). Suppose for the moment that 7 2 = 1, and that we need to obtain D2m , r2m and 82m from data (26). Define Po := 2, qo := m, Pl := 7Po , ql :== p.1'0, and P2 :== 7P1 , q2 := p.1'1. Of course, we have PI == 1, ql == p.2 and P2 = 0, q2 = 1 since we have assumed that 7 1 == 1. Note that the groupmember indices (P2, q2), (PI, qd, (Po, qo) indicate the path that we followed to arrive at D2m from DOl == Do: a result of splitting matrix DP·9.· == DOl == Do is DP,9, = Dl,.1 and D2m is a result of splitting Dl,.l. So we can write,

DP'9.1 = DP·9.· pPI + (wP' - DP'9.. pple9.' )e~

DPo9.o = DPI9.1 pPo + (wPO _ DPI9.1 PP0 e9.o)e:o

to obtain D 2m , m = 1,2, ... , VI recursively from data in (26). Similarly,

rP'9' = (ppI)T rP'9.. + (PP' _ e;, (pp,)T rp.9.')e9.'

and

rP°9.o == (ppo)T rP,9.' + (pp0 _ e:o(ppo)T rP'9.1 )e9.0'

sP,9., == (PP'l8P'9.' + «(I'p, - e;, (pPI)T sP.9.')e9.,

sPo9.O == (ppo)T sP'9.' + «(I'po - e:o(PPOl sP' 9.' )eqO

so that we can generate r 2m, s2m for m = 1,2, ... , v 2 recursively from data in (26).

18 CHAPTER 1

tP'D. (w) can now be written as

'ifi'D.(w) = min{'ifiD (w): m = 1,2, ... , /lgjg = 0, 1, 2} mig 11m.

where

'ifiD''''(W) = min{a: (~) E pos ugm}j m = 1,2, ... , /l9 j 9 = 0,1,2

and

m = 1,2, ... , /lgj 9 = 0,1,2.

In analogy with (27-29) the reader may have expected to see p2 := 'ifi(w2) and 1T2 := 'ifi( _w2) in (25). It turns out however, that if we set

a2:= [(rO)T,/,p2, (SO)T,ITi,1T2f, V2:= [DO,W\W2,_DO,_W1,_W2] (27)

and define

co 'ifi'D.(w) := min{a: (~) E pos (~)}, (28)

then co'ifi'D.(W) = tP'D.(w), wE IRm • (29)

would not follow with p2 := 'ifi( w2) and 1T2 := 'ifi( _w2) when m2 > 2. We have to choose p2 and 1T2 more carefully to satisfy certain additional technical conditions. In fact, this is true when choosing p' and IT' for forming collections 1), for I 2:: 2 for problems with dimension m2 > 2. A discussion of these technical issues would take us too far astray and therefore, we simply state them in Algorithm 1 that we indicate below. Note however, tha.t validity of (29) is important because it ensures tha.t 'ifi'D., and in turn the resulting approximation to problem (1), are convex.

We shall now describe the general step of updating 1), to 1)1+1. The collection 1), and 'ifi'D1 would be described by data

(30)

(with 'Y1 := ° and 1-£1 := 1) and suppose therefore that we have these data already set up. Then

'V, = {Dgm : m = 1,2, ... , /l9 j 9 = 0,1, ... , I} (31)

where DOl = Do and Dgm for m = 1,2, ... , /lg, 9 = 1,2, ... ,1 are generated recursively as follows from the data (30). To generate Dgm , define Po := 9 and


qo := m. Then for i = 0, 1, ... , while Pi :f:. 0 let Pi+! := -yP; and qi := pp;.

Note that Pi = 0 for some i (say for i := ~) and that ~ ::; I. Let q>. := 1. The set of group-member indices (P>., q>,), (P>.-lJ q>.-l), ... , (Po, qo) define the path from DOl that we followed to obtain Dgm.

Now we can generate D9m from the recursion

DP'-lq'-1 = DP,q, PP;-l + (WP;-l - DP;q; PP;-l eq'_l)eq;_11 i = ~, ~ - 1, ... , O.

(32) .,pVI is defined by

.,pV1 (w) = min{'IjJD.-(w) : m = 1,2, ... , ",9i 9 = 0,1, ... , I} (33) m,g

where

min{a: (:) EposUgm}j m=I, 2,,,,,,,,9 j 9=0,1, ... ,1

( (r9m)T (S9m)T). Dgm _D9m I m = 1,2, ... , ",9 j 9 = 0, 1" .. , I, (34)

r01 = ro, SOl = SO, and r9m, s9m for m = 1,2"", ",9, 9 = 1,2, ... , I are generated recursively by

"'pi-lfi-l (pp;_l)T rP;q; + (P';-I _ (eq;_.)T (PPH f rP;q;)eq;_"

i=~,A-ll""O (PP;-l )T sP;q; + (qP;-l _ (eq;_.)T (PP;-1 f sP;q, )eq;_ II

i = A, A-I" .. 10. (35)

Now suppose that wi = '1£"'. If lei = J or if we have .,p(u") = .,pv,(u") for Ie = lei + 1, lei + 2, ... , J then 1), is exact along all desired vectors '1£", Ie = 1,2, ... I J and we terminate. Otherwise let 1e'+1 := min{le: .,p(tl·) < .,pv,(u"),1e = lei + 1, lei +2, ... , J, and let wl+1 := '1£,,'+1. Of course .,pv, (w l+1) would be computed using (33-34, 32) and (10) so that

(36)

where D"YI+'l'l+l tl+1 = wl+1. In (36) p'+1 and 1'+1 are values of indices m and 9 respectively in (33) that yield the minimum when w := wl+1. Let ",'+1 be the number of nonzero components of tl+ 1 and define the permutation matrix pl+ 1 by (16-17) using tl+t, ",'+1 and pl+1 in place oft1, ",1 and p1 respectively. A

20 CHAPTER 1

new group-group I + I-is now created as follows and added to 'DI to obtain 'D1+1:

D (1+1)m D71+11'1+lpI+1 (1+1 _ D7l+11'1+lpl+1 ) T .- + w em em'

m= 1,2, ... ,i+l

pl+1 ~ ,p(wl+\ 0.1+1 ~ ,pC _wI+1) chosen appropriately (37)

'D1+1 and ,p"l+l are completely determined by data

Do, ro, So; 'Yg, p.9, wg, 11', pg,pI, ag; 9 = 1,2, ... , 1+ 1

(with '11 := 0 and ,.,.1 := 1), and (32) and (33-35) with I replaced by 1+1.

We emphasize again that for problems with m2 > 2, ';+1 and al+1 for I ~ 1 have to be chosen to guarantee

where

CO¢1)I(w):=min{Q: (:) Epos (i,)}, al := [(rO)T, pl, p2, .•. , pi (sOf ,al , a2, .. . , a1]T,

and Vi .- [DO 1 2 I DO 1 2 I] 1.- ,w ,w , ... ,w ,- ,-w ,-w , ... ,-w .

(38)

(39)

(40)

(41)

A specific way of choosing ';+1 and al+1 to guarantee (38) is spelled out in Algorithm 1 below.

We shall continue updating the collection in the manner described above until we have treated all the vectors in the input set of vectors uk, Ie = 1,2, ... , J. If we have performed L updates when we terminate, then the collection that we use is 1h. Note that when we have the data defining the collection 'DL and the upper bound ,p"L at hand we can drop the subscript L and refer to the collection simply by 'D.

Our description above leads to the following algorithm for designing the collection'D.

Algorithm 1: (Design of the collection 'D)

Input: nonsingular Do E IRm• xm.; nonzero v/' E IRm., Ie = 1,2, ... , J.


Step 0: (Initialization) begin I := 0; for j := 1 to m3 do rif e, := 'I/J(Doe,); sife, := 'I/J(-Doe,); end do; 11° := 1; DOl := Do; rOl := ro; sOl := sO; ao := [rif, sify; Va := [Do, -Do]; end initialization.

Main Step: begin for Ie := 1 to J do wl+1 := u"; pl+l := 'I/J(W I+1); jJ+l := 'l/J2)1 (w l+1 );

1* 'l/J2)1(wIH ) is computed using (33-35), (32) and (10). Let 'Y'H,~l+l be the values of indices g,m in (33-35) that yield the minimum. */ if p'+1 < JJH then if I = 0 call fird update ([" ... ,]); if I ~ 1 call update ([,' ... ,]); end if; end do; L := I; call output; end main step.

First Update: (Form collection VI) begin solve Dotl = WI for t 1 ; 111 := number of nonzero components of t1; pl:= any permutation matrix (such as that in (16-17)) that permutes the components of t l so that its first 111

components are nonzero; 0- 1 := 'I/J(-w1); al := [(ro)T,pl, (so)T,o-l]T; VI := [Do, WI, -Do, _WI J; I := 1; end first update.

( . ) b' 0 1+' 1+, Update: Form collectIon V'+1, I ~ 1 egm B := D'T ,. ,)

1* B* is an optimal basis for the lp min{a: (:) E pOS (~)}. We assume

that B* is unique. * / for i := 1 to m2 do

i := argmin {[(al)~,(BO)-l(Vi).; - (al);]/[-e[(B*)-l(Vi).;] : ;=1,2, ... ,2(m,+I);i ~3" B'

end do; 1* .JB· is the set of column indices of Vi corresponding to B* * /

_ ._ pH _ min {e[(B*)-lwl+l[(a,)~'(B*)-l(Vi).;' - (al)"]}. PI+l .- l:Si~m, [e[(B*)-l(Vi).;>J ,

if IH < Pl+l then pl+l := Pr+l; 0-'+1 := ,p( _w'+l)j o-'H := 'l/J2)1 (_w'H ); if 0-1+1 < &1+1 then if -(al)f_B.)(B*)-I(Vi).; - (a,); < 0, j ~ .J(-B'), j = 1,2, ... , 2(ffl2 + I) then for i = 1 to m2 do

i := argmin {-[(al)f_B.)(B*)-I(Vi).; + (a').i] : ;=1,2, ... ,2(m,+I)#3"c_ B ') e[(BO)-I(Vi).i]

e[(BO)-l(Vi).i > o};

22 CHAPTER 1

end do;

if 0"1+1 < 0"1+1 then 0"1+1 ::::: 0"1+1; else 0"1+1 ::::: 0"/+1; end if; end if; solve ,+, 1+1 I I I I I D'Y I' t +1:::: W +1 for t +1; II +1 := number of nonzero components of t +1;

pl+1:= any permutation matrix that permutes the components of t l+1 so that its first 111+1 components are nonzero;

. [( )T 1 2 1+1 ( )T 1 2 1+1] T. al+l.= TO ,p ,p , ... ,p ,So ,0" ,0" , •.. ,0" ,

TT ._ [D 1 2 1+1 D 1 2 1+1]. vl+l.- o,W ,W , ... ,W ,- o,-W ,-W , ... ,W ,

I := 1+ 1; end update.

Output: begin output L; and if L :::: 0 then Do, TO, So;

1* This is data defining 'Do. * / else Do, TO, So; 'Yg , pg, w g , IIg , pg, p9, O"g; g:::: 1,2, ... ,L. 1* This is data defining 'D L, L ~ 1. * / end if; end output.

3 CONCLUDING REMARKS

We conclude the paper with the following remarks.

(1) By (31) the collection 'DI = {Dgm : m = 1,2, ... , IIg; 9 = 0,1, ... , I} has 1 + L~=1 IIg matrices. If these matrices were arbitrary, we may have to store all these matrices, and all the vectors T9m , sgm, m:::: 1,2, ... , IIg, 9 = 0,1, ... , I necessary to compute 1/J'DI(') and 81/J'DIO using (33-35) and (10-11). Due to the special structure of'Dl that Algorithm 1 above produces for any value of I, we need to store only the data Do, TO,

So; 'Yg , pg, wg, IIg , pg, p9, O'g, 9 = 1,2, ... , l. Using this data, and the recursions (32) and (35), we can generate Dgm , T9m , sgm for m = 1,2, ... , IIg , 9 :::: 0, 1, ... , l.

(2) The special structure of'Dl can be exploited in the computation of 1/J'DIO and 81/J'D1(')' Note that if'Dl were arbitrary, then this computation using (33-35) and (10-11) would involve the solution of 2 + L~=ll1g systems


of equations of size m2 x m2' Recursion (32), which specifies the special structure in 1)/, can be used to reduce the number of systems of equations that needs to be solved considerably. For example, it is possible to develop a scheme to compute '1/12)/(') that needs the solution of only one system of equations with Do as the coefficient matrix, and obtain the solutions to all other necessary m2 x m2 systems using sequences of updates to this single solution. These updates can be expressed in forms suitable for vectorization. Therefore, if Q2) and '1/12) are used in place of Q and '1/1 in Figure 1, the scheme of computation of values and subgradients of Q2) that would result is suitable for parallel vector processors.

(3) Comments in item (2) above implies that we should choose Do so that systems of equations with Do as coefficient matrix are easily solvable. Note also that by Figure 5, we should attempt to make '1/12) and '1/1 agree along the directions of the columns of M. Therefore, a possible choice for Do would be an easily invertible m2 x m2 submatrix of M that is also a feasible basis [7) for the linear program in (5). The set of vectors uk, k = 1,2, ... , J could then be the columns of M that are not used to form the submatrix chosen above for Do.

(4) It can be shown [3) that for all values of the update index l during the execution of Algorithm 1, the relation

holds. In particular, it holds for l := L the index value corresponding to the output of Algorithm 1. Therefore, the collection 1) that Algorithm 1 produces, ensures that the resulting upper bound Q2) is convex.

(5) Remarks 1,2 and 4 summarize the important properties of 1), '1/12) and Q2). These properties constitute the contribution of the present paper and the papers [4,3] relative to some related work in [5,6,13]. The idea of approximating '1/1 by 'I/ID of the form (8-9) is presented in [5], and using '1/12) of the form in (12) with arbitrary collections 1) to improve such approximations is presented in [6). As mentioned in §3, such '1/12) and the resulting Q2) need not be convex. In [13], the rudiments of a scheme for constructing 1) that contains the notion of splitting a matrix currently in the collection to create additional matrices to be added to enrich the collection as in §3 are given. The potential for use of parallel processors for the resulting computations is also indicated in [13). However, no concrete algorithms or properties analogous to those in Remarks 1,2 and 4 are given in [13). The work described here and in

24 CHAPTER 1

[4,3] is the result of an attempt to develop a formal algorithm beginning with the ideas presented in [13].

Acknowledgements

This research was supported in part by DOE Grant DE-FG-06-87ER25045, NSF Grant DMS-8918785, the NSF Science and Technology Center for Research in Parallel Computation, and the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.

REFERENCES

[1] K.A. Ariyawansa, 1992. Performance of a Benchmark Implementation of the Van Slyke and Wets Algorithm for Stochastic Programs on the AIliant FX/8, J.J. Dongarra, K. Kennedy, P. Messina, D.C. Sorensen and R.G. Voigt (eds.), Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, Houston, TX (March 25-27, 1991), 186-192.

[2] K.A. Ariyawansa and D.D. Hudson, 1991. Performance of a Benchmark Parallel Implementation of the Van Slyke and Wets Algorithm for TwoStage Stochastic Programs on the Sequent/Balance, Concurrency: Practice and Experience 3:2, 109-128.

[3] K.A. Ariyawansa, D.C. Sorensen and R.J.-B. Wets, 1988. On the Convezity of an Upper Bounding Approzimant to the Recourse Function in a Class of Stochastic Programs, Manuscript, Department of Pure and Applied Mathematics, Washington State University, Pullman, WA 99164-2930.

[4] K.A. Ariyawansa, D.C. Sorensen and R.J.-B. Wets, 1987. Parallel Schemes to Approzimate Values and Subgradients of the Recourse Function in Certain Stochastic Programs, Manuscript, Department of Pure and Applied Mathematics, Washington State University, Pullman, WA 99164-2930.

[5] J.R. Birge and R.J.-B. Wets, 1986. Designing Approzimation Schemes for Stochastic Optimization Problems, in particular, for Stochastic Programs with Recourse, Mathematical Programming Study 27, 54-86.


[6] J.R. Birge and R.J.-B. Wets, 1989. Sublinear Upper Bounds for Stochastic PrograTn8 with Recour6e, Mathematical Programming 43, 131-149.

[7] G.B. Dantzig, 1965. Linear Programming and Ezten!ion6, Princeton University Press, Princeton, NJ.

[8] M.A.H. Dempster, 1980. Stocha6tic Programming, Academic Press, New York.

[9] Y. Ermoliev and R.J .-B. Wets, 1988. Numerical Technique6 for Stocha6tic Optimization, Springer-Verlag, New York.

[10] T. Higgins and H. Jenkins-Smith, 1985. AnalY6" of the Economic Effect of the Ala6kan Oil Ezport Ban, J. of Opera Res. 33, 1173-1202.

[11] R.T. Rockafellar, 1970. Convez AnalY6", Princeton University Press, Princeton, N J .

[12] R. Van Slyke and R.J .-B. Wets, 1969. L-Shaped Linear Program6 with Application! to Optimal Control and Stochastic Progro,mming, SIAM J. Appl. Math. 11,638-663.

[13] R.J.-B. Wets, 1985. On Parallel Proce660r6 Design for Stochastic Program6, Working Paper WP 85-67, International Institute for Applied Systems Analysis, A-2361 Laxenburg, Austria.

[14] R.J .-B. Wets, 1966. Programming Under Uncertainty: The Equivalent Convez Program, J. SIAM Appl. Math. 14:1,89-105.

ABSTRACT

2 ON EMBEDDED LANGUAGES,

META-LEVEL REASONING, AND

COMPUTER-AIDED MODELING Hemant K. Bhargava

and Steven o. Kimbrough*

Naval Postgraduate School Code AS/BH

Monterey, CA 99940

*University of Pennsylvania The Wharton School

Suite 1900, Steinberg Hall-Dietrich Hall Philadelphia, PA 19104-6366

We discuss the role of meta.-level reasoning in developing programs that can rea.son both within and about some domain. In particular, we discuss why meta.-level reasoning is useful, and essentially required, in developing computer-based modeling systems that can provide meaningful support to modelers throughout the modeling life cycle. We then describe a general technique, which we call the embedded languages technique, for constructing systems that do meta.-Ievel reasoning. Finally, we describe how the embedded languages technique may be used in developing an advanced computer-aided modeling environment.

1 INTRODUCTION

We have four broad purposes in this paper:

1. To clarify the concept of meta-level reasoning (MLR). This concept has been discussed fairly much in the literature and has a series of different senses [9, 14, 23, 26, 27, 29]. In this paper we wish to make a clear presentation of what MLR is, and we will do so in terms of first-order versus second-order system functionality. Our discussion of this topic begins in §2.

27

28 CHAPTER 2

2. To say clearly what inference-meta-Ievel or not-is and why it is needed for computer-aided modeling. Specifically, in §3 we distinguish two sorts of reasoning: that based on decoding and that based on inference. We shall argue that meta-level reasoning is highly useful for reasoning of the decoding type, and essentially necessary for reasoning of the inferential type. Further, in what follows, we present a case for a requirement for inference-based reasoning in computer-aided modeling systems.

3. To develop a general, principled technique for constructing systems that do meta-level reasoning, both for decoding and for inference. We call this the embedded languages technique, and we will explain why we believe this to be an excellent way to develop MLR systems. We have found this technique to be useful in developing a particular kind of system (discussed below) that needs meta-level reasoning. §4 contains the main elements of what we say here on the elements of embedded languages.

4. To present, in §5, an application of this idea in developing a computeraided modeling environment. We will present examples of functionality that we want such an environment to have, and will show how we achieve this functionality with meta-level inference and the embedded languages technique.

2 META-LEVEL REASONING

2.1 First-Order Questions, Second-Order Questions

For our purpose of clarifying the concept of meta-level reasoning, it will be helpful to begin by thinking in terms of functions; we will initially focus on first-order and second-order functions. Consider a simple example to illustrate our discussion: matrix inversion. The universe of discourse about matrices and about inverting them has objects at three levels.

1. At level 0, we have objects such as matrices and matrix variables.

2. In level 1 we have functions (or, subroutines) for matrix inversion, such as L U decomposition and Gaussian Elimination. These functions take as input a matrix, and if that matrix is regular, return a matrix of the same order. In general, levell functions map from a Cartesian product of level o objects to a level 0 object.

On Embedded Languages 29

3. In level 2 we have functions that may take level 0 and/or level! entities as arguments, and map to either level 0 or level! entities. This allows us, for instance, to write functions that can reason about the individual matrix inversion subroutines that exist in the system.

Do we need level 2 functions? If we had just one matrix inversion subroutine, a simple first-order system would be sufficient.1 But when there are several such subroutines and they have properties that would make us discriminate between them, we need to make inferences such as to determine which subroutine to use when, or to control the execution of a particular subroutine. This is accomplished with second order functions-functions whose arguments themselves are first order functions.2,3 To continue our illustration, let us consider examples of second order questions for a matrix inversion system.

1. Data-driven inference about first order functions: Given a particular matrix, select the most suitable subroutine to invert it (maps from a level 0 object to a level! object)

2. Pure second order functions: Given a level! subroutine, find another that is more robust (but possibly slower), or find one that is faster (possibly less robust).

3. Controlling execution of a levell function: If the execution of a subroutine is approaching numerical instability, change the tolerance level or recommend another subroutine.

As with other examples of meta-level reasoning, it might appear that these "second-order rules" could be programmed as first-order functions. Although this is possible, doing so would miss the point about meta-level reasoning, at least from an architectural and design point of view. Two design goals for developing MLR systems are !) to separate, explicitly, first-order reasoning from

1 Even here, second-order flUlctions could .be used gainfully, for example to detect that (for a regular matrix M) M . N . M-1 can be simplified to N.

2 Note that thi. is distinct from functional compOlition. Functions are composed by applying a first order function / to the output g(z) (where z is a level 0 object) of a first order function g. For example, if particular functions F and G invert matrices and M is a regular matrix, then (F. G)(M) = F(M-1 ) = M. The argument of /, i.e. g(z), is still a level 0 object, whereas an argument of a second order function may be a first order flUlction.

3 This sort of a thing could continue indefinitely (e.g., we can imagine a 3'" order function that tells us what kind of second order information is available for a first order object), but it is a matter of diminishing returns; see Genesereth and Nilsson [14].

30 CHAPTER 2

second-order reasoning and 2) to have as much of a declarative representation as possible of the control knowledge (second order functions). Regarding the first goal, a clean separation of first- and second-order reasoning is-from an implementation point of view-desirable in and of itself. In general, anything thai makes for clarity and modularity in a program also contributes to maintainability and modifiability. Regarding the second goal, a declarative (or non-procedural) representation is to be contrasted with a procedural representation. A procedural representation is compiled, and is executed when called. Declarative representations are executed via other programs, called interpreters, which translate the declarations into machine-processable form. The crucial difference between them, for present purposes, is that in the case of declarative representations it is possible to write programs (other types of interpreters) that can manipulate them, modify them, and extract useful information from them. This is not practicable in the case of compiled, procedural representations. There is a penalty to be paid in terms of computing resources for working with declarative representations, but the benefits, measured in flexibility, maintainability, ease of adding features, and generality are substantial. We have noted that it is possible to program particular second-order rules as first-order functions, but it is not possible to do so in a general, flexible, and maintainable way. MLR systems, thus, offer an attractive alternative, albeit at the cost of computing time.

With a meta-level architecture (for supporting meta-level reasoning), one has an explicit, declarative statement of the control strategy; such a statement makes it easier to understand and modify the strategy and can be used to support inferencing and learning. This leads us to make an essential distinction, which we explain in the next section, between two kinds of reasoning: inference and decoding. Programming "second-order rules" as first-order functions would enable us to do some amount of decoding at the meta-level but would reduce the capability of our system to make inferences about these functions. This distinction will allow us to understand clearly certain advantages of the embedded languages technique for meta-level inference. Not all architectures for meta-level reasoning possess the two properties mentioned above. In what follows we shall show how the embedded languages technique yields one architecture (called a bilingual pure MLR system) that does.


3 REASONING: INFERENCE AND DECODING

The reasons for using embedded languages for reasoning, whether meta-level or not, are perhaps easiest to understand if we distinguish between two kinds of reasoning-inference and decoding.

There is more than one sort of way to extract information from a body of data. Communication theorists (e.g., [1, 2, 28]) recognize broadly two theories for how a message may be extracted from a signal (or, in information systems parlance, for how information may be extracted from data): decoding theories and inferential theories. This distinction is apt for our purposes. To appreciate the distinction, consider two paradigmatic cases, one of decoding and one of inference.

Decoding. In the first case, an encrypted message has been received. We examine certain contextual information, e.g., today's date, and we select a particular code book to use in decoding the message. The code book, plus knowledge of a certain algorithm, allow us to decode the message and to produce another message that is in plain text and that is meaningful to us. For present purposes, there are three things to note about this familiar procedure.

1. The decoding process is deterministic. We selected the code book according to a fixed rule, and we applied the algorithm to the code book and message, producing the output in a completely mechanical way (without any further choices on our part). We simply followed the rules, and there is always at most one rule to follow.

2. The decoding process is finite. The decoding procedure is an algorithm; it necessarily stops. At some point, there simply is nothing left to do, and the job is done. Decoding is not an open-ended task.

3. The decoding process is indefeasible, that is to say, new information cannot undo, or defeat, our results. This is simply a consequence of the fact that the message, the context, the code book, and the algorithm completely determine the output of the process.

Inference. Consider, on the other hand, a paradigmatic case of inference. We are presented with a collection of assertions-e.g., some axioms or even a book we have read-and directed to make inferences. What can we conclude from what we have been given, plus our knowledge of logic, mathematics and the present context? As before, there are three things to note about this familiar procedure.

32 CHAPTER 2

1. The inferential process is nondeterminilltic. There are very many inferences we could draw. (Think, for example, of all the theorems that follow from Euclid's axioms in geometry.) In making inferences we must, of course, follow the appropriate rules (say in logic or mathematics), but at any given time there are many rules that apply and we must choose among them.

2. The inferential process is infinite. The application of any particular rule of inference can be an algorithm, but inference in general is not. If, for example, we can conclude P, then for any Q we can also conclude P V Q, and so on ad infinitum. At no point is there simply nothing left to do. The job of inference is never done, although we may of course choose to stop at any time.

3. The inferential process is defeasible, that is to say it is often the case that new information could defeat our results. We may, for example, read a book and come to a certain conclusion on a matter of public policy. Later, we acquire further information and we change our conclusion. This is a commonplace sort of event. New information undermines, or is in tension with, previously-drawn conclusions, and we must decide how to handle it. There is even a sense in which mathematical and logical systems are defeasible. Certainly these systems are marked by the fact that their rules of inference (e.g., modus ponens in logic and za+b = za . zb in algebra) have no exceptions and are not defeasible. Still, the process of deriving a conclusion may involve much defeasible inference in the sense that we try different rules for transforming formulas until we succeed in getting the needed result. The application of these rules can be a trial-and-error process. We try certain transformations until we reach a dead end, and then we try something else. Thus, while the transformation rule is not defeasible, our hypothesis that it is relevant and useful for the purpose at hand may well be defeasible. Further, we note that not only is this defeasible-trial-and-error process a reasonable description of how people solve certain types of mathematical problems, it is in fact an accurate description of how certain computer programs successfully attack these problems (e.g., [27]).

Both decoding and inference are procedures by which an input collection of tokens is transformed, typically with the aid of contextual information, into an output collection of tokens. If the input and output tokens are to be interpreted propositionally (semantically)-as they surely are in the case of mathematical models-then the distinction between decoding and inference can be maintained as follows: an inferential procedure is a decoding procedure that is


nondeterministic or injiniti6tic or defea6ible. Since more than one of these conditions may obtain and since there are degrees of these conditions, the overall distinction itself admits to degrees of difference.

We are now in a position to see clearly why support for inferencing is required in computer-aided modeling systems. We want inference, particularly metalevel inference, not because we merely want to extract information from a data base, but because our needs (at times) call for nondeterministic, infinitistic, or defeasible extraction of information, and this, at the very least, requires higher-level control of the reasoning process. Faced with nondeterminism, our systems need to choose which procedures to execute; faced with an infinity or lengthy series of operations, our systems need to employ search strategies and stopping criteria; faced with competing indicators for conflicting options, our systems need to choose an appropriate course of action.

In the sequel (§6) we discuss a series of examples of meta-level reasoning (both decoding and inference) in model management. In the interim, it may be useful to think of a meta-level inferencing (as opposed to decoding) task in model management, as any task you would be seriously tempted to implement or support with an expert system. Model formulation is surely an example of this sort and in fact it has been treated from the perspective of defeasible reasoning [3].

4 EMBEDDED LANGUAGES

A small framework will be useful in explaining the main idea behind the embedded languages technique. Consider three related languages, called L!, L 1, and Li. Li is the embedding language; it is completely formalized. First-order logic (FOL) [24] is an example of such a language. L1 is the embedded language. It, too, is completely formalized and has a full interpretation as an independent (of Li) language. The purpose of an L1 language is (normally) to partially formalize and represent the target language, L1, which is normally not fully formalized, but typically contains significant natural language elements.

There is something paradoxical about the idea of embedding one language, L 1, within another language, Li. It is our intention that formulas-either in L1

34 CHAPTER 2

or in L1-are to be interpreted propositionally.4 These formulas have truth values. It is also our intention that L1 be a language of first-order logic (FOL), yet on any straightforward view, first-order logic does not permit predication applied to buth-bearing formulas. For example, F(a)-where a stands for "Bob" and F(z) stands for "z is tall"-is a well-formed formula in FOL, with a an individual constant and F a predicate of arity 1. But, G(F(a)), where G is any predicate at all, is not a legal expression in FOL. How then is this sort of embedding to be done?

4.1 Some Definitions

In the embedded languages technique we address this problem in the following way. We will have two separate languages, Lj. and L1, with their own constants, variables, logical connectives, functions, predicates, and inference procedures. But we will relate them by defining a collection of axioms in L t that in effect will provide an alternate interpretation, in LT, for all objects and expressions in Lj.. In particular, the inference procedure for Lj. will be represented as a collection of L t formulas, and we will interpret L j. formulas as terms in LT. That will permit us to make statements in LT about Lj. formulas. There are four key concepts in making this approach work.

First, an embedding (of Lj. in LT) is a triple, (I,:F,f:::..), where:

1. I, called the image function, uniquely maps all expressions (terms and formulas) in Lj. into terms in Lt. We require I be invertible, i.e., that I- 1(I(<p)) = <p, for all expressions, <p, in Lj..

2. :F, called the translation function, uniquely maps the images of all formulas in Lj. (which are terms in LT) into formulas in L1. We require that :F be invertible, i.e., that :F- 1(:F(I(<p))) =I(<p).

3. f:::.. is a (possibly empty) collection of LT formulas for representing the rules of inference and transformation of L J. •

Second, a collection of formulas, ~, from Lj. is embedded as a collection of formulas, IP, in LT for a particular embedding, £ if IP comes from ~ by applying

4 Expressions in a first-order language are of two broad categories: tCI'Illli and fonnulas. Tenns denote objects in the universe of discourse whereas fonnulas, when fully instantiated, have a truth value. For further details, any text on first-order logic (e.g., [24]) should help.


the embedding, £ to ~. More formally, we require that if £ = (I,:F, .6.), then

'l1 = .6. U U :F(I(¢)). q,Eif!

As our third key concept associated with the embedded languages idea, we say that an embedding is correct if, when ~ is embedded as 'l1, then what can be derived in L i from 'l1 is "the same as" what can be derived from 'l1 in L!. More formally, we say that an embedding is correct if, for all r, ¢ in the L r embedded set, ~, if

then r f-Lr ¢.

(The symbol f- represents logical entailment; f-L! indicates logical entailment

in Lr.)

Fourth, and finally, we say that an embedding is complete (for a set of Lr sentences, ~) if, for all r, ¢ in the L! embedded set, 'P, if

then .6., :F(I(r)) f- Li :F(I(¢)).

Note that correctness ofthe embedding is defined only with references to those formulas of L i that are translations of some L r formulas. Therefore, L i can be correct and complete with respect to Lr, yet more powerful than it.

4.2 Meta-level Inference with Embedded Languages

There is a natural fit between embedded languages and meta-level inference. Expressions in an Lr language constitute, in effect, statements in an objectlevel knowledge base. The "nonlogical" axioms (in Li) constitute an objectlevel interpreter for expressions in the object-level knowledge base. Both the statements in the object-level knowledge base (the sentence logic axioms) and the statements in the object-level interpreter (the "nonlogical" axioms) are embedded in the Li language. In effect, they become constituents of statements in a meta-level knowledge base, which statements are then interpreted by a metalevel interpreter. Thus, the correspondence between meta-level architectures

36 CHAPTER 2

and embedded languages is apt. Embedded languages offers us a principled, modular lingua franca, a way to generalize meta-level inference across multiple L 1 languages (e.g., statements in algebra and statements about algebraic models). Not only may meta-level inference be used to control the application of object-level interpreters, but the object-level knowledge bases may span multiple (L1) languages. Thus, not only may we embed and use multiple L1 languages in a common system, but we may also reason at the meta-level about the object-level interpreters for these disparate languages. All this works because the embedding effects a translation into the common Li language. In a sense we have our cake (one common language) and we eat it, too (we also have multiple, very different languages).

In addition the embedded languages technique yields a particularly useful architecture, pure and bilingual, for meta-level inference (see [23] for discussion of this point). We discuss these and other properties below.

1. The technique yields a bilingual MLR system. The object level, represented in an L1 language, is distinct from the meta-level, represented in Li. The image and translation functions relate the two distinct languages. Object-level variables range over object-level terms; meta-level variables range over meta-level terms. Object-level constants and variables map into meta-level constants. Object-level functions, predicates, and logical connectives map into meta-level functions. Object-level terms and formulas map into meta-level terms, and there is a meta-level formula corresponding to each object-level formula. Finally, the object-level interpreter maps into L i formulas.

2. The technique yields a pure MLR system.5 There is an object-level interpreter, bu t it is formalized as a set of axioms in L i . Therefore, any L1 inference is simulated within Lt by combining the inference rules of Lt with these axioms. The object-level, as well as the inference rules and strategy at the object-level, can be made entirely declarative.

3. The technique yields a complete (with respect to the object-level) MLR system. With a properly defined I and F, every term, formula and infer-

6 There are different types of MLR systems. Briefly, in pure MLR systems [23], the metalevel interpreter can entirely simulate the object-level interpreter; thus the computation is performed mostly at the meta level. At the other extreme, in object-level inference systems, the object-level interpreter executes both object-level expressions as well as the meta-level expressions that govern its behavior; there is no separate meta-level interpreter. Jackson et &1. [23] discuss several examples of these two extremes as well ... hybrid systems, and offer several re ... ons for preferring pure MLR systems. We are broadly in agreement.


ence procedure of L! can be embedded in LT. Thus, every L! inference can be simulated in LT.

4. The technique yields a correct representation in the MLR system.

5 COMPUTER-AIDED MODELING

How does all of this relate to computer-aided modeling environments? This is perhaps best understood by relating the embedded languages technique to the executable modeling languages (EML) [12] approach for developing modeling systems. The EML approach has proved useful for developing model representations, and for manipulating these representations in ways that are useful for the purposes of modeling environments (well-known EMLs include GAMS [8], AMPL [13], MODLER [20]), and SML [16, 17]). In terms of embedded languages, the idea behind the (standard) EML approach is to develop an appropriate L!, i.e., a fully formal language that is itself a model of the language (L!, a combination of natural language and mathematics) we would otherwise use to represent and reason about mathematical models. The objective in our embedded languages approach is to embed one or more L~ languages in an LT language, which is itself executable. We have three motives for doing so (see [4] for further discussion).

First, L T can be used to represent-both formally and computationallyinformation about expressions (formulas and terms) in an L~ language, including, e.g., units of measurement for variables, accuracy of data, and rules of formation of expressions. Thus, the embedded languages technique allows us to represent in a rigorous, completely flexible, and general manner, a rich variety of qualitative knowledge abou.t expressions in a modeling (or L!) language. Such information is required to reason at the meta-level about these L! expressions; it cannot (or cannot easily) be expressed in the language, and consequently is largely absent in existing executable modeling languages. This knowledge can be used in defining inferences to support several aspects of modeling (e.g., see [5, 6] for illustrations in model validation).

Second, this embedding allows us to make an explicit, declarative statement of the control strategy for executing meta-level functions; i.e., it allows us to make inferences at the meta-level. Again, let us contrast this with the EML approach. Some of the manipulations (e.g., adding two variables) that EML systems perform are object-level functions, but there are others (e.g., determining which solver to call to solve a particular model instance) that

38 CHAPTER 2

are naturally thought of as meta-level functions. The control knowledge for these meta-level functions is typically embedded in the compiler, and these functions are executed by decoding this knowledge. Thus, all modeling systems (particularly those that can manipulate multiple models and data scenarios) do some amount of meta-level reasoning; what we wish to provide a general architecture for doing meta-level inference.

Third, we recognize that research in computer-aided modeling has produced several particular modeling languages. These languages have competing strengths and weaknesses, and are often aimed at different purposes, e.g., some are strong for mathematical programming models in general, others work only for linear programming models, still others support only simulation models. Instead of aiming to design a universal modeling language, we have sought to develop techniques for exploiting particular modeling languages (particular L! languages) and for combining them in a common modeling system. Thus, multiple languages can be embedded and their interpreters reasoned about in L i , an Li expression can combine and relate elements from multiple L! languages (say, a data modeling language and an algebraic modeling language-see [7]), and axioms can be written in Li for translating expressions in one embedded language into another.

In the context of applying this technique to modeling systems the embedded language might be an EML such as AMPL [13) or SML [16, 17), with the target language being the (natural and man-made) language we use to represent, discuss, and reason about mathematical programming models. The embedded language (or languages, since there can be more than one) models (model) parts ofthe target language(s). In terms of our functional presentation of MLR, level o objects include data, models, variables, units of measurement (e.g., kg); level 1 objects (functions) include arithmetic functions, algorithms/solvers, the dim function (determines unit of measurement of a given modeling variable); and level 2 objects include meta-level functions in the three categories discussed in §2.1. We have applied embedded languages and meta-level inference in building a computer-aided modeling system TEFAj the details of how we did so and what we gained are described in [4). An overview of a version of our computer-aided modeling system can be found in [25).


6 DISCUSSION AND EXAMPLES

We conclude our essay on the application of embedded languages to computeraided modeling with a set of examples that support our motivating claims about embedded languages. Our examples are in two categories: a) examples that provide evidence that embedded languages is useful in building modeling languages and systems, and b) examples that illustrate inference and decoding in modeling systems, and which show the usefulness of embedded languages in enhancing the functionality of modeling systems. We begin with two examples in the first category.

1. Geoffrion's structured modeling framework [15] provides a powerful notation-independent methodology for developing a wide variety of models, and several modeling languages based on this framework have been proposed. In [10], Chari and Krishnan describe a logical reconstruction of structured modeling, and develop a logic-based language, L5M, for structured modeling. They apply the embedded languages technique to separate out the object-level and meta-level (which they model in Li) aspects of structured modeling. They also describe how additional meta-level information can be represented in the Li language. The ability to do meta-level reasoning in this reconstruction of structured modeling is facilitated by the explicit use of the embedded languages technique in the development of L5M.

2. While EMLs are good at capturing mathematical relationships among modeling variables, several data modeling languages are excellent at capturing qualitative relationships among modeling elements. Bhargava et al. [7] make a strong argument in support of the integration of data and algebraic modeling languages in mathematical modeling systems. They describe how such an integration can be achieved with the embedded languages technique in a way that is consistent with the use of existing data and algebraic languages. They also discuss how meta-level functions implemented in Li provide support for model rationale, model formulation, and data integrity.

With our second category of examples, we illustrate meta-level decoding and inference, finally leading up to a paradigmatic case of meta-level inference in computer-aided modeling.

1. Meta-level decoding.

40 CHAPTER 2

(a) A very simple example of a second-order question is: Given an endogenous variable in some model, which other models use that variable exogenously? This can easily be answered with meta-level decoding; indeed that is the case in TEFA [4], a modeling system built using the embedded languages technique. We are unaware of any other modeling system able to answer this question.

(b) Our second example involves automatically checking model expressions for dimensional consistency. Again, this feature is absent in most modeling systems which neither have the information nor the functions required to verify dimensional consistency. A language for dimensional analysis, Ldim , is described in [6], and the embedded languages technique is used to implement this feature as a function in TEFA's embedding language. Similarly, automatic checking of other validity information, in the context of model integration, is described in [5].

2. Meta-level inference.

(a) In developing our concept of embedded languages and in implementing TEFA we were especially influenced by the impressive performance reported for PRESS [27], a Prolog Equation Solving System, which makes extensive use of meta-level inferencing. The following passage eloquently describes the flow of control (at the meta-level) for PRESS, and provides a strong intuitive sense for why such techniques work well, i.e., for why they can greatly reduce the search space.

We ... use the term heuristic waterfall to describe the control flow of PRESS. The waterfall consists of a number of methods [object-level interpreters]. At the top of the waterfall, PRESS checks to see if the equation is already solved. If it is, PRESS returns the answer and the equation is removed from the waterfall. Otherwise, the equation is passed over the waterfall. On the way down, the PRESS methods try to transform the equation. If a method succeeds in transforming the equation, the new equation is sent to the top of the waterfall and the process is repeated. If a method such as Change of Unknown creates more than one equation, all such equations are sent to the top. If a method fails to transform the equation, the equation falls to the next level where the next method is tried. The process terminates with success when there are no more equations to be processed. If an equation falls right through the waterfall, i.e. no method can transform the equation, PRESS backtracks. . .. Finally, if

On Embedded Languages

all possibilities have been tried, and equations still remain in the waterfall, the process terminates with failure, i.e. PRESS fails to solve the equation. PRESS tries the methods in the order Isolation, Factorization, Polynomial Methods, Change of Unknown, Collection, Attraction, Trigonometric Methods, Logarithmic Methods, Homogenization and Nasty Function Methods. [27, pages 29-30]

41

We see illustrated in this passage the use of meta-level inference for defeasible reasoning about equation solving. Further, as reported by Silver [27], although PRESS's performance was impressive, it was improved substantially by the addition of a learning module, LP (learning PRESS). The meta-level inferential architecture was materially usefully in facilitating the adding on of LP.

(b) There is a significant literature (see e.g.,[19], [21], [22], and the other references mentioned below) on the diagnosis of infeasibilities in mathematical programming models. In general, infeasibility diagnosis is a non-trivial task, and automated assistance for it is certainly desirable in a computer-aided modeling system. In the case of linear programming models, Gleeson and Ryan [18] describe an efficient method for identifying a minimally infea6ible subsylltem (also called i7Tedu.cibly inconsistent system or liS [30]) of constraints, "a subsystem of Az ~ b that is infeasible, but which could be made feasible by dropping any equality from it." It is useful to identify such systems (rather than just any infeasible sets of constraints) since these systems can be used to determine the smallest number of constraints that must be dropped (or modified) to result in a feasible model.

Chinneck and Dravnieks [11] improve previous methods for finding IISs, and describe three filtering techniques for use in identifying IISs. In terms of the embedded languages framework, each of these three would be functions in an L t language. The filtering techniques (analogous to the methods in PRESS) have different properties, and therefore the authors suggest that a judicious combination would work best. The "deletion filter" guarantees identification of all functional (i.e., other than the non-negativity) constraint in exactly one lIS but is computationally expensive. The "elastic filter" efficiently eliminates non-lIS functional constraints from large models. The "sensitivity filter" can speed up the identification of lIS constraints, but is not guaranteed to find a full lIS. Due to these properties, Chinneck and Dravnieks suggest integration of these techniques, and that the final filtering process would depend "on whether the goal is to iden-

42 CHAPTER 2

tify a single lIS as quickly as possible, or to identify as many IISs at reasonable cost." It is not within the scope of this paper to describe how a strategy for integrating the filtering methods would be implemented under the embedded languages framework, or even to describe the strategy itself. For example, elastic filtering efficiently eliminates non-lIS constraints, but it may not be wise to use it if the constraint set is already small enough. In any case, it should be evident that it would be simple and useful to define the strategy via declarations in a logical L T language rather than to program it as a first-order function or to embed this meta-level knowledge in a compiler. This control strategy would control the execution of the filtering process, and would determine which filter to use at each stage ofthe filtering process. Further, as was demonstrated with PRESS, abstracting the control strategy in this way facilitates the addition of learning modules.

In summary, we find that meta-level reasoning (both decoding and inference) is a clear and operationalizable concept, one that provides a number of important advantages for computer-aided modeling systems. Further, we have described an architectural approach, called embedded languages, that promises to be a general and powerful technique for implementing MLR systems.

Acknowledgements

Hemant Bhargava's work was partly funded under u.S. Coast Guard grant USCG Z 51100-2-E00735.

REFERENCES

[1] A. Akmajian, R.A. Demers and R.M. Harnish, LinguilticI: An Introduction to Language and Communication, 2nd ed., The MIT Press, Cambridge, Massachusetts, 1984.

[2] K. Bach and R.M. Harnish, Linguiltic Communication and Speech Acu, The MIT Press, Cambridge, Massachusetts, 1979.

[3] H.K. Bhargava and R. Krishnan, "Reasoning with Assumptions, Defeasibly, in Model Formulation," Proc. 25th Hawaii International Conference


on Syltem Science" Vol. 3, Jay F. Nunamaker, Jr., ed., IEEE Computer Society Press, Los Alamitos, CA, (January 1992), pp. 407-414.

[4] H.K. Bhargava and S.O. Kimbrough, "Model Management: An Embedded Languages Approach," Deci.ion Support Syltems 10:3, pp. 277-300, 1993.

[5] H.K. Bhargava, S.O. Kimbrough and R. Krishnan, "Unique Names Violations, a Problem for Model Integration or You Say Tomato, I Say Tomahto," ORSA Joumalon Computing 3: 2, pp. 107-120, Spring 1991.

[6] H.K. Bhargava, "Dimensional Analysis in Mathematical Modeling Systems: A Simple Numerical Method," ORSA Joumalon Computing 5: 1, 1993.

[7] H.K. Bhargava, R. Krishnan and S. Mukherjee, "On the Integration of Algebraic and Data Modeling Languages," Annaz., 0/ OR 38, pp. 69-95, 1992.

[8] J. Bisschop and A. Meeraus, "On the Development of a General Algebraic Modeling Language," Mathematical Programming Study, Vol. 20, pp. 1-29, 1982.

[9] P.B. Brazdil and K. Konolige, eds., Machine Leaming, Meta-Rea.oning and Logic" Kluwer Academic Publishers, Norwell, Massachusetts, 1990.

[10] S. Chari and R. Krishnan, "Towards a Logical Reconstruction of Structured Modeling," Decision Support Syltems, forthcoming, 1993.

[11] J.W. Chinneck and E.W. Dravnieks, "Locating Minimal Infeasible Constraint Sets in Linear Programs," ORSA Joumalon Computing 3:4, pp. 157-168, 1991.

[12] R. Fourer, "Modeling Languages versus Matrix Generators for Linear Programming," ACM Tran,action, on Mathematical Software 9:2, pp. 143-183, 1983.

[13] R. Fourer, D. Gay and B.W. Kernighan, "A Mathematical Programming Language," Management Science 36:5, pp. 519-554, May 1990.

[14] M. Genesereth and N. Nilsson, Logical Foundation, of Artificial Intelligence, Morgan Kaufman Publishers, New York, 1987.

[15] A.M. Geoffrion, "An Introduction to Structured Modeling," Management Science 33:5, pp. 547-588, 1987.

[16] A.M. Geoffrion, 'The SML Language for Structured Modeling: Levels 1 and 2" Operations Re.earch 40:1, pp. 38-57, 1992.

44 CHAPTER 2

[17] A.M. Geoffrion, "The SML Language for Structured Modeling: Levels 3 and 4" OperatioM RelJeGrch 40:1, pp. 58-75, 1992.

[18] J. Gleeson and J. Ryan, "Identifying Minimally Infeasible Subsystems of Inequalities," ORSA Joumal on Computing 2:1, pp. 61-63, 1991.

[19] F. Glover and H.J. Greenberg, "Logical Testing for Rule-Base Management," AnnallJ of OperationlJ RelJearch 12, pp. 199-215, 1988.

[20] H.J. Greenberg, "MODLER: Modeling by Object-driven Linear Elemental Relations," AnnallJ of OperationlJ Ruearch, Vol. 38, pp. 239-280, 1993.

[21] Greenberg, Harvey J., "Diagnosing Infeasibility in Min-Cost Network Flow Problems Part I: Dual Infeasibility," IMA Joumal of MaihematiclJ in Management 1, pp. 99-109, 1987.

[22] Greenberg, Harvey J., "Diagnosing Infeasibility in Min-Cost Network Flow Problems Part II: Primal Infeasibility," IMA Journal of MathematiclJ in Management 4, pp. 39-50, 1988.

[23] P. Jackson, H. Reichgelt and F. van Harmelen, Logic-BalJed Knowledge ReprelJentation, The MIT Press, Cambridge, Massachusetts, 1989.

[24] R. Jeffrey, Formal Logic: IttJ Scope and Limit" Third. Edition, McGrawHill, Inc., New York, New York, 1991.

[25] S.O. Kimbrough, C.W. Pritchett, M.P. Bieber, and H.K. Bhargava, "The Coast Guard's KSS Project," InterfacelJ, 20, no. 6 (November-December 1990), pp. 5-16. (Shortened version of: "An Overview of the Coast Guard's KSS Project: DSS Concepts and Technology," (S.O. Kimbrough, C.W. Pritchett, M.P. Bieber and H.K. Bhargava), TranlJactioM of DSS-90, Tenth International Conference on DeciIJion Support SYlJtemlJ, L. Volonino, ed., Boston, Massachusetts, May 21-23, 1990, pp. 63-77.)

[26] P. Maes and D. Nardi, eds., Meta-Level ArchitecturelJ and Reflection, North-Holland, New York, New York, 1988.

[27] B. Silver Meta-Level Inference, North-Holland, Amsterdam, Holland, 1986.

[28] D. Sperber and D. Wilson, Relevance: Communication and Cognition, Harvard University Press, Cambridge, Massachusetts, 1988.

[29] L. Sterling and E. Shapiro, The Art of Prolog: Advanced Programming Technique" The MIT Press, Cambridge, Massachusetts, 1986.

[30] J.N .M. van Loon, "Irreducibly Inconsistent Systems," European Journal of OperationlJ RelJearch 8, pp. 283-288, 1981.

3 MAPPING TASKS TO PROCESSORS

TO MINIMIZE COMMUNICATION TIME

IN A MULTIPROCESSOR SYSTEM

ABSTRACT

J aishankar Chakrapani and Jadranka Skorin-Kapov*

Environmental SylteTn6 Research Institute, Inc. 380 New York Street

Redlands, CA 92373-8100

"Harriman School for Management and Policy State University of New York at Stony Brook

Stony Broole, NY 11794

The problem of mapping tasks to processors in a multi-proce88or system in order to minimize communication time is addressed. The following assumptions about the problem are made. Communication among tasks follows a static patternj all processors are identicalj and all tasks are similar. We formulate the problem as a quadratic assignment problem (QAP). Two significant features of such a QAP are its large size and sparseness.

A heuristic algorithm based on tabu search is developed and implemented in parallel on the connection machine CM-2. In our parallel implementation two levels of parallelism are employed. First, the candidate tasks to be swapped are identified in parallel. Second, more than one pair of tasks are swapped in a single iteration. The computed effect of a single swapping is based on the assumption that no other swapping takes place in the current iteration. When performing multiple swaps, the cumulative effect of the swapping may not correspond to the sum of the individual effects. We show how this could lead to an inferior performance and illustrate the elements of our heuristic that makes it robust under these circumstances. Computations are performed on data of size up to 64000 tasks.

45

46 CHA PTER 3

1 INTRODUCTION

A typical parallel computation in a fine grain SIMD machine involves decomposing the application into several identical tasks that can be executed independently by different processors. It is often the case that these tasks need to exchange information among themselves. As a result, the parallel computation steps are interspersed with some inter-processor communication steps. We introduce the notion of task graph which describes the tasks that need to exchange information, and distinguish it from the communication graph which identifies the processors that need to communicate. The task graph is fixed with respect to a decomposition of an application, whereas the communication graph is a function of the mapping of the tasks to the processors.

The mapping problem is one of allocating tasks to processors in order to minimize the total communication time. In stating the problem we assume that the processors are identical and the tasks are similar. Therefore, any task can be mapped to any processor. We also assume that the number of processors equals the number of tasks. Note that this is not a restrictive assumption. If there are more processors, a set of "dummy" tasks can be added. Alternately, the case where there are more tasks could be treated as if virtually there are enough processors so that each task can be mapped to a single processor. We, therefore, distinguish between the physical nodes of the multiprocessor system and the processors themselves. Each node may contain (virtually or otherwise) more than one processor. Processors in the same node can communicate among themselves with minimal time spent in communication (assumed to be zero). Communication time for processors in different nodes is dependent on the architecture of the system. We approximate the communication time between two processors by the number of links that a message has to travel, and the total communication time by the sum of the individual communication times. This is only an approximation to the actual communication time because, typically, communication is done in parallel and other factors such as congestion affect the actual communication time. A true measure of the communication time would be the actual number of machine cycles it takes to complete the communication. However it is difficult to develop an objective function based on this measure, and approximations are used.

The problem as stated above can be thought of as a graph partitioning problem with weights between partitions. The nodes of the parallel system are the partitions with the weights being the inter-node link distances. The task graph is the graph that is to be partitioned to minimize the sum of the product of edges that go between partitions, and the respective partition weights. The problem

Mapping Tasks to Processors 47

can also be formulated as a quadratic assignment problem [3] where the interprocessor link distances constitute the matrix D, and the task graph constitutes the matrix F. Formally, the problem is to min1rED L:::l L:;:1 dtj!1r(i)1rU), where IT is the set of all permutations. Note that such a formulation is general enough to cover different architectures, different communication patterns, and task graph with edge weights.

The task graph is assumed to be static in the sense that the tasks that need to communicate do not change with time. The class of such applications include finite element analysis [4], and neural network simulations [10]. Since this would be a preprocessing routine to an application, emphasis should be given to fast algorithms. Traditional QAP solving heuristics do not take into account any special structure in the problem. Here it can be reasonably assumed that the task graph is relatively sparse. (If all tasks need to communicate to all other tasks, any mapping would give the same objective function value.) Therefore, a specialized algorithm is to be developed to solve this problem. Also since the application would run on a parallel machine, and the task graph may not be known until run time, parallel algorithms are preferable to reduce sequential bottlenecks.

The mapping problem described above has been a subject of numerous computational studies, both under regular and irregular communication patterns. An example of a regular communication pattern is when the task graph is a rectangular mesh. In such cases, the optimal mapping can be computed (see for ego [9l). This paper deals with irregular communication patterns where no assumptions are made about the "regularity" of the task graph. We refer the reader to a Ph.D. thesis by Hammond [7] for an exhaustive survey on prior work on the mapping problem. In this paper, we refer to the work by Dahl [2] and Hammond [7] both of whom use an objective function identical to ours. For the mapping problem, Dahl [2) has developed a heuristic based on simulated annealing. He has also developed a communications compiler that can be used to schedule the actual communication once a mapping has been obtained. Hammond [7] has developed a local search heuristic called CPE which provides its best results when given an initial solution constructed using recursive spectral bi-partitioning (RSB) [12]-& technique to partition graphs using eigenvectors of the Laplacian matrix associated with the task graph.

We develop a heuristic algorithm based on tabu search [5, 6] for the mapping problem. The heuristic is based on iteratively selecting, greedily, a pair of tasks and swapping the processors to which they are mapped to. The algorithm lends itself to very efficient sequential and parallel implementations. In our parallel implementation two levels of parallelism are employed. First, the candidate

48 CHAPTER 3

tasks to be swapped are identified in parallel. Second, more than one pair of tasks are swapped in a single iteration. The computed effect of a single swapping is based on the assumption that no other swapping takes place at the current iteration. When performing multiple swaps, the cumulative effect of the swapping may not correspond to the sum of the individual effects. We show how this could lead to an inferior performance, and illustrate the elements of our heuristic that makes it robust under these circumstances.

2 TABU SEARCH FOR THE MAPPING PROBLEM

A solution procedure for the mapping problem should be very fast, as it is really a preprocessing step to improve the running time of an application. Also, it should be implemented on the parallel machine on which the application code is to be executed. In addition to reducing the sequential bottleneck, this would increase the scope of the preprocessing to include quasi-dynamic communication patterns, where the communication pattern varies "slowly" with time.

In practical applications, the size of the task graph tends to be large. It is not uncommon in finite element applications to have finely refined meshes with tens of thousands of elements. Thus, the resulting QAP is very large in size and while applying tabu search, the successful pairwise exchange move [14, 1, 11] with a neighborhood of O(n2 ) becomes computationally too expensive to evaluate.

2.1 The Connection Machine CM-2

The basic architecture of the CM-2 is a hypercube [8]. A CM-2 with p processors has p/16 chips, and each chip contains 16 processors and a router to handle communication. These chips are arranged in a 10gp-4 dimensional hypercube, and each edge of the hypercube constitutes a single uni-directional data path for communication. In an alternate sprint chip communication model, there are 32 processors in each sprint chip. The sprint chips are organized as a log p - 5 dimensional hypercube, and each edge of the hypercube corresponds to one bidirectional data path for communication. In each model, the processors within each chip have their own local memory, and can communicate with each other with zero cost. We use the sprint chip model in our study.


The CM-2 system also supports virtual processing when the required number of processors for an application is greater than the number of processors available. Each processor/memory unit can be, virtually, sliced into many units providing the application with more than the physical number of processors. The ratio of the number of virtual processors to the number of physical processors is called the VP ratio.

In the context of the QAP formulation of the mapping problem, the matrix D can now be defined for the hypercube architecture. Each node in the hypercube (corresponding to each sprint chip) has an address in the range (0, (logp -5) - 1). The addresses of all the processors in the same node are identical. The D matrix entry for two processors is the hamming distance between their respective addresses i.e., the number of bits in which their addresses dift'er. Note that for processors in the same node this corresponds to a distance of zero, and for processors in different nodes this corresponds to the shortest number of links a message has to travel to reach one node from the other. The F matrix is the adjacency matrix of the task graph.

2.2 Tabu Search Algorithms

A Sequential Algorithm

The pairwise exchange move for the tabu search algorithm would normally involve any two processors exchanging their tasks. We restrict the scope of the pairwise exchange in order to reduce the neighborhood size, and refer to this move as a Iwap. In a swap move two processors can exchange their tasks only if their addresses dift'er by exactly one bit. This reduces the neighborhood to a size of O(nlogn), where n is the number of (virtual) processors. For example, for the case of a 3 dimensional hypercube, a processor with address 0 (000 in binary representation) can swap tasks with a processor whose address is either 1 (001), 2 (010), or 4 (100).

The inter-processor distances (defining the distance matrix) are functions of their addresses, and it is possible to reformulate the QAP in terms of the processor addresses. Let a(i) refer to the address, and a.(i) refer to the b-th bit of the address of processor i. Now the objective function can be rewritten as

50 CHAPTER 3

where dim denotes the dimension of the hypercube and E9 denotes the binary exclusive or operation.

A simple tabu search algorithm can be developed based on the swap moves as follows.

Algorithm seq_alg:

step 0) set initial mapping and initialize tabu list to empty. step 1) for b = 1 to dim do

evaluate best pair of procelllloril with. addrellsell differing in bit b to swap tallies Ilubject to tabu relltrictions

end for dep 2) evaluate bed pair of procell.or. to .wap talJles from dep 1 Iltep 3) swap tallies correlJponding to th.e bed processors, and update tabu list step -4) if some ending criteria is reached IJtop

else go to dep 1 end if

Tabu list is maintained circularly by keeping track of the tasks swapping addresses and the corresponding bits in which the addresses change. In other words, two tasks changing their addresses in bit b are not allowed to change their addresses in bit b again for tabu sise number of iterations.

Implementation Details and a Parallel Algorithm

Implementing step 1 of the algorithm involves finding, for each bit b, the best pair of processors (whose addresses differ in bit b) to swap their tasks. Note that once the bit b is fixed, the addresses of the processors that can swap tasks are fixed. For example, for a 3 dimensional hypercube, there are processors with addresses in the range (0,7) and if b = 1 (the lower order bit) processors with addresses 0 and 1 can swap their tasks. Similarly, processors with addresses 2 and 3; 4 and 5; 6 and 7 can swap their tasks. However, there are 32*VP ratio processors that have the same address. In the worst case, finding the best processors with addresses 0 and 1 to swap tasks could involve checking every possible pair of the 32*VP ratio processors with address 0 and the 32*VP ratio processors with address 1.


Let al and a2 be two addresses that differ only in the b-th bit. Denote by valb(ad, the best move value due to a processor i with address a(i) = al

swapping its task with some processor with address a2. Similarly, let valb(a2) be the best move value due to a processor j with address a(j) = a2 swapping its task with some processor with address a2' The move value due to processors i and j swapping tasks is given by

(1)

The third (resp., fourth) term in the expression is present because, valb(i) (resp., valb(j)) does not take into account the fact that the task '/I'(i) (resp., '/1'(;)) may have to communicate with the task '/1'(;) (resp., '/I'(i». Due to the sparseness ofthe task graph, the probability that tasks '/I'(i) and '/1'(;) communicate is low and therefore, most of the time processors i and j would be the best processors with addresses al and ~ to swap their tasks. In effect, the third and the fourth term in equation (1) can be ignored, simplifying the implementation of step 1 in two ways. First, the complexity of performing step 1 is reduced since finding the best pair of processors with addresses at and a2 involves just finding the best processors with the best move value individually in each address. Second, specifically in parallel implementations, an extra communication step that would be involved in finding out if there is an edge in the task graph corresponding to the tasks mapped to two processors is skipped. An identical scheme is used by Dahl in his work using simulated annealing for the mapping problem, and he reports that the annealing scheme works quite well [2, page 760]. Note, however, that ignoring some terms in equation (1) results with only approximate calculation of the move values. In the computational experiments we investigate the effect of such approximations when using a deterministic tabu search heuristic, and ways to overcome their negative impact.

When implementing our heuristic in parallel, it is possible to perform multiple moves at the same iteration. Since the task graph is sparse, swapping of a single pair of tasks will affect very few other move values, and most of the previously improving moves would still he improving. Consequently, all improving swaps can be made in a single iteration. A parallel tabu search algorithm performing multiple moves in a single iteration is given below.

step 0) set initial mapping and initialize tabu list to empty step 1) for b = 1 to dim do

evaluate the best procellllor to swap tasks with some other

52 CHAPTER 3

procellJor who,e addrell differ, only in bit b end for

ltep14) for b = 1 to dim do if there are improving ,wap'

evaluate the sum of the value, of all improving ,wap' eye evaluate the ,um of the value, of all the lealt

non-improving ,wap' end if

end for step 3) find the belt bit with the belt cumulative ,wap value,

computed in 'tep ! Itep 4) if there are improving ,wap'

perform all improving ,wap' corresponding to the belt bit computed in step 3

eye perform aillealt non-improving ,wap' corre'ponding to the belt bit computed in ,tep 3

end if 'tep 5) update tabu lilt and mapping information Itep 6) if ,ome ending criteria iI reached ,top

eye go to dep 1 end if

2.3 Preliminary Computational Results

Algorithms ,eq_alg_e and par_alg..e are implemented and tested on two problems. Note that the seq_alg_e does not compute the correct value due to a single swap and the par_alg_e in addition performs multiple swaps in a single iteration adding on another level of approximate evaluations of the move values. The results are compared with those obtained by ,eq_alg, in which the move values are computed correctly and a single swap is performed in one iteration. The results are given in Table 1. 3eU has 8192 and grid1 has 16384 tasks (including dummy tasks), and both problems were mapped to a 256 node hypercube architecture. Both ,eq.-a.lg_e and par_alg_e obtain results that are inferior to ,eq_alg.

The results produced by ,eq_alg is comparable to those produced by Dahl's simulated annealing heuristic [2] and Hammonds CPE heuristic [7]. Thus, tabu search is a competitive heuristic for the mapping problem. However, the par_alg_e produces inferior results. There are two sources of errors associated with par_alg_e. First, the values of the swaps are not correctly com-


puted. Second, the cumulative effect of performing multiple swaps does not correspond to the sum of the individual swap values. Both sources of errors are present in Dahl's simulated annealing heuristic [21, and Hammonds CPE heuristic [7]. However, our tabu search algorithm is neither probabilistic (like simulated annealing), nor has a constructed initial solution (like CPE), and lacks the robustness of both simulated annealing and CPE. The inferior performance of par_alg_e can therefore be attributed to the two sources of errors in the computation of move values. This assumption is further substantiated by the performance of 8eq_alg_e, where only one swap is performed in one iteration, and consequently the error associated with move evaluation is lesser.

Finally, the para.llel implementation takes about 30 seconds to perform 1000 iterations on a CM-2 with 16,384 processors. Comparatively, simulated annealing and CPE provide results comparable to those provided by 8eq_alg within 60 seconds. Therefore, in addition to providing inferior results, the para.llel algorithm takes too much time to be useful. There is one observation to be made in favor of par_alg_e. It initia.lly improves the objective function value very fast exploiting para.llelism by performing a lot of swaps in each iteration. However after the first few local minima have been encountered, the errors seem to take over and there is very little improvement in the objective function. The number of swaps performed in a single iteration drops to within single digits, and the algorithm performs in an inefficient and essentia.lly sequential manner. Referring to step 4 of the para.llel algorithm it can be observed that, after a local minimum is reached, the algorithm would perform only the least non-improving swaps. If there are only a few swaps with the least non-improving value, the algorithm would perform only those swaps. For a para.llel tabu search heuristic to be useful it has to (1) exhibit more robustness when performing multiple moves and (2) exploit para.llelism even further.

3 ROBUST PARALLEL TABU SEARCH

ALGORITHM

In view ofthe inferior performance of par_alg_e, we decided to investigate some other strategies. Reca.ll that when calculating the value of moves with errors, the algorithm might perform a series of moves (a.ll seemingly improving), but de facto the moves will not result in a better objective function value. Therefore, without any diversification mechanism to "pull" the search out of the "flat" neighborhood, and with the errors present in the evaluation of the value of the move, the strategy is not promising.

54 CHAPTER 3

The chief objective of diversification is to drive the search to new regions. One way of doing that is by keeping track of the explored region of the solution space by maintaining information about the solutions visitied so far (frequency based diversification). Alternately, if highly non improving moves are performed, the search would quickly leave the current solution space and diversify to new areas. Our strategy is to set a threshold and perform swaps whose move values is below the threshold providing a quick mechanism for diversification requiring no additional computations. If the threshold is set high enough, a lot of non improving moves would also be performed (along with improving moves) in every diversifying iteration. We remark that our diversification strategy is simple, fast, performs a large number of moves (even in the non-improving phase) exploiting parallelism to a greater extent, and provides means of employing various levels of diversification by simply varying the threshold. A parallel tabu search algorithm called par-.alg_r incorporating such a diversification is developed.

The algorithm alternates between two phases viz., normal and diversification. The normal phase of the algorithm is identical to par_alg_e. If there are no improvements in the objective function value for iter_nphase number of iterations, the algorithm enters the diversification phase. The diversification phase is performed for a set iter_dphase number of iterations after which the algorithm switches back to normal phase. The inputs to the algorithm are the tabu size, iter_nphase, iter_dpha,e, and the threshold value used in the diversification phase. A description of the algorithm follows.

'tep 0) ,et initial mapping and initialize tab'll. lilt to empty .et .earch pha.e to normal

,tep 1) for b = 1 to dim do evaluate the belt procellor to .wap task. with .ome other proce"or who.e addre" differs only in bit b

end for 'tep 2) for b = 1 to dim do

if search phase iI normal if there are improving swaps

evaluate the sum of the values of all improving swaps el.e evaluate the ,um of the values of all the least

non-improving swap. end if

Mapping Tasks to Processors

end if if search pha8e is diversification

evaluate the sum of the values of all swaps with values less than a threshold

end if end for

step 3) find the best bit with the best cumulative swap values computed in step !J

step 4) if search phase is normal if there are improving swaps

55

perform all improving swaps corresponding to the best bit computed in step 3

else perform all least non-improving swaps corresponding to the best bit computed in step 3

end if end if if search phase is diversification

performing all swaps with move values less than the threshold end if

step 5) update tabu list and mapping information step 6) if search phase is normal

if no improvement in objective function for iter_nphase number of iterations set search phase to diverlJification

end if end if if search phalJe is diverlJification

if iter_dphase number of diversification iterations have been performed set search phase to normal

end if end if

step 6) if 1J0me ending criteria is reached stop else go to step 1 end if

The parameters iter_nphase, iter_dpha8e, and diversification threshold control the level of diversification. The value of iter _nphase controls how quickly the algorithm diversifies; the value of iter_dphase controls the length of diversification; and the threshold controls the level of diversification. We tried various levels of diversification and the algorithm seemed to perform better as the extent of diversification increased. In other words, there was no merit (with

56 CHAPTER 3

respect to solution quality and total computational time) in either waiting for a large number of iterations before invoking diversification or in having a low threshold. Finally, we set the values as follows. Iter_nphrue was set to 2dim, resulting in both a quick diversification and a smaller number of iterations during which very few swaps are performed. The threshold was set to an arbitrarily high value. Iter J-phfUe was set to 2dim after empirical testing of a few different values. The rationale behind trying low values for this parameter is that, since the threshold is very high and allows all possible swaps during the diversification phase, too many iterations are not required for the search to diversify.

Par_alg_r with the parameter settings described above, relies heavily on diversification. Also since the performance of the algorithm improved with the extent of diversification, we incorporated a second level of diversification very similar to the first one. When there is no improvement in a longer number of iterations, the algorithm performs a larger number of diversification iterations. More specifically, the second level of diversification is invoked for 4dim iterations (with the same threshold) if there is no improvement in 20dim iterations. Note that there would have been multiple invocations of the first level of diversification before the second level is employed.

4 COMPUTATIONAL RESULTS

The computations were organized into several groups to extensively test the robustness of par _alg_r and the applicability of tabu search as a heuristic to the mapping problem. The results are compared with those obtained by simulated annealing [2] and CPE [7]. Recall that the objective function value alone will not provide a fair evaluation of the heuristic, since the original problem is to minimize the communication time. We performed all computations on the connection machine CM-2, as there was a communications compiler available to measure the total number of message cycles to perform the communication for any mapping. A message cycle measures the time it takes to move a 32-bit word across the wire connecting hypercube nodes in the sprint chip mode. (Note that since the wires in this model are bi-directional in one message cycle two 32-bit words can move across a single wire.) Also, when mapping a problem on to CM-2, the number of (virtual) processors configured for the application is required to be a power of 2. This results in the creation of dummy tasks for all the test problems. When minimizing an objective function value, it is possible to have mappings that are not load balanced. In other words, certain


hypercube nodes may have only dummy tasks assigned and none of the actual processors in the node would perform any useful computation. Though this does not affect the total computational time for CM-2, for systems that do not have "the power of 2" processor requirement and for MIMD systems, a load balanced mapping would be better. Even in CM-2 it is possible that a highly non load balanced mapping results in more congestion increasing the communication time, whereas a load balanced mapping might result in a lesser communication time albeit a higher objective function value. The heuristic was tested both with and without load balancing requirements.

Recall that par_alg_r has a high level of diversification. It can be expected that the actual tabu size does not have much effect on its performance. Two tabu sizes of 2dim and 4dim were tried for all the problems.

Finally in order to measure the dependence of the heuristic on the starting solution, par_alg_r was tested on two kinds of starting solutions. The first is a naive initial mapping of task i to processor i. The second is a mapping produced by the RSB algorithm developed by Pothen, Simon, and Liou (12). CPE is a local search heuristic that performs best when provided a starting solution constructed using the RSB algorithm. Since, RSB is expensive it is desirable for a heuristic to function well without such a starting solution. Since the code to compute RSB works only on undirected graphs, par_alg_r was tested with RSB starting solutions only for all test problems with undirected task graphs.

The test problems are a combination of some of the problems tested by Dahl [2] (3elt), Hammond [7] (3elt, bump, -Ielt) and Shapiro [13] (gridl, grid!!). Among those gridl, grid!! have directed task graphs. Also, we do not have results from simulated annealing or CPE for these problems.

The computations were performed on CM-2 and the algorithm was coded in C*-a parallel extension of C. The problems were tested with combinations of naive and RSB starting solutions; with and without load balancing requirements; tabu sizes of2dim and 4dim; performing 2000, 5000 and 10000 iterations of tabu search. All problems were mapped to a 8192 processor (8 dimensional) hypercube configuration of the CM-2. The results are organized in the following tables. Table 2 provides, for all the test problems, information about the size, and starting objective function values and total message cycles for both naive and RSB starting solutions. Table 3 (resp., Table 4) provides results without (resp., with) load balancing requirements. The times reported are wall clock CM-2 times (in seconds) on a 16K machine and account for more than 95% of the total computation time. Finally, Table 5 compares the results from tabu search heuristic with simulated annealing and CPE.

58 CHAPTER 3

From the tables, it is clear that the results do not depend on the tabu size. The objective function values are better when solved without load balancing as can be expected. However, the number of message cycles are consistently lesser for load balanced mappings. This can be explained intuitively, since load balancing spreads out the tasks evenly on all hypercube nodes reducing the congestion in wires.

Though the objective function values are lower, RSB starting solution does not significantly change the number of message cycles. In fact, the tabu search heuristic without the RSB starting solution achieves mappings with message cycles that nearly equal the message cycles obtained using RSB starting solution, in the same time frame. Thus, the overhead of computing RSB starting solutions can be avoided without either compromising solution quality or spending more time. This is particularly significant since a parallel algorithm to compute the RSB has not been developed, and RSB is computationally expensive (3elt takes 2 minutes, 4elt 20 minutes and a graph with about 50,000 vertices and 200,000 edges takes about 60 minutes on a Sun SPARC 2).

The benefits of mapping increase with the size of the problem. For grid2 the largest problem solved in this study, applying the tabu search heuristic for 2000 iterations results in a reduction of message cycles by a factor of about 25. In general, the number of cycles is reduced as more iterations are performed.

Table 5 list the results from simulated annealing and CPE. It has been reproduced from Hammond's thesis [7] and it represents all the results we have on the test problems for comparison purposes. The results for CPE are provided for the cases of naive and RSB starting solution. RSB CPE is a sequential algorithm and its running times are not provided. However, it does require a starting solution from RSB and it stops at the first local optimum encountered. As a general comparison, tabu search does not require a good starting solution and therefore, avoids the overhead of having to compute a starting solution. Tabu search also goes beyond local optimality and has the potential to provide better solutions when one is willing to spend more time.

Finally we would like to reiterate that, solving the mapping problem is only a pre-processing step. The efficiency achieved in solving the original problem whose decomposition led to the mapping problem, ultimately determines whether the effort spent in pre-processing is worthwhile or not. We did not perform computations with any of the test grids to assess the ultimate merit of solving the mapping problem. However, we would like to refer the reader to Steve Hammond's thesis [7] for more information about the relative improvement in total computation time that can be expected by mapping tasks to


processors intelligently. Hammond performs several computations for a variety of test problems, including 3-elt and 4-elt, and presents results which show marked improvement in the total computation time.

5 CONCLUSIONS

In this paper we addressed the problem of mapping tasks to processors in a multi-processor system in order to minimize the time spent in inter-processor communication. The problem is approximated by a very large QAP with sparse flow matrix. Based on extensive computational experiments, a parallel heuristic algorithm par_alg_r with aggressive diversification seems to be appropriate for the problem. It is robust with respect to approximate evaluations during its course, and obtains improved solutions to all problems attempted in this study. Also, due to its robust parallel implementation, this algorithm can be used to develop heuristics for "quasi-dynamic" communication patterns, where the task graph changes slowly with time.

The QAP formulation of the problem is general enough to cover different architectures and weighted task graphs. Though communication time does not always decrease with a reduction in the QAP objective function value, it is still a good approximation to the actual communication time. Heuristics (at least for simpler architectures) that work on message cycles, the direct measure of communication time, would be an interesting field of study.

Acknowledgements

We thank Steve Hammond for providing the code to compute RSB and some of his data files. We also thank Denning Dahl and Richard Shapiro for providing data. This work was supported in part by NSF grant DDM-8909206. The computational work was conducted using the computing resources of the N orthEast Parallel Architectures Center (NPAC) at Syracuse University, Syracuse, NY; UMIACS College Park, Maryland; and Thinking Machines Corporation, Cambridge, Massachusetts.

60 CHAPTER 3

REFERENCES

[1] J. Chakrapani and J. Skorin-Kapov, 1993. Massively parallel tabu search for the quadratic assignment problem. Annals of Operations Research, 41: 327-341.

[2] D.E. Dahl, 1990. Mapping and compiled communication on the connection machine system. In Proceedings of the Fifth Distributed Memof"!J Computing Conference, 756-766. IEEE Computer Society.

[3] G. Finke, R.E. Burkard, and F. Rendl, 1987. Quadratic assignment problems. Annals of Discrete Mathematics, 31:61-82.

[4] G. Fox et al, 1988. Solving Problems on Concurrent Processors. Prentice Hall.

[5] F. Glover, 1989. Tabu search - part I. ORSA Journal on Computing, 1(3):190-206.

[6] F. Glover, 1990. Tabu search - part II. ORSA Journal on Computing, 2(1):4-32.

[7] S.W. Hammond, 1992. Mapping Unstructured Grid Computations to Massively Parallel Computers. PhD thesis, Rensselaer Polytechnic Institute, Troy, New York.

[8] W. D. Hillis, 1985. The Connection Machine. The MIT Press.

[9] C.T. Ho and S.L. Johnson, 1989. Embedding meshes in boolean cubes by graph decomposition. Journal of Parallel and Distributed Computing.

[10] Behzad Kamgar-Parsi, J.A. Gualtieri, J.E. Devaney, and Behrooz KamgarParsi, 1990. Clustering with neural networks. Biological Cybernetics, 63:201-208.

[11] J.P. Kelly, M. Laguna, and F. Glover, 1991. A study of diversification strategies for the quadratic assignment problem. to appear in Computers and Operations RelJearch.

[12] A. Pothen, H.D. Simon, and Kang-Pu Liou, 1990. Partitioning sparse matrices with eigenvectors of graphs. SIAM Journal of Matriz AnalylJis and Application" 11(3):430-452.

[13] R. Shapiro, 1992. Private Communication.

[14] J. Skorin-Kapov, 1990. Tabu search applied to the quadratic assignment problem. ORSA Journal on Computing, 2(1):33-45.

Mapping Tasks to Processors

Table 1 Pcrl'ormance of straightforward sequential and parallel tabu search algorithms

Problem Objective function after 80000 iterations

seq-&lg seq..a.lg_e par-&lg_e

3elt 10146. 13152 16926 grid 1 18544 23406 28094

Table 2 Deac:ription of Test Problems

Problem Vertices Edges Obj. fn. value Tot message cycles Naive RSB Naive RSB

3elt 4720 27444 45482 15200 36 14 bump 9800 57978 93254 20632 48 199 4elt 15606 91756 116474 27506 46 19 grid 1 7788 27264 239306 32 grid2 58850 212400 1965010 1033

61

62 CHAPTER 3

Table a Results: No load balancing

Prob. Tabu No. Obj. fn. value Num. msg. eycles Time (Sees) size Iter. Naive KSB Naive RSB Naive RSB

3e1t 16 2000 10080 9142 8 7 105.384 104.610 5000 9204 8260 8 7 265.386 264,012

10000 8546 7 533.159 32 2000 10484 8922 8 7 105.517 105.273

5000 8932 8214 7 7 265.975 264.835 10000 8462 7 531.953

bump 16 2000 23810 16356 14 13 116.186 118.669 5000 17274 14708 13 12 292.735 294.483

10000 14806 11 589.325 32 2000 24706 16000 14 13 118.965 118.545

5000 17684 14430 13 11 287.517 295.278 10000 14940 10 589.953

4elt 16 2000 34236 23612 16 13 117.003 118.574 5000 26826 22722 14 13 293.050 294.498

10000 23742 12 580.231 32 2000 34518 23524 18 14 116.933 118.471

5000 26296 22744 15 11 292.845 294.548 10000 23222 11 586.623

grid 1 16 2000 19122 9 135.926 5000 15990 9 340.613

10000 15146 8 683.230 32 2000 19158 10 134.739

5000 15843 8 337.753 10000 15064 8 677.665

grid2 16 2000 150478 40 488.314 5000 122628 36 1206.250

10000 102982 31 2400.230 32 2000 151135 40 489.139

5000 129338 36 1207.640 10000 101794 32 2397.320


Table 4 Results: Load balancing

Prob. Tabu No. Obj. £n. value Tot. msg. cycles Time (Sees) size Iter. Naive RSB Naive RSB Naive RSB

3elt 16 2000 13346 12656 7 8 107.394 107.504 5000 12542 12082 7 7 269.137 268.853

10000 11926 7 538.585 32 2000 13686 12428 7 8 107.139 107.627

5000 12718 12428 8 8 268.255 268.950 10000 12062 6 536.534

bump 16 2000 31286 19842 12 14 118.712 120.085 5000 22202 18714 9 9 298.839 299.226

10000 19556 9 599.624 32 2000 29630 19296 12 11 118.848 120.248

5000 21586 18664 11 9 298.831 299.484 10000 18886 9 600.114

4elt 16 2000 38066 25342 17 13 116.212 118.218 5000 29858 24808 14 12 291.719 293.933

10000 26720 11 584.922 32 2000 35490 25344 17 13 116.212 118.063

5000 28814 24884 15 11 291.511 293.988 10000 26454 13 585.122

grid1 16 2000 19937 10 134.085 5000 16345 8 337.075

10000 15500 9 681.914 32 2000 19506 9 134.371

5000 16302 9 337.066 10000 15064 9 677.051

grid2 16 2000 153151 40 485.865 5000 129048 36 1203.490

10000 102982 31 2400.230 32 2000 153123 40 483.194

5000 132196 35 1208.160 10000 107548 31 2406.560

64

4elt

CHAPTER 3

Table 5 Comparison of different heuristics

Message eycles Sim. a.nn. Naive CPE RSB CPE

8 17

10 16

7 12

Time (Sees) Sim. ann. Naive CPE

67 382

53 94

4 REFINEMENTS TO THE SO-CALLED

SIMPLE APPROXIMATIONS FOR THE

BULK-ARRIVAL QUEUES: MX /G/l

ABSTRACT

Mohan L. Chaudhry

Department of Mathematic. and Computer Science Royal Military College of Canada

King.ton, Ontario, Canada, K7K 5LO

For the MX/G/l queue, this paper gives explicit closed-form expressions in terms of roots ofthe so-called characteristic equation (CE) and approximations for the tail of the steady-state queueing-time distribution of a random customer of an arrival group. It is shown that the approximation improves if more than one root, in decreasing order of the negative real parts, is used, leading eventually to exact results if all the roots are used. Further, it is shown that our approximations are much simpler and, in general, more efficient to implement numerically than those given by Van Ommeren. Numerical aspects have been tested for a variety ofservice-time distributions including generalized distributions such as Coxian-A: (CII) with complex phase rates. Samples of numerical computations are also included in the form of tables and graphs. It is hoped that the results obtained should prove beneficial to both practitioners and queueing theorists dealing with bounds, inequalities, approximations, and other aspects.

1 INTRODUCTION

In a recent paper, on the assumption that the service-time distribution has a rational Laplace-StieItjes Transform (L-S.T.), ratio of a polynomial of degree at most (AI - 1) to a polynomial of degree AI, Chaudhry and Gupta [3] discuss the queueing-time distribution of a random customer of an arrival group in the MX/G/l queue. While customers belonging to different batches are served in order of arrival, customers belonging to the same batch are served in random order independent of their service times. The queueing time of a randomly selected customer of an arrival group, Wq , is equal to the queueing time of

65

66 CHAPTER 4

the first customer of the group, Wq" plus Wg, the time necessary to serve the customers ahead of him in his own group. Chaudhry and Gupta [3] discuss the steady-state distribution of Wq by considering the distributions of Wq, and Wg , separately, in terms of roots of the so-called characteristic equations (CE's). They neither give the explicit analytic form of the cumulative distribution function (C.D.F.) Wq(t) of the waiting time Wq, nor its moments, nor an approximate expression for the distribution of Wq, . It may also be pointed out that the procedure discussed here for numerical calculations of the distribution of Wq is much simpler than the one given in Chaudhry and Gupta [3]. For more information on this, see Section 3.

For the queueing system MX /G/1, Van Ommeren [12] gives two approximations for the distribution of queueing time Wq• These approximations which involve one and three (or two) exponential terms are called the first- and second-order approximations, respectively. He further states that since it is difficult to find the poles of the L-S.T. of the waiting-time distribution Wq(t), the constants used in the approximations are obtained by matching the exact explicit results for the delay probability, the derivative of Wq(t) at t = 0 and the first two moments of Wq against the corresponding results of the approximate wait in queue. The pole with the largest negative real part, which is simple and real, gives the first-order approximation. The poles that have the second largest negative real part(s) lead to the second-order approximation. By calculating a number of constants and establishing certain relationships among them, he suggests how to use either approximation in a particular situation. Further, he also states that the first-order approximation does not perform very well in light traffic, and, as an example, mentions that it does not give good results when Gis E,. (k-Erlang with k = 10) even with traffic intensity p = 0.2. The second-order approximation is shown to work much better than the first-order approximation even for small values of t and for a wide range of values of p, and hence it is called a refinement over the first-order approximation. It may be stated that though he does not use roots, his one- or three-exponential-termsapproximation assumption is, in fact, based on one or three roots. He gives some numerical results by taking service-time distribution as either E,. (k = 2 and 10) or hyperexponential of order 2 (HE2 ), and batch-size distribution as constant, uniform, geometric, or mixed geometric with balanced means. He considers cases when the expected values of X are taken small such as 2 and 5.

For the queue MX/G/l, we present much simpler approximations in compariBon to Van Ommeren's so-caIled simple approximations for the tail probabilities of the queueing time Wq of a random customer (Section 4). These approximations, which are based on roots of the CE, form hierarchical results and are also easy to understand and implement numerically. They improve if more

Refinements to Approximations for Queues 67

than one root, in decreasing order of the negative real parts, is used, leading eventually to exact results if all the roots are used. On the other hand, it may be remarked that improving Van Ommeren's approximations will require matching more moments. This, in turn, would lead to tedious algebra involving more constants. Moreover, for the queue M x / D /1, while our approximations can easily be implemented by considering the queue M x / E,. / 1 with Ie large (for more information, see Section 6), it may be hazardous, as stated by Van Ommeren, to use his approximations, particularly when the traffic intensity is low. In view of these observations, we can say that our approximations are refinements over Van Ommeren's so-called simple approximations. Further, in particular when p :: 1, the one-root approximation corresponds to the one given in Kleinrock [8] for GI/G/l. An analytic expression for the initial approximation of the smallest root of the CE, that is negative, is given in Appendix B. The tail of the distribution of Wq" the queueing time of the first customer of an arrival batch, is also discussed.

The results have been obtained for the service-time distribution belonging to the class R, which has a rational L-S.T., a ratio of a polynomial of degree at most Ie to a polynomial of degree Ie. This class is more general than the class R,. discussed by Botta, Harris and Marchal [1], which contains generalized hyperexponential (GH,.) , mixed-generalized Erlang (MGE/c), generalized Erlang (GE,.), PH (phase type) and K,. (distribution functions whose L-S.T.'s are reciprocals of polynomials of degree Ie). Their class R,. includes all those distributions that have rational L-S.T.'s, ratios of a polynomial of degree at most (Ie - 1) to a polynomial of degree Ie. The class R, discussed by us, includes even the most general distributions discussed by Cox [6], which, in the literature, are referred to as Coxian (C,.). The Coxian distributions have their own importance as any distribution having a rational L-S.T. can in practice be approximated as closely as one wishes by a C/c distribution for a given C~, the square of the coefficient of variation for a random variable (r.v.) V. This makes the model applicable to almost any service-time distribution that may arise in practice.

To find various roots, we use the Chaudhry QROOT software package [2]. Further, to see how good our approximations are, extensive computations were done on a COMPAQ-286 PC in double precision, but only a few results are appended here in the form of tables and graphs. For the purpose of comparing our approximations against those of Van Ommeren's, computations have been done and presented for the same data as taken by him.

68 CHAPTER 4

2 THE MODEL

Customers arrive in batches of random size according to a Poisson process with rate ,\. The arrival batch-size X is a r.v. with P(X = m) = 11m, m = 1, 2, ... , I and E(X) = i1. The probability generating function (p.g.f.) of {amH is denoted by A(z) := L:~=1 a.;zi. While the arriving batches are served in order of arrival, customers belonging to the same batch are served individually in random order by a single server. Let the r.v. S with C.D.F. B(·) and mean 1/1-' represent the service time of a customer. The traffic intensity p( = '\0./1-') is assumed to be less than unity.

3 QUEUEING-TIME DISTRmUTIONS

3.1 The Distribution of Wq and its Moments

The L-S.T. Wq(Oi) of the C.D.F. Wq(t) of the queueing time Wq is given by

(1)

where wq1 (Oi) and Wg(Oi) are the L-S.T.'s of the C.D.F.'s Wq1 (t) and Wg(t) of the queueing times Wq1 and W"~ respectively. The L-S.T.'s wq1 (Oi) and Wg(Oi) are given by

Wq1 (0i) (1 - p)Oi

Re(Oi) ~ 0, (2) Oi - ,\ + '\A(b(Oi»'

and

Wg(Oi) = 1 - A(D(Oi»

Re(Oi) ~ 0, (3) a(1 - b(Oi» ,

respectively, where D(a) is the L-S.T. of the C.D.F. B(t), Chaudhry and Templeton [5], pages 170-171.

Let D( a) be a rational function such that

D( ) - P(a) a - Q(a)' (4)

Here, we assume P(a) to be a polynomial of degree at most Ie and Q(a) of degree Ie. For further analysis and without loss of generality, it is assumed that the coefficient of aA: in Q(a) is unity, and in P(a) it is PIc. Assume also that

Refinements to Approzimations for Queues 69

IPlt I :5 1 (see Appendix A). It may be mentioned that our assumptions are more general than those considered by Chaudhry and Gupta [3], Cohen [7], page 322, and Botta, Harris and Marchal [1], i.e., in their case p" = o.

We first derive the C.D.F. Wq(t) and then discuss the approximations. In fact, using (2), (3), and (4), wq(a) in (1) reduces to

where

and

(1 - p)aQ [QI - E!=1 CliPiQI-i]

ii [(a - ~)Q' + ~ E!=1 CliPiQI-i] (Q - P)

(1 - p)aQf(a) ) ii(Q - P)g(a) ,

I

f(a) = Q' - LCliPiQI-i i=1

I

g(a) _ (a - ~)Q' + ~ LCliPiQI-i i=1

= aQ' - ~f(a),

with P == pea) and Q == Q(a).

Clearly, the roots of the equation

(Q - p)g(a) = 0,

are the roots of either the equation

Q-P o or

g(a) o.

(5)

(6)

(7)

(8)

(9)

(10)

Equation (10) will be called the CE ofthe model. The evaluation of the roots of (9) and (10) is discussed in Chaudhry and Gupta [3]. Under their assumptions of pea) and Q(a), Chaudhry and Gupta [3] show that (10) which is of degree (lA:+ 1) has one root at a = 0 and the other lie roots a,.,. (m = 1,2, ... , lie) with Re(am ) < O. Similarly, (9) has one root at a = 0 and the other (k - 1) roots a~ (n = 1, 2, ... , Ie - 1) with Re(a~) < O. Even under our assumptions of pea) and Q(Q), it can be shown by Rouche's theorem (see Appendix A) that (10) has lk roots with Re(Q) < O. Since in queueing models repeated roots

70 CHAPTER 4

rarely occur, Chaudhry, Harris and Marchal [4], it is assumed that the roots of (9) and (10) are diltinct. Since a = 0 is a double zero of the numerator of the right-hand side (r-h.s.) of (5), making partial fractions, (5) reduces to

where

A

c" =

wg(a) = --}!. A+ L: _m_ + L: -"-, ' 1 - [ II: B 1:-1 C ]

a m=l a - am ,,=1 a - a"

I i 1- Li-1 GiPI:

1- PI:

amQ(am)!(am) [Q(am) - P(am)]g(l)(am) '

a~Q(a~)!(a~) _ 0 [Q(l)(a~) - p(l)(a~)]g(a~) - ,

m= 1,2, ... ,1,1:

11,= 1,2, ... ,.1: - 1

(11)

(12)

(13)

(14)

and W(i) (. ) denotes the it" derivative of w(· ) . Since a~ (11, = 1,2, ... ,.I: - 1) is a root of (9), it satisfies J(a) = o. In view of this, all the C" 's are identically zero. Therefore, (11) reduces to

[ II: ] _ 1-p Bm

wg(a) = -_- A + L: ( )' a m=l a - am

(15)

and as such roots of only (10) are required. Though at present QROOT is designed to find only the distinct roots, efforts are being made to change it so that it can also find multiple roots. However, it may be remarked that for the present if some roots are repeated, they can be found by checking whether the successive derivatives of the left-hand side of CE (10) vanish at those roots, and then only a small modification in the partial fractions would be needed to complete the solution, see e.g. Tijms [11], page 402. The accuracy of the roots am's and the constants Bm's can be checked by using

Wg(O) = --:!!- A - L: 2!. = 1. 1- [ II: B 1 a m=l am

(16)

Taking the inverse Laplace Transform (L.T.) of (15), we have the explicit closedform expression for the probability density function (p.d.f.) of W'l as

(17)


where 6(t) is the Dirac delta function. From (17), the C.D.F. is obtained as

where

'10 B' Wq(t) = 1 + L ~ea",t,

"'=I Q",

I l-p B", =-_-B",.

a

(18)

(19)

The mean and the variance of the queueing time Wq can be obtained from either (15) or (17), and are given by

'10 B' E(Wq) = L Q;' (20)

",=1 '" and

'10 B' Var(Wq) -2 L ;' - (E(Wq»2, (21)

",=I Q", respectively.

3.2 The Distribution of Wq1 and Its Moments

Here, wq, (Q) given in (2), using (4), can be expressed as

_ [ )./(Q)] wq,(Q) = (1- p) 1 + g(Q) , (22)

where /(Q) and g(Q) are defined in (6) and (7), respectively. Now, since Q = 0 is a zero of both the numerator and the denominator of the second factor of (22), expressing (22) into partial fractions, we get

where

[ '10 A' 1 wq,(Q) = (1- p) 1 + L Q _:

",=1 '"

A' - )./(Q",) '" - g(I)(Q",)·

The accuracy of the coefficients A~ can be checked by the condition

wq,(O) = (1- p) 1- L ~ = 1. [ I. A' ]

",;:I Q",

(23)

(24)

(25)

72 CHAPTER 4

Proceeding the same way as in the case of the random customer, the p.d.f. wq, (t) and C.D.F. Wq, (t) are given by

and

respectively, where

11&

wq, (t) = (1 - p)6(t) + L Amea ... t

m=1

11& A a t 1+ L me ... ,

m=1 am

Am = (1- p)A:".

The mean and the variance, therefore, are given by

E(Wq,) =

and

(26)

(27)

(28)

(29)

(30)

The results in equations (26), (27), (29), and (30) match the corresponding results obtained by Chaudhry and Gupta [3J, where the Am's as given in their paper are

11& A - -pf(am ) II a;

m - f(1)(0) ;=1 (a; - am)' (31)

#m

Clearly, it is better to compute Am from (24) and (28) rather than from (31), as (24) is defined in terms of one root am only, while (31) is defined in terms of all the roots of CE (10). Further, the B:"'s can also be expressed in terms of the Am's as

B' m Aa[Q(am ) - P(am )]

Amwg(am). (32)

Thus, once the Am's are known, (32) facilitates the calculations of the B:"'s and vice-versa.

Refinements to Approzimations for Queues

4 THE TAILS OF THE QUEUEING-TIME DISTRIBUTIONS

4.1 The Tail of the Distribution of Wq

73

For large t, the distribution of W, can be approximated by a single term corresponding to the am with the smallest negative real part, say aI, which based on our computational experience, is negative, and close to the origin. Moreover, al moves closer to the origin as p -+ 1-. A proof for the existence of the root al can be obtained from Theorem 4 of Chaudhry, Harris and Marchal [4], page 285. For high p, an approximate analytic expression for alJ denoted as ao, is given in Appendix B,Le.,

-2(1- p) ao ~ ~[u~ + au: + u:/1o'2]' (33)

u!, u:, and u; being respectively the variances of interarrival-time, service-time, and arrival batch-size distributions. In fact ao, which is obtained in terms of the parameters and the first two moments of the distributions involved, serves as a good upper bound of the root al.

To get the tail based on the single root aI, assume that

B' W,(t) ~ 1 + -lea", for large t, (34)

al

where B~ is defined in (19). Denoting the asymptotic estimate of W,(t) given on the r-h.s. of (34) by W,Mt), we choose t. as the smallest value oft such that

i.e.,

(35)

where F(t) == 1 - F(t). This approximation gets better if more than one root is used. It may be remarked that those roots that occur in complex-conjugate

74 CHAPTER 4

pairs should be used in pairs. Thus, the asymptotic estimate W:/l(t) of Wq(t), using three roots, is given by

(36)

where Om (m = 1, 2, 3) are the roots in decreasing order of the negative real parts. The numerical performance of W:q(t) (appe) can be observed in Tables 1, 2,3, and 4. It may be noted that the one root approximation WII~(t) (asy) given in (34) is exactly the same as the one given by Van Ommeren [12] (equation (5), page 680).

From (18), it can be seen that when we use the approximation W1q(t), W1q(0) = (1 + BU 01). Further, as p -+ 1-, 00 and 01 coalesce and -+ 0-. Also, Wq(O) = (1- p)/ii:::} (1 + BU01) -+ 0 or BU01 -+ -1 as p -+ 1-. Thus, we conclude that as p -+ 1-, (34) gives

(37)

where 00 is given in (33). In other words, we prove that the queueing-time distribution of a random customer of a batch, for large t, is approximately exponential for p ~ 1. It may be pointed out that the result (37) is similar to that given in Kleinrock [8], pages 29-31 for the queue Gl/G/I.

4.2 The Tail of the Distribution of Wq1

Using (27), one- and three-roots approximations can also be obtained for the distribution of queueing time W/l'. We shall denote them by W1q, (t) and W:/l l (t), respectively, which can be obtained from (34) and (36) with B!'s replaced by ~ 's, i.e.,

W:/ll (t) 1 + A1ealc (38) 01

and

W:/l,(t) 3 A

= 1 + E .-!!!.ea".c. (39) m=1 0m

Since Wq, (0) = (1 - p), proceeding along similar lines as in Section 4.1, it can be shown that At/01 -+ -1 as p -+ 1-. Thus the queueing-time distribution W'l, (t) is also approximately exponential for p ~ 1, i.e.,

WI (t) '" 1 _ eaoC 11'1, - ,


where ao is given in (33). In view of this, we can conclude that as p -+ 1-, W;,(t) and W;" (t) are approximately equivalent, i.e., W;,(t) <-+ WJ" (t), a result which seems surprising. This result can also be obtained otherwise; for using (32), it follows that as p -+ 1-, B~ <-+ A1•

5 SPECIAL CASES

Let the service-time distribution be CODan, i.e., R is CIc for which

I: ,

-( ) "" II 0';1'; b a = 'Y1 + ~ 'Y1+1 ~' ,=1 ;=1 a 1',

such that 0 ~ 'Y" 0', ~ I, 'Y, + O'i = I, i = I, 2, ... , Ie and 'Y1:+1 = 1. Here,

I: Ic, I:

P(a) = ·n II(a + 10';) + E 'Y'+1 II u;lo'j II (a + J.',.) (40) ;=1 .=1 j=l r=.+l

and I:

Q(a) = II(a+IJ.;). (41) ;=1

Further, the first two moments used in numerical computations are given by

and

where

Ic (' IIi 1) 2L'Y1+1Fi L~+2 L -.- , i=l ;=1 1', 3.p=1 1', JI.p

ji.p

76 CHAPTER 4

Let IJIe be the largest of IJj (j = 1, 2, ... , k). To find the roots am (m = 1, 2, ... , lk) (Re(am) < 0) of (10), we use the transformation

(42)

If (42) is used in (10), then a routine application of Rouche's theorem shows that the equation in z has lk roots inside the unit circle. This means that the lk roots of (10) with Re(am) < 0 correspond to the lk roots in z with IZm I < 1, and IZm I < 1 implies Re(am) < o. Therefore, it suffices to solve the equation in z for roots inside the unit circle and then find the roots am (m = 1, 2, ... , Ik) from (42).

In particular, if '"Yi = 0 (i = 1, 2, ... , k), then Cle reduces to GEIe , for which

Further,

E(S)

and

In particular, if IJj = IJ' (j = 1, 2, ... , k), we have the Ele service-time distribution with each phase having mean 1/ IJ', so that 1/ IJ = k/ IJ'.

(ii) MX /Cle /l with complex phase rates

Let the service-time distribution be Conan with complex phase rates of the form given in Cox [6], page 314, equation (3). Therefore, the L-S.T. b(a) is given by

Ie II JJ' b(a) = -'-;=1 IJj + a

where IJj (j = 1, 2, ... , k) occur in complex-conjugate pairs. In particular, if k = 3, taking IJl = ClI IJ2,3 = Cl ± id1 , with Cl > 0 and i = H, b(a) reduces

Refinements to Approzimations for Queues

to b( ) C1(C~ + dD

a = (C1+a){(C1+a)2+dn'

Cox [6], page 314, equation (4». In this case,

E(S) = and

3c~ + d~ C1(C~ + 4)

2(4 + 3c~d~ + 6et) cHc~ +dn2

77

To find the roots am. (m = 1, 2, ... , 31) with Re(am ) < 0 of CE (10) with b(a) as defined above, we use the transformation

a = -1~1(1- z), Ip21 = Jc~ +d~.

6 NUMERICAL RESULTS

Numerical tables and graphs give insight into the effects of varying the parameter values. Besides, they may be also useful to other researchers who may like to compare their results, using other methods, against ours. Keeping this in view, and to test our results against Van Ommeren's results as well as to test our method thoroughly, extensive calculations were done both for the exact distribution of W" and its tail. The results of the computations, which were done on a COMPAQ 286 PC, are excellent, as a high degree of accuracy is always achieved. Clearly, the accuracy of the results (expressed in terms of the roots of CE) depends on the accuracy of the roots found by QROOT. For the purpose of this paper, a root a found by QROOT is said to be a good root if Ig(a)1 < 10-14•

Of the numerous tables and graphs we produced while testing our solution procedure, we append here only a few tables for illustrative purposes. The selection being made in such a way that by looking at them one can appreciate the strength of the rootfinding algorithm as well as the applicability of the rootfinding procedure to almost any distribution with any set of reasonable values of the parameters.

In Tables 1 and 2, we present both our and Van Ommeren's [12] numerical results displaying conditional waiting-time percentiles. As discussed by Van

78 CHAPTER 4

Ommeren, since the percentiles lI(p) of the conditional waiting-time distribution of the delayed customer are defined for all 0 < P < 1, they are convenient to use rather than the percentiles ((p) of the unconditional waitingtime distribution W'l(t). The percentiles lI(p) are determined by the equations (1 - W,(II(p)))/(1 - W,(O)) = 1 - p and ((Po) = II(PI) when Po = 1 - (1 - PI)(1 - W,(O)). Noting that Cf denotes the square of the coefficient of variation of a variable V, we use four different batch-size distributions as considered by Van Ommeren: (i) the constant batch size (Ci = 0), (ii) the uniformly distributed batch size (C} = E(X - 1)/3E(X)), (iii) the geometrically distributed batch size (C} = E(X - 1)/E(X)), and (iv) the mixedgeometric distribution batch size with balanced means, where C} is taken to be 2. A batch-size distribution {a,., n ~ I} is said to be a mixed-geometric distribution with balanced means when a,. = qpI(I-PI)n-I+(I-q)P2(1-P2)n-1, n ~ 1 with q/PI = (1 - q)/P2. For the service-time r.v. S, he considers the E10 distribution (C~ = 1/10), the E2 distribution (C~ = 1/2), and the HE2 distribution with balanced means where C~ = 2. In all the cases, it is assumed that E(S) = 1. The results consist of conditional waiting-time percentiles obtained by: (a) using one root, i.e., using W';,(t) and denoted by asy, (b) using three roots (sometimes using two roots when there are two real roots), i.e., using W:, (t) and denoted by aPPe, (c) the second order approximations of Van Ommeren denoted by app .. , and (d) using all roots (exact solution), i.e., using W,(t) and denoted by eza. As mentioned earlier in Section 4, the values given in the first row using one root are the same as the one's given by Van Ommeren which he calls the first-order approximation and denotes it by easy'. The numerical investigations reveal that except in a few cases marked by • and t, aPPe and app" match. For the cases marked by., appe is better than app", and for the cases marked by t, it is otherwise. On the basis of these results it can be concluded that both app" and appe are almost equally accurate numerically, there being 17 .'s and 16 t's in Table 1 and 16 .'s and 17 t's in Table 2. But it can be seen that our approximations are much simpler to understand analytically and implement numerically than those of Van Ommeren's. As can be seen from Tables 1 and 2, W:,(t), in general, gives better results than W~,(t) and the approximations improve as p increases. Further, the approximations using roots improve if more roots are used. Also, in order to get better results, whereas our approximations can be easily extended symmetrically and implemented numerically, Van Ommeren's cannot. In view of this and many other features of our method discussed in the paper, our approximations may be considered refinements over those of Van Ommeren.

To investigate the implementation and performance of our approximations in more detail, Table 3 gives the results for MX /Cs/l when the service-time distribution has complex phase rates with CI = 2, dl = 1, so that E(S) = 1.3,


E(S2) = 2.18, and cl == 0.2899. The batch-size distribution is taken as a20 = 0.1, aSO = 0.3, a70 = 0.4 and alOO = 0.2, so that E(X) == a = 65, E(X2) = 4750, and ci = 0.1243. The results given consist of the different conditional waiting-time percentiles obtained corresponding to W;q(t), W:q(t) and Wq(t) when p = 0.1 and 0.9, respectively. These values indicate that in this case also W:q(t), in general, gives better results than W;'1(t) even for smaller values of t. Moreover, the performance of the approximations improves when p gets larger.

In Table 4, similar results are given for MX /C2/1 when 71 = 0.3, U1 = 0.7, 72 = 0.5, U2 = 0.5, 1-'1 = 1, 1-'2 = 2, so that E(S) = 0.875, E(S2) = 1.925, and C~ == 1.5143, and p = 0.7. The batch-size distribution is taken as a1 = 0.1, as = 0.3 and alO = 0.6, so that E(X) = 7.6, E(X2) = 67.6, and Ci = 0.1704. The consideration of the C2 distribution has its own importance since, as proved by Marie [9], for a distribution with any mean and any Cf such that Cf ~ 0.5, it is possible to determine a C2 distribution which has the desired mean and vallance.

For the system M X / EIO/l when p = 0.2, E(S) = I, and E(X) = 2, Figures I and II show the graphs of the exact expression W,(t) and the asymptotic expressions W;q(t) and W:q(t) corresponding to the constant (C~ = 0) and uniformly distributed (C} == 0.17) batch sizes, respectively.

80 CHAPTER 4

Table I' Comparison of Conditional Waiting Time Percentiles '(P) when E(X)-2 -p P Elo• c.2 =.1 E,. c:1.=,5 HE,.

c,' 0.00 0.17 0.50 0.17 0.50 2.00 0.00 0.50

0.2 asy 0.56 0.80 0.85 0.68 0.75 0.38 0.00 0.00 app .. °0.74 0.70 °0.90 0.55 0.72 1.06 0.07 0.46 ·PP. 0.70 to.82 0.88 to.61 0.72 1.06 to.24 0.46 en 0.75 0.83 0.90 0.60 0.72 1.06 0.24 0.46

0.5 0.94 1.28 1.73 \.31 1.76 3.32 0.00 0.97 1.05 1.41 °1.74 \.34 1.76 3.41 0.73 1.57

tl.07 t\.35 1.73 1.34 1.76 3.41 to.84 1.57 1.07 \.35 1.74 \.34 1.76 3.41 0.84 1.57

0.2 0.8 1.71 2.22 3.46 2.54 3.73 9.03 2.15 4.25 °1.63 °2.25 3.46 2.59 3.73 9.03 2.50 4.35

1.64 2.24 3.46 t2.58 3.73 9.03 t2.53 4.35 1.63 2.27 3.46 2.58 3.73 9.03 2.52 4.35

0.9 2.28 2.94 4.76 3.48 5.21 13.36 4.\8 6.73 °2.26 °2.89 4.76 ·3.49 5.21 13.36 ·4.30 6.75 2.23 2.92 4.76 3.50 5.21 13.36 4.31 6.75 2.27 2.88 4.76 3.49 5.21 13.36 4.30 6.75

0.2 asy 0.72 0.92 1.04 0.84 0.99 0.83 0.00 0.21 .pp, 0.83 °0.94 1.05 0.82 0.98 1.39 0.40 0.79

·PP. 0.83 0.97 1.05 to.83 0.98 1.39 to.47 0.79 exa 0.85 0.95 1.04 0.83 0.98 1.39 0.47 0.79

0.5 1.52 1.88 2.47 2.02 2.62 4.82 1.50 2.74 °1.48 °1.92 2.47 ·2.04 2.62 4.86 \.82 2.88

1.50 1.89 2.47 2.03 2.62 4.86 tl.83 2.88 1.48 \.92 2.47 2.04 2.62 4.86 1.83 2.88

0.5 0.8 3.08 3.76 5.26 4.32 5.79 12.59 5.50 7.68 3.09 3.76 5.26 4.32 5.79 12.59 5.52 7.68 3.09 3.76 5.26 4.32 5.79 12.59 5.52 7.68 3.09 3.76 5.26 4.32 5.79 12.59 5.52 7.68

0.9 4.27 5.18 7.37 6.06 8.19 18.48 852 11.41 4.27 5.18 7.37 6.06 8.19 18.48 8.53 11.41 4.27 5.18 7.37 6.06 8.19 18.48 8.53 11.41 4.27 5.18 7.37 6.06 8.19 18.48 853 11.41

0.2 •• y 1.40 1.68 2.02 1.74 2.12 3.12 1.32 2.18 app" 1.39 • •. 10 '2.03 1.76 2.12 3.27 ·1.54 2.29 app ... 1.39 1.69 2.02 t1.75 2.12 3.27 1.55 2.29 ex. \.38 1.70 2.03 1.75 2.12 3.27 1.53 2.29

0.5 3.71 4.38 5.64 4.95 6.22 11.59 6.16 8.21 3.71 4.38 5.64 4.95 6.22 11.59 6.16 8.21 3.71 4.38 5.64 4.95 6.22 11.59 6.16 8.21 3.71 4.38 5.64 4.95 6.22 11.59 6.16 8.21

0.9 0.8 8.21 9.66 12.69 11.20 14.20 28.09 15.58 19.96 8.21 9.66 12.69 11.20 14.20 28.09 15.58 19.96 8.21 9.66 12.69 11.20 14.20 28.09 15.58 19.96 8.21 9.66 12.69 11.20 14.20 28.09 15.58 19.96

0.9 11.61 13.65 18.03 15.92 20.24 40.57 22.72 28.86 11.61 13.65 18.03 15.92 20.24 40.57 22.72 28.86 11.61 13.65 18.03 15.92 20.24 40.57 22.72 28.86 11.61 13.65 18.03 15.92 20.24 40.57 22.72 28.86

Legend;The values of C~l=O.OO, 0.17, 0.50, and 2.0 correspond. respectively. 10 fixed, unifonn, geometrically. and mixed geometrically distributed balch sizes.

c,2=2

2.00

0.00 0.57

to.81 0.82

2.98 3.36 3.38 3.37

9.64 9.70 9.70 9.70

14.67 °14.69

14.68 14.69

0.53 1.19

t\.32 1.32

5.21 5.39

t5.40 5.40

14.34 14.36 14.36 14.36

21.25 21.25 21.25 21.25

3.58 3.76

t3.77 3.77

13.77 13.78 13.78 13.78

33.63 33.63 33.63 33.63

48.65 48.65 48.65 48.65

Refinements to Approximations for Queues

Table II' Comparison of Conditional Waiting Time Percentiles ,(P) when E(X)-S -

p P £'0' c.2 =.1 E" c.2 =.5 HE"

c.' 0,00 0.27 0.80 0.00 0.80 2.00 0.27 0.80

0.2 "'Y 1.64 2.29 1.71 1.53 1.58 0.00 0.93 0.73 'pp, 0.81 1.41 "1.72 0.89 1.58 1.78 0.56 1.16 ·PP. tl.53 t1.70 1.71 t1.27 1.58 1.78 to.98 1.16 eXl 1.41 1.56 1.72 1.23 1.58 1.78 0.97 1.16

0.5 2.56 3.56 4.23 2.60 4.36 4.02 3.02 4.04 "2.89 3.45 4.23 "2.70 4.36 5.82 2.83 4.11 2.69 3.45 4.23 2.62 4.36 5.82 t3.05 4.11 2.84 3.51 4.23 2.70 4.36 5.82 3.06 4.11

.2 0.8 4.36 6.03 9.57 4.68 9.79 17.10 7.10 10.49 "4.32 "6.25 9.57 4.74 9.79 17.21 7.05 10.49 4.44 6.15 9.57 4.77 9.79 17.21 t7.11 10.49 4.33 6.26 9.57 4.74 9.79 17.21 7.10 10.49

0.9 5.71 7.89 13.50 6.25 13.90 27.00 10.18 15.38 "5.54 "7.85 13.50 "6.23 13.90 27.01 10.17 15.38 5.72 8.02 13.50 6.31 13.90 27.01 t10.19 15.38 5.54 7.86 13.50 6.23 13.90 27.01 10.19 15.38

0.2 ISY 1.88 2.44 2.44 1.76 2.36 0.00 1.48 1.79

"PP, 1.96 2.00 2.44 1.70 2.36 2.74 1.40 1.98 .pp'¥ tl.94 t2.25 2.44 tl.73 2.36 2.74 tl.62 1.98 eXI 1.93 2.17 2.44 1.74 2.36 2.74 1.63 1.98

0.5 3.81 4.98 6.71 3.91 6.81 9.08 5.14 7.03 3.85 "5.13 6.71 "3.96 6.81 9.72 5.12 7.04 3.85 5.00 6.71 3.94 6.81 9.72 t5.18 7.04 3.87 5.10 6.71 3.96 6.81 9.72 5.17 7.04

.5 0.8 7.56 9.93 15.Q3 8.09 15.51 27.00 12.26 17.25 "7.59 "9.90 15.03 8.09 15.51 27.01 12.26 17.25 7.56 9.95 15.03 8.09 15.51 27.01 tl2.27 17.25 7.59 9.90 15.03 8.09 15.51 27.01 12.27 17.25

0.9 10.40 13.67 21.33 11.25 22.08 40.55 17.65 24.98 "10.39 13.68 21.33 11.25 22.08 40.56 17.65 24.98

10.40 13.68 21.33 11.25 22.08 40.56 17.65 24.98 10.39 13.68 21.33 11.25 22.08 40.56 17.65 24.98

0.2 asy 3.49 4.39 5.47 3.50 5.52 6.34 4.28 5.57 aPPe "3.56 "4.47 5.47 "3.54 5.52 7.13 4.27 5.59 .pp" 3.52 4.39 5.47 3.52 5.52 7.13 t4.32 5.59 eXI 3.57 4.45 5.47 3.54 5.52 7.13 4.32 5.59

0.5 9.07 11.54 16.15 9.58 16.68 26.18 13.86 18.57 "9.06 11.54 16.15 9.58 16.68 26.20 13.86 18.57

9.07 11.54 16.15 9.58 16.68 26.20 13.86 18.57 9.06 11.54 16.15 9.58 16.68 26.20 13.86 18.57

.8 0.8 19.95 25.48 36.98 21.44 38.43 64.87 32.53 43.90 19.95 25.48 36.98 21.44 38.43 64.87 32.53 43.90 19.95 25.48 36.98 21.44 38.43 64.87 32.53 43.90 19.95 25.48 36.98 21.44 38.43 64.87 32.53 43.90

0.9 28.18 36.03 52.74 30.40 54.88 94.14 46.66 63.06 28.18 36.03 52.74 30.40 54.88 94.14 46.66 63.06 28.18 36.03 52.74 30.40 54.88 94.14 46.66 63.06 28.18 36.03 52.74 30.40 54.88 94.14 46.66 63.06

Legend: The values of c.2 =O,OO, 0.27, 0.80, and 2.0 correspond, respectively, to fixed, unifonn, geometrically, and mixed geometrically distributed balch sizes.

81

c.2 =2

2.00

0.00 1.05

tl.38 1.38

3.61 5.77

t5.79 5.79

17.54 17.84 17.84 17.84

28.08 28.13 28.13 28.13

0.00 2.31

t2.44 2.44

9.37 10.20 10.20 10.20

28.56 28.61 28.61 28.61

43.08 43.09 43.09 43.09

6.67 7.46 7.46 7.46

28.20 28.23 28.23 28.23

70.17 70.17 70.17 70.17

101.92 101.92 101.92 101.92

82

Tablem Conditional Waiting Time Percentile. ,Cp) for MX/C,II with complex phase rale. when c,=2. d,=I. £(5)=1.3 .... =0.2899 .... =0.1 •• .,=0.3 • ... =0.4. a,.,=0.2. E(X) ... =65. c,'=0.1243; p=O.1 and 0.9.

p p=O.1 ,Cp) p=0.9

.1 asy 34.8641 62.7394 app, 23.1729 63.1425 exa 9.6553 63.2280

.2 aay 38.3035 117.2920 app, 25.5013 117.1999 exa 18.7518 117.4928

.5 asy 52.0278 334.9799 app, 39.1311 334.9797 exa 47.6487 334.9794

.8 aay 78.7841 759.3711 app, 83.8555 759.3711 _xa 81.9411 759.3711

.9 a.y 99.0244 1080.4106 app, 102.7768 1080.4106 exa 103.3882 1080.4106

.99 a.y 166.2613 2146.8809 apPe 164.9210 2146.8809 exa 165.6576 2146.8809

Table IV Conditional Wailing Time Percentile. ,Cp) for MX/c,t1 with complex phase rales when 'Y,=0.3. 17,=0.7. 'Y,=0.5. 17,=0.5. ",=1. ",=2. £(5)=0.875. c:z= 1.5143, 8 1 =0.1,1,=0.3, 1 10 =0.6, BOO •• =7.6, c '=O.1704.p=0.7. ,

p ,Cp)

.1 .. y 2.1609 app, 1.7799 exa 1.8161

.2 asy 3.8160 app, 3.7131 exa 3.7446

.5 .. y 10.4206 app, 10.4284 exa 10.4282

.8 a.y 23.2965 app, 23.2966 eXl 23.2966

.9 a.y 33.0368 app, 33.0368 exa 33.0368

.99 a.y 65.3932 appc 65.3932 eXl 65.3932

CHAPTER 4


1.0 1.0,---------

0.'

0.1

0.1

, ci°,5 u O•4

0.3

0.2

0.1

0.0 0

TlMEm rIt'{(])

Fig. I: Graphs ofWq(t) (#1), Wq'(t) (#2), and W,'(t) Fig. II: Graphs of Wq(t) (#1), W,'(t) (#2), and (#3) for the model MX/E"/I of Table 1 with W,'(t) (#3) for the model MX/E,oll of Table 1 with E(X)=2, E(S) = I, c.'=O.OO, and p=O.2. E(X)=2, E(S) = 1, c.'=O.27, and p=O.2.

84 CHAPTER 4

Similarly, for the system M x / E1o/ 1 when p = 0.8 and E( X) = 5, Figures III and IV are drawn corresponding to ci == 0.27 and c,i = 0.8, when batch size is distributed uniformly and geometrically, respectively.

0.15 0.15

0.11 0.12

0.09 0.09 . . ci 0

U O•Di uO.06

0.03 0,03

/3/2

0.00 0,00 0 0

TIl'( (I) TIllEtT)

Fig. III: Grapbs of W.(t) (#1), w.'(t) (1/2), and Fig. IV, Graphs of W.(t) (#1), W.'(t) (1/2), and W.'(t) (#3) for the model MX/E,.,II of Table 2 with Witt) (#3) for the model MX/E,.,II of Table 2 with E(X)=S, E(S)=I, c.'=O.27, and p=O.8. E(X)=S, E(S)=!, c.'=O.80, and p=O.8.

It can be observed from all the Figures I, II, III, and IV that W:q(t) generally gives better results than W';q(t). Further, as an example, it may be mentioned that for E = 10-2 , t. (as defined in (35» == 0.48, as can be seen in Fig. IV.

The exact numerical results ofWq(t)(Wql (t» for M X/D/l are not possible since it involves finding an infinite number of roots of the CE, Chaudhry and Gupta [3]. However, these can be obtained through MX /E,,/l by taking Ie sufficiently large. As such, the tail ofthe distribution Wq(t)(Wlll(t» for M X/D/l can also be approximated through the tail of MX /E,,/l by considering Ie large, subject to lie ~ 2500, Chaudhry and Gupta [3].


7 CONCLUSIONS

The various expressions obtained in this paper are computationally efficient, accurate, and easily implement able for both low and high values of the system parameters. The approximations for the tail are also discussed in terms of one or three roots, which are used in decreasing order of the negative real parts. It is shown that these approximations not only work for a wide range of values of the traffic intensity (p), but improve if higher values of p are taken. It is also shown that our approximations, which form hierarchical results, are simpler to understand analytically and easier to implement numerically than those of Van Ommeren, can easily be extended symmetrically if more roots are used.

In a recent article, Suri and De Treville [10] note the scarcity of commercially supported software packages on queueing models. We would like to report that the proposed method has enabled us to develop a menu-driven package for this queueing model which is nearing completion. It is hoped that it should be available soon to both practitioners and researchers to enable them to have access to highly accurate results for a wide range of distributions satisfying the underlying assumptions. It may, however, be emphasized here that the approximations discussed in this paper have facilitated the computations to a great extent.

APPENDIX A

Theorem: The CE (10) has lk roots with Re(a) < 0, provided p < 1.

The proof is similar to the one given in Cohen [7], page 322. But since it runs parallel to the proof given in Chaudhry and Gupta [3], it is not given here.

86 CHAPTER 4

APPENDIX B

For high p, the root al (Re(al) < 0) of (10) with the smallest negative real part can be approximated by

-2(1- p)

U~, uC, and u; being, respectively the variances of interarrival-time, servicetime, and arrival batch-size distributions.

Proof: The CE (10) can be written as

cf>(a)A(b(a)) - 1 = 0, (1)

where cf>(a) = >'/(>' - a).

Using the Taylor series expansions for A(b(a» and cf>(a), we have

cf>(a) (2)

and 00 (2 1 2 2 ) a 2 2 A(h(a» = 1--+ au.+2"(Ug +il) ,+o(a). I-' I-' 2.

(3)

Substituting (2) and (3) in (1), we have

From (4), it is clear that a = 0 is a root of (1). Further, since p ~ 1, we can neglect the (1- p)2 term. Under this assumption and dropping o(a2 ) for small


a, we solve (4) for the second root, say ao, which is approximately given by

ao = -=-=-:---_2(:...,I ..... -....:.p...!.,):-:-=:_ .\[CT: + au: + CTU 1-'2]

(5)

as 1/.\2 = CT!. Clearly, this root is negative and ---t 0 - as p ---t 1-. Since we have neglected o(a2) for small a, clearly, when p is high, the root ao given in (5) can be taken as an approximation of the smallest of all the roots of CE (1) for which Re(a) < o. Also, if p = 1, ao = 0 becomes a repeated root of (1). That ao is a repeated root of (1 can be seen analytically too. This implies that ao is a repeated root even if A(b(a» is not a rational function.

Acknowledgements

The preliminary work for the paper was done by Manju Agarwal who acknowledges with thanks the financial support provided by the Department of Mathematics and Computer Science, Royal Military College of Canada, Kingston and the Department of Industrial Engineering, University of Toronto. The author is grateful to Mr. Haynes Lee, Research Assistant, R.M.C., for his help in the preparation of this paper. Thanks are also due to a referee whose suggestions led to some improvements of the paper. This research was supported (in part) by the research grant number CRAD FUHDH.

REFERENCES

[1] B.F. Botta, C.M. Harris and W.G. Marchal, 1987. Characterizations of Generalized Hyperexponential Distribution Functions. Communications in Statistics-Stochastic Models 3, 115-148.

[2) M.L. Chaudhry, 1993. QROOT Software Package. A & A Publications, 395 Carrie Crescent, Kingston, Ontario, Canada, K7M 5X7.

[3) M.L. Chaudhry and U.C. Gupta, 1992. Exact Computational Analysis of Waiting-Time Distributions of Single-Server Bulk-Arrival Queues: MX/G/l. European Journal of Operations Research 63,445-462.

[4] M.L. Chaudhry, C.M. Harris and W.G. Marchal, 1990. Robustness of Rootfinding in Single-Server Queueing Models. ORSA Journal on Computing 2, 273-286.

88 CHAPTER 4

[5] M.L. Chaudhry and J .G.C. Templeton, 1983. A Firlt Cour.e on Bulle Queue., John Wiley, New York.

[6] D.R. Cox, 1955. A Use of Complex Probabilities in the Theory of Stochastic Processes. Proceeding. of the Cambridge Philo.ophical Society 51, 313-319.

[7] J.W. Cohen, 1982. The Single SenJer Queue, North Holland, Amsterdam.

[8] L. Kleinrock, 1976. Queueing Syltem&: Computer Application. Vol. II, John Wiley, New York.

[9] R. Marie, 1980. Calculating Equilibrium Probabilities for ~( .. ) /C,./l/N Queues. A CM SIGMETRICS, Conference on Measurement and Modelling of Computer Systems, 117-125.

[10] R. Suri and S. de Treville. 1991. Full Speed Ahead. OR/MS Today, June 1991, 34-42.

[11] H.C. Tijms, 1986. Stocha&tic Modelling and Analyli.: A Computational Approach, John Wiley, New York.

[12] J.C.W. Van Ommeren, 1990. Simple Approximations for the Batch-Arrival M X /G/l Queue. Openltion. Re.earch 38,679-685.

5 A NEARLY ASYNCHRONOUS PARALLEL

LP-BASED ALGORITHM FOR THE

CONVEX HULL PROBLEM

IN MULTIDIMENSIONAL SPACE J.B. Dubi

R. V. Belgason and N. Venugopal

ABSTRACT

Southern Methodilt Univer,it1l Dallal, Tezal 75175

The convex hull of a set A of n points in R'" generates a polytope 'P. The frame r of A is the set of extreme points of 'P. The frame problem, the identification of r given A, is central to problems in operations research and computer science. In OR it occurs in specialised areas of optimisation theory: stochastic programming and redundancy in linear programming. In CS it is an important problem in computational geometry. The problem also appears in economics and statistics. The frame problem is computationally intensive and this limits its applications. The standard LP-based approaches for identifying r solve several linear programs with m rows and n - 1 columns, one for each element of A. In this paper we report on a parallel procedure for identifying r using a new LP-based approach. The new approach also uses linear programs with m rows, but the linear programs which must be solved begin with a small number of columns and grow in sise, never exceeding the number of points of r. On a small set of test problems, the serial time to identify r varied from one-half to two-thirds that of an enhanced implementation of the standard approach. We discuss parallelisation of this algorithm for the MIMD environment. On a suite of test problems, our parallel MIMD nearly asynchronous implementation on the Sequent Symmetry S81 achieved a speedup factor of 7 to 13 using up to 16 processors. These developments will permit the solution of problems previously considered too large.

89

90 CHAPTER 5

1 INTRODUCTION

A given collection of n points A = {al , ... , an} in !Rm defines or generates a polytope P of dimension at most m, which is the set of all convex combinations of points of A, also known as the convez hull of A, denoted by conA. The extreme points of P, a subset of A which we call the frame of A and denote by :F, provides a minimal description of the polytope. We call the identification of :F given A the frame problem. The frame problem appears in equivalent forms in several applications. In operations research the problem appears directly in two important areas of optimization: redundancy in linear programming and stochastic programming. The frame problem is also involved in the econometric methodology for measuring the comparative efficiency among many economic firms known as "data envelopment analysis" (DEA). In computer science the frame problem plays a role in one of the classical problems in computational geometry, that of finding the hyperplanes which define the facets of the convex hull of a finite set of points. Finally, the frame problem appears in statistics in the evaluation of Gastwirth estimators. The role of the frame problem in these applications is presented in more detail in Dulcl. and Helgason[3].

2 PREVIOUS LP-BASED APPROACHES.

Perhaps the first work to address directly the frame problem in its general form was presented in 1967 by Wets and Witzgall[8] in the context of the equivalent problem of identifying the generating elements of a convex polyhedral cone. The approach taken by Wets and Witzgall to find the "frame" of the cone is essentially based on simplex method iterations. A more formal algorithm presented in Wallace and Wets[7] is also based on the solution of linear programs.

A more recent work by Rosen, Xue, and Phillips[5] also proposes an algorithm for identifying the extreme points of the convex hull based entirely on linear programs and, in addition, reports numerical results using a parallelization scheme, apparently the first attempt at implementing an LP-based approach to the frame problem in parallel.

Most previous LP-based approaches to the frame problem have essentially relied on the following linear program to determine if element ale i= 0 of the set A = {at, ... ,an} is an element of:F:

Parallel Algorithm for the Convez Hull Problem 91

ft ft

minzl = L'\j, ;=1 j*~

s.t. Lai'\j = ale; ,\j ~ 0; j = 1, ... , n (LP1) ;=1 #~

The following result (see [3]) relates the solution of LP1 to the determination of the status of ale =f:. O.

Result 1. For LP1 fetuible, the point a" =f:. 0 is an element of the frame :F if and only if the optimal objective function value of LP1, zi, is greater than 1.

The linear program formulation LP1 is a generic form which can be used to resolve whether or not the point a" E A belongs to:F. Note that it is possible to identify conclusively the status of all the points in the set A by solving this linear program n times over all right-hand side vectors aI, ... , aft.

Linear programming formulations for solving the frame problem previously proposed are equivalent to LP1 and the approach utilizing repeated solutions of linear programs such as the one here is standard. For example, in Rosen, Xue, and Phillips[&] the approach is to add the constraint E~=l ,\j = 1 to formu-

i*~ lation LP1, discard the objective function and then apply Phase 1 to verify if the set of m + 1 equalities has a nonnegative solution. The approach presented in Wallace and Wets[7] is also based on verifying feasibility, but since their formulation is for finding the extreme rays of the positive cone generated by the elements of A, the constraint L:'=1 ,\J' = 1 is not needed. The linear

#~

programming formulation applied in DEA introduces extra variables, one to measure "efficiency" and the rest used as slacks.

3 THEORETICAL ASPECTS OF LP-BASED

APPROACHES.

We now summarize some recent results which apply to previous LP-based approaches to the frame problem and to the newer approach we proposed in [3].

We assume that the number of points n is greater than the dimension m with at least one subset of m vectors being linearly independent, and that the convex hull 'P contains the origin in its interior (if not, the points can be translated to

92 CHAPTER 5

satisfy this condition). These assumptions are necessary to establish that the polytope 'P has dimension m.

Consider the following linear program:

.. minzz = ~~j,

j=1

.. s.t. ~aj~j=bj

j=1

~j ~ OJ j = 1, ... ,n (LP2)

where b is an arbitrary nonzero vector in !Rm and not necessarily one of the elements of A. Notice also that the index j is defined over all its possible values without excluding any as in the original expression for LPI. Finally, observe that LP2 is always feasible and its solution bounded since, by assumption, conA has full dimension and contains the origin in its interior. Denote by z; the optimal objective function value of LP2.

The following two results are proved in [3]:

Result 2. If z; u the optimal ,olution to LP2 for lome b :f:: 0 then

(1) z; < 1 if and only if b is interior to 'P.

(2) z; = 1 if and only if b is on the boundary of 'P.

(3) z; > 1 if and only if b is exterior to 'P.

Result 3. The optimal buu to LP2, if unique, U compoled of point. ail, ... , ai ... which are element. of the frame :F.

These results can be directly applied to the formulation LPI with two important implications. The first is that any time the original linear program LPI is solved and a unique optimal basis is obtained, the m points of A which are in the basis are revealed as elements of the frame. The second is that every time a point is discovered not to be an element of the frame it can be removed from the linear program formulation. These implications can be used to enhance the performance of the procedure for identifying the frame of A by reducing the total number of linear programs that need to be solved as well as by reducing their size by removing columns from the matrix of coefficients.

Using the formulation LPI and the results accompanying it to enhance it means that it is required that both the objective function value and the basic feasible solution be known to determine whether a point belongs to the frame. The fact that, eventually, an accurate optimal basic feasible solution to LPI is required

Parallel Algorithm for the Convex Hull Problem 93

is one reason why interior point methods are not used. Another reason is that the input-output matrix in LPI is dense with many more columns than rows. This is a particularly unattractive structure for interior point methods since these are very sensitive to the number of columns.

4 A GENERAL APPROACH

We now assume that some of the elements of the frame are known. (Initial elements could easily be identified by applying simple preprocessing schemes to the set A as in [4].) With such knowledge, the set A can be partitioned into three subsets, AE, AU, and AN where:

AE the set of all currently known elements of :F,

AN = the set of all currently known nonextreme points of 1',

AU = the set of all other points of A, whose status is yet to be assigned.

Based on this partitioning we define

pE == the convex hull ofthe points in AE, itself a polytope such that pE C 1'.

Consider the following procedure:

Begin Procedure ProcessPoint

Step 1. Select a point ale E AU.

Step 2. Determine if ale belongs to pE (the "current" convex hull). If so, remove (the interior point) ale from AU, add ale to AN, and exit the procedure.

Step 3. Generate a direction v E Rm that is normal to a hyperplane separating ale and pE and points away from pE.

Step 4. Calculate the maximum of the inner products (v, al')j Val' E AU. Let Amax be the set of all points of AU which attain this maximum. Identify one or more extreme points of the set Am&x itself. Remove all such identified points from AU and add them to AE.

94 CHAPTER 5

Step 5. Calculate the minimum of the inner products (v, ap)j Val' E AU U AE. Let Amin be the set of all points of AU U AE which attain this minimum. Identify one or more extreme points of the set Amin itself. Remove all such identified points which are also from AU and add them to AE.

End Procedure

The following results (see [3]) show how Steps 4 and 5 of procedure ProcessPoint may be implemented:

Result 4. If the mazimum in Step 4 of Procedure ProcessPoint occurs at a unique point, a new element of:F is generated.

Result 5. If the minimum in Step 5 0/ Procedure ProcessPoint occurs at a unique point and that point is from AU, a new element of:F is generated.

The possibility of ties among eligible points in the maximum or minimum value of the inner products in Steps 4 and 5 presents a complication. If there is a tie among several points from the reference hyperplane, it may not be immediately possible to identify which of the points participating in the tie are extreme points of P. The following result (see [4]) shows how this can be resolved in essentially a recursive manner.

Result 6. Suppose that ezactly T points, a l , .•. , aT, participate in a tie for the farthest distance (on the same side) from a reference hyperplane H in Step 4 or 5 of procedure ProcessPoint. Then ai is an eztreme point of P if and only if oj is an eztreme point of U = con{al , .•• , aT}.

This result indicates that the resolution of ties reduces to a smaller version of our original frame problem. The resolution of ties is an implementation problem. Note that if only two points are involved in a tie they are both necessarily extreme points of P.

A general algorithm for identifying :F is now apparent. The procedure ProcessPoint is simply repeated until AU becomes empty. This algorithm must solve the frame problem since at least one point leaves AU in either Step 2 or


Step 4. Such an algorithm with a declared objective to not use linear programs was implemented and computational results were reported in [4].

5 THE NEW LP-BASED APPROACH

We recently (see [3]) proposed a new procedure for solving the frame problem based on the solutions to linear programs for the case of a polytope of full dimension. On a small set of test problems, the (serial) time to identify :F varied from one-half to two-thirds that of an enhanced implementation of the standard approach.

Our new LP-based approach to the frame problem relies on the following linear program:

fI.

minZ3 = LA; s.t. ;=1

fa

LaiA; = ale; A; ~ 0; j = 1, ... , n (LP3) ;=1

where ai, ... , afl. are the elements of AE, n ~ m + 1, pE has dimension m and contains the origin, and ale E AU.

The following result (see [3)) shows how Step 3 of procedure ProcessPoint may be implemented following the use of LP3 for Step 2:

Result 7. An optimal, dual-feasible, basis for LP2 for an ezterior point a" defines a supporting hyperplane for pB that separates it from ale. Moreover, this hyperplane is given by H ( i· , 1) where i· is the corresponding optimal dual solution and 11". points away from pE .

We may now state the new LP-based procedure:

Belin Procedure LPFindFrame

Step o. If AU is empty, exit the procedure.

Step 1. Select a point a" E AU.

96 CHAPTER 5

Step 2. Determine if ale belongs to 'PE by solving LP3. If so, remove ale from AU, add ale to AN, and return to Step O.

Step 3. Generate the direction v E Rm normal to a hyperplane separating ale and 'PE, by setting v = 7r., the optimal dual solution to LP3.

Step 4. Calculate the maximum of the inner products (v, ap}j Val' E AU. Let Amax be the set of all points of AU which attain the maximum. Identify one or more extreme points of the set Am&X itself. Remove all such identified points from AU and add them to AE.

Step 5. Calculate the minimum ofthe inner products (v, ap}j Val' E AU U AE. Let Amin be the set of all points of AU U AE which attain the minimum. Identify one or more extreme points of the set Amin itself. Remove all such identified points which are also from AU and add them to AE.

Step 6. If ale E AU, return to Step 2. Otherwise return to Step O.

End Procedure

We propose that the procedure be initialized in the following manner. Find the vector in A with greatest norm, which is necessarily an element of :F (see Result 2 in [4]). Take the negative of this "max-norm" vector and use it as the right-hand side element of the linear program LP2. The resultant basic feasible solution, if unique, is composed of m more elements of the frame from Result 3. (If not unique select another right-hand side which is the negative of some other element of the frame until one is found which generates a unique optimum.) These m vectors contain the right-hand side in their positive conej therefore, applying Farkas' Lemma we may conclude that the m vectors in conjunction with the negative of the right-hand side vector constitute an affinely independent set of m + 1 vectors that positively span the space. Moreover, the convex hull of these vectors necessarily contain the origin (apply Stiemke's Theorem of Alternative). Note that this initialization scheme essentially identifies m+ 1 points from the frame of A, the convex hull of which is an m-dimensional simplex which contains the origin in its interior.

Notice that procedure LPFindFrame based on the linear program formulation LP3 is fundamentally different from the standard LP-based approach. Here we "build-up" the polytope. The procedure using LP3 generates linear programs that grow by one column every time a new vertex of'P is identified. In the case of LPl the size of the linear program starts at m by n-l and, if enhancements

Parallel Algorithm for the Convez Hull Problem 97

are implemented, the number of columns may be reduced by removing points that are discovered not to belong to the frame. Since the columns used in LP3 are always elements of the frame, the size of the final linear program is determined by the total number of extreme points of 'P and the size of each intermediate linear program is the total number of extreme points of pE. On the other hand, a difference which favors the approach based on LPI is the necessity of calculating and comparing inner product values in Steps 4 and 5. From our computational results in [3] we conclude that this difference is not enough to offset the advantages of the new procedure.

An important concern in the new method is the complication that arises from the presence of ties in Steps 4 or 5. Ties among three or more points are resolved by finding the frame of the points participating in the tie. However, finding just one element of this nested frame problem is sufficient to be able to proceed. A simple sorting as in "Preprocessor I" of [4] will yield such a point. The inclusion of a point in AE means that the current polytope changes its shape.

6 PARALLEL FORMULATION

We wish to consider the implementation of the new LP-based approach in a parallel MIMD environment in which several processors may work concurrently. Ideal algorithms for such an environment contain independent computational blocks, especially when such blocks are basically identical procedures operating on different data. Steps 2-6 of LPFindFrame are an instance of such computational blocks which can operate concurrently on different choices of points (lie from AU in Step 1.

A self-scheduling parallel algorithm can select a point for work by a processor needing work with only a small loss in asynchronicity. A single point-length array is sufficient to contain the current status of points with respect to membership in AN, AU, or AE. The updating of this array should be done under a lock condition, especially since this affects the point selection in the selfscheduling.

Another important issue is that the status of AU needs to be preserved at the time a point is assigned to a processor. If AU were to be altered as each processor identifies extreme points, the computation in Step 4 could yield erroneous

98 CHAPTER 5

results. A single point-length array local to each processor is sufficient for this purpose.

An obvious implementation strategy is to employ the same LP solver on each processor. Furthermore, the LP solver should employ the simplex method so that the dual variables used to define the normal to the separating hyperplane are readily available. If the LP solver is constructed carefully it will be possible to use only one or a few global copies of the original point data, instead of a separate copy for each processor. (Typically cache contention problems may arise when using several processors on some systems, and distributing a few copies uniformly among the processors may remedy this.) The status of variables will have to be kept in local arrays, corresponding to the status of AU preserved for each processor computation.

The posting of points in AE (identified as extreme) must be handled carefully. If processors were allowed to post extreme points and make the corresponding changes for all processors, difficulties could arise in the identification of optimality conditions for LP solver completion. We propose the following scheme for such posting. A global queue is kept and points newly identified as belonging to AE are enqueued with a global top of queue pointer updated under a lock, leading to a further small loss of asynchronicity. Each processor keeps its own local top of stack pointer. When a point is assigned to a processor, that processor updates LP solver information for all points between the local top of queue and the global top of queue before resetting the local top of queue to the global top of queue. All of the above also takes place under a lock.

7 TEST PROBLEM GENERATION

To obtain a suite of test problems we developed a problem generator. The input data is the desired number ofrows R (dimension) and columns C (points) and an upper bound P on the percentage of extreme points to be generated. A possible scheme which generates exactly the number of extreme points given by the upper bound is to generate CP /100 points uniformly distributed on the unit sphere and generate random convex combinations of those points for the remaining C-CP /100 nonextreme points. However we felt that this procedure generates problems which are too easy to solve and modified the above by stretching each of the points originally on the sphere by a random multiplier between 1 and 100 before generating the convex combinations of the stretched points. Finally, the barycenter is subtracted from each point and the resulting


Table 1 Actual extreme point percentage for the 27 problem test suite

ROWS COLUMNS

1 125 250 500 16% 12.8% 9%

5 24.8% 18.4% 14.8% 30.4% 26.4% 19.8% 18.4% 16.8% 14.2%

10 28.8% 29.2% 24.6% 44% 36.8% 34% 20% 18.8% 16.8%

20 36.8% 37.2% 30.4% 56% 49.2% 43.6%

points are ordered by distance from the origin. Characteristics of a test suite of 27 problems are given in Table 1 below.


We implemented the parallel algorithm in FORTRAN on the Sequent Symmetry S81 with 20 processors (each equipped with a Weitek 1167 floating-point accelerator), a shared memory MIMD processing environment. The LP solver employed was the XMP linear programming code written in FORTRAN by Roy Marsten. Only one copy of the global point data was used.

Based on our previous testing experience, the basic algorithm was modified to provide the most efficient variant. Step 5 of LPFindFrame was omitted as we had found it to identify few additional extreme points after the first few LP problems had been processed. Also, preprocessing based on simple sorting was incorporated after the initializing LP was solved to reduce the number of LPs needed in the parallel portion.

One run was attempted for each of the 27 test suite problems using 1 to 16 processors. The post I/O times using only one processor is reported in Table 2. These times were used as the base case in computing speedups for all runs with multiple processors. All times reported are wall clock times in seconds on the

100 CHAPTER 5

Table 2 Polt I/O times (wall clock) in seconds Cor the 21 problem test BUite

ROWS COLUMNS ! 125 250 500

8.5 29.34 89.11 5 10.22 33.46 111.57

11.61 42.18 131.95 25.21 90.43 272.14

10 33.69 136.42 436.31 42.78 146.11 497.35 63.23 377.52 1313.07

20 148.33 643.39 2011.55 193.98 706.91 2369.53

Sequent Symmetry S81. These runs were not made in a dedicated environment, and are thus subject to the varying influence of the rest of the system load.

The speedups appear to be good leveling off at roughly 10 to 12 processors while achieving speedups of 7 to 13. The figure of speed-up plots at the end of the paper shows speedup factors for each problem. It was not possible to complete runs for some of the larger problems with larger numbers of processors (typically 11 and above) since they required more memory than was available on the Sequent.


The primary motivation for this research has been to provide a resource for large scale applications which require finding the frame of the convex hull. Applications equivalent to the frame problem in data envelopment analysis routinely exceed n = 8,000 to n = 10,000 points over fewer than m = 20 dimensions. In these applications, it takes several hours to identify the frame applying the conventional methods of solving n linear programs with m rows and n - 1 columns. Similar situations exist in stochastic programming. In general, the state-of-the-art in techniques for finding the frame is such that the methodology limits the size of the applications can be addressed. As Wallace and Wets[T] state: "there is a lot to be gained by a more efficient implementation (of an algorithm to find the frame of the convex hull, than one based on solving L.P.'s)". Our investigations on parallelizing our new procedure based on solving


linear programs which begin small and increase progressively in size have shown that we can realistically expect a reduction equivalent to one order of magnitude in the solution times. These developments will permit the solution of problems previously considered too large.

REFERENCES

[1] Bertsekas, D.P., and J.N. Tsitsiklis, 1989, Parallel and Diltributed Com.putation, Prentice-Hall, Inc., Englewood Cliffs, NJ.

[2] Dula, J.H., 1993, "Designing a majorization scheme for the recourse function of two-stage stochastic linear programs," Computatio1l4l Optimization and Application" Vol. 1, No.4.

[3] Dula, J.H. and R.V. Helgason, 1993, "A new procedure for identifying the frame of the convex hull of a finite collection of points in multidimensional space," Tech. Report 93-1, Southern Methodist University, Dallas, Texas. Submitted to European Journal of Operational Re,earch.

[4] Dula., J.H., R.V. Helgason, and B.L. Hickma.n, 1992, "Preprocessing schemes and a solution method for the convex hull problem in multidimensional space," Computer Science and Operation' Re,earch: New Developmentl in their Interface" O. Balci, ed., Pergamon Press, U.K.

[5] Rosen, J.B., G.L. Xue, and A.T. Phillips, 1992, "Efficient computation of extreme points of convex hulls in !RIi," in P.M. Pardalos ed., Advance, in Optimization and Parallel Computing, North Holland, pp. 267-292.

[6] Wallace, S.W. and R.J-B. Wets 1989, "Preprocessing in stochastic programming: the case of uncapacitated networks," ORSA Journal on Com.puting, Vol. I, No.4, pp. 252-270.

[7] Wallace, S.W. and R.J-B. Wets, 1992, "Preprocessing in stochastic programming: the case of linear programs," ORSA Journal on Computing, Vol. 4, pp. 45-59.

[8] Wets, R.J-B. and C. Witzgall, 1967, "Algorithms for frames and lineality spaces of cones," Journal of Re,earch of the National Bureau of Standard.t-B Mathematics and Mathematical Physics, Vol. 71B, No.1 pp 1-7.

102

[IJ

Eo-..J ~ [IJ

.w C'lI: ~ ;;;J Q .w .w ~ [IJ

• " " "

" " "

i ~ :: ~

, , ,

.. I :

! ~ ~ ~ ~. ~

, , , ,

d._

'i \,

" ',,, '~ ..

,

I : ; i u • ~ . ~ ii - ",

, ,

,

-J , . i

'-

, s !

< • -~

~

c, \~ ..

"

~'. . :' ~

\ , '\

" " \\ '. ,

dnr-ds

'\', '~ ..

\'\,

" .;~ \ ,

~ , , ~

.;-1: ; ·~U ~,; ~ ~

d_,

I

i . ,,-,

\ \

\

\ " "

. i

1 -, ~ i

CHAPTER 5

" \': \ ,

... ~\ j i " GO] 1 ", ",

" '0

~ \ .j , \ \ \

j J ~ ~ \. , ! ; 2 ~

::: s . . d~p~s'"

~ ,',

" , ':-j

,', '\ .,

; ... ,. \ I -

'. i \ . l

\

I : \ , n ~ :il ~ •

.;;..."

6 A DYNAMICALLY GENERATED RAPID

RESPONSE CAPACITY PLANNING MODEL FOR SEMICONDUCTOR

FABRICATION FACILITIES

ABSTRACT

Kenneth Fordyce and Gerald Sullivan*

International Businel8 Machines, Inc. Mail Station 922,

Kingston, NY 12401

*International BTJ.8inel8 Machines, Inc. IBM Consulting Management Technologies Group

Burlington, VT 05401

Some key decisions faced periodically by a semiconductor facility management team are: (1) given a forecast demand how much capital and manpower should I assign or acquire for each tool center? (2) given a fixed amount of capital and manpower what output levels should I commit to? (3) what if yield improves or degrades? (4) what if I increase or decrease starts? (5) do I have enough capacity to meet an emerging opportunity and how long will it take me to increase lineouts? (&) where are potential bottlenecks? and (7) what if I add or subtract capacity? All of these questions fall into the general area of capacity planning: Given a specified set of capital, manpower, and output requirements: (1) do I have enough capacity to meet requirement output? (2) what "contingency" capacity is available? and (3) what is the transition profile I can expect to see when I change from one state to another?

As the pace of change in chip manufacturing increases, the management team needs To assist with capacity planning questions the management team needs a capacity planning decision support system which can (a) dynamically generate the appropriate model from text or worksheet files describing the manufacturing :flow, tool profile, and output goals and a dynamic link to the floor control system to obtain work in progress and tool status, and (b) rapidly solve the model. This paper provides an overview of a capacity planning decision support system called ROSE and detailed description of the parallel goal programming model used by ROSE. ROSE is part of suit of tools to assist production planning and scheduling. Special emphasis is

103

104 CHAPTER 6

placed on the integration of standard ideas from graph theory, computer science, and operations research to develop a tool to meet the needs of the decision maker. It is our observation this integration is critical for the successful deployment of Operations Research.

1 INTRODUCTION

Planning and scheduling in manufacturing is a very broad topic with an extensive academic and application base that cuts across various disciplines such as decision support, computer integrated CIM, AI, statistics, math programming, queueing, and simulation. Typically production planning and scheduling is divided into three or four decision tiers that cover decisions from determining where the facility will be five years in the future to assigning a lot to a tool. We use four decision tiers: strategic, operational, tactical, and dispatch (Appendix A). As with any classification scheme the edges are always grey, not black and white. A bulk of the work in planning / scheduling for the semiconductor manufacturing has been focused on tier 3-tactical and some work in dispatch (See Lee et al (1991) and Fowler 8£ Robinson (1994a) for a review and Leachman (1993) for a tactical planning implementation). There is less in the literature about tier 2 (operational), and in particular the topic capacity planning. Work in this area is generally falls into spreadsheets (financial models or deterministic simulations) and monte carlo simulation (for example Dayhoff 8£ Atherton 1987 and Miller 1990). Both are fine tools, but have inherent limitations. Leachman and Carmon (1992) have proposed a clever capacity planning model using Linear Programming that involves a set of approximate constraints and uses a proportionality assumption for process times.

As the cost of semiconductor tooling increases and the time between ordering and receiving a tool increases, capacity planning becomes increasing important. Since the capacity planning questions are "ill-structured," numerous and rapid "what iffing" is critical for the management team to develop a deep understanding of the problem and then come up with a solid "game plan." Therefore the tools that support this decision must contain traits that are often in conflict: sophistication, easy to use, and rapid response.

This paper describes a goal programming capacity planning model which uses software and network technology to dynamically generate the model, and techniques from directed graphs are then applied to partition a large problem into

Capacity Planning for Semiconductor Fabrication 105

set of smaller independent problems. Such a partitioning positions the "solver" to take maximum advantage of a large grain parallel machine.

The model is part of a capacity planning tool called ROSE (see Appendix B). which integrates various decision technologies from operations research, artificial intelligence, statistics, and decision support systems to support both steady and transient state analysis of capacity. ROSE is part ofthe ongoing work ofthe IBM Burlington Industrial Engineering and the IBM Consulting Management Technologies team to improve "planning and scheduling" in Semiconductor facilities by building tools and systems to support all four decision tiers (Fordyce et al. 1992a and 1992b).

2 A BRIEF REVIEW OF PRODUCING

MICRO-ELECTRONIC CHIPS

The process begins with 25 pure, thin, and circular (8 inches in diameter) slices or wafers of silicon in a group called a lot. Circuits are built as follows.

Through an oxidation process a protective covering of oxide is grown on the wafer. Next, it is coated with a light-sensitive material called a photoresist. A mask is precisely registered over the wafer and ultraviolet light is projected through the mask onto the wafer, causing the photoresist to harden under clear areas of the mask. The image is developed by washing away the unexposed photoresist. Then the wafer is put in an acid bath. The acid passes through the holes in the photoresist and etches similar holes in the oxide layer. Then the remaining photoresist is stripped off.

Controlled amounts of elements such as phosphorus or boron are introduced into the holes in the oxide through diffusion or ion implantation. In diffusion, wafers are placed in high temperature furnaces with the elements to be diffused through the holes in the oxide down into the silicon beneath. The temperature of the furnace controls the depth and concentration of the diffused materials. In ion implantation, dopant atoms are accelerated to a high energy. These atoms strike the wafer and are embedded at various depths, depending on their mass and energy. These "extra atoms" typically have one more or one less electron in their "outer shell" than silicon, they "squeeze" in by giving up an electron (n-type) or taking on an electron (p-type). The processes of oxidation, photolithography, and hot process (diffusion or ion implantation) are repeated many times until thousands of circuits are built into each wafer.

106 CHAPTER 6

In the metallization phase, the "wires" connecting the transistors are put in. The process that applies this wiring is evaporation. Pellets of aluminum and copper are placed in a chamber with a dome holding the wafers. The air is pumped out of the chamber, and the pellets are heated by an electron beam which causes them to evaporate. The evaporated metal clings to the entire surface of the each wafer. Selective acid etch removes the unwanted metal, leaving only micro miniature wires to connect components. Metals are done once or twice.

The manufacturing flow is best represented by a folded serial or re-entrant flow line (see Figure 1). From the wafer's perspective, it is produced by following a specific sequence of unique operations without any option for variation ("serial line"), except for rework loops. From the tooling center perspective, each wafer makes approximately numerous passes, or iterations, through one of its tools. Each set of tools handles all the activity across all the iterations for that step. Each tool can handle a variety of tasks, and is reconfigured (set up) to handle different wafer types at the specific iterations. Major tool centers are often viewed as a job shop with jobs arriving in a random manner.

3 THE REQUffiED FACT BASES

To illustrate we use an example production environment that manufactures two products (LION and TIGER), where the number of operations for each product is the same.

We start with two (one for each product) process specification or operation tables (Tables 1A and lB). They contain information about each operation: the product it belongs to, its sequence position, the tool center it uses, the average raw process time per batch, average batch size, average minutes per wafer, yield, yield adjusted average work load, and minimum required capacity. The tables are representative of the text or worksheet files used by manufacturing people to describe the manufacturing flow. ROSE dynamically generates the equations or tableau needed by the goal programming solver from these tables.

o P I D is the operation index position. The last operation in each product (operation 6) is the inventory for completed wafers or lineouts. Every lot moves sequentially through these six operations.


entry of wafer after being sliced and cleaned

1 IMPRINTING ELECTRONIC CIRCUITS

LAYER ..... LAYER ..... LAYER ..... LAYER 1 f 2 f ... f 10

--- f --- f --- f ---1 1 ! 1

I oper 11 PXIDATloNTOOL~ENTE1 loper 281

1 1 1 1

\ oper 21 PHO~RAPHY ~OOL C4NTER loper 29\

1 1 1 1

\ oper 31 DI~FUSION I ION TO€L CENtER loper 301

1 1 ! 1

start f start f start f send next f next f next f to layer ..... layer ..... layer ..... wiring

1 I entry into wiring and then into module I

Figure 1 Ma.nufacturing Flow .... a Folded Serial Line

108 CHAPTER 6

Table lA: PRODUCT LION (1)

OPID TCRPT BS MPW YLD AWL MRC

1 707 30 25 1.2 1.00 226 271 2 724 100 50 2.0 0.98 226 678 3 706 450 100 4.5 0.98 221 995 4 732 35 25 2.4 0.97 217 521 5 713 125 25 5.0 0.95 211 1055 6 INVEN 0 NA NA NA NA NA

production goal is 200 wafers per day

Table 1 B: PRODUCT TIGER (2)

OPID TCRPT BS MPW YLD AWL MRC

1 708 35 25 1.4 0.98 117 164 2 725 200 100 2.0 1.00 114 228 3 733 20 25 0.8 1.00 114 91 4 706 330 100 3.3 0.92 114 376 5 713 105 25 4.2 0.95 105 441 6 INVEN 0 NA NA NA NA NA

production goal is 100 wafers per day


TC is the tool center that supports the activities associated with an operation. Tool centers are made up of tools or machines. This information is kept in the tool center fact base (Table 2A). Note the tool centers overlap in a nonexclusive manner. For example, tool centers 707 and 708 have tools common to both (A225 and A226), and tools unique to each (A223 and A224 are only in 707, A227 and A229 are only in 708). This is often the result of tools of different "vintage" or a partial dedication strategy (Fowler and Robinson 1994b).

Tool centers serve as a convenient intermediaries. The key piece of information for our model is a list of tools which can handle a specific operation, provided in Table 2B. If two tools have identical properties then we can combine then into one tool and double the capacity which reduces the size of the LP problem.

An alternative method of specifying the link between a tool and an operation is the use of a Boolean matrix. There is one row for each operation and one column for each tool. The OT Lij,1c (operation-tool-link, i for product, j for operation, Ie for tool) cell gets a 1 if the tool is permitted to service the operation, else a O. For example OTL12,3 = 0, since tool A225 can not service operation 2 for product 1. OTL22,9 = 1, since tool B1l3 can service operation 2 for product 2. The OT L matrix is shown in Table 2C.

RPT is the average time it takes to do one batch at an operation. BS is the average batch size (in wafers, remember 25 wafers to a lot, and a lot can not be divided up) for this operation. This information is obtained from manufacturing engineering and the manufacturing data bases. M PW is the minutes per wafer required to process a wafer at this operation. This is a calculate as RPT divided by BS. WPM is wafers per minute and is the reciprocal of MPW.

Y LD (operation yield) is the proportion of the wafers that successfully complete the operation after starting it and move on to the next operation. For example, if Y LD is 0.95 and we run a batch of 100 wafers through the operation, then on average 95 of them will successfully complete the operation.

AWL is the yield adjusted average work load. This is the number of wafers an operation must process on average per day to enable the line to meet its daily target outs goal. For the LION product the lineouts goal is 200 good wafers per day out of operation 5 and into inventory (operation 6). Since the yield at operation 5 is 0.95, then on average operation 5 must process 210.53 (= 200 / 0.95) wafers to obtain 200 good wafers to pass to inventory. Since operation 5 requires 210.53 wafers per day to process and the yield at operation 4 is 0.97, then on average operation 4 must process 217.04 (= 210.53 /0.98 = 200 / (0.97

110 CHAPTER 6

Table 2A: Tool Center Fact Base

TOOL CENTER MEMBER MACHINE

706 Y323 Y324 Y326 707 A223 A224 A225 A226 708 A225 A226 A227 A229 713 W101 W102 W103 W104 W105 W106 724 BIll B112 B113 725 B 111 B 112 B 113 B 114 B 115 732 C821 C822 C833 733 C824 C825 C836

Table 2B: Tool Center Fact Base OPERATION TOOL HANDLE OPERATION

1,1 A223 A224 A225 A226 1,2 BIll B112 BIB 1,3 Y323 Y324 Y326 1,4 C821 C822 C833 1,5 WI0l W102 W103 WI04 W105 W106

2,1 A225 A226 A227 A229 2,2 BIll Bl12 BI13 B114 B115 2,3 C824 C825 C836 2,4 Y323 Y324 Y326 2,5 WI01 WI02 W103 WI04 WI05 WI06


Table 2C: Tool - Operation Link Boolean Matrix (divided into two parts for space reasons)

TOOL OP I 1 2 3 4 5 6 7 8 9 10 11 12 13 ID! A223 A224 A225 A226 A227 A229 B111 Bl12 Bl13 Bl14 Bl15 C821 C822 ---------------------------------------------------------------------11 ! 1 1 1 1 0 0 0 0 0 0 0 0 0 12 ! 0 0 0 0 0 0 1 1 1 0 0 0 0 13 I 0 0 0 0 0 0 0 0 0 0 0 0 0 14 I 0 0 0 0 0 0 0 0 0 0 0 1 1 15 I 0 0 0 0 0 0 0 0 0 0 0 0 0 21 I 0 0 1 1 1 1 0 0 0 0 0 0 0 22 I 0 0 0 0 0 0 1 1 1 1 1 0 0 23 I 0 0 0 0 0 0 0 0 0 0 0 0 0 24 I 0 0 0 0 0 0 0 0 0 0 0 0 0 25 ! 0 0 0 0 0 0 0 0 0 0 0 0 0

TOOL OP ! 14 15 16 17 18 19 20 21 22 23 24 25 26 ID I C824 C825 C833 C836 W101 WI02 W103 W104 W10S W106 Y323 Y324 Y326 ---------------------------------------------------------------------11 I 0 0 0 0 0 0 0 0 0 0 0 0 0 12 I 0 0 0 0 0 0 0 0 0 0 0 0 0 13 I 0 0 0 0 0 0 0 0 0 0 1 1 1 14 I 0 0 1 0 0 0 0 0 0 0 0 0 0 15 I 0 0 0 0 1 1 1 1 1 1 0 0 0 21 I 0 0 0 0 0 0 0 0 0 0 0 0 0 22 I 0 0 0 0 0 0 0 0 0 0 0 0 0 23 I 1 1 0 1 0 0 0 0 0 0 0 0 0 24 I 0 0 0 0 0 0 0 0 0 0 1 1 1 25 ! 0 0 0 0 1 1 1 1 1 1 0 0 0

112 CHAPTER 6

x 0.95)) wafers per day to obtain 210.5 good wafers to pass to operation 5.

5

AWLi,i = 200/ II YLDi,i i = 1,2, i = 1, ... ,5. T=j

Minimum required capacity (MRO) converts the average daily work load in wafers to a minimum number of minutes of tooling required: MRC = AWL x M PW. If a manufacturing operation has just enough tooling capacity allocated to it to meet AWL, then it has no contingency to catch up if it falls behind. Typically, the capacity required is MRC plus some "safety" capacity.

Note in establishing the values for Table lA and 1B we have implicitly assumed limited variation in processing time or yield between tools handling the same product-operation cell or the differences are in the same proportion across all product-operation cells handled by the tool mix. This is similar to the assumption made by Leachman and Carmon (1992). An alternative formulation that does not require this assumption is provided later.

AM C is the available machine capacity per day in minutes. Typically, we have a table that tells us what portion of the day each machine is available on average. If machine k is available 90% of the day, then AMCk = 0.9 x 1440

Using the gateway facility in LMS (Fordyce et aI. 1992b) we can link to the manufacturing floor control system to obtain in real time the lots at each operation. An example is provided in Table 3. This information is used in the transient solver, but not the steady state goal programming model.

4 STEADY STATE CAPACITY ANALYSIS MODEL

Let us begin by defining the indices, variables, and equations. Indices are: i for product; j for operation; and k for tool. MRqj is the minimum required average capacity, in minutes, for each operation. C Roj is the required capacity to handle peaks in demand. For this example CRo; = 1.2 x MRCij. AMCk denotes the average minutes of capacity available for each tool. OTLijk is 1 if tool k can service operation j for product i, else O. Xi;Ic, the decision variable, is the minutes oftool k assigned to operation j for product i. Xi;" is subject to two restrictions: (1) 0 < Xijlc ~ AMC" for all i,j, k, and (2) Xijk = 0, when OTLijk = O. SIc is the unassigned or unused portion of machine k. It is the

Capacity Planning for Semiconductor Fabrication

Table 3: Lots in Process

LOTIO PRODID OPERATION

XXXOOI LION 1 XXXOO2 LION 1 XXXOO3 LION 1 XXJOO5 LION 4

000

XX L002 TIGER 5 XXLOO3 TIGER 5

slack variable in

2 5

LLOTLi;IcXi;" + SIc = AMCIc , Vlt. i=l;=l

113

LOT SIZE

25 25 25 22

24 21

Nonnegativity of SIc insures that no more than 100% of machine is assigned, and SIc is the unused capacity of the machine.

26

L OTLi;"Xi;1c + (UI; - 0:;) = CG!; Vij "=1

The above equation reflects our desire to meet capacity goal I (CG!;) byalloeating time from the machines to the operations in such a way that we at least meet these requirements. If U!; is positive we are "under" capacity. There is not enough capacity to meet the specified requirements for operation ij. If O!; is positive we are "over" capacity. There is at least enough capacity to meet requirements for operation ij. For our example we will have two capacity goals: MRCand CR.

How do we answer the capacity question? We solve the following linear goal programming problem. Our first pre-emptive priority is to insure we meet capacity goal 1 (MRC) by driving Ui} to o. OUI second pre-emptive priority

114 CHAPTER 6

is to insure we meet capacity goal 2 (CR) by driving u,~ to O. Our goal programming problem is formally specified as:

min

subject to

2 5

L L(P1 xU,}) + (P2 x U,~) ,=li=l

2 5

LLOTLX'i" +S" = AMCIc Vk ,=1 i=l 26

(1)

(2)

L OTLX'i/c + U,} - ali = CGti = MRC'i Vij (3) "=1 26

L OTLX'i" + U{; - O;i = CG~j = CR;j Vij (4) "=1

It is straight forward to extend the model to handle a variety of capacity goals. For example, we sometimes use four capacity tiers: (0.9 x MRCii), (1.0 x MRCii ), (1.2 x MRqi) and (1.5 x MRCij ). Details on the goal programming solver are provided in Fordyce, Hannan, and Sullivan 1991.

Similar to the Leachman and Carmon (1992) this model assumes limited variation in processing time or yield between tools handling the same productoperation cell. An alternative formulation which handles substantial differences differences in production time (M PW minutes per wafer) and/or yield (Y LD) between tools or machines handling the same product-operation cell is provided in Appendix E.

Remember that one criterion was speed. Each product has between 250 to 350 operations, therefore the goal programming quickly problem becomes very large! With just two products, each with 300 operations and 400 tools, we have 240,000 (600 x 400) decision variables (Xii")' 400 slack capacity variables (Sic), 2400 under and over (Ui}, ali' u,~, ali) variables, 400 capacity constraint equations, 600 capacity 1 goal equations, and 600 capacity 2 goal equations.

A key characteristic of the production process permits a large problem to be divided into a set of much smaller independent problems. Only a small number of operations "contestn over the use of a given machine or tool. For example, there are no tools that operation 1,1 and 1,2 are both capable of using. Therefore we divide oUI operations into a set of groups, such that there is no tool that can be used by more then one group. For oUI example, the groups are:


1. 11 and 21-tools used by this group are: A223 A224 A225 A226 A227 A229

2. 12 and 22-too1s used by this group are: BIll B112 B1l3 B114 B115

3. 13 and 24-tools used by this group are: Y323 Y324 Y326

4. 14-tools used by this group are: C821 C822 C833

5. 23-tools used by this group are: C824 C825 C836

6. 15 & 25-tools used by this group: WI01 WI02 WI03 WI04 WI05 WI06

In the example, instead of solving one large model, we can solve six smaller, independent models. The formulation of the first smaller model is provided in Appendix C. Therefore this formulation is well suited for a large grain parallel machine.

The last piece of the puzzle is automating the identification of the smaller independent models. This is accomplished using the information in Table 2C and techniques from directed graphs.

The key step is building a Boolean matrix which identifies all potential contention for tools between operations, called the reachability matrix. We can determine this using the information in Table 2C and the repetitive use of logical inner product (the logical operation "or (V)" substitutes for addition and the logical operation "and (A)" substitutes for multiplication). The program GENJND_MAT (listed in Appendix D) carries out the necessary steps and the results are shown in Table 4A. If a cell is I, then there is contention between the operation in the row and the column for at least one tool.

We then convert this information into the format presented in Table 4B. In this format, each operation has one row for a list of all operations with which it contents for resources is listed in the row. Computationally we can store this as a vector ofvedors: (11 21) (12 22) (1324) (14) (1525) (11 21) (1222) (23) (13 24) (15 25) The last step is to find the unique groups by analyzing the vector ofvectors. In our case there are 6: (11 21), (12 22), (1324), (14), (15 25), and (23). See Fordyce, Jantzen, Morreale, and Sullivan (1991) for more details.

The actual deployment of the parallel model on the SPl/SP2 can be handled three ways:

1. low level use of communication routines with QUAD NA

116 CHAPTER 6

Table 4A: Reachability or Contention

OPER OPER 11 12 13 14 15 21 22 23 24 25 ----- -----------------------------------------

11 1 0 0 0 0 1 0 0 0 0 12 0 1 0 0 0 0 1 0 0 0 13 0 0 1 0 0 0 0 0 1 0 14 0 0 0 1 0 0 0 0 0 0 15 0 0 0 0 1 0 0 0 0 1 21 1 0 0 0 0 1 0 0 0 0 22 0 1 0 0 0 0 1 0 0 0 23 0 0 0 0 0 0 0 1 0 0 24 0 0 1 0 0 0 0 0 1 0 25 0 0 0 0 1 0 0 0 0 1

Table 4B: Reachability or Contention

OPER OPERATIONS

11 11 21 12 12 22 13 13 24 14 14 15 15 25 21 11 21 22 12 22 23 23 24 13 24 25 15 25

Capacity Planning for Semiconductor Fabrication

2.000

1.800

1.600

~ 1.400 u

~ 1.200 u ~ 1.000

~ 800 5 ~ 600

400

200

COMPARISON REQUIRED AND AVAILABLE CAPACITY from capacity planning model

0-716 0- 717 0- 718 0- 719 OPERATIONS

Figure 2 Required Versus Allocated Capacity

2. high level communication with shared variables

3. defined function called parallel each

117

ROSE provides a variety of ways for the user to examine the results of the model. One common scenario is to run the model with increased starts and lineouts and then to look for potential bottlenecks by asking for a report on all operations with allocated capacity less than C Hoj, or a graphical representation comparing required capacity versus allocated capacity (Figure 2).

5 SUMMARY

As the cost of semiconductor tooling increases and the time between ordering and receiving a tool increases, capacity planning becomes increasing important. Since the capacity planning questions are "ill-structured," numerous and

118 CHAPTER 6

rapid ''what ifling" is critical for arriving at a solid solution. The model which supports this decision must deal accurately with complex trade-offs and have a short development and solution time. Additionally, it would be beneficial if the tool had the ability to search for alternatives and deal "automatically" with complex trade-offs, then to passively describe the outcome of a user proposed solution. By combining goal programming with some of the latest advances in computer technology, and by exploiting the "nature of the production flow," we developed a model which provides a good start on meeting these criteria. This model is part of a capacity planning decision support system called ROSE (Appendix B).

Additionally, much of the parallel work in decision technology is repeated runs of the same model with different parameters or random numbers. This is an example, albeit limited, of using the inherent problem structure and software engineering tools to dynamically decompose the problem into parallel components.

APPENDIX A

DECISION TIERS

Within the complex environment of semiconductor manufacturing, four related decision areas or tiers can be distinguished based on the time scale of the decision window. For related views or more detail see Kempf, Chee, and Scott 1988, Fordyce et al. 1992a, Leachman 1993, and Gray and Kabbani 1994.

The first decision tier, strategic scheduling, concerns a set of problems that are six months to seven years into the future. Here decisions are made about the impact of changes in the product line, changes in the types of equipment available, changes in the manufacturing processes, changes in the availability of workers, and so forth.

The second tier, operational scheduling, considers the next few months to two years. Here decisions are made concerning changes in demand for existing


products, the addition or deletion of products, capital purchases, manpower planning, changes in manufacturing processes, and so forth.

The third tier, tacticallcheduling, deals with problems the company faces in the next day to six months. Here decisions are made about scheduling starts into the manufacturing line, estimating delivery dates for orders, deciding on daily going rates, how much overtime is needed, last minute capital purchases, operator training, corrections in manufacturing processes, machine dedication, The impact of yield curves, phasing in the manufacture of new products.

The fourth tier, dupatch Ichedv.ling or ,hart interval ,chedv.ling (SIS), addresses the problems of the next hour to a few weeks. Dispatch scheduling decisions concern monitoring and controlling of the actual manufacturing flow or logistics. Here decisions are made concerning trade-offs between running test lots for an change in an existing product or a new product and running regular manufacturing lots, lot expiration, prioritizing late lots, positioning preventive maintenance downtime, adjusting run lengths of product with the same setup to reduce total setup time, production for down stream needs, simultaneous requests on the same piece of equipment, preferred machines for yield considerations, assigning personnel to machines, covering for absences, and reestablishing steady production flow after a machine has been down.

APPENDIX B

OVERVIEW OF ROSE

ROSE consists of the following components:

1. BUILD TIME: provides the vehicle to specify the model and execute it.

• HOOKS: enables the user to access the manufacturing specification files which govern the flow of the manufacturing line and the historical data base of manufacturing line activity; and then put the information into a format specified by the user. This component is used to

120 CHAPTER 6

obtain the manufacturing flow, current WIP (work in process) levels at each operation, estimates of RPT (raw process time), batch size, etc. needed to "feed" the solvers.

• SPECIFY: enables the user to describe or specify the model parameters in an easy to use text processing fashion. From a modeler's point of view these are the model parameters. From the user's point of view these are the specification of the manufacturing line. In SPECIFY the user can build an initial manufacturing line specification using hooks, and the modify individual elements manually.

• REGEN: dynamically regenerates the model based on the description of the model specified in a set of text files. This is a component the user never sees, but is critical to ease of use and rapid "what iffing."

2. SOLVER: solves the question (for example what ifl increase starts) about capacity in steady state and the transition from one state to another state.

• SANITY CHECKER

• SS-SOLVER: is the the steady state solver to determine on average how much contingency capacity is available at each tool center (defined in detail later).

• TS-SOLVER: is the transition state solver. This provides an estimated profile of the line for each day as it moves from one steady state to another after a change (increasing starts, adding capacity, increasing yields, etc) is made. For more details on TS-SOLVER see Appendix F or Fordyce et al. 1992a.

3. VIEW: Graphics and queries to explore the model results and imbed them inside of reports and presentations.

APPENDIX C

FIRST SMALL MODEL


For the first model to solve we have two operations (11 and 21 which we will index 1 and 2) and 6 machines (A223 A224 A225 A226 A227 and A229 which we will index 1-6). The i index refers to the operations and the j index refers to the machine. Therefore we have 12 decision (X variables), 6 slack variables, and 24 deviation variables.

min (PI x (U1 + U2» + (P2 x (UU1 + UU2» equation (1) Capacity limitation equations (2): Xll + 81 = 720 X12 + 82 = 720 X13 + X23 + 83 = 720 X14+ X24+ 84 = 720

X25+85 = 720 X26+86 = 720

First capacity allocation equation (3): Xll + X12 + X13 + X14 + U1 - 01 = 1.0 x 271 X23 + X24 + X25 + X26 + U2 - 02 = 1.0 x 164 Second capacity allocation equation (4): Xll + X12 + X13 + X14 + UU1 - 001 = 1.2 x 271 X23 + X24 + X25 + X26 + UU2 - 002 = 1.2 x 164

APPENDIX D

CODE TO CALCULATE REACHABILITY MATRIX

Assume the Boolean matrix TABLE2C exists with the corresponding information. The the following expression will generate the information in TABLE 4A.

TABLE4A~GENJND_MAT TABLE2C. [0] Z ~ GEN_IND_MAT XjJKjZPRIOR

[1] % finds all (reachability) linkages boolean approach [2] Z ~ X V • /\ ,pZ [3] % OR_DOT_AND and OR_DOT_AND5RANS faster equivalents [5] % V./\ is logical matrix multiply or inner product

122

[6] % rP creates the transpose (on side) of matrix [7] LI0: [8] ZPRlOR+-Z [9] Z +- Z V (Z V./\ Z)

[10] -+ ( ...... ZPRIOR == Z)/L10 [11] % repeat process until no new information added to Z

CHAPTER 6

APPENDIX E

HANDLING VARIATIONS BETWEEN TOOLS

To adapt our steady state capacity analysis model formulation to handle substantial differences in production time (M PW, minutes per wafer) and/or yield between machines handling the same product-operation cell, we start by extending minutes per wafer (M PW) to be specific to the tool as well as the product-operation. requires extending M PW to be specific to the tool as well as the product-operation. M PWi;k is the minutes per wafer for operation j of product i. To account for differences in yield we create YAM PW -yield adjusted wafers per minute. Y LDi;_ is an average estimated yield for this operation across the various machines that can handle this work. Y LDi;k is the yield specific to the machine. If one machine at a specific product-operation cell can produce 1 wafer in 10 minutes and its yield is 0.9, then it can produce 0.9 good wafers in 10 minutes and 1 good wafer in 11.11 (10 / 0.9) minutes. If the second machine at the same production-operation cell can produce 1.3 wafers in 10 minutes and its yield is 0.6, then it can produce 0.78 good wafers in 10 minutes and 1 good wafer in 12.82 (10 / 0.78) minutes. Therefore if we use the second tool we will have to produce substantially more wafers to have the same number of good wafers if we use the first tool. Instead of adjusting the required output for the variation in yield, we can simply increase the time it takes to produce a wafer. Therefore:

( YLDi ;_) YAM PWiik = M PW'ik X Y LD,;k .

YAM PWi;k refers to the average number of minutes required from tool Ie to produce one wafer at operation ij. YAWP M'ik is its reciprocal, the average


number of wafers produced per minute by tool k at operation ij. Xii'" the decision variable, is the minutes of tool k assigned to operation j for product i. The number of wafers produced at an operation by a specific tool is YAWPMiii: x Xiii:.

A W Lij refers to the yield adjusted average work load in number of wafers that have to be produce at operation ij to meet line out requirements. We refocus the capacity allocation goals to target work load (TWS).

26

LYAWPMiii:OTLiii:Xiilc + (UI; - Ofj ) = TWL~j Vij "=1

The above equation reflects our desire to meet capacity goal I (TW L~j) by allocating time from the machines to the operations in such a way that we at least meet these requirements (target number of wafers produced). If ulj is positive we are "under" (not enough) capacity. If of; is positive we are "over" capacity. In this example we will have two capacity goals: 1.0 x A W Li; and 1.2 x A W Lii. The revised formulation is:

:I 5

mm LL)Pl x ui}) + (P2 x Ui~) i=l i=l

2 6

s.t. L L OTLiji:Xiji: + S1& = AMCi: Vk i=l ;=1 26

LYAWPMij1&OTLi;1&Xij1& + Ui~ - otj = TWL;i = 1.0AWLij Vij 1&=1

26

L YAWP Miii: OT Lij1& Xiii: + Ui~ - O?j = TW L?i = 1.2A W Lii Vij i:=1

APPENDIX F

TRANSIENT SOLVER

124 CHAPTER 6

As an illustration, assume we are interested in increasing the output (lineouts) for our memory product from 200 wafers to 250 wafers. After working with the STEADY STATE SOLVER we have established a tool plan to handle this increase in output in "steady state." But what about the transition from 200 wafers to 250 wafers? Do increase starts gradually or all at once? When do we start to increase lineouts? Will temporary bottlenecks emerge?

The traditional method is to do an extensive Monte Carlo simulation of the line, but this does not meet the rapid response requirement. To generate a "transition profile" quickly we use a modified version of the Daily Output Planning System (DOPS) (Fordyce, Gerard, Jesse, Sell, and Sullivan 1992) to carry out "deterministic" simulation and get a "ballpark" view of the transition profile. DOPS was originally designed to address the question how many wafers or lots should each operation plan to process in one day to meet immediate demand and position the line to meet tomorrow's demand. The key equation is:

EWij + (U Nij - Olii;) = TWi;

Here, EWij = end wip in wafers at operation i for product j. (U Ni ; = amount end wip is under or below target wip, ov = amount end wip is over or above target wip, tw = target wip at operation i for product j.

The solver generates recommended outputs (in wafers) to minimize how much the end wip (EW) is below the wip target (TW) without violating any constraints (described in the next paragraph). Wip targets for operations at the end of the manufacturing line have a pre-emptive priority over wip targets for earlier operations. Since UN = max{O, TW - EW}, the solver works from the end of the line to the beginning to minimize UN, where minimizing U Nii has a having a pre-emptive priority over minimizing U Ni ,; -1'

If we fail to meet a wip target at operation ij, then we add the value for U Ni ;

to TWi,i-1. If we exceed a target wip at operation ij, then we subtract the value for ov from TWi,;_l' We call the revised wip target RWT. The simple formula for RWT is: RTWi ,i-1 = TWi,i-1 + U Nij - Oliij' A dampening heuristic is used to smooth the "in flight" changes in wip target.

In our example the solver would first try to move wafers down stream to build ending wip levels at LION operation 6 to 200 and TIGER operation 6 to 100. Depending on how successful the solver is in making the wip targets for operation (6,1) and (6,2); it will adjust wip targets for operation (5,1) and (5,2). Next it would look to move wafers down stream to build ending wip levels at LION operation 5 to the revised wip target RWT(5, 1) and TIGER operation 5 to RWT(5,2).


From the manufacturing flow we have the following constraints.

1. A lot may not skip an operation and it may only move down stream.

2. Operation ij can not process lots that are not within its puUlimit range.

3. We can not exceed the available capacity at any tool center Ie.

4. We can not process more than MAXOUT wafers at an operation per day.

5. We would like to process at least MINOUT wafers at an operation per day.

6. Wafers are launched as groups of 25 called a lot. Due to yield loss the number of wafers in a lot may be reduced during the lots travel through the line. A lot always stays together. Therefore lots move, not partial lots, not wafers, not partial wafers.

There is both a goal programming solution and a heuristic or artificial intelligence solution to this question. For our transition requirement we used the heuristic solver (HS).

To use HS as a deterministic simula.tion tool we built a control mechanism to run it for multiple days where the end wip of day i is the starting wip for day i + 1; lineout or output requirements can vary from day to day and be dynamically set based on the lineouts performance of prior days; and new lots are started based on the rules required for the situation being modeled. Figures Fl gives example graphs that depict the wip profile of the line on day 2 of its transition from 200 wafer lineouts to 250 wafer lineouts.

REFERENCES

[1] Dayhoff, J. and Atherton, R. 1987, "A Model for Wafer Fabrication Dynamics in Integrated Circuit Manufacturing," IEEE 7ransactions on System., Man, and Cybernetic., Vol. SMC-17, No.1, pp. 91-100.

[2] Fowler, J. and Robinson, J. 1994a, "Measurement and Improvement of Manufacturing Capacity (MIMAC) Project Bibliography," SEMATECH, 2706 Montopolis Drive, Austin, Texas 78741-6499

[3] Fowler, J. and Robinson, J. 1994b, "Control of Workcell with Sequence Dependent Setups," TIMS/ORSA, Boston, April 29, 1994.

126

4,000

3,500

111 3,000 II:: w ~2,500 z ~2,ooO ILl > ~ 1,500 a..

i: 1,000

EWIP VS TWIP FOR DAY 2 LION DGR OF 300 XF OF 4

• ii;

CH APTER 6

mEND WIP

ClTAA WIP

500 o~m ~r1IililIJ!f'I!11 .J!.EUJ: ~ ~!fUllii . ~~I~"-4"l-: rn:.rMl-..lilil..--,l

311 439 463 543 621 719 845 GATES

CUM EWIP VS CUM TWIP FOR DAY 2 LION DGR OF 300 XF OF 4

16,000

14,000 .....

.....

12,000 -CUM END WIP III

- CUM TAR WIP IE w ~ 10,000 :a:

..... '-!: 8,000 \ ...J ILl "-~

6,000 " ...J

a.. i:

4,000

2,000

0 311 439 463 543 621 719 845

GATES

Figure F,3 WIP and DGR Profile for Day 2


[4] Fordyce, K., Hannan, E., and Sullivan, G. 1991, "A Goal Programming Algorithm Using a Smart Selection of Entering and Leaving Variables Modification to the Simplex Algorithm," IBM, MS 922, Kingston, NY 12401.

[5] Fordyce, K., Gerard, B., Jesse, R., Sell, R., and Sullivan, G. 1992a, "Daily Output Planning: Integrating Operations Research, Artificial Intelligence, & Real-time Decision Support," Ezpert Sylltem Applications, Vol. 5, pp. 245-256.

[6] Fordyce, K., Dunn-Jacobs, R., Gerard, B., Sell, R., and Sullivan, G. 1992b, "Logistics Management System (LMS): An Advanced Decision Support System for the Fourth Decision Tier Dispatch or Short Interval Scheduling," Production and Operations Management, Vol. 1, No.1, pp 70-86.

[7] Fordyce, K., Jantzen, J., Morreale, M. and Sullivan, G. 1991, "Using Boolean Matrices or Integer Vectors to Analyze Networks," APL91 Proceedings, editor Jan Engel, APL quote quad., Vol. 21, No.4, pp. 174-185.

[8] Gray, D. and Kabbani, N. 1994, "Right Tool, Place, and Time," OR/MS Today, Vol. 21, April 1994, No.2, pp. 34-41.

[9] Kempf, K,; Chee, Y,; and Scott, G. 1988, "Artificial intelligence and the scheduling of semiconductor wafer fabrication facilities," SIGMAN Newsletter Vol. 1, No.1, pp. 2-3.

[10] Leachman, R. and Carmon, T. 1992, "On capacity modeling for production planning with alternative machine types," lIE Trans. Sept. 1992.

[11] Leachman, R. 1993, "Modeling Techniques for Automated Production Planning in the Semiconductor Industry," Optimization in Industry, John Wiley and Sons, edited by T. Ciriani and R. Leachman

[12] Lee C., Uzsoy, R., and Martin-Vega, L. 1991, "A Review of Production Planning and Scheduling Models in the Semiconductor Industry," School of Industrial Engineering, Purdue Univ., West Lafayette, IN.

[13] Lee C., Uzsoy, R., and Martin-Vega, L. 1992, "Efficient Algorithms for Scheduling Semiconductor Burn-In Operations," Operations Research, Vol. 40, No.4, pp. 764-775.

[14] Miller, D. 1990, "Simulation of a Semiconductor Manufacturing Line," Communications o/the ACM, Vol. 33, No. 10, pp. 98--108.

ABSTRACT

7 QUEUEING ANALYSIS

IN TK SOLVER (QTK)

Donald Gross and Carl M. Harris·

Department of Opero.tioT1.8 Relearch The George Wuh.ington Univerlit'V, Wuh.ington, DC !005!

*Department of OperatioT1.8 Re,earch and Engineering George Malon Univerlity, Fairfaz, Virginia !!030

The inherent heavy use of computer-based numerical methods in stochastic modeling has given rise to the natural application of mathematical computer packages to queueing analyses. We present a summary of how our queueing package, QTK, was built up from the well-established TK Solver tool kit of mathematical problemsolving routines. The rule-based, interactive TK software is especially well suited for easy access from the desktop, and, with the queueing modules added, it now permits queueing analyses that include complex "what if?" exercises; comprehensive, stochastic sensitivity analyses; the natural adaptation of standard modules to problems very different than those in the package; and the establishment of cost models and their optimization.

1 INTRODUCTION

There is hardly an operations research professional today who does not have easy access to a sophisticated desktop computer. Certainly one of the most profound results of the desktop computer explosion is the spreadsheet package, beginning from the intra- duction of VisiCalc in 1979 by Personal Software (later called VisiCorp and Software Arts) through the latest updates, for example, of Lotus Development's 1-2-3, Borland's Quattro Pro, and Microsoft's Excel, and the subsequent movement of more and more OR into spreadsheet software.

129

130 CHAPTER 7

The same people who developed VisiCalc also created the first integrated mathematical and engineering rule-based, interactive software for PCs in 1982, calling it TK Solver. The TK software was bought by Lotus together with VisiCalc from Software Arts in June 1985, and then sold by them to Universal Technical Systems in December 1985. The UTS family of software products now includes a library of more than 100 preprogrammed models dealing with basic and advanced numerical analysis and probabilistic/statistical methods, in addition to an ever-widening range of optional TK SolverPaks for applications in specific areas, including some that would qualify as operations research and management science. Currently, TK Solver is available for MS-DOS, Macintosh and DEC VMS operating systems (a Windows version is nearing release), as well as for all workstations running under UNIX.

Interestingly, the latest versions of the common spreadsheets come with some built in operations research capabilities - for example, Quattro Pro 4.0 (QPro4) has both linear and nonlinear programming techniques under /Tools Options. However, with the exception of modules for regression and random number creation, there is nothing in QPro4 that we would agree is directly useful to the probabilistic modeler. In fact, we would be hard pressed to find very many broad-based software packages that would provide the kind of direct numerical solution procedures necessary for solving wide classes of stochastic models. There are certainly stochastic sides to all of the generic OR packages, such as STORM, QSB+ and Hillier's PROBMODj but these routines typically solve only the simplest of problems and cannot be specifically tailored to handle atypical modeling challenges.

Many authors have noted that the development of broad-range software for solving applied probability problems requires many different kinds of mathematical procedures. The disparate nature of these computational needs makes it difficult to compile anything near a comprehensive set of problem-solving modules. This is in stark contrast to mathematical programming, where there are a small number of solution procedures for handling each of a large class of problems. For example, the simplex method and its variations can solve large numbers of linear programming problemsj the assignment algorithm can solve all classic assignment problemsj etc. But, in the area of queueing analysis, there are so many different numerical requirements necessary for solving just the usual models developed in any queueing text that it is not possible to subdivide all problems into a small set of equivalence classes each requiring the same solution method.

We think that it is fair to say that linear programming (LP) is relatively easily handled in a spreadsheet package since a major part of any such problem is

Queueing In TK Solver 131

the generation of the constraint matrix and its manipulation, things which, by definition, a spreadsheet can do relatively efficiently (the same is true for regression). But standard matrix calculations playa minor role in completing queueing analyses. The class of procedures commonly called matrix-analytic methods require much more than is present in Lotus or QPr04, as well as what is likely ever to be included in spreadsheets.

To elaborate, a good example of the role of matrices might be the solution of standard Jackson networks. In the open-network case, the solution of the traffic flow equations (say, [1 -PjA = 'Y) is a consistent linear system, which can be written in matrix-vector form as A = [1 - Pj-1'Y. But the problem's total solution must include the application of birth-death analysis to each node in order to complete the specification of the network's stochastic behavior. If the network were instead closed, there is a slightly different linear system to solve, and a numerical procedure must be used for the state probabilities to deal with the lack of a product form in the joint system-size probability function. Thus, neither the open nor closed network problem can be fully solved as a linear system.

As a further example of the limitations of standard matrix/vector manipulations in queueing, consider the matrix-geometric version of the G/M/1 queue where the entries of the arrival-point transition matrix are to be matrices instead of scalars. Then the next step in the solution is the computation of the (matrix) root of the matrix analog of the fundamental branching process equation z = B(z) = EfJ.,.z". Things get even more complicated when we need to find the (possibly complex) roots of zK = B(z), as we must for the EK /M/1 queue.

2 TK AND QTK

2.1 TK Solver

TK Solver is an equation-solving and knowledge-management software product. In a broader sense, TK is all of the following:

1. A rule-based, declarative language. Problems are set up in TK in the form of rules and relationships as basic building blocks.

132 CHAPTER 7

2. A non-procedural language because it does not require instructions to be set up in precise sequence. The order in which rules or equations are entered is not important. For example, in TK one can define a relationship between variables a, b, c, d and z as

c = In(z)

3. A very high-level, object-oriented programming language that lets people do more than is possible with conventional languages such as FORTRAN, BASIC, PASCAL, etc.

4. A language which provides many functions commonly used by engineering and scientific people. In addition, TK provides an ability for users to define their own functions and subroutines, thus giving them unlimited capability to extend the offering of built-in functions.

5. An environment which provides good flexibility for generating tables, plots and other graphical images complementing the problem solving power of TK.

The benefits of TK are that:

1. Problems can be set up on a computer 10 to 100 times faster than using conventional programming languages, thus generating major cost savings. Overall, the complete set of TK routines requires only 1 megabyte of memory and can therefore be carried on one high density floppy disk.

2. A much more thorough analysis can be done in a short period of time. The user can become more creative because he or she does not have to do any of the computing chores required by conventional languages, and can thus concentrate on the problem at hand rather than many of the "nitty-gritty" details of getting the solution.

3. TK is an ideal environment for work group computing. Applications developed centrally or by individuals can be easily managed and shared by others.

4. TK organizes problems in neat modules. Modules from one problem can be easily cut and pasted into another problem. Thus the need to re-invent the wheel is eliminated.


5. It is not necessary to know conventional scientific programming languages to use TK. It is, however, still easy with TK to use computers to solve a wide range of mathematics problems, from the simple to the very complex, and this can be done with minimum training time.

6. Learning time is reduced because the method of using TK is the same on PCs and workstations. Also, there is total compatibility of files.

2.2 QTK

The Queueing Anallll" with TK Solver software application (QTK) was written using the TK Solver package. As an application ofTK Solver, QTK comes with its own installation program, which automates the copying of files and shields the user from worrying about the PC's operating system commands. The entire QTK application requires less than 1MB of hard disk space Of course, the user must have previously installed TK Solver, and the total core requirement for TK + QTK is dependent on the type of hardware system used.

TK Solver is loaded into a directory called TK2 on a hard drive. The queueing menu QTK.TKM and all of TK's waiting-line model routines are included in the same directory. Typing QTK from any DOS prompt loads TK Solver along with the TK menu and provides the menu screen as presented in Figure 1.

Notice the Section menu line In Figure 1 at the top of the screen. Each section in the QTK menu corresponds to a definitive grouping of queueing models. Altogether, the six sections hold a total of 65 modules in the latest version of QTK. A Section Description Box is located at the bottom of the screen, and it contains a brief listing of the material covered in the section highlighted by the cursor. The usual right and left arrow keys on the keyboard are used to change the section choice on the menu, and the Section Description Box will switch as the highlighted section changes. Within a given section, the up and down arrows keys allow one to go from one specific model to another. For example, with the cursor sitting in the first section, Basics, pressing the down arrow key allows access to the lower level and gives the user the opportunity to browse through the various model descriptions. The menu would then look as in Figure 2.

134 CHAPTER 7

TK Solver 2.0 Queueing Theory

section

Section Description ---------------------------------------------------,

Introduction to the Fundamentals of Queueing Theory Poisson Probabilities and Markov Chains Finite, Linear Difference Equations Finite and Infinite Birth/Death Processes L-___________________________________________________________ Fl Help

Figure 1


section Basics I 1 Server I C Servers I Bulk Queues I Priorities I ~etworksl Poisson Probabilities Mixed Exponential Probabilities Erlang Probabilities Mixed Generalized Erlanq computations Fini te Markov Chains Finite, continuous-Time Markov Chains Finite, Linear Difference Equations probabilistic, Difference Equations Finite, steady-state Birth/Death Solution Infinite, Steady-State Birth/Death Solution Sample Path for Constant, Single-server Queue

Model Description --------------------------------------------------J----' Problem 1.1: Poisson Probability Calculations

Plots and Tables

L--------------------------------------------------------Fl Help

Figure l



Section

Unlimited M/M/l Queue M/M/l/K Queue Markov, Single-Server Finite-source Queue Simple State-Dependent Service General, Markovian state-Dependent Queue Impatient Customers: Balking M/D/l Deterministic-Service Queue M/E(k)/l Erlang-service Queue E(k)/M/l Erlang-Arrival Queue M/G/I: Non-Exponential Service M/G/I Sensitivity Analysis

Model Description ------------------------, Problem 2.1: M/M/l ---

Poisson/exponential single-server queue with unlimited system capacity, FIFO

Calculates major measures of effectiveness ~-------------------------- FI Help

Figure 3

3 SELECTING AND WORKING WITH A QTK

MODEL

To illustrate the software, we select 1 Se1"lJer from the menu and then Unlimited MIMll Queue from the submenu as shown in Figure 3. After favorably responding to a prompt about loading, all the information related to the selection is loaded into TK Solver. Once the loading process is finished, the screen changes to a Variable Sheet with model information as shown in Figure 4. There is actually more information in the Variable Sheet than can be fit onto one screen, so we must scroll down to get the remaining model information, which is shown in Figure 5 below. In addition to the arrow keys, the PgUp, PgDn, Home or End keys can be used to navigate around any TK sheet.

3.1 Working with the Model

The model chosen, the M/M/1 queue with unlimited waiting room, is the simplest of all queueing models and is a good one to use for illustrating how the software works. This queue has a single server, and a single waiting line.

136 CHAPTER 7

------- VARIABLE SHEET ----------------St Input-- Name-- Output- Unit-- Comnent-------------

2 1.8

2

iat st lanbda IIU

rho pO n pn p'n

Lq L L'q Wq 1/

min min llmin lImin

min lAin

M/M/l: Sinule Server I Unlimi ted Queue

Mean interarrival time Mean time to complete service Arrival rate (arrivals/lllit of time) Service rate (. servedllllit of tillle) Fraction of time the server is busy Fraction of tillll! the server is idle Target' of customers in the system Probabil ity of n in the system Conditional probabil ity that n

customers are in the system given that the queue is not erpty

Expected queue size Expected system size Expected non-empty queue size Expected waitinu time in the queue Expected waitinu time in the system

Figure 4

------- VARIABLE SHEET ----------------St Input-- Name-- Output- Unit-- Comnent-------------

1.1 t min Speci fic amount of time

10

.1 50

Ptq Probabil ity of beinu in the queue for time t or greater

Pts Probabil ity of beinu in the system

K

pIC PK

d T TI/s

B

min min

min

for time t or greater

MaxillUll value of variable whose prob needs to be pr i nted & plotted

Probabil ity of K in system Probabil i ty of <= K in system

input: size of time interval for plots Total time horizon for prob plotting Probabil ity that sys tem wa i t <= T

(should be 1 if full plot is needed) Mean length of busy per i od

See Tables and Plots

Figure Ii


When the server is busy, customers queue up and are served in a first-come, first-served manner. When service is finished, the customer exits the queue. If other customers are waiting, the one at the head of the line goes immediately into service. If no customers are present when a server finishes, the server becomes idle, and waits until another customer arrives.

Both interarrival time and service time probability distributions are assumed to be exponential, with means iat and st, respectively. The mean arrival rate is denoted by lambda (~) and is equal to l/iat. The mean service rate is denoted by mu (1') and is equal to l/st. For this simple model, this is all the information that is required for calculating measures of effectiveness. In fact, since iat and lambda are reciprocals, as are st and mu, only one of iat or lambda and one of st or mu are, of course, required to run the model. For example, if iat is set at 2 and sf at 1.8, the model calculates lambda as .5 (= 1/2) and mu as .5556 (= 1/1.8). Note the units are also adjusted, the times being in "minutes" and the rates in "per minute" (l/min).

Concentrating on the first six entries in the Variable Sheet, observe that iat and st are listed as input, while lambda, mu, rho (p) and pO are shown as output and are to be calculated using the formulas on the Rule Sheet. Note that the Variable Sheet's columns are, respectively, St, Input, Name, Output, Unit and Comment. We will ignore St for the time being. A number in the Input column represents that the "variable" named in the Name column with units given in the Unit column and described in the Comment column is a model input. A number in the output column signifies that the named variable associated with it (in its row) is calculated by the model. Using iat and st input allows lambda, mu, rho and pO to be calculated using the basic queueing formulas found in any standard queueing, operations research or management science text.

One can see the actual model formulas by going to the Rule Sheet. This is accomplished by pressing, successively, the keys = and r. The = key by itself provides a menu window in the upper right portion of the screen which allows the changing of sheets (Figure 6).

Notice the manner in which the equations are written on the Rule Sheet. The first rule relates the interarrival time and the arrival rate. The remaining rules are easily recognized since most are the relatively simple formulas for this queueing model (for example, pn = pO*(lambda/mu), n). The Rule Sheet illustrates attractive key features of TK. First, the rules are written in a familiar, symbolic form. Second, the order in which the rules appear on the sheet is seldom important because TK will repeatedly cycle through the Rule Sheet using new

138 CHAPTER 7

I = Sheet options

Variable Rule Function Keystroke Macro Unit List

Plot Table

Numeric Format Comment Global

Figure 8

------- RULE SHEET ------------------S Rul~e--------------------------------------------------------

; Single Server, Possibly Unlimited Queue Length: 1I/1I/1/infty ; Data entry val idation rules

* if and(sol ved(), not(gi ven(' larilda», not (given( Ii at») then call bousg(' warn2 * if and(solved(),not(given('IIU»,not(given('st») then call bo_g('Warn2,"No * if and(solved(),or(not(given('IC»,not(given('d»,not(given('T»» then call b

* iat .. 1/1arilda * st • 1/11U * if andCsolved(),st/iat >~ 1) then CALL BOXMSG('warn1,"W A R N I N G 1",1,1) * rho ~ 1 arilda/IIU * pO ~ 1 - larilda/IIU * pn .. pO*(larilda/IIU)An * L ~ C larilda/IILI)/pO * Lq ~ larildaAZ/CIILI*CIILI-larilda» * Wq ~ 1 arilda/ (1111* CIILI-l arilda) ) * W • 1/(IILI-larilda) * Pts = exp( -lILI*pO*t) * Ptq = rho*exp( -nu"pO*t) * p'n = pn/( larilda/IILI)A2 * L 'q .. UpO * call probs(lC;pK,PIC) * call waits(d, T;TWs) * B=1/(IILI-larilda)

Figure T


information to solve for remaining unknown variables until nothing more can be done.

There are a few rules which are more complicated than algebraic expressions. There is an ''if' statement and some "call" statements. These resemble FORTRAN, BASIC or PASCAL types of programming statements, and, indeed, programming "rules" in TK have elements of all of these programming languages. However, to use this software package, one need not do any programming, though the advanced user may wish to augment some of these queueing models thus requiring some additional programming effort.

Further consideration of the Variable Sheet (return to Figures 4 and 5) shows that there are other variables listed as input values. Scrolling down the sheet, the next input variable after at is n. Specifying n tells the model to calculate the probability of finding this number of customers in the system, as well as the conditional probability of this many in the system when the queue is not empty, i.e., ignoring server idle periods. Below this are the expected-value measures of effectiveness for this queueing system (Lq, L, L'q, Wq and W), which the model can calculate knowing the first two input variables, iat and at. Again, the formulas for these can be found in any standard text, or can be seen on the Rule Sheet (Figure 7).

The remaining input variables for this model are t, a time value for the model to calculate the probability of waiting that amount of time or greater (Ptq being the probability of waiting that time or longer in the queue before entering service and Pta being the probability of spending t or longer in the systemwaiting for service and being served); K, an integer representing the maximum number of system-size probabilities to be calculated; d, an interval size for plotting waiting-time probability distributions; and T, the total time horizon for waiting-time plots.

A comment on input and output variables is in order here, namely, that TK does not formally distinguish between the two-this is one of its many nice features and allows one, for example, to interchange inputs and outputs, and answer such questions as, "if an expected system size of 7 is desired, and mean interarrival time remains at 2 minutes, what mean service time is required?" This will be addressed shortly. But first, consider further the current model solution.

Whenever a QTK model is loaded, it will have inputs entered and the model will be ready to solve. Usually, the input values are only on the Variable Sheet, but sometimes they are also provided via tables.

140 CHAPTER 7

------- VARIABLE SHEET -------------__ St Input- Name-- OUtput- Unlt-- C_t:-----------

2 1.8

2

lat at lanixfa .5 l1li .5556 rho .9 pO .1 n pn p'n

Lq L L'q Wq W

.081

.1

8.1 9 10 16.2 18

lIIin IIIIn 1/llin 1/IIIn

lIin

1IIM/1: Single Server / Unllllited Queue IIeIn Interarrlval tl. IIeen ti. to cOIIIPlete service Arrlvel rate (arrlvels/lnlt of tl., Service rata (. servedllnlt of ti., Fraction of tl_ the server Is busy Fraction of tf_ the server Is Idle Target' of cust_rs In the SYllt. Probabfl Ity of n In the SYllt. Condlti_l probability that n

cust_rl ara In tha 'YIlt. given that the queue I, not .. ty

Expected queue Ilze Expected IYllt. I I ze Expected non-Mpty queue lize Expected 118iting ti_ in the queue Expected 1181tlng ti. in the _YIlt.

Figure 8

3.2 Running the Model, Part A

Once input values are provided on the Variable Sheet as shown in Figures 4 and 5, the model is solved by pressing the F9 function key. The resulting output (performance measures of this system) is given in Figure 8.

3.3 Running the Model, Part B

The model can easily be rerun for any specific set of parameter values; suppose it is desired to see the effect of changing iat to 1 and at to .9. To do this, the cursor is moved to where the previous input values are and the new values simply typed over the old. The F9 key is again used and the model is re-solved (Figure 9). Since both iat and st were reduced by the same factor, the expected system and queue sile measures do not change (compare Figure 9 to Figure 8), but changes do occur in the measures involving waiting times; in fact, the average waits are halved.

Queueing In TK Solver

------- VARIABLE SHEET'-----------____ _ St Input-- Name- Output- Unit-- C_t----------

MlM/l: Single Server / unlimited Queue

1 .9

2

iat st lantlda IIU rho pO n pn p'n

Lq L L'q I/q \I

1 1.1111 .9 .1

.081

.1

8.1 9 10 8.1 9

min Mean interarrival time min Mean time to c~lete service l/min Arrival rate (arrivals/unit of time) lImin Service rate (1# served/unit of time)

Frsction of time the server is busy Fraction of time the server is idle Target 1# of customers in the system Probabil i ty of n in the system Conditional probability that n

customers are in the system given that the queue is not empty

Expected queue size Expected systl!lll size Expected non-empty queue size

min Expected waiting tillll! in the queue min Expected waiting tiM in the system

Figure 9

141

To illustrate the utility of software packages such as this, and how easy it is to answer "what if' questions, the effect of reducing only st can be easily analyzed by replacing the iat value with its original value of 2, leaving st at .9 and resolving using the F9 key. Results are shown in Figure 10. Comparing these to those of Figure 8, the value of rho now drops in half and the server idle probability increases over five fold. The average waiting time in queue drops from over 16 minutes to less than 1 minute. Thus doubling the service rate (halving the service time) produces quite dramatic results.

Suppose it is desired to know how much service time must be reduced in order to halve the total time in system (reduce the total time in system from 18 to 9). The st value of .9 is blanked out (by putting the cursor on the .9 and pressing the space bar) and the cursor then moved to the W row, where 9 is entered in the Input column. Then, F9 is used to re-solve (Figure 11). There is now an output value for st, namely, 1.6364 minutes. So st need only be reduced from 1.8 minutes to 1.64 minutes to reduce time in system (W) in half. The value of rho goes to .8182 from .9. This illustrates the sensitivity of waiting times to changes in st, and how easy it is to use the package for such analyses.

142 CHAPTER 7

------- VARIABLE SHEET ---------______ _ St Input-- Name-- Output- Unit-- Conment:-----------

M/M/h Single Server I Unl imited Queue

2 .9

2

iat st lantxla AU rho pO n pn p'n

Lq L L'q Wq W

.5 1.1111 .45 .55

.1114

.55

.3682

.8182 1.8182 .7364 1.6364

lIin Mean interarrival time lIin Mean t i lie to c~l ete servi ce 1Imin Arrival rate (arrivals/U'lit of time) 1Imin Service rate (' served/U'lit of time)

Friction of tille the server is busy Fraction of ti .. the server is idle Target • of custClllers in the system Prababil i ty of n in the system Conditional probability that n

customers are in the system given that the q.JeUe is not ...,ty

Expected q.JeUe • i ze Expected syst_ size Expected non· ...,ty q.JeUe size

min Expected waiting ti .. in the queue min Expected waiting ti .. in the system

Figure 10

------- VARIABLE SHEET ----------------St Input-- Name-- OUtput- Unit-- Cana'lt:------------

M/M/l: Single Server I Unl illited Queue

2 iat min Mean interarrival time st 1.6364 .in Mean tille to cDq)lete service lambda .5 llmin Arrival rate (arrivals/U'lit of till!!) nil .6111 lImin Service rate (I served/unit of tille) rho .8182 Fraction of ti .. the server is busy pO .1818 Fraction of ti .. the server is idle

2 n Target II of custCAers in the system pn .1217 Probability of n in the systelll p'n .1818 Conditional probability that n

customers are in the system given that the q.JeUe I s not ...,ty

Lq 3.6818 Expected queue size L 4.5 Expected systelll size L'q 5.5 Expected non·...,ty queue size Wq 7.3636 min Expected waiting ti .. in the queue

9 W min Expected .. aitlng time in the system

Figure 11


------- RULE SHEET --------------__ _ S Rul~e--------------------------------------------------

; Single Server, Possibly Unlimited Queue Length: M/M/lIinfty ; Data entry val idation rules if and(solvedO,not(given('lardxta»,not(given('iat») then call boxmsg('warn2

* if and(solvedO,not(given('mu»,notCgiven('st») then call boxmsg('warn2,"No * if and(solvedO,or(not(given('IC»,not(given('d»,not(given('T»» then call b

iat ,. 1Ilardxta * st • 1/mu * if and(solvedO,st/iat >z 1) then CALL BOXMSG('warn1,"II A R N I N G 1",1,1) * rho = I aatxla/mu * pO = 1 - lllllilda/mu * pn = pO*(lambda/mu)An * L = (lambda/mu)/pO * Lq = lambdaAZ/(mu*(mu-lardxta» * IIq = lambda/(mu*(mu-lambda» * II = lI(mu-lambcla) * Pts = exp( -mu*pO*t) * Ptq = rho*exp( -mu*pO*t) * p'n = pn/( lambda/mu)A2 * L 'q = lIpO * call probs(I(;Pk,PI() * call waits(d, T;TWS) * a-lICmu-lambda)

Figure 1:1

3.4 Running the Model, Part C

If one were to try blanking out at and W, setting input values of W q = 10 and iat = 2, and re-solving, only lambda is computed because TK cannot solve for st since the formula for W q in terms of mu cannot be solved directly. Lambda has already been solved, but mu appears twice. It is instructive to look now at the Rule Sheet, which we show again below as Figure 12.

Note that * appears to the left of all rules except the two involving iat and lambda. The * indicates that a particular rule could not be solved. Each of these has at least two unknowns or an unknown appears more than once in a given equation requiring iteration. To give a simple example of the use of the built-in iteration in TK, if a G is entered in the status column (St) for the variable st, TK Solver automatically places a value in the input column, which acts as the initial guess for TK's Iterative Solver, which then takes over, repeatedly solving the equations, using a new guess until any inconsistency is eliminated. The results are shown in Figure 13.

144 CHAPTER 7

------- VARIABLE SHEET ---------------St l'llUt- Name-- Output- Unit-- Camnent-----------

2

2

10

iat st lBlltxla IILI rho pO n Pl p'n

Lq L L'q IIq W

1.7082 .5 .5854 .8541 .1459

.1064

.1459

5 5.8541 6.8541

11.7082

min min lImin lImin

min min

MIM": Single Server I Unlimited Queue Mean interarrival time Mean tillle to c~lete service Arrival rate (arrivals/~it of time) Service rate (i# served!~it of time) Frac:tion of tillle the server is busy Frac:tion of lillie the server is Idle Target , of custa.ers in the system Probabfl ity of n In the system Conditional probabil ity that n

customers are in the system given that the ~ is not ..,ty

Expected ~ size Expected system size Expected non-..,ty ~ size Expected waiting time in the ~ Expected waiting time in the system

Figure 11

3.5 Working with Units

Often, errors are made in manual calculations by forgetting to change units for consistency, for example, having arrival rates in customers per hour but service time in minutes. This package allows for automatic conversion. Referring to a Variable Sheet figure (say Figure 13), notice in the column headed Unit that iat and at have been entered in minutes, while lambda. and mu are expressed in l/minutes. If one were to replace the "min" in the iat row by "sec" (an easy thing to do on the Variable Sheet), the 2 would automatically change to 120. These conversions are built into the software on a Units Sheet (Figure 14). The program will convert among hours, minutes and seconds, but it will not handle day or week or any other time conversion, though new ones can be added by the user since the Units Sheet can be easily accessed and additional lines added, filling in the From, To and Multiply By columns as needed.

Note that one is free to define units in any way desired. For example, to consider a day as a shift (8hr), one can either define day that way by using 8 instead of 24 in the conversion, or add a unit called shift, putting in a new row with an entry in the From column called day, an entry in the To column called shift, and an entry in the Multiply By column of 3 (y days = 3y shifts). For all models


------- UNIT SHEET ---------------_ FrOlll-- To-- Multiply By- Add offset- Ccment:--------hr min 60 min sec 60 1/lli n 1/hr 60 1/sec llmin 60

Figure 14

in this package, automatic conversion exists only among seconds, minutes and hours, but it is an easy matter to augment this as described above.

3.6 Table and Plot Sheets

The final two sheets discussed in this paper are the Table and Plot Sheets, again easily accessed from the Sheet menu by hitting the:::: key, followed by selecting the sheet desired from the menu.

Tables

The Table Sheet is shown in Figure 15 for the MIMII model. Two tables are already set up in this module, Q_PROBS and WAITING_TIMES. The first is a table of system-size probabilities and the second is a table of both the cumulative probabilities for time in the system and time in the queue, respectively.

With the cursor on Q_PROBS, the F8 function key is used to get to that table (Figure 16). The table appears with columns marked nq, protn, and Prob_n, respectively. The nq column contains the various possible numbers of customers in the system (queue + service), denoted by n on the Variable Sheet. The prob_n column gives the probability of exactly that number in the system (shown as pn on the Variable Sheet), and the Prob_n column gives the probability of that number or less (the cumulative probability) in the system.

146 CHAPTER 7

------- TABLE SHEET ----------------Nmoo Title~----------------------------------------

Q PROBS System-size probabil ities WAITING_TIMES System waiting-time CDF + PDF, and line delay CDF

nq o 1 2 3 4 5 6 7 8 9 10

Figure 15

System·size probabilities

prob_" .1459 .1246 .1064 .0909 .0776 .0663 .0566 .0484 .0413 .0353 .0301

Figure 16

Prob " .1459 .2705 .3769 .4678 .5455 .6118 .6684 .7168 .7581 .7934 .8236


------- TABLE: Q PROBS ---------------Screen or Printer: Title: Vertical or Horizontal: Row Separator: Column Separator: First Element: Last Element:

Screen System-sin probabilities Vertical

List-- NlIIleric Format- Width- Headin9"I1-------------~ 10 prob n d4 10 Prob:n d4 10

Figure 1 T

To get to the other table, we return to the Table Sheet (by accessing the sheet menu using the = key), move the cursor down to WAITING_TIMES, and press F8 to get the table of cumulative waiting-time probabilities.

Diving

The concept of "diving" is an integral part of TK Solver and allows one to go from a sheet to a subsheet and from a subsheet to a sub-subsheet, etc. To dive, the shift> keys are used. Using the shift < keys allows one to go the other way, that is, from a subsheet to a sheet. TK calls this maneuver a "return." To illustrate this, starting from the Table Sheet, putting the cursor on Q_PROBS and diving (shift » results in Figure 17, namely, a summary of the table's contents.

Diving once more results in the actual table of values as shown in Figure 18. From any column, one can dive again and access an individual list of values. In general, diving allows access to more information about any of the objects in TK, until the most basic elemental information is reached. The user will seldom have to dive very many levels, unless modifications to the models are desired.

148 CHAPTER 7

------- TABLE: Q PROBS ---------------Title: System-size probaiiH ities Element nq- prob n-- ---- Prob n-- --- --- ---I 0 _ 1459 .1459 2 1 .1246 .2705 3 2 .1064 .3769 4 3 .0909 .467B 5 4 .0776 .5455 6 5 .0663 .6118 7 6 .0566 .6684 8 7 .0484 .7168 9 8 .0413 .7581 10 9 .0353 .7934 11 10 .0301 .8236

Figure 18

------- PLOT SHEET ------------------Name Plot Type- Display Option- Title-e------------system.,probs Bar chart 1.VGA System-size probability fln:tlon syst_wait_CD Line chart l.VGA CDF for systa. waiting tilles ct..delaY_CDF Line chart 1.VGA CDF for line waiting times ct..delaYJX!f Line chart 1. VGA POf for syst. wai ts

Figure 18


SI ...... III. ,"!IIblllll r ... lI .. . 11

-.11

-.12

r-.1

,--

.11 -r-

.1' ,--

r-.11 - -

r-.12

II ,.Iu.

Figure 20

Plots

A very nice characteristic of TK Solver is its graphics capability. Graphs are set up on a Plot Sheet easily accessed from the sheet menu. Listed on the Plot Sheet for this M/M/1 model are plots of the system-size probability function, the cumulative probability density functions for system and queue waits, and the probability density function for system waits (Figure 19). Placing the cursor on the top line, the F7 function key is pressed and reveals a bar graph of the system-size probability function (Figure 20). Pressing any key returns the screen to the Plot Sheet.

3.7 Hard Copy

During the TK installation process, users are prompted to configure TK Solver for their own systems. TK will send output to any of the devices indicated at that time. When a plot is displayed on the screen, plot output is generated by pressing the letter o. TK will provide a choice of printers or allow printing to a file, for example, to be used in a technical report such as this. One other way

150

.9

.8

.7

.6

11 15 21 25

II.,.

Figure 21

JI

CHAPTER 7

JS sa

to get crude hard copy is through the usual Print Screen command. This, of course, gives only one screen at a time.

Printing Sheets

Any sheet in TK Solver can be printed either partially or in its entirety. The instructions for doing this are well laid out in TK's manuals. This can easily be done for the Variable, Rule, or Table sheets with which we have previously worked.

3.8 Returning to the Menu

To return to the menu from any sheet (Variable, Rule, etc.), either Fll or Shift F3 can be used. Another model can now be chosen from the menu after suitable prompts regarding saving changes, etc. The Variable Sheet for the new model selected will then appear.


Exiting the Program

As in most software packages, an easy exit from QTK is accomplished by simply pressing Iqy, with the I giving the command menu, the q picking the appropriate item from the menu for exiting and the y the appropriate response to the prompt asking if exiting is truly desired.

4 MODIFYING EXISTING MODELS

Since TK Solver is relatively easy to use, it is not a difficult matter to modify the models in the package. For example, suppose that a cost per hour per server and a cost per hour per waiting customer were known in an MIMic situation (first problem in C Servers section). One might like then to find the optimal number of servers over some range of possible choices. For this model, the expected total hourly costs of the system, which might be denoted by EC, can be expressed as

EC = Cl * c + C2 * L,

where Ct is the hourly rate of pay per server, C2 is the hourly cost of waiting per customer, and c and L, as usual, are the number of servers and the expected number in the system, respectively. It is an easy matter to take the existing MIMic TK model and incorporate this addition.

First, this model is called up from the QTK menu. The Variable Sheet appears. Three rows are added to the Variable Sheet by using the I key to access the list of command options and selecting Insert Row, placing three rows between rho and pO. (It is really not necessary to insert new rows since these could have simply been added at the bottom; however, a better format results in placing them in the position suggested.) The three rows would be used to define Cl, C2, and EC, using units of $/hr. The List Solve option will be used to solve for a range of c values, so an L must be entered in the St columns for c, EC, L, W, and Ptq. The new Variable Sheet is shown in Figure 22, with the added information (the three extra rows and the appropriate placement of L in the St column).

It is next necessary to go to the Rule Sheet to enter the formula for EC. This time a row is inserted between the formula for L and the statement "if n<=c then ... ". The modified Rule Sheet is shown in Fig 23.

152 CHAPTER 7

VARIABLE SHEET St Input-- Name- OUtput- Unlt-- Conment

M/M/~:Multlple Servers/Unlimited Queue

iat 10 min Mean interarrival time st 20 min Mean time to ~~lete service

6 lanDda 1/hr Arrival rate (arrivals/unit of time) 3 mu 1/hr Service rate per m8lVlel ('/t ime)

r 2 AVII • arrivals duril'lll aVIl servi~e time L 3 c • of servers In the system (c > 1)

rho .6667 Fraction of time each server is busy

3 Cl S/hr Hourly ~ost of a ~ustomer wal t 7 C2 S/hr Hourly ~ost rate of a server

L EC S/hr Total Expected Cost (hourly)

pO .1111 Probabil Ity of 0 in the system n Target • of ~ustomers In the system pn .2222 Probabil ity of n in the system

Lq .8889 Expected queue size L L 2.8889 EXPKted system size

Wq 8.8889 min Expe~ted waitil'lll time in the queue L II 28.8889 min Expected waitil'lll time in the system

10 t min Specific time in the queue L Ptq .2696 Prob. of wai til'lll >" t in the queue

PqO .5556 Probability of no wait in the queue

10 K Max variable value whose prob wanted pi( .0087 Probabil ity of K in system (K>-c) PI: .9827 Probabil ity of <- K in sYlltem

.1 d min input: size of tilll8 interval for plot 60 T min Total time horizon for prob plottil'lll

Tllq .9779 Probabi l ity that queue delay < .. T (should be 1 if full plot i II needed)

Fisure 22

Queueing In TK Solver

------- RULE SHEET -----------------S Rul~e-------------------------------------------------------

; Multi-Server I Unlimited Queue: MIMIc ; Data entry val idation rules

153

• if not(given( 'K» then cell boxmsg( 'warn2,"No value specified for requi red variable" ,1,1) * if ard(not(given('lantxla»,notCgiven('iat») then call boxmsg('warn2,"No value given for iat" 1 1) * if ~nd(not(given( 'mu»,not(given( 'st») then call boxmsg('warn2,"No value specified for st", ',1) * if not(given('c» then call boxmsg('warn2,"No value specified for c",l,l)

iat = 1/lantlda st = l/mu rho = lantxla/(c*mu)

* if rho >= 1 then CALL BOXMSG(' warnl, "W A R N I N G 1",1,1) r = lantxla/mu cell probs(c, r ,mu,lantlda,K;pO ,lq,pK,PK) Wq = Lq/ I antlda W = Wq + l/mu L = lq + r

* EC = Cl*c + CZ*l if n<=c then pn = pD*(rAn)/fact(n) else pn = pD*(rAn)/(fact(c)*cA(n-c» PqO = 1 - pD*c*rAc/(fact(c)*(c-r» Ptq = 1-PqO- (pD*(rAc)*(l-exp( - (c* ... -lantxla)*t»/(fact(c-l )*(c-r)) call waits(d, T;TWq)

Figure 23

154 CHAPTER 7

------- TABLE SHEET ----------------Na~ Title~---------------------------------------Q PROBS System-size probabil ities IIAITING TIMES COf for I ine waiting times COSTS - Performance and cost measures vs # of servers

Figure 24

------- TABLE: COSTS ----------------Title: Performance and cost measures vs # of servers Element c----- L II (min)- Ptq--- EC (S/hr)- ----- ---1 3 2 4 3 5

Figure 211

Going next to the Table Sheet, a table called Costs is added with the title, Performance and C06t Mea61J.re6 V6 No. of Server6, as shown in Figure 24. With the cursor on the newly added line, it is necessary to dive twice (using the> key) to enter the c values manually as 3, 4 and 5 (Figure 25). Pressing FlO then will invoke the list solve feature, and after a short time, the Table will be filled in with the output performance measures for each entered c (Figure 26).

If desired, a graph can be created by going to the Plot Sheet and adding a row called Costs, as shown in Figure 27. Diving once, chart characteristics must be entered as shown in Figure 28. Pressing F7 then will show the plot (Figure 29). When satisfied that this work is properly completed, the / command is used to access Storage, and then the newly created model can be saved under an appropriate name.

REFERENCES

[1] D. Gross and C. M. Harris, Fundamentals of Queueing Theory (2nd edition), John Wiley, 1985.

[2] R. B. Cooper, Introduction to Queueing Theory (2nd edition), Elsevier North Holland, 1981.

[3] 1. Kleinrock, Queueing Systems (volume 1), John Wiley, 1975.


------- TABLE: COSTS -----------------Title: Performance and cost measures VB II of servers Element c--- L W (min)- Ptct-- EC (S/hr)- --- ---1 3 2.8889 28.8889 .2696 29.2222 2 4 2.1739 21.7391 .064 27.2174 3 5 2.0398 20.398 .0133 29.2786

Figure 28

------- PLOT SHEET -----------------Name Plot Type- Display Optiorr Titlee-------------system.,probs Bar chart 1.VGA System-size probabil ity function ct.delaY_CDF Line chart 1.VGA CDF for I ine waiting times COSTS Li ne chart 1. VGA EC VB c

Figure 2T

------- LINE CHART: COSTS --------------Display Scale: Yes Display Zero Axes: None Display Grid: No Line Chart Seal in9: Linear Title: EC VI c X·Axis Label: II of servers (c) Y·Axis Label: expctd hrly syst coat X·Axis List: c Y·Axis- Styl..-- Character- Syrilol Count- First- Last-EC Curve. * 0 1

Figure 28

156 CH APTER 7

11: •• 31..

"'" '. .' ,-\ ,/

lSI '. / \ .. ' . I

\. ! /

au '. '"

.' /

•• \. / , \

au '. .' \. ,-

:II ,

" , / , . .,.. , ,-

'. \ / .,. .. '. ,.

\ .,. .. '. / , / ,/

"'2 12 U M ... <2 •• . .. ..

ICIII_"

Figure 29

8 ON-LINE ALGORITHMS FOR A SINGLE

MACHINE SCHEDULING PROBLEM

ABSTRACT

Weizhen Mao Rex K. Kincaid*

and Adam Rifkin**

Department of Computer Science College of William and Mary

William.9burg, VA 29187-8795

*Department of Mathematic, College of William and Mary

William.9burg, VA 23187-8795

**Department of Computer Science California ].""titute of Technology

Paladena, CA 91125

An increasingly significant branch of computer science is the study of on-line algorithms. In this paper, we apply the theory of on-line ilgorithms to job scheduling. In particular, we study the nonpreemptive single machine scheduling of independent jobs with arbitrary release dates to minimize the total completion time. We design and analyze two on-line algorithms which make scheduling decisions without knowing about jobs that will arrive in future.

1 INTRODUCTION

Given a sequence of requests, an on-line algorithm is one that responds to each request in the order it appears in the sequence without the knowledge of any request following it in the sequence. For instance, in the bin packing problem, a list L = (al' "2, ... , a,.) of reals in (0,1] needs to be packed into the minimum number of unit-capacity bins. An on-line bin packing algorithm packs iii, where i starts from 1, without knowing about 1Ii+1l ... , a,..

157

158 CHAPTER 8

As pointed out by Karp [13], on-line algorithms are often contrasted with offline algorithms, which receive the entire sequence of requests in advance. In other words, off-line algorithms know the future, while on-line algorithms do not. Therefore, given any objective function, the quality of the solution obtained with an on-line algorithm will be no better (and is typically worse) than the solution obtained with an off-line algorithm.

In many situations, we wish to know the quality of a solution obtained by an on-line algorithm (hereafter referred to as the performance of the algorithm). A commonly used method to evaluate the performance of an on-line algorithm is to define a stochastic model by assuming a certain probabilistic distribution for the problem instances. The expected performance of the on-line algorithm is then evaluated within the confines of the stochastic model. However, this approach is inconsistent with the nature of on-line algorithms since, as pointed out by Karp [13], the choice of a stochastic model requires data that may not be readily available about the request sequences that have been observed in the past, as well as faith that the future will resemble the past.

An alternative approach is to compare an on-line algorithm with the optimal off-line algorithm for the same problem in the worst case. Karlin, Manasse, Rudolph, and Sleator [12] defined the term c-competitive to refer to an on-line algorithm with performance that is within a factor of c (plus a constant) to optimum on any sequence of requests. More formally, given any instance I, assume the problem asks for the minimization of an objective function 0'(1). Let A be an on-line algorithm for the problem. Let O'A(I) and 0'·(1) be the values of the objective function on I in the solution obtained by A and the solution obtained by the optimal off-line algorithm, respectively. Algorithm A is c-competitive (c is also called the competitive ratio) if there exists a constant a such that for any instance I,

For many optimization problems, on-line algorithms have been analyzed using this method. For example, Sleator and Tarjan [24] presented the 2-competitive Move-to-Front algorithm for list processing. Manasse, McGeoch, and Sleator [17] conjectured the existence of Ie-competitive algorithms for the Ie-server problem.

In the area of job scheduling, Graham [9] pioneered the on-line algorithm study by designing an on-line algorithm for the multiprocessor scheduling problem. In the problem, a set of n independent jobs are to be scheduled nonpreemptively on m parallel machines. The goal is to construct a schedule with the minimum

On-line Algorithms 159

makespan. Graham defined an on-line algorithm called List-Scheduling (LS), in which the jobs are kept in a list, and when a machine becomes idle the first job in the list is removed from the list and assigned to the machine. Graham proved that for any instance I, O'Ls(I) ~ (2 - ~)O'·(I).

In this paper, we study a single machine scheduling problem, in which jobs are not all available at the beginning, but instead are given release dates. We define two on-line algorithms and study their c-competitiveness. It should be pointed out that since release dates are involved, we naturally define on-line scheduling algorithms to be ones that have no knowledge about the jobs that have not arrived yet and make scheduling decisions based on all the jobs available at any given time. This is clearly different from many on-line algorithms studied by the computer science community, in which the notion of time is not involved. It should also be pointed out that using competitive ratios to study heuristics is not new in the area of scheduling theory. However, scheduling theory has focused on off-line models, which assume that the entire sequence of jobs arriving at a service facility is known in advance. For example, Potts [20) and Hall and Shmoys [11) gave analysis of two off-line heuristics for a single machine scheduling problem that seeks to minimize the maximum lateness over all jobs.

We organize the paper as follows. In Section 2, we define the single machine scheduling problem and present two well known, but perhaps not well studied on-line algorithms, First-Come-First-Served (FC F S) and Shortest-AvailableJob-First (SAJF). In Section 3, we prove that FCFS and SAJF are both n-competitive, where n is the number of jobs to be scheduled. In Section 4, we show that there is no on-line algorithm A for the problem such that A is c-competitive for any fixed constant c. In Section 5, we present some computational results. In Section 6, we give the conclusions.

2 A SINGLE MACHINE SCHEDULING PROBLEM

Given a set J of n independent jobs J1, J2 , ••• , I n . Job J; has processing time p; and becomes available at release date r;. A scheduling problem is defined to execute the jobs in J on a machine such that the total completion time I: Cj ,

where Ci is the completion time of J; in the schedule, is minimized. This problem can be denoted by 11r; I I: C; according to the 0:1.811' classification used by Lawler, Lenstra, Rinnooy Kan, and Shmoys [15), and was proved to be strongly NP-hard by Lenstra, Rinnooy Kan, and Brucker [16) even if all parameters that define the problem instance are given in advance. The problem

160 CHAPTER 8

is solvable in polynomial time however if r; = 0 for all j according to Smith [25].

The problem arises from process scheduling in operating system design. In the multi-user system, we have situations in which more than one process is waiting for CPU time, and the operating system must decide which one to run first. One of the goals that the operating system seeks to achieve is to minimize the average response time, i.e., * 2:(8; - r;), where 8; is the starting time of the execution of a process (job J;). Since 8; = C; - Pi' the average response time is in fact * (2: C; - 2: (p; + r;». Since * 2: (p; + r;) can be considered as a constant for a given instance, the problem is therefore converted to llr; I 2: C;.

Many algorithms have been designed and analyzed for the problem. Dessouky and Deogun [5] proposed a branch-and-bound algorithm. Deogun [4] presented a partitioning scheme. Chand, Traub, and Uzsoy [1] used a decomposition approach to improve branch-and-bound algorithms. Gazmuri [6] gave a probabilistic analysis. Posner [19] studied a greedy method and proved that it yields an optimal solution under certain conditions. Chu [2] defined a few algorithms which use a local optimal condition to make scheduling decisions.

All of the algorithms in the literature mentioned above are off-line in the sense that the algorithms know all the information about jobs even before they arrive in the system. This assumption becomes unrealistic in practice since jobs may arrive at any time and an algorithm is unable to know their p; and r; until their arrival. Furthermore, an algorithm may not even know n, the total number of jobs, until all of them arrive. We call this type of algorithm on-line. In other words, an on-line algorithm makes scheduling decisions without any knowledge about the future.

We present two well known on-line algorithms: First-Come-First-Served (FCFS) and Shortest-Available-lob-First (SAJ F). In both algorithms, a queue is maintained to contain all the jobs that have arrived but have not been executed. In FCFS, jobs in the queue are listed according to nondecreasing r; (jobs with the same r; are ordered by nondecreasing Pi), while in SAJ F, jobs in the queue are listed according to nondecreasing 1';. When the machine becomes idle after completing the execution of a job, the first job in the queue is assigned to the machine for execution. When a new job arrives, it is inserted into the correct position in the queue. Both FCFS and SAJ F are on-line since they make scheduling decisions only based on the information in the available queue. Both FCFS and SAJF are also conservative, meaning that the machine is never left idle when there are jobs in the available queue. We notice


that even though these two algorithms are commonly used, few analytic results are available.

We define the following notations. Let A be an algorithm for the scheduling problem. Let I be any problem instance. Then we use aA(I) and aO(I) to denote the total completion times of I in the schedule constructed by A and the optimal schedule, respectively. We note that the optimal schedule is off-line and yields the minimum total completion time.

3 ANALYSIS OF FCFS AND SAJF

Phipps [18] showed that if the jobs arrive according to a Poisson process then SAl F beats FC F S for a variety of performance measures. Schrage [22] proved that with SAl F the number of jobs in the queue at any point in time is less than or equal to the number of jobs in the queue for any other heuristic simultaneously acting on the same instance. We were unable to find extensions of these results for arbitrary arrival process and with total completion time as the performance measure. Consequently, we provide a proof that SAl F always beats FCFS with respect to total completion time, i.e., for all I, aSAJF(I) ~ CTFcFs(I). First, we need the following lemma.

LEMMA 1 Let JJsAJF(I) and JJFCFs(I) be the mazimum completion times (makespans) of any instance I in SAlF and FCFS, respectively. Then JJsAJF(I) = JJFcFs(I).

Proof Consider any conservative scheduling heuristic A. For any instance I, the makespan JJA(I) of the schedule obtained with A must be the sum of the total processing time L Pi and the total idle time L (Pi- That is,

Since in A the machine is left idle only when there are no jobs in the available queue, A obviously minimizes the total idle time, hence the makespan of the schedule. Because SAl F and FCF S are conservative, they both construct the schedules with the minimum makespan. So JJSAJF(I) = JJFcFs(I). 0

We then have the following theorem.

162 CHAPTER 8

THEOREM 1 O'sAJF(I) ~ O'FcFs(I) for any instance I.

Proof We prove by induction on n, the number of jobs in J. When n = 1, we denote the instance by 11• SAJF and FCFS behave exactly the same. So O'sAJF(I1) = O'FcFs(It}. Assume for any instance 1 .. _1 with n - 1 jobs, O'sAJF(I .. - 1 ) ~ O'FcFs(I .. -t}.

Now consider any instance I .. with n jobs. Let 1", with processing time pO and release date rO, be the job that is executed last in the FC F S schedule for I ... 1" must be the longest job among all jobs with the largest release date. Let l/I ~ ° be the idle time just before 1" in the FCFS schedule for I ... Clearly, l/I = max{O, rO - P-FcFs(I .. - 1 )}. Let 1 .. _1 be the same instance as I .. with 1" removed. We have

Now consider the SAJ F schedule for I ... Without loss of generality, we assume that if there are other jobs which have the same processing time and release date as 1" then 1" is executed last among them in the SAJ F schedule for I ... Let 1/J ~ ° be the idle time just before 1" in the SAJ F schedule for In. Assume that there are Ie ~ ° jobs following 1" in the SAJ F schedule for I .. , all no shorter than J O and arriving earlier than 1". If 1/J > 0, 1" must be the last job in SAJF and so Ie = 0, and if 1/J = 0,1" may be followed by some longer jobs in SAJF. Therefore,

lc1/J = 0. (2)

Furthermore, when 1/J > 0, all jobs except 1" are scheduled earlier than 1" in the SAJ F schedule for I... So 1/J = max{O, r" - P-SAJF(I .. -1)}. Since P-SAJF(I .. -1) = P-FCFS(I .. -1) by Lemma 1, then

1/J = l/I. (3)

Let C, be the completion time of the job J, that is scheduled right before 1" in the SAJ F schedule for I ... Note that J, and 1" may be separated by 1/J. Then,

(4)

Therefore, we have

O'sAJF(In) O'SAJF(In-d + (C, + 1/J + pO) + lc(1/J + pO)

< O'sAJF(I .. -d + P-SAJF(I .. -d + 1/J + pO (By (2) and (4»


< O'FcFs(In-d + J.LsAJF(In-d + 'IjJ + p' (By induction)

O'FCFs(In-d + J.LFCFs(In - l ) + cP + p' (By Lemma 1, (3))

O'FcFs(In) (By (1)). 0

Let us next consider the c-competitiveness of FCFS and SAJF.

THEOREM 2 O'FcFs(I) ~ nO" (I) faT any instance I of n jobs.

PToof The FCFS schedule for any instance I with n jobs has a block structure: Bb B2"'" BI, where in each block there is no idle time, and between two consecutive blocks there is an idle period. Let S(Bi) be the starting time of block Bi. Obviously, Tj ~ S(Bi) for any J j E Bi'

Define another instance I' so that it contains JL ... , J~, where the processing time of Jj is the same as Jj , and the release date of Jj is S(Bi) if J j E Bi in the FCFS schedule for I. The optimal schedule for I' has the same block structure as the FCFS schedule for I. Each block has the same jobs as in the corresponding block in the FCFS schedule for I. Furthermore, based on Smith [25), the jobs in each block in the optimal schedule of I' are executed according to the shortest-job-first rule.

We have 0"(1) ~ 0"(1') because I and I' have the same processing time for each job and the release dates in l' are all at least as early as those in I. Assume Bi has jobs Jil, ... ,Jile; with Pil ~ ... ~ Pile;. Let 0"(1', Bi) and O'FcFs(I, Bi) be the total completion times of jobs in Bi in the optimal schedule for I' and in the FCFS schedule for I, respectively. Then

and

Therefore,

O'FCFs(I, Bi) < kiS(Bi) + Pi1 + 2Pi2 + ... + kiPilei

< kiO'·(I',Bi ).

O'FCFs(I) O'FcFs(I, Bl ) + ... + O'FcFs(I, Bd

< k10" (I', B1) + ... + klO" (I', Br)

164

< n(O'·(1', B1 ) + ... + 0'·(1', BI»

nO'· (I') < nu·(1). 0

THEOREM 3 O'SAIF(1) ~ nu·(1) for Gny instGnce I ofn jobs.

Proof Straightforward using Theorems 1 and 2. 0

CHAPTER 8

We can show that the competitive ratio n in Theorems 2 and 3 is tight in the sense that it is achievable by some instance. Consider the following instance I with n jobs. Let Pl = M and rl = 0, where M is an arbitrarily large positive number. Let Pi = 1 and ri = f for j = 2, ... , n, where f is an arbitrarily small positive number.

In the optimal schedule, the machine waits intentionally for f time units until jobs J2 , ••• , I n are released, then executes J2 , ••• , I n sequentially, and finally executes the long job J1 • Therefore, 0'.(1) = (1 + f) + (2 + f) + ... + (n - 1 + f) + (M + n - 1 + f) = M + in(n + 1) - 1 + nf.

In the schedules constructed by FCFS and SAJF, the machine executes J1

and then J2 , ••• , I n . Therefore, O'FcFs(1) = O'SAIF(1) = M + (M + 1) + ... + (M + n - 1) = nM + tn(n - 1) .

.. ;-r-'-;-.£""':'-:-7- -t n for M -t 00 and f -t O.

Now let us compare FCFS and SAJF with other algorithms. As a matter of fact, very few algorithms for the problem have been analyzed using the competitive ratio. Among those that have been analyzed, there are EarliestCompletion-Time (ECT), Earliest-Start-Time(EST), and Priority-Rule-forTotal-Flow-time (P RTF). In his recently published paper, Chu [2] proved that the tight competitive ratio for ECT and EST is n, and the competitive ratio for PRTF is between i(n + 1) and Hn + 1). ECT, EST and PRTF are all off-line. The study of FC F Sand S AJ F tells us that an algorithm does not have to be off-line to achieve the same competitive ratio of some off-line algorithms. Just knowing the available jobs is adequate.

In many settings, ignorance of the future is a great disadvantage, yet knowing the future is costly and sometimes impossible. How much is it worth to know the future? This becomes a very interesting question.


4 A GENERAL LOWER BOUND

From the discussion in the last section, we have found that in the worst case both F C F S and S AJ F behave badly since their com petiti ve ratios are n, and n can be arbitrarily large depending upon the size of the instance. We are interested to know whether there is anyon-line algorithm for IJrj J L Cj whose competitive ratio is bounded by a constant instead of an instance parameter. The answer to this question is in the following theorem.

THEOREM 4 For anyon-line algorithm A for IJrj J L Cj , there are no constants C and a such that uA(/) :$ cU'(/) + a for any instance I.

Proof We prove by contradiction. Assume that there is an on-line algorithm A such that 0' A (I) :$ CU' (I) + a for some constants c, a, and for any instance I.

Using the adversary argument, we assume that the input instance is provided by an adversary. A good adversary forces the algorithm to make bad scheduling decisions. Suppose that job J 1 is the only job that arrives at time 0, i.e., r1 = 0, and J1 has processing time P1 > a. Since A is an on-line algorithm, it only sees J 1 in the queue and makes a scheduling decision. There are two possibilities to consider.

Case 1. A decides to execute J1• The adversary then chooses n, the number of jobs in the instance, to be larger than c+3, and assumes that for j = 2,3, .. , ,n, rj = 6, where 6 < ~, and Pj = £, where £ < c(n-~~(n+2)" We call this instance I.

In the schedule constructed by A, after J1 is completed, all of the n-l remaining jobs are available. Therefore, uA(/) ~ np1 + (n - I)E + (n - 2)£ + ... + 2£ + £ = npl + tn(n - 1)£.

In the optimal schedule, the machine waits until all the short jobs arrive. Therefore, 0"(/) = n6 + nf + (n - I)E + ... + 2£ + P1 = nO + Hn -1)(n + 2)£ + Pl'

So we have

CU· (I) + a C

cn6 + '2(n - l)(n + 2)£ + CPl + a

< P1+Pl+CPl+Pl (c + 3)P1

166 CHAPTER 8

1 < npl + '2n(n - l)e

uA(1).

This is a contradiction to the assumption that uA(I) ~ cu"(I) + a.

Case 2. A decides to wait for the next job. The adversary then chooses n, the number of jobs, to be 1, i.e., no more jobs will arrive. This forces A to wait forever. Therefore, uA(1) = 00. In the optimal schedule, u"(1) = Pl. So uA(1) > cu"(I) +a. This is again a contradiction. 0


The purpose of this section is to examine the performance of the bound given in Theorem 1 and the c-competitiveness ratios given in Theorems 2 and 3. We begin with a set of four, 1000 job simulation experiments that provide insight into the quality of the bound given in Theorem 1 for total completion time, as well as the differences between Fe F S and SA] F for several other performance measures. We have conducted many other simulation experiments, but these four suffice to illustrate the key conclusions. We note that SA] F has already been shown by Conway, Maxwell, and Miller [3] to be a robust queue discipline under a variety of conditions and we make no attempt to provide an exhaustive computational analysis here. Next, we investigate the conclusions of Theorems 2 and 3 for a 10 job, 30 job, and 50 job problem. Since this scheduling problem is NP-hard we were unable to produce a guaranteed optimum for the 30 and 50 job problems. A tabu search heuristic was used to generate what appear to be high quality feasible solutions. Tabu search has enjoyed many recent successes with a variety of scheduling problems (cf. Glover and Laguna [8]) For completeness we give a brief description of the tabu search procedure in Section 5.2.

5.1 Simulation experiments

The simulation experiments were implemented using a SLAM II (see Pritsker [21]) discrete event simulation model of a single server queue. Each simulation run begins with the queue empty and the server idle. 1000 jobs are created and processed. Table 1 characterizes each of the simulation models. Column 2 lists the distribution of the time between job arrivals. Column 3 gives the


Table 1 Simulation Model Deacriptions-1OOO Jobs

# Interarrival %Sm/Lg P; Distribution p

1 expon(4) 95/5 iriag( 1,2,4)/ (10, 12,14) 0.69 2 expon(4) 80/20 triag(I,2,4)/(10,12,14) 0.97 3 expon(3) 100/0 triag (1,2,4) 0.78 4 unfrm(2,4) 100/0 triag (1,2,4) 0.79

Table 2 Simulation Result. FCFS/SAJF-IOOO Jobs

# EC;/n Wq Lq max in Q maxC; 1 2087/2085 4.9/3.6 1.2/0.9 13/9 4072.4 2 2170/2110 84.2/29.8 20/7.0 41/18 4218.3 3 1567/1566 5.3/4.2 1.7/1.4 15/12 3054.7 4 1492/1492 0.26/0.25 .087/.085 2/2 2989.6

percentage of jobs generated from two job classes (small processing times and large processing times). The distribution from which the job processing times are sampled is given in column 4. Column 5 is the traffic intensity. When p > 0.9 we consider the system to be congested. Table 2 lists the values of several performance measures for the FCFS and SAJF queue disciplines for each of the four simulation models of Table 1. Column 2 lists the average job completion time and column 3 lists the average waiting time. The average and maximum number of jobs in the queue is given in columns 4 and 5, respectively. Column 6 lists the value of the makespan which are identical for SAJ F and FCFS (Lemma 1).

We note that the average number in the queue, Lq, is smaller for SAJ F than for FCFS. Little's formula (see Gross and Harris [10]), Lq = >.Wq, states that the average number of jobs in the queue equals the product of the arrival rate to the queue, >., and the average waiting time in the queue, Wq • It can be shown that Little's formula applies to our single server queue with either SAJ F or FCFS queue discipline. In Section 3 we showed that O"sAJF(I) ::::: O"FcFs(I) for any instance I. This result implies that Wq(SAJF) ::::: Wq(FCFS) and, since>. is a constant, we have Lq(SAFJ)::::: Lq(FCFS).

As we expected, the uniform interarrival distribution smoothed out the arrivals and decreased the size of the queue. When p = 0.79 (model 4) this resulted in nearly identical performance of SAJF and FCFS. A more telling comparison of the performance of SAJF versus FCFS lies in the waiting times. Since

168 CHAPTER 8

the processing times and release dates are included in the computation of the completion time the differences between the waiting times is obscured. A final observation is that the maximum number of jobs in the queue under SA] F was never larger than the maximum number of jobs in the queue for Fe F S. This follows from Schrage's [22] result for SA] F.

5.2 Tabu search

Tabu Search (TS) incorporates conditions for strategically constraining and freeing the search process and memory functions of varying time spans to intensify and diversify the search. The search proceeds from one solution to another via a move function and attempts to avoid entrapment in local optima by constructing a tabu lilt which records critical attributes of moves selected during a window of recent iterations. These attributes identify elements of the solutions that change when progressing from one solution to another, and those from the elected window are declared tabu (forbidden). Current moves are then chosen from a restricted set that excludes the inclusion of tabu attributes (or of a specified number or combination of these attributes), thus insuring that solutions with these attributes (including solutions from the specified window) will nQt-be visited. This restriction may be modified by including aspiration criteria -tha.t allow an otherwise tabu move to be made if it leads to a solution that is sufficiently attractive as measured by these criteria-as, for example, a solution better than any previously discovered solutions. Together, these tabu restrictions and aspiration criteria form the short term memory function of tabu search, which can also be augmented by longer term memory functions to achieve goals of intensification and diversification. We briefly outline the overall structure of a TS solution approach, as a modification of an outline suggested by Skorin-Kapov [23].

• CONSTRUCTION PHASE: Generate a feasible solution.

• IMPROVEMENT PHASE: Perform the short term memory TS improvement phase mazit times, and then execute one of the following longer term memory functions:

- INTENSIFY the search by choosing a previously unselected member of a recorded set of best solutions to restart the improvement phase (retaining memory to choose a new successor), or by choosing a member of the best solutions uncovered but not yet visited. Repeat step 2.


- DIVERSIFY the search by driving it to explore new regions. This may be done with either a frequency based memory that favors the inclusion of rarely incorporated attributes, or a recency based memory that may use distance measures that favor high quality solutions with attributes as unlike as possible from previously observed solutions. Repeat step 2.

- STOP. Display the best solution found.

We implemented a plain vanilla version of TS. Instead of a construction phase we use the FCFS schedule as an initial feasible solution. The improvement phase is a simple greedy local improvement scheme. All pairwise interchanges of the jobs in the schedule are considered. The pair that improves the objective function the most (or degrades it the least if all improving interchanges are tabu) is selected at each iteration ofthe improvement phase. A tabu interchange is allowed only if it results in the best objective function (total completion time) value yet generated. This is called the aspiration criterion. For further information on TS interested readers are referred to Glover [7], Glover and Laguna [8] and Kincaid [14].

5.3 Experiments with c-competitiveness

The second set of experiments compares the performance of SAJ F and FCFS to the optimal schedule, with respect to total completion time, for a 10 job example and to the best schedules found by a tabu search heuristic for a 30 job and a 50 job problem. The examples are the first 10, 30 and 50 jobs, respectively, generated via model 2 of Table 1. In the example with 10 jobs, O'sAJF(I) = 342.0, O'FCFS(I) = 342.3, and 0'·(1) = 269.0. The optimal schedule was found by enumerating all of the 10! schedules and computing 0' for each one.

It was computationally infeasible to calculate the optimal schedule for the number of jobs greater than 10 (We used a 33Mhz 486 class micro-computer). For the 30 and 50 job examples a tabu search heuristic was used to generate good solutions. The tabu search we use, as described in Section 5.2, is a plain vanilla approach. Instead of a construction phase the FCF S schedule was used as the initial starting solution. No intensification or diversification was used. Table 3 lists the parameters selected for our tabu search (ma:z:it and tabusize) as well as three performance features (columns 4-6). In column 2, ma:z:it is the maximum number of neighborhood searches allowed. Column 4 lists the iteration

170 CHAPTER 8

Table 3 Tabu Search Characteri.tic.

n mazit tabusize itr. best # Asp. # tabu 10 50 10 19 0 221 30 150 80 68 8 4,350 50 300 80 110 12 10,401

Table 4 Comparison of Average Completion Times

n TS FGFS ratio SAJF ratio 10 26.9· 34.2 1.27 34.2 1.27 30 75.8 80.8 1.07 80.5 1.06 50 126.3 134.0 1.06 132.0 1.04

when the observed best total completion time was found. The number of times the aspiration criterion was satisfied is given in column 5. Column 6 lists the total number of moves that were declared tabu.

Table 4 summarizes the performance of FGFS, SAJF and TS for three job sequences taken from the job data generated in simulation model 2 of Table 1. Column 2 gives the best solution found by TS. When n = 10 we have verified that this is also the optimal value. The next two pairs of columns (3-4 and 5-6) list the average completion times for FCFS and SAJF and the ratios FGFS/TS and SAJF/TS, respectively. These ratios show that, at least for these examples, the worst-case analysis of Theorems 2 and 3 may be overly pessimistic for the average case.

6 CONCLUSIONS

In this paper, we applied the theory of on-line algorithms to the scheduling problem llril E Gi , and studied the c-competitiveness of two on-line algorithms-FGFS and SAJF. Furthermore, we proved that there is no on-line algorithm with constant competitive ratio for this problem. We also presented some computational results that illustrated the dominance of SAJ F and the overly pessimistic nature of the c-competitiveness worst-case results.

As for the direction of future research, we are currently working on algorithms with look-ahead allowed. We are interested in applying the theory of c-competitiveness to other job scheduling problems. We would also like to


generalize the algorithms for llrjlECj to Plr;IEC; and RlrjlECj in the multi-machine environment.

Acknowledgements

We wish to thank two anonymous referees for their helpful comments. Weizhen Mao was supported in part by NSF grant CCR-9210372, and Rex K. Kincaid was supported in part by a faculty research award from the College of William and Mary.

REFERENCES

[1] S. Chand, R. Traub, and R. Uzsoy, 1993. Single machine scheduling with dynamic arrivals: Decomposition results and a forward algorithm. technical Report 93-10, School of Industrial Engineering, Purdue University, West Lafayette, IN.

[2] C. Chu, 1992. Efficient heuristics to minimize total flow time with release dates, Oper. ReB. Lett. JR, 321-330.

[3] R. W. Conway, W. L. Maxwell, and L. W. Miller, 1967. Theory of Scheduling, Addison-Wesley, Reading, MA.

[4] J. s. Deogun, 1983. On scheduling with ready times to minimize mean flow time, Comput. J. 1t6, 320-32B.

[5] M. 1. Dessouky and l. S. Deogun, 1981. Sequencing jobs with unequal ready times to minimize mean flow time, SIAM J. Comput. 10, 192-202.

[6] P. G. Gazmuri, 1985. Probabilistic analysis of a machine scheduling problem, Math.. Oper. Res. 10, 328-339.

[7] F. Glover, 1990. Tabu Search: A Tutorial, Interfaces ItO, 74-94.

[8] F. Glover and M. Laguna, 1993. Tabu Search in Modern Heuristic Techniques for Combinatorial Problems, C. R. Reeves, ed., Blackwell Scientific Publishing, 70-150.

[9] R. L. Graham, 1969. Bounds on multiprocessing timing anomalies, SIAM J. Appl. Math.. 17, 416-429.

172 CHAPTER 8

[10] D. Gross and C. M. Harris, 1974. Fundamenta16 of Queueing Theory, John Wiley and Sons, New York.

[11] L. Hall and D. Shmoys, 1992. Jackson's Rule for One-Machine Scheduling: Making a Good Heuristic Better, Math.. Oper. ReI. 17, 22-35.

[12] A. R. Karlin, M. S. Manasse, L. Rudolph, and D. D. Sleator, 1988. Competitive snoopy caching, Algoritkmica 3, 79-119.

[13] R. M. Karp, 1992. On-line Algorithms Versus Off-line Algorithms: How Much is it Worth to Know the Future?, International Computer Science Institute Technical Report TR-92-044, Berkeley, CA.

[14] R. Kincaid, 1992. Good Solutions to Discrete Noxious Location Problems, Ann. of Oper. ReI. 40, 265-281.

[15] E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan and D. B. Shmoys, 1990. Sequencing and Scheduling: Algorithms and Complexity, in Handboou in Ope7'Gtionl Relearch. and Management Science, Volume 4: LogilticI of Production and Inventory, S. C. Graves, A. H. G. Rinnooy Kan and P. Zipkin, ed., North-Holland.

[16] J. K. Lenstra, A. H. G. Rinnooy Kan and P. Brucker, 1977. Complexity of Machine Scheduling Problems, Ann. Dilcrete Math.. 1, 343-362.

[17] M. S. Manasse, L. A. McGeoch and D. D. Sleator, 1990. Competitive Algorithms for Server Problems, J. of Algoritkml 11, 208-230.

[18] T. E. Phipps, 1956. Machine Repair as a Priority Waiting-line Problem, Oper. ReI. 4, 45-61.

[19] M. E. Posner, 1988. The Deadline Constrained Weighted Completion Time Problem: Analysis of a Heuristic, Oper. ReI. 36,742-746.

[20] C. N. Potts, 1980. Analysis of a Heuristic for One Machine Sequencing with Release Dates and Delivery Times, Oper. ReI. fB, 1436-1441.

[21] A. A. B. Pritsker, 1986. An Introduction to Simulation and SLAM II, John Wiley and Sons, New York.

[22] L. Schrage, 1969. A Proof of the Optimality of the Shortest Remaining Service Time Discipline, Oper. ReI. 16, 687-690.

[23] J. Skorin-Kapov, 1990. Tabu Search Applied to the Quadratic Assignment Problem, ORSA J. on Comput. f, 33-a-45.


[24] D. D. Sleator and R. E. Tarjan, 1985. Amortized Efficiency of List Update and Paging Rules, Comm. A CM 28, 202-208.

[25] W. E. Smith, 1956. Various Optimizers for Single-Stage Production, Naval ReI. Logut. q'IJ.art. 3, 56-66.

9 MODELING EXPERIENCE USING

MULTIVARIATE STATISTICS

ABSTRACT

Jerrold H. May and Luis G. Vargas

Artificial Intelligence in Management Laboratory Joseph M. Katz Graduate School of Business

University of Pittsburgh Pittsburgh, P A 15260

In an environment where dynamic planning over a rolling horizon together with system monitoring is required, such as in a production environment where lots are released on a frequent basis, an intelligent computer support system may be of particular value if it is capable of (a) recognizing the occurrence of unusual states of the shop floor, such as congestion at bottlenecks, and (b) evaluating the relative desirability of two or more possible courses of action of time. Experience equips humans with those

capabilities. This paper describes how we use a multivariate statistical approach for mechanizing experience in a hybrid OR/AI shop floor planning and control system.

1 INTRODUCTION

The comparison of an observed realization of the state of a system with its expected or desired state is fundamental to planning and control in a manufacturing environment. Intelligent comparison is based on both experience and expertise, where ezperience helps to determine expectations and t recognize the significance of variances, and ezpertise helps to diagnose the causes of the variances and in dealing with them. This paper deals with the mechanization of experience by exploiting multivariate statistical techniques, in an environment in which adequate data exist. A statistical approach provides a mechanism for incorporation of new data, as they occur, and for the elimination of obsolete information. For a description of the AI/OR system for which the expectations are computed, see [7].

175

176 CHAPTER 9

We use multivariate methods to generate expectations and interpret variances in a fashion similar to the way in which they are used in quality control. [1] and [8] provide comprehensive discussions of multivariate quality control procedures. Our methodology differs from the traditional approach in several ways. Statistical quality control is generally interested in monitoring characteristics of the product being manufactured. Measurements of the attribute of interest are typically aggregated over a series of small subgroups, and control chart practice involves two distinct phases. Phase I consists of using the charts for (1) retrospectively testing whether the process was in control when the first subgroups were being drawn and (2) testing whether the process remains in control when future subgroups are drawn. Phase II consists of using the control chart to detect any departure of the underlying process from standard values [1].

In contrast, we are interested in monitoring the production process itself, not its output, and all our subgroups are of size 1 (there are also traditional situations where subgroups are also of size 1; see [1]). Most importantly, we do not treat the analysis as having two distinct phases. Because we are measuring the production process and not the product, there is no notion of "out of control" points; there are days of unusually high or low productivity at particular steps of the process. The purpose of detecting variances is to expedite material or to reallocate resources.

The analysis in this paper extracts and interprets expectations based on Hotelling's T'l distribution [3]. Control charts constructed using that distribution are multivariate analogues of the Shewhart x-bar chart. We use Hotelling's T2, as opposed to HoteHing's X2, because the covariance matrix is unknown [2]. Alternatives to a Hotelling-based approach include:

1. the multivariate exponentially weighted moving average control chart approach, which gives incrementally less weight to older data. That characteristic might provide a mechanism for mimicking human adjustment of expectations as the production process changes, although we do not currently have any basis upon which to set the smoothing constant; and

2. the multivariate cumulative sum control chart approach. Recent results indicate that the exponentially weighted moving average chart may be preferable to it ([4], [6]).

[4] compares the sensitivity of the three control chart methodologies from a univariate perspective, and found that HoteHing's approach is the most appropriate of the three for large shift detection, which is our interest.

Modeling Using Statistics 177

2 EXPECTATIONS

Consider a job-shop production environment in which lots of a variety of products are sequentially processed on a series of shared and dedicated resources. The material goes through a substantial number of sequential processing steps, and is resident on the shop floor for a considerable length of time. The processing time at each step, and the total residence time on the floor, is highly variable. While rework may be necessary, because of the nature of the production process, the material never returns to an earlier processing step, although it may remain at a particular step while rework occurs. Finally, the product is shipped when it is completed; production for inventory is not allowed. The information available to the production manager and scheduler to make decisions in this environment include weekly demand forecasts and a daily status report of all work-in-process lots and machines.

Using the above information, we want an intelligent decision support system to answer the following questions:

(1) Is the current state of the system out of line with expectations? If there are variances, where are the favorable ones (ahead of expectations) and where are the negative ones (behind schedule)?

(2) If there are unfavorable variances, what possible strategies, short term (sequencing and lot expediting) and medium term (machine setups and planned lot release) could be pursued to rectify the variances, how much would they cost, and how quickly would those strategies bring the state of the system back in line with expectations? Because capacity is fixed, in the short run, eliminating unfavorable variances may erase favorable ones, and achieving favorable variances may result in the creation of unfavorable ones. The decision support system should assist in the assessment of the trade-offs involved in the planning process.

As in financial statement analysis, expectations have both stoclc and flow aspects, and are only meaningful in contezt. Stoclc ezpectations evaluate the number of lots of a particular product type at each point in processing at a particular time. Because lots are released onto the floor only early enough to meet their due dates with a particular probability, stock expectations may be measured relative to the future history of daily locations anticipated at the time of lot release, or relative to (conditional on) the lots' actual locations at a time after their release. Flow ezpectations measure the number of lots that visit each of the processing steps within a given time interval.

178 CHAPTER 9

Interpretation in context means that variances from expectations cannot be understood unless longer term information is taken into account. For example, in the environment we studied, setup times and the makes pan are very substantial. Machine allocation and lot release decisions are made based on medium and long term forecasts, and the shop floor configuration changes at least weekly. The scheduler needs to be concerned if material and capacity are not distributed in the same proportions on the shop floor. The scheduler has to look at the current state of the shop floor together with the machine setup plans for the next several weeks in order to determine whether or not a problem exists at the current time, because only then can he estimate the future states of the shop floor and compare them with his desired states.

In this paper, we describe how we extract stock and flow expectations from past production data. Lot sequencing decisions make the product flows nonMarkovian, so we need to consider a series of n.-day flows enough to encompass the residence time of product on the shop floor.

3 CONSTRUCTING EXPECTATIONS

In our manufacturing environment, products go through a series of processing steps, where the last, absorbing, step means that the lot is "completed." The state of the shop floor is observed once each day, at the same time of day. Assume that a large number, D, of consecutive daily floor observations is available, and that the demand forecasts over that period of time are known and stable. For each category of product, we compute the k-th-order transition matrices by counting the number of lots that were observed to move from step i to step i in exactly k days. Each row of the matrix is divided by its row sum in order to obtain relative frequencies. The sample yields D -1 first-order matrices, D - 2 second-order, and 50 on; the transition matrices are all upper triangular. Because due dates, lot priorities, and product line importance are taken into account in determining the queueing disciplines for the machines, the process is not Markovian (see the example in Section 4), and we must explicitly construct transition matrices of all orders of interest.

Let Xt = (Xt .1! Xt•2 • ••• ,Xt•m ) be an m-dimensional random variable that denotes the number of loti at production step 1,2, ... , m, at time t. Let Yt

be the random variable that denotes the position of a particular lot at time t. Let n. = {?r.(i,i) = pry, = ilYo = i]} be the s-th.-order transition probability matriz representing the probability that a lot at step i at time 0 is at step j,


s time periods later. Because of the non-Markovian nature of the system, to estimate the state ofthe shop floor k periods ahead we need to use the transition probability matrices IT., s = 1,2, ... k, because (ITd' f:. IT" s = 2,3, ... , k.

Stock expectations, those based on a count of lots of a particular type at a particular step in processing, come in two varieties-those based on the trajectory of futures projected at the time the lot is released onto the shop floor absolute stock ezpectations, ASEs) and those based on floor state realizations after the lot is released relative stock ezpectations, RSEs).

3.1 Absolute Stock Expectations

ASEs are determined by planned lot releases, which are derived from the long term demand forecasts. Without loss of generality, assume that all lots enter processing at step 1. Then the probability distribution of the locations of lots at days subsequent to lot release is given by the first rows of the transition matrices extracted from the daily floor observations. The daily floor observations yield a set of sample estimates of those probability distributions. Assuming that the sample size is large enough so that the multivariate normal is an adequate approximation to the underlying population distribution, confidence hyperellipsoids for the location of a set of lots for any given number of days after its introduction can be constructed using standard multivariate techniques [5].

Let nt be the number of lots released on day t, and let Nt = (nt, 0, ... ,0) be an m-dimensional row vector. Thus, if nt_i, nt-2, ... , and nt-k lots are released on days t - 1, t - 2, ... , and t - k, respectively, to satisfy a long term demand pattern, then the absolute stock expectations represent the most likely distribution of those lots in subsequent days. Let X t _ k be the state ofthe shop floor on day t - k. The estimated state of the shop floor on day t - k + 1 is given by the state of the Xt_k lots one day later, plus the new lots that were added at t - k, which have been on the floor one day, plus an error term to model the randomness of lot movements:

Similarly, on day t - k + 2, the Xt-Tc+i lots will have been on the floor an additional day, and the new lots added at t - k + 1 will have been on the floor for one day, so that:

180 CHAPTER 9

Because the system is non-Markovian, (Ih)2 ::j:. IT2, so that

X'_1:+2 = X'_kIT2 + N.-kIl2 + (E'_1:+1 + N'-l:+l)lIl + E'_1:+2·

In general, given the state of the shop floor on day t - Ie, the state on day t is given by the states of the lots already on the floor on day t -Ie, updated by the kth-order transition matrix, plus the sum of the sets of lots added between t - Ie and Ie, each updated by the appropriate transition matrix, plus the error terms:

k k-l X'_k = X'_kITk + L N._,IT, + L f._,ll, + E. (1)

Consider D consecutive periods of time (days) for which we have observed the locations of all lots on the factory floor. If the lots have unique identification numbers, and material is not transferred between lots, it is possible to determine the fraction of lots at any production step which have moved to each subsequent processing step over an intervening period of days. Using such information, we construct the D - 1 sample one-period transition probability matrices, Pi';,; = 1,2, ... , D - 1; the two-period transition probability matrices P2,;,; = 1,2, ... , D - 2; ... and the kth-order transition probability matrices P",;,; = 1,2, ... , D - k. The sample matrices are realization, of the random matrices IT", = 1,2, ... , Ie, for the periods; = 1,2, ... , D - s. Let X", be the estimated state of the shop floor after s days, starting on day t - ,. The ;-th sampling approximation of (1), ; = 1,2, ... , D - k, is given by:

k k-l Xk,.,; = Xt-k,.Pk,; + L Nt_,P,,; + L et_,,; P,,; + et,;, (2)

,=1 ,=1

where e.,; is the error term ofthe estimation. Let P" s = 1,2, ... , k be the average of the transition probability matrices, Pi';, P2,;,"" ~,;,; = 1,2, ... , D-k, respectively. Let Xk,. be the mean of the sample (Xk,.,b Xk,.,2, ... , Xk,.,D-k)' An estimate of the expected state of the shop floor after Ie periods of time is given by:

where

k

Xk,. = X.-k,tPk + L Nt_,P, + ek,h

,=1


For a sufficiently large sample, the expectations are approximately distributed according to a multivariate normal. Hence, a confidence interval (hyperellipsaid) for the expected state of the shop floor I' is given by:

- T 1 - p(D - 1) D(Xi,e - 1') S;'eD(Xi,e - I'):S D _ p F",D_,,(o:), (3)

where Si,e is the variance-covariance matrix given by

where Xi,C,; is the jth estimate of the shop floor on day t - Ie, and p is the number of non-Iero eigenvalues of the variance-covariance matrix.

If a given state of the floor, 1'0, does not satisfy inequality (3), then the state of the shop floor is not within expectations, which is equivalent to testing the hypothesis Ho : I' = J.'o using Rotelling's ~-distribution:

2 (D - p)D - T -1 -T = p(D _ 1) (Xi,t - 1') Si,e (Xi,e - p.)

3.2 Relative Stock Expectations

RSEs are determined using the same multivariate techniques, but are based on a different set of vectors. The discussion in this section describes the derivation of a one-day RSE. Two-day and higher RSEs are constructed in an analogous fashion.

As before, let the row vector Xc denote the number of lots at each processing step, for a product line, on day t. Premultiplying a first order transition matrix by the vector representing the state of the shop floor at a given time t gives an estimate of the distribution of those same lots at time t + 1. The distribution of lots on day t based on the distribution on day t - Ie is given by:

i-1

Xc = XC_IIITIl + L fC_. IT• + ft· .=1

(4)

There are D - Ie such estimates available from the sample. Using the D - Ie le-th-order transition matrices, a confidence hyperellipsoid can be constructed as before.

182 CHAPTER 9

Given a sample of size D from which we derive the transition probability ma.trices P,,,, j = 1,2, ... , D - Sj S = 1,2, ... ,Al, we obtain estimates of Xt:

,-1

i"t,; = Xt_,P,,; + L et-h.,;p,.,; + et", 11.=1

(5)

and the expected state of the floor is given by the average of the estimates given by (4):

where

X"t = Xt_,P, + e"t

1 D-, (,-1 ) e"t = D _ S L Let-h,;Ph,; + et,; .

;=1 h=l

The confidence hyperellipsoid is also given by (3).

The projection of Xt_lc into the future using the transition matrices, and the comparison of that time trajectory to that of the ASEs, allows us to study the impact of resource allocation decisions. The overlap, or lack of it, of the hyper ellipsoids corresponding to the ASEs with those corresponding to the RSEs allows us to identify and quantify variances. When a variance occurs, flow expectations, discussed in the next sections, indicate the bottlenecks as well as the unexpectedly productive processes.

3.3 Flow Expectations

Flow expectations are derived from the transition matrices, and are useful for making resource allocation decisions, because they highlight system bottlenecks. While stock expectations are based on the distribution of the proportion of a set of lots by step number, flow expectations are based on the number of lots that have visited a particular step during the time period of interest.

The fundamental distinction between stock and flow expectations is that the former are built by estimating conditional probability distributions, while the latter are built by estimating conditional cumulative probability distributions. Given a number ollots at step 6, the flow expectation estimates the number of them that should have visited step 'Y, 'Y > 6, over a given time interval. Because material cannot flow backward, for any transition matrix p";' the associated


flow transition matrix Q.,; has its elements given by

As before, there are D - 1 such matrices for a one-day interval, D - 2 for two days, and so on. Using the same notation as before, we generate each sample point by premultiplying the appropriate flow transition matrix Q.,; by the vector X",t,i (see (2)), compute the variance-covariance matrix, and extract the F-statistic and tail probability.

4 AN EXAMPLE

In this section, we illustrate the foregoing idea of statistically based expectations in a manufacturing environment, using actual data from the Westinghouse Commercial Nuclear Fuels Division Specialty Metals Plant (CNFD/SMP) in Blairsville, P A, which manufactures high-grade seamless tubing for various uses in nuclear power plants. CNFD/SMP makes thin-walled, seamless nucleargrade high-quality tubing from several zirconium-based alloys. The finished product is usually about ten feet long, with an outside diameter between about one-fourth and seven-tenths of an inch. Most of the tubes are used to hold fuel for nuclear reactors; some are used for instrumentation. About twenty-five different products will be produced in the course of a year, out of around ninety possible products. Typically, four to five different products for up to fifteen customers are manufactured at anyone time.

Tubes are made by high-speed cold-working of tubes on stationary cold pilger mills (CPMs). Pilgering stretches a short, thick-walled tube into a longer, thin-walled one. Material is processed in lots that begin with less than ten tubes, weighing about 100 pounds apiece; each starting tube results in about 100 finished tubes. Products require multiple passes through the pilger-pickleanneal cycle, in which material is worked, deburred, cleaned, briefly pickled in an acid bath, rinsed, dried, and vacuum annealed. Different product lines may share certain reduction sequences, but tubes with different outside diameters differ in at least their final pilgering pass. A pilger can work on only one product at a time. CPM changeover from one product to another may take less than an hour, if the two products share a common outside diameter, to several days, if a die change and machine certification process is necessary.

184 CHAPTER 9

A point estimate of the time required for processing is not adequate in the CNFD /SMP environment. The amount of time necessary to process a lot on a CPM is highly variable. The plant's total commitment to quality is also re1lected in randomized testing of lots on the 1Ioor, in addition to 100% inspection at several points in the process; no part of a lot may progress to the next processing step if even one tube needs rework. Even under favorable circumstances, final pass pilgering takes about twenty times as long as first pass pilgering. Annealing requires about six times as long as first pass pilgering.

Most pilgering steps employ parallel machines, not all of which have the same processing rates. Each group of pilgers with the same setup shares a common queue of lots, and the lot at the head of the queue will be assigned to the first available machine which can accommodate it. A lot's rate of progress depends on the machines to which it is actually assigned, a sequence which is not determined a priori.

Other factors also contribute to the variability in a lot's rate of progress from release to completion. The plant usually works three shifts a day, five days a week, but certain of the machines may also work part or all of a weekend, meaning that lots at certain steps have an additional opportunity to progress. Similarly, because of variable manning, each pilger works on its own production calendar, and all the pilger calendars may differ from that of the furnaces and ofthe pickle house.

Each tube goes through up to twenty-one major processing steps prior to final inspection and packing. Depending on the final diameter required, and the application to which the tubes will be put, the material is processed through two, three, four, or five pilger/pickle/anneal cycles. The particular category of tubes considered here (fuel tubes requiring three major production cycles) do not go thorough steps one through five, or eighteen through twenty.

Products are clustered by their operation sequencing. For example, all four-pass fuel tubes that are made using the ''improved zirconium 4" alloy are grouped together; there are about six different product groups. That is, if a certain number, say 6000, of tubes of a particular product, A, must be shipped in week x, then twenty lots, four per day, each of three 100-pound tubes, will be released to the 1Ioor in week x-4. For example, on Monday of week x-3, four lots of A will have been on the 1Ioor for five days; their distribution across processing steps will be given by the first rows of the fifth-order transition matrices. Four lots will have been on the 1Ioor for four days, and we can estimate their locations using the first rows of the fourth-order transition matrices, and so on.


Shop floor status information is collected for scheduling purposes on a daily basis; the example in this section is based on sixty-four consecutive daily reports between October, 1991 and February, 1992. We tabulated the movements of lots between processing steps as a function of residence time in process. That is, there were sixty-three matrices for movements oflots after one day on the floor, sixty-two matrices for lots with two days of residence, and so on. As previously discussed, it is necessary to construct such tables for all residence times, because the transition probabilities are non-Markovian. Tables la, lb, and lc, which show, respectively, the mean one-day transition matrix, its square, and the mean two-day transition matrices, illustrate the non-Markovian property; the average absolute percentage difference between the entries in the upper triangle of the matrix in Table lc and the corresponding entries in the matrix in Table Ib is more than 63%. Selection of lots for processing is highly dependent on due date and waiting time at a particular step, so that a Markovian assumption of (IT I )' = lpi,) would result in inappropriate expectations.

For the product line detailed in this section, N84, new material may be added either at production step six or at production step eleven. Tables 2a and 2b show the mean distributions, as a function of the number of work days they have been resident on the shop floor, for lots of material that were started at production steps six and eleven, respectively. That is, the first row of Table 2a is the average of all (nonzero) sixth rows of the tabulated one-day transition matrices described in the preceding paragraph; it shows the relative frequencies for the location of a lot which was started at step six and has been in process for one day. Rows that are all zeros occur because new material is not added every day, and because there are not necessarily lots at every processing step every day. Tables 2a and 2b are examples of Absolute Stock Expectations (ASEs).

The two final columns of Tables 2a and 2b show the degrees of freedom for the F -statistic used to interpret the significance of the T2 statistic. Let n denote the number of nonzero rows of the appropriate set of matrices, and let p be the dimension of the space spanned by the vectors, that is, the number of nonzero eigenvalues of the sample variance-covariance matrix computed from the nonzero vectors. The parameters of the related F-distribution are given by 111 = P and 112 = D - p.

In the thirteen working days between February 10 and February 26, 1992, six groups of N84 lots entered the shop floor. Those groups are coded with the letters A through F, and their locations, on a daily basis, are shown in Table 3. Alllot& added on a particular day had the same due date and entered at the same processing step. Group A consisted of six lots, all of which entered at processing step six, group B consisted of four lots, all of which entered at

I-'

Tab

le '

a~

Row

aver

ages

for

one

-day

tra

nsi

tion

mat

rice

s;

nLl1

iler

of m

atri

ces

wit

h th

at

row

non

zero

(to

tal

SBlI'

Pte

size

= 64

) 0

0

0)

6 7

8 9

10

II

12

13

14

15

16

17

21

nonz

ero

6 .2

114

.172

9 .2

348

.172

9 .1

139

.088

9 .0

000

.000

0 .0

000

.005

1 .0

000

.000

0 .0

000

39

7 .0

000

.400

0 .2

000

.127

3 .1

455

.127

3 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

55

8 .0

000

.000

0 .0

505

.277

8 .2

167

.443

9 .0

000

.005

1 .0

061

.000

0 .0

000

.000

0 .0

000

33

9 .0

000

.000

0 .0

000

.177

8 .2

214

.567

5 .0

333

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

30

10

.000

0 .0

000

.000

0 .0

000

.000

0 .8

129

.169

2 .0

179

.000

0 .0

000

.000

0 .0

000

.000

0 28

II

.0

000

.000

0 .0

000

.000

0 .0

000

.578

5 .1

618

.140

9 .0

455

.031

6 .0

29

6

.001

1 .0

110

64

12

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.106

2 .1

465

.414

0 .2

218

.103

5 .0

081

.000

0 62

13

.0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.108

2 .1

879

.342

2 .3

61

7

.000

0 .0

000

47

14

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.165

0 .1

721

.63

19

.0

155

.015

5 43

15

.0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.058

3 .7

81

7

.160

0 .0

000

40

16

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.533

4 .4

454

.021

2 64

17

.0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.477

9 .5

221

64

21

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

1.00

00

2

Tab

le l

b.

The

squ

are

of

the

aver

age

one-

day

tra

nsi

tio

n m

atri

x

6 7

8 9

10

II

12

13

14

15

16

17

21

6 .0

447

.105

7 .0

961

.154

5 .1

384

.387

2 .0

394

.015

8 .0

055

.004

2 .0

06

6

.000

9 .0

010

7 .0

000

.160

0 .0

901

.129

1 .1

297

.403

9 .0

495

.021

6 .0

070

.004

0 .0

038

.000

1 .0

014

8 .0

000

.000

0 .0

026

.063

4 .0

724

.613

0 .1

177

.067

2 .0

225

.016

8 .0

188

.000

6 .0

050

9 .0

000

.000

0 .0

000

.031

6 .0

394

.609

2 .1

387

.088

8 .0

396

.025

3 .0

202

.000

9 .0

062

10

.000

0 .0

000

.000

0 .0

000

.000

0 .4

703

.149

5 .1

413

.110

4 .0

693

.048

0 .0

023

.008

9 11

.0

000

.000

0 .0

000

.000

0 .0

000

.334

7 .1

108

.120

5 .1

273

.112

1 .1

541

.021

4 .0

193

12

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.011

3 .0

314

.139

8 .1

579

.554

2 .0

927

.012

8 13

.0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.011

7 .0

513

.089

3 .6

183

.218

8 .0

106

14

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.027

2 .0

384

.575

8 .3

189

.039

5 IS

.0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.003

4 .4

625

.434

0 .1

001

16

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.284

5 .4

504

.265

1 17

.0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.228

4 .7

716

21

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

1.00

00

Tab

le I

e.

Ave

rage

tw

o-d

ay t

ran

siti

on

mat

rix

6 7

8 9

10

II

12

13

14

15

16

17

21

6 .0

097

.098

6 .0

549

.171

6 .1

759

.403

8 .0

697

.000

0 .0

000

.015

8 .0

000

.000

0 .0

000

7 .0

000

.272

7 .1

091

.109

1 .0

727

.236

4 .0

909

.036

4 .0

364

.018

2 .0

182

.000

0 .0

000

8 .0

000

.000

0 .0

078

.093

8 .0

130

.649

5 .0

953

.069

8 .0

333

.031

3 .0

063

.000

0 .0

000

9 .0

000

.000

0 .0

000

.044

4 .0

111

.682

5 .1

270

.054

0 .0

000

.016

7 .0

643

.000

0 .0

000

10

.000

0 .0

000

.000

0 .0

000

.000

0 .5

051

.150

9 .1

093

.064

6 .1

088

.061

2 .0

000

.000

0 11

.0

000

.000

0 .0

000

.000

0 .0

000

.378

7 .1

062

.105

4 .1

066

.095

4 .1

77

7

.020

0 .0

101

Q

12

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.032

8 .0

301

.144

8 .1

612

.545

1 .0

697

.016

4 ::r:

13

.0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.025

4 .1

214

.047

1 .6

196

.164

9 .0

217

>

14

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.034

0 .0

496

.634

6 .2

500

.031

7 "t

l 15

.0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.008

3 .5

342

.392

9 .0

646

16

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.305

2 .5

138

.180

9 >-3

17

.0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.130

6 .8

694

trl

21

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

.000

0 .0

000

1.00

00

~

(0

Ta

ble

2a

. A

SE

s b

y D

ays

In-p

roce

ss f

or

Lo

ts S

tart

ing

at

Ste

p 6

; P

ara

me

ters

fo

r th

e R

elat

ed F

~ ~ C

b 6

7 8

9 10

11

12

13

14

15

16

17

21

",

"2 -...

1 0.

0179

0.

3929

0.

1964

0.

1250

0.

1429

0.

1250

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

7

32

~

2 0.

0000

0.

2727

0.

1091

0.

1091

0.

0727

0.

2364

0.

0909

0.

0364

0.

0364

0.

0182

0.

0182

0.

0000

0.

0000

8

30

~

3 0.

0000

0.

2037

0.

0741

0.

0556

0.

0185

0.

2963

0.

1111

0.

0000

0.

0185

0.

0556

0.

1667

0.

0000

0.

0000

11

26

~

4 0.

0000

0.

1698

0.

0566

0.

0189

0.

0000

0.

2453

0.

1321

0.

0000

0.

0566

0.

0377

0.

2453

0.

0377

0.

0000

11

26

....

5 0.

0000

0.

1346

0.

0577

0.

0192

0.

0000

0.

1538

0.

0577

0.

0192

0.

0769

0.

0769

0.

2885

0.

0962

0.

0192

10

27

~

6 0.

0000

0.

0980

0.

0588

0.

0196

0.

0000

0.

1176

0.

0000

0.

0000

0.

0392

0.

0784

0.

3725

0.

1373

0.

0784

10

27

~

7 0.

0000

0.

0800

0.

0400

0.

0200

0.

0000

0.

0800

0.

0200

0.

0200

0.

0200

0.

0200

0.

3600

0.

1800

0.

1600

10

27

~

8 0.

0000

0.

0612

0.

0408

0.

0000

0.

0000

0.

0612

0.

0204

0.

0000

0.

0204

0.

0204

0.

2857

0.

2041

0.

2857

11

26

s::.

9 0.

0000

0.

0417

0.

0417

0.

0000

0.

0000

0.

0417

0.

0208

0.

0000

0.

0417

0.

0208

0.

1667

0.

2083

0.

4167

9

27

..... ....

10

0.00

00

0.02

13

0.04

26

0.00

00

0.00

00

0.04

26

0.00

00

0.00

00

0.06

38

0.02

13

0.10

64

0.14

89

0.55

32

8 27

en

.....

11

0.00

00

0.00

00

0.04

35

0.00

00

0.00

00

0.04

35

0.00

00

0.00

00

0.04

35

0.04

35

0.08

70

0.08

70

0.65

22

6 29

... C

')

12

0.00

00

0.00

00

0.02

22

0.00

00

0.00

00

0.04

44

0.00

00

0.00

00

0.02

22

0.02

22

0.11

11

0.08

89

0.68

89

6 29

en

13

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0455

0.

0000

0.

0000

0.

0000

0.

0227

0.

1136

0.

0909

0.

7273

6

29

14

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.02

33

0.00

00

0.00

00

0.00

00

0.02

33

0.11

63

0.06

98

0.76

74

5 30

15

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0238

0.

0714

0.

1190

0.

7857

4

31

16

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.02

44

0.12

20

0.85

37

4 31

17

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0750

0.

9250

3

32

18

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.02

56

0.97

44

1 34

Ta

ble

2b

. A

SE

s b

y D

ays

In-p

roce

ss f

or

Lo

ts S

tart

ing

at

Ste

p

12

; P

ara

me

ters

fo

r th

e R

elat

ed F

'1

12

13

14

15

16

17

21

'" "2

1 0.

0081

0.

1062

0.

1465

0.

4059

0.

2218

0.

1035

0.

0081

0.

0000

8

56

2 0.

0000

0.

0328

0.

0301

0.

1448

0.

1612

0.

5451

0.

0697

0.

0164

8

55

3 0.

0000

0.

0083

0.

0167

0.

0500

0.

0639

0.

5389

0.

2528

0.

0694

8

54

4 0.

0000

0.

0000

0.

0169

0.

0254

0.

0254

0.

4562

0.

2938

0.

1822

8

53

5 0.

0000

0.

0000

0.

0000

0.

0000

0.

0172

0.

4224

0.

1624

0.

3980

8

52

6 0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

2672

0.

2069

0.

5259

8

51

7 0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

1810

0.

1695

0.

6494

8

50

8 0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

1053

0.

1404

0.

7544

8

49

9 0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0536

0.

1518

0.

7946

8

48

10

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.02

73

0.09

09

0.88

18

8 47

11

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0000

0.

0741

0.

9259

8

46

12

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.00

00

0.00

94

0.99

06

8 45

-00 -l

188 CHAPTER 9

step eleven, and so on. The last column of Table 3 shows the interpretation of the status of a group of lots relative to the ASE, based on the left-tail area of F(lIb 112) below the observed value [(D - p)/p(D - l)]yS-lyT, where y is the row vector of lot locations, expressed in relative frequencies, yT is its transpose, and S is the relevant sample variance-covariance matrix. The values in the last column of Table 3 are depicted graphically in Figure 1.

In an environment where sequencing is important, and where expediting may need to be used, the tracking of lots relative to Absolute Stock Expectations is important as part of identifying those lots that may need to be (a) sequenced on a priority basis or (b) expedited. A left-tail probability close to one implies that the status of a group of lots is inconsistent with its expectations. Several algorithms using the log-odds approach as a discriminant function have been proposed for classifying the out-of-control variable set ([9], [10], [2]); we have not yet determined which, if any, of them works best in our environment.

Because of the non-Markovian movement of material on the shop floor, Relative Stock Ezpecto.tiofU, or RSEs, are useful in assessing the impact of short-term tactics and the rate of change in the status of groups of lots. Given a distribution of lots at time t, the RSE evaluates the consistency of the actual distribution at time t + k with respect to what was expected given the history of k-th-order transition matrices.

Computation of the RSE begins with the generation of a set of vectors resulting from premultiplying each of the sample lc-th-order transition matrices by the distribution vector of interest. For the example, we obtained sixty-four such vectors. Missing values (rows that are all zeros) in a sample transition matrix pose a problem for RSE computation, because they may yield defective probability distributions (lack of conservation of material). We dealt with the problem by replacing each zero-row in a sample k-th-order transition matrix with the corresponding row of 1'1. The row averages for the one-day transition matrices, and the number of sample matrices in which each row was nonzero, are given in Table 1a.

Consider the lob in group.A on day 10. The ASE F-statistic on that day, for that set of lots, was 5.07835, yielding a left-tail probability of 0.9994, indicating that, in an absolute sense, the lots were significantly behind schedule. The same group of lots, on day 11, had an F-statistic of 7,353,638,264, with a left-tail value essentially one; in an absolute sense, the lots are clearly in need of expediting. The RSE allows for an evaluation of the state of the group A lots on day 11 given their status on day 10. For the movement of group A between days 10 and 11, we use the I-day transition matrix, and interpret the


Table 3. Lot distributions for N84 by step, day, and lot group; Left· tail probabil i ty for ASE Step

Day Code Age 6 9 10 11 12 13 14 15 16 17 21 Left-tail Prob.

A 0.2004

A 3 0.0254

3 0.8729

0.0059 0.0278

A 0 0.1038 B 0 0.2847

6 6 0 4 0.2807 3 0 4 0.8305

7 2 0.9974 4 4 0.9909

A 8 1.0000 B 5 0.9998 C 1 0.0278

9 A 9 1.0000 B 6 1.0000 C 2 0.0938

10 A 10 0 0 0 4 0.9994 B 7 2 0 0 0 0.9998 C 3 4 2 0 0 0.2267 0 1 0 0 0 0 0.3440

11 A 11 0 1 1.0000 B 8 1 1 1.0000 C 4 1 3 0.5658 0 2 0 0 0.9758

12 12 0 3 1.0000 9 0 1 1.0000 5 0 0 0.4178 3 0 0 0.9999 1 0 0 0.1151

13 13 1 1.0000 10 1 1.0000 6 6 0.0250 4 0 0.2634 2 0 1.0000 1 0 0.0169

190 CHAPTER 9

rI.I +l 0

...:I .., CX)

Z <i-l 0

rI.I C. ::s 0 + 1-1 Cl

X -..-l CIl

1-1 0

<i-l

1-1 0 0 ..... r<. Q)

• ..c: ! +l

<i-l 0

rI.I :>. ro Ci

rI.I :> :>. +l -..-l ..... -..-l ..Q to ..Q 0 1-1 Il.

..... -..-l to

"'~TtJ~.d IT"~'4J:rt +l I

+l <i-l Q)

...:I

Q) 1-1 ::s 0'1

-..-l r<.


RSE by finding the left-tail area of F(p, D - p) = F(9, 55) below the observed value of [(D - p)/p(D - 1)]ZS-lZT = [55 / (9 * 63)] * 11.9997 = 1.163992. The left-tail area is 0.663968. In terms of the expectations for the lots at the time of their introduction, their positions on day 11 are very problematic, but, considering where they actually were on day 10, their positions on day 11 are not significantly out of line.

In a similar fashion, an assessment of the movements of an entire product line can be obtained, using the transition matrices. For example, the distribution oflots on day 12 was

(6,1,6,2,4,1,1,0,0,4,7,4,1),

and on day 13 was (2,1,8,0,4,2,2,0,0,2,8,4,4).

Generating the 64 sample-based estimates for the lots on day 13 from the distribution on day 12 by premultiplying the 64 one-day transition matrices by the day-12 vector, and computing the variance-covariance matrix, the T2 is 85.7428 and the observed value of F(13, 51) is 5.339295. The left-tail probability is 0.999994. The throughput for the N84 product is unusually low; an increase in resource allocation to the line may be necessary.

Consider the flow on the N84 product line from day 12 to day 13. The margins of the matrix in Table 4 show the lot distributions on the two days, and the body of the matrix shows movements from day 12 to day 13. The associated flow transition matrix is shown in Table 5; the rows are computed as shown above. The total number of lots that visited each step during the one-day period, and the expected number of lots that should have visited each step, based on the sixty-four sample transition matrices, are shown at the bottom of Table 5. Lot movements are below expectation for steps early in the process, and meet or exceed expectation toward the end of processing. The observed F(12,52) is (52 / 12 * 63) * 185.06 = 12.729, yielding a left-tail probability essentially 1.0; the lot movements are significantly different from those which were expected.

5 CONCLUSIONS

In this paper we showed how a support system can be endowed with a form of experience by using multivariate statistical approach, when adequate historical data exist. The approach is based on the use of the T2 distribution. By tracking

192 CHAPTER 9

Table 4. Lot movements for N84 product between days 12 and 13

Day 13 6 7 8 9 10 11 12 13 14 15 16 17 21 Total

6 2 1 3 6 7 1 1 8 4 2 6

0 9 2 2 a 10 2 2 4 y 11 1

12 1 1 13 0 2 14 0

15 4 4 16 4 3 7 17 1 3 4 21 1 1

Total 2 8 0 4 2 2 0 0 2 8 4 4

Table 5. Flow visitation matrix for N84 between days 12 and 13

Day 13 6 7 8 9 10 11 12 13 14 15 16 17 21

6 4 3 7 1 8 2 2

0 9 2 a 10 4 2 Y 11 1

12 1 13 2 14

15 4 16 3 17 3 21

Total 0 4 4 2 4 4 3 2 2 2 4 3 3

Sa"l'le o 4.74.3 8.4 7.2 8.6 1.3 1.3 0.9 0.4 3.9 3.9 2.3 Avg.


the tail probabilities associated with the sample T2 values over time, in a way analogous to that used in statistical quality control, the system is able to detect significant variances in process performance, such as the development of bottlenecks, and to support its interpretation by reference to historical performance data.

Our methodology also allows the system to compare the relative desirability of two or more plan proposals, by projecting the impact of the plans on future states of the system, using a simulation, and then tracking the left-tail probabilities of the T2 statistics associated with those future projected floor states. The method does not exploit any structure unique to shop floors, and should be applicable to the analysis and control of other multi-state systems, where reliable operational data are available.

Acknowledgements

We thank the Westinghouse Commercial Nuclear Fuels Division, Specialty Metals Plant for their financial support of the MULTEX project. Many people at the plant have personally contributed their time. Terry Shepherd. has been particularly helpful to our team. We also thank Dr. Robert M. Cohen and two anonymous referees for their careful reading of the manuscript and for their useful comments on it.

REFERENCES

[1] Alt, F. B. (1985). "Multivariate Quality Control." In: Johnson, N. L. and S. Kotz, editors, Encyclopedia of Statistical Sciences, Volume 6, New York: John Wiley and Sons, 110-122.

[2] Chua, M.-K., and D. C. Montgomery (1991). "A Multivariate Quality Control Scheme." International Journal of Quality and Reliability Management, 8:6, 29-46.

[3] Hotelling, H. (1947). "Multivariate Quality Control." In: Eisenhardt, C., Hastay, M. W., and W. A. Wallace, editors, Techniques of Statistical Analysis, New York: McGraw-Hill, 111-184.

[4] Hunter, J. S. (1986). "The Exponentially Weighted Moving Average." Journal of Quality Technology, 18:4, 203-210.

194 CHAPTER 9

[5] Johnson, R. A. and D. W. Wichern (1988) Applied Multivariate Statistical Analysis, Englewood Cliffs, NJ: Prentice-Hall.

[6] Lowry, C. A., Woodall, W. H., Champ, C. W., and S. E. Rigdon, "A Multivariate Exponentially Weighted Moving Average Control Chart." Presented at the 149th Annual Meeting of the ASA, Washington, D.C., August 1989.

[7] May, J. H. and L. G. Vargas, "An Intelligent Assistant for Shod-Term Manufacturing Scheduling," Working Paper, AIM Laboratory, Katz Graduate School of Business, University of Pittsburgh, 1993.

[8] Montgomery, D. C. (1991). Introduction to Statistical Quality Control, Second Edition. New York: John Wiley and Sons.

[9] Moran, M. A. and B. J. Murphy (1979). "A Closer Look at Two Alternative Methods of Statistical Discrimination." Applied Statistics, 28:3, 223-232.

[10] Murphy, B. J. (1987). "Selecting Out of Control Variables with the T Multivariate Quality Control Procedure." The Statistician, 36:5, 571-583.

10 OPTIMAL SPARE PARTS ALLOCATION

AND INDUSTRIAL APPLICATIONS

ABSTRACT

Wolfgang Mergenthaler Sighert Felgenhauer*

Peter Hardie** Markus Groh

and Josef Lugger

Beratende Ingenieure Frankfurt Kiefernweg 1

65439 Florskeim Germany

0#' AEG Aktiengesellsckaft Goldsteinstrafle 238

60528 Frankfurt Germany

** Airbus Industrie-Airspares Weg beim Jager 150

P.O.Boz 630107, 22335 Hamburg

Germany

With limited component reliabilities the only way to increase and optimize the availability of technical systems is often through optimal configuration of spare parts inventories. Given limited budgets, this task invariably yields combinatorial optimization problems. This paper analyzes a system of identical plants in terms of steady state fleet utilization, plant availability and related hourly cost functions depending on the underlying spare parts inventory. Various optimization problems are then derived with these quantities serving as objective functions and constraints, respectively. An explicit convexity proof is given on a particular optimization problem. A steepest incline algorithm, implemented in a program named SPARE, is described. Two industrial applications - Aircraft Initial Provisioning and Optimum Fleet Utilization

195

196 CHAPTER 10

in the Electronics Industry - illustrate these optimization problems and show, how SPARE finds approximate solutions.

1 INTRODUCTION

Assessing the availability of technical systems has become an important design step in many industries in light of ethical, economic and environmental considerations and was made possible through progress in reliability theory as presented in Barlow and Proschan[21, Gaede[41 or Kohlas[51. Optimizing the availability is an obvious next step, as combinatorial optimization has become state of the art in industrial and applied mathematics, see for instance Littger[7), Marthello and Toth[81 or Papadimitriou and Steiglitz[91. Given limited component reliabilities, this can often be achieved only by holding an appropriate spare parts inventory. Recent research shows that performance criteria as expressed in terms of objective functions and constraints vary widely across applications and detailed modeling is required, see Dada[31 and Kostic and Pendic[61.

The current paper focuses on two kinds of industrial inventory optimization: servicing a single plant and servicing a fleet of identical plants. The inventory is held closely by the plant, or at a central hub, respectively. Certain performance criteria must be optimized, while the overall amount of capital tied up in the spare parts inventory must remain limited.

Section 2 contains the mathematical model, specifies plant availability and fleet utilization as the primary performance criteria, derives average downtime costs per hour and capital costs per hour as secondary performance criteria, defines spare parts budget as the primary constraint and shows that operating costs per hour - defined as the sum of average hourly downtime costs and capital costs per hour - have an optimum. Several optimization problems are then expressed and a steepest incline optimization algorithm is introduced.

Section 3 describes a computer program named SPARE that implements these procedures. In particular system architecture, interfaces, data requirements and reports are discussed.

Section 4 deals with the following two industrial applications:

• Optimal aircraft initial provisioning

• Optimal spare parts inventories for a fleet of electronic equipment.

Spare Parts Allocation 197

2 MODEL

This section lists assumptions and definitions, derives approximate expressions for the objective functions and constraints in terms of the spare parts inventory, lists optimization problems relevant to such inventories and describes a steepest incline optimization algorithm.

2.1 Assumptions and Definitions

Consider a fleet of N identical plants. A plant consists of n subplants of ~ components of type i E C for a set of types C =: {I, 2, 3 ... , n}. For each type of components spare parts must be stored. Assume the following:

• A plant is either intact or defective. It is a logical series structure in all of its components.

• A failing component will immediately be replaced by an identical one from the spare parts inventory, if available. If not, the plant will remain inoperative until such a component arrives. In any case an intact component will be ordered from a supplier.

• For each i E C let:

- Tni =: initial number of spare parts for type i

- Pi =: unit price of component of type i

- Ai =: failure rate of component of type i, assumed to be constant

- JJi =: return rate from shop repair of component of type i, assumed to be constant

- Xi,t =: random number of defective plants waiting for a component of type i, Vt E I R+ with XO,t =: number of intact plants

- Yo,t =: random number of type i components on stock, 'rIt E I R+

- ei =: unit vector in direction i, Ci E IN"

• Define the following vectors:

- m =: (ml' ... , m,.)T E IN"

- P =: (PlI ... ,p,.)T E IN"

- 1 =: (1, ... , l)T, 0 =: (0, ... , O)T

198 CHAPTER 10

• The spare parts budget resulting from an inventory m is defined as

- B(m) =: mTp

• For le =: (leo, lel , ... , k,.)T with leTt = N and r := (rl, ... , r".)T with rle = 0 define

- P{le, r, t} =: P{X, = le, Yc = r}, and

- Q(le,r) =: limt-+ooP{le,r,t}

2.2 Steady State Fleet Utilization and Plant

Availability

According to the complexity of the logistic processes separate approaches will be taken for the cases N = 1 and N > 1, respectively.

Fleet Size N > 1

Theorem 2.1:

Under equilibrium conditions for each m E INn there are numbers q1(m) E [0,1], ... , qn(m) E [0,1] such that

Q(le) -. E Q(le,r)

Proof Balance equations equating in and out rates of state (le, r) E I N 2n ,

which must hold under steady state conditions, state that (see van Dijk[10], chapter 3)

n

Q(le + eo, r) E«leo + 1)h.;A,(r,) + J.'i(ma - rill ,=1


where Ai(ri) = Ai, if ri = 0 and Ai(ri) = 0 else. Summing both sides over r ::; m yields

n n

(ko + 1) L Q(k + eo, r) L hiAi(ri) + L L Q(k + eo, ffii)J.£i(ffii - ffii) i=l

= L L Q(k + ei, r)J.£i(ffii + ki + 1) iEC.:O::;m

This yields, upon recognizing that the second term on the l.h.s. vanishes and due to Ai(rd ::; Ai, i E C

n

Q(k + eo)(ko + 1) L hiAiqi(m) = L Q(k + ei)J.£i(ffii + ki + 1) i=l iEC

where ~Aiqi(m), i E C, represent the "effective" out rate of state k + eo, due to failure of a component of type i E C. This equation corresponds to balance equation (3.29) for a closed queuing network in van Dijk[lO]. The traffic equations are the given by

Uo = Ul + ... + Un

hiAiqi(m) ~Aiqi(m) Ui = Uo =: uo, i E C

EjEc hj>'jqj(m) z(m)

Upon insertion into equation (3.32) in van Dijk[lO] one obtains

Q(k) = K Uo l"i"iqi m ffii· 10 0 + ... 10" 1 n (l.., ( ») ki I

z(m)ko+ ... k" ko! ;U J.£i (ffii + k;)!

for some constant K > O. The proof of (1) now follows observing that ko + ... kn = N, that Q(k) is defined on k E INn with kTl = N and using as a norming condition

L Q(k)=l kEIN",loT 1=N

o

An open question to be resolved is how to estimate the numbers qi(m),i E C, representing the likelihood of a failing component to hit an empty spare parts pool at the pertinent position.

Approximation 2.2:

200 CHAPTER 10

As a plausible approximation the steady state stockout probability of a spare parts inventory subject to a constant external demand rate of N~)..i and shop return rate /Ji with 7'ni parts initially is chosen. By solving the corresponding linear system of equations it is easy to show that then

Corollary 2.3

( )m.

Nh.·),· • 1

() ~ ",,! qi 7'ni = .

~ ..... (Nh. i ),;)' 1 i..J,=o II. "J!

Let fleet utilization u(m) be defined through

1 N u(m) = 1 - N L: M L: Q(k)

M=l 1cEIN",1cT l=N

u(m) is then given by

Proof (2) follows by definition and using (1). 0

Fleet Size N = 1

In this case steady state fleet utilization u(m) coincides with steady state plant availability A(m) and is given by the following result:

Corollary 2.4: 1

A(m) = ---.. -~( ~-~-~ .~)"'("'--;H-:-:':-;-) ~-_-.+-,-! 1 + Li=l ~'.". (!9)'-:h

.l...J,=1 1'... ,.

(3)

Proof The following balance equations can be seen to hold:


These equations are satisfied, as can be seen by inspection, through

(hi ).i)(m;+1l 1 "(h;).;)(m;-.;l 1

Q(ei,r)=K -;: (7ni+ 1)!.II. --;:- (m.-r.)! (4) ,=1,,#_' "

for some constant K > O. Summing over r ~ m and noting that

A(m) = Q(eo)

" K = Q(eo) + L Q(ei)

;=1 proves (3). 0

2.3 Optimization Models

Although the logistic processes described in subsection 2.2 may hold for a rather large class of spare parts inventory optimization situations, practical optimization problems vary considerably. For instance, one may have to maximize steady state availability of a single plant given a certain spare parts budget, or one may have to minimize the necessary spare parts budget given a required steady state plant availability. Also one may be interested in minimizing the sum of the average hourly downtime costs and the hourly capital costs. The same set of optimization problems applies to a fleet of identical plants with plant availability replaced by fleet utilization. This subsection formalizes the above optimization problems in developing objective functions and constraints.

Servicing a single plant

Examples are given by oil refineries, cement mills, automotive production plants etc. Let c and z be the hourly downtime costs of the plant and the hourly carrying costs per currency unit invested, respectively, for some c ~ 0 and z ~ O. c may be represented by the notional production loss per hour and z may be determined by such costs as interest, warehouse rent, obsolescence, insurance etc.

Definition 2.5:

202 CHAPTER 10

Let Kd(m) =: c(l- A(m))

K;(m) =: zB(m)

Kop(m) =: Kd(m) + K;(m) = c(l - A(m» + zB(m)

be the expected hourly downtime cost, the hourly carrying costs and the expected hourly operating costs of the plant, respectively.

The following optimization problems can now be stated:

Unconstrained minimum of operating costs:

(5)

Maximum availability given budget:

m· : A(m·) = max A(m), 0 < B(m) < Bo mEIN" - -

(6)

Minimum budget given availability:

m· : B(m·) = min B(m), 0 < A(m) < Ao < 1 mEIN" - --

(7)

Servicing a fleet of identical plants

This problem arises for instance in airline operations with plants represented by planes. In reality the situation is more difficult than modeled in this paper, one of the reasons being the fact that individual planes in a fleet are hardly ever identically configured.

Definition 2.6:

Let KJd(m) =: Nc(l- u(m))

KJop(m) =: KJd(m) + K;(m) = Nc(l- u(m» + zB(m) (8)

be the average hourly fleet downtime costs and :fleet operating costs.

The following optimization problems can now be identified and make up the so called Initial Provisioning problem in airline operations:


Unconstrained minimization of fleet operating cost:

(9)

Maximum fleet utilization given budget:

mO : u(mO) = m~~" u(m), 0::; B(m) ::; Bo (10)

Minimum budget given fleet utilization

mO : B(mO) = min B(m),O < u(m) < Uo < 1 mEIN" - --

(11)

2.4 Optimization procedures

All the optimization problems outlined in subsection 2.3 are combinatorial in nature and may be classified as generalized knapsack problems. With the exception of the budget function B(m) the various objective functions are not linear in T/'l.i, i E C, but still are monotonous.

The optimization algorithm considered below and implemented in a spare parts optimization program named SPARE as described in section 3, is a steepest incline algorithm and follows an iterative search path for the local optimum. Only if this is also a global optimum, the optimization problem is solved. A well known result on convex optimization (see theorem 1.2 in Papadimitriou and SteigIitz[9j ) shows that if the objective function is convex, then any local optimum is also the global optimum.

Local versus Global Optima

Lemma 2.7:

If

( h~i) ::; 1, i E C then optimization problem (8) is convex.

Proof Convexity of optimization problem (6) is equivalent to convexity of the objective function A(m), as the constraint is linear. Convexity of A{m) can be

204 CHAPTER 10

proven by noting that

1 2 1 -:-;----;- + -- > 0 A(m + ei) A(m)-A(m+ 2ed

through inspecting the denominator in (4). Therefore A(m) is convex in each of its components and therefore is convex. 0

Steepest incline algorithm

The steepest incline algorithm is described here for optimisation problem (6). Optimisation problems (5) and (7) are derived from this procedure by replacing the stopping criteria by their obvious analogs. Optimisation problems (9), (10) and (11) are then obtained by replacing the steady state plant availability A(m) by steady state fleet utilisation u(m).

Algorithm 2.8:

The solution to optimisation problem (6) is approximated through the following sequence of points m l E I Nfl, Z = 0, 1,2,3, ... :

rnO =: 0 (12)

ml+1 =: ml + ei"(I+l), Z = 0,1,2,3, . . . (13)

A(ml + ei'(I+1» - A(ml) A(ml + ei) - A(ml) -':'--.~';";"';;'~-"":"""":' = max ---'----,-....:..,.,.-...:....-..;..

A(ml)Pi'(I+l) iEC A(ml)Pi If, for some Z· ~ 0

then m· =: ml "

Corollary 2.9:

Let Z· be the number of iterations through sequence (12, 13) to reach the optimal point. Let

PmifL =: min Pi > 0, Pmll'" =: max "" iEC iEC r·

Then

(14)


Proof The largest value of l· results, if all the spare parts put into the knapsack during the steepest incline algorithm have minimum unit price and if their total is slightly below Bo + Pmin, i.e. if I·Pmin < Bo + Pmin and Il·Pmin - (Bo + Pmin) I :::::: O. This yields the upper bound in (14). The smallest value of l" results, if all the spare parts have maximum unit price, and iftheir total exactly equals B o, i.e. if I·Pm/l,. = Bo. This yields the lower bound in (14). 0

3 SPARE-AN IMPLEMENTATION

SPARE implements the optimization algorithm outlined in section 2 and applies it to the optimization problems described. SPARE is written in the Microsoft Visual C++ programming language and runs on IBM-compatible computers under Microsoft Windows 3.11.

3.1 System architecture

Figure 3.1 presents an overview on SPARE's system architecture.

Project Structure

The main files belonging to a particular project XYZ and their meanings are listed below:

• XYZ.DAT, component parameter input file

• XYZ.OUT, optimum spare parts output file

• XYZ.LOG, performance criteria output file

A temporary file is generated for the purpose of graphical visualization of some of the performance criteria as functions of the spare parts budget.

1 Microsoft Windows 3.1 © and Microsoft Visual C++ © are registered trademarks of Microsoft Corporation, Redmont, WA, USA.

206

Interactive input data

0-Optimum spare parts numbers

CHAPTER 10

Component parameters

Spare parts optimizer SPARE § ;. I getD!!t!! Ijr----- XYZ.DAT

I

1 readData I'

[~cs calculate - availability

-- r--~~------~ optimize - availability - fleet utilizatiDn - fleet utilization Performance - budget - budget criteria

§,-I~ ____ """""'_I_lr_e_po_rt~_J_I ___ -_ -_---:.:....If~:J Figure 3.1: SPARE.- System Architecture


Operating Modes

In operating mode optimize SPARE allows the user to generate an optimum spare parts inventory m* according to the optimization problems (5), (6) and (7) or (9), (10) and (11), respectively, thereby computing the performance criteria

and {u(~·),l1(~·),l(f~(~·),l(i(~·),l(fop(~·)},

respectively. In operating mode calculate SPARE allows the user to compute the pedormance criteria

and {u(~),l1(~),l(f~(~),l(i(~),l(fop(~)},

respectively, for a given manual recommendation ~.

User interface

Figure 3.2 represents a screen copy of the user interface with the pull down menu ACTIONS activated.

The user interlace mainly enables the user to

• select projects determined by project path (e.g. "C:\SPARE") and project name (e.g. "XYZ") via pull down menu FILE,

• select operating mode optimize or calculate via pull down menu ACTIONS and

• graphically visualize availability, fleet utilization and (fleet) operating costs versus budget via pull down menu RESULTS.

Also the interactive input data as described below are requested by the user interface through dialog boxes.

3.2 Input data

SPARE requires the following two types of input data:

208 CHAPTER 10

Figure 3.2: SPARE - User Interface


Component parameters

The component parameter input file XYZ.DAT is an ASCII-formatted file and contains a record for each component type i E C.

Each record essentially is made up of fields according to table 3.1:

Field 1 2 3 4 5 6 7

Value Part number Part description Frequency per plant = ~ Unit price = Pi Expected lead time = :. Mean time between failures = t Manual recommendation in operating mode calculate = T1I.i

Table 3.1: Structure of component parameter input file

Interactive input data

These data represent information specific to the operating condition of the plant and are requested through dialog boxes as part of the program's user interface. These data include

• Daily operating hours

• Houdy downtime cost c

• Interest rate z

• Given availability Ao, if applicable

• Given budget Bo, if applicable

3.3 Output data

In operating mode optimize SPARE computes the optimum spare parts vector m·, replaces field 7 in the component parameter file XYZ.DAT by the elements

210 CHAPTER 10

of this vector and stores the result under the name XYZ.OUT. In addition SPARE stores the optimum performance criteria as listed in paragraph 3.1.2 in a log file named XYZ.LOG. In operating mode calculate SPARE simply computes the performance criteria and stores the result in the log file XYZ.LOG.

Furthermore, the sequences

{A(ml ), B(ml)}, {'U(ml), B(ml)}, {Kop(ml), B(ml)}

and {KJop(ml), B(ml)}, 1= 0, ... , I·

are stored in a temporary file to generate the corresponding graphical visualizations.

4 INDUSTRIAL APPLICATIONS

Applications of spare parts optimization problems are scattered widely throughout industry. SPARE is currently being applied to

• Aircraft Initial Provisioning

• Optimizing utilization of a fleet of electronic equipment.

In both cases the main goal is to maximize efficiency of a set of spare parts. In general, average hourly downtime costs for a single airplane or for a fleet of identical electronic equipment is a decreasing function of the overall spare parts budget while hourly carrying costs increase almost linearly, thus ensuring the existence of a global minimum in the sum of these costs, see figure 4.1.

4.1 Aircraft Initial Provisioning

An airline usually buys a set of spare parts to be recommended by the manufacturer along with every new airplane. The recommendation procedure and its results are both referred to as Initial Provisioning.

Current recommendation procedure

Conventionally such sets were determined by first classifying spare parts into categories such as rotables (i.e. parts to be returned repeatedly to serviceable


Hourly operating costs

Hourly / downtime cos~

Spare parts budget

Figure 4.1: Spare parts costs versus budget

212 CHAPTER 10

state and whose lifetimes are expected to be as long as the life of the airplane), repairables (i.e. parts to be returned to serviceable state a limited number of times) and expendables (i.e. parts totally scrapped on failure). For each category recommendations were given based on component parameters such as TAT, QPA and MTBUR (Turn Around Time, Quantity Per Aircraft, Mean Time Between Unscheduled Removal), see AIRBUS INDUSTRIE[l].

The rot abies category can be optimized using SPARE. In addition to traditional operating parameters such as average flight cycle time etc., fleet size can now be talten into account, thereby revealing a substantial economy of scale effect.

Mazimum availability given budget

Here an optimization example is given, where for some notional hourly downtime costs and interest rate the maximum availability is needed with spare parts budget given. Figure 4.2 presents a screen copy of the graphical representation of availability versus budget.

Table 4.1 displays the contents of the log file:

Optimization results

Inventory value at optimum inventory level in US-$ = 19520840.00 System Availability at optimum inventory level 0.999437 Component Availability at optimum inventory level = 0.999999 Hourly downtime cost at optimum inventory level in US-$ = 1.69 Hourly inventory cost at optimum inventory level in US-$ = 222.84 Hourly operating cost at optimum inventory level in US-$ = 224.53

In interpreting the performance criteria it must be noted that the actual input data do not fully reflect an aircraft's bill of materials and are selected for demonstration purposes only. Figure 4.3 shows the optimization results for some of the spare parts numbers.

Spare Parts Allocation

== Spare - Viewing Plot Avail-Budget au file J;,dit Bction Besulls Help

Ay~y

1.00 •

0.90 0.80

0.70 0.&0 0.50 0.40 0.30 0.20 0.10 BudQct

2.00 4.00 6.00 aoo 10.00 12.00 14.00 16.00 1800 20_00 '106

Figure 4.2: SPARE - Availability versus budget

213

214 CHAPTER 10

= Spare ' Viewing D:\DEMO _ANl\DEMO.OUT aa filc (dit Action Bcsults Help

3 1 DOS 000 000 045 1 0005436 5 27180 0. 00000 ;. 2 :I 005000 000 045 1 oooqno 2 2 640 0.00001) 2 3 1)05000 000 045 1 0001098 2 2 2196 0.00001) 1 3 005000 000 045 1 0000826 1 1 826 0.00000 1 :I 005000 000 045 1 0000213 2 2 426 0.00000 1 8 005000 000 045 1 0000772 1 1 772 0.00000 :I 3 181800 000 045 1 0000706 1 1 706 0.00000 8 3 OliO 1 1 1 797 11.11111100 2 3 000 1 1 1 3096 11.000011 1 3 1 1 1 934 0.00000 1 1 2 2 6811 0.00000 2 2 1708 0.00000 1 1 492 0.00000 1 1 569 0.00000

Figure 4.3: SPARE· Optimum spare parts numbers


4.2 Optimizing the utilization of a fleet of electronic

equipment

215

System service and support in the electronics industry largely depend on the spare parts inventory held by the vendor or maintenance contractor. AEG Aktiengesellschaft seeks to optimize its customer support by streamlining and optimally organizing its spare parts logistics, among others.

Current inventory strategies

Spare parts support within AEG Aktiengesellschaft is currently organized into a two-echelon system as follows:

• A central spare parts depot holds approximately 32000 items.

• Twenty-four warehouses distributed over Germany hold those items demanded regularly on a local scope and service disjoint fleets of systems. These warehouses supply themselves through the central depot.

• A communication network connects the warehouses and the central depot and Iuns a computer based information system ISV (!ntegrierte §tiitzpunkt-,Yerwaltung).

• Inventories have traditionally been managed based on forecasts.

Ma'Cimum fleet utilization given budget

SPARE now helps AEG Aktiengesellschaft to maximize fleet utilization for a certain type of electronic equipment. Statistical component data such as failure rates etc. are provided through either dedicated computational procedures, through vendor information or by means of evaluating field consumption data. Lead times are estimated based on experience. Figure 4.4 displays a screen copy of the graphical representation of fleet utilization versus spare parts budget for a fleet size of 10.

Table 4.2 presents the contents of the log file.

216 CHAPTER 10

1:1 Spare - Viewing Plot Number Of Planes-Budget aa [jle fdit Action Results Help

Fleet Utilization

S.!ll

8.91

7.92 6.93

5.94 4.95 3.96

2.97 1.98

0.99 Budget

+--9-90 ..... -0 -'-980..---0-29 .... 70-.0-396O.....,.--.0-4-95O..-.0-594-r-0'0-6-93 .... 0'-0-7-92O..-.0-89....,'r-0.-0 -9900~.0 • 103

Figure 4.4: SPARE - Aeet utilization versus budget


Parameters

Project Operating hours per year Hourly downtime cost in US-$ Interest rate in percent Optimize Average number of working plants given B Spare parts budget in US-$ Fleet Size

Optimization results

Number of spare parts allocated Inventory value at optimum inventory level in US-$ Average number of working plants Fleet Utilization

DEMO 3000.00 3000.00 10.00

10000000.00 = 10

= 928 10096729.00 9.91 0.9907

217

Figure 4.5 contains the optimization results on some of the spare parts numbers.

REFERENCES

[1] Airbus Industrie, 1990. Spares Support Guide (SSG), Hamburg.

[2] R.E. Barlow and F. Proschan, 1975. Statistical theory of reliability and life testing-Probability models, Wiley, New York.

[3] M. Dada, 1992. A two-echelon inventory system with priority shipments, Management Science 3:8, 1140-1153.

[4] K.-W. Gaede, 1977. Zuverliissigkeit-Mathematische Modelle, Hanser, Miinchhe-Wien.

[5] J. Kohlas, 1987. Zuverliissigkeit und Verfiigbarkeit, Teubner, Stuttgart.

[6] S. Kostic and Z. Pendic, 1990. Optimization of spare parts in a multilevel maintenance system, Eng. Costs and Prod. Econ., 20:1, 93-99

[7] K. Littger, 1992. Optimierung - Eine Einfiihrung in rechnergestiitzte Methoden und Anwendungen, Springer, Berlin-Heidelberg.

218 CHAPTER 10

III Spare - Viewing D;\BIF\SPARE\AIR\COA.OUT aa Eile ( dit Action Besults /:ielp

1 005000 000 045 1 0005436 5 5 27180 0.00000 3 005000 000 045 1 0000320 1 1 320 0 . 00000 3 005060 660 045 1 0001098 1 1 1098 0.00000 3 005000 000 045 1 0llOll826 1 1 826 0.00000 3 005060 000 045 1 0006213 1 1 213 0.00000 3 005060 000 045 1 000lt772 1 1 772 0.00000 3 181800 000 045 1 0000706 1 1 706 0.00000 3 181800 000 045 1 0000797 1 1 797 0.00000 3 181800 000 045 1 0003096 0 0 o 0.00000 :1 100000 060 645 1 01l01l934 1 1 934 0.1l0000 1 222220 000 045 1 01100340 0 1 340 0 . 00000 1 425000 010 044 1 0000854 1 1 854 0.00000 3 222220 060 045 1 0600492 0 1 492 0.00000 3 222220 000 OJI5 1 0000569 0 1 569 0.00000 2 030000 999 056 1 01l0()414 3 3 1242 0.00000

Figure 4.5: SPARE · Optimum spare parts numbers


[8] S. Marthello and P.Toth, 1990. Knapsack Problems-Algorithms and Computer Implementations, Wiley, Chichester.

[9] C. H. Papadimitriou and K. Steiglitz, 1982. Combinatorial Optimization: Algorithms and Complexity, Prentice Hall, Englewood Cliffs.

[10] N.M. van Dijk, 1993. Queuing Networks and Product Forms-A Systems Approach, Wiley, Chichester.

11 A C++ CLASS LIBRARY

FOR MATHEMATICAL PROGRAMMING

ABSTRACT

Soren S. Nielsen

Management Science and Information Sylteml Unil1erlity of Tezal A'Ultin, TX 7871~

We present a library of C++ classes for writing mathematical optimization models in C++. The library defines classes to represent the variables and constraints of models, and also defines overloaded operators on these classes which results in a natural syntax for model definition.

The system requires programming in C++ for its use, and is hence most suitable for the advanced programmer/modeler. For such a user, however, it provides some advantages over standard modeling systems. First, the system preserves all the advantages of working with a programming language, such as efficiency, flexibility and openneBB. Second, C++ allows users to extend and specialize existing data types. As an example of this, we show how a user could define a specialized network model type with nodes and arcs.

Efficient data structures for storing and manipulating sparse arrays are introduced, the concept of variable aliasing is discuBBed, and a number of related, future research topics are presented.

1 INTRODUCTION

Mathematical programming systems, for instance GAMS [1992] or AMPL [1993], facilitate the formulation and solution of mathematical optimization models by providing high-level modeling abstractions such as variables and equations, and languages for efficiently manipulating and combining these objects, and by providing automatic interaction with optimization software. How-

221

222 CHAPTER 11

ever, formulating and solving a mathematical model is often only a part of the solution process. In realistic applications an optimization system is (or should be) an integrated part of a larger decision support system, since data typically need to be processed by other parts of the system before and after the optimization step: Storing and retrieving data from data bases, reacting to live data feeds, presenting results graphically and interacting with other "black box" components. When models are used operationally, the process of data collection, model solution and result processing needs to be automated, and cannot be carried out entirely within the modeling system. Although modeling languages are developing in response to these issues, and increasingly are beginning to incorporate programming languages constructs, they are generally difficult to integrate, and modelers are still often forced to escape the modeling system in favor of languages such as C or FORTRAN.

At the same time as modeling languages are developing, so are programming languages. A modern, object-oriented language such as C++ (Ellis and Stroustrup [1990]). gives the programmer the full flexibility and efficiency of any programming language, but at the same time allows the definition of "abstract data types", through classes, which can be used to represent high-level objects. The idea naturally arises to define modeling abstractions in the programming language, and hence taylor the programming language towards modeling applications, rather than approaching the capabilities of programming languages by extending the modeling systems.

We present here a library of C++ classes which defines and implements modeling abstractions. By suitably redefining (overloading) standard C++ operators, a syntax very similar to that of GAMS or AMPL can be used in the definition of expressions and constraints. Our aim is to show that the flexibility and capabilities of a modeling system can be approximated using programming language constructs, thus combining the notational convenience of modeling abstractions offered by modeling systems with the openness, efficiency and flexibility of a programming language. To this end, we present several examples of the use of the system. We assume that the reader is somewhat familiar with C, but assume no prior knowledge of C++.

While the C programming language, Kernigan and Ritchie [1978], is becoming widely used to write optimization software (e.g., CPLEX, Bixby [1992], LOQO, Vanderbei and Carpenter [1993]), C++ is not widely used in the OR community. Birchenhall [1992] uses C++ to define a matrix library for econometrics. This library could be very useful as a supplement to, or integrated with, the present class library, since it incorporates routines for matrix decomposition and for solving systems ofequat.ions. However, in contrast with the mathemat-

A C++ Class Library 223

ical programming library, it applies dense data structures throughout, and is consequently less suited for large-scale applications.

The paper is organized as follows. We first give a small example LP model. We then discuss aspects of the C++ implementation of the class library, and discuss advantages and disadvantages of using C++ for modeling. Section 4 discusses the addition of vectors and associated operations to the library to allow an efficient, algebraic notation for vector operations. Section 5 introduces the concept of aliasing, and in Section 6 we give some future research topics.

2 A SMALL EXAMPLE MODEL

In order to introduce our ideas, we show here a small LP model formulated in GAMS, Figure 1. We also show in Figure 2 how this model can be formulated in C++, using the mathematical programming (MP) library. The model is from AMPL [1993], where its formulation is found on page 5.

The two representations declare and define model components in the same order. The C++ model begins by including the MP library definitions. Then the model variables, constraints and the model itself are declared. These data types (KP _variable, MP _constraint and KP ..model) are defined as classes in the MP library. We next define the constraints of the model. The statement

Time =

illustrates an example of operator overloading: Here, the <= operator is defined to accept two parameters (oftype MP_expression), and return an object oftype MP _constraint, which is then assigned to the object Time. The right-hand side of the <= operator is the constant 40.0, while the left-hand side is constructed by the overloaded + and * operators, which take operands of numeric type or type KP_variable and return objects of type KP_expression. The effect of executing this statement is to build a tree-structure which represents the constraint, and store a reference to the tree structure in Time. Note that the standard uses of <=. + or * on numerical data are not lost. C++ is a strongly typed language, and the precise interpretation of operators is determined at compile-time by the types of the operands.

After defining the remaining constraints, it only remains to define which constraints are part of the model. This is done by calling the subj ect-to function

224 CHAPTER 11

* Small LP example. Ref: AIIPL [1993, p. 5] variable profitj positive variables X_B, X_Cj equations Time, B_limit, C_limit, prof_defj

Time B_limit C_limit prof_def

(1/200)*X_B + (1/140)*X_C =L= 40j X_B =L= 6000j X_C =L= 4000j

profit =e= 25*X_B + 30*X_Cj

model m ITime, B_limit, C_limit, prof_def/j solve m maximizing profit using lpj

Figure 1 A.mall LP model in GAMS

II C++ LP model in 2 variables. Ref: AMPL [1993, p. 6]. #include "liP _library .h" void mainO {

liP_variable liP_constraint

Time = B_limit = C_limit =

liP_model modelj

X_B, X_Cj Time, B_limit, C_limitj

(1/200.0)*X_B + (1/140.0)*X_C <= 40j X_B <= 6000j X_C <= 4000j

model.subject_to( Time, B_limit, C_limit )j model.objective = 25*X_B + 30*X_Cj

model.maximize()j }

Figure:l The same model in C++ uaing the Mathematical Programming c:lasalibrary.


ofthe model object with a list of constraint objects. We then define the model's objective, which is another object of type MP _expression, and ask for it to be solved and the objective function maximized. The system is presently connected to only one LP solver (LOQO, Vanderbei and Carpenter [1993]), but the interface is sufficiently general that new solvers can be added quite easily, as long as they can be called as subroutines from the C++ system.

3 STRUCTURE AND USE OF THE CLASS

LmRARY

The C++ programming language grew out of the popular C language, under the iniluence of languages like Simula and Smalltalk. While C++ contains C as a proper subset, there is a distinctly different ilavor to the language. The primary difference between the two languages is that C++ has classes. Classes are used to define and implement new data types and the operations allowed on them. An often-used example is thai of defining a complex-number type as a class containing two iloating point numbers (for the real and imaginary part), and then defining the arithmetic operators (+, -, *, /) and various functions (sin, cos, log) on them, as well as input and output functions and conversion functions from real numbers. Once such a class is defined, complex numbers can be used in a program just like the built-in arithmetic data types. In this sense C++ is an extensible language. Although C++ is not a declarative language, programming in C++ naturally focuses attention on the declaration of suitable data types and their associated behavior and operations (i.e., classes and their interfaces), and only secondarily on their implementation, whereas in C one tends to focus more on the imperative language constructs, i.e., executable statements. Developing class libraries in C++ can be considerably more complex than programming in many other languages, but useful paradigms and idioms are developing to facilitate this; see, e.g., Coplien [1992]. On the other hand, ulling a well-designed class library can be quite simple, as our examples indicate.

3.1 The MP Classes

To represent a VlJrilJ~le in a mathematical model we define an MP -variable class. Objects, or variables, of type MP_variable have attributes that define the type of the variable (continuous or discrete), and iloating point fields level, dual, upper and lower, which represent the level and dual values, and

226 CHAPTER 11

bounds of the variable. By default, the level and dual values are 0, and the bounds are 0 and "Infinity", respectively, but these can be reset. For instance, x.set_upper(200) j sets the upper bound of the variable x to 200. There are also functions to fix a variable, temporarily changing its bounds to a common value, and to "unfix" it. MP_variables can be aliased with each other, which allows multiple variable names to refer to the same actual MP _variable object; examples are given in Section 5.

An MP-variable is a special case of an object of class MP_expression. Basic arithmetic operators are defined on objects of type MP_expression, such as + and *. These operators again return an object of class MP_expression, which is a tree-structure representing the operation and its operands. An MP _expression can also take the form of a numerical constant, so that numbers can be mixed with variables in expressions.

MP _expressions are used to specify objective values and constraints. A constraint is represented by an object of class MP _constraint. The relational operators <=, =:: and >:: are defined to take two objects of type MP _expression and return an MP _constraint. Examples of this were seen in Figure 2.

Finally, a complete optimization model consisting of an objective function and a constraint set is represented by the class MP ..model. The primary operations on an MP ..model are inclusion of constraints and an objective function, and the functions maximize(), minimizeO and solve(). The solveO function does not use any objective expression, but just finds any feasible point. These functions· interface with standard LP or NLP solvers, although only an LP solver is presently available.

3.2 Modeling in C++ The inclusion of these data types into a C++ program can be used to approximate some of the facilities for modeling offered by dedicated modeling languages. Having to work within C++ carries some costs with it. For example, it is significantly harder to learn C++ than, for instance, GAMS or AMPL, and while a modeling system can give reasonable error messages when mistakes are made-because it has some built-in knowledge about the semantics of modeling-the error messages from C++ compilers are usually quite cryptic. Run-time errors such as divisions with zero, which modeling systems catch gracefully, typically aborts the C++ run with hardly any explanation.


Hence, the MP library is not suitable for the casual modeler who needs to solve only a few, relatively simple models.

However, when the need arises for solving complex models, especially with structures which cannot easily be expressed algebraically, or whose solution requires interaction with the operating environment, modeling within C++ brings a number of advantages. First, the programmer has all the flexibility of a programming language, including the ability to write subroutines to manipulate models or model components (variables etc.) or data, to interface with the operating system, for instance to read and write data or present graphics, or to call external "black-box" subroutines. Second, although every run of a model involves compilation and linking, which may be slow compared to the speed with which a modeling system processes a model file, the resulting executable program may run at significantly higher speeds, especially if the modeling system uses interpreted code. This can make a significant difference in cases where the same model is executed repeatedly on different problem instances, e.g., as part of an operational system, or where the data processing within the model (or model setup) is computationally intensive.

The use of algebraic modeling languages tends to favor a concise and compact model representation, where the model formulation can (and should) be independent of the actual problem instance to be solved, such as the number of variables and constraints, and the actual data values. The model formulation is also independent of which solver will ultimately solve the problem.

These highly desirable properties of a modeling system, as summarized by Fourer [1983], are presently only partly supported by the C++ class library, and to some extend are a matter of programming style which must be applied consciously. Independence between the formal model formulation-algebraic or not-and a specific problem instance's dimension and data can in simple cases be achieved in the C++ system by keeping input routines and model specification separate, and using dynamic arrays, as illustrated in Figure 4. In the general case, however, we recognize that much more powerful concepts and methods are needed. For example, it would be desirable to have all data instances, variables etc. indexed by sets, the size and structure of which could in turn be determined at run-time, for each problem instance. There is no concept of set in C++ and all arrays are indexed, using integers, from 0 to some upper bound. The design of flexible classes to represent sets, preferably sparse and of multiple dimensions, which could then be used to index multidimensional arrays of model components, is under current investigation. At a minimum, it should be possible to use sparse integer sets for indexing, at least in one dimension,

228 CHAPTER 11

and to use vector operations such as inner products to approximate a natural, algebraic notation. Steps towards this goal will be presented in Section 4.

Independence between model specification and solver, on the other hand, presents no serious problem. As illustrated in our examples, the solution of a model requires the call of one of the optimization functions. These can in turn interface with any available solver suitable for the problem at hand. The solver can either be external (as in GAMS), in which case a problem representation needs to be written to some common area of memory or the file system, and a system call be issued to invoke the solver, or it can be a subroutine which is linked into the system. The latter method was used to connect the class library to the LOQO solver by Vanderbei and Carpenter [1993], for LPs, and to a specialized zero-one minimax solver, which is currently being developed for the purpose of solving mixed-binary linear programs through a generic C++ implementation of Benders decomposition (See next Section).

3.3 Extending and Specializing the Library

One of the principal advantages of C++ over standard modeling languages is that it allows the use of existing classes and objects as building blocks to create more general or specialized structures. We illustrate this flexibility with two examples: First, we show how the MP library classes can be specialized to create an environment for the specification and solution of network models. Second, we illustrate how the capabilities of the library can be extended by outlining how a general implementation of a decomposition algorithm can be written in a problem-independent way, using the build-in solvers as building blocks.

The first example shows the specialization of the MP classes to network modeling. The customization code is shown in Figure 3. We define classes to represent nodes and arcs of the network using the existing constraint and variable classes, respectively. The class Network-node is defined as a sub-class of the existing class HP_constraint, another class NetworkArc as a sub-class of HP -variable, and finally a class Network..model as a sub-class of HP ..I1Iodel. New classes defined in this way inherit all the properties of their parent classes, and can be used wherever the parent classes can. For instance, Network..nodes are (specializations of) constraints, and have constraint types and duals associated with them, Network_arcs are variables which can be used in flow conservation constraints of arcs, and a Network..model automatically has an

A C++ Class Library

II Defining network data types (nodes, arcs) II using existing classes.

#include "MP_library.h"

class Network_node: public MP_constraint { public:

}j

Network_node 0 II Constraints are equality constraints by default: { constraint_type = eqj }j

class Network_arc: public MP_variable { public:

229

void connects(Network_nodet from_node, Network_nodet to_node) {

}j

};

II Add this arc to balance constraints of from- and to-node. II The expression '*this' refers to the arc object for which II the 'connects' function was called. from_node.rhs += *thisj to_node.lhs += *thisj

class Network_model: public MP_model { }j

Figure a netvork . h: Defining cues to represent the nodes and ercs of network models.

230 CHAPTER 11

II Define and solve a generic network model. #include "network.h"

void maine) { int i, num_nodes, num_arcs, *from_node, *to_node;

double *supply, *cost;

}

read_network(num_nodes, num_arcs, from_node, to_node, supply, cost);

array<Network_node> array<Network_arc> Network_model net;

node(num_nodes); arc(num_arcs);

II Define network topology: for (i = 1; i<= num_arcs; i++)

arc[i].connects( node[from_node[i]], node[to_node[i]] );

II Add supplies to right-hand sides of nodes II (demands are negative): for (i = 1; i<= num_nodes; i++) node[i].rhs += supply[i];

II Set up the objective function: for (i = 1; i<= num_arcs; i++)

net.objective += cost[i]*arc[i];

II The model consists of all the nodes: for (i = 1; i<= num_nodes; i++) net.subject_to( node[i] );

II Finally, solve the model: net.minimizeO;

Figure 4 Example of the use ofthe simple network classes to express a generic network model in terms of nodes and arcs. The routine read-lletvork initializes its parameters with the network lize, topology and other data; its definition is not shown. The declarations array< ... > declare dynamic array. of the type specified between the brackets. Note that this specification i. completely independent of the specific problem instance to be solved, and of the solver UBed.


objective function and the functions maximize 0, minimize 0 and solve (), and accepts Network..nodes as constraints.

The definition of Network..node states that the default constraint type is equality. A Network_arc has a function connects which is called to specify the nodes to which the arc is incident, and which modifies the balance constraints of the two nodes to take the arc's flow into account. This is done by adding the arc variable to the constraints' right-hand and left-hand sides, rhs and lhs. The class Network.model in this example adds nothing to the definition of MP .model, but could differ, for instance in having a network solver as the default solver.

As a result of these definitions, the user is now able to think in terms of nodes and arcs. The network topology needs to be defined to the system, but the fact that network models implicitly have balance constraints for each node is captured once and for all in the definition of the lIetwork..node class, so no (explicit) constraints are needed.

For completeness, an example of the use of these network classes is given in Figure 4, which implements a generic model for pure network problems. While this definition of network modeling concepts may be too simplistic for realistic use, it illustrates the point that C++ allows extensive customization and reuse of existing class libraries to meet specific users' needs. We mention that the language AMPL also has specialized facilities for handling network models, but they are an integral part of the language, not a user-defined extension.

Our second example shows how one can use subroutines in C++ to encapsulate and generalize common operations, just as in any other programming language. We will outline how one could write a routine to implement a decomposition algorithm in a quite general fashion. The user should be able to set up a model-for instance, a mixed-integer model-and then call this decomposition routine with the model as a parameter to have it solved. This should be contrasted with, for instance, GAMS. In order to solve a model by decomposition in GAMS, the modeler needs to specify the complete algorithm for his/her specific problem, by setting up master and subproblem definitions, and implement an explicit loop which updates these problems appropriately, while checking for convergence. Until the latest version of GAMS, 2.25, this was further complicated by the lack of a suitable looping facility. Usually, the structure of the underlying model is obscured by this rewriting process, and if it changes, only an expert (usually the original modeler) can change the model. The problem clearly is the difficulty in separating the model which is solved, from the algorithm-as expressed within the modeling system-for solving it.

232 CHAPTER 11

#include "mathprog.h"

II Define Benders Decomposition Routine: void Benders(MP_model& model) {

}

}

II Informal outline of algorithm: MP_model Master, Sub;

II Here we know nothing about the size or structure II of 'model'. Get that information by looping through II the constraints.

for current_constraint = each constraint in model do { if (only integer variables in current_constraint)

Master.subject_to( current_constraint );

}

II Add to master else

Sub.subject_to( current_constraint ); II Add to subproblem

while (not converged) { Master.minimize(); II Uses binary min-max routine

Sub at solution to Master; Fix integer variables in II Uses LP solver Sub. minimize 0 ;

Master.subject_to( new_cut );

void maine) {

}

MP_variable MP_binary_variable MP_constraint ... ; MP_model my_model;

II Set up model as usual, then instead of II my_model.minimize(), do: Benders( my_model );

Figure 5 illustration of the capability in C++ to write general routines which work with class objects as parameters. This example outlines the definition and use of a general routine for solving mixed-integer models by decomposition.


We outline in Figure 5 how a general (i.e., model-independent) implementation of Benders decomposition can be written. The Figure shows the clear separation between the model specification in the main program, and the application of the decomposition algorithm to solve it. Hence, the decomposition routine is immune to changes in the model, and vice-versa. The decomposition routine could easily be equipped with optional parameters to indicate which master and subproblem solvers the user would like to use. A recently developed implementation of Benders decomposition within the C++ system, for use in solving mixed-binary LPs, uses the LOQO solver to solve the LP subproblems, and a specialized binary min-max solver to solve the master problems. The actual implementation, which is documented in a forthcoming working paper, is too long for this paper, but follows the above outline closely.

4 ALGEBRAIC NOTATION AND SPARSE ARRAYS

One of the primary advantages of modeling languages is the use of algebraic notation, which provides a very compact way to express operations on vectors and matrices, such as multiplication or addition. In order to introduce such convenient notations in C++ we first need to define a suitable notion of vectors and matrices, with elements consisting of numbers, variables, etc. While C++ allows the definition of arraysl of any type, there are no operators associated with them (except indexation), and it is not possible to define operators on standard arrays in C++. Instead, it is necessary to define classes to implement vector objects. Since these classes ideally should allow the definition of vectors of arbitrary type, they are naturally defined as template classes, i.e., classes which take type parameters (see, e.g., Ellis and Stroustrup [1990]). We describe here how vectors are implemented in the C++ class library, and the simple operations we have defined, and how this leads to a convenient notation for simple vector expressions, not unlike that offered by some modeling languages. These classes and operators could then be used as building blocks for defining matrix objects, but for this paper we have limited ourselves to vectors.

A vector, as an abstract object, is a way to store objects by integer indices, so they can later be retrieved. The simplest implementation of a vector uses fixed bounds, and allocates space for each element with index within the bounds. For use with mathematical programs, such dense data structures are insufficient: There are many cases where only very few of the potential elements of a vector

1 We use the terms array and vector interchangeably, although array tends to be a programming term, and vector a mathematical term.

234 CHAPTER 11

are actually used, and to allocate space for all the potential elements within the index space would be extremely wasteful. Also, subsequent operations on such vectors could be executed much faster if it were known in advance which ones were unused. Hence, an implementation of sparse vectors is needed.

We implement sparse arrays as a template class whose primary operator is the indexation operator, operator[J. The underlying data structure is a balanced, binary tree, in which space is allocated only to elements which are referenced through the indexation operator. Initially, the tree is empty, and it grows in a balanced way whenever a new element of the vector object is referenced (There is also a way to remove elements, in which case the tree shrinks while staying balanced). The particular variant of tree we use is the AVL-tree, AdelsonVelskii and Landis [1962]. These trees allow access (including insertion and deletion) to elements in logarithmic time. We have modified the implementation of AVL trees from Wirth [1976] to allow sequential access to all elements of the sparse vector, which is needed to efficiently implement arithmetic operators on such classes. (Section 4.1). Of course, since the use of sparse data structures for small, or dense, arrays is necessarily wasteful, the implementation should ideally be able to switch automatically between a dense and a sparse representation. We have not implemented this facility.

Sparse arrays of double precision numbers and MP _variables are declared using the types sparse_array and MP _var -array2, respectively. We have defined the operator * for inner product on these types, and illustrate in Figure 6 their use. Of course, additional operators, such as addition or subtraction, should also be defined for a complete system. The model shown is concerned with constructing a portfolio of stocks, such that the weighed average portfolio f3 ("beta") is equal to 1, and the expected portfolio return is maximized. The inner product is used in the definition of the meet_beta constraint, and in the objective.

Another facility of algebraic notation is to write summations and similar operations compactly. Although we have not yet incorporated such operators into the system, we show (also in Figure 6) how this could be done, using the summation operator as an example. The resulting summation operator is used in the definition of the sum..x constraint. This code, which is explained below, illustrates that user-defined operations can relatively easily be incorporated into models, which is in contrast with most modeling languages.

2These types are typedef's for the template definitions Sparse_vector<double> and Sparse_vector<KP_variable>, where Sparse_vector is the template vector class.

A C++ Class Library

II Small portfolio model

#include "MP_library.h"

MP_expl sum(MP_var_arrayl x)

{ II Define a summation function

MP_expl result = *new MP_exp; Iterator X(x);

II To accumulate result II To iterate through x

while (X.more(» result += x[X.current()]; return result;

}

void main() {

sparse_array exp_ret, beta; double target_beta = 1.0;

MP_var_array x; int Num_stocks;

for (int i = 1; i <= NUDLstocks; i++) x[i].set_upper( 0.25 );

}

MP_model portfolio;

MP_constraint meet_beta = SUDLX =

beta * x == target_beta ), ( sum(x) == 1 );

portfolio.subject_to(meet_beta, sum_x); portfolio.maximize( exp_ret * x );

Figure 8 An example of the use of sparse arrays. The multiplication operator * on arrays is the usual inner product. The routine read..data initializes IUllUltocks, exp-ret and betaj it. definition is not shown. Nothing in the model depends on the number of stocks in the universe.

235

236 CHAPTER 11

4.1 Iterators

The use of sparse vectors is indispensable in mathematical programming, but raises certain problems when we try to introduce operations on such objects. To calculate, for instance, the inner product between two sparse vectors, we need to access only the previously referenced elements (assuming that the nonreferenced elements correspond to zeros). Without wanting to access, or spend time skipping past, elements which are not present, how can this be done?

To illustrate, consider this simple implementation of the inner product of the real arrays x and y:

long i; double product = 0;

tor (i = 1; i <= Limit; i++) product += xCi] * y[i];

return product;

This code is fine for dense vectors, but in the case of sparse vectors suffers from two problems: First, each and every element of both x and y between 1 and Limit are accessed (and hence, in our implementation, allocated space), and the multiplications then executed. Second, this code requires knowledge of the bounds (dimension) of the vectors, even though it is often natural to think of sparse vectors as having no particular bounds at all (being "boundless").

A way around these problems is provided by iteratorl. Iterators allow accessing only the elements which have been allocated, without knowledge of the internal implementation of the array. An example of the use of iterators is given in Figure 6. The declaration Iterator X(x) associates the iterator X with the sparse array x. Subsequent calls of iter at or function X. more () returns True if there are more elements allocated in x, and in this case X. current () returns the index of the next element. This solution allows accessing only the elements needed, and is completely independent of the dimensions of the vectors.

As a somewhat more involved example, we show in Figure 7 the implementation of the inner product operator for sparse arrays, which is defined in the MP library. The routine uses two iterators to iterate through the two vectors, matching up allocated elements. The code shown is for the case of double precision vectors, but is identical for vectors with elements of any type, and is implemented as a template function.

A C++ Class Library

double operator*(sparse_arrayt x, sparse_arrayt y) {

}

AVL_iterator X(x), Y(y); Bool x_more = X.more(), y_more = Y.more(); double z;

while (x_more tt y_more) if (!x_more II X.current() > Y.current(» {

y_more = Y.more(); } else

if (!y_more I I Y.current() > X.current(» { x_more = X.more();

} else {

}

z += x[X.current()] * y[Y.current()]; x_more = I.more(); y_more = Y.more();

return z;

Figure T Implementing inner products using iteraton to access only the allocated elements of sparse arrays without knowledge of their internal representation.

237

238 CHAPTER 11

5 VARIABLE ALIASING

In many models it is convenient to be able to refer to the same variable by multiple names. Under the MP library, this can be achieved by aliasing variables with each other. Aliasing does not appear to have a direct counterpart in GAMS or AMPL3 (although AMPL has a "defined variable" feature).

The concept of having multiple names for the same object is supported by C++ in the form of reference variables. In the code fragment

int i,k; int tj = i; II Now j is the same integer variable as i

j = 3; if (i != 3) Error();

j is defined to be a synonym (alias) for i. However, j will always be an alias for ij there is no way later to make j an alias for k. For MP variables we need more flexibility.

Aliases are arranged with the is_alias_of () function:

HP_variable x,y; x.is_alias_of(y); II Now x is the same MP variable as y

x.set_upper(10.3); if (y.upper() != 10.3) Error();

After the call of is_alias_ofO, the names x and y refer to the same variable, and the two constraints

x + Y >= 10; x + x >= 10;

are equivalent. We give an example of the use of aliasing from stochastic programming.

30ur use of the term "alias" does not have anything to do with the semantics of the GAMS ALIAS statement.

A C++ Class Library

array<MP_var_array> x(Hum_scen); array<MP_constraint> flow_conservation(Num_scen);

II Either impose non-anticipativity constraints---MP_constraint non_anticipativity[Hum_scen]; for (i = 1; i <= Hum_var; i++)

if (is_first_stage(i»

239

for (seen = 2; seen <= Hum_scen; scen++) Two_stage_stochastic.subject_to( x[scen][i] == x[l][i] );

II ---or, alternatively, alias the first-stage variables: for (i = 1; i <= Hum_var; i++)

if (is_first_stage(i» for (seen = 2; seen <= Hum_seen; scen++)

x [seen] [i].is_alias_for( x[l][i] );

Figure II Code fragments showing a split-variable formulation of a two-stage, atochastic program. Some of the variables x are first-stage, the rest are secondstage. Either non-..... ticipativity constraints or aliasing can be used to enforce equality among first-atage variables. The declaration array<I'IP _var .-array> x[llIIII...Stock] invokes a template cia88 array, ..... d declares x to be a (dense) array of aparse arraya of variablea, i.e., a two-dimensional array.

5.1 Formulation of Stochastic Programs

A two-stage, stochastic programming model can be viewed as a sequence of models, called scenario sub-problems, which usually have the same structure, but where the data may differ among scenarios. The scenario sub-problems are not completely separate, but share some subset of the variables, called the first-stage variables.

The first-stage variables, which represent decisions to be made up-front, before realizing one of several possible future events (scenarios), are logically identical across all scenario sub-problems, since the first-stage decision must be made without foresight. A common way to model two-stage, stochastic programs is by variable-splitting, i.e., giving each scenario sub-problem its own copy of the first-stage variables, and then adding (non-anticipativity) constraints imposing equality among the separate copies of the first-stage variables.

240 CHAPTER 11

Figure 8 shows a stochastic model in the split-variable formulation. This formulation is useful because it allows a uniform treatment of first- and second-stage variables within each scenario, as in the figure, where x consists of both firstand second-stage variables.

Aliasing provides a. way to achieve the same notational convenience without having to introduce multiple copies of the first-stage variables or nonanticipativity constraints, as also shown in the Figure. The variables names x [1] [i], x [1] [i], .'" x [Hum_seen] [i] are aliased such that they for each i

refer to the same variable across scenarios. In effect, we still use the convenient split-variable formulation, but the model implementation contains no redundancy.

More generally, variable aliasing can be used as a modeling tool for connecting separate (sub-)models within a larger system of models. Separate models, for instance for managing inventories of raw material, production, and shipping, could be maintained independently, but by aliasing key variables be solved as a large, combined model. Geoffrion [1990] gives another example where a transshipment model is hierarchically decomposed into two transportation models. This is just one approach to distributed modeling and model management; for a survey on these topics see Krishnan [1993].

6 EXTENSIONS

We discuss now several extensions to the MP library, which are the topics of curren t research.

• Multi-dimensional Arrays: Although the MP library allows the use of (sparse) arrays of numbers and variables, it does not support matrices and higher-dimensional arrays. Such objects can be declared using regular C++ arrays, but still no operations (such as matrix multiplication) are available. For advanced use of the library, one needs a general array structure, which allows for arbitrary dimensions, and which provides the natural operations for 1- and 2-dimensional arrays, i.e., vectors and matrices.

• Index Sets: Both GAMS and AMPL have rich facilities for defining oneor multidimensional sets and subsets thereof, and for defining data and variables indexed by such sets. Again, flexible set structures are indispensable for anything but the most trivial kinds of models. It appears


that such index structures could be defined using C++ classes. Managing (possibly multidimensional) sparse sets is very similar to managing sparse arrays, so set classes could be based on the machinery for multidimensional arrays. The indexing operation needs to be modified to accept indices of set class objects, and operators using sets, such as summations or products over sets, need to be defined.

• Type Information and Dimensional Analysis: Automatic consistency checking and type or unit conversions help manage large models or the integration of several models. Strongly typed programming languages such as Pascal or C++ associate a type with each datum or variable in a program, and enforce type consistency. ASCEND (Piela et al. [1991]) is an example of a modeling system which allows type or dimensional information to become part of a model. Type information helps the system guard against inconsistencies, for instance adding apples and oranges, or provide automatic conversion, for instance between meters and yards. It also helps enforce domain constraints, for instance that a node-arc incidence matrix must be indexed by pairs of numbers of type "node". In C++ type information could be represented by objects which could optionally be associated with data elements and variables. The operators defined on MP library objects, such as + and *, could then be extented to type-check (at run-time) the consistency of operand types, and possibly provide some automatic conversions.

• Nonlinear Programming: The syntax introduced allows for writing non-linear expression as part of constraints or in the objective function. Since the class library has complete control over the expressions generated, symbolic differentiation routines could be developed which would provide exact derivatives to a non-linear optimizer. At the same time, a derivative (or gradient) function could be made available at the user level, so that a modeler could write a non-linear expression, and have its gradient and Hessian generated automatically. This would aid in the development of prototype non-linear algorithms.

7 CONCLUSION

We have presented a library of classes and definitions which allow the convenient formulation of simple mathematical optimization models within the high-level programming language C++. Operators on the mathematical programming objects were defined which led to a natural, modeling language-like syntax for

242 CHAPTER 11

defining expressions, constraints and models. We have also presented data structures for storing and managing large, sparse, one-dimensional arrays of data, variables and constraints. Finally, we have mentioned several topics for future research in the areas of generalized array structures, index sets, type information, and non-linear programming.

REFERENCES

[1] G.M. Adelson-Velskii and E.M. Landis. Doklady akademia nauk SSSR. Soviet Math (English Tra.nlla.tion), 3:1259-1263, 1962.

[2] C.R. Birchenhall. A draft guide to MatClass-a matrix class for C++. Version 1.0D. Technical report, University of Manchester, Oxford Road, Manchester M13 9PL, UK, May 1992.

[3] R.E. Bixby. Implementing the Simplex method: The initial basis. ORSA Journal on Computing, 4(3):267-284, 1992.

[4] A. Brooke, D. Kendrick, and A. Meeraus. GAMS: A Uler'I Guide. The Scientific Press, 2 edition, 1992.

[5] J .0. Coplien. Advanced C++ Programming Stylel and Idioml. AddisonWesley, 1992.

[6] M.A. Ellis and B. Stroustrup. The Annotated C++ Reference ManuaL Addison-Wesley, Reading, MA, 1990.

[7] R. Fourer. Modeling languages versus matrix generators for linear programming. ACM Tra.n&actionl on Mathematical Software, 9(2):143-183, June 1983.

[8] R. Fourer, D.M. Gay, and B.W. Kernigan. AMPL: A Modeling Language for Mathematical Programming. Scientific Press, 1993.

[9] A. Geoffrion. Reusing structured models via model integration. Technical report, University of California, Los Angeles, 1990. Working Paper No. 362.

[10] B.W. Kernigan and D.M. Ritchie. The C Programming Language. Prentice-Hall Software Series, 1978.

[11] R. Krishnan. Model management: Survey, future research directions and a bibliography. ORSA CSTS NewIletter, Spring 1993.


[12) K.M. Westerberg P.C. Piela, T.G. Epperly and A.W. Westerberg. ASCEND: An object-oriented computer environment for modeling and analysis: The modeling language. Computer, in Chem. Eng., 15(1):53-72, 1991.

[13) R.J. Vanderbei and T.J. Carpenter. Symmetric indefinite systems for interior point methods. Mathematical Programming, 58:1-32, 1993.

[14) N. Wirth. Algorithml + Data Structure, = Program,. Prentice-Hall Series in Automatic Computation, 1976.

12 INTEGRATING OPERATIONS RESEARCH

AND NEURAL NETWORKS

FOR VEHICLE ROUTING

ABSTRACT

Jean-Yves Potvin and Christian Robillard

Depo.rtement d'Injormo.tique et de Recherche Operationnelle Universite de Montreo.l

C.P. 6128, Succ. Centre- Ville, Montreal (Quebec).

Canado. H3C 3J7

A competitive neural network is designed to improve the initialization phase of a parallel route construction heuristic for the Vehicle Routing and Scheduling Problem with Time Windows (VRSPTW). The neural network identifies seed customers that are nicely distributed over the whole geographic area. In particular, the weights of the network converge towards the centroids of clusters of customers, when such clusters are present. Computational results on a. standard set of problems are reported, both with a simple initialization methodology a.nd the neural network initialization.

1 INTRODUCTION

In this paper, a competitive neural network identifies good seed customers during the initialization phase of an insertion heuristic for the Vehicle Routing and Scheduling Problem with Time Windows (VRSPTW).

The VPSPTW is the focus of very intensive research, and is used to model many realistic applications.l2,3,12] The overall objective is to service a set of customers at minimum cost with a fleet of vehicles of finite capacity housed at a central depot. In order to be feasible, each route must satisfy capacity and time window constraints. First, the customers have known demands for service, like a quantity of goods to be delivered, and the total demand on a route cannot exceed the capacity of the vehicle. Secondly, each customer has a time window or time interval for its service. Since the hard time window case is

245

246 CHAPTER 12

considered, no vehicle is allowed to arrive too late at a customer, that is, after its time window's upper bound. On the other hand, a vehicle can wait if it arrives too early at a customer. In this study, the first objective is to minimize the numbers of routes. Then, for the same number of routes, the total route time is minimized.

In the following sections, a parallel route construction heuristic for the VRSPTW is described. To this end, Section 2 first introduces the basic insertion heuristic. Then, Section 3 describes the neural network model and its use during the initialization phase. Finally, Section 4 reports about computational results on Solomon's standard set of problemsJll]

2 A PARALLEL INSERTION HEURISTIC

Our insertion heuristic is largely inspired by the work of Solomon.lll] However, the routes are built in parallel rather than one by one. Hence, a set of routes must first be initialized with seed customers, with each route servicing a single seed customer. In Figure 1, for example, the black square stands for the depot and the gray circles stand for the customers to be serviced. Here, three routes are initialized with customers 1, 2 and 3. Then, the remaining unrouted customers are inserted one by one into these routes until all customers are routed.

One difficulty with a parallel insertion heuristic is to determine the initial number of routes. This difficulty does not arise with a sequential approach like Solomon's heuristic, because the routes are created and filled one by one, as needed, until all customers are routed.

In our algorithm, the initial number of routes is obtained through Solomon's heuristic. In [11], the solutions reported by the author were obtained by applying his heuristic eight times to each problem, with two different initialization criteria and four different parameter settings, and by taking the best solution. Here, we are not interested in Solomon's routes by themselves, but rather in their number. Accordingly, Solomon's heuristic is applied only once, using a single initialization criterion and parameter setting.

Once the initial number of routes is known, the problem becomes one of selecting "good" seed customers to initialize these routes. The solution to this problem is described in Section 3, where the neural network model is discussed.

Integrating OR and Neural Nets 247

3

Figure 1 Initializing three routes

Accordingly, the following description of the parallel insertion heuristic is deliberately sketchy with respect to the initialization phase (d. Step 1).

Let (40,41,42, ... ,4 .. ) be some route r with 40 and 4 .. standing for the depot. Then, the parallel insertion heuristic can be described as follows.

Step O. E6timate the initial number of route6. Apply Solomon's sequential heuristic with only one initialization criterion and only one parameter setting to estimate the initial number of routes m (note that the farthest customer from the depot is used to initialize a new route, and the parameters '\,0(1,0(2 are set to 1,1 and 0, respectively, see [11] for details).

Step 1. Initialize the rou.te6. Start the parallel insertion heuristic by selecting m seed customers in the whole set of customers. Then, use each seed customer to initialize a new route.

248 CHAPTER 12

Step 2. Insert the remaining C'U.6tomers. Consider the minimum feasible insertion costs for each unrouted customer u in each route r:

cl(i, '11., j) = O!l x cn(i, 'U.,j) +0!2 x c12(i, u,j), O!l +0!2 = 1, O!l ~ 0, 0!2 ~ O.

where

U is the current set of unrouted customers. Pu,r is the set of feasible insertion places for customer u in route r

(note that ct is set to an arbitrary large value when Pu,r is empty)

cll(i, u,j) do,u + du,; - d;,;,

buj - bj,

Then, select customer u· as follows.

max C2(U) uEU

d;. ,1 distance (in time units) between i and j. current service time at j. new service time at j, given that u is inserted before i.

~ c~ (ir(u) , u, ir('I.I)) - c~ (ir'(u), u,jr'(u)), r~r'

where

and insert u· between 4,(u·) and ir'(u·).

Step 3. Repeat Step 2 until all customers are routed (feasible solution) or until one or more unrouted customers have no feasible insertion points (no feasible solution).

It is worth noting that the insertion cost formula Cl is a weighted sum of detour and delay as in [11], while the cost C2 for selecting an unrouted customer is based on a generalized regret measure over all routes. Basically, the regret is a kind of "look ahead" that indicates what can be lost later, if a given customer is not immediately inserted within its best route. In our context, a large regret measure means that there is a large gap between the best insertion place for


a customer, and its best insertion places in the other routes. Hence, unrouted customers with large regrets must be considered first, since the number of interesting alternative routes for inserting them is small. On the other hand, customers with a small regret can be easily inserted into alternative routes without loosing much, and are considered later for insertion. Broadly speaking, this new generalized measure improves upon the classical regret measure by extending the look-ahead to all available alternatives, whereas the classical regret is rather "short sighted" and merely looks at the best and second best alternativeJ8,13]

The parallel insertion heuristic is applied to each problem with three different parameter settings (0:11 0:2) = (0.5, 0.5), (0.75, 0.25), (1.0, 0.0), and the best overall solution is selected at the end. However, this heuristic is applied more than three times to each problem, because solutions with fewer routes than Solomon's initial estimate are also considered.

This is done by stopping the procedure as soon as a feasible solution is found for a given number of routes m and a given parameter setting. The parameter settings that have not yet been tried with the current number of routes (if any) are stored, and the whole procedure is repeated with m-l routes. The number of routes is reduced in this way until no feasible solution is found over the three parameter settings for some value Tnint. At this point, we simply backtrack to mont + 1 routes and try the remaining parameter settings (if any). The best solution found over the three parameter settings with Tnint + 1 routes is the final solution of the parallel heuristic.

The next section will now explain how the initial seed customers are selected.

3 THE INITIALIZATION PHASE

Competitive neural networks are now widely used to cluster data. A large body of neural network training algorithms have been developed for this task, and can be adapted to our initialization problem. The interested reader will find more details about these networks in [4,5,6,7,10].

Figure 2 depicts a typical competitive network with two input units 11 and 12 , that encode the (:z:, y) coordinates ofthe customers, and three output units 0 1 , O2 , 0 3 associated to three different clusters (or routes). The weights on the connections between the two input units and output unit j represent the

250 CHAPTER 12

x

y

Figure:l A competitive neural network

weight vector associated to cluster j. For example, WI = (Wll' W21) is the weight vector associated to output unit 0 1 and cluster 1.

At start, the weight vectors are arbitrarily chosen, but as the training algorithm goes on, they are progressively modified so as to move towards the centroid of each cluster of customers. In Figure 3, for example, the gray circles are customers and the three w's are weight vectors (assuming three output units). The figure shows how the three weight vectors evolve in the plane so as to place themselves approximately at the center of the three clusters of customers. Figure 3(d) depicts the three clusters identified by the neural network. Namely, the weight vector Wi of output unit i is the closest weight vector to all customers within its box. Note also that the three black circles in figure 3(d) are the closest customers to the final weight vectors, and can be used as seed customers for the parallel insertion heuristic.

The training algorithm for the competitive neural network can be described more formally as follows. We consider two input units and m output units.


(a) (b)

0 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 WIO 0

0 0

0 0 0 wI W2 0 W3

W w3

(c) (d)

o 0 0 ~ 0 0 o 0

wI 0 Ef]rD 0 0

0 0 ow· w 2 W3 0 2 0

0 0

Figure 3 Evolution of the weight vectors over time

Each output unit j has an initial weight vector Wj(O) = (Wlj(O), W2j(O)), where Wi;(O) is the weight on the connection between input unit i and output unit j. Starting with these initial weight vectors, the training algorithm goes through the set of coordinates many times (referred to as "passes" through the data), and slightly modifies the weights on the connections after the presentation of each input.

The update of the weight vectors is done as follows for a given input vector I=(x,y). Each output unit j first computes the Euclidean distance between its weight vector Wj and the input vector. The output unit j. which is the closest

252 CHAPTER 12

to the input vector is designated as the "winner" of the competition. It means that output unit j. claims that the current customer is member of cluster j". Then, its weight vector w; is adjusted so as to get even closer to the current input vector, using the formula

wj(t + 1) = wj(t) + 11(1 - wj(t)).

The parameter 11 in this formula is the learning rate. Its initial value is smaller than 1 and is progressively reduced as learning goes on.

It is worth noting that the closest weight vector to a given customer can change from one pass to another through the coordinates, because each unit can win many times during a single pass and its weight vector is moved each time. However, the cluster membership typically stabilizes after a few passes through the data.

The coordinates of the customers are presented many times to the network until some stopping condition is met. Typically, the following objective function is used to evaluate the neural network performance (for n customers and m output units):

E= lin L i=l,no,n. j=l, ... ,m

where d(lt, wi) is the Euclidean distance between the input vector It and the current weight vector Wi, and M is the cluster membership matrix, that is, Mii is set to 1 if customer i is currently member of cluster j, and 0 otherwise. Hence, this function computes the average squared Euclidean distance between the weight vectors and the coordinate vectors of their current customers. This value decreases from one pass to another through the set of coordinates, and the training stops when the improvement gets under some threshold.

One difficulty with the above learning algorithm relates to its sensitivity to the initial location of the weight vectors. Quite often, the weight vectors of some output units just stay idle at their initial location, because they never win a competition. To solve this problem, the Euclidean distance between the weight vector of a given output unit j and the current input vector I is biased according to the following formula:[l]

d'(I,w;(t» = d(I,wj(t» * ui(t) ,

where ui(t) is the number of times unit j has won in the past (i.e. before iteration t). Consequently, if unit j was a frequent winner, the distance d! increases and the likelihood of being a winner in the near future is reduced.


Conversely, a unit is more likely to win in the near future if it did not win very often in the past. This modification allows the network to be fully active and to associate each output unit to a different cluster.

As mentioned before, the weight vectors tend to settle approximately at the centroids of the clusters of customers, when such clusters are present. Otherwise, they distribute themselves evenly in the space of coordinates so as to get an equal share of customers. Then, the seed customers for the parallel insertion heuristic are identified by selecting the closest customer to each weight vector (as depicted in Figure 3(d».

The next section will now present computational results obtained with both a simple initialization methodology and the neural network initialization, using Solomon's set of problemsJll]


In this section, we first briefly describe the set of test problems. Then, we give some details about the implementation of our parallel heuristic. Finally, computational results on the test problems are reported.

4.1 Test problems

For testing purposes, Solomon's standard set of VRSPTWs was used.lll] The design of these problems highlights factors that can affect the behavior of routing and scheduling heuristics, like geographical data, number of customers serviced by a vehicle and time window characteristics (e.g. percentage of timeconstrained customers, tightness and positioning of the time windows).

The geographical data were either randomly distributed according to a uniform distribution (problem classes Rl and R2), clustered (problem classes Cl and C2) or mixed with randomly distributed and clustered customers (problem classes RCI and RC2). Classes Rl, Cl and RCI have a narrow scheduling horizon, and allow only a few customers per route. Conversely, classes R2, C2 and RC2 have a large scheduling horizon and allow a larger number of customers per route.

254 CHAPTER 12

Each class includes problems with large time windows, tight time windows or a mix of large and tight time windows. For problems of type Rand RC, the time windows have a uniformly distributed, randomly generated center and a normally distributed random width. For problems of type C, the time windows are positioned around the arrival times at customers after a 3-opt, cluster by cluster routing solution. Finally, all problems are 100-customer Euclidean problems and include capacity constraints, in addition to the time windows. The reader is referred to [11] for additional details on this problem set.

4.2 Implementation

In the current neural network implementation, the following parameter settings and initial conditions are used:

(a) all the weight vectors are initially located at the origin

(b) the learning rate TJ is initially set to 0.8 and its value is decreased from one pass to another through the set of coordinates, according to the following formula (as suggested in [1]):

TJ = 0.8e- O.239(1'-1)

In this formula, p is the number of passes through the whole set of coordinates, and is initially set to 1.

(c) the stopping criterion is: .6.E < 0.001 .

Hence, the training algorithm stops when the objective function E that monitors the neural network performance decreases by less than 0.001 after one complete pass through the customers.

4.3 Results

Table 1 shows the results obtained with different route construction heuristics for the six classes of problems in Solomon's test set. The parallel insertion with the neural network initialization is referred to as P ARAL-NN. For comparison purposes, we also provide the results with the same parallel insertion heuristic, but with the initialization methodology suggested in [9], namely:


(a) Initialize the first route with the farthest customer from the depot. Put this customer in the current set of seed customers.

(b) Repeat until all routes are initialized:

Initialize the next route by selecting the farthes customer from the current set of seed customers (so as to evenly distribute the seed customers over the whole geographic area). Formally, select customer u· such that:

min du·. = max (min du.), .es ueU .es

where S is the current set of seed customers and U is the set of unrouted customers. Then, add u· to S and remove u· from U.

The parallel insertion heuristic with the above initialization methodology is referred to as PARAL-F in Table 1. We also show the results with Solomon's heuristic, with the two approaches suggested by the author to initialize a new route, namely (1) select the farthest customer from the depot and (2) select the customer with the earliest deadline (earliest time window's upper bound). In each case, we used the four parameter settings suggested by Solomon, and took the best overall solutionJl1) The two heuristics are referred to as SOLO-F (Farthest) and SOLO-D (Deadline) in Table 1.

For each class of problems, the average number of routes, travel time, waiting time, route time and computation time are shown. In the Table, "Route time" refers to the sum of travel time, waiting time and service time. For the problems of classes 01 and C2, a fixed service time value of 9,000 must be added to the sum of travel time and waiting time. For the other classes, a fixed service time value of 1,000 is added.

The computation times in seconds were obtained on a Sparc 2 workstation. For PARAL-NN and PARAL-F, the application of Solomon's heuristic to estimate the initial number of routes is included in the computation time (c.f. Section 3). For PARAL-NN, the contribution of the neural network to the computation time is shown between parentheses. For example, the average computation time on problem class R1 is 5.2 seconds, and the neural network initialization took 1.2 seconds, that is, approximately 20% of the computation time.

Of course, some care should be taken when interpreting the computation times associated to the neural net~ork. Specifically, the neural network was

256 CHAPTER 12

simulated on a serial machine, and its inherent parallelism and locality were not exploited at all. On a neural chip, the competition among the units would be held in parallel, and the overall run time complexity of one pass through n customers would be O(n). Accordingly, the computation times would be substantially reduced. Globally, the total number of routes over all problems is the following for each heuristic:

PARAL-F

PARAL-NN

466

457

SOLO-D

SOLO-F

466

457.

PARAL-NN improves PARAL-F, by allowing a total of 9 routes to be saved. This ability to save routes is very important in real-world applications, because each vehicle is associated to important acquisition and maintenance costs. Also, the total number of routes is now the same for PARAL-NN and SOLO-F. Overall, PARAL-NN outperforms SOLO-F on problem sets R2, C2, RC2, while SOLO-F is better on sets Rl, Cl and RCI.

It is worth noting that the neural network initialization does not provide any improvement over the simple initialization methodology on problem classes RCl and RC2. In these problems, the time windows greatly impact the shape of the routes. Since the neural network exploits spatial relationships among the customers to initialize the routes, it can hardly provide any improvement when the time windows do not preserve some of the geographical characteristics of the problems.


In this work, a competitive neural network was incorporated into a heuristic algorithm for the VRSPTW. The neural network was· used to select appropriate seed customers during the initialization phase. With respect to this work, we note two possible avenues for further research.

First, the neural network initialization was based on spatial considerations only. Better results could possibly be achieved by considering both spatial and temporal issues during the neural network initialization. In this case, a third input unit encoding the time dimension would be added to the neural network (e.g. the time window's upper bound at each customer). This approach could be beneficial on problem classes RCl and RC2 in particular.


A second research avenue would focus on ART-like networks.[6] In our competitive network, the number of output units is fixed. Since each output unit stands for a cluster (a route), the initial number of routes must first be estimated via Solomon's heuristic. In ART networks, the number of active output units is not fixed, but is dynamically adjusted to each problem. With this model, the parallel insertion algorithm would not rely anymore on Solomon's heuristic to estimate the initial number of routes.

Acknowledgements

We would like to thank Tanguy Kervahut for his help during the computational experiments. Also, this research would not have been possible without the financial support of the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Fonds pour la Formation de Chercheurs et l' Aide a~ la Recherche of the Quebec Government (FCAR)

258 CHAPTER 12

R1 Number Travel Waiting Route Comput. (12 problems) of time time time time

Routes Routes (sec.)

SOLO-D 14.0 1507.2 308.9 2816.1 0.6 (deadline)

SOLO-F 13.6 1408.3 281.3 2689.6 0.6 (farthest)

PARAL-F 13.9 1550.4 185.7 2736.1 4.2 (farthest)

PARAL-NN 13.6 1539.4 189.7 2729.1 5.2 (neural net) (1.2)

R2 Number Travel Waiting Route Comput. (11 problems) of time time time time


SOLO-D 3.3 1413.0 168.0 2581.0 3.1 (deadline)

SOLO-F 3.4 1419.1 212.4 2631.5 3.1 (farthest)

PARAL-F 3.3 1412.3 114.4 2526.7 4.0 (farthest)


Table 1a Random Problems


Cl Number Travel Waiting Route Comput. (9 problems) of time time time time


SOLO-D 10.0 1029.4 145.7 10175.1 0.6 (deadline)

SOLO-F 10.0 927.2 225.3 10152.5 0.6 (farthest)

PARAL-F 10.9 1548.8 439.2 10g88.0 3.9 (farthest)


C2 Number Travel Waiting Route Comput. (8 problems) of time time time time


SOLO-D 3.3 867.1 259.4 10126.5 1.8 (deadline)

SOLO-F 3.5 752.9 110.6 9863.5 1.8 (farthest)

PARAL-F 3.8 1098.5 328.4 10426.9 3.2 (farthest)


Table Ib Clustered Problems

260 CHAPTER 12

RC1 Number Travel Waiting Route Comput. (8 problems) of time time time time


SOLO-D 14.3 1768.8 250.3 3019.1 0.6 (deadline)

SOLO-F 13.5 1596.4 175.8 2772.2 0.6 (farthest)

PARAL-F 13.4 1769.5 110.0 2879.6 4.1 (farthest)


RC2 Number Travel Waiting Route Comput. (8 problems) of time time time time


SOLO-D 4.0 1760.4 313.8 3074.2 2.3 (deadline)

SOLO-F 3.9 1686.5 312.2 2998.7 2.5 (farthest)

PARAL-F 3.5 1616.9 189.7 2806.6 3.3 (farthest)


Table Ie Mixed Problems


REFERENCES

[1] S.C. Ahalt, A.K. Krishnamurthy, P. Chen, and D.E. Melton, 1990. Competitive Learning Algorithms for Vector Quantization, Neural Networ1cs, 3,277-290.

[2] L. Bodin, B.L. Golden, A.A. Assad, and M. Ball, 1983. Routing and Scheduling of Vehicles and Crews: the State of the Art, ComputerlJ and OperationlJ RelJearch, 10, 63-211.

[3] M. Desrochers, J.K. Lenstra, M.W.P. Savelsbergh, and F. Soumis, 1988. Vehicle Routing with Time Windows: Optimization and Approximation, in B.L. Golden and A.A. Assad (eds.), Vehicle Routing: Method8 and Stu.dielJ, North-Holland, Amsterdam, pp. 65-84.

[4] S. Grossberg, 1976. Adaptive Pattern Classification and Universal Recoding I. Parallel Development and Coding of Neural Feature Detectors, Biological CyberneticlJ, 23, 121-134.

[5] S. Grossberg, 1976. Adaptive Pattern Classification and Universal Recoding II. Feedback, Expectation, Olfaction, illusions, Biological CyberneticlJ, 23, 187-202.

[6] S. Grossberg, 1987. Competitive Learning: From Interactive Activation to Adaptive Resonance, Cognitive Science, 11, 23-63.

[7] T. Kohonen, 1988. Self Organization and AlJIJociative Memory, Second Edition, Springer-Verlag, Berlin.

[8] S. Martello and P. Toth, 1981. An Algorithm for the Generalized Assignment Problem, in J.P. Brans (ed.), Operational RelJearch'81: ProceedinglJ of the Ninth IFORS Int. Conf. on Operational RelJearch, North-Holland, Amsterdam, pp. 589-603.

[9] J .Y. Potvin and J .M. Rousseau, 1990. A Parallel Route Building Algorithm for the Vehicle Routing and Scheduling Problem with Time Windows, Technical Report CRT-729, Centre de Recherche sur les Transports, Universiteffi. de Montreal, Montreal,Canada.

[10] D.E. Rumelhart and D. Zipser, 1985. Feature Discovery by Competitive Learning, Cognitive Science, 9, 75-112.

[11] M.M. Solomon, 1987. Algorithms for the Vehicle Routing and Scheduling Problem with Time Window Constraints, OperationIJ RelJearch, 35, 254-265.

262 CHAPTER 12

[12) M.M. Solomon and J. Desrosiers, 1988. Time Window Constrained Routing and Scheduling Problems, 1ransportation Science, 22, 1-13.

[13) F. Tillman F. and T. Cain, 1972. An Upper Bounding Algorithm for the Single and Multiple Terminal Delivery Problem, Management Science, 18, 664-682.

13 USING ARTIFICIAL INTELLIGENCE TO

ENHANCE MODEL ANALYSIS

ABSTRACT

Ramesh Sharda and David M. Steiger*

College of B'lUinels Administration

Oklahoma State University Stillwater, Oklahoma 7-107-1

"'School of Business University of North Carolina, Greensboro

Greensboro, North Carolina

The purpose of mathematical modeling is to generate insights into the decision making environment being modeled. Such insights are often generated through the analysis of several, if not many, related model instances. However, little theory and only a few systems have been developed to support this basic goal of modeling. Nonlinear modeling capabilities of neural networks and related methods can be employed to identify patterns within the multiple 'what-if' instances.

The purpose of this paper is to describe a prototype, artificial intelligence-based system, named INSIGHT, which analyzes multiple, related model instances to identify key model parameters and develop insights into how these key parameters interact to influence the model solution.

1 INTRODUCTION

Once a Decision Support System (DSS) model is built, validated and run for an initial set of assumptions and instantiating values, the decision maker's job has just begun. There follows an extensive set of what-if questions and associated model instances which explore the workings and trade-oft's of the business system represented by the model [13]. That is, the decision maker tries to develop insight(s) into the interrelationships between changes in model parameters and their eft'ects on the model solution.

263

264 CHAPTER 13

Such insights are almost always generated through the analysis of several, if not many, related model instances [9]. However, insightful analysis of multiple what-if model instances becomes difficult as more and more instances are considered, especially if the interrelationships become nonlinear. Further, very little theory has been provided to enhance this insightful analysis process.

The purpose of this paper is to describe a prototype system, named INSIGHT, which analyzes multiple model instances corresponding to what-if cases generated by the decision maker in his search for model insights. Specifically, the INSIGHT prototype identifies the key parameters in the model and the relationships between these key factors and the model decision variables. Our discussion is divided into seven parts. Section 2 describes current model analysis tools. Section 3 describes the INSIGHT system and the technologies utilized. Section 4 provides a sample INSIGHT session. Section 5 describes a sample problem. Section 6 provides a summary of results. And Section 7 lists several research directions.

2 CURRENT ANALYSIS TOOLS

Several systems and techniques have been developed and reported in the literature which support, to varying degrees, parts of the insight-generating process. Specifically, there are three techniques designed for the analysis of optimizationbased models: PERUSE [17], ANALYZE [10, 11, 12] and candle-lighting [14]. All three of these approaches are limited, by design, to single instance analysis. In addition, there is a software systems designed for the analysis of simulationbased models named I-KBS [20). This system is similar in scope and limitations to those of the LP analysis software. And finally, there are two systems available for spreadsheet-based models: ROME/ERGO [15,16] and its extended and commercially available system, IFPS/PLUS [6]. These spreadsheet systems are limited to comparing variables which use the same formula for their computation; e.g., they help determine the cause for changes in a variable appearing in two different years in a single spreadsheet or the same variable appearing in two different spreadsheets.

In addition to these paradigm-dependent systems, there are three model analysis systems which are independent of their modeling paradigms. One such system is LISA [22, 23) which accepts, as input, a set of model instances generated via Monte Carlo simulation. It uses standardized rank regression coefficients and partial rank correlation coefficients to determine the key parameters in a

Artificial Intelligence For Model Analysis 265

probabilistic model. Another is named Global Sensitivity Analysis [24], which provides extended sensitivity analyses based on the analysis of a set of (many) solved model instances. Global Sensitivity Analysis uses a pre-specified set of nonlinear, multinomial terms involving the parameters of interest, and backward stepwise regression to find the "best fit" set of these terms which explain variations in the model solutions, then uses the resulting multinomial expression to determine the extended sensitivity of the original model to changes in a given parameter. The third paradigm-independent analysis system is INSIGHT, whose goal is to generate a simplified auxiliary model which helps the decision maker develop insight(s) into what the key parameters are and how they interact to influence the model's solution. INSIGHT is discussed further in Sections 4 and 5 below.

3 INSIGHT SYSTEM DESCRIPTION

To provide an environment which can generate insights into the model's key factors and their relation to the model recommendations, we suggest the applications of technologies which can analyze many instances simultaneously and generate nonlinear, as well as linear, relationships. Once such technology which has been mentioned as applicable to this task is artificial intelligence [3, 5]. AI seems especially attractive due to its ability to analyze many instances in generating appropriate relationships; e.g., knowledge extraction in (intelligent) databases [8]. Further, neural networks and related AI technologies have been recognized as having excellent pattern recognition capabilities, an important aspect of relation generation.

INSIGHT is an artificial intelligence-based system which analyzes, multiple model instances to: 1) identify the critical model parameters, and 2) generate one or more simplified auxiliary models. This system, using the group method of data handling (GMDH), accepts as input the tuples representing solved model instances stored in EXCEL's Scenario Manager [18]. The system output consists of a set of key factors, as well as the simplified auxiliary model which explains a high proportions (say 80%) of the total variance in model output across instances.

G MDH is a self-organizing method of model generation which produces a model in the form of polynomials of arbitrarily high degree describing an output variable, Y, in terms of a subset of input variables, :Ill> :Il2, ••• ,:Iln , their cross products and powers. The final model is formed from a cascading network, with the

266 CHAPTER 13

input of each network layer consisting of polynomial output(s) from previous layer(s), one or more original independent variables, Zj, or some combination of both. The output of each layer is commonly a quadratic of the form

(1)

although some GMDH algorithms use other polynomial forms, such as incomplete cubics [2] or a combination of linear, quadratic and partial cubic polynomials [1J.

At each layer, the GMDH algorithm commonly generates a set of submodels of eq. (1) based on different pairs of input variables, and/or outputs from previous layers; the parameters, ak, are estimated by ordinary least squares based on the training data set. The algorithm then simplifies each polynomial of eq. (1) by using stepwise regression to eliminate some of the terms, if possible; this step results in much simpler final polynomials. The resulting polynomial submodels at each network layer are then ranked by their goodness of fit to the test data set, with the highest ranking submodel designated as that layer's output polynomial. The goodness of fit criterion may be based on the standarderror statistic, R, or some combination of squared-error statistic and an overall model complexity factor (7]. This procedure reduces the risk of "overfitting" the model; i.e., of fitting noise in the system. The output of the final layer is the polynomial of arbitrarily high degree.

The primary advantages of the GMDH algorithm include the following: 1) no prior knowledge of the form of the model is required, 2) a smaller set of samples is required as compared to statistical regression, 3) no assumptions with respect to linearity or continuity of the solution values or normality of residuals are required, and 4) chances of overfitting are reduced [1, 2, 7, 19]. The primary disadvantages of GMDH include: 1) GMDH dies not guarantee the best model, only a good one, 2) simple functions, such as sin(z), l/z, y'Z, may be unrecognizable in their complex polynomial approximation forms, much like the Taylor-series expansion of such terms is unrecognizable. See [7J for a set of articles describing the GMDH algorithm and its applications.

The general functional characteristics of the INSIGHT system are shown in Figure 1. The system consists of three primary modules: 1) a model generation and storage facility, 2) a candidate relation/pattern generator, and 3) a simplified auxiliary model generator. Each of these modules is discussed individually below.


3.1 Model Generation and Storage

Microsoft EXCEL [18] is used as the model generator for the INSIGHT system, whereas EXCEL's Scenario Manager function is used to store the model instances. This package was chosen over other spreadsheet-based packages for two primary reasons: 1) it provides an add-in capability through which a mixed integer linear programming solver (What'sBest! [21]) can be used to solve the facility location model in our validation test, and 2) it provides an instance storage capability (Scenario Manager) which allows the decision maker to specify which variables and parameters to save for each model instance, and thus reduces instance processing required in other parts of the system.

3.2 Candidate Relation/Pattern Generation

The Candidate Relation/Pattern Generator for the INSIGHT system is AIM [1], a commercially-available, GMDH-based package which generates a polynomial whose parameters are optimized to minimize the error between the proposed model and the training data [1]. Input to this package is the set of model instances stored as scenarios in EXCEL's Scenario Manager, as modified and re-organized by the Candidate Pre-processor, an Excel macro routine. The output of AIM is a data file containing the best-fit relations between the independent variables and their cross products and the dependent variable. Such a relation is generated at each layer of the cascaded network with the output of one layer treated as the input of the succeeding layer.

The Candidate Post-processor is a C language program which accepts, as input, the output file from AIM (filename. NET) containing node type indicators, node input variables and component coefficients. The post-processor then computes the single, overall equation for each primary node; i.e., each node using only original variables as inputs.

The output of this module is a list of those AIM terms having non-zero coefficients, some subset of which explains a high percentage (say 80%) of the total variation from average in the model solution across instances.

AIM was chosen over other GMDH-based packages and other technologies (e.g., statistical regression, neural networlts, etc.) for several reasons: 1) AIM is selforganising and requires no prior knowledge of, or assumptions concerning, the model form (e.g., whether the best model is linear or quadratic, contains crossproduct terms or not, etc.) as statistical regression does, and 2) AIM provides

268

Spreadsheet Model Specifications and Parameters

What'sBest! Solver

Keyboard Macro Control Program

User Specified Variable Names

Excel Stat~ticaI Routines

Multiple Instance Tuples From

Scenario Manager

Candidate Post-processor

CHAPTER 13

} Model Generation & Storage

Candidate RelationIPattern Generator

Simplified Auxiliary Model Generator

Simplified Auxiliary Model, Key Factors and

Coefficient ofDetermination, R2

Figure 1 The INSIGHT System


an explicit equation relating the independent variables to the instance objective functions, as opposed to the matrix of interconnecting weights provided by neural networks or the correlations and influence factors of other self-organizing packages.

In the INSIGHT software, the Candidate Relation/Pattern Generator is called and controlled by a keyboard macro program, Automate Anytime [4]. Using this keyboard macro eliminates the need for the INSIGHT user to know, and interface with, the AIM software and the pre- and post-processors.

3.3 Simplified Auxiliary Model Generator

The Simplified Auxiliary Model Generator accepts, as input, the list of candidate relations produced by the Candidate Post-processor and produces, as output, the best simplified auxiliary model. This program uses EXCEL's correlation and regression software routines. The procedure for generating the simplified auxiliary model consists of a two-step iterative process which finds and linearizes the most highly nonlinear term(s) and then implements a stepwise linear regression to find the best overall statistical model; i.e., the model which exceeds some threshold of explanatory power as determined by the coefficient of determination, say R2 > .80.

EXCEL's statistical routines were chosen to implement this module since they are already built in and they were sufficiently fast and accurate for our purposes.

4 INSIGHT-A SAMPLE SESSION

The INSIGHT software has been implemented as an add-in command available through the FORMULA menu. To execute the INSIGHT software, the user must first generate an appropriate set of instances, storing them in the Scenario Manager after specifying the appropriate set of independent and dependent variables.

After solving and storing the model instances in Scenario Manager, the user simply selects the INSIGHT command in the FORMULA menu (Figure 2). The user then specifies the names of both independent and dependent variables in the same order they are stored within Scenario Manager (Figure 3). The user then identifies the dependent variable and, if applicable, a range of possible val-

270 CHAPTER 13

ues for that variable (Figure 4). This initiates the INSIGHT data processing and eventually results in a dialog box telling the user to exit EXCEL and WINDOWS, and execute a program called process.bat. This initiates a keyboard macro which calls AIM, processes the data, and calls the post-processor.

Upon re-entering EXCEL, the user must then select the INSIGHT (CONTINUED) command in the FORMULA menu (see Figure 2) to continue processing. After a short time, the INSIGHT software displays a final dialog box, giving the user a list of key factors and one or more key relations relating those key factors (Figure 5). After perusing the information in this dialog box, the user may click on the OK button and continue other EXCEL processing.

5 A SAMPLE PROBLEM

To test the validity of the INSIGHT software, we generated a facility location model for a test case. Geoffrion [9] used a similar model to illustrate the development of insight-generating simplified auxiliary models. The facility location model, formulated as a mixed integer linear programming model, is as follows:

min L L tijZij + L fiYi j

subject to LZij =Pj for every j

"" zoo - My· < 0 L-- 'J • -for every i

j

Yi = 0, 1 for every i

Zij ~ 0 for every i, for every j

To this formulation, Geoffrion [9] added four simplifying assumptions (namely, uniformly distributed demand on a plane with demand density of pj uniform fixed costs, f, and uniform supply costs, s, for each warehouse regardless of its locationj outbound freight rates/mile, t for each warehousej and no throughput limits for any warehouse). He then used human expertise and mathematical manipulation to generate the following insight-generating simplified auxiliary model for the optimal number of warehouses, n·, for an area having A square miles of area:

n· = (A/3.05)(pt/1)2/3.

Artificial Intelligence For Model Analysis

F5 '0 0 Shlh.f5 0 0 0 o.

.o.u11ine: ... 0 GUll Seek-.. 0 Seenalla. MII'II;flf ... 0 Inalght In.l"hl(Contfnur:d]

C zsjj-- '0 0 0 O· 0 0 0 0 0 0 0 0 0 0 0 0 . - 0' 0 Ot-

0 0 0 0

250 . 0 11 -=I~ 1

t4g ; ---g'

57 1481

o 180 s.r

0 0 0 0' 0:

280. 0 0, 0, o. .o! 0 0 0.

il 0 ~ .

OJ

'~t-9

Figure 2 FORMULA Menu Showing INSIGHT Commands.

271

0 26O~

0 0 0, 0 0 0

260 1

259

272 CHAPTER 13

Figure 3 INSIGHT Dialog Box for Specifying Variable Names.

Artificial Intelligence For Model Analysis

Figul'e" INSIGHT Dialog Box for Specifying Dependent Variable and Its Range.

273

274 CHAPTER 13

Figure 1\ INSIGHT Display of Key Variables and Key Relations.


Our sample problem incorporates approximately the same set of four assumptions as Geoffrion used, within the limitation of a real-world setting. However, in our sample problem, demand is evenly distributed among the thirteen cities instead of being uniformly distributed throughout the plane. Thus, the case represents lumpy demand located in thirteen fairly centrally located (but not exactly equidistant) cities, where distances between cities are actual mileages. Our test case depicts the cities in Central Texas, an arbitrary locale chosen simply because a map showing city-to-city driving distances was handy at the time.

The sample facility location model was formulated as a 13 x 13 city mixed integer linear programming model using the What'sBest! [21] software package to solve specific instances. As a test of the INSIGHT tool, we generated and solved a set of 24 model instances, each with a different value of one or more of the following variables: total demand, p, warehouse-to-customer transportation rate, t, and/or warehouse fixed costs, f. Required solution time was approximately two hours for each model instance on a 80386SX microcomputer with a math coprocessor, the machine on which we did our testing.

Any instance which resulted in an optimal number of warehouses greater than one and less than thirteen was accepted as one of the instances to be analyzed by INSIGHT. In addition, we ensured that there were at least two model instances which depicted different values for each of the three model variables mentioned above.

The GMDH-based AIM algorithm cannot generate simple models for common terms such as l/z, .,fi and cos(z)j e.g., AIM generates an eighteenth degree polynomial to approximate cos(z), patently unsuitable for our simplified auxiliary model. To address this potential problem, we included in the AIM preprocessor a routine to automatically add l/z and .,fi for each of the independent input variables specified in Scenario Manager since these are common terms which might be potential components of any simplified auxiliary modelj a future enhancement to INSIGHT will include an optional capability for the user to specify such terms.

276 CHAPTER 13

6 RESULTS

Based on the analysis of 24 model instances, the INSIGHT system generated the following simplified auxiliary model for our sample problem:

n ~ (pt/f)

Note that this model uses the same three key variables as used in the Geoffrion model in an even more simplified form (i.e., no variable is raised to a power other than unity). This single term explained 92% of the total variation from average of the optimal number of warehouses; i.e., R2 :::: .92. A side-by-side comparison of Geoffrion's results and those of INSIGHT is shown in Table 1.

Also shown in Table 1 is an indication of the insensitivity of the INSIGHT results to the number of instances analyzed. In this model, at least, the INSIGHT results degrade fairly gracefully (with respect to R2) down to the 10-15 instance range. This indicates that only a modest number of intelligently selected instances are required to generate the insightful results shown. Actual results in other models would depend on the model, modeling paradigm, specific instances, complexity of relationship, etc.

Thus, the INSIGHT tool, using concepts of artificial intelligence as applied to the analysis of multiple model instances, was able to duplicate the insightgenerating simplified auxiliary model produced by Geoffrion without using human expertise or mathematical manipulations based on simplifying assumptions as did Geoffrions's analysis. This provides a basic validity test of the concepts and implementation of the INSIGHT tool.

7 RESEARCH DIRECTIONS

There are several areas for future research suggested by the INSIGHT prototype. First, we need to explore other pattern recognition technologies which would provide potential key relations. Such technologies might include selforganizing nonlinear regression techniques based on something other than third degree polynomial representations, or other self-organizing polynomial techniques which employ quality measures more conducive to filtering out unimportant terms and keeping only key terms.

A second research direction might include investigations of potential applications of the INSIG HT methodology, not only in generating key factors and rela-


tions for analysis in a decision support system, but also to suggesting relationbased heuristics for solving "tough" problems in management science literature, such as machine layout problems, specific traveling salesman problems, etc. Such problems might be solved and explained more readily and/or efficiently by the simplified auxiliary model produced by the INSIGHT methodology.

A third research direction might include the design and execution of an experiment which would test the ability of the INSIGHT system to enhance the ability of decision makers to generate insights in a wide variety of real world problems.

Table 1 Test Problem Results and Comparison.

Geoffrion

Method: Human Expertise Mathematical Knowledge Mathematical Simplification Mathematical Manipulation

Relation Generated: n· ~ (dt/1)2/3

INSIGHT

Method: Artificial Intelligence Statistical Software More Realistic Assumptions Multiple Instances

Relation Generated: n· ~ dt//

Results from INSIGHT by Number of Instances

# Instances 15 24 48

REFERENCES

R2 for Relation .88 .92 .94

[1) AbTech, 1990. Abductory Inductive Models-User's Manual. AbTech, Inc., Charlottesville, VA.

[2] R. L. Barron, A. N. Mucciardi, F. J. Cook, A. R. Barron and J. N. Craig, 1984. Adaptive Learning in Networks: Development and Applications in the U.S. of Algorithms Related to GMDH. In S. J. Farlow (Ed.)

278 CHAPTER 13

Self-Organizing Methods in Modeling: GMDH Type Algorithms. Marcel Dekker, New York, 25-66.

[3] J.J. Brennan and J.J. Elam, 1986. Understanding and Validating Results in Model-Based Decision Support Systems. Decision Support Systems 2, 49-54.

[4] Complementary Solutions Inc., 1992. Automate Anytime User Guide. Atlanta, GA.

[5] J.J. Elam and B. Konsynski, 1987. Using Artificial Intelligence Techniques to Enhance the Capabilities of Model Management Systems. Decision Sciences 18:3, 487-501.

[6] EXECUCOM, 1992. Interactive Financial Planning System: User's Manual. EXECUCOM, Austin, TX.

[7] S. J. Farlow (Ed.), 1984. Self-Organizing Methods in Modeling: GMDH Type Algorithms. Marcel Dekker, New York.

[8] W.J. Frawley, G. Piatetsky-Shapiro and C.J. Matheus, 1992. Knowledge Discovery in Databases: An Overview. AI Magazine. Fall, 1992, 57-70.

[9] A.M. Geoffrion, 1976. The Purpose of Mathematical Programming Is Insight, Not Numbers. Interfaces. 7:1, 81-92.

[10] H.J.Greenberg, 1983. A Functional Description of ANALYZE: A Computer Assisted Analysis System for Linear Programming Models. ACM Transactions on Mathematical Software. 9:1, 18-56.

[11] H.J. Greenberg, 1988. ANALYZE Rulebase in G. Mitra (ed.). Mathematical Models for Decision Support. Springer-Verlag, Berlin, 229-238.

[12] H.J. Greenberg, 1990. A Primer of ANALYZE, Working Paper, University of Colorado at Denver.

[13] F.S. Hillier and G.L. Lieberman, 1990. Introduction to Operations Research (5e), Holden-Day, Inc. Oakland, CA.

[14] S.O. Kimbrough, S.A. Moore, C.W. Pritchett and C.A. Sherman, 1992. On DSS Support for Candle-Lighting Analysis. Transactions of DSS-92, 118-135.

[15] D.W. Kosy and B.P. Wise, 1984. Self-explanatory Financial Planning Models. Proceedings of the National Conference of Artificial Intelligence, 176-181.


[16] D.W. Kosy and B.P. Wise, 1986. Overview of Rome: A Reason-Oriented Modeling Environment in L.F. Psu (ed.), Artificial Intelligence in Economics and Management. Elesvier Science Publishers, North-Holland, 21-30.

[17] W.G. Kurator and R.P. O'Neill, 1980. PERUSE: An Interactive System for Mathematical Programs. ACM Transactions on Mathematical Software 6:4, 489-509.

[18) Microsoft, 1992. Microsoft Excel User's Guide 2 (Version 4.0). Microsoft Corporation.

[19] M. H. Prager, 1988. Group Method of Data Handling: A New Method for Stock Identification. Transactions of the American Fishery Society 117, 290-296.

[20) Y.V. Reddy, 1985. The Role of Introspective Simulation in Managerial Decision Making. DSS-85 Transactions, IADSS, University of Texas at Austin, 18-32.

[21] S. L. Savage, 1992. The ABC's of Optimization Using What's Best! LINDO Systems Inc, Chicago.

[22) A. Saltelli and T. Homma, 1992. Sensitivity Analysis for Model Output. Computational Statistics and Data Analysis 13, 73-94.

[23) A. Saltelli and M. Marivoet, 1990. Non-parametric Statistics in Sensitivity Analysis for Model Output: A Comparison of Selected Techniques. Reliability Engineering and Systems Safety 28, 220-253.

[24) H. M. Wagner, 1993. Global Sensitivity Analysis. To appear.

14 SOLVING QUADRATIC ASSIGNMENT

PROBLEMS USING THE

REVERSE ELIMINATION METHOD

ABSTRACT

Stefan VoB

Techni6che Hochlchv.le Darmltadt, FB 1 / FG Operationl Re,earch,

Hochlchv.lItrape 1, D-6,j!89 Darmltadt, Germany

The quadratic assignment problem (QAP) is among the moat commonly encountered combinatorial optimisation problems. Recently, various tabu search implementations have been proposed to solve the QAP efficiently. These approaches mainly investigate tabu list management ideas that do not take account of logical interdependencies deriving from the sequence in which solutions are generated. Here we apply different versions of the reverse elimination method (REM), a dynamic strategy that explicitly incorporates logical interdependencies. We also introduce a new type ofintensification atrategy hued on a clustering approach and combine it with some diversification ideas. Computational results are reported for a large number of benchmark problems up to the dimension of 128. Our version of REM improves on some of the best known results and matches them for moat of the remaining problems.

1 INTRODUCTION

The quadratic assignment problem (QAP) is to find an assignment ofn objects to n locations to minimile the cumulative product of flow costs and distances where flow occurs between pairs of objects and distances are measured between pairs of locations to which objects are assigned.

More formally, we are given a distance matrix D = (~i)"x" describing the distances between pairs oflocations and a cost matrix C = (c",) .. XR describing the pairwise flow costs between pairs of objects. Then the QAP is to find a

281

282 CHAPTER 14

permutation '/I' of the set {I, ... , n} to minimize the quantity

.. " I: I: ~j • c,..(i)r(j) (1) i=1 j=1

Equivalently, the QAP may be modeled by using binary variables Zip which take the value 1 if object p is assigned to location i and take the value 0 otherWlSe:

Minimize

subject to

" " 11 "

Z(z) = I: I: I: I: doj . Cpq • Zip· Zjq

i=1j=1p=1q=1

" I: Zip = 1 p= 1, ... ,n i=1 .. L Zip = 1 i = 1, ... , n p=1

zip E{O,I} i,p=I, ... ,n

(2)

(3)

(4)

(5)

The QAP is NP-hard, encompassing the classical traveling salesman problem as a special case, and hence there is an obvious need for good heuristic algorithms. A large number of construction as well as improvement procedures have been proposed in the literature. For an extensive survey on the QAP the reader is referred to, e.g., Finke, Burkard and Rendl [11] and the references cited therein.

Among the improvement procedures the most widely known is CRAFT, developed by Buffa, Armour and Vollmann [3]. Its principle is to start with a feasible solution and to try to improve it by successive interchanges of single assignments as long as improvements in the objective function are obtained.

The main drawback of algorithms like CRAFT is their inability to continue the search upon becoming trapped in a local optimum. This suggests consideration of recent techniques for guiding known heuristics to overcome local optimality. The approaches that have been applied with the greatest success to the QAP are simulated annealing (see e.g. William and Ward [19]) and tabu search (see

Solving Quadratic Assignment Problems 283

Skorin-Kapov [15], Taillard [16], Battiti and Tecchiolli [1, 2], Fiechter, Rogger and de Werra [10], Chakrapani and Skorin-Kapov [5, 6]). Genetic algorithms have also been applied to the QAP by Nissen [14] and to a version of the QAP with associated area requirements by Tam [17].

Tabu search as a metastrategy for guiding a heuristic, such as CRAFT, heavily depends on the underlying tabu list management. In contrast to previous methods of tabu list management for the QAP, we investigate a method that dynamically exploits logical interdependencies between attributes used to determine tabu status. This method, called the reverse elimination method, has been successfully applied to the multiconstraint zero-one knapsack problem (see Dammeyer and VofJ [8]) and the quadratic semi-assignment problem (see Domschke, Forst and VofJ [9]). Furthermore, a clustering approach is proposed that helps to reduce the computational effort within the procedure.

The paper is organized as follows. In Section 2, we present an outline of the reverse elimination method adapted to the QAP. Section 3 describes computational refinements that prove to be successful as a strategy for search intensification. Computational results for a large number of benchmark problems up to the dimension of 128 are reported in Section 4. For some of the larger problems we show that the best known objective function values from the literature may be improved. Finally, some conclusions are drawn in Section 5.

2 REVERSE ELIMINATION METHOD

Tabu Search

Many improvement procedures are characterized by identifying a neighborhood of a given solution which contains other (trall.l/ormed) solutions that can be reached in a single iteration. A transition from a feasible solution to a transformed feasible solution is referred to as a move. A starting point for tabu search is to note that such a move may be described by a set of one or more attributes, and these attributes (properly chosen) can become the foundation for creating an attribu.te based memory. For example, in a zero-one integer programming context these attributes may be the set of all possible value assignments or changes in such assignments for the binary variables. Then two attributes e and e, which denote that a certain binary variable is set to 1 or 0, may be called complementary to each other. Following a steepest descent/mildest ascent approach, a move may either result in a best possible improvement or a

284 CHAPTER 14

least deterioration of the objective function value. Without additional control, however, such a process can caule a locally optimal solution to be re-visited immediately after moving to a neighbor, or in a future stage of the search process, respectively.

To prevent the search from endlessly cycling between the same solutions, the attribute based memory of tabu search is structured to provide a memory junction, which may be visualized to operate as follows. Imagine that the attributes of all moves are stored in a list, named a running lilt. representing the trajectory of solutions encountered. Then, related to a sublist of the running list a so-called tabu lilt may be introduced. Based on certain restrictions the tabu list implicitly keeps track of moves (or, more precisely, salient features of these moves) by recording attributes complementary to those of the running list. These attributes will be forbidden from being embodied in moves selected in at least one subsequent iteration because their inclusion might lead back to a previously visited solution. Thus, the tabu list restricts the search to a subset of admissible moves (consisting of admissible attributes or combinations of attributes). The goal is to permit 'good' moves in each iteration without re-visiting solutions already encountered.

For a background on tabu search and a number of references on successful applications of this metaheuristic see, e.g., Glover [12] and Glover and Laguna [13].

Dynamic Tabu List Management

Tabu lilt management concerns updating the tabu list, i.e., deciding on how many and which moves have to be set tabu within any iteration of the search. In the sequel, we describe the reverse elimination method (REM) as a dynamic strategy for managing tabu lists (d. Glover [12] and Dammeyer and VoS [8]). The primary goal of REM is to permit the reversion of any move but one between two solutions to prevent from re-visiting the older one.

REM starts from the observation that any solution can only be re-visited in the next iteration if it is a neighbor of the current solution. In each iteration the running list is traced back to determine all attributes which have to be set tabu (since their inclusion would lead to an already explored solution). For this purpose, a re.idual cancellation .equence (ReS) is built up stepwise by bacing back the running list. In each step exactly one attribute is processed, from last to first. After initializing an empty ReS, only those attributes are added whose


RllDDins liat: 111731564115 (lateat move: 5) number of iterationa: 10

tracing atep attribute position RCS length tabu move in rIIIIIlins liat ofRCS

1 10 5 1 !;

2 9 115 2 -3 8 4115 3 -4 7 45 2 -5 6 4 1 4 6 5 14 2 -7 4 314 3 -8 3 7314 4 -9 2 117314 5 -10 1 11734 4 -

Fisurc 1. An example for RCS-development

complements are not in the sequence. Otherwise their complements in the Res are eliminated (i.e. cancelled). Then at each tracing step it is known which attributes have to be reversed in order to turn the current solution back into one examined at an earlier iteration of the search. If the remaining attributes in the Res can be reversed by exactly one move then this move is tabu in the next iteration. An illustration of the REM approach (for the case that every move consists of exactly one attribute, i.e., the length of an Res has to become equal to 1 to enforce a tabu move) is given in Figure 1.

Obviously, the execution of REM provides a necessary and sufficient criterion to prevent re-visiting known solutions. In theory, the choices made could lead the method into a corner where the only escape is to revisit a previously encountered solution. Its occurrence, however, does not deter the method, since upon making recourse to supplementary tabu criteria in such a situation, as described below, the method continues unabated. (Another method to escape from such a so-called black hole solution is to follow a simple backtracking approach). Because the computational effort of REM increases as the number of iterations increases, ideas for reducing the number of computations have been developed (cf. [12,8]).

One of the crucial parameters of REM is the number of moves stored explicitly within the running list or, equivalently, the number of tracing steps when building an Res. Whenever this parameter ta is greater or equal to the number of moves performed up to a ceriain iteration of the search, REM retains its sufficiency and necessity property. Otherwise, the search may allow re-visiting

286 CHAPTER 14

Running list: 3335 55 53 2224 44 42 11 14 24 21 14 12 42 44 (latest move (14,12/42,44) ) number of iterations: 4, solution ... = (2,1,5,4,3)

tracing step attribute position ReS length tabu move in running list of ReS

1 13, 14, 15, 16 14 12 42 44 4 6wap(I,4) 2 9, 10, 11, 12 11 24 21 12 42 44 6 -3 5, 6, 7, 8 22 Ii 21 12 4 6wap(I,2) 4 1,2,3,4 33 35 55 53 22 Ii 21 12 8 -

Figure 2. An example for ReS-development

previously encountered solutions under conditions where this could be avoided. Nevertheless, for reason of computation times one may prefer to keep ta from becoming large. When the number of iterations iter is greater than ta we therefore include further ideas of tabu search to prevent the search from cycling, as described in the next section.

QAP Moves

In what follows we relate specific elements of tabu search to the QAP. A transition from one feasible solution to another one needs two exchanges within the binary matrix (Zip)nxn such that the assignment conditions remain valid. Accordingly moves are paired-attribute moves, in which attributes denote both the assignments and the type of exchange, i.e., selection for or exclusion from the (actual) solution. More specifically, a move may occur by swapping or exchanging two objects such that each is placed on the location formerly occupied by the other. Given a permutation '/I' of {I, ... ,n}, swapping objects on positions i and j results in a permutation '/1", i.e., '1I"(i) := '/I'(j), '1I"(j) := '/I'(i), '1I"(k) := '/I'(k)'v'k =f. i,j. Equivalently, with respect to the binary variables, the move corresponds to setting Zip := 0, Ziq := 0 and Zip := 1, Z'q := 1 indicating that object p is moved from position i to position j and q from j to i, correspondingly. For short, a move may be expressed by swap(i,j). More carefully, we may use a representation (ip, iqfjq, jp) with four attributes. Fig. 2 gives an example for the corresponding ReS-development when starting with a solution '11' = (1,2,3,4,5) for a QAP with n = 5 and having performed the following four moves: swap(3, 5), swap(2, 4), swap(l, 2), and swap(1,4).


Tabu search customarily selects best admissible moves, and a value val.".(i,i) for swapping objects on positions i and i of a given permutation 11" has to be examined for each move (resulting in permutation 11"). For the QAP with symmetric matrices and zero diagonals this change in cost may easily be calculated as: .. ..

val ... (i,i) := L L(du . C ... (h)1r(I:) - dhl: . e,..'(h)""(I:»)

h=ll:=l

= 2· L (dil: - dol:) • (c"'(i) ... (I:) - C ... (i) ... (I:»)

I:¢i,i

(6)

Further improvements in the effort of calculating move values may be obtained by storing all val ... (i,i) values and updating them in constant time. Therefore, the overall effort for evaluating the neighborhood of a given solution is O(n2 ).

3 INTENSIFICATION AND DIVERSIFICATION-A

CLUSTERING APPROACH

Intensification

Intensification strategies are proposed in tabu search as a way to encourage and reinforce the choice of good solution attributes, where 'good' is understood to be a conditional concept that depends on the region considered. Hence one of the natural forms of intensification strategies is to retain and compare good solutions from the past, grouped by some criterion of regionality, as a basis for deciding which attributes and/or attribute combinations should qualify as good. A strongly focused intensification strategy of this type, which compels a favored subset of good attributes to remain in the solution, or which limits consideration of potential moves to those that include at least one good attribute (for instance from a union of good solutions, together with a current aspiration criterion to admit other options) can reduce computation time by shrinking the space of possibilities examined.

Another type of intensification strategy, which takes a less restrictive form, is simply to make recourse to the recorded pool of good solutions by selecting members to restart the search at future stages of search. In this case, an empty running list accompanies the restart, allowing the search to more thoroughly examine the vicinity of the chosen good solution, and to choose a different path

288 CHAPTER 14

for going beyond this vicinity than previously selected. This approach will be described in more detail below, where we refine it by introducing a special clustering strategy.

Diversification

Search diversification often occurs in tabu search by using a long term memory to penalize frequently selected assignments. Then the neighborhood search can be led into unexplored regions where the tabu list operation is restarted (resulting in an increased computation time).

An obvious diversification strategy is to extend the tabu list by setting additional attributes tabu beyond those necessary to prevent older solutions from being re-visited. This may be done, for instance, by combining REM with a static tabu search approach or applying the cancellation sequence method (see e.g. Dammeyer, Forst and VoH [7]). Another possibility is to enlarge the duration of being tabu for any attribute encountered throughout tracing back the running list. Therefore, a tabu duration td ~ 1 is defined indicating that any move is tabu for td iterations (REM by itself corresponds to td = 1).

Another appealing opportunity for search diversification is created by REM as follows (cf. VoH [18]). Let t > 1 be integer. If at any tracing step the attributes that have to be reversed to turn the current solution back into one already explored equal exactly t, then it is possible to set these attributes tabu for the next iteration. Note that for the case of multi-attribute moves, due to various combinations of attributes in moves, even more than t attributes may be set tabu in order to avoid different paths through the search space leading to the same solution. Setting more than t attributes tabu provides an evident strategy for search diversification.

To have t = 2 means that all neighbors common to the current solution and one solution already explored are forbidden. These neighbors were implicitly investigated during a former step of the procedure (due to the choice of a best non-tabu neighbor) and need not be looked at again. Therefore, REM2 retains the nice property of being a necessary and sufficient criterion as mentioned above without having to fear that some solutions would not be encountered, as may be the case for t > 2.

For larger values t an aJ/piration criterion needs to be included. This may be, for instance, a global criterion where a tabu status is suspended whenever the


best known objective function value found so far within the search may be improved.

A Clustering Approach

Whenever search intensification is performed based on a promising solution the idea is to perform those moves that will keep the permutations within the near vicinity of the respective solution. If search intensification is performed several times throughout the search it may be advantageous for the starting solutions of the intensification phase to be different from each other. In order to diversify the different phtues of search inte1&llification, some permutations that have been visited within the neighborhood search may be stored as possible candidates for a restart in future iterations. To obtain reasonable differences in new starting solutions, we propose that these solutions be stored by a clustering approach.

For any two solutions 11' and 11" we may define some measure of similarity .6.(11',11"). Whenever .6.(11',11") is less than or equal to a given barrier value, these solutions will be considered to belong to the same class of solutions, and otherwise will be considered to belong to different classes. Any time the search is restarted, a starting solution is selected from a class that has not previously been used for intensification purposes, eliminating the chosen solution from the respective class.

Consider the following measure of similarity between two permutations'll" and '11"', based on the hamming metric:

.. .6.(11',11"):= L~

i=l

where ~ := 1 if 'II"(i) '# 'II"'(i) and ~ := 0 otherwise.

(7)

To initialise these classes we apply the following approach. Each class c consists of at most Cm .. 11 elements and the number of classes is bounded by a number tc. Each nonempty class is represented by a permutation 'lI'e with the best objective function value among all elements of that class. Such a solution may be referred to as an elite solution. Any solution 'lI' of the search which lies within .6."eo& % of the overall best solution found so far is a candidate for being included into one of the solution classes. Two cases may occur.

At first, given a parameter .6.tJi./I E (0,1), if .6.('11", 'll"e) ::; .6.tJi./f . n for any class representative'll'e then 'lI' is included into class c. If'll" becomes representative of

290 CHAPTER 14

c then we successively check whether any two classes have to be merged whose representatives have a similarity that is below or equal to D.diJ J . n. Second, if 7r is not thus included within any of the tc classes then it will constitute a new class, eventually replacing that class whose representative has the worst objective function value among all classes.


Burkard, Karisch and Rendl [4J describe a collection of known problem instances with sizes n ;::: 8. Most of these problems are taken from the literature while others were generated by some researchers for their own testing purposes. (An update from February 1993 giving some corrections and including optimal or current best known solutions is available from the respective authors.)

For reason of space we shall not present results for all of the problems for various combinations of parameters. Instead we highlight some of our results for specific parameter values in Tables 1 and 2. In these tables each data set is identified by a name abbreviation as given in [4J together with a number indicating the problem size n. All programs have been coded in Pascal and run on a 486 personal computer (33 MHz). For small problems with up to n = 64, both REM and REM2 seem to find the optimal or best known solution in nearly all cases. Table 1 shows the results for some of the most widely studied problems by, for instance, Nugent, Vollmann and Ruml (see the above mentioned collection of problem instances) as found by REM. The values in column solution give the results found by REM whereas iter" indicates the iteration number where the solution value has been found. Furthermore, problem ESC 128 with n = 128 is included in Table 1 for the sake of interest.

Skorin-Kapov [15J and Chakrapani and Skorin-Kapov [6J describe some large scale data (42 ~ n ~ 100) where the distances of the problems are rectangular and the flow values are pseudorandom numbers. Table 2 presents the results for the eight largest of these problems again presenting the best known feasible solution together with the results obtained by the reverse elimination method. For some of these test problems we were able to improve on the best previously known feasible solutions (see Chakrapani and Skorin-Kapov [6]). The permutations for the large scale problems are presented in Appendix A. Note, that we have chosen those results for REM with smallest ta values possible indicating that the clustering approach plays a major role in obtaining good solutions. Furthermore, the same solution values have been obtained quite frequently for


Table 1 Numerical results for some bcnchmlU"k problClIUl

I problem I best known solution I solution I iter· ta

ROU 20 725522 725522 2097 1000 NUG 20 2570 2570 1106 1000 NUG 30 6124 6124 2315 1000 KRA 30a 88900 88900 2471 1000 KRA 30b 91420 91420 13243 1000 STE 36a 9526 9526 18871 1000 STE 36b 15852 15852 23454 1000 ESC 128 84 64 684 5

similar values of iter when varying tao For larger ta values, however, iter" may increase (see, e.g., the results for problem SKO 81). The CPU-times for the first 1000 iterations are within the range of 132 seconds for SKO 81 and 200 seconds for the data sets of size n = 100. For the data sets of Table 1 times are, e.g., 9 seconds for n = 20 and 19 seconds for n = 30 for 1000 iterations indicating that the the neighborhood evaluation dominates the CPU-times.

The results in both tables have been obtained by using the following parameters:

tc = 5 c... .... = 10 l1d.iJ J = 0.3 116ut = 0.3

maximum number of classes (cf. the clustering approach) maximum number of elements in any class parameter for measuring similarity parameter for measuring similarity

The intensification phases have been included into the search as follows. Whenever there was no improvement in the objective function value for 10· n iterations, we switched over to intensification using the clustering approach. From each class we successively chose the best and the second best solution and eliminated it from that class. For each of these solutions we performed the search accompanied by an empty running list for exactly 10 . n iterations. Then the search switched back to the original proceeding, again.

To summarize our numerical results, all solutions we obtained at least match, or come very close to matching, the best published objective function values. Some of the best known results presented in the literature have been improved. The iteration values iter· (to obtain the solution with best objective function value) and iter are quite large, especially for the data sets of size n = 100,

292 CHAPTER 14

Table 2 Numerical results for the Skorin-Kapov data

I problem I best known solution I solution I iter· iter ta

SKO 81 91008 91008 78238 100000 1000 91008 7520 150000 55

SKO 90 115534 115534 277364 300000 20 SKO 100a 152014 152002 115922 150000 30 SKO 100b 153900 153890 132273 200000 20 SKO 100c 147868 147862 243000 350000 14 SKO 100d 149596 149586 256890 300000 48 SKO 100e 149156 149158 127559 200000 44 SKO 100f 149036 149036 180042 200000 5

but they are comparable with other implementations (see e.g. Chakrapani and Skorin-Kapov [6]). For larger values of n the parameter setting seems to become more important than for small problems.

5 CONCLUSIONS

In this paper we have developed and applied an effective dynamic tabu search strategy to the quadratic assignment problem. Empirical tests disclose that our procedure obtains very good outcomes for benchmark problems from the literature.

Our approach may be viewed as a hybrid between strategies based on logical interdependencies and strategies based on heuristic considerations. The success of our approach suggests the potential merit of investigating additional hybrids of dynamic strategies, such as those based on the cancellation sequence method (see Glover [12] and Darnmeyer, Forst and Vofi [7]) and on combinations of this approach with the REM approach.

Acknowledgements

The author would like to thank Jadranka Skorin-Kapov as well as Stefan Karisch and Franz Rend! for providing the data sets used throughout this study.


The comments of Fred Glover, which helped to improve the readability of the paper, are greatly appreciated.

APPENDIX A

BEST FOUND SOLUTIONS

In this appendix the best found solutions of some of the data sets are given explicitly.

294 CHAPTER 14

problem I solution I permutation

SKO 1008 152002 11' - _~17, 76,7,44,48,23,29,21,35,3,41,53,36,92,38,61,72,12, 50,45,90,64,46,99,47,26,100,66,70,58,20,69,51, 73,30, 85, 67,40,24,10,87,84,82,81,56,93,8,68,59,42,28,2,49,19, 89,96,98,86,9,60,15,57,34,32,22,4,52,16,62,88,6,77, 14, 43, 31, 75, 94, 54, 80, 27, 91, 79, 63, 25, 95, 74, 55, 13, 5, 39, 18, 37, 78, 1, 97, 71, 33, 11, 83, 65)

SKO 100b 153890 11' - _~80, 55,61, 15,63, 14,76,9,30, 23,94, 93, 29, 11, 26, 1, 21,45, 43,75,24,82,52,2,22,12,66,36,72,90,84,37,35,87,32,5, 85,58,38,92,50,59,28,4,97,99,17,49,56,53,25,3,42,10, 96,74,68,34,63,71,16,79,67,81,48,86,33,64,91,70,69,89, 65, 95, 46, 19, 54, 62, 73, 27, 6, 51, 8, 98, 20, 44, 7, 100, 78, 31, 77,13,18,57,39,40,41,47,80,88)

SKO lODe 147862 'II' _ (64, 71, 12, 78,43, 18,87, 13, 66, 1, 76, 55, 38, 59, 22, 49, 40, 79, 97,85,37,26,36,41,92,42,3,45,21,20,46,6,29,52,84,54, 25,15,33,63,74,80, 14, 19, 34, 16, 56,30, 35,65, 95, 31,61,83, 91, 88, 48, 93, 7, 10,70, 47, 2, 17, 100, 50,98, 68, 77, 9, 69, 60, 5, 32, 96, 27, 73, 99, 24, 86, 39, 4, 57, 11, 75, 90, 81, 58, 8, 67, 62, 94, 72, 89, 82, 44, 51, 53, 23, 28)

SKO 100ci 149586 'II' - _~85, 70, 48, 47, 53, 7,31, 57, 19, 42, 45, 18, 6, 21, 17, 16, 9, 72, 80,15,26,14,98,37,69,71,5,39,44,97,62,20,90,92,63, II, 29, 65, 32, 13, 27, 56, 10, 22, 35, 43, 23, 2, 95, 82, 68, 8, 99, 40, 24, 50, 100, 12, 41, 94, 55, 28, 38, 93, 88, 79, 64, 87, 52, 25, 36, 81, 91, 89, 84, 46, 83, 78, 86, 75, 77, 66, 51, 73, 59, 30, 74, 58, 3, 76, 96,61,4,67,60,49,33,54,1,34)

SKO lODe 149158 ... -_~13, 35, 3D, 84, 5, 15, 52,24, 16, 65,6, 93, 31, 67,90, 76, 8, 7, 17,38,3,68,25,4, 2, 29, 87, 18, 73,46,40,49,96,95,94, 97, 20,36,80,81,9,42,83,82, II, 89, 68,66, 77,67, 54, 43, 12, 64, 60, 66, 37, 48, 78, 88, 76, 47, 14, 69, 92, 99, 10, 44, 32, I, 74, 23, 98,55, 27, 72, 28, 50, 71, 86, 26, 62, 34, 70, 79, 91, 85, 51,59,45, 61,22,39,21,19,100,33,53,41,63)

SKO lOot 149036 ... _ (92,94,53,62,67,39,22,27,17,78,63,20,18,82,36,34,1,48, 75, 59, 96, 4, 12, 45, 26, 44, 14, 70, 98, 64, 33, 81, 72, 76, 91, 56, 41,46,19,42,28,23,87,84,9,47,11,88,55,31,13,51,89,49, 95, 83, 37, 24, 10, 40, 54, 65, 5, 25, 74, 35, 50, 38, 52, 58, 77, 99, 100, 85, 71,60, 8, 6, 30, 15, 80, 57, 29, 7, 68, 93, 79, 66, 43, 90, 97, 32, 3, 21, 16, 69, 61, 2, 86, 73)

REFERENCES

[1] R. Battiti and G. Tecchiolli, 1992. Parallel biased search for combinatorial optimization: genetic algorithms and tabu search. Microprocessors and Microsystems. 16, 351-367.

[2] R. Battiti and G. Tecchiolli, 1992. The reactive tabu search. Preprint, Department of Mathematics, University of Trento.


[3] E.C. Buffa, G.C. Armour and T.E. Vollmann, 1962. Allocating facilities with CRAFT. Ha1'1Jard. Business Review, 42, 136-158.

[4] R.E. Burkard, S. Karisch and F. Rendl, 1991. QAPLIB-a quadratic assignment problem library. European Journal of Operational Research, 55, 115-119.

[5] J. Chakrapani and J. Skorin-Kapov, 1992. A connectionist approach to the quadratic assignment problem. Computers fJ Operations Research, 19, 287-295.

[6] J. Chakrapani and J. Skorin-Kapov, 1993. Massively parallel tabu search for the quadratic assignment problem. Annals of Operations Research, 41, 327-341.

[7] F. Dammeyer, P. Forst and S. VoS, 1991. On the cancellation sequence method of tabu search. ORSA Journal on Computing, 9, 262-265.

[8] F. Dammeyer and S. VoB, 1993. Dynamic tabu list management using the reverse elimination method. Annals of Operations Research, 41, 31-46.

[9] W. Domschke, P. Forst and S. VoB, 1992. Tabu search techniques for the quadratic semi-assignment problem. In: G. Fandel, T. Gulledge and A. Jones (eds.), New Directions for Operations Research in Manufacturing (Springer, Berlin), pp. 389-405.

[10] C.N. Fiechter, A. Rogger and D. de Werra, 1992. Basic ideas of tabu search with an application to traveling salesman and quadratic assignment. Ricerca Operativa, 62, 5-28.

[11] G. Finke, R.E. Burkard and F. Rendl, 1987. Quadratic assignment problems. Annals of Discrete Mathematics, 91, 61-82.

[12] F. Glover, 1990. Tabu search-part II. ORSA Journal on Computing, 2, 4-32.

[13] F. Glover and M. Laguna, 1993. Tabu search. In: C.R. Reeves (ed.), Modern Heuristic Techniques for Combinatorial Problems (Blackwell, Oxford), pp. 70-150.

[14] V. Nissen, 1993. A new efficient evolutionary algorithm for the quadratic assignment problem. In: K.-W. Hansmann, A. Bachem, M. Jarke, W.E. Katzenberger and A. Marusev (eds.), Operations Research Proceedings 199! (Springer, Berlin), pp. 259-267.

296 CHAPTER 14

[15] J. Skorin-Kapov, 1990. Tabu search applied to the quadratic assignment problem. ORSA Journal on Computing, R, 3~5.

[16] E. Taillard, 1991. Robust taboo search for the quadratic assignment problem. Parallel Computing, 17, 44~55.

[17] K.Y. Tam, 1992. Genetic algorithms, function optimization, and facility layout design. European Journal of Operational Re,earch., 63, 322-346.

[18] S. Yo!, 1993. Tabu search: applications and prospects. In: D.-Z. Du and P.M. Pardalos (eds.), Networlc Optimization Problem.: Algoritkm., Applicationl and Complezitll (World Scientific, Singapore), pp. 333-353.

[19] M.R. William and T.L. Ward, 1987. Solving quadratic assignment problems by 'simulated annealing'. lIE Tranlactionl, 19, 107-119.

ABSTRACT

15 NEURAL NETWORKS

FOR HEURISTIC SELECTION:

AN APPLICATION IN

RESOURCE-CONSTRAINED

PROJECT SCHEDULING Dan Zhu

Rema Padman

The Heinz School of Public Policy and Management Carnegie Mellon Univerlity

Pittlbu"h, P A 15213

The Resource Constrained Project Scheduling Problem (RCPSP) is concerned with the scheduling of a collection of precedence-related activitieB Bubject to constraints on resources and the objective of maximizing Net Present Value (NPV). It is a complex combinatorial optimization problem which findB wide application. Due to the intractable nature of thiB problem, many heuristic procedures have been developed to obtain near optimal solutions. Extensive studies have shown

that there is no single heuristic that dominates in all project environments. In addition, among the many problem parameters that describe project networks, the critical variables that may contribute to heuristic performance are not readily apparent. In this research, we explore the use of neural networks to induce the relationship between problem parameters and heuristic performance. lee

The objective is to select the best heuristic procedure or category of procedures for different project environments. We employ neural networks to summarize the information about project parameters, and to predict appropriate heuristic(s) for any given instance of the RCPSP. Baaed on an extensive empirical study, we report on appropriate problem representations, input and output preprocessing methods, and experiments with topology and parameters of the neural networks to enhance prediction accuracy and performance.

297

298 CHAPTER 15

1 INTRODUCTION

In this paper, we explore the application of Artificial Neural Networks (ANN) to aid the selection of the best heuristic algorithm for the resource-constrained project scheduling problem with cash flows (RCPSP). This problem is concerned with the scheduling of a collection of precedence related activities subject to constraints on available resources. Given the presence of cash flows in the form of outflows for project expenditures and inflows as payments for completed work, maximization of the project Net Present Value (NPV) is a desirable objective for the RCPSP. Due to the intractable nature of this problem [6], many heuristic procedures have been developed to obtain good solutions [2, 4, 13, 16]. However, it is not clear which among these many heuristics are most appropriate for any given instance of the RCPSP. Higher NPV and better schedules result from choosing the best set of heuristics.

Previous research on the comparison of some of these heuristic rules utilized regression models to give a general guide for heuristic selection [3, 15, 10, 13, 16]. However, difficulty in identifying the set of project parameters that summarize project characteristics combined with the varying performance of the heuristics under different project environments have contributed to the lack of success in providing reasonable guidance for heuristic selection. The only way to know which heuristic to choose and how good the performance will be is to apply the heuristic to the problem and see what happens [3]. Once a rule is selected, it will generally be used throughout the whole project. Such ad-hoc techniques used in scheduling large projects can result in significant monetary loss.

In this paper, we evaluate the ability of Artificial Neural Networks to help select good scheduling heuristics for the resource-constrained project scheduling problem with cash flows. Our study incorporates sixteen heuristic rules from [13] that includes nine optimization-based heuristic rules developed for the RCPSP with the objective of maximizing project net present value. These priority dispatching scheduling rules along with seven other heuristics from the literature were then embedded in a single-pass greedy algorithm and applied to a set of 1440 randomly generated problems describing 144 different project environments. The computational study in [13] indicated the existence of some complex relationships between the performance of various heuristic rules and the project characteristics such as network structure, levels of resource constrainedness and cash flow parameters.

Neural networks have been successful in pattern recognition, and the classification and selection process [9, 12, 17]. Previous studies have shown encouraging

Neural Networks in Project Scheduling 299

results on the application of neural networks to combinatorial problems such as the traveling salesman problem and vehicle routing model management systems, among others [9, 12, 18]. However, no attempt has been made to classify and predict project scheduling heuristic performance using neural networks. The availability of a large data set from [13] for training and testing in conjunction with the previous success of neural network applications in similar domains argues for the feasibility of this approach for selecting heuristics for the -constrained project scheduling problem.

In this study, we have designed and conducted a series of experiments on neural networks for heuristic selection for the RCPSP using a collection of problems from [13]. Significant effort has been devoted to data inspection, data preprocessing methods and data representation to enhance the system performance. Empirical results show that the system is successful in selecting an appropriate category of heuristics rather than a single best heuristic, for a majority of the randomly generated project scheduling problems. The integration of statistical, optimization, and neural network techniques that we have employed to address the problem of heuristic selection for the RCPSP is generalizable and has the potential to be applied successfully for other combinatorial optimization problems as well [19].

The paper is organized as follows. In Section 2, we describe the resource constrained project scheduling problem and the data on project characteristics and heuristics. Section 3 presents some data preprocessing techniques and data representation schemes employed in this study. Section 4 discusses the experimental design and analyzes the results. Section 5 concludes with a summary and discussion of future research.

2 DESCRIPTION OF PROBLEM AND DATA

The resource-constrained project scheduling problem (RCPSP) with cash flows can be formulated mathematically as a mixed integer nonlinear program [13, 16]. Given a project with m activities and n events, with activity durations d., (k = 1, ... , m), and cash flows Fi , (i = 1, ... , n), at the events discounted at the rate Ct, the formulation is as shown below. R~. represents the resource requirements for the set Z. of activities in progress at time s, (s = 1, ... , q), where q is the duration of the entire project. RL is the resource limit vector that is assumed to remain constant. The variable Ti indicates the scheduled time of each event i. The objective is to obtain the optimal time for scheduling

300 CHAPTER 15

each activity such that the project NPV is maximized subject to the constraints on precedence (1) and resources (2) .

maximize

subject to

.. LF.e-aTi

.=1

7i -11 ~ d,

Hz. :5 HL

AI= l, ... ,m

8 = l, ... ,q

(1) (2)

Figure 1 depicts an example project network with durations represented on the arcs and cash Hows at the nodes. Each activity requires three types of resources, as shown in the figure. The resource limit vector is [12, 12, 12]. The optimal schedule, indicated in the figure, achieves a project NPV of $5857.604. As the resource usage of each activity increases or the resource limit decreases, it is significantly more difficult to solve this example problem to optimality. In particular, for practical problems with hundreds, if not thousands, of activities, it is computationally impractical to attempt optimal solutions. Hence heuristic procedures have found extensive application in project-oriented manufacturing and service industries [2, 3, 10, 13, 14, 15, 16].

Table 1 presents a listing and description of 30 project and schedule characteristics assembled from the literature that may be relevant in the classifica.tion/prediction of the scheduling rules [3, 10, 13, 14, 15]. We assume that project networks can be characterized by a vector of these 30 variables which measure the complexity of a given project along several dimensions. These summary measures are used in subsequent neural network training.

In order to study the effects of project parameters on heuristic performance, we investigated various optimization-based heuristics and heuristics derived from the critical path method that have also been proposed in the literature [4, 10, 13, 16]. The sixteen heuristic rules examined in this study are the the ones discussed in [13], which are listed in Table 2. Some of these heuristic rules are proposed based on the same type of information such as opportunity cost, dual price, and cash How weight. Therefore, these sixteen heuristics can be roughly divided into the six different categories shown in Table 2. This information on categorization of heuristics is useful for output preprocessing which will be discussed in Section 3.

The data set used in this paper consists of 1440 randomly generated project networks [13]. These problems are generated from six experimental parameters including project size(size), project network structure(shape), resource


-2000 (1,1,2)

900 .... ".~ --- --~, -4000 (2,2,1) J)' 6 8 .~~(1'2'1) -11 09("" ': 6000 .... 8 3 0 3 6 (3,1,2) .-rJ

~. -4000 / "--:" 11, :-: 3 03 8 , (9,9,1) .... / -2000 (3,4,4) ......... , .... 5 ~ -3000 13500

~ ...... ~" -2000 ~ 2 3 ,,..~ ., ..... 1 o 2 \ 2 5 ,.... . .•. -. .~ 11 12

\ 4,'

Fi

o Ti

\ 7 11,: (8,7,3) (2,5,1) :

(rl,f2, (3) ~

dk

~ -1000 /

~ 7~ 2 '£:~" -~850 (6,4,6) 7 '" 10

... , 3

7 (2,3,5)

Figure 1 Example Project Network

302 CHAPTER 15

Table 1 Project Characteristics

Variable Description

NNODE N wnber of nodes to be scheduled

S NARC Number of arcs (activities) to be scheduled

I NDUMMY N umber of dwnmy activities

Z XDUR Average activity duration

E XDENSITY Average activity density

COMPLEXITY Project complexity

C CPL Critical path length

P NSLACK Total slack of all activities

L PCTSLACK Percent of activities possessing positive total slack

I XFREESLK Average total slack per activity

B TSLACK_R Total slack ratio

A XSLACK_R Average slack ratio

S PDENSITY Project density - Total

E TFREESLK Free slack of all activities (Johnson)

D NFREESLK N wnber of activities possessing positive free slack

PCTFREESLK Percent of activities possessing positive free slack

XFREESLK Average free slack per activity

PDENSITY_F project density - free(pascoe)

R UTILI Utilization of resource I (Rl)

E UTILl Utilization of resource 2 (R2)

S UTIL3 Utilization ofresource 3 (R3)

0 XUTIL Average resource utilization

U TCONI Resource constrainedness over time for R I

R TCON2 Resource constrainedness over time for R2

S TCON3 Resource constrainedness over time for R3

E XCON_TM Average resource constrainedness over time

CASH PROFIT_MARGIN Profit margin

FLOW- FREQ] Frequency of payment

BASED INT_RATE Interest rate

SHAPE NSTR Network structure


Table 2 Heuristic: Priority Rula and Descriptions

Heuristics Description.

I.OCS/LAN Adivities are selected for scheduling in ascending order of opportunity cost of scheduling. Adivities with zero tardiness penalties are scheduled acoonling to lowest activity number (LAN).

OPPORTUNITY 2.OCR!LAN Adivities are scheduled in asceoding order of the opportunity COST cost of resources, with zero tardiness penalty activities sched-

uled according to LAN.

3.NOC/LAN Adivities are scheduled in increasing order of net opportunity cost which balances the cost of delaying some activities against scheduling others. Zero tardiness penalty activities are sched-uled according to LAN.

4. LFT/LAN Priority is given to the activities with the minimum latest finish

DURATION- time as determined by critical path analysis. LAN is used to break ties between the activities.

BASED S.MS/LAN Select activity with the smallest slack. where the critical path is

continuously revised as each activity is scheduled. LAN i. u.ed to break ti ...

6. TS/LAN Select activity with the maximum difference between current

TARGET early finish time and optimal finish time as given by the relax-ation model. LAN is used to break tie •.

7. DUAIJI'S Activities with the highest dual price are selected for scheduling. SCHEDUUNG Ties are broken using TS rule.

8. TS/DUAL Select activities with maximwn difference between current early finish time and optimal finish time as given by the relaxation model. Maximum dual price is used to break ties_

RANDOM 9. RAND-SO Random selection of eligible activities is chosen for scheduling, with the best of 50 replications being reported.

IO.MTP/LAN Activities are selected from the schedule queue in descending order of their tardiness penalties. Activities with zero tardiness penaltie. are scheduled acoording to LAN.

DUAL 12.LTP/LAN Adivities are scheduled in ascending order of tardiness penal-

ties. Activities with zero tardiness penalties are scheduled PRICE according to LAN.

IS.MTP/ET Same as MTP/LAN, except that activities with zero tardiness penalties are schednled according to early time rule.

II.CFW/LAN Select activity with highest cash flow weight for scheduling. LAN i. used to break ties.

CASH FLOW I3.CFW-CFW Use highest cash flow weight activities for scheduling. Activities

WEIGHT with zero tardiness penalties are then scheduled using highest cash flow weights.

14.CFW-OCC Use cash flow weights for the scheduling from the first queue, and opportunity cost of cash flows for scheduling from the sec-ondqueue.

16. TSCFW Schedule activities based on highest cash flow weight.

304 CHAPTER 15

constrainedness (AUF), interest-rate (CC), profit margin (PM), and frequency of progress payments (FP). All six parameters were each varied over two or three different levels in constructing the project networks.

The sise of the project is characterised by the number of activities, where all project networks consisting of 48 activities are considered small sise problems while the medium me problems have 110 activities. The degree of resource constrainedness, charactemed by AUF (average utwation factor) [10], is set at three levels: low level of 1.0; medium level of 1.5; and high level of 2.0. Project network shape also varies over three levels from balanced, to skewed to the right and skewed to the left. Three cash flow parameters are utiliaed to describe the financial characteristics of the project. The first, interest rate varies over two levels from a low of 10% to a high of 20% annually; the second, frequency of progress payments, is also set at two levels where a low level indicates that on average every seventh activity yields a positive payment upon its completion while a high level indicates that on average every third activity yields a positive payment; and the third, the profit margin reflected by the final payment for the completed project computed as the sum ofthe cash expenditures for the project plus a profit percentage, is set at two levels, 30% and 50% respectively. A full factorial experiment was conducted which resulted in 144 different scheduling environments leading to 1440 problems with 10 replicates in each environment.

A data flow diagram that links the data, project characteristics, and heuristics is illustrated in Figure 2. It includes the generation of the project networks, the process of summarising the project parameters that serve as the vector of teaching input, and the heuristics that serve as the vector of teaching output to the neural network, and the various preprocessing methods. These techniques will be further discussed in the next section.

3 DATA PREPROCESSING AND REPRESENTATION

The effectiveness and performance of the neural network can be enhanced by good data preparation [1, 7, 19]. This section provides a brief description of a set of data preprocessing strategies used in this study to preprocess the inputs and outputs of the neural network.

An input preprocessing strategy was used to reduce input dimensionality and extract significant features of the project. It consists of a self-organizing feature

Neural Networlcs in Project Scheduling

PROJECf

CHARACfERISTICS

INPUT

PREPROCESSING

Heuristic 1

OU1PUT

PREPROCESSING

Figure:l Data Flow Diagram

305

Heuristic 16

NPV16

extraction neural network [11]. In this network, the same 30 inputs also serve as teaching outputs. The 30 summary measures are all continuous values and

306 CHAPTER 15

they are normalized in the range -0.5 and 0.5 for fast convergence [1). The six hidden nodes in the middle layer serve as feature extractors and the condensed information is stored in this layer of network. These six units are further used as inputs (FEN6) to the generalization network.

Output preprocessing categorizes the net present values obtained by executing all sixteen heuristics on the 1440 problems. They are then used to provide the teaching output. The desired output is represented by a vector of 16 continuous values in the range of [0, 1), which represent the performance of each of sixteen heuristics relative to the best and the worst performing heuristic for that problem. One observation from the data is that differences among some of the NPV s are not significant. Some sets of heuristics behave quite similarly due to the fact that some of heuristics were proposed in [13) based on the same criterion. Hence, preprocessing of the heuristics to categorize them into groups based on their similarities in performance was considered to enable the neural network to learn better.

Using expert knowledge, the sixteen heuristics can be approximately grouped into six categories so that the heuristics in each category share some common properties. Thus heuristics 1, 2, and 3 belong to the category of opportunity cost rules; heuristics 10, 12, and 15 fall within the dual price-based category; cash flow weight rules are 11, 13, 14, and 16; duration based rules include 4 and 5; heuristics 6, 7, and 8 are target scheduling rules; and the final category consists of the random selection rule 9. The average NPV of heuristics in each category is used as teaching output in representing that category. These outputs are then normalized in the range of [0,1), where the best category is given a value of 1.0 and the worst 0.0.

In the next section, we describe the design of our experiments and analysis of the results from training and testing the different networks.

4 EXPERIMENTAL DESIGN AND RESULTS

The back-propagation learning paradigm along with its variants such as quickpropagation [5) was applied to this problem. Extensive experimental analysis with network topology and learning parameters were conducted on a benchmark network to investigate how those learning parameters (learning rate f,

momentum el, initial weight range 1') affect the neural network performance. The benchmark network had 30 input nodes, 16 output nodes and 12 nodes in


the middle layer (30-12-16). We first conducted a series of exploratory experiments to find out the most promising range. Extensive experimentation was then performed within these ranges. Studies show that f: = 0.01, Q = 0.9, and 'Y = 0.7 generally gave better results.

While learning parameters impact the convergence of the network, one of the factors that contribute to the generalization of neural network is the configuration of the network. A network configuration is characterized by the arrangement of processing units and weight connections. Various results from previous studies have shown that three layers are sufficient for the generalization of network. Adding more hidden layers may only affect the convergence of the network [8]. We designed several experiments which varied over the number of inputs, number of outputs (resulting from input and output preprocessing methods) and number of hidden nodes. A combined network system which consists of a feature detection network and a generalization network is shown in Figure 3.

All of the neural networks were trained on 1015 training samples randomly selected from 1440 problem instances described in Section 3. The resulting network was tested on the remaining 425 testing samples. Learning parameters including learning rate, momentum and initial weight range described earlier were obtained through extensive experimentation. Neural network performance was evaluated based on the prediction accuracy. The prediction accuracy is calculated by comparing the desired outputs and the corresponding actual outputs. If the index of the highest actual output for an instance is equal to the index of the highest desired output (i.e., the best heuristic recommend by the neural network is the same as that given by the real data), we say it is a good prediction/classification (1st hit). Otherwise, if the best heuristic recommended by the neural network falls in the second best category of the real data, we say it is a 2nd-hit and so on. In our study, the training and testing were performed iteratively throughout the experiments. Testing was performed every 10 epochs so that the generalization could be observed during the training. We found that almost all the networks converge in about 200 to 300 epochs. The average training and testing errors were between 0.04 to 0.09. All of the training and testing were performed on a DEC 5000 using a revised version of quick-prop simulator (in C).

The neural network we employed is a multi-layered fully-connected feed forward Since there is no prior knowledge to decide any specific non-connections between nodes, all the nodes are fully connected with the nodes in its immediate layer, but not connected with the nodes across layers. In each case, we measured the improvement of prediction accuracy. Experiments on different configurations

308 CHAPTER 15

Feature Extraction Network

Generalization Network

Figure a Combined Feature Extraction and Generalization Networks

of networks resulted in the analysis of networks with the input layer containing 30 nodes, the hidden layer varying from 3 to 18 nodes, and the output layer from 6 to 16 nodes.

Table 3 presents the results of networks with all sixteen heuristics as outputs for different input representations. The network with 30 mixed-continuous inputs, 16 normalized NPVs as outputs and a hidden layer with 12 nodes, was successful in selecting one of the best three heuristics with an accuracy of about 58% on the training data and 56% on the testing data. The FEN6-4-16 network is the neural network with 6 feature detectors as inputs, 4 hidden nodes and 16 outputs. It has the same outputs as the 30-12-16 network, but the number of inputs is reduced from 30 to 6 using feature extraction. The performance of the neural network with inputs from feature extraction network deteriorates when compared with that of the neural network with the 30 normalized inputs.


This may be due to the fact that the condensed network loses some information during the feature extraction process.

Table a Generalisation Network with 16 Output. (016)

Network Topology

Table 4 presents the experiments on networks resulting from applying the output processing method of expert categorization. The 30-12-6 network has the same teaching inputs as the 30-12-16 network except that number of outputs is reduced from 16 to 6 through expert categorization. The performance of neural network with output reduction results in more than 30% increase in prediction accuracy for selecting the best three heuristics. In particular, comparing the first-hit rates, we see that the prediction accuracy of the network is more than doubled. Similarly, the neural network with inputs from feature extraction network together with output reduction performs much better than the neural network without output reduction.

Table" Generali .. tion Network with Expert Categorization (EXP)

Network Topology 1st·hit 2nd·hit 3rd·hit best three

The results from the neural network system were also compared with those from applying multiple regression and discriminant analysis models to the same training and test data sets. The results indicate that the neural network predictions are marginally better than multiple regression and significantly better than discriminant analysis. A detailed discussion is presented in [19].

310 CHAPTER 15

5 CONCLUSION

In this paper, we investigate the application of Artificial Neural Networks for the selection of appropriate heuristics for the resource-constrained project scheduling problem with cash flows. Neural network methodology can be used both to extract the information about the project conditions as illustrated by the feature extraction network, as well as to provide predictions for new instances of the problem. We have conducted experiments on several neural network topologies resulting from using different input and output preprocessing and representation strategies. The results show that input processing through feature extraction tends to lose some information. In general, the higher the number of inputs, the better the performance. With input preprocessing, the prediction accuracy is in the range of 40% to 60% for the best three heuristics. What makes a significant contribution to the performance of neural network is the preprocessing of outputs. With output preprocessing, the performance increased significantly to about 90% for the best three heuristic categories. The problems analyzed in this study have potential to be useful in many project management applications. Future extensions to this study include expanding input and output preprocessing using statistical methods; designing and evaluating the performance of a modular system of neural networks to improve prediction accuracy; and testing the efficacy of the neural network system on large real world projects.

REFERENCES

[1] S. Ahmad and G. Tesauro, 1988. Scaling and Generalization in Neural Networks: a Case Study, Connectionist Models Summer School, Carnegie Mellon University.

[2] S. Baroum and J.H. Patterson, 1989. A Heuristic Algorithm for Maximizing the Net Present Value of Cash Flows in Resource-Constrained Project Schedule, Working Paper, Indiana University.

[3] E.W. Davis, 1973. Project Scheduling under Resource ConstraintsHistorical Review and Categorization of Procedures, AIlE Transactions 5:4, 297-313.

[4] E.W. Davis and J .H. Patterson, 1975. A Comparison of Heuristic and Optimum Solutions in Resource-Constrained Project Scheduling, Management Science, 21:8, 944-955.


[5] S.E. Fahlman, 1988. An Empirical Study of Learning Speed in BackPropagation Networks, Technical Report CMU-CS-88-162, Carnegie Mellon University.

[6] M.R. Garey, and D.S. Johnson, 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman and Co., New York.

[7] J. Hertz, A. Krough and R.G. Palmer, 1991. Introduction to the Theory of Neural Computation, Addison-Wesley Publishing Company.

[8] K. Hornik, M. Stinchcombe and H. White, 1989. Multilayered Feedforward Neural Networks are Universal Approximators, Neural Networks, 2, 359-366.

[9] N. Kadaba, 1990. XROUTE: A Knowledge-based Routing System using Neural Networks and Genetic Algorithms, unpublished Ph.D. thesis, North Dakota State University.

[10] I.S. Kurtulus, and E.W. Davis, 1982. Multi-Project Scheduling: Categorization of Heuristic Rules Performance, Management Science, 28:2, 161-172.

[11] T. Kohonen, 1989. Self-Organization and Associative Memory, SpringerVerlag, Berlin-Heidelberg, New York, 3rd edition.

[12] K.E. Nygard, P. Juell and N. Kadaba, 1990. Neural Networks for Selecting Vehicle Routing Heuristics, ORSA Journal on Computing, 2:4, 353-364.

[13] R. Padman, D.E. Smith-Daniels, and V.L. Smith-Daniels, 1990. Heuristic Scheduling of Resource-Constrained Projects with Cash Flows: An Optimization-Based Approach, Working Paper 90-6, Carnegie Mellon University, Pittsburgh.

[14] T.L. Pascoe, 1965. An Experimental Comparison of Heuristic Methods for Allocating Resources, unpublished Ph.D. thesis, Cambridge University.

[15] J .H. Patterson, 1976. Project Scheduling: The Effects of Problem Structure On Heuristic Performance, Naval Res. Logist. Quart., 23:1, 95-122.

[16] R.A. Russell, 1986. A Comparison of Heuristics for Scheduling Projects with Cash Flows and Resource Restrictions, Management Science, 32:10, 1291-1300.

[17] K.Y. Tam, and M.Y. Kiang, 1992. Managerial Applications of Neural Networks: The Case of Bank Failure Predictions, Management Science, 38:7, 926-947.

312 CHAPTER 15

[18] E. Wacholder, J. Han, and R.C. Mann, 1988. A Neural Network Algorithm for the Multiple Traveling Salesman Problem, Proceedings of the IEEE Annual International Conference on Neural Networks, San Diego, pp 305-324.

[19] D. Zhu and R. Padman, 1993. Heuristic Selection in Resource-Constrained Project Scheduling: Experiments with Neural Networks, Working Paper 93-43, Carnegie Mellon University, Pittsburgh, PA 15213.

the impact of emerging technologies on computer science and operations research

Documents