1 integration of artificial intelligence and operations research techniques for combinatorial...
TRANSCRIPT
1
Integration of Artificial Intelligence and Operations Research Techniques
for Combinatorial Problems
Carla P. GomesCornell University
Ken McAloon and Carol TretkoffILOG
{mcaloon,tretkoff}@ilog.com
Integration of Artificial Intelligence and Operations Research Techniques
for Combinatorial Problems
Carla P. GomesCornell University
Ken McAloon and Carol TretkoffILOG
{mcaloon,tretkoff}@ilog.com
2
AI, OR, and CSAI, OR, and CS
AI OR
CS
3
Integration of Artificial Intelligence &Operations Research
TechniquesAI
RepresentationsConstraint Languages
Logic FormalismsObject-Oriented Prog.
Bayesian NetsRule Based Systems
• • •Tools
Constraint PropagationSystematic SearchStochastic Search
• • •Pros / Cons
Rich RepresentationsComputational Complexity
OR
RepresentationsMathematical
Modeling LanguagesLinear & Non-linear
(In)Equalities• • •Tools
Linear ProgrammingMixed-Integer Prog.Non-linear Models
• • •Pros / Cons
More Tractable (LP)Primarily Complete InfoLimited Representations
Combinatorial Problems
Planning Scheduling
THE CHALLENGE
AI OR
UNIFY APPROACHES TO:
SCALE UP SOLUTIONS
HANDLE UNCERTAINTY
ANALYZE COMPLEXITY (phase transition)
FRAGILE
EXPLOIT PROBLEM STRUCTURE
INCREASE ROBUSTNESS
31 - 45: ACPOWER? 0 NUM-UNAV-RESS 1UNAV-RES-MAP (DIV2 D24BUS-3 D24-2 D24-1) (ACPLOSS D24BUS-3 D24-2
ROME LABORATORY OUTAGE MANAGER (ROMAN)
Parameters Load Run Gantt Charts Utilities Parameters Load Run Gantt Charts Utilities
AC-POWER StatusAC PowerDIV1DIV2DIV3DIV4
0 10 20 30 40 50 60 70 80 90
GoalStart
4
OutlineOutline
I. Short Overview of OR
II. Disjunctive Programming and Hybrid Solvers
III. Exploiting Randomization to Solve Hard Combinatorial Problems
IV. Conclusions
5
I. Short OR OverviewI. Short OR Overview
6
Outline for Linear Programming and Integer Programming
Outline for Linear Programming and Integer Programming
• Standard Form of LP and a Simple Example
• Geometric Interpretation of LP
• Complexity issues
• MIP
• Example: Fast Food
• Example: Capacitated Warehouse
• Example: 911
7
OutlineOutline
1. Short Overview of OR
2. Constraint Programming
3. Cooperating Solvers
4. Disjunctive Programming
5. Exploiting Randomization to Solve Hard Combinatorial Problems
6. Conclusions
8
Optimization Technology Evolution
Dispatch Rules
1960 1970 1980 1990
SA, GA, Tabu
CPMPERT
Constraint-based Scheduling
19981947
Primal Simplex LP
ParallelLP/MIP
ConcurrentScheduling
Interior Point
ConstraintPropagation
Large IPsMIP
ShiftingBottleneck
First CP Systems
CooperatingSolvers (LP/CP)
Global constraints
Barrier LPBarrier Crossover
Dual Simplex Implementation
Dual Simplex
9
1. Short OR Overview1. Short OR Overview
10
Outline for Linear Programming and Integer Programming
Outline for Linear Programming and Integer Programming
• Standard Form of LP and a Simple Example
• Geometric Interpretation of LP
• Complexity issues
• MIP
• Example: Fast Food
• Example: Capacitated Warehouse
• Example: 911
11
An LP StoryAn LP Story
A factory can produce n products from m parts
For product j it needs aij units of part i
There are bi units of part i available
Each unit of product j sold earns cj
Amount of each product to make is unknown xj 0
Each part i determines a constraint
ai1 x1 + … + ain xn bi
Obvious solution: do nothing
Better: maximize c1 x1 + … + cn xn
12
Standard Forms of LPStandard Forms of LP
A linear program (LP) in standard form (Dantzig 1947)
max cTx
subject to Ax b
x 0
Input data: c (n x 1), A (m x n), b (m x 1).
Variables: x (n x 1)
13
Standard Forms of LPStandard Forms of LP
// The objective function
max c1 x1 + … + cn xn
// The constraints
subject to
a11 x1 + … + a1n xn b1
...
am1 x1 + … + amn xn bm
x1 0 , … , xn 0
14
Standard Forms of LPStandard Forms of LP
• In OR emphasis is on optimality
• Solution means optimal solution
• Feasible solution means solution in the ordinary sense
15
Standard Forms of LPStandard Forms of LP
Interpretation of standard form:
• xj = amount of product j to make
• cj = revenue per unit product j
• bi = available amount of component i
• aij = units of i used per unit of j produced
The constraints “say”:
aijxj = units of i used by j
= units of i used
bi
16
What are models?What are models?
A model is a data-independent abstraction of a problem
A model lets you write down the mathematical representation of a model independently of the data
Project
Model Data
OneProblemInstance
17
Products Could be Jewelry
Products: Rings and Earrings
Components: Gold and Diamonds
One ring requires 3 units of Gold, and 1 DiamondOne set of earrings requires 2 units of Gold, and 2 Diamonds
Total Gold and Diamonds are limited
Profit is different for Rings than for Earrings
Products = { rings, earrings };Components = { Gold, Diamonds };
demand = [ [3, 1], [2, 2] ];
stock = [150, 180];
profit = [60, 40];
18
Products: Ammonium Gas = NH3 Ammonium Chloride = NH4Cl
Components: Nitrogen, Hydrogen, Chlorine
One unit of Gas requires 1 unit of Nitrogen, 3 units Hydrogen
One unit of Chloride requires 1 unit of Nitrogen, 4 units Hydrogen, and 1 unit of Chlorine
Total Nitrogen, Hydrogen, Chlorine is limited
Profit is different for Gas than Chloride
Products Could be Chemicals
Products = { gas, chloride };
Components = { nitrogen, hydrogen, chlorine };
demand = [ [1, 3, 0], [1, 4, 1] ];
stock = [50, 180, 40];
profit = [30, 40];
19
The Problems Have One ModelThe Problems Have One Model
enum Products ...;enum Components ...;
float+ demand[Products, Components] = ...;float+ profit[Products] = ...;float+ stock[Components] = ...;
var float+ production[Products];
maximize sum (p in Products) profit[p] * production[p]
subject to { forall (c in Components) sum (p in Products) demand[p, c] * production[p] <= stock[c]};
Data
DecisionVariables
Objective Function
Constraints
20
OR Modeling SystemsOR Modeling Systems
• OPL
• AMPL
• 2LP
• AIMMS
• GAMS
• MPL
• ILOG Planner
• etc
21
The Dual The Dual
The dual linear program (von Neumann 1947);
min yTb
subject to yTA c
y 0
Variables y (m x 1)
Awesome Symmetry -
The dual of the dual is the primal
22
Rows and Columns ExchangedRows and Columns Exchanged
min b1 y1 + … + bm yn
subject to
a11 y1 + … + am1 ym c1
...
a1n y1 + … + amn ym cn
y1 0 , … , ym 0
23
Duality TheoremDuality Theorem
Theorem: min yTb = max cTx
• Consequence: This turns optimality problem into a feasibility problem in x and y
Ax b
x 0
yTA cT
y 0
yTb = cTx
• Consequence: Enumeration not needed to verify optimality
24
Duality TheoremDuality Theorem
• Sensitivity Analysis
• Consequence: The solution values y* for the y variables yield the Lagrange multipliers of the primal constraints which measure the rate of change of the objective function with respect to the right hand side bounds b
yi * = Z / bi where Z is the optimum
Reference: McAloon and Tretkoff [1996] Wiley
Duality
Two different views of the same phenomenon
Point vs Set
Arc vs Node
Momentum vs Position
Vector vs Hyperplane
Landlord vs Renter
26
Simplex and BarrierSimplex and Barrier
• The simplex algorithm turns the feasibility problem into a iterative repair process with a powerful evaluation function
• The barrier method transforms the LP into a system of differential equations that describe a vector field of flow on the polytope
27
Geometric Interpretation of LPGeometric Interpretation of LP
X
Y
Max: Xsubject to:
-X + Y <= 4X + 4*y <= 362*X + y <= 23X + Y >= 4Y >= X + 10
(0,4)
(4,0) (8,0)
(10,3)
(4,8)
Barrier
Simplex
28
Complexity of Linear ProgrammingComplexity of Linear Programming
Simplex Method
Worst-case --- exponential (Klee and Minty 72)
Practice --- good performance
Ellipsoid MethodKhachian’s Ellipsoid Method
Worst-case --- polynomial
Practice --- poor performance
29
Complexity of Linear ProgrammingComplexity of Linear Programming
Interior Point Methods or Barrier Methods“Karmarkar’s” (and variants) Method
Worst-case --- polynomial
Practice --- good performance
30
Complexity of Linear ProgrammingComplexity of Linear Programming
• Despite its worst case exponential time complexity, the simplex method is usually the method of choice since it provides tools for sensitivity analysis and its performance is very competitive in practice.
• Which method performs best is problem dependent.
31
Success StoriesSuccess Stories
• Industrial Planning
Given current resources, decide what to produce in what quantity
• Supply Chain Management
Multiperiod planning models that link flow from one period to the next
• Network Flow
How best to route goods across a network
32
Assumptions of Linear Programming
Assumptions of Linear Programming
• Linearity
when violated: ( xy = 50)
Nonlinear programming
• Continuity
when violated: (x integral)
(Mixed) Integer programming
33
Assumptions of Linear Programming - continued
Assumptions of Linear Programming - continued
• No Disjunctive Constraints
when violated: (x 100 or x 0)
Disjunctive programming
Additional 0-1 variables and Big M constraints
• Certainty
when violated: (cost c is a random variable)
Stochastic programming
34
Search and MIPSearch and MIP
• In order to deal with variables that must have integer values in the solution, a search must be performed.
• Mixed Integer Programming problems are combinatorial optimization problems and are NP hard
• feasibility is NP-Complete
• verifying optimality is co-NP-Complete
35
MIP and Combinatorial Optimization
MIP and Combinatorial Optimization
• These problems have been attacked by both the AI and OR communities.
• In AI, these problems are attacked as CSPs or as Planning Problems.
• In OR, they are done as MIPs and use linear relaxation to help guide the search.
• The overriding idea in each case is to limit search.
36
Integer Program: All Integer Points in Region
Integer Program: All Integer Points in Region
37
Cut to Create Integer VertexCut to Create Integer Vertex
Integer Vertex
38
Example - Fast FoodExample - Fast Food
• Question: Is it possible for a male college student to eat at the local fast food outlet and still meet the requirements of a balanced diet?
• If so, what is the least he can do it for?
39
Nutritional RequirementsNutritional Requirements
• At least 100% of vitamins A, C, B1, B2, niacin, calcium and iron
• At least 55 grams of protein
• At most 3000 milligrams of sodium
• At most 30% of the calories can come from fat
• Nutritional information is available from fast food outlets
40
College Student’s RequirementsCollege Student’s Requirements
• At least 2000 calories a day
• No more than 3 servings of any one food
• Milk only with cereal and not as a stand-alone drink
41
Fast Food - MIP ModelFast Food - MIP Model
• We will have variables Servk to represent the number of servings of item k in the plan.
• The variable Servk will have to take an integer value for the solution to be valid.
• The objective function: Z for cost
42
Fast Food - MIP ModelFast Food - MIP Model
• Let foodk,j represent the percent of RDA of nutrient j in a serving of item k
• The for each nutrient j, we have a constraint
foodk,j Servk 100 k
43
Fast Food - MIP ModelFast Food - MIP Model
• Let sodiumk represent the amount of salt in a serving of item k
• For salt we have the constraint
sodiumk Servk 3000 k
• Similarly for fat
44
Fast Food - MIP ModelFast Food - MIP Model
• Let costk represent the cost of a serving of item k
• For the objective function we have the defining constraint
costk Servk = Z k
45
Fast Food - SolutionFast Food - Solution
• With a MIP solver and a way to input these constraints we ask for
• a solution that makes the variables Servk integral
• and which minimizes Z
46
MIP Solution TechniqueMIP Solution Technique
• What the MIP solver does is to carry out a branch and bound search guided by
• the linear relaxation– the solution to the problem with the integrality
requirements relaxed
• Initialize the global variable best_so_far to 1000 (or something else very big).
47
At a NodeAt a Node
• Compute a solution to the linear relaxation which minimizes Z yielding z*. Prune this node if
z* best_so_far ,
• If all values of Servk are integral, this is a solution. Set best_so_far = z*. Save this node.
48
Branching at a nodeBranching at a node
• Choose a variable Servk whose value s* is not integral.
• Typical heuristic: most non-integral variable
• Create two child nodes,
• add Servk floor(s*)
• add Servk ceil(s*)
49
Good NewsGood News
• The linear relaxation can prune nodes before all variables Servk are forced to be integral.
• Surprisingly often a node “high in the tree” will turn up with all relevant variables integer. Here’s why
• A solution to the LP is at a vertex
• A vertex is defined as the simultaneous solution of the equality form of n linearly independent constraints
• Many of these constraints are integer bounding constraints yielding X = integer
50
Arboreally SpeakingArboreally Speaking
• Breadth first search is often preferred - it visits the “smallest” number of nodes needed to find and verify the optimal solution - analogous to A*
• If the linear relaxation is tight
| z*linear - z*integral | is relatively small
then z*linear is an excellent evaluation function
51
Answer - Fast FoodAnswer - Fast Food
Total cost is 8.71
Buy 3 burgers
Buy 2 fries
Buy 3 honeys
Buy 1 yogurt
...
52
Example - Fixed CostExample - Fixed Cost
• Warehouses must be rented in order to supply stores and we must decide which to use
• For each store j we know its monthly demand dj
• For each warehouse i we know its capacity ki
• For each warehouse i we know the fixed cost to run it each month fci
• For each pair i, j we know the monthly cost cij of supplying j from i
53
Example - Fixed CostExample - Fixed Cost
• Xij is the fraction of store j’s demand met by i
• Xij 1
• Yi is a “fuzzy” boolean
• it will be 1 if the warehouse is rented
• 0 if it is not rented
• Yi 1
54
Example - Fixed CostExample - Fixed Cost
• Each store must be supplied
X ij = 1 i
• Warehouse capacity can not be exceeded
dj Xij ki j
• Tighter
dj Xij ki Yi j
55
Example - Fixed CostExample - Fixed Cost
• Objective function
fci Yi + cij Xij
• This yields a MIP with 0-1 variables Yi
56
Branch and Cut: An Enhanced Solution Method
Branch and Cut: An Enhanced Solution Method
• Cuts - redundant constraints for the MIP model but not redundant for the linear relaxation
Xij Yi
• Add at a node if violated by solution to linear relaxation
• Powerful method - will solve the Imperial College OR lib CW problems very easily
57
Example - Call 911Example - Call 911
• PCTs answer the phone 24 hours a day, 7 days a week.
• It is known how many PCTs should be on duty during each of the 168 hours during the week in order to assure the necessary response rate.
• Workers can arrive at any hour and they work for 8 hours except for a one hour break after 4 hours.
58
Example - Call 911Example - Call 911
• Each PCT has a work week of 5 days followed by 2 days off.
• Want to meet the demand with minimal or near-minimal number of PCTs.
• So need to determine how many PCTs start their work week at each hour h of the week
59
Modeling 911
• A continuous variable Pcth will represent the number of workers who start their work week at hour h, 0 h < 168.
60
Modeling 911
• A continuous variable Z will represent the objective function
Pcth = Z h
• There will be a constraint for each hour h to assert that there are enough workers on duty at that time. The rhs of this constraint is bh = the number of workers needed.
61
Modeling 911
• For this constraint we need to represent the number of workers who are on duty at time h
• Certainly, those who start the week at time h are here, as are those who started the week at time h - 1
• And so on back to time h - 7 with the exception of those who started at time h - 4 and who are now on break.
62
Modeling 911
• This also applies to the previous 4 days. When the smoke clears, we sum over the workers w who are working at time h
Pctw bh w
63
Call 911 solved with progressive roundoff
int b[168] = { // New York City 91130,24,18,15,14,14,15,25,34,36,38,40,41,43,46,57,57,59,61,59,55,50,45,38,32,25,20,17,15,13,17,25,32,35,38,40,42,43,47,58,57,57,59,57,55,52,47,41,33,25,20,17,15,13,15,25,32,33,37,39,42,43,47,57,56,57,57,56,53,50,47,41,34,27,22,19,16,15,16,25,31,35,37,40,44,45,48,57,57,56,58,56,53,53,46,41,34,28,23,19,16,15,17,25,33,37,39,42,45,47,51,59,58,60,61,61,57,56,57,55,48,41,35,30,26,20,18,22,26,32,42,46,49,53,54,56,56,56,59,59,57,57,56,56,52,46,41,34,29,23,18,19,25,31,36,41,46,50,52,53,52,53,54,53,50,49,45,40
};
64
Modeling 911
• Subject to these constraint we want to find a solution which makes the Pcth integer and which makes Z small.
• The naïve approach is to compute the minimal linear solution and to round up all the values of Pcth to the nearest integer.
• The linear relaxation yields Z = 204.67 “fuzzy workers” but rounding yields a mediocre integral solution of 259 workers.
65
Modeling 911
• For this and many other applications, heuristics can be used to develop good solutions
• Progressive Roundoff - solve the linear relaxation, round up first variable and freeze it, re-solve etc.
66
Solving the Integer Problem
main() // Planner Code
{
IlcInitFloat();
IlcManager m(IlcNoEdit);
IlcLinOpt simplex(m);
IlcFloatVarArray Pct(m,168,0,1000);
IlcFloatArray coeffs(m,168);
int i,j,k,h,n;
67
Solving the Integer Problem
// Pctw bh w
for(h=0;h<168;h++) { // for each hour of 168 in week
for(j=0;j<168;j++)
coeffs[j] = 0;
for(k=0;k<5;k++) // for each of 5 days
for(j=k*24;j<k*24+8;j++) // for each of 8
if (j!=(k*24+4)) // hours
coeffs[(h+168-j)%168] = 1;
simplex.add(IlcScalProd(coeffs,Pct) >= b[h]);
}
68
Solving the Integer Linear Problem
IlcFloatVar Z = IlcSum(Pct);// Objective
simplex.setObjMin(Z);
for(i=0;i<168;i++) { //Progressive roundoff
n =ceil(simplex.getCurrentValue(Pct[i]));
// Fix variable and re-optimize
simplex.add(Pct[i] == n);
}
m.out() << “Number of Pcts needed is “ << Z << endl;
m.end();
}
69
Solution
• This code finds a solution with 208 workers in a couple of seconds. The optimum is 207.
• The heuristic works well in part because if there were no lunch breaks, it would find the guaranteed optimal solution
• [Bartholdi,Ratliffe,Orlin]
70
2. Constraint Programming2. Constraint Programming
LP/MIP is Beautiful, except when
• Variable domain information is important to the search strategy
• especially critical in scheduling
• The problem variables range over symbolic entities and there are lots of symmetries
• timetabling
• The MIP representation can be too verbose or awkward
• configuration
• There are just too many constraints e.g. vehicle routing
72
Mathematical Basis of Constraint Programming (CP)
Mathematical Basis of Constraint Programming (CP)
The Constraint Satisfaction Problem:
• Suppose a finite set of variables is given and with each variable is associated a non-empty finite domain.
• A constraint on k variables X1,…,Xk is a relation R(X1,…,Xk) D1 x …x Dk.
• A constraint satisfaction problem (CSP) is given by a finite set of constraints.
• A solution to a CSP is an assignment of values to all the variables so that the constraints are satisfied.
73
Domain ReductionDomain Reduction
• In CP, each constraint of a CSP is considered as a subproblem and techniques are developed for handling frequently encountered constraints.
• With each constraint is associated a domain reduction algorithm which reduces the domains of the variables that occur in the constraint.
• Accelerates convergence toward a solution
• Detects infeasibility
74
Constraint PropagationConstraint Propagation
• The other key issue is communication among the constraints or subproblems.
• The basic method used is called constraint propagation which links the constraints through their shared variables.
• The important thing about this setup is that it is very modular and independent of the particular structure of the individual constraints.
Monsieur Jordan Phenomenon
• Like prose, you have been doing constraint propagation all your life.
– Crossword puzzles
• Incomplete and so backtracking is needed
– NY Times Sunday Crossword
– Optical Illusions
• Origin: Vision analysis (Marr,Waltz et al)
76
Strengths of Constraint Programming
Strengths of Constraint Programming
Constraint Programming provides a rich Rich
• Rich representation language.
• CP variables naturally represent problem entities and the constraints do not have to be translated into a specific problem format such as MIP or SAT.
• Opportunity to choose a good heuristic for the solution strategy.
77
Which Method for Which App?Which Method for Which App?
ProductMix
LP
ProductionPlanning
MIP
DistributionPlanning
MIP
Scheduling
ConstraintBased
Scheduling
Dispatching
CPLocal search
Configuration
CP Technology
Application
Linear => Disjunctive Constraints
Strategic => Operational Optimization
78
3. Cooperating Solvers3. Cooperating Solvers
First Stop
CP/CP
80
Mother of All Examples - N Queens
Mother of All Examples - N Queens
• Do we think in terms of queens
• Where do we place this queen ?
• Do we think in terms of squares
• Will this square contain a queen ?
• These views are dual to one other
The Primal ViewThe Primal View
For each queen assign it a square
Place this queen in this square ?
The Dual ViewThe Dual ViewFor each square decide whether it will
have a queen
Place a queen in this square ?
83
The Primal ModelThe Primal Model
In which row do we place q[j] - the queen in column j
The constraints
q[i] != q[j]
q[i] - q[j] != i - j
q[i] - q[j] != j - i
Note: no alldifferent constraint
84
Yet Another duality - rows vs columns
Yet Another duality - rows vs columns
In which column do we place qq[i] the queen in row i
The constraints are the same
qq[i] != qq[j]
qq[i] - qq[j] != i - j
qq[i] - qq[j] != j - i
85
The RelationshipThe Relationship
Can link them as inverse functions:
q[qq[i]] = i
qq[q[j]] = j
The constraint propagation
i leaves domain of q[j] iff j leaves domain of qq[i]
86
In this primal/dual modelIn this primal/dual model
• Apply first-fail to q[i]
• Lo and behold
one-third fewer fails
(Example from Jean Jordan’s thesis)
87
88
Q
Q
X
X
X XX X X
X
X
X
X
X
X
X
X
89
Q
Q
X
X
X XX X X
X
X
X
X
X
X
X
X
Q
X
X
X
X
X XX
X
X
X
90
Q
Q
X
X
X XX X X
X
X
X
X
X
X
X
X
Q
X
X
X
X
X XX
X
X
X
Q
X X
X
X
91
So … So …
The cooperating primal-dual formulation “captured” the generalized arc consistency of the alldifferent constraint
The arc consistency of this global constraint is non-trivial to maintain
Network flow algorithmsflow goes from values to variables
each variable has unit demand and capacity
92
Remarks Remarks
• An IP model will encode the first dual solution
Will this square contain a queenx[i][j] = 0 or x[i][j] = 1
• A disaster beyond 30 queens
• network structure on rows and columns lost
• Another example - sports scheduling
93
Constraints and IndicesConstraints and Indices
• In IP symbols are represented by indices as opposed to values for variables.
• nurses, teams
• Paris-St-Germain plays Manchester United on day k
xijk {0-1} to represent “team i plays team j on day k”
• You can’t put symmetry breaking and other constraints on indices.
Second Stop
CP/IP
95
CP Is Powerful, But ….CP Is Powerful, But ….
• Sometimes, inconsistencies can be overlooked
X - Y 12
X + Y 10
X in [1..20]
Y in [1..20]
• Domain reduction on each constraint and constraint propagation will not reduce the domains although the system has no linear solution
• but an LP solver would spot this
96
2 Dimensional Bin Packing2 Dimensional Bin Packing
• Application for the Automobile Industry built by Greg Glockner
97
2 Dimensional Bin Packing2 Dimensional Bin Packing
• The problem here is to put as many small rectangles in a big rectangle with 90 degree rotation allowed.
The actual application involves circuit boards
• There are two complete models, one a CP model and the other an IP model.
The CP model directs the search
The LP relaxation prunes the search space by detecting infeasible nodes
98
2-D Bin Packing2-D Bin Packing
Arrange circuit boards onto raw material
Boards may be rotated
Use same number of each board
Objective: minimize scrap
Classic combinatorial optimization problem
99
Solving 2-D Bin PackingSolving 2-D Bin Packing
Use CP to generate partial solutions (nodes)
Restrict placement to reduce fragmentation of blank space
Use tight LP to test feasibility
If any partial solution is infeasible in the LP, prune the tree immediately
CP constraints reduce the tree widthLP allows us to prune quickly
100
2 Dimensional Bin Packing2 Dimensional Bin Packing
• As the search tree is traversed, the two models are in sync.
• Note that the variables used in the 2 models are disjoint
• The two models are dual to each other
• The IP sees the model from the point of view of the board, the large rectangle
• The CP sees the model from the point of view of the small rectangles
• Solutions are obtained in minutes
101
2DBP: Basic CP Formulation2DBP: Basic CP Formulation
• Let (xi, yi) be the location and (wi, hi) be the dimensions of the ith tile
• Basic constraints:
– Disjunctive constraints to prevent overlapping tiles
xi + wi xj yi + hi yj
xj + wj xi yj + hj yi
– Constraints to count the number of each tile type
Tile-oriented formulation
102
2DBP: Basic IP Formulation2DBP: Basic IP Formulation
Let xijnt = 1 if tile n of type t is in position (i, j)
The constraints are:
tnjix
jix
tnx
ijnt
hjjwiijjiitnji
ntji
jiijnt
tt
,,,}1,0{
,1
,1
,,,:,,,
,
Grid-oriented formulation
103
2DBP: LP Issues2DBP: LP Issues
• The LP is large• The LP exhibits significant primal degeneracy• The LP exhibits significant dual degeneracy
104
2DBP: LP Issues2DBP: LP Issues
• The simplex algorithm cannot solve the LP• There is no way for a MIP solver to solve the IP
as such• The barrier method can solve the LP
105
2DBP: Summary2DBP: Summary
• CP as master problem– Orders tiles
– Places tiles by position, then type
– Selects tile type by frequency to scatter tiles throughout the bin
– Uses a one-ply lookahead constraint to limit the position of following tile
• LP relaxation prunes the CP search space– Checks whether the partial solution will
lead to an infeasible instance• Use idiomatic formulations for CP and IP
106
2DBP: Remarks2DBP: Remarks
• The CP fixes significant numbers of variables at each node
• The LP pre-processor greatly simplifies the LP
• Therefore, the lack of incrementality of the barrier method does not cost us
107
2DBP: Cooperative Algorithm Demo
2DBP: Cooperative Algorithm Demo
108
Last StopLast Stop
• Constraint Programming and Local Search cooperation
• Another example of duality in action
109
CP/LSCP/LS
Parallel machines with set-up times
Ready times
Dues dates
Splittable jobs
Rogue machines
Objectives
meet due dates
minimize setup costs
110
Two Phase CooperationTwo Phase Cooperation
Phase I - the Primal (Work on first objective)
• Configure and schedule the jobs
– Use constraint based scheduling
111
Two Phase CooperationTwo Phase Cooperation
Machines morph into trucks
112
Two Phase CooperationTwo Phase Cooperation
• Phase 2 - the Dual (Work on second objective)
• Schedule the trucks
– Use Lin, Lin-Kernighan, tabu etc
113
Parallel Machines: Cooperative Algorithm Demo
Parallel Machines: Cooperative Algorithm Demo
114
IC Park ExampleIC Park Example
• Hoist Scheduling (Rodosek and Wallace)
• The original model is an IP
• The CP model is “the same”
• CP guides search, LP relaxation and CP share pruning duties
• No apparent duality
115
RemarksRemarks
• One can get great benefit with CP/IP algorithms CP/CP algorithms and CP/LS algorithms
• IP/LS is just around the corner
• IP/IP cooperation is hard because one can’t formulate truly dual views
– either simply not there
– or too verbose
counterexamples welcome
116
4. DISJUNCTIVE PROGRAMMING 4. DISJUNCTIVE PROGRAMMING
117
Disjunctive Linear ProgrammingDisjunctive Linear Programming
• An extension of Mixed Integer Programming
• A union of polyhedral sets (feasible regions) is called a disjunctive set.
118
Disjunctive SetDisjunctive Set
119
Disjunctive Linear ProgrammingDisjunctive Linear Programming
• The problem of determining whether the intersection of a family of disjunctive sets is non-empty is called the disjunctive linear programming problem or simply disjunctive programming problem.
• The solution set of the disjunctive programming problem is
Fij
i<M j<N
120
Disjunctive Linear Programming Examples
Disjunctive Linear Programming Examples
• Semi-continuous variables
either X >= 100
or X == 0
• Rather than
X <= BigM*Y ,
X >=100*Y,
Y a 0-1 variable
121
Solution Set Inside Initial RegionSolution Set Inside Initial Region
122
Disjunctive Linear Programming Examples - continued
Disjunctive Linear Programming Examples - continued
• Bollapragada, Ghattas and Hooker
Truss structure design problem
Branches directly on alternatives dictated by Hooke’s Law
• Wyatt
Disjunctive programming and mean absolute deviation models (MAD) for portfolio optimization
Extends Bender’s decomposition to disjunctive linear programs
123
Disjunctive Linear Programming continued
Disjunctive Linear Programming continued
• Balas, Cornuejols and Ceria
Generating cuts for disjunctive programming problems.
• McAloon and Tretkoff
Basic mathematical results: Optimization and Computational Logic, Wiley
124
Disjunctive Linear Programming continued
Disjunctive Linear Programming continued
• Dealing with the disjunctive part requires search.
• This requires an engine which is not available in MIP packages
• Also the linear relaxation is not as tight and the evaluation function is not as faithful
• The solution is to use a CSP solver and an LP based solver in tandem - cooperating solvers
• Beringer and DeBacker for MIP
125
To Keep It Simple
GERALD + DONALD = ROBERT
An AI classic
Newell and Simon
Assignment problem + 1 constraint
Surprisingly hard for MIP solvers
CPLEX MIP takes 1 minute and 29048 nodes (on Sun Enterprise) to find a feasible integer solution
126
The Disjunctive ProgramThe Disjunctive Program
• One constraint for the equation
• 100000 G + … + D = 100000 R + … + T
• For each variable X among G,…,T
• X = 0 or X = 1 or … or X = 9
• For each pair X, Y
• X Y-1 or Y X-1
127
Solution Set: SOME of the Integer Points in the RegionSolution Set: SOME of the
Integer Points in the Region
128
The Twin Variables for Cooperating Solvers
The Twin Variables for Cooperating Solvers
• Integer variables for the letters
0 g, e, r, a, l, d, o, n, b, t 9
• With continuous doppelgangers
0 G, E, R, A, L, D, O, N, B, T 9
129
The VariablesThe Variables
• One multi-variable constraint on the continuous doppelgangers posted to an LP solver and to the CSP solver
100000 G + 10000 E + 1000 R + … + D +
100000 D + 10000 O + 1000 N + … + D
=
100000 R + 10000 O + 1000 B + … + T
130
The VariablesThe Variables
• One CSP constraint on the integer variables posted to a discrete constraint propagation engine
AllDifferent(g, e, r, a, l, d, r, n, b, t )
131
The SearchThe Search
• Bounding information from the discrete variables is passed to the continuous doppelgangers and conversely
• The branching strategy is guided by the linear relaxation on the continuous variables
• if there is a non-integral variable X, branch on it
X floor(X*)
or
X ceil(X*)
132
The SearchThe Search
• If the AllDifferent constraint, the initial bounding constraints and the bounding constraints from branching detect a contradiction on the discrete variables, both sides backtrack
• If the linear relaxation is made infeasible by the bounding constraints that come from the discrete computation or from branching, both sides backtrack
133
The SearchThe Search
• New wrinkle
• The solution to the linear relaxation might have all variables integral - but the AllDifferent constraint can be violated by this set of values
• In this case, branch to keep them apart
• either X Y - 1
• or Y X - 1
134
The Variables
void main()
{
IlcInitFloat();
IlcManager m(IlcNoEdit);
IlcIntVar D(m, 1, 9), O(m, 0, 9), N(m, 0, 9), A(m, 0, 9), L(m, 0, 9),
G(m, 1, 9), E(m, 0, 9), R(m, 1, 9), B(m, 0, 9), T(m, 0, 9);
IlcIntVarArray vars (m, 10, D, O, N, A, L, G, E, R, B, T);
// Continued on next slide
135
The Constraints
m.add(IlcAllDiff(vars,IlcWhenValue));
IlcLinOpt simplex(m);
simplex.add(
100000*R + 10000*O + 1000*B + 100*E + 10*R + T
==
100000*G + 10000*E + 1000*R + 100*A + 10*L + D
+
100000*D + 10000*O + 1000*N + 100*A + 10*L + D ,
IlcTrue // Post to Solver as well
);
136
The Search for solutions
m.add(Generate(m,simplex,vars)); // Search strategy
if (m.nextSolution()) { // Find a solution
m.out() << " solution found " << endl;;
}
m.printInformation();
m.end();
}
137
Branch if a variable is non-integer
ILCGOAL2(Generate, IlcSimplex, simplex, IlcIntVarArray, vars)
{
IlcInt varIndex = MostNotInteger(vars, simplex);
if (varIndex >= 0) // There is a non-integer variable
return IlcAnd(IlcTryUpwardFirst(vars[varIndex], simplex), this);
138
Is integer relaxation a solution ?
IlcManager m = getManager();
if(m.solve(TestIntegerRelaxation(m,simplex)))
return 0;
139
Find two variables with same value
IlcInt j;
for(i=0;i<vars.getSize()-1;i++) {
if (vars[i].isBound()) continue; // Can’t both be bound
IlcInt n = simplex.nearest(simplex.getCurrentValue(vars[i]));
for(j=i+1;j<vars.getSize();j++) {
IlcInt m =
simplex.nearest(simplex.getCurrentValue(vars[j]));
if (m == n) break;
}
if (j< vars.getSize()) break;
}
140
Branch to push them apart
// j and i are the indices of two variables with same current value
return
IlcAnd(IlcOr(
Smaller(m,vars[i],vars[j],simplex),
Smaller(m,vars[j],vars[i],simplex)),
this // Recursion
);
}
141
Pushing two variables apart
ILCGOAL3(Smaller,IlcIntVar,x,IlcIntVar,y,IlcSimplex,simplex)
{
simplex.add(x <= y-1,IlcTrue);
return 0;
}
142
Testing the integer relaxation
ILCGOAL1(TestIntegerRelaxation, IlcSimplex, simplex)
{
simplex.trySolution();
return 0;
}
143
Results
• ILOG Solver/Planner finds a solution in 6
nodes (.29 seconds on laptop)
• Straightforward ILOG Solver finds a solution
in 8024 nodes (1.8 seconds on a laptop)
• Again, CPLEX MIP takes 1 minute and 29048
nodes (on Sun Enterprise) to find a feasible
integer solution
144
Example: The Dutch Trains
Scheduling intercity trains
Amsterdam,Rotterdam,Roosendaal,Vlissengen
Without coupling constraints, multi-commodity integer flow problem
With coupling constraints, a DLP with an integer relaxation
Additional logic handled directly in 2LP with CPLEX
Disjunctive Programming and Cooperating Solvers, CSTS 98 (Kluwer, edited by D. Woodruff)
ConclusionsConclusions
• CP and MIP are powerful techniques that can solve many combinatorial problems
• Each has preferred formulations
• Can get even greater benefits when combining CP and IP algorithms
146
Recent and Current Work
Beaumont
Beringer, DeBacker
Balas, Ceria, Cornuejols.
Wallace, Rodosek, Schrimpf
Heipke, Colombani
Bockmayr
McAloon, Tretkoff, Wetzel
147
III. Exploiting Randomization to Solve Hard Combinatorial
Problems
III. Exploiting Randomization to Solve Hard Combinatorial
Problems
148
BackgroundBackground
Combinatorial search methods often exhibit
a remarkable variability in performance. It is common to observe significant differences between:
- different heuristics
- same heuristic on different instances
- different runs of same heuristic with different seeds (stochastic methods)
149
Main ClaimMain Claim
One can take advantage of the extreme variability of combinatorial search methods:
One can One can improve the performance of a improve the performance of a deterministic complete methoddeterministic complete method, by , by introducing a introducing a stochastic elementstochastic element, while , while maintaining maintaining completeness.completeness.
We’ll explain We’ll explain WHYWHY that is the case. that is the case.
150
A Structured Benchmark Domain for Studying the Distributions of Search Methods
Stochasticity in Search Procedures
Intriguing Properties of Complete BacktrackStyle Algorithms
Consequences for Algorithm Design - Rapid Randomized Restarts
Portfolio of Algorithms
151
Structured Benchmark DomainStructured Benchmark Domain
152
Study of local and systematic search methods has been driven by:
Random instance distributions (Hogg et al. 96). Limitation: lack of structure that characterizes realistic problems;
Highly structured problems (Fujita at al. 93). Limitation: “too much” structure.
We propose a benchmark domain that We propose a benchmark domain that bridges the gap between purely random bridges the gap between purely random instances and highly structured instances and highly structured problems.problems.
Background Background
Gomes and Selman 1997 - Proc. AAAI-97
153
Defn.: a pair (Q, *) where Q is a set, and * is a binary
operation on Q such that
a * x = b ; y * a = b
are uniquely solvable for every pair of elements a,b in Q.
The multiplication table of its binary operation defines a
latin square (i.e., each element of Q appears exactly once
in each row/column).
Example:Quasigroup of order 4
QuasigroupsQuasigroups
154
Given a partial latin square, can it be completed?
Example:
Quasigroup Completion Problem (QCP)
Quasigroup Completion Problem (QCP)
155
Quasigroup Completion Problem A Framework for Studying SearchQuasigroup Completion Problem
A Framework for Studying Search
NP-Complete (Colbourn 1983, 1984; Anderson 1985).
Has a structure not found in random instances.
Leads to interesting search problems when structure is perturbed.
The study of this problem led us to identify
the unusual distributions of combinatorial search (Gomes, Selman & Crato --- CP97)
156
Aside: Applications of QuasigroupsAside: Applications of Quasigroups
Design of statistical experiments
eliminating data dependencies Scheduling/Timetabling (Anderson 1992)
completing a schedule given a set of pre-defined events
Automated theorem proving (Fujita et al. 1993)
existence vs. non-existence of quasigroups with intricate mathematical properties
157
Example: Scheduling of Drug Experiment
Example: Scheduling of Drug Experiment
Given 5 different drugs, test the effects of the
different medications on 5 different subjects over
different days of the week.
Use constraint:
No two people get same brand on the same day
(eliminate bias for day of the week).
158
Quasigroup Completion Quasigroup Completion S
UB
JEC
T
DAY
Mon. Tues. Wed. Thurs. Fri.
Tim
Sue
Frank
Teresa
Todd
Tylenol Aleve Bayer ExhedrinExhedrin Advil
Aleve Bayer Exhedrin Advil Tylenol
Bayer Exhedrin AdvilAdvil Tylenol Aleve
Exhedrin AdvilAdvil Tylenol Aleve Bayer
Advil Tylenol Aleve Bayer Exhedrin
(*) Pre-assigned(*) Pre-assigned
159
QCP has a natural formulation as a Constraint
Satisfaction Problem
variable for each NxN entry
constraints capture row/column requirement
variable assignments capture pre-assigned values
160
How does the difficulty of
QCP vary with the fraction
of pre-assignment?
161
Fraction of pre-assignment
Med
ian
num
ber
of
back
track
s (l
og
)
Overconstrained areaUnderconstrained
area
Critically constrained area
162
Complexity Graph shows (up to order 20):
curve peaks around 42% of pre-assignment ---
critically constrained area.critically constrained area.
under-constrainedunder-constrained and over-over-constrainedconstrained areas are easier.
163
Directly related to the peak in
computational difficulty is the so-
called phase transition graph for
the QCP problem.
164
Fraction of pre-assignment
Fract
ion o
f U
nso
lved
case
s
Almost all unsolvable area
Almost all solvable area
Phase transition area
165
Phase TransitionPhase Transition
QCP Phase Transition --- threshold phenomenonthreshold phenomenon from almost all solvable to almost all from almost all solvable to almost all unsolvableunsolvable --- occurs around 42% of preassignment.
It’s called a phase transition because of the close
relation to state transition phenomena studied in
physics, such as the melting of a solid into a
liquid.
166
Exploiting StructureExploiting Structure
167
Forward Checking Arc Consistency on binary constraints
Exploiting Structure in QCP
168
Arc Consistency on Binary Constraints
Further Exploiting Structure in QCP
Shaw, Stergiou and Walsh - ECAI98
General Arc Consistency on all different
constraints
169
Enforcing General Arc Consistency on All Different Constraints
Enforcing General Arc Consistency on All Different Constraints
• Beautiful example of integration of AI/OR techniques for a well defined sub-problem
• Propagation uses Maximum Matching problem (particular case of Network Flow problems which have polynomial time complexity)
Regin - AAAI94
170
Further Exploiting Structure in QCPFurther Exploiting Structure in QCP
By enforcing general arc consistency on all different constraints problems up to order 50 could be solved!
Shaw, Stergiou and Walsh - ECAI98
Regin - AAAI94
171
Stochasticity in Search ProceduresStochasticity in Search ProceduresStochasticity in Search ProceduresStochasticity in Search Procedures
172
BackgroundBackground
Stochastic strategies have been very successful in the area of local search.
Limitation: inherent incomplete nature of local search methods.
We want to explore the addition of a We want to explore the addition of a stochastic element to a systematic search stochastic element to a systematic search procedure without losing completeness.procedure without losing completeness.
173
We introduce stochasticity in a
backtrack search method by randomly
breaking ties in variable and/or value
selection.
Compare with standard lexicographic
tie-breaking.
174
Randomized StrategiesRandomized Strategies
Strategy Variable sel. Value sel.
DD deterministic deterministic
DR deterministic random
RD random deterministic
RR random random
175
176
177
178
Lesson: Randomized tie-breaking can
improve performance over a purely
deterministic strategy.
Next: But we can obtain a more dramatic
advantage from randomization ...
179
Cost DistributionsCost Distributions
Key Properties:
I Erratic behavior of mean.I Erratic behavior of mean.
II Distributions have “II Distributions have “heavy tailsheavy tails”. ”.
180
Median = 1!
samplemean
number of runs
3500!
500
2000
181
1
182
75%<=30
Number backtracks Number backtracks
Pro
port
ion o
f ca
ses
Solv
ed
5%>100000
183
Heavy-Tailed DistributionsHeavy-Tailed Distributions
… … infinite variance … infinite meaninfinite variance … infinite mean
Introduced by Pareto in the 1920’s
--- “probabilistic curiosity.”
Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena.
Examples: stock-market, earth-quakes, weather,...
184
Decay of DistributionsDecay of Distributions
Standard --- Exponential Decay
e.g. Normal:
Heavy-Tailed --- Power Law Decay
e.g. Pareto-Levy:
Pr[ ] , ,X x Ce x for someC x 2 0 1
Pr[ ] ,X x Cx x 0
185Standard Distribution
(finite mean & variance)
Power Law Decay
Exponential Decay
186
Normal, Cauchy, and LevyNormal, Cauchy, and Levy
Normal - Exponential Decay
Cauchy -Power law DecayLevy -Power law Decay
187
Tail Probabilities (Standard Normal, Cauchy, Levy)
Tail Probabilities (Standard Normal, Cauchy, Levy)
c Normal Cauchy Levy0 0.5 0.5 11 0.1587 0.25 0.68272 0.0228 0.1476 0.52053 0.001347 0.1024 0.43634 0.00003167 0.078 0.3829
188
How to Check for “Heavy Tails”?How to Check for “Heavy Tails”?
Log-Log plot of tail of distribution
should be approximately linear.
Slope gives value of
infinite mean and infinite varianceinfinite mean and infinite variance
infinite varianceinfinite variance
1
1 2
189
Example of Heavy Tailed Model(Random Walk)
Example of Heavy Tailed Model(Random Walk)
Random Walk:•Start at position 0
•Toss a fair coin:
• with each head take a step up (+1)
• with each tail take a step down (-1)
X --- number of steps the random walk takes to return to position 0.
190
The record of 10,000 tosses of an ideal coin
(Feller)
Zero crossing Long periods without zero crossing
191
Random Walk
Heavy-tails vs. Non-Heavy-TailsHeavy-tails vs. Non-Heavy-Tails
Normal(2,1000000)
Normal(2,1)
O,1%>200000
50%
2
Median=2
1-F
(x)
Unso
lved f
ract
ion
X - number of steps the walk takes to return to zero (log scale)
192
466.0
319.0153.0
Number backtracks (log)
1-F
(x)
Unso
lved f
ract
ion
1 => Infinite mean
Heavy-tails in QCP Domain
193
The Log-Log plot shows a linear relation
over many orders of magnitude. This isclear evidence of heavy-tailed behavior.
194
195
196
Heavy Tailed Cost DistributionHeavy Tailed Cost Distribution
0.1
1
1 10 100 1000 10000 100000
log( Backtracks )
log
( 1
- F
(x)
)
197
The Log-Log plot shows a linear relation
over many orders of magnitude. This isclear evidence of heavy-tailed behavior.
198
By studying larger problems we discovered that not only does the heavy tail phenomenon occur at the right-hand side of the distribution, but we also observed a high frequency of data points on the left-hand side of the distribution.
Right-hand side: non-negligible fraction of very long runs
Left-hand side: non-negligible fraction of very short runs
199
70%>250000
15!
1%<=650!
Sports Scheduling
Number backtracks (log)
Cu
mula
tive D
istr
ibu
tion F
un
ctio
n
200Standard Distribution
(finite mean & variance)
Power Law Decay
Exponential Decay
Also, heavy tails on left. (High probability of very short runs.)
201
Consequence for algorithm design:
Use rapid restarts or parallel / inter-leaved runs
Super linear speedups!!!
202
X XX XX
solved10 101010 10
Sequential: 50 +1 = 51 seconds
Parallel: 10 machines --- 1 second 51 x speedup
Super-linear Speedups
Interleaved (1 machine): 10 x 1 = 10 seconds 5 x speedup
203
Rapid Restarts work particularly well on hard computational problems because of the Heavy Tailed Phenomena in the run time distribution.
RAPID RANDOMIZED RESTARTS strategy avoids the tail on the right and exploits the short runs on the left.
Restarts provably eliminate heavy tails (Gomes, Selman & Crato )
204
Sketch of proof of elimination of heavy tails
Sketch of proof of elimination of heavy tails
Let’s truncate the search procedure after m backtracks.
Probability of solving problem with truncated version:
Run the truncated procedure and restart it repeatedly.
pm X m Pr[ ]
X numberof backtracks to solve the problem
205
Y total number backtracks with restarts
F Y y pmY m
c e c y
Pr[ ] ( )
/1
12
Number of starts Y m Geometric pmRe / ~ ( )
Y - does not have Heavy Tails
206
RestartsRestarts
70%unsolved
250~ 62.5 restarts
1-F
(x)
Unso
lved f
ract
ion
Number backtracks (log)
207
Example of Rapid Restart Speedup(planning)
Example of Rapid Restart Speedup(planning)
1000
10000
100000
1000000
1 10 100 1000 10000 100000 1000000
log( cutoff )
log
( b
ackt
rack
s )
20
2000 ~100 restarts
Cutoff (log)
Num
ber
back
track
s (l
og)
~10 restarts
100000
208
Deterministic
Logistics Planning 108 mins. 95 sec.
Scheduling 14 411 sec 250 sec
(*) not found after 2 days
Scheduling 16 ---(*) 1.4 hours
Scheduling 18 ---(*) ~18 hrs
Circuit Synthesis 1 ---(*) 165sec.Circuit Synthesis 2 ---(*) 17min.
Summary Results
R3
209
Our results provide the first indication of heavy-tailed distri-butions in a computational model.
Overall insight:Overall insight: Randomized tie-breaking with rapid restarts gives powerful search strategy.
210
Heavy-Tailed Distributionsin Other Domains
Heavy-Tailed Distributionsin Other Domains
Quasigroup Completion Problem
Graph Coloring
Logistic Planning
Circuit Synthesis
Gomes, Selman, and Crato 1997;
Gomes, Selman, McAloon, and Tretkoff 1998;
Gomes,Kautz, and Selman 1998;
211
Deterministic
Logistics Planning 108 mins. 95 sec.
Scheduling 14 411 sec 250 sec
(*) not found after 2 days
Scheduling 16 ---(*) 1.4 hours
Scheduling 18 ---(*) ~18 hrs
Circuit Synthesis 1 ---(*) 165sec.Circuit Synthesis 2 ---(*) 17min.
Summary Results
R3
212
Rapid Restart SpeedupRapid Restart Speedup
1000
10000
100000
1000000
1 10 100 1000 10000 100000 1000000
log( cutoff )
log
( b
ackt
rack
s )
213
Our results provide the first indication of heavy-tailed distri-butions in a computational model.
Overall insight:Overall insight:
Randomized tie-breaking with
rapid restarts gives powerful
search strategy.
214
Heavy-Tailed Distributionsin Other Domains
Heavy-Tailed Distributionsin Other Domains
Quasigroup Completion Problem
Graph Coloring
Logistic Planning
Circuit Synthesis
Gomes, Selman, and Crato 1997 - Proc. CP97;
Gomes, Selman, McAloon, and Tretkoff 1998 - Proc AIPS98;
Gomes,Kautz, and Selman 1998 - Proc. AAAI98.
215
Algorithm Portfolio DesignAlgorithm Portfolio Design
Gomes and Selman 1997 - Proc. UAI-97;
Gomes, Selman, and Crato 1997 - Proc. CP97.
216
MotivationMotivation
The runtime and performance of randomized algorithms can vary dramatically on the same instance and on different instances.
Goal: Improve the performance of different algorithms by combining them into a portfolio to exploit their relative strengths.
217
Branch & Bound:Best Bound vs. Depth First Search
Branch & Bound:Best Bound vs. Depth First Search
218
Branch & Bound(Randomized)
Branch & Bound(Randomized)
Standard OR approach for solving Mixed Integer Programs (MIPs)
• Solve linear relaxation of MIP
• Branch on the integer variables for which the solution of the LP relaxation is non-integer:
apply a good heuristic (e.g., max infeasibility) for variable selection ( + randomization ) and create two new nodes (floor and ceiling of the fractional value)
• Once we have found an integer solution, its objective value can be used to prune other nodes, whose relaxations have worse values
219
Branch & BoundDepth First vs. Best bound
Branch & BoundDepth First vs. Best bound
Critical in performance of Branch & Bound:
the way in which the next node to be expanded is selected.
Best-bound - select the node with the best LP bound
(standard OR approach) ---> this case is equivalent to A*, the LP relaxation provides an admissible search heuristic
Depth-first - often quickly reaches an integer solution
(may take longer to produce an overall optimal value)
220
Portfolio of AlgorithmsPortfolio of Algorithms
A portfolio of algorithm is a collection of algorithms and / or copies of the same algorithm running interleaved or on different processors.
Goal: to improve on the performance of the component algorithms in terms of:
expected computational cost
“risk” (variance)
Efficient Set or Efficient Frontier: set of portfolios that are best in terms of expected value and risk.
221
Depth-first vs. Best-bound(logistics planning)
Depth-first vs. Best-bound(logistics planning)
Number of nodes
Cu
mula
tive F
requ
en
cies
Depth-First
~50%
Best-Bound
~30%
222
Depth-First and Best and Bound do not dominate each other overall.
223
Heavy-tailed behavior of Depth-firstHeavy-tailed behavior of Depth-first
224
Portfolio for heavy-tailed search procedures (2 processors)
Portfolio for heavy-tailed search procedures (2 processors)
0 DF / 2 BB
2 DF / 0 BB
Standard deviation of run time of portfolios
Expect
ed r
un t
ime o
f p
ort
folio
s
225
Portfolio for heavy-tailed search procedures (6 processors)
Portfolio for heavy-tailed search procedures (6 processors)
0 DF / 6 BB
6 DF / 0BB
Standard deviation of run time of portfolios
Expect
ed r
un t
ime o
f p
ort
folio
s
5 DF / 1BB
3 DF / 3 BB
4 DF / 2 BB
Efficient set
226
Portfolio for heavy-tailed search procedures (20 processors)
Portfolio for heavy-tailed search procedures (20 processors)
0 DF / 20 BB
20 DF / 0 BB
Standard deviation of run time of portfolios
Expect
ed r
un t
ime o
f p
ort
folio
s
227
Portfolio for heavy-tailed search procedures (2-20 processors)
Portfolio for heavy-tailed search procedures (2-20 processors)
228
A portfolio approach can lead to substantial improvements in the expected cost and risk of stochastic algorithms, especially in the presence of heavy-tailed phenomena.
229
Summary of RandomizationSummary of Randomization
Considered randomized backtrack search.
Showed Heavy-Tailed Distributions.
Suggests: Rapid Restart Strategy.
--- cuts very long runs
--- exploits ultra-short runs
Experimentally validated on previously unsolved planning and scheduling problems.
Portfolio of Algorithms for cases where no single heuristic dominates
230
Summary of RandomizationSummary of Randomization
Considered randomized backtrack search.
Showed Heavy-Tailed Distributions.
Suggests: Rapid Restart Strategy.
--- exploits ultra-short runs
--- cuts very long runs
Experimentally validated on previously unsolved
planning and scheduling problems.
Portfolio of Algorithms for cases where no single heuristic dominates
231
IV. CONCLUSIONSIV. CONCLUSIONS
232
Important Themes in ORImportant Themes in OR
Linear Programming (Mixed) Integer Programming
Exploit Structure e.g., Network Flow Problems
Duality very elegant theory in LP
sensitivity analysis
233
Opportunities for Integration of AI/OR
Opportunities for Integration of AI/OR
OR methods:Have focused on tractable representations (LP)
Have demonstrated the ability to identify optimal and locally optimal solutions
LIMITATION: Restricted to rigid models with limited expressive power
AI methods:Richer and more flexible representations,supporting constraint-based reasoning
mechanisms as well as mixed initiative frameworks, allowing the human expertise to be in the loop.
LIMITATION: Rich representations in general lead to intractable problems
CHALLENGE: good representations / fast & good solutions
,
234
Opportunities for Integration of AI/OR
Opportunities for Integration of AI/OR
AI methods are becoming competitive
AI methods used to be considered not suitable for realworld scheduling problems. Recent developments have shown they can be competitive. Examples:
SAP, Peoplesoft, I2, … -> provide solutions for scheduling combining constraint programming and mathematical programming approaches.
ILOG (CP language) has several fielded applications in different scheduling areas; ILOG has integrated a CSP solver with CPLEX.
OR people have acknowledge the benefits of combining OR and AI methods
235
Opportunities for Integration of AI/OR
Opportunities for Integration of AI/OR
Exploiting Duality in CSP frameworks
Exploiting Randomization
Hybrid Solvers
236
Opportunities for Integration of AI/OR
Opportunities for Integration of AI/OR
Hybrid Solvers - emerging area of research (CSP+OR); it started with CLP(R), Prolog III and CHIP; ILOG integrates a CSP solver with CPLEX
local constraint propagation - local consistency algs
global constraint propagation - LP relaxations
Only a hybrid approach could prove optimality, e.g.:
Hoist scheduling (Rodosek & Wallace 1998)
Multicommodity integer network flow problem (Dutch Railways) (McAloon, Tretkoff, Wetzel 1998)
237
Updated version of tutorial slides
www.cs.cornell.edu/gomes/
TalksDemos
Updated version of tutorial slides
www.cs.cornell.edu/gomes/
TalksDemos
238
AppendixAppendix
239
Portfolio of AlgorithmsPortfolio of Algorithms
A portfolio of algorithm is a collection of algorithms and / or copies of the same algorithm running interleaved or on different processors.
A portfolio has an expected computational cost and a standard deviation, a measure of the dispersion of the computational cost.
The standard deviation of the portfolio is a measure of the risk inherent to the portfolio.
240
Portfolio of AlgorithmsPortfolio of Algorithms
Goal: to improve on the performance of the component algorithms in terms of:
expected computational cost
“risk” (variance)
Efficient Set or Efficient Frontier: set of portfolios that are best in terms of expected value and risk.
241
AppendixPortfolio of Algorithms
AppendixPortfolio of Algorithms
Goal: to improve on the performance of the component algorithms in terms of:
expected computational cost;
risk;
Efficient Set or Efficient Frontier - set of portfolios that are the best in terms of expected value and risk.
Within the efficient set, in order to minimize the risk, one has to deteriorate the expected value or, in order to improve the expected value, one has to increase the risk.
242
Appendix Portfolio of Two Algorithms
Appendix Portfolio of Two Algorithms
Let us consider the random variables:
A1 - the number of backtracks that algorithm 1 takes to find a solution or prove that a solution doesn’t exist;
A2 - the number of backtracks that algorithm 2 takes to find a solution or prove that a solution doesn’t exist;
243
Appendix Portfolio of Two Algorithms
Appendix Portfolio of Two Algorithms
Let us consider that we have N processors and we design a portfolio using n1 processors with algorithm 1 and n2 processors with algorithm2 (N = n1 + n2).
Let us consider the random variable:
X - the number of backtracks that the portfolio takes to find a solution or prove that a solution doesn’t exist;
244
AppendixPortfolio of Two Algorithms
AppendixPortfolio of Two Algorithms
Given N processors, and
P[X x]N
ii 1
NP[A1 x]i P[A1 x]
(N i)
n N1 n2 0
245
Appendix Portfolio of Algorithms
Appendix Portfolio of Algorithms
Given N processors, such that and n1n N n2 1 ,
0 1 n N
P[X x]n1
i'i' 0
n1P[A1 x]i
'P[A1 x]
(n1 i' )
i
N
1
n2
i' 'P[A2 x]i
' 'P[A2 x]
(n2 i' ' )
i i i' ' ' and the term in the summation is 0 when 2'',0'' nii
246
Preliminary Research on Structure of Search Spaces
Preliminary Research on Structure of Search Spaces
247
Fringe of Search TreeFringe of Search Tree
248
Fractal DimensionFractal Dimension
249
Fractal DimensionFractal Dimension
When plotting the length of a curve as a function of the measuring tool on a log-log plot, one obtains a linear relationship:
L - the measured length;
s - length of the yardstick;
c and d are constants;
Mandelbrot introduced the fractal dimension D = d +1;
A straight line has D = 1.0;
The coast of Britain has fractal dimension 1.22;
The higher D the more fractal the curve is.
dscL )/1(
250
Heavy-Tailed Behavior vs Non-heavy-tailed behaviorHeavy-Tailed Behavior vs
Non-heavy-tailed behavior