Strong mixed-integer programming — McMaster University mirror of deza/slidesRIKEN2019/huchette.pdf
TRANSCRIPT
[Page 1]
Strong mixed-integer programming formulations for trained neural networks
Joey Huchette1
with Ross Anderson2, Will Ma4, Christian Tjandraatmadja2, and Juan Pablo Vielma2,3
1 Rice University, Computational and Applied Mathematics
2 Google Research, Operations Research Team
3 MIT, Sloan School of Management
4 Columbia University, Columbia Business School
[Page 2]
Strong mixed-integer programming formulations for trained neural networks
Joey Huchette1
with Ross Anderson2, Will Ma4, Christian Tjandraatmadja2, and Juan Pablo Vielma2,3
and Yeesian Ng5 and Ondrej Sykora2
1 Rice University, Computational and Applied Mathematics
2 Google Research, Operations Research Team
3 MIT, Sloan School of Management
4 Columbia University, Columbia Business School
5 MIT, Operations Research Center
[Page 3]
Strong mixed-integer programming formulations for trained neural networks
Joey Huchette1
with Ross Anderson2, Will Ma4, Christian Tjandraatmadja2, and Juan Pablo Vielma2,3
and Yeesian Ng5 and Ondrej Sykora2
and Craig Boutilier6, Martin Mladenov6, and Moonkyung Ryu6
1 Rice University, Computational and Applied Mathematics
2 Google Research, Operations Research Team
3 MIT, Sloan School of Management
4 Columbia University, Columbia Business School
5 MIT, Operations Research Center
6 Google Research, MudCats
[Page 4]
The Problem
[Page 5]
Optimize
End goal is optimization: Make the best possible decision
[Figure: the function being optimized is unknown]
[Page 6]
Predict,
Supervised learning: Learn a function using historical input/output data
Data → Learning
End goal is generalization: Given a “reasonable” unseen point, want the predicted output to match the true y
[Figure: a small feedforward network with inputs x1, x2 and hidden units]
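As a toy instance of the "learn a function from input/output data" step, a least-squares line can be fit by gradient descent; the data, learning rate, and iteration count below are illustrative assumptions, not from the slides:

```python
# Supervised learning sketch: fit f(x) = a*x + b to input/output pairs
# by gradient descent on the mean squared error.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # generated from y = 2x + 1

a, b = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    ga = gb = 0.0
    for x, y in data:
        err = (a * x + b) - y
        ga += 2 * err * x / len(data)  # d(MSE)/da
        gb += 2 * err / len(data)      # d(MSE)/db

    a -= lr * ga
    b -= lr * gb

def f(x):
    """Learned predictor; the goal is that it generalizes to unseen x."""
    return a * x + b
```

The learned f can then be queried at "reasonable" unseen points, e.g. f(3.0), which is exactly the object that the later "Optimize" step searches over.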
[Page 7]
Optimize
End goal is optimization: Make the best possible decision
[Figure: the function being optimized is unknown]
[Page 8]
Predict, then Optimize
End goal is optimization: Make the best possible decision
Complex? Combinatorial? Nonconvex?
[Page 9]
Application: Verifying robustness
Adversarial Examples: find small change in the input to change the output
Predict: Classify input image
Optimize: Prove (or disprove) robustness of model to small perturbations
Very popular research topic in the ML community right now
Note that “proofs” of robustness (not just feasible solutions) are very useful!
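To make the "proof of robustness" idea concrete, here is a minimal sketch for a *linear* classifier, an assumption made so the optimization has a closed form (real ReLU networks are why MIP is needed); the weights, input, and radius are illustrative:

```python
# Robustness certificate for a linear score s(x) = w.x + b.
# The model is robust at x0 within an l_inf ball of radius eps
# iff the minimum of s over the ball stays positive.
w = [1.0, -2.0, 0.5]
b = 0.3
x0 = [1.0, 0.2, -0.5]
eps = 0.1

nominal = sum(wi * xi for wi, xi in zip(w, x0)) + b
# Worst case over the box: each coordinate moves eps against the sign of w_i,
# so the score drops by exactly eps * ||w||_1.
worst = nominal - eps * sum(abs(wi) for wi in w)

robust = worst > 0  # a *proof* of robustness, not just a feasible solution
```

If `worst` were negative, any minimizing corner of the box would be an adversarial example — the "disprove" side of the same optimization.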
[Page 10]
Application: Deep reinforcement learning
Predict: future costs Q(x, a) for action a in state x
Optimize: pick the lowest-cost action, argmin over a of Q(x, a)
In a combinatorial or continuous action space, this optimization can be hard! Example: teach a cheetah to run (without falling over)
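The "optimize over actions" step can be sketched as follows; the cost model Q and the action grid are illustrative stand-ins (for continuous or combinatorial action spaces this argmin is exactly where the MIP machinery comes in):

```python
# Pick the lowest-predicted-cost action for a given state.
def q(state, action):
    """Toy trained cost model Q(x, a): quadratic plus an action penalty."""
    return (action - state) ** 2 + 0.1 * abs(action)

state = 0.75
actions = [i / 10 for i in range(-10, 11)]  # crude 1-D discretization

best_action = min(actions, key=lambda a: q(state, a))
```

With a discrete grid this is a trivial scan; the slides' point is that once actions are continuous or combinatorial, the same argmin requires real optimization over the trained network.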
[Page 11]
Application: Designing DNA for protein binding
Predict: probability/strength of DNA sequence binding to a given protein
Optimize: find the best/many binding sequences (potentially with side constraints)
[Architecture: x (n, 4) → conv1d → h1 (n, 6) → reduce max → h2 (1, 6) → ReLU → h3 (1, 32) → linear → prediction]
A typical architecture for predicting protein binding, e.g. (Alipanahi et al 2015, Zeng et al 2016)
[Page 12]
Application: Bayesian optimization
● Goal: Minimize a black-box function
○ “Neural architecture search”
○ Drug design
● Sequential experimentation; each sample is costly
Predict: Amount a new point improves the model
Optimize: Choose the best point to sample
● Pros of Gaussian process:
○ Bayesian calculations are straightforward
● (Potential) pros of neural network:
○ Incremental training; handles discontinuities + high dims (“model-free”), input constraints
● Uncertainty models: dropout, ensembling, Thompson sampling, etc.
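The ensembling idea above can be sketched as a lower-confidence-bound acquisition: score candidates by mean prediction minus disagreement across the ensemble. The ensemble members (shifted quadratics standing in for trained networks), the candidate grid, and kappa are illustrative assumptions:

```python
# Ensemble-based acquisition for (minimizing) Bayesian optimization.
ensemble = [
    lambda x: (x - 1.0) ** 2,
    lambda x: (x - 1.2) ** 2 + 0.1,
    lambda x: (x - 0.8) ** 2 - 0.1,
]
candidates = [i / 4 for i in range(-8, 9)]  # grid over the input domain

def acquisition(x, kappa=1.0):
    preds = [m(x) for m in ensemble]
    mean = sum(preds) / len(preds)
    spread = max(preds) - min(preds)  # cheap stand-in for posterior stddev
    return mean - kappa * spread      # lower = more promising to sample

next_sample = min(candidates, key=acquisition)
```

Over a discrete grid the argmin is a scan; with input constraints or high-dimensional domains, optimizing the acquisition over the network is again an optimization-over-trained-networks problem.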
[Page 13]
The Solution
[Page 14]
How to “Optimize” in “Predict, then Optimize”
Heuristic/Primal
● Gradient Descent (Projected/Conditional)
● Local Search
○ Over input domain
○ Over neuron activity
● Cross Entropy Method
● Genetic Algorithms
● (Quasi-)Second Order Methods

Bounds/Dual
● Interval Arithmetic
● Abstract Domains
● Lagrangian Relaxation
● Linear Programming
● Semidefinite Programming

Exact Methods
● Mixed Integer Programming
● SAT/SMT

Exploit Problem-Specific Methods within Generic Solvers
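As a sketch of one of the heuristic/primal methods above, here is a minimal Cross Entropy Method: sample from a Gaussian, keep the elite fraction, and refit the sampling distribution. The objective and all hyperparameters are illustrative assumptions:

```python
import random

def objective(x):
    """Toy nonconvex function to minimize (two basins, near x = -1 and x = 1)."""
    return (x ** 2 - 1.0) ** 2 + 0.1 * x

random.seed(0)
mu, sigma = 3.0, 2.0  # initial sampling distribution, deliberately far off
for _ in range(40):
    samples = [random.gauss(mu, sigma) for _ in range(200)]
    elites = sorted(samples, key=objective)[:20]  # keep the best 10%
    mu = sum(elites) / len(elites)
    var = sum((e - mu) ** 2 for e in elites) / len(elites)
    sigma = max(1e-3, var ** 0.5)  # floor keeps sampling from collapsing
```

Like the other heuristics in the left column, this produces good feasible points quickly but no bound — the motivation for pairing it with the exact methods on the right.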
[Page 15]
A (not comprehensive) Literature Sampling
● Gradient descent on continuous problems (e.g. deep dream, style transfer, adversarial examples)
● Cross Entropy Method (importance sampling based guesses, brain robotics)
● Input Convex Neural Networks + LP (Amos 2016)
● The big-M MIP (too many authors to list all)
○ Tightening big-Ms: (Tjeng 2017, Fischetti 2018)
● CP/SAT/SMT (often from the “verifiability” community)
○ Reluplex (Katz 2017), Planet (Ehlers 2017)
● Lagrangian Relaxation (Dvijotham 2018)
● Abstract Domains/Zonotopes (Singh 2019)
● Nice recent survey (Bunel 2017)
[Page 16]
Contributions and Outline
1. MIP formulations for popular neural network building blocks
Stronger than existing approaches (theoretically and empirically)
○ Most common case: Affine function on box domain → ReLU nonlinearity
○ Other nonlinearities: max pooling, reduce max, clipped ReLU, …
○ Other domains: Product of simplices (one-hot encodings)
○ Structure: Tight big-M formulation + efficiently separable cuts
2. General machinery
Recipes for strong MIP formulations for the maximum of d affine functions
○ One for ideal formulations, another for hereditarily sharp ones
○ Efficiently separable via subgradient method
○ A series of simplifications under common settings
3. tf.opt: Software with TensorFlow-like modeling, multiple backends
4. Computational results on verification and reinforcement learning problems
[Page 17]
MIP Formulations for Neural Networks
[Page 18]
MIP formulations in one slide
● A MIP formulation for a set S ⊆ Rn is:
○ A polyhedron Q ⊆ Rn+r such that
○ Projx({ (x,z) ∈ Q | z ∈ Zr }) = S
● What makes a MIP formulation good?
○ Size: r is small, Q is “simple”
○ Sharp: Projx(Q) = Conv(S)
○ Hereditarily sharp: Sharp after any fixings of the binary variables z
○ Ideal (perfect): ext(Q) ⊆ Rn ⨉ Zr
● Ideal ⇒ sharp, so...
Ideal formulations = Best possible = Our goal
[Page 19]
Neural networks = Piecewise linear functions
● Standard ReLU-based network: x^(ℓ+1) = σ(W^ℓ x^ℓ + b^ℓ), where σ(v) = max{0, v} is applied componentwise
● Big-M formulation for a single ReLU neuron y = max{0, w·x + b} over L ≤ x ≤ U:
  y ≥ w·x + b
  y ≤ w·x + b − M⁻(1 − z)
  y ≤ M⁺z
  y ≥ 0, z ∈ {0, 1},
  where M⁺ ≥ max{ w·x + b : L ≤ x ≤ U } and M⁻ ≤ min{ w·x + b : L ≤ x ≤ U }
[Figure: a feedforward ReLU network]
● How strong is it?
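A minimal sketch of how big-M constants for one ReLU neuron are typically obtained by interval arithmetic; the weights and box bounds below are illustrative assumptions:

```python
# Compute M+ >= max(w.x + b) and M- <= min(w.x + b) over the box L <= x <= U.
# Each coordinate contributes its best/worst case independently.
w = [2.0, -1.0, 0.5]
b = 0.25
L = [-1.0, -1.0, -1.0]
U = [1.0, 1.0, 1.0]

M_plus = b + sum(max(wi * li, wi * ui) for wi, li, ui in zip(w, L, U))
M_minus = b + sum(min(wi * li, wi * ui) for wi, li, ui in zip(w, L, U))

# These constants plug into the big-M constraints for y = max(0, w.x + b):
#   y >= w.x + b
#   y <= w.x + b - M_minus * (1 - z)
#   y <= M_plus * z
#   y >= 0,  z binary
```

These interval bounds are exact for a single affine layer; propagated through many layers they loosen, which is one reason the big-M relaxation can be weak.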
[Page 20]
MIP formulation strength
● How strong is it? Not very! Can be arbitrarily bad, even in fixed input dimension
[Figure: LP relaxation of the big-M formulation vs. the tightest possible formulation]
● How to close the gap?
[Page 21]
An ideal formulation for ReLU neurons
Theorem (Anderson, H., Tjandraatmadja, Vielma 2018)
● An ideal formulation for y = max{0, w·x + b} with x ∈ [L, U] is
  y ≥ w·x + b   (1a)
  y ≤ Σ_{i∈I} w_i(x_i − L̆_i(1 − z)) + z(b + Σ_{i∉I} w_i Û_i) for each I ⊆ supp(w)   (1b)
  (x, y, z) ∈ [L, U] ⨉ R≥0 ⨉ {0, 1},   (1c)
  where Û_i = U_i, L̆_i = L_i if w_i ≥ 0, and Û_i = L_i, L̆_i = U_i otherwise.
● Each inequality in (1b) is facet-defining (under very mild conditions).
● Moreover, we can identify the most violated constraint in (1b) in O(n) time.
Big-M formulation = (1a), (1c), and two constraints from (1b)
Idea: Start with the big-M formulation, use cut callbacks to separate (1b) as needed
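The O(n) separation over (1b) can be sketched as a single coordinate scan: for each i, choose whichever side (i ∈ I or i ∉ I) minimizes the right-hand side of the cut at the current fractional point. This is a sketch, not the paper's exact pseudocode, and the weights, bounds, and fractional point in the example are illustrative:

```python
def most_violated_cut(w, b, L, U, x_hat, y_hat, z_hat):
    """Pick, per coordinate, the cheaper contribution to the (1b) right-hand
    side; a positive violation means the cut y <= rhs is violated."""
    I, rhs = [], b * z_hat
    for i, wi in enumerate(w):
        L_breve = L[i] if wi >= 0 else U[i]  # bound minimizing wi * x_i
        U_hat = U[i] if wi >= 0 else L[i]    # bound maximizing wi * x_i
        keep = wi * x_hat[i] - (1.0 - z_hat) * wi * L_breve  # i in I
        drop = z_hat * wi * U_hat                            # i not in I
        if keep < drop:
            I.append(i)
            rhs += keep
        else:
            rhs += drop
    return I, rhs, y_hat - rhs

# A point feasible for the big-M relaxation of y = max(0, x1 + x2) on [0,1]^2,
# but cut off by (1b):
I, rhs, violation = most_violated_cut(
    [1.0, 1.0], 0.0, [0.0, 0.0], [1.0, 1.0], [1.0, 0.0], 1.0, 0.5)
```

Here the fractional point satisfies every big-M constraint, yet the scan finds a cut with right-hand side 0.5 — exactly the kind of point the cut callbacks would remove.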
[Page 22]
How do we get here?
Modeling the maximum of d affine functions over shared input domain D:
Step 1: Write down ideal “multiple choice” formulation (i.e. the “Balas” formulation):
[Page 23]
How do we get here?
Modeling the maximum of d affine functions over shared input domain D:
Step 2: Re-write constraints in “set” form:
where
[Page 24]
How do we get here?
Modeling the maximum of d affine functions over shared input domain D:
Step 3: Rewrite all logic as bounds on output y (a primal characterization):
where
[Page 25]
How do we get here?
Modeling the maximum of d affine functions over shared input domain D:
Step 4: Apply Lagrangian relaxation to aggregation constraints (a dual characterization)
End of analysis: in general, the resulting family of inequalities is complicated. Can separate over it via a subgradient method, or...
[Page 26]
How do we get here?
Modeling the maximum of d affine functions over shared input domain D:
Step 1: Write down ideal “multiple choice” formulation (i.e. the “Balas” formulation):
[Page 27]
How do we get here?
Modeling the maximum of d affine functions over shared input domain D:
Step 2: Re-write constraints in “set” form:
where
[Page 28]
How do we get here?
Modeling the maximum of d affine functions over shared input domain D:
Step 3: Relax domain constraint:
Still a valid formulation.
[Page 29]
How do we get here?
Modeling the maximum of d affine functions over shared input domain D:
Step 4: Rewrite all logic as bounds on output y (a primal characterization):
[Page 30]
How do we get here?
Modeling the maximum of d affine functions over shared input domain D:
Step 5: Replace lower bounds for a hereditarily sharp formulation:
[Page 31]
How do we get here?
Modeling the maximum of d affine functions over shared input domain D:
Step 6: Apply Lagrangian relaxation to aggregation constraints:
[Page 32]
How do we get here?
Modeling the maximum of d affine functions over shared input domain D:
Step 7: Analyze further:
● Proposition If d = 2, then the hereditarily sharp formulation is ideal.
● Proposition If the domain is a product of simplices, separation over hereditarily sharp formulation reduces to a transportation problem.
● Proposition If d = 2 and the domain is a product of simplices, the transportation problem has a closed-form solution, yielding efficient separation.
[Page 33]
An ideal formulation for ReLU neurons
Theorem (Anderson, H., Tjandraatmadja, Vielma 2018)
● An ideal formulation for y = max{0, w·x + b} with x ∈ [L, U] is
  y ≥ w·x + b   (1a)
  y ≤ Σ_{i∈I} w_i(x_i − L̆_i(1 − z)) + z(b + Σ_{i∉I} w_i Û_i) for each I ⊆ supp(w)   (1b)
  (x, y, z) ∈ [L, U] ⨉ R≥0 ⨉ {0, 1},   (1c)
  where Û_i = U_i, L̆_i = L_i if w_i ≥ 0, and Û_i = L_i, L̆_i = U_i otherwise.
● Each inequality in (1b) is facet-defining (under very mild conditions).
● Moreover, we can identify the most violated constraint in (1b) in O(n) time.
Big-M formulation = (1a), (1c), and two constraints from (1b)
Idea: Start with the big-M formulation, use cut callbacks to separate (1b) as needed
[Page 34]
Hereditarily sharp formulation for max-of-d affine functions
● Max of d affine functions ≡
○ Max pooling (small d)
○ Reduce max (large d)

Proposition (Anderson, H., Ma, Tjandraatmadja, Vielma 2019)
● A hereditarily sharp MIP formulation for the maximum of d affine functions is:
● The most violated inequality can be identified in O(dn) time.
[Page 35]
Computational Results
[Page 36]
Network 1: Small network with standard training
Network is (almost) completely dense
[Page 37]
Network 2: Small network with L1 regularization (Xiao et al. 2019)
Weight sparsity makes cuts much more effective
[Page 38]
Q-learning: Are optimal actions tractable?
MIP is very slow for early policy iterations
Active-index local search and gradient descent solves take ~100 ms
Hybrid approach:
1. Run local search to produce decent solutions quickly
2. Feed into MIP with a small time budget
Ongoing joint work with Craig Boutilier, Martin Mladenov, and Moonkyung Ryu (Google Research)
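The hybrid approach above can be sketched as a cheap coordinate-descent local search whose incumbent would then warm-start the MIP solve (stubbed out here); the objective, starting point, and step size are illustrative assumptions:

```python
def local_search(f, x0, step=0.1, iters=100):
    """Greedy coordinate moves: accept any step that lowers f; stop when
    no single-coordinate move of size `step` improves."""
    x = list(x0)
    for _ in range(iters):
        improved = False
        for i in range(len(x)):
            for d in (-step, step):
                trial = list(x)
                trial[i] += d
                if f(trial) < f(x):
                    x, improved = trial, True
        if not improved:
            break
    return x

def f(x):
    """Stand-in for the negated Q-value of a 2-D continuous action."""
    return (x[0] - 0.4) ** 2 + (x[1] + 0.3) ** 2

incumbent = local_search(f, [0.0, 0.0])
# `incumbent` would now be passed to the MIP solver as a warm start,
# so the small time budget is spent proving optimality or improving it.
```

The division of labor mirrors the slide: local search supplies a decent feasible action in milliseconds; the MIP contributes the bound side.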
[Page 39]
Q-learning: Are optimal actions better?
[Page 40]
Q-learning: Are optimal actions better?
[Page 41]
Conclusion
● Strong MIP formulations for optimizing over trained neural networks
● Applications abound: verification, drug design, reinforcement learning, etc.
● Framework of independent interest: recipes for strong formulations for the max of d affine functions
● Questions going forward:
○ Separation-based algorithm and implementation
○ How to train the network for optimization in a principled way
○ How best to formulate an entire network (not just individual neurons)