
Page 1: Design Considerations for GPU-based Mixed Integer Programming

ORNL is managed by UT-Battelle LLC for the US Department of Energy

Design Considerations for GPU-based Mixed Integer Programming on Parallel Computing Platforms

Kalyan Perumalla

Maksudul Alam

International Workshop on Parallel and Distributed Algorithms for Decision Sciences (PDADS) 2021

Int’l Conference on Parallel Processing (ICPP) 2021, August 9-12, 2021

Page 2: Design Considerations for GPU-based Mixed Integer Programming


GPUs enable much of the recent supercomputing power

Page 3: Design Considerations for GPU-based Mixed Integer Programming


Mixed Integer Programming (MIP)

General Problem Formulation


In the next section, a brief background is presented for mixed integer programming and recent advancements in accelerator-based, large-scale parallel computing, along with coverage of related work. This is followed in Section 3 by an identification of different parallel execution strategies for MIP solution over accelerated parallel computing platforms. Key linear algebra support available on the GPU platforms is listed in Section 4 with selected packages suitable for MIP solvers. Section 5 captures important modes in which the linear algebra support of the GPUs is utilized for parallel MIP solution. The article is concluded in Section 6.

2 BACKGROUND

2.1 Mixed Integer Programming

Equation 1 shows the basic structure of a mixed integer program (MIP).

$$
\begin{aligned}
\text{Maximize}\quad & c^T x \\
\text{such that}\quad & A x \le b, \\
\text{where}\quad & x = \{x_r, x_z\},\; x_r \in \mathbb{R}\ \text{(reals)},\ \text{and}\ x_z \in \mathbb{Z}\ \text{(integers)}.
\end{aligned}
\tag{1}
$$

This formulation can be transformed into an equivalent form where the inequality $Ax \le b$ is replaced with an equality ($A\hat{x} = \hat{b}$) by introducing slack variables $s \ge 0$ to capture the inequality slack. Also, upper and lower bounds, if any, on $x$ are implicit in $Ax \le b$.
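As a concrete illustration (ours, not part of the original article), the following minimal numpy sketch shows this standard-form transformation; the matrix A and vector b are hypothetical data, and s denotes the slack variables.

```python
import numpy as np

# Hypothetical constraint data for A x <= b.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([4.0, 6.0])
m, n = A.shape

# Equality form: [A | I] [x; s] = b with slack variables s >= 0.
A_hat = np.hstack([A, np.eye(m)])

# Any x satisfying A x <= b corresponds to slacks s = b - A x >= 0.
x = np.array([1.0, 1.0])
s = b - A @ x
x_hat = np.concatenate([x, s])
assert np.allclose(A_hat @ x_hat, b)
```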

Many parallel solvers have experienced success in solving this formulation using a branch-and-bound approach that employs a divide-and-conquer strategy: the integrality of $x_z \in \mathbb{Z}$ is relaxed so that $x \in \mathbb{R}$, and the relaxed problem, which is a linear programming problem without integer constraints, is solved. The linear program, if integer-infeasible, can be used to partition the solution space (for example, by branching on an appropriate partitioning of one or more $x_i \in x_z$), which splits the problem into two or more sub-problems. The linear program relaxation provides an upper bound on the objective value of the corresponding integer problem. The branches create a tree of sub-problems that are accumulated for systematic evaluation. This basic branch-and-bound procedure is often enhanced with a range of variants such as branch-and-cut, which uses dynamic cut generation with various types of cuts. The issues of branching, priming, variable value fixing, etc. have all been well studied in the past few decades and published in the literature.
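To make the basic procedure concrete, here is a minimal, illustrative Python sketch of LP-relaxation-based branch-and-bound (ours, not the authors' solver). It assumes maximization of c^T x subject to A x <= b with x >= 0, uses scipy.optimize.linprog for the LP relaxations, and branches on the first fractional integer variable; int_idx lists the indices of the integer-constrained variables.

```python
import math
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A, b, int_idx, tol=1e-6):
    """Depth-first branch-and-bound over LP relaxations (illustrative sketch)."""
    best_val, best_x = -math.inf, None
    # Each active node is represented by its per-variable (lower, upper) bounds.
    active = [[(0.0, None)] * len(c)]
    while active:
        bounds = active.pop()
        # LP relaxation; linprog minimizes, so negate c to maximize c^T x.
        res = linprog(-np.asarray(c, dtype=float), A_ub=A, b_ub=b,
                      bounds=bounds, method="highs")
        if not res.success:
            continue                              # infeasible node: prune
        x, val = res.x, -res.fun
        if val <= best_val:
            continue                              # cannot beat incumbent: prune
        frac = [i for i in int_idx if abs(x[i] - round(x[i])) > tol]
        if not frac:
            best_val, best_x = val, x             # integer-feasible: new incumbent
            continue
        # Branch on the first fractional integer variable x_i.
        i = frac[0]
        lo, hi = bounds[i]
        down, up = list(bounds), list(bounds)
        down[i] = (lo, math.floor(x[i]))          # child with x_i <= floor(x_i)
        up[i] = (math.ceil(x[i]), hi)             # child with x_i >= ceil(x_i)
        active.extend([down, up])
    return best_val, best_x
```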

The corresponding branch-and-bound search tree for a MIP is illustrated in Figure 1. All leaves in the tree are evaluated and tagged as feasible, infeasible, or pruned. Intermediate nodes are tagged by their LP solutions and branching variables. Note that some leaves might be tagged as active during search. However, by the completion of the entire search, no nodes remain tagged as active; all of them are converted to feasible, infeasible, or pruned.

Figure 1: Solution tree

A consistent snapshot of the branch-and-bound tree is defined as the set of leaves that preserves the optimal solution to the problem. Two simple consistent snapshots are easy to obtain: (1) the root node alone, and (2) the set of all leaves after the entire search. However, even during the search, snapshots can be obtained that are consistent even though the optimal value is not yet found. In a sequential execution, a consistent snapshot is easy to obtain as follows. As soon as a node is solved and its children, if any, are generated, the active set is examined. The set of all leaf nodes from the active set constitutes a valid consistent snapshot. In a parallel/distributed execution of branch-and-bound or branch-and-cut, a consistent snapshot is non-trivial to obtain. The complexity in this case comes from the fact that all processors need to synchronize with each other to account for (a) nodes that are being evaluated, i.e., whose LP relaxation is being computed, and (b) nodes that are in transit across processors in the absence of a centralized active set, i.e., when a distributed scheme for exchange of active-set nodes is used.
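Tying this to the branch-and-bound sketch above (again our illustration, not the paper's code), a sequential consistent snapshot can be recorded between node expansions simply by copying the current active set of unexplored leaves together with the incumbent:

```python
def take_snapshot(active, best_val, best_x):
    """Record the current leaves (per-node bound lists) and the incumbent.

    Taken between node expansions, this set of leaves preserves the optimal
    solution and suffices to restart the search later.
    """
    return {
        "leaves": [list(bounds) for bounds in active],
        "incumbent": (best_val, None if best_x is None else best_x.copy()),
    }
```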

2.2 Parallel Computing Advancements

Parallel MIP solution methods traditionally have been designed and scaled on large parallel machines built with conventional CPU architectures. In recent years, however, the increases in computational capacity have been obtained not from an increased number of CPU cores but from the addition of GPU-based accelerators. High performance computing in general has come to be dominated by GPUs, so much so that GPUs are predominantly being used for scaling in much of the supercomputing world. In fact, the core computational capacity of seven of the top ten supercomputers is powered by GPUs [1]. The rate of performance gains using GPUs outpaces traditional CPUs by a wide margin. GPUs have become core to modern AI-capable supercomputers. Summit, the fastest supercomputer in the USA (as of this writing) at the Oak Ridge National Laboratory, uses NVIDIA Tesla V100 GPUs to achieve 200 petaFLOPS performance. The upcoming Aurora supercomputer at the Argonne National Laboratory is expected to have 1 exaFLOPS performance using Intel Xe GPUs. Frontier, the successor to Summit at the Oak Ridge National Laboratory, is expected to deliver 1.5 exaFLOPS using AMD's Radeon Instinct GPUs. Lawrence Livermore National Laboratory has also announced the El Capitan supercomputer, to be launched in 2023 with 2 exaFLOPS using AMD's Radeon Instinct GPUs. This trend in the supercomputing arena provides a clear motivation for a focus on GPU-aware algorithms and packages.

[1] http://top500.org

Basic Solution Approach

Page 4: Design Considerations for GPU-based Mixed Integer Programming


CPU vs. GPU-based Parallel MIP Solvers

CPU-based
• Fairly mature technology
• Many open-source implementations available
• Many commercial packages available
• Extremely fast solvers based on advanced
  – Linear algebra
  – Branch-and-Cut/Price
  – Heuristics
• References are provided in our paper

GPU-based
• Very few available to exploit the power of the latest parallel processing
• Technical solution approaches are yet to be fully unraveled
• Need to guide the field with design considerations
  – To evaluate different choices
  – To determine the most promising approach(es)

Page 5: Design Considerations for GPU-based Mixed Integer Programming

Our Identification of GPU-based Parallel Execution Strategies

Entirely GPU-based

• Entire solution tree stored in GPU memory

• Tree updated by GPU only

• Branch-and-cut algorithm performed on GPU

• Linear algebra steps performed on GPU

CPU-driven GPU

• Entire solution tree stored in main memory

• Tree updated by CPU only

• Branch-and-cut algorithm performed by CPU

• Linear algebra steps delegated to GPU (see the sketch after this list)

Hybrid CPU-GPU

• Solution tree split across main memory and GPU memory

• Tree updated by GPU and CPU

• Both CPU and GPU participate as peers in branch-and-cut

• CPU & GPU perform linear algebra steps


Distributed Big-MIP

• Solution tree in small number of main node memories

• Tree updated by CPU only

• Branch-and-cut algorithm by lead-CPU orchestration

• Each linear algebra step on many GPUs

• Optimal MIP problem size: most effective GPU execution, where each tree node occupies one GPU's memory
• Small MIP problem size: specialized GPU execution, where multiple tree nodes fit simultaneously in each GPU
• Huge MIP problem size: every tree node spans many GPUs
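As one concrete possibility (our sketch, not the authors' implementation), the "CPU-driven GPU" strategy above might look as follows in Python with CuPy: the solution tree and the branch-and-cut control flow remain on the host, and only the dense linear-algebra kernel of each node's LP relaxation, here a single Newton-step solve M d = r from an interior-point iteration, is delegated to the GPU. The names M, r, and solve_newton_step_on_gpu are illustrative assumptions.

```python
import numpy as np
import cupy as cp  # GPU array library; assumed available

def solve_newton_step_on_gpu(M, r):
    """Offload one dense solve M d = r to the GPU; return d to the host."""
    M_gpu = cp.asarray(M)                  # host -> device transfer
    r_gpu = cp.asarray(r)
    d_gpu = cp.linalg.solve(M_gpu, r_gpu)  # dense factorization/solve on the GPU
    return cp.asnumpy(d_gpu)               # device -> host transfer

# Hypothetical use inside the CPU-side branch-and-cut loop, once per
# interior-point iteration of a node's LP relaxation:
#   d = solve_newton_step_on_gpu(M, r)
```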

Page 6: Design Considerations for GPU-based Mixed Integer Programming


Linear Algebra Support

Software Considerations
• Matrix packages on GPU
  – Dense matrices: fairly mature on NVIDIA platforms
  – Sparse matrices: not as mature on any platform
  – Not easy to choose between dense and sparse, statically or dynamically
• GPUs are extremely efficient in dense linear algebra
  – Interior point methods for LP relaxation in the branch-and-cut tree may work better

Algorithmic Considerations
• Sharing/reusing solutions across tree nodes
  – E.g., initial vector in iterative solvers
• Incorporation of generated cuts into the GPU matrix structure (see the sketch after this list)
• Concurrent solution of small problems
• Solving multiple nodes simultaneously on the same GPU
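A minimal sketch (ours, not from the paper) of two of the ideas above, assuming a dense, GPU-resident constraint system in CuPy: appending a generated cut a^T x <= beta as a new row without moving the whole matrix back to the host, and reusing a parent node's solution as the initial vector of an iterative solver for the child node. The function names and the choice of conjugate gradients (which assumes a symmetric positive-definite matrix, e.g., a normal-equations system) are illustrative assumptions.

```python
import cupy as cp
from cupyx.scipy.sparse.linalg import cg  # iterative (conjugate-gradient) solver

def append_cut(A_gpu, b_gpu, a_row, beta):
    """Append one generated cut a^T x <= beta to the GPU-resident system."""
    A_new = cp.vstack([A_gpu, cp.asarray(a_row)[None, :]])
    b_new = cp.concatenate([b_gpu, cp.asarray([beta])])
    return A_new, b_new

def warm_started_solve(M_gpu, r_gpu, x_parent_gpu):
    """Solve M x = r iteratively, starting from the parent node's solution."""
    x, info = cg(M_gpu, r_gpu, x0=x_parent_gpu)  # fewer iterations when x0 is close
    return x
```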

References are provided in the paper

Page 7: Design Considerations for GPU-based Mixed Integer Programming


Summary and Future Work

• GPUs dominate current and future parallel processing platforms

• Mixed integer programming (MIP) forms the core of several important applications

• Parallel MIP on GPU-based parallel platforms is not adequately understood

• Here, we unraveled key design considerations towards efficient execution of MIP on GPU-based parallel platforms

• Implementation of the most promising designs in actual software is needed as the next step

Page 8: Design Considerations for GPU-based Mixed Integer Programming

Thank you for your attention!

• Q&A