Approximability of Combinatorial Optimization Problems with Submodular Cost Functions


Approximability of Combinatorial Problems with Multi-agent Submodular Cost Functions

Pushkar Tripathi, Georgia Institute of Technology
Approximability of Combinatorial Optimization Problems with Submodular Cost Functions
Based on joint work with Gagan Goel, Chinmay Karande, and Wang Lei

Motivation: Network Design Problem

Objective: Find a minimum spanning tree that can be built collaboratively by these agents (whose cost functions are f, g, and h).

As a motivating example, consider the following network design problem. We have an undirected graph G as shown here, and three agents who have cost functions over subsets of edges. The agents wish to collectively build an MST for the graph.

Additive cost function: cost(a) = 1, cost(b) = 1, cost(a,b) = 2.
Functions which capture economies of scale: cost(a) = 1, cost(b) = 1, cost(a,b) = 1.5.

How do we mathematically model such functions? We use submodular functions as a starting point.

Can one design efficient approximation algorithms under submodular cost functions? This problem is easy to solve with linear cost functions; however, we wish to model a setting that captures economies of scale. We will use submodular functions for this purpose as a starting point. The question we seek to answer is whether we can design efficient approximation algorithms in this paradigm.

Assumptions on the cost functions:
Normalized: f(∅) = 0
Monotone: f(S) ≤ f(T) for all S ⊆ T

Decreasing marginals (submodularity): f(S ∪ {x}) − f(S) ≥ f(T ∪ {x}) − f(T) for all S ⊆ T and x ∉ T

Submodular Functions. Here are some natural properties that the cost functions must satisfy. Normalized: the cost of the empty set must be zero. Monotone: adding elements can only increase the cost. And they must satisfy the decreasing-marginals inequality above. In simple words, this inequality means that if we have sets S and T such that S is contained in T, then the incremental cost of adding an element to S is more than that of adding the same element to T. This captures the economies of scale observed in most real-world applications: it is often cheaper to extend a large set of items than to extend a smaller one.
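To make these three conditions concrete, here is a tiny brute-force checker (our own illustration, not from the talk) applied to the two cost functions from the motivation slide; both the additive function and the economies-of-scale function pass.

```python
from itertools import combinations

def subsets(ground):
    return [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]

def check_cost_function(f, ground):
    """Brute-force check of the three slide properties for a set
    function f given as a dict from frozensets to costs."""
    assert f[frozenset()] == 0                         # normalized
    for S in subsets(ground):
        for T in subsets(ground):
            if S <= T:
                assert f[S] <= f[T]                    # monotone
                for x in ground - T:                   # decreasing marginals
                    assert f[S | {x}] - f[S] >= f[T | {x}] - f[T]
    return True

ground = frozenset('ab')
additive = {frozenset(): 0, frozenset('a'): 1, frozenset('b'): 1, frozenset('ab'): 2}
scale    = {frozenset(): 0, frozenset('a'): 1, frozenset('b'): 1, frozenset('ab'): 1.5}
print(check_cost_function(additive, ground), check_cost_function(scale, ground))
```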

Such functions are called submodular functions, and they have been an area of intense research over the last five years, owing to their economic importance and interesting structure.

General Framework. Ground set X and a collection C ⊆ 2^X (e.g., C is the set of all tours, or the set of all spanning trees).

k agents, each specifying f_i : 2^X → R^+, where each f_i is submodular and monotone.

Find S_1, …, S_k such that ∪_i S_i ∈ C and Σ_i f_i(S_i) is minimized. The functions are accessed through value oracles: a query S is answered with f(S). This is the general framework for our problems.

Our Results. Lower bounds: information theoretic. Upper bounds: rounding of configuration LPs, approximating submodular functions, and greedy.

Problem | Single Agent UB | Single Agent LB | Multiple Agents UB | Multiple Agents LB
Vertex Cover | 2 | 2 − ε | O(log n) | Ω(log n)
Shortest Path | O(n^{2/3}) | Ω(n^{2/3}) | O(n^{2/3}) | Ω(n^{2/3})
Spanning Tree | O(n) | Ω(n) | O(n) | Ω(n)
Perfect Matching | O(n) | Ω(n) | O(n) | Ω(n)

Note that the last three problems are solvable in polynomial time for linear cost functions but become extremely hard for submodular functions, whereas vertex cover, which is NP-hard, is comparatively easier in this setting. Our lower bounds are information theoretic.

Selected Related Work
[Grötschel, Lovász, Schrijver 81] Minimizing a (not necessarily monotone) submodular function is polynomial time.
[Feige, Mirrokni, Vondrák 07] Maximizing a non-monotone function is hard; 2/5-approximation algorithm.
[Calinescu, Chekuri, Pál, Vondrák 08] Maximizing a monotone function subject to a matroid constraint: (1 − 1/e)-approximation.

[Svitkina, Fleischer 09] Upper and lower bounds for submodular load balancing, sparsest cut, balanced cut.
[Iwata, Nagano 09] Bounds for submodular vertex cover, set cover.
[Chekuri, Ene 10] Bounds for submodular multiway partition.

In this talk: Submodular Shortest Path with a single agent — an O(n^{2/3})-approximation algorithm and a matching hardness of approximation.

Submodular Shortest Path. Given: a graph G = (V, E) with |V| = n and |E| = m, two nodes s and t, and f : 2^E → R^+ submodular and monotone. Goal: find an s-t path P such that f(P) is minimized.

Attempt 1: Approximate f by an additive function. Let w_e = f({e}). Idea: max_{e ∈ OPT} w_e ≤ f(OPT) ≤ Σ_{e ∈ OPT} w_e.
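The sandwich follows from monotonicity (left inequality) and subadditivity of monotone submodular functions (right inequality); written out for the record:

```latex
\max_{e \in \mathrm{OPT}} w_e
  \;\le\; f(\mathrm{OPT})
  \;\le\; \sum_{e \in \mathrm{OPT}} f(\{e\})
  \;=\; \sum_{e \in \mathrm{OPT}} w_e
```

where the upper bound repeatedly applies f(S ∪ {e}) − f(S) ≤ f({e}), i.e., the decreasing-marginals property with S compared against the empty set.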

1. Guess e* = argmax { w_e | e ∈ OPT }
2. Pruning: remove edges costlier than e*
3. Search: find the s-t path with the fewest edges in the residual graph

ALG ≤ diameter(G) · w_{e*} ≤ diameter(G) · f(OPT), where the diameter is that of the residual graph.

As a first attempt, let us try to approximate f by a linear cost function. We use w_e to denote the cost of an edge e. The idea is that simply guessing the costliest edge in the optimal path gives a good enough approximation if the optimal path is not too long. So suppose e* is the costliest edge in the optimal path. Remove all edges that are costlier than e* from the graph and find the s-t path with the fewest edges; this path can be as long as the diameter of the residual graph.
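Before moving on, a minimal sketch of this attempt (our own rendering; `f` is assumed to be a value oracle accepting sets of sorted edge tuples, and we brute-force the guess of e*):

```python
from collections import deque

def fewest_edge_path(edges, s, t):
    """BFS: the s-t path with the fewest edges, as sorted edge tuples."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    parent, queue = {s: None}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if t not in parent:
        return None
    path, node = [], t
    while parent[node] is not None:
        path.append(tuple(sorted((parent[node], node))))
        node = parent[node]
    return path[::-1]

def attempt1(edges, s, t, f):
    """Guess e*, prune costlier edges, search for the fewest-edge path."""
    edges = [tuple(sorted(e)) for e in edges]
    w = {e: f({e}) for e in edges}                       # singleton costs w_e
    best, best_cost = None, float('inf')
    for e_star in edges:                                 # 1. guess e*
        pruned = [e for e in edges if w[e] <= w[e_star]] # 2. prune
        path = fewest_edge_path(pruned, s, t)            # 3. search
        if path and f(set(path)) < best_cost:
            best, best_cost = path, f(set(path))
    return best, best_cost
```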

Thus we have a chain of inequalities showing that if the diameter of the residual graph is small, then we have a good approximation to the optimal solution.

Attempt 2: Ellipsoid Approximation. John's theorem: for every (centrally symmetric) polytope P, there exists an ellipsoid contained in it that can be scaled by a factor of O(√n) to contain P. [GHIM 09]: if the convex body is a polymatroid, then there is a polynomial-time algorithm to compute the ellipsoid. For a submodular, monotone f, the polymatroid is P = { x : x(e) ≥ 0 for all e, and Σ_{e ∈ S} x(e) ≤ f(S) for all S }.

Approximating Submodular Functions [GHIM 09]: for a monotone submodular f on a ground set X with |X| = n, one can compute in polynomial time weights d_e such that g(S) = √(Σ_{e ∈ S} d_e) satisfies g(S) ≤ f(S) ≤ √n · g(S).

f : 2^E → R^+ submodular, monotone.
STEP 1: Compute the ellipsoidal approximation, i.e., weights {d_e} with g(S) = √(Σ_{e ∈ S} d_e) [GHIM 09].
STEP 2: Min g(S) s.t. S ∈ PATH(s,t). (Minimizing g(S) is equivalent to minimizing just the additive part.)
Analysis: with P the optimum path under g and O the optimum path under f,
f(P) ≤ √m · g(P) ≤ √m · g(O) ≤ √m · f(O).

We begin by converting the given cost function into a simpler form using the ellipsoidal approximation. We then find the shortest s-t path with respect to this simplified cost function. Note that this is easy to do, since for the purpose of the algorithm we can ignore the square-root sign.
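Because √· is increasing, minimizing g(S) = √(Σ_{e ∈ S} d_e) over s-t paths is just an ordinary shortest-path computation on the additive weights d_e. A sketch (the weights `d` are assumed to come from the [GHIM 09] construction, treated here as given):

```python
import heapq, math

def min_g_path(adj, d, s, t):
    """Dijkstra on the additive weights d[(u, v)] (keys are sorted tuples);
    returns (g-value, path edges) or None if t is unreachable."""
    dist, parent, heap = {s: 0.0}, {s: None}, [(0.0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist.get(u, float('inf')):
            continue
        for v in adj.get(u, []):
            nd = du + d[tuple(sorted((u, v)))]
            if nd < dist.get(v, float('inf')):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if t not in parent:
        return None
    path, node = [], t
    while parent[node] is not None:
        path.append(tuple(sorted((parent[node], node))))
        node = parent[node]
    return math.sqrt(dist[t]), path[::-1]   # g(P) = sqrt of the additive sum
```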

As for the analysis: the chain of inequalities above shows that the returned path is within the approximation factor of the optimum.

Recap. Approximating by linear functions works for graphs with small diameter. Approximating by ellipsoidal functions works for sparse graphs. [Figure: a dense graph with large diameter, built from blocks of n/2 vertices.]

We have one technique that works well for small-diameter graphs, and another that works well for sparse graphs. However, neither works well for dense graphs with large diameter. Next I will show how to adaptively combine these techniques to get an algorithm that works well for all graphs.

Algorithm for Shortest Path
STEP 1: Pruning — let w_e = f({e}); guess edge e* = argmax { w_e | e ∈ OPT path }; remove edges costlier than w_{e*}.

The algorithm begins the same way as our first attempt: we guess the most expensive edge in the optimal path and prune away all edges costlier than it.

STEP 2: Contraction — while there exists v such that degree(v) > n^{1/3}, contract the neighborhood of v.

The aim of the second phase is to identify the dense components of the graph. We do this by iteratively contracting the neighborhoods of all vertices with large degree. [Figure: the graph before and after contraction; each dense connected component on the s-t path collapses to a single vertex.]

STEP 3: Ellipsoid Approximation — compute the ellipsoidal approximation (d, g) for the residual graph.

In the third phase we use the ellipsoidal approximation to approximate the submodular function f on the residual graph by a simpler function g.

STEP 4: Search — find the shortest s-t path according to g.

Then comes the search step, where we use standard techniques to find the shortest s-t path with respect to the function g in the residual graph. [Figure: the resulting s-t path in the residual graph.] This path may pass through some of the contracted vertices, which are not present in the original graph. All that is left is to use this path to find a path in the original graph.

Algorithm for Shortest Path (complete)
STEP 1: Pruning — let w_e = f({e}); guess edge e* = argmax { w_e | e ∈ OPT path }; remove edges costlier than w_{e*}.
STEP 2: Contraction — while there exists v such that degree(v) > n^{1/3}, contract the neighborhood of v.
STEP 3: Ellipsoid Approximation — compute the ellipsoidal approximation (d, g) for the residual graph.
STEP 4: Search — find the shortest s-t path according to g.
STEP 5: Reconstruction — replace the path through each contracted vertex with one having the fewest edges.

We call the last phase reconstruction: we undo the contraction step and replace the path through each contracted component by the path having the fewest edges. [Figure: the reconstructed path; across each dense component we take the path with the fewest edges.]
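For completeness, a structural sketch of the five phases in Python (our own glue code: `contract_dense`, `ellipsoid_approx`, and `reconstruct` are assumed black-box helpers for steps 2, 3, and 5, and we assume s and t survive contraction; the actual algorithm does more careful bookkeeping):

```python
import networkx as nx

def shortest_path_submodular(G, s, t, f, contract_dense, ellipsoid_approx, reconstruct):
    """Skeleton of the five-phase algorithm; helpers are black boxes.
    contract_dense(H, thresh)     -> (residual graph, contraction mapping)
    ellipsoid_approx(H, f)        -> additive weights d[frozenset(e)]
    reconstruct(G, path, mapping) -> path in the original graph."""
    n = G.number_of_nodes()
    w = {frozenset(e): f({e}) for e in G.edges()}
    best, best_cost = None, float('inf')
    for e_star in G.edges():                                    # STEP 1: guess e*
        H = G.edge_subgraph(e for e in G.edges()
                            if w[frozenset(e)] <= w[frozenset(e_star)]).copy()
        H, mapping = contract_dense(H, n ** (1 / 3))            # STEP 2: contraction
        d = ellipsoid_approx(H, f)                              # STEP 3: approximation
        nx.set_edge_attributes(H, {e: d[frozenset(e)] for e in H.edges()}, 'd')
        try:                                                    # STEP 4: search under g
            nodes = nx.shortest_path(H, s, t, weight='d')
        except (nx.NodeNotFound, nx.NetworkXNoPath):
            continue
        path = reconstruct(G, list(zip(nodes, nodes[1:])), mapping)   # STEP 5
        if f(set(path)) < best_cost:
            best, best_cost = path, f(set(path))
    return best, best_cost
```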

Analysis. Let R be the set of edges in the residual graph after the contraction step, and divide the solution returned by our algorithm into two parts: P1, the part of the solution that lies in R, and P2, the edges that belong to the dense components.

Bounding the cost of P1.

R has at most n^{4/3} edges, so

f(P1) ≤ √|E(R)| · g(P1) ≤ √|E(R)| · g(OPT) ≤ √|E(R)| · f(OPT) ≤ n^{2/3} · f(OPT)

We bound the costs of P1 and P2 relative to OPT. The first thing to note is that, since the degree of each vertex in R is at most n^{1/3}, the total number of edges in R is at most n^{4/3}.

In this chain, the first and third inequalities again follow from the ellipsoidal approximation, the second holds because we chose the shortest path with respect to g in the search phase, and the last line uses the fact that R has at most n^{4/3} edges.

Bounding the cost of P2. Every vertex in a dense region G_i has degree larger than n^{1/3}, so Diam(G_i) ≤ |G_i| / n^{1/3}, and therefore

f(P2) ≤ (Diam(G_1) + ⋯ + Diam(G_k)) · w_{e*} ≤ (|G_1|/n^{1/3} + ⋯ + |G_k|/n^{1/3}) · w_{e*} ≤ (n / n^{1/3}) · w_{e*} = n^{2/3} · w_{e*} ≤ n^{2/3} · f(OPT)

Since we pick the path with the fewest edges through each dense region, in the worst case we use as many edges as the diameter while bridging across a dense region.

Summing over all dense regions, we see that here too the cost of segment P2 is at most O(n^{2/3}) times the optimal solution.

In this talk: Submodular Shortest Path with a single agent — an O(n^{2/3})-approximation algorithm and a matching hardness of approximation. We now turn to the hardness.

Information Theoretic Lower Bound. The algorithm makes a polynomial number of queries to the oracle, but is allowed an unbounded amount of time to process the results of the queries; such a bound is not contingent on P vs NP.

Let me define what we mean by an information-theoretic lower bound. Suppose we are given a function f in the form of a value oracle: each query S_i is answered with f(S_i). We are allowed to make polynomially many queries to f, and the algorithm is allowed an unbounded amount of time to process the results of these queries. If we can show that, even given infinite time, no algorithm can find the optimal solution using only polynomially many queries, then the problem is said to have an information-theoretic lower bound.

Intuitively this means that computation is not the bottleneck; it is the lack of information that prevents us from finding the optimal solution.

Note that such a bound is not contingent on P vs NP, since we allow unbounded computation time; it is much stronger than that.

General Technique. Exhibit cost functions f and g satisfying:
1. OPT(f) >> OPT(g)
2. f(S) = g(S) for most sets S

Let A be any randomized algorithm.

Require f(Q) = g(Q) with high probability for every query Q made by A, where the probability is over the random bits of A.

Now, how do we prove something as strong as that? Let us assume we have two functions f and g which satisfy these two properties. The first property says that the optimal value of f is much larger than the optimal value of g.

The second property demands that f and g agree on most sets S. Let A be any randomized algorithm such that f(Q) = g(Q) with high probability for every query Q made by A; here the probability is computed over the random bits of the algorithm. In this talk, "with high probability" means with all but exponentially small probability.

Also note that this must hold for every algorithm A, not just a particular one.

Yao's Lemma. The requirement "f(Q) = g(Q) with high probability for every query Q made by a randomized algorithm A" is equivalent to:

a fixed f and a distribution D from which we choose g, such that for an arbitrary (now deterministic) query Q, f(Q) = g(Q) with high probability over the choice of g.

Now we use Yao's lemma to convert a statement about all randomized algorithms on a deterministic input into one about deterministic algorithms on a random input.

Hence the previous requirement, that f and g agree with high probability on every query made by an arbitrary randomized algorithm A, is equivalent to the following statement:

f is a fixed function and g is chosen from a distribution D such that, for any fixed query Q, f(Q) = g(Q) with high probability.

Before we jump into the analysis of the lower bound, let us warm up by deriving a lower bound while ignoring the combinatorial structure.

Non-combinatorial Setting. X: ground set.

f(S) = min{ |S|, α }
D: pick R ⊆ X uniformly at random with |R| = α, and set
g_R(S) = min{ |S ∩ R^c| + min(|S ∩ R|, β), α }

In order to show a lower bound, we need to define a function f and a distribution from which we choose g. The function f is defined above; to choose g, we pick a random subset R of size α from X and define g_R as shown. The constants α and β will be determined later. The next step is to show that these two functions are hard to distinguish; to do so, we derive properties of the optimal query, i.e., the query with the largest probability of distinguishing f from g. In particular, we will show that the optimal query has size α.
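A quick empirical rendering of the pair (our own demo, not from the talk; n = 10,000 and δ = 0.5 are arbitrary choices, with β = (1+δ)·log²n as set on the following slides):

```python
import math, random

def make_oracles(n, alpha, beta, seed=0):
    """f and g_R over ground set {0, ..., n-1}; R is a random alpha-subset."""
    R = frozenset(random.Random(seed).sample(range(n), alpha))
    f = lambda S: min(len(S), alpha)
    g = lambda S: min(len(S - R) + min(len(S & R), beta), alpha)  # S - R = S ∩ R^c
    return f, g, R

n = 10_000
alpha = int(math.isqrt(n) * math.log(n))        # alpha = sqrt(n) log n
beta  = int(1.5 * math.log(n) ** 2)             # beta = (1 + delta) log^2 n
f, g, R = make_oracles(n, alpha, beta)

rng = random.Random(1)
queries = [frozenset(rng.sample(range(n), alpha)) for _ in range(1000)]
print(sum(f(Q) != g(Q) for Q in queries))       # almost always 0: hard to separate
print(f(R), g(R))                               # alpha vs ~beta: a large gap on R
```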

Here the constants alpha and beta will be determined later The next step is to show that these two functions are hard to distinguish. To do so we derive properties of the optimal query that has the largest probability of distinguishing f and g.in particular we will show that the optimal query has to be of size alpha32Optimal Query Claim : Optimal query has size Case 1 : |Q| <

This probability can only increase if we increase |Q|.

Consider a query Q of size less than α. If f and g are to differ on Q, g(Q) has to be strictly less than f(Q). Substituting the values of f and g, the probability of this event is Pr[|Q ∩ R| > β]. Observe that this probability can only increase if we increase the size of Q; thus such a query is not optimal.

Observe that this prob. Can only increase if we increase size of Q, thus this was not an optimal query33Case 2 : |Q| >

This probability can only increase if we decrease |Q|.

Now consider the case when the query is larger than α: the probability of distinguishing can only increase if we decrease |Q|. Hence the optimal query size to distinguish f and g_R is α.

Distinguishing f and g_R

Set β = (1 + δ)·E[|Q ∩ R|]; then, by Chernoff bounds, f and g are hard to distinguish. Finally, let us calculate the probability of distinguishing f and g using a query of size α: it equals Pr[|Q ∩ R| > β].

If we set β slightly above the expected value of |Q ∩ R| and ensure that δ²·E[|Q ∩ R|] is super-logarithmic, the probability of distinguishing f and g becomes negligible by Chernoff bounds. Here is an interesting corollary of what we just showed.

Hardness of Learning Submodular Functions. Set α = √n · log n. Then:
Optimal query size = α = √n · log n
|R| = α = √n · log n
E[|Q ∩ R|] = α²/n = log²n
β = (1 + δ)·E[|Q ∩ R|] = (1 + δ)·log²n

With δ²·E[|Q ∩ R|] super-logarithmic, f and g are indistinguishable using polynomially many value queries, yet
f(R) = min{ |R|, α } = |R| = α = √n · log n
g_R(R) = min{ |R ∩ R^c| + min(|R ∩ R|, β), α } = β ≈ log²n

Corollary: it is hard to learn a submodular function to a factor better than √n / log n with polynomially many value queries.

The takeaway is that we have two functions, f and g_R, that are hard to distinguish yet have vastly different values on the set R. Here R serves as a proxy for the optimal solution of our problem; thus g has a much smaller optimal value than f.
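The Chernoff step can also be checked exactly (a sketch using SciPy's hypergeometric tail; the parameters follow the slide, with δ = 0.5 as an illustrative choice):

```python
import math
from scipy.stats import hypergeom

n = 1_000_000
alpha = int(math.isqrt(n) * math.log(n))   # |R| and the optimal query size
mu = alpha ** 2 / n                        # E|Q ∩ R| = log^2 n
beta = int((1 + 0.5) * mu)                 # beta = (1 + delta) E|Q ∩ R|

# Exact tail Pr[|Q ∩ R| >= beta] for a uniformly random query Q of size alpha:
print(hypergeom.sf(beta - 1, n, alpha, alpha))   # negligible, as Chernoff predicts
```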

Difficulty in the Combinatorial Setting: a randomly chosen set may not be a feasible solution; e.g., a randomly chosen set of edges rarely yields an s-t path. Solution: do not choose R randomly from the entire domain X; use a subset of R as a proxy for the solution.

Such sets rarely yield an s-t path unless the graph is very dense; but as we saw, dense graphs have small diameter, and for that case we already have a good algorithm.

Base Graph G. [Figure: s and t joined by a layered graph with n^{2/3} levels of n^{1/3} vertices each.]

Functions f and g. Partition the edges into two sets: Y, the edges in the odd levels, and B, the edges in the even levels. The functions f and g are defined so that they ignore the edges in Y, i.e., f(S) = f(S ∩ B) and g(S) = g(S ∩ B). Concretely,

f(S) = min( |S ∩ B|, α )
g_R(S) = min{ |S ∩ R^c ∩ B| + min(|S ∩ R ∩ B|, β), α }, where R is a uniformly random subset of B of size |R|.

Solution: do not choose R randomly from the entire domain X; use a subset of R as a proxy for the solution.


Set |R| = n^{2/3} log²n. We use the second statement above as a guiding principle in determining the size of R: we want a subset of R to serve as a proxy for the optimal solution under the function g. If we set |R| = n^{2/3} log²n, then by a coupon-collector argument some edge of R lands in each of the even levels with high probability.
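The coupon-collector step can be checked with a one-line union bound (our own back-of-the-envelope computation): each even level of n^{1/3} parallel edges is missed by a uniform R of size n^{2/3} log²n with probability about e^{−log²n}.

```python
import math

def miss_probability(n):
    """Union bound on Pr[some even level avoids R]: n^(2/3) levels of
    n^(1/3) parallel edges each (|B| ~ n), |R| = n^(2/3) log^2 n."""
    levels = n ** (2 / 3)
    r = n ** (2 / 3) * math.log(n) ** 2
    per_level = math.exp(-r / levels)      # ~ (1 - 1/levels)^r, one level missed
    return levels * per_level              # union bound over all levels

for n in (10**6, 10**9, 10**12):
    print(n, miss_probability(n))          # vanishes rapidly as n grows
```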

Using these edges we can build an s-t path of small cost, roughly β.

Setting the Constants. Set α = n^{2/3} log²n. Then the optimal query size is α = n^{2/3} log²n, and β = log²n.

With these settings f and g are indistinguishable, while
f(OPT) = min{ |R|, α } = |R| = O(n^{2/3} log²n)
g_R(OPT) = β = log²n

Theorem: the Submodular Shortest Path problem is hard to approximate to a factor better than O(n^{2/3}).

Summary (n: number of vertices of the graph G). To summarize, here are our results once again; see the table at the beginning of the talk. Even though these results are tight, the lower bounds are polynomially large, which raises the question: what is the right model for studying economies of scale?

Newer Models: Discount Models

We also considered the discounted price model, a special case of the above that deals with succinctly representable submodular functions. [Figure: agents f, g, h, each offering a discount function E → R on the edges.]

Task: minimize the sum of payments, where each agent's payment is a submodular function of the edges bought from it: f(a) + f(b) + f(c) + ….

Approximability under Discounted Costs [GTW 09]:

Problem | Upper Bound | Lower Bound
Edge Cover | O(log n) | Ω(log n)
Spanning Tree | O(log n) | Ω(log n)
Shortest Path | O(n) | Ω(log^c n)
Minimum Perfect Matching | O(poly log n) | Ω(log n)

Shortest Path: Ω(log^c n) hardness via a reduction from Set Cover. [Figure: a set cover instance (U, S) turned into an s-t shortest-path instance.] There is one agent per set, and the cost of every edge is 1.

Claim: a set cover of size |S| corresponds to a shortest path of cost |S|.

We start with an instance of set cover and generate an instance of the submodular shortest path problem, with the sets on the left and the elements of the universe on the right.

The instance of discounted shortest path is generated as follows. We have an agent corresponding to every set in our collection, with the discount function chosen so that once an agent buys a single edge, it can buy all of its other edges at no extra cost. We use parallel edges between successive elements to represent the sets containing each element of the universe.

Hardness Gap Amplification. [Figure: original instance → harder instance.] Replace each edge by a copy of the original graph.

Edges of the same color get the same copy.

Edges of different colors get copies with new colors (agents).

Claim: the new instance has a solution of cost α² iff the original instance has a solution of cost α.

For any fixed constant c, iterate this construction c times to further amplify the lower bound to Ω(log^c n).
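The amplification step is easy to express as graph substitution (our own sketch using networkx; we assume the gadget graph stores its terminals as graph attributes 's' and 't', and that every edge carries a 'color' attribute naming its agent):

```python
import networkx as nx

def substitute(base, colored):
    """Replace each edge (u, v) of `colored` by a fresh copy of `base`
    glued at u and v. Inner colors are paired with the outer edge's color,
    so same-colored outer edges reuse the same agents while differently
    colored outer edges get brand-new ones."""
    H = nx.Graph()
    s, t = base.graph['s'], base.graph['t']
    for i, (u, v, data) in enumerate(colored.edges(data=True)):
        relabel = {x: (i, x) for x in base.nodes()}   # disjoint copy of the gadget...
        relabel[s], relabel[t] = u, v                 # ...glued at u and v
        for a, b, d in base.edges(data=True):
            H.add_edge(relabel[a], relabel[b], color=(data['color'], d['color']))
    return H
```

Iterating `substitute` c times on the set-cover gadget yields the Ω(log^c n) instance described above.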

Q.E.D. And I am done.

Appendix: why is it so hard to distinguish f and g?
Observation: g_R(S) is at most f(S) for any set S.
Case 1: small queries, |Q| ≤ n.

This probability can only increase if we increase |Q|.

So why is it so hard to distinguish f and g? Let us begin with the observation above: g_R(S) is at most f(S) for every set S. Two cases arise, based on the size of Q. Consider first the case where the query set has size less than n.

By our observation, the probability of distinguishing f and g with some query Q is the same as the probability that g_R(Q) is strictly less than f(Q). We get the second equation by substituting for g_R and using the fact that |Q| < n; the third follows by substituting for f; and rearranging the terms brings us to the final equation.

Thus the probability of distinguishing would only have increased had we chosen a larger set.

Case 2: large queries, |Q| ≥ n.

This probability can only increase if we decrease |Q|.

Now consider the case when the query Q is larger than n. Once again, f and g can be distinguished only by a query on which g_R takes a value strictly lower than f. The second equation follows by substituting for g_R, again using the fact that the query set has size more than n, and substituting for f gives the final equation.

Note that here the probability can only increase if we decrease the size of the query set.

So we conclude that the optimal query size, at which the probability of distinguishing f and g is maximized, is n.

Combinatorial Optimization
C: ground set.
f: valuation function over subsets of C.
X: a collection of subsets of C having a special property.
Task: find the set in X that has minimum cost under a given valuation function.

While we are at it, let us define formally what we mean by combinatorial optimization. We are given a ground set C and a collection X of subsets of C defined by a special property. Our task is to find the element of X that has minimum cost under the given valuation function.

For example, let C be the set of edges of a graph, and let X be the family of edge sets corresponding to the spanning trees of the graph; our task is then to find the MST of this graph.
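As a toy rendering of this definition (ours, not the talk's): enumerate X explicitly and take the minimum under the valuation oracle; here the valuation f = √(additive cost), a monotone submodular function.

```python
from itertools import combinations

def brute_force_min(X, f):
    """Combinatorial optimization when X is explicit: min over X under f."""
    return min(X, key=f)

# C = edges of a 4-cycle plus one chord; X = all spanning trees.
C = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
w = {(0, 1): 1, (1, 2): 4, (2, 3): 1, (0, 3): 9, (0, 2): 1}

def is_spanning_tree(tree, nodes=range(4)):
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v
    for u, v in tree:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False                   # edge closes a cycle
        parent[ru] = rv
    return len(tree) == len(parent) - 1    # acyclic with n-1 edges: spanning

X = [T for T in combinations(C, 3) if is_spanning_tree(T)]
f = lambda S: sum(w[e] for e in S) ** 0.5  # sqrt of additive cost: submodular
print(brute_force_min(X, f))               # the cheapest spanning tree under f
```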

In this work we consider combinatorial optimization problems where X is to be divided among agents who have possibly different submodular valuation functions.

General Technique (continued). [Figure: the oracle answers queries S_1, S_2, S_3 identically under f and g: f(S_1) = g(S_1), f(S_2) = g(S_2), f(S_3) = g(S_3).] A cannot distinguish between f and g, so its output has cost roughly OPT(f), while OPT(f) >> OPT(g). Suppose the given value oracle actually houses the function g, i.e., its optimal value is small, but when the algorithm queries the oracle, the results it gets are the same as what it would have gotten had the oracle contained f.

Thus the algorithm has no reason to believe that the optimal value is small; it can at best return the optimal value for f.

Hence, owing to the lack of information needed to distinguish f and g, the algorithm returns a much larger value, giving rise to an information-theoretic gap.

Plan
1. Fix a cost function f.
2. Fix a distribution D of functions such that OPT(f) >> OPT(g) for every g in D.
3. Ensure that for an arbitrary query Q, f(Q) = g(Q) with high probability.

So here is our plan.

Optimal-size queries: |Q| = n.

Once again, following standard manipulations, we arrive at equation 3. The left-hand side of this equation is less than n only if (1+δ)n/2 is the smaller element in the inner min clause; otherwise we would end up with |Q| on the left-hand side, which is also at least n.

Hence, in order to distinguish f and g, Q must have size n, and it should intersect R in more than β elements.