

Partial Optimality of Dual Decomposition for MAP Inference in Pairwise MRFs

Alexander Bauer (TU Berlin), Shinichi Nakajima (TU Berlin, AIP RIKEN), Nico Görnitz (TU Berlin), Klaus-Robert Müller (TU Berlin, Korea University, MPI for Informatics)

Abstract

Markov random fields (MRFs) are a powerful tool for modelling statistical dependencies for a set of random variables using a graphical representation. An important computational problem related to MRFs, called maximum a posteriori (MAP) inference, is finding a joint variable assignment with the maximal probability. It is well known that the two popular optimisation techniques for this task, linear programming (LP) relaxation and dual decomposition (DD), have a strong connection, both providing an optimal solution to the MAP problem when a corresponding LP relaxation is tight. However, less is known about their relationship in the opposite and more realistic case. In this paper, we explain how the fully integral assignments obtained via DD partially agree with the optimal fractional assignments via LP relaxation when the latter is not tight. In particular, for binary pairwise MRFs the corresponding result suggests that both methods share the partial optimality property of their solutions.

1 Introduction

The framework of graphical models such as Markov random fields (MRFs) [8, 15, 21] provides a powerful tool for modelling statistical dependencies for a set of random variables using a graphical representation. It is of fundamental importance for many practical application areas including natural language processing, information retrieval, computational biology, and computer vision. A related computational problem, called maximum a posteriori (MAP) inference, is finding a joint variable assignment with the maximal probability. The vast number of existing methods for solving the discrete MAP problem (see [7] for an overview) can be divided roughly into three different groups: methods based on graph cuts, methods based on message passing, and polyhedral methods. The latter group is tightly connected to the popular approach of linear programming (LP) relaxation [17, 21, 2, 5], which is based on reformulating the original combinatorial problem as an integer linear problem (ILP) and then relaxing the integrality constraints on the variables. Besides providing a lower bound on the optimal value, its popularity is partially¹ due to the fact that for binary pairwise MRFs with submodular energies it is guaranteed to find an optimal solution [8, 22]. Unfortunately, for bigger problems with a high number of variables and constraints it becomes impractical due to its expensive memory and computation requirements.

As an alternative to the LP relaxation, dual decomposition (DD) [12, 18, 16, 4, 13] provides an effective parallelisation framework for solving the MAP problem. Furthermore, it has the appealing property that any found solution comes with a certificate of optimality, which allows for efficiently checking whether a corresponding variable assignment is primal optimal. Finally, it is well known that the two optimisation techniques have a strong connection, both providing an optimal solution if the LP relaxation is tight. However, less is known about their relationship in the opposite and more realistic case. Instead, the main focus of the existing literature is on tightening the standard LP relaxation [12, 19, 17]. In contrast, the aim of this paper is to investigate the connections between the two techniques (in the original formulation) if the LP relaxation is not tight. More precisely, we focus on the following issue.

¹ It also provides an optimal solution for any tree-structured MRF.


The main idea of DD is to decompose a given MRF into different trees (or other subgraphs) on which inference can be performed efficiently, while enforcing agreement on the overlapping variables between different trees to obtain global consistency. If the LP relaxation is not tight, some trees will provide inconsistent assignments which disagree on the overlapping parts. Here we analyse the nature of this disagreement and obtain the following main results:

• given an optimal (fractional) solution of the LP relaxation, there always exists an optimal variable assignment via DD which agrees with the integral part of the LP solution;

• for binary pairwise MRFs in a non-degenerate case², the unambiguous part among all optimal assignments from different trees in a decomposition coincides with the integral part of the (fractional) optimal assignment via LP relaxation.

We note that the first result also holds for non-binary MRFs with arbitrary higher-order potentials. On the other hand, for binary pairwise MRFs it implies that a corresponding assignment contains the strongly persistent part³ of the LP solution [21, 6]. The second result suggests a strategy for extracting the strongly persistent part by looking at the intersection of all optimal assignments to the overlapping trees. In this sense, both methods, LP relaxation and DD, share a partial optimality property of their solutions.

Note that a corresponding fractional solution obtained via LP relaxation (for binary pairwise MRFs) is half-integral, that is, every fractional node is equal to 0.5. This provides no information about the preferences of the fractional variables in that case, preventing the use of rounding techniques. In contrast, DD always provides a fully integral assignment which inherits the strongly persistent part of the LP relaxation, making DD an appealing optimisation method.

Finally, we argue that, depending on the final goal, there is a decision to make with respect to the degree of a corresponding decomposition. Usually, the main goal is to find an accurate (integral) assignment to the variables in an MRF. In that case a decomposition over spanning trees is more beneficial. It significantly speeds up the convergence but, most importantly, it is straightforward to extract an optimal assignment.

² By a non-degenerate case we mean the case where the LP relaxation has a unique (fractional) solution. That is, the optimum is attained at a corner and not at an edge or a facet of a corresponding polytope.

³ For binary pairwise MRFs, the integral part of an optimal solution of the LP relaxation is known to be strongly persistent. That is, any optimal solution of a corresponding MAP problem must agree with this partial integral assignment.

On the other hand, if we want to extract the strongly persistent part, then a decomposition over edges is more appropriate. In that case it is straightforward to extract the unambiguous part, but at the same time it is NP-hard to construct an optimal (and globally consistent) assignment from the individual edges in a decomposition.

2 Notation and Background

2.1 MAP Inference as an Optimisation Problem

For a set of n discrete variables x = {x_1, ..., x_n} taking values from a finite set S we define the energy of a pairwise MRF factorising over a graph G = (V, E) according to

E(x) = \sum_{i \in \mathcal{V}} \theta_i(x_i) + \sum_{(i,j) \in \mathcal{E}} \theta_{i,j}(x_i, x_j),   (1)

where the functions θ_i(·): S → R and θ_{i,j}(·, ·): S × S → R denote the corresponding unary and pairwise potentials, respectively. The maximum a posteriori (MAP) problem, that is, computing an assignment with the highest probability, is equivalent to the problem of finding an assignment which minimises the energy.
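To make the notation concrete, here is a minimal sketch (our own, not from the paper) of evaluating the energy in Eq. (1) for an integral assignment, with unary potentials stored as an (n, |S|) array and pairwise potentials as a dictionary keyed by edges; the names `energy`, `theta_unary`, `theta_pair` are illustrative.

```python
import numpy as np

def energy(x, theta_unary, theta_pair):
    """Evaluate E(x) from Eq. (1) for an integral assignment x (a list of labels).

    theta_unary: array of shape (n, |S|) with entries theta_i(x_i);
    theta_pair:  dict mapping an edge (i, j) to an |S| x |S| array theta_{i,j}.
    """
    val = sum(theta_unary[i, x[i]] for i in range(len(x)))
    val += sum(theta_pair[(i, j)][x[i], x[j]] for (i, j) in theta_pair)
    return val

# Tiny example: a chain of three binary variables.
theta_unary = np.array([[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]])
theta_pair = {(0, 1): np.array([[0.0, 1.0], [1.0, 0.0]]),
              (1, 2): np.array([[0.0, 1.0], [1.0, 0.0]])}
print(energy([0, 1, 1], theta_unary, theta_pair))  # 0.0 + 0.0 + 0.3 + 1.0 + 0.0 = 1.3
```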

Probably the most popular method for solving this problem is based on the linear programming (LP) relaxation technique. For this purpose, the MAP problem is first represented as an (equivalent) integer linear problem (ILP):

\min_{\mu \in \mathcal{X}_G} \; \theta^\top \mu   (2)

where μ corresponds to a joint variable assignment x in the standard overcomplete representation [21]. That is, θ is a vector with entries θ_i(x_i) for all i ∈ V, x_i ∈ S and θ_{i,j}(x_i, x_j) for all (i, j) ∈ E, x_i, x_j ∈ S, and μ is a binary vector of indicator functions for nodes and edges, μ_i(x_i), μ_{i,j}(x_i, x_j) ∈ {0, 1}, where μ_i(s) = 1 ⇔ x_i = s and μ_{i,j}(s_1, s_2) = 1 ⇔ x_i = s_1 ∧ x_j = s_2. The set X_G corresponds to all valid assignments of a pairwise MRF over a graph G and has the following compact representation:

\mathcal{X}_G := \left\{ \mu \in \mathbb{R}^d \;\middle|\;
\begin{array}{ll}
\sum_{x_i} \mu_i(x_i) = 1 & \forall i \in \mathcal{V} \\
\sum_{x_i} \mu_{i,j}(x_i, x_j) = \mu_j(x_j) & \forall (i,j) \in \mathcal{E},\ \forall x_j \in S \\
\sum_{x_j} \mu_{i,j}(x_i, x_j) = \mu_i(x_i) & \forall (i,j) \in \mathcal{E},\ \forall x_i \in S \\
\mu_i(x_i) \in \{0,1\} & \forall i \in \mathcal{V},\ \forall x_i \in S \\
\mu_{i,j}(x_i, x_j) \in \{0,1\} & \forall (i,j) \in \mathcal{E},\ \forall x_i, x_j \in S
\end{array}
\right\}   (3)


The convex hull of this set, which we denote by M_G := conv(X_G), plays a special role in the optimisation and is known as the marginal polytope of a corresponding MRF. Namely, problem (2) is equivalent to the one where we replace the set X_G by its convex hull M_G, that is,

\min_{\mu \in \mathcal{X}_G} \theta^\top \mu \;=\; \min_{\mu \in \mathcal{M}_G} \theta^\top \mu.   (4)

Since finding an optimal solution of the above ILP, or equivalently minimising its linear objective over the marginal polytope, is in general intractable, we usually consider the following relaxation:

\min_{\mu \in \mathcal{L}_G} \; \theta^\top \mu   (5)

where we optimise over a bigger set L_G ⊇ M_G ⊇ X_G called the local consistency polytope of an MRF over a graph G, which results from relaxing the integrality constraints μ_i(x_i), μ_{i,j}(x_i, x_j) ∈ {0, 1} in the definition of X_G by allowing the corresponding variables to take all real values in the interval [0, 1]. That is,

\mathcal{L}_G := \left\{ \mu \in \mathbb{R}^d \;\middle|\;
\begin{array}{ll}
\sum_{x_i} \mu_i(x_i) = 1 & \forall i \in \mathcal{V} \\
\sum_{x_i} \mu_{i,j}(x_i, x_j) = \mu_j(x_j) & \forall (i,j) \in \mathcal{E},\ \forall x_j \in S \\
\sum_{x_j} \mu_{i,j}(x_i, x_j) = \mu_i(x_i) & \forall (i,j) \in \mathcal{E},\ \forall x_i \in S \\
\mu_{i,j}(x_i, x_j) \geq 0 & \forall (i,j) \in \mathcal{E},\ \forall x_i, x_j \in S
\end{array}
\right\}   (6)

Note that the non-negativity of the unary variables μ_i(x_i) ≥ 0 implicitly follows from the combination of the agreement constraints between node and edge variables and the non-negativity of the latter.
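As an illustration of the relaxation (5)-(6), the following sketch builds the local-polytope LP for a binary pairwise MRF and solves it with SciPy's generic LP solver; the data layout and the function name `solve_lp_relaxation` are our own choices, and the example is meant for small instances only (the paper's point is precisely that this approach does not scale).

```python
import numpy as np
from scipy.optimize import linprog

def solve_lp_relaxation(n_nodes, edges, theta_unary, theta_pair):
    """Local-polytope LP relaxation (5)-(6) for a binary pairwise MRF.

    theta_unary: (n_nodes, 2) array; theta_pair: dict {(i, j): 2x2 array}.
    Returns the optimal node marginals mu_i(x_i) as an (n_nodes, 2) array.
    """
    n_node_vars = 2 * n_nodes
    edge_offset = {e: n_node_vars + 4 * k for k, e in enumerate(edges)}
    n_vars = n_node_vars + 4 * len(edges)

    def nidx(i, s):            # position of mu_i(s) in the variable vector
        return 2 * i + s

    def eidx(e, s, t):         # position of mu_ij(s, t) in the variable vector
        return edge_offset[e] + 2 * s + t

    # Objective theta in the overcomplete representation of Eq. (2)/(5).
    c = np.zeros(n_vars)
    for i in range(n_nodes):
        c[nidx(i, 0):nidx(i, 0) + 2] = theta_unary[i]
    for e in edges:
        c[eidx(e, 0, 0):eidx(e, 0, 0) + 4] = theta_pair[e].ravel()

    A_eq, b_eq = [], []
    for i in range(n_nodes):                       # normalisation constraints
        row = np.zeros(n_vars)
        row[nidx(i, 0)] = row[nidx(i, 1)] = 1.0
        A_eq.append(row); b_eq.append(1.0)
    for (i, j) in edges:                           # marginalisation constraints
        for t in (0, 1):                           # sum_s mu_ij(s, t) = mu_j(t)
            row = np.zeros(n_vars)
            row[eidx((i, j), 0, t)] = row[eidx((i, j), 1, t)] = 1.0
            row[nidx(j, t)] = -1.0
            A_eq.append(row); b_eq.append(0.0)
        for s in (0, 1):                           # sum_t mu_ij(s, t) = mu_i(s)
            row = np.zeros(n_vars)
            row[eidx((i, j), s, 0)] = row[eidx((i, j), s, 1)] = 1.0
            row[nidx(i, s)] = -1.0
            A_eq.append(row); b_eq.append(0.0)

    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, 1), method="highs")
    return res.x[:n_node_vars].reshape(n_nodes, 2)
```

For a frustrated cycle the returned node marginals are typically half-integral (all entries 0.5), which is exactly the non-tight situation analysed in Section 3.2.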

2.2 Optimisation via Dual Decomposition

We now briefly review the DD framework for MAP inference in (pairwise) MRFs [12]. The main idea is to decompose the original intractable optimisation problem (OP) in (2) over a graph G into a set of tractable inference problems over subtrees {T_j}_{j=1}^m, T_j ⊆ G, which are coupled by a set of agreement constraints to ensure consistency. That is, each of the individual subproblems j ∈ {1, ..., m} corresponds to MAP inference on a subtree T_j = (V_j, E_j) of the original MRF. More precisely, we define the following OP:

\min_{\mu^1 \in \mathcal{X}_{T_1}, \ldots, \mu^m \in \mathcal{X}_{T_m},\, \nu} \;\; \sum_{j=1}^{m} \theta_j^\top \mu^j   (7)
\text{subject to} \quad \mu^j_i(x_i) = \nu_i(x_i) \quad \forall j \in \{1, \ldots, m\},\ \forall i \in \mathcal{V}_j,\ \forall x_i \in S,

where each vector μ^j denotes the variables of a local subproblem with respect to T_j, and ν is a set of global variables ν_i(x_i) on which the variables μ^j_i(x_i) of the (overlapping) subproblems must agree. We can choose any decomposition with the only condition that the corresponding trees together must cover all the nodes and edges of G, that is, V = ∪_{j=1}^m V_j and E = ∪_{j=1}^m E_j, as well as θᵀμ = Σ_{j=1}^m θ_jᵀ μ^j. Note that the OP in (7) is equivalent to the ILP in (2). A corresponding LP relaxation, given by

\min_{\mu^1 \in \mathcal{L}_{T_1}, \ldots, \mu^m \in \mathcal{L}_{T_m},\, \nu} \;\; \sum_{j=1}^{m} \theta_j^\top \mu^j   (8)
\text{subject to} \quad \mu^j_i(x_i) = \nu_i(x_i) \quad \forall j \in \{1, \ldots, m\},\ \forall i \in \mathcal{V}_j,\ \forall x_i \in S,

is equivalent to the OP in (5) in the sense that both have the same optimal value and the same optimal solution set.

In the corresponding dual problems the goal is to maximise the dual functions of the OPs in (7) and (8) according to

\max_{u \in \mathcal{U}} \; g_7(u), \quad \text{where} \quad g_7(u) = \inf_{\mu^1 \in \mathcal{X}_{T_1}, \ldots, \mu^m \in \mathcal{X}_{T_m}} \; \sum_{j=1}^{m} (\theta_j + u^j)^\top \mu^j,   (9)

and

\max_{u \in \mathcal{U}} \; g_8(u), \quad \text{where} \quad g_8(u) = \inf_{\mu^1 \in \mathcal{L}_{T_1}, \ldots, \mu^m \in \mathcal{L}_{T_m}} \; \sum_{j=1}^{m} (\theta_j + u^j)^\top \mu^j,   (10)

respectively, over a restricted set of dual values

\mathcal{U} := \Big\{ u : \sum_{j \,:\, i \in \mathcal{V}_j} u^j_i(x_i) = 0, \;\; i \in \{1, \ldots, n\},\ x_i \in S \Big\}.   (11)

We here overload the notation in the following sense. The dual variables have the form u = (u^1, ..., u^m) with u^j = (..., u^j_i(x_i), ...). Therefore, since we ignore the edges, the number of dual variables in u^j is smaller than the dimensionality of μ^j (or θ_j). However, for the algebraic operations (e.g. the inner product) to make sense we implicitly assume that the vector u^j (if required) is appropriately padded with zeros to match the dimensionality of μ^j. A derivation of the above dual problems can be found in [12]. There are different ways to solve a corresponding dual problem. The most popular is a subgradient method for convex non-differentiable objectives. Alternatively, we could use a variant of block coordinate descent or a cutting-plane algorithm.
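The following is a minimal sketch of the projected subgradient method for the dual in (9), using a decomposition into single-edge subproblems so that each subproblem can be solved by enumerating its four configurations; the even splitting of the unary potentials by node degree and the diminishing step size are common choices rather than something prescribed by the paper, and the names are our own.

```python
import numpy as np

def dd_subgradient_edges(n_nodes, edges, theta_unary, theta_pair,
                         n_iters=500, step0=1.0):
    """Projected subgradient ascent on the dual (9) for a decomposition of a
    binary pairwise MRF into single-edge subproblems.

    Assumes every node is covered by at least one edge. The unary potential of
    node i is split evenly over the deg(i) subproblems covering it, so the
    decomposed potentials sum back to theta.
    Returns the dual variables u and the last per-edge optimal assignments.
    """
    deg = np.zeros(n_nodes)
    for (i, k) in edges:
        deg[i] += 1.0
        deg[k] += 1.0
    # u[e][i] is the length-2 dual vector u^e_i(.), initialised to zero.
    u = {e: {e[0]: np.zeros(2), e[1]: np.zeros(2)} for e in edges}

    for t in range(n_iters):
        # Solve each edge subproblem by enumerating its four configurations.
        local = {}
        for (i, k) in edges:
            best, best_val = (0, 0), np.inf
            for s in (0, 1):
                for r in (0, 1):
                    val = (theta_unary[i, s] / deg[i] + u[(i, k)][i][s]
                           + theta_unary[k, r] / deg[k] + u[(i, k)][k][r]
                           + theta_pair[(i, k)][s, r])
                    if val < best_val:
                        best_val, best = val, (s, r)
            local[(i, k)] = best
        # Average node indicators over all subproblems covering each node.
        avg = np.zeros((n_nodes, 2))
        for (i, k), (s, r) in local.items():
            avg[i, s] += 1.0 / deg[i]
            avg[k, r] += 1.0 / deg[k]
        # Subgradient step; subtracting the average keeps sum_j u^j_i(.) = 0,
        # i.e. the dual variables stay inside the set U of Eq. (11).
        alpha = step0 / np.sqrt(t + 1.0)
        for (i, k), (s, r) in local.items():
            u[(i, k)][i] += alpha * (np.eye(2)[s] - avg[i])
            u[(i, k)][k] += alpha * (np.eye(2)[r] - avg[k])
    return u, local
```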


3 Connections between LP Relaxation and DD

The facts summarised in Subsection 3.1 are mainly known. We provide them for the sake of completeness. In Subsection 3.2 we present new insights into the connections between the two optimisation techniques.

3.1 Case 1: LP Relaxation yields an Integral Solution

It is well known that if the LP relaxation is tight, both the LP relaxation and DD provide an integral optimal solution to the MAP problem. From a different perspective, this means that strong duality holds for the OP in (7). That is, there is zero duality gap between the optimal values of the OPs in (7) and (9). Equivalently, it implies the existence of consistent optimal assignments to subtrees according to a chosen decomposition. We summarise these insights in the following lemma.

Lemma 1 The following claims are equivalent:

(i) the LP relaxation in (5) has an integral solution;

(ii) strong duality holds for problem (7);

(iii) μ^{j_1}_i(x_i) = μ^{j_2}_i(x_i) for all j_1, j_2 ∈ {1, ..., m}, i ∈ V_{j_1} ∩ V_{j_2}, x_i ∈ S,

where μ = (μ^1, ..., μ^m) is a (not necessarily unique) minimiser of the Lagrangian L(·, ..., ·, u*) for the OP in (7) and u* is dual optimal.

3.2 Case 2: LP Relaxation yields a Fractional Solution

If the LP relaxation is not tight, a corresponding optimal solution μ* will have fractional components. Given such a fractional solution, we denote by I ⊆ {1, ..., n} the indices of the variables x_1, ..., x_n which have been assigned an integral value in μ*, and by F ⊆ {1, ..., n} the remaining set of indices corresponding to the fractional part. Formally, i ∈ I ⇔ ∀ x_i ∈ S: μ*_i(x_i) ∈ {0, 1}, and F = {1, ..., n} \ I. In contrast to the LP relaxation, assignments produced via DD are fully integral. Given a tree decomposition of a corresponding MRF, the optimal assignments to different subtrees, however, will partially disagree on the overlapping parts. Here we denote by A ⊆ {1, ..., n} the indices of the variables x_1, ..., x_n with a unique assignment. Formally, i ∈ A if, for every tree which contains x_i and every optimal assignment to that tree, x_i has the same value. Similarly, we denote by D ⊆ {1, ..., n} the set of indices for which at least two different trees disagree in their optimal assignments, that is, D = {1, ..., n} \ A. We now provide a formal analysis of the relationship between the sets I and A, or equivalently between F and D.
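Extracting I and F from an LP solution is a one-liner per node; below is a small helper under our own naming (it expects node marginals such as those returned by the LP sketch in Section 2.1).

```python
import numpy as np

def split_integral_fractional(mu_nodes, tol=1e-6):
    """Split node indices into the integral part I and the fractional part F
    of an LP solution mu* (node marginals of shape (n, |S|))."""
    I, F = [], []
    for i, row in enumerate(mu_nodes):
        if np.all(np.minimum(np.abs(row), np.abs(row - 1.0)) < tol):
            I.append(i)          # every mu_i(x_i) is (numerically) 0 or 1
        else:
            F.append(i)
    return I, F
```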

Theorem 1 Let μ* be an optimal fractional solution of the LP relaxation (5), u a dual optimal for the OP in (7), and L: X_{T_1} × ... × X_{T_m} × U → R a corresponding Lagrangian. There always exists a set of minimisers μ^1 ∈ X_{T_1}, ..., μ^m ∈ X_{T_m} of the Lagrangian L(·, ..., ·, u) which agree with the integral part I of μ*, that is,

∀ j ∈ {1, ..., m}, i ∈ I ∩ V_j, x_i ∈ S:  μ^j_i(x_i) = μ*_i(x_i),

for short μ^j_I = μ*_I.

The result in the above theorem has the most intuitive interpretation in the case of a decomposition into spanning trees, that is, when every tree T_j covers all the nodes (V_j = V) of the original graph. In that case Theorem 1 implies that for any optimal solution of the LP relaxation and a corresponding assignment x* with an integral part I, there exist optimal assignments x^1, ..., x^m (from a dual solution u) for the different spanning trees which agree on a set of nodes A with x^j_i = x*_i for all i ∈ I ⊆ A. An immediate question arising is whether the two sets I and A are equal. The answer is no. In general, the two sets are not the same and I will usually be a proper subset of A.

Theorem 1 motivates the following simple heuristic for obtaining an approximate integral solution, which is especially suitable for a decomposition over spanning trees when using subgradient optimisation. Namely, we can consider the optimal assignments for every spanning tree and choose the best according to the value of the primal objective. Increasing the number of trees in a decomposition also increases the chance of finding a good assignment. Furthermore, during the optimisation we can repeat this for the intermediate results after each iteration of a corresponding optimisation algorithm, saving the currently best solution. Obviously, this can only improve the quality of the resulting assignment. Note that it is common practice with subgradient methods to save the intermediate results, since the objective is not guaranteed to improve in every step and can even get worse. In that sense, the above heuristic does not impose additional computational cost.
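A sketch of this heuristic, assuming each iteration already yields one integral assignment per spanning tree; the helper names and data layout are our own.

```python
import numpy as np

def energy(x, theta_unary, theta_pair):
    """Primal energy E(x) of Eq. (1) for an integral assignment x."""
    return (sum(theta_unary[i, x[i]] for i in range(len(x)))
            + sum(theta_pair[e][x[e[0]], x[e[1]]] for e in theta_pair))

def keep_best(tree_assignments, theta_unary, theta_pair,
              best_x=None, best_val=np.inf):
    """Evaluate the candidate assignments produced by the spanning trees in the
    current iteration and retain the lowest-energy one seen so far."""
    for x in tree_assignments:
        val = energy(x, theta_unary, theta_pair)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

Calling `keep_best` once per subgradient iteration with the running best as input reproduces the bookkeeping described above at negligible extra cost.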

Finally, it turns out that in a non-degenerate case, where the LP relaxation has a unique solution, the relationship I ⊆ A (and therefore D ⊆ F) holds for all minimisers of a corresponding Lagrangian L(·, ..., ·, u), as stated in the following theorem.

Theorem 2 Let μ* be a unique optimal solution of the LP relaxation (5), u a dual optimal for the OP in (7), and L: X_{T_1} × ... × X_{T_m} × U → R a corresponding Lagrangian. Each set of minimisers μ^1 ∈ X_{T_1}, ..., μ^m ∈ X_{T_m} of the Lagrangian L(·, ..., ·, u) agrees with the integral part I of μ*, that is,

∀ j ∈ {1, ..., m}, i ∈ I ∩ V_j, x_i ∈ S:  μ^j_i(x_i) = μ*_i(x_i),

for short μ^j_I = μ*_I.

In particular, for binary pairwise MRFs this result implies that each assignment obtained via DD is partially optimal in the sense that it always contains the strongly persistent part of the (fractional) solution of the LP relaxation. Provided the fractional part is small, this suggests that the obtained assignments will often have a low energy close to the optimum even if the LP relaxation is not tight. This fact and the possibility of parallel computation render dual decomposition a practical tool for MAP inference compared to the LP relaxation. We also note that the property I ⊆ A in Theorem 1 still holds for non-binary MRFs with arbitrary higher-order cliques, but I is not guaranteed to be strongly persistent anymore. In the following we build on an additional lemma.

Lemma 2 Assume the setting of Theorem 1. For every subproblem j ∈ {1, ..., m} over a tree T_j there are minimisers μ^j, μ̄^j ∈ X_{T_j}, where

\mu^*_i(x_i) = \tfrac{1}{2}\big(\mu^j_i(x_i) + \bar{\mu}^j_i(x_i)\big)   (12)

holds for all i ∈ V_j, x_i ∈ S.

The above lemma ensures that for each optimal solution μ* of the LP relaxation and each tree in a given decomposition there always exist two different assignments which agree exactly on the nodes corresponding to the integral (strongly persistent) part I of μ*. It is also worth noting that when the LP relaxation is not tight, the sets of minimising assignments to the different trees are disjoint on the overlapping parts due to Lemma 1.

Lemma 2 and Theorem 2 together imply that the unambiguous part among all optimal assignments from all trees in a decomposition coincides with the integral part of the (fractional) optimal assignment via LP relaxation, giving rise to the following theorem.

Theorem 3 Let μ* be a unique optimal solution of the LP relaxation (5) and u a dual optimal for the OP in (7). Then the unambiguous part A of optimal assignments among all the overlapping subproblems in a decomposition coincides with the integral part I of μ*. That is, A = I.

That is, we can extract the strongly persistent part from DD by considering the intersection of the optimal assignments for the individual trees. This is particularly convenient for a decomposition over single edges, since computing the set of optimal assignments for an edge is straightforward.
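For an edge decomposition this extraction takes only a few lines; in the sketch below, `reparam_cost` stands for the reparametrised edge potentials θ_j + u_j at (or near) a dual optimum, e.g. as produced by the subgradient sketch in Section 2.2, and the tolerance is our own numerical safeguard.

```python
def unambiguous_nodes(n_nodes, edges, reparam_cost, tol=1e-9):
    """Return the set A: nodes that take the same value in every optimal
    configuration of every edge subproblem covering them (Theorem 3)."""
    values = [set() for _ in range(n_nodes)]
    for (i, k) in edges:
        cost = reparam_cost[(i, k)]           # 2x2 array of reparametrised costs
        best = cost.min()
        for s in (0, 1):
            for r in (0, 1):
                if cost[s, r] <= best + tol:  # (s, r) is an optimal configuration
                    values[i].add(s)
                    values[k].add(r)
    return {i for i in range(n_nodes) if len(values[i]) == 1}
```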

4 Related Work

Several previous works [20, 9, 11, 16], including the original paper [12] on DD for MAP inference, comment on the optimality of DD and LP relaxation when the latter is tight. While in [12] the authors make an explicit statement, in [20] the same question is approached by introducing the notion of a tree agreement, which is generalised to a weak tree agreement in [9]. Both, however, can be seen as a special case of [12] in the sense that the corresponding lower bound on the optimal value of the MAP problem is equal to the optimal value of a corresponding dual problem in the case of tree agreement of all the involved subproblems.

Concerning the case where the LP relaxation is not tight, to the best of our knowledge, there are no previous works which directly address the question of how the optimal solutions of the LP relaxation are related to the assignments produced via DD. Instead, the existing works proceed with a discussion of tightening a corresponding relaxation [12, 19, 17]. In [10] the authors provide a related discussion for tree-reweighted (TRW) message passing and show that the partial assignment corresponding to the unambiguous part in the intersection of all optimal assignments for individual trees is strongly persistent. The TRW algorithms are less accurate than DD; however, in the case of binary pairwise MRFs both achieve the dual optimal value. Our results are based on a different proof than in [10], which supports a similar statement that the unambiguous part of a corresponding assignment is strongly persistent, but additionally implies that it is exactly the integral part of the LP relaxation when the corresponding solution is unique. Furthermore, our result in Theorem 1 also extends to the case of arbitrary graphs.

Another question related to the discussion in the present paper is the problem of recovering optimal primal solutions of an LP relaxation from dual optimal solutions via DD [12, 18, 14, 1]. In particular, [1] shows that the strongly persistent part of an LP solution (for a binary pairwise MRF) can be recovered from DD based on subgradient optimisation. However, this is not the case with other optimisation methods. In contrast, our corresponding result holds independently of the chosen optimisation technique.

In the case of binary pairwise MRFs the integral part of each optimal solution of the LP relaxation is known to be strongly persistent. In this paper we show that (in a non-degenerate case) this property is inherited by every solution provided via DD, in the sense that the persistent part is a subset of the corresponding assignment. Since the fractional part of a solution of the LP relaxation conveys no useful information about the preference of the variables in the fractional area for a specific state, the presented result further supports the practical usefulness of DD in that case.

5 Numerical Validation

Here we present a numerical experiment, summarised in Figure 1, to validate the theoretical statements in the paper. For this purpose we considered a 5 × 5 Ising grid model corresponding to 25 pixels and defined its energy according to the following procedure. The unary potentials have all been set to zero. The values of the corresponding edge potentials have been selected uniformly at random from the interval [−0.5, 0.5]. This process has been repeated until the corresponding LP relaxation yielded a fractional solution μ* (see the fourth plot in the first row of Figure 1).

Given such an energy, we then considered a decomposition of the above Ising model into two spanning trees T_1 (all the vertical edges) and T_2 (all the horizontal edges). We used the subgradient method to solve a corresponding dual problem and obtained optimal assignments μ^1 and μ^2 corresponding to the subproblems T_1 and T_2, respectively (see the first two plots in the first row).
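A sketch of this experimental setup, with one common parameterisation of the Ising edge potentials (the paper does not spell out the exact form used) and the two spanning forests T_1 and T_2 as edge lists:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                                  # 5 x 5 grid, 25 nodes
idx = {(r, c): r * n + c for r in range(n) for c in range(n)}
horizontal = [(idx[(r, c)], idx[(r, c + 1)]) for r in range(n) for c in range(n - 1)]
vertical = [(idx[(r, c)], idx[(r + 1, c)]) for r in range(n - 1) for c in range(n)]
edges = vertical + horizontal                          # T1: vertical, T2: horizontal

theta_unary = np.zeros((n * n, 2))                     # all unary potentials zero
theta_pair = {}
for e in edges:
    w = rng.uniform(-0.5, 0.5)                         # random coupling strength
    # One possible Ising-type edge potential: cost w when the two labels agree.
    theta_pair[e] = np.array([[w, 0.0], [0.0, w]])
# In the experiment this sampling is repeated until the LP relaxation of the
# resulting energy has a fractional optimum.
```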

The last two plots in the upper row of Figure 1 can serve as a validation of Theorem 1. Namely, the red area in these two plots corresponds to the variables on which each assignment disagrees with the LP solution μ*. As we can see, this happens only in the fractional area, where each variable in μ* has the value 0.5. In other words, μ^1 and μ^2 agree with the integral part of μ*.

To support Theorem 2 we computed further optimal assignments μ^1_2, ..., μ^1_7 for the subproblem T_1 in addition to μ^1. These are visualised in the second row of Figure 1. We overlay these assignments with the assignment μ* in a transparent way to emphasise the difference to the LP solution. We see that all these optimal assignments agree with the integral part of μ*.

The last plot in the third row validates the statement in Lemma 2. It visualises the average of the two plots above it, corresponding to the optimal assignments μ^1_6 and μ^1_7 for the subproblem T_1.

Finally, the first plot in the third row supports the claim in Theorem 3. Namely, the blue area corresponds to the unambiguous part A, where each variable has a unique value in all the optimal assignments among the two subproblems. The red area marks the ambiguous part, where each variable has different values in different assignments. We see that the equality A = I holds.

Figure 1 (plots omitted): Illustration of the numerical validation of the theoretical results, including Theorems 1, 2, 3 and Lemma 2. We consider a decomposition of a 5 × 5 Ising grid model into two spanning trees (more precisely, disconnected forests): T_1 consisting of all vertical edges and T_2 consisting of all horizontal edges. The corresponding optimal assignments for both trees are denoted by μ^1 and μ^2, respectively. μ* is a corresponding optimal assignment from the LP relaxation. Here, the red area corresponds to label 1, the blue area to label 0, and the green area to label 0.5. The individual plots in the first row visualise a corresponding assignment or a disagreement between two assignments (μ^1, μ^2, μ^1 ≠ μ^2, μ* ≠ μ^1, μ* ≠ μ^2). The second row provides (in addition to μ^1) further optimal assignments μ^1_2, ..., μ^1_7 for the subproblem T_1. The first plot in the third row illustrates the unambiguous part among all optimal and overlapping assignments in blue and the ambiguous part in red. The last plot illustrates an average of the two assignments above, ½(μ^1_6 + μ^1_7).

6 Conclusion

We presented the established frameworks of linear programming (LP) relaxation and dual decomposition (DD) for the task of MAP inference in discrete MRFs. In the case when a corresponding LP relaxation is tight, these two methods are known to be equivalent, both providing an optimal MAP assignment. However, less is known about their relationship in the opposite and more realistic case. While it is known that both methods have the same optimal objective value also in the non-tight regime, it is an interesting question whether there are other properties they share. In particular, the connection between the solutions of LP relaxation and the assignments which can be extracted by DD has not been clarified. For example, even if the solution of the LP relaxation is unique, there might be multiple optimal (but disagreeing) assignments via DD. What is the nature of this ambiguity? Are all these assignments equivalent, or can we even extract additional information about the optimal solutions from analysing the disagreement behaviour? These and other questions were the main motivation for our paper. Here we provided a few novel findings explaining how the fully integral assignments obtained via DD agree with the optimal fractional assignments via LP relaxation when the latter is not tight. More specifically, we have proved:

• given an optimal (fractional) solution of the LP relaxation, there always exists an optimal variable assignment via DD which agrees with the integral part of the LP solution; this also holds for non-binary models with arbitrary higher-order potentials

• for binary pairwise MRFs (in a non-degenerate case) the first result holds for every optimal assignment which can be extracted via DD

• for binary pairwise MRFs (in a non-degenerate case) the unambiguous part among all optimal assignments from different trees in a decomposition coincides with the integral part of the (fractional) optimal assignment via LP relaxation

In particular, for binary pairwise MRFs the integral part of an optimal solution provided via LP relaxation is known to be strongly persistent. Therefore, due to the properties listed above, we can conclude that (for this case) both methods, LP relaxation and DD, share the partial optimality property of their solutions.

Practically, this has the following implications. 1) If the goal is to find an accurate MAP assignment, we can use the LP relaxation first to fix the integral part and then apply an approximation algorithm (e.g. loopy belief propagation) to set the remaining variables in the fractional part, where each fractional node has the value 0.5 and thus shows no preference of a corresponding variable for a specific state. DD, on the other hand, provides fully integral assignments which agree with the integral part of the LP relaxation. 2) If we are only interested in the persistent part, we can extract the corresponding partial assignment from DD by considering an intersection of all the optimal assignments to the individual trees in a decomposition. The unambiguous part then coincides with the strongly persistent part of the LP relaxation, provided a corresponding solution is unique, or with a subset of it otherwise. In particular, a decomposition over individual edges is more beneficial in that case, since finding all optimal assignments to the trees becomes a trivial task.

To summarise, the LP relaxation is a popular method for discrete MAP inference in pairwise graphical models because of its appealing (partial) optimality properties. However, this method does not scale well due to its extensive memory requirements, restricting its practical use for bigger problems. Here, DD provides an effective alternative via distributed optimisation, which scales to problems of arbitrary size: we can always consider a decomposition over the individual edges. Finally, the results presented in this paper suggest that when using DD instead of the LP relaxation we do not lose any of the nice properties of the latter. Both methods provide (for binary pairwise models) exactly the same information about their solutions.

An interesting question is which of the findings in this paper can be extended beyond pairwise models to those involving higher-order potentials. We will continue this investigation in future work.

Acknowledgments

This research was supported by the Federal Ministry of Education and Research under the Berlin Big Data Center 2 project (FKz 01IS18025A), and by the World Class University Program through the National Research Foundation of Korea funded by the Ministry of Education, Science, and Technology, under Grant R31-10008.

References

[1] K. M. Anstreicher and L. A. Wolsey. Two "well-known" properties of subgradient optimization. Math. Program., 120(1):213–220, 2009.

[2] D. Bertsimas and J. N. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific Series in Optimization and Neural Computation. Athena Scientific, Belmont (Mass.), 1997.

[3] M. M. Deza and M. Laurent. Geometry of Cuts and Metrics. Algorithms and Combinatorics, vol. 15. Springer, 1997.

[4] H. Everett. Generalized Lagrange multiplier method for solving problems of optimum allocation of resources. Operations Research, 11(3):399–417, May 1963.

[5] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Algorithms and Combinatorics. Springer-Verlag, Berlin, New York, 1988.

[6] P. L. Hammer, P. Hansen, and B. Simeone. Roof duality, complementation and persistency in quadratic 0-1 optimization. Math. Program., 28(2):121–155, 1984.

[7] J. H. Kappes, B. Andres, F. A. Hamprecht, C. Schnörr, S. Nowozin, D. Batra, S. Kim, B. X. Kausler, T. Kröger, J. Lellmann, N. Komodakis, B. Savchynskyy, and C. Rother. A comparative study of modern inference techniques for structured discrete energy minimization problems. International Journal of Computer Vision, 115(2):155–184, 2015.

[8] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. Adaptive Computation and Machine Learning. The MIT Press, 2009.

[9] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. IEEE Trans. Pattern Anal. Mach. Intell., 28(10):1568–1583, 2006.

[10] V. Kolmogorov and M. J. Wainwright. On the optimality of tree-reweighted max-product message-passing. In UAI '05, Proceedings of the 21st Conference in Uncertainty in Artificial Intelligence, Edinburgh, Scotland, July 26-29, 2005, pages 316–323, 2005.

[11] V. Kolmogorov and M. J. Wainwright. On the optimality of tree-reweighted max-product message-passing. CoRR, abs/1207.1395, 2012.

[12] N. Komodakis, N. Paragios, and G. Tziritas. MRF energy minimization and beyond via dual decomposition. IEEE Trans. Pattern Anal. Mach. Intell., 33(3):531–552, 2011.

[13] D. G. Luenberger. Introduction to Linear and Nonlinear Programming. Addison-Wesley, 1973.

[14] A. Nedic and A. E. Ozdaglar. Approximate primal solutions and rate analysis for dual subgradient methods. SIAM Journal on Optimization, 19(4):1757–1780, 2009.

[15] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Series in Representation and Reasoning. Morgan Kaufmann, 1989.

[16] A. M. Rush and M. Collins. A tutorial on dual decomposition and Lagrangian relaxation for inference in natural language processing. CoRR, abs/1405.5208, 2014.

[17] D. Sontag. Approximate Inference in Graphical Models using LP Relaxations. PhD thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2010.

[18] D. Sontag, A. Globerson, and T. Jaakkola. Introduction to dual decomposition for inference. In S. Sra, S. Nowozin, and S. J. Wright, editors, Optimization for Machine Learning. MIT Press, 2011.

[19] D. Sontag, T. Meltzer, A. Globerson, T. S. Jaakkola, and Y. Weiss. Tightening LP relaxations for MAP using message passing. CoRR, abs/1206.3288, 2012.

[20] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky. MAP estimation via agreement on trees: message-passing and linear programming. IEEE Trans. Information Theory, 51(11):3697–3717, 2005.

[21] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.

[22] J. Wang and S. Yeung. A compact linear programming relaxation for binary sub-modular MRF. In Energy Minimization Methods in Computer Vision and Pattern Recognition - 10th International Conference, EMMCVPR 2015, Hong Kong, China, January 13-16, 2015. Proceedings, pages 29–42, 2014.