


SUBMITTED TO IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, NOVEMBER 2015

Factorized Graph Matching

Feng Zhou and Fernando De la Torre

Abstract—Graph matching (GM) is a fundamental problem in computer science, and it plays a central role in solving correspondence problems in computer vision. GM problems that incorporate pairwise constraints can be formulated as a quadratic assignment problem (QAP). Although widely used, solving the correspondence problem through GM has two main limitations: (1) the QAP is NP-hard and difficult to approximate; (2) GM algorithms do not incorporate geometric constraints between nodes that are natural in computer vision problems. To address these problems, this paper proposes factorized graph matching (FGM). FGM factorizes the large pairwise affinity matrix into smaller matrices that encode the local structure of each graph and the pairwise affinity between edges. Four benefits follow from this factorization: (1) there is no need to compute the costly (in space and time) pairwise affinity matrix; (2) the factorization allows the use of a path-following optimization algorithm, which leads to improved optimization strategies and matching performance; (3) given the factorization, it becomes straightforward to incorporate geometric transformations (rigid and non-rigid) into the GM problem; (4) using a matrix formulation for the GM problem and the factorization, it is easy to reveal commonalities and differences between different GM methods. The factorization also provides a clean connection with other matching algorithms such as iterative closest point. Experimental results on synthetic and real databases illustrate how FGM outperforms state-of-the-art algorithms for GM. The code is available at http://humansensing.cs.cmu.edu/fgm.

Index Terms—Graph matching, Feature matching, Quadratic assignment problem, Iterative closest point method.


1 INTRODUCTION

ESTABLISHING correspondence between two sets of visual features is the key to many computer vision tasks, such as object tracking [1], structure-from-motion [2], and image classification [3]. While solving the correspondence between images is still an open problem in computer vision, much progress has been made in the last decades. Early works such as RANSAC [4] and iterative closest point (ICP) [5] assume that the locations of image features are constrained explicitly (e.g., by a planar affine transformation) or implicitly (e.g., by epipolar constraints) in a parametric form. While these methods achieved great success in finding consistent correspondences between feature points, they are limited to correspondence problems with rigid transformations. Unlike conventional methods, graph matching (GM) formulates the correspondence problem as matching between two graphs. In the past decades, GM has found wide application in problems such as shape matching in 2-D [6] and 3-D [7], object categorization [8], [9], feature tracking [1], symmetry analysis [10], action recognition [11], [12], kernelized sorting [13], and protein alignment [14]. Compared to RANSAC and ICP, GM incorporates pairwise node interactions, which are important when matching structured objects (e.g., human bodies). Fig. 1 illustrates an example of matching two graphs, where the nodes are image locations returned by a body-part detector. Matching the graphs using only the similarity between nodes (i.e., features) might lead to undesirable results because some node features (e.g., hands, feet) are likely to be similar. GM adds pairwise information between nodes (e.g.,

F. Zhou and F. De la Torre are with the Robotics Institute, Carnegie Mellon University.

Fig. 1. Matching two human bodies with 5 and 4 features using FGM. FGM simultaneously estimates the correspondence and a smooth non-rigid transformation between the shapes. FGM is able to factorize the 20 × 20 pairwise affinity matrix as a Kronecker product of six smaller matrices. The first two groups of matrices, of sizes 5 × 16 and 4 × 10, encode the structure of each of the graphs. The last two matrices encode the affinities for nodes (5 × 4) and edges (16 × 10).

length of the limbs, orientation of the edges) that constrain the problem so that better correspondences are found.

Although extensive research has been done for decades, there are still two main challenges in solving GM. (1) Mathematically, GM is formulated as a quadratic assignment problem (QAP) [15]. Unlike the linear assignment problem, which can be efficiently solved with the Hungarian algorithm [16], the QAP is known to be NP-hard. Therefore, the main body of research in GM has focused on devising more accurate algorithms to solve it approximately. Nevertheless, approximating GM by relaxing the combinatorial constraints is still a challenging problem, mainly because the objective function is in general non-convex and existing methods are thus prone to local optima. (2) Many matching problems in computer vision naturally require global constraints among nodes. For instance, given two sets of coplanar points in two images, the matching between points under orthographic projection should be constrained by an affine transformation. Similarly, when matching the deformations of a non-rigid object between two consecutive images, the deformation is typically smooth in space and time. Existing GM algorithms do not constrain the nodes of both graphs to a given geometric transformation (e.g., similarity, affine, or non-rigid); for instance, they cannot constrain the global deformation of the human-body configuration shown in Fig. 1.

In order to address these issues, this paper presents factorized graph matching (FGM), a novel framework for optimizing and constraining GM problems. The key idea behind FGM is a closed-form factorization of the pairwise affinity matrix that decouples the local graph structure of the nodes and the edge similarities. This factorization is general and can be applied to both undirected and directed graphs. For instance, consider the matching of the two human bodies shown in Fig. 1. The body configurations are represented by two directed graphs with 5 and 4 features, and 16 and 10 edges, respectively. Conventional GM algorithms need to construct a large pairwise affinity matrix (20-by-20 in this case), whereas FGM only requires six small matrices. In our example (Fig. 1), the first two binary matrices are of dimension 5-by-16 and the second two of dimension 4-by-10, and they describe the local structure of the first and second graph. The last two matrices are of dimensions 5-by-4 and 16-by-10, and they encode the similarity between nodes and edges, respectively.

The main contribution of this paper is to propose the factorization of the affinity matrix and to illustrate four main benefits of this factorization in GM problems:

• Using the factorization, there is no need to compute the expensive (in space and time) affinity matrix, whose computational cost scales quartically with the number of features.

• We derive a new path-following algorithm that leads to improved optimization strategies and matching performance. This is possible because, using the factorization, it is relatively easy to provide a concave and a convex approximation; without the proposed factorization, it is unclear how to derive this path-following algorithm.

• A major contribution of this work is to incorporate global geometric constraints into GM. To the best of our knowledge, this is the first work that addresses adding non-rigid constraints into GM. This is possible due to the factorization, which decouples the local structure in each of the graphs.

• We provide a simple and compact matrix notation to formulate GM problems. Using the factorization and the matrix formulation, it is very easy to understand the commonalities and differences between GM methods and relate them to other problems like ICP and Markov random fields (MRFs).

2 PREVIOUS WORK ON GM

In this section, we review previous work on GM with a unified matrix formulation. Formulating GM in matrix form has several benefits beyond simplicity of understanding that will be discussed throughout the paper. Section 2.1 introduces the GM problem and formulates GM as a quadratic assignment problem in matrix form. Section 2.2 discusses several advances in GM methods.

2.1 Definition of GM

To better understand the differences among previous work, we provide here a brief overview of the graph matching problem and its mathematical definition. We denote (see notation1) a graph with n nodes and m directed edges as a 4-tuple G = {P, Q, G, H}. The features for nodes and edges are specified by P = [p_1, ..., p_n] ∈ R^{d_p×n} and Q = [q_1, ..., q_m] ∈ R^{d_q×m} respectively, where d_p and d_q are the dimensionalities of the features. For instance, p_i could be a 128-D SIFT histogram extracted from the image patch around the ith node, and q_c could be a 2-D vector that encodes the length and orientation of the cth edge. The topology of the graph is encoded by two node-edge incidence matrices G, H ∈ {0, 1}^{n×m}, where g_{ic} = h_{jc} = 1 if the cth edge starts at the ith node and ends at the jth node. Fig. 2a illustrates two synthetic graphs, whose edge connections between nodes are encoded by the binary matrices shown in Fig. 2b-c. In our previous work [17], a simpler graph representation was adopted that is valid only for undirected graphs. The present representation is more general and applicable to both undirected and directed graphs. Directed graphs occur when the edge features are asymmetric, such as the angle between an edge and the horizontal line.
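The incidence representation G, H can be built mechanically from an edge list; a minimal numpy sketch on a hypothetical 3-node directed graph (not the graphs of Fig. 2):

```python
import numpy as np

def incidence_matrices(n, edges):
    """Build node-edge incidence matrices G, H for a directed graph.

    G[i, c] = 1 if edge c starts at node i; H[j, c] = 1 if edge c ends at node j.
    """
    m = len(edges)
    G = np.zeros((n, m), dtype=int)
    H = np.zeros((n, m), dtype=int)
    for c, (i, j) in enumerate(edges):
        G[i, c] = 1
        H[j, c] = 1
    return G, H

# Toy directed graph: 3 nodes, edges 0->1 and 1->2.
G, H = incidence_matrices(3, [(0, 1), (1, 2)])
```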

Given a pair of graphs, G_1 = {P_1, Q_1, G_1, H_1} and G_2 = {P_2, Q_2, G_2, H_2}, we compute two affinity matrices, K_p ∈ R^{n_1×n_2} and K_q ∈ R^{m_1×m_2}, to measure the similarity of each node and edge pair respectively. More specifically, $\kappa^p_{i_1 i_2} = \phi_p(\mathbf{p}^1_{i_1}, \mathbf{p}^2_{i_2})$ measures the similarity between the i_1th node of G_1 and the i_2th node of G_2, and $\kappa^q_{c_1 c_2} = \phi_q(\mathbf{q}^1_{c_1}, \mathbf{q}^2_{c_2})$ measures the similarity between the c_1th edge of G_1 and the c_2th edge of G_2. For instance, Fig. 2d illustrates an example pair of K_p and K_q for the two synthetic graphs.

1. Bold capital letters denote a matrix X; bold lower-case letters a column vector x. x_i represents the ith column of the matrix X. x_{ij} or [X]_{ij} denotes the scalar in the ith row and jth column of the matrix X. All non-bold letters represent scalars. 1_{m×n}, 0_{m×n} ∈ R^{m×n} are matrices of ones and zeros. I_n ∈ R^{n×n} is an identity matrix. |X| represents the determinant of the square matrix X. vec(X) denotes the vectorization of matrix X. diag(x) is a diagonal matrix whose diagonal elements are x. X ◦ Y and X ⊗ Y are the Hadamard and Kronecker products of matrices.
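The text leaves the affinity functions φ_p and φ_q unspecified; a common choice is a Gaussian kernel on feature distances. A numpy sketch (the kernel and bandwidth are our assumptions, purely illustrative):

```python
import numpy as np

def affinity(F1, F2, sigma=1.0):
    """Dense affinity between two feature sets (columns are feature vectors).

    Uses a Gaussian kernel exp(-||f1 - f2||^2 / sigma^2). The paper leaves
    phi_p and phi_q unspecified, so this choice is illustrative only.
    """
    # Pairwise squared distances via broadcasting: (d, n1, 1) - (d, 1, n2).
    d2 = ((F1[:, :, None] - F2[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-d2 / sigma ** 2)

P1 = np.random.rand(2, 5)   # 2-D features for 5 nodes of G_1
P2 = np.random.rand(2, 4)   # 2-D features for 4 nodes of G_2
Kp = affinity(P1, P2)       # node affinity matrix, shape (5, 4)
```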

Given two graphs and the associated affinity matrices, the problem of GM consists in finding the optimal correspondence X between nodes, such that the sum of the node and edge compatibilities is maximized:

$$J_{gm}(\mathbf{X}) = \sum_{i_1 i_2} x_{i_1 i_2}\,\kappa^p_{i_1 i_2} + \sum_{\substack{i_1 \neq j_1,\ i_2 \neq j_2 \\ g^1_{i_1 c_1} h^1_{j_1 c_1} = 1 \\ g^2_{i_2 c_2} h^2_{j_2 c_2} = 1}} x_{i_1 i_2}\, x_{j_1 j_2}\,\kappa^q_{c_1 c_2}, \qquad (1)$$

where X ∈ Π is constrained to be a one-to-one mapping, i.e., Π is the set of partial permutation matrices:

$$\Pi = \{\mathbf{X} \mid \mathbf{X} \in \{0,1\}^{n_1 \times n_2},\ \mathbf{X}\mathbf{1}_{n_2} \leq \mathbf{1}_{n_1},\ \mathbf{X}^T\mathbf{1}_{n_1} = \mathbf{1}_{n_2}\}. \qquad (2)$$

The inequality in the above definition is used for the case when the graphs are of different sizes. Without loss of generality, we assume n_1 ≥ n_2 from now on. For instance, the matrix X shown in the top row of Fig. 2e defines the node correspondence shown in Fig. 2a.
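The constraints of Eq. 2 can be checked mechanically; a small numpy sketch (assuming n_1 ≥ n_2 as in the text):

```python
import numpy as np

def in_partial_permutation_set(X):
    """Check X against Eq. 2: binary entries, rows sum to at most 1,
    columns sum to exactly 1 (every node of the smaller graph is matched)."""
    X = np.asarray(X)
    binary = np.isin(X, (0, 1)).all()
    rows_ok = (X.sum(axis=1) <= 1).all()   # X 1_{n2} <= 1_{n1}
    cols_ok = (X.sum(axis=0) == 1).all()   # X^T 1_{n1} = 1_{n2}
    return bool(binary and rows_ok and cols_ok)

# 4 nodes matched to 3: one node of the larger graph stays unmatched.
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])
```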

To be concise in notation, we encode the node and edge affinities in a symmetric matrix K ∈ R^{n_1 n_2 × n_1 n_2}, whose elements are computed as follows:

$$\kappa_{i_1 i_2, j_1 j_2} = \begin{cases} \kappa^p_{i_1 i_2}, & \text{if } i_1 = j_1 \text{ and } i_2 = j_2, \\ \kappa^q_{c_1 c_2}, & \text{if } i_1 \neq j_1,\ i_2 \neq j_2 \text{ and } g^1_{i_1 c_1} h^1_{j_1 c_1} g^2_{i_2 c_2} h^2_{j_2 c_2} = 1, \\ 0, & \text{otherwise}, \end{cases}$$

where the diagonal and off-diagonal elements encode the similarity between nodes and edges, respectively.
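The case definition above translates directly into a deliberately naive elementwise construction; a numpy sketch, where we adopt the column-major convention vec(X) = X.flatten(order='F') for indexing K (an illustrative choice on our part, not mandated by the text):

```python
import numpy as np

def build_K(Kp, Kq, G1, H1, G2, H2):
    """Assemble the global affinity matrix K elementwise from its definition.

    Rows/columns are indexed by node pairs (i1, i2) in column-major order,
    r = i2 * n1 + i1. The cost is quartic in the number of nodes -- exactly
    the cost the factorization in Section 4 avoids.
    """
    n1, n2 = Kp.shape
    m1, m2 = Kq.shape
    K = np.zeros((n1 * n2, n1 * n2))
    for i1 in range(n1):
        for i2 in range(n2):
            for j1 in range(n1):
                for j2 in range(n2):
                    r, c = i2 * n1 + i1, j2 * n1 + j1
                    if i1 == j1 and i2 == j2:
                        K[r, c] = Kp[i1, i2]            # node affinity on the diagonal
                    elif i1 != j1 and i2 != j2:
                        for c1 in range(m1):
                            for c2 in range(m2):
                                # Edge c1: i1 -> j1 in G_1 and edge c2: i2 -> j2 in G_2.
                                if G1[i1, c1] * H1[j1, c1] * G2[i2, c2] * H2[j2, c2] == 1:
                                    K[r, c] = Kq[c1, c2]
    return K

# Tiny example: each graph has 2 nodes and a single directed edge 0 -> 1.
G1 = np.array([[1], [0]]); H1 = np.array([[0], [1]])
G2 = np.array([[1], [0]]); H2 = np.array([[0], [1]])
Kp = np.array([[1.0, 2.0], [3.0, 4.0]])
Kq = np.array([[5.0]])
K = build_K(Kp, Kq, G1, H1, G2, H2)   # shape (4, 4)
```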

Using K, GM can be concisely formulated as the following quadratic assignment problem (QAP) [15]:

$$\max_{\mathbf{X} \in \Pi}\ J_{gm}(\mathbf{X}) = \operatorname{vec}(\mathbf{X})^T \mathbf{K} \operatorname{vec}(\mathbf{X}). \qquad (3)$$
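For tiny graphs, Eq. 3 can be solved exactly by enumerating the partial permutation matrices, which is useful as a ground-truth reference when testing relaxations. A numpy sketch, assuming K is indexed so that vec(X) = X.flatten(order='F') (our convention for illustration):

```python
import numpy as np
from itertools import permutations

def brute_force_gm(K, n1, n2):
    """Maximize vec(X)^T K vec(X) over partial permutations (Eq. 3) by
    enumeration. Only viable for very small n1, n2."""
    best_X, best_J = None, -np.inf
    for rows in permutations(range(n1), n2):   # each column gets a distinct row
        X = np.zeros((n1, n2))
        X[list(rows), range(n2)] = 1
        v = X.flatten(order='F')
        J = v @ K @ v
        if J > best_J:
            best_X, best_J = X, J
    return best_X, best_J

# Hypothetical 4x4 affinity matrix for two 2-node graphs.
K = np.zeros((4, 4))
K[np.arange(4), np.arange(4)] = [1.0, 3.0, 2.0, 4.0]
K[0, 3] = 5.0
X_best, J_best = brute_force_gm(K, 2, 2)
```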

2.2 Advances on GM

As a generic problem for matching structural data, GM has been studied for decades in computer science and mathematics [18]. Early works [19]–[21] on GM often treated the problem as a special case of (sub)graph isomorphism, where the graphs are binary and an exact matching between nodes and edges is of interest. Nevertheless, the stringent constraints imposed by exact matching are too rigid for real applications in computer vision. Modern works focus on finding an inexact matching between graphs with weighted attributes on nodes and edges. Due to its combinatorial nature, however, globally optimizing GM is NP-hard, and a relaxation of the permutation constraints (Eq. 2) is necessary to approximate the problem. Based on the approximation strategy, previous work (see Fig. 3 for a summary) can be broadly categorized into three groups: spectral relaxation, semidefinite-programming (SDP) relaxation, and doubly stochastic relaxation.

The first group of methods approximates the permutation matrix with an orthogonal one, i.e., X^T X = I. Under the orthogonal constraint, optimizing GM can be solved in closed form as an eigenvalue problem [22]–[25]. However, these methods only work for Koopmans-Beckmann's QAP [26], a special case of QAP (see Section 5 for more details). To handle more complex problems in computer vision, Leordeanu and Hebert [27] approximated Eq. 3 by relaxing X to be of unit length, i.e., ‖vec(X)‖_2 = 1. The optimal X can then be efficiently computed as the leading eigenvector of K. Cour et al. [28] incorporated an affine constraint to solve a more general spectral problem, hence finding better approximations to the original problem.

Fig. 2. An example GM problem. (a) Two synthetic graphs. (b) The 1st graph's incidence matrices G_1 and H_1, where the non-zero elements in each column of G_1 and H_1 indicate the starting and ending nodes of the corresponding directed edge, respectively. (c) The 2nd graph's incidence matrices G_2 and H_2. (d) The node affinity matrix K_p and the edge affinity matrix K_q. (e) The node correspondence matrix X and the edge correspondence matrix Y. (f) The global affinity matrix K.

As a general tool for approximating combinatorial problems, SDP has also been used to approximate GM. In [29], [30], the authors reformulated the objective of GM by introducing a new variable Y ∈ R^{n_1 n_2 × n_1 n_2} subject to Y = vec(X) vec(X)^T. SDP approximates GM by relaxing the non-convex constraint on Y to a convex semidefinite one, Y − vec(X) vec(X)^T ⪰ 0. After computing Y using SDP, the correspondence X can be approximated by a randomized algorithm [29] or a winner-take-all strategy [30]. The main advantage of SDP is its theoretical guarantee [31] of finding a polynomial-time 0.879-approximation to many NP-hard problems. In practice, however, SDP is too expensive to use because the variable Y squares the problem size.

Most methods relax X ∈ D to be a doubly stochastic matrix, where D is the convex hull of the permutation matrices:

$$\mathcal{D} = \{\mathbf{X} \mid \mathbf{X} \in [0,1]^{n_1 \times n_2},\ \mathbf{X}\mathbf{1}_{n_2} \leq \mathbf{1}_{n_1},\ \mathbf{X}^T\mathbf{1}_{n_1} = \mathbf{1}_{n_2}\}. \qquad (4)$$
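Projection onto D is a building block of many doubly stochastic relaxations. A minimal numpy sketch for the square case n_1 = n_2 (where the row inequality in Eq. 4 becomes an equality), using Sinkhorn-Knopp normalization, a standard technique and not specific to this paper:

```python
import numpy as np

def sinkhorn(M, iters=200):
    """Scale a strictly positive square matrix toward a doubly stochastic one
    (Eq. 4 with n1 = n2) by alternately normalizing rows and columns."""
    X = np.asarray(M, dtype=float).copy()
    for _ in range(iters):
        X /= X.sum(axis=1, keepdims=True)   # rows sum to 1
        X /= X.sum(axis=0, keepdims=True)   # columns sum to 1
    return X

X = sinkhorn(np.random.rand(4, 4) + 1e-3)
```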

Under this constraint, optimizing GM can be treated as a non-convex quadratic programming problem, and various strategies have been proposed to find a local optimum. Almohamad and Duffuaa [32] approximated the quadratic cost using linear programming. Zaslavskiy et al. [33] employed a path-following strategy to approach the non-convex quadratic programming problem. Recently, Liu et al. [34] extended the path-following algorithm to match asymmetric adjacency matrices in more general problems. However, these works are only applicable to Koopmans-Beckmann's QAP, and it is unclear how to apply them to the more general Lawler's QAP [35]. Gold and Rangarajan [36] proposed the graduated assignment algorithm to iteratively solve a series of linear approximations of the cost function using Taylor expansions. Its convergence has recently been studied and improved in [37] with a soft-constrained mechanism. Van Wyk and Van Wyk [38] proposed to iteratively project the approximate correspondence matrix onto the convex domain of the desired integer constraints. Leordeanu et al. [39] proposed an integer projection algorithm to optimize the objective function in an integer domain. Torresani et al. [40] designed a complex objective function that can be efficiently optimized by dual decomposition. In addition to the optimization-based work, probabilistic frameworks have been shown to be useful for interpreting and solving GM problems. Egozi et al. [41] presented a probabilistic interpretation of the spectral matching algorithm [27] as a maximum-likelihood estimate of the assignment probabilities. Zass and Shashua [42] proposed a convex relative-entropy error that emerges from a probabilistic interpretation of the GM problem. Inspired by the PageRank algorithm, Cho et al. [43] introduced a random-walk algorithm to approximate the GM problem.

In addition to the efforts on devising better approximations for Eq. 3, three advances in GM methods have led to improved results: learning the pairwise affinity matrix [44], [45], incorporating high-order features [42], [46], and progressive mechanisms [47]. First, it has been shown that a better K for GM can be learned in an unsupervised [44] or a supervised [45] manner. Second, the matrix K encoding the pairwise geometry is susceptible to scale and rotation differences between sets of points. To make GM invariant to rigid deformations, [42], [46] extended the pairwise K to a tensor that encodes high-order geometric relations. However, a small increment in the order of relations leads to a combinatorial explosion in the amount of data needed to support the algorithm; therefore, most high-order GM methods can only work on very sparse graphs with features of order at most 3. In addition, it is unclear how to extend high-order methods to incorporate non-rigid deformations. Third, the performance of GM methods in real applications is often limited by the initial construction of the graphs. To resolve this issue, Cho and Lee [47] proposed a progressive framework that combines probabilistic progression of graphs with matching of graphs. The algorithm re-estimates, in a Bayesian manner, the most plausible target graphs based on the matching result, and is guaranteed to improve the matching objective at subsequent steps.

Fig. 3. Different relaxations of the constraints for GM, grouped by objective: Koopmans-Beckmann's QAP, tr(X^T A_1 X A_2) + tr(K_p^T X), versus Lawler's QAP, vec(X)^T K vec(X). Spectral relaxation: Umeyama [22], Scott & Longuet-Higgins [23], Shapiro & Brady [24], Caelli & Kosinov [25]; Leordeanu & Hebert [27], Cour et al. [28]. SDP relaxation: Torr [29], Schellewald & Schnorr [30]. Doubly stochastic relaxation: Almohamad & Duffuaa [32], Zaslavskiy et al. [33], Liu et al. [34]; Gold & Rangarajan [36], Tian et al. [37], Van Wyk & Van Wyk [38], Leordeanu et al. [39], Torresani et al. [40], Egozi et al. [41], Zass & Shashua [42], Cho et al. [43], and ours.

Unlike most previous work, we tackle the GM problem from a different perspective. We first derive a principled factorization of the affinity matrix K. This factorization enables the use of a path-following algorithm that achieves state-of-the-art performance in approximating GM problems. Thanks to the factorization, we can further introduce global geometric constraints to better handle non-rigid deformations in GM.

3 PREVIOUS WORK ON ICP

This section describes the ICP method, which, as we will see in later sections, has a close connection to GM.

3.1 Definition of ICP

Given two sets of points, P_1 = [p^1_1, ..., p^1_{n_1}] ∈ R^{d×n_1} and P_2 = [p^2_1, ..., p^2_{n_2}] ∈ R^{d×n_2}, the iterative closest point (ICP) algorithm (e.g., [5]) aims to find the correspondence and the geometric transformation between points such that the sum of distances is minimized:

$$\min_{\mathbf{X} \in \Pi,\ \mathcal{T} \in \Psi}\ J_{icp}(\mathbf{X}, \mathcal{T}) = \sum_{i_1 i_2} x_{i_1 i_2}\, \|\mathbf{p}^1_{i_1} - \tau(\mathbf{p}^2_{i_2})\|_2^2 + \psi(\mathcal{T}), \qquad (5)$$

where X ∈ {0, 1}^{n_1×n_2} denotes the correspondence between points. Depending on the problem, X denotes either a one-to-one or a many-to-one matching. In this paper, we consider a one-to-one matching between points, and X is thus constrained to be a permutation matrix, i.e., X ∈ Π, where Π is defined as in Eq. 2. τ(·) : R^d → R^d denotes a geometric transformation parameterized by T. The transformation is softly penalized by the function ψ(·) and constrained within the set Ψ. Fig. 4 lists the parameterizations of three common transformations: similarity, affine, and RBF non-rigid.

Fig. 4. Parameterization of three geometric transformations in 2-D space, where T, τ(·), Ψ and ψ(·) denote the parameter set, transformation function, constraint set and penalization function, respectively:
• Similarity: T = {s ∈ R, R ∈ R^{2×2}, t ∈ R^2}; τ(p) = sRp + t, τ(P) = sRP + t1_n^T; ψ(T) = 0; Ψ: R^T R = I_2, |R| = 1.
• Affine: T = {V ∈ R^{2×2}, t ∈ R^2}; τ(p) = Vp + t, τ(P) = VP + t1_n^T; ψ(T) = 0; Ψ = ∅.
• RBF non-rigid: T = {W ∈ R^{2×n_2}}; τ(p) = p + Wφ(p), τ(P) = P + WL_p; ψ(T) = λ_w tr(WL_pW^T); Ψ = ∅.

Similarity transformation: Given a point p ∈ R^2 or a point set P ∈ R^{2×n}, its similarity transformation is computed as τ(p) = sRp + t or τ(P) = sRP + t1_n^T, where s ∈ R, R ∈ R^{2×2} and t ∈ R^2 denote the scaling factor, rotation matrix and translation vector, respectively. In addition, the rotation matrix R ∈ Ψ has to satisfy the constraints Ψ = {R | R^T R = I_2, |R| = 1}.
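For matched 2-D point sets, the similarity parameters that minimize the ICP cost can be recovered in closed form via Procrustes analysis. A numpy sketch of this standard least-squares solution (shown for illustration; the paper does not prescribe a particular solver):

```python
import numpy as np

def fit_similarity(P1, P2):
    """Least-squares similarity transform: find s, R, t minimizing
    ||P1 - (s R P2 + t 1^T)||_F^2 with R^T R = I_2, |R| = 1 (Procrustes).

    P1, P2 are 2 x n matrices of matched points (columns correspond)."""
    mu1 = P1.mean(axis=1, keepdims=True)
    mu2 = P2.mean(axis=1, keepdims=True)
    A, B = P1 - mu1, P2 - mu2                            # centered point sets
    U, S, Vt = np.linalg.svd(A @ B.T)
    D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])   # enforce |R| = 1
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (B * B).sum()
    t = mu1 - s * R @ mu2
    return s, R, t
```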

Affine transformation: Given a point p ∈ R^2 or a point set P ∈ R^{2×n}, its affine transformation is computed as τ(p) = Vp + t or τ(P) = VP + t1_n^T, where V ∈ R^{2×2} and t ∈ R^2 denote the affine matrix and translation vector, respectively.

RBF non-rigid transformation: The parameterization of the RBF non-rigid transformation [48] depends on the choice of the basis points. We choose the basis points to be the nodes of the second graph. Given a point p ∈ R^2 or a point set P ∈ R^{2×n}, the transformation is computed as a displacement from the initial position, i.e., τ(p) = p + Wφ(p) or τ(P) = P + WL_p, where W ∈ R^{2×n_2} is the weight matrix and φ(p) = [φ_1(p), ..., φ_{n_2}(p)]^T ∈ R^{n_2} denotes the n_2 displacement functions corresponding to the basis points. Each displacement function, $\phi_i(\mathbf{p}) = \exp(-\|\mathbf{p} - \mathbf{p}^2_i\|_2^2 / \sigma_w^2)$, is computed between p and the basis point p^2_i with bandwidth σ_w. L_p = [φ(p^2_1), ..., φ(p^2_{n_2})] ∈ R^{n_2×n_2} is the RBF kernel defined on all basis pairs. Following [48], we regularize the non-smoothness of the displacement field, i.e., ψ(T) = tr(WL_pW^T).
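The displacement model above can be evaluated directly; a small numpy sketch (the variable names are ours, and any 2 × n_2 set can serve as basis points):

```python
import numpy as np

def rbf_transform(P, basis, W, sigma_w=1.0):
    """Apply the RBF non-rigid transformation tau(P) = P + W L_p, where
    [L_p]_{ij} = exp(-||p_j - basis_i||^2 / sigma_w^2). In the text the basis
    points are the nodes of the second graph."""
    # Squared distances between every basis point and every input point: (n2, n).
    d2 = ((basis[:, :, None] - P[:, None, :]) ** 2).sum(axis=0)
    Lp = np.exp(-d2 / sigma_w ** 2)
    return P + W @ Lp   # W is 2 x n2, so W @ Lp is a 2 x n displacement field
```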

3.2 Advances on ICP

In the past decade, ICP has been widely used to solve correspondence problems in image, shape, and surface registration [49] due to its simplicity and low computational complexity. ICP methods can be classified by the type of deformation they recover: rigid or non-rigid.

A major limitation of ICP methods is that they are highly sensitive to the initialization, and various strategies have been proposed to address this issue [50]. One common choice is to introduce structural information into the registration scheme [7], [51]. For instance, Belongie et al. [52] proposed shape context, a robust shape representation for finding correspondences between deformed shapes. Advanced optimization schemes can also be employed to improve the performance of ICP. For instance, Chui and Rangarajan [53] proposed to combine soft assignment with deterministic annealing for non-rigid point registration. This work was recently extended in [48] to both rigid and non-rigid registration. The work most closely related to our method is the statistical approach of [54], [55], where a binary neighborhood graph is used to guide the ICP minimization toward better correspondences. Unlike existing ICP algorithms, FGM introduces pairwise constraints that make the matching problem less prone to local minima.

Alternatively, RANSAC-type algorithms [4] have been widely used to find reliable correspondences between two or more images in the presence of outliers. RANSAC estimates a global relation that fits the data while simultaneously classifying the data into inliers and outliers. Unlike RANSAC-type algorithms, FGM is able to efficiently model non-rigid transformations and large rigid transformations.

4 FACTORIZED GRAPH MATCHING

This section proposes a novel factorization of the pairwise affinity matrix K. As we will see in the following sections, this factorization provides a lightweight representation for GM problems, allows a unification of GM methods, enables a better optimization strategy, and makes it easy to add geometric constraints to GM.

The pairwise affinity matrix K plays a central role in GM because it encodes all the first-order and second-order relations between graphs. Two properties of K, full-rankness and indefiniteness, pose key challenges to conventional GM methods such as spectral methods [27], [28] and gradient-based ones [38], [39], [42], [43]. Fig. 5 illustrates an empirical study on a thousand Ks computed from three realistic benchmark datasets used in Section 8. The first row of Fig. 5 shows that the ratio between the rank and the dimension of K was equal to one, i.e., K was a full-rank matrix in all the experiments. This fact explains why the spectral methods [27], [28] usually perform worse than others: spectral methods employ a rank-one approximation of K, which inevitably leads to a loss in accuracy. The second row of Fig. 5 shows that K was an indefinite matrix, because the ratio between its minimum and maximum eigenvalues was smaller than −0.4 in the experiments. The indefiniteness of K implies that the maximization of vec(X)^T K vec(X) is a non-convex problem, and thus the gradient-based methods [38], [39], [42], [43] are prone to local optima. Instead of treating K as a black box, we propose a factorization that decouples the structure of K. To the best of our knowledge, this work is the first to propose a closed-form factorization of K and apply it to GM problems.
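The two diagnostics used in this empirical study are easy to reproduce for any given affinity matrix; a numpy sketch:

```python
import numpy as np

def affinity_stats(K):
    """Rank ratio and eigenvalue ratio used in the text's empirical study:
    rank(K)/dim(K) == 1 means K is full rank; a negative min/max eigenvalue
    ratio means K is indefinite, so the QAP objective is non-convex."""
    K = (K + K.T) / 2                       # symmetrize against numerical noise
    w = np.linalg.eigvalsh(K)
    rank_ratio = np.linalg.matrix_rank(K) / K.shape[0]
    eig_ratio = w.min() / w.max()
    return rank_ratio, eig_ratio
```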

To illustrate the intuition behind the factorization, let us revisit the synthetic problem shown in Fig. 2. Notice that K ∈ R^{n1n2×n1n2} (Fig. 2f) is composed of two types of affinities: the node affinity (Kp) on its diagonal and the


Fig. 5. Statistics of K computed for the graphs extracted from the three real image datasets (Houses, Cars and Motorbikes) used in the experiments. The histograms in the top row illustrate the ratio between the rank and the dimension of K. The histograms in the bottom row show the ratio between the minimum and the maximum eigenvalues of K. For the last two datasets (columns), we test K with zero values on the diagonal as well. The top and bottom rows indicate that all the computed Ks are full-rank and indefinite, respectively.

pairwise edge affinity (Kq) on its off-diagonals. Without the diagonal, K is a sparse block matrix with three unique structures: (1) K is composed of n2-by-n2 smaller blocks K_{ij} ∈ R^{n1×n1}. (2) Some of the K_{ij}s are empty if there is no edge connecting the ith and jth nodes of G2. These empty blocks can be indexed by G2 H2^T, i.e., K_{ij} = 0 if [G2 H2^T]_{ij} = 0. (3) For the non-empty blocks, K_{ij} can be computed in closed form as G1 diag(k^q_c) H1^T, where c is the index of the edge connecting the ith and jth nodes of G2, i.e., g^2_{ic} = h^2_{jc} = 1. Based on these observations, and after some linear algebra, it can be shown that K can be factorized exactly as:

K = diag(vec(Kp)) + (G2 ⊗ G1) diag(vec(Kq)) (H2 ⊗ H1)^T.   (6)

This factorization decouples the graph structure (G1, H1, G2 and H2) from the similarity (Kp and Kq).
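As a sanity check, Eq. 6 can be verified numerically on a small random example. The sketch below is our own construction (not the released code): it builds random node-edge incidence matrices G, H with g_ic = h_jc = 1 when edge c goes from node i to node j, assembles K by brute force, and compares it against the factorized form. Here vec(·) is the column-major (Matlab-style) vectorization.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_graph(n, m):
    """Random directed graph as node-edge incidence matrices G, H."""
    G, H = np.zeros((n, m)), np.zeros((n, m))
    for c in range(m):
        i, j = rng.choice(n, size=2, replace=False)   # no self-loops
        G[i, c], H[j, c] = 1, 1                       # edge c: i -> j
    return G, H

n1, m1, n2, m2 = 4, 5, 5, 7
G1, H1 = random_graph(n1, m1)
G2, H2 = random_graph(n2, m2)
Kp = rng.random((n1, n2))   # node affinities
Kq = rng.random((m1, m2))   # edge affinities

# Brute-force K: rows/cols indexed by node pairs (i1, i2) in
# column-major (vec) order, i.e., index a = i2 * n1 + i1.
N = n1 * n2
K = np.zeros((N, N))
for i1 in range(n1):
    for i2 in range(n2):
        a = i2 * n1 + i1
        K[a, a] = Kp[i1, i2]                          # node affinity on the diagonal
        for j1 in range(n1):
            for j2 in range(n2):
                b = j2 * n1 + j1
                for c1 in range(m1):                  # edge affinities off-diagonal
                    for c2 in range(m2):
                        K[a, b] += (G1[i1, c1] * H1[j1, c1] *
                                    G2[i2, c2] * H2[j2, c2] * Kq[c1, c2])

# Factorized form (Eq. 6); flatten(order='F') is the column-major vec(.)
K_fac = (np.diag(Kp.flatten(order='F')) +
         np.kron(G2, G1) @ np.diag(Kq.flatten(order='F')) @ np.kron(H2, H1).T)

assert np.allclose(K, K_fac)
```

The two constructions agree entry-by-entry, which is exactly the claim of Eq. 6.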

Eq. 6 is the key contribution of this work. Previous work on GM explicitly computed the computationally expensive (in space and time) K, which is of size O(n1^2 n2^2). On the contrary, Eq. 6 offers an alternative framework by replacing K with six smaller matrices, which are of size O(n1m1 + n2m2 + n1n2 + m1m2). For instance, plugging Eq. 6 into Eq. 3 leads to an equivalent objective function:

Jgm(X) = tr(Kp^T X) + tr(Kq^T (G1^T X G2 ◦ H1^T X H2)),   (7)

where Y = G1^T X G2 ◦ H1^T X H2 ∈ {0, 1}^{m1×m2} can be interpreted as a correspondence matrix for edges, i.e., y_{c1c2} = 1 if both nodes of the c1-th edge in G1 are matched to the nodes of the c2-th edge in G2. For instance, Fig. 2d illustrates the node and edge correspondence matrices for the matching defined in Fig. 2a.
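Assuming the objective of Eq. 3 is the quadratic form vec(X)^T K vec(X), the equivalence with Eq. 7 can be checked numerically. The sketch below uses our own toy graph construction (incidence matrices G, H with g_ic = h_jc = 1 for edge c: i → j), builds K via the factorization, and compares the two evaluations of the objective for a binary one-to-one X:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_graph(n, m):
    G, H = np.zeros((n, m)), np.zeros((n, m))
    for c in range(m):
        i, j = rng.choice(n, size=2, replace=False)
        G[i, c], H[j, c] = 1, 1
    return G, H

n1, m1, n2, m2 = 4, 6, 5, 8
G1, H1 = random_graph(n1, m1)
G2, H2 = random_graph(n2, m2)
Kp, Kq = rng.random((n1, n2)), rng.random((m1, m2))

# K assembled via the factorization (Eq. 6)
K = (np.diag(Kp.flatten(order='F')) +
     np.kron(G2, G1) @ np.diag(Kq.flatten(order='F')) @ np.kron(H2, H1).T)

# a random one-to-one correspondence X in {0,1}^{n1 x n2}
X = np.zeros((n1, n2))
X[np.arange(n1), rng.permutation(n2)[:n1]] = 1

x = X.flatten(order='F')                   # vec(X), column-major
J_quad = x @ K @ x                         # vec(X)^T K vec(X)

Y = (G1.T @ X @ G2) * (H1.T @ X @ H2)      # edge correspondence matrix
J_fac = np.trace(Kp.T @ X) + np.trace(Kq.T @ Y)   # Eq. 7

assert np.isclose(J_quad, J_fac)
```

The factorized objective never materializes the O(n1^2 n2^2) matrix K; only the six small factors are needed.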

5 A UNIFIED FRAMEWORK FOR GM

In past decades, a myriad of GM methods have been proposed for solving a variety of applications. Previous GM methods varied in their objective formulations and optimization strategies. Although much effort [18], [56]–[58] has been made on comparing GM methods, a mathematical framework to connect the various GM methods in a coherent manner is lacking. This section derives a unified framework to cast many GM methods using our proposed factorization. Beyond the unification of GM methods, we show that a close relation exists between GM and ICP objectives. This insight allows us to augment GM problems with additional geometric constraints that are necessary in many computer vision applications. Moreover, we extend ICP to incorporate pairwise constraints.

5.1 Unification of GM methods

Eq. 3 summarizes one of the most important GM problems studied in recent research on computer vision [27]–[30], [36]–[43]. However, the way of formulating GM is not unique. In other applications [20]–[22], [25], [32], [33], the goal of GM was alternatively formulated as:

max_{X∈Π}  tr(Kp^T X) + tr(A1 X A2 X^T),   (8)

where the main difference from Eq. 3 is in the second term, which measures the edge compatibility as the linear similarity between two weighted adjacency matrices A1 ∈ [0, 1]^{n1×n1} and A2 ∈ [0, 1]^{n2×n2}, given the permutation X. In this case, the topology of a graph is defined by a weighted adjacency matrix, A ∈ [0, 1]^{n×n}, where each element, a_{ij}, indicates the soft connectivity between the ith and the jth nodes.

GM can be formulated as the classical QAP, which has been well studied in the operations research literature since the 1940s. In the operations research literature [15], Eq. 3 and Eq. 8 are known as Lawler's QAP [35] and Koopmans-Beckmann's QAP [26], respectively. Please refer to Fig. 3 for a taxonomy of previous GM works in computer vision from a QAP perspective. Roughly speaking, Lawler's QAP is more general than Koopmans-Beckmann's QAP because it has (1/2)n1^2 n2^2 free variables in K compared to (1/2)n1^2 + (1/2)n2^2 free variables in A1 and A2. In fact, Koopmans-Beckmann's QAP can always be represented as a special case of Lawler's QAP if we assume K = A2 ⊗ A1. In this particular Lawler's QAP, the edge feature has to be a 1-D scalar (i.e., q_c = a_{ij}) and the edge similarity has to be linear (i.e., κ^q_{c1c2} = ⟨q^1_{c1}, q^2_{c2}⟩). This special QAP can be too limited to represent the challenging matching problems encountered in real cases. Lawler's QAP, however, considers more general similarity measures, including Gaussian kernels, which take the linear similarity as an instance.

In general, it is unclear how to understand the commonalities and differences between these two types of GM. In the following, we propose a simple and clean connection using the proposed factorization. Observe that Kq can always be factorized


(e.g., SVD) as Kq = U V^T, where U ∈ R^{m1×c} and V ∈ R^{m2×c}. Taking advantage of the low-rank structure of Kq, Eq. 7 can be re-formulated² as follows:

Jgm(X) = tr(Kp^T X) + tr((U V^T)^T (G1^T X G2 ◦ H1^T X H2))
       = tr(Kp^T X) + tr((Σ_{i=1}^{c} u_i v_i^T)^T (G1^T X G2 ◦ H1^T X H2))
       = tr(Kp^T X) + Σ_{i=1}^{c} tr(A^1_i X A^2_i X^T),   (9)

where A^1_i = G1 diag(u_i) H1^T ∈ R^{n1×n1} and A^2_i = G2 diag(v_i) H2^T ∈ R^{n2×n2} can be interpreted as adjacency matrices for the two graphs. Eq. 9 reveals that c, the rank of the edge affinity matrix Kq, is a key characteristic of GM methods. If c = 1, Lawler's QAP reduces to Koopmans-Beckmann's QAP. In other cases, when c > 1, solving Lawler's QAP can be cast as a joint optimization of c Koopmans-Beckmann's QAPs.
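Eq. 9 can also be checked numerically. One caveat in our arithmetic: with A^1_i = G1 diag(u_i) H1^T exactly as written, the identity we can verify term-by-term carries a transpose on A^1_i inside the trace (the two forms coincide for undirected graphs, where G and H play symmetric roles). The sketch below, under our own toy construction, asserts Σ_i tr(A^{1T}_i X A^2_i X^T) and also verifies the identity of footnote 2 on which the derivation rests:

```python
import numpy as np

rng = np.random.default_rng(2)

# Footnote 2: tr((u v^T)^T (A o B)) = tr(diag(u) A diag(v) B^T)
m, n = 5, 7
u, v = rng.random(m), rng.random(n)
A, B = rng.random((m, n)), rng.random((m, n))
lhs = np.trace(np.outer(u, v).T @ (A * B))
rhs = np.trace(np.diag(u) @ A @ np.diag(v) @ B.T)
assert np.isclose(lhs, rhs)

# Eq. 9: the pairwise term of Eq. 7 as a sum of c Koopmans-Beckmann terms
def random_graph(n, m):
    G, H = np.zeros((n, m)), np.zeros((n, m))
    for c in range(m):
        i, j = rng.choice(n, size=2, replace=False)
        G[i, c], H[j, c] = 1, 1
    return G, H

n1, m1, n2, m2 = 4, 6, 5, 8
G1, H1 = random_graph(n1, m1)
G2, H2 = random_graph(n2, m2)
Kq = rng.random((m1, m2))

X = np.zeros((n1, n2))
X[np.arange(n1), rng.permutation(n2)[:n1]] = 1

U_, s, Vt = np.linalg.svd(Kq, full_matrices=False)
U = U_ * s                                   # fold singular values into U
V = Vt.T                                     # so that Kq = U V^T

term_eq7 = np.trace(Kq.T @ ((G1.T @ X @ G2) * (H1.T @ X @ H2)))
term_eq9 = sum(
    np.trace((G1 @ np.diag(U[:, i]) @ H1.T).T @ X @
             (G2 @ np.diag(V[:, i]) @ H2.T) @ X.T)
    for i in range(U.shape[1]))
assert np.isclose(term_eq7, term_eq9)
```

The decomposition holds for any X, not just permutations, since it is a purely algebraic consequence of Kq = Σ_i u_i v_i^T.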

Despite its importance, the relation between Eq. 3 and Eq. 8 has been rarely explored in the GM literature. In fact, many GM methods can benefit from this connection. Consider the special case in which the node affinity matrix Kp is empty and the edge affinity matrix Kq = u v^T has a rank-1 structure. According to the factorization, we can conclude that K = A2 ⊗ A1, where A1 = G1 diag(u) H1^T and A2 = G2 diag(v) H2^T. In this case, the spectral matching algorithm [27] can be computed more efficiently as eig(K) = eig(A2) ⊗ eig(A1), where eig(K) denotes the leading eigenvector of K.
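The rank-1 special case is easy to confirm numerically: with Kp = 0 and Kq = u v^T, the factorized K collapses to a Kronecker product, and the spectrum of a Kronecker product is the set of pairwise products of the factors' spectra. The sketch below (our own toy construction) checks the matrix identity and the leading-eigenvalue consequence, leaving the eigenvector statement to the text:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_graph(n, m):
    G, H = np.zeros((n, m)), np.zeros((n, m))
    for c in range(m):
        i, j = rng.choice(n, size=2, replace=False)
        G[i, c], H[j, c] = 1, 1
    return G, H

n1, m1, n2, m2 = 4, 6, 5, 8
G1, H1 = random_graph(n1, m1)
G2, H2 = random_graph(n2, m2)
u, v = rng.random(m1), rng.random(m2)        # rank-1 edge affinity Kq = u v^T

# K with empty node affinity, via the factorization (Eq. 6)
K = (np.kron(G2, G1) @ np.diag(np.outer(u, v).flatten(order='F'))
     @ np.kron(H2, H1).T)

A1 = G1 @ np.diag(u) @ H1.T
A2 = G2 @ np.diag(v) @ H2.T
assert np.allclose(K, np.kron(A2, A1))       # K = A2 (x) A1

# the spectral radius of a Kronecker product is the product of radii
r = lambda M: np.abs(np.linalg.eigvals(M)).max()
assert np.isclose(r(K), r(A1) * r(A2))
```

This is why spectral matching on a rank-1 Kq only ever needs the two small adjacency matrices rather than K itself.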

5.2 Connections between GM and ICP

To connect ICP with GM methods, we denote the node affinity matrix as Kp(T) ∈ R^{n1×n2}, where each element, κ^p_{i1i2}(T) = −‖p^1_{i1} − τ(p^2_{i2})‖_2^2, encodes the negative Euclidean distance between nodes. Using this notation, the original objective (Eq. 5) can be concisely formulated as:

Jicp(X, T) = −tr(Kp(T)^T X) + ψ(T).   (10)

Eq. 10 reveals a clean connection between the ICP and GM objectives (Eq. 7). Given the transformation (T), minimizing Jicp over X is equivalent to maximizing the node compatibility in Eq. 7. This optimization can be cast as a linear matching problem, which can be efficiently solved by the Hungarian algorithm if X is a one-to-one mapping, or in a winner-take-all manner if X is a many-to-one mapping. In general, however, the joint optimization over X and T is non-convex, and no closed-form solution is known. Typically, some sort of alternated minimization (e.g., EM, coordinate descent) is needed to find a local optimum.
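For a fixed transformation, the correspondence step is a plain linear assignment. The sketch below uses our own toy data and, for clarity at this tiny size, brute-forces the assignment over all permutations (the Hungarian algorithm would be used at scale); it recovers a known matching purely from the node affinity Kp(T):

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)

n, d = 5, 2
P2 = rng.random((d, n))                       # model points
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
perm_true = rng.permutation(n)
P1 = (R @ P2)[:, perm_true]                   # rotated + shuffled copy

# node affinity: negative squared distances after applying tau (= R here)
Kp = -((P1[:, :, None] - (R @ P2)[:, None, :]) ** 2).sum(axis=0)

# maximize tr(Kp^T X) over permutation matrices (brute force for n = 5)
best = max(itertools.permutations(range(n)),
           key=lambda p: sum(Kp[i, p[i]] for i in range(n)))
assert list(best) == list(perm_true)
```

With the correct T, the true matching attains the maximal (zero) score, which is exactly why the alternation over X and T can make progress at each step.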

6 A PATH-FOLLOWING ALGORITHM

This section presents a path-following algorithm for approximating the GM problem. Traditionally, Eq. 3 is approached by a two-step scheme: (1) solving a continuously relaxed problem and (2) rounding the approximate solution to a binary one. This procedure has at least two limitations. First, the continuous relaxation is non-convex and prone to local optima. Second, the rounding step will inevitably cause a loss in accuracy because it is independent of the cost function. Inspired by [33], [59], we address these two issues using a path-following algorithm that iteratively optimizes an interpolation of two relaxations. This new scheme has three theoretical advantages: (1) The optimization is initialization-free; (2) The final solution is guaranteed to converge to an integer one and therefore no rounding step is needed; (3) The iterative updating procedure resembles the idea of numerical continuation methods [60], which have been used for solving nonlinear systems of equations for decades.

2. The equation tr((u v^T)^T (A ◦ B)) = tr(diag(u) A diag(v) B^T) always holds for arbitrary u ∈ R^m, v ∈ R^n and A, B ∈ R^{m×n}.

6.1 Convex and concave relaxation

To employ the path-following algorithm, we need to find a convex and a concave relaxation of Jgm(X). The factorization (Eq. 6) offers a simple and principled way to derive these approximations. To do so, let us first introduce an auxiliary function:

Jcon(X) = Σ_{i=1}^{c} tr(A^{1T}_i X X^T A^1_i) + tr(A^2_i X^T X A^{2T}_i).

As we know, a permutation matrix³ X ∈ Π is also an orthogonal one, satisfying X X^T = X^T X = I. Therefore, Jcon(X) = γ is always a constant,

γ = Σ_{i=1}^{c} tr(A^{1T}_i A^1_i) + tr(A^2_i A^{2T}_i),

if X is a permutation matrix. Introducing into Jgm(X) the constant term (1/2)Jcon(X) yields two equivalent objectives:

Jvex(X) = Jgm(X) − (1/2)Jcon(X) = tr(Kp^T X) − (1/2) Σ_{i=1}^{c} ‖X^T A^1_i − A^2_i X^T‖_F^2,   (11)

Jcav(X) = Jgm(X) + (1/2)Jcon(X) = tr(Kp^T X) + (1/2) Σ_{i=1}^{c} ‖X^T A^1_i + A^2_i X^T‖_F^2.   (12)

The above two functions achieve the same value, up to the constant difference γ/2, as Jgm(X) with respect to any permutation (and orthogonal) matrix. Yet, they are complementary to each other on the domain of doubly-stochastic matrices. In particular, the problems of maximizing Jvex(X) and Jcav(X) are convex and concave, respectively. This is because their Hessians,

∇²_X Jvex(X) = −Σ_{i=1}^{c} (I ⊗ A^1_i − A^2_i ⊗ I)^T (I ⊗ A^1_i − A^2_i ⊗ I),
∇²_X Jcav(X) = Σ_{i=1}^{c} (I ⊗ A^1_i + A^2_i ⊗ I)^T (I ⊗ A^1_i + A^2_i ⊗ I),

are negative and positive semi-definite, respectively.

3. We need to introduce dummy selection variables if the two graphs are of different sizes.
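The two facts used above, namely that Jcon(X) equals the constant γ on permutation matrices and that Eq. 11 reproduces Jgm up to −γ/2 there, can be checked numerically. The sketch below (toy construction as before, equal-sized graphs as per footnote 3; the pairwise term of Jgm is computed from Eq. 7) exercises this algebra:

```python
import numpy as np

rng = np.random.default_rng(5)

def random_graph(n, m):
    G, H = np.zeros((n, m)), np.zeros((n, m))
    for c in range(m):
        i, j = rng.choice(n, size=2, replace=False)
        G[i, c], H[j, c] = 1, 1
    return G, H

n, m1, m2 = 5, 7, 8                      # equal-sized graphs
G1, H1 = random_graph(n, m1)
G2, H2 = random_graph(n, m2)
Kp, Kq = rng.random((n, n)), rng.random((m1, m2))

U_, s, Vt = np.linalg.svd(Kq, full_matrices=False)
U, V = U_ * s, Vt.T                      # Kq = U V^T
A1 = [G1 @ np.diag(U[:, i]) @ H1.T for i in range(U.shape[1])]
A2 = [G2 @ np.diag(V[:, i]) @ H2.T for i in range(V.shape[1])]

def J_gm(X):                             # Eq. 7
    return (np.trace(Kp.T @ X) +
            np.trace(Kq.T @ ((G1.T @ X @ G2) * (H1.T @ X @ H2))))

def J_con(X):
    return sum(np.trace(a1.T @ X @ X.T @ a1) + np.trace(a2 @ X.T @ X @ a2.T)
               for a1, a2 in zip(A1, A2))

X = np.eye(n)[rng.permutation(n)]        # a random permutation matrix
gamma = sum(np.trace(a1.T @ a1) + np.trace(a2 @ a2.T)
            for a1, a2 in zip(A1, A2))
assert np.isclose(J_con(X), gamma)       # Jcon is constant on permutations

J_vex = np.trace(Kp.T @ X) - 0.5 * sum(
    np.linalg.norm(X.T @ a1 - a2 @ X.T, 'fro') ** 2 for a1, a2 in zip(A1, A2))
assert np.isclose(J_vex, J_gm(X) - 0.5 * J_con(X))   # Eq. 11
```

The last assertion in fact holds for any X, which is what makes the interpolated objective in the next subsection well-defined over the whole Birkhoff polytope.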


6.2 A path-following strategy

The main challenge of approximating GM comes from its non-convex objective and the combinatorial constraints. Inspired by [33], [59], we optimize Eq. 3 by iteratively optimizing a series of the following convex-concave problems [61]:

max_{X∈D}  Jα(X) = (1 − α) Jvex(X) + α Jcav(X),   (13)

where α ∈ [0, 1] is a tradeoff between the convex relaxation and the concave one.

The advantages of the path-following algorithm over conventional GM algorithms are three-fold: (1) The algorithm starts with a convex problem when α = 0. Because the problem is convex, a numerical optimizer can find the globally optimal solution independently of the initialization. (2) When α = 1 (at the end of the iterations) the problem is concave. It is well known [62] that any local optimum of a concave optimization problem must lie at an extreme point of the constraint set. According to the Birkhoff-von Neumann theorem [63], the doubly-stochastic matrices (X ∈ D) form the Birkhoff polytope, whose vertices are precisely the permutation matrices (X ∈ Π). Therefore, the converged solution is guaranteed to be binary and no further discrete rounding step is needed. (3) By smoothly increasing α from 0 to 1, the path-following algorithm is more likely to find better local optima than gradient-based methods.

To give a better understanding, we provide an example of optimizing GM using this strategy in Fig. 6. The graphs to be matched are extracted from one pair of car images used in the experiments (see Section 8.4). Fig. 6a plots the real GM objective (Jgm) and the objective (Jα) to be optimized with respect to the change of α. The convex (Jvex) and concave (Jcav) components are also plotted. Overall, four observations can be drawn from this figure. First, Jα equals Jvex and Jcav when α = 0 and α = 1, respectively. Second, the curves of Jvex and Jcav are monotonically decreasing and increasing as α → 1. This is because, as α increases, the concave part gets more control than the convex part in directing the optimization of Jα. Third, the curve of Jα intersects Jgm at α = 1/2. This is true because the following equation holds:

Jα(X) = Jgm(X) + (α − 1/2) Jcon(X),   (14)

obtained by plugging Eq. 11 and Eq. 12 into Eq. 13. Finally, when the algorithm converges at α = 1, the gap between Jgm and the two relaxations is the constant γ/2. This is because X becomes a permutation matrix (Fig. 6c) and the equation Jcon(X) = γ always holds at α = 1.

6.3 Sub-problem optimization

For a specific α, we optimize Jα(X) using the Frank-Wolfe algorithm (FW) [64], a simple yet powerful method for constrained nonlinear programming. Unlike gradient-based methods that require a projection onto the constraint set, the FW algorithm efficiently updates the solution while automatically staying in the feasible set.

Given an initial X0, FW successively updates the solution as X* = X0 + ηD until convergence. At each

Fig. 6. An example of using the path-following algorithm for optimizing the GM problem. (a) The comparison of the objectives during the optimization with respect to the change of α. (b) The comparison of using FW and MFW for optimizing Jα(X) at two αs. (c) The change of x_{ij} (each curve) as α → 1.

step, it needs to compute two components: the optimal direction D ∈ D and the optimal step size η ∈ [0, 1]. To compute D, we solve the following LP problem:

max_{D∈D}  tr(∇Jα(X0)^T (D − X0)),   (15)

whose objective is the first-order approximation of Jα(X) at the point X0. The gradient ∇Jα(X0) can be efficiently computed using matrix operations as follows:

∇Jα(X) = ∇Jgm(X) + (α − 1/2) ∇Jcon(X),
∇Jgm(X) = Kp + H1 (G1^T X G2 ◦ Kq) H2^T + G1 (H1^T X H2 ◦ Kq) G2^T,
∇Jcon(X) = 2 (G1 (H1^T H1 ◦ U U^T) G1^T + G2 (H2^T H2 ◦ V V^T) G2^T) X.

Similar to [33], [39], we adopt the Hungarian algorithm to efficiently solve this linear program.

Once the gradient is computed, the optimal η for the line search can be found in closed form. The optimal η is the maximum point of the following parabola:

Jα(X0 + ηD) = aη² + bη + const,   (16)

where a and b are the fixed second- and first-order coefficients, respectively. It is well known that the optimum of the parabola must be one of three points, η* ∈ {0, 1, −b/(2a)}. A straightforward way to obtain the optimal η* is to compute Jα(X0 + ηD) at each point and pick the one yielding the largest value. In fact, it is more efficient to choose η* based on the geometry of the parabola. Fig. 7 illustrates the eight possible configurations of the parabola and the corresponding optimal η.
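A minimal closed-form step-size routine following this geometric reasoning (the helper name is ours; it is validated against a dense grid search over parabolas of every shape):

```python
import numpy as np

def best_step(a, b):
    """argmax over eta in [0, 1] of f(eta) = a*eta^2 + b*eta (Eq. 16).

    Candidates are the two endpoints plus the parabola's vertex -b/(2a)
    when it falls strictly inside the interval.
    """
    cands = [0.0, 1.0]
    if a != 0.0:
        vertex = -b / (2.0 * a)
        if 0.0 < vertex < 1.0:
            cands.append(vertex)
    return max(cands, key=lambda e: a * e * e + b * e)

# check against brute-force grid search
grid = np.linspace(0.0, 1.0, 100001)
for a, b in [(1, 1), (1, -1), (1, -3), (-1, 1), (-1, 3), (-1, -1),
             (0, 1), (0, -1)]:
    f = lambda e: a * e * e + b * e
    assert f(best_step(a, b)) >= f(grid).max() - 1e-8
```

Only the interior vertex case requires the division; the remaining configurations of Fig. 7 all resolve to one of the two endpoints.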

6.4 Other implementation details

A similar path-following strategy was proposed in [33], and its performance over the state-of-the-art methods was demonstrated therein for solving the less general GM problem (Eq. 8). After exhaustively testing this strategy for solving the most general GM problem (Eq. 3), we empirically found that results can be improved using the following two steps:

Convergence: Although the FW algorithm is easy to implement, it converges sub-linearly. To obtain faster convergence while keeping its advantages in efficiency and low memory cost, we adopt a modified


Fig. 7. Eight possible shapes of the parabola f(η) = aη² + bη and the corresponding optimal step sizes η* = argmax_{η∈[0,1]} f(η).

Frank-Wolfe (MFW) algorithm [65] to find a better search direction D as a convex combination of previous solutions. Fig. 6b compares MFW with FW for two αs. As we can see, MFW converges much faster than FW.

Local vs. global: Although the path-following strategy returns an integer solution by smoothly tracking the local optima in a convex space, it is not guaranteed to obtain the globally optimal solution of the non-convex objective function. An important reason is that, at each step, it locally optimizes Jα(X) instead of the global objective Jgm(X). It is possible that Jα(X) increases while Jgm(X) decreases. To alleviate this issue, we keep increasing the global score of Jgm(X) by discarding any temporary solution that worsens the score of Jgm(X) and computing an alternative one by applying one step of FW to optimize Jgm(X).

Algorithm 1 summarizes the work-flow of our algorithm. The initial X can be an arbitrary doubly stochastic matrix. The complexity of our algorithm can be roughly calculated as O(T(τhun + τ∇ + τλ) + τsvd), where T is the number of iterations of the FW, and τsvd = m1 m2^2 is the cost of computing the SVD of Kq. The Hungarian algorithm can be finished in τhun = max(n1^3, n2^3). The gradient ∇Jα and the line search incur the same computational cost, τ∇ = τλ = m1 m2.

7 DEFORMABLE GRAPH MATCHING

GM has been widely used as a general tool for matching point sets with similar structures. In many computer vision applications, it is often required that the matching between two sets be constrained by a geometric transformation. However, it is unclear how to incorporate geometric constraints into conventional GM methods. This section describes deformable graph matching (DGM), which incorporates rigid and non-rigid transformations into graph matching.

7.1 Objective function

To simplify the discussion and to be consistent with ICP, we compute the node feature of each graph G =

Algorithm 1: A path-following algorithm
input : Kp, Kq, G1, H1, G2, H2, δ, X0
output: X
1 Initialize X to be a doubly stochastic matrix;
2 Factorize Kq = U V^T;
3 for α = 0 : δ : 1 do   // Path-following
4   if α = 0.5 and Jgm(X) < Jgm(X0) then
5     Update X ← X0;
6   Optimize Eq. 13 via MFW to obtain X*;
7   if Jgm(X*) < Jgm(X) then
8     Optimize Eq. 3 via one step of FW to obtain X*;
9   Update X ← X*;

{P, Q, G, H} simply as the node coordinates, P = [p1, · · · , pn] ∈ R^{d×n}. Similarly, the edge features Q = [q1, · · · , qm] ∈ R^{d×m} are computed as the coordinate differences between the connected nodes, i.e., qc = pi − pj, where g_{ic} = h_{jc} = 1. In this case, the edge features can be conveniently computed in matrix form as Q = P(G − H).
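In code, the incidence construction makes this one matrix product (a toy sketch with our own example graph):

```python
import numpy as np

# a 4-node, 3-edge toy graph: edges 0->1, 1->2, 3->0 (our own example)
n, m = 4, 3
edges = [(0, 1), (1, 2), (3, 0)]
G, H = np.zeros((n, m)), np.zeros((n, m))
for c, (i, j) in enumerate(edges):
    G[i, c], H[j, c] = 1, 1              # g_ic = h_jc = 1

P = np.random.default_rng(6).random((2, n))   # 2-D node coordinates

Q = P @ (G - H)                          # all edge features at once
for c, (i, j) in enumerate(edges):
    assert np.allclose(Q[:, c], P[:, i] - P[:, j])   # q_c = p_i - p_j
```

Because (G − H) is fixed, recomputing Q after every transformation update is a single d×n by n×m product.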

Suppose that we are given two graphs, G1 = {P1, Q1, G1, H1} and G2 = {P2, Q2, G2, H2}, and a geometric transformation defined on points by τ(·). Similar to ICP, we compute the node affinity Kp(T) ∈ R^{n1×n2} and the edge affinity Kq(T) ∈ R^{m1×m2} as linear functions of the Euclidean distance, i.e.:

κ^p_{i1i2}(T) = −‖p^1_{i1} − τ(p^2_{i2})‖_2^2,
κ^q_{c1c2}(T) = β − ‖(p^1_{i1} − p^1_{j1}) − (τ(p^2_{i2}) − τ(p^2_{j2}))‖_2^2 = β − ‖q^1_{c1} − τ(q^2_{c2})‖_2^2,   (17)

where β is chosen to be reasonably large to ensure that the pairwise affinity is greater than zero.

Recall that the factorization (Eq. 6) reveals that the goal of GM (Eq. 7) is similar to that of ICP (Eq. 10). In order to make GM more robust to geometric deformations, DGM aims to find the optimal correspondence X as well as the optimal transformation T such that the global consistency is maximized:

max_{X∈Π, T∈Ψ}  Jdgm(X, T) = tr(Kp(T)^T X) + λ tr(Kq(T)^T Y) − ψ(T),   (18)

where λ ≥ 0 balances the importance of the node and edge consistency. Similar to ICP, ψ(T) and Ψ are used to constrain and penalize the transformation parameters. Eq. 18 extends ICP by adding the pairwise edge term. In particular, if λ = 0, solving DGM is equivalent to solving ICP. Otherwise, when λ > 0 and T is known, solving DGM is identical to solving a GM problem.

7.2 Optimization

Due to the non-convex nature of the objective, we optimize Jdgm by alternately solving for the correspondence (X) and the transformation parameters (T). The initialization is important for the performance of DGM. However,


how to select a good initialization is beyond the scope of this paper, and we simply set the initial transformation to the identity, i.e., τ(p) = p.

Optimizing Eq. 18 consists of alternating between optimizing the correspondence and the geometric transformation. Given the transformation T, DGM is equivalent to a traditional GM problem. To find the node correspondence X, we adopt the path-following algorithm by optimizing Eq. 13. Given the correspondence matrix X, the optimization over the transformation parameters T is similar to ICP. After some linear algebra, it can be shown that for the following transformations, the parameters can be computed in closed form.

Similarity transformation: According to the definition in Eq. 17, the similarity transformation of the edge feature is τ(q) = sRq, or τ(Q) = sRQ. Since the translation t only affects the node feature, the optimal t* can be computed as follows once the optimal scale s* and rotation R* are known:

t* = p̄1 − s* R* p̄2,  where  p̄1 = P1 X 1_{n2} / (1_{n1}^T X 1_{n2}),  p̄2 = P2 X^T 1_{n1} / (1_{n1}^T X 1_{n2}).

After centering the data, P̄1 = P1 − p̄1 1_{n1}^T and P̄2 = P2 − p̄2 1_{n2}^T, the optimal rotation and scale are given by:

R* = U diag(1, · · · , |U V^T|) V^T,
s* = tr(Σ) / (tr(1_{n1×2} (P̄2 ◦ P̄2) X^T) + λq tr(1_{m1×2} (Q2 ◦ Q2) Y^T)),

where U Σ V^T = P̄1 X P̄2^T + λq Q1 Y Q2^T.
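When λq = 0 and the correspondence is the identity, the rotation/scale update above reduces to the classical Procrustes solution, which we can sanity-check by recovering a known similarity transform (the synthetic data and variable names below are ours):

```python
import numpy as np

rng = np.random.default_rng(7)

n, d = 8, 2
theta, s_true = 0.9, 1.7
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = rng.random(d)

P2 = rng.random((d, n))
P1 = s_true * R_true @ P2 + t_true[:, None]   # exactly transformed copy
X = np.eye(n)                                  # known correspondence

# means and centered point sets (lambda_q = 0 case of the update)
p1 = (P1 @ X @ np.ones(n)) / n
p2 = (P2 @ X.T @ np.ones(n)) / n
P1c, P2c = P1 - p1[:, None], P2 - p2[:, None]

U, sig, Vt = np.linalg.svd(P1c @ X @ P2c.T)
D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])  # det correction
R_est = U @ D @ Vt
s_est = sig.sum() / (P2c ** 2).sum()
t_est = p1 - s_est * R_est @ p2

assert np.allclose(R_est, R_true)
assert np.isclose(s_est, s_true)
assert np.allclose(t_est, t_true)
```

With noiseless data, the polar factor U V^T of the cross-covariance is exactly the true rotation, so the closed-form update recovers (s, R, t) in one step.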

Affine transformation: After applying the transformation, the edge feature becomes τ(q) = Vq, or τ(Q) = VQ. Similar to the similarity transformation, the optimal translation depends on the optimal affine matrix V*:

t* = p̄1 − V* p̄2,  where  p̄1 = P1 X 1_{n2} / (1_{n1}^T X 1_{n2}),  p̄2 = P2 X^T 1_{n1} / (1_{n1}^T X 1_{n2}).

After centering all the data, we can compute the optimal affine transformation:

V* = A B^{−1},  where  A = P̄1 X P̄2^T + λq Q1 Y Q2^T,
B = P̄2 diag(X^T 1_{n1}) P̄2^T + λq Q2 diag(Y^T 1_{m1}) Q2^T.

Non-rigid transformation: According to the definition in Eq. 17, the non-rigid transformation of the edge features is τ(qc) = qc + W(φ(pi) − φ(pj)), where edge qc connects points pi and pj. In matrix form, we can show that τ(Q) = Q + W Lq, where Lq = Lp(G − H). Setting the gradient of Jdgm(W) to zero yields the optimal weight matrix W* = A B^{−1}, where

A = P1 X − P2 diag(X^T 1_{n1}) + λq Q1 Y (G2 − H2)^T − λq Q2 diag(Y^T 1_{m1}) (G2 − H2)^T,
B = Lp diag(X^T 1_{n1}) + λw I_{n2} + λq Lq diag(Y^T 1_{m1}) (G2 − H2)^T.

7.3 Comparison with ICP

It is well known that the performance of ICP algorithms largely depends on the effectiveness of the initialization step. In the following example, we empirically illustrate how, by adding additional pairwise constraints, DGM is less sensitive to the initialization. Fig. 8a illustrates the problem of aligning two fish shapes under varying


Fig. 8. Comparison between ICP and DGM for aligning shapes for several initial values of the rotation and scale parameters. (a) Examples of initializations for different angles and scales. (b) Objective surfaces obtained by ICP. (c) Objective surfaces obtained by DGM.

values for the initial rotation and scale parameters. As shown in Fig. 8b, ICP gets trapped in a local optimum if the orientation gap is larger than π/3 (the error should be 0). Similarly, DGM fails for large orientation gaps after two iterations (the left column of Fig. 8c). However, as the number of iterations increases, DGM is able to match shapes with very large deformations in rotation and scale. After 24 iterations, DGM ultimately finds the optimal matching for all the initializations (the right column of Fig. 8c). This experiment shows that adding pairwise constraints can make the ICP algorithm more robust to the problem of local optima.

8 EXPERIMENTS

This section reports experimental results on four benchmark datasets and compares FGM to several state-of-the-art methods for GM and ICP. In the first three experiments, we test the performance of the path-following algorithm for optimizing GM problems. In the last experiment, we add a known geometric transformation between graphs and compare with ICP on the problem of matching rigid and non-rigid shapes.

8.1 Baselines and evaluation metrics

In the experiments, we compared against 8 algorithms.

Hungarian algorithm (HUN): HUN [16] solves the linear assignment problem in polynomial time. We use it as a baseline to find the correspondence X that maximizes the first-order term of the GM objective: tr(Kp X^T).

Graduated assignment (GA): GA [36] performs gradient ascent on a relaxation of Eq. 3 driven by an annealing schedule. At each step, it maximizes a Taylor expansion of the non-convex QP around the previous approximate solution. The accuracy of the approximation is controlled by a continuation parameter, β_{t+1} ← αβ_t ≤ β_max. In all experiments, we set α = 1.075, β0 = .5 and β_max = 200.


Spectral matching (SM): SM [27] optimizes a relaxed version of Eq. 3 that drops the affine constraints and introduces a unit-length constraint on X, that is:

max_X  Jgm(X),  s.t.  ‖vec(X)‖_2^2 = 1.

The globally optimal solution of the relaxed problem is the leading eigenvector of K.

Spectral matching with affine constraints (SMAC): SMAC [28] adds affine constraints to SM, maximizing:

max_X  Jgm(X),  s.t.  A vec(X) = b  and  ‖vec(X)‖_2^2 = 1.

The solution is again given by an eigenvalue problem.

Integer projected fixed point method (IPFP): IPFP [39] can take any continuous or discrete solution as input and iteratively improve it. In our experiments, we implemented two versions: (1) IPFP-U, which starts from the same initial X as our method; (2) IPFP-S, which is initialized by SM.

Probabilistic matching (PM): PM [42] designs a convex objective function, min_{X∈D} S(Y‖X), that can be globally optimized, where S(·) denotes the relative entropy based on the Kullback-Leibler divergence and Y is calculated by marginalizing K.

Re-weighted random walk matching (RRWM): RRWM [43] takes a random-walk view of the problem and obtains the solution by simulating random walks with re-weighted jumps that enforce the matching constraints on the association graph. We fixed its parameters to α = 0.2 and β = 30 in all experiments.

Concave optimization (CAV): Concave optimization was first used in [59] for optimizing GM. Compared to conventional methods, concave optimization is guaranteed to converge to an integer solution and no further rounding step is needed. Similar to [59], we implemented CAV by employing the Frank-Wolfe algorithm to iteratively optimize the concave relaxation (Eq. 12) of GM. To demonstrate the benefit of the path-following procedure, CAV was initialized by solving the same convex relaxation (Eq. 11) as FGM-D. The main difference between CAV and FGM-D is that the latter employs a path-following strategy to iteratively improve the result.

Degenerate similarity (DEN): To evaluate the effect of each component in the factorization (Eq. 6), we generated a degenerate edge similarity by setting Kq = 1, so that the edge similarity depends only on the topology of the graphs defined by G1, G2, H1 and H2. DEN was optimized using the same path-following method.

Factorized graph matching (FGM): We implemented two versions of our method: FGM-U for undirected graphs with only symmetric edge features and FGM-D for directed graphs with either symmetric or asymmetric edge features. FGM-U was first proposed in [17], which contains a factorization similar to Eq. 6. The main difference between FGM-U and FGM-D is that the former factorization can only be used for undirected graphs, while FGM-D can handle both undirected and directed graphs. To use FGM-U for matching graphs with directed edges, we converted the directed edges to undirected ones and computed the edge affinity as the average of the original affinities between the directed edges. The parameter was fixed to δ = 0.01.

We used code from the authors' websites for all methods. The code was implemented in Matlab on a laptop with a 2.4GHz Intel CPU and 4GB of memory. FGM-U and FGM-D were able to obtain the solution within a minute for graphs with 50 nodes.

We evaluated both the matching accuracy and the objective score to compare performance. The matching accuracy, acc = tr(X_alg^T X_tru) / tr(1_{n2×n1} X_tru), is calculated by counting the consistent matches between the correspondence matrix X_alg given by an algorithm and the ground truth X_tru. The objective score, obj = Jgm(X_alg)/Jgm(X_ours), is computed as the ratio between the objective value attained by each algorithm and that of FGM-D.

8.2 Synthetic dataset

This experiment performed a comparative evaluation of GM algorithms on randomly synthesized graphs, following the experimental protocol of [28], [36], [43]. For each trial, we constructed two identical graphs, G1 and G2, each of which consists of 20 inlier nodes, and later we added n_out outlier nodes to both graphs. For each pair of nodes, an edge was randomly generated according to the edge density parameter ρ ∈ [0, 1]. Each edge in the first graph was assigned a random edge score distributed uniformly as q^1_c ∼ U(0, 1), and the corresponding edge q^2_c = q^1_c + ε in the second graph was perturbed by additive Gaussian noise ε ∼ N(0, σ²). Notice that the edge feature was asymmetric. The edge affinity Kq was computed as k^q_{c1c2} = exp(−(q^1_{c1} − q^2_{c2})²/0.15) and the node affinity Kp was set to zero.

The experiment tested the performance of GM methods under three parameter settings. For each setting, we generated 100 different pairs of graphs and evaluated the average accuracy and objective ratio. In the first setting (Fig. 9a), we increased the number of outliers from 0 to 20 while fixing the noise σ = 0 and considering only fully connected graphs (i.e., ρ = 1). In the second setting (Fig. 9b), we perturbed the edge weights by changing the noise parameter σ from 0 to 0.2, while fixing the other two parameters at n_out = 0 and ρ = 1. In the last setting (Fig. 9c), we tested the performance of matching sparse graphs by varying ρ from 1 to 0.3. In most cases, FGM-D achieved the best performance among all algorithms in terms of both accuracy and objective ratio. RRWM and CAV are our closest competitors. FGM-U did not achieve comparatively good results because it solved an undirected approximation of the original problem. Although initialized with the same convex solution as FGM-D, CAV performed worse than FGM-D due to the lack of the robust path-following strategy. We also found that, relying only on the graph topology (Kq = 1), DEN performed dramatically worse than the other baselines even when using the path-following algorithm. This indicates the importance of designing a good edge similarity for the problem.


SUBMITTED TO IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, NOVEMBER 2015 12


Fig. 9. Comparison of GM methods on synthetic datasets. (a) Performance as a function of the outlier number (nout).(b) Performance as a function of the edge noise (σ). (c) Performance as a function of the density of edges (ρ).


Fig. 10. Memory cost for GM problems. (a) An exampleof pair of graphs with 100 nodes. The edge connectionis computed using Delaunay triangulation. (b) Number ofelements to represent the affinity matrix.

FGM provides superior matching performance. In addition, FGM uses less memory to represent the affinity matrix. Fig. 10a plots two random graphs with 100 nodes, where the graph connectivity was computed using Delaunay triangulation. Fig. 10b compares the number of elements in the original affinity matrix K with that of the six smaller components. The proposed factorization allows large-scale GM problems to be represented with much less memory.
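The memory comparison in Fig. 10b can be reproduced as a back-of-the-envelope count. This sketch assumes two graphs of n nodes and m directed edges each, and that Delaunay connectivity yields roughly 3n undirected (6n directed) edges; the function name is ours.

```python
def affinity_elements(n, m):
    """Element counts for the full affinity matrix vs. the six factors:
    K              : (n*n) x (n*n)
    G1, H1, G2, H2 : n x m   (node-edge incidence matrices)
    Kp             : n x n
    Kq             : m x m"""
    full = (n * n) ** 2
    factored = 4 * n * m + n * n + m * m
    return full, factored

for n in (100, 1000):
    full, fact = affinity_elements(n, 6 * n)
    print(f"n={n}: full={full:.1e}, factored={fact:.1e}, ratio={full / fact:.0f}x")
```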

8.3 House image dataset

The CMU house image sequence4 has been commonly used to test the performance of graph matching algorithms [40], [43], [45], [46]. This dataset consists of 111 frames of a house, each of which has been manually labeled with 30 landmarks. We used Delaunay triangulation to connect the landmarks. The edge weight qc was computed as the pairwise distance between the connected nodes. Due to the symmetry of the distance-based feature, the edges were undirected. Given an image pair, the edge-affinity matrix Kq was computed as k^q_{c1c2} = exp(−(q^1_{c1} − q^2_{c2})^2/2500) and the node affinity Kp was set to zero. We tested the performance of all methods as a function of the separation between frames. We matched all possible image pairs, spaced by 0 : 10 : 90 frames, and computed the average matching accuracy and objective ratio per sequence gap. Fig. 11a shows an example pair of frames.

We tested the performance of GM methods under twoscenarios. In the first case (Fig. 11b) we used all 30 nodes

4. http://vasc.ri.cmu.edu//idb/html/motion/house

(i.e., landmarks), and in the second one (Fig. 11c) we matched sub-graphs by randomly picking 25 landmarks. First, DEN achieved the worst performance because it relies only on the graph topology to find the correspondence. Second, IPFP-S, RRWM, CAV, FGM-U and FGM-D obtained almost perfect matching of the original graphs in the first case (Fig. 11b). As some nodes became invisible and the graphs got corrupted (Fig. 11c), the performance of all methods degraded. However, FGM-U and FGM-D consistently achieved the best performance. These results demonstrate the advantages of the path-following algorithm over other state-of-the-art methods for solving general GM problems. In addition, it is interesting to note that FGM-U and FGM-D performed similarly in both cases. This is because FGM-U can be considered a special case of FGM-D when the graph consists of undirected edges.

8.4 Car and motorbike image dataset

The car and motorbike image dataset was created in [44]. This dataset consists of 30 pairs of car images and 20 pairs of motorbike images taken from the PASCAL challenges. Each pair contains 30 ∼ 60 ground-truth correspondences. For each node, we computed the feature pi as the orientation of the normal vector to the contour. We adopted Delaunay triangulation to build the graph. In this experiment, we considered the most general graphs, where the edges were directed and the edge feature was asymmetric. More specifically, each edge was represented by a pair of values, qc = [dc, θc]^T, where dc was the pairwise distance between the connected nodes and θc was the angle between the edge and the horizontal line. Thus, for each pair of images, we computed the node affinity as k^p_{ij} = exp(−|pi − pj|) and the edge affinity as k^q_{c1c2} = exp(−σ|dc1 − dc2| − (1 − σ)|θc1 − θc2|), where σ ∈ [0, 1] is a parameter that trades off the distance and angle features. We set σ = 1/2 in the experiments to give the same importance to the two edge features. Fig. 12a and Fig. 12b show example pairs of car and motorbike images, respectively. To test the performance against noise, we randomly selected 0 ∼ 20 outlier nodes from the background. As before, we compared FGM-D against 11 state-of-the-art methods. However, we were unable to directly use FGM-U to match directed graphs. Therefore, we ran FGM-U on



Fig. 11. Comparison of GM methods on the CMU house datasets. (a) An example pair of frames with thecorrespondence generated by our method, where the blue lines indicate incorrect matches. (b) Performance of severalalgorithms using 30 nodes. (c) Performance using 25 nodes.


Fig. 12. Comparison of GM methods for the car and motorbike dataset. (a) An example pair of car images with thecorrespondence generated by our method, where the blue lines indicate incorrect matches. (b) An example pair ofmotorbike images. (c) Performance as a function of the outlier number for the car images. (d) Performance as afunction of the outlier number for the motorbike images.

an approximated undirected graph, where we computedthe feature of the undirected edge as the average valueof the original affinities between the directed edges.
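The node and edge affinities used in this experiment can be sketched as follows. This is an illustrative implementation under our own naming, with σ the distance/angle trade-off (set to 1/2 in the experiments):

```python
import numpy as np

def node_affinity(p1, p2):
    """k^p_{ij} = exp(-|p_i - p_j|) on contour-normal orientations."""
    return np.exp(-np.abs(p1[:, None] - p2[None, :]))

def edge_affinity(d1, t1, d2, t2, sigma=0.5):
    """k^q_{c1c2} = exp(-sigma*|d_c1 - d_c2| - (1-sigma)*|theta_c1 - theta_c2|),
    trading off pairwise distance against edge orientation."""
    dd = np.abs(d1[:, None] - d2[None, :])
    dt = np.abs(t1[:, None] - t2[None, :])
    return np.exp(-sigma * dd - (1.0 - sigma) * dt)

# toy usage: two nodes/edges in graph 1 against one node/edge in graph 2
Kp = node_affinity(np.array([0.0, 1.0]), np.array([0.0]))
Kq = edge_affinity(np.array([1.0, 2.0]), np.array([0.0, 0.5]),
                   np.array([1.0]), np.array([0.0]))
```

Identical features yield the maximal affinity of 1, and the affinity decays exponentially with either feature difference.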

As observed in Fig. 12c-d, using only the node similarity (HUN) or a degenerate edge similarity (DEN) does not perform well on this challenging dataset. This indicates the importance of designing good similarity functions to solve matching problems on realistic datasets. In addition, the proposed FGM-D consistently outperformed the other methods on both datasets. Although based on a similar path-following strategy, FGM-U was unable to achieve the same accuracy because the graphs consisted of directed edges and FGM-U is only applicable to undirected edges. It is worth pointing out that CAV did not perform as well as FGM-D, although it solved the same initial convex problem and optimized the same concave relaxation at the final step. This result, as well as the previous experiment, clearly demonstrates that the path-following algorithm used by FGM-D provides a better optimization strategy than existing approaches. Finally, it is important to remind the reader that without the factorization proposed in this work it would not be possible to apply the path-following method to general graphs.

Fig. 13a-b evaluate the performance of GM methodson the car and motorbike datasets for different weights(σ) between the distance and angle edge features. Whenσ = 0, the edge similarity is solely dependent on theangle feature. On the other hand, when σ = 1, the edgesimilarity is computed from edge distances. It can beobserved that for most methods, the best performance is


Fig. 13. Comparison of GM methods using different fea-ture weights (σ) on the car (a) and motorbike (b) datasets.

achieved around σ = 0.5. This indicates that a combination of distance and angle features provides a better similarity for this challenging task. In addition, our method (FGM-D) consistently achieved the best performance for most settings. Interestingly, when σ is close to 1 and the similarity becomes symmetric (using only distance features), our undirected method (FGM-U) slightly outperformed the more general FGM-D.

8.5 Fish and character shape dataset

The UCF shape dataset [53] has been widely used for comparing ICP algorithms. In our experiments, we used two different templates. The first one has 91 points sampled from the outer contour of a tropical fish. The second one consists of 105 points sampled from a Chinese character. For each template, we designed two series of experiments to measure the robustness of an algorithm



Fig. 14. Comparison between DGM and ICP on the UCF shape datasets. (a-b) Two example pairs of shapes aligned using DGM. The red shape (left) is a version of the blue one (right) rotated by 2π/3, with 20 random outliers added. (c-d) Matching performance as a function of the initial rotation. (e-f) Matching performance as a function of the number of outliers.

under different deformations and outliers. In the first series of experiments, we rotated the template by a varying angle (between 0 and π). In the second series, a varying number of outliers (between 0 and 20) were randomly added inside the bounding box of the template. For instance, Fig. 14a-b illustrate two pairs of example shapes with 20 outliers. We repeated the random generation 50 times for each noise level and compared DGM with ICP and coherent point drift (CPD) [48]. We implemented the ICP algorithm ourselves, and the CPD implementation was taken from the authors' website. We initialized all algorithms with the same transformation, i.e., τ(p) = p. In DGM, Delaunay triangulation was employed to compute the graph structure. Recall that DGM simultaneously computes the correspondence and the rotation.
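A minimal sketch of this perturbation protocol, assuming a rotation of the 2-D template followed by uniform outliers in its bounding box (the function name is ours, and the template below is a random stand-in for the 91-point fish contour):

```python
import numpy as np

def perturb_template(pts, angle, n_out, seed=0):
    """Rotate a 2-D point set by `angle` and scatter n_out outliers
    uniformly inside its bounding box."""
    rng = np.random.default_rng(seed)
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])            # 2-D rotation matrix
    lo, hi = pts.min(axis=0), pts.max(axis=0)  # bounding box of the template
    outliers = rng.uniform(lo, hi, size=(n_out, 2))
    return np.vstack([pts @ R.T, outliers])

tpl = np.random.default_rng(1).random((91, 2))
target = perturb_template(tpl, 2 * np.pi / 3, n_out=20)
```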

As shown in Fig. 14c-d, the proposed DGM can perfectly match the shapes across all rotations when there are no outliers, whereas both ICP and CPD get trapped in local optima when the rotation is larger than 2π/3. When the number of outliers increases, DGM can still match most points under a large rotation of 2π/3. In contrast, ICP and CPD fail drastically in the presence of outliers and large rotations (Fig. 14e-f). In addition to a similarity transform, DGM can also incorporate non-rigid transformations in GM. Similar to the rigid case described above, we synthesized non-rigid shapes from the UCF shape dataset [53]. To generate the non-rigid transformation, we followed a setting similar to [48], where the domain of the point set was parameterized by a mesh of control points. The deformation of the mesh was modeled as a spline-based interpolation of the perturbation of the control points. We repeated the random generation 50 times. For instance, Fig. 15a illustrates a synthetic pair of graphs.

We compared DGM with two other state-of-the-art


Fig. 15. Comparison between DGM and GM methods foraligning non-rigidly deformed shapes. (a) An example oftwo fishes, where the red one is generated by a non-rigid transformation from the blue one. (b) Accuracy. (c)Results, where the green and black lines indicate correctand incorrect correspondence respectively.

GM methods: SM [27] and RRWM [43]. In addition, we tested the performance of our algorithm (FGM-D) using only the path-following algorithm to compute the correspondence, without estimating the transformation. As shown in Fig. 15b-c, FGM-D performed better than the other two GM methods because the path-following algorithm is more accurate in optimizing GM problems. In addition, DGM significantly improved upon FGM-D by estimating the transformation.

9 CONCLUSIONS AND FUTURE WORK

This paper proposes FGM, a new framework for understanding and optimizing GM problems. The key idea is a novel factorization of the pairwise affinity matrix into six smaller components. Four main benefits follow from factorizing the affinity matrix. First, there is no need to explicitly compute the costly affinity matrix. Second, it allows for a unification of GM methods as well as existing ICP algorithms. Third, the factorization enables the use of path-following algorithms that improve the performance of GM methods. Finally, it becomes easier to incorporate global geometric transformations into GM.

The most computationally expensive part of the algorithm is the large number of iterations needed for the FW method to converge when Jα is close to a convex function. Therefore, more advanced techniques (e.g., conjugate gradient) could be used to speed up FW. In addition, we are exploring the extension of FGM to higher-order graph matching problems [42], [46], as well as to learning parameters for graph matching [44], [45].

Beyond GM, other computer vision problems using pairwise constraints can benefit from the proposed factorization. A popular example is segmentation using Markov random fields (MRFs) [66]. Given n1 nodes and n2 labels, MRF problems consist of finding the optimal labeling matrix X ∈ Φ that defines a many-to-one mapping from nodes to labels, i.e.:

Φ = {X|X ∈ {0, 1}n1×n2 ,X1n2 = 1n1}. (19)
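A membership test for Φ can be written directly from Eq. 19 (a sketch; the function name is ours):

```python
import numpy as np

def in_Phi(X):
    """X is in Phi iff X is binary and every row sums to one:
    each node gets exactly one label, while a label may serve many nodes."""
    X = np.asarray(X)
    binary = np.isin(X, (0, 1)).all()
    one_label_per_node = np.array_equal(X.sum(axis=1), np.ones(X.shape[0]))
    return bool(binary and one_label_per_node)
```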


Fig. 16. An MRF example of four nodes (red circles) withthree labels (blue boxes). Similar to GM, the affinity matrixK can be factorized into six smaller matrices.

Fig. 16 illustrates a simple MRF problem, where fournodes need to be assigned to three labels. Under themany-to-one constraint (Eq. 19), MRF can be formalizedas a similar QAP as follows:

max_{X ∈ Φ} Jmrf(X) = vec(X)^T K vec(X).

Although originally designed for GM problems, our factorization (Eq. 6) can be applied to model and understand MRF problems as well. For instance, the 12-by-12 K modeling the problem shown in Fig. 16 can be decomposed into six components. In particular, G1 and H1 are two 4-by-8 binary matrices encoding the node-edge incidence of the node graph, and G2 and H2 are two 3-by-6 binary matrices describing the fully connected label graph. Kp and Kq encode the first-order and second-order potentials, respectively. In future work, we plan to further explore this connection.
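The Fig. 16 example can be made concrete as follows, assuming the factorized form K = diag(vec(Kp)) + (G2 ⊗ G1) diag(vec(Kq)) (H2 ⊗ H1)^T from Eq. 6. The node graph is taken here to be a 4-cycle with both edge directions (a hypothetical choice matching the stated 4-by-8 incidence), and Kp, Kq are filled with ones purely to check the dimensions:

```python
import numpy as np

def incidence(n, edges):
    """Node-edge incidence: G[i, c] = 1 if directed edge c starts at node i,
    H[j, c] = 1 if it ends at node j."""
    G = np.zeros((n, len(edges)))
    H = np.zeros((n, len(edges)))
    for c, (i, j) in enumerate(edges):
        G[i, c] = 1.0
        H[j, c] = 1.0
    return G, H

# node graph: a 4-cycle with both directions (8 directed edges)
e1 = [(i, j) for i in range(4) for j in range(4) if abs(i - j) in (1, 3)]
# label graph: fully connected on 3 labels (6 directed edges)
e2 = [(i, j) for i in range(3) for j in range(3) if i != j]
G1, H1 = incidence(4, e1)
G2, H2 = incidence(3, e2)

Kp = np.ones((4, 3))               # first-order potentials (placeholder)
Kq = np.ones((len(e1), len(e2)))   # second-order potentials (placeholder)
K = (np.diag(Kp.flatten('F'))
     + np.kron(G2, G1) @ np.diag(Kq.flatten('F')) @ np.kron(H2, H1).T)
```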

Acknowledgments This work was partially supportedby the National Science Foundation under Grant No.EEEC-0540865, RI-1116583 and CPS-0931999. Any opin-ions, findings, and conclusions expressed in this materialare those of the author(s) and do not necessarily reflectthe views of the National Science Foundation.

REFERENCES

[1] H. Jiang, S. X. Yu, and D. R. Martin, "Linear scale and rotation invariant matching," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 7, pp. 1339–1355, 2011.

[2] R. Szeliski, Computer vision: algorithms and applications. Springer,2010.

[3] K. Grauman and T. Darrell, “The pyramid match kernel: Discrim-inative classification with sets of image features,” in ICCV, 2005.

[4] M. A. Fischler and R. C. Bolles, “Random sample consensus: Aparadigm for model fitting with applications to image analysisand automated cartography,” Commun. ACM, vol. 24, no. 6, pp.381–395, 1981.

[5] P. J. Besl and N. D. McKay, “A method for registration of 3-dshapes,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 2, pp.239–256, 1992.

[6] A. C. Berg, T. L. Berg, and J. Malik, “Shape matching and objectrecognition using low distortion correspondences,” in CVPR,2005.

[7] D. Mateus, R. Horaud, D. Knossow, F. Cuzzolin, and E. Boyer,“Articulated shape matching using Laplacian eigenfunctions andunsupervised point registration,” in CVPR, 2008.

[8] M. Leordeanu, M. Hebert, and R. Sukthankar, “Beyond localappearance: Category recognition from pairwise interactions ofsimple features,” in CVPR, 2007.

[9] O. Duchenne, A. Joulin, and J. Ponce, “A graph-matching kernelfor object categorization,” in ICCV, 2011.

[10] J. Hays, M. Leordeanu, A. A. Efros, and Y. Liu, “Discoveringtexture regularity as a higher-order correspondence problem,” inECCV, 2006.

[11] W. Brendel and S. Todorovic, “Learning spatiotemporal graphs ofhuman activities,” in ICCV, 2011.

[12] U. Gaur, Y. Zhu, B. Song, and A. Roy-Chowdhury, “A “stringof feature graphs model” for recognition of complex activities innatural videos,” in ICCV, 2011.

[13] N. Quadrianto, A. J. Smola, L. Song, and T. Tuytelaars, “Kernel-ized sorting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 10,pp. 1809–1821, 2010.

[14] M. Zaslavskiy, F. R. Bach, and J.-P. Vert, “Global alignment ofprotein-protein interaction networks by graph matching meth-ods,” Bioinformatics, vol. 25, no. 12, pp. 1259–1267, 2009.

[15] E. M. Loiola, N. M. De Abreu, P. O. Boaventura, P. Hahn, and T. M.Querido, “A survey for the quadratic assignment problem,” Eur.J. Oper. Res., vol. 176, no. 2, pp. 657–690, 2007.

[16] R. Burkard, M. Dell'Amico, and S. Martello, Assignment Problems. SIAM, 2009.

[17] F. Zhou and F. De la Torre, “Factorized graph matching,” in CVPR,2012.

[18] D. Conte, P. Foggia, C. Sansone, and M. Vento, “Thirty years ofgraph matching in pattern recognition,” Int. J. Pattern Recognit.and Artificial Intelligence, vol. 18, no. 3, pp. 265–298, 2004.

[19] J. R. Ullmann, “An algorithm for subgraph isomorphism,” J.ACM, vol. 23, pp. 31–42, 1976.

[20] M. Pelillo, “Replicator equations, maximal cliques, and graphisomorphism,” Neural Comput., vol. 11, no. 8, pp. 1933–1955, 1999.

[21] L. P. Cordella, P. Foggia, C. Sansone, and M. Vento, “A (sub)graphisomorphism algorithm for matching large graphs,” IEEE Trans.Pattern Anal. Mach. Intell., vol. 26, no. 10, pp. 1367–1372, 2004.

[22] S. Umeyama, “An eigendecomposition approach to weightedgraph matching problems,” IEEE Trans. Pattern Anal. Mach. Intell.,vol. 10, no. 5, pp. 695–703, 1988.

[23] G. Scott and H. Longuet-Higgins, “An algorithm for associatingthe features of two images,” Biological Sciences, vol. 244, no. 1309,pp. 21–26, 1991.

[24] L. S. Shapiro and M. Brady, “Feature-based correspondence: aneigenvector approach,” Image Vision Comput., vol. 10, no. 5, pp.283–288, 1992.

[25] T. Caelli and S. Kosinov, “An eigenspace projection clusteringmethod for inexact graph matching,” IEEE Trans. Pattern Anal.Mach. Intell., vol. 26, no. 4, pp. 515–519, 2004.

[26] T. C. Koopmans and M. Beckmann, “Assignment problems andthe location of economic activities,” Econometrica, vol. 25, no. 1,pp. 53–76, 1957.

[27] M. Leordeanu and M. Hebert, “A spectral technique for corre-spondence problems using pairwise constraints,” in ICCV, 2005.

[28] T. Cour, P. Srinivasan, and J. Shi, “Balanced graph matching,” inNIPS, 2006.

[29] P. H. S. Torr, “Solving Markov random fields using semidefiniteprogramming,” in AISTATS, 2003.

[30] C. Schellewald and C. Schnorr, “Probabilistic subgraph matchingbased on convex relaxation,” in EMMCVPR, 2005.

[31] M. X. Goemans and D. P. Williamson, “Improved approximationalgorithms for maximum cut and satisfiability problems usingsemidefinite programming,” J. ACM, vol. 42, no. 6, pp. 1115–1145,1995.

[32] H. A. Almohamad and S. O. Duffuaa, “A linear programmingapproach for the weighted graph matching problem,” IEEE Trans.Pattern Anal. Mach. Intell., vol. 15, no. 5, pp. 522–525, 1993.

[33] M. Zaslavskiy, F. R. Bach, and J.-P. Vert, “A path followingalgorithm for the graph matching problem,” IEEE Trans. PatternAnal. Mach. Intell., vol. 31, no. 12, pp. 2227–2242, 2009.

[34] Z. Liu, H. Qiao, and L. Xu, “An extended path following al-gorithm for graph-matching problem,” IEEE Trans. Pattern Anal.Mach. Intell., vol. 34, no. 7, pp. 1451–1456, 2012.

[35] E. L. Lawler, “The quadratic assignment problem,” ManagementScience, vol. 9, no. 4, pp. 586–599, 1963.

[36] S. Gold and A. Rangarajan, “A graduated assignment algorithmfor graph matching,” IEEE Trans. Pattern Anal. Mach. Intell.,vol. 18, no. 4, pp. 377–388, 1996.

[37] Y. Tian, J. Yan, H. Zhang, Y. Zhang, X. Yang, and H. Zha,“On the convergence of graph matching: Graduated assignmentrevisited,” in ECCV, 2012.

[38] B. J. van Wyk and M. A. van Wyk, “A POCS-based graph match-ing algorithm,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26,no. 11, pp. 1526–1530, 2004.

[39] M. Leordeanu, M. Hebert, and R. Sukthankar, “An integer pro-jected fixed point method for graph matching and MAP infer-ence,” in NIPS, 2009.


[40] L. Torresani, V. Kolmogorov, and C. Rother, “Feature correspon-dence via graph matching: Models and global optimization,” inECCV, 2008.

[41] A. Egozi, Y. Keller, and H. Guterman, “A probabilistic approach tospectral graph matching,” IEEE Trans. Pattern Anal. Mach. Intell.,vol. 35, no. 1, pp. 18–27, 2013.

[42] R. Zass and A. Shashua, “Probabilistic graph and hypergraphmatching,” in CVPR, 2008.

[43] M. Cho, J. Lee, and K. M. Lee, “Reweighted random walks forgraph matching,” in ECCV, 2010.

[44] M. Leordeanu, R. Sukthankar, and M. Hebert, “Unsupervisedlearning for graph matching,” Int. J. Comput. Vis., vol. 95, no. 1,pp. 1–18, 2011.

[45] T. S. Caetano, J. J. McAuley, L. Cheng, Q. V. Le, and A. J. Smola,“Learning graph matching,” IEEE Trans. Pattern Anal. Mach. Intell.,vol. 31, no. 6, pp. 1048–1058, 2009.

[46] O. Duchenne, F. Bach, I.-S. Kweon, and J. Ponce, “A tensor-basedalgorithm for high-order graph matching,” IEEE Trans. PatternAnal. Mach. Intell., vol. 33, no. 12, pp. 2383–2395, 2011.

[47] M. Cho and K. M. Lee, “Progressive graph matching: Making amove of graphs via probabilistic voting,” in CVPR, 2012.

[48] A. Myronenko and X. B. Song, “Point set registration: Coherentpoint drift,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 12,pp. 2262–2275, 2010.

[49] O. van Kaick, H. Zhang, G. Hamarneh, and D. Cohen-Or, “Asurvey on shape correspondence,” Comput. Graph. Forum., vol. 30,no. 6, pp. 1681–1707, 2011.

[50] S. Rusinkiewicz and M. Levoy, “Efficient variants of the ICPalgorithm,” in 3DIM, 2001.

[51] A. M. Bronstein, M. M. Bronstein, and R. Kimmel, Numericalgeometry of non-rigid shapes. Springer, 2008.

[52] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and objectrecognition using shape contexts,” IEEE Trans. Pattern Anal. Mach.Intell., vol. 24, no. 4, pp. 509–522, 2002.

[53] H. Chui and A. Rangarajan, “A new point matching algorithmfor non-rigid registration,” Comput. Vis. Image Underst., vol. 89,no. 2-3, pp. 114–141, 2003.

[54] B. Luo and E. R. Hancock, “A unified framework for alignmentand correspondence,” Comput. Vis. Image Underst., vol. 92, no. 1,pp. 26–55, 2003.

[55] Y. Zheng and D. S. Doermann, “Robust point matching fornonrigid shapes by preserving local neighborhood structures,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 643–649,2006.

[56] H. Bunke, “Graph matching: Theoretical foundations, algorithms,and applications,” in VI, 2000.

[57] C. Schellewald, S. Roth, and C. Schnorr, “Evaluation of convexoptimization techniques for the weighted graph-matching prob-lem in computer vision,” in DAGM-Symposium, 2001.

[58] T. S. Caetano, T. Caelli, D. Schuurmans, and D. Augusto, “Graphi-cal models and point pattern matching,” IEEE Trans. Pattern Anal.Mach. Intell., vol. 28, no. 10, pp. 1646–1663, 2006.

[59] J. Maciel and J. Costeira, “A global solution to sparse correspon-dence problems,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25,no. 2, pp. 187–199, 2003.

[60] E. L. Allgower and K. Georg, Introduction to Numerical Continua-tion Methods. SIAM, 2003.

[61] A. L. Yuille and A. Rangarajan, “The concave-convex procedure,”Neural Comput., vol. 15, no. 4, pp. 915–936, 2003.

[62] R. Horst and P. M. Pardalos, Handbook of Global Optimization.Kluwer, 1995.

[63] J. Von Neumann, “A certain zero-sum two-person game equiv-alent to the optimal assignment problem,” Contributions to theTheory of Games, vol. 2, pp. 5–12, 1953.

[64] M. Frank and P. Wolfe, “An algorithm for quadratic program-ming,” Naval Research Logistics Quarterly, vol. 3, no. 1-2, pp. 95–110, 1956.

[65] M. Fukushima, “A modified Frank-Wolfe algorithm for solvingthe traffic assignment problem,” Transp. Res. Part B: Methodological,vol. 18, no. 2, pp. 169–177, 1984.

[66] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov,A. Agarwala, M. F. Tappen, and C. Rother, “A comparative studyof energy minimization methods for Markov random fields withsmoothness-based priors,” IEEE Trans. Pattern Anal. Mach. Intell.,vol. 30, no. 6, pp. 1068–1080, 2008.

Feng Zhou received the BS degree in computerscience from Zhejiang University in 2005, theMS degree in computer science from ShanghaiJiao Tong University in 2008, and the PhD de-gree in robotics from Carnegie Mellon Universityin 2014. He is now a researcher in the MediaAnalytics group at NEC Laboratories America.His research interests include machine learningand computer vision.

Fernando de la Torre is an Associate Research Professor in the Robotics Institute at Carnegie Mellon University. He received his B.Sc. degree in Telecommunications, as well as his M.Sc. and Ph.D. degrees in Electronic Engineering, from La Salle School of Engineering at Ramon Llull University, Barcelona, Spain, in 1994, 1996, and 2002, respectively. His research interests are in the fields of Computer Vision and Machine Learning. Currently, he is directing the Component Analysis Laboratory (http://ca.cs.cmu.edu) and the Human Sensing Laboratory (http://humansensing.cs.cmu.edu) at Carnegie Mellon University. He has over 150 publications in refereed journals and conferences. He has organized and co-organized several workshops and has given tutorials at international conferences on the use and extensions of Component Analysis.