a multithreaded method for network alignment

24
A multithreaded algorithm for network alignment David F. Gleich Computer Science Purdue University with Arif Khan, Alex Pothen Purdue University, Computer Science Mahantesh Halappanavar Pacific Northwest National Labs v r t s w u w tu Overlap A L B Work supported by DOE CSCAPES Institute grant (DE- FC02-08ER25864), NSF CAREER grant 1149756-CCF, and the Center for Adaptive Super Computing Software Multithreaded Architectures (CASS-MT) at PNNL. PNNL is operated by Battelle Memorial Institute under contract DE-AC06-76RL01830 1

Upload: david-gleich

Post on 15-Jan-2015

509 views

Category:

Documents


0 download

DESCRIPTION

My talk from SC about network alignment.

TRANSCRIPT

Page 1: A multithreaded method for network alignment

A multithreaded algorithm for network alignment

David F. Gleich Computer Science Purdue University

with

Arif Khan, Alex Pothen

Purdue University, Computer Science

Mahantesh Halappanavar Pacific Northwest National Labs

v

r

t

s

w

uwtu

Overlap

A L B

Work supported by DOE CSCAPES Institute grant (DE-FC02-08ER25864), NSF CAREER grant 1149756-CCF, and the Center for Adaptive Super Computing Software Multithreaded Architectures (CASS-MT) at PNNL. PNNL is operated by Battelle Memorial Institute under contract DE-AC06-76RL01830 1

Page 2: A multithreaded method for network alignment

Network alignment"What is the best way of matching "graph A to B using only edges in L?

Find a 1-1 matching between vertices with as many overlaps as possible.

v

r

t

s

w

uwtu

Overlap

A L B

2

Page 3: A multithreaded method for network alignment

objective = α matching + βoverlap

v

r

t

s

w

uwtu

Overlap

A L B

•  Computer Vision •  Ontology matching •  Database matching •  Bioinformatics

3

Network alignment"… is NP-hard"… has no approximation algorithm

Page 4: A multithreaded method for network alignment

Network alignment"

review articles

MAY 2012 | VOL. 55 | NO. 5 | COMMUNICATIONS OF THE ACM 91

subgraph under some mapping of the proteins between the two species) or inexact, allowing unmatched nodes on either subnetwork. This problem was first studied by Kelley et al.17 in the context of local network alignment; its later development accompanied the growth in the number of mapped organ isms.5,7,9,33 The third problem that has been considered is global net-work alignment (Figure 1c), where one wishes to align whole networks, one against the other.4,34 In its simplest form, the problem calls for identifying a 1-1 mapping between the proteins of two species so as to optimize some conservation criterion, such as the number of conserved interactions be-tween the two networks.

All these problems are NP-hard as they generalize graph and subgraph isomorphism problems. However, heuristic, parameterized, and ILP ap-proaches for solving them have worked remarkably well in practice. Here, we review these approaches and demon-strate their good performance in prac-tice both in terms of solution quality and running time.

Heuristic ApproachesAs in other applied fields, many prob-lems in network biology are amenable to heuristic approaches that perform well in practice. Here, we highlight two such methods: a local search heuristic for local network alignment and an eigenvector-based heuristic for global network alignment.

NetworkBLAST32 is an algorithm for local network alignment that aims to identify significant subnetwork matches across two or more networks. It searches for conserved paths and conserved dense clusters of interac-tions; we focus on the latter in our de-scription. To facilitate the detection of conserved subnetworks, Network-BLAST first forms a network alignment graph,17,23 in which nodes correspond to pairs of sequence-similar proteins, one from each species, and edges cor-respond to conserved interactions (see Figure 2). The definition of the latter is flexible and allows, for instance, a di-rect interaction between the proteins of one species versus an indirect interac-tion (via a common network neighbor) in the other species. Any subnetwork of the alignment graph naturally corre-

Figure 2. The NetworkBLAST local network alignment algorithm. Given two input networks, a network alignment graph is constructed. Nodes in this graph correspond to pairs of sequence-similar proteins, one from each species, and edges correspond to conserved interactions. A search algorithm identifies highly similar subnetworks that follow a prespecified interaction pattern. Adapted from Sharan and Ideker.30

Figure 3. Performance comparison of computational approaches.

(a) An evaluation of the quality of NetworkBLAST’s output clusters. NetworkBLAST was applied to a yeast network from Yu et al.39 For every protein that served as a seed for an output cluster, the weight of this cluster was compared to the optimal weight of a cluster containing this protein, as computed using an ILP approach. The plot shows the % of protein seeds (y-axis) as a function of the deviation of the resulting clusters from the optimal attainable weight (x-axis).

(b) A comparison of the running times of the dynamic programming (DP) and ILP approaches employed by Torque.7 The % of protein complexes (queries, y-axis) that were completed in a given time (x-axis) is plotted for the two algorithms. The shift to the left of the ILP curve (red) compared with that of the dynamic programming curve (blue) indicates the ILP formulation tends to be faster than the dynamic programming implementation.

(a)

(b)

From Sharan and Ideker, Modeling cellular machinery through biological network comparison. Nat. Biotechnol. 24, 4 (Apr. 2006), 427–433. 4

Page 5: A multithreaded method for network alignment

Our contribution

Multi-threaded network alignment via a new, multi-threaded approximation algorithm for max-weight bipartite matching procedure with linear complexity

High performance C++ implementations "40-times faster (on 16 cores – Xeon E5-2670)"(C++ ~ 3, complexity ~ 2, threading ~ 8)"www.cs.purdue.edu/~dgleich/codes/netalignmc

5

415 sec

10 sec.

... enabling interactive computation!

Page 6: A multithreaded method for network alignment

Belief Propagation" Use a probabilistic relaxation and iteratively find the probability that an edge is in the matching, given the probabilities of its neighboring edges

Klau’s Matching Relaxation!Iterative improve an upper-bound on the solution via a sub-gradient method applied to the Lagrangian"

… the best methods in a recent survey … Bayati, Gleich, et al. TKDE forthcoming

6

Page 7: A multithreaded method for network alignment

Each iteration involves Matrix-vector-ish computations with a sparse matrix, e.g. sparse matrix vector products in a semi-ring, dot-products, axpy, etc. Bipartite max-weight matching using a different weight vector at each iteration "No “convergence” "100-1000 iterations

Let x[i] be the score for each pair-wise match in L

for i=1 to ...

update x[i] to y[i]

compute a max-weight match with y

update y[i] to x[i] (using match in MR)

7

Page 8: A multithreaded method for network alignment
Page 9: A multithreaded method for network alignment

The methods Each iteration involves! Matrix-vector-ish computations with a sparse matrix, e.g. sparse matrix vector products in a semi-ring, dot-products, axpy, etc. Bipartite max-weight matching using a different weight vector at each iteration

Belief Propagation!!!!!!!!

Listing 2. A belief-propagation message passing procedure for networkalignment. See the text for a description of othermax and round heuristic.

1 y

(0)= 0, z(0) = 0,d(0)

= 0,S(k)= 0

2 for k = 1 to niter

3 F = bound0,� [�S+ S

(k)T] Step 1: compute F

4 d = ↵w + Fe Step 2: compute d5 y

(k)= d� othermaxcol(z

(k�1)) Step 3: othermax

6 z

(k)= d� othermaxrow(y

(k�1))

7 S

(k)= diag(y

(k)+ z

(k) � d)S� F Step 4: update S8 (y

(k), z(k),S(k)) �k

(y

(k), z(k),S(k))+

9 (1� �k)(y

(k�1), z(k�1),S(k�1)) Step 5: damping

10 round heuristic (y(k)) Step 6: matching11 round heuristic (z(k))12 end13 return y

(k) or z(k) with the largest objective value

interpretation, the weight vectors are usually called messagesas they communicate the “beliefs” of each “agent.” In thisparticular problem, the neighborhood of an agent representsall of the other edges in graph L incident on the same vertexin graph A (1st vector), all edges in L incident on the samevertex in graph B (2nd vector), or the edges in L that arepart of an overlap. The message vectors do not generallyconverge, and thus, the iteration is artificially damped toenforce convergence. We only describe one type of damping.See [13] for other variations.

After each update to the messages, we round the messagesto a matching using a bipartite maximum weight matchingprocedure, and then evaluate the objective function.

We present a pseudo-code for the method in Figure 2. Thiscode uses the mildly curious function othermaxrow. Supposethat g is a weight vector on the edges of a bipartite graph L.This means we can index g with the edges of L such thatgi,i0 is the weight on the edge (i, i0) 2 EL. The othermaxrowfunction then computes a new weight for each edge in L:

[othermaxrow(g)]i,i0 = bound

0,1[ max

(i,k0)2EL,k0 6=i0gi,k0

].

This function computes something rather simple. Given a row,replace all non-zeros in that row with the maximum value forthe row; except, for the element that is the maximum value,replace it with the second largest value. The othermaxcolfunction works on columns instead of rows.

C. Stopping Criteria

Both algorithms generate a sequence of heuristic weightvectors whose solution quality varies continually. There is nomonotonicity in the solution quality, which can vary greatlybetween iterations. Thus, no simple stopping criteria is pos-sible. Due to the shrinking step length in Klau’s method andthe artificial damping in BP, there is also no point in runningfor more than 500-1000 iterations with reasonable choices ofthese two parameters.

D. ComplexityThe complexity of each iteration of Klau’s method and

the BP method is O(nnz(S) + |EL| + matching), whereO(matching) is the complexity of the bipartite matching instep 5. Let N = (|VA| + |VB |). Currently, the best knownalgorithm for computing an optimal edge-weighted match-ing has complexity O(|EL|N + N2

logN) [20]. Practicalimplementations have complexity O(|EL|N logN) [21]. Thehalf-approximate matching discussed below has complexityO(|EL|). Thus, when we replace the exact matching step withapproximate matching for our experiments, the complexity ofeach iteration will be O(nnz(S) + |EL|).

IV. PARALLEL NETWORK ALIGNMENT IMPLEMENTATIONS

We now consider a shared-memory multi-core implementa-tion of these procedure with OpenMP. All required memory ispre-allocated before the first iteration and there are no dynamicmemory allocations. We avoid computing intermediate resultswhenever possible, see the online codes for details.

A. Matrix computations in both methodsAll matrices are sparse and are stored as compressed

sparse row arrays. All non-zero patterns and structures remainfixed throughout iterations. We found using simple OpenMP“parallel for” loops faster than using a matrix library suchas Intel’s Math Kernel Library. This result is due to thesimplicity of the matrix computations. For instance, becauseS and U are structurally symmetric with the same structure,the transposes have the same row pointer and the columnindex arrays. But the value array is permuted. So we computethe permutation and whenever we need to transpose one ofthese matrices, we just permute the values array according tothe permutation. Since these matrices do not change structureduring the algorithm, we can compute the permutation once.Sometimes – such as line 5 of Klau’s method or line 3 of BP– we simply use the permutation array to pull elements fromappropriate memory locations without any intermediate write.

The matrix S can be highly imbalanced (some rows areempty and others have many non-zeros) and so we foundthat using a dynamic schedule in OpenMP’s “parallel for”construction yielded better performance than a static schedule.After some experimentation, we found that a chunk-size of1000 seemed to produce the best performance for theseoperations. Indeed, we found this observation to be the casefor all operations involving the matrix S. Synchronization onlyoccurs at the end of each “parallel for” loop.

B. Specifics about Klau’s methodIn the first step of the iteration, we need to solve a bipartite

matching problem for each row of the matrix S with weightsthat change based on U

(k). We compute �2S+U

(k)�U

(k)T

using the permutation trick, and then we parallelize the op-eration over rows. Each of these matching problems is smallbecause there are only a few non-zeros in each row of S andso we do not consider using the parallel approximation here.We precompute the maximum memory required for p threads

Step 6: matching

9

Page 10: A multithreaded method for network alignment

The NEW methods Each iteration involves! Matrix-vector-ish computations with a sparse matrix, e.g. sparse matrix vector products in a semi-ring, dot-products, axpy, etc. Approximate bipartite max-weight matching is used here instead!

Belief Propagation!!!!!!!!

Listing 2. A belief-propagation message passing procedure for networkalignment. See the text for a description of othermax and round heuristic.

1 y

(0)= 0, z(0) = 0,d(0)

= 0,S(k)= 0

2 for k = 1 to niter

3 F = bound0,� [�S+ S

(k)T] Step 1: compute F

4 d = ↵w + Fe Step 2: compute d5 y

(k)= d� othermaxcol(z

(k�1)) Step 3: othermax

6 z

(k)= d� othermaxrow(y

(k�1))

7 S

(k)= diag(y

(k)+ z

(k) � d)S� F Step 4: update S8 (y

(k), z(k),S(k)) �k

(y

(k), z(k),S(k))+

9 (1� �k)(y

(k�1), z(k�1),S(k�1)) Step 5: damping

10 round heuristic (y(k)) Step 6: matching11 round heuristic (z(k))12 end13 return y

(k) or z(k) with the largest objective value

interpretation, the weight vectors are usually called messagesas they communicate the “beliefs” of each “agent.” In thisparticular problem, the neighborhood of an agent representsall of the other edges in graph L incident on the same vertexin graph A (1st vector), all edges in L incident on the samevertex in graph B (2nd vector), or the edges in L that arepart of an overlap. The message vectors do not generallyconverge, and thus, the iteration is artificially damped toenforce convergence. We only describe one type of damping.See [13] for other variations.

After each update to the messages, we round the messagesto a matching using a bipartite maximum weight matchingprocedure, and then evaluate the objective function.

We present a pseudo-code for the method in Figure 2. Thiscode uses the mildly curious function othermaxrow. Supposethat g is a weight vector on the edges of a bipartite graph L.This means we can index g with the edges of L such thatgi,i0 is the weight on the edge (i, i0) 2 EL. The othermaxrowfunction then computes a new weight for each edge in L:

[othermaxrow(g)]i,i0 = bound

0,1[ max

(i,k0)2EL,k0 6=i0gi,k0

].

This function computes something rather simple. Given a row,replace all non-zeros in that row with the maximum value forthe row; except, for the element that is the maximum value,replace it with the second largest value. The othermaxcolfunction works on columns instead of rows.

C. Stopping Criteria

Both algorithms generate a sequence of heuristic weightvectors whose solution quality varies continually. There is nomonotonicity in the solution quality, which can vary greatlybetween iterations. Thus, no simple stopping criteria is pos-sible. Due to the shrinking step length in Klau’s method andthe artificial damping in BP, there is also no point in runningfor more than 500-1000 iterations with reasonable choices ofthese two parameters.

D. ComplexityThe complexity of each iteration of Klau’s method and

the BP method is O(nnz(S) + |EL| + matching), whereO(matching) is the complexity of the bipartite matching instep 5. Let N = (|VA| + |VB |). Currently, the best knownalgorithm for computing an optimal edge-weighted match-ing has complexity O(|EL|N + N2

logN) [20]. Practicalimplementations have complexity O(|EL|N logN) [21]. Thehalf-approximate matching discussed below has complexityO(|EL|). Thus, when we replace the exact matching step withapproximate matching for our experiments, the complexity ofeach iteration will be O(nnz(S) + |EL|).

IV. PARALLEL NETWORK ALIGNMENT IMPLEMENTATIONS

We now consider a shared-memory multi-core implementa-tion of these procedure with OpenMP. All required memory ispre-allocated before the first iteration and there are no dynamicmemory allocations. We avoid computing intermediate resultswhenever possible, see the online codes for details.

A. Matrix computations in both methodsAll matrices are sparse and are stored as compressed

sparse row arrays. All non-zero patterns and structures remainfixed throughout iterations. We found using simple OpenMP“parallel for” loops faster than using a matrix library suchas Intel’s Math Kernel Library. This result is due to thesimplicity of the matrix computations. For instance, becauseS and U are structurally symmetric with the same structure,the transposes have the same row pointer and the columnindex arrays. But the value array is permuted. So we computethe permutation and whenever we need to transpose one ofthese matrices, we just permute the values array according tothe permutation. Since these matrices do not change structureduring the algorithm, we can compute the permutation once.Sometimes – such as line 5 of Klau’s method or line 3 of BP– we simply use the permutation array to pull elements fromappropriate memory locations without any intermediate write.

The matrix S can be highly imbalanced (some rows areempty and others have many non-zeros) and so we foundthat using a dynamic schedule in OpenMP’s “parallel for”construction yielded better performance than a static schedule.After some experimentation, we found that a chunk-size of1000 seemed to produce the best performance for theseoperations. Indeed, we found this observation to be the casefor all operations involving the matrix S. Synchronization onlyoccurs at the end of each “parallel for” loop.

B. Specifics about Klau’s methodIn the first step of the iteration, we need to solve a bipartite

matching problem for each row of the matrix S with weightsthat change based on U

(k). We compute �2S+U

(k)�U

(k)T

using the permutation trick, and then we parallelize the op-eration over rows. Each of these matching problems is smallbecause there are only a few non-zeros in each row of S andso we do not consider using the parallel approximation here.We precompute the maximum memory required for p threads

Step 6"approx matching

10

Parallel

Page 11: A multithreaded method for network alignment

Approximation doesn’t hurt the belief propagation algorithm

problem from Klau [7] (homo-musm). The first is an alignmentbetween protein interactions in a fly (D. melanogaster) andyeast (S. cerevisiae). The second is an alignment betweenhumans (H. sapiens) and mice (M. musculus). We utilizethese problems solely for the instances of a network alignmentproblem and do not focus on the biological insights suggested.The graph L and associated weights are from the originalpapers.

C. Ontology alignmentWe consider two problems in ontology alignment from [13].

The first is an alignment between the Library of Congresssubject headings and Wikipedia categories (lcsh-wiki). Whileboth ontologies have a core hierarchical tree, they also havemany cross edges for other types of relationships. Thus wecan think of them as general graphs. The second problem is analignment between the Library of Congress subject headingsand its counterpart in the French National Library: Rameau.In both cases, the edges and weights in L are computed via atext-matching of the subject heading strings (and via translatedstrings in the case of Rameau). These problems are larger thanthe bioinformatics ones.

VII. NETWORK ALIGNMENT WITH APPROXIMATEMATCHING

In this section we address the question: how does the be-havior of Klau’s method and the BP method change when wesubstitute the approximate matching procedure from Section Vfor the bipartite matching step in each algorithm? Note thatwe always use exact matching in the first step of Klau’smethod (Step 1: row match) because the problems in eachrow tend to be small and we parallelize over rows. Note alsothat the bipartite matching is much more integral to Klau’smethod than the BP procedure. For the BP procedure, weonly solve a bipartite matching problem to evaluate the qualityof an iterate, whereas in Klau’s method, the results of thematching determine the update to the Lagrange multipliers U.Put another way, the set of iterates from the BP method isindependent of the choice of matching algorithm. At the endof the iteration, each of the methods returns the best heuristicit computed, and we perform one final step of exact maximumweight matching to convert this into the returned matching.

We begin by evaluating the solution quality on syntheticpower-law problems. We use ↵ = 1,� = 2 and 1000 iterationsof each method. We evaluate each solution by comparisonwith the identity alignment. Note that the identity alignment– which assigns each vertex in graph A to its mirror imagein graph B based on the original graph G – may not be theoptimal alignment because the perturbations to the graph couldintroduce a better solution. This seems to occur because wecompute objective values larger than the identity alignment for¯d > 10. We also study how many of the correct matches eachmethod generates with respect to the identity matching. Theresults are shown in Figure 2. In the top plot, we show thefraction of the objective from the identity matching achieved(y-axis) as the expected degree ¯d of random edges in L

0 5 10 15 200

0.2

0.4

0.6

0.8

1

rou

nd

ed

ob

ject

ive

va

lue

s

expected degree of noise in L (p ! n)

MR-upperMRApproxMRBPApproxBP

0 5 10 15 200

0.2

0.4

0.6

0.8

1

rou

nd

ed

ob

ject

ive

va

lue

s

expected degree of noise in L (p ! n)

0 5 10 15 200

0.2

0.4

0.6

0.8

1

fra

ctio

n c

orr

ect

expected degree of noise in L (p ! n)

MRApproxMRBPApproxBP

0 5 10 15 200

0.2

0.4

0.6

0.8

1

fra

ctio

n c

orr

ect

expected degree of noise in L (p ! n)

Fig. 2. Alignment with a power-law graph shows the large effect thatapproximate rounding can have on solutions from Klau’s method (MR). Withthat method, using exact rounding will yield the identity matching for allproblems (bottom figure), whereas using the approximation results in over a50% error rate. The results from the BP method with and without approximatematching are indistinguishable. Small differences were randomly added toshow both lines in the figure.

varies from 2 to 20 (x-axis). In the bottom plot, we showthe fraction of correct matches (y-axis), again as the expecteddegree ¯d varies. Problems with more random edges are morechallenging. The figures demonstrate that Klau’s method issensitive to using an approximate matching routine, whereasthe BP method with exact and approximate matching arenearly indistinguishable.

We also evaluate the how the matching weight (wTx,

plotted on the x-axis) and overlap (xTSx/2, plotted on the

y-axis) change for a bioinformatics problem (dmela-scere)and an ontology problem (lcsh-wiki) in the upper and lowerplots in Figure 3. Again, the BP results with and withoutapproximate matching are virtually indistinguishable. Klau’smethod, however, produces results that are considerably worse.

Randomly perturb one power-law graph to get A, B Generate L by the true-match + random edges

BP and ApproxBP are indistinguishable

The amount of random-ness in L in average expected degree

Frac

tion

of c

orre

ct m

atch

11

Page 12: A multithreaded method for network alignment

The methods (in more detail) Belief Propagation" for i=1 to ...

update x[i] to y[i]

compute a max-weight match with y

save y if it is the best result so far

Klau’s Matching Relaxation!for i=1 to ...

update x[i] to y[i]

compute a max-weight match with y

update y[i] to x[i] based on the match

The matching is incidental to the BP method, but integral to Klau’s MR method

12

Page 13: A multithreaded method for network alignment

On real-world problems, ApproxMR isn’t so different

0 100 200 300 400 500 6000

50

100

150

200

250

300

350375400

Weight

Ove

rlap

max weight671.551

overlap upper bound381

BPAppBPAppMRMR

0 100 200 300 400 500 6000

50

100

150

200

250

300

350375400

Weight

Ove

rlap

max weight671.551

overlap upper bound381

On a protein-protein align-ment problem, there is little difference with exact vs. approximate matching

13

Upper bound on overlap

Upper bound on matching

Page 14: A multithreaded method for network alignment

Algorithmic analysis

Exact runtime matrix + matching with "matrix ≪ matching O(|EL| + |S|) + O(|EL| N log N) Our approx. runtime!matrix + approx. matching!O(|EL| + |S|) + O(|EL|)

v

r

t

s

w

u

A L B

Algorithmic parameters |EL| number of edges in L |S| number of potential overlaps

14

Page 15: A multithreaded method for network alignment

A local dominating edge method for bipartite matching

i

r

t

s

j

uwtu

A L B

The method guarantees •  ½ approximation •  maximal matching based on work by Preis (1999), Manne and Bisseling (2008), and Halappanavar et al (2012)

A locally dominating edge is an edge heavier than all neighboring edges. For bipartite Work on smaller side only

15

Page 16: A multithreaded method for network alignment

A local dominating edge method for bipartite matching

i

r

t

s

j

uwtu

A L B

Queue all vertices

Until queue is empty!In Parallel over vertices!

Match to heavy edge and if there’s a conflict, check the winner, and find an alternative for the loser

Add endpoint of non-dominating edges to the queue

16

A locally dominating edge is an edge heavier than all neighboring edges. For bipartite Work on smaller side only

Page 17: A multithreaded method for network alignment

A local dominating edge method for bipartite matching

i

r

t

s

j

uwtu

A L B

Customized first iteration (with all vertices) Use OpenMP locks to update choices Use sync_and_fetch_add for queue updates.

17

A locally dominating edge is an edge heavier than all neighboring edges. For bipartite Work on smaller side only

Page 18: A multithreaded method for network alignment

Remaining multi-threading procedures are straightforward Standard OpenMP for matrix-computations" use schedule=dynamic to handle skew We can batch the matching procedures in the BP method for additional parallelism for i=1 to ...

update x[i] to y[i] save y[i] in a buffer when the buffer is full compute max-weight match for all in buffer and save the best

18

Page 19: A multithreaded method for network alignment

Real-world data sets parallel time complexity of our implementation is determinedby the number of iterations of the while loop (expected tobe O(log |V |) if the size decreases by a constant in eachiteration).

Algorithm 1 Parallel 12 -Approx Matching. Input: graph G =

(V,E). Output: A matching M represented in vector mate.Data structures: a queue, QC , with vertices to be processedin the current step, and a queue, QN , with vertices to beprocessed in the next step – only matched vertices are queued;a vector candidate that maintains the id of current heaviestneighbor of a vertex.

1: procedure PARALLELMATCH(G(V, E), mate)2: QC ;; QN ;3: mate ;4: for each v 2 V in parallel do . Phase-15: candidate[v] FINDMATE(v)

6: for each v 2 V in parallel do7: MATCHVERTEX(v, QC)

8:9: while QC 6= ; do . Phase-2

10: for each u 2 QC in parallel do11: for each v 2 adj (u) do12: if candidate[v] = u then13: candidate[v] FINDMATE(v, QN )14: MATCHVERTEX(v, QN )

15: QC QN

16: QN ;

Algorithm 2 Find and return the current-heaviest candidatefor a vertex.

1: procedure FINDMATE(s)2: max wt �13: max wt id ;4: for each t 2 adj (s) do5: if (mate[t] = ;) AND (max wt < w(es,t)) then6: . Use vertex id to break ties7: max wt w(es,t)8: max wt id t

9: return (max wt id)

Algorithm 3 Match an edge if locally dominant. Add the twoend-points to Q.

1: procedure MATCHVERTEX(s, Q)2: if candidate[candidate[s]] = s then3: . Found a locally-dominant edge4: mate[s] candidate[s]5: mate[candidate[s]] s

6: Q Q [ {s, candidate[s]}7: . Add both of the end-points; atomic updates to Q

The performance of matching algorithms critically dependson the initialization [25], [26]. The approximate weightedmatching algorithm we are using was developed for non-bipartite graphs, and we have not adapted it to exploit thebipartiteness of the graph L. Hence the approximate matchingalgorithm spawns threads from both vertex sets VA and VB in

TABLE IIFOR EACH PROBLEM IN OUR BIOINFORMATICS AND ONTOLOGY SETS, WEREPORT THE NUMBER OF VERTICES IN GRAPH A AND B, THE NUMBER OF

EDGES IN THE GRAPH L, AND THE NUMBER OF NONZEROS IN S.

Problem |VA| |VB | |EL| |S|

dmela-scere 9,459 5,696 34,582 6,860homo-musm 3,247 9,695 15,810 12,180

lcsh-wiki 297,266 205,948 4,971,629 1,785,310lcsh-rameau 154,974 342,684 20,883,500 4,929,272

order to match vertices. We experimented with an initializationalgorithm tailored for bipartite graphs by spawning threadsonly from one of the vertex sets VA or VB to identify locallydominant edges. If the thread is responsible for matching avertex in VA, then it has to check the adjacency sets of thevertices in VB that are adjacent to it in order to determine if theedge is locally dominant. The vertices in VA (or VB) can bematched in parallel by using __sync_fetch_and_add()operations to avoid conflicts. We found that this initializationnoticably improved the speed of the algorithm. In future work,we will investigate approximation algorithms for weightedmatching in bipartite graphs.

VI. SAMPLE PROBLEMS

In order to evaluate the solution quality and parallel perfor-mance of network alignment with approximate matching, weconsider three types of problems: synthetic, protein-proteinnetwork alignment, and ontology alignment. See Table IIfor summary statistics about the bioinformatics and ontologyalignment problems. For all problems, the degree distributionin L is fairly regular, whereas the non-zero distribution in S

is highly irregular and imbalanced.

A. Synthetic power-law problemsWe use synthetic problems similar to those described

in [13]. Each problem instance is small, and it is primarilyused to evaluate the solution quality of the network alignmentheuristics. We first generate an initial graph G and thenrandomly add edges with probability 0.02 to form the graphsA and B. We let G be a 400 node random power-law graphto approximate the structure of most modern informationnetworks [27]. To produce G, we first sampled a power-law degree distribution and then generated a random graphwith that prescribed degree distribution. Because we startedwith the same underlying graph G, we known the identitymatching between the graphs and utilize this matching asa reference point. To generate L, we start with the identitymatching and then randomly sample all possible edges in Lwith probability p in order to globally diversify the matches.We find it more meaningful to describe the probability p interms of the expected degree in the graph: ¯d = p · |VA|.

B. BioinformaticsAligning between protein-protein interaction networks from

different species is an important problem in bioinformatics [5].We use a problem from Singh et al. [5] (dmela-scere) and a

Our approx. runtime matrix + approx. matching O(|EL| + |S|) + O(|EL|)

Algorithmic parameters |EL| number of edges in L |S| number of potential overlaps

19

Page 20: A multithreaded method for network alignment

Performance evaluation (2x4)-10 core Intel E7-8870, 2.4 GHz (80-cores) 16 GB memory/proc (128 GB) Scaling study 1.  Thread binding "

scattered vs. compact 2.  Memory binding "

interleaved vs. bind

20

CPU

Mem

CPU

Mem

CPU

Mem

CPU

Mem

CPU

Mem

CPU

Mem

CPU

Mem

CPU

Mem

Page 21: A multithreaded method for network alignment

Scaling

21

0 20 40 60 800

5

10

15

20

25

Threads

Spee

dup

scatter and interleave

0 20 40 60 800

5

10

15

20

25

Threads

Spee

dup

BP with no batching lcsh-rameau, 400 iterations

1450 seconds for 1-thread

115 seconds for 40-thread

Page 22: A multithreaded method for network alignment

Scaling

22

0 20 40 60 800

5

10

15

20

25

Threads

Spee

dup

compact and interleavecompact and membindscatter and interleavescatter and membind

0 20 40 60 800

5

10

15

20

25

Threads

Spee

dup

BP with no batching

Page 23: A multithreaded method for network alignment

0 20 40 60 800

5

10

15

20

25

Threads

Spee

dup

compact and interleavecompact and membindscatter and interleavescatter and membind

0 20 40 60 800

5

10

15

20

25

Threads

Spee

dup

Scaling

0 20 40 60 800

5

10

15

20

25

Threads

Spee

dup

compact and interleavecompact and membindscatter and interleavescatter and membind

0 20 40 60 800

5

10

15

20

25

Threads

Spee

dup

0 20 40 60 800

5

10

15

20

25

Threads

Spee

dup

compact and interleavecompact and membindscatter and interleavescatter and membind

0 20 40 60 800

5

10

15

20

25

Threads

Spee

dup

Klau’s MR method

BP with no batching

BP with batch=20

In all cases, we get a speedup of around 12-15 on 40-cores with scatter threads and interleaved memory 23

Page 24: A multithreaded method for network alignment

Summary & Conclusions •  Tailored algorithm for approx. max-weight bipartite matching •  Algorithmic improvement in network alignment methods •  Multi-threaded C++ code for network alignment

415 seconds -> 10 seconds (40-times overall speedup) For large problems, interactive network alignment is possible Future work Memory control, improved methods

24

Code and data available!www.cs.purdue.edu/~dgleich/codes/netalignmc

Work supported by DOE CSCAPES Institute grant (DE-FC02-08ER25864), NSF CAREER grant 1149756-CCF, and the Center for Adaptive Super Computing Software Multithreaded Architectures (CASS-MT) at PNNL. PNNL is operated by Battelle Memorial Institute under contract DE-AC06-76RL01830