A multithreaded method for network alignment
DESCRIPTION
My talk from SC about network alignment.
TRANSCRIPT
A multithreaded algorithm for network alignment
David F. Gleich, Computer Science, Purdue University
with
Arif Khan and Alex Pothen, Computer Science, Purdue University
Mahantesh Halappanavar, Pacific Northwest National Laboratory
[Figure: example graphs A and B with candidate match edges L between them; a matched pair of edges forms an overlap.]
Work supported by DOE CSCAPES Institute grant (DE-FC02-08ER25864), NSF CAREER grant 1149756-CCF, and the Center for Adaptive Supercomputing Software Multithreaded Architectures (CASS-MT) at PNNL. PNNL is operated by Battelle Memorial Institute under contract DE-AC06-76RL01830.
Network alignment: What is the best way of matching graph A to B using only edges in L?
Find a 1-1 matching between vertices with as many overlaps as possible.
objective = α · matching + β · overlap
• Computer vision
• Ontology matching
• Database matching
• Bioinformatics
Network alignment
… is NP-hard
… has no approximation algorithm
Review articles, Communications of the ACM, May 2012, Vol. 55, No. 5, p. 91
subgraph under some mapping of the proteins between the two species) or inexact, allowing unmatched nodes on either subnetwork. This problem was first studied by Kelley et al.17 in the context of local network alignment; its later development accompanied the growth in the number of mapped organisms.5,7,9,33 The third problem that has been considered is global network alignment (Figure 1c), where one wishes to align whole networks, one against the other.4,34 In its simplest form, the problem calls for identifying a 1-1 mapping between the proteins of two species so as to optimize some conservation criterion, such as the number of conserved interactions between the two networks.
All these problems are NP-hard as they generalize graph and subgraph isomorphism problems. However, heuristic, parameterized, and ILP approaches for solving them have worked remarkably well in practice. Here, we review these approaches and demonstrate their good performance in practice both in terms of solution quality and running time.
Heuristic Approaches
As in other applied fields, many problems in network biology are amenable to heuristic approaches that perform well in practice. Here, we highlight two such methods: a local search heuristic for local network alignment and an eigenvector-based heuristic for global network alignment.
NetworkBLAST32 is an algorithm for local network alignment that aims to identify significant subnetwork matches across two or more networks. It searches for conserved paths and conserved dense clusters of interactions; we focus on the latter in our description. To facilitate the detection of conserved subnetworks, NetworkBLAST first forms a network alignment graph,17,23 in which nodes correspond to pairs of sequence-similar proteins, one from each species, and edges correspond to conserved interactions (see Figure 2). The definition of the latter is flexible and allows, for instance, a direct interaction between the proteins of one species versus an indirect interaction (via a common network neighbor) in the other species. Any subnetwork of the alignment graph naturally corresponds …
Figure 2. The NetworkBLAST local network alignment algorithm. Given two input networks, a network alignment graph is constructed. Nodes in this graph correspond to pairs of sequence-similar proteins, one from each species, and edges correspond to conserved interactions. A search algorithm identifies highly similar subnetworks that follow a prespecified interaction pattern. Adapted from Sharan and Ideker.30
Figure 3. Performance comparison of computational approaches.
(a) An evaluation of the quality of NetworkBLAST’s output clusters. NetworkBLAST was applied to a yeast network from Yu et al.39 For every protein that served as a seed for an output cluster, the weight of this cluster was compared to the optimal weight of a cluster containing this protein, as computed using an ILP approach. The plot shows the % of protein seeds (y-axis) as a function of the deviation of the resulting clusters from the optimal attainable weight (x-axis).
(b) A comparison of the running times of the dynamic programming (DP) and ILP approaches employed by Torque.7 The % of protein complexes (queries, y-axis) that were completed in a given time (x-axis) is plotted for the two algorithms. The shift to the left of the ILP curve (red) compared with that of the dynamic programming curve (blue) indicates the ILP formulation tends to be faster than the dynamic programming implementation.
From Sharan and Ideker, Modeling cellular machinery through biological network comparison. Nat. Biotechnol. 24, 4 (Apr. 2006), 427–433.
Our contribution
Multithreaded network alignment via a new multithreaded approximation algorithm for max-weight bipartite matching with linear complexity.
High-performance C++ implementations: 40 times faster (on 16 cores, Xeon E5-2670); speedup breakdown: C++ ≈ 3×, complexity ≈ 2×, threading ≈ 8×.
www.cs.purdue.edu/~dgleich/codes/netalignmc
415 sec → 10 sec
… enabling interactive computation!
Belief Propagation: Use a probabilistic relaxation and iteratively find the probability that an edge is in the matching, given the probabilities of its neighboring edges.
Klau's Matching Relaxation: Iteratively improve an upper bound on the solution via a subgradient method applied to the Lagrangian.
… the best methods in a recent survey … Bayati, Gleich, et al., TKDE, forthcoming
Each iteration involves:
• Matrix-vector-ish computations with a sparse matrix, e.g. sparse matrix-vector products in a semiring, dot products, axpy, etc.
• Bipartite max-weight matching using a different weight vector at each iteration
No "convergence"; 100-1000 iterations.
Let x[i] be the score for each pair-wise match in L
for i = 1 to ...
    update x[i] to y[i]
    compute a max-weight match with y
    update y[i] to x[i] (using the match, in MR)
The methods
Each iteration involves:
• Matrix-vector-ish computations with a sparse matrix (sparse matrix-vector products in a semiring, dot products, axpy, etc.)
• Bipartite max-weight matching using a different weight vector at each iteration
Belief Propagation
Listing 2. A belief-propagation message-passing procedure for network alignment. See the text for a description of othermax and the round heuristic.

1   y(0) = 0, z(0) = 0, d(0) = 0, S(0) = 0
2   for k = 1 to niter
3       F = bound_{0,β}[ β S + S(k−1)ᵀ ]              Step 1: compute F
4       d = α w + F e                                 Step 2: compute d
5       y(k) = d − othermaxcol(z(k−1))                Step 3: othermax
6       z(k) = d − othermaxrow(y(k−1))
7       S(k) = diag(y(k) + z(k) − d) S − F            Step 4: update S
8       (y(k), z(k), S(k)) ← γ_k (y(k), z(k), S(k)) +
9           (1 − γ_k) (y(k−1), z(k−1), S(k−1))        Step 5: damping
10      round_heuristic(y(k))                         Step 6: matching
11      round_heuristic(z(k))
12  end
13  return y(k) or z(k) with the largest objective value
interpretation, the weight vectors are usually called messages as they communicate the "beliefs" of each "agent." In this particular problem, the neighborhood of an agent represents all of the other edges in graph L incident on the same vertex in graph A (1st vector), all edges in L incident on the same vertex in graph B (2nd vector), or the edges in L that are part of an overlap. The message vectors do not generally converge, and thus, the iteration is artificially damped to enforce convergence. We only describe one type of damping. See [13] for other variations.

After each update to the messages, we round the messages to a matching using a bipartite maximum weight matching procedure, and then evaluate the objective function.

We present a pseudo-code for the method in Figure 2. This code uses the mildly curious function othermaxrow. Suppose that g is a weight vector on the edges of a bipartite graph L. This means we can index g with the edges of L such that g_{i,i′} is the weight on the edge (i, i′) ∈ E_L. The othermaxrow function then computes a new weight for each edge in L:

    [othermaxrow(g)]_{i,i′} = bound_{0,∞}[ max_{(i,k′) ∈ E_L, k′ ≠ i′} g_{i,k′} ].

This function computes something rather simple. Given a row, replace all non-zeros in that row with the maximum value for the row; except, for the element that is the maximum value, replace it with the second largest value. The othermaxcol function works on columns instead of rows.
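As a concrete illustration, here is a minimal sketch of the othermaxrow rule applied to the values of one sparse row. This is not the paper's implementation; the function name and the use of a plain vector for one row's non-zeros are assumptions for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch of othermaxrow on a single row's non-zero values: each entry is
// replaced by the largest *other* value in the row (so the maximum entry
// receives the second-largest value), clipped below at 0 per bound_{0,inf}.
std::vector<double> othermax_row(const std::vector<double>& vals) {
    double max1 = -1e300, max2 = -1e300;  // largest and second-largest
    for (double v : vals) {
        if (v > max1) { max2 = max1; max1 = v; }
        else if (v > max2) { max2 = v; }
    }
    std::vector<double> out(vals.size());
    for (std::size_t j = 0; j < vals.size(); ++j) {
        double m = (vals[j] == max1) ? max2 : max1;  // exclude self if it is the max
        out[j] = std::max(0.0, m);
    }
    return out;
}
```

A full implementation would apply this row by row over the CSR arrays of the weight vector on L.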
C. Stopping Criteria
Both algorithms generate a sequence of heuristic weight vectors whose solution quality varies continually. There is no monotonicity in the solution quality, which can vary greatly between iterations. Thus, no simple stopping criterion is possible. Due to the shrinking step length in Klau's method and the artificial damping in BP, there is also no point in running for more than 500-1000 iterations with reasonable choices of these two parameters.
D. Complexity
The complexity of each iteration of Klau's method and the BP method is O(nnz(S) + |E_L| + matching), where O(matching) is the complexity of the bipartite matching in step 5. Let N = |V_A| + |V_B|. Currently, the best known algorithm for computing an optimal edge-weighted matching has complexity O(|E_L| N + N² log N) [20]. Practical implementations have complexity O(|E_L| N log N) [21]. The half-approximate matching discussed below has complexity O(|E_L|). Thus, when we replace the exact matching step with approximate matching for our experiments, the complexity of each iteration will be O(nnz(S) + |E_L|).
IV. PARALLEL NETWORK ALIGNMENT IMPLEMENTATIONS
We now consider a shared-memory multi-core implementation of these procedures with OpenMP. All required memory is pre-allocated before the first iteration, and there are no dynamic memory allocations. We avoid computing intermediate results whenever possible; see the online codes for details.
A. Matrix computations in both methods
All matrices are sparse and are stored as compressed sparse row arrays. All non-zero patterns and structures remain fixed throughout iterations. We found using simple OpenMP "parallel for" loops faster than using a matrix library such as Intel's Math Kernel Library. This result is due to the simplicity of the matrix computations. For instance, because S and U are structurally symmetric with the same structure, the transposes have the same row pointer and column index arrays, but the value array is permuted. So we compute the permutation, and whenever we need to transpose one of these matrices, we just permute the values array according to the permutation. Since these matrices do not change structure during the algorithm, we can compute the permutation once. Sometimes (such as line 5 of Klau's method or line 3 of BP) we simply use the permutation array to pull elements from appropriate memory locations without any intermediate write.
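The transpose-by-permutation trick can be sketched as follows. This is a minimal illustration, not the released code; the `Csr` struct and function names are assumptions for the example. Because the non-zero pattern is symmetric, A and Aᵀ share the same row pointers and column indices, so the permutation only needs to be computed once and the transpose reduces to a gather over the value array.

```cpp
#include <cassert>
#include <vector>

struct Csr {
    std::vector<int> ptr, col;   // row pointers and column indices
    std::vector<double> val;     // non-zero values
};

// Precompute perm such that the k-th value slot of A^T should be filled from
// value slot perm[k] of A. Valid because the non-zero pattern is symmetric,
// so A and A^T have identical ptr and col arrays.
std::vector<int> transpose_perm(const Csr& a) {
    int n = static_cast<int>(a.ptr.size()) - 1;
    std::vector<int> next(a.ptr.begin(), a.ptr.end() - 1);  // write cursor per row
    std::vector<int> perm(a.col.size());
    for (int i = 0; i < n; ++i)
        for (int k = a.ptr[i]; k < a.ptr[i + 1]; ++k)
            perm[next[a.col[k]]++] = k;  // entry (i, col[k]) lands in row col[k] of A^T
    return perm;
}

// Transposing is then just a permutation of the value array.
std::vector<double> transpose_vals(const Csr& a, const std::vector<int>& perm) {
    std::vector<double> out(a.val.size());
    for (std::size_t k = 0; k < out.size(); ++k) out[k] = a.val[perm[k]];
    return out;
}
```

This matches the description above: no intermediate write is needed if downstream code reads `a.val[perm[k]]` directly.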
The matrix S can be highly imbalanced (some rows are empty and others have many non-zeros), and so we found that using a dynamic schedule in OpenMP's "parallel for" construction yielded better performance than a static schedule. After some experimentation, we found that a chunk size of 1000 seemed to produce the best performance for these operations. Indeed, we found this observation to be the case for all operations involving the matrix S. Synchronization only occurs at the end of each "parallel for" loop.
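A minimal sketch of the scheduling choice described above, using a row-wise operation (here, row sums, a stand-in kernel chosen for illustration) over a skewed CSR matrix. The pragma is ignored when OpenMP is disabled, so the code also runs serially.

```cpp
#include <cassert>
#include <vector>

// Row sums of a CSR matrix with a dynamic schedule and chunk size 1000,
// as the text describes for skewed operations on S. Each row is written by
// exactly one thread; the only synchronization is the implicit barrier at
// the end of the parallel-for loop.
std::vector<double> row_sums(const std::vector<int>& ptr,
                             const std::vector<double>& val) {
    int n = static_cast<int>(ptr.size()) - 1;
    std::vector<double> sums(n, 0.0);
    #pragma omp parallel for schedule(dynamic, 1000)
    for (int i = 0; i < n; ++i) {
        double s = 0.0;
        for (int k = ptr[i]; k < ptr[i + 1]; ++k) s += val[k];
        sums[i] = s;
    }
    return sums;
}
```

With a static schedule, a thread assigned a run of dense rows would straggle; `schedule(dynamic, 1000)` hands out 1000-row chunks on demand to balance the load.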
B. Specifics about Klau's method
In the first step of the iteration, we need to solve a bipartite matching problem for each row of the matrix S with weights that change based on U(k). We compute (β/2) S + U(k) − U(k)ᵀ using the permutation trick, and then we parallelize the operation over rows. Each of these matching problems is small because there are only a few non-zeros in each row of S, and so we do not consider using the parallel approximation here. We precompute the maximum memory required for p threads
Step 6: matching
The NEW methods
Each iteration involves:
• Matrix-vector-ish computations with a sparse matrix (sparse matrix-vector products in a semiring, dot products, axpy, etc.)
• Approximate bipartite max-weight matching is used here instead!
Belief Propagation
Step 6: approx matching
Parallel
Approximation doesn’t hurt the belief propagation algorithm
problem from Klau [7] (homo-musm). The first is an alignment between protein interactions in a fly (D. melanogaster) and yeast (S. cerevisiae). The second is an alignment between humans (H. sapiens) and mice (M. musculus). We utilize these problems solely for the instances of a network alignment problem and do not focus on the biological insights suggested. The graph L and associated weights are from the original papers.
C. Ontology alignment
We consider two problems in ontology alignment from [13]. The first is an alignment between the Library of Congress subject headings and Wikipedia categories (lcsh-wiki). While both ontologies have a core hierarchical tree, they also have many cross edges for other types of relationships. Thus we can think of them as general graphs. The second problem is an alignment between the Library of Congress subject headings and its counterpart in the French National Library: Rameau. In both cases, the edges and weights in L are computed via a text-matching of the subject heading strings (and via translated strings in the case of Rameau). These problems are larger than the bioinformatics ones.
VII. NETWORK ALIGNMENT WITH APPROXIMATE MATCHING
In this section we address the question: how does the behavior of Klau's method and the BP method change when we substitute the approximate matching procedure from Section V for the bipartite matching step in each algorithm? Note that we always use exact matching in the first step of Klau's method (Step 1: row match) because the problems in each row tend to be small and we parallelize over rows. Note also that the bipartite matching is much more integral to Klau's method than the BP procedure. For the BP procedure, we only solve a bipartite matching problem to evaluate the quality of an iterate, whereas in Klau's method, the results of the matching determine the update to the Lagrange multipliers U. Put another way, the set of iterates from the BP method is independent of the choice of matching algorithm. At the end of the iteration, each of the methods returns the best heuristic it computed, and we perform one final step of exact maximum weight matching to convert this into the returned matching.
We begin by evaluating the solution quality on synthetic power-law problems. We use α = 1, β = 2 and 1000 iterations of each method. We evaluate each solution by comparison with the identity alignment. Note that the identity alignment (which assigns each vertex in graph A to its mirror image in graph B based on the original graph G) may not be the optimal alignment because the perturbations to the graph could introduce a better solution. This seems to occur because we compute objective values larger than the identity alignment for d̄ > 10. We also study how many of the correct matches each method generates with respect to the identity matching. The results are shown in Figure 2. In the top plot, we show the fraction of the objective from the identity matching achieved (y-axis) as the expected degree d̄ of random edges in L varies from 2 to 20 (x-axis). In the bottom plot, we show the fraction of correct matches (y-axis), again as the expected degree d̄ varies. Problems with more random edges are more challenging. The figures demonstrate that Klau's method is sensitive to using an approximate matching routine, whereas the BP method with exact and approximate matching are nearly indistinguishable.

[Figure 2: rounded objective values and fraction of correct matches (y-axes) vs. expected degree of noise in L, p · n (x-axis), for MR-upper, MR, ApproxMR, BP, and ApproxBP.]

Fig. 2. Alignment with a power-law graph shows the large effect that approximate rounding can have on solutions from Klau's method (MR). With that method, using exact rounding will yield the identity matching for all problems (bottom figure), whereas using the approximation results in over a 50% error rate. The results from the BP method with and without approximate matching are indistinguishable. Small differences were randomly added to show both lines in the figure.
We also evaluate how the matching weight (wᵀx, plotted on the x-axis) and overlap (xᵀSx/2, plotted on the y-axis) change for a bioinformatics problem (dmela-scere) and an ontology problem (lcsh-wiki) in the upper and lower plots in Figure 3. Again, the BP results with and without approximate matching are virtually indistinguishable. Klau's method, however, produces results that are considerably worse.
Randomly perturb one power-law graph to get A and B. Generate L by the true match + random edges.
BP and ApproxBP are indistinguishable.
The amount of randomness in L is measured in average expected degree.
The methods (in more detail)
Belief Propagation
for i = 1 to ...
    update x[i] to y[i]
    compute a max-weight match with y
    save y if it is the best result so far
Klau's Matching Relaxation
for i = 1 to ...
    update x[i] to y[i]
    compute a max-weight match with y
    update y[i] to x[i] based on the match
The matching is incidental to the BP method, but integral to Klau's MR method.
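The two outer loops above can be sketched in C++ as follows. This is a schematic, not the released code: `update`, `match`, and `score` are hypothetical placeholders standing in for the real kernels, and the multiplicative feedback in the MR loop is an illustrative stand-in for the actual Lagrange-multiplier update. The point is structural: BP uses the match only to score iterates, while MR feeds it back into the next update.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// BP-style loop: the matching only evaluates iterates; the iterate sequence
// itself is independent of the matching algorithm used.
Vec bp_loop(Vec x, int iters,
            const std::function<Vec(const Vec&)>& update,
            const std::function<Vec(const Vec&)>& match,
            const std::function<double(const Vec&)>& score) {
    Vec best;
    double best_score = -1e300;
    for (int i = 0; i < iters; ++i) {
        x = update(x);
        Vec m = match(x);              // used only for scoring this iterate
        double s = score(m);
        if (s > best_score) { best_score = s; best = m; }
    }
    return best;                       // best matching seen so far
}

// MR-style loop: the matching feeds back into the next iterate, so the
// quality of the matching routine changes the trajectory itself.
Vec mr_loop(Vec x, int iters,
            const std::function<Vec(const Vec&)>& update,
            const std::function<Vec(const Vec&)>& match) {
    Vec m;
    for (int i = 0; i < iters; ++i) {
        Vec y = update(x);
        m = match(y);
        for (std::size_t j = 0; j < x.size(); ++j)
            x[j] = y[j] * m[j];        // placeholder feedback step
    }
    return m;
}
```

This structural difference is why (as shown earlier in the talk) approximate matching barely perturbs BP but can degrade MR substantially.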
On real-world problems, ApproxMR isn’t so different
[Figure: overlap (y-axis) vs. weight (x-axis) for BP, ApproxBP, MR, and ApproxMR; max weight 671.551, overlap upper bound 381.]
On a protein-protein alignment problem, there is little difference with exact vs. approximate matching.
Upper bound on overlap
Upper bound on matching
Algorithmic analysis
Exact runtime: matrix + matching, with matrix ≪ matching: O(|EL| + |S|) + O(|EL| N log N)
Our approx. runtime: matrix + approx. matching: O(|EL| + |S|) + O(|EL|)
Algorithmic parameters: |EL| = number of edges in L; |S| = number of potential overlaps
A local dominating edge method for bipartite matching
The method guarantees
• ½-approximation
• maximal matching
Based on work by Preis (1999), Manne and Bisseling (2008), and Halappanavar et al. (2012).
A locally dominating edge is an edge heavier than all neighboring edges. For bipartite graphs, work on the smaller side only.
A local dominating edge method for bipartite matching
Queue all vertices.
Until the queue is empty, in parallel over vertices:
    Match to the heaviest edge; if there's a conflict, keep the winner and find an alternative for the loser.
    Add the endpoints of non-dominating edges to the queue.
A local dominating edge method for bipartite matching
Customized first iteration (with all vertices). Use OpenMP locks to update choices. Use __sync_fetch_and_add for queue updates.
Remaining multi-threading procedures are straightforward.
Standard OpenMP for matrix computations; use schedule=dynamic to handle skew.
We can batch the matching procedures in the BP method for additional parallelism:
for i = 1 to ...
    update x[i] to y[i]
    save y[i] in a buffer
    when the buffer is full, compute a max-weight match for all in the buffer and save the best
Real-world data sets
The parallel time complexity of our implementation is determined by the number of iterations of the while loop (expected to be O(log |V|) if the size decreases by a constant factor in each iteration).
Algorithm 1 Parallel ½-Approx Matching. Input: graph G = (V, E). Output: a matching M represented in vector mate. Data structures: a queue, QC, with vertices to be processed in the current step, and a queue, QN, with vertices to be processed in the next step (only matched vertices are queued); a vector candidate that maintains the id of the current heaviest neighbor of a vertex.

1:  procedure PARALLELMATCH(G(V, E), mate)
2:      QC ← ∅; QN ← ∅
3:      mate ← ∅
4:      for each v ∈ V in parallel do          ▷ Phase 1
5:          candidate[v] ← FINDMATE(v)
6:      for each v ∈ V in parallel do
7:          MATCHVERTEX(v, QC)
8:
9:      while QC ≠ ∅ do                        ▷ Phase 2
10:         for each u ∈ QC in parallel do
11:             for each v ∈ adj(u) do
12:                 if candidate[v] = u then
13:                     candidate[v] ← FINDMATE(v, QN)
14:                     MATCHVERTEX(v, QN)
15:         QC ← QN
16:         QN ← ∅
Algorithm 2 Find and return the current-heaviest candidate for a vertex.

1:  procedure FINDMATE(s)
2:      max_wt ← −∞
3:      max_wt_id ← ∅
4:      for each t ∈ adj(s) do
5:          if (mate[t] = ∅) AND (max_wt < w(e_{s,t})) then
6:              ▷ Use vertex id to break ties
7:              max_wt ← w(e_{s,t})
8:              max_wt_id ← t
9:      return max_wt_id
Algorithm 3 Match an edge if locally dominant. Add the two end-points to Q.

1:  procedure MATCHVERTEX(s, Q)
2:      if candidate[candidate[s]] = s then
3:          ▷ Found a locally-dominant edge
4:          mate[s] ← candidate[s]
5:          mate[candidate[s]] ← s
6:          Q ← Q ∪ {s, candidate[s]}
7:          ▷ Add both of the end-points; atomic updates to Q
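A minimal serial C++ sketch of the locally-dominant matching idea in Algorithms 1-3 above. This is an illustration under assumptions (adjacency-list graph type, a single serial work queue), not the released parallel code, which adds OpenMP threads, locks, and atomic queue updates.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// adj[u] = {(v, weight), ...}; mate[u] = -1 means u is unmatched.
using Graph = std::vector<std::vector<std::pair<int, double>>>;

// FINDMATE: heaviest currently-unmatched neighbor of s, or -1 if none.
int find_mate(const Graph& adj, const std::vector<int>& mate, int s) {
    int best = -1;
    double best_w = -1.0;
    for (const auto& [t, w] : adj[s])
        if (mate[t] < 0 && w > best_w) { best_w = w; best = t; }
    return best;
}

// Locally-dominant 1/2-approx matching, serial version of Algorithm 1.
std::vector<int> approx_match(const Graph& adj) {
    int n = static_cast<int>(adj.size());
    std::vector<int> mate(n, -1), candidate(n, -1);
    std::queue<int> q;
    // Phase 1: each vertex points at its heaviest neighbor; mutual pointers
    // are locally dominant edges and get matched immediately.
    for (int v = 0; v < n; ++v) candidate[v] = find_mate(adj, mate, v);
    for (int v = 0; v < n; ++v)
        if (candidate[v] >= 0 && candidate[candidate[v]] == v && mate[v] < 0) {
            mate[v] = candidate[v];
            mate[candidate[v]] = v;
            q.push(v);
            q.push(candidate[v]);
        }
    // Phase 2: neighbors that lost their candidate to a matched vertex
    // recompute and retry until nothing changes.
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (const auto& [v, w] : adj[u]) {
            (void)w;
            if (mate[v] >= 0 || candidate[v] != u) continue;
            candidate[v] = find_mate(adj, mate, v);
            int c = candidate[v];
            if (c >= 0 && candidate[c] == v && mate[c] < 0) {
                mate[v] = c;
                mate[c] = v;
                q.push(v);
                q.push(c);
            }
        }
    }
    return mate;
}
```

On a weighted path 0-1-2-3 with middle edge heaviest, the method matches the locally dominant edge (1,2) and leaves 0 and 3 unmatched, which is the greedy ½-approximation behavior the slides describe.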
The performance of matching algorithms critically depends on the initialization [25], [26]. The approximate weighted matching algorithm we are using was developed for non-bipartite graphs, and we have not adapted it to exploit the bipartiteness of the graph L. Hence the approximate matching algorithm spawns threads from both vertex sets VA and VB in order to match vertices. We experimented with an initialization algorithm tailored for bipartite graphs by spawning threads only from one of the vertex sets VA or VB to identify locally dominant edges. If the thread is responsible for matching a vertex in VA, then it has to check the adjacency sets of the vertices in VB that are adjacent to it in order to determine if the edge is locally dominant. The vertices in VA (or VB) can be matched in parallel by using __sync_fetch_and_add() operations to avoid conflicts. We found that this initialization noticeably improved the speed of the algorithm. In future work, we will investigate approximation algorithms for weighted matching in bipartite graphs.

TABLE II. For each problem in our bioinformatics and ontology sets, we report the number of vertices in graphs A and B, the number of edges in the graph L, and the number of nonzeros in S.

Problem       |VA|      |VB|      |EL|        |S|
dmela-scere   9,459     5,696     34,582      6,860
homo-musm     3,247     9,695     15,810      12,180
lcsh-wiki     297,266   205,948   4,971,629   1,785,310
lcsh-rameau   154,974   342,684   20,883,500  4,929,272
VI. SAMPLE PROBLEMS
In order to evaluate the solution quality and parallel performance of network alignment with approximate matching, we consider three types of problems: synthetic, protein-protein network alignment, and ontology alignment. See Table II for summary statistics about the bioinformatics and ontology alignment problems. For all problems, the degree distribution in L is fairly regular, whereas the non-zero distribution in S is highly irregular and imbalanced.
A. Synthetic power-law problems
We use synthetic problems similar to those described in [13]. Each problem instance is small, and it is primarily used to evaluate the solution quality of the network alignment heuristics. We first generate an initial graph G and then randomly add edges with probability 0.02 to form the graphs A and B. We let G be a 400-node random power-law graph to approximate the structure of most modern information networks [27]. To produce G, we first sampled a power-law degree distribution and then generated a random graph with that prescribed degree distribution. Because we started with the same underlying graph G, we know the identity matching between the graphs and utilize this matching as a reference point. To generate L, we start with the identity matching and then randomly sample all possible edges in L with probability p in order to globally diversify the matches. We find it more meaningful to describe the probability p in terms of the expected degree in the graph: d̄ = p · |VA|.
B. Bioinformatics
Aligning between protein-protein interaction networks from different species is an important problem in bioinformatics [5]. We use a problem from Singh et al. [5] (dmela-scere) and a
Our approx. runtime: matrix + approx. matching: O(|EL| + |S|) + O(|EL|)
Algorithmic parameters: |EL| = number of edges in L; |S| = number of potential overlaps
Performance evaluation
(2×4) 10-core Intel E7-8870 processors, 2.4 GHz (80 cores); 16 GB memory per processor (128 GB total)
Scaling study:
1. Thread binding: scattered vs. compact
2. Memory binding: interleaved vs. bind
[Diagram: eight CPU sockets, each with its own local memory (NUMA layout).]
Scaling
[Figure: speedup (y-axis) vs. threads (x-axis), scatter and interleave configuration.]
BP with no batching, lcsh-rameau, 400 iterations
1450 seconds for 1 thread
115 seconds for 40 threads
Scaling
[Figures: speedup (y-axis) vs. threads (x-axis) for four configurations: compact and interleave, compact and membind, scatter and interleave, scatter and membind; shown for Klau's MR method, BP with no batching, and BP with batch=20.]
In all cases, we get a speedup of around 12-15 on 40 cores with scatter threads and interleaved memory.
Summary & Conclusions
• Tailored algorithm for approx. max-weight bipartite matching
• Algorithmic improvement in network alignment methods
• Multi-threaded C++ code for network alignment
415 seconds → 10 seconds (40-times overall speedup)
For large problems, interactive network alignment is possible.
Future work: memory control, improved methods
Code and data available: www.cs.purdue.edu/~dgleich/codes/netalignmc