Post on 19-Dec-2015
Observation on Parallel Computation of Transitive and Max-closure Problems
Slide 2
Motivation
TC problem has numerous applications in many areas of computer science.
Lack of coarse-grained algorithms for distributed environments with slow communication.
Decreasing the number of dependences in a solution could improve the performance of the algorithm.
Slide 3
What is transitive closure?
GENERIC TRANSITIVE CLOSURE PROBLEM (TC)
Input: a matrix A with elements from a semiring S = ⟨⊕, ⊗⟩
Output: the matrix A*, where A*(i,j) is the sum of all simple paths from i to j

⟨⊕, ⊗⟩ TC
⟨OR, AND⟩ boolean closure: TC of a directed graph
⟨MIN, +⟩ all-pairs shortest paths
⟨MIN, MAX⟩ minimum spanning tree {all (i,j): A(i,j) = A*(i,j)}
Slide 4
Fine-grained and coarse-grained algorithms for the TC problem
Warshall algorithm (1 stage)
Leighton algorithm (2 stages)
Guibas-Kung-Thompson (GKT) algorithm (2 or 3 stages)
Partial Warshall algorithm (2 stages)
Slide 5
Warshall algorithm
[Figure: pivot row k and column k of the matrix, with updated entries at steps k, k+1, k+2]
for k := 1 to n do
  for all 1 ≤ i,j ≤ n parallel do
    Operation(i, k, j)
----------------------------------
Operation(i, k, j): a(i,j) := a(i,j) ⊕ a(i,k) ⊗ a(k,j)
----------------------------------
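The loop above can be sketched in Python, with the semiring operations ⊕ and ⊗ passed in as parameters (an illustrative sketch, not the authors' implementation; all function names are ours):

```python
# Generic Warshall loop, parameterized by the semiring <plus, times>.

def closure(A, plus, times):
    """Apply Operation(i, k, j): a(i,j) := a(i,j) + a(i,k) * a(k,j) for all k, i, j."""
    n = len(A)
    a = [row[:] for row in A]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                a[i][j] = plus(a[i][j], times(a[i][k], a[k][j]))
    return a

# <OR, AND>: boolean closure, i.e. reachability in a directed graph
reach = closure([[1, 1, 0],
                 [0, 1, 1],
                 [0, 0, 1]],
                plus=lambda x, y: x or y,
                times=lambda x, y: x and y)
# reach[0][2] == 1: node 2 is reachable from node 0 via node 1

# <MIN, +>: all-pairs shortest paths
INF = float("inf")
dist = closure([[0, 3, INF],
                [INF, 0, 4],
                [INF, INF, 0]],
               plus=min,
               times=lambda x, y: x + y)
# dist[0][2] == 7 (3 + 4)
```

The inner `for all i, j` loop is where the fine-grained parallelism lives: for a fixed k, all n² operations are independent.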
Slide 6
[Figure: an example graph on nodes 1–6 illustrating the Warshall algorithm]
Slide 7
Coarse-Grained computations
[Figure: an n × n matrix partitioned into submatrices A(I,J), e.g. A11, A24, A32]
Slide 8
Naïve Coarse-Grained Algorithms
[Figure: 3 × 3 boolean matrices comparing the naïvely computed block closure with the actual closure]
Slide 9
[Figure: the example graph split into blocks I and II, with the naïvely computed closure compared against the actual one]
Slide 10
Coarse-grained Warshall algorithm
Algorithm Blocks-Warshall
for K := 1 to N do
  A(K,K) := A*(K,K)
  for all 1 ≤ I,J ≤ N, I ≠ K ≠ J parallel do
    Block-Operation(K,K,J) and Block-Operation(I,K,K)
  for all 1 ≤ I,J ≤ N parallel do
    Block-Operation(I,K,J)
----------------------------------------------------------------------
Block-Operation(I, K, J): A(I,J) := A(I,J) ⊕ A(I,K) ⊗ A(K,K) ⊗ A(K,J)
----------------------------------------------------------------------
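A sequential sketch of the Blocks-Warshall scheme over the boolean semiring (our illustration, not the slides' code; it assumes n is divisible by the block size b, and uses the standard simplification that, once pivot row and column K are updated, the inner Block-Operation reduces to A(I,J) := A(I,J) ⊕ A(I,K) ⊗ A(K,J)):

```python
# Blocks-Warshall over the boolean semiring, viewing the n x n matrix
# as an N x N grid of b x b blocks.

def bool_mul(X, Y):
    b = len(X)
    return [[any(X[i][k] and Y[k][j] for k in range(b)) for j in range(b)]
            for i in range(b)]

def bool_add(X, Y):
    b = len(X)
    return [[X[i][j] or Y[i][j] for j in range(b)] for i in range(b)]

def local_closure(X):
    # plain Warshall on one diagonal block: X := X*
    b = len(X)
    for k in range(b):
        for i in range(b):
            for j in range(b):
                X[i][j] = X[i][j] or (X[i][k] and X[k][j])
    return X

def blocks_warshall(A, b):
    n = len(A)
    N = n // b
    get = lambda I, J: [row[J * b:(J + 1) * b] for row in A[I * b:(I + 1) * b]]
    def put(I, J, X):
        for i in range(b):
            A[I * b + i][J * b:(J + 1) * b] = X[i]
    for K in range(N):
        put(K, K, local_closure(get(K, K)))          # A(K,K) := A*(K,K)
        for I in range(N):                           # pivot row and column
            if I != K:
                put(I, K, bool_add(get(I, K), bool_mul(get(I, K), get(K, K))))
                put(K, I, bool_add(get(K, I), bool_mul(get(K, K), get(K, I))))
        for I in range(N):                           # remaining blocks
            for J in range(N):
                if I != K and J != K:
                    put(I, J, bool_add(get(I, J), bool_mul(get(I, K), get(K, J))))
    return A
```

Each Block-Operation is thus a b × b boolean matrix product, matching the implementation in terms of multiplication of submatrices.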
Slide 11
Implementation of Warshall TC Algorithm
[Figure: row k and column k submatrices highlighted in the matrix]
The implementation in terms of multiplication of submatrices
Slide 12
[Figure: the example graph partitioned into blocks I and II, with newly added closure edges marked NEW]
Slide 13
Decomposition properties
In order to package elementary operations into computationally independent groups we consider the following decomposition properties:
A min-path from i to j is a path whose intermediate nodes have numbers smaller than min(i,j)
A max-path from i to j is a path whose intermediate nodes have numbers smaller than max(i,j)
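The two definitions can be stated as one-line predicates (hypothetical helpers, not from the slides), where a path is given as the list of its node numbers in order:

```python
def is_min_path(path):
    # every intermediate node is numbered below min(first, last)
    return all(v < min(path[0], path[-1]) for v in path[1:-1])

def is_max_path(path):
    # every intermediate node is numbered below max(first, last)
    return all(v < max(path[0], path[-1]) for v in path[1:-1])
```

Every min-path is also a max-path: for example, [3, 1, 5] is both, while [2, 3, 5] is a max-path but not a min-path.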
Slide 14
Slide 15
GKT algorithm
[Figure: matrices A, A′, B, B′, C, C′ illustrating the stages of the GKT algorithm]
Slide 16
An example graph
[Figure: a graph on nodes 1–7 showing an initial path, a max-path, and the transitive closure of the graph]
Slide 17
What is Max-closure problem?
The max-closure problem is the problem of computing all max-paths in a graph
Max-closure is the main ingredient of the TC computation
Slide 18
Max-Closure --> TC
Max-closure computation performs 2/3 of the total operations
Max-to-Transitive performs 1/3 of the total operations
The algorithm ”Max-to-Transitive” reduces TC to matrix multiplication once the Max-closure is computed
Slide 19
A Fine Grained Parallel Algorithm
Algorithm Max-to-Transitive
Input: matrix A such that Amax = A
Output: transitive closure of A
for all k ≤ n parallel do
  for all i,j: max(i,j) < k, i ≠ j parallel do
    Operation(i,k,j)

Algorithm Max-Closure
for k := 1 to n do
  for all 1 ≤ i,j ≤ n, max(i,j) > k, i ≠ j parallel do
    Operation(i,k,j)
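The two phases above can be sketched sequentially over the boolean semiring (our illustration, 0-indexed where the slides are 1-indexed):

```python
def max_closure(a):
    # partial Warshall: only Operation(i, k, j) with max(i, j) > k, i != j
    n = len(a)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i != j and max(i, j) > k:
                    a[i][j] = a[i][j] or (a[i][k] and a[k][j])
    return a

def max_to_transitive(a):
    # once a is max-closed, the operations for different k are independent:
    # each only reads max-closed entries, so one parallel round suffices
    n = len(a)
    result = [row[:] for row in a]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i != j and max(i, j) < k:
                    result[i][j] = result[i][j] or (a[i][k] and a[k][j])
    return result

# cycle 0 -> 2 -> 1 -> 0: every node reaches every other node
tc = max_to_transitive(max_closure([[0, 0, 1],
                                    [1, 0, 0],
                                    [0, 1, 0]]))
# tc[0][1] is set only in the second phase: the path 0 -> 2 -> 1 goes
# through node 2 >= max(0, 1), so it is not a max-path
```

This independence across k in `max_to_transitive` is what reduces the final phase to matrix multiplication.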
Slide 20
Coarse-grained Max-closure Algorithm
Algorithm CG-Max-Closure {Partial Blocks-Warshall}
for K := 1 to N do
  A(K,K) := A*(K,K)
  for all 1 ≤ I,J ≤ N, I ≠ K ≠ J parallel do
    Block-Operation(K,K,J) and Block-Operation(I,K,K)
  for all 1 ≤ I,J ≤ N, MAX(I,J) > K ≥ MIN(I,J) parallel do
    Block-Operation(I,K,J)
--------------------------------------------------------------------------------
Block-Operation(I, K, J): A(I,J) := A(I,J) ⊕ A(I,K) ⊗ A(K,J)
--------------------------------------------------------------------------------
Slide 21
Implementation of Max-Closure Algorithm
[Figure: row k and column k submatrices highlighted in the matrix]
The implementation in terms of multiplication of submatrices
Slide 22
Experimental results (~3.5 h)
Slide 23
Increase / Decrease of overall time
While computation time decreases when adding processors, communication time increases => there is an ”ideal” number of processors
All experiments were carried out on a cluster of 20 workstations => some processors were running more than one worker process.
Slide 24
Conclusion
The major advantage of the algorithm is the reduction of communication cost at the expense of a small computation cost
This fact makes the algorithm useful for systems with slow communication