Parallelizing Code to Investigate the Geometric Properties of Fullerenes


Uploaded by ciara-faulkner on 31-Dec-2015



[Figure: structure chart with a MASTER NODE, three worker nodes, and FORTRAN routines]


A fullerene is a carbon allotrope; when it takes a pseudo-spherical form, it is commonly referred to as a “buckyball”.

Mathematicians have been interested in cataloging these structures for some time, and in 1991 a program was written in FORTRAN to accomplish this task. The algorithm behind this program has been proven correct for fullerenes with up to n = 380 vertices. However, the algorithm's enormous computational complexity makes it unrealistic to run at that scale; in practice it has been used to compile a catalog of the general-isomer fullerenes with 20 to 50 vertices and of the isolated-pentagon fullerenes with 60 to 100 vertices.

We have implemented a sequential version of this program in C++, as well as a parallel version using C++ and MPI. The parallel version ran roughly 3.2 to 4.1 times faster than the sequential version for the vertex counts in the results table.

Jeffery L Thomas; Prof. Daniel Bennett, faculty advisor

Justification for parallelization.

      do 1 j1 = 1, m-11*jpr
      do 2 j2 = j1+jpr, m-10*jpr
      do 3 j3 = j2+jpr, m-9*jpr
      do 4 j4 = j3+jpr, m-8*jpr
      do 5 j5 = j4+jpr, m-7*jpr
      do 6 j6 = j5+jpr, m-6*jpr
      do 7 j7 = j6+jpr, m-5*jpr
      do 8 j8 = j7+jpr, m-4*jpr
      do 9 j9 = j8+jpr, m-3*jpr
      do 10 j10 = j9+jpr, m-2*jpr
      do 11 j11 = j10+jpr, m-1*jpr
      do 12 j12 = j11+jpr, m
      do 14 j = 1, m
         CALL Windup (….)
         .
         .
         .
         CALL Unwind (….)
   14 continue
   12 continue
   11 continue
   10 continue
    9 continue
    8 continue
    7 continue
    6 continue
    5 continue
    4 continue
    3 continue
    2 continue
    1 continue

An analysis of this FORTRAN code shows the time complexity of the algorithm to be $O(n^{16})$, mostly as a result of the 13-deep nested loop shown above and the calls to two FORTRAN subroutines, Windup and Unwind. This is further illustrated in the abridged table of run times below:

P. W. Fowler and D. E. Manolopoulos, An Atlas of Fullerenes, Clarendon Press, Oxford, 1995.

STRUCTURE CHART

Our Strategy

Our approach to parallelizing this algorithm was to divide up the nested loops. The master node handles all user input/output and coordinates the distribution of data between itself and the worker processes.

The master node waits for a request for data from one of the worker nodes. When it receives such a request, it sends that worker the current values of the loop indices J1 through J4. Once a worker receives these values, it executes the 9 inner loops (J5 through J12 plus the innermost j loop) and calls the FORTRAN routines to determine whether the current sequence of J-values produces a unique fullerene. If it does, the worker sends the corresponding information back to the master node; otherwise, it simply requests more data.

Why study fullerenes?

Skills / Tools developed along the way

Results

[Figure: Run time for n vertices. x-axis: n, 40 to 100; y-axis: time (sec), 0 to 200,000; series shown: C++ Parallel (np = 4)]

n    Sequential C++ (s)   Parallel C++/MPI (s)
52               25.686                  6.933
56               68.788                 17.875
60              175.920                 44.779
64              422.372                105.769
68              958.592                239.077
72              2072.44                638.142
76             4300.633               1067.637
80             8565.979               2102.237

Partial Table of numerical results


- Translating from FORTRAN to C++
- Interfacing FORTRAN with C++
- extern functions
- Linking the FORTRAN runtime libraries (-lg2c, -lf2c)
- Using gprof

gprof flat profile for n = 50:

Master node

Worker node


Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 86.90    128.79   128.79  5096665    25.27    25.27  windup_
  7.71    140.22    11.43                             main
  5.16    147.87     7.65                             unwind_
  0.22    148.19     0.32                             Matrix::ConvertToC(int*)
  0.03    148.24     0.05                             global constructors keyed to _ZN6MatrixC2Ev
  0.00    148.24     0.00     4071     0.00     0.00  std::setw(int)
  0.00    148.24     0.00        1     0.00     0.00  global constructors keyed to main
  0.00    148.24     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
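Two figures from this flat profile show why the FORTRAN routines are the work worth distributing. The self time of windup_ is its call count times its cost per call, and dividing by the total sampled time gives its share; adding the self time of unwind_ gives the share of both routines together:

$$5\,096\,665 \times 25.27\ \mu\text{s} \approx 128.79\ \text{s}, \qquad \frac{128.79\ \text{s}}{148.24\ \text{s}} \approx 86.9\%$$

$$\frac{128.79\ \text{s} + 7.65\ \text{s}}{148.24\ \text{s}} = \frac{136.44\ \text{s}}{148.24\ \text{s}} \approx 92.0\%$$

So at n = 50, roughly 92% of the sampled run time is spent in windup_ and unwind_, both of which are called inside the loops that the parallel version hands out to workers.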