A High-Level Framework for Parallelizing Legacy Applications for Multiple Platforms
Ritu Arora, Texas Advanced Computing Center
Email: rauta@tacc.utexas.edu
Outline
• Motivation and Goals
• Overview of the Framework with demos
• Results
• Features and Benefits
• Project Status
• Future Work
• Conclusion
• Q & A
Plenty of Parallel Programming Languages and Paradigms
• MPI
• OpenMP
• CUDA
• OpenCL
• Implicitly Parallel Languages (X10, Fortress, SISAL)
• PGAS languages (UPC, Co-Array Fortran)
• Offload programming for MIC
• SHMEM
• Cilk/Intel Cilk Plus
• Charm++
• HPF
• Hybrid programming (MPI + OpenMP)
There is a need for a tool (a high-level framework) that offers domain experts a low-risk way to try HPC, but first…
[Figure: "Help" callouts for MPI, CUDA, and OpenMP]
… Understanding the Mindset of the User Community is Important
“…the history of HPC is littered with new technologies that promised increased scientific productivity but are no longer available.”
“A new technology that can coexist with older ones has a greater chance of success than one requiring complete buy-in at the beginning.”
“For many frameworks, a significant barrier to their use is that you can’t integrate them incrementally.”
“Frameworks provide programmers a higher level of abstraction, but at the cost of adopting the framework’s perspective on how to structure the code.”
Source: Understanding the High Performance Computing Community: A Software Engineer’s Perspective, Basili et al.
Standard and Non-Standard Steps for Parallelization that are Repeatable
• Examples of standard steps in developing an MPI application (common to all MPI programs)
  – Every MPI program has #include "mpi.h"
  – Every MPI program has MPI_Init and MPI_Finalize function calls
• Non-standard steps in developing an MPI application
  – for-loop parallelization, data distribution, mapping of tasks to processes, and orchestration of message exchange
• Steps for splitting the work in a for-loop among all the processes in MPI_COMM_WORLD are standard for a given load-balancing scheme
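The loop-splitting step above can be made concrete. The sketch below assumes a block (contiguous) load-balancing scheme; in a real MPI program, `nprocs` and `rank` would come from `MPI_Comm_size` and `MPI_Comm_rank` on `MPI_COMM_WORLD`, called between `MPI_Init` and `MPI_Finalize`. The helper name `block_range` is hypothetical and not part of the framework.

```cpp
#include <cassert>
#include <utility>

// Returns the half-open range [lo, hi) of loop iterations owned by `rank`
// when `n` iterations are split as evenly as possible among `nprocs`
// processes (block distribution). The first n % nprocs ranks get one extra
// iteration each, so per-rank counts differ by at most one.
std::pair<int, int> block_range(int n, int nprocs, int rank) {
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    int base = n / nprocs;   // iterations every rank receives
    int extra = n % nprocs;  // leftover iterations, given to the lowest ranks
    int lo = rank * base + (rank < extra ? rank : extra);
    int hi = lo + base + (rank < extra ? 1 : 0);
    return {lo, hi};
}
```

For instance, `block_range(10, 3, rank)` yields [0, 4), [4, 7) and [7, 10) for ranks 0, 1 and 2; each rank would then run the original loop body only for `lo <= i < hi`.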
Goals
• Develop a high-level framework for semi-automatic parallelization that can leverage the investment made in legacy applications
• The framework should be built on top of successful programming paradigms like MPI, OpenMP, and CUDA
• Provide support for incremental parallelism through the framework
• Abstract the standard and non-standard steps in parallelization
How Does the Framework Work?
Providing Specifications Through Hi-PaL (1)
General Structure of Hi-PaL Code to Generate MPI Code:

    Parallel section begins <hook type> (<hook pattern>) mapping is <mapping type> {
        <Hi-PaL API for specifying the operation> <hook> && in function (<function name>)
    }

General Structure of Hi-PaL Code to Generate OpenMP Code:

    OMP_Parallel {
        <Hi-PaL API for specifying the operation> && schedule is <schedule type> <hook> && in function (<function name>)
    }
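As an illustration of what the OpenMP form targets, here is a hand-written OpenMP loop with `<schedule type>` filled in as `static` and a caller-chosen chunk size. It is a sketch, not the framework's generated output, and `scaled_copy` is a hypothetical name. Without OpenMP enabled (for example, compiling without `-fopenmp` in GCC), the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Multiplies every element by `factor`. The iterations are independent, so
// OpenMP may share them among threads; schedule(static, chunk) hands out
// fixed blocks of `chunk` iterations to the threads in round-robin order.
std::vector<double> scaled_copy(const std::vector<double>& in, double factor, int chunk) {
    assert(chunk > 0);  // OpenMP requires a positive chunk size
    std::vector<double> out(in.size());
    const long n = static_cast<long>(in.size());
    #pragma omp parallel for schedule(static, chunk)
    for (long i = 0; i < n; i++) {
        out[i] = factor * in[i];  // each thread writes a distinct element
    }
    return out;
}
```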
Providing Specifications Through Hi-PaL (2)
A set of Hi-PaL APIs has been developed to capture the end users’ specifications precisely at a high level.

| Hi-PaL API | Description |
|---|---|
| `ParExchange2DArrayInt(<array name>, <num of rows>, <num of columns>)` | Exchange neighboring values in stencil-based computations |
| `Parallelize_For_Loop where (<for_init_stmt>; <condition>; <stride>)` | Parallelize the for-loop with the matching initialization statement, condition, and stride |
| `ReduceSumInt(<variable name>)` | MPI_Reduce with the MPI_SUM operation, or an OpenMP reduction clause with the ‘+’ operator; the reduced variable is of type integer |
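The two meanings of `ReduceSumInt` in the table can be compared side by side. The sketch below computes the same integer sum with an OpenMP `reduction(+:sum)` clause and with a serial simulation of per-rank partial sums that are then added together, which is the combination `MPI_Reduce` with `MPI_SUM` performs on the root rank. Both function names are hypothetical, and no MPI library is used.

```cpp
#include <cassert>
#include <vector>

// OpenMP form: reduction(+:sum) gives each thread a private copy of `sum`
// initialized to 0 and adds all copies into `sum` when the loop ends.
int sum_openmp(const std::vector<int>& v) {
    int sum = 0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) sum += v[i];
    return sum;
}

// MPI-style form, simulated serially: each of `nprocs` hypothetical ranks
// sums its own contiguous block, then the partial sums are combined.
int sum_partial_blocks(const std::vector<int>& v, int nprocs) {
    assert(nprocs > 0);
    const long n = static_cast<long>(v.size());
    std::vector<int> partial(nprocs, 0);
    for (int rank = 0; rank < nprocs; rank++) {
        long lo = n * rank / nprocs, hi = n * (rank + 1) / nprocs;
        for (long i = lo; i < hi; i++) partial[rank] += v[i];
    }
    int total = 0;
    for (int p : partial) total += p;  // the "reduce with SUM" step
    return total;
}
```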
Parallelizing Poisson solver (1)

    //other code
    NTIMES = atoi(argv[3]);
    a = allocMatrix<double>(a, M, N);
    b = allocMatrix<double>(b, M, N);
    f = allocMatrix<double>(f, M, N);
    start = 0;
    //other code
    printMatrix<double>(a, M, N);
    t1 = gettime();
    for (k = start; k < NTIMES && norm >= tolerance; k++) {
        b = compute(a, f, b, M, N);
        ptr = a;
        a = b;
        b = ptr;
        norm = normdiff(b, a, M, N);
    }
    t2 = gettime();
    //other code

Code snippet of the serial Poisson solver
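The serial loop relies on `compute` and `normdiff`, whose bodies are not shown. A plausible version is sketched below under several assumptions: a Jacobi sweep with the 5-point stencil, grids of size (M+2) × (N+2) with a one-cell boundary layer, a right-hand side `f` already scaled by h², and a Euclidean norm. The `Grid` type and every detail here are assumptions, not the solver's actual code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical grid type: (M+2) x (N+2), interior points are 1..M and 1..N.
// The boundary cells of `b` are assumed to hold the same fixed values as `a`.
using Grid = std::vector<std::vector<double>>;

// One Jacobi sweep for -laplace(u) = f: each new interior value is the
// average of its four neighbours in `a` plus f/4.
Grid& compute(const Grid& a, const Grid& f, Grid& b, int M, int N) {
    assert(static_cast<int>(a.size()) == M + 2);
    for (int i = 1; i <= M; i++)
        for (int j = 1; j <= N; j++)
            b[i][j] = 0.25 * (a[i - 1][j] + a[i + 1][j] +
                              a[i][j - 1] + a[i][j + 1] + f[i][j]);
    return b;
}

// Euclidean (L2) norm of the difference between the two grids' interiors.
double normdiff(const Grid& b, const Grid& a, int M, int N) {
    double sum = 0.0;
    for (int i = 1; i <= M; i++)
        for (int j = 1; j <= N; j++) {
            double d = b[i][j] - a[i][j];
            sum += d * d;
        }
    return std::sqrt(sum);
}
```

A driver would mirror the serial loop: call `compute(a, f, b, M, N)`, exchange `a` and `b` with `std::swap` (cheap for vectors), and stop once `normdiff(b, a, M, N)` drops below the tolerance.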
Parallelizing Poisson solver (2)

    Parallel section begins after ("NTIMES = atoi(argv[3]);") mapping is Linear {
        ParExchange2DArrayDouble (a, M, N) before statement ("printMatrix<double>(a, M, N);") && in function ("main");
        ParExchange2DArrayDouble (b, M, N) before statement ("printMatrix<double>(a, M, N);") && in function ("main");
        ParExchange2DArrayDouble (b, M, N) after statement ("b=compute(a, f, b, M, N);") && in function ("main");
        AllReduceSumInt(norm) after statement ("norm = normdiff(b, a, M, N);") && in function ("main")
    }

Hi-PaL Code to Generate MPI Code for Poisson Solver
Generated MPI Code for Poisson Solver (1)

    //other code
    NTIMES = atoi(argv[3]);
    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &size_Fraspa);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank_Fraspa);
    create_2dgrid(MPI_COMM_WORLD, &comm2d_Fraspa,…);
    create_diagcomm(MPI_COMM_WORLD, size_Fraspa, …);
    rowmap_Fraspa.init(M, P_Fraspa, p_Fraspa);
    colmap_Fraspa.init(N, Q_Fraspa, q_Fraspa);
    myrows_Fraspa = rowmap_Fraspa.getMyCount();
    mycols_Fraspa = colmap_Fraspa.getMyCount();
    M_Fraspa = M;
    N_Fraspa = N;
    M = myrows_Fraspa;
    N = mycols_Fraspa;
    a = allocMatrix<double>(a, M, N);
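`create_2dgrid` and the `P_Fraspa`, `Q_Fraspa`, `p_Fraspa` and `q_Fraspa` values it presumably produces are elided on the slide. One way to arrive at such values is sketched below: factor the process count into a near-square P × Q grid and place ranks in row-major order, as MPI Cartesian topologies do. `make_2dgrid` and `GridCoords` are hypothetical names; in real MPI code, `MPI_Dims_create` and `MPI_Cart_create` cover this ground.

```cpp
#include <cassert>

// Process-grid dimensions and this rank's position in the grid.
struct GridCoords {
    int P, Q;  // grid dimensions, P * Q == size, P >= Q
    int p, q;  // this rank's row and column
};

// Chooses Q as the largest divisor of `size` not exceeding sqrt(size), so
// the grid is as close to square as the factorization allows, then maps
// `rank` to (row, column) in row-major order.
GridCoords make_2dgrid(int size, int rank) {
    assert(size > 0 && rank >= 0 && rank < size);
    int Q = 1;
    for (int d = 1; d * d <= size; d++)
        if (size % d == 0) Q = d;
    int P = size / Q;
    return {P, Q, rank / Q, rank % Q};
}
```

Each rank would then hand (M, P, p) and (N, Q, q) to its row and column maps, which appears to be the role of the `rowmap_Fraspa.init` and `colmap_Fraspa.init` calls above.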
Generated MPI Code for Poisson Solver (2)

    b = allocMatrix<double>(b, M, N);
    f = allocMatrix<double>(f, M, N);
    start = 0;
    //other code
    a = exchange<double>(a, myrows_Fraspa + 2, …);
    b = exchange<double>(b, myrows_Fraspa + 2, …);
    printMatrix<double>(a, M, N);
    t1 = MPI_Wtime();
    for (k = start; k < NTIMES && norm >= tolerance; k++) {
        b = compute(a, f, b, M, N);
        b = exchange<double>(b, myrows_Fraspa + 2, …);
        ptr = a;
        a = b;
        b = ptr;
        norm = normdiff(b, a, M, N);
        MPI_Allreduce(&norm, &norm_Fraspa, 1, MPI_INT, MPI_SUM,…);
        norm = norm_Fraspa;
    }
    //other code
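Adding the local `norm` values across processes reproduces the serial result only if the per-process quantity is additive. Assuming `normdiff` returns a Euclidean norm (its definition is not shown), it is the per-process sums of squares, not the per-process norms, that should be summed before taking the square root. The serial simulation below shows the difference; all names are hypothetical and no MPI library is used.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sum of squares of one hypothetical rank's block of differences.
double local_sum_sq(const std::vector<double>& block) {
    double s = 0.0;
    for (double d : block) s += d * d;
    return s;
}

// Correct combination: add the local sums of squares (what an all-reduce
// with a SUM operation would deliver to every rank), then take the root.
double global_l2(const std::vector<std::vector<double>>& blocks) {
    assert(!blocks.empty());
    double total = 0.0;
    for (const auto& blk : blocks) total += local_sum_sq(blk);
    return std::sqrt(total);
}

// For contrast: summing each rank's own L2 norm overestimates the result.
double sum_of_local_l2(const std::vector<std::vector<double>>& blocks) {
    double total = 0.0;
    for (const auto& blk : blocks) total += std::sqrt(local_sum_sq(blk));
    return total;
}
```

With two ranks holding the differences {3} and {4}, `global_l2` gives 5 while `sum_of_local_l2` gives 7.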
Snippet of Exchange Template for MPI

    template <typename T>
    T** exchange(T** data, int nrows, int ncols, int P, int Q, int p, int q,
                 MPI_Comm comm2d, MPI_Comm rowcomm, MPI_Comm colcomm)
    {
        //other code above
        // Create datatype for the recvtype: one column of interior values
        MPI_Type_vector(nrows-2, 1, ncols, datatype, &temptype);
        // Give the column type an extent of one element so consecutive columns
        // are one element apart (MPI_Type_struct with MPI_UB, the older idiom
        // for this, was removed in MPI-3.0)
        MPI_Aint lb, sizeoftype;
        MPI_Type_get_extent(datatype, &lb, &sizeoftype);
        MPI_Type_create_resized(temptype, 0, sizeoftype, &vectype);
        MPI_Type_commit(&vectype);
        MPI_Cart_shift(rowcomm, 0, -1, &prev, &next);
        MPI_Cart_shift(colcomm, 0, -1, &down, &up);
        // send and receive the boundary rows
        MPI_Irecv(&data[0][1], ncols-2, datatype, up, 0, …);
        MPI_Irecv(&data[nrows-1][1], ncols-2, datatype, down, 0, …);
        MPI_Isend(&data[1][1], ncols-2, datatype, up, 0, colcomm, …);
        MPI_Isend(&data[nrows-2][1], ncols-2, datatype, down, …);
        …
    }
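The effect of the boundary-row send/receive pairs can be seen without MPI. The sketch below simulates the row part of a halo (ghost-row) exchange serially for blocks stacked vertically, `blocks[0]` on top; each assignment stands in for one `MPI_Isend`/`MPI_Irecv` pair between neighbours. For simplicity it copies whole rows, whereas the template sends only the `ncols-2` interior entries. `Block` and `exchange_rows` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One process's block: row 0 and the last row are ghost rows,
// rows 1 .. size()-2 hold interior values.
using Block = std::vector<std::vector<int>>;

// For every pair of vertically adjacent blocks, copy the upper block's last
// interior row into the lower block's top ghost row, and the lower block's
// first interior row into the upper block's bottom ghost row. Ghost rows on
// the outer edges (physical boundaries) are left untouched.
void exchange_rows(std::vector<Block>& blocks) {
    for (std::size_t k = 0; k + 1 < blocks.size(); k++) {
        Block& upper = blocks[k];
        Block& lower = blocks[k + 1];
        const std::size_t nu = upper.size();
        assert(nu >= 3 && lower.size() >= 3);
        lower[0] = upper[nu - 2];  // upper's last interior row -> lower's top ghost row
        upper[nu - 1] = lower[1];  // lower's first interior row -> upper's bottom ghost row
    }
}
```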
Providing the Specifications Through Command-Line Interface

    Would you like to use MPI or OpenMP?
    (1) MPI
    (2) OpenMP
    2
    ==================================================
    Would you like to use MIC?
    (1) Yes
    (2) No
    1
    ==================================================
    Would you like to use this for loop? Y or N?
    for ( i = 0; i < ihi; i++ )
    {
      i4_to_bvec ( i, n, bvec );
      value = circuit_value (n, bvec);
      ... //other lines of code
    }
    Y
    This loop contains the following variables: i,value,j,solution_num
    Choose the variables for reduction: solution_num
    Operation Complete...
    OpenMP code with offload capability is generated
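The loop in this session counts the inputs that satisfy a circuit by trying every bit vector, with `solution_num` chosen as the reduction variable. The sketch below reproduces that structure with OpenMP. `i4_to_bvec` and `circuit_value` here are hypothetical stand-ins (a toy three-input circuit), not the code from the session, and the Intel-specific offload directive for the MIC coprocessor is omitted.

```cpp
#include <cassert>
#include <vector>

// Hypothetical: writes the low n bits of i into bvec, most significant first.
void i4_to_bvec(int i, int n, int bvec[]) {
    for (int k = n - 1; k >= 0; k--) {
        bvec[k] = i % 2;
        i /= 2;
    }
}

// Hypothetical toy circuit: (b0 OR b1) AND NOT b2; returns 1 if satisfied.
int circuit_value(int n, const int bvec[]) {
    assert(n >= 3);
    return (bvec[0] || bvec[1]) && !bvec[2];
}

// Tries all 2^n inputs. reduction(+:solution_num) gives each thread its own
// counter and sums them at the end; bvec is declared inside the loop body,
// so every iteration (and thus every thread) uses its own copy.
int count_solutions(int n) {
    assert(n >= 3 && n < 31);
    const int ihi = 1 << n;
    int solution_num = 0;
    #pragma omp parallel for reduction(+:solution_num)
    for (int i = 0; i < ihi; i++) {
        std::vector<int> bvec(n);
        i4_to_bvec(i, n, bvec.data());
        solution_num += circuit_value(n, bvec.data());
    }
    return solution_num;
}
```

For n = 3 the toy circuit is satisfied by 010, 100 and 110, so `count_solutions(3)` is 3; each additional unused input bit doubles the count.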
Providing the Specifications Through Graphical User Interface
Currently Available Support For…
• Parallel Programming Paradigms
  – MPI
  – OpenMP
  – MPI + OpenMP
  – OpenMP + Offload
  – CUDA
• Parallel Programming Patterns
  – For-loops with Reduction
  – Stencil-Based Computations (Regular Mesh)
  – Pipeline
  – Replicable
• Base languages supported
  – C/C++ (support for Fortran will be added …)
Time for demos
• Using the framework through GUI
• Using the framework through Hi-PaL
Results: Poisson Solver (Hi-PaL based MPI)
Results: Genetic Algorithm for Content Based Image Retrieval (Hi-PaL based MPI & OpenMP)
Results: Seismic Tomography Code (GUI-based CUDA)
Results: Circuit Satisfiability Code (GUI-based OpenMP + Offload)
Summary of Features & Benefits of the Framework
• Enhances end-user productivity by reducing development time and effort
  – manual effort is reduced by over 90%, while the performance of the generated parallel code stays within 5% of the sample hand-written parallel code
• Leverages the knowledge of expert parallel programmers
• Separates the sequential and parallel programming concerns while preserving the existing version of sequential applications
Project Status
Future Work
• Migrate from DMS to the ROSE source-to-source compiler and integrate all the interfaces
• Usability studies:
  – should make it possible to rank user preferences for the interfaces
  – prioritize the development effort
• Integration with PerfExpert and Eclipse
• Ability to handle irregular meshes and to specify the pipeline mode of communication through the GUI
• Address the demands for
  – a directives-based interface
  – the option of editing the log file to repeat the code-generation process without going through the GUI
Conclusion
• Through this research and development effort, we have demonstrated
  – an approach for lowering the adoption barriers to HPC by raising the level of abstraction of parallel programming
  – an interactive tool for teaching parallel programming: think “Alice” and “DrJava”
  – the use of multiple interfaces to accommodate the preferences of the user community: one size does not fit all
  – that it is possible to achieve abstraction and performance at the same time
Acknowledgement
On behalf of my co-authors, student, and XSEDE intern (Julio Olaya), I would like to thank NSF, XSEDE, and TACC!