A CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS
University of Illinois at Urbana-Champaign
Abhinav Bhatele, Eric Bohm, Laxmikant V. Kale
Parallel Programming Laboratory
Euro-Par 2009
Outline
- Motivation
- Solution: Mapping of OpenAtom
- Performance Benefits
- Bigger Picture:
  - Resources Needed
  - Heuristic Solutions
  - Automatic Mapping
August 27th, 2009
Abhinav Bhatele @ Euro-Par 2009
OpenAtom
- Ab-initio molecular dynamics code
- Considers electrostatic interactions between the nuclei and electrons
- Calculates different energy terms
- Divided into different phases with a lot of communication
OpenAtom on Blue Gene/L
[Plot: Time per step (secs) vs. No. of cores (512, 1024, 2048, 4096, 8192); series: w32 Default]
Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode
w32 = 32 water molecules with 70 Ry cutoff
The problem lies in …
Performance Analysis and Visualization Tool: Projections (part of Charm++) – Timeline View
Solution: Topology Aware Mapping
Processor Virtualization
[Figure: User View vs. System View]
Programmer: Decomposes the computation into objects
Runtime: Maps the computation on to the processors
Benefits of Charm++
- Computation is divided into objects/chares/virtual processors (VPs)
- Separates decomposition from mapping
- VPs can be flexibly mapped to actual physical processors (PEs)
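A minimal Python sketch of this separation (not Charm++ code): the decomposition is fixed at a number of VPs, and a mapping is just a function from VP index to PE. The policies `block_map` and `round_robin_map` are hypothetical examples chosen for illustration; switching between them changes placement without touching the decomposition.

```python
from typing import Callable, Dict, List

# A mapping takes (vp, num_vps, num_pes) and returns the PE that hosts the VP.
Mapping = Callable[[int, int, int], int]


def block_map(vp: int, num_vps: int, num_pes: int) -> int:
    """Hypothetical policy: contiguous blocks of VPs share a PE."""
    per_pe = -(-num_vps // num_pes)  # ceiling division
    return vp // per_pe


def round_robin_map(vp: int, num_vps: int, num_pes: int) -> int:
    """Hypothetical policy: VPs are dealt out to PEs cyclically."""
    return vp % num_pes


def place(num_vps: int, num_pes: int, mapping: Mapping) -> Dict[int, List[int]]:
    """Apply a mapping to a fixed decomposition of num_vps objects."""
    placement: Dict[int, List[int]] = {pe: [] for pe in range(num_pes)}
    for vp in range(num_vps):
        placement[mapping(vp, num_vps, num_pes)].append(vp)
    return placement


if __name__ == "__main__":
    # Same decomposition (16 VPs, several per PE), two mappings onto 4 PEs.
    print(place(16, 4, block_map))        # PE 0 hosts VPs 0-3
    print(place(16, 4, round_robin_map))  # PE 0 hosts VPs 0, 4, 8, 12
```

In Charm++ the runtime makes this placement decision; a topology-aware mapping is, in this view, simply a different choice of function.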
Topology Manager API†
The application needs information such as:
- Dimensions of the partition
- Rank to physical coordinates and vice versa
TopoManager: a uniform API
- On BG/L and BG/P: provides a wrapper for system calls
- On XT3/4/5: there are no such system calls
- Provides a clean and uniform interface to the application
† http://charm.cs.uiuc.edu/~bhatele/phd/topomgr.htm
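To illustrate the kind of queries listed above, here is a small Python stand-in for a topology interface on an X x Y x Z partition. The class and method names are hypothetical and do not reproduce the actual TopoManager API; the sketch also assumes ranks are numbered with X varying fastest, which real machines need not do.

```python
from typing import Tuple

Coord = Tuple[int, int, int]


class MeshTopology:
    """Hypothetical topology helper for an nx x ny x nz mesh or torus.

    Assumes ranks are numbered with x varying fastest, then y, then z.
    """

    def __init__(self, nx: int, ny: int, nz: int, torus: bool = True) -> None:
        self.dims: Coord = (nx, ny, nz)
        self.torus = torus

    def rank_to_coords(self, rank: int) -> Coord:
        nx, ny, _ = self.dims
        return rank % nx, (rank // nx) % ny, rank // (nx * ny)

    def coords_to_rank(self, x: int, y: int, z: int) -> int:
        nx, ny, _ = self.dims
        return x + nx * (y + ny * z)

    def hops(self, rank_a: int, rank_b: int) -> int:
        """Shortest-path hop count; uses wraparound links only on a torus."""
        total = 0
        coords_a = self.rank_to_coords(rank_a)
        coords_b = self.rank_to_coords(rank_b)
        for a, b, n in zip(coords_a, coords_b, self.dims):
            d = abs(a - b)
            total += min(d, n - d) if self.torus else d
        return total


if __name__ == "__main__":
    topo = MeshTopology(8, 8, 8)
    r = topo.coords_to_rank(3, 5, 7)
    print(r, topo.rank_to_coords(r), topo.hops(0, r))
```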
Parallelization using Charm++
Eric Bohm, Glenn J. Martyna, Abhinav Bhatele, Sameer Kumar, Laxmikant V. Kale, John A. Gunnels, and Mark E. Tuckerman. Fine Grained Parallelization of the Car-Parrinello ab initio MD Method on Blue Gene/L. IBM J. of R. and D.: Applications of Massively Parallel Systems, 52(1/2):159-174, 2008.
Mapping Challenge
- Load balancing: multiple VPs per PE
- Multiple groups of communicating objects
  - Intra-group communication
  - Inter-group communication
- Conflicting communication requirements
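One common way to score a mapping under such conflicting requirements is hop-bytes: the sum, over all messages, of message size times the number of hops travelled. The sketch below assumes a 3D torus and a communication graph given as (src, dst, bytes) triples; the object IDs and byte counts in the example are made up. It shows that two placements that are equally good for intra-group traffic can still differ on inter-group traffic.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]
Edge = Tuple[int, int, int]  # (source object, destination object, bytes)


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Shortest hop count between two nodes of a 3D torus."""
    return sum(min(abs(p - q), n - abs(p - q)) for p, q, n in zip(a, b, dims))


def hop_bytes(edges: List[Edge], placement: Dict[int, Coord], dims: Coord) -> int:
    """Total bytes * hops over all communication edges for a given placement."""
    return sum(nbytes * torus_hops(placement[src], placement[dst], dims)
               for src, dst, nbytes in edges)


if __name__ == "__main__":
    dims = (4, 4, 4)
    # Made-up graph: group A = {0, 1}, group B = {2, 3}, one inter-group edge 1 -> 2.
    edges = [(0, 1, 100), (2, 3, 100), (1, 2, 50)]
    # Both placements keep each group's members one hop apart.
    in_a_line = {0: (0, 0, 0), 1: (1, 0, 0), 2: (2, 0, 0), 3: (3, 0, 0)}
    side_by_side = {0: (0, 0, 0), 1: (1, 0, 0), 2: (0, 2, 0), 3: (1, 2, 0)}
    print(hop_bytes(edges, in_a_line, dims))     # 250
    print(hop_bytes(edges, side_by_side, dims))  # 350
```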
Topology Mapping of Chare Arrays
RealSpace and GSpace have state-wise communication.
PairCalculator and GSpace have plane-wise communication.
Performance Improvements on BG/L
[Plot: Time per step (secs) vs. No. of cores (512, 1024, 2048, 4096, 8192); series: w32 Default, w32 Topology]
Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode, Year: 2006
w32 = 32 water molecules with 70 Ry cutoff
Improved Timeline Views
Results on Blue Gene/L
[Plot: Time per step (secs) vs. No. of cores (1024, 2048, 4096, 8192, 16384); series: w256 Default, w256 Topology, GST_BIG Default, GST_BIG Topology]
GST_BIG = 64 Ge, 128 Sb and 256 Te atoms with 20 Ry cutoff
Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode
Results on Blue Gene/P
[Plot: Time per step (secs) vs. No. of cores (1024, 2048, 4096, 8192); series: w256 Default, w256 Topology]
w256 = 256 water molecules with 70 Ry cutoff
Runs on Blue Gene/P at Argonne National Laboratory, VN mode
Results on Cray XT3
[Plot: Time per step (secs) vs. No. of cores (512, 1024, 2048); series: w256 Default, w256 Topology, GST_BIG Default, GST_BIG Topology]
Runs on Cray XT3 (Bigben) at Pittsburgh Supercomputing Center, VN mode (with a system reservation to obtain complete 3D mesh shapes)
Performance Analysis
[Plot: Idle Time (secs) vs. No. of cores (1024, 2048, 4096, 8192); series: Default, Topology]
Performance Analysis and Visualization Tool: Projections (idle time summed across all processors)
w256M_70Ry on Blue Gene/L
Reduction in Communication Volume
[Plot: Bandwidth (GB) vs. No. of cores (1024, 2048, 4096, 8192); series: Default, Topology]
Data obtained from Blue Gene/P's Universal Performance Counters
w256M_70Ry on Blue Gene/P
Relative Performance Improvement
[Plot: % Improvement vs. No. of cores (512, 1024, 2048, 4096, 8192); series: Cray XT3, Blue Gene/L, Blue Gene/P]
w256M_70Ry
Bigger picture
Different kinds of applications:
- Computation bound
- Communication bound
- Latency tolerant
- Latency sensitive
Technique:
- Obtain the processor topology and the application communication graph
- Heuristic techniques for mapping
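As a concrete, deliberately simple example of such a heuristic, the sketch below walks the communication graph in breadth-first order and places each task on the free node that minimizes hop-bytes to its already-placed neighbors. It assumes one task per node, a 3D torus, and a symmetric adjacency map `{task: {neighbor: bytes}}`; it is a generic greedy scheme, not the specific heuristics from this work.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]
Graph = Dict[int, Dict[int, int]]  # task -> {neighbor task: bytes exchanged}


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Shortest hop count between two nodes of a 3D torus."""
    return sum(min(abs(p - q), n - abs(p - q)) for p, q, n in zip(a, b, dims))


def placement_cost(task: int, node: Coord, graph: Graph,
                   placement: Dict[int, Coord], dims: Coord) -> int:
    """Hop-bytes between `task` (if put on `node`) and its already-placed neighbors."""
    return sum(w * torus_hops(node, placement[nb], dims)
               for nb, w in graph[task].items() if nb in placement)


def greedy_map(graph: Graph, dims: Coord) -> Dict[int, Coord]:
    """Greedy BFS mapping: one task per node, each on the cheapest free node."""
    free: List[Coord] = list(product(*(range(n) for n in dims)))
    if len(graph) > len(free):
        raise ValueError("more tasks than nodes")
    placement: Dict[int, Coord] = {}
    for start in sorted(graph):  # outer loop covers disconnected components
        if start in placement:
            continue
        queue = deque([start])
        queued = {start}
        while queue:
            task = queue.popleft()
            # Ties go to the earliest free node in (x, y, z) order.
            best = min(free, key=lambda node: placement_cost(
                task, node, graph, placement, dims))
            free.remove(best)
            placement[task] = best
            for nb in sorted(graph[task]):
                if nb not in queued and nb not in placement:
                    queued.add(nb)
                    queue.append(nb)
    return placement


if __name__ == "__main__":
    # Example: an 8-task ring mapped onto an 8 x 1 x 1 torus.
    ring = {i: {(i - 1) % 8: 1, (i + 1) % 8: 1} for i in range(8)}
    print(greedy_map(ring, (8, 1, 1)))
```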
Why does distance affect message latencies?
Consider a 3D mesh/torus interconnect
Message latencies can be modeled by
$$t = \frac{L_f}{B} \times D + \frac{L}{B}$$
where $L_f$ = length of a flit, $B$ = bandwidth, $D$ = number of hops, $L$ = message size.
When $L_f \times D \ll L$, the first term is negligible.
But in the presence of contention …
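The latency model above can be evaluated directly. The sketch below computes the modeled latency and the share contributed by the distance term; the flit size, bandwidth and hop count are illustrative values, not measurements from these runs, and contention is not modeled.

```python
def message_latency(flit_bytes: float, bandwidth: float, hops: int, msg_bytes: float) -> float:
    """Model from the slide: (Lf / B) * D + L / B, with B in bytes/s, result in seconds."""
    return (flit_bytes / bandwidth) * hops + msg_bytes / bandwidth


def distance_share(flit_bytes: float, bandwidth: float, hops: int, msg_bytes: float) -> float:
    """Fraction of the modeled latency that comes from the per-hop term."""
    per_hop_term = (flit_bytes / bandwidth) * hops
    return per_hop_term / message_latency(flit_bytes, bandwidth, hops, msg_bytes)


if __name__ == "__main__":
    LF = 32      # bytes per flit (illustrative assumption)
    B = 175e6    # link bandwidth in bytes/s (illustrative assumption)
    D = 8        # hops between sender and receiver (illustrative)
    for L in (64, 64 * 1024):
        print(f"L={L:>6} B: latency={message_latency(LF, B, D, L):.3e} s, "
              f"distance share={distance_share(LF, B, D, L):.2%}")
```

With these numbers the per-hop term is 80% of the modeled latency of a 64-byte message but well under 1% for a 64 KiB message, which is why, absent contention, distance barely matters for large messages.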
Automatic Topology Aware Mapping
Many MPI applications exhibit a simple two-dimensional near-neighbor communication pattern.
Examples: MILC, WRF, POP, Stencil, …
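A 2D near-neighbor pattern can often be embedded in a 3D mesh with every neighbor pair one hop apart. The sketch below shows one generic way to do this, by folding the second grid dimension into the Y-Z plane in serpentine order; it assumes an X x (Y*Z) process grid without periodic boundaries and is an illustration, not the automatic mapping framework referred to here.

```python
from itertools import product
from typing import Tuple

Coord = Tuple[int, int, int]


def fold_2d_to_3d(a: int, b: int, ny: int) -> Coord:
    """Map cell (a, b) of an X x (Y*Z) process grid onto an X x Y x Z mesh.

    The second grid dimension is folded into the Y-Z plane in serpentine
    (boustrophedon) order so that consecutive b values stay one hop apart.
    """
    z, offset = divmod(b, ny)
    y = offset if z % 2 == 0 else ny - 1 - offset
    return a, y, z


def mesh_hops(p: Coord, q: Coord) -> int:
    """Hop count on a mesh (no wraparound links)."""
    return sum(abs(i - j) for i, j in zip(p, q))


def max_neighbor_dilation(nx: int, ny: int, nz: int) -> int:
    """Largest hop distance between any two 2D near neighbors after folding."""
    worst = 0
    for a, b in product(range(nx), range(ny * nz)):
        here = fold_2d_to_3d(a, b, ny)
        for da, db in ((1, 0), (0, 1)):
            if a + da < nx and b + db < ny * nz:
                there = fold_2d_to_3d(a + da, b + db, ny)
                worst = max(worst, mesh_hops(here, there))
    return worst


if __name__ == "__main__":
    print(max_neighbor_dilation(4, 4, 4))
```

A plain row-major fold (y = b mod Y without reversing alternate rows) would instead leave the neighbors at each row boundary Y hops apart.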
Acknowledgements:
- Shawn Brown and Chad Vizino (PSC)
- Glenn Martyna, Sameer Kumar, Fred Mintzer (IBM)
- TeraGrid for running time on Bigben (XT3)
- ANL for running time on Blue Gene/P
Funding: DOE Grant B341494 (CSAR), DOE Grant DE-FG05-08OR23332 (ORNL LCF) and NSF Grant ITR 0121357
Web: http://charm.cs.illinois.edu