Profile Analysis with ParaProf
Sameer Shende Performance Research Lab, University of Oregon
http://TAU.uoregon.edu
VI-HPS TW15: VI-HPS Tuning Workshop, Saclay, France
TAU Performance System® (http://tau.uoregon.edu)
• Parallel performance framework and toolkit
  – Supports all HPC platforms, compilers, runtime systems
  – Provides portable instrumentation, measurement, analysis
TAU Performance System®
• Instrumentation
  – Fortran, C++, C, UPC, Java, Python, Chapel
  – Automatic instrumentation
• Measurement and analysis support
  – MPI, OpenSHMEM, ARMCI, PGAS, DMAPP
  – pthreads, OpenMP, hybrid, other thread models
  – GPU, CUDA, OpenCL, OpenACC
  – Parallel profiling and tracing
  – Use of Score-P for native OTF2 and CUBEX generation
  – Efficient callpath profiles and trace generation using Score-P
• Analysis
  – Parallel profile analysis (ParaProf), data mining (PerfExplorer)
  – Performance database technology (PerfDMF, TAUdb)
  – 3D profile browser
TAU
• TAU supports both sampling and direct instrumentation
• Memory debugging as well as I/O performance evaluation
• Profiling as well as tracing
• Interfaces with Score-P for more efficient measurements
• TAU's instrumentation covers:
  – Runtime library interposition (tau_exec)
  – Compiler-based instrumentation
  – PDT-based source-level instrumentation: routine & loop
  – Event-based sampling (TAU_SAMPLING=1)
  – Callstack unwinding with sampling (TAU_EBS_UNWIND=1)
  – OpenMP Tools Interface (OMPT, tau_exec -T ompt)
  – CUDA CUPTI, OpenCL (tau_exec -T cupti -cupti)
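The measurement modes listed above differ mainly in how the application is launched. The following is a dry-run sketch only: it prints one plausible launch line per mode rather than running anything. The `launch_line` helper, the `APP` and `NP` values, and the mode names are hypothetical. Combining the sampling variables with a plain `tau_exec` launch is an assumption, and whether environment variables reach all ranks depends on the MPI launcher.

```shell
#!/bin/sh
# Dry-run sketch: print a launch line for each measurement mode named on the
# slide instead of executing it. APP and NP are placeholder values.
APP=./bt-mz_B.4
NP=4

# launch_line MODE -> echoes the command for that mode (hypothetical helper).
launch_line() {
    case "$1" in
        interpose) echo "mpirun -np $NP tau_exec $APP" ;;
        sampling)  echo "TAU_SAMPLING=1 mpirun -np $NP tau_exec $APP" ;;
        unwind)    echo "TAU_SAMPLING=1 TAU_EBS_UNWIND=1 mpirun -np $NP tau_exec $APP" ;;
        ompt)      echo "mpirun -np $NP tau_exec -T ompt $APP" ;;
        cupti)     echo "mpirun -np $NP tau_exec -T cupti -cupti $APP" ;;
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Show every mode side by side so the differences are easy to compare.
for mode in interpose sampling unwind ompt cupti; do
    launch_line "$mode"
done
```

Printing the lines first makes it easy to paste the chosen one into a job script; the `-T` tags only work if a matching TAU configuration is installed on the system.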
LABS on Poincare: paraprof
module use /gpfslocal/pub/vihps/UNITE/local
module load UNITE VI-HPS-TW
cd tutorial/NPB3.3-MZ-MPI
make suite
cd bin; cp ../jobscript/mds/run.tau.ll .
Uncomment the first, then second run block:
# Case 2: MPI with OpenMP (OpenMP Tools Interface (OMPT))
#mpirun -np ${LOADL_TOTAL_TASKS} tau_exec -T ompt ./bt-mz_B.4

llsubmit run.tau.ll
Wait, then launch paraprof after the job finishes:
paraprof
(Right-click on node 0 or 1, Show Thread Statistics Table. Show Source Code on an OMPT source location. Also use paraprof on Score-P *.cubex files.)
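Before launching paraprof it can help to confirm that the job actually wrote profile data. The sketch below assumes TAU's default profile format, which writes one `profile.<node>.<context>.<thread>` file per thread into the working directory. The `count_profiles` helper and the `.ppk` file name are hypothetical.

```shell
#!/bin/sh
# count_profiles DIR -> number of TAU profile files directly inside DIR
# (hypothetical helper; assumes the default profile.<node>.<context>.<thread>
# naming).
count_profiles() {
    find "$1" -maxdepth 1 -type f -name 'profile.*.*.*' | wc -l | tr -d ' '
}

n=$(count_profiles .)
if [ "$n" -gt 0 ]; then
    echo "found $n profile files"
    # Uncomment one of these once the count looks right:
    # paraprof                        # open the profiles in the GUI
    # paraprof --pack bt-mz_B.4.ppk   # pack them into one file to copy elsewhere
else
    echo "no profile.* files yet; is the job still queued or running?"
fi
```

For the 4 x 4 configuration used in this lab, 16 files would be the expected count if every thread was measured.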
TAU Analysis
ParaProf Profile Analysis Framework
Parallel Profile Visualization: ParaProf
Parallel Profile Visualization: ParaProf
ParaProf: 3D Communication Matrix
Hands-on: Profile report exploration
• The tutorial contains Score-P experiments of BT-MZ
  – class "B", 4 processes with 4 OpenMP threads each
  – collected on a dedicated node of the SuperMUC HPC system at Leibniz Rechenzentrum (LRZ), Munich, Germany
• Start TAU's paraprof GUI with default profile report
% cd
% ls
periscope-1.5                          scorep_bt-mz_B_4x4_sum
README                                 scorep_bt-mz_B_4x4_sum+mets
run.out                                scorep_bt-mz_B_4x4_trace
scorep-20120913_1740_557443655223384
% paraprof scorep-20120913_1740_557443655223384/profile.cubex
OR
% paraprof scorep_bt-mz_B_4x4_trace/scout.cubex
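The two invocations above differ only in which report they open. As a minimal sketch, the choice can be scripted per experiment directory. The `pick_cubex` helper is hypothetical, and preferring `profile.cubex` over `scout.cubex` is an arbitrary choice made for illustration. The script prints the paraprof commands rather than running them.

```shell
#!/bin/sh
# pick_cubex DIR -> path of the report to open (hypothetical helper).
# Checks for profile.cubex first, then scout.cubex.
pick_cubex() {
    for name in profile.cubex scout.cubex; do
        if [ -f "$1/$name" ]; then
            echo "$1/$name"
            return 0
        fi
    done
    echo "no .cubex report in $1" >&2
    return 1
}

# List a paraprof command for every Score-P experiment directory found here.
for dir in scorep-*/ scorep_*/; do
    [ -d "$dir" ] || continue          # glob did not match anything
    if report=$(pick_cubex "${dir%/}"); then
        echo "paraprof $report"
    fi
done
```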
ParaProf: Manager Window: scout.cubex
Metrics in the profile
ParaProf: Main window
ParaProf: Options
Unselect this to expand each routine in its own space
ParaProf:
Each color represents an event executing on one or more threads
ParaProf: Windows
Right click on a given node to choose other windows
ParaProf: Thread Statistics Table
Click a column header to sort by that metric; drag a header to rearrange columns
Example: Score-P with TAU (LU NPB)
ParaProf: Thread Callgraph Window
Click on options to choose a different color or to resize the box based on metrics
ParaProf: Callpath Thread Relations Window
ParaProf: Windows -> 3D Visualization -> Bar Plot
ParaProf: 3D Scatter Plot
ParaProf: Scatter Plot
ParaProf: 3D Topology View for a Routine
ParaProf: Topology View 3D Torus (IBM BG/P)
ParaProf: Topology View (6D Torus Coordinates BG/Q)
ParaProf: Node View
ParaProf: Add Thread to Comparison Window
ParaProf: Score-P Profile Files, Database
ParaProf: File -> Preferences
ParaProf: Group Changer Window
ParaProf: Options -> Derived Metric Panel
Sorting Derived Flops Metric by Exclusive Time
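The slide does not state how the FLOPS metric was derived. One common construction, assumed here, divides a floating-point operation count (for example a `PAPI_FP_OPS` counter) by the time metric. With time recorded in microseconds, that quotient is already in MFLOP/s. The `mflops` helper and the input numbers below are illustrative only and are not taken from the tutorial experiments.

```shell
#!/bin/sh
# mflops FP_OPS TIME_USEC -> derived rate in MFLOP/s (hypothetical helper).
# Operations per microsecond equal millions of operations per second.
mflops() {
    awk -v ops="$1" -v usec="$2" 'BEGIN {
        if (usec <= 0) { print "n/a"; exit }   # avoid dividing by zero
        printf "%.1f\n", ops / usec
    }'
}

# Illustrative numbers only: 2,500,000 operations in 1000 microseconds.
mflops 2500000 1000    # prints 2500.0
```

Sorting such a derived rate by exclusive time, as the slide title describes, puts the routines that dominate runtime at the top, so a low rate there points at the code most worth tuning.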