Blaise Barney, LLNL
ASC Tri-Lab Code Development Tools Workshop
Thursday, July 29, 2010
Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551
This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344
memP: Lightweight Heap Profiling
LLNL-PRES-426113
Lawrence Livermore National Laboratory
memP: Simple, scalable, heap profiling
memP heap profiling library provides an initial tool for examining application heap use
Helps to simplify first-pass performance analysis
• Easy to use: simple, reliable, low learning overhead
• Focus on important, yet easy to access information
Key metric: Heap High Water Mark (HWM)
• Maximum in-use heap allocation amount
• Tracks allocated memory
• Retains HWM call site
Supported platforms:
• AMD Opteron w/ IB (TLCC, Peloton, etc.)
• BG/L and BG/P
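To make the Heap High Water Mark metric concrete, here is a toy sketch in POSIX shell. It is not how memP is implemented; the trace format, the site labels (init, solve, io), and the function name heap_hwm are all hypothetical. It only shows the arithmetic: the HWM is the maximum of the running in-use total, and the call site recorded is the one whose allocation pushed the total to that maximum.

```sh
#!/bin/sh
# Toy illustration only (not memP code). Each argument is a hypothetical
# event "site:+N" (allocate N bytes) or "site:-N" (free N bytes).
heap_hwm() {
    in_use=0          # bytes currently allocated
    hwm=0             # highest value in_use has reached so far
    hwm_site=none     # site whose allocation set the current HWM
    for ev in "$@"; do
        site=${ev%%:*}     # text before the first ':'
        delta=${ev#*:}     # signed byte count after the first ':'
        in_use=$((in_use + $delta))
        if [ "$in_use" -gt "$hwm" ]; then
            hwm=$in_use
            hwm_site=$site
        fi
    done
    echo "$hwm $hwm_site"
}

# Running totals: 400, 700, 300, 500, 200, 300
heap_hwm init:+400 solve:+300 init:-400 io:+200 solve:-300 io:+100
# prints "700 solve"
```

Note that the final in-use amount (300) is well below the HWM (700), which is why a peak metric is tracked separately from current usage.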
How does it work?
Allocation calls are intercepted by memP wrappers
memP wrappers call internal library functions
All information is task-local
Data collection and report generation are done
• arbitrarily (by runtime parameters)
• within MPI_Finalize
Task data is collected by the library using MPI collectives
Using memP on LLNL's Linux Clusters
1. Important: first issue the command (interactive or in batch script):
use memp
2. Compile or re-link your application with the required libraries:
-L/usr/local/tools/memp/lib -lmemP -ldl
3. Set selected runtime parameters using the MEMP environment variable. For example:
setenv MEMP "-x -t"
More on this later...
4. Run your application as usual under the srun command. For example:
srun -n 16 -p pdebug a.out (interactively from login node)
srun -n 64 a.out (in batch script)
Or, instead of compiling/re-linking, you can also use run-time insertion of memP by using the srun-memp command. For example:
srun-memp -n 16 -p pdebug a.out
5. Successful execution will produce one or more memP reports. For example:
memP: Storing memP output in [./a.out.8.26996.1.memP].
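When a job is run from a script, it can be handy to pick the report name out of the job output. The following is a minimal sketch, assuming the message format is exactly the one shown in step 5; the helper name memp_report_path is hypothetical and not part of memP.

```sh
#!/bin/sh
# Hypothetical helper (not part of memP): read job output on stdin and
# print the path from each "memP: Storing memP output in [...]." line.
# Assumes the message format matches the example in step 5 exactly.
memp_report_path() {
    sed -n 's/^memP: Storing memP output in \[\(.*\)]\.$/\1/p'
}

# Demonstration with the message from step 5; in a real run the
# application output would be piped in instead.
echo 'memP: Storing memP output in [./a.out.8.26996.1.memP].' | memp_report_path
# prints "./a.out.8.26996.1.memP"
```

Lines that do not match the message are ignored, so ordinary application output can pass through the same pipe.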
Viewing memP output
The recommended way to view memP output is with the Tool Gear viewer:
• Need to have run with -x included in your MEMP environment variable setting
• use mpipview (load Tool Gear components)
• Set up your local X11 display environment
• TGui memP-outputfile
Plain text summary report also available (do not view with GUI)
Examining heap use
• Summary report only (text)
• Task report - HWM (xml)
• Task report - memory in use (xml)
The types of information reported depend mostly on the setting of your MEMP environment variable.
Some useful MEMP runtime options:
You usually need a combination of options to get the desired reports. For example, you always need both -t and -x to get useful task-level information and to see source code locations.
Parameter   Effect                                                  Default
-x          Generate XML output                                     Off
-h [#]      Task report with HWM threshold specified                No filter
-i [#]      Produce task "memory in use" report(s)                  Off
-j [#]      Report only on a given MPI rank                         Report on all ranks
-p [#]      Number of HWM task entries to print (text report only)  All tasks
-t          Generate stack trace data                               Off
Using memP on LLNL's Linux Clusters
Example MEMP settings:
setenv MEMP "-x -t"
Produce an XML report that includes task HWM details and source code locations
setenv MEMP "-x -t -h 12000000 -i 1"
Same as previous, plus generate one "memory in use" report per task if the specified HWM threshold is reached.
setenv MEMP "-x -t -h 12000000 -i 8"
Same as previous, but produce up to 8 "memory in use" reports per task. This allows reporting on multiple locations in the code that reach the specified HWM.
setenv MEMP "-x -t -h 12000000 -i 8 -j 6"
Same as previous, but only generate "memory in use" reports for MPI task 6
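The settings above use csh syntax (setenv). For Bourne-style shells, the same strings can be built and exported as in the sketch below. The helper name memp_opts and its argument order are hypothetical; only the flags themselves (-x, -t, -h, -i, -j) come from the option table.

```sh
#!/bin/sh
# Hypothetical helper (not part of memP): compose a MEMP option string.
#   $1 = HWM threshold for -h (optional)
#   $2 = number of "memory in use" reports per task for -i (optional)
#   $3 = single MPI rank to report on for -j (optional)
memp_opts() {
    opts="-x -t"    # XML output plus stack traces, as in every example above
    if [ -n "${1:-}" ]; then opts="$opts -h $1"; fi
    if [ -n "${2:-}" ]; then opts="$opts -i $2"; fi
    if [ -n "${3:-}" ]; then opts="$opts -j $3"; fi
    echo "$opts"
}

# Bourne-style shells use export where csh uses setenv.
MEMP=$(memp_opts 12000000 8 6)
export MEMP
echo "$MEMP"
# prints "-x -t -h 12000000 -i 8 -j 6"
```

Calling memp_opts with no arguments yields "-x -t", the first example setting.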
Viewing memP output
Default text summary report (MEMP unset)
Text summary report (MEMP="-p 8")
Viewing memP output
Summary Report - HWM
• XML only
• Detail for each MPI task
• Callsite stack tracing
• Source code view
• MEMP="-x -t"
Viewing memP output
Task Report - memory in use:
• XML only
• Detail for each MPI task
• Allocations in use by Call Site ID (CSID)
• Can generate multiple files/reports per task.
• Source code view
• MEMP="-x -t -i #"
Viewing memP output
Example: files/reports produced by 8-task job with MEMP="-x -t -h 12000000 -i 2"
Summary report:
-rw------- 1 blaise blaise 16418 Jul 20 14:03 a.out.8.11262.1.memP
Task memory in use reports (up to 2 call sites per task where HWM was reached):
-rw------- 1 blaise blaise 205240 Jul 20 14:03 a.out.rank-0.1.11262.15295893.memP
-rw------- 1 blaise blaise 205831 Jul 20 14:03 a.out.rank-0.2.11262.19295893.memP
-rw------- 1 blaise blaise 203160 Jul 20 14:03 a.out.rank-1.1.11263.15295429.memP
-rw------- 1 blaise blaise 203751 Jul 20 14:03 a.out.rank-1.2.11263.19295429.memP
-rw------- 1 blaise blaise 203160 Jul 20 14:03 a.out.rank-2.1.11264.15295429.memP
-rw------- 1 blaise blaise 203751 Jul 20 14:03 a.out.rank-2.2.11264.19295429.memP
-rw------- 1 blaise blaise 203160 Jul 20 14:03 a.out.rank-3.1.11265.15295429.memP
-rw------- 1 blaise blaise 203751 Jul 20 14:03 a.out.rank-3.2.11265.19295429.memP
-rw------- 1 blaise blaise 203160 Jul 20 14:03 a.out.rank-4.1.11266.15295429.memP
-rw------- 1 blaise blaise 203751 Jul 20 14:03 a.out.rank-4.2.11266.19295429.memP
-rw------- 1 blaise blaise 203160 Jul 20 14:03 a.out.rank-5.1.11267.15295429.memP
-rw------- 1 blaise blaise 203751 Jul 20 14:03 a.out.rank-5.2.11267.19295429.memP
-rw------- 1 blaise blaise 203160 Jul 20 14:03 a.out.rank-6.1.11268.15295429.memP
-rw------- 1 blaise blaise 203751 Jul 20 14:03 a.out.rank-6.2.11268.19295429.memP
-rw------- 1 blaise blaise 203160 Jul 20 14:03 a.out.rank-7.1.11269.15295429.memP
-rw------- 1 blaise blaise 203751 Jul 20 14:03 a.out.rank-7.2.11269.19295429.memP
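With many tasks the number of report files grows quickly, so a quick per-rank count can help confirm that every task produced what was expected. The sketch below assumes only the naming pattern visible in the listing above (task reports contain ".rank-N." and end in ".memP"); memP's naming scheme is not otherwise documented here, and the function name count_task_reports is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper (not part of memP): count task "memory in use"
# reports per MPI rank from a list of file names. Relies only on the
# ".rank-N." pattern seen in the listing; the summary report has no
# ".rank-" component and is skipped.
count_task_reports() {
    for f in "$@"; do
        case $f in
            *.rank-*.memP)
                r=${f#*.rank-}      # drop everything through ".rank-"
                echo "${r%%.*}"     # keep the rank number before the next '.'
                ;;
        esac
    done | sort -n | uniq -c | awk '{ print "rank " $2 ": " $1 " report(s)" }'
}

# Demonstration with a few names from the listing above.
count_task_reports \
    a.out.8.11262.1.memP \
    a.out.rank-0.1.11262.15295893.memP \
    a.out.rank-0.2.11262.19295893.memP \
    a.out.rank-1.1.11263.15295429.memP
```

In a run directory the same function could be called as count_task_reports a.out.*.memP.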
Availability, Future Work, References
Availability
• LLNL: Library installed in /usr/local/tools/memp/lib
• SNL: Module load tools/memP-1.0.0 /projects/tools_workshop/memP
Future Work
• Measuring available memory
• BG/P Kernel_GetMemorySize
• How to provide more information about OOM conditions
• Reporting based on time of allocation vs allocation size
References
• Tool Gear: John Gyllenhaal (LLNL) & John May (LLNL) http://computation.llnl.gov/casc/tool_gear/
• Tool POC: Chris Chambreau <[email protected]>
• Tool mail list: [email protected]
• memP site: http://memp.sourceforge.net/