1 vampir overview
Zellescher Weg 12
Willers-Bau A114
Tel. +49 351 - 463 - 38323
Andreas Knüpfer ([email protected])
Event Tracing with VampirTrace and Vampir
2
Introduction
Event Tracing Overview
Instrumentation
Run-Time Measurement
Conclusions
Overview
Introduction
4
Moore's Law still in charge, so what?
increasingly difficult to get close to peak performance
– for sequential computation
  • memory wall
  • optimum pipelining, ...
– for parallel interaction
  • Amdahl's law
  • synchronization with single late-comer, ...
efficiency is important because of limited resources
scalability is important to cope with next bigger simulation
Why bother with performance analysis?
5
Profile Recording
of aggregated information (Time, Counts, …)
about program and system entities
– functions, loops, basic blocks
– application, processes, threads, …
Methods of Profile Creation
sampling (statistical approach)
direct measurement (deterministic approach)
Profiling and Tracing
6
Trace Recording
run-time events (points of interest)
during program execution
saved as event record
– timestamp, process, thread, event type
– event specific information
via instrumentation & trace library
Event Trace
collection of all events of a process / program
sorted by time stamp
Profiling and Tracing
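As a minimal sketch of the two ideas above (an event record, and an event trace as records sorted by time stamp), the following C fragment may help. The type and function names (`event_record`, `event_type`, `sort_trace`) and the field layout are assumptions made for illustration; they are not taken from VampirTrace or any real trace format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical event kinds; real trace formats define many more. */
typedef enum { EV_ENTER, EV_LEAVE, EV_SEND, EV_RECV } event_type;

/* One event record: time stamp, location, type, event-specific data. */
typedef struct {
    uint64_t   timestamp;  /* in timer ticks */
    uint32_t   process;
    uint32_t   thread;
    event_type type;
    uint64_t   data;       /* e.g. function ID or message length */
} event_record;

/* qsort comparator: orders records by ascending time stamp. */
static int by_timestamp(const void *a, const void *b)
{
    const event_record *x = a, *y = b;
    return (x->timestamp > y->timestamp) - (x->timestamp < y->timestamp);
}

/* Turn an unordered collection of event records into an event trace. */
void sort_trace(event_record *events, size_t count)
{
    assert(events != NULL || count == 0);
    qsort(events, count, sizeof *events, by_timestamp);
}
```

A real trace library usually records events in time order per process already, so sorting mostly matters when records from several sources are combined.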
7
Tracing Advantages
preserve temporal and spatial relationships (context)
allow reconstruction of dynamic behavior
profiles can be calculated from traces
Tracing Disadvantages
traces can become very large
may cause perturbation
instrumentation and tracing is complicated
– event buffering, clock synchronization, …
Profiling and Tracing
Event Tracing Overview
9
Event Tracing from A to Z
[Figure: Instrumentation → Run-Time Measurement → Visualization / Analysis. Labels: src, exec., instrument, trace file(s). Notes: see more below; see following presentation.]
10
Which events to monitor?
enter/leave of function/routine/region
– time stamp, process/thread, function ID
send/receive of P2P message (MPI)
– time stamp, sender, receiver, length, tag, communicator
collective communication (MPI)
– time stamp, process, root, communicator, # bytes
hardware performance counter value
– time stamp, process, counter ID, value
corresponding “record types” in trace file format
Most common event types
11
10010 P 1 ENTER 5
10090 P 1 ENTER 6
10110 P 1 ENTER 12
10110 P 1 SEND TO 3 LEN 1024 ...
10330 P 1 LEAVE 12
10400 P 1 LEAVE 6
10520 P 1 ENTER 9
10550 P 1 LEAVE 9
...
10020 P 2 ENTER 5
10095 P 2 ENTER 6
10120 P 2 ENTER 13
10300 P 2 RECV FROM 3 LEN 1024 ...
10350 P 2 LEAVE 13
10450 P 2 LEAVE 6
10620 P 2 ENTER 9
10650 P 2 LEAVE 9
...
DEF TIMERRES 1000000000
DEF PROCESS 1 `Master`
DEF PROCESS 2 `Slave`
DEF FUNCTION 5 `main`
DEF FUNCTION 6 `foo`
DEF FUNCTION 9 `bar`
DEF FUNCTION 12 `MPI_Send`
DEF FUNCTION 13 `MPI_Recv`
Parallel Trace Files
Trace Format Schematics
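The earlier statement that profiles can be calculated from traces can be illustrated with the ENTER/LEAVE records shown here. The sketch below accumulates inclusive time per function ID for a single process by keeping a stack of open calls. The names (`event`, `inclusive_profile`) and the fixed limits are assumptions for this example only, not part of any trace library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FUNCS 64   /* assumed upper bound on function IDs */
#define MAX_DEPTH 64   /* assumed upper bound on call depth   */

typedef enum { ENTER, LEAVE } kind;

typedef struct {
    uint64_t time;   /* time stamp in timer ticks */
    kind     what;
    unsigned func;   /* function ID from the DEF FUNCTION records */
} event;

/*
 * Accumulate inclusive time per function ID for one process.
 * Calls still open at the end of the trace are not counted.
 */
void inclusive_profile(const event *ev, size_t n, uint64_t total[MAX_FUNCS])
{
    uint64_t entered[MAX_DEPTH];  /* enter time stamps of open calls */
    size_t depth = 0;

    for (size_t i = 0; i < MAX_FUNCS; i++)
        total[i] = 0;

    for (size_t i = 0; i < n; i++) {
        assert(ev[i].func < MAX_FUNCS);
        if (ev[i].what == ENTER) {
            assert(depth < MAX_DEPTH);
            entered[depth++] = ev[i].time;
        } else {
            assert(depth > 0);  /* every LEAVE needs a matching ENTER */
            total[ev[i].func] += ev[i].time - entered[--depth];
        }
    }
}
```

Fed with the records of process 1 above, this yields 220 ticks for function 12 (`MPI_Send`), 310 ticks for function 6 (`foo`) and 30 ticks for function 9 (`bar`); `main` (function 5) is never left in the excerpt and therefore stays at 0.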
12
Trace Visualization: Timeline Display
13
Trace Visualization: Process Timeline Display
14
Trace Visualization: Statistic Summary Display
15
Trace Visualization: Message Statistics Display
16
The Vampir Tool Family
VampirTrace
convenient instrumentation and measurement
hides away complicated details
provides many options and switches for experts
VampirTrace is part of Open MPI 1.3
Vampir/VampirServer
interactive trace visualization and analysis
intuitive browsing and zooming
scalable to large trace data sizes (100GB)
scalable to high parallelism (2000 processes)
Vampir for Windows in progress, beta version available
17
Open Trace Format (OTF)
Open source trace file format
Includes powerful libotf for use in custom applications
High level interface for tools + low level interface for trace libraries
Other Formats
TAU Trace Format (Univ. of Oregon)
Epilog (ZAM, FZ Jülich)
STF (Pallas, now Intel)
Trace File Formats
18
Other Event Tracing Tools
TAU profiling (University of Oregon, USA)
– profiling and tracing for parallel applications
– http://www.cs.uoregon.edu/research/tau/
Paraver (CEPBA, Barcelona, Spain)
– trace based parallel performance analysis and visualization
– http://www.cepba.upc.edu/paraver/
Scalasca (FZ Jülich)
– tracing and automatic detection of performance problems
– http://www.scalasca.org/
Intel Trace Collector & Analyzer
– Very similar to Vampir
Other Tools
Instrumentation
20
Instrumentation: Process of modifying programs to detect and report events by calling instrumentation functions.
instrumentation functions provided by trace library
notification about run-time event
there are various ways of instrumentation
Instrumentation
21
Edit – Compile – Run Cycle
Source Code → Compiler → Binary → Run → Results
Edit – Compile – Run Cycle with VampirTrace
Source Code → VT Wrapper (Compiler) → Binary → Run → Results + Traces
Instrumentation
22
Source code instrumentation
– manually
– automatically
Instrumentation with wrapper functions
Library pre-load instrumentation
Compiler Instrumentation
Binary instrumentation
VampirTrace supports different methods of instrumentation
Hidden in compiler wrappers
Instrumentation Types
23
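/* original */
int foo(void* arg) {
    if (cond) {
        return 1;
    }
    return 0;
}

/* instrumented: enter(7) at function entry, leave(7) before every return,
   with 7 as the ID assigned to foo */
int foo(void* arg) {
    enter(7);
    if (cond) {
        leave(7);
        return 1;
    }
    leave(7);
    return 0;
}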
manually or automatically
Source Code Instrumentation
24
manually
large effort
error prone
difficult to manage
automatically
via source to source translation
Program Database Toolkit (PDT) http://www.cs.uoregon.edu/research/pdt/
OpenMP Pragma And Region Instrumentor (Opari) http://www.fz-juelich.de/zam/kojak/opari/
Source Code Instrumentation
25
provide wrapper functions
– call instrumentation function for notification
– call original target for actual functionality
implement via library pre-load
or via preprocessor directives
suitable for standard libraries (e.g. MPI, glibc)
can evaluate function call semantics (function signature, arguments)
#define fread WRAPPER_glibc_fread
#define fwrite WRAPPER_glibc_fwrite
Instrumentation with Wrapper Functions
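The preprocessor variant can be sketched as follows. The wrapper name `WRAPPER_glibc_fwrite` is taken from the `#define` above; its body, the two counters and the `save_block` routine are assumptions that stand in for a real trace library and a real application.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counters standing in for calls into a trace library. */
static unsigned long fwrite_calls = 0;
static unsigned long fwrite_bytes = 0;

/* Wrapper: notify first, then call the original target. */
size_t WRAPPER_glibc_fwrite(const void *ptr, size_t size, size_t nmemb,
                            FILE *stream)
{
    assert(stream != NULL);
    fwrite_calls++;                                       /* notification */
    size_t written = fwrite(ptr, size, nmemb, stream);    /* original call */
    fwrite_bytes += written * size;   /* uses the call's arguments/result */
    return written;
}

/* From here on, code that says fwrite is redirected to the wrapper. */
#define fwrite WRAPPER_glibc_fwrite

/* Example application routine, unaware of the redirection. */
size_t save_block(const char *data, size_t len, FILE *out)
{
    return fwrite(data, 1, len, out);
}
```

Because the wrapper sees the full argument list, it can record more than the bare call, here the number of bytes written. The redirection only affects code compiled after the `#define`, which is why this approach needs recompilation while library pre-load does not.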
26
Instrumentation via library pre-load, e.g. for MPI
Each MPI function has two names:
– MPI_xxx and PMPI_xxx
Selective replacement of MPI routines at link time
[Figure: user program, wrapper library and MPI library. The user program calls MPI_Send; the MPI library provides the routine under both names, MPI_Send and PMPI_Send.]
The MPI Profiling Interface
27
gcc -finstrument-functions -c foo.c
many compilers support instrumentation:
(GCC, Intel, IBM, PGI, NEC, Hitachi, Sun Fortran, …)
no common API, different command line switches, different behavior
no source modification necessary
managed by VampirTrace
void __cyg_profile_func_enter( <args> );
void __cyg_profile_func_exit( <args> );
Compiler Instrumentation
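For GCC, the `<args>` of the two hooks are the address of the instrumented function and the address of the call site. A minimal sketch of a pair of hooks that log into a fixed-size array could look like this; the logging part (`hook_event`, `hook_log`, `log_event`, `MAX_EVENTS`) is an assumption for illustration and not how VampirTrace implements it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 1024  /* assumed buffer size for this sketch */

typedef struct {
    void *function;   /* address of the instrumented function */
    int   is_enter;   /* 1 = enter, 0 = exit */
} hook_event;

static hook_event hook_log[MAX_EVENTS];
static size_t     hook_count = 0;

/* The hooks themselves must not be instrumented, or they would recurse. */
#define NO_INSTR __attribute__((no_instrument_function))

NO_INSTR static void log_event(void *fn, int is_enter)
{
    assert(fn != NULL);
    if (hook_count < MAX_EVENTS) {   /* silently drop events when full */
        hook_log[hook_count].function = fn;
        hook_log[hook_count].is_enter = is_enter;
        hook_count++;
    }
}

/* Called by GCC-instrumented code on entry to every function. */
NO_INSTR void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)call_site;
    log_event(this_fn, 1);
}

/* Called by GCC-instrumented code just before every function returns. */
NO_INSTR void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)call_site;
    log_event(this_fn, 0);
}
```

Application files compiled with `-finstrument-functions` and linked against these definitions call the hooks automatically. Mapping the recorded addresses back to function names is a separate step that this sketch leaves out.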
28
modify binary executable in main memory (or in a file)
insert instrumentation calls
very platform/machine dependent
expensive
Using the DynInst project
provides common interface to binary instrumentation
available for Alpha/Tru64, MIPS/IRIX, PowerPC/AIX, Sparc/Solaris, x86/Linux+Windows, ia64/Linux
see http://www.dyninst.org
Dynamic Instrumentation
29
Use VampirTrace compiler wrappers
Internals and platform specifics hidden
Select appropriate way(s) of instrumentation
Substitute calls to the regular compiler with calls to compiler wrappers
CC=mpicc
CC=vtcc
Practical Instrumentation
Run Time Measurement
31
What does the trace library do?
provide instrumentation functions
receive events of various types
collect event properties
– time stamp
– location (thread, process, cluster node, MPI rank)
– event specific properties
– perhaps hardware performance counter values
record to memory buffer, flush eventually
try to be fast, minimize overhead
Trace Library
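The "record to memory buffer, flush eventually" step can be sketched as below. All names (`trace_event`, `trace_buffer`, `trace_init`, `trace_record`, `trace_flush`) and the raw binary file layout are assumptions for this example; a real trace library would also handle compression, errors and a much larger buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BUF_EVENTS 4  /* deliberately tiny so the flush path is exercised */

typedef struct {
    uint64_t timestamp;
    uint32_t process;
    uint32_t type;     /* hypothetical event type code */
    uint64_t data;     /* event-specific property */
} trace_event;

typedef struct {
    trace_event buf[BUF_EVENTS];
    size_t      used;     /* number of buffered events */
    size_t      flushes;  /* how often the buffer was written out */
    FILE       *out;      /* per-process trace file */
} trace_buffer;

void trace_init(trace_buffer *tb, FILE *out)
{
    assert(tb != NULL && out != NULL);
    tb->used = 0;
    tb->flushes = 0;
    tb->out = out;
}

/* Write all buffered events to the file and empty the buffer. */
void trace_flush(trace_buffer *tb)
{
    if (tb->used == 0)
        return;
    size_t written = fwrite(tb->buf, sizeof tb->buf[0], tb->used, tb->out);
    assert(written == tb->used);
    (void)written;
    tb->used = 0;
    tb->flushes++;
}

/* Fast path: copy the event into memory; flush only when the buffer is full. */
void trace_record(trace_buffer *tb, trace_event ev)
{
    if (tb->used == BUF_EVENTS)
        trace_flush(tb);
    tb->buf[tb->used++] = ev;
}
```

Keeping the common case to a single in-memory copy is what keeps overhead low; the expensive file write happens once per full buffer rather than once per event.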
32
There are a number of run-time options
Controlled by environment variables
PAPI hardware performance counters
Memory allocation counters
Application I/O calls
Filtering
Grouping
more ...
see more in the following presentations and hands-on parts
Run-Time Options
33
Include hardware performance counters in traces
– via PAPI library
– or Sun Solaris CPC counters
– or NEC SX counters
VT_METRICS can be used to specify a colon-separated list of counters
see papi_avail and papi_command_line tools etc.
see VampirTrace Documentation for CPC and NEC counters
set VT_METRICS environment variable
export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM
Performance Counters
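To make the colon-separated format concrete, here is a small sketch of how a list such as the one above could be split into counter names. The helper names (`split_metrics`, `metrics_from_env`) and the limits are hypothetical; this is not VampirTrace's own parser.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_COUNTERS 16  /* assumed limit for this sketch */
#define MAX_NAME     64  /* assumed maximum name length, including '\0' */

/*
 * Split a colon-separated list such as "PAPI_FP_OPS:PAPI_L2_TCM"
 * into individual counter names. Empty or over-long names are skipped.
 * Returns the number of names stored.
 */
size_t split_metrics(const char *spec, char names[MAX_COUNTERS][MAX_NAME])
{
    size_t count = 0;
    assert(names != NULL);
    if (spec == NULL)
        return 0;

    while (*spec != '\0' && count < MAX_COUNTERS) {
        size_t len = strcspn(spec, ":");   /* length up to the next ':' */
        if (len > 0 && len < MAX_NAME) {
            memcpy(names[count], spec, len);
            names[count][len] = '\0';
            count++;
        }
        spec += len;
        if (*spec == ':')
            spec++;                        /* skip the separator */
    }
    return count;
}

/* Read the list from the environment, as a trace library might. */
size_t metrics_from_env(char names[MAX_COUNTERS][MAX_NAME])
{
    return split_metrics(getenv("VT_METRICS"), names);
}
```

Whether a given name is actually available on the machine still has to be checked against the counter library, for example with the `papi_avail` tool mentioned above.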
34
monitor memory allocation behavior
record memory volume as counter
record glibc calls like “malloc” and “free” as function calls
via environment variable VT_MEMTRACE
export VT_MEMTRACE=yes
Memory Allocation Tracing
35
monitor POSIX I/O behavior
record read/write rates as counters
record standard I/O calls like “open” and “read”
via environment variable VT_IOTRACE
mmap I/O not supported
export VT_IOTRACE=yes
I/O Tracing
36
selective tracing of certain functions/subroutines
one way to reduce trace file size!
via environment variable VT_FILTER_SPEC
run-time filtering, no re-compilation or re-linking
see also the vtfilter tool
– can create a filter file with rough target size estimate
– can apply a filter to an existing trace file as post processing
export VT_FILTER_SPEC=/home/user/filter.spec
my*;test -- 1000
calculate -- -1
* -- 1000000
Function Filtering
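One plausible reading of the example filter file is: each rule gives a call limit for the functions matching its patterns, `-1` means unlimited, and the first matching rule applies. The sketch below implements that reading with POSIX `fnmatch`; the rule table, the first-match policy and the names `filter_rule`, `call_limit` and `should_record` are assumptions, so the VampirTrace documentation remains the reference for the exact semantics.

```c
#include <assert.h>
#include <fnmatch.h>   /* POSIX shell-style wildcard matching */
#include <stddef.h>

typedef struct {
    const char *pattern;  /* function name pattern, may contain '*' */
    long        limit;    /* maximum recorded calls; -1 = unlimited */
} filter_rule;

/* The example filter file, with "my*;test" split into two entries. */
static const filter_rule example_rules[] = {
    { "my*",       1000 },
    { "test",      1000 },
    { "calculate", -1 },
    { "*",         1000000 },
};

/* Return the call limit of the first rule matching func. */
long call_limit(const filter_rule *rules, size_t n, const char *func)
{
    assert(func != NULL);
    for (size_t i = 0; i < n; i++)
        if (fnmatch(rules[i].pattern, func, 0) == 0)
            return rules[i].limit;
    return -1;   /* assumption: functions matching no rule are not limited */
}

/* Decide whether one more call of func should be recorded, given how
   many of its calls have been recorded so far. */
int should_record(const filter_rule *rules, size_t n,
                  const char *func, long recorded_so_far)
{
    long limit = call_limit(rules, n, func);
    return limit < 0 || recorded_so_far < limit;
}
```

The caller is expected to keep one counter per function. Once a function reaches its limit, further calls are dropped at run time, which is how filtering reduces trace size without recompiling.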
37
user-specified groups of functions
highlighting application behavior, different activities, program phases
– communication, computation, initialization, different libraries, ...
groups are assigned to colors in Vampir displays
run-time grouping, no re-compilation or re-linking
via environment variable VT_GROUPS_SPEC
contains a list of groups of associated functions, wildcards allowed
export VT_GROUPS_SPEC=/home/<user>/groups.spec
CALC=calculate
MISC=my*;test
UNKNOWN=*
Function Grouping
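The line format of the groups file (`GROUP=pattern;pattern;...`) can be illustrated with a small matcher. This is a sketch under the assumption that patterns are shell-style wildcards as understood by POSIX `fnmatch`; the function name `match_group_line` and the size limit are made up for the example.

```c
#include <assert.h>
#include <fnmatch.h>   /* POSIX shell-style wildcard matching */
#include <string.h>

#define MAX_PATTERN 128  /* assumed maximum pattern length, including '\0' */

/*
 * Test func against one groups-file line of the form
 * GROUP=pattern1;pattern2;...  On a match, copy the group name into
 * group (capacity cap) and return 1; otherwise return 0.
 */
int match_group_line(const char *line, const char *func,
                     char *group, size_t cap)
{
    assert(line != NULL && func != NULL && group != NULL);

    const char *eq = strchr(line, '=');
    if (eq == NULL || eq == line)
        return 0;                          /* malformed line */
    size_t name_len = (size_t)(eq - line);
    if (name_len >= cap)
        return 0;                          /* group name does not fit */

    const char *p = eq + 1;
    while (*p != '\0') {
        size_t len = strcspn(p, ";");      /* length of the next pattern */
        if (len > 0 && len < MAX_PATTERN) {
            char pattern[MAX_PATTERN];
            memcpy(pattern, p, len);
            pattern[len] = '\0';
            if (fnmatch(pattern, func, 0) == 0) {
                memcpy(group, line, name_len);
                group[name_len] = '\0';
                return 1;
            }
        }
        p += len;
        if (*p == ';')
            p++;                           /* skip the separator */
    }
    return 0;
}
```

Applying the lines of the example file in order, with the first match deciding, would place `calculate` in CALC, `my_init` and `test` in MISC, and everything else in UNKNOWN.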
38
Further activities of the trace library:
Data management
– Trace data is written to a buffer in memory first
– When this buffer is full, data is flushed to files
– Data compression, etc
Timer selection and time synchronization between local clocks
– use highly accurate clocks
Unification of local process/thread traces (post processing)
– trace processes/threads separately
– collect all traces of all parallel processes/threads at the end
– add global information about all participants
Behind the Scenes
Conclusions
40
performance analysis is very important in HPC
use performance analysis tools for profiling and tracing
do not spend effort on DIY solutions such as printf-debugging
use tracing tools with some precautions
– overhead
– data volume
let us know about problems and about feature wishes [email protected]
Conclusion
41
available via http://www.vampir.eu/ and http://www.tu-dresden.de/zih/vampirtrace/
Thank you !