Trace Visualization: Visualization and Analysis of MPI Resources

Upload: ptihpa

Post on 21-May-2015

TRANSCRIPT

Page 1: Trace Visualization

Trace Visualization

Visualization and Analysis of MPI Resources

Page 2: Trace Visualization

Motivation & Mission

• Motivation
  – Parallel programming is about performance!
  – Scaling to thousands of cores is required
  – You need a decent MPI implementation, e.g. Open MPI
  – You also need a ready-to-use performance monitoring and analysis tool

• Mission
  – Visualization of dynamics of complex parallel processes
  – Requires two components
    • Monitor/Collector (VampirTrace)
    • Charts/Browser (Vampir)
  – Available for major platforms
  – Open Source (partially)

Page 3: Trace Visualization

Event Trace Visualization

• Trace Visualization
  – Alternative and supplement to automatic analysis
  – Show dynamic run-time behavior graphically
  – Provide statistics and performance metrics
    • Global timeline for parallel processes/threads
    • Process timeline plus performance counters
    • Statistics summary display
    • Message statistics
    • More
  – Interactive browsing, zooming, selecting
    • Adapt statistics to zoom level (time interval)
    • Also for very large and highly parallel traces
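
Adapting statistics to the zoom level can be sketched in a few lines. The example below is only an illustration, not Vampir's implementation: the event tuples, the function names in them, and the helper `function_summary` are all hypothetical, and a real OTF trace would be read through a trace library. Each function interval is clipped to the selected time window before it is summed.

```python
from collections import defaultdict

# Hypothetical toy trace: (process, function, enter_time, leave_time) in seconds.
# This is an assumed format for illustration, not the OTF file format.
EVENTS = [
    (0, "compute", 0.0, 4.0),
    (0, "MPI_Recv", 4.0, 6.0),
    (1, "compute", 0.0, 5.0),
    (1, "MPI_Send", 5.0, 6.0),
]


def function_summary(events, t_start, t_end):
    """Sum the time each function spends inside the window [t_start, t_end]."""
    totals = defaultdict(float)
    for _process, function, enter, leave in events:
        # Clip the function interval to the selected window.
        overlap = min(leave, t_end) - max(enter, t_start)
        if overlap > 0:
            totals[function] += overlap
    return dict(totals)


full = function_summary(EVENTS, 0.0, 6.0)    # whole trace
zoomed = function_summary(EVENTS, 4.0, 6.0)  # zoomed into the last two seconds

print("full trace:", full)
print("zoomed:    ", zoomed)
```

Over the whole toy trace, computation dominates; in the zoomed window, MPI calls take most of the time. The same numbers recomputed for a smaller interval can tell a different story, which is why statistics should follow the zoom level.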

Page 4: Trace Visualization

Vampir History

• PARvis at Research Center Jülich
• 1995: Vampir at Research Center Jülich
  http://www.top500.org/reports/1995/vampir/vampir.html
• 1997: Vampir at TU Dresden
• 2006: new version VampirServer (or Vampir NG)
  – Distributed storage, enhanced scalability
  – Client/server architecture
• 2009: Vampir 7, redesign of the GUI using Qt

Page 5: Trace Visualization

Vampir Toolset Architecture

[Figure: Vampir toolset architecture. Diagram labels: Multi-Core Program, Many-Core Program, VampirTrace, Trace File (OTF), Trace Bundle, Vampir 7, VampirServer.]

Page 6: Trace Visualization

Vampir for Windows

• Vampir for UNIX
  – VampirClassic (single threaded)
  – VampirServer (MPI parallel)
• Vampir for Windows
  – Based on parallel service engine
  – All new browser
• A beta of the new browser for Linux is available at www.vampir.eu

[Figure: comparison of the three variants. Vampir Classic: all in one, single threaded. Vampir Server: parallelized service engine, sockets, visualization (Motif). Vampir 7 for Windows: threaded service DLL, API, Windows GUI.]

Page 7: Trace Visualization

Usage order of the Vampir Performance Analysis Toolset

1. Instrument your application with VampirTrace
2. Run your application with an appropriate test set
3. Analyze your trace file with Vampir
   1. Small trace files with a low number of processes can be analyzed on your local workstation
      1. Start your local Vampir
      2. Load trace file from your local disk
   2. Large trace files should be stored on the cluster file system
      1. Start VampirServer on your analysis cluster
      2. Start your local Vampir
      3. Connect local Vampir with the VampirServer on the analysis cluster
      4. Load trace file from the cluster file system

Page 8: Trace Visualization

Vampir Displays

The main displays of Vampir:

• Master Timeline (Global Timeline)

• Process and Counter Timeline

• Function Summary

• Message Summary

• Process Summary

• Communication Matrix

• Call Tree

Page 9: Trace Visualization

Vampir 7: Displays for a WRF trace

Page 10: Trace Visualization

Master Timeline (Global Timeline)

Page 11: Trace Visualization

Process and Counter Timeline

Process Timeline

Counter Timeline

Page 12: Trace Visualization

Function Summary


Page 13: Trace Visualization

Message Summary

Page 14: Trace Visualization

Process Summary


Page 15: Trace Visualization

Communication Matrix
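
The idea behind a communication matrix can be sketched as follows. This is an assumption-laden illustration rather than Vampir's code: the message records and the helper `communication_matrix` are hypothetical. Rows are sender ranks, columns are receiver ranks, and each cell accumulates the bytes sent between that pair.

```python
# Hypothetical message records: (sender_rank, receiver_rank, size_in_bytes).
# In a real trace these would come from matched MPI send/receive events.
MESSAGES = [
    (0, 1, 1024),
    (0, 1, 2048),
    (1, 2, 512),
    (2, 0, 4096),
]


def communication_matrix(messages, num_ranks):
    """Accumulate transferred bytes per (sender, receiver) pair."""
    matrix = [[0] * num_ranks for _ in range(num_ranks)]
    for sender, receiver, size in messages:
        matrix[sender][receiver] += size
    return matrix


matrix = communication_matrix(MESSAGES, num_ranks=3)
for sender, row in enumerate(matrix):
    print(f"rank {sender} sends:", row)
```

Cells that stand out against the rest of the matrix point to process pairs that carry a disproportionate share of the traffic. Counting messages instead of bytes only requires adding 1 per record.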


Page 16: Trace Visualization

Call Tree

Page 17: Trace Visualization

Customizable Chart Layout

• No cluttering
• Time-based alignment
• View impact at a glance
• Simple controls (hidden)
• User defined
  – Combination
  – Rows and columns
  – Arrangement
  – Size

Dresden, September 15th. Comprehensive Performance Tracking with Vampir 7.0. Slide 17

[Figure: example chart layout. Diagram labels: Toolbars, Master Timeline, Secondary Timeline, Process Timeline, Function Summary, Function Group Summary, Call Tree, Function Legend, Context View.]

Page 18: Trace Visualization

Sessions

• What is a session?
  – Trace file
  – Chart selection
  – Layout
  – Preferences (e.g. colors)
  – Chart options
• Scope of session properties
  – Identical for all traces
  – Trace specific
  – Matter of taste
  – Therefore: scope is customizable
• Can be attached to trace data


• Toolbars
• Master Timeline
• Secondary Timeline
• Process Timeline
• Function Summary
• Function Group Summary
• Call Tree
• Function Legend
• Context View

[Figure: several example chart layouts built from these charts. Diagram labels: Trace File (OTF), Config File.]

Page 19: Trace Visualization

Typical Performance Problems

Page 20: Trace Visualization

Finding Bottlenecks

• Trace Visualization
  – Vampir provides a number of display types
  – Each allows many different options
• Advice
  – Identify essential parts of an application (initialization, main iteration, I/O, finalization)
  – Identify important components of the code (serial computation, MPI P2P, collective MPI, OpenMP)
  – Make a hypothesis about performance problems
  – Consider the application's internal workings, if known
  – Select the appropriate displays
  – Use statistics displays in conjunction with timelines

Page 21: Trace Visualization

FINDING BOTTLENECKS

Communication

Computation

Memory, I/O, etc.

Tracing itself

Page 22: Trace Visualization

Bottlenecks in Communication

• Communications as such (dominating over computation)

• Late sender, late receiver

• Point-to-point messages instead of collective communication

• Unmatched messages

• Overloading of MPI's buffers

• Bursts of large messages (bandwidth)

• Frequent short messages (latency)

• Unnecessary synchronization (barrier)

All of the above usually result in a high MPI time share.
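
Why frequent short messages are a latency problem can be illustrated with the common linear cost model, time = latency + size / bandwidth per message. This is a textbook simplification, not something taken from the slides, and the latency and bandwidth values below are assumed round numbers rather than measurements of any real network.

```python
def transfer_time(num_messages, bytes_per_message, latency_s, bandwidth_bytes_per_s):
    """Linear cost model: every message pays the latency once, plus its payload time."""
    per_message = latency_s + bytes_per_message / bandwidth_bytes_per_s
    return num_messages * per_message


# Assumed network parameters, chosen only for illustration.
LATENCY_S = 1e-6            # 1 microsecond per message
BANDWIDTH_BYTES_PER_S = 1e9  # 1 GB/s

# The same 80,000 bytes, sent as 10,000 tiny messages or as one aggregated message.
many_small = transfer_time(10_000, 8, LATENCY_S, BANDWIDTH_BYTES_PER_S)
one_large = transfer_time(1, 80_000, LATENCY_S, BANDWIDTH_BYTES_PER_S)

print(f"10,000 x 8 bytes : {many_small * 1e6:.1f} microseconds")
print(f"1 x 80,000 bytes : {one_large * 1e6:.1f} microseconds")
print(f"slowdown factor  : {many_small / one_large:.0f}x")
```

Under these assumptions the tiny messages are dominated by latency, so aggregating them into fewer, larger messages shrinks the MPI time share. Bursts of large messages sit at the other end of the model, where the size / bandwidth term dominates.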

Page 23: Trace Visualization

Bottlenecks in Communication

unnecessary MPI_Barriers

Page 24: Trace Visualization

Bottlenecks in Communication

Patterns of successive MPI_Allreduce calls

Page 25: Trace Visualization

Bottlenecks in Communication

Inefficient implementation of MPI_Allgatherv

Page 26: Trace Visualization

Further Bottlenecks

• Unbalanced computation

– Single latecomer

• Strictly serial parts of program

– Idle processes/threads

• Very frequent tiny function calls

• Sparse loops
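
The cost of a single latecomer can be put into numbers with a simple imbalance metric. The sketch below assumes hypothetical per-process compute times for one iteration that ends in a synchronization, so every process waits for the slowest one. The function name `imbalance` and the sample values are illustrative only.

```python
def imbalance(compute_times):
    """Return (slowest / mean, average idle fraction) for one synchronized step."""
    mean = sum(compute_times) / len(compute_times)
    slowest = max(compute_times)
    # The step lasts as long as the slowest process; the rest of the time is idle.
    idle_fraction = (slowest - mean) / slowest
    return slowest / mean, idle_fraction


# Hypothetical compute times in seconds: three balanced processes and one latecomer.
times = [10.0, 10.0, 10.0, 18.0]
ratio, idle_fraction = imbalance(times)

print(f"slowest / mean   : {ratio:.2f}")
print(f"average idle time: {idle_fraction:.0%} of the step")
```

A ratio of 1.0 means perfect balance. Here one slow process out of four leaves the processes idle for a third of the step on average, which in a timeline shows up as long waits in MPI calls on all other processes.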

Page 27: Trace Visualization

Further Bottlenecks

Example: Idle OpenMP threads

Page 28: Trace Visualization

Bottlenecks in Computation

• Memory bound computation

– Inefficient L1/L2/L3 cache usage

– TLB misses

– Detectable via HW performance counters

• I/O bound computation

– Slow input/output

– Sequential I/O on single process

– I/O load imbalance

• Exception handling

Page 29: Trace Visualization

Bottlenecks in Computation

Low FP rate due to heavy cache misses

Page 30: Trace Visualization

Bottlenecks in Computation

Low FP rate due to heavy FP exceptions

Page 31: Trace Visualization

Bottlenecks in Computation

Irregular slow I/O operations

Page 32: Trace Visualization

Effects due to Tracing

• Measurement overhead

– Especially grave for tiny function calls

– Solve with selective instrumentation

• Long/frequent/asynchronous trace buffer flushes

• Too many concurrent counters

• Heisenbugs
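
Why measurement overhead is especially grave for tiny function calls follows from simple arithmetic: each traced call records an enter and a leave event, and that fixed cost is compared against the function's own run time. The per-event cost below is an assumed value for illustration, not a measured VampirTrace figure.

```python
def relative_overhead(function_duration_s, cost_per_event_s):
    """Tracing cost of one call (enter + leave event) relative to its run time."""
    return 2 * cost_per_event_s / function_duration_s


# Assumed cost of recording one trace event; real values depend on the system.
COST_PER_EVENT_S = 0.5e-6

tiny_call = relative_overhead(1e-6, COST_PER_EVENT_S)   # a 1 microsecond function
large_call = relative_overhead(1e-2, COST_PER_EVENT_S)  # a 10 millisecond function

print(f"tiny function : {tiny_call:.0%} overhead")
print(f"large function: {large_call:.2%} overhead")
```

With these assumed numbers, tracing doubles the run time of the tiny function but is negligible for the large one. Selective instrumentation targets exactly this case by excluding very short, very frequent functions from the trace.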

Page 33: Trace Visualization

Effects due to Tracing

Trace buffer flushes are explicitly marked in the trace.

A flush is rather harmless at the end of a trace, as shown here.

Page 34: Trace Visualization

Conclusion

– Performance analysis is very important in HPC
– Use performance analysis tools for profiling and tracing
– Do not spend effort on DIY solutions such as printf debugging
– Use tracing tools with some precautions
  • Overhead
  • Data volume
– Let us know about problems and feature wishes

[email protected]

Page 35: Trace Visualization

Summary

• Vampir & VampirServer
  – Interactive trace visualization and analysis
  – Intuitive browsing and zooming
  – Scalable to large trace data sizes (100 GByte)
  – Scalable to high parallelism (20000 processes)
• Vampir for Linux in progress, beta available
• VampirTrace
  – Convenient instrumentation and measurement
  – Hides away complicated details
  – Provides many options and switches for experts
• VampirTrace is part of Open MPI > 1.3