Survey of Performance Evaluation Tools Last modified: 10/18/05


Page 1: Survey of Performance Evaluation Tools Last modified: 10/18/05

Survey of Performance Evaluation Tools

Last modified: 10/18/05

Page 2: Survey of Performance Evaluation Tools Last modified: 10/18/05

Summary

Given the features of existing performance evaluation tools, we want to:

Determine the collectable performance metrics: what is recorded, hardware counters, etc.

Identify each tool's architectural components, e.g., data and communication (protocol) management, and software capabilities: monitoring or profiling, visualization, and modeling

Goals are:

Investigate how a component-based performance evaluation framework can be constructed by leveraging existing tools

Investigate scaling (scale up and out) of this framework to large-scale systems (large scale: roughly 1,000 to 10,000 nodes)

Analyze workload characterization on deployed platforms for real applications and users

Page 3: Survey of Performance Evaluation Tools Last modified: 10/18/05

Outline

Background

Tools: monitoring, profiling/tracing

Workload Characterization (WLC) techniques

A proposal: performance evaluation frameworks

Page 4: Survey of Performance Evaluation Tools Last modified: 10/18/05

Background

Page 5: Survey of Performance Evaluation Tools Last modified: 10/18/05

What is Workload?

According to Cambridge dictionary, a workload is defined as:

“The amount of work to be done, especially by a particular person or machine in a period of time”

In the realm of computer systems, a workload can be loosely defined as:

A set of requests presented to a computer in a period of time.

Workloads can be classified into:

Synthetic workload: created for controlled testing

Real workload: any observed requests during normal operations

Page 6: Survey of Performance Evaluation Tools Last modified: 10/18/05

What is WLC?

WLC plays a key role in all performance evaluation studies

WLC is a synthetic description of a workload by means of quantitative parameters and functions

The objective is to formulate a model to show, capture, and reproduce the static and dynamic behavior of real workloads

WLC is a difficult as well as a neglected task: a large amount of measurement data must be collected, and extensive analysis has to be performed

Page 7: Survey of Performance Evaluation Tools Last modified: 10/18/05

WLC in the Performance Evaluation Life Cycle

[Figure: the performance evaluation life cycle, driven by competition, hardware, software, and growth. Phases: initial sizing & resizing, evaluation, production, and on-going operation. Activities: performance modeling (analyze and predict requirements), performance tuning (optimize application responsiveness, predict impact of changes), performance analysis (analyze performance, optimize resource usage, predict requirements and application responsiveness), workload analysis (analyze performance, profile application), and performance reporting (report performance and resource usage).]

Page 8: Survey of Performance Evaluation Tools Last modified: 10/18/05

WLC in Performance Evaluation Methodology

[Figure: mathematical models in the performance evaluation methodology.]

Page 9: Survey of Performance Evaluation Tools Last modified: 10/18/05

Workloads and Data Flows

[Figure: workloads and data flows in an experimental environment (© 2003, Carla Ellis). Workload sources (real applications, benchmark applications, micro-benchmark programs, synthetic benchmark programs, traces, and distributions and other statistics) drive a real system or an execution-driven, trace-driven, or stochastic simulation. A monitor (or profiler) collects traces and datasets; analysis yields distributions and other statistics, and a generator produces synthetic traces and made-up data.]

Page 10: Survey of Performance Evaluation Tools Last modified: 10/18/05

Workload Issues

Selection of benchmarks. Requirements: repeatability, availability (software), acceptance (by the community), representative (of typical usage, e.g., timeliness), realistic (predictive of real performance, e.g., scaling issues)

Types of workloads: real, synthetic

Workload monitoring & tracing: monitor/profiler design; compression, simulation

Workload characterization

Workload generators

Page 11: Survey of Performance Evaluation Tools Last modified: 10/18/05

Types: Real and Synthetic Workloads

Real workloads. Advantages: represent reality, "deployment experience". Disadvantage: they are uncontrolled, so they can't be repeated or described simply and are difficult to analyze. Nevertheless, often useful for "final analysis" papers.

Synthetic workloads. Advantages: controllable, repeatable, portable to other systems, easily modified. Disadvantage: can never be sure the real world will be the same (i.e., are they representative?). A small generator sketch follows below.
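Because a synthetic workload is produced by a program, controllability and repeatability come almost for free. A minimal sketch in C (the exponential interarrival model, mean gap, request count, and size range are all invented for illustration; a fixed RNG seed makes the run repeatable):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Draw an exponentially distributed interarrival time with the given mean. */
static double exp_interarrival(double mean)
{
    double u = (rand() + 1.0) / (RAND_MAX + 2.0);  /* uniform in (0,1) */
    return -mean * log(u);
}

int main(void)
{
    const double mean_gap = 0.5;   /* assumed mean interarrival time (seconds) */
    const int    nreq     = 20;    /* assumed number of requests to generate   */
    double t = 0.0;

    srand(42);                     /* fixed seed, so the workload is repeatable */
    for (int i = 0; i < nreq; i++) {
        t += exp_interarrival(mean_gap);
        /* Each line is one synthetic request: an arrival time and a size demand. */
        printf("request %2d  arrival=%8.3f s  size=%d KB\n", i, t, 1 + rand() % 64);
    }
    return 0;
}

Changing the seed or the parameters produces a different but equally reproducible request stream, which is exactly the "controllable, repeatable, easily modified" property listed above.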

Page 12: Survey of Performance Evaluation Tools Last modified: 10/18/05

Types: Instruction Workloads

Useful only for CPU performance, but they teach useful lessons for other situations

Developed over decades: the "typical" instruction (ADD), then instruction mixes (weighted by frequency of use)

Mixes are sensitive to compiler, application, and architecture, yet are still used today (MFLOPS)

Modern complexity makes mixes invalid: pipelining, data/instruction caching, prefetching

A kernel is an inner loop that does useful work: sieve, matrix inversion, sort, etc. It ignores setup and I/O, so it can be timed by analysis if desired (at least in theory); a timed-kernel sketch follows below
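As an illustration of the kernel idea, a minimal C sketch that times only the sieve inner loop with gettimeofday, ignoring setup and I/O (the problem size is an arbitrary assumption):

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(void)
{
    const int n = 1000000;              /* assumed problem size */
    char *composite = calloc(n + 1, 1);
    struct timeval t0, t1;

    /* Time only the kernel: the sieve loop itself. */
    gettimeofday(&t0, NULL);
    for (int p = 2; (long)p * p <= n; p++)
        if (!composite[p])
            for (int m = p * p; m <= n; m += p)
                composite[m] = 1;
    gettimeofday(&t1, NULL);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    int primes = 0;
    for (int i = 2; i <= n; i++)        /* verification, outside the timed region */
        primes += !composite[i];
    printf("sieve(%d): %d primes, kernel time %.6f s\n", n, primes, secs);
    free(composite);
    return 0;
}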

Page 13: Survey of Performance Evaluation Tools Last modified: 10/18/05

Real Applications

Standard: pick a representative application and sample data, then run it on the system to be tested. Easy to do and accurate for that sample data, but fails to consider other applications and data.

Microkernel: choose the most important subset of functions and write a benchmark to test them. Tests what the computer will actually be used for, but you need to be sure important characteristics aren't missed.

Page 14: Survey of Performance Evaluation Tools Last modified: 10/18/05

Synthetic Applications

Complete programs: designed specifically for measurement; may do real or "fake" work; may be adjustable (parameterized)

Two major classes:

Synthetic benchmarks: often needed to compare general-purpose computer systems for general-purpose use. Examples: Sieve, Ackermann's function, Whetstone, Linpack, Dhrystone, Livermore loops, SPEC, MAB

Microbenchmarks: for I/O, network, and other non-CPU measurements. Examples: HPC toolkits

Page 15: Survey of Performance Evaluation Tools Last modified: 10/18/05

Workload Considerations

Services exercised; level of detail; representative; timeliness; other considerations

Page 16: Survey of Performance Evaluation Tools Last modified: 10/18/05

Services Exercised

What services does the system actually use? A faster CPU won't speed up "cp"; network performance is useless for matrix work

What metrics measure these services? MIPS for CPU speed, bandwidth for network and I/O, TPS for transaction processing

It may be possible to isolate interfaces to just one component, e.g., an instruction mix for the CPU

A system often has layers of services: consider the services provided and used by each component; you can cut at any point and insert a workload

Page 17: Survey of Performance Evaluation Tools Last modified: 10/18/05

Integrity

Computer systems are complex, and the effects of interactions are hard to predict, so you must be sure to test the entire system

It is important to understand the balance between components, i.e., don't use a 90% CPU mix to evaluate an I/O-bound application

Sometimes only individual components are compared: would a new CPU speed up our system? How does IPv6 affect Web server performance?

But a component may not be directly related to performance

Page 18: Survey of Performance Evaluation Tools Last modified: 10/18/05

Workload Characterization

Identify the service provided by the major subsystem

List factors affecting performance

List metrics that quantify demands and performance

Identify the workload provided to that service

Page 19: Survey of Performance Evaluation Tools Last modified: 10/18/05

Example: Web Service

Web Client Analysis. Services: visit page, follow hyperlink, display information. Factors: page size, number of links, fonts required, embedded graphics, sound. Metrics: response time. Workload: a list of pages to be visited and links to be followed.

Network Analysis. Services: connect to server, transmit request, transfer data. Factors: bandwidth, latency, protocol used. Metrics: connection setup time, response latency, achieved bandwidth. Workload: a series of connections to one or more servers, with data transfer.

Web Server Analysis. Services: accept and validate connection, fetch HTTP data. Factors: network performance, CPU speed, system load, disk subsystem performance. Metrics: response time, connections served. Workload: a stream of incoming HTTP connections and requests.

File System Analysis. Services: open file, read file (writing doesn't matter for a Web server). Factors: disk drive characteristics, file system software, cache size, partition size. Metrics: response time, transfer rate. Workload: a series of file-transfer requests.

Disk Drive Analysis. Services: read sector, write sector. Factors: seek time, transfer rate. Metrics: response time. Workload: a statistically generated stream of read/write requests.
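Each of these per-layer analyses is just the four characterization steps from the previous slide filled in. A minimal C sketch of a record that captures them (the struct and field names are hypothetical, filled in here with the Web Server row above):

#include <stdio.h>

/* One workload-characterization record per subsystem: the service it provides,
   the factors affecting it, the metrics quantifying it, and its workload. */
struct wlc_record {
    const char *subsystem;
    const char *services;
    const char *factors;
    const char *metrics;
    const char *workload;
};

int main(void)
{
    struct wlc_record web_server = {
        .subsystem = "Web Server",
        .services  = "accept and validate connection, fetch HTTP data",
        .factors   = "network performance, CPU speed, system load, disk subsystem",
        .metrics   = "response time, connections served",
        .workload  = "stream of incoming HTTP connections and requests",
    };

    printf("%s\n  services: %s\n  factors:  %s\n  metrics:  %s\n  workload: %s\n",
           web_server.subsystem, web_server.services, web_server.factors,
           web_server.metrics, web_server.workload);
    return 0;
}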

[Figure: the chain of subsystems exercised by Web page visits: Web Client, Network (TCP/IP connections), Web Server (HTTP requests), File System (Web page accesses), and Disk Drive (disk transfers).]

Page 20: Survey of Performance Evaluation Tools Last modified: 10/18/05

Level of Detail

Detail trades off accuracy vs. cost. Highest detail is a complete trace; lowest detail is a single request (most common); an intermediate approach is to weight requests by frequency.

Page 21: Survey of Performance Evaluation Tools Last modified: 10/18/05

Representative

Obviously, the workload should represent the desired application:

Arrival rate of requests

Resource demands of each request

Resource usage profile of the workload over time

Again, accuracy and cost trade off; you need to understand whether the detail matters

Page 22: Survey of Performance Evaluation Tools Last modified: 10/18/05

Timeliness

Usage patterns change over time: file sizes grow to match disk sizes, Web pages grow to match network bandwidth

If using "old" workloads, you must be sure user behavior hasn't changed

Even worse, behavior may change after the test, as a result of installing the new system

Page 23: Survey of Performance Evaluation Tools Last modified: 10/18/05

Other Considerations

Loading levels: full capacity, beyond capacity, actual usage

External components not considered as parameters

Repeatability of the workload

Page 24: Survey of Performance Evaluation Tools Last modified: 10/18/05

Tools

Page 25: Survey of Performance Evaluation Tools Last modified: 10/18/05

Desired Features of a Measurement Tool

Basic uses of performance evaluation tools are:

Performance analysis and enhancement of system operations

Troubleshooting and recovery of operations of system components

Support for components performing job scheduling, resource management (e.g., when accomplishing load balancing), collection of information on applications, and fault detection or prevention (HA)

Detection of security threats and "holes"

Desirable features include, but are not limited to:

Non-intrusiveness; integration with batch job management systems; system usage statistics retrieval; availability in cluster distributions; ability to scale to large systems; graphic interface (standard GUI or web portal)

Page 26: Survey of Performance Evaluation Tools Last modified: 10/18/05

Criteria for Comparing Tools

Evaluation criteria: metrics collected; monitored/profiled entities; visualization; data and communication management

Other criteria: knowledge representations; tool interoperability; "standard" APIs; scalability

Page 27: Survey of Performance Evaluation Tools Last modified: 10/18/05

Some Terminology

Monitoring: a program that observes, supervises, or controls the activities of other programs.

Profiling: a statistical view of how well resources are being used by a program, often in the form of a graph or table, representing distinctive features or characteristics.

Tracing: a graphic record of (system or application) events that is recorded by a program.

Page 28: Survey of Performance Evaluation Tools Last modified: 10/18/05

Monitoring

System monitoring: provides continuous collection and aggregation of system performance data.

Application monitoring: measures actual application performance via a batch system.

PerMiner perfminer.pdc.kth.se/

NWPerf

SuperMon supermon.sourceforge.net/

Hawkeye www.cs.wisc.edu/condor/hawkeye/

Ganglia ganglia.sourceforge.net/

CluMon clumon.ncsa.uiuc.edu/

Page 29: Survey of Performance Evaluation Tools Last modified: 10/18/05

Profiling and Tracing

Profiling and tracing tools provide static instrumentation, focused on source code over which users have direct control.

TAU www.cs.uoregon.edu/research/tau/home.php

Paradyn www.paradyn.org/

MPE/Jumpshot www-unix.mcs.anl.gov/perfvis/

Dimemas/Paraver

mpiP www.llnl.gov/CASC/mpip/

Dynaprof icl.cs.utk.edu/~mucci/dynaprof/

KOJAK www.fz-juelich.de/zam/kojak/

ICT www.intel.com/cd/software/products/asmo-na/eng/cluster/index.htm

Pablo pablo.renci.org/Software/Pablo/pablo.htm

MPICL/Paragraph www.csm.ornl.gov/picl/, www.csar.uiuc.edu/software/paragraph/

CoPilot www.sgi.com/products/software/co-pilot/

IPM www.nersc.gov/projects/ipm/

PerfSuite perfsuite.ncsa.uiuc.edu/

Page 30: Survey of Performance Evaluation Tools Last modified: 10/18/05

Data Management and Data Formats

Databases/query languages: JDBC, SQL

Data formats:

HDF involves the development and support of software and file formats for scientific data management. The HDF software includes I/O libraries and tools for analyzing, visualizing, and converting scientific data. There are two HDF formats: the original HDF (4.x and previous releases) and HDF5, which is a completely new format and library.

NetCDF, the Network Common Data Form, provides an interface for array-oriented data access and a library that supports an implementation of the interface.

XDR

XML, the Extensible Markup Language, provides a standard way to define application-specific data languages.
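As an example of array-oriented data access, a monitoring tool could dump its samples through NetCDF's C interface. A minimal sketch, assuming the standard netcdf.h API; the file name, variable name, and values are invented:

#include <stdio.h>
#include <netcdf.h>

/* Write a small array of CPU-utilization samples to a NetCDF file. */
int main(void)
{
    int ncid, dimid, varid;
    double samples[4] = { 12.5, 37.0, 88.1, 45.3 };   /* made-up values */

    if (nc_create("perfdata.nc", NC_CLOBBER, &ncid) != NC_NOERR) {
        fprintf(stderr, "nc_create failed\n");
        return 1;
    }
    nc_def_dim(ncid, "sample", 4, &dimid);                /* one dimension      */
    nc_def_var(ncid, "cpu_util", NC_DOUBLE, 1, &dimid, &varid);
    nc_enddef(ncid);                                      /* leave define mode  */
    nc_put_var_double(ncid, varid, samples);              /* write the data     */
    nc_close(ncid);
    return 0;
}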

Page 31: Survey of Performance Evaluation Tools Last modified: 10/18/05

Monitoring Tools

Page 32: Survey of Performance Evaluation Tools Last modified: 10/18/05

CoPilot

Metrics collected

Monitored entities

Visualizations

Page 33: Survey of Performance Evaluation Tools Last modified: 10/18/05

Hawkeye

Metrics collected

Monitored entities

Visualizations

Page 34: Survey of Performance Evaluation Tools Last modified: 10/18/05

IPM

Metrics collected

Monitored entities

Visualizations

Page 35: Survey of Performance Evaluation Tools Last modified: 10/18/05

PerfSuite

Metrics collected

Monitored entities

Visualizations

Page 36: Survey of Performance Evaluation Tools Last modified: 10/18/05

NWPerf

Metrics collected

Profiled entities

Visualizations

Page 37: Survey of Performance Evaluation Tools Last modified: 10/18/05

PerMiner

Metrics collected

Profiled entities

Visualizations

Page 38: Survey of Performance Evaluation Tools Last modified: 10/18/05

SuperMon

Metrics collected

Profiled entities

Visualizations

Page 39: Survey of Performance Evaluation Tools Last modified: 10/18/05

CluMon

Metrics collected

Profiled entities

Visualizations

Page 40: Survey of Performance Evaluation Tools Last modified: 10/18/05

Profiling/Tracing Tools

Page 41: Survey of Performance Evaluation Tools Last modified: 10/18/05

TAU

Metrics recorded. Two modes: profile and trace

Profile mode:

Inclusive/exclusive time spent in functions

Hardware counter information via PAPI/PCL: L1/L2/L3 cache reads/writes/misses, TLB misses, cycles, integer/floating-point/load/store/stall instructions executed, wall clock time, virtual time

Other OS timers (gettimeofday, getrusage)

MPI message size sent

Trace mode:

Same as profile (minus hardware counters?)

Message send time, message receive time, message size, message sender/recipient(?)

Profiled entities: functions (automatic & dynamic instrumentation), loops and regions (manual instrumentation); a PAPI sketch follows below
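The hardware-counter data listed here comes from PAPI; outside of TAU, the same counters can be read directly with PAPI's low-level C API. A minimal sketch (the counter choice and the measured loop are arbitrary, and PAPI_FP_OPS is not available on every platform):

#include <stdio.h>
#include <papi.h>

int main(void)
{
    int eventset = PAPI_NULL;
    long long counts[2];
    volatile double x = 0.0;

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI init failed\n");
        return 1;
    }
    PAPI_create_eventset(&eventset);
    PAPI_add_event(eventset, PAPI_TOT_CYC);   /* total cycles         */
    PAPI_add_event(eventset, PAPI_FP_OPS);    /* floating-point ops   */

    PAPI_start(eventset);
    for (int i = 1; i <= 1000000; i++)        /* region being measured */
        x += 1.0 / i;
    PAPI_stop(eventset, counts);

    printf("cycles=%lld  fp_ops=%lld  (x=%f)\n", counts[0], counts[1], x);
    return 0;
}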

Page 42: Survey of Performance Evaluation Tools Last modified: 10/18/05

TAU

Visualizations

Profile mode:

Text-based: pprof (example on the next slide) shows a summary of profile information

Graphical: racy (old), jracy a.k.a. paraprof

Trace mode:

No built-in visualizations

Can export to CUBE (see KOJAK), Jumpshot (see MPE), and Vampir format (see Intel Cluster Tools)

Page 43: Survey of Performance Evaluation Tools Last modified: 10/18/05

TAU – pprof output

Reading Profile files in profile.*

NODE 0;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time  Exclusive  Inclusive  #Call  #Subrs  Inclusive  Name
            msec  total msec                usec/call
---------------------------------------------------------------------------------------
100.0      0.207     20,011      1       2   20011689  main() (calls f1, f5)
 75.0      1,001     15,009      1       2   15009904  f1() (sleeps 1 sec, calls f2, f4)
 75.0      1,001     15,009      1       2   15009904  main() (calls f1, f5) => f1() (sleeps 1 sec, calls f2, f4)
 50.0      4,003     10,007      2       2    5003524  f2() (sleeps 2 sec, calls f3)
 45.0      4,001      9,005      1       1    9005230  f1() (sleeps 1 sec, calls f2, f4) => f4() (sleeps 4 sec, calls f2)
 45.0      4,001      9,005      1       1    9005230  f4() (sleeps 4 sec, calls f2)
 30.0      6,003      6,003      2       0    3001710  f2() (sleeps 2 sec, calls f3) => f3() (sleeps 3 sec)
 30.0      6,003      6,003      2       0    3001710  f3() (sleeps 3 sec)
 25.0      2,001      5,003      1       1    5003546  f4() (sleeps 4 sec, calls f2) => f2() (sleeps 2 sec, calls f3)
 25.0      2,001      5,003      1       1    5003502  f1() (sleeps 1 sec, calls f2, f4) => f2() (sleeps 2 sec, calls f3)
 25.0      5,001      5,001      1       0    5001578  f5() (sleeps 5 sec)
 25.0      5,001      5,001      1       0    5001578  main() (calls f1, f5) => f5() (sleeps 5 sec)

Page 44: Survey of Performance Evaluation Tools Last modified: 10/18/05

TAU – paraprof

Page 45: Survey of Performance Evaluation Tools Last modified: 10/18/05

Paradyn

Metrics recorded:

Number of CPUs, number of active threads, CPU time and inclusive CPU time

Function calls to and by

Synchronization (# operations, wait time, inclusive wait time)

Overall communication (# messages, bytes sent and received), collective communication (# messages, bytes sent and received), point-to-point communication (# messages, bytes sent and received)

I/O (# operations, wait time, inclusive wait time, total bytes)

All metrics recorded as "time histograms" (a fixed-size data structure); see the sketch below

Profiled entities: functions only (but includes functions linked to in existing libraries)
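A hedged illustration of the fixed-size "time histogram" idea, not Paradyn's actual code: the folding rule used here (merge adjacent bins and double the bin width when the array would overflow) is an assumption, chosen to show how a whole run can always be summarized in constant space.

#include <stdio.h>

#define NBINS 8                      /* fixed number of bins (illustrative) */

struct time_hist {
    double bins[NBINS];              /* metric value accumulated per bin    */
    double bin_width;                /* seconds covered by each bin         */
    int    used;                     /* number of bins filled so far        */
};

/* Add a sample taken at time t; fold the histogram when it would overflow. */
static void th_add(struct time_hist *h, double t, double value)
{
    int idx = (int)(t / h->bin_width);
    while (idx >= NBINS) {           /* out of room: merge pairs of bins    */
        for (int i = 0; i < NBINS / 2; i++)
            h->bins[i] = h->bins[2 * i] + h->bins[2 * i + 1];
        for (int i = NBINS / 2; i < NBINS; i++)
            h->bins[i] = 0.0;
        h->bin_width *= 2.0;
        h->used = NBINS / 2;
        idx = (int)(t / h->bin_width);
    }
    h->bins[idx] += value;
    if (idx + 1 > h->used) h->used = idx + 1;
}

int main(void)
{
    struct time_hist h = { {0}, 1.0, 0 };      /* start with 1-second bins     */
    for (int s = 0; s < 40; s++)               /* 40 seconds of fake samples   */
        th_add(&h, (double)s, 1.0);
    printf("bin width now %.0f s, %d bins in use\n", h.bin_width, h.used);
    for (int i = 0; i < h.used; i++)
        printf("  bin %d: %.0f\n", i, h.bins[i]);
    return 0;
}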

Page 46: Survey of Performance Evaluation Tools Last modified: 10/18/05

Paradyn

Visualizations Time histograms Tables Barcharts “Terrains” (3-D histograms)

Page 47: Survey of Performance Evaluation Tools Last modified: 10/18/05

Paradyn

Time view: histogram across multiple hosts

Page 48: Survey of Performance Evaluation Tools Last modified: 10/18/05

Paradyn – table (current metric values)

Table (current metric values)

Bar chart (current or average metric values)

Page 49: Survey of Performance Evaluation Tools Last modified: 10/18/05

MPE/Jumpshot

Metrics collected: MPI message send time, receive time, size, and message sender/recipient; user-defined event entry & exit

Profiled entities: all MPI functions; functions or regions via manual instrumentation and custom events

Visualization: Jumpshot, a timeline view (space-time diagram overlaid on a Gantt chart) and histogram; a custom-event sketch follows below
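The user-defined events mentioned here are created through MPE's logging calls. A minimal sketch, assuming the MPE logging API distributed with MPICH (MPE_Init_log, MPE_Log_get_event_number, MPE_Describe_state, MPE_Log_event, MPE_Finish_log); the state name, color, and bracketed region are made up:

#include <mpi.h>
#include <mpe.h>

int main(int argc, char **argv)
{
    int ev_start, ev_end;

    MPI_Init(&argc, &argv);
    MPE_Init_log();

    /* Define one custom state and bracket a region of interest with it. */
    ev_start = MPE_Log_get_event_number();
    ev_end   = MPE_Log_get_event_number();
    MPE_Describe_state(ev_start, ev_end, "compute", "red");

    MPE_Log_event(ev_start, 0, "begin compute");
    volatile double x = 0.0;
    for (int i = 1; i <= 1000000; i++)    /* the instrumented region */
        x += 1.0 / i;
    MPE_Log_event(ev_end, 0, "end compute");

    MPE_Finish_log("mpe_example");        /* writes a logfile Jumpshot can read */
    MPI_Finalize();
    return 0;
}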

Page 50: Survey of Performance Evaluation Tools Last modified: 10/18/05

Jumpshot

Timeline View Histogram View

Page 51: Survey of Performance Evaluation Tools Last modified: 10/18/05

Dimemas/Paraver

Metrics recorded (MPITrace):

All MPI functions

Hardware counters (two per run, chosen from the following two lists; uses PAPI)

Counter 1: cycles; issued instructions, loads, stores, store conditionals; failed store conditionals; decoded branches; quadwords written back from scache(?); correctable scache data array errors(?); primary/secondary I-cache misses; instructions mispredicted from the scache way-prediction table(?); external interventions (cache coherency?); external invalidations (cache coherency?); graduated instructions

Counter 2: cycles; graduated instructions, loads, stores, store conditionals, floating-point instructions; TLB misses; mispredicted branches; primary/secondary data cache miss rates; data mispredictions from the scache way-prediction table(?); external interventions/invalidations (cache coherency?); store/prefetch exclusive to clean/shared block

Page 52: Survey of Performance Evaluation Tools Last modified: 10/18/05

Dimemas/Paraver

Profiled entities (MPITrace):

All MPI functions (message start time, message end time, message size, message recipient/sender)

User regions/functions via manual instrumentation

Visualization:

Timeline display (like Jumpshot): shows a Gantt chart and messages; can also overlay hardware counter information; clicking on the timeline brings up a text listing of events near where you clicked

1D/2D analysis modules

Page 53: Survey of Performance Evaluation Tools Last modified: 10/18/05

Paraver – timeline

timeline (HW counter)

timeline (standard)

Page 54: Survey of Performance Evaluation Tools Last modified: 10/18/05

Paraver – text module

Page 55: Survey of Performance Evaluation Tools Last modified: 10/18/05

Paraver

1D analysis

2D analysis

Page 56: Survey of Performance Evaluation Tools Last modified: 10/18/05

mpiP

Metrics collected: start time, end time, and message size for each MPI call

Profiled entities: MPI function calls, via the PMPI wrapper interface (a sketch follows below)

Visualization: text-based output, with a graphical browser that displays statistics inline with the source

Displayed information:

Overall time (%) for each MPI node

Top 20 call sites by time (MPI%, App%, variance)

Top 20 call sites by message size (MPI%, App%, variance)

Min/max/average/MPI%/App% time spent at each call site

Min/max/average/sum of message sizes at each call site

App time = wall clock time between MPI_Init and MPI_Finalize; MPI time = all time consumed by MPI functions; App% = % of the metric relative to overall app time; MPI% = % of the metric relative to overall MPI time
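mpiP intercepts MPI calls through the standard PMPI profiling interface; the same mechanism can be reproduced by hand. A minimal sketch of one wrapper, illustrative only and not mpiP's code (compiled into a library linked ahead of the MPI library; the const-qualified prototype follows MPI-3 headers, older MPI uses plain void*):

#include <stdio.h>
#include <mpi.h>

/* Accumulated statistics for MPI_Send at this rank (illustrative only). */
static double send_time  = 0.0;
static long   send_bytes = 0;
static long   send_calls = 0;

/* Wrapper: the application calls MPI_Send, we record time and message size,
   then forward to the real implementation via PMPI_Send. */
int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    int    size, rc;
    double t0 = MPI_Wtime();

    rc = PMPI_Send(buf, count, type, dest, tag, comm);

    send_time += MPI_Wtime() - t0;
    PMPI_Type_size(type, &size);
    send_bytes += (long)size * count;
    send_calls++;
    return rc;
}

/* At MPI_Finalize, report the per-rank totals (mpiP aggregates much more). */
int MPI_Finalize(void)
{
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: MPI_Send calls=%ld bytes=%ld time=%.6f s\n",
           rank, send_calls, send_bytes, send_time);
    return PMPI_Finalize();
}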

Page 57: Survey of Performance Evaluation Tools Last modified: 10/18/05

mpiP – graphical view

Page 58: Survey of Performance Evaluation Tools Last modified: 10/18/05

Dynaprof

Metrics collected: wall clock time or a PAPI metric for each profiled entity; collects inclusive, exclusive, and 1-level call tree % information

Profiled entities: functions (dynamic instrumentation)

Visualizations: simple text-based output; simple GUI (shows the same information as the text output)

Page 59: Survey of Performance Evaluation Tools Last modified: 10/18/05

Dynaprof – output

[leko@eta-1 dynaprof]$ wallclockrpt lu-1.wallclock.16143

Exclusive Profile.

Name Percent Total Calls

------------- ------- ----- -------

TOTAL 100 1.436e+11 1

unknown 100 1.436e+11 1

main 3.837e-06 5511 1

Inclusive Profile.

Name Percent Total SubCalls

------------- ------- ----- -------

TOTAL 100 1.436e+11 0

main 100 1.436e+11 5

1-Level Inclusive Call Tree.

Parent/-Child Percent Total Calls

------------- ------- ----- --------

TOTAL 100 1.436e+11 1

main 100 1.436e+11 1

- f_setarg.0 1.414e-05 2.03e+04 1

- f_setsig.1 1.324e-05 1.902e+04 1

- f_init.2 2.569e-05 3.691e+04 1

- atexit.3 7.042e-06 1.012e+04 1

- MAIN__.4 0 0 1

Page 60: Survey of Performance Evaluation Tools Last modified: 10/18/05

KOJAK

Metrics collected:

MPI: message start time, receive time, size, message sender/recipient

Manual instrumentation: start and stop times

One PAPI metric per run (only FLOPS and L1 data misses are visualized)

Profiled entities:

MPI calls (MPI wrapper library)

Function calls (automatic instrumentation, only available on a few platforms)

Regions and function calls via manual instrumentation

Visualizations:

Can export traces to Vampir trace format (see ICT)

Shows profile and analyzed data via CUBE (described on the next few slides)

Page 61: Survey of Performance Evaluation Tools Last modified: 10/18/05

CUBE overview: simple description

Uses a 3-pane approach to display information: a metric pane, a module/call-tree pane (right-clicking brings up the source code location), and a location pane (system tree)

Each item is displayed along with a color indicating the severity of its condition

Severity can be expressed 4 ways: absolute (time), percentage, relative percentage (changes the module & location panes), and comparative percentage (differences between executions)

Despite the documentation, the interface is actually quite intuitive

Page 62: Survey of Performance Evaluation Tools Last modified: 10/18/05

Intel Cluster Tools (ICT)

Metrics collected:

MPI functions: start time, end time, message size, message sender/recipient

User-defined events: counter, start & end times

Code location for source-code correlation

Instrumented entities: MPI functions via the wrapper library; user functions via binary instrumentation(?); user functions & regions via manual instrumentation

Visualizations: several types, including timelines, statistics, and counter information (described in the next slides)

Page 63: Survey of Performance Evaluation Tools Last modified: 10/18/05

ICT visualizations – timelines & summaries

Summary Chart display: allows the user to see how much work is spent in MPI calls

Timeline display: zoomable, scrollable timeline representation of program execution

Fig. 1: Summary Chart. Fig. 2: Timeline Display.

Page 64: Survey of Performance Evaluation Tools Last modified: 10/18/05

ICT visualizations – histogram & counters

Summary Timeline: a timeline/histogram representation showing the number of processes in each activity per time bin

Counter Timeline: a value-over-time representation (behavior depends on the counter definition in the trace)

Fig. 3: Summary Timeline. Fig. 4: Counter Timeline.

Page 65: Survey of Performance Evaluation Tools Last modified: 10/18/05

ICT visualizations – message stats & process profiles

Message Statistics display: message data to/from each process (count, length, rate, duration)

Process Profile display: per-process data regarding activities

Fig. 5: Message Statistics. Fig. 6: Process Profile Display.

Page 66: Survey of Performance Evaluation Tools Last modified: 10/18/05

ICT visualizations – general stats & call tree

Statistics display: various statistics regarding activities, in histogram, table, or text format

Call Tree display

Fig. 7: Statistics Display. Fig. 8: Call Tree Display.

Page 67: Survey of Performance Evaluation Tools Last modified: 10/18/05

ICT visualizations – source & activity chart

Source View: source code correlation with events in the Timeline

Activity Chart: per-process histograms of application and MPI activity

Fig. 9: Source View. Fig. 10: Activity Chart.

Page 68: Survey of Performance Evaluation Tools Last modified: 10/18/05

ICT visualizations – process timeline & activity chart

Process Timeline: activity timeline and counter timeline for a single process

Process Activity Chart: same type of information as the global Summary Chart

Process Call Tree: same type of information as the global Call Tree

Fig. 11: Process Timeline. Fig. 12: Process Activity Chart & Call Tree.

Page 69: Survey of Performance Evaluation Tools Last modified: 10/18/05

Pablo

Metrics collected:

Time inclusive/exclusive of a function

Hardware counters via PAPI

Summary metrics computed from timing information: min/max/avg/stdev/count

Profiled entities: functions, function calls, and outer loops, all selected via the GUI

Visualizations: displays derived summary metrics, color-coded and inline with source code (shown on the next slide)

Page 70: Survey of Performance Evaluation Tools Last modified: 10/18/05

SvPablo

Page 71: Survey of Performance Evaluation Tools Last modified: 10/18/05

MPICL/Paragraph

Metrics collected:

MPI functions: start time, end time, message size, message sender/recipient

Manual instrumentation: start time, end time, "work" done (up to the user to pass this in)

Profiled entities:

MPI function calls via the PMPI interface

User functions/regions via manual instrumentation

Visualizations: many, separated into 4 categories: utilization, communication, task, and "other" (described in the following slides)

Page 72: Survey of Performance Evaluation Tools Last modified: 10/18/05

ParaGraph visualizations

Utilization visualizations display a rough estimate of processor utilization, broken down into 3 states:

Idle: the program is blocked waiting for a communication operation (or has stopped execution)

Overhead: the program is performing communication but is not blocked (time spent within the MPI library)

Busy: the program is executing anything other than communication. "Busy" doesn't necessarily mean useful work is being done, since anything that is not communication counts as busy

Communication visualizations display different aspects of communication: frequency, volume, overall pattern, etc. "Distance" is computed from the topology set in the options menu

Page 73: Survey of Performance Evaluation Tools Last modified: 10/18/05

ParaGraph visualizations

Task visualizations display information about when processors start & stop tasks; they require manually instrumented code to identify when processors start and stop tasks

Other visualizations: miscellaneous things

Page 74: Survey of Performance Evaluation Tools Last modified: 10/18/05

Utilization visualizations – utilization count

Displays # of processors in each state at a given moment in time

Busy shown on bottom, overhead in middle, idle on top

Displays utilization state of each processor as a function of time (gnatt chart)

Page 75: Survey of Performance Evaluation Tools Last modified: 10/18/05

Utilization visualizations – Kiviat diagram

Shows our friend, the Kiviat diagram

Each spoke is a single processor

Dark green shows the moving average, light green shows the current high watermark; the timing parameters for each can be adjusted

The metric shown can be "busy" or "busy + overhead"

Page 76: Survey of Performance Evaluation Tools Last modified: 10/18/05

Utilization visualizations – streak

Shows a "streak" of state, similar to the winning/losing streaks of baseball teams: win = overhead or busy, loss = idle

Not sure how useful this is

Page 77: Survey of Performance Evaluation Tools Last modified: 10/18/05

Utilization visualizations – utilization summary

Shows percentage of time spent in each utilization state up to current time

Page 78: Survey of Performance Evaluation Tools Last modified: 10/18/05

Utilization visualizations – utilization meter

Shows percentage of processors in each utilization state at current time

Page 79: Survey of Performance Evaluation Tools Last modified: 10/18/05

Utilization visualizations – concurrency profile

Shows histograms of the # of processors in a particular utilization state

Example: the diagram shows that only 1 processor was busy ~5% of the time, while all 8 processors were busy ~90% of the time

Page 80: Survey of Performance Evaluation Tools Last modified: 10/18/05

Communication visualizations – color code

The color code controls the colors used in most communication visualizations

Color can indicate message size, message distance, or message tag; distance is computed from the topology set in the options menu

Page 81: Survey of Performance Evaluation Tools Last modified: 10/18/05

Communication visualizations – communication traffic

Shows overall traffic at a given time: bandwidth used, or the number of messages in flight

Can show a single node or an aggregate of all nodes

Page 82: Survey of Performance Evaluation Tools Last modified: 10/18/05

Communication visualizations – spacetime diagram

Shows a standard space-time diagram for communication: which messages were sent from node to node, and at which times

Page 83: Survey of Performance Evaluation Tools Last modified: 10/18/05

Communication visualizations – message queues

Shows data about message queue lengths: incoming/outgoing, number of bytes queued and number of messages queued

Colors mean different things: the dark color shows the current moving average, the light color shows the high watermark

Page 84: Survey of Performance Evaluation Tools Last modified: 10/18/05

Communication visualizations – communication matrix

Shows which processors sent data to which other processors

Page 85: Survey of Performance Evaluation Tools Last modified: 10/18/05

Communication visualizations – communication meter

Shows the percentage of communication capacity used at the current time (message count or bandwidth)

100% = the maximum # of messages or maximum bandwidth used by the application at any specific time

Page 86: Survey of Performance Evaluation Tools Last modified: 10/18/05

Communication visualizations – animation

Animates messages as they occur in the trace file

Can overlay messages on a topology; available topologies: mesh, ring, hypercube, user-specified

Can lay out each node as you want; layouts can be stored to a file and loaded later

Page 87: Survey of Performance Evaluation Tools Last modified: 10/18/05

Communication visualizations – node data

Shows detailed communication data

Can display which node, message tag, message distance, and message length, for a single node or an aggregate of all nodes

Page 88: Survey of Performance Evaluation Tools Last modified: 10/18/05

Task visualizations – task count

Shows number of processors that are executing a task at the current time

At end of run, changes to show summary of all tasks

Page 89: Survey of Performance Evaluation Tools Last modified: 10/18/05

Task visualizations – task Gantt

Shows Gantt chart of which task each processor was working on at a given time

Page 90: Survey of Performance Evaluation Tools Last modified: 10/18/05

Task visualizations – task speed

Similar to the Gantt chart, but displays the "speed" of each task; the work done by each task must be recorded in the instrumentation call (not done for the example shown above)

Page 91: Survey of Performance Evaluation Tools Last modified: 10/18/05

Task visualizations – task status

Shows which tasks have started and finished at the current time

Page 92: Survey of Performance Evaluation Tools Last modified: 10/18/05

Task visualizations – task summary

Shows the % of time spent on each task; also shows any overlap between tasks

Page 93: Survey of Performance Evaluation Tools Last modified: 10/18/05

Task visualizations – task surface

Shows time spent on each task by each processor

Useful for seeing load imbalance on a task-by-task basis

Page 94: Survey of Performance Evaluation Tools Last modified: 10/18/05

Task visualizations – task work

Displays work done by each processor

Shows rate and volume of work being done

The example doesn't show anything because no work amounts were recorded in the trace being visualized

Page 95: Survey of Performance Evaluation Tools Last modified: 10/18/05

Other visualizations – clock, coordinates

Clock: shows the current time

Coordinate information: shows coordinates when you click on any visualization

Page 96: Survey of Performance Evaluation Tools Last modified: 10/18/05

Other visualizations – critical path

Highlights the critical path (the longest serial path) in the space-time diagram in red

Depends on point-to-point communication (collective communication can throw it off)

Page 97: Survey of Performance Evaluation Tools Last modified: 10/18/05

Other visualizations – phase portrait

Shows relationship between processor utilization and communication usage

Page 98: Survey of Performance Evaluation Tools Last modified: 10/18/05

Other visualizations – statistics

Gives overall statistics for the run: % busy, overhead, and idle time; total count and bandwidth of messages; max, min, and average message size, distance, and transit time

Shows a maximum of 16 processors at a time

Page 99: Survey of Performance Evaluation Tools Last modified: 10/18/05

Other visualizations – processor status

Shows processor status: which task each processor is executing and its communication (sends & receives)

Each processor is a square in the grid (an 8-processor example is shown)

Page 100: Survey of Performance Evaluation Tools Last modified: 10/18/05

Other visualizations – trace events

Shows text output of all trace file events

Page 101: Survey of Performance Evaluation Tools Last modified: 10/18/05

WLC Techniques

Page 102: Survey of Performance Evaluation Tools Last modified: 10/18/05

Static vs. Dynamic

Static: explores the intrinsic characteristics of the workload and the correlation between workload parameters and distributions. Techniques: clustering, principal component analysis, averaging, correlations

Dynamic: explores the characteristics of the workload over time and predicts future workload behavior. Techniques: Markov chains, user behavior graphs, regression methods (a Markov-chain sketch follows below)
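As an illustration of the dynamic techniques, a small C sketch that estimates a first-order Markov chain's transition probabilities from an observed sequence of workload states; the three states and the observed sequence are invented:

#include <stdio.h>

#define NSTATES 3   /* e.g. 0 = idle, 1 = interactive, 2 = batch (assumed) */

int main(void)
{
    /* An invented observed sequence of workload states over time. */
    int seq[] = { 0, 0, 1, 1, 2, 2, 2, 1, 0, 1, 2, 2, 1, 1, 0, 0 };
    int n = sizeof seq / sizeof seq[0];

    int counts[NSTATES][NSTATES] = { {0} };
    for (int t = 0; t + 1 < n; t++)           /* count observed transitions */
        counts[seq[t]][seq[t + 1]]++;

    printf("estimated transition matrix P[i][j] = P(next=j | current=i)\n");
    for (int i = 0; i < NSTATES; i++) {
        int row = 0;
        for (int j = 0; j < NSTATES; j++) row += counts[i][j];
        for (int j = 0; j < NSTATES; j++)
            printf("%6.2f", row ? (double)counts[i][j] / row : 0.0);
        printf("\n");
    }
    return 0;
}

The resulting matrix is the simplest form of the workload model: it can be used to generate new synthetic state sequences or to predict the most likely next state.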

Page 103: Survey of Performance Evaluation Tools Last modified: 10/18/05

Discussion

Page 104: Survey of Performance Evaluation Tools Last modified: 10/18/05

Proposed WLC Framework

[Figure: the proposed WLC framework. 1. Requirements analysis. 2. Measurements: collect data on UNIX/Linux/Windows XP via web access, ODBC (SQL, MS Access), and XML, feeding a database. 3. Model construction and investigation: graphical analysis, real-time analysis, data mining, and analytical/statistical tools. 4. Model validation: apply criteria; if the workload model is representative, calibrate and execute it, otherwise iterate. 5. Evaluation: analyze the workload characterization, predict response time, predictive analysis. 6. Visualize the results. An inset chart shows node CPU utilization by priority (high/medium/low, % of processors from 10 AM to 3 PM).]

Page 105: Survey of Performance Evaluation Tools Last modified: 10/18/05

References

Network monitoring tools: http://www.caida.org/tools/

PacketBench (network traffic)

Rubicon (I/O)

The Tracefile Testbed: http://www.nacse.org/perfdb/index.html